Re: Cannot build ports on fresh 10.0

2013-06-04 Thread mdf
On Tue, Jun 4, 2013 at 3:07 PM, hiren panchasara  wrote:

> On Tue, Jun 4, 2013 at 2:42 PM,   wrote:
> > I installed a new VM with 10.0 today from this .iso:
> >
> >
> ftp://ftp.freebsd.org/pub/FreeBSD/snapshots/ISO-IMAGES/amd64/10.0/FreeBSD-10.0-CURRENT-amd64-20130601-r251213-release.iso
> >
> > And I'm behind a firewall and I'm not sure I can use pkg(1); my one
> attempt
> > failed:
> >
> > # pkg install m4
> > Updating repository catalogue
> > Repository catalogue is up-to-date, no need to fetch fresh copy
> > pkg: Package 'm4' was not found in the repositories
> >
> > So I'm building from source.  But the configure step of various ports
> fails
> > like so:
> >
> > checking whether make sets $(MAKE)... yes
> > checking build system type... Invalid configuration
> > `amd64-portbld-freebsd10.0': machine `amd64-portbld' not recognized
> > configure: error: /bin/sh libltdl/config/config.sub
> > amd64-portbld-freebsd10.0 failed
> > ===>  Script "configure" failed unexpectedly.
> > Please report the problem to autoto...@freebsd.org [maintainer] and
> attach
> > the "/usr/ports/devel/libtool/work/libtool-2.4.2/config.log" including
> the
> > output of the failure of your make command. Also, it might be a good
> idea to
> > provide an overview of all packages installed on your system (e.g. a
> > /usr/local/sbin/pkg-static info -g -Ea).
> > *** Error code 1
>
> _Probably_ similar to this?
> http://lists.freebsd.org/pipermail/freebsd-ports/2013-June/084040.html
>

Looks very likely.  I'm testing the patch now.  My rudimentary search of
-ports wasn't good enough, I guess.

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Cannot build ports on fresh 10.0

2013-06-04 Thread mdf
I installed a new VM with 10.0 today from this .iso:

ftp://ftp.freebsd.org/pub/FreeBSD/snapshots/ISO-IMAGES/amd64/10.0/FreeBSD-10.0-CURRENT-amd64-20130601-r251213-release.iso

And I'm behind a firewall and I'm not sure I can use pkg(1); my one attempt
failed:

# pkg install m4
Updating repository catalogue
Repository catalogue is up-to-date, no need to fetch fresh copy
pkg: Package 'm4' was not found in the repositories

So I'm building from source.  But the configure step of various ports fails
like so:

checking whether make sets $(MAKE)... yes
checking build system type... Invalid configuration
`amd64-portbld-freebsd10.0': machine `amd64-portbld' not recognized
configure: error: /bin/sh libltdl/config/config.sub
amd64-portbld-freebsd10.0 failed
===>  Script "configure" failed unexpectedly.
Please report the problem to autoto...@freebsd.org [maintainer] and attach
the "/usr/ports/devel/libtool/work/libtool-2.4.2/config.log" including the
output of the failure of your make command. Also, it might be a good idea to
provide an overview of all packages installed on your system (e.g. a
/usr/local/sbin/pkg-static info -g -Ea).
*** Error code 1

Any ideas what could be wrong?  It's a fresh install with default options,
so it seems hard to believe I managed to screw something up already.
Posting on -current since this happens to many of the ports when I try to
build them.

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: [RFC] use a shared lock for VOP_GETEXTATTR

2013-03-27 Thread mdf
On Wed, Mar 27, 2013 at 10:32 PM, Konstantin Belousov
 wrote:
> On Wed, Mar 27, 2013 at 06:37:51PM -0700, m...@freebsd.org wrote:
>> VOP_GETEXTATTR is currently called with an exclusive lock, which seems
>> like overkill for what is essentially a read operation.  I had a look
>> over the various in-tree filesystems and it didn't look like any of
>> them will have a problem if a shared-mode lock is used for
>> vop_getextattr.
>>
>> Does anyone know otherwise?  Is someone using extended attributes
>> regularly who can test this?
>
> I think this change should be fine. At least it seems to for UFS.
>
> What other filesystems did you audited ?

I looked over zfs, pseudofs, unionfs, ffs/ufs.  None seemed to have
any asserts on the lock type nor anything that looked like it would
try to modify anything.  zfs, I think it was, even used VOP_ISLOCKED
to get the lock type in one path (but I think that was after a
lookup(), so it may have been on a different vnode).

Cheers,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


[RFC] use a shared lock for VOP_GETEXTATTR

2013-03-27 Thread mdf
VOP_GETEXTATTR is currently called with an exclusive lock, which seems
like overkill for what is essentially a read operation.  I had a look
over the various in-tree filesystems and it didn't look like any of
them will have a problem if a shared-mode lock is used for
vop_getextattr.

Does anyone know otherwise?  Is someone using extended attributes
regularly who can test this?

[sorry if this is a duplicate; I first sent from my non-FreeBSD mail]

Thanks,
matthew


0001-Use-a-shared-lock-for-VOP_GETEXTATTR-as-it-is-a-read.patch
Description: Binary data
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Re: kmem_map auto-sizing and size dependencies

2013-01-18 Thread mdf
On Fri, Jan 18, 2013 at 7:29 AM, Andre Oppermann  wrote:
> The (inital?) size of the kmem_map is determined by some voodoo magic,
> a sprinkle of nmbclusters * PAGE_SIZE incrementor and lots of tunables.
> However it seems to work out to an effective kmem_map_size of about 58MB
> on my 16GB AMD64 dev machine:
>
> vm.kvm_size: 549755809792
> vm.kvm_free: 530233421824
> vm.kmem_size: 16,594,300,928
> vm.kmem_size_min: 0
> vm.kmem_size_max: 329,853,485,875
> vm.kmem_size_scale: 1
> vm.kmem_map_size: 59,518,976
> vm.kmem_map_free: 16,534,777,856
>
> The kmem_map serves kernel malloc (via UMA), contigmalloc and everthing
> else that uses UMA for memory allocation.
>
> Mbuf memory too is managed by UMA which obtains the backing kernel memory
> from the kmem_map.  The limits of the various mbuf memory types have
> been considerably raised recently and may make use of 50-75% of all
> physically
> present memory, or available KVM space, whichever is smaller.
>
> Now my questions/comments are:
>
>  Does the kmem_map automatically extend itself if more memory is requested?

Not that I recall.

>  Should it be set to a larger initial value based on min(physical,KVM) space
>  available?

It needs to be smaller than the physical space, because the only limit
on the kernel's use of (pinned) memory is the size of the map.  So if
it is too large there is nothing to stop the kernel from consuming all
available memory.  The lowmem handler is called when running out of
virtual space only (i.e. a failure to allocate a range in the map).

>  The naming and output of the various vm.kmem_* and vm.kvm_* sysctls is
>  confusing and not easy to reconcile.  Either we need some more detailing
>  more aspects or less.  Plus perhaps sysctl subtrees to better describe the
>  hierarchy of the maps.
>
>  Why are separate kmem submaps being used?  Is it to limit memory usage of
>  certain subsystems?  Are those limits actually enforced?

I mostly know about memguard, since I added memguard_fudge().  IIRC
some of the submaps are used.  The memguard_map specifically is used
to know whether an allocation is guarded or not, so at free(9) it can
be handled as normal malloc() or as memguard.

Cheers,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: "Memory modified after free" - by whom?

2012-12-10 Thread mdf
On Mon, Dec 10, 2012 at 3:10 PM, Adrian Chadd  wrote:
> 9216 sounds like a jumbo frame mbuf. So the NIC is writing to an mbuf
> after it's finalised/freed.
>
> I have a similar bug showing up on ath(4) RX. :(

Compile with DEBUG_MEMGUARD in the kernel configuration, and then set
vm.memguard.desc to the name of the UMA zone used for the 9216 byte
allocations, mbuf_jumbo_9k.  This should cause a panic when the memory
is touched after free.

Cheers,
matthew

> On 10 December 2012 14:22, Garrett Cooper  wrote:
>> I noticed this while checking the logs on one of my test boxes
>> after restarting the network. Any idea where I should start looking
>> into this (has IPv6 enabled but wasn't using it, em/cxgbe/ixgbe
>> interfaces with the ixgbe interfaces lagged previously, but now not)?
>> It looks suspiciously like the same size as a jumbo frame, but I'm not
>> 100% sure if that's the real problem.
>> Thanks,
>> -Garrett
>>
>> Dec 10 14:03:12 wf158 kernel: em0: link state changed to DOWN
>> Dec 10 14:03:13 wf158 kernel: ix0: link state changed to DOWN
>> Dec 10 14:03:13 wf158 kernel: ix0: link state changed to UP
>> Dec 10 14:03:13 wf158 kernel: ix1: link state changed to DOWN
>> Dec 10 14:03:13 wf158 kernel: ix1: link state changed to UP
>> Dec 10 14:03:13 wf158 kernel: ix0: link state changed to DOWN
>> Dec 10 14:03:13 wf158 kernel: ix0: link state changed to UP
>> Dec 10 14:03:15 wf158 kernel: em0: link state changed to UP
>> Dec 10 14:03:15 wf158 dhclient: New IP Address (em0): 10.7.169.89
>> Dec 10 14:03:15 wf158 dhclient: New Subnet Mask (em0): 255.255.240.0
>> Dec 10 14:03:15 wf158 dhclient: New Broadcast Address (em0): 10.7.175.255
>> Dec 10 14:03:15 wf158 dhclient: New Routers (em0): 10.7.160.1
>> Dec 10 14:03:16 wf158 kernel: ix0: link state changed to DOWN
>> Dec 10 14:03:16 wf158 kernel: ix0: link state changed to UP
>> Dec 10 14:03:31 wf158 kernel: in6_purgeaddr: err=65, destination
>> address delete failed
>> Dec 10 14:03:31 wf158 dhclient[4539]: My address (10.7.169.89) was
>> deleted, dhclient exiting
>> Dec 10 14:03:32 wf158 dhclient[4521]: short write: wanted 20 got 0 bytes
>> Dec 10 14:03:32 wf158 dhclient[4521]: exiting.
>> Dec 10 14:03:33 wf158 kernel: em0: link state changed to DOWN
>> Dec 10 14:03:33 wf158 kernel: ix1: link state changed to DOWN
>> Dec 10 14:03:34 wf158 kernel: ix1: link state changed to UP
>> Dec 10 14:03:34 wf158 kernel: ix1: link state changed to DOWN
>> Dec 10 14:03:34 wf158 kernel: ix1: link state changed to UP
>> Dec 10 14:03:34 wf158 kernel: ix0: link state changed to DOWN
>> Dec 10 14:03:34 wf158 kernel: ix0: link state changed to UP
>> Dec 10 14:03:34 wf158 kernel: ix1: link state changed to DOWN
>> Dec 10 14:03:34 wf158 kernel: ix1: link state changed to UP
>> Dec 10 14:03:36 wf158 kernel: em0: link state changed to UP
>> Dec 10 14:03:36 wf158 dhclient: New IP Address (em0): 10.7.169.89
>> Dec 10 14:03:36 wf158 dhclient: New Subnet Mask (em0): 255.255.240.0
>> Dec 10 14:03:36 wf158 dhclient: New Broadcast Address (em0): 10.7.175.255
>> Dec 10 14:03:36 wf158 dhclient: New Routers (em0): 10.7.160.1
>> Dec 10 14:05:34 wf158 kernel: Memory modified after free
>> 0xff81c016d000(9216) val= @ 0xff81c016d000
>> Dec 10 14:05:35 wf158 kernel: Memory modified after free
>> 0xff81b5cdc000(9216) val= @ 0xff81b5cdc000
>>
>> # uname -a
>> FreeBSD wf158.west.isilon.com 10.0-CURRENT FreeBSD 10.0-CURRENT #1
>> r+2760369-dirty: Mon Dec 10 08:04:46 PST 2012
>> r...@wf158.west.isilon.com:/usr/obj/usr/src/sys/ISI-GENERIC  amd64
>> ___
>> freebsd-...@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>> To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
> ___
> freebsd-current@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: panic: sbuf_trim makes no sense on sbuf 0xffffff82434d8898 with drain

2012-11-25 Thread mdf
On Sun, Nov 25, 2012 at 2:26 PM, Niclas Zeising
 wrote:
> Hi!
> I consistently get this panic while trying to boot a kernel build from
> r243530.  It happens when the entropy harvesting rc.d script starts. r243380
> worked fine, I haven't tested any revisions in between. Attached is the
> backtrace from the kernel, as gotten by kgdb.  The machine uses zfs as a
> root pool, and there have been churn in this area.  To my untrained eyes,
> however, the issue seem related to hdaa.c. Please let me know if I can
> provide any more information.

r243530 added the new sysctl that is causing panic.  I'm not sure why
there's an sbuf_trim() call; there shouldn't be more than a few \n at
the end.  IMO the sbuf_trim() can be eliminated.

Alternately, the panic check can be removed and we could allow
sbuf_trim() to remove any un-emitted whitespace for an sbuf with
drain.

CC'ing mav@ who introduced the code.  (I introduced sbuf drains).

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: FILE's _file can only hold a short

2012-11-02 Thread mdf
On Thu, Nov 1, 2012 at 7:41 PM, Eitan Adler  wrote:
> On 1 November 2012 10:40, Ian Lepore  wrote:
>> On Wed, 2012-10-31 at 11:12 -0700, m...@freebsd.org wrote:
>>> I seem to recall a thread earlier on this limitation, but looking at
>>> actual libc/stdio sources, the 4 year old check for open(2)'s fd being
>>> less than SHRT_MAX is still there.  I thought I saw a patch to change
>>> this to an int, but it's not in the tree.  Was this in a PR or a
>>> mailing list thread or am I just imagining things?
>>>
>>> We've run into this limitation at work, where some processes have
>>> around 32k open file descriptors and then try to use the libc FILE
>>> interface.  Since we control ABI we can just change this to int, but I
>>> had been hoping there was a FreeBSD revision we could pull instead of
>>> having another diff.
>>
>> FWIW, I also remember some discussion recently (this year) on some
>> mailing list about this, but I can't find it now.  I thought it was
>> somehow related to in-lib versus external uses of the funopen()
>> function, but I may be conflating two unrelated discusssions in my head.
>
> Perhaps 
> http://freebsd.1045724.n5.nabble.com/stdio-and-short-file-descriptors-revisited-td5747703.html
> ?

Yes, that was it exactly.  Thanks!  My (quick) search had not been fruitful.

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


FILE's _file can only hold a short

2012-10-31 Thread mdf
I seem to recall a thread earlier on this limitation, but looking at
actual libc/stdio sources, the 4 year old check for open(2)'s fd being
less than SHRT_MAX is still there.  I thought I saw a patch to change
this to an int, but it's not in the tree.  Was this in a PR or a
mailing list thread or am I just imagining things?

We've run into this limitation at work, where some processes have
around 32k open file descriptors and then try to use the libc FILE
interface.  Since we control ABI we can just change this to int, but I
had been hoping there was a FreeBSD revision we could pull instead of
having another diff.

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: make tinderbox failures

2012-10-26 Thread mdf
On Fri, Oct 26, 2012 at 3:24 PM, Warner Losh  wrote:
>
> On Oct 26, 2012, at 1:08 PM, Garrett Cooper wrote:
>
>> On Fri, Oct 26, 2012 at 10:26 AM,   wrote:
>>> Running make tinderbox locally I see failures that aren't being
>>> reported by the automated tinderbox builds.  I'm curious what's
>>> different about the environment.  Failures are in the following:
>>>
>>> arm ARMADAXP kernel failed, check _.arm.ARMADAXP for details
>>> mips SENTRY5 kernel failed, check _.mips.SENTRY5 for details
>>> mips XLPN32 kernel failed, check _.mips.XLPN32 for details
>>> mips XLR kernel failed, check _.mips.XLR for details
>>> mips XLRN32 kernel failed, check _.mips.XLRN32 for details
>>>
>>> _.arm.ARMADAXP:
>>> /data/sb/bsd.git/sys/arm/mv/armadaxp/armadaxp_mp.c: In function
>>> 'platform_mp_start_ap':
>>> /data/sb/bsd.git/sys/arm/mv/armadaxp/armadaxp_mp.c:181: warning:
>>> passing argument 1 of 'pmap_kextract' makes integer from pointer
>>> without a cast
>
> This still needs some love...
>
>>> _.mips.SENTRY5:
>>> config: Error: device "siba" is unknown
>
> OK.  Traced this down and fixed it.  I'd planned on moving siba to files, and 
> it looks like I never finished that.  My bad.
>
>>> _.mips.XLPN32 and _.mips.XLRN32:
>>> linking kernel.debug
>>> stack_machdep.o: In function `stack_capture':
>>> /data/sb/bsd.git/sys/mips/mips/stack_machdep.c:(.text+0x34): undefined
>>> reference to `stack_zero'
>>> /data/sb/bsd.git/sys/mips/mips/stack_machdep.c:(.text+0x34):
>>> relocation truncated to fit: R_MIPS_26 against `stack_zero'
>>> /data/sb/bsd.git/sys/mips/mips/stack_machdep.c:(.text+0x13c):
>>> undefined reference to `stack_put'
>>> /data/sb/bsd.git/sys/mips/mips/stack_machdep.c:(.text+0x13c):
>>> relocation truncated to fit: R_MIPS_26 against `stack_put'
>
> stack_machdep needs to depend on options stack or ddb.  Fixed.  Unsure why 
> other kernels didn't uncover this.
>
>>> _.mips.XLR:
>>> linking kernel.debug
>>> board.o: In function `xlr_board_info_setup':
>>> /data/sb/bsd.git/sys/mips/rmi/board.c:(.text+0x3a4): undefined
>>> reference to `__ucmpdi2'
>>> /data/sb/bsd.git/sys/mips/rmi/board.c:(.text+0x3a4): relocation
>>> truncated to fit: R_MIPS_26 against `__ucmpdi2'
>>> /data/sb/bsd.git/sys/mips/rmi/board.c:(.text+0x3e4): undefined
>>> reference to `__ucmpdi2'
>>> /data/sb/bsd.git/sys/mips/rmi/board.c:(.text+0x3e4): relocation
>>> truncated to fit: R_MIPS_26 against `__ucmpdi2'
>
> Fixed earlier this week.
>
>>> Since I don't work on arm or mips I generally ignore these.  But it
>>> makes it harder to have confidence in a global change when make
>>> tinderbox throws errors locally.
>>
>> These are probably due to the changes that Warner made recently to the
>> mips conf files.
>
> Some of them...

Thanks for fixes!  Is the reason the public tinderbox didn't see these
because it has explicit lists of arch/CONF kernels to build, while
"make tinderbox" on my machine just builds all the capitalized files
as configuration files?

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


make tinderbox failures

2012-10-26 Thread mdf
Running make tinderbox locally I see failures that aren't being
reported by the automated tinderbox builds.  I'm curious what's
different about the environment.  Failures are in the following:

arm ARMADAXP kernel failed, check _.arm.ARMADAXP for details
mips SENTRY5 kernel failed, check _.mips.SENTRY5 for details
mips XLPN32 kernel failed, check _.mips.XLPN32 for details
mips XLR kernel failed, check _.mips.XLR for details
mips XLRN32 kernel failed, check _.mips.XLRN32 for details

_.arm.ARMADAXP:
/data/sb/bsd.git/sys/arm/mv/armadaxp/armadaxp_mp.c: In function
'platform_mp_start_ap':
/data/sb/bsd.git/sys/arm/mv/armadaxp/armadaxp_mp.c:181: warning:
passing argument 1 of 'pmap_kextract' makes integer from pointer
without a cast

_.mips.SENTRY5:
config: Error: device "siba" is unknown

_.mips.XLPN32 and _.mips.XLRN32:
linking kernel.debug
stack_machdep.o: In function `stack_capture':
/data/sb/bsd.git/sys/mips/mips/stack_machdep.c:(.text+0x34): undefined
reference to `stack_zero'
/data/sb/bsd.git/sys/mips/mips/stack_machdep.c:(.text+0x34):
relocation truncated to fit: R_MIPS_26 against `stack_zero'
/data/sb/bsd.git/sys/mips/mips/stack_machdep.c:(.text+0x13c):
undefined reference to `stack_put'
/data/sb/bsd.git/sys/mips/mips/stack_machdep.c:(.text+0x13c):
relocation truncated to fit: R_MIPS_26 against `stack_put'

_.mips.XLR:
linking kernel.debug
board.o: In function `xlr_board_info_setup':
/data/sb/bsd.git/sys/mips/rmi/board.c:(.text+0x3a4): undefined
reference to `__ucmpdi2'
/data/sb/bsd.git/sys/mips/rmi/board.c:(.text+0x3a4): relocation
truncated to fit: R_MIPS_26 against `__ucmpdi2'
/data/sb/bsd.git/sys/mips/rmi/board.c:(.text+0x3e4): undefined
reference to `__ucmpdi2'
/data/sb/bsd.git/sys/mips/rmi/board.c:(.text+0x3e4): relocation
truncated to fit: R_MIPS_26 against `__ucmpdi2'

Since I don't work on arm or mips I generally ignore these.  But it
makes it harder to have confidence in a global change when make
tinderbox throws errors locally.

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: panic with DEBUG_MEMGUARD on PowerPC

2012-07-15 Thread mdf
On Sat, Jul 14, 2012 at 8:39 AM, Justin Hibbits  wrote:
> On Jul 13, 2012, at 12:20 AM, m...@freebsd.org wrote:
>
>> On Thu, Jul 12, 2012 at 6:33 PM, Justin Hibbits 
>> wrote:
>>>
>>> On Jul 12, 2012, at 9:11 PM, m...@freebsd.org wrote:
>>>
>>>> On Thu, Jul 12, 2012 at 4:43 PM, Justin Hibbits 
>>>> wrote:
>>>>>
>>>>>
>>>>> When tracking down a panic exposed by INVARIANTS, I tried setting
>>>>> DEBUG_MEMGUARD, so I could find the culprit that's trashing freed
>>>>> memory.
>>>>> However, this causes a panic at bootup.  It shows up right after the
>>>>> first
>>>>> WARNING: WITNESS message, with the following:
>>>>>
>>>>> Tracing, and printf() debugging, I see arguments to vm_map_findspace():
>>>>> start: 0xD000, length: 4246446080, and map->max_offset =
>>>>> 4026531839.
>>>>>
>>>>> Beyond that, I'm lost with tracking this down.  Machine is a dual
>>>>> processor
>>>>> PowerPC G4, with 2GB RAM.
>>>>
>>>>
>>>>
>>>> The length is 0xFD1BA000 which is almost 4GB.  Asking for 4GB of
>>>> virtual space for 2GB of RAM sounds about right (it's been a while
>>>> since I was in this code), unless this is a 32-bit kernel, in which
>>>> case it'd be too much since there isn't that much virtual space
>>>> available.
>>>>
>>>> So, is the kernel 32-bit?  What are the values used and returned by
>>>> memguard_fudge()?  The intent of that routine is to get kmeminit() to
>>>> allocate a larger map so memguard can use part of it for private
>>>> virtual addresses.  But it shouldn't be asking for "too much"; i.e.
>>>> the intent was to check both physical and virtual space available and
>>>> be greedy, but not too greedy.
>>>>
>>>> There were some issues with that code for some platforms that e.g.
>>>> didn't define a VM_KMEM_SIZE_MAX, but alc@ fixed that in r216425.
>>>
>>>
>>> It is a 32-bit kernel, on 32-bit hardware.  The values for memguard_fudge
>>> are (defaults):
>>>
>>> tmp: 4246446080, vm_kmem_size: 117440512, vm_kmem_size_max: 0
>>>
>>> When setting vm.kmem_size/vm.kmem_size_max to 2GB they are:
>>>
>>> tmp: 2147483648, vm_kmem_size: 214793648, vm_kmem_sizee_max: 2147483648
>>> (all
>>> 2GB).
>>>
>>> But the start and map->max_offset remain the same on all runs I make.
>>
>>
>> memguard_fudge is still broken for 32-bit architectures with no
>> vm_kmem_max.  In the absence of a km_max to limit the value, we
>> essentially use twice the physical memory for the virtual limit.  But
>> with 2GB on a 32-bit machine, this requires 4GB of virtual space.
>>
>> Setting vm_kmem_size_max to 2GB should work; I'd expect to see
>> tmp=about 200MB, which is much larger than the input 112MB but the
>> allocation should work.  But I don't really know what else PowerPC has
>> need of for virtual space, so that still could be too large.
>>
>> You can try smaller values of vm_kmem_size_max, like 1GB or 512MB.
>> You shouldn't need to set vm_kmem_size at all.  At some point the
>> added space for the memguard_map will be small enough that the
>> kmem_suballoc will work.
>>
>> Hmm, what is the min_offset and max_offset of kernel_map when the call
>> to memguard_fudge is made?
>>
>> Thanks,
>> matthew
>
>
>
> Without setting vm.kmem_size/vm.kmem_size_max, I see the following:
>
> map: 0x100, min_offset: 0xD000, max_offset: 0xEFFF
>
> It does boot when I set vm.kmem_size=256M/vm.kmem_size_max=512M.
>
> When I tried 512M/1024M, it panicked at the same place -- kmem_suballoc from
> kmeminit.  So it looks like I have to set vm.kmem_size/vm.kmem_size_max way
> back in order for it to boot with memguard(9).

Please try the attached patch (or at
http://people.freebsd.org/~mdf/memguard.diff).

Thanks,
matthew


memguard.diff
Description: Binary data
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Re: panic with DEBUG_MEMGUARD on PowerPC

2012-07-14 Thread mdf
On Sat, Jul 14, 2012 at 8:39 AM, Justin Hibbits  wrote:
> On Jul 13, 2012, at 12:20 AM, m...@freebsd.org wrote:
>
>> On Thu, Jul 12, 2012 at 6:33 PM, Justin Hibbits 
>> wrote:
>>>
>>> On Jul 12, 2012, at 9:11 PM, m...@freebsd.org wrote:
>>>
 On Thu, Jul 12, 2012 at 4:43 PM, Justin Hibbits 
 wrote:
>
>
> When tracking down a panic exposed by INVARIANTS, I tried setting
> DEBUG_MEMGUARD, so I could find the culprit that's trashing freed
> memory.
> However, this causes a panic at bootup.  It shows up right after the
> first
> WARNING: WITNESS message, with the following:
>
> Tracing, and printf() debugging, I see arguments to vm_map_findspace():
> start: 0xD000, length: 4246446080, and map->max_offset =
> 4026531839.
>
> Beyond that, I'm lost with tracking this down.  Machine is a dual
> processor
> PowerPC G4, with 2GB RAM.



 The length is 0xFD1BA000 which is almost 4GB.  Asking for 4GB of
 virtual space for 2GB of RAM sounds about right (it's been a while
 since I was in this code), unless this is a 32-bit kernel, in which
 case it'd be too much since there isn't that much virtual space
 available.

 So, is the kernel 32-bit?  What are the values used and returned by
 memguard_fudge()?  The intent of that routine is to get kmeminit() to
 allocate a larger map so memguard can use part of it for private
 virtual addresses.  But it shouldn't be asking for "too much"; i.e.
 the intent was to check both physical and virtual space available and
 be greedy, but not too greedy.

 There were some issues with that code for some platforms that e.g.
 didn't define a VM_KMEM_SIZE_MAX, but alc@ fixed that in r216425.
>>>
>>>
>>> It is a 32-bit kernel, on 32-bit hardware.  The values for memguard_fudge
>>> are (defaults):
>>>
>>> tmp: 4246446080, vm_kmem_size: 117440512, vm_kmem_size_max: 0
>>>
>>> When setting vm.kmem_size/vm.kmem_size_max to 2GB they are:
>>>
>>> tmp: 2147483648, vm_kmem_size: 214793648, vm_kmem_sizee_max: 2147483648
>>> (all
>>> 2GB).
>>>
>>> But the start and map->max_offset remain the same on all runs I make.
>>
>>
>> memguard_fudge is still broken for 32-bit architectures with no
>> vm_kmem_max.  In the absence of a km_max to limit the value, we
>> essentially use twice the physical memory for the virtual limit.  But
>> with 2GB on a 32-bit machine, this requires 4GB of virtual space.
>>
>> Setting vm_kmem_size_max to 2GB should work; I'd expect to see
>> tmp=about 200MB, which is much larger than the input 112MB but the
>> allocation should work.  But I don't really know what else PowerPC has
>> need of for virtual space, so that still could be too large.
>>
>> You can try smaller values of vm_kmem_size_max, like 1GB or 512MB.
>> You shouldn't need to set vm_kmem_size at all.  At some point the
>> added space for the memguard_map will be small enough that the
>> kmem_suballoc will work.
>>
>> Hmm, what is the min_offset and max_offset of kernel_map when the call
>> to memguard_fudge is made?
>>
>> Thanks,
>> matthew
>
>
>
> Without setting vm.kmem_size/vm.kmem_size_max, I see the following:
>
> map: 0x100, min_offset: 0xD000, max_offset: 0xEFFF
>
> It does boot when I set vm.kmem_size=256M/vm.kmem_size_max=512M.
>
> When I tried 512M/1024M, it panicked at the same place -- kmem_suballoc from
> kmeminit.  So it looks like I have to set vm.kmem_size/vm.kmem_size_max way
> back in order for it to boot with memguard(9).

I'll see about crafting a patch when I can get access to a machine
that works (my current woes with my laptop, VM image, etc. are quite
frustrating).  The gist is that, in the absence of a kmem_max, the
routine for determining the size of the kmem map should use the
kernel_map's size as a limiting factor.  In this case it looks like
the map's size is 512MB.

Cheers,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: panic with DEBUG_MEMGUARD on PowerPC

2012-07-12 Thread mdf
On Thu, Jul 12, 2012 at 6:33 PM, Justin Hibbits  wrote:
> On Jul 12, 2012, at 9:11 PM, m...@freebsd.org wrote:
>
>> On Thu, Jul 12, 2012 at 4:43 PM, Justin Hibbits 
>> wrote:
>>>
>>> When tracking down a panic exposed by INVARIANTS, I tried setting
>>> DEBUG_MEMGUARD, so I could find the culprit that's trashing freed memory.
>>> However, this causes a panic at bootup.  It shows up right after the
>>> first
>>> WARNING: WITNESS message, with the following:
>>>
>>> Tracing, and printf() debugging, I see arguments to vm_map_findspace():
>>> start: 0xD000, length: 4246446080, and map->max_offset = 4026531839.
>>>
>>> Beyond that, I'm lost with tracking this down.  Machine is a dual
>>> processor
>>> PowerPC G4, with 2GB RAM.
>>
>>
>> The length is 0xFD1BA000 which is almost 4GB.  Asking for 4GB of
>> virtual space for 2GB of RAM sounds about right (it's been a while
>> since I was in this code), unless this is a 32-bit kernel, in which
>> case it'd be too much since there isn't that much virtual space
>> available.
>>
>> So, is the kernel 32-bit?  What are the values used and returned by
>> memguard_fudge()?  The intent of that routine is to get kmeminit() to
>> allocate a larger map so memguard can use part of it for private
>> virtual addresses.  But it shouldn't be asking for "too much"; i.e.
>> the intent was to check both physical and virtual space available and
>> be greedy, but not too greedy.
>>
>> There were some issues with that code for some platforms that e.g.
>> didn't define a VM_KMEM_SIZE_MAX, but alc@ fixed that in r216425.
>
> It is a 32-bit kernel, on 32-bit hardware.  The values for memguard_fudge
> are (defaults):
>
> tmp: 4246446080, vm_kmem_size: 117440512, vm_kmem_size_max: 0
>
> When setting vm.kmem_size/vm.kmem_size_max to 2GB they are:
>
> tmp: 2147483648, vm_kmem_size: 214793648, vm_kmem_sizee_max: 2147483648 (all
> 2GB).
>
> But the start and map->max_offset remain the same on all runs I make.

memguard_fudge is still broken for 32-bit architectures with no
vm_kmem_max.  In the absence of a km_max to limit the value, we
essentially use twice the physical memory for the virtual limit.  But
with 2GB on a 32-bit machine, this requires 4GB of virtual space.

Setting vm_kmem_size_max to 2GB should work; I'd expect to see
tmp=about 200MB, which is much larger than the input 112MB but the
allocation should work.  But I don't really know what else PowerPC has
need of for virtual space, so that still could be too large.

You can try smaller values of vm_kmem_size_max, like 1GB or 512MB.
You shouldn't need to set vm_kmem_size at all.  At some point the
added space for the memguard_map will be small enough that the
kmem_suballoc will work.

Hmm, what is the min_offset and max_offset of kernel_map when the call
to memguard_fudge is made?

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: panic with DEBUG_MEMGUARD on PowerPC

2012-07-12 Thread mdf
On Thu, Jul 12, 2012 at 4:43 PM, Justin Hibbits  wrote:
> When tracking down a panic exposed by INVARIANTS, I tried setting
> DEBUG_MEMGUARD, so I could find the culprit that's trashing freed memory.
> However, this causes a panic at bootup.  It shows up right after the first
> WARNING: WITNESS message, with the following:
>
> panic: kmem_suballoc: bad status return of 3
> cpuid = 0
> KDB: stack backtrace:
> 0xd0004ad0: at kdb_backtrace+0x4c
> 0xd0004b40: at panic+0x224
> 0xd0004ba0: at kmem_suballoc+0x8c
> 0xd0004bd0: at kmeminit+0x1ac
> 0xd0004c20: at mi_startup+0x13c
> 0xd0004c50: at btext+0xc0
>
> Tracing, and printf() debugging, I see arguments to vm_map_findspace():
> start: 0xD000, length: 4246446080, and map->max_offset = 4026531839.
>
> Beyond that, I'm lost with tracking this down.  Machine is a dual processor
> PowerPC G4, with 2GB RAM.

The length is 0xFD1BA000 which is almost 4GB.  Asking for 4GB of
virtual space for 2GB of RAM sounds about right (it's been a while
since I was in this code), unless this is a 32-bit kernel, in which
case it'd be too much since there isn't that much virtual space
available.

So, is the kernel 32-bit?  What are the values used and returned by
memguard_fudge()?  The intent of that routine is to get kmeminit() to
allocate a larger map so memguard can use part of it for private
virtual addresses.  But it shouldn't be asking for "too much"; i.e.
the intent was to check both physical and virtual space available and
be greedy, but not too greedy.

There were some issues with that code for some platforms that e.g.
didn't define a VM_KMEM_SIZE_MAX, but alc@ fixed that in r216425.

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: sysctl filesystem ?

2012-06-26 Thread mdf
On Tue, Jun 26, 2012 at 1:59 AM, Robert Watson  wrote:
> On Tue, 26 Jun 2012, Chris Rees wrote:
>
>>> as well as we don't depend of /proc for normal operation we shouldn't for
>>
>> say /proc/sysctl
>>>
>>>
>>> improvements are welcome, better documentation is welcome, changes to
>>
>> what is OK - isn't.
>>
>> /proc/sysctl might be useful.  Just because Linux uses it doesn't make it
>> a bad idea.
>
>
> One of the problems we've encounted with synthetic file systems is that
> off-the-shelf file system tools (e.g., cp, dd, cat) make simplistic (but not
> unreasonable) assumptions about the statistic content of files.  This comes
> up frequently with procfs-like systems where the size of, say, memory map
> data can be considerably larger than the perhaps 128-byte, 256-byte, or even
> 8k buffers that might exist in a stock file access tool.  Unless we change
> all of those tools to use buffers much bigger than they currently do, which
> even suggets changing the C library buffer to defaults for FILE *, that
> places an onus on the file system to provide persisting snapshots of data
> until it's sure that a user process is done -- e.g., over many system calls.
>
> sysctl is not immune to the requirement of atomicity, but it has explicit
> control over it: sysctl is a single system call, rather than an unbounded
> open-read-seek-repeat-etc cycle, and has been carefully crafted to provide
> this and other MIB-like properties, such as a basic data type model so that
> command line tools know how to render content rather than having to guess
> and/or get it wrong.  sysctl has some file-system like properties, but on
> the whole, it's not a file system -- it's much more like an SNMP MIB.
>
> While you can map anything into anything (including Turing machines), I
> think the sysctl command line tool and API, despite its limitations, is a
> better match for accessing this sort of monitoring and control data than the
> POSIX file API, and would recommend against trying to move to a sysctl file
> system.

While I understand the problems you allude to, the sysctl(8) binary
can protect itself from them.  IMO the biggest problem with sysctls
not being files is that it makes no sense from the core UNIX
philosophy that everything is a file.  Sockets and pipes and character
devices and even unseekable things like stdout are files; why aren't
these other objects that allow read, write, and have their own
namespace?

Cheers,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: new panic in cpu_reset() with WITNESS

2012-01-17 Thread mdf
2012/1/17 Gleb Smirnoff :
>  New panic has been introduced somewhere between
> r229851 and r229932, that happens on shutdown if
> kernel has WITNESS and doesn't have WITNESS_SKIPSPIN.
>
> Uptime: 1h0m17s
> Rebooting...
> panic: mtx_lock_spin: recursed on non-recursive mutex cnputs_mtx @ 
> /usr/src/head/sys/kern/kern_cons.c:500
> cpuid = 0
> KDB: enter: panic
> [ thread pid 1 tid 11 ]
> Stopped at      kdb_enter+0x3b: movq    $0,0x514d32(%rip)
> db>
> db> bt
> Tracing pid 1 tid 11 td 0xfe0001d5e000
> kdb_enter() at kdb_enter+0x3b
> panic() at panic+0x1c7
> _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x10f
> cnputs() at cnputs+0x7a
> putchar() at putchar+0x11f
> kvprintf() at kvprintf+0x83
> vprintf() at vprintf+0x85
> printf() at printf+0x67
> witness_checkorder() at witness_checkorder+0x773
> _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x99
> uart_cnputc() at uart_cnputc+0x3e
> cnputc() at cnputc+0x4c
> cnputs() at cnputs+0x26
> putchar() at putchar+0x11f
> kvprintf() at kvprintf+0x83
> vprintf() at vprintf+0x85
> printf() at printf+0x67
> cpu_reset() at cpu_reset+0x81
> kern_reboot() at kern_reboot+0x3a5
> --More--^M        ^Msys_reboot() at sys_reboot+0x42
> amd64_syscall() at amd64_syscall+0x39e
> Xfast_syscall() at Xfast_syscall+0xf7
> --- syscall (55, FreeBSD ELF64, sys_reboot), rip = 0x40ea3c, rsp = 
> 0x7fffd6d8, rbp = 0x49 ---
> db>
> db> show locks
> exclusive sleep mutex Giant (Giant) r = 0 (0x809bc560) locked @ 
> /usr/src/head/sys/kern/kern_module.c:101
> exclusive spin mutex smp rendezvous (smp rendezvous) r = 0 
> (0x80a08840) locked @ /usr/src/head/sys/kern/kern_shutdown.c:542
> db>
>
> So the problem is that we are holding smp rendezvous mutex during the 
> cpu_reset().
> No mutexes should be obtained after it. However, since cpu_reset() does 
> priting
> we obtain cnputs_mtx, and later obtain uart_hwmtx. The latter is hardcoded in
> the subr_witness.c as mutex to obtain before smp rendezvous, this triggers
> yet another printf from witness, that finally panics due to recursing on
> cnputs_mtx.

At $WORK we explicitly marked cnputs_mtx as NO_WITNESS since it didn't
seem possible to fit it into the heirarchy in any sane way, since a
print can come from basically anywhere.

If anyone has a better fix, that'd be great, but I haven't been able
to think of one.

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: SU+J systems do not fsck themselves

2011-12-28 Thread mdf
On Wed, Dec 28, 2011 at 8:54 AM, Maxim Khitrov  wrote:
> On Wed, Dec 28, 2011 at 11:42 AM, Matthias Andree
>  wrote:
>> Am 27.12.2011 22:53, schrieb David Thiel:
>>> I've had multiple machines now (9.0-RC3, amd64, i386 and earlier
>>> 9-CURRENT on ppc) running SU+J that have had unexplained panics and
>>> crashes start happening relating to disk I/O. When I end up running a
>>> full fsck, it keeps turning out that the disk is dirty and corrupted,
>>> but no mechanism is in place with SU+J to detect and fix this. A bgfsck
>>> never happens, but a manual fsck in single-user does indeed fix the
>>> crashing and weird behavior. Others have tested their SU+J volumes and
>>> found them to have errors as well. This makes me super nervous.
>>
>> The one thing I figured is that in the light of power outages, or
>> crashing virtualization hosts, you really really really need to disable
>> disk write caches, and this affects softupdates, journalling, asynch
>> file systems, just about everything.
>>
>> The fact that makes matters worse is that journalling or softupdates
>> allow you to mount a silently-corrupted file system, whereas the
>> traditional UFS/UFS2 sync/asynch mounts will fsck themselves in the
>> foreground, so they get fixed before the FS panics.
>>
>> So can you be sure that:
>>
>> - your driver, chip set and hard disk execute ordered writes in order,

If they don't, it's a bug.  Not that there isn't buggy firmware out
there, but each layer of software does need to rely on the one below
actually doing what it's promised.

>> - your driver, chip set and hard disk actually write data to permanent
>> storage BEFORE acknowledging a successful write?

Not required by SU as they use an explicit BIO_FLUSH which should be
handled by the driver.

>> Whenever I fixed these issues, I had no more corruptions.
>>
>> For ata and sata, there are loader tunables you will want to set,
>> hw.ata.wc=0 and kern.cam.ada.write_cache=0.

This should not be necessary if the driver and firmware are not buggy.

>> If your drives are under ada, ad, or ahci related control, try these
>> settings.  For SCSI, use camcontrol to turn the write cache off.
>> softupdates is supposed to rectify most of the performance penalties
>> incurred.
>>
>> Note also that you needed to set ahci_load=YES and atapicam_load=YES in
>> 8.X, I've never bothered to check 7.X or 9.X WRT these settings.
>
> This is a bit off-topic, but I'm curious what the effect of NCQ is on
> softupdates? Since that too has the ability to reorder writes to disk,
> should it be disabled in addition to cache?

SU doesn't care about write ordering, as long as everything before a
BIO_FLUSH is really flushed by the time the BIO_FLUSH is acknowledged.

Cheers,
matthew

> Also, I would say that if you are using a hardware raid controller
> with a BBU, then allowing the use of controller's cache and write-back
> policy should be safe for use with softupdates. Any caching mechanism,
> for that matter, that has a separate power supply source should be ok.
> For example, the Intel 320 SSDs have a few on-board capacitors that
> are used to flush the cache in the event of a power loss.
>
> - Max
> ___
> freebsd-current@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: extattr_set_*() return type

2011-12-20 Thread mdf
On Tue, Dec 20, 2011 at 1:49 PM, John Baldwin  wrote:
> Hmm, if these functions are expected to operate like 'write(2)' and are
> supposed to return the number of bytes written, shouldn't their return value
> be 'ssize_t' instead of 'int'?  It looks like the system calls themselves
> already do the right thing in setting td_retval[] (they assign a ssize_t to it
> and td_retval[0] can hold a ssize_t on all of our current platforms).  It
> would seem that the only change would be to the header and probably
> syscalls.master.  I guess this would require a symver bump to fix though.

An extended attribute larger than 2GB is a programming abuse, though.
Technically int may not be 32 bits but it is on all supported
platforms now.

Cheers,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Sleeping thread (tid 100033, pid 16): panic in FreeBSD 10.0-CURRENT/amd64 r228662

2011-12-20 Thread mdf
On Tue, Dec 20, 2011 at 6:32 AM, John Baldwin  wrote:
> On Tuesday, December 20, 2011 9:22:48 am m...@freebsd.org wrote:
>> On Tue, Dec 20, 2011 at 5:52 AM, John Baldwin  wrote:
>> > On Saturday, December 17, 2011 10:41:15 pm m...@freebsd.org wrote:
>> >> On Sat, Dec 17, 2011 at 5:45 PM, Alexander Kabaev  
>> >> wrote:
>> >> > On Sun, 18 Dec 2011 01:09:00 +0100
>> >> > "O. Hartmann"  wrote:
>> >> >
>> >> >> Sleeping thread (tid 100033, pid 16) owns a non sleepable lock
>> >> >> panic: sleeping thread
>> >> >> cpuid = 0
>> >> >>
>> >> >> PID 16 is always USB on my box.
>> >> >
>> >> > You really need to give us a backtrace when you quote panics. It is
>> >> > impossible to make any sense of the above panic message without more
>> >> > context.
>> >>
>> >> In the case of this panic, the stack of the thread which panics is
>> >> useless; it's someone trying to propagate priority that discovered it.
>> >>  A backtrace on tid 100033 would be useful.
>> >>
>> >> With WITNESS enabled, it's possible to have this panic display the
>> >> stack of the incorrectly sleeping thread at the time it acquired the
>> >> lock, as well, but this code isn't in CURRENT or any release.  I have
>> >> a patch at $WORK I can dig up on Monday.
>> >
>> > Huh?  The stock kernel dumps a stack trace of the offending thread if you 
>> > have
>> > DDB enabled:
>> >
>> >                /*
>> >                 * If the thread is asleep, then we are probably about
>> >                 * to deadlock.  To make debugging this easier, just
>> >                 * panic and tell the user which thread misbehaved so
>> >                 * they can hopefully get a stack trace from the truly
>> >                 * misbehaving thread.
>> >                 */
>> >                if (TD_IS_SLEEPING(td)) {
>> >                        printf(
>> >                "Sleeping thread (tid %d, pid %d) owns a non-sleepable 
>> > lock\n",
>> >                            td->td_tid, td->td_proc->p_pid);
>> > #ifdef DDB
>> >                        db_trace_thread(td, -1);
>> > #endif
>> >                        panic("sleeping thread");
>> >                }
>>
>> Hmm, maybe this wasn't in 7, or maybe I'm just remembering that we
>> added code to print *which* lock it holds (using WITNESS data).  I do
>> recall that this panic alone was often not sufficient to debug the
>> problem.
>
> I think the db_trace_thread() has been around for a while (since 5 or 6),
> but it is true that we don't tell you which lock is held even with this.
> That might be a useful thing to output before the panic.


This patch isn't quite right since I had to hand-edit it.  There's a
small chance I can commit this in the near future, but of someone else
wants to take it, feel free.  Style isn't yet fixed up to be FreeBSD
standard either.


--- /data/sb/bsd.git/sys/kern/subr_turnstile.c  2011-12-12
10:23:12.542196632 -0800
+++ kern/subr_turnstile.c   2011-12-09 10:59:29.882643558 -0800
@@ -165,10 +165,43 @@
 static voidturnstile_dtor(void *mem, int size, void *arg);
 #endif
 static int turnstile_init(void *mem, int size, int flags);
 static voidturnstile_fini(void *mem, int size);

+#ifdef INVARIANTS
+static void
+sleeping_thread_owns_a_nonsleepable_lock(struct thread *td)
+{
+   printf("Sleeping thread (tid %d, pid %d) owns a non-sleepable lock\n",
+   td->td_tid, td->td_proc->p_pid);
+#ifdef DDB
+   db_trace_thread(td, -1);
+#endif
+#ifdef WITNESS
+   struct lock_list_entry *lock_list, *lle;
+   int i;
+
+   lock_list = td->td_sleeplocks;
+   if (lock_list == NULL || lock_list->ll_count == 0) {
+   printf("Thread does not appear to hold any mutexes!\n");
+   return;
+   }
+
+   for (lle = lock_list; lle != NULL; lle = lle->ll_next) {
+   for (i = lle->ll_count - 1; i >= 0; i--) {
+   struct lock_instance *li = &lle->ll_children[i];
+
+   printf("Lock %s acquired at %s:%d\n",
+   li->li_lock->lo_name, li->li_file, li->li_line);
+   }
+   }
+#endif /* WITNESS */
+}
+#else
+#define sleeping_thread_owns_a_nonsleepable_lock(td) do { } while (0)
+#endif /* INVARIANTS */
+
 /*
  * Walks the chain of turnstiles and their owners to propagate the priority
  * of the thread being blocked to all the threads holding locks that have to
  * release their locks before this thread can run again.
  */
@@ -210,19 +243,31 @@
 * If the thread is asleep, then we are probably about
 * to deadlock.  To make debugging this easier, just
 * panic and tell the user which thread misbehaved so
 * they can hopefully get a stack trace from the truly
 * misbehaving thread.
 */
if (TD_IS_SLEEPING(td)) {
-   printf(
-   "Sleeping thread (tid %d, pid %d) owns a non-sleepable lock\n",
-   td->

Re: Sleeping thread (tid 100033, pid 16): panic in FreeBSD 10.0-CURRENT/amd64 r228662

2011-12-20 Thread mdf
On Tue, Dec 20, 2011 at 5:52 AM, John Baldwin  wrote:
> On Saturday, December 17, 2011 10:41:15 pm m...@freebsd.org wrote:
>> On Sat, Dec 17, 2011 at 5:45 PM, Alexander Kabaev  wrote:
>> > On Sun, 18 Dec 2011 01:09:00 +0100
>> > "O. Hartmann"  wrote:
>> >
>> >> Sleeping thread (tid 100033, pid 16) owns a non sleepable lock
>> >> panic: sleeping thread
>> >> cpuid = 0
>> >>
>> >> PID 16 is always USB on my box.
>> >
>> > You really need to give us a backtrace when you quote panics. It is
>> > impossible to make any sense of the above panic message without more
>> > context.
>>
>> In the case of this panic, the stack of the thread which panics is
>> useless; it's someone trying to propagate priority that discovered it.
>>  A backtrace on tid 100033 would be useful.
>>
>> With WITNESS enabled, it's possible to have this panic display the
>> stack of the incorrectly sleeping thread at the time it acquired the
>> lock, as well, but this code isn't in CURRENT or any release.  I have
>> a patch at $WORK I can dig up on Monday.
>
> Huh?  The stock kernel dumps a stack trace of the offending thread if you have
> DDB enabled:
>
>                /*
>                 * If the thread is asleep, then we are probably about
>                 * to deadlock.  To make debugging this easier, just
>                 * panic and tell the user which thread misbehaved so
>                 * they can hopefully get a stack trace from the truly
>                 * misbehaving thread.
>                 */
>                if (TD_IS_SLEEPING(td)) {
>                        printf(
>                "Sleeping thread (tid %d, pid %d) owns a non-sleepable lock\n",
>                            td->td_tid, td->td_proc->p_pid);
> #ifdef DDB
>                        db_trace_thread(td, -1);
> #endif
>                        panic("sleeping thread");
>                }

Hmm, maybe this wasn't in 7, or maybe I'm just remembering that we
added code to print *which* lock it holds (using WITNESS data).  I do
recall that this panic alone was often not sufficient to debug the
problem.

Thanks,
matthew


> It may be that we can make use of the STACK API here instead to output this
> trace even when DDB isn't enabled.  The patch below tries to do that
> (untested).  It does some odd thigns though since it is effectively running
> from a panic context already, so it uses a statically allocated 'struct stack'
> rather than using stack_create() and uses stack_print_ddb() since it is
> holding spin locks and can't possibly grab an sx lock:
>
> Index: subr_turnstile.c
> ===
> --- subr_turnstile.c    (revision 228534)
> +++ subr_turnstile.c    (working copy)
> @@ -72,6 +72,7 @@ __FBSDID("$FreeBSD$");
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>
> @@ -175,6 +176,7 @@ static void turnstile_fini(void *mem, int size);
>  static void
>  propagate_priority(struct thread *td)
>  {
> +       static struct stack st;
>        struct turnstile *ts;
>        int pri;
>
> @@ -217,8 +219,10 @@ propagate_priority(struct thread *td)
>                        printf(
>                "Sleeping thread (tid %d, pid %d) owns a non-sleepable lock\n",
>                            td->td_tid, td->td_proc->p_pid);
> -#ifdef DDB
> -                       db_trace_thread(td, -1);
> +#ifdef STACK
> +                       stack_zero(&st);
> +                       stack_save_td(&st, td);
> +                       stack_print_ddb(&st);
>  #endif
>                        panic("sleeping thread");
>                }
>
> --
> John Baldwin
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Sleeping thread (tid 100033, pid 16): panic in FreeBSD 10.0-CURRENT/amd64 r228662

2011-12-17 Thread mdf
On Sat, Dec 17, 2011 at 5:45 PM, Alexander Kabaev  wrote:
> On Sun, 18 Dec 2011 01:09:00 +0100
> "O. Hartmann"  wrote:
>
>> Sleeping thread (tid 100033, pid 16) owns a non sleepable lock
>> panic: sleeping thread
>> cpuid = 0
>>
>> PID 16 is always USB on my box.
>
> You really need to give us a backtrace when you quote panics. It is
> impossible to make any sense of the above panic message without more
> context.

In the case of this panic, the stack of the thread which panics is
useless; it's someone trying to propagate priority that discovered it.
 A backtrace on tid 100033 would be useful.

With WITNESS enabled, it's possible to have this panic display the
stack of the incorrectly sleeping thread at the time it acquired the
lock, as well, but this code isn't in CURRENT or any release.  I have
a patch at $WORK I can dig up on Monday.

Cheers,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: SCHED_ULE should not be the default

2011-12-13 Thread mdf
On Tue, Dec 13, 2011 at 3:39 PM, Ivan Klymenko  wrote:
> В Wed, 14 Dec 2011 00:04:42 +0100
> Jilles Tjoelker  пишет:
>
>> On Tue, Dec 13, 2011 at 10:40:48AM +0200, Ivan Klymenko wrote:
>> > If the algorithm ULE does not contain problems - it means the
>> > problem has Core2Duo, or in a piece of code that uses the ULE
>> > scheduler. I already wrote in a mailing list that specifically in
>> > my case (Core2Duo) partially helps the following patch:
>> > --- sched_ule.c.orig        2011-11-24 18:11:48.0 +0200
>> > +++ sched_ule.c     2011-12-10 22:47:08.0 +0200
>> > @@ -794,7 +794,8 @@
>> >      * 1.5 * balance_interval.
>> >      */
>> >     balance_ticks = max(balance_interval / 2, 1);
>> > -   balance_ticks += random() % balance_interval;
>> > +// balance_ticks += random() % balance_interval;
>> > +   balance_ticks += ((int)random()) % balance_interval;
>> >     if (smp_started == 0 || rebalance == 0)
>> >             return;
>> >     tdq = TDQ_SELF();
>>
>> This avoids a 64-bit division on 64-bit platforms but seems to have no
>> effect otherwise. Because this function is not called very often, the
>> change seems unlikely to help.
>
> Yes, this section does not apply to this problem :)
> Just I posted the latest patch which i using now...
>
>>
>> > @@ -2118,13 +2119,21 @@
>> >     struct td_sched *ts;
>> >
>> >     THREAD_LOCK_ASSERT(td, MA_OWNED);
>> > +   if (td->td_pri_class & PRI_FIFO_BIT)
>> > +           return;
>> > +   ts = td->td_sched;
>> > +   /*
>> > +    * We used up one time slice.
>> > +    */
>> > +   if (--ts->ts_slice > 0)
>> > +           return;
>>
>> This skips most of the periodic functionality (long term load
>> balancer, saving switch count (?), insert index (?), interactivity
>> score update for long running thread) if the thread is not going to
>> be rescheduled right now.
>>
>> It looks wrong but it is a data point if it helps your workload.
>
> Yes, I did it for as long as possible to delay the execution of the code in 
> section:
> ...
> #ifdef SMP
>        /*
>         * We run the long term load balancer infrequently on the first cpu.
>         */
>        if (balance_tdq == tdq) {
>                if (balance_ticks && --balance_ticks == 0)
>                        sched_balance();
>        }
> #endif
> ...
>
>>
>> >     tdq = TDQ_SELF();
>> >  #ifdef SMP
>> >     /*
>> >      * We run the long term load balancer infrequently on the
>> > first cpu. */
>> > -   if (balance_tdq == tdq) {
>> > -           if (balance_ticks && --balance_ticks == 0)
>> > +   if (balance_ticks && --balance_ticks == 0) {
>> > +           if (balance_tdq == tdq)
>> >                     sched_balance();
>> >     }
>> >  #endif
>>
>> The main effect of this appears to be to disable the long term load
>> balancer completely after some time. At some point, a CPU other than
>> the first CPU (which uses balance_tdq) will set balance_ticks = 0, and
>> sched_balance() will never be called again.
>>
>
> That is, for the same reason as above in the text...
>
>> It also introduces a hypothetical race condition because the access to
>> balance_ticks is no longer restricted to one CPU under a spinlock.
>>
>> If the long term load balancer may be causing trouble, try setting
>> kern.sched.balance_interval to a higher value with unpatched code.
>
> I checked it in the first place - but it did not help fix the situation...
>
> The impression of malfunction rebalancing...
> It seems that the thread is passed on to the same core that is loaded and 
> so...
> Perhaps this is a consequence of an incorrect definition of the topology CPU?
>
>>
>> > @@ -2144,9 +2153,6 @@
>> >             if
>> > (TAILQ_EMPTY(&tdq->tdq_timeshare.rq_queues[tdq->tdq_ridx]))
>> > tdq->tdq_ridx = tdq->tdq_idx; }
>> > -   ts = td->td_sched;
>> > -   if (td->td_pri_class & PRI_FIFO_BIT)
>> > -           return;
>> >     if (PRI_BASE(td->td_pri_class) == PRI_TIMESHARE) {
>> >             /*
>> >              * We used a tick; charge it to the thread so
>> > @@ -2157,11 +2163,6 @@
>> >             sched_priority(td);
>> >     }
>> >     /*
>> > -    * We used up one time slice.
>> > -    */
>> > -   if (--ts->ts_slice > 0)
>> > -           return;
>> > -   /*
>> >      * We're out of time, force a requeue at userret().
>> >      */
>> >     ts->ts_slice = sched_slice;
>>
>> > and refusal to use options FULL_PREEMPTION
>> > But no one has unsubscribed to my letter, my patch helps or not in
>> > the case of Core2Duo...
>> > There is a suspicion that the problems stem from the sections of
>> > code associated with the SMP...
>> > Maybe I'm in something wrong, but I want to help in solving this
>> > problem ...


Has anyone experiencing problems tried to set sysctl kern.sched.steal_thresh=1 ?

I don't remember what our specific problem at $WORK was, perhaps it
was just interrupt threads not getting serviced fast enough, but we've
hard-coded this to 1 and removed the code that sets it in
sched_initticks().  The same effect should be h

Re: SCHED_ULE should not be the default

2011-12-12 Thread mdf
On Mon, Dec 12, 2011 at 7:32 AM, Gary Jennejohn
 wrote:
> On Mon, 12 Dec 2011 15:13:00 +
> Vincent Hoffman  wrote:
>
>>
>> -BEGIN PGP SIGNED MESSAGE-
>> Hash: SHA1
>>
>> On 12/12/2011 13:47, O. Hartmann wrote:
>> >
>> >> Not fully right, boinc defaults to run on idprio 31 so this isn't an
>> >> issue. And yes, there are cases where SCHED_ULE shows much better
>> >> performance then SCHED_4BSD. [...]
>> >
>> > Do we have any proof at hand for such cases where SCHED_ULE performs
>> > much better than SCHED_4BSD? Whenever the subject comes up, it is
>> > mentioned, that SCHED_ULE has better performance on boxes with a ncpu >
>> > 2. But in the end I see here contradictionary statements. People
>> > complain about poor performance (especially in scientific environments),
>> > and other give contra not being the case.
>> It all a little old now but some if the stuff in
>> http://people.freebsd.org/~kris/scaling/
>> covers improvements that were seen.
>>
>> http://jeffr-tech.livejournal.com/5705.html
>> shows a little too, reading though Jeffs blog is worth it as it has some
>> interesting stuff on SHED_ULE.
>>
>> I thought there were some more benchmarks floating round but cant find
>> any with a quick google.
>>
>>
>> Vince
>>
>> >
>> > Within our department, we developed a highly scalable code for planetary
>> > science purposes on imagery. It utilizes present GPUs via OpenCL if
>> > present. Otherwise it grabs as many cores as it can.
>> > By the end of this year I'll get a new desktop box based on Intels new
>> > Sandy Bridge-E architecture with plenty of memory. If the colleague who
>> > developed the code is willing performing some benchmarks on the same
>> > hardware platform, we'll benchmark bot FreeBSD 9.0/10.0 and the most
>> > recent Suse. For FreeBSD I intent also to look for performance with both
>> > different schedulers available.
>> >
>
> These observations are not scientific, but I have a CPU from AMD with
> 6 cores (AMD Phenom(tm) II X6 1090T Processor).
>
> My simple test was ``make buildkernel'' while watching the core usage with
> gkrellm.
>
> With SCHED_4BSD all 6 cores are loaded to 97% during the build phase.
> I've never seen any value above 97% with gkrellm.
>
> With SCHED_ULE I never saw all 6 cores loaded this heavily.  Usually
> 2 or more cores were at or below 90%.  Not really that significant, but
> still a noticeable difference in apparent scheduling behavior.  Whether
> the observed difference is due to some change in data from the kernel to
> gkrellm is beyond me.

SCHED_ULE is much sloppier about calculating which thread used a
timeslice -- unless the timeslice went 100% to a thread, the fraction
it used may get attributed elsewhere.  So top's reporting of thread
usage is not a useful metric.  Total buildworld time is, potentially.

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Stop scheduler on panic

2011-12-02 Thread mdf
On Fri, Dec 2, 2011 at 2:05 AM, Andriy Gapon  wrote:
> on 02/12/2011 06:36 John Baldwin said the following:
>> Ah, ok (I had thought SCHEDULER_STOPPED was going to always be true when kdb 
>> was
>> active).  But I think these two changes should cover critical_exit() ok.
>>
>
> I attempted to start a discussion about this a few times already :-)
> Should we treat kdb context the same as SCHEDULER_STOPPED context (in the
> current definition) ?  That is, skip all locks in the same fashion?
> There are pros and contras.

Does kdb pause all CPUs with an interrupt (NMI or regular interrupt, I
can no longer remember...) when it enters?  If so, then I'd say
whether it enters via sysctl or panic doesn't matter.  It's in a
special environment where nothing else is running, which is what is
needed for proper exploration of the machine (via breakpoint, for
debugging a hang, etc).

Maybe the question is, why wouldn't SCHEDULER_STOPPED be true
regardless of how kdb is entered?

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Stop scheduler on panic

2011-11-17 Thread mdf
On Thu, Nov 17, 2011 at 1:08 PM, Andriy Gapon  wrote:
> on 17/11/2011 23:02 m...@freebsd.org said the following:
>> Another patch related to this area we have at $WORK:
>>
>>  #if defined(SMP)
>> -       /*
>> -        * Bind us to CPU 0 so that all shutdown code runs there.  Some
>> -        * systems don't shutdown properly (i.e., ACPI power off) if we
>> -        * run on another processor.
>> -        */
>> -       thread_lock(curthread);
>> -       sched_bind(curthread, 0);
>> -       thread_unlock(curthread);
>> -       KASSERT(PCPU_GET(cpuid) == 0, ("%s: not running on cpu 0", 
>> __func__));
>> +       /*
>> +        * sched_bind can't be done reliably inside of panic.  cpu_reset() 
>> will
>> +        * rebind us in any case, more reliably.
>> +        */
>> +       if (panicstr == NULL) {
>> +               /*
>> +                * Bind us to CPU 0 so that all shutdown code runs there.  
>> Some
>> +                * systems don't shutdown properly (i.e., ACPI power off) if 
>> we
>> +                * run on another processor.
>> +                */
>> +               thread_lock(curthread);
>> +               sched_bind(curthread, 0);
>> +               thread_unlock(curthread);
>> +               KASSERT(PCPU_GET(cpuid) == 0, ("boot: not running on cpu 
>> 0"));
>> +       }
>>  #endif
>>         /* We're in the process of rebooting. */
>>         rebooting = 1;
>
> Yes, you have contributed this patch before and it is a part of the bigger
> stop-scheduler-on-panic patches.  Have you had a chance to review those? :)

It's been so long I don't remember what I've sent where.  I did look
over the patch but had no additional comments.

Cheers,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Stop scheduler on panic

2011-11-17 Thread mdf
On Thu, Nov 17, 2011 at 12:54 PM, Attilio Rao  wrote:
> 2011/11/17 Andriy Gapon :
>> BTW, it is my opinion that we really should not let the debugger code call
>> mi_switch for any reason.
>
> Yes, I agree with this, this is why the sched_bind() in boot() is
> broken (immagine calling things like doadump from KDB. KDB right now
> can be thought as a first cut of this patch because it does disable
> the CPUs when entering the context, thus, the bug here is that if you
> stop all CPUs including CPU0 and later on you want bind on it you are
> death).

Another patch related to this area we have at $WORK:

 #if defined(SMP)
-   /*
-* Bind us to CPU 0 so that all shutdown code runs there.  Some
-* systems don't shutdown properly (i.e., ACPI power off) if we
-* run on another processor.
-*/
-   thread_lock(curthread);
-   sched_bind(curthread, 0);
-   thread_unlock(curthread);
-   KASSERT(PCPU_GET(cpuid) == 0, ("%s: not running on cpu 0", __func__));
+   /*
+* sched_bind can't be done reliably inside of panic.  cpu_reset() will
+* rebind us in any case, more reliably.
+*/
+   if (panicstr == NULL) {
+   /*
+* Bind us to CPU 0 so that all shutdown code runs there.  Some
+* systems don't shutdown properly (i.e., ACPI power off) if we
+* run on another processor.
+*/
+   thread_lock(curthread);
+   sched_bind(curthread, 0);
+   thread_unlock(curthread);
+   KASSERT(PCPU_GET(cpuid) == 0, ("boot: not running on cpu 0"));
+   }
 #endif
/* We're in the process of rebooting. */
rebooting = 1;

Cheers,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: vm_page_t related KBI [Was: Re: panic at vm_page_wire with FreeBSD 9.0 Beta 3]

2011-11-15 Thread mdf
On Tue, Nov 15, 2011 at 10:15 AM, Attilio Rao  wrote:
> 2011/11/7 Kostik Belousov :
>> On Mon, Nov 07, 2011 at 11:45:38AM -0600, Alan Cox wrote:
>>> Ok.  I'll offer one final suggestion.  Please consider an alternative
>>> suffix to "func".  Perhaps, "kbi" or "KBI".  In other words, something
>>> that hints at the function's reason for existing.
>>
>> Sure. Below is the extraction of only vm_page_lock() bits, together
>> with the suggested rename. When Attilio provides the promised simplification
>> of the mutex KPI, this can be reduced.
>
> My tentative patch is here:
> http://www.freebsd.org/~attilio/mutexfileline.patch
>
> I need to make more compile testing later, but it already compiles
> GENERIC + modules fine on HEAD.
>
> The patch provides a common entrypoint, option independent, for both
> fast case and debug/compat case.
> Additively, it almost entirely fixes the standard violation of the
> reserved namespace, as you described (the notable exception being the
> macro used in the fast path, that I want to fix as well, but in a
> separate commit).
>
> Now the file/line couplet can be passed to the "_" suffix variant of
> the flag functions.
>
> eadler@ reviewed the mutex.h comment.
>
> Please let me know what you think about it, as long as we agree on the
> patch I'll commit it.

Out of curiosity, why are function names explicitly spelled out in
panic and log messages, instead of using %s and __func__?  I've seen
this all around FreeBSD, and if there's no reason otherwise, I'd just
as soon change to a version that doesn't need updating when the
function names change.

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: vm_page_t related KBI [Was: Re: panic at vm_page_wire with FreeBSD 9.0 Beta 3]

2011-11-07 Thread mdf
On Mon, Nov 7, 2011 at 11:35 AM, Kostik Belousov  wrote:
> On Mon, Nov 07, 2011 at 11:45:38AM -0600, Alan Cox wrote:
>> Ok.  I'll offer one final suggestion.  Please consider an alternative
>> suffix to "func".  Perhaps, "kbi" or "KBI".  In other words, something
>> that hints at the function's reason for existing.
>
> Sure. Below is the extraction of only vm_page_lock() bits, together
> with the suggested rename. When Attilio provides the promised simplification
> of the mutex KPI, this can be reduced.
>
> diff --git a/sys/vm/vm_page.c b/sys/vm/vm_page.c
> index 389aea5..ea4ea34 100644
> --- a/sys/vm/vm_page.c
> +++ b/sys/vm/vm_page.c
> @@ -2677,6 +2677,44 @@ vm_page_test_dirty(vm_page_t m)
>                vm_page_dirty(m);
>  }
>
> +void
> +vm_page_lock_KBI(vm_page_t m, const char *file, int line)
> +{
> +
> +#if LOCK_DEBUG > 0 || defined(MUTEX_NOINLINE)
> +       _mtx_lock_flags(vm_page_lockptr(m), 0, file, line);
> +#else
> +       __mtx_lock(vm_page_lockptr(m), 0, file, line);
> +#endif
> +}
> +
> +void
> +vm_page_unlock_KBI(vm_page_t m, const char *file, int line)
> +{
> +
> +#if LOCK_DEBUG > 0 || defined(MUTEX_NOINLINE)
> +       _mtx_unlock_flags(vm_page_lockptr(m), 0, file, line);
> +#else
> +       __mtx_unlock(vm_page_lockptr(m), curthread, 0, file, line);
> +#endif
> +}
> +
> +int
> +vm_page_trylock_KBI(vm_page_t m, const char *file, int line)
> +{
> +
> +       return (_mtx_trylock(vm_page_lockptr(m), 0, file, line));
> +}
> +
> +void
> +vm_page_lock_assert_KBI(vm_page_t m, int a, const char *file, int line)
> +{
> +
> +#ifdef INVARIANTS
> +       _mtx_assert(vm_page_lockptr(m), a, file, line);
> +#endif
> +}
> +
>  int so_zerocp_fullpage = 0;
>
>  /*
> diff --git a/sys/vm/vm_page.h b/sys/vm/vm_page.h
> index 7099b70..a5604b7 100644
> --- a/sys/vm/vm_page.h
> +++ b/sys/vm/vm_page.h
> @@ -218,11 +218,23 @@ extern struct vpglocks pa_lock[];
>
>  #define        PA_LOCK_ASSERT(pa, a)   mtx_assert(PA_LOCKPTR(pa), (a))
>
> +#ifdef KLD_MODULE
> +#define        vm_page_lock(m)         vm_page_lock_KBI((m), LOCK_FILE, 
> LOCK_LINE)
> +#define        vm_page_unlock(m)       vm_page_unlock_KBI((m), LOCK_FILE, 
> LOCK_LINE)
> +#define        vm_page_trylock(m)      vm_page_trylock_KBI((m), LOCK_FILE, 
> LOCK_LINE)
> +#ifdef INVARIANTS
> +#define        vm_page_lock_assert(m, a)       \
> +    vm_page_lock_assert_KBI((m), (a), LOCK_FILE, LOCK_LINE)
> +#else
> +#define        vm_page_lock_assert(m, a)
> +#endif
> +#else  /* KLD_MODULE */
>  #define        vm_page_lockptr(m)      (PA_LOCKPTR(VM_PAGE_TO_PHYS((m

Is it not possible to have vm_page_lockptr() be a function for the
KLD_MODULE case?  Because then the vm_page_lock() functions and
friends would all just use mtx_lock, etc., in both the KLD and non-KLD
case.

Or am I missing something?

Thanks,
matthew

>  #define        vm_page_lock(m)         mtx_lock(vm_page_lockptr((m)))
>  #define        vm_page_unlock(m)       mtx_unlock(vm_page_lockptr((m)))
>  #define        vm_page_trylock(m)      mtx_trylock(vm_page_lockptr((m)))
>  #define        vm_page_lock_assert(m, a)       
> mtx_assert(vm_page_lockptr((m)), (a))
> +#endif
>
>  #define        vm_page_queue_free_mtx  vm_page_queue_free_lock.data
>  /*
> @@ -403,6 +415,11 @@ void vm_page_cowfault (vm_page_t);
>  int vm_page_cowsetup(vm_page_t);
>  void vm_page_cowclear (vm_page_t);
>
> +void vm_page_lock_KBI(vm_page_t m, const char *file, int line);
> +void vm_page_unlock_KBI(vm_page_t m, const char *file, int line);
> +int vm_page_trylock_KBI(vm_page_t m, const char *file, int line);
> +void vm_page_lock_assert_KBI(vm_page_t m, int a, const char *file, int line);
> +
>  #ifdef INVARIANTS
>  void vm_page_object_lock_assert(vm_page_t m);
>  #define        VM_PAGE_OBJECT_LOCK_ASSERT(m)   vm_page_object_lock_assert(m)
>
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: vm_page_t related KBI [Was: Re: panic at vm_page_wire with FreeBSD 9.0 Beta 3]

2011-11-06 Thread mdf
On Sun, Nov 6, 2011 at 4:43 AM, Kostik Belousov  wrote:
> Regarding the _vm_page_lock() vs. vm_page_lock_func(), the mutex.h has
> a lot of violations in regard of the namespaces, IMO. The __* namespace
> is reserved for the language implementation, so our freestanding program
> (kernel) ignores the requirements of the C standard with the names like
> __mtx_lock_spin(). Using the name _vm_page_lock() is valid, but makes
> it not unreasonable for other developers to introduce reserved names.
> So I decided to use the suffixes. vm_map.h locking is free of these
> violations.

I'm pretty sure that when the C standard says, "the implementation",
they're referring to the compiler and OS it runs on.  Which makes the
FreeBSD kernel part of "the implementation", which is precisely why so
many headers have defines that start with __ and then, if certain
posix defines are set, also uses non-__ versions of the name.

Cheers,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: vm_page_t related KBI [Was: Re: panic at vm_page_wire with FreeBSD 9.0 Beta 3]

2011-11-05 Thread mdf
On Sat, Nov 5, 2011 at 7:13 AM, Kostik Belousov  wrote:
> On Fri, Nov 04, 2011 at 06:03:39PM +0200, Kostik Belousov wrote:
>
> Below is the KBI patch after vm_page_bits_t merge is done.
> Again, I did not spent time converting all in-tree consumers
> from the (potentially) loadable modules to the new KPI until it
> is agreed upon.

I like my bikeshed orange...

I would think a more canonical name would be get/set rather than
read/write, especially since these operations aren't reading and
writing the page (neither are they getting the page, but at least set
is pretty unambiguous).

Thanks,
matthew

>
> diff --git a/sys/nfsclient/nfs_bio.c b/sys/nfsclient/nfs_bio.c
> index 305c189..7264cd1 100644
> --- a/sys/nfsclient/nfs_bio.c
> +++ b/sys/nfsclient/nfs_bio.c
> @@ -128,7 +128,7 @@ nfs_getpages(struct vop_getpages_args *ap)
>         * can only occur at the file EOF.
>         */
>        VM_OBJECT_LOCK(object);
> -       if (pages[ap->a_reqpage]->valid != 0) {
> +       if (vm_page_read_valid(pages[ap->a_reqpage]) != 0) {
>                for (i = 0; i < npages; ++i) {
>                        if (i != ap->a_reqpage) {
>                                vm_page_lock(pages[i]);
> @@ -198,16 +198,16 @@ nfs_getpages(struct vop_getpages_args *ap)
>                        /*
>                         * Read operation filled an entire page
>                         */
> -                       m->valid = VM_PAGE_BITS_ALL;
> -                       KASSERT(m->dirty == 0,
> +                       vm_page_write_valid(m, VM_PAGE_BITS_ALL);
> +                       KASSERT(vm_page_read_dirty(m) == 0,
>                            ("nfs_getpages: page %p is dirty", m));
>                } else if (size > toff) {
>                        /*
>                         * Read operation filled a partial page.
>                         */
> -                       m->valid = 0;
> +                       vm_page_write_valid(m, 0);
>                        vm_page_set_valid(m, 0, size - toff);
> -                       KASSERT(m->dirty == 0,
> +                       KASSERT(vm_page_read_dirty(m) == 0,
>                            ("nfs_getpages: page %p is dirty", m));
>                } else {
>                        /*
> diff --git a/sys/vm/vm_page.c b/sys/vm/vm_page.c
> index 389aea5..2f41e70 100644
> --- a/sys/vm/vm_page.c
> +++ b/sys/vm/vm_page.c
> @@ -2677,6 +2677,66 @@ vm_page_test_dirty(vm_page_t m)
>                vm_page_dirty(m);
>  }
>
> +void
> +vm_page_lock_func(vm_page_t m, const char *file, int line)
> +{
> +
> +#if LOCK_DEBUG > 0 || defined(MUTEX_NOINLINE)
> +       _mtx_lock_flags(vm_page_lockptr(m), 0, file, line);
> +#else
> +       __mtx_lock(vm_page_lockptr(m), 0, file, line);
> +#endif
> +}
> +
> +void
> +vm_page_unlock_func(vm_page_t m, const char *file, int line)
> +{
> +
> +#if LOCK_DEBUG > 0 || defined(MUTEX_NOINLINE)
> +       _mtx_unlock_flags(vm_page_lockptr(m), 0, file, line);
> +#else
> +       __mtx_unlock(vm_page_lockptr(m), curthread, 0, file, line);
> +#endif
> +}
> +
> +int
> +vm_page_trylock_func(vm_page_t m, const char *file, int line)
> +{
> +
> +       return (_mtx_trylock(vm_page_lockptr(m), 0, file, line));
> +}
> +
> +void
> +vm_page_lock_assert_func(vm_page_t m, int a, const char *file, int line)
> +{
> +
> +#ifdef INVARIANTS
> +       _mtx_assert(vm_page_lockptr(m), a, file, line);
> +#endif
> +}
> +
> +vm_page_bits_t
> +vm_page_read_dirty_func(vm_page_t m)
> +{
> +
> +       return (m->dirty);
> +}
> +
> +vm_page_bits_t
> +vm_page_read_valid_func(vm_page_t m)
> +{
> +
> +       return (m->valid);
> +}
> +
> +void
> +vm_page_write_valid_func(vm_page_t m, vm_page_bits_t v)
> +{
> +
> +       m->valid = v;
> +}
> +
> +
>  int so_zerocp_fullpage = 0;
>
>  /*
> diff --git a/sys/vm/vm_page.h b/sys/vm/vm_page.h
> index 7099b70..4f8f71e 100644
> --- a/sys/vm/vm_page.h
> +++ b/sys/vm/vm_page.h
> @@ -218,12 +218,50 @@ extern struct vpglocks pa_lock[];
>
>  #define        PA_LOCK_ASSERT(pa, a)   mtx_assert(PA_LOCKPTR(pa), (a))
>
> +#ifdef KLD_MODULE
> +#define        vm_page_lock(m)         vm_page_lock_func((m), LOCK_FILE, 
> LOCK_LINE)
> +#define        vm_page_unlock(m)       vm_page_unlock_func((m), LOCK_FILE, 
> LOCK_LINE)
> +#define        vm_page_trylock(m)      vm_page_trylock_func((m), LOCK_FILE, 
> LOCK_LINE)
> +#ifdef INVARIANTS
> +#define        vm_page_lock_assert(m, a)       \
> +    vm_page_lock_assert_func((m), (a), LOCK_FILE, LOCK_LINE)
> +#else
> +#define        vm_page_lock_assert(m, a)
> +#endif
> +
> +#define        vm_page_read_dirty(m)   vm_page_read_dirty_func((m))
> +#define        vm_page_read_valid(m)   vm_page_read_valid_func((m))
> +#define        vm_page_write_valid(m, v)       vm_page_write_valid_func((m), 
> (v))
> +
> +#else  /* KLD_MODULE */
>  #define        vm_page_lockptr(m)      (PA_LOCKPTR(VM_PAGE_TO_PHYS((m
>  #define        vm_page_lock(m)         mtx_lock(vm_page_lockptr((m)))
>  #define        vm_page_unlock(m)       mtx_unlock(vm_page_lockptr

Re: smp_rendezvous runs with interrupts and preemption enabled on unicore systems

2011-10-28 Thread mdf
On Fri, Oct 28, 2011 at 8:37 AM, Ryan Stone  wrote:
> I'm seeing issues on a unicore systems running a derivative of FreeBSD
> 8.2-RELEASE if something calls mem_range_attr_set.  It turns out that
> the root cause is a bug in smp_rendezvous_cpus.  The first part of
> smp_rendezvous_cpus attempts to short-circuit the non-SMP case(note
> that smp_started is never set to 1 on a unicore system):
>
>        if (!smp_started) {
>                if (setup_func != NULL)
>                        setup_func(arg);
>                if (action_func != NULL)
>                        action_func(arg);
>                if (teardown_func != NULL)
>                        teardown_func(arg);
>                return;
>        }
>
> The problem is that this runs with interrupts enabled, outside of a
> critical section.  My system runs with device_polling enabled with hz
> set to 2500, so its quite easy to wedge the system by having a thread
> run mem_range_attr_set.  That has to do a smp_rendezvous, and if a
> timer interrupt happens to go off half-way through the action_func and
> preempt this thread, the system ends up deadlocked(although once it's
> wedged, typing at the serial console stands a good chance of unwedging
> the system.  Go figure).
>
> I know that smp_rendezvous was reworked substantially on HEAD, but by
> inspection it looks like the bug is still present, as the
> short-circuit behaviour is still there.
>
> I am not entirely sure of the best way to fix this.  Is it as simple
> as doing a spinlock_enter before setup_func and a spinlock_exit after
> teardown_func?  It seems to boot fine, but I'm not at all confident
> that I understand the nuances of smp_rendezvous to be sure that there
> aren't side effects that I don't know about.

Looks like Max didn't have time to upstream this fix:

 struct mtx smp_ipi_mtx;
+MTX_SYSINIT(smp_ipi_mtx, &smp_ipi_mtx, "smp rendezvous", MTX_SPIN);

...

 static void
 mp_start(void *dummy)
 {

-   mtx_init(&smp_ipi_mtx, "smp rendezvous", NULL, MTX_SPIN);

...

if (!smp_started) {
+   mtx_lock_spin(&smp_ipi_mtx);
if (setup_func != NULL)
setup_func(arg);
if (action_func != NULL)
action_func(arg);
if (teardown_func != NULL)
teardown_func(arg);
+   mtx_unlock_spin(&smp_ipi_mtx);
return;
}

Cheers,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Deterministic panic due to non-sleepable lock with if_alc when reconfiguring interfaces

2011-08-18 Thread mdf
On Thu, Aug 18, 2011 at 5:50 PM, Garrett Cooper  wrote:
>    When loading if_alc as a module on my netbook and running
> /etc/rc.d/netif restart, I can deterministically panic my netbook with
> the following message:
>
> ) at _bus_dmamap_sync+0x51
> alc_stop(c3dbb000,0,c0c51844,93a,80206910,...) at alc_stop+0x24e
> alc_ioctl(c3d07400,80206910,c40423c0,c06a7935,c0914e3c,...) at alc_ioctl+0x22e
> ifioctl(c45029c0,80206910,c40423c0,c40505c0,c4528c00,...) at ifioctl+0xc98
> soo_ioctl(c4574e00,80206910,c40423c0,c413e680,c40505c0,...) at soo_ioctl+0x401
> kern_ioctl(c40505c0,3,80206910,c40423c0,c40423c0,...) at kern_ioctl+0x1d7
> ioctl(c40505c0,e6ca3cec,e6ca3d28,c08e929d,0,...) at ioctl+0x118
> syscallenter(c40505c0,e6ca3ce4,e6ca3ce4,0,0,...) at syscallenter+0x23f
> syscall(e6ca3d28) at syscall+0x2e
> Xint0x80_syscall() at Xint0x80_syscall+0x21
> --- syscall (54kernel trap 12 with interrupts disabled
> Kernel page fault with the following non-sleepable locks held:
> exclusive sleep mutex alc0 (network driver) r = 0 (0xc3dbc608) locked
> @ /usr/src/sys/modules/alc/../../dev/alc/if_alc.c:2362
> KDB: stack backtrace:
> db_trace_self_wrapper(c08e727a,80,6e726500,74206c65,20706172,...) at
> db_trace_self_wrapper+0x26
> kdb_backtrace(93a,0,,c0ad6114,e6ca323c,...) at kdb_backtrace+0x2a
> _witness_debugger(c08e9f67,e6ca3250,4,1,0,...) at _witness_debugger+0x1e
> witness_warn(5,0,c0924fe1,c097df50,c3e42b00,...) at witness_warn+0x1f1
> trap(e6ca32dc) at trap+0x15a
> calltrap() at calltrap+0x6
>
>    I tried to track down what the exact issue was, but I got lost
> (the locking sort of looks ok to me, but I'm still not an expert with
> mutex(9)).
>    I still have the vmcore and can provide more helpful details when 
> requested.

The locking itself is almost certainly fine.  The error message is not
very helpful, but what went wrong was the page fault.  You just happen
to panic on a witness warning before vm_fault can panic due to a bad
address.

The alc(4) maintainer would probably like info on the trap (line of
code and where the bad pointer came from).

Cheers,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: variable init

2011-08-01 Thread mdf
On Mon, Aug 1, 2011 at 12:09 AM, Andrew Thompson  wrote:
> Hi,
>
> Looking at,
>
> http://www.freebsd.org/cgi/query-pr.cgi?pr=159345
>
> The lock is a global variable, declared as
>
> static struct mtx       lagg_list_mtx;
>
> I would expect this to be zeroed memory, is this guaranteed?

It depends. :-)  The C standard mandates that this storage be zero.
However, seeing as we're the operating system, it's our job to do it.
For an application program the zero'd storage is zeroed by the loader.
 For the operating system, exactly when it's initialized is an
architectural decision.  I believe that the zeroed storage of FreeBSD
is cleared by the boot loader before it transfers control to the
operating system.  (For reference, on AIX the storage is cleared by
the virtual memory management code during bringup of the VMM leading
to interesting bugs when some variables used in early boot weren't
initialized).

Cheers,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: sgl (or uio) in struct buf / bio

2011-07-18 Thread mdf
On Mon, Jul 18, 2011 at 3:15 PM, Joel Jacobson  wrote:
> a few years ago (2008), i asked about the prospect of getting an sgl passed 
> through struct buf / bio, and was referred to the jhb_bio branch.  i never 
> did figure out how to view it, though, and it fell off my radar for a while.
>
> did anything ever come of it, and if so/not, are there any plans for 
> switching to something like using a uio instead of a pointer/length pair?
>

To the best of my knowledge, this is still something many of us would
like to see done, and no one has had any time to work on.

Cheers,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Heavy I/O blocks FreeBSD box for several seconds

2011-07-11 Thread mdf
On Mon, Jul 11, 2011 at 4:00 PM, Ali Mashtizadeh  wrote:
> Maybe someone can setup something like reviewboard [1] for developers
> to use. This may also help folks who want to keep abreast of the
> current work in a particular subsystem or get involved into the
> development process more. At my company we use reviews and it seems to
> help the catch some bugs and help new engineers ramp up faster.
>
> [1] http://www.reviewboard.org/

FreeBSD development is completely open; anyone can sign up for the
svn-src-* mailing list they are interested in, including
svn-src-head@.  Code reviews are plenty as well; just check the list
archives for discussion of bugs, poor design choices and unintended
effects.  But most reviews are silent and after-the-fact by looking at
the list mail.  It's a system that seems to be working just fine for
the FreeBSD project so far.  This isn't a job for most anyone; it's a
volunteer project and so anything that raises the barrier to getting
work done for the project should be looked at with skepticism.

Is there a specific deficit that you want to address?

Thanks,
matthew

> On Mon, Jul 11, 2011 at 2:48 PM, Arnaud Lacombe  wrote:
>> Hi,
>>
>> On Mon, Jul 11, 2011 at 5:14 PM, Andriy Gapon  wrote:
>>> on 11/07/2011 23:33 Arnaud Lacombe said the following:
 For the record, I would like to see enforced public review for _every_
 patch *before* it is checked in, as a strong rule. gcc system is
 particularly interesting. But it is not likely to happen in FreeBSD
 where FreeBSD committers are clearly more free than other at
 checking-in un-publicly-reviewed stuff (especially _bad_ stuff).

 This would of course apply even to long-time committers, no matter how
 it hurt their ego (which I definitively do not care about).
>>>
>>> Have you just volunteered to review all of the patches that I would like to
>>> commit?  And are you prepared to take responsibility for quality of your 
>>> reviews?
>>> I am sure that other developers will gladly accept your offer too.
>>>
>> _No-one_ can do all the reviews, especially not me (on a purely
>> technical level). ACK must come from subsystem maintainers. Having
>> public review would allow the community review, which is now just not
>> possible today. As about patches from the maintainer, they might be
>> committed without his approval, but still sent for review. If a
>> maintainer goes outside his area, he has to get approval from the
>> other subsystem maintainer.
>>
>>  - Arnaud
>> ___
>> freebsd-current@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-current
>> To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
>>
> ___
> freebsd-current@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
>
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: kern.sync_on_panic

2011-06-27 Thread mdf
On Mon, Jun 27, 2011 at 3:08 AM, Andriy Gapon  wrote:
> on 26/06/2011 08:51 Warner Losh said the following:
>>
>> On Jun 25, 2011, at 8:49 AM, Andriy Gapon wrote:
>>> Does anybody actually use kern.sync_on_panic tunable/sysctl? If yes, then
>>> in what circumstances do you need it? That is, why any other alternative
>>> doesn't work for you? Like: 1. remounting filesystems R/O before panic if
>>> you knowingly provoke it for testing 2. using netboot for your test system
>>> 3. using su+j, gjournal or a different filesystem altogether 4. using fsck
>>> after reboot
>>>
>>> It seems to me that syncing filesystems in panic context is an adventure.
>>> And it may become even more of an adventure if we introduce code that
>>> completely stops scheduler in and after panic.
>>
>> I've used it in the past when I was developing a device driver that was in
>> the late stages of maturing.  Since all the panics in the system were when
>> the driver dereferenced NULL in that driver, sync was safe because all the
>> data structures were sane except the aforementioned driver.
>>
>> (1) It was a production system, and everything that could be was already
>> mounted r/w.  However, some small, but every critical, amount of data was
>> still r/w and it was very important to not lose this data.  Production here
>> likely should be in quotes, because it was in the late stages of
>> testing/validation.  The problem was without this sometimes the saved state
>> of the GPS receiver and other hardware would wind up being zero, which meant
>> that we'd have to do a cold start which cost us a few hours of time.  At the
>> time I was doing this, we saw zero files a couple times a day without this
>> turned on. (2) netbooting wasn't an option since we were qualifying a
>> non-netbooting system. (3) these weren't available at the time, but the goal
>> was to prevent data loss, not to necessarily have to avoid fsck on boot. (4)
>> Data loss without it.
>>
>> Now, I'll be the first to admit this has been a few years, and I haven't done
>> a fresh evaluation to see if things are still safe.  I'll also be the first
>> to admit that this was a useful debugging setting late in development, and
>> not in production.  I'm also the first to admit this isn't what I'd call a
>> very wide-spread case.  But it did come in very handy when chasing a few bugs
>> to be able to do 10 panic/reboot cycles an hour rather than 2 a day.
>
> A fine enough use-case for me.  I guess the problem ultimately boiled down to
> peculiarities of UFS behavior, but still...
> However, please be aware that sync_on_panic might get broken when/if we start
> stopping scheduler in panic.

The entirety of the sync code should be a subroutine in vfs_bio.c so
the 'buf' variable is static to the file.  At that point it would be
reasonable to explicitly call it at the beginning of panic(9) for the
sync-on-panic case, either before IPIing the other CPUs, or at least
before entering the critical section that prevents the scheduler from
running.

Cheers,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: best way to do FreeBSD kernel/userland development on OSX ?

2011-06-20 Thread mdf
On Mon, Jun 20, 2011 at 1:53 PM, Luigi Rizzo  wrote:
> i recently replaced my laptop with a mac, which opens the problem on
> how do i do FreeBSD development (including kernel work) on it.
>
> the option i am trying now is running a freebsd VM in virtualbox,
> but it is slightly slow and probably energy hungry.
>
> I'd like to keep the svn checkut outside the disk image but
> unfortunately the default filesystem on osx is case-insensitive
> so i can't do that -- unless someone shows me how to change
> the filesystem conventions.
>
> suggestions anyone ?

With the help of google I did something for this a few months ago.
You can create a new disk image, and use case-sensitive for this
image.  I created mine as something like 8GB but thinly provisioned so
it didn't take up all the space until used.

I use VMWare for testing code changes ($WORK paid for a license for
me), but it's a CPU/battery hog, so I'm not wild about it, but it does
work.

Cheers,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: [head tinderbox] failure on ia64/ia64

2011-06-20 Thread mdf
On Mon, Jun 20, 2011 at 11:40 AM, Garrett Cooper  wrote:
> On Mon, Jun 20, 2011 at 11:37 AM, FreeBSD Tinderbox
>  wrote:
>> TB --- 2011-06-20 17:09:28 - tinderbox 2.7 running on 
>> freebsd-current.sentex.ca
>> TB --- 2011-06-20 17:09:28 - starting HEAD tinderbox run for ia64/ia64
>> TB --- 2011-06-20 17:09:28 - cleaning the object tree
>> TB --- 2011-06-20 17:09:40 - cvsupping the source tree
>> TB --- 2011-06-20 17:09:40 - /usr/bin/csup -z -r 3 -g -L 1 -h 
>> cvsup.sentex.ca /tinderbox/HEAD/ia64/ia64/supfile
>> TB --- 2011-06-20 17:10:27 - building world
>> TB --- 2011-06-20 17:10:27 - MAKEOBJDIRPREFIX=/obj
>> TB --- 2011-06-20 17:10:27 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
>> TB --- 2011-06-20 17:10:27 - TARGET=ia64
>> TB --- 2011-06-20 17:10:27 - TARGET_ARCH=ia64
>> TB --- 2011-06-20 17:10:27 - TZ=UTC
>> TB --- 2011-06-20 17:10:27 - __MAKE_CONF=/dev/null
>> TB --- 2011-06-20 17:10:27 - cd /src
>> TB --- 2011-06-20 17:10:27 - /usr/bin/make -B buildworld
> World build started on Mon Jun 20 17:10:28 UTC 2011
> Rebuilding the temporary build tree
> stage 1.1: legacy release compatibility shims
> stage 1.2: bootstrap tools
> stage 2.1: cleaning up the object tree
> stage 2.2: rebuilding the object tree
> stage 2.3: build tools
> stage 3: cross tools
> stage 4.1: building includes
> stage 4.2: building libraries
> stage 4.3: make dependencies
> stage 4.4: building everything
> World build completed on Mon Jun 20 18:35:55 UTC 2011
>> TB --- 2011-06-20 18:35:56 - generating LINT kernel config
>> TB --- 2011-06-20 18:35:56 - cd /src/sys/ia64/conf
>> TB --- 2011-06-20 18:35:56 - /usr/bin/make -B LINT
>> TB --- 2011-06-20 18:35:56 - building LINT kernel
>> TB --- 2011-06-20 18:35:56 - MAKEOBJDIRPREFIX=/obj
>> TB --- 2011-06-20 18:35:56 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
>> TB --- 2011-06-20 18:35:56 - TARGET=ia64
>> TB --- 2011-06-20 18:35:56 - TARGET_ARCH=ia64
>> TB --- 2011-06-20 18:35:56 - TZ=UTC
>> TB --- 2011-06-20 18:35:56 - __MAKE_CONF=/dev/null
>> TB --- 2011-06-20 18:35:56 - cd /src
>> TB --- 2011-06-20 18:35:56 - /usr/bin/make -B buildkernel KERNCONF=LINT
> Kernel build for LINT started on Mon Jun 20 18:35:56 UTC 2011
> stage 1: configuring the kernel
> stage 2.1: cleaning up the object tree
> stage 2.2: rebuilding the object tree
> stage 2.3: build tools
> stage 3.1: making dependencies
>> [...]
>> awk -f /src/sys/tools/makeobjops.awk /src/sys/kgssapi/kgss_if.m -h
>> awk -f /src/sys/tools/makeobjops.awk /src/sys/libkern/iconv_converter_if.m -h
>> awk -f /src/sys/tools/makeobjops.awk /src/sys/opencrypto/cryptodev_if.m -h
>> awk -f /src/sys/tools/makeobjops.awk /src/sys/dev/acpica/acpi_if.m -h
>> rm -f .newdep
>> /usr/bin/make -V CFILES -V SYSTEM_CFILES -V GEN_CFILES |  MKDEP_CPP="cc -E" 
>> CC="cc" xargs mkdep -a -f .newdep -O2 -pipe -fno-strict-aliasing  -std=c99  
>> -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes  
>> -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual  -Wundef 
>> -Wno-pointer-sign -fformat-extensions  -Wmissing-include-dirs 
>> -fdiagnostics-show-option -nostdinc  -I. -I/src/sys -I/src/sys/contrib/altq 
>> -I/src/sys/contrib/ipfilter -I/src/sys/contrib/pf -I/src/sys/dev/ath 
>> -I/src/sys/dev/ath/ath_hal -I/src/sys/contrib/ngatm -I/src/sys/dev/twa 
>> -I/src/sys/gnu/fs/xfs/FreeBSD -I/src/sys/gnu/fs/xfs/FreeBSD/support 
>> -I/src/sys/gnu/fs/xfs -I/src/sys/dev/cxgb -I/src/sys/dev/cxgbe 
>> -I/src/sys/contrib/ia64/libuwx/src -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS 
>> -include opt_global.h -fno-common -finline-limit=15000 --param 
>> inline-unit-growth=100 --param large-function-growth=1000 -fno-builtin 
>> -mconstant-gp -ffixed-r13 -mfixed-range=f32-f127 -fpic -ffreestanding
>> /src/sys/vm/vm_page.c:2356:2: error: #error "PAGE_SIZE is not supported."
>> mkdep: compile failed
>> *** Error code 1
>>
>> Stop in /obj/ia64.ia64/src/sys/LINT.
>> *** Error code 1
>>
>> Stop in /src.
>> *** Error code 1
>
>    r223307 broke tinderbox on ia64. What should the PAGE_SIZE be for
> that architecture?

The LINT configuration file uses 1<<15, or 32768.  It should just be a
matter of adding a case and using atomic_clear_long.

Cheers,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: `hw.acpi.thermal.tz0.temperature' disappeared

2011-04-19 Thread mdf
On Tue, Apr 19, 2011 at 11:02 AM, Doug Barton  wrote:
> On 4/19/2011 9:44 AM, m...@freebsd.org wrote:
>>
>> As an aside, what kind of h/w do I need
>> for hw.acpi.thermal to show up?  I don't see it on my Dell desktop...
>
> The hardware is likely to be there for any reasonably modern Dell desktop.
> Do you have coretemp loaded?

I didn't (I had assumed since the relevant sysctls are defined in
acpi_thermal.c that having acpi was sufficient), so I just tried that,
but still no hw.acpi.thermal node.

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: `hw.acpi.thermal.tz0.temperature' disappeared

2011-04-19 Thread mdf
On Tue, Apr 19, 2011 at 7:48 AM, Taku YAMAMOTO  wrote:
>> This patch works.
>>
>> FreeBSD 9.0-CURRENT #1: Tue Apr 19 10:52:58 MSD 2011
>>
>> # sysctl hw.acpi.thermal
>> hw.acpi.thermal.min_runtime: 0
>> hw.acpi.thermal.polling_rate: 10
>> hw.acpi.thermal.user_override: 0
>> hw.acpi.thermal.tz0.temperature: 67.5C
>> hw.acpi.thermal.tz0.active: -1
>> hw.acpi.thermal.tz0.passive_cooling: 0
>> hw.acpi.thermal.tz0.thermal_flags: 0
>> hw.acpi.thermal.tz0._PSV: -1
>> hw.acpi.thermal.tz0._HOT: -1
>> hw.acpi.thermal.tz0._CRT: 90.0C
>> hw.acpi.thermal.tz0._ACx: -1
>> hw.acpi.thermal.tz0._TC1: -1
>> hw.acpi.thermal.tz0._TC2: -1
>> hw.acpi.thermal.tz0._TSP: -1
>
> We should have 10 _ACx values like this:
> hw.acpi.thermal.tz0._ACx: -1 -1 -1 -1 -1 -1 -1 -1 -1 -1

D'oh, I didn't read the original source carefully enough.

Can someone test this patch?  As an aside, what kind of h/w do I need
for hw.acpi.thermal to show up?  I don't see it on my Dell desktop...

Thanks,
matthew
diff --git a/sys/dev/acpica/acpi_thermal.c b/sys/dev/acpica/acpi_thermal.c
index b64d8df..7226b6c 100644
--- a/sys/dev/acpica/acpi_thermal.c
+++ b/sys/dev/acpica/acpi_thermal.c
@@ -288,7 +288,8 @@ acpi_tz_attach(device_t dev)
 		"critical temp setpoint (shutdown now)");
 SYSCTL_ADD_PROC(&sc->tz_sysctl_ctx, SYSCTL_CHILDREN(sc->tz_sysctl_tree),
 		OID_AUTO, "_ACx", CTLTYPE_INT | CTLFLAG_RD,
-		&sc->tz_zone.ac, 0, sysctl_handle_int, "IK", "");
+		&sc->tz_zone.ac, sizeof(sc->tz_zone.ac),
+		sysctl_handle_opaque, "IK", "");
 SYSCTL_ADD_PROC(&sc->tz_sysctl_ctx, SYSCTL_CHILDREN(sc->tz_sysctl_tree),
 		OID_AUTO, "_TC1", CTLTYPE_INT | CTLFLAG_RW,
 		sc, offsetof(struct acpi_tz_softc, tz_zone.tc1),
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Re: `hw.acpi.thermal.tz0.temperature' disappeared

2011-04-18 Thread mdf
On Mon, Apr 18, 2011 at 5:05 AM, John Baldwin  wrote:
> On Saturday, April 16, 2011 11:51:22 am Nick Ulen wrote:
>> FreeBSD was successfully upgraded.
>>
>> uname -v
>> FreeBSD 9.0-CURRENT #0: Mon Apr 11 18:14:36 MSD 2011
>> root@test:/usr/obj/usr/src/sys/GENERIC
>>
>> Everything seems to be working well except
>> `hw.acpi.thermal.tz0.temperature' disappeared from the list of available
>> sysctl variables.
>>
>> sysctl hw.acpi.thermal.
>>
>> hw.acpi.thermal.min_runtime: 0
>> hw.acpi.thermal.polling_rate: 10
>> hw.acpi.thermal.user_override: 0
>> hw.acpi.thermal.tz0.active: -1
>> hw.acpi.thermal.tz0.passive_cooling: 0
>> hw.acpi.thermal.tz0.thermal_flags: 0
>> hw.acpi.thermal.tz0._PSV: -1
>> hw.acpi.thermal.tz0._HOT: -1
>> hw.acpi.thermal.tz0._CRT: 90.0C
>> hw.acpi.thermal.tz0._TC1: -1
>> hw.acpi.thermal.tz0._TC2: -1
>> hw.acpi.thermal.tz0._TSP: -1
>>
>> output from:
>>  sysctl -a |grep acpi
>> is here: https://privatepaste.com/ca08d4658b
>
> I suspect it is still there, but sysctl doesn't know how to display it
> anymore.  This is probably due to the changes with formatting of sysctl
> information.  mdf@ is probably responsible in that case.
>
>    SYSCTL_ADD_OPAQUE(&sc->tz_sysctl_ctx, SYSCTL_CHILDREN(sc->tz_sysctl_tree),
>                      OID_AUTO, "temperature", CTLFLAG_RD, &sc->tz_temperature,
>                      sizeof(sc->tz_temperature), "IK",
>                      "current thermal zone temperature");

Oops, yes.  The change in r217586 required the type to be set to
CTLTYPE_INT to print as format IK.  My grep of the source tree shows
that acpi_thermal.c is the only affected source file that was using
OPAQUE.  I'm testing out the fix now.

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: `hw.acpi.thermal.tz0.temperature' disappeared

2011-04-18 Thread mdf
On Mon, Apr 18, 2011 at 5:05 AM, John Baldwin  wrote:
> On Saturday, April 16, 2011 11:51:22 am Nick Ulen wrote:
>> FreeBSD was successfully upgraded.
>>
>> uname -v
>> FreeBSD 9.0-CURRENT #0: Mon Apr 11 18:14:36 MSD 2011
>> root@test:/usr/obj/usr/src/sys/GENERIC
>>
>> Everything seems to be working well except
>> `hw.acpi.thermal.tz0.temperature' disappeared from the list of available
>> sysctl variables.
>>
>> sysctl hw.acpi.thermal.
>>
>> hw.acpi.thermal.min_runtime: 0
>> hw.acpi.thermal.polling_rate: 10
>> hw.acpi.thermal.user_override: 0
>> hw.acpi.thermal.tz0.active: -1
>> hw.acpi.thermal.tz0.passive_cooling: 0
>> hw.acpi.thermal.tz0.thermal_flags: 0
>> hw.acpi.thermal.tz0._PSV: -1
>> hw.acpi.thermal.tz0._HOT: -1
>> hw.acpi.thermal.tz0._CRT: 90.0C
>> hw.acpi.thermal.tz0._TC1: -1
>> hw.acpi.thermal.tz0._TC2: -1
>> hw.acpi.thermal.tz0._TSP: -1
>>
>> output from:
>>  sysctl -a |grep acpi
>> is here: https://privatepaste.com/ca08d4658b
>
> I suspect it is still there, but sysctl doesn't know how to display it
> anymore.  This is probably due to the changes with formatting of sysctl
> information.  mdf@ is probably responsible in that case.
>
>    SYSCTL_ADD_OPAQUE(&sc->tz_sysctl_ctx, SYSCTL_CHILDREN(sc->tz_sysctl_tree),
>                      OID_AUTO, "temperature", CTLFLAG_RD, &sc->tz_temperature,
>                      sizeof(sc->tz_temperature), "IK",
>                      "current thermal zone temperature");


I don't seem to have a hw.acpi.thermal sysctl node on my box.  Can
someone please try this patch?

Thanks,
matthew
diff --git a/sys/dev/acpica/acpi_thermal.c b/sys/dev/acpica/acpi_thermal.c
index 515a742..b64d8df 100644
--- a/sys/dev/acpica/acpi_thermal.c
+++ b/sys/dev/acpica/acpi_thermal.c
@@ -257,10 +257,10 @@ acpi_tz_attach(device_t dev)
 sc->tz_sysctl_tree = SYSCTL_ADD_NODE(&sc->tz_sysctl_ctx,
 	 SYSCTL_CHILDREN(acpi_tz_sysctl_tree),
 	 OID_AUTO, oidname, CTLFLAG_RD, 0, "");
-SYSCTL_ADD_OPAQUE(&sc->tz_sysctl_ctx, SYSCTL_CHILDREN(sc->tz_sysctl_tree),
-		  OID_AUTO, "temperature", CTLFLAG_RD, &sc->tz_temperature,
-		  sizeof(sc->tz_temperature), "IK",
-		  "current thermal zone temperature");
+SYSCTL_ADD_PROC(&sc->tz_sysctl_ctx, SYSCTL_CHILDREN(sc->tz_sysctl_tree),
+		OID_AUTO, "temperature", CTLTYPE_INT | CTLFLAG_RD,
+		&sc->tz_temperature, 0, sysctl_handle_int,
+		"IK", "current thermal zone temperature");
 SYSCTL_ADD_PROC(&sc->tz_sysctl_ctx, SYSCTL_CHILDREN(sc->tz_sysctl_tree),
 		OID_AUTO, "active", CTLTYPE_INT | CTLFLAG_RW,
 		sc, 0, acpi_tz_active_sysctl, "I", "cooling is active");
@@ -286,9 +286,9 @@ acpi_tz_attach(device_t dev)
 		sc, offsetof(struct acpi_tz_softc, tz_zone.crt),
 		acpi_tz_temp_sysctl, "IK",
 		"critical temp setpoint (shutdown now)");
-SYSCTL_ADD_OPAQUE(&sc->tz_sysctl_ctx, SYSCTL_CHILDREN(sc->tz_sysctl_tree),
-		  OID_AUTO, "_ACx", CTLFLAG_RD, &sc->tz_zone.ac,
-		  sizeof(sc->tz_zone.ac), "IK", "");
+SYSCTL_ADD_PROC(&sc->tz_sysctl_ctx, SYSCTL_CHILDREN(sc->tz_sysctl_tree),
+		OID_AUTO, "_ACx", CTLTYPE_INT | CTLFLAG_RD,
+		&sc->tz_zone.ac, 0, sysctl_handle_int, "IK", "");
 SYSCTL_ADD_PROC(&sc->tz_sysctl_ctx, SYSCTL_CHILDREN(sc->tz_sysctl_tree),
 		OID_AUTO, "_TC1", CTLTYPE_INT | CTLFLAG_RW,
 		sc, offsetof(struct acpi_tz_softc, tz_zone.tc1),
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Re: Panic with mca trap

2011-02-03 Thread mdf
On Thu, Feb 3, 2011 at 7:09 AM, John Baldwin  wrote:
> On Thursday, February 03, 2011 8:05:31 am John Baldwin wrote:
>> On Tuesday, February 01, 2011 11:58:12 am m...@freebsd.org wrote:
>> > On a piece of hardware trying to verify basic build tests, we got an
>> > MCA exception that then panic'd the kernel due to WITNESS/INVARIANTS
>> > interaction.
>> >
>> > panic @ time 1296563157.510, thread 0xff000554: blockable
>> > sleep lock (sleep mutex) 128 @ /build/mnt/src/sys/vm/uma_core.c:1872
>> >
>> > Stack: --
>> > kernel:witness_checkorder+0x7a2
>> > kernel:_mtx_lock_flags+0x81
>> > kernel:uma_zalloc_arg+0x256
>> > kernel:malloc+0xc5
>> > kernel:mca_record_entry+0x30
>> > kernel:mca_scan+0xc9
>> > kernel:mca_intr+0x79
>> > kernel:trap+0x30b
>> > kernel:witness_checkorder+0x66
>> > kernel:_mtx_lock_spin_flags+0xa4
>> > kernel:witness_checkorder+0x2a8
>> > kernel:_mtx_lock_spin_flags+0xa4
>> > kernel:tdq_idled+0xe8
>> > kernel:sched_idletd+0x5b
>> > kernel:fork_exit+0x9b
>> >
>> > That's this bit of code in uma_zalloc_arg():
>> >
>> > #ifdef INVARIANTS
>> >                         ZONE_LOCK(zone);
>> >                         uma_dbg_alloc(zone, NULL, item);
>> >                         ZONE_UNLOCK(zone);
>> > #endif
>> >
>> >
>> > I don't know uma(9) well enough to know the best workaround.  Clearly
>> > there are times we can be in uma_zalloc_arg() and taking a regular
>> > mutex is not acceptable.  But what to do for the uma_dbg_free() call
>> > so it's happy, and whether to guard taking the ZONE lock with M_NOWAIT
>> > or td_critnest > 0 or both is outside my current knowledge.
>> >
>> > I don't expect we'll see this panic again any time soon, but it would
>> > be nice to fix the story for WITNESS of when an M_NOWAIT allocation
>> > can be done.
>>
>> Actually, this is more my fault.  The machine check happened while the
>> interrupted thread was already in a critical section (hence the WITNESS
>> complaint).  However, it really isn't correct to be calling malloc() from an
>> arbitrary exception handler, especially one like MC# which can fire pretty
>> much anywhere.  I think instead that we should use malloc() when polling the
>> machine check banks, but keep a pre-allocated pool of structures for use with
>> MC# exceptions and CMC interrupts and replenish the pool via an asynchronous
>> task.
>
> This is what I'm thinking:
>
> Index: mca.c
> ===
> --- mca.c       (revision 218221)
> +++ mca.c       (working copy)
> @@ -85,6 +85,7 @@
>  static MALLOC_DEFINE(M_MCA, "MCA", "Machine Check Architecture");
>
>  static int mca_count;          /* Number of records stored. */
> +static int mca_banks;          /* Number of per-CPU register banks. */
>
>  SYSCTL_NODE(_hw, OID_AUTO, mca, CTLFLAG_RD, NULL, "Machine Check 
> Architecture");
>
> @@ -102,6 +103,8 @@
>  SYSCTL_INT(_hw_mca, OID_AUTO, erratum383, CTLFLAG_RD, 
> &workaround_erratum383, 0,
>     "Is the workaround for Erratum 383 on AMD Family 10h processors 
> enabled?");
>
> +static STAILQ_HEAD(, mca_internal) mca_freelist;
> +static int mca_freecount;
>  static STAILQ_HEAD(, mca_internal) mca_records;
>  static struct callout mca_timer;
>  static int mca_ticks = 3600;   /* Check hourly by default. */
> @@ -111,7 +114,6 @@
>
>  #ifdef DEV_APIC
>  static struct cmc_state **cmc_state;   /* Indexed by cpuid, bank */
> -static int cmc_banks;
>  static int cmc_throttle = 60;  /* Time in seconds to throttle CMCI. */
>  #endif
>
> @@ -415,21 +417,51 @@
>        return (1);
>  }
>
> -static void __nonnull(1)
> -mca_record_entry(const struct mca_record *record)
> +static void
> +mca_fill_freelist(void)
>  {
>        struct mca_internal *rec;
> +       int desired;
>
> -       rec = malloc(sizeof(*rec), M_MCA, M_NOWAIT);
> -       if (rec == NULL) {
> -               printf("MCA: Unable to allocate space for an event.\n");
> -               mca_log(record);
> -               return;
> +       /*
> +        * Ensure we have at least one record for each bank and one
> +        * record per CPU.
> +        */
> +       desired = imax(mp_ncpus, mca_banks);
> +       mtx_lock_spin(&mca_lock);
> +       while (desired < mca_freecount) {
> +               mtx_unlock_spin(&mca_lock);
> +               rec = malloc(sizeof(*rec), M_MCA, M_WAITOK);
> +               mtx_lock_spin(&mca_lock);
> +               STAILQ_INSERT_TAIL(&mca_freelist, rec, link);
> +               mca_freecount++;
>        }
> +       mtx_unlock_spin(&mca_lock);
> +}
>
> +static void __nonnull(2)
> +mca_record_entry(enum scan_mode mode, const struct mca_record *record)
> +{
> +       struct mca_internal *rec;
> +
> +       if (mode == POLLED) {
> +               rec = malloc(sizeof(*rec), M_MCA, M_WAITOK);
> +               mtx_lock_spin(&mca_lock);
> +       } else {
> +               mtx_lock_spin(&mca_lock);
> +               rec = STAILQ_FIRST(&mca_freelist);
> +               if

Panic with mca trap

2011-02-01 Thread mdf
On a piece of hardware trying to verify basic build tests, we got an
MCA exception that then panic'd the kernel due to WITNESS/INVARIANTS
interaction.

panic @ time 1296563157.510, thread 0xff000554: blockable
sleep lock (sleep mutex) 128 @ /build/mnt/src/sys/vm/uma_core.c:1872

Stack: --
kernel:witness_checkorder+0x7a2
kernel:_mtx_lock_flags+0x81
kernel:uma_zalloc_arg+0x256
kernel:malloc+0xc5
kernel:mca_record_entry+0x30
kernel:mca_scan+0xc9
kernel:mca_intr+0x79
kernel:trap+0x30b
kernel:witness_checkorder+0x66
kernel:_mtx_lock_spin_flags+0xa4
kernel:witness_checkorder+0x2a8
kernel:_mtx_lock_spin_flags+0xa4
kernel:tdq_idled+0xe8
kernel:sched_idletd+0x5b
kernel:fork_exit+0x9b

That's this bit of code in uma_zalloc_arg():

#ifdef INVARIANTS
ZONE_LOCK(zone);
uma_dbg_alloc(zone, NULL, item);
ZONE_UNLOCK(zone);
#endif


I don't know uma(9) well enough to know the best workaround.  Clearly
there are times we can be in uma_zalloc_arg() and taking a regular
mutex is not acceptable.  But what to do for the uma_dbg_free() call
so it's happy, and whether to guard taking the ZONE lock with M_NOWAIT
or td_critnest > 0 or both is outside my current knowledge.

I don't expect we'll see this panic again any time soon, but it would
be nice to fix the story for WITNESS of when an M_NOWAIT allocation
can be done.

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: sleep bug in taskqueue(9)

2010-11-12 Thread mdf
On Fri, Nov 12, 2010 at 12:25 PM, Hans Petter Selasky  wrote:
> On Friday 12 November 2010 17:38:38 m...@freebsd.org wrote:
>> On Fri, Nov 12, 2010 at 6:23 AM, Hans Petter Selasky 
> wrote:
>> > On Friday 12 November 2010 15:18:46 m...@freebsd.org wrote:
>> >> On Fri, Nov 12, 2010 at 12:56 AM, Hans Petter Selasky 
>> >
>> > wrote:
>> >> > On Thursday 29 April 2010 01:59:58 Matthew Fleming wrote:
>> >> >> It looks to me like taskqueue_drain(taskqueue_thread, foo) will not
>> >> >> correctly detect whether or not a task is currently running.  The
>> >> >> check is against a field in the taskqueue struct, but for the
>> >> >> taskqueue_thread queue with more than one thread, multiple threads
>> >> >> can simultaneously be running a task, thus stomping over the
>> >> >> tq_running field.
>> >> >>
>> >> >> I have not seen any problem with the code as-is in actual use, so
>> >> >> this is purely an inspection bug.
>> >> >>
>> >> >> The following patch should fix the problem.  Because it changes the
>> >> >> size of struct task I'm not sure if it would be suitable for MFC.
>> >> >
>> >> > 1) The u_char is going to leave a hole in that structure on ARM
>> >> > platforms for example.
>> >> >
>> >> > 2) The existing taskqueue implementation also has a missing check for
>> >> > the pending count wrapping to zero. I.E. it should stick at 0x
>> >> > and not wrap to 0.
>> >>
>> >> This commit mail is rather old, and this fix was incorrect, because
>> >> the task cannot be referenced after it has been run.  Some task
>> >> handlers will free the task as part of the handler.
>> >
>> > Ok, maybe the e-mail got stuck somewhere. Have you fixed the above
>> > mentioned issues in a newer patch?
>>
>> If you look at the file history for subr_taskqueue.c:
>>
>> http://svn.freebsd.org/viewvc/base/head/sys/kern/subr_taskqueue.c
>>
>> You will see quite a few commits by me.  The most recent relating to
>> detecting if a task is running is being MFC'd today:
>
> Yes, and I see that this code needs an overflow check, which is one of the
> issues still not fixed:

You keep bringing this up.  It is not a new issue.  It is not a bug in
any of the patches.  It is extremely unlikely that a task will be
queued 65536 times before execution.  It is more worthy of an assert
rather than a check, because if a task is enqueued that many times
without being run then there's likely a stuck task in the queue.

The patch you posted will lie as well, so I would not consider it
sufficient if someone wanted to address the issue.

Thanks,
matthew


>
> Before:
>
>        /*
>         * Count multiple enqueues.
>         */
>        if (task->ta_pending) {
>                task->ta_pending++;
>                TQ_UNLOCK(queue);
>                return 0;
>        }
>
>
> After:
>
>        /*
>         * Count multiple enqueues.
>         */
>        if (task->ta_pending) {
>                if (task->ta_pending != 0x)
>                        task->ta_pending++;
>                TQ_UNLOCK(queue);
>                return 0;
>        }
>
> Else the ta_pending can wrap to zero and the code will not do what it
> announces it does.
>
> --HPS
>
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: sleep bug in taskqueue(9)

2010-11-12 Thread mdf
On Fri, Nov 12, 2010 at 6:23 AM, Hans Petter Selasky  wrote:
> On Friday 12 November 2010 15:18:46 m...@freebsd.org wrote:
>> On Fri, Nov 12, 2010 at 12:56 AM, Hans Petter Selasky 
> wrote:
>> > On Thursday 29 April 2010 01:59:58 Matthew Fleming wrote:
>> >> It looks to me like taskqueue_drain(taskqueue_thread, foo) will not
>> >> correctly detect whether or not a task is currently running.  The check
>> >> is against a field in the taskqueue struct, but for the taskqueue_thread
>> >> queue with more than one thread, multiple threads can simultaneously be
>> >> running a task, thus stomping over the tq_running field.
>> >>
>> >> I have not seen any problem with the code as-is in actual use, so this
>> >> is purely an inspection bug.
>> >>
>> >> The following patch should fix the problem.  Because it changes the size
>> >> of struct task I'm not sure if it would be suitable for MFC.
>> >
>> > 1) The u_char is going to leave a hole in that structure on ARM platforms
>> > for example.
>> >
>> > 2) The existing taskqueue implementation also has a missing check for the
>> > pending count wrapping to zero. I.E. it should stick at 0x and not
>> > wrap to 0.
>>
>> This commit mail is rather old, and this fix was incorrect, because
>> the task cannot be referenced after it has been run.  Some task
>> handlers will free the task as part of the handler.
>
> Ok, maybe the e-mail got stuck somewhere. Have you fixed the above mentioned
> issues in a newer patch?

If you look at the file history for subr_taskqueue.c:

http://svn.freebsd.org/viewvc/base/head/sys/kern/subr_taskqueue.c

You will see quite a few commits by me.  The most recent relating to
detecting if a task is running is being MFC'd today:


Revision 213813 - (view) (annotate) - [select for diffs]
Modified Wed Oct 13 22:59:04 2010 UTC (4 weeks, 1 day ago) by mdf
File length: 10831 byte(s)
Diff to previous 213739
Use a safer mechanism for determining if a task is currently running,
that does not rely on the lifetime of pointers being the same. This also
restores the task KBI.

Suggested by:   jhb
MFC after:  1 month

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: sleep bug in taskqueue(9)

2010-11-12 Thread mdf
On Fri, Nov 12, 2010 at 12:56 AM, Hans Petter Selasky  wrote:
> On Thursday 29 April 2010 01:59:58 Matthew Fleming wrote:
>> It looks to me like taskqueue_drain(taskqueue_thread, foo) will not
>> correctly detect whether or not a task is currently running.  The check
>> is against a field in the taskqueue struct, but for the taskqueue_thread
>> queue with more than one thread, multiple threads can simultaneously be
>> running a task, thus stomping over the tq_running field.
>>
>> I have not seen any problem with the code as-is in actual use, so this
>> is purely an inspection bug.
>>
>> The following patch should fix the problem.  Because it changes the size
>> of struct task I'm not sure if it would be suitable for MFC.
>>
>
> 1) The u_char is going to leave a hole in that structure on ARM platforms for
> example.
>
> 2) The existing taskqueue implementation also has a missing check for the
> pending count wrapping to zero. I.E. it should stick at 0x and not wrap to
> 0.

This commit mail is rather old, and this fix was incorrect, because
the task cannot be referenced after it has been run.  Some task
handlers will free the task as part of the handler.

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: MTX_DEF versus MTX_SPIN

2010-11-03 Thread mdf
On Wed, Nov 3, 2010 at 9:42 AM, Andriy Gapon  wrote:
> on 03/11/2010 18:27 m...@freebsd.org said the following:
>> It's not clear to me from the man pages (perhaps I didn't look at the
>> right one?) in which environments I need a spinlock.  For example, I
>> wouldn't think it's safe to use a MTX_DEF in a hard interrupt handler
>> (i.e one that was registered with BUS_SETUP_INTR), but I see some code
>> lying around here that does it and nothing I'm aware of has broken.
>
> Such a handler runs in an interrupt thread.
> The "harder" interrupt handler is called interrupt filter in FreeBSD 
> terminology.
>  I think that it was formerly known as fast interrupt.

So a MTX_DEF is okay in that environment?

What would "best practices" be considered for what code should be run
in the interrupt handler versus a soft interrupt?  In this case the
kinds of things we have to do at some level of interrupt are:

 - handle a heartbeat interrupt from firmware a few times a second
 - get a DMA completion interrupt (completely handling this requires
calling biodone on all the associated bios)
 - receive an ECC interrupt (this requires reading registers off the
card for details)

At the moment we're on stable/7, but we will be migrating the code
base to something more recent in another year or so, if that affects
the answer.

Is there any documentation on best practices for writing a FreeBSD driver?

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


MTX_DEF versus MTX_SPIN

2010-11-03 Thread mdf
It's not clear to me from the man pages (perhaps I didn't look at the
right one?) in which environments I need a spinlock.  For example, I
wouldn't think it's safe to use a MTX_DEF in a hard interrupt handler
(i.e one that was registered with BUS_SETUP_INTR), but I see some code
lying around here that does it and nothing I'm aware of has broken.

Perhaps this comes to me still not understanding exactly how
interrupts work on FreeBSD.  If I capture the stack in a hard
interrupt, it looks something like:

#0 0xff87b07f43b5 at rnv_hard_intr+0x35
#1 0x8026e7ce at ithread_execute_handlers+0x9e
#2 0x8026ead0 at ithread_loop+0x70
#3 0x8026b84c at fork_exit+0x9c
#4 0x804a7f7e at fork_trampoline+0xe

And there the stack ends.  From my perspective, this doesn't look like
anything was actually interrupted.

By way of explaining what I mean, on AIX we defined 10 levels of
software interrupt, and we trained the kernel debugger to understand
the save frames that were acquired on interrupt to print a full stack.
 So a full stack dump might show something like a few frames from one
interrupt handler, then a save area, then a frames from a lower
priority interrupt, then another save area, then the base-level stack.
 In this kind of programming environment, it was important to know at
what interrupt level your handler would execute, so that locks
acquired to synchronize between the top-half and bottom-half were
acquired with interupts disabled to the same level.

So, back to my question.  Is it safe to take a MTX_DEF in a hard
interrupt?  What about a soft interrupt?  I have to assume it's okay
in a soft-interrupt context (swi_sched, callout, etc.), since
softclock() will acquire a MTX_DEF on behalf of a callout.

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


kldload of compressed modules

2010-10-20 Thread mdf
I'm trying to add support for kldload of compressed modules; they work
at boot time but not at runtime.

I'm having trouble uncompressing the module in the kernel to try
processing it.  The difficulty is several-fold:

1) gzip(1) doesn't seem to produce a compressed image that net/zlib.c
understands.  It gets hung up on the first byte, since it's not 0x8
indicating Z_DEFLATED.  I tried setting next_in to the input buffer +
2 since that is the Z_DEFLATED marker, but it dies one byte later with
the "incorrect header check" failure.

2) I was using inflate in kern/inflate.c wrong earlier (not skipping
the first 10 bytes + comment), but even so that code and the net/zlib
code aren't compatible, since they both define an inflate() function
that takes differing numbers of arguments, etc.  And link_elf.c
already can include net/zlib.h when DDB_CTF is set, so that would
conflict with sys/inflate.h.

It's likely that kern/inflate.c is what I want, since it's the code
the boot loader uses, but I'm annoyed that it's only available with
device gzip (I can change this, of course) and that it's incompatible
with what seems to be the somewhat standard compress/decompress code
in net/zlib.c.

All of this is on stable/7 at the moment, though if I had something
that worked I want to commit to CURRENT and MFC.

So, what's the basic problem?  Should net/zlib support this gzipped
file?  Should I have compressed with some other user-space utility?
Should I add a new link_elf_compress.c file to support compressed
images (and use sys/inflate.h instead of net/zlib.h) instead of trying
to add it to the existing link_elf_load_file() functions?

For reference, here's the beginning of the compressed module (on amd64):

# od -h mnvf.ko.gz | less
000  8b1f0808fd2a4cbe03006e6d66766b2e
020  006f7ded740fd554f7b5484df8426613
040  03404401182e28200484120834254ca3
060  d198440901213a51924c190932496663
100  04261a2a8c9a9c65e6c6d6d9fb5a3cb4
120  5adb9ffa5fb60979362d4b586b42a529
140  54a852ad72ada0c7c5467f886f207def
160  99ce73b947336b7d7df5adeb7d6fb8b3
200  7b9c677e7d9fdef69ce7cf7d9b9fef73
220  3ab652ca525328c637fc594aa4a95840
240  8b304bc5de3979fff625552b0396a95d
260  ad92224c74dafe4546fb626528a8fa66
300  f31fc06597b57be3cd2b6109264197c3
320  26674be6f9e56ff6e8624dfe84e64d50
340  1251914205dd7d57f912c23606b525ae
360  d570b5c2092ab54d9e21d70b7e22628f
400  80cc8eabf3c7bae1df8206df6ad76cb8
420  2d706b865c1dc39745b5abc7774292bf
440  c28765d434babe61e15b90da46440f2b

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


uma_zfree(NULL) is broken

2010-10-18 Thread mdf
There's explicit protection for free(NULL, M_FOO), but uma_zfree(zone,
NULL) will put NULL in the local bucket and then probably return it
later from a uma_zalloc call.  Obviously it's not a good idea to call
uma_zfree(9) on NULL, but in this case it's an easy mistake to make
when e.g. converting a set of malloc(9)/free(9) uses into uma(9).

So is the "right" thing to allow a uma_zfree(NULL) and silently
succeed, like for free(9)?  That would be my guess, but I'm open to
alternatives.

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: MAXCPU preparations

2010-09-27 Thread mdf
On Mon, Sep 27, 2010 at 9:21 AM, Sean Bruno  wrote:
> On Mon, 2010-09-27 at 08:53 -0700, Julian Elischer wrote:
>> On 9/27/10 8:26 AM, Sean Bruno wrote:
>> > Does this look like an appropriate modification to libmemstat?
>> >
>> > Sean
>> >
>> >
>> >  //depot/yahoo/ybsd_7/src/lib/libmemstat/memstat.h#4
>> > - /home/seanbru/ybsd_7/src/lib/libmemstat/memstat.h 
>> > @@ -28,12 +28,13 @@
>> >
>> >   #ifndef _MEMSTAT_H_
>> >   #define        _MEMSTAT_H_
>> > +#include
>> >
>> >   /*
>> >    * Number of CPU slots in library-internal data structures.  This
>> > should be
>> >    * at least the value of MAXCPU from param.h.
>> >    */
>> > -#define        MEMSTAT_MAXCPU  64
>> > +#define        MEMSTAT_MAXCPU  MAXCPU /* defined in
>> > sys/${ARCH}/include/param.h */
>> >
>>
>>
>> wouldn't it be better to do a sysctlbyname() and use the real value
>> for the system?
>>
>
> That was my initial thought (as prodded by scottl and peter).
>
> If it is made dynamic, could this be opening a race condition where the
> call to sysctlbyname() returns a count of CPUS that is in turn changed
> by the offlining of a CPU?  Or am I thinking to much about this?

The maximum number of CPUs supported by a running instance will not
change.  Only, potentially, the current number of CPUs.  So a sysctl
to fetch the kernel's compiled-in MAXCPU is safe.

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


sys/conf/files aicasm

2010-09-23 Thread mdf
I can't say I understand much about the syntax of the top part of
sys/conf/files, but this line:


aicasm  optional ahc | ahd \
dependency  "$S/dev/aic7xxx/aicasm/*.[chyl]"   \
compile-with"CC='${CC}' ${MAKE} -f $S/dev/aic7xxx/aicasm/Makefile
MAKESRCPATH=$S/dev/aic7xxx/aicasm" \
no-obj no-implicit-rule\
clean   "aicasm* y.tab.h"

looks to me like aicasm should only be built if I have device ahc or
device ahd in my kernel configuration file.

But even if I make a config file without those devices (e.g. by doing
include GENERIC then nodevices ahc, ahd) the file is still built.  Am
I missing something about how these lines in sys/conf/files work?

As a side question, why does the Makefile for dev/aic7xxx/aicasm have
-I/usr/include in CFLAGS instead of using some marcos to get to the
source tree I'm actually building from?

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: g_event spinning 100% when doing 'gpart add'

2010-09-11 Thread mdf
On Fri, Sep 10, 2010 at 6:57 PM, Yuri Pankov  wrote:
> Nevermind me. That's what I thought why I was getting the same gpart behavior
> switching between kernels, with and without DEBUG_LOCKS. Sorry about that.
>
> Same here, gpart hangs on:
> 3826 gpart    CALL  __sysctl(0x7fffa250,0x3,0,0x7fffa268,0,0)
> 3826 gpart    SCTL  "kern.geom.confxml"

When I was doing my sbuf changes I temporarily had a patch to change
these sysctls to use the new drains.  I couldn't get any output from
the three kern.geom.conf{txt,dot,xml} sysctls (they didn't hang but
the output was empty) so I reverted those parts before committing.  I
doubt my changes affected this, but I will check on Monday when I'm
back at work to see if the lack of output is due to my changes or
predates me.

Cheers,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: deprecating sprintf(9)

2010-09-08 Thread mdf
On Wed, Sep 8, 2010 at 9:15 AM, Rink Springer  wrote:
> Hi,
>
> On Wed, Sep 08, 2010 at 08:51:57AM -0700, m...@freebsd.org wrote:
>> It seems like a large project, but OTOH sprintf(9) is mighty unsafe in
>> the kernel.  It's disapproved of for user-space as being unsafe for
>> security reasons as well, but the potential downsides aren't the same,
>> and we'll never clean up ports anyways. :-)
>
> Deprecating it may be usable, yet I don't believe we can easily enforce
> such a policy [1].

If the kernel sources don't use it then the prototype can be removed.

> Have you looked at how many (potentially) unsecure
> uses there are in the kernel, to give an idea how useful such an effort
> would be?

I presume all the kernel uses are safe at the moment, but it's an
error prone construction.

As of this morning grep found 1277 occurrences of sprintf(9) in sys/
and 23 occurrences of vsprintf(9) in sys/.

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


deprecating sprintf(9)

2010-09-08 Thread mdf
Looking at the uses of kvprintf(9), only [v]sprintf(9) doesn't have a
callback function.  It seems a little sketchy to me to be doing unsafe
sprintf in the kernel anyways.

Should we (and by we, I mean me) deprecate sprintf(9) and convert the
existing 1200+ uses to strcpy(9) for fixed strings (also potentially
bad, but a different beast), or snprintf(9) where the size of the
buffer is known?

It seems like a large project, but OTOH sprintf(9) is mighty unsafe in
the kernel.  It's disapproved of for user-space as being unsafe for
security reasons as well, but the potential downsides aren't the same,
and we'll never clean up ports anyways. :-)

Thoughts?
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: sched_pin() bug in SCHED_ULE

2010-09-01 Thread mdf
On Wed, Sep 1, 2010 at 6:49 AM, John Baldwin  wrote:
> On Tuesday, August 31, 2010 2:53:12 pm m...@freebsd.org wrote:
>> On Tue, Aug 31, 2010 at 10:16 AM,   wrote:
>> > I recorded the stack any time ts->ts_cpu was set and when a thread was
>> > migrated by sched_switch() I printed out the recorded info.  Here's
>> > what I found:
>> >
>> >
>> > XXX bug 67957: moving 0xff003ff9b800 from 3 to 1
>> > [1]: pin 0 state 4 move 3 -> 1 done by 0xff000cc44000:
>> > #0 0x802b36b4 at bug67957+0x84
>> > #1 0x802b5dd4 at sched_affinity+0xd4
>> > #2 0x8024a707 at cpuset_setthread+0x137
>> > #3 0x8024aeae at cpuset_setaffinity+0x21e
>> > #4 0x804a82df at freebsd32_cpuset_setaffinity+0x4f
>> > #5 0x80295f49 at isi_syscall+0x99
>> > #6 0x804a630e at ia32_syscall+0x1ce
>> > #7 0x8046dc60 at Xint0x80_syscall+0x60
>> > [0]: pin 0 state 2 move 0 -> 3 done by 0xff000cc44000:
>> > #0 0x802b36b4 at bug67957+0x84
>> > #1 0x802b4ad8 at sched_add+0xe8
>> > #2 0x8029b96a at create_thread+0x34a
>> > #3 0x8029badc at kern_thr_new+0x8c
>> > #4 0x804a8912 at freebsd32_thr_new+0x122
>> > #5 0x80295f49 at isi_syscall+0x99
>> > #6 0x804a630e at ia32_syscall+0x1ce
>> > #7 0x8046dc60 at Xint0x80_syscall+0x60
>> >
>> > So one thread in the process called cpuset_setaffinity(2), and another
>> > thread in the process was forcibly migrated by the IPI while returning
>> > from a syscall, while it had td_pinned set.
>> >
>> > Given this path, it seems reasonable to me to skip the migrate if we
>> > notice THREAD_CAN_MIGRATE is false.
>> >
>> > Opinions?  My debug code is below.  I'll try to write a short testcase
>> > that exhibits this bug.
>>
>> Just a few more thoughts on this.  The check in sched_affinity for
>> THREAD_CAN_MIGRATE is racy.  Since witness uses sched_pin, it's not
>> simple to take the THREAD lock around an increment of td_pinned.  So
>> I'm looking for suggestions on the best way to fix this issue.  My
>> thoughts:
>>
>> 1) add a check in sched_switch() for THREAD_CAN_MIGRATE
>> 2) have WITNESS not use sched_pin, and take the THREAD lock when
>> modifying td_pinned
>> 3) have the IPI_PREEMPT handler notice the thread is pinned (and not
>> trying to bind) and postpone the mi_switch, just like it postpones
>> when a thread is in a critical section.
>>
>> Except for the potential complexity of implementation, I think (2) is
>> the best solution.
>
> I don't like 2) only because you'd really need to fix all the other places
> that use sched_pin() to grab the thread lock.  That would be a bit expensive.
> I think the problem is code that checks THREAD_CAN_MIGRATE() for running
> threads that aren't curthread.
>
> The sched_affinity() bug is probably my fault FWIW.  I think something along
> the lines of 1) is the best approach.  Maybe something like the patch below.
> It moves the CPU assignment out of sched_affinity() and moves it into
> sched_switch() instead so that it takes effect on the next context switch for
> this thread.  That is the point at which the CPU binding actually takes effect
> anyway since sched_affinity() doesn't force an immediate context switch.  This
> patch is relative to 7.  The patch for 9 is nearly identical (just a fixup for
> ipi_cpu() vs ipi_selected()).
>
> Index: sched_ule.c
> ===
> --- sched_ule.c (revision 212065)
> +++ sched_ule.c (working copy)
> @@ -1885,6 +1885,8 @@
>                srqflag = (flags & SW_PREEMPT) ?
>                    SRQ_OURSELF|SRQ_YIELDING|SRQ_PREEMPTED :
>                    SRQ_OURSELF|SRQ_YIELDING;
> +               if (THREAD_CAN_MIGRATE(td) && !THREAD_CAN_SCHED(td, 
> ts->tscpu))
> +                       ts->ts_cpu = sched_pickcpu(td, 0);
>                if (ts->ts_cpu == cpuid)
>                        tdq_add(tdq, td, srqflag);
>                else
> @@ -2536,7 +2538,6 @@
>  {
>  #ifdef SMP
>        struct td_sched *ts;
> -       int cpu;
>
>        THREAD_LOCK_ASSERT(td, MA_OWNED);
>        ts = td->td_sched;
> @@ -2550,17 +2551,13 @@
>        if (!TD_IS_RUNNING(td))
>                return;
>        td->td_flags |= TDF_NEEDRESCHED;
> -       if (!THREAD_CAN_MIGRATE(td))
> -               return;
>        /*
> -        * Assign the new cpu and force a switch before returning to
> -        * userspace.  If the target thread is not running locally send
> -        * an ipi to force the issue.
> +        * Force a switch before returning to userspace.  If the
> +        * target thread is not running locally send an ipi to force
> +        * the issue.
>         */
> -       cpu = ts->ts_cpu;
> -       ts->ts_cpu = sched_pickcpu(td, 0);
> -       if (cpu != PCPU_GET(cpuid))
> -               ipi_selected(1 << cpu, IPI_PREEMPT);
> +       if (td != curthread)
> +               ipi_selected(1 << ts->ts_cpu, IPI_PREEMPT);
>  #endif
>  }

I will test this patch out; thank

Re: sched_pin() bug in SCHED_ULE

2010-08-31 Thread mdf
On Tue, Aug 31, 2010 at 10:16 AM,   wrote:
> I recorded the stack any time ts->ts_cpu was set and when a thread was
> migrated by sched_switch() I printed out the recorded info.  Here's
> what I found:
>
>
> XXX bug 67957: moving 0xff003ff9b800 from 3 to 1
> [1]: pin 0 state 4 move 3 -> 1 done by 0xff000cc44000:
> #0 0x802b36b4 at bug67957+0x84
> #1 0x802b5dd4 at sched_affinity+0xd4
> #2 0x8024a707 at cpuset_setthread+0x137
> #3 0x8024aeae at cpuset_setaffinity+0x21e
> #4 0x804a82df at freebsd32_cpuset_setaffinity+0x4f
> #5 0x80295f49 at isi_syscall+0x99
> #6 0x804a630e at ia32_syscall+0x1ce
> #7 0x8046dc60 at Xint0x80_syscall+0x60
> [0]: pin 0 state 2 move 0 -> 3 done by 0xff000cc44000:
> #0 0x802b36b4 at bug67957+0x84
> #1 0x802b4ad8 at sched_add+0xe8
> #2 0x8029b96a at create_thread+0x34a
> #3 0x8029badc at kern_thr_new+0x8c
> #4 0x804a8912 at freebsd32_thr_new+0x122
> #5 0x80295f49 at isi_syscall+0x99
> #6 0x804a630e at ia32_syscall+0x1ce
> #7 0x8046dc60 at Xint0x80_syscall+0x60
>
> So one thread in the process called cpuset_setaffinity(2), and another
> thread in the process was forcibly migrated by the IPI while returning
> from a syscall, while it had td_pinned set.
>
> Given this path, it seems reasonable to me to skip the migrate if we
> notice THREAD_CAN_MIGRATE is false.
>
> Opinions?  My debug code is below.  I'll try to write a short testcase
> that exhibits this bug.

Just a few more thoughts on this.  The check in sched_affinity for
THREAD_CAN_MIGRATE is racy.  Since witness uses sched_pin, it's not
simple to take the THREAD lock around an increment of td_pinned.  So
I'm looking for suggestions on the best way to fix this issue.  My
thoughts:

1) add a check in sched_switch() for THREAD_CAN_MIGRATE
2) have WITNESS not use sched_pin, and take the THREAD lock when
modifying td_pinned
3) have the IPI_PREEMPT handler notice the thread is pinned (and not
trying to bind) and postpone the mi_switch, just like it postpones
when a thread is in a critical section.

Except for the potential complexity of implementation, I think (2) is
the best solution.

For those who want to play at home, I have a small test program that
exhibits this behavior at
http://people.freebsd.org/~mdf/cpu_affinity_test.c.  It seems to
require 4 or more CPUs to hit the assert.  You'll also need to patch
the kernel to assert when migrating a pinned thread:

Index: kern/sched_ule.c
===
--- kern/sched_ule.c(revision 158580)
+++ kern/sched_ule.c(working copy)
@@ -1888,11 +1889,26 @@ sched_switch(struct thread *td, struct t
srqflag = (flags & SW_PREEMPT) ?
SRQ_OURSELF|SRQ_YIELDING|SRQ_PREEMPTED :
SRQ_OURSELF|SRQ_YIELDING;
if (ts->ts_cpu == cpuid)
tdq_add(tdq, td, srqflag);
-   else
+   else {
+   KASSERT(THREAD_CAN_MIGRATE(td) ||
+   (ts->ts_flags & TSF_BOUND) != 0,
+   ("Thread %p shouldn't migrate!", td));
mtx = sched_switch_migrate(tdq, td, srqflag);
+   }
} else {
/* This thread must be going to sleep. */
TDQ_LOCK(tdq);
mtx = thread_lock_block(td);

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: sched_pin() bug in SCHED_ULE

2010-08-31 Thread mdf
I recorded the stack any time ts->ts_cpu was set and when a thread was
migrated by sched_switch() I printed out the recorded info.  Here's
what I found:


XXX bug 67957: moving 0xff003ff9b800 from 3 to 1
[1]: pin 0 state 4 move 3 -> 1 done by 0xff000cc44000:
#0 0x802b36b4 at bug67957+0x84
#1 0x802b5dd4 at sched_affinity+0xd4
#2 0x8024a707 at cpuset_setthread+0x137
#3 0x8024aeae at cpuset_setaffinity+0x21e
#4 0x804a82df at freebsd32_cpuset_setaffinity+0x4f
#5 0x80295f49 at isi_syscall+0x99
#6 0x804a630e at ia32_syscall+0x1ce
#7 0x8046dc60 at Xint0x80_syscall+0x60
[0]: pin 0 state 2 move 0 -> 3 done by 0xff000cc44000:
#0 0x802b36b4 at bug67957+0x84
#1 0x802b4ad8 at sched_add+0xe8
#2 0x8029b96a at create_thread+0x34a
#3 0x8029badc at kern_thr_new+0x8c
#4 0x804a8912 at freebsd32_thr_new+0x122
#5 0x80295f49 at isi_syscall+0x99
#6 0x804a630e at ia32_syscall+0x1ce
#7 0x8046dc60 at Xint0x80_syscall+0x60

So one thread in the process called cpuset_setaffinity(2), and another
thread in the process was forcibly migrated by the IPI while returning
from a syscall, while it had td_pinned set.

Given this path, it seems reasonable to me to skip the migrate if we
notice THREAD_CAN_MIGRATE is false.

Opinions?  My debug code is below.  I'll try to write a short testcase
that exhibits this bug.

Thanks,
matthew


Index: kern/sched_ule.c
===
--- kern/sched_ule.c(revision 158580)
+++ kern/sched_ule.c(working copy)
@@ -697,6 +697,41 @@
return;
 }

+static void
+bug67957(struct thread *td)
+{
+   int idx;
+
+   THREAD_LOCK_ASSERT(td, MA_OWNED);
+   idx = (td->xxx_idx++ % 5);
+   stack_save(&td->xxx[idx].td_preempt);
+   td->xxx[idx].td_moveto = td->td_sched->ts_cpu;
+   td->xxx[idx].td_movefrom = (td->td_oncpu == NOCPU) ? td->td_lastcpu
: td->td_oncpu;
+   td->xxx[idx].td_statewas = td->td_state;
+   td->xxx[idx].td_pinned = td->td_pinned;
+   td->xxx[idx].td_by = curthread;
+}
+
+static void
+pr_bug67957(struct thread *td, int idx)
+{
+   int idx, i;
+
+   printf("XXX bug 67957: moving %p from %d to %d\n",
+   td, td->td_lastcpu, td->td_sched->ts_cpu);
+   for (i = 0, idx = td->xxx_idx - 1;
+   i < 5 && idx >= 0;
+   i++, idx--) {
+   printf("[%d]: pin %d state %d move %d -> %d done by %p:\n",
+   idx, td->xxx[idx % 5].td_pinned,
+   td->xxx[idx % 5].td_statewas,
+   td->xxx[idx % 5].td_movefrom,
+   td->xxx[idx % 5].td_moveto,
+   td->xxx[idx % 5].td_by);
+   stack_print_ddb(&td->xxx[idx % 5].td_preempt);
+   }
+}
+
 /*
  * Move a thread from one thread queue to another.
  */
@@ -739,6 +774,7 @@
TDQ_UNLOCK(from);
sched_rem(td);
ts->ts_cpu = cpu;
+   bug67957(td);
td->td_lock = TDQ_LOCKPTR(to);
tdq_add(to, td, SRQ_YIELDING);
 }
@@ -971,6 +1007,7 @@
tdq = TDQ_CPU(cpu);
td = ts->ts_thread;
ts->ts_cpu = cpu;
+   bug67957(td);

/* If the lock matches just return the queue. */
if (td->td_lock == TDQ_LOCKPTR(tdq))
@@ -1890,8 +1964,15 @@
SRQ_OURSELF|SRQ_YIELDING;
if (ts->ts_cpu == cpuid)
tdq_add(tdq, td, srqflag);
-   else
+   else {
+   if (!THREAD_CAN_MIGRATE(td) &&
+   (ts->ts_flags & TSF_BOUND) == 0) {
+   pr_bug67957(td, idx);
+   panic("XXX");
+   }
mtx = sched_switch_migrate(tdq, td, srqflag);
+   }
+   td->xxx_idx = 0;
} else {
/* This thread must be going to sleep. */
TDQ_LOCK(tdq);
@@ -2479,8 +2560,10 @@
 * target cpu.
 */
if (td->td_priority <= PRI_MAX_ITHD && THREAD_CAN_MIGRATE(td) &&
-   curthread->td_intr_nesting_level)
+   curthread->td_intr_nesting_level) {
ts->ts_cpu = cpuid;
+   bug67957(td);
+   }
if (!THREAD_CAN_MIGRATE(td))
cpu = ts->ts_cpu;
else
@@ -2590,6 +2673,7 @@
 */
cpu = ts->ts_cpu;
ts->ts_cpu = sched_pickcpu(td, 0);
+   bug67957(td);
if (cpu != PCPU_GET(cpuid))
ipi_selected(1 << cpu, IPI_PREEMPT);
 #endif
@@ -2613,6 +2697,7 @@
if (PCPU_GET(cpuid) == cpu)
return;
ts->ts_cpu = cpu;
+   bug67957(td);
/* When we return from mi_switch we'll be on the correct cpu. */
mi_switch(SW_VOL, NULL);
 #endif
Index: sys/proc.h
===
--- sys/proc.h  (revision 158580)
+++ sys/proc.h  (working 

Re: sched_pin() bug in SCHED_ULE

2010-08-26 Thread mdf
On Thu, Aug 26, 2010 at 3:10 PM, Andriy Gapon  wrote:
> on 27/08/2010 00:20 m...@freebsd.org said the following:
>>
>> I tried making sched_pin() a real function which used
>> intr_disable/intr_restore around saving off td->td_oncpu,
>> td->td_lastcpu and ts->ts_cpu, and the stack at the time of call.  In
>> sched_switch when I saw an unexpected migration I printed all that
>> out.  I found that on my boxes, at sched_pin() time ts_cpu was already
>> different from td->td_oncpu, so the specific problem I was having was
>> that while another thread can change ts_cpu it has no way to force
>> that thread to immediately migrate.
>
> Like e.g. in sched_affinity where ts_cpu is first changed and then the old cpu
> is ipi-ed?

That could do it.  I didn't follow the stack of the places that were
touching ts_cpu for non-curthread.

>> I believe the below patch fixes the issue, though I'm open to other methods:
>>
>>
>> Index: kern/sched_ule.c
>> ===
>> --- kern/sched_ule.c  (.../head/src/sys)      (revision 158279)
>> +++ kern/sched_ule.c  (.../branches/BR_BUG_67957/src/sys)     (revision 
>> 158279)
>> @@ -112,6 +112,7 @@
>>  /* flags kept in ts_flags */
>>  #define      TSF_BOUND       0x0001          /* Thread can not migrate. */
>>  #define      TSF_XFERABLE    0x0002          /* Thread was added as 
>> transferable. */
>> +#define      TSF_BINDING     0x0004          /* Thread is being bound. */
>
> I don't really follow why TSF_BINDING is needed, i.e. why TSF_BOUND is not
> sufficient in this case?

I wanted to tighten up the asserts, so that the only time it was okay
for a td_pinned thread to migrate was the one time it was moving to
the target cpu.  Having a separate flag allows that.

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: sched_pin() bug in SCHED_ULE

2010-08-26 Thread mdf
On Thu, Aug 26, 2010 at 1:49 PM, John Baldwin  wrote:
> On Thursday, August 26, 2010 4:03:38 pm m...@freebsd.org wrote:
>> Back at the beginning of August I posted about issues with sched_pin()
>> and witness_warn():
>>
>> http://lists.freebsd.org/pipermail/freebsd-hackers/2010-August/032553.html
>>
>> After a lot of debugging I think I've basically found the issue.  I
>> found this bug on stable/7, but looking at the commit log for CURRENT
>> I don't see any reason the bug doesn't exist there.  This analysis is
>> specific to amd64/i386 but the problem is likely to exist on most
>> arches.
>>
>> The basic problem is that in both tdq_move() and sched_setcpu() can
>> manipulate the ts->ts_cpu variable of another process, which may be
>> running.  This means that a process running on CPU N may be set to run
>> on CPU M the next context switch.
>>
>> Then, that process may call sched_pin(), then grab a PCPU variable.
>> An IPI_PREEMPT can come in, causing the thread to call mi_switch() in
>> ipi_bitmap_handler().  sched_switch() will then notice that ts->ts_cpu
>> is not PCPU_GET(cpuid), and call sched_switch_migrate(), migrating the
>> thread to the intended CPU.  Thus after sched_pin() and potentially
>> before using any PCPU variable the process can get migrated to another
>> CPU, and now it is using a PCPU variable for the wrong processor.
>>
>> Given that ts_cpu can be set by other threads, it doesn't seem worth
>> checking at sched_pin time whether ts_cpu is not the same as td_oncpu,
>> because to make the check would require taking the thread_lock.
>> Instead, I propose adding a check for !THREAD_CAN_MIGRATE() before
>> calling sched_switch_migrate().  However, sched_pin() is also used by
>> sched_bind(9) to force the thread to stay on the intended cpu.  I
>> would handle this by adding a TSF_BINDING state that is different from
>> TSF_BOUND.
>>
>> The thing that has me wondering whether this is all correct is that I
>> don't see any checks in tdq_move() or sched_setcpu() for either
>> TSF_BOUND or THREAD_CAN_MIGRATE().  Perhaps that logic is managed in
>> the various calling functions; in that case I would propose adding
>> asserts that the thread is THREAD_CAN_MIGRATE() unless it's marked
>> TSF_BINDING.
>>
>> Does this analysis seem correct?
>
> Calling code does check THREAD_CAN_MIGRATE(), but the problem is that it is
> not safe to call that on anything but curthread as td_pinned is not locked.
> It is assumed that only curthread would ever check td_pinned, not other
> threads.  I suspect what is happening is that another CPU is seeing a stale
> value of td_pinned.  You could fix this by grabbing the thread lock in
> sched_pin()/unpin() for ULE, but that is a bit expensive perhaps.

I tried making sched_pin() a real function which used
intr_disable/intr_restore around saving off td->td_oncpu,
td->td_lastcpu and ts->ts_cpu, and the stack at the time of call.  In
sched_switch when I saw an unexpected migration I printed all that
out.  I found that on my boxes, at sched_pin() time ts_cpu was already
different from td->td_oncpu, so the specific problem I was having was
that while another thread can change ts_cpu it has no way to force
that thread to immediately migrate.

I believe the below patch fixes the issue, though I'm open to other methods:


Index: kern/sched_ule.c
===
--- kern/sched_ule.c(.../head/src/sys)  (revision 158279)
+++ kern/sched_ule.c(.../branches/BR_BUG_67957/src/sys) (revision 
158279)
@@ -112,6 +112,7 @@
 /* flags kept in ts_flags */
 #defineTSF_BOUND   0x0001  /* Thread can not migrate. */
 #defineTSF_XFERABLE0x0002  /* Thread was added as 
transferable. */
+#defineTSF_BINDING 0x0004  /* Thread is being bound. */

 static struct td_sched td_sched0;

@@ -1859,6 +1858,7 @@
struct mtx *mtx;
int srqflag;
int cpuid;
+   int do_switch;

THREAD_LOCK_ASSERT(td, MA_OWNED);

@@ -1888,10 +1888,21 @@
srqflag = (flags & SW_PREEMPT) ?
SRQ_OURSELF|SRQ_YIELDING|SRQ_PREEMPTED :
SRQ_OURSELF|SRQ_YIELDING;
-   if (ts->ts_cpu == cpuid)
+   /*
+* Allow the switch to another processor as requested
+* only if the thread can migrate or we are in the
+* middle of binding for sched_bind(9).  This keeps
+* sched_pin() quick, since other threads can
+* manipulate ts_cpu.
+*/
+   do_switch = (ts->ts_cpu != cpuid);
+   if (!THREAD_CAN_MIGRATE(td) &&
+   (ts->ts_flags & TSF_BINDING) == 0)
+   do_switch = 0;
+   if (do_switch)
+   mtx = sched_switch_migrate(tdq, td, srqflag);
+   else
tdq_add(tdq, td, srqflag);
-   else
-   

sched_pin() bug in SCHED_ULE

2010-08-26 Thread mdf
Back at the beginning of August I posted about issues with sched_pin()
and witness_warn():

http://lists.freebsd.org/pipermail/freebsd-hackers/2010-August/032553.html

After a lot of debugging I think I've basically found the issue.  I
found this bug on stable/7, but looking at the commit log for CURRENT
I don't see any reason the bug doesn't exist there.  This analysis is
specific to amd64/i386 but the problem is likely to exist on most
arches.

The basic problem is that in both tdq_move() and sched_setcpu() can
manipulate the ts->ts_cpu variable of another process, which may be
running.  This means that a process running on CPU N may be set to run
on CPU M the next context switch.

Then, that process may call sched_pin(), then grab a PCPU variable.
An IPI_PREEMPT can come in, causing the thread to call mi_switch() in
ipi_bitmap_handler().  sched_switch() will then notice that ts->ts_cpu
is not PCPU_GET(cpuid), and call sched_switch_migrate(), migrating the
thread to the intended CPU.  Thus after sched_pin() and potentially
before using any PCPU variable the process can get migrated to another
CPU, and now it is using a PCPU variable for the wrong processor.

Given that ts_cpu can be set by other threads, it doesn't seem worth
checking at sched_pin time whether ts_cpu is not the same as td_oncpu,
because to make the check would require taking the thread_lock.
Instead, I propose adding a check for !THREAD_CAN_MIGRATE() before
calling sched_switch_migrate().  However, sched_pin() is also used by
sched_bind(9) to force the thread to stay on the intended cpu.  I
would handle this by adding a TSF_BINDING state that is different from
TSF_BOUND.

The thing that has me wondering whether this is all correct is that I
don't see any checks in tdq_move() or sched_setcpu() for either
TSF_BOUND or THREAD_CAN_MIGRATE().  Perhaps that logic is managed in
the various calling functions; in that case I would propose adding
asserts that the thread is THREAD_CAN_MIGRATE() unless it's marked
TSF_BINDING.

Does this analysis seem correct?

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: [head tinderbox] failure on amd64/amd64

2010-08-12 Thread mdf
On Thu, Aug 12, 2010 at 1:51 PM,   wrote:
> The tinderbox break is my bad.  I will have it fixed by the end of the day.

Actually, the amd64 break is not me, because it broke before it got to
my mistake.  i386 and presumably all the other 32-bit builds are due
to my memguard(9) patch.

> :-(
> matthew
>
> On Thu, Aug 12, 2010 at 3:25 AM, FreeBSD Tinderbox
>  wrote:
>> TB --- 2010-08-12 01:30:00 - tinderbox 2.6 running on 
>> freebsd-current.sentex.ca
>> TB --- 2010-08-12 01:30:00 - starting HEAD tinderbox run for amd64/amd64
>> TB --- 2010-08-12 01:30:00 - cleaning the object tree
>> TB --- 2010-08-12 01:30:27 - cvsupping the source tree
>> TB --- 2010-08-12 01:30:27 - /usr/bin/csup -z -r 3 -g -L 1 -h 
>> cvsup.sentex.ca /tinderbox/HEAD/amd64/amd64/supfile
>> TB --- 2010-08-12 01:46:12 - building world
>> TB --- 2010-08-12 01:46:12 - MAKEOBJDIRPREFIX=/obj
>> TB --- 2010-08-12 01:46:12 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
>> TB --- 2010-08-12 01:46:12 - TARGET=amd64
>> TB --- 2010-08-12 01:46:12 - TARGET_ARCH=amd64
>> TB --- 2010-08-12 01:46:12 - TZ=UTC
>> TB --- 2010-08-12 01:46:12 - __MAKE_CONF=/dev/null
>> TB --- 2010-08-12 01:46:12 - cd /src
>> TB --- 2010-08-12 01:46:12 - /usr/bin/make -B buildworld
> World build started on Thu Aug 12 01:46:13 UTC 2010
> Rebuilding the temporary build tree
> stage 1.1: legacy release compatibility shims
> stage 1.2: bootstrap tools
> stage 2.1: cleaning up the object tree
> stage 2.2: rebuilding the object tree
> stage 2.3: build tools
> stage 3: cross tools
> stage 4.1: building includes
> stage 4.2: building libraries
> stage 4.3: make dependencies
> stage 4.4: building everything
>> [...]
>> cc -O2 -pipe  -DACPI_EXEC_APP -fno-strict-aliasing 
>> -I/src/usr.sbin/acpi/acpidb/../../../sys -std=gnu99 -fstack-protector 
>> -Wsystem-headers -Werror -Wall -Wno-format-y2k -Wno-uninitialized 
>> -Wno-pointer-sign -c 
>> /src/usr.sbin/acpi/acpidb/../../../sys/contrib/dev/acpica/utilities/utxface.c
>> cc -O2 -pipe  -DACPI_EXEC_APP -fno-strict-aliasing 
>> -I/src/usr.sbin/acpi/acpidb/../../../sys -std=gnu99 -fstack-protector 
>> -Wsystem-headers -Werror -Wall -Wno-format-y2k -Wno-uninitialized 
>> -Wno-pointer-sign  -o acpidb acpidb.o osunixxf.o dbcmds.o dbdisply.o 
>> dbexec.o dbfileio.o dbhistry.o dbinput.o dbstats.o dbutils.o dbxface.o 
>> dmbuffer.o dmnames.o dmobject.o dmopcode.o dmresrc.o dmresrcl.o dmresrcs.o 
>> dmutils.o dmwalk.o evevent.o evgpe.o evgpeblk.o evgpeinit.o evgpeutil.o 
>> evmisc.o evregion.o evrgnini.o evsci.o evxface.o evxfevnt.o evxfregn.o 
>> hwacpi.o hwgpe.o hwregs.o hwsleep.o hwvalid.o hwxface.o dsfield.o dsinit.o 
>> dsmethod.o dsmthdat.o dsobject.o dsopcode.o dsutils.o dswexec.o dswload.o 
>> dswscope.o dswstate.o exconfig.o exconvrt.o excreate.o exdebug.o exdump.o 
>> exfield.o exfldio.o exmisc.o exmutex.o exnames.o exoparg1.o exoparg2.o 
>> exoparg3.o exoparg6.o exprep.o exregion.o exresnte.o exresolv.o exresop.o 
>> exstore.o exstoren.o exstorob.o exsystem.o exutils.o psargs.o psloop.o psopc!
>>  ode.o psparse.o psscope.o pstree.o psutils.o pswalk.o psxface.o nsaccess.o 
>> nsalloc.o nsdump.o nseval.o nsinit.o nsload.o nsnames.o nsobject.o nsparse.o 
>> nspredef.o nsrepair.o nsrepair2.o nssearch.o nsutils.o nswalk.o nsxfeval.o 
>> nsxfname.o nsxfobj.o rsaddr.o rscalc.o rscreate.o rsdump.o rsinfo.o rsio.o 
>> rsirq.o rslist.o rsmemory.o rsmisc.o rsutils.o rsxface.o tbfadt.o tbfind.o 
>> tbinstal.o tbutils.o tbxface.o tbxfroot.o utalloc.o utcache.o utcopy.o 
>> utdebug.o utdelete.o uteval.o utglobal.o utids.o utinit.o utlock.o utmath.o 
>> utmisc.o utmutex.o utobject.o utosi.o utresrc.o utstate.o uttrack.o 
>> utxface.o -lpthread
>> gzip -cn /src/usr.sbin/acpi/acpidb/acpidb.8 > acpidb.8.gz
>> ===> usr.sbin/acpi/acpidump (all)
>> cc -O2 -pipe  -I/src/usr.sbin/acpi/acpidump/../../../sys -std=gnu99 
>> -fstack-protector -Wsystem-headers -Werror -Wall -Wno-format-y2k -W 
>> -Wno-unused-parameter -Wstrict-prototypes -Wmissing-prototypes 
>> -Wpointer-arith -Wreturn-type -Wcast-qual -Wwrite-strings -Wswitch -Wshadow 
>> -Wunused-parameter -Wcast-align -Wchar-subscripts -Winline -Wnested-externs 
>> -Wredundant-decls -Wold-style-definition -Wno-pointer-sign -c 
>> /src/usr.sbin/acpi/acpidump/acpi.c
>> cc1: warnings being treated as errors
>> /src/usr.sbin/acpi/acpidump/acpi.c: In function 'acpi_handle_tcpa':
>> /src/usr.sbin/acpi/acpidump/acpi.c:650: warning: format '%lld' expects type 
>> 'long long int', but argument 4 has type 'u_int64_t'
>> *** Error code 1
>>
>> Stop in /src/usr.sbin/acpi/acpidump.
>> *** Error code 1
>>
>> Stop in /src/usr.sbin/acpi.
>> *** Error code 1
>>
>> Stop in /src/usr.sbin.
>> *** Error code 1
>>
>> Stop in /src.
>> *** Error code 1
>>
>> Stop in /src.
>> *** Error code 1
>>
>> Stop in /src.
>> TB --- 2010-08-12 03:25:04 - WARNING: /usr/bin/make returned exit code  1
>> TB --- 2010-08-12 03:25:04 - ERROR: failed to build world
>> TB --- 2010-08-12 03:25:04 - 4691.49 user 830.9

Re: [head tinderbox] failure on amd64/amd64

2010-08-12 Thread mdf
The tinderbox break is my bad.  I will have it fixed by the end of the day.

:-(
matthew

On Thu, Aug 12, 2010 at 3:25 AM, FreeBSD Tinderbox
 wrote:
> TB --- 2010-08-12 01:30:00 - tinderbox 2.6 running on 
> freebsd-current.sentex.ca
> TB --- 2010-08-12 01:30:00 - starting HEAD tinderbox run for amd64/amd64
> TB --- 2010-08-12 01:30:00 - cleaning the object tree
> TB --- 2010-08-12 01:30:27 - cvsupping the source tree
> TB --- 2010-08-12 01:30:27 - /usr/bin/csup -z -r 3 -g -L 1 -h cvsup.sentex.ca 
> /tinderbox/HEAD/amd64/amd64/supfile
> TB --- 2010-08-12 01:46:12 - building world
> TB --- 2010-08-12 01:46:12 - MAKEOBJDIRPREFIX=/obj
> TB --- 2010-08-12 01:46:12 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
> TB --- 2010-08-12 01:46:12 - TARGET=amd64
> TB --- 2010-08-12 01:46:12 - TARGET_ARCH=amd64
> TB --- 2010-08-12 01:46:12 - TZ=UTC
> TB --- 2010-08-12 01:46:12 - __MAKE_CONF=/dev/null
> TB --- 2010-08-12 01:46:12 - cd /src
> TB --- 2010-08-12 01:46:12 - /usr/bin/make -B buildworld
 World build started on Thu Aug 12 01:46:13 UTC 2010
 Rebuilding the temporary build tree
 stage 1.1: legacy release compatibility shims
 stage 1.2: bootstrap tools
 stage 2.1: cleaning up the object tree
 stage 2.2: rebuilding the object tree
 stage 2.3: build tools
 stage 3: cross tools
 stage 4.1: building includes
 stage 4.2: building libraries
 stage 4.3: make dependencies
 stage 4.4: building everything
> [...]
> cc -O2 -pipe  -DACPI_EXEC_APP -fno-strict-aliasing 
> -I/src/usr.sbin/acpi/acpidb/../../../sys -std=gnu99 -fstack-protector 
> -Wsystem-headers -Werror -Wall -Wno-format-y2k -Wno-uninitialized 
> -Wno-pointer-sign -c 
> /src/usr.sbin/acpi/acpidb/../../../sys/contrib/dev/acpica/utilities/utxface.c
> cc -O2 -pipe  -DACPI_EXEC_APP -fno-strict-aliasing 
> -I/src/usr.sbin/acpi/acpidb/../../../sys -std=gnu99 -fstack-protector 
> -Wsystem-headers -Werror -Wall -Wno-format-y2k -Wno-uninitialized 
> -Wno-pointer-sign  -o acpidb acpidb.o osunixxf.o dbcmds.o dbdisply.o dbexec.o 
> dbfileio.o dbhistry.o dbinput.o dbstats.o dbutils.o dbxface.o dmbuffer.o 
> dmnames.o dmobject.o dmopcode.o dmresrc.o dmresrcl.o dmresrcs.o dmutils.o 
> dmwalk.o evevent.o evgpe.o evgpeblk.o evgpeinit.o evgpeutil.o evmisc.o 
> evregion.o evrgnini.o evsci.o evxface.o evxfevnt.o evxfregn.o hwacpi.o 
> hwgpe.o hwregs.o hwsleep.o hwvalid.o hwxface.o dsfield.o dsinit.o dsmethod.o 
> dsmthdat.o dsobject.o dsopcode.o dsutils.o dswexec.o dswload.o dswscope.o 
> dswstate.o exconfig.o exconvrt.o excreate.o exdebug.o exdump.o exfield.o 
> exfldio.o exmisc.o exmutex.o exnames.o exoparg1.o exoparg2.o exoparg3.o 
> exoparg6.o exprep.o exregion.o exresnte.o exresolv.o exresop.o exstore.o 
> exstoren.o exstorob.o exsystem.o exutils.o psargs.o psloop.o psopc!
>  ode.o psparse.o psscope.o pstree.o psutils.o pswalk.o psxface.o nsaccess.o 
> nsalloc.o nsdump.o nseval.o nsinit.o nsload.o nsnames.o nsobject.o nsparse.o 
> nspredef.o nsrepair.o nsrepair2.o nssearch.o nsutils.o nswalk.o nsxfeval.o 
> nsxfname.o nsxfobj.o rsaddr.o rscalc.o rscreate.o rsdump.o rsinfo.o rsio.o 
> rsirq.o rslist.o rsmemory.o rsmisc.o rsutils.o rsxface.o tbfadt.o tbfind.o 
> tbinstal.o tbutils.o tbxface.o tbxfroot.o utalloc.o utcache.o utcopy.o 
> utdebug.o utdelete.o uteval.o utglobal.o utids.o utinit.o utlock.o utmath.o 
> utmisc.o utmutex.o utobject.o utosi.o utresrc.o utstate.o uttrack.o utxface.o 
> -lpthread
> gzip -cn /src/usr.sbin/acpi/acpidb/acpidb.8 > acpidb.8.gz
> ===> usr.sbin/acpi/acpidump (all)
> cc -O2 -pipe  -I/src/usr.sbin/acpi/acpidump/../../../sys -std=gnu99 
> -fstack-protector -Wsystem-headers -Werror -Wall -Wno-format-y2k -W 
> -Wno-unused-parameter -Wstrict-prototypes -Wmissing-prototypes 
> -Wpointer-arith -Wreturn-type -Wcast-qual -Wwrite-strings -Wswitch -Wshadow 
> -Wunused-parameter -Wcast-align -Wchar-subscripts -Winline -Wnested-externs 
> -Wredundant-decls -Wold-style-definition -Wno-pointer-sign -c 
> /src/usr.sbin/acpi/acpidump/acpi.c
> cc1: warnings being treated as errors
> /src/usr.sbin/acpi/acpidump/acpi.c: In function 'acpi_handle_tcpa':
> /src/usr.sbin/acpi/acpidump/acpi.c:650: warning: format '%lld' expects type 
> 'long long int', but argument 4 has type 'u_int64_t'
> *** Error code 1
>
> Stop in /src/usr.sbin/acpi/acpidump.
> *** Error code 1
>
> Stop in /src/usr.sbin/acpi.
> *** Error code 1
>
> Stop in /src/usr.sbin.
> *** Error code 1
>
> Stop in /src.
> *** Error code 1
>
> Stop in /src.
> *** Error code 1
>
> Stop in /src.
> TB --- 2010-08-12 03:25:04 - WARNING: /usr/bin/make returned exit code  1
> TB --- 2010-08-12 03:25:04 - ERROR: failed to build world
> TB --- 2010-08-12 03:25:04 - 4691.49 user 830.94 system 6903.97 real
>
>
> http://tinderbox.freebsd.org/tinderbox-head-HEAD-amd64-amd64.full
> ___
> freebsd-current@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscr...

Re: Panic booting vmware i386 after SRAT update

2010-07-29 Thread mdf
On Thu, Jul 29, 2010 at 7:18 AM, John Baldwin  wrote:
> On Wednesday, July 28, 2010 1:37:42 pm m...@freebsd.org wrote:
>> I have a 2 cpu virtual image of FreeBSD current.  It panics during
>> boot after building in the NUMA support.
>>
>> I'll transcribe the SRAT bootverbose messages and panic message as best I 
>> can.
>>
>> Table 'SRAT' at 0xfef07f6
>> SRAT: Found table at 0xfef07f6
>> SRAT: Found memory domain 0 addr 0 len a: enabled
>> SRAT: Found memory domain 0 addr 10 len ff0: enabled
>>
>> then some MADT: messages about finding cpu 0 and 1
>>
>>  cpu0 (BSP): APIC ID:  0
>>  cpu1 (AP): APIC ID:  1
>> panic: SRAT: CPU with APIC ID 0 is not known
>>
>> I'm playing around now with trying to figure out what went missing,
>> but I thought I'd send this out now.
>
> Hmm, check_domains() in srat.c should reject the SRAT table in this case.
> Oh, I see the problem, try this:
>
> Index: srat.c
> ===
> --- srat.c      (revision 210552)
> +++ srat.c      (working copy)
> @@ -150,7 +150,8 @@
>        for (i = 0; i < num_mem; i++) {
>                found = 0;
>                for (j = 0; j <= MAX_APIC_ID; j++)
> -                       if (cpus[j].domain == mem_info[i].domain) {
> +                       if (cpus[j].enabled &&
> +                           cpus[j].domain == mem_info[i].domain) {
>                                cpus[j].has_memory = 1;
>                                found++;
>                        }

This appears to fix the issue for me.

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Panic booting vmware i386 after SRAT update

2010-07-28 Thread mdf
On Wed, Jul 28, 2010 at 11:19 AM, David Cornejo  wrote:
> On Wed, Jul 28, 2010 at 7:37 AM,  wrote:
>>
>> I have a 2 cpu virtual image of FreeBSD current.  It panics during
>> boot after building in the NUMA support.
>>
>> I'll transcribe the SRAT bootverbose messages and panic message as best I
>> can.
>>
>> Table 'SRAT' at 0xfef07f6
>> SRAT: Found table at 0xfef07f6
>> SRAT: Found memory domain 0 addr 0 len a: enabled
>> SRAT: Found memory domain 0 addr 10 len ff0: enabled
>>
>> then some MADT: messages about finding cpu 0 and 1
>>
>>  cpu0 (BSP): APIC ID:  0
>>  cpu1 (AP): APIC ID:  1
>> panic: SRAT: CPU with APIC ID 0 is not known
>>
>> I'm playing around now with trying to figure out what went missing,
>> but I thought I'd send this out now.
>
> I have been seeing this since yesterday with amd64 custom kernels - just
> compiled a GENERIC kernel a few minutes ago and it shows the same symptom.

Is this on VMWare or physical hardware?  I isolated my issue to VMWare
reporting memory domains properties but not CPU affinity.  Setting
hint.srat.0.disabled="1" in /boot/device.hints fixes the issue for me.

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Panic booting vmware i386 after SRAT update

2010-07-28 Thread mdf
On Wed, Jul 28, 2010 at 10:37 AM,   wrote:
> I have a 2 cpu virtual image of FreeBSD current.  It panics during
> boot after building in the NUMA support.
>
> I'll transcribe the SRAT bootverbose messages and panic message as best I can.
>
> Table 'SRAT' at 0xfef07f6
> SRAT: Found table at 0xfef07f6
> SRAT: Found memory domain 0 addr 0 len a: enabled
> SRAT: Found memory domain 0 addr 10 len ff0: enabled
>
> then some MADT: messages about finding cpu 0 and 1
>
>  cpu0 (BSP): APIC ID:  0
>  cpu1 (AP): APIC ID:  1
> panic: SRAT: CPU with APIC ID 0 is not known
>
> I'm playing around now with trying to figure out what went missing,
> but I thought I'd send this out now.

Okay, apparently VMWare is providing two entries of type
ACPI_SRAT_TYPE_MEMORY_AFFINITY but no entries of type
ACPI_SRAT_TYPE_CPU_AFFINITY.  This leads to the assert since no CPUs
are "enabled"; that is there's no affinity information for them.  This
is probably a VMWare bug.

Setting hint.srat.0.disabled="1" in /boot/device.hints works around the issue.

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Panic booting vmware i386 after SRAT update

2010-07-28 Thread mdf
I have a 2 cpu virtual image of FreeBSD current.  It panics during
boot after building in the NUMA support.

I'll transcribe the SRAT bootverbose messages and panic message as best I can.

Table 'SRAT' at 0xfef07f6
SRAT: Found table at 0xfef07f6
SRAT: Found memory domain 0 addr 0 len a: enabled
SRAT: Found memory domain 0 addr 10 len ff0: enabled

then some MADT: messages about finding cpu 0 and 1

 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
panic: SRAT: CPU with APIC ID 0 is not known

I'm playing around now with trying to figure out what went missing,
but I thought I'd send this out now.

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: strange scsi/CAM related dmesg output

2010-06-18 Thread mdf
On Fri, Jun 18, 2010 at 8:11 AM, Alexander Best
 wrote:
> On Mon, Jun 7, 2010 at 3:57 PM, John Baldwin  wrote:
>> It can happen because the print buffer size thing is not line-buffered, it is
>> printf-invocation buffered.
>
> hmmm...can this somehow be fixed? i'm not sure this is specific to
> scsi/cam. the other day i bootd my system and almost all of the dmesg
> output was displayed incorrectly. would increasing PRINTF_BUFR_SIZE
> from 128 to lets say 512 or 1024 solve the issue?

I think what jhb meant was that we could look for the '\n' and flush
to console there, instead of waiting for the end of the buffer.  This
would perhaps have more interleaved full lines, but likely fewer
interleaved partial lines.

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"