Re: mounting root from NFS via ROOTDEVNAME

2013-01-30 Thread Ian Lepore
On Wed, 2013-01-30 at 09:32 +0000, Eggert, Lars wrote:
> Hi,
> 
> On Jan 29, 2013, at 20:22, Craig Rodrigues wrote:
> > What kind of architecture are you trying to do this on?  Is this
> > i386/amd64 or something else?
> 
> amd64
> 
> >  I am not familiar with netboot compared to
> > PXE.  Is TFTP involved at all with netboot?
> 
> TFTP is not involved. The kernel gets booted by our custom loader (over HTTP) 
> and the root FS is supposed to be mounted over NFS.
> 
> > What does your dhcpd configuration file look like?
> 
> Completely standard, with the addition of a "root-path" option. (Which I 
> would like to get rid of by setting ROOTDEVNAME in the kernel.)
> 
> > Also, are you using the FreeBSD loader, or something else?  What kinds of
> > customizations have you done on the loader?
> 
> Custom loader. 
> 
> > If through your setup you have already managed to load the kernel over
> > the network, then a lot of the hard work has been done.  Telling the kernel
> > where the root file system is located becomes the next tricky part.
> 
> Right, that's the step I am struggling with. 
> 
> > In src/sys/boot/common/boot.c which is part of the loader (not the kernel),
> > if you look in the getrootmount() function,
> > you will see that the loader will try to figure out where the root file
> > system
> > is by parsing /etc/fstab, and looking for the "/" mount.
> > 
> > So, if your kernel is located in:
> > 
> >   /usr/home/elars/dst/boot/kernel/kernel
> > 
> > Then create a /usr/home/elars/dst/etc/fstab file with something like:
> > 
> > # Device                          Mountpoint  FSType  Options  Dump  Pass
> > 10.11.12.13:/usr/home/elars/dst/  /           nfs     ro       0     0
> 
> Thanks, will try that!
> 
> > Alternatively, if you don't want to create an /etc/fstab file, then
> > you could put something like this in your loader.conf file:
> > 
> > vfs.root.mountfrom=nfs:10.11.12.13:/usr/home/elars/dst
> 
> Will try that too, but not sure if this works with our custom loader.
> 
> Lars
> 
> > 
> > If you can get this to work without introducing new kernel options,
> > that would be ideal, because the kernel options you are
> > enabling are triggering behaviors.
> > 

Just FYI, I believe the current behavior of BOOTP and BOOTP_NFSROOT is a
bug, and I've entered a PR for it:

  http://www.freebsd.org/cgi/query-pr.cgi?pr=175671

I also put a little effort into changing the behavior so that BOOTP
without BOOTP_NFSROOT gets you an address and then moves on to use the
ROOTDEVNAME you have configured, but I didn't have any success yet (it
stays stuck in the state of waiting for the root path).  I intend to get
back to it after wrapping up some other work, if someone else doesn't
get to it first.
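
For reference, the ROOTDEVNAME and vfs.root.mountfrom values used in this
thread have the form fstype:host:/path.  The following is only a userland
sketch of splitting such a string; parse_rootdev() and struct rootdev are
made-up names for illustration, not kernel interfaces, and the sketch
handles only the two-colon NFS form with a host that contains no colons.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical holder for the three parts of "fstype:host:/path". */
struct rootdev {
	char fstype[16];
	char host[64];
	char path[256];
};

/*
 * Split a string such as "nfs:10.11.12.13:/usr/home/elars/dst".
 * Returns 0 on success, -1 if a colon is missing or a part is empty
 * or does not fit.  Illustrative helper, not a kernel function.
 */
int
parse_rootdev(const char *spec, struct rootdev *out)
{
	const char *c1, *c2;
	size_t n;

	assert(spec != NULL && out != NULL);
	c1 = strchr(spec, ':');
	if (c1 == NULL)
		return (-1);
	c2 = strchr(c1 + 1, ':');
	if (c2 == NULL)
		return (-1);

	n = (size_t)(c1 - spec);		/* length of fstype */
	if (n == 0 || n >= sizeof(out->fstype))
		return (-1);
	memcpy(out->fstype, spec, n);
	out->fstype[n] = '\0';

	n = (size_t)(c2 - c1 - 1);		/* length of host */
	if (n == 0 || n >= sizeof(out->host))
		return (-1);
	memcpy(out->host, c1 + 1, n);
	out->host[n] = '\0';

	n = strlen(c2 + 1);			/* length of path */
	if (n == 0 || n >= sizeof(out->path))
		return (-1);
	memcpy(out->path, c2 + 1, n + 1);	/* copies the NUL too */
	return (0);
}
```

Called on "nfs:10.11.12.13:/usr/home/elars/dst" it fills in "nfs",
"10.11.12.13" and "/usr/home/elars/dst".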

-- Ian


___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: mounting root from NFS via ROOTDEVNAME

2013-01-29 Thread Ian Lepore
On Tue, 2013-01-29 at 09:17 +0000, Eggert, Lars wrote:
> On Jan 29, 2013, at 10:13, Lars Eggert wrote:
> > On Jan 29, 2013, at 9:34, Craig Rodrigues wrote:
> >> I recommend that you do not use ROOTDEVNAME, and instead
> >> you should follow the instructions which I wrote and contributed to the
> >> FreeBSD handbook:
> >> 
> >> "PXE Booting with an NFS Root File System"
> >> 
> >> http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/network-pxe-nfs.html
> >> 
> >> The content of this document is the same as the text file which Rick
> >> Macklem pointed out (I wrote that too).
> > 
> > I had read both before, and they're very useful documents. Unfortunately, 
> > they don't fully apply to my case, since I'm not PXE-booting the system; it 
> > netboots the kernel from a custom loader. So once the kernel bootstraps, I 
> > need it to obtain an IP address and then NFS-mount root.
> 
> (Whoops, hit send by mistake.)
> 
> That's what I was trying to achieve with the BOOTP and BOOTP_WIRED_TO options.
> 
> Hm, I wonder if I could simply use the custom loader to netboot tftpboot, and 
> then follow your instructions... Will try.

I think that's what I used to do before I switched to configuring the
boot file and root path via dhcp as well.  I could've sworn I used BOOTP
without BOOTP_NFSROOT, but perhaps that's just my muddled memory of what
I tried to do that never worked out.

I also think all of this is a bug.  It seems to me that BOOTP without
BOOTP_NFSROOT should obtain ip-related info from dhcp but use
ROOTDEVNAME as configured, perhaps with any dhcp-provided root path as a
fallback if there's a problem or ROOTDEVNAME is unconfigured.
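
To make the proposed precedence concrete, here is a minimal sketch,
assuming the two candidate strings are already in hand.  choose_root() is
a hypothetical name, not something in the tree, and the "if there's a
problem" case (falling back after a failed mount) is not modeled.

```c
#include <stddef.h>

/*
 * Model of the proposed policy: prefer the compiled-in ROOTDEVNAME,
 * fall back to a DHCP-supplied root path, else return NULL.
 * Empty strings count as "not configured".  Names are illustrative.
 */
const char *
choose_root(const char *rootdevname, const char *dhcp_root_path)
{
	if (rootdevname != NULL && rootdevname[0] != '\0')
		return (rootdevname);
	if (dhcp_root_path != NULL && dhcp_root_path[0] != '\0')
		return (dhcp_root_path);
	return (NULL);
}
```

With both strings set, the ROOTDEVNAME value wins; with ROOTDEVNAME empty
or NULL, the DHCP path is used.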

-- Ian




Re: mounting root from NFS via ROOTDEVNAME

2013-01-28 Thread Ian Lepore
On Mon, 2013-01-28 at 15:13 +0000, Eggert, Lars wrote:
> Hi,
> 
> I'm trying to netboot a system where the root device is specified in the 
> kernel via ROOTDEVNAME:
> 
> options BOOTP
> options BOOTP_NFSROOT
> options BOOTP_NFSV3
> options BOOTP_COMPAT
> options BOOTP_WIRED_TO=em4
> options ROOTDEVNAME=\"nfs:10.11.12.13:/usr/home/elars/dst\"
> 
> I was under the assumption that specifying a ROOTDEVNAME in the kernel config 
> would override the "root-path" option in DHCP, or at least take effect when 
> "root-path" wasn't provided via DHCP, but that doesn't seem to be the case. 
> The system configures its address correctly over em4, but then enters a loop:
> 
> em4: link state changed to UP
> Received DHCP Offer packet on em4 from 0.0.0.0 (accepted) (no root path)
> Sending DHCP Request packet from interface em4 (XX:XX:XX:XX:XX:XX)
> Received DHCP Ack packet on em4 from 0.0.0.0 (accepted) (no root path)
> Received DHCP Ack packet on em4 from 0.0.0.0 (accepted) (no root path)
> DHCP/BOOTP timeout for server 255.255.255.255
> Received DHCP Ack packet on em4 from 0.0.0.0 (accepted) (no root path)
> DHCP/BOOTP timeout for server 255.255.255.255
> ...
> 
> If I hand out a root path via DHCP the system boots fine, but the idea here 
> is to be able to boot different root devices without needing to diddle 
> dhcpd.conf. Can this be done?

Remove the BOOTP_NFSROOT option; it tells the bootp/dhcp code to keep
querying the server until a root path is delivered.  Without it, the
ROOTDEVNAME option should get used (and I think even override a path
from the server, if it delivers one).

-- Ian




Re: Trouble with recent auto-tuning changes

2013-01-28 Thread Ian Lepore
On Mon, 2013-01-28 at 00:09 -0600, Alan Cox wrote:
> On Sun, Jan 27, 2013 at 12:11 PM, Ian Lepore wrote:
> 
> > I ran into a panic while attempting to un-tar a large file on a
> > DreamPlug (arm-based system) running -current.  The source and dest of
> > the un-tar is the root filesystem on sdcard, and I get this:
> >
> > panic: kmem_malloc(4096): kmem_map too small: 12582912 total allocated
> >
> > Just before the panic I see the tar process get hung in a "nokva" wait.
> > 12582912 is the value of VM_KMEM_SIZE from arm/include/vmparam.h.
> >
> > In r245575 the init order for mbuf limits was changed from
> > SI_SUB_TUNABLES to SI_SUB_KMEM so that mbuf limits could be based on the
> > results of sizing kernel memory.  Unfortunately, the process of sizing
> > kernel memory relies on the mbuf limits; in kmeminit():
> >
> > vm_kmem_size = VM_KMEM_SIZE + nmbclusters * PAGE_SIZE;
> >
> > Since r245575, nmbclusters is zero when this line of code runs.  If I
> > manually plug in "32768" (the number tunable_mbinit() comes up with for
> > this platform) in that line, the panic stops happening.
> >
> > So we've got two problems here... one is the circular dependency in
> > calculating the mbuf limits.  The other is the fact that some
> > non-trivial amount of kernel memory we're allowing for mbufs is actually
> > being used for other things.  That is, if my system was actually using
> > all the mbufs that tunable_mbinit() allowed for, then this panic while
> > untarring a huge file would still have happened.
> >
> >
> All of this is factually correct.  However, it's a red herring.  The real
> problem is that arm, unlike every other architecture in the tree, does not
> enable auto-sizing of the kmem map based on the physical memory size.
> Specifically, you'll find VM_KMEM_SIZE_SCALE defined in
> "arch"/include/vmparam.h on every other architecture, just not on arm.
> This auto-sizing overrides the value of VM_KMEM_SIZE.
> 

Aha.  I'll investigate what other architectures do with that and try to
get the same thing going for arm.
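
As a rough userland sketch of what scale-based sizing could look like:
this is one plausible reading of "auto-sizing overrides VM_KMEM_SIZE",
not the actual kmeminit() code.  kmem_size_scaled() and its parameters
are hypothetical, and no particular scale value for arm is implied.

```c
#include <stdint.h>

/*
 * Start from the fixed floor (the VM_KMEM_SIZE-style constant) and,
 * if physical memory divided by the scale factor is larger, use that
 * instead.  A scale of 0 models "auto-sizing not enabled".
 * All names and values here are placeholders.
 */
uint64_t
kmem_size_scaled(uint64_t floor_bytes, uint64_t phys_bytes, unsigned scale)
{
	uint64_t scaled;

	if (scale == 0)
		return (floor_bytes);
	scaled = phys_bytes / scale;
	return (scaled > floor_bytes ? scaled : floor_bytes);
}
```

With a 12582912-byte floor, a made-up 512 MiB of RAM and a made-up scale
of 4, the result is 134217728 bytes instead of the fixed 12582912.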

-- Ian

> 
> 
> > I arrive at the latter conclusion based on the fact that this panic
> > happens even if no network interfaces (other than lo0) are configured.
> > That is, nmbclusters == 0 is a reasonable approximation of my need for
> > network mbufs.  So something in the system needs to be taken into
> > account when sizing kernel memory to allow for whatever it is about
> > untarring a huge file that eats kernel memory (buffer cache?).
> >
> > I can easily reproduce this if you need me to gather any specific info.
> >
> > -- Ian




Trouble with recent auto-tuning changes

2013-01-27 Thread Ian Lepore
I ran into a panic while attempting to un-tar a large file on a
DreamPlug (arm-based system) running -current.  The source and dest of
the un-tar is the root filesystem on sdcard, and I get this:

panic: kmem_malloc(4096): kmem_map too small: 12582912 total allocated

Just before the panic I see the tar process get hung in a "nokva" wait.
12582912 is the value of VM_KMEM_SIZE from arm/include/vmparam.h.

In r245575 the init order for mbuf limits was changed from
SI_SUB_TUNABLES to SI_SUB_KMEM so that mbuf limits could be based on the
results of sizing kernel memory.  Unfortunately, the process of sizing
kernel memory relies on the mbuf limits; in kmeminit():

vm_kmem_size = VM_KMEM_SIZE + nmbclusters * PAGE_SIZE;

Since r245575, nmbclusters is zero when this line of code runs.  If I
manually plug in "32768" (the number tunable_mbinit() comes up with for
this platform) in that line, the panic stops happening.
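
The arithmetic behind those two cases, as a small standalone sketch.  The
4096-byte page size is an assumption taken from the kmem_malloc(4096) in
the panic message, and kmem_size_for() is just an illustrative name.

```c
#include <stdint.h>

#define VM_KMEM_SIZE_ARM	12582912ULL	/* value quoted in the panic */
#define PAGE_SIZE_ASSUMED	4096ULL		/* assumption: 4 KiB pages */

/* Mirrors the quoted kmeminit() expression for a given nmbclusters. */
uint64_t
kmem_size_for(uint64_t nmbclusters)
{
	return (VM_KMEM_SIZE_ARM + nmbclusters * PAGE_SIZE_ASSUMED);
}
```

kmem_size_for(0) is 12582912 (12 MiB), the figure in the panic, while
kmem_size_for(32768) is 146800640 (140 MiB).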

So we've got two problems here... one is the circular dependency in
calculating the mbuf limits.  The other is the fact that some
non-trivial amount of kernel memory we're allowing for mbufs is actually
being used for other things.  That is, if my system was actually using
all the mbufs that tunable_mbinit() allowed for, then this panic while
untarring a huge file would still have happened.

I arrive at the latter conclusion based on the fact that this panic
happens even if no network interfaces (other than lo0) are configured.
That is, nmbclusters == 0 is a reasonable approximation of my need for
network mbufs.  So something in the system needs to be taken into
account when sizing kernel memory to allow for whatever it is about
untarring a huge file that eats kernel memory (buffer cache?).

I can easily reproduce this if you need me to gather any specific info.

-- Ian




Re: [RFC/RFT] calloutng

2013-01-17 Thread Ian Lepore
On Sun, 2012-12-30 at 16:13 -0700, Ian Lepore wrote:
> On Wed, 2012-12-26 at 21:24 +0200, Alexander Motin wrote:
> >[...]
> > 
> 
> I grabbed testsleep.c to test an arm event timer implementation, and had
> to fix a couple nits... kqueueto was missing from the names[] array, and
> I had to add a "* 1000" to a couple places where usec was stuffed into a
> timespec's tv_nsec.
> 
> I also tested the calloutng_12_17 patches and the kqueue stuff behaved
> very strangely.  Then I noticed you had a 12_26 patchset so I tested
> that (after crudely fixing a couple uninitialized var warnings), and it
> all looks good on this arm (Raspberry Pi).  I'll attach the results.
> 
> It's so sweet to be able to do precision sleeps.
> 
> -- Ian
> 
> 
> plain text document attachment (calloutng_test.txt)
> for t in 1 300 3000 30000 300000 ; do
>   for m in select poll usleep nanosleep kqueue kqueueto syscall ; do
>     ./testsleep $t $m
>   done
> done
> 
> 
> With calloutng_12_26.patch...
> 
>                      HZ=100             HZ=250            HZ=1000
> method     req us    avg us   req us    avg us   req us    avg us
>           -----------------  -----------------  -----------------
> select          1     55.79        1     50.96        1     61.32
> poll            1   1109.46        1   1107.86        1   1114.38
> usleep          1     56.33        1     72.90        1     62.78
> nanosleep       1     52.66        1     55.23        1     64.23
> kqueue          1   1114.23        1   1113.81        1   1121.21
> kqueueto        1     65.44        1     71.00        1     75.01
> syscall         1      4.70        1      4.45        1      4.55
> select        300    355.79      300    357.76      300    362.35
> poll          300   1107.85      300   1122.55      300   1115.62
> usleep        300    355.28      300    357.28      300    360.79
> nanosleep     300    354.49      300    355.82      300    360.62
> kqueue        300   1112.57      300   1118.13      300   1117.16
> kqueueto      300    375.98      300    378.62      300    395.61
> syscall       300      4.41      300      4.45      300      4.54
> select       3000   3246.75     3000   3246.74     3000   3252.72
> poll         3000   3238.10     3000   3229.12     3000   3250.10
> usleep       3000   3242.47     3000   3237.06     3000   3249.61
> nanosleep    3000   3238.79     3000   3231.55     3000   3248.11
> kqueue       3000   3240.01     3000   3236.07     3000   3247.60
> kqueueto     3000   3265.36     3000   3267.22     3000   3274.96
> syscall      3000      4.69     3000      4.44     3000      4.50
> select      30000  31714.60    30000  31941.17    30000  32467.69
> poll        30000  31522.76    30000  31983.00    30000  32497.81
> usleep      30000  31459.67    30000  31980.76    30000  32458.71
> nanosleep   30000  31431.02    30000  31982.22    30000  32525.20
> kqueue      30000  31466.75    30000  31873.90    30000  31973.54
> kqueueto    30000  31564.67    30000  32522.35    30000  32475.59
> syscall     30000      4.70    30000      4.73    30000      4.89
> select     300000 319133.02   300000 311562.33   300000 309918.62
> poll       300000 319604.27   300000 311422.94   300000 310000.76
> usleep     300000 319314.60   300000 311269.69   300000 309996.34
> nanosleep  300000 319497.58   300000 311425.40   300000 309997.13
> kqueue     300000 309995.55   300000 303980.27   300000 309908.82
> kqueueto   300000 319505.88   300000 311424.97   300000 309996.16
> syscall    300000      4.41   300000      4.45   300000      4.89
>
> 
> With no patches...
> 
>                      HZ=100             HZ=250            HZ=1000
> method     req us    avg us   req us    avg us   req us    avg us
>           -----------------  -----------------  -----------------
> select          1  19941.70        1   7989.10        1   1999.16
> poll            1  19904.61        1   7987.32        1   1999.78
> usleep          1  19904.95        1   7993.30        1   1999.96
> nanosleep       1  19905.64        1   7993.71        1   1999.72
> kqueue          1  10001.61        1   4004.00        1   1000.27
> kqueueto        1  19904.00        1   7993.03        1   1999.54
> syscall         1      4.04        1      4.05        1      4.75
> select        300  19904.66      300   7998.39      300   2000.27
> poll          300  19904.35      300   7993.47      300   1999.86
> usleep        300  19903.96      300   7994.11      300   1999.81
> nanosleep     300  19904.48      300   7993.77      300   1999.80
> kqueue        300  10001.68      300   4004.18      300   1000.31
> kqueueto

Re: [RFC/RFT] calloutng

2013-01-17 Thread Ian Lepore
On Mon, 2013-01-14 at 11:38 +1100, Bruce Evans wrote:
> On Sun, 13 Jan 2013, Alexander Motin wrote:
> 
> > On 13.01.2013 20:09, Marius Strobl wrote:
> >> On Tue, Jan 08, 2013 at 12:46:57PM +0200, Alexander Motin wrote:
[...]
> >
> > In existing code in HEAD and 9 timecounters are never called with spin
> > mutex held.  I intentionally tried to avoid that in existing eventtimers
> > code.
> 
> Er, timecounters are called with a spin mutex held in existing code:
> though it is dangerous to do so, timecounters are called from fast
> interrupt handlers for very timekeeping-critical purposes:
> - to implement the TIOCTIMESTAMP ioctl (except this is broken in
>   -current).  This was a primitive version of pps timestamping.
> - for pps timestamping.  The interrupt handler (which should be a fast
>   interrupt handler to minimize latency) calls pps_capture() which
>   calls tc_get_timecount() and does other "lock-free" accesses to the
>   timecounter state.  This still works in -current (at least there is
>   still code for it).
> 

Unfortunately, calling pps_capture() in the primary interrupt context is
no longer an option with the stock pps driver.  Ever since the ppbus
rewrite all ppbus children must use threaded handlers.  I tried to fix
that a couple different ways, and both ended up with crazy-complex code
scattered around the ppbus family just to support the rarely-used pps
capture.  It would have been easier to do if filter and threaded
interrupt handlers had the same function signature.

I ended up writing a separate driver that can be used instead of ppc +
ppbus + pps, since anyone who cares about precise pps capture is
unlikely to be sharing the port with a printer or plip device or some
such.

>   OTOH, all drivers that call pps_capture() from their interrupt handler
>   then immediately call pps_event().  This has always been very broken,
>   and became even more broken with SMPng.  pps_event() does many more
>   timecounter and pps accesses whose locking is unclear at best, and
>   in some configurations it calls hardpps(), which is only locked by
>   Giant, despite comments in kern_ntptime.c still saying that it (and
>   many other functions in kern_ntptime.c) must be called at splclock()
>   or higher.  splclock() is of course now null, but the locking
>   requirements in kern_ntptime.c haven't changed much.  kern_ntptime.c
>   always needed to be locked by the equivalent of a spin mutex, which
>   is stronger locking than was given by splclock().  pps_event() would
>   have to acquire the spin mutex before calling hardpps(), although
>   this is bad for fast interrupt handlers.  The correct implementation
>   is probably to only do the capture part from fast interrupt handlers.
> 

In my rewritten dedicated pps driver I call pps_capture() from the
filter handler and pps_event() from the threaded handler.  I never found
any good documentation on the low-level details of this stuff, and there
isn't enough good example code to work from.  My hazy memory is that I
ended up studying the pps_capture() and pps_event() code enough to infer
that their design intent seems to be to allow you to capture with no
locking and do the event processing later in some sort of deferred or
threaded context.

> > Callout code same time can be called in any environment with any
> > locks held. And new callout code may need to know precise current time
> > in any of those conditions. Attempt to use an IPI and wait there can be
> > fatal.
> 
> Callout code can't be called from such a general "any" environment as
> timecounter code.  Not from a fast interrupt handler.  Not from an NMI
> or IPI handler.  I hope.  But timecounter code has a good chance of
> working even for the last 2 environments, due to its design requirement
> of working in the first.
> 
> The spinlock in the i8254 timecounter certainly breaks some cases.
> For example, suppose the lock is held for a timecounter read from
> normal context.  It masks hardware interrupts on the current CPU (except
> in my version).  It doesn't mask NMIs or other traps.  So if the NMI
> or other trap handler does a timecounter hardware call, there is
> deadlock in at least the !SMP case.  In my version, it blocks normal
> interrupts later if they occur, but doesn't block fast interrupts, so
> the pps_capture() call would deadlock if it occurs, like a timecounter
> call from an NMI.  I avoid this by not using pps in any fast interrupt
> handler, and by only using the i8254 timecounter for testing.  I do
> use pps in a (nonstandard) x86 RTC clock interrupt handler.  My clock
> interrupt handlers are all non-fast to avoid this and other locking
> problems.

Hrm, now you've got me a bit worried about capturing in the primary
context.  Not that I have much option: on a 300MHz Geode and similar
wimpy embedded processors there's enough latency on a threaded handler
that the pps signal can be de-asserted by the time the handler runs
(precision timing gear often outputs

Re: [RFC/RFT] calloutng

2013-01-13 Thread Ian Lepore
On Sun, 2013-01-13 at 21:36 +0200, Alexander Motin wrote:
> On 13.01.2013 20:09, Marius Strobl wrote:
[...]
> > 
> > Uhm, there are no NMIs on sparc64. Does it make sense to bypass this
> > adjustment on sparc64?
> 
> If it is not possible or not good to stop the timer during programming,
> there will always be some race window between code execution and timer
> ticking. So some minimal safety range should be reserved. Though it
> probably can be significantly reduced. In case of x86/HPET there is
> additional factor of NMI, that extends race to unpredictable level and
> so makes additional post-read almost mandatory.
> 
> >> May be with rereading counter
> >> after programming comparator (same as done for HPET, reading which is
> >> probably much more expensive) this value could be reduced.
> > 
> > I see. There are some bigger fish to fry at the moment though :)
> 

Speaking of the HPET code, it seems to me that its restart logic can
fire the same event twice.  Is that harmless?

-- Ian




Re: [RFC/RFT] calloutng

2013-01-02 Thread Ian Lepore
On Wed, 2013-01-02 at 15:11 +0200, Alexander Motin wrote:
> On 02.01.2013 14:28, "Luigi Rizzo" wrote:
> >
> > On Wed, Jan 02, 2013 at 01:24:26PM +0200, Alexander Motin wrote:
> > > On 02.01.2013 12:57, Luigi Rizzo wrote:

> > First of all, if you know that there is already a hardclock/statclock/*
> > scheduled in [T_X, T_X+D] you just reuse that. This particular bullet
> > was "no event scheduled in [T_X, T_X+D]" so you need to generate
> > a new one.
> 
> That is true, but my main point was about merging with external events,
> which I can't predict and the only way to merge is increase sleep period,
> hoping for better.
> 

This really is the crux of the problem, because you can't *by default*
dispatch an event earlier than requested because that's just a violation
of the usual rules of precision timing (where you expect to be late but
never early).

Sometimes there is no need for such precision, and an early wakeup is no
more or less detrimental than a late wakeup.  In fact, that may even be
the majority case.  I wonder if it might make sense to allow the
precision specification to indicate whether it needs traditional "never
early" behavior or whether it can be interpreted as "plus or minus this
amount is fine."  Like maybe negative precision is interpreted as "plus
or minus abs(precision)" or something like that.
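
A sketch of how such a signed precision might translate into an
acceptable wakeup window.  Nothing like this exists in the patch as far
as I know; struct wake_window and wake_window_for() are invented names,
and times are in arbitrary integer units.

```c
#include <stdint.h>

struct wake_window {
	int64_t earliest;	/* earliest acceptable wakeup */
	int64_t latest;		/* latest acceptable wakeup */
};

/*
 * precision >= 0: traditional "never early" window [t, t + precision].
 * precision <  0: symmetric window [t - |precision|, t + |precision|].
 * Assumes precision != INT64_MIN so the negation cannot overflow.
 */
struct wake_window
wake_window_for(int64_t t, int64_t precision)
{
	struct wake_window w;
	int64_t slack = precision < 0 ? -precision : precision;

	w.earliest = precision < 0 ? t - slack : t;
	w.latest = t + slack;
	return (w);
}
```

For t = 1000, a precision of 50 gives [1000, 1050] and a precision of
-50 gives [950, 1050].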

Or maybe even the other way around... you get "plus or minus" precision
by default and the few things that really care about precision timing
have a way of indicating that.  (But in that case the userland sleeps
would have to assume the traditional behavior because that's how they've
always been documented.)

-- Ian



Re: [RFC/RFT] calloutng

2012-12-31 Thread Ian Lepore
On Mon, 2012-12-31 at 12:17 +0200, Alexander Motin wrote:
> On 31.12.2012 08:17, Luigi Rizzo wrote:
> > On Sun, Dec 30, 2012 at 04:13:43PM -0700, Ian Lepore wrote:
> > ...
> >> I grabbed testsleep.c to test an arm event timer implementation, and had
> >> to fix a couple nits... kqueueto was missing from the names[] array, and
> >> I had to add a "* 1000" to a couple places where usec was stuffed into a
> >> timespec's tv_nsec.
> >>
> >> I also tested the calloutng_12_17 patches and the kqueue stuff behaved
> >> very strangely.
> 
> I've rewritten kqueue timeouts at the calloutng_12_26.patch.
> 
> >> Then I noticed you had a 12_26 patchset so I tested
> >> that (after crudely fixing a couple uninitialized var warnings), and it
> >> all looks good on this arm (Raspberry Pi).  I'll attach the results.
> >>
> >> It's so sweet to be able to do precision sleeps.
> 
> Thank you for testing, Ian.
> 
> > interesting numbers, but there seems to be some problem in computing
> > the exact interval; delays are much larger than expected.
> >
> > In this test, the original timer code used to round to the next multiple
> > of 1 tick and then add another tick (except for the kqueue case),
> > which is exactly what you see in the second set of measurements.
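
The rounding rule described above can be modeled in a few lines.  This is
only a sketch of the description (round up to whole ticks, then add one),
not the old timer code itself; legacy_sleep_us() is a made-up name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Expected sleep under the old tick-based behavior as described:
 * round the request up to a whole number of ticks, then add one more.
 */
uint64_t
legacy_sleep_us(uint64_t req_us, unsigned hz)
{
	uint64_t tick_us, ticks;

	assert(hz > 0 && hz <= 1000000);
	tick_us = 1000000 / hz;
	ticks = (req_us + tick_us - 1) / tick_us;	/* round up */
	return ((ticks + 1) * tick_us);
}
```

For a 1 us request this gives 20000, 8000 and 2000 us at HZ=100, 250 and
1000, in line with the roughly 19941, 7989 and 1999 us that the unpatched
select case measured.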
> >
> > The calloutng code however seems to do something odd:
> > in addition to fixed overhead (some 50us, which you can see in
> > the tests for 1us and 300us), all delay seem to be ~10% larger
> > than what is requested, upper bounded to 10ms (note, the
> > numbers are averages so i cannot tell whether all samples are
> > the same or there is some distribution of values).
> >
> > I am not sure if this error is peculiar to the ARM version or also
> > appears on x86/amd64 but I believe it should be fixed.
> >
> > If you look at the results below:
> >
> > 1us possibly ok:
> > for very short intervals i would expect some kind
> > of 'reschedule' without actually firing a timer; maybe
> > 50us are what it takes to do a round through the scheduler ?
> >
> > 300us   probably ok
> > i guess the extra 50-90us are what it takes to do a round
> > through the scheduler
> >
> > 1000us  borderline (this is the case for poll and kqueue, which are
> > rounded to 1ms)
> > here intervals seem to be increased by 10%, and i cannot see
> > a good reason for this (more below).
> >
> > 3000us and above: wrong
> > here again, the intervals seem to be 10% larger than what is
> > requested, perhaps limiting the error to 10-20ms.
> >
> >
> > Maybe the 10% extension results from creating a default 'precision'
> > for legacy calls, but i do not think this is done correctly.
> >
> > First of all, if users do not specify a precision themselves, the
> > automatically generated value should never exceed one tick.
> >
> > Second, the only point of a 'precision' parameter is to merge
> > requests that may be close in time, so if there is already a
> > timer scheduled within [Treq, Treq+precision] i will get it;
> > but if there is no pending timer, then one should schedule it
> > for the requested interval.
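
The merge rule in that last point, as a minimal sketch.  Picking the
earliest pending event in the window is an assumption made here for
illustration, and pick_fire_time() is a hypothetical helper rather than
anything in calloutng.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return the time at which a request for treq should fire: the
 * earliest already-pending event inside [treq, treq + precision] if
 * one exists, otherwise treq itself.  pending[] need not be sorted.
 */
int64_t
pick_fire_time(int64_t treq, int64_t precision,
    const int64_t *pending, size_t npending)
{
	int64_t best = 0;
	int found = 0;
	size_t i;

	assert(precision >= 0);
	for (i = 0; i < npending; i++) {
		if (pending[i] < treq || pending[i] > treq + precision)
			continue;	/* outside the window */
		if (!found || pending[i] < best) {
			best = pending[i];
			found = 1;
		}
	}
	return (found ? best : treq);
}
```

With pending events at 900, 1040, 1020 and 2000, a request for 1000 with
precision 50 merges onto 1020, while precision 10 finds nothing to merge
with and fires at 1000.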
> >
> > Davide/Alexander, any ideas ?
> 
> All mentioned effects could be explained with implemented logic. 50us at 
> 1us is probably sum of minimal latency of the hardware eventtimer on the 
> specific platform and some software processing overhead (syscall, 
> callout, timecounters, scheduler, etc). At later points system starts to 
> noticeably use precision specified by kern.timecounter.alloweddeviation 
> sysctl. It affects results from two sides: 1) extending intervals for 
> specified percent of time to allow event aggregation, and 2) choosing 
> time base between fast getbinuptime() and precise binuptime(). Extending 
> interval is needed to aggregate not only callouts with each other, but 
> also callouts with other system events, which are impossible to schedule 
> in advance. It gives specified relative error, but no more than one CPU 
> wakeup period in absolute: for busy CPU (not skipping hardclock() ticks) 
> it is 1/hz, for completely idle one it can be up to 0.5s. Second point 
> allows to reduce processing overhead by the cost of error up to 1/hz for 
> long periods (>(100/allowed)*(1/hz)), when it is used.
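
Read literally, the first effect is a percentage of the interval capped
at one CPU wakeup period.  A sketch under that reading follows;
deviation_slack_us() and the sample numbers are invented, and this is not
the calloutng implementation.

```c
#include <stdint.h>

/*
 * Slack added to a requested interval: allowed_pct percent of the
 * interval, capped at one CPU wakeup period.  Units are microseconds.
 */
uint64_t
deviation_slack_us(uint64_t interval_us, unsigned allowed_pct,
    uint64_t wakeup_period_us)
{
	uint64_t slack = interval_us * allowed_pct / 100;

	return (slack < wakeup_period_us ? slack : wakeup_period_us);
}
```

With a made-up 5 percent allowance and a 10000 us wakeup period (a busy
CPU at hz=100), a 3000 us request may be stretched by 150 us, and a
300000 us request by the 10000 us cap rather than the full 15000 us.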
> 
> To get best possible precision kern.timecounter.alloweddeviation sysctl 
> can be set to smaller value. Setting it to 0 will effectively disable 
> all optimiza

Re: [RFC/RFT] calloutng

2012-12-30 Thread Ian Lepore
On Wed, 2012-12-26 at 21:24 +0200, Alexander Motin wrote:
> On 26.12.2012 01:21, Marius Strobl wrote:
> > On Tue, Dec 18, 2012 at 11:03:47AM +0200, Alexander Motin wrote:
> >> Experiments with dummynet shown ineffective support for very short
> >> tick-based callouts. New version fixes that, allowing to get as many
> >> tick-based callout events as hz value permits, while still be able to
> >> aggregate events and generating minimum of interrupts.
> >>
> >> Also this version modifies system load average calculation to fix some
> >> cases existing in HEAD and 9 branches, that could be fixed with new
> >> direct callout functionality.
> >>
> >> http://people.freebsd.org/~mav/calloutng_12_17.patch
> >>
> >> With several important changes made last time I am going to delay commit
> >> to HEAD for another week to do more testing. Comments and new test cases
> >> are welcome. Thanks for staying tuned and commenting.
> >
> > FYI, I gave both calloutng_12_15_1.patch and calloutng_12_17.patch a
> > try on sparc64 and it at least survives a buildworld there. However,
> > with the patched kernels, buildworld times seem to increase slightly but
> > reproducible by 1-2% (I only did four runs but typically buildworld
> > times are rather stable and don't vary more than a minute for the
> > same kernel and source here). Is this an expected trade-off (system
> > time as such doesn't seem to increase)?
> 
> I don't think build process uses significant number of callouts to 
> affect results directly. I think this additional time could be result of 
> the deeper next event look up, done by the new code, that is practically 
> useless for sparc64, which effectively has no cpu_idle() routine. It 
> wouldn't affect system time and wouldn't show up in any statistics 
> (except PMC or something alike) because it is executed inside timer 
> hardware interrupt handler. If my guess is right, that is a part that 
> probably still could be optimized. I'll look on it. Thanks.
> 
> > Is there anything specific to test?
> 
> Since the most of code is MI, for sparc64 I would mostly look on related 
> MD parts (eventtimers and timecounters) to make sure they are working 
> reliably in more stressful conditions.  I still have some worries about 
> possible deadlock on hardware where IPIs are used to fetch present time 
> from other CPU.
> 
> Here is small tool we are using for test correctness and performance of 
> different user-level APIs: http://people.freebsd.org/~mav/testsleep.c
> 

I grabbed testsleep.c to test an arm event timer implementation, and had
to fix a couple nits... kqueueto was missing from the names[] array, and
I had to add a "* 1000" to a couple places where usec was stuffed into a
timespec's tv_nsec.
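
The shape of that fix, as a standalone sketch rather than the actual
testsleep.c code (which isn't reproduced here).  usec_to_timespec() is a
name made up for this example.

```c
#include <assert.h>
#include <time.h>

/* Convert a non-negative microsecond count to a struct timespec. */
struct timespec
usec_to_timespec(long long usec)
{
	struct timespec ts;

	assert(usec >= 0);
	ts.tv_sec = (time_t)(usec / 1000000);
	ts.tv_nsec = (long)(usec % 1000000) * 1000;	/* usec -> nsec */
	return (ts);
}
```

Without the "* 1000", a 300 us request would be passed down as 300 ns.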

I also tested the calloutng_12_17 patches and the kqueue stuff behaved
very strangely.  Then I noticed you had a 12_26 patchset so I tested
that (after crudely fixing a couple uninitialized var warnings), and it
all looks good on this arm (Raspberry Pi).  I'll attach the results.

It's so sweet to be able to do precision sleeps.

-- Ian


for t in 1 300 3000 30000 300000 ; do
  for m in select poll usleep nanosleep kqueue kqueueto syscall ; do
    ./testsleep $t $m
  done
done


With calloutng_12_26.patch...

                     HZ=100             HZ=250            HZ=1000
method     req us    avg us   req us    avg us   req us    avg us
          -----------------  -----------------  -----------------
select          1     55.79        1     50.96        1     61.32
poll            1   1109.46        1   1107.86        1   1114.38
usleep          1     56.33        1     72.90        1     62.78
nanosleep       1     52.66        1     55.23        1     64.23
kqueue          1   1114.23        1   1113.81        1   1121.21
kqueueto        1     65.44        1     71.00        1     75.01
syscall         1      4.70        1      4.45        1      4.55
select        300    355.79      300    357.76      300    362.35
poll          300   1107.85      300   1122.55      300   1115.62
usleep        300    355.28      300    357.28      300    360.79
nanosleep     300    354.49      300    355.82      300    360.62
kqueue        300   1112.57      300   1118.13      300   1117.16
kqueueto      300    375.98      300    378.62      300    395.61
syscall       300      4.41      300      4.45      300      4.54
select       3000   3246.75     3000   3246.74     3000   3252.72
poll         3000   3238.10     3000   3229.12     3000   3250.10
usleep       3000   3242.47     3000   3237.06     3000   3249.61
nanosleep    3000   3238.79     3000   3231.55     3000   3248.11
kqueue       3000   3240.01     3000   3236.07     3000   3247.60
kqueueto     3000   3265.36     3000   3267.22     3000   3274.96
syscall      3000      4.69     3000      4.44     3000      4.50
select      30000  31714.60    30000  31941.17    30000  32467.69
poll        30000  31522.76    30000  31983.00    30000  32497.81

Re: Why does sleep(1) end up blocked in bwillwrite()?

2012-12-23 Thread Ian Lepore
On Sun, 2012-12-23 at 21:37 +0200, Konstantin Belousov wrote:
> On Sun, Dec 23, 2012 at 11:55:15AM -0700, Ian Lepore wrote:
> > Background:  I'm trying to get nandfs working on a low-end small-memory
> > embedded system.  I'm debugging performance problems that manifest as
> > the system (or large portions of it) becoming unresponsive for many
> > seconds at a time.  It appears that sometimes the nandfs background
> > garbage collector does things that lead to dirtying lots of buffers way
> > faster than they can be written.  When that happens it seems to take too
> > long (many seconds) for the problem to clear.  That's the basic
> > situation I'm investigating, but NOT what this mail is about, that's
> > just the background.
> > 
> > When this situation happens, some of the threads in my application keep
> > running fine.  Others get blocked unexpectedly even though they do no
> > disk IO at all, they're working with sockets and serial (uart) devices.
> > I discovered by accident that I can see a form of the problem happening
> > just using sleep(1) and hitting ^T while the buffer starvation is in
> > progress...
> > 
> >   guava# sleep 99
> > [ hit ^T]
> >   load: 1.03  cmd: sleep 472 [nanslp] 2.03r 0.01u 0.02s 0% 1372k
> >   sleep: about 97 second(s) left out of the original 99
> > [ hit ^T]
> >   load: 1.27  cmd: sleep 472 [nanslp] 9.32r 0.01u 0.02s 0% 1376k
> >   sleep: about 89 second(s) left out of the original 99
> > [ hit ^T]
> >   load: 1.49  cmd: sleep 472 [nanslp] 11.53r 0.01u 0.02s 0% 1376k
> > [ note no output from sleep(1) here, repeated ^T now gives...]
> >   load: 1.49  cmd: sleep 472 [flswai] 12.01r 0.01u 0.03s 0% 1376k
> >   load: 1.49  cmd: sleep 472 [flswai] 12.27r 0.01u 0.03s 0% 1376k
> >   load: 1.49  cmd: sleep 472 [flswai] 12.76r 0.01u 0.03s 0% 1376k
> >   load: 1.49  cmd: sleep 472 [flswai] 13.06r 0.01u 0.03s 0% 1376k
> >   load: 1.49  cmd: sleep 472 [flswai] 13.26r 0.01u 0.03s 0% 1376k
> >   load: 1.61  cmd: sleep 472 [flswai] 20.03r 0.02u 0.07s 0% 1376k
> >   load: 1.64  cmd: sleep 472 [flswai] 20.49r 0.02u 0.07s 0% 1376k
> >   load: 1.64  cmd: sleep 472 [flswai] 20.68r 0.02u 0.08s 0% 1376k
> >   sleep: about 87 second(s) left out of the original 99
> > 
> > So here sleep(1) was blocked in bwillwrite() for about 9 seconds on a
> > write to stderr (which is an ssh xterm connection).
> > 
> > The call to bwillwrite() is in kern/sys_generic.c in dofilewrite():
> > 
> > if (fp->f_type == DTYPE_VNODE)
> >   bwillwrite();
> > 
> > I just noticed the checkin message that added the DTYPE_VNODE check
> > specifically mentions not penalizing devices and pipes and such.  I
> > think maybe things have evolved since then (Dec 2000) and this check is
> > no longer sufficient.  Maybe it needs to be something more like
> > 
> > if (fp->f_type == DTYPE_VNODE && fp->f_vnode->v_type == VREG)
> > 
> > but I have a gut feeling it needs to be more complex than that (can
> > f_vnode be NULL, what sort of locking is required to peek into f_vnode
> > at this point, etc), so I can't really propose a patch for this.  In
> > fact, I can't even say for sure it's a bug, but it sure feels like one
> > to the application-developer part of me.
> 
> The patch below would do what you want. But my opinion is that it is more
> bug in the filesystem than in the VFS. Anyway, try this and report how
> it works for you.
> 

If by "bug in the filesystem" you mean the real problem is nandfs
driving the system into buffer starvation, then yes I agree... that's
the real problem I'm pursuing.  The difficulty I had was that anything I
did to try to investigate the state of the system resulted in blocking
when it tried to output to the terminal.  

I'm running with your patch now and it seems to be working perfectly,
thanks!

-- Ian

> diff --git a/sys/fs/devfs/devfs_vnops.c b/sys/fs/devfs/devfs_vnops.c
> index 97a1bcf..9851229 100644
> --- a/sys/fs/devfs/devfs_vnops.c
> +++ b/sys/fs/devfs/devfs_vnops.c
> @@ -1049,6 +1049,7 @@ devfs_open(struct vop_open_args *ap)
>   int error, ref, vlocked;
>   struct cdevsw *dsw;
>   struct file *fpop;
> + struct mtx *mtxp;
>  
>   if (vp->v_type == VBLK)
>   return (ENXIO);
> @@ -1099,6 +1100,16 @@ devfs_open(struct vop_open_args *ap)
>  #endif
>   if (fp->f_ops == &badfileops)
>   finit(fp, fp->f_flag, DTYPE_VNODE, dev, &devfs_ops_f);
> + mtxp = mtx_pool_find(mtxpool_sleep, fp);

Why does sleep(1) end up blocked in bwillwrite()?

2012-12-23 Thread Ian Lepore
Background:  I'm trying to get nandfs working on a low-end small-memory
embedded system.  I'm debugging performance problems that manifest as
the system (or large portions of it) becoming unresponsive for many
seconds at a time.  It appears that sometimes the nandfs background
garbage collector does things that lead to dirtying lots of buffers way
faster than they can be written.  When that happens it seems to take too
long (many seconds) for the problem to clear.  That's the basic
situation I'm investigating, but NOT what this mail is about, that's
just the background.

When this situation happens, some of the threads in my application keep
running fine.  Others get blocked unexpectedly even though they do no
disk IO at all, they're working with sockets and serial (uart) devices.
I discovered by accident that I can see a form of the problem happening
just using sleep(1) and hitting ^T while the buffer starvation is in
progress...

  guava# sleep 99
[ hit ^T]
  load: 1.03  cmd: sleep 472 [nanslp] 2.03r 0.01u 0.02s 0% 1372k
  sleep: about 97 second(s) left out of the original 99
[ hit ^T]
  load: 1.27  cmd: sleep 472 [nanslp] 9.32r 0.01u 0.02s 0% 1376k
  sleep: about 89 second(s) left out of the original 99
[ hit ^T]
  load: 1.49  cmd: sleep 472 [nanslp] 11.53r 0.01u 0.02s 0% 1376k
[ note no output from sleep(1) here, repeated ^T now gives...]
  load: 1.49  cmd: sleep 472 [flswai] 12.01r 0.01u 0.03s 0% 1376k
  load: 1.49  cmd: sleep 472 [flswai] 12.27r 0.01u 0.03s 0% 1376k
  load: 1.49  cmd: sleep 472 [flswai] 12.76r 0.01u 0.03s 0% 1376k
  load: 1.49  cmd: sleep 472 [flswai] 13.06r 0.01u 0.03s 0% 1376k
  load: 1.49  cmd: sleep 472 [flswai] 13.26r 0.01u 0.03s 0% 1376k
  load: 1.61  cmd: sleep 472 [flswai] 20.03r 0.02u 0.07s 0% 1376k
  load: 1.64  cmd: sleep 472 [flswai] 20.49r 0.02u 0.07s 0% 1376k
  load: 1.64  cmd: sleep 472 [flswai] 20.68r 0.02u 0.08s 0% 1376k
  sleep: about 87 second(s) left out of the original 99

So here sleep(1) was blocked in bwillwrite() for about 9 seconds on a
write to stderr (which is an ssh xterm connection).

The call to bwillwrite() is in kern/sys_generic.c in dofilewrite():

if (fp->f_type == DTYPE_VNODE)
  bwillwrite();

I just noticed the checkin message that added the DTYPE_VNODE check
specifically mentions not penalizing devices and pipes and such.  I
think maybe things have evolved since then (Dec 2000) and this check is
no longer sufficient.  Maybe it needs to be something more like

if (fp->f_type == DTYPE_VNODE && fp->f_vnode->v_type == VREG)

but I have a gut feeling it needs to be more complex than that (can
f_vnode be NULL, what sort of locking is required to peek into f_vnode
at this point, etc), so I can't really propose a patch for this.  In
fact, I can't even say for sure it's a bug, but it sure feels like one
to the application-developer part of me.
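
As a rough user-space model of the stricter check suggested above (the
model_* types and constants are stand-ins invented for this sketch, not
the kernel's definitions, and the locking question is deliberately left
open), the idea is that only writes to regular files would be throttled:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in types for illustration only; not the kernel's definitions. */
enum model_ftype { MODEL_DTYPE_VNODE, MODEL_DTYPE_SOCKET, MODEL_DTYPE_PIPE };
enum model_vtype { MODEL_VREG, MODEL_VCHR, MODEL_VFIFO };

struct model_vnode {
	enum model_vtype v_type;
};

struct model_file {
	enum model_ftype f_type;
	struct model_vnode *f_vnode;	/* may be NULL in this model */
};

/*
 * Return true only when the write targets a regular file, the one case
 * where waiting for dirty buffers to drain seems warranted.
 */
bool
model_should_throttle_write(const struct model_file *fp)
{
	if (fp == NULL || fp->f_type != MODEL_DTYPE_VNODE)
		return (false);
	if (fp->f_vnode == NULL)
		return (false);
	return (fp->f_vnode->v_type == MODEL_VREG);
}
```

In this model a write to a tty (a character device reached through a
vnode) is not throttled, which is the behavior I was hoping for with
sleep(1) writing to stderr.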

-- Ian


___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: API explosion (Re: [RFC/RFT] calloutng)

2012-12-18 Thread Ian Lepore
On Wed, 2012-12-19 at 00:29 +0100, Luigi Rizzo wrote:
> On Tue, Dec 18, 2012 at 04:27:45PM -0700, Ian Lepore wrote:
> > On Tue, 2012-12-18 at 23:58 +0100, Luigi Rizzo wrote:
> > > [top posting for readability;
> > > in summary we were discussing the new callout API trying to avoid
> > > an explosion of methods and arguments while at the same time
> > > supporting the old API and the new one]
> > > (I am also Cc-ing phk as he might have better insight
> > > on the topic).
> > > 
> > > I think the patch you propose is a step in the right direction,
> > > but i still remain concerned by having to pass two bintimes
> > > (by reference, but they should really go by value)
> > > and one 'ticks' value to all these functions.
> > > 
> > > I am also dubious that we need a full 128 bits to specify
> > > the 'precision': there would be absolutely no loss of functionality
> > > if we decided to specify the precision in powers of 2, so a precision
> > > 'k' (signed) means 2^k seconds. This way 8 bits are enough to
> > > represent any precision we want.
> 
> ...
> > I'm not so sure about the 2^k precision.  You speak of seconds, but I
> > would be worrying about sub-second precision in my work.  It would be
> > typical to want a 500uS timeout but be willing to be late by up to 250uS if
> 
> i said k is signed so negative values represent fractions of a
> second. 2^-128 is pretty short :)
> 
> cheers
> luigi

Ahh, I missed that.  Good enough then!  Hmmm, if that idea survives
further review, then could precision be encoded in 8 bits of the flags,
eliminating another parm?
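
A minimal sketch of what that packing might look like, assuming a 32-bit
flags word whose top 8 bits are free.  The EX_* macros and the function
names are hypothetical, invented for this example; they are not part of
the calloutng patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: the top 8 bits of the flags word hold k. */
#define EX_PREC_SHIFT	24
#define EX_PREC_MASK	(0xffu << EX_PREC_SHIFT)

/* Store the signed exponent k (precision = 2^k seconds) in the flags. */
uint32_t
ex_flags_set_precision(uint32_t flags, int k)
{
	assert(k >= -128 && k <= 127);
	return ((flags & ~EX_PREC_MASK) |
	    (((uint32_t)k & 0xffu) << EX_PREC_SHIFT));
}

/* Recover the signed exponent k from the flags. */
int
ex_flags_get_precision(uint32_t flags)
{
	int v = (int)((flags & EX_PREC_MASK) >> EX_PREC_SHIFT);

	/* Sign-extend the 8-bit field by hand to stay portable. */
	return (v >= 128 ? v - 256 : v);
}
```

For example, k = -12 gives a precision of 2^-12 seconds, roughly 244uS,
which is in the neighborhood of the 250uS slop mentioned earlier.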

-- Ian




Re: API explosion (Re: [RFC/RFT] calloutng)

2012-12-18 Thread Ian Lepore
On Tue, 2012-12-18 at 23:58 +0100, Luigi Rizzo wrote:
> [top posting for readability;
> in summary we were discussing the new callout API trying to avoid
> an explosion of methods and arguments while at the same time
> supporting the old API and the new one]
> (I am also Cc-ing phk as he might have better insight
> on the topic).
> 
> I think the patch you propose is a step in the right direction,
> but i still remain concerned by having to pass two bintimes
> (by reference, but they should really go by value)
> and one 'ticks' value to all these functions.
> 
> I am also dubious that we need a full 128 bits to specify
> the 'precision': there would be absolutely no loss of functionality
> if we decided to specify the precision in powers of 2, so a precision
> 'k' (signed) means 2^k seconds. This way 8 bits are enough to
> represent any precision we want.
> 
> The key difference between 'ticks' and bintimes (and the main
> difficulty in the conversion) is that ticks are relative and bintimes
> are interpreted as absolute. This could be easily solved by using
> a flag to specify if the 'bt' argument is absolute or relative, and
> passing the argument by value.
> 
> So now the flags could contain C_DIRECT_EXEC, C_BT_IS_RELATIVE, the
> precision, and another 64 or 128 bit field contains the bintime.
> 
> How does this look ?
> 
> cheers
> luigi
> 

I tend to agree that the bintime should be passed by value instead of
reference.  That would allow an inline tickstobintime() that converts
relative ticks to an absolute bintime returned by value and passed right
along in one tidy line/clump of code without any temporary variables
cluttering things up.  

While the 1980s C programmer in me still wants to avoid returning
complex objects by value, the reality is that modern compilers tend to
generate really nice code for such constructs, usually without any
copying of the return value at all.
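
To make that concrete, here is a user-space sketch of a by-value
conversion.  struct ex_bintime is a stand-in for the kernel's struct
bintime, and the current time and hz are passed in as parameters only so
the example is self-contained; none of these names come from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* User-space stand-in for the kernel's struct bintime (illustrative). */
struct ex_bintime {
	int64_t  sec;	/* whole seconds */
	uint64_t frac;	/* fraction of a second, in units of 2^-64 s */
};

/* Add two values, carrying fraction overflow into the seconds field. */
struct ex_bintime
ex_bintime_add(struct ex_bintime a, struct ex_bintime b)
{
	struct ex_bintime r;

	r.frac = a.frac + b.frac;
	r.sec = a.sec + b.sec + (r.frac < a.frac ? 1 : 0);
	return (r);
}

/*
 * Convert a relative tick count into an absolute time, returned by value.
 */
struct ex_bintime
ex_tickstobintime(struct ex_bintime now, int ticks, int hz)
{
	struct ex_bintime rel;

	assert(ticks >= 0 && hz > 0);
	rel.sec = ticks / hz;
	/* UINT64_MAX / hz approximates 2^64 / hz, one tick's fraction. */
	rel.frac = (uint64_t)(ticks % hz) * (UINT64_MAX / (uint64_t)hz);
	return (ex_bintime_add(now, rel));
}
```

Because the result comes back by value, the call can be nested directly
in the argument list of whatever function takes the absolute time.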

I'm not so sure about the 2^k precision.  You speak of seconds, but I
would be worrying about sub-second precision in my work.  It would be
typical to want a 500uS timeout but be willing to be late by up to 250uS if
that helped scheduling and performance.  Also, my idea of precision
would virtually always be "I'm willing to be late by this much, but
never early by any amount."
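
As a sketch of those semantics only (this is not how calloutng decides
anything, and the function name is made up), a scheduler honoring "late
but never early" could share an already-pending wakeup when it falls
inside the allowed window, and otherwise fire exactly at the deadline:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pick a wakeup time (in microseconds) for a timeout that may fire up to
 * 'tolerance_us' late but never early.  If an already-scheduled event
 * falls in [deadline_us, deadline_us + tolerance_us], share it;
 * otherwise fire exactly at the deadline.
 */
uint64_t
ex_pick_wakeup(uint64_t deadline_us, uint64_t tolerance_us,
    uint64_t existing_event_us)
{
	assert(deadline_us <= UINT64_MAX - tolerance_us);
	if (existing_event_us >= deadline_us &&
	    existing_event_us <= deadline_us + tolerance_us)
		return (existing_event_us);
	return (deadline_us);
}
```

With a 500uS deadline and 250uS of tolerance, a pending event at 700uS
would be shared, while one at 400uS (early) or 800uS (too late) would be
ignored.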

To reinforce the point of that last paragraph... the way I'm looking at
these changes has nothing to do with power saving (I've never owned a
battery-operated computer, probably never will) and everything to do
with performance and being able to sleep accurately for less than a
tick.

-- Ian

> > On 18.12.2012 20:03, Alexander Motin wrote:
> > >On 18.12.2012 19:36, Luigi Rizzo wrote:
> > >>On Mon, Dec 17, 2012 at 11:03:53PM +0200, Alexander Motin wrote:
> > I would instead do the following:
> > >>>
> > >>>I also don't much like the wide API and want to hear fresh ideas, but
> > >>>approaches to time measurement there are too different to do what you
> > >>>are proposing.  Main problem is that while ticks value is relative,
> > >>>bintime is absolute. It is not easy to make conversion between them fast
> > >>>and precise. I've managed to do it, but the only function that does it
> > >>>now is _callout_reset_on(). All other functions are just passing values
> > >>>down. I am not sure I want to duplicate that code in each place, though
> > >>>doing it at least for callout may be a good idea.
> > >>
> > >>I am afraid the above is not convincing.
> > >>
> > >>Most/all of the APIs i mentioned still have the conversion from
> > >>ticks to bintime, and the code in your patch is just
> > >>building multiple parallel paths (one for each of the
> > >>three versions of the same function) to some final
> > >>piece of code where the conversion takes place.
> > >>
> > >>The problem is that all of this goes through a set of obfuscating
> > >>macros and the end result is horrible.
> > >>
> > >>To be clear, i believe the work you have been doing on cleaning up
> > >>callout is great, i am just saying that this is the time to look
> > >>at the code from a few steps away and clean up all those design
> > >>decisions that perhaps were made in a haste to make things work.
> > >>
> > >>I will give you another example to show how convoluted
> > >>is the code now:
> > >>
> > >>cv_timedwait() and cv_timedwait_sig() now have three
> > >>versions each (plain, bt, flags).
> > >>
> > >>These six are remapped through macros to two functions, _cv_timedwait()
> > >>and _cv_timedwait_sig(),  with a possible bug (cv_timedwait_bt()
> > >>maps to _cv_timedwait_sig() )
> > >>
> > >>These two _cv_timedwait*() take both ticks and bintimes,
> > >>and contain this sequence:
> > >>
> > >>+if (bt == NULL)
> > >>+sleepq_set_timeout_flags(cvp, timo, flags);
> > >>+else
> > >>+sleepq_set_timeout_bt(cvp, bt, precision);
> > >>
> > >>Guess what, both sleepq_* are macros that remap to the same
> > >>_sleepq_set_timeout(...) . So the above "if (bt == NULL)" is useless.
> > >>
> 

Re: /usr/src/sys/conf/newvers.sh, SYSDIR set to wrong directory.

2012-12-13 Thread Ian Lepore
On Wed, 2012-12-12 at 20:52 +0200, Kimmo Paasiala wrote:
> On Wed, Dec 12, 2012 at 6:53 PM, Ian Lepore
>  wrote:
> > On Wed, 2012-12-12 at 18:14 +0200, Kimmo Paasiala wrote:
> >> Hello,
> >>
> >> My 9-STABLE buildworld broke in a very inexplicable way,  I was
> >> getting an error on /usr/src/include/osreldate.h that I couldn't
> >> figure out until I started looking at the sys/conf/newvers.sh and what
> >> it does. It turned out that the thing that broke my buildworld was
> >> having .git directory at the root directory of the system because I
> >> recently started using GIT to track the configuration files.
> >>
> >> I added some debug echos to the newvers.sh and I found out it's
> >> setting SYSDIR to /bin/.. which in turn causes the newvers.sh to set
> >> the gitdir to /.git and that seems to break the logic in newvers.sh.
> >>
> >> Isn't SYSDIR supposed to be set to the sys -subdirectory of the source
> >> tree (/usr/src/sys default)?
> >>
> >> I'm guessing the reason the SYSDIR gets set to /bin/.. is the line in
> >> newvers.sh:
> >>
> >> SYSDIR=$(dirname $0)/..
> >>
> >> $0 is actually /bin/sh and not the path to newvers.sh because the
> >> newvers.sh is sourced by the Makefile in /usr/src/include instead of
> >> executing it:
> >>
> >> osreldate.h: ${.CURDIR}/../sys/conf/newvers.sh 
> >> ${.CURDIR}/../sys/sys/param.h \
> >> ${.CURDIR}/Makefile
> >> @${ECHO} creating osreldate.h from newvers.sh
> >> @MAKE=${MAKE}; \
> >> PARAMFILE=${.CURDIR}/../sys/sys/param.h; \
> >> . ${.CURDIR}/../sys/conf/newvers.sh; \
> >>
> >> Now the question is how to fix this?
> >>
> >> -Kimmo
> >
> > Perhaps it could be handled similar to PARAMFILE, something like this in
> > the makefile:
> >
> >   PARAMFILE=${.CURDIR}/../sys/sys/param.h; \
> >   SYSDIR=${.CURDIR}/../sys; \
> >  . ${.CURDIR}/../sys/conf/newvers.sh; \
> >
> > I'm not sure if newvers.sh needs to work in ways that don't involve
> > being invoked from that makefile rule, so to be safe it could have
> > default handling, something like:
> >
> >  : ${SYSDIR:=$(dirname $0)/..}
> >
> > -- Ian
> >
> >
> 
> Thanks, that works. Should I file a PR about this?
> 
> -Kimmo

I think that would probably be a good idea, since no committer has
chimed in on this thread saying they're about to commit a fix.

-- Ian




Re: /usr/src/sys/conf/newvers.sh, SYSDIR set to wrong directory.

2012-12-12 Thread Ian Lepore
On Wed, 2012-12-12 at 18:14 +0200, Kimmo Paasiala wrote:
> Hello,
> 
> My 9-STABLE buildworld broke in a very inexplicable way,  I was
> getting an error on /usr/src/include/osreldate.h that I couldn't
> figure out until I started looking at the sys/conf/newvers.sh and what
> it does. It turned out that the thing that broke my buildworld was
> having .git directory at the root directory of the system because I
> recently started using GIT to track the configuration files.
> 
> I added some debug echos to the newvers.sh and I found out it's
> setting SYSDIR to /bin/.. which in turn causes the newvers.sh to set
> the gitdir to /.git and that seems to break the logic in newvers.sh.
> 
> Isn't SYSDIR supposed to be set to the sys -subdirectory of the source
> tree (/usr/src/sys default)?
> 
> I'm guessing the reason the SYSDIR gets set to /bin/.. is the line in
> newvers.sh:
> 
> SYSDIR=$(dirname $0)/..
> 
> $0 is actually /bin/sh and not the path to newvers.sh because the
> newvers.sh is sourced by the Makefile in /usr/src/include instead of
> executing it:
> 
> osreldate.h: ${.CURDIR}/../sys/conf/newvers.sh ${.CURDIR}/../sys/sys/param.h \
> ${.CURDIR}/Makefile
> @${ECHO} creating osreldate.h from newvers.sh
> @MAKE=${MAKE}; \
> PARAMFILE=${.CURDIR}/../sys/sys/param.h; \
> . ${.CURDIR}/../sys/conf/newvers.sh; \
> 
> Now the question is how to fix this?
> 
> -Kimmo

Perhaps it could be handled similar to PARAMFILE, something like this in
the makefile:

  PARAMFILE=${.CURDIR}/../sys/sys/param.h; \
  SYSDIR=${.CURDIR}/../sys; \
 . ${.CURDIR}/../sys/conf/newvers.sh; \

I'm not sure if newvers.sh needs to work in ways that don't involve
being invoked from that makefile rule, so to be safe it could have
default handling, something like:

 : ${SYSDIR:=$(dirname $0)/..}

-- Ian




Re: 9.1-RC3 LiveCD missing features

2012-12-08 Thread Ian Lepore
On Fri, 2012-12-07 at 23:31 -0800, Garrett Cooper wrote:
> On Fri, Dec 7, 2012 at 11:06 PM, CeDeROM  wrote:
> > Hello Ian :-)
> >
> > This is the problem - / is read only and /etc/resolv.conf already links to
> > nonexistent file. This way I cannot modify its content nor link other file
> > (i.e. /var/resolv.conf) to /etc/resolv.conf. Creating /var/resolv.conf does
> > not help either.
> >
> > I think /etc/resolv.conf should point to /var/resolv.conf from start so the
> > resolver is functional :-)
> 
> I generally get around this with mdmfs and unionfs mounts, but
> it's a bit annoying... I'll see if I can file a PR with all of the
> things that need to be fixed/enhanced and maybe fix some of the items
> if I get some time (if the liveCD used rc.initdiskless it would be
> considerably simpler and some key filesystems would be writable after
> boot).
> Thanks,
> -Garrett

It shouldn't require rc.initdiskless; just the fact that rc.d/var
detects it can't write to /var should cause it to automatically create a
memory filesystem for it, and minimally populate it.  As far as I know,
this is automatic unless you use rc.conf knobs to disable it.

-- Ian




Re: 9.1-RC3 LiveCD missing features

2012-12-07 Thread Ian Lepore
On Fri, 2012-12-07 at 23:34 +0100, Bas Smeelen wrote:
> On 12/07/12 23:11, Chuck Burns wrote:
> > On 12/7/2012 3:50 PM, CeDeROM wrote:
> >> Hello :-)
> >>
> >> I have tried to check for badblocks on my / but I did not find 
> >> badblocks
> >> program on LiveCD and there is no option to install it. This is very 
> >> useful
> >> utility, please add it as part of LiveCD :-)
> >>
> >> Also there is a problem with DHCP based workstations using LiveCD -
> >> although interface gets configured it is impossible to update
> >> /etc/resolv.conf (by dhclient and by hand) and so this workstation 
> >> pretty
> >> useless for IPv4 (is it more usable on IPv6?). Please update :-)
> >>
> >> Thank you :-)
> >> Tomek
> >>
> >>
> >
> > dd if=/dev/zer of=/dev/ada0
> >
> > ^^^ There's your "badblocks" program.  Any hard drive made in the last 
> > decade has been self-remapping.  Attempting to write to a bad block 
> > will cause the hard drive to remap an unused sector into its place, 
> > until the drive runs out of said "unused" backup sectors, and at that 
> > time, it will simply begin "losing" storage space, i.e. the 
> > number of total sectors on the drive will begin to shrink.
> >
> :)
> 
> /dev/zero
> 
> Badblocks is outdated for more than 17 years I guess
> The dd mentioned above will let the firmware remap all bad sectors until 
> there are no spare sectors left (and wipe anything on disk as a bonus :) 
> ;then you can begin to think about replacing your harddrive.
> 
> As for DHCP, it works for me when booting from a netinstall for instance 
> or going to fixit.
> Tomek, please try to describe more accurately what you are doing and try 
> to accomplish
> 
> Cheers

When booting a system with a read-only root filesystem (a LiveCD is one
example of such), DHCP works in the sense that you get an IP address,
but because it can't write the nameserver address into /etc/resolv.conf
you're left with a system that's on a network but you can't do much with
it unless you have a really good memory for IP addresses.

It has to be fixed when the readonly filesystem is created.  If you
make /etc/resolv.conf a symlink to ../var/db/resolv.conf it works out
pretty well.  If you're not using dhcp, then instead of having a
missing /etc/resolv.conf you have a symlink to a missing file.  When you
are using DHCP, it is able to write the resolv.conf file in /var and
life is good.

-- Ian




Re: Call for testers, users with scsi cards

2012-12-05 Thread Ian Lepore
On Tue, 2012-12-04 at 14:58 -1000, Jeff Roberson wrote:
> On Tue, 4 Dec 2012, Ian Lepore wrote:
> 
> > On Tue, 2012-12-04 at 14:49 -0700, Warner Losh wrote:
> >> On Dec 4, 2012, at 2:36 PM, Jeff Roberson wrote:
> >>
> >>> http://people.freebsd.org/~jeff/loadccb.diff
> >>>
> >>> This patch consolidates all of the functions that map cam control blocks 
> >>> for DMA into one central function.  This change is a precursor to adding 
> >>> new features to the I/O stack.  It is mostly mechanical.  If you are 
> >>> running current on a raid or scsi card, especially if it is a lesser used 
> >>> one, I would really like you to apply this patch and report back any 
> >>> problems.  If it works you should notice nothing.  If it doesn't work you 
> >>> will probably panic immediately on I/O or otherwise no I/O will happen.
> >>
> >> I haven't tested it yet.  My only comment from reading it though would be 
> >> to make subr_busdma.c be dependent on cam, since it can only be used from 
> >> cam.  We've grown sloppy about noting these dependencies in the tree...
> >>
> >> Warner
> >
> > Hmmm, if it's only used by cam, why isn't it in cam/ rather than kern/ ?
> 
> kib pointed out drivers that use ccbs but do not depend on cam.  

Ahh, I didn't realize.

> I also 
> intend to consolidate many of the busdma_load_* functions into this 
> subr_busdma.c eventually.  I will add a load_bio and things like load_uio 
> and load_mbuf don't need to be re-implemented for every machine.  I will 
> define a MD function that allows you to add virtual or physical segments 
> piecemeal (as they all currently have) so that function may be called for 
> each member in the uio, mbuf, ccb, or bio.

I'm afraid the current near-identicalness of things like the load_mbuf
implementations has more to do with the cut-and-paste nature of how the
non-x86 implementations came to be than with actual correctness.

A proper implementation of the load_mbuf routines on architectures with
VIVT cache should involve setting some flags in the map so that the sync
operations can be handled differently for mbufs than for anonymous
memory.  (Mbufs are allowed to bend the rules about DMA buffers being
aligned to cacheline boundaries.)

The uio-related busdma operations for VIVT cache platforms are probably
just plain wrong -- like "would cause a panic" type wrong if they were
actually invoked.

I posted a set of patches that fix all the problems I know of in the
armv4 busdma implementation, except for the uio stuff.  It didn't get
much comment at the time and lacks a champion who can actually commit
the code.  They won't even apply cleanly anymore because of other
changes that have happened; I guess I should go re-spin the patchset and
post it again.

-- Ian




Re: Call for testers, users with scsi cards

2012-12-04 Thread Ian Lepore
On Tue, 2012-12-04 at 14:49 -0700, Warner Losh wrote:
> On Dec 4, 2012, at 2:36 PM, Jeff Roberson wrote:
> 
> > http://people.freebsd.org/~jeff/loadccb.diff
> > 
> > This patch consolidates all of the functions that map cam control blocks 
> > for DMA into one central function.  This change is a precursor to adding 
> > new features to the I/O stack.  It is mostly mechanical.  If you are 
> > running current on a raid or scsi card, especially if it is a lesser used 
> > one, I would really like you to apply this patch and report back any 
> > problems.  If it works you should notice nothing.  If it doesn't work you 
> > will probably panic immediately on I/O or otherwise no I/O will happen.
> 
> I haven't tested it yet.  My only comment from reading it though would be to 
> make subr_busdma.c be dependent on cam, since it can only be used from cam.  
> We've grown sloppy about noting these dependencies in the tree...
> 
> Warner

Hmmm, if it's only used by cam, why isn't it in cam/ rather than kern/ ?

-- Ian



Re: Is cross-world building broken?

2012-12-02 Thread Ian Lepore
On Fri, 2012-11-30 at 14:16 -0800, Garrett Cooper wrote:
> On Nov 30, 2012, at 10:14 AM, Ian Lepore  
> wrote:
> 
> > On Fri, 2012-11-30 at 09:36 -0800, Simon J. Gerraty wrote:
> >> On Fri, 30 Nov 2012 08:15:03 -0700, Ian Lepore writes:
> >>> So when did this break, and why can't it be fixed?  I've been using
> >> 
> >> Sorry I missed the beginning of this thread,
> >> is anything broken?
> > 

Apparently the only thing that's broken is the original implication in
this mail thread that something is broken, and also the advice that
DESTDIR must not be specified on a make command line for targets other
than installworld, installkernel, or distribute.  (To be fair, the
original was more of a question, but then what followed were replies
that reinforced an implication that something is really wrong.)

I finally got updated with -current this morning and I find that my
cross-build scripts which just blindly always pass DESTDIR= on
the make command line regardless of the targets being built still work
fine, like they always have.

> > I haven't experienced anything myself, I assumed because I've been too
> > busy to update any of my -current sandboxes for weeks.  I was just going
> > by the earlier messages in this thread, which were roughly 
> > 
> >  "A cross-build breaks early in the process if DESTDIR is set" 
> > 
> > followed by 
> > 
> >  "DESTDIR must only be set for installworld, buildworld, and distribute
> > targets."
> 
> s/buildworld/installkernel/

Oops, indeed; sorry for contributing to the confusion.

-- Ian




Re: New jail does not understand nullfs

2012-12-02 Thread Ian Lepore
On Sun, 2012-12-02 at 01:17 -0600, Matt Donovan wrote:
> When attempting to start the jail in question the following happens
> 
> 
> server# jail -c poudriere
> jail: poudriere: unknown parameter: allow.mount.nullfs
> 
> 
> Below is my jail.conf
> 
> poudriere {
>  name=poudriere;
>  host.hostname=poudriere;
> ip4.addr="192.168.1.30";
>persist;
> children.max=10;
> allow.mount;
>  mount.devfs;
>   allow.mount.nullfs;
>   allow.raw_sockets;
>   allow.socket_af;
>   allow.sysvipc;
>   enforce_statfs=1;
>   path=/newsystem/jail/poudriere;
>   exec.stop="umount -a";
> }
> 
> Does the switch not work yet? As I am using CURRENT with the latest
> revision.
> 

That "mount.devfs" doesn't look right, should that have "allow." on the
front?  I wonder if that's the problem and the error report is off by
one line or something?

-- Ian




Re: RFC: sysctl -f filename

2012-12-01 Thread Ian Lepore
On Sun, 2012-12-02 at 01:50 +0900, Hiroki Sato wrote:
> Hi,
> 
>  I would like comments about the attached patch for sysctl(8) to add a
>  new option "-f filename".  It supports reading of a file with
>  key=value lines.
> 
>  As you probably know, we already have /etc/sysctl.conf and it is
>  processed by rc.d/sysctl shell script in a line-by-line basis.  The
>  problem I want to fix is a confusing syntax of /etc/sysctl.conf.  The
>  file supports a typical configuration file syntax but problematic in
>  some cases.  For example:
> 
>   kern.coredump=1
> 
>  works well in /etc/sysctl.conf, but
> 
>   kern.coredump="1"
> 
>  does not work.  Similarly, it is difficult to use whitespaces and "#"
>  in the value:
> 
>   OK: kern.domainname=domain\ name\ with\ spaces
>   NG: kern.domainname="domain name with spaces"
>   NG: kern.domainname=domain\ name\ including\ #\ character
>   NG: kern.domainname=domain\ name\ including\ \#\ character
> 
>  The attached patch solves them, and in addition it displays an error
>  message with a line number if there is something wrong in the file
>  like this:
> 
>   % cat -n /etc/sysctl.conf
>   ...
>   10  kern.coredump=1
>   11  kern.coredump2=1
>   ...
> 
>   % /etc/rc.d/sysctl start
>   sysctl: kern.coredump at line 10: Operation not permitted
>   sysctl: unknown oid 'kern.coredump2' at line 11
> 
>   # /etc/rc.d/sysctl start
>   kern.coredump: 1 -> 1
>   sysctl: unknown oid 'kern.coredump2' at line 11
> 
>  Any comments are welcome.
> 
> -- Hiroki

This is cool, thanks.  

Shouldn't there be an update to sysctl.conf(5) to mention the ability to
handle quoting now?
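
To make the quoting cases above concrete, here is a minimal Python
sketch of a key=value line parser that accepts quoted values and
reports line numbers.  It is not the attached patch and does not claim
to match its rules: parse_sysctl_line and parse_sysctl_text are
hypothetical names, and the quoting rules (double quotes protect spaces
and "#", a backslash escapes the next character) are assumptions chosen
for illustration.

```python
def parse_sysctl_line(line):
    """Parse one 'name=value' line; return (name, value) or None for blank/comment lines.

    Assumed rules (illustrative only): '#' starts a comment unless quoted or
    backslash-escaped; double quotes group text; backslash escapes one character.
    """
    out = []
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            out.append(line[i + 1])    # keep the escaped character literally
            i += 2
            continue
        if ch == '"':
            in_quotes = not in_quotes  # quotes group text but are not kept
        elif ch == "#" and not in_quotes:
            break                      # an unquoted '#' begins a comment
        else:
            out.append(ch)
        i += 1
    if in_quotes:
        raise ValueError("unterminated quote")
    text = "".join(out).strip()
    if not text:
        return None
    name, sep, value = text.partition("=")
    if not sep or not name.strip():
        raise ValueError("expected name=value")
    return name.strip(), value.strip()


def parse_sysctl_text(text):
    """Return (settings, errors); errors carry 1-based line numbers."""
    settings, errors = [], []
    for lineno, line in enumerate(text.splitlines(), start=1):
        try:
            parsed = parse_sysctl_line(line)
        except ValueError as exc:
            errors.append(f"line {lineno}: {exc}")
            continue
        if parsed is not None:
            settings.append((lineno, *parsed))
    return settings, errors
```

Under these assumed rules both kern.coredump=1 and kern.coredump="1"
yield the value 1, and a malformed line is reported with its line
number instead of being passed to the shell.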

-- Ian


Re: Is cross-world building broken?

2012-11-30 Thread Ian Lepore
On Fri, 2012-11-30 at 09:36 -0800, Simon J. Gerraty wrote:
> On Fri, 30 Nov 2012 08:15:03 -0700, Ian Lepore writes:
> >So when did this break, and why can't it be fixed?  I've been using
> 
> Sorry I missed the beginning of this thread,
> is anything broken?
> 

I haven't experienced anything myself, I assumed because I've been too
busy to update any of my -current sandboxes for weeks.  I was just going
by the earlier messages in this thread, which were roughly 

  "A cross-build breaks early in the process if DESTDIR is set" 

followed by 

  "DESTDIR must only be set for installworld, buildworld, and distribute
targets."  

If either of those statements is true, that strikes me as breakage,
because it never used to be that way.

Maybe I've misunderstood something about the earlier messages in the
thread.

-- Ian


Re: Is cross-world building broken?

2012-11-30 Thread Ian Lepore
On Wed, 2012-11-28 at 16:19 -0800, Garrett Cooper wrote:
> On Wed, Nov 28, 2012 at 4:07 PM, Adrian Chadd  wrote:
> > top posting, out of laziness and busy-ness at work..
> >
> > Ok. So:
> >
> > * make installworld/installkernel/distribution - set DESTDIR on the command line
> > * make buildworld/make buildkernel/make  -
> > don't set DESTDIR?
> 
> (re-CCing -current@)
> Correct. DESTDIR should only be used for install targets (install,
> installworld, installkernel, distribution, etc), and not for any of
> the other build-related targets on FreeBSD. [as the meme goes] "If you
> set DESTDIR when building, you're gonna have a bad time".
> HTH!
> -Garrett

So when did this break, and why can't it be fixed?  I've been using
DESTDIR= on the command line for cross-builds as long as I've been doing
cross-builds and have never had a problem; doing this is just built in
to the scripts I use for cross-building.  It's still working for me in
some sandboxes that haven't been updated since r240278.  

Also, how about "make DESTDIR=foo buildkernel installkernel", which is
something I've been doing for years?  Are you saying that it won't work
now because DESTDIR will be set during the build part?

This just seems all wrong.

-- Ian


Re: Q) KLDload error

2012-11-16 Thread Ian Lepore
On Sat, 2012-11-17 at 11:04 +0900, ken wrote:
> From: Lucas James 
> > 
> > You will need to rebuild and install the virtualbox-ose-kmod port.
> > 
> > 
> > regards,
> > Lucas
> 
>   Yes, I did and yet I have the following error with "kldload vboxdrv".
> 
> Is "vm_page_lock_queues" renamed?  It is in 
> "./work/VirtualBox-4.1.22/src/VBox/Runtime/r0drv/freebsd/memobj-r0drv-freebsd.c"
> 
> # tail -f /var/log/messages 
>   :: :
> Nov 17 10:53:17 t3 pkg: virtualbox-ose-kmod-4.1.22 installed
> Nov 17 10:53:55 t3 kernel: link_elf_obj: symbol vm_page_lock_queues undefined
> Nov 17 10:53:55 t3 kernel: linker_load_file: Unsupported file type
>   :: :


It's not renamed, it's gone in favor of per-queue locks.  See r242941.

-- Ian


Re: compiler info in kernel identification string

2012-11-14 Thread Ian Lepore
On Wed, 2012-11-14 at 10:25 +0100, Dimitry Andric wrote:
> On 2012-11-14 00:43, Mateusz Guzik wrote:
> > avg@ suggested to include compiler version in the kernel so that it's
> > present in uname (and one can easily tell what was used to compile it).
> >
> > Here is my attempt:
> > http://people.freebsd.org/~mjg/patches/newvers-compiler.diff
> >
> > Basically adds compiler name and version/revision after revision of
> > system sources.
> >
> > Sample output from dirty git sources:
> > gcc:
> > FreeBSD 10.0-CURRENT #7 r242962=264d569-dirty(gcc-4.2.1-20070831): Wed
> > Nov 14 00:11:51 CET 2012
> >
> > clang:
> > FreeBSD 10.0-CURRENT #8 r242962=264d569-dirty(clang-r162107): Wed Nov 14
> > 00:12:26 CET 2012
> >
> > Sample output from svn with gcc:
> > FreeBSD 10.0-CURRENT #1 r243006:243007M(gcc-4.2.1-20070831): Wed Nov 14
> > 00:41:23 CET 2012
> >
> > I have no strong opinions on format, I just want this information easily
> > accessible.
> 
> Yes, this is handy to have.  Note that gcc already puts an id string
> into each object file it produces, but sometimes during linking, these
> can be stripped out...
> 
> Regarding the format, I don't see the necessity of parsing the version
> information, which will always be very fragile.  Just include the
> complete version string in the compiler identification, similar to what
> Linux does, e.g. on a CentOS box:
> 
>   $ gcc -v 2>&1 | grep 'version '
>   gcc version 4.4.6 20120305 (Red Hat 4.4.6-4) (GCC)
> 
>   $ dmesg | grep 'gcc version '
>   Linux version 2.6.32-279.2.1.el6.x86_64 (mockbu...@c6b7.bsys.dev.centos.org) (gcc version 4.4.6 20120305 (Red Hat 4.4.6-4) (GCC) ) #1 SMP Fri Jul 20 01:55:29 UTC 2012
> 
> That way, you are sure never to lose information.  This also works for
> gcc from ports (which is the reason for the space after 'version' in the
> grep command):
> 
>   $ gcc47 -v 2>&1 | grep 'version '
>   gcc version 4.7.3 20120929 (prerelease) (FreeBSD Ports Collection)
> 
> I realize this is a bit long, but it is better to have complete than
> stripped information.

Rather than just taking whatever the compiler emits, the proposed patch
seems to be carefully crafted to avoid breaking existing 3rd party tools
which parse uname output based on the location of whitespace.  I'm not
sure how important that is given that the uname manpage doesn't document
the output format as if it were somehow rigidly specified.  
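
A short Python sketch of why whitespace matters to such tools.  The
first string is the sample gcc output quoted above; the second is a
hypothetical variant that embeds a compiler version string containing
spaces (its exact format is an assumption for illustration only).
weekday_by_position stands in for a third-party parser and is not a
real tool.

```python
def fields(version_line):
    """Split a version line the way a naive whitespace-based tool would."""
    return version_line.split()


def weekday_by_position(version_line):
    """A naive consumer that assumes the build date starts at the fifth field."""
    return fields(version_line)[4]


# Sample output from the proposed patch (compiler info contains no spaces).
compact = ("FreeBSD 10.0-CURRENT #7 r242962=264d569-dirty(gcc-4.2.1-20070831): "
           "Wed Nov 14 00:11:51 CET 2012")

# Hypothetical format embedding an unparsed compiler string (contains spaces).
verbose = ("FreeBSD 10.0-CURRENT #7 r242962=264d569-dirty (gcc version 4.2.1 "
           "20070831): Wed Nov 14 00:11:51 CET 2012")
```

With the compact form the fifth field is still the weekday; with the
space-containing form every field after the revision shifts and a
position-based parser picks up part of the compiler string instead.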

I may be more sensitive to this than usual right now, after having
caused a bunch of grief to our manufacturing folks at work yesterday by
removing a useless line of output about an obsolete feature from a
script that I didn't realize they use in their automation.

-- Ian


Re: 9.1-RC3 feels okay :-)

2012-11-08 Thread Ian Lepore
On Thu, 2012-11-08 at 08:45 -0700, Warren Block wrote:
> On Thu, 8 Nov 2012, CeDeROM wrote:
> 
> > I have tested additional options in xorg runtime :-)
> >
> > With the patched xorg mouse driver 1.7.1 (or driver version >=1.7.2)
> > situation is following:
> >
> > 1. With hald and dbus no xorg.conf file is needed. However it might be an
> > option to pass some additional feature parameters with xorg.conf.
> > 2. With no hald and dbus, mouse and keyboard do not work in xorg unless
> > Option "AllowEmptyInput" "False" is added to Section "ServerLayout" by
> > hand in xorg.conf. Without this option input does not work even if
> > xorg.conf defines it! AllowEmptyInput=False forces Xorg to detect input
> > devices at startup.
> 
> No.  AllowEmptyInput is wrong.  It was causing so many problems that it 
> has been removed from later xorg-server releases.

This is disturbing news.  We build embedded systems at work that use X
for presentation and have no input devices.  I understand that
AllowEmptyInput is inappropriate to work around the problem we're
discussing here, but that doesn't mean it's never needed.

> Option "AutoAddDevices" "Off" is the one that means "don't use Hal to 
> detect input devices".
> 
> > Thank you for this hint! This could be added to the handbook :-)
> > AllowEmptyInput=False should be a default for Xorg IMO we can report it to
> > the Xorg project! :-)
> 
> Really, the simplest solution is to build xorg-server with the HAL 
> option disabled.  I agree that this should be the default.

So if you're using xorg-server that was built with hal included (maybe
because you're more a package than a ports kind of person and have no
control over the build), is AutoAddDevices still the right option to
manipulate?  That is, will it disable the use of hal and fall back to
honoring the xorg.conf input devices even if the server was built with
hal support?

-- Ian


Re: 9.1-RC3 feels okay :-)

2012-11-06 Thread Ian Lepore
On Wed, 2012-11-07 at 00:34 +0100, CeDeROM wrote:
> Isn't this a Xorg bug then? When I have no configuration file Hal should
> provide the configuration, so sooner or later the mouse should start
> moving... but it does not.
> 
> Do I get http://www.wonkity.com/~wblock/docs/html/aei.html correct that
> when I am using xorg.conf there is no need for Hal and when I am using Hal
> there is no need for xorg.conf?
> 
> Thanks :-)

I think that is true in general, usually you have one or the other.
There are times when you need an xorg.conf and you may have hald running
for other reasons, and then you have to get them to play nice together.
I had that situation at one time (I needed to customize something about
my monitor that wasn't auto-detected), but now it just works for me
without any xorg.conf.  When I did have both, turning off AutoAddDevices
and configuring sysmouse as the input device worked for me (but that was
on 8.x, not 9.x, and probably an older port of the X server).

I've also seen a couple sites recommend turning off AutoAddDevices if
you manually configure the mouse without mentioning hal specifically.
They just say things like "X will automatically find your mouse unless
you turn off AutoAddDevices."  It's unclear to me whether X is able to
do that without hal, or maybe those statements are just glossing over
important details to keep the explanation simple.

-- Ian


Re: 9.1-RC3 feels okay :-)

2012-11-06 Thread Ian Lepore
On Tue, 2012-11-06 at 22:57 +0100, Julian H. Stacey wrote:
> Hi,
> Reference:
> > From:   CeDeROM  
> > Date:   Tue, 6 Nov 2012 22:14:03 +0100 
> > Message-id: 
> >  
> 
> CeDeROM wrote:
> > I have also noted that the mouse cursor is very often not moving in Xorg
> > but it works in the console! I need to move the mouse during startx or
> > restart Xorg for the mouse to start moving. Is it a bug or a feature? :-)
> > 
> > In the xorg.conf:
> > Section "InputDevice"
> >  Identifier "Mouse0"
> >  Driver "mouse"
> >  Option "Protocol" "auto"
> >  Option "Device" "/dev/sysmouse"
> >  Option "ZAxisMapping" "4 5 6 7"
> > EndSection
> 
> Inside
>   Section "ServerLayout"
> Just after
>   InputDevice"Mouse0" "CorePointer"
> Append 
>   Option  "AllowEmptyInput" "False"
> 
> Cheers,
> Julian

Before you do that, read this:

 http://www.wonkity.com/~wblock/docs/html/aei.html

-- Ian


Re: FreeBSD as read-only firmware

2012-11-03 Thread Ian Lepore
On Sat, 2012-11-03 at 08:01 -0700, Mehmet Erol Sanliturk wrote:
> I do not know the exact data transmission rate of SDHC cards, but I think
> it is faster than CD or DVD. For CD and DVD, there are at present no
> read-only CD or DVD drives; they have disappeared from the market. For
> writable CD or DVD, it may be possible to append some files at the end of
> the recorded area, and the media may be corrupted by re-recording (I
> think).

Expect roughly 22-25MB/sec on a modern SDHC with a 4-bit datapath.
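
For a rough sense of where a figure like that comes from, the Python
sketch below computes the raw bus ceiling as clock rate times bus
width.  The 50 MHz clock is an assumption (the SD high-speed mode);
real transfers land below the ceiling because of command and protocol
overhead.  bus_ceiling_mb_per_s is a hypothetical helper name.

```python
def bus_ceiling_mb_per_s(clock_hz, bus_width_bits):
    """Raw data-line bandwidth in MB/s (1 MB = 10**6 bytes), ignoring all overhead."""
    bits_per_second = clock_hz * bus_width_bits
    return bits_per_second / 8 / 1_000_000


# Assumed 50 MHz high-speed clock with the 4-bit datapath mentioned above.
ceiling = bus_ceiling_mb_per_s(50_000_000, 4)
```

The same clock with a 1-bit datapath gives a quarter of that ceiling,
which is why the bus width matters so much here.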

Be aware that there's no way to truly write protect an SD card.  There
is a write protect tab on a full-size card (but not on a MicroSD), but
it's not enforced in the card's hardware, it is a polite request to the
system "please don't write to this card" and some systems don't even
have the hardware to sense the switch position.  

Since it's flash-memory based, it also may corrupt the media on write,
including the possibility of corrupting existing data that has no
relation to the new data being written.  That is, you could have a
write-protected partition and a write-enabled partition on the same
SDCard, and writing into the write-enabled partition can damage data on
the write-protected partition.  This is because you have no control over
the way the embedded flash microcontroller allocates storage internally,
and it is free to place data pages from unrelated filesystems into the
same blocks (block = erase/programming sized unit).
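
The following toy model, in Python, illustrates the failure mode.  It
is not how any particular card's controller works; the class name
ToyFlash, the block size, and the allocation policy (append each
written page to the current open block) are all assumptions made for
illustration.

```python
PAGES_PER_BLOCK = 4  # assumed erase-block size, in pages


class ToyFlash:
    """A toy flash translation layer: logical pages land wherever the
    controller chooses, regardless of which partition they belong to."""

    def __init__(self):
        self.blocks = [[]]   # each block is a list of (logical_page, data)
        self.location = {}   # logical_page -> index of the block holding it

    def write(self, logical_page, data):
        # Controller policy (assumed): append to the current open block.
        if len(self.blocks[-1]) == PAGES_PER_BLOCK:
            self.blocks.append([])
        self.blocks[-1].append((logical_page, data))
        self.location[logical_page] = len(self.blocks) - 1

    def fail_block(self, block_index):
        """Simulate a failed erase/program cycle: every page in the block is lost."""
        lost = [page for page, _ in self.blocks[block_index]]
        self.blocks[block_index] = []
        for page in lost:
            # Only forget pages whose current copy lived in this block.
            if self.location.get(page) == block_index:
                del self.location[page]
        return lost


flash = ToyFlash()
flash.write(0, "ro-data-a")    # pages 0 and 1: the "write-protected" partition
flash.write(1, "ro-data-b")
flash.write(100, "rw-data")    # page 100: the writable partition
lost = flash.fail_block(flash.location[100])
```

In this model the write to page 100 shares an erase block with pages 0
and 1, so a failure while programming that block takes the
"write-protected" data with it.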

I suspect all off-the-shelf nand-flash based storage has the same
problems, but CF and SDCard are the only ones I've got hands-on
experience with.  At work we're now moving away from CF and SDCard and
towards putting nand flash chips directly onto our boards, and using
FreeBSD to access them rather than relying on the behaviors of some
embedded microcontroller we know nothing about.

-- Ian


Re: FILE's _file can only hold a short

2012-11-01 Thread Ian Lepore
On Wed, 2012-10-31 at 11:12 -0700, m...@freebsd.org wrote:
> I seem to recall a thread earlier on this limitation, but looking at
> actual libc/stdio sources, the 4 year old check for open(2)'s fd being
> less than SHRT_MAX is still there.  I thought I saw a patch to change
> this to an int, but it's not in the tree.  Was this in a PR or a
> mailing list thread or am I just imagining things?
> 
> We've run into this limitation at work, where some processes have
> around 32k open file descriptors and then try to use the libc FILE
> interface.  Since we control ABI we can just change this to int, but I
> had been hoping there was a FreeBSD revision we could pull instead of
> having another diff.

FWIW, I also remember some discussion recently (this year) on some
mailing list about this, but I can't find it now.  I thought it was
somehow related to in-lib versus external uses of the funopen()
function, but I may be conflating two unrelated discussions in my head.
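
As a small illustration of the limit being discussed, this Python
sketch emulates storing a descriptor number in a 16-bit signed field.
It assumes short is 16 bits wide; store_in_short and fits_in_short are
hypothetical helpers, not libc code.

```python
SHRT_MAX = 32767  # largest value of a 16-bit signed short


def store_in_short(fd):
    """Return the value read back after storing fd in a 16-bit signed field
    (two's-complement wraparound)."""
    return ((fd + 0x8000) & 0xFFFF) - 0x8000


def fits_in_short(fd):
    """The kind of range check needed before using such a field."""
    return 0 <= fd <= SHRT_MAX
```

A process holding around 32k descriptors crosses this boundary: a
descriptor such as 40000 would read back as a negative number if it
were stored unchecked.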

-- Ian


Re: WITHOUT_GNU_[COMPAT|SUPPORT]

2012-10-13 Thread Ian Lepore
On Thu, 2012-10-11 at 19:45 +0200, Gabor Kovesdan wrote:
> On 11-10-2012 19:09, Ian Lepore wrote:
> > I want to build grep without the gnu regex library.  The makefile for
> > usr.bin/grep  contains
> > 
> >   .if !defined(WITHOUT_GNU_COMPAT)
> > 
> > And man src.conf documents WITHOUT_GNU_SUPPORT but doesn't mention
> > WITHOUT_GNU_COMPAT.  Is this a typo in the makefile, or an omission from
> > the src.conf manpage?
> 
> That time when I added the WITHOUT_GNU_COMPAT knob I didn't make it
> global, just used it for testing grep. I didn't think it was of any use
> for users and I wasn't aware of the existence of WITHOUT_GNU_SUPPORT. If
> it seems useful, I can change grep to use this global flag instead of
> the custom knob and it will just be built without the gnu regex library
> if the knob is set.
> 
> Gabor

As it turns out, no hurry on changing the flag, because bsdgrep built
without the gnu regex library doesn't work well enough to complete a
buildworld.  I filed a PR.

 http://www.freebsd.org/cgi/query-pr.cgi?pr=172677

-- Ian


Re: WITHOUT_GNU_[COMPAT|SUPPORT]

2012-10-11 Thread Ian Lepore
On Thu, 2012-10-11 at 19:45 +0200, Gabor Kovesdan wrote:
> On 11-10-2012 19:09, Ian Lepore wrote:
> > I want to build grep without the gnu regex library.  The makefile for
> > usr.bin/grep  contains
> > 
> >   .if !defined(WITHOUT_GNU_COMPAT)
> > 
> > And man src.conf documents WITHOUT_GNU_SUPPORT but doesn't mention
> > WITHOUT_GNU_COMPAT.  Is this a typo in the makefile, or an omission from
> > the src.conf manpage?
> 
> That time when I added the WITHOUT_GNU_COMPAT knob I didn't make it
> global, just used it for testing grep. I didn't think it was of any use
> for users and I wasn't aware of the existence of WITHOUT_GNU_SUPPORT. If
> it seems useful, I can change grep to use this global flag instead of
> the custom knob and it will just be built without the gnu regex library
> if the knob is set.
> 
> Gabor

That would be helpful to us if you did that, thank you.  We try to avoid
including anything [L]GPL-licensed in the embedded-systems products we
ship at work.

-- Ian


WITHOUT_GNU_[COMPAT|SUPPORT]

2012-10-11 Thread Ian Lepore
I want to build grep without the gnu regex library.  The makefile for
usr.bin/grep  contains

  .if !defined(WITHOUT_GNU_COMPAT)

And man src.conf documents WITHOUT_GNU_SUPPORT but doesn't mention
WITHOUT_GNU_COMPAT.  Is this a typo in the makefile, or an omission from
the src.conf manpage?

-- Ian


Re: sysctl vs ifconfig vs other (was Re: sysctl-controlled key-value store ?)

2012-10-07 Thread Ian Lepore
On Sun, 2012-10-07 at 17:53 +0200, Luigi Rizzo wrote:
> Access through sysctl is incredibly easy from both userspace and
> from a C application, because all the work is done in the kernel
> side, whereas other mechanisms (ioctl, i'd rather leave kvm apart
> as we really don't want that!) require the definition of a specific
> API (ioctl, structs) _and_ some amount of wrapping code in userspace.
> 
> cheers
> luigi

A potential problem with sysctl is its "one thing at a time" nature.
When you pack up a bunch of related data into a structure and hand it
off to an implementation, that implementation can pretty easily make
sure that all the data related to the config request is sane.  If you
have to make a series of sysctl calls to achieve some complex config
task, what happens when you're 2/3 of the way through the series and a
call fails?  Who backs out the partial config that got accomplished?

If you go too far down this path you end up with something that looks a
lot like the unmitigated mess which is the SNMP control API.
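
The partial-failure problem can be sketched as follows.  This is a
hypothetical Python model, not the sysctl(3) API: apply_settings, the
store dictionary standing in for kernel state, and the validate
callback are all assumptions.  It shows the bookkeeping a caller needs
when configuration arrives one value at a time.

```python
def apply_settings(store, settings, validate):
    """Apply (name, value) pairs one at a time; undo everything if one fails.

    store    -- dict standing in for kernel state (assumption)
    validate -- callable(name, value) -> bool, standing in for the kernel's check
    """
    undo = []  # (name, had_old_value, old_value), newest last
    for name, value in settings:
        if not validate(name, value):
            # Back out the partial configuration in reverse order.
            for old_name, had_old, old_value in reversed(undo):
                if had_old:
                    store[old_name] = old_value
                else:
                    del store[old_name]
            return False
        undo.append((name, name in store, store.get(name)))
        store[name] = value
    return True
```

When the whole request is handed over as one structure, the
implementation can validate everything before changing anything, and
an undo list like this is unnecessary.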

-- Ian


Re: [SPAM]Re: Latest -CURRENT/i386 could not start under VirutalBox 4.1.18 and 4.2 (Windows host): hangs up after atrtc0 detection

2012-10-04 Thread Ian Lepore
On Thu, 2012-10-04 at 22:24 +0200, Marek Salwerowicz wrote:
> On 2012-10-04 20:51, Lev Serebryakov wrote:
> > Hello, Marek.
> > You wrote on 3 October 2012, 23:17:35:
> >
> >>> atrtc0:  port 0x70-0x71 on acpi0
> > MS> still the same in my environment, running FreeBSD 9.1 under ESXi5.1 host
> > MS> Do you have any solution?
> >   In my case it was local patch for exotic embedded chipset...
> Can you send me the patch so I can have a look if I don't use the same 
> chipset ?
> 
> Regards,

It is the patch attached to this PR:

 http://www.freebsd.org/cgi/query-pr.cgi?pr=170705

The patch fixes a problem with old AMD Geode chipsets, but causes a hang
at atrtc attach when run under virtualbox, and I haven't had time yet to
install and learn to use vbox enough to debug it.

-- Ian


Re: Latest -CURRENT/i386 could not start under VirutalBox 4.1.18 and 4.2 (Windows host): hangs up after atrtc0 detection

2012-09-19 Thread Ian Lepore
On Wed, 2012-09-19 at 15:10 -0700, Adrian Chadd wrote:
> On 19 September 2012 14:57, Garrett Cooper  wrote:
> > On Wed, Sep 19, 2012 at 1:46 PM, Ian Lepore
> >  wrote:
> >
> > ...
> >
> >> Yes, exactly.  I updated the PR to request that my patch not get
> >> committed because it locks up virtualbox.  I hope to find time soon to
> >> learn enough about installing/configuring virtualbox to figure out what
> >> the problem is (offhand, I suspect it hangs in the loop that probes for
> >> the need to re-index, because vbox doesn't quite emulate the hardware
> >> behavior fully).
> >
> > Why not just detect VBox and disable that functionality? VMware at
> > least has a sane way of determining whether or not you're running it
> > based on the SMBios ident..
> 
> Sure, but that doesn't answer the underlying reason(s) of "why is it
> failing?". :-)

Yeah, I'd much rather understand a problem than tap dance around it, at
least for starters.  Figuring out what's really going on may lead to a
discovery that it would fail in other circumstances as well, or it may
lead to a bugfix in vbox if that's where the problem lies.  I'm just a
bit too busy with $work right now to dig into it.

-- Ian


Re: Latest -CURRENT/i386 could not start under VirutalBox 4.1.18 and 4.2 (Windows host): hangs up after atrtc0 detection

2012-09-19 Thread Ian Lepore
On Wed, 2012-09-19 at 14:30 -0700, Adrian Chadd wrote:
> On 19 September 2012 14:12, Ian Lepore  wrote:
> 
> >> Right. Being totally clueless, is atrtc_start() called just at
> >> probe/attach, or during normal operation?
> >>
> >
> > It's called just once, from the attach() routine for the rtc device.
> 
> Right. Just have it loop over say 100 times, with a 10us sleep between
> each. Shouldn't that be enough?
> 

If by "sleep" you mean any form of pausing or sleeping that waits for a
given amount of time... remember when this code is running we're still
in the process of trying to figure out which clocks can be used for such
purposes.  That leaves DELAY(), which does pretty much the equivalent of
what the loop in question is doing.  Hmmm, but DELAY() does have the
advantage of busy-looping for a known amount of time, making it easier
to constrain the time spent in the loop regardless of the speed of the
cpu.  I'll have to look into how DELAY() is implemented for x86 and see
if it's usable in this context.
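
The bounded wait being discussed might look like the following,
sketched in Python rather than kernel C.  wait_for_flag, read_status,
and delay_us are hypothetical names; delay_us stands in for a
calibrated busy-wait such as DELAY(), and the default limits are the
100 tries and 10us suggested above.

```python
def wait_for_flag(read_status, flag_mask, delay_us, max_tries=100, step_us=10):
    """Poll read_status() until flag_mask is set, giving up after max_tries.

    Returns True if the flag was seen and False on timeout, so the caller
    can print an error and carry on instead of hanging.
    """
    for _ in range(max_tries):
        if read_status() & flag_mask:
            return True
        delay_us(step_us)  # bounded pause, independent of CPU speed
    return False
```

With these defaults the worst case is 100 pauses of 10us, about one
millisecond, regardless of how fast the cpu spins.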

-- Ian


Re: Latest -CURRENT/i386 could not start under VirutalBox 4.1.18 and 4.2 (Windows host): hangs up after atrtc0 detection

2012-09-19 Thread Ian Lepore
On Wed, 2012-09-19 at 14:08 -0700, Adrian Chadd wrote:
> On 19 September 2012 14:05, Ian Lepore  wrote:
> 
> >> Add something to atrtc_start() to only loop over that loop say, 64k
> >> times before dropping out; and print an error if it hits that
> >> condition.
> >>
> >> Also, what's that RTCSA_8192 bit do?
> >
> > That should set the interrupt rate really high, to minimize the time
> > wasted waiting for the status bit to change in the register.  Maybe
> > that's the part that vbox isn't emulating well and so it never simulates
> > an interrupt and leaves that loop.  Or maybe because the loop is a tight
> > busy-wait the emulator never gets control to simulate the occurrence of
> > the interrupt.
> 
> Right. Being totally clueless, is atrtc_start() called just at
> probe/attach, or during normal operation?
> 

It's called just once, from the attach() routine for the rtc device.

-- Ian


Re: Latest -CURRENT/i386 could not start under VirutalBox 4.1.18 and 4.2 (Windows host): hangs up after atrtc0 detection

2012-09-19 Thread Ian Lepore
On Wed, 2012-09-19 at 14:00 -0700, Adrian Chadd wrote:
> On 19 September 2012 13:54, Lev Serebryakov  wrote:
> > Hello, Ian.
> > You wrote on 20 September 2012, 0:46:24:
> >
> > IL> Yes, exactly.  I updated the PR to request that my patch not get
> > IL> committed because it locks up virtualbox.  I hope to find time soon to
> > IL> learn enough about installing/configuring virtualbox to figure out what
> > IL> the problem is (offhand, I suspect it hangs in the loop that probes for
> > IL> the need to re-index, because vbox doesn't quite emulate the hardware
> > IL> behavior fully).
> >  How could I help? Is it possible to debug kernel on such early stage?
> 
> Add something to atrtc_start() to only loop over that loop say, 64k
> times before dropping out; and print an error if it hits that
> condition.
> 
> Also, what's that RTCSA_8192 bit do?

That should set the interrupt rate really high, to minimize the time
wasted waiting for the status bit to change in the register.  Maybe
that's the part that vbox isn't emulating well and so it never simulates
an interrupt and leaves that loop.  Or maybe because the loop is a tight
busy-wait the emulator never gets control to simulate the occurrence of
the interrupt.
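
For scale, the periodic interrupt rates of an MC146818-style RTC
follow from dividing its 32.768 kHz time base.  The Python sketch below
assumes that RTCSA_8192 corresponds to rate-select value 3 and that the
usual formula applies for rate-select values 3 through 15; treat both
as assumptions to check against the datasheet and rtc.h.

```python
def periodic_rate_hz(rate_select):
    """Periodic interrupt frequency for rate-select values 3..15 (assumed formula)."""
    if not 3 <= rate_select <= 15:
        raise ValueError("formula only assumed valid for rate-select 3..15")
    return 32768 >> (rate_select - 1)


def period_us(rate_select):
    """Time between periodic interrupts, in microseconds."""
    return 1_000_000 / periodic_rate_hz(rate_select)
```

At 8192 Hz the status bit should change roughly every 122us, so a loop
waiting on it ought to exit almost immediately on real hardware.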

-- Ian


Re: Latest -CURRENT/i386 could not start under VirutalBox 4.1.18 and 4.2 (Windows host): hangs up after atrtc0 detection

2012-09-19 Thread Ian Lepore
On Thu, 2012-09-20 at 00:37 +0400, Lev Serebryakov wrote:
> Hello, Freebsd-current.
> You wrote on 20 September 2012, 0:22:00:
> 
> LS>   I've upgraded my FreeBSD-CURRENT Virtual machine, which I use to
> LS>  build router's NanoBSD image, to today's morning (MSK time, GMT+4)
> LS>  revision. Unfortunately, I cannot provide exact version, as sources
> LS>  are in this unbootable VM too :)
>   Revision is 240689
> 
>   It looks like patch to RTC which is useful on Geode LX causes this
>  hang.
> 

Yes, exactly.  I updated the PR to request that my patch not get
committed because it locks up virtualbox.  I hope to find time soon to
learn enough about installing/configuring virtualbox to figure out what
the problem is (offhand, I suspect it hangs in the loop that probes for
the need to re-index, because vbox doesn't quite emulate the hardware
behavior fully).

-- Ian


Re: Clang as default compiler November 4th

2012-09-13 Thread Ian Lepore
On Wed, 2012-09-12 at 19:08 -0700, Steve Kargl wrote:
> In regards to my initial post in this thread, I was just trying
> to assess whether any benchmarks have been performed on FreeBSD
> for floating point generated by clang.  Other than the limited
> testing that I've done, it appears that the answer is 'no'.
> 

We have src/tools/tests/testfloat and src/tools/regression/lib/msun.  I
know nothing about the former (just noticed it for the first time).  The
latter I think is a set of correctness tests rather than performance
tests.

-- Ian


Re: [patch] mmap() MAP_TEXT implementation (to use for shared libraries)

2012-09-07 Thread Ian Lepore
On Fri, 2012-09-07 at 21:53 +0300, Konstantin Belousov wrote:
> On Fri, Sep 07, 2012 at 12:48:19PM -0600, Ian Lepore wrote:
> > On Fri, 2012-09-07 at 21:41 +0300, Konstantin Belousov wrote:
> > > On second thought, I do not like your proposal either. +x is set for
> > > shebang scripts, and allowing PROT_EXEC to set VV_TEXT for them means
> > > that such scripts are subject to write denial.
> > 
> > You say that like it's a bad thing.  I hate it when I accidentally edit
> > a script that's running and then the script fails because I did.  I
> > would be much happier if it acted just like any other executable and
> > prevented modification while it's running.
> 
> For me, it is indeed bad if another user can block modifications of my
> script by the mere fact that the script has o+rx rights. I do use real
> machines sometimes.

But you don't feel the same way about a compiled program?

I see absolutely no difference between the two, conceptually.  To me,
changing an application while it's running is bad.  It makes no
difference what language the application is written in.

-- Ian


Re: [patch] mmap() MAP_TEXT implementation (to use for shared libraries)

2012-09-07 Thread Ian Lepore
On Fri, 2012-09-07 at 21:41 +0300, Konstantin Belousov wrote:
> On second thought, I do not like your proposal either. +x is set for
> shebang scripts, and allowing PROT_EXEC to set VV_TEXT for them means
> that such scripts are subject to write denial.

You say that like it's a bad thing.  I hate it when I accidentally edit
a script that's running and then the script fails because I did.  I
would be much happier if it acted just like any other executable and
prevented modification while it's running.

-- Ian


Re: Help. Porting "FreeOCL" fails (atomic_ops.h missing, CLANG++ libc++ issues ...)

2012-09-06 Thread Ian Lepore
On Thu, 2012-09-06 at 13:14 +0200, O. Hartmann wrote:
> I tried to add
> 
> RUN_DEPENDS= ${LOCALBASE}/lib/libatomic_ops.a:${PORTSDIR}/devel/libatomic_ops
> 
> to my provided Makefile, but this doesn't install the port
> devel/libatomic_ops.
> This is weird and inconsistent. I followed exactly the steps suggested in
> the Porter's Handbook, the _DEPENDS= section. The above RUN_DEPENDS= tag
> should ensure a check for the existence of the static library
> 
> /usr/local/lib/libatomic_ops.a
> 
> and if not existent, then install it. It doesn't work. Unfortunately,
> LIB_DEPENDS is considered for "shared libraries", so it isn't suitable.
> But LIB_DEPENDS gets recognized, even if it fails, while RUN_DEPENDS
> seems not to be touched by the build process anyway ...

I am SO not a ports expert, but I think maybe for a static lib you need
BUILD_DEPENDS because it has to be available at build-time rather than
run-time.

-- Ian


Re: atomic_ops.h: missing ...

2012-09-05 Thread Ian Lepore
On Wed, 2012-09-05 at 16:11 +0200, O. Hartmann wrote:
> Hello.
> 
> While fiddling around with software that is looking for an include file
> "atomic_ops.h", which seems to reside in the FreeBSD operating system's
> sources under lib/libkse, I'd like to know whether those architecture
> specific header files are installed in some places, where they could be
> found by the regular ports building environment (without necessarily
> having the OS sources installed).
> 
> I'm working on FreeBSD 10.0-CUR with OS sources installed.
> 
> Doing a "locate atomic_ops.h" reveals
> 
> /usr/local/include/cpl_atomic_ops.h
> /usr/src/lib/libkse/arch/amd64/include/atomic_ops.h
> /usr/src/lib/libkse/arch/arm/include/atomic_ops.h
> /usr/src/lib/libkse/arch/i386/include/atomic_ops.h
> /usr/src/lib/libkse/arch/ia64/include/atomic_ops.h
> /usr/src/lib/libkse/arch/powerpc/include/atomic_ops.h
> /usr/src/lib/libkse/arch/sparc64/include/atomic_ops.h
> 
> Is this include missed by intention or is it a bug?
> 
> Thanks in advance.

There also used to be an atomic_ops.h in the libpthread implementation
in days of old.  I think both it and the one in libkse are intended to
be private to the library implementation and they don't get installed.

-- Ian




Re: pkgng suggestion: renaming /usr/sbin/pkg to /usr/sbin/pkg-bootstrap

2012-08-26 Thread Ian Lepore
On Sun, 2012-08-26 at 20:58 +0200, Baptiste Daroussin wrote:
> On Sun, Aug 26, 2012 at 11:39:07AM -0700, Doug Barton wrote:
> > On 08/26/2012 05:58, Baptiste Daroussin wrote:
> > This isn't the security issue I was talking about by having sbin/pkg
> > pass every command line to local/sbin/pkg.
> > 
> > You keep saying that you have no objections to changing the name. I am
> > asking you to do that. I don't care if it is pkg-bootstrap or something
> > else you like better. But please change the name to not be pkg, and
> > limit the functionality of the tool to bootstrapping the pkg package.
> > 
> 
> I received more feedback in favour of keeping pkg than of changing it to
> pkg-bootstrap, so what should I do, change it because you are asking for it?

Would this get better if the bootstrap tool were named pkg and were
installed on a fresh system at /usr/local/sbin, so that it in effect
replaces itself with the real thing, and has no need to leave a
forwarding stub in /usr/sbin ?

Maybe it could rename itself to /usr/local/sbin/pkg-bootstrap as part of
replacing itself, so that you could re-bootstrap your way out of a
problem later.

Hmmm, might have to be careful that future updates don't replace the
real thing with a newer bootstrap program.  

-- Ian



Re: r239356: does it mean, that synchronous dhcp and dhcplcinet with disabled devd gone?

2012-08-21 Thread Ian Lepore
On Tue, 2012-08-21 at 14:04 -0600, Warner Losh wrote:
> On Aug 21, 2012, at 11:42 AM, Lev Serebryakov wrote:
> 
> > Hello, Ian.
> > You wrote on 21 August 2012, 21:36:30:
> > 
> > IL> I think it's funny how people have this knee-jerk reaction against C++
> > IL> apps.  The devd executable is not exactly an example of bloatware: 374k
> > IL> statically linked (so it already includes this "C++ runtime" that you
> > IL> think is large).  We routinely deploy embedded systems that use apps
> > IL> written exclusively in C++, on systems that only have 32 or 64mb of ram.
> > IL> We've been doing so since the days when the biggest compact flash card
> > IL> you could buy was 64mb.
> >  BTW, a typical MIPS SoC-based router has only 16MiB of flash. And,
> > yes, FreeBSD doesn't fit well in this size now, but why add another
> > mandatory program whose only role is to monitor the network cable and
> > re-run the same program every time?
> 
> You'd typically not run dhclient in daemon mode in a SoC, since you don't 
> want to chew up the memory all the time, and you'd likely replace the system 
> dhclient with one that's simpler...  But the network notification part of 
> devd would be trivial to reproduce if you wanted in a specialized daemon that 
> would do what's required.
> 
> Warner
> 

For example, this script can replace devd as a daemon that restarts
dhclient when any link comes back up...


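 #!/bin/sh
 # Restart dhclient whenever devctl reports LINK_UP on any interface.
 daemon_loop () {
   # Stop cleanly if the event stream ends (read fails at EOF).
   while read -r line; do
     # An empty expansion means the whole line matched the LINK_UP pattern.
     if [ -n "$line" ] && [ -z "${line##!system=IFNET subsystem=* type=LINK_UP}" ] ; then
       # Turn "system=... subsystem=... type=..." into shell variables.
       eval "${line##!}"
       /sbin/dhclient "$subsystem"
     fi
   done
 }
 cat /dev/devctl | daemon_loop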

Of course the right thing to do is invoke the proper rc scripts rather
than dhclient directly... this is just to illustrate how easy it is to
replace devd if your needs are specialized.
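
A variant of the loop above can avoid eval, so that nothing read from
/dev/devctl is ever executed as shell code.  This is only a sketch under
stated assumptions: the function names link_up_ifname and link_up_loop are
made up for illustration, and the only event format assumed is the
"!system=IFNET subsystem=<ifname> type=LINK_UP" line that the script above
already matches.

```shell
#!/bin/sh
# Print the interface name if $1 is a devctl IFNET LINK_UP notification;
# print nothing and return 1 otherwise.  (Hypothetical helper name.)
link_up_ifname () {
  case "$1" in
    '!system=IFNET subsystem='*' type=LINK_UP')
      ifname=${1#'!system=IFNET subsystem='}
      ifname=${ifname%' type=LINK_UP'}
      ;;
    *) return 1 ;;
  esac
  # Reject anything that does not look like a plain interface name.
  case "$ifname" in
    ''|*[!A-Za-z0-9_.]*) return 1 ;;
  esac
  printf '%s\n' "$ifname"
}

# Read notifications on stdin and run the command named in $1 once for
# each interface that comes up.  (Hypothetical helper name.)
link_up_loop () {
  handler=$1
  while IFS= read -r line; do
    if ifname=$(link_up_ifname "$line"); then
      # Keep the handler from consuming the event stream.
      "$handler" "$ifname" < /dev/null
    fi
  done
}

# Intended use on the target system (not run here):
#   link_up_loop /sbin/dhclient < /dev/devctl
```

Passing the handler as an argument means an rc script wrapper could be
substituted for /sbin/dhclient without touching the loop.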

-- Ian




Re: r239356: does it mean, that synchronous dhcp and dhcplcinet with disabled devd gone?

2012-08-21 Thread Ian Lepore
On Tue, 2012-08-21 at 21:01 +0400, Lev Serebryakov wrote:
> IL> The important point is that if you unplug the cable then plug it into a
> IL> different network, now the right thing will happen -- you will acquire
> IL> an address on the new network.  That's the reason that this change is an
> IL> important bugfix for a long standing (many many years) bug in freebsd's
> IL> dhclient.
>   No, I'll be without dhclient at all if I don't use devd :(. And the
>  absence of devd is completely legal, and should be supported. It is a
>  perfectly valid and sensible setup for small devices (think:
>  MIPS-based routers, which are starting to be supported now), where devd
>  could be very costly in terms of both flash size (it is a C++
>  application and needs the C++ runtime!) and memory (the only devd events
>  on such devices are cable plugging/unplugging -- so using devd
>  doesn't add any value for such setups).
> 

I think it's funny how people have this knee-jerk reaction against C++
apps.  The devd executable is not exactly an example of bloatware: 374k
statically linked (so it already includes this "C++ runtime" that you
think is large).  We routinely deploy embedded systems that use apps
written exclusively in C++, on systems that only have 32 or 64MB of RAM.
We've been doing so since the days when the biggest compact flash card
you could buy was 64MB.

Perhaps the right solution is to add a dhclient command line option to
operate in the historical buggy mode: it doesn't exit on link status
changes, and fails to work properly if those link status changes are
happening because the physical connection has moved to another network. 

If so, I think the default should be to work correctly, and folks
depending on the historical buggy behavior will have to add a parm to
rc.conf.

-- Ian




Re: r239356: does it mean, that synchronous dhcp and dhcplcinet with disabled devd gone?

2012-08-21 Thread Ian Lepore
On Tue, 2012-08-21 at 19:26 +0400, Lev Serebryakov wrote:
> Hello, Ian.
> You wrote on 21 August 2012, 19:16:03:
> 
> IL> It has worked this way for me for years.  Does it somehow not work this
> IL> way for everyone?
>Please, read the comment to r239356. Starting from this revision
>  dhclient exits on interface down and _removes_ the IP address from the
>  interface. Removal of the address from the interface will drop all open
>  connections which use this address.
> 

Aha!  That's where the confusion is happening -- I didn't read the
comment, I read the code.

I don't know what "teardown the configured lease" in that comment means,
but it doesn't mean that the interface loses its current configuration,
or that any existing connections are perturbed.  

If the cable is plugged back into the same network, the interface will
get the same address it last had and existing connections continue to
work, unless the dhcp server recycled that lease to another client while
the cable was unplugged (highly unlikely unless the server/network is
starved for addresses, since the dhcpd design is to avoid recycling
recently-used addresses).

The important point is that if you unplug the cable then plug it into a
different network, now the right thing will happen -- you will acquire
an address on the new network.  That's the reason that this change is an
important bugfix for a long standing (many many years) bug in freebsd's
dhclient.

-- Ian




Re: r239356: does it mean, that synchronous dhcp and dhcplcinet with disabled devd gone?

2012-08-21 Thread Ian Lepore
On Tue, 2012-08-21 at 19:04 +0400, Lev Serebryakov wrote:
> Hello, John.
> You wrote on 21 August 2012, 17:34:31:
> 
> JB> Humm.  devd is the more common case, and we explicitly don't use devd to
> JB> start dhclient on boot even when devd is enabled (so out of the box dhcp
> JB> would first be started by rc, but would be restarted by devd).
>   It is strange, and maybe it changed some time ago, because when I
> disabled "devd" on my NanoBSD-based router (about a year or a year and a
> half ago), I spent several hours working out why dhclient didn't start
> anymore. I needed to add this to rc.conf:
> 
> synchronous_dhclient="YES"
> 
> JB> Another option is to rework dhclient to work like it does on OpenBSD
> JB> where it renews its lease if the link bounces, but to not exit when the
> JB> link goes down.
>   Yes, it looks like a proper solution.
> 
> JB> That case would fix the currently broken case that you unplug your cable,
> JB> take your laptop over to another network (e.g. take it home if
> JB> suspend/resume works), then plug it back in and are still stuck with your
> JB> old IP.
>   Yep. But the _committed_ solution is very bad. For example, my ISP's
>  switch loses link every second day for a second or two. I don't want to
>  lose all open connections, firewall state, etc., and to restart
>  dhclient by hand in such a case, especially when I'm not at home and my
>  girlfriend is. Another good example was provided by
>  Slava -- WiFi could disconnect for 10-15 seconds for multiple
>  reasons, and dropping the IP and all connections in such a case is a MAJOR
>  headache.
> 

I don't understand all this talk that makes it sound like you lose your
existing network connections when dhclient exits.  I don't experience
anything like that at all, and never have.  I just pulled the network
cable on this machine, did "sudo killall dhclient", plugged the network
back in, I still have all my ssh connections to the world in a dozen
open windows and can interact with any of them.  Then I did "sudo
dhclient re0" (simulating devd restarting dhclient on link-up) and it
reacquired a lease for the same IP it had before I killed it, and still
all my open connections are open.
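
One way to repeat that experiment and check the result mechanically is to
snapshot the established TCP connections before and after.  A rough sketch,
assuming "netstat -an" prints the local and remote addresses in columns 4
and 5 and the connection state in the last column; the helper names and the
/tmp file names are hypothetical, and re0 is simply the interface from the
test above.

```shell
#!/bin/sh
# Reduce "netstat -an" style output on stdin to a sorted list of
# "local remote" pairs for established TCP connections.
established_conns () {
  awk '$1 ~ /^tcp/ && $NF == "ESTABLISHED" { print $4, $5 }' | sort
}

# Succeed only if every connection listed in file $1 also appears in
# file $2.  Both files must be sorted, as established_conns leaves them.
conns_survived () {
  [ -z "$(comm -23 "$1" "$2")" ]
}

# Intended use on the machine under test (not run here):
#   netstat -an -p tcp | established_conns > /tmp/before
#   killall dhclient          # then unplug and replug the cable
#   dhclient re0
#   netstat -an -p tcp | established_conns > /tmp/after
#   conns_survived /tmp/before /tmp/after && echo "connections survived"
```

New connections opened during the test do not count against the result;
only connections that disappear do.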

It has worked this way for me for years.  Does it somehow not work this
way for everyone?

-- Ian




Re: CURRENT as gateway on not-so-fast hardware: where is a bottlneck?

2012-08-17 Thread Ian Lepore
On Fri, 2012-08-17 at 14:29 -0700, Kevin Oberman wrote:
> On Fri, Aug 17, 2012 at 10:11 AM, Ian Lepore
> > No!  Not bde!  He'll notice that I violated style(9) by accidentally
> > leaving an extra blank line between a comment block and the function
> > definition.  :)  (There are probably more violations than that -- I did
> > this when I was first trying to come to grips with the differences
> > between style(9) and the almost-style(9) standards we use at work.)
> >
> > When I first proposed the changes, jhb remarked that they sounded good,
> > but as far as I know, nobody reviewed the actual diff when I posted it.
> > It looks like bde and phk were the primary maintainers back when this
> > code was being more actively worked on.
> 
> Why not bde? Everyone needs to learn what the term "bruceification" means.
> 
> Believe me, there IS good reason for programming style and almost
> everyone with a commit bit gets close. bde will provide a reminder of
> any of those things you forgot were in style(9). This is something we
> should appreciate, even if it does sting a bit.

Did you miss the smiley I buried between two sentences there?

Having worked on code written with no style guidelines, I totally
understand the need for consistent style.  While I find a couple of
style(9)'s edicts to be massively annoying, all in all I'd rather work
on code that has a consistent style I hate than on code with no
consistency.

-- Ian



Re: CURRENT as gateway on not-so-fast hardware: where is a bottlneck?

2012-08-17 Thread Ian Lepore
On Fri, 2012-08-17 at 09:58 -0700, Adrian Chadd wrote:
> On 17 August 2012 07:56, Ian Lepore  wrote:
> 
> > That result actually matches my expectation... it fixed only a part of
> > your problem.  I suspected (without very good evidence) that you may
> > have two unrelated problems; hopefully now that we've eliminated one the
> > other will be easier to find.
> >
> > I've submitted a PR with that patch attached, since it has now been
> > shown to fix a problem on two different sets of (similar) hardware:
> >
> >   http://www.freebsd.org/cgi/query-pr.cgi?pr=170705
> 
> Hm, who's a good person to review this stuff? Maybe bde?
> 

No!  Not bde!  He'll notice that I violated style(9) by accidentally
leaving an extra blank line between a comment block and the function
definition.  :)  (There are probably more violations than that -- I did
this when I was first trying to come to grips with the differences
between style(9) and the almost-style(9) standards we use at work.)

When I first proposed the changes, jhb remarked that they sounded good,
but as far as I know, nobody reviewed the actual diff when I posted it.
It looks like bde and phk were the primary maintainers back when this
code was being more actively worked on.

-- Ian




Re: CURRENT as gateway on not-so-fast hardware: where is a bottlneck?

2012-08-17 Thread Ian Lepore
On Fri, 2012-08-17 at 14:38 +0400, Lev Serebryakov wrote:
> Hello, Ian.
> You wrote on 16 August 2012, 21:47:06:
> 
> 
> IL> It's a long shot, but if the trouble you're seeing has the same cause,
> IL> it should be fixed by this patch:
> IL>
> IL> http://lists.freebsd.org/pipermail/freebsd-hackers/2012-January/037233.html
>  It looks like this patch fixes the freezes under network load. I could
> not reproduce the freezes now (except when `ktrdump' is running, but I
> think that is OK).
>
>  It also changes the "top" layout of processes: em0 taskq is not at the top
> now, and the system has enough idle time even under load.
> 
>  But WiFi is affected by wire traffic :(
> 

That result actually matches my expectation... it fixed only a part of
your problem.  I suspected (without very good evidence) that you may
have two unrelated problems; hopefully now that we've eliminated one the
other will be easier to find.

I've submitted a PR with that patch attached, since it has now been
shown to fix a problem on two different sets of (similar) hardware:

  http://www.freebsd.org/cgi/query-pr.cgi?pr=170705

-- Ian




Re: CURRENT as gateway on not-so-fast hardware: where is a bottlneck?

2012-08-16 Thread Ian Lepore
On Wed, 2012-08-15 at 14:40 +0400, Lev Serebryakov wrote:
> Hello, Alexander.
> You wrote on 15 August 2012, 14:18:05:
> 
> 
> AM> It is quite pointless to speculate without real info like the KTR_SCHED
> AM> traces mentioned above. Main thing I've learned about schedulers: things
> AM> there never work as you expect. There are too many factors and relations
> AM> to predict behavior in every case.
>   I'll take these traces with as many variants (ULE and 4BSD, polling with
> HZ=1000 and interrupts with default HZ) as I can, in a day or two.
>   Now I have kernels with KTR compiled in (GEN, NET and SCHED).
> 
> AM> About Soekris and idle CPU measurement, let's start from what kind of 
> AM> eventtimer is used there. As soon as it is UP machine, I guess it uses
> AM> i8254 timer in periodic mode. It means that it by definition can't
>  It doesn't have any other timers. You could think about this machine
> as about good old "true" i386, with PCI (and some additional fancy
> commands in CPU core, something like classic Pentium) but
> nothing more.
> 
> kern.eventtimer.choice: i8254(100) RTC(0)
> kern.eventtimer.et.RTC.flags: 17
> kern.eventtimer.et.RTC.frequency: 32768
> kern.eventtimer.et.RTC.quality: 0
> kern.eventtimer.et.i8254.flags: 1
> kern.eventtimer.et.i8254.frequency: 1193182
> kern.eventtimer.et.i8254.quality: 100
> kern.eventtimer.periodic: 1
> kern.eventtimer.timer: i8254
> kern.eventtimer.activetick: 1
> kern.eventtimer.idletick: 0
> kern.eventtimer.singlemul: 2
> 
> AM> properly measure load from threads running from hardclock, such as
> AM> dummynet, polling netisr threads, etc.
>   You see, here are two different problems:
> 
> (a) with polling, the system is responsive under any load, but wire2wifi
> performance is hugely affected by wire2wire traffic (and mpd5
> in between). And, yes, "top" seems to lie about idle time.
>
> (b) with interrupts, the system works much better when it works (wire2wifi
> speed is affected by wire2wire traffic, but to a much lesser extent), but
> it freezes every third minute for a minute: traffic is passed, but
> no user-level applications (including BIND and the DHCP server) work at
> all FOR A MINUTE OR MORE. It does not look like a 100ms lag, which could
> affect video playback. It looks like a 60-120 second lag! At least, in the
> case of ULE; I didn't try 4BSD yet.
> 

I had trouble earlier this year with an industrial single-board computer
that uses the same chipset as your Soekris (Geode 500 + CS5536) where
the interrupt handler for the RTC chip would occasionally get stuck in a
loop for a minute or more at a time, making userland processes
completely unresponsive during that time.

It's a long shot, but if the trouble you're seeing has the same cause,
it should be fixed by this patch:

http://lists.freebsd.org/pipermail/freebsd-hackers/2012-January/037233.html

-- Ian




Re: newbus' ivar's limitation..

2012-07-30 Thread Ian Lepore
On Mon, 2012-07-30 at 17:06 -0400, John Baldwin wrote:
> On Tuesday, July 17, 2012 2:03:14 am Arnaud Lacombe wrote:
> > Hi,
> > 
> > On Fri, Jul 13, 2012 at 1:56 PM, Arnaud Lacombe  wrote:
> > > Hi,
> > >
> > > On Thu, Jul 12, 2012 at 1:20 AM, Warner Losh  wrote:
> > >> [..]
> > >> Honestly, though, I think you'll be more pissed when you find out that
> > >> the N:1 interface that you want is being done in the wrong domain.  But
> > >> I've been wrong before and look forward to seeing your replacement.
> > >>
> > > I will just pass function pointers for now, if things should be done
> > > dirty, let's be explicit about it.
> > >
> > > Now, the hinted device attachment did work quite smoothly, however, I
> > > would have a few suggestions:
> > >  1) add a call to bus_enumerate_hinted_children() before the
> > > DEVICE_IDENTIFY() call in bus_generic_driver_added()
> > >
> > > this is required to be able to support dynamic loading and attachment
> > > of hinted children.
> 
> I'm not sure this is a feature we want to support (to date hinted children
> have only been created at boot time). 

It seems to me that the bus should be in control of calling
bus_enumerate_hinted_children() at whatever time works best for it.
Also, shouldn't it only ever be called once?

The comment block for BUS_HINTED_CHILD in bus_if.h says "This method is
only called in response to the parent bus asking for hinted devices to
be enumerated."  I think one of the implications of that is that any
given bus may not call bus_enumerate_hinted_children() because it may
not be able to do anything for hinted children.  Adding a
"hint.somedev.0.at=somebus" and then forcing the bus to enumerate hinted
children amounts to forcing the bus to adopt a child it may not be able
to provide resources for, which sounds like a panic or crash waiting to
happen (or at best, no crash but nothing useful happens either).

-- Ian




Re: Adding support for WC (write-combining) memory to bus_dma

2012-07-12 Thread Ian Lepore
On Thu, 2012-07-12 at 10:40 -0400, John Baldwin wrote:
> I have a need to allocate static DMA memory via bus_dmamem_alloc() that is 
> also WC (for a PCI-e device so it can use "nosnoop" transactions).  This is 
> similar to what the nvidia driver needs, but in my case it is much cleaner to 
> allocate the memory via bus dma since the existing code I am extending all 
> uses busdma.
> 
> I have a patch to implement this on 8.x for amd64 that I can port to HEAD if 
> folks don't object.  What I would really like to do is add a new parameter to 
> bus_dmamem_alloc() to specify the memory attribute to use, but I am hesitant 
> to break that API.  Instead, I added a new flag similar to the existing 
> BUS_DMA_NOCACHE used to allocate UC memory.
> 
> While doing this, I ran into an old bug, which is that if you were to call 
> bus_dmamem_alloc() with BUS_DMA_NOCACHE but a tag that otherwise fell through 
> to using malloc() instead of contigmalloc(), bus_dmamem_alloc() would actually
> change the state of the entire page.  This seems wrong.  Instead, I think
> that any request for a non-default memory attribute should always use
> contigmalloc().

The problem I have with this (already, even before your proposed
changes) is that contigmalloc() is only able to allocate pages.  In the
ARM world we have a need to allocate BUS_DMA_COHERENT memory (same
effect as BUS_DMA_NOCACHE; we should consolidate these names) that is
aligned to a 32-byte boundary (cacheline-aligned) but usually the buffer
is far smaller than a page, often smaller than 1k, and sometimes we need
lots of them (allocating 128 pages for ethernet buffers, with only half
of each page used, is unreasonably expensive on a platform with only
64mb to begin with).

I keep thinking what's needed is a busdma allocation helper routine,
something MI that can be used by the various MD busdma implementations,
that can manage a pool of pages that are flagged as uncachable and can
subdivide those pages to provide small blocks of memory that fit various
alignment and boundary restrictions.

To be clear, I'm not objecting to your proposed changes, I'm more just
musing that similar problems exist in non-x86 architectures and maybe an
MI solution is possible (or at least the groundwork could be laid)?

> In fact, even better is to call kmem_alloc_contig() directly
> rather than using contigmalloc().  However, if you change this, then 
> bus_dmamem_free() won't always DTRT as it doesn't have enough state to know if
> a small allocation should be free'd via free() or contigfree() (the latter 
> would be required if it used a non-default memory attribute).  The fix I used 
> for this was to create a new dummy dmamap that is returned by
> bus_dmamem_alloc() if it uses contigmalloc().  bus_dmamem_free() then checks
> the passed in map pointer to decide which type of free to perform.  Once this
> is fixed, the actual WC support is rather trivial as it merely consists of
> passing a different argument to kmem_alloc_contig().
> 
> Oh, and using kmem_alloc_contig() instead of the pmap_change_attr() hack is
> required if you want to be able to export the same pages to userland via
> mmap (e.g. using an OBJT_SG object). :)
> 
> Peter, this is somewhat orthogonal (but related) to your bus_dma patch which is
> what prompted me to post this.
> 
> Patch for 8 is below.  Porting it to HEAD should be fairly trivial and direct.
> [patch removed]
> 

-- Ian



Re: Interfacing devices with multiple parents within newbus

2012-07-07 Thread Ian Lepore
On Fri, 2012-07-06 at 16:45 -0400, Arnaud Lacombe wrote:
> Hi,
> 
> On Fri, Jul 6, 2012 at 3:09 PM, Ian Lepore
>  wrote:
> > On Fri, 2012-07-06 at 14:46 -0400, Arnaud Lacombe wrote:
> >> Hi,
> >>
> >> On Fri, Jul 6, 2012 at 11:33 AM, Arnaud Lacombe  wrote:
> >> > That's neither correct nor robust in a couple of ways:
> >> >  1) you have no guarantee a device unit will always give you the same
> >> > resource.
> >> this raises the following question: how can a device, today, figure
> >> out which parent in a given devclass would give it access to resources
> >> it needs.
> >>
> >> Say, you have gpiobus0 provided by a superio and gpiobus1 provided by
> >> the chipset and a LED on the chipset's GPIO. Now, say gpiobus0
> >> attachment is conditional to some BIOS setting. How can you tell
> >> gpioled(4) to attach on the chipset provided GPIO without hardcoding
> >> unit number either way ?
> >>
> >> AFAIK, you can not.
> >>
> >> Even hints provided layout description is defeated. Each device in a
> >> given devclass need to have a set of unique attribute to allow a child
> >> to distinguish it from other potential parent in the same devclass...
> >>
> >>  - Arnaud
> >
> > Talking about a child being unable to choose the correct parent seems to
> > indicate that this whole problem is turned upside-down somehow; children
> > don't choose their parents.
> >
> actually, I think I was wrong, I thought devices were attached to a
> devclass, but they are truly attached to a given device. My mistake.
> 
> > Just blue-sky dreaming here on the fly... what we really have is a
> > resource-management problem.  A device comes along that needs a GPIO
> > resource, how does it find and use that resource?
> >
> > Well, we have a resource manager, could that help somehow?  Could a
> > driver that provides access to GPIO somehow register its availability so
> > that another driver can find and access it?  The "resource" may be a
> > callable interface, it doesn't really matter, I'm just wondering if the
> > current rman stuff could be leveraged to help make the connection
> > between unrelated devices.   I think that implies that there would have
> > to be something near the root of the hierarchy willing to be the
> > owner/manager of dynamic resources.
> >
> AFAIR, rman is mostly there to manage memory vs. i/o mapped resources.
> The more I think about it, the more FDT is the answer. The open
> question now being "how to map a flexible device structure (FDT) to a
> less flexible structure (Newbus)" :/
> 
>  - Arnaud

Memory- and IO-mapped regions and IRQs are the only current uses of rman
(that I know of), but it was designed to be fairly agnostic about the
resources it manages.  It just works with ranges of values (that it
really doesn't know how to interpret at all), leaving lots of room to
define new types of things it can manage.

The downside is that it's designed to be used hierarchically in the
context of newbus, specifically to help parents manage the resources
that they are able to provide to their children.  Trying to use it in a
way that allows devices which are hierarchically unrelated to allocate
resources from each other may amount to a square-peg/round-hole
situation.  But the alternative is writing a new facility to allow
registration and allocation of resources using some sort of symbolic method
of representing the resources such that the new manager doesn't have to
know much about what it's managing.  I think it would be better to find
a way to reuse what we've already got if that's possible.

I think we have two semi-related aspects to this problem... 

How do we symbolically represent the resources that drivers can provide
to each other?   (FDT may be the answer; I don't know much about it.)

How do devices use that symbolic representation to locate the provider
of the resource, and how is the sharing of those resources managed?

-- Ian




Re: Interfacing devices with multiple parents within newbus

2012-07-06 Thread Ian Lepore
On Fri, 2012-07-06 at 14:46 -0400, Arnaud Lacombe wrote:
> Hi,
> 
> On Fri, Jul 6, 2012 at 11:33 AM, Arnaud Lacombe  wrote:
> > That's neither correct nor robust in a couple of ways:
> >  1) you have no guarantee a device unit will always give you the same
> > resource.
> this raises the following question: how can a device, today, figure
> out which parent in a given devclass would give it access to resources
> it needs.
> 
> Say, you have gpiobus0 provided by a superio and gpiobus1 provided by
> the chipset and a LED on the chipset's GPIO. Now, say gpiobus0
> attachment is conditional to some BIOS setting. How can you tell
> gpioled(4) to attach on the chipset provided GPIO without hardcoding
> unit number either way ?
> 
> AFAIK, you can not.
> 
> Even hints provided layout description is defeated. Each device in a
> given devclass need to have a set of unique attribute to allow a child
> to distinguish it from other potential parent in the same devclass...
> 
>  - Arnaud

Talking about a child being unable to choose the correct parent seems to
indicate that this whole problem is turned upside-down somehow; children
don't choose their parents.

Just blue-sky dreaming here on the fly... what we really have is a
resource-management problem.  A device comes along that needs a GPIO
resource, how does it find and use that resource?  

Well, we have a resource manager, could that help somehow?  Could a
driver that provides access to GPIO somehow register its availability so
that another driver can find and access it?  The "resource" may be a
callable interface, it doesn't really matter, I'm just wondering if the
current rman stuff could be leveraged to help make the connection
between unrelated devices.   I think that implies that there would have
to be something near the root of the hierarchy willing to be the
owner/manager of dynamic resources.

-- Ian




Re: Possible fix for Perl failing with ../lib/auto/POSIX/POSIX.so: Undefined symbol "__flt_rounds" on ARM

2012-06-12 Thread Ian Lepore
On Tue, 2012-06-12 at 23:26 +0300, Konstantin Belousov wrote:
> On Tue, Jun 12, 2012 at 05:56:12PM +0200, Jan Sieka wrote:
> > Both versions work indeed. I have analysed other architectures' 
> > lib/libc//Symbol.map files and __flt_rounds should go into FBSD_ and 
> > *not* into FBSDprivate section. I have verified that at least one of the 
> > Perl's libraries (POSIX.so) links to __flt_rounds. Python also links to 
> > this function. So to the best of my knowledge current patch is the 
> > righteous one.
> 
> Let me restate my point. It does not matter whether some application
> uses the symbol. It does matter whether the symbol is considered part
> of the exported stable ABI, intended for use by applications. If it is, then
> FBSD_1.X is the right namespace; otherwise the symbol should be moved to
> private and existing usage fixed.

The standard C macro FLT_ROUNDS from float.h expands to a reference to
__flt_rounds; it's intended for use by applications.
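
As an illustration of why ordinary programs end up depending on the symbol,
here is a small sketch of application code that uses only the standard macro.
The helper names are made up for the example. On a platform where FLT_ROUNDS
expands to a reference to __flt_rounds, as described above, compiling this
pulls in that symbol; other platforms may define the macro differently.

```c
#include <float.h>

/* Map a FLT_ROUNDS value to the meaning the C standard assigns to it. */
const char *
rounding_mode_name(int mode)
{
	switch (mode) {
	case 0:
		return ("toward zero");
	case 1:
		return ("to nearest");
	case 2:
		return ("toward positive infinity");
	case 3:
		return ("toward negative infinity");
	default:
		return ("indeterminable or implementation-defined");
	}
}

/* Ordinary application code: evaluate the macro and describe the result. */
const char *
current_rounding_mode(void)
{
	return (rounding_mode_name(FLT_ROUNDS));
}
```

Nothing here names __flt_rounds directly, which is the point: the dependency
comes from a documented, standard interface.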

-- Ian



Re: bpf kernel crash

2012-06-04 Thread Ian Lepore
On Mon, 2012-06-04 at 17:31 +0300, Michael Pounov wrote:
> The kernel crashes when you change an interface name from vlan0 to another name.
> 
> It seems to be in arrival/departure events.
> 
> 1) When I set up vlan0, change its name to mgmt, and then destroy mgmt,
>  the kernel crashes in bpfdetach() at line 2495, where it tries to find the
>  interface structure.
> 2) When I set up vlan0, change its name to mgmt, and set an IP address, after
>  a few seconds the kernel crashes in vlan_transmit() at line 1029, where it
>  tries to push mbufs to the bpf interface, but it is NULL.
> 

It sounds like that might be the same problem as this one; maybe the same
patch will fix it for you...

http://lists.freebsd.org/pipermail/freebsd-current/2012-June/034408.html

-- Ian



Re: device_attach(9) and driver initialization

2012-04-09 Thread Ian Lepore
On Sat, 2012-04-07 at 17:57 +0300, Konstantin Belousov wrote:
> On Sat, Apr 07, 2012 at 08:46:41AM -0600, Ian Lepore wrote:
> > On Sat, 2012-04-07 at 15:50 +0300, Konstantin Belousov wrote:
> > > Hello,
> > > there seems to be a problem with the device attach sequence offered by newbus.
> > > Basically, while the device attach method is executing, the device is not yet
> > > fully initialized. Also, the device state in the newbus part of the world
> > > is DS_ALIVE. There is definitely no earth-shattering news in these statements,
> > > but drivers that e.g. create a devfs node to communicate with consumers
> > > are prone to a race.
> > > 
> > > If the /dev node is created inside the device attach method, then usermode
> > > can start calling cdevsw methods before the device has fully initialized itself.
> > > Moreover, if the device tries to use newbus helpers in cdevsw methods,
> > > like device_busy(9), then a panic occurs: "called for unattached device".
> > > I get reports from users about this issue, so it is not something
> > > that could only happen in theory.
> > > 
> > > I propose adding a DEVICE_AFTER_ATTACH() driver method, to be called
> > > from newbus right after device attach has finished and newbus considers
> > > the device fully initialized. The driver could then create the devfs node
> > > in the after_attach method instead of attach. Please see the patch below.
> > > 
> > > diff --git a/sys/kern/device_if.m b/sys/kern/device_if.m
> > > index eb720eb..9db74e2 100644
> > > --- a/sys/kern/device_if.m
> > > +++ b/sys/kern/device_if.m
> > > @@ -43,6 +43,10 @@ INTERFACE device;
> > >  # Default implementations of some methods.
> > >  #
> > >  CODE {
> > > + static void null_after_attach(device_t dev)
> > > + {
> > > + }
> > > +
> > >   static int null_shutdown(device_t dev)
> > >   {
> > >   return 0;
> > > @@ -199,6 +203,21 @@ METHOD int attach {
> > >  };
> > >  
> > >  /**
> > > + * @brief Notify the driver that device is in attached state
> > > + *
> > > + * Called after driver is successfully attached to the device and
> > > + * corresponding device_t is fully operational. Driver now may expose
> > > + * the device to the consumers, e.g. create devfs nodes.
> > > + *
> > > + * @param dev  the device to probe
> > > + *
> > > + * @see DEVICE_ATTACH()
> > > + */
> > > +METHOD void after_attach {
> > > + device_t dev;
> > > +} DEFAULT null_after_attach;
> > > +
> > > +/**
> > >   * @brief Detach a driver from a device.
> > >   *
> > >   * This can be called if the user is replacing the
> > > diff --git a/sys/kern/subr_bus.c b/sys/kern/subr_bus.c
> > > index d485b9f..6d849cb 100644
> > > --- a/sys/kern/subr_bus.c
> > > +++ b/sys/kern/subr_bus.c
> > > @@ -2743,6 +2743,7 @@ device_attach(device_t dev)
> > >   dev->state = DS_ATTACHED;
> > >   dev->flags &= ~DF_DONENOMATCH;
> > >   devadded(dev);
> > > + DEVICE_AFTER_ATTACH(dev);
> > >   return (0);
> > >  }
> > >  
> > 
> > Does device_get_softc() work before attach is completed?  (I don't have
> > time to go look in the code right now).  If so, then a mutex initialized
> > and acquired early in the driver's attach routine, and also acquired in
> > the driver's cdev implementation routines before using any newbus
> > functions other than device_get_softc(), would solve the problem without
> > a driver API change that would make it harder to backport/MFC driver
> > changes.
> No, 'a mutex' does not solve anything. It only adds an enormous burden
> on the driver developers, because you cannot sleep under a mutex. Changing
> the mutex to a sleepable lock also does not buy you much, since
> you need to somehow solve the issues with some cdevsw call waking up
> a thread sleeping in another cdevsw call, just for example.
> 
> Single-threading a driver due to this race is just silly.
> 
> And what do you mean by 'making it harder to MFC'? How?

I frequently find myself having to backport driver changes further back
than any currently-supported FreeBSD release, and something like a new
function in newbus can make that pretty hard to do.  That often makes me
think of how to accomplish something with a minimally-invasive change.  (So
it's really more of a selfish personal goal than a FreeBSD-project
goal.)

-- Ian



Re: device_attach(9) and driver initialization

2012-04-07 Thread Ian Lepore
On Sat, 2012-04-07 at 15:50 +0300, Konstantin Belousov wrote:
> Hello,
> there seems to be a problem with the device attach sequence offered by newbus.
> Basically, while the device attach method is executing, the device is not yet
> fully initialized. Also, the device state in the newbus part of the world
> is DS_ALIVE. There is definitely no earth-shattering news in these statements,
> but drivers that e.g. create a devfs node to communicate with consumers
> are prone to a race.
> 
> If the /dev node is created inside the device attach method, then usermode
> can start calling cdevsw methods before the device has fully initialized itself.
> Moreover, if the device tries to use newbus helpers in cdevsw methods,
> like device_busy(9), then a panic occurs: "called for unattached device".
> I get reports from users about this issue, so it is not something
> that could only happen in theory.
> 
> I propose adding a DEVICE_AFTER_ATTACH() driver method, to be called
> from newbus right after device attach has finished and newbus considers
> the device fully initialized. The driver could then create the devfs node
> in the after_attach method instead of attach. Please see the patch below.
> 
> diff --git a/sys/kern/device_if.m b/sys/kern/device_if.m
> index eb720eb..9db74e2 100644
> --- a/sys/kern/device_if.m
> +++ b/sys/kern/device_if.m
> @@ -43,6 +43,10 @@ INTERFACE device;
>  # Default implementations of some methods.
>  #
>  CODE {
> + static void null_after_attach(device_t dev)
> + {
> + }
> +
>   static int null_shutdown(device_t dev)
>   {
>   return 0;
> @@ -199,6 +203,21 @@ METHOD int attach {
>  };
>  
>  /**
> + * @brief Notify the driver that device is in attached state
> + *
> + * Called after driver is successfully attached to the device and
> + * corresponding device_t is fully operational. Driver now may expose
> + * the device to the consumers, e.g. create devfs nodes.
> + *
> + * @param dev  the device to probe
> + *
> + * @see DEVICE_ATTACH()
> + */
> +METHOD void after_attach {
> + device_t dev;
> +} DEFAULT null_after_attach;
> +
> +/**
>   * @brief Detach a driver from a device.
>   *
>   * This can be called if the user is replacing the
> diff --git a/sys/kern/subr_bus.c b/sys/kern/subr_bus.c
> index d485b9f..6d849cb 100644
> --- a/sys/kern/subr_bus.c
> +++ b/sys/kern/subr_bus.c
> @@ -2743,6 +2743,7 @@ device_attach(device_t dev)
>   dev->state = DS_ATTACHED;
>   dev->flags &= ~DF_DONENOMATCH;
>   devadded(dev);
> + DEVICE_AFTER_ATTACH(dev);
>   return (0);
>  }
>  

Does device_get_softc() work before attach is completed?  (I don't have
time to go look in the code right now).  If so, then a mutex initialized
and acquired early in the driver's attach routine, and also acquired in
the driver's cdev implementation routines before using any newbus
functions other than device_get_softc(), would solve the problem without
a driver API change that would make it harder to backport/MFC driver
changes.
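
The locking pattern described above can be sketched in userland with POSIX
threads. This is only an analogy under stated assumptions: demo_softc,
demo_attach and demo_cdev_open are hypothetical names, pthread mutexes stand
in for kernel locks, and the constraints of real kernel mutexes (for example,
not sleeping while holding one) are not modelled at all.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a driver softc; not a real FreeBSD structure. */
struct demo_softc {
	pthread_mutex_t lock;
	bool node_created;	/* models "the /dev node is visible" */
	bool initialized;	/* models "attach has finished its setup" */
};

/* Models a cdev method entered from usermode: take the lock first. */
void *
demo_cdev_open(void *arg)
{
	struct demo_softc *sc = arg;
	bool ready;

	pthread_mutex_lock(&sc->lock);	/* blocks while attach holds it */
	ready = sc->initialized;
	pthread_mutex_unlock(&sc->lock);
	return (ready ? arg : NULL);
}

/* Models attach; returns true if the racing open saw a finished device. */
bool
demo_attach(struct demo_softc *sc)
{
	pthread_t consumer;
	void *seen;
	int error;

	error = pthread_mutex_init(&sc->lock, NULL);
	assert(error == 0);
	pthread_mutex_lock(&sc->lock);	/* acquired early in attach */
	sc->node_created = true;
	/* A consumer opens the node before the remaining setup is done. */
	error = pthread_create(&consumer, NULL, demo_cdev_open, sc);
	assert(error == 0);
	sc->initialized = true;		/* the remaining setup */
	pthread_mutex_unlock(&sc->lock);
	pthread_join(consumer, &seen);
	pthread_mutex_destroy(&sc->lock);
	return (seen != NULL);
}
```

Because the consumer cannot get past the lock until attach releases it,
demo_attach() returns true in this model. The cost is that every entry point
has to serialize on that one lock.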

-- Ian



Re: Switching on/off 5V power to a USB port

2012-04-03 Thread Ian Lepore
On Tue, 2012-04-03 at 17:13 -0500, Ron McDowell wrote:
> I just got a little USB powered fan and it sure would be nice if I could 
> have cron on my FreeBSD box turn it on or off at certain times by 
> switching off the 5V line on a USB port.  Anyone know how I can do 
> that?  Thanks.
> 
> BTW this is a pretty decent fan for the money. :)  
> http://www.amazon.com/gp/product/B0033WSDOM/
> 

The usbconfig(8) command has power_on and power_off commands.  I've
never used them so I can't say for sure they'll do what you want.

-- Ian



Re: Failure to rebuild x11/nvidia-driver on head at r233697

2012-03-30 Thread Ian Lepore
On Fri, 2012-03-30 at 06:18 -0700, David Wolfskill wrote:
> I track stable/8, stable/9, and head on (different slices on) my laptop
> on a daily basis.  I share /usr/local among all of these: while I update
> the installed ports on a daily basis, I'm unwilling to do that 3 times
> daily. :-}
> 
> The x11/nvidia-driver port is one, however, that I've found helpful to
> specify in /etc/src.conf:
> 
> g1-227(10.0-C)[3] cat /etc/src.conf
> PORTS_MODULES=x11/nvidia-driver
> g1-227(10.0-C)[4] 
> 
> so it does get rebuilt ... rather more often.
> 
> I've had no problems with this for several months running -- and there
> were no problems with stable/8 or stable/9 today.  But head failed after
> the (otherwise) successful make buildkernel:
> 
> ...
> ===> zlib (all)
> cc -O2 -pipe -fno-strict-aliasing -Werror -D_KERNEL -DKLD_MODULE -nostdinc   
> -DHAVE_KERNEL_OPTION_HEADERS -include 
> /common/S4/obj/usr/src/sys/CANARY/opt_global.h -I. -I@ -I@/contrib/altq 
> -finline-limit=8000 --param inline-unit-growth=100 --param 
> large-function-growth=1000 -fno-common -g -I/common/S4/obj/usr/src/sys/CANARY 
>  -mno-align-long-strings -mpreferred-stack-boundary=2 -mno-mmx -mno-sse 
> -msoft-float -ffreestanding -fstack-protector -std=iso9899:1999 
> -fstack-protector -Wall -Wredundant-decls -Wnested-externs 
> -Wstrict-prototypes  -Wmissing-prototypes -Wpointer-arith -Winline 
> -Wcast-qual  -Wundef -Wno-pointer-sign -fformat-extensions  
> -Wmissing-include-dirs -fdiagnostics-show-option   -c 
> /usr/src/sys/modules/zlib/../../net/zlib.c
> ld  -d -warn-common -r -d -o zlib.kld zlib.o
> :> export_syms
> awk -f /usr/src/sys/conf/kmod_syms.awk zlib.kld  export_syms | xargs -J% 
> objcopy % zlib.kld
> ld -Bshareable  -d -warn-common -o zlib.ko.debug zlib.kld
> objcopy --only-keep-debug zlib.ko.debug zlib.ko.symbols
> objcopy --strip-debug --add-gnu-debuglink=zlib.ko.symbols zlib.ko.debug 
> zlib.ko
> cd ${PORTSDIR:-/usr/ports}/x11/nvidia-driver; SYSDIR=/usr/src/sys 
> /usr/obj/usr/src/make.i386/make -B all
> ===>  License NVIDIA accepted by the user
> ===>  Found saved configuration for nvidia-driver-285.05.09
> ===>  Extracting for nvidia-driver-285.05.09
> => SHA256 Checksum OK for NVIDIA-FreeBSD-x86-285.05.09.tar.gz.
> ===>  Patching for nvidia-driver-285.05.09
> ===>   nvidia-driver-285.05.09 depends on file: 
> /usr/local/libdata/pkgconfig/xorg-server.pc - found
> ===>   nvidia-driver-285.05.09 depends on shared library: GL.1 - found
> ===>  Configuring for nvidia-driver-285.05.09
> ===>  Building for nvidia-driver-285.05.09
> ===> src (all)
> @ -> /usr/src/sys
> machine -> /usr/src/sys/i386/include
> x86 -> /usr/src/sys/x86/include
> awk -f @/tools/makeobjops.awk @/kern/device_if.m -h
> awk -f @/tools/makeobjops.awk @/kern/bus_if.m -h
> ...
> cc -O2 -pipe -fno-strict-aliasing -DNV_VERSION_STRING=\"285.05.09\" 
> -D__KERNEL__ -DNVRM -Wno-unused-function -O -UDEBUG -U_DEBUG -DNDEBUG -Werror 
> -D_KERNEL -DKLD_MODULE -nostdinc  -I. -I. -I@ -I@/contrib/altq 
> -finline-limit=8000 --param inline-unit-growth=100 --param 
> large-function-growth=1000 -fno-common   -mno-align-long-strings 
> -mpreferred-stack-boundary=2 -mno-mmx -mno-sse -msoft-float -ffreestanding 
> -fstack-protector -std=iso9899:1999 -fstack-protector -Wall -Wredundant-decls 
> -Wnested-externs -Wstrict-prototypes  -Wmissing-prototypes -Wpointer-arith 
> -Winline -Wcast-qual  -Wundef -Wno-pointer-sign -fformat-extensions  
> -Wmissing-include-dirs -fdiagnostics-show-option   -c nvidia_os_registry.c
> cc -O2 -pipe -fno-strict-aliasing -DNV_VERSION_STRING=\"285.05.09\" 
> -D__KERNEL__ -DNVRM -Wno-unused-function -O -UDEBUG -U_DEBUG -DNDEBUG -Werror 
> -D_KERNEL -DKLD_MODULE -nostdinc  -I. -I. -I@ -I@/contrib/altq 
> -finline-limit=8000 --param inline-unit-growth=100 --param 
> large-function-growth=1000 -fno-common   -mno-align-long-strings 
> -mpreferred-stack-boundary=2 -mno-mmx -mno-sse -msoft-float -ffreestanding 
> -fstack-protector -std=iso9899:1999 -fstack-protector -Wall -Wredundant-decls 
> -Wnested-externs -Wstrict-prototypes  -Wmissing-prototypes -Wpointer-arith 
> -Winline -Wcast-qual  -Wundef -Wno-pointer-sign -fformat-extensions  
> -Wmissing-include-dirs -fdiagnostics-show-option   -c nvidia_pci.c
> cc -O2 -pipe -fno-strict-aliasing -DNV_VERSION_STRING=\"285.05.09\" 
> -D__KERNEL__ -DNVRM -Wno-unused-function -O -UDEBUG -U_DEBUG -DNDEBUG -Werror 
> -D_KERNEL -DKLD_MODULE -nostdinc  -I. -I. -I@ -I@/contrib/altq 
> -finline-limit=8000 --param inline-unit-growth=100 --param 
> large-function-growth=1000 -fno-common   -mno-align-long-strings 
> -mpreferred-stack-boundary=2 -mno-mmx -mno-sse -msoft-float -ffreestanding 
> -fstack-protector -std=iso9899:1999 -fstack-protector -Wall -Wredundant-decls 
> -Wnested-externs -Wstrict-prototypes  -Wmissing-prototypes -Wpointer-arith 
> -Winline -Wcast-qual  -Wundef -Wno-pointer-sign -fformat-extensions  
> -Wmissing-include-dirs -fdiagnostics-show-option   -c nvidia_subr.c
> nvidia_subr.c: In function 'nv_all

Re: CURRENT: make -jX buildworld doesn't work

2012-03-15 Thread Ian Lepore
On Wed, 2012-03-14 at 18:08 +0100, O. Hartmann wrote: 
> On 03/14/12 16:08, Ian Lepore wrote:
> > On Wed, 2012-03-14 at 14:47 +0100, O. Hartmann wrote:
> >> This is not a complaint, since make buildworld works with one thread.
> >>
> >> But I'd like to report a funny thing I witnessed.
> >>
> >> On two boxes equipped with Core2Duo CPUs (E8500, 2 cores/threads,
> >> Q6600, 4 cores/threads) a parallel make buildworld works fine with the
> >> most recent sources of FreeBSD 10.0-CURRENT/amd64 (both boxes have 8 GB
> >> RAM, both boxes use a very close/similar setup and configuration).
> >>
> >> In my lab I moved to a brand new Sandy-Bridge-E box with 32GB RAM
> >> and the CPU is a Core i7-3930X with six cores/12 threads. On this box,
> >> even "make buildworld" with -j2 fails to build. It builds fine with a
> >> vanilla "make buildworld".
> >>
> >> Also funny is that, even with only one thread, the 3 GHz Core2Duo
> >> Methuselah systems "outperform" the 3.2 GHz Intel youngster in
> >> compile time.
> >>
> >> Maybe there is an issue in FreeBSD 10 with Turbo Boost? I'm a bit
> >> time-constrained now, but I'm willing to do some tests with advice from
> >> the experts.
> >>
> >> Let me give you some information I think could be valuable. Please
> >> request more if this isn't sufficient.
> >>
> >> I realize that both the hardware and the OS are brand new! But
> >> hopefully this is something important and could be "fixed" - if my
> >> observation is a real observation ...
> >>
> >> Regards,
> >> Oliver
> 
> [SNIP]
> 
> > 
> > There was a change (r232793) a few days ago to make turbo boost work,
> > more info in this thread:
> > 
> > http://lists.freebsd.org/pipermail/freebsd-current/2012-March/032434.html
> > 
> > I wasn't able to get it to work just by tweaking the rc.conf knobs, but
> > I suspect the reason may be because I have non-standard devd.conf. I
> > worked around it and haven't had time to look for the cause yet.
> > 
> > In trying to explain the compile time differences between two systems,
> > one of the first things I'd look at would be differences in the disks.
> > In my experience, IO performance has as big an influence on build time
> > as processor speed and number of cores.
> > 
> > I just updated my -current sources and did a fresh buildworld and
> > buildkernel using both -j2 and -j12 and had no problems on a 6-core
> > Xeon.  I know "make xdev" fails with -jN but I haven't seen failures on
> > any other targets.  There was a checkin recently that added some .ORDER
> > stuff for -jN but it only affected building usr.sbin/acpi.
> > 
> > -- Ian
> 
> The change is already in use here, but it made no difference.
> 
> The disks are the same on all systems, but the new box has SATA 6Gb/s and
> the WD 640GB "Caviar Black" disk also claims to be SATA 6Gb/s.
> You're right, disk I/O has a great impact. And I realized that FreeBSD
> 9/10 have a big problem with the Patsburg-based X79 chipset, as far as I
> can see - or it is simply a coincidence. Since the disks are attached to
> the X79 chipset's SATA 6Gb/s port, I sometimes see strangely long
> access times. When doing a simple diskper measurement, the raw
> performance of the disk reveals itself as slightly better than with SATA
> 3Gb/s.
> 
> By the way, my last buildworld took 1 hour. The buildworld before that took 3
> hours. Same load, same box, same OS revision. Funny.
> 
> I'll stay tuned. At the moment, it doesn't bother me much. I thought it was
> just worth mentioning ...
> 
> Regards,
> 
> Oliver
> 

Just to give you a point of comparison...  My build machine is a 6-core
Xeon W3680 running at 4.25GHz (yes, that's overclocked from stock 3.3),
hyperthreading disabled, 12G ram, and all builds are done using
filesystems on SSD drives connected to a SATA-2 controller.
Using /dev/null for src.conf and make.conf so that it's a completely
stock build, I get these build times for a "make buildworld buildkernel":

  -j1   63 minutes
  -j2   35 minutes
  -j6   19 minutes
  -j12  18 minutes
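
To put numbers on how the times above scale, here is a small sketch that
computes speedup relative to the -j1 build and the fraction of ideal linear
scaling. The function names are made up for the example; only the minute
figures come from the table.

```c
#include <assert.h>

/* Speedup of a parallel build relative to the -j1 build. */
double
build_speedup(double serial_minutes, double parallel_minutes)
{
	assert(serial_minutes > 0 && parallel_minutes > 0);
	return (serial_minutes / parallel_minutes);
}

/* Fraction of ideal linear scaling achieved with the given job count. */
double
build_efficiency(double serial_minutes, double parallel_minutes, int jobs)
{
	assert(jobs > 0);
	return (build_speedup(serial_minutes, parallel_minutes) / jobs);
}
```

With the figures above, -j2 gives 63/35 = 1.8x, -j6 gives 63/19, or about
3.3x, and -j12 gives 63/18 = 3.5x. Going past the six physical cores buys
very little on this machine.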

I wonder if your 1hr/3hr difference is due to caching?  If so, that
would seem to point to big trouble with disk performance.  I re-did the
-j6 build after unmounting/remounting the filesystem to clear the cache
and the difference was less than 30 seconds, but that's with SSD drives.

Hmmm, I just realized I'm using MODULES_OVERRIDE="agp drm" so it's not
quite a completely-stock/GENERIC build.

-- Ian



Re: CURRENT: make -jX buildworld doesn't work

2012-03-14 Thread Ian Lepore
On Wed, 2012-03-14 at 14:47 +0100, O. Hartmann wrote:
> This is not a complaint, since make buildworld works with one thread.
> 
> But I'd like to report a funny thing I witnessed.
> 
> On two boxes equipped with Core2Duo CPUs (E8500, 2 cores/threads,
> Q6600, 4 cores/threads) a parallel make buildworld works fine with the
> most recent sources of FreeBSD 10.0-CURRENT/amd64 (both boxes have 8 GB
> RAM, both boxes use a very close/similar setup and configuration).
> 
> In my lab I moved to a brand new Sandy-Bridge-E box with 32GB RAM
> and the CPU is a Core i7-3930X with six cores/12 threads. On this box,
> even "make buildworld" with -j2 fails to build. It builds fine with a
> vanilla "make buildworld".
> 
> Also funny is that, even with only one thread, the 3 GHz Core2Duo
> Methuselah systems "outperform" the 3.2 GHz Intel youngster in
> compile time.
> 
> Maybe there is an issue in FreeBSD 10 with Turbo Boost? I'm a bit
> time-constrained now, but I'm willing to do some tests with advice from
> the experts.
> 
> Let me give you some information I think could be valuable. Please
> request more if this isn't sufficient.
> 
> I realize that both the hardware and the OS are brand new! But
> hopefully this is something important and could be "fixed" - if my
> observation is a real observation ...
> 
> Regards,
> Oliver
> 
> KERNEL:
> 
> # CPU frequency control
> device  cpufreq
> device  cpuctl
> device  nvram
> device  coretemp
> 
> /etc/rc.conf:
> performance_cx_lowest="HIGH"    # Online CPU idle state
> performance_cpu_freq="NONE"     # Online CPU frequency
> economy_cx_lowest="HIGH"        # Offline CPU idle state
> economy_cpu_freq="NONE"         # Offline CPU frequency
> 
> #> sysctl kern.timecounter
> kern.timecounter.tc.i8254.mask: 65535
> kern.timecounter.tc.i8254.counter: 52702
> kern.timecounter.tc.i8254.frequency: 1193182
> kern.timecounter.tc.i8254.quality: 0
> kern.timecounter.tc.HPET.mask: 4294967295
> kern.timecounter.tc.HPET.counter: 2109995661
> kern.timecounter.tc.HPET.frequency: 14318180
> kern.timecounter.tc.HPET.quality: 950
> kern.timecounter.tc.ACPI-fast.mask: 16777215
> kern.timecounter.tc.ACPI-fast.counter: 3590445
> kern.timecounter.tc.ACPI-fast.frequency: 3579545
> kern.timecounter.tc.ACPI-fast.quality: 900
> kern.timecounter.tc.TSC-low.mask: 4294967295
> kern.timecounter.tc.TSC-low.counter: 1510440780
> kern.timecounter.tc.TSC-low.frequency: 12537463
> kern.timecounter.tc.TSC-low.quality: -100
> kern.timecounter.stepwarnings: 0
> kern.timecounter.hardware: HPET
> kern.timecounter.choice: TSC-low(-100) ACPI-fast(900) HPET(950) i8254(0)
> dummy(-100)
> kern.timecounter.tick: 5
> kern.timecounter.invariant_tsc: 1
> kern.timecounter.smp_tsc: 0
> 
> #
> #> dmesg | grep cpu
>  cpu0 (BSP): APIC ID:  0
>  cpu1 (AP): APIC ID:  1
>  cpu2 (AP): APIC ID:  2
>  cpu3 (AP): APIC ID:  3
>  cpu4 (AP): APIC ID:  4
>  cpu5 (AP): APIC ID:  5
>  cpu6 (AP): APIC ID:  6
>  cpu7 (AP): APIC ID:  7
>  cpu8 (AP): APIC ID:  8
>  cpu9 (AP): APIC ID:  9
>  cpu10 (AP): APIC ID: 10
>  cpu11 (AP): APIC ID: 11
> cpu0:  on acpi0
> cpu1:  on acpi0
> cpu2:  on acpi0
> cpu3:  on acpi0
> cpu4:  on acpi0
> cpu5:  on acpi0
> cpu6:  on acpi0
> cpu7:  on acpi0
> cpu8:  on acpi0
> cpu9:  on acpi0
> cpu10:  on acpi0
> cpu11:  on acpi0
> coretemp0:  on cpu0
> est0:  on cpu0
> p4tcc0:  on cpu0
> coretemp1:  on cpu1
> est1:  on cpu1
> p4tcc1:  on cpu1
> coretemp2:  on cpu2
> est2:  on cpu2
> p4tcc2:  on cpu2
> coretemp3:  on cpu3
> est3:  on cpu3
> p4tcc3:  on cpu3
> coretemp4:  on cpu4
> est4:  on cpu4
> p4tcc4:  on cpu4
> coretemp5:  on cpu5
> est5:  on cpu5
> p4tcc5:  on cpu5
> coretemp6:  on cpu6
> est6:  on cpu6
> p4tcc6:  on cpu6
> coretemp7:  on cpu7
> est7:  on cpu7
> p4tcc7:  on cpu7
> coretemp8:  on cpu8
> est8:  on cpu8
> p4tcc8:  on cpu8
> coretemp9:  on cpu9
> est9:  on cpu9
> p4tcc9:  on cpu9
> coretemp10:  on cpu10
> est10:  on cpu10
> p4tcc10:  on cpu10
> coretemp11:  on cpu11
> est11:  on cpu11
> p4tcc11:  on cpu11
> 
> 
> 
> #>  sysctl dev.cpu
> dev.cpu.0.%desc: ACPI CPU
> dev.cpu.0.%driver: cpu
> dev.cpu.0.%location: handle=\_SB_.P000
> dev.cpu.0.%pnpinfo: _HID=none _UID=0
> dev.cpu.0.%parent: acpi0
> dev.cpu.0.coretemp.delta: 54
> dev.cpu.0.coretemp.resolution: 1
> dev.cpu.0.coretemp.tjmax: 91.0C
> dev.cpu.0.coretemp.throttle_log: 0
> dev.cpu.0.temperature: 37.0C
> dev.cpu.0.freq: 3200
> dev.cpu.0.freq_levels: 3201/13 3200/13 3000/119000 2800/109000
> 2600/99000 2400/89000 2200/8 2000/71000 1800/62000 1600/54000
> 1400/46000 1225/40250 1200/39000 1050/34125 900/29250 750/24375
> 600/19500 450/14625 300/9750 150/4875
> dev.cpu.0.cx_supported: C1/1
> dev.cpu.0.cx_lowest: C1
> dev.cpu.0.cx_usage: 100.00% last 759us
> dev.cpu.1.%desc: ACPI CPU
> dev.cpu.1.%driver: cpu
> dev.cpu.1.%location: handle=\_SB_.P001
> dev.cpu.1.%pnpinfo: _HID=none _UID=0
> dev.cpu.1.%parent: acpi0
> dev.cpu.1.coretemp.delta: 54
> dev.cpu.1.coretemp.resolution: 1
> dev.cpu.

Re: Improved Intel Turbo Boost status/control

2012-03-13 Thread Ian Lepore
On Mon, 2012-03-12 at 22:52 +0200, Alexander Motin wrote:
> On 03/12/12 22:45, Ian Lepore wrote:
> > On Mon, 2012-03-12 at 21:15 +0200, Alexander Motin wrote:
> >> I'd like to note that recent r232793 change to cpufreq(4) in HEAD opened
> >> simple access to the  Intel Turbo Boost status/control. I've found that
> >> at least two of my desktop systems (based Nehalem and SandyBridge Core
> >> i7s) with enabled Intel Turbo Boost in BIOS it is not use it by default,
> >> unless powerd is enabled. And before this change it was difficult to
> >> detect/fix.
> >>
> >> ACPI reports extra performance level with frequency 1MHz above the
> >> nominal to control Intel Turbo Boost operation. It is not a bug, but
> >> feature:
> >> dev.cpu.0.freq_levels: 2934/106000 2933/95000 2800/82000 ...
> >> In this case value 2933 means 2.93GHz, but 2934 means 3.2-3.6GHz.
> >>
> >> After boot with default settings I see:
> >> dev.cpu.0.freq: 2933
> >> , that means Turbo Boost is disabled.
> >>
> >> Enabling powerd or just adding to rc.conf
> >> performance_cpu_freq="HIGH"
> >> enables Turbo Boost and adds extra 10-20% to the system performance.
> >>
> >> Turbo Boost operation can be monitored in run-time via the PMC with
> >> command that prints number or really executed cycles per CPU core:
> >> pmcstat -s unhalted-core-cycles -w 1
> >>
> >
> > The r232793 patch applies cleanly to 8-stable and builds just fine, but
> > after install/reboot I don't see a change in the freq_levels
> >
> >  revolution>  sysctl dev.cpu.0
> >  dev.cpu.0.%desc: ACPI CPU
> >  dev.cpu.0.%driver: cpu
> >  dev.cpu.0.%location: handle=\_PR_.P001
> >  dev.cpu.0.%pnpinfo: _HID=none _UID=0
> >  dev.cpu.0.%parent: acpi0
> >  dev.cpu.0.coretemp.delta: 70
> >  dev.cpu.0.coretemp.resolution: 1
> >  dev.cpu.0.coretemp.tjmax: 101.0C
> >  dev.cpu.0.coretemp.throttle_log: 0
> >  dev.cpu.0.temperature: 31.0C
> >  dev.cpu.0.freq: 
> >  dev.cpu.0.freq_levels: /13 3200/117000 3067/105000
> >  2933/94000 2800/85000 2667/76000 2533/68000 2400/61000
> >  2267/54000 2133/48000 2000/43000 1867/39000 1733/35000
> >  1600/32000 1400/28000 1200/24000 1000/2 800/16000 600/12000
> >  400/8000 200/4000
> >  dev.cpu.0.cx_supported: C1/32 C2/96 C3/128
> >  dev.cpu.0.cx_lowest: C1
> >  dev.cpu.0.cx_usage: 100.00% 0.00% 0.00% last 657us
> >  revolution>
> >
> >
> > I would have expected a 3334 entry to appear after the reboot.  Is this
> > expected (for example, are there other required changes missing in
> > 8-stable), or do I have something misconfigured?  (I can post more info,
> > but don't want to spam the list if the answer is going to be "this
> > shouldn't work in 8.x".)
> 
> I don't know any reason why it should not work on 8.x. It is ACPI BIOS 
> duty to report set of frequencies. This patch just makes system to 
> follow it more close. Make sure your CPU supports Turbo Boost and it is 
> enabled in BIOS. On my system disabling Turbo Boost in BIOS removes the 
> frequency from the list.
> 

It was indeed a BIOS config thing (I had it enabled, but then a side
effect of one of my overclock settings caused the BIOS to quietly
disable it).  I got that straightened out, and now it's working great.
Setting dev.cpu.0.freq=3334 cuts about 90 seconds off my standard
workflow-benchmark (that's 90 seconds off a 20 minute compile/build
process, a noticeable improvement).

I found that setting performance_cpu_freq="HIGH" doesn't work on my
desktop system, I guess because devd never gets any AC adapter events
that trigger running the power_profile script.  I enabled it manually by
adding these lines to my /etc/sysctl.conf:

  hw.acpi.cpu.cx_lowest=C2
  dev.cpu.0.freq=3334

It would be nice to come up with a way to automatically enable this for
desktop users.  If it can't be fully automatic, it should hopefully
require no more than a simple =YES knob in rc.conf.
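
As a rough sketch of what automatic detection might look like: going by
Alexander's description, the Turbo Boost pseudo-level is the first
freq_levels entry when it sits exactly 1MHz above the second one.  The
helper below is hypothetical (turbo_level() is not an existing function
anywhere in the tree) and only parses a string in the format sysctl
prints; it does not read or set any sysctl itself.

```c
#include <stdlib.h>

/*
 * Hypothetical helper, not part of FreeBSD: given a freq_levels string
 * such as "2934/106000 2933/95000 2800/82000", return the Turbo Boost
 * pseudo-frequency (the first entry, if it is exactly 1MHz above the
 * second entry), or -1 if no such entry is found.
 */
int
turbo_level(const char *freq_levels)
{
	const char *p;
	char *end;
	long first, second;

	/* First entry: "<freq>/<power>". */
	first = strtol(freq_levels, &end, 10);
	if (end == freq_levels || *end != '/')
		return (-1);

	/* Skip over the power value of the first entry. */
	(void)strtol(end + 1, &end, 10);

	/* Second entry: strtol() skips the separating whitespace. */
	p = end;
	second = strtol(p, &end, 10);
	if (end == p || *end != '/')
		return (-1);

	return (first == second + 1 ? (int)first : -1);
}
```

A startup script could feed the value of dev.cpu.0.freq_levels through
something like this and, when the result is not -1, write it back to
dev.cpu.0.freq, which is what the sysctl.conf line above does by hand.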

Thanks for this work, Alexander!  I had assumed I was already getting
turbo mode benefits automatically just because my chip supports it.
It's a nice bonus to suddenly get another 7% improvement on my benchmark
when I thought I was already tweaked for max performance.

-- Ian


___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Improved Intel Turbo Boost status/control

2012-03-12 Thread Ian Lepore
On Mon, 2012-03-12 at 21:15 +0200, Alexander Motin wrote:
> Hi.
> 
> I'd like to note that recent r232793 change to cpufreq(4) in HEAD opened 
> simple access to the  Intel Turbo Boost status/control. I've found that 
> at least two of my desktop systems (based Nehalem and SandyBridge Core 
> i7s) with enabled Intel Turbo Boost in BIOS it is not use it by default, 
> unless powerd is enabled. And before this change it was difficult to 
> detect/fix.
> 
> ACPI reports extra performance level with frequency 1MHz above the 
> nominal to control Intel Turbo Boost operation. It is not a bug, but 
> feature:
> dev.cpu.0.freq_levels: 2934/106000 2933/95000 2800/82000 ...
> In this case value 2933 means 2.93GHz, but 2934 means 3.2-3.6GHz.
> 
> After boot with default settings I see:
> dev.cpu.0.freq: 2933
> , that means Turbo Boost is disabled.
> 
> Enabling powerd or just adding to rc.conf
> performance_cpu_freq="HIGH"
> enables Turbo Boost and adds extra 10-20% to the system performance.
> 
> Turbo Boost operation can be monitored in run-time via the PMC with 
> command that prints number or really executed cycles per CPU core:
> pmcstat -s unhalted-core-cycles -w 1
> 

The r232793 patch applies cleanly to 8-stable and builds just fine, but
after install/reboot I don't see a change in the freq_levels:

revolution > sysctl dev.cpu.0
dev.cpu.0.%desc: ACPI CPU
dev.cpu.0.%driver: cpu
dev.cpu.0.%location: handle=\_PR_.P001
dev.cpu.0.%pnpinfo: _HID=none _UID=0
dev.cpu.0.%parent: acpi0
dev.cpu.0.coretemp.delta: 70
dev.cpu.0.coretemp.resolution: 1
dev.cpu.0.coretemp.tjmax: 101.0C
dev.cpu.0.coretemp.throttle_log: 0
dev.cpu.0.temperature: 31.0C
dev.cpu.0.freq: 
dev.cpu.0.freq_levels: /13 3200/117000 3067/105000
2933/94000 2800/85000 2667/76000 2533/68000 2400/61000
2267/54000 2133/48000 2000/43000 1867/39000 1733/35000
1600/32000 1400/28000 1200/24000 1000/2 800/16000 600/12000
400/8000 200/4000
dev.cpu.0.cx_supported: C1/32 C2/96 C3/128
dev.cpu.0.cx_lowest: C1
dev.cpu.0.cx_usage: 100.00% 0.00% 0.00% last 657us
revolution > 


I would have expected a 3334 entry to appear after the reboot.  Is this
expected (for example, are there other required changes missing in
8-stable), or do I have something misconfigured?  (I can post more info,
but don't want to spam the list if the answer is going to be "this
shouldn't work in 8.x".)

-- Ian




Re: negative group permissions?

2012-02-29 Thread Ian Lepore
On Wed, 2012-02-29 at 13:00 -0500, Jason Hellenthal wrote:
> 
> On Wed, Feb 29, 2012 at 10:18:13AM -0700, Ian Lepore wrote:
> > On Wed, 2012-02-29 at 11:41 -0500, Jason Hellenthal wrote:
> > > 
> > > On Wed, Feb 29, 2012 at 04:18:45PM +, jb wrote:
> > > > Ian Lepore  damnhippie.dyndns.org> writes:
> > > > 
> > > > > ... 
> > > > >  It's not a
> > > > > directory or executable file in the first place, so making it 
> > > > > executable
> > > > > for everyone except the owner and group is not some sort of subtle
> > > > > security trick, it's just meaningless.
> > > > > ...
> > > > 
> > > > Is it meaningless ?
> > > > 
> > > > Example:
> > > > # cat /var/spool/output/lpd/.seq 
> > > > #! /usr/local/bin/bash
> > > > touch /tmp/jb-test-`echo $$`
> > > > 
> > > > # ls -al /var/spool/output/lpd/.seq 
> > > > -rw-rx  1 root  daemon  54 Feb 29 17:05 /var/spool/output/lpd/.seq
> > > > # /var/spool/output/lpd/.seq 
> > > > # 
> > > > # ls /tmp/jb*
> > > > /tmp/jb-test-61789
> > > > 
> > > > # chmod 0640 /var/spool/output/lpd/.seq 
> > > > # ls -al /var/spool/output/lpd/.seq 
> > > > -rw-r-  1 root  daemon  52 Feb 29 17:11 /var/spool/output/lpd/.seq
> > > > # /var/spool/output/lpd/.seq 
> > > > su: /var/spool/output/lpd/.seq: Permission denied
> > > > #
> > > > 
> > > 
> > > Giving execute bit to others by security means to allow others to search
> > > for that file and find it. If its not there then the process created by
> > > current user will not be able to read the file since they are not part
> > > of the daemon group. I would assume that sometimes the contents of .seq
> > > was judged to be insecure for whatever reason but judged that a user
> > > should be able to still in a sense read the file without reading its
> > > contents. Negative perms are not harmful.
> > > 
> > > I do suppose a 'daily_status_security_neggrpperm_dirs=' variable should
> > > be added here to control which directories are being scanned much like
> > > chknoid.
> > > 
> > 
> > The exec bit's control over the ability to search applies to
> > directories, not individual files.  For example:
> > 
> > revolution > whoami
> > ilepore
> > revolution > ll /tmp/test
> > -rw-rx  1 root  daemon 0B Feb 29 07:37 /tmp/test*
> > 
> > The file is 0641 and I'm not in the daemon group; I can list it.
> > 
> 
> The issue is not with listing the file. Setting the execute bit on a
> file where there is only a read bit higher up allows for the calling
> process to read the contents and noone else. This is special and not a
> flaw.
> 
> > Again, the problem here seems to be the use of 0661 in the lpr program,
> > not the idea of negative permissions, not the new scan for the use of
> > negative permissions.  It's just an old bug in an old program which used
> > to be harmless and now is "mostly harmless".  Instead of trying to "fix"
> > it by causing the new scan to ignore it, why don't we fix it by fixing
> > the program?  (I'd submit a patch but it's a 1-character change -- it's
> > not clear to me a patch would be easier for a committer to handle than
> > just finding and changing the only occurrence of "0661" in lpr.c.)
> > 
> 
> It was intentional and not a flaw. This file should be readable by the
> calling process and noone else. This is the way permissions work.
> 

I'm sorry, but I can't make any sense of what you've said here.  The
file is already readable and writable by the process that creates it. 

On a second look just now I noticed the seteuid() calls in lpr before
and after the file open/create.  I thought that might be what you meant:
that the process could lose access to the file with the seteuid() call
after it's opened.  I just tested that, and it doesn't seem to behave
that way.  I used simple test code similar to lpr.c but using mode 0660
instead of 0661:

char buf[128];
int fd;
int bytes;
errno = 0;
uid_t uid = getuid();
uid_t euid = geteuid();
printf("uid %d euid %d\n", uid, euid);
seteuid(euid);
fd = open("/tmp/test", O_RDWR|O_CREAT, 0660);
flock(fd, LOCK_EX);
printf("fd = %d

Re: negative group permissions?

2012-02-29 Thread Ian Lepore
On Wed, 2012-02-29 at 11:41 -0500, Jason Hellenthal wrote:
> 
> On Wed, Feb 29, 2012 at 04:18:45PM +, jb wrote:
> > Ian Lepore  damnhippie.dyndns.org> writes:
> > 
> > > ... 
> > >  It's not a
> > > directory or executable file in the first place, so making it executable
> > > for everyone except the owner and group is not some sort of subtle
> > > security trick, it's just meaningless.
> > > ...
> > 
> > Is it meaningless ?
> > 
> > Example:
> > # cat /var/spool/output/lpd/.seq 
> > #! /usr/local/bin/bash
> > touch /tmp/jb-test-`echo $$`
> > 
> > # ls -al /var/spool/output/lpd/.seq 
> > -rw-rx  1 root  daemon  54 Feb 29 17:05 /var/spool/output/lpd/.seq
> > # /var/spool/output/lpd/.seq 
> > # 
> > # ls /tmp/jb*
> > /tmp/jb-test-61789
> > 
> > # chmod 0640 /var/spool/output/lpd/.seq 
> > # ls -al /var/spool/output/lpd/.seq 
> > -rw-r-  1 root  daemon  52 Feb 29 17:11 /var/spool/output/lpd/.seq
> > # /var/spool/output/lpd/.seq 
> > su: /var/spool/output/lpd/.seq: Permission denied
> > #
> > 
> 
> Giving execute bit to others by security means to allow others to search
> for that file and find it. If its not there then the process created by
> current user will not be able to read the file since they are not part
> of the daemon group. I would assume that sometimes the contents of .seq
> was judged to be insecure for whatever reason but judged that a user
> should be able to still in a sense read the file without reading its
> contents. Negative perms are not harmful.
> 
> I do suppose a 'daily_status_security_neggrpperm_dirs=' variable should
> be added here to control which directories are being scanned much like
> chknoid.
> 

The exec bit's control over the ability to search applies to
directories, not individual files.  For example:

revolution > whoami
ilepore
revolution > ll /tmp/test
-rw-rx  1 root  daemon 0B Feb 29 07:37 /tmp/test*

The file is 0641 and I'm not in the daemon group; I can list it.

Again, the problem here seems to be the use of 0661 in the lpr program,
not the idea of negative permissions and not the new scan for the use of
negative permissions.  It's just an old bug in an old program which used
to be harmless and now is "mostly harmless".  Instead of trying to "fix"
it by causing the new scan to ignore it, why don't we fix it by fixing
the program?  (I'd submit a patch but it's a 1-character change -- it's
not clear to me a patch would be easier for a committer to handle than
just finding and changing the only occurrence of "0661" in lpr.c.)

-- Ian




Re: negative group permissions?

2012-02-29 Thread Ian Lepore
On Wed, 2012-02-29 at 16:18 +, jb wrote:
> Ian Lepore  damnhippie.dyndns.org> writes:
> 
> > ... 
> >  It's not a
> > directory or executable file in the first place, so making it executable
> > for everyone except the owner and group is not some sort of subtle
> > security trick, it's just meaningless.
> > ...
> 
> Is it meaningless ?
> 
> Example:
> # cat /var/spool/output/lpd/.seq 
> #! /usr/local/bin/bash
> touch /tmp/jb-test-`echo $$`
> 
> # ls -al /var/spool/output/lpd/.seq 
> -rw-rx  1 root  daemon  54 Feb 29 17:05 /var/spool/output/lpd/.seq
> # /var/spool/output/lpd/.seq 
> # 
> # ls /tmp/jb*
> /tmp/jb-test-61789
> 
> # chmod 0640 /var/spool/output/lpd/.seq 
> # ls -al /var/spool/output/lpd/.seq 
> -rw-r-  1 root  daemon  52 Feb 29 17:11 /var/spool/output/lpd/.seq
> # /var/spool/output/lpd/.seq 
> su: /var/spool/output/lpd/.seq: Permission denied
> #
> 
> jb

I don't understand the point of your example.  You use an example .seq
file which does not contain the data the lpr program puts into that
file.  Instead your file contains executable code, then you show how
negative permissions work on executable files.

My point is that the way this file is used by lpr, it is NOT an
executable file -- it holds a simple ASCII-encoded sequence number.
That seems to be a pretty strong argument that manipulating the exec
permission was not an intentional invocation of negative permissions.
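
For anyone following along, the "negative permissions" behavior falls
out of the way the classic UNIX check picks exactly one class of bits.
Here is a simplified model of that rule; it assumes no ACLs or MAC
policy and ignores root, and applicable_bits() is a made-up name for
illustration, not a kernel function.

```c
#include <sys/types.h>

/*
 * Simplified model of the traditional permission check: only one class
 * of bits is ever consulted, chosen in the order owner, group, other.
 * Returns the rwx bits (0-7) that apply to the caller.
 */
int
applicable_bits(mode_t mode, int is_owner, int in_group)
{
	if (is_owner)
		return ((mode >> 6) & 7);	/* owner bits only */
	if (in_group)
		return ((mode >> 3) & 7);	/* group bits only */
	return (mode & 7);			/* everyone else */
}
```

With mode 0641 this gives rw- to the owner, r-- to members of the group,
and --x to everyone else, which is why only "other" could execute the
file in jb's example.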

-- Ian




Re: negative group permissions?

2012-02-29 Thread Ian Lepore
On Wed, 2012-02-29 at 13:21 +, jb wrote:
> jb  gmail.com> writes:
> 
> > ... 
> > I would suggest (if you can) that you change the .seq permissions to 0664 
> > and
> > watch what happens to it - the purpose is to narrow down who/what changed 
> > its
> > mode.
> > Some history. logs. and some ad hoc "watch script" would do it.
> 
> Take a look at "notify" feature (file, dir, event).
> http://www.freebsd.org/cgi/ports.cgi?query=notify&stype=all
> jb

I don't understand why everyone is focused on the 0641 mode the file ends
up with.  The code creates the file using 0661, and under a umask of 022
you end up with a file with 0641 permissions.  How the write bit
disappeared from the group permissions doesn't seem to be germane to the
real question of why the code specifies world-exec access.

I don't think it's a legitimate attempt to leverage the negative
permissions quirk, because it doesn't effectively do so.  It's not a
directory or executable file in the first place, so making it executable
for everyone except the owner and group is not some sort of subtle
security trick, it's just meaningless.  I think the code is long overdue
for a fix to 0660 permissions when creating the file.
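
The arithmetic behind the 0661 to 0641 step is just the umask being
masked out of the mode passed to open(2).  A minimal sketch, with
created_mode() as a hypothetical name for illustration:

```c
#include <sys/types.h>

/*
 * Permission bits a newly created file ends up with: the mode requested
 * in open(2) with the bits set in the process umask cleared.
 */
mode_t
created_mode(mode_t requested, mode_t mask)
{
	return (requested & ~mask & 07777);
}
```

So created_mode(0661, 022) is 0641, and with the proposed fix
created_mode(0660, 022) would be 0640.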

-- Ian




Re: Race between cron and crontab

2012-02-02 Thread Ian Lepore
On Thu, 2012-02-02 at 17:03 -0800, Doug Barton wrote:
> On 02/01/2012 04:42, John Baldwin wrote:
> > On Tuesday, January 31, 2012 9:23:12 pm Doug Barton wrote:
> >> On 01/31/2012 08:49, John Baldwin wrote:
> >>> A co-worker ran into a race between updating a cron tab via crontab(8) 
> >>> and 
> >>> cron(8) yesterday.  Specifically, cron(8) failed to notice that a crontab 
> >>> was 
> >>> updated.  The problem is that 1) by default our filesystems only use 
> >>> second 
> >>> granularity for timestamps and 2) cron only caches the seconds portion of 
> >>> a 
> >>> file's timestamp when checking for changes anyway.  This means that cron 
> >>> can 
> >>> miss updates to a spool directory if multiple updates to the directory 
> >>> are 
> >>> performed within a single second and cron wakes up to scan the spool 
> >>> directory 
> >>> within the same second and scans it before all of the updates are 
> >>> complete.
> >>>
> >>> Specifically, when replacing a crontab, crontab(8) first creates a 
> >>> temporary 
> >>> file in /var/cron/tabs and then uses a rename to install it followed by 
> >>> touching the spool directory to update its modification time.  However, 
> >>> the 
> >>> creation of the temporary file already changes the modification time of 
> >>> the 
> >>> directory, and cron may "miss" the rename if it scans the directory in 
> >>> between 
> >>> the creation of the temporary file and the rename.
> >>>
> >>> The "fix" I am planning to use locally is to simply force crontab(8) to 
> >>> sleep 
> >>> for a second before it touches the spool directory, thus ensuring that it 
> >>> the 
> >>> touch of the spool directory will use a later modification time than the 
> >>> creation of the temporary file.
> >>
> >> If you really want cron to have sub-second granularity I don't see how
> >> you could do it without using flags.
> >>
> >> crontab open   sets flag that it is editing a file
> >> crontab close  clears "editing" flag, sets "something changed" flag
> >>(if something actually changed of course)
> >>
> >> cron   checks existence of "something changed" flag, pulls the
> >>update if there is no "editing" flag, clears "changed" flag
> > 
> > I don't want it to have sub-second granularity,
> 
> Ok, I was interpolating, sorry if I misinterpreted your intentions.
> 
> > I just want to make
> > 'crontab -e' more reliable so that cron doesn't miss edits.  cron is
> > currently using the mod-time of the spool directory as the 'something
> > changed' flag (have you read the cron code?).
> 
> I understand the spool behavior from history/experience, and I am
> relying on your excellent summary for the details. :)
> 
> > The problem is that it
> > currently can set the 'something changed' flag non-atomically while it is
> > updating a crontab.
> 
> That much I understood from your post. My response to what it is I think
> you're trying to achieve is that it's not likely that you can achieve it
> by only using 1 flag, no matter what that 1 flag is. I may be wrong
> about that, but hopefully my suggestion gives you some other ideas to
> consider.
> 
> Meanwhile, I was thinking more about this and TMK cron doesn't actually
> *run* jobs with seconds granularity, only minutes, right?  If so then it
> seems that the only really important seconds to care about are :59 and
> :00. That would seem to present a solution that rather than having cron
> wake up every second to see if something has changed that it only do
> that at :59 (or however many seconds in advance of :00 that it needs,
> although if it's more than 1 I'll be surprised). That limits the race to
> someone who writes out a new crontab entry at the point during second
> :59 that is after cron wakes up to look but before :00. So that's not a
> perfect solution to your problem, but it should limit the race to a very
> narrow window without having to modify the code very much.
> 
> 
> hth,
> 
> Doug

I think part of the problem here is that I started typing without
thinking enough... I thought I was offering a different small change
that fixed more than one problem, but it was wrong.  My mistake seems to
be having the unintended effect of sidetracking a perfectly reasonable
small fix for a problem that's known to happen in the real world, just
because it doesn't also fix a theoretical problem that might happen
somewhere some day; that sure wasn't my intention.

-- Ian




Re: Race between cron and crontab

2012-01-31 Thread Ian Lepore
On Tue, 2012-01-31 at 13:30 -0500, John Baldwin wrote:
> On Tuesday, January 31, 2012 12:57:50 pm Ian Lepore wrote:
> > On Tue, 2012-01-31 at 11:49 -0500, John Baldwin wrote:
> > > A co-worker ran into a race between updating a cron tab via crontab(8) 
> > > and 
> > > cron(8) yesterday.  Specifically, cron(8) failed to notice that a crontab 
> > > was 
> > > updated.  The problem is that 1) by default our filesystems only use 
> > > second 
> > > granularity for timestamps and 2) cron only caches the seconds portion of 
> > > a 
> > > file's timestamp when checking for changes anyway.  This means that cron 
> > > can 
> > > miss updates to a spool directory if multiple updates to the directory 
> > > are 
> > > performed within a single second and cron wakes up to scan the spool 
> > > directory 
> > > within the same second and scans it before all of the updates are 
> > > complete.
> > > 
> > > Specifically, when replacing a crontab, crontab(8) first creates a 
> > > temporary 
> > > file in /var/cron/tabs and then uses a rename to install it followed by 
> > > touching the spool directory to update its modification time.  However, 
> > > the 
> > > creation of the temporary file already changes the modification time of 
> > > the 
> > > directory, and cron may "miss" the rename if it scans the directory in 
> > > between 
> > > the creation of the temporary file and the rename.
> > > 
> > > The "fix" I am planning to use locally is to simply force crontab(8) to 
> > > sleep 
> > > for a second before it touches the spool directory, thus ensuring that it 
> > > the 
> > > touch of the spool directory will use a later modification time than the 
> > > creation of the temporary file.
> > > 
> > > Note that crontab -r is not affected by this race as it only does one 
> > > atomic 
> > > update to the directory (unlink()).
> > > 
> > > Index: crontab.c
> > > ===
> > > --- crontab.c (revision 225431)
> > > +++ crontab.c (working copy)
> > > @@ -604,6 +604,15 @@ replace_cmd() {
> > >  
> > >   log_it(RealUser, Pid, "REPLACE", User);
> > >  
> > > + /*
> > > +  * Creating the 'tn' temp file has already updated the
> > > +  * modification time of the spool directory.  Sleep for a
> > > +  * second to ensure that poke_daemon() sets a later
> > > +  * modification time.  Otherwise, this can race with the cron
> > > +  * daemon scanning for updated crontabs.
> > > +  */
> > > + sleep(1);
> > > +
> > >   poke_daemon();
> > >  
> > >   return (0);
> > 
> > Maybe this is overly pedantic, but that solution still allows the
> > possibility of the same sort of race if a user updates their crontab in
> > the same second as an admin saves a new /etc/crontab, because cron takes
> > the max timestamp of /etc/crontab and /var/cron/tabs and compares it
> > against the database-rebuild timestamp.
> 
> Hmm, I'm not sure I see the race in that case.  If the /etc/crontab file
> matches the timestamp of the spool directory before the utimes() call
> after the one-second sleep, then it will still rescan it on the next
> check when it notices a newer timestamp on the spool directory.  If
> it is the same timestamp as the second timestamp on the spool directory,
> then the scan is guaranteed to have not started before that second began,
> meaning that the crontab(8) process editing the user's crontab must have
> passed the rename, so the scan will see the user's new crontab.
> 
> > A possible solution on the daemon side of things might be something like
> > the attached, but I should state (nay, shout) that I haven't looked
> > beyond these few lines to see if there are any unintended side effects
> > to such a change.
> 
> I think this patch doesn't change anything at all actually.  It is 
> certainly subject to the original race I described if you do not use
> the patch in crontab(8) itself.
> 

You're right about my patch not fixing anything; I didn't think hard
enough before I started typing.  

But I think the problem I was trying to get at with /etc/crontab still
exists with your patch; it would be triggered if a user updated their
crontab and after the 1 second sleep the directory timestamp gets
updated and cron rebuilds the database, an

Re: Race between cron and crontab

2012-01-31 Thread Ian Lepore
On Tue, 2012-01-31 at 11:49 -0500, John Baldwin wrote:
> A co-worker ran into a race between updating a cron tab via crontab(8) and 
> cron(8) yesterday.  Specifically, cron(8) failed to notice that a crontab was 
> updated.  The problem is that 1) by default our filesystems only use second 
> granularity for timestamps and 2) cron only caches the seconds portion of a 
> file's timestamp when checking for changes anyway.  This means that cron can 
> miss updates to a spool directory if multiple updates to the directory are 
> performed within a single second and cron wakes up to scan the spool 
> directory 
> within the same second and scans it before all of the updates are complete.
> 
> Specifically, when replacing a crontab, crontab(8) first creates a temporary 
> file in /var/cron/tabs and then uses a rename to install it followed by 
> touching the spool directory to update its modification time.  However, the 
> creation of the temporary file already changes the modification time of the 
> directory, and cron may "miss" the rename if it scans the directory in 
> between 
> the creation of the temporary file and the rename.
> 
> The "fix" I am planning to use locally is to simply force crontab(8) to sleep 
> for a second before it touches the spool directory, thus ensuring that it the 
> touch of the spool directory will use a later modification time than the 
> creation of the temporary file.
> 
> Note that crontab -r is not affected by this race as it only does one atomic 
> update to the directory (unlink()).
> 
> Index: crontab.c
> ===
> --- crontab.c (revision 225431)
> +++ crontab.c (working copy)
> @@ -604,6 +604,15 @@ replace_cmd() {
>  
>   log_it(RealUser, Pid, "REPLACE", User);
>  
> + /*
> +  * Creating the 'tn' temp file has already updated the
> +  * modification time of the spool directory.  Sleep for a
> +  * second to ensure that poke_daemon() sets a later
> +  * modification time.  Otherwise, this can race with the cron
> +  * daemon scanning for updated crontabs.
> +  */
> + sleep(1);
> +
>   poke_daemon();
>  
>   return (0);

Maybe this is overly pedantic, but that solution still allows the
possibility of the same sort of race if a user updates their crontab in
the same second as an admin saves a new /etc/crontab, because cron takes
the max timestamp of /etc/crontab and /var/cron/tabs and compares it
against the database-rebuild timestamp.
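
To make the timing concrete, here is a stripped-down stand-in for the
comparison at the top of load_database().  It is a sketch only:
would_reload() is a made-up name, TMAX is redefined locally so the
fragment stands alone, and the timestamps are whole seconds as in the
stat() fields cron looks at.

```c
#include <time.h>

#define TMAX(a, b)	((a) > (b) ? (a) : (b))

/*
 * Simplified version of cron's "has anything changed?" test: the
 * database is rebuilt only when the newer of the two mtimes differs
 * from the mtime recorded at the last rebuild.
 */
int
would_reload(time_t db_mtime, time_t spool_mtime, time_t syscron_mtime)
{
	return (db_mtime != TMAX(spool_mtime, syscron_mtime));
}
```

If the temp file is created at second 100, cron scans during second 100
and records 100, and the rename plus touch also land in second 100, then
would_reload(100, 100, 50) is 0 and the new crontab is missed.  With the
sleep(1) the touch lands at 101 and would_reload(100, 101, 50) is
nonzero.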

A possible solution on the daemon side of things might be something like
the attached, but I should state (nay, shout) that I haven't looked
beyond these few lines to see if there are any unintended side effects
to such a change.

-- Ian

diff -r eb5f4971de86 usr.sbin/cron/cron/database.c
--- usr.sbin/cron/cron/database.c	Fri Jan 20 16:12:15 2012 -0700
+++ usr.sbin/cron/cron/database.c	Tue Jan 31 10:48:32 2012 -0700
@@ -72,7 +72,7 @@ load_database(old_db)
 	 * so is guaranteed to be different than the stat() mtime the first
 	 * time this function is called.
 	 */
-	if (old_db->mtime == TMAX(statbuf.st_mtime, syscron_stat.st_mtime)) {
+	if (old_db->mtime > TMAX(statbuf.st_mtime, syscron_stat.st_mtime)) {
 		Debug(DLOAD, ("[%d] spool dir mtime unch, no load needed.\n",
 			  getpid()))
 		return;
@@ -83,7 +83,7 @@ load_database(old_db)
 	 * actually changed.  Whatever is left in the old database when
 	 * we're done is chaff -- crontabs that disappeared.
 	 */
-	new_db.mtime = TMAX(statbuf.st_mtime, syscron_stat.st_mtime);
+	new_db.mtime = 1 + TMAX(statbuf.st_mtime, syscron_stat.st_mtime);
 	new_db.head = new_db.tail = NULL;
 
 	if (syscron_stat.st_mtime) {

Re: FILTER_SCHEDULE_THREAD is not a bit-value

2012-01-31 Thread Ian Lepore
On Tue, 2012-01-31 at 15:57 +0700, Max Khon wrote: 
> Hello,
> 
> On Tue, Jan 31, 2012 at 12:44 AM, Ian Lepore
>  wrote:
> 
> >> sys/bus.h documents the following semantics for FILTER_SCHEDULE_THREAD:
> >>
> >> /**
> >>  * @brief Driver interrupt filter return values
> >>  *
> >>  * If a driver provides an interrupt filter routine it must return an
> >>  * integer consisting of oring together zero or more of the following
> >>  ^^^
> >>  * flags:
> >>  *
> >>  *  FILTER_STRAY- this device did not trigger the interrupt
> >>  *  FILTER_HANDLED  - the interrupt has been fully handled and can be 
> >> EOId
> >>  *  FILTER_SCHEDULE_THREAD - the threaded interrupt handler should be
> >>  *scheduled to execute
> >>  *
> >>  * If the driver does not provide a filter, then the interrupt code will
> >>  * act is if the filter had returned FILTER_SCHEDULE_THREAD.  Note that it
> >>  * is illegal to specify any other flag with FILTER_STRAY and that it is
> >>  * illegal to not specify either of FILTER_HANDLED or 
> >> FILTER_SCHEDULE_THREAD
> >>  * if FILTER_STRAY is not specified.
> >>  */
> >> #define FILTER_STRAY0x01
> >> #define FILTER_HANDLED  0x02
> >> #define FILTER_SCHEDULE_THREAD  0x04
> >>
> >> But actually FILTER_SCHEDULE_THREAD is not used as a bit-value (see
> >> kern/kern_intr.c):
> >>
> >> if (!thread) {
> >> if (ret == FILTER_SCHEDULE_THREAD)
> >> thread = 1;
> >> }
> >>
> >> There is at least one in-tree driver that could be broken because of
> >> this (asmc(8), but I found the problem with some other out-of-tree
> >> driver).
> >> This should be "if (ret & FILTER_SCHEDULE_THREAD)" instead. Attached
> >> patch fixes the problem.
> >>
> >> What do you think?
> >>
> >> Max
> >
> > I think returning (FILTER_HANDLED | FILTER_SCHEDULE_THREAD) makes no
> > sense given the definition "the interrupt has been fully handled and can
> > be EOId".  If you EOI in the primary interrupt context and then schedule
> > a threaded handler to run as well you're likely to need complex locking
> > between the primary and threaded interrupt handlers and I was under the
> > impression that's just the sort of thing the filter/threaded scheme was
> > designed to avoid.
> 
> I see no sense here.
> 1) You would have to implement locking anyway to protect concurrent
> access from ithread/filter and other driver methods (char device or
> network device callbacks)
> 

That is often, but not always, the case.  Depending on the hardware and
the needs of the driver, the guaranteed temporal separation between
primary and threaded interrupt context can reduce the need for locking.
In one case I managed to avoid the need to do any locking at all in the
primary context (in a pps driver to replace the stock one that lost the
ability to handle interrupts in a primary context at all).

> 2) ithread and filter can already be executed simultaneously even when
> only FILTER_SCHEDULE_THREAD is returned: when ithread is scheduled to
> be executed the device can emit a new interrupt and it will be
> preempted by filter
> 

No, if the primary-context handler does not return FILTER_HANDLED and
the interrupt dispatcher code does not EOI the interrupt until after the
threaded handler has run, then another hardware interrupt from that
source cannot interrupt the threaded handler.  This amounts to implicit
temporal synchronization between primary and threaded interrupt contexts
that eliminates the need for explicit synchronization using locks.
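
To make the argument concrete, here is a small userland model of the
behavior described above.  It is only a sketch, not the kern_intr.c
code: struct intr_model and the model_* functions are hypothetical
names, and the assumption that the source stays held off until the
threaded handler finishes is exactly the claim under discussion, not
something the model proves.

```c
#include <assert.h>
#include <stdbool.h>

/* Values as quoted from sys/bus.h in this thread. */
#define FILTER_HANDLED          0x02
#define FILTER_SCHEDULE_THREAD  0x04

/* Hypothetical model of one interrupt source; not a kernel structure. */
struct intr_model {
	bool masked;         /* source cannot interrupt the CPU while set */
	bool thread_pending; /* threaded handler scheduled but not finished */
};

/* Hardware raises the interrupt.  Returns true if the filter got to run. */
static bool
model_raise(struct intr_model *m, int filter_ret)
{
	if (m->masked)
		return (false);	/* held off until the deferred EOI */
	if (filter_ret & FILTER_SCHEDULE_THREAD)
		m->thread_pending = true;
	/* Without FILTER_HANDLED the EOI is deferred, so stay masked. */
	if (!(filter_ret & FILTER_HANDLED))
		m->masked = true;
	return (true);
}

/* Threaded handler finishes; the dispatcher issues the deferred EOI. */
static void
model_thread_done(struct intr_model *m)
{
	assert(m->thread_pending);
	m->thread_pending = false;
	m->masked = false;
}

/* True if a new filter invocation could overlap a pending thread. */
static bool
model_filter_can_race_thread(const struct intr_model *m)
{
	return (m->thread_pending && !m->masked);
}
```

In this model a filter that returns only FILTER_SCHEDULE_THREAD can
never overlap its threaded handler, while one that returns
(FILTER_HANDLED | FILTER_SCHEDULE_THREAD) can, which is where explicit
locking would come back in.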

> > In other words, the part about ORing together values seems to be staking
> > out room for future growth, because the current set of flags and the
> > words about how to use them imply that only one of the current set of
> > values should be returned at once.
> 
> No, the text does not imply that only one of the values is supposed to
> be returned (where did you see it). See also KASSERT checks in
> intr_event_handle() -- they clearly show that the intention was to
> allow FILTER_HANDLED and FILTER_SCHEDULE_THREAD to be returned
> simultaneously.
> 
> Max

I have to admit that the text doesn't specifically forbid returning both
values ORed together, but it seems to me that doing so is nonsensical.
The reason is the corollary to the above point:  if you return
F

Re: [patch] nextboot(8) arbitrary kernel environment

2012-01-30 Thread Ian Lepore
On Mon, 2012-01-30 at 14:57 -0500, Ed Maste wrote:
> I have a patch to allow nextboot(8) to set arbitrary kernel environment
> variables (not just the kernel dir and kernel_options).  The usage becomes:
> 
> Usage: nextboot [-e variable=value] [-f] [-k kernel] [-o options]
>nextboot -D
> 
> and the new option is documented as:
> 
>  -e variable=value
>  This option adds the provided variable and value to the ker-
>  nel environment.  The value is quoted when written to the
>  nextboot configuration.
> 
> The patch also makes -k an option (no longer mandatory).  The patch is at
> http://people.freebsd.org/~emaste/nextboot.diff .  I'll commit it in a few
> days if no concerns are raised by review or my testing.
> 
> -Ed

Minor nit:

  -It is not the most thoroughly tested code.
  +It is not the most throughly tested code.

The original spelling is the correct one.

-- Ian


___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: FILTER_SCHEDULE_THREAD is not a bit-value

2012-01-30 Thread Ian Lepore
On Mon, 2012-01-30 at 22:50 +0600, Max Khon wrote:
> Hello!
> 
> sys/bus.h documents the following semantics for FILTER_SCHEDULE_THREAD:
> 
> /**
>  * @brief Driver interrupt filter return values
>  *
>  * If a driver provides an interrupt filter routine it must return an
>  * integer consisting of oring together zero or more of the following
>  ^^^
>  * flags:
>  *
>  *  FILTER_STRAY           - this device did not trigger the interrupt
>  *  FILTER_HANDLED         - the interrupt has been fully handled and can be EOId
>  *  FILTER_SCHEDULE_THREAD - the threaded interrupt handler should be
>  *                           scheduled to execute
>  *
>  * If the driver does not provide a filter, then the interrupt code will
>  * act is if the filter had returned FILTER_SCHEDULE_THREAD.  Note that it
>  * is illegal to specify any other flag with FILTER_STRAY and that it is
>  * illegal to not specify either of FILTER_HANDLED or FILTER_SCHEDULE_THREAD
>  * if FILTER_STRAY is not specified.
>  */
> #define FILTER_STRAY            0x01
> #define FILTER_HANDLED          0x02
> #define FILTER_SCHEDULE_THREAD  0x04
> 
> But actually FILTER_SCHEDULE_THREAD is not used as a bit-value (see
> kern/kern_intr.c):
> 
> if (!thread) {
>         if (ret == FILTER_SCHEDULE_THREAD)
>                 thread = 1;
> }
> 
> There is at least one in-tree driver that could be broken because of
> this (asmc(8), but I found the problem with some other out-of-tree
> driver).
> This should be "if (ret & FILTER_SCHEDULE_THREAD)" instead. Attached
> patch fixes the problem.
> 
> What do you think?
> 
> Max

I think returning (FILTER_HANDLED | FILTER_SCHEDULE_THREAD) makes no
sense given the definition "the interrupt has been fully handled and can
be EOId".  If you EOI in the primary interrupt context and then schedule
a threaded handler to run as well you're likely to need complex locking
between the primary and threaded interrupt handlers and I was under the
impression that's just the sort of thing the filter/threaded scheme was
designed to avoid.

In other words, the part about ORing together values seems to be staking
out room for future growth, because the current set of flags and the
words about how to use them imply that only one of the current set of
values should be returned at once.

On the other hand, the words are also self-contradictory, in that they
say "oring together zero or more" but then later when saying which flags
can be used together it's defined as erroneous to return zero.
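
For reference, here is a minimal userland sketch of the rules as the
quoted comment states them, next to the two ways of testing for
FILTER_SCHEDULE_THREAD.  The flag values are the ones quoted above; the
function names are made up for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Values as quoted from sys/bus.h in this thread. */
#define FILTER_STRAY            0x01
#define FILTER_HANDLED          0x02
#define FILTER_SCHEDULE_THREAD  0x04

/*
 * The documented rules: FILTER_STRAY must stand alone; otherwise at
 * least one of FILTER_HANDLED or FILTER_SCHEDULE_THREAD must be set.
 * Note that zero fails both branches.
 */
static bool
filter_ret_is_valid(int ret)
{
	if (ret & FILTER_STRAY)
		return (ret == FILTER_STRAY);
	return ((ret & (FILTER_HANDLED | FILTER_SCHEDULE_THREAD)) != 0);
}

/* Equality test, as in the quoted kern_intr.c snippet. */
static bool
wants_thread_eq(int ret)
{
	assert(filter_ret_is_valid(ret));
	return (ret == FILTER_SCHEDULE_THREAD);
}

/* Bit test, as the proposed patch would have it. */
static bool
wants_thread_bit(int ret)
{
	assert(filter_ret_is_valid(ret));
	return ((ret & FILTER_SCHEDULE_THREAD) != 0);
}
```

The two tests agree for every single-flag return value and differ only
for (FILTER_HANDLED | FILTER_SCHEDULE_THREAD), which is the combination
whose meaning is in question.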

-- Ian


Re: revisiting tunables under Safe Mode menu option

2012-01-30 Thread Ian Lepore
On Mon, 2012-01-30 at 18:59 +0200, Andriy Gapon wrote:
> 
> o hw.ata.ata_dma, hw.ata.atapi_dma - I am not sure if there have been any
> significant problems with ATA DMA recently.  Maybe these could be removed?

I still have to work with hardware that requires ata_dma disabled.  It
seems to be required for most systems I've worked with that have a
compact flash socket on the mainboard (sometimes you can just limit the
mode to udma33 or less, sometimes you have to turn it off completely.)

Adding kern.eventtimer.periodic=1 seems like a good idea.

As a general philosophical thing, I don't have a problem with the idea
"safe mode turns off everything that has ever historically been
problematic," because I don't think anyone expects a system to run well
in safe mode.  I see it more as a tool to start narrowing down the area
of trouble, like step 1 of a binary search for the problem.  As such,
the most important aspect is a comprehensive list of what changes for
safe mode, so that you can proceed by selectively en/disabling the
various things it does.

-- Ian


Re: memory barriers in bus_dmamap_sync() ?

2012-01-11 Thread Ian Lepore
On Wed, 2012-01-11 at 09:59 -0700, Scott Long wrote:
> 
> Where barriers _are_ needed is in interrupt handlers, and I can
> discuss that if you're interested.
> 
> Scott
> 

I'd be interested in hearing about that (and in general I'm loving the
details coming out in your explanations -- thanks!).

-- Ian


Re: memory barriers in bus_dmamap_sync() ?

2012-01-11 Thread Ian Lepore
On Wed, 2012-01-11 at 11:49 -0500, John Baldwin wrote:
> On Wednesday, January 11, 2012 11:29:44 am Luigi Rizzo wrote:
> > On Wed, Jan 11, 2012 at 10:05:28AM -0500, John Baldwin wrote:
> > > On Tuesday, January 10, 2012 5:41:00 pm Luigi Rizzo wrote:
> > > > On Tue, Jan 10, 2012 at 01:52:49PM -0800, Adrian Chadd wrote:
> > > > > On 10 January 2012 13:37, Luigi Rizzo  wrote:
> > > > > > I was glancing through manpages and implementations of bus_dma(9)
> > > > > > > and i am a bit unclear on what this API (in particular, bus_dmamap_sync() )
> > > > > > does in terms of memory barriers.
> > > > > >
> > > > > > I see that the x86/amd64 and ia64 code only does the bounce buffers.
> > > 
> > > That is because x86 in general does not need memory barriers. ...
> > 
> > maybe they are not called memory barriers but for instance
> > how do i make sure, even on the x86, that a write to the NIC ring
> > is properly flushed before the write to the 'start' register occurs ?
> > 
> > Take for instance the following segment from
> > 
> > head/sys/ixgbe/ixgbe.c::ixgbe_xmit() :
> > 
> > txd->read.cmd_type_len |=
> >     htole32(IXGBE_TXD_CMD_EOP | IXGBE_TXD_CMD_RS);
> > txr->tx_avail -= nsegs;
> > txr->next_avail_desc = i;
> > 
> > txbuf->m_head = m_head;
> > /* Swap the dma map between the first and last descriptor */
> > txr->tx_buffers[first].map = txbuf->map;
> > txbuf->map = map;
> > bus_dmamap_sync(txr->txtag, map, BUS_DMASYNC_PREWRITE);
> > 
> > /* Set the index of the descriptor that will be marked done */
> > txbuf = &txr->tx_buffers[first];
> > txbuf->eop_index = last;
> > 
> > bus_dmamap_sync(txr->txdma.dma_tag, txr->txdma.dma_map,
> >     BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
> > /*
> >  * Advance the Transmit Descriptor Tail (Tdt), this tells the
> >  * hardware that this frame is available to transmit.
> >  */
> > ++txr->total_packets;
> > IXGBE_WRITE_REG(&adapter->hw, IXGBE_TDT(txr->me), i);
> > 
> > the descriptor is allocated without any caching constraint,
> > the bus_dmamap_sync() are effectively NOPs on i386 and amd64,
> > and IXGBE_WRITE_REG has no implicit guard.
> 
> x86 doesn't need a guard as its stores are ordered.  The bus_dmamap_sync()
> would be sufficient for platforms where stores can be reordered in this
> case (as those platforms should place memory barriers in their implementation
> of bus_dmamap_sync()).
>  
> > > We could use lfence/sfence on amd64, but on i386 not all processors support
> > 
> > ok then we can make it machine-specific versions... this is kernel
> > code so we do have a list of supported CPUs.
> 
> It is not worth it to add the overhead for i386 to do that when all modern
> x86 CPUs are going to run amd64 anyway.
> 

Harumph.  I run i386 on all my x86 CPUs.  For our embedded systems
products it's because they're small wimpy old CPUs, and for my desktop
system it's because I need to run builds for the embedded systems and
avoid all the cross-build problems of trying to create i386 ports on a
64 bit host.

> > > those.  The broken drivers doing it by hand don't work on early i386 CPUs.
> > > Also, I personally don't like using membars like rmb() and wmb() by hand.
> > > If you are operating on normal memory I think atomic_load_acq() and
> > > atomic_store_rel() are better.
> > 
> > is it just a matter of names ?
> 
> For regular memory when you are using memory barriers you often want to tie
> the barrier to a specific operation (e.g. it is the store in IXGBE_WRITE_REG()
> above that you want ordered after any other stores).  Having the load/store
> and membar in the same call explicitly notes that relationship.
> 
> > My complaint was mostly on how many
> > unused parameters you need to pass to bus_space_barrier().
> > They make life hard for both the programmer and the
> > compiler, which might become unable to optimize them out.
> 
> Yes, it seems overly abstracted.  In NetBSD, bus_dmamap_sync() actually takes
> extra parameters to say which portion of the map should be sync'd.  We removed
> those in FreeBSD to make the API simpler.  bus_space_barrier() could probably
> use similar simplification (I believe we also adopted that API from NetBSD).

I've wished (in the ARM world) for the ability to sync a portion of a
map.  I've even kicked around the idea of proposing an API extension to
do so, but I guess if FreeBSD went out of its way to remove that
functionality that idea probably won't fly. :)
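
Just to show what I have in mind, here is a rough sketch of the
arithmetic a partial sync would need.  Everything in it is hypothetical:
partial_sync_range() is not an existing busdma function, the 32-byte
cache line is an assumed value, and real cache maintenance would have to
be done by the machdep code on the range this returns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 32u	/* assumed line size; the real value is CPU-specific */

struct sync_range {
	uintptr_t start;	/* first byte to sync, cache-line aligned */
	size_t    len;		/* byte count, a multiple of CACHE_LINE */
};

/*
 * Expand the requested window [offset, offset + len) of a mapping that
 * starts at 'base' outward to whole cache lines, since cache operations
 * work on full lines.
 */
static struct sync_range
partial_sync_range(uintptr_t base, size_t maplen, size_t offset, size_t len)
{
	const uintptr_t mask = ~(uintptr_t)(CACHE_LINE - 1);
	struct sync_range r;
	uintptr_t first, last;

	/* The window must lie inside the mapping. */
	assert(offset <= maplen && len <= maplen - offset);

	first = (base + offset) & mask;				/* round down */
	last = (base + offset + len + CACHE_LINE - 1) & mask;	/* round up */
	r.start = first;
	r.len = (size_t)(last - first);
	return (r);
}
```

Syncing 10 bytes at offset 40 of a page-sized map then touches a single
32-byte line instead of the whole page, which is the saving a partial
sync would buy on a CPU with software-managed cache coherency.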

-- Ian


Re: memory barriers in bus_dmamap_sync() ?

2012-01-10 Thread Ian Lepore
On Tue, 2012-01-10 at 22:37 +0100, Luigi Rizzo wrote:
> I was glancing through manpages and implementations of bus_dma(9)
> and i am a bit unclear on what this API (in particular, bus_dmamap_sync() )
> does in terms of memory barriers.
> 
> I see that the x86/amd64 and ia64 code only does the bounce buffers.
> The mips seems to do some coherency-related calls.
> 
> How do we guarantee, say, that a recently built packet is written
> to memory before issuing the tx command to the NIC ?
> 
> cheers
> luigi

I've always assumed that when the concept of a memory barrier means
anything for a given architecture, it's implied that the
bus_dmamap_sync() call has to invoke it as needed to ensure the DMA
operation picks up the right data.  Maybe it would be good if the
manpage said that straight out.

The ARM implementations do use the memory barrier operations, in the
form of a call to cpu_drain_writebuf() in the busdma_machdep code.  The
ARM specification says that the CPU is stopped until all buffered data
is written to memory for that operation.
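
As an illustration of the ordering being asked about (descriptor
contents visible before the "go" write), here is a userland sketch using
C11 atomics.  The struct tx_ring type and the ring_* functions are
made-up stand-ins, not driver or busdma code, and C11 release/acquire
only orders memory as seen by other CPUs; it says nothing about cache
maintenance or device visibility, which is the part bus_dmamap_sync()
is for.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_SLOTS 8u

/* Hypothetical descriptor ring; 'tail' stands in for the start register. */
struct tx_ring {
	uint32_t         desc[RING_SLOTS];
	_Atomic uint32_t tail;
};

/* Producer: fill the descriptor, then publish its index. */
static void
ring_post(struct tx_ring *r, uint32_t idx, uint32_t value)
{
	assert(idx < RING_SLOTS);
	r->desc[idx] = value;	/* plain store: build the descriptor */
	/* Release store: the descriptor store cannot be reordered after it. */
	atomic_store_explicit(&r->tail, idx, memory_order_release);
}

/* Consumer: read the published index, then the descriptor it names. */
static uint32_t
ring_consume(struct tx_ring *r)
{
	/* Acquire load pairs with the release store in ring_post(). */
	uint32_t idx = atomic_load_explicit(&r->tail, memory_order_acquire);

	return (r->desc[idx]);
}
```

Tying the barrier to the store that publishes the index keeps the
relationship between the two writes explicit in the code.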

-- Ian


Re: bus dma: a flag/quirk for page zero

2012-01-10 Thread Ian Lepore
On Tue, 2012-01-10 at 23:15 +0200, Andriy Gapon wrote:
> on 10/01/2012 22:53 Ian Lepore said the following:
> > On Tue, 2012-01-10 at 22:18 +0200, Andriy Gapon wrote:
> >>
> >> Some hardware interfaces may reserve a special meaning for a (physical) memory
> >> address value of zero.  One example is the OHCI specification where a zero value
> >> in CurrentBufferPointer doesn't mean a physical address, but has a reserved
> >> meaning.  To be honest I don't have another example :) but don't preclude its
> >> existence.
> >>
> >> To deal with this peculiarity we could use a special flag/quirk that would
> >> instruct the bus dma code to never use the page zero for communication with the
> >> hardware.
> >> Here's a proof of concept patch that implements the idea:
> >> http://people.freebsd.org/~avg/usb-dma-pagezero.diff
> >>
> >> Some concerns:
> >> - not sure if BUS_DMA_NO_PAGEZERO is the best name for the flag
> >> - the patch implements the flag only for x86 at the moment
> >> - usb code uses the flag regardless of the actual controller type
> >>
> >> What do you think?
> > 
> > I think another way to handle this, one that doesn't require modifying
> > the busdma_machdep implementation for every architecture, would be for
> > usb_dma_tag_create() to set lowaddr to zero and provide a filter func
> > that filters based on both the value zero and the expression currently
> > being passed as lowaddr.  At least, I think that's how the filterfunc
> > stuff is supposed to work, I've never actually coded a busdma filter.
> 
> This has still some problems:
> - filter func is called for the range (lowaddr, hiaddr], that is lowaddr is not
> inclusive, as such there is no way to filter page zero
> - a bounce page could still be at the physical address zero
> - and overriding the above, even worse, bounce pages are allocated in the range
> below lowaddr, so with lowaddr of zero it's impossible to have any bounce pages

Wow, I didn't realize.  That almost reads like a list of bugs in the
current busdma design.  It seems especially wrong to assume that no
hardware in the world now or ever would have its range of DMA-able
addresses in the middle of its physical address space.

I'll throw one more idea out, (because it just popped into my head, not
because I think it's the best possible idea)...  Could the problems you
list be circumvented (for this situation and maybe others) with a new
flag BUS_DMA_ALWAYS_FILTER that makes the filter function run regardless
of the low/high addr values?  That would add the flexibility to handle
any arbitrary kinds of ranges no matter what hardware or strange
requirements come along.
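
Here is a rough userland model of what I mean.  It is a sketch under
loud assumptions: BUS_DMA_ALWAYS_FILTER does not exist, struct
dma_tag_model and must_bounce() are invented for illustration, and the
"filter only runs for addresses in (lowaddr, highaddr]" rule is taken
from the description quoted above rather than from the machdep source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t bus_addr_t;	/* stand-in for the kernel type */
typedef int dma_filter_t(void *arg, bus_addr_t paddr);

#define PAGE_SIZE_MODEL        4096u
#define BUS_DMA_ALWAYS_FILTER  0x1	/* hypothetical flag value */

/* Hypothetical, stripped-down model of a DMA tag. */
struct dma_tag_model {
	bus_addr_t    lowaddr, highaddr;
	int           flags;
	dma_filter_t *filter;
	void         *filterarg;
};

/* Filter: nonzero means "do not DMA here directly, bounce instead". */
static int
reject_page_zero(void *arg, bus_addr_t paddr)
{
	bus_addr_t limit = *(bus_addr_t *)arg;	/* highest usable address */

	return (paddr < PAGE_SIZE_MODEL || paddr > limit);
}

/* Decide whether a physical address has to go through a bounce page. */
static bool
must_bounce(const struct dma_tag_model *t, bus_addr_t paddr)
{
	bool in_window = paddr > t->lowaddr && paddr <= t->highaddr;

	assert(t->lowaddr <= t->highaddr);
	if (!in_window && !(t->flags & BUS_DMA_ALWAYS_FILTER))
		return (false);	/* filter is never consulted */
	if (t->filter == NULL)
		return (in_window);
	return (t->filter(t->filterarg, paddr) != 0);
}
```

Without the flag the filter never sees page zero, so it cannot reject
it; with the flag the same filter handles both the page-zero quirk and
the usual upper limit.  The model says nothing about where bounce pages
themselves get allocated, which is the other problem on your list.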

-- Ian


Re: bus dma: a flag/quirk for page zero

2012-01-10 Thread Ian Lepore
On Tue, 2012-01-10 at 22:18 +0200, Andriy Gapon wrote:
> 
> Some hardware interfaces may reserve a special meaning for a (physical) memory
> address value of zero.  One example is the OHCI specification where a zero value
> in CurrentBufferPointer doesn't mean a physical address, but has a reserved
> meaning.  To be honest I don't have another example :) but don't preclude its
> existence.
> 
> To deal with this peculiarity we could use a special flag/quirk that would
> instruct the bus dma code to never use the page zero for communication with the
> hardware.
> Here's a proof of concept patch that implements the idea:
> http://people.freebsd.org/~avg/usb-dma-pagezero.diff
> 
> Some concerns:
> - not sure if BUS_DMA_NO_PAGEZERO is the best name for the flag
> - the patch implements the flag only for x86 at the moment
> - usb code uses the flag regardless of the actual controller type
> 
> What do you think?

I think another way to handle this, one that doesn't require modifying
the busdma_machdep implementation for every architecture, would be for
usb_dma_tag_create() to set lowaddr to zero and provide a filter func
that filters based on both the value zero and the expression currently
being passed as lowaddr.  At least, I think that's how the filterfunc
stuff is supposed to work, I've never actually coded a busdma filter.

This has the advantage I call "locality of strangeness."  If only the
OHCI hardware needs this strange processing, and it seems like in the
future this strangeness will still be more the exception than the rule,
then the strangeness is best kept close to the place where it's needed,
rather than being spread out all over the place (lots of machdep
places).

-- Ian

