Re: ZFS boot fails with two pools
On Jul 7, 2011, at 12:19 PM, Volodymyr Kostyrko wrote:

>>> 2. Try to convince bios to boot from the disk of pool2.
>>
>> There is no disk with a singular ZFS pool.
>
> Any disk from bootable pool.

Every disk contains two pools. And the BIOS sees only two (maybe three) of them.

>>> 3. You can possibly try deploying /boot/boot0 MBR selector code over disks
>>> of data pool. Supplied boot0 code can be used to choose another disk to
>>> jump to it during boot process and will remember the last choice.
>>
>> I'm not really sure how to do this with GPT. Should I use boot0 instead of
>> pmbr?
>
> boot0cfg is your old friend

Cool, how do we get acquainted?

> Actually I think that the code at those stages just tries to boot from the pool on
> the current disk.

There are two pools on it...

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: [CFT] Sierra Wireless HSPA+ USB modem
On Thu, Jul 7, 2011 at 6:47 AM, Hans Petter Selasky wrote:
> On Thursday 07 July 2011 14:43:22 PseudoCylon wrote:
>> On Wed, Jul 6, 2011 at 9:19 AM, Hans Petter Selasky wrote:
>> >> Hi,
>> >>
>> >> I'm going to review and import your driver.
>> >>
>> >> --HPS
>> >
>> > Hi,
>> >
>> > The initial patch had some bad code and didn't compile on 9-current. I've
>> > tried to clean it up. Please test and report back if I didn't break
>> > anything.
>> >
>> > http://hselasky.homeunix.org:8192/usie_for_FreeBSD_9_current.patch
>> >
>> > --HPS
>>
>> Hello,
>>
>> Thanks for the patch.
>>
>> if_usie.c
>> 241 if (usbd_lookup_id_by_uaa(usie_devs, sizeof(usie_devs), uaa) != 0)
>> 242 return; /* no device match */
>>
>> It should return non-zero on success, but somehow this caused the
>> process to exit, and modem stayed being a CD-ROM.
>
> Hi,
>
> Is this device changing its USB vendor and product ID ?
>

Yes, it does. So I added the device id for the cd-rom, (SIERRA, TRUINSTALL), which is already in usbdevs.

static const STRUCT_USB_HOST_ID usie_devs[] = {
#define USIE_DEV(v, d) {\
    USB_VP(USB_VENDOR_##v, USB_PRODUCT_##v##_##d) }
    USIE_DEV(SIERRA, MC8700),
    USIE_DEV(AIRPRIME, USB308),
+   USIE_DEV(SIERRA, TRUINSTALL),
#undef USIE_DEV
};

Now it works even if the modem is plugged in before loading the driver.

The device id 0x0fff is for the cd-rom, but Sierra didn't specify the vendor id. So, there might also be (AIRPRIME, TRUINSTALL).

With your "uint8_t pad" fix, the driver works fine.

Thanks

AK

Here is a patch.

diff --git a/sys/dev/usb/net/if_usie.c b/sys/dev/usb/net/if_usie.c
index 552765b..f6f6c60 100644
--- a/sys/dev/usb/net/if_usie.c
+++ b/sys/dev/usb/net/if_usie.c
@@ -88,6 +88,7 @@ static const STRUCT_USB_HOST_ID usie_devs[] = {
 	USB_VP(USB_VENDOR_##v, USB_PRODUCT_##v##_##d) }
 	USIE_DEV(SIERRA, MC8700),
 	USIE_DEV(AIRPRIME, USB308),
+	USIE_DEV(SIERRA, TRUINSTALL),
 #undef USIE_DEV
 };
 
@@ -1522,8 +1523,9 @@ usie_hip_rsp(struct usie_softc *sc, uint8_t *rsp, uint32_t len)
 		DPRINTF("hip: len=%d msgID=%02x, param=%02x\n",
 		    be16toh(hip->len), hip->id, hip->param);
 
+		pad = (hip->id & USIE_HIP_PAD) ? 1 : 0;
+
 		if ((hip->id & USIE_HIP_MASK) == USIE_HIP_CNS2H) {
-			pad = (hip->id & USIE_HIP_PAD) ? 1 : 0;
 			cns = (struct usie_cns *)(((uint8_t *)(hip + 1)) + pad);
 
 			if (j < (sizeof(struct usie_cns) +
Re: Heavy I/O blocks FreeBSD box for several seconds
(OT, yes, but I'd like to take a stab at explaining "why" these things fall by the wayside..)

On 7 July 2011 12:08, Arnaud Lacombe wrote:

> What would be the point to even start looking at an issue? You guys
> (by "you", I mean "official" committers on public list) don't care

When someone who has an active interest takes ownership of the problem.

> about people providing patches, might it be for trivial, obvious,
> fixes. I'm not even talking about complex patches ... When you
> eventually end up providing a patch, you end up having a door slammed
> on you by maintainers asserting their code is perfect, until logic and
> user complaints prove them wrong.
>
> That said, this comment is off-topic, but I will certainly re-state
> this next month when I'll be ping'ing trivial patches.

The problem is that someone doesn't own the problem.

If I commit someone's fix to the tree without really understanding what's going on, I take ownership of that change and any issues/breakages/changes that it creates.

The people responsible for these areas are likely very busy with other things. It's not that they don't want to help! It's much more likely that they don't have the time.

Trivial patches aren't always so trivial. You can change the behaviour of something subtle which works great for you and not for others. This is very likely what's going on with IO/CPU scheduling. It's a tricky area. A simple fix isn't always that simple.

So if there's a diagnosed problem, with reproducible test cases and some patches which fix it, I suggest doing something like the following:

* Create a webpage, even if it's a wiki somewhere (even wiki.freebsd.org if you ask someone nicely).
* Dump all the information you can in there. Having stuff in emails is great - but it's only really helpful for tracking the 'flow' of a discussion. Having a summarised analysis of all of that on a webpage is much more helpful.
* Add the patches there.
* Encourage people who aren't in your immediate community to try them too - to try and find if your changes mess up other configurations somehow.
* Be persistent trying to get your changes in. If you've done the background research, done some wide-spread testing and shown you've not caused any obvious regressions, you're much more likely to get your changes in.

With all of that done, you can likely find a committer who will help you get your fixes into the tree. Please just try not to interpret a lack of response as a lack of interest. There's only so much time in the day and committers tend to be a busy bunch, with day jobs that may in no way reflect their FreeBSD interests.

Finally, if people do enough of the above and begin to take ownership of parts of the tree, you'll find someone will likely sponsor you for a commit bit.

HTH,

Adrian
Re: Heavy I/O blocks FreeBSD box for several seconds
On Jul 7, 2011, at 3:51 PM, Hartmann, O. wrote:
> This is quibbling. On heavy loads on network, disk et cetera, isn't there
> always and also a CPU bound load?

No. Properly written software blocks when waiting on network or disk I/O, and doesn't sit there spinning in a busy-wait consuming CPU until it actually gets more work to do. See select(2), kqueue(2), and friends.

Regards,
-- 
-Chuck
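To make the point about blocking concrete, here is a minimal sketch of waiting on a descriptor with select(2) rather than polling it in a loop. The helper name wait_readable and the millisecond timeout are assumptions made for illustration; they do not come from the thread or from any particular program discussed in it.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Sleep inside select(2) until fd becomes readable or timeout_ms elapses.
 * Returns 1 if fd is readable, 0 on timeout, -1 on error (including EINTR).
 * While waiting, the process is blocked in the kernel and uses no CPU.
 */
int
wait_readable(int fd, int timeout_ms)
{
	fd_set rfds;
	struct timeval tv;
	int n;

	FD_ZERO(&rfds);
	FD_SET(fd, &rfds);
	tv.tv_sec = timeout_ms / 1000;
	tv.tv_usec = (timeout_ms % 1000) * 1000;

	n = select(fd + 1, &rfds, NULL, NULL, &tv);
	if (n < 0)
		return (-1);
	return ((n > 0 && FD_ISSET(fd, &rfds)) ? 1 : 0);
}
```

A caller would typically loop on wait_readable() and only call read(2) once it returns 1; kqueue(2) offers the same blocking behaviour with a different interface.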
Re: Heavy I/O blocks FreeBSD box for several seconds
On 07/07/11 09:27, Andriy Gapon wrote:
> on 06/07/2011 21:11 Nathan Whitehorn said the following:
>> On 07/06/11 13:00, Steve Kargl wrote:
>>> AFAICT, it is a cpu affinity issue. If I launch n+1 MPI images
>>> on a system with n cpus/cores, then 2 (and sometimes 3) images
>>> are stuck on a cpu and those 2 (or 3) images ping-pong on that
>>> cpu. I recall trying to use renice(8) to force some load
>>> balancing, but vaguely remember that it did not help.
>>
>> I've seen exactly this problem with multi-threaded math libraries, as well.
>
> Exactly the same? Let's see.
>
>> Using parallel GotoBLAS on FreeBSD gives terrible performance because the
>> threads keep migrating between CPUs, causing frequent cache misses.
>
> So Steve reports that if he has Nthr > Ncpu, then some threads are "over-glued"
> to a particular CPU, which results in sub-optimal scheduling for those threads.
> I have to guess that Steve would want to see the threads being shuffled between
> CPUs to produce more even CPU load.
> On the other hand, you report that your threads keep being shuffled between CPUs
> (I presume for Nthr == Ncpu case, where Nthr is a count of the number-crunching
> threads). And I guess that you want them to stay glued to particular CPUs.
>
> So how is this the same problem? In fact, it sounds like somewhat opposite.
> The only thing in common is that you both don't like how ULE works.
>
> ULE has many knobs to tune its behavior. Unfortunately they are not very well
> documented and there are too many of them. So, it's not easy to find which
> combination would be the best for a particular work-load.
> In your particular case you might want to try to increase value of
> kern.sched.affinity to increase affinity of threads to their CPUs.

Not all of those using FreeBSD are developers or experts, or even experts in a very specific area of computer science and engineering or in a particular subject of the FreeBSD kernel and its scheduling techniques. I'm not capable of tuning my servers via a lot of undocumented knobs, I'm sorry. I'd like to do so if there were some kind of howto (handbook?).

> Also, please note that FreeBSD support in GotoBLAS is not equivalent to Linux
> support as I have pointed out before. On Linux they bind their threads to CPUs
> to avoid the situation that you describe. Apparently they didn't know how to do
> CPU-binding on FreeBSD, so this is not implemented. You may have a motivation
> to help them out with this.
Re: Heavy I/O blocks FreeBSD box for several seconds
On 07/07/11 09:04, Andriy Gapon wrote:
> on 07/07/2011 06:11 Steve Kargl said the following:
>> Unfortunately, I have neither the brain capacity and time nor the money to fix
>> the issue. To solve OP's problem in the short term, the simplest solution may be
>> to switch to 4BSD. Let's face it, ULE is not a silver bullet.
>
> I think that piling up different problems into a single discussion, even if they
> involve a common component (to a certain degree), is not going to help anybody.
> If I have read this thread correctly (and taking the subject line as a witness)
> the OP had a problem with heavy I/O activity screwing up interactivity.
> I think that it's not the same problem as sub-optimal performance of heavy
> CPU-bound load, which is what you reported if I am not mistaken.

This is quibbling. On heavy loads on network, disk et cetera, isn't there always and also a CPU bound load?

Whenever this problem came up, it was brought down by force. Yes, I reported it due to the obvious fact that this essential problem affects the usability of several workstations. It gets more obvious when FreeBSD is used with a GUI. But I also noticed, as I reported(!), problems on a headless server, even with several tunings applied gradually and with increasing values over time, as recommended here (for instance, kern.sched.preempt_thresh=224 or up to kern.sched.preempt_thresh=512, with little effect).

Where, if not here, should such problems be discussed? If we open a separate thread for each dedicated micro-problem, we would never gather that many problems which seem to be related to one single well known 'sweet spot'.
Re: cardbus panic: end address is not aligned
On Sunday, July 03, 2011 1:39:18 am Doug Barton wrote:
> I have 2 ath-based pc-card adapters. If I put either one of them in the
> slot while the system is up, or if I try booting with them in the slot,
> I get an instant panic. The cards previously worked in -current, and
> continue to work in 8-stable and windows xp. I don't have any other
> pc-cards to compare with. Full core.txt.0 file is in my home directory
> on freefall.
>
> This problem persists on r223732 but happened to me for the first time a
> week or 2 ago (haven't had time to report it previously, apologies). It
> likely originated a while before though, I don't use these cards very
> often.
>
> panic: end address is not aligned
>
> #1 0x80426a8a in kern_reboot (howto=260)
>     at /home/svn/head/sys/kern/kern_shutdown.c:430
> #2 0x80426521 in panic (fmt=Variable "fmt" is not available.
> )
>     at /home/svn/head/sys/kern/kern_shutdown.c:604
> #3 0x8032c648 in pcib_grow_window (sc=0xfe0002603400,
>     w=0xfe0002603498, type=3, start=0, end=4294967295, count=65536,
>     flags=Variable "flags" is not available.

The line is here:

	KASSERT((w->limit & ((1ul << w->step) - 1)) == (1ul << w->step) - 1,
	    ("end address is not aligned"));

Can you run kgdb and do 'frame 3' and 'p/x *w'? Also, can you boot your machine, then do 'sysctl debug.bootverbose=1', insert the card and record the messages in dmesg when it does? (You can likely get those out of kgdb.)

-- 
John Baldwin
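For readers unfamiliar with the expression in the KASSERT quoted above, the following sketch isolates the same bit test in a standalone function. The function name limit_is_aligned is made up for illustration, and the step value of 20 used in the usage note is an assumption (a 1 MB granule), not something stated in the message.

```c
#include <stdbool.h>

/*
 * True when the low `step` bits of `limit` are all ones, that is, when the
 * window ends on the last byte of a 2^step-sized granule. This is the same
 * comparison as the quoted KASSERT, with w->limit and w->step passed in.
 */
bool
limit_is_aligned(unsigned long limit, unsigned int step)
{
	unsigned long mask = (1ul << step) - 1;

	return ((limit & mask) == mask);
}
```

With step = 20 the mask is 0xfffff, so a limit of 0xdfffffff passes while 0xe0000000 fails; the 'p/x *w' output requested above would show which actual limit and step values tripped the check.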
Re: Heavy I/O blocks FreeBSD box for several seconds
On Thu, Jul 07, 2011 at 10:42:39PM +0300, Andriy Gapon wrote:
> on 07/07/2011 18:14 Steve Kargl said the following:
>>
>> I'm using OpenMPI. These are N > Ncpu processes not threads,
>
> I used 'thread' in a sense of a kernel thread. It shouldn't
> actually matter if it's a process or a thread in userland
> in this context.
>
>> and without
>> the loss of generality let N = Ncpu + 1. It is a classic master-slave
>> situation where 1 process initializes all others. The n-1 slave processes
>> are then independent of each other. After 20 minutes or so of number
>> crunching, each slave sends a few 10s of KB of data to the master. The
>> master collects all the data, writes it to disk, and then sends the
>> slaves the next set of computations to do. The computations are nearly
>> identical, so each slave finishes its task in the same amount of time. The
>> problem appears to be that 2 slaves are bound to the same cpu and the
>> remaining N - 3 slaves are bound to a specific cpu. The N - 3 slaves
>> finish their task, send data to the master, and then spin (chewing up
>> nearly 100% cpu) waiting for the 2 ping-ponging slaves to finish.
>> This causes a stall in the computation. When a complete computation
>> takes days to complete, these stalls become problematic. So, yes, I
>> want the processes to get a more uniform access to cpus via migration
>> to other cpus. This is what 4BSD appears to do.
>
> I would imagine that periodic rebalancing would take care of this,
> but probably the ULE rebalancing algorithm is not perfect.

:-)

> There was a suggestion on performance@ to try to use a lower value for
> kern.sched.steal_thresh, a value of 1 was recommended:
> http://article.gmane.org/gmane.os.freebsd.performance/3459

node16:kargl[215] uname -a
FreeBSD node16.cimu.org 9.0-CURRENT FreeBSD 9.0-CURRENT #2 r223824M: Thu Jul 7 11:12:15 PDT 2011
node16:kargl[216] sysctl -a | grep smp.cpu
kern.smp.cpus: 4

4BSD kernel gives for N = Ncpu.

33 processes: 5 running, 28 sleeping

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME     CPU COMMAND
 1387 kargl       1  67    0   370M   293M CPU1    1   1:31  98.34% sasmp
 1384 kargl       1  67    0   370M   293M CPU2    2   1:31  98.34% sasmp
 1386 kargl       1  67    0   370M   294M CPU3    3   1:30  98.34% sasmp
 1385 kargl       1  67    0   370M   294M RUN     0   1:31  98.29% sasmp

4BSD kernel gives for N = Ncpu + 1.

34 processes: 6 running, 28 sleeping

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME     CPU COMMAND
 1417 kargl       1  71    0   370M   294M RUN     0   1:30  79.39% sasmp
 1416 kargl       1  71    0   370M   294M RUN     0   1:30  79.20% sasmp
 1418 kargl       1  71    0   370M   294M CPU2    0   1:29  78.81% sasmp
 1420 kargl       1  71    0   370M   294M CPU1    2   1:30  78.27% sasmp
 1419 kargl       1  70    0   370M   294M CPU3    0   1:30  77.59% sasmp

Recompiling the kernel to use ULE instead of 4BSD with the exact same hardware and kernel configuration.

ULE kernel gives for N = Ncpu.

33 processes: 5 running, 28 sleeping

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME     CPU COMMAND
 1294 kargl       1 103    0   370M   294M CPU3    3   1:30 100.00% sasmp
 1292 kargl       1 103    0   370M   294M RUN     2   1:30 100.00% sasmp
 1295 kargl       1 103    0   370M   293M CPU0    0   1:30 100.00% sasmp
 1293 kargl       1 103    0   370M   294M CPU1    1   1:28 100.00% sasmp

ULE kernel gives for N = Ncpu + 1.

34 processes: 6 running, 28 sleeping

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME     CPU COMMAND
 1318 kargl       1 103    0   370M   294M CPU0    0   1:31 100.00% sasmp
 1319 kargl       1 103    0   370M   294M RUN     1   1:29 100.00% sasmp
 1322 kargl       1  99    0   370M   294M CPU2    2   1:03  87.26% sasmp
 1320 kargl       1  91    0   370M   294M RUN     3   1:07  60.79% sasmp
 1321 kargl       1  89    0   370M   294M CPU3    3   1:06  55.18% sasmp

node16:root[165] sysctl -w kern.sched.steal_thresh=1
kern.sched.steal_thresh: 2 -> 1

34 processes: 6 running, 28 sleeping

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
 1396 kargl       1 103    0   366M   291M CPU3    3   1:30 100.00% sasmp
 1397 kargl       1 103    0   366M   291M CPU2    2   1:30  99.17% sasmp
 1400 kargl       1  97    0   366M   291M CPU0    0   1:05  83.25% sasmp
 1399 kargl       1  94    0   366M   291M RUN     1   1:04  73.97% sasmp
 1398 kargl       1  98    0   366M   291M RUN     0   1:01  54.05% sasmp

-- 
Steve
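As a rough way to read the top(1) snapshots above, the sketch below computes the ideal per-process CPU share for N equal CPU-bound processes and the slowdown of a lock-step round in which every worker must finish before the next round starts. Both function names are invented here, and the model treats the instantaneous CPU percentages as steady-state shares, which is an assumption rather than something the measurements establish.

```c
/* Ideal CPU share, in percent, for nproc equal CPU-bound processes on ncpu CPUs. */
double
fair_share_percent(int ncpu, int nproc)
{
	if (nproc <= ncpu)
		return (100.0);
	return (100.0 * ncpu / nproc);
}

/*
 * Relative length of one compute round when all workers must finish before
 * the next round begins: the worker with the smallest share sets the pace.
 * A result of 1.0 means "as fast as on a dedicated CPU".
 */
double
round_slowdown(const double *share_percent, int n)
{
	double min = share_percent[0];
	int i;

	for (i = 1; i < n; i++)
		if (share_percent[i] < min)
			min = share_percent[i];
	return (100.0 / min);
}
```

For 4 CPUs and 5 processes the ideal share is 80%, which is close to what the 4BSD snapshot shows (77.59% to 79.39%). Under this model the 4BSD round takes about 1.29 times the dedicated-CPU time, while the ULE snapshot, whose slowest process sits at 55.18%, gives about 1.81.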
Re: Heavy I/O blocks FreeBSD box for several seconds
on 07/07/2011 18:14 Steve Kargl said the following:
> On Thu, Jul 07, 2011 at 10:27:53AM +0300, Andriy Gapon wrote:
>> on 06/07/2011 21:11 Nathan Whitehorn said the following:
>>> On 07/06/11 13:00, Steve Kargl wrote:
>>>> AFAICT, it is a cpu affinity issue. If I launch n+1 MPI images
>>>> on a system with n cpus/cores, then 2 (and sometimes 3) images
>>>> are stuck on a cpu and those 2 (or 3) images ping-pong on that
>>>> cpu. I recall trying to use renice(8) to force some load
>>>> balancing, but vaguely remember that it did not help.
>>>
>>> I've seen exactly this problem with multi-threaded math libraries, as well.
>>
>> Exactly the same? Let's see.
>>
>>> Using parallel GotoBLAS on FreeBSD gives terrible performance because the
>>> threads keep migrating between CPUs, causing frequent cache misses.
[*]-^^^
>> So Steve reports that if he has Nthr > Ncpu, then some threads are "over-glued"
>> to a particular CPU, which results in sub-optimal scheduling for those threads.
>> I have to guess that Steve would want to see the threads being shuffled between
>> CPUs to produce more even CPU load.
>
> I'm using OpenMPI. These are N > Ncpu processes not threads,

I used 'thread' in a sense of a kernel thread. It shouldn't actually matter if it's a process or a thread in userland in this context.

> and without
> the loss of generality let N = Ncpu + 1. It is a classic master-slave
> situation where 1 process initializes all others. The n-1 slave processes
> are then independent of each other. After 20 minutes or so of number
> crunching, each slave sends a few 10s of KB of data to the master. The
> master collects all the data, writes it to disk, and then sends the
> slaves the next set of computations to do. The computations are nearly
> identical, so each slave finishes its task in the same amount of time. The
> problem appears to be that 2 slaves are bound to the same cpu and the
> remaining N - 3 slaves are bound to a specific cpu. The N - 3 slaves
> finish their task, send data to the master, and then spin (chewing up
> nearly 100% cpu) waiting for the 2 ping-ponging slaves to finish.
> This causes a stall in the computation. When a complete computation
> takes days to complete, these stalls become problematic. So, yes, I
> want the processes to get a more uniform access to cpus via migration
> to other cpus. This is what 4BSD appears to do.

I would imagine that periodic rebalancing would take care of this, but probably the ULE rebalancing algorithm is not perfect.

There was a suggestion on performance@ to try to use a lower value for kern.sched.steal_thresh, a value of 1 was recommended:
http://article.gmane.org/gmane.os.freebsd.performance/3459

>> On the other hand, you report that your threads keep being shuffled between CPUs
>> (I presume for Nthr == Ncpu case, where Nthr is a count of the number-crunching
>> threads). And I guess that you want them to stay glued to particular CPUs.
>>
>> So how is this the same problem? In fact, it sounds like somewhat opposite.
>> The only thing in common is that you both don't like how ULE works.
>
> Well, it may be similar in that N - 2 threads are bound to N - 2
> cpus, and the remaining 2 threads are ping ponging on the last

It could be, but Nathan has never said this [*] and I also have never seen this in my very limited experiments with GotoBLAS.

> remaining cpu. I suspect that GotoBLAS has a large amount of
> communication between threads, and once again the computation
> stalls waiting for the 2 threads to either finish battling for the
> 1 cpu or perhaps the process uses pthread_yield() in some clever
> way to try to get load balancing.
>

-- 
Andriy Gapon
Re: Heavy I/O blocks FreeBSD box for several seconds
On Thu, Jul 7, 2011 at 5:14 PM, Steve Kargl < s...@troutmask.apl.washington.edu> wrote:
> On Thu, Jul 07, 2011 at 10:27:53AM +0300, Andriy Gapon wrote:
>> on 06/07/2011 21:11 Nathan Whitehorn said the following:
>>> On 07/06/11 13:00, Steve Kargl wrote:
>>>> AFAICT, it is a cpu affinity issue. If I launch n+1 MPI images
>>>> on a system with n cpus/cores, then 2 (and sometimes 3) images
>>>> are stuck on a cpu and those 2 (or 3) images ping-pong on that
>>>> cpu. I recall trying to use renice(8) to force some load
>>>> balancing, but vaguely remember that it did not help.
>>>
>>> I've seen exactly this problem with multi-threaded math libraries, as well.
>>
>> Exactly the same? Let's see.
>>
>>> Using parallel GotoBLAS on FreeBSD gives terrible performance because the
>>> threads keep migrating between CPUs, causing frequent cache misses.
>>
>> So Steve reports that if he has Nthr > Ncpu, then some threads are "over-glued"
>> to a particular CPU, which results in sub-optimal scheduling for those threads.
>> I have to guess that Steve would want to see the threads being shuffled between
>> CPUs to produce more even CPU load.
>
> I'm using OpenMPI. These are N > Ncpu processes not threads, and without
> the loss of generality let N = Ncpu + 1. It is a classic master-slave
> situation where 1 process initializes all others. The n-1 slave processes
> are then independent of each other. After 20 minutes or so of number
> crunching, each slave sends a few 10s of KB of data to the master. The
> master collects all the data, writes it to disk, and then sends the
> slaves the next set of computations to do. The computations are nearly
> identical, so each slave finishes its task in the same amount of time. The
> problem appears to be that 2 slaves are bound to the same cpu and the
> remaining N - 3 slaves are bound to a specific cpu. The N - 3 slaves
> finish their task, send data to the master, and then spin (chewing up
> nearly 100% cpu) waiting for the 2 ping-ponging slaves to finish.
> This causes a stall in the computation. When a complete computation
> takes days to complete, these stalls become problematic. So, yes, I
> want the processes to get a more uniform access to cpus via migration
> to other cpus. This is what 4BSD appears to do.
>

Spinning threads are a PITA for any scheduler, it's just that in your case 4BSD computes quantums differently. Is there any way to make the software sleep instead of spinning?

>> On the other hand, you report that your threads keep being shuffled between CPUs
>> (I presume for Nthr == Ncpu case, where Nthr is a count of the number-crunching
>> threads). And I guess that you want them to stay glued to particular CPUs.
>>
>> So how is this the same problem? In fact, it sounds like somewhat opposite.
>> The only thing in common is that you both don't like how ULE works.
>
> Well, it may be similar in that N - 2 threads are bound to N - 2
> cpus, and the remaining 2 threads are ping ponging on the last
> remaining cpu. I suspect that GotoBLAS has a large amount of
> communication between threads, and once again the computation
> stalls waiting for the 2 threads to either finish battling for the
> 1 cpu or perhaps the process uses pthread_yield() in some clever
> way to try to get load balancing.
>
> --
> Steve

-- 
Good, fast & cheap. Pick any two.
Re: named crashes on assertion in rbtdb.c on sparc64/SMP
On Thu, Jul 07, 2011 at 03:44:32PM +0400, KOT MATPOCKuH wrote:
> 2011/7/7 Marius Strobl :
> > On Thu, Jul 07, 2011 at 01:46:23PM +0400, KOT MATPOCKuH wrote:
> >> I updated the system to r223824 and got named patched to 9.6.-ESV-R4-P3,
> >> but the problem still exists:
> >> 07-Jul-2011 13:24:22.765 general:
> >> /usr/src/lib/bind/dns/../../../contrib/bind9/lib/dns/rbtdb.c:1622:
> >> REQUIRE(prev > 0) failed
> >> 07-Jul-2011 13:24:22.781 general: exiting (due to assertion failure)
> >>
> >> How can I find the root cause of the problem?
> > From your description it's unclear whether you've built BIND with or
> > without sparc64_isc_disable_atomic.diff. If it was built without that
> > patch please give it a try.
> As you can see, Doug has already included your patch in head:
> http://svnweb.freebsd.org/base/head/contrib/bind9/lib/isc/sparc64/include/isc/atomic.h?r1=222395&r2=223811
> And, of course, BIND was built with your patch...
>

That's not the patch I was referring to. I did a second one which just entirely disables the use of atomic operations on sparc64:
http://people.freebsd.org/~marius/sparc64_isc_disable_atomic.diff

Marius
Re: Heavy I/O blocks FreeBSD box for several seconds
On Thu, Jul 07, 2011 at 10:27:53AM +0300, Andriy Gapon wrote:
> on 06/07/2011 21:11 Nathan Whitehorn said the following:
> > On 07/06/11 13:00, Steve Kargl wrote:
> >> AFAICT, it is a cpu affinity issue. If I launch n+1 MPI images
> >> on a system with n cpus/cores, then 2 (and sometimes 3) images
> >> are stuck on a cpu and those 2 (or 3) images ping-pong on that
> >> cpu. I recall trying to use renice(8) to force some load
> >> balancing, but vaguely remember that it did not help.
> >
> > I've seen exactly this problem with multi-threaded math libraries, as well.
>
> Exactly the same? Let's see.
>
> > Using parallel GotoBLAS on FreeBSD gives terrible performance because the
> > threads keep migrating between CPUs, causing frequent cache misses.
>
> So Steve reports that if he has Nthr > Ncpu, then some threads are "over-glued"
> to a particular CPU, which results in sub-optimal scheduling for those threads.
> I have to guess that Steve would want to see the threads being shuffled between
> CPUs to produce more even CPU load.

I'm using OpenMPI. These are N > Ncpu processes not threads, and without the loss of generality let N = Ncpu + 1. It is a classic master-slave situation where 1 process initializes all others. The n-1 slave processes are then independent of each other. After 20 minutes or so of number crunching, each slave sends a few 10s of KB of data to the master. The master collects all the data, writes it to disk, and then sends the slaves the next set of computations to do. The computations are nearly identical, so each slave finishes its task in the same amount of time. The problem appears to be that 2 slaves are bound to the same cpu and the remaining N - 3 slaves are bound to a specific cpu. The N - 3 slaves finish their task, send data to the master, and then spin (chewing up nearly 100% cpu) waiting for the 2 ping-ponging slaves to finish. This causes a stall in the computation. When a complete computation takes days to complete, these stalls become problematic. So, yes, I want the processes to get a more uniform access to cpus via migration to other cpus. This is what 4BSD appears to do.

> On the other hand, you report that your threads keep being shuffled between CPUs
> (I presume for Nthr == Ncpu case, where Nthr is a count of the number-crunching
> threads). And I guess that you want them to stay glued to particular CPUs.
>
> So how is this the same problem? In fact, it sounds like somewhat opposite.
> The only thing in common is that you both don't like how ULE works.

Well, it may be similar in that N - 2 threads are bound to N - 2 cpus, and the remaining 2 threads are ping ponging on the last remaining cpu. I suspect that GotoBLAS has a large amount of communication between threads, and once again the computation stalls waiting for the 2 threads to either finish battling for the 1 cpu or perhaps the process uses pthread_yield() in some clever way to try to get load balancing.

-- 
Steve
Re: [CFT] Sierra Wireless HSPA+ USB modem
On Thursday 07 July 2011 14:43:22 PseudoCylon wrote:
> The compiler complained about uninitialized int
> if_usie.c: 1484
> - uint8_t pad;
> + uint8_t pad = 0;

I changed it so that pad is set in both cases:

	pad = (hip->id & USIE_HIP_PAD) ? 1 : 0;

	if ((hip->id & USIE_HIP_MASK) == USIE_HIP_CNS2H) {
		cns = (struct usie_cns *)(((uint8_t *)(hip + 1)) + pad);

--HPS
Re: [CFT] Sierra Wireless HSPA+ USB modem
On Thursday 07 July 2011 14:43:22 PseudoCylon wrote:
> On Wed, Jul 6, 2011 at 9:19 AM, Hans Petter Selasky wrote:
> >> Hi,
> >>
> >> I'm going to review and import your driver.
> >>
> >> --HPS
> >
> > Hi,
> >
> > The initial patch had some bad code and didn't compile on 9-current. I've
> > tried to clean it up. Please test and report back if I didn't break
> > anything.
> >
> > http://hselasky.homeunix.org:8192/usie_for_FreeBSD_9_current.patch
> >
> > --HPS
>
> Hello,
>
> Thanks for the patch.
>
> if_usie.c
> 241 if (usbd_lookup_id_by_uaa(usie_devs, sizeof(usie_devs), uaa) != 0)
> 242 return; /* no device match */
>
> It should return non-zero on success, but somehow this caused the
> process to exit, and modem stayed being a CD-ROM.

Hi,

Is this device changing its USB vendor and product ID ?

We need this ID check, else all mass storage devices will receive the eject command!

--HPS
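The reason for the ID check can be sketched with a small table lookup. Everything below is hypothetical: the struct, the ex_ function names and the vendor value 0x1111 are placeholders invented for illustration, and only the product value 0x0fff echoes the CD-ROM id mentioned elsewhere in this thread. The lookup follows the convention visible in the quoted snippet, where a non-zero result is treated as "no device match".

```c
#include <stddef.h>
#include <stdint.h>

struct ex_usb_id {
	uint16_t vendor;
	uint16_t product;
};

/* Placeholder IDs for illustration only; real values come from usbdevs. */
static const struct ex_usb_id ex_modem_ids[] = {
	{ 0x1111, 0x0001 },	/* hypothetical modem personality */
	{ 0x1111, 0x0fff },	/* hypothetical installer CD-ROM personality */
};

/* Returns 0 when (vendor, product) is in the table, non-zero otherwise. */
int
ex_lookup_id(const struct ex_usb_id *tbl, size_t n, uint16_t vendor,
    uint16_t product)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (tbl[i].vendor == vendor && tbl[i].product == product)
			return (0);
	return (1);
}

/* Returns 1 if an eject should be sent, 0 if the device must be left alone. */
int
ex_should_eject(uint16_t vendor, uint16_t product)
{
	size_t n = sizeof(ex_modem_ids) / sizeof(ex_modem_ids[0]);

	if (ex_lookup_id(ex_modem_ids, n, vendor, product) != 0)
		return (0);	/* no device match: not ours, do not touch */
	return (1);
}
```

Without the early return, an unrelated mass storage device that happens to share the product id would also be ejected, which is the failure mode the message warns about.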
Re: [CFT] Sierra Wireless HSPA+ USB modem
On Wed, Jul 6, 2011 at 9:19 AM, Hans Petter Selasky wrote:
>> Hi,
>>
>> I'm going to review and import your driver.
>>
>> --HPS
>
> Hi,
>
> The initial patch had some bad code and didn't compile on 9-current. I've tried
> to clean it up. Please test and report back if I didn't break anything.
>
> http://hselasky.homeunix.org:8192/usie_for_FreeBSD_9_current.patch
>
> --HPS
>

Hello,

Thanks for the patch.

if_usie.c
241 if (usbd_lookup_id_by_uaa(usie_devs, sizeof(usie_devs), uaa) != 0)
242 return; /* no device match */

It should return non-zero on success, but somehow this caused the process to exit, and the modem stayed a CD-ROM.

The compiler complained about an uninitialized int:

if_usie.c: 1484
- uint8_t pad;
+ uint8_t pad = 0;

Otherwise it worked fine.

AK
Re: Heavy I/O blocks FreeBSD box for several seconds
On 06/07/2011 20:11, Nathan Whitehorn wrote:
> I've seen exactly this problem with multi-threaded math libraries, as well.
> Using parallel GotoBLAS on FreeBSD gives terrible performance because the
> threads keep migrating between CPUs, causing frequent cache misses.

On both schedulers?
Re: Heavy I/O blocks FreeBSD box for several seconds
On 06/07/2011 19:05, Poul-Henning Kamp wrote:
> In message <20110706170132.ga68...@troutmask.apl.washington.edu>, Steve Kargl writes:
>
>> I periodically ran the same type test in the 2008 post over the last
>> three years. Nothing has changed. I even set up an account on one node
>> in my cluster for jeffr to use. He was too busy to investigate at that
>> time.
>
> Isn't this just the lemming-syncer hurling every dirty block over the
> cliff at the same time ?

Occasionally there have been reports of there being "something" (tm) which causes CPU-bound processes to stall / starve when heavy file system IO is present. I think I have also noticed this occasionally but it was never serious enough to pursue it - only X11 lagging. The problem is - all this is sporadic and thus anecdotal.

AFAIK, the "lemming-syncer" behaviour shouldn't stall anything if it's the only thing which is "wrong", right?

I know one issue which might seemingly stall all IO: since there is only one IO queue, if it is filled with requests which take a long time, all other IO is blocked; as an example: doing simultaneous writes on a slow USB flash stick and on a hard drive will soon result in the queue being filled with slow USB requests, which will by the nature of the queue "push out" fast disk requests, making the drive look very slow (this is most noticeable with large hirunningspace). But this doesn't seem to directly correlate with the OP's problem.

Maybe this particular problem can be tested by having two drives - one to provoke this kind of stalling, and one to test if any IO can be done on it while the stall happens on the first one.
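The single-queue effect described above can be illustrated with a toy model. This is only a sketch under the assumption of one strict FIFO with no reordering; it does not model the actual FreeBSD buffer or GEOM layers, and `fifo_wait_ms()` and all timings are hypothetical.

```c
#include <stddef.h>

/*
 * Toy model: one FIFO queue, requests served strictly in arrival order.
 * Returns how long (in ms) the request at position idx waits before its
 * own service starts, i.e. the sum of the service times queued ahead of it.
 */
static long
fifo_wait_ms(const long *service_ms, size_t n, size_t idx)
{
    long wait = 0;

    for (size_t i = 0; i < idx && i < n; i++)
        wait += service_ms[i];
    return wait;
}
```

With eight made-up 200 ms USB writes queued ahead of a single 5 ms disk write, `fifo_wait_ms()` reports a 1600 ms wait for the disk request, although the disk itself is idle. With a separate queue per device the same request would not wait at all, which is what the two-drive test proposed above would help distinguish.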
Re: named crashes on assertion in rbtdb.c on sparc64/SMP
2011/7/7 Marius Strobl:
> On Thu, Jul 07, 2011 at 01:46:23PM +0400, KOT MATPOCKuH wrote:
>> I updated system to r223824 and got named patched to 9.6.-ESV-R4-P3,
>> but the problem still exists:
>> 07-Jul-2011 13:24:22.765 general:
>> /usr/src/lib/bind/dns/../../../contrib/bind9/lib/dns/rbtdb.c:1622:
>> REQUIRE(prev > 0) failed
>> 07-Jul-2011 13:24:22.781 general: exiting (due to assertion failure)
>>
>> How can I find root cause of the problem?
> From your description it's unclear whether you've built BIND with or
> without sparc64_isc_disable_atomic.diff. If it was built without that
> patch please give it a try.

As you can see, Doug has already included your patch in head:
http://svnweb.freebsd.org/base/head/contrib/bind9/lib/isc/sparc64/include/isc/atomic.h?r1=222395&r2=223811
And, of course, BIND was built with your patch...

> If you had applied it then this apparently
> is a generic bug in BIND and unrelated to the MD atomic implementation
> and I don't know how to proceed in order to get that fixed. Hopefully
> Doug can help you in that case.

Okay, I look forward to guidance from Doug...

--
MATPOCKuH
Re: ZFS boot fails with two pools
On Thu, 07 Jul 2011 13:19:50 +0300 Volodymyr Kostyrko wrote:
> >> You can boot from any of the drives and as long as the BIOS can see
> >> enough drives you should be able to boot.
> >
> > In my case, the BIOS certainly can not see all members of the
> > raid-z pool. The question is: why does it want to boot from raid-z
> > at all, and how could it be persuaded to use the mirrored pool
> > instead?
>
> Actually I think that the code at that stage just tries to boot from the
> pool on the current disk.
>

Does your /boot/loader.conf have:
vfs.root.mountfrom="zfs:tank/root"
zfs_load="YES"

Does the FreeBSD bootloader actually start to load? (Do you get to the FreeBSD boot menu?)

Does your system pool use compression? Which compression?

--
Aldis Berjoza
http://www.bsdroot.lv/
Re: Jails: Setting different times in jails
> possibly achievable in libc?

I don't know. Where else would it be done? stat, utimes, gettimeofday, clock_gettime, adjtime, etc and their variations. I've not checked what currently happens, but I don't think root in a jail should be able to set any kernel time parameters, absent a syscall that says it should.

> in any case file this idea somewhere.. :-)

Don't know here either. I looked at the lists and hackers seemed closest. I'll bcc current. Someone could maybe todo-wiki this thread as low hanging fruit.

Cheers.
Re: Heavy I/O blocks FreeBSD box for several seconds
Steve Kargl wrote:
> Let's face, ULE is not a silver bullet.

Or perhaps it is, but this particular problem is so heavily armored as to demand depleted uranium :)
Re: ZFS boot fails with two pools
07.07.2011 09:22, Berczi Gabor wrote:
> On Jul 6, 2011, at 10:08 PM, Volodymyr Kostyrko wrote:
>
>> 1. Check that pools have up-to-date boot code.
>
> I tried 8.2 and HEAD. You mean gpart+gptzfsboot+pmbr, right?

Yep.

>> 2. Try to convince bios to boot from the disk of pool2.
>
> There is no disk with a singular ZFS pool.

Any disk from bootable pool.

>> 3. You can possibly try deploying /boot/boot0 MBR selector code over disks
>> of data pool. Supplied boot0 code can be used to choose another disk to
>> jump to it during boot process and will remember the last choice.
>
> I'm not really sure how to do this with GPT. Should I use boot0 instead of
> pmbr?

boot0cfg is your old friend

> However, this (http://freebsd.1045724.n5.nabble.com/Booting-from-ZFS-raidz-td4032461.html)
> may be related to the problem:

That one is too old, I have one machine running 8.2 on raidz2 pool.

>> You can boot from any of the drives and as long as the BIOS can see
>> enough drives you should be able to boot.
>
> In my case, the BIOS certainly can not see all members of the raid-z pool.
> The question is: why does it want to boot from raid-z at all, and how could
> it be persuaded to use the mirrored pool instead?

Actually I think that the code at that stage just tries to boot from the pool on the current disk.

--
Sphinx of black quartz judge my vow.
Re: named crashes on assertion in rbtdb.c on sparc64/SMP
On Thu, Jul 07, 2011 at 01:46:23PM +0400, KOT MATPOCKuH wrote:
> I updated system to r223824 and got named patched to 9.6.-ESV-R4-P3,
> but the problem still exists:
> 07-Jul-2011 13:24:22.765 general:
> /usr/src/lib/bind/dns/../../../contrib/bind9/lib/dns/rbtdb.c:1622:
> REQUIRE(prev > 0) failed
> 07-Jul-2011 13:24:22.781 general: exiting (due to assertion failure)
>
> How can I find root cause of the problem?
>

From your description it's unclear whether you've built BIND with or without sparc64_isc_disable_atomic.diff. If it was built without that patch please give it a try. If you had applied it then this apparently is a generic bug in BIND and unrelated to the MD atomic implementation and I don't know how to proceed in order to get that fixed. Hopefully Doug can help you in that case.

Marius
Re: named crashes on assertion in rbtdb.c on sparc64/SMP
I updated system to r223824 and got named patched to 9.6.-ESV-R4-P3, but the problem still exists:

07-Jul-2011 13:24:22.765 general: /usr/src/lib/bind/dns/../../../contrib/bind9/lib/dns/rbtdb.c:1622: REQUIRE(prev > 0) failed
07-Jul-2011 13:24:22.781 general: exiting (due to assertion failure)

How can I find the root cause of the problem?

--
MATPOCKuH
Re: Heavy I/O blocks FreeBSD box for several seconds
on 06/07/2011 21:11 Nathan Whitehorn said the following:
> On 07/06/11 13:00, Steve Kargl wrote:
>> AFAICT, it is a cpu affinity issue. If I launch n+1 MPI images
>> on a system with n cpus/cores, then 2 (and sometimes 3) images
>> are stuck on a cpu and those 2 (or 3) images ping-pong on that
>> cpu. I recall trying to use renice(8) to force some load
>> balancing, but vaguely remember that it did not help.
>
> I've seen exactly this problem with multi-threaded math libraries, as well.

Exactly the same? Let's see.

> Using parallel GotoBLAS on FreeBSD gives terrible performance because the
> threads keep migrating between CPUs, causing frequent cache misses.

So Steve reports that if he has Nthr > Ncpu, then some threads are "over-glued" to a particular CPU, which results in sub-optimal scheduling for those threads. I have to guess that Steve would want to see the threads being shuffled between CPUs to produce more even CPU load.

On the other hand, you report that your threads keep being shuffled between CPUs (I presume for Nthr == Ncpu case, where Nthr is a count of the number-crunching threads). And I guess that you want them to stay glued to particular CPUs.

So how is this the same problem? In fact, it sounds like somewhat opposite. The only thing in common is that you both don't like how ULE works.

ULE has many knobs to tune its behavior. Unfortunately they are not very well documented and there are too many of them. So, it's not easy to find which combination would be the best for a particular work-load. In your particular case you might want to try to increase value of kern.sched.affinity to increase affinity of threads to their CPUs.

Also, please note that FreeBSD support in GotoBLAS is not equivalent to Linux support as I have pointed out before. On Linux they bind their threads to CPUs to avoid the situation that you describe. Apparently they didn't know how to do CPU-binding on FreeBSD, so this is not implemented. You may have a motivation to help them out with this.

--
Andriy Gapon
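For reference, FreeBSD exposes thread binding through cpuset_setaffinity(2). The following is a minimal sketch of pinning the calling thread to one CPU; it is not GotoBLAS code, the helper name `bind_current_thread_to_cpu()` is hypothetical, and on any other platform the sketch deliberately does nothing and reports ENOSYS.

```c
#include <errno.h>

#ifdef __FreeBSD__
#include <sys/param.h>
#include <sys/cpuset.h>
#endif

/*
 * Pin the calling thread to a single CPU.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int
bind_current_thread_to_cpu(int cpu)
{
#ifdef __FreeBSD__
    cpuset_t mask;

    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    /* An id of -1 with CPU_WHICH_TID means "the current thread". */
    return cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_TID, -1,
        sizeof(mask), &mask);
#else
    (void)cpu;
    errno = ENOSYS; /* this sketch only covers FreeBSD */
    return -1;
#endif
}
```

A number-crunching library would typically call such a helper once from each worker thread, with a distinct CPU index per thread, so that the scheduler no longer migrates the workers.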
Re: Heavy I/O blocks FreeBSD box for several seconds
on 06/07/2011 21:00 Steve Kargl said the following:
> On Wed, Jul 06, 2011 at 05:05:41PM +, Poul-Henning Kamp wrote:
>> In message <20110706170132.ga68...@troutmask.apl.washington.edu>, Steve
>> Kargl writes:
>>
>>> I periodically ran the same type test in the 2008 post over the
>>> last three years. Nothing has changed. I even set up an account
>>> on one node in my cluster for jeffr to use. He was too busy to
>>> investigate at that time.
>>
>> Isn't this just the lemming-syncer hurling every dirty block over
>> the cliff at the same time ?
>
> I don't know the answer. Of course, having no experience in
> processing scheduling, I don't understand the question either ;-)

I think that Poul-Henning was speaking in the vein of the subject line where I/O is somehow involved. I admit I would also love to hear more details in more technical terms (without lemmings and cliffs) :-)

> AFAICT, it is a cpu affinity issue. If I launch n+1 MPI images
> on a system with n cpus/cores, then 2 (and sometimes 3) images
> are stuck on a cpu and those 2 (or 3) images ping-pong on that
> cpu. I recall trying to use renice(8) to force some load
> balancing, but vaguely remember that it did not help.

Your issue seems to be about a specific case of purely CPU-bound loads. It is very relevant to ULE, but perhaps not to this particular thread.

>> To find out: Run gstat and keep an eye on the leftmost column
>>
>> The road map for fixing that has been known for years...

I would love to hear more about this. A link to a past discussion, if any, would suffice.

--
Andriy Gapon
Re: Heavy I/O blocks FreeBSD box for several seconds
on 07/07/2011 06:11 Steve Kargl said the following:
> Unfortunately, I have neither the brain capacity and time nor
> the money to fix the issue. To solve OP's problem in the
> short, the simplest solution may be to switch to 4BSD. Let's
> face, ULE is not a silver bullet.

I think that piling up different problems into a single discussion, even if they involve a common component (to a certain degree), is not going to help anybody.

If I have read this thread correctly (and taking the subject line as a witness) the OP had a problem with heavy I/O activity screwing up interactivity. I think that it's not the same problem as sub-optimal performance of heavy CPU-bound load, which is what you reported if I am not mistaken.

--
Andriy Gapon