Re: LK_SHARED/LK_DOWNGRADE adjustments to lock.9 manual page
On Thu, Nov 29, 2012 at 12:05 PM, Andriy Gapon wrote:
> on 16/11/2012 16:42 Andriy Gapon said the following:
>> on 15/11/2012 23:44 Attilio Rao said the following:
>>> Do you think you can test this patch?:
>>> http://www.freebsd.org/~attilio/lockmgr_forcerec.patch
>>
>> I will use this patch in my tree, but I think that it is effectively already
>> quite well tested by using INVARIANTS+WITNESS.
>
> I've been using this patch in both debug and non-debug environments and I
> have not run into any issues. Please commit when you get a chance.
> Thank you.

Committed as r243900, please proceed with manpage cleanup.

Thanks,
Attilio

--
Peace can only be achieved by understanding - A. Einstein
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: stop_cpus_hard when multiple CPUs are panicking from an NMI
On Sun, Nov 25, 2012 at 7:20 PM, Andriy Gapon wrote:
> on 25/11/2012 20:05 Attilio Rao said the following:
>> If you really want to do something like that please rename
>> s/generic_stop_cpus/generic_stop_butself() or use a similar convention,
>> and I may not be opposed to it.
>
> As we discussed before, anything else besides "all but self" does not make
> sense. So I am not sure what's the point of being that verbose in the
> naming.

To avoid a POLA violation and to show that there is a change in the KPI?
I'd personally prefer a new name, but I'm open to not getting it.

Attilio

--
Peace can only be achieved by understanding - A. Einstein
Re: stop_cpus_hard when multiple CPUs are panicking from an NMI
On Sun, Nov 25, 2012 at 3:39 PM, Andriy Gapon wrote:
> on 25/11/2012 16:01 Attilio Rao said the following:
>> On Sun, Nov 25, 2012 at 12:55 PM, Andriy Gapon wrote:
>>> on 25/11/2012 14:29 Attilio Rao said the following:
>>>> I think the patch you propose makes such effects even worse, because
>>>> it disables interrupts in generic_stop_cpus().
>>>> What I suggest to do is the following:
>>>> - The CPU which wins the race for generic_stop_cpus() also signals the
>>>> CPUs it is willing to stop on a global mask
>>>> - Another CPU entering generic_stop_cpus() and losing the race checks
>>>> the mask of CPUs which might be stopped and stops itself if necessary
>>>> (i.e. not yet done). We must be careful with races here, but I'm
>>>> confident this can be done easily enough.
>>>
>>> I think that you either misunderstood my patch or I misunderstand your
>>> suggestion, because my patch does exactly what you wrote above.
>>
>> The patch is somewhat incomplete:
>> - I don't think that we need specific checks in cpustop_handler() (and
>> if you have added them to prevent races, I don't think they are enough,
>> see below)
>
> The check is to cover the following scenario:
> - two CPUs race in a hard-stop scenario, and both are in NMI contexts

Please also consider another, similar race: two CPUs race in a (normal) stop
scenario and the loser has interrupts disabled. I don't think this will be
completely fatal, but I'm sure it can lead to issues (double stop, etc.).

> - one CPU wins and the other spins right in generic_stop_cpus()
> - the NMI for the spinning CPU is blocked and remains pending
> - the winning CPU releases all other CPUs
> - the spinning CPU exits the NMI context and gets the NMI which was pending
>
>> - setting of the "stopping_cpus" map must happen atomically/before the
>> stopper_cpu cpuid setting, otherwise some CPUs may end up using a NULL
>> mask in the check
>
> Not NULL, but empty or stale. But a very good point, I agree.
> The logic must be redone.
>
>> - Did you consider the races when a stop and a restart request happen
>> just after the CPU_ISSET() check? I think CPUs can deadlock there.
>
> Yeah, good point again. This seems to be a different side of the issue
> above. stopping_cpus is probably a bad idea.
>
>> - I'm very dubious about the spinlock_enter() stuff; I think it can just
>> make the problem worse atm.
>
> Well, this is where I disagree. I think that cpu_stop must already be
> called in a context which effectively disables pre-emption and interrupts.
> The added spinlock_enter() stuff is kind of a safety belt to make things
> more explicit, but it could be changed into some sort of an assert if
> that's possible.

Maybe it can become really safe once we take care of all the races involved
and reported above. I'll suspend my judgement until the other races are
taken care of.

>> However you are right, the concept of your patch is the same one I really
>> wanted to get; we maybe just need to lift it up a bit.
>
> I agree.
>
> To add some gas to the fire: do you recall my wish to drop the mask
> parameter from the stop calls? If that was done then it would be simpler
> to handle these things. In that case only the "stopper_cpu" ID
> (master/winner) would be needed.

If you really want to do something like that please rename
s/generic_stop_cpus/generic_stop_butself() or use a similar convention, and
I may not be opposed to it.

Thanks,
Attilio

--
Peace can only be achieved by understanding - A. Einstein
Re: [RFQ] make witness panic an option
On Sun, Nov 25, 2012 at 1:12 PM, Pawel Jakub Dawidek wrote:
> On Sun, Nov 25, 2012 at 12:42:16PM +0000, Attilio Rao wrote:
>> On Sun, Nov 25, 2012 at 12:39 PM, Pawel Jakub Dawidek wrote:
>>> WITNESS is a development tool. We don't ship production kernels with
>>> WITNESS even compiled in. What is the more efficient use of developer
>>> time: going through a full reboot cycle every time, or reading the
>>> warning from the console, unloading a module, fixing the bug and
>>> loading it again?
>>>
>>> And if this option is turned off by default, what is the problem?
>>
>> Yes, so, why do you write here?
>
> I'm trying to understand why you object. Until now the only concern of
> yours that I found is that you are afraid of it being abused. I don't see
> how this can be abused if it is turned off by default. If someone commits
> a change that turns it on by default, believe me, I'll unleash hell
> personally.
>
> As I said, WITNESS is a development tool, a very handy one. This doesn't
> mean we can't make it even more handy. It is there to help find bugs
> faster, right? Adrian is proposing a change that will make it help to
> find and fix bugs maybe even faster.
>
>> Go ahead and fix BLESSED, make it the default, etc.
>
> This is another story, but BLESSED is much less controversial to me.
> It is turned off by default on the assumption that all the code that runs
> in our kernel is developed for FreeBSD, which is not true. For example
> ZFS is, I think, the biggest locking consumer in our kernel (around 120
> locks), which wasn't originally developed for FreeBSD and whose locking
> order was verified using different tools. Now on FreeBSD it triggers
> massive LOR warnings from WITNESS, even though those are not bugs. At
> some point I verified many of them and they were all false positives, so
> I simply turned off WITNESS warnings for ZFS locks. Why? Because BLESSED
> is turned off in fear of abuse, and this in turn is the cause of the
> mentioned hack in ZFS.

Just a few notes about this: to my knowledge you never discussed this
publicly. WITNESS is not a perfect tool and it has several issues (I tried
to summarize them a bit at the beginning of this thread); the biggest
problem is that its file/line approach doesn't help when you want to shut
down a specific LOR without shutting down all of the LORs involving the
lock pair, etc.
I have no idea what problems you are facing here with WITNESS and ZFS, but
if you could summarize them we can work on getting something which is
useful also with ZFS.

Attilio

--
Peace can only be achieved by understanding - A. Einstein
Re: [RFQ] make witness panic an option
On Sun, Nov 25, 2012 at 2:06 PM, Pawel Jakub Dawidek wrote:
> On Sun, Nov 25, 2012 at 01:48:23PM +0000, Attilio Rao wrote:
>> On Sun, Nov 25, 2012 at 1:47 PM, Pawel Jakub Dawidek wrote:
>>> On Sun, Nov 25, 2012 at 01:37:19PM +0000, Attilio Rao wrote:
>>>> On Sun, Nov 25, 2012 at 1:12 PM, Pawel Jakub Dawidek wrote:
>>>>> On Sun, Nov 25, 2012 at 12:42:16PM +0000, Attilio Rao wrote:
>>>>>> On Sun, Nov 25, 2012 at 12:39 PM, Pawel Jakub Dawidek wrote:
>>>>>>> WITNESS is a development tool. We don't ship production kernels
>>>>>>> with WITNESS even compiled in. What is the more efficient use of
>>>>>>> developer time: going through a full reboot cycle every time, or
>>>>>>> reading the warning from the console, unloading a module, fixing
>>>>>>> the bug and loading it again?
>>>>>>>
>>>>>>> And if this option is turned off by default, what is the problem?
>>>>>>
>>>>>> Yes, so, why do you write here?
>>>>>
>>>>> I'm trying to understand why you object. Until now the only concern
>>>>> of yours that I found is that you are afraid of it being abused. I
>>>>> don't see how this can be abused if it is turned off by default. If
>>>>> someone commits a change that turns it on by default, believe me,
>>>>> I'll unleash hell personally.
>>>>
>>>> So I don't understand what you are proposing.
>>>> You are not proposing to switch BLESSING on and you are not proposing
>>>> to import Adrian's patches, if I get it correctly. I don't understand
>>>> then.
>>>
>>> I propose to get Adrian's patches in, just leave the current behaviour
>>> as the default.
>>
>> So if I tell you that I'm afraid this mechanism will be abused (and
>> believe me, I really wanted to trim out the BLESSING stuff for the same
>> reason) and you say "you can't see how", there is not much we can
>> discuss.
>
> This is not what I said. I would see it as abuse if someone suddenly
> decided to turn off locking assertions by default in the FreeBSD base.
>
> If he turns that off on his private machine, be it to speed up his
> development (a good thing) or to shut up an important lock assertion (a
> bad thing), this is entirely his decision. He can already do that, having
> all the source code; it's just more complex. Make tools, not policies.
>
> BLESSING is a totally different subject. You were afraid that people
> would start to silence LORs they don't understand by committing blessed
> pairs to the FreeBSD base. That situation would be abuse and I fully
> agree, but I also still think BLESSING is useful, although I recognize it
> might be hard to prevent the mentioned abuse.
>
> In the case of Adrian's patch nothing will change in how we enforce
> locking assertions in the FreeBSD base.
>
>> You know how I think; there is no need to wait for me to reconsider,
>> because I don't believe this will happen with arguments like "I don't
>> think", "I don't agree", etc.
>
> I provide valid arguments with, I hope, proper explanation; you choose
> not to address them or to ignore them, and I hope this will change :)

I'm not ignoring them, I'm saying that your arguments are not convincing
enough to me. And really, the possibility to turn off assertions in witness
is already a dangerous tool I want to avoid (not only related to BLESSING).
If there are some cases that deserve a panic, we should just get it, no
matter how the sysctls are set up.

However it seems to me I've just been saying the same thing for 20 e-mails;
please drop me from CC in your next follow-up. As I said, you can commit
all the changes you want (assuming they are technically correct), even if I
would appreciate my disagreement being expressed in the commit message.

Attilio

--
Peace can only be achieved by understanding - A. Einstein
Re: stop_cpus_hard when multiple CPUs are panicking from an NMI
On Sun, Nov 25, 2012 at 12:55 PM, Andriy Gapon wrote:
> on 25/11/2012 14:29 Attilio Rao said the following:
>> I think the patch you propose makes such effects even worse, because
>> it disables interrupts in generic_stop_cpus().
>> What I suggest to do is the following:
>> - The CPU which wins the race for generic_stop_cpus() also signals the
>> CPUs it is willing to stop on a global mask
>> - Another CPU entering generic_stop_cpus() and losing the race checks
>> the mask of CPUs which might be stopped and stops itself if necessary
>> (i.e. not yet done). We must be careful with races here, but I'm
>> confident this can be done easily enough.
>
> I think that you either misunderstood my patch or I misunderstand your
> suggestion, because my patch does exactly what you wrote above.

The patch is somewhat incomplete:
- I don't think that we need specific checks in cpustop_handler() (and if
you have added them to prevent races, I don't think they are enough, see
below)
- setting of the "stopping_cpus" map must happen atomically/before the
stopper_cpu cpuid setting, otherwise some CPUs may end up using a NULL mask
in the check
- Did you consider the races when a stop and a restart request happen just
after the CPU_ISSET() check? I think CPUs can deadlock there.
- I'm very dubious about the spinlock_enter() stuff; I think it can just
make the problem worse atm.

However you are right, the concept of your patch is the same one I really
wanted to get; we maybe just need to lift it up a bit.

In the meanwhile I also double-checked suspended_cpus and I don't think
there are real showstoppers to having it in the stopped_cpus map.

Thanks,
Attilio

--
Peace can only be achieved by understanding - A. Einstein
Re: [RFQ] make witness panic an option
On Sun, Nov 25, 2012 at 1:47 PM, Pawel Jakub Dawidek wrote:
> On Sun, Nov 25, 2012 at 01:37:19PM +0000, Attilio Rao wrote:
>> On Sun, Nov 25, 2012 at 1:12 PM, Pawel Jakub Dawidek wrote:
>>> On Sun, Nov 25, 2012 at 12:42:16PM +0000, Attilio Rao wrote:
>>>> On Sun, Nov 25, 2012 at 12:39 PM, Pawel Jakub Dawidek wrote:
>>>>> WITNESS is a development tool. We don't ship production kernels with
>>>>> WITNESS even compiled in. What is the more efficient use of developer
>>>>> time: going through a full reboot cycle every time, or reading the
>>>>> warning from the console, unloading a module, fixing the bug and
>>>>> loading it again?
>>>>>
>>>>> And if this option is turned off by default, what is the problem?
>>>>
>>>> Yes, so, why do you write here?
>>>
>>> I'm trying to understand why you object. Until now the only concern of
>>> yours that I found is that you are afraid of it being abused. I don't
>>> see how this can be abused if it is turned off by default. If someone
>>> commits a change that turns it on by default, believe me, I'll unleash
>>> hell personally.
>>
>> So I don't understand what you are proposing.
>> You are not proposing to switch BLESSING on and you are not proposing
>> to import Adrian's patches, if I get it correctly. I don't understand
>> then.
>
> I propose to get Adrian's patches in, just leave the current behaviour as
> the default.

So if I tell you that I'm afraid this mechanism will be abused (and believe
me, I really wanted to trim out the BLESSING stuff for the same reason) and
you say "you can't see how", there is not much we can discuss.

You know how I think; there is no need to wait for me to reconsider,
because I don't believe this will happen with arguments like "I don't
think", "I don't agree", etc.

Attilio

--
Peace can only be achieved by understanding - A. Einstein
Re: [RFQ] make witness panic an option
On Sun, Nov 25, 2012 at 1:12 PM, Pawel Jakub Dawidek wrote:
> On Sun, Nov 25, 2012 at 12:42:16PM +0000, Attilio Rao wrote:
>> On Sun, Nov 25, 2012 at 12:39 PM, Pawel Jakub Dawidek wrote:
>>> WITNESS is a development tool. We don't ship production kernels with
>>> WITNESS even compiled in. What is the more efficient use of developer
>>> time: going through a full reboot cycle every time, or reading the
>>> warning from the console, unloading a module, fixing the bug and
>>> loading it again?
>>>
>>> And if this option is turned off by default, what is the problem?
>>
>> Yes, so, why do you write here?
>
> I'm trying to understand why you object. Until now the only concern of
> yours that I found is that you are afraid of it being abused. I don't see
> how this can be abused if it is turned off by default. If someone commits
> a change that turns it on by default, believe me, I'll unleash hell
> personally.

So I don't understand what you are proposing.
You are not proposing to switch BLESSING on and you are not proposing to
import Adrian's patches, if I get it correctly. I don't understand then.

Attilio

--
Peace can only be achieved by understanding - A. Einstein
Re: [RFQ] make witness panic an option
On Sun, Nov 25, 2012 at 12:39 PM, Pawel Jakub Dawidek wrote:
> On Thu, Nov 15, 2012 at 04:39:55PM +0000, Attilio Rao wrote:
>> On 11/15/12, Adrian Chadd wrote:
>>> On 15 November 2012 05:27, Giovanni Trematerra wrote:
>>>
>>>> I really do think that is a very bad idea.
>>>> When a locking assertion fails you just have to stop and think about
>>>> what's wrong; there is no way to postpone this.
>>>
>>> Not all witness panics are actually fatal. A developer who is
>>> sufficiently cluey in their area is quite likely able to just stare at
>>> the code paths for a while to figure out why the incorrectness
>>> occurred.
>>
>> The problem is that such a mechanism can be abused, just like the
>> BLESSING one, and that's why this is disabled by default.
>
> WITNESS is a development tool. We don't ship production kernels with
> WITNESS even compiled in. What is the more efficient use of developer
> time: going through a full reboot cycle every time, or reading the
> warning from the console, unloading a module, fixing the bug and loading
> it again?
>
> And if this option is turned off by default, what is the problem?

Yes, so, why do you write here?

Go ahead and fix BLESSED, make it the default, etc. I have had enough of
this thread (not referring to you particularly, but to the people who
contributed to this and other threads) and its inability to respect
others' opinions. As I said, I cannot forbid you guys from doing anything;
just go ahead, write the code and commit it, albeit completely bypassing
other people's opinions.

Attilio

--
Peace can only be achieved by understanding - A. Einstein
Re: stop_cpus_hard when multiple CPUs are panicking from an NMI
On Sun, Nov 25, 2012 at 12:29 PM, Attilio Rao wrote:
> On Fri, Nov 16, 2012 at 2:47 PM, Andriy Gapon wrote:
>> on 16/11/2012 16:41 Attilio Rao said the following:
>>> On Fri, Nov 16, 2012 at 1:18 PM, Andriy Gapon wrote:
>>>> on 16/11/2012 14:30 Attilio Rao said the following:
>>>>> On Fri, Nov 16, 2012 at 7:54 AM, Andriy Gapon wrote:
>>>>>> on 16/11/2012 00:58 Ryan Stone said the following:
>>>>>>> At work we have some custom watchdog hardware that sends an NMI
>>>>>>> upon expiry. We've modified the kernel to panic when it receives
>>>>>>> the watchdog NMI. I've been trying the "stop_scheduler_on_panic"
>>>>>>> mode, and I've discovered that when my watchdog expires, the
>>>>>>> system gets completely wedged. After some digging, I've discovered
>>>>>>> that I have multiple CPUs getting the watchdog NMI and trying to
>>>>>>> panic concurrently. One of the CPUs wins, and the rest spin
>>>>>>> forever in this code:
>>>>>>>
>>>>>>> /*
>>>>>>>  * We don't want multiple CPU's to panic at the same time, so we
>>>>>>>  * use panic_cpu as a simple spinlock.  We have to keep checking
>>>>>>>  * panic_cpu if we are spinning in case the panic on the first
>>>>>>>  * CPU is canceled.
>>>>>>>  */
>>>>>>> if (panic_cpu != PCPU_GET(cpuid))
>>>>>>>         while (atomic_cmpset_int(&panic_cpu, NOCPU,
>>>>>>>             PCPU_GET(cpuid)) == 0)
>>>>>>>                 while (panic_cpu != NOCPU)
>>>>>>>                         ; /* nothing */
>>>>>>>
>>>>>>> The system wedges when stop_cpus_hard() is called, which sends
>>>>>>> NMIs to all of the other CPUs and waits for them to acknowledge
>>>>>>> that they are stopped before returning. However the CPU will not
>>>>>>> deliver an NMI to a CPU that is already handling an NMI, so the
>>>>>>> other CPUs that got a watchdog NMI and are spinning will never go
>>>>>>> into the NMI handler and acknowledge that they are stopped.
>>>>>>
>>>>>> I thought about this issue and fixed it (in my tree) in a different
>>>>>> way: http://people.freebsd.org/~avg/cpu_stop-race.diff
>>>>>>
>>>>>> The need for spinlock_enter in the patch is not entirely clear.
>>>>>> The main idea is that a CPU which calls cpu_stop and loses a race
>>>>>> should voluntarily enter cpustop_handler.
>>>>>> I am also not sure about the MI-cleanness of this patch.
>>>>>
>>>>> It is similar to what I propose but with some differences:
>>>>> - It is not clean from an MI perspective
>>>>
>>>> OK.
>>>>
>>>>> - I don't think we need to treat it specially, I would just
>>>>> unconditionally stop all the CPUs entering the "spinlock zone",
>>>>> making the patch simpler.
>>>>
>>>> I couldn't understand this part.
>>>
>>> I'm sorry, I think I misread your patch.
>>>
>>> I was basically suggesting to fix Ryan's case by calling
>>> cpustop_handler() (or the new MI interface) in the panic() function,
>>> in the case the CPU doesn't win the race for panic_cpu.
>>> Basically doing:
>>> Index: sys/kern/kern_shutdown.c
>>> ===
>>> --- sys/kern/kern_shutdown.c (revision 243154)
>>> +++ sys/kern/kern_shutdown.c (working copy)
>>> @@ -568,15 +568,11 @@ panic(const char *fmt, ...)
>>>  #ifdef SMP
>>>  	/*
>>>  	 * We don't want multiple CPU's to panic at the same time, so we
>>> -	 * use panic_cpu as a simple spinlock.  We have to keep checking
>>> -	 * panic_cpu if we are spinning in case the panic on the first
>>> -	 * CPU is canceled.
>>> +	 * use panic_cpu as a simple lock.
>>>  	 */
>>>  	if (panic_cpu != PCPU_GET(
Re: stop_cpus_hard when multiple CPUs are panicking from an NMI
On Fri, Nov 16, 2012 at 2:47 PM, Andriy Gapon wrote:
> on 16/11/2012 16:41 Attilio Rao said the following:
>> On Fri, Nov 16, 2012 at 1:18 PM, Andriy Gapon wrote:
>>> on 16/11/2012 14:30 Attilio Rao said the following:
>>>> On Fri, Nov 16, 2012 at 7:54 AM, Andriy Gapon wrote:
>>>>> on 16/11/2012 00:58 Ryan Stone said the following:
>>>>>> At work we have some custom watchdog hardware that sends an NMI
>>>>>> upon expiry. We've modified the kernel to panic when it receives
>>>>>> the watchdog NMI. I've been trying the "stop_scheduler_on_panic"
>>>>>> mode, and I've discovered that when my watchdog expires, the system
>>>>>> gets completely wedged. After some digging, I've discovered that I
>>>>>> have multiple CPUs getting the watchdog NMI and trying to panic
>>>>>> concurrently. One of the CPUs wins, and the rest spin forever in
>>>>>> this code:
>>>>>>
>>>>>> /*
>>>>>>  * We don't want multiple CPU's to panic at the same time, so we
>>>>>>  * use panic_cpu as a simple spinlock.  We have to keep checking
>>>>>>  * panic_cpu if we are spinning in case the panic on the first
>>>>>>  * CPU is canceled.
>>>>>>  */
>>>>>> if (panic_cpu != PCPU_GET(cpuid))
>>>>>>         while (atomic_cmpset_int(&panic_cpu, NOCPU,
>>>>>>             PCPU_GET(cpuid)) == 0)
>>>>>>                 while (panic_cpu != NOCPU)
>>>>>>                         ; /* nothing */
>>>>>>
>>>>>> The system wedges when stop_cpus_hard() is called, which sends NMIs
>>>>>> to all of the other CPUs and waits for them to acknowledge that
>>>>>> they are stopped before returning. However the CPU will not deliver
>>>>>> an NMI to a CPU that is already handling an NMI, so the other CPUs
>>>>>> that got a watchdog NMI and are spinning will never go into the NMI
>>>>>> handler and acknowledge that they are stopped.
>>>>>
>>>>> I thought about this issue and fixed it (in my tree) in a different
>>>>> way: http://people.freebsd.org/~avg/cpu_stop-race.diff
>>>>>
>>>>> The need for spinlock_enter in the patch is not entirely clear.
>>>>> The main idea is that a CPU which calls cpu_stop and loses a race
>>>>> should voluntarily enter cpustop_handler.
>>>>> I am also not sure about the MI-cleanness of this patch.
>>>>
>>>> It is similar to what I propose but with some differences:
>>>> - It is not clean from an MI perspective
>>>
>>> OK.
>>>
>>>> - I don't think we need to treat it specially, I would just
>>>> unconditionally stop all the CPUs entering the "spinlock zone",
>>>> making the patch simpler.
>>>
>>> I couldn't understand this part.
>>
>> I'm sorry, I think I misread your patch.
>>
>> I was basically suggesting to fix Ryan's case by calling
>> cpustop_handler() (or the new MI interface) in the panic() function,
>> in the case the CPU doesn't win the race for panic_cpu.
>> Basically doing:
>> Index: sys/kern/kern_shutdown.c
>> ===
>> --- sys/kern/kern_shutdown.c (revision 243154)
>> +++ sys/kern/kern_shutdown.c (working copy)
>> @@ -568,15 +568,11 @@ panic(const char *fmt, ...)
>>  #ifdef SMP
>>  	/*
>>  	 * We don't want multiple CPU's to panic at the same time, so we
>> -	 * use panic_cpu as a simple spinlock.  We have to keep checking
>> -	 * panic_cpu if we are spinning in case the panic on the first
>> -	 * CPU is canceled.
>> +	 * use panic_cpu as a simple lock.
>>  	 */
>>  	if (panic_cpu != PCPU_GET(cpuid))
>> -		while (atomic_cmpset_int(&panic_cpu, NOCPU,
>> -		    PCPU_GET(cpuid)) == 0)
>> -			while (panic_cpu != NOCPU)
>> -				; /* nothing */
>> +		if (atomic_cmpset_int(&panic_cpu, NOCPU, PCPU_GET(cpuid)) == 0)
>> +			cpustop_handler();
>>
>>  	if (stop_scheduler_on_panic) {
>>
Re: stop_cpus_hard when multiple CPUs are panicking from an NMI
On Fri, Nov 16, 2012 at 1:18 PM, Andriy Gapon wrote:
> on 16/11/2012 14:30 Attilio Rao said the following:
>> On Fri, Nov 16, 2012 at 7:54 AM, Andriy Gapon wrote:
>>> on 16/11/2012 00:58 Ryan Stone said the following:
>>>> At work we have some custom watchdog hardware that sends an NMI upon
>>>> expiry. We've modified the kernel to panic when it receives the
>>>> watchdog NMI. I've been trying the "stop_scheduler_on_panic" mode,
>>>> and I've discovered that when my watchdog expires, the system gets
>>>> completely wedged. After some digging, I've discovered that I have
>>>> multiple CPUs getting the watchdog NMI and trying to panic
>>>> concurrently. One of the CPUs wins, and the rest spin forever in this
>>>> code:
>>>>
>>>> /*
>>>>  * We don't want multiple CPU's to panic at the same time, so we
>>>>  * use panic_cpu as a simple spinlock.  We have to keep checking
>>>>  * panic_cpu if we are spinning in case the panic on the first
>>>>  * CPU is canceled.
>>>>  */
>>>> if (panic_cpu != PCPU_GET(cpuid))
>>>>         while (atomic_cmpset_int(&panic_cpu, NOCPU,
>>>>             PCPU_GET(cpuid)) == 0)
>>>>                 while (panic_cpu != NOCPU)
>>>>                         ; /* nothing */
>>>>
>>>> The system wedges when stop_cpus_hard() is called, which sends NMIs
>>>> to all of the other CPUs and waits for them to acknowledge that they
>>>> are stopped before returning. However the CPU will not deliver an NMI
>>>> to a CPU that is already handling an NMI, so the other CPUs that got
>>>> a watchdog NMI and are spinning will never go into the NMI handler
>>>> and acknowledge that they are stopped.
>>>
>>> I thought about this issue and fixed it (in my tree) in a different
>>> way: http://people.freebsd.org/~avg/cpu_stop-race.diff
>>>
>>> The need for spinlock_enter in the patch is not entirely clear.
>>> The main idea is that a CPU which calls cpu_stop and loses a race
>>> should voluntarily enter cpustop_handler.
>>> I am also not sure about the MI-cleanness of this patch.
>>
>> It is similar to what I propose but with some differences:
>> - It is not clean from an MI perspective
>
> OK.
>
>> - I don't think we need to treat it specially, I would just
>> unconditionally stop all the CPUs entering the "spinlock zone",
>> making the patch simpler.
>
> I couldn't understand this part.

I'm sorry, I think I misread your patch.

I was basically suggesting to fix Ryan's case by calling cpustop_handler()
(or the new MI interface) in the panic() function, in the case the CPU
doesn't win the race for panic_cpu. Basically doing:

Index: sys/kern/kern_shutdown.c
===
--- sys/kern/kern_shutdown.c (revision 243154)
+++ sys/kern/kern_shutdown.c (working copy)
@@ -568,15 +568,11 @@ panic(const char *fmt, ...)
 #ifdef SMP
 	/*
 	 * We don't want multiple CPU's to panic at the same time, so we
-	 * use panic_cpu as a simple spinlock.  We have to keep checking
-	 * panic_cpu if we are spinning in case the panic on the first
-	 * CPU is canceled.
+	 * use panic_cpu as a simple lock.
 	 */
 	if (panic_cpu != PCPU_GET(cpuid))
-		while (atomic_cmpset_int(&panic_cpu, NOCPU,
-		    PCPU_GET(cpuid)) == 0)
-			while (panic_cpu != NOCPU)
-				; /* nothing */
+		if (atomic_cmpset_int(&panic_cpu, NOCPU, PCPU_GET(cpuid)) == 0)
+			cpustop_handler();

 	if (stop_scheduler_on_panic) {
 		if (panicstr == NULL && !kdb_active) {

In fact it seems to me that the comment is outdated and no longer
represents the truth.

Attilio

--
Peace can only be achieved by understanding - A. Einstein
Re: [RFQ] make witness panic an option
On Fri, Nov 16, 2012 at 7:19 AM, Andriy Gapon wrote:
> on 16/11/2012 01:38 Attilio Rao said the following:
>> On Thu, Nov 15, 2012 at 8:51 PM, Andriy Gapon wrote:
>>> on 15/11/2012 22:00 Adrian Chadd said the following:
>>>> But I think my change is invaluable for development, where you want to
>>>> improve and debug the locking and lock interactions of a subsystem.
>>>
>>> My practical experience was that if you mess up one lock in one place,
>>> then it is a total mess further on; but apparently you've got a
>>> different practical experience :)
>>> What would indeed be invaluable to _me_ is if the LOR messages also
>>> produced the stack(s) where a supposedly correct lock order was
>>> learned.
>>
>> Please note that the "supposedly correct lock order", by the very
>> definition of being correct, can be used in several different stacks. I
>> don't see the point of saving it somewhere.
>> The only helpful case would be if the "wrong order" is caught first.
>> If this is really the case, I suggest you force the order you expect in
>> the static table so that the first time the wrong order happens it
>> yells.
>
> Exactly my point - if I am a user of some code, not its developer, and I
> don't know which one is the correct order, I would have had the complete
> information from the very start instead of having to jump through the
> hoops.

I don't agree -- such information is only useful to developers, and also,
what should we do, store all the valid stacktraces? I honestly don't
understand what you are expecting here.

Attilio

--
Peace can only be achieved by understanding - A. Einstein
Re: stop_cpus_hard when multiple CPUs are panicking from an NMI
On Fri, Nov 16, 2012 at 7:54 AM, Andriy Gapon wrote: > on 16/11/2012 00:58 Ryan Stone said the following: >> At work we have some custom watchdog hardware that sends an NMI upon >> expiry. We've modified the kernel to panic when it receives the watchdog >> NMI. I've been trying the "stop_scheduler_on_panic" mode, and I've >> discovered that when my watchdog expires, the system gets completely >> wedged. After some digging, I've discovered is that I have multiple CPUs >> getting the watchdog NMI and trying to panic concurrently. One of the CPUs >> wins, and the rest spin forever in this code: >> >> /* >> * We don't want multiple CPU's to panic at the same time, so we >> * use panic_cpu as a simple spinlock. We have to keep checking >> * panic_cpu if we are spinning in case the panic on the first >> * CPU is canceled. >> */ >> if (panic_cpu != PCPU_GET(cpuid)) >> while (atomic_cmpset_int(&panic_cpu, NOCPU, >> PCPU_GET(cpuid)) == 0) >> while (panic_cpu != NOCPU) >> ; /* nothing */ >> >> The system wedges when stop_cpus_hard() is called, which sends NMIs to all >> of the other CPUs and waits for them to acknowledge that they are stopped >> before returning. However the CPU will not deliver an NMI to a CPU that is >> already handling an NMI, so the other CPUs that got a watchdog NMI and are >> spinning will never go into the NMI handler and acknowledge that they are >> stopped. > > I thought about this issue and fixed (in my tree) in a different way: > http://people.freebsd.org/~avg/cpu_stop-race.diff > > The need for spinlock_enter in the patch in not entirely clear. > The main idea is that a CPU which calls cpu_stop and loses a race should > voluntary enter cpustop_handler. > I am also not sure about MI-cleanness of this patch. 
It is similar to what I propose but with some differences: - It is not clean from an MI perspective - I don't think we need to treat it specially, I would just unconditionally stop all the CPUs entering the "spinlock zone", making the patch simpler. So I guess you are really fine with the proposal I made? Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein
Re: stop_cpus_hard when multiple CPUs are panicking from an NMI
On Thu, Nov 15, 2012 at 11:47 PM, Ryan Stone wrote: > On Thu, Nov 15, 2012 at 6:41 PM, Attilio Rao wrote: >> >> On Thu, Nov 15, 2012 at 10:58 PM, Ryan Stone wrote: >> > At work we have some custom watchdog hardware that sends an NMI upon >> > expiry. We've modified the kernel to panic when it receives the >> > watchdog >> > NMI. I've been trying the "stop_scheduler_on_panic" mode, and I've >> > discovered that when my watchdog expires, the system gets completely >> > wedged. After some digging, I've discovered that I have multiple >> > CPUs >> > getting the watchdog NMI and trying to panic concurrently. One of the >> > CPUs >> > wins, and the rest spin forever in this code: >> >> Quick question: can you control the way your watchdog sends the NMI? >> Like only to the BSP rather than broadcast, etc. >> This is tied to the very particular situation that you cannot really >> deliver the (second) NMI. >> >> Attilio >> >> >> -- >> Peace can only be achieved by understanding - A. Einstein > > > I don't believe that I can, but I can check. In any case I can imagine > other places where this could be an issue. hwpmc works with NMIs, right? > So an hwpmc bug could trigger the same kind of issues if two CPUs that > concurrently called pmc_intr both tripped over the same bug. Frankly, I think that what you were trying to do is in some way the right approach, modulo a clean interface. I don't understand why the "spinlock" wants to spin forever, as it can never recover. Stopping the CPUs that get into the "spinlock" is perfectly fine. There are only 2 things to consider: 1) I think we need a new KPI for that, a function in $arch/include/cpu.h that takes care of stopping a CPU in an MI way, for example cpu_self_stop(). This needs to be implemented for all the architectures, but it can be done easily because it will basically be what cpustop_handler() and similar functions already do. 2) The "fake spinlock" path will call such a function. 
The only thing to debate IMHO is whether we want to make that conditional on stop_scheduler_on_panic or not. If I have to be honest, stopping the CPU seems the best approach in any case to me, but I'm open to hearing what you think. Comments? Attilio -- Peace can only be achieved by understanding - A. Einstein
Re: stop_cpus_hard when multiple CPUs are panicking from an NMI
On Thu, Nov 15, 2012 at 10:58 PM, Ryan Stone wrote: > At work we have some custom watchdog hardware that sends an NMI upon > expiry. We've modified the kernel to panic when it receives the watchdog > NMI. I've been trying the "stop_scheduler_on_panic" mode, and I've > discovered that when my watchdog expires, the system gets completely > wedged. After some digging, I've discovered that I have multiple CPUs > getting the watchdog NMI and trying to panic concurrently. One of the CPUs > wins, and the rest spin forever in this code: Quick question: can you control the way your watchdog sends the NMI? Like only to the BSP rather than broadcast, etc. This is tied to the very particular situation that you cannot really deliver the (second) NMI. Attilio -- Peace can only be achieved by understanding - A. Einstein
Re: [RFQ] make witness panic an option
On Thu, Nov 15, 2012 at 8:51 PM, Andriy Gapon wrote: > on 15/11/2012 22:00 Adrian Chadd said the following: >> But I think my change is invaluable for development, where you want to >> improve and debug the locking and lock interactions of a subsystem. > > My practical experience was that if you mess up one lock in one place, then it > is a total mess further on. but apparently you've got a different practical > experience :) > What would indeed be invaluable to _me_ - if the LOR messages also produced > the > stack(s) where a supposedly correct lock order was learned. Please note that the "supposedly correct lock order", being by definition correct, can be used in several different stacks. I don't see the point of saving it somewhere. The only helpful case would be if the "wrong order" is caught first. If this is really the case, I suggest you force the order you expect in the static table so that the first time the wrong order happens it yells. Attilio -- Peace can only be achieved by understanding - A. Einstein
Re: LK_SHARED/LK_DOWNGRADE adjustments to lock.9 manual page
On Thu, Nov 15, 2012 at 8:38 PM, Andriy Gapon wrote: > on 15/11/2012 20:46 Attilio Rao said the following: >> On 11/15/12, Andriy Gapon wrote: >>> >>> To people knowing the code, >>> >>> do the following documentation changes look correct? >> >> The latter chunk is not correct. >> It will panic only if assertions are on. > > But the current content is not correct either? Indeed, the current content is crappy. >> I was thinking, however, that >> it would be a good idea to patch lockmgr to panic also in the non-debugging >> kernel case. > > It would make sense indeed, IMO. Do you think you can test this patch?: http://www.freebsd.org/~attilio/lockmgr_forcerec.patch I think the LK_NOSHARE case is still fine with just asserts. Once this patch goes in, you are free to commit your documentation change. Thanks for fixing the doc. Attilio -- Peace can only be achieved by understanding - A. Einstein
Re: LK_SHARED/LK_DOWNGRADE adjustments to lock.9 manual page
On 11/15/12, Andriy Gapon wrote: > > To people knowing the code, > > do the following documentation changes look correct? The latter chunk is not correct. It will panic only if assertions are on. I was thinking, however, that it would be a good idea to patch lockmgr to panic also in the non-debugging kernel case. Attilio -- Peace can only be achieved by understanding - A. Einstein
Re: [RFQ] make witness panic an option
On 11/15/12, Adrian Chadd wrote: > On 15 November 2012 10:01, Attilio Rao wrote: >> I think that your worries are focused more around the latter than the >> former, which can be easilly shut down already today. >> >> And frankly I will never be in favor of a patch that automatically >> shutdowns lock assertion. Please patch your local code to do so but >> don't add any generic/upstream/all-around mechanism for that. > > Would a comprimise be ok? Ie, if I pushed everything but the sysctl > upstream, and just defaulted it to always panic? > > That way my diff wouldn't have to be a big thing; I'd just add the sysctl. I cannot forbid you from doing anything, I'm just giving you my opinion as the person who co-authored current WITNESS and locking primitives code. I think what you want to do is dangerous and highly abusable, so I'm not in favor of it at all, in whatever form it is. I understand you want to minimize your development patchset with upstream, but I think this is certainly not the way to go. That's also why I never formalized the BLESSING mechanism in WITNESS, for example. I already see WITNESS_KDB as an abuse, but at least until we have a way to make specific LOR (based on file/line, unfortunately) to be marked as "harmless" there is no way we can get rid of WITNESS_KDB. Said all that, you are free to do what you want, but if you commit anything in this area make sure your patch is reviewed by someone else and to state my firm disagreement with this approach in the commit message. Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: [RFQ] make witness panic an option
On 11/15/12, Adrian Chadd wrote: > On 15 November 2012 09:56, Warner Losh wrote: > >>> Do you really think that an abusable mechanism will help here rather > >> It sounds like he's more worried about introducing LoRs into his wireless >> code. They are harmless, for him, and he can fix them by reloading the >> driver. They are only harmful if he loses a race. > > LORs and lock assertions. I.e., I want to find out at run time that a > lock wasn't held at a certain position but continue soldiering on. > That's how I've been finding out all of the races in net80211/ath, as > there wasn't locking where there needed to be. I think that your worries are focused more around the latter than the former, which can easily be shut down already today. And frankly, I will never be in favor of a patch that automatically shuts down lock assertions. Please patch your local code to do so, but don't add any generic/upstream/all-around mechanism for that. Attilio -- Peace can only be achieved by understanding - A. Einstein
Re: [RFQ] make witness panic an option
On 11/15/12, Warner Losh wrote: > > On Nov 15, 2012, at 10:47 AM, Attilio Rao wrote: > >> On 11/15/12, Ian Lepore wrote: >>> On Wed, 2012-11-14 at 22:15 -0800, Adrian Chadd wrote: >>>> Hi all, >>>> >>>> When debugging and writing wireless drivers/stack code, I like to >>>> sprinkle lots of locking assertions everywhere. However, this does >>>> cause things to panic quite often during active development. >>>> >>>> This patch (against stable/9) makes the actual panic itself >>>> configurable. It still prints the message regardless. >>>> >>>> This has allowed me to sprinkle more locking assertions everywhere to >>>> investigate whether particular paths have been hit or not. I don't >>>> necessarily want those to panic the kernel. >>>> >>>> I'd like everyone to consider this for FreeBSD-HEAD. >>>> >>>> Thanks! >>> >>> I strongly support this, because I'm tired of having to hack it in by >>> hand every time I need it. >>> >>> You can't boot an arm platform right now (on freebsd 8, 9, or 10) >>> without a LOR very early in the boot. Once you get past that, 2 or 3 >>> device drivers I use panic way before we even get to mounting root. >>> Those panics can clearly be ignored, because we've been shipping >>> products for years based on this code. (It's on my to-do list to fix >>> them, but more pressing problems are higher on the list.) >> >> This is a ridicolous motivation. >> What are the panics in question? Why they are not fixed yet? >> Without WITNESS_KDB you should not panic even in cases where WITNESS >> yells. So if you do, it means there is a more subdole breakage going >> on here. >> >> Do you really think that an abusable mechanism will help here rather >> than fixing the actual problems? >> >>> When a new problem crops up that isn't harmless, it totally sucks that I >>> can't just turn on witness without first hacking the code to make the >>> known problems non-panicky. >> >> I really don't understand what are these "harmless problems" here. 
>> I just know one and it is between the dirhash lock and the bufwait >> lock for UFS, which is carefully documented in the code comments. All >> the others cases haven't been analyzed deeply enough to quantify them >> as "harmless". >> >> Can you please make real examples? > > It sounds like he's more worried about introducing LoRs into his wireless > code. They are harmless, for him, and he can fix them by reloading the > driver. They are only harmful if he loses a race. Without WITNESS_KDB the kernel won't panic if this is really the case, if it is only about LOR yelling. Otherwise the breakage is more serious and I would like to see a real case example. Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: [RFQ] make witness panic an option
On 11/15/12, Ian Lepore wrote: > On Wed, 2012-11-14 at 22:15 -0800, Adrian Chadd wrote: >> Hi all, >> >> When debugging and writing wireless drivers/stack code, I like to >> sprinkle lots of locking assertions everywhere. However, this does >> cause things to panic quite often during active development. >> >> This patch (against stable/9) makes the actual panic itself >> configurable. It still prints the message regardless. >> >> This has allowed me to sprinkle more locking assertions everywhere to >> investigate whether particular paths have been hit or not. I don't >> necessarily want those to panic the kernel. >> >> I'd like everyone to consider this for FreeBSD-HEAD. >> >> Thanks! > > I strongly support this, because I'm tired of having to hack it in by > hand every time I need it. > > You can't boot an arm platform right now (on freebsd 8, 9, or 10) > without a LOR very early in the boot. Once you get past that, 2 or 3 > device drivers I use panic way before we even get to mounting root. > Those panics can clearly be ignored, because we've been shipping > products for years based on this code. (It's on my to-do list to fix > them, but more pressing problems are higher on the list.) This is a ridiculous motivation. What are the panics in question? Why are they not fixed yet? Without WITNESS_KDB you should not panic even in cases where WITNESS yells. So if you do, it means there is a more subtle breakage going on here. Do you really think that an abusable mechanism will help here rather than fixing the actual problems? > When a new problem crops up that isn't harmless, it totally sucks that I > can't just turn on witness without first hacking the code to make the > known problems non-panicky. I really don't understand what these "harmless problems" are. I know just one, and it is between the dirhash lock and the bufwait lock for UFS, which is carefully documented in the code comments. 
All the other cases haven't been analyzed deeply enough to qualify them as "harmless". Can you please give real examples? Attilio -- Peace can only be achieved by understanding - A. Einstein
Re: [RFQ] make witness panic an option
On 11/15/12, Adrian Chadd wrote: > On 15 November 2012 05:27, Giovanni Trematerra > wrote: > >> I really do think that is a very bad idea. >> When a locking assertion fails you just have to stop and think about >> what's wrong; >> there is no way to postpone that. > > Not all witness panics are actually fatal. For a developer who is > sufficiently cluey in their area, they are quite likely able to just > stare at the code paths for a while to figure out why the > incorrectness occurred. The problem is that such a mechanism can be abused, just like the BLESSING one, and that's why it is disabled by default. I believe having a mechanism to make WITNESS use printf is not a good idea. > As I said, I do this primarily so I can sprinkle lots of lock > owned/unowned assertions around my driver(s) and then use that to > catch when things aren't correct. Having to reboot upon _every_ > lock assertion quickly got old. You can use it as a local patch then. This is not really the usual way to develop locking policies. Attilio -- Peace can only be achieved by understanding - A. Einstein
Re: ule+smp: small optimization for turnstile priority lending
On 9/20/12, David Xu wrote: > On 2012/09/18 22:05, Andriy Gapon wrote: >> >> Here is a snippet that demonstrates the issue on a supposedly fully >> loaded >> 2-processor system: >> >> 136794 0 3670427870244462 KTRGRAPH group:"thread", id:"Xorg tid >> 102818", >> state:"running", attributes: prio:122 >> >> 136793 0 3670427870241000 KTRGRAPH group:"thread", id:"cc1plus tid >> 111916", >> state:"yielding", attributes: prio:183, wmesg:"(null)", lockname:"(null)" >> >> 136792 1 3670427870240829 KTRGRAPH group:"thread", id:"idle: cpu1 tid >> 14", >> state:"running", attributes: prio:255 >> >> 136791 1 3670427870239520 KTRGRAPH group:"load", id:"CPU 1 load", >> counter:0, >> attributes: none >> >> 136790 1 3670427870239248 KTRGRAPH group:"thread", id:"firefox tid >> 113473", >> state:"blocked", attributes: prio:122, wmesg:"(null)", lockname:"unp_mtx" >> >> 136789 1 3670427870237697 KTRGRAPH group:"load", id:"CPU 0 load", >> counter:2, >> attributes: none >> >> 136788 1 3670427870236394 KTRGRAPH group:"thread", id:"firefox tid >> 113473", >> point:"wokeup", attributes: linkedto:"Xorg tid 102818" >> >> 136787 1 3670427870236145 KTRGRAPH group:"thread", id:"Xorg tid >> 102818", >> state:"runq add", attributes: prio:122, linkedto:"firefox tid 113473" >> >> 136786 1 3670427870235981 KTRGRAPH group:"load", id:"CPU 1 load", >> counter:1, >> attributes: none >> >> 136785 1 3670427870235707 KTRGRAPH group:"thread", id:"Xorg tid >> 102818", >> state:"runq rem", attributes: prio:176 >> >> 136784 1 3670427870235423 KTRGRAPH group:"thread", id:"Xorg tid >> 102818", >> point:"prio", attributes: prio:176, new prio:122, linkedto:"firefox tid >> 113473" >> >> 136783 1 3670427870202392 KTRGRAPH group:"thread", id:"firefox tid >> 113473", >> state:"running", attributes: prio:104 >> >> See how how the Xorg thread was forced from CPU 1 to CPU 0 where it >> preempted >> cc1plus thread (I do have preemption enabled) only to leave CPU 1 with >> zero load. 
>> >> Here is a proposed solution: >> >> turnstile_wait: optimize priority lending to a thread on a runqueue >> >> As the current thread is definitely going into mi_switch, it now >> removes >> its load before doing priority propagation which can potentially >> result >> in sched_add. In the SMP && ULE case the latter searches for the >> least loaded CPU to place a boosted thread, which is supposedly >> about >> to run. >> >> diff --git a/sys/kern/sched_ule.c b/sys/kern/sched_ule.c >> index 8e466cd..3299cae 100644 >> --- a/sys/kern/sched_ule.c >> +++ b/sys/kern/sched_ule.c >> @@ -1878,7 +1878,10 @@ sched_switch(struct thread *td, struct thread >> *newtd, int >> flags) >> /* This thread must be going to sleep. */ >> TDQ_LOCK(tdq); >> mtx = thread_lock_block(td); >> -tdq_load_rem(tdq, td); >> +#if defined(SMP) >> +if ((flags & SW_TYPE_MASK) != SWT_TURNSTILE) >> +#endif >> +tdq_load_rem(tdq, td); >> } >> /* >> * We enter here with the thread blocked and assigned to the >> @@ -2412,6 +2415,21 @@ sched_rem(struct thread *td) >> tdq_setlowpri(tdq, NULL); >> } >> >> +void >> +sched_load_rem(struct thread *td) >> +{ >> +struct tdq *tdq; >> + >> +KASSERT(td == curthread, >> +("sched_rem_load: only curthread is supported")); >> +KASSERT(td->td_oncpu == td->td_sched->ts_cpu, >> +("thread running on cpu different from ts_cpu")); >> +tdq = TDQ_CPU(td->td_sched->ts_cpu); >> +TDQ_LOCK_ASSERT(tdq, MA_OWNED); >> +MPASS(td->td_lock == TDQ_LOCKPTR(tdq)); >> +tdq_load_rem(tdq, td); >> +} >> + >> /* >>* Fetch cpu utilization information. Updates on demand. 
>>*/ >> diff --git a/sys/kern/subr_turnstile.c b/sys/kern/subr_turnstile.c >> index 31d16fe..d1d68e9 100644 >> --- a/sys/kern/subr_turnstile.c >> +++ b/sys/kern/subr_turnstile.c >> @@ -731,6 +731,13 @@ turnstile_wait(struct turnstile *ts, struct thread >> *owner, >> int queue) >> LIST_INSERT_HEAD(&ts->ts_free, td->td_turnstile, ts_hash); >> } >> thread_lock(td); >> +#if defined(SCHED_ULE) && defined(SMP) >> +/* >> + * Remove load earlier so that it does not affect cpu selection >> + * for a thread waken up due to priority lending, if any. >> + */ >> +sched_load_rem(td); >> +#endif >> thread_lock_set(td, &ts->ts_lock); >> td->td_turnstile = NULL; >> >> diff --git a/sys/sys/sched.h b/sys/sys/sched.h >> index 4b8387c..b1ead1b 100644 >> --- a/sys/sys/sched.h >> +++ b/sys/sys/sched.h >> @@ -110,6 +110,9 @@ void sched_preempt(struct thread *td); >> void sched_add(struct thread *td, int flags); >> void sched_clock(struct thread *td); >> void sched_rem(struct thread *td); >> +#if defined(SCHED_ULE) && defined(SMP) >> +voidsched_load_rem(struct thread *td); >> +#endif >> void sched_tick(int cnt); >> void sched_relinquish(struct thread
Re: ule+smp: small optimization for turnstile priority lending
On 10/29/12, Attilio Rao wrote: > On Mon, Oct 29, 2012 at 4:56 PM, Attilio Rao wrote: >> On Wed, Oct 3, 2012 at 3:05 PM, Andriy Gapon wrote: >>> on 20/09/2012 16:14 Attilio Rao said the following: >>>> On 9/20/12, Andriy Gapon wrote: >>> [snip] >>>>> The patch works well as far as I can tell. Thank you! >>>>> There is one warning with full witness enables but it appears to be >>>>> harmless >>>>> (so >>>>> far): >>>> >>>> Andriy, >>>> thanks a lot for your testing and reports you made so far. >>>> Unfortunately I'm going off for 2 weeks now and I won't work on >>>> FreeBSD for that timeframe. I will get back to those in 2 weeks then. >>>> If you want to play more with this idea feel free to extend/fix/etc. >>>> this patch. >>> >>> Unfortunately I haven't found time to work on this further, but I have >>> some >>> additional thoughts. >>> >>> First, I think that the witness message was not benign and it actually >>> warns about >>> the same kind of deadlock that I originally had. >>> The problem is that sched_lend_prio() is called with target thread's >>> td_lock held, >>> which is a lock of tdq on the thread's CPU. Then, in your patch, we >>> acquire >>> current tdq's lock to modify its load. But if two tdq locks are to be >>> held at the >>> same time, then they must be locked using tdq_lock_pair, so that lock >>> order is >>> maintained. With the patch no tdq lock order can be maintained (can >>> arbitrary) >>> and thus it is possible to run into a deadlock. >> >> Indeed. I realized this after thinking about this problem while I was >> on holiday. >> >>> >>> I see two possibilities here, but don't rule out that there can be more. >>> >>> 1. Do something like my patch does. That is, manipulate current thread's >>> tdq in >>> advance before any other thread/tdq locks are acquired (or td_lock is >>> changed to >>> point to a different lock and current tdq is unlocked). The API can be >>> made more >>> generic in nature. E.g. 
it can look like this: >>> void >>> sched_thread_block(struct thread *td, int inhibitor) >>> { >>> struct tdq *tdq; >>> >>> THREAD_LOCK_ASSERT(td, MA_OWNED); >>> KASSERT(td == curthread, >>> ("sched_thread_block: only curthread is supported")); >>> tdq = TDQ_SELF(); >>> TDQ_LOCK_ASSERT(tdq, MA_OWNED); >>> MPASS(td->td_lock == TDQ_LOCKPTR(tdq)); >>> TD_SET_INHIB(td, inhibitor); >>> tdq_load_rem(tdq, td); >>> tdq_setlowpri(tdq, td); >>> } >>> >>> >>> 2. Try to do things from sched_lend_prio based on curthread's state. >>> This, as it >>> seems, would require completely lock-less manipulations of current tdq. >>> E.g. for >>> tdq_load we could use atomic operations (it is already accessed >>> locklessly, but >>> not modified). But for tdq_lowpri a more elaborate trick would be >>> required like >>> having a separate field for a temporary value. >> >> No, this is not going to work because tdq_lowpri and tdq_load are >> updated and used in the same critical path (and possibly also other >> tdq* members in tdq_runqueue_add(), for example, but I didn't bother >> to also check that). > > Ok, so here I wasn't completely right because td_lowpri and tdq_load > are not read in the same critical path (cpu_search() and > sched_pickcpu() in the self case). > However, I was over-estimating a factor in this patch: I don't think > we need to fix real tdq_load for self in that case before cpu_search() > because self is handled in a very special way, with its own > comparison. I then think that a patch local to sched_pickcpu() on the > self path should be enough. > > This brings us to another bigger issue, however. tdq_lowpri handling > is not perfect in that patch. Think about the case where the thread to > be switched out is, infact, the lowest priority one. In that case, > what happens is that what we should do is recalculating tdq_lowpri > which is usually a very expensive operation and requires TDQ self > locking to be in place to be brought on correctly. 
I think that the easy fix for this is to implement a second field called tdq_sec_lowpri where we store the next thread to be scheduled, after tdq_lowpri. This is not really going to change anything, because we get it for free when we already calculate tdq_lowpri, and it will let us implement definitive logic that helps in the case you describe. However, I still want to think about the basic idea behind the current algorithm in the self/likely CPU pickup. Attilio -- Peace can only be achieved by understanding - A. Einstein
Re: ule+smp: small optimization for turnstile priority lending
On Mon, Oct 29, 2012 at 6:59 PM, Attilio Rao wrote: [ trim ] > This also let me wonder about the tdq_lowpri check in the self case, > in general. Basically it forces sched_pickcpu() to select self if and > only if the new thread to schedule has an higher priority than the > lowest one on curcpu. Why is that like this? Exactly this check is > used to enforce some sort of fairness? > It would be good if Jeff spends a word or two on this check specifically. Also, I've read the code of tdq_setlowpri() more closely, and I think the name tdq_lowpri is highly misleading, because it seems to me that the *highest* priority thread is what gets returned. That said, this means that self will be selected in sched_pickcpu() if and only if the new thread has a higher priority than all the ones on the self runqueue. Isn't all this a bit too extreme? Assuming that one CPU has only a single high-priority thread and the others are very loaded, it would likely make sense not to keep loading them but to switch to the self one, maybe. > Anyway the patch that implements what suggested, let me know your > thinking about it: > http://www.freebsd.org/~attilio/avg_ule2.patch I was thinking, however, that maybe we could do the tdq_choose() calculation when self == target, to have a slightly better chance of getting the optimization in place. Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein
Re: ule+smp: small optimization for turnstile priority lending
On Mon, Oct 29, 2012 at 4:56 PM, Attilio Rao wrote: > On Wed, Oct 3, 2012 at 3:05 PM, Andriy Gapon wrote: >> on 20/09/2012 16:14 Attilio Rao said the following: >>> On 9/20/12, Andriy Gapon wrote: >> [snip] >>>> The patch works well as far as I can tell. Thank you! >>>> There is one warning with full witness enables but it appears to be >>>> harmless >>>> (so >>>> far): >>> >>> Andriy, >>> thanks a lot for your testing and reports you made so far. >>> Unfortunately I'm going off for 2 weeks now and I won't work on >>> FreeBSD for that timeframe. I will get back to those in 2 weeks then. >>> If you want to play more with this idea feel free to extend/fix/etc. >>> this patch. >> >> Unfortunately I haven't found time to work on this further, but I have some >> additional thoughts. >> >> First, I think that the witness message was not benign and it actually warns >> about >> the same kind of deadlock that I originally had. >> The problem is that sched_lend_prio() is called with target thread's td_lock >> held, >> which is a lock of tdq on the thread's CPU. Then, in your patch, we acquire >> current tdq's lock to modify its load. But if two tdq locks are to be held >> at the >> same time, then they must be locked using tdq_lock_pair, so that lock order >> is >> maintained. With the patch no tdq lock order can be maintained (can >> arbitrary) >> and thus it is possible to run into a deadlock. > > Indeed. I realized this after thinking about this problem while I was > on holiday. > >> >> I see two possibilities here, but don't rule out that there can be more. >> >> 1. Do something like my patch does. That is, manipulate current thread's >> tdq in >> advance before any other thread/tdq locks are acquired (or td_lock is >> changed to >> point to a different lock and current tdq is unlocked). The API can be made >> more >> generic in nature. E.g. 
it can look like this: >> void >> sched_thread_block(struct thread *td, int inhibitor) >> { >> struct tdq *tdq; >> >> THREAD_LOCK_ASSERT(td, MA_OWNED); >> KASSERT(td == curthread, >> ("sched_thread_block: only curthread is supported")); >> tdq = TDQ_SELF(); >> TDQ_LOCK_ASSERT(tdq, MA_OWNED); >> MPASS(td->td_lock == TDQ_LOCKPTR(tdq)); >> TD_SET_INHIB(td, inhibitor); >> tdq_load_rem(tdq, td); >> tdq_setlowpri(tdq, td); >> } >> >> >> 2. Try to do things from sched_lend_prio based on curthread's state. This, >> as it >> seems, would require completely lock-less manipulations of current tdq. >> E.g. for >> tdq_load we could use atomic operations (it is already accessed locklessly, >> but >> not modified). But for tdq_lowpri a more elaborate trick would be required >> like >> having a separate field for a temporary value. > > No, this is not going to work because tdq_lowpri and tdq_load are > updated and used in the same critical path (and possibly also other > tdq* members in tdq_runqueue_add(), for example, but I didn't bother > to also check that). Ok, so here I wasn't completely right because td_lowpri and tdq_load are not read in the same critical path (cpu_search() and sched_pickcpu() in the self case). However, I was over-estimating a factor in this patch: I don't think we need to fix real tdq_load for self in that case before cpu_search() because self is handled in a very special way, with its own comparison. I then think that a patch local to sched_pickcpu() on the self path should be enough. This brings us to another bigger issue, however. tdq_lowpri handling is not perfect in that patch. Think about the case where the thread to be switched out is, infact, the lowest priority one. In that case, what happens is that what we should do is recalculating tdq_lowpri which is usually a very expensive operation and requires TDQ self locking to be in place to be brought on correctly. 
At this point, I would suggest using the new version of avg_ule.patch, which doesn't take tdq_lowpri into account. This means the optimization won't work in all cases, but at least it is a good trade-off against a more hacky version. This also makes me wonder about the tdq_lowpri check in the self case in general. Basically, it forces sched_pickcpu() to select self if and only if the new thread to schedule has a higher priority than the lowest one on curcpu. Why is it like this? Is this check used precisely to enforce some sort of fairness? It would be good if Jeff could spend a word or two on this check specifically. Anyway, here is the patch that implements what I suggested; let me know what you think about it: http://www.freebsd.org/~attilio/avg_ule2.patch Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
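The "adjust it locally on the self path" idea from this exchange can be modeled outside the kernel. The toy function below is an illustrative sketch only (the names and structure are made up; it is not the real sched_pickcpu()/cpu_search() code): when the caller knows the current thread is about to block, the current CPU discounts that thread in its own comparison instead of decrementing the shared tdq_load field early, which would need extra runqueue locking.

```c
#include <assert.h>

/*
 * Toy model of per-CPU load comparison during CPU selection.
 * NCPU, pick_cpu() and the flat load[] array are hypothetical
 * illustrations, not FreeBSD source.
 */
#define NCPU 4

static int
pick_cpu(const int load[NCPU], int self, int self_will_sleep)
{
	/*
	 * Discount the soon-to-sleep current thread locally; the
	 * shared load counter itself is never written here.
	 */
	int self_load = load[self] - (self_will_sleep ? 1 : 0);
	int best = self, best_load = self_load;

	for (int cpu = 0; cpu < NCPU; cpu++) {
		if (cpu == self)
			continue;
		/* Strictly-less keeps a tie on the current CPU. */
		if (load[cpu] < best_load) {
			best = cpu;
			best_load = load[cpu];
		}
	}
	return (best);
}
```

With the discount applied, a scenario like the Xorg trace earlier in the thread keeps the woken thread on the current CPU instead of migrating it and leaving that CPU idle; without it, the idle sibling wins the comparison.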
Re: ule+smp: small optimization for turnstile priority lending
On Wed, Oct 3, 2012 at 3:05 PM, Andriy Gapon wrote: > on 20/09/2012 16:14 Attilio Rao said the following: >> On 9/20/12, Andriy Gapon wrote: > [snip] >>> The patch works well as far as I can tell. Thank you! >>> There is one warning with full witness enables but it appears to be harmless >>> (so >>> far): >> >> Andriy, >> thanks a lot for your testing and reports you made so far. >> Unfortunately I'm going off for 2 weeks now and I won't work on >> FreeBSD for that timeframe. I will get back to those in 2 weeks then. >> If you want to play more with this idea feel free to extend/fix/etc. >> this patch. > > Unfortunately I haven't found time to work on this further, but I have some > additional thoughts. > > First, I think that the witness message was not benign and it actually warns > about > the same kind of deadlock that I originally had. > The problem is that sched_lend_prio() is called with target thread's td_lock > held, > which is a lock of tdq on the thread's CPU. Then, in your patch, we acquire > current tdq's lock to modify its load. But if two tdq locks are to be held > at the > same time, then they must be locked using tdq_lock_pair, so that lock order is > maintained. With the patch no tdq lock order can be maintained (can > arbitrary) > and thus it is possible to run into a deadlock. Indeed. I realized this after thinking about this problem while I was on holiday. > > I see two possibilities here, but don't rule out that there can be more. > > 1. Do something like my patch does. That is, manipulate current thread's tdq > in > advance before any other thread/tdq locks are acquired (or td_lock is changed > to > point to a different lock and current tdq is unlocked). The API can be made > more > generic in nature. E.g. 
it can look like this: > void > sched_thread_block(struct thread *td, int inhibitor) > { > struct tdq *tdq; > > THREAD_LOCK_ASSERT(td, MA_OWNED); > KASSERT(td == curthread, > ("sched_thread_block: only curthread is supported")); > tdq = TDQ_SELF(); > TDQ_LOCK_ASSERT(tdq, MA_OWNED); > MPASS(td->td_lock == TDQ_LOCKPTR(tdq)); > TD_SET_INHIB(td, inhibitor); > tdq_load_rem(tdq, td); > tdq_setlowpri(tdq, td); > } > > > 2. Try to do things from sched_lend_prio based on curthread's state. This, > as it > seems, would require completely lock-less manipulations of current tdq. E.g. > for > tdq_load we could use atomic operations (it is already accessed locklessly, > but > not modified). But for tdq_lowpri a more elaborate trick would be required > like > having a separate field for a temporary value. No, this is not going to work because tdq_lowpri and tdq_load are updated and used in the same critical path (and possibly also other tdq* members in tdq_runqueue_add(), for example, but I didn't bother to also check that). Let me think some more about this and get back to you. Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
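The deadlock Andriy describes is the classic lock-ordering problem: if two per-CPU queue locks can be held at once, they must always be acquired in one global order (which is what tdq_lock_pair enforces). Here is a hedged user-space sketch of that discipline using address order; the struct and function names are illustrative, not the actual FreeBSD tdq code.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a per-CPU run queue. */
struct runq {
	pthread_mutex_t lock;
	int load;
};

static void
runq_lock_pair(struct runq *a, struct runq *b)
{
	if (a == b) {
		pthread_mutex_lock(&a->lock);
		return;
	}
	/*
	 * Fixed global order: lower address first.  Two threads
	 * locking the same pair can then never deadlock by taking
	 * the locks in opposite orders.
	 */
	if (a < b) {
		pthread_mutex_lock(&a->lock);
		pthread_mutex_lock(&b->lock);
	} else {
		pthread_mutex_lock(&b->lock);
		pthread_mutex_lock(&a->lock);
	}
}

static void
runq_unlock_pair(struct runq *a, struct runq *b)
{
	pthread_mutex_unlock(&a->lock);
	if (a != b)
		pthread_mutex_unlock(&b->lock);
}

/* Move one unit of load between queues with both locks held. */
static void
runq_move_load(struct runq *src, struct runq *dst)
{
	runq_lock_pair(src, dst);
	src->load--;
	dst->load++;
	runq_unlock_pair(src, dst);
}
```

The patch under discussion broke exactly this invariant: sched_lend_prio() enters with the remote queue's lock already held, so acquiring the local queue's lock afterwards can invert the order another CPU is using.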
Re: ule+smp: small optimization for turnstile priority lending
On 9/20/12, Andriy Gapon wrote: > on 19/09/2012 10:33 Attilio Rao said the following: >> It is hard for me to tell if this is subject to same issues because I >> do not entirely understand where the locking was happening in your >> patch. >> Can you try testing this with your already KTR instrumented test and >> possibly report? > > The patch works well as far as I can tell. Thank you! > There is one warning with full witness enables but it appears to be harmless > (so > far): Andriy, thanks a lot for your testing and the reports you made so far. Unfortunately I'm going off for 2 weeks now and I won't work on FreeBSD for that timeframe. I will get back to those in 2 weeks then. If you want to play more with this idea feel free to extend/fix/etc. this patch. Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: ule+smp: small optimization for turnstile priority lending
On Wed, Sep 19, 2012 at 7:03 AM, Andriy Gapon wrote: > on 19/09/2012 02:01 Attilio Rao said the following: >> On Tue, Sep 18, 2012 at 8:07 PM, Andriy Gapon wrote: >>> on 18/09/2012 19:50 Attilio Rao said the following: >>>> On 9/18/12, Andriy Gapon wrote: >>>>> >>>>> Here is a snippet that demonstrates the issue on a supposedly fully loaded >>>>> 2-processor system: >>>>> >>>>> 136794 0 3670427870244462 KTRGRAPH group:"thread", id:"Xorg tid 102818", >>>>> state:"running", attributes: prio:122 >>>>> >>>>> 136793 0 3670427870241000 KTRGRAPH group:"thread", id:"cc1plus tid >>>>> 111916", >>>>> state:"yielding", attributes: prio:183, wmesg:"(null)", lockname:"(null)" >>>>> >>>>> 136792 1 3670427870240829 KTRGRAPH group:"thread", id:"idle: cpu1 tid >>>>> 14", >>>>> state:"running", attributes: prio:255 >>>>> >>>>> 136791 1 3670427870239520 KTRGRAPH group:"load", id:"CPU 1 load", >>>>> counter:0, >>>>> attributes: none >>>>> >>>>> 136790 1 3670427870239248 KTRGRAPH group:"thread", id:"firefox tid >>>>> 113473", >>>>> state:"blocked", attributes: prio:122, wmesg:"(null)", lockname:"unp_mtx" >>>>> >>>>> 136789 1 3670427870237697 KTRGRAPH group:"load", id:"CPU 0 load", >>>>> counter:2, >>>>> attributes: none >>>>> >>>>> 136788 1 3670427870236394 KTRGRAPH group:"thread", id:"firefox tid >>>>> 113473", >>>>> point:"wokeup", attributes: linkedto:"Xorg tid 102818" >>>>> >>>>> 136787 1 3670427870236145 KTRGRAPH group:"thread", id:"Xorg tid 102818", >>>>> state:"runq add", attributes: prio:122, linkedto:"firefox tid 113473" >>>>> >>>>> 136786 1 3670427870235981 KTRGRAPH group:"load", id:"CPU 1 load", >>>>> counter:1, >>>>> attributes: none >>>>> >>>>> 136785 1 3670427870235707 KTRGRAPH group:"thread", id:"Xorg tid 102818", >>>>> state:"runq rem", attributes: prio:176 >>>>> >>>>> 136784 1 3670427870235423 KTRGRAPH group:"thread", id:"Xorg tid 102818", >>>>> point:"prio", attributes: prio:176, new prio:122, linkedto:"firefox tid >>>>> 113473" >>>>> >>>>> 136783 1 3670427870202392 
KTRGRAPH group:"thread", id:"firefox tid >>>>> 113473", >>>>> state:"running", attributes: prio:104 >>>>> >>>>> See how how the Xorg thread was forced from CPU 1 to CPU 0 where it >>>>> preempted >>>>> cc1plus thread (I do have preemption enabled) only to leave CPU 1 with >>>>> zero >>>>> load. >>>> >>>> I think that the idea is bright, but I have reservations against the >>>> implementation because it seems to me there are too many layering >>>> violations. >>> >>> Just one - for a layer between tunrstile and scheduler :-) >>> But I agree. >>> >>>> What is suggest is somewhat summarized like that: >>>> - Add a new SRQ_WILLSLEEP or the name you prefer >>>> - Add a new "flags" argument to sched_lend_prio() (both ule and 4bsd) >>>> and sched_thread_priority (ule only) >>>> - sched_thread_priority() will pass down the new flag to sched_add() >>>> which passed down to sched_pickcpu(). >>>> >>>> This way sched_pickcpu() has the correct knowledge of what is going on >>>> and it can make the right decision. You likely don't need to lower the >>>> tdq_load at that time either this way, because sched_pickcpu() can >>>> just adjust it locally for its decision. >>>> >>>> What do you think? >>> >>> This sounds easy but it is not quite so given the implementation of >>> sched_pickcpu and sched_lowest. This is probably more work than I am able >>> to >>> take now. >> >> I think actually that the attached patch should do what you need. Of >> course there is more runqueue locking, due to the tdq_load fondling, >> but I'm not sure it is really a big deal. >> I didn't test it yet, as I understand you have a test case, so maybe >> you can. However if Jeff agrees I can send the patch to flo@ for >> performance evaluation. >> >> Thoughts? > > Originally I had a similar patch with a small difference that I did tdq_load > manipulations in sched_thread_priority around sched_add call. That patch > produced a deadlock because of the need for TDQ_LOCK. 
It is hard for me to tell if this is subject to the same issues, because I do not entirely understand where the locking was happening in your patch. Can you try testing this with your already KTR-instrumented test and possibly report back? Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: ule+smp: small optimization for turnstile priority lending
On Tue, Sep 18, 2012 at 8:07 PM, Andriy Gapon wrote: > on 18/09/2012 19:50 Attilio Rao said the following: >> On 9/18/12, Andriy Gapon wrote: >>> >>> Here is a snippet that demonstrates the issue on a supposedly fully loaded >>> 2-processor system: >>> >>> 136794 0 3670427870244462 KTRGRAPH group:"thread", id:"Xorg tid 102818", >>> state:"running", attributes: prio:122 >>> >>> 136793 0 3670427870241000 KTRGRAPH group:"thread", id:"cc1plus tid >>> 111916", >>> state:"yielding", attributes: prio:183, wmesg:"(null)", lockname:"(null)" >>> >>> 136792 1 3670427870240829 KTRGRAPH group:"thread", id:"idle: cpu1 tid >>> 14", >>> state:"running", attributes: prio:255 >>> >>> 136791 1 3670427870239520 KTRGRAPH group:"load", id:"CPU 1 load", >>> counter:0, >>> attributes: none >>> >>> 136790 1 3670427870239248 KTRGRAPH group:"thread", id:"firefox tid >>> 113473", >>> state:"blocked", attributes: prio:122, wmesg:"(null)", lockname:"unp_mtx" >>> >>> 136789 1 3670427870237697 KTRGRAPH group:"load", id:"CPU 0 load", >>> counter:2, >>> attributes: none >>> >>> 136788 1 3670427870236394 KTRGRAPH group:"thread", id:"firefox tid >>> 113473", >>> point:"wokeup", attributes: linkedto:"Xorg tid 102818" >>> >>> 136787 1 3670427870236145 KTRGRAPH group:"thread", id:"Xorg tid 102818", >>> state:"runq add", attributes: prio:122, linkedto:"firefox tid 113473" >>> >>> 136786 1 3670427870235981 KTRGRAPH group:"load", id:"CPU 1 load", >>> counter:1, >>> attributes: none >>> >>> 136785 1 3670427870235707 KTRGRAPH group:"thread", id:"Xorg tid 102818", >>> state:"runq rem", attributes: prio:176 >>> >>> 136784 1 3670427870235423 KTRGRAPH group:"thread", id:"Xorg tid 102818", >>> point:"prio", attributes: prio:176, new prio:122, linkedto:"firefox tid >>> 113473" >>> >>> 136783 1 3670427870202392 KTRGRAPH group:"thread", id:"firefox tid >>> 113473", >>> state:"running", attributes: prio:104 >>> >>> See how how the Xorg thread was forced from CPU 1 to CPU 0 where it >>> preempted >>> cc1plus thread (I 
do have preemption enabled) only to leave CPU 1 with zero >>> load. >> >> I think that the idea is bright, but I have reservations against the >> implementation because it seems to me there are too many layering >> violations. > > Just one - for a layer between tunrstile and scheduler :-) > But I agree. > >> What is suggest is somewhat summarized like that: >> - Add a new SRQ_WILLSLEEP or the name you prefer >> - Add a new "flags" argument to sched_lend_prio() (both ule and 4bsd) >> and sched_thread_priority (ule only) >> - sched_thread_priority() will pass down the new flag to sched_add() >> which passed down to sched_pickcpu(). >> >> This way sched_pickcpu() has the correct knowledge of what is going on >> and it can make the right decision. You likely don't need to lower the >> tdq_load at that time either this way, because sched_pickcpu() can >> just adjust it locally for its decision. >> >> What do you think? > > This sounds easy but it is not quite so given the implementation of > sched_pickcpu and sched_lowest. This is probably more work than I am able to > take now. I think actually that the attached patch should do what you need. Of course there is more runqueue locking, due to the tdq_load fondling, but I'm not sure it is really a big deal. I didn't test it yet, as I understand you have a test case, so maybe you can. However if Jeff agrees I can send the patch to flo@ for performance evaluation. Thoughts? Attilio Index: sys/sys/sched.h === --- sys/sys/sched.h (revisione 240672) +++ sys/sys/sched.h (copia locale) @@ -91,7 +91,7 @@ void sched_nice(struct proc *p, int nice); */ void sched_exit_thread(struct thread *td, struct thread *child); void sched_fork_thread(struct thread *td, struct thread *child); -void sched_lend_prio(struct thread *td, u_char prio); +void sched_lend_prio(struc
Re: ule+smp: small optimization for turnstile priority lending
On 9/18/12, Andriy Gapon wrote: > > Here is a snippet that demonstrates the issue on a supposedly fully loaded > 2-processor system: > > 136794 0 3670427870244462 KTRGRAPH group:"thread", id:"Xorg tid 102818", > state:"running", attributes: prio:122 > > 136793 0 3670427870241000 KTRGRAPH group:"thread", id:"cc1plus tid > 111916", > state:"yielding", attributes: prio:183, wmesg:"(null)", lockname:"(null)" > > 136792 1 3670427870240829 KTRGRAPH group:"thread", id:"idle: cpu1 tid > 14", > state:"running", attributes: prio:255 > > 136791 1 3670427870239520 KTRGRAPH group:"load", id:"CPU 1 load", > counter:0, > attributes: none > > 136790 1 3670427870239248 KTRGRAPH group:"thread", id:"firefox tid > 113473", > state:"blocked", attributes: prio:122, wmesg:"(null)", lockname:"unp_mtx" > > 136789 1 3670427870237697 KTRGRAPH group:"load", id:"CPU 0 load", > counter:2, > attributes: none > > 136788 1 3670427870236394 KTRGRAPH group:"thread", id:"firefox tid > 113473", > point:"wokeup", attributes: linkedto:"Xorg tid 102818" > > 136787 1 3670427870236145 KTRGRAPH group:"thread", id:"Xorg tid 102818", > state:"runq add", attributes: prio:122, linkedto:"firefox tid 113473" > > 136786 1 3670427870235981 KTRGRAPH group:"load", id:"CPU 1 load", > counter:1, > attributes: none > > 136785 1 3670427870235707 KTRGRAPH group:"thread", id:"Xorg tid 102818", > state:"runq rem", attributes: prio:176 > > 136784 1 3670427870235423 KTRGRAPH group:"thread", id:"Xorg tid 102818", > point:"prio", attributes: prio:176, new prio:122, linkedto:"firefox tid > 113473" > > 136783 1 3670427870202392 KTRGRAPH group:"thread", id:"firefox tid > 113473", > state:"running", attributes: prio:104 > > See how how the Xorg thread was forced from CPU 1 to CPU 0 where it > preempted > cc1plus thread (I do have preemption enabled) only to leave CPU 1 with zero > load. I think that the idea is bright, but I have reservations against the implementation because it seems to me there are too many layering violations. 
What I suggest can be summarized like this: - Add a new SRQ_WILLSLEEP or the name you prefer - Add a new "flags" argument to sched_lend_prio() (both ule and 4bsd) and sched_thread_priority (ule only) - sched_thread_priority() will pass down the new flag to sched_add(), which passes it down to sched_pickcpu(). This way sched_pickcpu() has the correct knowledge of what is going on and it can make the right decision. You likely don't need to lower tdq_load at that time either this way, because sched_pickcpu() can just adjust it locally for its decision. What do you think? Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
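The flag-threading proposal in this mail can be illustrated with stubs. Everything below is a hypothetical skeleton — the flag value, the simplified signatures, and the stub bodies are made up for illustration and are not FreeBSD source — showing only how a "will sleep" hint would travel from the priority-lending path down to the CPU-selection code, where it can be acted on locally.

```c
#include <assert.h>

/* Hypothetical flag value for illustration only. */
#define SRQ_WILLSLEEP	0x0100

/* Records what the selection step saw, so we can observe it. */
static int last_pickcpu_flags;

static int
sched_pickcpu(int flags)
{
	last_pickcpu_flags = flags;
	/*
	 * Here the real code could discount curcpu's load when
	 * SRQ_WILLSLEEP is set, instead of the caller touching
	 * tdq_load under extra locking.
	 */
	return (0);
}

static void
sched_add(int flags)
{
	(void)sched_pickcpu(flags);
}

/* Simplified: the real function takes a thread and a priority. */
static void
sched_lend_prio(int new_prio, int flags)
{
	(void)new_prio;
	sched_add(flags);
}
```

The point of the design is that only the leaf decision function needs the hint; no intermediate layer has to manipulate shared scheduler state on the way down.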
Re: syslog(3) issues
On Mon, Sep 3, 2012 at 3:23 AM, Ian Lepore wrote: > On Sun, 2012-09-02 at 19:50 -0600, Ian Lepore wrote: >> On Mon, 2012-09-03 at 00:35 +0100, Attilio Rao wrote: >> > Hi, >> > I was trying to use syslog(3) in a port application that uses >> > threading , having all of them at the LOG_CRIT level. What I see is >> > that when the logging gets massive (1000 entries) I cannot find some >> > items within the /var/log/messages (I know because I started stamping >> > also some sort of message ID in order to see what is going on). The >> > missing items are in the order of 25% of what really be there. >> > >> > Someone has a good idea on where I can start verifying for my syslogd >> > system? I have really 0 experience with syslogd and maybe I could be >> > missing something obvious. >> >> There's a chance this PR about syslogd incorrectly calculating socket >> receive buffer sizes is related and the patch attached to it could fix >> it... >> >> http://www.freebsd.org/cgi/query-pr.cgi?pr=1604331 >> >> I filed the PR long ago, if the patches have drifted out of date I'll be >> happy to re-work them. >> >> -- Ian >> > > Oops, I glitched the PR number when I pasted it, this one should be > correct: > > http://www.freebsd.org/cgi/query-pr.cgi?pr=160433 This patch fixes the problem for me, thanks a lot. Alexander, do you have any reservations about it? When do you think the patch can be merged to -CURRENT? Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
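PR 160433 concerns syslogd's calculation of its socket receive buffer sizes. The general technique behind that kind of fix can be sketched in plain C (this is a generic illustration under my own naming, not the actual syslogd patch): a daemon expecting message bursts requests a larger SO_RCVBUF and then reads back what the kernel actually granted, since the requested value may be clamped or otherwise adjusted.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * grow_rcvbuf() is a hypothetical helper: ask for `want` bytes of
 * receive buffer on `fd`, then return the size the kernel actually
 * set (which may differ from the request), or -1 on error.
 */
static int
grow_rcvbuf(int fd, int want)
{
	int got = 0;
	socklen_t len = sizeof(got);

	if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &want,
	    sizeof(want)) == -1)
		return (-1);
	if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &got, &len) == -1)
		return (-1);
	return (got);
}
```

An undersized receive buffer on a datagram socket silently drops whatever does not fit, which matches the roughly 25% loss under bursts described at the start of this thread.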
Re: syslog(3) issues
On Mon, Sep 3, 2012 at 1:20 AM, Garrett Cooper wrote: > On Sun, Sep 2, 2012 at 4:35 PM, Attilio Rao wrote: >> Hi, >> I was trying to use syslog(3) in a port application that uses >> threading , having all of them at the LOG_CRIT level. What I see is >> that when the logging gets massive (1000 entries) I cannot find some >> items within the /var/log/messages (I know because I started stamping >> also some sort of message ID in order to see what is going on). The >> missing items are in the order of 25% of what really be there. >> >> Someone has a good idea on where I can start verifying for my syslogd >> system? I have really 0 experience with syslogd and maybe I could be >> missing something obvious. > > I'd maybe use something like rsyslog and force TCP to verify that > the messages made it to their endpoints, and if all the messages make > it to the rsyslogd daemon use tcpdump/wireshark to figure out if the > UDP datagrams (default transport layer for syslog) aren't getting > dropped on the floor. Forgot to mention: the logging is done completely locally so I don't think the network should play a role here. Also, I would like to understand if I'm missing something subtle or if we actually may have a bug in syslogd. Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
syslog(3) issues
Hi, I was trying to use syslog(3) in a port application that uses threading, with all messages at the LOG_CRIT level. What I see is that when the logging gets massive (1000 entries) I cannot find some items within /var/log/messages (I know because I started stamping also some sort of message ID in order to see what is going on). The missing items are in the order of 25% of what should really be there. Does someone have a good idea of where I can start checking in my syslogd setup? I have really 0 experience with syslogd and maybe I could be missing something obvious. Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: On cooperative work [Was: Re: newbus' ivar's limitation..]
On 8/1/12, Arnaud Lacombe wrote: > Hi, > > On Wed, Aug 1, 2012 at 2:18 PM, Attilio Rao wrote: [ trimm ] >> You are forgetting one specific detail: you can always review a work >> *after* it entered the tree. This is something you would never do, but >> sometimes, when poor quality code is committed there is nothing else >> you can do than just raise your concern after it is in. >> > Unfortunately, not. First, the developer will certainly have moved on > after the commit, API may have been changed tree-wide, etc. Then, time > is likely to have passed between the time you figure potential > regression or bugs, which makes the first point even more problematic. > Finally, if my point of view is being ignored *before* it goes to the > tree, do you really expect it will be considered *after* ? There is one thing you are not considering: committers are as powerless as non-committers in the face of someone stuck on his own buggy ideas/implementations. Often people are just convinced their idea is better than yours and they won't change their mind, regardless of their status in the opensource community. And there is nothing more you can do apart from learning how to deal with such situations. Granted, there are projects blowing up and people abandoning successful opensource communities because of this. > From my external point of view, committers not only have the > possibility, but *do* commit mess in the tree without non-committers > could say or do anything, just as well as committers being able to > arbitrarily close PR even if the original reporter disagree. You should look at svn-src@ more often, I suspect. You will see how many discussions are in there. >> And so? I think you have a wrong point of view of what is >> shamelessly... I'm working on the same project since 6 months, i >> thought I could finish it in 1 but then I understood that in order to >> get the quality I was hoping I had to do more work... does it qualify >> as failed, according to your standard?
>> > not specifically, but at the end of that project of your, I would run > a post-mortem meeting to see what went correctly and where things got > out-of-control. Arnaud, my friend, I have news for you: this stuff is hard. I have seen the brightest people I've ever met stuck for weeks on problems, thinking about how to fix them in an elegant way. Sometimes things get underestimated, sometimes not; this is just part of the game. But an important thing is accepting criticism in a constructive way and learning from it. This makes things much easier. > As for the mbuf meeting, all I can find from it online is: > > http://lists.freebsd.org/pipermail/freebsd-arch/2012-June/012629.html > > which is not worth much... Rather than doing things internally, maybe > it is time to open up... oh, wait, you will certainly come to the > community with a design plan, figure out it's not gonna work while > doing the implementation, change the design plan while implementing, > go public with a +3k/-2k loc change patch, ask for benediction, go > ahead and commit it up until someone comes with an obvious design flaw > which would render the change pretty much useless, but there will be > man-month of work to fix it, so it's never gonna be done. > > One obvious problem in FreeBSD is that committers are prosecutor, > judge and jury altogether. That's not the first time I point this out. You are dramatizing. As I already said, please, if you are interested in this topic, ask for the state of the discussion and ask politely to be included from now on. Nobody will reject you only because you don't have a @freebsd.org. >>>> b) there is still not consensus on details >>>> >>> Then the discussion should stop, public records are kept for reference >>> in the future. There is no problem with this. >>> >>>> and you can always publically asked on what was decided and what not. >>>> Just send a mail to interested recipients and CC any FreeBSD mailing >>>> list. >>>> >>> This is not the way "openness" should be about.
>> >> There is not much more you can do when people don't share details and >> discussions automatically. >> > keep reporting regressions, interface flaws, POLA violations, ABI > breakages, bugs, etc. I agree. But with the correct and humble mindset and avoiding aggressive behaviour. Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: On cooperative work [Was: Re: newbus' ivar's limitation..]
On 8/1/12, Arnaud Lacombe wrote: > Hi, > > On Wed, Aug 1, 2012 at 12:40 PM, Attilio Rao wrote: >> On Wed, Aug 1, 2012 at 5:32 PM, Arnaud Lacombe >> wrote: >>> Hi, >>> >>> On Tue, Jul 31, 2012 at 4:14 PM, Attilio Rao >>> wrote: >>>> >>>> You don't want to work cooperatively. >>>> >>> Why is it that mbuf's refactoring consultation is being held in >>> internal, private, committers-and-invite-only-restricted meeting at >>> BSDCan ? >>> >>> Why is it that so much review and discussion on changes are held >>> privately ? >> >> Arnaud, >> belive me, to date I don't recall a single major technical decision >> that has been settled exclusively in private (not subjected to peer >> review) and in particular in person (e-mail help you focus on a lot of >> different details that you may not have under control when talking to >> people, etc). >> > Whose call is it to declare something worth public discussion ? No one. > > Every time I see a "Suggested by:", "Submitted by:", "Reported by:", > and especially "Approved by:", there should to be a public reference > of the mentioned communication. Not necessarily. Every developer must ensure they produce quality work, with the definition of "quality" being discretionary. Some people fail this expectation, while others do very well. As a general rule, some people send patches to experts on the matter and just trust their judgment, others also fulfill testing cycles with third-party people, others again ask for public reviews. Often this process is adapted based on the size of the work and the number of architectural changes it introduces.
As a personal matter, for a big architectural change the things I would *personally* do are: - Prepare a master plan with experts on the matter - Post the plan (after having achieved consensus) on the public mailing list for further discussion - Adjust the plan based on public feedback (if necessary) - Implement the plan - Ask the experts if they have objections to the implementation - Ask testers to perform some stress-testing - Ask testers to perform benchmarks (if I find people interested in that) - Send out to the public mailing list for public review - Integrate suggestions - Ask testers to stress-test again - Commit I think this model in general works fairly well, but people may have different ideas on that, meaning that people may not want to involve third parties for testing or review. This is going to be risky and lower the quality of their work, but it is their call. >> Sometimes it is useful that a limited number of developers is involved >> in initial brainstorming of some works, >> > Never. > >> but after this period >> constructive people usually ask for peer review publishing their plans >> on the mailing lists or other media. >> > Again, never. By doing so, you merely put the community in a situation > where, well, "We, committers, have come with this, you can either > accept or STFU, but no major changes will be made because we decided > so." You are forgetting one specific detail: you can always review a work *after* it entered the tree. This is something you would never do, but sometimes, when poor quality code is committed there is nothing else you can do than just raise your concern after it is in. > The callout-ng conference at BSDCan was just beautiful, it was basically: > > Speaker: "we will do this" > Audience: "how about this situation ? What you will do will not work." > Speaker: "thank you for listening, end of the conference" > > It was beautiful to witness. Not sure if you realized, but I was the "Audience" you mention.
I think you are referring to a specific case where a quick heads-up on a summer of code project has been presented, you cannot really believe all the technical discussion around FreeBSD evolve this way. >> If you don't see any public further discussion this may be meaning: >> a) the BSDCan meetings have been fruitless and there is no precise >> plan/roadmap/etc. >> > so not only you make it private, but it shamelessly failed... And so? I think you have a wrong point of view of what is shamelessly... I'm working on the same project since 6 months, i thought I could finish it in 1 but then I understood that in order to get the quality I was hoping I had to do more work... does it qualify as failed, according to your standard? >> b) there is still not consensus on details >> > Then the discussion should stop, public records are kept for reference > in the future. There is no problem with this. > >> and you can always publica
Re: On cooperative work [Was: Re: newbus' ivar's limitation..]
On Wed, Aug 1, 2012 at 5:32 PM, Arnaud Lacombe wrote: > Hi, > > On Tue, Jul 31, 2012 at 4:14 PM, Attilio Rao wrote: >> >> You don't want to work cooperatively. >> > Why is it that mbuf's refactoring consultation is being held in > internal, private, committers-and-invite-only-restricted meeting at > BSDCan ? > > Why is it that so much review and discussion on changes are held privately ? Arnaud, believe me, to date I don't recall a single major technical decision that has been settled exclusively in private (not subjected to peer review) and in particular in person (e-mail helps you focus on a lot of different details that you may not have under control when talking to people, etc.). Sometimes it is useful that a limited number of developers is involved in the initial brainstorming of some works, but after this period constructive people usually ask for peer review, publishing their plans on the mailing lists or other media. If you don't see any further public discussion, this may mean: a) the BSDCan meetings have been fruitless and there is no precise plan/roadmap/etc. b) there is still no consensus on details and you can always publicly ask what was decided and what was not. Just send a mail to the interested recipients and CC any FreeBSD mailing list. Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: How to diagnose system freezes?
On 8/1/12, Yuri wrote: > One of my 9.1-BETA1 systems periodically freezes. If sound was playing, > it would usually cycle with a very short period. And system stops being > sensitive to keyboard/mouse. Also ping of this system doesn't get a > response. > I would normally think that this is the faulty memory. But memory was > recently replaced and tested with memtest+ for hours both before and > after freezes and it passes all tests. > One out of the ordinary thing that is running on this system is nvidia > driver. But the freezes happen even when there is no graphics activity. > Another out of the ordinary thing is that the kernel is built for > DTrace. But DTrace was never used in the sessions that had a freeze. > > What is the way to diagnose this problem? Start by adding SW_WATCHDOG to your machine with a reasonably low timeout. Also, can you use a serial console? If yes you may consider going with a serial break. Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
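For reference, enabling the software watchdog suggested above on a 9.x system is roughly the following kernel option plus watchdogd(8) configuration; the 16-second timeout is just an example of a "reasonably low" value, tune it to your workload:

```
# Kernel configuration file addition (then rebuild and install the kernel):
options 	SW_WATCHDOG

# /etc/rc.conf:
watchdogd_enable="YES"
watchdogd_flags="-t 16"
```

If the machine stops servicing the watchdog during a freeze, the timeout fires and produces a panic/reboot, which is often enough to get a usable crash dump out of an otherwise silent hang.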
Re: newbus' ivar's limitation..
On Tue, Jul 31, 2012 at 8:47 PM, Arnaud Lacombe wrote: > Hi, > > On Tue, Jul 31, 2012 at 12:27 PM, Warner Losh wrote: >> >> On Jul 31, 2012, at 9:20 AM, Arnaud Lacombe wrote: >> >>> Hi, >>> >>> On Mon, Jul 30, 2012 at 11:51 PM, Warner Losh wrote: [...] We lack that right now, which is why you're trying to shoe-horn the FDT connections into a newbus world and complaining that everything sucks because it is a poor fit. I'd suggest that different mechanisms are necessary. >>> I'm not trying anything, or at least no longer. I do not see the point >>> working on that when I can not get trivial patches in, especially >>> those fixing poorly maintained drivers, whose issues _do_ affect >>> people. >> >> Hey Arnaud, sorry to be a little harsh, but maybe if you shouted less and >> cooperated more, people would be more willing to work with you. >> > I tried to be Mr Nice Guy in the past, all I got in return was being > ignored. Lists are full of unanswered problem because people do not > want to insist getting an answer. Now, believe me on one point, if you > are a driver or subsystem author, might I have an issue with your > work, I *will* be a recurring pain in your butt to get the issue > fixed, or to get in what I do believe, with the limited set of > knowledge and resources to my disposal[0], to be a correct fix for the > issue, at the time. If you are condescending, arrogant, or advocates > of the status-quo, as have been committers in the past, I will return > you favor. Let face it, FreeBSD is not the most outstanding OS out > there (despite obvious capabilities), and I would not be too proud of > its state. > > All that to say that asking politely does not work in the arbitrary > FreeBSD realm, where "the power to serve", is today nothing more that > a relic. 
Arnaud, the problem I see here is that, as always, you make strong and false claims about bugs and missing support in the FreeBSD kernel, but when people point out what you are missing or misunderstanding, you turn the whole thread into "FreeBSD is a relic" whining, without replying with real technical arguments and simply ignoring e-mails from FreeBSD developers. I didn't see any response from you to several technical e-mails in this thread and others (please don't force me to open mailman and show exactly all the responses you have deliberately ignored), while you spit disrespectful, poison-weighted words at developers of our project. You don't want to work cooperatively. You don't want to build something with the FreeBSD community. So why do you keep insisting on sending e-mails like this? Don't you think it would be more productive for you to stick with another project? (I have a couple of names in my head that may be a good fit for you...). I think it is very offensive that you mock our statement like that. For many people reading this e-mail it has a true meaning, and people like you should really watch their words when speaking about FreeBSD. Attilio
Re: ULE/sched issues on stable/9 - why isn't preemption occuring?
2012/5/29 Adrian Chadd : > Hi Alexander and others, > > I've been tinkering with ath(4) IO scheduling and taskqueues. In order > to get proper "in order" TX IO occuring, I've placed ath_start() into > a taskqueue so now whenever ath_start() is called, it just schedules a > taskqueue entry to run. > > However, performance is worse. :-) > > Here's a schedgraph trace. > > http://people.freebsd.org/~adrian/ath/ktr.4-ath-iperf-using-taskqueue-for-tx.ktr.gz > > I've thrown this through schedgraph.py on stable/9 and I've found some > rather annoying behaviour. It seems that the ath0 taskqueue stays in > the "runq add" state for quite a long time (1.5ms and longer) because > something else is going on on CPU #0. > > I'm very confused about what's going on. I'd like a hand trying to > figure out why the schedgraph output is the way it is. What I would usually do in cases like this is patch the kernel with additional KTR traces (a handmade class; add a fictitious KTR class after KTR_BUF) in the code paths of interest, to log why your task is not really being scheduled. Attilio
Re: new panic in cpu_reset() with WITNESS
2012/5/17, Andriy Gapon : > on 25/01/2012 23:52 Andriy Gapon said the following: >> on 24/01/2012 14:32 Gleb Smirnoff said the following: >>> Yes, now: >>> >>> Rebooting... >>> lock order reversal: >>> 1st 0x80937140 smp rendezvous (smp rendezvous) @ >>> /usr/src/head/sys/kern/kern_shutdown.c:542 >>> 2nd 0xfe0001f5d838 uart_hwmtx (uart_hwmtx) @ >>> /usr/src/head/sys/dev/uart/uart_cpu.h:92 >>> panic: mtx_lock_spin: recursed on non-recursive mutex cnputs_mtx @ >>> /usr/src/head/sys/kern/kern_cons.c:500 >> >> OK, so it's just a plain LOR between smp rendezvous and uart_hwmtx, no >> new >> details... >> >> It's still intriguing to me why the LOR *doesn't* happen [*] with >> stop_scheduler_on_panic=0. But I don't see a productive way to pursue >> this >> investigation further. > > Salve Glebius! > After your recent nudging I took yet another look at this issue and it seems > that > I have some findings. > > For those who might get interested here is a convenience reference to the > whole > thread on gmane: http://thread.gmane.org/gmane.os.freebsd.current/139307 > > A short summary. > Prerequisites: an SMP x86 system, its kernel is configured with WITNESS && > !WITNESS_SKIPSPIN (this is important) and a system uses serial console via > uart. > Then, if stop_scheduler_on_panic is set to zero the system can be rebooted > without > a problem. On the other hand, if stop_scheduler_on_panic is enabled, then > the > system first runs into a LOR when calling printf() in cpu_reset() and then > it runs > into a panic when printf is recursively called from witness(9) to report the > LOR. > The panic happens because of the recursion on cnputs_mtx, which should > ensure > that cnputs() output is not intermingled and which is not flagged to > support > recursion. > > There are two things about this report that greatly confused and puzzled > me: > 1. stop_scheduler_on_panic variable is used _only_ in panic(9). 
So how > could it > be possible that changing its value affects behavior of the system when > panic(9) > is not called?! > > 2. The LOR in question happens between "smp rendezvous" (smp_ipi_mtx) and > "uart_hwmtx" (sc_hwmtx_s in uart core) spin locks. The order of these locks > is > actually predefined in witness order_lists[] such that uart_hwmtx must come > before > smp_ipi_mtx. But in the reboot path we first take smp_ipi_mtx in > shutdown_reset(), then we call cpu_reset, then it calls printf and from > there we > get to uart_hwmtx for serial console output. So the order between these > spinlocks > is always violated in the x86 SMP reboot path. > How come witness(9) doesn't _always_ detect this LOR? > How come it didn't detect this LOR before any "stop scheduler" commits?! > > [Spoiler alert :-)] > > Turns out that the two puzzles above are closely related. > Let's first list all the things that change when stop_scheduler_on_panic is > enabled and a panic happens: > - other CPUs are stopped (forced to spin) > - interrupts on current CPU are disabled > - by virtue of the above the current thread should be the only thread > running > (unless it executes a voluntary switch) > - all locks are "busted", they are completely ignored / bypassed > - by virtue of the above no lock invariants and witness checks are > performed, so > no lock order checking, no recursion checking, etc > > So, what I observe is this: when stop_scheduler_on_panic is disabled, the > LOR is > actually detected as it should be. witness(9) works properly here. Once > the LOR > is detected witness(9) wants to report it using printf(9). That's where we > run > into the cnputs_mtx recursion panic. It's all exactly as with > stop_scheduler_on_panic enabled. Then panic(9) wants to report the panic > using > printf(9), which goes to cnputs() again, where _mtx_lock_spin_flags() > detects > locks recursion again (this is independent of witness(9)) and calls > panic(9) > again. 
Then panic(9) wants to report the panic using printf(9)... > I assume that when the stack is exhausted we run into a double fault and > dblfault_handler wants to print something again... Likely we eventually run > into > a triple fault which resets the machine. Although, I recall some old > reports of > machines hanging during reboot in a place suspiciously close to where the > described ordeal happens... > But if the machine doesn't hang we get a full appearance of the reset > successfully > happening (modulo some last messages missing). > > With stop_scheduler_on_panic enabled and all the locks being ignored we, of > course, do not run into any secondary lock recursions and resulting panics. > So > the system is able to at least panic "gracefully" (still we should have > just > reported the LOR gracefully, no panic is required). > > Some obvious conclusions: > - the "smp rendezvous" and "uart_hwmtx" LOR is real and it is the true cause > of > the problem; it should be fixed one way or other - either by correcting > witness > order_lists[] or by avoiding the
Re: Disabling an arbitrary device
On 20 April 2012 19:18, Arnaud Lacombe wrote: > Hi, > > On Fri, Apr 20, 2012 at 2:16 PM, Arnaud Lacombe wrote: >> Hi, >> >> I will be bringing up an old thread here, but it would seem the >> situation did not evolve in the past 9 years. I have a machine running >> 7.1 whose UHCI controller is generating an interrupt storm: >>
>> # vmstat -i
>> interrupt total rate
>> irq4: sio0 1328 2
>> irq19: uhci1+ 63514509 96380
>> [...]
>> generating useless load on one CPU: >>
>> # top -SH
>> last pid: 5223; load averages: 0.00, 0.00, 0.00 up 0+00:17:21 13:10:35
>> 117 processes: 14 running, 79 sleeping, 24 waiting
>> CPU: 0.2% user, 0.0% nice, 0.2% system, 6.6% interrupt, 93.0% idle
>> Mem: 33M Active, 9348K Inact, 67M Wired, 400K Cache, 29M Buf, 2892M Free
>> [...]
>> 57 root -64 - 0K 8K CPU0 0 11:59 86.57% irq19: uhci1+
>> I thought I could use a hint to forbid uhci(4) attachment, a la: >> hint.uhci.0.disabled="1" >> hint.uhci.1.disabled="1" >> in /boot/loader.conf. However, it would seem that what should be >> usable with any arbitrary device, i.e. be an integral part of >> device(9), has to be hardcoded in every driver, sad. >> > as for the usual "if you're not happy, please provide a patch": > > https://github.com/lacombar/freebsd/commit/30786d09b0cb441890cdc749ee5243238e81d2d8 I think a generalized kludge should really belong in device_probe_child() rather than device_add_child_ordered(). When you have a tested patch against -CURRENT, can you please send it to freebsd-newbus@? BTW, IIRC 7.x doesn't have the new USB stack, which means you could easily have hit a driver bug there; did you consider switching at least to an 8.x kernel? Thanks, Attilio
Re: Regarding core disable in FreeBSD 9
On 13 April 2012 17:18, Mahesh Babu wrote: > How to disable a particular core in FreeBSD 9? > How to enable it again? You can disable a specific core at boot time by using the tunable: hint.lapic.X.disabled=1 where X is the APIC ID of the CPU you want to turn off. Attilio
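Concretely, the tunable goes in loader.conf; a sketch (the APIC ID 3 here is illustrative — match it against the "APIC ID" lines printed during the boot CPU probe in dmesg):

```
# /boot/loader.conf
hint.lapic.3.disabled=1
```

Note this is a boot-time tunable, so as far as I know re-enabling the core on FreeBSD 9 means removing the line and rebooting; there is no runtime un-disable for a core that was never started.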
Re: [RFT][patch] Scheduling for HTT and not only
Il 06 aprile 2012 15:27, Alexander Motin ha scritto: > On 04/06/12 17:13, Attilio Rao wrote: >> >> Il 05 aprile 2012 19:12, Arnaud Lacombe ha scritto: >>> >>> Hi, >>> >>> [Sorry for the delay, I got a bit sidetrack'ed...] >>> >>> 2012/2/17 Alexander Motin: >>>> >>>> On 17.02.2012 18:53, Arnaud Lacombe wrote: >>>>> >>>>> >>>>> On Fri, Feb 17, 2012 at 11:29 AM, Alexander Motin >>>>> wrote: >>>>>> >>>>>> >>>>>> On 02/15/12 21:54, Jeff Roberson wrote: >>>>>>> >>>>>>> >>>>>>> On Wed, 15 Feb 2012, Alexander Motin wrote: >>>>>>>> >>>>>>>> >>>>>>>> I've decided to stop those cache black magic practices and focus on >>>>>>>> things that really exist in this world -- SMT and CPU load. I've >>>>>>>> dropped most of cache related things from the patch and made the >>>>>>>> rest >>>>>>>> of things more strict and predictable: >>>>>>>> http://people.freebsd.org/~mav/sched.htt34.patch >>>>>>> >>>>>>> >>>>>>> >>>>>>> This looks great. I think there is value in considering the other >>>>>>> approach further but I would like to do this part first. It would be >>>>>>> nice to also add priority as a greater influence in the load >>>>>>> balancing >>>>>>> as well. >>>>>> >>>>>> >>>>>> >>>>>> I haven't got good idea yet about balancing priorities, but I've >>>>>> rewritten >>>>>> balancer itself. As soon as sched_lowest() / sched_highest() are more >>>>>> intelligent now, they allowed to remove topology traversing from the >>>>>> balancer itself. That should fix double-swapping problem, allow to >>>>>> keep >>>>>> some >>>>>> affinity while moving threads and make balancing more fair. I did >>>>>> number >>>>>> of >>>>>> tests running 4, 8, 9 and 16 CPU-bound threads on 8 CPUs. With 4, 8 >>>>>> and >>>>>> 16 >>>>>> threads everything is stationary as it should. With 9 threads I see >>>>>> regular >>>>>> and random load move between all 8 CPUs. Measurements on 5 minutes run >>>>>> show >>>>>> deviation of only about 5 seconds. 
It is the same deviation as I see >>>>>> caused >>>>>> by only scheduling of 16 threads on 8 cores without any balancing >>>>>> needed >>>>>> at >>>>>> all. So I believe this code works as it should. >>>>>> >>>>>> Here is the patch: http://people.freebsd.org/~mav/sched.htt40.patch >>>>>> >>>>>> I plan this to be a final patch of this series (more to come :)) and >>>>>> if >>>>>> there will be no problems or objections, I am going to commit it >>>>>> (except >>>>>> some debugging KTRs) in about ten days. So now it's a good time for >>>>>> reviews >>>>>> and testing. :) >>>>>> >>>>> is there a place where all the patches are available ? >>>> >>>> >>>> >>>> All my scheduler patches are cumulative, so all you need is only the >>>> last >>>> mentioned here sched.htt40.patch. >>>> >>> You may want to have a look to the result I collected in the >>> `runs/freebsd-experiments' branch of: >>> >>> https://github.com/lacombar/hackbench/ >>> >>> and compare them with vanilla FreeBSD 9.0 and -CURRENT results >>> available in `runs/freebsd'. On the dual package platform, your patch >>> is not a definite win. >>> >>>> But in some cases, especially for multi-socket systems, to let it show >>>> its >>>> best, you may want to apply additional patch from avg@ to better detect >>>> CPU >>>> topology: >>>> >>>> https://gitorious.org/~avg/freebsd/avgbsd/commit/6bca4a2e4854ea3fc275946a023db65c483cb9dd >>>> >>> test I conducted specifically for this patch did not showed much >>> improvement... >> >> >> Can you please clarify on this point? >> The test you did included cases where the topology was detected badly >> against cases where the topology was detected correctly as a patched >> kernel (and you still didn't see a performance improvement), in terms >> of cache line sharing? > > > At this moment SCHED_ULE does almost nothing in terms of cache line sharing > affinity (though it probably worth some further experiments). 
What this > patch may improve is opposite case -- reduce cache sharing pressure for > cache-hungry applications. For example, proper cache topology detection > (such as lack of global L3 cache, but shared L2 per pairs of cores on > Core2Quad class CPUs) increases pbzip2 performance when number of threads is > less then number of CPUs (i.e. when there is place for optimization). My question wasn't really about your patch. I just wanted to know whether he correctly benchmarked a case where the topology was screwed up and then correctly recognized by avg's patch, in terms of cache-level aggregation (it wasn't referring to your patch, btw). Attilio
Re: [RFT][patch] Scheduling for HTT and not only
Il 05 aprile 2012 19:12, Arnaud Lacombe ha scritto: > Hi, > > [Sorry for the delay, I got a bit sidetrack'ed...] > > 2012/2/17 Alexander Motin : >> On 17.02.2012 18:53, Arnaud Lacombe wrote: >>> >>> On Fri, Feb 17, 2012 at 11:29 AM, Alexander Motin wrote: On 02/15/12 21:54, Jeff Roberson wrote: > > On Wed, 15 Feb 2012, Alexander Motin wrote: >> >> I've decided to stop those cache black magic practices and focus on >> things that really exist in this world -- SMT and CPU load. I've >> dropped most of cache related things from the patch and made the rest >> of things more strict and predictable: >> http://people.freebsd.org/~mav/sched.htt34.patch > > > This looks great. I think there is value in considering the other > approach further but I would like to do this part first. It would be > nice to also add priority as a greater influence in the load balancing > as well. I haven't got good idea yet about balancing priorities, but I've rewritten balancer itself. As soon as sched_lowest() / sched_highest() are more intelligent now, they allowed to remove topology traversing from the balancer itself. That should fix double-swapping problem, allow to keep some affinity while moving threads and make balancing more fair. I did number of tests running 4, 8, 9 and 16 CPU-bound threads on 8 CPUs. With 4, 8 and 16 threads everything is stationary as it should. With 9 threads I see regular and random load move between all 8 CPUs. Measurements on 5 minutes run show deviation of only about 5 seconds. It is the same deviation as I see caused by only scheduling of 16 threads on 8 cores without any balancing needed at all. So I believe this code works as it should. Here is the patch: http://people.freebsd.org/~mav/sched.htt40.patch I plan this to be a final patch of this series (more to come :)) and if there will be no problems or objections, I am going to commit it (except some debugging KTRs) in about ten days. So now it's a good time for reviews and testing. 
:) >>> is there a place where all the patches are available ? >> >> >> All my scheduler patches are cumulative, so all you need is only the last >> mentioned here sched.htt40.patch. >> > You may want to have a look to the result I collected in the > `runs/freebsd-experiments' branch of: > > https://github.com/lacombar/hackbench/ > > and compare them with vanilla FreeBSD 9.0 and -CURRENT results > available in `runs/freebsd'. On the dual package platform, your patch > is not a definite win. > >> But in some cases, especially for multi-socket systems, to let it show its >> best, you may want to apply additional patch from avg@ to better detect CPU >> topology: >> https://gitorious.org/~avg/freebsd/avgbsd/commit/6bca4a2e4854ea3fc275946a023db65c483cb9dd >> > test I conducted specifically for this patch did not showed much > improvement... Can you please clarify this point? Did the tests you ran include cases where the topology was detected badly versus cases where it was detected correctly under the patched kernel (and you still didn't see a performance improvement), in terms of cache-line sharing? Attilio
Re: NTFS GSoC Project Idea
2012/3/27, Efstratios Karatzas : > On Tue, Mar 27, 2012 at 11:06 PM, Gleb Kurtsou > wrote: > >> On (26/03/2012 21:13), Efstratios Karatzas wrote: >> > Greetings, >> > >> > I am a FreeBSD GSoC 2010 student, looking to participate in this years >> > GSoC. The project that I wish to work on is the FreeBSD NTFS driver. >> > >> > I 've already discussed my project idea with Attilio@ who suggested >> that I >> > forward my proposal to the hackers mailing list in order to get more >> > feedback and see if there's anyone interested in mentoring a NTFS >> project. >> > >> > The current draft of the technical part of my proposal(pdf & plain text) >> > can be found here: >> > >> > http://cgi.di.uoa.gr/~std06101/ntfs/ntfs_proposal.tar >> > >> > The project idea focuses on mp-safing the NTFS driver as well as adding >> > extra write support. I've tried to merge the two conflicting NTFS >> > project >> > ideas in the ideas wiki page, into one. One of them suggesting that work >> > should be done on the current driver (mp-safe it etc) and the other one >> > suggesting that we port Apple's NTFS driver from scratch. The concept is >> > that we keep most of our vnode/vfs code (i.e. ntfs_vfsops.c, >> ntfs_vnops.c) >> > and rework existing infrastructure as needed as well as implement new >> > features using Apple's driver as a guide. >> >> I'm not sure I follow your idea, but I'd suggest sticking to a single >> project: either improve FreeBSD NTFS or do the port. FreeBSD and Darwin >> ntfs implementations are completely unrelated thus porting features from >> Darwin won't be easy. >> >> > This way, we avoid the major >> > changes in Apple's VFS (is there any documentation to be found?) and >> > port >> > code that won't break current functionality. >> >> I bet "major changes in Apple's VFS" are easier to deal with than >> "merging" two unrelated file systems. XNU VFS is based on FreeBSD 5 VFS >> and they still share a lot in common, e.g. 
vnode operations themselves >> are nearly the same, e.g. extended attributes, locking, buffer cache are >> different. >> >> Take a look at HFS+ port. It's unmaintained and outdated but page >> contains link to CVS repository snapshot. >> http://people.freebsd.org/~yar/hfs/ >> >> > I tried to keep this e-mail brief so If you wish to know more, please >> refer >> > to the proposal. >> > >> > Thank you >> > >> > -- >> > >> > Efstratios "GPF" Karatzas >> > > Since the FreeBSD wiki has two conflicting NTFS ideas, I thought of > submitting two proposals: one to work on the current driver and one to port > Apple's driver and let the FreeBSD team choose, should they wish to pick > one of them. Attilio suggested that perhaps a merge of those two ideas was > better than two separate proposals and this was my attempt to merge those > ideas; I hear what you say though. What I specifically had in mind is: - make NTFS MPSAFE - bugbusting - dig through Apple's implementation and its history, check for bugfixes there, and import them into the FreeBSD implementation. Unfortunately our ideas page is not as up to date as you would want it to be, so you can run into duplicate, poorly merged ideas listed there, etc. Attilio
Re: Kernel threads inherit CPU affinity from random sibling
2012/2/24, Eugene Grosbein : > 24.02.2012 18:45, Attilio Rao wrote: > >>> I have the pathological test-case for it: >>> http://www.freebsd.org/cgi/query-pr.cgi?pr=165444 >>> >> >> A fix has been committed as r230984, it should apply to STABLE_9/8 >> too, can you try it? >> >> Attilio >> >> > > I will try but I already run my patch for netisr, so it would be difficult > to see what helps :-) Yes, I mean, can you back your patch out? Attilio
Re: Kernel threads inherit CPU affinity from random sibling
2012/2/24, Eugene Grosbein : > 28.01.2012 20:22, Attilio Rao wrote: > >> 2012/1/28 Ryan Stone : >>> On Fri, Jan 27, 2012 at 10:41 PM, Attilio Rao >>> wrote: >>>> I think what you found out is very sensitive. >>>> However, the patch is not correct as you cannot call >>>> cpuset_setthread() with thread_lock held. >>> >>> Whoops! I actually discovered that for myself and had already fixed >>> it, but apparently I included an old version of the patch in the >>> email. >>> >>>> Hence this is my fix: >>>> http://www.freebsd.org/~attilio/cpuset_root.patch >>> >>> Oh, I do like this better. I tried something similar myself but >>> abandoned it because I misread how sched_affinity() was implemented by >>> 4BSD(I had gotten the impression that once TSF_AFFINITY is set it >>> could never be cleared). >> >> Do you have a pathological test-case for it? Are you going to test the >> patch? > > I have the pathological test-case for it: > http://www.freebsd.org/cgi/query-pr.cgi?pr=165444 > A fix has been committed as r230984, it should apply to STABLE_9/8 too, can you try it? Attilio
Re: sem(4) lockup in python?
2012/2/5 Ivan Voras : > On 5 February 2012 11:44, Garrett Cooper wrote: > >> >> 'make MAKE_JOBS_NUMBER=1' is the workground used right now.. > > David Xu suggested that it is a bug in Python - it doesn't set > process-shared attribute when it calls sem_init(), but i've tried > patching it (replacing the port patchfile file the one I've attached) > and I still get the hang. Guys, it would be valuable if you did the following:
1) Recompile your kernel with INVARIANTS and WITNESS, and without WITNESS_SKIPSPIN.
2a) If you have a serial console, run the DDB commands below through it (go to point 3).
2b) If you don't have a serial console, run the DDB commands below via textdump (go to point 3).
3) Collect the following information:
- show allpcpu
- show alllocks
- ps
- alltrace
3a) If you have the serial console (thus not textdump), also collect a coredump with: call doadump
4) Reset your machine.
You will end up with the textdump or coredump plus all the serial logs necessary to debug this. If you cannot reproduce your issue with WITNESS enabled, remove it from your kernel config and avoid calling 'show alllocks' when in DDB, but try to leave INVARIANTS on. Hope this helps, Attilio
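A hedged aside: on releases with ddb(8) scripting, the collection steps above can be automated so they run even when the box locks up unattended — something along these lines (the exact textdump/capture knobs may vary by release, so check ddb(8) and textdump(4) before relying on it):

```
# /etc/ddb.conf, loaded at boot when ddb_enable="YES" is set in rc.conf
script kdb.enter.panic=textdump set; capture on; show allpcpu; show alllocks; ps; alltrace; capture off; call doadump
```

This attaches the command list to the kdb.enter.panic event, so the same information requested above is captured into the textdump automatically on panic.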
Re: Kernel threads inherit CPU affinity from random sibling
2012/1/28 Attilio Rao : > 2012/1/28 Ryan Stone : >> On Fri, Jan 27, 2012 at 10:41 PM, Attilio Rao wrote: >>> I think what you found out is very sensitive. >>> However, the patch is not correct as you cannot call >>> cpuset_setthread() with thread_lock held. >> >> Whoops! I actually discovered that for myself and had already fixed >> it, but apparently I included an old version of the patch in the >> email. >> >>> Hence this is my fix: >>> http://www.freebsd.org/~attilio/cpuset_root.patch >> >> Oh, I do like this better. I tried something similar myself but >> abandoned it because I misread how sched_affinity() was implemented by >> 4BSD(I had gotten the impression that once TSF_AFFINITY is set it >> could never be cleared). > > Do you have a pathological test-case for it? Are you going to test the patch? BTW, I've just now updated the patch in order to remove an added white line and s/priority/affinity in comments. Thanks, Attilio
Re: Kernel threads inherit CPU affinity from random sibling
2012/1/28 Ryan Stone : > On Fri, Jan 27, 2012 at 10:41 PM, Attilio Rao wrote: >> I think what you found out is very sensitive. >> However, the patch is not correct as you cannot call >> cpuset_setthread() with thread_lock held. > > Whoops! I actually discovered that for myself and had already fixed > it, but apparently I included an old version of the patch in the > email. > >> Hence this is my fix: >> http://www.freebsd.org/~attilio/cpuset_root.patch > > Oh, I do like this better. I tried something similar myself but > abandoned it because I misread how sched_affinity() was implemented by > 4BSD(I had gotten the impression that once TSF_AFFINITY is set it > could never be cleared). Do you have a pathological test-case for it? Are you going to test the patch? Thanks, Attilio
Re: Kernel threads inherit CPU affinity from random sibling
2012/1/28 Ryan Stone : > Right now, whenever a thread is spawned, it inherits CPU affinity from > its "parent" thread. I put parent in scare quotes because as far as I > can tell, for a kernel thread the parent is essentially chosen > arbitrarily (it looks like it is the most recent thread spawned in > that kernel process). Inheriting affinity is arguably correct for > userland threads (at the very least, an explicit design decision to > implement inheritance was clearly implemented). However for kernel > threads, this behaviour leads to bizarre results that clearly weren't > intended. For example, I added a printf to sched_fork_thread() that > prints a message every time a thread inherits affinity from its > "parent". Here's the results from booting a dual-core VM with > net.isr.bindthreads=1:
> Thread 16 inheriting affinity from parent swi1: netisr 0(15)
> Thread 17 inheriting affinity from parent swi3: vm(16)
> Thread 18 inheriting affinity from parent swi4: clock(17)
> Thread 100014 inheriting affinity from parent swi4: clock(18)
> Thread 100017 inheriting affinity from parent swi5: +(100014)
> Thread 100018 inheriting affinity from parent swi6: Giant taskq(100017)
> Thread 100022 inheriting affinity from parent swi6: task queue(100018)
> Thread 100025 inheriting affinity from parent swi2: cambio(100022)
> Thread 100026 inheriting affinity from parent irq14: ata0(100025)
> Thread 100027 inheriting affinity from parent irq15: ata1(100026)
> Thread 100028 inheriting affinity from parent irq19: le0(100027)
> Thread 100029 inheriting affinity from parent irq21: pcm0(100028)
> Thread 100034 inheriting affinity from parent irq22: ohci0(100029)
> Thread 100035 inheriting affinity from parent irq1: atkbd0(100034)
> The result is that every thread in the intr kernel process ends up > inheriting the affinity (my favourite part is that both softclock > threads are now forced to fight over the same CPU).
I am working on > the following patch that adds a flag to > sched_fork()/sched_fork_thread() that can be passed to force the child > thread to *not* inherit affinity. However, this doesn't address > affinity inheritance from calls to kproc_create(). I suppose that I > could push the flag up into fork1(), but I'd prefer not to. I am > wondering if it even makes sense for affinity to be inherited from a > parent process even for userland processes. > > What are people's thoughts? Is this the right approach? I think what you found out is very sensitive. However, the patch is not correct, as you cannot call cpuset_setthread() with thread_lock held. Also, I don't think we need the flag bloat, to be honest. IMHO, we should choose a default policy for sched_fork{_thread}() and require the opposite case to fix the inherited thread affinity manually. For example, for the shortest patch, we can have the sched_fork{_thread}() functions directly inherit the parent affinity, which is fine in the userland case but needs to be fixed in the kthread/kproc case. Hence this is my fix: http://www.freebsd.org/~attilio/cpuset_root.patch (just test-compiled, so if you agree and want to test further, be my guest). Maybe we can also add a comment to the sched_fork() functions explaining the default policy, but the code seems self-explanatory enough to me. What do you think about it? Thanks, Attilio
Re: 2 years student project
2012/1/14 Kip Macy : > Many of the ideas on that page are stale. I believe that the most > promising approach would be to figure out what area of the system > you're most interested in, e.g. networking, file systems, virtual > memory, scheduling etc., make an honest appraisal of your development > abilities (the rate at which you can come up to speed working with a > large body of code and your ability to read technical documentation), > and then contact the mentor for one of the "Ideas" in your area of > interest to discuss how to proceed further. In order to have any > likelihood of success you'll need to find someone who is willing to > take the time to discuss things with you on a regular basis.

In addition to what Kip suggests, there are also a lot of other ideas that are very subsystem-specific and are not listed on the "Ideas" page. So if you figure out the kernel areas where you want to work (and, as Kip says, the rate of your learning), we can provide you with a list of people to contact for each area, who can also offer several degrees of mentorship.

Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: Per-mount syncer threads and fanout for pagedaemon cleaning
2011/12/27 : > On Tue, Dec 27, 2011 at 8:05 AM, Attilio Rao wrote: >> 2011/12/27 Giovanni Trematerra : >>> On Mon, Dec 26, 2011 at 9:24 PM, Venkatesh Srinivas >>> wrote: >>>> Hi! >>>> >>>> I've been playing with two things in DragonFly that might be of interest >>>> here. >>>> >>>> Thing #1 := >>>> >>>> First, per-mountpoint syncer threads. Currently there is a single thread, >>>> 'syncer', which periodically calls fsync() on dirty vnodes from every >>>> mount, >>>> along with calling vfs_sync() on each filesystem itself (via syncer >>>> vnodes). >>>> >>>> My patch modifies this to create syncer threads for mounts that request it. >>>> For these mounts, vnodes are synced from their mount-specific thread rather >>>> than the global syncer. >>>> >>>> The idea is that periodic fsync/sync operations from one filesystem should >>>> not >>>> stall or delay synchronization for other ones. >>>> The patch was fairly simple: >>>> http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/50e4012a4b55e1efc595db0db397b4365f08b640 >>>> >>> >>> There's something WIP by attilio@ on that area. >>> you might want to take a look at >>> http://people.freebsd.org/~attilio/syncer_alpha_15.diff >>> >>> I don't know what hammerfs needs but UFS/FFS and buffer cache make a good >>> job performance-wise and so the authors are skeptical about the boost that >>> such >>> a change can give. We believe that brain cycles need to be spent on >>> other pieces of the system such as ARC and ZFS. >> >> More specifically, it is likely that focusing on UFS and buffer cache >> for performance is not really useful, we should drive our efforts over >> ARC and ZFS. >> Also, the real bottlenecks in our I/O paths are in GEOM >> single-threaded design, lack of unmapped I/O functionality, possibly >> lack of proritized I/O, etc. > > Indeed, Isilon (and probably other vendors as well) entirely skip > VFS_SYNC when the WAIT argument is MNT_LAZY. 
Since we're a > distributed journalled filesystem, syncing via a system thread is not > a relevant operation; i.e. all writes that have exited a VOP_WRITE or > similar operation are already in reasonably stable storage in a > journal on the relevant nodes. > > However, we do then have our own threads running on each node to flush > the journal regularly (in addition to when it fills up), and I don't > know enough about this to know if it could be fit into the syncer > thread idea or if it's too tied in somehow to our architecture.

I'm not really sure how journaling is implemented on OneFS, but when I made this patch SU+J wasn't there yet. Also, this patch just adds the infrastructure for a multithreaded and configurable syncer, which means it still requires the UFS bits for skipping the "double-syncing" (i.e. the MNT_LAZY skip you mentioned).

Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: Per-mount syncer threads and fanout for pagedaemon cleaning
2011/12/27 Giovanni Trematerra : > On Mon, Dec 26, 2011 at 9:24 PM, Venkatesh Srinivas > wrote: >> Hi! >> >> I've been playing with two things in DragonFly that might be of interest >> here. >> >> Thing #1 := >> >> First, per-mountpoint syncer threads. Currently there is a single thread, >> 'syncer', which periodically calls fsync() on dirty vnodes from every mount, >> along with calling vfs_sync() on each filesystem itself (via syncer vnodes). >> >> My patch modifies this to create syncer threads for mounts that request it. >> For these mounts, vnodes are synced from their mount-specific thread rather >> than the global syncer. >> >> The idea is that periodic fsync/sync operations from one filesystem should >> not stall or delay synchronization for other ones. >> The patch was fairly simple: >> http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/50e4012a4b55e1efc595db0db397b4365f08b640 >> > > There's something WIP by attilio@ in that area. > You might want to take a look at > http://people.freebsd.org/~attilio/syncer_alpha_15.diff > > I don't know what hammerfs needs, but UFS/FFS and the buffer cache do a good > job performance-wise, and so the authors are skeptical about the boost that > such a change can give. We believe that brain cycles need to be spent on > other pieces of the system such as ARC and ZFS.

More specifically, it is likely that focusing on UFS and the buffer cache for performance is not really useful; we should direct our efforts toward ARC and ZFS. Also, the real bottlenecks in our I/O paths are GEOM's single-threaded design, the lack of unmapped I/O functionality, possibly the lack of prioritized I/O, etc.

Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: question about the exchanges of td->td_lock and mtx in sched_switch() of sched_ule
2011/10/13 Haozhong Zhang : > Hi, > > I'm recently reading the code of sched_ule in freebsd 8.2.0 and have two > questions. > > 1. sched_switch() (in sched_ule.c) invokes cpu_switch() (at line 1852) and > thread_unblock_switch() (at line 1867). These two functions exchange > td->td_lock and mtx. What are the purposes of these exchanges? > > 2. Can the exchange in cpu_switch() (in amd64/amd64/cpu_switch.S, at line > 134) be done before calling cpu_switch()? I mean, does this reorder of > exchange and other operations in cpu_switch() cause side-effects to other > code? This thread should have the details you need: http://lists.freebsd.org/pipermail/svn-src-all/2010-January/019345.html Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: Re: Kernel timers infrastructure
Besides, I'd also suggest rearming the callout as the very last step of your callback, in order to avoid buffer-interleaving issues.

Attilio

2011/9/12 Kostik Belousov : > On Mon, Sep 12, 2011 at 11:48:42AM +0200, "Marc L?rner" wrote: >> Hello, >> what about changing the order of callout_reset and uprintf? >> And your timeout isn't 1 minute, it's one second! >> >> Regards, >> Marc >> >> >I already did that to ensure timer_event_handler would be called correctly. >> > >> >The result follows: >> > >> >freebsd# kldload ./timer.ko >> >timer_event_handler() with MOD_LOAD >> > >> >freebsd# kldunload ./timer.ko >> >timer_event_handler() with MOD_UNLOAD >> > >> >and I kept the module loaded for about 1 minute, so the timer printing >> >"Hello, World!" should have run. >> > >> >Filippo >> > >> >On 12/set/2011, at 11:24, Adrian Chadd wrote: >> > >> >> How about adding some printfs() to the functions to ensure they're being >> >> called? >> >> > > The callouts are executed in a context that does not have a controlling terminal. uprintf(9) tries to use the ctty for output. > > Use printf(9) to get something on the console. > -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: Large machine test ideas
2011/9/1 Ivan Voras : > On 1 September 2011 16:11, Attilio Rao wrote: > >>> I mean, if we have 2 cpus in a machine, but MAXCPU is set to 256, there >>> is a bunch of "lost" memory and higher levels of lock contention? >>> >>> I thought that attilio was taking a stab at enhancing this, but at the >>> current time anything more than a value of 64 for MAXCPU is kind of a >>> "caveat emptor" area of FreeBSD. >> >> With newest current you can redefine MAXCPU in your kernel config, so >> you don't need to bump the default value. >> I think 64 as default value is good enough. >> >> Removing MAXCPU dependency from the KBI is an important project >> someone should adopt and bring to conclusion. > > That's certainly one half of it and thanks for the work, but the real > question in this thread is what Sean asked: what are the negative > side-effects of simply bumping MAXCPU to 256 by default? AFAIK, there > are not that many structures which are statically sized by MAXCPU and > most use the runtime-detected smp_cpus?

Well, there are quite a few statically allocated ones, but as I said, making the kernel MAXCPU-agnostic (or sort of agnostic) is a goal and a good project.

Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: Large machine test ideas
2011/8/31 Sean Bruno : > On Tue, 2011-08-30 at 17:11 -0700, Ivan Voras wrote: >> On 29.8.2011. 20:15, John Baldwin wrote: >> >> > However, the SRAT code just ignores the table when it encounters an issue >> > like >> > this, it doesn't hang. Something else later in the boot must have hung. >> >> Anyway... that machine can in its maximal configuration be populated >> with eight 10-core CPUs, i.e. 80 physical / 160 logical, so here's a >> vote from me to bump the shiny new cpuset infrastructure maximum CPU >> count to 256 before 9.0. >> >> http://www.supermicro.com/products/system/5U/5086/SYS-5086B-TRF.cfm > > Doesn't that (MAXCPU) seriously impact VM usage, lock contention > etc ... ? > > I mean, if we have 2 cpus in a machine, but MAXCPU is set to 256, there > is a bunch of "lost" memory and higher levels of lock contention? > > I thought that attilio was taking a stab at enhancing this, but at the > current time anything more than a value of 64 for MAXCPU is kind of a > "caveat emptor" area of FreeBSD. With newest current you can redefine MAXCPU in your kernel config, so you don't need to bump the default value. I think 64 as default value is good enough. Removing MAXCPU dependency from the KBI is an important project someone should adopt and bring to conclusion. Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: debugging frequent kernel panics on 8.2-RELEASE
2011/8/18 Andriy Gapon : > on 17/08/2011 23:21 Andriy Gapon said the following: >> >> It seems like everything starts with some kind of a race between >> terminating >> processes in a jail and termination of the jail itself. This is where the >> details are very thin so far. What we see is that a process (http) is in >> exit(2) syscall, in exit1() function actually, and past the place where >> P_WEXIT >> flag is set and even past the place where p_limit is freed and reset to >> NULL. >> At that place the thread calls prison_proc_free(), which calls >> prison_deref(). >> Then, we see that in prison_deref() the thread gets a page fault because >> of what >> seems like a NULL pointer dereference. That's just the start of the >> problem and >> its root cause. >> >> Then, trap_pfault() gets invoked and, because addresses close to NULL look >> like >> userspace addresses, vm_fault/vm_fault_hold gets called, which in its turn >> goes >> on to call vm_map_growstack. First thing that vm_map_growstack does is a >> call >> to lim_cur(), but because p_limit is already NULL, that call results in a >> NULL >> pointer dereference and a page fault. Goto the beginning of this >> paragraph. >> >> So we get this recursion of sorts, which only ends when a stack is >> exhausted and >> a CPU generates a double-fault. > > BTW, does anyone have an idea why the thread in question would "disappear" > from > the kgdb's point of view? > > (kgdb) p cpuid_to_pcpu[2]->pc_curthread->td_tid > $3 = 102057 > (kgdb) tid 102057 > invalid tid > > info threads also doesn't list the thread. > > Is it because the panic happened while the thread was somewhere in exit1()? > is there an easy way to examine its stack in this case?

Yes, that is likely it. The 'tid' command does a lookup in the tid_to_thread() table (or a similarly named one), which returns NULL here; that means the thread has already passed beyond the point where it was removed from the lookup table.

Attilio -- Peace can only be achieved by understanding - A.
Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: SMP question w.r.t. reading kernel variables
2011/4/17 Rick Macklem : > Hi, > > I should know the answer to this, but... When reading a global kernel > variable, where its modifications are protected by a mutex, is it > necessary to get the mutex lock to just read its value? > > For example: > A if ((mp->mnt_kern_flag & MNTK_UNMOUNTF) != 0) > return (EPERM); > versus > B MNT_ILOCK(mp); > if ((mp->mnt_kern_flag & MNTK_UNMOUNTF) != 0) { > MNT_IUNLOCK(mp); > return (EPERM); > } > MNT_IUNLOCK(mp); > > My hunch is that B is necessary if you need an up-to-date value > for the variable (mp->mnt_kern_flag in this case). > > Is that correct? > > Thanks in advance for help with this, rick

In general, the FreeBSD kernel assumes that reads and writes of word-aligned, int-sized types are always atomic. Considering this, if a kernel variable is of int type or word-aligned size, you don't strictly need a lock there.

However, locking also brings side effects, such as memory and compiler barriers... while it is true that aligned reads/writes are atomic, it is not obvious whether a barrier is necessary, or already implied, on every architecture we support. That said, for consistency and better semantics, I prefer to also lock "simple" reads/writes when the objects are explicitly equipped for that.

Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: atomic_set_xxx(&x, 0)
2010/12/7 Andriy Gapon : > > $ glimpse atomic_set_ | fgrep -w 0 > /usr/src/sys/dev/arcmsr/arcmsr.c: > atomic_set_int(&acb->srboutstandingcount, 0); > /usr/src/sys/dev/arcmsr/arcmsr.c: > atomic_set_int(&acb->srboutstandingcount, 0); > /usr/src/sys/dev/jme/if_jme.c: atomic_set_int(&sc->jme_morework, 0); > /usr/src/sys/dev/jme/if_jme.c: atomic_set_int(&sc->jme_morework, 0); > /usr/src/sys/dev/ale/if_ale.c: atomic_set_int(&sc->ale_morework, 0); > /usr/src/sys/mips/rmi/dev/xlr/rge.c: > atomic_set_int(&(priv->frin_to_be_sent[i]), 0); > /usr/src/sys/dev/drm/drm_irq.c: > atomic_set_rel_32(&dev->vblank[i].count, 0); > /usr/src/sys/dev/cxgb/ulp/tom/cxgb_tom.c: > atomic_set_int(&t->tids_in_use, 0); > > I wonder if these are all bugs and atomic_store_xxx() was actually intended?

Besides, we assume that a store on an int boundary is implicitly atomic on all architectures FreeBSD supports; so, unless a memory barrier is needed, there is no need to use atomic_* in the "store" cases.

Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: Best way to determine if an IRQ is present
2010/11/25 Andriy Gapon : > on 25/11/2010 17:28 John Baldwin said the following: >> Andriy Gapon wrote: >>> on 22/11/2010 16:24 John Baldwin said the following: Well, the real solution is actually larger than described in the PR. What you really want to do is take the logical CPUs offline when they are "halted". Taking a CPU offline should trigger an EVENTHANDLER that various bits of code could invoke. In the case of platforms that support binding interrupts to CPUs (x86 and sparc64 at least), they would install an event handler that searches the MD interrupt tables (e.g. the interrupt_sources[] array on x86) and move bound interrupts to other CPUs. However, I think all the interrupt bits will be MD, not MI. >>> >>> That's a good idea and a comprehensive approach. >>> One minor technical detail - should an offlined CPU be removed from all_cpus >>> mask/set? >> >> That's tricky. In other e-mails I've had on this topic, the idea has been >> to have >> a new online_cpus mask and maybe a CPU_ONLINE() test macro similar to >> CPU_ABSENT(). In that case, an offline CPU should still be in all_cpus, but >> many >> places that use all_cpus would need to use online_cpus instead. >> > > This sounds like a plan. > CPU_FOREACH_ONLINE() could also come in handy. > Thanks!

One of the biggest issues here is that these bitmasks are no longer fixed for the kernel's lifetime, so you need proper locking to access them. I recall that one of the points of this plan is to benchmark, and eventually optimize, the write-side performance of rmlocks (as that is easily feasible). In order to have a fully effective plan, there are several nits that need to be addressed here; I may have collected them somewhere in a file, and when I find it I will send it to you.

Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: KDB_TRACE and no backend
2010/9/18 Andriy Gapon : > on 18/09/2010 23:35 Attilio Rao said the following: >> It is still missing checking on opt_stack.h > > Yes, thanks, fixed it in my tree. > >> Besides, I'd reconsider having KDB_TRACE explanation in ddb(4) manpage >> (right now it is rightly there because it is DDB specific only, as >> long as it offers the backend, but with your change it is a global >> functionality. Not sure if it worths changing it but however you may >> have more opinions). > > It seems that we don't have kdb(4) ? >

We don't, and we really should. I'd really like a manual section describing the whole kdb infrastructure and the kdbe hooks. That could actually be flagged as a janitor task, if someone wants to take over and document the whole layer.

Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: KDB_TRACE and no backend
2010/9/18 Andriy Gapon : > on 18/09/2010 22:00 Andriy Gapon said the following: >> Oh, wow, and I totally overlooked stack_print(). >> Should have read stack(9) from the start. > > New patch. Hope this is better. > I don't like that the printf is duplicated, but couldn't figure out a way to > combine pre-processor and C conditions. > > --- a/sys/kern/subr_kdb.c > +++ b/sys/kern/subr_kdb.c > @@ -37,6 +37,7 @@ __FBSDID("$FreeBSD$"); > #include > #include > #include > +#include > #include > > #include > @@ -300,6 +301,15 @@ kdb_backtrace(void) > printf("KDB: stack backtrace:\n"); > kdb_dbbe->dbbe_trace(); > } > +#ifdef STACK > + else { > + struct stack st; > + > + printf("KDB: stack backtrace:\n"); > + stack_save(&st); > + stack_print(&st); > + } > +#endif > } > > /* >

It is still missing a check on opt_stack.h. Besides, I'd reconsider keeping the KDB_TRACE explanation in the ddb(4) manpage: right now it rightly lives there because the functionality is DDB-specific (DDB provides the backend), but with your change it becomes a global facility. I'm not sure it's worth changing, but you may get more opinions.

Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: KDB_TRACE and no backend
2010/9/18 Andriy Gapon : > > Here's a small patch that adds support for printing a stack trace in the form of > frame addresses when KDB_TRACE is enabled, but there is no debugger backend > configured. > The patch is styled after the "cheap" variant of stack_ktr. > > What do you think (useful/useless, correct, etc.)? > > --- a/sys/kern/subr_kdb.c > +++ b/sys/kern/subr_kdb.c > @@ -37,6 +37,7 @@ > #include > #include > #include > +#include > #include > > #include > @@ -295,10 +296,16 @@ > void > kdb_backtrace(void) > { > + struct stack st; > + int i; > > - if (kdb_dbbe != NULL && kdb_dbbe->dbbe_trace != NULL) { > - printf("KDB: stack backtrace:\n"); > + printf("KDB: stack backtrace:\n"); > + if (kdb_dbbe != NULL && kdb_dbbe->dbbe_trace != NULL) > kdb_dbbe->dbbe_trace(); > + else { > + stack_save(&st); > + for (i = 0; i < st.depth; i++) > + printf("#%d %p\n", i, (void*)(uintptr_t)st.pcs[i]); > } > }

You eventually have to wrap this logic within the 'STACK' option (checking opt_stack.h) because stack_save() will be ineffective otherwise. STACK should be mandatory for DDB, I guess, but it is not for KDB.

Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: kld modules remain loaded if MOD_LOAD handler returns an error
2010/8/23 Andriy Gapon : > on 23/08/2010 15:10 John Baldwin said the following: >> On Friday, August 20, 2010 1:13:53 pm Ryan Stone wrote: >>> Consider the following modules: >>> >>> /* first.c */ >>> static int *test; >>> >>> int >>> test_function(void) >>> { >>> return *test; >>> } >>> >>> static int >>> first_modevent(struct module *m, int what, void *arg) >>> { >>> int err = 0; >>> >>> switch (what) { >>> case MOD_LOAD: /* kldload */ >>> test = malloc(sizeof(int), M_TEMP, M_NOWAIT | M_ZERO); >>> if (!test) >>> err = ENOMEM; >>> break; >>> case MOD_UNLOAD: /* kldunload */ >>> break; >>> default: >>> err = EINVAL; >>> break; >>> } >>> return(err); >>> } >>> >>> static moduledata_t first_mod = { >>> "first", >>> first_modevent, >>> NULL >>> }; >>> >>> DECLARE_MODULE(first, first_mod, SI_SUB_KLD, SI_ORDER_ANY); >>> MODULE_VERSION(first, 1); >>> >>> >>> /* second.c */ >>> static int >>> second_modevent(struct module *m, int what, void *arg) >>> { >>> int err = 0; >>> >>> switch (what) { >>> case MOD_LOAD: /* kldload */ >>> test_function(); >>> break; >>> case MOD_UNLOAD: /* kldunload */ >>> break; >>> default: >>> err = EINVAL; >>> break; >>> } >>> return(err); >>> } >>> >>> static moduledata_t second_mod = { >>> "second", >>> second_modevent, >>> NULL >>> }; >>> >>> DECLARE_MODULE(second, second_mod, SI_SUB_KLD, SI_ORDER_ANY); >>> MODULE_DEPEND(second, first, 1, 1, 1); >>> >>> >>> Consider the case where malloc fails in first_modevent. >>> first_modevent will return ENOMEM, but the module will remain loaded. >>> Now when the second module goes and loads, it calls into the first >>> module, which is not initialized properly, and promptly crashes when >>> test_function() dereferences a null pointer. >>> >>> It seems to me that a module should be unloaded if it returns an error >>> from its MOD_LOAD handler. However, that's easier said than done. 
>>> The MOD_LOAD handler is called from a SYSINIT, and there's no >>> immediately obvious way to pass information about the failure from the >>> SYSINIT to the kernel linker. Anybody have any thoughts on this? >> >> Yeah, it's not easy to fix. Probably we could patch the kernel linker to >> notice if any of the modules for a given linker file had errors during >> initialization and trigger an unload if that occurs. I don't think this >> would >> be too hard to do. > > John, > > please note that for this testcase we would need to prevent second module's > modevent from being executed at all. > Perhaps a module shouldn't be considered as loaded until modevent caller > marks it > as successfully initialized, but I haven't looked at the actual code. I replied in private, but I really agree with Andriy here. Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: sched_pin() versus PCPU_GET
2010/8/4 : > On Fri, Jul 30, 2010 at 2:31 PM, John Baldwin wrote: >> On Friday, July 30, 2010 10:08:22 am John Baldwin wrote: >>> On Thursday, July 29, 2010 7:39:02 pm m...@freebsd.org wrote: >>> > We've seen a few instances at work where witness_warn() in ast() >>> > indicates the sched lock is still held, but the place it claims it was >>> > held by is in fact sometimes not possible to keep the lock, like: >>> > >>> > thread_lock(td); >>> > td->td_flags &= ~TDF_SELECT; >>> > thread_unlock(td); >>> > >>> > What I was wondering is, even though the assembly I see in objdump -S >>> > for witness_warn has the increment of td_pinned before the PCPU_GET: >>> > >>> > 802db210: 65 48 8b 1c 25 00 00 mov %gs:0x0,%rbx >>> > 802db217: 00 00 >>> > 802db219: ff 83 04 01 00 00 incl 0x104(%rbx) >>> > * Pin the thread in order to avoid problems with thread migration. >>> > * Once that all verifies are passed about spinlocks ownership, >>> > * the thread is in a safe path and it can be unpinned. >>> > */ >>> > sched_pin(); >>> > lock_list = PCPU_GET(spinlocks); >>> > 802db21f: 65 48 8b 04 25 48 00 mov %gs:0x48,%rax >>> > 802db226: 00 00 >>> > if (lock_list != NULL && lock_list->ll_count != 0) { >>> > 802db228: 48 85 c0 test %rax,%rax >>> > * Pin the thread in order to avoid problems with thread migration. >>> > * Once that all verifies are passed about spinlocks ownership, >>> > * the thread is in a safe path and it can be unpinned. >>> > */ >>> > sched_pin(); >>> > lock_list = PCPU_GET(spinlocks); >>> > 802db22b: 48 89 85 f0 fe ff ff mov %rax,-0x110(%rbp) >>> > 802db232: 48 89 85 f8 fe ff ff mov %rax,-0x108(%rbp) >>> > if (lock_list != NULL && lock_list->ll_count != 0) { >>> > 802db239: 0f 84 ff 00 00 00 je 802db33e >>> > >>> > 802db23f: 44 8b 60 50 mov 0x50(%rax),%r12d >>> > >>> > is it possible for the hardware to do any re-ordering here? 
>>> > >>> > The reason I'm suspicious is not just that the code doesn't have a >>> > lock leak at the indicated point, but in one instance I can see in the >>> > dump that the lock_list local from witness_warn is from the pcpu >>> > structure for CPU 0 (and I was warned about sched lock 0), but the >>> > thread id in panic_cpu is 2. So clearly the thread was being migrated >>> > right around panic time. >>> > >>> > This is the amd64 kernel on stable/7. I'm not sure exactly what kind >>> > of hardware; it's a 4-way Intel chip from about 3 or 4 years ago IIRC. >>> > >>> > So... do we need some kind of barrier in the code for sched_pin() for >>> > it to really do what it claims? Could the hardware have re-ordered >>> > the "mov %gs:0x48,%rax" PCPU_GET to before the sched_pin() >>> > increment? >>> >>> Hmmm, I think it might be able to because they refer to different locations. >>> >>> Note this rule in section 8.2.2 of Volume 3A: >>> >>> • Reads may be reordered with older writes to different locations but not >>> with older writes to the same location. >>> >>> It is certainly true that sparc64 could reorder with RMO. I believe ia64 >>> could reorder as well. Since sched_pin/unpin are frequently used to provide >>> this sort of synchronization, we could use memory barriers in pin/unpin >>> like so: >>> >>> sched_pin() >>> { >>> td->td_pinned = atomic_load_acq_int(&td->td_pinned) + 1; >>> } >>> >>> sched_unpin() >>> { >>> atomic_store_rel_int(&td->td_pinned, td->td_pinned - 1); >>> } >>> >>> We could also just use atomic_add_acq_int() and atomic_sub_rel_int(), but >>> they >>> are slightly more heavyweight, though it would be more clear what is >>> happening >>> I think. >> >> However, to actually get a race you'd have to have an interrupt fire and >> migrate you so that the speculative read was from the other CPU. However, I >> don't think the speculative read would be preserved in that case. 
The CPU >> has to return to a specific PC when it returns from the interrupt and it has >> no way of storing the state for what speculative reordering it might be >> doing, so presumably it is thrown away? I suppose it is possible that it >> actually retires both instructions (but reordered) and then returns to the PC >> value after the read of listlocks after the interrupt. However, in that case >> the scheduler would not migrate as it would see td_pinned != 0. To get the >> race you have to have the interrupt take effect prior to modifying td_pinned, >> so I think the processor would have to discard the reordered read of >> listlocks so it could safely resume execution at the 'incl' instruction. >> >> The other nit there on x86 at least is that the incl instruction is doing >> both a read and a write and another rule in the section 8.2.2 is this: >> >> • Reads are not reordered with other reads. >> >> That wo
Re: sched_pin() versus PCPU_GET
2010/8/5 John Baldwin : > On Thursday, August 05, 2010 11:59:37 am m...@freebsd.org wrote: >> On Wed, Aug 4, 2010 at 11:55 AM, John Baldwin wrote: >> > On Wednesday, August 04, 2010 12:20:31 pm m...@freebsd.org wrote: >> >> On Wed, Aug 4, 2010 at 2:26 PM, John Baldwin wrote: >> >> > On Tuesday, August 03, 2010 9:46:16 pm m...@freebsd.org wrote: >> >> >> On Fri, Jul 30, 2010 at 2:31 PM, John Baldwin wrote: >> >> >> > On Friday, July 30, 2010 10:08:22 am John Baldwin wrote: >> >> >> >> On Thursday, July 29, 2010 7:39:02 pm m...@freebsd.org wrote: >> >> >> >> > We've seen a few instances at work where witness_warn() in ast() >> >> >> >> > indicates the sched lock is still held, but the place it claims > it was >> >> >> >> > held by is in fact sometimes not possible to keep the lock, like: >> >> >> >> > >> >> >> >> > thread_lock(td); >> >> >> >> > td->td_flags &= ~TDF_SELECT; >> >> >> >> > thread_unlock(td); >> >> >> >> > >> >> >> >> > What I was wondering is, even though the assembly I see in > objdump -S >> >> >> >> > for witness_warn has the increment of td_pinned before the > PCPU_GET: >> >> >> >> > >> >> >> >> > 802db210: 65 48 8b 1c 25 00 00 mov %gs:0x0,%rbx >> >> >> >> > 802db217: 00 00 >> >> >> >> > 802db219: ff 83 04 01 00 00 incl 0x104(%rbx) >> >> >> >> > * Pin the thread in order to avoid problems with thread > migration. >> >> >> >> > * Once that all verifies are passed about spinlocks > ownership, >> >> >> >> > * the thread is in a safe path and it can be unpinned. >> >> >> >> > */ >> >> >> >> > sched_pin(); >> >> >> >> > lock_list = PCPU_GET(spinlocks); >> >> >> >> > 802db21f: 65 48 8b 04 25 48 00 mov %gs:0x48,%rax >> >> >> >> > 802db226: 00 00 >> >> >> >> > if (lock_list != NULL && lock_list->ll_count != 0) { >> >> >> >> > 802db228: 48 85 c0 test %rax,%rax >> >> >> >> > * Pin the thread in order to avoid problems with thread > migration. 
>> >> >> >> > * Once that all verifies are passed about spinlocks > ownership, >> >> >> >> > * the thread is in a safe path and it can be unpinned. >> >> >> >> > */ >> >> >> >> > sched_pin(); >> >> >> >> > lock_list = PCPU_GET(spinlocks); >> >> >> >> > 802db22b: 48 89 85 f0 fe ff ff mov > %rax,-0x110(%rbp) >> >> >> >> > 802db232: 48 89 85 f8 fe ff ff mov > %rax,-0x108(%rbp) >> >> >> >> > if (lock_list != NULL && lock_list->ll_count != 0) { >> >> >> >> > 802db239: 0f 84 ff 00 00 00 je > 802db33e >> >> >> >> > >> >> >> >> > 802db23f: 44 8b 60 50 mov 0x50(%rax), > %r12d >> >> >> >> > >> >> >> >> > is it possible for the hardware to do any re-ordering here? >> >> >> >> > >> >> >> >> > The reason I'm suspicious is not just that the code doesn't have > a >> >> >> >> > lock leak at the indicated point, but in one instance I can see > in the >> >> >> >> > dump that the lock_list local from witness_warn is from the pcpu >> >> >> >> > structure for CPU 0 (and I was warned about sched lock 0), but > the >> >> >> >> > thread id in panic_cpu is 2. So clearly the thread was being > migrated >> >> >> >> > right around panic time. >> >> >> >> > >> >> >> >> > This is the amd64 kernel on stable/7. I'm not sure exactly what > kind >> >> >> >> > of hardware; it's a 4-way Intel chip from about 3 or 4 years ago > IIRC. >> >> >> >> > >> >> >> >> > So... do we need some kind of barrier in the code for sched_pin() > for >> >> >> >> > it to really do what it claims? Could the hardware have re- > ordered >> >> >> >> > the "mov %gs:0x48,%rax" PCPU_GET to before the sched_pin() >> >> >> >> > increment? >> >> >> >> >> >> >> >> Hmmm, I think it might be able to because they refer to different > locations. >> >> >> >> >> >> >> >> Note this rule in section 8.2.2 of Volume 3A: >> >> >> >> >> >> >> >> • Reads may be reordered with older writes to different locations > but not >> >> >> >> with older writes to the same location. 
>> >> >> >> >> >> >> >> It is certainly true that sparc64 could reorder with RMO. I > believe ia64 >> >> >> >> could reorder as well. Since sched_pin/unpin are frequently used > to provide >> >> >> >> this sort of synchronization, we could use memory barriers in > pin/unpin >> >> >> >> like so: >> >> >> >> >> >> >> >> sched_pin() >> >> >> >> { >> >> >> >> td->td_pinned = atomic_load_acq_int(&td->td_pinned) + 1; >> >> >> >> } >> >> >> >> >> >> >> >> sched_unpin() >> >> >> >> { >> >> >> >> atomic_store_rel_int(&td->td_pinned, td->td_pinned - 1); >> >> >> >> } >> >> >> >> >> >> >> >> We could also just use atomic_add_acq_int() and > atomic_sub_rel_int(), but they >> >> >> >> are slightly more heavyweight, though it would be more clear what > is happening >> >> >> >> I think. >> >> >> > >> >> >> > However, to actually get a race you'd have to have an interrupt fire > and >> >> >> > migrate you so that t
Re: disk I/O, VFS hirunningspace
2010/7/15 Alan Cox : > On Thu, Jul 15, 2010 at 8:01 AM, Ivan Voras wrote: > >> On 07/14/10 18:27, Jerry Toung wrote: >> > On Wed, Jul 14, 2010 at 12:04 AM, Gary Jennejohn >> > wrote: >> > >> >> >> >> >> >> Rather than commenting out the code try setting the sysctl >> >> vfs.hirunningspace to various powers-of-two. Default seems to be >> >> 1MB. I just changed it on the command line as a test to 2MB. >> >> >> >> You can do this in /etc/sysctl.conf. >> >> >> >> >> > thank you all, that did it. The settings that Matt recommended are giving >> > the same numbers >> >> Any objections to raising the defaults to 8 MB / 1 MB in HEAD? >> >> >> > Keep in mind that we still run on some fairly small systems with limited I/O > capabilities, e.g., a typical arm platform. More generally, with the range > of systems that FreeBSD runs on today, any particular choice of constants is > going to perform poorly for someone. If nothing else, making these sysctls > a function of the buffer cache size is probably better than any particular > constants. What about making a MD sysctl? Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: sysbench / fileio - Linux vs. FreeBSD
2010/6/5 Kostik Belousov > > On Sat, Jun 05, 2010 at 07:41:23PM +0200, Attilio Rao wrote: > > 2010/6/5 Matthew Jacob > > > > > > All of these tests have been apples vs. oranges for years. > > > > > > The following seems to be true, though: > > > > > > a) FreeBSD sequential write performance in UFS has always been less than > > > optimal. > > > > > > b) Linux sequential write performance in just about any filesystem has > > > always been "impressive". But that "impressive" has come at some not so > > > obvious costs. First of all, Linux is probably the most aggressive > > > cluster/write-behind OS I've even seen. You can suck down all available > > > memory with writebehind using dd. This means that some stats are > > > "impressive", and others are "painful". A desktop that becomes completely > > > unresponsive while you're doing this dd is one personal outcome. > > > > > > Also, you have to be careful what you're asking for in comparing the two > > > platforms, or any platforms for that matter. What do you want to optimize > > > for? Apparent responsiveness as a desktop? A specific workload (nfs, > > > cifs) that completes N quatloos per fortnight? > > > > Besides anything, I'm much more concerned about the loss of > > performance within FreeBSD itself. I wouldn't expect a so high > > pessimization when the number of threads increases (without > > considering the big performance loss with the 8k blocksize, pretty > > much reproducible). I'm trying to drive, privately, the tester to > > pmc/lock profiling analysis in order to start collecting some useful > > datas. > Are the benchmarks create threads that write to the same file ? > If yes, then this behaviour is well understood. Actually I still don't know as I just sent an e-mail to the tester and he didn't followup still. However I'm not entirely sure this is a full bottleneck which may be reconduit to missing of byte-range locking. I want to dig more and better understand what's going on exactly. 
Attilio -- Peace can only be achieved by understanding - A. Einstein
Re: sysbench / fileio - Linux vs. FreeBSD
2010/6/5 Matthew Jacob > > All of these tests have been apples vs. oranges for years. > > The following seems to be true, though: > > a) FreeBSD sequential write performance in UFS has always been less than > optimal. > > b) Linux sequential write performance in just about any filesystem has always > been "impressive". But that "impressive" has come at some not so obvious > costs. First of all, Linux is probably the most aggressive > cluster/write-behind OS I've even seen. You can suck down all available > memory with writebehind using dd. This means that some stats are > "impressive", and others are "painful". A desktop that becomes completely > unresponsive while you're doing this dd is one personal outcome. > > Also, you have to be careful what you're asking for in comparing the two > platforms, or any platforms for that matter. What do you want to optimize > for? Apparent responsiveness as a desktop? A specific workload (nfs, cifs) > that completes N quatloos per fortnight? Besides anything, I'm much more concerned about the loss of performance within FreeBSD itself. I wouldn't expect a so high pessimization when the number of threads increases (without considering the big performance loss with the 8k blocksize, pretty much reproducible). I'm trying to drive, privately, the tester to pmc/lock profiling analysis in order to start collecting some useful datas. While I think that we might pay a lot of attention to ZFS, I think we might not leave alone FFS. Having a fast, well supported, native filesystem might be a great thing for us. Comparing with other operating systems, as you smartly point out, might not be got as 'undefeatable truths' but have cons and prons that needs to be fully understood before to make false claims. Thanks, Attilio -- Peace can only be achieved by understanding - A. 
Einstein
Re: idle threads
2010/4/24 Matthew Fleming : > I'm looking at kern_idle.c in stable/7 and I don't quite follow how idle > threads work. The kthread_create(9) call does not pass in a function > pointer, so what code does a processor run when there is no other > runnable thread? In STABLE_7: ... #ifdef SMP SLIST_FOREACH(pc, &cpuhead, pc_allcpu) { error = kthread_create(sched_idletd, NULL, &p, RFSTOPPED | RFHIGHPID, 0, "idle: cpu%d", pc->pc_cpuid); pc->pc_idlethread = FIRST_THREAD_IN_PROC(p); #else error = kthread_create(sched_idletd, NULL, &p, RFSTOPPED | RFHIGHPID, 0, "idle"); PCPU_SET(idlethread, FIRST_THREAD_IN_PROC(p)); #endif ... so it does pass sched_idletd() as the function pointer. Any scheduler may define its own version of sched_idletd(). The NULL is just the argument passed to it, which is not meaningful here. Or maybe I'm not understanding your question? Attilio -- Peace can only be achieved by understanding - A. Einstein
Re: Panic in vm_map_stack
2010/3/26 pluknet : > On 26 March 2010 23:10, Tom Judge wrote: >> -BEGIN PGP SIGNED MESSAGE- >> Hash: SHA1 >> >> This is the function, I am guessing that I need to unlock the proc >> before calling vmspace_free ? >> >> > > As far as I know you cannot lock a process around > locking vmspace and/or vm_map (at least on 6.x). > I used process reference counting for that purpose. > Sort of the following mumble.. Generally that is true for vm_map_lock(), because it is an sx lock (thus sleeping), but in this case a trylock operation is used, which won't sleep (and I assume that is why it is used), so it is not a problem. The vmspace refcounting, on the other hand, doesn't impose any extra constraint, so I don't think the lock dance is required here. Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein
Re: resource leak in fifo_vnops.c: 6.x/7.x/8.x
2009/11/6 Dorr H. Clark : > > > We believe we have identified a significant resource leak > present in 6.x, 7.x, and 8.x. We believe this is a regression > versus FreeBSD 4.x which appears to do the Right Thing (tm). > > We have a test program (see below) which will run the system > out of sockets by repeated exercise of the failing code > path in the kernel. > > Our proposed fix is applied to the file usr/src/sys/fs/fifofs/fifo_vnops.c > > > @@ -237,6 +237,8 @@ >if (ap->a_mode & FWRITE) { >if ((ap->a_mode & O_NONBLOCK) && fip->fi_readers == 0) { >mtx_unlock(&fifo_mtx); > + /* Exclusive VOP lock is held - safe to clean */ > + fifo_cleanup(vp); >return (ENXIO); >} >fip->fi_writers++; I think it should also check that fip->fi_writers == 0 (and possibly the checks within fifo_cleanup() should just be assertions, but that's somewhat orthogonal) and the comment is not needed. Attilio -- Peace can only be achieved by understanding - A. Einstein
Re: sx locks and memory barriers
2009/9/29 John Baldwin : > On Tuesday 29 September 2009 4:42:13 pm Attilio Rao wrote: >> 2009/9/29 Max Laier : >> > On Tuesday 29 September 2009 17:39:37 Attilio Rao wrote: >> >> 2009/9/25 Fabio Checconi : >> >> > Hi all, >> >> > looking at sys/sx.h I have some troubles understanding this comment: >> >> > >> >> > * A note about memory barriers. Exclusive locks need to use the same >> >> > * memory barriers as mutexes: _acq when acquiring an exclusive lock >> >> > * and _rel when releasing an exclusive lock. On the other side, >> >> > * shared lock needs to use an _acq barrier when acquiring the lock >> >> > * but, since they don't update any locked data, no memory barrier is >> >> > * needed when releasing a shared lock. >> >> > >> >> > In particular, I'm not understanding what prevents the following >> >> > sequence >> >> > from happening: >> >> > >> >> > CPU A CPU B >> >> > >> >> > sx_slock(&data->lock); >> >> > >> >> > sx_sunlock(&data->lock); >> >> > >> >> > /* reordered after the unlock >> >> > by the cpu */ >> >> > if (data->buffer) >> >> >sx_xlock(&data->lock); >> >> >free(data->buffer); >> >> >data->buffer = NULL; >> >> >sx_xunlock(&data->lock); >> >> > >> >> >a = *data->buffer; >> >> > >> >> > IOW, even if readers do not modify the data protected by the lock, >> >> > without a release barrier a memory access may leak past the unlock (as >> >> > the cpu won't notice any dependency between the unlock and the fetch, >> >> > feeling free to reorder them), thus potentially racing with an exclusive >> >> > writer accessing the data. >> >> > >> >> > On architectures where atomic ops serialize memory accesses this would >> >> > never happen, otherwise the sequence above seems possible; am I missing >> >> > something? >> >> >> >> I think your concerns are right, possibly we need this patch: >> >> http://www.freebsd.org/~attilio/sxrw_unlockb.diff >> >> >> >> However speaking with John we agreed possibly there is a more serious >> >> breakage. 
Possibly, memory barriers would also require to ensure the >> >> compiler to not reorder the operation, while right now, in FreeBSD, they >> >> just take care of the reordering from the architecture perspective. >> >> The only way I'm aware of GCC offers that is to clobber memory. >> >> I will provide a patch that address this soon, hoping that GCC will be >> >> smart enough to not overhead too much the memory clobbering but just >> >> try to understand what's our purpose and servers it (I will try to >> >> compare code generated before and after the patch at least for tier-1 >> >> architectures). >> > >> > Does GCC really reorder accesses to volatile objects? The C Standard seems >> > to >> > object: >> > >> > 5.1.2.3 - 2 >> > Accessing a volatile object, modifying an object, modifying a file, or >> > calling >> > a function that does any of those operations are all side effects,11) which >> > are changes in the state of the execution environment. Evaluation of an >> > expression may produce side effects. At certain specified points in the >> > execution sequence called sequence points, all side effects of previous >> > evaluations shall be complete and no side effects of subsequent evaluations >> > shall have taken place. (A summary of the sequence points is given in annex >> > C.) >> >> Very interesting. >> I was thinking about the other operating systems which basically do >> 'memory clobbering' for ensuring a compiler barrier, but actually they >> often forsee such a barrier without the conjuction of a memory >> operand. >> >> I think I will need to speak a bit with a GCC engineer in order to see >> what do they implement in regard of volatile operands. > > GCC can be quite aggressive with reordering even in the fac
Re: NMI with 7.2-stable
2009/10/1 ivan anyukov : > dmesg goes here: > Yes, I think it is an hw failure. Attilio -- Peace can only be achieved by understanding - A. Einstein
Re: NMI with 7.2-stable
Can you send a full dmesg? 2009/10/1 ivan anyukov : > Hi guys, > I'm running 7.2-STABLE on a Thinkpad T60. > When connecting a second monitor to my docking station sometimes my FreeBSD > freezes. > kgdb on the vmcore-file says "non-maskable interrupt trap" > Some details: > X.Org 1.5.3 using the radeon-Driver > I think the problem appears when moving xterms from the first to the second > monitor (or back). The mouse cursor looks _very_ strange then and after some > minutes the whole system freezes. > Does anyone know about the problem? Is it a hardware-failure for sure? > > A vmcore is generated, I loaded it into kgdb: > > This GDB was configured as "i386-marcel-freebsd"... > > Unread portion of the kernel message buffer: > <<22>NMI>N MIIS AI SA b0b, 0E,I ESIAS A > < > 2>R<2 >>RA<2M >ApMa rpiatyr ietryr oerr,r olri,k elliyk ehlayr dhwaarrdew afraei > lfuariel.ure. > > > > FFaattaall ttrraapp 1199:: nnoonn--mmaasskkaabbllee iinntteeuupptt > ttrraapp wwhhiillee iinn kkeerrnneell mmooddee > > ccppuuiidd == 01;; aappiicc iidd == 0010 > > iinnssttrruuccttiioonn ppooiinntteerr == > 00xx2200::00xxcc00eeaa7733eecc > > ssttaacckk ppooiinntteerr == > 00xx2288::00xxcc4477aa58cc > > ffrraammee ppooiinntteerr == > 00xx2288::00xxcc4477aa58cc > > ccooddee sseeggmmeenntt== bbaassee > 00xx00,, lliimmiitt 00xxff,, ttyyppee 00xx11bb > >== DDPPLL 00,, pprreess > 11,, ddeeff3322 11,, ggrraann 11 > > pprroocceeoorr eeffllaaggss== iinntteeuupptt > eennaabblleedd,, IIOOPPLL == 00 > > ccuueenntt pprrooccee == 1112 > ((iiddllee:: ccppuu10)) > > ttrraapp nnuummbbeerr == 1199 > > backtrace says: > #0 doadump () at pcpu.h:196 > #1 0xc07c5258 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418 > #2 0xc07c5535 in panic (fmt=Variable "fmt" is not available. 
> ) at /usr/src/sys/kern/kern_shutdown.c:574 > #3 0xc0ab2d14 in trap_fatal (frame=0xc47a5c48, eva=0) at > /usr/src/sys/i386/i386/trap.c:939 > #4 0xc0ab3a9d in trap (frame=0xc47a5c48) at > /usr/src/sys/i386/i386/trap.c:726 > #5 0xc0a9910b in calltrap () at /usr/src/sys/i386/i386/exception.s:159 > #6 0xc0ea73ec in acpi_cpu_c1 () at > /usr/src/sys/modules/acpi/acpi/../../../i386/acpica/acpi_machdep.c:550 > #7 0xc0ea0623 in acpi_cpu_idle () at > /usr/src/sys/modules/acpi/acpi/../../../dev/acpica/acpi_cpu.c:943 > #8 0xc0aa377b in cpu_idle () at /usr/src/sys/i386/i386/machdep.c:1183 > #9 0xc07e6bd4 in sched_idletd (dummy=0x0) at > /usr/src/sys/kern/sched_ule.c:2681 > #10 0xc07a1311 in fork_exit (callout=0xc07e690f , arg=0x0, > frame=0xc47a5d38) >at /usr/src/sys/kern/kern_fork.c:810 > #11 0xc0a99180 in fork_trampoline () at > /usr/src/sys/i386/i386/exception.s:264 > > Any ideas? > > Thanks a lot! > ___ > freebsd-hackers@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org" > -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: sx locks and memory barriers
2009/9/29 Marius Nünnerich : > On Tue, Sep 29, 2009 at 21:15, Attilio Rao wrote: >> 2009/9/29 John Baldwin : >>> On Tuesday 29 September 2009 11:39:37 am Attilio Rao wrote: >>>> 2009/9/25 Fabio Checconi : >>>> > Hi all, >>>> > looking at sys/sx.h I have some troubles understanding this comment: >>>> > >>>> > * A note about memory barriers. Exclusive locks need to use the same >>>> > * memory barriers as mutexes: _acq when acquiring an exclusive lock >>>> > * and _rel when releasing an exclusive lock. On the other side, >>>> > * shared lock needs to use an _acq barrier when acquiring the lock >>>> > * but, since they don't update any locked data, no memory barrier is >>>> > * needed when releasing a shared lock. >>>> > >>>> > In particular, I'm not understanding what prevents the following sequence >>>> > from happening: >>>> > >>>> > CPU A CPU B >>>> > >>>> > sx_slock(&data->lock); >>>> > >>>> > sx_sunlock(&data->lock); >>>> > >>>> > /* reordered after the unlock >>>> > by the cpu */ >>>> > if (data->buffer) >>>> >sx_xlock(&data->lock); >>>> >free(data->buffer); >>>> >data->buffer = NULL; >>>> >sx_xunlock(&data->lock); >>>> > >>>> >a = *data->buffer; >>>> > >>>> > IOW, even if readers do not modify the data protected by the lock, >>>> > without a release barrier a memory access may leak past the unlock (as >>>> > the cpu won't notice any dependency between the unlock and the fetch, >>>> > feeling free to reorder them), thus potentially racing with an exclusive >>>> > writer accessing the data. >>>> > >>>> > On architectures where atomic ops serialize memory accesses this would >>>> > never happen, otherwise the sequence above seems possible; am I missing >>>> > something? >>>> >>>> I think your concerns are right, possibly we need this patch: >>>> http://www.freebsd.org/~attilio/sxrw_unlockb.diff >>> >>> Actually, since you are only worried about reads, I think this should be >>> an "acq" barrier rather than a "rel". 
In some cases "acq" is cheaper, so we >>> should prefer the cheapest barrier that provides what we need. You would >>> still need to keep some language about the memory barriers since using "acq" >>> for shared unlocking is different from exclusive unlocking. >> >> Actually, I don't think that an acq barrier ensures enough protection >> against the reordering of 'earlier' operation thus not fixing the >> architecture ordering problem reported by Fabio. Also, I don't think >> we just have to care about reads (or I don't understand what you mean >> here). >> However, I'm not even sure that we have faster read barriers than the >> write one. As long as it should be true in theory I don't think that's >> what happen in practice. >> >>> The memory clobber is quite heavyweight. It actually forces gcc to forget >>> any >>> cached memory items in registers and reload everything again. What I really >>> want is just a barrier to tell GCC to not reorder things. If I read a value >>> in the program before acquiring a lock it is in theory fine to keep that >>> cached across the barrier. However, there isn't a way to do this sort of >>> thing with GCC currently. >> >> Yes, that's the only tool we have right now with GCC. I will try to >> look for another way, but it sounds difficult to discover. > > Even if we would have a mechanism to tell GCC to not reorder the > instructions the CPU itself would still be free to reorder if there > are no barriers. Or am I missing something? Our code already takes care of that case in our barriers. Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: sx locks and memory barriers
2009/9/29 Max Laier : > On Tuesday 29 September 2009 17:39:37 Attilio Rao wrote: >> 2009/9/25 Fabio Checconi : >> > Hi all, >> > looking at sys/sx.h I have some troubles understanding this comment: >> > >> > * A note about memory barriers. Exclusive locks need to use the same >> > * memory barriers as mutexes: _acq when acquiring an exclusive lock >> > * and _rel when releasing an exclusive lock. On the other side, >> > * shared lock needs to use an _acq barrier when acquiring the lock >> > * but, since they don't update any locked data, no memory barrier is >> > * needed when releasing a shared lock. >> > >> > In particular, I'm not understanding what prevents the following sequence >> > from happening: >> > >> > CPU A CPU B >> > >> > sx_slock(&data->lock); >> > >> > sx_sunlock(&data->lock); >> > >> > /* reordered after the unlock >> > by the cpu */ >> > if (data->buffer) >> >sx_xlock(&data->lock); >> >free(data->buffer); >> >data->buffer = NULL; >> >sx_xunlock(&data->lock); >> > >> >a = *data->buffer; >> > >> > IOW, even if readers do not modify the data protected by the lock, >> > without a release barrier a memory access may leak past the unlock (as >> > the cpu won't notice any dependency between the unlock and the fetch, >> > feeling free to reorder them), thus potentially racing with an exclusive >> > writer accessing the data. >> > >> > On architectures where atomic ops serialize memory accesses this would >> > never happen, otherwise the sequence above seems possible; am I missing >> > something? >> >> I think your concerns are right, possibly we need this patch: >> http://www.freebsd.org/~attilio/sxrw_unlockb.diff >> >> However speaking with John we agreed possibly there is a more serious >> breakage. Possibly, memory barriers would also require to ensure the >> compiler to not reorder the operation, while right now, in FreeBSD, they >> just take care of the reordering from the architecture perspective. 
>> The only way I'm aware of GCC offers that is to clobber memory. >> I will provide a patch that address this soon, hoping that GCC will be >> smart enough to not overhead too much the memory clobbering but just >> try to understand what's our purpose and servers it (I will try to >> compare code generated before and after the patch at least for tier-1 >> architectures). > > Does GCC really reorder accesses to volatile objects? The C Standard seems to > object: > > 5.1.2.3 - 2 > Accessing a volatile object, modifying an object, modifying a file, or calling > a function that does any of those operations are all side effects,11) which > are changes in the state of the execution environment. Evaluation of an > expression may produce side effects. At certain specified points in the > execution sequence called sequence points, all side effects of previous > evaluations shall be complete and no side effects of subsequent evaluations > shall have taken place. (A summary of the sequence points is given in annex > C.) Very interesting. I was thinking about the other operating systems which basically do 'memory clobbering' for ensuring a compiler barrier, but actually they often forsee such a barrier without the conjuction of a memory operand. I think I will need to speak a bit with a GCC engineer in order to see what do they implement in regard of volatile operands. Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: sx locks and memory barriers
2009/9/29 John Baldwin : > On Tuesday 29 September 2009 3:15:40 pm Attilio Rao wrote: >> 2009/9/29 John Baldwin : >> > On Tuesday 29 September 2009 11:39:37 am Attilio Rao wrote: >> >> 2009/9/25 Fabio Checconi : >> >> > Hi all, >> >> > looking at sys/sx.h I have some troubles understanding this comment: >> >> > >> >> > * A note about memory barriers. Exclusive locks need to use the same >> >> > * memory barriers as mutexes: _acq when acquiring an exclusive lock >> >> > * and _rel when releasing an exclusive lock. On the other side, >> >> > * shared lock needs to use an _acq barrier when acquiring the lock >> >> > * but, since they don't update any locked data, no memory barrier is >> >> > * needed when releasing a shared lock. >> >> > >> >> > In particular, I'm not understanding what prevents the following >> >> > sequence >> >> > from happening: >> >> > >> >> > CPU A CPU B >> >> > >> >> > sx_slock(&data->lock); >> >> > >> >> > sx_sunlock(&data->lock); >> >> > >> >> > /* reordered after the unlock >> >> > by the cpu */ >> >> > if (data->buffer) >> >> >sx_xlock(&data->lock); >> >> >free(data->buffer); >> >> >data->buffer = NULL; >> >> >sx_xunlock(&data->lock); >> >> > >> >> >a = *data->buffer; >> >> > >> >> > IOW, even if readers do not modify the data protected by the lock, >> >> > without a release barrier a memory access may leak past the unlock (as >> >> > the cpu won't notice any dependency between the unlock and the fetch, >> >> > feeling free to reorder them), thus potentially racing with an exclusive >> >> > writer accessing the data. >> >> > >> >> > On architectures where atomic ops serialize memory accesses this would >> >> > never happen, otherwise the sequence above seems possible; am I missing >> >> > something? 
>> >> >> >> I think your concerns are right, possibly we need this patch: >> >> http://www.freebsd.org/~attilio/sxrw_unlockb.diff >> > >> > Actually, since you are only worried about reads, I think this should be >> > an "acq" barrier rather than a "rel". In some cases "acq" is cheaper, so >> > we >> > should prefer the cheapest barrier that provides what we need. You would >> > still need to keep some language about the memory barriers since using >> > "acq" >> > for shared unlocking is different from exclusive unlocking. >> >> Actually, I don't think that an acq barrier ensures enough protection >> against the reordering of 'earlier' operation thus not fixing the >> architecture ordering problem reported by Fabio. Also, I don't think >> we just have to care about reads (or I don't understand what you mean >> here). > > Hmmm, it might not on certain archs. It does on x86 (i.e. an lfence would > work here on x86), but probably not on ia64 and sparc64. Also, we certainly > only care about reads. A read/share lock cannot resolve any races where the > lock holder is writing data, it can only ensure that the lock holder can > safely read shared data without the data changing out from underneath it. > >> However, I'm not even sure that we have faster read barriers than the >> write one. As long as it should be true in theory I don't think that's >> what happen in practice. > > bde@ found that sfence was generally much more expensive than lfence on x86. > However, since x86 guarantees the order of all stores we don't actually need > to use sfence at all on x86 anyway. Yes, x86 guarantees that the stores are strong ordered so I don't think acq to be faster than rel. Can I assume the patch I already sent as reviewed by you and commit then, right? Attilio -- Peace can only be achieved by understanding - A. 
Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
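Fabio's race can be sketched outside the kernel with C11 atomics (a hypothetical toy model, not FreeBSD's atomic(9) code; all names below are invented). The thread's conclusion is that the shared unlock needs a barrier so the protected load cannot sink below it:

```c
#include <stdatomic.h>
#include <stddef.h>

/*
 * Hypothetical sketch of the reordering hazard discussed above.  The
 * reader's load of d->buffer must complete before the shared-unlock
 * store becomes visible, or it can race with a writer that frees the
 * buffer after taking the lock exclusively.
 */
struct shared_data {
	atomic_int	lock;	/* toy model: >0 means shared holders */
	int		*buffer;
};

static int
reader(struct shared_data *d)
{
	int v;

	/* sx_slock(): acquire semantics, as the comment in sx.h says. */
	atomic_fetch_add_explicit(&d->lock, 1, memory_order_acquire);
	v = (d->buffer != NULL) ? *d->buffer : -1;
	/*
	 * sx_sunlock(): release semantics keep the load of d->buffer
	 * from sinking below the unlock -- the barrier this thread
	 * concluded was missing.
	 */
	atomic_fetch_sub_explicit(&d->lock, 1, memory_order_release);
	return (v);
}
```

With only relaxed ordering on the unlock, both the compiler and a weakly ordered CPU would be free to move the buffer load past it.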
Re: sx locks and memory barriers
2009/9/29 John Baldwin : > On Tuesday 29 September 2009 11:39:37 am Attilio Rao wrote: >> 2009/9/25 Fabio Checconi : >> > Hi all, >> > looking at sys/sx.h I have some troubles understanding this comment: >> > >> > * A note about memory barriers. Exclusive locks need to use the same >> > * memory barriers as mutexes: _acq when acquiring an exclusive lock >> > * and _rel when releasing an exclusive lock. On the other side, >> > * shared lock needs to use an _acq barrier when acquiring the lock >> > * but, since they don't update any locked data, no memory barrier is >> > * needed when releasing a shared lock. >> > >> > In particular, I'm not understanding what prevents the following sequence >> > from happening: >> > >> > CPU A CPU B >> > >> > sx_slock(&data->lock); >> > >> > sx_sunlock(&data->lock); >> > >> > /* reordered after the unlock >> > by the cpu */ >> > if (data->buffer) >> >sx_xlock(&data->lock); >> >free(data->buffer); >> >data->buffer = NULL; >> >sx_xunlock(&data->lock); >> > >> >a = *data->buffer; >> > >> > IOW, even if readers do not modify the data protected by the lock, >> > without a release barrier a memory access may leak past the unlock (as >> > the cpu won't notice any dependency between the unlock and the fetch, >> > feeling free to reorder them), thus potentially racing with an exclusive >> > writer accessing the data. >> > >> > On architectures where atomic ops serialize memory accesses this would >> > never happen, otherwise the sequence above seems possible; am I missing >> > something? >> >> I think your concerns are right, possibly we need this patch: >> http://www.freebsd.org/~attilio/sxrw_unlockb.diff > > Actually, since you are only worried about reads, I think this should be > an "acq" barrier rather than a "rel". In some cases "acq" is cheaper, so we > should prefer the cheapest barrier that provides what we need. 
You would > still need to keep some language about the memory barriers since using "acq" > for shared unlocking is different from exclusive unlocking. Actually, I don't think that an acq barrier ensures enough protection against the reordering of 'earlier' operations, thus not fixing the architecture ordering problem reported by Fabio. Also, I don't think we just have to care about reads (or I don't understand what you mean here). However, I'm not even sure that we have faster read barriers than write ones. Even if that should be true in theory, I don't think that's what happens in practice. > The memory clobber is quite heavyweight. It actually forces gcc to forget any > cached memory items in registers and reload everything again. What I really > want is just a barrier to tell GCC to not reorder things. If I read a value > in the program before acquiring a lock it is in theory fine to keep that > cached across the barrier. However, there isn't a way to do this sort of > thing with GCC currently. Yes, that's the only tool we have right now with GCC. I will try to look for another way, but it sounds difficult to find. Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
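The "memory clobber" being discussed looks roughly like this (a hedged illustration; the macro name is invented, not the kernel's real one):

```c
/*
 * A compiler-only barrier: the empty asm with a "memory" clobber emits
 * no instructions.  It only tells GCC that memory may have changed, so
 * register-cached copies must be discarded and memory accesses may not
 * be reordered across it.  It does NOT order the CPU itself.
 */
#define	compiler_membar()	__asm __volatile("" : : : "memory")

static int payload;
static volatile int flag;

static void
publish(int v)
{
	payload = v;
	compiler_membar();	/* keep the payload store before the flag store */
	flag = 1;
}
```

This is exactly why it is heavyweight: the clobber forces every cached memory value out of registers, even ones that could legitimately stay cached across the barrier.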
Re: sx locks and memory barriers
2009/9/25 Fabio Checconi : > Hi all, > looking at sys/sx.h I have some troubles understanding this comment: > > * A note about memory barriers. Exclusive locks need to use the same > * memory barriers as mutexes: _acq when acquiring an exclusive lock > * and _rel when releasing an exclusive lock. On the other side, > * shared lock needs to use an _acq barrier when acquiring the lock > * but, since they don't update any locked data, no memory barrier is > * needed when releasing a shared lock. > > In particular, I'm not understanding what prevents the following sequence > from happening: > > CPU A CPU B > > sx_slock(&data->lock); > > sx_sunlock(&data->lock); > > /* reordered after the unlock > by the cpu */ > if (data->buffer) >sx_xlock(&data->lock); >free(data->buffer); >data->buffer = NULL; >sx_xunlock(&data->lock); > >a = *data->buffer; > > IOW, even if readers do not modify the data protected by the lock, > without a release barrier a memory access may leak past the unlock (as > the cpu won't notice any dependency between the unlock and the fetch, > feeling free to reorder them), thus potentially racing with an exclusive > writer accessing the data. > > On architectures where atomic ops serialize memory accesses this would > never happen, otherwise the sequence above seems possible; am I missing > something? I think your concerns are right, possibly we need this patch: http://www.freebsd.org/~attilio/sxrw_unlockb.diff However speaking with John we agreed possibly there is a more serious breakage. Possibly, memory barriers would also require to ensure the compiler to not reorder the operation, while right now, in FreeBSD, they just take care of the reordering from the architecture perspective. The only way I'm aware of GCC offers that is to clobber memory. 
I will provide a patch that addresses this soon, hoping that GCC will be smart enough not to add too much overhead from the memory clobbering but instead understand our purpose and serve it (I will try to compare code generated before and after the patch, at least for tier-1 architectures). Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: Where have all the vnodes gone?
2009/8/1 Linda Messerschmidt : > (Reposted from freebsd-questions due to no replies.) > > With the last few releases, I've noticed a distinct trend toward > disappearing vnodes on one of the machines I look after. > > This machine isn't doing a whole lot. It runs a couple of small web > sites, and once an hour it rsync's some files from one NFS mount to > another, but the rsync doesn't stay running; it restarts every hour > and runs for 10-15 minutes. > > I set it to log the number of vnodes every ten minutes and this is what I got: > > 00:47:59 vfs.numvnodes: 39337 > 00:57:59 vfs.numvnodes: 40568 > 01:07:59 vfs.numvnodes: 44554 > 01:17:59 vfs.numvnodes: 52141 > 01:27:59 vfs.numvnodes: 55713 > 01:37:59 vfs.numvnodes: 58643 > 01:47:59 vfs.numvnodes: 60792 > 01:57:59 vfs.numvnodes: 67130 > 02:07:59 vfs.numvnodes: 76035 > 02:17:59 vfs.numvnodes: 84349 > 02:27:59 vfs.numvnodes: 92187 > 02:37:59 vfs.numvnodes: 98114 > 02:47:59 vfs.numvnodes: 116854 > 02:57:59 vfs.numvnodes: 124842 > 03:07:59 vfs.numvnodes: 164173 > 03:17:59 vfs.numvnodes: 172257 > 03:27:59 vfs.numvnodes: 178388 > 03:37:59 vfs.numvnodes: 183066 > 03:47:59 vfs.numvnodes: 190092 > 03:57:59 vfs.numvnodes: 198322 > 04:07:59 vfs.numvnodes: 204598 > 04:17:59 vfs.numvnodes: 208311 > 04:27:59 vfs.numvnodes: 214207 > 04:37:59 vfs.numvnodes: 221028 > 04:47:59 vfs.numvnodes: 227792 > 04:57:59 vfs.numvnodes: 233214 > 05:07:59 vfs.numvnodes: 240112 > 05:17:59 vfs.numvnodes: 247572 > 05:27:59 vfs.numvnodes: 256090 > 05:37:59 vfs.numvnodes: 262720 > 05:47:59 vfs.numvnodes: 269755 > 05:57:59 vfs.numvnodes: 274986 > 06:07:59 vfs.numvnodes: 279879 > 06:17:59 vfs.numvnodes: 287039 > 06:27:59 vfs.numvnodes: 291984 > 06:37:59 vfs.numvnodes: 294267 > 06:47:59 vfs.numvnodes: 296658 > 06:57:59 vfs.numvnodes: 299086 > 07:07:59 vfs.numvnodes: 301825 > 07:17:59 vfs.numvnodes: 309060 > 07:27:59 vfs.numvnodes: 312955 > 07:37:59 vfs.numvnodes: 317400 > 07:47:59 vfs.numvnodes: 320047 > > At that point the machine crashed with: > > 
panic: kmem_malloc(16384): kmem_map too small: 334745600 total allocated > > If I don't tune kern.maxvnodes up to the point where it panics, then > eventually it runs out of vnodes and all sorts of stuff gets stuck in > vlruwk. > > The machine in question is running 7.2-RELEASE-p3, but I already > upgraded it from 7.1 trying to get this to go away, so it's a problem > that's been around for awhile. > > My guess is that they're leaking in the kernel somewhere because of > the rsync, because there's just not much else going on, but unless I > can figure out how many vnodes are being used on a per-process basis, > I can't make any headway on proving or disproving that. I do know > that according to fstat, there are only 1000-1500 descriptors open at > any given time, and kern.openfiles ranges 250-500. So I'm just > mystified what the other 30+ could be. > > Is there any way to figure out where all these vnodes are going? It seems you can reproduce it easily. Can you please provide KTR traces of KTR_VFS? Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: valgrind on FreeBSD 7
2008/10/2 Yuri : > Xin LI wrote: >> >> Simon Barner wrote: >>> >>> Mike Silbersack wrote: On Mon, 17 Mar 2008, Heiko Wundram wrote: Here's a tarball of what's in perforce right now. I tried it a little bit, and it seemed to work for me. Make sure to install the kernel module! http://www.silby.com/valgrind_freebsd_3.tar.gz But don't send me questions about it - I'm not an expert on it, I'm just the guy who grabbed it from perforce and found that it seems to work. :) >>> >>> Could you please provide me with the details, so I can update my >>> (horribly outdated :( valgrind ports? >> >> It was available from p4 at: >> >> //depot/projects/valgrind/... >> >> Note that this version does not work on architectures other than i386. >> >> Cheers, > > > Any developments in Valgrind/Callgrind on FreeBSD? > Any hope to get this version into ports and to merge FreeBSD support up into > Valgrind project? For our employer (Sandvine Incorporated) Ed Maste and I are working on finishing the port Peter Wemm started a while ago. You can find updated information on the project here: http://wiki.freebsd.org/Valgrind while you can get the source pack here: http://people.freebsd.org/~attilio/Sandvine/valgrind/valgrind.tar.bz2 Please note that 8 and 7 now have all the required support to run this modified version, so no further modification is required in these branches. Valgrind is able to run on 6 too, if the patch system_valgrind.diff is applied (I can't recall if it is targeted at SVOS, as I think, or stock STABLE_6; just try it, it may apply to both). The port is really not clean, but that's not Peter's fault. Valgrind should really be rewritten to be less Linux-centric and smarter about how it pulls in OS interfaces. The effort, though, would be so large that it would require too much work to happen in the short term.
What is in the tar works mostly OK (modulo missing syscalls) on both ia32 and amd64, but you should carefully follow what is written in the Wiki for a good install of the product. Also note that I have an internal version that should fix most (all?) of the issues reported in the Wiki, and I will try to release it on a regular basis for as long as possible. So far, I'm fighting with a segfault in the profiled process while running with the SV environment (amd64 arch), a symptom that doesn't show up with a stock FreeBSD installation so far. If you could try the port and report back to us I would appreciate it a lot. For any other questions please contact us. Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: vfs_cache panic, 7.2-prerelease (stable)
2009/5/1 : > Filed under: > > http://www.freebsd.org/cgi/query-pr.cgi?pr=134142 > > Would be incredibly grateful if somebody in the know could take > a look at this. But, what's the panic message? Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: [problem] aac0 does not respond
2009/3/24 Vladimir Ermakov : > Hello, All > > To describe my problem: > I have a RAID-10 volume (SAS-HDD x 6) on an Adaptec RAID 5805 > 2 HDDs of the 6 have errors in SMART data (damaged) > I am trying to read the file /var/db/mysql/ibdata1 from this volume > the system stops responding (lost access to ssh) after reading 6GB of data from this > file > and prints debug messages on ttyv0 > > How can I prevent this problem from occurring? > How can I monitor the status of the RAID controller? > > please, any solutions Is this -STABLE or -CURRENT? And if it is -CURRENT, what revision? Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: smbfs hang
2008/10/16, Chris Coleman <[EMAIL PROTECTED]>: > I'm doing a large transfer from an SMB mounted drive, about 2TB of > files. After about 250G, it hangs. Of course any process that > tries to access that drive hangs as well. > > Is there any way to get those processes killed off and remount the > drive without rebooting? I haven't had much luck so far. They don't > seem to want to die. Hello Chris, are you equipped with a debugging kernel? Are you interested in helping to fix this bug? Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: Building custom kernel with new files
2008/8/16, Ryan French <[EMAIL PROTECTED]>: > On Sat, 16 Aug 2008 10:38:24 pm you wrote: > > 2008/8/16, Ryan French <[EMAIL PROTECTED]>: > > > Hi All, > > > > > > I am currently trying to build a custom kernel for my Google Summer of > > > Code, and am running into a bit of a problem. I have all of my code > > > compiling, but when I get to the linking stage as soon as it comes to the > > > new files I have installed it says the *.o does not exist. I have added > > > the files to /conf/files, as well as added the files to /conf/options > > > and /conf/NOTES, and the option is set in my Makefile for the kernel. Is > > > there another step I need to do before I can build the kernel? > > > > Ryan, > > can you please paste the error message? > > More specifically, are the missing .o files your newly included files or others? > > This can happen, for example, if the consumer of a subsystem wants to > > use it and the subsystem is not compiled (the stack(9) without the STACK > > option, for example). > > > > Thanks, > > Attilio > > The missing .o is about my own files that I have included in the kernel. More > specifically I have a file 'mpls.h' and as it is the first one in the > directory as soon as it gets to linking the file I get the error > > ld: mpls.o: No such file: No such file or directory > *** Error code 1 > > This occurs right after the command > > MAKE=make sh /usr/src/sys/conf/newvers.sh MPLSKERNEL > > Sorry I can't copy the whole error but I am running the development in a VM and > I can't copy and paste between the VM and my host machine. Can you please paste your diff against src/sys/conf/files* ? Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: Building custom kernel with new files
2008/8/16, Ryan French <[EMAIL PROTECTED]>: > Hi All, > > I am currently trying to build a custom kernel for my Google Summer of Code, > and am running into a bit of a problem. I have all of my code compiling, but > when I get to the linking stage as soon as it comes to the new files I have > installed it says the *.o does not exist. I have added the files > to /conf/files, as well as added the files to /conf/options and /conf/NOTES, > and the option is set in my Makefile for the kernel. Is there another step I > need to do before I can build the kernel? Ryan, can you please paste the error message? More specifically, are the missing .o files your newly included files or others? This can happen, for example, if the consumer of a subsystem wants to use it and the subsystem is not compiled (the stack(9) without the STACK option, for example). Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "[EMAIL PROTECTED]"
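For reference, wiring a new source file into the kernel build generally takes an entry in sys/conf/files and, when the file is conditional, one in sys/conf/options plus an options line in the kernel config. The file and option names below are only guesses based on this thread:

```
# sys/conf/files -- compile the file when the option is enabled
netmpls/mpls.c		optional mplskernel

# sys/conf/options -- declare the option and its opt_*.h header
MPLSKERNEL		opt_mplskernel.h

# kernel configuration file
options 	MPLSKERNEL
```

If the conf/files entry is missing or its option token does not match, config(8) emits no build rule for the object at all, which matches the "ld: mpls.o: No such file" symptom reported above.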
Re: Do you really "sleep" when blocked on a mutex?
2008/4/21, Murty, Ravi <[EMAIL PROTECTED]>: > Hello, > > > > When a thread cannot get a mutex (default mutex) and needs to be > blocked, is it really put to sleep? From looking at the code it appears > that it is inhibited (TD_SET_LOCK) but isn't really put to sleep. > > > > 1. Why isn't it put to sleep - why can't it be treated the same? > 2. The eventual question I am trying to answer is the difference > between setrunnable() and setrunqueue() - this one simply finds a slot > in the ksegrp and a runq to add the KSE/td. But setrunnable() also > checks to see if the process is in memory (PS_INMEM) before calling > sched_wakeup which eventually calls setrunqueue()? Why doesn't > setrunqueue have to worry about the possibility that the process may > have been swapped out while it was waiting to become runnable? I just forgot to answer this: on the 6.x series (note that setrunqueue() shouldn't be available on 7.x and above), setrunqueue() is just used as a backend for setrunnable(). Usually, setrunqueue() was used in places where the state of the thread was already assumed and no further checks were needed. Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: Do you really "sleep" when blocked on a mutex?
2008/4/21, Murty, Ravi <[EMAIL PROTECTED]>: > Hello, > > > > When a thread cannot get a mutex (default mutex) and needs to be > blocked, is it really put to sleep? From looking at the code it appears > that it is inhibited (TD_SET_LOCK) but isn't really put to sleep. From a scheduler perspective, sleeping and blocking are 2 different events, but from a low-level viewpoint there is not much difference. What happens is that the thread is marked as inhibited, moved into a queue other than the runqueue, and switched out. What really changes is the upper-level behaviour. TD_SET_LOCK is used in conjunction with turnstiles, which basically perform priority propagation on the 'lock owner', and so the turnstile becomes the recipient for the blocked thread (in other words, the turnstile is the 'queue other than the runqueue' where the thread lives). TD_SET_SLEEP, on the other hand, is used in conjunction with sleepqueues, which don't perform priority propagation on a lock owner (since, often, such an owner is not present -- just think of wait channels). In this case the recipient is the sleepqueue, and there the thread lives. At a low level the paradigm is similar whether a thread blocks using the turnstile or sleeps using the sleepqueue: - the thread is running, so it is no longer present on any runqueue - the thread is marked as inhibited - the relevant event is signalled (lock / sleep) - the thread is assigned to the new recipient (turnstile / sleepqueue) - the thread is switched out Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "[EMAIL PROTECTED]"
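The common block/sleep paradigm described above can be mirrored in a toy state machine (hypothetical names and structure; this is a model of the steps, not the kernel's real code):

```c
#include <assert.h>

/* Toy model of the thread states and queues named above. */
enum td_state { TDS_RUNNING, TDS_INHIBITED };
enum td_queue { Q_NONE, Q_TURNSTILE, Q_SLEEPQ };

struct toy_thread {
	enum td_state	state;
	enum td_queue	queue;
};

/*
 * Block or sleep: the running thread is off the runqueue already, gets
 * marked inhibited, is handed to the recipient (turnstile for lock
 * blocking, sleepqueue for sleeping), and would then be switched out.
 */
static void
toy_thread_block(struct toy_thread *td, enum td_queue recipient)
{
	assert(td->state == TDS_RUNNING && td->queue == Q_NONE);
	td->state = TDS_INHIBITED;	/* TD_SET_LOCK / TD_SET_SLEEP */
	td->queue = recipient;		/* turnstile or sleepqueue */
	/* ... mi_switch() would switch the thread out here ... */
}
```

The only behavioural difference between the two paths in this model is which recipient queue is passed in; priority propagation (the turnstile's extra job) is deliberately left out of the sketch.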
Re: pfind() in ithread handler
2008/2/28, Yuri Pankov <[EMAIL PROTECTED]>: > Hi, > > I'm trying to understand the cause of following panic. > > panic: Trying sleep, but thread marked as sleeping prohibited > cpuid = 0 > KDB: stack backtrace: > db_trace_self_wrapper() at db_trace_self_wrapper+0x2a > panic() at panic+0x17d > sleepq_add() at sleepq_add+0x2e1 > _sx_slock_hard() at _sx_slock_hard+0x15d > _sx_slock() at _sx_slock+0xc1 > pfind() at pfind+0x24 > saa_intr() at saa_intr+0x313 > ithread_loop() at ithread_loop+0xda > fork_exit() at fork_exit+0x12a > fork_trampoline() at fork_trampoline+0xe > --- trap 0, rip = 0, rsp = 0xac3c0d30, rbp = 0 --- > > Can someone enlighten me on what is causing the panic and is it ok to > use pfind() in interrupt handler (I have very limited understanding of > kernel internals)? > > Code in question (taken from saa driver > http://freebsd.ricin.com/ports/distfiles/kbtv-1.92.tbz) can be found at > http://www.pastebin.ca/921830 ithreads cannot perform unbound sleeps, and pfind() needs to access allproc_lock, which is an sx lock and *can perform* an unbound sleep, so there is a mismatch. Btw, it sounds like allproc_lock and other similar locks are just using an sx lock for *legacy* reasons; they should probably be converted to rwlocks. It also causes minor problems in the TTY layer, for example, where I saw you need to use an unbounded sleeping primitive because of process lock acquisitions. A nice janitor task would be to convert these sx locks into rwlocks and see what happens. Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "[EMAIL PROTECTED]"
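The constraint can be sketched with a userland analogy (hypothetical code; the pthread mutex merely stands in for allproc_lock, and a real fix would use a lock class that never sleeps, as suggested above): in a context where sleeping is prohibited, only a non-blocking acquisition is safe.

```c
#include <errno.h>
#include <pthread.h>

/* Stand-in for allproc_lock protecting the process list. */
static pthread_mutex_t proc_list_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * A pfind()-like lookup for contexts that must not sleep: try to take
 * the lock, and bail out instead of blocking when it is contended.
 */
static int
nosleep_pfind(int pid, int *found)
{
	if (pthread_mutex_trylock(&proc_list_lock) != 0)
		return (EBUSY);		/* taking it would block; give up */
	*found = (pid == 1);		/* stand-in for walking allproc */
	pthread_mutex_unlock(&proc_list_lock);
	return (0);
}
```

The panic above is the kernel's assertion of exactly this rule: _sx_slock() reached sleepq_add() from an ithread, where sleeping is marked prohibited.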
Re: a strange/stupid question
2007/11/24, Aryeh Friedman <[EMAIL PROTECTED]>: > Where do I find the main() [and/or other entry point] for the > kernel? I tend to understand stuff better if I follow the flow of > exec from the start. It is highly MD. For IA32 it is in i386/i386/locore.s::btext Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: a strange/stupid question
2007/11/24, Aryeh Friedman <[EMAIL PROTECTED]>: > On 11/24/07, Attilio Rao <[EMAIL PROTECTED]> wrote: > > 2007/11/24, Aryeh Friedman <[EMAIL PROTECTED]>: > > > Where do I find the main() [and/or other entry point] for the > > > kernel? I tend to understand stuff better if I follow the flow of > > > exec from the start. > > > > It is highly MD. > > For IA32 it is in i386/i386/locore.s::btext > > For AMD64 I assume something close to that... > > I just realized that I actually want to understand everything from POST > on (actually from power on, but I know that is very motherboard dependent) so > I guess the question is where do I find the first executed statement > for BTX (I know how to disassemble the MBR so that part is not an issue) It should be: boot/i386/boot0/boot0.S::start Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: Inner workings of turnstiles and sleepqueues
2007/10/19, John Baldwin <[EMAIL PROTECTED]>: > On Friday 19 October 2007 12:56:54 am Ed Schouten wrote: > > * John Baldwin <[EMAIL PROTECTED]> wrote: > > > The best option right now is to read the code. There are some comments in > > > both the headers and implementation. > > > > Would it be useful to write manpages for these interfaces, or do we > > assume that only godlike people can use them anyway? I am willing to > > write manpages for them. > > They already exist, but they really are only used to implement higher-level > primitives like locks and condition variables. The rest of the kernel should > use the higher-level primitives anyway. Well, really turnstiles don't have manpages, but this is still OK as they are only used by mutexes, while the "real" sleeping primitive is the sleepqueue. Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "[EMAIL PROTECTED]"