Re: LK_SHARED/LK_DOWNGRADE adjustments to lock.9 manual page

2012-12-05 Thread Attilio Rao
On Thu, Nov 29, 2012 at 12:05 PM, Andriy Gapon a...@freebsd.org wrote:
 on 16/11/2012 16:42 Andriy Gapon said the following:
 on 15/11/2012 23:44 Attilio Rao said the following:
 Do you think you can test this patch?:
 http://www.freebsd.org/~attilio/lockmgr_forcerec.patch

 I will use this patch in my tree, but I think that it is effectively already
 quite well tested by using INVARIANTS+WITNESS.


 I've been using this patch in both debug and non-debug environments and I
 have not run into any issues.  Please commit when you get a chance.
 Thank you.

Committed as r243900, please proceed with manpage cleanup.

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: stop_cpus_hard when multiple CPUs are panicking from an NMI

2012-11-25 Thread Attilio Rao
On Fri, Nov 16, 2012 at 2:47 PM, Andriy Gapon a...@freebsd.org wrote:
 on 16/11/2012 16:41 Attilio Rao said the following:
 On Fri, Nov 16, 2012 at 1:18 PM, Andriy Gapon a...@freebsd.org wrote:
 on 16/11/2012 14:30 Attilio Rao said the following:
 On Fri, Nov 16, 2012 at 7:54 AM, Andriy Gapon a...@freebsd.org wrote:
 on 16/11/2012 00:58 Ryan Stone said the following:
 At work we have some custom watchdog hardware that sends an NMI upon
 expiry.  We've modified the kernel to panic when it receives the watchdog
 NMI.  I've been trying the stop_scheduler_on_panic mode, and I've
 discovered that when my watchdog expires, the system gets completely
 wedged.  After some digging, I've discovered that I have multiple CPUs
 getting the watchdog NMI and trying to panic concurrently.  One of the
 CPUs wins, and the rest spin forever in this code:

 /*
  * We don't want multiple CPU's to panic at the same time, so we
  * use panic_cpu as a simple spinlock.  We have to keep checking
  * panic_cpu if we are spinning in case the panic on the first
  * CPU is canceled.
  */
 if (panic_cpu != PCPU_GET(cpuid))
 	while (atomic_cmpset_int(&panic_cpu, NOCPU,
 	    PCPU_GET(cpuid)) == 0)
 		while (panic_cpu != NOCPU)
 			; /* nothing */

 The system wedges when stop_cpus_hard() is called, which sends NMIs to all
 of the other CPUs and waits for them to acknowledge that they are stopped
 before returning.  However the CPU will not deliver an NMI to a CPU that is
 already handling an NMI, so the other CPUs that got a watchdog NMI and are
 spinning will never go into the NMI handler and acknowledge that they are
 stopped.

 I thought about this issue and fixed it (in my tree) in a different way:
 http://people.freebsd.org/~avg/cpu_stop-race.diff

 The need for spinlock_enter in the patch is not entirely clear.
 The main idea is that a CPU which calls cpu_stop and loses a race should
 voluntary enter cpustop_handler.
 I am also not sure about MI-cleanness of this patch.

 It is similar to what I propose but with some differences:
 - It is not clean from MI perspective

 OK.

 - I don't think we need to treat it specially, I would just
 unconditionally stop all the CPUs entering the spinlock zone,
 making the patch simpler.

 I couldn't understand this part.

 I'm sorry, I think I misread your patch.

 I was basically suggesting to fix Ryan's case by calling
 cpustop_handler() (or the new MI interface) in the panic() function,
 in the case where the CPU doesn't win the race for panic_cpu.
 Basically doing:
 Index: sys/kern/kern_shutdown.c
 ===
 --- sys/kern/kern_shutdown.c(revision 243154)
 +++ sys/kern/kern_shutdown.c(working copy)
 @@ -568,15 +568,11 @@ panic(const char *fmt, ...)
  #ifdef SMP
 /*
  * We don't want multiple CPU's to panic at the same time, so we
 -* use panic_cpu as a simple spinlock.  We have to keep checking
 -* panic_cpu if we are spinning in case the panic on the first
 -* CPU is canceled.
 +* use panic_cpu as a simple lock.
  */
 	if (panic_cpu != PCPU_GET(cpuid))
 -		while (atomic_cmpset_int(&panic_cpu, NOCPU,
 -		    PCPU_GET(cpuid)) == 0)
 -			while (panic_cpu != NOCPU)
 -				; /* nothing */
 +		if (atomic_cmpset_int(&panic_cpu, NOCPU, PCPU_GET(cpuid)) == 0)
 +			cpustop_handler();
 
 	if (stop_scheduler_on_panic) {
 		if (panicstr == NULL && !kdb_active) {


 In fact it seems to me that the comment is outdated and no longer
 represents the truth.

 Ah, I see.  Thank you.

 My older plan was to get rid of stop_scheduler_on_panic, that is, to make the
 behavior unconditional.  And then to use stop_cpus_hard instead of the
 hand-rolled 'panic_cpu' spinlock.  This way whichever CPU wins
 stop_cpus_hard would be the only CPU to enter panic.

So, assuming we are not yet defaulting to the stop-CPUs behaviour for
panic, I think we have two things to take care of:
1) Handling the simultaneous panics
2) Handling deadlocks/starvation among simultaneous cpu_stops

In particular, case 2 is trickier than it seems: it is not only
related to NMIs but also to the case where you send an IPI_STOP via a
normal IPI and interrupts are disabled on one of the channels. In fact,
I think the patch you propose makes such effects even worse, because
it disables interrupts in generic_stop_cpus().
What I suggest to do, is the following:
- The CPU which wins the race for generic_stop_cpus also signals the
CPUs it is willing to stop on a global mask
- Another CPU entering generic_stop_cpus() and losing the race
checks the mask of CPUs which might be stopped and stops itself if
necessary (i.e. not yet done). We must be careful with races here, but
I'm confident this can be done easily enough.

About the case 1 I think

Re: [RFQ] make witness panic an option

2012-11-25 Thread Attilio Rao
On Sun, Nov 25, 2012 at 12:39 PM, Pawel Jakub Dawidek p...@freebsd.org wrote:
 On Thu, Nov 15, 2012 at 04:39:55PM +, Attilio Rao wrote:
 On 11/15/12, Adrian Chadd adr...@freebsd.org wrote:
  On 15 November 2012 05:27, Giovanni Trematerra
  giovanni.tremate...@gmail.com wrote:
 
  I really do think that is a very bad idea.
  When a locking assertion fails you have to just stop and think about
  what's wrong; there is no way to postpone this.
 
  Not all witness panics are actually fatal. For a developer who is
  sufficiently cluey in their area, they are quite likely able to just
  stare at the code paths for a while to figure out why the
  incorrectness occurred.

 The problem is that such a mechanism can be abused, just like the
 BLESSING one, and that's why it is disabled by default.

 WITNESS is a development tool. We don't ship production kernels with
 WITNESS even compiled in. What is a more efficient use of developer time:
 going through a full reboot cycle every time, or reading the warning from
 the console, unloading a module, fixing the bug and loading it again?

 And if this option is turned off by default what is the problem?

Yes, so, why do you write here?
Go ahead and fix BLESSED, make it the default, etc.

I have had enough of this inability (not referring to you particularly,
but to the people who contributed to this and other threads) to
respect others' opinions.
As I said, I cannot forbid you guys from doing anything; just go ahead,
write the code and commit it, albeit completely bypassing other
people's opinions.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: [RFQ] make witness panic an option

2012-11-25 Thread Attilio Rao
On Sun, Nov 25, 2012 at 1:12 PM, Pawel Jakub Dawidek p...@freebsd.org wrote:
 On Sun, Nov 25, 2012 at 12:42:16PM +, Attilio Rao wrote:
 On Sun, Nov 25, 2012 at 12:39 PM, Pawel Jakub Dawidek p...@freebsd.org 
 wrote:
  WITNESS is a development tool. We don't ship production kernels with
  WITNESS even compiled in. What is more efficient use of developer time:
  going through full reboot cycle every time or reading the warning from
  console, unloading a module, fixing the bug and loading it again?
 
  And if this option is turned off by default what is the problem?

 Yes, so, why do you write here?

 I'm trying to understand why you object. Until now the only concern
 of yours that I have found is that you are afraid of it being abused. I
 don't see how this can be abused if it is turned off by default. If someone
 commits a change that turns it on by default, believe me, I'll
 unleash hell personally.

So I don't understand what you are proposing.
You are not proposing to switch BLESSING on, and you are not proposing
to import Adrian's patches, if I get it correctly. I don't
understand, then.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: [RFQ] make witness panic an option

2012-11-25 Thread Attilio Rao
On Sun, Nov 25, 2012 at 1:47 PM, Pawel Jakub Dawidek p...@freebsd.org wrote:
 On Sun, Nov 25, 2012 at 01:37:19PM +, Attilio Rao wrote:
 On Sun, Nov 25, 2012 at 1:12 PM, Pawel Jakub Dawidek p...@freebsd.org 
 wrote:
  On Sun, Nov 25, 2012 at 12:42:16PM +, Attilio Rao wrote:
  On Sun, Nov 25, 2012 at 12:39 PM, Pawel Jakub Dawidek p...@freebsd.org 
  wrote:
   WITNESS is a development tool. We don't ship production kernels with
   WITNESS even compiled in. What is more efficient use of developer time:
   going through full reboot cycle every time or reading the warning from
   console, unloading a module, fixing the bug and loading it again?
  
   And if this option is turned off by default what is the problem?
 
  Yes, so, why do you write here?
 
  I'm trying to understand why do you object. Until now the only concern
  you have that I found is that you are afraid of it being abused. I don't
  see how this can be abused if it is turned off by default. If someone
  will commit a change that will turn it on by default, believe me, I'll
  unleash hell personally.

 So I don't understand what are you proposing.
 You are not proposing to switch BLESSING on and you are not proposing
 to import Adrian's patches in, if I get it correctly. I don't
 understand then.

 I propose to get Adrian's patches in, just leaving the current behaviour as
 the default.

So if I say that I'm afraid this mechanism will be abused (and
believe me, I really wanted to trim out the BLESSING stuff for the
same reason) and you say you can't see how, there is not much we can
discuss.

You know how I think; there is no need to wait for me to reconsider,
because I don't believe that will happen with arguments like "I don't
think", "I don't agree", etc.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: stop_cpus_hard when multiple CPUs are panicking from an NMI

2012-11-25 Thread Attilio Rao
On Sun, Nov 25, 2012 at 12:55 PM, Andriy Gapon a...@freebsd.org wrote:
 on 25/11/2012 14:29 Attilio Rao said the following:
 I think the patch you propose makes such effects even worse, because
 it disables interrupts in generic_stop_cpus().
 What I suggest to do, is the following:
 - The CPU which wins the race for generic_stop_cpus also signals the
 CPUs it is willing to stop on a global mask
 - Another CPU entering generic_stop_cpus() and losing the race
 checks the mask of CPUs which might be stopped and stops itself if
 necessary (i.e. not yet done). We must be careful with races here, but
 I'm confident this can be done easily enough.

 I think that you either misunderstood my patch or I misunderstand your
 suggestion, because my patch does exactly what you wrote above.

The patch is somewhat incomplete:
- I don't think that we need specific checks in cpustop_handler() (and
if you have added them to prevent races, I don't think they are
enough, see below)
- setting of the stopping_cpus map must happen atomically/before the
stopper_cpu cpuid setting, otherwise some CPUs may end up using a NULL
mask in the check
- Did you consider the races where a stop and restart request
happen just after the CPU_ISSET() check? I think CPUs can deadlock
there.
- I'm very dubious about the spinlock_enter() stuff; I think it can
just make the problem worse atm.

However you are right, the concept of your patch is the same as what I
really wanted to get; we maybe just need to lift it up a bit.

In the meantime I also double-checked suspended_cpus and I don't think
there are real showstoppers to having it in the stopped_cpus map.

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: [RFQ] make witness panic an option

2012-11-25 Thread Attilio Rao
On Sun, Nov 25, 2012 at 2:06 PM, Pawel Jakub Dawidek p...@freebsd.org wrote:
 On Sun, Nov 25, 2012 at 01:48:23PM +, Attilio Rao wrote:
 On Sun, Nov 25, 2012 at 1:47 PM, Pawel Jakub Dawidek p...@freebsd.org 
 wrote:
  On Sun, Nov 25, 2012 at 01:37:19PM +, Attilio Rao wrote:
  On Sun, Nov 25, 2012 at 1:12 PM, Pawel Jakub Dawidek p...@freebsd.org 
  wrote:
   On Sun, Nov 25, 2012 at 12:42:16PM +, Attilio Rao wrote:
   On Sun, Nov 25, 2012 at 12:39 PM, Pawel Jakub Dawidek 
   p...@freebsd.org wrote:
WITNESS is a development tool. We don't ship production kernels with
WITNESS even compiled in. What is more efficient use of developer 
time:
going through full reboot cycle every time or reading the warning 
from
console, unloading a module, fixing the bug and loading it again?
   
And if this option is turned off by default what is the problem?
  
   Yes, so, why do you write here?
  
   I'm trying to understand why do you object. Until now the only concern
   you have that I found is that you are afraid of it being abused. I don't
   see how this can be abused if it is turned off by default. If someone
   will commit a change that will turn it on by default, believe me, I'll
   unleash hell personally.
 
  So I don't understand what are you proposing.
  You are not proposing to switch BLESSING on and you are not proposing
  to import Adrian's patches in, if I get it correctly. I don't
  understand then.
 
  I propose to get Adrian's patches in, just leave current behaviour as
  the default.

 So if I tell that I'm afraid this mechanism will be abused (and
 believe me, I really wanted to trimm out BLESSING stuff also for the
 same reason) and you say you can't see how there is not much we can
 discuss.

 This is not what I said. I would see it as abuse if someone
 suddenly decided to turn off locking assertions by default in the FreeBSD
 base.

 If he turns that off on his private machine, be it to speed up his
 development (a good thing) or to shut up an important lock assertion (a bad
 thing), this is entirely his decision. He can already do that, having all
 the source code; it's just more complex. Make tools, not policies.

 BLESSING is a totally different subject. You were afraid that people would
 start to silence LORs they don't understand by committing blessed pairs
 to the FreeBSD base. That situation would be abuse and I fully agree, but I
 also still think BLESSING is useful, although I recognize it might be
 hard to prevent the mentioned abuse.

 In case of Adrian's patch nothing will change in how we enforce locking
 assertions in FreeBSD base.

 You know how I think, there is no need to wait for me to reconsider,
 because I don't believe this will happen with arguments like I don't
 think, I don't agree, etc.

 I provide valid arguments with, I hope, proper explanation; you choose not
 to address them or to ignore them, and I hope this will change :)

I'm not ignoring them; I'm saying that your arguments are not
convincing enough to me.
And really, the possibility of turning off assertions in WITNESS
is already a dangerous tool I want to avoid (not only in relation to
BLESSING). If there are some cases that deserve a panic, we should just
get it, no matter how the sysctls are set up.

However, it seems to me I have been saying the same thing for 20
e-mails now, so please drop me from CC in your next follow-up. As I said,
you can commit all the changes you want (assuming they are technically
correct), even if I would appreciate my disagreement being expressed in
the commit message.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: [RFQ] make witness panic an option

2012-11-25 Thread Attilio Rao
On Sun, Nov 25, 2012 at 1:12 PM, Pawel Jakub Dawidek p...@freebsd.org wrote:
 On Sun, Nov 25, 2012 at 12:42:16PM +, Attilio Rao wrote:
 On Sun, Nov 25, 2012 at 12:39 PM, Pawel Jakub Dawidek p...@freebsd.org 
 wrote:
  WITNESS is a development tool. We don't ship production kernels with
  WITNESS even compiled in. What is more efficient use of developer time:
  going through full reboot cycle every time or reading the warning from
  console, unloading a module, fixing the bug and loading it again?
 
  And if this option is turned off by default what is the problem?

 Yes, so, why do you write here?

 I'm trying to understand why do you object. Until now the only concern
 you have that I found is that you are afraid of it being abused. I don't
 see how this can be abused if it is turned off by default. If someone
 will commit a change that will turn it on by default, believe me, I'll
 unleash hell personally.

 As I said, WITNESS is a development tool, a very handy one. This doesn't
 mean we can't make it even more handy. It is there to help find bugs
 faster, right? Adrian is proposing a change that will make it help to
 find and fix bugs maybe even faster.

 Go ahead and fix BLESSED, make it the default, etc.

 This is another story, but BLESSED is much less controversial to me.
 It is turned off by default on the assumption that all the code that runs in
 our kernel is developed for FreeBSD, which is not true. For example, ZFS
 is, I think, the biggest locking consumer in our kernel (around 120
 locks), which wasn't originally developed for FreeBSD, and its locking
 order was verified using different tools. Now on FreeBSD it triggers massive
 LOR warnings from WITNESS, even though those are not bugs. At some point
 I verified many of them and they were all false positives, so I simply
 turned off WITNESS warnings for ZFS locks. Why? Because BLESSED is
 turned off in fear of abuse, and this in turn is the cause of the mentioned
 hack in ZFS.

Just a few notes about this: to my knowledge you never discussed
this publicly. WITNESS is not a perfect tool and it has several
issues (I tried to summarize them a bit at the beginning of this
thread), where the biggest problem is that its file/line approach
doesn't help when you want to shut down a specific LOR without shutting
down all of those involving the lock pair, etc.

I have no idea what problems you are facing here with WITNESS
and ZFS, but if you could summarize them we can work on getting
something which is useful also with ZFS.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: stop_cpus_hard when multiple CPUs are panicking from an NMI

2012-11-25 Thread Attilio Rao
On Sun, Nov 25, 2012 at 3:39 PM, Andriy Gapon a...@freebsd.org wrote:
 on 25/11/2012 16:01 Attilio Rao said the following:
 On Sun, Nov 25, 2012 at 12:55 PM, Andriy Gapon a...@freebsd.org wrote:
 on 25/11/2012 14:29 Attilio Rao said the following:
 I think the patch you propose makes such effects even worse, because
 it disables interrupts in generic_stop_cpus().
 What I suggest to do, is the following:
 - The CPU which wins the race for generic_stop_cpus also signals the
 CPUs it is willing to stop on a global mask
 - Another CPU entering generic_stop_cpus() and loosing the race,
 checks the mask of cpus which might be stopped and stops itself if
 necessary (ie. not yet done). We must be careful with races here, but
 I'm confindent this can be done easily enough.

 I think that you either misunderstood my patch or I misunderstand your
 suggestion, because my patch does exactly what you wrote above.

 The patch is someway incomplete:
 - I don't think that we need specific checks in cpustop_handler() (and
 if you have added them to prevent races, I don't think they are
 enough, see below)

 The check is to cover the following scenario:
 - two CPUs race in hard-stop scenario, and both are in NMI contexts

Please consider also another type of (similar) race: two CPUs race in
a (normal) stop scenario and the loser has interrupts disabled. I
don't think this will be completely fatal, but I'm sure it can lead to
issues (double stop, etc.).

 - one CPU wins and the other spins right in generic_stop_cpus
 - the NMI for the spinning CPU is blocked and remains pending
 - the winning CPU releases all other CPUs
 - the spinning CPU exits the NMI context and gets the NMI which was pending

 - setting of stopping_cpus map must happen atomically/before the
 stopper_cpu cpuid setting, otherwise some CPUs may end up using a NULL
 mask in the check

 Not NULL, but empty or stale.  But a very good point, I agree.
 The logic must be redone.

 - Did you consider the races about when a stop and restart request
 happen just after the CPU_ISSET() check? I think CPUs can deadlock
 there.

 Yeah, good point again.  This seems to be a different side of the issue above.
 stopping_cpus is probably a bad idea.

 - I'm very doubious about the spinlock_enter() stuff, I think I can
 just make the problem worse atm.

 Well, this is where I disagree.  I think that cpu_stop must already be
 called in a context which effectively disables preemption and interrupts.
 The added spinlock_enter() stuff is kind of a safety belt to make things
 more explicit, but it could be changed into some sort of assert if that's
 possible.

Maybe it can become really safe once we take care of all the races
involved and reported above. I will suspend my judgement until
we take care of the other races.


 However you are right, the concept of your patch is the same I really
 wanted to get, we maybe need to just lift it up a bit.

 I agree.

 To add some fuel to the fire: do you recall my wish to drop the mask
 parameter from the stop calls?  If that were done then it would be simpler
 to handle these things.  In that case only the stopper_cpu ID
 (master/winner) would be needed.

If you really want to do something like that, please rename it
(s/generic_stop_cpus/generic_stop_butself/ or a similar convention) and I
may not be opposed to it.

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: stop_cpus_hard when multiple CPUs are panicking from an NMI

2012-11-25 Thread Attilio Rao
On Sun, Nov 25, 2012 at 7:20 PM, Andriy Gapon a...@freebsd.org wrote:
 on 25/11/2012 20:05 Attilio Rao said the following:
 If you really want to do something like that please rename
 s/generic_stop_cpus/generic_stop_butself() or similar convention and I
 may be not opposed to it.

 As we discussed before, anything else besides "all but self" does not make
 sense.  So I am not sure what's the point of being that verbose in the
 naming.

To avoid a POLA violation and to show there is a change in the KPI?

I'd personally prefer a new name, but I'm open on not getting it.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: stop_cpus_hard when multiple CPUs are panicking from an NMI

2012-11-16 Thread Attilio Rao
On Fri, Nov 16, 2012 at 7:54 AM, Andriy Gapon a...@freebsd.org wrote:
 on 16/11/2012 00:58 Ryan Stone said the following:
 At work we have some custom watchdog hardware that sends an NMI upon
 expiry.  We've modified the kernel to panic when it receives the watchdog
 NMI.  I've been trying the stop_scheduler_on_panic mode, and I've
 discovered that when my watchdog expires, the system gets completely
 wedged.  After some digging, I've discovered that I have multiple CPUs
 getting the watchdog NMI and trying to panic concurrently.  One of the
 CPUs wins, and the rest spin forever in this code:

 /*
  * We don't want multiple CPU's to panic at the same time, so we
  * use panic_cpu as a simple spinlock.  We have to keep checking
  * panic_cpu if we are spinning in case the panic on the first
  * CPU is canceled.
  */
 if (panic_cpu != PCPU_GET(cpuid))
 	while (atomic_cmpset_int(&panic_cpu, NOCPU,
 	    PCPU_GET(cpuid)) == 0)
 		while (panic_cpu != NOCPU)
 			; /* nothing */

 The system wedges when stop_cpus_hard() is called, which sends NMIs to all
 of the other CPUs and waits for them to acknowledge that they are stopped
 before returning.  However the CPU will not deliver an NMI to a CPU that is
 already handling an NMI, so the other CPUs that got a watchdog NMI and are
 spinning will never go into the NMI handler and acknowledge that they are
 stopped.

 I thought about this issue and fixed it (in my tree) in a different way:
 http://people.freebsd.org/~avg/cpu_stop-race.diff

 The need for spinlock_enter in the patch is not entirely clear.
 The main idea is that a CPU which calls cpu_stop and loses a race should
 voluntary enter cpustop_handler.
 I am also not sure about MI-cleanness of this patch.

It is similar to what I propose but with some differences:
- It is not clean from an MI perspective
- I don't think we need to treat it specially, I would just
unconditionally stop all the CPUs entering the spinlock zone,
making the patch simpler.

So I guess you are really fine with the proposal I made?

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: [RFQ] make witness panic an option

2012-11-16 Thread Attilio Rao
On Fri, Nov 16, 2012 at 7:19 AM, Andriy Gapon a...@freebsd.org wrote:
 on 16/11/2012 01:38 Attilio Rao said the following:
 On Thu, Nov 15, 2012 at 8:51 PM, Andriy Gapon a...@freebsd.org wrote:
 on 15/11/2012 22:00 Adrian Chadd said the following:
 But I think my change is invaluable for development, where you want to
 improve and debug the locking and lock interactions of a subsystem.

 My practical experience was that if you mess up one lock in one place, then
 it is a total mess further on.  But apparently you've got a different
 practical experience :)
 What would indeed be invaluable to _me_ is if the LOR messages also
 produced the stack(s) where a supposedly correct lock order was learned.

 Please note that the supposedly correct lock order, by definition of
 being correct, can be used in several different
 stacks. I don't see the point of saving it somewhere.
 The only helpful case would be if the wrong order is caught first.
 If this is really the case, I suggest you force the order you
 expect in the static table so that the first time the wrong order
 happens it yells.

 Exactly my point - if I am a user of some code, not its developer, and I don't
 know which one is the correct order, I would have had the complete information
 from the very start instead of having to jump through hoops.

I don't agree -- such information is only useful to developers, and
also what should we do, store all the valid stacktraces?
I honestly don't understand what you are expecting here.

Attilio




Re: stop_cpus_hard when multiple CPUs are panicking from an NMI

2012-11-16 Thread Attilio Rao
On Fri, Nov 16, 2012 at 1:18 PM, Andriy Gapon a...@freebsd.org wrote:
 on 16/11/2012 14:30 Attilio Rao said the following:
 On Fri, Nov 16, 2012 at 7:54 AM, Andriy Gapon a...@freebsd.org wrote:
 on 16/11/2012 00:58 Ryan Stone said the following:
 At work we have some custom watchdog hardware that sends an NMI upon
 expiry.  We've modified the kernel to panic when it receives the watchdog
 NMI.  I've been trying the stop_scheduler_on_panic mode, and I've
 discovered that when my watchdog expires, the system gets completely
 wedged.  After some digging, I've discovered that I have multiple CPUs
 getting the watchdog NMI and trying to panic concurrently.  One of the CPUs
 wins, and the rest spin forever in this code:

 /*
  * We don't want multiple CPU's to panic at the same time, so we
  * use panic_cpu as a simple spinlock.  We have to keep checking
  * panic_cpu if we are spinning in case the panic on the first
  * CPU is canceled.
  */
 if (panic_cpu != PCPU_GET(cpuid))
 while (atomic_cmpset_int(&panic_cpu, NOCPU,
 PCPU_GET(cpuid)) == 0)
 while (panic_cpu != NOCPU)
 ; /* nothing */

 The system wedges when stop_cpus_hard() is called, which sends NMIs to all
 of the other CPUs and waits for them to acknowledge that they are stopped
 before returning.  However the CPU will not deliver an NMI to a CPU that is
 already handling an NMI, so the other CPUs that got a watchdog NMI and are
 spinning will never go into the NMI handler and acknowledge that they are
 stopped.

 I thought about this issue and fixed it (in my tree) in a different way:
 http://people.freebsd.org/~avg/cpu_stop-race.diff

 The need for spinlock_enter in the patch is not entirely clear.
 The main idea is that a CPU which calls cpu_stop and loses a race should
 voluntarily enter cpustop_handler.
 I am also not sure about MI-cleanness of this patch.

 It is similar to what I propose but with some differences:
 - It is not clean from MI perspective

 OK.

 - I don't think we need to treat it specially; I would just
 unconditionally stop all the CPUs entering the spinlock zone,
 making the patch simpler.

 I couldn't understand this part.

I'm sorry, I think I misread your patch.

I was basically suggesting to fix Ryan's case by calling
cpustop_handler() (or the new MI interface) from the panic() function,
in case the CPU doesn't win the race for panic_cpu.
Basically doing:
Index: sys/kern/kern_shutdown.c
===================================================================
--- sys/kern/kern_shutdown.c	(revision 243154)
+++ sys/kern/kern_shutdown.c	(working copy)
@@ -568,15 +568,11 @@ panic(const char *fmt, ...)
 #ifdef SMP
/*
 * We don't want multiple CPU's to panic at the same time, so we
-* use panic_cpu as a simple spinlock.  We have to keep checking
-* panic_cpu if we are spinning in case the panic on the first
-* CPU is canceled.
+* use panic_cpu as a simple lock.
 */
if (panic_cpu != PCPU_GET(cpuid))
-   while (atomic_cmpset_int(&panic_cpu, NOCPU,
-   PCPU_GET(cpuid)) == 0)
-   while (panic_cpu != NOCPU)
-   ; /* nothing */
+   if (atomic_cmpset_int(&panic_cpu, NOCPU, PCPU_GET(cpuid)) == 0)
+   cpustop_handler();

if (stop_scheduler_on_panic) {
if (panicstr == NULL  !kdb_active) {


In fact it seems to me that the comment is outdated and no longer
represents the truth.

Attilio




Re: [RFQ] make witness panic an option

2012-11-15 Thread Attilio Rao
On 11/15/12, Adrian Chadd adr...@freebsd.org wrote:
 On 15 November 2012 05:27, Giovanni Trematerra
 giovanni.tremate...@gmail.com wrote:

 I really do think that is a very bad idea.
 When a locking assertion fails you just have to stop and think about
 what's wrong;
 there is no way to postpone this.

 Not all witness panics are actually fatal. For a developer who is
 sufficiently cluey in their area, they are quite likely able to just
 stare at the code paths for a while to figure out why the
 incorrectness occurred.

The problem is that such a mechanism can be abused, just like the
BLESSING one, and that's why it is disabled by default.

I believe having a mechanism to use printf for witness is not a good idea.

 As I said, I do this primarily so I can sprinkle lots of lock
 owned/unowned assertions around my driver(s) and then use that to
 catch when things aren't being correct. Having to reboot upon _every_
 lock assertion quickly got old.

You can use it as a local patch then. This is not really the usual way
to develop locking policies.

Attilio




Re: [RFQ] make witness panic an option

2012-11-15 Thread Attilio Rao
On 11/15/12, Ian Lepore free...@damnhippie.dyndns.org wrote:
 On Wed, 2012-11-14 at 22:15 -0800, Adrian Chadd wrote:
 Hi all,

 When debugging and writing wireless drivers/stack code, I like to
 sprinkle lots of locking assertions everywhere. However, this does
 cause things to panic quite often during active development.

 This patch (against stable/9) makes the actual panic itself
 configurable. It still prints the message regardless.

 This has allowed me to sprinkle more locking assertions everywhere to
 investigate whether particular paths have been hit or not. I don't
 necessarily want those to panic the kernel.

 I'd like everyone to consider this for FreeBSD-HEAD.

 Thanks!

 I strongly support this, because I'm tired of having to hack it in by
 hand every time I need it.

 You can't boot an arm platform right now (on freebsd 8, 9, or 10)
 without a LOR very early in the boot.  Once you get past that, 2 or 3
 device drivers I use panic way before we even get to mounting root.
 Those panics can clearly be ignored, because we've been shipping
 products for years based on this code.  (It's on my to-do list to fix
 them, but more pressing problems are higher on the list.)

This is a ridiculous motivation.
What are the panics in question? Why are they not fixed yet?
Without WITNESS_KDB you should not panic even in cases where WITNESS
yells. So if you do, it means there is a more subtle breakage going
on here.

Do you really think that an abusable mechanism will help here rather
than fixing the actual problems?

 When a new problem crops up that isn't harmless, it totally sucks that I
 can't just turn on witness without first hacking the code to make the
 known problems non-panicky.

I really don't understand what these harmless problems are.
I just know one and it is between the dirhash lock and the bufwait
lock for UFS, which is carefully documented in the code comments. All
the others cases haven't been analyzed deeply enough to quantify them
as harmless.

Can you please give real examples?

Attilio




Re: [RFQ] make witness panic an option

2012-11-15 Thread Attilio Rao
On 11/15/12, Warner Losh i...@bsdimp.com wrote:

 On Nov 15, 2012, at 10:47 AM, Attilio Rao wrote:

 On 11/15/12, Ian Lepore free...@damnhippie.dyndns.org wrote:
 On Wed, 2012-11-14 at 22:15 -0800, Adrian Chadd wrote:
 Hi all,

 When debugging and writing wireless drivers/stack code, I like to
 sprinkle lots of locking assertions everywhere. However, this does
 cause things to panic quite often during active development.

 This patch (against stable/9) makes the actual panic itself
 configurable. It still prints the message regardless.

 This has allowed me to sprinkle more locking assertions everywhere to
 investigate whether particular paths have been hit or not. I don't
 necessarily want those to panic the kernel.

 I'd like everyone to consider this for FreeBSD-HEAD.

 Thanks!

 I strongly support this, because I'm tired of having to hack it in by
 hand every time I need it.

 You can't boot an arm platform right now (on freebsd 8, 9, or 10)
 without a LOR very early in the boot.  Once you get past that, 2 or 3
 device drivers I use panic way before we even get to mounting root.
 Those panics can clearly be ignored, because we've been shipping
 products for years based on this code.  (It's on my to-do list to fix
 them, but more pressing problems are higher on the list.)

 This is a ridiculous motivation.
 What are the panics in question? Why are they not fixed yet?
 Without WITNESS_KDB you should not panic even in cases where WITNESS
 yells. So if you do, it means there is a more subtle breakage going
 on here.

 Do you really think that an abusable mechanism will help here rather
 than fixing the actual problems?

 When a new problem crops up that isn't harmless, it totally sucks that I
 can't just turn on witness without first hacking the code to make the
 known problems non-panicky.

 I really don't understand what these harmless problems are.
 I just know one and it is between the dirhash lock and the bufwait
 lock for UFS, which is carefully documented in the code comments. All
 the other cases haven't been analyzed deeply enough to quantify them
 as harmless.

 Can you please give real examples?

 It sounds like he's more worried about introducing LoRs into his wireless
 code. They are harmless, for him, and he can fix them by reloading the
 driver. They are only harmful if he loses a race.

Without WITNESS_KDB the kernel won't panic if this is really the case,
if it is only about LOR yelling.
Otherwise the breakage is more serious and I would like to see a real
case example.

Attilio




Re: [RFQ] make witness panic an option

2012-11-15 Thread Attilio Rao
On 11/15/12, Adrian Chadd adr...@freebsd.org wrote:
 On 15 November 2012 09:56, Warner Losh i...@bsdimp.com wrote:

 Do you really think that an abusable mechanism will help here rather

 It sounds like he's more worried about introducing LoRs into his wireless
 code. They are harmless, for him, and he can fix them by reloading the
 driver. They are only harmful if he loses a race.

 LORs and lock assertions. I.e., I want to find out at run time that a
 lock wasn't held at a certain position but continue soldiering on.
 That's how I've been finding out all of the races in net80211/ath, as
 there wasn't locking where there needed to be.

I think that your worries are focused more around the latter than the
former, which can be easily shut down already today.

And frankly I will never be in favor of a patch that automatically
shutdowns lock assertion. Please patch your local code to do so but
don't add any generic/upstream/all-around mechanism for that.

Attilio




Re: [RFQ] make witness panic an option

2012-11-15 Thread Attilio Rao
On 11/15/12, Adrian Chadd adr...@freebsd.org wrote:
 On 15 November 2012 10:01, Attilio Rao atti...@freebsd.org wrote:
 I think that your worries are focused more around the latter than the
 former, which can be easily shut down already today.

 And frankly I will never be in favor of a patch that automatically
 shutdowns lock assertion. Please patch your local code to do so but
 don't add any generic/upstream/all-around mechanism for that.

 Would a compromise be OK? I.e., if I pushed everything but the sysctl
 upstream, and just defaulted it to always panic?

 That way my diff wouldn't have to be a big thing; I'd just add the sysctl.

I cannot forbid you from doing anything, I'm just giving you my
opinion as the person who co-authored current WITNESS and locking
primitives code.

I think what you want to do is dangerous and highly abusable, so I'm
not in favor of it at all, in whatever form it is. I understand you
want to minimize your development patchset with upstream, but I think
this is certainly not the way to go.
That's also why I never formalized the BLESSING mechanism in WITNESS,
for example. I already see WITNESS_KDB as an abuse, but at least until
we have a way to mark specific LORs (based on file/line, unfortunately)
as harmless, there is no way we can get rid of
WITNESS_KDB.

Said all that, you are free to do what you want, but if you commit
anything in this area make sure your patch is reviewed by someone else
and state my firm disagreement with this approach in the commit
message.

Attilio




Re: LK_SHARED/LK_DOWNGRADE adjustments to lock.9 manual page

2012-11-15 Thread Attilio Rao
On 11/15/12, Andriy Gapon a...@freebsd.org wrote:

 To people knowing the code,

 do the following documentation changes look correct?

The latter chunk is not correct.
It will panic only if assertions are on. I was thinking, however, that
it would be a good idea to patch lockmgr to panic also in a
non-debugging kernel.

Attilio




Re: LK_SHARED/LK_DOWNGRADE adjustments to lock.9 manual page

2012-11-15 Thread Attilio Rao
On Thu, Nov 15, 2012 at 8:38 PM, Andriy Gapon a...@freebsd.org wrote:
 on 15/11/2012 20:46 Attilio Rao said the following:
 On 11/15/12, Andriy Gapon a...@freebsd.org wrote:

 To people knowing the code,

 do the following documentation changes look correct?

 The latter chunk is not correct.
 It will panic only if assertions are on.

 But the current content is not correct either?

Indeed, current content is crappy.

 I was thinking, however, that
 it would be a good idea to patch lockmgr to panic also in a non-debugging
 kernel.

 It would make sense indeed, IMO.

Do you think you can test this patch?:
http://www.freebsd.org/~attilio/lockmgr_forcerec.patch

I think the LK_NOSHARE case is still fine with just asserts.

Once this patch goes in, you are free to commit your documentation one.
Thanks for fixing the doc.

Attilio




Re: [RFQ] make witness panic an option

2012-11-15 Thread Attilio Rao
On Thu, Nov 15, 2012 at 8:51 PM, Andriy Gapon a...@freebsd.org wrote:
 on 15/11/2012 22:00 Adrian Chadd said the following:
 But I think my change is invaluable for development, where you want to
 improve and debug the locking and lock interactions of a subsystem.

 My practical experience was that if you mess up one lock in one place, then it
 is a total mess further on.  But apparently you've got a different practical
 experience :)
 What would indeed be invaluable to _me_ - if the LOR messages also produced the
 stack(s) where a supposedly correct lock order was learned.

Please note that the supposedly correct lock order, by definition of
being correct, can be used in several different
stacks. I don't see the point of saving it somewhere.
The only helpful case would be if the wrong order is caught first.
If this is really the case, I suggest you force the order you
expect in the static table so that the first time the wrong order
happens it yells.

Attilio




Re: stop_cpus_hard when multiple CPUs are panicking from an NMI

2012-11-15 Thread Attilio Rao
On Thu, Nov 15, 2012 at 10:58 PM, Ryan Stone ryst...@gmail.com wrote:
 At work we have some custom watchdog hardware that sends an NMI upon
 expiry.  We've modified the kernel to panic when it receives the watchdog
 NMI.  I've been trying the stop_scheduler_on_panic mode, and I've
 discovered that when my watchdog expires, the system gets completely
 wedged.  After some digging, I've discovered that I have multiple CPUs
 getting the watchdog NMI and trying to panic concurrently.  One of the CPUs
 wins, and the rest spin forever in this code:

Quick question: can you control the way your watchdog sends the NMI?
Like only to the BSP rather than broadcast, etc.
This is tied to the very unique situation that you cannot really
deliver the (second) NMI.

Attilio




Re: stop_cpus_hard when multiple CPUs are panicking from an NMI

2012-11-15 Thread Attilio Rao
On Thu, Nov 15, 2012 at 11:47 PM, Ryan Stone ryst...@gmail.com wrote:
 On Thu, Nov 15, 2012 at 6:41 PM, Attilio Rao atti...@freebsd.org wrote:

 On Thu, Nov 15, 2012 at 10:58 PM, Ryan Stone ryst...@gmail.com wrote:
  At work we have some custom watchdog hardware that sends an NMI upon
  expiry.  We've modified the kernel to panic when it receives the
  watchdog
  NMI.  I've been trying the stop_scheduler_on_panic mode, and I've
  discovered that when my watchdog expires, the system gets completely
  wedged.  After some digging, I've discovered that I have multiple
  CPUs
  getting the watchdog NMI and trying to panic concurrently.  One of the
  CPUs
  wins, and the rest spin forever in this code:

 Quick question: can you control the way your watchdog sends the NMI?
 Like only to BSP rather than broadcast, etc.
 This is tied to the very unique situation that you cannot really
 deliver the (second) NMI.

 Attilio


 --
 Peace can only be achieved by understanding - A. Einstein


 I don't believe that I can, but I can check.  In any case I can imagine
 other places where this could be an issue.  hwpmc works with NMIs, right?
 So an hwpmc bug could trigger the same kind of issues if two CPUs that
 concurrently called pmc_intr both tripped over the same bug.

Frankly, I think that what you were trying to do is in some ways the right
approach, modulo a clean interface.

I don't understand why the spinlock wants to spin forever, as it
can never recover. Stopping the CPUs that get into the spinlock is
perfectly fine.
There are only 2 things to consider:
1) I think we need a new KPI for that, a function in
$arch/include/cpu.h that takes care of stopping a CPU in an MI way,
for example cpu_self_stop(). This needs to be implemented for all the
architectures but it can be done easily because it will be what
cpustop_handler() and similar functions do, basically.
2) The fake spinlock path will call such a function. The only thing
to debate IMHO is whether we want to do that conditional on
stop_scheduler_on_panic or not. If I have to be honest, stopping the
CPU seems the best approach in any case to me, but I'm open to hear
what you think.

Comments?

Attilio




Re: ule+smp: small optimization for turnstile priority lending

2012-11-06 Thread Attilio Rao
On 10/29/12, Attilio Rao atti...@freebsd.org wrote:
 On Mon, Oct 29, 2012 at 4:56 PM, Attilio Rao atti...@freebsd.org wrote:
 On Wed, Oct 3, 2012 at 3:05 PM, Andriy Gapon a...@freebsd.org wrote:
 on 20/09/2012 16:14 Attilio Rao said the following:
 On 9/20/12, Andriy Gapon a...@freebsd.org wrote:
 [snip]
 The patch works well as far as I can tell.  Thank you!
 There is one warning with full WITNESS enabled but it appears to be
 harmless (so far):

 Andriy,
 thanks a lot for your testing and reports you made so far.
 Unfortunately I'm going off for 2 weeks now and I won't work on
 FreeBSD for that timeframe. I will get back to those in 2 weeks then.
 If you want  to play more with this idea feel free to extend/fix/etc.
 this patch.

 Unfortunately I haven't found time to work on this further, but I have some
 additional thoughts.

 First, I think that the witness message was not benign and it actually
 warns about the same kind of deadlock that I originally had.
 The problem is that sched_lend_prio() is called with target thread's
 td_lock held, which is a lock of tdq on the thread's CPU.  Then, in
 your patch, we acquire current tdq's lock to modify its load.  But if
 two tdq locks are to be held at the same time, then they must be
 locked using tdq_lock_pair, so that lock order is maintained.  With
 the patch no tdq lock order can be maintained (it can be arbitrary)
 and thus it is possible to run into a deadlock.

 Indeed. I realized this after thinking about this problem while I was
 on holiday.


 I see two possibilities here, but don't rule out that there can be more.

 1. Do something like my patch does.  That is, manipulate current
 thread's tdq in advance before any other thread/tdq locks are acquired
 (or td_lock is changed to point to a different lock and current tdq is
 unlocked).  The API can be made more generic in nature.  E.g. it can
 look like this:
 void
 sched_thread_block(struct thread *td, int inhibitor)
 {
 struct tdq *tdq;

 THREAD_LOCK_ASSERT(td, MA_OWNED);
 KASSERT(td == curthread,
     ("sched_thread_block: only curthread is supported"));
 tdq = TDQ_SELF();
 TDQ_LOCK_ASSERT(tdq, MA_OWNED);
 MPASS(td->td_lock == TDQ_LOCKPTR(tdq));
 TD_SET_INHIB(td, inhibitor);
 tdq_load_rem(tdq, td);
 tdq_setlowpri(tdq, td);
 }


 2. Try to do things from sched_lend_prio based on curthread's state.
 This, as it seems, would require completely lock-less manipulations of
 current tdq.  E.g. for tdq_load we could use atomic operations (it is
 already accessed locklessly, but not modified).  But for tdq_lowpri a
 more elaborate trick would be required, like having a separate field
 for a temporary value.

 No, this is not going to work because tdq_lowpri and tdq_load are
 updated and used in the same critical path (and possibly also other
 tdq* members in tdq_runqueue_add(), for example, but I didn't bother
 to also check that).

 Ok, so here I wasn't completely right because tdq_lowpri and tdq_load
 are not read in the same critical path (cpu_search() and
 sched_pickcpu() in the self case).
 However, I was over-estimating a factor in this patch: I don't think
 we need to fix real tdq_load for self in that case before cpu_search()
 because self is handled in a very special way, with its own
 comparison. I then think that a patch local to sched_pickcpu() on the
 self path should be enough.

 This brings us to another, bigger issue, however. tdq_lowpri handling
 is not perfect in that patch. Think about the case where the thread to
 be switched out is, in fact, the lowest-priority one. In that case,
 what we should do is recalculate tdq_lowpri,
 which is usually a very expensive operation and requires TDQ self
 locking to be in place to be carried out correctly.

I think that the easy fix for this is to implement a second field
called tdq_sec_lowpri where we store the next thread to be scheduled,
after tdq_lowpri. This is not really going to change anything because
we will get it for free when we already calculate tdq_lowpri, and it
will let us implement a definitive logic that helps in the case
you describe.

However I still want to think about the base idea behind the current
algorithm in the self/likely cpu pickup.

Attilio




Re: ule+smp: small optimization for turnstile priority lending

2012-11-06 Thread Attilio Rao
On 9/20/12, David Xu davi...@freebsd.org wrote:
 On 2012/09/18 22:05, Andriy Gapon wrote:

 Here is a snippet that demonstrates the issue on a supposedly fully loaded
 2-processor system:

 136794   0 3670427870244462 KTRGRAPH group:thread, id:Xorg tid
 102818,
 state:running, attributes: prio:122

 136793   0 3670427870241000 KTRGRAPH group:thread, id:cc1plus tid
 111916,
 state:yielding, attributes: prio:183, wmesg:(null), lockname:(null)

 136792   1 3670427870240829 KTRGRAPH group:thread, id:idle: cpu1 tid
 14,
 state:running, attributes: prio:255

 136791   1 3670427870239520 KTRGRAPH group:load, id:CPU 1 load,
 counter:0,
 attributes: none

 136790   1 3670427870239248 KTRGRAPH group:thread, id:firefox tid
 113473,
 state:blocked, attributes: prio:122, wmesg:(null), lockname:unp_mtx

 136789   1 3670427870237697 KTRGRAPH group:load, id:CPU 0 load,
 counter:2,
 attributes: none

 136788   1 3670427870236394 KTRGRAPH group:thread, id:firefox tid
 113473,
 point:wokeup, attributes: linkedto:Xorg tid 102818

 136787   1 3670427870236145 KTRGRAPH group:thread, id:Xorg tid
 102818,
 state:runq add, attributes: prio:122, linkedto:firefox tid 113473

 136786   1 3670427870235981 KTRGRAPH group:load, id:CPU 1 load,
 counter:1,
 attributes: none

 136785   1 3670427870235707 KTRGRAPH group:thread, id:Xorg tid
 102818,
 state:runq rem, attributes: prio:176

 136784   1 3670427870235423 KTRGRAPH group:thread, id:Xorg tid
 102818,
 point:prio, attributes: prio:176, new prio:122, linkedto:firefox tid
 113473

 136783   1 3670427870202392 KTRGRAPH group:thread, id:firefox tid
 113473,
 state:running, attributes: prio:104

 See how the Xorg thread was forced from CPU 1 to CPU 0 where it
 preempted
 cc1plus thread (I do have preemption enabled) only to leave CPU 1 with
 zero load.

 Here is a proposed solution:

  turnstile_wait: optimize priority lending to a thread on a runqueue

  As the current thread is definitely going into mi_switch, it now removes
  its load before doing priority propagation which can potentially result
  in sched_add.  In the SMP & ULE case the latter searches for the
  least loaded CPU to place a boosted thread, which is supposedly about
  to run.

 diff --git a/sys/kern/sched_ule.c b/sys/kern/sched_ule.c
 index 8e466cd..3299cae 100644
 --- a/sys/kern/sched_ule.c
 +++ b/sys/kern/sched_ule.c
 @@ -1878,7 +1878,10 @@ sched_switch(struct thread *td, struct thread *newtd, int flags)
  /* This thread must be going to sleep. */
  TDQ_LOCK(tdq);
  mtx = thread_lock_block(td);
 -tdq_load_rem(tdq, td);
 +#if defined(SMP)
 +if ((flags & SW_TYPE_MASK) != SWT_TURNSTILE)
 +#endif
 +tdq_load_rem(tdq, td);
  }
  /*
   * We enter here with the thread blocked and assigned to the
 @@ -2412,6 +2415,21 @@ sched_rem(struct thread *td)
  tdq_setlowpri(tdq, NULL);
   }

 +void
 +sched_load_rem(struct thread *td)
 +{
 +struct tdq *tdq;
 +
 +KASSERT(td == curthread,
 +    ("sched_rem_load: only curthread is supported"));
 +KASSERT(td->td_oncpu == td->td_sched->ts_cpu,
 +    ("thread running on cpu different from ts_cpu"));
 +tdq = TDQ_CPU(td->td_sched->ts_cpu);
 +TDQ_LOCK_ASSERT(tdq, MA_OWNED);
 +MPASS(td->td_lock == TDQ_LOCKPTR(tdq));
 +tdq_load_rem(tdq, td);
 +}
 +
   /*
* Fetch cpu utilization information.  Updates on demand.
*/
 diff --git a/sys/kern/subr_turnstile.c b/sys/kern/subr_turnstile.c
 index 31d16fe..d1d68e9 100644
 --- a/sys/kern/subr_turnstile.c
 +++ b/sys/kern/subr_turnstile.c
 @@ -731,6 +731,13 @@ turnstile_wait(struct turnstile *ts, struct thread *owner, int queue)
 LIST_INSERT_HEAD(&ts->ts_free, td->td_turnstile, ts_hash);
  }
  thread_lock(td);
 +#if defined(SCHED_ULE)  defined(SMP)
 +/*
 + * Remove load earlier so that it does not affect cpu selection
 + * for a thread waken up due to priority lending, if any.
 + */
 +sched_load_rem(td);
 +#endif
 thread_lock_set(td, &ts->ts_lock);
 td->td_turnstile = NULL;

 diff --git a/sys/sys/sched.h b/sys/sys/sched.h
 index 4b8387c..b1ead1b 100644
 --- a/sys/sys/sched.h
 +++ b/sys/sys/sched.h
 @@ -110,6 +110,9 @@ void sched_preempt(struct thread *td);
   void   sched_add(struct thread *td, int flags);
   void   sched_clock(struct thread *td);
   void   sched_rem(struct thread *td);
 +#if defined(SCHED_ULE)  defined(SMP)
 +void	sched_load_rem(struct thread *td);
 +#endif
   void   sched_tick(int cnt);
   void   sched_relinquish(struct thread *td);
   struct thread *sched_choose(void);


 I found another scenario in taskqueue: in the function
 taskqueue_terminate, the current thread tries to wake
 another thread up and then sleeps immediately; the tq_mutex sometimes
 is a spinlock. So if you remove one thread's load from the current cpu
 before wakeup, the resumed thread may be put on the same cpu,
 so it will optimize the cpu 

Re: ule+smp: small optimization for turnstile priority lending

2012-10-29 Thread Attilio Rao
On Wed, Oct 3, 2012 at 3:05 PM, Andriy Gapon a...@freebsd.org wrote:
 on 20/09/2012 16:14 Attilio Rao said the following:
 On 9/20/12, Andriy Gapon a...@freebsd.org wrote:
 [snip]
 The patch works well as far as I can tell.  Thank you!
 There is one warning with full WITNESS enabled but it appears to be harmless
 (so far):

 Andriy,
 thanks a lot for your testing and reports you made so far.
 Unfortunately I'm going off for 2 weeks now and I won't work on
 FreeBSD for that timeframe. I will get back to those in 2 weeks then.
 If you want  to play more with this idea feel free to extend/fix/etc.
 this patch.

 Unfortunately I haven't found time to work on this further, but I have some
 additional thoughts.

 First, I think that the witness message was not benign and it actually warns
 about the same kind of deadlock that I originally had.
 The problem is that sched_lend_prio() is called with target thread's td_lock
 held, which is a lock of tdq on the thread's CPU.  Then, in your patch, we
 acquire current tdq's lock to modify its load.  But if two tdq locks are to
 be held at the same time, then they must be locked using tdq_lock_pair, so
 that lock order is maintained.  With the patch no tdq lock order can be
 maintained (it can be arbitrary)
 and thus it is possible to run into a deadlock.

Indeed. I realized this after thinking about this problem while I was
on holiday.


 I see two possibilities here, but don't rule out that there can be more.

 1. Do something like my patch does.  That is, manipulate current thread's
 tdq in advance before any other thread/tdq locks are acquired (or td_lock is
 changed to point to a different lock and current tdq is unlocked).  The API
 can be made more generic in nature.  E.g. it can look like this:
 void
 sched_thread_block(struct thread *td, int inhibitor)
 {
 	struct tdq *tdq;
 
 	THREAD_LOCK_ASSERT(td, MA_OWNED);
 	KASSERT(td == curthread,
 	    ("sched_thread_block: only curthread is supported"));
 	tdq = TDQ_SELF();
 	TDQ_LOCK_ASSERT(tdq, MA_OWNED);
 	MPASS(td->td_lock == TDQ_LOCKPTR(tdq));
 	TD_SET_INHIB(td, inhibitor);
 	tdq_load_rem(tdq, td);
 	tdq_setlowpri(tdq, td);
 }


 2. Try to do things from sched_lend_prio based on curthread's state.  This,
 as it seems, would require completely lock-less manipulations of current
 tdq.  E.g. for tdq_load we could use atomic operations (it is already
 accessed locklessly, but not modified).  But for tdq_lowpri a more elaborate
 trick would be required, like having a separate field for a temporary value.

No, this is not going to work because tdq_lowpri and tdq_load are
updated and used in the same critical path (and possibly also other
tdq* members in tdq_runqueue_add(), for example, but I didn't bother
to also check that).

Let me think some more about this and get back to you.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: ule+smp: small optimization for turnstile priority lending

2012-10-29 Thread Attilio Rao
On Mon, Oct 29, 2012 at 4:56 PM, Attilio Rao atti...@freebsd.org wrote:
 On Wed, Oct 3, 2012 at 3:05 PM, Andriy Gapon a...@freebsd.org wrote:
 on 20/09/2012 16:14 Attilio Rao said the following:
 On 9/20/12, Andriy Gapon a...@freebsd.org wrote:
 [snip]
 The patch works well as far as I can tell.  Thank you!
 There is one warning with full witness enabled but it appears to be
 harmless (so far):

 Andriy,
 thanks a lot for your testing and reports you made so far.
 Unfortunately I'm going off for 2 weeks now and I won't work on
 FreeBSD for that timeframe. I will get back to those in 2 weeks then.
 If you want  to play more with this idea feel free to extend/fix/etc.
 this patch.

 Unfortunately I haven't found time to work on this further, but I have some
 additional thoughts.

 First, I think that the witness message was not benign and it actually warns
 about the same kind of deadlock that I originally had.
 The problem is that sched_lend_prio() is called with target thread's td_lock
 held, which is a lock of tdq on the thread's CPU.  Then, in your patch, we
 acquire current tdq's lock to modify its load.  But if two tdq locks are to
 be held at the same time, then they must be locked using tdq_lock_pair, so
 that lock order is maintained.  With the patch no tdq lock order can be
 maintained (it can be arbitrary) and thus it is possible to run into a
 deadlock.

 Indeed. I realized this after thinking about this problem while I was
 on holiday.


 I see two possibilities here, but don't rule out that there can be more.

 1. Do something like my patch does.  That is, manipulate current thread's
 tdq in advance before any other thread/tdq locks are acquired (or td_lock is
 changed to point to a different lock and current tdq is unlocked).  The API
 can be made more generic in nature.  E.g. it can look like this:
 void
 sched_thread_block(struct thread *td, int inhibitor)
 {
 	struct tdq *tdq;
 
 	THREAD_LOCK_ASSERT(td, MA_OWNED);
 	KASSERT(td == curthread,
 	    ("sched_thread_block: only curthread is supported"));
 	tdq = TDQ_SELF();
 	TDQ_LOCK_ASSERT(tdq, MA_OWNED);
 	MPASS(td->td_lock == TDQ_LOCKPTR(tdq));
 	TD_SET_INHIB(td, inhibitor);
 	tdq_load_rem(tdq, td);
 	tdq_setlowpri(tdq, td);
 }


 2. Try to do things from sched_lend_prio based on curthread's state.  This,
 as it seems, would require completely lock-less manipulations of current
 tdq.  E.g. for tdq_load we could use atomic operations (it is already
 accessed locklessly, but not modified).  But for tdq_lowpri a more elaborate
 trick would be required, like having a separate field for a temporary value.

 No, this is not going to work because tdq_lowpri and tdq_load are
 updated and used in the same critical path (and possibly also other
 tdq* members in tdq_runqueue_add(), for example, but I didn't bother
 to also check that).

Ok, so here I wasn't completely right, because tdq_lowpri and tdq_load
are not read in the same critical path (cpu_search() and
sched_pickcpu() in the self case).
However, I was over-estimating a factor in this patch: I don't think
we need to fix the real tdq_load for self in that case before cpu_search(),
because self is handled in a very special way, with its own
comparison. I then think that a patch local to sched_pickcpu() on the
self path should be enough.

This brings us to another bigger issue, however. tdq_lowpri handling
is not perfect in that patch. Think about the case where the thread to
be switched out is, in fact, the lowest-priority one. In that case,
what we should do is recalculate tdq_lowpri, which is usually a very
expensive operation and requires TDQ self locking to be in place in
order to be carried out correctly.

At this point, I would suggest using the new version of avg_ule.patch,
which doesn't take tdq_lowpri into account. This means the
optimization won't work in all cases, but at least it is a good
trade-off against a hackier version.

This also makes me wonder about the tdq_lowpri check in the self case
in general. Basically it forces sched_pickcpu() to select self if and
only if the new thread to schedule has a higher priority than the
lowest one on curcpu. Why is it like this? Is this check used to
enforce some sort of fairness?
It would be good if Jeff could spend a word or two on this check specifically.

Anyway, here is the patch that implements what I suggested; let me know
what you think about it:
http://www.freebsd.org/~attilio/avg_ule2.patch

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: ule+smp: small optimization for turnstile priority lending

2012-10-29 Thread Attilio Rao
On Mon, Oct 29, 2012 at 6:59 PM, Attilio Rao atti...@freebsd.org wrote:

[ trimm ]

 This also makes me wonder about the tdq_lowpri check in the self case
 in general. Basically it forces sched_pickcpu() to select self if and
 only if the new thread to schedule has a higher priority than the
 lowest one on curcpu. Why is it like this? Is this check used to
 enforce some sort of fairness?
 It would be good if Jeff could spend a word or two on this check specifically.

Also, I've read the code of tdq_setlowpri() more closely and I think
the name tdq_lowpri is highly misleading, because it seems to me that
it is the *highest*-priority thread that gets returned.
That said, this means that self will be selected in sched_pickcpu() if
and only if the new thread has a higher priority than all the ones on
the self runqueue. Isn't all this a bit too extreme? Assuming that one
cpu has only a single high-priority thread and the others are very
loaded, it would likely make sense not to keep loading them but to
switch to the self one, maybe.

 Anyway, here is the patch that implements what I suggested; let me know
 what you think about it:
 http://www.freebsd.org/~attilio/avg_ule2.patch

However, I was thinking that maybe we could do the tdq_choose()
calculation if self == target, to have a few more chances of getting
the optimization in place, eventually.

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: ule+smp: small optimization for turnstile priority lending

2012-09-20 Thread Attilio Rao
On 9/20/12, Andriy Gapon a...@freebsd.org wrote:
 on 19/09/2012 10:33 Attilio Rao said the following:
 It is hard for me to tell if this is subject to the same issues because I
 do not entirely understand where the locking was happening in your
 patch.
 Can you try testing this with your already KTR-instrumented test and
 possibly report back?

 The patch works well as far as I can tell.  Thank you!
 There is one warning with full witness enabled but it appears to be
 harmless (so far):

Andriy,
thanks a lot for your testing and reports you made so far.
Unfortunately I'm going off for 2 weeks now and I won't work on
FreeBSD for that timeframe. I will get back to those in 2 weeks then.
If you want  to play more with this idea feel free to extend/fix/etc.
this patch.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: ule+smp: small optimization for turnstile priority lending

2012-09-19 Thread Attilio Rao
On Wed, Sep 19, 2012 at 7:03 AM, Andriy Gapon a...@freebsd.org wrote:
 on 19/09/2012 02:01 Attilio Rao said the following:
 On Tue, Sep 18, 2012 at 8:07 PM, Andriy Gapon a...@freebsd.org wrote:
 on 18/09/2012 19:50 Attilio Rao said the following:
 On 9/18/12, Andriy Gapon a...@freebsd.org wrote:

 Here is a snippet that demonstrates the issue on a supposedly fully loaded
 2-processor system:

 136794   0 3670427870244462 KTRGRAPH group:thread, id:Xorg tid 102818,
 state:running, attributes: prio:122

 136793   0 3670427870241000 KTRGRAPH group:thread, id:cc1plus tid
 111916,
 state:yielding, attributes: prio:183, wmesg:(null), lockname:(null)

 136792   1 3670427870240829 KTRGRAPH group:thread, id:idle: cpu1 tid
 14,
 state:running, attributes: prio:255

 136791   1 3670427870239520 KTRGRAPH group:load, id:CPU 1 load,
 counter:0,
 attributes: none

 136790   1 3670427870239248 KTRGRAPH group:thread, id:firefox tid
 113473,
 state:blocked, attributes: prio:122, wmesg:(null), lockname:unp_mtx

 136789   1 3670427870237697 KTRGRAPH group:load, id:CPU 0 load,
 counter:2,
 attributes: none

 136788   1 3670427870236394 KTRGRAPH group:thread, id:firefox tid
 113473,
 point:wokeup, attributes: linkedto:Xorg tid 102818

 136787   1 3670427870236145 KTRGRAPH group:thread, id:Xorg tid 102818,
 state:runq add, attributes: prio:122, linkedto:firefox tid 113473

 136786   1 3670427870235981 KTRGRAPH group:load, id:CPU 1 load,
 counter:1,
 attributes: none

 136785   1 3670427870235707 KTRGRAPH group:thread, id:Xorg tid 102818,
 state:runq rem, attributes: prio:176

 136784   1 3670427870235423 KTRGRAPH group:thread, id:Xorg tid 102818,
 point:prio, attributes: prio:176, new prio:122, linkedto:firefox tid
 113473

 136783   1 3670427870202392 KTRGRAPH group:thread, id:firefox tid
 113473,
 state:running, attributes: prio:104

 See how the Xorg thread was forced from CPU 1 to CPU 0 where it preempted
 the cc1plus thread (I do have preemption enabled) only to leave CPU 1 with
 zero load.

 I think that the idea is bright, but I have reservations against the
 implementation because it seems to me there are too many layering
 violations.

 Just one - for a layer between turnstile and scheduler :-)
 But I agree.

 What I suggest is somewhat summarized like this:
 - Add a new SRQ_WILLSLEEP or the name you prefer
 - Add a new flags argument to sched_lend_prio() (both ule and 4bsd)
 and sched_thread_priority (ule only)
 - sched_thread_priority() will pass down the new flag to sched_add(),
 which passes it down to sched_pickcpu().

 This way sched_pickcpu() has the correct knowledge of what is going on
 and it can make the right decision. You likely don't need to lower the
 tdq_load at that time either this way, because sched_pickcpu() can
 just adjust it locally for its decision.

 What do you think?

 This sounds easy but it is not quite so given the implementation of
 sched_pickcpu and sched_lowest.  This is probably more work than I am
 able to take on now.

 I think actually that the attached patch should do what you need. Of
 course there is more runqueue locking, due to the tdq_load fondling,
 but I'm not sure it is really a big deal.
 I didn't test it yet, as I understand you have a test case, so maybe
 you can. However if Jeff agrees I can send the patch to flo@ for
 performance evaluation.

 Thoughts?

 Originally I had a similar patch with a small difference: I did the tdq_load
 manipulations in sched_thread_priority around the sched_add call.  That
 patch produced a deadlock because of the need for TDQ_LOCK.

It is hard for me to tell if this is subject to the same issues because I
do not entirely understand where the locking was happening in your
patch.
Can you try testing this with your already KTR-instrumented test and
possibly report back?

Thanks,
Attilio



-- 
Peace can only be achieved by understanding - A. Einstein


Re: ule+smp: small optimization for turnstile priority lending

2012-09-18 Thread Attilio Rao
On 9/18/12, Andriy Gapon a...@freebsd.org wrote:

 Here is a snippet that demonstrates the issue on a supposedly fully loaded
 2-processor system:

 136794   0 3670427870244462 KTRGRAPH group:thread, id:Xorg tid 102818,
 state:running, attributes: prio:122

 136793   0 3670427870241000 KTRGRAPH group:thread, id:cc1plus tid
 111916,
 state:yielding, attributes: prio:183, wmesg:(null), lockname:(null)

 136792   1 3670427870240829 KTRGRAPH group:thread, id:idle: cpu1 tid
 14,
 state:running, attributes: prio:255

 136791   1 3670427870239520 KTRGRAPH group:load, id:CPU 1 load,
 counter:0,
 attributes: none

 136790   1 3670427870239248 KTRGRAPH group:thread, id:firefox tid
 113473,
 state:blocked, attributes: prio:122, wmesg:(null), lockname:unp_mtx

 136789   1 3670427870237697 KTRGRAPH group:load, id:CPU 0 load,
 counter:2,
 attributes: none

 136788   1 3670427870236394 KTRGRAPH group:thread, id:firefox tid
 113473,
 point:wokeup, attributes: linkedto:Xorg tid 102818

 136787   1 3670427870236145 KTRGRAPH group:thread, id:Xorg tid 102818,
 state:runq add, attributes: prio:122, linkedto:firefox tid 113473

 136786   1 3670427870235981 KTRGRAPH group:load, id:CPU 1 load,
 counter:1,
 attributes: none

 136785   1 3670427870235707 KTRGRAPH group:thread, id:Xorg tid 102818,
 state:runq rem, attributes: prio:176

 136784   1 3670427870235423 KTRGRAPH group:thread, id:Xorg tid 102818,
 point:prio, attributes: prio:176, new prio:122, linkedto:firefox tid
 113473

 136783   1 3670427870202392 KTRGRAPH group:thread, id:firefox tid
 113473,
 state:running, attributes: prio:104

 See how the Xorg thread was forced from CPU 1 to CPU 0 where it preempted
 the cc1plus thread (I do have preemption enabled) only to leave CPU 1 with
 zero load.

I think that the idea is bright, but I have reservations against the
implementation because it seems to me there are too many layering
violations.

What I suggest is somewhat summarized like this:
- Add a new SRQ_WILLSLEEP or the name you prefer
- Add a new flags argument to sched_lend_prio() (both ule and 4bsd)
and sched_thread_priority (ule only)
- sched_thread_priority() will pass down the new flag to sched_add(),
which passes it down to sched_pickcpu().

This way sched_pickcpu() has the correct knowledge of what is going on
and it can make the right decision. You likely don't need to lower the
tdq_load at that time either this way, because sched_pickcpu() can
just adjust it locally for its decision.

What do you think?

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: ule+smp: small optimization for turnstile priority lending

2012-09-18 Thread Attilio Rao
On Tue, Sep 18, 2012 at 8:07 PM, Andriy Gapon a...@freebsd.org wrote:
 on 18/09/2012 19:50 Attilio Rao said the following:
 On 9/18/12, Andriy Gapon a...@freebsd.org wrote:

 Here is a snippet that demonstrates the issue on a supposedly fully loaded
 2-processor system:

 136794   0 3670427870244462 KTRGRAPH group:thread, id:Xorg tid 102818,
 state:running, attributes: prio:122

 136793   0 3670427870241000 KTRGRAPH group:thread, id:cc1plus tid
 111916,
 state:yielding, attributes: prio:183, wmesg:(null), lockname:(null)

 136792   1 3670427870240829 KTRGRAPH group:thread, id:idle: cpu1 tid
 14,
 state:running, attributes: prio:255

 136791   1 3670427870239520 KTRGRAPH group:load, id:CPU 1 load,
 counter:0,
 attributes: none

 136790   1 3670427870239248 KTRGRAPH group:thread, id:firefox tid
 113473,
 state:blocked, attributes: prio:122, wmesg:(null), lockname:unp_mtx

 136789   1 3670427870237697 KTRGRAPH group:load, id:CPU 0 load,
 counter:2,
 attributes: none

 136788   1 3670427870236394 KTRGRAPH group:thread, id:firefox tid
 113473,
 point:wokeup, attributes: linkedto:Xorg tid 102818

 136787   1 3670427870236145 KTRGRAPH group:thread, id:Xorg tid 102818,
 state:runq add, attributes: prio:122, linkedto:firefox tid 113473

 136786   1 3670427870235981 KTRGRAPH group:load, id:CPU 1 load,
 counter:1,
 attributes: none

 136785   1 3670427870235707 KTRGRAPH group:thread, id:Xorg tid 102818,
 state:runq rem, attributes: prio:176

 136784   1 3670427870235423 KTRGRAPH group:thread, id:Xorg tid 102818,
 point:prio, attributes: prio:176, new prio:122, linkedto:firefox tid
 113473

 136783   1 3670427870202392 KTRGRAPH group:thread, id:firefox tid
 113473,
 state:running, attributes: prio:104

 See how the Xorg thread was forced from CPU 1 to CPU 0 where it preempted
 the cc1plus thread (I do have preemption enabled) only to leave CPU 1 with
 zero load.

 I think that the idea is bright, but I have reservations against the
 implementation because it seems to me there are too many layering
 violations.

 Just one - for a layer between turnstile and scheduler :-)
 But I agree.

 What I suggest is somewhat summarized like this:
 - Add a new SRQ_WILLSLEEP or the name you prefer
 - Add a new flags argument to sched_lend_prio() (both ule and 4bsd)
 and sched_thread_priority (ule only)
 - sched_thread_priority() will pass down the new flag to sched_add(),
 which passes it down to sched_pickcpu().

 This way sched_pickcpu() has the correct knowledge of what is going on
 and it can make the right decision. You likely don't need to lower the
 tdq_load at that time either this way, because sched_pickcpu() can
 just adjust it locally for its decision.

 What do you think?

 This sounds easy but it is not quite so given the implementation of
 sched_pickcpu and sched_lowest.  This is probably more work than I am
 able to take on now.

I think actually that the attached patch should do what you need. Of
course there is more runqueue locking, due to the tdq_load fondling,
but I'm not sure it is really a big deal.
I didn't test it yet, as I understand you have a test case, so maybe
you can. However if Jeff agrees I can send the patch to flo@ for
performance evaluation.

Thoughts?

Attilio

Index: sys/sys/sched.h
===================================================================
--- sys/sys/sched.h	(revisione 240672)
+++ sys/sys/sched.h	(copia locale)
@@ -91,7 +91,7 @@ void	sched_nice(struct proc *p, int nice);
  */
 void	sched_exit_thread(struct thread *td, struct thread *child);
 void	sched_fork_thread(struct thread *td, struct thread *child);
-void	sched_lend_prio(struct thread *td, u_char prio);
+void	sched_lend_prio(struct thread *td, u_char prio, int flags);
 void	sched_lend_user_prio(struct thread *td, u_char pri);
 fixpt_t	sched_pctcpu(struct thread *td);
 void	sched_prio(struct thread *td, u_char prio);
@@ -161,6 +161,7 @@ sched_unpin(void)
 #define	SRQ_INTR	0x0004	/* It is probably urgent. */
 #define	SRQ_PREEMPTED	0x0008	/* has been preempted.. be kind */
 #define	SRQ_BORROWING	0x0010	/* Priority updated due to prio_lend */
+#define	SRQ_WILLSWITCH	0x0020	/* curthread will be switched out */
 
 /* Scheduler stats. */
 #ifdef SCHED_STATS
Index: sys/kern/sched_ule.c
===================================================================
--- sys/kern/sched_ule.c	(revisione 240672)
+++ sys/kern/sched_ule.c	(copia locale)
@@ -290,7 +290,7 @@ static struct tdq	tdq_cpu;
 #define	TDQ_LOCKPTR(t)	((t)->tdq_lock)
 
 static void sched_priority(struct thread *);
-static void sched_thread_priority(struct thread *, u_char);
+static void sched_thread_priority(struct thread *, u_char, int flags);
 static int sched_interact_score(struct thread *);
 static void sched_interact_update(struct thread *);
 static void sched_interact_fork(struct thread *);
@@ -1233,6 +1233,18 @@ sched_pickcpu(struct thread *td

Re: syslog(3) issues

2012-09-05 Thread Attilio Rao
On Mon, Sep 3, 2012 at 3:23 AM, Ian Lepore
free...@damnhippie.dyndns.org wrote:
 On Sun, 2012-09-02 at 19:50 -0600, Ian Lepore wrote:
 On Mon, 2012-09-03 at 00:35 +0100, Attilio Rao wrote:
  Hi,
  I was trying to use syslog(3) in a port application that uses
  threading, having all of them at the LOG_CRIT level. What I see is
  that when the logging gets massive (1000 entries) I cannot find some
  items within /var/log/messages (I know because I started stamping
  also some sort of message ID in order to see what is going on). The
  missing items are on the order of 25% of what should really be there.
 
  Does someone have a good idea on where I can start investigating my
  syslogd setup? I have really 0 experience with syslogd and maybe I
  could be missing something obvious.

 There's a chance this PR about syslogd incorrectly calculating socket
 receive buffer sizes is related and the patch attached to it could fix
 it...

 http://www.freebsd.org/cgi/query-pr.cgi?pr=1604331

 I filed the PR long ago, if the patches have drifted out of date I'll be
 happy to re-work them.

 -- Ian


 Oops, I glitched the PR number when I pasted it, this one should be
 correct:

 http://www.freebsd.org/cgi/query-pr.cgi?pr=160433

This patch fixes the problem for me, thanks a lot.

Alexander, do you have any reservations against it? When do you think
the patch can be merged to -CURRENT?

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


syslog(3) issues

2012-09-02 Thread Attilio Rao
Hi,
I was trying to use syslog(3) in a port application that uses
threading, having all of them at the LOG_CRIT level. What I see is
that when the logging gets massive (1000 entries) I cannot find some
items within /var/log/messages (I know because I started stamping
also some sort of message ID in order to see what is going on). The
missing items are on the order of 25% of what should really be there.

Does someone have a good idea on where I can start investigating my
syslogd setup? I have really 0 experience with syslogd and maybe I
could be missing something obvious.

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: syslog(3) issues

2012-09-02 Thread Attilio Rao
On Mon, Sep 3, 2012 at 1:20 AM, Garrett Cooper yaneg...@gmail.com wrote:
 On Sun, Sep 2, 2012 at 4:35 PM, Attilio Rao atti...@freebsd.org wrote:
 Hi,
 I was trying to use syslog(3) in a port application that uses
 threading, having all of them at the LOG_CRIT level. What I see is
 that when the logging gets massive (1000 entries) I cannot find some
 items within /var/log/messages (I know because I started stamping
 also some sort of message ID in order to see what is going on). The
 missing items are on the order of 25% of what should really be there.

 Does someone have a good idea on where I can start investigating my
 syslogd setup? I have really 0 experience with syslogd and maybe I
 could be missing something obvious.

 I'd maybe use something like rsyslog and force TCP to verify that
 the messages made it to their endpoints, and if all the messages make
 it to the rsyslogd daemon use tcpdump/wireshark to figure out if the
 UDP datagrams (default transport layer for syslog) aren't getting
 dropped on the floor.

Forgot to mention: the logging is done completely locally so I don't
think network should play a role here. Also, I would like to
understand if I'm missing something subtle or if we actually may have
a bug in syslogd.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: On cooperative work [Was: Re: newbus' ivar's limitation..]

2012-08-01 Thread Attilio Rao
On Wed, Aug 1, 2012 at 5:32 PM, Arnaud Lacombe lacom...@gmail.com wrote:
 Hi,

 On Tue, Jul 31, 2012 at 4:14 PM, Attilio Rao atti...@freebsd.org wrote:

 You don't want to work cooperatively.

 Why is it that mbuf's refactoring consultation is being held in
 internal, private, committers-and-invite-only-restricted meeting at
 BSDCan ?

 Why is it that so much review and discussion on changes are held privately ?

Arnaud,
believe me, to date I don't recall a single major technical decision
that has been settled exclusively in private (not subjected to peer
review) and in particular in person (e-mail helps you focus on a lot of
different details that you may not have under control when talking to
people, etc.).
Sometimes it is useful that a limited number of developers is involved
in the initial brainstorming of some work, but after this period
constructive people usually ask for peer review, publishing their plans
on the mailing lists or other media.
If you don't see any further public discussion this may mean:
a) the BSDCan meetings have been fruitless and there is no precise
plan/roadmap/etc.
b) there is still no consensus on details

and you can always publicly ask about what was decided and what was not.
Just send a mail to the interested recipients and CC any FreeBSD mailing
list.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: On cooperative work [Was: Re: newbus' ivar's limitation..]

2012-08-01 Thread Attilio Rao
On 8/1/12, Arnaud Lacombe lacom...@gmail.com wrote:
 Hi,

 On Wed, Aug 1, 2012 at 12:40 PM, Attilio Rao atti...@freebsd.org wrote:
 On Wed, Aug 1, 2012 at 5:32 PM, Arnaud Lacombe lacom...@gmail.com
 wrote:
 Hi,

 On Tue, Jul 31, 2012 at 4:14 PM, Attilio Rao atti...@freebsd.org
 wrote:

 You don't want to work cooperatively.

 Why is it that mbuf's refactoring consultation is being held in
 internal, private, committers-and-invite-only-restricted meeting at
 BSDCan ?

 Why is it that so much review and discussion on changes are held
 privately ?

 Arnaud,
 believe me, to date I don't recall a single major technical decision
 that has been settled exclusively in private (not subjected to peer
 review) and in particular in person (e-mail helps you focus on a lot of
 different details that you may not have under control when talking to
 people, etc.).

 Whose call is it to declare something worth public discussion ? No one.

 Every time I see a Suggested by:, Submitted by:, Reported by:,
 and especially Approved by:, there should to be a public reference
 of the mentioned communication.

Not necessarily. Every developer must ensure they produce quality
work, with the definition of quality being discretionary. Some people
fail this expectation, while others do very well. As a general rule,
some people send patches to experts on the matter and just trust
their judgment, others also go through testing cycles with third-party
people, and others again ask for public reviews. Often this process is
adapted based on the size of the work and the number of
architectural changes it introduces.

As a personal matter, for a big architectural change things I would
*personally* do are:
- Prepare a master-plan with experts of the matter
- Post a plan (after having achieved consensus) on the public mailing
list for further discussions
- Adjust the plan based on public feedbacks (if necessary)
- Implement the plan
- Ask the experts if they have objections to the implementation
- Ask testers to perform some stress-testing
- Ask testers to perform benchmarks (if I find people interested in that)
- Send out to the public mailing list for public review
- Integrate suggestions
- Ask testers to stress-test again
- Commit

I think this model in general works fairly well, but people may have
different ideas on that, meaning that people may not want to involve
third parties for testing or review. This is going to be risky and lower
the quality of their work, but it is their call.

 Sometimes it is useful that a limited number of developers is involved
 in the initial brainstorming of some work,

 Never.

 but after this period
 constructive people usually ask for peer review, publishing their plans
 on the mailing lists or other media.

 Again, never. By doing so, you merely put the community in a situation
 where, well, "We, committers, have come up with this, you can either
 accept it or STFU, but no major changes will be made because we decided
 so."

You are forgetting one specific detail: you can always review a work
*after* it has entered the tree. This is something you would never do, but
sometimes, when poor-quality code is committed, there is nothing else
you can do than just raise your concern after it is in.

 The callout-ng conference at BSDCan was just beautiful, it was basically:

 Speaker: we will do this
 Audience: how about this situation ? What you will do will not work.
 Speaker: thank you for listening, end of the conference

 It was beautiful to witness.

Not sure if you realized it, but I was the "Audience" you mention. I
think you are referring to a specific case where a quick heads-up on a
Summer of Code project was presented; you cannot really believe that
all the technical discussion around FreeBSD evolves this way.

 If you don't see any further public discussion this may mean:
 a) the BSDCan meetings have been fruitless and there is no precise
 plan/roadmap/etc.

 so not only you make it private, but it shamelessly failed...

And so? I think you have a wrong idea of what is "shameless"... I have
been working on the same project for 6 months; I thought I could finish
it in 1, but then I understood that in order to get the quality I was
hoping for I had to do more work... does that qualify as "failed"
according to your standard?

 b) there is still no consensus on details

 Then the discussion should stop, public records are kept for reference
 in the future. There is no problem with this.

 and you can always publicly ask about what was decided and what was not.
 Just send a mail to the interested recipients and CC any FreeBSD mailing
 list.

 This is not the way openness should be about.

There is not much more you can do when people don't share details and
discussions automatically.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: On cooperative work [Was: Re: newbus' ivar's limitation..]

2012-08-01 Thread Attilio Rao
On 8/1/12, Arnaud Lacombe lacom...@gmail.com wrote:
 Hi,

 On Wed, Aug 1, 2012 at 2:18 PM, Attilio Rao atti...@freebsd.org wrote:

[ trimm ]

 You are forgetting one specific detail: you can always review a work
 *after* it entered the tree. This is something you would never do, but
 sometimes, when poor quality code is committed there is nothing else
 you can do than just raise your concern after it is in.

 Unfortunately, not. First, the developer will certainly have moved on
 after the commit, API may have been changed tree-wide, etc. Then, time
 is likely to have passed between the time you figure potential
 regression or bugs, which makes the first point even more problematic.
 Finally, if my point of view is being ignored *before* it goes to the
 tree, do you really expect it will be considered *after* ?

There is one thing you are not considering: committers are as
powerless as non-committers in the face of someone stuck on his own
buggy ideas/implementations. Often people are just convinced their
idea is better than yours and they won't change their mind, regardless
of their status in the open-source community. And there is nothing
more you can do apart from learning how to deal with such situations.
Granted, there are projects blowing up and people abandoning
successful open-source communities because of this.

 From my external point of view, committers not only have the
 possibility, but *do* commit mess in the tree without non-committers
 being able to say or do anything, just as committers are able to
 arbitrarily close PRs even if the original reporter disagrees.

You should look at svn-src@ more often I suspect. You will see how
many discussions are in there.

 And so? I think you have a wrong point of view of what is
 shamelessly... I'm working on the same project since 6 months, i
 thought I could finish it in 1 but then I understood that in order to
 get the quality I was hoping I had to do more work... does it qualify
 as failed, according to your standard?

 not specifically, but at the end of that project of your, I would run
 a post-mortem meeting to see what went correctly and where things got
 out-of-control.

Arnaud, my friend, I have news for you: this stuff is hard. I have
seen the brightest people I've ever met stuck for weeks on problems,
thinking about how to fix them in an elegant way. Sometimes things get
underestimated, sometimes not; this is just part of the game. But an
important thing is accepting criticism in a constructive way and
learning from it. This makes things much easier.

 As for the mbuf meeting, all I can find from it online is:

 http://lists.freebsd.org/pipermail/freebsd-arch/2012-June/012629.html

 which is not worth much... Rather than doing things internally, maybe
 it is time to open up... oh, wait, you will certainly come to the
 community with a design plan, figure out it's not gonna work while
 doing the implementation, change the design plan while implementing,
 go public with a +3k/-2k loc change patch, ask for benediction, go
 ahead and commit it up until someone comes with an obvious design flaw
 which would render the change pretty much useless, but there will be
 man-month of work to fix it, so it's never gonna be done.

 One obvious problem in FreeBSD is that committers are prosecutor,
 judge and jury altogether. That's not the first time I point this out.

You are dramatizing. As I already said, please, if you are interested
in this topic, ask for the state of the discussion and ask politely to
be included from now on. Nobody will reject you only because you don't
have an @freebsd.org address.

 b) there is still not consensus on details

 Then the discussion should stop, public records are kept for reference
 in the future. There is no problem with this.

 and you can always publically asked on what was decided and what not.
 Just send a mail to interested recipients and CC any FreeBSD mailing
 list.

 This is not the way openness should be about.

 There is not much more you can do when people don't share details and
 discussions automatically.

 keep reporting regressions, interface flaws, POLA violations, ABI
 breakages, bugs, etc.

I agree. But with the correct and humble mindset and avoiding
aggressive behaviour.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: newbus' ivar's limitation..

2012-07-31 Thread Attilio Rao
On Tue, Jul 31, 2012 at 8:47 PM, Arnaud Lacombe lacom...@gmail.com wrote:
 Hi,

 On Tue, Jul 31, 2012 at 12:27 PM, Warner Losh i...@bsdimp.com wrote:

 On Jul 31, 2012, at 9:20 AM, Arnaud Lacombe wrote:

 Hi,

 On Mon, Jul 30, 2012 at 11:51 PM, Warner Losh i...@bsdimp.com wrote:
 [...] We lack that right now, which is why you're trying to shoe-horn the 
 FDT connections into a newbus world and complaining that everything sucks 
 because it is a poor fit.  I'd suggest that different mechanisms are 
 necessary.

 I'm not trying anything, or at least no longer. I do not see the point
 working on that when I can not get trivial patches in, especially
 those fixing poorly maintained drivers, whose issues _do_ affect
 people.

 Hey Arnaud, sorry to be a little harsh, but maybe if you shouted less and 
 cooperated more, people would be more willing to work with you.

 I tried to be Mr Nice Guy in the past, all I got in return was being
 ignored. Lists are full of unanswered problem because people do not
 want to insist getting an answer. Now, believe me on one point, if you
 are a driver or subsystem author, might I have an issue with your
 work, I *will* be a recurring pain in your butt to get the issue
 fixed, or to get in what I do believe, with the limited set of
 knowledge and resources at my disposal[0], to be a correct fix for the
 issue at the time. If you are condescending, arrogant, or an advocate
 of the status quo, as some committers have been in the past, I will
 return the favor. Let's face it, FreeBSD is not the most outstanding OS out
 there (despite obvious capabilities), and I would not be too proud of
 its state.

 All that to say that asking politely does not work in the arbitrary
 FreeBSD realm, where the power to serve is today nothing more than
 a relic.

Arnaud,
the problem I see here is that, as always, you make strong and false
claims about bugs and missing support in the FreeBSD kernel, but when
people point out what you are missing/misunderstanding, you turn the
whole thread into "FreeBSD is a relic" baby-whining, without replying
with real technical arguments and simply ignoring e-mails from FreeBSD
developers. I didn't see any response from you to several technical
e-mails in this thread and others (please don't force me to open
mailman and show exactly all the responses you have deliberately
ignored), while you spit disrespectful, poison-weighted words at the
developers of our project. You don't want to work cooperatively. You
don't want to build something with the FreeBSD community. So why do
you keep insisting on sending e-mails like this? Don't you think it
would be more productive for you to stick with another project? (I
have a couple of names in my head that may be a good fit for you...).

I think it is very offensive that you mock our statement like that.
For many people reading this e-mail it has a true meaning; people like
you should really watch their mouth when speaking about FreeBSD.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: How to diagnose system freezes?

2012-07-31 Thread Attilio Rao
On 8/1/12, Yuri y...@rawbw.com wrote:
 One of my 9.1-BETA1 systems periodically freezes. If sound was playing,
 it would usually cycle with a very short period. And system stops being
 sensitive to keyboard/mouse. Also ping of this system doesn't get a
 response.
 I would normally think that this is the faulty memory. But memory was
 recently replaced and tested with memtest+ for hours both before and
 after freezes and it passes all tests.
 One out of the ordinary thing that is running on this system is nvidia
 driver. But the freezes happen even when there is no graphics activity.
 Another out of the ordinary thing is that the kernel is built for
 DTrace. But DTrace was never used in the sessions that had a freeze.

 What is the way to diagnose this problem?

Start by adding SW_WATCHDOG to your kernel with a reasonably low timeout.
Also, can you use a serial console? If so, you may consider breaking
into the debugger with a serial break.
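For reference, a minimal sketch of what that setup could look like — the
option and daemon are real (SW_WATCHDOG(4), watchdogd(8)), but the
16-second timeout below is just an example value:

```
# Kernel configuration (sketch):
options         SW_WATCHDOG     # panic if the softclock stops being serviced

# /etc/rc.conf (sketch): have watchdogd(8) pat the watchdog from userland
watchdogd_enable="YES"
watchdogd_flags="-t 16"         # 16 s timeout; pick something reasonably low
```

If the machine freezes hard enough that watchdogd can no longer run, the
timeout expires and you get a panic/dump to work from instead of a silent
hang.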

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: ULE/sched issues on stable/9 - why isn't preemption occuring?

2012-05-29 Thread Attilio Rao
2012/5/29 Adrian Chadd adr...@freebsd.org:
 Hi Alexander and others,

 I've been tinkering with ath(4) I/O scheduling and taskqueues. In order
 to get proper in-order TX I/O occurring, I've placed ath_start() into
 a taskqueue, so now whenever ath_start() is called, it just schedules a
 taskqueue entry to run.

 However, performance is worse. :-)

 Here's a schedgraph trace.

 http://people.freebsd.org/~adrian/ath/ktr.4-ath-iperf-using-taskqueue-for-tx.ktr.gz

 I've thrown this through schedgraph.py on stable/9 and I've found some
 rather annoying behaviour. It seems that the ath0 taskqueue stays in
 the runq add state for quite a long time (1.5ms and longer) because
 something else is going on on CPU #0.

 I'm very confused about what's going on. I'd like a hand trying to
 figure out why the schedgraph output is the way it is.

What I would usually do in these cases is to patch your kernel with
more KTR traces (a handmade class: add a fictitious KTR class after
KTR_BUF) in the interesting code paths, to log why your task is not
really being scheduled.
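As a rough sketch of such a handmade class (everything below — the class
name, the bit value, and the trace point — is hypothetical; check
sys/sys/ktr.h for a bit that is actually free after KTR_BUF):

```c
/* sys/sys/ktr.h (sketch): claim an unused class bit */
#define	KTR_DEBUG_TQ	0x80000000	/* hypothetical value, pick a free bit */

/*
 * Kernel config (sketch): compile in and enable only this class so the
 * trace buffer is not flooded by everything else:
 *
 *	options KTR
 *	options KTR_ENTRIES=262144
 *	options KTR_COMPILE=(KTR_DEBUG_TQ)
 *	options KTR_MASK=(KTR_DEBUG_TQ)
 */

/* In the code path you suspect, log the interesting state transitions: */
CTR2(KTR_DEBUG_TQ, "taskqueue %p: enqueue, pending %d", queue, pending);
```

The records can then be pulled out with ktrdump(8), or merged into the
schedgraph picture, to see exactly where the task sat between "runq add"
and actually running.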

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: new panic in cpu_reset() with WITNESS

2012-05-17 Thread Attilio Rao
2012/5/17, Andriy Gapon a...@freebsd.org:
 on 25/01/2012 23:52 Andriy Gapon said the following:
 on 24/01/2012 14:32 Gleb Smirnoff said the following:
 Yes, now:

 Rebooting...
 lock order reversal:
  1st 0x80937140 smp rendezvous (smp rendezvous) @
 /usr/src/head/sys/kern/kern_shutdown.c:542
  2nd 0xfe0001f5d838 uart_hwmtx (uart_hwmtx) @
 /usr/src/head/sys/dev/uart/uart_cpu.h:92
 panic: mtx_lock_spin: recursed on non-recursive mutex cnputs_mtx @
 /usr/src/head/sys/kern/kern_cons.c:500

 OK, so it's just a plain LOR between smp rendezvous and uart_hwmtx, no
 new
 details...

 It's still intriguing to me why the LOR *doesn't* happen [*] with
 stop_scheduler_on_panic=0.  But I don't see a productive way to pursue
 this
 investigation further.

 Salve Glebius!
 After your recent nudging I took yet another look at this issue and it seems
 that
 I have some findings.

 For those who might get interested here is a convenience reference to the
 whole
 thread on gmane: http://thread.gmane.org/gmane.os.freebsd.current/139307

 A short summary.
 Prerequisites: an SMP x86 system, its kernel is configured with WITNESS 
 !WITNESS_SKIPSPIN (this is important) and a system uses serial console via
 uart.
 Then, if stop_scheduler_on_panic is set to zero the system can be rebooted
 without
 a problem.  On the other hand, if stop_scheduler_on_panic is enabled, then
 the
 system first runs into a LOR when calling printf() in cpu_reset() and then
 it runs
 into a panic when printf is recursively called from witness(9) to report the
 LOR.
  The panic happens because of the recursion on cnputs_mtx, which should
 ensure
 that cnputs() output is not intermingled and which is not flagged to
 support
 recursion.

 There are two things about this report that greatly confused and puzzled
 me:
 1. stop_scheduler_on_panic variable is used _only_ in panic(9).  So how
 could it
 be possible that changing its value affects behavior of the system when
 panic(9)
 is not called?!

 2. The LOR in question happens between smp rendezvous (smp_ipi_mtx) and
 uart_hwmtx (sc_hwmtx_s in uart core) spin locks.  The order of these locks
 is
 actually predefined in witness order_lists[] such that uart_hwmtx must come
 before
 smp_ipi_mtx.  But in the reboot path we first take smp_ipi_mtx in
 shutdown_reset(), then we call cpu_reset, then it calls printf and from
 there we
 get to uart_hwmtx for serial console output.  So the order between these
 spinlocks
 is always violated in the x86 SMP reboot path.
 How come witness(9) doesn't _always_ detect this LOR?
 How come it didn't detect this LOR before any stop scheduler commits?!

 [Spoiler alert :-)]

 Turns out that the two puzzles above are closely related.
 Let's first list all the things that change when stop_scheduler_on_panic is
 enabled and a panic happens:
 - other CPUs are stopped (forced to spin)
 - interrupts on current CPU are disabled
 - by virtue of the above the current thread should be the only thread
 running
 (unless it executes a voluntary switch)
 - all locks are busted, they are completely ignored / bypassed
 - by virtue of the above no lock invariants and witness checks are
 performed, so
 no lock order checking, no recursion checking, etc

 So, what I observe is this: when stop_scheduler_on_panic is disabled, the
 LOR is
 actually detected as it should be.  witness(9) works properly here.  Once
 the LOR
 is detected witness(9) wants to report it using printf(9).  That's where we
 run
 into the cnputs_mtx recursion panic.  It's all exactly as with
 stop_scheduler_on_panic enabled.  Then panic(9) wants to report the panic
 using
 printf(9), which goes to cnputs() again, where _mtx_lock_spin_flags()
 detects
 locks recursion again (this is independent of witness(9)) and calls
 panic(9)
 again.  Then panic(9) wants to report the panic using printf(9)...
 I assume that when the stack is exhausted we run into a double fault and
 dblfault_handler wants to print something again...  Likely we eventually run
 into
 a triple fault which resets the machine.  Although, I recall some old
 reports of
 machines hanging during reboot in a place suspiciously close to where the
 described ordeal happens...
 But if the machine doesn't hang we get a full appearance of the reset
 successfully
 happening (modulo some last messages missing).

 With stop_scheduler_on_panic enabled and all the locks being ignored we, of
 course, do not run into any secondary lock recursions and resulting panics.
 So
 the system is able to at least panic gracefully (still we should have
 just
 reported the LOR gracefully, no panic is required).

 Some obvious conclusions:
 - the smp rendezvous and uart_hwmtx LOR is real and it is the true cause
 of
 the problem; it should be fixed one way or other - either by correcting
 witness
 order_lists[] or by avoiding the LOR in shutdown_reset/cpu_reset;
 - because witness(9) uses printf(9) to report problems, it is very fragile
 to use
 witness with any locks that can be 

Re: Disabling an arbitrary device

2012-04-20 Thread Attilio Rao
On 20 April 2012 19:18, Arnaud Lacombe lacom...@gmail.com wrote:
 Hi,

 On Fri, Apr 20, 2012 at 2:16 PM, Arnaud Lacombe lacom...@gmail.com wrote:
 Hi,

 I will be bringing up an old thread there, but it would seem the
 situation did not evolve in the past 9 years. I have a machine running
 7.1 whose UHCI controller is generating some interrupt storm:

 # vmstat -i
 interrupt                          total       rate
 irq4: sio0                          1328          2
 irq19: uhci1+                   63514509      96380
 [...]

 generating useless load on one CPU:

 # top -SH
 last pid:  5223;  load averages:  0.00,  0.00,  0.00    up 0+00:17:21  
 13:10:35
 117 processes: 14 running, 79 sleeping, 24 waiting
 CPU:  0.2% user,  0.0% nice,  0.2% system,  6.6% interrupt, 93.0% idle
 Mem: 33M Active, 9348K Inact, 67M Wired, 400K Cache, 29M Buf, 2892M Free
 [...]
   57 root          -64    -     0K     8K CPU0   0  11:59 86.57% irq19: 
 uhci1+

 I thought I could use a hint to forbid uhci(4) attachment, à la:

 hint.uhci.0.disabled=1
 hint.uhci.1.disabled=1

 in /boot/loader.conf. However, it would seem that what should be
 usable with any arbitrary device, i.e. be an integral part of
 device(9), has to be hardcoded in every driver. Sad.

 as for the usual if you're not happy, please provide a patch:

 https://github.com/lacombar/freebsd/commit/30786d09b0cb441890cdc749ee5243238e81d2d8

I think a generalized kludge should really belong to
device_probe_child() rather than device_add_child_ordered().

When you have a tested patch against -CURRENT, can you please send to
freebsd-newbus@?

BTW, IIRC 7.x doesn't have the new USB stack, which means that you may
easily have hit a driver bug there; did you consider switching at
least to an 8.x kernel?

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: Regarding core disable in FreeBSD 9

2012-04-14 Thread Attilio Rao
On 13 April 2012 17:18, Mahesh Babu maheshbab...@yahoo.co.in wrote:
 How to disable a particular core in FreeBSD 9?
 How to enable it again?

You can disable a specific core at boot time by using the tunable:
hint.lapic.X.disabled=1

where X is the APIC ID of the CPU you want to turn off.
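For example (the APIC IDs below are hypothetical — take the real ones
from the "APIC ID:" lines in your boot dmesg):

```
# /boot/loader.conf
hint.lapic.2.disabled=1   # keep the CPU with APIC ID 2 offline
hint.lapic.3.disabled=1
```

To enable a core again, remove the tunable (or set it to 0) and reboot;
as far as I know there is no way to bring back, at runtime, a core
disabled this way.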

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: [RFT][patch] Scheduling for HTT and not only

2012-04-06 Thread Attilio Rao
On 5 April 2012 19:12, Arnaud Lacombe lacom...@gmail.com wrote:
 Hi,

 [Sorry for the delay, I got a bit sidetrack'ed...]

 2012/2/17 Alexander Motin m...@freebsd.org:
 On 17.02.2012 18:53, Arnaud Lacombe wrote:

 On Fri, Feb 17, 2012 at 11:29 AM, Alexander Motin m...@freebsd.org wrote:

 On 02/15/12 21:54, Jeff Roberson wrote:

 On Wed, 15 Feb 2012, Alexander Motin wrote:

 I've decided to stop those cache black magic practices and focus on
 things that really exist in this world -- SMT and CPU load. I've
 dropped most of cache related things from the patch and made the rest
 of things more strict and predictable:
 http://people.freebsd.org/~mav/sched.htt34.patch


 This looks great. I think there is value in considering the other
 approach further but I would like to do this part first. It would be
 nice to also add priority as a greater influence in the load balancing
 as well.


 I haven't got good idea yet about balancing priorities, but I've
 rewritten
 balancer itself. As soon as sched_lowest() / sched_highest() are more
 intelligent now, they allowed to remove topology traversing from the
 balancer itself. That should fix double-swapping problem, allow to keep
 some
 affinity while moving threads and make balancing more fair. I did number
 of
 tests running 4, 8, 9 and 16 CPU-bound threads on 8 CPUs. With 4, 8 and
 16
 threads everything is stationary as it should. With 9 threads I see
 regular
 and random load move between all 8 CPUs. Measurements on 5 minutes run
 show
 deviation of only about 5 seconds. It is the same deviation as I see
 caused
 by only scheduling of 16 threads on 8 cores without any balancing needed
 at
 all. So I believe this code works as it should.

 Here is the patch: http://people.freebsd.org/~mav/sched.htt40.patch

 I plan this to be a final patch of this series (more to come :)) and if
 there will be no problems or objections, I am going to commit it (except
 some debugging KTRs) in about ten days. So now it's a good time for
 reviews
 and testing. :)

 is there a place where all the patches are available ?


 All my scheduler patches are cumulative, so all you need is only the last
 mentioned here sched.htt40.patch.

 You may want to have a look to the result I collected in the
 `runs/freebsd-experiments' branch of:

 https://github.com/lacombar/hackbench/

 and compare them with vanilla FreeBSD 9.0 and -CURRENT results
 available in `runs/freebsd'. On the dual package platform, your patch
 is not a definite win.

 But in some cases, especially for multi-socket systems, to let it show its
 best, you may want to apply additional patch from avg@ to better detect CPU
 topology:
 https://gitorious.org/~avg/freebsd/avgbsd/commit/6bca4a2e4854ea3fc275946a023db65c483cb9dd

 the tests I conducted specifically for this patch did not show much
 improvement...

Can you please clarify this point?
Did the tests you ran include cases where the topology was detected badly
versus cases where the topology was detected correctly by a patched
kernel (and you still didn't see a performance improvement), in terms
of cache-line sharing?

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: [RFT][patch] Scheduling for HTT and not only

2012-04-06 Thread Attilio Rao
On 6 April 2012 15:27, Alexander Motin m...@freebsd.org wrote:
 On 04/06/12 17:13, Attilio Rao wrote:

  On 5 April 2012 19:12, Arnaud Lacombe lacom...@gmail.com wrote:

 Hi,

 [Sorry for the delay, I got a bit sidetrack'ed...]

 2012/2/17 Alexander Motinm...@freebsd.org:

 On 17.02.2012 18:53, Arnaud Lacombe wrote:


  On Fri, Feb 17, 2012 at 11:29 AM, Alexander Motin m...@freebsd.org
   wrote:


 On 02/15/12 21:54, Jeff Roberson wrote:


 On Wed, 15 Feb 2012, Alexander Motin wrote:


 I've decided to stop those cache black magic practices and focus on
 things that really exist in this world -- SMT and CPU load. I've
 dropped most of cache related things from the patch and made the
 rest
 of things more strict and predictable:
 http://people.freebsd.org/~mav/sched.htt34.patch



 This looks great. I think there is value in considering the other
 approach further but I would like to do this part first. It would be
 nice to also add priority as a greater influence in the load
 balancing
 as well.



 I haven't got good idea yet about balancing priorities, but I've
 rewritten
 balancer itself. As soon as sched_lowest() / sched_highest() are more
 intelligent now, they allowed to remove topology traversing from the
 balancer itself. That should fix double-swapping problem, allow to
 keep
 some
 affinity while moving threads and make balancing more fair. I did
 number
 of
 tests running 4, 8, 9 and 16 CPU-bound threads on 8 CPUs. With 4, 8
 and
 16
 threads everything is stationary as it should. With 9 threads I see
 regular
 and random load move between all 8 CPUs. Measurements on 5 minutes run
 show
 deviation of only about 5 seconds. It is the same deviation as I see
 caused
 by only scheduling of 16 threads on 8 cores without any balancing
 needed
 at
 all. So I believe this code works as it should.

 Here is the patch: http://people.freebsd.org/~mav/sched.htt40.patch

 I plan this to be a final patch of this series (more to come :)) and
 if
 there will be no problems or objections, I am going to commit it
 (except
 some debugging KTRs) in about ten days. So now it's a good time for
 reviews
 and testing. :)

 is there a place where all the patches are available ?



 All my scheduler patches are cumulative, so all you need is only the
 last
 mentioned here sched.htt40.patch.

 You may want to have a look to the result I collected in the
 `runs/freebsd-experiments' branch of:

 https://github.com/lacombar/hackbench/

 and compare them with vanilla FreeBSD 9.0 and -CURRENT results
 available in `runs/freebsd'. On the dual package platform, your patch
 is not a definite win.

 But in some cases, especially for multi-socket systems, to let it show
 its
 best, you may want to apply additional patch from avg@ to better detect
 CPU
 topology:

 https://gitorious.org/~avg/freebsd/avgbsd/commit/6bca4a2e4854ea3fc275946a023db65c483cb9dd

 test I conducted specifically for this patch did not showed much
 improvement...


 Can you please clarify on this point?
 The test you did included cases where the topology was detected badly
 against cases where the topology was detected correctly as a patched
 kernel (and you still didn't see a performance improvement), in terms
 of cache line sharing?


 At this moment SCHED_ULE does almost nothing in terms of cache line sharing
 affinity (though it probably worth some further experiments). What this
 patch may improve is opposite case -- reduce cache sharing pressure for
 cache-hungry applications. For example, proper cache topology detection
 (such as lack of a global L3 cache, but shared L2 per pair of cores on
 Core2Quad-class CPUs) increases pbzip2 performance when the number of threads is
 less than the number of CPUs (i.e. when there is room for optimization).

My question is not really about your patch.
I just wanted to know whether he correctly benchmarked a case where the
topology was screwed up and then was correctly recognized by avg's
patch, in terms of cache-level aggregation (it wasn't about your patch,
btw).

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: NTFS GSoC Project Idea

2012-03-28 Thread Attilio Rao
2012/3/27, Efstratios Karatzas gpf.k...@gmail.com:
 On Tue, Mar 27, 2012 at 11:06 PM, Gleb Kurtsou
 gleb.kurt...@gmail.com wrote:

 On (26/03/2012 21:13), Efstratios Karatzas wrote:
  Greetings,
 
  I am a FreeBSD GSoC 2010 student, looking to participate in this years
  GSoC. The project that I wish to work on is the FreeBSD NTFS driver.
 
  I 've already discussed my project idea with Attilio@ who suggested
 that I
  forward my proposal to the hackers mailing list in order to get more
  feedback and see if there's anyone interested in mentoring a NTFS
 project.
 
  The current draft of the technical part of my proposal(pdf  plain text)
  can be found here:
 
  http://cgi.di.uoa.gr/~std06101/ntfs/ntfs_proposal.tar
 
  The project idea focuses on mp-safing the NTFS driver as well as adding
  extra write support. I've tried to merge the two conflicting NTFS
  project
  ideas in the ideas wiki page, into one. One of them suggesting that work
  should be done on the current driver (mp-safe it etc) and the other one
  suggesting that we port Apple's NTFS driver from scratch. The concept is
  that we keep most of our vnode/vfs code (i.e. ntfs_vfsops.c,
 ntfs_vnops.c)
  and rework existing infrastructure as needed as well as implement new
  features using Apple's driver as a guide.

 I'm not sure I follow your idea, but I'd suggest sticking to a single
 project: either improve FreeBSD NTFS or do the port. FreeBSD and Darwin
 ntfs implementations are completely unrelated thus porting features from
 Darwin won't be easy.

  This way, we avoid the major
  changes in Apple's VFS (is there any documentation to be found?) and
  port
  code that won't break current functionality.

 I bet major changes in Apple's VFS are easier to deal with than
 merging two unrelated file systems. XNU VFS is based on FreeBSD 5 VFS
 and they still share a lot in common, e.g. vnode operations themselves
 are nearly the same, e.g. extended attributes, locking, buffer cache are
 different.

 Take a look at HFS+ port. It's unmaintained and outdated but page
 contains link to CVS repository snapshot.
 http://people.freebsd.org/~yar/hfs/

  I tried to keep this e-mail brief so If you wish to know more, please
 refer
  to the proposal.
 
  Thank you
 
  --
 
  Efstratios GPF Karatzas


 Since the FreeBSD wiki has two conflicting NTFS ideas, I thought of
 submitting two proposals: one to work on the current driver and one to port
 Apple's driver and let the FreeBSD team choose, should they wish to pick
 one of them. Attilio suggested that perhaps a merge of those two ideas was
 better than two separate proposals and this was my attempt to merge those
 ideas; I hear what you say though.

What I specifically had in mind is:
- make NTFS MPSAFE
- bugbusting
- look through the Apple implementation/history, check for bugfixes
there and import them into the FreeBSD implementation

Unfortunately our ideas page is not as up to date as you would want it
to be, so you may find duplicate ideas, not well merged, etc.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: Kernel threads inherit CPU affinity from random sibling

2012-02-24 Thread Attilio Rao
2012/2/24, Eugene Grosbein egrosb...@rdtc.ru:
 28.01.2012 20:22, Attilio Rao wrote:

 2012/1/28 Ryan Stone ryst...@gmail.com:
 On Fri, Jan 27, 2012 at 10:41 PM, Attilio Rao atti...@freebsd.org
 wrote:
 I think what you found out is very sensitive.
 However, the patch is not correct as you cannot call
 cpuset_setthread() with thread_lock held.

 Whoops!  I actually discovered that for myself and had already fixed
 it, but apparently I included an old version of the patch in the
 email.

 Hence this is my fix:
 http://www.freebsd.org/~attilio/cpuset_root.patch

 Oh, I do like this better.  I tried something similar myself but
 abandoned it because I misread how sched_affinity() was implemented by
 4BSD(I had gotten the impression that once TSF_AFFINITY is set it
 could never be cleared).

 Do you have a pathological test-case for it? Are you going to test the
 patch?

 I have the pathological test-case for it:
 http://www.freebsd.org/cgi/query-pr.cgi?pr=165444


A fix has been committed as r230984, it should apply to STABLE_9/8
too, can you try it?

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: Kernel threads inherit CPU affinity from random sibling

2012-02-24 Thread Attilio Rao
2012/2/24, Eugene Grosbein eu...@grosbein.pp.ru:
 24.02.2012 18:45, Attilio Rao wrote:

 I have the pathological test-case for it:
 http://www.freebsd.org/cgi/query-pr.cgi?pr=165444


 A fix has been committed as r230984, it should apply to STABLE_9/8
 too, can you try it?

 Attilio



 I will try, but I'm already running my patch for netisr, so it would be
 difficult to see what helps :-)

Yes, I mean, can you back your patch out?

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: sem(4) lockup in python?

2012-02-05 Thread Attilio Rao
2012/2/5 Ivan Voras ivo...@freebsd.org:
 On 5 February 2012 11:44, Garrett Cooper yaneg...@gmail.com wrote:


    'make MAKE_JOBS_NUMBER=1' is the workaround used right now..

 David Xu suggested that it is a bug in Python - it doesn't set the
 process-shared attribute when it calls sem_init(), but I've tried
 patching it (replacing the port patch file with the one I've attached)
 and I still get the hang.

Guys,
it would be valuable if you do the following:
1) recompile your kernel with INVARIANTS, WITNESS and without WITNESS_SKIPSPIN
2a) If you have a serial console, please run the DDB stuff through it
(go to point 3)
2b) If you don't have a serial console please run the DDB stuff in
textdump (go to point 3)
3) Collect the following information:
- show allpcpu
- show alllocks
- ps
- alltrace
3a) If you had the serial console (thus not textdump) please collect
the coredump with: call doadump
4) reset your machine

You will end up with the textdump or coredump + all the serial logs
necessary to debug this.
If you cannot reproduce your issue with WITNESS enabled, please remove
it from your kernel config and avoid calling 'show alllocks' when in DDB.
But try to leave INVARIANTS on.
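For what it's worth, steps 2b-4 can also be scripted so the information is captured automatically at panic time, with nobody at the console. Below is a sketch of a ddb(8) panic script driving textdump(4); treat the exact script text as an illustration and check ddb(8)/textdump(4) on your release:

```
# /etc/ddb.conf sketch (loaded when ddb_enable="YES" is set in rc.conf)
script kdb.enter.panic=textdump set; capture on; show allpcpu; show alllocks; ps; alltrace; capture off; textdump dump; reset
```

On the serial-console path you would drop the textdump commands and finish with 'call doadump' instead, as described above.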

Hope this helps,
Attilio




Re: Kernel threads inherit CPU affinity from random sibling

2012-01-28 Thread Attilio Rao
2012/1/28 Ryan Stone ryst...@gmail.com:
 On Fri, Jan 27, 2012 at 10:41 PM, Attilio Rao atti...@freebsd.org wrote:
 I think what you found out is very significant.
 However, the patch is not correct as you cannot call
 cpuset_setthread() with thread_lock held.

 Whoops!  I actually discovered that for myself and had already fixed
 it, but apparently I included an old version of the patch in the
 email.

 Hence this is my fix:
 http://www.freebsd.org/~attilio/cpuset_root.patch

 Oh, I do like this better.  I tried something similar myself but
 abandoned it because I misread how sched_affinity() was implemented by
 4BSD (I had gotten the impression that once TSF_AFFINITY is set it
 could never be cleared).

Do you have a pathological test-case for it? Are you going to test the patch?

Thanks,
Attilio




Re: Kernel threads inherit CPU affinity from random sibling

2012-01-28 Thread Attilio Rao
2012/1/28 Attilio Rao atti...@freebsd.org:
 2012/1/28 Ryan Stone ryst...@gmail.com:
 On Fri, Jan 27, 2012 at 10:41 PM, Attilio Rao atti...@freebsd.org wrote:
 I think what you found out is very significant.
 However, the patch is not correct as you cannot call
 cpuset_setthread() with thread_lock held.

 Whoops!  I actually discovered that for myself and had already fixed
 it, but apparently I included an old version of the patch in the
 email.

 Hence this is my fix:
 http://www.freebsd.org/~attilio/cpuset_root.patch

 Oh, I do like this better.  I tried something similar myself but
 abandoned it because I misread how sched_affinity() was implemented by
 4BSD (I had gotten the impression that once TSF_AFFINITY is set it
 could never be cleared).

 Do you have a pathological test-case for it? Are you going to test the patch?

BTW, I've just now updated the patch in order to remove an added white
line and s/priority/affinity in comments.

Thanks,
Attilio




Re: Kernel threads inherit CPU affinity from random sibling

2012-01-27 Thread Attilio Rao
2012/1/28 Ryan Stone ryst...@gmail.com:
 Right now, whenever a thread is spawned, it inherits CPU affinity from
 its parent thread.  I put parent in scare quotes because as far as I
 can tell, for a kernel thread the parent is essentially chosen
 arbitrarily (it looks like it is the most recent thread spawned in
 that kernel process).  Inheriting affinity is arguably correct for
 userland threads (at the very least, an explicit design decision to
 implement inheritance was clearly implemented).  However for kernel
 threads, this behaviour leads to bizarre results that clearly weren't
 intended.  For example, I added a printf to sched_fork_thread() that
 prints a message every time a thread inherits affinity from its
 parent.  Here's the results from booting a dual-core VM with
 net.isr.bindthreads=1:

 Thread 16 inheriting affinity from parent swi1: netisr 0(15)
 Thread 17 inheriting affinity from parent swi3: vm(16)
 Thread 18 inheriting affinity from parent swi4: clock(17)
 Thread 100014 inheriting affinity from parent swi4: clock(18)
 Thread 100017 inheriting affinity from parent swi5: +(100014)
 Thread 100018 inheriting affinity from parent swi6: Giant taskq(100017)
 Thread 100022 inheriting affinity from parent swi6: task queue(100018)
 Thread 100025 inheriting affinity from parent swi2: cambio(100022)
 Thread 100026 inheriting affinity from parent irq14: ata0(100025)
 Thread 100027 inheriting affinity from parent irq15: ata1(100026)
 Thread 100028 inheriting affinity from parent irq19: le0(100027)
 Thread 100029 inheriting affinity from parent irq21: pcm0(100028)
 Thread 100034 inheriting affinity from parent irq22: ohci0(100029)
 Thread 100035 inheriting affinity from parent irq1: atkbd0(100034)

 The result is that every thread in intr kernel process ends up
 inheriting the affinity(my favourite part is that both softclock
 threads are now forced to fight over the same CPU).  I am working on
 the following patch that adds a flag to
 sched_fork()/sched_fork_thread() that can be passed to force the child
 thread to *not* inherit affinity.  However, this doesn't address
 affinity inheritance from calls to kproc_create().  I suppose that I
 could push the flag up into fork1(), but I'd prefer not to.  I am
 wondering if it even makes sense for affinity to be inherited from a
 parent process even for userland processes.

 What are peoples thoughts?  Is this the right approach?

I think what you found out is very significant.
However, the patch is not correct as you cannot call
cpuset_setthread() with thread_lock held. Also, I don't think we need
the flag bloat, to be honest.

IMHO, we should choose a default policy for sched_fork{_thread}() and
have the opposite case fix the inherited thread affinity manually.
For example, for the shortest patch, we can have the
sched_fork{_thread}() functions directly inherit the parent's affinity,
which is fine in the userland case but needs to be fixed up in the
kthread/kproc case. Hence this is my fix:
http://www.freebsd.org/~attilio/cpuset_root.patch

(just test-compiled, so if you agree and want to test further, be my guest).
Maybe we can also add a comment to sched_fork() guys explaining the
default policy, but the code seems self-explanatory enough to me.

What do you think about it?

Thanks,
Attilio




Re: 2 years student project

2012-01-14 Thread Attilio Rao
2012/1/14 Kip Macy kip.m...@gmail.com:
 Many of the ideas on that page are stale. I believe that the most
 promising approach would be to figure out what area of the system
 you're most interested in, e.g. networking, file systems, virtual
 memory, scheduling etc., make an honest appraisal of your development
 abilities (the rate at which you can come up to speed working with a
 large body of code and your ability to read technical documentation),
 and then contact the mentor for one of the Ideas in your area of
 interest to discuss how to proceed further. In order to have any
 likelihood of success you'll need to find someone who is willing to
 take the time to discuss things with you on a regular basis.

In addition to what Kip suggests, there are also a lot of other ideas
that are very subsystem-specific and are not listed in the Ideas
page. If you figure out the kernel areas where you want to work (and,
as Kip says, the rate of your learning), we can provide you a list of
people to contact for any area, who can provide several degrees of
mentorship too.

Attilio




Re: Per-mount syncer threads and fanout for pagedaemon cleaning

2011-12-27 Thread Attilio Rao
2011/12/27 Giovanni Trematerra giovanni.tremate...@gmail.com:
 On Mon, Dec 26, 2011 at 9:24 PM, Venkatesh Srinivas
 vsrini...@dragonflybsd.org wrote:
 Hi!

 I've been playing with two things in DragonFly that might be of interest
 here.

 Thing #1 :=

 First, per-mountpoint syncer threads. Currently there is a single thread,
 'syncer', which periodically calls fsync() on dirty vnodes from every mount,
 along with calling vfs_sync() on each filesystem itself (via syncer vnodes).

 My patch modifies this to create syncer threads for mounts that request it.
 For these mounts, vnodes are synced from their mount-specific thread rather
 than the global syncer.

 The idea is that periodic fsync/sync operations from one filesystem should
 not
 stall or delay synchronization for other ones.
 The patch was fairly simple:
 http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/50e4012a4b55e1efc595db0db397b4365f08b640


 There's something WIP by attilio@ on that area.
 you might want to take a look at
 http://people.freebsd.org/~attilio/syncer_alpha_15.diff

 I don't know what hammerfs needs but UFS/FFS and the buffer cache do a good
 job performance-wise and so the authors are skeptical about the boost that 
 such
 a change can give. We believe that brain cycles need to be spent on
 other pieces of the system such as ARC and ZFS.

More specifically, it is likely that focusing on UFS and the buffer
cache for performance is not really useful; we should direct our
efforts toward ARC and ZFS.
Also, the real bottlenecks in our I/O paths are in GEOM
single-threaded design, lack of unmapped I/O functionality, possibly
lack of prioritized I/O, etc.

Attilio




Re: Per-mount syncer threads and fanout for pagedaemon cleaning

2011-12-27 Thread Attilio Rao
2011/12/27  m...@freebsd.org:
 On Tue, Dec 27, 2011 at 8:05 AM, Attilio Rao atti...@freebsd.org wrote:
 2011/12/27 Giovanni Trematerra giovanni.tremate...@gmail.com:
 On Mon, Dec 26, 2011 at 9:24 PM, Venkatesh Srinivas
 vsrini...@dragonflybsd.org wrote:
 Hi!

 I've been playing with two things in DragonFly that might be of interest
 here.

 Thing #1 :=

 First, per-mountpoint syncer threads. Currently there is a single thread,
 'syncer', which periodically calls fsync() on dirty vnodes from every 
 mount,
 along with calling vfs_sync() on each filesystem itself (via syncer 
 vnodes).

 My patch modifies this to create syncer threads for mounts that request it.
 For these mounts, vnodes are synced from their mount-specific thread rather
 than the global syncer.

 The idea is that periodic fsync/sync operations from one filesystem should
 not
 stall or delay synchronization for other ones.
 The patch was fairly simple:
 http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/50e4012a4b55e1efc595db0db397b4365f08b640


 There's something WIP by attilio@ on that area.
 you might want to take a look at
 http://people.freebsd.org/~attilio/syncer_alpha_15.diff

 I don't know what hammerfs needs but UFS/FFS and the buffer cache do a good
 job performance-wise and so the authors are skeptical about the boost that 
 such
 a change can give. We believe that brain cycles need to be spent on
 other pieces of the system such as ARC and ZFS.

 More specifically, it is likely that focusing on UFS and the buffer cache
 for performance is not really useful; we should direct our efforts toward
 ARC and ZFS.
 Also, the real bottlenecks in our I/O paths are in GEOM
 single-threaded design, lack of unmapped I/O functionality, possibly
 lack of prioritized I/O, etc.

 Indeed, Isilon (and probably other vendors as well) entirely skip
 VFS_SYNC when the WAIT argument is MNT_LAZY.  Since we're a
 distributed journalled filesystem, syncing via a system thread is not
 a relevant operation; i.e. all writes that have exited a VOP_WRITE or
 similar operation are already in reasonably stable storage in a
 journal on the relevant nodes.

 However, we do then have our own threads running on each node to flush
 the journal regularly (in addition to when it fills up), and I don't
 know enough about this to know if it could be fit into the syncer
 thread idea or if it's too tied in somehow to our architecture.

I'm not really sure how journaling is implemented on OneFS, but
when I made this patch SU+J wasn't yet there.
Also, this patch just adds the infrastructure for a multithreaded and
configurable syncer, which means it still requires the UFS bits for
skipping the double-syncing (i.e. the MNT_LAZY skipping you
mentioned).

Attilio




Re: question about the exchanges of td->td_lock and mtx in sched_switch() of sched_ule

2011-10-24 Thread Attilio Rao
2011/10/13 Haozhong Zhang snowzh8...@gmail.com:
 Hi,

 I'm recently reading the code of sched_ule in freebsd 8.2.0 and have two
 questions.

 1. sched_switch() (in sched_ule.c) invokes cpu_switch() (at line 1852) and
 thread_unblock_switch() (at line 1867). These two functions exchange
 td->td_lock and mtx. What are the purposes of these exchanges?

 2. Can the exchange in cpu_switch() (in amd64/amd64/cpu_switch.S, at line
 134) be done before calling cpu_switch()? I mean, does this reorder of
 exchange and other operations in cpu_switch() cause side-effects to other
 code?

This thread should have the details you need:
http://lists.freebsd.org/pipermail/svn-src-all/2010-January/019345.html

Attilio




Re: Re: Kernel timers infrastructure

2011-09-12 Thread Attilio Rao
Besides, I'd also suggest having the callout rearming as the very last
step of your callback in order to avoid buffer interleaving issues.

Attilio

2011/9/12 Kostik Belousov kostik...@gmail.com:
 On Mon, Sep 12, 2011 at 11:48:42AM +0200, Marc Lörner wrote:
 Hello,
 what about changing order of callout_reset and uprintf?
 And your timeout isn't 1 minute, it's one second!

 Regards,
 Marc

 I already did that to ensure timer_event_handler would be called correctly.
 
 The result follows:
 
 freebsd# kldload ./timer.ko
 timer_event_handler() with MOD_LOAD
 
 freebsd# kldunload ./timer.ko
 timer_event_handler() with MOD_UNLOAD
 
 and I maintained the module load for about 1 minute so the timer printing 
 "Hello, World!" should have been run.
 
 Filippo
 
 On 12/set/2011, at 11:24, Adrian Chadd wrote:
 
  How about adding some printfs() to the functions to ensure they're being 
  called?
 

 The callouts are executed in the context that does not have the controlling
 terminal. uprintf(9) tries to use the ctty for output.

 Use printf(9) to get something on console.






Re: Large machine test ideas

2011-09-01 Thread Attilio Rao
2011/8/31 Sean Bruno sean...@yahoo-inc.com:
 On Tue, 2011-08-30 at 17:11 -0700, Ivan Voras wrote:
 On 29.8.2011. 20:15, John Baldwin wrote:

  However, the SRAT code just ignores the table when it encounters an issue 
  like
  this, it doesn't hang.  Something else later in the boot must have hung.

 Anyway... that machine can in its maximal configuration be populated
 with eight 10-core CPUs, i.e. 80 physical / 160 logical, so here's a
 vote from me to bump the shiny new cpuset infrastructure maximum CPU
 count to 256 before 9.0.

 http://www.supermicro.com/products/system/5U/5086/SYS-5086B-TRF.cfm

 Doesn't that (MAXCPU) seriously impact VM usage, lock contention
 etc ... ?

 I mean, if we have 2 cpus in a machine, but MAXCPU is set to 256, there
 is a bunch of lost memory and higher levels of lock contention?

 I thought that attilio was taking a stab at enhancing this, but at the
 current time anything more than a value of 64 for MAXCPU is kind of a
 caveat emptor area of FreeBSD.

With the newest -CURRENT you can redefine MAXCPU in your kernel config,
so you don't need to bump the default value.
I think 64 as a default value is good enough.

Removing the MAXCPU dependency from the KBI is an important project
someone should adopt and bring to a conclusion.

Thanks,
Attilio




Re: Large machine test ideas

2011-09-01 Thread Attilio Rao
2011/9/1 Ivan Voras ivo...@freebsd.org:
 On 1 September 2011 16:11, Attilio Rao atti...@freebsd.org wrote:

 I mean, if we have 2 cpus in a machine, but MAXCPU is set to 256, there
 is a bunch of lost memory and higher levels of lock contention?

 I thought that attilio was taking a stab at enhancing this, but at the
 current time anything more than a value of 64 for MAXCPU is kind of a
 caveat emptor area of FreeBSD.

 With newest current you can redefine MAXCPU in your kernel config, so
 you don't need to bump the default value.
 I think 64 as default value is good enough.

 Removing MAXCPU dependency from the KBI is an important project
 someone should adopt and bring to conclusion.

 That's certainly one half of it and thanks for the work, but the real
 question in this thread is what Sean asked: what are the negative
 side-effects of simply bumping MAXCPU to 256 by default? AFAIK, there
 are not that many structures which are statically sized by MAXCPU and
 most use the runtime-detected smp_cpus?


Well, there are quite a few statically allocated ones, but as I said,
making the kernel MAXCPU-agnostic (or sort of agnostic) is a goal and
a good project.

Thanks,
Attilio




Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-18 Thread Attilio Rao
2011/8/18 Andriy Gapon a...@freebsd.org:
 on 17/08/2011 23:21 Andriy Gapon said the following:

 It seems like everything starts with some kind of a race between
 terminating
 processes in a jail and termination of the jail itself.  This is where the
 details are very thin so far.  What we see is that a process (http) is in
 exit(2) syscall, in exit1() function actually, and past the place where
 P_WEXIT
 flag is set and even past the place where p_limit is freed and reset to
 NULL.
 At that place the thread calls prison_proc_free(), which calls
 prison_deref().
 Then, we see that in prison_deref() the thread gets a page fault because
 of what
 seems like a NULL pointer dereference.  That's just the start of the
 problem and
 its root cause.

 Then, trap_pfault() gets invoked and, because addresses close to NULL look
 like
 userspace addresses, vm_fault/vm_fault_hold gets called, which in its turn
 goes
 on to call vm_map_growstack.  First thing that vm_map_growstack does is a
 call
 to lim_cur(), but because p_limit is already NULL, that call results in a
 NULL
 pointer dereference and a page fault.  Goto the beginning of this
 paragraph.

 So we get this recursion of sorts, which only ends when a stack is
 exhausted and
 a CPU generates a double-fault.

 BTW, does anyone have an idea why the thread in question would disappear
 from
 kgdb's point of view?

 (kgdb) p cpuid_to_pcpu[2]->pc_curthread->td_tid
 $3 = 102057
 (kgdb) tid 102057
 invalid tid

 info threads also doesn't list the thread.

 Is it because the panic happened while the thread was somewhere in exit1()?
 is there an easy way to examine its stack in this case?

Yes, that is likely it.

The 'tid' command does a lookup in the tid_to_thread() table (or a
similarly named one), which returns NULL, meaning the thread has
already passed beyond the point where it was in the lookup table.

Attilio




Re: SMP question w.r.t. reading kernel variables

2011-04-17 Thread Attilio Rao
2011/4/17 Rick Macklem rmack...@uoguelph.ca:
 Hi,

 I should know the answer to this, but... When reading a global kernel
 variable, where its modifications are protected by a mutex, is it
 necessary to get the mutex lock to just read its value?

 For example:
 A    if ((mp->mnt_kern_flag & MNTK_UNMOUNTF) != 0)
          return (EPERM);
 versus
 B    MNT_ILOCK(mp);
     if ((mp->mnt_kern_flag & MNTK_UNMOUNTF) != 0) {
          MNT_IUNLOCK(mp);
          return (EPERM);
     }
     MNT_IUNLOCK(mp);

 My hunch is that B is necessary if you need an up-to-date value
 for the variable (mp->mnt_kern_flag in this case).

 Is that correct?

 Thanks in advance for help with this, rick

In general, the FreeBSD kernel assumes that reads and writes of
word-aligned, int-sized types are always atomic.

Considering this, if a kernel variable is of int type or word-boundary
size you don't strictly need a lock there.
Anyway, locking also brings some side effects, like memory and
compiler barriers... while it is true that such aligned reads/writes
should be atomic, it is not too obvious to predict whether the barrier
is necessary, or implied, on every architecture we support.

That said, as a matter of consistency and better semantics, I prefer
to also lock simple reads/writes when the objects are explicitly
equipped for that.

Attilio




Re: atomic_set_xxx(x, 0)

2010-12-07 Thread Attilio Rao
2010/12/7 Andriy Gapon a...@freebsd.org:

 $ glimpse atomic_set_ | fgrep -w 0
 /usr/src/sys/dev/arcmsr/arcmsr.c:       atomic_set_int(&acb->srboutstandingcount, 0);
 /usr/src/sys/dev/arcmsr/arcmsr.c:       atomic_set_int(&acb->srboutstandingcount, 0);
 /usr/src/sys/dev/jme/if_jme.c:  atomic_set_int(&sc->jme_morework, 0);
 /usr/src/sys/dev/jme/if_jme.c:  atomic_set_int(&sc->jme_morework, 0);
 /usr/src/sys/dev/ale/if_ale.c:  atomic_set_int(&sc->ale_morework, 0);
 /usr/src/sys/mips/rmi/dev/xlr/rge.c:    atomic_set_int(&(priv->frin_to_be_sent[i]), 0);
 /usr/src/sys/dev/drm/drm_irq.c: atomic_set_rel_32(&dev->vblank[i].count, 0);
 /usr/src/sys/dev/cxgb/ulp/tom/cxgb_tom.c:       atomic_set_int(&t->tids_in_use, 0);

 I wonder if these are all bugs and atomic_store_xxx() was actually intended?

Besides, we assume a store on an int boundary is implicitly atomic on
all architectures FreeBSD supports; then, unless a memory barrier is
needed, there is no need to use atomic_* in store cases.

Attilio




Re: Best way to determine if an IRQ is present

2010-11-25 Thread Attilio Rao
2010/11/25 Andriy Gapon a...@freebsd.org:
 on 25/11/2010 17:28 John Baldwin said the following:
 Andriy Gapon wrote:
 on 22/11/2010 16:24 John Baldwin said the following:
 Well, the real solution is actually larger than described in the PR.  What 
 you
 really want to do is take the logical CPUs offline when they are halted.
 Taking a CPU offline should trigger an EVENTHANDLER that various bits of 
 code
 could invoke.  In the case of platforms that support binding interrupts to 
 CPUs
 (x86 and sparc64 at least), they would install an event handler that 
 searches
 the MD interrupt tables (e.g. the interrupt_sources[] array on x86) and 
 move
 bound interrupts to other CPUs.  However, I think all the interrupt
 bits will be MD, not MI.

 That's a good idea and a comprehensive approach.
 One minor technical detail - should an offlined CPU be removed from all_cpus
 mask/set?

 That's tricky.  In other e-mails I've had on this topic, the idea has been 
 to have
 a new online_cpus mask and maybe a CPU_ONLINE() test macro  similar to
 CPU_ABSENT().  In that case, an offline CPU should still be in all_cpus, but 
 many
 places that use all_cpus would need to use online_cpus instead.


 This sounds like a plan.
 CPU_FOREACH_ONLINE() could also come handy,
 Thanks!

One of the biggest issues here is that these bitmasks are no longer
fixed for the kernel's lifetime, so you need proper locking to access
them.

I recall that one of the points of this plan is to benchmark and
eventually optimize write-side performance for rmlocks (as it is
easily feasible).

In order to have a fully effective plan, there are several nits that
need to be addressed here; I may have collected them somewhere in a
file, and when I find it I will send it to you.

Attilio




Re: KDB_TRACE and no backend

2010-09-18 Thread Attilio Rao
2010/9/18 Andriy Gapon a...@icyb.net.ua:

 Here's a small patch that adds support for printing a stack trace in the
 form of frame
 addresses when KDB_TRACE is enabled, but there is no debugger backend
 configured.
 The patch is styled after the cheap variant of stack_ktr.

 What do you think (useful/useless, correct, etc) ?

 --- a/sys/kern/subr_kdb.c
 +++ b/sys/kern/subr_kdb.c
 @@ -37,6 +37,7 @@
  #include <sys/pcpu.h>
  #include <sys/proc.h>
  #include <sys/smp.h>
 +#include <sys/stack.h>
  #include <sys/sysctl.h>

  #include <machine/kdb.h>
 @@ -295,10 +296,16 @@
  void
  kdb_backtrace(void)
  {
 +       struct stack st;
 +       int i;

 -       if (kdb_dbbe != NULL && kdb_dbbe->dbbe_trace != NULL) {
 -               printf("KDB: stack backtrace:\n");
 +       printf("KDB: stack backtrace:\n");
 +       if (kdb_dbbe != NULL && kdb_dbbe->dbbe_trace != NULL)
                 kdb_dbbe->dbbe_trace();
 +       else {
 +               stack_save(&st);
 +               for (i = 0; i < st.depth; i++)
 +                       printf("#%d %p\n", i, (void *)(uintptr_t)st.pcs[i]);
         }
  }

You have to eventually wrap this logic within the 'STACK' option
(opt_stack.h for the check) because stack_save() will be ineffective
otherwise. STACK should be mandatory for DDB I guess, but it is not
for KDB.

Thanks,
Attilio




Re: KDB_TRACE and no backend

2010-09-18 Thread Attilio Rao
2010/9/18 Andriy Gapon a...@freebsd.org:
 on 18/09/2010 22:00 Andriy Gapon said the following:
 Oh, wow, and I totally overlooked stack_print().
 Should have read stack(9) from the start.

 New patch.  Hope this is better.
 I don't like that the printf is duplicated, but couldn't figure out a way to
 combine pre-processor and C conditions.

 --- a/sys/kern/subr_kdb.c
 +++ b/sys/kern/subr_kdb.c
 @@ -37,6 +37,7 @@ __FBSDID("$FreeBSD$");
  #include <sys/pcpu.h>
  #include <sys/proc.h>
  #include <sys/smp.h>
 +#include <sys/stack.h>
  #include <sys/sysctl.h>

  #include <machine/kdb.h>
 @@ -300,6 +301,15 @@ kdb_backtrace(void)
                printf("KDB: stack backtrace:\n");
                kdb_dbbe->dbbe_trace();
        }
 +#ifdef STACK
 +       else {
 +               struct stack st;
 +
 +               printf("KDB: stack backtrace:\n");
 +               stack_save(&st);
 +               stack_print(&st);
 +       }
 +#endif
  }

  /*


It is still missing the check on opt_stack.h.
Besides, I'd reconsider having the KDB_TRACE explanation in the ddb(4)
manpage (right now it is rightly there because the functionality is
DDB-specific, as long as DDB offers the backend, but with your change
it becomes global functionality). Not sure whether it is worth
changing, but you may get more opinions.

Thanks,
Attilio




Re: KDB_TRACE and no backend

2010-09-18 Thread Attilio Rao
2010/9/18 Andriy Gapon a...@freebsd.org:
 on 18/09/2010 23:35 Attilio Rao said the following:
 It is still missing checking on opt_stack.h

 Yes, thanks, fixed it in my tree.

 Besides, I'd reconsider having KDB_TRACE explanation in ddb(4) manpage
 (right now it is rightly there because it is DDB specific only, as
 long as it offers the backend, but with your change it is a global
 functionality. Not sure if it worths changing it but however you may
 have more opinions).

 It seems that we don't have kdb(4) ?


We don't, and we really should. I'd really like a kernel section
describing the whole kdb infrastructure and kdbe hooks.
That may actually be indicated as a janitor task, if someone wants to
take over and document the whole layer.

Thanks,
Attilio




Re: kld modules remain loaded if MOD_LOAD handler returns an error

2010-08-23 Thread Attilio Rao
2010/8/23 Andriy Gapon a...@icyb.net.ua:
 on 23/08/2010 15:10 John Baldwin said the following:
 On Friday, August 20, 2010 1:13:53 pm Ryan Stone wrote:
 Consider the following modules:

 /* first.c */
 static int *test;

 int
 test_function(void)
 {
     return *test;
 }

 static int
 first_modevent(struct module *m, int what, void *arg)
 {
         int err = 0;

         switch (what) {
         case MOD_LOAD:                /* kldload */
                 test = malloc(sizeof(int), M_TEMP, M_NOWAIT | M_ZERO);
                 if (!test)
                         err = ENOMEM;
                 break;
         case MOD_UNLOAD:              /* kldunload */
                 break;
         default:
                 err = EINVAL;
                 break;
         }
         return(err);
 }

 static moduledata_t first_mod = {
         first,
         first_modevent,
         NULL
 };

 DECLARE_MODULE(first, first_mod, SI_SUB_KLD, SI_ORDER_ANY);
 MODULE_VERSION(first, 1);


 /* second.c */
 static int
 second_modevent(struct module *m, int what, void *arg)
 {
         int err = 0;

         switch (what) {
         case MOD_LOAD:                /* kldload */
                 test_function();
                 break;
         case MOD_UNLOAD:              /* kldunload */
                 break;
         default:
                 err = EINVAL;
                 break;
         }
         return(err);
 }

 static moduledata_t second_mod = {
         "second",
         second_modevent,
         NULL
 };

 DECLARE_MODULE(second, second_mod, SI_SUB_KLD, SI_ORDER_ANY);
 MODULE_DEPEND(second, first, 1, 1, 1);


 Consider the case where malloc fails in first_modevent.
 first_modevent will return ENOMEM, but the module will remain loaded.
 Now when the second module goes and loads, it calls into the first
 module, which is not initialized properly, and promptly crashes when
 test_function() dereferences a null pointer.

 It seems to me that a module should be unloaded if it returns an error
 from its MOD_LOAD handler.  However, that's easier said than done.
 The MOD_LOAD handler is called from a SYSINIT, and there's no
 immediately obvious way to pass information about the failure from the
 SYSINIT to the kernel linker.  Anybody have any thoughts on this?

 Yeah, it's not easy to fix.  Probably we could patch the kernel linker to
 notice if any of the modules for a given linker file had errors during
 initialization and trigger an unload if that occurs.  I don't think this 
 would
 be too hard to do.

 John,

 please note that for this testcase we would need to prevent second module's
 modevent from being executed at all.
 Perhaps a module shouldn't be considered as loaded until modevent caller 
 marks it
 as successfully initialized, but I haven't looked at the actual code.
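To illustrate the cleanup-on-failure idea in userland terms (the names here are made up for illustration, not the kernel KPI), a modevent-style handler should never leave its "module" half-initialized; on any failure it releases what it already allocated before reporting the error:

```c
#include <errno.h>
#include <stdlib.h>

static int *test;

/* Sketch of a MOD_LOAD-style handler: on any failure it frees what it
 * already allocated and reports the error to the caller, so nothing
 * half-initialized survives. */
int
demo_load(int fail)
{
	test = malloc(sizeof(int));
	if (test == NULL)
		return (ENOMEM);
	*test = 0;
	if (fail) {		/* simulate a later init step failing */
		free(test);	/* roll back before returning */
		test = NULL;
		return (ENOMEM);
	}
	return (0);
}
```

The point of the sketch is only the rollback pattern; how the kernel linker would then react to the returned error is the open question in this thread.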

I replied in private, but I really agree with Andriy here.

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: sched_pin() versus PCPU_GET

2010-08-08 Thread Attilio Rao
2010/8/4  m...@freebsd.org:
 On Fri, Jul 30, 2010 at 2:31 PM, John Baldwin j...@freebsd.org wrote:
 On Friday, July 30, 2010 10:08:22 am John Baldwin wrote:
 On Thursday, July 29, 2010 7:39:02 pm m...@freebsd.org wrote:
  We've seen a few instances at work where witness_warn() in ast()
  indicates the sched lock is still held, but the place it claims it was
  held by is in fact sometimes not possible to keep the lock, like:
 
      thread_lock(td);
      td->td_flags &= ~TDF_SELECT;
      thread_unlock(td);
 
  What I was wondering is, even though the assembly I see in objdump -S
  for witness_warn has the increment of td_pinned before the PCPU_GET:
 
  802db210:   65 48 8b 1c 25 00 00    mov    %gs:0x0,%rbx
  802db217:   00 00
  802db219:   ff 83 04 01 00 00       incl   0x104(%rbx)
       * Pin the thread in order to avoid problems with thread migration.
       * Once that all verifies are passed about spinlocks ownership,
       * the thread is in a safe path and it can be unpinned.
       */
      sched_pin();
      lock_list = PCPU_GET(spinlocks);
  802db21f:   65 48 8b 04 25 48 00    mov    %gs:0x48,%rax
  802db226:   00 00
    if (lock_list != NULL && lock_list->ll_count != 0) {
  802db228:   48 85 c0                test   %rax,%rax
       * Pin the thread in order to avoid problems with thread migration.
       * Once that all verifies are passed about spinlocks ownership,
       * the thread is in a safe path and it can be unpinned.
       */
      sched_pin();
      lock_list = PCPU_GET(spinlocks);
  802db22b:   48 89 85 f0 fe ff ff    mov    %rax,-0x110(%rbp)
  802db232:   48 89 85 f8 fe ff ff    mov    %rax,-0x108(%rbp)
    if (lock_list != NULL && lock_list->ll_count != 0) {
  802db239:   0f 84 ff 00 00 00       je     802db33e
  witness_warn+0x30e
  802db23f:   44 8b 60 50             mov    0x50(%rax),%r12d
 
  is it possible for the hardware to do any re-ordering here?
 
  The reason I'm suspicious is not just that the code doesn't have a
  lock leak at the indicated point, but in one instance I can see in the
  dump that the lock_list local from witness_warn is from the pcpu
  structure for CPU 0 (and I was warned about sched lock 0), but the
  thread id in panic_cpu is 2.  So clearly the thread was being migrated
  right around panic time.
 
  This is the amd64 kernel on stable/7.  I'm not sure exactly what kind
  of hardware; it's a 4-way Intel chip from about 3 or 4 years ago IIRC.
 
  So... do we need some kind of barrier in the code for sched_pin() for
  it to really do what it claims?  Could the hardware have re-ordered
  the mov    %gs:0x48,%rax PCPU_GET to before the sched_pin()
  increment?

 Hmmm, I think it might be able to because they refer to different locations.

 Note this rule in section 8.2.2 of Volume 3A:

   • Reads may be reordered with older writes to different locations but not
     with older writes to the same location.

 It is certainly true that sparc64 could reorder with RMO.  I believe ia64
 could reorder as well.  Since sched_pin/unpin are frequently used to provide
 this sort of synchronization, we could use memory barriers in pin/unpin
 like so:

 sched_pin()
 {
       td->td_pinned = atomic_load_acq_int(&td->td_pinned) + 1;
 }

 sched_unpin()
 {
       atomic_store_rel_int(&td->td_pinned, td->td_pinned - 1);
 }

 We could also just use atomic_add_acq_int() and atomic_sub_rel_int(), but 
 they
 are slightly more heavyweight, though it would be more clear what is 
 happening
 I think.

 However, to actually get a race you'd have to have an interrupt fire and
 migrate you so that the speculative read was from the other CPU.  However, I
 don't think the speculative read would be preserved in that case.  The CPU
 has to return to a specific PC when it returns from the interrupt and it has
 no way of storing the state for what speculative reordering it might be
 doing, so presumably it is thrown away?  I suppose it is possible that it
 actually retires both instructions (but reordered) and then returns to the PC
 value after the read of listlocks after the interrupt.  However, in that case
 the scheduler would not migrate as it would see td_pinned != 0.  To get the
 race you have to have the interrupt take effect prior to modifying td_pinned,
 so I think the processor would have to discard the reordered read of
 listlocks so it could safely resume execution at the 'incl' instruction.
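As a userland analogue of the pin/unpin pairing proposed above (C11 atomics standing in for the kernel's atomic_load_acq_int/atomic_store_rel_int; the names and the counter are illustrative, not the scheduler's actual state):

```c
#include <stdatomic.h>

/* Userland sketch: bump "pinned" with acquire ordering so later reads
 * (like the PCPU_GET above) cannot be hoisted above the increment, and
 * drop it with release ordering so earlier accesses cannot sink below
 * the decrement. */
static _Atomic int td_pinned;

void
demo_pin(void)
{
	/* load-acquire then store, mirroring the proposed sched_pin() */
	int v = atomic_load_explicit(&td_pinned, memory_order_acquire);
	atomic_store_explicit(&td_pinned, v + 1, memory_order_relaxed);
}

void
demo_unpin(void)
{
	int v = atomic_load_explicit(&td_pinned, memory_order_relaxed);
	atomic_store_explicit(&td_pinned, v - 1, memory_order_release);
}

int
demo_pinned(void)
{
	return atomic_load_explicit(&td_pinned, memory_order_relaxed);
}
```

This mirrors the shape of the suggestion, not its exact x86 codegen; whether the barrier is actually needed there is what the rest of the thread debates.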

 The other nit there on x86 at least is that the incl instruction is doing
 both a read and a write and another rule in the section 8.2.2 is this:

  • Reads are not reordered with other reads.

 That would seem to prevent the read of listlocks from passing the read of
 td_pinned in the incl instruction on x86.

 I wonder how that's interpreted in the microcode, though?  I.e. if the
 incr instruction decodes to load, add, store, does the h/w allow the
 later reads to pass the final store?

 I added the 

Re: sched_pin() versus PCPU_GET

2010-08-06 Thread Attilio Rao
2010/8/5 John Baldwin j...@freebsd.org:
 On Thursday, August 05, 2010 11:59:37 am m...@freebsd.org wrote:
 On Wed, Aug 4, 2010 at 11:55 AM, John Baldwin j...@freebsd.org wrote:
  On Wednesday, August 04, 2010 12:20:31 pm m...@freebsd.org wrote:
  On Wed, Aug 4, 2010 at 2:26 PM, John Baldwin j...@freebsd.org wrote:
   On Tuesday, August 03, 2010 9:46:16 pm m...@freebsd.org wrote:
   On Fri, Jul 30, 2010 at 2:31 PM, John Baldwin j...@freebsd.org wrote:
On Friday, July 30, 2010 10:08:22 am John Baldwin wrote:
On Thursday, July 29, 2010 7:39:02 pm m...@freebsd.org wrote:
 We've seen a few instances at work where witness_warn() in ast()
 indicates the sched lock is still held, but the place it claims
 it was
 held by is in fact sometimes not possible to keep the lock, like:

     thread_lock(td);
     td->td_flags &= ~TDF_SELECT;
     thread_unlock(td);

 What I was wondering is, even though the assembly I see in
 objdump -S
 for witness_warn has the increment of td_pinned before the
 PCPU_GET:

 802db210:   65 48 8b 1c 25 00 00    mov    %gs:0x0,%rbx
 802db217:   00 00
 802db219:   ff 83 04 01 00 00       incl   0x104(%rbx)
      * Pin the thread in order to avoid problems with thread
 migration.
      * Once that all verifies are passed about spinlocks
 ownership,
      * the thread is in a safe path and it can be unpinned.
      */
     sched_pin();
     lock_list = PCPU_GET(spinlocks);
 802db21f:   65 48 8b 04 25 48 00    mov    %gs:0x48,%rax
 802db226:   00 00
     if (lock_list != NULL && lock_list->ll_count != 0) {
 802db228:   48 85 c0                test   %rax,%rax
      * Pin the thread in order to avoid problems with thread
 migration.
      * Once that all verifies are passed about spinlocks
 ownership,
      * the thread is in a safe path and it can be unpinned.
      */
     sched_pin();
     lock_list = PCPU_GET(spinlocks);
 802db22b:   48 89 85 f0 fe ff ff    mov
  %rax,-0x110(%rbp)
 802db232:   48 89 85 f8 fe ff ff    mov
  %rax,-0x108(%rbp)
     if (lock_list != NULL && lock_list->ll_count != 0) {
 802db239:   0f 84 ff 00 00 00       je
 802db33e
 witness_warn+0x30e
 802db23f:   44 8b 60 50             mov    0x50(%rax),
 %r12d

 is it possible for the hardware to do any re-ordering here?

 The reason I'm suspicious is not just that the code doesn't have
 a
 lock leak at the indicated point, but in one instance I can see
 in the
 dump that the lock_list local from witness_warn is from the pcpu
 structure for CPU 0 (and I was warned about sched lock 0), but
 the
 thread id in panic_cpu is 2.  So clearly the thread was being
 migrated
 right around panic time.

 This is the amd64 kernel on stable/7.  I'm not sure exactly what
 kind
 of hardware; it's a 4-way Intel chip from about 3 or 4 years ago
 IIRC.

 So... do we need some kind of barrier in the code for sched_pin()
 for
 it to really do what it claims?  Could the hardware have re-
 ordered
 the mov    %gs:0x48,%rax PCPU_GET to before the sched_pin()
 increment?
   
Hmmm, I think it might be able to because they refer to different
 locations.
   
Note this rule in section 8.2.2 of Volume 3A:
   
  • Reads may be reordered with older writes to different locations
 but not
    with older writes to the same location.
   
It is certainly true that sparc64 could reorder with RMO.  I
 believe ia64
could reorder as well.  Since sched_pin/unpin are frequently used
 to provide
this sort of synchronization, we could use memory barriers in
 pin/unpin
like so:
   
sched_pin()
{
       td->td_pinned = atomic_load_acq_int(&td->td_pinned) + 1;
}
   
sched_unpin()
{
       atomic_store_rel_int(&td->td_pinned, td->td_pinned - 1);
}
   
We could also just use atomic_add_acq_int() and
 atomic_sub_rel_int(), but they
are slightly more heavyweight, though it would be more clear what
 is happening
I think.
   
However, to actually get a race you'd have to have an interrupt fire
 and
migrate you so that the speculative read was from the other CPU.
  However, I
don't think the speculative read would be preserved in that case.
  The CPU
has to return to a specific PC when it returns from the interrupt
 and it has
no way of storing the state for what speculative reordering it might
 be
doing, so presumably it is thrown away?  I suppose it is possible
 that it
actually retires both instructions (but reordered) and then returns
 to the PC
value after the read of listlocks after the interrupt.  However, in
 that case
the scheduler would not migrate as it would see td_pinned != 0.  To
 get the
race you have to have the interrupt take effect prior to modifying
 

Re: disk I/O, VFS hirunningspace

2010-07-15 Thread Attilio Rao
2010/7/15 Alan Cox alan.l@gmail.com:
 On Thu, Jul 15, 2010 at 8:01 AM, Ivan Voras ivo...@freebsd.org wrote:

 On 07/14/10 18:27, Jerry Toung wrote:
  On Wed, Jul 14, 2010 at 12:04 AM, Gary Jennejohn
  gljennj...@googlemail.comwrote:
 
 
 
  Rather than commenting out the code try setting the sysctl
  vfs.hirunningspace to various powers-of-two.  Default seems to be
  1MB.  I just changed it on the command line as a test to 2MB.
 
  You can do this in /etc/sysctl.conf.
 
 
  thank you all, that did it. The settings that Matt recommended are giving
  the same numbers

 Any objections to raising the defaults to 8 MB / 1 MB in HEAD?



 Keep in mind that we still run on some fairly small systems with limited I/O
 capabilities, e.g., a typical arm platform.  More generally, with the range
 of systems that FreeBSD runs on today, any particular choice of constants is
 going to perform poorly for someone.  If nothing else, making these sysctls
 a function of the buffer cache size is probably better than any particular
 constants.
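Alan's suggestion can be sketched as a simple derivation of the limit from the buffer-cache size; the fraction and the clamps below are invented purely for illustration, not what FreeBSD actually committed:

```c
/* Derive hirunningspace from the buffer cache size instead of a
 * hard-coded constant, clamped to sane bounds.  All numbers here are
 * illustrative assumptions. */
#define MIB (1024L * 1024)

long
demo_hirunningspace(long bufspace)
{
	long hi = bufspace / 64;	/* a small fraction of the cache */

	if (hi < 1 * MIB)		/* never below the old default */
		hi = 1 * MIB;
	if (hi > 16 * MIB)		/* cap for very large machines */
		hi = 16 * MIB;
	return (hi);
}
```

A scaled value like this keeps small arm boards near the old default while letting large machines queue more I/O, which is the trade-off Alan describes.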

What about making it an MD sysctl?

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: sysbench / fileio - Linux vs. FreeBSD

2010-06-05 Thread Attilio Rao
2010/6/5 Matthew Jacob m...@feral.com

 All of these tests have been apples vs. oranges for years.

 The following seems to be true, though:

 a) FreeBSD sequential write performance in UFS has always been less than 
 optimal.

 b) Linux sequential write performance in just about any filesystem has always 
 been impressive. But that impressiveness has come at some not so obvious 
 costs. First of all, Linux is probably the most aggressive 
 cluster/write-behind OS I've ever seen. You can suck down all available 
 memory with writebehind using dd. This means that some stats are 
 impressive, and others are painful. A desktop that becomes completely 
 unresponsive while you're doing this dd is one personal outcome.

 Also, you have to be careful what you're asking for in comparing the two 
 platforms, or any platforms for that matter. What do you want to optimize 
 for? Apparent responsiveness as a desktop? A specific workload (nfs, cifs) 
 that completes N quatloos per fortnight?

Beyond anything else, I'm much more concerned about the loss of
performance within FreeBSD itself. I wouldn't expect such a heavy
pessimization as the number of threads increases (without even
considering the big performance loss with the 8k blocksize, which is
pretty much reproducible). I'm privately trying to steer the tester
toward pmc/lock profiling analysis in order to start collecting some
useful data.
While I think we should pay a lot of attention to ZFS, we should not
leave FFS behind either. Having a fast, well supported, native
filesystem would be a great thing for us.

Comparing with other operating systems, as you smartly point out,
should not be taken as 'undefeatable truth': such comparisons have
pros and cons that need to be fully understood before making claims.

Thanks,
Attilio


--
Peace can only be achieved by understanding - A. Einstein


Re: sysbench / fileio - Linux vs. FreeBSD

2010-06-05 Thread Attilio Rao
2010/6/5 Kostik Belousov kostik...@gmail.com

 On Sat, Jun 05, 2010 at 07:41:23PM +0200, Attilio Rao wrote:
  2010/6/5 Matthew Jacob m...@feral.com
  
   All of these tests have been apples vs. oranges for years.
  
   The following seems to be true, though:
  
   a) FreeBSD sequential write performance in UFS has always been less than 
   optimal.
  
   b) Linux sequential write performance in just about any filesystem has 
   always been impressive. But that impressive has come at some not so 
   obvious costs. First of all, Linux is probably the most aggressive 
   cluster/write-behind OS I've even seen. You can suck down all available 
   memory with writebehind using dd. This means that some stats are 
   impressive, and others are painful. A desktop that becomes completely 
   unresponsive while you're doing this dd is one personal outcome.
  
   Also, you have to be careful what you're asking for in comparing the two 
   platforms, or any platforms for that matter. What do you want to optimize 
   for? Apparent responsiveness as a desktop? A specific workload (nfs, 
   cifs) that completes N quatloos per fortnight?
 
  Besides anything, I'm much more concerned about the loss of
  performance within FreeBSD itself. I wouldn't expect a so high
  pessimization when the number of threads increases (without
  considering the big performance loss with the 8k blocksize, pretty
  much reproducible). I'm trying to drive, privately, the tester to
  pmc/lock profiling analysis in order to start collecting some useful
  datas.
 Do the benchmarks create threads that write to the same file?
 If yes, then this behaviour is well understood.

Actually, I don't know yet, as I just sent an e-mail to the tester and
he hasn't followed up so far.
However, I'm not entirely sure this is the whole bottleneck, or that it
can be attributed to the lack of byte-range locking.
I want to dig more and better understand what's going on exactly.

Attilio


--
Peace can only be achieved by understanding - A. Einstein


Re: idle threads

2010-04-23 Thread Attilio Rao
2010/4/24 Matthew Fleming matthew.flem...@isilon.com:
 I'm looking at kern_idle.c in stable/7 and I don't quite follow how idle
 threads work.  The kthread_create(9) call does not pass in a function
 pointer, so what code does a processor run when there is no other
 runnable thread?

In STABLE_7:

...
#ifdef SMP
SLIST_FOREACH(pc, &cpuhead, pc_allcpu) {
error = kthread_create(sched_idletd, NULL, p,
RFSTOPPED | RFHIGHPID, 0, "idle: cpu%d", pc->pc_cpuid);
pc->pc_idlethread = FIRST_THREAD_IN_PROC(p);
#else
error = kthread_create(sched_idletd, NULL, p,
RFSTOPPED | RFHIGHPID, 0, "idle");
PCPU_SET(idlethread, FIRST_THREAD_IN_PROC(p));
#endif
...

so it does rightly pass sched_idletd(). Any scheduler may define its
own version of sched_idletd().
The NULL is just the argument passed, which is not meaningful here.
Or maybe I'm not understanding your question?

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: Panic in vm_map_stack

2010-03-26 Thread Attilio Rao
2010/3/26 pluknet pluk...@gmail.com:
 On 26 March 2010 23:10, Tom Judge t...@tomjudge.com wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 This is the function, I am guessing that I need to unlock the proc
 before calling vmspace_free ?



 As far as I know you cannot lock a process around
 locking vmspace and/or vm_map (at least on 6.x).
 I used process reference counting for that purpose.
 Sort of the following mumble..

Generally that is true for vm_map_lock(), because it is an sx lock
(thus sleeping), but in this case a trylock operation that won't sleep
is used (and I assume that is why it is used), so it is not a problem.
The vmspace refcounting, instead, doesn't impose any extra constraint,
so I think the lock dance is not required here.

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: resource leak in fifo_vnops.c: 6.x/7.x/8.x

2009-11-06 Thread Attilio Rao
2009/11/6 Dorr H. Clark dcl...@engr.scu.edu:


 We believe we have identified a significant resource leak
 present in 6.x, 7.x, and 8.x.  We believe this is a regression
 versus FreeBSD 4.x which appears to do the Right Thing (tm).

 We have a test program (see below) which will run the system
 out of sockets by repeated exercise of the failing code
 path in the kernel.

 Our proposed fix is applied to the file usr/src/sys/fs/fifofs/fifo_vnops.c


 @@ -237,6 +237,8 @@
if (ap->a_mode & FWRITE) {
if ((ap->a_mode & O_NONBLOCK) && fip->fi_readers == 0) {
mtx_unlock(&fifo_mtx);
 +   /* Exclusive VOP lock is held - safe to clean */
 +   fifo_cleanup(vp);
return (ENXIO);
}
fip->fi_writers++;

I think it should also check that fip->fi_writers == 0 (and possibly
the checks within fifo_cleanup() should just be assertions, but that's
somewhat orthogonal), and the comment is not needed.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: sx locks and memory barriers

2009-10-03 Thread Attilio Rao
2009/9/29 John Baldwin j...@freebsd.org:
 On Tuesday 29 September 2009 4:42:13 pm Attilio Rao wrote:
 2009/9/29 Max Laier m...@love2party.net:
  On Tuesday 29 September 2009 17:39:37 Attilio Rao wrote:
  2009/9/25 Fabio Checconi fa...@freebsd.org:
   Hi all,
looking at sys/sx.h I have some troubles understanding this comment:
  
* A note about memory barriers.  Exclusive locks need to use the same
* memory barriers as mutexes: _acq when acquiring an exclusive lock
* and _rel when releasing an exclusive lock.  On the other side,
* shared lock needs to use an _acq barrier when acquiring the lock
* but, since they don't update any locked data, no memory barrier is
* needed when releasing a shared lock.
  
   In particular, I'm not understanding what prevents the following 
   sequence
   from happening:
  
    CPU A   CPU B

    sx_slock(&data->lock);

    sx_sunlock(&data->lock);

    /* reordered after the unlock
  by the cpu */
    if (data->buffer)
   sx_xlock(&data->lock);
   free(data->buffer);
   data->buffer = NULL;
   sx_xunlock(&data->lock);

   a = *data->buffer;
  
   IOW, even if readers do not modify the data protected by the lock,
   without a release barrier a memory access may leak past the unlock (as
   the cpu won't notice any dependency between the unlock and the fetch,
   feeling free to reorder them), thus potentially racing with an exclusive
   writer accessing the data.
  
   On architectures where atomic ops serialize memory accesses this would
   never happen, otherwise the sequence above seems possible; am I missing
   something?
 
  I think your concerns are right, possibly we need this patch:
  http://www.freebsd.org/~attilio/sxrw_unlockb.diff
 
  However speaking with John we agreed possibly there is a more serious
   breakage. Possibly, memory barriers would also require to ensure the
   compiler to not reorder the operation, while right now, in FreeBSD, they
   just take care of the reordering from the architecture perspective.
  The only way I'm aware of GCC offers that is to clobber memory.
  I will provide a patch that address this soon, hoping that GCC will be
  smart enough to not overhead too much the memory clobbering but just
  try to understand what's our purpose and servers it (I will try to
  compare code generated before and after the patch at least for tier-1
  architectures).
 
  Does GCC really reorder accesses to volatile objects? The C Standard seems 
  to
  object:
 
  5.1.2.3 - 2
  Accessing a volatile object, modifying an object, modifying a file, or 
  calling
  a function that does any of those operations are all side effects,11) which
  are changes in the state of the execution environment. Evaluation of an
  expression may produce side effects. At certain specified points in the
  execution sequence called sequence points, all side effects of previous
  evaluations shall be complete and no side effects of subsequent evaluations
  shall have taken place. (A summary of the sequence points is given in annex
  C.)

 Very interesting.
 I was thinking about the other operating systems, which basically do
 'memory clobbering' to ensure a compiler barrier, but actually they
 often provide such a barrier without the conjunction of a memory
 operand.

 I think I will need to speak a bit with a GCC engineer in order to see
 what do they implement in regard of volatile operands.

 GCC can be quite aggressive with reordering even in the face of volatile.  I
 was recently doing a hack to export some data from the kernel to userland
 that used a spin loop to grab a snapshot of the contents of a structure
 similar to the method used in the kernel with the timehands structures.  It
 used a volatile structure exposed from the kernel that looked something
 like:

 struct foo {
volatile int gen;
/* other stuff */
 };

 volatile struct foo *p;

 do {
x = p->gen;
/* read other stuff */
y = p->gen;
 } while (x != y && x != 0);

 GCC moved the 'y = ' up into the middle of the '/* read other stuff */'.
 I eventually had to add explicit memory clobbers to force GCC to not
 move the reads of 'gen' around but do them around all the other
 operations, so that the working code is:

 do {
x = p->gen;
asm volatile("" ::: "memory");
/* read other stuff */
asm volatile("" ::: "memory");
y = p->gen;
 } while (x != y && x != 0);
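The clobber idiom above is the usual way to spell a pure compiler barrier; a minimal sketch of wrapping it in a macro (illustrative spelling, not FreeBSD's actual macro):

```c
/* A pure compiler barrier: no instruction is emitted, but GCC must
 * assume all memory may have changed, so it cannot cache values in
 * registers across the barrier or move loads/stores over it. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

static volatile int gen;

int
demo_snapshot(void)
{
	int x, y;

	do {
		x = gen;
		compiler_barrier();	/* reads of other data stay... */
		compiler_barrier();	/* ...between the two gen reads */
		y = gen;
	} while (x != y);
	return (x);
}
```

Unlike the full memory clobber John describes as heavyweight, the cost here is only lost register caching around the barrier points, not any runtime fence instruction.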

The situation is not as desperate as I first thought.
Actually, only ia32 and amd64 seem affected by the missing memory
clobbering, because it is already done for all the other platforms
when using any of the memory barriers.
On ia32 and amd64 too, the impact is more limited than what I
expected: atomic_cmpset_* already clobbers the memory and only
atomic_store_rel_* is not adequately

Re: NMI with 7.2-stable

2009-10-01 Thread Attilio Rao
Can you send a full dmesg?

2009/10/1 ivan anyukov ivan.anyu...@googlemail.com:
 Hi guys,
 I'm running 7.2-STABLE on a Thinkpad T60.
 When connecting a second monitor to my docking station sometimes my FreeBSD
 freezes.
 kgdb on the vmcore-file says non-maskable interrupt trap
 Some details:
 X.Org 1.5.3 using the radeon-Driver
 I think the problem appears when moving xterms from the first to the second
 monitor (or back). The mouse cursor looks _very_ strange then and after some
 minutes the whole system freezes.
 Does anyone know about the problem? Is it a hardware-failure for sure?

 A vmcore is generated, I loaded it into kgdb:

 This GDB was configured as i386-marcel-freebsd...

 Unread portion of the kernel message buffer:
 22NMIN MIIS AI SA b0b, 0E,I ESIAS A 
 
 2R2
RA2M ApMa rpiatyr ietryr oerr,r olri,k elliyk ehlayr dhwaarrdew afraei
 lfuariel.ure.



 FFaattaall  ttrraapp  1199::  nnoonn--mmaasskkaabbllee  iinntteeuupptt
 ttrraapp  wwhhiillee  iinn  kkeerrnneell  mmooddee

 ccppuuiidd  ==  01;;  aappiicc  iidd  ==  0010

 iinnssttrruuccttiioonn  ppooiinntteerr  ==
 00xx2200::00xxcc00eeaa7733eecc

 ssttaacckk  ppooiinntteerr  ==
 00xx2288::00xxcc4477aa58cc

 ffrraammee  ppooiinntteerr  ==
 00xx2288::00xxcc4477aa58cc

 ccooddee  sseeggmmeenntt==  bbaassee
 00xx00,,  lliimmiitt  00xxff,,  ttyyppee  00xx11bb

==  DDPPLL  00,,  pprreess
 11,,  ddeeff3322  11,,  ggrraann  11

 pprroocceeoorr  eeffllaaggss==  iinntteeuupptt
 eennaabblleedd,,  IIOOPPLL  ==  00

 ccuueenntt  pprrooccee  ==  1112
 ((iiddllee::  ccppuu10))

 ttrraapp  nnuummbbeerr  ==  1199

 backtrace says:
 #0  doadump () at pcpu.h:196
 #1  0xc07c5258 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
 #2  0xc07c5535 in panic (fmt=Variable fmt is not available.
 ) at /usr/src/sys/kern/kern_shutdown.c:574
 #3  0xc0ab2d14 in trap_fatal (frame=0xc47a5c48, eva=0) at
 /usr/src/sys/i386/i386/trap.c:939
 #4  0xc0ab3a9d in trap (frame=0xc47a5c48) at
 /usr/src/sys/i386/i386/trap.c:726
 #5  0xc0a9910b in calltrap () at /usr/src/sys/i386/i386/exception.s:159
 #6  0xc0ea73ec in acpi_cpu_c1 () at
 /usr/src/sys/modules/acpi/acpi/../../../i386/acpica/acpi_machdep.c:550
 #7  0xc0ea0623 in acpi_cpu_idle () at
 /usr/src/sys/modules/acpi/acpi/../../../dev/acpica/acpi_cpu.c:943
 #8  0xc0aa377b in cpu_idle () at /usr/src/sys/i386/i386/machdep.c:1183
 #9  0xc07e6bd4 in sched_idletd (dummy=0x0) at
 /usr/src/sys/kern/sched_ule.c:2681
 #10 0xc07a1311 in fork_exit (callout=0xc07e690f sched_idletd, arg=0x0,
 frame=0xc47a5d38)
at /usr/src/sys/kern/kern_fork.c:810
 #11 0xc0a99180 in fork_trampoline () at
 /usr/src/sys/i386/i386/exception.s:264

 Any ideas?

 Thanks a lot!




-- 
Peace can only be achieved by understanding - A. Einstein


Re: NMI with 7.2-stable

2009-10-01 Thread Attilio Rao
2009/10/1 ivan anyukov ivan.anyu...@googlemail.com:
 dmesg goes here:


Yes, I think it is a hardware failure.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: sx locks and memory barriers

2009-09-29 Thread Attilio Rao
2009/9/25 Fabio Checconi fa...@freebsd.org:
 Hi all,
  looking at sys/sx.h I have some troubles understanding this comment:

  * A note about memory barriers.  Exclusive locks need to use the same
  * memory barriers as mutexes: _acq when acquiring an exclusive lock
  * and _rel when releasing an exclusive lock.  On the other side,
  * shared lock needs to use an _acq barrier when acquiring the lock
  * but, since they don't update any locked data, no memory barrier is
  * needed when releasing a shared lock.

 In particular, I'm not understanding what prevents the following sequence
 from happening:

 CPU A   CPU B

  sx_slock(&data->lock);

  sx_sunlock(&data->lock);

  /* reordered after the unlock
    by the cpu */
  if (data->buffer)
 sx_xlock(&data->lock);
 free(data->buffer);
 data->buffer = NULL;
 sx_xunlock(&data->lock);

 a = *data->buffer;

 IOW, even if readers do not modify the data protected by the lock,
 without a release barrier a memory access may leak past the unlock (as
 the cpu won't notice any dependency between the unlock and the fetch,
 feeling free to reorder them), thus potentially racing with an exclusive
 writer accessing the data.

 On architectures where atomic ops serialize memory accesses this would
 never happen, otherwise the sequence above seems possible; am I missing
 something?

I think your concerns are right, possibly we need this patch:
http://www.freebsd.org/~attilio/sxrw_unlockb.diff

However, speaking with John, we agreed there is possibly a more
serious breakage: memory barriers would also need to ensure that the
compiler does not reorder the operations, while right now, in FreeBSD,
they just take care of the reordering from the architecture's
perspective.
The only way I'm aware of that GCC offers for this is to clobber memory.
I will provide a patch that addresses this soon, hoping that GCC will
be smart enough not to add too much overhead from the memory
clobbering but to understand our purpose and serve it (I will try to
compare the code generated before and after the patch, at least for
tier-1 architectures).
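Fabio's interleaving can be shown in miniature with C11 atomics: the shared-unlock path must carry release ordering so the reader's accesses to the protected data cannot drift past the unlock. A sketch of only the ordering aspect (a toy counter standing in for sx(9), not the real primitive):

```c
#include <stdatomic.h>

/* Toy shared-lock count: sunlock uses release ordering so the
 * reader's loads of the protected data complete before the unlock
 * becomes visible -- the property Fabio's example shows is needed. */
static _Atomic int shared_count;

void
demo_slock(void)
{
	atomic_fetch_add_explicit(&shared_count, 1, memory_order_acquire);
}

void
demo_sunlock(void)
{
	/* release: prior reads of protected data cannot sink below this */
	atomic_fetch_sub_explicit(&shared_count, 1, memory_order_release);
}

int
demo_readers(void)
{
	return atomic_load_explicit(&shared_count, memory_order_relaxed);
}
```

On x86 the locked RMW already serializes, which is why the bug is latent there; on weaker architectures the explicit release in demo_sunlock() is what forbids the reordering.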

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: sx locks and memory barriers

2009-09-29 Thread Attilio Rao
2009/9/29 John Baldwin j...@freebsd.org:
 On Tuesday 29 September 2009 11:39:37 am Attilio Rao wrote:
 2009/9/25 Fabio Checconi fa...@freebsd.org:
  Hi all,
   looking at sys/sx.h I have some troubles understanding this comment:
 
   * A note about memory barriers.  Exclusive locks need to use the same
   * memory barriers as mutexes: _acq when acquiring an exclusive lock
   * and _rel when releasing an exclusive lock.  On the other side,
   * shared lock needs to use an _acq barrier when acquiring the lock
   * but, since they don't update any locked data, no memory barrier is
   * needed when releasing a shared lock.
 
  In particular, I'm not understanding what prevents the following sequence
  from happening:
 
  CPU A   CPU B
 
  sx_slock(data->lock);
 
  sx_sunlock(data->lock);
 
  /* reordered after the unlock
     by the cpu */
  if (data->buffer)
 sx_xlock(data->lock);
 free(data->buffer);
 data->buffer = NULL;
 sx_xunlock(data->lock);
 
 a = *data->buffer;
 
  IOW, even if readers do not modify the data protected by the lock,
  without a release barrier a memory access may leak past the unlock (as
  the cpu won't notice any dependency between the unlock and the fetch,
  feeling free to reorder them), thus potentially racing with an exclusive
  writer accessing the data.
 
  On architectures where atomic ops serialize memory accesses this would
  never happen, otherwise the sequence above seems possible; am I missing
  something?

 I think your concerns are right, possibly we need this patch:
 http://www.freebsd.org/~attilio/sxrw_unlockb.diff

 Actually, since you are only worried about reads, I think this should be
 an acq barrier rather than a rel.  In some cases acq is cheaper, so we
 should prefer the cheapest barrier that provides what we need.  You would
 still need to keep some language about the memory barriers since using acq
 for shared unlocking is different from exclusive unlocking.

Actually, I don't think an acq barrier ensures enough protection against
the reordering of 'earlier' operations, so it would not fix the
architecture ordering problem reported by Fabio. Also, I don't think
we only have to care about reads (or I don't understand what you mean
here).
Moreover, I'm not even sure that we have faster read barriers than
write ones: even if that should be true in theory, I don't think it is
what happens in practice.

 The memory clobber is quite heavyweight.  It actually forces gcc to forget any
 cached memory items in registers and reload everything again.  What I really
 want is just a barrier to tell GCC to not reorder things.  If I read a value
 in the program before acquiring a lock it is in theory fine to keep that
 cached across the barrier.  However, there isn't a way to do this sort of
 thing with GCC currently.

Yes, that's the only tool we have right now with GCC. I will try to
look for another way, but it sounds difficult to discover.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: sx locks and memory barriers

2009-09-29 Thread Attilio Rao
2009/9/29 John Baldwin j...@freebsd.org:
 On Tuesday 29 September 2009 3:15:40 pm Attilio Rao wrote:
 2009/9/29 John Baldwin j...@freebsd.org:
  On Tuesday 29 September 2009 11:39:37 am Attilio Rao wrote:
  2009/9/25 Fabio Checconi fa...@freebsd.org:
   Hi all,
looking at sys/sx.h I have some troubles understanding this comment:
  
* A note about memory barriers.  Exclusive locks need to use the same
* memory barriers as mutexes: _acq when acquiring an exclusive lock
* and _rel when releasing an exclusive lock.  On the other side,
* shared lock needs to use an _acq barrier when acquiring the lock
* but, since they don't update any locked data, no memory barrier is
* needed when releasing a shared lock.
  
   In particular, I'm not understanding what prevents the following 
   sequence
   from happening:
  
   CPU A   CPU B
  
   sx_slock(data->lock);
 
   sx_sunlock(data->lock);
 
   /* reordered after the unlock
      by the cpu */
   if (data->buffer)
  sx_xlock(data->lock);
  free(data->buffer);
  data->buffer = NULL;
  sx_xunlock(data->lock);
 
  a = *data->buffer;
  
   IOW, even if readers do not modify the data protected by the lock,
   without a release barrier a memory access may leak past the unlock (as
   the cpu won't notice any dependency between the unlock and the fetch,
   feeling free to reorder them), thus potentially racing with an exclusive
   writer accessing the data.
  
   On architectures where atomic ops serialize memory accesses this would
   never happen, otherwise the sequence above seems possible; am I missing
   something?
 
  I think your concerns are right, possibly we need this patch:
  http://www.freebsd.org/~attilio/sxrw_unlockb.diff
 
  Actually, since you are only worried about reads, I think this should be
  an acq barrier rather than a rel.  In some cases acq is cheaper, so 
  we
  should prefer the cheapest barrier that provides what we need.  You would
  still need to keep some language about the memory barriers since using 
  acq
  for shared unlocking is different from exclusive unlocking.

 Actually, I don't think that an acq barrier ensures enough protection
 against the reordering of 'earlier' operation thus not fixing the
 architecture ordering problem reported by Fabio. Also, I don't think
 we just have to care about reads (or  I don't understand what you mean
 here).

 Hmmm, it might not on certain archs.  It does on x86 (i.e. an lfence would
 work here on x86), but probably not on ia64 and sparc64.  Also, we certainly
 only care about reads.  A read/share lock cannot resolve any races where the
 lock holder is writing data, it can only ensure that the lock holder can
 safely read shared data without the data changing out from underneath it.

 However, I'm not even sure that we have faster read barriers than the
 write one. As long as it should be true in theory I don't think that's
 what happen in practice.

 bde@ found that sfence was generally much more expensive than lfence on x86.
 However, since x86 guarantees the order of all stores we don't actually need
 to use sfence at all on x86 anyway.

Yes, x86 guarantees that stores are strongly ordered, so I don't
think acq would be faster than rel there.
Can I consider the patch I already sent as reviewed by you and commit it, then?

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: sx locks and memory barriers

2009-09-29 Thread Attilio Rao
2009/9/29 Max Laier m...@love2party.net:
 On Tuesday 29 September 2009 17:39:37 Attilio Rao wrote:
 2009/9/25 Fabio Checconi fa...@freebsd.org:
  Hi all,
   looking at sys/sx.h I have some troubles understanding this comment:
 
   * A note about memory barriers.  Exclusive locks need to use the same
   * memory barriers as mutexes: _acq when acquiring an exclusive lock
   * and _rel when releasing an exclusive lock.  On the other side,
   * shared lock needs to use an _acq barrier when acquiring the lock
   * but, since they don't update any locked data, no memory barrier is
   * needed when releasing a shared lock.
 
  In particular, I'm not understanding what prevents the following sequence
  from happening:
 
  CPU A   CPU B
 
  sx_slock(data->lock);
 
  sx_sunlock(data->lock);
 
  /* reordered after the unlock
     by the cpu */
  if (data->buffer)
 sx_xlock(data->lock);
 free(data->buffer);
 data->buffer = NULL;
 sx_xunlock(data->lock);
 
 a = *data->buffer;
 
  IOW, even if readers do not modify the data protected by the lock,
  without a release barrier a memory access may leak past the unlock (as
  the cpu won't notice any dependency between the unlock and the fetch,
  feeling free to reorder them), thus potentially racing with an exclusive
  writer accessing the data.
 
  On architectures where atomic ops serialize memory accesses this would
  never happen, otherwise the sequence above seems possible; am I missing
  something?

 I think your concerns are right, possibly we need this patch:
 http://www.freebsd.org/~attilio/sxrw_unlockb.diff

 However speaking with John we agreed possibly there is a more serious
  breakage. Possibly, memory barriers would also require to ensure the
  compiler to not reorder the operation, while right now, in FreeBSD, they
  just take care of the reordering from the architecture perspective.
 The only way I'm aware of GCC offers that is to clobber memory.
 I will provide a patch that address this soon, hoping that GCC will be
 smart enough to not overhead too much the memory clobbering but just
 try to understand what's our purpose and servers it (I will try to
 compare code generated before and after the patch at least for tier-1
 architectures).

 Does GCC really reorder accesses to volatile objects? The C Standard seems to
 object:

 5.1.2.3 - 2
 Accessing a volatile object, modifying an object, modifying a file, or calling
 a function that does any of those operations are all side effects,11) which
 are changes in the state of the execution environment. Evaluation of an
 expression may produce side effects. At certain specified points in the
 execution sequence called sequence points, all side effects of previous
 evaluations shall be complete and no side effects of subsequent evaluations
 shall have taken place. (A summary of the sequence points is given in annex
 C.)

Very interesting.
I was thinking about other operating systems, which basically use
'memory clobbering' to ensure a compiler barrier, but they often
provide such a barrier without the conjunction of a memory operand.

I think I will need to speak a bit with a GCC engineer in order to see
what they implement with regard to volatile operands.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: sx locks and memory barriers

2009-09-29 Thread Attilio Rao
2009/9/29 Marius Nünnerich mar...@nuenneri.ch:
 On Tue, Sep 29, 2009 at 21:15, Attilio Rao atti...@freebsd.org wrote:
 2009/9/29 John Baldwin j...@freebsd.org:
 On Tuesday 29 September 2009 11:39:37 am Attilio Rao wrote:
 2009/9/25 Fabio Checconi fa...@freebsd.org:
  Hi all,
   looking at sys/sx.h I have some troubles understanding this comment:
 
   * A note about memory barriers.  Exclusive locks need to use the same
   * memory barriers as mutexes: _acq when acquiring an exclusive lock
   * and _rel when releasing an exclusive lock.  On the other side,
   * shared lock needs to use an _acq barrier when acquiring the lock
   * but, since they don't update any locked data, no memory barrier is
   * needed when releasing a shared lock.
 
  In particular, I'm not understanding what prevents the following sequence
  from happening:
 
  CPU A   CPU B
 
  sx_slock(data->lock);
 
  sx_sunlock(data->lock);
 
  /* reordered after the unlock
     by the cpu */
  if (data->buffer)
 sx_xlock(data->lock);
 free(data->buffer);
 data->buffer = NULL;
 sx_xunlock(data->lock);
 
 a = *data->buffer;
 
  IOW, even if readers do not modify the data protected by the lock,
  without a release barrier a memory access may leak past the unlock (as
  the cpu won't notice any dependency between the unlock and the fetch,
  feeling free to reorder them), thus potentially racing with an exclusive
  writer accessing the data.
 
  On architectures where atomic ops serialize memory accesses this would
  never happen, otherwise the sequence above seems possible; am I missing
  something?

 I think your concerns are right, possibly we need this patch:
 http://www.freebsd.org/~attilio/sxrw_unlockb.diff

 Actually, since you are only worried about reads, I think this should be
 an acq barrier rather than a rel.  In some cases acq is cheaper, so we
 should prefer the cheapest barrier that provides what we need.  You would
 still need to keep some language about the memory barriers since using acq
 for shared unlocking is different from exclusive unlocking.

 Actually, I don't think that an acq barrier ensures enough protection
 against the reordering of 'earlier' operation thus not fixing the
 architecture ordering problem reported by Fabio. Also, I don't think
 we just have to care about reads (or  I don't understand what you mean
 here).
 However, I'm not even sure that we have faster read barriers than the
 write one. As long as it should be true in theory I don't think that's
 what happen in practice.

 The memory clobber is quite heavyweight.  It actually forces gcc to forget 
 any
 cached memory items in registers and reload everything again.  What I really
 want is just a barrier to tell GCC to not reorder things.  If I read a value
 in the program before acquiring a lock it is in theory fine to keep that
 cached across the barrier.  However, there isn't a way to do this sort of
 thing with GCC currently.

 Yes, that's the only tool we have right now with GCC. I will try to
 look for another way, but it sounds difficult to discover.

 Even if we would have a mechanism to tell GCC to not reorder the
 instructions the CPU itself would still be free to reorder if there
 are no barriers. Or am I missing something?

Our code already takes care of that case in our barriers.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: sx locks and memory barriers

2009-09-29 Thread Attilio Rao
2009/9/29 John Baldwin j...@freebsd.org:
 On Tuesday 29 September 2009 4:42:13 pm Attilio Rao wrote:
 2009/9/29 Max Laier m...@love2party.net:
  On Tuesday 29 September 2009 17:39:37 Attilio Rao wrote:
  2009/9/25 Fabio Checconi fa...@freebsd.org:
   Hi all,
looking at sys/sx.h I have some troubles understanding this comment:
  
* A note about memory barriers.  Exclusive locks need to use the same
* memory barriers as mutexes: _acq when acquiring an exclusive lock
* and _rel when releasing an exclusive lock.  On the other side,
* shared lock needs to use an _acq barrier when acquiring the lock
* but, since they don't update any locked data, no memory barrier is
* needed when releasing a shared lock.
  
   In particular, I'm not understanding what prevents the following 
   sequence
   from happening:
  
   CPU A   CPU B
  
   sx_slock(data->lock);
 
   sx_sunlock(data->lock);
 
   /* reordered after the unlock
      by the cpu */
   if (data->buffer)
  sx_xlock(data->lock);
  free(data->buffer);
  data->buffer = NULL;
  sx_xunlock(data->lock);
 
  a = *data->buffer;
  
   IOW, even if readers do not modify the data protected by the lock,
   without a release barrier a memory access may leak past the unlock (as
   the cpu won't notice any dependency between the unlock and the fetch,
   feeling free to reorder them), thus potentially racing with an exclusive
   writer accessing the data.
  
   On architectures where atomic ops serialize memory accesses this would
   never happen, otherwise the sequence above seems possible; am I missing
   something?
 
  I think your concerns are right, possibly we need this patch:
  http://www.freebsd.org/~attilio/sxrw_unlockb.diff
 
  However speaking with John we agreed possibly there is a more serious
   breakage. Possibly, memory barriers would also require to ensure the
   compiler to not reorder the operation, while right now, in FreeBSD, they
   just take care of the reordering from the architecture perspective.
  The only way I'm aware of GCC offers that is to clobber memory.
  I will provide a patch that address this soon, hoping that GCC will be
  smart enough to not overhead too much the memory clobbering but just
  try to understand what's our purpose and servers it (I will try to
  compare code generated before and after the patch at least for tier-1
  architectures).
 
  Does GCC really reorder accesses to volatile objects? The C Standard seems 
  to
  object:
 
  5.1.2.3 - 2
  Accessing a volatile object, modifying an object, modifying a file, or 
  calling
  a function that does any of those operations are all side effects,11) which
  are changes in the state of the execution environment. Evaluation of an
  expression may produce side effects. At certain specified points in the
  execution sequence called sequence points, all side effects of previous
  evaluations shall be complete and no side effects of subsequent evaluations
  shall have taken place. (A summary of the sequence points is given in annex
  C.)

 Very interesting.
 I was thinking about the other operating systems which basically do
 'memory clobbering' for ensuring a compiler barrier, but actually they
 often forsee such a barrier without the conjuction of a memory
 operand.

 I think I will need to speak a bit with a GCC engineer in order to see
 what do they implement in regard of volatile operands.

 GCC can be quite aggressive with reordering even in the face of volatile.  I
 was recently doing a hack to export some data from the kernel to userland
 that used a spin loop to grab a snapshot of the contents of a structure
 similar to the method used in the kernel with the timehands structures.  It
 used a volatile structure exposed from the kernel that looked something
 like:

 struct foo {
volatile int gen;
/* other stuff */
 };

 volatile struct foo *p;

 do {
     x = p->gen;
     /* read other stuff */
     y = p->gen;
 } while (x != y && x != 0);

 GCC moved the 'y = ' up into the middle of the '/* read other stuff */'.
 I eventually had to add explicit memory clobbers to force GCC to not
 move the reads of 'gen' around but do them around all the other
 operations, so that the working code is:

 do {
     x = p->gen;
     asm volatile("" ::: "memory");
     /* read other stuff */
     asm volatile("" ::: "memory");
     y = p->gen;
 } while (x != y && x != 0);


I see.
So probably clobbering memory is the only choice we have right now.
I will try to make a patch which also takes into account the
possibility of skipping it (or defining alternative approaches by
hand) for different compilers.
I wonder, specifically, how llvm/clang deals with it.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein

Re: Where have all the vnodes gone?

2009-08-01 Thread Attilio Rao
2009/8/1 Linda Messerschmidt linda.messerschm...@gmail.com:
 (Reposted from freebsd-questions due to no replies.)

 With the last few releases, I've noticed a distinct trend toward
 disappearing vnodes on one of the machines I look after.

 This machine isn't doing a whole lot.  It runs a couple of small web
 sites, and once an hour it rsync's some files from one NFS mount to
 another, but the rsync doesn't stay running; it restarts every hour
 and runs for 10-15 minutes.

 I set it to log the number of vnodes every ten minutes and this is what I got:

 00:47:59 vfs.numvnodes: 39337
 00:57:59 vfs.numvnodes: 40568
 01:07:59 vfs.numvnodes: 44554
 01:17:59 vfs.numvnodes: 52141
 01:27:59 vfs.numvnodes: 55713
 01:37:59 vfs.numvnodes: 58643
 01:47:59 vfs.numvnodes: 60792
 01:57:59 vfs.numvnodes: 67130
 02:07:59 vfs.numvnodes: 76035
 02:17:59 vfs.numvnodes: 84349
 02:27:59 vfs.numvnodes: 92187
 02:37:59 vfs.numvnodes: 98114
 02:47:59 vfs.numvnodes: 116854
 02:57:59 vfs.numvnodes: 124842
 03:07:59 vfs.numvnodes: 164173
 03:17:59 vfs.numvnodes: 172257
 03:27:59 vfs.numvnodes: 178388
 03:37:59 vfs.numvnodes: 183066
 03:47:59 vfs.numvnodes: 190092
 03:57:59 vfs.numvnodes: 198322
 04:07:59 vfs.numvnodes: 204598
 04:17:59 vfs.numvnodes: 208311
 04:27:59 vfs.numvnodes: 214207
 04:37:59 vfs.numvnodes: 221028
 04:47:59 vfs.numvnodes: 227792
 04:57:59 vfs.numvnodes: 233214
 05:07:59 vfs.numvnodes: 240112
 05:17:59 vfs.numvnodes: 247572
 05:27:59 vfs.numvnodes: 256090
 05:37:59 vfs.numvnodes: 262720
 05:47:59 vfs.numvnodes: 269755
 05:57:59 vfs.numvnodes: 274986
 06:07:59 vfs.numvnodes: 279879
 06:17:59 vfs.numvnodes: 287039
 06:27:59 vfs.numvnodes: 291984
 06:37:59 vfs.numvnodes: 294267
 06:47:59 vfs.numvnodes: 296658
 06:57:59 vfs.numvnodes: 299086
 07:07:59 vfs.numvnodes: 301825
 07:17:59 vfs.numvnodes: 309060
 07:27:59 vfs.numvnodes: 312955
 07:37:59 vfs.numvnodes: 317400
 07:47:59 vfs.numvnodes: 320047

 At that point the machine crashed with:

 panic: kmem_malloc(16384): kmem_map too small: 334745600 total allocated

 If I don't tune kern.maxvnodes up to the point where it panics, then
 eventually it runs out of vnodes and all sorts of stuff gets stuck in
 vlruwk.

 The machine in question is running 7.2-RELEASE-p3, but I already
 upgraded it from 7.1 trying to get this to go away, so it's a problem
 that's been around for awhile.

 My guess is that they're leaking in the kernel somewhere because of
 the rsync, because there's just not much else going on, but unless I
 can figure out how many vnodes are being used on a per-process basis,
 I can't make any headway on proving or disproving that.  I do know
 that according to fstat, there are only 1000-1500 descriptors open at
 any given time, and kern.openfiles ranges 250-500.  So I'm just
 mystified what the other 30+ could be.

 Is there any way to figure out where all these vnodes are going?

It seems you can reproduce it easily.
Can you please provide KTR traces with KTR_VFS enabled?

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: valgrind on FreeBSD 7

2009-06-18 Thread Attilio Rao
2008/10/2 Yuri y...@rawbw.com:
 Xin LI wrote:

 Simon Barner wrote:

 Mike Silbersack wrote:

 On Mon, 17 Mar 2008, Heiko Wundram wrote:

 Here's a tarball of what's in perforce right now.  I tried it a little
 bit, and it seemed to work for me.  Make sure to install the kernel module!

 http://www.silby.com/valgrind_freebsd_3.tar.gz

 But don't send me questions about it - I'm not an expert on it, I'm just
 the guy who grabbed it from perforce and found that it seems to work. :)

 Could you please provide me with the details, so I can update my
 (horribly outdated :( valgrind ports?

 It was available from p4 at:

 //depot/projects/valgrind/...

 Note that this version does not work on architectures other than i386.

 Cheers,


 Any developments in Valgrind/Callgrind on FreeBSD?
 Any hope to get this version into ports and to merge FreeBSD support up into
 Valgrind project?

For our employer (Sandvine Incorporated), Ed Maste and I are working
on finishing the port Peter Wemm started a while ago.
You can find updated information on the project here:
http://wiki.freebsd.org/Valgrind

while you can get the source archive here:
http://people.freebsd.org/~attilio/Sandvine/valgrind/valgrind.tar.bz2

Please note that 8 and 7 now have all the required support to run
this modified version, so no further modification is required on these
branches.
Valgrind is able to run on 6 too if the patch system_valgrind.diff is
applied (I can't recall whether it targeted SVOS, as I think, or stock
STABLE_6; just try it, it may apply to both).

The port is really not clean, but that's not Peter's fault. Valgrind
would really need to be rewritten to be less Linux-centric and smarter
about how it pulls in OS interfaces. That effort, though, would be so
large that it is unlikely to happen in the short term.

What is in the tarball works mostly OK (modulo missing syscalls) on both
ia32 and amd64, but you should carefully follow what is written in the
Wiki for a good install of the product.
Also note that I have an internal version that should fix most (all?)
of the issues reported in the Wiki, and I will try to release it on a
regular basis for as long as possible.
So far, I'm fighting a segfault in the profiled process when running
in the SV environment {amd64 arch}, a symptom that so far doesn't show
up with a stock FreeBSD installation. If you could try the port and
report back to us, I would appreciate it a lot.

For any other question please refer to us.

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: vfs_cache panic, 7.2-prerelease (stable)

2009-05-01 Thread Attilio Rao
2009/5/1  xorquew...@googlemail.com:
 Filed under:

 http://www.freebsd.org/cgi/query-pr.cgi?pr=134142

 Would be incredibly grateful if somebody in the know could take
 a look at this.

But, what's the panic message?

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: [problem] aac0 does not respond

2009-03-24 Thread Attilio Rao
2009/3/24 Vladimir Ermakov samflan...@gmail.com:
 Hello, All

 Describe my problem:
 have volume RAID-10 (SAS-HDD x 6) on Adaptec RAID 5805
 2 HHD of 6  have errors in smart data (damaged)
 i am try read file /var/db/mysql/ibdata1 from this volume
 system does not respond ( lost access to ssh ) after read 6GB data from this
 file
 and print debug messages on ttyv0

 As to prevent the emergence of this problem?
 As monitor the status of RAID-controller?

 please, any solutions

Is this -STABLE or -CURRENT?
And if it is -CURRENT, what revision?

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: smbfs hang

2008-10-16 Thread Attilio Rao
2008/10/16, Chris Coleman [EMAIL PROTECTED]:
 I'm doing a large transfer from an SMB mounted drive, about 2TB of
  files.  After about 250G, it hanging.   Of course any process that
  tries to access that drive hangs as well.

  Is there anyway to get those processes killed off and remount the
  drive without rebooting?  I haven't had much luck so far.  They don't
  seem to want to die.

Hello Chris,
are you equipped with a debugging kernel?
Are you interested in helping to fix this bug?

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: Building custom kernel with new files

2008-08-16 Thread Attilio Rao
2008/8/16, Ryan French [EMAIL PROTECTED]:
 Hi All,

 I am currently trying to build a custom kernel for my Google Summer of Code,
 and am running into a bit of a problem. I have all of my code compiling, but
 when I get to the linking stage as soon as it comes to the new files I have
 installed it says the *.o does not exist. I have added the files
 to /conf/files, and well as added the files to /conf/options and /conf/NOTES,
 and the option is set in my Makefile for the kernel. Is there another step I
 need to do before I can build the kernel?

Ryan,
can you please paste the error message?
More specifically, are the missing .o files among your newly added
files, or elsewhere?
This can happen, for example, when the consumer of a subsystem wants to
use it but the subsystem is not compiled in (stack(9) without the STACK
option, for example).

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


Re: Building custom kernel with new files

2008-08-16 Thread Attilio Rao
2008/8/16, Ryan French [EMAIL PROTECTED]:
 On Sat, 16 Aug 2008 10:38:24 pm you wrote:
  2008/8/16, Ryan French [EMAIL PROTECTED]:
   Hi All,
  
   I am currently trying to build a custom kernel for my Google Summer of
   Code, and am running into a bit of a problem. I have all of my code
   compiling, but when I get to the linking stage as soon as it comes to the
   new files I have installed it says the *.o does not exist. I have added
   the files to /conf/files, and well as added the files to /conf/options
   and /conf/NOTES, and the option is set in my Makefile for the kernel. Is
   there another step I need to do before I can build the kernel?
 
  Ryan,
  can you please past the error message?
  More specifically, missing .o are about your newly included files or other?
  This can happen, for example, if the consumer of a subsystem wants to
  use it and the subsystem is not compiled (the stack(9) without STACK
  option, for example).
 
  Thanks,
  Attilio

 The missing .o is about my own files that I have included in the kernel. More
 specifically I have a file 'mpls.h' and as it is the first one in the
 directory as soon as it gets to linking the file I get the error

 ld: mpls.o: No such file: No such file or directory
 *** Error code 1

 This occurs right after the command

 MAKE=make sh /usr/src/sys/conf/newvers.sh MPLSKERNEL

 Sorry I can't copy the whole error, but I am running the development in a VM
 and I can't copy and paste between the VM and my host machine.

Can you please paste your diff against src/sys/conf/files*?
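For reference (purely as an illustration -- the path and option names
below are hypothetical, modelled on the MPLSKERNEL option mentioned
above), the entries for a new optional source file usually look
something like this:

```
# sys/conf/files -- compile the file only when the option is set:
netmpls/mpls.c		optional	mplskernel

# sys/conf/options -- declare the option and its opt_*.h header:
MPLSKERNEL		opt_mpls.h

# kernel configuration file:
options 	MPLSKERNEL
```

Note that sys/conf/files lists .c files; a header such as mpls.h is not
compiled on its own and never produces a .o.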

Thanks,
Attilio




Re: Do you really sleep when blocked on a mutex?

2008-04-21 Thread Attilio Rao
2008/4/21, Murty, Ravi [EMAIL PROTECTED]:
 Hello,



  When a thread cannot get a mutex (default mutex) and needs to be
  blocked, is it really put to sleep? From looking at the code it appears
  that it is inhibited (TD_SET_LOCK) but isn't really put to sleep.


From the scheduler's perspective, sleeping and blocking are two different
events, but at a low level there is not much difference.
What happens is that the thread is marked as inhibited, moved onto a
queue other than the runqueue, and switched out.
What really changes is the upper-level behaviour.
TD_SET_LOCK is used in conjunction with turnstiles, which perform
priority propagation on the 'lock owner', so the turnstile becomes the
recipient of the blocked thread (in other words, the turnstile is the
'queue other than the runqueue' where the thread lives).
TD_SET_SLEEP, on the other hand, is used in conjunction with sleepqueues,
which do not propagate priority to a lock owner (often there is no such
owner at all -- just think of wait channels). In this case the recipient
is the sleepqueue, and that is where the thread lives.

At a low level the sequence is similar whether a thread blocks on a
turnstile or sleeps on a sleepqueue:
- the thread is running, so it is no longer present on any runqueue
- the thread is marked as inhibited
- the relevant event is signalled (lock / sleep)
- the thread is handed to its new recipient (turnstile / sleepqueue)
- the thread is switched out
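Not the kernel code itself, but a rough userland analogy (in Python, with
invented names) of the two recipients: a contested lock has a well-known
owner while it is held, like a turnstile, whereas a condition wait is a
sleep on a wait channel with no owner, like a sleepqueue:

```python
import threading

lock = threading.Lock()
cv = threading.Condition(lock)   # "wait channel": sleepers queue here
ready = False
events = []

def waiter():
    # Sleeping side: wait on the channel; there is no single lock owner
    # to propagate priority to, much like a sleepqueue.
    with cv:
        while not ready:
            cv.wait()
        events.append("woken")

td = threading.Thread(target=waiter)
td.start()

# Blocking side: acquiring the lock briefly contends with the waiter,
# and the lock has a known owner while held, much like a turnstile.
with cv:
    ready = True
    cv.notify()                  # analogous to wakeup() on the channel

td.join()
events.append("done")
```

The predicate loop around cv.wait() mirrors the kernel idiom: a wakeup on
a channel is a hint, and the sleeper rechecks its condition on return.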

Thanks,
Attilio




Re: Do you really sleep when blocked on a mutex?

2008-04-21 Thread Attilio Rao
2008/4/21, Murty, Ravi [EMAIL PROTECTED]:
 Hello,



  When a thread cannot get a mutex (default mutex) and needs to be
  blocked, is it really put to sleep? From looking at the code it appears
  that it is inhibited (TD_SET_LOCK) but isn't really put to sleep.



  1.  Why isn't it put to sleep - why can't it be treated the same?
  2.  The eventual question I am trying to answer is the difference
  between setrunnable() and setrunqueue() - this one simply finds a slot
  in the ksegrp and a runq to add the KSE/td. But setrunnable() also
  checks to see if the process is in memory (PS_INMEM) before calling
  sched_wakeup which eventually calls setrunqueue()? Why doesn't
  setrunqueue have to worry about the possibility that the process may
  have been swapped out while it was waiting to become runnable?

I just forgot to answer this:
in the 6.x series (note that setrunqueue() is no longer available on 7.x
and above), setrunqueue() is just used as the backend for setrunnable().
Usually, setrunqueue() was used in places where the state of the thread
was already known, so no further checks were needed.
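As a toy sketch of that relationship (the names and fields below are
invented simplifications for illustration, not the 6.x sources):
setrunnable() handles the swapped-out case and then falls through to
setrunqueue(), which simply trusts its caller:

```python
runqueue = []

class Thread:
    def __init__(self, inmem):
        self.inmem = inmem        # stands in for the PS_INMEM flag
        self.swapinreq = False    # swap-in requested from the swapper

def setrunqueue(td):
    # Low-level: the caller has already validated the thread's state,
    # so just place it on the run queue.
    runqueue.append(td)

def setrunnable(td):
    # Higher-level: also copes with a process swapped out of memory.
    if not td.inmem:
        td.swapinreq = True       # ask the swapper to bring it in first
        return
    setrunqueue(td)
```

This is why setrunqueue() never worried about swapped-out processes: its
callers only used it when the in-memory state was already guaranteed.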

Thanks,
Attilio




Re: pfind() in ithread handler

2008-02-28 Thread Attilio Rao
2008/2/28, Yuri Pankov [EMAIL PROTECTED]:
 Hi,

  I'm trying to understand the cause of following panic.

  panic: Trying sleep, but thread marked as sleeping prohibited
  cpuid = 0
  KDB: stack backtrace:
  db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
  panic() at panic+0x17d
  sleepq_add() at sleepq_add+0x2e1
  _sx_slock_hard() at _sx_slock_hard+0x15d
  _sx_slock() at _sx_slock+0xc1
  pfind() at pfind+0x24
  saa_intr() at saa_intr+0x313
  ithread_loop() at ithread_loop+0xda
  fork_exit() at fork_exit+0x12a
  fork_trampoline() at fork_trampoline+0xe
  --- trap 0, rip = 0, rsp = 0xac3c0d30, rbp = 0 ---

  Can someone enlighten me on what is causing the panic and is it ok to
  use pfind() in interrupt handler (I have very limited understanding of
  kernel internals)?

  Code in question (taken from saa driver
  http://freebsd.ricin.com/ports/distfiles/kbtv-1.92.tbz) can be found at
  http://www.pastebin.ca/921830

ithreads cannot perform unbounded sleeps, while pfind() needs to acquire
allproc_lock, which is an sx lock and *does* perform an unbounded sleep,
so there is a mismatch.

By the way, it sounds like allproc_lock and other similar locks are sx
locks only for *legacy* reasons; they should probably be converted to
rwlocks.
This also causes small problems in the TTY layer, for example, where I
saw that an unbounded sleeping primitive has to be used because of
process lock acquisitions.

A nice janitor task would be to convert these sx locks into rwlocks and
see what happens.
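The usual way out of this class of panic is to defer the sleep-prone work
from the interrupt handler to a context that is allowed to sleep (in
FreeBSD, for example, via a taskqueue). A minimal userland sketch of the
pattern in Python, with invented names:

```python
import queue
import threading

work = queue.Queue()
results = []

def intr_handler(pid):
    # Interrupt-like context: must not take a sleeping lock such as an
    # sx lock, so only hand the pid off for later processing.
    work.put(pid)

def task_thread():
    # Thread context: sleeping is allowed here, so the pfind()-style
    # lookup (which may block on the process-table lock) is safe.
    pid = work.get()
    results.append(("found", pid))

td = threading.Thread(target=task_thread)
td.start()
intr_handler(42)
td.join()
```

The handler itself stays non-blocking; everything that can sleep runs in
the worker, which is exactly the split an ithread/taskqueue pair gives you.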

Thanks,
Attilio




Re: a strange/stupid question

2007-11-23 Thread Attilio Rao
2007/11/24, Aryeh Friedman [EMAIL PROTECTED]:
 On 11/24/07, Attilio Rao [EMAIL PROTECTED] wrote:
  2007/11/24, Aryeh Friedman [EMAIL PROTECTED]:
   Where do I find the main() [and/or other entry point] for the
   kernel I tend to understand stuff better if I follow the flow of
   exec from the start
 
  It is highly MD.
  For IA32 it is in i386/i386/locore.s::btext

 For AMD64 I assume something close to that...

 I just realized that I actually want to understand everything from POST
 on (actually from power-on, but I know that is very motherboard-dependent), so
 I guess the question is where do I find the first executed statement
 for BTX (I know how to disassemble the MBR, so that part is not an issue)

It should be:
boot/i386/boot0/boot0.S::start

Attilio




Re: a strange/stupid question

2007-11-23 Thread Attilio Rao
2007/11/24, Aryeh Friedman [EMAIL PROTECTED]:
 Where do I find the main() [and/or other entry point] for the
 kernel I tend to understand stuff better if I follow the flow of
 exec from the start

It is highly MD.
For IA32 it is in i386/i386/locore.s::btext

Attilio




Re: Inner workings of turnstiles and sleepqueues

2007-10-19 Thread Attilio Rao
2007/10/19, John Baldwin [EMAIL PROTECTED]:
 On Friday 19 October 2007 12:56:54 am Ed Schouten wrote:
  * John Baldwin [EMAIL PROTECTED] wrote:
   The best option right now is to read the code.  There are some comments in
   both the headers and implementation.
 
  Would it be useful to write manpages for these interfaces, or do we
  assume that only godlike people can use them anyway? I am willing to
  write manpages for them.

 They already exist, but they really are only used to implement higher-level
 primitives like locks and condition variables.  The rest of the kernel should
 use the higher-level primitives anyway.

Well, turnstiles really don't have manpages, but this is still OK, as
they are only used by mutexes, while the real sleeping primitive is the
sleepqueue.

Attilio



