Re: ARM pcpu changes, was Re: [patch] i386 pmap sysmaps_pcpu[] atomic access

2013-03-01 Thread Svatopluk Kraus
On Tue, Feb 26, 2013 at 1:39 PM, Konstantin Belousov
kostik...@gmail.com wrote:
 On Tue, Feb 26, 2013 at 09:06:51PM +1300, Andrew Turner wrote:
 Does anyone know if it is only curthread that needs to be atomic? If so
 this should work. Reading the cpuid from the system currently is a
 single instruction, however it appears the code will need to be reworked
 for use with multiple levels of affinity, for example when there
 are clusters of cores the current code will number two cores on
 different clusters with the same id.

 I don't think we need to use ldrex/strex to read/write the values, my
 understanding is the rest should be safe to access without atomic
 functions.

 I read about ARM architecture some time ago. What I am saying below is not
 a proposal, but rather a way for me to learn more about the architecture.

 From my reading of the docs, ARM has the shadow registers, in particular,
 the r13 (stack pointer) is shadowed for all privileged modes. Is the shadow
 used by our kernel, is it correctly restored on the context switch ?

 If yes, you can easily recover the pcb address from the current sp,
 without accessing any coprocessor registers, and even without any
 assignment of the global register for curthread (which needs to be
 done at the kernel entry). Just align the stack
 start on the 2*PAGE_SIZE boundary (like it is already done for MIPS),
 and do the mask operation on the sp value to get the end of pcb.
 This is atomic and context-switch safe.

 pcb us already per-thread, and can store the thread pointer.
 More, you can store the curcpu or cpuid pointer into pcb too,
 and update it on the context switch.

 amd64 has an architecturally-defined special register (k)gsbase, which
 effectively must be correctly initialized at each kernel entry, and
 restored on the return to usermode. Since the initialization on entry
 and restoration on exit is not atomic wrt the entry/exit itself, amd64
 periodically gets a bugs which cause kernel running with user gsbase.
 This is the fatal bug, destroying the kernel structures and allowing the
 CPU privilege mode switch for normal user.

 Using the shadow registers for this purpose eliminate the whole source
 of the bugs.

Well, IMHO, the both kernel and user thread stacks must correctly be
set for current threads, otherwise it doesn't work. So, the idea to
save things on a thread stack and update them on a context switch will
work in general. However, the stack must be aligned to its size. And
this is the price. I have no idea how this kernel stack alignment can
impact kernel virtual space fragmentation.
OTOH, as you say, this kernel stack storage can be used for many things.

Svata
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: ARM pcpu changes, was Re: [patch] i386 pmap sysmaps_pcpu[] atomic access

2013-03-01 Thread Olivier Houchard
On Fri, Mar 01, 2013 at 02:20:16PM +0100, Svatopluk Kraus wrote:
 On Tue, Feb 26, 2013 at 1:39 PM, Konstantin Belousov
 kostik...@gmail.com wrote:
  On Tue, Feb 26, 2013 at 09:06:51PM +1300, Andrew Turner wrote:
  Does anyone know if it is only curthread that needs to be atomic? If so
  this should work. Reading the cpuid from the system currently is a
  single instruction, however it appears the code will need to be reworked
  for use with multiple levels of affinity, for example when there
  are clusters of cores the current code will number two cores on
  different clusters with the same id.
 
  I don't think we need to use ldrex/strex to read/write the values, my
  understanding is the rest should be safe to access without atomic
  functions.
 
  I read about ARM architecture some time ago. What I am saying below is not
  a proposal, but rather a way for me to learn more about the architecture.
 
  From my reading of the docs, ARM has the shadow registers, in particular,
  the r13 (stack pointer) is shadowed for all privileged modes. Is the shadow
  used by our kernel, is it correctly restored on the context switch ?
 
  If yes, you can easily recover the pcb address from the current sp,
  without accessing any coprocessor registers, and even without any
  assignment of the global register for curthread (which needs to be
  done at the kernel entry). Just align the stack
  start on the 2*PAGE_SIZE boundary (like it is already done for MIPS),
  and do the mask operation on the sp value to get the end of pcb.
  This is atomic and context-switch safe.
 
  pcb us already per-thread, and can store the thread pointer.
  More, you can store the curcpu or cpuid pointer into pcb too,
  and update it on the context switch.
 
  amd64 has an architecturally-defined special register (k)gsbase, which
  effectively must be correctly initialized at each kernel entry, and
  restored on the return to usermode. Since the initialization on entry
  and restoration on exit is not atomic wrt the entry/exit itself, amd64
  periodically gets a bugs which cause kernel running with user gsbase.
  This is the fatal bug, destroying the kernel structures and allowing the
  CPU privilege mode switch for normal user.
 
  Using the shadow registers for this purpose eliminate the whole source
  of the bugs.
 
 Well, IMHO, the both kernel and user thread stacks must correctly be
 set for current threads, otherwise it doesn't work. So, the idea to
 save things on a thread stack and update them on a context switch will
 work in general. However, the stack must be aligned to its size. And
 this is the price. I have no idea how this kernel stack alignment can
 impact kernel virtual space fragmentation.
 OTOH, as you say, this kernel stack storage can be used for many things.
 

Hi,

FWIW, I have a proof-of-ooncept patch that does just that, and it seems to 
work well.

Regards,

Olivier 
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


ARM pcpu changes, was Re: [patch] i386 pmap sysmaps_pcpu[] atomic access

2013-02-26 Thread Andrew Turner
On Mon, 25 Feb 2013 13:47:47 +0100
Olivier Houchard cog...@ci0.org wrote:

 On Mon, Feb 25, 2013 at 07:09:20PM +1300, Andrew Turner wrote:
  On Thu, 21 Feb 2013 10:43:49 -0500
  John Baldwin j...@freebsd.org wrote:
  
   On Thursday, February 21, 2013 7:53:52 am Svatopluk Kraus wrote:
On Wed, Feb 20, 2013 at 4:22 PM, John Baldwin j...@freebsd.org
wrote:
 On Wednesday, February 20, 2013 7:31:08 am Svatopluk Kraus
 wrote:
 On Tue, Feb 19, 2013 at 7:51 PM, Konstantin Belousov
 kostik...@gmail.com wrote:
  On Mon, Feb 18, 2013 at 11:18:16PM +0100, Svatopluk Kraus
  wrote:
  On Mon, Feb 18, 2013 at 9:36 PM, Konstantin Belousov
  kostik...@gmail.com wrote:
  Well, I'm taking a part on porting FreeBSD to
  ARM11mpcore. UP case was simple. SMP case is more complex
  and rather new for me. Recently, I was solving a problem
  with PCPU stuff. For example, PCPU_GET is implemented by
  one instruction on i386 arch. So, a need of atomicity
  with respect to interrupts can be overlooked. On
  load-store archs, the implementation which works in SMP
  case is not so simple. And what works in UP case (single
  PCPU), not works in SMP case. Believe me, mysterious and
  sporadic 'mutex not owned' assertions and others ones
  caused by curthreads mess, it takes a while ...
  Note that PCPU_GET() is not needed to be atomic. The
  reason is that the code which uses its result would not be
  atomic (single-instruction or whatever, see below). Thus,
  either the preemption should not matter since the action
  with the per-cpu data is advisory, as is in the case of
  i386 pmap you noted. or some external measures should be
  applied in advance to the containing region (which you
  proposed, by bracing with sched_pin()).

 So, it's advisory in the case of i386 pmap... Well, if you
 can live with that, I can too.

 
  Also, note that it is not interrupts but preemption which
  is concern.

 Yes and no. In theory, yes, a preemption is a concern. In
 FreeBSD, however, sched_pin() and critical_enter() and their
 counterparts are implemented with help of curthread. And
 curthread definition falls to PCPU_GET(curthread) if not
 defined in other way. So, curthread should be atomic with
 respect to interrupts and in general, PCPU_GET() should be
 too. Note that spinlock_enter() definitions often (always)
 use curthread too. Anyhow, it's defined in MD code and can
 be defined for each arch separately.

 curthread is a bit magic. :)  If you perform a context switch
 during an interrupt (which will change 'curthread') you also
 change your register state. When you resume, the register
 state is also restored.  This means that while something like
 'PCPU_GET(cpuid)' might be stale after you read it,
 'curthread' never is.  However, it is true that actually
 reading curthread has to be atomic.  If your read of
 curthread looks like:

 mov pcpu reg, r0
 add offset of pc_curthread, r0
 ld r0, r1

 Then that will indeed break.  Alpha used a fixed register for
 'pcpu_reg' (as does ia64 IIRC).  OTOH, you might also be able
 to depend on the fact that pc_curthread is the first thing in
 'struct pcpu' (and always will be, you could add a CTASSERT to
 future-proof).  In that case, you can remove the 'add'
 instruction and instead just do:

 ld pcpu reg, r1

 which is fine.

Just for the record. There are three extra (coprocessor)
registers per a core in arm11mpcore (armv6k). Unfortunately
only one is Privileged. The other ones are respectively User
Read only and User Read Write. For now, we are using the
Privileged one to store pcpu pointer (curthread is correctly
set during each context switch). Thus, using a coprocessor
register means that reading of curthread is not a single
instruction implementation now. After we figured out the
curthread issue in SMP case, using a fixed (processor) register
for pcpu is an option. Meanwhile, we disable interrupts before
reading of curthread and enable them after. The same is done
for other PCPU stuff. For now we have not stable system enough
for profiling, however, when it will be, it would be
interesting to learn how various implementations of curthread
reading impact system performance.
   
   curthread is read _a lot_, so I would expect this to hurt.  What
   we did on Alpha was to use a fixed register for pcpu access, but
   we used the equivalent of a coprocessor register to also store
   the value so we could set that fixed register on entry to the
   kernel (userland was free to use the register for its own
   purposes).
   
  
  The current code on ARM is not atomic, it loads the base address of
  the pcpu data from the coprocessor then 

Re: ARM pcpu changes, was Re: [patch] i386 pmap sysmaps_pcpu[] atomic access

2013-02-26 Thread Konstantin Belousov
On Tue, Feb 26, 2013 at 09:06:51PM +1300, Andrew Turner wrote:
 Does anyone know if it is only curthread that needs to be atomic? If so
 this should work. Reading the cpuid from the system currently is a
 single instruction, however it appears the code will need to be reworked
 for use with multiple levels of affinity, for example when there
 are clusters of cores the current code will number two cores on
 different clusters with the same id.
 
 I don't think we need to use ldrex/strex to read/write the values, my
 understanding is the rest should be safe to access without atomic
 functions.

I read about ARM architecture some time ago. What I am saying below is not
a proposal, but rather a way for me to learn more about the architecture.

From my reading of the docs, ARM has the shadow registers, in particular,
the r13 (stack pointer) is shadowed for all privileged modes. Is the shadow
used by our kernel, is it correctly restored on the context switch ?

If yes, you can easily recover the pcb address from the current sp,
without accessing any coprocessor registers, and even without any
assignment of the global register for curthread (which needs to be
done at the kernel entry). Just align the stack
start on the 2*PAGE_SIZE boundary (like it is already done for MIPS),
and do the mask operation on the sp value to get the end of pcb.
This is atomic and context-switch safe.

pcb us already per-thread, and can store the thread pointer.
More, you can store the curcpu or cpuid pointer into pcb too,
and update it on the context switch.

amd64 has an architecturally-defined special register (k)gsbase, which
effectively must be correctly initialized at each kernel entry, and
restored on the return to usermode. Since the initialization on entry
and restoration on exit is not atomic wrt the entry/exit itself, amd64
periodically gets a bugs which cause kernel running with user gsbase.
This is the fatal bug, destroying the kernel structures and allowing the
CPU privilege mode switch for normal user.

Using the shadow registers for this purpose eliminate the whole source
of the bugs.


pgpmYPFAUYHxg.pgp
Description: PGP signature


Re: [patch] i386 pmap sysmaps_pcpu[] atomic access

2013-02-26 Thread John Baldwin
On Monday, February 25, 2013 7:47:47 am Olivier Houchard wrote:
 On Mon, Feb 25, 2013 at 07:09:20PM +1300, Andrew Turner wrote:
  On Thu, 21 Feb 2013 10:43:49 -0500
  John Baldwin j...@freebsd.org wrote:
  
   On Thursday, February 21, 2013 7:53:52 am Svatopluk Kraus wrote:
On Wed, Feb 20, 2013 at 4:22 PM, John Baldwin j...@freebsd.org
wrote:
 On Wednesday, February 20, 2013 7:31:08 am Svatopluk Kraus wrote:
 On Tue, Feb 19, 2013 at 7:51 PM, Konstantin Belousov
 kostik...@gmail.com wrote:
  On Mon, Feb 18, 2013 at 11:18:16PM +0100, Svatopluk Kraus
  wrote:
  On Mon, Feb 18, 2013 at 9:36 PM, Konstantin Belousov
  kostik...@gmail.com wrote:
  Well, I'm taking a part on porting FreeBSD to ARM11mpcore. UP
  case was simple. SMP case is more complex and rather new for
  me. Recently, I was solving a problem with PCPU stuff. For
  example, PCPU_GET is implemented by one instruction on i386
  arch. So, a need of atomicity with respect to interrupts can
  be overlooked. On load-store archs, the implementation which
  works in SMP case is not so simple. And what works in UP case
  (single PCPU), not works in SMP case. Believe me, mysterious
  and sporadic 'mutex not owned' assertions and others ones
  caused by curthreads mess, it takes a while ...
  Note that PCPU_GET() is not needed to be atomic. The reason is
  that the code which uses its result would not be atomic
  (single-instruction or whatever, see below). Thus, either the
  preemption should not matter since the action with the per-cpu
  data is advisory, as is in the case of i386 pmap you noted. or
  some external measures should be applied in advance to the
  containing region (which you proposed, by bracing with
  sched_pin()).

 So, it's advisory in the case of i386 pmap... Well, if you can
 live with that, I can too.

 
  Also, note that it is not interrupts but preemption which is
  concern.

 Yes and no. In theory, yes, a preemption is a concern. In
 FreeBSD, however, sched_pin() and critical_enter() and their
 counterparts are implemented with help of curthread. And
 curthread definition falls to PCPU_GET(curthread) if not defined
 in other way. So, curthread should be atomic with respect to
 interrupts and in general, PCPU_GET() should be too. Note that
 spinlock_enter() definitions often (always) use curthread too.
 Anyhow, it's defined in MD code and can be defined for each arch
 separately.

 curthread is a bit magic. :)  If you perform a context switch
 during an interrupt (which will change 'curthread') you also
 change your register state. When you resume, the register state
 is also restored.  This means that while something like
 'PCPU_GET(cpuid)' might be stale after you read it, 'curthread'
 never is.  However, it is true that actually reading curthread
 has to be atomic.  If your read of curthread looks like:

 mov pcpu reg, r0
 add offset of pc_curthread, r0
 ld r0, r1

 Then that will indeed break.  Alpha used a fixed register for
 'pcpu_reg' (as does ia64 IIRC).  OTOH, you might also be able to
 depend on the fact that pc_curthread is the first thing in
 'struct pcpu' (and always will be, you could add a CTASSERT to
 future-proof).  In that case, you can remove the 'add'
 instruction and instead just do:

 ld pcpu reg, r1

 which is fine.

Just for the record. There are three extra (coprocessor) registers
per a core in arm11mpcore (armv6k). Unfortunately only one is
Privileged. The other ones are respectively User Read only and User
Read Write. For now, we are using the Privileged one to store pcpu
pointer (curthread is correctly set during each context switch).
Thus, using a coprocessor register means that reading of curthread
is not a single instruction implementation now. After we figured
out the curthread issue in SMP case, using a fixed (processor)
register for pcpu is an option. Meanwhile, we disable interrupts
before reading of curthread and enable them after. The same is done
for other PCPU stuff. For now we have not stable system enough for
profiling, however, when it will be, it would be interesting to
learn how various implementations of curthread reading impact
system performance.
   
   curthread is read _a lot_, so I would expect this to hurt.  What we
   did on Alpha was to use a fixed register for pcpu access, but we used
   the equivalent of a coprocessor register to also store the value so
   we could set that fixed register on entry to the kernel (userland was
   free to use the register for its own purposes).
   
  
  The current code on ARM is not atomic, it loads the base address of the
  pcpu data from the coprocessor then loads curthread from this 

Re: [patch] i386 pmap sysmaps_pcpu[] atomic access

2013-02-26 Thread Olivier Houchard
On Mon, Feb 25, 2013 at 04:26:14PM -0500, John Baldwin wrote:
 On Monday, February 25, 2013 7:47:47 am Olivier Houchard wrote:
  On Mon, Feb 25, 2013 at 07:09:20PM +1300, Andrew Turner wrote:
   On Thu, 21 Feb 2013 10:43:49 -0500
   John Baldwin j...@freebsd.org wrote:
   
On Thursday, February 21, 2013 7:53:52 am Svatopluk Kraus wrote:
 On Wed, Feb 20, 2013 at 4:22 PM, John Baldwin j...@freebsd.org
 wrote:
  On Wednesday, February 20, 2013 7:31:08 am Svatopluk Kraus wrote:
  On Tue, Feb 19, 2013 at 7:51 PM, Konstantin Belousov
  kostik...@gmail.com wrote:
   On Mon, Feb 18, 2013 at 11:18:16PM +0100, Svatopluk Kraus
   wrote:
   On Mon, Feb 18, 2013 at 9:36 PM, Konstantin Belousov
   kostik...@gmail.com wrote:
   Well, I'm taking a part on porting FreeBSD to ARM11mpcore. UP
   case was simple. SMP case is more complex and rather new for
   me. Recently, I was solving a problem with PCPU stuff. For
   example, PCPU_GET is implemented by one instruction on i386
   arch. So, a need of atomicity with respect to interrupts can
   be overlooked. On load-store archs, the implementation which
   works in SMP case is not so simple. And what works in UP case
   (single PCPU), not works in SMP case. Believe me, mysterious
   and sporadic 'mutex not owned' assertions and others ones
   caused by curthreads mess, it takes a while ...
   Note that PCPU_GET() is not needed to be atomic. The reason is
   that the code which uses its result would not be atomic
   (single-instruction or whatever, see below). Thus, either the
   preemption should not matter since the action with the per-cpu
   data is advisory, as is in the case of i386 pmap you noted. or
   some external measures should be applied in advance to the
   containing region (which you proposed, by bracing with
   sched_pin()).
 
  So, it's advisory in the case of i386 pmap... Well, if you can
  live with that, I can too.
 
  
   Also, note that it is not interrupts but preemption which is
   concern.
 
  Yes and no. In theory, yes, a preemption is a concern. In
  FreeBSD, however, sched_pin() and critical_enter() and their
  counterparts are implemented with help of curthread. And
  curthread definition falls to PCPU_GET(curthread) if not defined
  in other way. So, curthread should be atomic with respect to
  interrupts and in general, PCPU_GET() should be too. Note that
  spinlock_enter() definitions often (always) use curthread too.
  Anyhow, it's defined in MD code and can be defined for each arch
  separately.
 
  curthread is a bit magic. :)  If you perform a context switch
  during an interrupt (which will change 'curthread') you also
  change your register state. When you resume, the register state
  is also restored.  This means that while something like
  'PCPU_GET(cpuid)' might be stale after you read it, 'curthread'
  never is.  However, it is true that actually reading curthread
  has to be atomic.  If your read of curthread looks like:
 
  mov pcpu reg, r0
  add offset of pc_curthread, r0
  ld r0, r1
 
  Then that will indeed break.  Alpha used a fixed register for
  'pcpu_reg' (as does ia64 IIRC).  OTOH, you might also be able to
  depend on the fact that pc_curthread is the first thing in
  'struct pcpu' (and always will be, you could add a CTASSERT to
  future-proof).  In that case, you can remove the 'add'
  instruction and instead just do:
 
  ld pcpu reg, r1
 
  which is fine.
 
 Just for the record. There are three extra (coprocessor) registers
 per a core in arm11mpcore (armv6k). Unfortunately only one is
 Privileged. The other ones are respectively User Read only and User
 Read Write. For now, we are using the Privileged one to store pcpu
 pointer (curthread is correctly set during each context switch).
 Thus, using a coprocessor register means that reading of curthread
 is not a single instruction implementation now. After we figured
 out the curthread issue in SMP case, using a fixed (processor)
 register for pcpu is an option. Meanwhile, we disable interrupts
 before reading of curthread and enable them after. The same is done
 for other PCPU stuff. For now we have not stable system enough for
 profiling, however, when it will be, it would be interesting to
 learn how various implementations of curthread reading impact
 system performance.

curthread is read _a lot_, so I would expect this to hurt.  What we
did on Alpha was to use a fixed register for pcpu access, but we used
the equivalent of a coprocessor register to also store the value so
we could set that fixed register on entry to the kernel (userland was
free to use the register for its own 

Re: [patch] i386 pmap sysmaps_pcpu[] atomic access

2013-02-25 Thread Olivier Houchard
On Mon, Feb 25, 2013 at 07:09:20PM +1300, Andrew Turner wrote:
 On Thu, 21 Feb 2013 10:43:49 -0500
 John Baldwin j...@freebsd.org wrote:
 
  On Thursday, February 21, 2013 7:53:52 am Svatopluk Kraus wrote:
   On Wed, Feb 20, 2013 at 4:22 PM, John Baldwin j...@freebsd.org
   wrote:
On Wednesday, February 20, 2013 7:31:08 am Svatopluk Kraus wrote:
On Tue, Feb 19, 2013 at 7:51 PM, Konstantin Belousov
kostik...@gmail.com wrote:
 On Mon, Feb 18, 2013 at 11:18:16PM +0100, Svatopluk Kraus
 wrote:
 On Mon, Feb 18, 2013 at 9:36 PM, Konstantin Belousov
 kostik...@gmail.com wrote:
 Well, I'm taking a part on porting FreeBSD to ARM11mpcore. UP
 case was simple. SMP case is more complex and rather new for
 me. Recently, I was solving a problem with PCPU stuff. For
 example, PCPU_GET is implemented by one instruction on i386
 arch. So, a need of atomicity with respect to interrupts can
 be overlooked. On load-store archs, the implementation which
 works in SMP case is not so simple. And what works in UP case
 (single PCPU), not works in SMP case. Believe me, mysterious
 and sporadic 'mutex not owned' assertions and others ones
 caused by curthreads mess, it takes a while ...
 Note that PCPU_GET() is not needed to be atomic. The reason is
 that the code which uses its result would not be atomic
 (single-instruction or whatever, see below). Thus, either the
 preemption should not matter since the action with the per-cpu
 data is advisory, as is in the case of i386 pmap you noted. or
 some external measures should be applied in advance to the
 containing region (which you proposed, by bracing with
 sched_pin()).
   
So, it's advisory in the case of i386 pmap... Well, if you can
live with that, I can too.
   

 Also, note that it is not interrupts but preemption which is
 concern.
   
Yes and no. In theory, yes, a preemption is a concern. In
FreeBSD, however, sched_pin() and critical_enter() and their
counterparts are implemented with help of curthread. And
curthread definition falls to PCPU_GET(curthread) if not defined
in other way. So, curthread should be atomic with respect to
interrupts and in general, PCPU_GET() should be too. Note that
spinlock_enter() definitions often (always) use curthread too.
Anyhow, it's defined in MD code and can be defined for each arch
separately.
   
curthread is a bit magic. :)  If you perform a context switch
during an interrupt (which will change 'curthread') you also
change your register state. When you resume, the register state
is also restored.  This means that while something like
'PCPU_GET(cpuid)' might be stale after you read it, 'curthread'
never is.  However, it is true that actually reading curthread
has to be atomic.  If your read of curthread looks like:
   
mov pcpu reg, r0
add offset of pc_curthread, r0
ld r0, r1
   
Then that will indeed break.  Alpha used a fixed register for
'pcpu_reg' (as does ia64 IIRC).  OTOH, you might also be able to
depend on the fact that pc_curthread is the first thing in
'struct pcpu' (and always will be, you could add a CTASSERT to
future-proof).  In that case, you can remove the 'add'
instruction and instead just do:
   
ld pcpu reg, r1
   
which is fine.
   
   Just for the record. There are three extra (coprocessor) registers
   per a core in arm11mpcore (armv6k). Unfortunately only one is
   Privileged. The other ones are respectively User Read only and User
   Read Write. For now, we are using the Privileged one to store pcpu
   pointer (curthread is correctly set during each context switch).
   Thus, using a coprocessor register means that reading of curthread
   is not a single instruction implementation now. After we figured
   out the curthread issue in SMP case, using a fixed (processor)
   register for pcpu is an option. Meanwhile, we disable interrupts
   before reading of curthread and enable them after. The same is done
   for other PCPU stuff. For now we have not stable system enough for
   profiling, however, when it will be, it would be interesting to
   learn how various implementations of curthread reading impact
   system performance.
  
  curthread is read _a lot_, so I would expect this to hurt.  What we
  did on Alpha was to use a fixed register for pcpu access, but we used
  the equivalent of a coprocessor register to also store the value so
  we could set that fixed register on entry to the kernel (userland was
  free to use the register for its own purposes).
  
 
 The current code on ARM is not atomic, it loads the base address of the
 pcpu data from the coprocessor then loads curthread from this address.
 One solution I discussed with Olivier Houchard is to keep the data in
 the coprocessor but to then load it into a dedicated register when we
 enter the 

Re: [patch] i386 pmap sysmaps_pcpu[] atomic access

2013-02-24 Thread Andrew Turner
On Thu, 21 Feb 2013 10:43:49 -0500
John Baldwin j...@freebsd.org wrote:

 On Thursday, February 21, 2013 7:53:52 am Svatopluk Kraus wrote:
  On Wed, Feb 20, 2013 at 4:22 PM, John Baldwin j...@freebsd.org
  wrote:
   On Wednesday, February 20, 2013 7:31:08 am Svatopluk Kraus wrote:
   On Tue, Feb 19, 2013 at 7:51 PM, Konstantin Belousov
   kostik...@gmail.com wrote:
On Mon, Feb 18, 2013 at 11:18:16PM +0100, Svatopluk Kraus
wrote:
On Mon, Feb 18, 2013 at 9:36 PM, Konstantin Belousov
kostik...@gmail.com wrote:
Well, I'm taking a part on porting FreeBSD to ARM11mpcore. UP
case was simple. SMP case is more complex and rather new for
me. Recently, I was solving a problem with PCPU stuff. For
example, PCPU_GET is implemented by one instruction on i386
arch. So, a need of atomicity with respect to interrupts can
be overlooked. On load-store archs, the implementation which
works in SMP case is not so simple. And what works in UP case
(single PCPU), not works in SMP case. Believe me, mysterious
and sporadic 'mutex not owned' assertions and others ones
caused by curthreads mess, it takes a while ...
Note that PCPU_GET() is not needed to be atomic. The reason is
that the code which uses its result would not be atomic
(single-instruction or whatever, see below). Thus, either the
preemption should not matter since the action with the per-cpu
data is advisory, as is in the case of i386 pmap you noted. or
some external measures should be applied in advance to the
containing region (which you proposed, by bracing with
sched_pin()).
  
   So, it's advisory in the case of i386 pmap... Well, if you can
   live with that, I can too.
  
   
Also, note that it is not interrupts but preemption which is
concern.
  
   Yes and no. In theory, yes, a preemption is a concern. In
   FreeBSD, however, sched_pin() and critical_enter() and their
   counterparts are implemented with help of curthread. And
   curthread definition falls to PCPU_GET(curthread) if not defined
   in other way. So, curthread should be atomic with respect to
   interrupts and in general, PCPU_GET() should be too. Note that
   spinlock_enter() definitions often (always) use curthread too.
   Anyhow, it's defined in MD code and can be defined for each arch
   separately.
  
   curthread is a bit magic. :)  If you perform a context switch
   during an interrupt (which will change 'curthread') you also
   change your register state. When you resume, the register state
   is also restored.  This means that while something like
   'PCPU_GET(cpuid)' might be stale after you read it, 'curthread'
   never is.  However, it is true that actually reading curthread
   has to be atomic.  If your read of curthread looks like:
  
   mov pcpu reg, r0
   add offset of pc_curthread, r0
   ld r0, r1
  
   Then that will indeed break.  Alpha used a fixed register for
   'pcpu_reg' (as does ia64 IIRC).  OTOH, you might also be able to
   depend on the fact that pc_curthread is the first thing in
   'struct pcpu' (and always will be, you could add a CTASSERT to
   future-proof).  In that case, you can remove the 'add'
   instruction and instead just do:
  
   ld pcpu reg, r1
  
   which is fine.
  
  Just for the record. There are three extra (coprocessor) registers
  per a core in arm11mpcore (armv6k). Unfortunately only one is
  Privileged. The other ones are respectively User Read only and User
  Read Write. For now, we are using the Privileged one to store pcpu
  pointer (curthread is correctly set during each context switch).
  Thus, using a coprocessor register means that reading of curthread
  is not a single instruction implementation now. After we figured
  out the curthread issue in SMP case, using a fixed (processor)
  register for pcpu is an option. Meanwhile, we disable interrupts
  before reading of curthread and enable them after. The same is done
  for other PCPU stuff. For now we have not stable system enough for
  profiling, however, when it will be, it would be interesting to
  learn how various implementations of curthread reading impact
  system performance.
 
 curthread is read _a lot_, so I would expect this to hurt.  What we
 did on Alpha was to use a fixed register for pcpu access, but we used
 the equivalent of a coprocessor register to also store the value so
 we could set that fixed register on entry to the kernel (userland was
 free to use the register for its own purposes).
 

The current code on ARM is not atomic, it loads the base address of the
pcpu data from the coprocessor then loads curthread from this address.
One solution I discussed with Olivier Houchard is to keep the data in
the coprocessor but to then load it into a dedicated register when we
enter the kernel.

Reading curthread would then be a single load instruction thanks to
ARM's ldr using an immediate offset [1], e.g. to load from r12 into r1
it would be 'ldr r1, 

Re: [patch] i386 pmap sysmaps_pcpu[] atomic access

2013-02-21 Thread Svatopluk Kraus
On Wed, Feb 20, 2013 at 4:22 PM, John Baldwin j...@freebsd.org wrote:
 On Wednesday, February 20, 2013 7:31:08 am Svatopluk Kraus wrote:
 On Tue, Feb 19, 2013 at 7:51 PM, Konstantin Belousov
 kostik...@gmail.com wrote:
  On Mon, Feb 18, 2013 at 11:18:16PM +0100, Svatopluk Kraus wrote:
  On Mon, Feb 18, 2013 at 9:36 PM, Konstantin Belousov
  kostik...@gmail.com wrote:
  Well, I'm taking a part on porting FreeBSD to ARM11mpcore. UP case was
  simple. SMP case is more complex and rather new for me. Recently, I
  was solving a problem with PCPU stuff. For example, PCPU_GET is
  implemented by one instruction on i386 arch. So, a need of atomicity
  with respect to interrupts can be overlooked. On load-store archs, the
  implementation which works in SMP case is not so simple. And what
  works in UP case (single PCPU), not works in SMP case. Believe me,
  mysterious and sporadic 'mutex not owned' assertions and others ones
  caused by curthreads mess, it takes a while ...
  Note that PCPU_GET() is not needed to be atomic. The reason is that the 
  code
  which uses its result would not be atomic (single-instruction or whatever, 
  see
  below). Thus, either the preemption should not matter since the action with
  the per-cpu data is advisory, as is in the case of i386 pmap you noted.
  or some external measures should be applied in advance to the containing
  region (which you proposed, by bracing with sched_pin()).

 So, it's advisory in the case of i386 pmap... Well, if you can live
 with that, I can too.

 
  Also, note that it is not interrupts but preemption which is concern.

 Yes and no. In theory, yes, a preemption is a concern. In FreeBSD,
 however, sched_pin() and critical_enter() and their counterparts are
 implemented with help of curthread. And curthread definition falls to
 PCPU_GET(curthread) if not defined in other way. So, curthread should
 be atomic with respect to interrupts and in general, PCPU_GET() should
 be too. Note that spinlock_enter() definitions often (always) use
 curthread too. Anyhow, it's defined in MD code and can be defined for
 each arch separately.

 curthread is a bit magic. :)  If you perform a context switch during an
 interrupt (which will change 'curthread') you also change your register state.
 When you resume, the register state is also restored.  This means that while
 something like 'PCPU_GET(cpuid)' might be stale after you read it, 'curthread'
 never is.  However, it is true that actually reading curthread has to be
 atomic.  If your read of curthread looks like:

 mov pcpu reg, r0
 add offset of pc_curthread, r0
 ld r0, r1

 Then that will indeed break.  Alpha used a fixed register for 'pcpu_reg'
 (as does ia64 IIRC).  OTOH, you might also be able to depend on the fact that
 pc_curthread is the first thing in 'struct pcpu' (and always will be, you 
 could
 add a CTASSERT to future-proof).  In that case, you can remove the 'add'
 instruction and instead just do:

 ld pcpu reg, r1

 which is fine.

Just for the record. There are three extra (coprocessor) registers per
a core in arm11mpcore (armv6k). Unfortunately only one is Privileged.
The other ones are respectively User Read only and User Read Write.
For now, we are using the Privileged one to store pcpu pointer
(curthread is correctly set during each context switch). Thus, using a
coprocessor register means that reading of curthread is not a single
instruction implementation now. After we figured out the curthread
issue in SMP case, using a fixed (processor) register for pcpu is an
option. Meanwhile, we disable interrupts before reading of curthread
and enable them after. The same is done for other PCPU stuff. For now
we have not stable system enough for profiling, however, when it will
be, it would be interesting to learn how various implementations of
curthread reading impact system performance.

Svata
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: [patch] i386 pmap sysmaps_pcpu[] atomic access

2013-02-21 Thread John Baldwin
On Thursday, February 21, 2013 7:53:52 am Svatopluk Kraus wrote:
 On Wed, Feb 20, 2013 at 4:22 PM, John Baldwin j...@freebsd.org wrote:
  On Wednesday, February 20, 2013 7:31:08 am Svatopluk Kraus wrote:
  On Tue, Feb 19, 2013 at 7:51 PM, Konstantin Belousov
  kostik...@gmail.com wrote:
   On Mon, Feb 18, 2013 at 11:18:16PM +0100, Svatopluk Kraus wrote:
   On Mon, Feb 18, 2013 at 9:36 PM, Konstantin Belousov
   kostik...@gmail.com wrote:
   Well, I'm taking a part on porting FreeBSD to ARM11mpcore. UP case was
   simple. SMP case is more complex and rather new for me. Recently, I
   was solving a problem with PCPU stuff. For example, PCPU_GET is
   implemented by one instruction on i386 arch. So, a need of atomicity
   with respect to interrupts can be overlooked. On load-store archs, the
   implementation which works in SMP case is not so simple. And what
   works in UP case (single PCPU), not works in SMP case. Believe me,
   mysterious and sporadic 'mutex not owned' assertions and others ones
   caused by curthreads mess, it takes a while ...
   Note that PCPU_GET() is not needed to be atomic. The reason is that the 
   code
   which uses its result would not be atomic (single-instruction or 
   whatever, see
   below). Thus, either the preemption should not matter since the action 
   with
   the per-cpu data is advisory, as is in the case of i386 pmap you noted.
   or some external measures should be applied in advance to the containing
   region (which you proposed, by bracing with sched_pin()).
 
  So, it's advisory in the case of i386 pmap... Well, if you can live
  with that, I can too.
 
  
   Also, note that it is not interrupts but preemption which is concern.
 
  Yes and no. In theory, yes, a preemption is a concern. In FreeBSD,
  however, sched_pin() and critical_enter() and their counterparts are
  implemented with help of curthread. And curthread definition falls to
  PCPU_GET(curthread) if not defined in other way. So, curthread should
  be atomic with respect to interrupts and in general, PCPU_GET() should
  be too. Note that spinlock_enter() definitions often (always) use
  curthread too. Anyhow, it's defined in MD code and can be defined for
  each arch separately.
 
  curthread is a bit magic. :)  If you perform a context switch during an
  interrupt (which will change 'curthread') you also change your register 
  state.
  When you resume, the register state is also restored.  This means that while
  something like 'PCPU_GET(cpuid)' might be stale after you read it, 
  'curthread'
  never is.  However, it is true that actually reading curthread has to be
  atomic.  If your read of curthread looks like:
 
  mov pcpu reg, r0
  add offset of pc_curthread, r0
  ld r0, r1
 
  Then that will indeed break.  Alpha used a fixed register for 'pcpu_reg'
  (as does ia64 IIRC).  OTOH, you might also be able to depend on the fact 
  that
  pc_curthread is the first thing in 'struct pcpu' (and always will be, you 
  could
  add a CTASSERT to future-proof).  In that case, you can remove the 'add'
  instruction and instead just do:
 
  ld pcpu reg, r1
 
  which is fine.
 
 Just for the record. There are three extra (coprocessor) registers per
 a core in arm11mpcore (armv6k). Unfortunately only one is Privileged.
 The other ones are respectively User Read only and User Read Write.
 For now, we are using the Privileged one to store pcpu pointer
 (curthread is correctly set during each context switch). Thus, using a
 coprocessor register means that reading of curthread is not a single
 instruction implementation now. After we figured out the curthread
 issue in SMP case, using a fixed (processor) register for pcpu is an
 option. Meanwhile, we disable interrupts before reading of curthread
 and enable them after. The same is done for other PCPU stuff. For now
 we have not stable system enough for profiling, however, when it will
 be, it would be interesting to learn how various implementations of
 curthread reading impact system performance.

curthread is read _a lot_, so I would expect this to hurt.  What we did on
Alpha was to use a fixed register for pcpu access, but we used the equivalent
of a coprocessor register to also store the value so we could set that fixed
register on entry to the kernel (userland was free to use the register for
its own purposes).

-- 
John Baldwin
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: [patch] i386 pmap sysmaps_pcpu[] atomic access

2013-02-21 Thread Svatopluk Kraus
On Thu, Feb 21, 2013 at 4:43 PM, John Baldwin j...@freebsd.org wrote:
  curthread is a bit magic. :)  If you perform a context switch during an
  interrupt (which will change 'curthread') you also change your register 
  state.
  When you resume, the register state is also restored.  This means that 
  while
  something like 'PCPU_GET(cpuid)' might be stale after you read it, 
  'curthread'
  never is.  However, it is true that actually reading curthread has to be
  atomic.  If your read of curthread looks like:
 
  mov pcpu reg, r0
  add offset of pc_curthread, r0
  ld r0, r1
 
  Then that will indeed break.  Alpha used a fixed register for 'pcpu_reg'
  (as does ia64 IIRC).  OTOH, you might also be able to depend on the fact 
  that
  pc_curthread is the first thing in 'struct pcpu' (and always will be, you 
  could
  add a CTASSERT to future-proof).  In that case, you can remove the 'add'
  instruction and instead just do:
 
  ld pcpu reg, r1
 
  which is fine.

 Just for the record. There are three extra (coprocessor) registers per
 a core in arm11mpcore (armv6k). Unfortunately only one is Privileged.
 The other ones are respectively User Read only and User Read Write.
 For now, we are using the Privileged one to store pcpu pointer
 (curthread is correctly set during each context switch). Thus, using a
 coprocessor register means that reading of curthread is not a single
 instruction implementation now. After we figured out the curthread
 issue in SMP case, using a fixed (processor) register for pcpu is an
 option. Meanwhile, we disable interrupts before reading of curthread
 and enable them after. The same is done for other PCPU stuff. For now
 we have not stable system enough for profiling, however, when it will
 be, it would be interesting to learn how various implementations of
 curthread reading impact system performance.

 curthread is read _a lot_, so I would expect this to hurt.  What we did on
 Alpha was to use a fixed register for pcpu access, but we used the equivalent
 of a coprocessor register to also store the value so we could set that fixed
 register on entry to the kernel (userland was free to use the register for
 its own purposes).

Interesting solution. Thanks.

Svata
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: [patch] i386 pmap sysmaps_pcpu[] atomic access

2013-02-20 Thread Svatopluk Kraus
On Tue, Feb 19, 2013 at 7:51 PM, Konstantin Belousov
kostik...@gmail.com wrote:
 On Mon, Feb 18, 2013 at 11:18:16PM +0100, Svatopluk Kraus wrote:
 On Mon, Feb 18, 2013 at 9:36 PM, Konstantin Belousov
 kostik...@gmail.com wrote:
 Well, I'm taking a part on porting FreeBSD to ARM11mpcore. UP case was
 simple. SMP case is more complex and rather new for me. Recently, I
 was solving a problem with PCPU stuff. For example, PCPU_GET is
 implemented by one instruction on i386 arch. So, a need of atomicity
 with respect to interrupts can be overlooked. On load-store archs, the
 implementation which works in SMP case is not so simple. And what
 works in UP case (single PCPU), not works in SMP case. Believe me,
 mysterious and sporadic 'mutex not owned' assertions and others ones
 caused by curthreads mess, it takes a while ...
 Note that PCPU_GET() is not needed to be atomic. The reason is that the code
 which uses its result would not be atomic (single-instruction or whatever, see
 below). Thus, either the preemption should not matter since the action with
 the per-cpu data is advisory, as is in the case of i386 pmap you noted.
 or some external measures should be applied in advance to the containing
 region (which you proposed, by bracing with sched_pin()).

So, it's advisory in the case of i386 pmap... Well, if you can live
with that, I can too.


 Also, note that it is not interrupts but preemption which is concern.

Yes and no. In theory, yes, a preemption is a concern. In FreeBSD,
however, sched_pin() and critical_enter() and their counterparts are
implemented with help of curthread. And curthread definition falls to
PCPU_GET(curthread) if not defined in other way. So, curthread should
be atomic with respect to interrupts and in general, PCPU_GET() should
be too. Note that spinlock_enter() definitions often (always) use
curthread too. Anyhow, it's defined in MD code and can be defined for
each arch separately.


 After this, I took a look at how PCPU stuff is used in whole kernel
 and found out mentioned here i386 pmap issue. So, my concern is
 following:

 1. to verify my newly gained ideas how per CPU data should be used,
 2. to decide how to change my implementation of pmap on ARM11mpcore,
 as it's based on i386 pmap,
 3. to make FreeBSD code better.

 In the meanwhile, it looks that using data dedicated to one CPU on
 another one is OK. However, I can't agree. At least, without comments,
 it is misleading for anyone new in FreeBSD and makes code misty.

  Both threads in your description make useful progress, and computation
  proceeds correctly.

 I thought, there is only one thread in my example. One thread running
 on CPU1, but holding sysmaps dedicated to CPU0 instead of holding
 sysmaps dedicated to CPU1. So, any thread running on CPU0 must wait
 because the thread running on CPU1 is a thief. Futhermore, the idea
 that a thread on CPU1 should hold data for CPU1 is not valid. So,
 either some comment is missing in the code that it's OK or the using
 of PCPU_GET(cpuid) is unneeded and some kind of free sysmaps list can
 be used and it will serve better.

 What is wrong on almost all architectures except x86 is PCPU_ADD(), which
 is usually supposed by the MD code to be atomic in regard to _both_
 interrupts and preemption. I said almost all, but in fact I am not aware
 of any architecture except x86 where it is right.

 And, RISCs with the load-link and store-conditional (ll/sc) primitives
 are well capable of doing the PCPU_ADD() correctly.

 You could look at the projects/counters branch, where single-instruction
 increment is used on x86 for dynamic per-cpu counters, and critical_enter()
 for other architectures.  I might do some work for other arches, but the
 counters are correct but slower that it could be, on !x86.

Thanks to help me to settle my thoughts.

Svata
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: [patch] i386 pmap sysmaps_pcpu[] atomic access

2013-02-20 Thread John Baldwin
On Wednesday, February 20, 2013 7:31:08 am Svatopluk Kraus wrote:
 On Tue, Feb 19, 2013 at 7:51 PM, Konstantin Belousov
 kostik...@gmail.com wrote:
  On Mon, Feb 18, 2013 at 11:18:16PM +0100, Svatopluk Kraus wrote:
  On Mon, Feb 18, 2013 at 9:36 PM, Konstantin Belousov
  kostik...@gmail.com wrote:
  Well, I'm taking a part on porting FreeBSD to ARM11mpcore. UP case was
  simple. SMP case is more complex and rather new for me. Recently, I
  was solving a problem with PCPU stuff. For example, PCPU_GET is
  implemented by one instruction on i386 arch. So, a need of atomicity
  with respect to interrupts can be overlooked. On load-store archs, the
  implementation which works in SMP case is not so simple. And what
  works in UP case (single PCPU), not works in SMP case. Believe me,
  mysterious and sporadic 'mutex not owned' assertions and others ones
  caused by curthreads mess, it takes a while ...
  Note that PCPU_GET() is not needed to be atomic. The reason is that the code
  which uses its result would not be atomic (single-instruction or whatever, 
  see
  below). Thus, either the preemption should not matter since the action with
  the per-cpu data is advisory, as is in the case of i386 pmap you noted.
  or some external measures should be applied in advance to the containing
  region (which you proposed, by bracing with sched_pin()).
 
 So, it's advisory in the case of i386 pmap... Well, if you can live
 with that, I can too.
 
 
  Also, note that it is not interrupts but preemption which is concern.
 
 Yes and no. In theory, yes, a preemption is a concern. In FreeBSD,
 however, sched_pin() and critical_enter() and their counterparts are
 implemented with help of curthread. And curthread definition falls to
 PCPU_GET(curthread) if not defined in other way. So, curthread should
 be atomic with respect to interrupts and in general, PCPU_GET() should
 be too. Note that spinlock_enter() definitions often (always) use
 curthread too. Anyhow, it's defined in MD code and can be defined for
 each arch separately.

curthread is a bit magic. :)  If you perform a context switch during an
interrupt (which will change 'curthread') you also change your register state.
When you resume, the register state is also restored.  This means that while
something like 'PCPU_GET(cpuid)' might be stale after you read it, 'curthread'
never is.  However, it is true that actually reading curthread has to be
atomic.  If your read of curthread looks like:

mov pcpu reg, r0
add offset of pc_curthread, r0
ld r0, r1

Then that will indeed break.  Alpha used a fixed register for 'pcpu_reg'
(as does ia64 IIRC).  OTOH, you might also be able to depend on the fact that
pc_curthread is the first thing in 'struct pcpu' (and always will be, you could
add a CTASSERT to future-proof).  In that case, you can remove the 'add'
instruction and instead just do:

ld pcpu reg, r1

which is fine.

-- 
John Baldwin
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: [patch] i386 pmap sysmaps_pcpu[] atomic access

2013-02-20 Thread Konstantin Belousov
On Wed, Feb 20, 2013 at 10:22:29AM -0500, John Baldwin wrote:
 On Wednesday, February 20, 2013 7:31:08 am Svatopluk Kraus wrote:
  On Tue, Feb 19, 2013 at 7:51 PM, Konstantin Belousov
  kostik...@gmail.com wrote:
   On Mon, Feb 18, 2013 at 11:18:16PM +0100, Svatopluk Kraus wrote:
   On Mon, Feb 18, 2013 at 9:36 PM, Konstantin Belousov
   kostik...@gmail.com wrote:
   Well, I'm taking a part on porting FreeBSD to ARM11mpcore. UP case was
   simple. SMP case is more complex and rather new for me. Recently, I
   was solving a problem with PCPU stuff. For example, PCPU_GET is
   implemented by one instruction on i386 arch. So, a need of atomicity
   with respect to interrupts can be overlooked. On load-store archs, the
   implementation which works in SMP case is not so simple. And what
   works in UP case (single PCPU), not works in SMP case. Believe me,
   mysterious and sporadic 'mutex not owned' assertions and others ones
   caused by curthreads mess, it takes a while ...
   Note that PCPU_GET() is not needed to be atomic. The reason is that the 
   code
   which uses its result would not be atomic (single-instruction or 
   whatever, see
   below). Thus, either the preemption should not matter since the action 
   with
   the per-cpu data is advisory, as is in the case of i386 pmap you noted.
   or some external measures should be applied in advance to the containing
   region (which you proposed, by bracing with sched_pin()).
  
  So, it's advisory in the case of i386 pmap... Well, if you can live
  with that, I can too.
  
  
   Also, note that it is not interrupts but preemption which is concern.
  
  Yes and no. In theory, yes, a preemption is a concern. In FreeBSD,
  however, sched_pin() and critical_enter() and their counterparts are
  implemented with help of curthread. And curthread definition falls to
  PCPU_GET(curthread) if not defined in other way. So, curthread should
  be atomic with respect to interrupts and in general, PCPU_GET() should
  be too. Note that spinlock_enter() definitions often (always) use
  curthread too. Anyhow, it's defined in MD code and can be defined for
  each arch separately.
 
 curthread is a bit magic. :)  If you perform a context switch during an
 interrupt (which will change 'curthread') you also change your register state.
 When you resume, the register state is also restored.  This means that while
 something like 'PCPU_GET(cpuid)' might be stale after you read it, 'curthread'
 never is.  However, it is true that actually reading curthread has to be
 atomic.  If your read of curthread looks like:
 
   mov pcpu reg, r0
   add offset of pc_curthread, r0
   ld r0, r1
 
 Then that will indeed break.  Alpha used a fixed register for 'pcpu_reg'
 (as does ia64 IIRC).  OTOH, you might also be able to depend on the fact that
 pc_curthread is the first thing in 'struct pcpu' (and always will be, you 
 could
 add a CTASSERT to future-proof).  In that case, you can remove the 'add'
 instruction and instead just do:
 
   ld pcpu reg, r1
 
 which is fine.

I just looked at the arm pcpu.h, and indeed the curthread falls
back to the default implementation from sys/pcpu.h, which is
get_pcpu()-pc_curthread. This looks buggy for SMP, does our arm port
support any multi-cpu configuration ?



pgpoLvGDsLIGn.pgp
Description: PGP signature


Re: [patch] i386 pmap sysmaps_pcpu[] atomic access

2013-02-20 Thread John Baldwin
On Wednesday, February 20, 2013 2:27:39 pm Konstantin Belousov wrote:
 On Wed, Feb 20, 2013 at 10:22:29AM -0500, John Baldwin wrote:
  On Wednesday, February 20, 2013 7:31:08 am Svatopluk Kraus wrote:
   On Tue, Feb 19, 2013 at 7:51 PM, Konstantin Belousov
   kostik...@gmail.com wrote:
On Mon, Feb 18, 2013 at 11:18:16PM +0100, Svatopluk Kraus wrote:
On Mon, Feb 18, 2013 at 9:36 PM, Konstantin Belousov
kostik...@gmail.com wrote:
Well, I'm taking a part on porting FreeBSD to ARM11mpcore. UP case was
simple. SMP case is more complex and rather new for me. Recently, I
was solving a problem with PCPU stuff. For example, PCPU_GET is
implemented by one instruction on i386 arch. So, a need of atomicity
with respect to interrupts can be overlooked. On load-store archs, the
implementation which works in SMP case is not so simple. And what
works in UP case (single PCPU), not works in SMP case. Believe me,
mysterious and sporadic 'mutex not owned' assertions and others ones
caused by curthreads mess, it takes a while ...
Note that PCPU_GET() is not needed to be atomic. The reason is that the 
code
which uses its result would not be atomic (single-instruction or 
whatever, see
below). Thus, either the preemption should not matter since the action 
with
the per-cpu data is advisory, as is in the case of i386 pmap you noted.
or some external measures should be applied in advance to the containing
region (which you proposed, by bracing with sched_pin()).
   
   So, it's advisory in the case of i386 pmap... Well, if you can live
   with that, I can too.
   
   
Also, note that it is not interrupts but preemption which is concern.
   
   Yes and no. In theory, yes, a preemption is a concern. In FreeBSD,
   however, sched_pin() and critical_enter() and their counterparts are
   implemented with help of curthread. And curthread definition falls to
   PCPU_GET(curthread) if not defined in other way. So, curthread should
   be atomic with respect to interrupts and in general, PCPU_GET() should
   be too. Note that spinlock_enter() definitions often (always) use
   curthread too. Anyhow, it's defined in MD code and can be defined for
   each arch separately.
  
  curthread is a bit magic. :)  If you perform a context switch during an
  interrupt (which will change 'curthread') you also change your register 
  state.
  When you resume, the register state is also restored.  This means that while
  something like 'PCPU_GET(cpuid)' might be stale after you read it, 
  'curthread'
  never is.  However, it is true that actually reading curthread has to be
  atomic.  If your read of curthread looks like:
  
  mov pcpu reg, r0
  add offset of pc_curthread, r0
  ld r0, r1
  
  Then that will indeed break.  Alpha used a fixed register for 'pcpu_reg'
  (as does ia64 IIRC).  OTOH, you might also be able to depend on the fact 
  that
  pc_curthread is the first thing in 'struct pcpu' (and always will be, you 
  could
  add a CTASSERT to future-proof).  In that case, you can remove the 'add'
  instruction and instead just do:
  
  ld pcpu reg, r1
  
  which is fine.
 
 I just looked at the arm pcpu.h, and indeed the curthread falls
 back to the default implementation from sys/pcpu.h, which is
 get_pcpu()-pc_curthread. This looks buggy for SMP, does our arm port
 support any multi-cpu configuration ?

Not in HEAD.  There is a projects branch for armv6 I think that does.

-- 
John Baldwin
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: [patch] i386 pmap sysmaps_pcpu[] atomic access

2013-02-20 Thread Ian Lepore
On Wed, 2013-02-20 at 14:32 -0500, John Baldwin wrote:
 On Wednesday, February 20, 2013 2:27:39 pm Konstantin Belousov wrote:
  On Wed, Feb 20, 2013 at 10:22:29AM -0500, John Baldwin wrote:
   On Wednesday, February 20, 2013 7:31:08 am Svatopluk Kraus wrote:
On Tue, Feb 19, 2013 at 7:51 PM, Konstantin Belousov
kostik...@gmail.com wrote:
 On Mon, Feb 18, 2013 at 11:18:16PM +0100, Svatopluk Kraus wrote:
 On Mon, Feb 18, 2013 at 9:36 PM, Konstantin Belousov
 kostik...@gmail.com wrote:
 Well, I'm taking a part on porting FreeBSD to ARM11mpcore. UP case 
 was
 simple. SMP case is more complex and rather new for me. Recently, I
 was solving a problem with PCPU stuff. For example, PCPU_GET is
 implemented by one instruction on i386 arch. So, a need of atomicity
 with respect to interrupts can be overlooked. On load-store archs, 
 the
 implementation which works in SMP case is not so simple. And what
 works in UP case (single PCPU), not works in SMP case. Believe me,
 mysterious and sporadic 'mutex not owned' assertions and others ones
 caused by curthreads mess, it takes a while ...
 Note that PCPU_GET() is not needed to be atomic. The reason is that 
 the code
 which uses its result would not be atomic (single-instruction or 
 whatever, see
 below). Thus, either the preemption should not matter since the 
 action with
 the per-cpu data is advisory, as is in the case of i386 pmap you 
 noted.
 or some external measures should be applied in advance to the 
 containing
 region (which you proposed, by bracing with sched_pin()).

So, it's advisory in the case of i386 pmap... Well, if you can live
with that, I can too.


 Also, note that it is not interrupts but preemption which is concern.

Yes and no. In theory, yes, a preemption is a concern. In FreeBSD,
however, sched_pin() and critical_enter() and their counterparts are
implemented with help of curthread. And curthread definition falls to
PCPU_GET(curthread) if not defined in other way. So, curthread should
be atomic with respect to interrupts and in general, PCPU_GET() should
be too. Note that spinlock_enter() definitions often (always) use
curthread too. Anyhow, it's defined in MD code and can be defined for
each arch separately.
   
   curthread is a bit magic. :)  If you perform a context switch during an
   interrupt (which will change 'curthread') you also change your register 
   state.
   When you resume, the register state is also restored.  This means that 
   while
   something like 'PCPU_GET(cpuid)' might be stale after you read it, 
   'curthread'
   never is.  However, it is true that actually reading curthread has to be
   atomic.  If your read of curthread looks like:
   
 mov pcpu reg, r0
 add offset of pc_curthread, r0
 ld r0, r1
   
   Then that will indeed break.  Alpha used a fixed register for 'pcpu_reg'
   (as does ia64 IIRC).  OTOH, you might also be able to depend on the fact 
   that
   pc_curthread is the first thing in 'struct pcpu' (and always will be, you 
   could
   add a CTASSERT to future-proof).  In that case, you can remove the 'add'
   instruction and instead just do:
   
 ld pcpu reg, r1
   
   which is fine.
  
  I just looked at the arm pcpu.h, and indeed the curthread falls
  back to the default implementation from sys/pcpu.h, which is
  get_pcpu()-pc_curthread. This looks buggy for SMP, does our arm port
  support any multi-cpu configuration ?
 
 Not in HEAD.  There is a projects branch for armv6 I think that does.
 

There isn't anything I've heard of that makes it all the way to single
user mode yet with multiple cores enabled. 

-- Ian


___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: [patch] i386 pmap sysmaps_pcpu[] atomic access

2013-02-19 Thread Konstantin Belousov
On Mon, Feb 18, 2013 at 11:18:16PM +0100, Svatopluk Kraus wrote:
 On Mon, Feb 18, 2013 at 9:36 PM, Konstantin Belousov
 kostik...@gmail.com wrote:
 Well, I'm taking a part on porting FreeBSD to ARM11mpcore. UP case was
 simple. SMP case is more complex and rather new for me. Recently, I
 was solving a problem with PCPU stuff. For example, PCPU_GET is
 implemented by one instruction on i386 arch. So, a need of atomicity
 with respect to interrupts can be overlooked. On load-store archs, the
 implementation which works in SMP case is not so simple. And what
 works in UP case (single PCPU), not works in SMP case. Believe me,
 mysterious and sporadic 'mutex not owned' assertions and others ones
 caused by curthreads mess, it takes a while ...
Note that PCPU_GET() is not needed to be atomic. The reason is that the code
which uses its result would not be atomic (single-instruction or whatever, see
below). Thus, either the preemption should not matter since the action with
the per-cpu data is advisory, as is in the case of i386 pmap you noted.
or some external measures should be applied in advance to the containing
region (which you proposed, by bracing with sched_pin()).

Also, note that it is not interrupts but preemption which is concern.

 
 After this, I took a look at how PCPU stuff is used in whole kernel
 and found out mentioned here i386 pmap issue. So, my concern is
 following:
 
 1. to verify my newly gained ideas how per CPU data should be used,
 2. to decide how to change my implementation of pmap on ARM11mpcore,
 as it's based on i386 pmap,
 3. to make FreeBSD code better.
 
 In the meanwhile, it looks that using data dedicated to one CPU on
 another one is OK. However, I can't agree. At least, without comments,
 it is misleading for anyone new in FreeBSD and makes code misty.
 
  Both threads in your description make useful progress, and computation
  proceeds correctly.
 
 I thought, there is only one thread in my example. One thread running
 on CPU1, but holding sysmaps dedicated to CPU0 instead of holding
 sysmaps dedicated to CPU1. So, any thread running on CPU0 must wait
 because the thread running on CPU1 is a thief. Futhermore, the idea
 that a thread on CPU1 should hold data for CPU1 is not valid. So,
 either some comment is missing in the code that it's OK or the using
 of PCPU_GET(cpuid) is unneeded and some kind of free sysmaps list can
 be used and it will serve better.

What is wrong on almost all architectures except x86 is PCPU_ADD(), which
is usually supposed by the MD code to be atomic in regard to _both_
interrupts and preemption. I said almost all, but in fact I am not aware
of any architecture except x86 where it is right.

And, RISCs with the load-link and store-conditional (ll/sc) primitives
are well capable of doing the PCPU_ADD() correctly.

You could look at the projects/counters branch, where single-instruction
increment is used on x86 for dynamic per-cpu counters, and critical_enter()
for other architectures.  I might do some work for other arches, but the
counters are correct but slower that it could be, on !x86.


pgpKvcaK3GJAa.pgp
Description: PGP signature


Re: [patch] i386 pmap sysmaps_pcpu[] atomic access

2013-02-18 Thread Konstantin Belousov
On Mon, Feb 18, 2013 at 01:44:35PM +0100, Svatopluk Kraus wrote:
 Hi,
 
the access to sysmaps_pcpu[] should be atomic with respect to
 thread migration. Otherwise, a sysmaps for one CPU can be stolen by
 another CPU and the purpose of per CPU sysmaps is broken. A patch is
 enclosed.
And, what are the problem caused by the 'otherwise' ?
I do not see any.

Really, taking the mutex while bind to a CPU could be deadlock-prone
under some situations.

This was discussed at least one more time. Might be, a comment saying that
there is no issue should be added.
 
  Svata
 
 Index: sys/i386/i386/pmap.c
 ===
 --- sys/i386/i386/pmap.c  (revision 246831)
 +++ sys/i386/i386/pmap.c  (working copy)
 @@ -4146,11 +4146,11 @@
  {
   struct sysmaps *sysmaps;
 
 + sched_pin();
   sysmaps = sysmaps_pcpu[PCPU_GET(cpuid)];
   mtx_lock(sysmaps-lock);
   if (*sysmaps-CMAP2)
   panic(pmap_zero_page: CMAP2 busy);
 - sched_pin();
   *sysmaps-CMAP2 = PG_V | PG_RW | VM_PAGE_TO_PHYS(m) | PG_A | PG_M |
   pmap_cache_bits(m-md.pat_mode, 0);
   invlcaddr(sysmaps-CADDR2);
 @@ -4171,11 +4171,11 @@
  {
   struct sysmaps *sysmaps;
 
 + sched_pin();
   sysmaps = sysmaps_pcpu[PCPU_GET(cpuid)];
   mtx_lock(sysmaps-lock);
   if (*sysmaps-CMAP2)
   panic(pmap_zero_page_area: CMAP2 busy);
 - sched_pin();
   *sysmaps-CMAP2 = PG_V | PG_RW | VM_PAGE_TO_PHYS(m) | PG_A | PG_M |
   pmap_cache_bits(m-md.pat_mode, 0);
   invlcaddr(sysmaps-CADDR2);
 @@ -4220,13 +4220,13 @@
  {
   struct sysmaps *sysmaps;
 
 + sched_pin();
   sysmaps = sysmaps_pcpu[PCPU_GET(cpuid)];
   mtx_lock(sysmaps-lock);
   if (*sysmaps-CMAP1)
   panic(pmap_copy_page: CMAP1 busy);
   if (*sysmaps-CMAP2)
   panic(pmap_copy_page: CMAP2 busy);
 - sched_pin();
   invlpg((u_int)sysmaps-CADDR1);
   invlpg((u_int)sysmaps-CADDR2);
   *sysmaps-CMAP1 = PG_V | VM_PAGE_TO_PHYS(src) | PG_A |
 @@ -5072,11 +5072,11 @@
   vm_offset_t sva, eva;
 
   if ((cpu_feature  CPUID_CLFSH) != 0) {
 + sched_pin();
   sysmaps = sysmaps_pcpu[PCPU_GET(cpuid)];
   mtx_lock(sysmaps-lock);
   if (*sysmaps-CMAP2)
   panic(pmap_flush_page: CMAP2 busy);
 - sched_pin();
   *sysmaps-CMAP2 = PG_V | PG_RW | VM_PAGE_TO_PHYS(m) |
   PG_A | PG_M | pmap_cache_bits(m-md.pat_mode, 0);
   invlcaddr(sysmaps-CADDR2);
 ___
 freebsd-current@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/freebsd-current
 To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


pgpJybqFrB7Kk.pgp
Description: PGP signature


Re: [patch] i386 pmap sysmaps_pcpu[] atomic access

2013-02-18 Thread Konstantin Belousov
On Mon, Feb 18, 2013 at 06:06:42PM +0100, Svatopluk Kraus wrote:
 On Mon, Feb 18, 2013 at 4:08 PM, Konstantin Belousov
 kostik...@gmail.com wrote:
  On Mon, Feb 18, 2013 at 01:44:35PM +0100, Svatopluk Kraus wrote:
  Hi,
 
 the access to sysmaps_pcpu[] should be atomic with respect to
  thread migration. Otherwise, a sysmaps for one CPU can be stolen by
  another CPU and the purpose of per CPU sysmaps is broken. A patch is
  enclosed.
  And, what are the problem caused by the 'otherwise' ?
  I do not see any.
 
 The 'otherwise' issue is the following:
 
 1. A thread is running on CPU0.
 
 sysmaps = sysmaps_pcpu[PCPU_GET(cpuid)];
 
 2. A sysmaps variable contains a pointer to 'CPU0' sysmaps.
 3. Now, the thread migrates into CPU1.
 4. However, the sysmaps variable still contains a pointers to 'CPU0' sysmaps.
 
   mtx_lock(sysmaps-lock);
 
 4. The thread running on CPU1 locked 'CPU0' sysmaps mutex, so the
 thread uselessly can block another thread running on CPU0. Maybe, it's
 not a problem. However, it definitely goes against the reason why the
 submaps (one for each CPU) exist.
So what ?

 
 
  Really, taking the mutex while bind to a CPU could be deadlock-prone
  under some situations.
 
  This was discussed at least one more time. Might be, a comment saying that
  there is no issue should be added.
 
 I missed the discussion. Can you point me to it, please? A deadlock is
 not problem here, however, I can be wrong, as I can't imagine now how
 a simple pinning could lead into a deadlock at all.
Because some other load on the bind cpu might prevent the thread from
being scheduled.

 
 
   Svata
 
  Index: sys/i386/i386/pmap.c
  ===
  --- sys/i386/i386/pmap.c  (revision 246831)
  +++ sys/i386/i386/pmap.c  (working copy)
  @@ -4146,11 +4146,11 @@
   {
struct sysmaps *sysmaps;
 
  + sched_pin();
sysmaps = sysmaps_pcpu[PCPU_GET(cpuid)];
mtx_lock(sysmaps-lock);
if (*sysmaps-CMAP2)
panic(pmap_zero_page: CMAP2 busy);
  - sched_pin();
*sysmaps-CMAP2 = PG_V | PG_RW | VM_PAGE_TO_PHYS(m) | PG_A | PG_M |
pmap_cache_bits(m-md.pat_mode, 0);
invlcaddr(sysmaps-CADDR2);
  @@ -4171,11 +4171,11 @@
   {
struct sysmaps *sysmaps;
 
  + sched_pin();
sysmaps = sysmaps_pcpu[PCPU_GET(cpuid)];
mtx_lock(sysmaps-lock);
if (*sysmaps-CMAP2)
panic(pmap_zero_page_area: CMAP2 busy);
  - sched_pin();
*sysmaps-CMAP2 = PG_V | PG_RW | VM_PAGE_TO_PHYS(m) | PG_A | PG_M |
pmap_cache_bits(m-md.pat_mode, 0);
invlcaddr(sysmaps-CADDR2);
  @@ -4220,13 +4220,13 @@
   {
struct sysmaps *sysmaps;
 
  + sched_pin();
sysmaps = sysmaps_pcpu[PCPU_GET(cpuid)];
mtx_lock(sysmaps-lock);
if (*sysmaps-CMAP1)
panic(pmap_copy_page: CMAP1 busy);
if (*sysmaps-CMAP2)
panic(pmap_copy_page: CMAP2 busy);
  - sched_pin();
invlpg((u_int)sysmaps-CADDR1);
invlpg((u_int)sysmaps-CADDR2);
*sysmaps-CMAP1 = PG_V | VM_PAGE_TO_PHYS(src) | PG_A |
  @@ -5072,11 +5072,11 @@
vm_offset_t sva, eva;
 
if ((cpu_feature  CPUID_CLFSH) != 0) {
  + sched_pin();
sysmaps = sysmaps_pcpu[PCPU_GET(cpuid)];
mtx_lock(sysmaps-lock);
if (*sysmaps-CMAP2)
panic(pmap_flush_page: CMAP2 busy);
  - sched_pin();
*sysmaps-CMAP2 = PG_V | PG_RW | VM_PAGE_TO_PHYS(m) |
PG_A | PG_M | pmap_cache_bits(m-md.pat_mode, 0);
invlcaddr(sysmaps-CADDR2);
  ___
  freebsd-current@freebsd.org mailing list
  http://lists.freebsd.org/mailman/listinfo/freebsd-current
  To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


pgp9RnaKGdSYD.pgp
Description: PGP signature


Re: [patch] i386 pmap sysmaps_pcpu[] atomic access

2013-02-18 Thread Svatopluk Kraus
On Mon, Feb 18, 2013 at 4:08 PM, Konstantin Belousov
kostik...@gmail.com wrote:
 On Mon, Feb 18, 2013 at 01:44:35PM +0100, Svatopluk Kraus wrote:
 Hi,

the access to sysmaps_pcpu[] should be atomic with respect to
 thread migration. Otherwise, a sysmaps for one CPU can be stolen by
 another CPU and the purpose of per CPU sysmaps is broken. A patch is
 enclosed.
 And, what are the problem caused by the 'otherwise' ?
 I do not see any.

The 'otherwise' issue is the following:

1. A thread is running on CPU0.

sysmaps = sysmaps_pcpu[PCPU_GET(cpuid)];

2. A sysmaps variable contains a pointer to 'CPU0' sysmaps.
3. Now, the thread migrates into CPU1.
4. However, the sysmaps variable still contains a pointers to 'CPU0' sysmaps.

  mtx_lock(sysmaps-lock);

4. The thread running on CPU1 locked 'CPU0' sysmaps mutex, so the
thread uselessly can block another thread running on CPU0. Maybe, it's
not a problem. However, it definitely goes against the reason why the
submaps (one for each CPU) exist.


 Really, taking the mutex while bind to a CPU could be deadlock-prone
 under some situations.

 This was discussed at least one more time. Might be, a comment saying that
 there is no issue should be added.

I missed the discussion. Can you point me to it, please? A deadlock is
not problem here, however, I can be wrong, as I can't imagine now how
a simple pinning could lead into a deadlock at all.


  Svata

 Index: sys/i386/i386/pmap.c
 ===
 --- sys/i386/i386/pmap.c  (revision 246831)
 +++ sys/i386/i386/pmap.c  (working copy)
 @@ -4146,11 +4146,11 @@
  {
   struct sysmaps *sysmaps;

 + sched_pin();
   sysmaps = sysmaps_pcpu[PCPU_GET(cpuid)];
   mtx_lock(sysmaps-lock);
   if (*sysmaps-CMAP2)
   panic(pmap_zero_page: CMAP2 busy);
 - sched_pin();
   *sysmaps-CMAP2 = PG_V | PG_RW | VM_PAGE_TO_PHYS(m) | PG_A | PG_M |
   pmap_cache_bits(m-md.pat_mode, 0);
   invlcaddr(sysmaps-CADDR2);
 @@ -4171,11 +4171,11 @@
  {
   struct sysmaps *sysmaps;

 + sched_pin();
   sysmaps = sysmaps_pcpu[PCPU_GET(cpuid)];
   mtx_lock(sysmaps-lock);
   if (*sysmaps-CMAP2)
   panic(pmap_zero_page_area: CMAP2 busy);
 - sched_pin();
   *sysmaps-CMAP2 = PG_V | PG_RW | VM_PAGE_TO_PHYS(m) | PG_A | PG_M |
   pmap_cache_bits(m-md.pat_mode, 0);
   invlcaddr(sysmaps-CADDR2);
 @@ -4220,13 +4220,13 @@
  {
   struct sysmaps *sysmaps;

 + sched_pin();
   sysmaps = sysmaps_pcpu[PCPU_GET(cpuid)];
   mtx_lock(sysmaps-lock);
   if (*sysmaps-CMAP1)
   panic(pmap_copy_page: CMAP1 busy);
   if (*sysmaps-CMAP2)
   panic(pmap_copy_page: CMAP2 busy);
 - sched_pin();
   invlpg((u_int)sysmaps-CADDR1);
   invlpg((u_int)sysmaps-CADDR2);
   *sysmaps-CMAP1 = PG_V | VM_PAGE_TO_PHYS(src) | PG_A |
 @@ -5072,11 +5072,11 @@
   vm_offset_t sva, eva;

   if ((cpu_feature  CPUID_CLFSH) != 0) {
 + sched_pin();
   sysmaps = sysmaps_pcpu[PCPU_GET(cpuid)];
   mtx_lock(sysmaps-lock);
   if (*sysmaps-CMAP2)
   panic(pmap_flush_page: CMAP2 busy);
 - sched_pin();
   *sysmaps-CMAP2 = PG_V | PG_RW | VM_PAGE_TO_PHYS(m) |
   PG_A | PG_M | pmap_cache_bits(m-md.pat_mode, 0);
   invlcaddr(sysmaps-CADDR2);
 ___
 freebsd-current@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/freebsd-current
 To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: [patch] i386 pmap sysmaps_pcpu[] atomic access

2013-02-18 Thread Svatopluk Kraus
On Mon, Feb 18, 2013 at 6:09 PM, Konstantin Belousov
kostik...@gmail.com wrote:
 On Mon, Feb 18, 2013 at 06:06:42PM +0100, Svatopluk Kraus wrote:
 On Mon, Feb 18, 2013 at 4:08 PM, Konstantin Belousov
 kostik...@gmail.com wrote:
  On Mon, Feb 18, 2013 at 01:44:35PM +0100, Svatopluk Kraus wrote:
  Hi,
 
 the access to sysmaps_pcpu[] should be atomic with respect to
  thread migration. Otherwise, a sysmaps for one CPU can be stolen by
  another CPU and the purpose of per CPU sysmaps is broken. A patch is
  enclosed.
  And, what are the problem caused by the 'otherwise' ?
  I do not see any.

 The 'otherwise' issue is the following:

 1. A thread is running on CPU0.

 sysmaps = sysmaps_pcpu[PCPU_GET(cpuid)];

 2. A sysmaps variable contains a pointer to 'CPU0' sysmaps.
 3. Now, the thread migrates into CPU1.
 4. However, the sysmaps variable still contains a pointers to 'CPU0' sysmaps.

   mtx_lock(sysmaps-lock);

 4. The thread running on CPU1 locked 'CPU0' sysmaps mutex, so the
 thread uselessly can block another thread running on CPU0. Maybe, it's
 not a problem. However, it definitely goes against the reason why the
 submaps (one for each CPU) exist.
 So what ?

It depends. You don't understand it or you think it's ok? Tell me.




  Really, taking the mutex while bind to a CPU could be deadlock-prone
  under some situations.
 
  This was discussed at least one more time. Might be, a comment saying that
  there is no issue should be added.

 I missed the discussion. Can you point me to it, please? A deadlock is
 not problem here, however, I can be wrong, as I can't imagine now how
 a simple pinning could lead into a deadlock at all.
 Because some other load on the bind cpu might prevent the thread from
 being scheduled.

I'm afraid I still have no idea. On single CPU, a binding has no
meaning. Thus, if any deadlock exists then exists without binding too.
Hmm, you are talking about a deadlock caused by heavy CPU load? Is it
a deadlock at all? Anyhow, mutex is a lock with priority propagation,
isn't it?



 
   Svata
 
  Index: sys/i386/i386/pmap.c
  ===
  --- sys/i386/i386/pmap.c  (revision 246831)
  +++ sys/i386/i386/pmap.c  (working copy)
  @@ -4146,11 +4146,11 @@
   {
struct sysmaps *sysmaps;
 
  + sched_pin();
sysmaps = sysmaps_pcpu[PCPU_GET(cpuid)];
mtx_lock(sysmaps-lock);
if (*sysmaps-CMAP2)
panic(pmap_zero_page: CMAP2 busy);
  - sched_pin();
*sysmaps-CMAP2 = PG_V | PG_RW | VM_PAGE_TO_PHYS(m) | PG_A | PG_M |
pmap_cache_bits(m-md.pat_mode, 0);
invlcaddr(sysmaps-CADDR2);
  @@ -4171,11 +4171,11 @@
   {
struct sysmaps *sysmaps;
 
  + sched_pin();
sysmaps = sysmaps_pcpu[PCPU_GET(cpuid)];
mtx_lock(sysmaps-lock);
if (*sysmaps-CMAP2)
panic(pmap_zero_page_area: CMAP2 busy);
  - sched_pin();
*sysmaps-CMAP2 = PG_V | PG_RW | VM_PAGE_TO_PHYS(m) | PG_A | PG_M |
pmap_cache_bits(m-md.pat_mode, 0);
invlcaddr(sysmaps-CADDR2);
  @@ -4220,13 +4220,13 @@
   {
struct sysmaps *sysmaps;
 
  + sched_pin();
sysmaps = sysmaps_pcpu[PCPU_GET(cpuid)];
mtx_lock(sysmaps-lock);
if (*sysmaps-CMAP1)
panic(pmap_copy_page: CMAP1 busy);
if (*sysmaps-CMAP2)
panic(pmap_copy_page: CMAP2 busy);
  - sched_pin();
invlpg((u_int)sysmaps-CADDR1);
invlpg((u_int)sysmaps-CADDR2);
*sysmaps-CMAP1 = PG_V | VM_PAGE_TO_PHYS(src) | PG_A |
  @@ -5072,11 +5072,11 @@
vm_offset_t sva, eva;
 
if ((cpu_feature  CPUID_CLFSH) != 0) {
  + sched_pin();
sysmaps = sysmaps_pcpu[PCPU_GET(cpuid)];
mtx_lock(sysmaps-lock);
if (*sysmaps-CMAP2)
panic(pmap_flush_page: CMAP2 busy);
  - sched_pin();
*sysmaps-CMAP2 = PG_V | PG_RW | VM_PAGE_TO_PHYS(m) |
PG_A | PG_M | pmap_cache_bits(m-md.pat_mode, 0);
invlcaddr(sysmaps-CADDR2);
  ___
  freebsd-current@freebsd.org mailing list
  http://lists.freebsd.org/mailman/listinfo/freebsd-current
  To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: [patch] i386 pmap sysmaps_pcpu[] atomic access

2013-02-18 Thread Konstantin Belousov
On Mon, Feb 18, 2013 at 09:27:40PM +0100, Svatopluk Kraus wrote:
 On Mon, Feb 18, 2013 at 6:09 PM, Konstantin Belousov
 kostik...@gmail.com wrote:
  On Mon, Feb 18, 2013 at 06:06:42PM +0100, Svatopluk Kraus wrote:
  On Mon, Feb 18, 2013 at 4:08 PM, Konstantin Belousov
  kostik...@gmail.com wrote:
   On Mon, Feb 18, 2013 at 01:44:35PM +0100, Svatopluk Kraus wrote:
   Hi,
  
  the access to sysmaps_pcpu[] should be atomic with respect to
   thread migration. Otherwise, a sysmaps for one CPU can be stolen by
   another CPU and the purpose of per CPU sysmaps is broken. A patch is
   enclosed.
   And, what are the problem caused by the 'otherwise' ?
   I do not see any.
 
  The 'otherwise' issue is the following:
 
  1. A thread is running on CPU0.
 
  sysmaps = sysmaps_pcpu[PCPU_GET(cpuid)];
 
  2. A sysmaps variable contains a pointer to 'CPU0' sysmaps.
  3. Now, the thread migrates into CPU1.
  4. However, the sysmaps variable still contains a pointers to 'CPU0' 
  sysmaps.
 
mtx_lock(sysmaps-lock);
 
  4. The thread running on CPU1 locked 'CPU0' sysmaps mutex, so the
  thread uselessly can block another thread running on CPU0. Maybe, it's
  not a problem. However, it definitely goes against the reason why the
  submaps (one for each CPU) exist.
  So what ?
 
 It depends. You don't understand it or you think it's ok? Tell me.
 
Both. I do not understand your concern, and I think that the code is fine.

Both threads in your description make useful progress, and computation
proceeds correctly.

 
 
 
   Really, taking the mutex while bind to a CPU could be deadlock-prone
   under some situations.
  
   This was discussed at least one more time. Might be, a comment saying 
   that
   there is no issue should be added.
 
  I missed the discussion. Can you point me to it, please? A deadlock is
  not problem here, however, I can be wrong, as I can't imagine now how
  a simple pinning could lead into a deadlock at all.
  Because some other load on the bind cpu might prevent the thread from
  being scheduled.
 
 I'm afraid I still have no idea. On single CPU, a binding has no
 meaning. Thus, if any deadlock exists then exists without binding too.
 Hmm, you are talking about a deadlock caused by heavy CPU load? Is it
 a deadlock at all? Anyhow, mutex is a lock with priority propagation,
 isn't it?
 

When executing on single cpu, kernel sometimes make different decisions.
Yes, the deadlock can be more precisely described as livelock.

It might not make any matter for exactly this case, but still is useful
to remember.


pgpRzTWR25fZB.pgp
Description: PGP signature


Re: [patch] i386 pmap sysmaps_pcpu[] atomic access

2013-02-18 Thread Svatopluk Kraus
On Mon, Feb 18, 2013 at 9:36 PM, Konstantin Belousov
kostik...@gmail.com wrote:
 On Mon, Feb 18, 2013 at 09:27:40PM +0100, Svatopluk Kraus wrote:
 On Mon, Feb 18, 2013 at 6:09 PM, Konstantin Belousov
 kostik...@gmail.com wrote:
  On Mon, Feb 18, 2013 at 06:06:42PM +0100, Svatopluk Kraus wrote:
  On Mon, Feb 18, 2013 at 4:08 PM, Konstantin Belousov
  kostik...@gmail.com wrote:
   On Mon, Feb 18, 2013 at 01:44:35PM +0100, Svatopluk Kraus wrote:
   Hi,
  
  the access to sysmaps_pcpu[] should be atomic with respect to
   thread migration. Otherwise, a sysmaps for one CPU can be stolen by
   another CPU and the purpose of per CPU sysmaps is broken. A patch is
   enclosed.
   And, what are the problem caused by the 'otherwise' ?
   I do not see any.
 
  The 'otherwise' issue is the following:
 
  1. A thread is running on CPU0.
 
  sysmaps = sysmaps_pcpu[PCPU_GET(cpuid)];
 
  2. A sysmaps variable contains a pointer to 'CPU0' sysmaps.
  3. Now, the thread migrates into CPU1.
  4. However, the sysmaps variable still contains a pointers to 'CPU0' 
  sysmaps.
 
mtx_lock(sysmaps-lock);
 
  4. The thread running on CPU1 locked 'CPU0' sysmaps mutex, so the
  thread uselessly can block another thread running on CPU0. Maybe, it's
  not a problem. However, it definitely goes against the reason why the
  submaps (one for each CPU) exist.
  So what ?

 It depends. You don't understand it or you think it's ok? Tell me.

 Both. I do not understand your concern, and I think that the code is fine.

Well, I'm taking a part on porting FreeBSD to ARM11mpcore. UP case was
simple. SMP case is more complex and rather new for me. Recently, I
was solving a problem with PCPU stuff. For example, PCPU_GET is
implemented by one instruction on i386 arch. So, a need of atomicity
with respect to interrupts can be overlooked. On load-store archs, the
implementation which works in SMP case is not so simple. And what
works in UP case (single PCPU), not works in SMP case. Believe me,
mysterious and sporadic 'mutex not owned' assertions and others ones
caused by curthreads mess, it takes a while ...

After this, I took a look at how PCPU stuff is used in whole kernel
and found out mentioned here i386 pmap issue. So, my concern is
following:

1. to verify my newly gained ideas how per CPU data should be used,
2. to decide how to change my implementation of pmap on ARM11mpcore,
as it's based on i386 pmap,
3. to make FreeBSD code better.

In the meanwhile, it looks that using data dedicated to one CPU on
another one is OK. However, I can't agree. At least, without comments,
it is misleading for anyone new in FreeBSD and makes code misty.

 Both threads in your description make useful progress, and computation
 proceeds correctly.

I thought, there is only one thread in my example. One thread running
on CPU1, but holding sysmaps dedicated to CPU0 instead of holding
sysmaps dedicated to CPU1. So, any thread running on CPU0 must wait
because the thread running on CPU1 is a thief. Futhermore, the idea
that a thread on CPU1 should hold data for CPU1 is not valid. So,
either some comment is missing in the code that it's OK or the using
of PCPU_GET(cpuid) is unneeded and some kind of free sysmaps list can
be used and it will serve better.


 
 
   Really, taking the mutex while bind to a CPU could be deadlock-prone
   under some situations.
  
   This was discussed at least one more time. Might be, a comment saying 
   that
   there is no issue should be added.
 
  I missed the discussion. Can you point me to it, please? A deadlock is
  not problem here, however, I can be wrong, as I can't imagine now how
  a simple pinning could lead into a deadlock at all.
  Because some other load on the bind cpu might prevent the thread from
  being scheduled.

 I'm afraid I still have no idea. On single CPU, a binding has no
 meaning. Thus, if any deadlock exists then exists without binding too.
 Hmm, you are talking about a deadlock caused by heavy CPU load? Is it
 a deadlock at all? Anyhow, mutex is a lock with priority propagation,
 isn't it?


 When executing on single cpu, kernel sometimes make different decisions.
 Yes, the deadlock can be more precisely described as livelock.

 It might not make any matter for exactly this case, but still is useful
 to remember.
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org