Re: 4byte aligned com(4) and PCI_MAPREG_TYPE_MEM

2014-02-16 Thread Masanobu SAITOH

Hi.

(2014/02/15 1:58), Izumi Tsutsui wrote:

I'd suggest clarifying what problems you are trying to solve
and considering how they should be solved, before updating your patch.

The problems you mentioned are:
(1) merge initialization of sparse register mappings (with 4 byte stride)


Right.


(2) defer consinit() for puc com to use uvm_km_alloc() in it


Right, though the real goal is not to defer consinit() but to support
memory mapped I/O in com(4).

And,

  (3) Almost all drivers don't clear struct com_regs regs. It's not
 a real bug, but it should be cleared to allow further changes.


Your patch is trying to solve them by:
(1) change COM_INIT_REGS() (and comprobe1()) APIs to pass stride byte
(2) add an MI extent_initted global variable and check it in MD consinit()

My vote is:
(1) leave existing APIs and handle the quirk in MD attachment
(2) add an x86 specific MD variable to defer consinit() till cpu_startup()


 I think a static int initted variable is used to do that, but there
is no way to know whether extent_init() (or uvm_init()) has been
called.


Because:
(1)
  - it's a real pain to change the MI APIs
(so many attachments, and most of them will rarely be tested, unfortunately)
  - only three or four attachments can share the new API,
while such embedded devices often have other quirks as well


 Before writing that patch, I wrote another patch which didn't change
the arguments of COM_INIT_REGS(). We have two choices:

a) Use COM_INIT_REGS() for all drivers.
b) Copy COM_INIT_REGS() and modify it to support a 4 byte stride.

I couldn't decide which one was better. Now I think that leaving the
arguments unchanged and making a new macro is better than adding a new
argument, because byte order need not be considered for devices with
a 1 byte stride.


  - even if stride handling is really necessary in the MI part,
it's much better to prepare new wrapper functions,
like wdcprobe1() and wdcprobe() in wdc.c
(i.e. prepare a new COM_INIT_REGS_STRIDE() macro with a new arg
 and make the existing COM_INIT_REGS() macro use it)
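
A minimal sketch of the wrapper-macro idea above, written as a
standalone model and not against the real comvar.h. The MODEL_ names,
the simplified struct, and the choice to scale cr_nports by the stride
are all assumptions made for illustration. The macro also clears the
structure first, which is what point (3) asks for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_COM_NPORTS 8	/* a 16550-style UART exposes 8 registers */

/* Hypothetical stand-in for struct com_regs; not the real layout. */
struct model_com_regs {
	uintptr_t cr_iobase;			/* base of the register window */
	size_t    cr_nports;			/* bytes of bus space to map */
	size_t    cr_map[MODEL_COM_NPORTS];	/* byte offset of each register */
};

/* New macro: clear the structure, then put register i at i * stride. */
#define MODEL_COM_INIT_REGS_STRIDE(regs, addr, stride)			\
	do {								\
		size_t _i;						\
		assert((stride) > 0);					\
		memset(&(regs), 0, sizeof(regs));			\
		(regs).cr_iobase = (addr);				\
		(regs).cr_nports = MODEL_COM_NPORTS * (size_t)(stride);	\
		for (_i = 0; _i < MODEL_COM_NPORTS; _i++)		\
			(regs).cr_map[_i] = _i * (size_t)(stride);	\
	} while (0)

/* The existing macro keeps its arguments and becomes the stride-1 case. */
#define MODEL_COM_INIT_REGS(regs, addr)					\
	MODEL_COM_INIT_REGS_STRIDE(regs, addr, 1)

/* Look up the byte offset of one register through the map. */
static size_t
model_com_reg_offset(const struct model_com_regs *regs, unsigned reg)
{
	assert(reg < MODEL_COM_NPORTS);
	return regs->cr_map[reg];
}
```

With this shape, existing callers of the one-byte macro need no change,
and only the few attachments with a 4 byte stride call the new one.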

(2)
  - it's unclear what functions actually require the extent_init()
(I guess uvm_init() is enough to call uvm_km_alloc())


 Me neither... I had thought that uvm_init() was enough, but someone
advised me that extent_init() should be called.


  - in general MI code assumes that console devices are properly
mapped by MD bootstrap or its firmware


 Yes. But it's a little hard to set up such a mapping as a PCI
bus_space_tag and handle it in the early boot stage.


  - some ports already have MD flags to switch between malloc(9) and
static memory in early device mappings and initialization


 It can be used, but IMHO, checking whether bus_space_map() can be
called is more generic than it.
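
For reference, the deferral debated in (2) can be modeled roughly as
follows. This is only a sketch under stated assumptions: every name
here is hypothetical, the flag stands in for whatever "mapping is now
possible" check is finally chosen, and no real NetBSD function is
called.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical MD state; the names are illustrative, not NetBSD's. */
static bool model_vm_ready;		/* set once kernel VA can be allocated */
static bool model_console_attached;
static int  model_console_attach_calls;

/* True if the selected console sits behind a memory-mapped PCI BAR. */
static bool model_console_is_mmio = true;

static void
model_consinit(void)
{
	if (model_console_attached)
		return;			/* already done on an earlier call */
	if (model_console_is_mmio && !model_vm_ready)
		return;			/* too early to map; try again later */

	/* From here on, mapping the registers is assumed to be safe. */
	assert(!model_console_is_mmio || model_vm_ready);
	model_console_attach_calls++;
	model_console_attached = true;
}

static void
model_cpu_startup(void)
{
	model_vm_ready = true;		/* stands in for "uvm_init() has run" */
	model_consinit();		/* second chance for the MMIO console */
}
```

The early call returns without attaching, the call from the startup
path attaches once, and later calls are no-ops.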



Just my two cents,

---
Izumi Tsutsui



 Please give me a little time to write another patch.

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


pcb offset into uarea

2014-02-16 Thread David Laight
I'm adding code to i386 and amd64 to save the ymm registers on process
switch - allowing userspace to use the AVX instructions.

I also don't want to have to do it all again when the next set of
extensions appears.
This means that the size of the FPU save area (currently embedded in
the pcb) can't be determined until runtime.

Plan A is to move the FPU save area to the end of the pcb, and then
locate the pcb at the correct offset in the uarea so that the written
region ends at the end of the page.
The problem with this is that the offset of the pcb in the uarea
is set by MI code based on some #defines - and there seem to be
several related values.
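
A rough sketch of the Plan A arithmetic, assuming the uarea size, the
size of the fixed part of the pcb, and the runtime FPU save size are
all known at the point the offset is chosen. The names and the struct
are hypothetical. The 64-byte alignment is what XSAVE requires of its
save area; when the save size is a multiple of 64 the area ends exactly
at the end of the uarea, otherwise a few bytes of padding remain.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_FPU_ALIGN 64	/* XSAVE needs a 64-byte aligned save area */

struct model_pcb_layout {
	size_t pcb_offset;	/* start of the fixed pcb part in the uarea */
	size_t fpu_offset;	/* start of the FPU save area in the uarea */
};

/*
 * Place the save area as high in the uarea as alignment allows and put
 * the fixed part of the pcb immediately below it.  The initial stack
 * pointer would then sit just below pcb_offset.
 */
static struct model_pcb_layout
model_pcb_place(size_t uarea_size, size_t pcb_fixed_size, size_t fpu_save_size)
{
	struct model_pcb_layout l;

	/* Leave room for the worst-case alignment padding. */
	assert(uarea_size >= pcb_fixed_size + fpu_save_size + MODEL_FPU_ALIGN);

	l.fpu_offset = (uarea_size - fpu_save_size) &
	    ~(size_t)(MODEL_FPU_ALIGN - 1);
	l.pcb_offset = l.fpu_offset - pcb_fixed_size;
	return l;
}
```

For example, with an 8192-byte uarea, a 128-byte fixed part and an
832-byte save area, this gives fpu_offset 7360 and pcb_offset 7232.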

Now on x86 (like most systems) the cpu stack advances into low memory.
The pcb is placed at the end of the uarea with the initial stack pointer
just below it.
I suspect that a long time ago (when the uarea had a fixed KVA) an
additional memory page was placed below the uarea to give interrupts
more stack space. I don't think this happens any more.

As an aside: The uarea used to be pageable, whereas (what is now) the
lwp structure isn't. Paging of uareas was disabled a few years back
- so there is no real difference between the lifetimes of an lwp and a uarea.
(zombies probably lose the uarea before the lwp).

An alternative would be to place the FP save area at the start of the uarea.
This would mean that, on stack overflow, the FP save area would be trashed
before some random piece of memory.
It might even be worth putting the pcb at the start of the uarea - so that
stack overflow crashes out the failing process, and probably earlier
than the random corruption would.

This gives me three options:
A) Put the save area at the end of the pcb and dynamically adjust the pcb
   offset.
B) Put the save area at the start of the uarea, with the pcb at a fixed
   offset at the end of the uarea.
C) Put the save area at the end of the pcb, and put the pcb at the start
   of the uarea.

Votes?
What have I missed?

David

-- 
David Laight: da...@l8s.co.uk


Re: pcb offset into uarea

2014-02-16 Thread Matt Thomas

On Feb 16, 2014, at 1:41 PM, David Laight da...@l8s.co.uk wrote:

 I'm adding code to i386 and amd64 to save the ymm registers on process
 switch - allowing userspace to use the AVX instructions.
 
 I also don't want to have to do it all again when the next set of
 extensions appears.
 This means that the size of the FPU save area (currently embedded in
 the pcb) can't be determined until runtime.
 
 Plan A is to move the FPU save area to the end of the pcb, and then
 locate the pcb at the correct offset in the uarea so that the written
 region ends at the end of the page.
 The problem with this is that the offset of the pcb in the uarea
 is set by MI code based on some #defines - and there seem to be
 several related values.
 
 Now on x86 (like most systems) the cpu stack advances into low memory.
 The pcb is placed at the end of the uarea with the initial stack pointer
 just below it.
 I suspect that a long time ago (when the uarea had a fixed KVA) an
 additional memory page was placed below the uarea to give interrupts
 more stack space. I don't think this happens any more.
 
 As an aside: The uarea used to be pageable, whereas (what is now) the
 lwp structure isn't. Paging of uareas was disabled a few years back
 - so there is no real difference between the lifetimes of an lwp and a uarea.
 (zombies probably lose the uarea before the lwp).
 
 An alternative would be to place the FP save area at the start of the uarea.
 This would mean that, on stack overflow, the FP save area would be trashed
 before some random piece of memory.
 It might even be worth putting the pcb at the start of the uarea - so that
 stack overflow crashes out the failing process, and probably earlier
 than the random corruption would.

For most ports, the pcb is at the start of the uarea.

 This gives me three options:
 A) Put the save area at the end of the pcb and dynamically adjust the pcb
   offset.
 B) Put the save area at the start of the uarea, with the pcb at a fixed
   offset at the end of the uarea.
 C) Put the save area at the end of the pcb, and put the pcb at the start
   of the uarea.
 
 Votes?
 What have I missed?

Keep a default mmx/sse save area in the pcb along with a pointer to it.
If a variant is used that needs a larger save area, dynamically allocate
it and save it in the pcb pointer.

Since it's unlikely that most processes will use AVX, why waste the space?
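
A userland sketch of that idea, under loud assumptions: the struct, the
field names and the helper functions are hypothetical, aligned_alloc()
stands in for whatever kernel allocator would really be used, and the
copy only preserves bytes (real code would also have to build a valid
XSAVE header for the larger area).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_FXSAVE_SIZE 512	/* legacy x87/MMX/SSE (FXSAVE) image size */

/* Hypothetical pcb fragment: embedded default area plus a pointer. */
struct model_pcb {
	void  *pcb_fpu_save;	/* area currently in use */
	size_t pcb_fpu_size;	/* its size in bytes */
	_Alignas(16) unsigned char pcb_fpu_default[MODEL_FXSAVE_SIZE];
};

static void
model_pcb_init(struct model_pcb *pcb)
{
	assert(pcb != NULL);
	memset(pcb, 0, sizeof(*pcb));
	pcb->pcb_fpu_save = pcb->pcb_fpu_default;
	pcb->pcb_fpu_size = sizeof(pcb->pcb_fpu_default);
}

/* Release a dynamically allocated area and fall back to the default. */
static void
model_pcb_fini(struct model_pcb *pcb)
{
	if (pcb->pcb_fpu_save != pcb->pcb_fpu_default)
		free(pcb->pcb_fpu_save);
	pcb->pcb_fpu_save = pcb->pcb_fpu_default;
	pcb->pcb_fpu_size = sizeof(pcb->pcb_fpu_default);
}

/*
 * Switch to a larger save area the first time one is needed.
 * Returns 0 on success, -1 if the allocation fails.
 */
static int
model_pcb_grow_fpu(struct model_pcb *pcb, size_t need)
{
	size_t rounded;
	void *p;

	if (need <= pcb->pcb_fpu_size)
		return 0;			/* current area is big enough */

	rounded = (need + 63) & ~(size_t)63;	/* size multiple of alignment */
	p = aligned_alloc(64, rounded);
	if (p == NULL)
		return -1;
	memset(p, 0, rounded);
	memcpy(p, pcb->pcb_fpu_save, pcb->pcb_fpu_size); /* keep old bytes */

	if (pcb->pcb_fpu_save != pcb->pcb_fpu_default)
		free(pcb->pcb_fpu_save);
	pcb->pcb_fpu_save = p;
	pcb->pcb_fpu_size = rounded;
	return 0;
}
```

Processes that never touch the wider registers keep using the embedded
512-byte area, so only AVX users pay for the larger allocation.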