Re: [Milkymist port] virtual memory management

2014-02-09 Thread Yann Sionneau

Hello Eduardo,

On 30/05/13 22:45, Eduardo Horvath wrote:

On Wed, 29 May 2013, Yann Sionneau wrote:


Hello NetBSD fellows,

As I mentioned in my previous e-mail, I may need from time to time a little
bit of help since this is my first "full featured OS" port.

I am wondering how I can manage virtual memory (especially how to avoid TLB
misses, or deal with them) in exception handlers.

There are essentially three ways to do this.  Which one you choose depends
on the hardware.

1) Turn off the MMU on exception

2) Keep parts of the address space untranslated

3) Lock important pages into the TLB


Turning off the MMU is pretty straightforward.  ISTR the PowerPC Book E
processors do this.  Just look up the TLB entry in the table and return
from exception.  You just have to make sure that the kernel manages page
tables using physical addresses.
This seems like the easiest thing to do (because I won't have to think 
about recursive faults) but then if I put physical addresses in my 1st 
level page table, how does the kernel manage the page table entries?
Since the kernel runs with MMU on, using virtual addresses, it cannot 
dereference physical pointers, so it cannot add/modify/remove PTEs, right?
I'm sure there is some kernel internal mechanism that I don't know about 
which could help me get the virtual address from the physical one. 
Do you know which mechanism it would be?


Also, is it possible to make sure that everything (in kernel space) is 
mapped so that virtual_addr = physical_addr - RAM_START_ADDR + 
virtual_offset
In my case RAM_START_ADDR is 0x4000 and I am trying to use 
virtual_offset of 0xc000 (everything in my kernel ELF binary is 
mapped at virtual address starting at 0xc000)
If I can ensure that this formula is always correct I can then use a 
very simple macro to translate "statically" a physical address to a 
virtual address.


Then I have another question, who is supposed to build the kernel's page 
table? pmap_bootstrap()?
If so, then how do I allocate pages for that purpose? using 
pmap_pte_pagealloc() and pmap_segtab_init() ?


FYI I am using those files for my pmap:

uvm/pmap/pmap.c
uvm/pmap/pmap_segtab.c
uvm/pmap/pmap_tlb.c

I am taking inspiration from the PPC Book-E (mpc85xx) code.

Thanks !

Regards,

--
Yann


Re: [Milkymist port] virtual memory management

2014-02-09 Thread Matt Thomas

On Feb 9, 2014, at 10:07 AM, Yann Sionneau  wrote:

> This seems like the easiest thing to do (because I won't have to think about 
> recursive faults) but then if I put physical addresses in my 1st level page 
> table, how does the kernel manage the page table entries?

BookE always has the MMU on and contains fixed TLB entries to make sure
all of physical ram is always mapped.

> Since the kernel runs with MMU on, using virtual addresses, it cannot 
> dereference physical pointers, so it cannot add/modify/remove PTEs, right?

Wrong.  See above.  Note that on BookE, PTEs are purely a software 
construction and the H/W never reads them directly.

> I'm sure there is some kernel internal mechanism that I don't know about 
> which could help me get the virtual address from the physical one. Do you 
> know which mechanism it would be?

Look at __HAVE_MM_MD_DIRECT_MAPPED_PHYS and/or PMAP_{MAP,UNMAP}_POOLPAGE.


> Also, is it possible to make sure that everything (in kernel space) is mapped 
> so that virtual_addr = physical_addr - RAM_START_ADDR + virtual_offset
> In my case RAM_START_ADDR is 0x4000 and I am trying to use virtual_offset 
> of 0xc000 (everything in my kernel ELF binary is mapped at virtual 
> address starting at 0xc000)
> If I can ensure that this formula is always correct I can then use a very 
> simple macro to translate "statically" a physical address to a virtual 
> address.

Not knowing how much RAM you have, I can only speak in generalities. 
But in general you reserve a part of the address space for direct mapped
memory and then place the kernel above that.

For instance, you might have 512MB of RAM which you map at 0xa000.
and then have the kernel's mapped va space start at 0xc000..

Then conversion from PA to VA is just adding a constant, while getting
the PA from a direct mapped VA is just subtraction.
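
A minimal sketch of what such a constant-offset conversion could look like
in C.  The base addresses and RAM size are illustrative placeholders, not
values confirmed for the Milkymist port, and the helper names are made up
for this example:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative placeholders only; not the port's real values. */
#define RAM_START_SK       UINT32_C(0x40000000) /* assumed physical base of RAM */
#define RAM_SIZE_SK        UINT32_C(0x08000000) /* assumed 128 MB of RAM */
#define DIRECTMAP_BASE_SK  UINT32_C(0xa0000000) /* assumed VA of the direct-mapped window */

/* PA -> VA inside the direct-mapped window: add a constant. */
static inline uint32_t
directmap_pa_to_va(uint32_t pa)
{
	assert(pa >= RAM_START_SK && pa - RAM_START_SK < RAM_SIZE_SK);
	return pa - RAM_START_SK + DIRECTMAP_BASE_SK;
}

/* VA -> PA inside the direct-mapped window: subtract the same constant. */
static inline uint32_t
directmap_va_to_pa(uint32_t va)
{
	assert(va >= DIRECTMAP_BASE_SK && va - DIRECTMAP_BASE_SK < RAM_SIZE_SK);
	return va - DIRECTMAP_BASE_SK + RAM_START_SK;
}
```

The range checks only make sense for addresses inside the window; kernel
virtual addresses above it would still have to go through the page tables.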

> Then I have another question, who is supposed to build the kernel's page 
> table? pmap_bootstrap()?

Some part of MD code.  pmap_bootstrap() could be that.

> If so, then how do I allocate pages for that purpose? using 
> pmap_pte_pagealloc() and pmap_segtab_init() ?

Usually you use pmap_steal_memory to do that.
But for mpc85xx I just allocate the kernel initial segmap in the .bss.
But the page tables were allocated using uvm, which can do prebootstrap
allocations.

> 
> FYI I am using those files for my pmap:
> 
> uvm/pmap/pmap.c
> uvm/pmap/pmap_segtab.c
> uvm/pmap/pmap_tlb.c
> 
> I am taking inspiration from the PPC Book-E (mpc85xx) code.



one time crash in usb_allocmem_flags

2014-02-09 Thread Alexander Nasonov
Hi,

I was running current amd64 (last updated a few weeks ago) when I got
a random crash shortly after switching to X mode. If my analysis is
correct, it crashed in usb_allocmem_flags inside this loop:

	LIST_FOREACH(f, &usb_frag_freelist, next) {
		KDASSERTMSG(usb_valid_block_p(f->block, &usb_blk_fraglist),
		    "%s: usb frag %p: unknown block pointer %p",
		     __func__, f, f->block);
		if (f->block->tag == tag)
			break;
	}

It couldn't access f->block->tag. I wasn't actively using any of
the usb devices at that time. Is this a known problem, or
should I file a PR? Details of the analysis are below.

Thanks,
Alex

crash> dmesg
...
fatal protection fault in supervisor mode
trap type 4 code 0 rip 808515e2 cs 8 rflags 13282 cr2
7f7ff5773020 ilevel 0 rsp fe80ca6f16c0
curlwp 0xfe811a8aaba0 pid 475.1 lowest kstack 0xfe80ca6ee000
panic: trap
cpu2: Begin traceback...
vpanic() at netbsd:vpanic+0x13e
printf_nolog() at netbsd:printf_nolog
startlwp() at netbsd:startlwp
alltraps() at netbsd:alltraps+0x9e
ehci_allocm() at netbsd:ehci_allocm+0x2c
usbd_transfer() at netbsd:usbd_transfer+0x5f
usbd_open_pipe_intr() at netbsd:usbd_open_pipe_intr+0xcb
uhidev_open() at netbsd:uhidev_open+0xb3
wsmouseopen() at netbsd:wsmouseopen+0xf3
cdev_open() at netbsd:cdev_open+0x87
spec_open() at netbsd:spec_open+0x183
VOP_OPEN() at netbsd:VOP_OPEN+0x33
vn_open() at netbsd:vn_open+0x1b0
do_open() at netbsd:do_open+0x102
do_sys_openat() at netbsd:do_sys_openat+0x68
sys_open() at netbsd:sys_open+0x24
syscall() at netbsd:syscall+0x9a
--- syscall (number 5) ---
7f7ff403af3a:
cpu2: End traceback...
rebooting in 10 9 8 7 6 5 4 3 2 1 0

crash> dmesg|grep usb
usb0 at xhci0: USB revision 2.0
usb1 at ehci0: USB revision 2.0
uhub0 at usb0: NetBSD xHCI Root Hub, class 9/0, rev 2.00/1.00, addr 0
uhub1 at usb1: vendor 0x8086 EHCI root hub, class 9/0, rev 2.00/1.00,
addr 1
usbd_transfer() at netbsd:usbd_transfer+0x5f
usbd_open_pipe_intr() at netbsd:usbd_open_pipe_intr+0xcb

crash> x 0x808515e2
usb_allocmem_flags+0xfd:751a3948


$ objdump -d /netbsd
...
8085158b:   48 c7 c7 60 15 f8 80    mov    $0x80f81560,%rdi
80851592:   e8 69 42 d3 ff          callq  80585800
80851597:   48 8b 05 c2 bf 69 00    mov    0x69bfc2(%rip),%rax   # 80eed560
8085159e:   48 85 c0                test   %rax,%rax
808515a1:   75 3c                   jne    808515df


/* You don't need to look at this block */
808515a3:   48 8d 4d c8             lea    -0x38(%rbp),%rcx
808515a7:   45 31 c0                xor    %r8d,%r8d
808515aa:   ba 40 00 00 00          mov    $0x40,%edx
808515af:   be 00 10 00 00          mov    $0x1000,%esi
808515b4:   48 89 df                mov    %rbx,%rdi
808515b7:   e8 f4 fb ff ff          callq  808511b0
808515bc:   89 c3                   mov    %eax,%ebx
808515be:   85 c0                   test   %eax,%eax
808515c0:   75 ac                   jne    8085156e
808515c2:   48 8b 4d c8             mov    -0x38(%rbp),%rcx
808515c6:   c7 41 38 00 00 00 00    movl   $0x0,0x38(%rcx)
808515cd:   bb 40 00 00 00          mov    $0x40,%ebx
808515d2:   31 d2                   xor    %edx,%edx
808515d4:   eb 57                   jmp    8085162d

/* end of block. */

/*  LIST_FOREACH(f, &usb_frag_freelist, next) { */
808515d6:   48 8b 40 10             mov    0x10(%rax),%rax
808515da:   48 85 c0                test   %rax,%rax
808515dd:   74 c4                   je     808515a3
808515df:   48 8b 10                mov    (%rax),%rdx
808515e2:   48 39 1a                cmp    %rbx,(%rdx)
808515e5:   75 ef                   jne    808515d6
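
To make the mapping between the loop and the three loads explicit, here is
a rough C sketch. The struct names and the `offs` field are hypothetical
stand-ins that model only the fields the disassembly touches; they are not
copied from the kernel headers:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirrors of the structures walked by the loop. */
struct blk_sketch {
	void *tag;			/* cmp %rbx,(%rdx): block->tag at offset 0 */
};

struct frag_sketch {
	struct blk_sketch *block;	/* mov (%rax),%rdx: f->block at offset 0 */
	unsigned int offs;		/* assumed filler field before the list entry */
	struct {
		struct frag_sketch *le_next;	/* mov 0x10(%rax),%rax */
		struct frag_sketch **le_prev;
	} next;
};

/* Same walk as the LIST_FOREACH loop, minus the KDASSERTMSG check. */
static struct frag_sketch *
find_frag(struct frag_sketch *head, void *tag)
{
	struct frag_sketch *f;

	for (f = head; f != NULL; f = f->next.le_next) {
		/*
		 * f->block is loaded first; if that pointer is stale, the
		 * ->tag read is the access that faults (rip 808515e2).
		 */
		if (f->block->tag == tag)
			break;
	}
	return f;
}
```

Under this reading %rax holds f, %rdx holds f->block and %rbx holds tag, so
the faulting cmp is the read of f->block->tag.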



crash> ps
PID    LID S CPU FLAGS STRUCT LWP *  NAME     WAIT
475  >   1 7   2     0 fe811a8aaba0  Xorg
72       1 2   3   902 fe811a709b80  xinit
43       1 2   3   802 fe811a709760  sh
437      1 2   3   802 fe811d311720  ksh
420      1 2   2   802 fe811e2b6240  getty
435      1 2   0   802 fe811e2b6a80  getty
429      1 2   3   802 fe811e2b6660  login
412      1 2   0   802 fe811e4c1220  getty
390      1 2   0   802 fe8119a90b60  cron
407      1 2   0   802 fe811d767b00  inetd
342      1 2   3   802 fe811d311300  privoxy
357      1 2   3   802 fe811c3b9b20  sshd
332  >   1 7   0   802 fe811d7812a0  tor
31

Re: 4byte aligned com(4) and PCI_MAPREG_TYPE_MEM

2014-02-09 Thread Christos Zoulas
In article <52f7c96e.6000...@execsw.org>,
SAITOH Masanobu   wrote:
>Hello, all.
>
> I'm now working to support the Intel Quark X1000.
>This chip's internal com is MMIO (PCI_MAPREG_TYPE_MEM).
>Our com and puc don't support this type of device yet.
>To solve the problem, I wrote a patch.
>
> The registers of the Quark X1000's com are 4-byte aligned.
>Some other machines have this type of device, so
>I modified the COM_INIT_REGS() macro to support both
>byte-aligned and 4-byte-aligned registers. This change reduces
>the special modifications done in the atheros, rmi and
>marvell drivers.
>
> One problem is the serial console on i386 and amd64.
>These archs call consinit() three times. The function
>is called in the following order:
>
>   1) machdep.c::init386() or init_x86_64()
>   2) init_main.c::main()
>   *) (call uvm_init())
>   *) (call extent_init())
>   3) machdep.c::cpu_startup()
>
>When consinit() is called in init386(), it calls
>
>  comcnattach()
>->comcnattach1()
>  ->comcninit()
>-> bus_space_map() with x86_bus_space_mem tag.
>  ->bus_space_reservation_map()
>->x86_mem_add_mapping()
>  ->uvm_km_alloc()
>panic in KASSERT(vm_map_pmap(map) == pmap_kernel());
>
>What should I do?
>One solution is to check whether extent_init() has been called
>or not. There is no easy way to know that, so I added a global
>variable "extent_initted". Is that acceptable?

Looks great, can't you use "cold" instead, or is that too late?

christos



Re: [Milkymist port] virtual memory management

2014-02-09 Thread Yann Sionneau

Thank you for your answer Matt,

On 09/02/14 19:49, Matt Thomas wrote:

On Feb 9, 2014, at 10:07 AM, Yann Sionneau  wrote:


This seems like the easiest thing to do (because I won't have to think about 
recursive faults) but then if I put physical addresses in my 1st level page 
table, how does the kernel manage the page table entries?

BookE always has the MMU on and contains fixed TLB entries to make sure
all of physical ram is always mapped.
My TLB hardware is very simple and does not give me the option to "fix" 
a TLB entry, so I won't be able to do that.
The lm32 MMU is turned off automatically upon exception (a TLB miss, for 
instance); I can then re-enable it if I want. In the end the MMU 
is re-enabled upon return from exception.



Since the kernel runs with MMU on, using virtual addresses, it cannot 
dereference physical pointers, so it cannot add/modify/remove PTEs, right?

Wrong.  See above.
You mean that the TLB contains entries which map a physical address to 
itself? like 0xabcd. is mapped to 0xabcd.? Or you mean all RAM 
is always mapped but to the (0xa000.000+physical_pframe) kind of virtual 
address you mention later in your reply?

Note that on BookE, PTEs are purely a software
construction and the H/W never reads them directly.
Here my HW is like BookE: I don't have a hardware page table walker. PTEs 
are only for the software to reload the TLB when there is an exception 
(TLB miss); the TLB will never read memory to find a PTE in my lm32 MMU 
implementation.



I'm sure there is some kernel internal mechanism that I don't know about which 
could help me get the virtual address from the physical one. Do you know 
which mechanism it would be?

Look at __HAVE_MM_MD_DIRECT_MAPPED_PHYS and/or PMAP_{MAP,UNMAP}_POOLPAGE.

For now I have something like this:

vaddr_t
pmap_md_map_poolpage(paddr_t pa, vsize_t size)
{
  const vaddr_t sva = (vaddr_t) pa - 0x4000 + 0xc000;
  return sva;
}

But I guess it only works to access the content of kernel ELF (text and 
data) but not to access dynamic runtime kernel allocations, right?




Also, is it possible to make sure that everything (in kernel space) is mapped 
so that virtual_addr = physical_addr - RAM_START_ADDR + virtual_offset
In my case RAM_START_ADDR is 0x4000 and I am trying to use virtual_offset 
of 0xc000 (everything in my kernel ELF binary is mapped at virtual address 
starting at 0xc000)
If I can ensure that this formula is always correct I can then use a very simple macro to 
translate "statically" a physical address to a virtual address.

Not knowing how much ram you have, I can only speak in generalities.

I have 128 MB of RAM.

But in general you reserve a part of the address space for direct mapped
memory and then place the kernel above that.

For instance, you might have 512MB of RAM which you map at 0xa000.
and then have the kernel's mapped va space start at 0xc000..
So if I understand correctly, the first page of physical ram 
(0x4000.) is mapped at virtual address 0xa000. *and* at 
0xc000. ?
Isn't it a problem that a physical address is mapped twice in the same 
process (here the kernel)?

My caches are VIPT; couldn't that cause cache aliasing issues?
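
To make the aliasing concern concrete, here is a rough sketch of the usual 
check. The page size and cache way size are assumptions for illustration 
(the actual lm32 cache geometry is not stated in this thread), and the names 
are made up:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed geometry, for illustration only. */
#define PAGE_SIZE_SK       UINT32_C(4096)
#define CACHE_WAY_SIZE_SK  UINT32_C(16384)	/* cache size / associativity */

/*
 * Cache index bits that lie above the page offset.  They are taken from
 * the virtual address, so they can differ between two mappings of the
 * same physical page.  The mask is 0 when a way is no larger than a page.
 */
#define ALIAS_MASK_SK ((CACHE_WAY_SIZE_SK - 1) & ~(PAGE_SIZE_SK - 1))

/* True if two VAs of the same physical page would index different lines. */
static inline bool
vipt_may_alias(uint32_t va1, uint32_t va2)
{
	return ((va1 ^ va2) & ALIAS_MASK_SK) != 0;
}
```

With these numbers, two windows whose bases differ by a multiple of the way 
size give the same cache index for a given physical page, so a fixed-offset 
double mapping of that kind would not alias.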


Then conversion from PA to VA is just adding a constant, while getting
the PA from a direct mapped VA is just subtraction.


Then I have another question, who is supposed to build the kernel's page table? 
pmap_bootstrap()?

Some part of MD code.  pmap_bootstrap() could be that.


If so, then how do I allocate pages for that purpose? using 
pmap_pte_pagealloc() and pmap_segtab_init() ?

Usually you use pmap_steal_memory to do that.
But for mpc85xx I just allocate the kernel initial segmap in the .bss.
But the page tables were allocated using uvm, which can do prebootstrap
allocations.

Are you referring to the following code?

  /*
   * Now actually allocate the kernel PTE array (must be done
   * after virtual_end is initialized).
   */
  const vaddr_t kv_segtabs = avail[0].start;
  KASSERT(kv_segtabs == endkernel);
  KASSERT(avail[0].size >= NBPG * kv_nsegtabs);
  printf(" kv_nsegtabs=%#"PRIxVSIZE, kv_nsegtabs);
  printf(" kv_segtabs=%#"PRIxVADDR, kv_segtabs);
  avail[0].start += NBPG * kv_nsegtabs;
  avail[0].size -= NBPG * kv_nsegtabs;
  endkernel += NBPG * kv_nsegtabs;

  /*
   * Initialize the kernel's two-level page level.  This only wastes
   * an extra page for the segment table and allows the user/kernel
   * access to be common.
   */
  pt_entry_t **ptp = &stp->seg_tab[VM_MIN_KERNEL_ADDRESS >> SEGSHIFT];
  pt_entry_t *ptep = (void *)kv_segtabs;
  memset(ptep, 0, NBPG * kv_nsegtabs);
  for (size_t i = 0; i < kv_nsegtabs; i++, ptep += NPTEPG) {
    *ptp++ = ptep;
  }
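
For reference, this is how I picture a lookup through the two-level 
structure that the loop above fills in. It is only a sketch: the shifts 
assume 4 KB pages and 4-byte PTEs, and every name ending in `_sk` is made up 
for this example rather than taken from uvm/pmap:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: 4 KB pages, 4-byte PTEs, so one PTE page maps 4 MB. */
#define PGSHIFT_SK   12
#define NPTEPG_SK    1024u	/* PTEs per page-table page */
#define SEGSHIFT_SK  22		/* PGSHIFT_SK + log2(NPTEPG_SK) */
#define NSEG_SK      1024u	/* segment-table slots for a 32-bit VA */

typedef uint32_t pte_sk_t;

struct segtab_sk {
	pte_sk_t *seg_tab[NSEG_SK];	/* first level: one pointer per 4 MB */
};

/* Return the PTE slot for va, or NULL if no page-table page is installed. */
static pte_sk_t *
pte_lookup_sk(struct segtab_sk *stp, uint32_t va)
{
	pte_sk_t *ptep = stp->seg_tab[va >> SEGSHIFT_SK];

	if (ptep == NULL)
		return NULL;
	return &ptep[(va >> PGSHIFT_SK) & (NPTEPG_SK - 1)];
}
```

A TLB miss handler would do essentially this walk and load the resulting 
PTE into the TLB.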



FYI I am using those files for my pmap:

uvm/pmap/pmap.c
uvm/pmap/pmap_segtab.c
uvm/pmap/pmap_tlb.c

I am taking inspiration from the PPC Book-E (mpc85xx) code.

Regards,

--
Yann


Re: 4byte aligned com(4) and PCI_MAPREG_TYPE_MEM

2014-02-09 Thread Dennis Ferguson

On 9 Feb, 2014, at 10:31 , SAITOH Masanobu  wrote:
> +#if BYTE_ORDER == BIG_ENDIAN
> +#define COM_INIT_REGS_OFFSET 3
> +#else
> +#define COM_INIT_REGS_OFFSET 0
> +#endif
[...]
> + regs.cr_nports = COM_NPORTS * (align);  \
> + for (int i = 0; i < __arraycount(regs.cr_map); i++) \
> + regs.cr_map[i] = com_std_map[i] * (align)   \
> + + COM_INIT_REGS_OFFSET; \

Is this correct for PCI on a big endian machine?  I'm not positive
but I thought PCI was defined to always be little endian on the bus,
so that the address of the low order byte of a register always had
the low order bits clear, and that big endian machines needed to deal
with this by byte-swapping I/O to multibyte registers, either in software
or in the PCI bridge.  Note that this is not to imply that there might
not be some other bus where the above is correct (say, obio on a big-endian
machine), I just have some doubt that it is correct for PCI.

This does suggest that it might be better to reserve single byte I/O
operations for when the hardware registers are defined to be single bytes,
and to use 4 byte operations instead when the hardware registers are defined
as being 4 bytes wide, since then the bus-dependent code should do the
right thing for you.  I realize this would do yet more damage to the driver
you are working on, however.
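
To illustrate the concern, here is a small sketch of how the register map 
depends on the stride and on which byte lane is assumed to carry the 
register. The function and macro names are hypothetical and this is not the 
code from the patch:

```c
#include <assert.h>
#include <stddef.h>

#define NREGS_SK 8	/* a 16550-style UART has 8 register addresses */

/*
 * Fill map[] with the bus offset of each register.  'stride' is the
 * spacing between registers (1 or 4) and 'lane' is the byte, within
 * each stride-sized slot, that is assumed to hold the register.
 */
static void
com_map_sketch(size_t map[NREGS_SK], size_t stride, size_t lane)
{
	assert(stride != 0 && lane < stride);
	for (size_t i = 0; i < NREGS_SK; i++)
		map[i] = i * stride + lane;
}
```

With a stride of 4, lane 0 puts register 1 at offset 4 (low-order bits 
clear), while lane 3, as the big-endian branch of the patch selects, puts 
it at offset 7. If PCI is always little endian on the bus, lane 0 would be 
the expected choice regardless of host byte order.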


> +#define  COM_INIT_REGS(regs, tag, hdl, addr, align)  \
>   do {\
>   regs.cr_iot = tag;  \
>   regs.cr_ioh = hdl;  \
>   regs.cr_iobase = addr;  \
> - regs.cr_nports = COM_NPORTS;\
> + regs.cr_nports = COM_NPORTS);   \
>   } while (0)

The ')' insertion looks odd to me.

Dennis Ferguson