Re: TLB tiredown by ASID bump
On Jan 5, 2011, at 9:36 PM, Toru Nishimura wrote:

> Matt Thomas made a comment;
>
>> The ASID generational stuff has a downside in that valid entries will be
>> thrown away. For mips (and booke) I use a different algorithm which
>> eliminates the overhead of discarding all the TLB entries when you run
>> out of ASIDs.
>
> It's a good move to pursue efficient ASID management schemes, since it's
> the key area for runtime VM/TLB activity.
>
> Matt points out that losing valid entries is a problem when the ASID
> generation gets bumped. I think, however, it can be forgiven that the
> tbia() operation, which discards all entries except global or locked
> ones, discards "live entries", since TLB sizes are still small enough.
> Some CPU architectures do it with a single special instruction; others
> loop at most 64 times to discard. The TLB is a cache for VA->PA
> translation, and the management scheme always provokes "efficiency
> vs. correctness" arguments. It's a matter of implementation tradeoff,
> I believe.

More systems are getting larger numbers of TLB entries. The mpc85xx has 512 4KB fixed-size and 16 variable-sized TLB entries. One newer MIPS64 processor has 2048 4KB fixed-size and 128 variable-sized TLB entries and includes a page table walker. Both of the fixed-size TLBs are set/way organized, unlike the variable TLBs, which are fully associative. Invalidate-all becomes expensive when your TLBs get very large, as does discarding live entries.
Re: TLB tiredown by ASID bump
On Jan 5, 2011, at 9:36 PM, Toru Nishimura wrote:

> Matt Thomas made a comment;
>
>> The ASID generational stuff has a downside in that valid entries will be
>> thrown away. For mips (and booke) I use a different algorithm which
>> eliminates the overhead of discarding all the TLB entries when you run
>> out of ASIDs.
>
> It's a good move to pursue efficient ASID management schemes, since it's
> the key area for runtime VM/TLB activity.
>
> Matt points out that losing valid entries is a problem when the ASID
> generation gets bumped. I think, however, it can be forgiven that the
> tbia() operation, which discards all entries except global or locked
> ones, discards "live entries", since TLB sizes are still small enough.
> Some CPU architectures do it with a single special instruction; others
> loop at most 64 times to discard. The TLB is a cache for VA->PA
> translation, and the management scheme always provokes "efficiency
> vs. correctness" arguments. It's a matter of implementation tradeoff,
> I believe.
>
> BTW, how do you approach implementing a remote TLB shootdown?

It depends on the reason for the shootdown. Might as well include the comments of pmap_tlb.c here:

/*
 * Manages address spaces in a TLB.
 *
 * Normally there is a 1:1 mapping between a TLB and a CPU.  However, some
 * implementations may share a TLB between multiple CPUs (really CPU thread
 * contexts).  This requires the TLB abstraction to be separated from the
 * CPU abstraction.  It also requires that the TLB be locked while doing
 * TLB activities.
 *
 * For each TLB, we track the ASIDs in use in a bitmap and a list of pmaps
 * that have a valid ASID.
 *
 * We allocate ASIDs in increasing order until we have exhausted the supply,
 * then reinitialize the ASID space, and start allocating again at 1.  When
 * allocating from the ASID bitmap, we skip any ASID that has a corresponding
 * bit set in the ASID bitmap.  Eventually this causes the ASID bitmap to
 * fill and, when completely filled, a reinitialization of the ASID space.
 *
 * To reinitialize the ASID space, the ASID bitmap is reset and then the
 * ASIDs of non-kernel TLB entries get recorded in the ASID bitmap.  If the
 * entries in the TLB consume more than half of the ASID space, all ASIDs
 * are invalidated, the ASID bitmap is recleared, and the list of pmaps is
 * emptied.  Otherwise (the normal case), any ASID present in the TLB (even
 * those which are no longer used by a pmap) will remain active (allocated)
 * and all other ASIDs will be freed.  If the size of the TLB is much
 * smaller than the ASID space, this algorithm completely avoids TLB
 * invalidation.
 *
 * For multiprocessors, we also have to deal with TLB invalidation requests
 * from other CPUs, some of which are dealt with by the reinitialization of
 * the ASID space.  Whereas above we keep the ASIDs of those pmaps which
 * have active TLB entries, this type of reinitialization preserves the
 * ASIDs of any "onproc" user pmap, and all other ASIDs will be freed.  We
 * must do this since we can't change the current ASID.
 *
 * Each pmap has two bitmaps: pm_active and pm_onproc.  Each bit in
 * pm_active indicates whether that pmap has an allocated ASID for a CPU.
 * Each bit in pm_onproc indicates that pmap's ASID is active (equal to the
 * ASID in COP 0 register EntryHi) on a CPU.  The bit number comes from the
 * CPU's cpu_index().  Even though these bitmaps contain the bits for all
 * CPUs, the bits that correspond to the CPUs sharing a TLB can only be
 * manipulated while holding that TLB's lock.  Atomic ops must be used to
 * update them since multiple CPUs may be changing different sets of bits
 * at the same time, but these sets never overlap.
 *
 * When a change to the local TLB may require a change in the TLBs of other
 * CPUs, we try to avoid sending an IPI if at all possible.  For instance,
 * if we are updating a PTE and that PTE previously was invalid and
 * therefore couldn't support an active mapping, there's no need for an IPI
 * since there can be no TLB entry to invalidate.  The other case is when
 * we change a PTE to be modified: we just update the local TLB.  If
 * another TLB has a stale entry, a TLB MOD exception will be raised and
 * that will cause the local TLB to be updated.
 *
 * We never need to update a non-local TLB if the pmap doesn't have a valid
 * ASID for that TLB.  If it does have a valid ASID but isn't currently
 * "onproc", we simply reset its ASID for that TLB, and when it next goes
 * "onproc" it will allocate a new ASID and any existing TLB entries will
 * be orphaned.  Only in the case that the pmap has an "onproc" ASID do we
 * actually have to send an IPI.
 *
 * Once we've determined we must send an IPI to shoot down a TLB, we need
 * to send it to one of the CPUs that share that TLB.  We choose the lowest
 * numbered CPU that has one of the pmap's ASIDs "onproc".  In reality, any
 * CPU sharing that TLB would do, but interrupting an active CPU seems
 * best.
 *
 * A TLB might have multiple shootdowns
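To make the comment's allocation-and-reinitialization scheme concrete, here is a small userland model of it. This is not the actual pmap_tlb.c code; all names, the 8-bit ASID space, and the 64-entry TLB are illustrative assumptions for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Userland model of the ASID scheme described in the pmap_tlb.c
 * comment above.  All names and sizes are illustrative, not the
 * real NetBSD code.
 */
#define NUM_ASIDS   256         /* e.g. an 8-bit ASID field */
#define KERNEL_ASID 0           /* reserved; never handed out */

struct tlb_model {
	uint8_t  tlb_asid[64];          /* ASID of each valid TLB entry */
	unsigned tlb_nentries;
	bool     asid_inuse[NUM_ASIDS]; /* the ASID bitmap */
	unsigned asid_next;             /* next candidate ASID */
};

static void
asid_reinit(struct tlb_model *t)
{
	unsigned i, live = 0;

	/* Reset the bitmap, then record ASIDs of non-kernel entries. */
	memset(t->asid_inuse, 0, sizeof(t->asid_inuse));
	t->asid_inuse[KERNEL_ASID] = true;
	for (i = 0; i < t->tlb_nentries; i++) {
		unsigned asid = t->tlb_asid[i];
		if (asid != KERNEL_ASID && !t->asid_inuse[asid]) {
			t->asid_inuse[asid] = true;
			live++;
		}
	}

	/*
	 * If live entries consume more than half the ASID space,
	 * invalidate everything; otherwise the ASIDs still in the TLB
	 * stay allocated and no TLB invalidation happens at all.
	 */
	if (live > NUM_ASIDS / 2) {
		memset(t->asid_inuse, 0, sizeof(t->asid_inuse));
		t->asid_inuse[KERNEL_ASID] = true;
		t->tlb_nentries = 0;    /* models "invalidate all" */
	}
	t->asid_next = 1;               /* start allocating again at 1 */
}

static unsigned
asid_alloc(struct tlb_model *t)
{
	for (;;) {
		/* Skip any ASID whose bit is set in the bitmap. */
		while (t->asid_next < NUM_ASIDS &&
		    t->asid_inuse[t->asid_next])
			t->asid_next++;
		if (t->asid_next < NUM_ASIDS) {
			t->asid_inuse[t->asid_next] = true;
			return t->asid_next++;
		}
		asid_reinit(t);         /* supply exhausted: rescan TLB */
	}
}
```

The key property the comment claims falls out directly: as long as the TLB is much smaller than the ASID space, `live` can never exceed half of `NUM_ASIDS`, so the invalidate-all branch is never taken.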
Re: TLB tiredown by ASID bump
Matt Thomas made a comment;

> The ASID generational stuff has a downside in that valid entries will be
> thrown away. For mips (and booke) I use a different algorithm which
> eliminates the overhead of discarding all the TLB entries when you run
> out of ASIDs.

It's a good move to pursue efficient ASID management schemes, since it's the key area for runtime VM/TLB activity.

Matt points out that losing valid entries is a problem when the ASID generation gets bumped. I think, however, it can be forgiven that the tbia() operation, which discards all entries except global or locked ones, discards "live entries", since TLB sizes are still small enough. Some CPU architectures do it with a single special instruction; others loop at most 64 times to discard. The TLB is a cache for VA->PA translation, and the management scheme always provokes "efficiency vs. correctness" arguments. It's a matter of implementation tradeoff, I believe.

BTW, how do you approach implementing a remote TLB shootdown?

Toru Nishimura / ALKYL Technology
Re: TLB tiredown by ASID bump
On Jan 5, 2011, at 6:41 PM, Toru Nishimura wrote:

> There are a growing number of CPU architectures which have ASIDs to
> extend TLB VA->PA translation and improve runtime TLB efficiency. ASID
> switching is the least-cost MMU operation to make multiple processes run
> simultaneously. ASIDs eliminate the necessity to discard the whole TLB
> context on each context switch.

The ASID generational stuff has a downside in that valid entries will be thrown away. For mips (and booke) I use a different algorithm which eliminates the overhead of discarding all the TLB entries when you run out of ASIDs.

ASIDs are allocated from low to high. If the allocation bitmap shows that an ASID has been allocated, the next ASID is tried. Once the ASID hits its limit, the TLB is scanned for all active ASIDs and the allocation bitmap is updated accordingly. Then the list of active ASIDs is scanned. If a pmap has an ASID that still has matching entries in the TLB (its bit is set in the allocation bitmap), that ASID stays allocated. If a pmap has an ASID that is no longer in the TLB (the corresponding bit is clear), the pmap's ASID is discarded and the pmap will get a new one on its next activation. If the TLB contains entries with ASIDs that don't match any current pmap, those ASIDs are left allocated and will not be allocated in this pass (but will eventually be reaped as those TLB entries are overwritten with new ASIDs). If there aren't enough free ASIDs (which rarely, if ever, happens), then the ASID pool is emptied and everyone gets a fresh ASID on their next pmap activation.

> I would propose here to add a new hook in the exception return path for
> every NetBSD port. The hook point is right after the ast() call. The
> hook is to call a TLB ASID bump op (to write EntryHi with a single mtc0
> instruction for the MIPS architecture, for example). It should be noted
> that the hook is also useful to do an "isync operation" for a process
> which needs to invalidate its instruction cache to continue.
It's not really needed and doesn't do the right thing anyway.
TLB tiredown by ASID bump
There are a growing number of CPU architectures which have ASIDs to extend TLB VA->PA translation and improve runtime TLB efficiency. ASID switching is the least-cost MMU operation to make multiple processes run simultaneously. ASIDs eliminate the necessity to discard the whole TLB context on each context switch.

Bumping the ASID is a good way to invalidate the TLB entries which belong to a process. By assigning a new ASID, old stale TLB entries are effectively discarded. This is the case when the protection of some large portion of a process address space is changed. Since common TLB sizes range from 32 to 64 entries, it would be pretty inefficient, and probably senseless, to repeat a large number of TLB invalidate operations for every page being changed; the TLB is small, and most of the invalidate ops end up failing to find the target entries anyway. For this occasion it would be ok to schedule an ASID bump operation to fire in the exception return path to user context. When fired, the user context restarts fetching fresh TLB entries, invoking TLB misses just as at initial exec.

The ASID is also in effect for process termination. Since the ASID, combined with a generation number, is unique to each process, there is no necessity to discard TLB entries when a process is about to exit. It's ok to leave stale entries in the TLB; they will be overwritten by other processes shortly.

Bumping the ASID makes sense for multi-core SMP too. It can shoot down (tire down) potentially stale remote TLB entries belonging to the offending process. It is the least-cost TLB shootdown implementation, without practicing the heavyweight search-and-discard TLB operation triggered by IPI. (Vahalia's "UNIX Internals" book describes TLB consistency bookkeeping by ASID bump at pp. 501-502.)

I would propose here to add a new hook in the exception return path for every NetBSD port. The hook point is right after the ast() call. The hook is to call a TLB ASID bump op (to write EntryHi with a single mtc0 instruction for the MIPS architecture, for example).

It should be noted that the hook is also useful to do an "isync operation" for a process which needs to invalidate its instruction cache to continue.

Toru Nishimura / ALKYL Technology
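As a concrete illustration of the bookkeeping the proposal implies, here is a userland sketch. This is not NetBSD code; all names are invented, and the real MIPS hook would be a single mtc0 write to EntryHi rather than these C assignments.

```c
#include <stdbool.h>

/*
 * Illustrative model of the proposed hook: instead of invalidating
 * stale TLB entries page by page, mark the process for an ASID bump
 * and let the exception-return path assign a fresh ASID.  On MIPS the
 * real hook would be a single mtc0 write to EntryHi.  All names here
 * are made up for the sketch.
 */
struct proc_model {
	unsigned p_asid;
	bool     p_asid_bump;   /* set by e.g. a large protection change */
};

static unsigned next_asid = 1;

/* The proposed hook point: right after ast(), before returning to user. */
static void
exception_return_hook(struct proc_model *p)
{
	if (p->p_asid_bump) {
		p->p_asid = next_asid++;   /* old entries become stale */
		p->p_asid_bump = false;
		/* real MIPS code: write p->p_asid into EntryHi via mtc0 */
	}
}
```

The cost is one flag test on every return to user mode, against one ASID write when a bump is actually pending; the per-page invalidate loop disappears entirely.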
Re: Providing access to the ELF Auxillary Vector for all binaries
> Further changes could include a careful check of initial system calls
> of typical process traces. One change to adopt from FreeBSD (IIRC) is to
> include the initial seed for arc4random to save that system call etc.

Could you explain these points more carefully? What is the purpose of these checks and traces? Will they show up in .core files?

--Steve Bellovin, http://www.cs.columbia.edu/~smb
Providing access to the ELF Auxillary Vector for all binaries
Hi all,

attached is a patch to provide all NetBSD binaries with a reference to the auxiliary vector. This is a minor change for dynamically linked programs, as the usual information is already used by ld.elf_so. It is a major change for statically linked programs, as it allows the dl_iterate_phdr interface to work. This is useful for libunwind and TLS support. It works as long as the PHDR is either explicitly or implicitly part of the loaded segments. This seems to be the case on NetBSD by default, but could be enforced by modifying the linker script accordingly.

The loop to find the end of environ can be eliminated if the __ps_strings problem is fixed, as that structure contains the number of entries in environ.

Further changes could include a careful check of initial system calls of typical process traces. One change to adopt from FreeBSD (IIRC) is to include the initial seed for arc4random to save that system call etc.

Joerg

Index: src/lib/csu/alpha/crt0.c
===================================================================
--- src/lib/csu/alpha/crt0.c
+++ src/lib/csu/alpha/crt0.c
@@ -46,14 +46,19 @@
 	const Obj_Entry *obj;		/* from shared loader */
 	struct ps_strings *ps_strings;
 {
 	long argc;
 	char **argv, *namep;
+	size_t i;
 
 	argc = *(long *)sp;
 	argv = sp + 1;
 	environ = sp + 2 + argc;	/* 2: argc + NULL ending argv */
+
+	for (i = 0; environ[i] != NULL; )
+		++i;
+	__auxinfo = &environ[i + 1];
 
 	if ((namep = argv[0]) != NULL) {	/* NULL ptr if argc = 0 */
 		if ((__progname = _strrchr(namep, '/')) == NULL)
 			__progname = namep;
 		else
Index: src/lib/csu/arm_elf/crt0.c
===================================================================
--- src/lib/csu/arm_elf/crt0.c
+++ src/lib/csu/arm_elf/crt0.c
@@ -73,13 +73,19 @@
 void
 ___start(int argc, char **argv, char **envp,
     struct ps_strings *ps_strings,
     const Obj_Entry *obj,
     void (*cleanup)(void))
 {
+	size_t i;
 	char *ap;
 
-	environ = envp;
+	environ = envp;
+
+	for (i = 0; environ[i] != NULL; )
+		++i;
+	__auxinfo = &environ[i + 1];
+
 	__ps_strings = ps_strings;
 
 	if ((ap = argv[0])) {
 		if ((__progname = _strrchr(ap, '/')) == NULL)
 			__progname = ap;
Index: src/lib/csu/common/crt0-common.c
===================================================================
--- src/lib/csu/common/crt0-common.c
+++ src/lib/csu/common/crt0-common.c
@@ -64,11 +64,12 @@
 extern void _mcleanup(void);
 extern unsigned char __etext, __eprol;
 #endif /* MCRT0 */
 
 char **environ;
-struct ps_strings *__ps_strings = 0;
+void *__auxinfo;
+struct ps_strings *__ps_strings;
 
 static char empty_string[] = "";
 char *__progname = empty_string;
 
 void ___start(int, char **, char **, void (*)(void),
@@ -86,11 +87,17 @@
 ___start(int argc, char **argv, char **envp,
     void (*cleanup)(void),			/* from shared loader */
     const Obj_Entry *obj,			/* from shared loader */
     struct ps_strings *ps_strings)
 {
+	size_t i;
+
 	environ = envp;
+
+	for (i = 0; environ[i] != NULL; )
+		++i;
+	__auxinfo = &environ[i + 1];
 
 	if (argv[0] != NULL) {
 		char *c;
 		__progname = argv[0];
 		for (c = argv[0]; *c; ++c) {
Index: src/lib/csu/common_elf/common.h
===================================================================
--- src/lib/csu/common_elf/common.h
+++ src/lib/csu/common_elf/common.h
@@ -65,10 +65,11 @@
 static char *_strrchr(char *, int);
 
 char **environ;
 char *__progname = "";
+void *__auxinfo;
 struct ps_strings *__ps_strings = 0;
 
 extern void _init(void);
 extern void _fini(void);
Index: src/lib/csu/hppa/crt0.c
===================================================================
--- src/lib/csu/hppa/crt0.c
+++ src/lib/csu/hppa/crt0.c
@@ -71,14 +71,19 @@
     int dp)
 {
 	int argc;
 	char **argv;
 	int fini_plabel[2];
+	size_t i;
 
 	argc = ps_strings->ps_nargvstr;
 	argv = ps_strings->ps_argvstr;
 	environ = ps_strings->ps_envstr;
+
+	for (i = 0; environ[i] != NULL; )
+		++i;
+	__auxinfo = &environ[i + 1];
 
 	if ((__progname = argv[0]) != NULL) {	/* NULL ptr if argc = 0 */
 		if ((__progname = _strrchr(__progname, '/')) == NULL)
 			__progname = argv[0];
 		else
Index: src/lib/csu/ia64/crt0.c
===================================================================
--- src/lib/csu/ia64/crt0.c
+++ src/lib/csu/ia64/crt0.c
@@ -45,10 +45,11 @@
     const Obj_Entry *obj,		/* from shared loader */
     struct ps_strings *ps_strings)
 {
 	long argc;
 	char **argv, *namep;
+	size_t i;
 
 	__asm __volatile__ ("1:	\
 	{ .mii \
 	mov r...@gprel(1b) \n \
 	mov r16=ip ;; \n \
@@ -57,10 +58,14 @@
 	} ");
 
 	argc = *(long *)sp;
 	argv = sp + 1;
 	environ = sp + 2 + argc;	/* 2: argc + NULL ending argv */
+
+	for (i = 0; environ[i] != NULL; )
+		++i;
+	__auxinfo = &environ[i + 1];
 
 	if ((namep = argv[0]) != NULL) {	/* NULL ptr if argc = 0 */
 		if ((__progname = _strrchr(namep, '/')) == NULL)
 			__progname = namep;
 		else
Index: src/lib/csu/m68k_elf/crt0.c
==
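The crux of the patch is the same small scan in every crt0: the auxiliary vector begins right after the NULL that terminates the environment. Here is a standalone sketch of that scan, using a synthetic array in place of the real process stack; the tag values in the test are stand-ins, not real AT_* constants.

```c
#include <stddef.h>

/*
 * Sketch of the scan each crt0 gains in the patch: walk envp to its
 * terminating NULL; the first auxv entry follows immediately after.
 * The layout used in testing is a synthetic stand-in for the stack
 * the kernel actually builds.
 */
static void *
find_auxinfo(char **envp)
{
	size_t i;

	for (i = 0; envp[i] != NULL; )
		++i;
	return &envp[i + 1];    /* first AT_* entry after the NULL */
}
```

As the mail notes, this loop is pure overhead if __ps_strings already tells you the environ count; the scan is the portable fallback.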
Re: Bogus KASSERT() in LFS?
On Wed, 5 Jan 2011, Martin Husemann wrote:

> On Wed, Jan 05, 2011 at 07:35:53PM +, Eduardo Horvath wrote:
>> Really? Last time I tried (about a month or two ago) I wasn't able to
>> hang LFS. OTOH, looks like there's been quite some churn since then.
>>
>> What's your setup and what tests are you running?
>
> I run src/tests/fs/vfs/t_full with argument "lfs_fillfs", unfortunately
> gdb doesn't like me:
>
> (gdb) run lfs_fillfs
> Starting program: /usr/obj/tests/fs/vfs/t_full lfs_fillfs
> Segment size 1048576 is too large; trying smaller sizes.
> WARNING: the log-structured file system is experimental
> WARNING: it may cause system crashes and/or corrupt data
> lfs_cleanerd[5658]: /mnt: attaching cleaner
> lfs_cleanerd[5658]: /mnt: detaching cleaner
> panic: rumpuser fatal failure 11 (Resource deadlock avoided)
>
> Program received signal SIGABRT, Aborted.
> 0x42a09720 in ?? ()
> (gdb) bt
> #0  0x42a09720 in ?? ()
> #1  0x42a09720 in ?? ()
> Previous frame identical to this frame (corrupt stack?)

Hm. Interesting. I've never tried lfs on rump. I wonder if there are issues running both the filesystem and the cleaner as separate userland processes.

> On a live kernel this probably would be a "locking against myself". Have
> you tried filling lfs with a LOCKDEBUG kernel recently?

Certainly. I usually do -j4 kernel builds with LOCKDEBUG, as well as other miscellaneous stressers.

Eduardo
Re: Bogus KASSERT() in LFS?
On Wed, Jan 05, 2011 at 07:35:53PM +, Eduardo Horvath wrote:

> Really? Last time I tried (about a month or two ago) I wasn't able to
> hang LFS. OTOH, looks like there's been quite some churn since then.
>
> What's your setup and what tests are you running?

I run src/tests/fs/vfs/t_full with argument "lfs_fillfs", unfortunately gdb doesn't like me:

(gdb) run lfs_fillfs
Starting program: /usr/obj/tests/fs/vfs/t_full lfs_fillfs
Segment size 1048576 is too large; trying smaller sizes.
WARNING: the log-structured file system is experimental
WARNING: it may cause system crashes and/or corrupt data
lfs_cleanerd[5658]: /mnt: attaching cleaner
lfs_cleanerd[5658]: /mnt: detaching cleaner
panic: rumpuser fatal failure 11 (Resource deadlock avoided)

Program received signal SIGABRT, Aborted.
0x42a09720 in ?? ()
(gdb) bt
#0  0x42a09720 in ?? ()
#1  0x42a09720 in ?? ()
Previous frame identical to this frame (corrupt stack?)

On a live kernel this probably would be a "locking against myself". Have you tried filling lfs with a LOCKDEBUG kernel recently?

Martin
Re: Bogus KASSERT() in LFS?
On Wed, 5 Jan 2011, Martin Husemann wrote:

> I'll commit the original patch. Unfortunately I run into locking issues
> later, so this still does not fix the full test.

Really? Last time I tried (about a month or two ago) I wasn't able to hang LFS. OTOH, looks like there's been quite some churn since then.

What's your setup and what tests are you running?

Eduardo
Re: Bogus KASSERT() in LFS?
On Wed, Jan 05, 2011 at 06:06:17PM +0100, Martin Husemann wrote:

> Ok, I'll add a KASSERT() to check for that and re-run the tests.

Indeed, it never happens on first entry to the loop, but via the goto top later. I'll commit the original patch. Unfortunately I run into locking issues later, so this still does not fix the full test.

Martin
Re: Bogus KASSERT() in LFS?
On Wed, Jan 05, 2011 at 05:03:15PM +, Eduardo Horvath wrote:

> If by_list is set we'll always get here, and I don't think we'd be called
> if the vnode had no pages at all

Ok, I'll add a KASSERT() to check for that and re-run the tests.

Martin
Re: Bogus KASSERT() in LFS?
On Wed, 5 Jan 2011, Martin Husemann wrote:

> On Wed, Jan 05, 2011 at 04:25:09PM +, Eduardo Horvath wrote:
>> I think you're right. While I'm pretty sure that curpg won't be NULL on
>> the first iteration, I think it can be NULL on subsequent iterations.
>> I'd commit that change.
>
> It shouldn't get there on subsequent iterations if it pulled a NULL out
> of the TAILQ because it explicitly breaks out of the loop in that case.
>
> Why do you think it can't happen initially?

Looking at the code:

1845:  top:
1846:	by_list = (vp->v_uobj.uo_npages <=
1847:	    ((endoffset - startoffset) >> PAGE_SHIFT) *
1848:	    UVM_PAGE_TREE_PENALTY);
1849:	any_dirty = 0;
1850:
1851:	if (by_list) {
1852:		curpg = TAILQ_FIRST(&vp->v_uobj.memq);

If by_list is set we'll always get here, and I don't think we'd be called if the vnode had no pages at all, 'cause lfs_putpages checks for that condition and exits immediately if it's true:

2062:	 * If there are no pages, don't do anything.
2063:	 */
2064:	if (vp->v_uobj.uo_npages == 0) {
2065:		if (TAILQ_EMPTY(&vp->v_uobj.memq) &&
2066:		    (vp->v_iflag & VI_ONWORKLST) &&
2067:		    LIST_FIRST(&vp->v_dirtyblkhd) == NULL) {
2068:			vp->v_iflag &= ~VI_WRMAPDIRTY;
2069:			vn_syncer_remove_from_worklist(vp);
2070:		}
2071:		mutex_exit(&vp->v_interlock);
2072:
2073:		/* Remove us from paging queue, if we were on it */
2074:		mutex_enter(&lfs_lock);
2075:		if (ip->i_flags & IN_PAGING) {
2076:			ip->i_flags &= ~IN_PAGING;
2077:			TAILQ_REMOVE(&fs->lfs_pchainhd, ip, i_lfs_pchain);
2078:		}
2079:		mutex_exit(&lfs_lock);
2080:		return 0;
2081:	}

Later:

1853:	else {
1854:		soff = startoffset;
1855:	}
1856:	while (by_list || soff < MIN(blkeof, endoffset)) {
1857:		if (by_list) {

So if by_list is true, curpg should not be NULL on the first iteration. However

1858:			/*
1859:			 * Find the first page in a block.  Skip
1860:			 * blocks outside our area of interest or beyond
1861:			 * the end of file.
1862:			 */
1863:			KASSERT((curpg->flags & PG_MARKER) == 0);
1864:			if (pages_per_block > 1) {

At the bottom of the loop:

1972:		if (by_list) {
1973:			curpg = TAILQ_NEXT(curpg, listq.queue);
1974:		} else {
1975:			soff += MAX(PAGE_SIZE, fs->lfs_bsize);
1976:		}
1977:	}

If we got to the end of the queue, curpg could be NULL at the next iteration.

(Sigh. One of these days I need to get inspired to give LFS an overhaul. Recursion in the kernel is not good, and the code needs to be robust enough to recover from a corrupted ifile. In theory, LFS should be much more robust than FFS even with logging, since even if you lose parts of the platter you may be able to roll back to older versions of a file.)

Eduardo
Re: Bogus KASSERT() in LFS?
On Wed, Jan 05, 2011 at 04:25:09PM +, Eduardo Horvath wrote:

> I think you're right. While I'm pretty sure that curpg won't be NULL on
> the first iteration, I think it can be NULL on subsequent iterations. I'd
> commit that change.

It shouldn't get there on subsequent iterations if it pulled a NULL out of the TAILQ, because it explicitly breaks out of the loop in that case.

Why do you think it can't happen initially?

Martin
Re: Bogus KASSERT() in LFS?
On Wed, 5 Jan 2011, Martin Husemann wrote:

> Disclaimer: I know nothing about LFS, but it seems to me that there is no
> guarantee for "curpg" to not be NULL in the following code from
> src/sys/ufs/lfs/lfs_vnops.c:
>
>	while (by_list || soff < MIN(blkeof, endoffset)) {
>		if (by_list) {
>			/*
>			 * Find the first page in a block.  Skip
>			 * blocks outside our area of interest or beyond
>			 * the end of file.
>			 */
>			KASSERT(curpg == NULL ||
>			    (curpg->flags & PG_MARKER) == 0);
>
> and actually some ATF tests die for me with SIGSEGV inside the KASSERT.
> So, would this patch be ok?

Interesting.. I think you're right. While I'm pretty sure that curpg won't be NULL on the first iteration, I think it can be NULL on subsequent iterations. I'd commit that change.

Eduardo

> Index: lfs_vnops.c
> ===================================================================
> RCS file: /cvsroot/src/sys/ufs/lfs/lfs_vnops.c,v
> retrieving revision 1.233
> diff -u -r1.233 lfs_vnops.c
> --- lfs_vnops.c	2 Jan 2011 05:09:32 -	1.233
> +++ lfs_vnops.c	5 Jan 2011 15:07:00 -
> @@ -1860,7 +1860,8 @@
>  			 * blocks outside our area of interest or beyond
>  			 * the end of file.
>  			 */
> -			KASSERT((curpg->flags & PG_MARKER) == 0);
> +			KASSERT(curpg == NULL ||
> +			    (curpg->flags & PG_MARKER) == 0);
>  			if (pages_per_block > 1) {
>  				while (curpg &&
>  				    ((curpg->offset & fs->lfs_bmask) ||
Re: semantics of TRYEMULROOT
On Tue, Jan 04, 2011 at 07:04:56PM +, David Laight wrote:

> On Wed, Jan 05, 2011 at 12:01:17AM +0900, Masao Uebayashi wrote:
>> IMO emul lookup is completely a hack, and it should be redesigned
>> using per-process namespace
>
> It is a lot less of a hack than what went before.
> The only other option would be an actual (working) union mount,
> and that has much bigger problems - not to mention needing
> process specific mount tables!

Bigger problems, yes. But the only real solution!

> OTOH I do wish the 32bit/64bit compatibility had been done as part of
> the '64bit' port - so that the 64bit userspace doesn't have to rename
> anything from the 32bit version - giving you a userspace that can
> (at least in theory) boot either a 32bit or 64bit kernel.

Do you mean multilib directories?

> David
>
> --
> David Laight: da...@l8s.co.uk
Re: CVS commit: src/sys/uvm
On Wed, Jan 05, 2011 at 05:58:34AM +, YAMAMOTO Takashi wrote:

> hi,
>
> > I take silence as "no objection".
>
> the silence in this case means was-busy-for-other-things-and-forgot.
> sorry.
>
> >> I have no real code for this big picture at this moment. Making
> >> vm_physseg available as reference is the first step. This only
> >> changes uvm_page_physload() to return a pointer:
> >>
> >>	-void uvm_page_physload();
> >>	+void *uvm_page_physload();
> >>
> >> But this makes XIP pager MUCH cleaner. The reason has been explained
> >> many times.
>
> because the separate uvm_page_physload_device is no longer necessary,
> you mean? i have no problem with the step.
>
> >> Making fault handlers and pagers to use vm_physseg * + off_t is
> >> the next step, and I don't intend to work on it now. I just want
> >> to explain the big picture.
> >>
> >> > >> Keep vm_physseg * + off_t array on stack. If UVM objects uses
> >> > >> vm_page (e.g. vnode), its pager looks up vm_page -> vm_physseg *
> >> > >> + off_t *once* and cache it on stack.
> >> >
> >> > do you mean something like this?
> >> >	struct {
> >> >		vm_physseg *hoge;
> >> >		off_t fuga;
> >> >	} foo [16];
> >>
> >> Yes.
> >>
> >> Or cache vm_page * with it, like:
> >>
> >>	struct vm_id {
> >>		vm_physseg *seg;
> >>		off_t off;
> >>		vm_page *pg;
> >>	};
> >>
> >>	uvm_fault()
> >>	{
> >>		vm_id pgs[];
> >>		:
> >>	}
> >>
> >> Vnode pager (genfs_getpages) takes vm_page's by looking up
> >> vnode::v_uobj's list, or uvn_findpages().
> >>
> >> When it returns back to fault handler, we have to lookup vm_physseg
> >> for each page. Then fill the "seg" slot above (assuming we'll
> >> remove vm_page::phys_addr soon).
> >>
> >> Fault handler calls per-vm_page operations iff vm_page slot is filled.
> >> XIP pages are not pageq'ed.
>
> pgo_get returns either seg+off or pg for each vm_id slots?

vm_id[] is passed to pgo_get; each pgo_get (uvn_get, udv_get, ...) will fill seg + off and return the array back to uvm_fault. Pagers that use vm_page (e.g. vnode) will get pg from the vnode, then look up seg+off and fill them into the vm_id slot. Those that don't use vm_page (e.g. cdev) will get seg+off from the underlying objects, fill them into the vm_id slot, and return.

> >> XIP pages don't need vm_page, but
> >> cached because it's vnode.
>
> can you explain this sentence?

Sorry, it was so unclear. XIP pages don't need vm_page (as a paging state variable), because the pages are read-only and never involve paging. XIP pages have to be mapped as cacheable at the MMU level, because it's a vnode. I'm allocating the full vm_page for XIP'able pages only because chs@ wanted to support loaning, which needs vm_page::loan_count.

> >> (Just in case, have you read my paper?)
>
> which paper? i guess no.

http://uebayasi.dyndns.org/~uebayasi/tmp/xip.pdf

> YAMAMOTO Takashi

-- 
Masao Uebayashi / Tombi Inc. / Tel: +81-90-9141-4635
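To make the shape of that interface concrete, here is a compilable stub of the vm_id idea. All types below are placeholders invented for the sketch, not real UVM code, and the `_model` suffixes mark them as such.

```c
#include <stddef.h>

/*
 * Stub of the vm_id array discussed above: the fault handler owns an
 * on-stack array, and each pager's pgo_get fills in seg + off, plus
 * pg when the object uses vm_page.  All types here are placeholders,
 * not the real UVM structures.
 */
struct vm_physseg_model { int id; };
struct vm_page_model    { int flags; };

struct vm_id_model {
	struct vm_physseg_model *seg;
	long                     off;  /* offset within the segment */
	struct vm_page_model    *pg;   /* NULL for e.g. cdev/XIP pagers */
};

/* Model of a device pager's pgo_get: no vm_page, only seg + off. */
static void
udv_get_model(struct vm_physseg_model *seg, long off,
    struct vm_id_model *slot)
{
	slot->seg = seg;
	slot->off = off;
	slot->pg  = NULL;       /* no paging state needed */
}
```

The fault handler can then key its per-page work off `pg != NULL`, exactly as the quoted proposal describes.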
Bogus KASSERT() in LFS?
Disclaimer: I know nothing about LFS, but it seems to me that there is no guarantee for "curpg" to not be NULL in the following code from src/sys/ufs/lfs/lfs_vnops.c:

	while (by_list || soff < MIN(blkeof, endoffset)) {
		if (by_list) {
			/*
			 * Find the first page in a block.  Skip
			 * blocks outside our area of interest or beyond
			 * the end of file.
			 */
			KASSERT(curpg == NULL ||
			    (curpg->flags & PG_MARKER) == 0);

and actually some ATF tests die for me with SIGSEGV inside the KASSERT. So, would this patch be ok?

Index: lfs_vnops.c
===================================================================
RCS file: /cvsroot/src/sys/ufs/lfs/lfs_vnops.c,v
retrieving revision 1.233
diff -u -r1.233 lfs_vnops.c
--- lfs_vnops.c	2 Jan 2011 05:09:32 -	1.233
+++ lfs_vnops.c	5 Jan 2011 15:07:00 -
@@ -1860,7 +1860,8 @@
 			 * blocks outside our area of interest or beyond
 			 * the end of file.
 			 */
-			KASSERT((curpg->flags & PG_MARKER) == 0);
+			KASSERT(curpg == NULL ||
+			    (curpg->flags & PG_MARKER) == 0);
 			if (pages_per_block > 1) {
 				while (curpg &&
 				    ((curpg->offset & fs->lfs_bmask) ||

Martin
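The failure mode is easy to reproduce outside the kernel. This standalone <sys/queue.h> sketch (struct names invented for the example) shows why the cursor must be checked before the assertion: TAILQ_NEXT() at the tail yields NULL, which the next trip through an unguarded check would dereference.

```c
#include <stddef.h>
#include <sys/queue.h>

/*
 * Standalone illustration of the bug pattern: TAILQ_NEXT() yields
 * NULL after the last element, so any check at the top of the next
 * iteration (like the KASSERT in lfs_putpages) must guard against a
 * NULL cursor.  The types here are invented for the example.
 */
struct page_model {
	int flags;
	TAILQ_ENTRY(page_model) listq;
};
TAILQ_HEAD(pglist_model, page_model);

static int
count_unmarked(struct pglist_model *hd, int marker_flag)
{
	struct page_model *curpg;
	int n = 0;

	for (curpg = TAILQ_FIRST(hd); curpg != NULL;
	    curpg = TAILQ_NEXT(curpg, listq)) {
		/* Safe: the loop condition already excluded NULL. */
		if ((curpg->flags & marker_flag) == 0)
			n++;
	}
	return n;
}
```

The patched KASSERT achieves the same safety with short-circuit evaluation: the NULL test runs before the flags dereference.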