Re: TLB tiredown by ASID bump

2011-01-05 Thread Matt Thomas

On Jan 5, 2011, at 9:36 PM, Toru Nishimura wrote:

> Matt Thomas made a comment;
> 
>> The ASID generational stuff has a downside in that valid entries will be 
>> thrown away.  For mips (and booke) I use
>> a different algorithm which eliminates the overhead of
>> discarding all the TLB entries when you run out of ASIDs.
> 
> It's a good move to pursue efficient ASID management
> schemes since it's the key area for runtime VM/TLB activity.
> 
> Matt points out that losing valid entries is a problem when the
> ASID generation is going to get bumped.  I think, however, it'd be
> forgivable that the tbia() operation, which discards all entries
> except global or locked ones, discards "live entries", since TLB
> sizes are still small enough.  Some CPU architectures do it with a
> single special instruction; others loop at most 64 times to discard.
> The TLB is a cache for VA->PA translation and the management
> scheme always provokes "efficiency vs. correctness"
> arguments.  It's a matter of implementation tradeoff, I believe.

More systems are getting larger numbers of TLB entries.  The mpc85xx has
512 4KB fixed-size and 16 variable-sized TLB entries.  One newer MIPS64
processor has 2048 4KB fixed-size and 128 variable-sized TLB entries and
includes a page table walker.  Both of the fixed-size TLBs are set/way
organized, unlike the variable TLBs, which are fully associative.

Invalidate-all becomes expensive when your TLBs get very large, as does
discarding live entries.

Re: TLB tiredown by ASID bump

2011-01-05 Thread Matt Thomas

On Jan 5, 2011, at 9:36 PM, Toru Nishimura wrote:

> Matt Thomas made a comment;
> 
>> The ASID generational stuff has a downside in that valid entries will be 
>> thrown away.  For mips (and booke) I use
>> a different algorithm which eliminates the overhead of
>> discarding all the TLB entries when you run out of ASIDs.
> 
> It's a good move to pursue efficient ASID management
> schemes since it's the key area for runtime VM/TLB activity.
> 
> Matt points out that losing valid entries is a problem when the
> ASID generation is going to get bumped.  I think, however, it'd be
> forgivable that the tbia() operation, which discards all entries
> except global or locked ones, discards "live entries", since TLB
> sizes are still small enough.  Some CPU architectures do it with a
> single special instruction; others loop at most 64 times to discard.
> The TLB is a cache for VA->PA translation and the management
> scheme always provokes "efficiency vs. correctness"
> arguments.  It's a matter of implementation tradeoff, I believe.
> 
> BTW, how do you approach implementing a remote TLB
> shootdown?

It depends on the reason for the shootdown.  Might as well include the
comments from pmap_tlb.c here:

/*
 * Manages address spaces in a TLB.
 *
 * Normally there is a 1:1 mapping between a TLB and a CPU.  However, some
 * implementations may share a TLB between multiple CPUs (really CPU thread
 * contexts).  This requires the TLB abstraction to be separated from the
 * CPU abstraction.  It also requires that the TLB be locked while doing
 * TLB activities.
 *
 * For each TLB, we track the ASIDs in use in a bitmap and a list of pmaps
 * that have a valid ASID.
 *
 * We allocate ASIDs in increasing order until we have exhausted the supply,
 * then reinitialize the ASID space, and start allocating again at 1.  When
 * allocating from the ASID bitmap, we skip any ASID whose corresponding bit
 * is already set.  Eventually this causes the ASID bitmap to fill and, when
 * completely filled, forces a reinitialization of the ASID space.
 *
 * To reinitialize the ASID space, the ASID bitmap is reset and then the ASIDs
 * of non-kernel TLB entries get recorded in the ASID bitmap.  If the entries
 * in the TLB consume more than half of the ASID space, all ASIDs are
 * invalidated, the ASID bitmap is cleared again, and the list of pmaps is
 * emptied.  Otherwise (the normal case), any ASID present in the TLB (even
 * those no longer used by a pmap) will remain active (allocated) and all
 * other ASIDs will be freed.  If the TLB is much smaller than the ASID
 * space, this algorithm completely avoids TLB invalidation.
 *
 * For multiprocessors, we also have to deal with TLB invalidation requests
 * from other CPUs, some of which are handled by reinitializing the ASID
 * space.  Whereas above we keep the ASIDs of those pmaps which have active
 * TLB entries, this type of reinitialization preserves the ASIDs of any
 * "onproc" user pmap and frees all other ASIDs.  We must do this since we
 * can't change the current ASID.
 *
 * Each pmap has two bitmaps: pm_active and pm_onproc.  Each bit in pm_active
 * indicates whether that pmap has an allocated ASID for a CPU.  Each bit in
 * pm_onproc indicates that pmap's ASID is active (equal to the ASID in COP 0
 * register EntryHi) on a CPU.  The bit number comes from the CPU's cpu_index().
 * Even though these bitmaps contain the bits for all CPUs, the bits that
 * correspond to the CPUs sharing a TLB can only be manipulated while holding
 * that TLB's lock.  Atomic ops must be used to update them since multiple
 * CPUs may be changing different sets of bits at the same time, but these
 * sets never overlap.
 *
 * When a change to the local TLB may require a change in the TLBs of other
 * CPUs, we try to avoid sending an IPI if at all possible.  For instance, if
 * we are updating a PTE that was previously invalid and therefore couldn't
 * support an active mapping, there's no need for an IPI since there can be
 * no TLB entry to invalidate.  The other case is when we mark a PTE as
 * modified: we just update the local TLB.  If another TLB has a stale entry,
 * a TLB MOD exception will be raised and that will cause the local TLB to be
 * updated.
 *
 * We never need to update a non-local TLB if the pmap doesn't have a valid
 * ASID for that TLB.  If it does have a valid ASID but isn't currently
 * "onproc", we simply reset its ASID for that TLB; when it next goes
 * "onproc" it will allocate a new ASID and any existing TLB entries will be
 * orphaned.  Only in the case that the pmap has an "onproc" ASID do we
 * actually have to send an IPI.
 *
 * Once we have determined we must send an IPI to shoot down a TLB, we need
 * to send it to one of the CPUs that share that TLB.  We choose the lowest
 * numbered CPU that has the pmap's ASID "onproc".  In reality, any CPU
 * sharing that TLB would do, but interrupting an active CPU seems best.
 *
 * A TLB might have multiple shootdowns 
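
The avoidance rules described above can be sketched roughly as follows
(a minimal illustration with hypothetical names and plain 32-bit CPU
masks standing in for the real per-CPU bitmap machinery; not the
actual pmap_tlb.c code):

#include <stdbool.h>
#include <stdint.h>

struct pmap_sketch {
	uint32_t pm_active;	/* CPUs holding a valid ASID for this pmap */
	uint32_t pm_onproc;	/* CPUs currently running with that ASID */
};

/* tlb_cpus: mask of the CPUs sharing the TLB being considered. */
static bool
tlb_shootdown_ipi_needed(const struct pmap_sketch *pm, uint32_t tlb_cpus)
{
	if ((pm->pm_active & tlb_cpus) == 0)
		return false;	/* no valid ASID in that TLB: nothing to do */

	if ((pm->pm_onproc & tlb_cpus) == 0) {
		/*
		 * Valid ASID but not "onproc": just forget the ASID; a
		 * fresh one is allocated when the pmap next goes
		 * "onproc", orphaning any stale entries.  No IPI.
		 */
		return false;
	}

	/*
	 * The pmap is "onproc" on a CPU sharing that TLB: an IPI is
	 * required, sent to the lowest numbered such CPU.
	 */
	return true;
}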

Re: TLB tiredown by ASID bump

2011-01-05 Thread Toru Nishimura

Matt Thomas made a comment;

The ASID generational stuff has a downside in that valid
entries will be thrown away.  For mips (and booke) I use
a different algorithm which eliminates the overhead of
discarding all the TLB entries when you run out of ASIDs.


It's a good move to pursue efficient ASID management
schemes since it's the key area for runtime VM/TLB activity.

Matt points out that losing valid entries is a problem when the
ASID generation is going to get bumped.  I think, however, it'd be
forgivable that the tbia() operation, which discards all entries
except global or locked ones, discards "live entries", since TLB
sizes are still small enough.  Some CPU architectures do it with a
single special instruction; others loop at most 64 times to discard.
The TLB is a cache for VA->PA translation and the management
scheme always provokes "efficiency vs. correctness"
arguments.  It's a matter of implementation tradeoff, I believe.

BTW, how do you approach implementing a remote TLB
shootdown?

Toru Nishimura / ALKYL Technology


Re: TLB tiredown by ASID bump

2011-01-05 Thread Matt Thomas

On Jan 5, 2011, at 6:41 PM, Toru Nishimura wrote:

> There are a growing number of CPU architectures which have
> ASIDs to extend TLB VA->PA translation and improve runtime
> TLB efficiency.  ASID switching is the least-cost MMU
> operation for making multiple processes run simultaneously.
> ASIDs eliminate the need to discard the whole TLB context
> on each context switch.

The ASID generational stuff has a downside in that valid 
entries will be thrown away.  For mips (and booke) I use
a different algorithm which eliminates the overhead of
discarding all the TLB entries when you run out of ASIDs.

ASIDs are allocated from low to high.  If the allocation
bitmap shows that an ASID has already been allocated, the next
ASID is tried.  Once the allocator hits the ASID limit, the TLB
is scanned for all active ASIDs and the allocation bitmap is
updated accordingly.  Then the list of active ASIDs is scanned.
If a pmap has an ASID that still has matching entries in the
TLB (its bit is set in the allocation bitmap), that ASID
stays allocated.  If a pmap has an ASID that is no longer
in the TLB (its corresponding bit is clear), the pmap's ASID
is discarded and the pmap will get a new one on its next
activation.  If the TLB contains entries with ASIDs that don't
match any current pmap, those ASIDs are left allocated and will
not be allocated on this pass (but will eventually be reaped as
those TLB entries are overwritten with new ASIDs).

If there aren't enough free ASIDs (but that rarely, if ever,
happens), then the ASID pool is emptied and everyone gets a
fresh ASID on their next pmap activation.
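
In rough C, the allocation path looks something like this (a compressed
sketch with hypothetical names and a toy 256-entry ASID space, not the
actual mips/booke code; the "TLB consumed more than half the ASID
space, flush everything" fallback is omitted):

#include <stdint.h>
#include <string.h>

#define NASIDS	256				/* e.g. an 8-bit ASID field */

static uint32_t asid_bitmap[NASIDS / 32];	/* bit set => in use */
static unsigned next_asid = 1;			/* ASID 0 is the kernel's */

static int
asid_in_use(unsigned a)
{
	return (asid_bitmap[a / 32] >> (a % 32)) & 1;
}

static void
asid_mark(unsigned a)
{
	asid_bitmap[a / 32] |= 1u << (a % 32);
}

/* Hypothetical: scan the hardware TLB and set the bit of every ASID
 * found in a live entry. */
void tlb_record_live_asids(uint32_t *bitmap);

static unsigned
asid_alloc(void)
{
	while (next_asid < NASIDS && asid_in_use(next_asid))
		next_asid++;		/* skip ASIDs still in use */

	if (next_asid == NASIDS) {
		/* Exhausted: reinitialize the ASID space.  Pmaps whose
		 * bit comes back clear lose their ASID and re-allocate
		 * on their next activation. */
		memset(asid_bitmap, 0, sizeof(asid_bitmap));
		asid_mark(0);
		tlb_record_live_asids(asid_bitmap);
		next_asid = 1;
		while (asid_in_use(next_asid))
			next_asid++;
	}

	asid_mark(next_asid);
	return next_asid++;
}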

> I would propose here to add a new hook in the exception return
> path for every NetBSD port.  The hook point is right
> after the ast() call.  The hook is to call a TLB ASID bump op
> (writing EntryHi with a single mtc0 instruction on the MIPS
> architecture, for example).  It should be noted that the
> hook is also useful for doing an "isync operation" for a process
> which needs to invalidate the instruction cache to continue.

It's not really needed and doesn't do the right thing anyway.


TLB tiredown by ASID bump

2011-01-05 Thread Toru Nishimura

There are a growing number of CPU architectures which have
ASIDs to extend TLB VA->PA translation and improve runtime
TLB efficiency.  ASID switching is the least-cost MMU
operation for making multiple processes run simultaneously.
ASIDs eliminate the need to discard the whole TLB context
on each context switch.

Bumping the ASID is a good way to invalidate the TLB entries
which belong to a process.  By assigning a new ASID, old
stale TLB entries are effectively discarded.  This is the case
when the protection of some large portion of a process address
space is changed.  Since common TLB sizes range from 32 to 64
entries, it would be pretty inefficient and probably senseless
to repeat a large number of TLB invalidate operations, one for
every page being changed.  The TLB is small and most of the
invalidate ops end up failing to find the target entries anyway.
For this case it would be OK to schedule an ASID bump operation
to fire on the exception return path to user context.  When
fired, the user context restarts, fetching fresh TLB entries
via TLB misses just as on initial exec.

ASIDs also help at process termination.  Since the ASID,
combined with a generation number, is unique to each process,
there is no need to discard TLB entries when a process is about
to exit.  It's OK to leave stale entries in the TLB; they will
be overwritten by other processes shortly.
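
For illustration, the classic generational scheme looks roughly like
this (a sketch with hypothetical names; tlb_invalidate_all() stands in
for a tbia()-style op):

#include <stdint.h>

#define NASIDS	256		/* e.g. the 8-bit MIPS R4K ASID */

static uint32_t asid_generation = 1;
static uint32_t asid_next = 1;	/* ASID 0 reserved for the kernel */

struct proc_mmu {
	uint32_t gen;		/* generation the ASID was handed out in */
	uint32_t asid;
};

void tlb_invalidate_all(void);	/* hypothetical tbia()-style op */

/*
 * Called when a process is switched onto the CPU.  "Bumping" a
 * process's ASID is just clearing its gen so that this path hands out
 * a fresh ASID, instantly orphaning all of its old TLB entries: no
 * per-page invalidation, and nothing to do at exit.
 */
static uint32_t
asid_activate(struct proc_mmu *p)
{
	if (p->gen != asid_generation) {
		if (asid_next == NASIDS) {
			/* Space exhausted: new generation plus a
			 * whole-TLB flush. */
			asid_generation++;
			asid_next = 1;
			tlb_invalidate_all();
		}
		p->asid = asid_next++;
		p->gen = asid_generation;
	}
	return p->asid;		/* written to EntryHi on MIPS */
}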

Bumping ASIDs makes sense for multi-core SMP too: it shoots
down ("tires down") potentially stale remote TLB entries
belonging to the offending process.  It is the least-cost
TLB shootdown implementation, avoiding the heavyweight
search-and-discard TLB operation triggered by IPI.
(Vahalia's "UNIX Internals" book describes TLB consistency
bookkeeping by ASID bump at pp. 501-502.)

I would propose here to add a new hook in the exception return
path for every NetBSD port.  The hook point is right
after the ast() call.  The hook is to call a TLB ASID bump op
(writing EntryHi with a single mtc0 instruction on the MIPS
architecture, for example).  It should be noted that the
hook is also useful for doing an "isync operation" for a process
which needs to invalidate the instruction cache to continue.

Toru Nishimura / ALKYL Technology


Re: Providing access to the ELF Auxiliary Vector for all binaries

2011-01-05 Thread Steven Bellovin
> 
> Further changes could include a careful check of initial system calls
> of typical process traces. One change to adopt from FreeBSD (IIRC) is to
> include the initial seed for arc4random to save that system call etc.

Could you explain these points more carefully?  What is the purpose of these
checks and traces?  Will they show up in .core files?

--Steve Bellovin, http://www.cs.columbia.edu/~smb


Providing access to the ELF Auxiliary Vector for all binaries

2011-01-05 Thread Joerg Sonnenberger
Hi all,
attached is a patch to provide all NetBSD binaries with a reference to
the auxiliary vector.  This is a minor change for dynamically linked
programs, as the relevant information is already used by ld.elf_so.  It
is a major change for statically linked programs, as it allows the
dl_iterate_phdr interface to work.  This is useful for libunwind and
TLS support.  It works as long as the PHDR is either explicitly or
implicitly part of the loaded segments.  This seems to be the case on
NetBSD by default, but could be enforced by modifying the linker script
accordingly.

The loop to find the end of environ can be eliminated if the
__ps_strings problem is fixed, as that structure contains the number of
entries in environ.
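
For illustration, with a working __ps_strings the lookup would reduce
to pointer arithmetic like this (a sketch, assuming ps_nenvstr counts
the environment entries excluding the terminating NULL and that the
aux vector starts right after it on the stack):

#include <sys/exec.h>		/* struct ps_strings */

extern struct ps_strings *__ps_strings;
extern void *__auxinfo;

static void
set_auxinfo_from_ps_strings(void)
{
	char **env = __ps_strings->ps_envstr;

	/* +1 steps over the NULL that terminates environ. */
	__auxinfo = &env[__ps_strings->ps_nenvstr + 1];
}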

Further changes could include a careful check of initial system calls
of typical process traces. One change to adopt from FreeBSD (IIRC) is to
include the initial seed for arc4random to save that system call etc.

Joerg
Index: src/lib/csu/alpha/crt0.c
===
--- src/lib/csu/alpha/crt0.c
+++ src/lib/csu/alpha/crt0.c
@@ -46,14 +46,19 @@
 	const Obj_Entry *obj;			/* from shared loader */
 	struct ps_strings *ps_strings;
 {
 	long argc;
 	char **argv, *namep;
+	size_t i;
 
 	argc = *(long *)sp;
 	argv = sp + 1;
 	environ = sp + 2 + argc;		/* 2: argc + NULL ending argv */
+
+	for (i = 0; environ[i] != NULL; )
+		++i;
+	__auxinfo = &environ[i + 1];
 
 	if ((namep = argv[0]) != NULL) {	/* NULL ptr if argc = 0 */
 		if ((__progname = _strrchr(namep, '/')) == NULL)
 			__progname = namep;
 		else

Index: src/lib/csu/arm_elf/crt0.c
===
--- src/lib/csu/arm_elf/crt0.c
+++ src/lib/csu/arm_elf/crt0.c
@@ -73,13 +73,19 @@
 
 void
 ___start(int argc, char **argv, char **envp, struct ps_strings *ps_strings,
 	const Obj_Entry *obj, void (*cleanup)(void))
 {
+	size_t i;
 	char *ap;
 
- 	environ = envp;
+	environ = envp;
+
+	for (i = 0; environ[i] != NULL; )
+		++i;
+	__auxinfo = &environ[i + 1];
+
 	__ps_strings = ps_strings;
 
 	if ((ap = argv[0])) {
 		if ((__progname = _strrchr(ap, '/')) == NULL)
 			__progname = ap;

Index: src/lib/csu/common/crt0-common.c
===
--- src/lib/csu/common/crt0-common.c
+++ src/lib/csu/common/crt0-common.c
@@ -64,11 +64,12 @@
 extern void	_mcleanup(void);
 extern unsigned char __etext, __eprol;
 #endif /* MCRT0 */
 
 char		**environ;
-struct ps_strings *__ps_strings = 0;
+void		*__auxinfo;
+struct ps_strings *__ps_strings;
 
 static char	 empty_string[] = "";
 char		*__progname = empty_string;
 
 void		___start(int, char **, char **, void (*)(void),
@@ -86,11 +87,17 @@
 ___start(int argc, char **argv, char **envp,
 void (*cleanup)(void),			/* from shared loader */
 const Obj_Entry *obj,			/* from shared loader */
 struct ps_strings *ps_strings)
 {
+	size_t i;
+
 	environ = envp;
+
+	for (i = 0; environ[i] != NULL; )
+		++i;
+	__auxinfo = &environ[i + 1];
 
 	if (argv[0] != NULL) {
 		char *c;
 		__progname = argv[0];
 		for (c = argv[0]; *c; ++c) {

Index: src/lib/csu/common_elf/common.h
===
--- src/lib/csu/common_elf/common.h
+++ src/lib/csu/common_elf/common.h
@@ -65,10 +65,11 @@
 
 static char	*_strrchr(char *, int);
 
 char	**environ;
 char	*__progname = "";
+void	*__auxinfo;
 struct ps_strings *__ps_strings = 0;
 
 extern void	_init(void);
 extern void	_fini(void);
 

Index: src/lib/csu/hppa/crt0.c
===
--- src/lib/csu/hppa/crt0.c
+++ src/lib/csu/hppa/crt0.c
@@ -71,14 +71,19 @@
 int dp)
 {
 	int argc;
 	char **argv;
 	int fini_plabel[2];
+	size_t i;
 
 	argc = ps_strings->ps_nargvstr;
 	argv = ps_strings->ps_argvstr;
 	environ = ps_strings->ps_envstr;
+
+	for (i = 0; environ[i] != NULL; )
+		++i;
+	__auxinfo = &environ[i + 1];
 
 	if ((__progname = argv[0]) != NULL) {	/* NULL ptr if argc = 0 */
 		if ((__progname = _strrchr(__progname, '/')) == NULL)
 			__progname = argv[0];
 		else

Index: src/lib/csu/ia64/crt0.c
===
--- src/lib/csu/ia64/crt0.c
+++ src/lib/csu/ia64/crt0.c
@@ -45,10 +45,11 @@
 const Obj_Entry *obj,	/* from shared loader */
 struct ps_strings *ps_strings)
 {
 	long argc;
 	char **argv, *namep;
+	size_t i;
 
 	__asm __volatile__ ("1:		\
 { .mii	\
   mov		r...@gprel(1b)	\n	\
   mov		r16=ip	;;	\n	\
@@ -57,10 +58,14 @@
 } ");
 
 	argc = *(long *)sp;
 	argv = sp + 1;
 	environ = sp + 2 + argc;		/* 2: argc + NULL ending argv */
+
+	for (i = 0; environ[i] != NULL; )
+		++i;
+	__auxinfo = &environ[i + 1];
 
 	if ((namep = argv[0]) != NULL) {	/* NULL ptr if argc = 0 */
 		if ((__progname = _strrchr(namep, '/')) == NULL)
 			__progname = namep;
 		else

Index: src/lib/csu/m68k_elf/crt0.c
==

Re: Bogus KASSERT() in LFS?

2011-01-05 Thread Eduardo Horvath
On Wed, 5 Jan 2011, Martin Husemann wrote:

> On Wed, Jan 05, 2011 at 07:35:53PM +, Eduardo Horvath wrote:
> > Really?  Last time I tried (about a month or two ago) I wasn't able to 
> > hang LFS.  OTOH, looks like there's been quite some churn since then.  
> > 
> > What's your setup and what tests are you running?
> 
> I run src/tests/fs/vfs/t_full with argument "lfs_fillfs", unfortunately
> gdb doesn't like me:
> 
> (gdb) run lfs_fillfs
> Starting program: /usr/obj/tests/fs/vfs/t_full lfs_fillfs
> Segment size 1048576 is too large; trying smaller sizes.
> WARNING: the log-structured file system is experimental
> WARNING: it may cause system crashes and/or corrupt data
> lfs_cleanerd[5658]: /mnt: attaching cleaner
> lfs_cleanerd[5658]: /mnt: detaching cleaner
> panic: rumpuser fatal failure 11 (Resource deadlock avoided)
> 
> Program received signal SIGABRT, Aborted.
> 0x42a09720 in ?? ()
> (gdb) bt
> #0  0x42a09720 in ?? ()
> #1  0x42a09720 in ?? ()
> Previous frame identical to this frame (corrupt stack?)

Hm.  Interesting.  I've never tried lfs on rump.  I wonder if there are 
issues running both the filesystem and the cleaner as separate userland 
processes.

> On a live kernel this probably would be a "locking against myself". Have
> you tried filling lfs with a LOCKDEBUG kernel recently?

Certainly.  I usually do -j4 kernel builds with LOCKDEBUG, as well as 
other miscellaneous stressers.  

Eduardo


Re: Bogus KASSERT() in LFS?

2011-01-05 Thread Martin Husemann
On Wed, Jan 05, 2011 at 07:35:53PM +, Eduardo Horvath wrote:
> Really?  Last time I tried (about a month or two ago) I wasn't able to 
> hang LFS.  OTOH, looks like there's been quite some churn since then.  
> 
> What's your setup and what tests are you running?

I run src/tests/fs/vfs/t_full with argument "lfs_fillfs", unfortunately
gdb doesn't like me:

(gdb) run lfs_fillfs
Starting program: /usr/obj/tests/fs/vfs/t_full lfs_fillfs
Segment size 1048576 is too large; trying smaller sizes.
WARNING: the log-structured file system is experimental
WARNING: it may cause system crashes and/or corrupt data
lfs_cleanerd[5658]: /mnt: attaching cleaner
lfs_cleanerd[5658]: /mnt: detaching cleaner
panic: rumpuser fatal failure 11 (Resource deadlock avoided)

Program received signal SIGABRT, Aborted.
0x42a09720 in ?? ()
(gdb) bt
#0  0x42a09720 in ?? ()
#1  0x42a09720 in ?? ()
Previous frame identical to this frame (corrupt stack?)

On a live kernel this probably would be a "locking against myself". Have
you tried filling lfs with a LOCKDEBUG kernel recently?

Martin


Re: Bogus KASSERT() in LFS?

2011-01-05 Thread Eduardo Horvath
On Wed, 5 Jan 2011, Martin Husemann wrote:

> I'll commit the original patch. Unfortunately I run into locking issues
> later, so this still does not fix the full test.

Really?  Last time I tried (about a month or two ago) I wasn't able to 
hang LFS.  OTOH, looks like there's been quite some churn since then.  

What's your setup and what tests are you running?

Eduardo


Re: Bogus KASSERT() in LFS?

2011-01-05 Thread Martin Husemann
On Wed, Jan 05, 2011 at 06:06:17PM +0100, Martin Husemann wrote:
> Ok, I'll add a KASSERT() to check for that and re-run the tests.

Indeed, it never happens on first entry to the loop, but it does via the
goto top later.

I'll commit the original patch. Unfortunately I run into locking issues
later, so this still does not fix the full test.

Martin


Re: Bogus KASSERT() in LFS?

2011-01-05 Thread Martin Husemann
On Wed, Jan 05, 2011 at 05:03:15PM +, Eduardo Horvath wrote:
> If by_list is set we'll always get here, and I don't think we'd be called 
> if the vnode had no pages at all

Ok, I'll add a KASSERT() to check for that and re-run the tests.

Martin


Re: Bogus KASSERT() in LFS?

2011-01-05 Thread Eduardo Horvath
On Wed, 5 Jan 2011, Martin Husemann wrote:

> On Wed, Jan 05, 2011 at 04:25:09PM +, Eduardo Horvath wrote:
> > I think you're right.  While I'm pretty sure that curpg won't be NULL on 
> > the first iteration, I think it can be NULL on subsequent iterations.  I'd 
> > commit that change.
> 
> It shouldn't get there on subsequent iterations if it pulled a NULL out
> of the TAILQ because it explicitly breaks out of the loop in that case.
> 
> Why do you think it can't happen initially?

Looking at the code:

1845:  top:
1846:	by_list = (vp->v_uobj.uo_npages <=
1847:	    ((endoffset - startoffset) >> PAGE_SHIFT) *
1848:	    UVM_PAGE_TREE_PENALTY);
1849:	any_dirty = 0;
1850:
1851:	if (by_list) {
1852:		curpg = TAILQ_FIRST(&vp->v_uobj.memq);

If by_list is set we'll always get here, and I don't think we'd be called 
if the vnode had no pages at all, 'cause lfs_putpages checks for that 
condition and exits immediately if it's true:

2062:	 * If there are no pages, don't do anything.
2063:	 */
2064:	if (vp->v_uobj.uo_npages == 0) {
2065:		if (TAILQ_EMPTY(&vp->v_uobj.memq) &&
2066:		    (vp->v_iflag & VI_ONWORKLST) &&
2067:		    LIST_FIRST(&vp->v_dirtyblkhd) == NULL) {
2068:			vp->v_iflag &= ~VI_WRMAPDIRTY;
2069:			vn_syncer_remove_from_worklist(vp);
2070:		}
2071:		mutex_exit(&vp->v_interlock);
2072:
2073:		/* Remove us from paging queue, if we were on it */
2074:		mutex_enter(&lfs_lock);
2075:		if (ip->i_flags & IN_PAGING) {
2076:			ip->i_flags &= ~IN_PAGING;
2077:			TAILQ_REMOVE(&fs->lfs_pchainhd, ip, i_lfs_pchain);
2078:		}
2079:		mutex_exit(&lfs_lock);
2080:		return 0;
2081:	}


Later:

1853:	} else {
1854:		soff = startoffset;
1855:	}
1856:	while (by_list || soff < MIN(blkeof, endoffset)) {
1857:		if (by_list) {

So if by_list is true, curpg should not be NULL on the first iteration.
However:

1858:			/*
1859:			 * Find the first page in a block.  Skip
1860:			 * blocks outside our area of interest or beyond
1861:			 * the end of file.
1862:			 */
1863:			KASSERT((curpg->flags & PG_MARKER) == 0);
1864:			if (pages_per_block > 1) {

At the bottom of the loop:

1972:		if (by_list) {
1973:			curpg = TAILQ_NEXT(curpg, listq.queue);
1974:		} else {
1975:			soff += MAX(PAGE_SIZE, fs->lfs_bsize);
1976:		}
1977:	}

If we got to the end of the queue, curpg could be NULL at the next 
iteration.

(Sigh.  One of these days I need to get inspired to give LFS an overhaul.
Recursion in the kernel is not good, and the code needs to be robust
enough to recover from a corrupted ifile.  In theory, LFS should be much
more robust than FFS even with logging, since even if you lose parts of
the platter you may be able to roll back to older versions of a file.)

Eduardo 


Re: Bogus KASSERT() in LFS?

2011-01-05 Thread Martin Husemann
On Wed, Jan 05, 2011 at 04:25:09PM +, Eduardo Horvath wrote:
> I think you're right.  While I'm pretty sure that curpg won't be NULL on 
> the first iteration, I think it can be NULL on subsequent iterations.  I'd 
> commit that change.

It shouldn't get there on subsequent iterations if it pulled a NULL out
of the TAILQ because it explicitly breaks out of the loop in that case.

Why do you think it can't happen initially?

Martin


Re: Bogus KASSERT() in LFS?

2011-01-05 Thread Eduardo Horvath
On Wed, 5 Jan 2011, Martin Husemann wrote:

> Disclaimer: I know nothing about LFS, but it seems to me that there is no
> guarantee for "curpg" to not be NULL in the following code from
> src/sys/ufs/lfs/lfs_vnops.c:
> 
> while (by_list || soff < MIN(blkeof, endoffset)) {
> if (by_list) {
> /*
>  * Find the first page in a block.  Skip
>  * blocks outside our area of interest or beyond
>  * the end of file.
>  */
> KASSERT(curpg == NULL ||
> (curpg->flags & PG_MARKER) == 0);
> 
> 
> and actually some ATF tests die for me with SIGSEGV inside the KASSERT.
> So, would this patch be ok?

Interesting..

I think you're right.  While I'm pretty sure that curpg won't be NULL on 
the first iteration, I think it can be NULL on subsequent iterations.  I'd 
commit that change.

Eduardo

> 
> Index: lfs_vnops.c
> ===
> RCS file: /cvsroot/src/sys/ufs/lfs/lfs_vnops.c,v
> retrieving revision 1.233
> diff -u -r1.233 lfs_vnops.c
> --- lfs_vnops.c   2 Jan 2011 05:09:32 -   1.233
> +++ lfs_vnops.c   5 Jan 2011 15:07:00 -
> @@ -1860,7 +1860,8 @@
>* blocks outside our area of interest or beyond
>* the end of file.
>*/
> - KASSERT((curpg->flags & PG_MARKER) == 0);
> + KASSERT(curpg == NULL ||
> + (curpg->flags & PG_MARKER) == 0);
>   if (pages_per_block > 1) {
>   while (curpg &&
>   ((curpg->offset & fs->lfs_bmask) ||
> 


Re: semantics of TRYEMULROOT

2011-01-05 Thread Masao Uebayashi
On Tue, Jan 04, 2011 at 07:04:56PM +, David Laight wrote:
> On Wed, Jan 05, 2011 at 12:01:17AM +0900, Masao Uebayashi wrote:
> > IMO emul lookup is completely a hack, and it should be redesigned
> > using per-process namespace
> 
> It is a lot less of a hack than what went before.
> The only other option would be an actual (working) union mount,
> and that has much bigger problems - not to mention needing
> process specific mount tables!

Bigger problems, yes.  But the only real solution!

> 
> OTOH I do wish the 32bit/64bit compatibility had been done as part of
> the '64bit' port - so that the 64bit userspace doesn't have to rename
> anything from the 32bit version - giving you a userspace that can
> (at least in theory) boot either a 32bit or 64bit kernel.

Do you mean multilib directories?

> 
>   David
> 
> -- 
> David Laight: da...@l8s.co.uk


Re: CVS commit: src/sys/uvm

2011-01-05 Thread Masao Uebayashi
On Wed, Jan 05, 2011 at 05:58:34AM +, YAMAMOTO Takashi wrote:
> hi,
> 
> > I take silence as "no objection".
> 
> the silence in this case means was-busy-for-other-things-and-forgot.
> sorry.
> 
> >> I have no real code for this big picture at this moment.  Making
> >> vm_physseg available as reference is the first step.  This only
> >> changes uvm_page_physload() to return a pointer:
> >> 
> >>-void uvm_page_physload();
> >>+void *uvm_page_physload();
> >> 
> >> But this makes XIP pager MUCH cleaner.  The reason has been explained
> >> many times.
> 
> because the separate uvm_page_physload_device is no longer necessary,
> you mean?  i have no problem with the step.
> 
> >> Making fault handlers and pagers to use vm_physseg * + off_t is
> >> the next step, and I don't intend to work on it now.  I just want
> >> to explain the big picture.
> >> 
> >> > 
> >> > >> 
> >> Keep vm_physseg * + off_t array on stack.  If a UVM object uses
> >> vm_page (e.g. vnode), its pager looks up vm_page -> vm_physseg *
> >> + off_t *once* and caches it on stack.
> >> > 
> >> > do you mean something like this?
> >> >  struct {
> >> >  vm_physseg *hoge;
> >> >  off_t fuga;
> >> >  } foo [16];
> >> 
> >> Yes.
> >> 
> >> Or cache vm_page * with it, like:
> >> 
> >>struct vm_id {
> >>vm_physseg *seg;
> >>off_t off;
> >>vm_page *pg;
> >>};
> >> 
> >>uvm_fault()
> >>{
> >>vm_id pgs[];
> >>:
> >>}
> >> 
> >> Vnode pager (genfs_getpages) takes vm_page's by looking up
> >> vnode::v_uobj's list, or uvn_findpages().
> >> 
> >> When it returns back to fault handler, we have to lookup vm_physseg
> >> for each page.  Then fill the "seg" slot above (assuming we'll
> >> remove vm_page::phys_addr soon).
> >> 
> >> Fault handler calls per-vm_page operations iff vm_page slot is filled.
> >> XIP pages are not pageq'ed.
> 
> pgo_get returns either seg+off or pg for each vm_id slots?

vm_id[] is passed to pgo_get; each pgo_get (uvn_get, udv_get, ...)
will fill in seg + off and return the array back to uvm_fault.

Pagers that use vm_page (e.g. vnode) will get pg from the vnode, then
look up seg+off and fill them into the vm_id slot.

Those that don't use vm_page (e.g. cdev) will get seg+off from
underlying objects, fill them into the vm_id slot, and return.
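
For illustration, the two flavors might look like this (a sketch using
only the proposal's names; none of these are existing NetBSD
interfaces):

#include <stddef.h>
#include <sys/types.h>

struct vm_physseg;		/* opaque here */
struct vm_page;

struct vm_id {
	struct vm_physseg *seg;
	off_t off;
	struct vm_page *pg;	/* NULL when the pager has no vm_page */
};

/* Hypothetical lookups standing in for the real ones. */
struct vm_page *uvn_findpage_stub(void *uobj, off_t voff);
struct vm_physseg *physseg_of_page_stub(struct vm_page *, off_t *);
struct vm_physseg *physseg_of_cdev_stub(void *dev, off_t voff, off_t *);

/* vnode-style pgo_get: find the vm_page, then cache its seg+off. */
static void
uvn_get_sketch(void *uobj, off_t voff, struct vm_id *slot)
{
	slot->pg = uvn_findpage_stub(uobj, voff);
	slot->seg = physseg_of_page_stub(slot->pg, &slot->off);
}

/* cdev-style pgo_get: no vm_page exists; fill seg+off directly, so the
 * fault handler skips per-page operations (e.g. pageq handling). */
static void
udv_get_sketch(void *dev, off_t voff, struct vm_id *slot)
{
	slot->pg = NULL;
	slot->seg = physseg_of_cdev_stub(dev, voff, &slot->off);
}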

> 
> >> XIP pages don't need vm_page, but
> >> cached because it's vnode.
> 
> can you explain this sentence?

Sorry, it was so unclear.

XIP pages don't need vm_page (as a paging state variable),
because the pages are read-only and never involve paging.

XIP pages have to be mapped as cacheable at the MMU level,
because they are vnode pages.

I'm allocating the full vm_page for XIP'able pages, only because
chs@ wanted to support loaning, which needs vm_page::loan_count.

> 
> >> (Just in case, have you read my paper?)
> 
> which paper?  i guess no.

http://uebayasi.dyndns.org/~uebayasi/tmp/xip.pdf

> 
> YAMAMOTO Takashi

-- 
Masao Uebayashi / Tombi Inc. / Tel: +81-90-9141-4635


Bogus KASSERT() in LFS?

2011-01-05 Thread Martin Husemann
Disclaimer: I know nothing about LFS, but it seems to me that there is no
guarantee for "curpg" to not be NULL in the following code from
src/sys/ufs/lfs/lfs_vnops.c:

while (by_list || soff < MIN(blkeof, endoffset)) {
if (by_list) {
/*
 * Find the first page in a block.  Skip
 * blocks outside our area of interest or beyond
 * the end of file.
 */
KASSERT(curpg == NULL ||
(curpg->flags & PG_MARKER) == 0);


and actually some ATF tests die for me with SIGSEGV inside the KASSERT.
So, would this patch be ok?

Index: lfs_vnops.c
===
RCS file: /cvsroot/src/sys/ufs/lfs/lfs_vnops.c,v
retrieving revision 1.233
diff -u -r1.233 lfs_vnops.c
--- lfs_vnops.c 2 Jan 2011 05:09:32 -   1.233
+++ lfs_vnops.c 5 Jan 2011 15:07:00 -
@@ -1860,7 +1860,8 @@
 * blocks outside our area of interest or beyond
 * the end of file.
 */
-   KASSERT((curpg->flags & PG_MARKER) == 0);
+   KASSERT(curpg == NULL ||
+   (curpg->flags & PG_MARKER) == 0);
if (pages_per_block > 1) {
while (curpg &&
((curpg->offset & fs->lfs_bmask) ||



Martin