Re: [possible race in ext2] Re: how to write get_block?

1999-10-09 Thread Ingo Molnar


On Fri, 8 Oct 1999, Alexander Viro wrote:

  yep we knew about this problem ... it's not quite an easy hack though. 
  Putting the directory block cache (and symlink block cache) into the page
  cache would be the preferred method - this would also clean up the code
  a lot, I think.
 
 More or less random comments/questions:
 
 1) Is ext2_bread() vs. bread() separation in fs/ext2/* a first step to
 that? (I.e. data blocks vs. inode/bitmap/indirect ones)
 2) Some time ago there was a pretty impressive patch skipping a large part
 of ext2_find_entry() (it caches the offset in the dentry and invalidates it
 if needed), and I have code for cyclic lookups in a directory (a cheap
 hack, but it really speeds things up for ls -l and friends). Both
 would go nicely into this scheme.
 3) We ought to switch from (bh+pointer) to (page+pointer) in namei.c.
 4) I still think that indirect blocks could go into the page cache - that
 would further simplify truncate(). OTOH the method used by BSD (an indirect
 block covering addresses from n to n+block_size*pointers_per_block gets
 the address -n, a double indirect block covering the area from n to n+...
 gets -n-block_size, a triple indirect one gets -n-2*block_size) will be
 wasteful when the block size is smaller than the page size.

yes, all these cleanups would be very nice to have. I'm not aware of any
rathole in that area - even those small steps we took in early 2.3 proved
how much cleaner things could get.

The memory usage point on low-memory machines - I don't think it holds.
Small directories (well, even larger ones ...) will use 4k, but ever since
the 2.0 kernel moved to the page cache, small files have used up 4k as well,
and there are more small files than small directories. In the case of
indirect pointers, the 2.3 page cache only needs the indirect pointer once;
after that the cache can deallocate it if there is memory pressure -
while 2.2 needed the indirect pointer for every write(). So I suspect 2.4
might even do better on small-memory boxes than 2.2.

 5) Who is working on that? I would try to do it myself, but if there already
 is some code, I'd certainly like to look at it.

nobody AFAIK.

-- mingo



Re: [possible race in ext2] Re: how to write get_block?

1999-10-08 Thread Alexander Viro



On Fri, 8 Oct 1999, Ingo Molnar wrote:

 
 On Fri, 8 Oct 1999, Alexander Viro wrote:
 
  Stephen, Ingo, could you look at the stuff above? Methinks it means that we
  must either separate ext2_truncate() for directories (doing bforget() on
  the data blocks) _or_ put the directory blocks into the page cache and do
  block_flush(). I'd much prefer the latter. Moreover, we can stuff the
  indirect blocks into the page cache too (using negative offsets). That
  would leave us with pure buffer cache only for static structures.
 
 yep we knew about this problem ... it's not quite an easy hack though. 
 Putting the directory block cache (and symlink block cache) into the page
 cache would be the preferred method - this would also clean up the code
 a lot, I think.

More or less random comments/questions:

1) Is ext2_bread() vs. bread() separation in fs/ext2/* a first step to
that? (I.e. data blocks vs. inode/bitmap/indirect ones)
2) Some time ago there was a pretty impressive patch skipping a large part
of ext2_find_entry() (it caches the offset in the dentry and invalidates it
if needed), and I have code for cyclic lookups in a directory (a cheap
hack, but it really speeds things up for ls -l and friends). Both
would go nicely into this scheme.
3) We ought to switch from (bh+pointer) to (page+pointer) in namei.c.
4) I still think that indirect blocks could go into the page cache - that
would further simplify truncate(). OTOH the method used by BSD (an indirect
block covering addresses from n to n+block_size*pointers_per_block gets
the address -n, a double indirect block covering the area from n to n+...
gets -n-block_size, a triple indirect one gets -n-2*block_size) will be
wasteful when the block size is smaller than the page size.
5) Who is working on that? I would try to do it myself, but if there already
is some code, I'd certainly like to look at it.

Comments?
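
For illustration, the BSD-style numbering from point 4 could be sketched roughly
as below. This is only a sketch under stated assumptions, not code from any
kernel tree: the function names indirect_key() and span() are hypothetical,
block pointers are assumed to be 32 bits wide, and the inode is assumed to hold
12 direct pointers followed by one single, one double and one triple indirect
pointer. Offsets are in bytes, and a return value of 0 means "no such block".

```c
#include <assert.h>
#include <stdint.h>

#define NDIRECT 12  /* assumed number of direct block pointers in the inode */

/* Bytes of file data covered by one indirect block of the given level. */
static int64_t span(int level, int64_t bs, int64_t ppb)
{
    int64_t s = bs;
    for (int i = 0; i < level; i++)
        s *= ppb;
    return s;
}

/*
 * Hypothetical helper: negative cache offset of the level-1/2/3 indirect
 * block that covers file byte offset pos, following the scheme in the mail:
 *   single indirect covering data from n -> -n
 *   double indirect covering data from n -> -n - block_size
 *   triple indirect covering data from n -> -n - 2*block_size
 * Returns 0 if pos is not covered by an indirect block of that level.
 */
static int64_t indirect_key(int64_t pos, int level, int64_t bs)
{
    int64_t ppb = bs / 4;          /* assumes 32-bit block pointers */
    int64_t start = NDIRECT * bs;  /* first byte past the direct blocks */

    assert(level >= 1 && level <= 3);

    /* Find the tree (single, double or triple) that contains pos. */
    for (int depth = 1; depth <= 3; depth++) {
        int64_t tree = span(depth, bs, ppb);
        if (pos < start + tree) {
            if (pos < start || level > depth)
                return 0;          /* direct block, or tree is too shallow */
            int64_t s = span(level, bs, ppb);
            int64_t n = start + (pos - start) / s * s;
            return -n - (int64_t)(level - 1) * bs;
        }
        start += tree;
    }
    return 0;                      /* beyond the triple indirect tree */
}
```

With 1k blocks, indirect_key(12288, 1, 1024) gives -12288 and
indirect_key(274432, 2, 1024) gives -275456. Neighbouring single indirect
blocks end up 256k apart in this numbering, so, if I read the remark about
waste correctly, each 1k indirect block would sit alone in its own 4k page.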