Re: [PATCH] reiserfs: on-demand bitmap loading (testing only)

2005-07-09 Thread Hans Reiser
Jeff Mahoney wrote:

>
>
> The results showed that delayed loading was only slightly faster than
> waiting on the buffers. The reason for this is that the majority of the
> time is actually spent issuing the block read requests, not actually
> waiting for their results. 

Can you define "spent issuing"?  Perhaps this is simply a bad choice of
block device congestion configuration, and changing it would fix
things.  Because device congestion is based on NUMBER of requests, not
their size, bitmap reading would congest things more than file IO.

I suspect the block device congestion limits need serious review, and
not just because of bitmaps.

> The amount of time waiting on the blocks
> appears not to change radically, though the amount of time issuing the
> read requests does.
>
> Here are the actual numbers from the test runs. Between each mount
> attempt, I attempted to clear the system caches by allocating and
> writing to all the memory on the system, as well as the disk caches by
> reading 50 MB from disk. I performed the tests with four block sizes in
> order to increase the number of bitmap blocks that need to be loaded at
> mount time. Note that each decrease in block size increases the number
> of bitmaps fourfold. This is because when the block size is halved, it
> not only doubles the number of blocks, but also halves the capacity of
> each bitmap block.
>
> 4k block size:                    2k block size:
> 10036464 blocks,                  20072928 blocks,
> 307 bitmaps (~= 39 GB)            1226 bitmaps (~= 153 GB @ 4k)
> -opin_bitmaps                     -opin_bitmaps
> sb_getblk loop: 0.0s              sb_getblk loop: 0.3999643s
> ll_rw_block: 1.435871744s         ll_rw_block: 8.143272619s
> wait_on_buffer: 0.513519144s      wait_on_buffer: 1.990925198s
> real 0m4.551s                     real 0m10.906s
> user 0m0.000s                     user 0m0.000s
> sys  0m0.060s                     sys  0m0.028s
>
> -opin_bitmaps,delayed_bitmaps     -opin_bitmaps,delayed_bitmaps
> sb_getblk loop: 0.0s              sb_getblk loop: 0.3999643s
> ll_rw_block: 1.443871029s         ll_rw_block: 8.97839s
> real 0m2.128s                     real 0m8.630s
> user 0m0.000s                     user 0m0.000s
> sys  0m0.016s                     sys  0m0.020s
>
> -odyn_bitmaps                     -odyn_bitmaps
> real 0m0.626s                     real 0m0.850s
> user 0m0.000s                     user 0m0.000s
> sys  0m0.008s                     sys  0m0.016s
>
> 1k block size:                    512b block size:
> 40145856 blocks,                  80291712 blocks,
> 4901 bitmaps (~= 612 GB @ 4k)     19603 bitmaps (~= 2.4 [EMAIL PROTECTED])
> -opin_bitmaps                     -opin_bitmaps
> sb_getblk loop: 0.19998214s       sb_getblk loop: 0.95991426s
> ll_rw_block: 33.727900516s        ll_rw_block: 110.98165711s
> wait_on_buffer: 1.423872816s      wait_on_buffer: 0.749324905s
> real 0m36.052s                    real 1m51.423s
> user 0m0.000s                     user 0m0.000s
> sys  0m0.124s                     sys  0m0.256s
>
> -opin_bitmaps,delayed_bitmaps     -opin_bitmaps,delayed_bitmaps
> sb_getblk loop: 0.23997856s       sb_getblk loop: 0.95991426s
> ll_rw_block: 33.644994731s        ll_rw_block: 109.427893721s
> real 0m34.562s                    real 1m50.693s
> user 0m0.004s                     user 0m0.004s
> sys  0m0.060s                     sys  0m0.232s
>
> -odyn_bitmaps                     -odyn_bitmaps
> real 0m0.516s                     real 0m0.601s
> user 0m0.000s                     user 0m0.000s
> sys  0m0.004s                     sys  0m0.000s
>
> I will post runtime results of each case early next week.
>
> -Jeff
>
> --
> Jeff Mahoney
> SuSE Labs



Re: [PATCH] reiserfs: on-demand bitmap loading (testing only)

2005-07-08 Thread Jeff Mahoney

David Masover wrote:
> Pierre Etchemaïté wrote:
>> On Thu, 7 Jul 2005 13:59:35 -0400, studdugie <[EMAIL PROTECTED]>
>> wrote:
>>
>>
>>> I agree w/ Jeff 100%. I'm not a kernel hacker, simply a user. As a
>>> matter of fact, I was one of those people that Jeff alluded to when he
>>> said: "There have been reports of large filesystems taking an
>>> unacceptably long time to mount."
>>
>>
>> That also makes reiserfs uncomfortable with automount devices, especially
>> if they're bandwidth limited like external USB or firewire disks...
> 
> USB and firewire disks already take a little long to mount anyway.
> 
> But, it is definitely a performance enhancement, or at least a tweak.
> I'd like to see it happen -- it takes 10-15 seconds to mount my 200 gig
> Reiser4 partition, which is unacceptable for a desktop machine -- at
> least, for a *linux* desktop machine.
> 
> To keep Hans happy about the "default case", can we load the bitmap in
> the background during boot/mount?  Basically, if it's loaded on demand,
> then we pretend to demand each part of it, one by one.  Would that
> considerably slow normal FS operation?  Could we defer it to when the
> disk is idle?  (*disk*, not FS)

I believe there are two possible methods of delayed loading. The first
is to issue all the bitmap read requests on mount and then when we need
that particular bitmap later we can wait on it. The second is to issue
the read request the first time it's needed, and don't let go of it. For
the sake of exploring other options, I decided to implement the first
one. The results were surprising.
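
The two methods described above can be contrasted with a small user-space sketch. This is not the patch's code: `read_bitmap`, `DelayedBitmaps` and `OnDemandBitmaps` are hypothetical names, and a thread pool stands in for asynchronous block reads.

```python
from concurrent.futures import ThreadPoolExecutor


def read_bitmap(n):
    """Hypothetical stand-in for reading bitmap block n from disk."""
    return bytes([n % 256]) * 8


class DelayedBitmaps:
    """Method 1: issue every read at mount, wait only when a bitmap is used."""

    def __init__(self, count, pool):
        # All reads are submitted up front; nothing waits on them yet.
        self._pending = [pool.submit(read_bitmap, n) for n in range(count)]

    def get(self, n):
        # Blocks until the read for this particular bitmap has completed.
        return self._pending[n].result()


class OnDemandBitmaps:
    """Method 2: issue the read on first use, then hold on to the result."""

    def __init__(self, count):
        self.count = count
        self._loaded = {}
        self.reads_issued = 0

    def get(self, n):
        if n not in self._loaded:
            self.reads_issued += 1
            self._loaded[n] = read_bitmap(n)
        return self._loaded[n]


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=2) as pool:
        delayed = DelayedBitmaps(4, pool)
        print(delayed.get(3))
    on_demand = OnDemandBitmaps(4)
    on_demand.get(1)
    on_demand.get(1)
    print(on_demand.reads_issued)
```

With the first method the cost of issuing every request is still paid at mount; with the second it is paid only for the bitmaps that are actually touched.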

The disk I've been testing on lately is a 40 GB ATA/100 disk mounted in
a USB2 enclosure. I tested with a range of block sizes so that the
number of bitmaps would increase without needing a larger disk. I
realize the results won't be identical to filesystems that large, but
it's the best I can do with my storage constraints. Realistically, the
times will be even longer on the larger filesystems.

The results showed that delayed loading was only slightly faster than
waiting on the buffers. The reason for this is that the majority of the
time is actually spent issuing the block read requests, not actually
waiting for their results. The amount of time waiting on the blocks
appears not to change radically, though the amount of time issuing the
read requests does.
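
As a rough check of that claim, the share of instrumented time spent issuing reads can be computed from the -opin_bitmaps figures reported in this message. The helper name `issue_share` is made up for illustration, and the three instrumented phases are assumed to account for all of the bitmap-loading time.

```python
# (sb_getblk loop, ll_rw_block, wait_on_buffer) in seconds,
# copied from the -opin_bitmaps runs in this message.
PIN_BITMAPS_TIMINGS = {
    "4k":   (0.0,        1.435871744,  0.513519144),
    "2k":   (0.3999643,  8.143272619,  1.990925198),
    "1k":   (0.19998214, 33.727900516, 1.423872816),
    "512b": (0.95991426, 110.98165711, 0.749324905),
}


def issue_share(getblk, issue, wait):
    """Fraction of the instrumented time spent in ll_rw_block (issuing reads)."""
    return issue / (getblk + issue + wait)


if __name__ == "__main__":
    for block_size, timings in PIN_BITMAPS_TIMINGS.items():
        print(f"{block_size:>5}: {issue_share(*timings):.1%} of time issuing reads")
```

The share grows with the number of bitmaps, while the wait_on_buffer figure stays within a couple of seconds in every run.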

Here are the actual numbers from the test runs. Between each mount
attempt, I attempted to clear the system caches by allocating and
writing to all the memory on the system, as well as the disk caches by
reading 50 MB from disk. I performed the tests with four block sizes in
order to increase the number of bitmap blocks that need to be loaded at
mount time. Note that each decrease in block size increases the number
of bitmaps fourfold. This is because when the block size is halved, it
not only doubles the number of blocks, but also halves the capacity of
each bitmap block.
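
The fourfold growth can be checked with a few lines of arithmetic. This sketch assumes one bit per block and that every bit of a bitmap block is used, so one bitmap block describes block_size * 8 blocks; `bitmap_blocks` is an illustrative name, and the block counts are the ones from the test runs in this message.

```python
def bitmap_blocks(total_blocks, block_size):
    """Number of bitmap blocks needed to describe total_blocks blocks.

    Assumes one bit per block, so a bitmap block of block_size bytes
    covers block_size * 8 blocks.
    """
    blocks_per_bitmap = block_size * 8
    return -(-total_blocks // blocks_per_bitmap)  # ceiling division


if __name__ == "__main__":
    # (block size in bytes, block count) for the same 40 GB disk
    runs = [(4096, 10036464), (2048, 20072928), (1024, 40145856), (512, 80291712)]
    for block_size, total_blocks in runs:
        print(block_size, bitmap_blocks(total_blocks, block_size))
```

Halving the block size doubles the numerator and halves the denominator, which is where the factor of four comes from.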

4k block size:                    2k block size:
10036464 blocks,                  20072928 blocks,
307 bitmaps (~= 39 GB)            1226 bitmaps (~= 153 GB @ 4k)
-opin_bitmaps                     -opin_bitmaps
sb_getblk loop: 0.0s              sb_getblk loop: 0.3999643s
ll_rw_block: 1.435871744s         ll_rw_block: 8.143272619s
wait_on_buffer: 0.513519144s      wait_on_buffer: 1.990925198s
real 0m4.551s                     real 0m10.906s
user 0m0.000s                     user 0m0.000s
sys  0m0.060s                     sys  0m0.028s

-opin_bitmaps,delayed_bitmaps     -opin_bitmaps,delayed_bitmaps
sb_getblk loop: 0.0s              sb_getblk loop: 0.3999643s
ll_rw_block: 1.443871029s         ll_rw_block: 8.97839s
real 0m2.128s                     real 0m8.630s
user 0m0.000s                     user 0m0.000s
sys  0m0.016s                     sys  0m0.020s

-odyn_bitmaps                     -odyn_bitmaps
real 0m0.626s                     real 0m0.850s
user 0m0.000s                     user 0m0.000s
sys  0m0.008s                     sys  0m0.016s

1k block size:                    512b block size:
40145856 blocks,                  80291712 blocks,
4901 bitmaps (~= 612 GB @ 4k)     19603 bitmaps (~= 2.4 [EMAIL PROTECTED])
-opin_bitmaps                     -opin_bitmaps
sb_getblk loop: 0.19998214s       sb_getblk loop: 0.95991426s
ll_rw_block: 33.727900516s        ll_rw_block: 110.98165711s
wait_on_buffer: 1.423872816s      wait_on_buffer: 0.749324905s
real 0m36.052s                    real 1m51.423s
user 0m0.000s                     user 0m0.000s
sys  0m0.124s                     sys  0m0.256s

-opin_bitmaps,delayed_bitmaps     -opin_bitmaps,delayed_bitmaps
sb_getblk loop: 0.23997856s       sb_getblk loop: 0.95991426s
ll_rw_block: 33.644994731

Re: [PATCH] reiserfs: on-demand bitmap loading (testing only)

2005-07-08 Thread David Masover

Jeffrey Mahoney wrote:
> David Masover wrote:
>> Pierre Etchemaïté wrote:
>>> On Thu, 7 Jul 2005 13:59:35 -0400, studdugie <[EMAIL PROTECTED]>
>>> wrote:
>>>
>>>> I agree w/ Jeff 100%. I'm not a kernel hacker, simply a user. As a
>>>> matter of fact, I was one of those people that Jeff alluded to when he
>>>> said: "There have been reports of large filesystems taking an
>>>> unacceptably long time to mount."
>>>
>>> That also makes reiserfs uncomfortable with automount devices, especially
>>> if they're bandwidth limited like external USB or firewire disks...
>>
>> USB and firewire disks already take a little long to mount anyway.
>>
>> But, it is definitely a performance enhancement, or at least a tweak.
>> I'd like to see it happen -- it takes 10-15 seconds to mount my 200 gig
>> Reiser4 partition, which is unacceptable for a desktop machine -- at
>> least, for a *linux* desktop machine.
>>
>> To keep Hans happy about the "default case", can we load the bitmap in
>> the background during boot/mount?  Basically, if it's loaded on demand,
>> then we pretend to demand each part of it, one by one.  Would that
>> considerably slow normal FS operation?  Could we defer it to when the
>> disk is idle?  (*disk*, not FS)
>
> Hi David -
>
> The main issue I have with this is that I don't think bitmaps should be
> treated specially. They are metadata, pure and simple.


This is the philosophical reason why I like on-demand loading.  There's 
also the immediate speedup of boot.


But, I think Hans has a point that it may be better for performance to 
pre-cache them.  I would rather the default behavior be to load them on 
demand, but I can see situations where people would choose to pre-cache 
them, or even (as we do now) force them to stay in kernel memory as long 
as the FS is mounted.  But that's not a sane default.


Big hard drives on desktop machines are getting more and more common, 
and even the lowly Linux gamer doesn't want to waste the RAM he bought 
for Doom 3 on a 200 gig filesystem when he's only using 1.5 gigs of it 
at the moment.


I would guess that this is a far more common scenario than massive
2TB arrays where people can throw money (RAM) at the system to make it
faster, in any way they can.  But even if it's not, people with 2TB
arrays are much more likely to discover the precaching feature and turn
it on than gamers / desktop users would be to discover it and turn it off.


And besides, for the average desktop machine, it's latency that matters 
more than throughput.  The most noticeable latency that we can optimize 
for is boot time, the next most noticeable is launching a new app / 
changing apps.  For the average desktop user, it doesn't matter if it 
takes an extra half second to load a chunk of the bitmap in order to 
load apps, and it certainly doesn't matter if it takes an extra tenth of 
a second to load a file, but it does matter if RAM wastage pushes an
app into swap and it takes 5-10 seconds (at least) to switch apps, and 
it does matter if it takes 5-10-20 seconds longer to boot.


That is why I think on-demand should be default.


> The root block/node sees far more activity than any bitmap and it is not
> treated any differently than any other bit of metadata. It's simply
> requested when it is needed. If the vm has determined that the root
> block is frequently used, it stays in memory. Why should the bitmaps be
> any different?


Because bitmaps are harder to seek to?  I think that's the argument -- 
the bitmap is going to be pushed out of memory because it's used less 
frequently, but it'll take much longer to load than anything else 
because it's spread over the disk.


But, you could make a similar argument about most files.

I second the request for benchmarks.


Re: [PATCH] reiserfs: on-demand bitmap loading (testing only)

2005-07-07 Thread Stefan Traby
On Fri, Jul 08, 2005 at 09:35:28AM +0400, Alexander Zarochentsev wrote:

> Lena, could you check how the bitmap on-demand loading patch affects, say,
> mongo results for reiserfs?

Alex, it seems that I share Jeff's view - for me it's a question of
correctness.
You can't test correctness with benchmarks - well, unless your
name is Yury -> http://oesiman.de/last-words-1.html
:)

-- 

  ciao - 
Stefan


Re: [PATCH] reiserfs: on-demand bitmap loading (testing only)

2005-07-07 Thread Stefan Traby
On Fri, Jul 08, 2005 at 12:52:52AM -0400, Jeffrey Mahoney wrote:

> It's possible to read the bitmaps in a "delayed" fashion, but the
> problem of completely wasted resources is still not addressed. I feel
> the correct solution is to let the buffer cache do its job and not
> assume that any particular filesystem takes priority over other resources.
 
I'm 100% with you.
Getting rid of super.c:read_bitmaps is nothing but a bug-fix.

-- 

  ciao - 
Stefan


Re: [PATCH] reiserfs: on-demand bitmap loading (testing only)

2005-07-07 Thread Alexander Zarochentsev
On Thursday 07 July 2005 02:45, Hans Reiser wrote:
> Jeff, are you sure that you need this code to exist?  Here are the
> problems I see:
>
> for the average case, it is suboptimal.  The seeks to the bitmaps
> are far more expensive than the averaged cost of keeping them in ram.

It would be good to see the benchmark results which your opinion is based on.

I know that on-demand bitmap loading in reiser4 was under suspicion 
several times but I don't remember any benchmark results which showed a
slowdown regarding that.  

>
> for 16TB filesystems, they will have plenty of budget for ram
>
> it complicates code if it has to worry about such things as not
> enough clean memory for holding bitmaps, etc.
>
> It is more appropriate to write this kind of code for the
> development branch which is V4.  This kind of code is likely to have
> hard to test and hit bugs.
>

Lena, could you check how the bitmap on-demand loading patch affects, say,
mongo results for reiserfs?

Thanks, 
Alex.



Re: [PATCH] reiserfs: on-demand bitmap loading (testing only)

2005-07-07 Thread Jeffrey Mahoney
David Masover wrote:
> Pierre Etchemaïté wrote:
> 
>> On Thu, 7 Jul 2005 13:59:35 -0400, studdugie <[EMAIL PROTECTED]>
>> wrote:
>>
>>
>>> I agree w/ Jeff 100%. I'm not a kernel hacker, simply a user. As a
>>> matter of fact, I was one of those people that Jeff alluded to when he
>>> said: "There have been reports of large filesystems taking an
>>> unacceptably long time to mount."
>>
>>
>>
>> That also makes reiserfs uncomfortable with automount devices, especially
>> if they're bandwidth limited like external USB or firewire disks...
> 
> 
> USB and firewire disks already take a little long to mount anyway.
> 
> But, it is definitely a performance enhancement, or at least a tweak.
> I'd like to see it happen -- it takes 10-15 seconds to mount my 200 gig
> Reiser4 partition, which is unacceptable for a desktop machine -- at
> least, for a *linux* desktop machine.
> 
> To keep Hans happy about the "default case", can we load the bitmap in
> the background during boot/mount?  Basically, if it's loaded on demand,
> then we pretend to demand each part of it, one by one.  Would that
> considerably slow normal FS operation?  Could we defer it to when the
> disk is idle?  (*disk*, not FS)

Hi David -

The main issue I have with this is that I don't think bitmaps should be
treated specially. They are metadata, pure and simple. We don't treat
the s-tree specially with respect to caching and it is used regardless
of whether the operations performed on the filesystem are for reads or
writes.

The root block/node sees far more activity than any bitmap and it is not
treated any differently than any other bit of metadata. It's simply
requested when it is needed. If the vm has determined that the root
block is frequently used, it stays in memory. Why should the bitmaps be
any different?
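
To make the argument concrete, here is a toy least-recently-used cache. It is only an illustration of the general idea that frequently used blocks stay resident while idle ones are evicted; the kernel's actual buffer cache and VM heuristics are more involved and are not modelled here. `LRUBlockCache` and its `read_block` callback are hypothetical names.

```python
from collections import OrderedDict


class LRUBlockCache:
    """Toy least-recently-used cache of disk blocks (illustrative only)."""

    def __init__(self, capacity, read_block):
        self.capacity = capacity
        self._read_block = read_block   # hypothetical callback: block number -> data
        self._blocks = OrderedDict()    # least recently used entry first
        self.disk_reads = 0

    def __contains__(self, blocknr):
        return blocknr in self._blocks

    def get(self, blocknr):
        if blocknr in self._blocks:
            self._blocks.move_to_end(blocknr)   # mark as recently used
            return self._blocks[blocknr]
        self.disk_reads += 1
        data = self._read_block(blocknr)
        self._blocks[blocknr] = data
        if len(self._blocks) > self.capacity:
            self._blocks.popitem(last=False)    # evict the least recently used
        return data


if __name__ == "__main__":
    cache = LRUBlockCache(capacity=2, read_block=lambda n: b"block %d" % n)
    # Block 0 plays the root block: touched between every other access.
    for blocknr in (0, 10, 0, 11, 0):
        cache.get(blocknr)
    print(cache.disk_reads, 0 in cache, 10 in cache)
```

In this model the block that is touched constantly never leaves the cache, without any code treating it specially.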

It's possible to read the bitmaps in a "delayed" fashion, but the
problem of completely wasted resources is still not addressed. I feel
the correct solution is to let the buffer cache do its job and not
assume that any particular filesystem takes priority over other resources.

-Jeff

-- 
Jeff Mahoney
SuSE Labs
[EMAIL PROTECTED]


Re: [PATCH] reiserfs: on-demand bitmap loading (testing only)

2005-07-07 Thread David Masover

Pierre Etchemaïté wrote:
> On Thu, 7 Jul 2005 13:59:35 -0400, studdugie <[EMAIL PROTECTED]> wrote:
>
>> I agree w/ Jeff 100%. I'm not a kernel hacker, simply a user. As a
>> matter of fact, I was one of those people that Jeff alluded to when he
>> said: "There have been reports of large filesystems taking an
>> unacceptably long time to mount."
>
> That also makes reiserfs uncomfortable with automount devices, especially
> if they're bandwidth limited like external USB or firewire disks...


USB and firewire disks already take a little long to mount anyway.

But, it is definitely a performance enhancement, or at least a tweak. 
I'd like to see it happen -- it takes 10-15 seconds to mount my 200 gig 
Reiser4 partition, which is unacceptable for a desktop machine -- at
least, for a *linux* desktop machine.


To keep Hans happy about the "default case", can we load the bitmap in 
the background during boot/mount?  Basically, if it's loaded on demand, 
then we pretend to demand each part of it, one by one.  Would that 
considerably slow normal FS operation?  Could we defer it to when the 
disk is idle?  (*disk*, not FS)




Re: [PATCH] reiserfs: on-demand bitmap loading (testing only)

2005-07-07 Thread Pierre Etchemaïté
On Thu, 7 Jul 2005 13:59:35 -0400, studdugie <[EMAIL PROTECTED]> wrote:

> I agree w/ Jeff 100%. I'm not a kernel hacker, simply a user. As a
> matter of fact, I was one of those people that Jeff alluded to when he
> said: "There have been reports of large filesystems taking an
> unacceptably long time to mount."

That also makes reiserfs uncomfortable with automount devices, especially
if they're bandwidth limited like external USB or firewire disks...


Re: [PATCH] reiserfs: on-demand bitmap loading (testing only)

2005-07-07 Thread studdugie
I agree w/ Jeff 100%. I'm not a kernel hacker, simply a user. As a
matter of fact, I was one of those people that Jeff alluded to when he
said: "There have been reports of large filesystems taking an
unacceptably long time to mount."
On 7/7/05, Jeff Mahoney <[EMAIL PROTECTED]> wrote:
> Hans Reiser wrote:
> > Jeff, are you sure that you need this code to exist?  Here are the
> > problems I see:
> >
> > for the average case, it is suboptimal.  The seeks to the bitmaps
> > are far more expensive than the averaged cost of keeping them in ram.
> >
> > for 16TB filesystems, they will have plenty of budget for ram
> >
> > it complicates code if it has to worry about such things as not
> > enough clean memory for holding bitmaps, etc.
> >
> > It is more appropriate to write this kind of code for the
> > development branch which is V4.  This kind of code is likely to have
> > hard to test and hit bugs.
> >
> > The mount time problem should be solved by querying the device
> > geometry, and inserting into the queue requests for every disk drive in
> > parallel.  The current code fails to keep all the spindles busy.  It
> > would be nice if there was general purpose code for querying about how a
> > device divides into spindles so that scheduling in general can be optimized.
> >
> > This should be a nondefault mount option.
> >
> > That said, thanks for paying attention to a problem Namesys discussed
> > but lacked the manpower for addressing.  Do you think you could discuss
> > your plans before coding next time?  I agree that ReiserFS V3 and V4
> > mount time is too long.  15 minutes is clearly not acceptable.  Perhaps
> > there is a deeper IO scheduler problem beyond bitmaps that should be
> > addressed though.
> >
> 
> Hans -
> 
> There are two issues here: The amount of time required to read in the
> bitmap blocks at mount time, and the resources that are wasted due to
> maintaining unused bitmap data in memory. Your arguments are reasonable,
> but the user response to each of them is the same: They will simply
> choose another filesystem to deploy rather than deal with the caveats of
> ReiserFS.
> 
> I agree that there may be opportunities to optimize the I/O scheduler,
> but even if we ignored the blockdev<->filesystem layering violations,
> and had perfect knowledge of the storage subsystem, there is still
> latency associated with reading the data in. There may be any number of
> abstractions between the block device presented to the filesystem and
> the actual spindles (md, dm, loop, or hardware raid) and the block dev
> subsystem is best equipped to handle that. The goal is not to make mount
> times quicker than they are now, but to make them negligible. Suppose
> for the sake of argument that somehow the I/O scheduler could be
> leveraged to reduce the mount time by 90%. This is an incredibly
> optimistic number and still it only reduces the 15 minute mount time to
> 90 seconds. That's 90 seconds *every* boot that the system becomes
> unavailable. That 90 second addition adds up, and will be the difference
> between a site deploying reiserfs and choosing another solution that
> doesn't have that caveat.
> 
> That said, the resource savings benefit is largely secondary, but may be
> quite important for many users including those deploying embedded
> devices. We are not in the position to be making hardware purchasing
> guidelines for our users. It's not reasonable to expect more than the
> disk space required to store the filesystem itself. "Huge" filesystems
> that were once reserved for large servers can now be found on the
> desktop. For a few hundred dollars in hardware, I can construct a
> multi-terabyte array under my desk. A typical usage for something like
> this would be to store music, movies, or say an A/V editing suite. On a
> system with 512MB of RAM, the 32 MB allocation for ONLY bitmaps is a
> huge resource hit. On embedded systems that are tight on RAM, where they
> are using alternate C libraries to shave off a few KiB of memory use,
> pinning bitmaps is a total waste of resources. Telling the user "go buy
> more memory" is not an acceptable solution. Again, this will only mean
> another user chooses a different solution than reiserfs.
> 
> ReiserFS v3 has an established track record as a stable filesystem. V4
> may be an excellent successor, but many users simply aren't interested.
> They want particular features now and aren't willing to be guinea pigs
> for V4 in order to get them. We've seen this time and again with feature
> additions. Denying user demands with the mantra of "wait for it in V4"
> has left many users frustrated, and they will once again choose
> something else rather than deal without features they can have on other
> filesystems.
> 
> The performance difference, I suspect, will be negligible. If the
> bitmaps are really in heavy use (which is only the case for a limited
> set of workloads)

Re: [PATCH] reiserfs: on-demand bitmap loading (testing only)

2005-07-07 Thread Jeff Mahoney

Hans Reiser wrote:
> Jeff, are you sure that you need this code to exist?  Here are the
> problems I see:
> 
> for the average case, it is suboptimal.  The seeks to the bitmaps
> are far more expensive than the averaged cost of keeping them in ram.
> 
> for 16TB filesystems, they will have plenty of budget for ram
> 
> it complicates code if it has to worry about such things as not
> enough clean memory for holding bitmaps, etc.
> 
> It is more appropriate to write this kind of code for the
> development branch which is V4.  This kind of code is likely to have
> hard to test and hit bugs.
> 
> The mount time problem should be solved by querying the device
> geometry, and inserting into the queue requests for every disk drive in
> parallel.  The current code fails to keep all the spindles busy.  It
> would be nice if there was general purpose code for querying about how a
> device divides into spindles so that scheduling in general can be optimized.
> 
> This should be a nondefault mount option.
> 
> That said, thanks for paying attention to a problem Namesys discussed
> but lacked the manpower for addressing.  Do you think you could discuss
> your plans before coding next time?  I agree that ReiserFS V3 and V4
> mount time is too long.  15 minutes is clearly not acceptable.  Perhaps
> there is a deeper IO scheduler problem beyond bitmaps that should be
> addressed though.
> 

Hans -

There are two issues here: The amount of time required to read in the
bitmap blocks at mount time, and the resources that are wasted due to
maintaining unused bitmap data in memory. Your arguments are reasonable,
but the user response to each of them is the same: They will simply
choose another filesystem to deploy rather than deal with the caveats of
ReiserFS.

I agree that there may be opportunities to optimize the I/O scheduler,
but even if we ignored the blockdev<->filesystem layering violations,
and had perfect knowledge of the storage subsystem, there is still
latency associated with reading the data in. There may be any number of
abstractions between the block device presented to the filesystem and
the actual spindles (md, dm, loop, or hardware raid) and the block dev
subsystem is best equipped to handle that. The goal is not to make mount
times quicker than they are now, but to make them negligible. Suppose
for the sake of argument that somehow the I/O scheduler could be
leveraged to reduce the mount time by 90%. This is an incredibly
optimistic number and still it only reduces the 15 minute mount time to
90 seconds. That's 90 seconds *every* boot that the system becomes
unavailable. That 90 second addition adds up, and will be the difference
between a site deploying reiserfs and choosing another solution that
doesn't have that caveat.
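
A short calculation shows why even the optimistic 90 second figure hurts, tying it to the "5 9's" remark in the patch description. The sketch assumes a 365-day year and that mount time is the only source of downtime; the function names are illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def downtime_budget_seconds(availability):
    """Seconds of downtime per year allowed at the given availability."""
    return (1.0 - availability) * SECONDS_PER_YEAR


def boots_within_budget(availability, mount_seconds):
    """How many reboots per year fit in the budget if only mounting counts."""
    return int(downtime_budget_seconds(availability) // mount_seconds)


if __name__ == "__main__":
    five_nines = 0.99999
    print(f"budget: {downtime_budget_seconds(five_nines):.2f} s/year")
    print("15 minute mounts:", boots_within_budget(five_nines, 15 * 60))
    print("90 second mounts:", boots_within_budget(five_nines, 90))
```

Under these assumptions a single 15 minute mount already exceeds a full year's budget at five nines, and 90 second mounts leave room for only a handful of reboots.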

That said, the resource savings benefit is largely secondary, but may be
quite important for many users including those deploying embedded
devices. We are not in the position to be making hardware purchasing
guidelines for our users. It's not reasonable to expect more than the
disk space required to store the filesystem itself. "Huge" filesystems
that were once reserved for large servers can now be found on the
desktop. For a few hundred dollars in hardware, I can construct a
multi-terabyte array under my desk. A typical usage for something like
this would be to store music, movies, or say an A/V editing suite. On a
system with 512MB of RAM, the 32 MB allocation for ONLY bitmaps is a
huge resource hit. On embedded systems that are tight on RAM, where they
are using alternate C libraries to shave off a few KiB of memory use,
pinning bitmaps is a total waste of resources. Telling the user "go buy
more memory" is not an acceptable solution. Again, this will only mean
another user chooses a different solution than reiserfs.

ReiserFS v3 has an established track record as a stable filesystem. V4
may be an excellent successor, but many users simply aren't interested.
They want particular features now and aren't willing to be guinea pigs
for V4 in order to get them. We've seen this time and again with feature
additions. Denying user demands with the mantra of "wait for it in V4"
has left many users frustrated, and they will once again choose
something else rather than deal without features they can have on other
filesystems.

The performance difference, I suspect, will be negligible. If the
bitmaps are really in heavy use (which is only the case for a limited
set of workloads) then the buffer cache will keep those around anyway.
If the memory is needed elsewhere, the system has the "big picture" view
and should be able to make those decisions. Having to swap out user code
or data vs. keeping ReiserFS bitmaps in memory is going to have a
performance impact either way, and I suspect the former will be the
worse case. Regarding the unavailability of memory for bitmaps, we must
already sleep in order to get the buffer heads

Re: [PATCH] reiserfs: on-demand bitmap loading (testing only)

2005-07-06 Thread Hans Reiser
Jeff, are you sure that you need this code to exist?  Here are the
problems I see:

for the average case, it is suboptimal.  The seeks to the bitmaps
are far more expensive than the averaged cost of keeping them in ram.

for 16TB filesystems, they will have plenty of budget for ram

it complicates code if it has to worry about such things as not
enough clean memory for holding bitmaps, etc.

It is more appropriate to write this kind of code for the
development branch which is V4.  This kind of code is likely to have
bugs that are hard to hit and hard to test for.

The mount time problem should be solved by querying the device
geometry, and inserting into the queue requests for every disk drive in
parallel.  The current code fails to keep all the spindles busy.  It
would be nice if there was general purpose code for querying about how a
device divides into spindles so that scheduling in general can be optimized.

This should be a nondefault mount option.

That said, thanks for paying attention to a problem Namesys discussed
but lacked the manpower for addressing.  Do you think you could discuss
your plans before coding next time?  I agree that ReiserFS V3 and V4
mount time is too long.  15 minutes is clearly not acceptable.  Perhaps
there is a deeper IO scheduler problem beyond bitmaps that should be
addressed though.

Hans

Jeff Mahoney wrote:

>  Currently, ReiserFS will read and keep in memory all the bitmaps for
>  the filesystem on mount. After the journal is replayed, it will read
>  them in again. On huge filesystems, this can be a resource hog and a
>  performance/ availability problem.
>
>  For example, on a maximum size (~16 TB) ReiserFS filesystem, there are
>  2^32 blocks, which require 131072 bitmaps to describe them. This means
>  that without loading any of the metadata tree or accessing file data,
>  just over 512M of RAM must be allocated (and is unswappable) for the
>  filesystem to be mounted and completely idle. All of that data is
>  distributed evenly over the entire disk, and must be read (twice!)
>  on mount.
>
>  There have been reports of large filesystems taking an unacceptably
>  long time to mount. These mount times can take your 5 9's down pretty
>  quickly.
>
>  The following patch implements on-demand loading for bitmaps. Rather
>  than pin all the bitmaps in memory as we do now, when a bitmap is
>  needed it is read from disk. If it is needed frequently, the buffer
>  cache will use existing heuristics to keep it around. The caching of
>  bitmap metadata is kept, so that bitmaps that are known to be full are
>  skipped completely.
>
>  I have done some very basic testing on this, but I'd like to have some
>  more eyes take a look.
>
>  Caveats:
>
>  The error handling in this revision is incomplete. This is a known
>  issue as I would like to end up applying this patch after reworking
>  the error handling in ReiserFS as a whole. Ultimately, a
>  reiserfs_error() similar to ext3 will be introduced, which will allow
>  smoother handling of errors than currently available.
>
>  The "old bitmap" code is untested. In principle, the difference boils
>  down to only where the bitmap block is located.
>
> -Jeff
>
> --
> Jeff Mahoney
> SuSE Labs


-

From: Jeff Mahoney <[EMAIL PROTECTED]>
Subject: [PATCH] reiserfs: implement on-demand bitmap loading (testing only)

 Currently, ReiserFS will read and keep in memory all the bitmaps for the
 filesystem on mount. After the journal is replayed, it will read them in
 again. On huge filesystems, this can be a resource hog and a performance/
 availability problem.

 For example, on a maximum size (~16 TB) ReiserFS filesystem, there are
 2^32 blocks, which require 131072 bitmaps to describe them. This means that
 without loading any of the metadata tree or accessing file data, just over
 512M of RAM must be allocated (and is unswappable) for the filesystem to
 be mounted and completely idle. All of that data is distributed evenly over
 the entire disk, and must be read (twice!) on mount.

 There have been reports of large filesystems taking an unacceptably long
 time to mount. These mount times can take your 5 9's down pretty quickly.

 The following patch implements on-demand loading for bitmaps. Rather than
 pin all the bitmaps in memory as we do now, when a bitmap is needed it is
 read from disk. If it is needed frequently, the buffer cache will use
 existing heuristics to keep it around. The caching of bitmap metadata is
 kept, so that bitmaps that are known to be full are skipped completely.

 I have done some very basic testing on this, but I'd like to have some more
 eyes take a look.

 Caveats:

 The error handling in this revision is incomplete. This is a known issue
 as I would like to end up applying this patch after reworking the error
 handling in ReiserFS as a whole. Ultimately, a reiserfs_error() similar to
 ext3 will be introduced, which will allow smoother handling of errors than
 currently ava

[PATCH] reiserfs: on-demand bitmap loading (testing only)

2005-07-06 Thread Jeff Mahoney

 Currently, ReiserFS will read and keep in memory all the bitmaps for
 the filesystem on mount. After the journal is replayed, it will read
 them in again. On huge filesystems, this can be a resource hog and a
 performance/availability problem.

 For example, on a maximum size (~16 TB) ReiserFS filesystem, there are
 2^32 blocks, which require 131072 bitmaps to describe them. This means
 that without loading any of the metadata tree or accessing file data,
 just over 512M of RAM must be allocated (and is unswappable) for the
 filesystem to be mounted and completely idle. All of that data is
 distributed evenly over the entire disk, and must be read (twice!)
 on mount.
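
 The figures above follow from simple arithmetic: each bitmap block holds
 one bit per filesystem block, so a bitmap of block_size bytes describes
 block_size * 8 blocks. The sketch below is only an illustration of that
 calculation, not code from the patch; the function names are hypothetical,
 and it counts bitmap data only (no buffer heads or other overhead).

```c
#include <assert.h>
#include <stdint.h>

/* One bitmap block holds one bit per filesystem block, so it
 * describes block_size * 8 blocks. */
static inline uint64_t blocks_per_bitmap(uint32_t block_size)
{
	assert(block_size > 0);
	return (uint64_t)block_size * 8;
}

/* Number of bitmap blocks needed to cover block_count blocks,
 * rounded up so a partially used last bitmap is counted. */
static inline uint64_t bitmaps_needed(uint64_t block_count, uint32_t block_size)
{
	uint64_t per = blocks_per_bitmap(block_size);

	return (block_count + per - 1) / per;
}

/* Bytes of bitmap data held in memory if every bitmap stays pinned. */
static inline uint64_t pinned_bitmap_bytes(uint64_t block_count,
					   uint32_t block_size)
{
	return bitmaps_needed(block_count, block_size) * block_size;
}
```

 With 2^32 blocks and a 4k block size, bitmaps_needed() gives 131072 and
 pinned_bitmap_bytes() gives 512 MiB, matching the numbers quoted above.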

 There have been reports of large filesystems taking an unacceptably
 long time to mount. These mount times can take your 5 9's down pretty
 quickly.

 The following patch implements on-demand loading for bitmaps. Rather
 than pin all the bitmaps in memory as we do now, when a bitmap is
 needed it is read from disk. If it is needed frequently, the buffer
 cache will use existing heuristics to keep it around. The caching of
 bitmap metadata is kept, so that bitmaps that are known to be full are
 skipped completely.
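
 As a rough userspace illustration of the idea (not the patch itself, and
 not kernel code), the sketch below keeps a small per-bitmap free count in
 memory and only "reads" a bitmap when an allocation actually needs it.
 Everything here is an assumption made for the example: struct toy_fs,
 struct bitmap_info, load_bitmap() and alloc_block() are hypothetical names,
 the disk read is simulated with calloc(), and the real code relies on the
 buffer cache rather than holding the data pointer itself.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define BLOCK_SIZE 4096u
#define BITS_PER_BITMAP (BLOCK_SIZE * 8u)

/* Small per-bitmap summary that always stays in memory (hypothetical). */
struct bitmap_info {
	uint32_t free_count;	/* cached number of free bits */
	uint8_t *data;		/* bitmap contents, NULL until first needed */
};

struct toy_fs {
	struct bitmap_info *info;
	uint32_t bmap_nr;	/* number of bitmaps */
	uint32_t loads;		/* simulated disk reads performed so far */
};

/* Stand-in for reading a bitmap block from disk on first use. */
static uint8_t *load_bitmap(struct toy_fs *fs, uint32_t bmap)
{
	struct bitmap_info *bi = &fs->info[bmap];
	uint32_t used, i;

	if (bi->data)
		return bi->data;	/* already loaded, no read needed */

	bi->data = calloc(1, BLOCK_SIZE);
	if (!bi->data)
		return NULL;

	/* Simulated contents: the first `used` bits are allocated. */
	used = BITS_PER_BITMAP - bi->free_count;
	for (i = 0; i < used; i++)
		bi->data[i / 8] |= (uint8_t)(1u << (i % 8));
	fs->loads++;
	return bi->data;
}

/* Allocate one block. Returns its block number, or -1 if no block is
 * free or a bitmap could not be loaded. */
static int64_t alloc_block(struct toy_fs *fs)
{
	uint32_t bmap, bit;

	for (bmap = 0; bmap < fs->bmap_nr; bmap++) {
		uint8_t *data;

		if (fs->info[bmap].free_count == 0)
			continue;	/* known full: skip without reading */

		data = load_bitmap(fs, bmap);
		if (!data)
			return -1;

		for (bit = 0; bit < BITS_PER_BITMAP; bit++) {
			if (!(data[bit / 8] & (1u << (bit % 8)))) {
				data[bit / 8] |= (uint8_t)(1u << (bit % 8));
				fs->info[bmap].free_count--;
				return (int64_t)bmap * BITS_PER_BITMAP + bit;
			}
		}
	}
	return -1;
}
```

 The point of the cached free_count is that a full bitmap costs nothing
 at allocation time: it is never read, and never occupies a block of memory.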

 I have done some very basic testing on this, but I'd like to have some
 more eyes take a look.

 Caveats:

 The error handling in this revision is incomplete. This is a known
 issue as I would like to end up applying this patch after reworking
 the error handling in ReiserFS as a whole. Ultimately, a
 reiserfs_error() similar to ext3 will be introduced, which will allow
 smoother handling of errors than currently available.

 The "old bitmap" code is untested. In principle, the difference boils
 down to only where the bitmap block is located.

-Jeff

--
Jeff Mahoney
SuSE Labs
From: Jeff Mahoney <[EMAIL PROTECTED]>
Subject: [PATCH] reiserfs: implement on-demand bitmap loading (testing only)

 Currently, ReiserFS will read and keep in memory all the bitmaps for the
 filesystem on mount. After the journal is replayed, it will read them in
 again. On huge filesystems, this can be a resource hog and a performance/
 availability problem.

 For example, on a maximum size (~16 TB) ReiserFS filesystem, there are
 2^32 blocks, which require 131072 bitmaps to describe them. This means that
 without loading any of the metadata tree or accessing file data, just over
 512M of RAM must be allocated (and is unswappable) for the filesystem to
 be mounted and completely idle. All of that data is distributed evenly over
 the entire disk, and must be read (twice!) on mount.

 There have been reports of large filesystems taking an unacceptably long time
 to mount. These mount times can take your 5 9's down pretty quickly.

 The following patch implements on-demand loading for bitmaps. Rather than pin
 all the bitmaps in memory as we do now, when a bitmap is needed it is read
 from disk. If it is needed frequently, the buffer cache will use existing
 heuristics to keep it around. The caching of bitmap metadata is kept, so that
 bitmaps that are known to be full are skipped completely.

 I have done some very basic testing on this, but I'd like to have some more
 eyes take a look.

 Caveats:

 The error handling in this revision is incomplete. This is a known issue
 as I would like to end up applying this patch after reworking the error
 handling in ReiserFS as a whole. Ultimately, a reiserfs_error() similar to
 ext3 will be introduced, which will allow smoother handling of errors than
 currently available.

 The "old bitmap" code is untested. In principle, the difference boils down to
 only where the bitmap block is located.

Signed-off-by: Jeff Mahoney <[EMAIL PROTECTED]>

diff -ruNpX dontdiff linux-2.6.12.1/fs/reiserfs/bitmap.c linux-2.6.12.1.devel/fs/reiserfs/bitmap.c
--- linux-2.6.12.1/fs/reiserfs/bitmap.c	2005-06-30 12:51:42.0 -0400
+++ linux-2.6.12.1.devel/fs/reiserfs/bitmap.c	2005-06-30 16:40:29.0 -0400
@@ -61,6 +61,8 @@ static inline void get_bit_address (stru
 int is_reusable (struct super_block * s, b_blocknr_t block, int bit_value)
 {
 int i, j;
+unsigned int bmap = block >> s->s_blocksize_bits;
+struct buffer_head *bh;
 
 if (block == 0 || block >= SB_BLOCK_COUNT (s)) {
 	reiserfs_warning (s, "vs-4010: is_reusable: block number is out of range %lu (%u)",
@@ -68,14 +70,29 @@ int is_reusable (struct super_block * s,
 	return 0;
 }
 
-/* it can't be one of the bitmap blocks */
-for (i = 0; i < SB_BMAP_NR (s); i ++)
-	if (block == SB_AP_BITMAP (s)[i].bh->b_blocknr) {
-	reiserfs_warning (s, "vs: 4020: is_reusable: "
-			  "bitmap block %lu(%u) can't be freed or reused",
-			  block, SB_BMAP_NR (s));
-	return 0;
-	}
+/* Old format filesystem? Unl