Re: [PATCH] reiserfs: on-demand bitmap loading (testing only)
Jeff Mahoney wrote:
>
> The results showed that delayed allocation was only slightly faster than waiting on the buffers. The reason for this is that the majority of the time is actually spent issuing the block read requests, not actually waiting for their results.

Can you define "spent issuing"? Perhaps this is simply a bad choice of block device congestion configuration, and changing it would fix things. Because device congestion is based on the NUMBER of requests, not their size, bitmap reading would congest things more than file IO does. I suspect the block device congestion limits need serious review, and not just because of bitmaps.

> The amount of time waiting on the blocks appears not to change radically, though the amount of time issuing the read requests does.
>
> Here are the actual numbers from the test runs. Between each mount attempt, I attempted to clear the system caches by allocating and writing to all the memory on the system, as well as the disk caches by reading 50 MB from disk. I performed the tests with four block sizes in order to increase the number of bitmap blocks that need to be loaded at mount time. Note that each decrease in block size increases the number of bitmaps fourfold. This is because when the block size is halved, it not only doubles the number of blocks, but also halves the capacity of each bitmap block.
>
> 4k block size:                         2k block size:
> 10036464 blocks,                       20072928 blocks,
> 307 bitmaps (~= 39 GB)                 1226 bitmaps (~= 153 GB @ 4k)
>
> -opin_bitmaps                          -opin_bitmaps
> sb_getblk loop: 0.0s                   sb_getblk loop: 0.3999643s
> ll_rw_block: 1.435871744s              ll_rw_block: 8.143272619s
> wait_on_buffer: 0.513519144s           wait_on_buffer: 1.990925198s
> real    0m4.551s                       real    0m10.906s
> user    0m0.000s                       user    0m0.000s
> sys     0m0.060s                       sys     0m0.028s
>
> -opin_bitmaps,delayed_bitmaps          -opin_bitmaps,delayed_bitmaps
> sb_getblk loop: 0.0s                   sb_getblk loop: 0.3999643s
> ll_rw_block: 1.443871029s              ll_rw_block: 8.97839s
> real    0m2.128s                       real    0m8.630s
> user    0m0.000s                       user    0m0.000s
> sys     0m0.016s                       sys     0m0.020s
>
> -odyn_bitmaps                          -odyn_bitmaps
> real    0m0.626s                       real    0m0.850s
> user    0m0.000s                       user    0m0.000s
> sys     0m0.008s                       sys     0m0.016s
>
> 1k block size:                         512b block size:
> 40145856 blocks,                       80291712 blocks,
> 4901 bitmaps (~= 612 GB @ 4k)          19603 bitmaps (~= 2.4 TB @ 4k)
>
> -opin_bitmaps                          -opin_bitmaps
> sb_getblk loop: 0.19998214s            sb_getblk loop: 0.95991426s
> ll_rw_block: 33.727900516s             ll_rw_block: 110.98165711s
> wait_on_buffer: 1.423872816s           wait_on_buffer: 0.749324905s
> real    0m36.052s                      real    1m51.423s
> user    0m0.000s                       user    0m0.000s
> sys     0m0.124s                       sys     0m0.256s
>
> -opin_bitmaps,delayed_bitmaps          -opin_bitmaps,delayed_bitmaps
> sb_getblk loop: 0.23997856s            sb_getblk loop: 0.95991426s
> ll_rw_block: 33.644994731s             ll_rw_block: 109.427893721s
> real    0m34.562s                      real    1m50.693s
> user    0m0.004s                       user    0m0.004s
> sys     0m0.060s                       sys     0m0.232s
>
> -odyn_bitmaps                          -odyn_bitmaps
> real    0m0.516s                       real    0m0.601s
> user    0m0.000s                       user    0m0.000s
> sys     0m0.004s                       sys     0m0.000s
>
> I will post runtime results of each case early next week.
>
> -Jeff
>
> --
> Jeff Mahoney
> SuSE Labs
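The fourfold growth described in the quoted notes is plain arithmetic: a bitmap block has one bit per filesystem block, so it covers 8 × blocksize blocks. The following C sketch (the function names are illustrative, not taken from ReiserFS) reproduces the bitmap counts in the table from the block counts, rounding up because a partly used last bitmap still occupies a whole block.

```c
#include <assert.h>
#include <stdint.h>

/* A bitmap block has one bit per filesystem block, so a bitmap of
 * `blocksize` bytes describes blocksize * 8 blocks. */
static uint64_t blocks_per_bitmap(uint32_t blocksize)
{
    assert(blocksize > 0);
    return (uint64_t)blocksize * 8;
}

/* Bitmap blocks needed for `nblocks` blocks, rounded up: a partly
 * used last bitmap still occupies a whole block. */
static uint64_t bitmap_count(uint64_t nblocks, uint32_t blocksize)
{
    uint64_t per = blocks_per_bitmap(blocksize);
    return (nblocks + per - 1) / per;
}

/* Size in GiB of a 4 KiB-block filesystem that needs `nbitmaps`
 * bitmaps, i.e. the "@ 4k" figures in the table (binary units). */
static double equiv_gib_at_4k(uint64_t nbitmaps)
{
    uint64_t bytes = nbitmaps * blocks_per_bitmap(4096) * 4096;
    return (double)bytes / (double)(UINT64_C(1) << 30);
}
```

For example, bitmap_count(40145856, 1024) gives 4901, and the same formula gives the 131072 bitmaps cited for a 2^32-block, 4 KiB filesystem in the original patch description. equiv_gib_at_4k(4901) gives 612.625 GiB, matching the "~= 612 GB @ 4k" entry.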
Re: [PATCH] reiserfs: on-demand bitmap loading (testing only)
David Masover wrote:
> Pierre Etchemaïté wrote:
>> On Thu, 7 Jul 2005 13:59:35 -0400, studdugie <[EMAIL PROTECTED]> wrote:
>>
>>> I agree w/ Jeff 100%. I'm not a kernel hacker, simply a user. As a matter of fact, I was one of those people that Jeff alluded to when he said: "There have been reports of large filesystems taking an unacceptably long time to mount."
>>
>> That also makes reiserfs uncomfortable with automount devices, especially if they're bandwidth-limited like external USB or firewire disks...
>
> USB and firewire disks already take a little long to mount anyway.
>
> But, it is definitely a performance enhancement, or at least a tweak. I'd like to see it happen -- it takes 10-15 seconds to mount my 200 gig Reiser4 partition, which is unacceptable for a desktop machine -- at least, for a *linux* desktop machine.
>
> To keep Hans happy about the "default case", can we load the bitmap in the background during boot/mount? Basically, if it's loaded on demand, then we pretend to demand each part of it, one by one. Would that considerably slow normal FS operation? Could we defer it to when the disk is idle? (*disk*, not FS)

I believe there are two possible methods of delayed loading. The first is to issue all the bitmap read requests at mount time and then, when we need a particular bitmap later, wait on it. The second is to issue the read request the first time a bitmap is needed, and hold on to it afterwards.

For the sake of exploring other options, I decided to implement the first one. The results were surprising.

The disk I've been testing on lately is a 40 GB ATA/100 disk mounted in a USB2 enclosure. I tested with a range of block sizes so that the number of bitmaps would increase without needing a larger disk. I realize the results won't be identical to those of filesystems that large, but it's the best I can do with my storage constraints. Realistically, the times will be even longer on the larger filesystems.

The results showed that delayed allocation was only slightly faster than waiting on the buffers. The reason is that the majority of the time is actually spent issuing the block read requests, not waiting for their results. The time spent waiting on the blocks does not appear to change radically as the number of bitmaps grows, though the time spent issuing the read requests does.

Here are the actual numbers from the test runs. Between mount attempts, I tried to clear the system caches by allocating and writing to all the memory on the system, and the disk caches by reading 50 MB from disk. I performed the tests with four block sizes in order to increase the number of bitmap blocks that need to be loaded at mount time. Note that each halving of the block size increases the number of bitmaps fourfold: it not only doubles the number of blocks, but also halves the capacity of each bitmap block.
4k block size:                         2k block size:
10036464 blocks,                       20072928 blocks,
307 bitmaps (~= 39 GB)                 1226 bitmaps (~= 153 GB @ 4k)

-opin_bitmaps                          -opin_bitmaps
sb_getblk loop: 0.0s                   sb_getblk loop: 0.3999643s
ll_rw_block: 1.435871744s              ll_rw_block: 8.143272619s
wait_on_buffer: 0.513519144s           wait_on_buffer: 1.990925198s
real    0m4.551s                       real    0m10.906s
user    0m0.000s                       user    0m0.000s
sys     0m0.060s                       sys     0m0.028s

-opin_bitmaps,delayed_bitmaps          -opin_bitmaps,delayed_bitmaps
sb_getblk loop: 0.0s                   sb_getblk loop: 0.3999643s
ll_rw_block: 1.443871029s              ll_rw_block: 8.97839s
real    0m2.128s                       real    0m8.630s
user    0m0.000s                       user    0m0.000s
sys     0m0.016s                       sys     0m0.020s

-odyn_bitmaps                          -odyn_bitmaps
real    0m0.626s                       real    0m0.850s
user    0m0.000s                       user    0m0.000s
sys     0m0.008s                       sys     0m0.016s

1k block size:                         512b block size:
40145856 blocks,                       80291712 blocks,
4901 bitmaps (~= 612 GB @ 4k)          19603 bitmaps (~= 2.4 TB @ 4k)

-opin_bitmaps                          -opin_bitmaps
sb_getblk loop: 0.19998214s            sb_getblk loop: 0.95991426s
ll_rw_block: 33.727900516s             ll_rw_block: 110.98165711s
wait_on_buffer: 1.423872816s           wait_on_buffer: 0.749324905s
real    0m36.052s                      real    1m51.423s
user    0m0.000s                       user    0m0.000s
sys     0m0.124s                       sys     0m0.256s

-opin_bitmaps,delayed_bitmaps          -opin_bitmaps,delayed_bitmaps
sb_getblk loop: 0.23997856s            sb_getblk loop: 0.95991426s
ll_rw_block: 33.644994731
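The two delayed-loading methods described above, plus the existing pin-everything behaviour, can be compared with a small user-space model. This is a sketch only: the structures and the submit_read()/wait_read() stand-ins are hypothetical, no real I/O or timing is modelled, and treating PINNED, DELAYED and ON_DEMAND as counterparts of -opin_bitmaps, -opin_bitmaps,delayed_bitmaps and -odyn_bitmaps is an assumption based on the option names. The model only counts read requests issued and waits performed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NR_BITMAPS 16 /* illustrative size for the model */

/* Hypothetical per-bitmap state for a user-space model; these are not
 * ReiserFS or block-layer structures. */
struct bmap_slot {
    bool submitted; /* a read request has been issued */
    bool uptodate;  /* the read has completed and the data is usable */
};

struct bmap_model {
    struct bmap_slot slot[NR_BITMAPS];
    unsigned reads_submitted; /* read requests issued so far */
    unsigned waits;           /* times a caller blocked on a read */
};

enum load_mode { PINNED, DELAYED, ON_DEMAND };

/* Stand-ins for issuing a read (cf. ll_rw_block) and waiting for it
 * (cf. wait_on_buffer); here they only update flags and counters. */
static void submit_read(struct bmap_model *m, size_t i)
{
    m->slot[i].submitted = true;
    m->reads_submitted++;
}

static void wait_read(struct bmap_model *m, size_t i)
{
    assert(m->slot[i].submitted);
    m->slot[i].uptodate = true;
    m->waits++;
}

/* Work done at mount time in each mode. */
static void model_mount(struct bmap_model *m, enum load_mode mode)
{
    memset(m, 0, sizeof *m);
    if (mode == ON_DEMAND)
        return;                      /* nothing is read at mount */
    for (size_t i = 0; i < NR_BITMAPS; i++)
        submit_read(m, i);           /* PINNED and DELAYED issue every read */
    if (mode == PINNED)
        for (size_t i = 0; i < NR_BITMAPS; i++)
            wait_read(m, i);         /* PINNED also waits for all of them */
}

/* Access bitmap i: issue the read if nobody has, wait if it has not
 * completed, and keep the result for later callers. */
static struct bmap_slot *model_get(struct bmap_model *m, size_t i)
{
    assert(i < NR_BITMAPS);
    if (!m->slot[i].submitted)
        submit_read(m, i);
    if (!m->slot[i].uptodate)
        wait_read(m, i);
    return &m->slot[i];
}
```

In this model PINNED and DELAYED issue exactly the same number of read requests at mount; DELAYED only moves the waiting to first use. That is consistent with the measurements above, where the ll_rw_block time is close with and without delayed_bitmaps. The on-demand mode issues a read only on first use; unlike the patch, which leaves retention to the buffer cache, this model simply keeps every bitmap it has loaded.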
Re: [PATCH] reiserfs: on-demand bitmap loading (testing only)
Jeffrey Mahoney wrote:
> David Masover wrote:
>> Pierre Etchemaïté wrote:
>>> On Thu, 7 Jul 2005 13:59:35 -0400, studdugie <[EMAIL PROTECTED]> wrote:
>>>> I agree w/ Jeff 100%. I'm not a kernel hacker, simply a user. As a matter of fact, I was one of those people that Jeff alluded to when he said: "There have been reports of large filesystems taking an unacceptably long time to mount."
>>> That also makes reiserfs uncomfortable with automount devices, especially if they're bandwidth-limited like external USB or firewire disks...
>> USB and firewire disks already take a little long to mount anyway.
>>
>> But, it is definitely a performance enhancement, or at least a tweak. I'd like to see it happen -- it takes 10-15 seconds to mount my 200 gig Reiser4 partition, which is unacceptable for a desktop machine -- at least, for a *linux* desktop machine.
>>
>> To keep Hans happy about the "default case", can we load the bitmap in the background during boot/mount? Basically, if it's loaded on demand, then we pretend to demand each part of it, one by one. Would that considerably slow normal FS operation? Could we defer it to when the disk is idle? (*disk*, not FS)
>
> Hi David -
>
> The main issue I have with this is that I don't think bitmaps should be treated specially. They are metadata, pure and simple.

This is the philosophical reason why I like on-demand loading. There's also the immediate speedup of boot.

But, I think Hans has a point that it may be better for performance to pre-cache them. I would rather the default behavior be to load them on demand, but I can see situations where people would choose to pre-cache them, or even (as we do now) force them to stay in kernel memory as long as the FS is mounted.

But that's not a sane default. Big hard drives on desktop machines are getting more and more common, and even the lowly Linux gamer doesn't want to waste the RAM he bought for Doom 3 on a 200 gig filesystem when he's only using 1.5 gigs of it at the moment. I would guess that this is a much more common scenario than massive 2TB arrays where people can throw money (RAM) at the system to make it faster in any way they can. But even if it's not, people with 2TB arrays are much more likely to discover the precaching feature and turn it on than gamers and desktop users would be to discover it and turn it off.

And besides, for the average desktop machine, latency matters more than throughput. The most noticeable latency that we can optimize for is boot time; the next most noticeable is launching a new app or switching apps. For the average desktop user, it doesn't matter if it takes an extra half second to load a chunk of the bitmap in order to load apps, and it certainly doesn't matter if it takes an extra tenth of a second to load a file. But it does matter if wasted RAM pushes an app into swap and it takes 5-10 seconds (at least) to switch apps, and it does matter if it takes 5, 10, or 20 seconds longer to boot. That is why I think on-demand should be the default.

> The root block/node sees far more activity than any bitmap and it is not treated any differently than any other bit of metadata. It's simply requested when it is needed. If the vm has determined that the root block is frequently used, it stays in memory. Why should the bitmaps be any different?

Because bitmaps are harder to seek to? I think that's the argument: the bitmap is going to be pushed out of memory because it's used less frequently, but it'll take much longer to load than anything else because it's spread over the disk. But you could make a similar argument about most files.

I second the request for benchmarks.
Re: [PATCH] reiserfs: on-demand bitmap loading (testing only)
On Fri, Jul 08, 2005 at 09:35:28AM +0400, Alexander Zarochentsev wrote:
> Lena, could you check how the bitmap on-demand loading patch affects, say, mongo results for reiserfs?

Alex, it seems I share Jeff's view: for me it's a question of correctness. You can't test correctness with benchmarks, unless your name is Yury -> http://oesiman.de/last-words-1.html :)

--
ciao - Stefan
Re: [PATCH] reiserfs: on-demand bitmap loading (testing only)
On Fri, Jul 08, 2005 at 12:52:52AM -0400, Jeffrey Mahoney wrote:
> It's possible to read the bitmaps in a "delayed" fashion, but the problem of completely wasted resources is still not addressed. I feel the correct solution is to let the buffer cache do its job and not assume that any particular filesystem takes priority over other resources.

I agree with you 100%. Getting rid of super.c:read_bitmaps is nothing but a bug fix.

--
ciao - Stefan
Re: [PATCH] reiserfs: on-demand bitmap loading (testing only)
On Thursday 07 July 2005 02:45, Hans Reiser wrote:
> Jeff, are you sure that you need this code to exist? Here are the problems I see:
>
> for the average case, it is suboptimal. The seeks to the bitmaps are far more expensive than the averaged cost of keeping them in ram.

It would be good to see the benchmark results your opinion is based on. I know that on-demand bitmap loading in reiser4 has come under suspicion several times, but I don't remember any benchmark results that showed a slowdown from it.

> for 16TB filesystems, they will have plenty of budget for ram
>
> it complicates code if it has to worry about such things as not enough clean memory for holding bitmaps, etc.
>
> It is more appropriate to write this kind of code for the development branch which is V4. This kind of code is likely to have hard-to-test, hard-to-hit bugs.

Lena, could you check how the bitmap on-demand loading patch affects, say, mongo results for reiserfs?

Thanks,
Alex.
Re: [PATCH] reiserfs: on-demand bitmap loading (testing only)
David Masover wrote:
> Pierre Etchemaïté wrote:
>> On Thu, 7 Jul 2005 13:59:35 -0400, studdugie <[EMAIL PROTECTED]> wrote:
>>> I agree w/ Jeff 100%. I'm not a kernel hacker, simply a user. As a matter of fact, I was one of those people that Jeff alluded to when he said: "There have been reports of large filesystems taking an unacceptably long time to mount."
>>
>> That also makes reiserfs uncomfortable with automount devices, especially if they're bandwidth-limited like external USB or firewire disks...
>
> USB and firewire disks already take a little long to mount anyway.
>
> But, it is definitely a performance enhancement, or at least a tweak. I'd like to see it happen -- it takes 10-15 seconds to mount my 200 gig Reiser4 partition, which is unacceptable for a desktop machine -- at least, for a *linux* desktop machine.
>
> To keep Hans happy about the "default case", can we load the bitmap in the background during boot/mount? Basically, if it's loaded on demand, then we pretend to demand each part of it, one by one. Would that considerably slow normal FS operation? Could we defer it to when the disk is idle? (*disk*, not FS)

Hi David -

The main issue I have with this is that I don't think bitmaps should be treated specially. They are metadata, pure and simple. We don't treat the s-tree specially with respect to caching, and it is used regardless of whether the operations performed on the filesystem are reads or writes. The root block/node sees far more activity than any bitmap, and it is not treated any differently than any other bit of metadata. It's simply requested when it is needed. If the vm has determined that the root block is frequently used, it stays in memory. Why should the bitmaps be any different?

It's possible to read the bitmaps in a "delayed" fashion, but the problem of completely wasted resources is still not addressed. I feel the correct solution is to let the buffer cache do its job and not assume that any particular filesystem takes priority over other resources.

-Jeff

--
Jeff Mahoney
SuSE Labs
[EMAIL PROTECTED]
Re: [PATCH] reiserfs: on-demand bitmap loading (testing only)
Pierre Etchemaïté wrote:
> On Thu, 7 Jul 2005 13:59:35 -0400, studdugie <[EMAIL PROTECTED]> wrote:
>> I agree w/ Jeff 100%. I'm not a kernel hacker, simply a user. As a matter of fact, I was one of those people that Jeff alluded to when he said: "There have been reports of large filesystems taking an unacceptably long time to mount."
> That also makes reiserfs uncomfortable with automount devices, especially if they're bandwidth-limited like external USB or firewire disks...

USB and firewire disks already take a little long to mount anyway.

But, it is definitely a performance enhancement, or at least a tweak. I'd like to see it happen -- it takes 10-15 seconds to mount my 200 gig Reiser4 partition, which is unacceptable for a desktop machine -- at least, for a *linux* desktop machine.

To keep Hans happy about the "default case", can we load the bitmap in the background during boot/mount? Basically, if it's loaded on demand, then we pretend to demand each part of it, one by one. Would that considerably slow normal FS operation? Could we defer it to when the disk is idle? (*disk*, not FS)
Re: [PATCH] reiserfs: on-demand bitmap loading (testing only)
On Thu, 7 Jul 2005 13:59:35 -0400, studdugie <[EMAIL PROTECTED]> wrote:
> I agree w/ Jeff 100%. I'm not a kernel hacker, simply a user. As a matter of fact, I was one of those people that Jeff alluded to when he said: "There have been reports of large filesystems taking an unacceptably long time to mount."

That also makes reiserfs uncomfortable with automount devices, especially if they're bandwidth-limited like external USB or firewire disks...
Re: [PATCH] reiserfs: on-demand bitmap loading (testing only)
I agree w/ Jeff 100%. I'm not a kernel hacker, simply a user. As a matter of fact, I was one of those people that Jeff alluded to when he said: "There have been reports of large filesystems taking an unacceptably long time to mount."

On 7/7/05, Jeff Mahoney <[EMAIL PROTECTED]> wrote:
> Hans Reiser wrote:
>> Jeff, are you sure that you need this code to exist? Here are the problems I see:
>>
>> for the average case, it is suboptimal. The seeks to the bitmaps are far more expensive than the averaged cost of keeping them in ram.
>>
>> for 16TB filesystems, they will have plenty of budget for ram
>>
>> it complicates code if it has to worry about such things as not enough clean memory for holding bitmaps, etc.
>>
>> It is more appropriate to write this kind of code for the development branch which is V4. This kind of code is likely to have hard-to-test, hard-to-hit bugs.
>>
>> The mount time problem should be solved by querying the device geometry, and inserting requests into the queue for every disk drive in parallel. The current code fails to keep all the spindles busy. It would be nice if there was general-purpose code for querying how a device divides into spindles so that scheduling in general can be optimized.
>>
>> This should be a nondefault mount option.
>>
>> That said, thanks for paying attention to a problem Namesys discussed but lacked the manpower to address. Do you think you could discuss your plans before coding next time? I agree that ReiserFS V3 and V4 mount time is too long. 15 minutes is clearly not acceptable. Perhaps there is a deeper IO scheduler problem beyond bitmaps that should be addressed though.
>
> Hans -
>
> There are two issues here: the amount of time required to read in the bitmap blocks at mount time, and the resources that are wasted by maintaining unused bitmap data in memory. Your arguments are reasonable, but the user response to each of them is the same: they will simply choose another filesystem to deploy rather than deal with the caveats of ReiserFS.
>
> I agree that there may be opportunities to optimize the I/O scheduler, but even if we ignored the blockdev<->filesystem layering violations and had perfect knowledge of the storage subsystem, there would still be latency associated with reading the data in. There may be any number of abstractions between the block device presented to the filesystem and the actual spindles (md, dm, loop, or hardware RAID), and the block device subsystem is best equipped to handle that. The goal is not to make mount times quicker than they are now, but to make them negligible. Suppose for the sake of argument that the I/O scheduler could somehow be leveraged to reduce the mount time by 90%. This is an incredibly optimistic number, and it still only reduces the 15 minute mount time to 90 seconds. That's 90 seconds *every* boot during which the system is unavailable. That 90 second addition adds up, and will be the difference between a site deploying reiserfs and choosing another solution that doesn't have that caveat.
>
> That said, the resource savings benefit is largely secondary, but may be quite important for many users, including those deploying embedded devices. We are not in a position to be setting hardware purchasing guidelines for our users. It's not reasonable to expect more than the disk space required to store the filesystem itself. "Huge" filesystems that were once reserved for large servers can now be found on the desktop. For a few hundred dollars in hardware, I can construct a multi-terabyte array under my desk. A typical use for something like this would be to store music, movies, or, say, an A/V editing suite. On a system with 512MB of RAM, the 32 MB allocation for ONLY bitmaps is a huge resource hit. On embedded systems that are tight on RAM, where they are using alternate C libraries to shave off a few KiB of memory use, pinning bitmaps is a total waste of resources. Telling the user "go buy more memory" is not an acceptable solution. Again, this will only mean another user chooses a different solution than reiserfs.
>
> ReiserFS v3 has an established track record as a stable filesystem. V4 may be an excellent successor, but many users simply aren't interested. They want particular features now and aren't willing to be guinea pigs for V4 in order to get them. We've seen this time and again with feature additions. Denying user demands with the mantra of "wait for it in V4" has left many users frustrated, and they will once again choose something else rather than do without features they can have on other filesystems.
>
> The performance difference, I suspect, will be negligible. If the bitmaps are really in heavy use (which is only the case for a limited set of workloads)
Re: [PATCH] reiserfs: on-demand bitmap loading (testing only)
Hans Reiser wrote:
> Jeff, are you sure that you need this code to exist? Here are the problems I see:
>
> for the average case, it is suboptimal. The seeks to the bitmaps are far more expensive than the averaged cost of keeping them in ram.
>
> for 16TB filesystems, they will have plenty of budget for ram
>
> it complicates code if it has to worry about such things as not enough clean memory for holding bitmaps, etc.
>
> It is more appropriate to write this kind of code for the development branch which is V4. This kind of code is likely to have hard-to-test, hard-to-hit bugs.
>
> The mount time problem should be solved by querying the device geometry, and inserting requests into the queue for every disk drive in parallel. The current code fails to keep all the spindles busy. It would be nice if there was general-purpose code for querying how a device divides into spindles so that scheduling in general can be optimized.
>
> This should be a nondefault mount option.
>
> That said, thanks for paying attention to a problem Namesys discussed but lacked the manpower to address. Do you think you could discuss your plans before coding next time? I agree that ReiserFS V3 and V4 mount time is too long. 15 minutes is clearly not acceptable. Perhaps there is a deeper IO scheduler problem beyond bitmaps that should be addressed though.

Hans -

There are two issues here: the amount of time required to read in the bitmap blocks at mount time, and the resources that are wasted by maintaining unused bitmap data in memory. Your arguments are reasonable, but the user response to each of them is the same: they will simply choose another filesystem to deploy rather than deal with the caveats of ReiserFS.

I agree that there may be opportunities to optimize the I/O scheduler, but even if we ignored the blockdev<->filesystem layering violations and had perfect knowledge of the storage subsystem, there would still be latency associated with reading the data in. There may be any number of abstractions between the block device presented to the filesystem and the actual spindles (md, dm, loop, or hardware RAID), and the block device subsystem is best equipped to handle that. The goal is not to make mount times quicker than they are now, but to make them negligible. Suppose for the sake of argument that the I/O scheduler could somehow be leveraged to reduce the mount time by 90%. This is an incredibly optimistic number, and it still only reduces the 15 minute mount time to 90 seconds. That's 90 seconds *every* boot during which the system is unavailable. That 90 second addition adds up, and will be the difference between a site deploying reiserfs and choosing another solution that doesn't have that caveat.

That said, the resource savings benefit is largely secondary, but may be quite important for many users, including those deploying embedded devices. We are not in a position to be setting hardware purchasing guidelines for our users. It's not reasonable to expect more than the disk space required to store the filesystem itself. "Huge" filesystems that were once reserved for large servers can now be found on the desktop. For a few hundred dollars in hardware, I can construct a multi-terabyte array under my desk. A typical use for something like this would be to store music, movies, or, say, an A/V editing suite. On a system with 512MB of RAM, the 32 MB allocation for ONLY bitmaps is a huge resource hit. On embedded systems that are tight on RAM, where they are using alternate C libraries to shave off a few KiB of memory use, pinning bitmaps is a total waste of resources. Telling the user "go buy more memory" is not an acceptable solution. Again, this will only mean another user chooses a different solution than reiserfs.

ReiserFS v3 has an established track record as a stable filesystem. V4 may be an excellent successor, but many users simply aren't interested. They want particular features now and aren't willing to be guinea pigs for V4 in order to get them. We've seen this time and again with feature additions. Denying user demands with the mantra of "wait for it in V4" has left many users frustrated, and they will once again choose something else rather than do without features they can have on other filesystems.

The performance difference, I suspect, will be negligible. If the bitmaps are really in heavy use (which is only the case for a limited set of workloads), then the buffer cache will keep them around anyway. If the memory is needed elsewhere, the system has the "big picture" view and should be able to make those decisions. Having to swap out user code or data vs. keeping ReiserFS bitmaps in memory is going to have a performance impact either way, and I suspect the former will be the worse case.

Regarding the unavailability of memory for bitmaps, we must already sleep in order to get the buffer heads
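The memory figures in this exchange all follow from storing one bit per block. Writing S for the filesystem size, b for the block size and N_blocks = S/b, the memory pinned by keeping every bitmap resident is approximately:

```latex
\begin{aligned}
N_{\text{bmap}} &= \left\lceil \frac{N_{\text{blocks}}}{8b} \right\rceil,
\qquad
M_{\text{pinned}} = N_{\text{bmap}}\, b \approx \frac{N_{\text{blocks}}}{8} = \frac{S}{8b}, \\
S = 2^{40}\,\text{B},\; b = 2^{12}\,\text{B}
  &\;\Rightarrow\; M_{\text{pinned}} \approx 2^{40-15}\,\text{B} = 2^{25}\,\text{B} = 32\,\text{MiB}, \\
S = 2^{44}\,\text{B},\; b = 2^{12}\,\text{B}
  &\;\Rightarrow\; M_{\text{pinned}} \approx 2^{44-15}\,\text{B} = 2^{29}\,\text{B} = 512\,\text{MiB}.
\end{aligned}
```

So with 4 KiB blocks every TiB of filesystem pins about 32 MiB of bitmap blocks, and the ~16 TB (2^32-block) maximum pins the 512 MB cited in the patch description. Because M_pinned is inversely proportional to b, halving the block size doubles the pinned memory for the same capacity.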
Re: [PATCH] reiserfs: on-demand bitmap loading (testing only)
Jeff, are you sure that you need this code to exist? Here are the problems I see:

for the average case, it is suboptimal. The seeks to the bitmaps are far more expensive than the averaged cost of keeping them in ram.

for 16TB filesystems, they will have plenty of budget for ram

it complicates code if it has to worry about such things as not enough clean memory for holding bitmaps, etc.

It is more appropriate to write this kind of code for the development branch which is V4. This kind of code is likely to have hard-to-test, hard-to-hit bugs.

The mount time problem should be solved by querying the device geometry, and inserting requests into the queue for every disk drive in parallel. The current code fails to keep all the spindles busy. It would be nice if there was general-purpose code for querying how a device divides into spindles so that scheduling in general can be optimized.

This should be a nondefault mount option.

That said, thanks for paying attention to a problem Namesys discussed but lacked the manpower to address. Do you think you could discuss your plans before coding next time? I agree that ReiserFS V3 and V4 mount time is too long. 15 minutes is clearly not acceptable. Perhaps there is a deeper IO scheduler problem beyond bitmaps that should be addressed though.

Hans

Jeff Mahoney wrote:
> Currently, ReiserFS will read and keep in memory all the bitmaps for the filesystem on mount. After the journal is replayed, it will read them in again. On huge filesystems, this can be a resource hog and a performance/availability problem.
>
> For example, on a maximum-size (~16 TB) ReiserFS filesystem, there are 2^32 blocks, which require 131072 bitmaps to describe them. This means that without loading any of the metadata tree or accessing file data, just over 512M of RAM must be allocated (and is unswappable) for the filesystem to be mounted and completely idle. All of that data is distributed evenly over the entire disk, and must be read (twice!) on mount.
>
> There have been reports of large filesystems taking an unacceptably long time to mount. These mount times can take your 5 9's down pretty quickly.
>
> The following patch implements on-demand loading for bitmaps. Rather than pin all the bitmaps in memory as we do now, when a bitmap is needed it is read from disk. If it is needed frequently, the buffer cache will use existing heuristics to keep it around. The caching of bitmap metadata is kept, so that bitmaps that are known to be full are skipped completely.
>
> I have done some very basic testing on this, but I'd like to have some more eyes take a look.
>
> Caveats:
>
> The error handling in this revision is incomplete. This is a known issue; I would like to end up applying this patch after reworking the error handling in ReiserFS as a whole. Ultimately, a reiserfs_error() similar to ext3 will be introduced, which will allow smoother handling of errors than is currently available.
>
> The "old bitmap" code is untested. In principle, the difference boils down to only where the bitmap block is located.
>
> -Jeff
>
> --
> Jeff Mahoney
> SuSE Labs

From: Jeff Mahoney <[EMAIL PROTECTED]>
Subject: [PATCH] reiserfs: implement on-demand bitmap loading (testing only)

Currently, ReiserFS will read and keep in memory all the bitmaps for the filesystem on mount. After the journal is replayed, it will read them in again. On huge filesystems, this can be a resource hog and a performance/availability problem.

For example, on a maximum-size (~16 TB) ReiserFS filesystem, there are 2^32 blocks, which require 131072 bitmaps to describe them. This means that without loading any of the metadata tree or accessing file data, just over 512M of RAM must be allocated (and is unswappable) for the filesystem to be mounted and completely idle. All of that data is distributed evenly over the entire disk, and must be read (twice!) on mount.

There have been reports of large filesystems taking an unacceptably long time to mount. These mount times can take your 5 9's down pretty quickly.

The following patch implements on-demand loading for bitmaps. Rather than pin all the bitmaps in memory as we do now, when a bitmap is needed it is read from disk. If it is needed frequently, the buffer cache will use existing heuristics to keep it around. The caching of bitmap metadata is kept, so that bitmaps that are known to be full are skipped completely.

I have done some very basic testing on this, but I'd like to have some more eyes take a look.

Caveats:

The error handling in this revision is incomplete. This is a known issue; I would like to end up applying this patch after reworking the error handling in ReiserFS as a whole. Ultimately, a reiserfs_error() similar to ext3 will be introduced, which will allow smoother handling of errors than currently ava
[PATCH] reiserfs: on-demand bitmap loading (testing only)
Currently, ReiserFS will read and keep in memory all the bitmaps for the filesystem on mount. After the journal is replayed, it will read them in again. On huge filesystems, this can be a resource hog and a performance/availability problem.

For example, on a maximum-size (~16 TB) ReiserFS filesystem, there are 2^32 blocks, which require 131072 bitmaps to describe them. This means that without loading any of the metadata tree or accessing file data, just over 512M of RAM must be allocated (and is unswappable) for the filesystem to be mounted and completely idle. All of that data is distributed evenly over the entire disk, and must be read (twice!) on mount.

There have been reports of large filesystems taking an unacceptably long time to mount. These mount times can take your 5 9's down pretty quickly.

The following patch implements on-demand loading for bitmaps. Rather than pin all the bitmaps in memory as we do now, when a bitmap is needed it is read from disk. If it is needed frequently, the buffer cache will use existing heuristics to keep it around. The caching of bitmap metadata is kept, so that bitmaps that are known to be full are skipped completely.

I have done some very basic testing on this, but I'd like to have some more eyes take a look.

Caveats:

The error handling in this revision is incomplete. This is a known issue; I would like to end up applying this patch after reworking the error handling in ReiserFS as a whole. Ultimately, a reiserfs_error() similar to ext3 will be introduced, which will allow smoother handling of errors than is currently available.

The "old bitmap" code is untested. In principle, the difference boils down to only where the bitmap block is located.
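The patch keeps per-bitmap metadata in memory so that bitmaps known to be full can be skipped without touching the disk. A minimal sketch of that idea follows, assuming a hypothetical per-bitmap free_count summary and a stand-in read_bitmap_block() that only counts reads; neither is the patch's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-bitmap summary that stays in memory even when the
 * bitmap block itself is not cached (illustrative, not the patch's). */
struct bmap_summary {
    uint32_t free_count; /* free blocks described by this bitmap */
};

/* Number of bitmap blocks "read from disk" by the stand-in below. */
static unsigned bitmap_reads;

/* Stand-in for loading bitmap block i on demand; a real filesystem
 * would go through the buffer cache. Here it only counts reads. */
static void read_bitmap_block(size_t i)
{
    (void)i;
    bitmap_reads++;
}

/* Return the first bitmap at or after `start` (wrapping around once)
 * that has free space, reading only that bitmap. Bitmaps whose summary
 * says they are full are skipped with no I/O. Returns nbmaps if every
 * bitmap is full. */
static size_t find_bitmap_with_space(const struct bmap_summary *sum,
                                     size_t nbmaps, size_t start)
{
    assert(sum != NULL || nbmaps == 0);
    for (size_t n = 0; n < nbmaps; n++) {
        size_t i = (start + n) % nbmaps;
        if (sum[i].free_count == 0)
            continue;              /* known full: no need to read it */
        read_bitmap_block(i);      /* only now is the bitmap loaded */
        return i;
    }
    return nbmaps;
}
```

Only a bitmap that can actually satisfy the allocation is read, so on a mostly full filesystem the search scans a few bytes of in-memory summary per bitmap instead of reading each bitmap block. The cost is keeping the summary accurate whenever blocks are allocated or freed.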
-Jeff

--
Jeff Mahoney
SuSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (GNU/Linux)

iD8DBQFCzEt/LPWxlyuTD7IRAt4iAJ48L0ldd+jgowf1Qf6f6e90wbgacwCgjo5o
RSQzCzJAnYeLpYQDzkLBGVc=
=5ux1
-----END PGP SIGNATURE-----

From: Jeff Mahoney <[EMAIL PROTECTED]>
Subject: [PATCH] reiserfs: implement on-demand bitmap loading (testing only)

Signed-off-by: Jeff Mahoney <[EMAIL PROTECTED]>

diff -ruNpX dontdiff linux-2.6.12.1/fs/reiserfs/bitmap.c linux-2.6.12.1.devel/fs/reiserfs/bitmap.c
--- linux-2.6.12.1/fs/reiserfs/bitmap.c	2005-06-30 12:51:42.0 -0400
+++ linux-2.6.12.1.devel/fs/reiserfs/bitmap.c	2005-06-30 16:40:29.0 -0400
@@ -61,6 +61,8 @@ static inline void get_bit_address (stru
 int is_reusable (struct super_block * s, b_blocknr_t block, int bit_value)
 {
     int i, j;
+    unsigned int bmap = block >> s->s_blocksize_bits;
+    struct buffer_head *bh;
 
     if (block == 0 || block >= SB_BLOCK_COUNT (s)) {
         reiserfs_warning (s, "vs-4010: is_reusable: block number is out of range %lu (%u)",
@@ -68,14 +70,29 @@ int is_reusable (struct super_block * s,
         return 0;
     }
 
-    /* it can't be one of the bitmap blocks */
-    for (i = 0; i < SB_BMAP_NR (s); i ++)
-        if (block == SB_AP_BITMAP (s)[i].bh->b_blocknr) {
-            reiserfs_warning (s, "vs: 4020: is_reusable: "
-                              "bitmap block %lu(%u) can't be freed or reused",
-                              block, SB_BMAP_NR (s));
-            return 0;
-        }
+    /* Old format filesystem? Unl
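The second hunk drops the loop that compared the block against every pinned bitmap buffer, presumably because those buffers are no longer guaranteed to be in memory. Without them, whether a block is a bitmap block has to be decided from the on-disk layout alone, which is also why the old-format caveat comes down to where the bitmaps live. The C sketch below shows one way to make that decision in constant time. It is illustrative, not the patch's code: bitmap_index(), bitmap_location() and is_bitmap_block() are hypothetical names, and the layout is an assumption based on the usual ReiserFS format (superblock 64 KiB into the device for the new format and 8 KiB for the old format, with the first bitmap right after it; new-format bitmap i > 0 at the first block of the range it describes; old-format bitmaps packed contiguously).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout constants (illustrative, not taken from the patch). */
#define NEW_SB_OFFSET_BYTES (64u * 1024u)
#define OLD_SB_OFFSET_BYTES (8u * 1024u)

/* Which bitmap describes 'block'? Each bitmap block covers
 * blocksize * 8 blocks, hence the extra shift by 3. */
static uint32_t bitmap_index(uint64_t block, unsigned blocksize_bits)
{
    return (uint32_t)(block >> (blocksize_bits + 3));
}

/* Where on disk does bitmap number 'bmap' live (assumed layout)? */
static uint64_t bitmap_location(uint32_t bmap, unsigned blocksize_bits,
                                int old_format)
{
    if (old_format)
        /* Old format: all bitmaps packed right after the superblock. */
        return (OLD_SB_OFFSET_BYTES >> blocksize_bits) + 1 + bmap;
    if (bmap == 0)
        /* New format: bitmap 0 sits just after the superblock. */
        return (NEW_SB_OFFSET_BYTES >> blocksize_bits) + 1;
    /* New format: bitmap i is the first block of the range it describes. */
    return (uint64_t)bmap << (blocksize_bits + 3);
}

/* Decide from the layout alone whether 'block' holds a bitmap, instead of
 * scanning in-memory bitmap buffers. bmap_nr is only used for the old
 * format, where the bitmaps form one contiguous run. */
static int is_bitmap_block(uint64_t block, unsigned blocksize_bits,
                           int old_format, uint32_t bmap_nr)
{
    if (old_format) {
        /* Every filesystem has at least one bitmap. */
        assert(bmap_nr > 0);
        uint64_t first = bitmap_location(0, blocksize_bits, 1);
        return block >= first && block < first + bmap_nr;
    }
    return block == bitmap_location(bitmap_index(block, blocksize_bits),
                                    blocksize_bits, 0);
}
```

With 4k blocks, is_bitmap_block(17, 12, 0, 0) and is_bitmap_block(32768, 12, 0, 0) both return 1, while is_bitmap_block(32769, 12, 0, 0) returns 0. For the old format a range check against the bitmap count is enough, because the bitmaps are contiguous.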