Re: 64-bit block sizes on 32-bit systems
My turn to chime in. JFS was designed around a 4K meta-data page size. It would require some major re-design to use larger block sizes. On the other hand, JFS could take advantage of 64-bit block addresses immediately. JFS internally stores the block address in 40 bits. (Sorry, file size & volume size are both limited to 4 petabytes on JFS.)

At the rate that storage hardware and requirements are increasing, increasing the block size is a short-term solution that is only going to delay the inevitable requirement for 64-bit block addressability. There is a practical limit to a usable block size. Someone threw out 64K, which seems reasonable to me.

--
David Kleikamp
IBM Linux Technology Center

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: 64-bit block sizes on 32-bit systems
On Mon, Mar 26, 2001 at 08:39:21AM -0800, LA Walsh wrote:
> So... is it the plan, or has it been thought about -- 'abstracting'
> block numbers as a typedef 'block_nr', then at compile time having it
> be selectable as to whether or not this was to be a 32-bit or 64-bit
> quantity -- that way older systems would

Oh, did no-one mention the words `Module ABI' yet?

--
Revolutions do not require corporate support.
Re: 64-bit block sizes on 32-bit systems
Steve Lord wrote:
> Just a brief add to the discussion, besides which I have a vested
> interest in this!

I'll add my little comments as well, and hopefully not start a flamewar... :)

[snip comments about blocksize, etc.]

Here's a real-life example of something that most of you will probably hate me for mentioning: HFS uses variable-sized blocks (made up of multiple 512-byte sectors), but stores block numbers as a 16-bit value. (I know, everyone will say, "We're talking about moving from 32 to 64 bits." Keep listening.) This gave great performance on the then-current massive storage of a 20M drive. However, when it became possible to get the absolutely gigantic hard drive of 1G, it became more and more obvious that this limit was causing a huge amount of wasted space. Apple had to design a new filesystem (HFS+) that was able to represent blocks with a 32-bit number to overcome the effective limitation on how big a filesystem could be.

It's getting to the point now that it's easily possible to put together a disk array large enough that even referring to blocks with a 32-bit value requires relatively large blocks. I don't know if we have very many filesystems that would support this feature, but it will become important a lot sooner than anyone may be thinking.

Obviously this case isn't a perfect fit for the situation, since HFS was designed to be read by 32-bit machines, and the upgrade to 32 bits didn't carry a CPU penalty, just a bus bandwidth problem. Also, I'm coming from a platform that actually can do a decent job of 64 bit, unlike x86, but we shouldn't disallow people from doing bigger and better things. It's become very popular lately to position Linux as an enterprise-ready system, and this is something that will be expected. People will want to access a multi-TB database as a single file, as well as other things that may seem crazy to most people now.
I understand people's aversion to the #ifdefs in the code, but if the changes are made in a sane way, it can still be clean and easy to maintain. It's worth it to add a little complexity (particularly as an option) to add a feature that people will be demanding in the relatively near future. It might be a good idea to wait for 2.5, though...

Brad Boyer
[EMAIL PROTECTED]

P.S.: No, I have no personal reason to need any of this 64-bit filesystem stuff. Just trying to point out possibilities. Don't expect me to actually be writing this stuff...
Re: 64-bit block sizes on 32-bit systems
Hi,

Just a brief add to the discussion, besides which I have a vested interest in this!

I do not believe that you can make the addressability of a device larger at the expense of granularity of address space at the bottom end. Just because ext2 has a single size for metadata does not mean everything you put on the disks does. XFS filesystems, for example, can be made with block sizes from 512 bytes to 64 Kbytes (ok, not working on Linux across this range yet, but it will). In all of these cases we have chunks of metadata which are 512 bytes long, and we have chunks bigger than the blocksize. The 512-byte chunks are the superblock and the heads of the freespace structures; there are multiples of them through the filesystem.

To top that, we have disk write-ordering constraints that could mean that, for two of the 512-byte chunks next to each other, one must be written to disk now to free log space while the other must not be written to disk because it is in a transaction. We would be forced to do read-modify-write down at some lower level -- wait, the lower levels would not have the addressability. There are probably other things which will not fly if you lose the addressing granularity. Volume headers and such like would be one possibility.

No, I don't have a magic bullet solution, but I do not think that just coarsening the granularity of the addressing is the correct answer, and yes, I do agree that just growing the buffer_head fields is not perfect either.

Steve Lord

p.s. there was mention of bigger page size; it is not hard to fix, but the swap path will not even work with 64K pages right now.
Re: 64-bit block sizes on 32-bit systems
Jan Harkes <[EMAIL PROTECTED]>:
> On Tue, Mar 27, 2001 at 01:57:42PM -0600, Jesse Pollard wrote:
> > > Using similar numbers as presented. If we are working our way through
> > > every single block in a Petabyte filesystem, and the blocksize is 512
> > > bytes, then the 1us in extra CPU cycles because of 64-bit operations
> > > would add, according to my back-of-the-envelope calculation, 2199023
> > > seconds of CPU time, a bit more than 25 days.
> >
> > Ummm... I don't think it adds that much. You seem to be leaving out the
> > overlap of disk I/O and computation for read-ahead. This should eliminate
> > the majority of the delay effect.
>
> 1024 TB should be around 2*10^12 512-byte blocks; multiply by 10^-6 s (1us)
> of "assumed" overhead per block operation and you get 2*10^6 seconds, so I
> believe I'm pretty close there. I am considering everything being
> "available in the cache", i.e. no waiting for disk access.

That would be true for small files (< 5GB). I have to deal with files that may be 20-100 GB. Except for the largest systems (200GB of main memory) the data will NOT be in the cache except for ~50% of the time (assuming only one user).

> > > Seriously, there is a lot more that needs to be done than introducing a
> > > 64-bit blocknumber. Effectively 512-byte blocks are far too small for
> > > that kind of data, and going to pagesize blocks (and increasing pagesize
> > > to 64KB or 2MB at the same time) is a solution that is far more likely
> > > to give good results, since it reduces both the total number of
> > > 'blocks' on the device and the total number of calls throughout
> > > kernel space, instead of increasing the cost per call.
> >
> > Talk about adding overhead... How long do you think it takes to read a
> > 2MB block (not to mention the time to update that page..)? The additional
> > contention on the fibre channel I/O alone might kill it if the filesystem
> > is busy.
> The time to update the pagetables is identical to the time to update a
> 4KB page when the OS is using a 2MB pagesize. Of course it will take more
> time to load the data into the page; however, it should be a consecutive
> stretch of data on disk, which should give a more efficient transfer
> than small blocks scattered around the disk.

You assume the file is accessed sequentially. The weather models don't do that. They do have some locality, but only in a 3D sense. When you include time, it becomes closer to a random disk-block reference when everything has to be linearized.

> > Granted, 512 bytes could be considered too small for some things, but
> > once you pass 32K you start adding a lot of rotational delay problems.
> > I've used file systems with 256K blocks - they are slow when compared
> > to the throughput using 32K. I wasn't the one running the benchmarks,
> > but with a MaxStrat 400GB raid, 256K-sized data transfers were much
> > slower (around 3 times slower) than 32K. (The target application was
> > a GIS server using Oracle).
>
> But your subsystem (the disk) was probably still using 512-byte blocks,
> possibly scattered. And the OS was still using 4KB pages; it takes more
> time to reclaim and gather 64 pages per IO operation than one. That's
> why I'm saying that the pagesize needs to scale along with the blocksize.

It wasn't - the "disks" were composed of groups of 5 drives in a raid, striped for speed and spread across 5 SCSI III controllers. Each attached raid had 16MB internal cache. I think the controllers were using an entire sector read (32K).

> The application might have been assuming a small block size as well, and
> the OS was told to do several read/modify/write cycles, perhaps even 512
> times as much as necessary.

There was some of that, but not much. Oracle (as I recall) allows for the specification of transfer size. This also brings up the problem of small files.
Allocating 2MB per file would waste quite a bit of disk space (assuming 5-10 million files with only 15% having 25GB or more).

> I'm not saying that the current system will perform well when working
> with large blocks, but compared to increasing the size of block_t, a
> larger blocksize has more potential to give improvements in the long
> term without adding an unrecoverable performance hit.

Not when the filesystem is required for general use. It only makes it simpler to actually have a large filesystem. It doesn't help when it must be used. Now you are saying that the throughput WILL go down, but only if you use large block sizes. I can go along with making block sizes up to 8K. Even 32K for special circumstances (even 64K for dedicated use). But not larger. NFS overhead on file I/O becomes way too excessive (...worst example now is having to read a 2MB block to update 512 bytes, then write it back... :-)

-
Jesse I Pollard, II
Email: [EMAIL PROTECTED]

Any opinions expressed are solely my own.
Re: 64-bit block sizes on 32-bit systems
Jan Harkes wrote:
> On Tue, Mar 27, 2001 at 01:57:42PM -0600, Jesse Pollard wrote:
> > > Using similar numbers as presented. If we are working our way through
> > > every single block in a Petabyte filesystem, and the blocksize is 512
> > > bytes, then the 1us in extra CPU cycles because of 64-bit operations
> > > would add, according to my back-of-the-envelope calculation, 2199023
> > > seconds of CPU time, a bit more than 25 days.
> >
> > Ummm... I don't think it adds that much. You seem to be leaving out the
> > overlap of disk I/O and computation for read-ahead. This should eliminate
> > the majority of the delay effect.
>
> 1024 TB should be around 2*10^12 512-byte blocks; multiply by 10^-6 s (1us)
> of "assumed" overhead per block operation and you get 2*10^6 seconds, so I
> believe I'm pretty close there. I am considering everything being
> "available in the cache", i.e. no waiting for disk access.
---
If everything being used is only used from the cache, then the application probably doesn't need 64-bit block support. I submit that your argument may be flawed in the assumption that if an application needs multi-terabyte files and devices, most of the data will be in the in-memory cache.

> The time to update the pagetables is identical to the time to update a
> 4KB page when the OS is using a 2MB pagesize. Of course it will take more
> time to load the data into the page; however, it should be a consecutive
> stretch of data on disk, which should give a more efficient transfer
> than small blocks scattered around the disk.
---
Not if you were doing a lot of random reads where you only needed 1-2K of data. The read time of the extra 2M-1K would seem to eat into any performance boost gained by the large pagesize.

> > Granted, 512 bytes could be considered too small for some things, but
> > once you pass 32K you start adding a lot of rotational delay problems.
> > I've used file systems with 256K blocks - they are slow when compared
> > to the throughput using 32K.
> > I wasn't the one running the benchmarks,
> > but with a MaxStrat 400GB raid, 256K-sized data transfers were much
> > slower (around 3 times slower) than 32K. (The target application was
> > a GIS server using Oracle).
>
> But your subsystem (the disk) was probably still using 512-byte blocks,
> possibly scattered. And the OS was still using 4KB pages; it takes more
> time to reclaim and gather 64 pages per IO operation than one. That's
> why I'm saying that the pagesize needs to scale along with the blocksize.
>
> The application might have been assuming a small block size as well, and
> the OS was told to do several read/modify/write cycles, perhaps even 512
> times as much as necessary.
>
> I'm not saying that the current system will perform well when working
> with large blocks, but compared to increasing the size of block_t, a
> larger blocksize has more potential to give improvements in the long
> term without adding an unrecoverable performance hit.
---
That's totally application dependent. Database applications might tend to skip around in the data and do short reads/writes over a very large file. Large block sizes will degrade their performance. This was the idea of making it a *configurable* option. If you need it, configure it. Same with block size -- that should likely have a wider range for configuration as well. But configuration (and ideally auto-configuration where possible) seems the ultimate win-win situation.

-l

--
The above thoughts are my own and do not necessarily represent those of my employer.
L A Walsh | Trust Technology, Core Linux, SGI
[EMAIL PROTECTED] | Voice: (650) 933-5338
Re: 64-bit block sizes on 32-bit systems
On Tue, Mar 27, 2001 at 01:57:42PM -0600, Jesse Pollard wrote:
> > Using similar numbers as presented. If we are working our way through
> > every single block in a Petabyte filesystem, and the blocksize is 512
> > bytes, then the 1us in extra CPU cycles because of 64-bit operations
> > would add, according to my back-of-the-envelope calculation, 2199023
> > seconds of CPU time, a bit more than 25 days.
>
> Ummm... I don't think it adds that much. You seem to be leaving out the
> overlap of disk I/O and computation for read-ahead. This should eliminate
> the majority of the delay effect.

1024 TB should be around 2*10^12 512-byte blocks; multiply by 10^-6 s (1us) of "assumed" overhead per block operation and you get 2*10^6 seconds, so I believe I'm pretty close there. I am considering everything being "available in the cache", i.e. no waiting for disk access.

> > Seriously, there is a lot more that needs to be done than introducing a
> > 64-bit blocknumber. Effectively 512-byte blocks are far too small for
> > that kind of data, and going to pagesize blocks (and increasing pagesize
> > to 64KB or 2MB at the same time) is a solution that is far more likely
> > to give good results, since it reduces both the total number of
> > 'blocks' on the device and the total number of calls throughout
> > kernel space, instead of increasing the cost per call.
>
> Talk about adding overhead... How long do you think it takes to read a
> 2MB block (not to mention the time to update that page..)? The additional
> contention on the fibre channel I/O alone might kill it if the filesystem
> is busy.

The time to update the pagetables is identical to the time to update a 4KB page when the OS is using a 2MB pagesize. Of course it will take more time to load the data into the page; however, it should be a consecutive stretch of data on disk, which should give a more efficient transfer than small blocks scattered around the disk.
> Granted, 512 bytes could be considered too small for some things, but
> once you pass 32K you start adding a lot of rotational delay problems.
> I've used file systems with 256K blocks - they are slow when compared
> to the throughput using 32K. I wasn't the one running the benchmarks,
> but with a MaxStrat 400GB raid, 256K-sized data transfers were much
> slower (around 3 times slower) than 32K. (The target application was
> a GIS server using Oracle).

But your subsystem (the disk) was probably still using 512-byte blocks, possibly scattered. And the OS was still using 4KB pages; it takes more time to reclaim and gather 64 pages per IO operation than one. That's why I'm saying that the pagesize needs to scale along with the blocksize.

The application might have been assuming a small block size as well, and the OS was told to do several read/modify/write cycles, perhaps even 512 times as much as necessary.

I'm not saying that the current system will perform well when working with large blocks, but compared to increasing the size of block_t, a larger blocksize has more potential to give improvements in the long term without adding an unrecoverable performance hit.

Jan
Re: 64-bit block sizes on 32-bit systems
>
> On Tue, Mar 27, 2001 at 09:15:08AM -0800, LA Walsh wrote:
> > Now let's look at the sites that want to process terabytes of
> > data -- perhaps file systems up into the Petabyte range. Often I
> > can see these being large multi-node (think 16-1024 clusters as
> > are in use today for large super-clusters). If I was to characterize
> > the performance of them, I'd likely see the CPU pegged at 100%
> > with 99% usage in user space. Let's assume that increasing the
> > block size decreases disk accesses by as much as 10% (you'll have
> > to admit -- using a 64-bit quantity vs. a 32-bit quantity isn't going
> > to even come close to increasing disk access times by 1 millisecond,
> > really, so it really is going to be a much smaller fraction when
> > compared to the actual disk latency).
> [snip]
> > Is there some logical flaw in the above reasoning?
>
> But those changes will affect even the fastpath, i.e. data that is
> already in the page/buffer caches, in which case we don't have to wait
> for disk access latency. Why would anyone who is working with a
> petabyte of data even consider not relying on essentially always
> hitting data that is available in the read-ahead cache?

It depends entirely on the application. Where the cache can contain 20% of the data, most accesses should already be in memory. If the data is significantly larger, there is a high chance that the data will not be there.

> Using similar numbers as presented. If we are working our way through
> every single block in a Petabyte filesystem, and the blocksize is 512
> bytes, then the 1us in extra CPU cycles because of 64-bit operations
> would add, according to my back-of-the-envelope calculation, 2199023
> seconds of CPU time, a bit more than 25 days.

Ummm... I don't think it adds that much. You seem to be leaving out the overlap of disk I/O and computation for read-ahead. This should eliminate the majority of the delay effect.
> Seriously, there is a lot more that needs to be done than introducing a
> 64-bit blocknumber. Effectively 512-byte blocks are far too small for
> that kind of data, and going to pagesize blocks (and increasing pagesize
> to 64KB or 2MB at the same time) is a solution that is far more likely
> to give good results, since it reduces both the total number of
> 'blocks' on the device and the total number of calls throughout
> kernel space, instead of increasing the cost per call.

Talk about adding overhead... How long do you think it takes to read a 2MB block (not to mention the time to update that page..)? The additional contention on the fibre channel I/O alone might kill it if the filesystem is busy.

Granted, 512 bytes could be considered too small for some things, but once you pass 32K you start adding a lot of rotational delay problems. I've used file systems with 256K blocks - they are slow when compared to the throughput using 32K. I wasn't the one running the benchmarks, but with a MaxStrat 400GB raid, 256K-sized data transfers were much slower (around 3 times slower) than 32K. (The target application was a GIS server using Oracle).

-
Jesse I Pollard, II
Email: [EMAIL PROTECTED]

Any opinions expressed are solely my own.
Re: 64-bit block sizes on 32-bit systems
LA Walsh <[EMAIL PROTECTED]>:
> Ion Badulescu wrote:
> > Compile option or not, 64-bit arithmetic is unacceptable on IA32. The
> > introduction of LFS was bad enough, we don't need yet another proof that
> > IA32 sucks. Especially when there *are* better alternatives.
> ===
> So if it is a compile option -- the majority of people
> wouldn't be affected, is that in agreement? Since the default would
> be to use the same arithmetic as we use now.
>
> In fact, I posit that if anything, the majority of the people
> might be helped as the block_nr becomes a 'typed' value -- and
> perhaps the sector_nr as well. They remain the same size, but as
> a typed value the kernel gains increased integrity from the increased
> type checking. At worst, it finds no new bugs and there is no impact
> on speed. Are we in agreement so far?
>
> Now let's look at the sites that want to process terabytes of
> data -- perhaps file systems up into the Petabyte range. Often I
> can see these being large multi-node (think 16-1024 clusters as
> are in use today for large super-clusters). If I was to characterize
> the performance of them, I'd likely see the CPU pegged at 100%
> with 99% usage in user space. Let's assume that increasing the
> block size decreases disk accesses by as much as 10% (you'll have
> to admit -- using a 64-bit quantity vs. a 32-bit quantity isn't going
> to even come close to increasing disk access times by 1 millisecond,
> really, so it really is going to be a much smaller fraction when
> compared to the actual disk latency).

Relatively small quibble - current large clusters (SP3, 330 nodes, 4 cpus/node) get around 85% to 90% (real user) user-mode total cpu. The rest of the user-mode time is attributed to overhead. Why:

1. Inter-node communication/synchronization
2. Memory bus saturation
3. Users usually use only 3 cpus/node and allow the last cpu to handle filesystem/network/administration/batch handling functions.
Using the last cpu in the node for part of the job reduces the overall throughput.

> Ok... but for the sake of
> argument using 10% -- that's still only 10% of the 1% spent in the system,
> or a slowdown of .1%. Now that's using a really liberal figure
> of 10%. If you look at the actual speed of 64-bit arithmetic vs.
> 32, we're likely talking -- upper bound, 10x the clocks for
> disk block arithmetic. Disk block arithmetic is a small fraction
> of time spent in the kernel. We have to be looking at *maximum*
> slowdowns in the range of a few hundred, maybe a few thousand, extra
> clocks. A 1000 extra clocks on a 1G machine is 1 microsecond, or approx
> 1/5000th your average seek latency on a *fast* hard disk. So
> instead of a 10% slowdown we are talking slowdowns in the 1/1000 range
> or less. Now that's a slowdown in the 1% that was being spent in
> the kernel, so now we've slowed the total program speed by .001%
> at the increased benefit (to that site) of being able to process
> those mega-gigs (Petabytes) of information. For a hit that is
> not noticeable to human perception, they go from not being able to
> use super-clusters of IA32 machines (for which HW and SW is cheap),
> to being able to use it. That's quite a cost savings for them.
>
> Is there some logical flaw in the above reasoning?

-
Jesse I Pollard, II
Email: [EMAIL PROTECTED]

Any opinions expressed are solely my own.
Re: 64-bit block sizes on 32-bit systems
On Tue, Mar 27, 2001 at 09:15:08AM -0800, LA Walsh wrote:
> Now let's look at the sites that want to process terabytes of
> data -- perhaps file systems up into the Petabyte range. Often I
> can see these being large multi-node (think 16-1024 clusters as
> are in use today for large super-clusters). If I was to characterize
> the performance of them, I'd likely see the CPU pegged at 100%
> with 99% usage in user space. Let's assume that increasing the
> block size decreases disk accesses by as much as 10% (you'll have
> to admit -- using a 64-bit quantity vs. a 32-bit quantity isn't going
> to even come close to increasing disk access times by 1 millisecond,
> really, so it really is going to be a much smaller fraction when
> compared to the actual disk latency).
[snip]
> Is there some logical flaw in the above reasoning?

But those changes will affect even the fastpath, i.e. data that is already in the page/buffer caches, in which case we don't have to wait for disk access latency. Why would anyone who is working with a petabyte of data even consider not relying on essentially always hitting data that is available in the read-ahead cache?

Using similar numbers as presented. If we are working our way through every single block in a Petabyte filesystem, and the blocksize is 512 bytes, then the 1us in extra CPU cycles because of 64-bit operations would add, according to my back-of-the-envelope calculation, 2199023 seconds of CPU time, a bit more than 25 days.

Seriously, there is a lot more that needs to be done than introducing a 64-bit blocknumber. Effectively 512-byte blocks are far too small for that kind of data, and going to pagesize blocks (and increasing pagesize to 64KB or 2MB at the same time) is a solution that is far more likely to give good results, since it reduces both the total number of 'blocks' on the device and the total number of calls throughout kernel space, instead of increasing the cost per call.
Jan
Re: 64-bit block sizes on 32-bit systems
Ion Badulescu wrote:
> Are you being deliberately insulting, "L", or are you one of those users
> who bitch and scream for features they *need* at *any cost*, and who
> have never even opened up the book for Computer Architecture 101?
---
Sorry, I was borderline insulting. I'm getting pressure on personal fronts other than just here. But my degree is in computer science and I've had almost 20 years' experience programming things as small as 8080's w/ 4K ram on up. I'm familiar with the 'cost' of emulation.

> Let's try to keep the discussion civilized, shall we?
---
Certainly.

> Compile option or not, 64-bit arithmetic is unacceptable on IA32. The
> introduction of LFS was bad enough, we don't need yet another proof that
> IA32 sucks. Especially when there *are* better alternatives.
===
So if it is a compile option -- the majority of people wouldn't be affected, is that in agreement? Since the default would be to use the same arithmetic as we use now.

In fact, I posit that if anything, the majority of the people might be helped as the block_nr becomes a 'typed' value -- and perhaps the sector_nr as well. They remain the same size, but as a typed value the kernel gains increased integrity from the increased type checking. At worst, it finds no new bugs and there is no impact on speed. Are we in agreement so far?

Now let's look at the sites that want to process terabytes of data -- perhaps file systems up into the Petabyte range. Often I can see these being large multi-node (think 16-1024 clusters as are in use today for large super-clusters). If I was to characterize the performance of them, I'd likely see the CPU pegged at 100% with 99% usage in user space. Let's assume that increasing the block size decreases disk accesses by as much as 10% (you'll have to admit -- using a 64-bit quantity vs. a 32-bit quantity isn't going to even come close to increasing disk access times by 1 millisecond, really, so it really is going to be a much smaller fraction when compared to the actual disk latency).

Ok... but for the sake of argument using 10% -- that's still only 10% of the 1% spent in the system, or a slowdown of .1%. Now that's using a really liberal figure of 10%. If you look at the actual speed of 64-bit arithmetic vs. 32, we're likely talking -- upper bound, 10x the clocks for disk block arithmetic. Disk block arithmetic is a small fraction of time spent in the kernel. We have to be looking at *maximum* slowdowns in the range of a few hundred, maybe a few thousand, extra clocks. A 1000 extra clocks on a 1G machine is 1 microsecond, or approx 1/5000th your average seek latency on a *fast* hard disk. So instead of a 10% slowdown we are talking slowdowns in the 1/1000 range or less. Now that's a slowdown in the 1% that was being spent in the kernel, so now we've slowed the total program speed by .001% at the increased benefit (to that site) of being able to process those mega-gigs (Petabytes) of information. For a hit that is not noticeable to human perception, they go from not being able to use super-clusters of IA32 machines (for which HW and SW is cheap), to being able to use it. That's quite a cost savings for them.

Is there some logical flaw in the above reasoning?

-linda

--
L A Walsh | Trust Technology, Core Linux, SGI
[EMAIL PROTECTED] | Voice: (650) 933-5338
Re: 64-bit block sizes on 32-bit systems
On Tue, Mar 27, 2001 at 09:15:08AM -0800, LA Walsh wrote:
> Now let's look at the sites that want to process terabytes of data --
> perhaps file systems up into the petabyte range. Often I can see these
> being large multi-node setups (think 16-1024 node clusters, as are in
> use today for large super-clusters). If I were to characterize their
> performance, I'd likely see the CPU pegged at 100%, with 99% usage in
> user space. Let's assume that increasing the block size slows disk
> accesses by as much as 10% (you'll have to admit that using a 64-bit
> quantity vs. a 32-bit quantity isn't going to come close to increasing
> disk access times by 1 millisecond, so it really is going to be a much
> smaller fraction when compared to the actual disk latency).
[snip]
> Is there some logical flaw in the above reasoning?

But those changes will affect even the fast path, i.e. data that is already in the page/buffer caches, in which case we don't have to wait for disk access latency. Why would anyone who is working with a petabyte of data even consider not relying on essentially always hitting data that is available in the read-ahead cache?

Using numbers similar to those presented: if we are working our way through every single block in a petabyte filesystem, and the blocksize is 512 bytes, then the 1us of extra CPU cycles because of 64-bit operations would add, according to my back-of-the-envelope calculation, 2199023 seconds of CPU time -- a bit more than 25 days.

Seriously, there is a lot more that needs to be done than introducing a 64-bit block number. Effectively, 512-byte blocks are far too small for that kind of data, and going to pagesize blocks (and increasing the pagesize to 64KB or 2MB at the same time) is a solution that is far more likely to give good results, since it reduces both the total number of 'blocks' on the device and the total number of calls throughout kernel space, instead of increasing the cost per call.

Jan
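Jan's figure is straightforward to reproduce, taking 1 PB = 2^50 bytes and the 1 µs per-block overhead assumed in the message:

```python
# Reproduce the "25 days of CPU time" estimate: one petabyte of
# 512-byte blocks at 1 microsecond of extra overhead per block.

petabyte = 2**50                  # 1024 TB in bytes
block_size = 512
blocks = petabyte // block_size   # 2**41, about 2.2e12 blocks

overhead_s = 1e-6                 # assumed 1 us of extra CPU per block operation
cpu_seconds = blocks * overhead_s
print(int(cpu_seconds))           # 2199023 -- the figure quoted in the message
print(cpu_seconds / 86400)        # ~25.4 days
```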
Re: 64-bit block sizes on 32-bit systems
LA Walsh <[EMAIL PROTECTED]> wrote:
> Ion Badulescu wrote:
> > Compile option or not, 64-bit arithmetic is unacceptable on IA32. The
> > introduction of LFS was bad enough, we don't need yet another proof
> > that IA32 sucks. Especially when there *are* better alternatives.
> ===
> So if it is a compile option, the majority of people wouldn't be
> affected -- is that in agreement? The default would be to use the same
> arithmetic we use now. In fact, I posit that if anything, the majority
> of people might be helped, as block_nr becomes a 'typed' value -- and
> perhaps sector_nr as well. They remain the same size, but as typed
> values the kernel gains increased integrity from the increased type
> checking. At worst, it finds no new bugs and there is no impact on
> speed. Are we in agreement so far?
>
> Now let's look at the sites that want to process terabytes of data --
> perhaps file systems up into the petabyte range. Often I can see these
> being large multi-node setups (think 16-1024 node clusters, as are in
> use today for large super-clusters). If I were to characterize their
> performance, I'd likely see the CPU pegged at 100%, with 99% usage in
> user space. Let's assume that increasing the block size slows disk
> accesses by as much as 10% (you'll have to admit that using a 64-bit
> quantity vs. a 32-bit quantity isn't going to come close to increasing
> disk access times by 1 millisecond, so it really is going to be a much
> smaller fraction when compared to the actual disk latency).

Relatively small quibble -- current large clusters (an SP3: 330 nodes, 4 CPUs/node) get around 85% to 90% (real user) user-mode total CPU. The rest of user mode is attributed to overhead. Why:

1. Inter-node communication/synchronization
2. Memory bus saturation
3. Users usually use only 3 CPUs/node and allow the last CPU to handle
   filesystem/network/administration/batch-handling functions. Using the
   last CPU in the node for part of the job reduces overall throughput.

> Ok... but for the sake of argument, using 10% -- that's still only 10%
> of the 1% spent in the system, or a slowdown of 0.1%. And that's using a
> really liberal figure of 10%. If you look at the actual speed of 64-bit
> arithmetic vs. 32-bit, we're likely talking, as an upper bound, 10x the
> clocks for disk-block arithmetic. Disk-block arithmetic is a small
> fraction of time spent in the kernel, so we have to be looking at
> *maximum* slowdowns in the range of a few hundred, maybe a few thousand,
> extra clocks. A thousand extra clocks on a 1GHz machine is 1
> microsecond, or approximately 1/5000th of the average seek latency on a
> *fast* hard disk. So instead of a 10% slowdown we are talking slowdowns
> in the 1/1000 range or less. And that's a slowdown in the 1% that was
> being spent in the kernel, so now we've slowed the total program speed
> by 0.001%, with the increased benefit (to that site) of being able to
> process those mega-gigs (petabytes) of information. For a hit that is
> not noticeable to human perception, they go from not being able to use
> super-clusters of IA32 machines (for which hardware and software are
> cheap) to being able to use them. That's quite a cost savings for them.
> Is there some logical flaw in the above reasoning?

- Jesse I Pollard, II
Email: [EMAIL PROTECTED]

Any opinions expressed are solely my own.
Re: 64-bit block sizes on 32-bit systems
On Tue, Mar 27, 2001 at 09:15:08AM -0800, LA Walsh wrote:
> > Now let's look at the sites that want to process terabytes of data --
> > perhaps file systems up into the petabyte range. Often I can see these
> > being large multi-node setups (think 16-1024 node clusters, as are in
> > use today for large super-clusters). If I were to characterize their
> > performance, I'd likely see the CPU pegged at 100%, with 99% usage in
> > user space. Let's assume that increasing the block size slows disk
> > accesses by as much as 10% (you'll have to admit that using a 64-bit
> > quantity vs. a 32-bit quantity isn't going to come close to increasing
> > disk access times by 1 millisecond, so it really is going to be a much
> > smaller fraction when compared to the actual disk latency).
> [snip]
> > Is there some logical flaw in the above reasoning?
>
> But those changes will affect even the fast path, i.e. data that is
> already in the page/buffer caches, in which case we don't have to wait
> for disk access latency. Why would anyone who is working with a petabyte
> of data even consider not relying on essentially always hitting data
> that is available in the read-ahead cache?

It depends entirely on the application. Where the cache can contain 20% of the data, most accesses should already be in memory. If the data is significantly larger, there is a high chance that the data will not be there.

> Using numbers similar to those presented: if we are working our way
> through every single block in a petabyte filesystem, and the blocksize
> is 512 bytes, then the 1us of extra CPU cycles because of 64-bit
> operations would add, according to my back-of-the-envelope calculation,
> 2199023 seconds of CPU time -- a bit more than 25 days.

Ummm... I don't think it adds that much. You seem to be leaving out the overlap of disk I/O and computation for read-ahead. This should eliminate the majority of the delay effect.

> Seriously, there is a lot more that needs to be done than introducing a
> 64-bit block number. Effectively, 512-byte blocks are far too small for
> that kind of data, and going to pagesize blocks (and increasing the
> pagesize to 64KB or 2MB at the same time) is a solution that is far more
> likely to give good results, since it reduces both the total number of
> 'blocks' on the device and the total number of calls throughout kernel
> space, instead of increasing the cost per call.

Talk about adding overhead... How long do you think it takes to read a 2MB block (not to mention the time to update that page)? The additional contention on the Fibre Channel I/O alone might kill it if the filesystem is busy.

Granted, 512 bytes could be considered too small for some things, but once you pass 32K you start adding a lot of rotational-delay problems. I've used filesystems with 256K blocks -- they are slow compared to the throughput using 32K. I wasn't the one running the benchmarks, but with a MaxStrat 400GB RAID, 256K-sized data transfers were much slower (around 3 times slower) than 32K. (The target application was a GIS server using Oracle.)

- Jesse I Pollard, II
Email: [EMAIL PROTECTED]

Any opinions expressed are solely my own.
Re: 64-bit block sizes on 32-bit systems
On Tue, Mar 27, 2001 at 01:57:42PM -0600, Jesse Pollard wrote:
> > Using numbers similar to those presented: if we are working our way
> > through every single block in a petabyte filesystem, and the blocksize
> > is 512 bytes, then the 1us of extra CPU cycles because of 64-bit
> > operations would add, according to my back-of-the-envelope calculation,
> > 2199023 seconds of CPU time -- a bit more than 25 days.
>
> Ummm... I don't think it adds that much. You seem to be leaving out the
> overlap of disk I/O and computation for read-ahead. This should
> eliminate the majority of the delay effect.

1024 TB should be around 2*10^12 512-byte blocks; at 1us of "assumed" overhead per block operation that is 2*10^6 seconds, so I believe I'm pretty close there. I am considering everything being "available in the cache", i.e. no waiting for disk access.

> > Seriously, there is a lot more that needs to be done than introducing
> > a 64-bit block number. Effectively, 512-byte blocks are far too small
> > for that kind of data, and going to pagesize blocks (and increasing
> > the pagesize to 64KB or 2MB at the same time) is a solution that is
> > far more likely to give good results, since it reduces both the total
> > number of 'blocks' on the device and the total number of calls
> > throughout kernel space, instead of increasing the cost per call.
>
> Talk about adding overhead... How long do you think it takes to read a
> 2MB block (not to mention the time to update that page)? The additional
> contention on the Fibre Channel I/O alone might kill it if the
> filesystem is busy.

The time to update the page tables is identical to the time to update a 4KB page when the OS is using a 2MB pagesize. Of course it will take more time to load the data into the page; however, it should be a consecutive stretch of data on disk, which should give a more efficient transfer than small blocks scattered around the disk.

> Granted, 512 bytes could be considered too small for some things, but
> once you pass 32K you start adding a lot of rotational-delay problems.
> I've used filesystems with 256K blocks -- they are slow compared to the
> throughput using 32K. I wasn't the one running the benchmarks, but with
> a MaxStrat 400GB RAID, 256K-sized data transfers were much slower
> (around 3 times slower) than 32K. (The target application was a GIS
> server using Oracle.)

But your subsystem (the disk) was probably still using 512-byte blocks, possibly scattered. And the OS was still using 4KB pages; it takes more time to reclaim and gather 64 pages per I/O operation than one. That's why I'm saying that the pagesize needs to scale along with the blocksize. The application might have been assuming a small block size as well, and the OS was told to do several read/modify/write cycles, perhaps even 512 times as much as necessary.

I'm not saying that the current system will perform well when working with large blocks, but compared to increasing the size of block_t, a larger blocksize has more potential to give improvements in the long term without adding an unrecoverable performance hit.

Jan
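The reduction Jan is arguing for is dramatic in absolute numbers. A quick sketch, using the block sizes mentioned in the thread, of how many block handles (and hence per-block calls) a one-petabyte device needs:

```python
# How much a larger blocksize shrinks the number of block handles
# for a one-petabyte device, for the block sizes discussed above.

petabyte = 2**50
for block_size in (512, 4096, 65536, 2 * 2**20):   # 512B, 4K, 64K, 2MB
    blocks = petabyte // block_size
    print(f"{block_size:>8} bytes/block: {blocks:>14} blocks")

# 512B -> 2**41 blocks; 2MB -> 2**29: a factor-of-4096 reduction.
print((petabyte // 512) // (petabyte // (2 * 2**20)))   # 4096
```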
Re: 64-bit block sizes on 32-bit systems
Jan Harkes wrote:
> On Tue, Mar 27, 2001 at 01:57:42PM -0600, Jesse Pollard wrote:
> > > Using numbers similar to those presented: if we are working our way
> > > through every single block in a petabyte filesystem, and the
> > > blocksize is 512 bytes, then the 1us of extra CPU cycles because of
> > > 64-bit operations would add, according to my back-of-the-envelope
> > > calculation, 2199023 seconds of CPU time -- a bit more than 25 days.
> > Ummm... I don't think it adds that much. You seem to be leaving out
> > the overlap of disk I/O and computation for read-ahead. This should
> > eliminate the majority of the delay effect.
> 1024 TB should be around 2*10^12 512-byte blocks; at 1us of "assumed"
> overhead per block operation that is 2*10^6 seconds, so I believe I'm
> pretty close there. I am considering everything being "available in the
> cache", i.e. no waiting for disk access.
---
If everything being used is only used from the cache, then the application probably doesn't need 64-bit block support. I submit that your argument may be flawed in the assumption that if an application needs multi-terabyte files and devices, most of the data will be in the in-memory cache.

> The time to update the page tables is identical to the time to update a
> 4KB page when the OS is using a 2MB pagesize. Of course it will take
> more time to load the data into the page; however, it should be a
> consecutive stretch of data on disk, which should give a more efficient
> transfer than small blocks scattered around the disk.
---
Not if you are doing a lot of random reads where you only need 1-2K of data. The read time of the extra 2MB-minus-1K would seem to eat into any performance boost gained by the large pagesize.

> > Granted, 512 bytes could be considered too small for some things, but
> > once you pass 32K you start adding a lot of rotational-delay problems.
> > I've used filesystems with 256K blocks -- they are slow compared to
> > the throughput using 32K. I wasn't the one running the benchmarks, but
> > with a MaxStrat 400GB RAID, 256K-sized data transfers were much slower
> > (around 3 times slower) than 32K. (The target application was a GIS
> > server using Oracle.)
> But your subsystem (the disk) was probably still using 512-byte blocks,
> possibly scattered. And the OS was still using 4KB pages; it takes more
> time to reclaim and gather 64 pages per I/O operation than one. That's
> why I'm saying that the pagesize needs to scale along with the
> blocksize. The application might have been assuming a small block size
> as well, and the OS was told to do several read/modify/write cycles,
> perhaps even 512 times as much as necessary. I'm not saying that the
> current system will perform well when working with large blocks, but
> compared to increasing the size of block_t, a larger blocksize has more
> potential to give improvements in the long term without adding an
> unrecoverable performance hit.
---
That's totally application-dependent. Database applications might tend to skip around in the data and do short reads/writes over a very large file. Large block sizes will degrade their performance. This was the idea of making it a *configurable* option. If you need it, configure it. Same with block size -- that should likely have a wider range for configuration as well. But configuration (and ideally auto-configuration where possible) seems the ultimate win-win situation.

-l
--
The above thoughts are my own and do not necessarily represent those of my employer.
L A Walsh | Trust Technology, Core Linux, SGI
[EMAIL PROTECTED] | Voice: (650) 933-5338
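Walsh's random-read objection can be quantified under some assumed drive parameters. The ~5 ms seek and ~30 MB/s sustained transfer rate below are hypothetical figures for a disk of that era, not numbers from the thread:

```python
# Cost of fetching a 2MB block to satisfy a 2KB random read,
# versus fetching a 4KB block, under assumed drive parameters.

seek_s = 0.005                 # assumed average seek + rotational delay
transfer_bps = 30_000_000      # assumed ~30 MB/s sustained media rate

def read_time(block_bytes):
    """Seek once, then stream one block off the platter."""
    return seek_s + block_bytes / transfer_bps

small = read_time(4 * 1024)         # ~5.1 ms, dominated by the seek
large = read_time(2 * 2**20)        # ~75 ms, dominated by the transfer
print(round(large / small, 1))      # 14.6: the large block costs ~15x as much
print(2 * 1024 / (2 * 2**20))       # and under 0.1% of its bytes were wanted
```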
Re: 64-bit block sizes on 32-bit systems
Jan Harkes <[EMAIL PROTECTED]> wrote:
> On Tue, Mar 27, 2001 at 01:57:42PM -0600, Jesse Pollard wrote:
> > > Using numbers similar to those presented: if we are working our way
> > > through every single block in a petabyte filesystem, and the
> > > blocksize is 512 bytes, then the 1us of extra CPU cycles because of
> > > 64-bit operations would add, according to my back-of-the-envelope
> > > calculation, 2199023 seconds of CPU time -- a bit more than 25 days.
> > Ummm... I don't think it adds that much. You seem to be leaving out
> > the overlap of disk I/O and computation for read-ahead. This should
> > eliminate the majority of the delay effect.
> 1024 TB should be around 2*10^12 512-byte blocks; at 1us of "assumed"
> overhead per block operation that is 2*10^6 seconds, so I believe I'm
> pretty close there. I am considering everything being "available in the
> cache", i.e. no waiting for disk access.

That would be true for small files (< 5GB). I have to deal with files that may be 20-100 GB. Except for the largest systems (200GB of main memory), the data will NOT be in the cache except for ~50% of the time (assuming only one user).

> > > Seriously, there is a lot more that needs to be done than
> > > introducing a 64-bit block number. Effectively, 512-byte blocks are
> > > far too small for that kind of data, and going to pagesize blocks
> > > (and increasing the pagesize to 64KB or 2MB at the same time) is a
> > > solution that is far more likely to give good results, since it
> > > reduces both the total number of 'blocks' on the device and the
> > > total number of calls throughout kernel space, instead of increasing
> > > the cost per call.
> > Talk about adding overhead... How long do you think it takes to read
> > a 2MB block (not to mention the time to update that page)? The
> > additional contention on the Fibre Channel I/O alone might kill it if
> > the filesystem is busy.
> The time to update the page tables is identical to the time to update a
> 4KB page when the OS is using a 2MB pagesize. Of course it will take
> more time to load the data into the page; however, it should be a
> consecutive stretch of data on disk, which should give a more efficient
> transfer than small blocks scattered around the disk.

You assume the file is accessed sequentially. The weather models don't do that. They do have some locality, but only in a 3D sense. When you include time, it becomes closer to a random disk-block reference when everything has to be linearized.

> > Granted, 512 bytes could be considered too small for some things, but
> > once you pass 32K you start adding a lot of rotational-delay problems.
> > I've used filesystems with 256K blocks -- they are slow compared to
> > the throughput using 32K. I wasn't the one running the benchmarks, but
> > with a MaxStrat 400GB RAID, 256K-sized data transfers were much slower
> > (around 3 times slower) than 32K. (The target application was a GIS
> > server using Oracle.)
> But your subsystem (the disk) was probably still using 512-byte blocks,
> possibly scattered. And the OS was still using 4KB pages; it takes more
> time to reclaim and gather 64 pages per I/O operation than one. That's
> why I'm saying that the pagesize needs to scale along with the
> blocksize.

It wasn't -- the "disks" were composed of groups of 5 drives in a RAID, striped for speed and spread across 5 SCSI III controllers. Each attached RAID had 16MB of internal cache. I think the controllers were using an entire sector read (32K).

> The application might have been assuming a small block size as well,
> and the OS was told to do several read/modify/write cycles, perhaps
> even 512 times as much as necessary.

There was some of that, but not much. Oracle (as I recall) allows for the specification of transfer size. This also brings up the problem of small files. Allocating 2MB per file would waste quite a bit of disk space (assuming 5-10 million files with only 15% having 25GB or more).

> I'm not saying that the current system will perform well when working
> with large blocks, but compared to increasing the size of block_t, a
> larger blocksize has more potential to give improvements in the long
> term without adding an unrecoverable performance hit.

Not when the filesystem is required for general use. It only makes it simpler to actually have a large filesystem; it doesn't help when it must be used. Now you are saying that the throughput WILL go down, but only if you use large block sizes. I can go along with making block sizes up to 8K, even 32K for special circumstances (even 64K for dedicated use), but not larger. NFS overhead on file I/O becomes way too excessive (...the worst example now is having to read a 2MB block to update 512 bytes, then write it back... :-)

- Jesse I Pollard, II
Email: [EMAIL PROTECTED]

Any opinions expressed are solely my own.
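Jesse's small-file point is easy to illustrate. The file count below is from his message; the average small-file size is an assumption of mine, picked purely for the sketch:

```python
# Internal fragmentation from a 2MB minimum allocation, for a mix
# like the one described: millions of files, most of them small.

n_files = 10_000_000          # upper end of the "5-10 million files" figure
small_fraction = 0.85         # only 15% of the files are large
avg_small_file = 64 * 1024    # assumed average small-file size (hypothetical)

block = 2 * 2**20             # 2MB allocation unit
wasted_per_file = block - avg_small_file
wasted = int(n_files * small_fraction) * wasted_per_file
print(wasted / 2**40)         # ~15.7 TB lost to slack space
```

Under these assumptions, roughly 15 TB of a filesystem evaporates into slack space before any large file is stored.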
Re: 64-bit block sizes on 32-bit systems
Hi,

Just a brief add to the discussion, besides which I have a vested interest in this!

I do not believe that you can make the addressability of a device larger at the expense of granularity of address space at the bottom end. Just because ext2 has a single size for metadata does not mean everything you put on the disks does. XFS filesystems, for example, can be made with block sizes from 512 bytes to 64Kbytes (OK, not working on Linux across this range yet, but it will). In all of these cases we have chunks of metadata which are 512 bytes long, and we have chunks bigger than the blocksize. The 512-byte chunks are the superblock and the heads of the freespace structures; there are multiples of them through the filesystem. To top that, we have disk write-ordering constraints that could mean that, for two of the 512-byte chunks next to each other, one must be written to disk now to free log space, while the other must not be written to disk because it is in a transaction. We would be forced to do read-modify-write down at some lower level -- wait, the lower levels would not have the addressability.

There are probably other things which will not fly if you lose the addressing granularity. Volume headers and suchlike would be one possibility.

No, I don't have a magic-bullet solution, but I do not think that just increasing the granularity of the addressing is the correct answer, and yes, I do agree that just growing the buffer_head fields is not perfect either.

Steve Lord

p.s. There was mention of a bigger page size; it is not hard to fix, but the swap path will not even work with 64K pages right now.
Re: 64-bit block sizes on 32-bit systems
Steve Lord wrote:
> Just a brief add to the discussion, besides which I have a vested
> interest in this!

I'll add my little comments as well, and hopefully not start a flamewar... :)

[snip comments about blocksize, etc.]

Here's a real-life example of something that most of you will probably hate me for mentioning: HFS uses variable-sized blocks (made up of multiple 512-byte sectors), but stores block numbers as a 16-bit value. (I know, everyone will say, "We're talking about moving from 32 to 64 bits." Keep listening.) This gave great performance on the then-current massive storage of a 20MB drive. However, when it became possible to get the absolutely gigantic hard drive of 1GB, it became more and more obvious that this was a drawback causing a huge amount of wasted space. Apple had to design a new filesystem (HFS+) that was able to represent blocks with a 32-bit number to overcome the effective limitation on how big a filesystem could be.

It's getting to the point now that it's easily possible to put together a disk array large enough that even referring to blocks with a 32-bit value requires relatively large blocks. I don't know if we have very many filesystems that would support this feature, but it will become important a lot sooner than anyone may be thinking. Obviously this case isn't a perfect fit for the situation, since HFS was designed to be read by 32-bit machines, and the upgrade to 32 bits didn't incur a CPU penalty, just a bus bandwidth problem. Also, I'm coming from a platform that actually can do a decent job of 64-bit, unlike x86, but we shouldn't disallow people from doing bigger and better things. It's become very popular lately to position Linux as an enterprise-ready system, and this is something that will be expected. People will want to access a multi-TB database as a single file, as well as do other things that may seem crazy to most people now.

I understand people's aversion to the #ifdefs in the code, but if the changes are made in a sane way, it can still be clean and easy to maintain. It's worth it to add a little complexity (particularly as an option) for a feature that people will be demanding in the relatively near future. It might be a good idea to wait for 2.5, though...

Brad Boyer
[EMAIL PROTECTED]

P.S.: No, I have no personal reason to need any of this 64-bit filesystem stuff. Just trying to point out possibilities. Don't expect me to actually be writing this stuff...
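The HFS squeeze Brad describes falls straight out of the arithmetic: with a 16-bit allocation-block number, the block size has to grow with the volume, and small files pay the price:

```python
# With a 16-bit block number, a volume can have at most 65,536
# allocation blocks, so the block size must scale with the volume.

MAX_BLOCKS = 2**16            # 16-bit allocation-block numbers (HFS)

def min_block_size(volume_bytes, sector=512):
    """Smallest multiple-of-512 block size that can cover the volume."""
    size = -(-volume_bytes // MAX_BLOCKS)      # ceiling division
    return -(-size // sector) * sector         # round up to a sector multiple

print(min_block_size(20 * 2**20))   # 512: fine on a 20MB drive
print(min_block_size(2**30))        # 16384: every tiny file eats 16K on a 1GB drive
```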
Re: 64-bit block sizes on 32-bit systems
On Mon, 26 Mar 2001, Jonathan Morton wrote:
>>These are NOT the only 64-bit systems - Intel, PPC, IBM (in various guises).
>>If you need raw compute power, the Alpha is pretty good (we have over a
>>1000 in a Cray T3..).
>
>Best of all, the PowerPC and the POWER are binary-compatible to a very
>large degree - just the latter has an extra set of 64-bit instructions.
>What was that I was hearing about having to redevelop or recompile your
>apps for 64-bit?
>
>I can easily imagine a 64-bit filesystem being accessed by a bunch of
>RS/6000s and monitored using an old PowerMac. Goodness, the PowerMac 9600
>even has 6 PCI slots to put all those SCSI-RAID and Ethernet cards in. :)

Save the money -- get one Fibre Channel interface and connect to all of that through it...

--
- Jesse I Pollard, II
Email: [EMAIL PROTECTED]

Any opinions expressed are solely my own.
Re: 64-bit block sizes on 32-bit systems
>These are NOT the only 64-bit systems - Intel, PPC, IBM (in various guises).
>If you need raw compute power, the Alpha is pretty good (we have over a
>1000 in a Cray T3..).

Best of all, the PowerPC and the POWER are binary-compatible to a very large degree -- just the latter has an extra set of 64-bit instructions. What was that I was hearing about having to redevelop or recompile your apps for 64-bit?

I can easily imagine a 64-bit filesystem being accessed by a bunch of RS/6000s and monitored using an old PowerMac. Goodness, the PowerMac 9600 even has 6 PCI slots to put all those SCSI-RAID and Ethernet cards in. :)

--
from: Jonathan "Chromatix" Morton
mail: [EMAIL PROTECTED] (not for attachments)
big-mail: [EMAIL PROTECTED]
uni-mail: [EMAIL PROTECTED]

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-----END GEEK CODE BLOCK-----
Re: 64-bit block sizes on 32-bit systems
From: "LA Walsh" <[EMAIL PROTECTED]>
> Manfred Spraul wrote:
>>> 4k page size * 2GB = 8TB.
>>
>> Try it.
>> If your drive (array) is larger than 512byte*4G (4TB) linux will eat
>> your data.
> ---
> I have a block device that doesn't use 'sectors'. It
> only uses the logical block size (which is currently set for
> 1K). Seems I could up that to the max blocksize (4k?) and
> get 8TB... No?
>
> I don't use the generic block make request (have my own).

Which field do you access? bh->b_blocknr instead of bh->b_rsector?

There were plans to split the buffer_head into 2 structures: buffer
cache data and the block io data. b_blocknr is buffer cache only; no
driver should access it.

http://groups.google.com/groups?q=NeilBrown+io_head=en==off num=1=928643305=1

--
Manfred
Re: 64-bit block sizes on 32-bit systems
Martin Dalecki <[EMAIL PROTECTED]>:
> "Eric W. Biederman" wrote:
>> Matthew Wilcox <[EMAIL PROTECTED]> writes:
>>> On Mon, Mar 26, 2001 at 10:47:13AM -0700, Andreas Dilger wrote:
>>>> What do you mean by problems 5 years down the road? The real issue is that
>>>> this 32-bit block count limit affects composite devices like MD RAID and
>>>> LVM today, not just individual disks. There have been several postings
>>>> I have seen with people having a problem _today_ with a 2TB limit on
>>>> devices.
>>>
>>> people who can afford 2TB of disc can afford to buy a 64-bit processor.
>>
>> Currently that doesn't solve the problem as block_nr is held in an int.
>> And as gcc compiles an int to a 32bit number on a 64bit processor, the
>> problem still isn't solved.
>>
>> That at least we need to address.
>
> And then you must face the fact that there may be the need for
> some off-the-shelf software, which isn't well supported on the
> corresponding 64 bit architectures... as well. So the
> argument doesn't hold up to the reality in any way.

You are missing the point - I may need to use a 32 bit system to
monitor a large file system. I don't need the compute power of most
64 bit systems to monitor user file activity.

> BTW. For many reasons 32 bit architectures are, in
> respect of some application schemes, *faster* than 64.

Which is why I want to use them with a 64 bit file system. Some of the
weather models run here have been known to exceed a 100 GB data file.
Yes, one file. Most only need 20GB, but there are a couple of hundred
of them...

> Ultra III in 64 mode just crawls in comparison to 32.

Depends on what you are doing. If you need to handle large arrays of
floating point it is reasonable (not great, just reasonable).

> Alpha - unfortunately an orphaned and dying architecture... which
> is not well supported by sw vendors...

These are NOT the only 64 bit systems - Intel, PPC, IBM (in various guises).
If you need raw compute power, the Alpha is pretty good (we have over a
1000 in a Cray T3..).

- Jesse I Pollard, II
Email: [EMAIL PROTECTED]

Any opinions expressed are solely my own.
Re: 64-bit block sizes on 32-bit systems
On Mon, 26 Mar 2001, Andreas Dilger wrote:
> Matthew Wilcox writes:
>> people who can afford 2TB of disc can afford to buy a 64-bit processor.
> This whole "64-bit" fallacy has got to stop.

Indeed.

> Now it is "anybody who needs > 2TB disk should use a 64-bit CPU", soon
> to be wrong.

It was already wrong in 1995.

-Dan
Re: 64-bit block sizes on 32-bit systems
> "Matthew" == Matthew Wilcox <[EMAIL PROTECTED]> writes:

Matthew> On Mon, Mar 26, 2001 at 10:47:13AM -0700, Andreas Dilger
Matthew> wrote:
>> What do you mean by problems 5 years down the road? The real issue
>> is that this 32-bit block count limit affects composite devices
>> like MD RAID and LVM today, not just individual disks. There have
>> been several postings I have seen with people having a problem
>> _today_ with a 2TB limit on devices.

Matthew> people who can afford 2TB of disc can afford to buy a 64-bit
Matthew> processor.

Oh great, and migrating a large application to a new architecture is
soo cheap. Disk costs nothing these days and there is a legitimate
need here.

Jes
Re: 64-bit block sizes on 32-bit systems
"Eric W. Biederman" wrote:
> Matthew Wilcox <[EMAIL PROTECTED]> writes:
>> On Mon, Mar 26, 2001 at 10:47:13AM -0700, Andreas Dilger wrote:
>>> What do you mean by problems 5 years down the road? The real issue is that
>>> this 32-bit block count limit affects composite devices like MD RAID and
>>> LVM today, not just individual disks. There have been several postings
>>> I have seen with people having a problem _today_ with a 2TB limit on
>>> devices.
>>
>> people who can afford 2TB of disc can afford to buy a 64-bit processor.
>
> Currently that doesn't solve the problem as block_nr is held in an int.
> And as gcc compiles an int to a 32bit number on a 64bit processor, the
> problem still isn't solved.
>
> That at least we need to address.

And then you must face the fact that there may be the need for some
off-the-shelf software, which isn't well supported on the corresponding
64 bit architectures... as well. So the argument doesn't hold up to the
reality in any way.

BTW. For many reasons 32 bit architectures are, in respect of some
application schemes, *faster* than 64. Ultra III in 64 mode just crawls
in comparison to 32. Alpha - unfortunately an orphaned and dying
architecture... which is not well supported by sw vendors...
Re: 64-bit block sizes on 32-bit systems
Manfred Spraul wrote:
>> 4k page size * 2GB = 8TB.
>
> Try it.
> If your drive (array) is larger than 512byte*4G (4TB) linux will eat
> your data.
---
I have a block device that doesn't use 'sectors'. It only uses the
logical block size (which is currently set for 1K). Seems I could up
that to the max blocksize (4k?) and get 8TB... No?

I don't use the generic block make request (have my own).

--
L A Walsh | Trust Technology, Core Linux, SGI
[EMAIL PROTECTED] | Voice: (650) 933-5338
Re: 64-bit block sizes on 32-bit systems
On Mon, 26 Mar 2001, Matthew Wilcox wrote:
> people who can afford 2TB of disc can afford to buy a 64-bit processor.

You realise that this'll double the price of storage? ;)
(at least, in a year or two)

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/
http://distro.conectiva.com.br/
Re: 64-bit block sizes on 32-bit systems
> On Mon, Mar 26, 2001 at 08:39:21AM -0800, LA Walsh wrote:
>> I vaguely remember a discussion about this a few months back.
>> If I remember, the reasoning was it would unnecessarily slow
>> down smaller systems that would never have block devices in
>> the 4-28T range attached.
>
> 4k page size * 2GB = 8TB.
>
> i consider it much more likely on such systems that the page size will
> be increased to maybe 16 or 64k which would give us 32TB or 128TB.
> you keep on trying to increase the size of types without looking at
> what gcc outputs in the way of code that manipulates 64-bit types.
> seriously, why don't you just try it? see what the performance is.
> see what the code size is. then come back with some numbers. and i mean
> numbers, not `it doesn't feel any slower'.
>
> personally, i'm going to see what the situation looks like in 5 years time
> and try to solve the problem then. there're enough real problems with the
> VFS today that i don't feel inclined to fix tomorrow's potential problems.

I don't feel that it is that far away... IBM has already released a 64
CPU intel based system (NUMA). We already have systems in that class
(though 64 bit based) that use 5 TB file systems. The need is coming,
and appears to be coming fast. It should be resolved during the
improvements to the VFS.

A second reason to include it in the VFS is that the low level
filesystem implementation would NOT be required to use it. If the
administrator CHOOSES to access a 16TB filesystem from a workstation,
then it should be possible (likely something like GFS, where the
administrator is just monitoring things, would be reasonable for a 32
bit system to do).

As I see it, the VFS itself doesn't really care what the block size is;
it just carries relatively opaque values that the filesystem
implementation uses. Most of the overhead should just be copying an
extra 4 bytes around.

- Jesse I Pollard, II
Email: [EMAIL PROTECTED]

Any opinions expressed are solely my own.
Re: 64-bit block sizes on 32-bit systems
Matthew Wilcox <[EMAIL PROTECTED]> writes:
> On Mon, Mar 26, 2001 at 10:47:13AM -0700, Andreas Dilger wrote:
>> What do you mean by problems 5 years down the road? The real issue is that
>> this 32-bit block count limit affects composite devices like MD RAID and
>> LVM today, not just individual disks. There have been several postings
>> I have seen with people having a problem _today_ with a 2TB limit on
>> devices.
>
> people who can afford 2TB of disc can afford to buy a 64-bit processor.

Currently that doesn't solve the problem as block_nr is held in an int.
And as gcc compiles an int to a 32bit number on a 64bit processor, the
problem still isn't solved.

That at least we need to address.

Eric
Re: 64-bit block sizes on 32-bit systems
On Mon, Mar 26, 2001 at 08:01:21PM +0200, Manfred Spraul wrote:
> drivers/block/ll_rw_blk.c, in submit_bh()
>> bh->b_rsector = bh->b_blocknr * (bh->b_size >> 9);
>
> But it shouldn't cause data corruptions:
> It was discussed a few months ago, and iirc LVM refuses to create too
> large volumes.

Ah yes, I'd forgotten the block layer still works in terms of 512-byte
blocks.

--
Revolutions do not require corporate support.
Re: 64-bit block sizes on 32-bit systems
On Mon, Mar 26, 2001 at 10:47:13AM -0700, Andreas Dilger wrote:
> What do you mean by problems 5 years down the road? The real issue is that
> this 32-bit block count limit affects composite devices like MD RAID and
> LVM today, not just individual disks. There have been several postings
> I have seen with people having a problem _today_ with a 2TB limit on
> devices.

people who can afford 2TB of disc can afford to buy a 64-bit processor.

--
Revolutions do not require corporate support.
Re: 64-bit block sizes on 32-bit systems
>> I vaguely remember a discussion about this a few months back.
>> If I remember, the reasoning was it would unnecessarily slow
>> down smaller systems that would never have block devices in
>> the 4-28T range attached.
>
> 4k page size * 2GB = 8TB.

Try it.
If your drive (array) is larger than 512byte*4G (4TB) linux will eat
your data.

drivers/block/ll_rw_blk.c, in submit_bh()
> bh->b_rsector = bh->b_blocknr * (bh->b_size >> 9);

But it shouldn't cause data corruptions:
It was discussed a few months ago, and iirc LVM refuses to create too
large volumes.

--
Manfred
Re: 64-bit block sizes on 32-bit systems
LA Walsh <[EMAIL PROTECTED]> writes:
> I vaguely remember a discussion about this a few months back.
> If I remember, the reasoning was it would unnecessarily slow
> down smaller systems that would never have block devices in
> the 4-28T range attached.

With classic 512 byte sectors the top size is right about 2TB.
The basic thought is that 64bit numbers tend to suck, so we don't want
them in any fast paths on a 32bit system.

> However, isn't it possible there will continue to be a series
> of P-IV,V,VI,VII ...etc, addons that will be used for sometime
> to come. I've even heard it suggested that we might see
> 2 or more CPU's on a single chip as a way to increase cpu
> capacity w/o driving up clock speed. Given the cheapness of
> .25T drives now, seeing the possibility of 4T drives doesn't seem
> that remote (maybe 5 years?).
>
> Side question: does the 32-bit block size limit also apply to
> RAID disks or does it use a different block-nr type?

For now yes it does.

> So...is it the plan, or has it been thought about -- 'abstracting'
> block numbers as a typedef 'block_nr', then at compile time
> having it be selectable as to whether or not this was to
> be a 32-bit or 64 bit quantity -- that way older systems would
> lose no efficiency. Drivers that couldn't be or hadn't been
> ported to use 'block_nr' could default to being disabled if
> 64-bit blocks were selected, etc.
>
> So has this idea been tossed about and/or previously thrashed?

Using a 64bit number on 32bit systems has so far been trashed.
Though this does look like a real problem that needs to be solved at
some point. I doubt we can wait past 2.5 though if we want the code
ready when the hardware is.

Eric
Re: 64-bit block sizes on 32-bit systems
Matthew Wilcox wrote:
> On Mon, Mar 26, 2001 at 08:39:21AM -0800, LA Walsh wrote:
>> I vaguely remember a discussion about this a few months back.
>> If I remember, the reasoning was it would unnecessarily slow
>> down smaller systems that would never have block devices in
>> the 4-28T range attached.
>
> 4k page size * 2GB = 8TB.
---
Drat... was being more optimistic -- you're right, the block_nr can be
negative. Somehow thought page size could be 8K -- living in future
land. That just makes the limitations even closer at hand... :-(

> you keep on trying to increase the size of types without looking at
> what gcc outputs in the way of code that manipulates 64-bit types.
---
Maybe someone will backport some of the features of the IA-64 code
generator into 'gcc'. I've been told that in some cases it's a 2.5x
performance difference. If 'gcc' is generating bad code, then maybe
the 'gcc' people will increase the quality of their code -- I'm sure
they are just as eagerly working on gcc improvements as we are kernel
improvements. When I worked on the PL/M compiler project at Intel, I
know our code-optimization guy would spend endless cycles trying to
get better optimization out of the code. He got great joy out of doing
so -- and that was almost 20 years ago -- and code generation has come
a *long* way since then.

> seriously, why don't you just try it? see what the performance is.
> see what the code size is. then come back with some numbers. and i mean
> numbers, not `it doesn't feel any slower'.
---
As for 'trying' it -- would anyone care if we virtualized the block_nr
into a typedef? That seems like it would provide for cleaner
(type-checked) code at no performance penalty and more easily allow
such comparisons.

Well, this is my point: if I have disks > 8T, wouldn't it be at *all*
beneficial to be able to *choose* some slight performance impact and
access those large disks vs. having no choice? Having it as a
configurable would allow a given installation to make that choice
rather than them having no choice at all.

BTW, are block_nr's on RAID arrays subject to this limitation?

> personally, i'm going to see what the situation looks like in 5 years time
> and try to solve the problem then.
---
It's not the same, but SGI has had customers for over 3 years using
>2T *files*. The point I'm looking at is if the P-X series gets
developed enough, and someone is using a 4-16P system, a corp user
might be approaching that limit today or tomorrow. Joe User might not
for 5 years, but that's what the configurability is about. Keep linux
usable for both ends of the scale -- "I love scalability"

-l
--
L A Walsh | Trust Technology, Core Linux, SGI
[EMAIL PROTECTED] | Voice: (650) 933-5338
Re: 64-bit block sizes on 32-bit systems
Matthew Wilcox writes:
> On Mon, Mar 26, 2001 at 08:39:21AM -0800, LA Walsh wrote:
>> I vaguely remember a discussion about this a few months back.
>> If I remember, the reasoning was it would unnecessarily slow
>> down smaller systems that would never have block devices in
>> the 4-28T range attached.
>
> 4k page size * 2GB = 8TB.
>
> i consider it much more likely on such systems that the page size will
> be increased to maybe 16 or 64k which would give us 32TB or 128TB.
>
> personally, i'm going to see what the situation looks like in 5 years time
> and try to solve the problem then.

What do you mean by problems 5 years down the road? The real issue is
that this 32-bit block count limit affects composite devices like MD
RAID and LVM today, not just individual disks. There have been several
postings I have seen with people having a problem _today_ with a 2TB
limit on devices.

There is some hope with LVM (and MD I suspect as well), that it could
do blocksize remapping, so it appears to be a 4k sector device, but
remaps to 512-byte sector disks underneath. This _should_ give us an
upper limit of 16TB, assuming 32-bit unsigned ints for block numbers.
Of course, you would need to only do 4kB block I/O on top of these
devices (not much of an issue for such large devices).

Still, this is just a stop-gap measure because next year people will
want > 16TB devices, and there won't be an easy way to do this.

Cheers, Andreas
--
Andreas Dilger
"If a man ate a pound of pasta and a pound of antipasto,
 would they cancel out, leaving him still hungry?" -- Dogbert
http://www-mddsp.enel.ucalgary.ca/People/adilger/
Re: 64-bit block sizes on 32-bit systems
On Mon, Mar 26, 2001 at 08:39:21AM -0800, LA Walsh wrote:
> I vaguely remember a discussion about this a few months back.
> If I remember, the reasoning was it would unnecessarily slow
> down smaller systems that would never have block devices in
> the 4-28T range attached.

4k page size * 2GB = 8TB.

i consider it much more likely on such systems that the page size will
be increased to maybe 16 or 64k which would give us 32TB or 128TB.

you keep on trying to increase the size of types without looking at
what gcc outputs in the way of code that manipulates 64-bit types.
seriously, why don't you just try it? see what the performance is.
see what the code size is. then come back with some numbers. and i
mean numbers, not `it doesn't feel any slower'.

personally, i'm going to see what the situation looks like in 5 years
time and try to solve the problem then. there're enough real problems
with the VFS today that i don't feel inclined to fix tomorrow's
potential problems.

--
Revolutions do not require corporate support.
64-bit block sizes on 32-bit systems
I vaguely remember a discussion about this a few months back. If I
remember, the reasoning was it would unnecessarily slow down smaller
systems that would never have block devices in the 4-28T range
attached.

However, isn't it possible there will continue to be a series of
P-IV,V,VI,VII... etc. add-ons that will be used for some time to come.
I've even heard it suggested that we might see 2 or more CPUs on a
single chip as a way to increase cpu capacity w/o driving up clock
speed. Given the cheapness of .25T drives now, seeing the possibility
of 4T drives doesn't seem that remote (maybe 5 years?).

Side question: does the 32-bit block size limit also apply to RAID
disks or does it use a different block-nr type?

So... is it the plan, or has it been thought about -- 'abstracting'
block numbers as a typedef 'block_nr', then at compile time having it
be selectable as to whether or not this was to be a 32-bit or 64-bit
quantity -- that way older systems would lose no efficiency. Drivers
that couldn't be or hadn't been ported to use 'block_nr' could default
to being disabled if 64-bit blocks were selected, etc.

So has this idea been tossed about and/or previously thrashed?

-l
--
L A Walsh | Trust Technology, Core Linux, SGI
[EMAIL PROTECTED] | Voice: (650) 933-5338
64-bit block sizes on 32-bit systems
I vaguely remember a discussion about this a few months back. If I remember, the reasoning was it would unnecessarily slow down smaller systems that would never have block devices in the 4-28T range attached. However, isn't it possible there will continue to be a series of P-IV,V,VI,VII ...etc, addons that will be used for sometime to come. I've even heard it suggested that we might see 2 or more CPU's on a single chip as a way to increase cpu capacity w/o driving up clock speed. Given the cheapness of .25T drives now, seeing the possibility of 4T drives doesn't seem that remote (maybe 5 years?). Side question: does the 32-bit block size limit also apply to RAID disks or does it use a different block-nr type? So...is it the plan, or has it been though about -- 'abstracting' block numbes as a typedef 'block_nr', then at compile time having it be selectable as to whether or not this was to be a 32-bit or 64 bit quantity -- that way older systems would lose no efficiency. Drivers that couldn't be or hadn't been ported to use 'block_nr' could default to being disabled if 64-bit blocks were selected, etc. So has this idea been tossed about and or previously thrashed? -l -- L A Walsh| Trust Technology, Core Linux, SGI [EMAIL PROTECTED] | Voice: (650) 933-5338 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 64-bit block sizes on 32-bit systems
On Mon, Mar 26, 2001 at 08:39:21AM -0800, LA Walsh wrote: I vaguely remember a discussion about this a few months back. If I remember, the reasoning was it would unnecessarily slow down smaller systems that would never have block devices in the 4-28T range attached. 4k page size * 2GB = 8TB. i consider it much more likely on such systems that the page size will be increased to maybe 16 or 64k which would give us 32TB or 128TB. you keep on trying to increase the size of types without looking at what gcc outputs in the way of code that manipulates 64-bit types. seriously, why don't you just try it? see what the performance is. see what the code size is. then come back with some numbers. and i mean numbers, not `it doesn't feel any slower'. personally, i'm going to see what the situation looks like in 5 years time and try to solve the problem then. there're enough real problems with the VFS today that i don't feel inclined to fix tomorrow's potential problems. -- Revolutions do not require corporate support. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 64-bit block sizes on 32-bit systems
Matthew Wilcox wrote: On Mon, Mar 26, 2001 at 08:39:21AM -0800, LA Walsh wrote: I vaguely remember a discussion about this a few months back. If I remember, the reasoning was it would unnecessarily slow down smaller systems that would never have block devices in the 4-28T range attached. 4k page size * 2GB = 8TB. --- Drat...was being more optimistic -- you're right the block_nr can be negative. Somehow thought page size could be 8Kliving in future land. That just makes the limitations even closer at hand...:-( you keep on trying to increase the size of types without looking at what gcc outputs in the way of code that manipulates 64-bit types. --- Maybe someone will backport some of the features of the IA-64 code generator into 'gcc'. I've been told that in some cases it's a 2.5x performance difference. If 'gcc' is generating bad code, then maybe the 'gcc' people will increase the quality of their code -- I'm sure they are just as eagerly working on gcc improvements as we are kernel improvements. When I worked on the PL/M compiler project at Intel, I know our code-optimization guy would spend endless cycles trying to get better optimization out of the code. He got great joy out of doing so. -- and that was almost 20 years ago -- and code generation has come a *long* way since then. seriously, why don't you just try it? see what the performance is. see what the code size is. then come back with some numbers. and i mean numbers, not `it doesn't feel any slower'. --- As for 'trying' it -- would anyone care if we virtualized the block_nr into a typedef? That seems like it would provide for cleaner (type-checked) code at no performance penalty and more easily allow such comparisons. Well this is my point: if I have disks 8T, wouldn't it be at *all* beneficial to be able to *choose* some slight performance impact and access those large disks vs. having not choice? 
Having it as a configurable would allow a given installation to make that choice rather than them having no choice. BTW, are block_nr's on RAID arrays subject to this limitation? personally, i'm going to see what the situation looks like in 5 years time and try to solve the problem then. --- It's not the same, but SGI has had customers for over 3 years using 2T *files*. The point I'm looking at is if the P-X series gets developed enough, and someone is using a 4-16P system, a corp user might be approaching that limit today or tomorrow. Joe User, might not for 5 years, but that's what the configurability is about. Keep linux usable for both ends of the scale -- "I love scalability" -l -- L A Walsh| Trust Technology, Core Linux, SGI [EMAIL PROTECTED] | Voice: (650) 933-5338 -- L A Walsh| Trust Technology, Core Linux, SGI [EMAIL PROTECTED] | Voice: (650) 933-5338 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 64-bit block sizes on 32-bit systems
I vaguely remember a discussion about this a few months back. If I remember, the reasoning was it would unnecessarily slow down smaller systems that would never have block devices in the 4-28T range attached. 4k page size * 2GB = 8TB. Try it. If your drive (array) is larger than 512byte*4G (4TB) linux will eat your data. drivers/block/ll_rw_blk.c, in submit_bh() bh-b_rsector = bh-b_blocknr * (bh-b_size 9); But it shouldn't cause data corruptions: It was discussed a few months ago, and iirc LVM refuses to create too large volumes. -- Manfred - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 64-bit block sizes on 32-bit systems
On Mon, Mar 26, 2001 at 08:01:21PM +0200, Manfred Spraul wrote: drivers/block/ll_rw_blk.c, in submit_bh() bh-b_rsector = bh-b_blocknr * (bh-b_size 9); But it shouldn't cause data corruptions: It was discussed a few months ago, and iirc LVM refuses to create too large volumes. Ah yes, I'd forgotten the block layer still works in terms of 512-byte blocks. -- Revolutions do not require corporate support. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 64-bit block sizes on 32-bit systems
On Mon, Mar 26, 2001 at 10:47:13AM -0700, Andreas Dilger wrote: What do you mean by problems 5 years down the road? The real issue is that this 32-bit block count limit affects composite devices like MD RAID and LVM today, not just individual disks. There have been several postings I have seen with people having a problem _today_ with a 2TB limit on devices. people who can afford 2TB of disc can afford to buy a 64-bit processor. -- Revolutions do not require corporate support. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 64-bit block sizes on 32-bit systems
Matthew Wilcox writes: On Mon, Mar 26, 2001 at 08:39:21AM -0800, LA Walsh wrote: I vaguely remember a discussion about this a few months back. If I remember, the reasoning was it would unnecessarily slow down smaller systems that would never have block devices in the 4-28T range attached. 4k page size * 2GB = 8TB. i consider it much more likely on such systems that the page size will be increased to maybe 16 or 64k which would give us 32TB or 128TB. personally, i'm going to see what the situation looks like in 5 years time and try to solve the problem then. What do you mean by problems 5 years down the road? The real issue is that this 32-bit block count limit affects composite devices like MD RAID and LVM today, not just individual disks. There have been several postings I have seen with people having a problem _today_ with a 2TB limit on devices. There is some hope with LVM (and MD I suspect as well), that it could do blocksize remapping, so it appears to be a 4k sector device, but remaps to 512-byte sector disks underneath. This _should_ give us an upper limit of 16TB, assuming 32-bit unsigned ints for block numbers. Of course, you would need to only do 4kB block I/O on top of these devices (not much of an issue for such large devices). Still, this is just a stop-gap measure because next year people will want 16TB devices, and there won't be an easy way to do this. Cheers, Andreas -- Andreas Dilger \ "If a man ate a pound of pasta and a pound of antipasto, \ would they cancel out, leaving him still hungry?" http://www-mddsp.enel.ucalgary.ca/People/adilger/ -- Dogbert - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 64-bit block sizes on 32-bit systems
Matthew Wilcox [EMAIL PROTECTED] writes:
> On Mon, Mar 26, 2001 at 10:47:13AM -0700, Andreas Dilger wrote:
> > What do you mean by problems 5 years down the road?  The real issue is
> > that this 32-bit block count limit affects composite devices like MD
> > RAID and LVM today, not just individual disks.  There have been several
> > postings I have seen with people having a problem _today_ with a 2TB
> > limit on devices.
>
> people who can afford 2TB of disc can afford to buy a 64-bit processor.

Currently that doesn't solve the problem, as block_nr is held in an int.
And since gcc compiles an int to a 32-bit number even on a 64-bit
processor, the problem still isn't solved.  That, at least, we need to
address.

Eric
Re: 64-bit block sizes on 32-bit systems
LA Walsh [EMAIL PROTECTED] writes:
> I vaguely remember a discussion about this a few months back.  If I
> remember, the reasoning was it would unnecessarily slow down smaller
> systems that would never have block devices in the 4-28T range attached.

With classic 512-byte sectors the top size is right about 2TB.  The basic
thought is that 64-bit numbers tend to suck, so we don't want them in any
fast paths on a 32-bit system.

> However, isn't it possible there will continue to be a series of P-IV,
> V, VI, VII ... etc. add-ons that will be used for some time to come?
> I've even heard it suggested that we might see 2 or more CPUs on a
> single chip as a way to increase CPU capacity w/o driving up clock
> speed.  Given the cheapness of .25T drives now, seeing the possibility
> of 4T drives doesn't seem that remote (maybe 5 years?).
>
> Side question: does the 32-bit block size limit also apply to RAID
> disks, or does it use a different block-nr type?

For now, yes it does.

> So... is it the plan, or has it been thought about -- 'abstracting'
> block numbers as a typedef 'block_nr', then at compile time having it
> be selectable as to whether or not this was to be a 32-bit or 64-bit
> quantity -- that way older systems would lose no efficiency.  Drivers
> that couldn't be or hadn't been ported to use 'block_nr' could default
> to being disabled if 64-bit blocks were selected, etc.  So has this
> idea been tossed about and/or previously thrashed?

Using a 64-bit number on 32-bit systems has so far been trashed.  Though
this does look like a real problem that needs to be solved at some point.
I doubt we can wait past 2.5, though, if we want the code ready when the
hardware is.

Eric
Re: 64-bit block sizes on 32-bit systems
Manfred Spraul wrote:
> > 4k page size * 2GB = 8TB.
>
> Try it.  If your drive (array) is larger than 512byte*4G (2TB), linux
> will eat your data.
---
I have a block device that doesn't use 'sectors'.  It only uses the
logical block size (which is currently set to 1K).  Seems I could up that
to the max blocksize (4k?) and get 8TB... no?  I don't use the generic
block make_request (I have my own).
-- 
L A Walsh                   | Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]           | Voice: (650) 933-5338
Re: 64-bit block sizes on 32-bit systems
"Eric W. Biederman" wrote:
> Matthew Wilcox [EMAIL PROTECTED] writes:
> > On Mon, Mar 26, 2001 at 10:47:13AM -0700, Andreas Dilger wrote:
> > > What do you mean by problems 5 years down the road?  The real issue
> > > is that this 32-bit block count limit affects composite devices like
> > > MD RAID and LVM today, not just individual disks.  There have been
> > > several postings I have seen with people having a problem _today_
> > > with a 2TB limit on devices.
> >
> > people who can afford 2TB of disc can afford to buy a 64-bit processor.
>
> Currently that doesn't solve the problem as block_nr is held in an int.
> And as gcc compiles an int to a 32bit number on a 64bit processor, the
> problem still isn't solved.  That at least we need to address.

And then you must face the fact that there may be a need for some
off-the-shelf software which isn't well supported on the corresponding
64-bit architectures as well.  So the argument doesn't hold up to
reality in any way.

BTW, for many reasons 32-bit architectures are, in respect of some
application schemes, *faster* than 64-bit ones.  Ultra III in 64-bit mode
just crawls in comparison to 32-bit.  Alpha - unfortunately an orphaned
and dying architecture which is not well supported by software vendors...
Re: 64-bit block sizes on 32-bit systems
> On Mon, Mar 26, 2001 at 08:39:21AM -0800, LA Walsh wrote:
> > I vaguely remember a discussion about this a few months back.  If I
> > remember, the reasoning was it would unnecessarily slow down smaller
> > systems that would never have block devices in the 4-28T range attached.
>
> 4k page size * 2GB = 8TB.  i consider it much more likely on such systems
> that the page size will be increased to maybe 16 or 64k which would give
> us 32TB or 128TB.
>
> you keep on trying to increase the size of types without looking at what
> gcc outputs in the way of code that manipulates 64-bit types.  seriously,
> why don't you just try it?  see what the performance is.  see what the
> code size is.  then come back with some numbers.  and i mean numbers, not
> `it doesn't feel any slower'.
>
> personally, i'm going to see what the situation looks like in 5 years
> time and try to solve the problem then.  there're enough real problems
> with the VFS today that i don't feel inclined to fix tomorrow's
> potential problems.

I don't feel that it is that far away... IBM has already released a
64-CPU Intel-based system (NUMA).  We already have systems in that class
(though 64-bit based) that use 5 TB file systems.  The need is coming,
and appears to be coming fast.  It should be resolved during the
improvements to the VFS.

A second reason to include it in the VFS is that the low-level filesystem
implementation would NOT be required to use it.  If the administrator
CHOOSES to access a 16TB filesystem from a workstation, then it should be
possible (likely something like GFS, where the administrator is just
monitoring things, would be reasonable for a 32-bit system to do).

As I see it, the VFS itself doesn't really care what the block size is;
it just carries relatively opaque values that the filesystem
implementation uses.  Most of the overhead should just be copying an
extra 4 bytes around.

-
Jesse I Pollard, II
Email: [EMAIL PROTECTED]

Any opinions expressed are solely my own.
Re: 64-bit block sizes on 32-bit systems
>>>>> "Matthew" == Matthew Wilcox [EMAIL PROTECTED] writes:

Matthew> On Mon, Mar 26, 2001 at 10:47:13AM -0700, Andreas Dilger wrote:
>> What do you mean by problems 5 years down the road?  The real issue is
>> that this 32-bit block count limit affects composite devices like MD
>> RAID and LVM today, not just individual disks.  There have been several
>> postings I have seen with people having a problem _today_ with a 2TB
>> limit on devices.

Matthew> people who can afford 2TB of disc can afford to buy a 64-bit
Matthew> processor.

Oh great, and migrating a large application to a new architecture is
soo cheap.  Disk costs nothing these days and there is a legitimate
need here.

Jes
Re: 64-bit block sizes on 32-bit systems
On Mon, 26 Mar 2001, Andreas Dilger wrote:
> Matthew Wilcox writes:
> > people who can afford 2TB of disc can afford to buy a 64-bit processor.
>
> This whole "64-bit" fallacy has got to stop.

Indeed.  Now it is "anybody who needs 2TB disk should use a 64-bit CPU",
soon to be wrong.  It was already wrong in 1995.

-Dan
Re: 64-bit block sizes on 32-bit systems
On Mon, 26 Mar 2001, Matthew Wilcox wrote:
> people who can afford 2TB of disc can afford to buy a 64-bit processor.

You realise that this'll double the price of storage? ;)

(at least, in a year or two)

Rik
-- 
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/		http://www.conectiva.com/
http://distro.conectiva.com.br/
Re: 64-bit block sizes on 32-bit systems
Martin Dalecki [EMAIL PROTECTED]:
> "Eric W. Biederman" wrote:
> > Currently that doesn't solve the problem as block_nr is held in an int.
> > And as gcc compiles an int to a 32bit number on a 64bit processor, the
> > problem still isn't solved.  That at least we need to address.
>
> And then you must face the fact that there may be a need for some
> off-the-shelf software which isn't well supported on the corresponding
> 64-bit architectures as well.  So the argument doesn't hold up to
> reality in any way.

You are missing the point - I may need to use a 32-bit system to monitor
a large file system.  I don't need the compute power of most 64-bit
systems to monitor user file activity.

> BTW, for many reasons 32-bit architectures are, in respect of some
> application schemes, *faster* than 64-bit ones.

Which is why I want to use them with a 64-bit file system.  Some of the
weather models run here have been known to exceed a 100 GB data file.
Yes, one file.  Most only need 20GB, but there are a couple of hundred
of them...

> Ultra III in 64-bit mode just crawls in comparison to 32-bit.

Depends on what you are doing.  If you need to handle large arrays of
floating point it is reasonable (not great, just reasonable).

> Alpha - unfortunately an orphaned and dying architecture which is not
> well supported by software vendors...

These are NOT the only 64-bit systems - Intel, PPC, IBM (in various
guises).  If you need raw compute power, the Alpha is pretty good (we
have over 1000 in a Cray T3..).
-
Jesse I Pollard, II
Email: [EMAIL PROTECTED]

Any opinions expressed are solely my own.
Re: 64-bit block sizes on 32-bit systems
From: "LA Walsh" [EMAIL PROTECTED]
> Manfred Spraul wrote:
> > > 4k page size * 2GB = 8TB.
> >
> > Try it.  If your drive (array) is larger than 512byte*4G (2TB), linux
> > will eat your data.
>
> I have a block device that doesn't use 'sectors'.  It only uses the
> logical block size (which is currently set to 1K).  Seems I could up
> that to the max blocksize (4k?) and get 8TB... no?  I don't use the
> generic block make_request (I have my own).

Which field do you access?  bh->b_blocknr instead of bh->b_rsector?

There were plans to split the buffer_head into 2 structures: buffer
cache data and the block io data.  b_blocknr is buffer cache only; no
driver should access it.

http://groups.google.com/groups?q=NeilBrown+io_head&hl=en&lr=&safe=off&rnum=1&seld=928643305&ic=1

-- 
	Manfred
Re: 64-bit block sizes on 32-bit systems
> These are NOT the only 64-bit systems - Intel, PPC, IBM (in various
> guises).  If you need raw compute power, the Alpha is pretty good (we
> have over 1000 in a Cray T3..).

Best of all, the PowerPC and the POWER are binary-compatible to a very
large degree - the latter just has an extra set of 64-bit instructions.
What was that I was hearing about having to redevelop or recompile your
apps for 64-bit?

I can easily imagine a 64-bit filesystem being accessed by a bunch of
RS/6000s and monitored using an old PowerMac.  Goodness, the PowerMac
9600 even has 6 PCI slots to put all those SCSI-RAID and Ethernet cards
in.  :)
-- 
from:     Jonathan "Chromatix" Morton
mail:     [EMAIL PROTECTED]  (not for attachments)
big-mail: [EMAIL PROTECTED]
uni-mail: [EMAIL PROTECTED]

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-----END GEEK CODE BLOCK-----
Re: 64-bit block sizes on 32-bit systems
On Mon, 26 Mar 2001, Jonathan Morton wrote:
> Best of all, the PowerPC and the POWER are binary-compatible to a very
> large degree - the latter just has an extra set of 64-bit instructions.
> What was that I was hearing about having to redevelop or recompile your
> apps for 64-bit?
>
> I can easily imagine a 64-bit filesystem being accessed by a bunch of
> RS/6000s and monitored using an old PowerMac.  Goodness, the PowerMac
> 9600 even has 6 PCI slots to put all those SCSI-RAID and Ethernet cards
> in.  :)

Save the money - get one fibre channel and connect to all that through
one interface...
-- 
-
Jesse I Pollard, II
Email: [EMAIL PROTECTED]

Any opinions expressed are solely my own.