Re: [OT] Re: 64-bit block sizes on 32-bit systems

2001-03-29 Thread Ion Badulescu

On Thu, 29 Mar 2001 21:09:53 +0100, Per Jessen <[EMAIL PROTECTED]> wrote:

> This is academic - if people running IA32 systems are willing - and
> I think they are - to accept a performance penalty in return for the 
> ability to utilize much bigger disks/disk-space, why not let them
> have it ?

Because I'm also running IA32 systems and I'm not willing to accept
the performance penalty. And like me there are numerous others who've
never thought of pushing an architecture beyond its limits.

Maybe also because academic (nee technical) arguments happen to matter
a lot more in the Linux world than commercial arguments.

You just can't expect to put a Honda Civic engine in a tank and
have it perform well. That's quite well understood. If you want to
do it anyway, you're welcome to try. But please, leave us the default
Honda Civic body, which this engine can manage rather well.

> Your attitude is *almost* that of Microsoft - WE know what is best for
> YOU. 

Ohh, but WE've always known what's best for YOU, otherwise we wouldn't
bother and would let YOU write YOUR OWN kernel from scratch. :-)

However, *unlike* Microsoft, we do give you full source code and allow
you to make whatever changes you fancy. You're welcome to make this
change, maintain the patch outside the official kernel and give it to
the poor sods whose management gave them the tank with the Honda Civic
engine inside. I'm sure they'll appreciate it (no sarcasm intended).

> Just to put things in perspective.

Indeed, since you mention perspective: before we continue this thread,
please all take a long and hard look at the "64-bit dev_t" thread on
linux-kernel. Judge for yourselves what the chances of Linus accepting
a 64-bit device size_t would be, in the light of that thread...

[I won't even get into the "but make it a compile option" argument.
Matthew Wilcox mentioned it breaks module ABI compatibility; I'll
just say that we currently have 6 different (incompatible) strains --
3 memory models x {UP, SMP} -- and adding yet another option affecting
the core ABI would multiply that to 12. Thanks, but no thanks.]

EOT for me,
Ion

-- 
  It is better to keep your mouth shut and be thought a fool,
than to open it and remove all doubt.



[OT] Re: 64-bit block sizes on 32-bit systems

2001-03-29 Thread Per Jessen

On Mon, 26 Mar 2001 13:13:32 -0800, Ion Badulescu wrote:

[snip]
>> I don't think the millions of 32-bit systems will disappear overnight,
>
>That was said of the millions of 16-bit systems about 10 years ago.
>I wonder where we would be now if Linux tried to cater to 286's or lower
>processors.

Indeed it was. And if you compare the number of 16-bit systems in use in
mission-critical and/or enterprise work 10 years ago, you will realise
why it was possible to move to 32-bit systems so fast.

[snip]
>Anybody who advocates extended use of 64-bit types on IA32 -- especially
>on IA32 -- please remember that you're dealing with a processor that 
>emulates at most *4* general purpose non-atomic 64-bit registers.

This is academic - if people running IA32 systems are willing - and
I think they are - to accept a performance penalty in return for the 
ability to utilize much bigger disks/disk-space, why not let them
have it ?
Your attitude is *almost* that of Microsoft - WE know what is best for
YOU. 
Just to put things in perspective.


regards,
Per Jessen







Re: 64-bit block sizes on 32-bit systems

2001-03-28 Thread Dave Kleikamp

My turn to chime in.

JFS was designed around a 4K meta-data page size.  It would require some
major re-design to use larger block sizes.  On the other hand, JFS could
take advantage of 64-bit block addresses immediately.  JFS internally
stores the block address in 40 bits.  (Sorry, file size & volume size are
both limited to 4 petabytes on JFS.)

At the rate that storage hardware and requirements are increasing,
increasing the block size is a short-term solution that is only going to
delay the inevitable requirement for 64-bit block addressability.  There
is a practical limit to a usable block-size.  Someone threw out 64K,
which seems reasonable to me.
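
(A quick sanity check of that 4-petabyte figure -- my own sketch, assuming
the 4K meta-data page size mentioned above:

    /* 40-bit block addresses * 4K blocks = 2^52 bytes = 4 PiB. */
    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t max_bytes = (1ULL << 40) * 4096;
        printf("%llu bytes = %llu TiB\n",
               (unsigned long long)max_bytes,
               (unsigned long long)(max_bytes >> 40));
        return 0;
    }

This prints 4503599627370496 bytes = 4096 TiB, i.e. the 4-petabyte limit.)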
-- 
David Kleikamp
IBM Linux Technology Center



Re: 64-bit block sizes on 32-bit systems

2001-03-27 Thread Matthew Wilcox

On Mon, Mar 26, 2001 at 08:39:21AM -0800, LA Walsh wrote:
> So...is it the plan, or has it been thought about -- 'abstracting'
> block numbers as a typedef 'block_nr', then at compile time
> having it be selectable as to whether or not this was to
> be a 32-bit or 64-bit quantity -- that way older systems would

Oh, did no-one mention the words `Module ABI' yet?

-- 
Revolutions do not require corporate support.



Re: 64-bit block sizes on 32-bit systems

2001-03-27 Thread Brad Boyer

Steve Lord wrote:
> Just a brief add to the discussion, besides which I have a vested interest
> in this!

I'll add my little comments as well, and hopefully not start a flamewar... :)

[snip comments about blocksize, etc.]

Here's a real-life example of something that most of you will probably hate
me for mentioning:

HFS uses variable-sized blocks (made up of multiple 512-byte sectors), but
stores block numbers as a 16-bit value. (I know, everyone will say, "We're
talking about moving from 32 to 64 bits." Keep listening.) This gave great
performance on the then-current massive storage of a 20M drive. However,
when it became possible to get the absolutely gigantic hard drive of 1G,
it became more and more obvious that this was a drawback causing
a huge amount of wasted space. Apple had to design a new filesystem (HFS+)
that was able to represent blocks with a 32-bit number to overcome the
effective limitation on how big a filesystem could be. It's getting to
the point now that it's easily possible to put together a disk array that
is large enough that even referring to blocks with a 32-bit value requires
relatively large blocks. I don't know if we have very many filesystems that
would support this feature, but it will become important a lot sooner than
anyone may be thinking.
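
(To make the wasted-space problem concrete -- a sketch of my own, not HFS
source; real HFS rounds the allocation block in 512-byte steps rather than
the powers of two used here:

    /* With 16-bit block numbers, block_size * 65536 must cover the
     * whole volume, so the allocation block grows with the disk. */
    #include <stdio.h>

    int main(void)
    {
        unsigned long long volumes[] = { 20ULL << 20, 1ULL << 30 };
        for (int i = 0; i < 2; i++) {
            unsigned long long bs = 512;
            while (bs * 65536 < volumes[i])
                bs *= 2;
            printf("%4llu MB volume -> %llu byte blocks\n",
                   volumes[i] >> 20, bs);
        }
        return 0;
    }

A 20M drive gets 512-byte blocks, but a 1G drive needs 16K blocks, so every
tiny file eats at least 16K.)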

Obviously this case isn't a perfect fit for the situation, since HFS was
designed to be read by 32 bit machines, and the upgrade to 32 bits didn't
give a CPU penalty, just a bus bandwidth problem. Also, I'm coming from
a platform that actually can do a decent job of 64 bit, unlike x86, but
we shouldn't disallow people from doing bigger and better things. It's
become very popular lately to position Linux as an enterprise-ready system,
and this is something that will be expected. People will want to access
a multi-TB database as a single file, as well as other things that may
seem crazy to most people now.

I understand people's aversion to the #ifdefs in the code, but if the changes
are made in a sane way, it can still be clean and easy to maintain. It's
worth it to add a little complexity (particularly as an option) to add a
feature that people will be demanding in the relatively near future. It
might be a good idea to wait for 2.5, tho...

Brad Boyer
[EMAIL PROTECTED]

P.S.: No, I have no personal reason to need any of this 64 bit filesystem
stuff. Just trying to point out possibilities. Don't expect me to actually
be writing this stuff...




Re: 64-bit block sizes on 32-bit systems

2001-03-27 Thread Steve Lord


Hi,

Just a brief add to the discussion, besides which I have a vested interest
in this!

I do not believe that you can make the addressability of a device larger at
the expense of granularity of address space at the bottom end. Just because
ext2 has a single size for metadata does not mean everything you put on the
disks does. XFS filesystems, for example, can be made with block sizes from
512 bytes to 64Kbytes (ok not working on linux across this range yet, but it
will).

In all of these cases we have chunks of metadata which are 512 bytes
long, and we have chunks bigger than the blocksize.  The 512 byte chunks
are the superblock and the heads of the freespace structures; there
are multiples of them throughout the filesystem.

To top that, we have disk write ordering constraints that could mean that
for two of the 512 byte chunks next to each other one must be written to
disk now to free log space, the other must not be written to disk because it
is in a transaction. We would be forced to do read-modify-write down at
some lower level - wait the lower levels would not have the addressability.

There are probably other things which will not fly if you lose the
addressing granularity. Volume headers and such like would be one
possibility.

No I don't have a magic bullet solution, but I do not think that just
increasing the granularity of the addressing is the correct answer,
and yes I do agree that just growing the buffer_head fields is not
perfect either.

Steve Lord

p.s. there was mention of a bigger page size; it is not hard to fix, but the
swap path will not even work with 64K pages right now.






Re: 64-bit block sizes on 32-bit systems

2001-03-27 Thread Jesse Pollard

Jan Harkes <[EMAIL PROTECTED]>:
> 
> On Tue, Mar 27, 2001 at 01:57:42PM -0600, Jesse Pollard wrote:
> > > Using similar numbers as presented: if we are working our way through
> > > every single block in a petabyte filesystem, and the blocksize is 512
> > > bytes, then the 1us in extra CPU cycles because of 64-bit operations
> > > would add, according to my back-of-the-envelope calculation, 2199023
> > > seconds of CPU time, a bit more than 25 days.
> > 
> > Ummm... I don't think it adds that much. You seem to be leaving out the
> > overlap of disk I/O and computation for read-ahead. This should eliminate
> > the majority of the delay effect.
> 
> 1024 TB should be around 2*10^12 512-byte blocks; multiply that by 10^-6 s
> (1us) of "assumed" overhead per block operation and you get 2*10^6 seconds,
> so no, I believe I'm pretty close there. I am considering everything being
> "available in the cache", i.e. no waiting for disk access.

That would be true for small files (< 5GB). I have to deal with files that
may be 20-100 GB. Except for the largest systems (200GB of main memory),
the data will NOT be in the cache except for ~50% of the time (assuming
only one user).

> > > Seriously, there is a lot more that needs to be done than introducing a
> > > 64-bit blocknumber. Effectively 512-byte blocks are far too small for
> > > that kind of data, and going to pagesize blocks (and increasing pagesize
> > > to 64KB or 2MB at the same time) is a solution that is far more likely
> > > to give good results, since it reduces both the total number of
> > > 'blocks' on the device and the total number of calls
> > > throughout kernel space, instead of increasing the cost per call.
> > 
> > Talk about adding overhead... How long do you think it takes to read a
> > 2MB block (not to mention the time to update that page...)? The additional
> > contention on the Fibre Channel I/O alone might kill it if the filesystem
> > is busy.
> 
> The time to update the pagetables is identical to the time to update a
> 4KB page when the OS is using a 2MB pagesize. Of course it will take more
> time to load the data into the page; however, it should be a consecutive
> stretch of data on disk, which should give a more efficient transfer
> than small blocks scattered around the disk.

You assume the file is accessed sequentially. The weather models don't do
that. They do have some locality, but only in a 3D sense. When you include
time, it becomes closer to random disk block references once everything has
to be linearized.

> 
> > Granted, 512 bytes could be considered too small for some things, but
> > once you pass 32K you start adding a lot of rotational delay problems.
> > I've used file systems with 256K blocks - they are slow when compared
> > to the throughput using 32K. I wasn't the one running the benchmarks,
> > but with a MaxStrat 400GB RAID, 256K-sized data transfers were much
> > slower (around 3 times slower) than 32K. (The target application was
> > a GIS server using Oracle.)
> 
> But your subsystem (the disk) was probably still using 512-byte blocks,
> possibly scattered. And the OS was still using 4KB pages; it takes more
> time to reclaim and gather 64 pages per IO operation than one, that's
> why I'm saying that the pagesize needs to scale along with the blocksize.

It wasn't - the "disks" were composed of groups of 5 drives in a RAID
striped for speed and spread across 5 SCSI III controllers. Each attached
RAID had 16MB of internal cache. I think the controllers were using an
entire sector read (32K).

> The application might have been assuming a small block size as well, and
> the OS was told to do several read/modify/write cycles, perhaps even 512
> times as much as necessary.

There was some of that, but not much. Oracle (as I recall) allows for the
specification of transfer size. 

This also brings up the problem of small files. Allocating 2MB per file
would waste quite a bit of disk space (assuming 5-10 million files
with only 15% having 25GB or more).

> I'm not saying that the current system will perform well when working
> with large blocks, but compared to increasing the size of block_t, a
> larger blocksize has more potential to give improvements in the long
> term without adding an unrecoverable performance hit.

Not when the filesystem is required for general use. It only makes it
simpler to actually have a large filesystem. It doesn't help when it
must be used.

Now you are saying that the throughput WILL go down, but only if you use
large block sizes.

I can go along with making block sizes up to 8K. Even 32K for special
circumstances (even 64K for dedicated use). But not larger. NFS overhead on
file I/O becomes way too excessive (the worst example now is having to read
a 2MB block to update 512 bytes, then write it back... :-)
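
(Putting a number on that worst case, with the same assumptions as above --
my own back-of-the-envelope:

    /* Updating 512 bytes inside a 2MB block: read 2MB + write 2MB. */
    #include <stdio.h>

    int main(void)
    {
        unsigned long block = 2UL << 20, update = 512;
        printf("%lu bytes of I/O for a %lu byte update: %lux amplification\n",
               2 * block, update, 2 * block / update);
        return 0;
    }

i.e. 4MB of traffic, an 8192x amplification, for each 512-byte write.)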

-
Jesse I Pollard, II
Email: [EMAIL PROTECTED]

Any opinions expressed are solely my own.
-

Re: 64-bit block sizes on 32-bit systems

2001-03-27 Thread LA Walsh

Jan Harkes wrote:
> 
> On Tue, Mar 27, 2001 at 01:57:42PM -0600, Jesse Pollard wrote:
> > > Using similar numbers as presented: if we are working our way through
> > > every single block in a petabyte filesystem, and the blocksize is 512
> > > bytes, then the 1us in extra CPU cycles because of 64-bit operations
> > > would add, according to my back-of-the-envelope calculation, 2199023
> > > seconds of CPU time, a bit more than 25 days.
> >
> > Ummm... I don't think it adds that much. You seem to be leaving out the
> > overlap of disk I/O and computation for read-ahead. This should eliminate
> > the majority of the delay effect.
> 
> 1024 TB should be around 2*10^12 512-byte blocks; multiply that by 10^-6 s
> (1us) of "assumed" overhead per block operation and you get 2*10^6 seconds,
> so no, I believe I'm pretty close there. I am considering everything being
> "available in the cache", i.e. no waiting for disk access.
---
If everything being used is only used from the cache, then
the application probably doesn't need 64-bit block support.

I submit that your argument may be flawed in its assumption that
if an application needs multi-terabyte files and devices, most
of the data will be in the in-memory cache.
 

> The time to update the pagetables is identical to the time to update a
> 4KB page when the OS is using a 2MB pagesize. Of course it will take more
> time to load the data into the page; however, it should be a consecutive
> stretch of data on disk, which should give a more efficient transfer
> than small blocks scattered around the disk.
---
Not if you were doing a lot of random reads where you only
needed 1-2K of data.  The read time of the extra 2MB-1K would seem
to eat into any performance boost gained by the large pagesize.

> 
> > Granted, 512 bytes could be considered too small for some things, but
> > once you pass 32K you start adding a lot of rotational delay problems.
> > I've used file systems with 256K blocks - they are slow when compared
> > to the throughput using 32K. I wasn't the one running the benchmarks,
> > but with a MaxStrat 400GB RAID, 256K-sized data transfers were much
> > slower (around 3 times slower) than 32K. (The target application was
> > a GIS server using Oracle.)
> 
> But your subsystem (the disk) was probably still using 512-byte blocks,
> possibly scattered. And the OS was still using 4KB pages; it takes more
> time to reclaim and gather 64 pages per IO operation than one, that's
> why I'm saying that the pagesize needs to scale along with the blocksize.
> 
> The application might have been assuming a small block size as well, and
> the OS was told to do several read/modify/write cycles, perhaps even 512
> times as much as necessary.
> 
> I'm not saying that the current system will perform well when working
> with large blocks, but compared to increasing the size of block_t, a
> larger blocksize has more potential to give improvements in the long
> term without adding an unrecoverable performance hit.
---
That's totally application-dependent.  Database applications
might tend to skip around in the data and do short reads/writes over
a very large file.  Large block sizes will degrade their performance.

This was the idea of making it a *configurable* option.  If
you need it, configure it.  Same with block size -- that should
likely have a wider range for configuration as well.  But
configuration (and ideally auto-configuration where possible)
seems the ultimate win-win situation.

-l
-- 
The above thoughts are my own and do not necessarily represent those
of my employer.
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338



Re: 64-bit block sizes on 32-bit systems

2001-03-27 Thread Jan Harkes

On Tue, Mar 27, 2001 at 01:57:42PM -0600, Jesse Pollard wrote:
> > Using similar numbers as presented: if we are working our way through
> > every single block in a petabyte filesystem, and the blocksize is 512
> > bytes, then the 1us in extra CPU cycles because of 64-bit operations
> > would add, according to my back-of-the-envelope calculation, 2199023
> > seconds of CPU time, a bit more than 25 days.
> 
> Ummm... I don't think it adds that much. You seem to be leaving out the
> overlap of disk I/O and computation for read-ahead. This should eliminate
> the majority of the delay effect.

1024 TB should be around 2*10^12 512-byte blocks; multiply that by 10^-6 s
(1us) of "assumed" overhead per block operation and you get 2*10^6 seconds,
so no, I believe I'm pretty close there. I am considering everything being
"available in the cache", i.e. no waiting for disk access.

> > Seriously, there is a lot more that needs to be done than introducing a
> > 64-bit blocknumber. Effectively 512-byte blocks are far too small for
> > that kind of data, and going to pagesize blocks (and increasing pagesize
> > to 64KB or 2MB at the same time) is a solution that is far more likely
> > to give good results, since it reduces both the total number of
> > 'blocks' on the device and the total number of calls
> > throughout kernel space, instead of increasing the cost per call.
> 
> Talk about adding overhead... How long do you think it takes to read a
> 2MB block (not to mention the time to update that page...)? The additional
> contention on the Fibre Channel I/O alone might kill it if the filesystem
> is busy.

The time to update the pagetables is identical to the time to update a
4KB page when the OS is using a 2MB pagesize. Of course it will take more
time to load the data into the page; however, it should be a consecutive
stretch of data on disk, which should give a more efficient transfer
than small blocks scattered around the disk.

> Granted, 512 bytes could be considered too small for some things, but
> once you pass 32K you start adding a lot of rotational delay problems.
> I've used file systems with 256K blocks - they are slow when compared
> to the throughput using 32K. I wasn't the one running the benchmarks,
> but with a MaxStrat 400GB RAID, 256K-sized data transfers were much
> slower (around 3 times slower) than 32K. (The target application was
> a GIS server using Oracle.)

But your subsystem (the disk) was probably still using 512-byte blocks,
possibly scattered. And the OS was still using 4KB pages; it takes more
time to reclaim and gather 64 pages per IO operation than one, that's
why I'm saying that the pagesize needs to scale along with the blocksize.

The application might have been assuming a small block size as well, and
the OS was told to do several read/modify/write cycles, perhaps even 512
times as much as necessary.

I'm not saying that the current system will perform well when working
with large blocks, but compared to increasing the size of block_t, a
larger blocksize has more potential to give improvements in the long
term without adding an unrecoverable performance hit.

Jan




Re: 64-bit block sizes on 32-bit systems

2001-03-27 Thread Jesse Pollard


> 
> On Tue, Mar 27, 2001 at 09:15:08AM -0800, LA Walsh wrote:
> > Now let's look at the sites that want to process terabytes of
> > data -- perhaps filesystems up into the petabyte range.  Often I
> > can see these being large multi-node systems (think 16-1024 node
> > clusters as are in use today for large super-clusters).  If I were to
> > characterize the performance of them, I'd likely see the CPU pegged at
> > 100% with 99% usage in user space.  Let's assume that increasing the
> > block size decreases disk accesses by as much as 10% (you'll have
> > to admit -- using a 64-bit quantity vs. a 32-bit quantity isn't going
> > to even come close to increasing disk access times by 1 millisecond,
> > really, so it really is going to be a much smaller fraction when
> > compared to the actual disk latency).
> [snip]
> > Is there some logical flaw in the above reasoning?
> 
> But those changes will affect even the fastpath, i.e. data that is
> already in the page/buffer caches, in which case we don't have to wait
> for disk access latency. Why would anyone who is working with a
> petabyte of data even consider not relying on essentially always
> hitting data that is available in the read-ahead cache?

It depends entirely on the application. Where the cache can contain
20% of the data, most accesses should already be in memory. If the
data is significantly larger, there is a high chance that the data
will not be there.

> 
> Using similar numbers as presented: if we are working our way through
> every single block in a petabyte filesystem, and the blocksize is 512
> bytes, then the 1us in extra CPU cycles because of 64-bit operations
> would add, according to my back-of-the-envelope calculation, 2199023
> seconds of CPU time, a bit more than 25 days.

Ummm... I don't think it adds that much. You seem to be leaving out the
overlap of disk I/O and computation for read-ahead. This should eliminate
the majority of the delay effect.

> Seriously, there is a lot more that needs to be done than introducing a
> 64-bit blocknumber. Effectively 512-byte blocks are far too small for
> that kind of data, and going to pagesize blocks (and increasing pagesize
> to 64KB or 2MB at the same time) is a solution that is far more likely
> to give good results, since it reduces both the total number of
> 'blocks' on the device and the total number of calls
> throughout kernel space, instead of increasing the cost per call.

Talk about adding overhead... How long do you think it takes to read a
2MB block (not to mention the time to update that page...)? The additional
contention on the Fibre Channel I/O alone might kill it if the filesystem
is busy.

Granted, 512 bytes could be considered too small for some things, but
once you pass 32K you start adding a lot of rotational delay problems.
I've used file systems with 256K blocks - they are slow when compared
to the throughput using 32K. I wasn't the one running the benchmarks,
but with a MaxStrat 400GB RAID, 256K-sized data transfers were much
slower (around 3 times slower) than 32K. (The target application was
a GIS server using Oracle.)

-
Jesse I Pollard, II
Email: [EMAIL PROTECTED]

Any opinions expressed are solely my own.
-



Re: 64-bit block sizes on 32-bit systems

2001-03-27 Thread Jan Harkes

On Tue, Mar 27, 2001 at 09:15:08AM -0800, LA Walsh wrote:
>   Now let's look at the sites that want to process terabytes of
> data -- perhaps filesystems up into the petabyte range.  Often I
> can see these being large multi-node systems (think 16-1024 node
> clusters as are in use today for large super-clusters).  If I were to
> characterize the performance of them, I'd likely see the CPU pegged at
> 100% with 99% usage in user space.  Let's assume that increasing the
> block size decreases disk accesses by as much as 10% (you'll have
> to admit -- using a 64-bit quantity vs. a 32-bit quantity isn't going
> to even come close to increasing disk access times by 1 millisecond,
> really, so it really is going to be a much smaller fraction when
> compared to the actual disk latency).
[snip]
>   Is there some logical flaw in the above reasoning?

But those changes will affect even the fastpath, i.e. data that is
already in the page/buffer caches, in which case we don't have to wait
for disk access latency. Why would anyone who is working with a
petabyte of data even consider not relying on essentially always
hitting data that is available in the read-ahead cache?

Using similar numbers as presented: if we are working our way through
every single block in a petabyte filesystem, and the blocksize is 512
bytes, then the 1us in extra CPU cycles because of 64-bit operations
would add, according to my back-of-the-envelope calculation, 2199023
seconds of CPU time, a bit more than 25 days.
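
(The same envelope math in code, for anyone who wants to poke at the
assumptions -- 1024 TB, 512-byte blocks, 1us per block operation:

    #include <stdio.h>

    int main(void)
    {
        unsigned long long bytes  = 1024ULL << 40;   /* 1024 TB */
        unsigned long long blocks = bytes / 512;     /* ~2.2*10^12 */
        double seconds = blocks / 1e6;               /* 1us per block */
        printf("%llu blocks -> %.0f s ~ %.1f days\n",
               blocks, seconds, seconds / 86400.0);
        return 0;
    }

which prints 2199023255552 blocks -> 2199023 s ~ 25.5 days.)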

Seriously, there is a lot more that needs to be done than introducing a
64-bit blocknumber. Effectively 512-byte blocks are far too small for
that kind of data, and going to pagesize blocks (and increasing pagesize
to 64KB or 2MB at the same time) is a solution that is far more likely
to give good results, since it reduces both the total number of
'blocks' on the device and the total number of calls
throughout kernel space, instead of increasing the cost per call.

Jan




Re: 64-bit block sizes on 32-bit systems

2001-03-27 Thread LA Walsh

Ion Badulescu wrote:
> Are you being deliberately insulting, "L", or are you one of those users
> who bitch and scream for features they *need* at *any cost*, and who
> have never even opened up the book for Computer Architecture 101?
---
Sorry, I was borderline insulting.  I'm getting pressure on
personal fronts other than just here.  But my degree is in computer
science and I've had almost 20 years' experience programming things
as small as 8080s w/ 4K RAM on up.  I'm familiar with the 'cost' of
emulation.

> Let's try to keep the discussion civilized, shall we?
---
Certainly.
> 
> Compile option or not, 64-bit arithmetic is unacceptable on IA32. The
> introduction of LFS was bad enough, we don't need yet another proof that
> IA32 sucks. Especially when there *are* better alternatives.
===
So if it is a compile option, the majority of people
wouldn't be affected -- are we in agreement?  The default would
be to use the same arithmetic as we use now.

In fact, I posit that if anything, the majority of the people
might be helped as the block_nr becomes a 'typed' value -- and
perhaps the sector_nr as well.  They remain the same size, but as
a typed value the kernel gains increased integrity from the increased
type checking.  At worst, it finds no new bugs and there is no impact
on speed.  Are we in agreement so far?

Now let's look at the sites that want to process terabytes of
data -- perhaps filesystems up into the petabyte range.  Often I
can see these being large multi-node systems (think 16-1024 node
clusters as are in use today for large super-clusters).  If I were to
characterize the performance of them, I'd likely see the CPU pegged at
100% with 99% usage in user space.  Let's assume that increasing the
block size decreases disk accesses by as much as 10% (you'll have
to admit -- using a 64-bit quantity vs. a 32-bit quantity isn't going
to even come close to increasing disk access times by 1 millisecond,
really, so it really is going to be a much smaller fraction when
compared to the actual disk latency).

Ok... but for the sake of
argument using 10% -- that's still only 10% of the 1% spent in the system,
or a slowdown of 0.1%.  Now that's using a really liberal figure
of 10%.  If you look at the actual speed of 64-bit arithmetic vs.
32-bit, we're likely talking -- upper bound -- 10x the clocks for
disk block arithmetic.  Disk block arithmetic is a small fraction
of time spent in the kernel.  We have to be looking at *maximum*
slowdowns in the range of a few hundred, maybe a few thousand, extra
clocks.  A 1000 extra clocks on a 1GHz machine is 1 microsecond, or
approximately 1/5000th of your average seek latency on a *fast* hard
disk.  So instead of a 10% slowdown we are talking slowdowns in the
1/1000 range or less.  Now that's a slowdown in the 1% that was being
spent in the kernel, so we've slowed down the total program speed by
0.001%, with the added benefit (to that site) of being able to process
those mega-gigs (petabytes) of information.  For a hit that is
not noticeable to human perception, they go from not being able to
use super-clusters of IA32 machines (for which HW and SW is cheap)
to being able to use them.  That's quite a cost savings for them.
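
(The arithmetic above, spelled out with the same assumed figures -- 1% of
time in the kernel, 10% kernel-side slowdown as the liberal worst case:

    #include <stdio.h>

    int main(void)
    {
        double kernel_frac = 0.01;   /* 1% of CPU time in the kernel */
        double kernel_hit  = 0.10;   /* assumed 10% kernel slowdown  */
        printf("total slowdown: %.2f%%\n",
               kernel_frac * kernel_hit * 100.0);

        double extra_us = 1000.0 / 1e9 * 1e6;  /* 1000 clocks at 1GHz */
        printf("1000 clocks = %.0f us = 1/%.0f of a 5ms seek\n",
               extra_us, 5000.0 / extra_us);
        return 0;
    }

prints 0.10% and 1/5000, matching the figures used in the text.)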

Is there some logical flaw in the above reasoning?

-linda

-- 
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338



Re: 64-bit block sizes on 32-bit systems

2001-03-27 Thread Ion Badulescu

On Tue, 27 Mar 2001, Anton Altaparmakov wrote:

> At 09:35 27/03/2001, Ion Badulescu wrote:
> >Compile option or not, 64-bit arithmetic is unacceptable on IA32. The
> >introduction of LFS was bad enough, we don't need yet another proof that
> >IA32 sucks. Especially when there *are* better alternatives.
> 
> Sorry, but why should it be unacceptable? It works, so what's the problem? 

Because there are some of us who believe a CPU upgrade should serve other
purposes than just compensating for an increase in disk space...

Well, let me put it from a different perspective.

Suppose gcc had been implemented in such a way that it never used 4 of the
8 general purpose registers on IA32. Would the generated code work? Sure
thing (ABIs notwithstanding). Would you use this version of gcc? I
sincerely doubt it.

No, why wouldn't you use it? It works, right? Oh, could it be that you 
*know* that it's a bad implementation and there are better ways to 
implement it?

Same thing with the 64-bit device size: it's the suboptimal answer to a 
real problem. As has been pointed out on the list, there are better 
answers out there and people are aware of them.

> I would much rather have a performance hit, than have it not working at 
> all. 

I'd rather not have a performance hit than have it, when in both cases the
code "works".

> I am definitely putting 64-bit arithmetic into the new NTFS project as 
> most of NTFS is 64-bit, and I find it quite acceptable. 

Ok, so you don't have much of a choice. Just don't be very surprised if
people won't make NTFS their default filesystem on Linux. :-) Hey, don't
get me wrong, this is not directed at you in any way. You didn't design
NTFS, you're just trying to make it work on Linux -- very commendable
work.

> So it will be a few % slower than with 32-bit,

Make that 4x slower. Not overall, obviously, but the 64-bit C code itself 
will be about 4x slower than equivalent 32-bit C code.

> Anyway, what is this obsession with speed? Surely it is stability, and 
> capability which is more important than speed? 

They need to be properly balanced. People will not buy a very featureful
product which is slower than molasses. I know this from personal
experience, trying to sell/support such a featureful and slow product...

> I see the point why people with small systems would not want 64-bit, so I 
> agree that a compile time option is a good idea. Where is your problem with 
> that? 

More fragmentation in the core ABI, and there is enough of it already. It 
makes support hell (like it wasn't already...).

> I think you are forgetting that 64-bit maths is still on the CPU, which is 
> fast, I/O speed is bound by the HD, not by the CPU, so whether the fs-code 
> is a bit slower or not will not make any difference in the actual disk 
> access case. (Cache access is of course a different matter.)

So it's the same thing to you if just accessing your disk, and not doing 
any useful data processing, eats up 12% instead of 3% of your CPU? It 
might not matter if that's _all_ the machine is doing, but that's rarely 
the case in Unix.

> Just my 2p.

Likewise..

Ion

-- 
  It is better to keep your mouth shut and be thought a fool,
than to open it and remove all doubt.




Re: 64-bit block sizes on 32-bit systems

2001-03-27 Thread Anton Altaparmakov

At 09:35 27/03/2001, Ion Badulescu wrote:
>Compile option or not, 64-bit arithmetic is unacceptable on IA32. The
>introduction of LFS was bad enough, we don't need yet another proof that
>IA32 sucks. Especially when there *are* better alternatives.

Sorry, but why should it be unacceptable? It works, so what's the problem? 
I would much rather have a performance hit than have it not working at
all. I am definitely putting 64-bit arithmetic into the new NTFS project as 
most of NTFS is 64-bit, and I find it quite acceptable. So it will be a few 
% slower than with 32-bit, but it will also mean that it will actually 
work, rather than just refusing to mount or crashing out half-way through 
complaining about 64-bit sizes not being implemented, or even worse, 
crashing the kernel / causing fs data corruption due to overflows.
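
(A small sketch of my own construction showing that overflow failure mode --
a byte offset past 2TiB pushed through a 32-bit, 512-byte block number
silently wraps:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t offset = 3ULL << 40;        /* 3 TiB into the volume */
        uint64_t want   = offset >> 9;       /* 512-byte block number */
        uint32_t got    = (uint32_t)want;    /* truncated to 32 bits  */
        printf("want block %llu, got %u -> byte %llu (wrapped)\n",
               (unsigned long long)want, got,
               (unsigned long long)((uint64_t)got << 9));
        return 0;
    }

The access lands at the 1 TiB mark instead of 3 TiB -- exactly the silent
corruption described above.)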

With the current trend in HD prices, I will probably have TiB storage on my 
32-bit CPU PC within a few years (at the moment I am at 0.1TiB), and no, I 
am not going to upgrade my computers to 64-bit ones for large amounts of 
money, when I can just buy a (few) cheap new disk(s).

Anyway, what is this obsession with speed? Surely it is stability, and 
capability which is more important than speed? I don't care whether my PC 
scores top on every benchmark or not, but I do care that my PC can do 
everything I want it to do, in this case use large hds.

I see the point why people with small systems would not want 64-bit, so I 
agree that a compile time option is a good idea. Where is your problem with 
that? If you find 64-bit maths on ia32 unacceptable, then just don't use 
it! Your kernel will just have a few features less. On the other hand, for 
those of us, who do think it is acceptable, we can enable it and enjoy the 
features we want.

Windows NT uses 64-bit maths all the way through NTFS and it works fine. I 
don't see my fs being slow, and in about 4 years of use, I have only ever 
had one problem with fs-corruption caused by NT. Unacceptable? I think not.

I think you are forgetting that 64-bit maths is still on the CPU, which is 
fast, I/O speed is bound by the HD, not by the CPU, so whether the fs-code 
is a bit slower or not will not make any difference in the actual disk 
access case. (Cache access is of course a different matter.)

Just my 2p.

Regards,

 Anton


-- 
Anton Altaparmakov  (replace at with @)
Linux NTFS Maintainer / WWW: http://sourceforge.net/projects/linux-ntfs/
ICQ: 8561279 / WWW: http://www-stu.christs.cam.ac.uk/~aia21/




Re: 64-bit block sizes on 32-bit systems

2001-03-27 Thread Ion Badulescu

On Mon, 26 Mar 2001, LA Walsh wrote:

>   Are you being deliberately misleading or do you just not read?
> 
>   No one is suggesting penalizing *anyone*.  What was suggested
> was a compile time option so sites could choose based on their
> needs.  Creating some strawman that this penalizes some section
> of users is just plain disingenuous or an indication of an inability
> to read.

Are you being deliberately insulting, "L", or are you one of those users
who bitch and scream for features they *need* at *any cost*, and who
have never even opened up the book for Computer Architecture 101?

Let's try to keep the discussion civilized, shall we?

Compile option or not, 64-bit arithmetic is unacceptable on IA32. The 
introduction of LFS was bad enough, we don't need yet another proof that 
IA32 sucks. Especially when there *are* better alternatives.

Ion

-- 
  It is better to keep your mouth shut and be thought a fool,
than to open it and remove all doubt.




Re: 64-bit block sizes on 32-bit systems

2001-03-26 Thread LA Walsh

Ion Badulescu wrote:
> Yes, there are millions of 32-bit systems in use today. They do their job
> just fine with the 32-bit device support we have right now. Do you really
> want to penalize them *all* for the sake of the few idiotic sysadmins who
> want multi-TB storage on their 32-bit system?
---
Are you being deliberately misleading or do you just not read?

No one is suggesting penalizing *anyone*.  What was suggested
was a compile time option so sites could choose based on their
needs.  Creating some strawman that this penalizes some section
of users is just plain disingenuous or an indication of an inability
to read.

By your logic we should eliminate SMP because it
will slow down non-SMP users who are "penalized" for having to use
it.  This line of reasoning is severely flawed.

-l
-- 
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338



Re: 64-bit block sizes on 32-bit systems

2001-03-26 Thread AJ Lewis

On Mon, Mar 26, 2001 at 11:37:52AM -0700, Eric W. Biederman wrote:
> Matthew Wilcox <[EMAIL PROTECTED]> writes:
> 
> > On Mon, Mar 26, 2001 at 10:47:13AM -0700, Andreas Dilger wrote:
> > > What do you mean by problems 5 years down the road?  The real issue is that
> > > this 32-bit block count limit affects composite devices like MD RAID and
> > > LVM today, not just individual disks.  There have been several postings
> > > I have seen with people having a problem _today_ with a 2TB limit on
> > > devices.
> > 
> > people who can afford 2TB of disc can afford to buy a 64-bit processor.
> 
> Currently that doesn't solve the problem as block_nr is held in an int.
> And as gcc compiles an int to a 32bit number on a 64bit processor, the
> problem still isn't solved.
> 
> That at least we need to address.

What I don't understand is why we can't just put an option in the Linux
config to enable 64-bit block support, as we have with the High Memory
Support option.  That way the user could select that option if they want it,
regardless of the processor they are using.  Jens Axboe <[EMAIL PROTECTED]>
already mentioned he had patched the kernel to do something similar earlier
this month on a similar thread on linux-kernel.

It makes sense to have this option when we have an enterprise level LVM
and 64-bit file systems such as the Global File System (GFS) for Linux.

Regards,
-- 
AJ Lewis
Sistina Software Inc.  Voice:  612-379-3951
1313 5th St SE, Suite 111  Fax:612-379-3952
Minneapolis, MN 55414  E-Mail: [EMAIL PROTECTED]
http://www.sistina.com

Current GPG fingerprint = 3B5F 6011 5216 76A5 2F6B  52A0 941E 1261 0029 2648
Get my key at: http://www.sistina.com/~lewis/gpgkey
 (Unfortunately, the PKS-type keyservers do not work with multiple sub-keys)

-Begin Obligatory Humorous Quote
APATHY ERROR: Don't bother striking any key.
-End Obligatory Humorous Quote--



Re: 64-bit block sizes on 32-bit systems

2001-03-26 Thread Matthew Wilcox

On Mon, Mar 26, 2001 at 01:13:32PM -0800, Ion Badulescu wrote:
> Yes, there are millions of 32-bit systems in use today. They do their job
> just fine with the 32-bit device support we have right now. Do you really
> want to penalize them *all* for the sake of the few idiotic sysadmins who
> want multi-TB storage on their 32-bit system?

And they can even have that multi-TB storage today.  It's the lazy ones
who want to combine all their 80GB discs into one virtual device.

In case anyone's getting the wrong idea from this, I _do_ think we ought
to make the 2TB limit go away.  I _don't_ think we should do that by going
to 64-bit block numbers on 32-bit machines.  Linus suggested a while back
(in one of the kiobuf flamewa^W discussions) that the block device layer
ought to work fine with non-512-byte-blocks.  Perhaps those who are keen
to see Linux support large drives should investigate that instead.

-- 
Revolutions do not require corporate support.



Re: 64-bit block sizes on 32-bit systems

2001-03-26 Thread Ion Badulescu

On Mon, 26 Mar 2001 12:09:04 -0700 (MST), Andreas Dilger <[EMAIL PROTECTED]> 
wrote:

> This whole "64-bit" fallacy has got to stop.  First it was "anybody
> who needs files > 2GB should use a 64-bit CPU", wrong.  Then it was
> "anybody who needs > 1GB RAM should use a 64-bit CPU", wrong.  Now it is
> "anybody who needs > 2TB disk should use a 64-bit CPU", soon to be wrong.

And each of the above came with its own assorted set of performance hits.

> I don't think the millions of 32-bit systems will disappear overnight,

That was said of the millions of 16-bit systems about 10 years ago.
I wonder where we would be now if Linux tried to cater to 286's or lower
processors.

I personally don't want to *ever* remember the dreadful days of DOS and
its expanded/extended/fooshtended/fsckedup memory. Although we're getting
dangerously close with kmap/kunmap nowadays -- but at least PAE is a
compile-time option.

Yes, there are millions of 32-bit systems in use today. They do their job
just fine with the 32-bit device support we have right now. Do you really
want to penalize them *all* for the sake of the few idiotic sysadmins who
want multi-TB storage on their 32-bit system?

Anybody who advocates extended use of 64-bit types on IA32 -- especially
on IA32 -- please remember that you're dealing with a processor that 
emulates at most *4* general purpose non-atomic 64-bit registers.
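
(A rough illustration -- my sketch, not gcc output -- of what that emulation
means: every u64 operation is compiled into pairs of 32-bit halves plus
carry handling, roughly the add/adc and paired-compare sequences gcc emits:

    #include <stdint.h>

    typedef struct { uint32_t lo, hi; } u64_emul;

    /* One 64-bit add becomes two 32-bit adds plus carry propagation. */
    static u64_emul add64(u64_emul a, u64_emul b)
    {
        u64_emul r;
        r.lo = a.lo + b.lo;
        r.hi = a.hi + b.hi + (r.lo < a.lo);   /* carry out of low word */
        return r;
    }

    /* One 64-bit compare becomes up to two 32-bit compares. */
    static int less64(u64_emul a, u64_emul b)
    {
        return a.hi < b.hi || (a.hi == b.hi && a.lo < b.lo);
    }

    int main(void)
    {
        u64_emul a = { 0xFFFFFFFFu, 0 }, b = { 1, 0 };
        u64_emul s = add64(a, b);     /* 2^32: lo wraps to 0, hi = 1 */
        return less64(a, s) ? 0 : 1;  /* exits 0: a < a+1 still holds */
    }

Each pair of halves also ties up two of the eight registers.)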

Ion

-- 
  It is better to keep your mouth shut and be thought a fool,
than to open it and remove all doubt.



Re: 64-bit block sizes on 32-bit systems

2001-03-26 Thread Dan Hollis

On Mon, 26 Mar 2001, Andreas Dilger wrote:
> Matthew Wilcox writes:
> > people who can afford 2TB of disc can afford to buy a 64-bit processor.
> This whole "64-bit" fallacy has got to stop.

Indeed.

> Now it is "anybody who needs > 2TB disk should use a 64-bit CPU", soon
> to be wrong.

It was already wrong in 1995.

-Dan




Re: 64-bit block sizes on 32-bit systems

2001-03-26 Thread Jes Sorensen

> "Matthew" == Matthew Wilcox <[EMAIL PROTECTED]> writes:

Matthew> On Mon, Mar 26, 2001 at 10:47:13AM -0700, Andreas Dilger
Matthew> wrote:
>> What do you mean by problems 5 years down the road?  The real issue
>> is that this 32-bit block count limit affects composite devices
>> like MD RAID and LVM today, not just individual disks.  There have
>> been several postings I have seen with people having a problem
>> _today_ with a 2TB limit on devices.

Matthew> people who can afford 2TB of disc can afford to buy a 64-bit
Matthew> processor.

Oh great, and migrating a large application to a new architecture is
soo cheap. Disk costs nothing these days and there is a legitimate
need here.

Jes



Re: 64-bit block sizes on 32-bit systems

2001-03-26 Thread Martin Dalecki

"Eric W. Biederman" wrote:
> 
> Matthew Wilcox <[EMAIL PROTECTED]> writes:
> 
> > On Mon, Mar 26, 2001 at 10:47:13AM -0700, Andreas Dilger wrote:
> > > What do you mean by problems 5 years down the road?  The real issue is that
> > > this 32-bit block count limit affects composite devices like MD RAID and
> > > LVM today, not just individual disks.  There have been several postings
> > > I have seen with people having a problem _today_ with a 2TB limit on
> > > devices.
> >
> > people who can afford 2TB of disc can afford to buy a 64-bit processor.
> 
> Currently that doesn't solve the problem as block_nr is held in an int.
> And as gcc compiles an int to a 32bit number on a 64bit processor, the
> problem still isn't solved.
> 
> That at least we need to address.

And then you must face the fact that there may be a need for
some off-the-shelf software which isn't well supported on the
corresponding 64-bit architectures as well. So the
argument doesn't hold up to reality in any way.
BTW, for many reasons 32-bit architectures are, in
respect of some application schemes, *faster* than 64-bit ones.
Ultra III in 64-bit mode just crawls in comparison to 32-bit mode.
Alpha - unfortunately an orphaned and dying architecture - is
not well supported by sw vendors...



Re: 64-bit block sizes on 32-bit systems

2001-03-26 Thread Andreas Dilger

Matthew Wilcox writes:
> On Mon, Mar 26, 2001 at 10:47:13AM -0700, Andreas Dilger wrote:
> > What do you mean by problems 5 years down the road?  The real issue is that
> > this 32-bit block count limit affects composite devices like MD RAID and
> > LVM today, not just individual disks.  There have been several postings
> > I have seen with people having a problem _today_ with a 2TB limit on
> > devices.
> 
> people who can afford 2TB of disc can afford to buy a 64-bit processor.

Get real.  If you buy (cheapest) 40GB IDE disks, I can have 2TB for
U$9200 (not including controllers).  In 1 year it will be half, etc.
I expect I will start moving my DVD collection to disk storage in an
ia32 system once price/GB falls by 50% from current levels.  This is
just for home use, let alone what large companies want to do.  I am
fully expecting hard drive price/GB to keep falling at its current rate.

This whole "64-bit" fallacy has got to stop.  First it was "anybody
who needs files > 2GB should use a 64-bit CPU", wrong.  Then it was
"anybody who needs > 1GB RAM should use a 64-bit CPU", wrong.  Now it is
"anybody who needs > 2TB disk should use a 64-bit CPU", soon to be wrong.
I don't think the millions of 32-bit systems will disappear overnight,
or even in 10 years, yet we already have single IDE disks > 100GB, and
in 2 or 3 years we will have single IDE disks > 1TB that people will
want to use in their 32-bit systems.

Cheers, Andreas
-- 
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/   -- Dogbert



Re: 64-bit block sizes on 32-bit systems

2001-03-26 Thread Jesse Pollard


> 
> On Mon, Mar 26, 2001 at 08:39:21AM -0800, LA Walsh wrote:
> > I vaguely remember a discussion about this a few months back.
> > If I remember, the reasoning was it would unnecessarily slow
> > down smaller systems that would never have block devices in
> > the 4-28T range attached.  
> 
> 4k page size * 2GB = 8TB.
> 
> i consider it much more likely on such systems that the page size will
> be increased to maybe 16 or 64k which would give us 32TB or 128TB.
> you keep on trying to increase the size of types without looking at
> what gcc outputs in the way of code that manipulates 64-bit types.
> seriously, why don't you just try it?  see what the performance is.
> see what the code size is.  then come back with some numbers.  and i mean
> numbers, not `it doesn't feel any slower'.
> 
> personally, i'm going to see what the situation looks like in 5 years time
> and try to solve the problem then.  there're enough real problems with the
> VFS today that i don't feel inclined to fix tomorrow's potential problems.

I don't feel that it is that far away ... IBM has already released a 64-CPU
Intel-based system (NUMA). We already have systems in that class (though
64-bit based) that use 5 TB file systems. The need is coming, and appears
to be coming fast. It should be resolved during the improvements to the
VFS.

A second reason to include it in the VFS is that the low level filesystem
implementation would NOT be required to use it. If the administrator
CHOOSES to access a 16TB filesystem from a workstation, then it should
be possible (likely something like the GFS, where the administrator is
just monitoring things, would be reasonable for a 32 bit system to do).

As I see it, the VFS itself doesn't really care what the block size is,
it just carries relatively opaque values that the filesystem implementation
uses. Most of the overhead should just be copying an extra 4 bytes around.

-
Jesse I Pollard, II
Email: [EMAIL PROTECTED]

Any opinions expressed are solely my own.
-



Re: 64-bit block sizes on 32-bit systems

2001-03-26 Thread Rik van Riel

On Mon, 26 Mar 2001, Matthew Wilcox wrote:

> people who can afford 2TB of disc can afford to buy a 64-bit processor.

You realise that this'll double the price of storage?  ;)

(at least, in a year or two)

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com.br/




Re: 64-bit block sizes on 32-bit systems

2001-03-26 Thread Scott Laird



On Mon, 26 Mar 2001, Matthew Wilcox wrote:
>
> people who can afford 2TB of disc can afford to buy a 64-bit processor.
>

Sort of.  A back-of-the-envelope calculation shows that 2 TB is only 25
80GB IDE drives.  Given 4 3ware 8-channel IDE controllers and a large
enough case, you could probably build a cheap 2TB RAID0 array for ~$10k.
You could do RAID5 for only slightly more.

While this isn't exactly a standard, off-the-shelf, general-purpose sort
of configuration, it definitely has its uses.  Be careful assuming that
huge amounts of disk storage requires a huge amount of money, or a high
level of reliability or performance.


Scott




Re: 64-bit block sizes on 32-bit systems

2001-03-26 Thread Eric W. Biederman

Matthew Wilcox <[EMAIL PROTECTED]> writes:

> On Mon, Mar 26, 2001 at 10:47:13AM -0700, Andreas Dilger wrote:
> > What do you mean by problems 5 years down the road?  The real issue is that
> > this 32-bit block count limit affects composite devices like MD RAID and
> > LVM today, not just individual disks.  There have been several postings
> > I have seen with people having a problem _today_ with a 2TB limit on
> > devices.
> 
> people who can afford 2TB of disc can afford to buy a 64-bit processor.

Currently that doesn't solve the problem as block_nr is held in an int.
And as gcc compiles an int to a 32bit number on a 64bit processor, the
problem still isn't solved.

That at least we need to address.

Eric



Re: 64-bit block sizes on 32-bit systems

2001-03-26 Thread Matthew Wilcox

On Mon, Mar 26, 2001 at 10:47:13AM -0700, Andreas Dilger wrote:
> What do you mean by problems 5 years down the road?  The real issue is that
> this 32-bit block count limit affects composite devices like MD RAID and
> LVM today, not just individual disks.  There have been several postings
> I have seen with people having a problem _today_ with a 2TB limit on
> devices.

people who can afford 2TB of disc can afford to buy a 64-bit processor.

-- 
Revolutions do not require corporate support.



Re: 64-bit block sizes on 32-bit systems

2001-03-26 Thread Eric W. Biederman

LA Walsh <[EMAIL PROTECTED]> writes:

> I vaguely remember a discussion about this a few months back.
> If I remember, the reasoning was it would unnecessarily slow
> down smaller systems that would never have block devices in
> the 4-28T range attached.  

With classic 512 byte sectors the top size is right about 2TB.

The basic thought is that 64bit numbers tend to suck, so we don't
want then in any fast paths on a 32bit system.

> However, isn't it possible there will continue to be a series
> of P-IV,V,VI,VII ...etc, addons that will be used for sometime
> to come.  I've even heard it suggested that we might see
> 2 or more CPU's on a single chip as a way to increase cpu
> capacity w/o driving up clock speed.  Given the cheapness of
> .25T drives now, seeing the possibility of 4T drives doesn't seem
> that remote (maybe 5 years?).  
> 
> Side question: does the 32-bit block size limit also apply to 
> RAID disks or does it use a different block-nr type?
For now yes it does.

> 
> So...is it the plan, or has it been thought about -- 'abstracting'
> block numbers as a typedef 'block_nr', then at compile time
> having it be selectable as to whether or not this was to
> be a 32-bit or 64 bit quantity -- that way older systems would
> lose no efficiency.  Drivers that couldn't be or hadn't been
> ported to use 'block_nr' could default to being disabled if
> 64-bit blocks were selected, etc.
> 
> So has this idea been tossed about and/or previously thrashed?

Using a 64-bit number on 32-bit systems has so far been trashed.
Though this does look like a real problem that needs to be solved
at some point.  I doubt we can wait past 2.5 though if we want the
code ready when the hardware is.

Eric



Re: 64-bit block sizes on 32-bit systems

2001-03-26 Thread Andreas Dilger

Matthew Wilcox writes:
> On Mon, Mar 26, 2001 at 08:39:21AM -0800, LA Walsh wrote:
> > I vaguely remember a discussion about this a few months back.
> > If I remember, the reasoning was it would unnecessarily slow
> > down smaller systems that would never have block devices in
> > the 4-28T range attached.  
> 
> 4k page size * 2GB = 8TB.
> 
> i consider it much more likely on such systems that the page size will
> be increased to maybe 16 or 64k which would give us 32TB or 128TB.
> 
> personally, i'm going to see what the situation looks like in 5 years time
> and try to solve the problem then.

What do you mean by problems 5 years down the road?  The real issue is that
this 32-bit block count limit affects composite devices like MD RAID and
LVM today, not just individual disks.  There have been several postings
I have seen with people having a problem _today_ with a 2TB limit on
devices.

There is some hope with LVM (and MD I suspect as well) that it could
do blocksize remapping, so it appears to be a 4k sector device but
remaps to 512-byte sector disks underneath.  This _should_ give us an
upper limit of 16TB, assuming 32-bit unsigned ints for block numbers.
Of course, you would need to only do 4kB block I/O on top of these devices
(not much of an issue for such large devices).
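
(A sketch of the remapping arithmetic -- my own illustration of the idea,
not LVM code:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint32_t blk4k = 0xFFFFFFFFu;   /* largest 32-bit 4K block nr */
        /* each 4K block covers eight 512-byte sectors underneath */
        uint64_t first_sector = (uint64_t)blk4k * 8;
        uint64_t limit_bytes  = ((uint64_t)blk4k + 1) * 4096;
        printf("4K block %u -> 512B sector %llu; device limit %llu TiB\n",
               blk4k, (unsigned long long)first_sector,
               (unsigned long long)(limit_bytes >> 40));
        return 0;
    }

which confirms the ceiling: 2^32 * 4kB = 16 TiB.)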

Still, this is just a stop-gap measure, because next year people will
want devices larger than 16TB, and there won't be an easy way to do this.

Cheers, Andreas
-- 
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/   -- Dogbert



Re: 64-bit block sizes on 32-bit systems

2001-03-26 Thread LA Walsh


Matthew Wilcox wrote:
> 
> On Mon, Mar 26, 2001 at 08:39:21AM -0800, LA Walsh wrote:
> > I vaguely remember a discussion about this a few months back.
> > If I remember, the reasoning was it would unnecessarily slow
> > down smaller systems that would never have block devices in
> > the 4-28T range attached.
> 
> 4k page size * 2GB = 8TB.
---
Drat... I was being more optimistic -- you're right,
the block_nr can be negative.  Somehow I thought page size could
be 8K -- living in future land.  That just makes the limitations
even closer at hand... :-(

> you keep on trying to increase the size of types without looking at
> what gcc outputs in the way of code that manipulates 64-bit types.
---
Maybe someone will backport some of the features of the
IA-64 code generator into 'gcc'.  I've been told that in some 
cases it's a 2.5x performance difference.  If 'gcc' is generating
bad code, then maybe the 'gcc' people will increase the quality
of their code -- I'm sure they are just as eagerly working on
gcc improvements as we are kernel improvements.  When I worked
on the PL/M compiler project at Intel, I know our code-optimization
guy would spend endless cycles trying to get better optimization
out of the code.  He got great joy out of doing so. -- and
that was almost 20 years ago -- and code generation has come
a *long* way since then.

> seriously, why don't you just try it?  see what the performance is.
> see what the code size is.  then come back with some numbers.  and i mean
> numbers, not `it doesn't feel any slower'.
---
As for 'trying' it -- would anyone care if we virtualized
the block_nr into a typedef?  That seems like it would provide
for cleaner (type-checked) code at no performance penalty and
more easily allow such comparisons.

Well, this is my point: if I have disks > 8T, wouldn't
it be at *all* beneficial to be able to *choose* some slight
performance impact and access those large disks vs. having no
choice?  Having it as a configurable option would allow a given
installation to make that choice.  BTW, are block_nr's on RAID
arrays subject to this limitation?
> 
> personally, i'm going to see what the situation looks like in 5 years time
> and try to solve the problem then.
---
It's not the same, but SGI has had customers for over
3 years using >2T *files*.  The point I'm looking at is that if
the P-X series gets developed enough, and someone is using a
4-16P system, a corporate user might be approaching that limit
today or tomorrow.  Joe User might not for 5 years, but that's
what the configurability is about.  Keep Linux usable for both
ends of the scale -- "I love scalability"

-l

-- 
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338



Re: 64-bit block sizes on 32-bit systems

2001-03-26 Thread Matthew Wilcox

On Mon, Mar 26, 2001 at 08:39:21AM -0800, LA Walsh wrote:
> I vaguely remember a discussion about this a few months back.
> If I remember, the reasoning was it would unnecessarily slow
> down smaller systems that would never have block devices in
> the 4-28T range attached.  

4k page size * 2GB = 8TB.

i consider it much more likely on such systems that the page size will
be increased to maybe 16 or 64k which would give us 32TB or 128TB.
you keep on trying to increase the size of types without looking at
what gcc outputs in the way of code that manipulates 64-bit types.
seriously, why don't you just try it?  see what the performance is.
see what the code size is.  then come back with some numbers.  and i mean
numbers, not `it doesn't feel any slower'.
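
(Spelling out those limits -- a sketch assuming a signed 32-bit block_nr,
i.e. 2^31 usable block numbers:

    #include <stdio.h>

    int main(void)
    {
        unsigned long long bs[] = { 4096, 16384, 65536 };
        for (int i = 0; i < 3; i++)
            printf("%3lluK blocks -> %llu TiB max device\n",
                   bs[i] >> 10, (bs[i] << 31) >> 40);
        return 0;
    }

4K -> 8 TiB, 16K -> 32 TiB, 64K -> 128 TiB, matching the figures above.)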

personally, i'm going to see what the situation looks like in 5 years time
and try to solve the problem then.  there're enough real problems with the
VFS today that i don't feel inclined to fix tomorrow's potential problems.

-- 
Revolutions do not require corporate support.



64-bit block sizes on 32-bit systems

2001-03-26 Thread LA Walsh

I vaguely remember a discussion about this a few months back.
If I remember, the reasoning was it would unnecessarily slow
down smaller systems that would never have block devices in
the 4-28T range attached.  

However, isn't it possible there will continue to be a series
of P-IV, V, VI, VII... etc. add-ons that will be used for some time
to come?  I've even heard it suggested that we might see
2 or more CPUs on a single chip as a way to increase CPU
capacity w/o driving up clock speed.  Given the cheapness of
.25T drives now, seeing the possibility of 4T drives doesn't seem
that remote (maybe 5 years?).

Side question: does the 32-bit block size limit also apply to 
RAID disks or does it use a different block-nr type?

So... is it the plan, or has it been thought about -- 'abstracting'
block numbers as a typedef 'block_nr', then at compile time
having it be selectable as to whether or not this was to
be a 32-bit or 64-bit quantity -- that way older systems would
lose no efficiency.  Drivers that couldn't be or hadn't been
ported to use 'block_nr' could default to being disabled if
64-bit blocks were selected, etc.
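
(Something like this minimal sketch, where CONFIG_BLK64 is a hypothetical
config option, not an existing one:

    #include <linux/types.h>

    #ifdef CONFIG_BLK64
    typedef u64 block_nr;   /* opt-in: 64-bit block numbers          */
    #else
    typedef u32 block_nr;   /* default: today's size, zero overhead  */
    #endif

Code that traffics in block numbers would then use 'block_nr' everywhere
instead of a bare int, and the compile-time choice never touches builds
that stay with 32 bits.)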

So has this idea been tossed about and/or previously thrashed?

-l

-- 
L A Walsh| Trust Technology, Core Linux, SGI
[EMAIL PROTECTED]  | Voice: (650) 933-5338