On Sun, Feb 7, 2010 at 11:39 AM, Willie Wong <ww...@math.princeton.edu> wrote:
> On Sun, Feb 07, 2010 at 08:27:46AM -0800, Mark Knecht wrote:
>> <QUOTE>
>> 4KB physical sectors: KNOW WHAT YOU'RE DOING!
>>
>> Pros: Quiet, cool-running, big cache
>>
>> Cons: The 4KB physical sectors are a problem waiting to happen. If you
>> misalign your partitions, disk performance can suffer. I ran
>> benchmarks in Linux using a number of filesystems, and I found that
>> with most filesystems, read performance and write performance with
>> large files didn't suffer with misaligned partitions, but writes of
>> many small files (unpacking a Linux kernel archive) could take several
>> times as long with misaligned partitions as with aligned partitions.
>> WD's advice about who needs to be concerned is overly simplistic,
>> IMHO, and it's flat-out wrong for Linux, although it's probably
>> accurate for 90% of buyers (those who run Windows or Mac OS and use
>> their standard partitioning tools). If you're not part of that 90%,
>> though, and if you don't fully understand this new technology and how
>> to handle it, buy a drive with conventional 512-byte sectors!
>> </QUOTE>
>>
>>    Now, I don't mind getting a bit dirty learning to use this
>> correctly but I'm wondering what that means in a practical sense.
>> Reading the mke2fs man page the word 'sector' doesn't come up. It's my
>> understanding the Linux 'blocks' are groups of sectors. True? If the
>> disk must use 4K sectors then what - the smallest block has to be 4K
>> and I'm using 1 sector per block? It seems that ext3 doesn't support
>> anything larger than 4K?
>
> The problem is not when you are making the filesystem with mke2fs, but
> when you partitioned the disk using fdisk. I'm sure I am making some
> small mistakes in the explanation below, but it goes something like
> this:
>
> a) The harddrive with 4K sectors allows the head to efficiently
> read/write 4K sized blocks at a time.
> b) However, to be compatible in hardware, the harddrive allows 512B
> sized blocks to be addressed. In reality, this means that you can
> individually address the 8 512B-sized chunks of the 4K sized blocks,
> but each will count as a separate operation. To illustrate: say the
> hardware has some sector X of size 4K. It has 8 addressable slots
> inside X1 ... X8 each of size 512B. If your OS clusters read/writes on
> the 512B level, it will send 8 commands to read the info in those 8
> blocks separately. If your OS clusters in 4K, it will send one
> command. So in the stupid analysis I give here, it will take 8 times
> as long for the 512B addressing to read the same data, since it will
> take 8 passes, and each time inefficiently reading only 1/8 of the
> data required. Now in reality, drives are smarter than that: if all 8
> of those are sent in sequence, sometimes the drives will cluster them
> together in one read.
> c) A problem occurs, however, when your OS deals with 4K clusters but
> when you make the partition, the partition is offset! Imagine the
> physical read sectors of your disk looking like
>
> AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDD
>
> but when you make your partitions, somehow you partitioned it
>
> ....YYYYYYYYZZZZZZZZWWWWWWWW....
>
> This is possible because the drive allows addressing by 512B chunks.
> So for some reason one of your partitions starts halfway inside a
> physical sector. What is the problem with this? Now suppose your OS
> sends data to be written to the ZZZZZZZZ block. If it were completely
> aligned, the drive would just move the head to the block and
> overwrite it with this information. But since half of the block is
> over the BBBB physical sector, and half over CCCC, what the disk now
> needs to do is to
>
> pass 1) read BBBBBBBB
> pass 2) modify the second half of BBBB to match the first half of ZZZZ
> pass 3) write BBBBBBBB
> pass 4) read CCCCCCCC
> pass 5) modify the first half of CCCC to match the second half of ZZZZ
> pass 6) write CCCCCCCC
>
> This is what is known as a read-modify-write operation, and it makes
> the disk a lot less efficient.
>
> ----------
>
> Now, I don't know if this is actually what is causing your
> performance problems. But this may be it. When you use fdisk, it
> defaults to aligning the partition to cylinder boundaries, and uses the
> default (from ancient times) value of 63 x (512B sized) sectors per
> track. Since 63 is not evenly divisible by 8, you see that quite
> likely some of your partitions are not aligned to the physical sector
> boundaries.
>
> If you use cfdisk, you can try to change the geometry with the command
> g. Or you can use the command u to change the units used in the
> partitioning to either sectors or megabytes, and make sure your
> partition starts and sizes are a multiple of 8 in the former, or an
> integer in the latter.
>
> Again, take what I wrote with a grain of salt: this information came
> from the research I did a little while back after reading the slashdot
> article on this 4K switch. So being my own understanding, it may not
> completely be correct.
>
> HTH,
>
> W
> --
> Willie W. Wong                                     ww...@math.princeton.edu
> Data aequatione quotcunque fluentes quantitae involvente fluxiones invenire
>         et vice versa   ~~~  I. Newton
>
>
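The read-modify-write cost in step (c) of the quoted explanation can be
sketched in a few lines. This is only an illustrative model, assuming
512-byte logical sectors, 4096-byte physical sectors, and a drive that
must read-modify-write any physical sector a write covers only partly.
The function name physical_ops is made up for this sketch, and real
drive firmware may well behave differently.

```python
LOGICAL = 512                 # bytes per addressable (logical) sector
PHYSICAL = 4096               # bytes per physical sector (assumed)
RATIO = PHYSICAL // LOGICAL   # 8 logical sectors per physical sector


def physical_ops(start_lba, count):
    """Return (full_writes, rmw) for a write of `count` logical sectors.

    full_writes: physical sectors that are completely overwritten
    rmw:         physical sectors only partly covered, which need a
                 read-modify-write cycle in this simplified model
    """
    end_lba = start_lba + count              # exclusive end
    first = start_lba // RATIO               # first physical sector touched
    last = (end_lba - 1) // RATIO            # last physical sector touched
    full = rmw = 0
    for p in range(first, last + 1):
        lo = max(start_lba, p * RATIO)
        hi = min(end_lba, (p + 1) * RATIO)
        if hi - lo == RATIO:
            full += 1
        else:
            rmw += 1
    return full, rmw


# One 4K block starting on a physical sector boundary (LBA 64)
print(physical_ops(64, 8))   # (1, 0)
# The ZZZZZZZZ block from the diagram: it starts at logical sector 12
print(physical_ops(12, 8))   # (0, 2)
# One 4K block at the start of a partition that begins at LBA 63
print(physical_ops(63, 8))   # (0, 2)
```

In this model an aligned 4K write costs one plain write, while the same
write shifted off the boundary costs two read-modify-write cycles.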

Willie,
   Thanks. Your description above is pretty much consistent (I think)
with the information I found at the WD site explaining how the data is
being physically packed on the drive. Since I have the OS set up on a
different drive, I was able to blow away all the partitions, so I just
created one large 1T partition. I think that doesn't deal with the
exact problem you outline, though.

   I'll have to study how to change the geometry. I do see that cfdisk
is reporting 255/63/121601. Am I to choose a size that is __smaller__
than 63 but a multiple of 8? I.e. - 56? And then if I do that does the
partitioning of the drive just ignore those last 7 sectors and reduce
capacity to 56/63 of the total, i.e. by about 11%?

   Or is it legal to push the number of sectors up to 64? I would have
thought that the sector count would be driven by really low level
formatting and I shouldn't be messing with that.
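
The divisibility question can be checked with a little arithmetic. The
sketch below assumes 8 logical 512-byte sectors per 4K physical sector
and treats the heads/sectors-per-track numbers purely as values used to
compute where track and cylinder boundaries fall. The helper name
boundaries_aligned is hypothetical, and the sketch says nothing about
which values cfdisk will accept or what happens to usable capacity.

```python
RATIO = 8  # 512-byte logical sectors per 4K physical sector (assumed)


def boundaries_aligned(heads, sectors_per_track):
    """True if every track and cylinder boundary is a multiple of RATIO.

    Track boundaries matter when a partition starts one track in;
    cylinder boundaries matter for partitions placed on cylinder starts.
    """
    track = sectors_per_track
    cylinder = heads * sectors_per_track
    return track % RATIO == 0 and cylinder % RATIO == 0


for heads, spt in [(255, 63), (255, 56)]:
    print(heads, spt, heads * spt, boundaries_aligned(heads, spt))
# 255 63 16065 False
# 255 56 14280 True
```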

   Assuming I have done what you are suggesting, with 7 blocks/track,
do I then need to choose the starting position of each partition so
that it is aligned to the start of a new 8-sector block?
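
One way to picture "aligned to the start of a new 8-sector block" is to
round a proposed starting sector up to the next multiple of 8 when
working in sector units. This is a minimal sketch under that
assumption; align_up is a made-up helper, not part of fdisk or cfdisk.

```python
def align_up(lba, multiple=8):
    """Round a logical sector number up to the next multiple of `multiple`."""
    return ((lba + multiple - 1) // multiple) * multiple


print(align_up(63))      # 64
print(align_up(64))      # 64
print(align_up(16065))   # 16072 (16065 = 255 * 63, one cylinder in)
```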

   It's very strange that the disk industry chose anything that's not
2^X but I guess they did.

   As per your and Volker's suggestions I'm going to study the proper
way to align partitions before I do anything more. I did find a small
program called 'fio' that does some interesting drive testing
including seek time testing. I need to study how to really use it
though. It can set up multiple threads to simulate loads that are more
real-world like.

   Thanks to you both for the responses.

Cheers,
Mark
