I'm not sure about that. I wrote a small checksum program to test this.
Once the chunk size used for checksumming grows beyond 8192 bytes, I
don't see much performance improvement; see the code below. I don't
think going all the way to 64MB would bring us any benefit.
I did change io.bytes.per.checksum to 131072 in Hadoop, and the job ran
about 4 or 5 minutes faster (the total reduce time is about 35 minutes).
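For reference, the property change was just this (I put it in my site
configuration file; the exact file name depends on your Hadoop version):

```xml
<property>
  <name>io.bytes.per.checksum</name>
  <value>131072</value>
</property>
```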
import java.util.zip.CRC32;
import java.util.zip.Checksum;

public class Test1 {
    public static void main(String[] args) {
        Checksum sum = new CRC32();
        byte[] bs = new byte[512]; // chunk size; vary this to test
        final int tot_size = 64 * 1024 * 1024; // checksum 64MB in total

        long time = System.nanoTime();
        for (int k = 0; k < tot_size / bs.length; k++) {
            for (int i = 0; i < bs.length; i++)
                bs[i] = (byte) i;
            sum.update(bs, 0, bs.length);
        }
        // elapsed time in milliseconds
        System.out.println("takes " + (System.nanoTime() - time) / 1000 / 1000);
    }
}
On 01/05/2011 05:03 PM, Milind Bhandarkar wrote:
I agree with Jay B. Checksumming is usually the culprit for high CPU on clients
and datanodes. Plus, a checksum of 4 bytes for every 512 bytes means that for a
64MB block, the checksums total 512KB, i.e. 128 ext3 blocks. Changing it to
generate 1 ext3 checksum block per DFS block will speed up read/write without
any loss of reliability.
- milind
---
Milind Bhandarkar
(mbhandar...@linkedin.com)
(650-776-3236)
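Milind's arithmetic can be sanity-checked quickly. This is just a sketch of
the overhead calculation; the 4KB ext3 block size is an assumption about the
local filesystem, not something Hadoop dictates:

```java
public class ChecksumOverhead {
    public static void main(String[] args) {
        final long dfsBlock = 64L * 1024 * 1024; // 64MB DFS block
        final int bytesPerChecksum = 512;        // default io.bytes.per.checksum
        final int checksumSize = 4;              // CRC32 stored as 4 bytes
        final int ext3Block = 4096;              // assumed 4KB ext3 block size

        long checksumBytes = dfsBlock / bytesPerChecksum * checksumSize;
        long ext3Blocks = checksumBytes / ext3Block;

        // prints "512KB of checksums, 128 ext3 blocks"
        System.out.println(checksumBytes / 1024 + "KB of checksums, "
                + ext3Blocks + " ext3 blocks");
    }
}
```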