Two main reasons caused the performance decrease:

1. NNBench sets the block size to 1. Although it generates a file with only 1 byte, the file's checksum file has 16 bytes (a 12-byte header plus 4 bytes of checksums). Without the checksum file, only 1 block needs to be generated; with the checksum file, 17 blocks need to be generated. So the overhead of generating a checksum file is huge in this special case. HADOOP-1134 should help a lot with this.
2. NotReplicatedYetException occurs only when a file has more than 1 block. Because the checksum file has 16 blocks, it receives NotReplicatedYetException. The client's retries slow down the file writing significantly. HADOOP-1093 should be able to fix this.

Hairong

-----Original Message-----
From: Raghu Angadi [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 29, 2007 2:49 PM
To: [email protected]
Subject: Re: [jira] Updated: (HADOOP-1180) NNbench test should be able to test the checksumfilesystem as well as the raw filesystem

Nigel Daley wrote:
> So shouldn't fixing this test to conform to the new model in
> HADOOP-1134 be the concern of the patch for HADOOP-1134? As it stands,
> I can't run NNBench at scale without using a raw file system, which is
> what this patch is intended to allow. HADOOP-928 caused this test to
> use a ChecksumFileSystem and subsequently we saw our "read" TPS metric
> plummet from 20,000 to a couple hundred.

Wow! This would be a good test for 1134. I didn't expect the TPS to be so different. I would expect TPS to remain closer to 20,000 with CRCs with 1134.

Raghu.

> Let's get our current benchmark back on track before we commit
> HADOOP-1134 (which will likely take a while before it is "Patch
> Available").
>
> On Mar 29, 2007, at 11:29 AM, Doug Cutting (JIRA) wrote:
>
>> [ https://issues.apache.org/jira/browse/HADOOP-1180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>>
>> Doug Cutting updated HADOOP-1180:
>> ---------------------------------
>>
>> Status: Open (was: Patch Available)
>>
>> -1 This patch may be rendered obsolete by HADOOP-1134. And, the way
>> it is written, the 'useChecksum=false' mode will silently fail to
>> work once HADOOP-1134 is completed. So, if we feel we'll want to
>> continue to support this feature after HADOOP-1134, then we should
>> add an explicit way of constructing an HDFS FileSystem that does not
>> perform checksumming, rather than relying on 'instanceof ChecksumFileSystem'.
>>
>>> NNbench test should be able to test the checksumfilesystem as well
>>> as the raw filesystem
>>> ----------------------------------------------------------------------
>>>
>>> Key: HADOOP-1180
>>> URL: https://issues.apache.org/jira/browse/HADOOP-1180
>>> Project: Hadoop
>>> Issue Type: Bug
>>> Components: dfs
>>> Reporter: dhruba borthakur
>>> Assigned To: dhruba borthakur
>>> Attachments: nnbench.patch
>>>
>>> The NNbench test should have the option of testing a file system
>>> with checksums turned on and with checksums turned off. The original
>>> behaviour of nnbench test was to test hdfs without checksums.
>>
>> --
>> This message is automatically generated by JIRA.
>> You can reply to this email to add a comment to the issue online.
>
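
P.S. The block-count arithmetic behind reason 1 above can be sketched in a few lines of Python. This is only a back-of-envelope illustration, not Hadoop code; the 512-byte bytes-per-checksum chunk size and the helper names here are assumptions for the sketch:

```python
# Back-of-envelope sketch of why a 1-byte NNBench file with block size 1
# turns into 17 blocks once ChecksumFileSystem writes a companion .crc file.
import math

def blocks_needed(file_bytes, block_size):
    """Number of blocks a file of file_bytes occupies at the given block size."""
    return math.ceil(file_bytes / block_size) if file_bytes else 0

def crc_file_bytes(data_bytes, bytes_per_checksum=512):
    """Size of the companion .crc file: a 12-byte header plus one 4-byte CRC
    per bytes_per_checksum chunk of data (512 assumed as the default)."""
    chunks = math.ceil(data_bytes / bytes_per_checksum)
    return 12 + 4 * chunks

data_blocks = blocks_needed(1, block_size=1)      # 1 block for the 1-byte file
crc_blocks = blocks_needed(crc_file_bytes(1), 1)  # 16 blocks for the 16-byte .crc file
print(data_blocks + crc_blocks)                   # prints 17
```

So generating the checksum file multiplies the namenode's block-allocation work by 17x per file in this degenerate configuration, which is why the TPS numbers dropped so sharply.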
