Two main reasons caused the performance decrease:

1. NNBench sets the block size to 1. Although it generates a file with only 1 byte, the file's checksum file has 16 bytes (a 12-byte header plus 4 bytes of checksums). Without the checksum file, only 1 block needs to be generated; with the checksum file, 17 blocks need to be generated. So the overhead of generating a checksum file is huge in this special case. HADOOP-1134 should help a lot with this.
2. NotReplicatedYetException occurs only when a file has more than 1 block. Because the checksum file has 16 blocks, it receives NotReplicatedYetException. The client's retries slow down the file writing significantly. HADOOP-1093 should be able to fix this.

Hairong

-----Original Message-----
From: Raghu Angadi [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 29, 2007 2:49 PM
To: [email protected]
Subject: Re: [jira] Updated: (HADOOP-1180) NNbench test should be able to test the checksumfilesystem as well as the raw filesystem

Nigel Daley wrote:
> So shouldn't fixing this test to conform to the new model in
> HADOOP-1134 be the concern of the patch for HADOOP-1134? As it stands,
> I can't run NNBench at scale without using a raw file system, which is
> what this patch is intended to allow. HADOOP-928 caused this test to
> use a ChecksumFileSystem and subsequently we saw our "read" TPS metric
> plummet from 20,000 to a couple hundred.

Wow! This would be a good test for 1134. I didn't expect the TPS to be so different. I would expect TPS to remain closer to 20,000 with CRCs with 1134.

Raghu.

> Let's get our current benchmark back on track before we commit
> HADOOP-1134 (which will likely take a while before it is "Patch
> Available").
>
> On Mar 29, 2007, at 11:29 AM, Doug Cutting (JIRA) wrote:
>
>> [ https://issues.apache.org/jira/browse/HADOOP-1180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>>
>> Doug Cutting updated HADOOP-1180:
>> ---------------------------------
>>
>> Status: Open (was: Patch Available)
>>
>> -1 This patch may be rendered obsolete by HADOOP-1134. And, the way
>> it is written, the 'useChecksum=false' mode will silently fail to
>> work once HADOOP-1134 is completed. So, if we feel we'll want to
>> continue to support this feature after HADOOP-1134, then we should
>> add an explicit way of constructing an HDFS FileSystem that does not
>> perform checksumming, rather than relying on 'instanceof ChecksumFileSystem'.
>>
>>> NNbench test should be able to test the checksumfilesystem as well
>>> as the raw filesystem
>>> ----------------------------------------------------------------------
>>>
>>> Key: HADOOP-1180
>>> URL: https://issues.apache.org/jira/browse/HADOOP-1180
>>> Project: Hadoop
>>> Issue Type: Bug
>>> Components: dfs
>>> Reporter: dhruba borthakur
>>> Assigned To: dhruba borthakur
>>> Attachments: nnbench.patch
>>>
>>> The NNbench test should have the option of testing a file system
>>> with checksums turned on and with checksums turned off. The original
>>> behaviour of nnbench test was to test hdfs without checksums.
>>
>> --
>> This message is automatically generated by JIRA.
>> You can reply to this email to add a comment to the issue online.
>
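
P.S. The block-count arithmetic behind reason 1 above can be sketched in a few lines of Python. This is only a back-of-envelope illustration, not Hadoop code; the 512-byte bytes-per-checksum chunk size and the helper names here are assumptions for the sketch:

```python
# Back-of-envelope sketch of why a 1-byte NNBench file with block size 1
# turns into 17 blocks once ChecksumFileSystem writes a companion .crc file.
import math

def blocks_needed(file_bytes, block_size):
    """Number of blocks a file of file_bytes occupies at the given block size."""
    return math.ceil(file_bytes / block_size) if file_bytes else 0

def crc_file_bytes(data_bytes, bytes_per_checksum=512):
    """Size of the companion .crc file: a 12-byte header plus one 4-byte CRC
    per bytes_per_checksum chunk of data (512 assumed as the default)."""
    chunks = math.ceil(data_bytes / bytes_per_checksum)
    return 12 + 4 * chunks

data_blocks = blocks_needed(1, block_size=1)      # 1 block for the 1-byte file
crc_blocks = blocks_needed(crc_file_bytes(1), 1)  # 16 blocks for the 16-byte .crc file
print(data_blocks + crc_blocks)                   # prints 17
```

So generating the checksum file multiplies the namenode's block-allocation work by 17x per file in this degenerate configuration, which is why the TPS numbers dropped so sharply.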
