[jira] Commented: (HDFS-755) Read multiple checksum chunks at once in DFSInputStream

2010-01-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799108#action_12799108
 ] 

Hudson commented on HDFS-755:
-

Integrated in Hdfs-Patch-h5.grid.sp2.yahoo.net #183 (See 
[http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/183/])


> Read multiple checksum chunks at once in DFSInputStream
> ---
>
> Key: HDFS-755
> URL: https://issues.apache.org/jira/browse/HDFS-755
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: 0.22.0
>
> Attachments: alldata-hdfs.tsv, benchmark-8-256.png, benchmark.png, 
> hdfs-755.txt, hdfs-755.txt, hdfs-755.txt, hdfs-755.txt, hdfs-755.txt, 
> hdfs-755.txt
>
>
> HADOOP-3205 adds the ability for FSInputChecker subclasses to read multiple 
> checksum chunks in a single call to readChunk. This is the HDFS-side use of 
> that new feature.
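
To make the description above concrete, here is a minimal sketch of checking several 512-byte checksum chunks in one call. It is only an illustration: the class and method names are hypothetical, CRC32 from java.util.zip is assumed as the checksum, and none of this is the FSInputChecker or DFSInputStream code from the patch.

```java
import java.util.zip.CRC32;

// Hypothetical helper for illustration only; this is NOT the FSInputChecker
// or DFSInputStream code from the patch.
final class MultiChunkChecksumSketch {
    // 512 bytes is the checksum chunk size mentioned in the discussion.
    static final int BYTES_PER_CHECKSUM = 512;

    /** Computes one CRC32 value per chunk of data[0..len). */
    static long[] checksums(byte[] data, int len) {
        int chunks = (len + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        long[] sums = new long[chunks];
        CRC32 crc = new CRC32();
        for (int i = 0; i < chunks; i++) {
            int off = i * BYTES_PER_CHECKSUM;
            crc.reset();
            crc.update(data, off, Math.min(BYTES_PER_CHECKSUM, len - off));
            sums[i] = crc.getValue();
        }
        return sums;
    }

    /** Verifies all chunks in one call; returns the first bad chunk index, or -1. */
    static int firstCorruptChunk(byte[] data, int len, long[] expected) {
        long[] actual = checksums(data, len);
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] != expected[i]) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1300]; // spans three chunks: 512 + 512 + 276 bytes
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        long[] sums = checksums(data, data.length);
        System.out.println("clean: " + firstCorruptChunk(data, data.length, sums));
        data[600] ^= 0x01; // flip one bit inside the second chunk (index 1)
        System.out.println("corrupted: " + firstCorruptChunk(data, data.length, sums));
    }
}
```

The point of the sketch is that one call covers many chunks, so the per-call overhead is paid once per batch rather than once per 512 bytes.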

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-755) Read multiple checksum chunks at once in DFSInputStream

2010-01-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799008#action_12799008
 ] 

Hudson commented on HDFS-755:
-

Integrated in Hdfs-Patch-h2.grid.sp2.yahoo.net #94 (See 
[http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/94/])





[jira] Commented: (HDFS-755) Read multiple checksum chunks at once in DFSInputStream

2010-01-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798356#action_12798356
 ] 

Hudson commented on HDFS-755:
-

Integrated in Hadoop-Hdfs-trunk #195 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk/195/])





[jira] Commented: (HDFS-755) Read multiple checksum chunks at once in DFSInputStream

2010-01-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797844#action_12797844
 ] 

Hudson commented on HDFS-755:
-

Integrated in Hadoop-Hdfs-trunk-Commit #161 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/161/])
HDFS-755. Read multiple checksum chunks at once in DFSInputStream. Contributed by 
Todd Lipcon.





[jira] Commented: (HDFS-755) Read multiple checksum chunks at once in DFSInputStream

2010-01-06 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797369#action_12797369
 ] 

Todd Lipcon commented on HDFS-755:
--

Ran this patch through the full test suite on a local build machine, and the 
only failures were completely unrelated (and different than the ones that 
failed above). I believe this is ready to commit, since it was already +1ed 
above.




[jira] Commented: (HDFS-755) Read multiple checksum chunks at once in DFSInputStream

2010-01-05 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796952#action_12796952
 ] 

Todd Lipcon commented on HDFS-755:
--

I think these failures are spurious - the same test passes locally:
{noformat}
[junit] Running org.apache.hadoop.hdfs.TestDataTransferProtocol
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 5.678 sec
{noformat}




[jira] Commented: (HDFS-755) Read multiple checksum chunks at once in DFSInputStream

2010-01-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796947#action_12796947
 ] 

Hadoop QA commented on HDFS-755:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12429481/hdfs-755.txt
  against trunk revision 895877.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/170/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/170/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/170/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/170/console

This message is automatically generated.




[jira] Commented: (HDFS-755) Read multiple checksum chunks at once in DFSInputStream

2010-01-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796895#action_12796895
 ] 

Hadoop QA commented on HDFS-755:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12427406/alldata-hdfs.tsv
  against trunk revision 895877.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/169/console

This message is automatically generated.




[jira] Commented: (HDFS-755) Read multiple checksum chunks at once in DFSInputStream

2009-12-28 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794943#action_12794943
 ] 

Raghu Angadi commented on HDFS-755:
---

There is always a buffer of 512 bytes (checksum chunk size). So the worst case 
is 512 byte reads. If 512 is not large enough, we can decide on some size like 
4k. This way large readers benefit from reduced copy and small readers pay a 
small penalty (1 syscall per 4k).

The misalignment can occur even after the first packet. Another option is to 
have two buffers that are read alternately for CRC and data (each time 
checking whether the other buffer has available data).

>  So, I don't think we should do optimizations that would destroy performance 
> of this scenario.

True. At the same time, this is an optimization JIRA.

I didn't get around to reproducing the CPU improvement. I ran the commands you 
gave (in email). Will try again today.

I have already given a +1 for the patch. We should just note that it needs more 
work to actually make use of HADOOP-3205.




[jira] Commented: (HDFS-755) Read multiple checksum chunks at once in DFSInputStream

2009-12-23 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794055#action_12794055
 ] 

Todd Lipcon commented on HDFS-755:
--

bq. User code should use buffering for application specific reasons. May be 
'bufferSize' argument for FSInputStream is flawed to start with.

Personally, I agree, but I think it's out of scope for this JIRA to fix that.

bq. My impression is that main purpose of this patch is to reduce a copy. 
keeping the large buffer prohibits that.

That's true, but I think we need to thoroughly benchmark SequenceFile.Reader 
there, and do it in a separate JIRA. This one as it stands is not a breaking 
change, in that it should not reduce performance for any workload. Having a 
small internal buffer can potentially be breaking, so we should benchmark how 
big that break could be and weigh it vs the improvements.

Aside from making a smaller internal buffer, there are a couple of other options 
that might be less "dangerous", e.g. using a small buffer for the initial reads, 
then creating a _new_ BufferedInputStream with a fresh buffer to start the data 
reads. This would get rid of the "misalignment" issue here. ChecksumFileSystem 
has this same problem, so another option is introducing our own 
BufferedInputStream implementation that has some tricks to re-align its reads 
against the buffer.
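
A rough sketch of the "fresh buffer" idea, using only java.io. Everything here is an assumption for illustration: RecordingStream is a hypothetical stand-in for the socket stream, the 2-byte header and 4096-byte buffer are arbitrary, and the header is read unbuffered (a simplification of "a small buffer for the initial reads") so that no data bytes are left stranded in a discarded buffer.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Illustration only; not BlockReader code.
final class FreshBufferSketch {
    /** Records the length requested by every array read that reaches the raw stream. */
    static final class RecordingStream extends FilterInputStream {
        final List<Integer> requests = new ArrayList<>();

        RecordingStream(InputStream in) {
            super(in);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            requests.add(len);
            return super.read(b, off, len);
        }
    }

    /** Reads a tiny header unbuffered, then starts data reads on a fresh buffer. */
    static List<Integer> underlyingRequests() throws IOException {
        RecordingStream raw =
            new RecordingStream(new ByteArrayInputStream(new byte[64 * 1024]));

        byte[] header = new byte[2];
        raw.read(header, 0, header.length); // exact-size read, nothing over-read

        // The fresh buffer is empty, so a read at least as large as the buffer
        // can be passed straight through to the raw stream without a copy.
        InputStream in = new BufferedInputStream(raw, 4096);
        byte[] data = new byte[8192];
        in.read(data, 0, data.length);
        return raw.requests;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("underlying read sizes: " + underlyingRequests());
    }
}
```

In this sketch the large data read reaches the raw stream as a single request of the caller's size, because the new BufferedInputStream starts out empty and aligned with the data.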

bq. Even when a sequencefile has very small records (avg < 1k?)

I've seen SequenceFiles used for even smaller records - down to a few bytes (e.g. 
IntWritable keys and values). Syscalls are cheap but not *that* cheap compared 
to an 8-byte copy. So, I don't think we should do optimizations that would 
destroy performance of this scenario.

bq. ...but not been able to see improvement. will verify if I am really running 
the patch. 

Did you run this patch with a core jar that was compiled with HADOOP-3205? To 
test, you need to do "ant -Dresolvers=internal mvn-install" from Common, with 
HADOOP-3205 applied. Then, in the HDFS tree, "ant -Dresolvers=internal 
clean-cache binary" to make sure it pulls your local common build.




[jira] Commented: (HDFS-755) Read multiple checksum chunks at once in DFSInputStream

2009-12-21 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793113#action_12793113
 ] 

Raghu Angadi commented on HDFS-755:
---

User code should use buffering for application-specific reasons. Maybe the 
'bufferSize' argument for FSInputStream is flawed to start with.

My impression is that the main purpose of this patch is to reduce a copy. 
Keeping the large buffer prohibits that.

Even when a SequenceFile has very small records (avg < 1k?), I think there 
might not be a net negative effect on fairly small reads, since system calls 
are fairly cheap.

Do you see FSInputChecker or DFSClient evolving to dynamically decide whether a 
buffer should be used in the near future?

+1 for the patch itself.

By the way, I ran 'time bin/hadoop fs -cat 1gbfile > /dev/null', with NN, DN, 
and the client on the same machine, but have not been able to see an 
improvement. Will verify whether I am really running the patch. 





[jira] Commented: (HDFS-755) Read multiple checksum chunks at once in DFSInputStream

2009-12-08 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787862#action_12787862
 ] 

Todd Lipcon commented on HDFS-755:
--

bq. What does this translate to for user cpu improvement (say with 32 byte 
buffer in DFSClient)? 

Here's a table of median times for 1G cat. The "before" row is 
CHUNKS_PER_READ=1 (the pre-HADOOP-3205 behavior) and the internal buffer size 
65536 (typical value). The "after" row is CHUNKS_PER_READ=32 and internal 
buffer 64 bytes.

|| ||User||Sys||Wall||
||Before|5.310|1.260|6.010|
||After|4.935|1.235|5.350|
||Improvement|7.06%|1.98%|10.98%|

The sys difference isn't significant according to a t-test. The user/wall are 
definitely significant (p < 2.2e-16). Changing around the internal buffer 
between 32, 50, 64 bytes didn't make any significant differences to any of the 
measurements.
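
For reference, the "Improvement" row can be recomputed from the median times as (before - after) / before. A small sketch (the class and method names are made up for this note):

```java
import java.util.Locale;

// Recomputes the "Improvement" row of the table from its Before/After rows.
final class ImprovementSketch {
    /** Relative improvement in percent: (before - after) / before * 100. */
    static double improvementPercent(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        String[] names = {"User", "Sys", "Wall"};
        double[] before = {5.310, 1.260, 6.010};
        double[] after = {4.935, 1.235, 5.350};
        for (int i = 0; i < names.length; i++) {
            System.out.println(String.format(Locale.ROOT, "%s: %.2f%%",
                names[i], improvementPercent(before[i], after[i])));
        }
    }
}
```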

bq. The bufferSize passed to FSInputChecker is essentially a hint

My question is whether people are actually treating it like that in practice. 
For example, SequenceFile.Reader doesn't create its own BufferedInputStream to 
wrap fs.open. It just passes the user-specified buffer size through. If our own 
code isn't wrapping these things with a buffer, should we expect that user code 
is?
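
To illustrate what wrapping would buy for small records, here is a self-contained sketch using plain java.io rather than fs.open. The assumptions are loud ones: CountingStream is a hypothetical stand-in for the underlying stream, each array read reaching it is treated as a proxy for one syscall, and the 8-byte record and 4096-byte buffer sizes are arbitrary.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Illustration only; not SequenceFile.Reader or DFSClient code.
final class SmallRecordReadSketch {
    /** Counts array reads that reach the wrapped stream (a stand-in for syscalls). */
    static final class CountingStream extends FilterInputStream {
        int reads;

        CountingStream(InputStream in) {
            super(in);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            reads++;
            return super.read(b, off, len);
        }
    }

    /**
     * Reads totalBytes in recordSize pieces until EOF and returns how many reads
     * hit the underlying stream. A bufferSize of 0 means "no wrapping buffer".
     */
    static int underlyingReads(int totalBytes, int recordSize, int bufferSize)
            throws IOException {
        CountingStream raw =
            new CountingStream(new ByteArrayInputStream(new byte[totalBytes]));
        InputStream in = bufferSize > 0 ? new BufferedInputStream(raw, bufferSize) : raw;
        byte[] record = new byte[recordSize];
        while (in.read(record, 0, recordSize) != -1) {
            // consume records until EOF; the final EOF probe is counted too
        }
        return raw.reads;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("unbuffered: " + underlyingReads(64 * 1024, 8, 0));
        System.out.println("buffered:   " + underlyingReads(64 * 1024, 8, 4096));
    }
}
```

With 8-byte records the unbuffered reader hits the underlying stream once per record, while the buffered one hits it roughly once per 4096 bytes, which is the gap a caller gives up if nobody in the stack buffers.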




[jira] Commented: (HDFS-755) Read multiple checksum chunks at once in DFSInputStream

2009-12-08 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787756#action_12787756
 ] 

Raghu Angadi commented on HDFS-755:
---

Thanks for the benchmarks. What does this translate to for user cpu improvement 
(say with 32 byte buffer in DFSClient)?

I think the internal buffer should be small for this patch. It does not 
matter whether a user always wraps with another buffer or not... they 
essentially get performance in line with their read size. The bufferSize passed 
to FSInputChecker is essentially a hint.

I need to look more into the limit on CHUNKS_PER_READ. I don't see much reason 
to limit it in FSInputChecker (within limits): if a user invokes a read with a 
large buffer, the underlying FS (DFSClient in this case) should have access to 
that buffer...





[jira] Commented: (HDFS-755) Read multiple checksum chunks at once in DFSInputStream

2009-12-07 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787224#action_12787224
 ] 

Todd Lipcon commented on HDFS-755:
--

I checked Raghu's idea and it's correct - since there's a buffer in BlockReader 
and it does some small reads at the beginning of the block (to get status code, 
etc) it "misaligns" the reads. I'm benchmarking various values for this buffer 
now with benchmarks similar to the ones I ran in HADOOP-3205. (That JIRA should 
still be fine to commit.)
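
The misalignment effect can be reproduced in miniature with java.io.BufferedInputStream alone. This is a sketch under stated assumptions, not BlockReader itself: RecordingStream is a hypothetical stand-in for the socket stream, and the 2-byte header, 4096-byte buffer, and 8192-byte read are arbitrary sizes chosen for the demonstration.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Illustration only; not BlockReader code.
final class MisalignedReadSketch {
    /** Records the length requested by every array read that reaches the raw stream. */
    static final class RecordingStream extends FilterInputStream {
        final List<Integer> requests = new ArrayList<>();

        RecordingStream(InputStream in) {
            super(in);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            requests.add(len);
            return super.read(b, off, len);
        }
    }

    /** Returns the read sizes seen by the raw stream for one large data read. */
    static List<Integer> underlyingRequests(boolean readHeaderFirst) throws IOException {
        RecordingStream raw =
            new RecordingStream(new ByteArrayInputStream(new byte[64 * 1024]));
        InputStream in = new BufferedInputStream(raw, 4096);
        if (readHeaderFirst) {
            // A small read forces the buffer to fill, leaving it partly consumed.
            byte[] header = new byte[2];
            in.read(header, 0, header.length);
        }
        byte[] data = new byte[8192];
        in.read(data, 0, data.length);
        return raw.requests;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("aligned:    " + underlyingRequests(false));
        System.out.println("misaligned: " + underlyingRequests(true));
    }
}
```

Without the header read, the large read is handed to the raw stream in one piece. With it, the buffer is filled first, so part of the large read is served by copying out of the internal buffer and only the remainder goes to the raw stream directly.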




[jira] Commented: (HDFS-755) Read multiple checksum chunks at once in DFSInputStream

2009-12-06 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786788#action_12786788
 ] 

Raghu Angadi commented on HDFS-755:
---

The patch looks good except that BlockReader.in seems to be buffered with a 
large buffer. Most of the data might still be going through an extra copy.

I haven't run the patch yet... 




[jira] Commented: (HDFS-755) Read multiple checksum chunks at once in DFSInputStream

2009-12-03 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785704#action_12785704
 ] 

Eli Collins commented on HDFS-755:
--

Patch looks great. And nice catch with TestFSInputChecker.  +1  




[jira] Commented: (HDFS-755) Read multiple checksum chunks at once in DFSInputStream

2009-12-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785699#action_12785699
 ] 

Hadoop QA commented on HDFS-755:


+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12426824/hdfs-755.txt
  against trunk revision 886322.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/131/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/131/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/131/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/131/console

This message is automatically generated.




[jira] Commented: (HDFS-755) Read multiple checksum chunks at once in DFSInputStream

2009-11-13 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777815#action_12777815
 ] 

Eli Collins commented on HDFS-755:
--

Hey Todd -- patch looks great.  Did you test w/o checksums enabled? 






[jira] Commented: (HDFS-755) Read multiple checksum chunks at once in DFSInputStream

2009-11-12 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777227#action_12777227
 ] 

Todd Lipcon commented on HDFS-755:
--

No additional tests included since this is an optimization to the existing read 
path, covered by all the existing HDFS tests that read files.

*Note:* after HADOOP-3205 has been committed we should re-submit this patch to 
Hudson before committing this one to double-check the tests.




[jira] Commented: (HDFS-755) Read multiple checksum chunks at once in DFSInputStream

2009-11-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777218#action_12777218
 ] 

Hadoop QA commented on HDFS-755:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12424749/hdfs-755.txt
  against trunk revision 835179.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/106/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/106/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/106/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/106/console

This message is automatically generated.

> Read multiple checksum chunks at once in DFSInputStream
> ---
>
> Key: HDFS-755
> URL: https://issues.apache.org/jira/browse/HDFS-755
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs client
> Affects Versions: 0.22.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Attachments: hdfs-755.txt, hdfs-755.txt, hdfs-755.txt, hdfs-755.txt
>
>
> HADOOP-3205 adds the ability for FSInputChecker subclasses to read multiple 
> checksum chunks in a single call to readChunk. This is the HDFS-side use of 
> that new feature.




[jira] Commented: (HDFS-755) Read multiple checksum chunks at once in DFSInputStream

2009-11-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776826#action_12776826
 ] 

Hadoop QA commented on HDFS-755:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12424688/hdfs-755.txt
  against trunk revision 835179.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/105/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/105/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/105/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/105/console

This message is automatically generated.

> Read multiple checksum chunks at once in DFSInputStream
> ---
>
> Key: HDFS-755
> URL: https://issues.apache.org/jira/browse/HDFS-755
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs client
> Affects Versions: 0.22.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Attachments: hdfs-755.txt, hdfs-755.txt, hdfs-755.txt
>
>
> HADOOP-3205 adds the ability for FSInputChecker subclasses to read multiple 
> checksum chunks in a single call to readChunk. This is the HDFS-side use of 
> that new feature.




[jira] Commented: (HDFS-755) Read multiple checksum chunks at once in DFSInputStream

2009-11-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776730#action_12776730
 ] 

Hadoop QA commented on HDFS-755:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12424669/hdfs-755.txt
  against trunk revision 835110.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The patch appears to cause tar ant target to fail.

-1 findbugs.  The patch appears to cause Findbugs to fail.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/70/testReport/
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/70/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/70/console

This message is automatically generated.

> Read multiple checksum chunks at once in DFSInputStream
> ---
>
> Key: HDFS-755
> URL: https://issues.apache.org/jira/browse/HDFS-755
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs client
> Affects Versions: 0.22.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Attachments: hdfs-755.txt, hdfs-755.txt
>
>
> HADOOP-3205 adds the ability for FSInputChecker subclasses to read multiple 
> checksum chunks in a single call to readChunk. This is the HDFS-side use of 
> that new feature.
