[jira] Commented: (MAPREDUCE-1823) Reduce the number of calls of HarFileSystem.getFileStatus in RaidNode

2010-06-24 Thread Hadoop QA (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882148#action_12882148 ]

Hadoop QA commented on MAPREDUCE-1823:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12447914/MAPREDUCE-1823.txt
  against trunk revision 957437.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/266/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/266/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/266/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/266/console

This message is automatically generated.

> Reduce the number of calls of HarFileSystem.getFileStatus in RaidNode
>
> Key: MAPREDUCE-1823
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1823
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 0.22.0
> Reporter: Scott Chen
> Assignee: Scott Chen
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1823.txt
>
> RaidNode makes many calls to HarFileSystem.getFileStatus. This method
> fetches information from the DataNodes, so it is slow, and it has become the
> bottleneck of the RaidNode. It would be nice to make this more efficient.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1823) Reduce the number of calls of HarFileSystem.getFileStatus in RaidNode

2010-06-23 Thread Scott Chen (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881989#action_12881989 ]

Scott Chen commented on MAPREDUCE-1823:
---

In the patch, instead of calling getFileStatus() while recursing through the
policy, we call listStatus() and put the results in a map. This reduces the
number of RPCs to the NameNode.
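The idea of replacing per-file getFileStatus() calls with one listStatus() call whose results are kept in a map can be sketched as below. This is a minimal, self-contained model, not the patch itself: FakeStore, Status and CachingLookup are hypothetical stand-ins for Hadoop's FileSystem and FileStatus, and the `calls` counter stands in for RPCs. It also assumes, for illustration, that listStatus() is issued on the parent directory of each file being looked up, so one listing answers every lookup in that directory.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch: answer per-file status lookups from a per-directory cache that is
 * filled by a single listStatus() call. All classes here are hypothetical
 * stand-ins, not Hadoop APIs.
 */
public class ListStatusCacheSketch {

  /** Hypothetical file metadata record (stand-in for FileStatus). */
  static final class Status {
    final String path;
    final long length;
    Status(String path, long length) { this.path = path; this.length = length; }
  }

  /** Hypothetical metadata store; every method call counts as one "RPC". */
  static final class FakeStore {
    private final Map<String, List<Status>> dirs = new HashMap<>();
    int calls = 0;

    void add(String dir, String name, long length) {
      dirs.computeIfAbsent(dir, d -> new ArrayList<>()).add(new Status(dir + "/" + name, length));
    }

    /** One call per file (stand-in for getFileStatus). */
    Status getStatus(String path) {
      calls++;
      String dir = path.substring(0, path.lastIndexOf('/'));
      for (Status s : dirs.getOrDefault(dir, List.of())) {
        if (s.path.equals(path)) return s;
      }
      return null;
    }

    /** One call returns a whole directory (stand-in for listStatus). */
    List<Status> listStatus(String dir) {
      calls++;
      return dirs.getOrDefault(dir, List.of());
    }
  }

  /** Lists each parent directory once, then serves lookups from a map. */
  static final class CachingLookup {
    private final FakeStore store;
    private final Map<String, Map<String, Status>> cache = new HashMap<>();

    CachingLookup(FakeStore store) { this.store = store; }

    Status getStatus(String path) {
      String dir = path.substring(0, path.lastIndexOf('/'));
      Map<String, Status> entries = cache.get(dir);
      if (entries == null) {
        // First lookup in this directory: one listing fills the cache.
        entries = new HashMap<>();
        for (Status s : store.listStatus(dir)) entries.put(s.path, s);
        cache.put(dir, entries);
      }
      return entries.get(path);
    }
  }

  public static void main(String[] args) {
    FakeStore store = new FakeStore();
    for (int i = 0; i < 100; i++) store.add("/raid/dir", "part-" + i, i);

    for (int i = 0; i < 100; i++) store.getStatus("/raid/dir/part-" + i);
    System.out.println("per-file calls: " + store.calls);

    store.calls = 0;
    CachingLookup lookup = new CachingLookup(store);
    for (int i = 0; i < 100; i++) lookup.getStatus("/raid/dir/part-" + i);
    System.out.println("cached calls:   " + store.calls);
  }
}
```

Running main prints 100 calls for the per-file loop and 1 call for the cached loop. The trade-off is memory and staleness: the map holds a whole directory listing and does not see files created after it was filled.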

There is no new unit test. This is an optimization, and the code path is already
covered by the existing tests: TestRaidNode, TestRaidPurge, and TestRaidHar.




[jira] Commented: (MAPREDUCE-1823) Reduce the number of calls of HarFileSystem.getFileStatus in RaidNode

2010-05-28 Thread Scott Chen (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873109#action_12873109 ]

Scott Chen commented on MAPREDUCE-1823:
---

Here's the corresponding jstack:
{code}
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
- locked <0x2aaab7e19810> (a sun.nio.ch.Util$1)
- locked <0x2aaab7e197f8> (a java.util.Collections$UnmodifiableSet)
- locked <0x2aaab7e19468> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:332)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
- locked <0x2aaae427a320> (a java.io.BufferedInputStream)
at java.io.DataInputStream.readShort(DataInputStream.java:295)
at org.apache.hadoop.hdfs.DFSClient$BlockReader.newBlockReader(DFSClient.java:1436)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1698)
- locked <0x2aaae4264f38> (a org.apache.hadoop.hdfs.DFSClient$DFSInputStream)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1815)
- locked <0x2aaae4264f38> (a org.apache.hadoop.hdfs.DFSClient$DFSInputStream)
at java.io.DataInputStream.read(DataInputStream.java:83)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:134)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:187)
at org.apache.hadoop.fs.HarFileSystem.fileStatusInIndex(HarFileSystem.java:441)
at org.apache.hadoop.fs.HarFileSystem.getFileStatus(HarFileSystem.java:616)
at org.apache.hadoop.raid.RaidNode.getParityFile(RaidNode.java:541)
at org.apache.hadoop.raid.RaidNode.getParityFile(RaidNode.java:561)
at org.apache.hadoop.raid.RaidNode.recurse(RaidNode.java:639)
at org.apache.hadoop.raid.RaidNode.recurse(RaidNode.java:655)
at org.apache.hadoop.raid.RaidNode.recurse(RaidNode.java:655)
at org.apache.hadoop.raid.RaidNode.selectFiles(RaidNode.java:594)
at org.apache.hadoop.raid.RaidNode.access$300(RaidNode.java:63)
at org.apache.hadoop.raid.RaidNode$TriggerMonitor.doProcess(RaidNode.java:374)
at org.apache.hadoop.raid.RaidNode$TriggerMonitor.run(RaidNode.java:313)
at java.lang.Thread.run(Thread.java:619)
{code}
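The trace shows RaidNode.getParityFile() calling HarFileSystem.getFileStatus(), which ends in fileStatusInIndex() reading the archive index line by line through LineReader over a DFSInputStream, i.e. a DataNode read on every lookup. The sketch below models only that cost pattern. ScannedIndex, the one-line-per-entry "path length" format, and the full rescan on every lookup are simplifying assumptions for illustration, not a description of HarFileSystem's actual index layout or lookup code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Cost model: scanning an index on every lookup vs. parsing it once into a map.
 * ScannedIndex and its "path length" line format are hypothetical.
 */
public class IndexScanSketch {

  static final class ScannedIndex {
    private final List<String> lines;  // stand-in for lines read via a line reader
    long linesRead = 0;                // each line read models a remote (DataNode) read

    ScannedIndex(List<String> lines) { this.lines = lines; }

    /** Per-call lookup: rescans from the top until the path is found. */
    Long lookupByScan(String path) {
      for (String line : lines) {
        linesRead++;
        String[] parts = line.split(" ");
        if (parts[0].equals(path)) return Long.parseLong(parts[1]);
      }
      return null;
    }

    /** One pass over the whole index; later lookups touch only the map. */
    Map<String, Long> loadAll() {
      Map<String, Long> byPath = new HashMap<>();
      for (String line : lines) {
        linesRead++;
        String[] parts = line.split(" ");
        byPath.put(parts[0], Long.parseLong(parts[1]));
      }
      return byPath;
    }
  }

  public static void main(String[] args) {
    List<String> lines = new ArrayList<>();
    for (int i = 0; i < 1000; i++) lines.add("/har/part-" + i + " " + i);

    ScannedIndex index = new ScannedIndex(lines);
    for (int i = 0; i < 1000; i++) index.lookupByScan("/har/part-" + i);
    System.out.println("lines read, scan per lookup: " + index.linesRead);

    index.linesRead = 0;
    Map<String, Long> byPath = index.loadAll();
    for (int i = 0; i < 1000; i++) byPath.get("/har/part-" + i);
    System.out.println("lines read, load once:       " + index.linesRead);
  }
}
```

With 1000 entries this prints 500500 lines read when every lookup rescans the index, versus 1000 when the index is read once, which is why repeated per-file lookups against an archive can dominate a recursive scan like RaidNode.recurse().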
