[jira] Commented: (MAPREDUCE-1823) Reduce the number of calls of HarFileSystem.getFileStatus in RaidNode
[ https://issues.apache.org/jira/browse/MAPREDUCE-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882148#action_12882148 ]

Hadoop QA commented on MAPREDUCE-1823:
--------------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12447914/MAPREDUCE-1823.txt
  against trunk revision 957437.

    +1 @author. The patch does not contain any @author tags.

    -1 tests included. The patch doesn't appear to include any new or modified tests.
                       Please justify why no new tests are needed for this patch.
                       Also please list what manual steps were performed to verify this patch.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs. The patch does not introduce any new Findbugs warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    -1 core tests. The patch failed core unit tests.

    -1 contrib tests. The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/266/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/266/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/266/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/266/console

This message is automatically generated.
> Reduce the number of calls of HarFileSystem.getFileStatus in RaidNode
> ---------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1823
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1823
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.22.0
>            Reporter: Scott Chen
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1823.txt
>
>
> RaidNode makes lots of calls of HarFileSystem.getFileStatus. This method
> fetches information from DataNode so it is slow. It becomes the bottleneck of
> the RaidNode. It will be nice if we can make this more efficient.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1823) Reduce the number of calls of HarFileSystem.getFileStatus in RaidNode
[ https://issues.apache.org/jira/browse/MAPREDUCE-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881989#action_12881989 ]

Scott Chen commented on MAPREDUCE-1823:
---------------------------------------

In the patch, where the code used to call getFileStatus() while recursing over a policy, we now call listStatus() instead and put the results in a map. This reduces the number of RPCs to the NN.

The patch adds no unit test. It is an optimization, and the code path is covered by the existing tests: TestRaidNode, TestRaidPurge and TestRaidHar.
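The idea in the comment above (call listStatus() once and keep the results in a map, rather than calling getFileStatus() repeatedly) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in MAPREDUCE-1823.txt: CountingFs, Status and StatusCache are hypothetical stand-ins, not Hadoop's FileSystem, FileStatus or RaidNode classes, and the sketch assumes absolute paths and that lookups are grouped by parent directory.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch: replace per-file status lookups with one directory listing
 * whose results are cached in a map. Every type here is a hypothetical
 * stand-in and is NOT part of Hadoop's API.
 */
class StatusCacheSketch {

    /** Hypothetical stand-in for a file status record. */
    static final class Status {
        final String path;
        final long length;
        Status(String path, long length) { this.path = path; this.length = length; }
    }

    /** Hypothetical stand-in for a remote file system that counts its calls. */
    static final class CountingFs {
        private final Map<String, List<Status>> dirs = new HashMap<>();
        int getFileStatusCalls = 0;
        int listStatusCalls = 0;

        void addFile(String dir, String name, long length) {
            dirs.computeIfAbsent(dir, d -> new ArrayList<>())
                .add(new Status(dir + "/" + name, length));
        }

        /** Models one remote call per file; returns null for an unknown path. */
        Status getFileStatus(String path) {
            getFileStatusCalls++;
            String dir = path.substring(0, path.lastIndexOf('/'));
            for (Status s : dirs.getOrDefault(dir, new ArrayList<>())) {
                if (s.path.equals(path)) return s;
            }
            return null;
        }

        /** Models one remote call per directory. */
        List<Status> listStatus(String dir) {
            listStatusCalls++;
            return new ArrayList<>(dirs.getOrDefault(dir, new ArrayList<>()));
        }
    }

    /** Caches directory listings so each directory is listed at most once. */
    static final class StatusCache {
        private final CountingFs fs;
        private final Map<String, Map<String, Status>> byDir = new HashMap<>();

        StatusCache(CountingFs fs) { this.fs = fs; }

        /** Looks up a path through the cached listing of its parent directory. */
        Status lookup(String path) {
            String dir = path.substring(0, path.lastIndexOf('/'));
            Map<String, Status> listing = byDir.get(dir);
            if (listing == null) {
                // First lookup in this directory: list it once and index by path.
                listing = new HashMap<>();
                for (Status s : fs.listStatus(dir)) listing.put(s.path, s);
                byDir.put(dir, listing);
            }
            return listing.get(path);
        }
    }

    public static void main(String[] args) {
        CountingFs fs = new CountingFs();
        for (int i = 0; i < 100; i++) fs.addFile("/raid/dir", "part-" + i, 1024L);

        StatusCache cache = new StatusCache(fs);
        for (int i = 0; i < 100; i++) cache.lookup("/raid/dir/part-" + i);

        System.out.println("listStatus calls: " + fs.listStatusCalls);
        System.out.println("getFileStatus calls: " + fs.getFileStatusCalls);
    }
}
```

In this toy setup, 100 lookups in one directory cost a single listStatus() call and no getFileStatus() calls. The trade-off is that the map can go stale if the directory changes while the cache is alive, so a sketch like this only fits a scan that builds and discards the map within one pass.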
[jira] Commented: (MAPREDUCE-1823) Reduce the number of calls of HarFileSystem.getFileStatus in RaidNode
[ https://issues.apache.org/jira/browse/MAPREDUCE-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873109#action_12873109 ]

Scott Chen commented on MAPREDUCE-1823:
---------------------------------------

Here's the corresponding jstack:
{code}
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    - locked <0x2aaab7e19810> (a sun.nio.ch.Util$1)
    - locked <0x2aaab7e197f8> (a java.util.Collections$UnmodifiableSet)
    - locked <0x2aaab7e19468> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:332)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    - locked <0x2aaae427a320> (a java.io.BufferedInputStream)
    at java.io.DataInputStream.readShort(DataInputStream.java:295)
    at org.apache.hadoop.hdfs.DFSClient$BlockReader.newBlockReader(DFSClient.java:1436)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1698)
    - locked <0x2aaae4264f38> (a org.apache.hadoop.hdfs.DFSClient$DFSInputStream)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1815)
    - locked <0x2aaae4264f38> (a org.apache.hadoop.hdfs.DFSClient$DFSInputStream)
    at java.io.DataInputStream.read(DataInputStream.java:83)
    at org.apache.hadoop.util.LineReader.readLine(LineReader.java:134)
    at org.apache.hadoop.util.LineReader.readLine(LineReader.java:187)
    at org.apache.hadoop.fs.HarFileSystem.fileStatusInIndex(HarFileSystem.java:441)
    at org.apache.hadoop.fs.HarFileSystem.getFileStatus(HarFileSystem.java:616)
    at org.apache.hadoop.raid.RaidNode.getParityFile(RaidNode.java:541)
    at org.apache.hadoop.raid.RaidNode.getParityFile(RaidNode.java:561)
    at org.apache.hadoop.raid.RaidNode.recurse(RaidNode.java:639)
    at org.apache.hadoop.raid.RaidNode.recurse(RaidNode.java:655)
    at org.apache.hadoop.raid.RaidNode.recurse(RaidNode.java:655)
    at org.apache.hadoop.raid.RaidNode.selectFiles(RaidNode.java:594)
    at org.apache.hadoop.raid.RaidNode.access$300(RaidNode.java:63)
    at org.apache.hadoop.raid.RaidNode$TriggerMonitor.doProcess(RaidNode.java:374)
    at org.apache.hadoop.raid.RaidNode$TriggerMonitor.run(RaidNode.java:313)
    at java.lang.Thread.run(Thread.java:619)
{code}