[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233132#comment-14233132 ]

Hudson commented on HDFS-6735:
------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #24 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/24/])
HDFS-6735. A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream (Lars Hofhansl via stack) (stack: rev 7caa3bc98e6880f98c5c32c486a0c539f9fd3f5f)
* hadoop-hdfs-project/hadoop-hdfs/dev-support/findbugsExcludeFile.xml
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/shortcircuit/ShortCircuitCache.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/shortcircuit/ShortCircuitReplica.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LocatedBlocks.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java

> A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
> -----------------------------------------------------------------------------------------
>
>                 Key: HDFS-6735
>                 URL: https://issues.apache.org/jira/browse/HDFS-6735
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 3.0.0
>            Reporter: Liang Xie
>            Assignee: Lars Hofhansl
>             Fix For: 2.7.0
>
>         Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, HDFS-6735-v5.txt, HDFS-6735-v6.txt, HDFS-6735-v7.txt, HDFS-6735-v8.txt, HDFS-6735.txt
>
>
> In the current DFSInputStream implementation, there are a couple of coarse-grained locks in the read/pread path, and they have become an HBase read latency pain point. In HDFS-6698, I made a minor patch against the first lock encountered, around getFileLength. Indeed, after reading the code and testing, it turns out there are still other locks we could improve.
> In this jira, I'll make a patch against the other locks, and a simple test case to show the issue and the improved result.
> This is important for HBase applications, since in the current HFile read path we issue all read()/pread() requests on the same DFSInputStream for one HFile. (A multi-stream solution is another story that I had planned to do, but it will probably take more time than I expected.)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
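The contention described in the issue can be illustrated with a small standalone sketch. This is not the committed patch and does not use any Hadoop classes; SplitLockStream, metaLock and simulatedIo are hypothetical names chosen for illustration. It assumes the general idea is that the stateful read() keeps the stream-wide lock, while the positional pread() only takes a narrow lock around the shared metadata it needs, so a slow read() no longer blocks a concurrent pread() on the same stream.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CountDownLatch;

/** Hypothetical stand-in for a stream with a stateful read() and a positional pread(). */
class SplitLockStream {
    private final byte[] data;                    // immutable backing bytes
    private long pos;                             // cursor, guarded by the stream monitor
    private final Object metaLock = new Object(); // guards only the metadata pread() needs
    private long fileLength;                      // guarded by metaLock

    SplitLockStream(byte[] data) {
        this.data = data;
        this.fileLength = data.length;
    }

    /** Stateful read: holds the stream monitor for the whole call, slow I/O included. */
    synchronized int read(byte[] buf, Runnable simulatedIo) {
        simulatedIo.run();                        // stands in for slow disk or network I/O
        long len;
        synchronized (metaLock) { len = fileLength; }
        if (pos >= len) return -1;
        int n = (int) Math.min(buf.length, len - pos);
        System.arraycopy(data, (int) pos, buf, 0, n);
        pos += n;
        return n;
    }

    /** Positional read: takes metaLock briefly and never the stream monitor. */
    int pread(long position, byte[] buf) {
        long len;
        synchronized (metaLock) { len = fileLength; }
        if (position >= len) return -1;
        int n = (int) Math.min(buf.length, len - position);
        System.arraycopy(data, (int) position, buf, 0, n);
        return n;
    }

    public static void main(String[] args) throws InterruptedException {
        SplitLockStream in =
            new SplitLockStream("0123456789".getBytes(StandardCharsets.US_ASCII));
        CountDownLatch entered = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // The reader thread parks inside read() while holding the stream monitor.
        Thread reader = new Thread(() -> in.read(new byte[4], () -> {
            entered.countDown();
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }));
        reader.setDaemon(true);
        reader.start();
        entered.await();

        // pread() completes even though read() has not returned yet.
        byte[] buf = new byte[3];
        int n = in.pread(5, buf);
        System.out.println(n + " bytes: " + new String(buf, 0, n, StandardCharsets.US_ASCII)
            + ", reader still blocked: " + reader.isAlive());

        release.countDown();
        reader.join();
    }
}
```

If pread() were declared synchronized like read(), the call in main would wait until the reader thread was released, which is the kind of blocking the issue reports. Any lock taken further down the call path would still be shared between the two kinds of read.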
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233116#comment-14233116 ]

Hudson commented on HDFS-6735:
------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1978 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1978/])
HDFS-6735. A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream (Lars Hofhansl via stack) (stack: rev 7caa3bc98e6880f98c5c32c486a0c539f9fd3f5f)
* hadoop-hdfs-project/hadoop-hdfs/dev-support/findbugsExcludeFile.xml
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/shortcircuit/ShortCircuitReplica.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LocatedBlocks.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/shortcircuit/ShortCircuitCache.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233061#comment-14233061 ]

Hudson commented on HDFS-6735:
------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #24 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/24/])
HDFS-6735. A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream (Lars Hofhansl via stack) (stack: rev 7caa3bc98e6880f98c5c32c486a0c539f9fd3f5f)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LocatedBlocks.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/shortcircuit/ShortCircuitReplica.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/shortcircuit/ShortCircuitCache.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
* hadoop-hdfs-project/hadoop-hdfs/dev-support/findbugsExcludeFile.xml
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233045#comment-14233045 ]

Hudson commented on HDFS-6735:
------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk #1955 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1955/])
HDFS-6735. A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream (Lars Hofhansl via stack) (stack: rev 7caa3bc98e6880f98c5c32c486a0c539f9fd3f5f)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
* hadoop-hdfs-project/hadoop-hdfs/dev-support/findbugsExcludeFile.xml
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/shortcircuit/ShortCircuitReplica.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/shortcircuit/ShortCircuitCache.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LocatedBlocks.java
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232917#comment-14232917 ]

Hudson commented on HDFS-6735:
------------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #24 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/24/])
HDFS-6735. A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream (Lars Hofhansl via stack) (stack: rev 7caa3bc98e6880f98c5c32c486a0c539f9fd3f5f)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/shortcircuit/ShortCircuitCache.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LocatedBlocks.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/shortcircuit/ShortCircuitReplica.java
* hadoop-hdfs-project/hadoop-hdfs/dev-support/findbugsExcludeFile.xml
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232903#comment-14232903 ]

Hudson commented on HDFS-6735:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #763 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/763/])
HDFS-6735. A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream (Lars Hofhansl via stack) (stack: rev 7caa3bc98e6880f98c5c32c486a0c539f9fd3f5f)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/shortcircuit/ShortCircuitReplica.java
* hadoop-hdfs-project/hadoop-hdfs/dev-support/findbugsExcludeFile.xml
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LocatedBlocks.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/shortcircuit/ShortCircuitCache.java
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232596#comment-14232596 ]

Hudson commented on HDFS-6735:
------------------------------

FAILURE: Integrated in Hadoop-trunk-Commit #6638 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6638/])
HDFS-6735. A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream (Lars Hofhansl via stack) (stack: rev 7caa3bc98e6880f98c5c32c486a0c539f9fd3f5f)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
* hadoop-hdfs-project/hadoop-hdfs/dev-support/findbugsExcludeFile.xml
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LocatedBlocks.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/shortcircuit/ShortCircuitReplica.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/shortcircuit/ShortCircuitCache.java
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230294#comment-14230294 ]

Colin Patrick McCabe commented on HDFS-6735:
--------------------------------------------

+1. Thanks, Lars.

> A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
> -----------------------------------------------------------------------------------------
>
>                 Key: HDFS-6735
>                 URL: https://issues.apache.org/jira/browse/HDFS-6735
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 3.0.0
>            Reporter: Liang Xie
>            Assignee: Lars Hofhansl
>         Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, HDFS-6735-v5.txt, HDFS-6735-v6.txt, HDFS-6735-v7.txt, HDFS-6735-v8.txt, HDFS-6735.txt
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14228890#comment-14228890 ]

stack commented on HDFS-6735:
-----------------------------

Patch looks great to me. Trying it here on a little cluster. I intend to commit on Monday unless there is an objection.

> A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
> -----------------------------------------------------------------------------------------
>
>                 Key: HDFS-6735
>                 URL: https://issues.apache.org/jira/browse/HDFS-6735
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 3.0.0
>            Reporter: Liang Xie
>            Assignee: Lars Hofhansl
>         Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, HDFS-6735-v5.txt, HDFS-6735-v6.txt, HDFS-6735-v7.txt, HDFS-6735-v8.txt, HDFS-6735.txt
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14228549#comment-14228549 ]

Hadoop QA commented on HDFS-6735:
---------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12684213/HDFS-6735-v8.txt
  against trunk revision 1556f86.

    +1 @author. The patch does not contain any @author tags.

    -1 tests included. The patch doesn't appear to include any new or modified tests.
                       Please justify why no new tests are needed for this patch.
                       Also please list what manual steps were performed to verify this patch.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 javadoc. There were no new javadoc warning messages.

    +1 eclipse:eclipse. The patch built with eclipse:eclipse.

    +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8869//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8869//console

This message is automatically generated.
> A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
> -----------------------------------------------------------------------------------------
>
>                 Key: HDFS-6735
>                 URL: https://issues.apache.org/jira/browse/HDFS-6735
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 3.0.0
>            Reporter: Liang Xie
>            Assignee: Lars Hofhansl
>         Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, HDFS-6735-v5.txt, HDFS-6735-v6.txt, HDFS-6735-v7.txt, HDFS-6735-v8.txt, HDFS-6735.txt
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14227953#comment-14227953 ]

Hadoop QA commented on HDFS-6735:
---------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12684096/HDFS-6735-v7.txt
  against trunk revision c1f2bb2.

    +1 @author. The patch does not contain any @author tags.

    -1 tests included. The patch doesn't appear to include any new or modified tests.
                       Please justify why no new tests are needed for this patch.
                       Also please list what manual steps were performed to verify this patch.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 javadoc. There were no new javadoc warning messages.

    +1 eclipse:eclipse. The patch built with eclipse:eclipse.

    +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8865//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8865//console

This message is automatically generated.
> A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
> -----------------------------------------------------------------------------------------
>
>                 Key: HDFS-6735
>                 URL: https://issues.apache.org/jira/browse/HDFS-6735
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 3.0.0
>            Reporter: Liang Xie
>            Assignee: Lars Hofhansl
>         Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, HDFS-6735-v5.txt, HDFS-6735-v6.txt, HDFS-6735-v7.txt, HDFS-6735.txt
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14227945#comment-14227945 ]

Hadoop QA commented on HDFS-6735:
---------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12684091/HDFS-6735-v6.txt
  against trunk revision c1f2bb2.

    +1 @author. The patch does not contain any @author tags.

    -1 tests included. The patch doesn't appear to include any new or modified tests.
                       Please justify why no new tests are needed for this patch.
                       Also please list what manual steps were performed to verify this patch.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 javadoc. There were no new javadoc warning messages.

    +1 eclipse:eclipse. The patch built with eclipse:eclipse.

    -1 findbugs. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8864//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/8864//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8864//console

This message is automatically generated.
> A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
> -----------------------------------------------------------------------------------------
>
>                 Key: HDFS-6735
>                 URL: https://issues.apache.org/jira/browse/HDFS-6735
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 3.0.0
>            Reporter: Liang Xie
>            Assignee: Lars Hofhansl
>         Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, HDFS-6735-v5.txt, HDFS-6735-v6.txt, HDFS-6735-v7.txt, HDFS-6735.txt
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14227174#comment-14227174 ]

Hadoop QA commented on HDFS-6735:
---------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12683956/HDFS-6735-v6.txt
  against trunk revision c1f2bb2.

    +1 @author. The patch does not contain any @author tags.

    -1 tests included. The patch doesn't appear to include any new or modified tests.
                       Please justify why no new tests are needed for this patch.
                       Also please list what manual steps were performed to verify this patch.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 javadoc. There were no new javadoc warning messages.

    +1 eclipse:eclipse. The patch built with eclipse:eclipse.

    -1 findbugs. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    -1 core tests. The test build failed in hadoop-hdfs-project/hadoop-hdfs

    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8860//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/8860//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8860//console

This message is automatically generated.
> A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
> -----------------------------------------------------------------------------------------
>
>                 Key: HDFS-6735
>                 URL: https://issues.apache.org/jira/browse/HDFS-6735
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 3.0.0
>            Reporter: Liang Xie
>            Assignee: Lars Hofhansl
>         Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, HDFS-6735-v5.txt, HDFS-6735-v6.txt, HDFS-6735-v6.txt, HDFS-6735.txt
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14227155#comment-14227155 ]

Lars Hofhansl commented on HDFS-6735:

So to be specific, the improvement I see above is still there. Just that the next thing to tackle is the ShortCircuitCache.
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14227148#comment-14227148 ]

Lars Hofhansl commented on HDFS-6735:

Tested -v6 with HBase. Still good from the DFSInputStream angle. I do see now that much more time is spent in ShortCircuitCache.fetchOrCreate and unref. (I rechecked; that is true of -v3 as well.) It's still better, but the can is kicked down the road a bit.
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14227031#comment-14227031 ]

Lars Hofhansl commented on HDFS-6735:

Per my comment above, my preference would still be to just make the cachingStrategy reference volatile in DFSInputStream. CachingStrategy is immutable, and hence the volatile reference would make access safe in all cases without any locking - the same is true for fileEncryptionInfo, btw (immutable already, just needs a volatile reference, no locking needed at all). I'll make a new patch.
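The volatile-reference idea in the comment above can be illustrated with a small, self-contained sketch. ReadPolicy and PolicyHolder are hypothetical stand-ins invented for this example; they are not the real CachingStrategy or DFSInputStream. Serializing the writers is an extra assumption of the sketch, needed only because the update here is a read-modify-write of the reference.

```java
// Hypothetical immutable settings object, standing in for something like CachingStrategy.
final class ReadPolicy {
  private final boolean dropBehind;
  private final long readahead;

  ReadPolicy(boolean dropBehind, long readahead) {
    this.dropBehind = dropBehind;
    this.readahead = readahead;
  }

  boolean isDropBehind() { return dropBehind; }
  long getReadahead() { return readahead; }

  // A "change" produces a new instance; existing instances never change.
  ReadPolicy withReadahead(long newReadahead) {
    return new ReadPolicy(dropBehind, newReadahead);
  }
}

// Hypothetical holder, standing in for the stream that owns the policy.
class PolicyHolder {
  // volatile: a reference written by one thread is visible to later reads in other threads.
  private volatile ReadPolicy policy = new ReadPolicy(false, 0L);

  // Writers are serialized (assumption of this sketch) because the update
  // reads the old reference and then writes a new one.
  synchronized void setReadahead(long readahead) {
    policy = policy.withReadahead(readahead);
  }

  // Readers take no lock: they get some complete, immutable snapshot.
  ReadPolicy currentPolicy() {
    return policy;
  }
}
```

A reader that calls currentPolicy() once and then uses only that local reference sees a consistent pair of values, even if another thread replaces the policy in the meantime.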
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226896#comment-14226896 ]

stack commented on HDFS-6735:

Interesting that CachingStrategy can be changed on a DFSInputStream post-construction. We could avoid the infoLock on cachingStrategy if we pre-made the readahead and dropbehinds, but that's probably OTT.

Nice doc'ing of the locking strategy around data members.

If we are doing an openInfo call, we can't service a file length request; I suppose that's how it should be: if we are updating our block info, the file length could change, and if we are updating block info, something's up with the block set we currently have... At least the lock has climbed down from a lock on 'this'. Good.

Patch LGTM (I like Colin's feedback above).

Numbers still pretty good, [~larsh]?
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226791#comment-14226791 ]

Colin Patrick McCabe commented on HDFS-6735:

* {{readWithStrategy}} and {{blockSeekTo}} should be marked {{synchronized}}. Yes, they are called from a {{synchronized}} function, but let's make it clear. It's kind of confusing to see us fooling around with {{pos}} and other stuff without seeing a {{synchronized}} on the function.
* We should document in a comment that we cannot try to take the DFSInputStream lock when holding the infoLock. We need to be careful to avoid deadlock, and maintaining this lock ordering is the easiest way.
* I noticed that in {{blockSeekTo}}, we are holding the {{infoLock}} when calling {{BlockReaderFactory#build}}. It would be nice to avoid this. That function does a lot of stuff... if we're creating a {{RemoteBlockReader2}}, it potentially blocks while a TCP connection to the DataNode is opened. It seems like all you need the {{infoLock}} for here is to get the {{cachingStrategy}} and determine if {{shortCircuitForbidden}}, and you could pull this out into a synchronized block prior to the {{Builder#build}} call, similar to how {{actualGetFromOneDataNode}} does it.

Incidentally, the findbugs warning is probably because findbugs doesn't realize that {{CachingStrategy}} is an immutable class, and so it's safe to access it without locking. (The only thing you need locking for is actually reading the current reference to the object, not for accessing the object itself.)
+1 once those are addressed.
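The review points above about lock ordering and about not holding the infoLock across a slow call can be sketched as follows. This is a minimal illustration under stated assumptions: SnapshotThenBuild, seekAndBuild and slowBuild are hypothetical names, and the string-valued fields merely stand in for the state that the real code guards with infoLock.

```java
// Hypothetical class; not the actual DFSInputStream code.
class SnapshotThenBuild {
  // Lock order used throughout this sketch: the stream lock ("this") first,
  // then infoLock. Never try to take the stream lock while holding infoLock.
  private final Object infoLock = new Object();

  // Both fields are guarded by infoLock.
  private String cachingStrategy = "default";
  private boolean shortCircuitForbidden = false;

  void setCachingStrategy(String strategy) {
    synchronized (infoLock) {
      cachingStrategy = strategy;
    }
  }

  // Stand-in for a slow operation such as opening a connection.
  // It fails loudly if the caller still holds infoLock.
  private String slowBuild(String strategy, boolean forbidden) {
    if (Thread.holdsLock(infoLock)) {
      throw new IllegalStateException("infoLock held during slow build");
    }
    return "reader[" + strategy + ", shortCircuitForbidden=" + forbidden + "]";
  }

  // Stateful path: holds the stream lock for the whole call, but holds
  // infoLock only long enough to copy the values it needs.
  synchronized String seekAndBuild() {
    final String strategy;
    final boolean forbidden;
    synchronized (infoLock) {
      strategy = cachingStrategy;
      forbidden = shortCircuitForbidden;
    }
    // infoLock is released here, so threads that only need infoLock are not
    // blocked while the slow call runs.
    return slowBuild(strategy, forbidden);
  }
}
```

The design choice is to copy shared state into locals under the narrow lock and do the blocking work with only those locals, so the narrow lock is never held across I/O.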
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225473#comment-14225473 ]

Hadoop QA commented on HDFS-6735:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12683657/HDFS-6735-v6.txt
against trunk revision 78f7cdb.

    +1 @author. The patch does not contain any @author tags.
    -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 javadoc. There were no new javadoc warning messages.
    +1 eclipse:eclipse. The patch built with eclipse:eclipse.
    +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8834//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8834//console

This message is automatically generated.
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224353#comment-14224353 ]

Steve Loughran commented on HDFS-6735:

You can fix the findbugs warning by tweaking {{hadoop-hdfs-project/hadoop-hdfs/dev-support/findbugsExcludeFile.xml}} and including that diff in the patch.
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223901#comment-14223901 ]

Lars Hofhansl commented on HDFS-6735:

The remaining findbugs warning is due to cachingStrategy. I am 100% sure that the locking is correct; every single reference to cachingStrategy is guarded by the infoLock.
This should be good to go (happy to squash the bogus findbugs warning if somebody has a suggestion how).

The findbugs website states this for IS2_INCONSISTENT_SYNC:
{quote}
Note that there are various sources of inaccuracy in this detector; for example, the detector cannot statically detect all situations in which a lock is held. Also, even when the detector is accurate in distinguishing locked vs. unlocked accesses, the code in question may still be correct.
{quote}
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14222772#comment-14222772 ]

Hadoop QA commented on HDFS-6735:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12683270/HDFS-6735-v5.txt
against trunk revision a4df9ee.

    +1 @author. The patch does not contain any @author tags.
    -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 javadoc. There were no new javadoc warning messages.
    +1 eclipse:eclipse. The patch built with eclipse:eclipse.
    -1 findbugs. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8817//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/8817//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8817//console

This message is automatically generated.
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14222673#comment-14222673 ]

Lars Hofhansl commented on HDFS-6735:

s/since we never get into that if block if we coming from a called synchronized/since we *only* get into that if block if we coming from a caller synchronized/
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14222544#comment-14222544 ]

Hadoop QA commented on HDFS-6735:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12683239/HDFS-6735-v4.txt
against trunk revision a4df9ee.

    +1 @author. The patch does not contain any @author tags.
    -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 javadoc. There were no new javadoc warning messages.
    +1 eclipse:eclipse. The patch built with eclipse:eclipse.
    -1 findbugs. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8814//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/8814//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8814//console

This message is automatically generated.
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221839#comment-14221839 ]

Lars Hofhansl commented on HDFS-6735:

Thanks [~cmccabe].

I'll put the synchronized back, do the correct indentation, and name the new lock differently. I'll also look through the other synchronized modifiers that I had removed from private methods, where it makes sense.

On the indentation... I completely agree. It's hard to review - sometimes I apply HBase patches locally just so that I can do a git diff -b to review them without the whitespace, which is a pain. And if it is not done in all branches, then cherry-picking a patch becomes annoying, etc.

Thanks again for looking! New patch upcoming.
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220659#comment-14220659 ]

Colin Patrick McCabe commented on HDFS-6735:

bq. Another locking option is not to synchronize on this at all, but to have two locks ("streamLock" and "pLock", or whatever are good names). That way the intent might be more explicit. Yet another option would be to disentangle the two APIs by subclassing or delegation (since the issue really is that we have state for two different modes of operation in the same class), that'd be a bigger change though.

Yeah, I thought about that too. It seems cleaner, intuitively, but it would also involve a lot of re-indentation, since we could not have "synchronized" methods any more, but would have to have synchronized blocks on the "positional lock". It seems like such a trivial thing, but changing indentation can be painful.

I'm fine with committing something like the current patch, if we can make the changes I suggested above. We can think about additional cleanups in a follow-on JIRA, I guess. I know you guys have spent a lot of time on this and it's important for HBase perf.

bq. Tested this with HBase and observed with a sampler that all delays internal to DFSInputStream are gone, which is nice.

awesome!
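The two-lock alternative discussed above (explicit lock objects instead of synchronized methods) can be sketched like this. TwoLockStream is a hypothetical in-memory stand-in, not the design that was committed; the lock names follow the suggestions in the thread, and preadWhileStreamLockHeld exists only to demonstrate that a positional read does not wait for the stream lock.

```java
// Hypothetical in-memory "stream" with separate locks for the two read modes.
class TwoLockStream {
  private final Object streamLock = new Object();      // guards pos (stateful read path)
  private final Object positionalLock = new Object();  // guards state shared by both paths

  private final byte[] data;
  private int pos = 0;        // guarded by streamLock
  private int visibleLength;  // guarded by positionalLock

  TwoLockStream(byte[] data) {
    this.data = data.clone();
    this.visibleLength = data.length;
  }

  // Lock order in this sketch: streamLock first, then positionalLock.
  private int visibleLength() {
    synchronized (positionalLock) {
      return visibleLength;
    }
  }

  // Stateful read: needs streamLock because it advances pos.
  int read() {
    synchronized (streamLock) {
      if (pos >= visibleLength()) {
        return -1;
      }
      return data[pos++] & 0xff;
    }
  }

  // Positional read: never touches pos, so it never takes streamLock.
  int pread(int position) {
    if (position < 0 || position >= visibleLength()) {
      return -1;
    }
    return data[position] & 0xff;
  }

  // Demonstration helper: the calling thread holds streamLock while a second
  // thread performs a positional read. Returns the byte read, or -2 on timeout.
  int preadWhileStreamLockHeld(int position) {
    final int[] result = { -2 };
    synchronized (streamLock) {
      Thread reader = new Thread(() -> result[0] = pread(position));
      reader.start();
      try {
        reader.join(5000);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
    return result[0];
  }
}
```

As the comment above notes, the cost of this style is that every guarded method body gains a synchronized block and one extra level of indentation.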
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220654#comment-14220654 ]

Colin Patrick McCabe commented on HDFS-6735:

bq. re: tryReadZeroCopy removing the synchronization is fine, because it is only called from (stateful) read(...) and pos is only used in the stateful read path and hence needs to be guarded by the lock on this only.

Why not make it {{synchronized}} then? There is no extra overhead if we already have the lock, and this makes it self-documenting.
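The suggestion above relies on Java monitors being reentrant: a thread that already owns an object's monitor can enter another synchronized method on the same object without blocking. A minimal sketch, with StatefulReader as a hypothetical class invented for the illustration:

```java
// Hypothetical class showing a synchronized helper called from a synchronized method.
class StatefulReader {
  private long pos = 0; // guarded by the lock on "this"

  // Public entry point of the stateful read path.
  synchronized long read(int n) {
    return advance(n);
  }

  // Only ever called with the lock already held. Marking it synchronized
  // documents that requirement; re-entering a monitor the thread already
  // owns cannot block or deadlock.
  private synchronized long advance(int n) {
    if (!Thread.holdsLock(this)) {
      throw new IllegalStateException("lock on this not held");
    }
    pos += n;
    return pos;
  }
}
```

If a future caller reaches advance without going through read, the modifier still keeps pos consistent, which is the self-documenting property the comment asks for.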
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219075#comment-14219075 ]

Lars Hofhansl commented on HDFS-6735:

Apologies for the spam... I have a backport of this to branch-2.4 in case anybody is interested.
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219074#comment-14219074 ]

Lars Hofhansl commented on HDFS-6735:

re: tryReadZeroCopy: removing the synchronization is fine, because it is only called from the (stateful) read(...), and pos is only used in the stateful read path and hence only needs to be guarded by the lock on {{this}}.
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219072#comment-14219072 ]

Lars Hofhansl commented on HDFS-6735:

Thanks [~cmccabe]. "infoLock" is better. I'll fix the indentation later.

Let me have a look at tryReadZeroCopy again. I had mapped out all members and which methods use what, and concluded the synchronized wasn't needed; it is quite possible I made a mistake.

Another locking option is not to synchronize on {{this}} at all, but to have two locks ("streamLock" and "pLock", or whatever good names are). That way the intent might be more explicit.

Yet another option would be to disentangle the two APIs by subclassing or delegation (since the real issue is that we keep state for two different modes of operation in the same class). That would be a bigger change, though.

Meanwhile in HBase land: I tested this with HBase and observed with a sampler that all delays internal to DFSInputStream are gone, which is nice.

I committed a change to HBase that allows us to (1) have compactions use their own input streams so they do not interfere with user scans along the same files and (2) optionally force p-reads for all user scans. See HBASE-12411.

Especially with #2 I see nice speedups for many concurrent scanners, essentially up to what my disks can sustain, but a 50% slowdown when there is only a single scanner per file. That is expected, since we no longer benefit from prefetching.
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219026#comment-14219026 ]

Colin Patrick McCabe commented on HDFS-6735:

This is definitely a tricky change but I think it will have a lot of benefits. Thanks again for taking this on.

I can see that you didn't update the indent. I guess that makes things easier to review. But eventually we will need to add indentation for the things that are now surrounded by a new {{synchronized}} block.

{code}
-  private synchronized ByteBuffer tryReadZeroCopy(int maxLength,
+  private ByteBuffer tryReadZeroCopy(int maxLength,
       EnumSet opts) throws IOException {
{code}
I don't think that we can remove the synchronization here, since this function is using {{pos}} and some other stuff which is positional.

re: CachingStrategy: I think this should just be protected by {{sharedLock}}, right? It shouldn't need to be volatile if we just grab the current value when holding {{sharedLock}}.

I'm not sure "sharedLock" is the best name. All locks are shared, right? If we weren't sharing between threads, there would be no need for locks. I can see that the intention is that sharedLock = shared between read and pread. But sharedLock is also used by stuff like getFileEncryptionInfo. How about calling this "infoLock" instead? It's protecting the block info we got from the NameNode, lastBlockBeingWrittenLength, and fileEncryptionInfo (and I think cachingStrategy, if my previous comment makes sense?)
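A minimal sketch of the two-lock idea discussed above, under stated assumptions: the class {{InfoLockStream}}, its fields, and the helper {{preadWhileMonitorHeld}} are hypothetical and do not reproduce the actual patch. Only the name {{infoLock}} comes from the comment. Positional state ({{pos}}) is guarded by the monitor on {{this}}, shared file info by {{infoLock}}, so a pread that touches only shared info does not wait for a thread that holds the stream monitor.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; NOT the real DFSInputStream.
class InfoLockStream {
    private final Object infoLock = new Object();
    private final long fileLength; // shared info, read under infoLock
    private long pos;              // stateful-read position, guarded by "this"

    InfoLockStream(long fileLength) {
        this.fileLength = fileLength;
    }

    // Stateful read: holds the monitor on "this", takes infoLock only briefly.
    synchronized long read(long len) {
        long n;
        synchronized (infoLock) {
            n = Math.min(len, fileLength - pos);
        }
        pos += n;
        return n;
    }

    // Positional read: never touches the monitor on "this".
    long pread(long position, long len) {
        synchronized (infoLock) {
            return Math.max(0, Math.min(len, fileLength - position));
        }
    }

    synchronized long getPos() {
        return pos;
    }
}

class InfoLockDemo {
    // Holds the stream monitor (as a long-running read() would) while another
    // thread issues a pread; fails if the pread does not finish in time.
    static long preadWhileMonitorHeld(InfoLockStream s) {
        synchronized (s) {
            CompletableFuture<Long> f = CompletableFuture.supplyAsync(() -> s.pread(0, 4));
            try {
                return f.get(5, TimeUnit.SECONDS);
            } catch (Exception e) {
                throw new IllegalStateException("pread was blocked or failed", e);
            }
        }
    }
}
```

The lock order here is always {{this}} before {{infoLock}}, and {{pread}} takes only {{infoLock}}, which is one way to keep such a scheme free of lock-order inversions.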
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218802#comment-14218802 ]

Lars Hofhansl commented on HDFS-6735:

I ran TestByteArrayManager as well as all tests derived from TestParallelReadUtil. All pass locally.

I will check out the findbugs warnings and do a real-life test with HBase (with this patch on top of the latest 2.4).

Any recommendation on what else I should test?
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14203875#comment-14203875 ]

Hadoop QA commented on HDFS-6735:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12680470/HDFS-6735-v3.txt
  against trunk revision 9ba8d8c.

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

    {color:red}-1 findbugs{color}. The patch appears to introduce 4 new Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs:

                  org.apache.hadoop.hdfs.util.TestByteArrayManager

    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8704//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/8704//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8704//console

This message is automatically generated.
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194904#comment-14194904 ]

Colin Patrick McCabe commented on HDFS-6735:

Hi Lars,

Thanks for working on this. It seems like everyone agrees that {{DFSInputStream#read}} doesn't need to block {{DFSInputStream#pread}} or {{DFSInputStream#getFileLength}}.

As per my comment here: https://issues.apache.org/jira/browse/HDFS-6698?focusedCommentId=14194902&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14194902
I wonder if we could combine this with HDFS-6698 as "improve concurrency in DFSInputStream".
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193634#comment-14193634 ]

Hadoop QA commented on HDFS-6735:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12657297/HDFS-6735-v2.txt
  against trunk revision 5c0381c.

    {color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8624//console

This message is automatically generated.
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193629#comment-14193629 ]

Lars Hofhansl commented on HDFS-6735:

As described in HDFS-6698, the potential performance gains for something like HBase are substantial.

I agree it's better to keep LocatedBlocks not thread-safe and require callers to lock accordingly. I have not seen fetchAt in a hot path (at least not with HBase usage patterns).

seek + read (non-positional) cannot be done concurrently, agreed. pread should be possible, though.

How should we move forward on this? It seems important. :)
I am also open to suggestions about how to fix things in HBase (see the last comment in HDFS-6698 about how HBase handles things and how limited concurrency "within" an InputStream is an issue).
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090419#comment-14090419 ]

Yi Liu commented on HDFS-6735:

Hi [~xieliang007], thanks for the patch.
I agree with [~cmccabe] that we should figure out what the thread-safety guarantees are.
From the patch itself, my comments are:
* Instead of making the locatedBlocks variable volatile, we may need to define a separate lock for locatedBlocks operations.
* {{fetchBlockAt}} should be handled.
* {{failures}} and {{readStatistics}} are not protected if multiple threads are supported.
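The first review point above (a dedicated lock rather than {{volatile}}) can be sketched as follows. This is an illustration under assumptions: {{BlockCache}}, {{BlockCacheDemo}}, and the offsets used are hypothetical, and the real {{LocatedBlocks}} API is not reproduced. The point is that {{volatile}} only publishes a reference safely, while a compound check-then-insert on a mutable list needs mutual exclusion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a block-location cache; NOT the real LocatedBlocks.
class BlockCache {
    private final Object lock = new Object();
    private final List<Long> startOffsets = new ArrayList<>(); // guarded by lock

    // Check-then-insert is a compound action: it must run atomically,
    // which a volatile field alone cannot guarantee.
    void insertRange(List<Long> newOffsets) {
        synchronized (lock) {
            for (Long off : newOffsets) {
                if (!startOffsets.contains(off)) {
                    startOffsets.add(off);
                }
            }
        }
    }

    int size() {
        synchronized (lock) {
            return startOffsets.size();
        }
    }
}

class BlockCacheDemo {
    // Two threads insert overlapping ranges; returns the final entry count.
    static int concurrentInsert() {
        BlockCache cache = new BlockCache();
        Thread a = new Thread(() -> cache.insertRange(Arrays.asList(0L, 128L, 256L)));
        Thread b = new Thread(() -> cache.insertRange(Arrays.asList(128L, 256L, 384L)));
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return cache.size();
    }
}
```

With the lock in place the overlapping inserts always leave four distinct entries; without it, the unsynchronized {{ArrayList}} could end up with duplicates or corrupted state.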
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14082749#comment-14082749 ]

Colin Patrick McCabe commented on HDFS-6735:

Thanks, Steve. And thanks Stack and Liang, for bearing with me on this one :) Looking forward to figuring this out.
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14082114#comment-14082114 ]

Steve Loughran commented on HDFS-6735:

Linking to HADOOP-9361 as it raises the question "what concurrency guarantees are all filesystem input streams meant to offer"
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072228#comment-14072228 ]

Colin Patrick McCabe commented on HDFS-6735:

Changing the locking model for the DFSInputStream seems like a big project. Can we have a design doc for this and for HDFS-6698?

I'm also not sure if we document the thread-safety guarantees offered by the DFSInputStream anywhere. Most things seem to be protected by locks, but we should discuss what the guarantees are and put them as comments in the code explicitly.

We should figure out what the thread-safety guarantees are (and which operations block which other operations). For example, a non-positional read probably always has to block another non-positional read, but the situation with other ops is less clear.
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071655#comment-14071655 ]

Liang Xie commented on HDFS-6735:

The failed TestPipelinesFailover and TestNamenodeCapacityReport are not related to the current patch (I saw them fail in other recent reports).
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071628#comment-14071628 ]

Hadoop QA commented on HDFS-6735:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12657297/HDFS-6735-v2.txt
  against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

                  org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
                  org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport

    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7439//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7439//console

This message is automatically generated.
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071440#comment-14071440 ]

Liang Xie commented on HDFS-6735:

bq. I'd think you'd want a comment at least in LocatedBlocks#underConstruction warning an upper layer is dependent on it being final in case LocatedBlocks changes and starts to allow blocks complete under a stream.

Done.

bq. locatedBlocks.insertRange(targetBlockIdx, newBlocks.getLocatedBlocks()); ... be inside a synchronization too? Could two threads be updating block locations at same time?

Yes, it's possible. But we cannot simply put a "synchronization" around that call. It is different from the block "synchronized (this) { pos = offset; blockEnd = blk.getStartOffset() + blk.getBlockSize() - 1; currentLocatedBlock = blk; }": in the pread scenario "updatePosition" is false, so pread never enters that {{synchronized (this)}} block. If we put a "synchronization" around insertRange as well, then a pread reaching that point would still be blocked by any other monitor holder, e.g. read() :)

What we can do is add a "synchronized" or a rwLock inside the LocatedBlocks class. Let me try.
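The {{updatePosition}} reasoning above can be sketched in Java. This is a simplified illustration, not the real {{getBlockAt}}: the class {{BlockLocator}}, the fixed {{BLOCK_SIZE}}, and the arithmetic block lookup are assumptions made for the example. Only the stateful path (flag set to true) enters the monitor on {{this}}, so a positional read passing false never contends for it at this point.

```java
// Hypothetical sketch of the updatePosition pattern; NOT the real DFSInputStream.
class BlockLocator {
    private static final long BLOCK_SIZE = 128; // assumed fixed block size

    private long pos = 0;       // guarded by the monitor on "this"
    private long blockEnd = -1; // guarded by the monitor on "this"

    // Returns the start offset of the block containing "offset".
    // Positional state is updated, under the monitor, only when asked to.
    long getBlockAt(long offset, boolean updatePosition) {
        long blockStart = (offset / BLOCK_SIZE) * BLOCK_SIZE;
        if (updatePosition) {
            // Stateful read()/seek() path: mutates position under the monitor.
            synchronized (this) {
                pos = offset;
                blockEnd = blockStart + BLOCK_SIZE - 1;
            }
        }
        // pread path (updatePosition == false) skips the monitor entirely.
        return blockStart;
    }

    synchronized long getPos() {
        return pos;
    }

    synchronized long getBlockEnd() {
        return blockEnd;
    }
}
```

Any shared structure consulted on both paths (the block list, in the discussion above) would still need its own protection, which is the motivation for locking inside LocatedBlocks rather than on the stream.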
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071435#comment-14071435 ]

Hadoop QA commented on HDFS-6735:
---------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12657271/HDFS-6735.txt
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 1 new or modified test files.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}.  The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:red}-1 core tests{color}.  The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

                  org.apache.hadoop.hdfs.TestDatanodeConfig
                  org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
                  org.apache.hadoop.hdfs.TestPread
                  org.apache.hadoop.hdfs.TestDataTransferKeepalive

    {color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7431//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7431//console

This message is automatically generated.
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071409#comment-14071409 ]

Liang Xie commented on HDFS-6735:
---------------------------------

bq. We'd not check in the test since it does not assert anything? We'd just check it in as a utility testing concurrent pread throughput?

There are assertions at the end of the test code snippet, see:
{code}
      assertTrue(readLatency.readMs > readLatency.preadMs);
      // because we issued a pread already, so the second one should not hit
      // disk, even consider running on a slow VM, 1 second should be fine?
      assertTrue(readLatency.preadMs < 1000);
{code}
From "assertTrue(readLatency.preadMs < 1000);" we can tell whether the pread() was blocked by read() or not :)
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071382#comment-14071382 ]

stack commented on HDFS-6735:
-----------------------------

Nice test.  We'd not check in the test since it does not assert anything?  We'd just check it in as a utility testing concurrent pread throughput?

On the below, it is 'safe' because the data member being queried is 'final'?

-  synchronized boolean shortCircuitForbidden() {
+  boolean shortCircuitForbidden() {
     return locatedBlocks.isUnderConstruction();
   }

I'd think you'd want a comment at least in LocatedBlocks#underConstruction warning that an upper layer depends on it being final, in case LocatedBlocks changes and starts to allow blocks to complete under a stream.

Looking at this change:

-      pos = offset;
-      blockEnd = blk.getStartOffset() + blk.getBlockSize() - 1;
-      currentLocatedBlock = blk;
+      synchronized (this) {
+        pos = offset;
+        blockEnd = blk.getStartOffset() + blk.getBlockSize() - 1;
+        currentLocatedBlock = blk;
+      }

... it makes sense.  Should the line from a few lines above:

        locatedBlocks.insertRange(targetBlockIdx, newBlocks.getLocatedBlocks());

... be inside a synchronization too?  Could two threads be updating block locations at the same time?

The below looks safe to me:

-  private synchronized List getFinalizedBlockRange(
+  private List getFinalizedBlockRange(

Else patch looks great.
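The narrowed synchronized block discussed in this review can be pictured with a toy model. The sketch below is an assumption-laden illustration, not the patched DFSInputStream: ToyStream, BLOCK_SIZE and the accessors are hypothetical, and only the field names pos and blockEnd and the updatePosition flag are taken from the thread.

```java
// Hypothetical toy stream; NOT the real DFSInputStream.
final class ToyStream {
    private static final long BLOCK_SIZE = 128; // assumed fixed block size for the sketch
    private long pos = 0;
    private long blockEnd = -1;

    // Returns the start offset of the block containing 'offset'.
    // A stateful read passes updatePosition = true; a positional read passes false.
    long getBlockAt(long offset, boolean updatePosition) {
        long blockStart = (offset / BLOCK_SIZE) * BLOCK_SIZE; // lookup needs no monitor here
        if (updatePosition) {
            // Only the stateful path touches the shared cursor fields,
            // so only this path takes the stream's monitor.
            synchronized (this) {
                pos = offset;
                blockEnd = blockStart + BLOCK_SIZE - 1;
            }
        }
        return blockStart;
    }

    synchronized long getPos() {
        return pos;
    }

    synchronized long getBlockEnd() {
        return blockEnd;
    }
}
```

With this shape, a caller that passes updatePosition = false never contends for the monitor, which is the property the review is checking for.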
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071335#comment-14071335 ]

Liang Xie commented on HDFS-6735:
---------------------------------

This patch already includes HDFS-6698.
With the source change, the added "testPreadBlockedbyRead" gives this result on my box:
{code}
grep cost hadoop-hdfs-project/hadoop-hdfs/target/surefire-reports/org.apache.hadoop.hdfs.TestPread-output.txt
read() cost:5008ms, pread() cost:5ms
{code}

Without the source change (of course, you still need to keep the "readDelay" fault injection in the source code), the added "testPreadBlockedbyRead" gives this result on my box:
{code}
read() cost:5009ms, pread() cost:4912ms
{code}
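The contrast between the two measurements can be reproduced in miniature. The following is a rough model only, assuming one shared lock and an artificial sleep standing in for the injected read delay; LockScopeDemo and all of its members are hypothetical and are not the TestPread code from the patch.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical model of the contention; NOT the real DFSInputStream or TestPread.
final class LockScopeDemo {
    private final Object cursorLock = new Object();
    private long coarsePreads = 0;

    // Stateful read: holds the cursor lock for the whole, artificially slow, read.
    void read(long delayMs, CountDownLatch holding) {
        synchronized (cursorLock) {
            holding.countDown(); // signal that the lock is now held
            try {
                Thread.sleep(delayMs); // stands in for the injected read delay
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Positional read: takes the cursor lock only in the coarse-grained variant.
    void pread(boolean coarse) {
        if (coarse) {
            synchronized (cursorLock) {
                coarsePreads++;
            }
        }
        // The fine-grained variant touches no shared cursor state, so no lock is taken.
    }

    // Measures how long pread() takes while another thread is inside read().
    static long preadLatencyMs(boolean coarse, long delayMs) {
        LockScopeDemo stream = new LockScopeDemo();
        CountDownLatch holding = new CountDownLatch(1);
        Thread reader = new Thread(() -> stream.read(delayMs, holding));
        reader.start();
        try {
            holding.await(); // wait until read() owns the lock
            long start = System.nanoTime();
            stream.pread(coarse);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            reader.join();
            return elapsedMs;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("coarse pread() cost:" + preadLatencyMs(true, 500) + "ms");
        System.out.println("fine pread() cost:" + preadLatencyMs(false, 500) + "ms");
    }
}
```

In the coarse variant the pread latency should track the read delay, while in the fine-grained variant it should stay near zero, which matches the shape of the numbers reported above.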