[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-12-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233132#comment-14233132
 ] 

Hudson commented on HDFS-6735:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #24 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/24/])
HDFS-6735. A minor optimization to avoid pread() be blocked by read() inside 
the same DFSInputStream (Lars Hofhansl via stack) (stack: rev 
7caa3bc98e6880f98c5c32c486a0c539f9fd3f5f)
* hadoop-hdfs-project/hadoop-hdfs/dev-support/findbugsExcludeFile.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/shortcircuit/ShortCircuitCache.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/shortcircuit/ShortCircuitReplica.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LocatedBlocks.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java


> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Lars Hofhansl
> Fix For: 2.7.0
>
> Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, 
> HDFS-6735-v5.txt, HDFS-6735-v6.txt, HDFS-6735-v7.txt, HDFS-6735-v8.txt, 
> HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-12-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233116#comment-14233116
 ] 

Hudson commented on HDFS-6735:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1978 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1978/])
HDFS-6735. A minor optimization to avoid pread() be blocked by read() inside 
the same DFSInputStream (Lars Hofhansl via stack) (stack: rev 
7caa3bc98e6880f98c5c32c486a0c539f9fd3f5f)
* hadoop-hdfs-project/hadoop-hdfs/dev-support/findbugsExcludeFile.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/shortcircuit/ShortCircuitReplica.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LocatedBlocks.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/shortcircuit/ShortCircuitCache.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java


> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Lars Hofhansl
> Fix For: 2.7.0
>
> Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, 
> HDFS-6735-v5.txt, HDFS-6735-v6.txt, HDFS-6735-v7.txt, HDFS-6735-v8.txt, 
> HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-12-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233061#comment-14233061
 ] 

Hudson commented on HDFS-6735:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #24 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/24/])
HDFS-6735. A minor optimization to avoid pread() be blocked by read() inside 
the same DFSInputStream (Lars Hofhansl via stack) (stack: rev 
7caa3bc98e6880f98c5c32c486a0c539f9fd3f5f)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LocatedBlocks.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/shortcircuit/ShortCircuitReplica.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/shortcircuit/ShortCircuitCache.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
* hadoop-hdfs-project/hadoop-hdfs/dev-support/findbugsExcludeFile.xml


> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Lars Hofhansl
> Fix For: 2.7.0
>
> Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, 
> HDFS-6735-v5.txt, HDFS-6735-v6.txt, HDFS-6735-v7.txt, HDFS-6735-v8.txt, 
> HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-12-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233045#comment-14233045
 ] 

Hudson commented on HDFS-6735:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1955 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1955/])
HDFS-6735. A minor optimization to avoid pread() be blocked by read() inside 
the same DFSInputStream (Lars Hofhansl via stack) (stack: rev 
7caa3bc98e6880f98c5c32c486a0c539f9fd3f5f)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
* hadoop-hdfs-project/hadoop-hdfs/dev-support/findbugsExcludeFile.xml
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/shortcircuit/ShortCircuitReplica.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/shortcircuit/ShortCircuitCache.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LocatedBlocks.java


> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Lars Hofhansl
> Fix For: 2.7.0
>
> Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, 
> HDFS-6735-v5.txt, HDFS-6735-v6.txt, HDFS-6735-v7.txt, HDFS-6735-v8.txt, 
> HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-12-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232917#comment-14232917
 ] 

Hudson commented on HDFS-6735:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #24 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/24/])
HDFS-6735. A minor optimization to avoid pread() be blocked by read() inside 
the same DFSInputStream (Lars Hofhansl via stack) (stack: rev 
7caa3bc98e6880f98c5c32c486a0c539f9fd3f5f)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/shortcircuit/ShortCircuitCache.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LocatedBlocks.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/shortcircuit/ShortCircuitReplica.java
* hadoop-hdfs-project/hadoop-hdfs/dev-support/findbugsExcludeFile.xml
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Lars Hofhansl
> Fix For: 2.7.0
>
> Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, 
> HDFS-6735-v5.txt, HDFS-6735-v6.txt, HDFS-6735-v7.txt, HDFS-6735-v8.txt, 
> HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-12-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232903#comment-14232903
 ] 

Hudson commented on HDFS-6735:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #763 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/763/])
HDFS-6735. A minor optimization to avoid pread() be blocked by read() inside 
the same DFSInputStream (Lars Hofhansl via stack) (stack: rev 
7caa3bc98e6880f98c5c32c486a0c539f9fd3f5f)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/shortcircuit/ShortCircuitReplica.java
* hadoop-hdfs-project/hadoop-hdfs/dev-support/findbugsExcludeFile.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LocatedBlocks.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/shortcircuit/ShortCircuitCache.java


> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Lars Hofhansl
> Fix For: 2.7.0
>
> Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, 
> HDFS-6735-v5.txt, HDFS-6735-v6.txt, HDFS-6735-v7.txt, HDFS-6735-v8.txt, 
> HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-12-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232596#comment-14232596
 ] 

Hudson commented on HDFS-6735:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6638 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6638/])
HDFS-6735. A minor optimization to avoid pread() be blocked by read() inside 
the same DFSInputStream (Lars Hofhansl via stack) (stack: rev 
7caa3bc98e6880f98c5c32c486a0c539f9fd3f5f)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
* hadoop-hdfs-project/hadoop-hdfs/dev-support/findbugsExcludeFile.xml
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LocatedBlocks.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/shortcircuit/ShortCircuitReplica.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/shortcircuit/ShortCircuitCache.java


> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Lars Hofhansl
> Fix For: 2.7.0
>
> Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, 
> HDFS-6735-v5.txt, HDFS-6735-v6.txt, HDFS-6735-v7.txt, HDFS-6735-v8.txt, 
> HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-12-01 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230294#comment-14230294
 ] 

Colin Patrick McCabe commented on HDFS-6735:


+1.  Thanks, Lars.

> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Lars Hofhansl
> Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, 
> HDFS-6735-v5.txt, HDFS-6735-v6.txt, HDFS-6735-v7.txt, HDFS-6735-v8.txt, 
> HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-29 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14228890#comment-14228890
 ] 

stack commented on HDFS-6735:
-

Patch looks great to me.  Trying it here on a little cluster.  Intend to commit 
Monday unless objection.

> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Lars Hofhansl
> Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, 
> HDFS-6735-v5.txt, HDFS-6735-v6.txt, HDFS-6735-v7.txt, HDFS-6735-v8.txt, 
> HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14228549#comment-14228549
 ] 

Hadoop QA commented on HDFS-6735:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12684213/HDFS-6735-v8.txt
  against trunk revision 1556f86.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8869//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8869//console

This message is automatically generated.

> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Lars Hofhansl
> Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, 
> HDFS-6735-v5.txt, HDFS-6735-v6.txt, HDFS-6735-v7.txt, HDFS-6735-v8.txt, 
> HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14227953#comment-14227953
 ] 

Hadoop QA commented on HDFS-6735:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12684096/HDFS-6735-v7.txt
  against trunk revision c1f2bb2.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8865//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8865//console

This message is automatically generated.

> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Lars Hofhansl
> Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, 
> HDFS-6735-v5.txt, HDFS-6735-v6.txt, HDFS-6735-v7.txt, HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14227945#comment-14227945
 ] 

Hadoop QA commented on HDFS-6735:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12684091/HDFS-6735-v6.txt
  against trunk revision c1f2bb2.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8864//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8864//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8864//console

This message is automatically generated.

> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Lars Hofhansl
> Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, 
> HDFS-6735-v5.txt, HDFS-6735-v6.txt, HDFS-6735-v7.txt, HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14227174#comment-14227174
 ] 

Hadoop QA commented on HDFS-6735:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12683956/HDFS-6735-v6.txt
  against trunk revision c1f2bb2.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The test build failed in 
hadoop-hdfs-project/hadoop-hdfs 

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8860//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8860//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8860//console

This message is automatically generated.

> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Lars Hofhansl
> Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, 
> HDFS-6735-v5.txt, HDFS-6735-v6.txt, HDFS-6735-v6.txt, HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-26 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14227155#comment-14227155
 ] 

Lars Hofhansl commented on HDFS-6735:
-

So to be specific the improvement I see above is still there. Just that the 
next thing to tackle is the ShortCircuitCache.

> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, 
> HDFS-6735-v5.txt, HDFS-6735-v6.txt, HDFS-6735-v6.txt, HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-26 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14227148#comment-14227148
 ] 

Lars Hofhansl commented on HDFS-6735:
-

Tested -v6 with HBase. Still good from the DFSInputStream angle.
I do see now that much more time is spent in ShortCircuitCache.fetchOrCreate 
and unref. (rechecked that is true to -v3 as well).
It's still better, but the can is kicked down the road a bit.


> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, 
> HDFS-6735-v5.txt, HDFS-6735-v6.txt, HDFS-6735-v6.txt, HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-26 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14227031#comment-14227031
 ] 

Lars Hofhansl commented on HDFS-6735:
-

Per my comment above my preference would still be to just make the 
cachingStrategy reference volatile in DFSInputStream. It is immutable and hence 
the volatile reference would make access safe in all cases without any locking 
- the same is true for fileEncryptionInfo, btw (immutable already, just needs a 
volatile reference, no locking needed at all).

I'll make a new patch.

> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, 
> HDFS-6735-v5.txt, HDFS-6735-v6.txt, HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-26 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226896#comment-14226896
 ] 

stack commented on HDFS-6735:
-

Interesting CachingStrategy can be changed on a DFSIS post-construction. Could 
avoid infolock on cachingstrategy if pre-made the readahead and dropbehinds but 
that's probably OTT.

Nice doc'ing of locking strategy around data members.

If we are doing a openInfo call, we can't service a filelength; I suppose thats 
how it should be; if we are updating our block info, file length could 
change and if updating block info, somethings up w/ the block set we 
currently have...  At least the lock has climbed down from a lock on 'this'.  
Good.

Patch LGTM (I like the Colin feedback above).  Numbers still pretty good 
[~larsh]?







> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, 
> HDFS-6735-v5.txt, HDFS-6735-v6.txt, HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-26 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226791#comment-14226791
 ] 

Colin Patrick McCabe commented on HDFS-6735:


* {{readWithStrategy}} and {{blockSeekTo}} should be marked {{synchronized}}.  
Yes, they are called from a {{synchronized}} function, but let's make it clear. 
 It's kind of confusing to see us fooling around with {{pos}} and other stuff 
without seeing a {{synchronized}} on the function.

* We should document in a comment that we cannot try to take the DFSInputStream 
lock when holding the infoLock.  We need to be careful to avoid deadlock, and 
maintaining this lock ordering is the easiest way.

* I noticed that in {{blockSeekTo}}, we are holding the {{infoLock}} when 
calling {{BlockReaderFactory#build}}.  It would be nice to avoid this.  That 
function does a lot of stuff... if we're creating a {{RemoteBlockReader2}}, it 
potentially blocks while a TCP connection to the DataNode is opened.  It seems 
like all you need the {{infoLock}} for here is to get the {{cachingStrategy}} 
and determine if {{shortCircuitForbidden}}, and you could pull this out into a 
synchronized block prior to the {{Builder#build}} call, similar to how 
{{actualGetFromOneDataNode}} does it.

Incidentally, the findbugs warning is probably because findbugs doesn't realize 
that {{CachingStrategy}} is an immutable class, and so it's safe to access it 
without locking.  (The only thing you need locking for is actually reading the 
current reference to the object, not for accessing the object itself.)

+1 once those are addressed

> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, 
> HDFS-6735-v5.txt, HDFS-6735-v6.txt, HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225473#comment-14225473
 ] 

Hadoop QA commented on HDFS-6735:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12683657/HDFS-6735-v6.txt
  against trunk revision 78f7cdb.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8834//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8834//console

This message is automatically generated.

> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, 
> HDFS-6735-v5.txt, HDFS-6735-v6.txt, HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-25 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224353#comment-14224353
 ] 

Steve Loughran commented on HDFS-6735:
--

you can fix the findbugs warnng by tweaking 
{{hadoop-hdfs-project/hadoop-hdfs/dev-support/findbugsExcludeFile.xml }} and 
including that diff in the patch

> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, 
> HDFS-6735-v5.txt, HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-24 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223901#comment-14223901
 ] 

Lars Hofhansl commented on HDFS-6735:
-

The remaining findbugs warning is due to cachingStrategy. I am 100% sure that 
the locking is correct, every single reference to cachingStrategy is guarded by 
the infoLock.
This should good to go (happy to squash the bogus findbugs warning if somebody 
has a suggestion how).

The findbugs website states this for IS2_INCONSISTENT_SYNC:
{quote}
Note that there are various sources of inaccuracy in this detector; for 
example, the detector cannot statically detect all situations in which a lock 
is held.  Also, even when the detector is accurate in distinguishing locked vs. 
unlocked accesses, the code in question may still be correct.
{quote}


> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, 
> HDFS-6735-v5.txt, HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14222772#comment-14222772
 ] 

Hadoop QA commented on HDFS-6735:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12683270/HDFS-6735-v5.txt
  against trunk revision a4df9ee.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8817//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8817//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8817//console

This message is automatically generated.

> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, 
> HDFS-6735-v5.txt, HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-23 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14222673#comment-14222673
 ] 

Lars Hofhansl commented on HDFS-6735:
-

s/since we never get into that if block if we coming from a called 
synchronized/since we *only* get into that if block if we coming from a caller 
synchronized/


> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, 
> HDFS-6735-v5.txt, HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14222544#comment-14222544
 ] 

Hadoop QA commented on HDFS-6735:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12683239/HDFS-6735-v4.txt
  against trunk revision a4df9ee.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8814//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8814//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8814//console

This message is automatically generated.

> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, 
> HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-21 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221839#comment-14221839
 ] 

Lars Hofhansl commented on HDFS-6735:
-

Thanks [~cmccabe]. I'll put the synchronized back, do the correct indentation, 
and name the new lock differently.
I'll also look through the other synchronized modifiers that I had removed from 
private methods where is makes sense.

On the indentation... I completely agree. It's hard to review - sometimes I 
apply HBase patches locally just so that I can do a git diff -b to review it 
without the whitespace, which is a pain. And if not done in all branches then 
cherry-picking a patch becomes annoying, etc, etc.

Thanks again for looking! New patch upcoming.

> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-21 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220659#comment-14220659
 ] 

Colin Patrick McCabe commented on HDFS-6735:


bq. Another locking option is not to synchronize on  at all, but to have 
two locks ("streamLock" and "pLock", or whatever are good names). That way the 
intend might be more explicit.  Yet another option would be to disentangle to 
two apis by subclassing or delegation (since the issue really is that we have 
state for two different modes of operation in the same class), that'd be a 
bigger change though.

Yeah, I thought about that too.  It seems cleaner, intuitively, but it would 
also involve a lot of re-indentation, since we could not have "synchronized" 
methods any more, but would have to have synchronized blocks on the "positional 
lock".  It seems like such a trivial thing, but changing indentation can be 
painful.

I'm fine with committing something like the current patch, if we can make the 
changes I suggested above.  We can think about additional cleanups in a 
follow-on JIRA, I guess.  I know you guys have spent a lot of time on this and 
it's important for HBase perf.

bq. Tested this with HBase and observed with a sampler that all delays internal 
to DFSInputStream are gone, which is nice.

awesome!

> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-21 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220654#comment-14220654
 ] 

Colin Patrick McCabe commented on HDFS-6735:


bq. re: tryReadZeroCopy removing the synchronization is fine, because it is 
only called from (stateful) read(...) and pos is only used in the stateful read 
path and hence needs to be guarded by the lock on  only.

Why not make it {{synchronized}} then?  There is no extra overhead if we 
already have the lock, and this makes it self-documenting.

> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-19 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219075#comment-14219075
 ] 

Lars Hofhansl commented on HDFS-6735:
-

Apologies for the spam... I have a backport of this to branch-2.4 in case 
anybody is interested.


> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-19 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219074#comment-14219074
 ] 

Lars Hofhansl commented on HDFS-6735:
-

re: tryReadZeroCopy removing the synchronization is fine, because it is only 
called from (stateful) read(...) and pos is only used in the stateful read path 
and hence needs to be guarded by the lock on  only.


> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-19 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219072#comment-14219072
 ] 

Lars Hofhansl commented on HDFS-6735:
-

Thanks [~cmccabe]. "infoLock" is better. I'll fix the indentation later. Let me 
have a look at tryReadZeroCopy again. I had mapped out all members and which 
methods use what, and concluded the synchronized wasn't needed, quite possible 
I made a mistake.

Another locking option is not to synchronize on  at all, but to have two 
locks ("streamLock" and "pLock", or whatever are good names). That way the 
intend might be more explicit.
Yet another option would be to disentangle to two apis by subclassing or 
delegation (since the issue really is that we have state for two different 
modes of operation in the same class), that'd be a bigger change though.

Meanwhile in HBase land:
Tested this with HBase and observed with a sampler that all delays internal to 
DFSInputStream are gone, which is nice.

I committed a change to HBase to allow us to (1) have compaction use their own 
input streams so they do not interfere with user scans along the same files and 
(2) optionally force p-reads for all user scans. See HBASE-12411.

Especially with #2 I see nice speedups for many concurrent scanners essentially 
to what my disks can sustain, but a 50% slow downs for a single scanner per 
file only - which is obvious as we're not benefiting from prefetching now.


> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-19 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219026#comment-14219026
 ] 

Colin Patrick McCabe commented on HDFS-6735:


This is definitely a tricky change but I think it will have a lot of benefits.  
Thanks again for taking this on.

I can see that you didn't update the indent.  I guess that makes things easier 
to review.  But eventually we will need to add indentation for the things that 
are now surrounded by a new {{synchronized}} block.

{code}
-  private synchronized ByteBuffer tryReadZeroCopy(int maxLength,
+  private ByteBuffer tryReadZeroCopy(int maxLength,
   EnumSet opts) throws IOException {
{code}

I don't think that we can remove the synchronization here, since this function 
is using {{pos}} and some other stuff which is positional.

re: CachingStrategy: I think this should just be protected by {{sharedLock}}, 
right?  It shouldn't need to be volatile if we just grab the current value when 
holding {{sharedLock}}.

I'm not sure "sharedLock" is the best name.  All locks are shared, right?  If 
we weren't sharing between threads, there would be no need for locks.  I can 
see that the intention is that sharedLock = shared between read and pread.  But 
sharedLock is also used by stuff like getFileEncryptionInfo.

How about calling this "infoLock" instead?  It's protecting the block info we 
got from the NameNode, lastBlockBeingWrittenLength, and fileEncryptionInfo (and 
I think cachingstrategy, if my previous comment makes sense?)

> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-19 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218802#comment-14218802
 ] 

Lars Hofhansl commented on HDFS-6735:
-

I ran TestByteArrayManager as well as all tests derived from 
TestParallelReadUtil. All pass locally.
Will checkout the findbugs warning and do an real-life test with HBase (with 
this patch on top of the latest 2.4)

Any recommendation on what else I should test?


> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14203875#comment-14203875
 ] 

Hadoop QA commented on HDFS-6735:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12680470/HDFS-6735-v3.txt
  against trunk revision 9ba8d8c.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 4 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.util.TestByteArrayManager

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8704//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8704//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8704//console

This message is automatically generated.

> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-03 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194904#comment-14194904
 ] 

Colin Patrick McCabe commented on HDFS-6735:


Hi Lars,

Thanks for working on this.  It seems like everyone agrees that 
{{DFSInputStream#read}} doesn't need to block {{DFSInputStream#pread}} or 
{{DFSInputStream#getFileLength}}.  As per my comment here: 
https://issues.apache.org/jira/browse/HDFS-6698?focusedCommentId=14194902&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14194902
  I wonder if we could combine this with HDFS-6698 as "improve concurrency in 
DFSInputStream"

> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6735-v2.txt, HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193634#comment-14193634
 ] 

Hadoop QA commented on HDFS-6735:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12657297/HDFS-6735-v2.txt
  against trunk revision 5c0381c.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8624//console

This message is automatically generated.

> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6735-v2.txt, HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-01 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193629#comment-14193629
 ] 

Lars Hofhansl commented on HDFS-6735:
-

As described in HDFS-6698, the potential performance gains for something like 
HBase are substantial.

I agree it's better to keep LocatedBlocks as not threadsafe and require called 
to lock accordingly.
I've not see fetchAt in a hot path (at least not from HBase usage patterns).
seek + read (non positional) cannot be done concurrently, agreed. pread should 
be possible, though.

How should we continue to move on this? Seems important. :)

Also open to suggestions about how to fix things in HBase (see last comment in 
HDFS-6698, about how HBase handles things and how limited concurrency "within" 
an InputStream is an issue).


> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6735-v2.txt, HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-08-08 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090419#comment-14090419
 ] 

Yi Liu commented on HDFS-6735:
--

Hi [~xieliang007], thanks for the patch. 
Agree with [~cmccabe] that we should figure out what the tread-safty guarantees 
are. 
>From the patch itself, my comments are: 
* instead of making volatile for locatedBlocks variable, we may need to define 
a separate lock for locatedBlocks operations. 
* {{fetchBlockAt}} should be handled.
* {{failures}} and {{readStatistics}} are not pretected if support 
multi-threads.

> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6735-v2.txt, HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-08-01 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14082749#comment-14082749
 ] 

Colin Patrick McCabe commented on HDFS-6735:


Thanks, Steve.  And thanks Stack and Liang, for bearing with me on this one :)  
Looking forward to figuring this out.

> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6735-v2.txt, HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-08-01 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14082114#comment-14082114
 ] 

Steve Loughran commented on HDFS-6735:
--

Linking to HADOOP-9361 as it raises the question "what concurrency guarantees 
are all filesystem input streams meant to offer"

> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6735-v2.txt, HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.2#6252)



[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-07-23 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072228#comment-14072228
 ] 

Colin Patrick McCabe commented on HDFS-6735:


Changing the locking model for the DFSInputStream seems like a big project.  
Can we have a design doc for this and for HDFS-6698?

I'm also not sure if we document the thread-safety guarantees offered by the 
DFSInputStream anywhere.  Most things seem to be protected by locks, but we 
should discuss what the guarantees are and put them as comments in the code 
explicitly.  We should figure out what the thread-safety guarantees are (and 
which operations block which other operations).  For example, a non-positional 
read probably always has to block another non-positional read, but the 
situation with other ops is less clear.

> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6735-v2.txt, HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-07-23 Thread Liang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071655#comment-14071655
 ] 

Liang Xie commented on HDFS-6735:
-

The failed TestPipelinesFailover & TestNamenodeCapacityReport were not related 
with current patch(i saw them in other recent reports)

> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6735-v2.txt, HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-07-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071628#comment-14071628
 ] 

Hadoop QA commented on HDFS-6735:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12657297/HDFS-6735-v2.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
  
org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7439//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7439//console

This message is automatically generated.

> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6735-v2.txt, HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-07-23 Thread Liang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071440#comment-14071440
 ] 

Liang Xie commented on HDFS-6735:
-

bq. I'd think you'd want a comment at least in LocatedBlocks#underConstruction 
warning an upper layer is dependent on it being final in case LocatedBlocks 
changes and starts to allow blocks complete under a stream.
done.

bq.  locatedBlocks.insertRange(targetBlockIdx, newBlocks.getLocatedBlocks()); 
... be inside a synchronization too? Could two threads be updating block 
locations at same time?
yes, it's possible. but we could not put a "synchronization" that, it's 
different from "synchronized (this) { + pos = offset; + blockEnd = 
blk.getStartOffset() + blk.getBlockSize() - 1; + currentLocatedBlock = blk; + 
}",  because in pread scenario, the "updatePosition" is false, so will never go 
into the "synchronized (this) { + pos = offset; + blockEnd = 
blk.getStartOffset() + blk.getBlockSize() - 1; + currentLocatedBlock = blk; + 
}".   And if we put a "synchronization" there, so if pread reach here, it's 
still blocked by other monitor holder, e.g. read()  :) but we can have a 
"synchronized" or rwLock in Locatedblocks class. let me try

> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6735-v2.txt, HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-07-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071435#comment-14071435
 ] 

Hadoop QA commented on HDFS-6735:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12657271/HDFS-6735.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestDatanodeConfig
  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
  org.apache.hadoop.hdfs.TestPread
  org.apache.hadoop.hdfs.TestDataTransferKeepalive

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7431//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7431//console

This message is automatically generated.

> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-07-22 Thread Liang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071409#comment-14071409
 ] 

Liang Xie commented on HDFS-6735:
-

bq.  We'd not check in the test since it does not assert anything? We'd just 
check it in as a utility testing concurrent pread throughput?
In the last of testing code snippet, there're assertion, see:
{code}
  assertTrue(readLatency.readMs > readLatency.readMs);
  //because we issued a pread already, so the second one should not hit
  //disk, even consider running on a slow VM, 1 second should be fine?
  assertTrue(readLatency.preadMs < 1000);
{code}
Per "assertTrue(readLatency.preadMs < 1000);", we could know weather the 
pread() be blocked by read() or not :)

> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-07-22 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071382#comment-14071382
 ] 

stack commented on HDFS-6735:
-

Nice test.  We'd not check in the test since it does not assert anything? We'd 
just check it in as a utility testing concurrent pread throughput?

On the below, it is 'safe' because the data member being queried if 'final'?

-  synchronized boolean shortCircuitForbidden() {
+   boolean shortCircuitForbidden() {
 return locatedBlocks.isUnderConstruction();
   }

I'd think you'd want a comment at least in LocatedBlocks#underConstruction 
warning an upper layer is dependent on it being final in case LocatedBlocks 
changes and starts to allow blocks complete under a stream.

Looking at this change:

-  pos = offset;
-  blockEnd = blk.getStartOffset() + blk.getBlockSize() - 1;
-  currentLocatedBlock = blk;
+  synchronized (this) {
+pos = offset;
+blockEnd = blk.getStartOffset() + blk.getBlockSize() - 1;
+currentLocatedBlock = blk;
+  }

... it makes sense. Should the line from a few lines above:

locatedBlocks.insertRange(targetBlockIdx, newBlocks.getLocatedBlocks());

... be inside a synchronization too?  Could two threads be updating block 
locations at same time?

The below looks safe to me:

-  private synchronized List getFinalizedBlockRange(
+  private List getFinalizedBlockRange(

Else patch looks great.

> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.
> This is important for HBase application, since in current HFile read path, we 
> issue all read()/pread() requests in the same DFSInputStream for one HFile. 
> (Multi streams solution is another story i had a plan to do, but probably 
> will take more time than i expected)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-07-22 Thread Liang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071335#comment-14071335
 ] 

Liang Xie commented on HDFS-6735:
-

This patch includes HDFS-6698 already.
With the source change, the added "testPreadBlockedbyRead" result in my box: 
{code}
grep cost 
hadoop-hdfs-project/hadoop-hdfs/target/surefire-reports/org.apache.hadoop.hdfs.TestPread-output.txt
 
read() cost:5008ms, pread() cost:5ms
{code}

Without the source change(off cause, you still need keep the "readDelay" 
injecting faults in src code), the added "testPreadBlockedbyRead" result in my 
box: 
{code}
read() cost:5009ms, pread() cost:4912ms
{code}

> A minor optimization to avoid pread() be blocked by read() inside the same 
> DFSInputStream
> -
>
> Key: HDFS-6735
> URL: https://issues.apache.org/jira/browse/HDFS-6735
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6735.txt
>
>
> In current DFSInputStream impl, there're a couple of coarser-grained locks in 
> read/pread path, and it has became a HBase read latency pain point so far. In 
> HDFS-6698, i made a minor patch against the first encourtered lock, around 
> getFileLength, in deed, after reading code and testing, it shows still other 
> locks we could improve.
> In this jira, i'll make a patch against other locks, and a simple test case 
> to show the issue and the improved result.



--
This message was sent by Atlassian JIRA
(v6.2#6252)