[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598677#comment-14598677 ]

Lars Hofhansl commented on HDFS-6440:
-------------------------------------

Yeah. Thanks [~atm]!

> Support more than 2 NameNodes
> -----------------------------
>
>                 Key: HDFS-6440
>                 URL: https://issues.apache.org/jira/browse/HDFS-6440
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: auto-failover, ha, namenode
>    Affects Versions: 2.4.0
>            Reporter: Jesse Yates
>            Assignee: Jesse Yates
>             Fix For: 3.0.0
>
>         Attachments: Multiple-Standby-NameNodes_V1.pdf, hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, hdfs-6440-trunk-v4.patch, hdfs-6440-trunk-v5.patch, hdfs-6440-trunk-v6.patch, hdfs-6440-trunk-v7.patch, hdfs-6440-trunk-v8.patch, hdfs-multiple-snn-trunk-v0.patch
>
> Most of the work is already done to support more than 2 NameNodes (one active, one standby). This would be the last bit to support running multiple _standby_ NameNodes; one of the standbys should be available for fail-over.
> Mostly, this is a matter of updating how we parse configurations, some complexity around managing the checkpointing, and updating a whole lot of tests.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
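For readers following along: with this feature, a multi-standby setup is configured the same way as two-node HA, just with additional NameNode IDs listed. A minimal hdfs-site.xml sketch; the nameservice name and host names here are illustrative, not from the patch:

```xml
<!-- Illustrative: nameservice "mycluster" with one active and two standbys -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2,nn3</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>host1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>host2.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn3</name>
  <value>host3.example.com:8020</value>
</property>
```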
[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14525990#comment-14525990 ]

Lars Hofhansl commented on HDFS-6440:
-------------------------------------

[~eli], this is the issue I mentioned on Wednesday. I find it hard to believe that we're the only ones who want this; it's running in production at Salesforce. What's holding this up? How can we help get this in? Break it into smaller pieces? Something else?
[jira] [Commented] (HDFS-7240) Object store in HDFS
[ https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486738#comment-14486738 ]

Lars Hofhansl commented on HDFS-7240:
-------------------------------------

Awesome stuff. We (Salesforce) have a need for this. I think these will lead to immediate management problems:
* Object Size: 5G
* Number of buckets system-wide: 10 million
* Number of objects per bucket: 1 million
* Number of buckets per storage volume: 1000

We have a large number of tenants (many times more than 1000). Some of the tenants will be very large (storing many times more than 1m objects). Of course there are simple workarounds for that, such as including a tenant id in the volume name and a bucket name in our internal blob ids. Are these technical limits? I don't think that we're the only ones who will want to store a large number of objects (more than 1m), and the bucket management would get in the way rather than help.

> Object store in HDFS
> --------------------
>
>                 Key: HDFS-7240
>                 URL: https://issues.apache.org/jira/browse/HDFS-7240
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Jitendra Nath Pandey
>            Assignee: Jitendra Nath Pandey
>         Attachments: Ozone-architecture-v1.pdf
>
> This jira proposes to add object store capabilities into HDFS.
> As part of the federation work (HDFS-1052) we separated block storage as a generic storage layer. Using the Block Pool abstraction, new kinds of namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode storage, but independent of namespace metadata.
> I will soon update with a detailed design document.
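The workaround mentioned above (folding a tenant id into the volume name, and sharding blob ids across buckets) can be sketched roughly as follows. The naming scheme and shard count are purely illustrative, not part of the Ozone proposal:

```java
// Illustrative sketch: spreading many tenants/objects across the proposed
// per-volume and per-bucket limits by encoding tenant and shard into names.
public class ObjectKeyScheme {
    // Hypothetical shard count, chosen to keep each bucket well under
    // the proposed 1M-objects-per-bucket cap for our largest tenants.
    private static final int BUCKET_SHARDS = 64;

    /** Volume name carries the tenant id, so volumes stay per-tenant. */
    static String volumeFor(String tenantId) {
        return "vol-" + tenantId;
    }

    /** Bucket is a stable hash shard of the blob id. */
    static String bucketFor(String blobId) {
        int shard = Math.floorMod(blobId.hashCode(), BUCKET_SHARDS);
        return "bucket-" + shard;
    }

    public static void main(String[] args) {
        // The same blob id always maps to the same volume/bucket pair.
        System.out.println(volumeFor("t42") + "/" + bucketFor("blob-123"));
    }
}
```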
[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481957#comment-14481957 ]

Lars Hofhansl commented on HDFS-6440:
-------------------------------------

Let me also restate that we are running this in production on hundreds of clusters at Salesforce; we haven't seen any issues. It _is_ a pretty intricate patch, so I understand the hesitation.
[jira] [Updated] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HDFS-6735:
--------------------------------
    Attachment: HDFS-6735-v8.txt

One more update. I noticed that the lock in ShortCircuitCache is taking more time than warranted. We have all these Precondition checks that prebuild a message string which is only used in the exceptional case. It is much better to use static strings with parameters, so that the message template is constant and the final string is only built when the check fails. That noticeably decreases the time spent in the ShortCircuitCache lock. Could do that in a separate jira, but it seemed easy enough. Please let me know what you think. Thanks.

> A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
> -----------------------------------------------------------------------------------------
>
>                 Key: HDFS-6735
>                 URL: https://issues.apache.org/jira/browse/HDFS-6735
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 3.0.0
>            Reporter: Liang Xie
>            Assignee: Lars Hofhansl
>         Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, HDFS-6735-v5.txt, HDFS-6735-v6.txt, HDFS-6735-v7.txt, HDFS-6735-v8.txt, HDFS-6735.txt
>
> In the current DFSInputStream impl, there are a couple of coarse-grained locks in the read/pread path, and it has become an HBase read latency pain point. In HDFS-6698 I made a minor patch against the first encountered lock, around getFileLength; indeed, after reading code and testing, it shows there are still other locks we could improve.
> In this jira I'll make a patch against the other locks, and a simple test case to show the issue and the improved result.
> This is important for HBase, since in the current HFile read path we issue all read()/pread() requests on the same DFSInputStream for one HFile. (A multi-stream solution is another story I had planned to do, but it will probably take more time than I expected.)
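The optimization described above, building the failure message only when the check actually fails, can be sketched in plain Java as follows. This is a minimal illustration of the idea, not the actual ShortCircuitCache code; Guava-style Preconditions with `%s` templates achieve the same effect:

```java
// Sketch: constant message template + args, formatted only on failure,
// versus eagerly concatenating a fresh String on every (usually passing) call.
public class LazyCheck {
    // Eager style: the caller builds the message String on every invocation.
    static void checkEager(boolean ok, String builtMessage) {
        if (!ok) throw new IllegalStateException(builtMessage);
    }

    // Lazy style: the template is a compile-time constant; String.format
    // only runs in the exceptional case, so the hot path allocates nothing.
    static void checkLazy(boolean ok, String template, Object... args) {
        if (!ok) throw new IllegalStateException(String.format(template, args));
    }

    public static void main(String[] args) {
        int slot = 3;
        checkLazy(slot >= 0, "slot %d out of range", slot); // passes: no message built
        try {
            checkLazy(slot < 2, "slot %d out of range", slot);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // formatted only here
        }
    }
}
```

(The varargs array is still allocated per call; in the hottest paths fixed-arity overloads avoid even that, which is what Guava's Preconditions do.)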
[jira] [Updated] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HDFS-6735:
--------------------------------
    Attachment: HDFS-6735-v7.txt

So here's the final one (with the findbugs tweak back in).
[jira] [Updated] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HDFS-6735:
--------------------------------
    Attachment: (was: HDFS-6735-v6.txt)
[jira] [Updated] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HDFS-6735:
--------------------------------
    Attachment: (was: HDFS-6735-v6.txt)
[jira] [Updated] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HDFS-6735:
--------------------------------
    Attachment: HDFS-6735-v6.txt

Trying to get another build. The artifacts of the previous one are gone for some reason.
[jira] [Assigned] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl reassigned HDFS-6735:
-----------------------------------
    Assignee: Lars Hofhansl  (was: Liang Xie)
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14227155#comment-14227155 ]

Lars Hofhansl commented on HDFS-6735:
-------------------------------------

So, to be specific: the improvement I see above is still there. It's just that the next thing to tackle is the ShortCircuitCache.
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14227148#comment-14227148 ]

Lars Hofhansl commented on HDFS-6735:
-------------------------------------

Tested -v6 with HBase. Still good from the DFSInputStream angle. I do see now that much more time is spent in ShortCircuitCache.fetchOrCreate and unref (rechecked that this is true for -v3 as well). It's still better, but the can is kicked down the road a bit.
[jira] [Updated] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HDFS-6735:
--------------------------------
    Attachment: HDFS-6735-v6.txt

Updated patch. The findbugs tweak is still necessary. Locking was correct before; findbugs does not seem to realize that all references to cachingStrategy are guarded by the infoLock. I'll run a 2.4.1 version of this patch against HBase again.
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14227031#comment-14227031 ]

Lars Hofhansl commented on HDFS-6735:
-------------------------------------

Per my comment above, my preference would still be to just make the cachingStrategy reference volatile in DFSInputStream. It is immutable, and hence the volatile reference makes access safe in all cases without any locking. The same is true for fileEncryptionInfo, btw (immutable already, just needs a volatile reference, no locking needed at all). I'll make a new patch.
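The pattern argued for here, an immutable value published through a volatile field so that readers need no lock, can be sketched like this. The class and field names are illustrative stand-ins, not the actual DFSClient types:

```java
// Sketch: safe publication of immutable state via a volatile reference.
public class StreamInfo {
    // Immutable value object, analogous to CachingStrategy/FileEncryptionInfo:
    // all fields final, fully initialized in the constructor, never mutated.
    static final class Strategy {
        final boolean dropBehind;
        final long readahead;
        Strategy(boolean dropBehind, long readahead) {
            this.dropBehind = dropBehind;
            this.readahead = readahead;
        }
    }

    // volatile guarantees any reader sees a fully constructed Strategy;
    // combined with immutability, no synchronized block is needed anywhere.
    private volatile Strategy strategy = new Strategy(false, 0L);

    Strategy getStrategy() { return strategy; }       // lock-free read
    void setStrategy(Strategy s) { strategy = s; }    // single volatile write

    public static void main(String[] args) {
        StreamInfo info = new StreamInfo();
        info.setStrategy(new Strategy(true, 4096L));
        System.out.println(info.getStrategy().readahead); // 4096
    }
}
```

To "modify" the strategy a writer constructs a new Strategy and swaps the reference; readers observe either the old or the new object, never a half-updated one.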
[jira] [Updated] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HDFS-6735:
--------------------------------
    Attachment: HDFS-6735-v6.txt

Thanks [~ste...@apache.org]. New patch with findbugs tweak.
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223901#comment-14223901 ]

Lars Hofhansl commented on HDFS-6735:
-------------------------------------

The remaining findbugs warning is due to cachingStrategy. I am 100% sure that the locking is correct; every single reference to cachingStrategy is guarded by the infoLock. This should be good to go (happy to squash the bogus findbugs warning if somebody has a suggestion how). The findbugs website states this for IS2_INCONSISTENT_SYNC:
{quote}
Note that there are various sources of inaccuracy in this detector; for example, the detector cannot statically detect all situations in which a lock is held. Also, even when the detector is accurate in distinguishing locked vs. unlocked accesses, the code in question may still be correct.
{quote}
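One common way to squash a known-false IS2_INCONSISTENT_SYNC warning is a findbugs exclude filter scoped to the exact class, field, and bug pattern. A sketch of such an entry; the filter file location is whatever the build already wires in, and this entry is a suggestion rather than what was committed:

```xml
<!-- findbugs exclude filter: suppress the inconsistent-sync warning for
     DFSInputStream.cachingStrategy, which findbugs cannot see is always
     guarded by infoLock. -->
<FindBugsFilter>
  <Match>
    <Class name="org.apache.hadoop.hdfs.DFSInputStream"/>
    <Field name="cachingStrategy"/>
    <Bug pattern="IS2_INCONSISTENT_SYNC"/>
  </Match>
</FindBugsFilter>
```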
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14222673#comment-14222673 ]

Lars Hofhansl commented on HDFS-6735:
-------------------------------------

s/since we never get into that if block if we coming from a called synchronized/since we *only* get into that if block if we coming from a caller synchronized/
[jira] [Updated] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HDFS-6735:
--------------------------------
    Attachment: HDFS-6735-v5.txt

Looked through the findbugs warnings for DFSInputStream:
* indeed currentNode was wrongly synchronized (it was so even before the patch). In getCurrentDataNode I had added synchronized(infoLock), but it should just be synchronized, as currentNode is seek+read state.
* added a synchronized block in getBlockAt around the access to pos, blockEnd, currentLocatedBlock. As explained in the comment, that is not needed, since we never get into that if block if we coming from a called synchronized. But if that is so, the extra synchronized won't hurt, and it should make findbugs happy.
[jira] [Updated] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HDFS-6735:
--------------------------------
    Attachment: HDFS-6735-v4.txt
[jira] [Updated] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HDFS-6735: Attachment: (was: HDFS-6735-v4.txt)
[jira] [Updated] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HDFS-6735: Attachment: HDFS-6735-v4.txt New patch: * added synchronized back to tryReadZeroCopy * renamed sharedLock to infoLock * this time did all the correct indentation - harder to review, but this should be committable as is * surrounded every reference to cachingStrategy with synchronized(infoLock) {...}, removed volatile. Looking at this again, we can be better about safe publication with immutable state and avoid some of the locks. For example, FileEncryptionInfo and CachingStrategy are already immutable and can be handled 100% safely by just a volatile reference; most of the LocatedBlocks state is also immutable, and for those parts we can avoid the locks as well. Immutable state is easier to reason about and more efficient (volatile still places read and write memory fences, but that is cheaper than synchronized). Can do that later :)
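The "volatile reference to an immutable object" pattern described above can be sketched as follows. This is a minimal illustration of the publication technique, not HDFS code; ReadStrategy is a hypothetical stand-in for the real CachingStrategy class:

```java
// Safe publication via a volatile reference to an immutable object.
// ReadStrategy is a hypothetical stand-in for HDFS's CachingStrategy.
final class ReadStrategy {
    final boolean dropBehind;  // final fields make the object safely publishable
    final long readahead;
    ReadStrategy(boolean dropBehind, long readahead) {
        this.dropBehind = dropBehind;
        this.readahead = readahead;
    }
}

class StreamState {
    // Writers swap in a whole new immutable object; readers see either the old
    // or the new one, never a half-updated mix. No synchronized needed.
    private volatile ReadStrategy strategy = new ReadStrategy(false, 4096);

    void setReadahead(long bytes) {
        ReadStrategy old = strategy;
        strategy = new ReadStrategy(old.dropBehind, bytes); // copy-on-write swap
    }

    long readahead() {
        return strategy.readahead; // a single volatile read, no lock
    }
}
```

The volatile read/write still implies memory fences, as the comment notes, but it is cheaper than taking a monitor on every access.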
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221839#comment-14221839 ] Lars Hofhansl commented on HDFS-6735: - Thanks [~cmccabe]. I'll put the synchronized back, do the correct indentation, and name the new lock differently. I'll also look through the other synchronized modifiers that I had removed from private methods, where it makes sense. On the indentation... I completely agree. It's hard to review - sometimes I apply HBase patches locally just so that I can do a git diff -b to review them without the whitespace, which is a pain. And if not done in all branches, then cherry-picking a patch becomes annoying, etc., etc. Thanks again for looking! New patch upcoming.
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219075#comment-14219075 ] Lars Hofhansl commented on HDFS-6735: - Apologies for the spam... I have a backport of this to branch-2.4 in case anybody is interested.
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219074#comment-14219074 ] Lars Hofhansl commented on HDFS-6735: - re: tryReadZeroCopy. Removing the synchronization is fine, because it is only called from (stateful) read(...), and pos is only used in the stateful read path and hence needs to be guarded by the lock on this only.
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219072#comment-14219072 ] Lars Hofhansl commented on HDFS-6735: - Thanks [~cmccabe]. "infoLock" is better. I'll fix the indentation later. Let me have a look at tryReadZeroCopy again. I had mapped out all members and which methods use what, and concluded the synchronized wasn't needed; quite possibly I made a mistake. Another locking option is not to synchronize on this at all, but to have two locks ("streamLock" and "pLock", or whatever good names would be). That way the intent might be more explicit. Yet another option would be to disentangle the two APIs by subclassing or delegation (since the issue really is that we have state for two different modes of operation in the same class); that'd be a bigger change, though. Meanwhile in HBase land: Tested this with HBase and observed with a sampler that all delays internal to DFSInputStream are gone, which is nice. I committed a change to HBase to allow us to (1) have compactions use their own input streams so they do not interfere with user scans over the same files and (2) optionally force preads for all user scans. See HBASE-12411. Especially with #2 I see nice speedups for many concurrent scanners, essentially up to what my disks can sustain, but a 50% slowdown for a single scanner per file - which is expected, as we're not benefiting from prefetching now.
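The two-lock option floated above - a dedicated lock serializing stateful seek+read, plus a second lock for the metadata both paths share - can be sketched roughly like this. The class and field names are illustrative, not the actual DFSInputStream members:

```java
// Illustrative two-lock split: stateful read() serializes on the stream lock
// (the instance monitor), positional pread() only touches shared metadata
// guarded by infoLock, so a long-running read() no longer blocks it.
class SketchInputStream {
    private final Object infoLock = new Object(); // guards metadata shared by both paths
    private long pos;        // seek+read state, guarded by the stream lock (this)
    private long fileLength; // shared metadata, guarded by infoLock

    SketchInputStream(long fileLength) { this.fileLength = fileLength; }

    // Stateful read: holds the stream lock; may briefly take infoLock inside.
    // Lock order is always this -> infoLock, never the reverse, so no deadlock.
    synchronized int read(byte[] buf) {
        long len;
        synchronized (infoLock) { len = fileLength; }
        int n = (int) Math.min(buf.length, len - pos);
        pos += Math.max(n, 0);
        return n;
    }

    // Positional read: never takes the stream lock; only infoLock is required.
    int pread(long position, byte[] buf) {
        long len;
        synchronized (infoLock) { len = fileLength; }
        if (position >= len) return -1;
        return (int) Math.min(buf.length, len - position);
    }
}
```

The fixed lock-acquisition order (stream lock first, then infoLock) is what keeps the two-lock scheme deadlock-free.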
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218802#comment-14218802 ] Lars Hofhansl commented on HDFS-6735: - I ran TestByteArrayManager as well as all tests derived from TestParallelReadUtil. All pass locally. Will check out the findbugs warning and do a real-life test with HBase (with this patch on top of the latest 2.4). Any recommendation on what else I should test?
[jira] [Updated] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HDFS-6735: Attachment: HDFS-6735-v3.txt I classified the state in DFSInputStream into state used by read only and state used by both read and pread. With that, here's a new proposed patch. * makes LocatedBlocks immutable (which was intended, it seems) * pread no longer affects currentNode (that was unintended, I think) * guards state shared between read and pread with an extra sharedLock (the state used by read only is still guarded by a lock on this, which we need to take anyway to avoid concurrent stateful reads against the same input stream) * removed synchronized from all private methods that are only called from methods already synchronized (good practice anyway) * makes cachingStrategy volatile (made more sense than locking there) * should be free of deadlocks (we never acquire the lock on this with sharedLock held, but the reverse is possible) * pos, blockEnd, currentLocatedBlock are not updated in getBlockAt unless called on behalf of read (not for pread, hence locking on this is not needed there) I have not tested this yet. Please have a careful look and let me know what you think. We might want to further disentangle the mixed state. (And just maybe the best solution would be for HBase to have one input stream for each thread doing read and one for all threads doing preads - and not do any of this...?)
[jira] [Commented] (HDFS-6698) try to optimize DFSInputStream.getFileLength()
[ https://issues.apache.org/jira/browse/HDFS-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195409#comment-14195409 ] Lars Hofhansl commented on HDFS-6698: - Yeah, let's combine these. We can close this one and do the work in HDFS-6735. I'm with you on volatile: it only guarantees visibility (via memory barriers) but doesn't control concurrent access. Things should be final (immutable) or locked correctly - volatile alone is rarely enough. Using a separate lock for touching DFSInputStream#locatedBlocks seems like the right approach to me. > try to optimize DFSInputStream.getFileLength() > -- > > Key: HDFS-6698 > URL: https://issues.apache.org/jira/browse/HDFS-6698 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: 3.0.0 >Reporter: Liang Xie >Assignee: Liang Xie > Attachments: HDFS-6698.txt, HDFS-6698.txt, HDFS-6698v2.txt, > HDFS-6698v2.txt, HDFS-6698v3.txt > > > HBase prefers to invoke read() when serving scan requests, and pread() when serving get requests, because pread() holds almost no locks. > Let's imagine there's a read() running. Because the definition is: > {code} > public synchronized int read > {code} > no other read() request can run concurrently; this is known. But a pread() cannot run either, because: > {code} > public int read(long position, byte[] buffer, int offset, int length) > throws IOException { > // sanity checks > dfsClient.checkOpen(); > if (closed) { > throw new IOException("Stream closed"); > } > failures = 0; > long filelen = getFileLength(); > {code} > getFileLength() also needs the lock. So we need to figure out a lock-free impl of getFileLength() before the HBase multi-stream feature is done. > [~saint@gmail.com] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193629#comment-14193629 ] Lars Hofhansl commented on HDFS-6735: - As described in HDFS-6698, the potential performance gains for something like HBase are substantial. I agree it's better to keep LocatedBlocks not threadsafe and require callers to lock accordingly. I've not seen fetchAt in a hot path (at least not from HBase usage patterns). seek + read (non-positional) cannot be done concurrently, agreed. pread should be possible, though. How should we move forward on this? Seems important. :) Also open to suggestions about how to fix things in HBase (see the last comment in HDFS-6698 about how HBase handles things, and how limited concurrency "within" an InputStream is an issue).
[jira] [Commented] (HDFS-6698) try to optimize DFSInputStream.getFileLength()
[ https://issues.apache.org/jira/browse/HDFS-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192993#comment-14192993 ] Lars Hofhansl commented on HDFS-6698: - Now... I am not saying that we do not have work to do in HBase: * we're using one reader per HFile * after a major compaction we have a single store file per column family (that file can be up to 20GB in size) * we allow one thread to use seek+read on that reader; other concurrent scanners fall back to pread (see HBASE-7336). For my test I did this: * my test table had 2^25 (~32m) rows, in two regions, about 1GB on disk * I tested this with Phoenix, which can break a query into parts and execute scans for the parts (that's where the parallel scanning on the same readers comes into play) * I have short-circuit reading enabled * all data is in the OS cache (HBase block cache not used) This is not an uncommon scenario, though. The original poster cited scans (seek+read) + gets (pread) as a problem. In either case, I'll post an updated patch to HDFS-6735 and we can take it from there.
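The lock-free getFileLength() the issue description asks for falls out of the same immutable-snapshot idea: keep a volatile reference to an immutable block map and compute the length from whichever snapshot the reader happens to see. A hypothetical sketch, with BlockMap standing in for the real LocatedBlocks:

```java
// Hypothetical sketch: lock-free file-length reads via an immutable snapshot.
// BlockMap stands in for LocatedBlocks; the field split is illustrative.
final class BlockMap {
    final long lengthOfCompleteBlocks;
    final long lastBlockLength; // may grow while the file is under construction
    BlockMap(long complete, long last) {
        this.lengthOfCompleteBlocks = complete;
        this.lastBlockLength = last;
    }
    long fileLength() { return lengthOfCompleteBlocks + lastBlockLength; }
}

class LengthTracker {
    private volatile BlockMap blocks; // swapped wholesale on refresh
    LengthTracker(BlockMap initial) { blocks = initial; }

    // pread() can call this concurrently with read(): one volatile load, no lock.
    long getFileLength() { return blocks.fileLength(); }

    // Called e.g. after fetching fresh block locations from the NameNode.
    void refresh(BlockMap updated) { blocks = updated; }
}
```

A concurrent pread() either sees the length before a refresh or after it, but never a torn intermediate value, because each snapshot is immutable.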
[jira] [Commented] (HDFS-6698) try to optimize DFSInputStream.getFileLength()
[ https://issues.apache.org/jira/browse/HDFS-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192684#comment-14192684 ] Lars Hofhansl commented on HDFS-6698: - Pulling in selected changes from HDFS-6735 yields a HUGE speed improvement. A scan that took 16s to execute now finishes in 9s. (The setup is such that all data fits into the OS cache and the HBase cache is disabled, to isolate this code path.)
[jira] [Commented] (HDFS-6698) try to optimize DFSInputStream.getFileLength()
[ https://issues.apache.org/jira/browse/HDFS-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192550#comment-14192550 ] Lars Hofhansl commented on HDFS-6698: - Indeed I now find that the time is spent in {{getBlockRange()}} :) I'll look at HDFS-6735 and include fixes from there.
[jira] [Commented] (HDFS-6698) try to optimize DFSInputStream.getFileLength()
[ https://issues.apache.org/jira/browse/HDFS-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192209#comment-14192209 ] Lars Hofhansl commented on HDFS-6698: - We need to make sure we do not just kick the can down the road; there are more synchronized methods called from within read. I'll do some testing and report back.
[jira] [Updated] (HDFS-6698) try to optimize DFSInputStream.getFileLength()
[ https://issues.apache.org/jira/browse/HDFS-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HDFS-6698: Attachment: HDFS-6698v3.txt I just ran into this as well while debugging why HBase does not benefit from Snappy compression as much as it should. It turns out a non-trivial amount of time (as determined by a sampler, not an instrumenting profiler) is spent in this method. To be safe I'd probably also turn LocatedBlocks into an immutable object (well, except for blocks) - see the attached patch. All members of LocatedBlocks are safely published now. With that, I don't think this patch can do any harm.
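The "immutable except for blocks" shape described above - final scalar fields plus a defensively copied, unmodifiable list - might look like this in outline. The names are illustrative (a plain String list stands in for List&lt;LocatedBlock&gt;), not the actual LocatedBlocks fields:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Outline of an immutable located-blocks holder: all fields final, the list
// defensively copied and wrapped so callers cannot mutate shared state.
final class BlockListing {
    private final long fileLength;
    private final boolean underConstruction;
    private final List<String> blockIds; // stand-in for List<LocatedBlock>

    BlockListing(long fileLength, boolean underConstruction, List<String> blockIds) {
        this.fileLength = fileLength;
        this.underConstruction = underConstruction;
        // Copy, then wrap: later changes to the caller's list are not seen here,
        // and add()/remove() on the returned view throw.
        this.blockIds = Collections.unmodifiableList(new ArrayList<>(blockIds));
    }

    long getFileLength() { return fileLength; }
    boolean isUnderConstruction() { return underConstruction; }
    List<String> getBlockIds() { return blockIds; }
}
```

Because every field is final and set in the constructor, an instance can be handed across threads through a volatile reference without any further locking.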
[jira] [Commented] (HDFS-5042) Completed files lost after power failure
[ https://issues.apache.org/jira/browse/HDFS-5042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158677#comment-14158677 ] Lars Hofhansl commented on HDFS-5042: - We should study the perf impact. Previously I found that sync-on-close severely impacted file creation time - unless sync-behind-writes is also enabled. (Interestingly sync-behind-writes should not cause any performance detriment as we're dealing with immutable files, and hence delaying writing these dirty blocks to disk in the hopes that they'd be updated before we do so is pointless anyway). > Completed files lost after power failure > > > Key: HDFS-5042 > URL: https://issues.apache.org/jira/browse/HDFS-5042 > Project: Hadoop HDFS > Issue Type: Bug > Environment: ext3 on CentOS 5.7 (kernel 2.6.18-274.el5) >Reporter: Dave Latham >Priority: Critical > > We suffered a cluster wide power failure after which HDFS lost data that it > had acknowledged as closed and complete. > The client was HBase which compacted a set of HFiles into a new HFile, then > after closing the file successfully, deleted the previous versions of the > file. The cluster then lost power, and when brought back up the newly > created file was marked CORRUPT. > Based on reading the logs it looks like the replicas were created by the > DataNodes in the 'blocksBeingWritten' directory. Then when the file was > closed they were moved to the 'current' directory. After the power cycle > those replicas were again in the blocksBeingWritten directory of the > underlying file system (ext3). When those DataNodes reported in to the > NameNode it deleted those replicas and lost the file. > Some possible fixes could be having the DataNode fsync the directory(s) after > moving the block from blocksBeingWritten to current to ensure the rename is > durable or having the NameNode accept replicas from blocksBeingWritten under > certain circumstances. 
> Log snippets from RS (RegionServer), NN (NameNode), DN (DataNode): > {noformat} > RS 2013-06-29 11:16:06,812 DEBUG org.apache.hadoop.hbase.util.FSUtils: > Creating > file=hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c > with permission=rwxrwxrwx > NN 2013-06-29 11:16:06,830 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > NameSystem.allocateBlock: > /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c. > blk_1395839728632046111_357084589 > DN 2013-06-29 11:16:06,832 INFO > org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block > blk_1395839728632046111_357084589 src: /10.0.5.237:14327 dest: > /10.0.5.237:50010 > NN 2013-06-29 11:16:11,370 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > NameSystem.addStoredBlock: blockMap updated: 10.0.6.1:50010 is added to > blk_1395839728632046111_357084589 size 25418340 > NN 2013-06-29 11:16:11,370 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > NameSystem.addStoredBlock: blockMap updated: 10.0.6.24:50010 is added to > blk_1395839728632046111_357084589 size 25418340 > NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > NameSystem.addStoredBlock: blockMap updated: 10.0.5.237:50010 is added to > blk_1395839728632046111_357084589 size 25418340 > DN 2013-06-29 11:16:11,385 INFO > org.apache.hadoop.hdfs.server.datanode.DataNode: Received block > blk_1395839728632046111_357084589 of size 25418340 from /10.0.5.237:14327 > DN 2013-06-29 11:16:11,385 INFO > org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 2 for block > blk_1395839728632046111_357084589 terminating > NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: Removing > lease on file > /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c > from client DFSClient_hb_rs_hs745,60020,1372470111932 > NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: DIR* > 
NameSystem.completeFile: file > /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c > is closed by DFSClient_hb_rs_hs745,60020,1372470111932 > RS 2013-06-29 11:16:11,393 INFO org.apache.hadoop.hbase.regionserver.Store: > Renaming compacted file at > hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c > to > hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/n/6e0cc30af6e64e56ba5a539fdf159c4c > RS 2013-06-29 11:16:11,505 INFO org.apache.hadoop.hbase.regionserver.Store: > Completed major compaction of 7 file(s) in n of > users-6,\x12\xBDp\xA3,1359426311784.b5b0820cde759ae68e333b2f4015bb7e. into > 6e0cc30af6e64e56ba5a539fdf159c4c, size=24.2m; total size for store is 24.2m > --- CRASH, RESTART - > NN 2013-06-29 12:01:19,743 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > NameSystem.addStoredBlock: addStoredBlock request received for > blk_1395839728632046111_357084589 on 10.0.6.1:50010 size 21978112 but was > rejected: Reported as block being written but is a block of closed file. > NN 2013-06-29 12:01:19,743 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > NameSystem.addToInvalidates: blk_1395839728632046111 i
[jira] [Commented] (HDFS-5042) Completed files lost after power failure
[ https://issues.apache.org/jira/browse/HDFS-5042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158434#comment-14158434 ] Lars Hofhansl commented on HDFS-5042: - Cool. That should work.
[jira] [Commented] (HDFS-5042) Completed files lost after power failure
[ https://issues.apache.org/jira/browse/HDFS-5042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158235#comment-14158235 ] Lars Hofhansl commented on HDFS-5042: - Thanks Luke. I meant to say: (1) finish writing the block. (2) Move it. (3) fsync or fdatasync the block file in the new location. (We'd just change the order of moving vs. fsync.) The rename would still be atomic (the block file is written completely before we move it), but doing the fsync after should order the metadata commits correctly assuming write barriers. Then again, the write and the move would be two different transactions as far as the fs is concerned. Agree it's cleanest if we in fact sync both actions.
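The ordering discussed in the comment above (finish the block file, make it durable, rename it into place, then make the rename itself durable) is the classic POSIX durable-rename pattern. The sketch below is an editor's illustration of that pattern, not HDFS's actual DataNode code; the function name and layout are hypothetical:

```python
import os

def durable_move(src, dst_dir):
    """Illustrative crash-durable move (POSIX/Linux semantics):
    1. fsync the finished file so its data and metadata are on disk,
    2. atomically rename it into the destination directory,
    3. fsync the destination directory so the new directory entry
       (the rename) survives a power failure."""
    fd = os.open(src, os.O_RDONLY)
    try:
        os.fsync(fd)                      # step 1: file contents durable
    finally:
        os.close(fd)
    dst = os.path.join(dst_dir, os.path.basename(src))
    os.rename(src, dst)                   # step 2: atomic within one fs
    dfd = os.open(dst_dir, os.O_RDONLY)
    try:
        os.fsync(dfd)                     # step 3: directory entry durable
    finally:
        os.close(dfd)
    return dst
```

Note that without step 3 the rename lives only in the directory's in-memory metadata, which is exactly how the blocks reappeared in blocksBeingWritten after the power cycle.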
[jira] [Commented] (HDFS-5042) Completed files lost after power failure
[ https://issues.apache.org/jira/browse/HDFS-5042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157328#comment-14157328 ] Lars Hofhansl commented on HDFS-5042: - Is this a problem when enabling write barriers on the DNs? EXT3 has them off by default. In that case we might need to move the file in place first and then fsync the file, that should force the meta updates in order... I'm sure that'd cause other problems.
[jira] [Commented] (HDFS-4455) Datanode sometimes gives up permanently on Namenode in HA setup
[ https://issues.apache.org/jira/browse/HDFS-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001009#comment-14001009 ] Lars Hofhansl commented on HDFS-4455: - Looked at HDFS-2882. I agree that should fix this issue. > Datanode sometimes gives up permanently on Namenode in HA setup > --- > > Key: HDFS-4455 > URL: https://issues.apache.org/jira/browse/HDFS-4455 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, ha >Affects Versions: 2.0.2-alpha >Reporter: Lars Hofhansl >Assignee: Juan Yu >Priority: Critical > > Today we got ourselves into a situation where we hard killed the cluster > (kill -9 across the board on all processes) and upon restarting all DNs would > permanently give up on one of the NNs in our two NN HA setup (using QJM). > The HA setup is correct (prior to this we failed over the NNs many times for > testing). Bouncing the DNs resolved the problem. > In the logs I see this exception: > {code} > 2013-01-29 23:32:49,461 FATAL datanode.DataNode - Initialization failed for > block pool Block pool BP-1852726028--1358813649047 (storage id > DS-60505003--50010-1353106051747) service to /:8020 > java.io.IOException: Failed on local exception: java.io.IOException: Response > is null.; Host Details : local host is: "/"; destination host is: > "":8020; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:759) > at org.apache.hadoop.ipc.Client.call(Client.java:1164) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) > at $Proxy10.registerDatanode(Unknown Source) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164) > at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83) > at $Proxy10.registerDatanode(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:149) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:619) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:221) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:661) > at java.lang.Thread.run(Thread.java:662) > Caused by: java.io.IOException: Response is null. > at > org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:885) > at org.apache.hadoop.ipc.Client$Connection.run(Client.java:813) > 2013-01-29 23:32:49,463 WARN datanode.DataNode - Ending block pool service > for: Block pool BP-1852726028--1358813649047 (storage id > DS-60505003--50010-1353106051747) service to /:8020 > {code} > So somehow in BPServiceActor.connectToNNAndHandshake() we made it all the way > to register(). Then failed in bpNamenode.registerDatanode(bpRegistration) > with an IOException, which is not caught and causes the block pool service to fail > as a whole. > No doubt that was caused by one of the NNs being in a weird state. While that > happened the active NN claimed that the FS was corrupted and stayed in safe > mode, and DNs only registered with the standby NN. Failing over to the 2nd NN > and then restarting the first NN and failing over did not change that. > No amount of bouncing/failing over the HA NNs would have the DNs reconnect to > one of the NNs. > In BPServiceActor.register(), should we catch IOException instead of > SocketTimeoutException? That way it would continue to retry and eventually > connect to the NN. -- This message was sent by Atlassian JIRA (v6.2#6252)
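The fix suggested at the end of the report (catch the broad IOException rather than only SocketTimeoutException, so registration keeps retrying) can be sketched as a retry loop. This is an editor's illustration using Python's OSError as the analogue of Java's IOException; `register` is a hypothetical callable standing in for BPServiceActor.register():

```python
import time

def register_with_retries(register, retries=5, delay=0.0):
    """Retry registration on any I/O failure instead of only on timeouts,
    so a transient NN problem does not permanently end the service."""
    for attempt in range(retries):
        try:
            return register()
        except OSError:                  # broad catch: any I/O failure retries
            if attempt == retries - 1:
                raise                    # give up only after all attempts
            time.sleep(delay)
```

With the narrow catch, the "Response is null" IOException above escapes the loop on the first attempt and the block pool service ends for good.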
[jira] [Resolved] (HDFS-4455) Datanode sometimes gives up permanently on Namenode in HA setup
[ https://issues.apache.org/jira/browse/HDFS-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HDFS-4455. - Resolution: Implemented -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-5571) postScannerFilterRow consumes a lot of CPU in tall table scans
Lars Hofhansl created HDFS-5571: --- Summary: postScannerFilterRow consumes a lot of CPU in tall table scans Key: HDFS-5571 URL: https://issues.apache.org/jira/browse/HDFS-5571 Project: Hadoop HDFS Issue Type: Bug Reporter: Lars Hofhansl Continuing my profiling quest, I find that when scanning a tall table (and filtering everything on the server) a quarter of the time is now spent in the postScannerFilterRow coprocessor hook. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5461) fallback to non-ssr(local short circuit reads) while oom detected
[ https://issues.apache.org/jira/browse/HDFS-5461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13821560#comment-13821560 ] Lars Hofhansl commented on HDFS-5461: - The issue is that the JDK only collects direct byte buffers during a full GC, and there are different limits for the direct buffer and the general heap. HBase keeps a reader open for each store file and thus we end up with a lot of direct memory used. I was actually curious about 1mb as the default size; it seems even as little as 8kb should be OK. > fallback to non-ssr(local short circuit reads) while oom detected > - > > Key: HDFS-5461 > URL: https://issues.apache.org/jira/browse/HDFS-5461 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.0.0, 2.2.0 >Reporter: Liang Xie >Assignee: Liang Xie > Attachments: HDFS-5461.txt > > > Currently, the DirectBufferPool used by the ssr feature doesn't seem to have an > upper-bound limit except the DirectMemory VM option. So there's a risk of > encountering a direct memory OOM; see HBASE-8143 for example. > IMHO, maybe we could improve it a bit: > 1) detect OOM or reaching a configured upper limit from the caller, then fall back to > non-ssr > 2) add a new metric for the current raw consumed direct memory size. -- This message was sent by Atlassian JIRA (v6.1#6144)
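The bounded-pool idea proposed in the issue (cap total buffer memory; when the cap is hit, signal the caller to fall back to the non-short-circuit read path instead of allocating more) can be sketched as below. This is an editor's illustration of the pattern; the class and method names are hypothetical, not HDFS's DirectBufferPool API:

```python
class CappedBufferPool:
    """Buffer pool with an explicit upper bound on total allocated bytes.
    acquire() returns None once the cap is reached, telling the caller
    to fall back to the ordinary read path."""

    def __init__(self, buf_size, max_bytes):
        self.buf_size = buf_size
        self.max_bytes = max_bytes
        self.outstanding = 0       # total bytes ever allocated by this pool
        self.free = []             # buffers returned for reuse

    def acquire(self):
        if self.free:
            return self.free.pop()             # reuse before allocating
        if self.outstanding + self.buf_size > self.max_bytes:
            return None                        # cap reached: fall back
        self.outstanding += self.buf_size
        return bytearray(self.buf_size)

    def release(self, buf):
        self.free.append(buf)
```

A metric for `outstanding` would directly give the "current raw consumed direct memory size" asked for in point 2.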
[jira] [Commented] (HDFS-2834) ByteBuffer-based read API for DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13733186#comment-13733186 ] Lars Hofhansl commented on HDFS-2834: - Just for reference with many open files one can easily OOM on direct buffer memory. See: HBASE-8143. 1MB seems to be a rather large default. > ByteBuffer-based read API for DFSInputStream > > > Key: HDFS-2834 > URL: https://issues.apache.org/jira/browse/HDFS-2834 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client, performance >Reporter: Henry Robinson >Assignee: Henry Robinson > Fix For: 2.0.2-alpha > > Attachments: HDFS-2834.10.patch, HDFS-2834.11.patch, > HDFS-2834.3.patch, HDFS-2834.4.patch, HDFS-2834.5.patch, HDFS-2834.6.patch, > HDFS-2834.7.patch, HDFS-2834.8.patch, HDFS-2834.9.patch, > hdfs-2834-libhdfs-benchmark.png, HDFS-2834-no-common.patch, HDFS-2834.patch, > HDFS-2834.patch > > > The {{DFSInputStream}} read-path always copies bytes into a JVM-allocated > {{byte[]}}. Although for many clients this is desired behaviour, in certain > situations, such as native-reads through libhdfs, this imposes an extra copy > penalty since the {{byte[]}} needs to be copied out again into a natively > readable memory area. > For these cases, it would be preferable to allow the client to supply its own > buffer, wrapped in a {{ByteBuffer}}, to avoid that final copy overhead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
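The API change described above, letting the client supply its own buffer so the read path avoids the extra copy into a freshly allocated byte[], has a direct Python analogue in readinto(). This is an editor's sketch of that analogue, not DFSInputStream's actual signature:

```python
def read_into(stream, buf):
    """Fill a caller-supplied buffer from a file-like object supporting
    readinto(); the data lands directly in `buf` with no intermediate
    bytes object. Returns the number of bytes read (short on EOF)."""
    view = memoryview(buf)          # zero-copy window over the buffer
    total = 0
    while total < len(buf):
        n = stream.readinto(view[total:])
        if not n:                   # EOF
            break
        total += n
    return total
```

For a native caller (the libhdfs case in the description), the supplied buffer could be backed by natively readable memory, which is exactly the copy the JIRA wants to eliminate.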
[jira] [Updated] (HDFS-1783) Ability for HDFS client to write replicas in parallel
[ https://issues.apache.org/jira/browse/HDFS-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HDFS-1783: Assignee: (was: Lars Hofhansl) > Ability for HDFS client to write replicas in parallel > - > > Key: HDFS-1783 > URL: https://issues.apache.org/jira/browse/HDFS-1783 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Reporter: dhruba borthakur > Attachments: HDFS-1783-trunk.patch, HDFS-1783-trunk-v2.patch, > HDFS-1783-trunk-v3.patch, HDFS-1783-trunk-v4.patch, HDFS-1783-trunk-v5.patch > > > The current implementation of HDFS pipelines the writes to the three > replicas. This introduces some latency for realtime latency sensitive > applications. An alternate implementation that allows the client to write all > replicas in parallel gives much better response times to these applications. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
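The latency argument in HDFS-1783 is that a pipelined write pays roughly the sum of the per-replica transfer times, while a parallel write pays roughly the slowest single transfer. A minimal sketch of the parallel alternative, with hypothetical callables standing in for the per-DataNode writes:

```python
from concurrent.futures import ThreadPoolExecutor

def write_parallel(replicas, data):
    """Send `data` to all replicas concurrently instead of pipelining
    through replica 1 -> 2 -> 3. `replicas` is a list of callables, each
    standing in for one DataNode write; returns their results in order."""
    with ThreadPoolExecutor(max_workers=len(replicas)) as ex:
        futures = [ex.submit(r, data) for r in replicas]
        return [f.result() for f in futures]      # slowest replica gates latency
```

The trade-off is client-side bandwidth: the client now uploads the data N times itself rather than once into the pipeline.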
[jira] [Created] (HDFS-4455) Datanode sometimes gives up permanently on Namenode in HA setup
Lars Hofhansl created HDFS-4455: --- Summary: Datanode sometimes gives up permanently on Namenode in HA setup Key: HDFS-4455 URL: https://issues.apache.org/jira/browse/HDFS-4455 Project: Hadoop HDFS Issue Type: Bug Components: datanode Reporter: Lars Hofhansl -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4345) Release resources of unpoolable Decompressors
[ https://issues.apache.org/jira/browse/HDFS-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HDFS-4345: Status: Patch Available (was: Open) > Release resources of unpoolable Decompressors > - > > Key: HDFS-4345 > URL: https://issues.apache.org/jira/browse/HDFS-4345 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl >Priority: Minor > Fix For: 3.0.0 > > Attachments: HDFS-4345.txt > > > Found this when looking into HBASE-7435. > When a Decompressor is returned to the pool in CodecPool.java, we should > probably call end() on it to release its resources. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4345) Release resources of unpoolable Decompressors
[ https://issues.apache.org/jira/browse/HDFS-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HDFS-4345: Attachment: HDFS-4345.txt Here's a two-line change for that. (Also calls end() on compressors) > Release resources of unpoolable Decompressors > - > > Key: HDFS-4345 > URL: https://issues.apache.org/jira/browse/HDFS-4345 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl >Priority: Minor > Fix For: 3.0.0 > > Attachments: HDFS-4345.txt > > > Found this when looking into HBASE-7435. > When a Decompressor is returned to the pool in CodecPool.java, we should > probably call end() on it to release its resources. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-4345) Release resources of unpoolable Decompressors
Lars Hofhansl created HDFS-4345: --- Summary: Release resources of unpoolable Decompressors Key: HDFS-4345 URL: https://issues.apache.org/jira/browse/HDFS-4345 Project: Hadoop HDFS Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Fix For: 3.0.0 Found this when looking into HBASE-7435. When a Decompressor is returned to the pool in CodecPool.java, we should probably call end() on it to release its resources. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-4060) TestHSync#testSequenceFileSync failed
[ https://issues.apache.org/jira/browse/HDFS-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HDFS-4060. - Resolution: Duplicate This is fixed with the changes from HDFS-3979. > TestHSync#testSequenceFileSync failed > - > > Key: HDFS-4060 > URL: https://issues.apache.org/jira/browse/HDFS-4060 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Eli Collins > Labels: test-fail > > TestHSync#testSequenceFileSync failed in the pre commit run of HDFS-4055. > {noformat} > java.lang.AssertionError: Bad value for metric FsyncCount expected:<2> but > was:<1> > at org.junit.Assert.fail(Assert.java:91) > at org.junit.Assert.failNotEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:126) > at org.junit.Assert.assertEquals(Assert.java:470) > at > org.apache.hadoop.test.MetricsAsserts.assertCounter(MetricsAsserts.java:228) > at > org.apache.hadoop.hdfs.server.datanode.TestHSync.checkSyncMetric(TestHSync.java:46) > at > org.apache.hadoop.hdfs.server.datanode.TestHSync.checkSyncMetric(TestHSync.java:49) > at > org.apache.hadoop.hdfs.server.datanode.TestHSync.testSequenceFileSync(TestHSync.java:158) > {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3979) Fix hsync semantics
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494373#comment-13494373 ] Lars Hofhansl commented on HDFS-3979: - Thanks Nicholas. Luke, so the test you're looking for is to start 3 DNs, then have the write permanently fail at any of them, and in all cases have the hsync fail on the client, right? > Fix hsync semantics > --- > > Key: HDFS-3979 > URL: https://issues.apache.org/jira/browse/HDFS-3979 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 2.0.2-alpha >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Fix For: 2.0.3-alpha > > Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt, > hdfs-3979-v3.txt, hdfs-3979-v4.txt > > > See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver > is not on a synchronous path from the DFSClient, hence it is possible that a > DN loses data that it has already acknowledged as persisted to a client. > Edit: Spelling. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HDFS-3979: Attachment: hdfs-3979-v4.txt Updated patch with Nicholas' suggestion. I agree that the previous patch would have slowed all writes that reach the DN. We can't distinguish between an hflush from the client and a "normal" packet from the client. On the other hand this no longer deals with Luke's "kill -9" scenario (where a cluster management tool would kill -9 datanodes in parallel), but in the end no tool really should do that. > Fix hsync and hflush semantics. > --- > > Key: HDFS-3979 > URL: https://issues.apache.org/jira/browse/HDFS-3979 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node, hdfs client >Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt, > hdfs-3979-v3.txt, hdfs-3979-v4.txt > > > See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver > is not on a synchronous path from the DFSClient, hence it is possible that a > DN loses data that it has already acknowledged as persisted to a client. > Edit: Spelling. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490445#comment-13490445 ] Lars Hofhansl commented on HDFS-3979: - I'll make that change. > Fix hsync and hflush semantics. > --- > > Key: HDFS-3979 > URL: https://issues.apache.org/jira/browse/HDFS-3979 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node, hdfs client >Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt, hdfs-3979-v3.txt > > > See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver > is not on a synchronous path from the DFSClient, hence it is possible that a > DN loses data that it has already acknowledged as persisted to a client. > Edit: Spelling. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490113#comment-13490113 ] Lars Hofhansl commented on HDFS-3979: - Hi Kan, the only difference between v2 and v3 is that in v3 the "fsync" metric is updated after the actual sync to the FS (BlockReceiver.flushOrSync). This exposes the race condition we want to fix and makes TestHSync fail almost every run (the client returns from hsync before the datanode can update the metric). With the rest of this patch applied, this race is removed and TestHSync never fails. So now we have a test case for the race condition. [~vicaya] Do the existing tests, TestFiPipelines and TestFiHFlush, not cover the other scenarios you worry about? > Fix hsync and hflush semantics. > --- > > Key: HDFS-3979 > URL: https://issues.apache.org/jira/browse/HDFS-3979 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node, hdfs client >Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt, hdfs-3979-v3.txt > > > See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver > is not on a synchronous path from the DFSClient, hence it is possible that a > DN loses data that it has already acknowledged as persisted to a client. > Edit: Spelling. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4060) TestHSync#testSequenceFileSync failed
[ https://issues.apache.org/jira/browse/HDFS-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477584#comment-13477584 ] Lars Hofhansl commented on HDFS-4060: - The sync to disk is not actually on a synchronous path as seen from the client, so there is a short race in which the client returns before the metric has been updated. See HDFS-3979, which would fix the issue, but appears to be stuck in discussion about what extra tests it would need, if any. > TestHSync#testSequenceFileSync failed > - > > Key: HDFS-4060 > URL: https://issues.apache.org/jira/browse/HDFS-4060 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Eli Collins > Labels: test-fail > > TestHSync#testSequenceFileSync failed in the pre commit run of HDFS-4055. > {noformat} > java.lang.AssertionError: Bad value for metric FsyncCount expected:<2> but > was:<1> > at org.junit.Assert.fail(Assert.java:91) > at org.junit.Assert.failNotEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:126) > at org.junit.Assert.assertEquals(Assert.java:470) > at > org.apache.hadoop.test.MetricsAsserts.assertCounter(MetricsAsserts.java:228) > at > org.apache.hadoop.hdfs.server.datanode.TestHSync.checkSyncMetric(TestHSync.java:46) > at > org.apache.hadoop.hdfs.server.datanode.TestHSync.checkSyncMetric(TestHSync.java:49) > at > org.apache.hadoop.hdfs.server.datanode.TestHSync.testSequenceFileSync(TestHSync.java:158) > {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HDFS-3979: Attachment: hdfs-3979-v3.txt This little change makes TestHSync fail most of the time - without the rest of the patch, and never with this patch. (In HDFS-744 I had avoided this race by updating the sync metric first. I know that was a hack... By updating the metric last in BlockReceiver.flushOrSync, this race becomes apparent again). We do have pipeline tests that seem to verify correct pipeline behavior in the face of failures via fault injection: TestFiPipelines and TestFiHFlush. In terms of the API3/API4 discussion, I think we agree that hflush should follow API4, right? (otherwise we'd have unduly complex code) > Fix hsync and hflush semantics. > --- > > Key: HDFS-3979 > URL: https://issues.apache.org/jira/browse/HDFS-3979 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node, hdfs client >Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt, hdfs-3979-v3.txt > > > See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver > is not on a synchronous path from the DFSClient, hence it is possible that a > DN loses data that it has already acknowledged as persisted to a client. > Edit: Spelling. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470016#comment-13470016 ] Lars Hofhansl commented on HDFS-3979: - API4 is hflush (with change in OS buffers). That's an interesting discussion by itself. hsync'ing every edit in HBase is prohibitive. I have some simple numbers in HBASE-5954. I need to do that test again, though, with the sync_file_range changes in HDFS-2465 (that would hopefully do most of the data sync'ing asynchronously and only sync the last changes and metadata synchronously upon client request). Many applications do not need every edit to be guaranteed on disk, but have "sync points". That is what I am aiming for in HBase. The application will know the specific semantics. What is really important for HBase (IMHO) is that every block is synced to disk when it is closed. HBase constantly rewrites existing data via compactions, so without syncing, arbitrarily old data can be lost during a rack or DC outage. Lastly, we can play with this. For example only one of the replicas could sync to disk and the others just guarantee the data is in the OS buffers (API4.5 :) ). > Fix hsync and hflush semantics. > --- > > Key: HDFS-3979 > URL: https://issues.apache.org/jira/browse/HDFS-3979 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node, hdfs client >Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt > > > See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver > is not on a synchronous path from the DFSClient, hence it is possible that a > DN loses data that it has already acknowledged as persisted to a client. > Edit: Spelling. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
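The "sync points" idea described above can be illustrated with a small sketch. This is a hypothetical WAL wrapper, not HBase or HDFS code; `hflushCount`/`hsyncCount` stand in for real `hflush()`/`hsync()` calls on an output stream. Every edit is only hflush'd (data reaches the replicas' OS buffers), while a full hsync (data on disk) is issued every N edits, amortizing the expensive durable sync.

```java
public class SyncPointWalSketch {
    // Counters standing in for the real hflush()/hsync() stream calls.
    int hflushCount = 0;
    int hsyncCount = 0;

    private final int syncEvery;      // application-chosen sync-point interval
    private int editsSinceSync = 0;

    SyncPointWalSketch(int syncEvery) {
        this.syncEvery = syncEvery;
    }

    // Append one edit: always hflush (cheap, per-edit); hsync only at sync points.
    void append(byte[] edit) {
        hflushCount++;                // edit is visible on all replicas
        if (++editsSinceSync >= syncEvery) {
            hsyncCount++;             // sync point: force data to disk
            editsSinceSync = 0;
        }
    }

    public static void main(String[] args) {
        SyncPointWalSketch wal = new SyncPointWalSketch(5);
        for (int i = 0; i < 10; i++) {
            wal.append(new byte[]{(byte) i});
        }
        // 10 edits cost 10 hflushes but only 2 hsyncs.
        if (wal.hflushCount != 10 || wal.hsyncCount != 2) {
            throw new AssertionError("unexpected counts");
        }
        System.out.println(wal.hflushCount + " hflushes, " + wal.hsyncCount + " hsyncs");
    }
}
```

The trade-off matches the comment: edits between the last sync point and a crash that defeats hflush (e.g. power loss on all replicas) can be lost, which the application accepts in exchange for not paying the fsync cost on every edit.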
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469827#comment-13469827 ] Lars Hofhansl commented on HDFS-3979: - Thanks Luke and Kan. I'll come up with a test once I get some spare cycles (quite busy with HBase atm). > Fix hsync and hflush semantics. > --- > > Key: HDFS-3979 > URL: https://issues.apache.org/jira/browse/HDFS-3979 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node, hdfs client >Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt > > > See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver > is not on a synchronous path from the DFSClient, hence it is possible that a > DN loses data that it has already acknowledged as persisted to a client. > Edit: Spelling. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469671#comment-13469671 ] Lars Hofhansl commented on HDFS-3979: - I've seen that race when I wrote a test for HDFS-744. I "fixed" it there by updating the metrics first... Ugh :) I think I can make a test that fails at least with reasonable probability with the current semantics. The race between ack and write errors should be reduced (eliminated) with this patch. > Fix hsync and hflush semantics. > --- > > Key: HDFS-3979 > URL: https://issues.apache.org/jira/browse/HDFS-3979 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node, hdfs client >Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt > > > See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver > is not on a synchronous path from the DFSClient, hence it is possible that a > DN loses data that it has already acknowledged as persisted to a client. > Edit: Spelling. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13467882#comment-13467882 ] Lars Hofhansl commented on HDFS-3979: - You don't think the existing pipeline tests cover the failure scenarios? I'll see if I can get some performance numbers. > Fix hsync and hflush semantics. > --- > > Key: HDFS-3979 > URL: https://issues.apache.org/jira/browse/HDFS-3979 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node, hdfs client >Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt > > > See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver > is not on a synchronous path from the DFSClient, hence it is possible that a > DN loses data that it has already acknowledged as persisted to a client. > Edit: Spelling. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13467270#comment-13467270 ] Lars Hofhansl commented on HDFS-3979: - Do we want this change? Seems to me that HDFS-265 broke hsync/hflush and this would fix it. > Fix hsync and hflush semantics. > --- > > Key: HDFS-3979 > URL: https://issues.apache.org/jira/browse/HDFS-3979 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node, hdfs client >Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt > > > See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver > is not on a synchronous path from the DFSClient, hence it is possible that a > DN loses data that it has already acknowledged as persisted to a client. > Edit: Spelling. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13466113#comment-13466113 ] Lars Hofhansl commented on HDFS-3979: - I see. Thanks Kan. So now we have API4 and (with HDFS-744) API5. For applications like HBase we'd like API4 as well as API5. (API4 allows a hypothetical kill -9 of all DNs without loss of acknowledged data, API5 allows HW failures of all data nodes - i.e. a DC outage - without loss of acknowledged data) > Fix hsync and hflush semantics. > --- > > Key: HDFS-3979 > URL: https://issues.apache.org/jira/browse/HDFS-3979 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node, hdfs client >Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt > > > See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver > is not on a synchronous path from the DFSClient, hence it is possible that a > DN loses data that it has already acknowledged as persisted to a client. > Edit: Spelling. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HDFS-3979: Attachment: hdfs-3979-v2.txt New patch. Order of local operations and waiting for downstream DNs now reflects the pre HDFS-265 logic. > Fix hsync and hflush semantics. > --- > > Key: HDFS-3979 > URL: https://issues.apache.org/jira/browse/HDFS-3979 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node, hdfs client >Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt > > > See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver > is not on a synchronous path from the DFSClient, hence it is possible that a > DN loses data that it has already acknowledged as persisted to a client. > Edit: Spelling. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13465915#comment-13465915 ] Lars Hofhansl commented on HDFS-3979: - Enqueuing the seqno at the end seems like the best approach. (Indeed this is done in the 0.20.x code as both of you said). I wonder why this was changed? Will have a new patch momentarily. > Fix hsync and hflush semantics. > --- > > Key: HDFS-3979 > URL: https://issues.apache.org/jira/browse/HDFS-3979 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node, hdfs client >Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Attachments: hdfs-3979-sketch.txt > > > See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver > is not on a synchronous path from the DFSClient, hence it is possible that a > DN loses data that it has already acknowledged as persisted to a client. > Edit: Spelling. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13465386#comment-13465386 ] Lars Hofhansl commented on HDFS-3979: - Should we simply do the enqueue at the end of receivePacket(), then? So just to make sure: In the current code the seqno is already enqueued in the beginning, so if there's an exception later in the code it won't have any effect on the enqueued seqno. The finally just preserves this existing behavior. What happens when there is an exception and the seqno is never enqueued? (And if that is OK, why is it not a problem now?) > Fix hsync and hflush semantics. > --- > > Key: HDFS-3979 > URL: https://issues.apache.org/jira/browse/HDFS-3979 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node, hdfs client >Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Attachments: hdfs-3979-sketch.txt > > > See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver > is not on a synchronous path from the DFSClient, hence it is possible that a > DN loses data that it has already acknowledged as persisted to a client. > Edit: Spelling. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464410#comment-13464410 ] Lars Hofhansl commented on HDFS-3979: - I'm not sure either. I am trying not to change the existing behavior. The enqueue used to happen at the beginning of receivePacket(...), so if the latter part of the method fails the ack would already be enqueued. > Fix hsync and hflush semantics. > --- > > Key: HDFS-3979 > URL: https://issues.apache.org/jira/browse/HDFS-3979 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node, hdfs client >Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Attachments: hdfs-3979-sketch.txt > > > See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver > is not on a synchronous path from the DFSClient, hence it is possible that a > DN loses data that it has already acknowledged as persisted to a client. > Edit: Spelling. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HDFS-3979: Description: See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver is not on a synchronous path from the DFSClient, hence it is possible that a DN loses data that it has already acknowledged as persisted to a client. Edit: Spelling. was:See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver is not on a synchronous path from the DFSClient, hence it is possible that DN loses data that is has already acknowledged as persisted to a client. > Fix hsync and hflush semantics. > --- > > Key: HDFS-3979 > URL: https://issues.apache.org/jira/browse/HDFS-3979 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Attachments: hdfs-3979-sketch.txt > > > See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver > is not on a synchronous path from the DFSClient, hence it is possible that a > DN loses data that it has already acknowledged as persisted to a client. > Edit: Spelling. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HDFS-3979: Status: Patch Available (was: Open) Let's try HadoopQA. TestHSync still passes. I'll also do some tests with HBase... > Fix hsync and hflush semantics. > --- > > Key: HDFS-3979 > URL: https://issues.apache.org/jira/browse/HDFS-3979 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Attachments: hdfs-3979-sketch.txt > > > See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver > is not on a synchronous path from the DFSClient, hence it is possible that DN > loses data that is has already acknowledged as persisted to a client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464133#comment-13464133 ] Lars Hofhansl commented on HDFS-3979: - (and sorry for misspelling your name) > Fix hsync and hflush semantics. > --- > > Key: HDFS-3979 > URL: https://issues.apache.org/jira/browse/HDFS-3979 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Attachments: hdfs-3979-sketch.txt > > > See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver > is not on a synchronous path from the DFSClient, hence it is possible that DN > loses data that is has already acknowledged as persisted to a client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HDFS-3979: Attachment: hdfs-3979-sketch.txt Something like this. (This is a sketch, the only test I performed was compiling) > Fix hsync and hflush semantics. > --- > > Key: HDFS-3979 > URL: https://issues.apache.org/jira/browse/HDFS-3979 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Attachments: hdfs-3979-sketch.txt > > > See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver > is not on a synchronous path from the DFSClient, hence it is possible that DN > loses data that is has already acknowledged as persisted to a client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464020#comment-13464020 ] Lars Hofhansl commented on HDFS-3979: - More good discussion on HDFS-744. Looks like we can just enqueue the seqno for the packet after the sync/flush is finished. (Khan's idea) > Fix hsync and hflush semantics. > --- > > Key: HDFS-3979 > URL: https://issues.apache.org/jira/browse/HDFS-3979 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > > See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver > is not on a synchronous path from the DFSClient, hence it is possible that DN > loses data that is has already acknowledged as persisted to a client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
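The ordering change discussed in this thread can be sketched minimally. These are hypothetical names, not the real BlockReceiver API: the point is only that the packet's seqno is enqueued for acknowledgment after the local flush/sync completes, so the client can never observe the ack before the data is durable on this replica.

```java
import java.util.ArrayList;
import java.util.List;

public class AckOrderingSketch {
    // Event log standing in for the real disk flush and the ack responder queue.
    final List<String> events = new ArrayList<>();

    void flushOrSync(boolean isSync) {
        events.add(isSync ? "sync" : "flush");  // durable write happens here
    }

    void enqueueAck(long seqno) {
        events.add("ack:" + seqno);             // responder sends this upstream
    }

    // Fixed ordering: flush first, enqueue the ack last in receivePacket().
    // (The pre-fix code enqueued the seqno at the beginning of the method,
    // so the ack could race ahead of the flush.)
    void receivePacket(long seqno, boolean syncRequested) {
        flushOrSync(syncRequested);
        enqueueAck(seqno);                      // ack strictly follows the flush
    }

    public static void main(String[] args) {
        AckOrderingSketch br = new AckOrderingSketch();
        br.receivePacket(1L, true);
        if (!br.events.equals(List.of("sync", "ack:1"))) {
            throw new AssertionError("ack must follow the sync: " + br.events);
        }
        System.out.println(br.events);
    }
}
```

The open question raised later in the thread maps onto this sketch directly: with the ack moved after the flush, an exception inside flushOrSync() means the seqno is never enqueued, so the client times out instead of receiving an ack for data that was never persisted.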
[jira] [Commented] (HDFS-744) Support hsync in HDFS
[ https://issues.apache.org/jira/browse/HDFS-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464005#comment-13464005 ] Lars Hofhansl commented on HDFS-744: BTW. I filed HDFS-3979 to do that. > Support hsync in HDFS > - > > Key: HDFS-744 > URL: https://issues.apache.org/jira/browse/HDFS-744 > Project: Hadoop HDFS > Issue Type: New Feature > Components: data-node, hdfs client >Reporter: Hairong Kuang >Assignee: Lars Hofhansl > Fix For: 2.0.2-alpha > > Attachments: HDFS-744-2.0-v1.patch, HDFS-744-2.0-v2.patch, > HDFS-744-trunk.patch, HDFS-744-trunk-v2.patch, HDFS-744-trunk-v3.patch, > HDFS-744-trunk-v4.patch, HDFS-744-trunk-v5.patch, HDFS-744-trunk-v6.patch, > HDFS-744-trunk-v7.patch, HDFS-744-trunk-v8.patch, hdfs-744.txt, > hdfs-744-v2.txt, hdfs-744-v3.txt > > > HDFS-731 implements hsync by default as hflush. As descriibed in HADOOP-6313, > the real expected semantics should be "flushes out to all replicas and all > replicas have done posix fsync equivalent - ie the OS has flushed it to the > disk device (but the disk may have it in its cache)." This jira aims to > implement the expected behaviour. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-744) Support hsync in HDFS
[ https://issues.apache.org/jira/browse/HDFS-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464004#comment-13464004 ] Lars Hofhansl commented on HDFS-744: I see. In that case we wouldn't ack back until all local work is done. A possible place to do that would be almost at the end of receivePacket(), maybe in a finally block of the last try/catch in that method. That still does not take care of all the cases, though; for the last packet in a block, the sync is deferred to close() (to avoid a double sync). That's not hard to change either, I think.
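The placement discussed above (sync at the end of receivePacket(), before the packet is acknowledged) can be sketched as a toy model. This is a hypothetical simplification, not the actual BlockReceiver code; all names here are illustrative:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy sketch: perform the sync in a finally block at the end of a
// receivePacket()-like method, so the ack for a packet is only enqueued
// once the local flush/sync has completed. Names are illustrative.
public class ReceiveThenSync {
    final Queue<Long> ackQueue = new ArrayDeque<>();
    final StringBuilder log = new StringBuilder();

    void receivePacket(long seqno, boolean syncRequested) {
        try {
            log.append("write:").append(seqno).append(' ');   // persist packet data
        } finally {
            if (syncRequested) {
                log.append("sync:").append(seqno).append(' '); // simulated fsync
            }
            // Only now is the packet acknowledged as persisted.
            ackQueue.add(seqno);
        }
    }

    public static void main(String[] args) {
        ReceiveThenSync r = new ReceiveThenSync();
        r.receivePacket(1, true);
        System.out.println(r.log.toString().trim() + " ack:" + r.ackQueue.poll());
    }
}
```

The point of the finally block is that the ack is enqueued after the sync on every exit path from the method.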
[jira] [Assigned] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl reassigned HDFS-3979: --- Assignee: Lars Hofhansl
[jira] [Created] (HDFS-3979) Fix hsync and hflush semantics.
Lars Hofhansl created HDFS-3979: --- Summary: Fix hsync and hflush semantics. Key: HDFS-3979 URL: https://issues.apache.org/jira/browse/HDFS-3979 Project: Hadoop HDFS Issue Type: Bug Reporter: Lars Hofhansl See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver is not on a synchronous path from the DFSClient, hence it is possible that the DN loses data that it has already acknowledged as persisted to a client.
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463515#comment-13463515 ] Lars Hofhansl commented on HDFS-3979: - Also see my comment here: https://issues.apache.org/jira/browse/HDFS-744?focusedCommentId=13279619&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13279619
[jira] [Commented] (HDFS-744) Support hsync in HDFS
[ https://issues.apache.org/jira/browse/HDFS-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463512#comment-13463512 ] Lars Hofhansl commented on HDFS-744: Another approach would be to wait in the responder until both the downstream datanode responded *and* the sync has finished. That way we get correctness and we can still interleave sync'ing/RTT in the pipeline.
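The waiting scheme described above can be sketched as a toy two-thread model (a hypothetical simplification, not the actual BlockReceiver/PacketResponder code): the responder holds the upstream ack until a latch signals that the local sync has completed, so a packet is never acknowledged before it is locally persisted.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

// Toy model of the idea above: the responder forwards the ack for a packet
// only after BOTH the downstream ack has arrived AND the local sync is done.
// Class and method names are illustrative, not the actual DataNode API.
public class AckAfterSync {
    static final CountDownLatch localSyncDone = new CountDownLatch(1);
    static final AtomicBoolean ackSent = new AtomicBoolean(false);

    public static void main(String[] args) throws InterruptedException {
        // Data thread: writes the packet and performs the (slow) fsync.
        Thread dataThread = new Thread(() -> {
            try {
                Thread.sleep(50);          // simulated disk sync latency
            } catch (InterruptedException ignored) {}
            localSyncDone.countDown();     // local persistence finished
        });

        // Responder thread: assume the downstream ack has already arrived;
        // it must still wait for the local sync before acking upstream.
        Thread responder = new Thread(() -> {
            try {
                localSyncDone.await();     // correctness: don't ack early
            } catch (InterruptedException ignored) {}
            ackSent.set(true);
        });

        dataThread.start();
        responder.start();
        dataThread.join();
        responder.join();
        System.out.println("ackSent=" + ackSent.get());
    }
}
```

Because the sync runs in the data thread while the responder only waits, syncs on different replicas can still overlap with the pipeline round trip.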
[jira] [Commented] (HDFS-744) Support hsync in HDFS
[ https://issues.apache.org/jira/browse/HDFS-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463505#comment-13463505 ] Lars Hofhansl commented on HDFS-744: I think the problem is that we have to enqueue the seqno before the packet is sent downstream, right? (Otherwise we could potentially miss the ack.) So in order to enqueue the seqno after we syncOrFlush, we'd also have to send the packet downstream after we syncOrFlush, which essentially means that we are serializing the sync times across all replicas.
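The serialization concern above can be made concrete with some toy arithmetic (the numbers below are illustrative only): if each replica forwards the packet downstream only after its own fsync, the syncs add up along the pipeline; if the packet is forwarded first, the syncs run concurrently.

```java
// Toy latency arithmetic for a 3-replica pipeline. syncMs is the per-replica
// fsync time, hopMs the per-hop network cost; both are made-up numbers.
public class SyncSerialization {
    public static void main(String[] args) {
        int replicas = 3, syncMs = 10, hopMs = 1;
        // Forward-after-sync: each hop waits for the previous replica's sync,
        // so the sync times serialize.
        int serialized = replicas * syncMs + replicas * hopMs;
        // Forward-before-sync: syncs overlap across replicas; latency is
        // roughly one sync plus the pipeline round trip.
        int overlapped = syncMs + replicas * hopMs;
        System.out.println(serialized + " vs " + overlapped);
    }
}
```

With these numbers the serialized variant costs 33 ms against 13 ms overlapped, which is why nobody would enable a variant that serializes the syncs.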
[jira] [Commented] (HDFS-744) Support hsync in HDFS
[ https://issues.apache.org/jira/browse/HDFS-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462869#comment-13462869 ] Lars Hofhansl commented on HDFS-744: In any case, hsync and hflush should be fixed together. A one-off for hsync does not seem to be the right thing.
[jira] [Commented] (HDFS-744) Support hsync in HDFS
[ https://issues.apache.org/jira/browse/HDFS-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462855#comment-13462855 ] Lars Hofhansl commented on HDFS-744: You are right, Luke. I implemented this in the context of hadoop-2 (i.e. with HDFS-265). It seems that to get this right, HDFS-265 needs to be revisited. Will look at your suggestion (doing the sync in the data thread). As long as the syncs (or flushes) are not serialized it's fine (otherwise nobody is going to switch this on).
[jira] [Commented] (HDFS-3721) hsync support broke wire compatibility
[ https://issues.apache.org/jira/browse/HDFS-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423180#comment-13423180 ] Lars Hofhansl commented on HDFS-3721: - Saw Todd's comment on HDFS-744 (moving that change to 2.0.x is not an option). I had assumed that HDFS-744 would be in 2.0.x (in which case there would have been no compatibility issues). Had a quick look through the patch here. Looks good as far as I can tell; I'll take a more detailed look later today. > hsync support broke wire compatibility > -- > > Key: HDFS-3721 > URL: https://issues.apache.org/jira/browse/HDFS-3721 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node, hdfs client >Affects Versions: 2.1.0-alpha >Reporter: Todd Lipcon >Assignee: Todd Lipcon >Priority: Critical > Attachments: hdfs-3721.txt > > > HDFS-744 added support for hsync to the data transfer wire protocol. However, > it actually broke wire compatibility: if the client has hsync support but the > server does not, the client cannot read or write data on the old cluster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-744) Support hsync in HDFS
[ https://issues.apache.org/jira/browse/HDFS-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13422842#comment-13422842 ] Lars Hofhansl commented on HDFS-744: Just noticed (because of HDFS-3721) that this is now in 2.1.0-alpha. Any chance to get this into 2.0.x-alpha? (In fact, on June 4th it was marked with 2.0.1-alpha, but something changed since then, so now it's 2.1.0-alpha.) > Support hsync in HDFS > - > > Key: HDFS-744 > URL: https://issues.apache.org/jira/browse/HDFS-744 > Project: Hadoop HDFS > Issue Type: New Feature > Components: data-node, hdfs client >Reporter: Hairong Kuang >Assignee: Lars Hofhansl > Fix For: 2.1.0-alpha > > Attachments: HDFS-744-2.0-v1.patch, HDFS-744-2.0-v2.patch, > HDFS-744-trunk-v2.patch, HDFS-744-trunk-v3.patch, HDFS-744-trunk-v4.patch, > HDFS-744-trunk-v5.patch, HDFS-744-trunk-v6.patch, HDFS-744-trunk-v7.patch, > HDFS-744-trunk-v8.patch, HDFS-744-trunk.patch, hdfs-744-v2.txt, > hdfs-744-v3.txt, hdfs-744.txt > > > HDFS-731 implements hsync by default as hflush. As described in HADOOP-6313, > the real expected semantics should be "flushes out to all replicas and all > replicas have done posix fsync equivalent - ie the OS has flushed it to the > disk device (but the disk may have it in its cache)." This jira aims to > implement the expected behaviour. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3721) hsync support broke wire compatibility
[ https://issues.apache.org/jira/browse/HDFS-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13422840#comment-13422840 ] Lars Hofhansl commented on HDFS-3721: - Oh, this is a 2.1.x vs 2.0.x issue...? The HDFS-744 patch is smaller than this patch, could we port that to 2.0.x?
[jira] [Commented] (HDFS-3721) hsync support broke wire compatibility
[ https://issues.apache.org/jira/browse/HDFS-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13422835#comment-13422835 ] Lars Hofhansl commented on HDFS-3721: - What sort of wire compatibility are we talking about? Both trunk and 2.0 have the hsync code. What sort of old cluster would not have this? Does the 2.x.x client support communicating with a 1.x.x cluster? Apologies for introducing this with my patch in HDFS-744; I had assumed protobuf would take care of it.
[jira] [Commented] (HDFS-3580) incompatible types; no instance(s) of type variable(s) V exist so that V conforms to boolean compiling HttpFSServer.java with OpenJDK
[ https://issues.apache.org/jira/browse/HDFS-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404276#comment-13404276 ] Lars Hofhansl commented on HDFS-3580: - You beat me to it, Andy. :) +1 on patch; an identical patch compiled fine on my home machine with OpenJDK (don't have access to it right now). I am not sure whether the version with the primitive types is valid or not, but the version with reference types is definitely valid and should work with all JDKs. > incompatible types; no instance(s) of type variable(s) V exist so that V > conforms to boolean compiling HttpFSServer.java with OpenJDK > - > > Key: HDFS-3580 > URL: https://issues.apache.org/jira/browse/HDFS-3580 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.0.1-alpha >Reporter: Andy Isaacson >Assignee: Andy Isaacson >Priority: Minor > Attachments: hdfs-3580.txt > > > {quote} > [ERROR] > /home/lars/dev/hadoop-2/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/server/HttpFSServer.java:[407,36] > incompatible types; no instance(s) of type variable(s) V exist so that V > conforms to boolean > {quote} > {quote} > $ javac -version > javac 1.6.0_24 > $ java -version > java version "1.6.0_24" > OpenJDK Runtime Environment (IcedTea6 1.11.3) (fedora-67.1.11.3.fc16-x86_64) > OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode) > {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1783) Ability for HDFS client to write replicas in parallel
[ https://issues.apache.org/jira/browse/HDFS-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400196#comment-13400196 ] Lars Hofhansl commented on HDFS-1783: - One more point to consider: For us (Salesforce) this is mostly interesting for HBase. A typical HBase cluster has the DataNodes co-located with the HBase RegionServers. So assuming good load distribution within HBase, the bandwidth would still be amortized across the cluster, but with lower latency for each single RegionServer (the HDFS client in this case). Overall the same number of bits is sent through the cluster as a whole. This would only be enabled for the WAL. Other write load (like compactions) would still do the pipelining. Andy did some cool testing on EC2 over in HBASE-6116. We'll be doing some basic testing in a real, dedicated cluster this week. > Ability for HDFS client to write replicas in parallel > - > > Key: HDFS-1783 > URL: https://issues.apache.org/jira/browse/HDFS-1783 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs client >Reporter: dhruba borthakur >Assignee: Lars Hofhansl > Attachments: HDFS-1783-trunk-v2.patch, HDFS-1783-trunk-v3.patch, > HDFS-1783-trunk-v4.patch, HDFS-1783-trunk-v5.patch, HDFS-1783-trunk.patch > > > The current implementation of HDFS pipelines the writes to the three > replicas. This introduces some latency for realtime latency sensitive > applications. An alternate implementation that allows the client to write all > replicas in parallel gives much better response times to these applications. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1783) Ability for HDFS client to write replicas in parallel
[ https://issues.apache.org/jira/browse/HDFS-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13398785#comment-13398785 ] Lars Hofhansl commented on HDFS-1783: - Yes, that would be a good optimization. I would propose starting with something simple, though (such as the current patch), seeing how that behaves with HBase, and building confidence that it does not break things. It's optional and the patch (IMHO) is low risk (only DFSOutputStream is changed; the rest is for tests). Then we can think about further optimizations.
[jira] [Commented] (HDFS-1783) Ability for HDFS client to write replicas in parallel
[ https://issues.apache.org/jira/browse/HDFS-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13398767#comment-13398767 ] Lars Hofhansl commented on HDFS-1783: - Yep. That is exactly the point. HDFS does pipelining to improve throughput at the expense of latency. This patch allows a client to favor latency. If the client operates at the NIC's throughput limit, enabling parallel writes will make things worse. This patch could be extended in the future to mix direct connections with pipelining. For example, a client could set up a 1-hop (direct) pipeline and a 2-hop pipeline for a replication factor of 3, or two 2-hop pipelines for a replication factor of 4, etc. We'll be testing this with HBase workloads. Using traffic shaping is interesting.
[jira] [Commented] (HDFS-1783) Ability for HDFS client to write replicas in parallel
[ https://issues.apache.org/jira/browse/HDFS-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13398256#comment-13398256 ] Lars Hofhansl commented on HDFS-1783: - Thanks Ted. Yes, I got that wrong in the first version of the patch (Dhruba had it right on Github; it was just me who had it wrong). I found that when I added the various tests. I'll be back in the US soon, and will finish the HBase patch and get some performance testing done on a real cluster.
[jira] [Commented] (HDFS-3370) HDFS hardlink
[ https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396592#comment-13396592 ] Lars Hofhansl commented on HDFS-3370: - Hardlinks would be used for temporary snapshotting (not to hold the backup itself). Anyway... Since there's strong opposition to this, at Salesforce we'll either come up with something else, maintain local HDFS patches, or use a different file system. > HDFS hardlink > - > > Key: HDFS-3370 > URL: https://issues.apache.org/jira/browse/HDFS-3370 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Hairong Kuang >Assignee: Liyin Tang > Attachments: HDFS-HardLink.pdf > > > We'd like to add a new feature hardlink to HDFS that allows hardlinked files > to share data without copying. Currently we will support hardlinking only > closed files, but it could be extended to unclosed files as well. > Among many potential use cases of the feature, the following two are > primarily used in facebook: > 1. This provides a lightweight way for applications like hbase to create a > snapshot; > 2. This also allows an application like Hive to move a table to a different > directory without breaking current running hive queries. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1783) Ability for HDFS client to write replicas in parallel
[ https://issues.apache.org/jira/browse/HDFS-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294951#comment-13294951 ] Lars Hofhansl commented on HDFS-1783: - @Ted: The first method is overridden in DistributedFileSystem (to avoid having to change method signatures in each subclass of FileSystem). PrimitiveCreate is called from FileContext. There seem to be some general inconsistencies in FileSystem. For example, calling FileSystem.create(..., APPEND, ...) will not append. FileContext.create(..., APPEND, ...), on the other hand, will do the right thing. This patch does not affect that. The patch will naturally work with FileContext.create(..., APPEND, ...). I'll add a few more tests for this. When I'm back in the US, I'll get some performance numbers (judging from my micro benchmarks, I'd expect some nice improvements as long as the client's network link is not saturated).
[jira] [Commented] (HDFS-3370) HDFS hardlink
[ https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294934#comment-13294934 ] Lars Hofhansl commented on HDFS-3370: - This is a good discussion. Couple of points: bq. Or provide use cases which cannot be solved without it. This seems to be the key question: What services should a file system provide? The same argument could be made for symbolic links. The application could implement those (in fact it's quite simple). bq. but they are very hard to support when the namespace is distributed But isn't that an implementation detail, which should not inform the feature set? Hardlinks could be supported only per distinct namespace (a namespace in federated HDFS or a volume in MapR, I think). This is not unlike Unix, where hardlinks are per distinct filesystem (i.e. not across mount points). @M.C. Srivas: If you create 15 backups without hardlinks you get 15 times the metadata *and* 15 times the data... Unless you assume some other feature such as snapshots with copy-on-write or backup-on-write semantics. (Maybe I did not get the argument.) Immutable files are a very common and useful design pattern (not just for HBase) and, while not strictly needed, hardlinks are very useful together with immutable files. Just my $0.02.
[jira] [Commented] (HDFS-3370) HDFS hardlink
[ https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293480#comment-13293480 ] Lars Hofhansl commented on HDFS-3370: - Thanks Liyin. Sounds good. One thought that occurred to me since: We need to think about copy semantics. For example, how will distcp handle this? It shouldn't create a new copy of a file for each hardlink that points to it, but rather copy it at most once and create hardlinks for each following reference. But then what about multiple distcp commands that happen to cover hardlinks to the same file? I suppose in that case we cannot be expected to avoid multiple copies of the same file (but at most one copy for each invocation of distcp, and only if the distcp happens to cover a different hardlink).
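The copy-at-most-once idea above can be sketched as a toy planner (purely hypothetical names, not actual distcp code): within a single run, track each source file's underlying identity (an inode-like id) and copy the bytes only for the first hardlink encountered; later hardlinks to the same file become links at the destination.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy sketch of hardlink-aware copying for a single distcp-like run.
public class HardlinkAwareCopy {
    // pathToFileId maps each source path to an inode-like identity.
    public static List<String> plan(Map<String, Long> pathToFileId) {
        Map<Long, String> copied = new HashMap<>(); // fileId -> first copied path
        List<String> actions = new ArrayList<>();
        // TreeMap gives a deterministic traversal order over the paths.
        for (Map.Entry<String, Long> e : new TreeMap<>(pathToFileId).entrySet()) {
            String first = copied.putIfAbsent(e.getValue(), e.getKey());
            if (first == null) {
                actions.add("COPY " + e.getKey());            // first reference: copy bytes
            } else {
                actions.add("LINK " + e.getKey() + " -> " + first); // later reference: hardlink
            }
        }
        return actions;
    }

    public static void main(String[] args) {
        Map<String, Long> files = new HashMap<>();
        files.put("/a/file1", 100L);
        files.put("/a/file2", 100L); // hardlink to the same underlying file
        files.put("/b/file3", 200L);
        System.out.println(plan(files));
    }
}
```

Since the seen-set lives only for one invocation, two separate runs covering different hardlinks to the same file would still each copy it once, matching the limitation noted in the comment.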
[jira] [Updated] (HDFS-1783) Ability for HDFS client to write replicas in parallel
[ https://issues.apache.org/jira/browse/HDFS-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HDFS-1783: Attachment: HDFS-1783-trunk-v5.patch Also adds a subclass of TestPipelinesFailover that runs all tests with PARALLEL_WRITES. FileSystem.append itself does not support parallel writes (as of this patch). I am generally not quite clear on what the difference between FileSystem.append and FileSystem.create(..., CreateFlag.APPEND, ...) is supposed to be. > Ability for HDFS client to write replicas in parallel > - > > Key: HDFS-1783 > URL: https://issues.apache.org/jira/browse/HDFS-1783 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs client > Reporter: dhruba borthakur > Assignee: Lars Hofhansl > Attachments: HDFS-1783-trunk-v2.patch, HDFS-1783-trunk-v3.patch, HDFS-1783-trunk-v4.patch, HDFS-1783-trunk-v5.patch, HDFS-1783-trunk.patch > > > The current implementation of HDFS pipelines the writes to the three replicas. This introduces some latency for realtime, latency-sensitive applications. An alternate implementation that allows the client to write all replicas in parallel gives much better response times to these applications.
[jira] [Commented] (HDFS-1783) Ability for HDFS client to write replicas in parallel
[ https://issues.apache.org/jira/browse/HDFS-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291647#comment-13291647 ] Lars Hofhansl commented on HDFS-1783: - One more test: I introduced an artificial 1 ms sleep at the beginning of BlockReceiver.receivePacket. Then I ran the same test above with 10,000 loops. With the patch it takes ~19s; without the patch, ~44s.
[jira] [Commented] (HDFS-1783) Ability for HDFS client to write replicas in parallel
[ https://issues.apache.org/jira/browse/HDFS-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291641#comment-13291641 ] Lars Hofhansl commented on HDFS-1783: - I did a simple local micro-benchmark: started a mini cluster with 3 data nodes and wrote 1 byte 100,000 times, each write followed by an hflush (so 100,000 packets). With parallel writes it took ~25s, without ~30s (this was repeatable). I also tried 10- and 100-byte packets. For 10 bytes I get the same results; for 100 bytes it took ~29s with parallel writes and ~37s without. Since this was all on a single machine, I am not entirely sure how this would translate to a real cluster with real network latency. The latency I measured for my "lo" device is 0.05 ms... I would expect the impact of this change to be more profound in a real cluster setting with latency on the order of a few ms. There should also be a definite gain when hsync (after HDFS-744) is enabled (but that I cannot test on a single machine with a single spindle).
[jira] [Commented] (HDFS-3370) HDFS hardlink
[ https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291614#comment-13291614 ] Lars Hofhansl commented on HDFS-3370: - Do you have a preliminary patch to look at?
[jira] [Commented] (HDFS-1783) Ability for HDFS client to write replicas in parallel
[ https://issues.apache.org/jira/browse/HDFS-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291612#comment-13291612 ] Lars Hofhansl commented on HDFS-1783: - Thanks Todd. I see your point. I'm still overseas until the end of the month with no physical access to a cluster. Ram and Andy said that they might get a chance to do some performance tests before then. (It's hard to beat the pipelining on throughput, so I only expect latency to be improved.) As for the complexity, I find it manageable... The pipelining as such has not changed, only that the client opens up N pipelines of length 1. Once this change is in, one could get fancier (for example, 2 pipelines of length 2 for 4 replicas, or maybe pipelines to multiple clusters, etc.).
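The "N pipelines of length 1" idea can be illustrated with a toy latency model (a Python sketch, not HDFS code; `HOP_LATENCY_S` and both function names are assumptions for illustration). Per packet, a pipeline of depth N pays roughly the sum of the per-hop delays before the ack returns, while N direct connections pay roughly the slowest single hop:

```python
import concurrent.futures
import time

HOP_LATENCY_S = 0.001  # assumed per-hop network/processing delay (1 ms)

def write_pipelined(packet, replicas):
    """Pipeline of length N: the ack returns only after the packet has
    traversed every replica in sequence, so per-packet latency grows
    with pipeline depth (roughly N * hop latency)."""
    for _node in replicas:
        time.sleep(HOP_LATENCY_S)  # one hop into this replica
    return packet

def write_parallel(packet, replicas, pool):
    """N pipelines of length 1: the client sends to every replica
    directly and waits for all acks; latency is roughly the slowest
    single hop rather than the sum of hops."""
    futures = [pool.submit(time.sleep, HOP_LATENCY_S) for _node in replicas]
    concurrent.futures.wait(futures)
    return packet
```

With 3 replicas and a 1 ms hop, the pipelined model costs about 3 ms per hflushed packet versus about 1 ms for the parallel model, which is consistent with the direction (though not the exact magnitude) of the micro-benchmark numbers above.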
[jira] [Commented] (HDFS-3370) HDFS hardlink
[ https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13290922#comment-13290922 ] Lars Hofhansl commented on HDFS-3370: - Is anybody working on a patch for this? If not, I wouldn't mind picking it up (although I can't promise getting to it before the end of the month).
[jira] [Commented] (HDFS-1783) Ability for HDFS client to write replicas in parallel
[ https://issues.apache.org/jira/browse/HDFS-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13290920#comment-13290920 ] Lars Hofhansl commented on HDFS-1783: - Is there general interest in this?
[jira] [Commented] (HDFS-3370) HDFS hardlink
[ https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289689#comment-13289689 ] Lars Hofhansl commented on HDFS-3370: - Reading through the design doc, it seems that FileSystem.{setPermission|setOwner} would be awkward: we'd have to find each INodeHardLinkFile pointing to the same "file" and then change all their permissions/owners. HardLinkFileInfo could instead maintain the permissions and owner (since they - following POSIX - are the same for each hard link). That way, changing the owner or permissions would immediately affect all hard links. When the fsimage is saved, each INodeHardLinkFile would still write its own permission and owner (for simplicity; that could be optimized, as long as at least one INode writes the permissions/owner). Upon read, every INode representing a hardlink must have the same permission/owner as all other INodes linking to the same "file"; if not, the image is inconsistent. In that case HardLinkFileInfo would not need to maintain a list of pointers back to all INodeHardLinkFiles, and the owner/permissions would only be stored once in memory.
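The shared-metadata design suggested above could be sketched as follows (a minimal Python sketch of the idea only; the real INodeHardLinkFile/HardLinkFileInfo classes from the design doc are Java and hold much more state). Each hardlink INode points at one shared info record, so a setOwner through any link is immediately visible through all of them, and owner/permission live in memory exactly once per hardlink group:

```python
class HardLinkFileInfo:
    """Shared record for one hardlink group: owner and permission are
    stored once here rather than per-INode."""
    def __init__(self, owner, permission):
        self.owner = owner
        self.permission = permission

class INodeHardLinkFile:
    """Each hardlink INode references the shared info instead of
    carrying its own copy of owner/permission."""
    def __init__(self, path, info):
        self.path = path
        self.info = info

    def set_owner(self, owner):
        self.info.owner = owner  # immediately affects all hardlinks

    def get_owner(self):
        return self.info.owner

    def set_permission(self, permission):
        self.info.permission = permission

    def get_permission(self):
        return self.info.permission
```

With this shape there is no need for back-pointers from the info record to the individual INodes, which matches the simplification proposed at the end of the comment.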