[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598677#comment-14598677 ]

Lars Hofhansl commented on HDFS-6440:
-------------------------------------

Yeah. Thanks [~atm]!

> Support more than 2 NameNodes
> -----------------------------
>
>                 Key: HDFS-6440
>                 URL: https://issues.apache.org/jira/browse/HDFS-6440
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: auto-failover, ha, namenode
>    Affects Versions: 2.4.0
>            Reporter: Jesse Yates
>            Assignee: Jesse Yates
>             Fix For: 3.0.0
>
>         Attachments: Multiple-Standby-NameNodes_V1.pdf, hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, hdfs-6440-trunk-v4.patch, hdfs-6440-trunk-v5.patch, hdfs-6440-trunk-v6.patch, hdfs-6440-trunk-v7.patch, hdfs-6440-trunk-v8.patch, hdfs-multiple-snn-trunk-v0.patch
>
> Most of the work is already done to support more than 2 NameNodes (one active, one standby). This would be the last bit to support running multiple _standby_ NameNodes; one of the standbys should be available for fail-over.
> Mostly, this is a matter of updating how we parse configurations, some complexity around managing the checkpointing, and updating a whole lot of tests.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
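For readers following along: with this feature, a multi-standby setup is configured the same way as two-node HA, just with additional NameNode IDs listed. A minimal hdfs-site.xml sketch; the nameservice name and host names here are illustrative, not from the patch:

```xml
<!-- Illustrative: nameservice "mycluster" with one active and two standbys -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2,nn3</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>host1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>host2.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn3</name>
  <value>host3.example.com:8020</value>
</property>
```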
[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14525990#comment-14525990 ]

Lars Hofhansl commented on HDFS-6440:
-------------------------------------

[~eli], this is the issue I mentioned on Wednesday. I find it hard to believe that we're the only ones who want this; it's running in production at Salesforce. What's holding this up? How can we help get this in? Break it into smaller pieces? Something else?
[jira] [Commented] (HDFS-7240) Object store in HDFS
[ https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486738#comment-14486738 ]

Lars Hofhansl commented on HDFS-7240:
-------------------------------------

Awesome stuff. We (Salesforce) have a need for this. I think these will lead to immediate management problems:
* Object Size: 5G
* Number of buckets system-wide: 10 million
* Number of objects per bucket: 1 million
* Number of buckets per storage volume: 1000

We have a large number of tenants (many times more than 1000). Some of the tenants will be very large (storing many times more than 1m objects). Of course there are simple workarounds for that, such as including a tenant id in the volume name and a bucket name in our internal blob ids. Are these technical limits? I don't think that we're the only ones who will want to store a large number of objects (more than 1m), and the bucket management would get in the way rather than help.

> Object store in HDFS
> --------------------
>
>                 Key: HDFS-7240
>                 URL: https://issues.apache.org/jira/browse/HDFS-7240
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Jitendra Nath Pandey
>            Assignee: Jitendra Nath Pandey
>         Attachments: Ozone-architecture-v1.pdf
>
> This jira proposes to add object store capabilities into HDFS.
> As part of the federation work (HDFS-1052) we separated block storage as a generic storage layer. Using the Block Pool abstraction, new kinds of namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode storage, but independent of namespace metadata.
> I will soon update with a detailed design document.
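The workaround mentioned above (folding a tenant id into the volume name, and sharding blob ids across buckets) can be sketched roughly as follows. The naming scheme and shard count are purely illustrative, not part of the Ozone proposal:

```java
// Illustrative sketch: spreading many tenants/objects across the proposed
// per-volume and per-bucket limits by encoding tenant and shard into names.
public class ObjectKeyScheme {
    // Hypothetical shard count, chosen to keep each bucket well under
    // the proposed 1M-objects-per-bucket cap for our largest tenants.
    private static final int BUCKET_SHARDS = 64;

    /** Volume name carries the tenant id, so volumes stay per-tenant. */
    static String volumeFor(String tenantId) {
        return "vol-" + tenantId;
    }

    /** Bucket is a stable hash shard of the blob id. */
    static String bucketFor(String blobId) {
        int shard = Math.floorMod(blobId.hashCode(), BUCKET_SHARDS);
        return "bucket-" + shard;
    }

    public static void main(String[] args) {
        // The same blob id always maps to the same volume/bucket pair.
        System.out.println(volumeFor("t42") + "/" + bucketFor("blob-123"));
    }
}
```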
[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481957#comment-14481957 ]

Lars Hofhansl commented on HDFS-6440:
-------------------------------------

Let me also restate that we are running this in production on hundreds of clusters at Salesforce; we haven't seen any issues. It _is_ a pretty intricate patch, so I understand the hesitation.
[jira] [Updated] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HDFS-6735:
--------------------------------
    Attachment: HDFS-6735-v8.txt

One more update. I noticed that the lock in ShortCircuitCache is taking more time than warranted. We have all these Precondition checks that prebuild a message string which is only used in the exceptional case. It is much better to use static strings with parameters, so that the message template is constant and the final string is only built when the check fails. That noticeably decreases the time spent in the ShortCircuitCache lock. Could do that in a separate jira, but it seemed easy enough. Please let me know what you think. Thanks.

> A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
> -----------------------------------------------------------------------------------------
>
>                 Key: HDFS-6735
>                 URL: https://issues.apache.org/jira/browse/HDFS-6735
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 3.0.0
>            Reporter: Liang Xie
>            Assignee: Lars Hofhansl
>         Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, HDFS-6735-v5.txt, HDFS-6735-v6.txt, HDFS-6735-v7.txt, HDFS-6735-v8.txt, HDFS-6735.txt
>
> In the current DFSInputStream impl, there are a couple of coarse-grained locks in the read/pread path, and it has become an HBase read latency pain point. In HDFS-6698 I made a minor patch against the first encountered lock, around getFileLength; indeed, after reading code and testing, it shows there are still other locks we could improve.
> In this jira I'll make a patch against the other locks, and a simple test case to show the issue and the improved result.
> This is important for HBase, since in the current HFile read path we issue all read()/pread() requests on the same DFSInputStream for one HFile. (A multi-stream solution is another story I had planned to do, but it will probably take more time than I expected.)
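The optimization described above, building the failure message only when the check actually fails, can be sketched in plain Java as follows. This is a minimal illustration of the idea, not the actual ShortCircuitCache code; Guava-style Preconditions with `%s` templates achieve the same effect:

```java
// Sketch: constant message template + args, formatted only on failure,
// versus eagerly concatenating a fresh String on every (usually passing) call.
public class LazyCheck {
    // Eager style: the caller builds the message String on every invocation.
    static void checkEager(boolean ok, String builtMessage) {
        if (!ok) throw new IllegalStateException(builtMessage);
    }

    // Lazy style: the template is a compile-time constant; String.format
    // only runs in the exceptional case, so the hot path allocates nothing.
    static void checkLazy(boolean ok, String template, Object... args) {
        if (!ok) throw new IllegalStateException(String.format(template, args));
    }

    public static void main(String[] args) {
        int slot = 3;
        checkLazy(slot >= 0, "slot %d out of range", slot); // passes: no message built
        try {
            checkLazy(slot < 2, "slot %d out of range", slot);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // formatted only here
        }
    }
}
```

(The varargs array is still allocated per call; in the hottest paths fixed-arity overloads avoid even that, which is what Guava's Preconditions do.)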
[jira] [Updated] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HDFS-6735:
--------------------------------
    Attachment: HDFS-6735-v7.txt

So here's the final one (with the findbugs tweak back in).
[jira] [Updated] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HDFS-6735:
--------------------------------
    Attachment: (was: HDFS-6735-v6.txt)
[jira] [Updated] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HDFS-6735:
--------------------------------
    Attachment: (was: HDFS-6735-v6.txt)
[jira] [Updated] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HDFS-6735:
--------------------------------
    Attachment: HDFS-6735-v6.txt

Trying to get another build. The artifacts of the previous one are gone for some reason.
[jira] [Assigned] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl reassigned HDFS-6735:
-----------------------------------
    Assignee: Lars Hofhansl  (was: Liang Xie)
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14227155#comment-14227155 ]

Lars Hofhansl commented on HDFS-6735:
-------------------------------------

So, to be specific: the improvement I see above is still there. It's just that the next thing to tackle is the ShortCircuitCache.
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14227148#comment-14227148 ]

Lars Hofhansl commented on HDFS-6735:
-------------------------------------

Tested -v6 with HBase. Still good from the DFSInputStream angle. I do see now that much more time is spent in ShortCircuitCache.fetchOrCreate and unref (rechecked that this is true for -v3 as well). It's still better, but the can is kicked down the road a bit.
[jira] [Updated] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HDFS-6735:
--------------------------------
    Attachment: HDFS-6735-v6.txt

Updated patch. The findbugs tweak is still necessary. Locking was correct before; findbugs does not seem to realize that all references to cachingStrategy are guarded by the infoLock. I'll run a 2.4.1 version of this patch against HBase again.
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14227031#comment-14227031 ]

Lars Hofhansl commented on HDFS-6735:
-------------------------------------

Per my comment above, my preference would still be to just make the cachingStrategy reference volatile in DFSInputStream. It is immutable, and hence the volatile reference makes access safe in all cases without any locking. The same is true for fileEncryptionInfo, btw (immutable already, just needs a volatile reference, no locking needed at all). I'll make a new patch.
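The pattern argued for here, an immutable value published through a volatile field so that readers need no lock, can be sketched like this. The class and field names are illustrative stand-ins, not the actual DFSClient types:

```java
// Sketch: safe publication of immutable state via a volatile reference.
public class StreamInfo {
    // Immutable value object, analogous to CachingStrategy/FileEncryptionInfo:
    // all fields final, fully initialized in the constructor, never mutated.
    static final class Strategy {
        final boolean dropBehind;
        final long readahead;
        Strategy(boolean dropBehind, long readahead) {
            this.dropBehind = dropBehind;
            this.readahead = readahead;
        }
    }

    // volatile guarantees any reader sees a fully constructed Strategy;
    // combined with immutability, no synchronized block is needed anywhere.
    private volatile Strategy strategy = new Strategy(false, 0L);

    Strategy getStrategy() { return strategy; }       // lock-free read
    void setStrategy(Strategy s) { strategy = s; }    // single volatile write

    public static void main(String[] args) {
        StreamInfo info = new StreamInfo();
        info.setStrategy(new Strategy(true, 4096L));
        System.out.println(info.getStrategy().readahead); // 4096
    }
}
```

To "modify" the strategy a writer constructs a new Strategy and swaps the reference; readers observe either the old or the new object, never a half-updated one.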
[jira] [Updated] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HDFS-6735:
--------------------------------
    Attachment: HDFS-6735-v6.txt

Thanks [~ste...@apache.org]. New patch with findbugs tweak.
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223901#comment-14223901 ]

Lars Hofhansl commented on HDFS-6735:
-------------------------------------

The remaining findbugs warning is due to cachingStrategy. I am 100% sure that the locking is correct; every single reference to cachingStrategy is guarded by the infoLock. This should be good to go (happy to squash the bogus findbugs warning if somebody has a suggestion how). The findbugs website states this for IS2_INCONSISTENT_SYNC:
{quote}
Note that there are various sources of inaccuracy in this detector; for example, the detector cannot statically detect all situations in which a lock is held. Also, even when the detector is accurate in distinguishing locked vs. unlocked accesses, the code in question may still be correct.
{quote}
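One common way to squash a known-false IS2_INCONSISTENT_SYNC warning is a findbugs exclude filter scoped to the exact class, field, and bug pattern. A sketch of such an entry; the filter file location is whatever the build already wires in, and this entry is a suggestion rather than what was committed:

```xml
<!-- findbugs exclude filter: suppress the inconsistent-sync warning for
     DFSInputStream.cachingStrategy, which findbugs cannot see is always
     guarded by infoLock. -->
<FindBugsFilter>
  <Match>
    <Class name="org.apache.hadoop.hdfs.DFSInputStream"/>
    <Field name="cachingStrategy"/>
    <Bug pattern="IS2_INCONSISTENT_SYNC"/>
  </Match>
</FindBugsFilter>
```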
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14222673#comment-14222673 ]

Lars Hofhansl commented on HDFS-6735:
-------------------------------------

s/since we never get into that if block if we coming from a called synchronized/since we *only* get into that if block if we coming from a caller synchronized/
[jira] [Updated] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HDFS-6735:
--------------------------------
    Attachment: HDFS-6735-v5.txt

Looked through the findbugs warnings for DFSInputStream:
* indeed currentNode was wrongly synchronized (it was so even before the patch). In getCurrentDataNode I had added synchronized(infoLock), but it should just be synchronized, as currentNode is seek+read state.
* added a synchronized block in getBlockAt around the access to pos, blockEnd, currentLocatedBlock. As explained in the comment, that is not needed, since we never get into that if block if we coming from a called synchronized. But if that is so, the extra synchronized won't hurt, and it should make findbugs happy.
[jira] [Updated] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HDFS-6735:
--------------------------------
    Attachment: HDFS-6735-v4.txt
[jira] [Updated] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HDFS-6735: Attachment: (was: HDFS-6735-v4.txt)
[jira] [Updated] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HDFS-6735: Attachment: HDFS-6735-v4.txt New patch: * added synchronized back to tryReadZeroCopy * renamed sharedLock to infoLock * this time did all the correct indentation - harder to review, but this should be committable as is * surrounded every reference to cachingStrategy with synchronized(infoLock) {...}, removed volatile. Looking at this again, we can be better about safe publication with immutable state and avoid some of the locks. For example, FileEncryptionInfo and CachingStrategy are already immutable and can be handled 100% safely by just a volatile reference; most of the LocatedBlocks state is also immutable, and for those parts we can avoid the locks as well. Immutable state is easier to reason about and more efficient (volatile still places read and write memory fences, but that is cheaper than synchronized). Can do that later :)
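The "volatile reference to an immutable object" pattern described above can be sketched as follows. This is a minimal illustration of the publication technique, not HDFS code; ReadStrategy is a hypothetical stand-in for the real CachingStrategy class:

```java
// Safe publication via a volatile reference to an immutable object.
// ReadStrategy is a hypothetical stand-in for HDFS's CachingStrategy.
final class ReadStrategy {
    final boolean dropBehind;  // final fields make the object safely publishable
    final long readahead;
    ReadStrategy(boolean dropBehind, long readahead) {
        this.dropBehind = dropBehind;
        this.readahead = readahead;
    }
}

class StreamState {
    // Writers swap in a whole new immutable object; readers see either the old
    // or the new one, never a half-updated mix. No synchronized needed.
    private volatile ReadStrategy strategy = new ReadStrategy(false, 4096);

    void setReadahead(long bytes) {
        ReadStrategy old = strategy;
        strategy = new ReadStrategy(old.dropBehind, bytes); // copy-on-write swap
    }

    long readahead() {
        return strategy.readahead; // a single volatile read, no lock
    }
}
```

The volatile read/write still implies memory fences, as the comment notes, but it is cheaper than taking a monitor on every access.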
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221839#comment-14221839 ] Lars Hofhansl commented on HDFS-6735: - Thanks [~cmccabe]. I'll put the synchronized back, do the correct indentation, and name the new lock differently. I'll also look through the other synchronized modifiers that I had removed from private methods, where it makes sense. On the indentation... I completely agree. It's hard to review - sometimes I apply HBase patches locally just so that I can do a git diff -b to review them without the whitespace, which is a pain. And if not done in all branches, then cherry-picking a patch becomes annoying, etc., etc. Thanks again for looking! New patch upcoming.
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219075#comment-14219075 ] Lars Hofhansl commented on HDFS-6735: - Apologies for the spam... I have a backport of this to branch-2.4 in case anybody is interested.
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219074#comment-14219074 ] Lars Hofhansl commented on HDFS-6735: - re: tryReadZeroCopy. Removing the synchronization is fine, because it is only called from (stateful) read(...), and pos is only used in the stateful read path and hence needs to be guarded by the lock on this only.
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219072#comment-14219072 ] Lars Hofhansl commented on HDFS-6735: - Thanks [~cmccabe]. "infoLock" is better. I'll fix the indentation later. Let me have a look at tryReadZeroCopy again. I had mapped out all members and which methods use what, and concluded the synchronized wasn't needed; quite possibly I made a mistake. Another locking option is not to synchronize on this at all, but to have two locks ("streamLock" and "pLock", or whatever good names would be). That way the intent might be more explicit. Yet another option would be to disentangle the two APIs by subclassing or delegation (since the issue really is that we have state for two different modes of operation in the same class); that'd be a bigger change, though. Meanwhile in HBase land: Tested this with HBase and observed with a sampler that all delays internal to DFSInputStream are gone, which is nice. I committed a change to HBase to allow us to (1) have compactions use their own input streams so they do not interfere with user scans over the same files and (2) optionally force preads for all user scans. See HBASE-12411. Especially with #2 I see nice speedups for many concurrent scanners, essentially up to what my disks can sustain, but a 50% slowdown for a single scanner per file - which is expected, as we're not benefiting from prefetching now.
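The two-lock option floated above - a dedicated lock serializing stateful seek+read, plus a second lock for the metadata both paths share - can be sketched roughly like this. The class and field names are illustrative, not the actual DFSInputStream members:

```java
// Illustrative two-lock split: stateful read() serializes on the stream lock
// (the instance monitor), positional pread() only touches shared metadata
// guarded by infoLock, so a long-running read() no longer blocks it.
class SketchInputStream {
    private final Object infoLock = new Object(); // guards metadata shared by both paths
    private long pos;        // seek+read state, guarded by the stream lock (this)
    private long fileLength; // shared metadata, guarded by infoLock

    SketchInputStream(long fileLength) { this.fileLength = fileLength; }

    // Stateful read: holds the stream lock; may briefly take infoLock inside.
    // Lock order is always this -> infoLock, never the reverse, so no deadlock.
    synchronized int read(byte[] buf) {
        long len;
        synchronized (infoLock) { len = fileLength; }
        int n = (int) Math.min(buf.length, len - pos);
        pos += Math.max(n, 0);
        return n;
    }

    // Positional read: never takes the stream lock; only infoLock is required.
    int pread(long position, byte[] buf) {
        long len;
        synchronized (infoLock) { len = fileLength; }
        if (position >= len) return -1;
        return (int) Math.min(buf.length, len - position);
    }
}
```

The fixed lock-acquisition order (stream lock first, then infoLock) is what keeps the two-lock scheme deadlock-free.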
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218802#comment-14218802 ] Lars Hofhansl commented on HDFS-6735: - I ran TestByteArrayManager as well as all tests derived from TestParallelReadUtil. All pass locally. Will check out the findbugs warning and do a real-life test with HBase (with this patch on top of the latest 2.4). Any recommendation on what else I should test?
[jira] [Updated] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HDFS-6735: Attachment: HDFS-6735-v3.txt I classified the state in DFSInputStream into state used by read only and state used by both read and pread. With that, here's a new proposed patch. * makes LocatedBlocks immutable (which was intended, it seems) * pread no longer affects currentNode (that was unintended, I think) * guards state shared between read and pread with an extra sharedLock (the state used by read only is still guarded by a lock on this, which we need to take anyway to avoid concurrent stateful reads against the same input stream) * removed synchronized from all private methods that are only called from methods already synchronized (good practice anyway) * makes cachingStrategy volatile (made more sense than locking there) * should be free of deadlocks (we never acquire the lock on this with sharedLock held, but the reverse is possible) * pos, blockEnd, currentLocatedBlock are not updated in getBlockAt unless called on behalf of read (not for pread, hence locking on this is not needed there) I have not tested this yet. Please have a careful look and let me know what you think. We might want to further disentangle the mixed state. (And just maybe the best solution would be for HBase to have one input stream for each thread doing read and one for all threads doing preads - and not do any of this...?)
[jira] [Commented] (HDFS-6698) try to optimize DFSInputStream.getFileLength()
[ https://issues.apache.org/jira/browse/HDFS-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195409#comment-14195409 ] Lars Hofhansl commented on HDFS-6698: - Yeah, let's combine these. We can close this one and do the work in HDFS-6735. I'm with you on volatile: it only guarantees visibility (via memory barriers) but doesn't control concurrent access. Things should be final (immutable) or locked correctly - volatile alone is rarely enough. Using a separate lock for touching DFSInputStream#locatedBlocks seems like the right approach to me. > try to optimize DFSInputStream.getFileLength() > -- > > Key: HDFS-6698 > URL: https://issues.apache.org/jira/browse/HDFS-6698 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: 3.0.0 >Reporter: Liang Xie >Assignee: Liang Xie > Attachments: HDFS-6698.txt, HDFS-6698.txt, HDFS-6698v2.txt, > HDFS-6698v2.txt, HDFS-6698v3.txt > > > HBase prefers to invoke read() when serving scan requests, and pread() when serving get requests, because pread() holds almost no locks. > Let's imagine there's a read() running. Because the definition is: > {code} > public synchronized int read > {code} > no other read() request can run concurrently; this is known. But a pread() cannot run either, because: > {code} > public int read(long position, byte[] buffer, int offset, int length) > throws IOException { > // sanity checks > dfsClient.checkOpen(); > if (closed) { > throw new IOException("Stream closed"); > } > failures = 0; > long filelen = getFileLength(); > {code} > getFileLength() also needs the lock. So we need to figure out a lock-free impl of getFileLength() before the HBase multi-stream feature is done. > [~saint@gmail.com] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193629#comment-14193629 ] Lars Hofhansl commented on HDFS-6735: - As described in HDFS-6698, the potential performance gains for something like HBase are substantial. I agree it's better to keep LocatedBlocks not threadsafe and require callers to lock accordingly. I've not seen fetchAt in a hot path (at least not from HBase usage patterns). seek + read (non-positional) cannot be done concurrently, agreed. pread should be possible, though. How should we move forward on this? Seems important. :) Also open to suggestions about how to fix things in HBase (see the last comment in HDFS-6698 about how HBase handles things, and how limited concurrency "within" an InputStream is an issue).
[jira] [Commented] (HDFS-6698) try to optimize DFSInputStream.getFileLength()
[ https://issues.apache.org/jira/browse/HDFS-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192993#comment-14192993 ] Lars Hofhansl commented on HDFS-6698: - Now... I am not saying that we do not have work to do in HBase: * we're using one reader per HFile * after a major compaction we have a single store file per column family (that file can be up to 20GB in size) * we allow one thread to use seek+read on that reader; other concurrent scanners fall back to pread (see HBASE-7336). For my test I did this: * my test table had 2^25 (~32m) rows, in two regions, about 1GB on disk * I tested this with Phoenix, which can break a query into parts and execute scans for the parts (that's where the parallel scanning on the same readers comes into play) * I have short-circuit reading enabled * all data is in the OS cache (HBase block cache not used) This is not an uncommon scenario, though. The original poster cited scans (seek+read) + gets (pread) as a problem. In either case, I'll post an updated patch to HDFS-6735 and we can take it from there.
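The lock-free getFileLength() the issue description asks for falls out of the same immutable-snapshot idea: keep a volatile reference to an immutable block map and compute the length from whichever snapshot the reader happens to see. A hypothetical sketch, with BlockMap standing in for the real LocatedBlocks:

```java
// Hypothetical sketch: lock-free file-length reads via an immutable snapshot.
// BlockMap stands in for LocatedBlocks; the field split is illustrative.
final class BlockMap {
    final long lengthOfCompleteBlocks;
    final long lastBlockLength; // may grow while the file is under construction
    BlockMap(long complete, long last) {
        this.lengthOfCompleteBlocks = complete;
        this.lastBlockLength = last;
    }
    long fileLength() { return lengthOfCompleteBlocks + lastBlockLength; }
}

class LengthTracker {
    private volatile BlockMap blocks; // swapped wholesale on refresh
    LengthTracker(BlockMap initial) { blocks = initial; }

    // pread() can call this concurrently with read(): one volatile load, no lock.
    long getFileLength() { return blocks.fileLength(); }

    // Called e.g. after fetching fresh block locations from the NameNode.
    void refresh(BlockMap updated) { blocks = updated; }
}
```

A concurrent pread() either sees the length before a refresh or after it, but never a torn intermediate value, because each snapshot is immutable.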
[jira] [Commented] (HDFS-6698) try to optimize DFSInputStream.getFileLength()
[ https://issues.apache.org/jira/browse/HDFS-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192684#comment-14192684 ] Lars Hofhansl commented on HDFS-6698: - Pulling in selected changes from HDFS-6735 yields a HUGE speed improvement. A scan that took 16s to execute now finishes in 9s. (The setup is such that all data fits into the OS cache and the HBase cache is disabled, to isolate this code path.)
[jira] [Commented] (HDFS-6698) try to optimize DFSInputStream.getFileLength()
[ https://issues.apache.org/jira/browse/HDFS-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192550#comment-14192550 ] Lars Hofhansl commented on HDFS-6698: - Indeed I now find that the time is spent in {{getBlockRange()}} :) I'll look at HDFS-6735 and include fixes from there.
[jira] [Commented] (HDFS-6698) try to optimize DFSInputStream.getFileLength()
[ https://issues.apache.org/jira/browse/HDFS-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192209#comment-14192209 ] Lars Hofhansl commented on HDFS-6698: - We need to make sure we do not just kick the can down the road; there are more synchronized methods called from within read. I'll do some testing and report back.
[jira] [Updated] (HDFS-6698) try to optimize DFSInputStream.getFileLength()
[ https://issues.apache.org/jira/browse/HDFS-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HDFS-6698: Attachment: HDFS-6698v3.txt I just ran into this as well while debugging why HBase does not benefit from Snappy compression as much as it should. It turns out a non-trivial amount of time (as determined by a sampler, not an instrumenting profiler) is spent in this method. To be safe I'd probably also turn LocatedBlocks into an immutable object (well, except for blocks) - see the attached patch. All members of LocatedBlocks are safely published now. With that, I don't think this patch can do any harm.
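The "immutable except for blocks" shape described above - final scalar fields plus a defensively copied, unmodifiable list - might look like this in outline. The names are illustrative (a plain String list stands in for List&lt;LocatedBlock&gt;), not the actual LocatedBlocks fields:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Outline of an immutable located-blocks holder: all fields final, the list
// defensively copied and wrapped so callers cannot mutate shared state.
final class BlockListing {
    private final long fileLength;
    private final boolean underConstruction;
    private final List<String> blockIds; // stand-in for List<LocatedBlock>

    BlockListing(long fileLength, boolean underConstruction, List<String> blockIds) {
        this.fileLength = fileLength;
        this.underConstruction = underConstruction;
        // Copy, then wrap: later changes to the caller's list are not seen here,
        // and add()/remove() on the returned view throw.
        this.blockIds = Collections.unmodifiableList(new ArrayList<>(blockIds));
    }

    long getFileLength() { return fileLength; }
    boolean isUnderConstruction() { return underConstruction; }
    List<String> getBlockIds() { return blockIds; }
}
```

Because every field is final and set in the constructor, an instance can be handed across threads through a volatile reference without any further locking.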
[jira] [Commented] (HDFS-5042) Completed files lost after power failure
[ https://issues.apache.org/jira/browse/HDFS-5042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158677#comment-14158677 ] Lars Hofhansl commented on HDFS-5042: - We should study the perf impact. Previously I found that sync-on-close severely impacted file creation time - unless sync-behind-writes is also enabled. (Interestingly sync-behind-writes should not cause any performance detriment as we're dealing with immutable files, and hence delaying writing these dirty blocks to disk in the hopes that they'd be updated before we do so is pointless anyway). > Completed files lost after power failure > > > Key: HDFS-5042 > URL: https://issues.apache.org/jira/browse/HDFS-5042 > Project: Hadoop HDFS > Issue Type: Bug > Environment: ext3 on CentOS 5.7 (kernel 2.6.18-274.el5) >Reporter: Dave Latham >Priority: Critical > > We suffered a cluster wide power failure after which HDFS lost data that it > had acknowledged as closed and complete. > The client was HBase which compacted a set of HFiles into a new HFile, then > after closing the file successfully, deleted the previous versions of the > file. The cluster then lost power, and when brought back up the newly > created file was marked CORRUPT. > Based on reading the logs it looks like the replicas were created by the > DataNodes in the 'blocksBeingWritten' directory. Then when the file was > closed they were moved to the 'current' directory. After the power cycle > those replicas were again in the blocksBeingWritten directory of the > underlying file system (ext3). When those DataNodes reported in to the > NameNode it deleted those replicas and lost the file. > Some possible fixes could be having the DataNode fsync the directory(s) after > moving the block from blocksBeingWritten to current to ensure the rename is > durable or having the NameNode accept replicas from blocksBeingWritten under > certain circumstances. 
> Log snippets from RS (RegionServer), NN (NameNode), DN (DataNode): > {noformat} > RS 2013-06-29 11:16:06,812 DEBUG org.apache.hadoop.hbase.util.FSUtils: > Creating > file=hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c > with permission=rwxrwxrwx > NN 2013-06-29 11:16:06,830 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > NameSystem.allocateBlock: > /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c. > blk_1395839728632046111_357084589 > DN 2013-06-29 11:16:06,832 INFO > org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block > blk_1395839728632046111_357084589 src: /10.0.5.237:14327 dest: > /10.0.5.237:50010 > NN 2013-06-29 11:16:11,370 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > NameSystem.addStoredBlock: blockMap updated: 10.0.6.1:50010 is added to > blk_1395839728632046111_357084589 size 25418340 > NN 2013-06-29 11:16:11,370 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > NameSystem.addStoredBlock: blockMap updated: 10.0.6.24:50010 is added to > blk_1395839728632046111_357084589 size 25418340 > NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > NameSystem.addStoredBlock: blockMap updated: 10.0.5.237:50010 is added to > blk_1395839728632046111_357084589 size 25418340 > DN 2013-06-29 11:16:11,385 INFO > org.apache.hadoop.hdfs.server.datanode.DataNode: Received block > blk_1395839728632046111_357084589 of size 25418340 from /10.0.5.237:14327 > DN 2013-06-29 11:16:11,385 INFO > org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 2 for block > blk_1395839728632046111_357084589 terminating > NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: Removing > lease on file > /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c > from client DFSClient_hb_rs_hs745,60020,1372470111932 > NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: DIR* > 
NameSystem.completeFile: file > /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c > is closed by DFSClient_hb_rs_hs745,60020,1372470111932 > RS 2013-06-29 11:16:11,393 INFO org.apache.hadoop.hbase.regionserver.Store: > Renaming compacted file at > hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c > to > hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/n/6e0cc30af6e64e56ba5a539fdf159c4c > RS 2013-06-29 11:16:11,505 INFO org.apache.hadoop.hbase.regionserver.Store: > Completed major compaction of 7 file(s) in n of > users-6,\x12\xBDp\xA3,1359426311784.b5b0820cde759ae68e333b2f4015bb7e. into > 6e0cc30af6e64e56ba5a539fdf159c4c, size=24.2m; total size for store is 24.2m > --- CRASH, RESTART - > NN 2013-06-29 12:01:19,743 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > NameSystem.addStoredBlock: addStoredBlock request received for > blk_1395839728632046111_357084589 on 10.0.6.1:50010 size 21978112 but was > rejected: Reported as block being written but is a block of closed file. > NN 2013-06-29 12:01:19,743 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > NameSystem.addToInvalidates: blk_1395839728632046111 i
[jira] [Commented] (HDFS-5042) Completed files lost after power failure
[ https://issues.apache.org/jira/browse/HDFS-5042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158434#comment-14158434 ] Lars Hofhansl commented on HDFS-5042: - Cool. That should work.
[jira] [Commented] (HDFS-5042) Completed files lost after power failure
[ https://issues.apache.org/jira/browse/HDFS-5042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158235#comment-14158235 ] Lars Hofhansl commented on HDFS-5042: - Thanks Luke. I meant to say: (1) finish writing the block. (2) Move it. (3) fsync or fdatasync the block file in the new location. (We'd just change the order of moving vs. fsync.) The rename would still be atomic (the block file is written completely before we move it), but doing the fsync after should order the metadata commits correctly assuming write barriers. Then again, the write and the move would be two different transactions as far as the fs is concerned. Agree it's cleanest if we in fact sync both actions.
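The ordering discussed in the comment above (finish the block file, make it durable, rename it into place, then make the rename itself durable) is the classic POSIX durable-rename pattern. The sketch below is an editor's illustration of that pattern, not HDFS's actual DataNode code; the function name and layout are hypothetical:

```python
import os

def durable_move(src, dst_dir):
    """Illustrative crash-durable move (POSIX/Linux semantics):
    1. fsync the finished file so its data and metadata are on disk,
    2. atomically rename it into the destination directory,
    3. fsync the destination directory so the new directory entry
       (the rename) survives a power failure."""
    fd = os.open(src, os.O_RDONLY)
    try:
        os.fsync(fd)                      # step 1: file contents durable
    finally:
        os.close(fd)
    dst = os.path.join(dst_dir, os.path.basename(src))
    os.rename(src, dst)                   # step 2: atomic within one fs
    dfd = os.open(dst_dir, os.O_RDONLY)
    try:
        os.fsync(dfd)                     # step 3: directory entry durable
    finally:
        os.close(dfd)
    return dst
```

Note that without step 3 the rename lives only in the directory's in-memory metadata, which is exactly how the blocks reappeared in blocksBeingWritten after the power cycle.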
[jira] [Commented] (HDFS-5042) Completed files lost after power failure
[ https://issues.apache.org/jira/browse/HDFS-5042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157328#comment-14157328 ] Lars Hofhansl commented on HDFS-5042: - Is this a problem when enabling write barriers on the DNs? EXT3 has them off by default. In that case we might need to move the file in place first and then fsync the file, that should force the meta updates in order... I'm sure that'd cause other problems.
[jira] [Commented] (HDFS-4455) Datanode sometimes gives up permanently on Namenode in HA setup
[ https://issues.apache.org/jira/browse/HDFS-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001009#comment-14001009 ] Lars Hofhansl commented on HDFS-4455: - Looked at HDFS-2882. I agree that should fix this issue. > Datanode sometimes gives up permanently on Namenode in HA setup > --- > > Key: HDFS-4455 > URL: https://issues.apache.org/jira/browse/HDFS-4455 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, ha >Affects Versions: 2.0.2-alpha >Reporter: Lars Hofhansl >Assignee: Juan Yu >Priority: Critical > > Today we got ourselves into a situation where we hard killed the cluster > (kill -9 across the board on all processes) and upon restarting all DNs would > permanently give up on one of the NNs in our two NN HA setup (using QJM). > The HA setup is correct (prior to this we failed over the NNs many times for > testing). Bouncing the DNs resolved the problem. > In the logs I see this exception: > {code} > 2013-01-29 23:32:49,461 FATAL datanode.DataNode - Initialization failed for > block pool Block pool BP-1852726028--1358813649047 (storage id > DS-60505003--50010-1353106051747) service to /:8020 > java.io.IOException: Failed on local exception: java.io.IOException: Response > is null.; Host Details : local host is: "/"; destination host is: > "":8020; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:759) > at org.apache.hadoop.ipc.Client.call(Client.java:1164) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) > at $Proxy10.registerDatanode(Unknown Source) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164) > at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83) > at $Proxy10.registerDatanode(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:149) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:619) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:221) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:661) > at java.lang.Thread.run(Thread.java:662) > Caused by: java.io.IOException: Response is null. > at > org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:885) > at org.apache.hadoop.ipc.Client$Connection.run(Client.java:813) > 2013-01-29 23:32:49,463 WARN datanode.DataNode - Ending block pool service > for: Block pool BP-1852726028--1358813649047 (storage id > DS-60505003--50010-1353106051747) service to /:8020 > {code} > So somehow in BPServiceActor.connectToNNAndHandshake() we made it all the way > to register(). Then failed in bpNamenode.registerDatanode(bpRegistration) > with an IOException, which is not caught and causes the block pool service to fail > as a whole. > No doubt that was caused by one of the NNs being in a weird state. While that > happened the active NN claimed that the FS was corrupted and stayed in safe > mode, and DNs only registered with the standby NN. Failing over to the 2nd NN > and then restarting the first NN and failing over did not change that. > No amount of bouncing/failing over the HA NNs would have the DNs reconnect to > one of the NNs. > In BPServiceActor.register(), should we catch IOException instead of > SocketTimeoutException? That way it would continue to retry and eventually > connect to the NN. -- This message was sent by Atlassian JIRA (v6.2#6252)
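The fix suggested at the end of the report (catch the broad IOException rather than only SocketTimeoutException, so registration keeps retrying) can be sketched as a retry loop. This is an editor's illustration using Python's OSError as the analogue of Java's IOException; `register` is a hypothetical callable standing in for BPServiceActor.register():

```python
import time

def register_with_retries(register, retries=5, delay=0.0):
    """Retry registration on any I/O failure instead of only on timeouts,
    so a transient NN problem does not permanently end the service."""
    for attempt in range(retries):
        try:
            return register()
        except OSError:                  # broad catch: any I/O failure retries
            if attempt == retries - 1:
                raise                    # give up only after all attempts
            time.sleep(delay)
```

With the narrow catch, the "Response is null" IOException above escapes the loop on the first attempt and the block pool service ends for good.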
[jira] [Resolved] (HDFS-4455) Datanode sometimes gives up permanently on Namenode in HA setup
[ https://issues.apache.org/jira/browse/HDFS-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HDFS-4455. - Resolution: Implemented -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-5571) postScannerFilterRow consumes a lot of CPU in tall table scans
Lars Hofhansl created HDFS-5571: --- Summary: postScannerFilterRow consumes a lot of CPU in tall table scans Key: HDFS-5571 URL: https://issues.apache.org/jira/browse/HDFS-5571 Project: Hadoop HDFS Issue Type: Bug Reporter: Lars Hofhansl Continuing my profiling quest, I find that when scanning a tall table (and filtering everything on the server) a quarter of the time is now spent in the postScannerFilterRow coprocessor hook. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5461) fallback to non-ssr(local short circuit reads) while oom detected
[ https://issues.apache.org/jira/browse/HDFS-5461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13821560#comment-13821560 ] Lars Hofhansl commented on HDFS-5461: - The issue is that the JDK only collects direct byte buffers during a full GC, and there are different limits for the direct buffer and the general heap. HBase keeps a reader open for each store file and thus we end up with a lot of direct memory used. I was actually curious about 1mb as the default size; it seems even as little as 8kb should be OK. > fallback to non-ssr(local short circuit reads) while oom detected > - > > Key: HDFS-5461 > URL: https://issues.apache.org/jira/browse/HDFS-5461 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.0.0, 2.2.0 >Reporter: Liang Xie >Assignee: Liang Xie > Attachments: HDFS-5461.txt > > > Currently, the DirectBufferPool used by the ssr feature doesn't seem to have an > upper-bound limit except the DirectMemory VM option. So there's a risk of > encountering a direct memory OOM; see HBASE-8143 for example. > IMHO, maybe we could improve it a bit: > 1) detect OOM or reaching a configured upper limit from the caller, then fall back to > non-ssr > 2) add a new metric for the current raw consumed direct memory size. -- This message was sent by Atlassian JIRA (v6.1#6144)
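The bounded-pool idea proposed in the issue (cap total buffer memory; when the cap is hit, signal the caller to fall back to the non-short-circuit read path instead of allocating more) can be sketched as below. This is an editor's illustration of the pattern; the class and method names are hypothetical, not HDFS's DirectBufferPool API:

```python
class CappedBufferPool:
    """Buffer pool with an explicit upper bound on total allocated bytes.
    acquire() returns None once the cap is reached, telling the caller
    to fall back to the ordinary read path."""

    def __init__(self, buf_size, max_bytes):
        self.buf_size = buf_size
        self.max_bytes = max_bytes
        self.outstanding = 0       # total bytes ever allocated by this pool
        self.free = []             # buffers returned for reuse

    def acquire(self):
        if self.free:
            return self.free.pop()             # reuse before allocating
        if self.outstanding + self.buf_size > self.max_bytes:
            return None                        # cap reached: fall back
        self.outstanding += self.buf_size
        return bytearray(self.buf_size)

    def release(self, buf):
        self.free.append(buf)
```

A metric for `outstanding` would directly give the "current raw consumed direct memory size" asked for in point 2.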
[jira] [Commented] (HDFS-2834) ByteBuffer-based read API for DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13733186#comment-13733186 ] Lars Hofhansl commented on HDFS-2834: - Just for reference with many open files one can easily OOM on direct buffer memory. See: HBASE-8143. 1MB seems to be a rather large default. > ByteBuffer-based read API for DFSInputStream > > > Key: HDFS-2834 > URL: https://issues.apache.org/jira/browse/HDFS-2834 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client, performance >Reporter: Henry Robinson >Assignee: Henry Robinson > Fix For: 2.0.2-alpha > > Attachments: HDFS-2834.10.patch, HDFS-2834.11.patch, > HDFS-2834.3.patch, HDFS-2834.4.patch, HDFS-2834.5.patch, HDFS-2834.6.patch, > HDFS-2834.7.patch, HDFS-2834.8.patch, HDFS-2834.9.patch, > hdfs-2834-libhdfs-benchmark.png, HDFS-2834-no-common.patch, HDFS-2834.patch, > HDFS-2834.patch > > > The {{DFSInputStream}} read-path always copies bytes into a JVM-allocated > {{byte[]}}. Although for many clients this is desired behaviour, in certain > situations, such as native-reads through libhdfs, this imposes an extra copy > penalty since the {{byte[]}} needs to be copied out again into a natively > readable memory area. > For these cases, it would be preferable to allow the client to supply its own > buffer, wrapped in a {{ByteBuffer}}, to avoid that final copy overhead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
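The API change described above, letting the client supply its own buffer so the read path avoids the extra copy into a freshly allocated byte[], has a direct Python analogue in readinto(). This is an editor's sketch of that analogue, not DFSInputStream's actual signature:

```python
def read_into(stream, buf):
    """Fill a caller-supplied buffer from a file-like object supporting
    readinto(); the data lands directly in `buf` with no intermediate
    bytes object. Returns the number of bytes read (short on EOF)."""
    view = memoryview(buf)          # zero-copy window over the buffer
    total = 0
    while total < len(buf):
        n = stream.readinto(view[total:])
        if not n:                   # EOF
            break
        total += n
    return total
```

For a native caller (the libhdfs case in the description), the supplied buffer could be backed by natively readable memory, which is exactly the copy the JIRA wants to eliminate.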
[jira] [Updated] (HDFS-1783) Ability for HDFS client to write replicas in parallel
[ https://issues.apache.org/jira/browse/HDFS-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HDFS-1783: Assignee: (was: Lars Hofhansl) > Ability for HDFS client to write replicas in parallel > - > > Key: HDFS-1783 > URL: https://issues.apache.org/jira/browse/HDFS-1783 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Reporter: dhruba borthakur > Attachments: HDFS-1783-trunk.patch, HDFS-1783-trunk-v2.patch, > HDFS-1783-trunk-v3.patch, HDFS-1783-trunk-v4.patch, HDFS-1783-trunk-v5.patch > > > The current implementation of HDFS pipelines the writes to the three > replicas. This introduces some latency for realtime latency sensitive > applications. An alternate implementation that allows the client to write all > replicas in parallel gives much better response times to these applications. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
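The latency argument in HDFS-1783 is that a pipelined write pays roughly the sum of the per-replica transfer times, while a parallel write pays roughly the slowest single transfer. A minimal sketch of the parallel alternative, with hypothetical callables standing in for the per-DataNode writes:

```python
from concurrent.futures import ThreadPoolExecutor

def write_parallel(replicas, data):
    """Send `data` to all replicas concurrently instead of pipelining
    through replica 1 -> 2 -> 3. `replicas` is a list of callables, each
    standing in for one DataNode write; returns their results in order."""
    with ThreadPoolExecutor(max_workers=len(replicas)) as ex:
        futures = [ex.submit(r, data) for r in replicas]
        return [f.result() for f in futures]      # slowest replica gates latency
```

The trade-off is client-side bandwidth: the client now uploads the data N times itself rather than once into the pipeline.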
[jira] [Created] (HDFS-4455) Datanode sometimes gives up permanently on Namenode in HA setup
Lars Hofhansl created HDFS-4455: --- Summary: Datanode sometimes gives up permanently on Namenode in HA setup Key: HDFS-4455 URL: https://issues.apache.org/jira/browse/HDFS-4455 Project: Hadoop HDFS Issue Type: Bug Components: datanode Reporter: Lars Hofhansl -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4345) Release resources of unpoolable Decompressors
[ https://issues.apache.org/jira/browse/HDFS-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HDFS-4345: Status: Patch Available (was: Open) > Release resources of unpoolable Decompressors > - > > Key: HDFS-4345 > URL: https://issues.apache.org/jira/browse/HDFS-4345 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl >Priority: Minor > Fix For: 3.0.0 > > Attachments: HDFS-4345.txt > > > Found this when looking into HBASE-7435. > When a Decompressor is returned to the pool in CodecPool.java, we should > probably call end() on it to release its resources. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4345) Release resources of unpoolable Decompressors
[ https://issues.apache.org/jira/browse/HDFS-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HDFS-4345: Attachment: HDFS-4345.txt Here's a two-line change for that. (Also calls end() on compressors) > Release resources of unpoolable Decompressors > - > > Key: HDFS-4345 > URL: https://issues.apache.org/jira/browse/HDFS-4345 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl >Priority: Minor > Fix For: 3.0.0 > > Attachments: HDFS-4345.txt > > > Found this when looking into HBASE-7435. > When a Decompressor is returned to the pool in CodecPool.java, we should > probably call end() on it to release its resources. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-4345) Release resources of unpoolable Decompressors
Lars Hofhansl created HDFS-4345: --- Summary: Release resources of unpoolable Decompressors Key: HDFS-4345 URL: https://issues.apache.org/jira/browse/HDFS-4345 Project: Hadoop HDFS Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Fix For: 3.0.0 Found this when looking into HBASE-7435. When a Decompressor is returned to the pool in CodecPool.java, we should probably call end() on it to release its resources. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-4060) TestHSync#testSequenceFileSync failed
[ https://issues.apache.org/jira/browse/HDFS-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HDFS-4060. - Resolution: Duplicate This is fixed with the changes from HDFS-3979. > TestHSync#testSequenceFileSync failed > - > > Key: HDFS-4060 > URL: https://issues.apache.org/jira/browse/HDFS-4060 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Eli Collins > Labels: test-fail > > TestHSync#testSequenceFileSync failed in the pre commit run of HDFS-4055. > {noformat} > java.lang.AssertionError: Bad value for metric FsyncCount expected:<2> but > was:<1> > at org.junit.Assert.fail(Assert.java:91) > at org.junit.Assert.failNotEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:126) > at org.junit.Assert.assertEquals(Assert.java:470) > at > org.apache.hadoop.test.MetricsAsserts.assertCounter(MetricsAsserts.java:228) > at > org.apache.hadoop.hdfs.server.datanode.TestHSync.checkSyncMetric(TestHSync.java:46) > at > org.apache.hadoop.hdfs.server.datanode.TestHSync.checkSyncMetric(TestHSync.java:49) > at > org.apache.hadoop.hdfs.server.datanode.TestHSync.testSequenceFileSync(TestHSync.java:158) > {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3979) Fix hsync semantics
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494373#comment-13494373 ] Lars Hofhansl commented on HDFS-3979: - Thanks Nicholas. Luke, so the test you're looking for is to start 3 DNs, then have the write permanently fail at any of them, and in all cases have the hsync fail on the client, right? > Fix hsync semantics > --- > > Key: HDFS-3979 > URL: https://issues.apache.org/jira/browse/HDFS-3979 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 2.0.2-alpha >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Fix For: 2.0.3-alpha > > Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt, > hdfs-3979-v3.txt, hdfs-3979-v4.txt > > > See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver > is not on a synchronous path from the DFSClient, hence it is possible that a > DN loses data that it has already acknowledged as persisted to a client. > Edit: Spelling. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HDFS-3979: Attachment: hdfs-3979-v4.txt Updated patch with Nicholas' suggestion. I agree that the previous patch would have slowed all writes that reach the DN. We can't distinguish between an hflush from the client and a "normal" packet from the client. On the other hand this no longer deals with Luke's "kill -9" scenario (where a cluster management tool would kill -9 datanodes in parallel), but in the end no tool really should do that. > Fix hsync and hflush semantics. > --- > > Key: HDFS-3979 > URL: https://issues.apache.org/jira/browse/HDFS-3979 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node, hdfs client >Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt, > hdfs-3979-v3.txt, hdfs-3979-v4.txt > > > See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver > is not on a synchronous path from the DFSClient, hence it is possible that a > DN loses data that it has already acknowledged as persisted to a client. > Edit: Spelling. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490445#comment-13490445 ] Lars Hofhansl commented on HDFS-3979: - I'll make that change. > Fix hsync and hflush semantics. > --- > > Key: HDFS-3979 > URL: https://issues.apache.org/jira/browse/HDFS-3979 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node, hdfs client >Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt, hdfs-3979-v3.txt > > > See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver > is not on a synchronous path from the DFSClient, hence it is possible that a > DN loses data that it has already acknowledged as persisted to a client. > Edit: Spelling. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490113#comment-13490113 ] Lars Hofhansl commented on HDFS-3979: - Hi Kan, the only difference between v2 and v3 is that in v3 the "fsync" metric is updated after the actual sync to the FS (BlockReceiver.flushOrSync). This exposes the race condition we want to fix and makes TestHSync fail almost every run (the client returns from hsync before the datanode can update the metric). With the rest of this patch applied, this race is removed and TestHSync never fails. So now we have a test case for the race condition. [~vicaya] Do the existing tests, TestFiPipelines and TestFiHFlush, not cover the other scenarios you worry about? > Fix hsync and hflush semantics. > --- > > Key: HDFS-3979 > URL: https://issues.apache.org/jira/browse/HDFS-3979 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node, hdfs client >Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt, hdfs-3979-v3.txt > > > See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver > is not on a synchronous path from the DFSClient, hence it is possible that a > DN loses data that it has already acknowledged as persisted to a client. > Edit: Spelling. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4060) TestHSync#testSequenceFileSync failed
[ https://issues.apache.org/jira/browse/HDFS-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477584#comment-13477584 ] Lars Hofhansl commented on HDFS-4060: - The sync to disk is not actually on a synchronous path as seen from the client, so there is a short race in which the client returns before the metric has been updated. See HDFS-3979, which would fix the issue, but appears to be stuck in discussion about what extra tests it would need, if any. > TestHSync#testSequenceFileSync failed > - > > Key: HDFS-4060 > URL: https://issues.apache.org/jira/browse/HDFS-4060 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Eli Collins > Labels: test-fail > > TestHSync#testSequenceFileSync failed in the pre commit run of HDFS-4055. > {noformat} > java.lang.AssertionError: Bad value for metric FsyncCount expected:<2> but > was:<1> > at org.junit.Assert.fail(Assert.java:91) > at org.junit.Assert.failNotEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:126) > at org.junit.Assert.assertEquals(Assert.java:470) > at > org.apache.hadoop.test.MetricsAsserts.assertCounter(MetricsAsserts.java:228) > at > org.apache.hadoop.hdfs.server.datanode.TestHSync.checkSyncMetric(TestHSync.java:46) > at > org.apache.hadoop.hdfs.server.datanode.TestHSync.checkSyncMetric(TestHSync.java:49) > at > org.apache.hadoop.hdfs.server.datanode.TestHSync.testSequenceFileSync(TestHSync.java:158) > {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HDFS-3979: Attachment: hdfs-3979-v3.txt This little change makes TestHSync fail most of the time - without the rest of the patch, and never with this patch. (In HDFS-744 I had avoided this race by updating the sync metric first. I know that was a hack... By updating the metric last in BlockReceiver.flushOrSync, this race becomes apparent again). We do have pipeline tests that seem to verify correct pipeline behavior in the face of failures via fault injection: TestFiPipelines and TestFiHFlush. In terms of the API3/API4 discussion, I think we agree that hflush should follow API4, right? (otherwise we'd have unduly complex code) > Fix hsync and hflush semantics. > --- > > Key: HDFS-3979 > URL: https://issues.apache.org/jira/browse/HDFS-3979 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node, hdfs client >Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt, hdfs-3979-v3.txt > > > See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver > is not on a synchronous path from the DFSClient, hence it is possible that a > DN loses data that it has already acknowledged as persisted to a client. > Edit: Spelling. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470016#comment-13470016 ] Lars Hofhansl commented on HDFS-3979: - API4 is hflush (with change in OS buffers). That's an interesting discussion by itself. hsync'ing every edit in HBase is prohibitive. I have some simple numbers in HBASE-5954. I need to do that test again, though, with the sync_file_range changes in HDFS-2465 (that would hopefully do most of the data sync'ing asynchronously and only sync the last changes and metadata synchronously upon client request). Many applications do not need every edit to be guaranteed on disk, but have "sync points". That is what I am aiming for in HBase. The application will know the specific semantics. What is really important for HBase (IMHO) is that every block is synced to disk when it is closed. HBase constantly rewrites existing data via compactions, so without syncing, arbitrarily old data can be lost during a rack or DC outage. Lastly, we can play with this. For example only one of the replicas could sync to disk and the others just guarantee the data is in the OS buffers (API4.5 :) ). > Fix hsync and hflush semantics. > --- > > Key: HDFS-3979 > URL: https://issues.apache.org/jira/browse/HDFS-3979 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node, hdfs client >Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt > > > See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver > is not on a synchronous path from the DFSClient, hence it is possible that a > DN loses data that it has already acknowledged as persisted to a client. > Edit: Spelling. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
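The "sync points" idea described above can be illustrated with a small sketch. This is a hypothetical WAL wrapper, not HBase or HDFS code; `hflushCount`/`hsyncCount` stand in for real `hflush()`/`hsync()` calls on an output stream. Every edit is only hflush'd (data reaches the replicas' OS buffers), while a full hsync (data on disk) is issued every N edits, amortizing the expensive durable sync.

```java
public class SyncPointWalSketch {
    // Counters standing in for the real hflush()/hsync() stream calls.
    int hflushCount = 0;
    int hsyncCount = 0;

    private final int syncEvery;      // application-chosen sync-point interval
    private int editsSinceSync = 0;

    SyncPointWalSketch(int syncEvery) {
        this.syncEvery = syncEvery;
    }

    // Append one edit: always hflush (cheap, per-edit); hsync only at sync points.
    void append(byte[] edit) {
        hflushCount++;                // edit is visible on all replicas
        if (++editsSinceSync >= syncEvery) {
            hsyncCount++;             // sync point: force data to disk
            editsSinceSync = 0;
        }
    }

    public static void main(String[] args) {
        SyncPointWalSketch wal = new SyncPointWalSketch(5);
        for (int i = 0; i < 10; i++) {
            wal.append(new byte[]{(byte) i});
        }
        // 10 edits cost 10 hflushes but only 2 hsyncs.
        if (wal.hflushCount != 10 || wal.hsyncCount != 2) {
            throw new AssertionError("unexpected counts");
        }
        System.out.println(wal.hflushCount + " hflushes, " + wal.hsyncCount + " hsyncs");
    }
}
```

The trade-off matches the comment: edits between the last sync point and a crash that defeats hflush (e.g. power loss on all replicas) can be lost, which the application accepts in exchange for not paying the fsync cost on every edit.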
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469827#comment-13469827 ] Lars Hofhansl commented on HDFS-3979: - Thanks Luke and Kan. I'll come up with a test once I get some spare cycles (quite busy with HBase atm). > Fix hsync and hflush semantics. > --- > > Key: HDFS-3979 > URL: https://issues.apache.org/jira/browse/HDFS-3979 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node, hdfs client >Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt > > > See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver > is not on a synchronous path from the DFSClient, hence it is possible that a > DN loses data that it has already acknowledged as persisted to a client. > Edit: Spelling. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469671#comment-13469671 ] Lars Hofhansl commented on HDFS-3979: - I've seen that race when I wrote a test for HDFS-744. I "fixed" it there by updating the metrics first... Ugh :) I think I can make a test that fails at least with reasonable probability with the current semantics. The race between ack and write errors should be reduced (eliminated) with this patch. > Fix hsync and hflush semantics. > --- > > Key: HDFS-3979 > URL: https://issues.apache.org/jira/browse/HDFS-3979 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node, hdfs client >Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt > > > See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver > is not on a synchronous path from the DFSClient, hence it is possible that a > DN loses data that it has already acknowledged as persisted to a client. > Edit: Spelling. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13467882#comment-13467882 ] Lars Hofhansl commented on HDFS-3979: - You don't think the existing pipeline tests cover the failure scenarios? I'll see if I can get some performance numbers. > Fix hsync and hflush semantics. > --- > > Key: HDFS-3979 > URL: https://issues.apache.org/jira/browse/HDFS-3979 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node, hdfs client >Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt > > > See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver > is not on a synchronous path from the DFSClient, hence it is possible that a > DN loses data that it has already acknowledged as persisted to a client. > Edit: Spelling. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13467270#comment-13467270 ] Lars Hofhansl commented on HDFS-3979: - Do we want this change? Seems to me that HDFS-265 broke hsync/hflush and this would fix it. > Fix hsync and hflush semantics. > --- > > Key: HDFS-3979 > URL: https://issues.apache.org/jira/browse/HDFS-3979 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node, hdfs client >Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt > > > See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver > is not on a synchronous path from the DFSClient, hence it is possible that a > DN loses data that it has already acknowledged as persisted to a client. > Edit: Spelling. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13466113#comment-13466113 ] Lars Hofhansl commented on HDFS-3979: - I see. Thanks Kan. So now we have API4 and (with HDFS-744) API5. For applications like HBase we'd like API4 as well as API5. (API4 allows a hypothetical kill -9 of all DNs without loss of acknowledged data, API5 allows HW failures of all data nodes - i.e. a DC outage - without loss of acknowledged data) > Fix hsync and hflush semantics. > --- > > Key: HDFS-3979 > URL: https://issues.apache.org/jira/browse/HDFS-3979 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node, hdfs client >Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt > > > See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver > is not on a synchronous path from the DFSClient, hence it is possible that a > DN loses data that it has already acknowledged as persisted to a client. > Edit: Spelling. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HDFS-3979: Attachment: hdfs-3979-v2.txt New patch. Order of local operations and waiting for downstream DNs now reflects the pre HDFS-265 logic. > Fix hsync and hflush semantics. > --- > > Key: HDFS-3979 > URL: https://issues.apache.org/jira/browse/HDFS-3979 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node, hdfs client >Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt > > > See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver > is not on a synchronous path from the DFSClient, hence it is possible that a > DN loses data that it has already acknowledged as persisted to a client. > Edit: Spelling. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13465915#comment-13465915 ] Lars Hofhansl commented on HDFS-3979: - Enqueuing the seqno at the end seems like the best approach. (Indeed this is done in the 0.20.x code as both of you said). I wonder why this was changed? Will have a new patch momentarily. > Fix hsync and hflush semantics. > --- > > Key: HDFS-3979 > URL: https://issues.apache.org/jira/browse/HDFS-3979 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node, hdfs client >Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Attachments: hdfs-3979-sketch.txt > > > See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver > is not on a synchronous path from the DFSClient, hence it is possible that a > DN loses data that it has already acknowledged as persisted to a client. > Edit: Spelling. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13465386#comment-13465386 ] Lars Hofhansl commented on HDFS-3979: - Should we simply do the enqueue at the end of receivePacket(), then? So just to make sure: In the current code the seqno is already enqueued in the beginning, so if there's an exception later in the code it won't have any effect on the enqueued seqno. The finally just preserves this existing behavior. What happens when there is an exception and the seqno is never enqueued? (And if that is OK, why is it not a problem now?) > Fix hsync and hflush semantics. > --- > > Key: HDFS-3979 > URL: https://issues.apache.org/jira/browse/HDFS-3979 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node, hdfs client >Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Attachments: hdfs-3979-sketch.txt > > > See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver > is not on a synchronous path from the DFSClient, hence it is possible that a > DN loses data that it has already acknowledged as persisted to a client. > Edit: Spelling. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464410#comment-13464410 ] Lars Hofhansl commented on HDFS-3979: - I'm not sure either. I am trying not to change the existing behavior. The enqueue used to happen at the beginning of receivePacket(...), so if the latter part of the method fails the ack would already be enqueued. > Fix hsync and hflush semantics. > --- > > Key: HDFS-3979 > URL: https://issues.apache.org/jira/browse/HDFS-3979 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node, hdfs client >Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Attachments: hdfs-3979-sketch.txt > > > See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver > is not on a synchronous path from the DFSClient, hence it is possible that a > DN loses data that it has already acknowledged as persisted to a client. > Edit: Spelling. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HDFS-3979: Description: See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver is not on a synchronous path from the DFSClient, hence it is possible that a DN loses data that it has already acknowledged as persisted to a client. Edit: Spelling. was:See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver is not on a synchronous path from the DFSClient, hence it is possible that DN loses data that is has already acknowledged as persisted to a client. > Fix hsync and hflush semantics. > --- > > Key: HDFS-3979 > URL: https://issues.apache.org/jira/browse/HDFS-3979 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Attachments: hdfs-3979-sketch.txt > > > See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver > is not on a synchronous path from the DFSClient, hence it is possible that a > DN loses data that it has already acknowledged as persisted to a client. > Edit: Spelling. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HDFS-3979: Status: Patch Available (was: Open) Let's try HadoopQA. TestHSync still passes. I'll also do some tests with HBase... > Fix hsync and hflush semantics. > --- > > Key: HDFS-3979 > URL: https://issues.apache.org/jira/browse/HDFS-3979 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Attachments: hdfs-3979-sketch.txt > > > See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver > is not on a synchronous path from the DFSClient, hence it is possible that DN > loses data that is has already acknowledged as persisted to a client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464133#comment-13464133 ] Lars Hofhansl commented on HDFS-3979: - (and sorry for misspelling your name) > Fix hsync and hflush semantics. > --- > > Key: HDFS-3979 > URL: https://issues.apache.org/jira/browse/HDFS-3979 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Attachments: hdfs-3979-sketch.txt > > > See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver > is not on a synchronous path from the DFSClient, hence it is possible that DN > loses data that is has already acknowledged as persisted to a client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HDFS-3979: Attachment: hdfs-3979-sketch.txt Something like this. (This is a sketch, the only test I performed was compiling) > Fix hsync and hflush semantics. > --- > > Key: HDFS-3979 > URL: https://issues.apache.org/jira/browse/HDFS-3979 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Attachments: hdfs-3979-sketch.txt > > > See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver > is not on a synchronous path from the DFSClient, hence it is possible that DN > loses data that is has already acknowledged as persisted to a client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464020#comment-13464020 ] Lars Hofhansl commented on HDFS-3979: - More good discussion on HDFS-744. Looks like we can just enqueue the seqno for the packet after the sync/flush is finished. (Khan's idea) > Fix hsync and hflush semantics. > --- > > Key: HDFS-3979 > URL: https://issues.apache.org/jira/browse/HDFS-3979 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > > See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver > is not on a synchronous path from the DFSClient, hence it is possible that DN > loses data that is has already acknowledged as persisted to a client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
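The ordering change discussed in this thread can be sketched minimally. These are hypothetical names, not the real BlockReceiver API: the point is only that the packet's seqno is enqueued for acknowledgment after the local flush/sync completes, so the client can never observe the ack before the data is durable on this replica.

```java
import java.util.ArrayList;
import java.util.List;

public class AckOrderingSketch {
    // Event log standing in for the real disk flush and the ack responder queue.
    final List<String> events = new ArrayList<>();

    void flushOrSync(boolean isSync) {
        events.add(isSync ? "sync" : "flush");  // durable write happens here
    }

    void enqueueAck(long seqno) {
        events.add("ack:" + seqno);             // responder sends this upstream
    }

    // Fixed ordering: flush first, enqueue the ack last in receivePacket().
    // (The pre-fix code enqueued the seqno at the beginning of the method,
    // so the ack could race ahead of the flush.)
    void receivePacket(long seqno, boolean syncRequested) {
        flushOrSync(syncRequested);
        enqueueAck(seqno);                      // ack strictly follows the flush
    }

    public static void main(String[] args) {
        AckOrderingSketch br = new AckOrderingSketch();
        br.receivePacket(1L, true);
        if (!br.events.equals(List.of("sync", "ack:1"))) {
            throw new AssertionError("ack must follow the sync: " + br.events);
        }
        System.out.println(br.events);
    }
}
```

The open question raised later in the thread maps onto this sketch directly: with the ack moved after the flush, an exception inside flushOrSync() means the seqno is never enqueued, so the client times out instead of receiving an ack for data that was never persisted.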
[jira] [Commented] (HDFS-744) Support hsync in HDFS
[ https://issues.apache.org/jira/browse/HDFS-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464005#comment-13464005 ] Lars Hofhansl commented on HDFS-744: BTW. I filed HDFS-3979 to do that. > Support hsync in HDFS > - > > Key: HDFS-744 > URL: https://issues.apache.org/jira/browse/HDFS-744 > Project: Hadoop HDFS > Issue Type: New Feature > Components: data-node, hdfs client >Reporter: Hairong Kuang >Assignee: Lars Hofhansl > Fix For: 2.0.2-alpha > > Attachments: HDFS-744-2.0-v1.patch, HDFS-744-2.0-v2.patch, > HDFS-744-trunk.patch, HDFS-744-trunk-v2.patch, HDFS-744-trunk-v3.patch, > HDFS-744-trunk-v4.patch, HDFS-744-trunk-v5.patch, HDFS-744-trunk-v6.patch, > HDFS-744-trunk-v7.patch, HDFS-744-trunk-v8.patch, hdfs-744.txt, > hdfs-744-v2.txt, hdfs-744-v3.txt > > > HDFS-731 implements hsync by default as hflush. As descriibed in HADOOP-6313, > the real expected semantics should be "flushes out to all replicas and all > replicas have done posix fsync equivalent - ie the OS has flushed it to the > disk device (but the disk may have it in its cache)." This jira aims to > implement the expected behaviour. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-744) Support hsync in HDFS
[ https://issues.apache.org/jira/browse/HDFS-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464004#comment-13464004 ] Lars Hofhansl commented on HDFS-744: I see. In that case we wouldn't ack back until all local work is done. A possible place to do that would be almost at the end of receivePacket(), maybe in a finally block of the last try/catch in that method. That still does not take care of all the cases, though; for the last packet in a block, the sync is deferred to close() (to avoid a double sync). That's not hard to change either, I think.
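The placement discussed above (sync at the end of receivePacket(), before the packet is acknowledged) can be sketched as a toy model. This is a hypothetical simplification, not the actual BlockReceiver code; all names here are illustrative:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy sketch: perform the sync in a finally block at the end of a
// receivePacket()-like method, so the ack for a packet is only enqueued
// once the local flush/sync has completed. Names are illustrative.
public class ReceiveThenSync {
    final Queue<Long> ackQueue = new ArrayDeque<>();
    final StringBuilder log = new StringBuilder();

    void receivePacket(long seqno, boolean syncRequested) {
        try {
            log.append("write:").append(seqno).append(' ');   // persist packet data
        } finally {
            if (syncRequested) {
                log.append("sync:").append(seqno).append(' '); // simulated fsync
            }
            // Only now is the packet acknowledged as persisted.
            ackQueue.add(seqno);
        }
    }

    public static void main(String[] args) {
        ReceiveThenSync r = new ReceiveThenSync();
        r.receivePacket(1, true);
        System.out.println(r.log.toString().trim() + " ack:" + r.ackQueue.poll());
    }
}
```

The point of the finally block is that the ack is enqueued after the sync on every exit path from the method.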
[jira] [Assigned] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl reassigned HDFS-3979: --- Assignee: Lars Hofhansl
[jira] [Created] (HDFS-3979) Fix hsync and hflush semantics.
Lars Hofhansl created HDFS-3979: --- Summary: Fix hsync and hflush semantics. Key: HDFS-3979 URL: https://issues.apache.org/jira/browse/HDFS-3979 Project: Hadoop HDFS Issue Type: Bug Reporter: Lars Hofhansl See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver is not on a synchronous path from the DFSClient, hence it is possible that the DN loses data that it has already acknowledged as persisted to a client.
[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.
[ https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463515#comment-13463515 ] Lars Hofhansl commented on HDFS-3979: - Also see my comment here: https://issues.apache.org/jira/browse/HDFS-744?focusedCommentId=13279619&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13279619
[jira] [Commented] (HDFS-744) Support hsync in HDFS
[ https://issues.apache.org/jira/browse/HDFS-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463512#comment-13463512 ] Lars Hofhansl commented on HDFS-744: Another approach would be to wait in the responder until both the downstream datanode responded *and* the sync has finished. That way we get correctness and we can still interleave sync'ing/RTT in the pipeline.
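The waiting scheme described above can be sketched as a toy two-thread model (a hypothetical simplification, not the actual BlockReceiver/PacketResponder code): the responder holds the upstream ack until a latch signals that the local sync has completed, so a packet is never acknowledged before it is locally persisted.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

// Toy model of the idea above: the responder forwards the ack for a packet
// only after BOTH the downstream ack has arrived AND the local sync is done.
// Class and method names are illustrative, not the actual DataNode API.
public class AckAfterSync {
    static final CountDownLatch localSyncDone = new CountDownLatch(1);
    static final AtomicBoolean ackSent = new AtomicBoolean(false);

    public static void main(String[] args) throws InterruptedException {
        // Data thread: writes the packet and performs the (slow) fsync.
        Thread dataThread = new Thread(() -> {
            try {
                Thread.sleep(50);          // simulated disk sync latency
            } catch (InterruptedException ignored) {}
            localSyncDone.countDown();     // local persistence finished
        });

        // Responder thread: assume the downstream ack has already arrived;
        // it must still wait for the local sync before acking upstream.
        Thread responder = new Thread(() -> {
            try {
                localSyncDone.await();     // correctness: don't ack early
            } catch (InterruptedException ignored) {}
            ackSent.set(true);
        });

        dataThread.start();
        responder.start();
        dataThread.join();
        responder.join();
        System.out.println("ackSent=" + ackSent.get());
    }
}
```

Because the sync runs in the data thread while the responder only waits, syncs on different replicas can still overlap with the pipeline round trip.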
[jira] [Commented] (HDFS-744) Support hsync in HDFS
[ https://issues.apache.org/jira/browse/HDFS-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463505#comment-13463505 ] Lars Hofhansl commented on HDFS-744: I think the problem is that we have to enqueue the seqno before the packet is sent downstream, right? (Otherwise we could potentially miss the ack.) So in order to enqueue the seqno after we syncOrFlush, we'd also have to send the packet downstream after we syncOrFlush, which essentially means that we are serializing the sync times across all replicas.
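The serialization concern above can be made concrete with some toy arithmetic (the numbers below are illustrative only): if each replica forwards the packet downstream only after its own fsync, the syncs add up along the pipeline; if the packet is forwarded first, the syncs run concurrently.

```java
// Toy latency arithmetic for a 3-replica pipeline. syncMs is the per-replica
// fsync time, hopMs the per-hop network cost; both are made-up numbers.
public class SyncSerialization {
    public static void main(String[] args) {
        int replicas = 3, syncMs = 10, hopMs = 1;
        // Forward-after-sync: each hop waits for the previous replica's sync,
        // so the sync times serialize.
        int serialized = replicas * syncMs + replicas * hopMs;
        // Forward-before-sync: syncs overlap across replicas; latency is
        // roughly one sync plus the pipeline round trip.
        int overlapped = syncMs + replicas * hopMs;
        System.out.println(serialized + " vs " + overlapped);
    }
}
```

With these numbers the serialized variant costs 33 ms against 13 ms overlapped, which is why nobody would enable a variant that serializes the syncs.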
[jira] [Commented] (HDFS-744) Support hsync in HDFS
[ https://issues.apache.org/jira/browse/HDFS-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462869#comment-13462869 ] Lars Hofhansl commented on HDFS-744: In any case, hsync and hflush should be fixed together. A one-off for hsync does not seem to be the right thing.
[jira] [Commented] (HDFS-744) Support hsync in HDFS
[ https://issues.apache.org/jira/browse/HDFS-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462855#comment-13462855 ] Lars Hofhansl commented on HDFS-744: You are right, Luke. I implemented this in the context of hadoop-2 (i.e. with HDFS-265). It seems that to get this right, HDFS-265 needs to be revisited. Will look at your suggestion (doing the sync in the data thread). As long as the syncs (or flushes) are not serialized it's fine (otherwise nobody is going to switch this on).
[jira] [Commented] (HDFS-3721) hsync support broke wire compatibility
[ https://issues.apache.org/jira/browse/HDFS-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13423180#comment-13423180 ] Lars Hofhansl commented on HDFS-3721: - Saw Todd's comment on HDFS-744 (moving that change to 2.0.x is not an option). I had assumed that HDFS-744 would be in 2.0.x (in which case there would have been no compatibility issues). Had a quick look through the patch here. Looks good as far as I can tell; I'll take a more detailed look later today. > hsync support broke wire compatibility > -- > > Key: HDFS-3721 > URL: https://issues.apache.org/jira/browse/HDFS-3721 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node, hdfs client >Affects Versions: 2.1.0-alpha >Reporter: Todd Lipcon >Assignee: Todd Lipcon >Priority: Critical > Attachments: hdfs-3721.txt > > > HDFS-744 added support for hsync to the data transfer wire protocol. However, > it actually broke wire compatibility: if the client has hsync support but the > server does not, the client cannot read or write data on the old cluster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-744) Support hsync in HDFS
[ https://issues.apache.org/jira/browse/HDFS-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13422842#comment-13422842 ] Lars Hofhansl commented on HDFS-744: Just noticed (because of HDFS-3721) that this is now in 2.1.0-alpha. Any chance to get this into 2.0.x-alpha? (In fact, on June 4th it was marked with 2.0.1-alpha, but something changed since then, so now it's 2.1.0-alpha.) > Support hsync in HDFS > - > > Key: HDFS-744 > URL: https://issues.apache.org/jira/browse/HDFS-744 > Project: Hadoop HDFS > Issue Type: New Feature > Components: data-node, hdfs client >Reporter: Hairong Kuang >Assignee: Lars Hofhansl > Fix For: 2.1.0-alpha > > Attachments: HDFS-744-2.0-v1.patch, HDFS-744-2.0-v2.patch, > HDFS-744-trunk-v2.patch, HDFS-744-trunk-v3.patch, HDFS-744-trunk-v4.patch, > HDFS-744-trunk-v5.patch, HDFS-744-trunk-v6.patch, HDFS-744-trunk-v7.patch, > HDFS-744-trunk-v8.patch, HDFS-744-trunk.patch, hdfs-744-v2.txt, > hdfs-744-v3.txt, hdfs-744.txt > > > HDFS-731 implements hsync by default as hflush. As described in HADOOP-6313, > the real expected semantics should be "flushes out to all replicas and all > replicas have done posix fsync equivalent - ie the OS has flushed it to the > disk device (but the disk may have it in its cache)." This jira aims to > implement the expected behaviour. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3721) hsync support broke wire compatibility
[ https://issues.apache.org/jira/browse/HDFS-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13422840#comment-13422840 ] Lars Hofhansl commented on HDFS-3721: - Oh, this is a 2.1.x vs 2.0.x issue...? The HDFS-744 patch is smaller than this patch, could we port that to 2.0.x?
[jira] [Commented] (HDFS-3721) hsync support broke wire compatibility
[ https://issues.apache.org/jira/browse/HDFS-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13422835#comment-13422835 ] Lars Hofhansl commented on HDFS-3721: - What sort of wire compatibility are we talking about? Both trunk and 2.0 have the hsync code. What sort of old cluster would not have this? Does the 2.x.x client support communicating with a 1.x.x cluster? Apologies for introducing this with my patch in HDFS-744; I had assumed protobuf would take care of it.
[jira] [Commented] (HDFS-3580) incompatible types; no instance(s) of type variable(s) V exist so that V conforms to boolean compiling HttpFSServer.java with OpenJDK
[ https://issues.apache.org/jira/browse/HDFS-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404276#comment-13404276 ] Lars Hofhansl commented on HDFS-3580: - You beat me to it, Andy. :) +1 on patch; an identical patch compiled fine on my home machine with OpenJDK (don't have access to it right now). I am not sure whether the version with the primitive types is valid or not, but the version with reference types is definitely valid and should work with all JDKs. > incompatible types; no instance(s) of type variable(s) V exist so that V > conforms to boolean compiling HttpFSServer.java with OpenJDK > - > > Key: HDFS-3580 > URL: https://issues.apache.org/jira/browse/HDFS-3580 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.0.1-alpha >Reporter: Andy Isaacson >Assignee: Andy Isaacson >Priority: Minor > Attachments: hdfs-3580.txt > > > {quote} > [ERROR] > /home/lars/dev/hadoop-2/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/server/HttpFSServer.java:[407,36] > incompatible types; no instance(s) of type variable(s) V exist so that V > conforms to boolean > {quote} > {quote} > $ javac -version > javac 1.6.0_24 > $ java -version > java version "1.6.0_24" > OpenJDK Runtime Environment (IcedTea6 1.11.3) (fedora-67.1.11.3.fc16-x86_64) > OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode) > {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1783) Ability for HDFS client to write replicas in parallel
[ https://issues.apache.org/jira/browse/HDFS-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400196#comment-13400196 ] Lars Hofhansl commented on HDFS-1783: - One more point to consider: For us (Salesforce) this is mostly interesting for HBase. A typical HBase cluster has the DataNodes co-located with the HBase RegionServers. So assuming good load distribution within HBase, the bandwidth would still be amortized across the cluster, but with lower latency for each single RegionServer (the HDFS client in this case). Overall the same number of bits is sent through the cluster as a whole. This would only be enabled for the WAL. Other write load (like compactions) would still do the pipelining. Andy did some cool testing on EC2 over in HBASE-6116. We'll be doing some basic testing in a real, dedicated cluster this week. > Ability for HDFS client to write replicas in parallel > - > > Key: HDFS-1783 > URL: https://issues.apache.org/jira/browse/HDFS-1783 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs client >Reporter: dhruba borthakur >Assignee: Lars Hofhansl > Attachments: HDFS-1783-trunk-v2.patch, HDFS-1783-trunk-v3.patch, > HDFS-1783-trunk-v4.patch, HDFS-1783-trunk-v5.patch, HDFS-1783-trunk.patch > > > The current implementation of HDFS pipelines the writes to the three > replicas. This introduces some latency for realtime latency sensitive > applications. An alternate implementation that allows the client to write all > replicas in parallel gives much better response times to these applications. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1783) Ability for HDFS client to write replicas in parallel
[ https://issues.apache.org/jira/browse/HDFS-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13398785#comment-13398785 ] Lars Hofhansl commented on HDFS-1783: - Yes, that would be a good optimization. I would propose starting with something simple, though (such as the current patch), seeing how that behaves with HBase, and building confidence that it does not break things. It's optional and the patch (IMHO) is low risk (only DFSOutputStream is changed; the rest is for tests). Then we can think about further optimizations.
[jira] [Commented] (HDFS-1783) Ability for HDFS client to write replicas in parallel
[ https://issues.apache.org/jira/browse/HDFS-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13398767#comment-13398767 ] Lars Hofhansl commented on HDFS-1783: - Yep. That is exactly the point. HDFS does pipelining to improve throughput at the expense of latency. This patch allows a client to favor latency. If the client operates at the NIC's throughput limit, enabling parallel writes will make things worse. This patch could be extended in the future to mix direct connections with pipelining. For example, a client could set up a 1-hop (direct) pipeline and a 2-hop pipeline for a replication factor of 3, or two 2-hop pipelines for a replication factor of 4, etc. We'll be testing this with HBase workloads. Using traffic shaping is interesting.
[jira] [Commented] (HDFS-1783) Ability for HDFS client to write replicas in parallel
[ https://issues.apache.org/jira/browse/HDFS-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13398256#comment-13398256 ] Lars Hofhansl commented on HDFS-1783: - Thanks Ted. Yes, I got that wrong in the first version of the patch (Dhruba had it right on Github; it was just me who had it wrong). I found that when I added the various tests. I'll be back in the US soon, and will finish the HBase patch and get some performance testing done on a real cluster.
[jira] [Commented] (HDFS-3370) HDFS hardlink
[ https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396592#comment-13396592 ] Lars Hofhansl commented on HDFS-3370: - Hardlinks would be used for temporary snapshotting (not to hold the backup itself). Anyway... Since there's strong opposition to this, at Salesforce we'll either come up with something else, maintain local HDFS patches, or use a different file system. > HDFS hardlink > - > > Key: HDFS-3370 > URL: https://issues.apache.org/jira/browse/HDFS-3370 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Hairong Kuang >Assignee: Liyin Tang > Attachments: HDFS-HardLink.pdf > > > We'd like to add a new feature hardlink to HDFS that allows hardlinked files > to share data without copying. Currently we will support hardlinking only > closed files, but it could be extended to unclosed files as well. > Among many potential use cases of the feature, the following two are > primarily used in facebook: > 1. This provides a lightweight way for applications like hbase to create a > snapshot; > 2. This also allows an application like Hive to move a table to a different > directory without breaking current running hive queries. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1783) Ability for HDFS client to write replicas in parallel
[ https://issues.apache.org/jira/browse/HDFS-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294951#comment-13294951 ] Lars Hofhansl commented on HDFS-1783: - @Ted: The first method is overridden in DistributedFileSystem (to avoid having to change method signatures in each subclass of FileSystem). PrimitiveCreate is called from FileContext. There seem to be some general inconsistencies in FileSystem. For example, calling FileSystem.create(..., APPEND, ...) will not append. FileContext.create(..., APPEND, ...), on the other hand, will do the right thing. This patch does not affect that. The patch will naturally work with FileContext.create(..., APPEND, ...). I'll add a few more tests for this. When I'm back in the US, I'll get some performance numbers (judging from my micro benchmarks, I'd expect some nice improvements as long as the client's network link is not saturated).
[jira] [Commented] (HDFS-3370) HDFS hardlink
[ https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294934#comment-13294934 ] Lars Hofhansl commented on HDFS-3370: - This is a good discussion. Couple of points: bq. Or provide use cases which cannot be solved without it. This seems to be the key question: What services should a file system provide? The same argument could be made for symbolic links. The application could implement those (in fact it's quite simple). bq. but they are very hard to support when the namespace is distributed But isn't that an implementation detail, which should not inform the feature set? Hardlinks could be supported only per distinct namespace (a namespace in federated HDFS or a volume in MapR, I think). This is not unlike Unix, where hardlinks are per distinct filesystem (i.e. not across mount points). @M.C. Srivas: If you create 15 backups without hardlinks you get 15 times the metadata *and* 15 times the data... Unless you assume some other feature such as snapshots with copy-on-write or backup-on-write semantics. (Maybe I did not get the argument.) Immutable files are a very common and useful design pattern (not just for HBase) and, while not strictly needed, hardlinks are very useful together with immutable files. Just my $0.02.
[jira] [Commented] (HDFS-3370) HDFS hardlink
[ https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293480#comment-13293480 ] Lars Hofhansl commented on HDFS-3370: - Thanks Liyin. Sounds good. One thought that occurred to me since: We need to think about copy semantics. For example, how will distcp handle this? It shouldn't create a new copy of a file for each hardlink that points to it, but rather copy it at most once and create hardlinks for each following reference. But then what about multiple distcp commands that happen to cover hardlinks to the same file? I suppose in that case we cannot be expected to avoid multiple copies of the same file (but at most one copy for each invocation of distcp, and only if the distcp happens to cover a different hardlink).
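The copy-at-most-once idea above can be sketched as a toy planner (purely hypothetical names, not actual distcp code): within a single run, track each source file's underlying identity (an inode-like id) and copy the bytes only for the first hardlink encountered; later hardlinks to the same file become links at the destination.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy sketch of hardlink-aware copying for a single distcp-like run.
public class HardlinkAwareCopy {
    // pathToFileId maps each source path to an inode-like identity.
    public static List<String> plan(Map<String, Long> pathToFileId) {
        Map<Long, String> copied = new HashMap<>(); // fileId -> first copied path
        List<String> actions = new ArrayList<>();
        // TreeMap gives a deterministic traversal order over the paths.
        for (Map.Entry<String, Long> e : new TreeMap<>(pathToFileId).entrySet()) {
            String first = copied.putIfAbsent(e.getValue(), e.getKey());
            if (first == null) {
                actions.add("COPY " + e.getKey());            // first reference: copy bytes
            } else {
                actions.add("LINK " + e.getKey() + " -> " + first); // later reference: hardlink
            }
        }
        return actions;
    }

    public static void main(String[] args) {
        Map<String, Long> files = new HashMap<>();
        files.put("/a/file1", 100L);
        files.put("/a/file2", 100L); // hardlink to the same underlying file
        files.put("/b/file3", 200L);
        System.out.println(plan(files));
    }
}
```

Since the seen-set lives only for one invocation, two separate runs covering different hardlinks to the same file would still each copy it once, matching the limitation noted in the comment.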
[jira] [Updated] (HDFS-1783) Ability for HDFS client to write replicas in parallel
[ https://issues.apache.org/jira/browse/HDFS-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HDFS-1783: Attachment: HDFS-1783-trunk-v5.patch Also adds a subclass of TestPipelinesFailover that runs all tests with PARALLEL_WRITES. FileSystem.append itself does not support parallel writes (as of this patch). I am generally not quite clear on what the difference between FileSystem.append and FileSystem.create(..., CreateFlag.APPEND, ...) is supposed to be. > Ability for HDFS client to write replicas in parallel > - > > Key: HDFS-1783 > URL: https://issues.apache.org/jira/browse/HDFS-1783 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs client > Reporter: dhruba borthakur > Assignee: Lars Hofhansl > Attachments: HDFS-1783-trunk-v2.patch, HDFS-1783-trunk-v3.patch, HDFS-1783-trunk-v4.patch, HDFS-1783-trunk-v5.patch, HDFS-1783-trunk.patch > > > The current implementation of HDFS pipelines the writes to the three replicas. This introduces some latency for realtime, latency-sensitive applications. An alternate implementation that allows the client to write all replicas in parallel gives much better response times to these applications.
[jira] [Commented] (HDFS-1783) Ability for HDFS client to write replicas in parallel
[ https://issues.apache.org/jira/browse/HDFS-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291647#comment-13291647 ] Lars Hofhansl commented on HDFS-1783: - One more test: I introduced an artificial 1 ms sleep at the beginning of BlockReceiver.receivePacket. Then I ran the same test above with 10,000 loops. With the patch it takes ~19s; without the patch, ~44s.
[jira] [Commented] (HDFS-1783) Ability for HDFS client to write replicas in parallel
[ https://issues.apache.org/jira/browse/HDFS-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291641#comment-13291641 ] Lars Hofhansl commented on HDFS-1783: - I did a simple local micro-benchmark: started a mini cluster with 3 data nodes and wrote 1 byte 100,000 times, each write followed by an hflush (so 100,000 packets). With parallel writes it took ~25s, without ~30s (this was repeatable). I also tried 10- and 100-byte packets. For 10 bytes I get the same results; for 100 bytes it took ~29s with parallel writes and ~37s without. Since this was all on a single machine, I am not entirely sure how this would translate to a real cluster with real network latency. The latency I measured for my "lo" device is 0.05 ms... I would expect the impact of this change to be more profound in a real cluster setting with latency on the order of a few ms. There should also be a definite gain when hsync (after HDFS-744) is enabled (but that I cannot test on a single machine with a single spindle).
[jira] [Commented] (HDFS-3370) HDFS hardlink
[ https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291614#comment-13291614 ] Lars Hofhansl commented on HDFS-3370: - Do you have a preliminary patch to look at?
[jira] [Commented] (HDFS-1783) Ability for HDFS client to write replicas in parallel
[ https://issues.apache.org/jira/browse/HDFS-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291612#comment-13291612 ] Lars Hofhansl commented on HDFS-1783: - Thanks Todd. I see your point. I'm still overseas until the end of the month with no physical access to a cluster. Ram and Andy said that they might get a chance to do some performance tests before then. (It's hard to beat the pipelining on throughput, so I only expect latency to be improved.) As for the complexity, I find it manageable... The pipelining as such has not changed, only that the client opens up N pipelines of length 1. Once this change is in, one could get fancier (for example, 2 pipelines of length 2 for 4 replicas, or maybe pipelines to multiple clusters, etc.).
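The "N pipelines of length 1" idea can be illustrated with a toy latency model (a Python sketch, not HDFS code; `HOP_LATENCY_S` and both function names are assumptions for illustration). Per packet, a pipeline of depth N pays roughly the sum of the per-hop delays before the ack returns, while N direct connections pay roughly the slowest single hop:

```python
import concurrent.futures
import time

HOP_LATENCY_S = 0.001  # assumed per-hop network/processing delay (1 ms)

def write_pipelined(packet, replicas):
    """Pipeline of length N: the ack returns only after the packet has
    traversed every replica in sequence, so per-packet latency grows
    with pipeline depth (roughly N * hop latency)."""
    for _node in replicas:
        time.sleep(HOP_LATENCY_S)  # one hop into this replica
    return packet

def write_parallel(packet, replicas, pool):
    """N pipelines of length 1: the client sends to every replica
    directly and waits for all acks; latency is roughly the slowest
    single hop rather than the sum of hops."""
    futures = [pool.submit(time.sleep, HOP_LATENCY_S) for _node in replicas]
    concurrent.futures.wait(futures)
    return packet
```

With 3 replicas and a 1 ms hop, the pipelined model costs about 3 ms per hflushed packet versus about 1 ms for the parallel model, which is consistent with the direction (though not the exact magnitude) of the micro-benchmark numbers above.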
[jira] [Commented] (HDFS-3370) HDFS hardlink
[ https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13290922#comment-13290922 ] Lars Hofhansl commented on HDFS-3370: - Is anybody working on a patch for this? If not, I wouldn't mind picking it up (although I can't promise getting to it before the end of the month).
[jira] [Commented] (HDFS-1783) Ability for HDFS client to write replicas in parallel
[ https://issues.apache.org/jira/browse/HDFS-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13290920#comment-13290920 ] Lars Hofhansl commented on HDFS-1783: - Is there general interest in this?
[jira] [Commented] (HDFS-3370) HDFS hardlink
[ https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289689#comment-13289689 ] Lars Hofhansl commented on HDFS-3370: - Reading through the design doc, it seems that FileSystem.{setPermission|setOwner} would be awkward: we'd have to find each INodeHardLinkFile pointing to the same "file" and then change all their permissions/owners. HardLinkFileInfo could instead maintain the permissions and owner (since they - following POSIX - are the same for each hard link). That way, changing the owner or permissions would immediately affect all hard links. When the fsimage is saved, each INodeHardLinkFile would still write its own permission and owner (for simplicity; that could be optimized, as long as at least one INode writes the permissions/owner). Upon read, every INode representing a hardlink must have the same permission/owner as all other INodes linking to the same "file"; if not, the image is inconsistent. In that case HardLinkFileInfo would not need to maintain a list of pointers back to all INodeHardLinkFiles, and the owner/permissions would only be stored once in memory.
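The shared-metadata design suggested above could be sketched as follows (a minimal Python sketch of the idea only; the real INodeHardLinkFile/HardLinkFileInfo classes from the design doc are Java and hold much more state). Each hardlink INode points at one shared info record, so a setOwner through any link is immediately visible through all of them, and owner/permission live in memory exactly once per hardlink group:

```python
class HardLinkFileInfo:
    """Shared record for one hardlink group: owner and permission are
    stored once here rather than per-INode."""
    def __init__(self, owner, permission):
        self.owner = owner
        self.permission = permission

class INodeHardLinkFile:
    """Each hardlink INode references the shared info instead of
    carrying its own copy of owner/permission."""
    def __init__(self, path, info):
        self.path = path
        self.info = info

    def set_owner(self, owner):
        self.info.owner = owner  # immediately affects all hardlinks

    def get_owner(self):
        return self.info.owner

    def set_permission(self, permission):
        self.info.permission = permission

    def get_permission(self):
        return self.info.permission
```

With this shape there is no need for back-pointers from the info record to the individual INodes, which matches the simplification proposed at the end of the comment.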