[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes

2015-06-23 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14598677#comment-14598677
 ] 

Lars Hofhansl commented on HDFS-6440:
-

Yeah. Thanks [~atm]!

 Support more than 2 NameNodes
 -

 Key: HDFS-6440
 URL: https://issues.apache.org/jira/browse/HDFS-6440
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: auto-failover, ha, namenode
Affects Versions: 2.4.0
Reporter: Jesse Yates
Assignee: Jesse Yates
 Fix For: 3.0.0

 Attachments: Multiple-Standby-NameNodes_V1.pdf, 
 hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, 
 hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, hdfs-6440-trunk-v4.patch, 
 hdfs-6440-trunk-v5.patch, hdfs-6440-trunk-v6.patch, hdfs-6440-trunk-v7.patch, 
 hdfs-6440-trunk-v8.patch, hdfs-multiple-snn-trunk-v0.patch


 Most of the work is already done to support more than 2 NameNodes (one 
 active, one standby). This would be the last bit to support running multiple 
 _standby_ NameNodes; one of the standbys should be available for fail-over.
 Mostly, this is a matter of updating how we parse configurations, some 
 complexity around managing the checkpointing, and updating a whole lot of 
 tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes

2015-05-03 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525990#comment-14525990
 ] 

Lars Hofhansl commented on HDFS-6440:
-

[~eli], this is the issue I mentioned on Wednesday.

I find it hard to believe that we're the only ones who want this; it's running 
in production at Salesforce. What's holding this up? How can we help get this 
in? Break it into smaller pieces? Something else?


 Support more than 2 NameNodes
 -

 Key: HDFS-6440
 URL: https://issues.apache.org/jira/browse/HDFS-6440
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: auto-failover, ha, namenode
Affects Versions: 2.4.0
Reporter: Jesse Yates
Assignee: Jesse Yates
 Attachments: Multiple-Standby-NameNodes_V1.pdf, 
 hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, 
 hdfs-6440-trunk-v1.patch, hdfs-multiple-snn-trunk-v0.patch


 Most of the work is already done to support more than 2 NameNodes (one 
 active, one standby). This would be the last bit to support running multiple 
 _standby_ NameNodes; one of the standbys should be available for fail-over.
 Mostly, this is a matter of updating how we parse configurations, some 
 complexity around managing the checkpointing, and updating a whole lot of 
 tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7240) Object store in HDFS

2015-04-08 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14486738#comment-14486738
 ] 

Lars Hofhansl commented on HDFS-7240:
-

Awesome stuff. We (Salesforce) have a need for this.

I think these will lead to immediate management problems: 
* Object size: 5 GB
* Number of buckets system-wide: 10 million
* Number of objects per bucket: 1 million
* Number of buckets per storage volume: 1000

We have a large number of tenants (many times more than 1000). Some of the 
tenants will be very large (storing many times more than 1M objects). Of course 
there are simple workarounds for that, such as including a tenant id in the 
volume name and a bucket name in our internal blob ids. Are these technical 
limits?

I don't think that we're the only ones who will want to store a large number of 
objects (more than 1M), and the bucket management would get in the way rather 
than help.
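
Purely as an illustration of the workaround mentioned above (an invented naming 
scheme, not an Ozone API): the tenant id can be folded into the volume name and 
the bucket carried inside the application's own blob id.

{code}
// Invented naming scheme, not an Ozone API: fold the tenant id into the
// volume name and keep the bucket inside the application-level blob id, so
// per-volume and per-bucket limits are reached much later.
final class TenantBlobId {
  final String volume;   // e.g. "vol-tenant00042"
  final String bucket;   // e.g. "b-000137"
  final String key;

  TenantBlobId(String tenantId, long bucketIndex, String key) {
    this.volume = "vol-" + tenantId;
    this.bucket = "b-" + String.format("%06d", bucketIndex);
    this.key = key;
  }

  /** Serialized form kept in the application's own metadata. */
  String encode() {
    return volume + "/" + bucket + "/" + key;
  }
}
{code}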


 Object store in HDFS
 

 Key: HDFS-7240
 URL: https://issues.apache.org/jira/browse/HDFS-7240
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Attachments: Ozone-architecture-v1.pdf


 This jira proposes to add object store capabilities into HDFS. 
 As part of the federation work (HDFS-1052) we separated block storage as a 
 generic storage layer. Using the Block Pool abstraction, new kinds of 
 namespaces can be built on top of the storage layer i.e. datanodes.
 In this jira I will explore building an object store using the datanode 
 storage, but independent of namespace metadata.
 I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes

2015-04-06 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14481957#comment-14481957
 ] 

Lars Hofhansl commented on HDFS-6440:
-

Let me also restate that we are running this in production on hundreds of 
clusters at Salesforce; we haven't seen any issues. It _is_ a pretty intricate 
patch, so I understand the hesitation.


 Support more than 2 NameNodes
 -

 Key: HDFS-6440
 URL: https://issues.apache.org/jira/browse/HDFS-6440
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: auto-failover, ha, namenode
Affects Versions: 2.4.0
Reporter: Jesse Yates
Assignee: Jesse Yates
 Attachments: Multiple-Standby-NameNodes_V1.pdf, 
 hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, 
 hdfs-6440-trunk-v1.patch, hdfs-multiple-snn-trunk-v0.patch


 Most of the work is already done to support more than 2 NameNodes (one 
 active, one standby). This would be the last bit to support running multiple 
 _standby_ NameNodes; one of the standbys should be available for fail-over.
 Mostly, this is a matter of updating how we parse configurations, some 
 complexity around managing the checkpointing, and updating a whole lot of 
 tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-28 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HDFS-6735:

Attachment: HDFS-6735-v8.txt

One more update. I noticed that the lock in ShortCircuitCache is taking more 
time than warranted, and that we have all these Precondition checks where we 
prebuild a message string that is only used in the exceptional case. It is much 
better to use a constant template string with parameters, so that the final 
message is only built when the check actually fails.
That noticeably decreases the time spent in the ShortCircuitCache.lock.

We could do that in a separate jira, but it seemed easy enough to include here.
Please let me know what you think. Thanks.
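
For illustration, a minimal sketch of that pattern (invented class, not the 
actual ShortCircuitCache code): Guava's Preconditions methods accept a constant 
message template plus arguments, so the failure message is only formatted when 
a check actually fails.

{code}
import com.google.common.base.Preconditions;

class RefCountedSlot {
  private final long slotIdx;
  private int refCount;

  RefCountedSlot(long slotIdx) {
    this.slotIdx = slotIdx;
  }

  // Eager variant: the message is concatenated on every call, even though it
  // is only needed if the check fails.
  synchronized void unrefEager() {
    Preconditions.checkState(refCount > 0,
        "tried to unref slot " + slotIdx + " with refCount " + refCount);
    refCount--;
  }

  // Lazy variant: a constant template plus arguments; the final string is only
  // built in the failure case, keeping the locked hot path cheap.
  synchronized void unref() {
    Preconditions.checkState(refCount > 0,
        "tried to unref slot %s with refCount %s", slotIdx, refCount);
    refCount--;
  }

  synchronized void ref() {
    refCount++;
  }
}
{code}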

 A minor optimization to avoid pread() be blocked by read() inside the same 
 DFSInputStream
 -

 Key: HDFS-6735
 URL: https://issues.apache.org/jira/browse/HDFS-6735
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Lars Hofhansl
 Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, 
 HDFS-6735-v5.txt, HDFS-6735-v6.txt, HDFS-6735-v7.txt, HDFS-6735-v8.txt, 
 HDFS-6735.txt


 In the current DFSInputStream implementation there are a couple of 
 coarse-grained locks in the read/pread path, and they have become an HBase 
 read-latency pain point. In HDFS-6698 I made a minor patch against the first 
 lock encountered, around getFileLength; indeed, after reading the code and 
 testing, it shows there are still other locks we could improve.
 In this jira I'll make a patch against the other locks, and a simple test case 
 to show the issue and the improved result.
 This is important for HBase, since in the current HFile read path we issue all 
 read()/pread() requests on the same DFSInputStream for one HFile. (A 
 multi-stream solution is another story I had planned to do, but it will 
 probably take more time than I expected.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-27 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HDFS-6735:

Attachment: HDFS-6735-v6.txt

Trying to get another build. The artifacts of the previous one are gone for 
some reason.

 A minor optimization to avoid pread() be blocked by read() inside the same 
 DFSInputStream
 -

 Key: HDFS-6735
 URL: https://issues.apache.org/jira/browse/HDFS-6735
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Lars Hofhansl
 Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, 
 HDFS-6735-v5.txt, HDFS-6735-v6.txt, HDFS-6735-v6.txt, HDFS-6735-v6.txt, 
 HDFS-6735.txt


 In the current DFSInputStream implementation there are a couple of 
 coarse-grained locks in the read/pread path, and they have become an HBase 
 read-latency pain point. In HDFS-6698 I made a minor patch against the first 
 lock encountered, around getFileLength; indeed, after reading the code and 
 testing, it shows there are still other locks we could improve.
 In this jira I'll make a patch against the other locks, and a simple test case 
 to show the issue and the improved result.
 This is important for HBase, since in the current HFile read path we issue all 
 read()/pread() requests on the same DFSInputStream for one HFile. (A 
 multi-stream solution is another story I had planned to do, but it will 
 probably take more time than I expected.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-27 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HDFS-6735:

Attachment: (was: HDFS-6735-v6.txt)

 A minor optimization to avoid pread() be blocked by read() inside the same 
 DFSInputStream
 -

 Key: HDFS-6735
 URL: https://issues.apache.org/jira/browse/HDFS-6735
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Lars Hofhansl
 Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, 
 HDFS-6735-v5.txt, HDFS-6735-v6.txt, HDFS-6735.txt


 In the current DFSInputStream implementation there are a couple of 
 coarse-grained locks in the read/pread path, and they have become an HBase 
 read-latency pain point. In HDFS-6698 I made a minor patch against the first 
 lock encountered, around getFileLength; indeed, after reading the code and 
 testing, it shows there are still other locks we could improve.
 In this jira I'll make a patch against the other locks, and a simple test case 
 to show the issue and the improved result.
 This is important for HBase, since in the current HFile read path we issue all 
 read()/pread() requests on the same DFSInputStream for one HFile. (A 
 multi-stream solution is another story I had planned to do, but it will 
 probably take more time than I expected.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-27 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HDFS-6735:

Attachment: (was: HDFS-6735-v6.txt)

 A minor optimization to avoid pread() be blocked by read() inside the same 
 DFSInputStream
 -

 Key: HDFS-6735
 URL: https://issues.apache.org/jira/browse/HDFS-6735
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Lars Hofhansl
 Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, 
 HDFS-6735-v5.txt, HDFS-6735-v6.txt, HDFS-6735.txt


 In the current DFSInputStream implementation there are a couple of 
 coarse-grained locks in the read/pread path, and they have become an HBase 
 read-latency pain point. In HDFS-6698 I made a minor patch against the first 
 lock encountered, around getFileLength; indeed, after reading the code and 
 testing, it shows there are still other locks we could improve.
 In this jira I'll make a patch against the other locks, and a simple test case 
 to show the issue and the improved result.
 This is important for HBase, since in the current HFile read path we issue all 
 read()/pread() requests on the same DFSInputStream for one HFile. (A 
 multi-stream solution is another story I had planned to do, but it will 
 probably take more time than I expected.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-27 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HDFS-6735:

Attachment: HDFS-6735-v7.txt

So here's the final one (with the findbugs tweak back in).

 A minor optimization to avoid pread() be blocked by read() inside the same 
 DFSInputStream
 -

 Key: HDFS-6735
 URL: https://issues.apache.org/jira/browse/HDFS-6735
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Lars Hofhansl
 Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, 
 HDFS-6735-v5.txt, HDFS-6735-v6.txt, HDFS-6735-v7.txt, HDFS-6735.txt


 In the current DFSInputStream implementation there are a couple of 
 coarse-grained locks in the read/pread path, and they have become an HBase 
 read-latency pain point. In HDFS-6698 I made a minor patch against the first 
 lock encountered, around getFileLength; indeed, after reading the code and 
 testing, it shows there are still other locks we could improve.
 In this jira I'll make a patch against the other locks, and a simple test case 
 to show the issue and the improved result.
 This is important for HBase, since in the current HFile read path we issue all 
 read()/pread() requests on the same DFSInputStream for one HFile. (A 
 multi-stream solution is another story I had planned to do, but it will 
 probably take more time than I expected.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-26 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14227031#comment-14227031
 ] 

Lars Hofhansl commented on HDFS-6735:
-

Per my comment above, my preference would still be to just make the 
cachingStrategy reference volatile in DFSInputStream. It is immutable, and hence 
a volatile reference would make access safe in all cases without any locking. 
The same is true for fileEncryptionInfo, by the way: it is already immutable and 
just needs a volatile reference, with no locking needed at all.

I'll make a new patch.
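
As a rough sketch of that argument (invented class and field names, not the 
actual DFSInputStream code): an immutable object published through a volatile 
reference can be read safely without any lock, and updates simply swap in a new 
instance.

{code}
// Immutable value object: all fields final, set once in the constructor.
final class CachingStrategyExample {
  final Boolean dropBehind;
  final Long readahead;

  CachingStrategyExample(Boolean dropBehind, Long readahead) {
    this.dropBehind = dropBehind;
    this.readahead = readahead;
  }
}

class StreamStateExample {
  // volatile publication: any thread that reads this reference sees a fully
  // constructed object, so plain reads need no synchronized block.
  private volatile CachingStrategyExample cachingStrategy =
      new CachingStrategyExample(null, 4096L);

  CachingStrategyExample getCachingStrategy() {
    return cachingStrategy;                       // lock-free read
  }

  // Updates replace the whole immutable object rather than mutating it.
  // If several threads can update concurrently, the read-modify-write swap
  // itself still needs a lock or CAS; volatile only covers publication.
  void setDropBehind(boolean dropBehind) {
    CachingStrategyExample cur = cachingStrategy;
    cachingStrategy = new CachingStrategyExample(dropBehind, cur.readahead);
  }
}
{code}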

 A minor optimization to avoid pread() be blocked by read() inside the same 
 DFSInputStream
 -

 Key: HDFS-6735
 URL: https://issues.apache.org/jira/browse/HDFS-6735
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, 
 HDFS-6735-v5.txt, HDFS-6735-v6.txt, HDFS-6735.txt


 In the current DFSInputStream implementation there are a couple of 
 coarse-grained locks in the read/pread path, and they have become an HBase 
 read-latency pain point. In HDFS-6698 I made a minor patch against the first 
 lock encountered, around getFileLength; indeed, after reading the code and 
 testing, it shows there are still other locks we could improve.
 In this jira I'll make a patch against the other locks, and a simple test case 
 to show the issue and the improved result.
 This is important for HBase, since in the current HFile read path we issue all 
 read()/pread() requests on the same DFSInputStream for one HFile. (A 
 multi-stream solution is another story I had planned to do, but it will 
 probably take more time than I expected.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-26 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HDFS-6735:

Attachment: HDFS-6735-v6.txt

Updated patch.
The findbugs tweak is still necessary. The locking was correct before; findbugs 
does not seem to realize that every reference to cachingStrategy is always 
guarded by the infoLock.

I'll run a 2.4.1 version of this patch against HBase again.


 A minor optimization to avoid pread() be blocked by read() inside the same 
 DFSInputStream
 -

 Key: HDFS-6735
 URL: https://issues.apache.org/jira/browse/HDFS-6735
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, 
 HDFS-6735-v5.txt, HDFS-6735-v6.txt, HDFS-6735-v6.txt, HDFS-6735.txt


 In the current DFSInputStream implementation there are a couple of 
 coarse-grained locks in the read/pread path, and they have become an HBase 
 read-latency pain point. In HDFS-6698 I made a minor patch against the first 
 lock encountered, around getFileLength; indeed, after reading the code and 
 testing, it shows there are still other locks we could improve.
 In this jira I'll make a patch against the other locks, and a simple test case 
 to show the issue and the improved result.
 This is important for HBase, since in the current HFile read path we issue all 
 read()/pread() requests on the same DFSInputStream for one HFile. (A 
 multi-stream solution is another story I had planned to do, but it will 
 probably take more time than I expected.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-26 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14227148#comment-14227148
 ] 

Lars Hofhansl commented on HDFS-6735:
-

Tested -v6 with HBase. Still good from the DFSInputStream angle.
I do see now that much more time is spent in ShortCircuitCache.fetchOrCreate 
and unref (I rechecked; that is true for -v3 as well).
It's still better, but the can is kicked down the road a bit.


 A minor optimization to avoid pread() be blocked by read() inside the same 
 DFSInputStream
 -

 Key: HDFS-6735
 URL: https://issues.apache.org/jira/browse/HDFS-6735
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, 
 HDFS-6735-v5.txt, HDFS-6735-v6.txt, HDFS-6735-v6.txt, HDFS-6735.txt


 In the current DFSInputStream implementation there are a couple of 
 coarse-grained locks in the read/pread path, and they have become an HBase 
 read-latency pain point. In HDFS-6698 I made a minor patch against the first 
 lock encountered, around getFileLength; indeed, after reading the code and 
 testing, it shows there are still other locks we could improve.
 In this jira I'll make a patch against the other locks, and a simple test case 
 to show the issue and the improved result.
 This is important for HBase, since in the current HFile read path we issue all 
 read()/pread() requests on the same DFSInputStream for one HFile. (A 
 multi-stream solution is another story I had planned to do, but it will 
 probably take more time than I expected.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-26 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14227155#comment-14227155
 ] 

Lars Hofhansl commented on HDFS-6735:
-

So, to be specific: the improvement I see above is still there; it's just that 
the next thing to tackle is the ShortCircuitCache.

 A minor optimization to avoid pread() be blocked by read() inside the same 
 DFSInputStream
 -

 Key: HDFS-6735
 URL: https://issues.apache.org/jira/browse/HDFS-6735
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, 
 HDFS-6735-v5.txt, HDFS-6735-v6.txt, HDFS-6735-v6.txt, HDFS-6735.txt


 In the current DFSInputStream implementation there are a couple of 
 coarse-grained locks in the read/pread path, and they have become an HBase 
 read-latency pain point. In HDFS-6698 I made a minor patch against the first 
 lock encountered, around getFileLength; indeed, after reading the code and 
 testing, it shows there are still other locks we could improve.
 In this jira I'll make a patch against the other locks, and a simple test case 
 to show the issue and the improved result.
 This is important for HBase, since in the current HFile read path we issue all 
 read()/pread() requests on the same DFSInputStream for one HFile. (A 
 multi-stream solution is another story I had planned to do, but it will 
 probably take more time than I expected.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-26 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl reassigned HDFS-6735:
---

Assignee: Lars Hofhansl  (was: Liang Xie)

 A minor optimization to avoid pread() be blocked by read() inside the same 
 DFSInputStream
 -

 Key: HDFS-6735
 URL: https://issues.apache.org/jira/browse/HDFS-6735
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Lars Hofhansl
 Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, 
 HDFS-6735-v5.txt, HDFS-6735-v6.txt, HDFS-6735-v6.txt, HDFS-6735.txt


 In the current DFSInputStream implementation there are a couple of 
 coarse-grained locks in the read/pread path, and they have become an HBase 
 read-latency pain point. In HDFS-6698 I made a minor patch against the first 
 lock encountered, around getFileLength; indeed, after reading the code and 
 testing, it shows there are still other locks we could improve.
 In this jira I'll make a patch against the other locks, and a simple test case 
 to show the issue and the improved result.
 This is important for HBase, since in the current HFile read path we issue all 
 read()/pread() requests on the same DFSInputStream for one HFile. (A 
 multi-stream solution is another story I had planned to do, but it will 
 probably take more time than I expected.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-25 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HDFS-6735:

Attachment: HDFS-6735-v6.txt

Thanks [~ste...@apache.org].

New patch with findbugs tweak.

 A minor optimization to avoid pread() be blocked by read() inside the same 
 DFSInputStream
 -

 Key: HDFS-6735
 URL: https://issues.apache.org/jira/browse/HDFS-6735
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, 
 HDFS-6735-v5.txt, HDFS-6735-v6.txt, HDFS-6735.txt


 In the current DFSInputStream implementation there are a couple of 
 coarse-grained locks in the read/pread path, and they have become an HBase 
 read-latency pain point. In HDFS-6698 I made a minor patch against the first 
 lock encountered, around getFileLength; indeed, after reading the code and 
 testing, it shows there are still other locks we could improve.
 In this jira I'll make a patch against the other locks, and a simple test case 
 to show the issue and the improved result.
 This is important for HBase, since in the current HFile read path we issue all 
 read()/pread() requests on the same DFSInputStream for one HFile. (A 
 multi-stream solution is another story I had planned to do, but it will 
 probably take more time than I expected.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-24 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223901#comment-14223901
 ] 

Lars Hofhansl commented on HDFS-6735:
-

The remaining findbugs warning is due to cachingStrategy. I am 100% sure that 
the locking is correct: every single reference to cachingStrategy is guarded by 
the infoLock.
This should be good to go (happy to squash the bogus findbugs warning if 
somebody has a suggestion for how).

The findbugs website states this for IS2_INCONSISTENT_SYNC:
{quote}
Note that there are various sources of inaccuracy in this detector; for 
example, the detector cannot statically detect all situations in which a lock 
is held.  Also, even when the detector is accurate in distinguishing locked vs. 
unlocked accesses, the code in question may still be correct.
{quote}


 A minor optimization to avoid pread() be blocked by read() inside the same 
 DFSInputStream
 -

 Key: HDFS-6735
 URL: https://issues.apache.org/jira/browse/HDFS-6735
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, 
 HDFS-6735-v5.txt, HDFS-6735.txt


 In the current DFSInputStream implementation there are a couple of 
 coarse-grained locks in the read/pread path, and they have become an HBase 
 read-latency pain point. In HDFS-6698 I made a minor patch against the first 
 lock encountered, around getFileLength; indeed, after reading the code and 
 testing, it shows there are still other locks we could improve.
 In this jira I'll make a patch against the other locks, and a simple test case 
 to show the issue and the improved result.
 This is important for HBase, since in the current HFile read path we issue all 
 read()/pread() requests on the same DFSInputStream for one HFile. (A 
 multi-stream solution is another story I had planned to do, but it will 
 probably take more time than I expected.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-23 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HDFS-6735:

Attachment: HDFS-6735-v4.txt

New patch:
* added synchronized back to tryZeroCopyRead
* renamed sharedLock to infoLock
* this time fixed all the indentation - harder to review, but this should 
be committable as is
* surrounded every reference to cachingStrategy with synchronized(infoLock) 
{...}, removed volatile

Looking at this again, we can do better with safe publication of immutable 
state and avoid some of the locks.
For example, FileEncryptionInfo and CachingStrategy are already immutable and 
can be handled completely safely by just a volatile reference; most of the 
LocatedBlocks state is also immutable, and for those parts we can avoid the 
locks as well.

Immutable state is easier to reason about and more efficient.
(volatile still places read and write memory fences, but that is cheaper than 
synchronized.) We can do that later :)
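
For illustration, a minimal sketch of the guarded-access pattern the last 
bullet describes (names invented; not the actual patch): every access to the 
shared field goes through one dedicated lock object instead of locking on this.

{code}
class GuardedInfoExample {
  // Immutable placeholder for the strategy object (invented).
  static final class Strategy {
    final long readaheadBytes;
    Strategy(long readaheadBytes) { this.readaheadBytes = readaheadBytes; }
  }

  // Dedicated lock object; the stateful read path, which synchronizes on
  // "this", never contends for it.
  private final Object infoLock = new Object();
  private Strategy cachingStrategy = new Strategy(4096);  // guarded by infoLock

  Strategy getCachingStrategy() {
    synchronized (infoLock) {
      return cachingStrategy;
    }
  }

  void setCachingStrategy(Strategy s) {
    synchronized (infoLock) {
      cachingStrategy = s;
    }
  }
}
{code}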


 A minor optimization to avoid pread() be blocked by read() inside the same 
 DFSInputStream
 -

 Key: HDFS-6735
 URL: https://issues.apache.org/jira/browse/HDFS-6735
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, 
 HDFS-6735.txt


 In the current DFSInputStream implementation there are a couple of 
 coarse-grained locks in the read/pread path, and they have become an HBase 
 read-latency pain point. In HDFS-6698 I made a minor patch against the first 
 lock encountered, around getFileLength; indeed, after reading the code and 
 testing, it shows there are still other locks we could improve.
 In this jira I'll make a patch against the other locks, and a simple test case 
 to show the issue and the improved result.
 This is important for HBase, since in the current HFile read path we issue all 
 read()/pread() requests on the same DFSInputStream for one HFile. (A 
 multi-stream solution is another story I had planned to do, but it will 
 probably take more time than I expected.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-23 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HDFS-6735:

Attachment: (was: HDFS-6735-v4.txt)

 A minor optimization to avoid pread() be blocked by read() inside the same 
 DFSInputStream
 -

 Key: HDFS-6735
 URL: https://issues.apache.org/jira/browse/HDFS-6735
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735.txt


 In the current DFSInputStream implementation there are a couple of 
 coarse-grained locks in the read/pread path, and they have become an HBase 
 read-latency pain point. In HDFS-6698 I made a minor patch against the first 
 lock encountered, around getFileLength; indeed, after reading the code and 
 testing, it shows there are still other locks we could improve.
 In this jira I'll make a patch against the other locks, and a simple test case 
 to show the issue and the improved result.
 This is important for HBase, since in the current HFile read path we issue all 
 read()/pread() requests on the same DFSInputStream for one HFile. (A 
 multi-stream solution is another story I had planned to do, but it will 
 probably take more time than I expected.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-23 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HDFS-6735:

Attachment: HDFS-6735-v4.txt

 A minor optimization to avoid pread() be blocked by read() inside the same 
 DFSInputStream
 -

 Key: HDFS-6735
 URL: https://issues.apache.org/jira/browse/HDFS-6735
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, 
 HDFS-6735.txt


 In the current DFSInputStream implementation there are a couple of 
 coarse-grained locks in the read/pread path, and they have become an HBase 
 read-latency pain point. In HDFS-6698 I made a minor patch against the first 
 lock encountered, around getFileLength; indeed, after reading the code and 
 testing, it shows there are still other locks we could improve.
 In this jira I'll make a patch against the other locks, and a simple test case 
 to show the issue and the improved result.
 This is important for HBase, since in the current HFile read path we issue all 
 read()/pread() requests on the same DFSInputStream for one HFile. (A 
 multi-stream solution is another story I had planned to do, but it will 
 probably take more time than I expected.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-23 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HDFS-6735:

Attachment: HDFS-6735-v5.txt

Looked through the findbugs warning for DFSInputStream:
* indeed, currentNode was wrongly synchronized (it was so even before the patch). 
In getCurrentDataNode I had added synchronized(infoLock), but it should just be 
synchronized on this, as currentNode is seek+read state.
* added a synchronized block in getBlockAt around the access to pos, blockEnd, 
and currentLocatedBlock. As explained in a comment, that is not needed, since we never 
get into that if block if we coming from a called synchronized on this. But if 
that is so, the extra synchronized won't hurt and it should make findbugs happy.


 A minor optimization to avoid pread() be blocked by read() inside the same 
 DFSInputStream
 -

 Key: HDFS-6735
 URL: https://issues.apache.org/jira/browse/HDFS-6735
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, 
 HDFS-6735-v5.txt, HDFS-6735.txt


 In the current DFSInputStream implementation there are a couple of 
 coarse-grained locks in the read/pread path, and they have become an HBase 
 read-latency pain point. In HDFS-6698 I made a minor patch against the first 
 lock encountered, around getFileLength; indeed, after reading the code and 
 testing, it shows there are still other locks we could improve.
 In this jira I'll make a patch against the other locks, and a simple test case 
 to show the issue and the improved result.
 This is important for HBase, since in the current HFile read path we issue all 
 read()/pread() requests on the same DFSInputStream for one HFile. (A 
 multi-stream solution is another story I had planned to do, but it will 
 probably take more time than I expected.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-23 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14222673#comment-14222673
 ] 

Lars Hofhansl commented on HDFS-6735:
-

s/since we never get into that if block if we coming from a called 
synchronized/since we *only* get into that if block if we coming from a caller 
synchronized/


 A minor optimization to avoid pread() be blocked by read() inside the same 
 DFSInputStream
 -

 Key: HDFS-6735
 URL: https://issues.apache.org/jira/browse/HDFS-6735
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735-v4.txt, 
 HDFS-6735-v5.txt, HDFS-6735.txt


 In the current DFSInputStream implementation there are a couple of 
 coarse-grained locks in the read/pread path, and they have become an HBase 
 read-latency pain point. In HDFS-6698 I made a minor patch against the first 
 lock encountered, around getFileLength; indeed, after reading the code and 
 testing, it shows there are still other locks we could improve.
 In this jira I'll make a patch against the other locks, and a simple test case 
 to show the issue and the improved result.
 This is important for HBase, since in the current HFile read path we issue all 
 read()/pread() requests on the same DFSInputStream for one HFile. (A 
 multi-stream solution is another story I had planned to do, but it will 
 probably take more time than I expected.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-21 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14221839#comment-14221839
 ] 

Lars Hofhansl commented on HDFS-6735:
-

Thanks [~cmccabe]. I'll put the synchronized back, fix the indentation, and 
name the new lock differently.
I'll also look through the other synchronized modifiers that I had removed from 
private methods, where it makes sense.

On the indentation... I completely agree. It's hard to review; sometimes I 
apply HBase patches locally just so that I can do a git diff -b to review them 
without the whitespace changes, which is a pain. And if the reformatting is not 
done in all branches, cherry-picking a patch becomes annoying, etc., etc.

Thanks again for looking! New patch upcoming.

 A minor optimization to avoid pread() be blocked by read() inside the same 
 DFSInputStream
 -

 Key: HDFS-6735
 URL: https://issues.apache.org/jira/browse/HDFS-6735
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735.txt


 In the current DFSInputStream implementation there are a couple of 
 coarse-grained locks in the read/pread path, and they have become an HBase 
 read-latency pain point. In HDFS-6698 I made a minor patch against the first 
 lock encountered, around getFileLength; indeed, after reading the code and 
 testing, it shows there are still other locks we could improve.
 In this jira I'll make a patch against the other locks, and a simple test case 
 to show the issue and the improved result.
 This is important for HBase, since in the current HFile read path we issue all 
 read()/pread() requests on the same DFSInputStream for one HFile. (A 
 multi-stream solution is another story I had planned to do, but it will 
 probably take more time than I expected.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-19 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218802#comment-14218802
 ] 

Lars Hofhansl commented on HDFS-6735:
-

I ran TestByteArrayManager as well as all tests derived from 
TestParallelReadUtil. All pass locally.
I will check out the findbugs warning and do a real-life test with HBase (with 
this patch on top of the latest 2.4).

Any recommendation on what else I should test?


 A minor optimization to avoid pread() be blocked by read() inside the same 
 DFSInputStream
 -

 Key: HDFS-6735
 URL: https://issues.apache.org/jira/browse/HDFS-6735
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735.txt


 In the current DFSInputStream implementation there are a couple of 
 coarse-grained locks in the read/pread path, and they have become an HBase 
 read-latency pain point. In HDFS-6698 I made a minor patch against the first 
 lock encountered, around getFileLength; indeed, after reading the code and 
 testing, it shows there are still other locks we could improve.
 In this jira I'll make a patch against the other locks, and a simple test case 
 to show the issue and the improved result.
 This is important for HBase, since in the current HFile read path we issue all 
 read()/pread() requests on the same DFSInputStream for one HFile. (A 
 multi-stream solution is another story I had planned to do, but it will 
 probably take more time than I expected.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-19 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219072#comment-14219072
 ] 

Lars Hofhansl commented on HDFS-6735:
-

Thanks [~cmccabe]. infoLock is better. I'll fix the indentation later. Let me 
have a look at tryReadZeroCopy again. I had mapped out all members and which 
methods use what, and concluded the synchronized wasn't needed; it's quite 
possible I made a mistake.

Another locking option is not to synchronize on this at all, but to have two 
locks (streamLock and pLock, or whatever good names are). That way the intent 
might be more explicit.
Yet another option would be to disentangle the two APIs by subclassing or 
delegation (since the issue really is that we have state for two different 
modes of operation in the same class); that would be a bigger change, though.

Meanwhile in HBase land:
Tested this with HBase and observed with a sampler that all delays internal to 
DFSInputStream are gone, which is nice.

I committed a change to HBase to allow us to (1) have compactions use their own 
input streams so they do not interfere with user scans over the same files, and 
(2) optionally force preads for all user scans. See HBASE-12411.

Especially with #2 I see nice speedups for many concurrent scanners, essentially 
up to what my disks can sustain, but a 50% slowdown for a single scanner per 
file, which is expected since we're no longer benefiting from prefetching.


 A minor optimization to avoid pread() be blocked by read() inside the same 
 DFSInputStream
 -

 Key: HDFS-6735
 URL: https://issues.apache.org/jira/browse/HDFS-6735
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735.txt


 In the current DFSInputStream implementation there are a couple of 
 coarse-grained locks in the read/pread path, and they have become an HBase 
 read-latency pain point. In HDFS-6698 I made a minor patch against the first 
 lock encountered, around getFileLength; indeed, after reading the code and 
 testing, it shows there are still other locks we could improve.
 In this jira I'll make a patch against the other locks, and a simple test case 
 to show the issue and the improved result.
 This is important for HBase, since in the current HFile read path we issue all 
 read()/pread() requests on the same DFSInputStream for one HFile. (A 
 multi-stream solution is another story I had planned to do, but it will 
 probably take more time than I expected.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-19 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219074#comment-14219074
 ] 

Lars Hofhansl commented on HDFS-6735:
-

Re tryReadZeroCopy: removing the synchronization is fine, because it is only 
called from (stateful) read(...), and pos is only used in the stateful read path 
and hence only needs to be guarded by the lock on this.


 A minor optimization to avoid pread() be blocked by read() inside the same 
 DFSInputStream
 -

 Key: HDFS-6735
 URL: https://issues.apache.org/jira/browse/HDFS-6735
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735.txt


 In the current DFSInputStream implementation there are a couple of 
 coarse-grained locks in the read/pread path, and they have become an HBase 
 read-latency pain point. In HDFS-6698 I made a minor patch against the first 
 lock encountered, around getFileLength; indeed, after reading the code and 
 testing, it shows there are still other locks we could improve.
 In this jira I'll make a patch against the other locks, and a simple test case 
 to show the issue and the improved result.
 This is important for HBase, since in the current HFile read path we issue all 
 read()/pread() requests on the same DFSInputStream for one HFile. (A 
 multi-stream solution is another story I had planned to do, but it will 
 probably take more time than I expected.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-19 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219075#comment-14219075
 ] 

Lars Hofhansl commented on HDFS-6735:
-

Apologies for the spam... I have a backport of this to branch-2.4 in case 
anybody is interested.


 A minor optimization to avoid pread() be blocked by read() inside the same 
 DFSInputStream
 -

 Key: HDFS-6735
 URL: https://issues.apache.org/jira/browse/HDFS-6735
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735.txt


 In the current DFSInputStream implementation there are a couple of 
 coarse-grained locks in the read/pread path, and they have become an HBase 
 read-latency pain point. In HDFS-6698 I made a minor patch against the first 
 lock encountered, around getFileLength; indeed, after reading the code and 
 testing, it shows there are still other locks we could improve.
 In this jira I'll make a patch against the other locks, and a simple test case 
 to show the issue and the improved result.
 This is important for HBase, since in the current HFile read path we issue all 
 read()/pread() requests on the same DFSInputStream for one HFile. (A 
 multi-stream solution is another story I had planned to do, but it will 
 probably take more time than I expected.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-08 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HDFS-6735:

Attachment: HDFS-6735-v3.txt

I classified the state in DFSInputStream into state used only by (stateful) 
read and state used by both read and pread.

With that, here's a new proposed patch:
* makes LocatedBlocks immutable (which it seems was intended)
* pread no longer affects currentNode (that was unintended, I think)
* guards state shared between read and pread with an extra sharedLock
(the state used only by read is still guarded by a lock on this, which we 
need to take anyway to avoid concurrent stateful reads against the same input 
stream)
* removes all synchronized modifiers on private methods that are only called 
from methods already synchronized (good practice anyway)
* makes cachingStrategy volatile (made more sense than locking there)
* should be free of deadlocks (never acquires the lock on this with sharedLock 
held, though the reverse is possible)
* pos, blockEnd, and currentLocatedBlock are not updated in getBlockAt unless it 
is called on behalf of read (not for pread, hence locking on this is not needed 
there)

I have not tested this, yet.
Please have a careful look and let me know what you think.
We might want to further disentangle the mixed state.

(And just maybe the best solution would be for HBase to have an input stream 
for each thread doing read and one for all threads doing preads - and not do 
any of this...?)
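
For readers following along, here is a compressed, hypothetical sketch of the 
locking split described above (invented names and simplified bodies, not the 
actual patch): the lock on this serializes stateful seek+read, sharedLock 
guards metadata used by both paths, and the this-then-sharedLock ordering keeps 
it deadlock-free.

{code}
class DualLockStreamSketch {
  // Lock ordering rule from the list above: code may take "this" first and
  // then sharedLock, but never takes "this" while already holding sharedLock.
  private final Object sharedLock = new Object();

  // Guarded by sharedLock: metadata used by both read() and pread().
  private long fileLength;

  // Guarded by "this": position state used only by stateful seek+read.
  private long pos;

  void setFileLength(long len) {
    synchronized (sharedLock) {       // writers of shared metadata use sharedLock
      fileLength = len;
    }
  }

  // Stateful read: serialized on "this"; briefly takes sharedLock for metadata.
  synchronized int read(byte[] buf, int off, int len) {
    long remaining;
    synchronized (sharedLock) {       // "this" -> sharedLock is the allowed order
      remaining = fileLength - pos;
    }
    int n = (int) Math.min(len, Math.max(0, remaining));
    pos += n;                         // pos is only touched under "this"
    return n;                         // no actual I/O in this sketch
  }

  // Positional read: never locks "this", so a long-running read() cannot block it.
  int pread(long position, byte[] buf, int off, int len) {
    long length;
    synchronized (sharedLock) {
      length = fileLength;
    }
    return (int) Math.min(len, Math.max(0, length - position));
  }
}
{code}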

 A minor optimization to avoid pread() be blocked by read() inside the same 
 DFSInputStream
 -

 Key: HDFS-6735
 URL: https://issues.apache.org/jira/browse/HDFS-6735
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-6735-v2.txt, HDFS-6735-v3.txt, HDFS-6735.txt


 In the current DFSInputStream implementation there are a couple of 
 coarse-grained locks in the read/pread path, and they have become an HBase 
 read-latency pain point. In HDFS-6698 I made a minor patch against the first 
 lock encountered, around getFileLength; indeed, after reading the code and 
 testing, it shows there are still other locks we could improve.
 In this jira I'll make a patch against the other locks, and a simple test case 
 to show the issue and the improved result.
 This is important for HBase, since in the current HFile read path we issue all 
 read()/pread() requests on the same DFSInputStream for one HFile. (A 
 multi-stream solution is another story I had planned to do, but it will 
 probably take more time than I expected.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6698) try to optimize DFSInputStream.getFileLength()

2014-11-03 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195409#comment-14195409
 ] 

Lars Hofhansl commented on HDFS-6698:
-

Yeah, let's combine these. We can close this one and do the work in HDFS-6735.

I'm with you on volatile: it only guarantees visibility (via memory barriers) 
but doesn't control concurrent access. Things should be final (immutable) or 
locked correctly; volatile is rarely enough by itself.

Using a separate lock for touching DFSInputStream#locatedBlocks seems like the 
right approach to me.
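
A tiny sketch of why volatile alone is not enough (a generic example, unrelated 
to the actual HDFS fields): visibility does not make a compound 
read-modify-write atomic, so the update still needs a lock or an atomic type.

{code}
import java.util.concurrent.atomic.AtomicLong;

class VolatileVsLockedCounter {
  // volatile makes the latest value visible to all threads, but the
  // read-add-write sequence below is still three separate steps, so two
  // threads can interleave and lose an update.
  private volatile long unsafeCount;

  void unsafeIncrement() {
    unsafeCount = unsafeCount + 1;     // check-then-act race under contention
  }

  // Either lock the compound action...
  private long lockedCount;            // guarded by "this"

  synchronized void lockedIncrement() {
    lockedCount++;
  }

  // ...or use a type whose compound operation is atomic.
  private final AtomicLong atomicCount = new AtomicLong();

  void atomicIncrement() {
    atomicCount.incrementAndGet();
  }
}
{code}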


 try to optimize DFSInputStream.getFileLength()
 --

 Key: HDFS-6698
 URL: https://issues.apache.org/jira/browse/HDFS-6698
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-6698.txt, HDFS-6698.txt, HDFS-6698v2.txt, 
 HDFS-6698v2.txt, HDFS-6698v3.txt


 HBase prefers to invoke read() to serve scan requests and pread() to serve get 
 requests, because pread() holds almost no locks.
 Imagine a read() is running; because the definition is:
 {code}
 public synchronized int read
 {code}
 no other read() request can run concurrently. That is known, but pread() 
 also cannot run, because:
 {code}
 public int read(long position, byte[] buffer, int offset, int length)
     throws IOException {
   // sanity checks
   dfsClient.checkOpen();
   if (closed) {
     throw new IOException("Stream closed");
   }
   failures = 0;
   long filelen = getFileLength();
 {code}
 getFileLength() also needs the lock, so we need to figure out a lock-free 
 implementation of getFileLength() before the HBase multi-stream feature is done.
 [~saint@gmail.com]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6698) try to optimize DFSInputStream.getFileLength()

2014-11-01 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14192993#comment-14192993
 ] 

Lars Hofhansl commented on HDFS-6698:
-

Now... I am not saying that we do not have work to do in HBase:
* we're using one reader per HFile
* after a major compaction we have a single store file per column family (that 
file can be up to 20GB in size)
* we allow one thread to use seek+read on that reader; other concurrent scanners 
will fall back to pread (see HBASE-7336).

For my test I did this:
* my test table had 2^25 (~32m) rows, in two regions, about 1GB on disk
* I tested this with Phoenix, which can break a query into parts and execute 
scans for the parts (that's where the parallel scanning on the same readers 
comes into play)
* I have short circuit reading enabled
* all data in the OS cache (HBase block cache not used)

This is not an uncommon scenario, though. The original poster cited 
scans(seek+read) + gets(pread) as a problem.

In either case, I'll post an updated patch to HDFS-6735 and we can take it from 
there.


 try to optimize DFSInputStream.getFileLength()
 --

 Key: HDFS-6698
 URL: https://issues.apache.org/jira/browse/HDFS-6698
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-6698.txt, HDFS-6698.txt, HDFS-6698v2.txt, 
 HDFS-6698v2.txt, HDFS-6698v3.txt


 HBase prefers to invoke read() to serve scan requests and pread() to serve get 
 requests, because pread() holds almost no locks.
 Imagine a read() is running; because the definition is:
 {code}
 public synchronized int read
 {code}
 no other read() request can run concurrently. That is known, but pread() 
 also cannot run, because:
 {code}
 public int read(long position, byte[] buffer, int offset, int length)
     throws IOException {
   // sanity checks
   dfsClient.checkOpen();
   if (closed) {
     throw new IOException("Stream closed");
   }
   failures = 0;
   long filelen = getFileLength();
 {code}
 getFileLength() also needs the lock, so we need to figure out a lock-free 
 implementation of getFileLength() before the HBase multi-stream feature is done.
 [~saint@gmail.com]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6735) A minor optimization to avoid pread() be blocked by read() inside the same DFSInputStream

2014-11-01 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193629#comment-14193629
 ] 

Lars Hofhansl commented on HDFS-6735:
-

As described in HDFS-6698, the potential performance gains for something like 
HBase are substantial.

I agree it's better to keep LocatedBlocks not threadsafe and require callers 
to lock accordingly.
I've not seen fetchAt in a hot path (at least not from HBase usage patterns).
seek + read (non-positional) cannot be done concurrently, agreed. pread should 
be possible, though.

How should we continue to move on this? Seems important. :)

Also open to suggestions about how to fix things in HBase (see last comment in 
HDFS-6698, about how HBase handles things and how limited concurrency within 
an InputStream is an issue).


 A minor optimization to avoid pread() be blocked by read() inside the same 
 DFSInputStream
 -

 Key: HDFS-6735
 URL: https://issues.apache.org/jira/browse/HDFS-6735
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-6735-v2.txt, HDFS-6735.txt


 In the current DFSInputStream implementation there are a couple of 
 coarse-grained locks in the read/pread path, and they have become an HBase 
 read-latency pain point. In HDFS-6698 I made a minor patch against the first 
 lock encountered, around getFileLength; indeed, after reading the code and 
 testing, it shows there are still other locks we could improve.
 In this jira I'll make a patch against the other locks, and a simple test case 
 to show the issue and the improved result.
 This is important for HBase, since in the current HFile read path we issue all 
 read()/pread() requests on the same DFSInputStream for one HFile. (A 
 multi-stream solution is another story I had planned to do, but it will 
 probably take more time than I expected.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6698) try to optimize DFSInputStream.getFileLength()

2014-10-31 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14192209#comment-14192209
 ] 

Lars Hofhansl commented on HDFS-6698:
-

We need to make sure now that we do not kick the can down the road; there are 
more synchronized methods called from within read. I'll do some testing and 
report back.


 try to optimize DFSInputStream.getFileLength()
 --

 Key: HDFS-6698
 URL: https://issues.apache.org/jira/browse/HDFS-6698
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-6698.txt, HDFS-6698.txt, HDFS-6698v2.txt, 
 HDFS-6698v2.txt, HDFS-6698v3.txt


 HBase prefers to invoke read() when serving scan requests, and pread() when 
 serving get requests, because pread() holds almost no locks.
 Let's imagine there's a read() running; because the definition is:
 {code}
 public synchronized int read
 {code}
 no other read() request can run concurrently. That much is known, but pread() 
 also cannot run, because:
 {code}
   public int read(long position, byte[] buffer, int offset, int length)
 throws IOException {
 // sanity checks
 dfsClient.checkOpen();
 if (closed) {
   throw new IOException("Stream closed");
 }
 failures = 0;
 long filelen = getFileLength();
 {code}
 getFileLength() also needs the lock, so we need to figure out a no-lock impl 
 of getFileLength() before the HBase multi-stream feature is done. 
 [~saint@gmail.com]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6698) try to optimize DFSInputStream.getFileLength()

2014-10-31 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14192550#comment-14192550
 ] 

Lars Hofhansl commented on HDFS-6698:
-

Indeed I now find that the time is spent in {{getBlockRange()}} :)
I'll look at HDFS-6735 and include fixes from there.

 try to optimize DFSInputStream.getFileLength()
 --

 Key: HDFS-6698
 URL: https://issues.apache.org/jira/browse/HDFS-6698
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-6698.txt, HDFS-6698.txt, HDFS-6698v2.txt, 
 HDFS-6698v2.txt, HDFS-6698v3.txt


 HBase prefers to invoke read() when serving scan requests, and pread() when 
 serving get requests, because pread() holds almost no locks.
 Let's imagine there's a read() running; because the definition is:
 {code}
 public synchronized int read
 {code}
 no other read() request can run concurrently. That much is known, but pread() 
 also cannot run, because:
 {code}
   public int read(long position, byte[] buffer, int offset, int length)
 throws IOException {
 // sanity checks
 dfsClient.checkOpen();
 if (closed) {
   throw new IOException("Stream closed");
 }
 failures = 0;
 long filelen = getFileLength();
 {code}
 getFileLength() also needs the lock, so we need to figure out a no-lock impl 
 of getFileLength() before the HBase multi-stream feature is done. 
 [~saint@gmail.com]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6698) try to optimize DFSInputStream.getFileLength()

2014-10-31 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14192684#comment-14192684
 ] 

Lars Hofhansl commented on HDFS-6698:
-

Pulling in selected changes from HDFS-6735 yields a HUGE speed improvement. A 
scan that took 16s to execute now finishes in 9s. (The setup is such that all 
data fits into the OS cache and the HBase cache is disabled, to isolate this 
code path.)


 try to optimize DFSInputStream.getFileLength()
 --

 Key: HDFS-6698
 URL: https://issues.apache.org/jira/browse/HDFS-6698
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-6698.txt, HDFS-6698.txt, HDFS-6698v2.txt, 
 HDFS-6698v2.txt, HDFS-6698v3.txt


 HBase prefers to invoke read() when serving scan requests, and pread() when 
 serving get requests, because pread() holds almost no locks.
 Let's imagine there's a read() running; because the definition is:
 {code}
 public synchronized int read
 {code}
 no other read() request can run concurrently. That much is known, but pread() 
 also cannot run, because:
 {code}
   public int read(long position, byte[] buffer, int offset, int length)
 throws IOException {
 // sanity checks
 dfsClient.checkOpen();
 if (closed) {
   throw new IOException("Stream closed");
 }
 failures = 0;
 long filelen = getFileLength();
 {code}
 getFileLength() also needs the lock, so we need to figure out a no-lock impl 
 of getFileLength() before the HBase multi-stream feature is done. 
 [~saint@gmail.com]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6698) try to optimize DFSInputStream.getFileLength()

2014-10-30 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HDFS-6698:

Attachment: HDFS-6698v3.txt

I just ran into this as well while debugging why HBase does not benefit from 
Snappy compression as much as it should. It turns out a non-trivial amount of 
time (as determined by a sampler, not an instrumenting profiler) is spent in 
this method.

To be safe I'd probably also turn LocatedBlocks into an immutable object (well, 
except for blocks) - see attached patch. All members of LocatedBlocks are 
safely published now.

With that I don't think this patch can do any harm.
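To sketch the safe-publication idea in plain Java (this is not the actual HDFS 
patch; the names and fields below are simplified stand-ins): all fields of the 
snapshot object are final, and readers only dereference a volatile reference, so 
no lock is needed to read the file length.
{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified stand-in for LocatedBlocks: immutable, so it can be read without locks.
final class BlockLocationsSnapshot {
  private final long fileLength;
  private final boolean underConstruction;
  private final List<long[]> blocks; // stand-in for the block list

  BlockLocationsSnapshot(long fileLength, boolean underConstruction, List<long[]> blocks) {
    this.fileLength = fileLength;
    this.underConstruction = underConstruction;
    this.blocks = Collections.unmodifiableList(new ArrayList<>(blocks));
  }

  long getFileLength() { return fileLength; }
  boolean isUnderConstruction() { return underConstruction; }
  List<long[]> getBlocks() { return blocks; }
}

class StreamState {
  // Readers see a consistent snapshot; updates swap in a new immutable instance.
  private volatile BlockLocationsSnapshot locations =
      new BlockLocationsSnapshot(0L, false, new ArrayList<>());

  long fileLengthWithoutLocking() { return locations.getFileLength(); }

  void update(BlockLocationsSnapshot newLocations) { this.locations = newLocations; }
}
{code}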


 try to optimize DFSInputStream.getFileLength()
 --

 Key: HDFS-6698
 URL: https://issues.apache.org/jira/browse/HDFS-6698
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-6698.txt, HDFS-6698.txt, HDFS-6698v2.txt, 
 HDFS-6698v2.txt, HDFS-6698v3.txt


 HBase prefers to invoke read() when serving scan requests, and pread() when 
 serving get requests, because pread() holds almost no locks.
 Let's imagine there's a read() running; because the definition is:
 {code}
 public synchronized int read
 {code}
 no other read() request can run concurrently. That much is known, but pread() 
 also cannot run, because:
 {code}
   public int read(long position, byte[] buffer, int offset, int length)
 throws IOException {
 // sanity checks
 dfsClient.checkOpen();
 if (closed) {
   throw new IOException("Stream closed");
 }
 failures = 0;
 long filelen = getFileLength();
 {code}
 getFileLength() also needs the lock, so we need to figure out a no-lock impl 
 of getFileLength() before the HBase multi-stream feature is done. 
 [~saint@gmail.com]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-5042) Completed files lost after power failure

2014-10-03 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158235#comment-14158235
 ] 

Lars Hofhansl commented on HDFS-5042:
-

Thanks Luke.

I meant to say: (1) finish writing the block. (2) Move it. (3) fsync or 
fdatasync the block file in the new location.
(We'd just change the order of moving vs. fsync.)

The rename would still be atomic (the block file is written completely before we 
move it), but doing the fsync after should order the metadata commits correctly, 
assuming write barriers. Then again, the write and the move would be two 
different transactions as far as the FS is concerned.

Agree it's cleanest if we in fact sync both actions.
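For illustration, a minimal sketch of the "sync both actions" variant using 
plain java.nio (not the DataNode code; fsyncing a directory via FileChannel 
works on Linux but may not be supported on every platform):
{code}
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public class DurableRenameSketch {
  // Move a finished block file into place, then fsync the file and its new
  // parent directory so both the data and the rename survive a power failure.
  static void moveAndSync(Path src, Path dstDir) throws IOException {
    Path dst = dstDir.resolve(src.getFileName());
    Files.move(src, dst, StandardCopyOption.ATOMIC_MOVE);

    // fsync the file contents in the new location
    try (FileChannel fc = FileChannel.open(dst, StandardOpenOption.WRITE)) {
      fc.force(true);
    }
    // fsync the directory entry (assumption: opening a directory for read
    // works on the platform, as it does on Linux)
    try (FileChannel dir = FileChannel.open(dstDir, StandardOpenOption.READ)) {
      dir.force(true);
    }
  }
}
{code}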


 Completed files lost after power failure
 

 Key: HDFS-5042
 URL: https://issues.apache.org/jira/browse/HDFS-5042
 Project: Hadoop HDFS
  Issue Type: Bug
 Environment: ext3 on CentOS 5.7 (kernel 2.6.18-274.el5)
Reporter: Dave Latham
Priority: Critical

 We suffered a cluster wide power failure after which HDFS lost data that it 
 had acknowledged as closed and complete.
 The client was HBase which compacted a set of HFiles into a new HFile, then 
 after closing the file successfully, deleted the previous versions of the 
 file.  The cluster then lost power, and when brought back up the newly 
 created file was marked CORRUPT.
 Based on reading the logs it looks like the replicas were created by the 
 DataNodes in the 'blocksBeingWritten' directory.  Then when the file was 
 closed they were moved to the 'current' directory.  After the power cycle 
 those replicas were again in the blocksBeingWritten directory of the 
 underlying file system (ext3).  When those DataNodes reported in to the 
 NameNode it deleted those replicas and lost the file.
 Some possible fixes could be having the DataNode fsync the directory(s) after 
 moving the block from blocksBeingWritten to current to ensure the rename is 
 durable or having the NameNode accept replicas from blocksBeingWritten under 
 certain circumstances.
 Log snippets from RS (RegionServer), NN (NameNode), DN (DataNode):
 {noformat}
 RS 2013-06-29 11:16:06,812 DEBUG org.apache.hadoop.hbase.util.FSUtils: 
 Creating 
 file=hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c
  with permission=rwxrwxrwx
 NN 2013-06-29 11:16:06,830 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
 NameSystem.allocateBlock: 
 /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c.
  blk_1395839728632046111_357084589
 DN 2013-06-29 11:16:06,832 INFO 
 org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block 
 blk_1395839728632046111_357084589 src: /10.0.5.237:14327 dest: 
 /10.0.5.237:50010
 NN 2013-06-29 11:16:11,370 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
 NameSystem.addStoredBlock: blockMap updated: 10.0.6.1:50010 is added to 
 blk_1395839728632046111_357084589 size 25418340
 NN 2013-06-29 11:16:11,370 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
 NameSystem.addStoredBlock: blockMap updated: 10.0.6.24:50010 is added to 
 blk_1395839728632046111_357084589 size 25418340
 NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
 NameSystem.addStoredBlock: blockMap updated: 10.0.5.237:50010 is added to 
 blk_1395839728632046111_357084589 size 25418340
 DN 2013-06-29 11:16:11,385 INFO 
 org.apache.hadoop.hdfs.server.datanode.DataNode: Received block 
 blk_1395839728632046111_357084589 of size 25418340 from /10.0.5.237:14327
 DN 2013-06-29 11:16:11,385 INFO 
 org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 2 for block 
 blk_1395839728632046111_357084589 terminating
 NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: Removing 
 lease on  file 
 /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c
  from client DFSClient_hb_rs_hs745,60020,1372470111932
 NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: DIR* 
 NameSystem.completeFile: file 
 /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c
  is closed by DFSClient_hb_rs_hs745,60020,1372470111932
 RS 2013-06-29 11:16:11,393 INFO org.apache.hadoop.hbase.regionserver.Store: 
 Renaming compacted file at 
 hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c
  to 
 hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/n/6e0cc30af6e64e56ba5a539fdf159c4c
 RS 2013-06-29 11:16:11,505 INFO org.apache.hadoop.hbase.regionserver.Store: 
 Completed major compaction of 7 file(s) in n of 
 users-6,\x12\xBDp\xA3,1359426311784.b5b0820cde759ae68e333b2f4015bb7e. into 
 6e0cc30af6e64e56ba5a539fdf159c4c, size=24.2m; total size for store is 24.2m
 ---  CRASH, RESTART -
 NN 

[jira] [Commented] (HDFS-5042) Completed files lost after power failure

2014-10-03 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158434#comment-14158434
 ] 

Lars Hofhansl commented on HDFS-5042:
-

Cool. That should work.


 Completed files lost after power failure
 

 Key: HDFS-5042
 URL: https://issues.apache.org/jira/browse/HDFS-5042
 Project: Hadoop HDFS
  Issue Type: Bug
 Environment: ext3 on CentOS 5.7 (kernel 2.6.18-274.el5)
Reporter: Dave Latham
Priority: Critical

 We suffered a cluster wide power failure after which HDFS lost data that it 
 had acknowledged as closed and complete.
 The client was HBase which compacted a set of HFiles into a new HFile, then 
 after closing the file successfully, deleted the previous versions of the 
 file.  The cluster then lost power, and when brought back up the newly 
 created file was marked CORRUPT.
 Based on reading the logs it looks like the replicas were created by the 
 DataNodes in the 'blocksBeingWritten' directory.  Then when the file was 
 closed they were moved to the 'current' directory.  After the power cycle 
 those replicas were again in the blocksBeingWritten directory of the 
 underlying file system (ext3).  When those DataNodes reported in to the 
 NameNode it deleted those replicas and lost the file.
 Some possible fixes could be having the DataNode fsync the directory(s) after 
 moving the block from blocksBeingWritten to current to ensure the rename is 
 durable or having the NameNode accept replicas from blocksBeingWritten under 
 certain circumstances.
 Log snippets from RS (RegionServer), NN (NameNode), DN (DataNode):
 {noformat}
 RS 2013-06-29 11:16:06,812 DEBUG org.apache.hadoop.hbase.util.FSUtils: 
 Creating 
 file=hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c
  with permission=rwxrwxrwx
 NN 2013-06-29 11:16:06,830 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
 NameSystem.allocateBlock: 
 /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c.
  blk_1395839728632046111_357084589
 DN 2013-06-29 11:16:06,832 INFO 
 org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block 
 blk_1395839728632046111_357084589 src: /10.0.5.237:14327 dest: 
 /10.0.5.237:50010
 NN 2013-06-29 11:16:11,370 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
 NameSystem.addStoredBlock: blockMap updated: 10.0.6.1:50010 is added to 
 blk_1395839728632046111_357084589 size 25418340
 NN 2013-06-29 11:16:11,370 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
 NameSystem.addStoredBlock: blockMap updated: 10.0.6.24:50010 is added to 
 blk_1395839728632046111_357084589 size 25418340
 NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
 NameSystem.addStoredBlock: blockMap updated: 10.0.5.237:50010 is added to 
 blk_1395839728632046111_357084589 size 25418340
 DN 2013-06-29 11:16:11,385 INFO 
 org.apache.hadoop.hdfs.server.datanode.DataNode: Received block 
 blk_1395839728632046111_357084589 of size 25418340 from /10.0.5.237:14327
 DN 2013-06-29 11:16:11,385 INFO 
 org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 2 for block 
 blk_1395839728632046111_357084589 terminating
 NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: Removing 
 lease on  file 
 /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c
  from client DFSClient_hb_rs_hs745,60020,1372470111932
 NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: DIR* 
 NameSystem.completeFile: file 
 /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c
  is closed by DFSClient_hb_rs_hs745,60020,1372470111932
 RS 2013-06-29 11:16:11,393 INFO org.apache.hadoop.hbase.regionserver.Store: 
 Renaming compacted file at 
 hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c
  to 
 hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/n/6e0cc30af6e64e56ba5a539fdf159c4c
 RS 2013-06-29 11:16:11,505 INFO org.apache.hadoop.hbase.regionserver.Store: 
 Completed major compaction of 7 file(s) in n of 
 users-6,\x12\xBDp\xA3,1359426311784.b5b0820cde759ae68e333b2f4015bb7e. into 
 6e0cc30af6e64e56ba5a539fdf159c4c, size=24.2m; total size for store is 24.2m
 ---  CRASH, RESTART -
 NN 2013-06-29 12:01:19,743 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
 NameSystem.addStoredBlock: addStoredBlock request received for 
 blk_1395839728632046111_357084589 on 10.0.6.1:50010 size 21978112 but was 
 rejected: Reported as block being written but is a block of closed file.
 NN 2013-06-29 12:01:19,743 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
 NameSystem.addToInvalidates: blk_1395839728632046111 is added to invalidSet 
 of 10.0.6.1:50010
 NN 2013-06-29 12:01:20,155 INFO 

[jira] [Commented] (HDFS-5042) Completed files lost after power failure

2014-10-03 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158677#comment-14158677
 ] 

Lars Hofhansl commented on HDFS-5042:
-

We should study the perf impact.

Previously I found that sync-on-close severely impacted file creation time, 
unless sync-behind-writes is also enabled. (Interestingly, sync-behind-writes 
should not cause any performance detriment here: we're dealing with immutable 
files, so delaying the write-out of dirty blocks in the hope that they'd be 
updated before we flush them is pointless anyway.)


 Completed files lost after power failure
 

 Key: HDFS-5042
 URL: https://issues.apache.org/jira/browse/HDFS-5042
 Project: Hadoop HDFS
  Issue Type: Bug
 Environment: ext3 on CentOS 5.7 (kernel 2.6.18-274.el5)
Reporter: Dave Latham
Priority: Critical

 We suffered a cluster wide power failure after which HDFS lost data that it 
 had acknowledged as closed and complete.
 The client was HBase which compacted a set of HFiles into a new HFile, then 
 after closing the file successfully, deleted the previous versions of the 
 file.  The cluster then lost power, and when brought back up the newly 
 created file was marked CORRUPT.
 Based on reading the logs it looks like the replicas were created by the 
 DataNodes in the 'blocksBeingWritten' directory.  Then when the file was 
 closed they were moved to the 'current' directory.  After the power cycle 
 those replicas were again in the blocksBeingWritten directory of the 
 underlying file system (ext3).  When those DataNodes reported in to the 
 NameNode it deleted those replicas and lost the file.
 Some possible fixes could be having the DataNode fsync the directory(s) after 
 moving the block from blocksBeingWritten to current to ensure the rename is 
 durable or having the NameNode accept replicas from blocksBeingWritten under 
 certain circumstances.
 Log snippets from RS (RegionServer), NN (NameNode), DN (DataNode):
 {noformat}
 RS 2013-06-29 11:16:06,812 DEBUG org.apache.hadoop.hbase.util.FSUtils: 
 Creating 
 file=hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c
  with permission=rwxrwxrwx
 NN 2013-06-29 11:16:06,830 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
 NameSystem.allocateBlock: 
 /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c.
  blk_1395839728632046111_357084589
 DN 2013-06-29 11:16:06,832 INFO 
 org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block 
 blk_1395839728632046111_357084589 src: /10.0.5.237:14327 dest: 
 /10.0.5.237:50010
 NN 2013-06-29 11:16:11,370 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
 NameSystem.addStoredBlock: blockMap updated: 10.0.6.1:50010 is added to 
 blk_1395839728632046111_357084589 size 25418340
 NN 2013-06-29 11:16:11,370 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
 NameSystem.addStoredBlock: blockMap updated: 10.0.6.24:50010 is added to 
 blk_1395839728632046111_357084589 size 25418340
 NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
 NameSystem.addStoredBlock: blockMap updated: 10.0.5.237:50010 is added to 
 blk_1395839728632046111_357084589 size 25418340
 DN 2013-06-29 11:16:11,385 INFO 
 org.apache.hadoop.hdfs.server.datanode.DataNode: Received block 
 blk_1395839728632046111_357084589 of size 25418340 from /10.0.5.237:14327
 DN 2013-06-29 11:16:11,385 INFO 
 org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 2 for block 
 blk_1395839728632046111_357084589 terminating
 NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: Removing 
 lease on  file 
 /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c
  from client DFSClient_hb_rs_hs745,60020,1372470111932
 NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: DIR* 
 NameSystem.completeFile: file 
 /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c
  is closed by DFSClient_hb_rs_hs745,60020,1372470111932
 RS 2013-06-29 11:16:11,393 INFO org.apache.hadoop.hbase.regionserver.Store: 
 Renaming compacted file at 
 hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c
  to 
 hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/n/6e0cc30af6e64e56ba5a539fdf159c4c
 RS 2013-06-29 11:16:11,505 INFO org.apache.hadoop.hbase.regionserver.Store: 
 Completed major compaction of 7 file(s) in n of 
 users-6,\x12\xBDp\xA3,1359426311784.b5b0820cde759ae68e333b2f4015bb7e. into 
 6e0cc30af6e64e56ba5a539fdf159c4c, size=24.2m; total size for store is 24.2m
 ---  CRASH, RESTART -
 NN 2013-06-29 12:01:19,743 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
 NameSystem.addStoredBlock: addStoredBlock 

[jira] [Commented] (HDFS-5042) Completed files lost after power failure

2014-10-02 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157328#comment-14157328
 ] 

Lars Hofhansl commented on HDFS-5042:
-

Is this a problem when write barriers are enabled on the DNs? ext3 has them off 
by default.
In that case we might need to move the file into place first and then fsync the 
file; that should force the metadata updates in order... I'm sure that'd cause 
other problems.


 Completed files lost after power failure
 

 Key: HDFS-5042
 URL: https://issues.apache.org/jira/browse/HDFS-5042
 Project: Hadoop HDFS
  Issue Type: Bug
 Environment: ext3 on CentOS 5.7 (kernel 2.6.18-274.el5)
Reporter: Dave Latham
Priority: Critical

 We suffered a cluster wide power failure after which HDFS lost data that it 
 had acknowledged as closed and complete.
 The client was HBase which compacted a set of HFiles into a new HFile, then 
 after closing the file successfully, deleted the previous versions of the 
 file.  The cluster then lost power, and when brought back up the newly 
 created file was marked CORRUPT.
 Based on reading the logs it looks like the replicas were created by the 
 DataNodes in the 'blocksBeingWritten' directory.  Then when the file was 
 closed they were moved to the 'current' directory.  After the power cycle 
 those replicas were again in the blocksBeingWritten directory of the 
 underlying file system (ext3).  When those DataNodes reported in to the 
 NameNode it deleted those replicas and lost the file.
 Some possible fixes could be having the DataNode fsync the directory(s) after 
 moving the block from blocksBeingWritten to current to ensure the rename is 
 durable or having the NameNode accept replicas from blocksBeingWritten under 
 certain circumstances.
 Log snippets from RS (RegionServer), NN (NameNode), DN (DataNode):
 {noformat}
 RS 2013-06-29 11:16:06,812 DEBUG org.apache.hadoop.hbase.util.FSUtils: 
 Creating 
 file=hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c
  with permission=rwxrwxrwx
 NN 2013-06-29 11:16:06,830 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
 NameSystem.allocateBlock: 
 /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c.
  blk_1395839728632046111_357084589
 DN 2013-06-29 11:16:06,832 INFO 
 org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block 
 blk_1395839728632046111_357084589 src: /10.0.5.237:14327 dest: 
 /10.0.5.237:50010
 NN 2013-06-29 11:16:11,370 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
 NameSystem.addStoredBlock: blockMap updated: 10.0.6.1:50010 is added to 
 blk_1395839728632046111_357084589 size 25418340
 NN 2013-06-29 11:16:11,370 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
 NameSystem.addStoredBlock: blockMap updated: 10.0.6.24:50010 is added to 
 blk_1395839728632046111_357084589 size 25418340
 NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
 NameSystem.addStoredBlock: blockMap updated: 10.0.5.237:50010 is added to 
 blk_1395839728632046111_357084589 size 25418340
 DN 2013-06-29 11:16:11,385 INFO 
 org.apache.hadoop.hdfs.server.datanode.DataNode: Received block 
 blk_1395839728632046111_357084589 of size 25418340 from /10.0.5.237:14327
 DN 2013-06-29 11:16:11,385 INFO 
 org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 2 for block 
 blk_1395839728632046111_357084589 terminating
 NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: Removing 
 lease on  file 
 /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c
  from client DFSClient_hb_rs_hs745,60020,1372470111932
 NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: DIR* 
 NameSystem.completeFile: file 
 /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c
  is closed by DFSClient_hb_rs_hs745,60020,1372470111932
 RS 2013-06-29 11:16:11,393 INFO org.apache.hadoop.hbase.regionserver.Store: 
 Renaming compacted file at 
 hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c
  to 
 hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/n/6e0cc30af6e64e56ba5a539fdf159c4c
 RS 2013-06-29 11:16:11,505 INFO org.apache.hadoop.hbase.regionserver.Store: 
 Completed major compaction of 7 file(s) in n of 
 users-6,\x12\xBDp\xA3,1359426311784.b5b0820cde759ae68e333b2f4015bb7e. into 
 6e0cc30af6e64e56ba5a539fdf159c4c, size=24.2m; total size for store is 24.2m
 ---  CRASH, RESTART -
 NN 2013-06-29 12:01:19,743 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
 NameSystem.addStoredBlock: addStoredBlock request received for 
 blk_1395839728632046111_357084589 on 10.0.6.1:50010 size 21978112 but was 
 rejected: Reported as block being written but is 

[jira] [Resolved] (HDFS-4455) Datanode sometimes gives up permanently on Namenode in HA setup

2014-05-18 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HDFS-4455.
-

Resolution: Implemented

 Datanode sometimes gives up permanently on Namenode in HA setup
 ---

 Key: HDFS-4455
 URL: https://issues.apache.org/jira/browse/HDFS-4455
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, ha
Affects Versions: 2.0.2-alpha
Reporter: Lars Hofhansl
Assignee: Juan Yu
Priority: Critical

 Today we got ourselves into a situation where we hard killed the cluster 
 (kill -9 across the board on all processes) and upon restarting all DNs would 
 permanently give up on one of the NNs in our two-NN HA setup (using QJM).
 The HA setup is correct (prior to this we failed over the NNs many times for 
 testing). Bouncing the DNs resolved the problem.
 In the logs I see this exception:
 {code}
 2013-01-29 23:32:49,461 FATAL datanode.DataNode - Initialization failed for 
 block pool Block pool BP-1852726028-ip-1358813649047 (storage id 
 DS-60505003-ip-50010-1353106051747) service to host/ip:8020
 java.io.IOException: Failed on local exception: java.io.IOException: Response 
 is null.; Host Details : local host is: host/ip; destination host is: 
 host:8020; 
 at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:759)
 at org.apache.hadoop.ipc.Client.call(Client.java:1164)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
 at $Proxy10.registerDatanode(Unknown Source)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
 at $Proxy10.registerDatanode(Unknown Source)
 at 
 org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:149)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:619)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:221)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:661)
 at java.lang.Thread.run(Thread.java:662)
 Caused by: java.io.IOException: Response is null.
 at 
 org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:885)
 at org.apache.hadoop.ipc.Client$Connection.run(Client.java:813)
 2013-01-29 23:32:49,463 WARN  datanode.DataNode - Ending block pool service 
 for: Block pool BP-1852726028-ip-1358813649047 (storage id 
 DS-60505003-ip-50010-1353106051747) service to host/ip:8020
 {code}
 So somehow in BPServiceActor.connectToNNAndHandshake() we made it all the way 
 to register(). It then failed in bpNamenode.registerDatanode(bpRegistration) 
 with an IOException, which is not caught and causes the block pool service to 
 fail as a whole.
 No doubt that was caused by one of the NNs being in a weird state. While that 
 happened the active NN claimed that the FS was corrupted and stayed in safe 
 mode, and DNs only registered with the standby NN. Failing over to the 2nd NN 
 and then restarting the first NN and failing over did not change that.
 No amount of bouncing/failing over the HA NNs would have the DNs reconnect to 
 one of the NNs.
 In BPServiceActor.register(), should we catch IOException instead of 
 SocketTimeoutException? That way it would continue to retry and eventually 
 connect to the NN.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-4455) Datanode sometimes gives up permanently on Namenode in HA setup

2014-05-18 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14001009#comment-14001009
 ] 

Lars Hofhansl commented on HDFS-4455:
-

Looked at HDFS-2882. I agree that should fix this issue.

 Datanode sometimes gives up permanently on Namenode in HA setup
 ---

 Key: HDFS-4455
 URL: https://issues.apache.org/jira/browse/HDFS-4455
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, ha
Affects Versions: 2.0.2-alpha
Reporter: Lars Hofhansl
Assignee: Juan Yu
Priority: Critical

 Today we got ourselves into a situation where we hard killed the cluster 
 (kill -9 across the board on all processes) and upon restarting all DNs would 
 permanently give up on one of the NNs in our two-NN HA setup (using QJM).
 The HA setup is correct (prior to this we failed over the NNs many times for 
 testing). Bouncing the DNs resolved the problem.
 In the logs I see this exception:
 {code}
 2013-01-29 23:32:49,461 FATAL datanode.DataNode - Initialization failed for 
 block pool Block pool BP-1852726028-ip-1358813649047 (storage id 
 DS-60505003-ip-50010-1353106051747) service to host/ip:8020
 java.io.IOException: Failed on local exception: java.io.IOException: Response 
 is null.; Host Details : local host is: host/ip; destination host is: 
 host:8020; 
 at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:759)
 at org.apache.hadoop.ipc.Client.call(Client.java:1164)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
 at $Proxy10.registerDatanode(Unknown Source)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
 at $Proxy10.registerDatanode(Unknown Source)
 at 
 org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:149)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:619)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:221)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:661)
 at java.lang.Thread.run(Thread.java:662)
 Caused by: java.io.IOException: Response is null.
 at 
 org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:885)
 at org.apache.hadoop.ipc.Client$Connection.run(Client.java:813)
 2013-01-29 23:32:49,463 WARN  datanode.DataNode - Ending block pool service 
 for: Block pool BP-1852726028-ip-1358813649047 (storage id 
 DS-60505003-ip-50010-1353106051747) service to host/ip:8020
 {code}
 So somehow in BPServiceActor.connectToNNAndHandshake() we made it all the way 
 to register(). It then failed in bpNamenode.registerDatanode(bpRegistration) 
 with an IOException, which is not caught and causes the block pool service to 
 fail as a whole.
 No doubt that was caused by one of the NNs being in a weird state. While that 
 happened the active NN claimed that the FS was corrupted and stayed in safe 
 mode, and DNs only registered with the standby NN. Failing over to the 2nd NN 
 and then restarting the first NN and failing over did not change that.
 No amount of bouncing/failing over the HA NNs would have the DNs reconnect to 
 one of the NNs.
 In BPServiceActor.register(), should we catch IOException instead of 
 SocketTimeoutException? That way it would continue to retry and eventually 
 connect to the NN.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-5571) postScannerFilterRow consumes a lot of CPU in tall table scans

2013-11-26 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HDFS-5571:
---

 Summary: postScannerFilterRow consumes a lot of CPU in tall table 
scans
 Key: HDFS-5571
 URL: https://issues.apache.org/jira/browse/HDFS-5571
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Lars Hofhansl


Continuing my profiling quest, I find that in scanning a tall table (and 
filtering everything on the server) a quarter of the time is now spent in the 
postScannerFilterRow coprocessor hook.




--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-5461) fallback to non-ssr(local short circuit reads) while oom detected

2013-11-13 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13821560#comment-13821560
 ] 

Lars Hofhansl commented on HDFS-5461:
-

The issue is that the JDK only collects direct byte buffers during a full GC, 
and there are different limits for direct buffers and the general heap. HBase 
keeps a reader open for each store file, and thus we end up with a lot of 
direct memory in use.

I was actually curious about 1MB as the default size; it seems even as little 
as 8KB should be OK.
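A tiny, self-contained illustration of why this bites (run with something like 
-XX:MaxDirectMemorySize=64m; the 1MB size mirrors the default discussed above, 
everything else is made up):
{code}
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class DirectBufferPressure {
  public static void main(String[] args) {
    // Direct buffers are only reclaimed when their ByteBuffer objects are GC'd.
    // Long-lived readers that each retain a 1MB direct buffer therefore pin
    // direct memory until a full GC happens to run, if ever.
    List<ByteBuffer> retained = new ArrayList<>();
    try {
      while (true) {
        retained.add(ByteBuffer.allocateDirect(1024 * 1024)); // 1MB each
      }
    } catch (OutOfMemoryError e) {
      System.out.println("Direct buffer memory exhausted after "
          + retained.size() + " MB: " + e);
    }
  }
}
{code}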

 fallback to non-ssr(local short circuit reads) while oom detected
 -

 Key: HDFS-5461
 URL: https://issues.apache.org/jira/browse/HDFS-5461
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 3.0.0, 2.2.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-5461.txt


 Currently, the DirectBufferPool used by the SSR feature doesn't seem to have 
 an upper-bound limit other than the DirectMemory VM option, so there's a risk 
 of hitting a direct memory OOM. See HBASE-8143 for an example.
 IMHO, maybe we could improve it a bit:
 1) detect an OOM, or a configured upper limit reached by the caller, and then 
 fall back to non-SSR (a sketch of this follows below)
 2) add a new metric for the currently consumed raw direct memory size.
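A minimal sketch of the fallback in item 1), assuming nothing about the actual 
DFSClient code paths: try a direct buffer, and if direct memory is exhausted, 
use a heap buffer (i.e. take the non-SSR route) instead of failing the read.
{code}
import java.nio.ByteBuffer;

public class FallbackBufferAllocator {
  // Hypothetical helper, not the real DirectBufferPool: allocate a direct
  // buffer if possible, otherwise fall back to a heap buffer.
  static ByteBuffer allocate(int size) {
    try {
      return ByteBuffer.allocateDirect(size);
    } catch (OutOfMemoryError oom) {
      // Direct memory exhausted; fall back rather than propagating the error.
      return ByteBuffer.allocate(size);
    }
  }
}
{code}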



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HDFS-2834) ByteBuffer-based read API for DFSInputStream

2013-08-08 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13733186#comment-13733186
 ] 

Lars Hofhansl commented on HDFS-2834:
-

Just for reference: with many open files one can easily OOM on direct buffer 
memory. See HBASE-8143.
1MB seems to be a rather large default.


 ByteBuffer-based read API for DFSInputStream
 

 Key: HDFS-2834
 URL: https://issues.apache.org/jira/browse/HDFS-2834
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client, performance
Reporter: Henry Robinson
Assignee: Henry Robinson
 Fix For: 2.0.2-alpha

 Attachments: HDFS-2834.10.patch, HDFS-2834.11.patch, 
 HDFS-2834.3.patch, HDFS-2834.4.patch, HDFS-2834.5.patch, HDFS-2834.6.patch, 
 HDFS-2834.7.patch, HDFS-2834.8.patch, HDFS-2834.9.patch, 
 hdfs-2834-libhdfs-benchmark.png, HDFS-2834-no-common.patch, HDFS-2834.patch, 
 HDFS-2834.patch


 The {{DFSInputStream}} read-path always copies bytes into a JVM-allocated 
 {{byte[]}}. Although for many clients this is desired behaviour, in certain 
 situations, such as native-reads through libhdfs, this imposes an extra copy 
 penalty since the {{byte[]}} needs to be copied out again into a natively 
 readable memory area. 
 For these cases, it would be preferable to allow the client to supply its own 
 buffer, wrapped in a {{ByteBuffer}}, to avoid that final copy overhead. 
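For reference, a minimal usage sketch of the resulting API (the file path is 
made up; note that read(ByteBuffer) may throw UnsupportedOperationException if 
the underlying stream does not support it):
{code}
import java.nio.ByteBuffer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ByteBufferReadSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    try (FSDataInputStream in = fs.open(new Path("/tmp/somefile"))) { // hypothetical file
      // The client supplies its own (here: direct) buffer, so bytes land in
      // natively readable memory instead of an intermediate byte[].
      ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024);
      int n = in.read(buf);
      System.out.println("read " + n + " bytes, buffer position=" + buf.position());
    }
  }
}
{code}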

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1783) Ability for HDFS client to write replicas in parallel

2013-03-06 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HDFS-1783:


Assignee: (was: Lars Hofhansl)

 Ability for HDFS client to write replicas in parallel
 -

 Key: HDFS-1783
 URL: https://issues.apache.org/jira/browse/HDFS-1783
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Reporter: dhruba borthakur
 Attachments: HDFS-1783-trunk.patch, HDFS-1783-trunk-v2.patch, 
 HDFS-1783-trunk-v3.patch, HDFS-1783-trunk-v4.patch, HDFS-1783-trunk-v5.patch


 The current implementation of HDFS pipelines the writes to the three 
 replicas. This introduces some latency for realtime, latency-sensitive 
 applications. An alternate implementation that allows the client to write all 
 replicas in parallel gives much better response times to these applications. 
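A toy illustration of the latency argument, with a hypothetical ReplicaTarget 
standing in for a DN connection (nothing here is HDFS API): pipelined writes pay 
the per-replica latency roughly in sequence, parallel writes pay it roughly once.
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelReplicaWriteSketch {
  interface ReplicaTarget {            // hypothetical stand-in for a DN connection
    void write(byte[] packet);
  }

  static void writeParallel(byte[] packet, List<ReplicaTarget> replicas) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(replicas.size());
    List<Future<?>> acks = new ArrayList<>();
    for (ReplicaTarget r : replicas) {
      acks.add(pool.submit(() -> r.write(packet))); // all replicas written concurrently
    }
    for (Future<?> ack : acks) {
      ack.get();                                    // wait for every ack before returning
    }
    pool.shutdown();
  }
}
{code}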

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4455) Datanode sometimes gives up permanently on Namenode in HA setup

2013-01-29 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HDFS-4455:
---

 Summary: Datanode sometimes gives up permanently on Namenode in HA 
setup
 Key: HDFS-4455
 URL: https://issues.apache.org/jira/browse/HDFS-4455
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Reporter: Lars Hofhansl


Today we got ourselves into a situation where we hard killed the cluster (kill 
-9 across the board on all processes) and upon restarting all DNs would 
permanently give up on one of the NNs in our two-NN HA setup (using QJM).

The HA setup is correct (prior to this we failed over the NNs many times for 
testing). Bouncing the DNs resolved the problem.

In the logs I see this exception:
{code}
2013-01-29 23:32:49,461 FATAL datanode.DataNode - Initialization failed for 
block pool Block pool BP-1852726028-ip-1358813649047 (storage id 
DS-60505003-ip-50010-1353106051747) service to host/ip:8020
java.io.IOException: Failed on local exception: java.io.IOException: Response 
is null.; Host Details : local host is: host/ip; destination host is: 
host:8020; 
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:759)
at org.apache.hadoop.ipc.Client.call(Client.java:1164)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at $Proxy10.registerDatanode(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at $Proxy10.registerDatanode(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:149)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:619)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:221)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:661)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Response is null.
at 
org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:885)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:813)
2013-01-29 23:32:49,463 WARN  datanode.DataNode - Ending block pool service 
for: Block pool BP-1852726028-ip-1358813649047 (storage id 
DS-60505003-ip-50010-1353106051747) service to host/ip:8020
{code}

So somehow in BPServiceActor.connectToNNAndHandshake() we made it all the way 
to register(). It then failed in bpNamenode.registerDatanode(bpRegistration) 
with an IOException, which is not caught and causes the block pool service to 
fail as a whole.

No doubt that was caused by one of the NNs being in a weird state. While that 
happened the active NN claimed that the FS was corrupted and stayed in safe 
mode, and DNs only registered with the standby NN. Failing over to the 2nd NN 
and then restarting the first NN and failing over did not change that.

No amount of bouncing/failing over the HA NNs would have the DNs reconnect to 
one of the NNs.

In BPServiceActor.register(), should we catch IOException instead of 
SocketTimeoutException? That way it would continue to retry and eventually 
connect to the NN.
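For illustration, the suggested behavior in the last paragraph could look 
roughly like this (a self-contained sketch, not the actual BPServiceActor code; 
the NamenodeProxy interface and the 1s backoff are made up):
{code}
import java.io.IOException;

public class RegisterRetrySketch {
  interface NamenodeProxy {              // hypothetical stand-in for the NN RPC proxy
    void registerDatanode() throws IOException;
  }

  // Retry on any IOException rather than only on SocketTimeoutException, so a
  // transient NN problem does not permanently end the block pool service.
  static void registerWithRetry(NamenodeProxy nn) throws InterruptedException {
    while (true) {
      try {
        nn.registerDatanode();
        return;                          // registered successfully
      } catch (IOException e) {
        System.err.println("Registration failed, will retry: " + e);
        Thread.sleep(1000);              // made-up backoff
      }
    }
  }
}
{code}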


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4345) Release resources of unpoolable Decompressors

2012-12-27 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HDFS-4345:
---

 Summary: Release resources of unpoolable Decompressors
 Key: HDFS-4345
 URL: https://issues.apache.org/jira/browse/HDFS-4345
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Minor
 Fix For: 3.0.0


Found this when looking into HBASE-7435.
When a Decompressor is returned to the pool in CodecPool.java, we should 
probably call end() on it to release its resources.
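Roughly, the idea is something like the following sketch (not the actual 
CodecPool patch; the pooled flag is a hypothetical stand-in for "the pool did 
not keep this instance"):
{code}
import org.apache.hadoop.io.compress.Decompressor;

public class CodecPoolSketch {
  // If a decompressor is not kept by the pool, nothing will ever reuse it,
  // so release its (potentially native) resources right away.
  static void returnDecompressor(Decompressor decompressor, boolean pooled) {
    if (decompressor == null) {
      return;
    }
    decompressor.reset();
    if (!pooled) {
      decompressor.end(); // free native memory held by the codec
    }
  }
}
{code}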


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4345) Release resources of unpoolable Decompressors

2012-12-27 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HDFS-4345:


Status: Patch Available  (was: Open)

 Release resources of unpoolable Decompressors
 -

 Key: HDFS-4345
 URL: https://issues.apache.org/jira/browse/HDFS-4345
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Minor
 Fix For: 3.0.0

 Attachments: HDFS-4345.txt


 Found this when looking into HBASE-7435.
 When a Decompressor is returned to the pool in CodecPool.java, we should 
 probably call end() on it to release its resources.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3979) Fix hsync semantics

2012-11-09 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13494373#comment-13494373
 ] 

Lars Hofhansl commented on HDFS-3979:
-

Thanks Nicholas.

Luke, so the test you're looking for is to start 3 DNs, have the write 
permanently fail at any one of them, and in all cases have the hsync fail on 
the client, right?
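Something along these lines, perhaps (just a sketch of the scenario with 
MiniDFSCluster, not a working regression test; with a single DN stopped the 
pipeline will normally just recover, so the interesting part is forcing a 
permanent write failure):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class HsyncFailureSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).numDataNodes(3).build();
    try {
      DistributedFileSystem fs = cluster.getFileSystem();
      try (FSDataOutputStream out = fs.create(new Path("/hsync-test"), (short) 3)) {
        out.write(new byte[4096]);
        out.hsync();             // all replicas healthy: must succeed
        cluster.stopDataNode(0); // take down one DN in the pipeline
        out.write(new byte[4096]);
        out.hsync();             // the question: should a permanent failure surface here?
      }
    } finally {
      cluster.shutdown();
    }
  }
}
{code}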


 Fix hsync semantics
 ---

 Key: HDFS-3979
 URL: https://issues.apache.org/jira/browse/HDFS-3979
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 2.0.2-alpha
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 2.0.3-alpha

 Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt, 
 hdfs-3979-v3.txt, hdfs-3979-v4.txt


 See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver 
 is not on a synchronous path from the DFSClient, hence it is possible that a 
 DN loses data that it has already acknowledged as persisted to a client.
 Edit: Spelling.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-4060) TestHSync#testSequenceFileSync failed

2012-11-09 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HDFS-4060.
-

Resolution: Duplicate

This is fixed with the changes from HDFS-3979.

 TestHSync#testSequenceFileSync failed
 -

 Key: HDFS-4060
 URL: https://issues.apache.org/jira/browse/HDFS-4060
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Eli Collins
  Labels: test-fail

 TestHSync#testSequenceFileSync failed in the pre commit run of HDFS-4055.
 {noformat}
 java.lang.AssertionError: Bad value for metric FsyncCount expected:<2> but 
  was:<1>
   at org.junit.Assert.fail(Assert.java:91)
   at org.junit.Assert.failNotEquals(Assert.java:645)
   at org.junit.Assert.assertEquals(Assert.java:126)
   at org.junit.Assert.assertEquals(Assert.java:470)
   at 
 org.apache.hadoop.test.MetricsAsserts.assertCounter(MetricsAsserts.java:228)
   at 
 org.apache.hadoop.hdfs.server.datanode.TestHSync.checkSyncMetric(TestHSync.java:46)
   at 
 org.apache.hadoop.hdfs.server.datanode.TestHSync.checkSyncMetric(TestHSync.java:49)
   at 
 org.apache.hadoop.hdfs.server.datanode.TestHSync.testSequenceFileSync(TestHSync.java:158)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-11-04 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13490445#comment-13490445
 ] 

Lars Hofhansl commented on HDFS-3979:
-

I'll make that change.

 Fix hsync and hflush semantics.
 ---

 Key: HDFS-3979
 URL: https://issues.apache.org/jira/browse/HDFS-3979
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, hdfs client
Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt, hdfs-3979-v3.txt


 See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver 
 is not on a synchronous path from the DFSClient, hence it is possible that a 
 DN loses data that it has already acknowledged as persisted to a client.
 Edit: Spelling.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-3979) Fix hsync and hflush semantics.

2012-11-04 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HDFS-3979:


Attachment: hdfs-3979-v4.txt

Updated patch with Nicholas' suggestion.

I agree that the previous patch would have slowed all writes that reach the DN.
We can't distinguish between an hflush from the client and a normal packet from 
the client.
On the other hand, this no longer deals with Luke's kill -9 scenario (where a 
cluster management tool would kill -9 datanodes in parallel), but in the end no 
tool should really do that.


 Fix hsync and hflush semantics.
 ---

 Key: HDFS-3979
 URL: https://issues.apache.org/jira/browse/HDFS-3979
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, hdfs client
Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt, 
 hdfs-3979-v3.txt, hdfs-3979-v4.txt


 See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver 
 is not on a synchronous path from the DFSClient, hence it is possible that a 
 DN loses data that it has already acknowledged as persisted to a client.
 Edit: Spelling.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-11-03 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13490113#comment-13490113
 ] 

Lars Hofhansl commented on HDFS-3979:
-

Hi Kan, the only difference between v2 and v3 is that in v3 the fsync metric 
is updated after the actual sync to the FS (BlockReceiver.flushOrSync).

This exposes the race condition we want to fix and makes TestHSync fail almost 
every run (the client returns from hsync before the datanode can update the 
metric). With the rest of this patch applied, this race is removed and TestHSync 
never fails.

So now we have a test case for the race condition.

[~vicaya] Do the existing tests TestFiPipelines and TestFiHFlush not cover the 
other scenarios you worry about?


 Fix hsync and hflush semantics.
 ---

 Key: HDFS-3979
 URL: https://issues.apache.org/jira/browse/HDFS-3979
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, hdfs client
Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt, hdfs-3979-v3.txt


 See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver 
 is not on a synchronous path from the DFSClient, hence it is possible that a 
 DN loses data that it has already acknowledged as persisted to a client.
 Edit: Spelling.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4060) TestHSync#testSequenceFileSync failed

2012-10-16 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477584#comment-13477584
 ] 

Lars Hofhansl commented on HDFS-4060:
-

The sync to disk is not actually on a synchronous path as seen from the client, 
so there is a short race where the client returns before the metric is updated.

See HDFS-3979, which would fix the issue, but appears to be stuck in discussion 
about what extra tests it would need, if any.


 TestHSync#testSequenceFileSync failed
 -

 Key: HDFS-4060
 URL: https://issues.apache.org/jira/browse/HDFS-4060
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Eli Collins
  Labels: test-fail

 TestHSync#testSequenceFileSync failed in the pre commit run of HDFS-4055.
 {noformat}
 java.lang.AssertionError: Bad value for metric FsyncCount expected:<2> but 
  was:<1>
   at org.junit.Assert.fail(Assert.java:91)
   at org.junit.Assert.failNotEquals(Assert.java:645)
   at org.junit.Assert.assertEquals(Assert.java:126)
   at org.junit.Assert.assertEquals(Assert.java:470)
   at 
 org.apache.hadoop.test.MetricsAsserts.assertCounter(MetricsAsserts.java:228)
   at 
 org.apache.hadoop.hdfs.server.datanode.TestHSync.checkSyncMetric(TestHSync.java:46)
   at 
 org.apache.hadoop.hdfs.server.datanode.TestHSync.checkSyncMetric(TestHSync.java:49)
   at 
 org.apache.hadoop.hdfs.server.datanode.TestHSync.testSequenceFileSync(TestHSync.java:158)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-3979) Fix hsync and hflush semantics.

2012-10-12 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HDFS-3979:


Attachment: hdfs-3979-v3.txt

This little change makes TestHSync fail most of the time without the rest of 
the patch, and never with this patch.

(In HDFS-744 I had avoided this race by updating the sync metric first. I know 
that was a hack... By updating the metric last in BlockReceiver.flushOrSync, 
the race becomes apparent again.)

We do have pipeline tests that seem to verify correct pipeline behavior in the 
face of failures via fault injection: TestFiPipelines and TestFiHFlush.

In terms of the API3/API4 discussion, I think we agree that hflush should 
follow API4, right? (otherwise we'd have unduly complex code)


 Fix hsync and hflush semantics.
 ---

 Key: HDFS-3979
 URL: https://issues.apache.org/jira/browse/HDFS-3979
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, hdfs client
Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt, hdfs-3979-v3.txt


 See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver 
 is not on a synchronous path from the DFSClient, hence it is possible that a 
 DN loses data that it has already acknowledged as persisted to a client.
 Edit: Spelling.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-10-04 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13469671#comment-13469671
 ] 

Lars Hofhansl commented on HDFS-3979:
-

I've seen that race when I wrote a test for HDFS-744. I fixed it there by 
updating the metrics first... Ugh :)

I think I can make a test that fails at least with reasonable probability with 
the current semantics.

The race between ack and write errors should be reduced (eliminated) with this 
patch.

 Fix hsync and hflush semantics.
 ---

 Key: HDFS-3979
 URL: https://issues.apache.org/jira/browse/HDFS-3979
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, hdfs client
Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt


 See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver 
 is not on a synchronous path from the DFSClient, hence it is possible that a 
 DN loses data that it has already acknowledged as persisted to a client.
 Edit: Spelling.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-10-04 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13469827#comment-13469827
 ] 

Lars Hofhansl commented on HDFS-3979:
-

Thanks Luke and Kan. I'll come up with a test once I get some spare cycles 
(quite busy with HBase atm).

 Fix hsync and hflush semantics.
 ---

 Key: HDFS-3979
 URL: https://issues.apache.org/jira/browse/HDFS-3979
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, hdfs client
Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt


 See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver 
 is not on a synchronous path from the DFSClient, hence it is possible that a 
 DN loses data that it has already acknowledged as persisted to a client.
 Edit: Spelling.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-10-04 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470016#comment-13470016
 ] 

Lars Hofhansl commented on HDFS-3979:
-

API4 is hflush (with change in OS buffers).

That's an interesting discussion by itself. hsync'ing every edit in HBase is 
prohibitive.
I have some simple numbers in HBASE-5954.

Although, I need to do that test again with the sync_file_range changes in 
HDFS-2465 (that would hopefully do most of the data sync'ing asynchronously and 
only sync the last changes and metadata synchronously upon client request).

Many applications do not need every edit to be guaranteed on disk, but have 
sync points. That is what I am aiming for in HBase. The application will know 
the specific semantics.

What is really important for HBase (IMHO) is that every block is synced to disk 
when it is closed. HBase constantly rewrites existing data via compactions, so 
without syncing, arbitrarily old data can be lost during a rack or DC outage.

Lastly, we can play with this. For example, only one of the replicas could sync 
to disk and the others just guarantee the data is in the OS buffers (API4.5 :) ).
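
For illustration, a minimal sketch of what the hflush vs. hsync choice looks 
like to a client through the public FSDataOutputStream API (the path, loop, and 
sync interval are made up; this is not code from any of the patches here):
{noformat}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WalSyncExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Illustrative path only.
    FSDataOutputStream wal = fs.create(new Path("/tmp/wal-example"), true);
    try {
      for (int i = 0; i < 1000; i++) {
        wal.writeBytes("edit " + i + "\n");
        // API4: push the data to all replicas' OS buffers.
        wal.hflush();
        if (i % 100 == 0) {
          // API5: a periodic sync point that survives power/HW loss.
          wal.hsync();
        }
      }
    } finally {
      wal.close();
    }
  }
}
{noformat}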


 Fix hsync and hflush semantics.
 ---

 Key: HDFS-3979
 URL: https://issues.apache.org/jira/browse/HDFS-3979
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, hdfs client
Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt


 See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver 
 is not on a synchronous path from the DFSClient, hence it is possible that a 
 DN loses data that it has already acknowledged as persisted to a client.
 Edit: Spelling.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-10-02 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13467882#comment-13467882
 ] 

Lars Hofhansl commented on HDFS-3979:
-

You don't think the existing pipeline tests cover the failure scenarios? 
I'll see if I can get some performance numbers.

 Fix hsync and hflush semantics.
 ---

 Key: HDFS-3979
 URL: https://issues.apache.org/jira/browse/HDFS-3979
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, hdfs client
Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt


 See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver 
 is not on a synchronous path from the DFSClient, hence it is possible that a 
 DN loses data that it has already acknowledged as persisted to a client.
 Edit: Spelling.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-10-01 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13467270#comment-13467270
 ] 

Lars Hofhansl commented on HDFS-3979:
-

Do we want this change?
Seems to me that HDFS-265 broke hsync/hflush and this would fix it.


 Fix hsync and hflush semantics.
 ---

 Key: HDFS-3979
 URL: https://issues.apache.org/jira/browse/HDFS-3979
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, hdfs client
Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt


 See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver 
 is not on a synchronous path from the DFSClient, hence it is possible that a 
 DN loses data that it has already acknowledged as persisted to a client.
 Edit: Spelling.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-09-28 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13465915#comment-13465915
 ] 

Lars Hofhansl commented on HDFS-3979:
-

Enqueuing the seqno at the end seems like the best approach (indeed this is done 
in the 0.20.x code, as both of you said).
I wonder why this was changed? I'll have a new patch momentarily.


 Fix hsync and hflush semantics.
 ---

 Key: HDFS-3979
 URL: https://issues.apache.org/jira/browse/HDFS-3979
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, hdfs client
Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: hdfs-3979-sketch.txt


 See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver 
 is not on a synchronous path from the DFSClient, hence it is possible that a 
 DN loses data that it has already acknowledged as persisted to a client.
 Edit: Spelling.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-3979) Fix hsync and hflush semantics.

2012-09-28 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HDFS-3979:


Attachment: hdfs-3979-v2.txt

New patch. The order of local operations and waiting for downstream DNs now 
reflects the pre-HDFS-265 logic.

 Fix hsync and hflush semantics.
 ---

 Key: HDFS-3979
 URL: https://issues.apache.org/jira/browse/HDFS-3979
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, hdfs client
Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt


 See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver 
 is not on a synchronous path from the DFSClient, hence it is possible that a 
 DN loses data that it has already acknowledged as persisted to a client.
 Edit: Spelling.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-09-28 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13466113#comment-13466113
 ] 

Lars Hofhansl commented on HDFS-3979:
-

I see. Thanks Kan. So now we have API4 and (with HDFS-744) API5.

For applications like HBase we'd like API4 as well as API5.
(API4 allows a hypothetical kill -9 of all DNs without loss of acknowledged 
data; API5 allows HW failures of all data nodes - i.e. a DC outage - without 
loss of acknowledged data.)


 Fix hsync and hflush semantics.
 ---

 Key: HDFS-3979
 URL: https://issues.apache.org/jira/browse/HDFS-3979
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, hdfs client
Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: hdfs-3979-sketch.txt, hdfs-3979-v2.txt


 See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver 
 is not on a synchronous path from the DFSClient, hence it is possible that a 
 DN loses data that it has already acknowledged as persisted to a client.
 Edit: Spelling.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-09-27 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13465386#comment-13465386
 ] 

Lars Hofhansl commented on HDFS-3979:
-

Should we simply do the enqueue at the end of receivePacket(), then?

So just to make sure: in the current code the seqno is already enqueued at the 
beginning, so if there's an exception later in the code it won't have any 
effect on the enqueued seqno. The finally block just preserves this existing 
behavior.

What happens when there is an exception and the seqno is never enqueued? (And 
if that is OK, why is it not a problem now?)
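
To make the ordering concrete, a tiny self-contained model (illustrative names 
only; this is not the actual BlockReceiver code): the seqno is enqueued for the 
responder only after the local flush/sync has completed, so an ack can never 
get ahead of the data.
{noformat}
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class PacketReceiverModel {
  private final FileOutputStream out;
  private final Queue<Long> ackQueue = new ConcurrentLinkedQueue<Long>();

  PacketReceiverModel(FileOutputStream out) {
    this.out = out;
  }

  void receivePacket(long seqno, byte[] data, boolean syncRequested) throws IOException {
    out.write(data);            // write to the local replica
    if (syncRequested) {
      out.flush();
      out.getFD().sync();       // posix fsync equivalent
    }
    // Only now is the seqno made visible to the responder thread, so the ack
    // cannot race ahead of the flush/sync.
    ackQueue.add(seqno);
  }
}
{noformat}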


 Fix hsync and hflush semantics.
 ---

 Key: HDFS-3979
 URL: https://issues.apache.org/jira/browse/HDFS-3979
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, hdfs client
Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: hdfs-3979-sketch.txt


 See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver 
 is not on a synchronous path from the DFSClient, hence it is possible that a 
 DN loses data that it has already acknowledged as persisted to a client.
 Edit: Spelling.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-744) Support hsync in HDFS

2012-09-26 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13464004#comment-13464004
 ] 

Lars Hofhansl commented on HDFS-744:


I see. In that case we wouldn't ack back until all local work is done.
A possible place to do that would be almost at the end of receivePacket(), 
maybe in a finally block of the last try/catch in that method.

That still does not take care of all the cases, though; for the last packet in 
a block, the sync is deferred to close() (to avoid a double sync). That's not 
hard to change either, I think.


 Support hsync in HDFS
 -

 Key: HDFS-744
 URL: https://issues.apache.org/jira/browse/HDFS-744
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: data-node, hdfs client
Reporter: Hairong Kuang
Assignee: Lars Hofhansl
 Fix For: 2.0.2-alpha

 Attachments: HDFS-744-2.0-v1.patch, HDFS-744-2.0-v2.patch, 
 HDFS-744-trunk.patch, HDFS-744-trunk-v2.patch, HDFS-744-trunk-v3.patch, 
 HDFS-744-trunk-v4.patch, HDFS-744-trunk-v5.patch, HDFS-744-trunk-v6.patch, 
 HDFS-744-trunk-v7.patch, HDFS-744-trunk-v8.patch, hdfs-744.txt, 
 hdfs-744-v2.txt, hdfs-744-v3.txt


 HDFS-731 implements hsync by default as hflush. As described in HADOOP-6313, 
 the real expected semantics should be flushes out to all replicas and all 
 replicas have done posix fsync equivalent - ie the OS has flushed it to the 
 disk device (but the disk may have it in its cache). This jira aims to 
 implement the expected behaviour.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-744) Support hsync in HDFS

2012-09-26 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13464005#comment-13464005
 ] 

Lars Hofhansl commented on HDFS-744:


BTW. I filed HDFS-3979 to do that.

 Support hsync in HDFS
 -

 Key: HDFS-744
 URL: https://issues.apache.org/jira/browse/HDFS-744
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: data-node, hdfs client
Reporter: Hairong Kuang
Assignee: Lars Hofhansl
 Fix For: 2.0.2-alpha

 Attachments: HDFS-744-2.0-v1.patch, HDFS-744-2.0-v2.patch, 
 HDFS-744-trunk.patch, HDFS-744-trunk-v2.patch, HDFS-744-trunk-v3.patch, 
 HDFS-744-trunk-v4.patch, HDFS-744-trunk-v5.patch, HDFS-744-trunk-v6.patch, 
 HDFS-744-trunk-v7.patch, HDFS-744-trunk-v8.patch, hdfs-744.txt, 
 hdfs-744-v2.txt, hdfs-744-v3.txt


 HDFS-731 implements hsync by default as hflush. As described in HADOOP-6313, 
 the real expected semantics should be flushes out to all replicas and all 
 replicas have done posix fsync equivalent - ie the OS has flushed it to the 
 disk device (but the disk may have it in its cache). This jira aims to 
 implement the expected behaviour.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-3979) Fix hsync and hflush semantics.

2012-09-26 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HDFS-3979:


Attachment: hdfs-3979-sketch.txt

Something like this.
(This is a sketch, the only test I performed was compiling)

 Fix hsync and hflush semantics.
 ---

 Key: HDFS-3979
 URL: https://issues.apache.org/jira/browse/HDFS-3979
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: hdfs-3979-sketch.txt


 See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver 
 is not on a synchronous path from the DFSClient, hence it is possible that DN 
 loses data that is has already acknowledged as persisted to a client.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-09-26 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13464133#comment-13464133
 ] 

Lars Hofhansl commented on HDFS-3979:
-

(and sorry for misspelling your name)

 Fix hsync and hflush semantics.
 ---

 Key: HDFS-3979
 URL: https://issues.apache.org/jira/browse/HDFS-3979
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: hdfs-3979-sketch.txt


 See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver 
 is not on a synchronous path from the DFSClient, hence it is possible that DN 
 loses data that is has already acknowledged as persisted to a client.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-3979) Fix hsync and hflush semantics.

2012-09-26 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HDFS-3979:


Status: Patch Available  (was: Open)

Let's try HadoopQA.

TestHSync still passes. I'll also do some tests with HBase...


 Fix hsync and hflush semantics.
 ---

 Key: HDFS-3979
 URL: https://issues.apache.org/jira/browse/HDFS-3979
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: hdfs-3979-sketch.txt


 See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver 
 is not on a synchronous path from the DFSClient, hence it is possible that DN 
 loses data that is has already acknowledged as persisted to a client.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-3979) Fix hsync and hflush semantics.

2012-09-26 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HDFS-3979:


Description: 
See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver is 
not on a synchronous path from the DFSClient, hence it is possible that a DN 
loses data that it has already acknowledged as persisted to a client.

Edit: Spelling.


  was:See discussion in HDFS-744. The actual sync/flush operation in 
BlockReceiver is not on a synchronous path from the DFSClient, hence it is 
possible that DN loses data that is has already acknowledged as persisted to a 
client.


 Fix hsync and hflush semantics.
 ---

 Key: HDFS-3979
 URL: https://issues.apache.org/jira/browse/HDFS-3979
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: hdfs-3979-sketch.txt


 See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver 
 is not on a synchronous path from the DFSClient, hence it is possible that a 
 DN loses data that it has already acknowledged as persisted to a client.
 Edit: Spelling.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-09-26 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13464410#comment-13464410
 ] 

Lars Hofhansl commented on HDFS-3979:
-

I'm not sure either. I am trying not to change the existing behavior.

The enqueue used to happen at the beginning of receivePacket(...), so if the 
latter part of the method fails, the ack would already be enqueued.


 Fix hsync and hflush semantics.
 ---

 Key: HDFS-3979
 URL: https://issues.apache.org/jira/browse/HDFS-3979
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, hdfs client
Affects Versions: 0.22.0, 0.23.0, 2.0.0-alpha
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: hdfs-3979-sketch.txt


 See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver 
 is not on a synchronous path from the DFSClient, hence it is possible that a 
 DN loses data that it has already acknowledged as persisted to a client.
 Edit: Spelling.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-744) Support hsync in HDFS

2012-09-25 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13462855#comment-13462855
 ] 

Lars Hofhansl commented on HDFS-744:


You are right Luke. I implemented this in the context of hadoop-2 (i.e. with 
HDFS-265).
It seems that to get this right, HDFS-265 needs to be revisited again.

I'll look at your suggestion (doing the sync in the data thread). As long as the 
syncs (or flushes) are not serialized it's fine (otherwise nobody is going to 
switch this on).


 Support hsync in HDFS
 -

 Key: HDFS-744
 URL: https://issues.apache.org/jira/browse/HDFS-744
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: data-node, hdfs client
Reporter: Hairong Kuang
Assignee: Lars Hofhansl
 Fix For: 2.0.2-alpha

 Attachments: HDFS-744-2.0-v1.patch, HDFS-744-2.0-v2.patch, 
 HDFS-744-trunk.patch, HDFS-744-trunk-v2.patch, HDFS-744-trunk-v3.patch, 
 HDFS-744-trunk-v4.patch, HDFS-744-trunk-v5.patch, HDFS-744-trunk-v6.patch, 
 HDFS-744-trunk-v7.patch, HDFS-744-trunk-v8.patch, hdfs-744.txt, 
 hdfs-744-v2.txt, hdfs-744-v3.txt


 HDFS-731 implements hsync by default as hflush. As described in HADOOP-6313, 
 the real expected semantics should be flushes out to all replicas and all 
 replicas have done posix fsync equivalent - ie the OS has flushed it to the 
 disk device (but the disk may have it in its cache). This jira aims to 
 implement the expected behaviour.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-744) Support hsync in HDFS

2012-09-25 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13462869#comment-13462869
 ] 

Lars Hofhansl commented on HDFS-744:


In any case, hsync and hflush should be fixed together. A one-off for hsync 
does not seem to be the right thing.

 Support hsync in HDFS
 -

 Key: HDFS-744
 URL: https://issues.apache.org/jira/browse/HDFS-744
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: data-node, hdfs client
Reporter: Hairong Kuang
Assignee: Lars Hofhansl
 Fix For: 2.0.2-alpha

 Attachments: HDFS-744-2.0-v1.patch, HDFS-744-2.0-v2.patch, 
 HDFS-744-trunk.patch, HDFS-744-trunk-v2.patch, HDFS-744-trunk-v3.patch, 
 HDFS-744-trunk-v4.patch, HDFS-744-trunk-v5.patch, HDFS-744-trunk-v6.patch, 
 HDFS-744-trunk-v7.patch, HDFS-744-trunk-v8.patch, hdfs-744.txt, 
 hdfs-744-v2.txt, hdfs-744-v3.txt


 HDFS-731 implements hsync by default as hflush. As described in HADOOP-6313, 
 the real expected semantics should be flushes out to all replicas and all 
 replicas have done posix fsync equivalent - ie the OS has flushed it to the 
 disk device (but the disk may have it in its cache). This jira aims to 
 implement the expected behaviour.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-744) Support hsync in HDFS

2012-09-25 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13463505#comment-13463505
 ] 

Lars Hofhansl commented on HDFS-744:


I think the problem is that we have to enqueue the seqno before the packet is 
sent downstream, right? (Otherwise we could potentially miss the ack, right?)

So in order to enqueue the seqno after we syncOrFlush, we'd also have to send 
the packet downstream after we syncOrFlush, which essentially means that we are 
serializing the sync times across all replicas.


 Support hsync in HDFS
 -

 Key: HDFS-744
 URL: https://issues.apache.org/jira/browse/HDFS-744
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: data-node, hdfs client
Reporter: Hairong Kuang
Assignee: Lars Hofhansl
 Fix For: 2.0.2-alpha

 Attachments: HDFS-744-2.0-v1.patch, HDFS-744-2.0-v2.patch, 
 HDFS-744-trunk.patch, HDFS-744-trunk-v2.patch, HDFS-744-trunk-v3.patch, 
 HDFS-744-trunk-v4.patch, HDFS-744-trunk-v5.patch, HDFS-744-trunk-v6.patch, 
 HDFS-744-trunk-v7.patch, HDFS-744-trunk-v8.patch, hdfs-744.txt, 
 hdfs-744-v2.txt, hdfs-744-v3.txt


 HDFS-731 implements hsync by default as hflush. As described in HADOOP-6313, 
 the real expected semantics should be flushes out to all replicas and all 
 replicas have done posix fsync equivalent - ie the OS has flushed it to the 
 disk device (but the disk may have it in its cache). This jira aims to 
 implement the expected behaviour.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-744) Support hsync in HDFS

2012-09-25 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13463512#comment-13463512
 ] 

Lars Hofhansl commented on HDFS-744:


Another approach would be to wait in the responder until both the downstream 
datanode has responded *and* the sync has finished. That way we get correctness 
and we can still interleave the sync'ing/RTT in the pipeline.
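
As a sketch, the responder could gate the upstream ack on two independent events 
(illustrative names only; this is not the real PacketResponder code):
{noformat}
import java.util.concurrent.CountDownLatch;

class AckGate {
  private final CountDownLatch downstreamAcked = new CountDownLatch(1);
  private final CountDownLatch locallySynced = new CountDownLatch(1);

  void onDownstreamAck() { downstreamAcked.countDown(); }

  void onLocalSyncDone() { locallySynced.countDown(); }

  // Called by the responder thread before the ack is sent upstream.
  void awaitBoth() throws InterruptedException {
    downstreamAcked.await();
    locallySynced.await();
  }
}
{noformat}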

 Support hsync in HDFS
 -

 Key: HDFS-744
 URL: https://issues.apache.org/jira/browse/HDFS-744
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: data-node, hdfs client
Reporter: Hairong Kuang
Assignee: Lars Hofhansl
 Fix For: 2.0.2-alpha

 Attachments: HDFS-744-2.0-v1.patch, HDFS-744-2.0-v2.patch, 
 HDFS-744-trunk.patch, HDFS-744-trunk-v2.patch, HDFS-744-trunk-v3.patch, 
 HDFS-744-trunk-v4.patch, HDFS-744-trunk-v5.patch, HDFS-744-trunk-v6.patch, 
 HDFS-744-trunk-v7.patch, HDFS-744-trunk-v8.patch, hdfs-744.txt, 
 hdfs-744-v2.txt, hdfs-744-v3.txt


 HDFS-731 implements hsync by default as hflush. As described in HADOOP-6313, 
 the real expected semantics should be flushes out to all replicas and all 
 replicas have done posix fsync equivalent - ie the OS has flushed it to the 
 disk device (but the disk may have it in its cache). This jira aims to 
 implement the expected behaviour.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-3979) Fix hsync and hflush semantics.

2012-09-25 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HDFS-3979:
---

 Summary: Fix hsync and hflush semantics.
 Key: HDFS-3979
 URL: https://issues.apache.org/jira/browse/HDFS-3979
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Lars Hofhansl


See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver is 
not on a synchronous path from the DFSClient, hence it is possible that DN 
loses data that is has already acknowledged as persisted to a client.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3979) Fix hsync and hflush semantics.

2012-09-25 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13463515#comment-13463515
 ] 

Lars Hofhansl commented on HDFS-3979:
-

Also see my comment here: 
https://issues.apache.org/jira/browse/HDFS-744?focusedCommentId=13279619page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13279619

 Fix hsync and hflush semantics.
 ---

 Key: HDFS-3979
 URL: https://issues.apache.org/jira/browse/HDFS-3979
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Lars Hofhansl

 See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver 
 is not on a synchronous path from the DFSClient, hence it is possible that DN 
 loses data that is has already acknowledged as persisted to a client.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HDFS-3979) Fix hsync and hflush semantics.

2012-09-25 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl reassigned HDFS-3979:
---

Assignee: Lars Hofhansl

 Fix hsync and hflush semantics.
 ---

 Key: HDFS-3979
 URL: https://issues.apache.org/jira/browse/HDFS-3979
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl

 See discussion in HDFS-744. The actual sync/flush operation in BlockReceiver 
 is not on a synchronous path from the DFSClient, hence it is possible that DN 
 loses data that is has already acknowledged as persisted to a client.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3721) hsync support broke wire compatibility

2012-07-26 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13423180#comment-13423180
 ] 

Lars Hofhansl commented on HDFS-3721:
-

Saw Todd's comment on HDFS-744 (moving that change to 2.0.x is not an option).
I had assumed that HDFS-744 would be in 2.0.x (in which case there would have 
been no compatibility issues).

Had a quick look through the patch here. Looks good as far as I can tell; I'll 
take a more detailed look later today.


 hsync support broke wire compatibility
 --

 Key: HDFS-3721
 URL: https://issues.apache.org/jira/browse/HDFS-3721
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, hdfs client
Affects Versions: 2.1.0-alpha
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Critical
 Attachments: hdfs-3721.txt


 HDFS-744 added support for hsync to the data transfer wire protocol. However, 
 it actually broke wire compatibility: if the client has hsync support but the 
 server does not, the client cannot read or write data on the old cluster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3721) hsync support broke wire compatibility

2012-07-25 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13422835#comment-13422835
 ] 

Lars Hofhansl commented on HDFS-3721:
-

What sort of wire-compatibility are we talking about? Both trunk and 2.0 have 
the hsync code. What sort of old cluster would not have this? Does the 2.x.x 
client support communicating with a 1.x.x cluster?

Apologies for introducing this with my patch in HDFS-744; I had assumed 
protobuf would take care of it.


 hsync support broke wire compatibility
 --

 Key: HDFS-3721
 URL: https://issues.apache.org/jira/browse/HDFS-3721
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, hdfs client
Affects Versions: 2.1.0-alpha
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Critical
 Attachments: hdfs-3721.txt


 HDFS-744 added support for hsync to the data transfer wire protocol. However, 
 it actually broke wire compatibility: if the client has hsync support but the 
 server does not, the client cannot read or write data on the old cluster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3721) hsync support broke wire compatibility

2012-07-25 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13422840#comment-13422840
 ] 

Lars Hofhansl commented on HDFS-3721:
-

Oh this is a 2.1.x vs 2.0.x issue...?
The HDFS-744 patch is smaller than this patch, could we port that to 2.0.x?


 hsync support broke wire compatibility
 --

 Key: HDFS-3721
 URL: https://issues.apache.org/jira/browse/HDFS-3721
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, hdfs client
Affects Versions: 2.1.0-alpha
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Critical
 Attachments: hdfs-3721.txt


 HDFS-744 added support for hsync to the data transfer wire protocol. However, 
 it actually broke wire compatibility: if the client has hsync support but the 
 server does not, the client cannot read or write data on the old cluster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-744) Support hsync in HDFS

2012-07-25 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13422842#comment-13422842
 ] 

Lars Hofhansl commented on HDFS-744:


Just noticed (because of HDFS-3721) that this is now in 2.1.0-alpha.
Any chance to get this into 2.0.x-alpha? (In fact, on June 4th it was marked 
with 2.0.1-alpha, but something changed since then, so now it's 2.1.0-alpha.)


 Support hsync in HDFS
 -

 Key: HDFS-744
 URL: https://issues.apache.org/jira/browse/HDFS-744
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: data-node, hdfs client
Reporter: Hairong Kuang
Assignee: Lars Hofhansl
 Fix For: 2.1.0-alpha

 Attachments: HDFS-744-2.0-v1.patch, HDFS-744-2.0-v2.patch, 
 HDFS-744-trunk-v2.patch, HDFS-744-trunk-v3.patch, HDFS-744-trunk-v4.patch, 
 HDFS-744-trunk-v5.patch, HDFS-744-trunk-v6.patch, HDFS-744-trunk-v7.patch, 
 HDFS-744-trunk-v8.patch, HDFS-744-trunk.patch, hdfs-744-v2.txt, 
 hdfs-744-v3.txt, hdfs-744.txt


 HDFS-731 implements hsync by default as hflush. As described in HADOOP-6313, 
 the real expected semantics should be flushes out to all replicas and all 
 replicas have done posix fsync equivalent - ie the OS has flushed it to the 
 disk device (but the disk may have it in its cache). This jira aims to 
 implement the expected behaviour.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3580) incompatible types; no instance(s) of type variable(s) V exist so that V conforms to boolean compiling HttpFSServer.java with OpenJDK

2012-06-29 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13404276#comment-13404276
 ] 

Lars Hofhansl commented on HDFS-3580:
-

You beat me to it Andy. :)

+1 on the patch; an identical patch compiled fine on my home machine with OpenJDK 
(I don't have access to it right now).

I am not sure whether the version with the primitive types is valid or not, but 
the version with reference types is definitely valid and should work with all 
JDKs.
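
For readers who have not seen this class of error, a hypothetical illustration 
(not the actual HttpFSServer.java code) of a generic getter whose type variable 
some compilers refuse to infer against a primitive target type, while the boxed 
reference type works everywhere:
{noformat}
import java.util.HashMap;
import java.util.Map;

class ParamsExample {
  private final Map<String, Object> values = new HashMap<String, Object>();

  @SuppressWarnings("unchecked")
  <V> V get(String name) {
    return (V) values.get(name);
  }

  void use() {
    // Rejected by some javac versions: no V exists that "conforms to boolean".
    // boolean flag = get("override");

    // Portable form: target the reference type Boolean instead.
    Boolean flag = this.<Boolean>get("override");
    System.out.println(flag);
  }
}
{noformat}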


 incompatible types; no instance(s) of type variable(s) V exist so that V 
 conforms to boolean compiling HttpFSServer.java with OpenJDK
 -

 Key: HDFS-3580
 URL: https://issues.apache.org/jira/browse/HDFS-3580
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.1-alpha
Reporter: Andy Isaacson
Assignee: Andy Isaacson
Priority: Minor
 Attachments: hdfs-3580.txt


 {quote}
 [ERROR] 
 /home/lars/dev/hadoop-2/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/server/HttpFSServer.java:[407,36]
  incompatible types; no instance(s) of type variable(s) V exist so that V 
 conforms to boolean
 {quote}
 {quote}
 $ javac -version
 javac 1.6.0_24
 $ java -version
 java version 1.6.0_24
 OpenJDK Runtime Environment (IcedTea6 1.11.3) (fedora-67.1.11.3.fc16-x86_64)
 OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)
 {quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1783) Ability for HDFS client to write replicas in parallel

2012-06-24 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13400196#comment-13400196
 ] 

Lars Hofhansl commented on HDFS-1783:
-

One more point to consider:
For us (Salesforce) this is mostly interesting for HBase.

A typical HBase cluster has the DataNodes co-located with the HBase 
RegionServers.
So assuming good load distribution within HBase, the bandwidth would still be 
amortized across the cluster, but with lower latency for each single 
RegionServer (the HDFS client in this case). Overall, the same number of bits 
is sent through the cluster as a whole.

This would only be enabled for the WAL. Other write load (like compactions) 
would still use pipelining.

Andy did some cool testing on EC2 over in HBASE-6116.

We'll be doing some basic testing in a real, dedicated cluster this week.


 Ability for HDFS client to write replicas in parallel
 -

 Key: HDFS-1783
 URL: https://issues.apache.org/jira/browse/HDFS-1783
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client
Reporter: dhruba borthakur
Assignee: Lars Hofhansl
 Attachments: HDFS-1783-trunk-v2.patch, HDFS-1783-trunk-v3.patch, 
 HDFS-1783-trunk-v4.patch, HDFS-1783-trunk-v5.patch, HDFS-1783-trunk.patch


 The current implementation of HDFS pipelines the writes to the three 
 replicas. This introduces some latency for realtime latency sensitive 
 applications. An alternate implementation that allows the client to write all 
 replicas in parallel gives much better response times to these applications. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1783) Ability for HDFS client to write replicas in parallel

2012-06-21 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13398256#comment-13398256
 ] 

Lars Hofhansl commented on HDFS-1783:
-

Thanks Ted. Yes, I got that wrong in the first version of the patch (Dhruba had 
it right on Github; it was just me). I found that when I added the various tests.

I'll be back in the US soon, and will finish the HBase patch and get some 
performance testing done on a real cluster.

 Ability for HDFS client to write replicas in parallel
 -

 Key: HDFS-1783
 URL: https://issues.apache.org/jira/browse/HDFS-1783
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client
Reporter: dhruba borthakur
Assignee: Lars Hofhansl
 Attachments: HDFS-1783-trunk-v2.patch, HDFS-1783-trunk-v3.patch, 
 HDFS-1783-trunk-v4.patch, HDFS-1783-trunk-v5.patch, HDFS-1783-trunk.patch


 The current implementation of HDFS pipelines the writes to the three 
 replicas. This introduces some latency for realtime latency sensitive 
 applications. An alternate implementation that allows the client to write all 
 replicas in parallel gives much better response times to these applications. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1783) Ability for HDFS client to write replicas in parallel

2012-06-21 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13398767#comment-13398767
 ] 

Lars Hofhansl commented on HDFS-1783:
-

Yep. That is exactly the point. HDFS does pipelining to improve throughput at 
the expense of latency. This patch allows a client to favor latency.

If the client operates at the NIC's throughput limit, enabling parallel writes 
will make things worse.

This patch could be extended in the future to mix direct connections with 
pipelining. For example, a client could set up a 1-hop (direct) pipeline and a 
2-hop pipeline for a replication factor of 3, or two 2-hop pipelines for a 
replication factor of 4, etc.

We'll be testing this with HBase workloads. Using traffic shaping is 
interesting.
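
A simplified, self-contained model of that trade-off (this is not the HDFS-1783 
patch itself; the replica streams are plain OutputStreams purely for 
illustration): the client pushes the same packet to every replica in parallel 
and waits for all of them, i.e. N pipelines of length 1.
{noformat}
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelReplicaWriter {
  private final List<OutputStream> replicas;
  private final ExecutorService pool;

  ParallelReplicaWriter(List<OutputStream> replicas) {
    this.replicas = replicas;
    this.pool = Executors.newFixedThreadPool(replicas.size());
  }

  // Write one packet to all replicas concurrently and wait for every ack.
  void writePacket(final byte[] packet) throws Exception {
    List<Future<Void>> acks = new ArrayList<Future<Void>>();
    for (final OutputStream replica : replicas) {
      acks.add(pool.submit(new Callable<Void>() {
        public Void call() throws Exception {
          replica.write(packet);  // each replica is its own 1-hop pipeline
          replica.flush();
          return null;
        }
      }));
    }
    for (Future<Void> ack : acks) {
      ack.get();                  // latency is the slowest replica, not the sum of hops
    }
  }
}
{noformat}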


 Ability for HDFS client to write replicas in parallel
 -

 Key: HDFS-1783
 URL: https://issues.apache.org/jira/browse/HDFS-1783
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client
Reporter: dhruba borthakur
Assignee: Lars Hofhansl
 Attachments: HDFS-1783-trunk-v2.patch, HDFS-1783-trunk-v3.patch, 
 HDFS-1783-trunk-v4.patch, HDFS-1783-trunk-v5.patch, HDFS-1783-trunk.patch


 The current implementation of HDFS pipelines the writes to the three 
 replicas. This introduces some latency for realtime latency sensitive 
 applications. An alternate implementation that allows the client to write all 
 replicas in parallel gives much better response times to these applications. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1783) Ability for HDFS client to write replicas in parallel

2012-06-21 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13398785#comment-13398785
 ] 

Lars Hofhansl commented on HDFS-1783:
-

Yes, that would be a good optimization.
I would propose starting with something simple, though (such as the current 
patch), seeing how that behaves with HBase, and building confidence that it does 
not break things.
It's optional, and the patch (IMHO) is low risk (only DFSOutputStream is 
changed; the rest is tests). Then we can think about further optimizations.


 Ability for HDFS client to write replicas in parallel
 -

 Key: HDFS-1783
 URL: https://issues.apache.org/jira/browse/HDFS-1783
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client
Reporter: dhruba borthakur
Assignee: Lars Hofhansl
 Attachments: HDFS-1783-trunk-v2.patch, HDFS-1783-trunk-v3.patch, 
 HDFS-1783-trunk-v4.patch, HDFS-1783-trunk-v5.patch, HDFS-1783-trunk.patch


 The current implementation of HDFS pipelines the writes to the three 
 replicas. This introduces some latency for realtime latency sensitive 
 applications. An alternate implementation that allows the client to write all 
 replicas in parallel gives much better response times to these applications. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-06-19 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13396592#comment-13396592
 ] 

Lars Hofhansl commented on HDFS-3370:
-

Hardlinks would be used for temporary snapshotting (not to hold the backup 
itself).

Anyway... Since there's strong opposition to this, at Salesforce we'll either 
come up with something else, maintain local HDFS patches, or use a different 
file system.


 HDFS hardlink
 -

 Key: HDFS-3370
 URL: https://issues.apache.org/jira/browse/HDFS-3370
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Hairong Kuang
Assignee: Liyin Tang
 Attachments: HDFS-HardLink.pdf


 We'd like to add a new feature hardlink to HDFS that allows hardlinked files 
 to share data without copying. Currently we will support hardlinking only 
 closed files, but it could be extended to unclosed files as well.
 Among many potential use cases of the feature, the following two are 
 primarily used in facebook:
 1. This provides a lightweight way for applications like hbase to create a 
 snapshot;
 2. This also allows an application like Hive to move a table to a different 
 directory without breaking current running hive queries.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-06-14 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13294934#comment-13294934
 ] 

Lars Hofhansl commented on HDFS-3370:
-

This is a good discussion. 

Couple of points:
bq. Or provide use cases which cannot be solved without it.
This seems to be the key question: What services should a file system provide?
The same argument could be made for symbolic links. The application could 
implement those (in fact it's quite simple).

bq. but they are very hard to support when the namespace is distributed
But isn't that an implementation detail, which should not inform the feature 
set? 
Hardlinks could be only supported per distinct namespace (namespace in 
federated HDFS or a volume in MapR - I think). This is not unlike Unix where 
hardlinks are per distinct filesystem (i.e. not across mount points).

@M.C. Srivas:
If you create 15 backups without hardlinks you get 15 times the metadata *and* 
15 times the data... Unless you assume some other feature such as snapshots 
with copy-on-write or backup-on-write semantics. (Maybe I did not get the 
argument)

Immutable files are a very common and useful design pattern (not just for 
HBase), and while not strictly needed, hardlinks are very useful together with 
immutable files.

Just my $0.02.


 HDFS hardlink
 -

 Key: HDFS-3370
 URL: https://issues.apache.org/jira/browse/HDFS-3370
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Hairong Kuang
Assignee: Liyin Tang
 Attachments: HDFS-HardLink.pdf


 We'd like to add a new feature hardlink to HDFS that allows hardlinked files 
 to share data without copying. Currently we will support hardlinking only 
 closed files, but it could be extended to unclosed files as well.
 Among many potential use cases of the feature, the following two are 
 primarily used in facebook:
 1. This provides a lightweight way for applications like hbase to create a 
 snapshot;
 2. This also allows an application like Hive to move a table to a different 
 directory without breaking current running hive queries.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1783) Ability for HDFS client to write replicas in parallel

2012-06-14 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13294951#comment-13294951
 ] 

Lars Hofhansl commented on HDFS-1783:
-

@Ted: The first method is overridden in DistributedFileSystem (to avoid having 
to change method signatures in each subclass for FileSystem).

PrimitiveCreate is called from FileContext. There seem to be some general 
inconsistencies in FileSystem. For example, calling FileSystem.create(..., 
APPEND, ...) will not append; FileContext.create(..., APPEND, ...), on the other 
hand, does the right thing.

This patch does not affect that. The patch will naturally work with 
FileContext.create(..., APPEND, ...). I'll add a few more tests for this.
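
For reference, a hedged sketch of the FileContext path mentioned above (the file 
name is made up and this is not test code from the patch):
{noformat}
import java.util.EnumSet;

import org.apache.hadoop.fs.CreateFlag;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Path;

public class AppendFlagExample {
  public static void main(String[] args) throws Exception {
    FileContext fc = FileContext.getFileContext();
    Path p = new Path("/tmp/append-example.txt");
    // CREATE | APPEND: create the file if missing, otherwise append to it.
    FSDataOutputStream out =
        fc.create(p, EnumSet.of(CreateFlag.CREATE, CreateFlag.APPEND));
    try {
      out.writeBytes("one more line\n");
      out.hflush();   // flush to all replicas' OS buffers
    } finally {
      out.close();
    }
  }
}
{noformat}
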
When I'm back in the US, I'll get some performance numbers (judging from my 
micro benchmarks, I'd expect some nice improvements as long as the client's 
network-link is not saturated).


 Ability for HDFS client to write replicas in parallel
 -

 Key: HDFS-1783
 URL: https://issues.apache.org/jira/browse/HDFS-1783
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client
Reporter: dhruba borthakur
Assignee: Lars Hofhansl
 Attachments: HDFS-1783-trunk-v2.patch, HDFS-1783-trunk-v3.patch, 
 HDFS-1783-trunk-v4.patch, HDFS-1783-trunk-v5.patch, HDFS-1783-trunk.patch


 The current implementation of HDFS pipelines the writes to the three 
 replicas. This introduces some latency for realtime latency sensitive 
 applications. An alternate implementation that allows the client to write all 
 replicas in parallel gives much better response times to these applications. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-06-12 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293480#comment-13293480
 ] 

Lars Hofhansl commented on HDFS-3370:
-

Thanks Liyin. Sounds good.

One thought that has occurred to me since: we need to think about copy semantics. 
For example, how will distcp handle this? It shouldn't create a new copy of a 
file for each hardlink that points to it, but rather copy it at most once 
and create hardlinks for each following reference. But then what about multiple 
distcp commands that happen to cover hardlinks to the same file? I suppose in 
that case we cannot be expected to avoid multiple copies of the same file (but 
at most one copy for each invocation of distcp, and only if the distcp happens 
to cover a different hardlink).


 HDFS hardlink
 -

 Key: HDFS-3370
 URL: https://issues.apache.org/jira/browse/HDFS-3370
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Hairong Kuang
Assignee: Liyin Tang
 Attachments: HDFS-HardLink.pdf


 We'd like to add a new feature hardlink to HDFS that allows hardlinked files 
 to share data without copying. Currently we will support hardlinking only 
 closed files, but it could be extended to unclosed files as well.
 Among many potential use cases of the feature, the following two are 
 primarily used in facebook:
 1. This provides a lightweight way for applications like hbase to create a 
 snapshot;
 2. This also allows an application like Hive to move a table to a different 
 directory without breaking current running hive queries.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-1783) Ability for HDFS client to write replicas in parallel

2012-06-09 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HDFS-1783:


Attachment: HDFS-1783-trunk-v5.patch

Also adds a subclass of TestPipelinesFailover running all tests with 
PARALLEL_WRITES.

FileSystem.append itself does not support parallel writes (as of this patch).

I am generally not quite clear what the difference between FileSystem.append 
and FileSystem.create(..., CreateFlag.APPEND, ...) is supposed to be.


 Ability for HDFS client to write replicas in parallel
 -

 Key: HDFS-1783
 URL: https://issues.apache.org/jira/browse/HDFS-1783
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client
Reporter: dhruba borthakur
Assignee: Lars Hofhansl
 Attachments: HDFS-1783-trunk-v2.patch, HDFS-1783-trunk-v3.patch, 
 HDFS-1783-trunk-v4.patch, HDFS-1783-trunk-v5.patch, HDFS-1783-trunk.patch


 The current implementation of HDFS pipelines the writes to the three 
 replicas. This introduces some latency for realtime latency sensitive 
 applications. An alternate implementation that allows the client to write all 
 replicas in parallel gives much better response times to these applications. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1783) Ability for HDFS client to write replicas in parallel

2012-06-08 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291612#comment-13291612
 ] 

Lars Hofhansl commented on HDFS-1783:
-

Thanks Todd. I see your point. I'm still overseas until the end of the month 
with no physical access to a cluster. Ram and Andy said that they might get a 
chance to do some performance tests before then. (It's hard to beat the 
pipelining on throughput, so I only expect latency to be improved.)

As for the complexity, I find it manageable... The pipelining as such has not 
changed, only that the client opens up N pipelines of length 1.
Once this change is in, one could get fancier (for example, 2 pipelines of 
length 2 for 4 replicas, or maybe we could open pipelines to multiple 
clusters, etc.).


 Ability for HDFS client to write replicas in parallel
 -

 Key: HDFS-1783
 URL: https://issues.apache.org/jira/browse/HDFS-1783
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client
Reporter: dhruba borthakur
Assignee: Lars Hofhansl
 Attachments: HDFS-1783-trunk-v2.patch, HDFS-1783-trunk-v3.patch, 
 HDFS-1783-trunk-v4.patch, HDFS-1783-trunk.patch


 The current implementation of HDFS pipelines the writes to the three 
 replicas. This introduces some latency for realtime latency sensitive 
 applications. An alternate implementation that allows the client to write all 
 replicas in parallel gives much better response times to these applications. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-06-08 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291614#comment-13291614
 ] 

Lars Hofhansl commented on HDFS-3370:
-

Do you have a preliminary patch to look at?


 HDFS hardlink
 -

 Key: HDFS-3370
 URL: https://issues.apache.org/jira/browse/HDFS-3370
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Hairong Kuang
Assignee: Liyin Tang
 Attachments: HDFS-HardLink.pdf


 We'd like to add a new feature hardlink to HDFS that allows hardlinked files 
 to share data without copying. Currently we will support hardlinking only 
 closed files, but it could be extended to unclosed files as well.
 Among many potential use cases of the feature, the following two are 
 primarily used in facebook:
 1. This provides a lightweight way for applications like hbase to create a 
 snapshot;
 2. This also allows an application like Hive to move a table to a different 
 directory without breaking current running hive queries.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1783) Ability for HDFS client to write replicas in parallel

2012-06-08 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291641#comment-13291641
 ] 

Lars Hofhansl commented on HDFS-1783:
-

I did a simple local micro-benchmark:

Started a mini cluster with 3 datanodes.
Wrote 1 byte 100,000 times, each write followed by an hflush (so 100,000 packets).

With parallel writes it took ~25s, without ~30s (this was repeatable).

I also tried 10- and 100-byte packets. For 10 bytes I got the same results.
For 100 bytes it took ~29s with parallel writes and ~37s without.

Since this was all on a single machine, I am not entirely sure how this would 
translate to a real cluster with real network latency.

The latency I measured for my lo device is 0.05ms... I would expect the 
impact of this change to be more pronounced in a real cluster, where latency is 
on the order of a few ms. There should also be a definite gain once 
hsync (after HDFS-744) is enabled (but that I cannot test on a single machine 
with a single spindle).
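
For reference, a rough sketch of the benchmark loop described above (assuming the 
hadoop-hdfs test artifact for MiniDFSCluster). Whether the writes take the parallel 
or the pipelined path depends on the patch's configuration switch, which is not 
shown here.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class HflushLatencyBenchmark {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).numDataNodes(3).build();
    try {
      FileSystem fs = cluster.getFileSystem();
      FSDataOutputStream out = fs.create(new Path("/bench"), (short) 3);
      byte[] payload = new byte[1]; // 1-byte writes, as in the test above
      long start = System.nanoTime();
      for (int i = 0; i < 100000; i++) {
        out.write(payload);
        out.hflush(); // forces one packet (and one ack round trip) per iteration
      }
      out.close();
      System.out.printf("took %d ms%n", (System.nanoTime() - start) / 1000000L);
    } finally {
      cluster.shutdown();
    }
  }
}
{code}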


 Ability for HDFS client to write replicas in parallel
 -

 Key: HDFS-1783
 URL: https://issues.apache.org/jira/browse/HDFS-1783
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client
Reporter: dhruba borthakur
Assignee: Lars Hofhansl
 Attachments: HDFS-1783-trunk-v2.patch, HDFS-1783-trunk-v3.patch, 
 HDFS-1783-trunk-v4.patch, HDFS-1783-trunk.patch


 The current implementation of HDFS pipelines the writes to the three 
 replicas. This introduces some latency for realtime latency sensitive 
 applications. An alternate implementation that allows the client to write all 
 replicas in parallel gives much better response times to these applications. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1783) Ability for HDFS client to write replicas in parallel

2012-06-08 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291647#comment-13291647
 ] 

Lars Hofhansl commented on HDFS-1783:
-

One more test:
I introduced an artificial 1ms sleep at the beginning of 
BlockReceiver.receivePacket.
Then I ran the same test as above with 10,000 loops.
With the patch it takes ~19s, without the patch ~44s.
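
To see why the gap widens so much, here is a toy model (not HDFS code) of a 
per-packet, per-node 1ms delay: with a serial pipeline the ack waits for every 
replica in turn, so the injected delay adds up roughly replica-count times per 
packet, whereas with parallel streams the delays overlap. The packet count is 
scaled down so the simulation finishes quickly.

{code:java}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PipelineVsParallelSim {
  static final int PACKETS = 1000;  // scaled down from the 10,000-loop test
  static final int REPLICAS = 3;

  static void handlePacket() {
    try { Thread.sleep(1); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
  }

  public static void main(String[] args) throws Exception {
    // Serial pipeline: each packet's ack waits for all replicas one after another.
    long t0 = System.nanoTime();
    for (int p = 0; p < PACKETS; p++) {
      for (int r = 0; r < REPLICAS; r++) handlePacket();
    }
    System.out.printf("pipelined: %d ms%n", (System.nanoTime() - t0) / 1000000L);

    // Parallel: all replicas handle the packet concurrently; the client waits for all acks.
    ExecutorService pool = Executors.newFixedThreadPool(REPLICAS);
    long t1 = System.nanoTime();
    for (int p = 0; p < PACKETS; p++) {
      final CountDownLatch acks = new CountDownLatch(REPLICAS);
      for (int r = 0; r < REPLICAS; r++) {
        pool.execute(new Runnable() {
          public void run() { handlePacket(); acks.countDown(); }
        });
      }
      acks.await();
    }
    System.out.printf("parallel:  %d ms%n", (System.nanoTime() - t1) / 1000000L);
    pool.shutdown();
  }
}
{code}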


 Ability for HDFS client to write replicas in parallel
 -

 Key: HDFS-1783
 URL: https://issues.apache.org/jira/browse/HDFS-1783
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client
Reporter: dhruba borthakur
Assignee: Lars Hofhansl
 Attachments: HDFS-1783-trunk-v2.patch, HDFS-1783-trunk-v3.patch, 
 HDFS-1783-trunk-v4.patch, HDFS-1783-trunk.patch


 The current implementation of HDFS pipelines the writes to the three 
 replicas. This introduces some latency for realtime latency sensitive 
 applications. An alternate implementation that allows the client to write all 
 replicas in parallel gives much better response times to these applications. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1783) Ability for HDFS client to write replicas in parallel

2012-06-07 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13290920#comment-13290920
 ] 

Lars Hofhansl commented on HDFS-1783:
-

Is there general interest in this?


 Ability for HDFS client to write replicas in parallel
 -

 Key: HDFS-1783
 URL: https://issues.apache.org/jira/browse/HDFS-1783
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client
Reporter: dhruba borthakur
Assignee: Lars Hofhansl
 Attachments: HDFS-1783-trunk-v2.patch, HDFS-1783-trunk-v3.patch, 
 HDFS-1783-trunk-v4.patch, HDFS-1783-trunk.patch


 The current implementation of HDFS pipelines the writes to the three 
 replicas. This introduces some latency for realtime latency sensitive 
 applications. An alternate implementation that allows the client to write all 
 replicas in parallel gives much better response times to these applications. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-06-07 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13290922#comment-13290922
 ] 

Lars Hofhansl commented on HDFS-3370:
-

Is anybody working on a patch for this?
If not, I would not mind picking this up (although I can't promise getting to 
this before the end of the month).


 HDFS hardlink
 -

 Key: HDFS-3370
 URL: https://issues.apache.org/jira/browse/HDFS-3370
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Hairong Kuang
Assignee: Liyin Tang
 Attachments: HDFS-HardLink.pdf


 We'd like to add a new hardlink feature to HDFS that allows hardlinked files 
 to share data without copying. Initially we will support hardlinking only of 
 closed files, but it could be extended to unclosed files as well.
 Among the many potential use cases of the feature, the following two are 
 primarily used at Facebook:
 1. It provides a lightweight way for applications like HBase to create a 
 snapshot;
 2. It also allows an application like Hive to move a table to a different 
 directory without breaking currently running Hive queries.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-1783) Ability for HDFS client to write replicas in parallel

2012-06-05 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HDFS-1783:


Attachment: HDFS-1783-trunk-v4.patch

* Addressed Stack's suggestions (thanks, Stack):
* Added subclasses of TestReplication and TestDatanodeDeath that run all tests 
with parallel writes enabled
* Found and fixed a problem with error handling (surfaced by 
TestDatanodeDeathWithParallelWrites)
* Renamed s[] to sockets[]


 Ability for HDFS client to write replicas in parallel
 -

 Key: HDFS-1783
 URL: https://issues.apache.org/jira/browse/HDFS-1783
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client
Reporter: dhruba borthakur
Assignee: Lars Hofhansl
 Attachments: HDFS-1783-trunk-v2.patch, HDFS-1783-trunk-v3.patch, 
 HDFS-1783-trunk-v4.patch, HDFS-1783-trunk.patch


 The current implementation of HDFS pipelines the writes to the three 
 replicas. This introduces some latency for realtime latency sensitive 
 applications. An alternate implementation that allows the client to write all 
 replicas in parallel gives much better response times to these applications. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-06-05 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289689#comment-13289689
 ] 

Lars Hofhansl commented on HDFS-3370:
-

Reading through the design doc, it seems that 
FileSystem.{setPermission|setOwner} would be awkward: we'd have to find each 
INodeHardLinkFile pointing to the same file and then change all their 
permissions/owners.

HardLinkFileInfo could also maintain permissions and owners (since they, 
following POSIX, are the same for each hard link). That way, changing the owner or 
permissions would immediately affect all hard links.
When the fsimage is saved, each INodeHardLinkFile would still write its own 
permission and owner (for simplicity; this could be optimized, as long as 
at least one INode writes the permissions/owner).
Upon read, every INode representing a hardlink must have the same permission/owner as 
all other INodes linking to the same file; if not, the image is inconsistent.

In that case HardLinkFileInfo would not need to maintain a list of pointers 
back to all INodeHardLinkFiles, and owner/permissions would only be stored once 
in memory.
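
A minimal sketch of that alternative, with the owner/permission kept once in the 
shared record; the class names follow the design doc's terminology (HardLinkFileInfo, 
INodeHardLinkFile), but the code itself is purely illustrative and not the actual 
NameNode INode hierarchy.

{code:java}
// Illustrative only; not the real NameNode classes.
class HardLinkFileInfo {
  private String owner;
  private String group;
  private short permission; // POSIX-style mode bits, shared by all hard links
  // block list, length, etc. would also live here

  synchronized void setOwner(String owner, String group) {
    this.owner = owner;
    this.group = group; // immediately visible through every hard link
  }
  synchronized void setPermission(short permission) { this.permission = permission; }
  synchronized String getOwner() { return owner; }
  synchronized short getPermission() { return permission; }
}

// Each hard link is just a named reference to the shared record, so no
// back-pointers to the other links are needed.
class INodeHardLinkFile {
  private final String name;
  private final HardLinkFileInfo shared;

  INodeHardLinkFile(String name, HardLinkFileInfo shared) {
    this.name = name;
    this.shared = shared;
  }
  void setPermission(short permission) { shared.setPermission(permission); }
  void setOwner(String owner, String group) { shared.setOwner(owner, group); }
  short getPermission() { return shared.getPermission(); }
  String getOwner() { return shared.getOwner(); }
}
{code}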


 HDFS hardlink
 -

 Key: HDFS-3370
 URL: https://issues.apache.org/jira/browse/HDFS-3370
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Hairong Kuang
Assignee: Liyin Tang
 Attachments: HDFS-HardLink.pdf


 We'd like to add a new hardlink feature to HDFS that allows hardlinked files 
 to share data without copying. Initially we will support hardlinking only of 
 closed files, but it could be extended to unclosed files as well.
 Among the many potential use cases of the feature, the following two are 
 primarily used at Facebook:
 1. It provides a lightweight way for applications like HBase to create a 
 snapshot;
 2. It also allows an application like Hive to move a table to a different 
 directory without breaking currently running Hive queries.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-744) Support hsync in HDFS

2012-06-04 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13288691#comment-13288691
 ] 

Lars Hofhansl commented on HDFS-744:


Two responses on the dev list about the port to 2.0, both in favor.
If there are no objections, should I open a new jira, or should we just do it 
here?
The patch is identical (minus a single hunk that needed a small change to 
apply).

Let's punt on 1.0; the API is too different.


 Support hsync in HDFS
 -

 Key: HDFS-744
 URL: https://issues.apache.org/jira/browse/HDFS-744
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: data-node, hdfs client
Reporter: Hairong Kuang
Assignee: Lars Hofhansl
 Fix For: 3.0.0

 Attachments: HDFS-744-2.0-v1.patch, HDFS-744-2.0-v2.patch, 
 HDFS-744-trunk-v2.patch, HDFS-744-trunk-v3.patch, HDFS-744-trunk-v4.patch, 
 HDFS-744-trunk-v5.patch, HDFS-744-trunk-v6.patch, HDFS-744-trunk-v7.patch, 
 HDFS-744-trunk-v8.patch, HDFS-744-trunk.patch, hdfs-744-v2.txt, 
 hdfs-744-v3.txt, hdfs-744.txt


 HDFS-731 implements hsync by default as hflush. As described in HADOOP-6313, 
 the real expected semantics should be: flushes out to all replicas, and all 
 replicas have done the POSIX fsync equivalent, i.e. the OS has flushed it to the 
 disk device (but the disk may still have it in its cache). This jira aims to 
 implement the expected behaviour.
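
For readers of this thread, a minimal usage sketch of the distinction, assuming the 
Syncable API exposed on FSDataOutputStream once this change is in (hflush makes data 
visible to new readers; hsync additionally asks each replica's OS to flush to the 
disk device):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HsyncExample {
  public static void main(String[] args) throws Exception {
    // Assumes an HDFS client configuration on the classpath.
    FileSystem fs = FileSystem.get(new Configuration());
    FSDataOutputStream out = fs.create(new Path("/tmp/wal"));

    out.write("record-1".getBytes("UTF-8"));
    out.hflush(); // visible to new readers; replicas may still hold it in memory

    out.write("record-2".getBytes("UTF-8"));
    out.hsync();  // additionally fsyncs on each replica's OS (disk cache excepted)

    out.close();
  }
}
{code}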

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



