[jira] [Commented] (HDFS-14498) LeaseManager can loop forever on the file for which create has failed

2020-07-13 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156958#comment-17156958
 ] 

Eric Payne commented on HDFS-14498:
---

[~hexiaoqiao], can you please provide patches for branch-3.2 and branch-3.1?

> LeaseManager can loop forever on the file for which create has failed 
> --
>
> Key: HDFS-14498
> URL: https://issues.apache.org/jira/browse/HDFS-14498
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.9.0
>Reporter: Sergey Shelukhin
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.2.2, 3.3.1, 3.4.0, 3.1.5
>
> Attachments: HDFS-14498-branch-2.10.001.patch, HDFS-14498.001.patch, 
> HDFS-14498.002.patch
>
>
> The logs from the file creation are long gone due to the infinite lease 
> logging; however, the create presumably failed... the client that was trying 
> to write this file is definitely long dead.
> The version includes HDFS-4882.
> We get this log pattern repeating infinitely:
> {noformat}
> 2019-05-16 14:00:16,893 INFO 
> [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: [Lease.  Holder: 
> DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 1] has expired hard 
> limit
> 2019-05-16 14:00:16,893 INFO 
> [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease.  
> Holder: DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 1], src=
> 2019-05-16 14:00:16,893 WARN 
> [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
> org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease: 
> Failed to release lease for file . Committed blocks are waiting to be 
> minimally replicated. Try again later.
> 2019-05-16 14:00:16,893 WARN 
> [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: Cannot release the path 
>  in the lease [Lease.  Holder: DFSClient_NONMAPREDUCE_-20898906_61, 
> pending creates: 1]. It will be retried.
> org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: DIR* 
> NameSystem.internalReleaseLease: Failed to release lease for file . 
> Committed blocks are waiting to be minimally replicated. Try again later.
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3357)
>   at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:573)
>   at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:509)
>   at java.lang.Thread.run(Thread.java:745)
> $  grep -c "Recovering.*DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 
> 1" hdfs_nn*
> hdfs_nn.log:1068035
> hdfs_nn.log.2019-05-16-14:1516179
> hdfs_nn.log.2019-05-16-15:1538350
> {noformat}
> Aside from an actual bug fix, it might make sense to make LeaseManager log 
> less, in case there are more bugs like this...






[jira] [Commented] (HDFS-14498) LeaseManager can loop forever on the file for which create has failed

2020-07-13 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156935#comment-17156935
 ] 

Eric Payne commented on HDFS-14498:
---

I see that a patch was put up for only branch-2.10, but the build is also 
broken for branch-3.2 and branch-3.1.

> LeaseManager can loop forever on the file for which create has failed 
> --
>
> Key: HDFS-14498
> URL: https://issues.apache.org/jira/browse/HDFS-14498
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.9.0
>Reporter: Sergey Shelukhin
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.2.2, 3.3.1, 3.4.0, 3.1.5
>
> Attachments: HDFS-14498-branch-2.10.001.patch, HDFS-14498.001.patch, 
> HDFS-14498.002.patch
>
>
> The logs from the file creation are long gone due to the infinite lease 
> logging; however, the create presumably failed... the client that was trying 
> to write this file is definitely long dead.
> The version includes HDFS-4882.
> We get this log pattern repeating infinitely:
> {noformat}
> 2019-05-16 14:00:16,893 INFO 
> [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: [Lease.  Holder: 
> DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 1] has expired hard 
> limit
> 2019-05-16 14:00:16,893 INFO 
> [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease.  
> Holder: DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 1], src=
> 2019-05-16 14:00:16,893 WARN 
> [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
> org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease: 
> Failed to release lease for file . Committed blocks are waiting to be 
> minimally replicated. Try again later.
> 2019-05-16 14:00:16,893 WARN 
> [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: Cannot release the path 
>  in the lease [Lease.  Holder: DFSClient_NONMAPREDUCE_-20898906_61, 
> pending creates: 1]. It will be retried.
> org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: DIR* 
> NameSystem.internalReleaseLease: Failed to release lease for file . 
> Committed blocks are waiting to be minimally replicated. Try again later.
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3357)
>   at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:573)
>   at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:509)
>   at java.lang.Thread.run(Thread.java:745)
> $  grep -c "Recovering.*DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 
> 1" hdfs_nn*
> hdfs_nn.log:1068035
> hdfs_nn.log.2019-05-16-14:1516179
> hdfs_nn.log.2019-05-16-15:1538350
> {noformat}
> Aside from an actual bug fix, it might make sense to make LeaseManager log 
> less, in case there are more bugs like this...






[jira] [Created] (HDFS-14758) Decrease lease hard limit

2019-08-20 Thread Eric Payne (Jira)
Eric Payne created HDFS-14758:
-

 Summary: Decrease lease hard limit
 Key: HDFS-14758
 URL: https://issues.apache.org/jira/browse/HDFS-14758
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Eric Payne


The hard limit is currently hard-coded to 1 hour. This also determines the 
NN's automatic lease recovery interval. Something like 20 minutes would make 
more sense.

After the 5-minute soft limit, other clients can recover the lease. If no one 
else takes the lease away, the original client can still renew the lease 
within the hard limit. So even after an NN full GC of 8 minutes, leases can 
still be valid.

However, there is one risk in reducing the hard limit, e.g. to 20 minutes: if 
the NN crashes and the manual failover takes more than 20 minutes, clients 
will abort.
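
For illustration, a 20-minute override might look like the hedged sketch 
below, assuming the hard limit becomes a configurable setting as proposed 
here (the property key is an assumption of this proposal, not an existing 
HDFS setting):
{code}
// Hedged sketch: assumes the lease hard limit becomes configurable.
// The key "dfs.namenode.lease-hard-limit-sec" is an assumption of this
// proposal, not a shipped constant; today the limit is hard-coded.
import org.apache.hadoop.conf.Configuration;

public class LeaseHardLimitOverride {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // 20 minutes instead of the hard-coded 1 hour, expressed in seconds.
    conf.setLong("dfs.namenode.lease-hard-limit-sec", 20 * 60L);
    System.out.println("lease hard limit = "
        + conf.getLong("dfs.namenode.lease-hard-limit-sec", 60 * 60L) + "s");
  }
}
{code}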






[jira] [Updated] (HDFS-12625) Reduce expense of deleting large directories

2017-10-10 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-12625:
--
Release Note:   (was: Deletion of ~5M files on a large cluster jammed the 
NN for 52 seconds. The call queue overflowed and began rejecting clients. 14k 
calls were queued for which most clients timed out while the NN was hung. Tasks 
issuing the calls likely failed.)

> Reduce expense of deleting large directories
> 
>
> Key: HDFS-12625
> URL: https://issues.apache.org/jira/browse/HDFS-12625
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.9.0, 2.8.1, 3.1.0
>Reporter: Eric Payne
>







[jira] [Updated] (HDFS-12625) Reduce expense of deleting large directories

2017-10-10 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-12625:
--
Description: Deletion of ~5M files on a large cluster jammed the NN for 52 
seconds. The call queue overflowed and began rejecting clients. 14k calls were 
queued, and most of the calling clients timed out while the NN was hung. Tasks 
issuing the calls likely failed.

> Reduce expense of deleting large directories
> 
>
> Key: HDFS-12625
> URL: https://issues.apache.org/jira/browse/HDFS-12625
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.9.0, 2.8.1, 3.1.0
>Reporter: Eric Payne
>
> Deletion of ~5M files on a large cluster jammed the NN for 52 seconds. The 
> call queue overflowed and began rejecting clients. 14k calls were queued, 
> and most of the calling clients timed out while the NN was hung. Tasks 
> issuing the calls likely failed.






[jira] [Commented] (HDFS-12625) Reduce expense of deleting large directories

2017-10-10 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16199264#comment-16199264
 ] 

Eric Payne commented on HDFS-12625:
---

Features like snapshots have made it more difficult to redesign deletes to be 
more asynchronous. A large delete should be profiled to find the areas to 
target for optimization. Perhaps a maximum limit on the number of files that 
can be deleted at once would mitigate the issue.
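
To make that last idea concrete, below is a hedged client-side sketch of 
bounding delete cost: walk the tree and issue many small deletes rather than 
one huge recursive delete, so no single NN operation holds the namesystem 
lock for long. The path and the one-file-per-RPC granularity are illustrative 
assumptions, not a proposed patch.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BatchedDelete {
  // Recursively delete 'dir' one child at a time so each NN call stays small.
  static void deleteTree(FileSystem fs, Path dir) throws Exception {
    for (FileStatus child : fs.listStatus(dir)) {
      if (child.isDirectory()) {
        deleteTree(fs, child.getPath());
      } else {
        fs.delete(child.getPath(), false); // non-recursive: one file per RPC
      }
    }
    fs.delete(dir, false); // the directory is empty at this point
  }

  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    deleteTree(fs, new Path("/tmp/huge-dir")); // illustrative path
  }
}
{code}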

> Reduce expense of deleting large directories
> 
>
> Key: HDFS-12625
> URL: https://issues.apache.org/jira/browse/HDFS-12625
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.9.0, 2.8.1, 3.1.0
>Reporter: Eric Payne
>
> Deletion of ~5M files on a large cluster jammed the NN for 52 seconds. The 
> call queue overflowed and began rejecting clients. 14k calls were queued, 
> and most of the calling clients timed out while the NN was hung. Tasks 
> issuing the calls likely failed.






[jira] [Created] (HDFS-12625) Reduce expense of deleting large directories

2017-10-10 Thread Eric Payne (JIRA)
Eric Payne created HDFS-12625:
-

 Summary: Reduce expense of deleting large directories
 Key: HDFS-12625
 URL: https://issues.apache.org/jira/browse/HDFS-12625
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.8.1, 2.9.0, 3.1.0
Reporter: Eric Payne









[jira] [Commented] (HDFS-10391) Always enable NameNode service RPC port

2017-09-14 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16166507#comment-16166507
 ] 

Eric Payne commented on HDFS-10391:
---

[~arpitagarwal] and [~GergelyNovak]
After this change, my datanode won't talk to my namenode in my pseudo cluster 
by default:
{code}
2017-09-13 14:04:52,393 [Thread-23] WARN datanode.DataNode: Problem connecting 
to server: hostname.x.y.com:9840
{code}

> Always enable NameNode service RPC port
> ---
>
> Key: HDFS-10391
> URL: https://issues.apache.org/jira/browse/HDFS-10391
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, namenode
>Reporter: Arpit Agarwal
>Assignee: Gergely Novák
>  Labels: Incompatible
> Fix For: 3.0.0-beta1
>
> Attachments: HDFS-10391.001.patch, HDFS-10391.002.patch, 
> HDFS-10391.003.patch, HDFS-10391.004.patch, HDFS-10391.005.patch, 
> HDFS-10391.006.patch, HDFS-10391.007.patch, HDFS-10391.008.patch, 
> HDFS-10391.009.patch, HDFS-10391.010.patch, HDFS-10391.v5-v6-delta.patch
>
>
> The NameNode should always be setup with a service RPC port so that it does 
> not have to be explicitly enabled by an administrator.






[jira] [Commented] (HDFS-11316) TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails in trunk

2017-04-19 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15975156#comment-15975156
 ] 

Eric Payne commented on HDFS-11316:
---

Thanks [~linyiqun]. I backported this to branch-2 and branch-2.8
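
For reference, the {{GenericTestUtils.waitFor}} approach suggested in the 
description below looks roughly like this sketch inside the test (the 
{{cluster}} variable and the 100 ms poll / 30 s timeout are assumptions):
{code}
import com.google.common.base.Supplier;
import org.apache.hadoop.test.GenericTestUtils;

// Poll until the volume failure is reflected in the under-replicated count,
// instead of asserting once and failing on a slow machine.
GenericTestUtils.waitFor(new Supplier<Boolean>() {
  @Override
  public Boolean get() {
    return cluster.getNamesystem().getUnderReplicatedBlocks() > 0;
  }
}, 100, 30000);
{code}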

> TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails in trunk
> 
>
> Key: HDFS-11316
> URL: https://issues.apache.org/jira/browse/HDFS-11316
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha2
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Minor
> Fix For: 2.9.0, 3.0.0-alpha2, 2.8.1
>
> Attachments: HDFS-11316.001.patch, HDFS-11316.002.patch
>
>
> The test {{TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure}} 
> fails frequently in recent Jenkins builds. The stack trace:
> {code}
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure.testUnderReplicationAfterVolFailure
> java.lang.AssertionError: There is no under replicated block after volume 
> failure
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure.testUnderReplicationAfterVolFailure(TestDataNodeVolumeFailure.java:419)
> {code}
> We would be better off using {{GenericTestUtils.waitFor}} here to wait for 
> the condition to be satisfied.






[jira] [Updated] (HDFS-11316) TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails in trunk

2017-04-19 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-11316:
--
Fix Version/s: 2.8.1
   2.9.0

> TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails in trunk
> 
>
> Key: HDFS-11316
> URL: https://issues.apache.org/jira/browse/HDFS-11316
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha2
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Minor
> Fix For: 2.9.0, 3.0.0-alpha2, 2.8.1
>
> Attachments: HDFS-11316.001.patch, HDFS-11316.002.patch
>
>
> The test {{TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure}} 
> fails frequently in recent Jenkins builds. The stack trace:
> {code}
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure.testUnderReplicationAfterVolFailure
> java.lang.AssertionError: There is no under replicated block after volume 
> failure
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure.testUnderReplicationAfterVolFailure(TestDataNodeVolumeFailure.java:419)
> {code}
> We would be better off using {{GenericTestUtils.waitFor}} here to wait for 
> the condition to be satisfied.






[jira] [Updated] (HDFS-11404) Increase timeout on TestShortCircuitLocalRead.testDeprecatedGetBlockLocalPathInfoRpc

2017-02-21 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-11404:
--
   Resolution: Fixed
Fix Version/s: 2.8.1
   3.0.0-alpha3
   2.9.0
   Status: Resolved  (was: Patch Available)

Committed to trunk, branch-2, and branch-2.8.

> Increase timeout on 
> TestShortCircuitLocalRead.testDeprecatedGetBlockLocalPathInfoRpc
> 
>
> Key: HDFS-11404
> URL: https://issues.apache.org/jira/browse/HDFS-11404
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Fix For: 2.9.0, 3.0.0-alpha3, 2.8.1
>
> Attachments: HDFS-11404.001.patch
>
>







[jira] [Commented] (HDFS-11404) Increase timeout on TestShortCircuitLocalRead.testDeprecatedGetBlockLocalPathInfoRpc

2017-02-21 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15875976#comment-15875976
 ] 

Eric Payne commented on HDFS-11404:
---

+1

Thanks [~ebadger]. I will commit later today.

> Increase timeout on 
> TestShortCircuitLocalRead.testDeprecatedGetBlockLocalPathInfoRpc
> 
>
> Key: HDFS-11404
> URL: https://issues.apache.org/jira/browse/HDFS-11404
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-11404.001.patch
>
>







[jira] [Updated] (HDFS-9300) TestDirectoryScanner.testThrottle() is still a little flakey

2017-02-02 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-9300:
-
Fix Version/s: 2.8.1

Thanks [~templedf] for making this test more stable.

I backported it to branch-2.8

> TestDirectoryScanner.testThrottle() is still a little flakey
> 
>
> Key: HDFS-9300
> URL: https://issues.apache.org/jira/browse/HDFS-9300
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: balancer & mover, test
>Affects Versions: 2.7.1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
> Fix For: 2.9.0, 3.0.0-alpha1, 2.8.1
>
> Attachments: HDFS-9300.001.patch
>
>
> It failed in:
> https://builds.apache.org/job/PreCommit-HDFS-Build/13160/testReport/org.apache.hadoop.hdfs.server.datanode/TestDirectoryScanner/testThrottling/
> by narrowly missing the performance boundaries.  The only solution I have is 
> to relax the boundaries a little.  The throttle's just a hard thing to test 
> in an unpredictable environment.






[jira] [Updated] (HDFS-9905) WebHdfsFileSystem#runWithRetry should display original stack trace on error

2016-06-09 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-9905:
-
Attachment: HDFS-9905-branch-2.7.002.patch

HDFS-7163, which introduced this problem, was originally merged back to 2.7, 
but this JIRA (HDFS-9905) was not. The misleading stack trace is causing 
difficulty in debugging problems for us, so I cherry-picked this back to 2.7.

> WebHdfsFileSystem#runWithRetry should display original stack trace on error
> ---
>
> Key: HDFS-9905
> URL: https://issues.apache.org/jira/browse/HDFS-9905
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.7.3
>Reporter: Kihwal Lee
>Assignee: Wei-Chiu Chuang
> Fix For: 2.8.0
>
> Attachments: HDFS-9905-branch-2.7.002.patch, HDFS-9905.001.patch, 
> HDFS-9905.002.patch
>
>
> When checking for a timeout in {{TestWebHdfsTimeouts}}, it does get 
> {{SocketTimeoutException}}, but the message sometimes does not contain 
> "connect timed out". Since the original exception is not logged, we do not 
> know the details.






[jira] [Commented] (HDFS-9905) TestWebHdfsTimeouts fails occasionally

2016-04-14 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241844#comment-15241844
 ] 

Eric Payne commented on HDFS-9905:
--

bq.  java.net.SocksSocketImpl is possible to throw SocketTimeoutException with 
null message. We seem not to be able to expect that SocketTimeoutException 
always contains message such as "Read timed out" or "connect timed out".
bq. Use GenericTestUtils.assertExceptionContains instead of Assert.assertEquals 
so that if the string doesn't match, it logs the exception.

Thanks, [~iwasakims] and [~jojochuang] for your work on this issue. I don't 
know what would cause {{SocketTimeoutException}} to give a null message instead 
of the expected {{Read timed out}}. However, your point about the original 
stack trace being lost is a very good one:
bq. the exception object was reinterpreted in the exception handling, so the 
original stack trace was lost.

In {{WebHdfsFileSystem#AbstractRunner#runWithRetry}}, the code that recreates 
the exception with the node name should also propagate the stack trace:
{code}
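  // The rewrap below keeps only the message and drops the original stack trace.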
  ioe = ioe.getClass().getConstructor(String.class)
.newInstance(node + ": " + ioe.getMessage());
{code}
Should be:
{code}
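  // Rewrap with the node name, but preserve the original stack trace.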
  IOException newIoe =
  ioe.getClass().getConstructor(String.class)
.newInstance(node + ": " + ioe.getMessage());
  newIoe.setStackTrace(ioe.getStackTrace());
  ioe = newIoe;
{code}
I can open a separate JIRA for this if you want.

> TestWebHdfsTimeouts fails occasionally
> --
>
> Key: HDFS-9905
> URL: https://issues.apache.org/jira/browse/HDFS-9905
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.7.3
>Reporter: Kihwal Lee
>Assignee: Wei-Chiu Chuang
> Attachments: HDFS-9905.001.patch
>
>
> When checking for a timeout, it does get {{SocketTimeoutException}}, but the 
> message sometimes does not contain "connect timed out". Since the original 
> exception is not logged, we do not know the details.





[jira] [Updated] (HDFS-9945) Datanode command for evicting writers

2016-04-07 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-9945:
-
   Resolution: Fixed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

> Datanode command for evicting writers
> -
>
> Key: HDFS-9945
> URL: https://issues.apache.org/jira/browse/HDFS-9945
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
> Fix For: 2.8.0
>
> Attachments: HDFS-9945.patch, HDFS-9945.v2.patch
>
>
> It will be useful if there is a command to evict writers from a datanode. 
> When a set of datanodes are being decommissioned, they can get blocked by 
> slow writers at the end.  It was rare in the old days since mapred jobs 
> didn't last too long, but with many different types of apps running on 
> today's YARN clusters, we often see a very long tail in datanode 
> decommissioning.
> I propose a new dfsadmin command, {{evictWriters}}, to be added. I initially 
> thought about having namenode automatically telling datanodes on 
> decommissioning, but realized that having a command is more flexible. E.g. 
> users can choose not to do this at all, choose when to evict writers, or 
> whether to try multiple times for whatever reasons.





[jira] [Commented] (HDFS-9945) Datanode command for evicting writers

2016-04-06 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15228984#comment-15228984
 ] 

Eric Payne commented on HDFS-9945:
--

Thanks, [~kihwal], for providing this feature. The patch looks good to me.

+1. I will commit to trunk, branch-2, and branch-2.8
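
For the record, invoking the new command programmatically would look roughly 
like the sketch below, assuming the dfsadmin flag lands as proposed in the 
description (the datanode address is an illustrative assumption):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.tools.DFSAdmin;
import org.apache.hadoop.util.ToolRunner;

public class EvictWritersExample {
  public static void main(String[] args) throws Exception {
    // Equivalent of `hdfs dfsadmin -evictWriters <datanode_host:ipc_port>`.
    int rc = ToolRunner.run(new Configuration(), new DFSAdmin(),
        new String[] { "-evictWriters", "dn1.example.com:8010" });
    System.exit(rc);
  }
}
{code}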

> Datanode command for evicting writers
> -
>
> Key: HDFS-9945
> URL: https://issues.apache.org/jira/browse/HDFS-9945
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
> Attachments: HDFS-9945.patch, HDFS-9945.v2.patch
>
>
> It will be useful if there is a command to evict writers from a datanode. 
> When a set of datanodes are being decommissioned, they can get blocked by 
> slow writers at the end.  It was rare in the old days since mapred jobs 
> didn't last too long, but with many different types of apps running on 
> today's YARN clusters, we often see a very long tail in datanode 
> decommissioning.
> I propose a new dfsadmin command, {{evictWriters}}, to be added. I initially 
> thought about having namenode automatically telling datanodes on 
> decommissioning, but realized that having a command is more flexible. E.g. 
> users can choose not to do this at all, choose when to evict writers, or 
> whether to try multiple times for whatever reasons.





[jira] [Commented] (HDFS-9634) webhdfs client side exceptions don't provide enough details

2016-03-19 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202068#comment-15202068
 ] 

Eric Payne commented on HDFS-9634:
--

[~jojochuang], Thanks for reviewing the functionality of this patch.

Did you use the FS Shell to test this (i.e., {{hadoop fs -cat 
webhdfs://SOMEHOST/MyHome/myfile.txt}})? If so, the FS Shell swallows the stack 
trace and just prints the message when accessing files via webhdfs. It has done 
this for a long time. In my test environment, I reverted this change, and FS 
Shell behaves the same way.

When I write a test java program, read a file via webhdfs, and inject an error, 
I do get the whole stack trace, including the cause message.
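
For reference, the test program mentioned above is essentially the following 
sketch (the host, port, and file path are illustrative):
{code}
import java.io.InputStream;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WebHdfsReadTest {
  public static void main(String[] args) throws Exception {
    // Read through the webhdfs:// scheme so client-side failures surface
    // with their full stack traces (unlike the FS Shell, which swallows them).
    FileSystem fs = FileSystem.get(
        URI.create("webhdfs://SOMEHOST:50070"), new Configuration());
    try (InputStream in = fs.open(new Path("/MyHome/myfile.txt"))) {
      byte[] buf = new byte[4096];
      while (in.read(buf) != -1) {
        // Discard the data; this test only exercises the read/retry path.
      }
    }
  }
}
{code}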

> webhdfs client side exceptions don't provide enough details
> ---
>
> Key: HDFS-9634
> URL: https://issues.apache.org/jira/browse/HDFS-9634
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 3.0.0, 2.8.0, 2.7.1
>Reporter: Eric Payne
>Assignee: Eric Payne
> Fix For: 2.7.3
>
> Attachments: HDFS-9634.001.patch, HDFS-9634.002.patch
>
>
> When a WebHDFS client side exception (for example, read timeout) occurs there 
> are no details beyond the fact that a timeout occurred. Ideally it should say 
> which node is responsible for the timeout, but failing that it should at 
> least say which node we're talking to so we can examine that node's logs to 
> further investigate.
> {noformat}
> java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.read(SocketInputStream.java:150)
> at java.net.SocketInputStream.read(SocketInputStream.java:121)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> at sun.net.www.MeteredStream.read(MeteredStream.java:134)
> at java.io.FilterInputStream.read(FilterInputStream.java:133)
> at 
> sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3035)
> at 
> org.apache.commons.io.input.BoundedInputStream.read(BoundedInputStream.java:121)
> at 
> org.apache.hadoop.hdfs.web.ByteRangeInputStream.read(ByteRangeInputStream.java:188)
> at java.io.DataInputStream.read(DataInputStream.java:149)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> at 
> com.yahoo.grid.tools.util.io.ThrottledBufferedInputStream.read(ThrottledBufferedInputStream.java:58)
> at java.io.FilterInputStream.read(FilterInputStream.java:107)
> at 
> com.yahoo.grid.replication.distcopy.tasklet.HFTPDistributedCopy.copyBytes(HFTPDistributedCopy.java:495)
> at 
> com.yahoo.grid.replication.distcopy.tasklet.HFTPDistributedCopy.doCopy(HFTPDistributedCopy.java:440)
> at 
> com.yahoo.grid.replication.distcopy.tasklet.HFTPDistributedCopy.access$200(HFTPDistributedCopy.java:57)
> at 
> com.yahoo.grid.replication.distcopy.tasklet.HFTPDistributedCopy$1.doExecute(HFTPDistributedCopy.java:387)
> ... 12 more
> {noformat}
> There are no clues as to which datanode we're talking to nor which datanode 
> was responsible for the timeout.





[jira] [Commented] (HDFS-9634) webhdfs client side exceptions don't provide enough details

2016-01-21 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110644#comment-15110644
 ] 

Eric Payne commented on HDFS-9634:
--

Sorry, that should have been java 1.7 and 1.8.

> webhdfs client side exceptions don't provide enough details
> ---
>
> Key: HDFS-9634
> URL: https://issues.apache.org/jira/browse/HDFS-9634
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 3.0.0, 2.8.0, 2.7.1
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: HDFS-9634.001.patch, HDFS-9634.002.patch
>
>
> When a WebHDFS client side exception (for example, read timeout) occurs there 
> are no details beyond the fact that a timeout occurred. Ideally it should say 
> which node is responsible for the timeout, but failing that it should at 
> least say which node we're talking to so we can examine that node's logs to 
> further investigate.
> {noformat}
> java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.read(SocketInputStream.java:150)
> at java.net.SocketInputStream.read(SocketInputStream.java:121)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> at sun.net.www.MeteredStream.read(MeteredStream.java:134)
> at java.io.FilterInputStream.read(FilterInputStream.java:133)
> at 
> sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3035)
> at 
> org.apache.commons.io.input.BoundedInputStream.read(BoundedInputStream.java:121)
> at 
> org.apache.hadoop.hdfs.web.ByteRangeInputStream.read(ByteRangeInputStream.java:188)
> at java.io.DataInputStream.read(DataInputStream.java:149)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> at 
> com.yahoo.grid.tools.util.io.ThrottledBufferedInputStream.read(ThrottledBufferedInputStream.java:58)
> at java.io.FilterInputStream.read(FilterInputStream.java:107)
> at 
> com.yahoo.grid.replication.distcopy.tasklet.HFTPDistributedCopy.copyBytes(HFTPDistributedCopy.java:495)
> at 
> com.yahoo.grid.replication.distcopy.tasklet.HFTPDistributedCopy.doCopy(HFTPDistributedCopy.java:440)
> at 
> com.yahoo.grid.replication.distcopy.tasklet.HFTPDistributedCopy.access$200(HFTPDistributedCopy.java:57)
> at 
> com.yahoo.grid.replication.distcopy.tasklet.HFTPDistributedCopy$1.doExecute(HFTPDistributedCopy.java:387)
> ... 12 more
> {noformat}
> There are no clues as to which datanode we're talking to nor which datanode 
> was responsible for the timeout.





[jira] [Commented] (HDFS-9634) webhdfs client side exceptions don't provide enough details

2016-01-21 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110643#comment-15110643
 ] 

Eric Payne commented on HDFS-9634:
--

Thanks a lot, [~kihwal] and [~shahrs87]. I have run {{TestWebHdfsTimeouts}} 
several times with both java 2.7 and 2.8, and it always succeeds for me.

> webhdfs client side exceptions don't provide enough details
> ---
>
> Key: HDFS-9634
> URL: https://issues.apache.org/jira/browse/HDFS-9634
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 3.0.0, 2.8.0, 2.7.1
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: HDFS-9634.001.patch, HDFS-9634.002.patch
>
>
> When a WebHDFS client side exception (for example, read timeout) occurs there 
> are no details beyond the fact that a timeout occurred. Ideally it should say 
> which node is responsible for the timeout, but failing that it should at 
> least say which node we're talking to so we can examine that node's logs to 
> further investigate.
> {noformat}
> java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.read(SocketInputStream.java:150)
> at java.net.SocketInputStream.read(SocketInputStream.java:121)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> at sun.net.www.MeteredStream.read(MeteredStream.java:134)
> at java.io.FilterInputStream.read(FilterInputStream.java:133)
> at 
> sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3035)
> at 
> org.apache.commons.io.input.BoundedInputStream.read(BoundedInputStream.java:121)
> at 
> org.apache.hadoop.hdfs.web.ByteRangeInputStream.read(ByteRangeInputStream.java:188)
> at java.io.DataInputStream.read(DataInputStream.java:149)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> at 
> com.yahoo.grid.tools.util.io.ThrottledBufferedInputStream.read(ThrottledBufferedInputStream.java:58)
> at java.io.FilterInputStream.read(FilterInputStream.java:107)
> at 
> com.yahoo.grid.replication.distcopy.tasklet.HFTPDistributedCopy.copyBytes(HFTPDistributedCopy.java:495)
> at 
> com.yahoo.grid.replication.distcopy.tasklet.HFTPDistributedCopy.doCopy(HFTPDistributedCopy.java:440)
> at 
> com.yahoo.grid.replication.distcopy.tasklet.HFTPDistributedCopy.access$200(HFTPDistributedCopy.java:57)
> at 
> com.yahoo.grid.replication.distcopy.tasklet.HFTPDistributedCopy$1.doExecute(HFTPDistributedCopy.java:387)
> ... 12 more
> {noformat}
> There are no clues as to which datanode we're talking to nor which datanode 
> was responsible for the timeout.





[jira] [Commented] (HDFS-6221) Webhdfs should recover from dead DNs

2016-01-12 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15095183#comment-15095183
 ] 

Eric Payne commented on HDFS-6221:
--

[~daryn], is this still a problem post HDFS-7163?

> Webhdfs should recover from dead DNs
> 
>
> Key: HDFS-6221
> URL: https://issues.apache.org/jira/browse/HDFS-6221
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, webhdfs
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>
> We've repeatedly observed the jetty acceptor thread silently dying in the 
> DNs.  The webhdfs servlet may also "disappear" and jetty returns non-json 
> 404s.
> One approach to make webhdfs more resilient to bad DNs is dfsclient-like 
> fetching of block locations to directly access the DNs instead of relying on 
> a NN redirect that may repeatedly send the client to the same faulty DN(s).





[jira] [Commented] (HDFS-9634) webhdfs client side exceptions don't provide enough details

2016-01-12 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094893#comment-15094893
 ] 

Eric Payne commented on HDFS-9634:
--

All of the unit tests listed above passed when I ran them in my local build 
environment.

> webhdfs client side exceptions don't provide enough details
> ---
>
> Key: HDFS-9634
> URL: https://issues.apache.org/jira/browse/HDFS-9634
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 3.0.0, 2.8.0, 2.7.1
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: HDFS-9634.001.patch, HDFS-9634.002.patch
>
>
> When a WebHDFS client side exception (for example, read timeout) occurs there 
> are no details beyond the fact that a timeout occurred. Ideally it should say 
> which node is responsible for the timeout, but failing that it should at 
> least say which node we're talking to so we can examine that node's logs to 
> further investigate.
> {noformat}
> java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.read(SocketInputStream.java:150)
> at java.net.SocketInputStream.read(SocketInputStream.java:121)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> at sun.net.www.MeteredStream.read(MeteredStream.java:134)
> at java.io.FilterInputStream.read(FilterInputStream.java:133)
> at 
> sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3035)
> at 
> org.apache.commons.io.input.BoundedInputStream.read(BoundedInputStream.java:121)
> at 
> org.apache.hadoop.hdfs.web.ByteRangeInputStream.read(ByteRangeInputStream.java:188)
> at java.io.DataInputStream.read(DataInputStream.java:149)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> at 
> com.yahoo.grid.tools.util.io.ThrottledBufferedInputStream.read(ThrottledBufferedInputStream.java:58)
> at java.io.FilterInputStream.read(FilterInputStream.java:107)
> at 
> com.yahoo.grid.replication.distcopy.tasklet.HFTPDistributedCopy.copyBytes(HFTPDistributedCopy.java:495)
> at 
> com.yahoo.grid.replication.distcopy.tasklet.HFTPDistributedCopy.doCopy(HFTPDistributedCopy.java:440)
> at 
> com.yahoo.grid.replication.distcopy.tasklet.HFTPDistributedCopy.access$200(HFTPDistributedCopy.java:57)
> at 
> com.yahoo.grid.replication.distcopy.tasklet.HFTPDistributedCopy$1.doExecute(HFTPDistributedCopy.java:387)
> ... 12 more
> {noformat}
> There are no clues as to which datanode we're talking to nor which datanode 
> was responsible for the timeout.





[jira] [Updated] (HDFS-9634) webhdfs client side exceptions don't provide enough details

2016-01-12 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-9634:
-
Attachment: HDFS-9634.002.patch

Attaching HDFS-9634-002.patch. Sorry about the previous bad patch.

> webhdfs client side exceptions don't provide enough details
> ---
>
> Key: HDFS-9634
> URL: https://issues.apache.org/jira/browse/HDFS-9634
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 3.0.0, 2.8.0, 2.7.1
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: HDFS-9634.001.patch, HDFS-9634.002.patch
>
>
> When a WebHDFS client side exception (for example, read timeout) occurs there 
> are no details beyond the fact that a timeout occurred. Ideally it should say 
> which node is responsible for the timeout, but failing that it should at 
> least say which node we're talking to so we can examine that node's logs to 
> further investigate.
> {noformat}
> java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.read(SocketInputStream.java:150)
> at java.net.SocketInputStream.read(SocketInputStream.java:121)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> at sun.net.www.MeteredStream.read(MeteredStream.java:134)
> at java.io.FilterInputStream.read(FilterInputStream.java:133)
> at 
> sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3035)
> at 
> org.apache.commons.io.input.BoundedInputStream.read(BoundedInputStream.java:121)
> at 
> org.apache.hadoop.hdfs.web.ByteRangeInputStream.read(ByteRangeInputStream.java:188)
> at java.io.DataInputStream.read(DataInputStream.java:149)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> at 
> com.yahoo.grid.tools.util.io.ThrottledBufferedInputStream.read(ThrottledBufferedInputStream.java:58)
> at java.io.FilterInputStream.read(FilterInputStream.java:107)
> at 
> com.yahoo.grid.replication.distcopy.tasklet.HFTPDistributedCopy.copyBytes(HFTPDistributedCopy.java:495)
> at 
> com.yahoo.grid.replication.distcopy.tasklet.HFTPDistributedCopy.doCopy(HFTPDistributedCopy.java:440)
> at 
> com.yahoo.grid.replication.distcopy.tasklet.HFTPDistributedCopy.access$200(HFTPDistributedCopy.java:57)
> at 
> com.yahoo.grid.replication.distcopy.tasklet.HFTPDistributedCopy$1.doExecute(HFTPDistributedCopy.java:387)
> ... 12 more
> {noformat}
> There are no clues as to which datanode we're talking to nor which datanode 
> was responsible for the timeout.





[jira] [Updated] (HDFS-9634) webhdfs client side exceptions don't provide enough details

2016-01-11 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-9634:
-
Target Version/s: 3.0.0, 2.8.0
  Status: Patch Available  (was: Open)

[~daryn], [~kihwal], and [~jlowe]:
Attached HDFS-9634.001.patch

> webhdfs client side exceptions don't provide enough details
> ---
>
> Key: HDFS-9634
> URL: https://issues.apache.org/jira/browse/HDFS-9634
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 2.7.1, 3.0.0, 2.8.0
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: HDFS-9634.001.patch
>
>
> When a WebHDFS client side exception (for example, read timeout) occurs there 
> are no details beyond the fact that a timeout occurred. Ideally it should say 
> which node is responsible for the timeout, but failing that it should at 
> least say which node we're talking to so we can examine that node's logs to 
> further investigate.
> {noformat}
> java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.read(SocketInputStream.java:150)
> at java.net.SocketInputStream.read(SocketInputStream.java:121)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> at sun.net.www.MeteredStream.read(MeteredStream.java:134)
> at java.io.FilterInputStream.read(FilterInputStream.java:133)
> at 
> sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3035)
> at 
> org.apache.commons.io.input.BoundedInputStream.read(BoundedInputStream.java:121)
> at 
> org.apache.hadoop.hdfs.web.ByteRangeInputStream.read(ByteRangeInputStream.java:188)
> at java.io.DataInputStream.read(DataInputStream.java:149)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> at 
> com.yahoo.grid.tools.util.io.ThrottledBufferedInputStream.read(ThrottledBufferedInputStream.java:58)
> at java.io.FilterInputStream.read(FilterInputStream.java:107)
> at 
> com.yahoo.grid.replication.distcopy.tasklet.HFTPDistributedCopy.copyBytes(HFTPDistributedCopy.java:495)
> at 
> com.yahoo.grid.replication.distcopy.tasklet.HFTPDistributedCopy.doCopy(HFTPDistributedCopy.java:440)
> at 
> com.yahoo.grid.replication.distcopy.tasklet.HFTPDistributedCopy.access$200(HFTPDistributedCopy.java:57)
> at 
> com.yahoo.grid.replication.distcopy.tasklet.HFTPDistributedCopy$1.doExecute(HFTPDistributedCopy.java:387)
> ... 12 more
> {noformat}
> There are no clues as to which datanode we're talking to nor which datanode 
> was responsible for the timeout.





[jira] [Updated] (HDFS-9634) webhdfs client side exceptions don't provide enough details

2016-01-11 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-9634:
-
Attachment: HDFS-9634.001.patch

> webhdfs client side exceptions don't provide enough details
> ---
>
> Key: HDFS-9634
> URL: https://issues.apache.org/jira/browse/HDFS-9634
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 3.0.0, 2.8.0, 2.7.1
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: HDFS-9634.001.patch
>
>
> When a WebHDFS client side exception (for example, read timeout) occurs there 
> are no details beyond the fact that a timeout occurred. Ideally it should say 
> which node is responsible for the timeout, but failing that it should at 
> least say which node we're talking to so we can examine that node's logs to 
> further investigate.
> {noformat}
> java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.read(SocketInputStream.java:150)
> at java.net.SocketInputStream.read(SocketInputStream.java:121)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> at sun.net.www.MeteredStream.read(MeteredStream.java:134)
> at java.io.FilterInputStream.read(FilterInputStream.java:133)
> at 
> sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3035)
> at 
> org.apache.commons.io.input.BoundedInputStream.read(BoundedInputStream.java:121)
> at 
> org.apache.hadoop.hdfs.web.ByteRangeInputStream.read(ByteRangeInputStream.java:188)
> at java.io.DataInputStream.read(DataInputStream.java:149)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> at 
> com.yahoo.grid.tools.util.io.ThrottledBufferedInputStream.read(ThrottledBufferedInputStream.java:58)
> at java.io.FilterInputStream.read(FilterInputStream.java:107)
> at 
> com.yahoo.grid.replication.distcopy.tasklet.HFTPDistributedCopy.copyBytes(HFTPDistributedCopy.java:495)
> at 
> com.yahoo.grid.replication.distcopy.tasklet.HFTPDistributedCopy.doCopy(HFTPDistributedCopy.java:440)
> at 
> com.yahoo.grid.replication.distcopy.tasklet.HFTPDistributedCopy.access$200(HFTPDistributedCopy.java:57)
> at 
> com.yahoo.grid.replication.distcopy.tasklet.HFTPDistributedCopy$1.doExecute(HFTPDistributedCopy.java:387)
> ... 12 more
> {noformat}
> There are no clues as to which datanode we're talking to nor which datanode 
> was responsible for the timeout.





[jira] [Created] (HDFS-9634) webhdfs client side exceptions don't provide enough details

2016-01-09 Thread Eric Payne (JIRA)
Eric Payne created HDFS-9634:


 Summary: webhdfs client side exceptions don't provide enough 
details
 Key: HDFS-9634
 URL: https://issues.apache.org/jira/browse/HDFS-9634
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 2.7.1, 3.0.0, 2.8.0
Reporter: Eric Payne
Assignee: Eric Payne


When a WebHDFS client side exception (for example, read timeout) occurs there 
are no details beyond the fact that a timeout occurred. Ideally it should say 
which node is responsible for the timeout, but failing that it should at least 
say which node we're talking to so we can examine that node's logs to further 
investigate.
{noformat}
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:150)
at java.net.SocketInputStream.read(SocketInputStream.java:121)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at sun.net.www.MeteredStream.read(MeteredStream.java:134)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at 
sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3035)
at 
org.apache.commons.io.input.BoundedInputStream.read(BoundedInputStream.java:121)
at 
org.apache.hadoop.hdfs.web.ByteRangeInputStream.read(ByteRangeInputStream.java:188)
at java.io.DataInputStream.read(DataInputStream.java:149)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at 
com.yahoo.grid.tools.util.io.ThrottledBufferedInputStream.read(ThrottledBufferedInputStream.java:58)
at java.io.FilterInputStream.read(FilterInputStream.java:107)
at 
com.yahoo.grid.replication.distcopy.tasklet.HFTPDistributedCopy.copyBytes(HFTPDistributedCopy.java:495)
at 
com.yahoo.grid.replication.distcopy.tasklet.HFTPDistributedCopy.doCopy(HFTPDistributedCopy.java:440)
at 
com.yahoo.grid.replication.distcopy.tasklet.HFTPDistributedCopy.access$200(HFTPDistributedCopy.java:57)
at 
com.yahoo.grid.replication.distcopy.tasklet.HFTPDistributedCopy$1.doExecute(HFTPDistributedCopy.java:387)
... 12 more
{noformat}
There are no clues as to which datanode we're talking to nor which datanode was 
responsible for the timeout.






[jira] [Commented] (HDFS-7163) WebHdfsFileSystem should retry reads according to the configured retry policy.

2015-12-29 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15074044#comment-15074044
 ] 

Eric Payne commented on HDFS-7163:
--

{quote}
||Vote||Subsystem||Runtime||Comment||
|-1|compile|0m 43s|hadoop-hdfs in branch-2.7 failed with JDK v1.8.0_66.|
|-1|compile|0m 44s|hadoop-hdfs in branch-2.7 failed with JDK v1.7.0_91.|
{quote}
I'm not sure what's wrong with the HDFS pre-commit build. This patch builds for 
me locally.

Thanks again, [~kihwal].

> WebHdfsFileSystem should retry reads according to the configured retry policy.
> --
>
> Key: HDFS-7163
> URL: https://issues.apache.org/jira/browse/HDFS-7163
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 3.0.0, 2.5.1
>Reporter: Eric Payne
>Assignee: Eric Payne
> Fix For: 3.0.0, 2.7.3
>
> Attachments: HDFS-7163-branch-2.003.patch, 
> HDFS-7163-branch-2.004.patch, HDFS-7163-branch-2.7.003.patch, 
> HDFS-7163-branch-2.7.004.patch, HDFS-7163-branch-2.7.005.patch, 
> HDFS-7163.001.patch, HDFS-7163.002.patch, HDFS-7163.003.patch, 
> HDFS-7163.004.patch, HDFS-7163.005.patch, WebHDFS Read Retry.pdf
>
>
> In the current implementation of WebHdfsFileSystem, opens are retried 
> according to the configured retry policy, but not reads. Therefore, if a 
> connection goes down while data is being read, the read will fail and the 
> read will have to be retried by the client code.
> Also, after a connection has been established, the next read (or seek/read) 
> will fail and the read will have to be restarted by the client code.





[jira] [Updated] (HDFS-7163) WebHdfsFileSystem should retry reads according to the configured retry policy.

2015-12-26 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-7163:
-
Attachment: HDFS-7163-branch-2.7.005.patch

bq. I've committed this to trunk, branch-2 and branch-2.8. Eric Payne, please 
post a 2.7 version.
[~kihwal], Thank you!

Attaching {{HDFS-7163-branch-2.7.005.patch}}

> WebHdfsFileSystem should retry reads according to the configured retry policy.
> --
>
> Key: HDFS-7163
> URL: https://issues.apache.org/jira/browse/HDFS-7163
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 3.0.0, 2.5.1
>Reporter: Eric Payne
>Assignee: Eric Payne
> Fix For: 3.0.0
>
> Attachments: HDFS-7163-branch-2.003.patch, 
> HDFS-7163-branch-2.004.patch, HDFS-7163-branch-2.7.003.patch, 
> HDFS-7163-branch-2.7.004.patch, HDFS-7163-branch-2.7.005.patch, 
> HDFS-7163.001.patch, HDFS-7163.002.patch, HDFS-7163.003.patch, 
> HDFS-7163.004.patch, HDFS-7163.005.patch, WebHDFS Read Retry.pdf
>
>
> In the current implementation of WebHdfsFileSystem, opens are retried 
> according to the configured retry policy, but not reads. Therefore, if a 
> connection goes down while data is being read, the read will fail and the 
> read will have to be retried by the client code.
> Also, after a connection has been established, the next read (or seek/read) 
> will fail and the read will have to be restarted by the client code.





[jira] [Updated] (HDFS-7163) WebHdfsFileSystem should retry reads according to the configured retry policy.

2015-12-17 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-7163:
-
Attachment: HDFS-7163.005.patch

[~daryn], thank you very much for your in-depth analysis and helpful comments!

{quote}
Code:

1. We may want to defer the open until a read occurs.
{quote}
I have done that in this patch, but it caused some of the unit tests to fail 
because they were expecting the input stream to be open after the {{fs.open()}} 
call. I had to change these tests:
- {{FSXAttrBaseTest}}
- {{TestAuditLogs}}
- {{TestWebHdfsFileSystemContract}}
- {{TestWebHdfsTokens}}

My question is, are we comfortable that nothing is depending on the current 
behavior?

bq. 2. {{runnerState}} ... should just be initialized as DISCONNECTED.
Done.

bq. 3. If read(...) throws an IOE due to an explicitly closed stream, will 
retries occur?
No. The check for the explicitly closed state happens outside of the retry 
logic.

bq. 4. In {{connect(URL)}}, Calling it {{cachedConnection}} would clarify its 
purpose.
Done.

{quote}
5. In getResponse:
5.1. Should {{initializeInputStream}} be unconditionally invoked inside the 
prior null check on connection? Ie. Is there ever a case when it shouldn't be 
initialized when a new connection is made?
5.2. I think the logic should be if (conn != cachedConnection) { 
cachedConnection = conn; in = initializeInputStream(cachedConnection) }
{quote}
If the connection is not cached, initialization always needs to happen. 
However, the converse is not true. That is, even if the connection is cached, 
initialization may still need to happen.

For a seek, the connection is cached into {{cachedConnection}} by 
{{ReadRunner#read}} after invoking the {{URLRunner}} to make the connection. 
The {{URLRunner}} is used rather than the {{ReadRunner}} so that 
{{AbstractRunner#connect}} can be told that the URL has already been 
redirected. On the other hand, for a regular read (the non-seek case), 
{{ReadRunner#connect}} makes the connection, but the connection isn't cached 
into {{cachedConnection}} until {{ReadRunner#getResponse}}, because we want 
{{validateResponse}} to run before caching the connection.

So, in {{ReadRunner#getResponse}}, in the seek case, {{cachedConnection}} will 
be non-null, but the input stream ({{in}}) will be null. In the regular read 
case, both will be null.

So, I took out the check for whether {{cachedConnection}} is null and now 
always cache it, but I kept the check for whether {{in}} is null. I realize 
that {{cachedConnection}} doesn't always need to be cached, but the performance 
cost is small and it makes the code cleaner.
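Put differently, the resulting {{getResponse}} logic is roughly (a sketch of the description above, not the literal patch):
{code}
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;

class GetResponseSketch {
  private HttpURLConnection cachedConnection;
  private InputStream in;

  void getResponse(HttpURLConnection conn) throws IOException {
    validateResponse(conn);   // validate before caching the connection
    cachedConnection = conn;  // always cache; the cost is small
    if (in == null) {
      // null both after a seek (connection opened by URLRunner) and for a
      // fresh regular read, so one check covers both cases
      in = cachedConnection.getInputStream();
    }
  }

  private void validateResponse(HttpURLConnection conn) throws IOException {
    // placeholder for WebHDFS response validation
  }
}
{code}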

bq.5.3. Should use URL#getAuthority instead of explicitly extracting and 
joining the host and port.
Done.
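That is, instead of extracting and joining the pieces by hand:
{code}
import java.net.MalformedURLException;
import java.net.URL;

class AuthorityExample {
  public static void main(String[] args) throws MalformedURLException {
    URL url = new URL("http://nn.example.com:50070/webhdfs/v1/tmp");  // illustrative URL
    String joined = url.getHost() + ":" + url.getPort();  // explicit extract-and-join
    String better = url.getAuthority();                   // same value in one call
    System.out.println(joined + " == " + better);
  }
}
{code}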

bq. 6. In ReadRunner#initializeInputStream has a misspelled "performznt".
Done.

bq. 7. In {{closeInputStream}}, I'd use {{IOUtils.closeStream}} to ensure the 
close doesn't throw which would prevent the stream state from being updated.
I replaced {{in.close()}} with {{IOUtils.close(cachedConnection)}}. Is that 
what you meant?
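For what it's worth, {{IOUtils.closeStream}} swallows anything thrown by {{close()}}, so the state update that follows can no longer be skipped (sketch with a simplified surrounding method):
{code}
import java.io.InputStream;
import org.apache.hadoop.io.IOUtils;

class CloseSketch {
  private InputStream in;

  void closeInputStream() {
    // closeStream ignores exceptions from close(), so the state
    // transition below always runs, even on a failed close.
    IOUtils.closeStream(in);
    in = null;
    // runnerState = nextState;  // illustrative state update
  }
}
{code}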

bq. 8. In general the state management isn't clear. DISCONNECTED vs SEEK appear 
to be the same, with the exception that SEEK allows the connection to be 
reopened. When errors occur and the stream is DISCONNECTED, are you sure it 
will retry/recover in all cases?
I've done quite a bit of manual testing in a full cluster with reasonably 
substantial files (16GB). Can you be more specific about your concerns?

As far as each state is concerned, SEEK and DISCONNECTED are a little different 
from what your comment suggests. Let me try to explain each state in a little 
more detail (a compact sketch of the transitions follows the list):
- DISCONNECTED
Connection is closed programmatically by ReadRunner after an exception has 
occurred. {{ReadRunner}} will attempt to open a new connection if it is retried 
while in this state.
- OPEN
Connection has been successfully established by {{ReadRunner}}. This occurs 
after the input stream has been initialized.
- SEEK
{{ReadRunner}} will only be put in this state if the user code has explicitly 
called seek(). {{ReadRunner}} will use this state as a trigger to perform a 
redirected connection (as I have discussed above in my reply to your point, 
5.1). Once the connection is established and the input stream is initialized, 
the {{RunnerState}} will move to OPEN. Retries will not be attempted while in 
this state. If an IOException occurs while {{URLRunner}} is attempting to open 
a redirected connection, {{ReadRunner}} will move to the DISCONNECTED state and 
retry via the normal read path.
- CLOSED
{{ReadRunner}} is put in this state when user code has explicitly called 
close().
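
Here is the compact sketch promised above (illustrative only; the transition methods are assumptions, and the real state lives inside the input stream's {{ReadRunner}}):
{code}
class RunnerStateSketch {
  enum RunnerState { DISCONNECTED, OPEN, SEEK, CLOSED }
  private RunnerState state = RunnerState.DISCONNECTED;

  void onConnected()   { state = RunnerState.OPEN; }          // stream initialized
  void onSeek()        { state = RunnerState.SEEK; }          // user called seek()
  void onClose()       { state = RunnerState.CLOSED; }        // user called close()
  void onIOException() { state = RunnerState.DISCONNECTED; }  // back to normal read path

  boolean willReconnectOnNextRead() {
    // DISCONNECTED reconnects via the normal path, SEEK via a redirected
    // connection; CLOSED fails immediately and is never retried.
    return state == RunnerState.DISCONNECTED || state == RunnerState.SEEK;
  }
}
{code}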

Also, as part of this patch, I added a {{RunnerState}} parameter to the 
{{closeInputStream}} method. These two are not necessarily tied together, but 
it does make it clearer (at least in my mind) which state {{ReadRunner}} will 
be moving to as a result of the action. If you are uncomfortable with that, I 
can change it.

[jira] [Commented] (HDFS-7163) WebHdfsFileSystem should retry reads according to the configured retry policy.

2015-12-01 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034375#comment-15034375
 ] 

Eric Payne commented on HDFS-7163:
--

[~wheat9] and [~daryn],
Did my comments 
[above|https://issues.apache.org/jira/browse/HDFS-7163?focusedCommentId=15019039&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15019039]
 make sense:
{quote}
In this patch, if the DN that is being read from goes down, WebHDFS will put 
that DN into the client's URL exclude list before querying the NN again for 
another DN. The only time the same DN is reused is if a seek has occurred.
bq. An alternative approach is to have WebHDFS (1) expose a GET_BLOCK call 
where the DN returns the block directly, and (2) be a smarter client that 
retries based on block locations.
Although this may be a more elegant solution, I think that could be done as 
part of a separate JIRA, given that we can take advantage of the exclude list 
functionality as I mentioned above.
{quote}



[jira] [Commented] (HDFS-7163) WebHdfsFileSystem should retry reads according to the configured retry policy.

2015-11-24 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15025288#comment-15025288
 ] 

Eric Payne commented on HDFS-7163:
--

Although the following tests are listed in the above {{Failed unit tests}} 
section, they all passed for me in my local environment:
{code}
hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes
hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation
hadoop.hdfs.server.datanode.TestDataNodeMetrics
hadoop.hdfs.server.namenode.ha.TestHASafeMode
hadoop.hdfs.server.namenode.snapshot.TestSnapshotDeletion
hadoop.hdfs.server.namenode.TestDecommissioningStatus
hadoop.hdfs.shortcircuit.TestShortCircuitCache
hadoop.hdfs.TestDFSStripedOutputStreamWithFailure
hadoop.hdfs.TestDFSStripedOutputStreamWithFailure160
hadoop.hdfs.TestEncryptionZones
hadoop.hdfs.TestReadStripedFileWithDecoding
hadoop.hdfs.TestReplaceDatanodeOnFailure
hadoop.hdfs.web.TestWebHDFS
{code}

And, this one failed with and without my patch:
{code}
hadoop.security.TestPermission
{code}



[jira] [Commented] (HDFS-7163) WebHdfsFileSystem should retry reads according to the configured retry policy.

2015-11-24 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15025292#comment-15025292
 ] 

Eric Payne commented on HDFS-7163:
--

Looks like {{TestPermission#testBackwardCompatibility}} is broken by  
HADOOP-12294 as documented 
[here|https://issues.apache.org/jira/browse/HDFS-9451?focusedCommentId=15023926&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15023926]



[jira] [Updated] (HDFS-7163) WebHdfsFileSystem should retry reads according to the configured retry policy.

2015-11-24 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-7163:
-
Attachment: HDFS-7163.004.patch

Removed and reattached HDFS-7163.004.patch in hopes of re-launching the 
precommit build.

https://builds.apache.org/job/PreCommit-HDFS-Build/13627/ ran, but did not 
complete.



[jira] [Updated] (HDFS-7163) WebHdfsFileSystem should retry reads according to the configured retry policy.

2015-11-24 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-7163:
-
Attachment: (was: HDFS-7163.004.patch)



[jira] [Updated] (HDFS-7163) WebHdfsFileSystem should retry reads according to the configured retry policy.

2015-11-23 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-7163:
-
Target Version/s: 3.0.0, 2.8.0, 2.7.3  (was: 3.0.0, 2.8.0)



[jira] [Updated] (HDFS-7163) WebHdfsFileSystem should retry reads according to the configured retry policy.

2015-11-23 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-7163:
-
Attachment: HDFS-7163-branch-2.7.004.patch
HDFS-7163-branch-2.004.patch
HDFS-7163.004.patch

Attaching version 004 of patches for trunk, branch-2, and branch-2.7.

Version 003 had an issue that could result in the NN sending the client to a 
bad DN one extra time: if the client received an IOException while reading from 
the DN, it failed to put the DN in the excluded nodes list, so the NN could 
send the client back to the same DN. If that occurred, the open would fail and 
send the client back to the NN, this time with the bad DN in the excluded nodes 
list. The read would still succeed, but it would take a bit longer due to the 
extra attempt to open the bad DN.

Version 004 fixes that issue and supplies the bad DN in the excluded nodes list 
during a read when an IOException occurs.
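
Conceptually, the version 004 behavior is the following (a sketch with assumed names; in the real patch this is wired through {{ReadRunner}} and the configured retry policy):
{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class ExcludeOnReadFailureSketch {
  private final List<String> excludedDatanodes = new ArrayList<String>();
  private String currentDatanode = "dn1";

  int readWithRetry(byte[] b) throws IOException {
    while (true) {  // retry-policy limits elided for brevity
      try {
        return readFromDatanode(currentDatanode, b);
      } catch (IOException e) {
        // Version 004: record the failing DN before asking the NN again,
        // so the NN will not redirect the client back to the same node.
        excludedDatanodes.add(currentDatanode);
        currentDatanode = askNameNodeForDatanode(excludedDatanodes);
      }
    }
  }

  private int readFromDatanode(String dn, byte[] b) throws IOException {
    return -1;  // placeholder for the actual HTTP read
  }

  private String askNameNodeForDatanode(List<String> exclude) {
    return "dn2";  // placeholder for the NN redirect honoring the exclude list
  }
}
{code}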



[jira] [Commented] (HDFS-7163) WebHdfsFileSystem should retry reads according to the configured retry policy.

2015-11-23 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15022684#comment-15022684
 ] 

Eric Payne commented on HDFS-7163:
--

{quote}
In this patch, if the DN that is being read from goes down, WebHDFS will put 
that DN into the client's URL exclude list before querying the NN again for 
another DN. The only time the same DN is reused is if a seek has occurred.
{quote}
[~wheat9], I was wrong about one thing. In the current patch, a failed read 
does not put the current node into the exclude list, so when the client queries 
the NN again, the NN could give it the same node back. I will put up a new 
patch soon addressing this issue.



[jira] [Commented] (HDFS-7163) WebHdfsFileSystem should retry reads according to the configured retry policy.

2015-11-20 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15019039#comment-15019039
 ] 

Eric Payne commented on HDFS-7163:
--

[~wheat9], thank you for your review and comments on this feature.

bq. I think retrying only on the data node is problematic as the retry might 
have little value when the DN goes down.

In this patch, if the DN that is being read from goes down, WebHDFS will put 
that DN into the client's URL exclude list before querying the NN again for 
another DN. The only time the same DN is reused is if a seek has occurred.

bq. An alternative approach is to have WebHDFS (1) expose a GET_BLOCK call 
where the DN returns the block directly, and (2) be a smarter client that 
retries based on block locations.

Although this may be a more elegant solution, I think that could be done as 
part of a separate JIRA, given that we can take advantage of the exclude list 
functionality as I mentioned above.
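
On the wire, that exclude list rides along on the OPEN request sent back to the NN. Assuming the parameter name matches WebHDFS's {{ExcludeDatanodesParam}}, the request looks roughly like the string below (hosts and port are made up):
{code}
class ExcludeParamExample {
  // Illustrative OPEN request carrying an exclude list back to the NN.
  static final String OPEN_URL =
      "http://nn.example.com:50070/webhdfs/v1/user/ep/file"
      + "?op=OPEN&excludedatanodes=dn1.example.com";
}
{code}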




[jira] [Commented] (HDFS-7163) WebHdfsFileSystem should retry reads according to the configured retry policy.

2015-11-19 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15014632#comment-15014632
 ] 

Eric Payne commented on HDFS-7163:
--

Hi [~wheat9] and [~daryn]. I wonder if either of you knows when you might have 
a chance to review this. I would really appreciate your feedback. Thanks.



[jira] [Updated] (HDFS-7163) WebHdfsFileSystem should retry reads according to the configured retry policy.

2015-11-12 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-7163:
-
Attachment: HDFS-7163-branch-2.7.003.patch
HDFS-7163-branch-2.003.patch

As documented above, the unit test errors are not occurring for me in my local 
build environment.

Attaching branch-2 and branch-2.7 patches. Although I named them according to 
the naming convention documented 
[here|http://wiki.apache.org/hadoop/HowToContribute#Naming_your_patch], the 
build will still try to apply them to trunk, so the corresponding HadoopQA 
message will indicate a build failure.

[~wheat9], [~daryn], can you please take a look at this patch? Thank you.



[jira] [Updated] (HDFS-7163) WebHdfsFileSystem should retry reads according to the configured retry policy.

2015-11-06 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-7163:
-
Attachment: HDFS-7163.003.patch

Fixed the checkstyle and findbugs warnings. None of the unit tests listed above 
failed in my own build environment.



[jira] [Updated] (HDFS-7163) WebHdfsFileSystem should retry reads according to the configured retry policy.

2015-11-05 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-7163:
-
Attachment: HDFS-7163.002.patch

New patch (HDFS-7163.002.patch). Fixed unit test failures for 
{{TestWriteReadStripedFile}}. Also fixed javadoc and whitespace warnings.

The following tests did not fail for me in my build environment, so I don't 
think they are related:
{{hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes}}
{{hadoop.hdfs.server.blockmanagement.TestNodeCount}}
{{hadoop.hdfs.server.namenode.ha.TestDNFencing}}
{{hadoop.hdfs.server.namenode.TestCacheDirectives}}
{{hadoop.hdfs.TestDFSStripedOutputStreamWithFailure000}}

I will shortly post branch-2 and branch-2.7 patches.



[jira] [Updated] (HDFS-7163) WebHdfsFileSystem should retry reads according to the configured retry policy.

2015-10-28 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-7163:
-
Status: Patch Available  (was: Open)



[jira] [Updated] (HDFS-7163) WebHdfsFileSystem should retry reads according to the configured retry policy.

2015-10-28 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-7163:
-
Attachment: HDFS-7163.001.patch



[jira] [Updated] (HDFS-7163) WebHdfsFileSystem should retry reads according to the configured retry policy.

2015-10-28 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-7163:
-
Status: Open  (was: Patch Available)



[jira] [Updated] (HDFS-7163) WebHdfsFileSystem should retry reads according to the configured retry policy.

2015-10-28 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-7163:
-
Attachment: (was: HDFS-7163.001.patch)



[jira] [Updated] (HDFS-7163) WebHdfsFileSystem should retry reads according to the configured retry policy.

2015-10-28 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-7163:
-
Status: Patch Available  (was: Open)



[jira] [Updated] (HDFS-7163) WebHdfsFileSystem should retry reads according to the configured retry policy.

2015-10-28 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-7163:
-
Attachment: HDFS-7163.001.patch



[jira] [Updated] (HDFS-7163) WebHdfsFileSystem should retry reads according to the configured retry policy.

2015-10-28 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-7163:
-
Status: Open  (was: Patch Available)



[jira] [Updated] (HDFS-7163) WebHdfsFileSystem should retry reads according to the configured retry policy.

2015-10-28 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-7163:
-
Attachment: (was: HDFS-7163.001.patch)



[jira] [Updated] (HDFS-7163) WebHdfsFileSystem should retry reads according to the configured retry policy.

2015-10-27 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-7163:
-
Target Version/s: 3.0.0, 2.8.0
  Status: Patch Available  (was: Open)

[~daryn], [~wheat9], [~kihwal]. Please find attached the design and patch for 
adding read retry support to WebHdfs. I would really appreciate your input and 
feedback.



[jira] [Updated] (HDFS-7163) WebHdfsFileSystem should retry reads according to the configured retry policy.

2015-10-27 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-7163:
-
Attachment: WebHDFS Read Retry.pdf



[jira] [Updated] (HDFS-7163) WebHdfsFileSystem should retry reads according to the configured retry policy.

2015-10-27 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-7163:
-
Attachment: HDFS-7163.001.patch



[jira] [Updated] (HDFS-7163) WebHdfsFileSystem should retry reads according to the configured retry policy.

2015-10-27 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-7163:
-
Description: 
In the current implementation of WebHdfsFileSystem, opens are retried according 
to the configured retry policy, but not reads. Therefore, if a connection goes 
down while data is being read, the read will fail and the read will have to be 
retried by the client code.

Also, after a connection has been established, the next read (or seek/read) 
will fail and the read will have to be restarted by the client code.
Summary: WebHdfsFileSystem should retry reads according to the 
configured retry policy.  (was: WebHdfsFileSystem should retry reads in a 
similar way as the open)



[jira] [Commented] (HDFS-9235) hdfs-native-client build getting errors when built with cmake 2.6

2015-10-13 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955745#comment-14955745
 ] 

Eric Payne commented on HDFS-9235:
--

Thanks [~wheat9]!

> hdfs-native-client build getting errors when built with cmake 2.6
> -
>
> Key: HDFS-9235
> URL: https://issues.apache.org/jira/browse/HDFS-9235
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 3.0.0, 2.8.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: HDFS-9235.001.patch
>
>
> During the hdfs-native-client code move done as part of HDFS-9170, the cmake 
> minimum version was changed from 2.6 to 2.8. This JIRA will change the value 
> back to 2.6.





[jira] [Updated] (HDFS-7163) WebHdfsFileSystem should retry reads in a similar way as the open

2015-10-13 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-7163:
-
Summary: WebHdfsFileSystem should retry reads in a similar way as the open  
(was: port read retry logic from 0.23's WebHdfsFilesystem#WebHdfsInputStream to 
2.x)



[jira] [Updated] (HDFS-9235) hdfs-native-client build getting errors when built with cmake 2.6

2015-10-13 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-9235:
-
Status: Patch Available  (was: Open)

[~andrew.wang] and [~wheat9], referencing [the comment from 
HDFS-9170|https://issues.apache.org/jira/browse/HDFS-9170?focusedCommentId=14954188&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14954188],
 I would like to request that you provide feedback for this change.



[jira] [Work stopped] (HDFS-9235) hdfs-native-client build getting errors when built with cmake 2.6

2015-10-13 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-9235 stopped by Eric Payne.



[jira] [Work started] (HDFS-9235) hdfs-native-client build getting errors when built with cmake 2.6

2015-10-13 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-9235 started by Eric Payne.



[jira] [Updated] (HDFS-9235) hdfs-native-client build getting errors when built with cmake 2.6

2015-10-13 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-9235:
-
Affects Version/s: (was: 2.7.2)
   2.8.0



[jira] [Updated] (HDFS-9235) hdfs-native-client build getting errors when built with cmake 2.6

2015-10-13 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-9235:
-
Target Version/s: 3.0.0, 2.8.0  (was: 3.0.0, 2.7.2)



[jira] [Updated] (HDFS-9235) hdfs-native-client build getting errors when built with cmake 2.6

2015-10-13 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-9235:
-
Attachment: HDFS-9235.001.patch



[jira] [Created] (HDFS-9235) hdfs-native-client build getting errors when built with cmake 2.6

2015-10-13 Thread Eric Payne (JIRA)
Eric Payne created HDFS-9235:


 Summary: hdfs-native-client build getting errors when built with 
cmake 2.6
 Key: HDFS-9235
 URL: https://issues.apache.org/jira/browse/HDFS-9235
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 3.0.0, 2.7.2
Reporter: Eric Payne
Assignee: Eric Payne
Priority: Minor


During the hdfs-native-client code move done as part of HDFS-9170, the cmake 
minimum version was changed from 2.6 to 2.8. This JIRA will change the value 
back to 2.6.





[jira] [Commented] (HDFS-9170) Move libhdfs / fuse-dfs / libwebhdfs to hdfs-client

2015-10-10 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14951920#comment-14951920
 ] 

Eric Payne commented on HDFS-9170:
--

[~wheat9], yes, changing the value in CMakeLists.txt to 2.6 allowed me to build 
with cmake 2.6.

> Move libhdfs / fuse-dfs / libwebhdfs to hdfs-client
> ---
>
> Key: HDFS-9170
> URL: https://issues.apache.org/jira/browse/HDFS-9170
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Fix For: 2.8.0
>
> Attachments: HDFS-9170.000.patch, HDFS-9170.001.patch, 
> HDFS-9170.002.patch, HDFS-9170.003.patch, HDFS-9170.004.patch, 
> native-package-build-fails-with-cmake-2.5.log
>
>
> After HDFS-6200 the Java implementation of hdfs-client has been moved to a 
> separate hadoop-hdfs-client module.
> libhdfs, fuse-dfs and libwebhdfs still reside in the hadoop-hdfs module. 
> Ideally these modules should reside in the hadoop-hdfs-client. However, to 
> write unit tests for these components, it is often necessary to run 
> MiniDFSCluster which resides in the hadoop-hdfs module.
> This jira is to discuss how these native modules should layout after 
> HDFS-6200.





[jira] [Updated] (HDFS-9170) Move libhdfs / fuse-dfs / libwebhdfs to hdfs-client

2015-10-09 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-9170:
-
Attachment: native-package-build-fails-with-cmake-2.5.log

bq. Eric Payne, do you mind posting the error log?
[~wheat9], attaching output log.

Running with {{cmake.x86_64 2.6.4-5.el6}}

On branch trunk

Command line is {{mvn package -Pdist -DskipTests -Dtar -Pnative}}



[jira] [Commented] (HDFS-9181) Better handling of exceptions thrown during upgrade shutdown

2015-10-09 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950916#comment-14950916
 ] 

Eric Payne commented on HDFS-9181:
--

{quote}
I committed rev 004 to trunk and branch-2. Thanks Wei-Chiu for the 
contribution, and all for the review.

Just saw Wei-Chiu uploaded a new rev 005 with some refactoring. Suggest to 
create follow-up jira and do the refactoring when fixing another issue in same 
area.
{quote}
I agree with [~yzhangal]'s suggestion to make the refactoring task a separate 
JIRA. I believe that it is much easier to review changes when they are focused 
on one issue. In fact, if refactoring is to be done, I would rather it be 
done by itself than bundled with any other changes.

> Better handling of exceptions thrown during upgrade shutdown
> 
>
> Key: HDFS-9181
> URL: https://issues.apache.org/jira/browse/HDFS-9181
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Minor
>  Labels: supportability
> Fix For: 2.8.0
>
> Attachments: HDFS-9181.002.patch, HDFS-9181.003.patch, 
> HDFS-9181.004.patch, HDFS-9181.005.patch
>
>
> Previously in HDFS-7533, a bug was fixed by suppressing exceptions during 
> upgrade shutdown. It may be appropriate as a temporary fix, but it would be 
> better if the exception is handled in some other ways.
> One way to handle it is by emitting a warning message. There could exist 
> other ways to handle it. This jira is created to discuss how to handle this 
> case better.
> Thanks to [~templedf] for bringing this up.





[jira] [Commented] (HDFS-9170) Move libhdfs / fuse-dfs / libwebhdfs to hdfs-client

2015-10-09 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950580#comment-14950580
 ] 

Eric Payne commented on HDFS-9170:
--

{quote}
bq. I noticed the patch is not a simple move, it also bumped the minimum cmake 
version to 2.8. Could you explain why this was done?
I think it was a copy and paste error. I think it should work for 2.6. I'll 
open a separate jira to restore the change.
{quote}
I am sorry to report that compiling with cmake 2.6 fails on branch-2 now. I get 
the following error:
{code}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-antrun-plugin:1.7:run (make) on project 
hadoop-hdfs-native-client: An Ant BuildException has occured: exec returned: 1
[ERROR] around Ant part .. @ 5:136 in 
/hadoop/source/current/Hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/target/antrun/build-main.xml
{code}
Fortunately, upgrading to {{cmake.x86_64 2.8.12.2-4.el6}} fixed the problem.



[jira] [Commented] (HDFS-9215) Suppress the RAT warnings in hdfs-native-client module

2015-10-08 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949530#comment-14949530
 ] 

Eric Payne commented on HDFS-9215:
--

Thanks [~wheat9]. +1 (non-binding)

> Suppress the RAT warnings in hdfs-native-client module
> --
>
> Key: HDFS-9215
> URL: https://issues.apache.org/jira/browse/HDFS-9215
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>Priority: Minor
> Attachments: HDFS-9215.000.patch, HDFS-9215.001.patch
>
>
> HDFS-9170 moves the native client implementation to the hdfs-native-client 
> module. This is a follow-up jira to suppress the RAT warning that was 
> suppressed in the original hadoop-hdfs module.





[jira] [Resolved] (HDFS-9216) Fix RAT licensing issues

2015-10-08 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne resolved HDFS-9216.
--
Resolution: Duplicate

> Fix RAT licensing issues
> 
>
> Key: HDFS-9216
> URL: https://issues.apache.org/jira/browse/HDFS-9216
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: HDFS
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Minor
>
> The following files in HDFS have license issues:
> {{hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/util/tree.h}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9215) Suppress the RAT warnings in hdfs-native-client module

2015-10-08 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949514#comment-14949514
 ] 

Eric Payne commented on HDFS-9215:
--

For example, something like the following should be added to LICENSE.txt:
{noformat}
For 
hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/util/tree.h

/*  $NetBSD: tree.h,v 1.8 2004/03/28 19:38:30 provos Exp $  */
/*  $OpenBSD: tree.h,v 1.7 2002/10/17 21:51:54 art Exp $*/
/* $FreeBSD: src/sys/sys/tree.h,v 1.9.4.1 2011/09/23 00:51:37 kensmith Exp $ */

/*-
 * Copyright 2002 Niels Provos 
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
...
{noformat}

> Suppress the RAT warnings in hdfs-native-client module
> --
>
> Key: HDFS-9215
> URL: https://issues.apache.org/jira/browse/HDFS-9215
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>Priority: Minor
> Attachments: HDFS-9215.000.patch
>
>
> HDFS-9170 moves the native client implementation to the hdfs-native-client 
> module. This is a follow-up jira to suppress the RAT warning that was 
> suppressed in the original hadoop-hdfs module.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9215) Suppress the RAT warnings in hdfs-native-client module

2015-10-08 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949511#comment-14949511
 ] 

Eric Payne commented on HDFS-9215:
--

[~wheat9] and [~andrew.wang], I'm pretty sure we also have to change 
LICENSE.txt.

> Suppress the RAT warnings in hdfs-native-client module
> --
>
> Key: HDFS-9215
> URL: https://issues.apache.org/jira/browse/HDFS-9215
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>Priority: Minor
> Attachments: HDFS-9215.000.patch
>
>
> HDFS-9170 moves the native client implementation to the hdfs-native-client 
> module. This is a follow-up jira to suppress the RAT warning that was 
> suppressed in the original hadoop-hdfs module.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9216) Fix RAT licensing issues

2015-10-08 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-9216:
-
Description: 
The following files in HDFS have license issues:
{{hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/util/tree.h}}

  was:
The following files in HDFS have license issues:
{{hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/util/tree.h}}
{{hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileStatusWithECPolicy.java}}


> Fix RAT licensing issues
> 
>
> Key: HDFS-9216
> URL: https://issues.apache.org/jira/browse/HDFS-9216
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: HDFS
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Minor
>
> The following files in HDFS have license issues:
> {{hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/util/tree.h}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9216) Fix RAT licensing issues

2015-10-08 Thread Eric Payne (JIRA)
Eric Payne created HDFS-9216:


 Summary: Fix RAT licensing issues
 Key: HDFS-9216
 URL: https://issues.apache.org/jira/browse/HDFS-9216
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: HDFS
Reporter: Eric Payne
Assignee: Eric Payne
Priority: Minor


The following files in HDFS have license issues:
{{hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/util/tree.h}}
{{hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileStatusWithECPolicy.java}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9211) Fix incorrect version in hadoop-hdfs-native-client/pom.xml from HDFS-9170 branch-2 backport

2015-10-08 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949238#comment-14949238
 ] 

Eric Payne commented on HDFS-9211:
--

bq. Committed to just branch-2,
Thanks, [~andrew.wang]!

bq. Any interest in fixing up the RAT issue too? It looks related to the 
hdfs-native-client refactor.
Sure. Is this the license issue with 
{{hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/util/tree.h}}?
 Should I open a new JIRA? A quick search through the JIRA pool didn't yield 
any pertinent open JIRAs, but I may have missed it.

> Fix incorrect version in hadoop-hdfs-native-client/pom.xml from HDFS-9170 
> branch-2 backport
> ---
>
> Key: HDFS-9211
> URL: https://issues.apache.org/jira/browse/HDFS-9211
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build
>Affects Versions: 2.8.0
>Reporter: Eric Payne
>Assignee: Eric Payne
> Fix For: 2.8.0
>
> Attachments: HDFS-9211-branch-2.001.patch
>
>
> When HDFS-9170 was backported to branch-2, the version in 
> hadoop-hdfs-project/hadoop-hdfs-native-client/pom.xml was left at 
> 3.0.0-SNAPSHOT, breaking the branch-2 build.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9211) branch-2 build broken by incorrect version in hadoop-hdfs-native-client/pom.xml

2015-10-07 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-9211:
-
Attachment: HDFS-9211-branch-2.001.patch

The error is as follows:
{noformat}
The project org.apache.hadoop:hadoop-hdfs-native-client:3.0.0-SNAPSHOT 
(.../hadoop-hdfs-project/hadoop-hdfs-native-client/pom.xml) has 1 error
{noformat}

> branch-2 build broken by incorrect version in 
> hadoop-hdfs-native-client/pom.xml 
> 
>
> Key: HDFS-9211
> URL: https://issues.apache.org/jira/browse/HDFS-9211
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: HDFS-9211-branch-2.001.patch
>
>
> When HDFS-9170 was backported to branch-2, the version in 
> hadoop-hdfs-project/hadoop-hdfs-native-client/pom.xml was left at 
> 3.0.0-SNAPSHOT, breaking the branch-2 build.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9211) branch-2 build broken by incorrect version in hadoop-hdfs-native-client/pom.xml

2015-10-07 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-9211:
-
Status: Patch Available  (was: Open)

> branch-2 build broken by incorrect version in 
> hadoop-hdfs-native-client/pom.xml 
> 
>
> Key: HDFS-9211
> URL: https://issues.apache.org/jira/browse/HDFS-9211
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: HDFS-9211-branch-2.001.patch
>
>
> When HDFS-9170 was backported to branch-2, the version in 
> hadoop-hdfs-project/hadoop-hdfs-native-client/pom.xml was left at 
> 3.0.0-SNAPSHOT, breaking the branch-2 build.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9170) Move libhdfs / fuse-dfs / libwebhdfs to hdfs-client

2015-10-07 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947690#comment-14947690
 ] 

Eric Payne commented on HDFS-9170:
--

[~wheat9], the backport of this patch to branch-2 broke the build due to the 
version in hadoop-hdfs-project/hadoop-hdfs-native-client/pom.xml. Please see 
HDFS-9211.

> Move libhdfs / fuse-dfs / libwebhdfs to hdfs-client
> ---
>
> Key: HDFS-9170
> URL: https://issues.apache.org/jira/browse/HDFS-9170
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Fix For: 2.8.0
>
> Attachments: HDFS-9170.000.patch, HDFS-9170.001.patch, 
> HDFS-9170.002.patch, HDFS-9170.003.patch, HDFS-9170.004.patch
>
>
> After HDFS-6200 the Java implementation of hdfs-client has been moved to a 
> separate hadoop-hdfs-client module.
> libhdfs, fuse-dfs and libwebhdfs still reside in the hadoop-hdfs module. 
> Ideally these modules should reside in the hadoop-hdfs-client. However, to 
> write unit tests for these components, it is often necessary to run 
> MiniDFSCluster which resides in the hadoop-hdfs module.
> This jira is to discuss how these native modules should be laid out after 
> HDFS-6200.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-9211) branch-2 build broken by incorrect version in hadoop-hdfs-native-client/pom.xml

2015-10-07 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne reassigned HDFS-9211:


Assignee: Eric Payne

> branch-2 build broken by incorrect version in 
> hadoop-hdfs-native-client/pom.xml 
> 
>
> Key: HDFS-9211
> URL: https://issues.apache.org/jira/browse/HDFS-9211
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Payne
>Assignee: Eric Payne
>
> When HDFS-9170 was backported to branch-2, the version in 
> hadoop-hdfs-project/hadoop-hdfs-native-client/pom.xml was left at 
> 3.0.0-SNAPSHOT, breaking the branch-2 build.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9211) branch-2 build broken by incorrect version in hadoop-hdfs-native-client/pom.xml

2015-10-07 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-9211:
-
Summary: branch-2 build broken by incorrect version in 
hadoop-hdfs-native-client/pom.xml   (was: branch-2 broken by incorrect version 
in hadoop-hdfs-native-client/pom.xml )

> branch-2 build broken by incorrect version in 
> hadoop-hdfs-native-client/pom.xml 
> 
>
> Key: HDFS-9211
> URL: https://issues.apache.org/jira/browse/HDFS-9211
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Payne
>
> When HDFS-9170 was backported to branch-2, the version in 
> hadoop-hdfs-project/hadoop-hdfs-native-client/pom.xml was left at 
> 3.0.0-SNAPSHOT, breaking the branch-2 build.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9211) branch-2 broken by incorrect version in hadoop-hdfs-native-client/pom.xml

2015-10-07 Thread Eric Payne (JIRA)
Eric Payne created HDFS-9211:


 Summary: branch-2 broken by incorrect version in 
hadoop-hdfs-native-client/pom.xml 
 Key: HDFS-9211
 URL: https://issues.apache.org/jira/browse/HDFS-9211
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Eric Payne


When HDFS-9170 was backported to branch-2, the version in 
hadoop-hdfs-project/hadoop-hdfs-native-client/pom.xml was left at 
3.0.0-SNAPSHOT, breaking the branch-2 build.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9181) Better handling of exceptions thrown during upgrade shutdown

2015-10-07 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947318#comment-14947318
 ] 

Eric Payne commented on HDFS-9181:
--

[~jojochuang], Thanks for the patch.
+1 (non-binding)
LGTM

> Better handling of exceptions thrown during upgrade shutdown
> 
>
> Key: HDFS-9181
> URL: https://issues.apache.org/jira/browse/HDFS-9181
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Minor
> Attachments: HDFS-9181.002.patch, HDFS-9181.003.patch
>
>
> Previously in HDFS-7533, a bug was fixed by suppressing exceptions during 
> upgrade shutdown. It may be appropriate as a temporary fix, but it would be 
> better if the exception is handled in some other ways.
> One way to handle it is by emitting a warning message. There could exist 
> other ways to handle it. This jira is created to discuss how to handle this 
> case better.
> Thanks to [~templedf] for bringing this up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9181) Better handling of exceptions thrown during upgrade shutdown

2015-10-05 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14943905#comment-14943905
 ] 

Eric Payne commented on HDFS-9181:
--

{quote}
I think you can get what you want by just changing the catch from Throwable to 
Exception. An uncaught Error will generally bring the whole thing down anyway, 
so not catching Errors won't impact the hanging behavior. (And if it does, we 
have bigger issues.)
{quote}
Thanks, [~templedf]. I'm fine with catching Exceptions and not Throwables. I 
would be interested in what [~kihwal] thinks.

> Better handling of exceptions thrown during upgrade shutdown
> 
>
> Key: HDFS-9181
> URL: https://issues.apache.org/jira/browse/HDFS-9181
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Minor
>
> Previously in HDFS-7533, a bug was fixed by suppressing exceptions during 
> upgrade shutdown. It may be appropriate as a temporary fix, but it would be 
> better if the exception is handled in some other ways.
> One way to handle it is by emitting a warning message. There could exist 
> other ways to handle it. This jira is created to discuss how to handle this 
> case better.
> Thanks to [~templedf] for bringing this up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9181) Better handling of exceptions thrown during upgrade shutdown

2015-10-03 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14942421#comment-14942421
 ] 

Eric Payne commented on HDFS-9181:
--

Thanks [~jojochuang] and [~templedf] for bringing this to my attention.

I am fine with putting out a warning in the case of an exception or throwable 
during shutdown. I want to emphasize that whatever resolution is decided upon 
for this JIRA, the functionality of allowing the shutdown thread to finish 
without hanging should be maintained.

[~jojochuang] stated in [this comment of 
HDFS-7533|https://issues.apache.org/jira/browse/HDFS-7533?focusedCommentId=14936261&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14936261],
 "It looks to me that catching Throwable and ignore it may cause other issues, 
for example, ignoring a OOME can be pretty bad."

Meanwhile, [~kihwal] pointed out in [this comment of 
HDFS-7533|https://issues.apache.org/jira/browse/HDFS-7533?focusedCommentId=14271049&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14271049]
 that "We can check whether a responder is running, but it may be in the 
process of shutting down. Therefore, a proper check requires additional 
locking."

In my opinion, adding the logic for extra locking and exception handling is 
more complicated than is warranted in the datanode shutdown thread, and the 
locking would introduce its own risk and complexity. I do agree 
that ignoring an OOM may cause a total datanode crash. However, that would be 
more desirable in this case than a datanode hang, which was happening before 
HDFS-7533. Will ignoring other throwables also cause the datanode to hang?

> Better handling of exceptions thrown during upgrade shutdown
> 
>
> Key: HDFS-9181
> URL: https://issues.apache.org/jira/browse/HDFS-9181
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Minor
>
> Previously in HDFS-7533, a bug was fixed by suppressing exceptions during 
> upgrade shutdown. It may be appropriate as a temporary fix, but it would be 
> better if the exception is handled in some other ways.
> One way to handle it is by emitting a warning message. There could exist 
> other ways to handle it. This jira is created to discuss how to handle this 
> case better.
> Thanks to [~templedf] for bringing this up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7945) The WebHdfs system on DN does not honor the length parameter

2015-03-18 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367415#comment-14367415
 ] 

Eric Payne commented on HDFS-7945:
--

Thanks for implementing the fix for this issue, [~wheat9]

+1, Looks Good

> The WebHdfs system on DN does not honor the length parameter
> 
>
> Key: HDFS-7945
> URL: https://issues.apache.org/jira/browse/HDFS-7945
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>Priority: Blocker
> Attachments: HDFS-7945.000.patch
>
>
> HDFS-7279 introduces a new WebHdfs server on the DN. The new server does not 
> honor the length parameter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7818) OffsetParam should return the default value instead of throwing NPE when the value is unspecified

2015-03-06 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14351118#comment-14351118
 ] 

Eric Payne commented on HDFS-7818:
--

Thank you [~wheat9]

> OffsetParam should return the default value instead of throwing NPE when the 
> value is unspecified
> -
>
> Key: HDFS-7818
> URL: https://issues.apache.org/jira/browse/HDFS-7818
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 2.7.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Blocker
> Fix For: 2.7.0
>
> Attachments: HDFS-7818.v1.txt, HDFS-7818.v2.txt, HDFS-7818.v3.txt, 
> HDFS-7818.v4.txt, HDFS-7818.v5.txt
>
>
> This is a regression in 2.7 and later.
> {{hadoop fs -cat}} over webhdfs works, but {{hadoop fs -text}} does not:
> {code}
> $ hadoop fs -cat webhdfs://myhost.com/tmp/test.1
> ... output ...
> $ hadoop fs -text webhdfs://myhost.com/tmp/test.1
> text: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> null
>   at 
> org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:165)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:358)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:91)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:615)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:463)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:492)
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7818) DataNode throws NPE if the WebHdfs URL does not contain the offset parameter

2015-03-06 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-7818:
-
Attachment: HDFS-7818.v5.txt

Fixing findbugs warning and updating patch to v5.

> DataNode throws NPE if the WebHdfs URL does not contain the offset parameter
> 
>
> Key: HDFS-7818
> URL: https://issues.apache.org/jira/browse/HDFS-7818
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 2.7.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Blocker
> Attachments: HDFS-7818.v1.txt, HDFS-7818.v2.txt, HDFS-7818.v3.txt, 
> HDFS-7818.v4.txt, HDFS-7818.v5.txt
>
>
> This is a regression in 2.7 and later.
> {{hadoop fs -cat}} over webhdfs works, but {{hadoop fs -text}} does not:
> {code}
> $ hadoop fs -cat webhdfs://myhost.com/tmp/test.1
> ... output ...
> $ hadoop fs -text webhdfs://myhost.com/tmp/test.1
> text: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> null
>   at 
> org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:165)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:358)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:91)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:615)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:463)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:492)
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7818) DataNode throws NPE if the WebHdfs URL does not contain the offset parameter

2015-03-06 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-7818:
-
Priority: Blocker  (was: Critical)
Target Version/s: 2.7.0

Marking as a blocker since this is a very common scenario when using webHDFS, 
and it hits the NPE every time. The only workaround is to use HDFS instead of 
the webHDFS interface, but that is not always an option when reading cross-colo 
or off grid.

> DataNode throws NPE if the WebHdfs URL does not contain the offset parameter
> 
>
> Key: HDFS-7818
> URL: https://issues.apache.org/jira/browse/HDFS-7818
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 2.7.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Blocker
> Attachments: HDFS-7818.v1.txt, HDFS-7818.v2.txt, HDFS-7818.v3.txt, 
> HDFS-7818.v4.txt
>
>
> This is a regression in 2.7 and later.
> {{hadoop fs -cat}} over webhdfs works, but {{hadoop fs -text}} does not:
> {code}
> $ hadoop fs -cat webhdfs://myhost.com/tmp/test.1
> ... output ...
> $ hadoop fs -text webhdfs://myhost.com/tmp/test.1
> text: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> null
>   at 
> org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:165)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:358)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:91)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:615)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:463)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:492)
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7818) DataNode throws NPE if the WebHdfs URL does not contain the offset parameter

2015-03-05 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-7818:
-
Attachment: HDFS-7818.v4.txt

[~wheat9], thank you for helping out with this issue, for your reviews, and for 
your helpful comments.

I have updated the patch with your suggestions for a new method called 
{{getOffset}}. Will you please take a look and let me know if it meets your 
approval?

> DataNode throws NPE if the WebHdfs URL does not contain the offset parameter
> 
>
> Key: HDFS-7818
> URL: https://issues.apache.org/jira/browse/HDFS-7818
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 2.7.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Critical
> Attachments: HDFS-7818.v1.txt, HDFS-7818.v2.txt, HDFS-7818.v3.txt, 
> HDFS-7818.v4.txt
>
>
> This is a regression in 2.7 and later.
> {{hadoop fs -cat}} over webhdfs works, but {{hadoop fs -text}} does not:
> {code}
> $ hadoop fs -cat webhdfs://myhost.com/tmp/test.1
> ... output ...
> $ hadoop fs -text webhdfs://myhost.com/tmp/test.1
> text: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> null
>   at 
> org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:165)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:358)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:91)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:615)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:463)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:492)
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7818) DataNode throws NPE if the WebHdfs URL does not contain the offset parameter

2015-03-05 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-7818:
-
Priority: Critical  (was: Major)

> DataNode throws NPE if the WebHdfs URL does not contain the offset parameter
> 
>
> Key: HDFS-7818
> URL: https://issues.apache.org/jira/browse/HDFS-7818
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 2.7.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Critical
> Attachments: HDFS-7818.v1.txt, HDFS-7818.v2.txt, HDFS-7818.v3.txt
>
>
> This is a regression in 2.7 and later.
> {{hadoop fs -cat}} over webhdfs works, but {{hadoop fs -text}} does not:
> {code}
> $ hadoop fs -cat webhdfs://myhost.com/tmp/test.1
> ... output ...
> $ hadoop fs -text webhdfs://myhost.com/tmp/test.1
> text: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> null
>   at 
> org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:165)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:358)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:91)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:615)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:463)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:492)
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7818) DataNode throws NPE if the WebHdfs URL does not contain the offset parameter

2015-03-03 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345941#comment-14345941
 ] 

Eric Payne commented on HDFS-7818:
--

Thanks for the quick reply, [~wheat9]
{quote}
bq. Does that mean you would give the patch a plus 1
No. Just to quote myself:
{quote}
Ah well. It was worth a try :-).

Okay, I will implement {{getOffset()}} in {{OffsetParam}} in order to follow 
the existing pattern.
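
For reference, a minimal sketch of that getter-with-default pattern 
(illustrative names and body, not the exact Hadoop source):
{code}
// Illustrative sketch of the approach; not the exact Hadoop source.
class OffsetParamSketch {
  private final Long value;

  OffsetParamSketch(Long value) {
    this.value = value;  // may be null when the URL omits the offset parameter
  }

  // Callers use getOffset() instead of the raw getValue(), so a missing
  // parameter yields 0 rather than a null that NPEs on unboxing to long.
  long getOffset() {
    return value != null ? value : 0L;
  }
}
{code}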

> DataNode throws NPE if the WebHdfs URL does not contain the offset parameter
> 
>
> Key: HDFS-7818
> URL: https://issues.apache.org/jira/browse/HDFS-7818
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 2.7.0
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: HDFS-7818.v1.txt, HDFS-7818.v2.txt, HDFS-7818.v3.txt
>
>
> This is a regression in 2.7 and later.
> {{hadoop fs -cat}} over webhdfs works, but {{hadoop fs -text}} does not:
> {code}
> $ hadoop fs -cat webhdfs://myhost.com/tmp/test.1
> ... output ...
> $ hadoop fs -text webhdfs://myhost.com/tmp/test.1
> text: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> null
>   at 
> org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:165)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:358)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:91)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:615)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:463)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:492)
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7818) DataNode throws NPE if the WebHdfs URL does not contain the offset parameter

2015-03-02 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343924#comment-14343924
 ] 

Eric Payne commented on HDFS-7818:
--

[~wheat9], thank you for your review and thoughtful comments.
{quote}
bq. That means the ctor should sub in the default when no param string is given 
(Eric's patch) so getValue always returns a non-null value
I'm fine with that...
{quote}
Does that mean you would give the patch a plus 1? :-)

> DataNode throws NPE if the WebHdfs URL does not contain the offset parameter
> 
>
> Key: HDFS-7818
> URL: https://issues.apache.org/jira/browse/HDFS-7818
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 2.7.0
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: HDFS-7818.v1.txt, HDFS-7818.v2.txt, HDFS-7818.v3.txt
>
>
> This is a regression in 2.7 and later.
> {{hadoop fs -cat}} over webhdfs works, but {{hadoop fs -text}} does not:
> {code}
> $ hadoop fs -cat webhdfs://myhost.com/tmp/test.1
> ... output ...
> $ hadoop fs -text webhdfs://myhost.com/tmp/test.1
> text: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> null
>   at 
> org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:165)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:358)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:91)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:615)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:463)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:492)
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7818) DataNode throws NPE if the WebHdfs URL does not contain the offset parameter

2015-02-28 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341807#comment-14341807
 ] 

Eric Payne commented on HDFS-7818:
--

Hi [~wheat9]. Did you have a chance to think about my response?

> DataNode throws NPE if the WebHdfs URL does not contain the offset parameter
> 
>
> Key: HDFS-7818
> URL: https://issues.apache.org/jira/browse/HDFS-7818
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 2.7.0
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: HDFS-7818.v1.txt, HDFS-7818.v2.txt, HDFS-7818.v3.txt
>
>
> This is a regression in 2.7 and later.
> {{hadoop fs -cat}} over webhdfs works, but {{hadoop fs -text}} does not:
> {code}
> $ hadoop fs -cat webhdfs://myhost.com/tmp/test.1
> ... output ...
> $ hadoop fs -text webhdfs://myhost.com/tmp/test.1
> text: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> null
>   at 
> org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:165)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:358)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:91)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:615)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:463)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:492)
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7818) DataNode throws NPE if the WebHdfs URL does not contain the offset parameter

2015-02-24 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335483#comment-14335483
 ] 

Eric Payne commented on HDFS-7818:
--

Now that I look at it, the patch in HDFS-7818.v3.txt is not exactly correct 
either. I think that if we want to keep the NULL check in a constructor, it 
should be done in {{OffsetParam(final Long value)}} instead of 
{{OffsetParam(final String str)}}, since the latter invokes the former.
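
To make the chaining concrete, a short sketch (illustrative, not the exact 
source): because the String constructor delegates to the Long one, a default 
substituted in the Long constructor covers both entry points.
{code}
// Illustrative constructor chain; not the exact Hadoop source.
class OffsetParamCtorSketch {
  private final Long value;

  // The String constructor parses and delegates to the Long constructor,
  // so a null check placed only here would not cover direct Long construction.
  OffsetParamCtorSketch(String str) {
    this(str == null || str.isEmpty() ? null : Long.valueOf(str));
  }

  // Substituting the default here covers both constructors.
  OffsetParamCtorSketch(Long value) {
    this.value = (value == null) ? Long.valueOf(0) : value;
  }
}
{code}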

> DataNode throws NPE if the WebHdfs URL does not contain the offset parameter
> 
>
> Key: HDFS-7818
> URL: https://issues.apache.org/jira/browse/HDFS-7818
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 2.7.0
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: HDFS-7818.v1.txt, HDFS-7818.v2.txt, HDFS-7818.v3.txt
>
>
> This is a regression in 2.7 and later.
> {{hadoop fs -cat}} over webhdfs works, but {{hadoop fs -text}} does not:
> {code}
> $ hadoop fs -cat webhdfs://myhost.com/tmp/test.1
> ... output ...
> $ hadoop fs -text webhdfs://myhost.com/tmp/test.1
> text: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> null
>   at 
> org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:165)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:358)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:91)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:615)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:463)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:492)
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7818) DataNode throws NPE if the WebHdfs URL does not contain the offset parameter

2015-02-24 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335438#comment-14335438
 ] 

Eric Payne commented on HDFS-7818:
--

Thank you for your review, [~wheat9]
bq. Maybe it might make more sense to introduce a new method {{getOffset()}} in 
{{OffsetParam}}.
If a {{getOffset()}} method is created instead of handling the NULL case in the 
constructor as is done in the HDFS-7818.V3.txt patch, won't I also have to 
change all of the {{offset.getValue()}} calls to {{offset.getOffset()}} in the 
{{NamenodeWebHdfsMethods}} class?

The change in the current patch seems less risky because it catches the NULL 
case during construction of the object and requires fewer code changes.

> DataNode throws NPE if the WebHdfs URL does not contain the offset parameter
> 
>
> Key: HDFS-7818
> URL: https://issues.apache.org/jira/browse/HDFS-7818
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 2.7.0
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: HDFS-7818.v1.txt, HDFS-7818.v2.txt, HDFS-7818.v3.txt
>
>
> This is a regression in 2.7 and later.
> {{hadoop fs -cat}} over webhdfs works, but {{hadoop fs -text}} does not:
> {code}
> $ hadoop fs -cat webhdfs://myhost.com/tmp/test.1
> ... output ...
> $ hadoop fs -text webhdfs://myhost.com/tmp/test.1
> text: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> null
>   at 
> org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:165)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:358)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:91)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:615)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:463)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:492)
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7818) FsShell -text over webhdfs fails with NPE in DN

2015-02-23 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-7818:
-
Attachment: HDFS-7818.v3.txt

[~wheat9]: Thank you very much for reviewing the patch and for your comments.

I have made the changes you suggested in the OffsetParam constructor. Please 
review.

> FsShell -text over webhdfs fails with NPE in DN
> ---
>
> Key: HDFS-7818
> URL: https://issues.apache.org/jira/browse/HDFS-7818
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 2.7.0
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: HDFS-7818.v1.txt, HDFS-7818.v2.txt, HDFS-7818.v3.txt
>
>
> This is a regression in 2.7 and later.
> {{hadoop fs -cat}} over webhdfs works, but {{hadoop fs -text}} does not:
> {code}
> $ hadoop fs -cat webhdfs://myhost.com/tmp/test.1
> ... output ...
> $ hadoop fs -text webhdfs://myhost.com/tmp/test.1
> text: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> null
>   at 
> org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:165)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:358)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:91)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:615)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:463)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:492)
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7818) FsShell -text over webhdfs fails with NPE in DN

2015-02-21 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-7818:
-
Attachment: HDFS-7818.v2.txt

The test case of the first patch wasn't catching IOException. HDFS-7818.v2.txt 
fixes that.

> FsShell -text over webhdfs fails with NPE in DN
> ---
>
> Key: HDFS-7818
> URL: https://issues.apache.org/jira/browse/HDFS-7818
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 2.7.0
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: HDFS-7818.v1.txt, HDFS-7818.v2.txt
>
>
> This is a regression in 2.7 and later.
> {{hadoop fs -cat}} over webhdfs works, but {{hadoop fs -text}} does not:
> {code}
> $ hadoop fs -cat webhdfs://myhost.com/tmp/test.1
> ... output ...
> $ hadoop fs -text webhdfs://myhost.com/tmp/test.1
> text: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> null
>   at 
> org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:165)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:358)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:91)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:615)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:463)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:492)
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7818) FsShell -text over webhdfs fails with NPE in DN

2015-02-21 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-7818:
-
Status: Patch Available  (was: Open)

> FsShell -text over webhdfs fails with NPE in DN
> ---
>
> Key: HDFS-7818
> URL: https://issues.apache.org/jira/browse/HDFS-7818
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 2.7.0
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: HDFS-7818.v1.txt
>
>
> This is a regression in 2.7 and later.
> {{hadoop fs -cat}} over webhdfs works, but {{hadoop fs -text}} does not:
> {code}
> $ hadoop fs -cat webhdfs://myhost.com/tmp/test.1
> ... output ...
> $ hadoop fs -text webhdfs://myhost.com/tmp/test.1
> text: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> null
>   at 
> org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:165)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:358)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:91)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:615)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:463)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:492)
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7818) FsShell -text over webhdfs fails with NPE in DN

2015-02-21 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HDFS-7818:
-
Attachment: HDFS-7818.v1.txt

This patch (HDFS-7818.v1.txt) contains a test to ensure that 
ParameterParser#offset defaults to 0 when not present. I feel that there should 
be additional tests to ensure that the other parameters are handled properly 
when not present. For example, buffersize, blocksize, replication, etc.
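
For illustration, a default-value test of that shape might look like the 
following sketch (class and method names are assumed, reusing the 
{{OffsetParamSketch}} shown earlier; this is not the patch contents):
{code}
import static org.junit.Assert.assertEquals;
import org.junit.Test;

// Sketch of a default-value test; names are assumed, not from the patch.
public class TestOffsetDefaultSketch {
  @Test
  public void offsetDefaultsToZeroWhenAbsent() {
    // Simulate a request where no offset parameter was supplied.
    OffsetParamSketch offset = new OffsetParamSketch(null);
    assertEquals(0L, offset.getOffset());
  }
}
{code}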

> FsShell -text over webhdfs fails with NPE in DN
> ---
>
> Key: HDFS-7818
> URL: https://issues.apache.org/jira/browse/HDFS-7818
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 2.7.0
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: HDFS-7818.v1.txt
>
>
> This is a regression in 2.7 and later.
> {{hadoop fs -cat}} over webhdfs works, but {{hadoop fs -text}} does not:
> {code}
> $ hadoop fs -cat webhdfs://myhost.com/tmp/test.1
> ... output ...
> $ hadoop fs -text webhdfs://myhost.com/tmp/test.1
> text: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> null
>   at 
> org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:165)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:358)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:91)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:615)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:463)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:492)
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7818) FsShell -text over webhdfs fails with NPE in DN

2015-02-21 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14330391#comment-14330391
 ] 

Eric Payne commented on HDFS-7818:
--

[~brahmareddy] Thanks for your interest in this issue.

bq. can you describe more about the scenario..like does path gzip contain any 
special character like "+"..? 

There's nothing special about the filename. {{hadoop fs -text file}} doesn't 
work with any file, even one that is not gzipped. If a file is plain text, it 
should work the same as with {{-cat}}.

The problem is in the webhdfs {{ParameterParser}} code. If an offset is not 
passed, it should assume offset = 0 rather than allow null.
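
For illustration, the failure mode is plain auto-unboxing (a sketch, not the 
actual {{ParameterParser}} source): returning a null {{Long}} through a method 
with a primitive {{long}} return type throws the NPE at the return statement.
{code}
import java.util.HashMap;
import java.util.Map;

// Sketch of the unboxing NPE; illustrative only.
class ParameterParserSketch {
  private final Map<String, Long> params = new HashMap<>();

  // If "offset" was never supplied, params.get(...) returns null and the
  // implicit Long -> long unboxing throws NullPointerException here.
  long offset() {
    return params.get("offset");
  }

  public static void main(String[] args) {
    new ParameterParserSketch().offset();  // NPE: no offset present
  }
}
{code}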

> FsShell -text over webhdfs fails with NPE in DN
> ---
>
> Key: HDFS-7818
> URL: https://issues.apache.org/jira/browse/HDFS-7818
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 2.7.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>
> This is a regression in 2.7 and later.
> {{hadoop fs -cat}} over webhdfs works, but {{hadoop fs -text}} does not:
> {code}
> $ hadoop fs -cat webhdfs://myhost.com/tmp/test.1
> ... output ...
> $ hadoop fs -text webhdfs://myhost.com/tmp/test.1
> text: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> null
>   at 
> org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:165)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:358)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:91)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:615)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:463)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:492)
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7818) FsShell -text over webhdfs fails with NPE in DN

2015-02-21 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14330354#comment-14330354
 ] 

Eric Payne commented on HDFS-7818:
--

The stacktrace from the DN log is:
{noformat}
2015-02-21 17:19:19,901 [nioEventLoopGroup-3-2] WARN webhdfs.WebHdfsHandler: 
INTERNAL_SERVER_ERROR
java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.server.datanode.web.webhdfs.ParameterParser.offset(ParameterParser.java:76)
at 
org.apache.hadoop.hdfs.server.datanode.web.webhdfs.WebHdfsHandler.onOpen(WebHdfsHandler.java:190)
at 
org.apache.hadoop.hdfs.server.datanode.web.webhdfs.WebHdfsHandler.handle(WebHdfsHandler.java:129)
at 
org.apache.hadoop.hdfs.server.datanode.web.webhdfs.WebHdfsHandler$1.run(WebHdfsHandler.java:111)
at 
org.apache.hadoop.hdfs.server.datanode.web.webhdfs.WebHdfsHandler$1.run(WebHdfsHandler.java:108)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1680)
at 
org.apache.hadoop.hdfs.server.datanode.web.webhdfs.WebHdfsHandler.channelRead0(WebHdfsHandler.java:108)
at 
org.apache.hadoop.hdfs.server.datanode.web.URLDispatcher.channelRead0(URLDispatcher.java:52)
at 
org.apache.hadoop.hdfs.server.datanode.web.URLDispatcher.channelRead0(URLDispatcher.java:32)
at 
io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at 
io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at 
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:163)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
at 
io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
at java.lang.Thread.run(Thread.java:745)
{noformat}

> FsShell -text over webhdfs fails with NPE in DN
> ---
>
> Key: HDFS-7818
> URL: https://issues.apache.org/jira/browse/HDFS-7818
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 2.7.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>
> This is a regression in 2.7 and later.
> {{hadoop fs -cat}} over webhdfs works, but {{hadoop fs -text}} does not:
> {code}
> $ hadoop fs -cat webhdfs://myhost.com/tmp/test.1
> ... output ...
> $ hadoop fs -text webhdfs://myhost.com/tmp/test.1
> text: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> null
>   at 
> org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:165)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:358)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:91)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:615)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:463)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:492)
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

