[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL

2014-05-27 Thread Hangjun Ye (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010841#comment-14010841
 ] 

Hangjun Ye commented on HDFS-6382:
--

Thanks Haohui for your reply.

Let me confirm I got your point: your suggestion is that we should have a 
general mechanism/framework to run a job (maybe periodically) over the 
namespace inside the NN, and the TTL policy would just be a specific job that 
might be implemented by the user?
That's an interesting direction; we will think about it.

We are heavy users of Hadoop and also make some in-house improvements per our 
business requirements. We definitely want to contribute those improvements back 
to the community, as long as they are helpful.

> HDFS File/Directory TTL
> ---
>
> Key: HDFS-6382
> URL: https://issues.apache.org/jira/browse/HDFS-6382
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, namenode
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>
> In production environments we often have a scenario like this: we want to 
> back up files on HDFS for some time and then delete them automatically. For 
> example, we keep only 1 day's logs on local disk due to limited disk space, 
> but we need to keep about 1 month's logs in order to debug program bugs, so 
> we keep all the logs on HDFS and delete logs that are older than 1 month. 
> This is a typical scenario for HDFS TTL, so here we propose that HDFS 
> support TTL.
> Following are some details of this proposal:
> 1. HDFS can support TTL on a specified file or directory
> 2. If a TTL is set on a file, the file will be deleted automatically after 
> the TTL expires
> 3. If a TTL is set on a directory, the child files and directories will be 
> deleted automatically after the TTL expires
> 4. A child file/directory's TTL configuration should override its parent 
> directory's
> 5. A global configuration is needed to control whether deleted 
> files/directories go to the trash or not
> 6. A global configuration is needed to control whether a directory with a 
> TTL is deleted when it is emptied by the TTL mechanism.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6453) use Time#monotonicNow to avoid system clock reset

2014-05-27 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HDFS-6453:


Status: Patch Available  (was: Open)

> use Time#monotonicNow to avoid system clock reset
> -
>
> Key: HDFS-6453
> URL: https://issues.apache.org/jira/browse/HDFS-6453
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, namenode
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6453.txt
>
>
> Similar to hadoop-common, let's re-check and replace 
> System#currentTimeMillis with Time#monotonicNow in the HDFS project as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6453) use Time#monotonicNow to avoid system clock reset

2014-05-27 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HDFS-6453:


Attachment: HDFS-6453.txt

> use Time#monotonicNow to avoid system clock reset
> -
>
> Key: HDFS-6453
> URL: https://issues.apache.org/jira/browse/HDFS-6453
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, namenode
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6453.txt
>
>
> Similar to hadoop-common, let's re-check and replace 
> System#currentTimeMillis with Time#monotonicNow in the HDFS project as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6453) use Time#monotonicNow to avoid system clock reset

2014-05-27 Thread Liang Xie (JIRA)
Liang Xie created HDFS-6453:
---

 Summary: use Time#monotonicNow to avoid system clock reset
 Key: HDFS-6453
 URL: https://issues.apache.org/jira/browse/HDFS-6453
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie


Similar to hadoop-common, let's re-check and replace 
System#currentTimeMillis with Time#monotonicNow in the HDFS project as well.
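
As an editorial aside, a minimal sketch of why this substitution matters 
(hypothetical example code, not part of the attached patch): durations computed 
from System#currentTimeMillis can go wrong when the system clock is reset, 
while Time#monotonicNow only moves forward.

{code:java}
import org.apache.hadoop.util.Time;

public class ElapsedTimeExample {
    public static void main(String[] args) throws InterruptedException {
        // Wall-clock start: an NTP step or manual clock reset during the
        // measurement can make the computed duration negative or far too large.
        long wallStart = System.currentTimeMillis();
        // Monotonic start: backed by System.nanoTime(), which never goes
        // backwards, so the elapsed time is immune to clock resets.
        long monoStart = Time.monotonicNow();

        Thread.sleep(100);

        System.out.println("wall elapsed: " + (System.currentTimeMillis() - wallStart) + " ms");
        System.out.println("monotonic elapsed: " + (Time.monotonicNow() - monoStart) + " ms");
    }
}
{code}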



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-4167) Add support for restoring/rolling back to a snapshot

2014-05-27 Thread Guo Ruijing (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010822#comment-14010822
 ] 

Guo Ruijing commented on HDFS-4167:
---

I agree that it's better to keep it as a standalone HDFS metadata operation. It 
is easy to restore any snapshot if the block is copied out for append, as the 
HDFS-6087 proposal suggests.

What changes in append?

1. file f1 includes (block1, block2, block3)

2. append to f1

a) the client requests block3 information from the namenode
b) the client asks the datanode to copy block3 as block4
c) the append goes to block4
d) block4 is committed to the namenode

What happens with snapshots?

snap1: f1 includes (block1, block2, block3)
snap2: f1 includes (block1, block2, block4)

How do we restore a snapshot? Just restore snap1 as the current file, since no 
partial blocks are shared by different snapshots.


> Add support for restoring/rolling back to a snapshot
> 
>
> Key: HDFS-4167
> URL: https://issues.apache.org/jira/browse/HDFS-4167
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Suresh Srinivas
>Assignee: Jing Zhao
> Attachments: HDFS-4167.000.patch, HDFS-4167.001.patch, 
> HDFS-4167.002.patch, HDFS-4167.003.patch, HDFS-4167.004.patch
>
>
> This jira tracks work related to restoring a directory/file to a snapshot.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL

2014-05-27 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010820#comment-14010820
 ] 

Haohui Mai commented on HDFS-6382:
--

bq. TTL is a very simple (but general) policy and we might even consider it as 
an attribute of a file, like the number of replicas. It seems it wouldn't 
introduce much complexity to handle it in the NN.

bq. Another benefit of having it inside the NN is that we don't have to handle 
the authentication/authorization problem in a separate system. For example, we 
have a shared HDFS cluster for many internal users, and we don't want someone 
setting a TTL policy on someone else's files. The NN could handle this easily 
with its own authentication/authorization mechanism.

I agree that running jobs over the namespace without MR should be the direction 
to go. However, I think the main holdup here is that the design mixes the 
mechanism (running jobs over the namespace without MR) and the policy (TTL) 
together.

As [~cmccabe] pointed out earlier, every user has his / her own policy. Given 
that HDFS has a wide range of users, this type of design / implementation is 
unlikely to fly in the ecosystem.

Currently HDFS does not have the above mechanism; you're more than welcome to 
contribute a patch.

> HDFS File/Directory TTL
> ---
>
> Key: HDFS-6382
> URL: https://issues.apache.org/jira/browse/HDFS-6382
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, namenode
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>
> In production environments we often have a scenario like this: we want to 
> back up files on HDFS for some time and then delete them automatically. For 
> example, we keep only 1 day's logs on local disk due to limited disk space, 
> but we need to keep about 1 month's logs in order to debug program bugs, so 
> we keep all the logs on HDFS and delete logs that are older than 1 month. 
> This is a typical scenario for HDFS TTL, so here we propose that HDFS 
> support TTL.
> Following are some details of this proposal:
> 1. HDFS can support TTL on a specified file or directory
> 2. If a TTL is set on a file, the file will be deleted automatically after 
> the TTL expires
> 3. If a TTL is set on a directory, the child files and directories will be 
> deleted automatically after the TTL expires
> 4. A child file/directory's TTL configuration should override its parent 
> directory's
> 5. A global configuration is needed to control whether deleted 
> files/directories go to the trash or not
> 6. A global configuration is needed to control whether a directory with a 
> TTL is deleted when it is emptied by the TTL mechanism.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HDFS-6404) HttpFS should use a 000 umask for mkdir and create operations

2014-05-27 Thread Mike Yoder (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Yoder reassigned HDFS-6404:


Assignee: Mike Yoder  (was: Alejandro Abdelnur)

> HttpFS should use a 000 umask for mkdir and create operations
> -
>
> Key: HDFS-6404
> URL: https://issues.apache.org/jira/browse/HDFS-6404
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Alejandro Abdelnur
>Assignee: Mike Yoder
>
> The FileSystem created by HttpFS should use a 000 umask so as not to affect 
> the permissions set by the client, as it is the client's responsibility to 
> resolve the right permissions based on the client umask.
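
For illustration, a minimal sketch of the idea (hypothetical code, not the 
actual HttpFS patch): force a 000 umask on the server-side FileSystem via the 
standard Hadoop umask key so client-supplied permissions pass through unmasked.

{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

class ZeroUmaskExample {
    static FileSystem openWithZeroUmask(Configuration conf) throws IOException {
        // "fs.permissions.umask-mode" is the standard Hadoop umask setting;
        // forcing it to 000 stops the server from masking the permissions
        // that the client explicitly asked for on mkdir/create.
        conf.set("fs.permissions.umask-mode", "000");
        return FileSystem.get(conf);
    }
}
{code}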



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6310) PBImageXmlWriter should output information about Delegation Tokens

2014-05-27 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010757#comment-14010757
 ] 

Akira AJISAKA commented on HDFS-6310:
-

bq. If the attacker has access to the image, it's already game over whether oiv 
accurately dumps the image or not.
I agree with you.
[~wheat9], what do you think? If you agree with that, could you review the 
patch?

> PBImageXmlWriter should output information about Delegation Tokens
> --
>
> Key: HDFS-6310
> URL: https://issues.apache.org/jira/browse/HDFS-6310
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: Akira AJISAKA
>Assignee: Akira AJISAKA
> Attachments: HDFS-6310.patch
>
>
> Separated from HDFS-6293.
> The 2.4.0 pb-fsimage does contain tokens, but OfflineImageViewer with the 
> -XML option does not show any tokens.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL

2014-05-27 Thread Hangjun Ye (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010734#comment-14010734
 ] 

Hangjun Ye commented on HDFS-6382:
--

Implementing it outside the NN is definitely another option, and I agree with 
Colin that it's not feasible to implement a complex cleanup policy (e.g. one 
based on storage space) inside the NN.

TTL is a very simple (but general) policy and we might even consider it as an 
attribute of a file, like the number of replicas. It seems it wouldn't 
introduce much complexity to handle it in the NN.

Another benefit of having it inside the NN is that we don't have to handle the 
authentication/authorization problem in a separate system. For example, we have 
a shared HDFS cluster for many internal users, and we don't want someone 
setting a TTL policy on someone else's files. The NN could handle this easily 
with its own authentication/authorization mechanism.

So far a TTL-based cleanup policy is good enough for our scenario (Zesheng and 
I are from the same company and we support our company's internal Hadoop 
usage), and it would be nice to have a simple and workable solution in HDFS.

> HDFS File/Directory TTL
> ---
>
> Key: HDFS-6382
> URL: https://issues.apache.org/jira/browse/HDFS-6382
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, namenode
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>
> In production environments we often have a scenario like this: we want to 
> back up files on HDFS for some time and then delete them automatically. For 
> example, we keep only 1 day's logs on local disk due to limited disk space, 
> but we need to keep about 1 month's logs in order to debug program bugs, so 
> we keep all the logs on HDFS and delete logs that are older than 1 month. 
> This is a typical scenario for HDFS TTL, so here we propose that HDFS 
> support TTL.
> Following are some details of this proposal:
> 1. HDFS can support TTL on a specified file or directory
> 2. If a TTL is set on a file, the file will be deleted automatically after 
> the TTL expires
> 3. If a TTL is set on a directory, the child files and directories will be 
> deleted automatically after the TTL expires
> 4. A child file/directory's TTL configuration should override its parent 
> directory's
> 5. A global configuration is needed to control whether deleted 
> files/directories go to the trash or not
> 6. A global configuration is needed to control whether a directory with a 
> TTL is deleted when it is emptied by the TTL mechanism.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HDFS-6286) adding a timeout setting for local read io

2014-05-27 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie resolved HDFS-6286.
-

Resolution: Duplicate

> adding a timeout setting for local read io
> --
>
> Key: HDFS-6286
> URL: https://issues.apache.org/jira/browse/HDFS-6286
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Liang Xie
>Assignee: Liang Xie
>
> Currently, if a write or remote read is issued against a sick disk, 
> DFSClient.hdfsTimeout can guarantee the caller a bounded time to return, but 
> it doesn't work for local reads. Take an HBase scan for example:
> DFSInputStream.read -> readWithStrategy -> readBuffer -> 
> BlockReaderLocal.read ->  dataIn.read -> FileChannelImpl.read
> If it hits a bad disk, the low-level read I/O can take tens of seconds, and 
> what's worse, DFSInputStream.read holds a lock the whole time.
> To my knowledge, there's no good mechanism to cancel a running read I/O 
> (please correct me if that's wrong), so my suggestion is to wrap the read 
> request in a future with a timeout; if the threshold is reached, we could 
> add the local node to the dead-node list...
> Any thoughts?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6442) Fix TestEditLogAutoroll and TestStandbyCheckpoints failure caused by port conflicts

2014-05-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010723#comment-14010723
 ] 

Hadoop QA commented on HDFS-6442:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12647022/HDFS-6442.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6992//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6992//console

This message is automatically generated.

> Fix TestEditLogAutoroll and TestStandbyCheckpoints failure caused by port 
> conflicts
> --
>
> Key: HDFS-6442
> URL: https://issues.apache.org/jira/browse/HDFS-6442
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>Priority: Minor
> Attachments: HDFS-6442.1.patch, HDFS-6442.patch
>
>
> TestEditLogAutoroll and TestStandbyCheckpoints both use ports 10061 and 
> 10062 to set up the mini-cluster; this may occasionally cause test failures 
> when running the tests with -Pparallel-tests. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6286) adding a timeout setting for local read io

2014-05-27 Thread Liang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010718#comment-14010718
 ] 

Liang Xie commented on HDFS-6286:
-

bq. There is a high overhead to adding communication between threads to every 
read, and I don't think we want this in short-circuit reads (which is an 
optimization, after all)
Indeed, I am fine with my prototype not being in the community codebase; it was 
just a friendly heads-up about this corner case :) It doesn't help regular 
request performance, only the long-tail requests.
bq. If we create an extra thread per DFSInputStream using SCR
I used a thread pool, so the overhead should be acceptable, and once the 
timeout/execution exception is caught, the upper layer treats that DN as a dead 
node immediately, so per my understanding no pool stall should be observed.

bq. I am going to create a JIRA to implement hedged reads for the non-pread 
case. I think that will be a better general solution that doesn't have the 
above-mentioned problems.
Cool, I share some of your concerns, and I totally agree that we need a more 
general solution in the community code, like hedged reads for regular reads. 
Let's work on HDFS-6450 now and close this one.
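
For context, here is a minimal sketch of the future-with-timeout idea from the 
issue description (hypothetical code, not the actual prototype; the class and 
method names are invented):

{code:java}
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.concurrent.*;

class TimedLocalReader {
    // A shared pool, as described above, keeps the per-read thread
    // overhead bounded instead of spawning a thread per stream.
    private static final ExecutorService READ_POOL = Executors.newCachedThreadPool();

    static int readWithTimeout(FileChannel channel, ByteBuffer buf, long timeoutMs)
            throws Exception {
        Future<Integer> pending = READ_POOL.submit(() -> channel.read(buf));
        try {
            return pending.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // The stuck disk I/O itself cannot be cancelled, but the caller
            // regains control and can mark the local node as a dead node.
            pending.cancel(true);
            throw e;
        }
    }
}
{code}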


> adding a timeout setting for local read io
> --
>
> Key: HDFS-6286
> URL: https://issues.apache.org/jira/browse/HDFS-6286
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Liang Xie
>Assignee: Liang Xie
>
> Currently, if a write or remote read is issued against a sick disk, 
> DFSClient.hdfsTimeout can guarantee the caller a bounded time to return, but 
> it doesn't work for local reads. Take an HBase scan for example:
> DFSInputStream.read -> readWithStrategy -> readBuffer -> 
> BlockReaderLocal.read ->  dataIn.read -> FileChannelImpl.read
> If it hits a bad disk, the low-level read I/O can take tens of seconds, and 
> what's worse, DFSInputStream.read holds a lock the whole time.
> To my knowledge, there's no good mechanism to cancel a running read I/O 
> (please correct me if that's wrong), so my suggestion is to wrap the read 
> request in a future with a timeout; if the threshold is reached, we could 
> add the local node to the dead-node list...
> Any thoughts?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6056) Clean up NFS config settings

2014-05-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010717#comment-14010717
 ] 

Hadoop QA commented on HDFS-6056:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12647017/HDFS-6056.009.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 11 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common hadoop-common-project/hadoop-nfs 
hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs-nfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6991//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6991//console

This message is automatically generated.

> Clean up NFS config settings
> 
>
> Key: HDFS-6056
> URL: https://issues.apache.org/jira/browse/HDFS-6056
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.3.0
>Reporter: Aaron T. Myers
>Assignee: Brandon Li
> Attachments: HDFS-6056.001.patch, HDFS-6056.002.patch, 
> HDFS-6056.003.patch, HDFS-6056.004.patch, HDFS-6056.005.patch, 
> HDFS-6056.006.patch, HDFS-6056.007.patch, HDFS-6056.008.patch, 
> HDFS-6056.009.patch
>
>
> As discussed on HDFS-6050, there are a few opportunities to improve the 
> config settings related to NFS. This JIRA is to implement those changes, 
> which include moving hdfs-nfs related properties into the hadoop-hdfs-nfs 
> project and replacing 'nfs3' with 'nfs' in the property names.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6448) change BlockReaderLocalLegacy timeout detail

2014-05-27 Thread Liang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010701#comment-14010701
 ] 

Liang Xie commented on HDFS-6448:
-

Thanks Colin for your comment; now I understand why the BlockReaderLocalLegacy 
class is still in trunk :) I am also glad to see that this timeout issue 
doesn't exist in the HDFS-347 SCR.

> change BlockReaderLocalLegacy timeout detail
> 
>
> Key: HDFS-6448
> URL: https://issues.apache.org/jira/browse/HDFS-6448
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6448.txt
>
>
> Our HBase is deployed on Hadoop 2.0. In one incident we hit HDFS-5016 on the 
> HDFS side, but we also found, from the HBase side, that the DFS client was 
> hung at getBlockReader. After reading the code, we found there is a timeout 
> setting in the current codebase, but the default hdfsTimeout value is "-1" 
> (from Client.java:getTimeout(conf)), which means no timeout...
> The hung stack trace looks like the following:
> at $Proxy21.getBlockLocalPathInfo(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientDatanodeProtocolTranslatorPB.getBlockLocalPathInfo(ClientDatanodeProtocolTranslatorPB.java:215)
> at 
> org.apache.hadoop.hdfs.BlockReaderLocal.getBlockPathInfo(BlockReaderLocal.java:267)
> at 
> org.apache.hadoop.hdfs.BlockReaderLocal.newBlockReader(BlockReaderLocal.java:180)
> at org.apache.hadoop.hdfs.DFSClient.getLocalBlockReader(DFSClient.java:812)
> One feasible fix is replacing hdfsTimeout with socketTimeout; see the 
> attached patch. Most of the credit should go to [~liushaohui]
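
A minimal sketch of the fallback the description suggests (an assumption about 
the shape of the fix, not the attached patch itself):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.ipc.Client;

class LocalReaderTimeout {
    static int effectiveTimeout(Configuration conf) {
        // As the description notes, Client.getTimeout(conf) returns -1 by
        // default, which means no timeout at all.
        int hdfsTimeout = Client.getTimeout(conf);
        // Fall back to the DFS client socket timeout instead (60s here is an
        // assumed default for illustration).
        int socketTimeout = conf.getInt("dfs.client.socket-timeout", 60 * 1000);
        return hdfsTimeout > 0 ? hdfsTimeout : socketTimeout;
    }
}
{code}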



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6452) ConfiguredFailoverProxyProvider should randomize currentProxyIndex on initialization

2014-05-27 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010672#comment-14010672
 ] 

Gera Shegalov commented on HDFS-6452:
-

Lohit is correct: once we implement a "readable standby", similar to what some 
database systems provide, the fraction of failed requests even in the "normal 
case" will be well below 50%. Making randomization optional is a good idea.

> ConfiguredFailoverProxyProvider should randomize currentProxyIndex on 
> initialization
> 
>
> Key: HDFS-6452
> URL: https://issues.apache.org/jira/browse/HDFS-6452
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha, hdfs-client
>Affects Versions: 2.4.0
>Reporter: Gera Shegalov
>Assignee: Gera Shegalov
>
> We observe that clients iterate over proxies in a fixed order. Depending on 
> the order of namenodes in dfs.ha.namenodes. (e.g. 'nn1,nn2') and 
> the current standby (nn1), all the clients will hit nn1 first and then fail 
> over to nn2. Chatting with [~lohit], we think we can simply select the 
> initial value of {{currentProxyIndex}} randomly and keep the 
> {{performFailover}} logic of iterating from left to right. This should halve 
> the unnecessary load on the standby NN.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6452) ConfiguredFailoverProxyProvider should randomize currentProxyIndex on initialization

2014-05-27 Thread Lohit Vijayarenu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010660#comment-14010660
 ] 

Lohit Vijayarenu commented on HDFS-6452:


On our clusters we are seeing that about 70-75% of the load is read-only 
(getFileInfo, listStatus, getBlockLocation). Given this, we have been thinking 
about enabling namenode stale reads. If we do that, then having clients pick a 
random NameNode would distribute them across both NameNodes. How about an 
option for the client to randomize which NN to talk to? By default 
ConfiguredFailoverProxyProvider would behave like today, but an option to 
randomize would be useful. 

> ConfiguredFailoverProxyProvider should randomize currentProxyIndex on 
> initialization
> 
>
> Key: HDFS-6452
> URL: https://issues.apache.org/jira/browse/HDFS-6452
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha, hdfs-client
>Affects Versions: 2.4.0
>Reporter: Gera Shegalov
>Assignee: Gera Shegalov
>
> We observe that clients iterate over proxies in a fixed order. Depending on 
> the order of namenodes in dfs.ha.namenodes. (e.g. 'nn1,nn2') and 
> the current standby (nn1), all the clients will hit nn1 first and then fail 
> over to nn2. Chatting with [~lohit], we think we can simply select the 
> initial value of {{currentProxyIndex}} randomly and keep the 
> {{performFailover}} logic of iterating from left to right. This should halve 
> the unnecessary load on the standby NN.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL

2014-05-27 Thread Jian Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010657#comment-14010657
 ] 

Jian Wang commented on HDFS-6382:
-

I think it is better to provide a (backup & cleanup) platform for your users; 
you could implement many cleanup strategies for the users in your company.
This would cut down on a lot of repeated work.

> HDFS File/Directory TTL
> ---
>
> Key: HDFS-6382
> URL: https://issues.apache.org/jira/browse/HDFS-6382
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, namenode
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>
> In production environments we often have a scenario like this: we want to 
> back up files on HDFS for some time and then delete them automatically. For 
> example, we keep only 1 day's logs on local disk due to limited disk space, 
> but we need to keep about 1 month's logs in order to debug program bugs, so 
> we keep all the logs on HDFS and delete logs that are older than 1 month. 
> This is a typical scenario for HDFS TTL, so here we propose that HDFS 
> support TTL.
> Following are some details of this proposal:
> 1. HDFS can support TTL on a specified file or directory
> 2. If a TTL is set on a file, the file will be deleted automatically after 
> the TTL expires
> 3. If a TTL is set on a directory, the child files and directories will be 
> deleted automatically after the TTL expires
> 4. A child file/directory's TTL configuration should override its parent 
> directory's
> 5. A global configuration is needed to control whether deleted 
> files/directories go to the trash or not
> 6. A global configuration is needed to control whether a directory with a 
> TTL is deleted when it is emptied by the TTL mechanism.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL

2014-05-27 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010632#comment-14010632
 ] 

Colin Patrick McCabe commented on HDFS-6382:


bq. Why do you think that putting the cleanup mechanism into the NameNode seems 
questionable, can you point out some details?

Andrew and Chris commented about this earlier.  See:
https://issues.apache.org/jira/browse/HDFS-6382?focusedCommentId=13998933&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13998933

I would add to that:
* Every user of this is going to want a slightly different deletion policy.  
It's just way too much configuration for the NameNode to reasonably handle.  
Much easier to do it in a user process.  For example, maybe you want to keep at 
least 100 GB of logs, 100 GB of "foo" data, and 1000 GB of "bar" data.  It's 
easy to handle this complexity in a user process, incredibly complex and 
frustrating to handle it in the NameNode.
* Your nightly MR job (or whatever) also needs to be able to do things like 
email sysadmins when the disks are filling up, which the NameNode can't 
reasonably be expected to do.
* I don't see a big advantage to doing this in the NameNode, and I see a lot of 
disadvantages (more complexity to maintain, difficult configuration, need to 
restart to update config)

Maybe I could be convinced otherwise, but so far the only argument that I've 
seen for doing it in the NN is that it would be re-usable.  And this could just 
as easily apply to an implementation outside the NN.  For example, as I pointed 
out earlier, DistCp is reusable, without being in the NameNode.
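
To make the user-process approach concrete, here is a minimal sketch of such an 
external cleaner built only on the public FileSystem API (the class name and 
arguments are hypothetical; a real tool would recurse, honor trash settings, 
and do the alerting mentioned above):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TtlCleaner {
    public static void main(String[] args) throws Exception {
        Path root = new Path(args[0]);         // directory to clean
        long ttlMs = Long.parseLong(args[1]);  // e.g. 30 days in milliseconds
        FileSystem fs = FileSystem.get(new Configuration());
        long cutoff = System.currentTimeMillis() - ttlMs;
        for (FileStatus st : fs.listStatus(root)) {
            // Delete anything not modified since the cutoff.
            if (st.getModificationTime() < cutoff) {
                fs.delete(st.getPath(), true);
            }
        }
    }
}
{code}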

> HDFS File/Directory TTL
> ---
>
> Key: HDFS-6382
> URL: https://issues.apache.org/jira/browse/HDFS-6382
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, namenode
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>
> In production environments we often have a scenario like this: we want to 
> back up files on HDFS for some time and then delete them automatically. For 
> example, we keep only 1 day's logs on local disk due to limited disk space, 
> but we need to keep about 1 month's logs in order to debug program bugs, so 
> we keep all the logs on HDFS and delete logs that are older than 1 month. 
> This is a typical scenario for HDFS TTL, so here we propose that HDFS 
> support TTL.
> Following are some details of this proposal:
> 1. HDFS can support TTL on a specified file or directory
> 2. If a TTL is set on a file, the file will be deleted automatically after 
> the TTL expires
> 3. If a TTL is set on a directory, the child files and directories will be 
> deleted automatically after the TTL expires
> 4. A child file/directory's TTL configuration should override its parent 
> directory's
> 5. A global configuration is needed to control whether deleted 
> files/directories go to the trash or not
> 6. A global configuration is needed to control whether a directory with a 
> TTL is deleted when it is emptied by the TTL mechanism.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6452) ConfiguredFailoverProxyProvider should randomize currentProxyIndex on initialization

2014-05-27 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010631#comment-14010631
 ] 

Gera Shegalov commented on HDFS-6452:
-

Hi Aaron, the net improvement should be in the smoothness of the overhead over 
time. E.g., we will smooth out the storm of {{StandbyException: Operation 
category READ is not supported in state standby}}.

[~jingzhao], this is targeted at deployments with automatic failover, where the 
emphasis is on not having to watch which NN is active all the time. 

> ConfiguredFailoverProxyProvider should randomize currentProxyIndex on 
> initialization
> 
>
> Key: HDFS-6452
> URL: https://issues.apache.org/jira/browse/HDFS-6452
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha, hdfs-client
>Affects Versions: 2.4.0
>Reporter: Gera Shegalov
>Assignee: Gera Shegalov
>
> We observe that clients iterate over proxies in a fixed order. Depending on 
> the order of namenodes in dfs.ha.namenodes. (e.g. 'nn1,nn2') and 
> the current standby (nn1), all the clients will hit nn1 first and then fail 
> over to nn2. Chatting with [~lohit], we think we can simply select the 
> initial value of {{currentProxyIndex}} randomly and keep the 
> {{performFailover}} logic of iterating from left to right. This should halve 
> the unnecessary load on the standby NN.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6452) ConfiguredFailoverProxyProvider should randomize currentProxyIndex on initialization

2014-05-27 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010623#comment-14010623
 ] 

Jing Zhao commented on HDFS-6452:
-

I agree with [~atm]. NameNode failover is not common in practice, and the 
administrator can easily control which NN is active when starting the cluster. 
Given that, randomizing currentProxyIndex on client initialization will 
actually increase the number of RPCs in the normal case.

> ConfiguredFailoverProxyProvider should randomize currentProxyIndex on 
> initialization
> 
>
> Key: HDFS-6452
> URL: https://issues.apache.org/jira/browse/HDFS-6452
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha, hdfs-client
>Affects Versions: 2.4.0
>Reporter: Gera Shegalov
>Assignee: Gera Shegalov
>
> We observe that clients iterate over proxies in a fixed order. Depending on 
> the order of namenodes in dfs.ha.namenodes. (e.g. 'nn1,nn2') and 
> the current standby (nn1), all the clients will hit nn1 first and then fail 
> over to nn2. Chatting with [~lohit], we think we can simply select the 
> initial value of {{currentProxyIndex}} randomly and keep the 
> {{performFailover}} logic of iterating from left to right. This should halve 
> the unnecessary load on the standby NN.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6442) Fix TestEditLogAutoroll and TestStandbyCheckpoints failure caused by port conflicts

2014-05-27 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010609#comment-14010609
 ] 

Arpit Agarwal commented on HDFS-6442:
-

+1 pending Jenkins. Thanks for incorporating the suggestion.

> Fix TestEditLogAutoroll and TestStandbyCheckpoints failure caused by port 
> conflicts
> --
>
> Key: HDFS-6442
> URL: https://issues.apache.org/jira/browse/HDFS-6442
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>Priority: Minor
> Attachments: HDFS-6442.1.patch, HDFS-6442.patch
>
>
> TestEditLogAutoroll and TestStandbyCheckpoints both use ports 10061 and 
> 10062 to set up the mini-cluster; this may occasionally cause test failures 
> when running the tests with -Pparallel-tests. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6452) ConfiguredFailoverProxyProvider should randomize currentProxyIndex on initialization

2014-05-27 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010598#comment-14010598
 ] 

Aaron T. Myers commented on HDFS-6452:
--

Hi Gera, while this will obviously halve the amount of errant RPCs made to a 
standby NN in the situation where all clients connect to the standby NN first, 
it will of course also cut in half the number of RPCs that connect to the 
active the first time in the situation where all clients connect to the active 
first. Given that, is this proposal really a net improvement?

> ConfiguredFailoverProxyProvider should randomize currentProxyIndex on 
> initialization
> 
>
> Key: HDFS-6452
> URL: https://issues.apache.org/jira/browse/HDFS-6452
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha, hdfs-client
>Affects Versions: 2.4.0
>Reporter: Gera Shegalov
>Assignee: Gera Shegalov
>
> We observe that clients iterate over proxies in a fixed order. Depending on 
> the order of namenodes in dfs.ha.namenodes. (e.g. 'nn1,nn2') and 
> the current standby (nn1), all the clients will hit nn1 first and then fail 
> over to nn2. Chatting with [~lohit], we think we can simply select the 
> initial value of {{currentProxyIndex}} randomly and keep the 
> {{performFailover}} logic of iterating from left to right. This should halve 
> the unnecessary load on the standby NN.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6452) ConfiguredFailoverProxyProvider should randomize currentProxyIndex on initialization

2014-05-27 Thread Gera Shegalov (JIRA)
Gera Shegalov created HDFS-6452:
---

 Summary: ConfiguredFailoverProxyProvider should randomize 
currentProxyIndex on initialization
 Key: HDFS-6452
 URL: https://issues.apache.org/jira/browse/HDFS-6452
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ha, hdfs-client
Affects Versions: 2.4.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov


We observe that clients iterate over proxies in a fixed order. Depending on 
the order of namenodes in dfs.ha.namenodes. (e.g. 'nn1,nn2') and 
the current standby (nn1), all the clients will hit nn1 first and then fail 
over to nn2. Chatting with [~lohit], we think we can simply select the initial 
value of {{currentProxyIndex}} randomly and keep the {{performFailover}} logic 
of iterating from left to right. This should halve the unnecessary load on the 
standby NN.
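
A minimal sketch of the proposal (hypothetical code; the real 
ConfiguredFailoverProxyProvider is structured differently): pick a random 
starting index once, then keep the existing left-to-right rotation on failover.

{code:java}
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

class RandomStartFailoverProxyProvider<T> {
    private final List<T> proxies;
    private int currentProxyIndex;

    RandomStartFailoverProxyProvider(List<T> proxies, boolean randomizeStart) {
        this.proxies = proxies;
        // Start at a random proxy so clients spread across both NNs,
        // instead of everyone hitting the first configured namenode.
        this.currentProxyIndex = randomizeStart
                ? ThreadLocalRandom.current().nextInt(proxies.size())
                : 0;
    }

    T getProxy() {
        return proxies.get(currentProxyIndex);
    }

    void performFailover() {
        // The existing rotation is unchanged: advance left to right, wrapping.
        currentProxyIndex = (currentProxyIndex + 1) % proxies.size();
    }
}
{code}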



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6442) Fix TestEditLogAutoroll and TestStandbyCheckpoints failure caused by port conflicts

2014-05-27 Thread Zesheng Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zesheng Wu updated HDFS-6442:
-

Summary: Fix TestEditLogAutoroll and TestStandbyCheckpoints failure caused 
by port conflicts  (was: Fix TestEditLogAutoroll failure caused by port conflicts)

> Fix TestEditLogAutoroll and TestStandbyCheckpoints failure caused by port 
> conflicts
> --
>
> Key: HDFS-6442
> URL: https://issues.apache.org/jira/browse/HDFS-6442
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>Priority: Minor
> Attachments: HDFS-6442.1.patch, HDFS-6442.patch
>
>
> TestEditLogAutoroll and TestStandbyCheckpoints both use ports 10061 and 
> 10062 to set up the mini-cluster; this may occasionally cause test failures 
> when running the tests with -Pparallel-tests. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6442) Fix TestEditLogAutoroll failure caused by port conflicts

2014-05-27 Thread Zesheng Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zesheng Wu updated HDFS-6442:
-

Attachment: HDFS-6442.1.patch

Updated the patch to address Arpit's comments.

> Fix TestEditLogAutoroll failure caused by port conflicts
> ---
>
> Key: HDFS-6442
> URL: https://issues.apache.org/jira/browse/HDFS-6442
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>Priority: Minor
> Attachments: HDFS-6442.1.patch, HDFS-6442.patch
>
>
> TestEditLogAutoroll and TestStandbyCheckpoints both use ports 10061 and 
> 10062 to set up the mini-cluster; this may occasionally cause test failures 
> when running the tests with -Pparallel-tests. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6056) Clean up NFS config settings

2014-05-27 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-6056:
-

Attachment: HDFS-6056.009.patch

Rebased the patch.

> Clean up NFS config settings
> 
>
> Key: HDFS-6056
> URL: https://issues.apache.org/jira/browse/HDFS-6056
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.3.0
>Reporter: Aaron T. Myers
>Assignee: Brandon Li
> Attachments: HDFS-6056.001.patch, HDFS-6056.002.patch, 
> HDFS-6056.003.patch, HDFS-6056.004.patch, HDFS-6056.005.patch, 
> HDFS-6056.006.patch, HDFS-6056.007.patch, HDFS-6056.008.patch, 
> HDFS-6056.009.patch
>
>
> As discussed on HDFS-6050, there are a few opportunities to improve the 
> config settings related to NFS. This JIRA is to implement those changes, 
> which include moving hdfs-nfs related properties into the hadoop-hdfs-nfs 
> project and replacing 'nfs3' with 'nfs' in the property names.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6056) Clean up NFS config settings

2014-05-27 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-6056:
-

Attachment: (was: HDFS-6056.009.patch)

> Clean up NFS config settings
> 
>
> Key: HDFS-6056
> URL: https://issues.apache.org/jira/browse/HDFS-6056
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.3.0
>Reporter: Aaron T. Myers
>Assignee: Brandon Li
> Attachments: HDFS-6056.001.patch, HDFS-6056.002.patch, 
> HDFS-6056.003.patch, HDFS-6056.004.patch, HDFS-6056.005.patch, 
> HDFS-6056.006.patch, HDFS-6056.007.patch, HDFS-6056.008.patch, 
> HDFS-6056.009.patch
>
>
> As discussed on HDFS-6050, there are a few opportunities to improve the 
> config settings related to NFS. This JIRA is to implement those changes, 
> which include moving hdfs-nfs related properties into the hadoop-hdfs-nfs 
> project and replacing 'nfs3' with 'nfs' in the property names.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6056) Clean up NFS config settings

2014-05-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010545#comment-14010545
 ] 

Hadoop QA commented on HDFS-6056:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12647014/HDFS-6056.009.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6990//console

This message is automatically generated.

> Clean up NFS config settings
> 
>
> Key: HDFS-6056
> URL: https://issues.apache.org/jira/browse/HDFS-6056
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.3.0
>Reporter: Aaron T. Myers
>Assignee: Brandon Li
> Attachments: HDFS-6056.001.patch, HDFS-6056.002.patch, 
> HDFS-6056.003.patch, HDFS-6056.004.patch, HDFS-6056.005.patch, 
> HDFS-6056.006.patch, HDFS-6056.007.patch, HDFS-6056.008.patch, 
> HDFS-6056.009.patch
>
>
> As discussed on HDFS-6050, there are a few opportunities to improve the 
> config settings related to NFS. This JIRA is to implement those changes, 
> which include moving hdfs-nfs related properties into the hadoop-hdfs-nfs 
> project and replacing 'nfs3' with 'nfs' in the property names.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6056) Clean up NFS config settings

2014-05-27 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-6056:
-

Attachment: HDFS-6056.009.patch

Uploaded a new patch to address Aaron's comments.

> Clean up NFS config settings
> 
>
> Key: HDFS-6056
> URL: https://issues.apache.org/jira/browse/HDFS-6056
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.3.0
>Reporter: Aaron T. Myers
>Assignee: Brandon Li
> Attachments: HDFS-6056.001.patch, HDFS-6056.002.patch, 
> HDFS-6056.003.patch, HDFS-6056.004.patch, HDFS-6056.005.patch, 
> HDFS-6056.006.patch, HDFS-6056.007.patch, HDFS-6056.008.patch, 
> HDFS-6056.009.patch
>
>
> As discussed on HDFS-6050, there are a few opportunities to improve the 
> config settings related to NFS. This JIRA is to implement those changes, 
> which include moving hdfs-nfs related properties into the hadoop-hdfs-nfs 
> project and replacing 'nfs3' with 'nfs' in the property names.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL

2014-05-27 Thread Zesheng Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010520#comment-14010520
 ] 

Zesheng Wu commented on HDFS-6382:
--

bq. Like I said, we should write such a tool and add it to the base Hadoop 
distribution. This is similar to what we did with DistCp. Then users would not 
need to write their own versions of this stuff.
Sure, this is another good option.

bq. It's important to distinguish between creating a tool to handle deleting 
old files (which we all agree we should do), and putting this into the NameNode 
(which seems questionable).
Why do you think that putting the cleanup mechanism into the NameNode is 
questionable? Can you point out some details?

> HDFS File/Directory TTL
> ---
>
> Key: HDFS-6382
> URL: https://issues.apache.org/jira/browse/HDFS-6382
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, namenode
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>
> In production environments we often have a scenario like this: we want to 
> back up files on HDFS for some time and then delete them automatically. For 
> example, we keep only 1 day's logs on local disk due to limited disk space, 
> but we need to keep about 1 month's logs in order to debug program bugs, so 
> we keep all the logs on HDFS and delete logs that are older than 1 month. 
> This is a typical scenario for HDFS TTL, so here we propose that HDFS 
> support TTL.
> Following are some details of this proposal:
> 1. HDFS can support TTL on a specified file or directory
> 2. If a TTL is set on a file, the file will be deleted automatically after 
> the TTL expires
> 3. If a TTL is set on a directory, the child files and directories will be 
> deleted automatically after the TTL expires
> 4. A child file/directory's TTL configuration should override its parent 
> directory's
> 5. A global configuration is needed to control whether deleted 
> files/directories go to the trash or not
> 6. A global configuration is needed to control whether a directory with a 
> TTL is deleted when it is emptied by the TTL mechanism.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5682) Heterogeneous Storage phase 2 - APIs to expose Storage Types

2014-05-27 Thread Zesheng Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010512#comment-14010512
 ] 

Zesheng Wu commented on HDFS-5682:
--

Thanks for the responses [~arpitagarwal]
bq. The function name should communicate that this is a disk space quota for a 
specific storage type, as opposed to the overall quotas, which are set with 
setQuota. If the proposed name is hard to follow, how about 
get/setQuotaByStorageType? 

Yes, get/setQuotaByStorageType will be clearer.

bq. Let's defer this for now. The API and protocol can both be easily extended 
in a backwards-compatible manner in the future without affecting existing 
applications.

OK

bq. We have to differentiate between quota unavailability vs disk space 
availability. The former will result in a quota violation exception, the latter 
will result in the behavior you described. We discuss the reasons for this in 
the HDFS-2832 design doc.

Got it, thanks.  I will look into the HDFS-2832 doc for more details.

> Heterogeneous Storage phase 2 - APIs to expose Storage Types
> 
>
> Key: HDFS-5682
> URL: https://issues.apache.org/jira/browse/HDFS-5682
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Attachments: 20140522-Heterogeneous-Storages-API.pdf
>
>
> Phase 1 (HDFS-2832) added support to present the DataNode as a collection of 
> discrete storages of different types.
> This Jira is to track phase 2 of the Heterogeneous Storage work which 
> involves exposing Storage Types to applications and adding Quota Management 
> support for administrators.
> This phase will also include tools support for administrators/users.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6451) NFS should not return NFS3ERR_IO for AccessControlException

2014-05-27 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010503#comment-14010503
 ] 

Brandon Li commented on HDFS-6451:
--

From [~zhongyi-altiscale]:
{quote}Hi Jing Zhao, it's definitely good to have a single exception handler 
instead of replicating the same code everywhere, but since each server 
procedure (ACCESS, GETATTR, FSSTAT, etc.) might have private data that needs 
to be written out, the child NFS3Response classes still need to override 
writeHeaderAndResponse anyway.
For AccessControlException, do you mean we need to catch it together with 
AuthorizationException in RpcProgramNfs3.java?
Or do you mean we need to examine the whole codebase looking for every function 
that could potentially throw AccessControlException,
and make sure the error code is set correctly in the catch clause?{quote}
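
A minimal sketch of the shared-handler idea (an assumption about its shape, not 
actual RpcProgramNfs3 code): map AccessControlException to NFS3ERR_PERM in one 
place and fall back to NFS3ERR_IO for genuine I/O errors.

{code:java}
import java.io.IOException;

import org.apache.hadoop.nfs.nfs3.Nfs3Status;
import org.apache.hadoop.security.AccessControlException;

class Nfs3ErrorMapper {
    // A single shared mapping from exception to NFS3 status, so each NFS
    // procedure does not repeat the same catch logic.
    static int mapToStatus(IOException e) {
        if (e instanceof AccessControlException) {
            return Nfs3Status.NFS3ERR_PERM;  // permission problem, not an I/O fault
        }
        return Nfs3Status.NFS3ERR_IO;        // fallback for genuine I/O errors
    }
}
{code}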

> NFS should not return NFS3ERR_IO for AccessControlException 
> 
>
> Key: HDFS-6451
> URL: https://issues.apache.org/jira/browse/HDFS-6451
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Reporter: Brandon Li
>
> As [~jingzhao] pointed out in HDFS-6411, we need to catch the 
> AccessControlException from the HDFS calls, and return NFS3ERR_PERM instead 
> of NFS3ERR_IO for it.
> Another possible improvement is to have a single class/method for the common 
> exception handling, instead of repeating the same handling in different NFS 
> methods.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6442) Fix TestEditLogAutoroll failure caused by port conflicts

2014-05-27 Thread Zesheng Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010488#comment-14010488
 ] 

Zesheng Wu commented on HDFS-6442:
--

Thanks [~arpitagarwal], OK, I will make it more general as you suggested.
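
For reference, the usual way to generalize such a fix (a sketch under the 
assumption that the patch follows this pattern) is to ask the OS for a free 
ephemeral port instead of hard-coding 10061/10062:

{code:java}
import java.io.IOException;
import java.net.ServerSocket;

class TestPorts {
    // Binding to port 0 asks the OS for any free ephemeral port, so two
    // test suites running in parallel can no longer collide on fixed ports.
    static int getFreePort() throws IOException {
        try (ServerSocket s = new ServerSocket(0)) {
            return s.getLocalPort();
        }
    }
}
{code}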

> Fix TestEditLogAutoroll failure caused by port conflicts
> ---
>
> Key: HDFS-6442
> URL: https://issues.apache.org/jira/browse/HDFS-6442
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>Priority: Minor
> Attachments: HDFS-6442.patch
>
>
> TestEditLogAutoroll and TestStandbyCheckpoints both use ports 10061 and 
> 10062 to set up the mini-cluster; this may occasionally cause test failures 
> when running the tests with -Pparallel-tests. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6411) nfs-hdfs-gateway mount raises I/O error and hangs when an unauthorized user attempts to access it

2014-05-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010486#comment-14010486
 ] 

Hudson commented on HDFS-6411:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5613 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5613/])
HDFS-6411. nfs-hdfs-gateway mount raises I/O error and hangs when an 
unauthorized user attempts to access it. Contributed by Brandon Li (brandonli: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1597895)
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/response/ACCESS3Response.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/RpcProgramNfs3.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> nfs-hdfs-gateway mount raises I/O error and hangs when an unauthorized user 
> attempts to access it
> 
>
> Key: HDFS-6411
> URL: https://issues.apache.org/jira/browse/HDFS-6411
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.2.0
>Reporter: Zhongyi Xie
>Assignee: Brandon Li
> Fix For: 2.4.1
>
> Attachments: HDFS-6411-branch-2.2.patch, HDFS-6411.002.patch, 
> HDFS-6411.003.patch, HDFS-6411.004.patch, HDFS-6411.patch, 
> tcpdump-HDFS-6411-Brandon.out
>
>
> We use the nfs-hdfs gateway to expose HDFS through NFS.
> 0) login as root, run the nfs-hdfs gateway as a user, say, nfsserver. 
> [root@zhongyi-test-cluster-desktop hdfs]# ls /hdfs
> backups  hive  mr-history  system  tmp  user
> 1) add a user nfs-test: adduser nfs-test (make sure that this user is not a 
> proxyuser of nfsserver)
> 2) switch to the test user: su - nfs-test
> 3) access the hdfs nfs gateway
> [nfs-test@zhongyi-test-cluster-desktop ~]$ ls /hdfs
> ls: cannot open directory /hdfs: Input/output error
> retry:
> [nfs-test@zhongyi-test-cluster-desktop ~]$ ls /hdfs
> ls: cannot access /hdfs: Stale NFS file handle
> 4) switch back to root and access the hdfs nfs gateway
> [nfs-test@zhongyi-test-cluster-desktop ~]$ exit
> logout
> [root@zhongyi-test-cluster-desktop hdfs]# ls /hdfs
> ls: cannot access /hdfs: Stale NFS file handle
> the nfsserver log indicates we hit an authorization error in the rpc handler: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException):
>  User: nfsserver is not allowed to impersonate nfs-test
> and NFS3ERR_IO is returned, which explains why we see the input/output error. 
> One can catch the AuthorizationException and return the correct error, 
> NFS3ERR_ACCES, to fix the error message on the client side, but that doesn't 
> seem to solve the mount hang issue. When the mount hang happens, the 
> nfsserver stops logging, which makes it more difficult to figure out the 
> real cause of the hang. According to jstack and the debugger, the nfsserver 
> seems to be waiting for client requests



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6376) Distcp data between two HA clusters requires another configuration

2014-05-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010480#comment-14010480
 ] 

Hadoop QA commented on HDFS-6376:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12647011/HDFS-6376-4-branch-2.4.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6989//console

This message is automatically generated.

> Distcp data between two HA clusters requires another configuration
> --
>
> Key: HDFS-6376
> URL: https://issues.apache.org/jira/browse/HDFS-6376
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, federation, hdfs-client
>Affects Versions: 2.3.0, 2.4.0
> Environment: Hadoop 2.3.0
>Reporter: Dave Marion
> Fix For: 2.4.1
>
> Attachments: HDFS-6376-2.patch, HDFS-6376-3-branch-2.4.patch, 
> HDFS-6376-4-branch-2.4.patch, HDFS-6376-branch-2.4.patch, 
> HDFS-6376-patch-1.patch
>
>
> User has to create a third set of configuration files for distcp when 
> transferring data between two HA clusters.
> Consider the scenario in [1]. You cannot put all of the required properties 
> in core-site.xml and hdfs-site.xml for the client to resolve the location of 
> both active namenodes. If you do, then the datanodes from cluster A may join 
> cluster B. I can not find a configuration option that tells the datanodes to 
> federate blocks for only one of the clusters in the configuration.
> [1] 
> http://mail-archives.apache.org/mod_mbox/hadoop-user/201404.mbox/%3CBAY172-W2133964E0C283968C161DD1520%40phx.gbl%3E



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6411) nfs-hdfs-gateway mount raises I/O error and hangs when a unauthorized user attempts to access it

2014-05-27 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-6411:
-

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

> nfs-hdfs-gateway mount raises I/O error and hangs when a unauthorized user 
> attempts to access it
> 
>
> Key: HDFS-6411
> URL: https://issues.apache.org/jira/browse/HDFS-6411
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.2.0
>Reporter: Zhongyi Xie
>Assignee: Brandon Li
> Attachments: HDFS-6411-branch-2.2.patch, HDFS-6411.002.patch, 
> HDFS-6411.003.patch, HDFS-6411.004.patch, HDFS-6411.patch, 
> tcpdump-HDFS-6411-Brandon.out
>
>
> We use the nfs-hdfs gateway to expose hdfs thru nfs.
> 0) login as root, run nfs-hdfs gateway as a user, say, nfsserver. 
> [root@zhongyi-test-cluster-desktop hdfs]# ls /hdfs
> backups  hive  mr-history  system  tmp  user
> 1) add a user nfs-test: adduser nfs-test(make sure that this user is not a 
> proxyuser of nfsserver
> 2) switch to test user: su - nfs-test
> 3) access hdfs nfs gateway
> [nfs-test@zhongyi-test-cluster-desktop ~]$ ls /hdfs
> ls: cannot open directory /hdfs: Input/output error
> retry:
> [nfs-test@zhongyi-test-cluster-desktop ~]$ ls /hdfs
> ls: cannot access /hdfs: Stale NFS file handle
> 4) switch back to root and access hdfs nfs gateway
> [nfs-test@zhongyi-test-cluster-desktop ~]$ exit
> logout
> [root@zhongyi-test-cluster-desktop hdfs]# ls /hdfs
> ls: cannot access /hdfs: Stale NFS file handle
> the nfsserver log indicates we hit an authorization error in the rpc handler; 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException):
>  User: nfsserver is not allowed to impersonate nfs-test
> and NFS3ERR_IO is returned, which explains why we see input/output error. 
> One can catch the authorizationexception and return the correct error: 
> NFS3ERR_ACCES to fix the error message on the client side but that doesn't 
> seem to solve the mount hang issue though. When the mount hang happens, it 
> stops printing nfsserver log which makes it more difficult to figure out the 
> real cause of the hang. According to jstack and debugger, the nfsserver seems 
> to be waiting for client requests



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6411) nfs-hdfs-gateway mount raises I/O error and hangs when a unauthorized user attempts to access it

2014-05-27 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-6411:
-

Fix Version/s: 2.4.1

> nfs-hdfs-gateway mount raises I/O error and hangs when a unauthorized user 
> attempts to access it
> 
>
> Key: HDFS-6411
> URL: https://issues.apache.org/jira/browse/HDFS-6411
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.2.0
>Reporter: Zhongyi Xie
>Assignee: Brandon Li
> Fix For: 2.4.1
>
> Attachments: HDFS-6411-branch-2.2.patch, HDFS-6411.002.patch, 
> HDFS-6411.003.patch, HDFS-6411.004.patch, HDFS-6411.patch, 
> tcpdump-HDFS-6411-Brandon.out
>
>
> We use the nfs-hdfs gateway to expose hdfs thru nfs.
> 0) login as root, run nfs-hdfs gateway as a user, say, nfsserver. 
> [root@zhongyi-test-cluster-desktop hdfs]# ls /hdfs
> backups  hive  mr-history  system  tmp  user
> 1) add a user nfs-test: adduser nfs-test(make sure that this user is not a 
> proxyuser of nfsserver
> 2) switch to test user: su - nfs-test
> 3) access hdfs nfs gateway
> [nfs-test@zhongyi-test-cluster-desktop ~]$ ls /hdfs
> ls: cannot open directory /hdfs: Input/output error
> retry:
> [nfs-test@zhongyi-test-cluster-desktop ~]$ ls /hdfs
> ls: cannot access /hdfs: Stale NFS file handle
> 4) switch back to root and access hdfs nfs gateway
> [nfs-test@zhongyi-test-cluster-desktop ~]$ exit
> logout
> [root@zhongyi-test-cluster-desktop hdfs]# ls /hdfs
> ls: cannot access /hdfs: Stale NFS file handle
> the nfsserver log indicates we hit an authorization error in the rpc handler; 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException):
>  User: nfsserver is not allowed to impersonate nfs-test
> and NFS3ERR_IO is returned, which explains why we see input/output error. 
> One can catch the authorizationexception and return the correct error: 
> NFS3ERR_ACCES to fix the error message on the client side but that doesn't 
> seem to solve the mount hang issue though. When the mount hang happens, it 
> stops printing nfsserver log which makes it more difficult to figure out the 
> real cause of the hang. According to jstack and debugger, the nfsserver seems 
> to be waiting for client requests



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6411) nfs-hdfs-gateway mount raises I/O error and hangs when a unauthorized user attempts to access it

2014-05-27 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010479#comment-14010479
 ] 

Brandon Li commented on HDFS-6411:
--

Thank you, guys. I've committed the patch.
Let's move further discussion of the code optimization to HDFS-6451.

> nfs-hdfs-gateway mount raises I/O error and hangs when a unauthorized user 
> attempts to access it
> 
>
> Key: HDFS-6411
> URL: https://issues.apache.org/jira/browse/HDFS-6411
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.2.0
>Reporter: Zhongyi Xie
>Assignee: Brandon Li
> Fix For: 2.4.1
>
> Attachments: HDFS-6411-branch-2.2.patch, HDFS-6411.002.patch, 
> HDFS-6411.003.patch, HDFS-6411.004.patch, HDFS-6411.patch, 
> tcpdump-HDFS-6411-Brandon.out
>
>
> We use the nfs-hdfs gateway to expose hdfs thru nfs.
> 0) login as root, run nfs-hdfs gateway as a user, say, nfsserver. 
> [root@zhongyi-test-cluster-desktop hdfs]# ls /hdfs
> backups  hive  mr-history  system  tmp  user
> 1) add a user nfs-test: adduser nfs-test(make sure that this user is not a 
> proxyuser of nfsserver
> 2) switch to test user: su - nfs-test
> 3) access hdfs nfs gateway
> [nfs-test@zhongyi-test-cluster-desktop ~]$ ls /hdfs
> ls: cannot open directory /hdfs: Input/output error
> retry:
> [nfs-test@zhongyi-test-cluster-desktop ~]$ ls /hdfs
> ls: cannot access /hdfs: Stale NFS file handle
> 4) switch back to root and access hdfs nfs gateway
> [nfs-test@zhongyi-test-cluster-desktop ~]$ exit
> logout
> [root@zhongyi-test-cluster-desktop hdfs]# ls /hdfs
> ls: cannot access /hdfs: Stale NFS file handle
> the nfsserver log indicates we hit an authorization error in the rpc handler; 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException):
>  User: nfsserver is not allowed to impersonate nfs-test
> and NFS3ERR_IO is returned, which explains why we see input/output error. 
> One can catch the authorizationexception and return the correct error: 
> NFS3ERR_ACCES to fix the error message on the client side but that doesn't 
> seem to solve the mount hang issue though. When the mount hang happens, it 
> stops printing nfsserver log which makes it more difficult to figure out the 
> real cause of the hang. According to jstack and debugger, the nfsserver seems 
> to be waiting for client requests



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6376) Distcp data between two HA clusters requires another configuration

2014-05-27 Thread Dave Marion (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Marion updated HDFS-6376:
--

Status: Open  (was: Patch Available)

forgot --no-prefix in patch 3

> Distcp data between two HA clusters requires another configuration
> --
>
> Key: HDFS-6376
> URL: https://issues.apache.org/jira/browse/HDFS-6376
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, federation, hdfs-client
>Affects Versions: 2.4.0, 2.3.0
> Environment: Hadoop 2.3.0
>Reporter: Dave Marion
> Fix For: 2.4.1
>
> Attachments: HDFS-6376-2.patch, HDFS-6376-3-branch-2.4.patch, 
> HDFS-6376-4-branch-2.4.patch, HDFS-6376-branch-2.4.patch, 
> HDFS-6376-patch-1.patch
>
>
> User has to create a third set of configuration files for distcp when 
> transferring data between two HA clusters.
> Consider the scenario in [1]. You cannot put all of the required properties 
> in core-site.xml and hdfs-site.xml for the client to resolve the location of 
> both active namenodes. If you do, then the datanodes from cluster A may join 
> cluster B. I can not find a configuration option that tells the datanodes to 
> federate blocks for only one of the clusters in the configuration.
> [1] 
> http://mail-archives.apache.org/mod_mbox/hadoop-user/201404.mbox/%3CBAY172-W2133964E0C283968C161DD1520%40phx.gbl%3E



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6376) Distcp data between two HA clusters requires another configuration

2014-05-27 Thread Dave Marion (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Marion updated HDFS-6376:
--

Status: Patch Available  (was: Open)

> Distcp data between two HA clusters requires another configuration
> --
>
> Key: HDFS-6376
> URL: https://issues.apache.org/jira/browse/HDFS-6376
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, federation, hdfs-client
>Affects Versions: 2.4.0, 2.3.0
> Environment: Hadoop 2.3.0
>Reporter: Dave Marion
> Fix For: 2.4.1
>
> Attachments: HDFS-6376-2.patch, HDFS-6376-3-branch-2.4.patch, 
> HDFS-6376-4-branch-2.4.patch, HDFS-6376-branch-2.4.patch, 
> HDFS-6376-patch-1.patch
>
>
> User has to create a third set of configuration files for distcp when 
> transferring data between two HA clusters.
> Consider the scenario in [1]. You cannot put all of the required properties 
> in core-site.xml and hdfs-site.xml for the client to resolve the location of 
> both active namenodes. If you do, then the datanodes from cluster A may join 
> cluster B. I can not find a configuration option that tells the datanodes to 
> federate blocks for only one of the clusters in the configuration.
> [1] 
> http://mail-archives.apache.org/mod_mbox/hadoop-user/201404.mbox/%3CBAY172-W2133964E0C283968C161DD1520%40phx.gbl%3E



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6411) nfs-hdfs-gateway mount raises I/O error and hangs when a unauthorized user attempts to access it

2014-05-27 Thread Zhongyi Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010472#comment-14010472
 ] 

Zhongyi Xie commented on HDFS-6411:
---

Hi [~jingzhao], it's definitely good to have a single exception handler instead 
of replicating the same code everywhere, but since each server procedure 
(ACCESS, GETATTR, FSSTAT, etc.) may have its own private data that needs to be 
written out, the child NFS3Response classes still need to override 
writeHeaderAndResponse anyway.
For AccessControlException, do you mean we need to catch it together with 
AuthorizationException in RpcProgramNfs3.java? Or do you mean we need to 
examine the whole codebase looking for every function that could potentially 
throw AccessControlException, and make sure the error code is set correctly in 
the catch clause?
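
For illustration, a minimal sketch of the template-method shape under 
discussion (simplified names, not the actual Hadoop NFS3Response signatures): 
the shared header logic lives once in the base class, while ACCESS, GETATTR, 
FSSTAT, etc. override only the body they write.

{code:java}
import java.io.DataOutput;
import java.io.IOException;

// Sketch only: simplified stand-ins for the NFS3Response hierarchy.
abstract class Nfs3ResponseSketch {
  protected final int status;   // NFS3 status code common to all responses

  Nfs3ResponseSketch(int status) { this.status = status; }

  // The shared header logic is written once and cannot be overridden.
  final void writeHeaderAndResponse(DataOutput out, int xid) throws IOException {
    out.writeInt(xid);
    out.writeInt(status);
    writeBody(out);             // the procedure-specific payload follows
  }

  // Each procedure's response class overrides only this part.
  protected abstract void writeBody(DataOutput out) throws IOException;
}

class AccessResponseSketch extends Nfs3ResponseSketch {
  private final int accessBits; // ACCESS-specific data

  AccessResponseSketch(int status, int accessBits) {
    super(status);
    this.accessBits = accessBits;
  }

  @Override
  protected void writeBody(DataOutput out) throws IOException {
    out.writeInt(accessBits);
  }
}
{code}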

> nfs-hdfs-gateway mount raises I/O error and hangs when a unauthorized user 
> attempts to access it
> 
>
> Key: HDFS-6411
> URL: https://issues.apache.org/jira/browse/HDFS-6411
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.2.0
>Reporter: Zhongyi Xie
>Assignee: Brandon Li
> Attachments: HDFS-6411-branch-2.2.patch, HDFS-6411.002.patch, 
> HDFS-6411.003.patch, HDFS-6411.004.patch, HDFS-6411.patch, 
> tcpdump-HDFS-6411-Brandon.out
>
>
> We use the nfs-hdfs gateway to expose hdfs thru nfs.
> 0) login as root, run nfs-hdfs gateway as a user, say, nfsserver. 
> [root@zhongyi-test-cluster-desktop hdfs]# ls /hdfs
> backups  hive  mr-history  system  tmp  user
> 1) add a user nfs-test: adduser nfs-test(make sure that this user is not a 
> proxyuser of nfsserver
> 2) switch to test user: su - nfs-test
> 3) access hdfs nfs gateway
> [nfs-test@zhongyi-test-cluster-desktop ~]$ ls /hdfs
> ls: cannot open directory /hdfs: Input/output error
> retry:
> [nfs-test@zhongyi-test-cluster-desktop ~]$ ls /hdfs
> ls: cannot access /hdfs: Stale NFS file handle
> 4) switch back to root and access hdfs nfs gateway
> [nfs-test@zhongyi-test-cluster-desktop ~]$ exit
> logout
> [root@zhongyi-test-cluster-desktop hdfs]# ls /hdfs
> ls: cannot access /hdfs: Stale NFS file handle
> the nfsserver log indicates we hit an authorization error in the rpc handler; 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException):
>  User: nfsserver is not allowed to impersonate nfs-test
> and NFS3ERR_IO is returned, which explains why we see input/output error. 
> One can catch the authorizationexception and return the correct error: 
> NFS3ERR_ACCES to fix the error message on the client side but that doesn't 
> seem to solve the mount hang issue though. When the mount hang happens, it 
> stops printing nfsserver log which makes it more difficult to figure out the 
> real cause of the hang. According to jstack and debugger, the nfsserver seems 
> to be waiting for client requests



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6376) Distcp data between two HA clusters requires another configuration

2014-05-27 Thread Dave Marion (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Marion updated HDFS-6376:
--

Attachment: HDFS-6376-4-branch-2.4.patch

fix patch

> Distcp data between two HA clusters requires another configuration
> --
>
> Key: HDFS-6376
> URL: https://issues.apache.org/jira/browse/HDFS-6376
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, federation, hdfs-client
>Affects Versions: 2.3.0, 2.4.0
> Environment: Hadoop 2.3.0
>Reporter: Dave Marion
> Fix For: 2.4.1
>
> Attachments: HDFS-6376-2.patch, HDFS-6376-3-branch-2.4.patch, 
> HDFS-6376-4-branch-2.4.patch, HDFS-6376-branch-2.4.patch, 
> HDFS-6376-patch-1.patch
>
>
> User has to create a third set of configuration files for distcp when 
> transferring data between two HA clusters.
> Consider the scenario in [1]. You cannot put all of the required properties 
> in core-site.xml and hdfs-site.xml for the client to resolve the location of 
> both active namenodes. If you do, then the datanodes from cluster A may join 
> cluster B. I can not find a configuration option that tells the datanodes to 
> federate blocks for only one of the clusters in the configuration.
> [1] 
> http://mail-archives.apache.org/mod_mbox/hadoop-user/201404.mbox/%3CBAY172-W2133964E0C283968C161DD1520%40phx.gbl%3E



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6056) Clean up NFS config settings

2014-05-27 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010460#comment-14010460
 ] 

Aaron T. Myers commented on HDFS-6056:
--

bq. OK. Let's further simplify the config for HDFS. Other NFS implementations 
can add their own prefix if they need.

Seems fine to me. Consistency of the HDFS configs/docs is really all I'm 
concerned with.

{quote}
I am actually not very concerned about the deprecation of keys in Common 
hadoop-nfs. The reasons are that 1) most of them are hdfs-nfs related, and 2) 
the rest are all hidden keys used for debugging purposes, except 
"dfs.nfs.exports.allowed.hosts". 
Even for "dfs.nfs.exports.allowed.hosts", we can add the deprecation 
declaration into Configuration#defaultDeprecations and remove it from 
Configuration after a couple of releases. I will update the patch if this 
sounds OK to you.
{quote}

Sure, that sounds fine. Not the prettiest solution in the world, but certainly 
seems like it should work.

> Clean up NFS config settings
> 
>
> Key: HDFS-6056
> URL: https://issues.apache.org/jira/browse/HDFS-6056
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.3.0
>Reporter: Aaron T. Myers
>Assignee: Brandon Li
> Attachments: HDFS-6056.001.patch, HDFS-6056.002.patch, 
> HDFS-6056.003.patch, HDFS-6056.004.patch, HDFS-6056.005.patch, 
> HDFS-6056.006.patch, HDFS-6056.007.patch, HDFS-6056.008.patch
>
>
> As discussed on HDFS-6050, there's a few opportunities to improve the config 
> settings related to NFS. This JIRA is to implement those changes, which 
> include: moving hdfs-nfs related properties into hadoop-hdfs-nfs project, and 
> replacing 'nfs3' with 'nfs' in the property names.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6411) nfs-hdfs-gateway mount raises I/O error and hangs when a unauthorized user attempts to access it

2014-05-27 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010455#comment-14010455
 ] 

Brandon Li commented on HDFS-6411:
--

Thank you, [~jingzhao] for the review. I've filed HDFS-6451 to track the 
improvement you described.

> nfs-hdfs-gateway mount raises I/O error and hangs when a unauthorized user 
> attempts to access it
> 
>
> Key: HDFS-6411
> URL: https://issues.apache.org/jira/browse/HDFS-6411
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.2.0
>Reporter: Zhongyi Xie
>Assignee: Brandon Li
> Attachments: HDFS-6411-branch-2.2.patch, HDFS-6411.002.patch, 
> HDFS-6411.003.patch, HDFS-6411.004.patch, HDFS-6411.patch, 
> tcpdump-HDFS-6411-Brandon.out
>
>
> We use the nfs-hdfs gateway to expose hdfs thru nfs.
> 0) login as root, run nfs-hdfs gateway as a user, say, nfsserver. 
> [root@zhongyi-test-cluster-desktop hdfs]# ls /hdfs
> backups  hive  mr-history  system  tmp  user
> 1) add a user nfs-test: adduser nfs-test(make sure that this user is not a 
> proxyuser of nfsserver
> 2) switch to test user: su - nfs-test
> 3) access hdfs nfs gateway
> [nfs-test@zhongyi-test-cluster-desktop ~]$ ls /hdfs
> ls: cannot open directory /hdfs: Input/output error
> retry:
> [nfs-test@zhongyi-test-cluster-desktop ~]$ ls /hdfs
> ls: cannot access /hdfs: Stale NFS file handle
> 4) switch back to root and access hdfs nfs gateway
> [nfs-test@zhongyi-test-cluster-desktop ~]$ exit
> logout
> [root@zhongyi-test-cluster-desktop hdfs]# ls /hdfs
> ls: cannot access /hdfs: Stale NFS file handle
> the nfsserver log indicates we hit an authorization error in the rpc handler; 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException):
>  User: nfsserver is not allowed to impersonate nfs-test
> and NFS3ERR_IO is returned, which explains why we see input/output error. 
> One can catch the authorizationexception and return the correct error: 
> NFS3ERR_ACCES to fix the error message on the client side but that doesn't 
> seem to solve the mount hang issue though. When the mount hang happens, it 
> stops printing nfsserver log which makes it more difficult to figure out the 
> real cause of the hang. According to jstack and debugger, the nfsserver seems 
> to be waiting for client requests



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6451) NFS should not return NFS3ERR_IO for AccessControlException

2014-05-27 Thread Brandon Li (JIRA)
Brandon Li created HDFS-6451:


 Summary: NFS should not return NFS3ERR_IO for 
AccessControlException 
 Key: HDFS-6451
 URL: https://issues.apache.org/jira/browse/HDFS-6451
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Reporter: Brandon Li


As [~jingzhao] pointed out in HDFS-6411, we need to catch the 
AccessControlException from the HDFS calls, and return NFS3ERR_PERM instead of 
NFS3ERR_IO for it.

Another possible improvement is to have a single class/method for the common 
exception handling process, instead of repeating the same handling in 
different NFS methods.
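
As a rough illustration of the consolidation, a sketch of one shared 
translation from HDFS-side exceptions to NFS3 status codes; the constants 
stand in for Nfs3Status and use the RFC 1813 values, and the check ordering 
assumes AuthorizationException subclasses AccessControlException:

{code:java}
import java.io.IOException;
import org.apache.hadoop.security.AccessControlException;
import org.apache.hadoop.security.authorize.AuthorizationException;

// Sketch only: one place to map exceptions to NFS3 status codes, instead
// of a hand-written catch block in every NFS procedure.
final class NfsStatusMapperSketch {
  // Stand-ins for Nfs3Status constants; numeric values per RFC 1813.
  static final int NFS3ERR_PERM = 1;
  static final int NFS3ERR_IO = 5;
  static final int NFS3ERR_ACCES = 13;

  static int toNfs3Status(IOException e) {
    // Checked first: AuthorizationException subclasses AccessControlException.
    if (e instanceof AuthorizationException) {
      return NFS3ERR_ACCES;  // e.g. the proxy-user/impersonation failure
    }
    if (e instanceof AccessControlException) {
      return NFS3ERR_PERM;   // permission denied on the HDFS call
    }
    return NFS3ERR_IO;       // everything else remains an I/O error
  }

  private NfsStatusMapperSketch() {}
}
{code}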



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6411) nfs-hdfs-gateway mount raises I/O error and hangs when a unauthorized user attempts to access it

2014-05-27 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010439#comment-14010439
 ] 

Jing Zhao commented on HDFS-6411:
-

The current patch looks good to me. +1

One issue of the current code is that we may also want to catch the 
AccessControlException from the HDFS calls, and return NFS3ERR_PERM instead of 
NFS3ERR_IO for it. But I guess we can do it in a separate jira.

Another possible future improvement is that we can have a single class/method 
for the common exception handling process, instead of repeating the same 
handling in different NFS methods.

> nfs-hdfs-gateway mount raises I/O error and hangs when a unauthorized user 
> attempts to access it
> 
>
> Key: HDFS-6411
> URL: https://issues.apache.org/jira/browse/HDFS-6411
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.2.0
>Reporter: Zhongyi Xie
>Assignee: Brandon Li
> Attachments: HDFS-6411-branch-2.2.patch, HDFS-6411.002.patch, 
> HDFS-6411.003.patch, HDFS-6411.004.patch, HDFS-6411.patch, 
> tcpdump-HDFS-6411-Brandon.out
>
>
> We use the nfs-hdfs gateway to expose hdfs thru nfs.
> 0) login as root, run nfs-hdfs gateway as a user, say, nfsserver. 
> [root@zhongyi-test-cluster-desktop hdfs]# ls /hdfs
> backups  hive  mr-history  system  tmp  user
> 1) add a user nfs-test: adduser nfs-test(make sure that this user is not a 
> proxyuser of nfsserver
> 2) switch to test user: su - nfs-test
> 3) access hdfs nfs gateway
> [nfs-test@zhongyi-test-cluster-desktop ~]$ ls /hdfs
> ls: cannot open directory /hdfs: Input/output error
> retry:
> [nfs-test@zhongyi-test-cluster-desktop ~]$ ls /hdfs
> ls: cannot access /hdfs: Stale NFS file handle
> 4) switch back to root and access hdfs nfs gateway
> [nfs-test@zhongyi-test-cluster-desktop ~]$ exit
> logout
> [root@zhongyi-test-cluster-desktop hdfs]# ls /hdfs
> ls: cannot access /hdfs: Stale NFS file handle
> the nfsserver log indicates we hit an authorization error in the rpc handler; 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException):
>  User: nfsserver is not allowed to impersonate nfs-test
> and NFS3ERR_IO is returned, which explains why we see input/output error. 
> One can catch the authorizationexception and return the correct error: 
> NFS3ERR_ACCES to fix the error message on the client side but that doesn't 
> seem to solve the mount hang issue though. When the mount hang happens, it 
> stops printing nfsserver log which makes it more difficult to figure out the 
> real cause of the hang. According to jstack and debugger, the nfsserver seems 
> to be waiting for client requests



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6442) Fix TestEditLogAutoroll failure caused by port conficts

2014-05-27 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010427#comment-14010427
 ] 

Arpit Agarwal commented on HDFS-6442:
-

Hi [~wuzesheng], the patch looks good, but would you consider an approach like 
HDFS-6443, i.e. randomized port selection + retries?
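
A sketch of that idea (the port range and helper name are illustrative): 
probe random ports and retry on collisions instead of hard-coding 10061/10062.

{code:java}
import java.io.IOException;
import java.net.ServerSocket;
import java.util.Random;

public class RandomPortSketch {
  private static final Random RAND = new Random();

  // Return a port that was free when probed, retrying on collisions,
  // so parallel test runs don't fight over fixed ports.
  static int chooseFreePort(int maxAttempts) throws IOException {
    for (int i = 0; i < maxAttempts; i++) {
      int port = 20000 + RAND.nextInt(20000);   // illustrative range
      try (ServerSocket probe = new ServerSocket(port)) {
        return port;                            // free at probe time
      } catch (IOException e) {
        // in use; try another random port
      }
    }
    throw new IOException("no free port found after " + maxAttempts + " attempts");
  }
}
{code}

The probe is inherently racy (the port may be taken again before the test 
binds it), so the mini-cluster startup itself would also be retried on a bind 
failure.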

> Fix TestEditLogAutoroll failure caused by port conficts
> ---
>
> Key: HDFS-6442
> URL: https://issues.apache.org/jira/browse/HDFS-6442
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>Priority: Minor
> Attachments: HDFS-6442.patch
>
>
> TestEditLogAutoroll and TestStandbyCheckpoints both use 10061 and 10062 to 
> set up the mini-cluster, this may result in occasionally test failure when 
> run test with -Pparallel-tests. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6056) Clean up NFS config settings

2014-05-27 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010419#comment-14010419
 ] 

Brandon Li commented on HDFS-6056:
--

Thanks, Aaron. 
{quote}move IdUserGroup#NFS_STATIC_MAPPING_FILE_KEY out of IdUserGroup and put 
it with all the other config names{quote}
Sure.
{quote}...so I would anticipate user confusion of which configs do and do not 
start with "dfs."{quote}
OK. Let's further simplify the config for HDFS. Other NFS implementations can 
add their own prefix if they need.  
{quote}...if there were some other project which only depended upon the Common 
hadoop-nfs project, the config deprecations would not be loaded. {quote}
I am actually not very concerned about the deprecation of keys in Common 
hadoop-nfs. The reasons are that 1) most of them are hdfs-nfs related, and 2) 
the rest are all hidden keys used for debugging purposes, except 
"dfs.nfs.exports.allowed.hosts". 
Even for "dfs.nfs.exports.allowed.hosts", we can add the deprecation 
declaration into Configuration#defaultDeprecations and remove it from 
Configuration after a couple of releases. I will update the patch if this 
sounds OK to you.
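
For reference, registering such a deprecation is a one-liner with 
{{Configuration#addDeprecations}}. A sketch, assuming the key is renamed to 
"nfs.exports.allowed.hosts":

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configuration.DeprecationDelta;

public class NfsDeprecationSketch {
  static {
    // Reads of the old key resolve to the new key and log a warning;
    // the mapping can be dropped again after a couple of releases.
    Configuration.addDeprecations(new DeprecationDelta[] {
        new DeprecationDelta("dfs.nfs.exports.allowed.hosts",
            "nfs.exports.allowed.hosts")
    });
  }
}
{code}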

> Clean up NFS config settings
> 
>
> Key: HDFS-6056
> URL: https://issues.apache.org/jira/browse/HDFS-6056
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.3.0
>Reporter: Aaron T. Myers
>Assignee: Brandon Li
> Attachments: HDFS-6056.001.patch, HDFS-6056.002.patch, 
> HDFS-6056.003.patch, HDFS-6056.004.patch, HDFS-6056.005.patch, 
> HDFS-6056.006.patch, HDFS-6056.007.patch, HDFS-6056.008.patch
>
>
> As discussed on HDFS-6050, there's a few opportunities to improve the config 
> settings related to NFS. This JIRA is to implement those changes, which 
> include: moving hdfs-nfs related properties into hadoop-hdfs-nfs project, and 
> replacing 'nfs3' with 'nfs' in the property names.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6447) balancer should timestamp the completion message

2014-05-27 Thread Juan Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Yu updated HDFS-6447:
--

Attachment: HDFS-6447.002.patch

There is no new test since the change just adds a timestamp to a log message.

Sample output: 
May 27, 2014 2:20:25 PM  Balancing took 3.087 seconds
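
A minimal sketch of producing output in that shape; the class name is 
illustrative and this is not the balancer's actual code:

{code:java}
import java.text.DateFormat;
import java.util.Date;

public class BalancerExitMessageSketch {
  public static void main(String[] args) throws InterruptedException {
    long startMs = System.currentTimeMillis();
    Thread.sleep(100);                       // stand-in for the balancing run
    long elapsed = System.currentTimeMillis() - startMs;
    // Timestamp first, then the duration, matching the sample above.
    System.out.println(DateFormat.getDateTimeInstance().format(new Date())
        + "  Balancing took " + (elapsed / 1000.0) + " seconds");
  }
}
{code}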

> balancer should timestamp the completion message
> 
>
> Key: HDFS-6447
> URL: https://issues.apache.org/jira/browse/HDFS-6447
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Reporter: Allen Wittenauer
>Assignee: Juan Yu
>Priority: Trivial
>  Labels: newbie
> Attachments: HDFS-6447.002.patch, HDFS-6447.patch.001
>
>
> When the balancer finishes, it doesn't report the time it finished.  It 
> should do this so that users have a better sense of how long it took to 
> complete.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6411) nfs-hdfs-gateway mount raises I/O error and hangs when a unauthorized user attempts to access it

2014-05-27 Thread Zhongyi Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010267#comment-14010267
 ] 

Zhongyi Xie commented on HDFS-6411:
---

Hi [~brandonli], I've tested it on my VM and it looks like the problem is 
fixed.

However, I did see something interesting:
[alti-test-02@alexie-dt root]$ mkdir /hdfs/tmp/dir
mkdir: cannot create directory `/hdfs/tmp/dir': Permission denied
[alti-test-02@alexie-dt root]$ rmdir /hdfs/tmp
rmdir: failed to remove `/hdfs/tmp': Permission denied
[alti-test-02@alexie-dt root]$ rmdir /hdfs/
rmdir: failed to remove `/hdfs/': Permission denied
[alti-test-02@alexie-dt root]$ ls /hdfs
ls: cannot access /hdfs: Stale file handle

but once I log out of the alti-test-02 user back to root, the NFS handle is 
still working:
[root@alexie-dt ~]# ls /hdfs
backups  hive  mr-history  system  tmp  user

When I retried these steps, the problem went away (i.e. I didn't see the Stale 
file handle again), so unless there are consistent repro steps and the handle 
hangs again, I won't worry about it.

> nfs-hdfs-gateway mount raises I/O error and hangs when a unauthorized user 
> attempts to access it
> 
>
> Key: HDFS-6411
> URL: https://issues.apache.org/jira/browse/HDFS-6411
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.2.0
>Reporter: Zhongyi Xie
>Assignee: Brandon Li
> Attachments: HDFS-6411-branch-2.2.patch, HDFS-6411.002.patch, 
> HDFS-6411.003.patch, HDFS-6411.004.patch, HDFS-6411.patch, 
> tcpdump-HDFS-6411-Brandon.out
>
>
> We use the nfs-hdfs gateway to expose hdfs thru nfs.
> 0) login as root, run nfs-hdfs gateway as a user, say, nfsserver. 
> [root@zhongyi-test-cluster-desktop hdfs]# ls /hdfs
> backups  hive  mr-history  system  tmp  user
> 1) add a user nfs-test: adduser nfs-test(make sure that this user is not a 
> proxyuser of nfsserver
> 2) switch to test user: su - nfs-test
> 3) access hdfs nfs gateway
> [nfs-test@zhongyi-test-cluster-desktop ~]$ ls /hdfs
> ls: cannot open directory /hdfs: Input/output error
> retry:
> [nfs-test@zhongyi-test-cluster-desktop ~]$ ls /hdfs
> ls: cannot access /hdfs: Stale NFS file handle
> 4) switch back to root and access hdfs nfs gateway
> [nfs-test@zhongyi-test-cluster-desktop ~]$ exit
> logout
> [root@zhongyi-test-cluster-desktop hdfs]# ls /hdfs
> ls: cannot access /hdfs: Stale NFS file handle
> the nfsserver log indicates we hit an authorization error in the rpc handler; 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException):
>  User: nfsserver is not allowed to impersonate nfs-test
> and NFS3ERR_IO is returned, which explains why we see input/output error. 
> One can catch the authorizationexception and return the correct error: 
> NFS3ERR_ACCES to fix the error message on the client side but that doesn't 
> seem to solve the mount hang issue though. When the mount hang happens, it 
> stops printing nfsserver log which makes it more difficult to figure out the 
> real cause of the hang. According to jstack and debugger, the nfsserver seems 
> to be waiting for client requests



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6056) Clean up NFS config settings

2014-05-27 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010265#comment-14010265
 ] 

Aaron T. Myers commented on HDFS-6056:
--

Hey Brandon, latest patch looks pretty good to me. Three comments for you:

# Seems like we should move {{IdUserGroup#NFS_STATIC_MAPPING_FILE_KEY}} out of 
{{IdUserGroup}} and put it with all the other config names.
# I still find it unfortunate that we now have some configs which just start 
with "nfs." and others which start with "dfs.nfs.". From the user's 
perspective, there's no good reason for this, since they'll only be using NFS 
to access HDFS, so I would anticipate user confusion about which configs do 
and do not start with "dfs.".
# I think there may be a bit of a problem with having the {{NfsConfiguration}} 
class in the hadoop-hdfs-nfs project, since it is also responsible for adding 
the {{DeprecationDeltas}} for NFS config settings which only exist in the 
hadoop-nfs (Common) project. This means that, though no such project exists 
today, if there were some other project which only depended upon the Common 
hadoop-nfs project, the config deprecations would not be loaded. This seems 
like it might be another argument in favor of moving all of this code into the 
single hadoop-hdfs-nfs project.

> Clean up NFS config settings
> 
>
> Key: HDFS-6056
> URL: https://issues.apache.org/jira/browse/HDFS-6056
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.3.0
>Reporter: Aaron T. Myers
>Assignee: Brandon Li
> Attachments: HDFS-6056.001.patch, HDFS-6056.002.patch, 
> HDFS-6056.003.patch, HDFS-6056.004.patch, HDFS-6056.005.patch, 
> HDFS-6056.006.patch, HDFS-6056.007.patch, HDFS-6056.008.patch
>
>
> As discussed on HDFS-6050, there's a few opportunities to improve the config 
> settings related to NFS. This JIRA is to implement those changes, which 
> include: moving hdfs-nfs related properties into hadoop-hdfs-nfs project, and 
> replacing 'nfs3' with 'nfs' in the property names.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6416) Use Time#monotonicNow in OpenFileCtx and OpenFileCtxCatch to avoid system clock bugs

2014-05-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010249#comment-14010249
 ] 

Hudson commented on HDFS-6416:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5612 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5612/])
HDFS-6416. Use Time#monotonicNow in OpenFileCtx and OpenFileCtxCatch to avoid 
system clock bugs. Contributed by Abhiraj Butala (brandonli: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1597868)
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtxCache.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
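
The change follows the usual pattern: intervals measured with 
System#currentTimeMillis can jump or stall when the wall clock is reset, 
while Time#monotonicNow (backed by System#nanoTime) cannot. A sketch of the 
pattern, not the patch itself:

{code:java}
import org.apache.hadoop.util.Time;

public class MonotonicIntervalSketch {
  public static void main(String[] args) throws InterruptedException {
    // Illustrative timeout check, similar in spirit to the OpenFileCtx
    // stream-timeout logic; not the actual patched code.
    long lastActivity = Time.monotonicNow();
    Thread.sleep(50);
    long idleMs = Time.monotonicNow() - lastActivity;  // immune to clock resets
    boolean expired = idleMs > 30000;
    System.out.println("idle " + idleMs + " ms, expired=" + expired);
  }
}
{code}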


> Use Time#monotonicNow in OpenFileCtx and OpenFileCtxCatch to avoid system 
> clock bugs
> 
>
> Key: HDFS-6416
> URL: https://issues.apache.org/jira/browse/HDFS-6416
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: nfs
>Affects Versions: 2.4.0
>Reporter: Brandon Li
>Assignee: Abhiraj Butala
>Priority: Minor
> Fix For: 2.5.0
>
> Attachments: HDFS-6416.patch
>
>
> As [~cnauroth]  pointed out in HADOOP-10612,  Time#monotonicNow is a more 
> preferred method to use since this isn't subject to system clock bugs (i.e. 
> Someone resets the clock to a time in the past, and then updates don't happen 
> for a long time.)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6441) Add ability to exclude/include few datanodes while balancing

2014-05-27 Thread Benoy Antony (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010250#comment-14010250
 ] 

Benoy Antony commented on HDFS-6441:


{quote}
What happens if neither option is given? It appears to maybe ignore all hosts?
{quote}
If neither option is given, all the nodes will be included; this is the 
behavior without this patch. The internal boolean variable (exclude) is set 
to true, but the list of nodes to exclude will be empty (see the sketch 
below).
{quote}
If both options are given, it appears to build a union of the include/exclude 
hosts, then use the last argument to determine if the union is exclude or not?
{quote}
If both options are given, the last option will be effective. 
{quote}
I seem to recall getHostName is (or used to be) a bit peculiar and can return a 
DN self-reported name, hence the getPeerHostName which is guaranteed to return 
the actual hostname. You should check and match the NN's behavior on use of 
peer name or reported name.
{quote}
I believe you are right. I'll check and test with _getPeerHostName_.
{quote}
_DEFALUT is misspelled
{quote}
I'll fix this.
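
A sketch of the include/exclude behavior just described, with hypothetical 
names:

{code:java}
import java.util.Collections;
import java.util.Set;

// Sketch only: hypothetical filter mirroring the behavior described above.
final class BalancerHostFilterSketch {
  private final Set<String> hosts;   // names read from the supplied hosts file
  private final boolean exclude;     // true: skip listed hosts;
                                     // false: balance only listed hosts

  BalancerHostFilterSketch(Set<String> hosts, boolean exclude) {
    this.hosts = hosts;
    this.exclude = exclude;
  }

  // With neither option given, hosts is empty and exclude is true,
  // so every datanode passes: the pre-patch behavior.
  static BalancerHostFilterSketch allowAll() {
    return new BalancerHostFilterSketch(Collections.<String>emptySet(), true);
  }

  boolean shouldBalance(String datanodeHost) {
    return exclude ? !hosts.contains(datanodeHost) : hosts.contains(datanodeHost);
  }
}
{code}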


> Add ability to exclude/include few datanodes while balancing
> 
>
> Key: HDFS-6441
> URL: https://issues.apache.org/jira/browse/HDFS-6441
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Affects Versions: 2.4.0
>Reporter: Benoy Antony
>Assignee: Benoy Antony
> Attachments: HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch, 
> HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch, 
> HDFS-6441.patch, HDFS-6441.patch
>
>
> In some use cases, it is desirable to ignore a few data nodes  while 
> balancing. The administrator should be able to specify a list of data nodes 
> in a file similar to the hosts file and the balancer should ignore these data 
> nodes while balancing so that no blocks are added/removed on these nodes.
> Similarly it will be beneficial to specify that only a particular list of 
> datanodes should be considered for balancing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6416) Use Time#monotonicNow in OpenFileCtx and OpenFileCtxCatch to avoid system clock bugs

2014-05-27 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010219#comment-14010219
 ] 

Brandon Li commented on HDFS-6416:
--

Thank you, [~abutala]. I've committed the patch.

> Use Time#monotonicNow in OpenFileCtx and OpenFileCtxCatch to avoid system 
> clock bugs
> 
>
> Key: HDFS-6416
> URL: https://issues.apache.org/jira/browse/HDFS-6416
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: nfs
>Affects Versions: 2.4.0
>Reporter: Brandon Li
>Assignee: Abhiraj Butala
>Priority: Minor
> Fix For: 2.5.0
>
> Attachments: HDFS-6416.patch
>
>
> As [~cnauroth]  pointed out in HADOOP-10612,  Time#monotonicNow is a more 
> preferred method to use since this isn't subject to system clock bugs (i.e. 
> Someone resets the clock to a time in the past, and then updates don't happen 
> for a long time.)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6416) Use Time#monotonicNow in OpenFileCtx and OpenFileCtxCatch to avoid system clock bugs

2014-05-27 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-6416:
-

Fix Version/s: 2.5.0

> Use Time#monotonicNow in OpenFileCtx and OpenFileCtxCatch to avoid system 
> clock bugs
> 
>
> Key: HDFS-6416
> URL: https://issues.apache.org/jira/browse/HDFS-6416
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: nfs
>Affects Versions: 2.4.0
>Reporter: Brandon Li
>Assignee: Abhiraj Butala
>Priority: Minor
> Fix For: 2.5.0
>
> Attachments: HDFS-6416.patch
>
>
> As [~cnauroth]  pointed out in HADOOP-10612,  Time#monotonicNow is a more 
> preferred method to use since this isn't subject to system clock bugs (i.e. 
> Someone resets the clock to a time in the past, and then updates don't happen 
> for a long time.)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6416) Use Time#monotonicNow in OpenFileCtx and OpenFileCtxCatch to avoid system clock bugs

2014-05-27 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-6416:
-

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

> Use Time#monotonicNow in OpenFileCtx and OpenFileCtxCatch to avoid system 
> clock bugs
> 
>
> Key: HDFS-6416
> URL: https://issues.apache.org/jira/browse/HDFS-6416
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: nfs
>Affects Versions: 2.4.0
>Reporter: Brandon Li
>Assignee: Abhiraj Butala
>Priority: Minor
> Fix For: 2.5.0
>
> Attachments: HDFS-6416.patch
>
>
> As [~cnauroth]  pointed out in HADOOP-10612,  Time#monotonicNow is a more 
> preferred method to use since this isn't subject to system clock bugs (i.e. 
> Someone resets the clock to a time in the past, and then updates don't happen 
> for a long time.)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6411) nfs-hdfs-gateway mount raises I/O error and hangs when a unauthorized user attempts to access it

2014-05-27 Thread Zhongyi Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010217#comment-14010217
 ] 

Zhongyi Xie commented on HDFS-6411:
---

[~brandonli], it looks good, but I haven't had a chance to test it out since 
my VM is broken today. I will run the test cases once my VM is back to normal 
and will let you know then, thanks!

> nfs-hdfs-gateway mount raises I/O error and hangs when a unauthorized user 
> attempts to access it
> 
>
> Key: HDFS-6411
> URL: https://issues.apache.org/jira/browse/HDFS-6411
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.2.0
>Reporter: Zhongyi Xie
>Assignee: Brandon Li
> Attachments: HDFS-6411-branch-2.2.patch, HDFS-6411.002.patch, 
> HDFS-6411.003.patch, HDFS-6411.004.patch, HDFS-6411.patch, 
> tcpdump-HDFS-6411-Brandon.out
>
>
> We use the nfs-hdfs gateway to expose hdfs thru nfs.
> 0) login as root, run nfs-hdfs gateway as a user, say, nfsserver. 
> [root@zhongyi-test-cluster-desktop hdfs]# ls /hdfs
> backups  hive  mr-history  system  tmp  user
> 1) add a user nfs-test: adduser nfs-test(make sure that this user is not a 
> proxyuser of nfsserver
> 2) switch to test user: su - nfs-test
> 3) access hdfs nfs gateway
> [nfs-test@zhongyi-test-cluster-desktop ~]$ ls /hdfs
> ls: cannot open directory /hdfs: Input/output error
> retry:
> [nfs-test@zhongyi-test-cluster-desktop ~]$ ls /hdfs
> ls: cannot access /hdfs: Stale NFS file handle
> 4) switch back to root and access hdfs nfs gateway
> [nfs-test@zhongyi-test-cluster-desktop ~]$ exit
> logout
> [root@zhongyi-test-cluster-desktop hdfs]# ls /hdfs
> ls: cannot access /hdfs: Stale NFS file handle
> the nfsserver log indicates we hit an authorization error in the rpc handler; 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException):
>  User: nfsserver is not allowed to impersonate nfs-test
> and NFS3ERR_IO is returned, which explains why we see input/output error. 
> One can catch the authorizationexception and return the correct error: 
> NFS3ERR_ACCES to fix the error message on the client side but that doesn't 
> seem to solve the mount hang issue though. When the mount hang happens, it 
> stops printing nfsserver log which makes it more difficult to figure out the 
> real cause of the hang. According to jstack and debugger, the nfsserver seems 
> to be waiting for client requests



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6411) nfs-hdfs-gateway mount raises I/O error and hangs when a unauthorized user attempts to access it

2014-05-27 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010214#comment-14010214
 ] 

Brandon Li commented on HDFS-6411:
--

[~zhongyi-altiscale], how does the new patch look?

> nfs-hdfs-gateway mount raises I/O error and hangs when a unauthorized user 
> attempts to access it
> 
>
> Key: HDFS-6411
> URL: https://issues.apache.org/jira/browse/HDFS-6411
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.2.0
>Reporter: Zhongyi Xie
>Assignee: Brandon Li
> Attachments: HDFS-6411-branch-2.2.patch, HDFS-6411.002.patch, 
> HDFS-6411.003.patch, HDFS-6411.004.patch, HDFS-6411.patch, 
> tcpdump-HDFS-6411-Brandon.out
>
>
> We use the nfs-hdfs gateway to expose hdfs thru nfs.
> 0) login as root, run nfs-hdfs gateway as a user, say, nfsserver. 
> [root@zhongyi-test-cluster-desktop hdfs]# ls /hdfs
> backups  hive  mr-history  system  tmp  user
> 1) add a user nfs-test: adduser nfs-test(make sure that this user is not a 
> proxyuser of nfsserver
> 2) switch to test user: su - nfs-test
> 3) access hdfs nfs gateway
> [nfs-test@zhongyi-test-cluster-desktop ~]$ ls /hdfs
> ls: cannot open directory /hdfs: Input/output error
> retry:
> [nfs-test@zhongyi-test-cluster-desktop ~]$ ls /hdfs
> ls: cannot access /hdfs: Stale NFS file handle
> 4) switch back to root and access hdfs nfs gateway
> [nfs-test@zhongyi-test-cluster-desktop ~]$ exit
> logout
> [root@zhongyi-test-cluster-desktop hdfs]# ls /hdfs
> ls: cannot access /hdfs: Stale NFS file handle
> the nfsserver log indicates we hit an authorization error in the rpc handler; 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException):
>  User: nfsserver is not allowed to impersonate nfs-test
> and NFS3ERR_IO is returned, which explains why we see input/output error. 
> One can catch the authorizationexception and return the correct error: 
> NFS3ERR_ACCES to fix the error message on the client side but that doesn't 
> seem to solve the mount hang issue though. When the mount hang happens, it 
> stops printing nfsserver log which makes it more difficult to figure out the 
> real cause of the hang. According to jstack and debugger, the nfsserver seems 
> to be waiting for client requests



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6450) Support non-positional hedged reads in HDFS

2014-05-27 Thread Colin Patrick McCabe (JIRA)
Colin Patrick McCabe created HDFS-6450:
--

 Summary: Support non-positional hedged reads in HDFS
 Key: HDFS-6450
 URL: https://issues.apache.org/jira/browse/HDFS-6450
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.4.0
Reporter: Colin Patrick McCabe


HDFS-5776 added support for hedged positional reads.  We should also support 
hedged non-positional reads (aka regular reads).
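
As background, a minimal sketch of the hedged-read idea itself: race a second 
replica if the first read is slow. The real client logic would also cancel 
the losing read and handle per-replica failures; none of the names below come 
from the HDFS code.

{code:java}
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class HedgedReadSketch {
  // Start the read on one replica; if it has not finished within
  // hedgeAfterMs, start the same read on a second replica and take
  // whichever completes first.
  static byte[] hedgedRead(ExecutorService pool,
                           Callable<byte[]> primaryReplicaRead,
                           Callable<byte[]> backupReplicaRead,
                           long hedgeAfterMs) throws Exception {
    CompletionService<byte[]> cs = new ExecutorCompletionService<byte[]>(pool);
    cs.submit(primaryReplicaRead);
    Future<byte[]> first = cs.poll(hedgeAfterMs, TimeUnit.MILLISECONDS);
    if (first != null) {
      return first.get();         // primary answered within the threshold
    }
    cs.submit(backupReplicaRead); // hedge: race a second replica
    return cs.take().get();       // first completed read wins
  }
}
{code}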



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6416) Use Time#monotonicNow in OpenFileCtx and OpenFileCtxCatch to avoid system clock bugs

2014-05-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010193#comment-14010193
 ] 

Hadoop QA commented on HDFS-6416:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646189/HDFS-6416.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs-nfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6987//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6987//console

This message is automatically generated.

> Use Time#monotonicNow in OpenFileCtx and OpenFileCtxCatch to avoid system 
> clock bugs
> 
>
> Key: HDFS-6416
> URL: https://issues.apache.org/jira/browse/HDFS-6416
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: nfs
>Affects Versions: 2.4.0
>Reporter: Brandon Li
>Assignee: Abhiraj Butala
>Priority: Minor
> Attachments: HDFS-6416.patch
>
>
> As [~cnauroth]  pointed out in HADOOP-10612,  Time#monotonicNow is a more 
> preferred method to use since this isn't subject to system clock bugs (i.e. 
> Someone resets the clock to a time in the past, and then updates don't happen 
> for a long time.)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HDFS-6379) HTTPFS - Implement ACLs support

2014-05-27 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur reassigned HDFS-6379:


Assignee: Mike Yoder  (was: Alejandro Abdelnur)

> HTTPFS - Implement ACLs support
> ---
>
> Key: HDFS-6379
> URL: https://issues.apache.org/jira/browse/HDFS-6379
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Alejandro Abdelnur
>Assignee: Mike Yoder
> Fix For: 2.4.0
>
>
> HDFS-4685 added ACLs support to WebHDFS but missed adding them to HttpFS.
> This JIRA is for such.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5682) Heterogeneous Storage phase 2 - APIs to expose Storage Types

2014-05-27 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010157#comment-14010157
 ] 

Arpit Agarwal commented on HDFS-5682:
-

Thanks for the feedback [~wuzesheng]. My responses are below.

bq.  1. About the storage type, because I didn't participate in the discussion 
in HDFS-2832, I am confused by the current storage types DISK and SSD. I think 
SSD is also one type of disk; DISK and SSD are not orthogonal. Can we change 
the storage types to HDD and SSD? This would be more straightforward.
Good point, I'll look into making the names clearer. In a subsequent revision 
of the API we would like to eliminate the hard-coded names from code altogether.

bq.  2. About setStorageTypeSpaceQuota/getStorageTypeSpaceQuota, these two 
names are not very natural. From the literal meaning, it sounds like 
setting/getting a space quota on some storage type rather than on some type of 
storage. I would suggest that setStorageSpaceQuota/getStorageSpaceQuota would 
be better. I am not a native English speaker; if I am wrong, just ignore this.
The function name should communicate that this is a disk space quota for a 
specific storage type, as opposed to the overall quotas which are set with 
{{setQuota}}. If the proposed name is hard to follow, how about 
{{get}}/{{setQuotaByStorageType}} (sketched below)? 

bq.  3. About the command line, hdfs dfsadmin -get(set)StorageTypeSpaceQuota, I 
think get(set)ting one storage type at a time is simple and straightforward; if 
we get(set) more than one at once, because there's no atomicity guarantee, it's 
complicated to handle failures.
Yes, I think we can simplify the command line as you suggested.

bq. 4. About the StoragePreference class, as you said in the design doc in 
HDFS-2832, in the future HDFS will support placing replicas on different 
storages, such as 1 on SSD and 2 on HDD. I would suggest that the 
StoragePreference class support specifying the storage type of each replica 
now; in this way, we can easily support the above feature in the future.
Let's defer this for now. The API and protocol can both be easily extended in a 
backwards compatible manner in the future without affecting existing 
applications.

bq.  5. About the create-file semantics, as you said in the doc "During file 
creation there must be sufficient quota to place at least one block times the 
replication factor on the target storage type, otherwise the request is failed 
immediately with QuotaExceededException", I think it would be more natural and 
friendly to first create the file on the default storage (HDD) if there's not 
enough space of the desired storage type, and then let the namenode replicate 
the block to the desired storage lazily when there's enough space available.
We have to differentiate between quota unavailability and disk space 
availability. The former will result in a quota violation exception; the 
latter will result in the behavior you described. We discuss the reasons for 
this in the HDFS-2832 design doc.
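
For concreteness, a purely hypothetical shape for the API under discussion; 
the names here were still being debated in this thread and nothing below is a 
committed interface:

{code:java}
import org.apache.hadoop.fs.Path;

// Hypothetical sketch of the per-storage-type quota API being discussed.
enum StorageTypeSketch { DISK, SSD }

interface QuotaByStorageTypeSketch {
  // Space quota for one storage type under a directory; distinct from
  // the overall quota managed by setQuota.
  void setQuotaByStorageType(Path dir, StorageTypeSketch type, long bytes);

  long getQuotaByStorageType(Path dir, StorageTypeSketch type);
}
{code}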

> Heterogeneous Storage phase 2 - APIs to expose Storage Types
> 
>
> Key: HDFS-5682
> URL: https://issues.apache.org/jira/browse/HDFS-5682
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Attachments: 20140522-Heterogeneous-Storages-API.pdf
>
>
> Phase 1 (HDFS-2832) added support to present the DataNode as a collection of 
> discrete storages of different types.
> This Jira is to track phase 2 of the Heterogeneous Storage work which 
> involves exposing Storage Types to applications and adding Quota Management 
> support for administrators.
> This phase will also include tools support for administrators/users.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6416) Use Time#monotonicNow in OpenFileCtx and OpenFileCtxCatch to avoid system clock bugs

2014-05-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010154#comment-14010154
 ] 

Hadoop QA commented on HDFS-6416:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646189/HDFS-6416.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs-nfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6986//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6986//console

This message is automatically generated.

> Use Time#monotonicNow in OpenFileCtx and OpenFileCtxCatch to avoid system 
> clock bugs
> 
>
> Key: HDFS-6416
> URL: https://issues.apache.org/jira/browse/HDFS-6416
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: nfs
>Affects Versions: 2.4.0
>Reporter: Brandon Li
>Assignee: Abhiraj Butala
>Priority: Minor
> Attachments: HDFS-6416.patch
>
>
> As [~cnauroth] pointed out in HADOOP-10612, Time#monotonicNow is the 
> preferred method to use since it isn't subject to system clock bugs (e.g., 
> someone resets the clock to a time in the past, and then updates don't happen 
> for a long time).
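
As a minimal illustration of the pattern this family of JIRAs applies (assuming 
{{org.apache.hadoop.util.Time}}, which provides {{monotonicNow()}}):

{code}
// Measuring elapsed time with a monotonic clock: a system clock reset
// cannot make the delta go negative or stall timed updates.
import org.apache.hadoop.util.Time;

public class MonotonicElapsed {
  public static void main(String[] args) throws InterruptedException {
    long start = Time.monotonicNow();   // not System.currentTimeMillis()
    Thread.sleep(100);                  // stand-in for the work being timed
    System.out.println("elapsed ~ " + (Time.monotonicNow() - start) + " ms");
  }
}
{code}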



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6447) balancer should timestamp the completion message

2014-05-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010138#comment-14010138
 ] 

Hadoop QA commented on HDFS-6447:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646925/HDFS-6447.patch.001
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6981//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6981//console

This message is automatically generated.

> balancer should timestamp the completion message
> 
>
> Key: HDFS-6447
> URL: https://issues.apache.org/jira/browse/HDFS-6447
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Reporter: Allen Wittenauer
>Assignee: Juan Yu
>Priority: Trivial
>  Labels: newbie
> Attachments: HDFS-6447.patch.001
>
>
> When the balancer finishes, it doesn't report the time it finished.  It 
> should do this so that users have a better sense of how long it took to 
> complete.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6447) balancer should timestamp the completion message

2014-05-27 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010147#comment-14010147
 ] 

Allen Wittenauer commented on HDFS-6447:


It might be nice to format it similarly to the rest of the balancer output, 
where the timestamp comes first.  But other than that, yup, this is pretty much 
what I'm looking to see added. :D
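
A minimal sketch of that formatting, assuming a {{java.text.DateFormat}} 
timestamp similar to the balancer's per-iteration rows; the message text and 
duration are illustrative, not the committed patch:

{code}
// Hedged sketch: print the completion message with a leading timestamp so it
// lines up with the balancer's other report lines.
import java.text.DateFormat;
import java.util.Date;

public class TimestampedExitMessage {
  public static void main(String[] args) {
    String now = DateFormat.getDateTimeInstance().format(new Date());
    System.out.println(now + "  Balancing took 42 seconds");  // sample duration
  }
}
{code}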

> balancer should timestamp the completion message
> 
>
> Key: HDFS-6447
> URL: https://issues.apache.org/jira/browse/HDFS-6447
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Reporter: Allen Wittenauer
>Assignee: Juan Yu
>Priority: Trivial
>  Labels: newbie
> Attachments: HDFS-6447.patch.001
>
>
> When the balancer finishes, it doesn't report the time it finished.  It 
> should do this so that users have a better sense of how long it took to 
> complete.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6411) nfs-hdfs-gateway mount raises I/O error and hangs when an unauthorized user attempts to access it

2014-05-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010090#comment-14010090
 ] 

Hadoop QA commented on HDFS-6411:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646956/HDFS-6411.004.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-nfs hadoop-hdfs-project/hadoop-hdfs-nfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6985//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6985//console

This message is automatically generated.

> nfs-hdfs-gateway mount raises I/O error and hangs when an unauthorized user 
> attempts to access it
> 
>
> Key: HDFS-6411
> URL: https://issues.apache.org/jira/browse/HDFS-6411
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.2.0
>Reporter: Zhongyi Xie
>Assignee: Brandon Li
> Attachments: HDFS-6411-branch-2.2.patch, HDFS-6411.002.patch, 
> HDFS-6411.003.patch, HDFS-6411.004.patch, HDFS-6411.patch, 
> tcpdump-HDFS-6411-Brandon.out
>
>
> We use the nfs-hdfs gateway to expose hdfs thru nfs.
> 0) login as root, run nfs-hdfs gateway as a user, say, nfsserver. 
> [root@zhongyi-test-cluster-desktop hdfs]# ls /hdfs
> backups  hive  mr-history  system  tmp  user
> 1) add a user nfs-test: adduser nfs-test (make sure that this user is not a 
> proxyuser of nfsserver)
> 2) switch to test user: su - nfs-test
> 3) access hdfs nfs gateway
> [nfs-test@zhongyi-test-cluster-desktop ~]$ ls /hdfs
> ls: cannot open directory /hdfs: Input/output error
> retry:
> [nfs-test@zhongyi-test-cluster-desktop ~]$ ls /hdfs
> ls: cannot access /hdfs: Stale NFS file handle
> 4) switch back to root and access hdfs nfs gateway
> [nfs-test@zhongyi-test-cluster-desktop ~]$ exit
> logout
> [root@zhongyi-test-cluster-desktop hdfs]# ls /hdfs
> ls: cannot access /hdfs: Stale NFS file handle
> the nfsserver log indicates we hit an authorization error in the rpc handler; 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException):
>  User: nfsserver is not allowed to impersonate nfs-test
> and NFS3ERR_IO is returned, which explains why we see the input/output error. 
> One can catch the AuthorizationException and return the correct error 
> (NFS3ERR_ACCES) to fix the error message on the client side, but that doesn't 
> seem to solve the mount hang issue. When the mount hang happens, the nfsserver 
> stops writing its log, which makes it more difficult to figure out the real 
> cause of the hang. According to jstack and the debugger, the nfsserver seems 
> to be waiting for client requests.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-3493) Replication is not happening for the block (which is recovered and finalized) to the Datanode which has the same block with an old generation timestamp in RBW

2014-05-27 Thread Juan Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010073#comment-14010073
 ] 

Juan Yu commented on HDFS-3493:
---

Hi Vinay,

Would you mind if I take it over and finish it?

Thanks,
Juan

> Replication is not happening for the block (which is recovered and 
> finalized) to the Datanode which has the same block with an old generation 
> timestamp in RBW
> -
>
> Key: HDFS-3493
> URL: https://issues.apache.org/jira/browse/HDFS-3493
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha, 2.0.5-alpha
>Reporter: J.Andreina
>Assignee: Vinayakumar B
> Attachments: HDFS-3493.patch
>
>
> replication factor = 3, block report interval = 1 min; start NN and 3 DNs
> Step 1: Write a file without closing it and do hflush (DN1, DN2, DN3 have blk_ts1)
> Step 2: Stop DN3
> Step 3: Recovery happens and the timestamp is updated (blk_ts2)
> Step 4: Close the file
> Step 5: blk_ts2 is finalized and available on DN1 and DN2
> Step 6: Now restart DN3 (which has blk_ts1 in RBW)
> From the NN side there is no command issued to DN3 to delete blk_ts1, but DN3 
> is asked to mark the block as corrupt.
> Replication of blk_ts2 to DN3 does not happen.
> NN logs:
> 
> {noformat}
> INFO org.apache.hadoop.hdfs.StateChange: BLOCK 
> NameSystem.addToCorruptReplicasMap: duplicate requested for 
> blk_3927215081484173742 to add as corrupt on XX.XX.XX.XX:50276 by 
> /XX.XX.XX.XX because reported RWR replica with genstamp 1007 does not match 
> COMPLETE block's genstamp in block map 1008
> INFO org.apache.hadoop.hdfs.StateChange: BLOCK* processReport: from 
> DatanodeRegistration(XX.XX.XX.XX, 
> storageID=DS-443871816-XX.XX.XX.XX-50276-1336829714197, infoPort=50275, 
> ipcPort=50277, 
> storageInfo=lv=-40;cid=CID-e654ac13-92dc-4f82-a22b-c0b6861d06d7;nsid=2063001898;c=0),
>  blocks: 2, processing time: 1 msecs
> INFO org.apache.hadoop.hdfs.StateChange: BLOCK* Removing block 
> blk_3927215081484173742_1008 from neededReplications as it has enough 
> replicas.
> INFO org.apache.hadoop.hdfs.StateChange: BLOCK 
> NameSystem.addToCorruptReplicasMap: duplicate requested for 
> blk_3927215081484173742 to add as corrupt on XX.XX.XX.XX:50276 by 
> /XX.XX.XX.XX because reported RWR replica with genstamp 1007 does not match 
> COMPLETE block's genstamp in block map 1008
> INFO org.apache.hadoop.hdfs.StateChange: BLOCK* processReport: from 
> DatanodeRegistration(XX.XX.XX.XX, 
> storageID=DS-443871816-XX.XX.XX.XX-50276-1336829714197, infoPort=50275, 
> ipcPort=50277, 
> storageInfo=lv=-40;cid=CID-e654ac13-92dc-4f82-a22b-c0b6861d06d7;nsid=2063001898;c=0),
>  blocks: 2, processing time: 1 msecs
> WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Not 
> able to place enough replicas, still in need of 1 to reach 1
> For more information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> {noformat}
> fsck Report
> ===
> {noformat}
> /file21:  Under replicated 
> BP-1008469586-XX.XX.XX.XX-1336829603103:blk_3927215081484173742_1008. Target 
> Replicas is 3 but found 2 replica(s).
> .Status: HEALTHY
>  Total size:  495 B
>  Total dirs:  1
>  Total files: 3
>  Total blocks (validated):3 (avg. block size 165 B)
>  Minimally replicated blocks: 3 (100.0 %)
>  Over-replicated blocks:  0 (0.0 %)
>  Under-replicated blocks: 1 (33.32 %)
>  Mis-replicated blocks:   0 (0.0 %)
>  Default replication factor:  1
>  Average block replication:   2.0
>  Corrupt blocks:  0
>  Missing replicas:1 (14.285714 %)
>  Number of data-nodes:3
>  Number of racks: 1
> FSCK ended at Sun May 13 09:49:05 IST 2012 in 9 milliseconds
> The filesystem under path '/' is HEALTHY
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL

2014-05-27 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010053#comment-14010053
 ] 

Colin Patrick McCabe commented on HDFS-6382:


bq. But if there's no internal cleanup mechanism in HDFS, all users (across 
companies) need to write their own cleanup tools, which is a lot of repeated 
work.

Like I said, we should write such a tool and add it to the base Hadoop 
distribution.  This is similar to what we did with {{DistCp}}.  Then users 
would not need to write their own versions of this stuff.

It's important to distinguish between creating a tool to handle deleting old 
files (which we all agree we should do), and putting this into the NameNode 
(which seems questionable).
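
For illustration, a minimal sketch of such a standalone tool using only the 
public {{FileSystem}} API; the root path and TTL value are assumptions taken 
from the use case in the description:

{code}
// Hedged sketch of a DistCp-style cleanup tool: delete files under a root
// whose modification time is older than a TTL, with no NameNode changes.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TtlCleaner {
  public static void main(String[] args) throws Exception {
    long ttlMillis = 30L * 24 * 60 * 60 * 1000;   // ~1 month, per the use case
    FileSystem fs = FileSystem.get(new Configuration());
    long cutoff = System.currentTimeMillis() - ttlMillis;
    for (FileStatus stat : fs.listStatus(new Path("/logs"))) {
      // Top-level files only; a real tool would recurse into directories.
      if (stat.isFile() && stat.getModificationTime() < cutoff) {
        fs.delete(stat.getPath(), false);         // non-recursive: files only
      }
    }
  }
}
{code}

Run periodically (e.g., from cron or Oozie), this would cover the log-expiry 
use case without putting a TTL policy inside the NameNode.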

> HDFS File/Directory TTL
> ---
>
> Key: HDFS-6382
> URL: https://issues.apache.org/jira/browse/HDFS-6382
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, namenode
>Affects Versions: 2.4.0
>Reporter: Zesheng Wu
>Assignee: Zesheng Wu
>
> In production environment, we always have scenario like this, we want to 
> backup files on hdfs for some time and then hope to delete these files 
> automatically. For example, we keep only 1 day's logs on local disk due to 
> limited disk space, but we need to keep about 1 month's logs in order to 
> debug program bugs, so we keep all the logs on hdfs and delete logs which are 
> older than 1 month. This is a typical scenario of HDFS TTL. So here we 
> propose that hdfs can support TTL.
> Following are some details of this proposal:
> 1. HDFS can support TTL on a specified file or directory
> 2. If a TTL is set on a file, the file will be deleted automatically after 
> the TTL is expired
> 3. If a TTL is set on a directory, the child files and directories will be 
> deleted automatically after the TTL is expired
> 4. The child file/directory's TTL configuration should override its parent 
> directory's
> 5. A global configuration is needed to configure that whether the deleted 
> files/directories should go to the trash or not
> 6. A global configuration is needed to configure that whether a directory 
> with TTL should be deleted when it is emptied by TTL mechanism or not.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6411) nfs-hdfs-gateway mount raises I/O error and hangs when an unauthorized user attempts to access it

2014-05-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010034#comment-14010034
 ] 

Hadoop QA commented on HDFS-6411:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646945/HDFS-6411.003.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-nfs hadoop-hdfs-project/hadoop-hdfs-nfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6982//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6982//console

This message is automatically generated.

> nfs-hdfs-gateway mount raises I/O error and hangs when an unauthorized user 
> attempts to access it
> 
>
> Key: HDFS-6411
> URL: https://issues.apache.org/jira/browse/HDFS-6411
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.2.0
>Reporter: Zhongyi Xie
>Assignee: Brandon Li
> Attachments: HDFS-6411-branch-2.2.patch, HDFS-6411.002.patch, 
> HDFS-6411.003.patch, HDFS-6411.004.patch, HDFS-6411.patch, 
> tcpdump-HDFS-6411-Brandon.out
>
>
> We use the nfs-hdfs gateway to expose hdfs thru nfs.
> 0) login as root, run nfs-hdfs gateway as a user, say, nfsserver. 
> [root@zhongyi-test-cluster-desktop hdfs]# ls /hdfs
> backups  hive  mr-history  system  tmp  user
> 1) add a user nfs-test: adduser nfs-test (make sure that this user is not a 
> proxyuser of nfsserver)
> 2) switch to test user: su - nfs-test
> 3) access hdfs nfs gateway
> [nfs-test@zhongyi-test-cluster-desktop ~]$ ls /hdfs
> ls: cannot open directory /hdfs: Input/output error
> retry:
> [nfs-test@zhongyi-test-cluster-desktop ~]$ ls /hdfs
> ls: cannot access /hdfs: Stale NFS file handle
> 4) switch back to root and access hdfs nfs gateway
> [nfs-test@zhongyi-test-cluster-desktop ~]$ exit
> logout
> [root@zhongyi-test-cluster-desktop hdfs]# ls /hdfs
> ls: cannot access /hdfs: Stale NFS file handle
> the nfsserver log indicates we hit an authorization error in the rpc handler; 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException):
>  User: nfsserver is not allowed to impersonate nfs-test
> and NFS3ERR_IO is returned, which explains why we see the input/output error. 
> One can catch the AuthorizationException and return the correct error 
> (NFS3ERR_ACCES) to fix the error message on the client side, but that doesn't 
> seem to solve the mount hang issue. When the mount hang happens, the nfsserver 
> stops writing its log, which makes it more difficult to figure out the real 
> cause of the hang. According to jstack and the debugger, the nfsserver seems 
> to be waiting for client requests.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6286) adding a timeout setting for local read io

2014-05-27 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010032#comment-14010032
 ] 

Colin Patrick McCabe commented on HDFS-6286:


I understand your motivation here, but I'm afraid I am -1 on this at the 
moment.  There is a high overhead to adding communication between threads to 
every {{read}}, and I don't think we want this in short-circuit reads (which is 
an optimization, after all).

Any way you look at this, it is problematic.  If we create an extra thread per 
DFSInputStream using SCR, we might completely blow the thread budget of an 
application like HBase; that would be hundreds or thousands of extra threads 
(since HBase has a lot of open local files).  If we have a fixed-size thread 
pool, slow disks will cause the thread pool to grind to a halt and bottleneck 
system performance.

I am open to ideas here, but I just can't see a way to resolve those problems.  
Maybe I am missing something.  In the meantime, I am going to create a JIRA to 
implement hedged reads for the non-pread case.  I think that will be a better 
general solution that doesn't have the above-mentioned problems.
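
For reference, a minimal sketch of the future-based timeout the JIRA proposes, 
which also shows the per-read thread handoff the objection above is about; all 
names here are illustrative, not HDFS code:

{code}
// Hedged sketch: wrapping each read in a Future gives a timeout, but every
// read now pays a cross-thread handoff, and a slow disk ties up pool threads.
import java.io.InputStream;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedLocalRead {
  private final ExecutorService pool = Executors.newFixedThreadPool(4);

  int readWithTimeout(final InputStream in, final byte[] buf, long timeoutMs)
      throws Exception {
    Future<Integer> f = pool.submit(new Callable<Integer>() {
      public Integer call() throws Exception {
        return in.read(buf, 0, buf.length);   // the potentially slow disk read
      }
    });
    try {
      return f.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      f.cancel(true);  // note: cannot truly interrupt a blocked disk read
      throw e;         // caller could then mark the local node as dead
    }
  }
}
{code}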

> adding a timeout setting for local read io
> --
>
> Key: HDFS-6286
> URL: https://issues.apache.org/jira/browse/HDFS-6286
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Liang Xie
>Assignee: Liang Xie
>
> Currently, if a write or remote read is issued to a sick disk, 
> DFSClient.hdfsTimeout can give the caller a guaranteed bound on the time spent 
> before returning, but it doesn't work for local reads. Take an HBase scan for example:
> DFSInputStream.read -> readWithStrategy -> readBuffer -> 
> BlockReaderLocal.read ->  dataIn.read -> FileChannelImpl.read
> if it hits a bad disk, the low-level read io can take tens of seconds, and 
> what's worse, the "DFSInputStream.read" holds a lock the whole time.
> To my knowledge, there's no good mechanism to cancel a running read 
> io (please correct me if that's wrong), so my suggestion is to add a future 
> around the read request and set a timeout on it; if the threshold is reached, 
> we could probably add the local node to the dead-node list...
> Any thoughts?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6411) nfs-hdfs-gateway mount raises I/O error and hangs when an unauthorized user attempts to access it

2014-05-27 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-6411:
-

Attachment: HDFS-6411.004.patch

> nfs-hdfs-gateway mount raises I/O error and hangs when an unauthorized user 
> attempts to access it
> 
>
> Key: HDFS-6411
> URL: https://issues.apache.org/jira/browse/HDFS-6411
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.2.0
>Reporter: Zhongyi Xie
>Assignee: Brandon Li
> Attachments: HDFS-6411-branch-2.2.patch, HDFS-6411.002.patch, 
> HDFS-6411.003.patch, HDFS-6411.004.patch, HDFS-6411.patch, 
> tcpdump-HDFS-6411-Brandon.out
>
>
> We use the nfs-hdfs gateway to expose hdfs thru nfs.
> 0) login as root, run nfs-hdfs gateway as a user, say, nfsserver. 
> [root@zhongyi-test-cluster-desktop hdfs]# ls /hdfs
> backups  hive  mr-history  system  tmp  user
> 1) add a user nfs-test: adduser nfs-test (make sure that this user is not a 
> proxyuser of nfsserver)
> 2) switch to test user: su - nfs-test
> 3) access hdfs nfs gateway
> [nfs-test@zhongyi-test-cluster-desktop ~]$ ls /hdfs
> ls: cannot open directory /hdfs: Input/output error
> retry:
> [nfs-test@zhongyi-test-cluster-desktop ~]$ ls /hdfs
> ls: cannot access /hdfs: Stale NFS file handle
> 4) switch back to root and access hdfs nfs gateway
> [nfs-test@zhongyi-test-cluster-desktop ~]$ exit
> logout
> [root@zhongyi-test-cluster-desktop hdfs]# ls /hdfs
> ls: cannot access /hdfs: Stale NFS file handle
> the nfsserver log indicates we hit an authorization error in the rpc handler; 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException):
>  User: nfsserver is not allowed to impersonate nfs-test
> and NFS3ERR_IO is returned, which explains why we see the input/output error. 
> One can catch the AuthorizationException and return the correct error 
> (NFS3ERR_ACCES) to fix the error message on the client side, but that doesn't 
> seem to solve the mount hang issue. When the mount hang happens, the nfsserver 
> stops writing its log, which makes it more difficult to figure out the real 
> cause of the hang. According to jstack and the debugger, the nfsserver seems 
> to be waiting for client requests.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6411) nfs-hdfs-gateway mount raises I/O error and hangs when an unauthorized user attempts to access it

2014-05-27 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010011#comment-14010011
 ] 

Brandon Li commented on HDFS-6411:
--

Uploaded the patch to address Zhongyi's comments. Thanks!

> nfs-hdfs-gateway mount raises I/O error and hangs when an unauthorized user 
> attempts to access it
> 
>
> Key: HDFS-6411
> URL: https://issues.apache.org/jira/browse/HDFS-6411
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.2.0
>Reporter: Zhongyi Xie
>Assignee: Brandon Li
> Attachments: HDFS-6411-branch-2.2.patch, HDFS-6411.002.patch, 
> HDFS-6411.003.patch, HDFS-6411.004.patch, HDFS-6411.patch, 
> tcpdump-HDFS-6411-Brandon.out
>
>
> We use the nfs-hdfs gateway to expose hdfs thru nfs.
> 0) login as root, run nfs-hdfs gateway as a user, say, nfsserver. 
> [root@zhongyi-test-cluster-desktop hdfs]# ls /hdfs
> backups  hive  mr-history  system  tmp  user
> 1) add a user nfs-test: adduser nfs-test (make sure that this user is not a 
> proxyuser of nfsserver)
> 2) switch to test user: su - nfs-test
> 3) access hdfs nfs gateway
> [nfs-test@zhongyi-test-cluster-desktop ~]$ ls /hdfs
> ls: cannot open directory /hdfs: Input/output error
> retry:
> [nfs-test@zhongyi-test-cluster-desktop ~]$ ls /hdfs
> ls: cannot access /hdfs: Stale NFS file handle
> 4) switch back to root and access hdfs nfs gateway
> [nfs-test@zhongyi-test-cluster-desktop ~]$ exit
> logout
> [root@zhongyi-test-cluster-desktop hdfs]# ls /hdfs
> ls: cannot access /hdfs: Stale NFS file handle
> the nfsserver log indicates we hit an authorization error in the rpc handler; 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException):
>  User: nfsserver is not allowed to impersonate nfs-test
> and NFS3ERR_IO is returned, which explains why we see the input/output error. 
> One can catch the AuthorizationException and return the correct error 
> (NFS3ERR_ACCES) to fix the error message on the client side, but that doesn't 
> seem to solve the mount hang issue. When the mount hang happens, the nfsserver 
> stops writing its log, which makes it more difficult to figure out the real 
> cause of the hang. According to jstack and the debugger, the nfsserver seems 
> to be waiting for client requests.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6222) Remove background token renewer from webhdfs

2014-05-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010007#comment-14010007
 ] 

Hadoop QA commented on HDFS-6222:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646923/HDFS-6222.trunk.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6980//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6980//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6980//console

This message is automatically generated.

> Remove background token renewer from webhdfs
> 
>
> Key: HDFS-6222
> URL: https://issues.apache.org/jira/browse/HDFS-6222
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: HDFS-6222.branch-2.patch, HDFS-6222.trunk.patch
>
>
> The background token renewer is a source of problems for long-running 
> daemons.  Webhdfs should lazy fetch a new token when it receives an 
> InvalidToken exception.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6411) nfs-hdfs-gateway mount raises I/O error and hangs when an unauthorized user attempts to access it

2014-05-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010008#comment-14010008
 ] 

Hadoop QA commented on HDFS-6411:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646945/HDFS-6411.003.patch
  against trunk revision .

{color:red}-1 patch{color}.  Trunk compilation may be broken.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6984//console

This message is automatically generated.

> nfs-hdfs-gateway mount raises I/O error and hangs when an unauthorized user 
> attempts to access it
> 
>
> Key: HDFS-6411
> URL: https://issues.apache.org/jira/browse/HDFS-6411
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.2.0
>Reporter: Zhongyi Xie
>Assignee: Brandon Li
> Attachments: HDFS-6411-branch-2.2.patch, HDFS-6411.002.patch, 
> HDFS-6411.003.patch, HDFS-6411.patch, tcpdump-HDFS-6411-Brandon.out
>
>
> We use the nfs-hdfs gateway to expose hdfs thru nfs.
> 0) login as root, run nfs-hdfs gateway as a user, say, nfsserver. 
> [root@zhongyi-test-cluster-desktop hdfs]# ls /hdfs
> backups  hive  mr-history  system  tmp  user
> 1) add a user nfs-test: adduser nfs-test (make sure that this user is not a 
> proxyuser of nfsserver)
> 2) switch to test user: su - nfs-test
> 3) access hdfs nfs gateway
> [nfs-test@zhongyi-test-cluster-desktop ~]$ ls /hdfs
> ls: cannot open directory /hdfs: Input/output error
> retry:
> [nfs-test@zhongyi-test-cluster-desktop ~]$ ls /hdfs
> ls: cannot access /hdfs: Stale NFS file handle
> 4) switch back to root and access hdfs nfs gateway
> [nfs-test@zhongyi-test-cluster-desktop ~]$ exit
> logout
> [root@zhongyi-test-cluster-desktop hdfs]# ls /hdfs
> ls: cannot access /hdfs: Stale NFS file handle
> the nfsserver log indicates we hit an authorization error in the rpc handler; 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException):
>  User: nfsserver is not allowed to impersonate nfs-test
> and NFS3ERR_IO is returned, which explains why we see the input/output error. 
> One can catch the AuthorizationException and return the correct error 
> (NFS3ERR_ACCES) to fix the error message on the client side, but that doesn't 
> seem to solve the mount hang issue. When the mount hang happens, the nfsserver 
> stops writing its log, which makes it more difficult to figure out the real 
> cause of the hang. According to jstack and the debugger, the nfsserver seems 
> to be waiting for client requests.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6448) change BlockReaderLocalLegacy timeout detail

2014-05-27 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010003#comment-14010003
 ] 

Colin Patrick McCabe commented on HDFS-6448:


Socket timeout seems reasonable to me.  DFSInputStream uses socketTimeout to 
get a proxy to talk to the DN, in code like this:

{code}
  /** Read the block length from one of the datanodes. */
  private long readBlockLength(LocatedBlock locatedblock) throws IOException {
...
  try {
cdp = DFSUtil.createClientDatanodeProtocolProxy(datanode,
dfsClient.getConfiguration(), dfsClient.getConf().socketTimeout,
dfsClient.getConf().connectToDnViaHostname, locatedblock);
{code}

So I am +1 on this patch.

bq. yes, we deployed hadoop2.0, where only the legacy HDFS-2246 is available. I 
took a quick look at the HDFS-347 SCR code while making the patch and did not 
find the same issue (to be honest, I am not familiar with this piece of code, so 
probably I just missed it). I think Colin Patrick McCabe has the exact answer

Just as a note, we kept around the legacy block reader local only because 
HDFS-347 wasn't implemented on Windows.  If you are not using Windows, then I 
would recommend upgrading and using the new one ASAP... HDFS-2246 has a lot of 
problems besides this (its failure handling code is fairly buggy, especially in 
older releases.)

bq. Do you know if this is only an issue in the HDFS-2246 SCR? Is it present in 
HDFS-347 SCRs?

HDFS-347 uses {{socketTimeout}}.  The relevant code is in 
{{BlockReaderFactory#nextDomainPeer}}.

> change BlockReaderLocalLegacy timeout detail
> 
>
> Key: HDFS-6448
> URL: https://issues.apache.org/jira/browse/HDFS-6448
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6448.txt
>
>
> Our HBase is deployed on hadoop2.0. In one incident, we hit HDFS-5016 on the 
> HDFS side, but we also found, on the HBase side, that the dfs client was hung 
> at getBlockReader. After reading the code, we found there is a timeout setting 
> in the current codebase, but the default hdfsTimeout value is "-1" (from 
> Client.java:getTimeout(conf)), which means no timeout...
> The hung stack trace looks like the following:
> at $Proxy21.getBlockLocalPathInfo(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientDatanodeProtocolTranslatorPB.getBlockLocalPathInfo(ClientDatanodeProtocolTranslatorPB.java:215)
> at 
> org.apache.hadoop.hdfs.BlockReaderLocal.getBlockPathInfo(BlockReaderLocal.java:267)
> at 
> org.apache.hadoop.hdfs.BlockReaderLocal.newBlockReader(BlockReaderLocal.java:180)
> at org.apache.hadoop.hdfs.DFSClient.getLocalBlockReader(DFSClient.java:812)
> One feasible fix is replacing hdfsTimeout with socketTimeout; see the 
> attached patch. Most of the credit should go to [~liushaohui].



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HDFS-6449) Incorrect counting in ContentSummaryComputationContext in 0.23.

2014-05-27 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee resolved HDFS-6449.
--

   Resolution: Fixed
Fix Version/s: 0.23.11
 Hadoop Flags: Reviewed

Thanks for the review, Daryn. I've committed this to branch-0.23.

> Incorrect counting in ContentSummaryComputationContext in 0.23.
> ---
>
> Key: HDFS-6449
> URL: https://issues.apache.org/jira/browse/HDFS-6449
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.23.10
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Fix For: 0.23.11
>
> Attachments: HDFS-6449.branch-0.23.patch
>
>
> In {{ContentSummaryComputationContext}}, the content counting in {{yield()}} 
> is incorrect. The result is still correct, but it ends up yielding more 
> frequently. Trunk and branch-2 do not have this bug.
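
For readers without the patch at hand, a minimal sketch of the counting-and-yield 
pattern involved; the field and method names are illustrative, not the actual 
{{ContentSummaryComputationContext}} code:

{code}
// Hedged sketch: accumulate work done since the last yield and release the
// lock only once a threshold is crossed. Miscounting this accumulator is
// what makes the code yield more often than intended.
public class YieldingCounter {
  private final long limitPerLock;
  private long sinceLastYield = 0;

  public YieldingCounter(long limitPerLock) {
    this.limitPerLock = limitPerLock;
  }

  /** Returns true when the caller should release and reacquire the lock. */
  public boolean yieldIfNeeded(long itemsProcessed) {
    sinceLastYield += itemsProcessed;
    if (sinceLastYield < limitPerLock) {
      return false;
    }
    sinceLastYield = 0;
    return true;
  }
}
{code}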



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6227) ShortCircuitCache#unref should purge ShortCircuitReplicas whose streams have been closed by java interrupts

2014-05-27 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009967#comment-14009967
 ] 

Colin Patrick McCabe commented on HDFS-6227:


Thanks, Jing.  Test failure appears to be HDFS-6257, not related.

> ShortCircuitCache#unref should purge ShortCircuitReplicas whose streams have 
> been closed by java interrupts
> ---
>
> Key: HDFS-6227
> URL: https://issues.apache.org/jira/browse/HDFS-6227
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Jing Zhao
>Assignee: Colin Patrick McCabe
> Fix For: 2.5.0
>
> Attachments: HDFS-6227.000.patch, HDFS-6227.001.patch, 
> HDFS-6227.002.patch, ShortCircuitReadInterruption.test.patch
>
>
> While running tests in a single-node cluster, where short-circuit read is 
> enabled and multiple threads may read the same file concurrently, one of the 
> reads got ClosedChannelException and failed. See the comment for the full 
> exception trace.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6411) nfs-hdfs-gateway mount raises I/O error and hangs when an unauthorized user attempts to access it

2014-05-27 Thread Zhongyi Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009985#comment-14009985
 ] 

Zhongyi Xie commented on HDFS-6411:
---

[~brandonli], can you please also add an "else" clause in the getattr function, 
like you did in access, just in case the unwrapped exception happens to be 
something other than AuthorizationException? 
There is another related issue with fsstat, where NFS3ERR_IO could also be 
replaced with NFS3ERR_ACCES.
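
A minimal sketch of the unwrap-and-map pattern being requested; the method and 
inlined status constants are illustrative (the real code uses {{Nfs3Status}}), 
not the actual RpcProgramNfs3 getattr:

{code}
// Hedged sketch: unwrap a RemoteException and map AuthorizationException to
// an access error, with an "else" path so anything else stays an I/O error.
import java.io.IOException;
import org.apache.hadoop.ipc.RemoteException;
import org.apache.hadoop.security.authorize.AuthorizationException;

public class NfsErrorMapping {
  static final int NFS3ERR_IO = 5;      // NFSv3 status codes per RFC 1813
  static final int NFS3ERR_ACCES = 13;

  static int mapToNfsStatus(IOException e) {
    IOException unwrapped = e;
    if (e instanceof RemoteException) {
      unwrapped = ((RemoteException) e).unwrapRemoteException();
    }
    if (unwrapped instanceof AuthorizationException) {
      return NFS3ERR_ACCES;   // unauthorized caller: access error, not I/O
    }
    return NFS3ERR_IO;        // the "else" clause for everything else
  }
}
{code}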

> nfs-hdfs-gateway mount raises I/O error and hangs when an unauthorized user 
> attempts to access it
> 
>
> Key: HDFS-6411
> URL: https://issues.apache.org/jira/browse/HDFS-6411
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.2.0
>Reporter: Zhongyi Xie
>Assignee: Brandon Li
> Attachments: HDFS-6411-branch-2.2.patch, HDFS-6411.002.patch, 
> HDFS-6411.003.patch, HDFS-6411.patch, tcpdump-HDFS-6411-Brandon.out
>
>
> We use the nfs-hdfs gateway to expose hdfs thru nfs.
> 0) login as root, run nfs-hdfs gateway as a user, say, nfsserver. 
> [root@zhongyi-test-cluster-desktop hdfs]# ls /hdfs
> backups  hive  mr-history  system  tmp  user
> 1) add a user nfs-test: adduser nfs-test (make sure that this user is not a 
> proxyuser of nfsserver)
> 2) switch to test user: su - nfs-test
> 3) access hdfs nfs gateway
> [nfs-test@zhongyi-test-cluster-desktop ~]$ ls /hdfs
> ls: cannot open directory /hdfs: Input/output error
> retry:
> [nfs-test@zhongyi-test-cluster-desktop ~]$ ls /hdfs
> ls: cannot access /hdfs: Stale NFS file handle
> 4) switch back to root and access hdfs nfs gateway
> [nfs-test@zhongyi-test-cluster-desktop ~]$ exit
> logout
> [root@zhongyi-test-cluster-desktop hdfs]# ls /hdfs
> ls: cannot access /hdfs: Stale NFS file handle
> the nfsserver log indicates we hit an authorization error in the rpc handler; 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException):
>  User: nfsserver is not allowed to impersonate nfs-test
> and NFS3ERR_IO is returned, which explains why we see the input/output error. 
> One can catch the AuthorizationException and return the correct error 
> (NFS3ERR_ACCES) to fix the error message on the client side, but that doesn't 
> seem to solve the mount hang issue. When the mount hang happens, the nfsserver 
> stops writing its log, which makes it more difficult to figure out the real 
> cause of the hang. According to jstack and the debugger, the nfsserver seems 
> to be waiting for client requests.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6411) nfs-hdfs-gateway mount raises I/O error and hangs when an unauthorized user attempts to access it

2014-05-27 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009971#comment-14009971
 ] 

Brandon Li commented on HDFS-6411:
--

Thank you, [~aw] and [~zhongyi-altiscale]. 
The error message printed by the shell is always "permission denied" in my tests 
with either NFS3ERR_ACCES or NFS3ERR_PERM.

Regardless, I agree that NFS3ERR_ACCES is a better error status than 
NFS3ERR_PERM in this case. I've uploaded a new patch with the update.



> nfs-hdfs-gateway mount raises I/O error and hangs when an unauthorized user 
> attempts to access it
> 
>
> Key: HDFS-6411
> URL: https://issues.apache.org/jira/browse/HDFS-6411
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.2.0
>Reporter: Zhongyi Xie
>Assignee: Brandon Li
> Attachments: HDFS-6411-branch-2.2.patch, HDFS-6411.002.patch, 
> HDFS-6411.003.patch, HDFS-6411.patch, tcpdump-HDFS-6411-Brandon.out
>
>
> We use the nfs-hdfs gateway to expose hdfs thru nfs.
> 0) login as root, run nfs-hdfs gateway as a user, say, nfsserver. 
> [root@zhongyi-test-cluster-desktop hdfs]# ls /hdfs
> backups  hive  mr-history  system  tmp  user
> 1) add a user nfs-test: adduser nfs-test (make sure that this user is not a 
> proxyuser of nfsserver)
> 2) switch to test user: su - nfs-test
> 3) access hdfs nfs gateway
> [nfs-test@zhongyi-test-cluster-desktop ~]$ ls /hdfs
> ls: cannot open directory /hdfs: Input/output error
> retry:
> [nfs-test@zhongyi-test-cluster-desktop ~]$ ls /hdfs
> ls: cannot access /hdfs: Stale NFS file handle
> 4) switch back to root and access hdfs nfs gateway
> [nfs-test@zhongyi-test-cluster-desktop ~]$ exit
> logout
> [root@zhongyi-test-cluster-desktop hdfs]# ls /hdfs
> ls: cannot access /hdfs: Stale NFS file handle
> the nfsserver log indicates we hit an authorization error in the rpc handler; 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException):
>  User: nfsserver is not allowed to impersonate nfs-test
> and NFS3ERR_IO is returned, which explains why we see the input/output error. 
> One can catch the AuthorizationException and return the correct error 
> (NFS3ERR_ACCES) to fix the error message on the client side, but that doesn't 
> seem to solve the mount hang issue. When the mount hang happens, the nfsserver 
> stops writing its log, which makes it more difficult to figure out the real 
> cause of the hang. According to jstack and the debugger, the nfsserver seems 
> to be waiting for client requests.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6449) Incorrect counting in ContentSummaryComputationContext in 0.23.

2014-05-27 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009962#comment-14009962
 ] 

Daryn Sharp commented on HDFS-6449:
---

+1  Proven to work since this is the version of the patch run internally.

> Incorrect counting in ContentSummaryComputationContext in 0.23.
> ---
>
> Key: HDFS-6449
> URL: https://issues.apache.org/jira/browse/HDFS-6449
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.23.10
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-6449.branch-0.23.patch
>
>
> In {{ContentSummaryComputationContext}}, the content counting in {{yield()}} 
> is incorrect. The result is still correct, but it ends up yielding more 
> frequently. Trunk and branch-2 do not have this bug.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6447) balancer should timestamp the completion message

2014-05-27 Thread Juan Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009960#comment-14009960
 ] 

Juan Yu commented on HDFS-6447:
---

Oops, sorry about the patch name.

> balancer should timestamp the completion message
> 
>
> Key: HDFS-6447
> URL: https://issues.apache.org/jira/browse/HDFS-6447
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Reporter: Allen Wittenauer
>Assignee: Juan Yu
>Priority: Trivial
>  Labels: newbie
> Attachments: HDFS-6447.patch.001
>
>
> When the balancer finishes, it doesn't report the time it finished.  It 
> should do this so that users have a better sense of how long it took to 
> complete.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6227) ShortCircuitCache#unref should purge ShortCircuitReplicas whose streams have been closed by java interrupts

2014-05-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009955#comment-14009955
 ] 

Hudson commented on HDFS-6227:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5611 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5611/])
HDFS-6227. ShortCircuitCache#unref should purge ShortCircuitReplicas whose 
streams have been closed by java interrupts. Contributed by Colin Patrick 
McCabe. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1597829)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/shortcircuit/ShortCircuitCache.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderFactory.java


> ShortCircuitCache#unref should purge ShortCircuitReplicas whose streams have 
> been closed by java interrupts
> ---
>
> Key: HDFS-6227
> URL: https://issues.apache.org/jira/browse/HDFS-6227
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Jing Zhao
>Assignee: Colin Patrick McCabe
> Fix For: 2.5.0
>
> Attachments: HDFS-6227.000.patch, HDFS-6227.001.patch, 
> HDFS-6227.002.patch, ShortCircuitReadInterruption.test.patch
>
>
> While running tests in a single-node cluster, where short-circuit read is 
> enabled and multiple threads may read the same file concurrently, one of the 
> reads got ClosedChannelException and failed. See the comment for the full 
> exception trace.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6411) nfs-hdfs-gateway mount raises I/O error and hangs when an unauthorized user attempts to access it

2014-05-27 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-6411:
-

Attachment: HDFS-6411.003.patch

> nfs-hdfs-gateway mount raises I/O error and hangs when an unauthorized user 
> attempts to access it
> 
>
> Key: HDFS-6411
> URL: https://issues.apache.org/jira/browse/HDFS-6411
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.2.0
>Reporter: Zhongyi Xie
>Assignee: Brandon Li
> Attachments: HDFS-6411-branch-2.2.patch, HDFS-6411.002.patch, 
> HDFS-6411.003.patch, HDFS-6411.patch, tcpdump-HDFS-6411-Brandon.out
>
>
> We use the nfs-hdfs gateway to expose hdfs thru nfs.
> 0) login as root, run nfs-hdfs gateway as a user, say, nfsserver. 
> [root@zhongyi-test-cluster-desktop hdfs]# ls /hdfs
> backups  hive  mr-history  system  tmp  user
> 1) add a user nfs-test: adduser nfs-test (make sure that this user is not a 
> proxyuser of nfsserver)
> 2) switch to test user: su - nfs-test
> 3) access hdfs nfs gateway
> [nfs-test@zhongyi-test-cluster-desktop ~]$ ls /hdfs
> ls: cannot open directory /hdfs: Input/output error
> retry:
> [nfs-test@zhongyi-test-cluster-desktop ~]$ ls /hdfs
> ls: cannot access /hdfs: Stale NFS file handle
> 4) switch back to root and access hdfs nfs gateway
> [nfs-test@zhongyi-test-cluster-desktop ~]$ exit
> logout
> [root@zhongyi-test-cluster-desktop hdfs]# ls /hdfs
> ls: cannot access /hdfs: Stale NFS file handle
> the nfsserver log indicates we hit an authorization error in the rpc handler; 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException):
>  User: nfsserver is not allowed to impersonate nfs-test
> and NFS3ERR_IO is returned, which explains why we see the input/output error. 
> One can catch the AuthorizationException and return the correct error 
> (NFS3ERR_ACCES) to fix the error message on the client side, but that doesn't 
> seem to solve the mount hang issue. When the mount hang happens, the nfsserver 
> stops writing its log, which makes it more difficult to figure out the real 
> cause of the hang. According to jstack and the debugger, the nfsserver seems 
> to be waiting for client requests.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6447) balancer should timestamp the completion message

2014-05-27 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009933#comment-14009933
 ] 

Andrew Wang commented on HDFS-6447:
---

+1 LGTM, thanks Juan. Typically we name the patches such that they end in 
.patch, e.g. "hdfs-6447.001.patch", but that's a nit :)

[~aw], is this basically what you had in mind?

> balancer should timestamp the completion message
> 
>
> Key: HDFS-6447
> URL: https://issues.apache.org/jira/browse/HDFS-6447
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Reporter: Allen Wittenauer
>Assignee: Juan Yu
>Priority: Trivial
>  Labels: newbie
> Attachments: HDFS-6447.patch.001
>
>
> When the balancer finishes, it doesn't report the time it finished.  It 
> should do this so that users have a better sense of how long it took to 
> complete.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6227) ShortCircuitCache#unref should purge ShortCircuitReplicas whose streams have been closed by java interrupts

2014-05-27 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-6227:


   Resolution: Fixed
Fix Version/s: 2.5.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I've committed this to trunk and branch-2. Thanks for the fix [~cmccabe]!

> ShortCircuitCache#unref should purge ShortCircuitReplicas whose streams have 
> been closed by java interrupts
> ---
>
> Key: HDFS-6227
> URL: https://issues.apache.org/jira/browse/HDFS-6227
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Jing Zhao
>Assignee: Colin Patrick McCabe
> Fix For: 2.5.0
>
> Attachments: HDFS-6227.000.patch, HDFS-6227.001.patch, 
> HDFS-6227.002.patch, ShortCircuitReadInterruption.test.patch
>
>
> While running tests in a single-node cluster, where short-circuit read is 
> enabled and multiple threads may read the same file concurrently, one of the 
> reads got ClosedChannelException and failed. See the comment for the full 
> exception trace.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6449) Incorrect counting in ContentSummaryComputationContext in 0.23.

2014-05-27 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-6449:
-

Attachment: HDFS-6449.branch-0.23.patch

Attaching patch for branch-0.23.

> Incorrect counting in ContentSummaryComputationContext in 0.23.
> ---
>
> Key: HDFS-6449
> URL: https://issues.apache.org/jira/browse/HDFS-6449
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.23.10
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-6449.branch-0.23.patch
>
>
> In {{ContentSummaryComputationContext}}, the content counting in {{yield()}} 
> is incorrect. The result is still correct, but it ends up yielding more 
> frequently. Trunk and branch-2 do not have this bug.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6449) Incorrect counting in ContentSummaryComputationContext in 0.23.

2014-05-27 Thread Kihwal Lee (JIRA)
Kihwal Lee created HDFS-6449:


 Summary: Incorrect counting in ContentSummaryComputationContext in 
0.23.
 Key: HDFS-6449
 URL: https://issues.apache.org/jira/browse/HDFS-6449
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.10
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Critical


In {{ContentSummaryComputationContext}}, the content counting in {{yield()}} is 
incorrect. The result is still correct, but it ends up yielding more 
frequently. Trunk and branch-2 do not have this bug.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6447) balancer should timestamp the completion message

2014-05-27 Thread Juan Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Yu updated HDFS-6447:
--

Status: Patch Available  (was: In Progress)

> balancer should timestamp the completion message
> 
>
> Key: HDFS-6447
> URL: https://issues.apache.org/jira/browse/HDFS-6447
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Reporter: Allen Wittenauer
>Assignee: Juan Yu
>Priority: Trivial
>  Labels: newbie
> Attachments: HDFS-6447.patch.001
>
>
> When the balancer finishes, it doesn't report the time it finished.  It 
> should do this so that users have a better sense of how long it took to 
> complete.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6447) balancer should timestamp the completion message

2014-05-27 Thread Juan Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juan Yu updated HDFS-6447:
--

Attachment: HDFS-6447.patch.001

patch to report balancer finish time.

> balancer should timestamp the completion message
> 
>
> Key: HDFS-6447
> URL: https://issues.apache.org/jira/browse/HDFS-6447
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Reporter: Allen Wittenauer
>Assignee: Juan Yu
>Priority: Trivial
>  Labels: newbie
> Attachments: HDFS-6447.patch.001
>
>
> When the balancer finishes, it doesn't report the time it finished.  It 
> should do this so that users have a better sense of how long it took to 
> complete.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Work started] (HDFS-6447) balancer should timestamp the completion message

2014-05-27 Thread Juan Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-6447 started by Juan Yu.

> balancer should timestamp the completion message
> 
>
> Key: HDFS-6447
> URL: https://issues.apache.org/jira/browse/HDFS-6447
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Reporter: Allen Wittenauer
>Assignee: Juan Yu
>Priority: Trivial
>  Labels: newbie
>
> When the balancer finishes, it doesn't report the time it finished.  It 
> should do this so that users have a better sense of how long it took to 
> complete.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6222) Remove background token renewer from webhdfs

2014-05-27 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated HDFS-6222:
--

Status: Patch Available  (was: Open)

> Remove background token renewer from webhdfs
> 
>
> Key: HDFS-6222
> URL: https://issues.apache.org/jira/browse/HDFS-6222
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: HDFS-6222.branch-2.patch, HDFS-6222.trunk.patch
>
>
> The background token renewer is a source of problems for long-running 
> daemons.  Webhdfs should lazy fetch a new token when it receives an 
> InvalidToken exception.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6222) Remove background token renewer from webhdfs

2014-05-27 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated HDFS-6222:
--

Attachment: HDFS-6222.trunk.patch
HDFS-6222.branch-2.patch

Adds lazy re-fetch of expired tokens.  Internally tested on secure clusters.  
The only difference between the patches is a conflicting import.
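
A minimal sketch of the lazy-fetch pattern, with illustrative names rather than 
the actual WebHdfsFileSystem internals:

{code}
// Hedged sketch: instead of a background renewer thread, catch the expired-
// token failure on use and fetch a fresh token before retrying once.
import java.io.IOException;
import org.apache.hadoop.security.token.SecretManager.InvalidToken;

public abstract class LazyTokenClient {
  private Object delegationToken;  // opaque here; real code uses Token<?>

  <T> T runWithToken(Op<T> op) throws IOException {
    try {
      return op.run(delegationToken);
    } catch (InvalidToken e) {
      delegationToken = fetchNewToken();  // fetch only when actually expired
      return op.run(delegationToken);     // single retry with the new token
    }
  }

  abstract Object fetchNewToken() throws IOException;

  interface Op<T> {
    T run(Object token) throws IOException;
  }
}
{code}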

> Remove background token renewer from webhdfs
> 
>
> Key: HDFS-6222
> URL: https://issues.apache.org/jira/browse/HDFS-6222
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Attachments: HDFS-6222.branch-2.patch, HDFS-6222.trunk.patch
>
>
> The background token renewer is a source of problems for long-running 
> daemons.  Webhdfs should lazy fetch a new token when it receives an 
> InvalidToken exception.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6441) Add ability to exclude/include few datanodes while balancing

2014-05-27 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009688#comment-14009688
 ] 

Daryn Sharp commented on HDFS-6441:
---

I've only quickly skimmed the raw patch; a few questions/comments:
# What happens if neither option is given?  It appears to maybe ignore all 
hosts?
# If both options are given, it appears to build a union of the include/exclude 
hosts, then use the last argument to determine whether the union is an exclude 
list or not?
# I seem to recall {{getHostName}} is (or used to be) a bit peculiar and can 
return a DN self-reported name, hence {{getPeerHostName}}, which is guaranteed 
to return the actual hostname.  You should check and match the NN's behavior on 
use of peer name vs. reported name.
# {{DEFALUT}} is misspelled

> Add ability to exclude/include few datanodes while balancing
> 
>
> Key: HDFS-6441
> URL: https://issues.apache.org/jira/browse/HDFS-6441
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Affects Versions: 2.4.0
>Reporter: Benoy Antony
>Assignee: Benoy Antony
> Attachments: HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch, 
> HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch, 
> HDFS-6441.patch, HDFS-6441.patch
>
>
> In some use cases, it is desirable to ignore a few datanodes while balancing. 
> The administrator should be able to specify a list of datanodes in a file, 
> similar to the hosts file, and the balancer should ignore these datanodes 
> while balancing so that no blocks are added or removed on these nodes.
> Similarly, it would be beneficial to specify that only a particular list of 
> datanodes should be considered for balancing.
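
For illustration, a minimal sketch of reading such a hosts-style file, assuming 
one hostname per line with {{#}} comments; the file format is an assumption, not 
the patch's actual parser:

{code}
// Hedged sketch: load an include/exclude file into a set of hostnames the
// balancer could consult before moving blocks to or from a node.
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

public class HostsFileReader {
  static Set<String> readHosts(String path) throws IOException {
    Set<String> hosts = new HashSet<String>();
    for (String line : Files.readAllLines(Paths.get(path), StandardCharsets.UTF_8)) {
      String host = line.trim();
      if (!host.isEmpty() && !host.startsWith("#")) {  // skip blanks, comments
        hosts.add(host);
      }
    }
    return hosts;
  }
}
{code}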



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6354) NN startup does not fail when it fails to login with the spnego principal

2014-05-27 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009656#comment-14009656
 ] 

Daryn Sharp commented on HDFS-6354:
---

The multiple principal spnego support feature should indirectly uncover a 
misconfiguration since it has to read the keytab (which may throw) and then 
throws if no HTTP principals are found.  Perhaps we need to extend the auth 
handler to verify an explicit principal is in the keytab too.
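
A minimal sketch of that extra verification, assuming the standard 
{{javax.security.auth.kerberos.KeyTab}} API; the startup wiring is illustrative:

{code}
// Hedged sketch: fail startup loudly if the configured SPNEGO principal has
// no keys in the keytab, instead of discovering it on the first client auth.
import java.io.File;
import javax.security.auth.kerberos.KerberosPrincipal;
import javax.security.auth.kerberos.KeyTab;

public class SpnegoKeytabCheck {
  static void verify(String keytabPath, String principal) {
    KeyTab kt = KeyTab.getInstance(new File(keytabPath));
    if (kt.getKeys(new KerberosPrincipal(principal)).length == 0) {
      throw new IllegalStateException(
          "No keys for " + principal + " in " + keytabPath);
    }
  }
}
{code}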

> NN startup does not fail when it fails to login with the spnego principal
> -
>
> Key: HDFS-6354
> URL: https://issues.apache.org/jira/browse/HDFS-6354
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Arpit Gupta
>
> I have noticed cases where the NN startup did not report any issues even 
> though the login failed, because either the keytab is wrong, the principal 
> does not exist, etc. This can be misleading and lead to authentication 
> failures when a client tries to authenticate to the spnego principal.



--
This message was sent by Atlassian JIRA
(v6.2#6252)