[jira] [Updated] (HDFS-10337) OfflineEditsViewer stats option should print 0 instead of null for the count of operations

2016-05-05 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10337:
-
Attachment: HDFS-10337.003.patch

Thanks for the review again. I have updated the patch to address the comments.

> OfflineEditsViewer stats option should print 0 instead of null for the count 
> of operations
> --
>
> Key: HDFS-10337
> URL: https://issues.apache.org/jira/browse/HDFS-10337
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Akira AJISAKA
>Assignee: Lin Yiqun
>Priority: Minor
>  Labels: newbie
> Attachments: HDFS-10337.001.patch, HDFS-10337.002.patch, 
> HDFS-10337.003.patch
>
>







[jira] [Updated] (HDFS-2043) TestHFlush failing intermittently

2016-05-05 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-2043:

Attachment: HDFS-2043.005.patch

Posted a new patch to address the comment.

> TestHFlush failing intermittently
> -
>
> Key: HDFS-2043
> URL: https://issues.apache.org/jira/browse/HDFS-2043
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Aaron T. Myers
>Assignee: Lin Yiqun
> Attachments: HDFS-2043.002.patch, HDFS-2043.003.patch, 
> HDFS-2043.004.patch, HDFS-2043.005.patch, HDFS.001.patch
>
>
> I can't reproduce this failure reliably, but it seems like TestHFlush has 
> been failing intermittently, with the frequency increasing of late.
> Note the following two pre-commit test runs from different JIRAs where 
> TestHFlush seems to have failed spuriously:
> https://builds.apache.org/job/PreCommit-HDFS-Build/734//testReport/
> https://builds.apache.org/job/PreCommit-HDFS-Build/680//testReport/






[jira] [Commented] (HDFS-10371) MiniDFSCluster#restartDataNode does not always stop DN before start DN

2016-05-05 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273460#comment-15273460
 ] 

Lin Yiqun commented on HDFS-10371:
--

I suggest changing the param {{expireOnNN}} from false to true in these two 
{{restartDataNode}} overloads, so the DN is really stopped before it is 
restarted (a sketch of the change follows the snippet below).
{code}
  /*
   * Restart a particular datanode, use newly assigned port
   */
  public boolean restartDataNode(int i) throws IOException {
return restartDataNode(i, false);
  }

  /*
   * Restart a particular datanode, on the same port if keepPort is true
   */
  public synchronized boolean restartDataNode(int i, boolean keepPort)
  throws IOException {
return restartDataNode(i, keepPort, false);
  }
{code}
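
A minimal sketch of the suggested change, assuming the last argument of the 
three-parameter overload is the {{expireOnNN}} flag mentioned above (the 
one-argument overload delegates here, so both paths pick up the change):
{code}
  /*
   * Restart a particular datanode, on the same port if keepPort is true
   */
  public synchronized boolean restartDataNode(int i, boolean keepPort)
      throws IOException {
    // pass expireOnNN = true so the old DN is expired on the NameNode
    // before the new instance is started
    return restartDataNode(i, keepPort, true);
  }
{code}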

> MiniDFSCluster#restartDataNode does not always stop DN before start DN
> --
>
> Key: HDFS-10371
> URL: https://issues.apache.org/jira/browse/HDFS-10371
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: test
>Reporter: Xiaoyu Yao
>
> This could cause an intermittent port binding problem if the keep-the-same-port 
> option is chosen, as evident in the recent 
> [Jenkins|https://builds.apache.org/job/PreCommit-HDFS-Build/15366/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_91.txt]
> {code}
> Tests run: 6, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 53.772 sec 
> <<< FAILURE! - in org.apache.hadoop.hdfs.TestDecommissionWithStriped
> testDecommissionWithURBlockForSameBlockGroup(org.apache.hadoop.hdfs.TestDecommissionWithStriped)
>   Time elapsed: 6.946 sec  <<< ERROR!
> java.net.BindException: Problem binding to [localhost:52957] 
> java.net.BindException: Address already in use; For more details see:  
> http://wiki.apache.org/hadoop/BindException
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:433)
>   at sun.nio.ch.Net.bind(Net.java:425)
>   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>   at org.apache.hadoop.ipc.Server.bind(Server.java:530)
>   at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:793)
>   at org.apache.hadoop.ipc.Server.<init>(Server.java:2592)
>   at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:958)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:563)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:538)
>   at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:800)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initIpcServer(DataNode.java:932)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1297)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:479)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2584)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2472)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2519)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.restartDataNode(MiniDFSCluster.java:2242)
>   at 
> org.apache.hadoop.hdfs.TestDecommissionWithStriped.testDecommissionWithURBlockForSameBlockGroup(TestDecommissionWithStriped.java:254)
> {code}






[jira] [Updated] (HDFS-2043) TestHFlush failing intermittently

2016-05-05 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-2043:

Attachment: HDFS-2043.004.patch

Thanks [~iwasakims] for the great analysis!
{quote}
The testHFlushInterrupted expects that the second stm.close() succeeds but it 
is not true. Underlying streamer thread is closed since closeThreads(true) is 
called in the finally block of DFSOutputStream#closeImpl.
{quote}
This behavior was introduced in HDFS-9812, which fixed streamer threads leaking 
when a failure happens while closing the DFSOutputStream. Since HDFS-9812, the 
second {{stm.close()}} fails more frequently.

Thanks again for the comment. Posted a new patch for this (a small sketch of 
the idea follows).
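
For context, a minimal sketch of the direction (an assumption about the fix, 
not the actual patch) showing how the test can tolerate the second close after 
HDFS-9812:
{code}
// testHFlushInterrupted: the first close() is interrupted, so the second
// close() may legitimately fail now that closeThreads(true) has already
// shut down the underlying streamer. Accept that outcome instead of
// requiring the second close to succeed.
try {
  stm.close();
} catch (IOException ioe) {
  // expected after HDFS-9812: the streamer was already closed
}
{code}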

> TestHFlush failing intermittently
> -
>
> Key: HDFS-2043
> URL: https://issues.apache.org/jira/browse/HDFS-2043
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Aaron T. Myers
>Assignee: Lin Yiqun
> Attachments: HDFS-2043.002.patch, HDFS-2043.003.patch, 
> HDFS-2043.004.patch, HDFS.001.patch
>
>
> I can't reproduce this failure reliably, but it seems like TestHFlush has 
> been failing intermittently, with the frequency increasing of late.
> Note the following two pre-commit test runs from different JIRAs where 
> TestHFlush seems to have failed spuriously:
> https://builds.apache.org/job/PreCommit-HDFS-Build/734//testReport/
> https://builds.apache.org/job/PreCommit-HDFS-Build/680//testReport/






[jira] [Commented] (HDFS-10338) DistCp masks potential CRC check failures

2016-05-03 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270094#comment-15270094
 ] 

Lin Yiqun commented on HDFS-10338:
--

Hi [~raviprak], thanks for the review.
I agree with your comment. Are there any other comments on the latest patch? 
[~yzhangal], could you please take a look at this DistCp patch when you have 
time?

If there are no other comments, I will post a new patch later.

> DistCp masks potential CRC check failures
> -
>
> Key: HDFS-10338
> URL: https://issues.apache.org/jira/browse/HDFS-10338
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.7.1
>Reporter: Elliot West
>Assignee: Lin Yiqun
> Attachments: HDFS-10338.001.patch, HDFS-10338.002.patch
>
>
> There appear to be edge cases whereby CRC checks may be circumvented when 
> requests for checksums from the source or target file system fail. In this 
> event CRCs could differ between the source and target and yet the DistCp copy 
> would succeed, even when the 'skip CRC check' option is not being used.
> The code in question is contained in the method 
> [{{org.apache.hadoop.tools.util.DistCpUtils#checksumsAreEqual(...)}}|https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/DistCpUtils.java#L457]
> Specifically this code block suggests that if there is a failure when trying 
> to read the source or target checksum then the method will return {{true}} 
> (i.e.  the checksums are equal), implying that the check succeeded. In actual 
> fact we just failed to obtain the checksum and could not perform the check.
> {code}
> try {
>   sourceChecksum = sourceChecksum != null ? sourceChecksum : 
> sourceFS.getFileChecksum(source);
>   targetChecksum = targetFS.getFileChecksum(target);
> } catch (IOException e) {
>   LOG.error("Unable to retrieve checksum for " + source + " or "
> + target, e);
> }
> return (sourceChecksum == null || targetChecksum == null ||
>   sourceChecksum.equals(targetChecksum));
> {code}
> I believe that at the very least the caught {{IOException}} should be 
> re-thrown. If this is not deemed desirable then I believe an option 
> ({{--strictCrc}}?) should be added to enforce a strict check where we require 
> that both the source and target CRCs are retrieved, are not null, and are 
> then compared for equality. If for any reason either of the CRCs retrievals 
> fail then an exception is thrown.
> Clearly some {{FileSystems}} do not support CRCs and invocations to 
> {{FileSystem.getFileChecksum(...)}} return {{null}} in these instances. I 
> would suggest that these should fail a strict CRC check to prevent users 
> developing a false sense of security in their copy pipeline.






[jira] [Commented] (HDFS-10359) Allow trigger block report from all datanodes

2016-05-03 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268541#comment-15268541
 ] 

Lin Yiqun commented on HDFS-10359:
--

Hi [~Tao Jie], I don't think this is a good idea. If I trigger all the 
datanodes in my cluster, they will flood my namenode with block reports. Maybe 
the property {{dfs.blockreport.initialDelay}} can be used for that, but I still 
think keeping the original command unchanged would be better.

> Allow trigger block report from all datanodes
> -
>
> Key: HDFS-10359
> URL: https://issues.apache.org/jira/browse/HDFS-10359
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.7.0, 2.6.1
>Reporter: Tao Jie
>
> Since HDFS-7278 allows triggering a block report from one particular 
> datanode, it would be helpful to add an option to this command to trigger 
> block reports from all datanodes.
> The command might look like this:
> *hdfs dfsadmin -triggerBlockReport \[-incremental\] 
> *






[jira] [Created] (HDFS-10358) Refactor EncryptionZone and EncryptionZoneInt

2016-05-02 Thread Lin Yiqun (JIRA)
Lin Yiqun created HDFS-10358:


 Summary: Refactor EncryptionZone and EncryptionZoneInt
 Key: HDFS-10358
 URL: https://issues.apache.org/jira/browse/HDFS-10358
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.7.1
Reporter: Lin Yiqun
Assignee: Lin Yiqun


Class {{EncryptionZone}} can reuse most of the fields in {{EncryptionZoneInt}}. 
We can refactor the two classes to take advantage of that; a rough illustration 
is sketched below.
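
As a rough illustration only (the field names below are assumptions, not the 
actual HDFS classes), one way to reuse the fields is to let the public class 
wrap the internal one instead of duplicating its fields:
{code}
// Hypothetical sketch: the attributes shown are placeholders.
class EncryptionZoneInt {
  private final long inodeId;
  private final String keyName;

  EncryptionZoneInt(long inodeId, String keyName) {
    this.inodeId = inodeId;
    this.keyName = keyName;
  }

  long getINodeId() { return inodeId; }
  String getKeyName() { return keyName; }
}

class EncryptionZone {
  private final String path;
  private final EncryptionZoneInt ezInt;  // reuse the internal fields

  EncryptionZone(String path, EncryptionZoneInt ezInt) {
    this.path = path;
    this.ezInt = ezInt;
  }

  String getPath() { return path; }
  String getKeyName() { return ezInt.getKeyName(); }
}
{code}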






[jira] [Assigned] (HDFS-10352) Allow users to get last access time of a given directory

2016-05-02 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun reassigned HDFS-10352:


Assignee: Lin Yiqun

> Allow users to get last access time of a given directory
> 
>
> Key: HDFS-10352
> URL: https://issues.apache.org/jira/browse/HDFS-10352
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fs
>Affects Versions: 2.6.4
>Reporter: Eric Lin
>Assignee: Lin Yiqun
>Priority: Minor
>
> Currently the FileStatus.getAccessTime() function returns 0 if the path is a 
> directory. It would be ideal if, when a directory path is passed, the code 
> went through all the files under the directory and returned the MAX access 
> time of all the files.






[jira] [Commented] (HDFS-10352) Allow users to get last access time of a given directory

2016-05-02 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266464#comment-15266464
 ] 

Lin Yiqun commented on HDFS-10352:
--

{quote}
Maybe we can create another getAccessTime() with different number of 
parameters. Default to not checking children files, but if forced, the code can 
check accordingly.
{quote}
This comment looks reasonable. I will post a patch later; assigning this JIRA 
to myself. A rough sketch of the idea is below.
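
A minimal sketch of the discussed overload (a hypothetical helper, not an 
existing FileSystem API; {{fs}} is an assumed FileSystem instance):
{code}
// When checkChildren is false this behaves like today (0 for directories);
// when true it walks every file under the directory and returns the largest
// access time found.
public long getAccessTime(Path path, boolean checkChildren) throws IOException {
  FileStatus status = fs.getFileStatus(path);
  if (!status.isDirectory() || !checkChildren) {
    return status.getAccessTime();
  }
  long max = 0;
  RemoteIterator<LocatedFileStatus> it = fs.listFiles(path, true);
  while (it.hasNext()) {
    max = Math.max(max, it.next().getAccessTime());
  }
  return max;
}
{code}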

> Allow users to get last access time of a given directory
> 
>
> Key: HDFS-10352
> URL: https://issues.apache.org/jira/browse/HDFS-10352
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fs
>Affects Versions: 2.6.4
>Reporter: Eric Lin
>Priority: Minor
>
> Currently the FileStatus.getAccessTime() function returns 0 if the path is a 
> directory. It would be ideal if, when a directory path is passed, the code 
> went through all the files under the directory and returned the MAX access 
> time of all the files.






[jira] [Updated] (HDFS-10338) DistCp masks potential CRC check failures

2016-05-01 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10338:
-
Attachment: HDFS-10338.002.patch

The three failed unit tests are related. Updated the patch to fix this.

> DistCp masks potential CRC check failures
> -
>
> Key: HDFS-10338
> URL: https://issues.apache.org/jira/browse/HDFS-10338
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.7.1
>Reporter: Elliot West
>Assignee: Lin Yiqun
> Attachments: HDFS-10338.001.patch, HDFS-10338.002.patch
>
>
> There appear to be edge cases whereby CRC checks may be circumvented when 
> requests for checksums from the source or target file system fail. In this 
> event CRCs could differ between the source and target and yet the DistCp copy 
> would succeed, even when the 'skip CRC check' option is not being used.
> The code in question is contained in the method 
> [{{org.apache.hadoop.tools.util.DistCpUtils#checksumsAreEqual(...)}}|https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/DistCpUtils.java#L457]
> Specifically this code block suggests that if there is a failure when trying 
> to read the source or target checksum then the method will return {{true}} 
> (i.e.  the checksums are equal), implying that the check succeeded. In actual 
> fact we just failed to obtain the checksum and could not perform the check.
> {code}
> try {
>   sourceChecksum = sourceChecksum != null ? sourceChecksum : 
> sourceFS.getFileChecksum(source);
>   targetChecksum = targetFS.getFileChecksum(target);
> } catch (IOException e) {
>   LOG.error("Unable to retrieve checksum for " + source + " or "
> + target, e);
> }
> return (sourceChecksum == null || targetChecksum == null ||
>   sourceChecksum.equals(targetChecksum));
> {code}
> I believe that at the very least the caught {{IOException}} should be 
> re-thrown. If this is not deemed desirable then I believe an option 
> ({{--strictCrc}}?) should be added to enforce a strict check where we require 
> that both the source and target CRCs are retrieved, are not null, and are 
> then compared for equality. If for any reason either of the CRCs retrievals 
> fail then an exception is thrown.
> Clearly some {{FileSystems}} do not support CRCs and invocations to 
> {{FileSystem.getFileChecksum(...)}} return {{null}} in these instances. I 
> would suggest that these should fail a strict CRC check to prevent users 
> developing a false sense of security in their copy pipeline.






[jira] [Commented] (HDFS-10352) Allow users to get last access time of a given directory

2016-05-01 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266085#comment-15266085
 ] 

Lin Yiqun commented on HDFS-10352:
--

Hi [~ericlin], I also agree with this. But I'm not sure about the performance 
when a directory has lots of child files: it may cost some time to traverse all 
of them and compare their access times, so the result could be returned slowly. 
I'm glad to work on this if someone else also supports the proposal.

> Allow users to get last access time of a given directory
> 
>
> Key: HDFS-10352
> URL: https://issues.apache.org/jira/browse/HDFS-10352
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 2.6.4
>Reporter: Eric Lin
>Priority: Minor
>
> Currently the FileStatus.getAccessTime() function returns 0 if the path is a 
> directory. It would be ideal if, when a directory path is passed, the code 
> went through all the files under the directory and returned the MAX access 
> time of all the files.






[jira] [Updated] (HDFS-10338) DistCp masks potential CRC check failures

2016-04-28 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10338:
-
Attachment: HDFS-10338.001.patch

Updated the v001 patch with some small changes.

> DistCp masks potential CRC check failures
> -
>
> Key: HDFS-10338
> URL: https://issues.apache.org/jira/browse/HDFS-10338
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.7.1
>Reporter: Elliot West
>Assignee: Lin Yiqun
> Attachments: HDFS-10338.001.patch
>
>
> There appear to be edge cases whereby CRC checks may be circumvented when 
> requests for checksums from the source or target file system fail. In this 
> event CRCs could differ between the source and target and yet the DistCp copy 
> would succeed, even when the 'skip CRC check' option is not being used.
> The code in question is contained in the method 
> [{{org.apache.hadoop.tools.util.DistCpUtils#checksumsAreEqual(...)}}|https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/DistCpUtils.java#L457]
> Specifically this code block suggests that if there is a failure when trying 
> to read the source or target checksum then the method will return {{true}} 
> (i.e.  the checksums are equal), implying that the check succeeded. In actual 
> fact we just failed to obtain the checksum and could not perform the check.
> {code}
> try {
>   sourceChecksum = sourceChecksum != null ? sourceChecksum : 
> sourceFS.getFileChecksum(source);
>   targetChecksum = targetFS.getFileChecksum(target);
> } catch (IOException e) {
>   LOG.error("Unable to retrieve checksum for " + source + " or "
> + target, e);
> }
> return (sourceChecksum == null || targetChecksum == null ||
>   sourceChecksum.equals(targetChecksum));
> {code}
> I believe that at the very least the caught {{IOException}} should be 
> re-thrown. If this is not deemed desirable then I believe an option 
> ({{--strictCrc}}?) should be added to enforce a strict check where we require 
> that both the source and target CRCs are retrieved, are not null, and are 
> then compared for equality. If for any reason either of the CRCs retrievals 
> fail then an exception is thrown.
> Clearly some {{FileSystems}} do not support CRCs and invocations to 
> {{FileSystem.getFileChecksum(...)}} return {{null}} in these instances. I 
> would suggest that these should fail a strict CRC check to prevent users 
> developing a false sense of security in their copy pipeline.






[jira] [Updated] (HDFS-10338) DistCp masks potential CRC check failures

2016-04-28 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10338:
-
Attachment: (was: HDFS-10338.001.patch)

> DistCp masks potential CRC check failures
> -
>
> Key: HDFS-10338
> URL: https://issues.apache.org/jira/browse/HDFS-10338
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.7.1
>Reporter: Elliot West
>Assignee: Lin Yiqun
>
> There appear to be edge cases whereby CRC checks may be circumvented when 
> requests for checksums from the source or target file system fail. In this 
> event CRCs could differ between the source and target and yet the DistCp copy 
> would succeed, even when the 'skip CRC check' option is not being used.
> The code in question is contained in the method 
> [{{org.apache.hadoop.tools.util.DistCpUtils#checksumsAreEqual(...)}}|https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/DistCpUtils.java#L457]
> Specifically this code block suggests that if there is a failure when trying 
> to read the source or target checksum then the method will return {{true}} 
> (i.e.  the checksums are equal), implying that the check succeeded. In actual 
> fact we just failed to obtain the checksum and could not perform the check.
> {code}
> try {
>   sourceChecksum = sourceChecksum != null ? sourceChecksum : 
> sourceFS.getFileChecksum(source);
>   targetChecksum = targetFS.getFileChecksum(target);
> } catch (IOException e) {
>   LOG.error("Unable to retrieve checksum for " + source + " or "
> + target, e);
> }
> return (sourceChecksum == null || targetChecksum == null ||
>   sourceChecksum.equals(targetChecksum));
> {code}
> I believe that at the very least the caught {{IOException}} should be 
> re-thrown. If this is not deemed desirable then I believe an option 
> ({{--strictCrc}}?) should be added to enforce a strict check where we require 
> that both the source and target CRCs are retrieved, are not null, and are 
> then compared for equality. If for any reason either of the CRCs retrievals 
> fail then an exception is thrown.
> Clearly some {{FileSystems}} do not support CRCs and invocations to 
> {{FileSystem.getFileChecksum(...)}} return {{null}} in these instances. I 
> would suggest that these should fail a strict CRC check to prevent users 
> developing a false sense of security in their copy pipeline.






[jira] [Updated] (HDFS-10338) DistCp masks potential CRC check failures

2016-04-28 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10338:
-
Attachment: HDFS-10338.001.patch

> DistCp masks potential CRC check failures
> -
>
> Key: HDFS-10338
> URL: https://issues.apache.org/jira/browse/HDFS-10338
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.7.1
>Reporter: Elliot West
>Assignee: Lin Yiqun
> Attachments: HDFS-10338.001.patch
>
>
> There appear to be edge cases whereby CRC checks may be circumvented when 
> requests for checksums from the source or target file system fail. In this 
> event CRCs could differ between the source and target and yet the DistCp copy 
> would succeed, even when the 'skip CRC check' option is not being used.
> The code in question is contained in the method 
> [{{org.apache.hadoop.tools.util.DistCpUtils#checksumsAreEqual(...)}}|https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/DistCpUtils.java#L457]
> Specifically this code block suggests that if there is a failure when trying 
> to read the source or target checksum then the method will return {{true}} 
> (i.e.  the checksums are equal), implying that the check succeeded. In actual 
> fact we just failed to obtain the checksum and could not perform the check.
> {code}
> try {
>   sourceChecksum = sourceChecksum != null ? sourceChecksum : 
> sourceFS.getFileChecksum(source);
>   targetChecksum = targetFS.getFileChecksum(target);
> } catch (IOException e) {
>   LOG.error("Unable to retrieve checksum for " + source + " or "
> + target, e);
> }
> return (sourceChecksum == null || targetChecksum == null ||
>   sourceChecksum.equals(targetChecksum));
> {code}
> I believe that at the very least the caught {{IOException}} should be 
> re-thrown. If this is not deemed desirable then I believe an option 
> ({{--strictCrc}}?) should be added to enforce a strict check where we require 
> that both the source and target CRCs are retrieved, are not null, and are 
> then compared for equality. If for any reason either of the CRCs retrievals 
> fail then an exception is thrown.
> Clearly some {{FileSystems}} do not support CRCs and invocations to 
> {{FileSystem.getFileChecksum(...)}} return {{null}} in these instances. I 
> would suggest that these should fail a strict CRC check to prevent users 
> developing a false sense of security in their copy pipeline.






[jira] [Updated] (HDFS-10338) DistCp masks potential CRC check failures

2016-04-28 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10338:
-
Status: Patch Available  (was: Open)

Attached an initial patch for this; thanks for reviewing.

> DistCp masks potential CRC check failures
> -
>
> Key: HDFS-10338
> URL: https://issues.apache.org/jira/browse/HDFS-10338
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.7.1
>Reporter: Elliot West
>Assignee: Lin Yiqun
> Attachments: HDFS-10338.001.patch
>
>
> There appear to be edge cases whereby CRC checks may be circumvented when 
> requests for checksums from the source or target file system fail. In this 
> event CRCs could differ between the source and target and yet the DistCp copy 
> would succeed, even when the 'skip CRC check' option is not being used.
> The code in question is contained in the method 
> [{{org.apache.hadoop.tools.util.DistCpUtils#checksumsAreEqual(...)}}|https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/DistCpUtils.java#L457]
> Specifically this code block suggests that if there is a failure when trying 
> to read the source or target checksum then the method will return {{true}} 
> (i.e.  the checksums are equal), implying that the check succeeded. In actual 
> fact we just failed to obtain the checksum and could not perform the check.
> {code}
> try {
>   sourceChecksum = sourceChecksum != null ? sourceChecksum : 
> sourceFS.getFileChecksum(source);
>   targetChecksum = targetFS.getFileChecksum(target);
> } catch (IOException e) {
>   LOG.error("Unable to retrieve checksum for " + source + " or "
> + target, e);
> }
> return (sourceChecksum == null || targetChecksum == null ||
>   sourceChecksum.equals(targetChecksum));
> {code}
> I believe that at the very least the caught {{IOException}} should be 
> re-thrown. If this is not deemed desirable then I believe an option 
> ({{--strictCrc}}?) should be added to enforce a strict check where we require 
> that both the source and target CRCs are retrieved, are not null, and are 
> then compared for equality. If for any reason either of the CRCs retrievals 
> fail then an exception is thrown.
> Clearly some {{FileSystems}} do not support CRCs and invocations to 
> {{FileSystem.getFileChecksum(...)}} return {{null}} in these instances. I 
> would suggest that these should fail a strict CRC check to prevent users 
> developing a false sense of security in their copy pipeline.






[jira] [Updated] (HDFS-10337) OfflineEditsViewer stats option should print 0 instead of null for the count of operations

2016-04-27 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10337:
-
Attachment: HDFS-10337.002.patch

Thanks for the quick review. Updated the patch to address the comment.

> OfflineEditsViewer stats option should print 0 instead of null for the count 
> of operations
> --
>
> Key: HDFS-10337
> URL: https://issues.apache.org/jira/browse/HDFS-10337
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Akira AJISAKA
>Assignee: Lin Yiqun
>Priority: Minor
>  Labels: newbie
> Attachments: HDFS-10337.001.patch, HDFS-10337.002.patch
>
>






[jira] [Assigned] (HDFS-10338) DistCp masks potential CRC check failures

2016-04-27 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun reassigned HDFS-10338:


Assignee: Lin Yiqun

> DistCp masks potential CRC check failures
> -
>
> Key: HDFS-10338
> URL: https://issues.apache.org/jira/browse/HDFS-10338
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.7.1
>Reporter: Elliot West
>Assignee: Lin Yiqun
>
> There appear to be edge cases whereby CRC checks may be circumvented when 
> requests for checksums from the source or target file system fail. In this 
> event CRCs could differ between the source and target and yet the DistCp copy 
> would succeed, even when the 'skip CRC check' option is not being used.
> The code in question is contained in the method 
> [{{org.apache.hadoop.tools.util.DistCpUtils#checksumsAreEqual(...)}}|https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/DistCpUtils.java#L457]
> Specifically this code block suggests that if there is a failure when trying 
> to read the source or target checksum then the method will return {{true}} 
> (i.e.  the checksums are equal), implying that the check succeeded. In actual 
> fact we just failed to obtain the checksum and could not perform the check.
> {code}
> try {
>   sourceChecksum = sourceChecksum != null ? sourceChecksum : 
> sourceFS.getFileChecksum(source);
>   targetChecksum = targetFS.getFileChecksum(target);
> } catch (IOException e) {
>   LOG.error("Unable to retrieve checksum for " + source + " or "
> + target, e);
> }
> return (sourceChecksum == null || targetChecksum == null ||
>   sourceChecksum.equals(targetChecksum));
> {code}
> I believe that at the very least the caught {{IOException}} should be 
> re-thrown. If this is not deemed desirable then I believe an option 
> ({{--strictCrc}}?) should be added to enforce a strict check where we require 
> that both the source and target CRCs are retrieved, are not null, and are 
> then compared for equality. If for any reason either of the CRCs retrievals 
> fail then an exception is thrown.
> Clearly some {{FileSystems}} do not support CRCs and invocations to 
> {{FileSystem.getFileChecksum(...)}} return {{null}} in these instances. I 
> would suggest that these should fail a strict CRC check to prevent users 
> developing a false sense of security in their copy pipeline.





[jira] [Commented] (HDFS-10338) DistCp masks potential CRC check failures

2016-04-27 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261456#comment-15261456
 ] 

Lin Yiqun commented on HDFS-10338:
--

Hi [~teabot], I have two comments on this:

* The option {{ignoreFailures}} that [~liuml07] suggested looks better. In one 
sense, the {{strictCrc}} option has a similar meaning to {{skipcrccheck}}, 
since both are about the CRC check. But once we do a strict CRC check there 
will be more failures in the checksum comparison, so the new 
{{ignoreFailures}} option is the more reasonable choice.

* I agree with you that {{FileSystems}} that do not support CRCs should be 
treated as a failed case.

Assigning this work to myself. If there are no other comments, I will post a 
patch later addressing the comments mentioned above (a rough sketch of the 
direction is below).
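
A minimal sketch of what a stricter {{checksumsAreEqual}} with an 
{{ignoreFailures}} flag could look like, based on the snippet quoted in the 
issue description (the flag name and its plumbing are assumptions, not the 
final patch):
{code}
private static boolean checksumsAreEqual(FileSystem sourceFS, Path source,
    FileChecksum sourceChecksum, FileSystem targetFS, Path target,
    boolean ignoreFailures) throws IOException {
  FileChecksum targetChecksum = null;
  try {
    sourceChecksum = sourceChecksum != null ? sourceChecksum
        : sourceFS.getFileChecksum(source);
    targetChecksum = targetFS.getFileChecksum(target);
  } catch (IOException e) {
    LOG.error("Unable to retrieve checksum for " + source + " or " + target, e);
    if (!ignoreFailures) {
      throw e;  // surface the failure instead of masking it
    }
  }
  if (sourceChecksum == null || targetChecksum == null) {
    // checksum unavailable, e.g. the FileSystem does not support CRCs:
    // only pass when the caller explicitly chose to ignore failures
    return ignoreFailures;
  }
  return sourceChecksum.equals(targetChecksum);
}
{code}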

> DistCp masks potential CRC check failures
> -
>
> Key: HDFS-10338
> URL: https://issues.apache.org/jira/browse/HDFS-10338
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.7.1
>Reporter: Elliot West
>
> There appear to be edge cases whereby CRC checks may be circumvented when 
> requests for checksums from the source or target file system fail. In this 
> event CRCs could differ between the source and target and yet the DistCp copy 
> would succeed, even when the 'skip CRC check' option is not being used.
> The code in question is contained in the method 
> [{{org.apache.hadoop.tools.util.DistCpUtils#checksumsAreEqual(...)}}|https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/DistCpUtils.java#L457]
> Specifically this code block suggests that if there is a failure when trying 
> to read the source or target checksum then the method will return {{true}} 
> (i.e.  the checksums are equal), implying that the check succeeded. In actual 
> fact we just failed to obtain the checksum and could not perform the check.
> {code}
> try {
>   sourceChecksum = sourceChecksum != null ? sourceChecksum : 
> sourceFS.getFileChecksum(source);
>   targetChecksum = targetFS.getFileChecksum(target);
> } catch (IOException e) {
>   LOG.error("Unable to retrieve checksum for " + source + " or "
> + target, e);
> }
> return (sourceChecksum == null || targetChecksum == null ||
>   sourceChecksum.equals(targetChecksum));
> {code}
> I believe that at the very least the caught {{IOException}} should be 
> re-thrown. If this is not deemed desirable then I believe an option 
> ({{--strictCrc}}?) should be added to enforce a strict check where we require 
> that both the source and target CRCs are retrieved, are not null, and are 
> then compared for equality. If for any reason either of the CRCs retrievals 
> fail then an exception is thrown.
> Clearly some {{FileSystems}} do not support CRCs and invocations to 
> {{FileSystem.getFileChecksum(...)}} return {{null}} in these instances. I 
> would suggest that these should fail a strict CRC check to prevent users 
> developing a false sense of security in their copy pipeline.





[jira] [Commented] (HDFS-10337) OfflineEditsViewer stats option should print 0 instead of null for the count of operations

2016-04-27 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261401#comment-15261401
 ] 

Lin Yiqun commented on HDFS-10337:
--

Hi [~ajisakaa], I attached a patch for this. Does it address your issue?

> OfflineEditsViewer stats option should print 0 instead of null for the count 
> of operations
> --
>
> Key: HDFS-10337
> URL: https://issues.apache.org/jira/browse/HDFS-10337
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Akira AJISAKA
>Assignee: Lin Yiqun
>Priority: Minor
>  Labels: newbie
> Attachments: HDFS-10337.001.patch
>
>






[jira] [Updated] (HDFS-10337) OfflineEditsViewer stats option should print 0 instead of null for the count of operations

2016-04-27 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10337:
-
Attachment: HDFS-10337.001.patch

> OfflineEditsViewer stats option should print 0 instead of null for the count 
> of operations
> --
>
> Key: HDFS-10337
> URL: https://issues.apache.org/jira/browse/HDFS-10337
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Akira AJISAKA
>Assignee: Lin Yiqun
>Priority: Minor
>  Labels: newbie
> Attachments: HDFS-10337.001.patch
>
>






[jira] [Updated] (HDFS-10337) OfflineEditsViewer stats option should print 0 instead of null for the count of operations

2016-04-27 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10337:
-
Status: Patch Available  (was: Open)

> OfflineEditsViewer stats option should print 0 instead of null for the count 
> of operations
> --
>
> Key: HDFS-10337
> URL: https://issues.apache.org/jira/browse/HDFS-10337
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Akira AJISAKA
>Assignee: Lin Yiqun
>Priority: Minor
>  Labels: newbie
>






[jira] [Assigned] (HDFS-10337) OfflineEditsViewer stats option should print 0 instead of null for the count of operations

2016-04-27 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun reassigned HDFS-10337:


Assignee: Lin Yiqun

> OfflineEditsViewer stats option should print 0 instead of null for the count 
> of operations
> --
>
> Key: HDFS-10337
> URL: https://issues.apache.org/jira/browse/HDFS-10337
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Akira AJISAKA
>Assignee: Lin Yiqun
>Priority: Minor
>  Labels: newbie
>






[jira] [Updated] (HDFS-10336) TestBalancer failing intermittently because of not reseting UserGroupInformation completely

2016-04-26 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10336:
-
Affects Version/s: 3.0.0

> TestBalancer failing intermittently because of not reseting 
> UserGroupInformation completely
> ---
>
> Key: HDFS-10336
> URL: https://issues.apache.org/jira/browse/HDFS-10336
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-10336.001.patch
>
>
> The unit test {{TestBalancer}} fails sometimes. 
> I looked into the failures and found two main causes.
> * 1st. The test {{TestBalancer#testBalancerWithKeytabs}} timed out.
> {code}
> org.apache.hadoop.hdfs.server.balancer.TestBalancer
> testBalancerWithKeytabs(org.apache.hadoop.hdfs.server.balancer.TestBalancer)  
> Time elapsed: 300.41 sec  <<< ERROR!
> java.lang.Exception: test timed out after 30 milliseconds
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher.waitForMoveCompletion(Dispatcher.java:1122)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchBlockMoves(Dispatcher.java:1096)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchAndCheckContinue(Dispatcher.java:1060)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.runOneIteration(Balancer.java:635)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.run(Balancer.java:689)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.testUnknownDatanode(TestBalancer.java:1098)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.access$000(TestBalancer.java:125)
> {code}
> * 2nd. The test {{TestBalancer#testBalancerWithKeytabs}} sometimes does not 
> reset the {{UGI}} completely in the finally block, and this causes other 
> unit tests to throw {{IOException}}, like this:
> {code}
> testBalancerWithNonZeroThreadsForMove(org.apache.hadoop.hdfs.server.balancer.TestBalancer)
>   Time elapsed: 0 sec  <<< ERROR!
> java.io.IOException: Running in secure mode, but config doesn't have a keytab
>   at org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:300)
> {code}
> More than one test can be affected by this. We should add the following line 
> before the existing {{UGI}} reset operation, so the potential exception can 
> be avoided:
> {code}
> UserGroupInformation.reset();
> {code}





[jira] [Commented] (HDFS-10313) Distcp need to enforce the order of snapshot names passed to -diff

2016-04-26 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259500#comment-15259500
 ] 

Lin Yiqun commented on HDFS-10313:
--

Thanks [~yzhangal] for the commit!

> Distcp need to enforce the order of snapshot names passed to -diff
> --
>
> Key: HDFS-10313
> URL: https://issues.apache.org/jira/browse/HDFS-10313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Reporter: Yongjun Zhang
>Assignee: Lin Yiqun
> Fix For: 2.8.0
>
> Attachments: HDFS-10313.001.patch, HDFS-10313.002.patch, 
> HDFS-10313.003.patch
>
>
> This jira is to propose adding a check to distcp, when {{-diff s1 s2}} is 
> passed, we need to ensure that s2 is newer than s1, otherwise, abort with a 
> informative error message.
> This is the result of my offline discussion with [~jingzhao] on HDFS-9820. 
> Thanks Jing.





[jira] [Updated] (HDFS-10336) TestBalancer failing intermittently because of not reseting UserGroupInformation completely

2016-04-26 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10336:
-
Attachment: HDFS-10336.001.patch

> TestBalancer failing intermittently because of not reseting 
> UserGroupInformation completely
> ---
>
> Key: HDFS-10336
> URL: https://issues.apache.org/jira/browse/HDFS-10336
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-10336.001.patch
>
>
> The unit test {{TestBalancer}} fails sometimes. 
> I looked into the failures and found two main causes.
> * 1st. The test {{TestBalancer#testBalancerWithKeytabs}} timed out.
> {code}
> org.apache.hadoop.hdfs.server.balancer.TestBalancer
> testBalancerWithKeytabs(org.apache.hadoop.hdfs.server.balancer.TestBalancer)  
> Time elapsed: 300.41 sec  <<< ERROR!
> java.lang.Exception: test timed out after 30 milliseconds
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher.waitForMoveCompletion(Dispatcher.java:1122)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchBlockMoves(Dispatcher.java:1096)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchAndCheckContinue(Dispatcher.java:1060)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.runOneIteration(Balancer.java:635)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.run(Balancer.java:689)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.testUnknownDatanode(TestBalancer.java:1098)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.access$000(TestBalancer.java:125)
> {code}
> * 2nd. The test {{TestBalancer#testBalancerWithKeytabs}} sometimes does not 
> reset the {{UGI}} completely in the finally block, and this causes other 
> unit tests to throw {{IOException}}, like this:
> {code}
> testBalancerWithNonZeroThreadsForMove(org.apache.hadoop.hdfs.server.balancer.TestBalancer)
>   Time elapsed: 0 sec  <<< ERROR!
> java.io.IOException: Running in secure mode, but config doesn't have a keytab
>   at org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:300)
> {code}
> More than one test can be affected by this. We should add the following line 
> before the existing {{UGI}} reset operation, so the potential exception can 
> be avoided:
> {code}
> UserGroupInformation.reset();
> {code}





[jira] [Updated] (HDFS-10336) TestBalancer failing intermittently because of not reseting UserGroupInformation completely

2016-04-26 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10336:
-
Status: Patch Available  (was: Open)

Attached an initial patch; thanks for reviewing.

> TestBalancer failing intermittently because of not reseting 
> UserGroupInformation completely
> ---
>
> Key: HDFS-10336
> URL: https://issues.apache.org/jira/browse/HDFS-10336
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
>
> The unit test {{TestBalancer}} fails sometimes. 
> I looked into the failures and found two main causes.
> * 1st. The test {{TestBalancer#testBalancerWithKeytabs}} timed out.
> {code}
> org.apache.hadoop.hdfs.server.balancer.TestBalancer
> testBalancerWithKeytabs(org.apache.hadoop.hdfs.server.balancer.TestBalancer)  
> Time elapsed: 300.41 sec  <<< ERROR!
> java.lang.Exception: test timed out after 30 milliseconds
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher.waitForMoveCompletion(Dispatcher.java:1122)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchBlockMoves(Dispatcher.java:1096)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchAndCheckContinue(Dispatcher.java:1060)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.runOneIteration(Balancer.java:635)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.run(Balancer.java:689)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.testUnknownDatanode(TestBalancer.java:1098)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.access$000(TestBalancer.java:125)
> {code}
> * 2nd. The test {{TestBalancer#testBalancerWithKeytabs}} sometimes does not 
> reset the {{UGI}} completely in the finally block, and this causes other 
> unit tests to throw {{IOException}}, like this:
> {code}
> testBalancerWithNonZeroThreadsForMove(org.apache.hadoop.hdfs.server.balancer.TestBalancer)
>   Time elapsed: 0 sec  <<< ERROR!
> java.io.IOException: Running in secure mode, but config doesn't have a keytab
>   at org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:300)
> {code}
> More than one test can be affected by this. We should add the following line 
> before the existing {{UGI}} reset operation, so the potential exception can 
> be avoided:
> {code}
> UserGroupInformation.reset();
> {code}





[jira] [Created] (HDFS-10336) TestBalancer failing intermittently because of not reseting UserGroupInformation completely

2016-04-26 Thread Lin Yiqun (JIRA)
Lin Yiqun created HDFS-10336:


 Summary: TestBalancer failing intermittently because of not 
reseting UserGroupInformation completely
 Key: HDFS-10336
 URL: https://issues.apache.org/jira/browse/HDFS-10336
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Lin Yiqun
Assignee: Lin Yiqun


The unit test {{TestBalancer}} fails sometimes. 

I looked into the failures and found two main causes.

* 1st. The test {{TestBalancer#testBalancerWithKeytabs}} timed out.
{code}
org.apache.hadoop.hdfs.server.balancer.TestBalancer
testBalancerWithKeytabs(org.apache.hadoop.hdfs.server.balancer.TestBalancer)  
Time elapsed: 300.41 sec  <<< ERROR!
java.lang.Exception: test timed out after 30 milliseconds
at java.lang.Thread.sleep(Native Method)
at 
org.apache.hadoop.hdfs.server.balancer.Dispatcher.waitForMoveCompletion(Dispatcher.java:1122)
at 
org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchBlockMoves(Dispatcher.java:1096)
at 
org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchAndCheckContinue(Dispatcher.java:1060)
at 
org.apache.hadoop.hdfs.server.balancer.Balancer.runOneIteration(Balancer.java:635)
at 
org.apache.hadoop.hdfs.server.balancer.Balancer.run(Balancer.java:689)
at 
org.apache.hadoop.hdfs.server.balancer.TestBalancer.testUnknownDatanode(TestBalancer.java:1098)
at 
org.apache.hadoop.hdfs.server.balancer.TestBalancer.access$000(TestBalancer.java:125)
{code}

* 2nd. The test {{TestBalancer#testBalancerWithKeytabs}} sometimes does not 
reset the {{UGI}} completely in the finally block, and this causes other unit 
tests to throw {{IOException}}, like this:
{code}
testBalancerWithNonZeroThreadsForMove(org.apache.hadoop.hdfs.server.balancer.TestBalancer)
  Time elapsed: 0 sec  <<< ERROR!
java.io.IOException: Running in secure mode, but config doesn't have a keytab
at org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:300)
{code}
More than one test can be affected by this. We should add the following line 
before the existing {{UGI}} reset operation, so the potential exception can be 
avoided:
{code}
UserGroupInformation.reset();
{code}
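
Concretely, a minimal sketch of where the added line sits in the test's finally 
block (the {{initConf}} variable name is an assumption; only the added 
{{UserGroupInformation.reset()}} call is the point):
{code}
} finally {
  // added line: clear the login user and cached security state first,
  // so later tests do not inherit the secure-mode settings
  UserGroupInformation.reset();
  // existing cleanup: restore the original (non-secure) UGI configuration
  UserGroupInformation.setConfiguration(initConf);
}
{code}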





[jira] [Commented] (HDFS-10329) Bad initialisation of StringBuffer in RequestHedgingProxyProvider.java

2016-04-26 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259471#comment-15259471
 ] 

Lin Yiqun commented on HDFS-10329:
--

Thanks [~kihwal] for the commit!

> Bad initialisation of StringBuffer in RequestHedgingProxyProvider.java
> --
>
> Key: HDFS-10329
> URL: https://issues.apache.org/jira/browse/HDFS-10329
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Reporter: Max Schaefer
>Assignee: Lin Yiqun
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: HDFS-10329.001.patch
>
>
> On [line 
> 167|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/RequestHedgingProxyProvider.java#L167]
>  of {{RequestHedgingProxyProvider.java}}, a {{StringBuilder}} is initialised 
> like this:
> {code}
> StringBuilder combinedInfo = new StringBuilder('[');
> {code}
> This won't have the (presumably) desired effect of creating a 
> {{StringBuilder}} containing the string {{"["}}; instead, it will create a 
> {{StringBuilder}} with capacity 91 (the character code of '['). See 
> [here|http://what-when-how.com/Tutorial/topic-90315a/Java-Puzzlers-Traps-Pitfalls-and-Corner-Cases-69.html]
>  for an explanation.
> To fix this, pass a string literal instead of the character literal:
> {code}
> StringBuilder combinedInfo = new StringBuilder("[");
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10329) Bad initialisation of StringBuffer in RequestHedgingProxyProvider.java

2016-04-26 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10329:
-
Status: Patch Available  (was: Open)

> Bad initialisation of StringBuffer in RequestHedgingProxyProvider.java
> --
>
> Key: HDFS-10329
> URL: https://issues.apache.org/jira/browse/HDFS-10329
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Reporter: Max Schaefer
>Assignee: Lin Yiqun
>Priority: Minor
> Attachments: HDFS-10329.001.patch
>
>
> On [line 
> 167|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/RequestHedgingProxyProvider.java#L167]
>  of {{RequestHedgingProxyProvider.java}}, a {{StringBuilder}} is initialised 
> like this:
> {code}
> StringBuilder combinedInfo = new StringBuilder('[');
> {code}
> This won't have the (presumably) desired effect of creating a 
> {{StringBuilder}} containing the string {{"["}}; instead, it will create a 
> {{StringBuilder}} with capacity 91 (the character code of '['). See 
> [here|http://what-when-how.com/Tutorial/topic-90315a/Java-Puzzlers-Traps-Pitfalls-and-Corner-Cases-69.html]
>  for an explanation.
> To fix this, pass a string literal instead of the character literal:
> {code}
> StringBuilder combinedInfo = new StringBuilder("[");
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10329) Bad initialisation of StringBuffer in RequestHedgingProxyProvider.java

2016-04-26 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10329:
-
Attachment: HDFS-10329.001.patch

> Bad initialisation of StringBuffer in RequestHedgingProxyProvider.java
> --
>
> Key: HDFS-10329
> URL: https://issues.apache.org/jira/browse/HDFS-10329
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Reporter: Max Schaefer
>Assignee: Lin Yiqun
>Priority: Minor
> Attachments: HDFS-10329.001.patch
>
>
> On [line 
> 167|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/RequestHedgingProxyProvider.java#L167]
>  of {{RequestHedgingProxyProvider.java}}, a {{StringBuilder}} is initialised 
> like this:
> {code}
> StringBuilder combinedInfo = new StringBuilder('[');
> {code}
> This won't have the (presumably) desired effect of creating a 
> {{StringBuilder}} containing the string {{"["}}; instead, it will create a 
> {{StringBuilder}} with capacity 91 (the character code of '['). See 
> [here|http://what-when-how.com/Tutorial/topic-90315a/Java-Puzzlers-Traps-Pitfalls-and-Corner-Cases-69.html]
>  for an explanation.
> To fix this, pass a string literal instead of the character literal:
> {code}
> StringBuilder combinedInfo = new StringBuilder("[");
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10329) Bad initialisation of StringBuffer in RequestHedgingProxyProvider.java

2016-04-26 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258035#comment-15258035
 ] 

Lin Yiqun commented on HDFS-10329:
--

Thanks [~xiemaisi] for reporting this. I checked this, and it is true that the 
line {{new StringBuilder('[');}} invokes the constructor 
{{java.lang.StringBuilder.StringBuilder(int capacity)}} rather than 
{{java.lang.StringBuilder.StringBuilder(String str)}}. Attaching a simple patch 
from me to fix this.
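
As a quick, self-contained JDK illustration of the overload difference (not part of the patch):
{code}
// Shows which StringBuilder constructor is selected for a char vs. a String.
public class StringBuilderOverloadDemo {
  public static void main(String[] args) {
    StringBuilder byChar = new StringBuilder('[');   // char '[' widens to int 91 -> capacity
    StringBuilder byString = new StringBuilder("["); // actually contains "["
    System.out.println(byChar.length() + " / " + byChar.capacity()); // prints: 0 / 91
    System.out.println(byString.toString());                         // prints: [
  }
}
{code}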

> Bad initialisation of StringBuffer in RequestHedgingProxyProvider.java
> --
>
> Key: HDFS-10329
> URL: https://issues.apache.org/jira/browse/HDFS-10329
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Reporter: Max Schaefer
>Assignee: Lin Yiqun
>Priority: Minor
>
> On [line 
> 167|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/RequestHedgingProxyProvider.java#L167]
>  of {{RequestHedgingProxyProvider.java}}, a {{StringBuilder}} is initialised 
> like this:
> {code}
> StringBuilder combinedInfo = new StringBuilder('[');
> {code}
> This won't have the (presumably) desired effect of creating a 
> {{StringBuilder}} containing the string {{"["}}; instead, it will create a 
> {{StringBuilder}} with capacity 91 (the character code of '['). See 
> [here|http://what-when-how.com/Tutorial/topic-90315a/Java-Puzzlers-Traps-Pitfalls-and-Corner-Cases-69.html]
>  for an explanation.
> To fix this, pass a string literal instead of the character literal:
> {code}
> StringBuilder combinedInfo = new StringBuilder("[");
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-10329) Bad initialisation of StringBuffer in RequestHedgingProxyProvider.java

2016-04-26 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun reassigned HDFS-10329:


Assignee: Lin Yiqun

> Bad initialisation of StringBuffer in RequestHedgingProxyProvider.java
> --
>
> Key: HDFS-10329
> URL: https://issues.apache.org/jira/browse/HDFS-10329
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Reporter: Max Schaefer
>Assignee: Lin Yiqun
>Priority: Minor
>
> On [line 
> 167|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/RequestHedgingProxyProvider.java#L167]
>  of {{RequestHedgingProxyProvider.java}}, a {{StringBuilder}} is initialised 
> like this:
> {code}
> StringBuilder combinedInfo = new StringBuilder('[');
> {code}
> This won't have the (presumably) desired effect of creating a 
> {{StringBuilder}} containing the string {{"["}}; instead, it will create a 
> {{StringBuilder}} with capacity 91 (the character code of '['). See 
> [here|http://what-when-how.com/Tutorial/topic-90315a/Java-Puzzlers-Traps-Pitfalls-and-Corner-Cases-69.html]
>  for an explanation.
> To fix this, pass a string literal instead of the character literal:
> {code}
> StringBuilder combinedInfo = new StringBuilder("[");
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10316) revisit corrupt replicas count

2016-04-24 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15255850#comment-15255850
 ] 

Lin Yiqun commented on HDFS-10316:
--

Hi, [~walter.k.su], it's a good catch! I looked at the code and it's right that 
{{countNodes(blk).corruptReplicas()}} only checks the two storage states 
{{NORMAL}} and {{READ_ONLY}} via {{BlockManager#checkReplicaOnStorage}}, while 
{{BlockManager#findAndMarkBlockAsCorrupt}} does not check for these two cases. 
I attached a patch; what do you think of it?
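
To make the direction concrete, here is a rough sketch of the kind of storage-state guard that could be applied before adding a replica to {{corruptReplicas}}; it assumes a {{DatanodeStorageInfo#getState()}} accessor and is only an illustration, not the attached patch:
{code}
// Rough sketch only; assumes DatanodeStorageInfo#getState() is available.
private static boolean isCountableStorage(DatanodeStorageInfo storage) {
  if (storage == null) {
    return false; // missing/pruned storage should not be added to corruptReplicas
  }
  DatanodeStorage.State state = storage.getState();
  return state == DatanodeStorage.State.NORMAL
      || state == DatanodeStorage.State.READ_ONLY;
}
{code}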

> revisit corrupt replicas count
> --
>
> Key: HDFS-10316
> URL: https://issues.apache.org/jira/browse/HDFS-10316
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Walter Su
> Attachments: HDFS-10316.001.patch
>
>
> A DN has 4 types of storages:
> 1. NORMAL
> 2. READ_ONLY
> 3. FAILED
> 4. (missing/pruned)
> blocksMap.numNodes(blk) counts 1,2,3
> blocksMap.getStorages(blk) counts 1,2,3
> countNodes(blk).corruptReplicas() counts 1,2
> corruptReplicas counts 1,2,3,4. Because findAndMarkBlockAsCorrupt(..) 
> supports adding blk to the map even if the storage is not found.
> The inconsistency causes bugs like HDFS-9958.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10316) revisit corrupt replicas count

2016-04-24 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10316:
-
Attachment: HDFS-10316.001.patch

> revisit corrupt replicas count
> --
>
> Key: HDFS-10316
> URL: https://issues.apache.org/jira/browse/HDFS-10316
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Walter Su
> Attachments: HDFS-10316.001.patch
>
>
> A DN has 4 types of storages:
> 1. NORMAL
> 2. READ_ONLY
> 3. FAILED
> 4. (missing/pruned)
> blocksMap.numNodes(blk) counts 1,2,3
> blocksMap.getStorages(blk) counts 1,2,3
> countNodes(blk).corruptReplicas() counts 1,2
> corruptReplicas counts 1,2,3,4. Because findAndMarkBlockAsCorrupt(..) 
> supports adding blk to the map even if the storage is not found.
> The inconsistency causes bugs like HDFS-9958.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10313) Distcp does not check the order of snapshot names passed to -diff

2016-04-22 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10313:
-
Attachment: (was: HDFS-10313.003.patch)

> Distcp does not check the order of snapshot names passed to -diff
> -
>
> Key: HDFS-10313
> URL: https://issues.apache.org/jira/browse/HDFS-10313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Reporter: Yongjun Zhang
>Assignee: Lin Yiqun
> Attachments: HDFS-10313.001.patch, HDFS-10313.002.patch, 
> HDFS-10313.003.patch
>
>
> This jira is to propose adding a check to distcp, when {{-diff s1 s2}} is 
> passed, we need to ensure that s2 is newer than s1, otherwise, abort with an
> informative error message.
> This is the result of my offline discussion with [~jingzhao] on HDFS-9820. 
> Thanks Jing.
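
As an illustration of the proposed check (a hedged sketch only; {{getSnapshotCreationTime}} is a hypothetical helper, not an existing DistCp or HDFS API):
{code}
// Hypothetical validation sketch for "-diff s1 s2"; not the committed patch.
static void checkSnapshotsOrder(FileSystem fs, Path sourceDir,
    String s1, String s2) throws IOException {
  long t1 = getSnapshotCreationTime(fs, sourceDir, s1); // assumed helper
  long t2 = getSnapshotCreationTime(fs, sourceDir, s2); // assumed helper
  if (t2 <= t1) {
    throw new IllegalArgumentException("Snapshot " + s2
        + " must be newer than " + s1 + " when using -diff " + s1 + " " + s2);
  }
}
{code}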



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10313) Distcp does not check the order of snapshot names passed to -diff

2016-04-22 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10313:
-
Attachment: HDFS-10313.003.patch

> Distcp does not check the order of snapshot names passed to -diff
> -
>
> Key: HDFS-10313
> URL: https://issues.apache.org/jira/browse/HDFS-10313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Reporter: Yongjun Zhang
>Assignee: Lin Yiqun
> Attachments: HDFS-10313.001.patch, HDFS-10313.002.patch, 
> HDFS-10313.003.patch
>
>
> This jira is to propose adding a check to distcp, when {{-diff s1 s2}} is 
> passed, we need to ensure that s2 is newer than s1, otherwise, abort with an
> informative error message.
> This is the result of my offline discussion with [~jingzhao] on HDFS-9820. 
> Thanks Jing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10313) Distcp does not check the order of snapshot names passed to -diff

2016-04-22 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15255104#comment-15255104
 ] 

Lin Yiqun commented on HDFS-10313:
--

The latest patch looks good: all the unit tests passed and there are no other 
checkstyle warnings. I have fixed another minor thing in my patch and updated 
the v003 patch.

> Distcp does not check the order of snapshot names passed to -diff
> -
>
> Key: HDFS-10313
> URL: https://issues.apache.org/jira/browse/HDFS-10313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Reporter: Yongjun Zhang
>Assignee: Lin Yiqun
> Attachments: HDFS-10313.001.patch, HDFS-10313.002.patch, 
> HDFS-10313.003.patch
>
>
> This jira is to propose adding a check to distcp, when {{-diff s1 s2}} is 
> passed, we need to ensure that s2 is newer than s1, otherwise, abort with an
> informative error message.
> This is the result of my offline discussion with [~jingzhao] on HDFS-9820. 
> Thanks Jing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10313) Distcp does not check the order of snapshot names passed to -diff

2016-04-21 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10313:
-
Attachment: HDFS-10313.003.patch

Updated the patch for the latest comments, pending jenkins.

> Distcp does not check the order of snapshot names passed to -diff
> -
>
> Key: HDFS-10313
> URL: https://issues.apache.org/jira/browse/HDFS-10313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Reporter: Yongjun Zhang
>Assignee: Lin Yiqun
> Attachments: HDFS-10313.001.patch, HDFS-10313.002.patch, 
> HDFS-10313.003.patch
>
>
> This jira is to propose adding a check to distcp, when {{-diff s1 s2}} is 
> passed, we need to ensure that s2 is newer than s1, otherwise, abort with an
> informative error message.
> This is the result of my offline discussion with [~jingzhao] on HDFS-9820. 
> Thanks Jing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10313) Distcp does not check the order of snapshot names passed to -diff

2016-04-20 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10313:
-
Attachment: HDFS-10313.002.patch

Thanks [~yzhangal] for the review. Updated the patch to address the comments.

> Distcp does not check the order of snapshot names passed to -diff
> -
>
> Key: HDFS-10313
> URL: https://issues.apache.org/jira/browse/HDFS-10313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Reporter: Yongjun Zhang
>Assignee: Lin Yiqun
> Attachments: HDFS-10313.001.patch, HDFS-10313.002.patch
>
>
> This jira is to propose adding a check to distcp, when {{-diff s1 s2}} is 
> passed, we need to ensure that s2 is newer than s1, otherwise, abort with an
> informative error message.
> This is the result of my offline discussion with [~jingzhao] on HDFS-9820. 
> Thanks Jing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10313) Distcp does not check the order of snapshot names passed to -diff

2016-04-19 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10313:
-
Attachment: HDFS-10313.001.patch

> Distcp does not check the order of snapshot names passed to -diff
> -
>
> Key: HDFS-10313
> URL: https://issues.apache.org/jira/browse/HDFS-10313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Reporter: Yongjun Zhang
>Assignee: Lin Yiqun
> Attachments: HDFS-10313.001.patch
>
>
> This jira is to propose adding a check to distcp, when {{-diff s1 s2}} is 
> passed, we need to ensure that s2 is newer than s1, otherwise, abort with an
> informative error message.
> This is the result of my offline discussion with [~jingzhao] on HDFS-9820. 
> Thanks Jing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10313) Distcp does not check the order of snapshot names passed to -diff

2016-04-19 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10313:
-
Attachment: (was: HDFS-10313.001.patch)

> Distcp does not check the order of snapshot names passed to -diff
> -
>
> Key: HDFS-10313
> URL: https://issues.apache.org/jira/browse/HDFS-10313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Reporter: Yongjun Zhang
>Assignee: Lin Yiqun
>
> This jira is to propose adding a check to distcp, when {{-diff s1 s2}} is 
> passed, we need to ensure that s2 is newer than s1, otherwise, abort with an
> informative error message.
> This is the result of my offline discussion with [~jingzhao] on HDFS-9820. 
> Thanks Jing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10313) Distcp does not check the order of snapshot names passed to -diff

2016-04-19 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10313:
-
Attachment: HDFS-10313.001.patch

Sorry, the previous patch was incomplete; uploading the latest patch.

> Distcp does not check the order of snapshot names passed to -diff
> -
>
> Key: HDFS-10313
> URL: https://issues.apache.org/jira/browse/HDFS-10313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Reporter: Yongjun Zhang
>Assignee: Lin Yiqun
> Attachments: HDFS-10313.001.patch
>
>
> This jira is to propose adding a check to distcp, when {{-diff s1 s2}} is 
> passed, we need to ensure that s2 is newer than s1, otherwise, abort with an
> informative error message.
> This is the result of my offline discussion with [~jingzhao] on HDFS-9820. 
> Thanks Jing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10313) Distcp does not check the order of snapshot names passed to -diff

2016-04-19 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10313:
-
Attachment: (was: HDFS-10313.001.patch)

> Distcp does not check the order of snapshot names passed to -diff
> -
>
> Key: HDFS-10313
> URL: https://issues.apache.org/jira/browse/HDFS-10313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Reporter: Yongjun Zhang
>Assignee: Lin Yiqun
>
> This jira is to propose adding a check to distcp, when {{-diff s1 s2}} is 
> passed, we need to ensure that s2 is newer than s1, otherwise, abort with an
> informative error message.
> This is the result of my offline discussion with [~jingzhao] on HDFS-9820. 
> Thanks Jing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10313) Distcp does not check the order of snapshot names passed to -diff

2016-04-19 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10313:
-
Attachment: HDFS-10313.001.patch

> Distcp does not check the order of snapshot names passed to -diff
> -
>
> Key: HDFS-10313
> URL: https://issues.apache.org/jira/browse/HDFS-10313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Reporter: Yongjun Zhang
>Assignee: Lin Yiqun
> Attachments: HDFS-10313.001.patch
>
>
> This jira is to propose adding a check to distcp, when {{-diff s1 s2}} is 
> passed, we need to ensure that s2 is newer than s1, otherwise, abort with an
> informative error message.
> This is the result of my offline discussion with [~jingzhao] on HDFS-9820. 
> Thanks Jing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10313) Distcp does not check the order of snapshot names passed to -diff

2016-04-19 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10313:
-
Status: Patch Available  (was: Open)

Attached an initial patch from me, thanks for reviewing.

> Distcp does not check the order of snapshot names passed to -diff
> -
>
> Key: HDFS-10313
> URL: https://issues.apache.org/jira/browse/HDFS-10313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Reporter: Yongjun Zhang
>Assignee: Lin Yiqun
> Attachments: HDFS-10313.001.patch
>
>
> This jira is to propose adding a check to distcp, when {{-diff s1 s2}} is 
> passed, we need to ensure that s2 is newer than s1, otherwise, abort with an
> informative error message.
> This is the result of my offline discussion with [~jingzhao] on HDFS-9820. 
> Thanks Jing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10313) Distcp does not check the order of snapshot names passed to -diff

2016-04-19 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15249114#comment-15249114
 ] 

Lin Yiqun commented on HDFS-10313:
--

Hi, [~yzhangal], I will upload an initial patch later.

> Distcp does not check the order of snapshot names passed to -diff
> -
>
> Key: HDFS-10313
> URL: https://issues.apache.org/jira/browse/HDFS-10313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Reporter: Yongjun Zhang
>Assignee: Lin Yiqun
>
> This jira is to propose adding a check to distcp, when {{-diff s1 s2}} is 
> passed, we need to ensure that s2 is newer than s1, otherwise, abort with an
> informative error message.
> This is the result of my offline discussion with [~jingzhao] on HDFS-9820. 
> Thanks Jing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-10313) Distcp does not check the order of snapshot names passed to -diff

2016-04-19 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun reassigned HDFS-10313:


Assignee: Lin Yiqun

> Distcp does not check the order of snapshot names passed to -diff
> -
>
> Key: HDFS-10313
> URL: https://issues.apache.org/jira/browse/HDFS-10313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Reporter: Yongjun Zhang
>Assignee: Lin Yiqun
>
> This jira is to propose adding a check to distcp, when {{-diff s1 s2}} is 
> passed, we need to ensure that s2 is newer than s1, otherwise, abort with an
> informative error message.
> This is the result of my offline discussion with [~jingzhao] on HDFS-9820. 
> Thanks Jing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10275) TestDataNodeMetrics failing intermittently due to TotalWriteTime counted incorrectly

2016-04-18 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246990#comment-15246990
 ] 

Lin Yiqun commented on HDFS-10275:
--

Thanks [~walter.k.su] for commit!

> TestDataNodeMetrics failing intermittently due to TotalWriteTime counted 
> incorrectly
> 
>
> Key: HDFS-10275
> URL: https://issues.apache.org/jira/browse/HDFS-10275
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Fix For: 2.7.3
>
> Attachments: HDFS-10275.001.patch
>
>
> The unit test {{TestDataNodeMetrics}} fails intermittently. The failure info 
> shows the following:
> {code}
> Results :
> Failed tests: 
>   
> TestDataNodeVolumeFailureToleration.testVolumeAndTolerableConfiguration:195->testVolumeConfig:232
>  expected: but was:
> Tests in error: 
>   TestOpenFilesWithSnapshot.testWithCheckpoint:94 ? IO Timed out waiting for 
> Min...
>   TestDataNodeMetrics.testDataNodeTimeSpend:279 ? Timeout Timed out waiting 
> for ...
>   TestHFlush.testHFlushInterrupted ? IO The stream is closed
> {code}
> In line 279 of {{TestDataNodeMetrics}}, a timeout occurs. I looked into the 
> code and found that the real reason is that the {{TotalWriteTime}} metric 
> frequently counts 0 in each iteration of creating a file, and this leads to 
> retry operations until the timeout.
> I debugged the test locally. The most likely reason that the 
> {{TotalWriteTime}} metric always counts 0 is that we use 
> {{SimulatedFSDataset}} for the time-spent test. In {{SimulatedFSDataset}}, 
> the inner class's method {{SimulatedOutputStream#write}} is used to count the 
> write time, and that method just updates the {{length}} and throws its data 
> away.
> {code}
> @Override
> public void write(byte[] b,
>   int off,
>   int len) throws IOException  {
>   length += len;
> }
> {code} 
> So the writing operation hardly costs any time. We should therefore create 
> the file in a real way instead of the simulated way. I have tested locally 
> that the test passes on the first attempt when I remove the simulated way, 
> while it retries many times to count write time in the old way.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10302) BlockPlacementPolicyDefault should use default replication considerload value

2016-04-18 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246988#comment-15246988
 ] 

Lin Yiqun commented on HDFS-10302:
--

Thanks [~kihwal] for quick review and commit!

> BlockPlacementPolicyDefault should use default replication considerload value
> -
>
> Key: HDFS-10302
> URL: https://issues.apache.org/jira/browse/HDFS-10302
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
>Priority: Trivial
> Fix For: 2.8.0
>
> Attachments: HDFS-10302.001.patch
>
>
> Now in the method {{BlockPlacementPolicyDefault#initialize}}, the literal 
> value {{true}} is used as the replication considerload default rather than 
> the existing constant 
> {{DFS_NAMENODE_REPLICATION_CONSIDERLOAD_DEFAULT}}.
> {code}
>   @Override
>   public void initialize(Configuration conf,  FSClusterStats stats,
>  NetworkTopology clusterMap, 
>  Host2NodesMap host2datanodeMap) {
> this.considerLoad = conf.getBoolean(
> DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_KEY, true);
> this.considerLoadFactor = conf.getDouble(
> DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_FACTOR,
> DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_FACTOR_DEFAULT);
> this.stats = stats;
> this.clusterMap = clusterMap;
> this.host2datanodeMap = host2datanodeMap;
> this.heartbeatInterval = conf.getLong(
> DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY,
> DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_DEFAULT) * 1000;
> this.tolerateHeartbeatMultiplier = conf.getInt(
> DFSConfigKeys.DFS_NAMENODE_TOLERATE_HEARTBEAT_MULTIPLIER_KEY,
> DFSConfigKeys.DFS_NAMENODE_TOLERATE_HEARTBEAT_MULTIPLIER_DEFAULT);
> this.staleInterval = conf.getLong(
> DFSConfigKeys.DFS_NAMENODE_STALE_DATANODE_INTERVAL_KEY, 
> DFSConfigKeys.DFS_NAMENODE_STALE_DATANODE_INTERVAL_DEFAULT);
> this.preferLocalNode = conf.getBoolean(
> DFSConfigKeys.
> DFS_NAMENODE_BLOCKPLACEMENTPOLICY_DEFAULT_PREFER_LOCAL_NODE_KEY,
> DFSConfigKeys.
> 
> DFS_NAMENODE_BLOCKPLACEMENTPOLICY_DEFAULT_PREFER_LOCAL_NODE_DEFAULT);
>   }
> {code}
> And currently the value {{DFS_NAMENODE_REPLICATION_CONSIDERLOAD_DEFAULT}} is 
> not used anywhere.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10275) TestDataNodeMetrics failing intermittently due to TotalWriteTime counted incorrectly

2016-04-18 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245521#comment-15245521
 ] 

Lin Yiqun commented on HDFS-10275:
--

Hi, [~walter.k.su], I have removed {{SimulatedFSDataset.setFactory(conf);}} in 
my patch. Do you mean there is no need to bump the timeout in addition?
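
For reference, a minimal sketch of the setup change under discussion (assumed test shape, not the exact diff): without installing the simulated dataset factory, writes go to real storage, so {{TotalWriteTime}} can accumulate a non-zero value.
{code}
// Sketch: create the file against a real (non-simulated) dataset.
Configuration conf = new HdfsConfiguration();
// SimulatedFSDataset.setFactory(conf);  // removed: simulated writes cost ~0 time
MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).build();
try {
  FileSystem fs = cluster.getFileSystem();
  // Writes real bytes, so the TotalWriteTime metric should become > 0.
  DFSTestUtil.createFile(fs, new Path("/totalWriteTime.dat"), 1024 * 1024, (short) 1, 0L);
} finally {
  cluster.shutdown();
}
{code}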

> TestDataNodeMetrics failing intermittently due to TotalWriteTime counted 
> incorrectly
> 
>
> Key: HDFS-10275
> URL: https://issues.apache.org/jira/browse/HDFS-10275
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-10275.001.patch
>
>
> The unit test {{TestDataNodeMetrics}} fails intermittently. The failure info 
> shows the following:
> {code}
> Results :
> Failed tests: 
>   
> TestDataNodeVolumeFailureToleration.testVolumeAndTolerableConfiguration:195->testVolumeConfig:232
>  expected: but was:
> Tests in error: 
>   TestOpenFilesWithSnapshot.testWithCheckpoint:94 ? IO Timed out waiting for 
> Min...
>   TestDataNodeMetrics.testDataNodeTimeSpend:279 ? Timeout Timed out waiting 
> for ...
>   TestHFlush.testHFlushInterrupted ? IO The stream is closed
> {code}
> In line 279 of {{TestDataNodeMetrics}}, a timeout occurs. I looked into the 
> code and found that the real reason is that the {{TotalWriteTime}} metric 
> frequently counts 0 in each iteration of creating a file, and this leads to 
> retry operations until the timeout.
> I debugged the test locally. The most likely reason that the 
> {{TotalWriteTime}} metric always counts 0 is that we use 
> {{SimulatedFSDataset}} for the time-spent test. In {{SimulatedFSDataset}}, 
> the inner class's method {{SimulatedOutputStream#write}} is used to count the 
> write time, and that method just updates the {{length}} and throws its data 
> away.
> {code}
> @Override
> public void write(byte[] b,
>   int off,
>   int len) throws IOException  {
>   length += len;
> }
> {code} 
> So the writing operation hardly costs any time. We should therefore create 
> the file in a real way instead of the simulated way. I have tested locally 
> that the test passes on the first attempt when I remove the simulated way, 
> while it retries many times to count write time in the old way.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10302) BlockPlacementPolicyDefault should use default replication considerload value

2016-04-17 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10302:
-
Issue Type: Improvement  (was: Bug)

> BlockPlacementPolicyDefault should use default replication considerload value
> -
>
> Key: HDFS-10302
> URL: https://issues.apache.org/jira/browse/HDFS-10302
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
>Priority: Trivial
> Attachments: HDFS-10302.001.patch
>
>
> Now in the method {{BlockPlacementPolicyDefault#initialize}}, the literal 
> value {{true}} is used as the replication considerload default rather than 
> the existing constant 
> {{DFS_NAMENODE_REPLICATION_CONSIDERLOAD_DEFAULT}}.
> {code}
>   @Override
>   public void initialize(Configuration conf,  FSClusterStats stats,
>  NetworkTopology clusterMap, 
>  Host2NodesMap host2datanodeMap) {
> this.considerLoad = conf.getBoolean(
> DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_KEY, true);
> this.considerLoadFactor = conf.getDouble(
> DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_FACTOR,
> DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_FACTOR_DEFAULT);
> this.stats = stats;
> this.clusterMap = clusterMap;
> this.host2datanodeMap = host2datanodeMap;
> this.heartbeatInterval = conf.getLong(
> DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY,
> DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_DEFAULT) * 1000;
> this.tolerateHeartbeatMultiplier = conf.getInt(
> DFSConfigKeys.DFS_NAMENODE_TOLERATE_HEARTBEAT_MULTIPLIER_KEY,
> DFSConfigKeys.DFS_NAMENODE_TOLERATE_HEARTBEAT_MULTIPLIER_DEFAULT);
> this.staleInterval = conf.getLong(
> DFSConfigKeys.DFS_NAMENODE_STALE_DATANODE_INTERVAL_KEY, 
> DFSConfigKeys.DFS_NAMENODE_STALE_DATANODE_INTERVAL_DEFAULT);
> this.preferLocalNode = conf.getBoolean(
> DFSConfigKeys.
> DFS_NAMENODE_BLOCKPLACEMENTPOLICY_DEFAULT_PREFER_LOCAL_NODE_KEY,
> DFSConfigKeys.
> 
> DFS_NAMENODE_BLOCKPLACEMENTPOLICY_DEFAULT_PREFER_LOCAL_NODE_DEFAULT);
>   }
> {code}
> And currently the value {{DFS_NAMENODE_REPLICATION_CONSIDERLOAD_DEFAULT}} is 
> not used anywhere.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10302) BlockPlacementPolicyDefault should use default replication considerload value

2016-04-17 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10302:
-
Attachment: HDFS-10302.001.patch

> BlockPlacementPolicyDefault should use default replication considerload value
> -
>
> Key: HDFS-10302
> URL: https://issues.apache.org/jira/browse/HDFS-10302
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
>Priority: Trivial
> Attachments: HDFS-10302.001.patch
>
>
> Now in the method {{BlockPlacementPolicyDefault#initialize}}, the literal 
> value {{true}} is used as the replication considerload default rather than 
> the existing constant 
> {{DFS_NAMENODE_REPLICATION_CONSIDERLOAD_DEFAULT}}.
> {code}
>   @Override
>   public void initialize(Configuration conf,  FSClusterStats stats,
>  NetworkTopology clusterMap, 
>  Host2NodesMap host2datanodeMap) {
> this.considerLoad = conf.getBoolean(
> DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_KEY, true);
> this.considerLoadFactor = conf.getDouble(
> DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_FACTOR,
> DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_FACTOR_DEFAULT);
> this.stats = stats;
> this.clusterMap = clusterMap;
> this.host2datanodeMap = host2datanodeMap;
> this.heartbeatInterval = conf.getLong(
> DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY,
> DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_DEFAULT) * 1000;
> this.tolerateHeartbeatMultiplier = conf.getInt(
> DFSConfigKeys.DFS_NAMENODE_TOLERATE_HEARTBEAT_MULTIPLIER_KEY,
> DFSConfigKeys.DFS_NAMENODE_TOLERATE_HEARTBEAT_MULTIPLIER_DEFAULT);
> this.staleInterval = conf.getLong(
> DFSConfigKeys.DFS_NAMENODE_STALE_DATANODE_INTERVAL_KEY, 
> DFSConfigKeys.DFS_NAMENODE_STALE_DATANODE_INTERVAL_DEFAULT);
> this.preferLocalNode = conf.getBoolean(
> DFSConfigKeys.
> DFS_NAMENODE_BLOCKPLACEMENTPOLICY_DEFAULT_PREFER_LOCAL_NODE_KEY,
> DFSConfigKeys.
> 
> DFS_NAMENODE_BLOCKPLACEMENTPOLICY_DEFAULT_PREFER_LOCAL_NODE_DEFAULT);
>   }
> {code}
> And currently the value {{DFS_NAMENODE_REPLICATION_CONSIDERLOAD_DEFAULT}} is 
> not used anywhere.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10302) BlockPlacementPolicyDefault should use default replication considerload value

2016-04-17 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10302:
-
Status: Patch Available  (was: Open)

Attached a simple patch for this.

> BlockPlacementPolicyDefault should use default replication considerload value
> -
>
> Key: HDFS-10302
> URL: https://issues.apache.org/jira/browse/HDFS-10302
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
>Priority: Trivial
>
> Now in the method {{BlockPlacementPolicyDefault#initialize}}, the literal 
> value {{true}} is used as the replication considerload default rather than 
> the existing constant 
> {{DFS_NAMENODE_REPLICATION_CONSIDERLOAD_DEFAULT}}.
> {code}
>   @Override
>   public void initialize(Configuration conf,  FSClusterStats stats,
>  NetworkTopology clusterMap, 
>  Host2NodesMap host2datanodeMap) {
> this.considerLoad = conf.getBoolean(
> DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_KEY, true);
> this.considerLoadFactor = conf.getDouble(
> DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_FACTOR,
> DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_FACTOR_DEFAULT);
> this.stats = stats;
> this.clusterMap = clusterMap;
> this.host2datanodeMap = host2datanodeMap;
> this.heartbeatInterval = conf.getLong(
> DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY,
> DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_DEFAULT) * 1000;
> this.tolerateHeartbeatMultiplier = conf.getInt(
> DFSConfigKeys.DFS_NAMENODE_TOLERATE_HEARTBEAT_MULTIPLIER_KEY,
> DFSConfigKeys.DFS_NAMENODE_TOLERATE_HEARTBEAT_MULTIPLIER_DEFAULT);
> this.staleInterval = conf.getLong(
> DFSConfigKeys.DFS_NAMENODE_STALE_DATANODE_INTERVAL_KEY, 
> DFSConfigKeys.DFS_NAMENODE_STALE_DATANODE_INTERVAL_DEFAULT);
> this.preferLocalNode = conf.getBoolean(
> DFSConfigKeys.
> DFS_NAMENODE_BLOCKPLACEMENTPOLICY_DEFAULT_PREFER_LOCAL_NODE_KEY,
> DFSConfigKeys.
> 
> DFS_NAMENODE_BLOCKPLACEMENTPOLICY_DEFAULT_PREFER_LOCAL_NODE_DEFAULT);
>   }
> {code}
> And currently the value {{DFS_NAMENODE_REPLICATION_CONSIDERLOAD_DEFAULT}} is 
> not used anywhere.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-10302) BlockPlacementPolicyDefault should use default replication considerload value

2016-04-17 Thread Lin Yiqun (JIRA)
Lin Yiqun created HDFS-10302:


 Summary: BlockPlacementPolicyDefault should use default 
replication considerload value
 Key: HDFS-10302
 URL: https://issues.apache.org/jira/browse/HDFS-10302
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.1
Reporter: Lin Yiqun
Assignee: Lin Yiqun
Priority: Trivial


Now in the method {{BlockPlacementPolicyDefault#initialize}}, the literal value 
{{true}} is used as the replication considerload default rather than the existing 
constant {{DFS_NAMENODE_REPLICATION_CONSIDERLOAD_DEFAULT}}.
{code}
  @Override
  public void initialize(Configuration conf,  FSClusterStats stats,
 NetworkTopology clusterMap, 
 Host2NodesMap host2datanodeMap) {
this.considerLoad = conf.getBoolean(
DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_KEY, true);
this.considerLoadFactor = conf.getDouble(
DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_FACTOR,
DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_FACTOR_DEFAULT);
this.stats = stats;
this.clusterMap = clusterMap;
this.host2datanodeMap = host2datanodeMap;
this.heartbeatInterval = conf.getLong(
DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY,
DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_DEFAULT) * 1000;
this.tolerateHeartbeatMultiplier = conf.getInt(
DFSConfigKeys.DFS_NAMENODE_TOLERATE_HEARTBEAT_MULTIPLIER_KEY,
DFSConfigKeys.DFS_NAMENODE_TOLERATE_HEARTBEAT_MULTIPLIER_DEFAULT);
this.staleInterval = conf.getLong(
DFSConfigKeys.DFS_NAMENODE_STALE_DATANODE_INTERVAL_KEY, 
DFSConfigKeys.DFS_NAMENODE_STALE_DATANODE_INTERVAL_DEFAULT);
this.preferLocalNode = conf.getBoolean(
DFSConfigKeys.
DFS_NAMENODE_BLOCKPLACEMENTPOLICY_DEFAULT_PREFER_LOCAL_NODE_KEY,
DFSConfigKeys.

DFS_NAMENODE_BLOCKPLACEMENTPOLICY_DEFAULT_PREFER_LOCAL_NODE_DEFAULT);
  }
{code}

And currently the value {{DFS_NAMENODE_REPLICATION_CONSIDERLOAD_DEFAULT}} is not 
used anywhere.
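
The change is essentially one line; a sketch of the intended form (the constant already exists in {{DFSConfigKeys}} as noted above):
{code}
// Use the existing default constant instead of the literal true.
this.considerLoad = conf.getBoolean(
    DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_KEY,
    DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_DEFAULT);
{code}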



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9772) TestBlockReplacement#testThrottler doesn't work as expected

2016-04-13 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15240420#comment-15240420
 ] 

Lin Yiqun commented on HDFS-9772:
-

Thanks [~walter.k.su] for commit!

> TestBlockReplacement#testThrottler doesn't work as expected
> ---
>
> Key: HDFS-9772
> URL: https://issues.apache.org/jira/browse/HDFS-9772
> Project: Hadoop HDFS
>  Issue Type: Test
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
>Priority: Minor
>  Labels: test
> Fix For: 2.7.3
>
> Attachments: HDFS.001.patch
>
>
> In {{TestBlockReplacement#testThrottler}}, a wrong variable is used to 
> calculate the resulting bandwidth: it uses the variable {{totalBytes}} rather 
> than the final variable {{TOTAL_BYTES}}, whose value is assigned to 
> {{bytesToSend}}. The {{totalBytes}} variable has no meaning here and makes 
> {{totalBytes*1000/(end-start)}} always 0, so the comparison is always true. 
> The method code is below:
> {code}
> @Test
>   public void testThrottler() throws IOException {
> Configuration conf = new HdfsConfiguration();
> FileSystem.setDefaultUri(conf, "hdfs://localhost:0");
> long bandwidthPerSec = 1024*1024L;
> final long TOTAL_BYTES =6*bandwidthPerSec; 
> long bytesToSend = TOTAL_BYTES; 
> long start = Time.monotonicNow();
> DataTransferThrottler throttler = new 
> DataTransferThrottler(bandwidthPerSec);
> long totalBytes = 0L;
> long bytesSent = 1024*512L; // 0.5MB
> throttler.throttle(bytesSent);
> bytesToSend -= bytesSent;
> bytesSent = 1024*768L; // 0.75MB
> throttler.throttle(bytesSent);
> bytesToSend -= bytesSent;
> try {
>   Thread.sleep(1000);
> } catch (InterruptedException ignored) {}
> throttler.throttle(bytesToSend);
> long end = Time.monotonicNow();
> assertTrue(totalBytes*1000/(end-start)<=bandwidthPerSec);
>   }
> {code}
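
A minimal sketch of the corrected assertion in the direction described above (the committed patch may differ in detail):
{code}
// Assert against the bytes actually sent through the throttler, not the unused
// local totalBytes (which stays 0 and makes the check vacuously true).
long elapsed = end - start;
assertTrue(TOTAL_BYTES * 1000 / elapsed <= bandwidthPerSec);
{code}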



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10279) Improve validation of the configured number of tolerated failed volumes

2016-04-13 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15240418#comment-15240418
 ] 

Lin Yiqun commented on HDFS-10279:
--

Thanks [~andrew.wang] for commit!

> Improve validation of the configured number of tolerated failed volumes
> ---
>
> Key: HDFS-10279
> URL: https://issues.apache.org/jira/browse/HDFS-10279
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Fix For: 2.8.0
>
> Attachments: HDFS-10279.001.patch, HDFS-10279.002.patch
>
>
> Now a misconfiguration of dfs.datanode.failed.volumes.tolerated is detected 
> too late and is not easily found. We can move the validation logic for 
> tolerated volumes to an earlier point, before the datanode registers with the 
> namenode. This will let us detect the misconfiguration sooner and more easily.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10279) Improve the validation for tolerated volumes

2016-04-13 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238706#comment-15238706
 ] 

Lin Yiqun commented on HDFS-10279:
--

The failed unit test {{TestFsDatasetImpl}} is caused by 
{{TestFsDatasetImpl.testCleanShutdownOfVolume}}, which is tracked by HDFS-10260; 
the other failed tests seem unrelated.

> Improve the validation for tolerated volumes
> 
>
> Key: HDFS-10279
> URL: https://issues.apache.org/jira/browse/HDFS-10279
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-10279.001.patch, HDFS-10279.002.patch
>
>
> Now a misconfiguration of dfs.datanode.failed.volumes.tolerated is detected 
> too late and is not easily found. We can move the validation logic for 
> tolerated volumes to an earlier point, before the datanode registers with the 
> namenode. This will let us detect the misconfiguration sooner and more easily.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10279) Improve the validation for tolerated volumes

2016-04-12 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10279:
-
Target Version/s: 2.8.0

> Improve the validation for tolerated volumes
> 
>
> Key: HDFS-10279
> URL: https://issues.apache.org/jira/browse/HDFS-10279
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-10279.001.patch, HDFS-10279.002.patch
>
>
> Now a misconfiguration of dfs.datanode.failed.volumes.tolerated is detected 
> too late and is not easily found. We can move the validation logic for 
> tolerated volumes to an earlier point, before the datanode registers with the 
> namenode. This will let us detect the misconfiguration sooner and more easily.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10279) Improve the validation for tolerated volumes

2016-04-12 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10279:
-
Affects Version/s: 2.7.1

> Improve the validation for tolerated volumes
> 
>
> Key: HDFS-10279
> URL: https://issues.apache.org/jira/browse/HDFS-10279
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-10279.001.patch, HDFS-10279.002.patch
>
>
> Now a misconfiguration of dfs.datanode.failed.volumes.tolerated is detected 
> too late and is not easily found. We can move the validation logic for 
> tolerated volumes to an earlier point, before the datanode registers with the 
> namenode. This will let us detect the misconfiguration sooner and more easily.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10279) Improve the validation for tolerated volumes

2016-04-12 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10279:
-
Attachment: HDFS-10279.002.patch

Thanks [~andrew.wang] for the review. Updated the latest patch to address the 
comments, pending jenkins.

> Improve the validation for tolerated volumes
> 
>
> Key: HDFS-10279
> URL: https://issues.apache.org/jira/browse/HDFS-10279
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-10279.001.patch, HDFS-10279.002.patch
>
>
> Now a misconfiguration of dfs.datanode.failed.volumes.tolerated is detected 
> too late and is not easily found. We can move the validation logic for 
> tolerated volumes to an earlier point, before the datanode registers with the 
> namenode. This will let us detect the misconfiguration sooner and more easily.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10279) Improve the validation for tolerated volumes

2016-04-11 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15236630#comment-15236630
 ] 

Lin Yiqun commented on HDFS-10279:
--

Attached an initial patch. Thanks [~brahmareddy] for the great idea. 
[~andrew.wang], could you take a look at this JIRA and review my patch?

> Improve the validation for tolerated volumes
> 
>
> Key: HDFS-10279
> URL: https://issues.apache.org/jira/browse/HDFS-10279
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-10279.001.patch
>
>
> Now a misconfiguration of dfs.datanode.failed.volumes.tolerated is detected 
> too late and is not easily found. We can move the validation logic for 
> tolerated volumes to an earlier point, before the datanode registers with the 
> namenode. This will let us detect the misconfiguration sooner and more easily.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10279) Improve the validation for tolerated volumes

2016-04-11 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10279:
-
Attachment: HDFS-10279.001.patch

> Improve the validation for tolerated volumes
> 
>
> Key: HDFS-10279
> URL: https://issues.apache.org/jira/browse/HDFS-10279
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-10279.001.patch
>
>
> Now a misconfiguration of dfs.datanode.failed.volumes.tolerated is detected 
> too late and is not easily found. We can move the validation logic for 
> tolerated volumes to an earlier point, before the datanode registers with the 
> namenode. This will let us detect the misconfiguration sooner and more easily.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10279) Improve the validation for tolerated volumes

2016-04-11 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10279:
-
Status: Patch Available  (was: Open)

> Improve the validation for tolerated volumes
> 
>
> Key: HDFS-10279
> URL: https://issues.apache.org/jira/browse/HDFS-10279
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-10279.001.patch
>
>
> Now a misconfiguration of dfs.datanode.failed.volumes.tolerated is detected 
> too late and is not easily found. We can move the validation logic for 
> tolerated volumes to an earlier point, before the datanode registers with the 
> namenode. This will let us detect the misconfiguration sooner and more easily.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-10279) Improve the validation for tolerated volumes

2016-04-11 Thread Lin Yiqun (JIRA)
Lin Yiqun created HDFS-10279:


 Summary: Improve the validation for tolerated volumes
 Key: HDFS-10279
 URL: https://issues.apache.org/jira/browse/HDFS-10279
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Lin Yiqun
Assignee: Lin Yiqun


Now a misconfiguration of dfs.datanode.failed.volumes.tolerated is detected too 
late and is not easily found. We can move the validation logic for tolerated 
volumes to an earlier point, before the datanode registers with the namenode. 
This will let us detect the misconfiguration sooner and more easily.
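
A rough sketch of the earlier, fail-fast validation being proposed (placement and variable names are assumptions, not the committed patch):
{code}
// Validate dfs.datanode.failed.volumes.tolerated before the datanode registers.
int volFailuresTolerated = conf.getInt(
    DFSConfigKeys.DFS_DATANODE_FAILED_VOLUMES_TOLERATED_KEY,
    DFSConfigKeys.DFS_DATANODE_FAILED_VOLUMES_TOLERATED_DEFAULT);
int volsConfigured = dataDirs.size();  // assumed: the configured data directories
if (volFailuresTolerated < 0 || volFailuresTolerated >= volsConfigured) {
  throw new DiskErrorException("Invalid value configured for "
      + "dfs.datanode.failed.volumes.tolerated - " + volFailuresTolerated
      + ": it must be >= 0 and less than the number of configured volumes ("
      + volsConfigured + ")");
}
{code}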



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9847) HDFS configuration without time unit name should accept friendly time units

2016-04-11 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15236421#comment-15236421
 ] 

Lin Yiqun commented on HDFS-9847:
-

Hi everyone, can we move ahead with this JIRA and do a final review and commit? 
It seems there is no major disagreement in the latest comments, thanks.

> HDFS configuration without time unit name should accept friendly time units
> ---
>
> Key: HDFS-9847
> URL: https://issues.apache.org/jira/browse/HDFS-9847
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-9847-branch-2.001.patch, 
> HDFS-9847-branch-2.002.patch, HDFS-9847-nothrow.001.patch, 
> HDFS-9847-nothrow.002.patch, HDFS-9847-nothrow.003.patch, 
> HDFS-9847-nothrow.004.patch, HDFS-9847.001.patch, HDFS-9847.002.patch, 
> HDFS-9847.003.patch, HDFS-9847.004.patch, HDFS-9847.005.patch, 
> HDFS-9847.006.patch, branch-2-delta.002.txt, timeduration-w-y.patch
>
>
> HDFS-9821 talks about the issue of letting existing keys use friendly units, 
> e.g. 60s, 5m, 1d, 6w, etc. But some configuration key names contain a time 
> unit name, like {{dfs.blockreport.intervalMsec}}, so we can make the other 
> configurations whose names do not contain a time unit accept friendly time 
> units. The time unit {{seconds}} is frequently used in HDFS; we can update 
> these configurations first.
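
For reference, {{Configuration#getTimeDuration}} already parses such friendly suffixes; a small usage sketch (the key name here is hypothetical):
{code}
// Reads a duration that accepts friendly units such as "30s", "5m", "1d".
Configuration conf = new Configuration();
conf.set("dfs.example.interval", "5m");  // hypothetical key, for illustration only
long intervalMs = conf.getTimeDuration(
    "dfs.example.interval", 30000L, TimeUnit.MILLISECONDS);
System.out.println(intervalMs);  // 300000
{code}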



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10269) Invalid value configured for dfs.datanode.failed.volumes.tolerated cause the datanode exit

2016-04-11 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15236405#comment-15236405
 ] 

Lin Yiqun commented on HDFS-10269:
--

Brahma's idea also looks good to me. I am glad to do further optimization on 
this; could you assign this work to me?

> Invalid value configured for dfs.datanode.failed.volumes.tolerated cause the 
> datanode exit
> --
>
> Key: HDFS-10269
> URL: https://issues.apache.org/jira/browse/HDFS-10269
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-10269.001.patch
>
>
> The datanode failed to start and exited when I reused the value 5 configured 
> for dfs.datanode.failed.volumes.tolerated from another cluster, but the new 
> cluster actually has only one data dir path. This led to an invalid volume 
> failure config value and threw {{DiskErrorException}}, so the datanode shut 
> down. The info is below:
> {code}
> 2016-04-07 09:34:45,358 WARN org.apache.hadoop.hdfs.server.common.Storage: 
> Failed to add storage for block pool: BP-1239160341-xx.xx.xx.xx-1459929303126 
> : BlockPoolSliceStorage.recoverTransitionRead: attempt to load an used block 
> storage: /home/data/hdfs/data/current/BP-1239160341-xx.xx.xx.xx-1459929303126
> 2016-04-07 09:34:45,358 FATAL 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for 
> Block pool  (Datanode Uuid unassigned) service to 
> /xx.xx.xx.xx:9000. Exiting.
> java.io.IOException: All specified directories are failed to load.
> at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:477)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1361)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1326)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:316)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:223)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:801)
> at java.lang.Thread.run(Thread.java:745)
> 2016-04-07 09:34:45,358 FATAL 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for 
> Block pool  (Datanode Uuid unassigned) service to 
> /xx.xx.xx.xx:9000. Exiting.
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Invalid volume failure 
>  config value: 5
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.(FsDatasetImpl.java:281)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:34)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:30)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1374)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1326)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:316)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:223)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:801)
> at java.lang.Thread.run(Thread.java:745)
> 2016-04-07 09:34:45,358 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Ending block pool service for: Block pool  (Datanode Uuid 
> unassigned) service to /xx.xx.xx.xx:9000
> 2016-04-07 09:34:45,359 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Ending block pool service for: Block pool  (Datanode Uuid 
> unassigned) service to /xx.xx.xx.xx:9000
> 2016-04-07 09:34:45,460 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Removed Block pool  (Datanode Uuid unassigned)
> 2016-04-07 09:34:47,460 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Exiting Datanode
> 2016-04-07 09:34:47,462 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
> status 0
> 2016-04-07 09:34:47,463 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> SHUTDOWN_MSG:
> {code}
> IMO, this gives users a bad experience just because one value was configured 
> incorrectly. Instead, we could log a warning for this and reset the value to 
> the default. That would be a better way to handle this case.
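
A hedged sketch of the warn-and-fall-back behavior suggested here, as a contrast to the fail-fast check discussed in HDFS-10279 (names and placement are assumptions):
{code}
// Sketch: warn about the invalid setting and fall back to the default
// instead of shutting the datanode down.
int volFailuresTolerated = conf.getInt(
    DFSConfigKeys.DFS_DATANODE_FAILED_VOLUMES_TOLERATED_KEY,
    DFSConfigKeys.DFS_DATANODE_FAILED_VOLUMES_TOLERATED_DEFAULT);
if (volFailuresTolerated < 0 || volFailuresTolerated >= volsConfigured) {
  LOG.warn("Invalid value " + volFailuresTolerated + " configured for "
      + DFSConfigKeys.DFS_DATANODE_FAILED_VOLUMES_TOLERATED_KEY
      + "; falling back to the default value "
      + DFSConfigKeys.DFS_DATANODE_FAILED_VOLUMES_TOLERATED_DEFAULT);
  volFailuresTolerated = DFSConfigKeys.DFS_DATANODE_FAILED_VOLUMES_TOLERATED_DEFAULT;
}
{code}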



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10275) TestDataNodeMetrics failing intermittently due to TotalWriteTime counted incorrectly

2016-04-10 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10275:
-
Attachment: HDFS-10275.001.patch

> TestDataNodeMetrics failing intermittently due to TotalWriteTime counted 
> incorrectly
> 
>
> Key: HDFS-10275
> URL: https://issues.apache.org/jira/browse/HDFS-10275
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-10275.001.patch
>
>
> The unit test {{TestDataNodeMetrics}} fails intermittently. The failure info 
> shows the following:
> {code}
> Results :
> Failed tests: 
>   
> TestDataNodeVolumeFailureToleration.testVolumeAndTolerableConfiguration:195->testVolumeConfig:232
>  expected: but was:
> Tests in error: 
>   TestOpenFilesWithSnapshot.testWithCheckpoint:94 ? IO Timed out waiting for 
> Min...
>   TestDataNodeMetrics.testDataNodeTimeSpend:279 ? Timeout Timed out waiting 
> for ...
>   TestHFlush.testHFlushInterrupted ? IO The stream is closed
> {code}
> In line 279 of {{TestDataNodeMetrics}}, a timeout occurs. I looked into the 
> code and found that the real reason is that the {{TotalWriteTime}} metric 
> frequently counts 0 in each iteration of creating a file, and this leads to 
> retry operations until the timeout.
> I debugged the test locally. The most likely reason that the 
> {{TotalWriteTime}} metric always counts 0 is that we use 
> {{SimulatedFSDataset}} for the time-spent test. In {{SimulatedFSDataset}}, 
> the inner class's method {{SimulatedOutputStream#write}} is used to count the 
> write time, and that method just updates the {{length}} and throws its data 
> away.
> {code}
> @Override
> public void write(byte[] b,
>   int off,
>   int len) throws IOException  {
>   length += len;
> }
> {code} 
> So the write operation costs hardly any time. We should therefore create the 
> file in a real way instead of the simulated way. In my local testing, the 
> test passed on the first try once I removed the simulated way, while with the 
> old way the test retries many times to accumulate write time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10275) TestDataNodeMetrics failing intermittently due to TotalWriteTime counted incorrectly

2016-04-10 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10275:
-
Status: Patch Available  (was: Open)

Attached a simple patch. I also bumped the timeout in the patch so the test does 
not fail when it runs on a busy Jenkins slave. Kindly review.
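To make the intent concrete, here is a minimal sketch of the idea (illustrative 
only, not the attached patch; it assumes JUnit 4 and the usual 
MiniDFSCluster/DFSTestUtil/Path test imports, and the file path, size and 
timeout value are placeholders):
{code}
// Sketch only, not HDFS-10275.001.patch.
@Test(timeout = 120000)  // generous timeout for a busy Jenkins slave
public void testDataNodeTimeSpendSketch() throws Exception {
  Configuration conf = new HdfsConfiguration();
  // Deliberately do NOT call SimulatedFSDataset.setFactory(conf), so writes
  // go through the real dataset and TotalWriteTime can actually accrue.
  MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).build();
  try {
    DistributedFileSystem fs = cluster.getFileSystem();
    DFSTestUtil.createFile(fs, new Path("/test-write-time"), 64 * 1024,
        (short) 1, 0L);
    // ...then assert on the datanode's TotalWriteTime metric, as the
    // existing test already does...
  } finally {
    cluster.shutdown();
  }
}
{code}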

> TestDataNodeMetrics failing intermittently due to TotalWriteTime counted 
> incorrectly
> 
>
> Key: HDFS-10275
> URL: https://issues.apache.org/jira/browse/HDFS-10275
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
>
> The unit test {{TestDataNodeMetrics}} fails intermittently. The failure info 
> shows the following:
> {code}
> Results :
> Failed tests: 
>   
> TestDataNodeVolumeFailureToleration.testVolumeAndTolerableConfiguration:195->testVolumeConfig:232
>  expected: but was:
> Tests in error: 
>   TestOpenFilesWithSnapshot.testWithCheckpoint:94 ? IO Timed out waiting for 
> Min...
>   TestDataNodeMetrics.testDataNodeTimeSpend:279 ? Timeout Timed out waiting 
> for ...
>   TestHFlush.testHFlushInterrupted ? IO The stream is closed
> {code}
> The test times out at line 279 of {{TestDataNodeMetrics}}. Looking into the 
> code, the real reason is that the {{TotalWriteTime}} metric frequently counts 
> 0 in each iteration of creating a file, and this leads to retry operations 
> until the timeout is reached.
> I debugged the test locally. The most likely reason that the 
> {{TotalWriteTime}} metric always counts 0 is that the test uses 
> {{SimulatedFSDataset}} for the time-spent test. In {{SimulatedFSDataset}}, the 
> write time is counted around the inner class's method 
> {{SimulatedOutputStream#write}}, and that method just updates the {{length}} 
> and throws its data away.
> {code}
> @Override
> public void write(byte[] b,
>   int off,
>   int len) throws IOException  {
>   length += len;
> }
> {code} 
> So the write operation costs hardly any time. We should therefore create the 
> file in a real way instead of the simulated way. In my local testing, the 
> test passed on the first try once I removed the simulated way, while with the 
> old way the test retries many times to accumulate write time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10275) TestDataNodeMetrics failing intermittently due to TotalWriteTime counted incorrectly

2016-04-10 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10275:
-
Description: 
The unit test {{TestDataNodeMetrics}} fails intermittently. The failure info 
shows the following:
{code}
Results :

Failed tests: 
  
TestDataNodeVolumeFailureToleration.testVolumeAndTolerableConfiguration:195->testVolumeConfig:232
 expected: but was:

Tests in error: 
  TestOpenFilesWithSnapshot.testWithCheckpoint:94 ? IO Timed out waiting for 
Min...
  TestDataNodeMetrics.testDataNodeTimeSpend:279 ? Timeout Timed out waiting for 
...
  TestHFlush.testHFlushInterrupted ? IO The stream is closed
{code}
The test times out at line 279 of {{TestDataNodeMetrics}}. Looking into the 
code, the real reason is that the {{TotalWriteTime}} metric frequently counts 0 
in each iteration of creating a file, and this leads to retry operations until 
the timeout is reached.
I debugged the test locally. The most likely reason that the {{TotalWriteTime}} 
metric always counts 0 is that the test uses {{SimulatedFSDataset}} for the 
time-spent test. In {{SimulatedFSDataset}}, the write time is counted around 
the inner class's method {{SimulatedOutputStream#write}}, and that method just 
updates the {{length}} and throws its data away.
{code}
@Override
public void write(byte[] b,
  int off,
  int len) throws IOException  {
  length += len;
}
{code} 
So the write operation costs hardly any time. We should therefore create the 
file in a real way instead of the simulated way. In my local testing, the test 
passed on the first try once I removed the simulated way, while with the old 
way the test retries many times to accumulate write time.

  was:
The unit test {{TestDataNodeMetrics}} fails intermittently. The failure info 
shows the following:
{code}
Results :

Failed tests: 
  
TestDataNodeVolumeFailureToleration.testVolumeAndTolerableConfiguration:195->testVolumeConfig:232
 expected: but was:

Tests in error: 
  TestOpenFilesWithSnapshot.testWithCheckpoint:94 ? IO Timed out waiting for 
Min...
  TestDataNodeMetrics.testDataNodeTimeSpend:279 ? Timeout Timed out waiting for 
...
  TestHFlush.testHFlushInterrupted ? IO The stream is closed
{code}
The test times out at line 279 of {{TestDataNodeMetrics}}. Looking into the 
code, the real reason is that the {{TotalWriteTime}} metric frequently counts 0 
in each iteration of creating a file, and this leads to retry operations until 
the timeout is reached.
I debugged the test locally. The most likely reason that the {{TotalWriteTime}} 
metric always counts 0 is that the test uses {{SimulatedFSDataset}} for the 
time-spent test. In {{SimulatedFSDataset}}, the write time is counted around 
the inner class's method {{SimulatedOutputStream#write}}, and that method just 
updates the {{length}} and throws its data away.
{code}
@Override
public void write(byte[] b,
  int off,
  int len) throws IOException  {
  length += len;
}
{code} 
So the write operation costs hardly any time. We should therefore create the 
file in a real way instead of the simulated way. In my local testing, the test 
passed on the first try once I removed the simulated way, while with the old 
way the test retries many times to accumulate write time.


> TestDataNodeMetrics failing intermittently due to TotalWriteTime counted 
> incorrectly
> 
>
> Key: HDFS-10275
> URL: https://issues.apache.org/jira/browse/HDFS-10275
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
>
> The unit test {{TestDataNodeMetrics}} fails intermittently. The failure info 
> shows the following:
> {code}
> Results :
> Failed tests: 
>   
> TestDataNodeVolumeFailureToleration.testVolumeAndTolerableConfiguration:195->testVolumeConfig:232
>  expected: but was:
> Tests in error: 
>   TestOpenFilesWithSnapshot.testWithCheckpoint:94 ? IO Timed out waiting for 
> Min...
>   TestDataNodeMetrics.testDataNodeTimeSpend:279 ? Timeout Timed out waiting 
> for ...
>   TestHFlush.testHFlushInterrupted ? IO The stream is closed
> {code}
> The test times out at line 279 of {{TestDataNodeMetrics}}. Looking into the 
> code, the real reason is that the {{TotalWriteTime}} metric frequently counts 
> 0 in each iteration of creating a file, and this leads to retry operations 
> until the timeout is reached.
> I debugged the test locally. The most likely reason that the 
> {{TotalWriteTime}} metric always counts 0 is that the test uses 
> {{SimulatedFSDataset}} for the time-spent test. In {{SimulatedFSDataset}}, it 
> uses the inner class's method {{SimulatedOutputStream#write}} to count 
> the write 

[jira] [Created] (HDFS-10275) TestDataNodeMetrics failing intermittently due to TotalWriteTime counted incorrectly

2016-04-10 Thread Lin Yiqun (JIRA)
Lin Yiqun created HDFS-10275:


 Summary: TestDataNodeMetrics failing intermittently due to 
TotalWriteTime counted incorrectly
 Key: HDFS-10275
 URL: https://issues.apache.org/jira/browse/HDFS-10275
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Lin Yiqun
Assignee: Lin Yiqun


The unit test {{TestDataNodeMetrics}} fails intermittently. The failure info 
shows the following:
{code}
Results :

Failed tests: 
  
TestDataNodeVolumeFailureToleration.testVolumeAndTolerableConfiguration:195->testVolumeConfig:232
 expected: but was:

Tests in error: 
  TestOpenFilesWithSnapshot.testWithCheckpoint:94 ? IO Timed out waiting for 
Min...
  TestDataNodeMetrics.testDataNodeTimeSpend:279 ? Timeout Timed out waiting for 
...
  TestHFlush.testHFlushInterrupted ? IO The stream is closed
{code}
The test times out at line 279 of {{TestDataNodeMetrics}}. Looking into the 
code, the real reason is that the {{TotalWriteTime}} metric frequently counts 0 
in each iteration of creating a file, and this leads to retry operations until 
the timeout is reached.
I debugged the test locally. The most likely reason that the {{TotalWriteTime}} 
metric always counts 0 is that the test uses {{SimulatedFSDataset}} for the 
time-spent test. In {{SimulatedFSDataset}}, the write time is counted around 
the inner class's method {{SimulatedOutputStream#write}}, and that method just 
updates the {{length}} and throws its data away.
{code}
@Override
public void write(byte[] b,
  int off,
  int len) throws IOException  {
  length += len;
}
{code} 
So the write operation costs hardly any time. We should therefore create the 
file in a real way instead of the simulated way. In my local testing, the test 
passed on the first try once I removed the simulated way, while with the old 
way the test retries many times to accumulate write time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10269) Invalid value configured for dfs.datanode.failed.volumes.tolerated cause the datanode exit

2016-04-10 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15234395#comment-15234395
 ] 

Lin Yiqun commented on HDFS-10269:
--

Is there a good way to deal with this, or should we just close this JIRA since 
it does not seem to be an actual problem?

> Invalid value configured for dfs.datanode.failed.volumes.tolerated cause the 
> datanode exit
> --
>
> Key: HDFS-10269
> URL: https://issues.apache.org/jira/browse/HDFS-10269
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-10269.001.patch
>
>
> The datanode failed to start and exited when I reused a configuration that 
> set dfs.datanode.failed.volumes.tolerated to 5 from another cluster, while 
> the new cluster actually has only one data directory. This led to an invalid 
> volume failure config value, a {{DiskErrorException}} was thrown, and the 
> datanode shut down. The log is below:
> {code}
> 2016-04-07 09:34:45,358 WARN org.apache.hadoop.hdfs.server.common.Storage: 
> Failed to add storage for block pool: BP-1239160341-xx.xx.xx.xx-1459929303126 
> : BlockPoolSliceStorage.recoverTransitionRead: attempt to load an used block 
> storage: /home/data/hdfs/data/current/BP-1239160341-xx.xx.xx.xx-1459929303126
> 2016-04-07 09:34:45,358 FATAL 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for 
> Block pool  (Datanode Uuid unassigned) service to 
> /xx.xx.xx.xx:9000. Exiting.
> java.io.IOException: All specified directories are failed to load.
> at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:477)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1361)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1326)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:316)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:223)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:801)
> at java.lang.Thread.run(Thread.java:745)
> 2016-04-07 09:34:45,358 FATAL 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for 
> Block pool  (Datanode Uuid unassigned) service to 
> /xx.xx.xx.xx:9000. Exiting.
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Invalid volume failure 
>  config value: 5
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.(FsDatasetImpl.java:281)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:34)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:30)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1374)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1326)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:316)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:223)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:801)
> at java.lang.Thread.run(Thread.java:745)
> 2016-04-07 09:34:45,358 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Ending block pool service for: Block pool  (Datanode Uuid 
> unassigned) service to /xx.xx.xx.xx:9000
> 2016-04-07 09:34:45,359 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Ending block pool service for: Block pool  (Datanode Uuid 
> unassigned) service to /xx.xx.xx.xx:9000
> 2016-04-07 09:34:45,460 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Removed Block pool  (Datanode Uuid unassigned)
> 2016-04-07 09:34:47,460 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Exiting Datanode
> 2016-04-07 09:34:47,462 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
> status 0
> 2016-04-07 09:34:47,463 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> SHUTDOWN_MSG:
> {code}
> IMO, this gives users a bad experience when only a single value was 
> configured incorrectly. Instead, we could log a warning and reset the value 
> to the default; that would be a better way to handle this case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10269) Invalid value configured for dfs.datanode.failed.volumes.tolerated cause the datanode exit

2016-04-08 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231760#comment-15231760
 ] 

Lin Yiqun commented on HDFS-10269:
--

Hi, [~cnauroth], I don't think a misconfigured 
dfs.datanode.failed.volumes.tolerated always means the admin forgot to include 
a few volumes. Sometimes users don't know that configuring a failed-volumes 
value larger than the number of volumes, or smaller than 0, will shut the 
datanode down, and the property description of 
dfs.datanode.failed.volumes.tolerated does not mention this either. It leaves 
users confused: they have to dig through the datanode log for the reason and 
then restart the datanode. It seems we should improve this. Falling back to a 
suitable value when an invalid configuration happens would be better than 
simply shutting the node down.
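To make the suggestion concrete, here is a minimal sketch of the fallback 
behaviour (illustrative only, not the attached patch; {{volsConfigured}} stands 
for the number of configured data directories at that point in 
{{FsDatasetImpl}}):
{code}
// Sketch only, not HDFS-10269.001.patch: clamp an out-of-range
// dfs.datanode.failed.volumes.tolerated to the default and warn,
// instead of throwing DiskErrorException.
int volFailuresTolerated = conf.getInt(
    DFSConfigKeys.DFS_DATANODE_FAILED_VOLUMES_TOLERATED_KEY,
    DFSConfigKeys.DFS_DATANODE_FAILED_VOLUMES_TOLERATED_DEFAULT);
if (volFailuresTolerated < 0 || volFailuresTolerated >= volsConfigured) {
  LOG.warn("Invalid value " + volFailuresTolerated + " configured for "
      + DFSConfigKeys.DFS_DATANODE_FAILED_VOLUMES_TOLERATED_KEY
      + ", only " + volsConfigured + " volumes configured; falling back to "
      + DFSConfigKeys.DFS_DATANODE_FAILED_VOLUMES_TOLERATED_DEFAULT);
  volFailuresTolerated =
      DFSConfigKeys.DFS_DATANODE_FAILED_VOLUMES_TOLERATED_DEFAULT;
}
{code}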

> Invalid value configured for dfs.datanode.failed.volumes.tolerated cause the 
> datanode exit
> --
>
> Key: HDFS-10269
> URL: https://issues.apache.org/jira/browse/HDFS-10269
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-10269.001.patch
>
>
> The datanode failed to start and exited when I reused a configuration that 
> set dfs.datanode.failed.volumes.tolerated to 5 from another cluster, while 
> the new cluster actually has only one data directory. This led to an invalid 
> volume failure config value, a {{DiskErrorException}} was thrown, and the 
> datanode shut down. The log is below:
> {code}
> 2016-04-07 09:34:45,358 WARN org.apache.hadoop.hdfs.server.common.Storage: 
> Failed to add storage for block pool: BP-1239160341-xx.xx.xx.xx-1459929303126 
> : BlockPoolSliceStorage.recoverTransitionRead: attempt to load an used block 
> storage: /home/data/hdfs/data/current/BP-1239160341-xx.xx.xx.xx-1459929303126
> 2016-04-07 09:34:45,358 FATAL 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for 
> Block pool  (Datanode Uuid unassigned) service to 
> /xx.xx.xx.xx:9000. Exiting.
> java.io.IOException: All specified directories are failed to load.
> at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:477)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1361)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1326)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:316)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:223)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:801)
> at java.lang.Thread.run(Thread.java:745)
> 2016-04-07 09:34:45,358 FATAL 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for 
> Block pool  (Datanode Uuid unassigned) service to 
> /xx.xx.xx.xx:9000. Exiting.
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Invalid volume failure 
>  config value: 5
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.(FsDatasetImpl.java:281)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:34)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:30)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1374)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1326)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:316)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:223)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:801)
> at java.lang.Thread.run(Thread.java:745)
> 2016-04-07 09:34:45,358 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Ending block pool service for: Block pool  (Datanode Uuid 
> unassigned) service to /xx.xx.xx.xx:9000
> 2016-04-07 09:34:45,359 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Ending block pool service for: Block pool  (Datanode Uuid 
> unassigned) service to /xx.xx.xx.xx:9000
> 2016-04-07 09:34:45,460 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Removed Block pool  (Datanode Uuid unassigned)
> 2016-04-07 09:34:47,460 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Exiting Datanode
> 2016-04-07 09:34:47,462 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
> status 0
> 2016-04-07 09:34:47,463 INFO 

[jira] [Commented] (HDFS-10269) Invalid value configured for dfs.datanode.failed.volumes.tolerated cause the datanode exit

2016-04-06 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229631#comment-15229631
 ] 

Lin Yiqun commented on HDFS-10269:
--

I think we can add a case to address your comments: if the configured value is 
invalid and there are more than 0 failed volumes, a DiskErrorException should 
still be thrown.

> Invalid value configured for dfs.datanode.failed.volumes.tolerated cause the 
> datanode exit
> --
>
> Key: HDFS-10269
> URL: https://issues.apache.org/jira/browse/HDFS-10269
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-10269.001.patch
>
>
> The datanode failed to start and exited when I reused a configuration that 
> set dfs.datanode.failed.volumes.tolerated to 5 from another cluster, while 
> the new cluster actually has only one data directory. This led to an invalid 
> volume failure config value, a {{DiskErrorException}} was thrown, and the 
> datanode shut down. The log is below:
> {code}
> 2016-04-07 09:34:45,358 WARN org.apache.hadoop.hdfs.server.common.Storage: 
> Failed to add storage for block pool: BP-1239160341-xx.xx.xx.xx-1459929303126 
> : BlockPoolSliceStorage.recoverTransitionRead: attempt to load an used block 
> storage: /home/data/hdfs/data/current/BP-1239160341-xx.xx.xx.xx-1459929303126
> 2016-04-07 09:34:45,358 FATAL 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for 
> Block pool  (Datanode Uuid unassigned) service to 
> /xx.xx.xx.xx:9000. Exiting.
> java.io.IOException: All specified directories are failed to load.
> at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:477)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1361)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1326)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:316)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:223)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:801)
> at java.lang.Thread.run(Thread.java:745)
> 2016-04-07 09:34:45,358 FATAL 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for 
> Block pool  (Datanode Uuid unassigned) service to 
> /xx.xx.xx.xx:9000. Exiting.
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Invalid volume failure 
>  config value: 5
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.(FsDatasetImpl.java:281)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:34)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:30)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1374)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1326)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:316)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:223)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:801)
> at java.lang.Thread.run(Thread.java:745)
> 2016-04-07 09:34:45,358 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Ending block pool service for: Block pool  (Datanode Uuid 
> unassigned) service to /xx.xx.xx.xx:9000
> 2016-04-07 09:34:45,359 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Ending block pool service for: Block pool  (Datanode Uuid 
> unassigned) service to /xx.xx.xx.xx:9000
> 2016-04-07 09:34:45,460 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Removed Block pool  (Datanode Uuid unassigned)
> 2016-04-07 09:34:47,460 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Exiting Datanode
> 2016-04-07 09:34:47,462 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
> status 0
> 2016-04-07 09:34:47,463 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> SHUTDOWN_MSG:
> {code}
> IMO, this gives users a bad experience when only a single value was 
> configured incorrectly. Instead, we could log a warning and reset the value 
> to the default; that would be a better way to handle this case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10269) Invalid value configured for dfs.datanode.failed.volumes.tolerated cause the datanode exit

2016-04-06 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10269:
-
Attachment: HDFS-10269.001.patch

> Invalid value configured for dfs.datanode.failed.volumes.tolerated cause the 
> datanode exit
> --
>
> Key: HDFS-10269
> URL: https://issues.apache.org/jira/browse/HDFS-10269
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-10269.001.patch
>
>
> The datanode failed to start and exited when I reused a configuration that 
> set dfs.datanode.failed.volumes.tolerated to 5 from another cluster, while 
> the new cluster actually has only one data directory. This led to an invalid 
> volume failure config value, a {{DiskErrorException}} was thrown, and the 
> datanode shut down. The log is below:
> {code}
> 2016-04-07 09:34:45,358 WARN org.apache.hadoop.hdfs.server.common.Storage: 
> Failed to add storage for block pool: BP-1239160341-xx.xx.xx.xx-1459929303126 
> : BlockPoolSliceStorage.recoverTransitionRead: attempt to load an used block 
> storage: /home/data/hdfs/data/current/BP-1239160341-xx.xx.xx.xx-1459929303126
> 2016-04-07 09:34:45,358 FATAL 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for 
> Block pool  (Datanode Uuid unassigned) service to 
> /xx.xx.xx.xx:9000. Exiting.
> java.io.IOException: All specified directories are failed to load.
> at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:477)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1361)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1326)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:316)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:223)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:801)
> at java.lang.Thread.run(Thread.java:745)
> 2016-04-07 09:34:45,358 FATAL 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for 
> Block pool  (Datanode Uuid unassigned) service to 
> /xx.xx.xx.xx:9000. Exiting.
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Invalid volume failure 
>  config value: 5
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.(FsDatasetImpl.java:281)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:34)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:30)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1374)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1326)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:316)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:223)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:801)
> at java.lang.Thread.run(Thread.java:745)
> 2016-04-07 09:34:45,358 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Ending block pool service for: Block pool  (Datanode Uuid 
> unassigned) service to /xx.xx.xx.xx:9000
> 2016-04-07 09:34:45,359 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Ending block pool service for: Block pool  (Datanode Uuid 
> unassigned) service to /xx.xx.xx.xx:9000
> 2016-04-07 09:34:45,460 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Removed Block pool  (Datanode Uuid unassigned)
> 2016-04-07 09:34:47,460 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Exiting Datanode
> 2016-04-07 09:34:47,462 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
> status 0
> 2016-04-07 09:34:47,463 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> SHUTDOWN_MSG:
> {code}
> IMO, this gives users a bad experience when only a single value was 
> configured incorrectly. Instead, we could log a warning and reset the value 
> to the default; that would be a better way to handle this case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10269) Invalid value configured for dfs.datanode.failed.volumes.tolerated cause the datanode exit

2016-04-06 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10269:
-
Status: Patch Available  (was: Open)

Attached a simple patch for this.

> Invalid value configured for dfs.datanode.failed.volumes.tolerated cause the 
> datanode exit
> --
>
> Key: HDFS-10269
> URL: https://issues.apache.org/jira/browse/HDFS-10269
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-10269.001.patch
>
>
> The datanode failed to start and exited when I reused a configuration that 
> set dfs.datanode.failed.volumes.tolerated to 5 from another cluster, while 
> the new cluster actually has only one data directory. This led to an invalid 
> volume failure config value, a {{DiskErrorException}} was thrown, and the 
> datanode shut down. The log is below:
> {code}
> 2016-04-07 09:34:45,358 WARN org.apache.hadoop.hdfs.server.common.Storage: 
> Failed to add storage for block pool: BP-1239160341-xx.xx.xx.xx-1459929303126 
> : BlockPoolSliceStorage.recoverTransitionRead: attempt to load an used block 
> storage: /home/data/hdfs/data/current/BP-1239160341-xx.xx.xx.xx-1459929303126
> 2016-04-07 09:34:45,358 FATAL 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for 
> Block pool  (Datanode Uuid unassigned) service to 
> /xx.xx.xx.xx:9000. Exiting.
> java.io.IOException: All specified directories are failed to load.
> at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:477)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1361)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1326)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:316)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:223)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:801)
> at java.lang.Thread.run(Thread.java:745)
> 2016-04-07 09:34:45,358 FATAL 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for 
> Block pool  (Datanode Uuid unassigned) service to 
> /xx.xx.xx.xx:9000. Exiting.
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Invalid volume failure 
>  config value: 5
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.(FsDatasetImpl.java:281)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:34)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:30)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1374)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1326)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:316)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:223)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:801)
> at java.lang.Thread.run(Thread.java:745)
> 2016-04-07 09:34:45,358 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Ending block pool service for: Block pool  (Datanode Uuid 
> unassigned) service to /xx.xx.xx.xx:9000
> 2016-04-07 09:34:45,359 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Ending block pool service for: Block pool  (Datanode Uuid 
> unassigned) service to /xx.xx.xx.xx:9000
> 2016-04-07 09:34:45,460 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Removed Block pool  (Datanode Uuid unassigned)
> 2016-04-07 09:34:47,460 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Exiting Datanode
> 2016-04-07 09:34:47,462 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
> status 0
> 2016-04-07 09:34:47,463 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> SHUTDOWN_MSG:
> {code}
> IMO, this gives users a bad experience when only a single value was 
> configured incorrectly. Instead, we could log a warning and reset the value 
> to the default; that would be a better way to handle this case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-10269) Invalid value configured for dfs.datanode.failed.volumes.tolerated cause the datanode exit

2016-04-06 Thread Lin Yiqun (JIRA)
Lin Yiqun created HDFS-10269:


 Summary: Invalid value configured for 
dfs.datanode.failed.volumes.tolerated cause the datanode exit
 Key: HDFS-10269
 URL: https://issues.apache.org/jira/browse/HDFS-10269
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.1
Reporter: Lin Yiqun
Assignee: Lin Yiqun


The datanode failed to start and exited when I reused a configuration that set 
dfs.datanode.failed.volumes.tolerated to 5 from another cluster, while the new 
cluster actually has only one data directory. This led to an invalid volume 
failure config value, a {{DiskErrorException}} was thrown, and the datanode 
shut down. The log is below:
{code}
2016-04-07 09:34:45,358 WARN org.apache.hadoop.hdfs.server.common.Storage: 
Failed to add storage for block pool: BP-1239160341-xx.xx.xx.xx-1459929303126 : 
BlockPoolSliceStorage.recoverTransitionRead: attempt to load an used block 
storage: /home/data/hdfs/data/current/BP-1239160341-xx.xx.xx.xx-1459929303126
2016-04-07 09:34:45,358 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: 
Initialization failed for Block pool  (Datanode Uuid unassigned) 
service to /xx.xx.xx.xx:9000. Exiting.
java.io.IOException: All specified directories are failed to load.
at 
org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:477)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1361)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1326)
at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:316)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:223)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:801)
at java.lang.Thread.run(Thread.java:745)
2016-04-07 09:34:45,358 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: 
Initialization failed for Block pool  (Datanode Uuid unassigned) 
service to /xx.xx.xx.xx:9000. Exiting.
org.apache.hadoop.util.DiskChecker$DiskErrorException: Invalid volume failure  
config value: 5
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.(FsDatasetImpl.java:281)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:34)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:30)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1374)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1326)
at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:316)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:223)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:801)
at java.lang.Thread.run(Thread.java:745)
2016-04-07 09:34:45,358 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
Ending block pool service for: Block pool  (Datanode Uuid 
unassigned) service to /xx.xx.xx.xx:9000
2016-04-07 09:34:45,359 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
Ending block pool service for: Block pool  (Datanode Uuid 
unassigned) service to /xx.xx.xx.xx:9000
2016-04-07 09:34:45,460 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Removed Block pool  (Datanode Uuid unassigned)
2016-04-07 09:34:47,460 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
Exiting Datanode
2016-04-07 09:34:47,462 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
status 0
2016-04-07 09:34:47,463 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
SHUTDOWN_MSG:
{code}

IMO, this gives users a bad experience when only a single value was configured 
incorrectly. Instead, we could log a warning and reset the value to the 
default; that would be a better way to handle this case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9583) TestBlockReplacement#testDeletedBlockWhenAddBlockIsInEdit occasionally fails

2016-04-05 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227625#comment-15227625
 ] 

Lin Yiqun commented on HDFS-9583:
-

Hi, [~jojochuang], this issue was fixed in HDFS-9865; can we resolve this one as a duplicate of that?

> TestBlockReplacement#testDeletedBlockWhenAddBlockIsInEdit occasionally fails
> 
>
> Key: HDFS-9583
> URL: https://issues.apache.org/jira/browse/HDFS-9583
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
> Environment: Jenkins
>Reporter: Wei-Chiu Chuang
>
> https://builds.apache.org/job/Hadoop-Hdfs-trunk/2647/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestBlockReplacement/testDeletedBlockWhenAddBlockIsInEdit/
> Looking at the code, the test expects that replacing a block from one 
> datanode to another issues a delete request to 
> FsDatasetAsyncDiskService.deleteAsync(), which should have printed the log 
> "Scheduling ... file ... for deletion", and it waited for 3 seconds. However, 
> the log never appeared.
> I think the test needs a better way to determine whether the delete request 
> has been executed, rather than using a fixed timeout.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9744) TestDirectoryScanner#testThrottling occasionally time out after 300 seconds

2016-04-05 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-9744:

Status: Patch Available  (was: Open)

> TestDirectoryScanner#testThrottling occasionally time out after 300 seconds
> ---
>
> Key: HDFS-9744
> URL: https://issues.apache.org/jira/browse/HDFS-9744
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
> Environment: Jenkins
>Reporter: Wei-Chiu Chuang
>Assignee: Lin Yiqun
>Priority: Minor
>  Labels: test
> Attachments: HDFS-9744.001.patch
>
>
> I have seen quite a few test failures in TestDirectoryScanner#testThrottling.
> https://builds.apache.org/job/Hadoop-Hdfs-trunk/2793/testReport/org.apache.hadoop.hdfs.server.datanode/TestDirectoryScanner/testThrottling/
> Looking at the log, it does not look like the test got stuck. On my local 
> machine, this test took 219 seconds. It is likely that this test takes more 
> than 300 seconds to complete on a busy Jenkins slave. I think it is 
> reasonable to set a longer timeout value, or reduce the number of blocks to 
> reduce the duration of the test.
> Error Message
> {noformat}
> test timed out after 30 milliseconds
> {noformat}
> Stacktrace
> {noformat}
> java.lang.Exception: test timed out after 30 milliseconds
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Object.wait(Object.java:503)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.waitAndQueuePacket(DataStreamer.java:804)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.enqueueCurrentPacket(DFSOutputStream.java:423)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.enqueueCurrentPacketFull(DFSOutputStream.java:432)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.writeChunk(DFSOutputStream.java:418)
>   at 
> org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:217)
>   at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:125)
>   at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:111)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57)
>   at java.io.DataOutputStream.write(DataOutputStream.java:107)
>   at org.apache.hadoop.hdfs.DFSTestUtil.createFile(DFSTestUtil.java:418)
>   at org.apache.hadoop.hdfs.DFSTestUtil.createFile(DFSTestUtil.java:376)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner.createFile(TestDirectoryScanner.java:108)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner.testThrottling(TestDirectoryScanner.java:584)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-9744) TestDirectoryScanner#testThrottling occasionally time out after 300 seconds

2016-04-05 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun reassigned HDFS-9744:
---

Assignee: Lin Yiqun

> TestDirectoryScanner#testThrottling occasionally time out after 300 seconds
> ---
>
> Key: HDFS-9744
> URL: https://issues.apache.org/jira/browse/HDFS-9744
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
> Environment: Jenkins
>Reporter: Wei-Chiu Chuang
>Assignee: Lin Yiqun
>Priority: Minor
>  Labels: test
> Attachments: HDFS-9744.001.patch
>
>
> I have seen quite a few test failures in TestDirectoryScanner#testThrottling.
> https://builds.apache.org/job/Hadoop-Hdfs-trunk/2793/testReport/org.apache.hadoop.hdfs.server.datanode/TestDirectoryScanner/testThrottling/
> Looking at the log, it does not look like the test got stuck. On my local 
> machine, this test took 219 seconds. It is likely that this test takes more 
> than 300 seconds to complete on a busy Jenkins slave. I think it is 
> reasonable to set a longer timeout value, or reduce the number of blocks to 
> reduce the duration of the test.
> Error Message
> {noformat}
> test timed out after 30 milliseconds
> {noformat}
> Stacktrace
> {noformat}
> java.lang.Exception: test timed out after 30 milliseconds
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Object.wait(Object.java:503)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.waitAndQueuePacket(DataStreamer.java:804)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.enqueueCurrentPacket(DFSOutputStream.java:423)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.enqueueCurrentPacketFull(DFSOutputStream.java:432)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.writeChunk(DFSOutputStream.java:418)
>   at 
> org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:217)
>   at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:125)
>   at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:111)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57)
>   at java.io.DataOutputStream.write(DataOutputStream.java:107)
>   at org.apache.hadoop.hdfs.DFSTestUtil.createFile(DFSTestUtil.java:418)
>   at org.apache.hadoop.hdfs.DFSTestUtil.createFile(DFSTestUtil.java:376)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner.createFile(TestDirectoryScanner.java:108)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner.testThrottling(TestDirectoryScanner.java:584)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9744) TestDirectoryScanner#testThrottling occasionally time out after 300 seconds

2016-04-05 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-9744:

Attachment: HDFS-9744.001.patch

> TestDirectoryScanner#testThrottling occasionally time out after 300 seconds
> ---
>
> Key: HDFS-9744
> URL: https://issues.apache.org/jira/browse/HDFS-9744
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
> Environment: Jenkins
>Reporter: Wei-Chiu Chuang
>Assignee: Lin Yiqun
>Priority: Minor
>  Labels: test
> Attachments: HDFS-9744.001.patch
>
>
> I have seen quite a few test failures in TestDirectoryScanner#testThrottling.
> https://builds.apache.org/job/Hadoop-Hdfs-trunk/2793/testReport/org.apache.hadoop.hdfs.server.datanode/TestDirectoryScanner/testThrottling/
> Looking at the log, it does not look like the test got stuck. On my local 
> machine, this test took 219 seconds. It is likely that this test takes more 
> than 300 seconds to complete on a busy Jenkins slave. I think it is 
> reasonable to set a longer timeout value, or reduce the number of blocks to 
> reduce the duration of the test.
> Error Message
> {noformat}
> test timed out after 30 milliseconds
> {noformat}
> Stacktrace
> {noformat}
> java.lang.Exception: test timed out after 30 milliseconds
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Object.wait(Object.java:503)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.waitAndQueuePacket(DataStreamer.java:804)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.enqueueCurrentPacket(DFSOutputStream.java:423)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.enqueueCurrentPacketFull(DFSOutputStream.java:432)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.writeChunk(DFSOutputStream.java:418)
>   at 
> org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:217)
>   at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:125)
>   at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:111)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57)
>   at java.io.DataOutputStream.write(DataOutputStream.java:107)
>   at org.apache.hadoop.hdfs.DFSTestUtil.createFile(DFSTestUtil.java:418)
>   at org.apache.hadoop.hdfs.DFSTestUtil.createFile(DFSTestUtil.java:376)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner.createFile(TestDirectoryScanner.java:108)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner.testThrottling(TestDirectoryScanner.java:584)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9744) TestDirectoryScanner#testThrottling occasionally time out after 300 seconds

2016-04-05 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227601#comment-15227601
 ] 

Lin Yiqun commented on HDFS-9744:
-

Thanks [~jojochuang] for finding this. I agree with [~templedf]'s comments. We 
can just bump the timeout for {{TestDirectoryScanner#testThrottling}}. Attached 
a simple patch.
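The change is essentially just the annotation (sketch only; the exact value 
used in the attached patch may differ):
{code}
// Sketch only; the new value below is an example. Bump the JUnit timeout so
// the test has headroom on a busy Jenkins slave.
@Test(timeout = 600000)   // previously 300000 (300 seconds)
public void testThrottling() throws Exception {
  // ...unchanged test body...
}
{code}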

> TestDirectoryScanner#testThrottling occasionally time out after 300 seconds
> ---
>
> Key: HDFS-9744
> URL: https://issues.apache.org/jira/browse/HDFS-9744
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
> Environment: Jenkins
>Reporter: Wei-Chiu Chuang
>Priority: Minor
>  Labels: test
>
> I have seen quite a few test failures in TestDirectoryScanner#testThrottling.
> https://builds.apache.org/job/Hadoop-Hdfs-trunk/2793/testReport/org.apache.hadoop.hdfs.server.datanode/TestDirectoryScanner/testThrottling/
> Looking at the log, it does not look like the test got stuck. On my local 
> machine, this test took 219 seconds. It is likely that this test takes more 
> than 300 seconds to complete on a busy Jenkins slave. I think it is 
> reasonable to set a longer timeout value, or reduce the number of blocks to 
> reduce the duration of the test.
> Error Message
> {noformat}
> test timed out after 30 milliseconds
> {noformat}
> Stacktrace
> {noformat}
> java.lang.Exception: test timed out after 30 milliseconds
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Object.wait(Object.java:503)
>   at 
> org.apache.hadoop.hdfs.DataStreamer.waitAndQueuePacket(DataStreamer.java:804)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.enqueueCurrentPacket(DFSOutputStream.java:423)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.enqueueCurrentPacketFull(DFSOutputStream.java:432)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.writeChunk(DFSOutputStream.java:418)
>   at 
> org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:217)
>   at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:125)
>   at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:111)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57)
>   at java.io.DataOutputStream.write(DataOutputStream.java:107)
>   at org.apache.hadoop.hdfs.DFSTestUtil.createFile(DFSTestUtil.java:418)
>   at org.apache.hadoop.hdfs.DFSTestUtil.createFile(DFSTestUtil.java:376)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner.createFile(TestDirectoryScanner.java:108)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner.testThrottling(TestDirectoryScanner.java:584)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10204) Use HadoopIllegalArgumentException replace IllegalArgumentException in hadoop-hdfs

2016-04-05 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225850#comment-15225850
 ] 

Lin Yiqun commented on HDFS-10204:
--

Hi, [~aw], what do you think of this?
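For readers skimming the patch, the kind of substitution involved is simply the 
following (illustrative sketch, not a specific hunk from the attached patches; 
{{bytesPerChecksum}} is just a made-up example argument):
{code}
// Before: a JDK exception, indistinguishable from ones the JDK itself throws.
if (bytesPerChecksum <= 0) {
  throw new IllegalArgumentException("Invalid value: " + bytesPerChecksum);
}
// After: org.apache.hadoop.HadoopIllegalArgumentException (a subclass of
// IllegalArgumentException), so callers can tell Hadoop-originated argument
// errors apart from JDK ones.
if (bytesPerChecksum <= 0) {
  throw new HadoopIllegalArgumentException("Invalid value: " + bytesPerChecksum);
}
{code}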

> Use HadoopIllegalArgumentException replace IllegalArgumentException in 
> hadoop-hdfs
> --
>
> Key: HDFS-10204
> URL: https://issues.apache.org/jira/browse/HDFS-10204
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-10204.001.patch, HDFS-10204.002.patch, 
> HDFS-10204.003.patch
>
>
> In HDFS-1151, it was recommended that HDFS throw 
> HadoopIllegalArgumentException instead of IllegalArgumentException, as 
> described in HADOOP-6537. The intent was to differentiate 
> IllegalArgumentExceptions thrown by Hadoop from those thrown by the JDK. In 
> the current code, some places already use {{HadoopIllegalArgumentException}}, 
> but most of them still use {{IllegalArgumentException}}.
> The current JIRA's scope is focused on hadoop-hdfs; other projects such as 
> hadoop-hdfs-client can be updated later.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10234) DistCp log output should contain copied and deleted files and directories

2016-04-05 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10234:
-
Attachment: HDFS-10234.002.patch

> DistCp log output should contain copied and deleted files and directories
> -
>
> Key: HDFS-10234
> URL: https://issues.apache.org/jira/browse/HDFS-10234
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: distcp
>Affects Versions: 2.7.1
>Reporter: Konstantin Shaposhnikov
>Assignee: Lin Yiqun
> Attachments: HDFS-10234.001.patch, HDFS-10234.002.patch
>
>
> DistCp log output (specified via the {{-log}} command line option) currently 
> contains only skipped files and failed files (when failures are ignored via 
> {{-i}}).
> It would be more useful if it also contained copied and deleted files and 
> created directories.
> This should be fixed in 
> https://github.com/apache/hadoop/blob/branch-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyMapper.java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10234) DistCp log output should contain copied and deleted files and directories

2016-04-05 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225784#comment-15225784
 ] 

Lin Yiqun commented on HDFS-10234:
--

Thanks [~k.shaposhni...@gmail.com] for the concrete description. Updated the 
patch to address your comments.
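For context, the approach is to emit records for successful copies (and for 
deletions) into the DistCp log in the same style as the existing SKIP/FAIL 
records. A hedged sketch, not the attached patch: {{context}} is the 
Mapper.Context available in {{CopyMapper#map}}, {{sourcePath}} stands for the 
path of the file just copied, and the "COPY:" prefix is only a placeholder 
format.
{code}
// Sketch only: after a file is copied successfully, also record it in the
// DistCp log output. The exact key/value convention should mirror whatever
// the existing SKIP record uses.
context.write(null, new Text("COPY: " + sourcePath));
{code}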

> DistCp log output should contain copied and deleted files and directories
> -
>
> Key: HDFS-10234
> URL: https://issues.apache.org/jira/browse/HDFS-10234
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: distcp
>Affects Versions: 2.7.1
>Reporter: Konstantin Shaposhnikov
>Assignee: Lin Yiqun
> Attachments: HDFS-10234.001.patch
>
>
> DistCp log output (specified via the {{-log}} command line option) currently 
> contains only skipped files and failed files (when failures are ignored via 
> {{-i}}).
> It would be more useful if it also contained copied and deleted files and 
> created directories.
> This should be fixed in 
> https://github.com/apache/hadoop/blob/branch-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyMapper.java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9599) TestDecommissioningStatus.testDecommissionStatus occasionally fails

2016-04-04 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15225666#comment-15225666
 ] 

Lin Yiqun commented on HDFS-9599:
-

Thanks [~iwasakims] for the commit!

> TestDecommissioningStatus.testDecommissionStatus occasionally fails
> ---
>
> Key: HDFS-9599
> URL: https://issues.apache.org/jira/browse/HDFS-9599
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
> Environment: Jenkins
>Reporter: Wei-Chiu Chuang
>Assignee: Lin Yiqun
> Fix For: 2.8.0
>
> Attachments: HDFS-9599.001.patch, HDFS-9599.002.patch
>
>
> From test result of a recent jenkins nightly 
> https://builds.apache.org/job/Hadoop-Hdfs-trunk/2663/testReport/junit/org.apache.hadoop.hdfs.server.namenode/TestDecommissioningStatus/testDecommissionStatus/
> The test failed because the number of under replicated blocks is 4, instead 
> of 3.
> Looking at the log, there is a stray block, which might have caused the 
> failure:
> {noformat}
> 2015-12-23 00:42:05,820 [Block report processor] INFO  BlockStateChange 
> (BlockManager.java:processReport(2131)) - BLOCK* processReport: 
> blk_1073741825_1001 on node 127.0.0.1:57382 size 16384 does not belong to any 
> file
> {noformat}
> The block size 16384 suggests this is left over from the sibling test case 
> testDecommissionStatusAfterDNRestart. This can happen, because the same 
> minidfs cluster is reused between tests.
> The test implementation should do a better job isolating tests.
> Another case of failure is when the load factor comes into play and a block 
> cannot find sufficient datanodes to place a replica. In this test, the 
> runtime should not consider the load factor:
> {noformat}
> conf.setBoolean(DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_KEY, 
> false);
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9847) HDFS configuration without time unit name should accept friendly time units

2016-03-31 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-9847:

Attachment: HDFS-9847-nothrow.004.patch

> HDFS configuration without time unit name should accept friendly time units
> ---
>
> Key: HDFS-9847
> URL: https://issues.apache.org/jira/browse/HDFS-9847
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-9847-branch-2.001.patch, 
> HDFS-9847-branch-2.002.patch, HDFS-9847-nothrow.001.patch, 
> HDFS-9847-nothrow.002.patch, HDFS-9847-nothrow.003.patch, 
> HDFS-9847-nothrow.004.patch, HDFS-9847.001.patch, HDFS-9847.002.patch, 
> HDFS-9847.003.patch, HDFS-9847.004.patch, HDFS-9847.005.patch, 
> HDFS-9847.006.patch, branch-2-delta.002.txt, timeduration-w-y.patch
>
>
> HDFS-9821 discusses letting existing keys use friendly units, e.g. 60s, 5m, 
> 1d, 6w, etc. But some configuration key names already contain a time unit 
> name, like {{dfs.blockreport.intervalMsec}}, so we can make other 
> configurations whose names do not contain a time unit accept friendly time 
> units. The time unit {{seconds}} is frequently used in HDFS, so we can update 
> those configurations first.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9847) HDFS configuration without time unit name should accept friendly time units

2016-03-31 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-9847:

Attachment: (was: HDFS-9847-nothrow.004.patch)

> HDFS configuration without time unit name should accept friendly time units
> ---
>
> Key: HDFS-9847
> URL: https://issues.apache.org/jira/browse/HDFS-9847
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-9847-branch-2.001.patch, 
> HDFS-9847-branch-2.002.patch, HDFS-9847-nothrow.001.patch, 
> HDFS-9847-nothrow.002.patch, HDFS-9847-nothrow.003.patch, 
> HDFS-9847-nothrow.004.patch, HDFS-9847.001.patch, HDFS-9847.002.patch, 
> HDFS-9847.003.patch, HDFS-9847.004.patch, HDFS-9847.005.patch, 
> HDFS-9847.006.patch, branch-2-delta.002.txt, timeduration-w-y.patch
>
>
> In HDFS-9821, it talks about letting existing keys use friendly 
> units, e.g. 60s, 5m, 1d, 6w, etc. But some configuration key names 
> contain a time unit name, like {{dfs.blockreport.intervalMsec}}, so we can make 
> some other configurations without a time unit name accept friendly 
> time units. The time unit {{seconds}} is frequently used in HDFS. We can 
> update these configurations first.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9847) HDFS configuration without time unit name should accept friendly time units

2016-03-31 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15221199#comment-15221199
 ] 

Lin Yiqun commented on HDFS-9847:
-

Updated the nothrow-004 patch to address the comments, as [~vinayrpet] suggested.

> HDFS configuration without time unit name should accept friendly time units
> ---
>
> Key: HDFS-9847
> URL: https://issues.apache.org/jira/browse/HDFS-9847
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-9847-branch-2.001.patch, 
> HDFS-9847-branch-2.002.patch, HDFS-9847-nothrow.001.patch, 
> HDFS-9847-nothrow.002.patch, HDFS-9847-nothrow.003.patch, 
> HDFS-9847-nothrow.004.patch, HDFS-9847.001.patch, HDFS-9847.002.patch, 
> HDFS-9847.003.patch, HDFS-9847.004.patch, HDFS-9847.005.patch, 
> HDFS-9847.006.patch, branch-2-delta.002.txt, timeduration-w-y.patch
>
>
> In HDFS-9821, it talks about letting existing keys use friendly 
> units, e.g. 60s, 5m, 1d, 6w, etc. But some configuration key names 
> contain a time unit name, like {{dfs.blockreport.intervalMsec}}, so we can make 
> some other configurations without a time unit name accept friendly 
> time units. The time unit {{seconds}} is frequently used in HDFS. We can 
> update these configurations first.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9847) HDFS configuration without time unit name should accept friendly time units

2016-03-31 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15220984#comment-15220984
 ] 

Lin Yiqun commented on HDFS-9847:
-

Updated the nothrow-004 patch, leaving the defaults in hdfs-default.xml unchanged. 
The code in the Hadoop main source and the unit tests should be fine, since it was 
already exercised by the nothrow-003 patch, which changed the defaults to use time units.

> HDFS configuration without time unit name should accept friendly time units
> ---
>
> Key: HDFS-9847
> URL: https://issues.apache.org/jira/browse/HDFS-9847
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-9847-branch-2.001.patch, 
> HDFS-9847-branch-2.002.patch, HDFS-9847-nothrow.001.patch, 
> HDFS-9847-nothrow.002.patch, HDFS-9847-nothrow.003.patch, 
> HDFS-9847-nothrow.004.patch, HDFS-9847.001.patch, HDFS-9847.002.patch, 
> HDFS-9847.003.patch, HDFS-9847.004.patch, HDFS-9847.005.patch, 
> HDFS-9847.006.patch, branch-2-delta.002.txt, timeduration-w-y.patch
>
>
> In HDFS-9821, it talks about letting existing keys use friendly 
> units, e.g. 60s, 5m, 1d, 6w, etc. But some configuration key names 
> contain a time unit name, like {{dfs.blockreport.intervalMsec}}, so we can make 
> some other configurations without a time unit name accept friendly 
> time units. The time unit {{seconds}} is frequently used in HDFS. We can 
> update these configurations first.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9847) HDFS configuration without time unit name should accept friendly time units

2016-03-31 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-9847:

Attachment: HDFS-9847-nothrow.004.patch

> HDFS configuration without time unit name should accept friendly time units
> ---
>
> Key: HDFS-9847
> URL: https://issues.apache.org/jira/browse/HDFS-9847
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-9847-branch-2.001.patch, 
> HDFS-9847-branch-2.002.patch, HDFS-9847-nothrow.001.patch, 
> HDFS-9847-nothrow.002.patch, HDFS-9847-nothrow.003.patch, 
> HDFS-9847-nothrow.004.patch, HDFS-9847.001.patch, HDFS-9847.002.patch, 
> HDFS-9847.003.patch, HDFS-9847.004.patch, HDFS-9847.005.patch, 
> HDFS-9847.006.patch, branch-2-delta.002.txt, timeduration-w-y.patch
>
>
> In HDFS-9821, it talks about letting existing keys use friendly 
> units, e.g. 60s, 5m, 1d, 6w, etc. But some configuration key names 
> contain a time unit name, like {{dfs.blockreport.intervalMsec}}, so we can make 
> some other configurations without a time unit name accept friendly 
> time units. The time unit {{seconds}} is frequently used in HDFS. We can 
> update these configurations first.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10234) DistCp log output should contain copied and deleted files and directories

2016-03-31 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15219646#comment-15219646
 ] 

Lin Yiqun commented on HDFS-10234:
--

Uploaded a simple patch. I added log output for copied files and created 
directories, but there seems to be no place in {{CopyMapper}} that indicates a 
file was deleted. Does this patch satisfy your needs? If I am missing something, 
please let me know.
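
As a rough illustration of the intent, the hypothetical mapper below (not the real {{CopyMapper}}, whose signatures and output wiring differ) emits one extra log line per copied file through the MapReduce context:
{code}
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical sketch only -- the committed change has to live inside the
// real CopyMapper and reuse its existing SKIP/FAIL log plumbing.
public class LoggingCopyMapperSketch extends Mapper<Text, Text, Text, Text> {
  @Override
  protected void map(Text relSourcePath, Text targetPath, Context context)
      throws IOException, InterruptedException {
    // One line per action, alongside the existing skip/failure entries.
    context.write(new Text("COPIED: " + relSourcePath),
        new Text(" --> " + targetPath));
  }
}
{code}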

> DistCp log output should contain copied and deleted files and directories
> -
>
> Key: HDFS-10234
> URL: https://issues.apache.org/jira/browse/HDFS-10234
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: distcp
>Affects Versions: 2.7.1
>Reporter: Konstantin Shaposhnikov
> Attachments: HDFS-10234.001.patch
>
>
> DistCp log output (specified via {{-log}} command line option) currently 
> contains only skipped and failed (when failures are ignored via {{-i}}) files.
> It will be more useful if it also contains copied and deleted files and 
> created directories.
> This should be fixed in 
> https://github.com/apache/hadoop/blob/branch-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyMapper.java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10234) DistCp log output should contain copied and deleted files and directories

2016-03-31 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10234:
-
Attachment: HDFS-10234.001.patch

> DistCp log output should contain copied and deleted files and directories
> -
>
> Key: HDFS-10234
> URL: https://issues.apache.org/jira/browse/HDFS-10234
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: distcp
>Affects Versions: 2.7.1
>Reporter: Konstantin Shaposhnikov
>Assignee: Lin Yiqun
> Attachments: HDFS-10234.001.patch
>
>
> DistCp log output (specified via {{-log}} command line option) currently 
> contains only skipped and failed (when failures are ignored via {{-i}}) files.
> It will be more useful if it also contains copied and deleted files and 
> created directories.
> This should be fixed in 
> https://github.com/apache/hadoop/blob/branch-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyMapper.java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-10234) DistCp log output should contain copied and deleted files and directories

2016-03-31 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun reassigned HDFS-10234:


Assignee: Lin Yiqun

> DistCp log output should contain copied and deleted files and directories
> -
>
> Key: HDFS-10234
> URL: https://issues.apache.org/jira/browse/HDFS-10234
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: distcp
>Affects Versions: 2.7.1
>Reporter: Konstantin Shaposhnikov
>Assignee: Lin Yiqun
> Attachments: HDFS-10234.001.patch
>
>
> DistCp log output (specified via {{-log}} command line option) currently 
> contains only skipped and failed (when failures are ignored via {{-i}}) files.
> It will be more useful if it also contains copied and deleted files and 
> created directories.
> This should be fixed in 
> https://github.com/apache/hadoop/blob/branch-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyMapper.java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10234) DistCp log output should contain copied and deleted files and directories

2016-03-31 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10234:
-
Status: Patch Available  (was: Open)

> DistCp log output should contain copied and deleted files and directories
> -
>
> Key: HDFS-10234
> URL: https://issues.apache.org/jira/browse/HDFS-10234
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: distcp
>Affects Versions: 2.7.1
>Reporter: Konstantin Shaposhnikov
>Assignee: Lin Yiqun
> Attachments: HDFS-10234.001.patch
>
>
> DistCp log output (specified via {{-log}} command line option) currently 
> contains only skipped and failed (when failures are ignored via {{-i}}) files.
> It will be more useful if it also contains copied and deleted files and 
> created directories.
> This should be fixed in 
> https://github.com/apache/hadoop/blob/branch-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyMapper.java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9847) HDFS configuration without time unit name should accept friendly time units

2016-03-30 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15219169#comment-15219169
 ] 

Lin Yiqun commented on HDFS-9847:
-

Hi [~chris.douglas], the latest v003 patch has fixed these errors. The failed unit 
tests in the latest Jenkins report are not related to this JIRA. I also agree with 
this:
{quote}
IMO, we should go ahead without changing the default values and breaking tests 
in this Jira.
{quote}
We can just update hdfs-default.xml in the v003 patch without changing the default 
values.

> HDFS configuration without time unit name should accept friendly time units
> ---
>
> Key: HDFS-9847
> URL: https://issues.apache.org/jira/browse/HDFS-9847
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-9847-branch-2.001.patch, 
> HDFS-9847-branch-2.002.patch, HDFS-9847-nothrow.001.patch, 
> HDFS-9847-nothrow.002.patch, HDFS-9847-nothrow.003.patch, 
> HDFS-9847.001.patch, HDFS-9847.002.patch, HDFS-9847.003.patch, 
> HDFS-9847.004.patch, HDFS-9847.005.patch, HDFS-9847.006.patch, 
> branch-2-delta.002.txt, timeduration-w-y.patch
>
>
> In HDFS-9821, it talks about letting existing keys use friendly 
> units, e.g. 60s, 5m, 1d, 6w, etc. But some configuration key names 
> contain a time unit name, like {{dfs.blockreport.intervalMsec}}, so we can make 
> some other configurations without a time unit name accept friendly 
> time units. The time unit {{seconds}} is frequently used in HDFS. We can 
> update these configurations first.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10197) TestFsDatasetCache failing intermittently due to timeout

2016-03-30 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15217515#comment-15217515
 ] 

Lin Yiqun commented on HDFS-10197:
--

Thanks [~andrew.wang] for the commit!

> TestFsDatasetCache failing intermittently due to timeout
> 
>
> Key: HDFS-10197
> URL: https://issues.apache.org/jira/browse/HDFS-10197
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Fix For: 2.8.0
>
> Attachments: HDFS-10197.001.patch, HDFS-10197.002.patch
>
>
> In {{TestFsDatasetCache}}, the unit tests fail sometimes. I collected some 
> failure reasons from recent Jenkins reports; they are all timeout errors.
> {code}
> Tests in error: 
>   TestFsDatasetCache.testFilesExceedMaxLockedMemory:378 ? Timeout Timed out 
> wait...
>   TestFsDatasetCache.tearDown:149 ? Timeout Timed out waiting for condition. 
> Thr...
> {code}
> {code}
> Tests in error: 
>   TestFsDatasetCache.testPageRounder:474 ?  test timed out after 6 
> milliseco...
>   TestBalancer.testUnknownDatanodeSimple:1040->testUnknownDatanode:1098 ?  
> test ...
> {code}
> But there is a small difference between these failures.
> * The first happens because the total blocked time exceeds 
> {{waitTimeMillis}} (here 60s), which then throws the timeout exception and prints 
> the thread diagnostic string in {{DFSTestUtil#verifyExpectedCacheUsage}}.
> {code}
> long st = Time.now();
> do {
>   boolean result = check.get();
>   if (result) {
> return;
>   }
>   
>   Thread.sleep(checkEveryMillis);
> } while (Time.now() - st < waitForMillis);
> 
> throw new TimeoutException("Timed out waiting for condition. " +
> "Thread diagnostics:\n" +
> TimedOutTestsListener.buildThreadDiagnosticString());
> {code}
> * The second is due to the test's elapsed time exceeding its timeout setting, as 
> in {{TestFsDatasetCache#testPageRounder}}.
> We should adjust the timeouts for these unit tests that sometimes fail 
> because of timing out.
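
For reference, a small sketch of the second kind of fix, i.e. giving the test itself a larger JUnit timeout budget; the 120s figure and the helper method are only assumed examples, not the values or code chosen in the attached patches:
{code}
import static org.junit.Assert.assertTrue;
import org.junit.Test;

public class TimeoutAdjustmentSketch {
  // 120s is an assumed, more generous budget than the 60s being hit; the
  // real values for TestFsDatasetCache are decided in the attached patches.
  @Test(timeout = 120000)
  public void testPageRounderStyleCase() throws Exception {
    long deadline = System.currentTimeMillis() + 120000L;
    boolean done = false;
    while (!done && System.currentTimeMillis() < deadline) {
      done = conditionReached();  // stand-in for the real cache-usage check
      Thread.sleep(100);
    }
    assertTrue("condition not reached before the enlarged timeout", done);
  }

  private boolean conditionReached() {
    return true;  // placeholder; the real test polls FsDatasetCache state
  }
}
{code}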



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9847) HDFS configuration without time unit name should accept friendly time units

2016-03-30 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-9847:

Attachment: HDFS-9847-nothrow.003.patch

> HDFS configuration without time unit name should accept friendly time units
> ---
>
> Key: HDFS-9847
> URL: https://issues.apache.org/jira/browse/HDFS-9847
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-9847-branch-2.001.patch, 
> HDFS-9847-branch-2.002.patch, HDFS-9847-nothrow.001.patch, 
> HDFS-9847-nothrow.002.patch, HDFS-9847-nothrow.003.patch, 
> HDFS-9847.001.patch, HDFS-9847.002.patch, HDFS-9847.003.patch, 
> HDFS-9847.004.patch, HDFS-9847.005.patch, HDFS-9847.006.patch, 
> branch-2-delta.002.txt, timeduration-w-y.patch
>
>
> In HDFS-9821, it talks about letting existing keys use friendly 
> units, e.g. 60s, 5m, 1d, 6w, etc. But some configuration key names 
> contain a time unit name, like {{dfs.blockreport.intervalMsec}}, so we can make 
> some other configurations without a time unit name accept friendly 
> time units. The time unit {{seconds}} is frequently used in HDFS. We can 
> update these configurations first.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9847) HDFS configuration without time unit name should accept friendly time units

2016-03-30 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-9847:

Attachment: (was: HDFS-9847-nothrow.003.patch)

> HDFS configuration without time unit name should accept friendly time units
> ---
>
> Key: HDFS-9847
> URL: https://issues.apache.org/jira/browse/HDFS-9847
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-9847-branch-2.001.patch, 
> HDFS-9847-branch-2.002.patch, HDFS-9847-nothrow.001.patch, 
> HDFS-9847-nothrow.002.patch, HDFS-9847-nothrow.003.patch, 
> HDFS-9847.001.patch, HDFS-9847.002.patch, HDFS-9847.003.patch, 
> HDFS-9847.004.patch, HDFS-9847.005.patch, HDFS-9847.006.patch, 
> branch-2-delta.002.txt, timeduration-w-y.patch
>
>
> In HDFS-9821, it talks about letting existing keys use friendly 
> units, e.g. 60s, 5m, 1d, 6w, etc. But some configuration key names 
> contain a time unit name, like {{dfs.blockreport.intervalMsec}}, so we can make 
> some other configurations without a time unit name accept friendly 
> time units. The time unit {{seconds}} is frequently used in HDFS. We can 
> update these configurations first.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9847) HDFS configuration without time unit name should accept friendly time units

2016-03-30 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15217465#comment-15217465
 ] 

Lin Yiqun commented on HDFS-9847:
-

{quote}
Minimally, all the changes to variables in the main source tree should 
understand the new type assigned to them here.
{quote}
I have tried changing these variables and bringing up the cluster locally. There 
were indeed some places that had not been updated to use {{getTimeDuration}}. I 
have uploaded a complete patch. I'm not sure how many unit tests will fail, but 
it seems that most tests only use {{setInt}} or {{setLong}} rather than 
{{getLong}}.
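
For context, a minimal sketch of how {{Configuration#getTimeDuration}} handles a friendly-unit value; the key name below is made up for illustration, while the real patch only touches existing HDFS keys:
{code}
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.conf.Configuration;

public class TimeDurationSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Made-up key for illustration; a user could also write a plain "180".
    conf.set("dfs.example.check.interval", "3m");
    // getTimeDuration parses suffixes such as ms, s, m, h, d and converts
    // the result to the requested unit; a bare number falls back to that unit.
    long seconds = conf.getTimeDuration("dfs.example.check.interval",
        30, TimeUnit.SECONDS);
    System.out.println(seconds);  // prints 180
  }
}
{code}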

> HDFS configuration without time unit name should accept friendly time units
> ---
>
> Key: HDFS-9847
> URL: https://issues.apache.org/jira/browse/HDFS-9847
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: HDFS-9847-branch-2.001.patch, 
> HDFS-9847-branch-2.002.patch, HDFS-9847-nothrow.001.patch, 
> HDFS-9847-nothrow.002.patch, HDFS-9847-nothrow.003.patch, 
> HDFS-9847.001.patch, HDFS-9847.002.patch, HDFS-9847.003.patch, 
> HDFS-9847.004.patch, HDFS-9847.005.patch, HDFS-9847.006.patch, 
> branch-2-delta.002.txt, timeduration-w-y.patch
>
>
> In HDFS-9821, it talks about letting existing keys use friendly 
> units, e.g. 60s, 5m, 1d, 6w, etc. But some configuration key names 
> contain a time unit name, like {{dfs.blockreport.intervalMsec}}, so we can make 
> some other configurations without a time unit name accept friendly 
> time units. The time unit {{seconds}} is frequently used in HDFS. We can 
> update these configurations first.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

