[jira] [Commented] (HDFS-10285) Storage Policy Satisfier in Namenode

2016-04-27 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261637#comment-15261637
 ] 

Kai Zheng commented on HDFS-10285:
--

Thanks Uma.
bq. Could you elaborate a bit?
Sure. I meant that the current distcp tool already has options to control whether 
original file settings such as the block size are preserved when copying a file 
from the source to the destination. I suggested we might also consider adding an 
option that specifies whether the storage policy property should be preserved, if 
that sounds helpful for the scenario mentioned.
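For example, distcp already exposes this kind of switch through {{-p}} (e.g. {{-pb}} 
keeps the source block size); the storage-policy letter below is purely hypothetical, 
only to illustrate the suggestion:
{code}
# Existing behavior: preserve the source block size on the copied files
hadoop distcp -pb hdfs://src-cluster/warehouse hdfs://dst-cluster/warehouse

# Suggested (hypothetical flag, not implemented): also preserve the storage policy
# hadoop distcp -pbs hdfs://src-cluster/warehouse hdfs://dst-cluster/warehouse
{code}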

> Storage Policy Satisfier in Namenode
> 
>
> Key: HDFS-10285
> URL: https://issues.apache.org/jira/browse/HDFS-10285
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Affects Versions: 2.7.2
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>
> Heterogeneous storage in HDFS introduced the concept of storage policies. These 
> policies can be set on a directory or file to express the user's preference for 
> where the physical blocks should be stored. When the user sets the storage policy 
> before writing data, the blocks can take advantage of the policy and are placed 
> accordingly.
> If the user sets the storage policy after the file has been written and closed, 
> the blocks will already have been written with the default storage policy 
> (DISK). The user then has to run the 'Mover' tool explicitly, passing all such 
> file names as a list. In some distributed system scenarios (ex: HBase) it 
> would be difficult to collect all the files and run the tool, since different 
> nodes write files independently and the files can live under different paths.
> Another scenario is renaming a file from a directory with one effective storage 
> policy (inherited from its parent directory) into a directory with a different 
> effective storage policy. The rename does not copy the inherited storage policy 
> from the source, so the file falls under the destination parent's storage 
> policy. The rename operation is just a metadata change in the Namenode; the 
> physical blocks still remain where the source storage policy placed them.
> Tracking all such files across distributed nodes (ex: region servers) and running 
> the Mover tool could be difficult for admins. The proposal here is to provide an 
> API in the Namenode itself to trigger storage policy satisfaction. A daemon 
> thread inside the Namenode would track such calls and send movement commands to 
> the DNs.
> Will post the detailed design thoughts document soon. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10285) Storage Policy Satisfier in Namenode

2016-04-27 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261581#comment-15261581
 ] 

Uma Maheswara Rao G commented on HDFS-10285:


{quote}
So does this mean there would be a need to reserve and copy the inherited 
storage policy in distcp tool?
{quote}
The current implementation does not copy the source storage policy. What do you 
mean by preserve here? Sorry, I did not follow this. Could you elaborate a bit?

{quote}
Yeah, having an API to allow applications to trigger the mover behavior sounds 
good. As mentioned in the proposal, there is a need in HBase on HDFS HSM. Maybe 
Jingcheng Du and Wei Zhou could have detailed description about this as I know 
you have the relevant work.
{quote}
That will be great!
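
For reference, a rough sketch of what such an application-facing trigger could 
look like (the trigger call itself is hypothetical; the actual API is yet to be 
designed in this jira):
{code}
// Hypothetical sketch only -- the trigger API is still being designed.
Configuration conf = new Configuration();
DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
Path walDir = new Path("/hbase/WALs");
dfs.setStoragePolicy(walDir, "ALL_SSD");   // existing API
dfs.satisfyStoragePolicy(walDir);          // proposed trigger; name is an assumption
{code}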

Thanks a lot, Kai for your comments.

> Storage Policy Satisfier in Namenode
> 
>
> Key: HDFS-10285
> URL: https://issues.apache.org/jira/browse/HDFS-10285
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Affects Versions: 2.7.2
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>
> Heterogeneous storage in HDFS introduced the concept of storage policies. These 
> policies can be set on a directory or file to express the user's preference for 
> where the physical blocks should be stored. When the user sets the storage policy 
> before writing data, the blocks can take advantage of the policy and are placed 
> accordingly.
> If the user sets the storage policy after the file has been written and closed, 
> the blocks will already have been written with the default storage policy 
> (DISK). The user then has to run the 'Mover' tool explicitly, passing all such 
> file names as a list. In some distributed system scenarios (ex: HBase) it 
> would be difficult to collect all the files and run the tool, since different 
> nodes write files independently and the files can live under different paths.
> Another scenario is renaming a file from a directory with one effective storage 
> policy (inherited from its parent directory) into a directory with a different 
> effective storage policy. The rename does not copy the inherited storage policy 
> from the source, so the file falls under the destination parent's storage 
> policy. The rename operation is just a metadata change in the Namenode; the 
> physical blocks still remain where the source storage policy placed them.
> Tracking all such files across distributed nodes (ex: region servers) and running 
> the Mover tool could be difficult for admins. The proposal here is to provide an 
> API in the Namenode itself to trigger storage policy satisfaction. A daemon 
> thread inside the Namenode would track such calls and send movement commands to 
> the DNs.
> Will post the detailed design thoughts document soon. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10337) OfflineEditsViewer stats option should print 0 instead of null for the count of operations

2016-04-27 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10337:
-
Attachment: HDFS-10337.002.patch

Thanks for the quick review. I have updated the patch to address the comment.

> OfflineEditsViewer stats option should print 0 instead of null for the count 
> of operations
> --
>
> Key: HDFS-10337
> URL: https://issues.apache.org/jira/browse/HDFS-10337
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Akira AJISAKA
>Assignee: Lin Yiqun
>Priority: Minor
>  Labels: newbie
> Attachments: HDFS-10337.001.patch, HDFS-10337.002.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8449) Add tasks count metrics to datanode for ECWorker

2016-04-27 Thread Li Bo (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261472#comment-15261472
 ] 

Li Bo commented on HDFS-8449:
-

The two failed unit tests are unrelated to this patch and will be fixed 
in HDFS-10334. The checkstyle problems can also be ignored.
Hi [~drankye], could you help me review the patch again? Thanks.

> Add tasks count metrics to datanode for ECWorker
> 
>
> Key: HDFS-8449
> URL: https://issues.apache.org/jira/browse/HDFS-8449
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Li Bo
>Assignee: Li Bo
> Attachments: HDFS-8449-000.patch, HDFS-8449-001.patch, 
> HDFS-8449-002.patch, HDFS-8449-003.patch, HDFS-8449-004.patch, 
> HDFS-8449-005.patch, HDFS-8449-006.patch
>
>
> This subtask tries to record the EC recovery tasks that a datanode has done, 
> including total, failed, and successful tasks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9276) Failed to Update HDFS Delegation Token for long running application in HA mode

2016-04-27 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261467#comment-15261467
 ] 

Yi Liu commented on HDFS-9276:
--

More than one person has told me they have hit this issue on a real cluster and 
asked me to help push the fix forward.
From my point of view, the approach in this patch is generally OK, though it may 
still need some refinement.

[~daryn], [~ste...@apache.org], [~cnauroth], could you help to check too?



> Failed to Update HDFS Delegation Token for long running application in HA mode
> --
>
> Key: HDFS-9276
> URL: https://issues.apache.org/jira/browse/HDFS-9276
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs, ha, security
>Affects Versions: 2.7.1
>Reporter: Liangliang Gu
>Assignee: Liangliang Gu
> Attachments: HDFS-9276.01.patch, HDFS-9276.02.patch, 
> HDFS-9276.03.patch, HDFS-9276.04.patch, HDFS-9276.05.patch, 
> HDFS-9276.06.patch, HDFS-9276.07.patch, HDFS-9276.08.patch, 
> HDFS-9276.09.patch, HDFS-9276.10.patch, HDFS-9276.11.patch, 
> HDFS-9276.12.patch, HDFS-9276.13.patch, debug1.PNG, debug2.PNG
>
>
> The Scenario is as follows:
> 1. NameNode HA is enabled.
> 2. Kerberos is enabled.
> 3. HDFS Delegation Token (not Keytab or TGT) is used to communicate with 
> NameNode.
> 4. We want to update the HDFS Delegation Token for long-running applications. 
> The HDFS client generates private tokens for each NameNode. When we update 
> the HDFS Delegation Token, these private tokens are not updated, which 
> causes them to expire.
> This bug can be reproduced by the following program:
> {code}
> import java.security.PrivilegedExceptionAction
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.fs.{FileSystem, Path}
> import org.apache.hadoop.security.UserGroupInformation
> object HadoopKerberosTest {
>   def main(args: Array[String]): Unit = {
> val keytab = "/path/to/keytab/xxx.keytab"
> val principal = "x...@abc.com"
> val creds1 = new org.apache.hadoop.security.Credentials()
> val ugi1 = 
> UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
> ugi1.doAs(new PrivilegedExceptionAction[Void] {
>   // Get a copy of the credentials
>   override def run(): Void = {
> val fs = FileSystem.get(new Configuration())
> fs.addDelegationTokens("test", creds1)
> null
>   }
> })
> val ugi = UserGroupInformation.createRemoteUser("test")
> ugi.addCredentials(creds1)
> ugi.doAs(new PrivilegedExceptionAction[Void] {
>   // Get a copy of the credentials
>   override def run(): Void = {
> var i = 0
> while (true) {
>   val creds1 = new org.apache.hadoop.security.Credentials()
>   val ugi1 = 
> UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
>   ugi1.doAs(new PrivilegedExceptionAction[Void] {
> // Get a copy of the credentials
> override def run(): Void = {
>   val fs = FileSystem.get(new Configuration())
>   fs.addDelegationTokens("test", creds1)
>   null
> }
>   })
>   UserGroupInformation.getCurrentUser.addCredentials(creds1)
>   val fs = FileSystem.get( new Configuration())
>   i += 1
>   println()
>   println(i)
>   println(fs.listFiles(new Path("/user"), false))
>   Thread.sleep(60 * 1000)
> }
> null
>   }
> })
>   }
> }
> {code}
> To reproduce the bug, please set the following configuration on the NameNode:
> {code}
> dfs.namenode.delegation.token.max-lifetime = 10min
> dfs.namenode.delegation.key.update-interval = 3min
> dfs.namenode.delegation.token.renew-interval = 3min
> {code}
> The bug will occur after 3 minutes.
> The stacktrace is:
> {code}
> Exception in thread "main" 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (HDFS_DELEGATION_TOKEN token 330156 for test) is expired
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>  

[jira] [Assigned] (HDFS-10338) DistCp masks potential CRC check failures

2016-04-27 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun reassigned HDFS-10338:


Assignee: Lin Yiqun

> DistCp masks potential CRC check failures
> -
>
> Key: HDFS-10338
> URL: https://issues.apache.org/jira/browse/HDFS-10338
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.7.1
>Reporter: Elliot West
>Assignee: Lin Yiqun
>
> There appear to be edge cases whereby CRC checks may be circumvented when 
> requests for checksums from the source or target file system fail. In this 
> event CRCs could differ between the source and target and yet the DistCp copy 
> would succeed, even when the 'skip CRC check' option is not being used.
> The code in question is contained in the method 
> [{{org.apache.hadoop.tools.util.DistCpUtils#checksumsAreEqual(...)}}|https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/DistCpUtils.java#L457]
> Specifically this code block suggests that if there is a failure when trying 
> to read the source or target checksum then the method will return {{true}} 
> (i.e.  the checksums are equal), implying that the check succeeded. In actual 
> fact we just failed to obtain the checksum and could not perform the check.
> {code}
> try {
>   sourceChecksum = sourceChecksum != null ? sourceChecksum : 
> sourceFS.getFileChecksum(source);
>   targetChecksum = targetFS.getFileChecksum(target);
> } catch (IOException e) {
>   LOG.error("Unable to retrieve checksum for " + source + " or "
> + target, e);
> }
> return (sourceChecksum == null || targetChecksum == null ||
>   sourceChecksum.equals(targetChecksum));
> {code}
> I believe that at the very least the caught {{IOException}} should be 
> re-thrown. If this is not deemed desirable then I believe an option 
> ({{--strictCrc}}?) should be added to enforce a strict check where we require 
> that both the source and target CRCs are retrieved, are not null, and are 
> then compared for equality. If for any reason either of the CRC retrievals 
> fails then an exception is thrown.
> Clearly some {{FileSystems}} do not support CRCs and invocations to 
> {{FileSystem.getFileChecksum(...)}} return {{null}} in these instances. I 
> would suggest that these should fail a strict CRC check to prevent users 
> developing a false sense of security in their copy pipeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10338) DistCp masks potential CRC check failures

2016-04-27 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261456#comment-15261456
 ] 

Lin Yiqun commented on HDFS-10338:
--

Hi [~teabot], I have two comments on this:

* It looks like the {{ignoreFailures}} option that [~liuml07] suggested would be 
better. In a sense, a {{strictCrc}} option would overlap with {{skipcrccheck}}, 
since both control the CRC check. If we now do a strict CRC check, there will be 
more checksum-comparison failures, so a new {{ignoreFailures}} option seems 
reasonable.

* I agree with you that {{FileSystems}} which do not support CRCs should be 
treated as a failed case.

I have assigned this work to myself. If there are no other comments, I will post 
a patch later addressing the comments above.

> DistCp masks potential CRC check failures
> -
>
> Key: HDFS-10338
> URL: https://issues.apache.org/jira/browse/HDFS-10338
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.7.1
>Reporter: Elliot West
>
> There appear to be edge cases whereby CRC checks may be circumvented when 
> requests for checksums from the source or target file system fail. In this 
> event CRCs could differ between the source and target and yet the DistCp copy 
> would succeed, even when the 'skip CRC check' option is not being used.
> The code in question is contained in the method 
> [{{org.apache.hadoop.tools.util.DistCpUtils#checksumsAreEqual(...)}}|https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/DistCpUtils.java#L457]
> Specifically this code block suggests that if there is a failure when trying 
> to read the source or target checksum then the method will return {{true}} 
> (i.e.  the checksums are equal), implying that the check succeeded. In actual 
> fact we just failed to obtain the checksum and could not perform the check.
> {code}
> try {
>   sourceChecksum = sourceChecksum != null ? sourceChecksum : 
> sourceFS.getFileChecksum(source);
>   targetChecksum = targetFS.getFileChecksum(target);
> } catch (IOException e) {
>   LOG.error("Unable to retrieve checksum for " + source + " or "
> + target, e);
> }
> return (sourceChecksum == null || targetChecksum == null ||
>   sourceChecksum.equals(targetChecksum));
> {code}
> I believe that at the very least the caught {{IOException}} should be 
> re-thrown. If this is not deemed desirable then I believe an option 
> ({{--strictCrc}}?) should be added to enforce a strict check where we require 
> that both the source and target CRCs are retrieved, are not null, and are 
> then compared for equality. If for any reason either of the CRC retrievals 
> fails then an exception is thrown.
> Clearly some {{FileSystems}} do not support CRCs and invocations to 
> {{FileSystem.getFileChecksum(...)}} return {{null}} in these instances. I 
> would suggest that these should fail a strict CRC check to prevent users 
> developing a false sense of security in their copy pipeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9869) Erasure Coding: Rename replication-based names in BlockManager to more generic [part-2]

2016-04-27 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261451#comment-15261451
 ] 

Rakesh R commented on HDFS-9869:


Thanks a lot [~zhz], [~andrew.wang] for the useful discussions and help in 
resolving this jira.

> Erasure Coding: Rename replication-based names in BlockManager to more 
> generic [part-2]
> ---
>
> Key: HDFS-9869
> URL: https://issues.apache.org/jira/browse/HDFS-9869
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding
>Reporter: Rakesh R
>Assignee: Rakesh R
>  Labels: hdfs-ec-3.0-must-do
> Fix For: 3.0.0
>
> Attachments: HDFS-9869-001.patch, HDFS-9869-002.patch, 
> HDFS-9869-003.patch, HDFS-9869-004.patch, HDFS-9869-005.patch, 
> HDFS-9869-006.patch, HDFS-9869-007.patch
>
>
> The idea of this jira is to rename the following entities in BlockManager as,
> - {{PendingReplicationBlocks}} to {{PendingReconstructionBlocks}}
> - {{excessReplicateMap}} to {{extraRedundancyMap}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10337) OfflineEditsViewer stats option should print 0 instead of null for the count of operations

2016-04-27 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261442#comment-15261442
 ] 

Akira AJISAKA commented on HDFS-10337:
--

Thank you for your patch. Yes, it fixes this issue. In addition, would you use 
StringBuilder instead of StringBuffer? I'm +1 if that is addressed.
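
For illustration, a rough sketch combining both points (not the actual patch; the 
surrounding structure and map name are assumed):
{code}
// Hypothetical sketch only -- not the actual OfflineEditsViewer code.
StringBuilder sb = new StringBuilder();
for (FSEditLogOpCodes opCode : FSEditLogOpCodes.values()) {
  Long count = opCounts.get(opCode);              // may be null if never seen
  sb.append("    ").append(opCode).append(" (")
    .append(opCode.getOpCode()).append("): ")
    .append(count == null ? 0L : count)           // print 0 instead of null
    .append('\n');
}
return sb.toString();
{code}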

> OfflineEditsViewer stats option should print 0 instead of null for the count 
> of operations
> --
>
> Key: HDFS-10337
> URL: https://issues.apache.org/jira/browse/HDFS-10337
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Akira AJISAKA
>Assignee: Lin Yiqun
>Priority: Minor
>  Labels: newbie
> Attachments: HDFS-10337.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10337) OfflineEditsViewer stats option should print 0 instead of null for the count of operations

2016-04-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261402#comment-15261402
 ] 

Hadoop QA commented on HDFS-10337:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 5s {color} 
| {color:red} Docker failed to build yetus/hadoop:7b1c37a. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12801167/HDFS-10337.001.patch |
| JIRA Issue | HDFS-10337 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15312/console |
| Powered by | Apache Yetus 0.2.0   http://yetus.apache.org |


This message was automatically generated.



> OfflineEditsViewer stats option should print 0 instead of null for the count 
> of operations
> --
>
> Key: HDFS-10337
> URL: https://issues.apache.org/jira/browse/HDFS-10337
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Akira AJISAKA
>Assignee: Lin Yiqun
>Priority: Minor
>  Labels: newbie
> Attachments: HDFS-10337.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10337) OfflineEditsViewer stats option should print 0 instead of null for the count of operations

2016-04-27 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261401#comment-15261401
 ] 

Lin Yiqun commented on HDFS-10337:
--

Hi [~ajisakaa], I have attached a patch for this. Does it address the issue?

> OfflineEditsViewer stats option should print 0 instead of null for the count 
> of operations
> --
>
> Key: HDFS-10337
> URL: https://issues.apache.org/jira/browse/HDFS-10337
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Akira AJISAKA
>Assignee: Lin Yiqun
>Priority: Minor
>  Labels: newbie
> Attachments: HDFS-10337.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10337) OfflineEditsViewer stats option should print 0 instead of null for the count of operations

2016-04-27 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10337:
-
Attachment: HDFS-10337.001.patch

> OfflineEditsViewer stats option should print 0 instead of null for the count 
> of operations
> --
>
> Key: HDFS-10337
> URL: https://issues.apache.org/jira/browse/HDFS-10337
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Akira AJISAKA
>Assignee: Lin Yiqun
>Priority: Minor
>  Labels: newbie
> Attachments: HDFS-10337.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10337) OfflineEditsViewer stats option should print 0 instead of null for the count of operations

2016-04-27 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-10337:
-
Status: Patch Available  (was: Open)

> OfflineEditsViewer stats option should print 0 instead of null for the count 
> of operations
> --
>
> Key: HDFS-10337
> URL: https://issues.apache.org/jira/browse/HDFS-10337
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Akira AJISAKA
>Assignee: Lin Yiqun
>Priority: Minor
>  Labels: newbie
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-10337) OfflineEditsViewer stats option should print 0 instead of null for the count of operations

2016-04-27 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun reassigned HDFS-10337:


Assignee: Lin Yiqun

> OfflineEditsViewer stats option should print 0 instead of null for the count 
> of operations
> --
>
> Key: HDFS-10337
> URL: https://issues.apache.org/jira/browse/HDFS-10337
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Akira AJISAKA
>Assignee: Lin Yiqun
>Priority: Minor
>  Labels: newbie
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9958) BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed storages.

2016-04-27 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261388#comment-15261388
 ] 

Walter Su commented on HDFS-9958:
-

bq. I think only DFSClient currently reports storageID.
No, it doesn't.
{code}
//DFSInputStream.java
  protected void reportCheckSumFailure(CorruptedBlocks corruptedBlocks,
  int dataNodeCount, boolean isStriped) {
...
reportList.add(new LocatedBlock(blk, locs));
  }
}
...
 dfsClient.reportChecksumFailure(src,
  reportList.toArray(new LocatedBlock[reportList.size()]));
{code}

{{locs}} is actually {{DatanodeInfoWithStorage}}, which has the storageIDs. But 
the wrong {{LocatedBlock}} constructor is used.
{code}
  public LocatedBlock(ExtendedBlock b, DatanodeInfo[] locs) {
// By default, startOffset is unknown(-1) and corrupt is false.
this(b, locs, null, null, -1, false, EMPTY_LOCS);
  }
...
...
  public LocatedBlock(ExtendedBlock b, DatanodeInfo[] locs, String[] storageIDs,
  StorageType[] storageTypes, long startOffset,
  boolean corrupt, DatanodeInfo[] cachedLocs) {
...
DatanodeInfoWithStorage storage = new DatanodeInfoWithStorage(di,
storageIDs != null ? storageIDs[i] : null,
storageTypes != null ? storageTypes[i] : null);
this.locs[i] = storage;
{code}
It loses the storageIDs.
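
One possible direction (a sketch only, not a patch) is to pull the storage 
information back out of the {{DatanodeInfoWithStorage}} entries and use the full 
constructor so the storageIDs survive:
{code}
// Sketch only: reportCheckSumFailure could keep the storage information instead of
// dropping it via the two-argument LocatedBlock constructor.
String[] storageIDs = new String[locs.length];
StorageType[] storageTypes = new StorageType[locs.length];
for (int i = 0; i < locs.length; i++) {
  DatanodeInfoWithStorage loc = (DatanodeInfoWithStorage) locs[i];
  storageIDs[i] = loc.getStorageID();
  storageTypes[i] = loc.getStorageType();
}
reportList.add(new LocatedBlock(blk, locs, storageIDs, storageTypes,
    -1 /* startOffset unknown */, false /* not corrupt */, new DatanodeInfo[0]));
{code}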

> BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed 
> storages.
> 
>
> Key: HDFS-9958
> URL: https://issues.apache.org/jira/browse/HDFS-9958
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: HDFS-9958-Test-v1.txt, HDFS-9958.001.patch, 
> HDFS-9958.002.patch, HDFS-9958.003.patch, HDFS-9958.004.patch, 
> HDFS-9958.005.patch
>
>
> In a scenario where the corrupt replica is on a failed storage, before it is 
> taken out of blocksMap, there is a race which causes the creation of 
> LocatedBlock on a {{machines}} array element that is not populated. 
> Following is the root cause,
> {code}
> final int numCorruptNodes = countNodes(blk).corruptReplicas();
> {code}
> countNodes only looks at nodes with storage state as NORMAL, which in the 
> case where corrupt replica is on failed storage will amount to 
> numCorruptNodes being zero. 
> {code}
> final int numNodes = blocksMap.numNodes(blk);
> {code}
> However, numNodes will count all nodes/storages irrespective of the state of 
> the storage. Therefore numMachines will include such (failed) nodes. The 
> assert fails only if assertions are enabled; otherwise the code goes ahead 
> and tries to create a LocatedBlock object for an element that was never 
> populated in the {{machines}} array.
> Here is the stack trace:
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:45)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:40)
>   at 
> org.apache.hadoop.hdfs.protocol.LocatedBlock.(LocatedBlock.java:84)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:878)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:826)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlockList(BlockManager.java:799)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlocks(BlockManager.java:899)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1849)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:588)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.S

[jira] [Updated] (HDFS-7877) Support maintenance state for datanodes

2016-04-27 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-7877:
--
Component/s: namenode
 datanode

> Support maintenance state for datanodes
> ---
>
> Key: HDFS-7877
> URL: https://issues.apache.org/jira/browse/HDFS-7877
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Reporter: Ming Ma
> Attachments: HDFS-7877-2.patch, HDFS-7877.patch, 
> Supportmaintenancestatefordatanodes-2.pdf, 
> Supportmaintenancestatefordatanodes.pdf
>
>
> This requirement came up during the design for HDFS-7541. Given this feature 
> is mostly independent of upgrade domain feature, it is better to track it 
> under a separate jira. The design and draft patch will be available soon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9389) Add maintenance states to AdminStates

2016-04-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261264#comment-15261264
 ] 

Hadoop QA commented on HDFS-9389:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 3s {color} 
| {color:red} Docker failed to build yetus/hadoop:7b1c37a. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12801153/HDFS-9389.patch |
| JIRA Issue | HDFS-9389 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15311/console |
| Powered by | Apache Yetus 0.2.0   http://yetus.apache.org |


This message was automatically generated.



> Add maintenance states to AdminStates
> -
>
> Key: HDFS-9389
> URL: https://issues.apache.org/jira/browse/HDFS-9389
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-9389.patch
>
>
> This jira will add {{ENTERING_MAINTENANCE}} and {{IN_MAINTENANCE}} to 
> {{AdminStates}} and protobuf. It will also provide the basic functionality to 
> transition DN into or out of maintenance state from DN's state machine's 
> point of view. The actual admins support and block management will be covered 
> by separate jiras.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10324) Trash directory in an encryption zone should be pre-created with sticky bit

2016-04-27 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261263#comment-15261263
 ] 

Andrew Wang commented on HDFS-10324:


Sounds good to me [~xyao]. Let's also do the provisionTrash flag as an EnumSet 
to future-proof.
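
Something along these lines (a sketch only; flag and method names are not final):
{code}
// Sketch only -- names are assumptions, not the final API.
public enum CreateEncryptionZoneFlag { NO_TRASH, PROVISION_TRASH }

public void createEncryptionZone(Path path, String keyName,
    EnumSet<CreateEncryptionZoneFlag> flags) throws IOException {
  dfs.createEncryptionZone(path, keyName);
  if (flags.contains(CreateEncryptionZoneFlag.PROVISION_TRASH)) {
    provisionEZTrash(path);   // pre-create .Trash with the sticky bit set
  }
}
{code}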

> Trash directory in an encryption zone should be pre-created with sticky bit
> ---
>
> Key: HDFS-10324
> URL: https://issues.apache.org/jira/browse/HDFS-10324
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption
>Affects Versions: 2.8.0
> Environment: CDH5.7.0
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Attachments: HDFS-10324.001.patch, HDFS-10324.002.patch, 
> HDFS-10324.003.patch
>
>
> We encountered a bug in HDFS-8831:
> After HDFS-8831, a deleted file in an encryption zone is moved to a .Trash 
> subdirectory within the encryption zone.
> However, if this .Trash subdirectory is not created beforehand, it will be 
> created and owned by the first user who deleted a file, with permission 
> drwx------. This creates a serious bug because any other non-privileged user 
> will not be able to delete any files within the encryption zone, because they 
> do not have the permission to move directories to the trash directory.
> We should fix this bug, by pre-creating the .Trash directory with sticky bit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10297) Increase default balance bandwidth and concurrent moves

2016-04-27 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-10297:
---
   Resolution: Fixed
Fix Version/s: (was: 2.9.0)
   2.8.0
   Status: Resolved  (was: Patch Available)

Pushed, thanks John!

> Increase default balance bandwidth and concurrent moves
> ---
>
> Key: HDFS-10297
> URL: https://issues.apache.org/jira/browse/HDFS-10297
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Affects Versions: 2.6.0
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Minor
>  Labels: supportability
> Fix For: 2.8.0
>
> Attachments: HDFS-10297.001.patch, HDFS-10297.002.patch, 
> HDFS-10297.003.branch-2.8.patch, HDFS-10297.003.patch
>
>
> Adjust the default values to better support the current level of customer 
> host and network configurations.
> Increase the default for property {{dfs.datanode.balance.bandwidthPerSec}} 
> from 1 to 10 MB. Apply to DN. 10 MB/s is about 10% of the GbE network.
> Increase the default for property 
> {{dfs.datanode.balance.max.concurrent.moves}} from 5 to 50. Apply to DN and 
> Balancer. The default number of DN receiver threads is 4096. The default 
> number of balancer mover threads is 1000.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9389) Add maintenance states to AdminStates

2016-04-27 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-9389:
--
Assignee: Ming Ma
  Status: Patch Available  (was: Open)

> Add maintenance states to AdminStates
> -
>
> Key: HDFS-9389
> URL: https://issues.apache.org/jira/browse/HDFS-9389
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-9389.patch
>
>
> This jira will add {{ENTERING_MAINTENANCE}} and {{IN_MAINTENANCE}} to 
> {{AdminStates}} and protobuf. It will also provide the basic functionality to 
> transition DN into or out of maintenance state from DN's state machine's 
> point of view. The actual admins support and block management will be covered 
> by separate jiras.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9389) Add maintenance states to AdminStates

2016-04-27 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-9389:
--
Attachment: HDFS-9389.patch

Here is the draft patch. The rest of the maintenance functionality is covered by 
other jiras.
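
For reference, a minimal sketch of the enum change described in this jira (the 
existing values come from {{DatanodeInfo}}; the display strings for the new states 
are assumptions):
{code}
// Sketch only -- display strings for the new states are assumptions.
public enum AdminStates {
  NORMAL("In Service"),
  DECOMMISSION_INPROGRESS("Decommission In Progress"),
  DECOMMISSIONED("Decommissioned"),
  ENTERING_MAINTENANCE("Entering Maintenance"),
  IN_MAINTENANCE("In Maintenance");

  private final String value;
  AdminStates(String value) { this.value = value; }
  @Override public String toString() { return value; }
}
{code}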

> Add maintenance states to AdminStates
> -
>
> Key: HDFS-9389
> URL: https://issues.apache.org/jira/browse/HDFS-9389
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ming Ma
> Attachments: HDFS-9389.patch
>
>
> This jira will add {{ENTERING_MAINTENANCE}} and {{IN_MAINTENANCE}} to 
> {{AdminStates}} and protobuf. It will also provide the basic functionality to 
> transition DN into or out of maintenance state from DN's state machine's 
> point of view. The actual admins support and block management will be covered 
> by separate jiras.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client

2016-04-27 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261052#comment-15261052
 ] 

Lei (Eddy) Xu edited comment on HDFS-3702 at 4/27/16 11:45 PM:
---

Committed to trunk and branch-2.

Thanks a lot for the detailed suggestions and kindly reviews from 
[~andrew.wang], [~nkeywal], [~stack], [~cmccabe], [~arpitagarwal] and 
[~szetszwo]! 


was (Author: eddyxu):
Committed to trunk and branch-2.

Thanks a lot for the detailed suggestions and kindly reviews from 
[~andrew.wang], [~nkeywal], [~stack], [~arpitagarwal] and [~szetszwo]! 

> Add an option for NOT writing the blocks locally if there is a datanode on 
> the same box as the client
> -
>
> Key: HDFS-3702
> URL: https://issues.apache.org/jira/browse/HDFS-3702
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.5.1
>Reporter: Nicolas Liochon
>Assignee: Lei (Eddy) Xu
>Priority: Minor
>  Labels: BB2015-05-TBR
> Fix For: 3.0.0, 2.9.0
>
> Attachments: HDFS-3702.000.patch, HDFS-3702.001.patch, 
> HDFS-3702.002.patch, HDFS-3702.003.patch, HDFS-3702.004.patch, 
> HDFS-3702.005.patch, HDFS-3702.006.patch, HDFS-3702.007.patch, 
> HDFS-3702.008.patch, HDFS-3702.009.patch, HDFS-3702.010.patch, 
> HDFS-3702.011.patch, HDFS-3702.012.patch, HDFS-3702_Design.pdf
>
>
> This is useful for Write-Ahead-Logs: these files are written for recovery 
> only, and are not read when there are no failures.
> Taking HBase as an example, these files will be read only if the process that 
> wrote them (the 'HBase regionserver') dies. This will likely come from a 
> hardware failure, hence the corresponding datanode will be dead as well. So 
> we're writing 3 replicas, but in reality only 2 of them are really useful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client

2016-04-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261153#comment-15261153
 ] 

Hudson commented on HDFS-3702:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9686 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9686/])
HDFS-3702. Fix missing imports from HDFS-3702 trunk patch. (lei: rev 
8bd0bca0b1ea524132f564b3b8332506421f64b9)
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolTranslatorPB.java


> Add an option for NOT writing the blocks locally if there is a datanode on 
> the same box as the client
> -
>
> Key: HDFS-3702
> URL: https://issues.apache.org/jira/browse/HDFS-3702
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.5.1
>Reporter: Nicolas Liochon
>Assignee: Lei (Eddy) Xu
>Priority: Minor
>  Labels: BB2015-05-TBR
> Fix For: 3.0.0, 2.9.0
>
> Attachments: HDFS-3702.000.patch, HDFS-3702.001.patch, 
> HDFS-3702.002.patch, HDFS-3702.003.patch, HDFS-3702.004.patch, 
> HDFS-3702.005.patch, HDFS-3702.006.patch, HDFS-3702.007.patch, 
> HDFS-3702.008.patch, HDFS-3702.009.patch, HDFS-3702.010.patch, 
> HDFS-3702.011.patch, HDFS-3702.012.patch, HDFS-3702_Design.pdf
>
>
> This is useful for Write-Ahead-Logs: these files are written for recovery 
> only, and are not read when there are no failures.
> Taking HBase as an example, these files will be read only if the process that 
> wrote them (the 'HBase regionserver') dies. This will likely come from a 
> hardware failure, hence the corresponding datanode will be dead as well. So 
> we're writing 3 replicas, but in reality only 2 of them are really useful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10338) DistCp masks potential CRC check failures

2016-04-27 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261127#comment-15261127
 ] 

Mingliang Liu commented on HDFS-10338:
--

I'm favor of propagating the IOException thrown by {{getFileChecksum()}}. The 
retriable command will take care of it, and after all retry-attempts the copy 
mapper will handle failures accordingly. Moreover, I believe this is orthogonal 
to the {{ignoreFailures}} option.

Another point to mention is that, the {{checksumsAreEqual()}} has a 
conflict/confusing javadoc. It claims:
{code}
   * @return If either checksum couldn't be retrieved, the function returns
   * false. If checksums are retrieved, the function returns true if they match,
   * and false otherwise.
   * @throws IOException if there's an exception while retrieving checksums.
{code}
While it has a {{throws IOException}} signature, it does not really throw any 
exception.
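
A minimal sketch of the suggested behavior (an illustration only, not the actual 
patch):
{code}
// Sketch: propagate the IOException so RetriableCommand can retry, and treat a
// missing (null) checksum as a mismatch rather than a pass.
private static boolean checksumsAreEqual(FileSystem sourceFS, Path source,
    FileChecksum sourceChecksum, FileSystem targetFS, Path target)
    throws IOException {
  if (sourceChecksum == null) {
    sourceChecksum = sourceFS.getFileChecksum(source);           // may throw IOException
  }
  FileChecksum targetChecksum = targetFS.getFileChecksum(target); // may throw IOException
  // A FileSystem without checksum support returns null; count that as a failure.
  return sourceChecksum != null && sourceChecksum.equals(targetChecksum);
}
{code}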

> DistCp masks potential CRC check failures
> -
>
> Key: HDFS-10338
> URL: https://issues.apache.org/jira/browse/HDFS-10338
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.7.1
>Reporter: Elliot West
>
> There appear to be edge cases whereby CRC checks may be circumvented when 
> requests for checksums from the source or target file system fail. In this 
> event CRCs could differ between the source and target and yet the DistCp copy 
> would succeed, even when the 'skip CRC check' option is not being used.
> The code in question is contained in the method 
> [{{org.apache.hadoop.tools.util.DistCpUtils#checksumsAreEqual(...)}}|https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/DistCpUtils.java#L457]
> Specifically this code block suggests that if there is a failure when trying 
> to read the source or target checksum then the method will return {{true}} 
> (i.e.  the checksums are equal), implying that the check succeeded. In actual 
> fact we just failed to obtain the checksum and could not perform the check.
> {code}
> try {
>   sourceChecksum = sourceChecksum != null ? sourceChecksum : 
> sourceFS.getFileChecksum(source);
>   targetChecksum = targetFS.getFileChecksum(target);
> } catch (IOException e) {
>   LOG.error("Unable to retrieve checksum for " + source + " or "
> + target, e);
> }
> return (sourceChecksum == null || targetChecksum == null ||
>   sourceChecksum.equals(targetChecksum));
> {code}
> I believe that at the very least the caught {{IOException}} should be 
> re-thrown. If this is not deemed desirable then I believe an option 
> ({{--strictCrc}}?) should be added to enforce a strict check where we require 
> that both the source and target CRCs are retrieved, are not null, and are 
> then compared for equality. If for any reason either of the CRC retrievals 
> fails then an exception is thrown.
> Clearly some {{FileSystems}} do not support CRCs and invocations to 
> {{FileSystem.getFileChecksum(...)}} return {{null}} in these instances. I 
> would suggest that these should fail a strict CRC check to prevent users 
> developing a false sense of security in their copy pipeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10220) Namenode failover due to too long locking in LeaseManager.Monitor

2016-04-27 Thread Nicolas Fraison (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261064#comment-15261064
 ] 

Nicolas Fraison commented on HDFS-10220:


Thanks [~walter.k.su] for the catch.
We can update the break like this to avoid the issue:
{code}
} finally {
  if (isMaxLockHoldToReleaseLease(start)) {
LOG.debug("Breaking out of checkLeases() after " +
  maxLockHoldToReleaseLease + "ms.");
if (leaseToCheck.hasFiles()) {
  renewLease(leaseToCheck);
}
break;
  }
}
{code}

> Namenode failover due to too long locking in LeaseManager.Monitor
> 
>
> Key: HDFS-10220
> URL: https://issues.apache.org/jira/browse/HDFS-10220
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Nicolas Fraison
>Assignee: Nicolas Fraison
>Priority: Minor
> Attachments: HADOOP-10220.001.patch, HADOOP-10220.002.patch, 
> HADOOP-10220.003.patch, HADOOP-10220.004.patch, HADOOP-10220.005.patch, 
> threaddump_zkfc.txt
>
>
> I have faced a namenode failover caused by the zkfc detecting an unresponsive 
> namenode, with lots of WARN messages (5 million) like this one:
> _org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All 
> existing blocks are COMPLETE, lease removed, file closed._
> In the thread dump taken by the zkfc there are lots of threads blocked on a 
> lock.
> Looking at the code, a lock is taken by LeaseManager.Monitor when leases must 
> be released. Because of the very large number of leases to release, the 
> namenode took too long to release them, blocking all other tasks and making 
> the zkfc think that the namenode was unavailable/stuck.
> The idea of this patch is to limit the number of leases released on each check, 
> so that the lock is not held for too long a period.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client

2016-04-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261063#comment-15261063
 ] 

Hudson commented on HDFS-3702:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9685 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9685/])
HDFS-3702. Add an option for NOT writing the blocks locally if there is (lei: 
rev 0a152103f19a3e8e1b7f33aeb9dd115ba231d7b7)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirWriteFileOp.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicy.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeleteRace.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/AddBlockFlag.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSOutputStream.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestBlockPlacementPolicyRackFaultTolerant.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileCreation.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/StripedDataStreamer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/ReplicationWork.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestAvailableSpaceBlockPlacementPolicy.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestOpenFilesWithSnapshot.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestHASafeMode.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicy.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/ClientNamenodeProtocol.proto
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSClientRetries.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeadDatanode.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDefaultBlockPlacementPolicy.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolServerSideTranslatorPB.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestSnapshotBlocksMap.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockStoragePolicy.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestUpgradeDomainBlockPlacementPolicy.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestAddBlockRetry.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CreateFlag.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/ErasureCodingWork.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BaseReplicationPolicyTest.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestAddStripedBlocks.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyWithUpgradeDomain.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyWithNodeGroup.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSStripe

[jira] [Updated] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client

2016-04-27 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HDFS-3702:

   Resolution: Fixed
Fix Version/s: 2.9.0
   3.0.0
   Status: Resolved  (was: Patch Available)

Committed to trunk and branch-2.

Thanks a lot for the detailed suggestions and kindly reviews from 
[~andrew.wang], [~nkeywal], [~stack], [~arpitagarwal] and [~szetszwo]! 

> Add an option for NOT writing the blocks locally if there is a datanode on 
> the same box as the client
> -
>
> Key: HDFS-3702
> URL: https://issues.apache.org/jira/browse/HDFS-3702
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.5.1
>Reporter: Nicolas Liochon
>Assignee: Lei (Eddy) Xu
>Priority: Minor
>  Labels: BB2015-05-TBR
> Fix For: 3.0.0, 2.9.0
>
> Attachments: HDFS-3702.000.patch, HDFS-3702.001.patch, 
> HDFS-3702.002.patch, HDFS-3702.003.patch, HDFS-3702.004.patch, 
> HDFS-3702.005.patch, HDFS-3702.006.patch, HDFS-3702.007.patch, 
> HDFS-3702.008.patch, HDFS-3702.009.patch, HDFS-3702.010.patch, 
> HDFS-3702.011.patch, HDFS-3702.012.patch, HDFS-3702_Design.pdf
>
>
> This is useful for Write-Ahead-Logs: these files are written for recovery 
> only, and are not read when there are no failures.
> Taking HBase as an example, these files will be read only if the process that 
> wrote them (the 'HBase regionserver') dies. This will likely come from a 
> hardware failure, hence the corresponding datanode will be dead as well. So 
> we're writing 3 replicas, but in reality only 2 of them are really useful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-10339) libhdfs++: Expose async operations through the C API

2016-04-27 Thread James Clampffer (JIRA)
James Clampffer created HDFS-10339:
--

 Summary: libhdfs++: Expose async operations through the C API
 Key: HDFS-10339
 URL: https://issues.apache.org/jira/browse/HDFS-10339
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: James Clampffer
Assignee: James Clampffer


I propose an API that looks like the following for doing async operations in C.

(might be some typos; going from memory of what I tried, will clean up)
{code}
typedef struct {
  int status;
  ssize_t count;
  /* ... whatever else ... */
} async_context;

typedef void* caller_context;
typedef void (*capi_callback)(const async_context*, caller_context);

void hdfsAsyncPread(hdfsFS fs, hdfsFile file, off_t offset, void *buf,
    size_t count, capi_callback callback, caller_context context);
{code}

When invoked, we take a copy of the caller context that gets forwarded to the 
callback when the async op completes; this is where a user can keep a pointer 
to some state associated with the operation. The callback is invoked with a 
const async_context*, analogous to the Status object in the C++ API, so the 
callback code can check the status, bytes read, and other fields.

Internally this can be implemented with a callable struct/lambda that captures 
the caller_context and invokes the capi_callback with the caller_context and 
the resulting async_context. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9758) libhdfs++: Implement Python bindings

2016-04-27 Thread James Clampffer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Clampffer updated HDFS-9758:
--
Attachment: hdfs_posix.py

Adding my example using the C API and ctypes to implement File and 
FileSystem in Python.

HdfsFile.readline isn't even half baked; the idea was to read 4KB blocks that 
looked like pages into a dict until a newline char was found. That dict could 
also function as a small cache. The rest of the implementation should work, 
though.

> libhdfs++: Implement Python bindings
> 
>
> Key: HDFS-9758
> URL: https://issues.apache.org/jira/browse/HDFS-9758
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: James Clampffer
> Attachments: hdfs_posix.py
>
>
> It'd be really useful to have bindings for various scripting languages.  
> Python would be a good start because of its popularity and how easy it is to 
> interact with shared libraries using the ctypes module.  I think bindings for 
> the V8 engine that nodeJS uses would be a close second in terms of expanding 
> the potential user base.
> Probably worth starting with just adding a synchronous API and building from 
> there to avoid interactions with python's garbage collector until the 
> bindings prove to be solid.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6489) DFS Used space is not correct computed on frequent append operations

2016-04-27 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260954#comment-15260954
 ] 

Ravi Prakash commented on HDFS-6489:


The problem is here: 
https://github.com/apache/hadoop/blob/f16722d2ef31338a57a13e2c8d18c1c62d58bbaf/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/BlockPoolSlice.java#L323
 . Even though this is an append, {{dfsUsage}} is incremented by the total 
block size every time. This can be easily seen by running the 
{{testFrequentAppend}} (included in Weiwei's patch) and adding a log line after 
line 323.
As far as I can see, this problem has existed since 2012, but it only recently 
became problematic because we started considering dfsUsed space when 
deciding whether to write a block or not.
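
To make this concrete, here is a minimal, hypothetical sketch of delta-based accounting (illustrative names only; this is not the actual {{BlockPoolSlice}} code): on an append, only the bytes actually added to the block file should be charged against dfsUsed, not the full block length again.

{code}
// Hypothetical sketch: charge only the growth of the block file, not its full length.
public class DfsUsedAccounting {
  private long dfsUsed;

  /** Called when a block file grows from oldLen to newLen bytes on disk. */
  public synchronized void onBlockFileGrown(long oldLen, long newLen) {
    dfsUsed += (newLen - oldLen);   // a 10-byte append adds 10 bytes, not 60M
  }

  public synchronized long getDfsUsed() {
    return dfsUsed;
  }
}
{code}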

> DFS Used space is not correctly computed on frequent append operations
> 
>
> Key: HDFS-6489
> URL: https://issues.apache.org/jira/browse/HDFS-6489
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.2.0, 2.7.1, 2.7.2
>Reporter: stanley shi
>Assignee: Weiwei Yang
> Attachments: HDFS-6489.001.patch, HDFS-6489.002.patch, 
> HDFS-6489.003.patch, HDFS6489.java
>
>
> The current implementation of the Datanode will increase the DFS used space 
> on each block write operation. This is correct in most scenarios (creating a new 
> file), but sometimes it behaves incorrectly (appending small data to a large 
> block).
> For example, I have a file with only one block (say, 60M). Then I try to 
> append to it very frequently, but each time I append only 10 bytes.
> Then on each append, DFS used will be increased by the length of the 
> block (60M), not the actual data length (10 bytes).
> Consider a scenario where I use many clients to append concurrently to a large 
> number of files (1000+). Assume the block size is 32M (half of the default 
> value); then DFS used will be increased by 1000*32M = 32G on each round of 
> appends, even though I actually write only about 10K bytes. This will cause 
> the datanode to report insufficient disk space on data writes.
> {quote}2014-06-04 15:27:34,719 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock  
> BP-1649188734-10.37.7.142-1398844098971:blk_1073742834_45306 received 
> exception org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: 
> Insufficient space for appending to FinalizedReplica, blk_1073742834_45306, 
> FINALIZED{quote}
> But the actual disk usage:
> {quote}
> [root@hdsh143 ~]# df -h
> FilesystemSize  Used Avail Use% Mounted on
> /dev/sda3  16G  2.9G   13G  20% /
> tmpfs 1.9G   72K  1.9G   1% /dev/shm
> /dev/sda1  97M   32M   61M  35% /boot
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client

2016-04-27 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260953#comment-15260953
 ] 

Tsz Wo Nicholas Sze commented on HDFS-3702:
---

After a second thought, I agree that it is fine to add 
CreateFlag.NO_LOCAL_WRITE as LimitedPrivate to HBase.  Thanks.

> Add an option for NOT writing the blocks locally if there is a datanode on 
> the same box as the client
> -
>
> Key: HDFS-3702
> URL: https://issues.apache.org/jira/browse/HDFS-3702
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.5.1
>Reporter: Nicolas Liochon
>Assignee: Lei (Eddy) Xu
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: HDFS-3702.000.patch, HDFS-3702.001.patch, 
> HDFS-3702.002.patch, HDFS-3702.003.patch, HDFS-3702.004.patch, 
> HDFS-3702.005.patch, HDFS-3702.006.patch, HDFS-3702.007.patch, 
> HDFS-3702.008.patch, HDFS-3702.009.patch, HDFS-3702.010.patch, 
> HDFS-3702.011.patch, HDFS-3702.012.patch, HDFS-3702_Design.pdf
>
>
> This is useful for Write-Ahead-Logs: these files are written for recovery 
> only, and are not read when there are no failures.
> Taking HBase as an example, these files will be read only if the process that 
> wrote them (the 'HBase regionserver') dies. This will likely come from a 
> hardware failure, hence the corresponding datanode will be dead as well. So 
> we're writing 3 replicas, but in reality only 2 of them are really useful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10335) Mover$Processor#chooseTarget() always chooses the first matching target storage group

2016-04-27 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260943#comment-15260943
 ] 

Mingliang Liu commented on HDFS-10335:
--

Thanks for your discussion and review, [~szetszwo].

By the way, we did not see any failing UTs locally; let's wait for Jenkins to 
verify. Meanwhile, we're testing the patch manually on a local cluster.

> Mover$Processor#chooseTarget() always chooses the first matching target 
> storage group
> -
>
> Key: HDFS-10335
> URL: https://issues.apache.org/jira/browse/HDFS-10335
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.8.0
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>Priority: Critical
> Attachments: HDFS-10335.000.patch
>
>
> Currently the 
> {{org.apache.hadoop.hdfs.server.mover.Mover$Processor#chooseTarget()}} always 
> chooses the first matching target datanode from the candidate list. This may 
> make the mover schedule a lot of tasks to a few of the datanodes (the first 
> several datanodes of the candidate list). The overall performance will suffer 
> significantly from this because of the saturated network/disk usage. 
> In particular, if {{dfs.datanode.balance.max.concurrent.moves}} is set, the 
> scheduled move tasks will be queued on a few of the storage groups, regardless 
> of other available storage groups. We need an algorithm which can distribute 
> the move tasks approximately evenly across all the candidate target storage 
> groups.
> Thanks [~szetszwo] for offline discussion.
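
To illustrate the kind of distribution the description asks for, here is a minimal, generic sketch (hypothetical types and names; this is not the Mover code): randomize the candidate order before taking the first match, so repeated calls do not always land on the same few targets.

{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

// Illustrative only: shuffle the candidates, then take the first one that matches.
public class TargetPicker {
  public static <T> T chooseTarget(List<T> candidates, Predicate<T> matches) {
    List<T> shuffled = new ArrayList<>(candidates);
    Collections.shuffle(shuffled);      // randomize iteration order on every call
    for (T candidate : shuffled) {
      if (matches.test(candidate)) {
        return candidate;               // first match in a random order
      }
    }
    return null;                        // no matching target found
  }
}
{code}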



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10335) Mover$Processor#chooseTarget() always chooses the first matching target storage group

2016-04-27 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260940#comment-15260940
 ] 

Mingliang Liu commented on HDFS-10335:
--

{code}
Step 16 : RUN cabal update && cabal install shellcheck --global
 ---> Running in 5438b8eb4d37
Config file path source is default config file.
Config file /root/.cabal/config not found.
Writing default configuration to /root/.cabal/config
Downloading the latest package list from hackage.haskell.org
cabal: Failed to download
http://hackage.haskell.org/packages/archive/00-index.tar.gz : ErrorMisc
"Unsucessful HTTP code: 502"
The command '/bin/sh -c cabal update && cabal install shellcheck --global' 
returned a non-zero code: 1

Total Elapsed time:   0m  4s

ERROR: Docker failed to build image.
{code}

It seems it is Yetus that is not happy, not Jenkins: the Docker image failed to build before any tests could run.

> Mover$Processor#chooseTarget() always chooses the first matching target 
> storage group
> -
>
> Key: HDFS-10335
> URL: https://issues.apache.org/jira/browse/HDFS-10335
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.8.0
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>Priority: Critical
> Attachments: HDFS-10335.000.patch
>
>
> Currently the 
> {{org.apache.hadoop.hdfs.server.mover.Mover$Processor#chooseTarget()}} always 
> chooses the first matching target datanode from the candidate list. This may 
> make the mover schedule a lot of tasks to a few of the datanodes (the first 
> several datanodes of the candidate list). The overall performance will suffer 
> significantly from this because of the saturated network/disk usage. 
> In particular, if {{dfs.datanode.balance.max.concurrent.moves}} is set, the 
> scheduled move tasks will be queued on a few of the storage groups, regardless 
> of other available storage groups. We need an algorithm which can distribute 
> the move tasks approximately evenly across all the candidate target storage 
> groups.
> Thanks [~szetszwo] for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10335) Mover$Processor#chooseTarget() always chooses the first matching target storage group

2016-04-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260934#comment-15260934
 ] 

Hadoop QA commented on HDFS-10335:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 4s {color} 
| {color:red} Docker failed to build yetus/hadoop:7b1c37a. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12800935/HDFS-10335.000.patch |
| JIRA Issue | HDFS-10335 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15310/console |
| Powered by | Apache Yetus 0.2.0   http://yetus.apache.org |


This message was automatically generated.



> Mover$Processor#chooseTarget() always chooses the first matching target 
> storage group
> -
>
> Key: HDFS-10335
> URL: https://issues.apache.org/jira/browse/HDFS-10335
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.8.0
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>Priority: Critical
> Attachments: HDFS-10335.000.patch
>
>
> Currently the 
> {{org.apache.hadoop.hdfs.server.mover.Mover$Processor#chooseTarget()}} always 
> chooses the first matching target datanode from the candidate list. This may 
> make the mover schedule a lot of tasks to a few of the datanodes (the first 
> several datanodes of the candidate list). The overall performance will suffer 
> significantly from this because of the saturated network/disk usage. 
> In particular, if {{dfs.datanode.balance.max.concurrent.moves}} is set, the 
> scheduled move tasks will be queued on a few of the storage groups, regardless 
> of other available storage groups. We need an algorithm which can distribute 
> the move tasks approximately evenly across all the candidate target storage 
> groups.
> Thanks [~szetszwo] for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9902) Support different values of dfs.datanode.du.reserved per storage type

2016-04-27 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-9902:

Summary: Support different values of dfs.datanode.du.reserved per storage 
type  (was: dfs.datanode.du.reserved should be difference between StorageType 
DISK and RAM_DISK)

> Support different values of dfs.datanode.du.reserved per storage type
> -
>
> Key: HDFS-9902
> URL: https://issues.apache.org/jira/browse/HDFS-9902
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.7.2
>Reporter: Pan Yuxuan
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-9902-02.patch, HDFS-9902.patch
>
>
> Now Hadoop supports different storage types (DISK, SSD, ARCHIVE and 
> RAM_DISK), but they share one configuration, dfs.datanode.du.reserved.
> The DISK size may be several TB and the RAM_DISK size may be only several 
> tens of GB.
> The problem is that when I configure DISK and RAM_DISK (tmpfs) in the same 
> DN and set dfs.datanode.du.reserved to 10GB, this wastes a lot of 
> RAM_DISK space. 
> Since the usage of RAM_DISK can be 100%, I don't want the 
> dfs.datanode.du.reserved configured for DISK to impact the usage of tmpfs.
> So can we make a new configuration for RAM_DISK, or just skip this 
> configuration for RAM_DISK?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10335) Mover$Processor#chooseTarget() always chooses the first matching target storage group

2016-04-27 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-10335:
---
Hadoop Flags: Reviewed

+1 patch looks good.

> Mover$Processor#chooseTarget() always chooses the first matching target 
> storage group
> -
>
> Key: HDFS-10335
> URL: https://issues.apache.org/jira/browse/HDFS-10335
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.8.0
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>Priority: Critical
> Attachments: HDFS-10335.000.patch
>
>
> Currently the 
> {{org.apache.hadoop.hdfs.server.mover.Mover$Processor#chooseTarget()}} always 
> chooses the first matching target datanode from the candidate list. This may 
> make the mover schedule a lot of tasks to a few of the datanodes (the first 
> several datanodes of the candidate list). The overall performance will suffer 
> significantly from this because of the saturated network/disk usage. 
> In particular, if {{dfs.datanode.balance.max.concurrent.moves}} is set, the 
> scheduled move tasks will be queued on a few of the storage groups, regardless 
> of other available storage groups. We need an algorithm which can distribute 
> the move tasks approximately evenly across all the candidate target storage 
> groups.
> Thanks [~szetszwo] for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10335) Mover$Processor#chooseTarget() always chooses the first matching target storage group

2016-04-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260873#comment-15260873
 ] 

Hadoop QA commented on HDFS-10335:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 3s {color} 
| {color:red} Docker failed to build yetus/hadoop:7b1c37a. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12800935/HDFS-10335.000.patch |
| JIRA Issue | HDFS-10335 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15309/console |
| Powered by | Apache Yetus 0.2.0   http://yetus.apache.org |


This message was automatically generated.



> Mover$Processor#chooseTarget() always chooses the first matching target 
> storage group
> -
>
> Key: HDFS-10335
> URL: https://issues.apache.org/jira/browse/HDFS-10335
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.8.0
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>Priority: Critical
> Attachments: HDFS-10335.000.patch
>
>
> Currently the 
> {{org.apache.hadoop.hdfs.server.mover.Mover$Processor#chooseTarget()}} always 
> chooses the first matching target datanode from the candidate list. This may 
> make the mover schedule a lot of tasks to a few of the datanodes (the first 
> several datanodes of the candidate list). The overall performance will suffer 
> significantly from this because of the saturated network/disk usage. 
> In particular, if {{dfs.datanode.balance.max.concurrent.moves}} is set, the 
> scheduled move tasks will be queued on a few of the storage groups, regardless 
> of other available storage groups. We need an algorithm which can distribute 
> the move tasks approximately evenly across all the candidate target storage 
> groups.
> Thanks [~szetszwo] for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10335) Mover$Processor#chooseTarget() always chooses the first matching target storage group

2016-04-27 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-10335:
-
Status: Patch Available  (was: In Progress)

> Mover$Processor#chooseTarget() always chooses the first matching target 
> storage group
> -
>
> Key: HDFS-10335
> URL: https://issues.apache.org/jira/browse/HDFS-10335
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.8.0
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>Priority: Critical
> Attachments: HDFS-10335.000.patch
>
>
> Currently the 
> {{org.apache.hadoop.hdfs.server.mover.Mover$Processor#chooseTarget()}} always 
> chooses the first matching target datanode from the candidate list. This may 
> make the mover schedule a lot of tasks to a few of the datanodes (the first 
> several datanodes of the candidate list). The overall performance will suffer 
> significantly from this because of the saturated network/disk usage. 
> In particular, if {{dfs.datanode.balance.max.concurrent.moves}} is set, the 
> scheduled move tasks will be queued on a few of the storage groups, regardless 
> of other available storage groups. We need an algorithm which can distribute 
> the move tasks approximately evenly across all the candidate target storage 
> groups.
> Thanks [~szetszwo] for offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client

2016-04-27 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260856#comment-15260856
 ] 

Lei (Eddy) Xu commented on HDFS-3702:
-

All the test failures are unrelated. All tests pass for me locally, with the 
exception of TestHFlush, whose failure was reported in HDFS-2043 and is thus unrelated.

If there are no further objections, I will commit this by EOD. Thanks.

> Add an option for NOT writing the blocks locally if there is a datanode on 
> the same box as the client
> -
>
> Key: HDFS-3702
> URL: https://issues.apache.org/jira/browse/HDFS-3702
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.5.1
>Reporter: Nicolas Liochon
>Assignee: Lei (Eddy) Xu
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: HDFS-3702.000.patch, HDFS-3702.001.patch, 
> HDFS-3702.002.patch, HDFS-3702.003.patch, HDFS-3702.004.patch, 
> HDFS-3702.005.patch, HDFS-3702.006.patch, HDFS-3702.007.patch, 
> HDFS-3702.008.patch, HDFS-3702.009.patch, HDFS-3702.010.patch, 
> HDFS-3702.011.patch, HDFS-3702.012.patch, HDFS-3702_Design.pdf
>
>
> This is useful for Write-Ahead-Logs: these files are written for recovery 
> only, and are not read when there are no failures.
> Taking HBase as an example, these files will be read only if the process that 
> wrote them (the 'HBase regionserver') dies. This will likely come from a 
> hardware failure, hence the corresponding datanode will be dead as well. So 
> we're writing 3 replicas, but in reality only 2 of them are really useful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10220) Namenode failover due to too long locking in LeaseManager.Monitor

2016-04-27 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260674#comment-15260674
 ] 

Ravi Prakash commented on HDFS-10220:
-

Good catch, Walter! That seems broken prior to this patch too, doesn't it? 
[~wheat9], could you please comment on whether {{leaseToCheck}} should be put 
back into sortedLeases in case {{completed}} is {{false}}? Otherwise, the 
lease will never be checked again. Or perhaps I am not understanding some other 
mechanism through which it would be.
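
A generic illustration of the concern (a toy model, not the actual LeaseManager code): if an entry is taken out of the sorted set for processing and not re-inserted when processing did not complete, it will never be examined again.

{code}
import java.util.TreeSet;

// Toy model: leases keyed by expiry time; an unfinished one must be re-queued.
public class RequeueExample {
  public static void main(String[] args) {
    TreeSet<Long> sortedLeases = new TreeSet<>();
    sortedLeases.add(100L);
    sortedLeases.add(200L);

    Long leaseToCheck = sortedLeases.pollFirst();   // take the oldest lease
    boolean completed = false;                      // suppose the release did not finish

    if (!completed) {
      sortedLeases.add(leaseToCheck);               // without this, the lease is never rechecked
    }
    System.out.println(sortedLeases);               // prints [100, 200]
  }
}
{code}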

> Namenode failover due to too long locking in LeaseManager.Monitor
> 
>
> Key: HDFS-10220
> URL: https://issues.apache.org/jira/browse/HDFS-10220
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Nicolas Fraison
>Assignee: Nicolas Fraison
>Priority: Minor
> Attachments: HADOOP-10220.001.patch, HADOOP-10220.002.patch, 
> HADOOP-10220.003.patch, HADOOP-10220.004.patch, HADOOP-10220.005.patch, 
> threaddump_zkfc.txt
>
>
> I have faced a namenode failover due to an unresponsive namenode detected by the 
> zkfc, with lots of WARN messages (5 million) like this one:
> _org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All 
> existing blocks are COMPLETE, lease removed, file closed._
> In the threaddump taken by the zkfc there are lots of threads blocked due to a 
> lock.
> Looking at the code, there is a lock taken by the LeaseManager.Monitor when 
> some lease must be released. Due to the really big number of leases to be 
> released, the namenode took too long to release them, blocking all 
> other tasks and making the zkfc think that the namenode was not 
> available/stuck.
> The idea of this patch is to limit the number of leases released each time we 
> check for leases, so the lock won't be held for too long a time period.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9902) dfs.datanode.du.reserved should be difference between StorageType DISK and RAM_DISK

2016-04-27 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260499#comment-15260499
 ] 

Xiaoyu Yao commented on HDFS-9902:
--

Agree with [~arpitagarwal], we need to document the new keys 
{{dfs.datanode.du.#storagetype#.reserved}}. 

> dfs.datanode.du.reserved should be difference between StorageType DISK and 
> RAM_DISK
> ---
>
> Key: HDFS-9902
> URL: https://issues.apache.org/jira/browse/HDFS-9902
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.7.2
>Reporter: Pan Yuxuan
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-9902-02.patch, HDFS-9902.patch
>
>
> Now Hadoop supports different storage types (DISK, SSD, ARCHIVE and 
> RAM_DISK), but they share one configuration, dfs.datanode.du.reserved.
> The DISK size may be several TB and the RAM_DISK size may be only several 
> tens of GB.
> The problem is that when I configure DISK and RAM_DISK (tmpfs) in the same 
> DN and set dfs.datanode.du.reserved to 10GB, this wastes a lot of 
> RAM_DISK space. 
> Since the usage of RAM_DISK can be 100%, I don't want the 
> dfs.datanode.du.reserved configured for DISK to impact the usage of tmpfs.
> So can we make a new configuration for RAM_DISK, or just skip this 
> configuration for RAM_DISK?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10175) add per-operation stats to FileSystem.Statistics

2016-04-27 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260498#comment-15260498
 ] 

Colin Patrick McCabe commented on HDFS-10175:
-

BTW, sorry for the last-minute-ness of this scheduling, [~liuml07] and 
[~steve_l].

Webex here at 10:30:

HDFS-10175 webex
Wednesday, April 27, 2016
10:30 am | Pacific Daylight Time (San Francisco, GMT-07:00) | 1 hr

JOIN WEBEX MEETING
https://cloudera.webex.com/cloudera/j.php?MTID=mebca25435f158dec71b2589561e71b29
Meeting number: 294 963 170 Meeting password: 1234

JOIN BY PHONE 1-650-479-3208 Call-in toll number (US/Canada) Access code: 294 
963 170 Global call-in numbers: 
https://cloudera.webex.com/cloudera/globalcallin.php?serviceType=MC&ED=45642173&tollFree=0
 

> add per-operation stats to FileSystem.Statistics
> 
>
> Key: HDFS-10175
> URL: https://issues.apache.org/jira/browse/HDFS-10175
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Ram Venkatesh
>Assignee: Mingliang Liu
> Attachments: HDFS-10175.000.patch, HDFS-10175.001.patch, 
> HDFS-10175.002.patch, HDFS-10175.003.patch, HDFS-10175.004.patch, 
> HDFS-10175.005.patch, HDFS-10175.006.patch, TestStatisticsOverhead.java
>
>
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in-turn exposed as job counters by MapReduce and other frameworks. 
> There is logic within DfsClient to map operations to these counters that can 
> be confusing, for instance, mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation including create, append, 
> createSymlink, delete, exists, mkdirs, rename and expose them as new 
> properties on the Statistics object. The operation-specific counters can be 
> used for analyzing the load imposed by a particular job on HDFS. 
> For example, we can use them to identify jobs that end up creating a large 
> number of files.
> Once this information is available in the Statistics object, the app 
> frameworks like MapReduce can expose them as additional counters to be 
> aggregated and recorded as part of job summary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-10338) DistCp masks potential CRC check failures

2016-04-27 Thread Elliot West (JIRA)
Elliot West created HDFS-10338:
--

 Summary: DistCp masks potential CRC check failures
 Key: HDFS-10338
 URL: https://issues.apache.org/jira/browse/HDFS-10338
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: distcp
Affects Versions: 2.7.1
Reporter: Elliot West


There appear to be edge cases whereby CRC checks may be circumvented when 
requests for checksums from the source or target file system fail. In this 
event CRCs could differ between the source and target and yet the DistCp copy 
would succeed, even when the 'skip CRC check' option is not being used.

The code in question is contained in the method 
[{{org.apache.hadoop.tools.util.DistCpUtils#checksumsAreEqual(...)}}|https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/DistCpUtils.java#L457]

Specifically this code block suggests that if there is a failure when trying to 
read the source or target checksum then the method will return {{true}} (i.e.  
the checksums are equal), implying that the check succeeded. In actual fact we 
just failed to obtain the checksum and could not perform the check.
{code}
try {
  sourceChecksum = sourceChecksum != null ? sourceChecksum : 
sourceFS.getFileChecksum(source);
  targetChecksum = targetFS.getFileChecksum(target);
} catch (IOException e) {
  LOG.error("Unable to retrieve checksum for " + source + " or "
+ target, e);
}
return (sourceChecksum == null || targetChecksum == null ||
  sourceChecksum.equals(targetChecksum));
{code}

I believe that at the very least the caught {{IOException}} should be 
re-thrown. If this is not deemed desirable then I believe an option 
({{--strictCrc}}?) should be added to enforce a strict check where we require 
that both the source and target CRCs are retrieved, are not null, and are then 
compared for equality. If for any reason either of the CRC retrievals fails, 
then an exception is thrown.

Clearly some {{FileSystems}} do not support CRCs and invocations to 
{{FileSystem.getFileChecksum(...)}} return {{null}} in these instances. I would 
suggest that these should fail a strict CRC check to prevent users developing a 
false sense of security in their copy pipeline.
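
A minimal, hypothetical sketch of the proposed strict behaviour (the helper name and class are made up; this is not the existing {{DistCpUtils}} API): both checksums must be retrievable and non-null, otherwise the comparison fails loudly instead of silently passing.

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical strict variant: any failure to obtain a checksum aborts the comparison.
public class StrictCrcCheck {
  public static boolean checksumsStrictlyEqual(FileSystem sourceFS, Path source,
                                               FileSystem targetFS, Path target)
      throws IOException {
    FileChecksum sourceChecksum = sourceFS.getFileChecksum(source); // IOException propagates
    FileChecksum targetChecksum = targetFS.getFileChecksum(target); // IOException propagates
    if (sourceChecksum == null || targetChecksum == null) {
      // Null means the FileSystem does not expose a checksum; strict mode refuses to pass.
      throw new IOException("Checksum unavailable for " + source + " or " + target);
    }
    return sourceChecksum.equals(targetChecksum);
  }
}
{code}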




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10175) add per-operation stats to FileSystem.Statistics

2016-04-27 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260487#comment-15260487
 ] 

Colin Patrick McCabe commented on HDFS-10175:
-

Great.  Let me add a webex

> add per-operation stats to FileSystem.Statistics
> 
>
> Key: HDFS-10175
> URL: https://issues.apache.org/jira/browse/HDFS-10175
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Ram Venkatesh
>Assignee: Mingliang Liu
> Attachments: HDFS-10175.000.patch, HDFS-10175.001.patch, 
> HDFS-10175.002.patch, HDFS-10175.003.patch, HDFS-10175.004.patch, 
> HDFS-10175.005.patch, HDFS-10175.006.patch, TestStatisticsOverhead.java
>
>
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in-turn exposed as job counters by MapReduce and other frameworks. 
> There is logic within DfsClient to map operations to these counters that can 
> be confusing, for instance, mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation including create, append, 
> createSymlink, delete, exists, mkdirs, rename and expose them as new 
> properties on the Statistics object. The operation-specific counters can be 
> used for analyzing the load imposed by a particular job on HDFS. 
> For example, we can use them to identify jobs that end up creating a large 
> number of files.
> Once this information is available in the Statistics object, the app 
> frameworks like MapReduce can expose them as additional counters to be 
> aggregated and recorded as part of job summary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-10337) OfflineEditsViewer stats option should print 0 instead of null for the count of operations

2016-04-27 Thread Akira AJISAKA (JIRA)
Akira AJISAKA created HDFS-10337:


 Summary: OfflineEditsViewer stats option should print 0 instead of 
null for the count of operations
 Key: HDFS-10337
 URL: https://issues.apache.org/jira/browse/HDFS-10337
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.2
Reporter: Akira AJISAKA
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-9902) dfs.datanode.du.reserved should be difference between StorageType DISK and RAM_DISK

2016-04-27 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260434#comment-15260434
 ] 

Arpit Agarwal edited comment on HDFS-9902 at 4/27/16 4:31 PM:
--

Hi [~brahmareddy], thank you for reporting this. The fix lgtm.

The unit test can be done more simply without MiniDFSCluster. Just instantiate 
"FsVolumeImpl" objects with different storage types and check {{#reserved}} is 
initialized correctly. Also could you please update the documentation of 
{{dfs.datanode.du.reserved}}?


was (Author: arpitagarwal):
Hi [~brahmareddy], thank you for reporting this. The fix lgtm.

The unit test can be done more simply without MiniDFSCluster. Just instantiate 
"FsVolumeImpl" objects with different storage types and check that the value of 
{{#reserved}}. Also could you please update the documentation of 
{{dfs.datanode.du.reserved}}?

> dfs.datanode.du.reserved should be difference between StorageType DISK and 
> RAM_DISK
> ---
>
> Key: HDFS-9902
> URL: https://issues.apache.org/jira/browse/HDFS-9902
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.7.2
>Reporter: Pan Yuxuan
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-9902-02.patch, HDFS-9902.patch
>
>
> Now Hadoop supports different storage types (DISK, SSD, ARCHIVE and 
> RAM_DISK), but they share one configuration, dfs.datanode.du.reserved.
> The DISK size may be several TB and the RAM_DISK size may be only several 
> tens of GB.
> The problem is that when I configure DISK and RAM_DISK (tmpfs) in the same 
> DN and set dfs.datanode.du.reserved to 10GB, this wastes a lot of 
> RAM_DISK space. 
> Since the usage of RAM_DISK can be 100%, I don't want the 
> dfs.datanode.du.reserved configured for DISK to impact the usage of tmpfs.
> So can we make a new configuration for RAM_DISK, or just skip this 
> configuration for RAM_DISK?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9902) dfs.datanode.du.reserved should be difference between StorageType DISK and RAM_DISK

2016-04-27 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260434#comment-15260434
 ] 

Arpit Agarwal commented on HDFS-9902:
-

Hi [~brahmareddy], thank you for reporting this. The fix lgtm.

The unit test can be done more simply without MiniDFSCluster. Just instantiate 
"FsVolumeImpl" objects with different storage types and check that the value of 
{{#reserved}}. Also could you please update the documentation of 
{{dfs.datanode.du.reserved}}?

> dfs.datanode.du.reserved should be difference between StorageType DISK and 
> RAM_DISK
> ---
>
> Key: HDFS-9902
> URL: https://issues.apache.org/jira/browse/HDFS-9902
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.7.2
>Reporter: Pan Yuxuan
>Assignee: Brahma Reddy Battula
> Attachments: HDFS-9902-02.patch, HDFS-9902.patch
>
>
> Now Hadoop supports different storage types (DISK, SSD, ARCHIVE and 
> RAM_DISK), but they share one configuration, dfs.datanode.du.reserved.
> The DISK size may be several TB and the RAM_DISK size may be only several 
> tens of GB.
> The problem is that when I configure DISK and RAM_DISK (tmpfs) in the same 
> DN and set dfs.datanode.du.reserved to 10GB, this wastes a lot of 
> RAM_DISK space. 
> Since the usage of RAM_DISK can be 100%, I don't want the 
> dfs.datanode.du.reserved configured for DISK to impact the usage of tmpfs.
> So can we make a new configuration for RAM_DISK, or just skip this 
> configuration for RAM_DISK?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-8746) Reduce the latency of streaming reads by re-using DN connections

2016-04-27 Thread James Clampffer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Clampffer reassigned HDFS-8746:
-

Assignee: James Clampffer  (was: Bob Hansen)

> Reduce the latency of streaming reads by re-using DN connections
> 
>
> Key: HDFS-8746
> URL: https://issues.apache.org/jira/browse/HDFS-8746
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: James Clampffer
>
> The current libhdfspp implementation opens a new connection for each pread.  
> For streaming reads (especially streaming short-buffer reads coming from the 
> C API, and especially once we get SSL handshake overhead), our throughput 
> will be dominated by the connection latency of reconnecting to the DataNodes.
> The target use case is a multi-block file that is being sequentially streamed 
> and processed by the client application, which consumes the data as it comes 
> from the DN and throws it away.  The data is read into moderately small 
> buffers (~64k - ~1MB) owned by the consumer, and overall throughput is the 
> critical metric.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9758) libhdfs++: Implement Python bindings

2016-04-27 Thread James Clampffer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260423#comment-15260423
 ] 

James Clampffer commented on HDFS-9758:
---

Here's a bunch of my thoughts about this, let me know what you think.  I 
haven't done much in Python 3.x so some of my assumptions might not hold true 
there.

My thinking was to focus on supporting CPython via CTypes, at least initially.  
I have a patch where I hacked together a demo of how this could be done that 
I'll dig up and post later today (doesn't support iterable files or readline() 
and isn't optimized but otherwise works well enough).  My overall opinion about 
this is that we should make it as easy to access HDFS through Python as 
possible, so less configuration and fewer dependencies are really important to 
get people to use it.  Naturally, if some minor amount of configuration leads 
to a huge performance boost then it's worth considering.

I think CPython is the best place to focus simply because of its ubiquity. 
PyPy is a cool project but doesn't come installed by default on many Linux 
distributions as far as I know.  CPython ships with CTypes so that's one less 
dependency to bring in (unless CFFI is also included as a default library), but 
as you said you're pretty much stuck writing C wrapper functions for 
everything.  I don't think that's a dealbreaker as forcing a C API walls off 
exceptions and things that shouldn't be getting into the interpreter anyway.  
Does Cython get you a whole lot of benefits over something like CTypes?  I 
don't have experience with it.

Boost.Python or a pure Python extension would most likely be the cleanest and 
most performant way of doing this sort of thing at the expense of extra 
complexity.  I've also heard that Hadoop and Boost generally don't mix, but 
we've already made an exception for boost::asio (maybe that's different because 
it's header only?).  The only concern I'd have with both would be that they tie 
the module to the libhdfs++ C++ ABI so we'd have to be careful about 
compatibility.  I could see writing a module being a big benefit because then 
we could hook into the GC to properly support garbage collected async 
operations.

I think it's important to make sure at least some of this work can help implement 
bindings for other languages but I think most approaches would do that in one 
way or another.  I'm partial to building language specific wrappers over the C 
API just because most scripting languages have a way of calling C functions.

> libhdfs++: Implement Python bindings
> 
>
> Key: HDFS-9758
> URL: https://issues.apache.org/jira/browse/HDFS-9758
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: James Clampffer
>
> It'd be really useful to have bindings for various scripting languages.  
> Python would be a good start because of its popularity and how easy it is to 
> interact with shared libraries using the ctypes module.  I think bindings for 
> the V8 engine that nodeJS uses would be a close second in terms of expanding 
> the potential user base.
> Probably worth starting with just adding a synchronous API and building from 
> there to avoid interactions with python's garbage collector until the 
> bindings prove to be solid.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10332) hdfs-native-client fails to build with CMake 2.8.11 or earlier

2016-04-27 Thread James Clampffer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Clampffer updated HDFS-10332:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> hdfs-native-client fails to build with CMake 2.8.11 or earlier
> --
>
> Key: HDFS-10332
> URL: https://issues.apache.org/jira/browse/HDFS-10332
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Tibor Kiss
>Assignee: Tibor Kiss
>Priority: Minor
> Attachments: HDFS-10332.01.patch, HDFS-10332.HDFS-8707.001.patch
>
>
> Due to a new syntax introduced in CMake 2.8.12 (get_filename_component's 
> function when VAR=DIRECTORY) the native-client won't build.
> Currently RHEL6 & 7 are using older versions of CMake. 
> Error log:
> {noformat}
> [INFO] --- maven-antrun-plugin:1.7:run (make) @ hadoop-hdfs-native-client ---
> [INFO] Executing tasks
> main:
>  [exec] JAVA_HOME=, 
> JAVA_JVM_LIBRARY=/usr/java/jdk1.7.0_79/jre/lib/amd64/server/libjvm.so
>  [exec] JAVA_INCLUDE_PATH=/usr/java/jdk1.7.0_79/include, 
> JAVA_INCLUDE_PATH2=/usr/java/jdk1.7.0_79/include/linux
>  [exec] Located all JNI components successfully.
>  [exec] -- Could NOT find PROTOBUF (missing:  PROTOBUF_LIBRARY 
> PROTOBUF_INCLUDE_DIR)
>  [exec] -- valgrind location: MEMORYCHECK_COMMAND-NOTFOUND
>  [exec] -- checking for module 'fuse'
>  [exec] --   package 'fuse' not found
>  [exec] -- Failed to find Linux FUSE libraries or include files.  Will 
> not build FUSE client.
>  [exec] -- Configuring incomplete, errors occurred!
>  [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 
> (get_filename_component):
>  [exec]   get_filename_component unknown component DIRECTORY
>  [exec] Call Stack (most recent call first):
>  [exec]   main/native/libhdfspp/CMakeLists.txt:95 (copy_on_demand)
>  [exec]
>  [exec]
>  [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 
> (get_filename_component):
>  [exec]   get_filename_component unknown component DIRECTORY
>  [exec] Call Stack (most recent call first):
>  [exec]   main/native/libhdfspp/CMakeLists.txt:96 (copy_on_demand)
>  [exec]
>  [exec]
>  [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 
> (get_filename_component):
>  [exec]   get_filename_component unknown component DIRECTORY
>  [exec] Call Stack (most recent call first):
>  [exec]   main/native/libhdfspp/CMakeLists.txt:97 (copy_on_demand)
>  [exec]
>  [exec]
>  [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 
> (get_filename_component):
>  [exec]   get_filename_component unknown component DIRECTORY
>  [exec] Call Stack (most recent call first):
>  [exec]   main/native/libhdfspp/CMakeLists.txt:98 (copy_on_demand)
>  [exec]
>  [exec]
>  [exec] CMake Error: The following variables are used in this project, 
> but they are set to NOTFOUND.
>  [exec] Please set them or make sure they are set and tested correctly in 
> the CMake files:
>  [exec] PROTOBUF_LIBRARY (ADVANCED)
>  [exec] linked by target "hdfspp" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp
>  [exec] linked by target "hdfspp_static" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp
>  [exec] linked by target "protoc-gen-hrpc" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/proto
>  [exec] linked by target "bad_datanode_test" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests
>  [exec] linked by target "hdfs_builder_test" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests
>  [exec] linked by target "hdfspp_errors_test" in directory 
> /home/tiborkiss/devel/workspace
>  [exec] 
> /hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests
>  [exec] linked by target "libhdfs_threaded_hdfspp_test_shim_static" 
> in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/test
>  [exec] linked by target "logging_test" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests
>  [exec] linked by target "node_exclusion_test" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-clien

[jira] [Commented] (HDFS-10332) hdfs-native-client fails to build with CMake 2.8.11 or earlier

2016-04-27 Thread James Clampffer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260355#comment-15260355
 ] 

James Clampffer commented on HDFS-10332:


Hi Tibor,
Thanks for finding this problem and fixing it!  I've committed it to HDFS-8707.

Making sure libhdfs++ builds/runs as expected on RHEL 6/7 is a priority of mine 
as well; I had been building a newer version of CMake from source so I hadn't 
noticed this.

> hdfs-native-client fails to build with CMake 2.8.11 or earlier
> --
>
> Key: HDFS-10332
> URL: https://issues.apache.org/jira/browse/HDFS-10332
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Tibor Kiss
>Assignee: Tibor Kiss
>Priority: Minor
> Attachments: HDFS-10332.01.patch, HDFS-10332.HDFS-8707.001.patch
>
>
> Due to a new syntax introduced in CMake 2.8.12 (get_filename_component's 
> function when VAR=DIRECTORY) the native-client won't build.
> Currently RHEL6 & 7 are using older versions of CMake. 
> Error log:
> {noformat}
> [INFO] --- maven-antrun-plugin:1.7:run (make) @ hadoop-hdfs-native-client ---
> [INFO] Executing tasks
> main:
>  [exec] JAVA_HOME=, 
> JAVA_JVM_LIBRARY=/usr/java/jdk1.7.0_79/jre/lib/amd64/server/libjvm.so
>  [exec] JAVA_INCLUDE_PATH=/usr/java/jdk1.7.0_79/include, 
> JAVA_INCLUDE_PATH2=/usr/java/jdk1.7.0_79/include/linux
>  [exec] Located all JNI components successfully.
>  [exec] -- Could NOT find PROTOBUF (missing:  PROTOBUF_LIBRARY 
> PROTOBUF_INCLUDE_DIR)
>  [exec] -- valgrind location: MEMORYCHECK_COMMAND-NOTFOUND
>  [exec] -- checking for module 'fuse'
>  [exec] --   package 'fuse' not found
>  [exec] -- Failed to find Linux FUSE libraries or include files.  Will 
> not build FUSE client.
>  [exec] -- Configuring incomplete, errors occurred!
>  [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 
> (get_filename_component):
>  [exec]   get_filename_component unknown component DIRECTORY
>  [exec] Call Stack (most recent call first):
>  [exec]   main/native/libhdfspp/CMakeLists.txt:95 (copy_on_demand)
>  [exec]
>  [exec]
>  [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 
> (get_filename_component):
>  [exec]   get_filename_component unknown component DIRECTORY
>  [exec] Call Stack (most recent call first):
>  [exec]   main/native/libhdfspp/CMakeLists.txt:96 (copy_on_demand)
>  [exec]
>  [exec]
>  [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 
> (get_filename_component):
>  [exec]   get_filename_component unknown component DIRECTORY
>  [exec] Call Stack (most recent call first):
>  [exec]   main/native/libhdfspp/CMakeLists.txt:97 (copy_on_demand)
>  [exec]
>  [exec]
>  [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 
> (get_filename_component):
>  [exec]   get_filename_component unknown component DIRECTORY
>  [exec] Call Stack (most recent call first):
>  [exec]   main/native/libhdfspp/CMakeLists.txt:98 (copy_on_demand)
>  [exec]
>  [exec]
>  [exec] CMake Error: The following variables are used in this project, 
> but they are set to NOTFOUND.
>  [exec] Please set them or make sure they are set and tested correctly in 
> the CMake files:
>  [exec] PROTOBUF_LIBRARY (ADVANCED)
>  [exec] linked by target "hdfspp" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp
>  [exec] linked by target "hdfspp_static" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp
>  [exec] linked by target "protoc-gen-hrpc" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/proto
>  [exec] linked by target "bad_datanode_test" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests
>  [exec] linked by target "hdfs_builder_test" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests
>  [exec] linked by target "hdfspp_errors_test" in directory 
> /home/tiborkiss/devel/workspace
>  [exec] 
> /hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests
>  [exec] linked by target "libhdfs_threaded_hdfspp_test_shim_static" 
> in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/test
>  [exec] linked by target "logging_test" in directory 
> /home/tiborkiss/devel

[jira] [Updated] (HDFS-10287) MiniDFSCluster should implement AutoCloseable

2016-04-27 Thread Andras Bokor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Bokor updated HDFS-10287:

Attachment: HDFS-10287.01.patch

> MiniDFSCluster should implement AutoCloseable
> -
>
> Key: HDFS-10287
> URL: https://issues.apache.org/jira/browse/HDFS-10287
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 2.7.0
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Trivial
> Attachments: HDFS-10287.01.patch
>
>
> {{MiniDFSCluster}} should implement {{AutoCloseable}} in order to support 
> [try-with-resources|https://docs.oracle.com/javase/tutorial/essential/exceptions/tryResourceClose.html].
>  It will make test code a little cleaner and more reliable.
> Since {{AutoCloseable}} is only in Java 1.7 or later, this can not be 
> backported to Hadoop version prior to 2.7.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10287) MiniDFSCluster should implement AutoCloseable

2016-04-27 Thread Andras Bokor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Bokor updated HDFS-10287:

Attachment: (was: HDFS-10287.01.patch)

> MiniDFSCluster should implement AutoCloseable
> -
>
> Key: HDFS-10287
> URL: https://issues.apache.org/jira/browse/HDFS-10287
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 2.7.0
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Trivial
> Attachments: HDFS-10287.01.patch
>
>
> {{MiniDFSCluster}} should implement {{AutoCloseable}} in order to support 
> [try-with-resources|https://docs.oracle.com/javase/tutorial/essential/exceptions/tryResourceClose.html].
>  It will make test code a little cleaner and more reliable.
> Since {{AutoCloseable}} is only in Java 1.7 or later, this can not be 
> backported to Hadoop version prior to 2.7.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10287) MiniDFSCluster should implement AutoCloseable

2016-04-27 Thread Andras Bokor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Bokor updated HDFS-10287:

Attachment: HDFS-10287.01.patch

I added AutoCloseable to the class, and I also modified the related test class 
as evidence that it works.
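
For illustration, a rough usage sketch of what this enables (assuming {{close()}} is mapped to the existing shutdown logic): the cluster is torn down automatically when the try block exits, even if an assertion fails.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.MiniDFSCluster;

// Rough try-with-resources sketch; assumes MiniDFSCluster now implements AutoCloseable.
public class MiniClusterTryWithResources {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    try (MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).build()) {
      cluster.waitActive();
      // ... exercise cluster.getFileSystem() here ...
    } // cluster is closed (shut down) here, replacing an explicit finally block
  }
}
{code}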

> MiniDFSCluster should implement AutoCloseable
> -
>
> Key: HDFS-10287
> URL: https://issues.apache.org/jira/browse/HDFS-10287
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 2.7.0
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Trivial
> Attachments: HDFS-10287.01.patch
>
>
> {{MiniDFSCluster}} should implement {{AutoCloseable}} in order to support 
> [try-with-resources|https://docs.oracle.com/javase/tutorial/essential/exceptions/tryResourceClose.html].
>  It will make test code a little cleaner and more reliable.
> Since {{AutoCloseable}} is only in Java 1.7 or later, this can not be 
> backported to Hadoop version prior to 2.7.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-9958) BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed storages.

2016-04-27 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260200#comment-15260200
 ] 

Kihwal Lee edited comment on HDFS-9958 at 4/27/16 2:20 PM:
---

I was on and off helping Kuhu with the patch.  I'll take a quick pass over it 
today.


was (Author: daryn):
I was on and off helping Ku with the patch.  I'll take a quick pass over it 
today.

> BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed 
> storages.
> 
>
> Key: HDFS-9958
> URL: https://issues.apache.org/jira/browse/HDFS-9958
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: HDFS-9958-Test-v1.txt, HDFS-9958.001.patch, 
> HDFS-9958.002.patch, HDFS-9958.003.patch, HDFS-9958.004.patch, 
> HDFS-9958.005.patch
>
>
> In a scenario where the corrupt replica is on a failed storage, before it is 
> taken out of blocksMap, there is a race which causes the creation of 
> LocatedBlock on a {{machines}} array element that is not populated. 
> Following is the root cause,
> {code}
> final int numCorruptNodes = countNodes(blk).corruptReplicas();
> {code}
> countNodes only looks at nodes with storage state as NORMAL, which in the 
> case where corrupt replica is on failed storage will amount to 
> numCorruptNodes being zero. 
> {code}
> final int numNodes = blocksMap.numNodes(blk);
> {code}
> However, numNodes will count all nodes/storages irrespective of the state of 
> the storage. Therefore numMachines will include such (failed) nodes. The 
> assert would fail only if the system is enabled to catch assertion errors; 
> otherwise it goes ahead and tries to create a LocatedBlock object for an 
> element that is not put in the {{machines}} array.
> Here is the stack trace:
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:45)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:40)
>   at 
> org.apache.hadoop.hdfs.protocol.LocatedBlock.(LocatedBlock.java:84)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:878)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:826)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlockList(BlockManager.java:799)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlocks(BlockManager.java:899)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1849)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:588)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9958) BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed storages.

2016-04-27 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260226#comment-15260226
 ] 

Kihwal Lee commented on HDFS-9958:
--

bq. I'm surprised that most of the time, {{storageID}} is null.
The {{storageID}} is not always available. If a corruption is detected by the 
block/volume scanner, storageID can be filled in. But when bad blocks are 
reported by {{reportRemoteBadBlock()}} during re-replication or balancing, the 
reporting node won't know the ID. If we blindly make it report the locally 
available id, it will end up reporting a wrong id.  I think only {{DFSClient}} 
currently reports {{storageID}}.  This shouldn't be a problem as long as the 
assumption that a datanode stores only one replica/stripe of a block holds.
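
A toy illustration (hypothetical types, not the real NameNode code) of the fallback 
described above: when the reporter supplies no storageID, the block's storages can 
still be scanned for the reporting datanode, which is unambiguous only while a 
datanode holds at most one replica/stripe of the block.
{code}
import java.util.List;

// Resolve the storage for a bad-block report: exact match when the reporter
// supplies a storage ID, otherwise fall back to "the one storage this
// datanode has for the block" (valid only under the one-replica assumption).
public class StorageLookupSketch {
  static class StorageInfo {
    final String datanode;
    final String storageId;
    StorageInfo(String datanode, String storageId) {
      this.datanode = datanode;
      this.storageId = storageId;
    }
  }

  static StorageInfo resolveStorage(List<StorageInfo> blockStorages,
                                    String reportingDatanode,
                                    String reportedStorageId) {
    for (StorageInfo s : blockStorages) {
      if (reportedStorageId != null) {
        if (s.storageId.equals(reportedStorageId)) {
          return s;                                   // exact match on the known ID
        }
      } else if (s.datanode.equals(reportingDatanode)) {
        return s;                                     // fallback: the reporter's only storage
      }
    }
    return null;                                      // replica unknown to the block
  }

  public static void main(String[] args) {
    List<StorageInfo> storages = List.of(
        new StorageInfo("dn1:1004", "DS-1111"),
        new StorageInfo("dn2:1004", "DS-2222"));
    // No storage ID reported: resolved via the reporting datanode.
    System.out.println(resolveStorage(storages, "dn2:1004", null).storageId);  // DS-2222
  }
}
{code}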

> BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed 
> storages.
> 
>
> Key: HDFS-9958
> URL: https://issues.apache.org/jira/browse/HDFS-9958
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: HDFS-9958-Test-v1.txt, HDFS-9958.001.patch, 
> HDFS-9958.002.patch, HDFS-9958.003.patch, HDFS-9958.004.patch, 
> HDFS-9958.005.patch
>
>
> In a scenario where the corrupt replica is on a failed storage, before it is 
> taken out of blocksMap, there is a race which causes the creation of 
> LocatedBlock on a {{machines}} array element that is not populated. 
> Following is the root cause,
> {code}
> final int numCorruptNodes = countNodes(blk).corruptReplicas();
> {code}
> countNodes only looks at nodes whose storage state is NORMAL, so in the case 
> where the corrupt replica is on a failed storage, numCorruptNodes comes out 
> as zero. 
> {code}
> final int numNodes = blocksMap.numNodes(blk);
> {code}
> However, numNodes counts all nodes/storages irrespective of the state of the 
> storage, so numMachines will include such (failed) nodes. The assert fails 
> only if assertions are enabled; otherwise the code goes on to create a 
> LocatedBlock object from a {{machines}} slot that was never populated.
> Here is the stack trace:
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:45)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:40)
>   at 
> org.apache.hadoop.hdfs.protocol.LocatedBlock.<init>(LocatedBlock.java:84)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:878)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:826)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlockList(BlockManager.java:799)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlocks(BlockManager.java:899)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1849)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:588)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10330) Add Corrupt Blocks Information in Metasave Output

2016-04-27 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260215#comment-15260215
 ] 

Kuhu Shukla commented on HDFS-10330:


Thanks a lot Kihwal!

> Add Corrupt Blocks Information in Metasave Output
> -
>
> Key: HDFS-10330
> URL: https://issues.apache.org/jira/browse/HDFS-10330
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.2
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Fix For: 2.8.0
>
> Attachments: HDFS-10330.001.patch, HDFS-10330.002.patch
>
>
> Along with Datanode information and other vital block information, it would 
> be useful to have corrupt blocks' detailed info as part of metasave, since 
> JMX currently tracks only the count of corrupt nodes. This JIRA addresses 
> this improvement. CC: [~kihwal], [~daryn].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9958) BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed storages.

2016-04-27 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260200#comment-15260200
 ] 

Daryn Sharp commented on HDFS-9958:
---

I was on and off helping Ku with the patch.  I'll take a quick pass over it 
today.

> BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed 
> storages.
> 
>
> Key: HDFS-9958
> URL: https://issues.apache.org/jira/browse/HDFS-9958
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: HDFS-9958-Test-v1.txt, HDFS-9958.001.patch, 
> HDFS-9958.002.patch, HDFS-9958.003.patch, HDFS-9958.004.patch, 
> HDFS-9958.005.patch
>
>
> In a scenario where the corrupt replica is on a failed storage, before it is 
> taken out of blocksMap, there is a race which causes the creation of 
> LocatedBlock on a {{machines}} array element that is not populated. 
> Following is the root cause,
> {code}
> final int numCorruptNodes = countNodes(blk).corruptReplicas();
> {code}
> countNodes only looks at nodes whose storage state is NORMAL, so in the case 
> where the corrupt replica is on a failed storage, numCorruptNodes comes out 
> as zero. 
> {code}
> final int numNodes = blocksMap.numNodes(blk);
> {code}
> However, numNodes counts all nodes/storages irrespective of the state of the 
> storage, so numMachines will include such (failed) nodes. The assert fails 
> only if assertions are enabled; otherwise the code goes on to create a 
> LocatedBlock object from a {{machines}} slot that was never populated.
> Here is the stack trace:
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:45)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:40)
>   at 
> org.apache.hadoop.hdfs.protocol.LocatedBlock.<init>(LocatedBlock.java:84)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:878)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:826)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlockList(BlockManager.java:799)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlocks(BlockManager.java:899)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1849)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:588)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10330) Add Corrupt Blocks Information in Metasave Output

2016-04-27 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-10330:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

I've committed this to trunk, branch-2 and branch-2.8. Thanks for the patch, 
[~kshukla].

> Add Corrupt Blocks Information in Metasave Output
> -
>
> Key: HDFS-10330
> URL: https://issues.apache.org/jira/browse/HDFS-10330
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.2
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Fix For: 2.8.0
>
> Attachments: HDFS-10330.001.patch, HDFS-10330.002.patch
>
>
> Along with Datanode information and other vital block information, it would 
> be useful to have corrupt blocks' detailed info as part of metasave, since 
> JMX currently tracks only the count of corrupt nodes. This JIRA addresses 
> this improvement. CC: [~kihwal], [~daryn].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10330) Add Corrupt Blocks Information in Metasave Output

2016-04-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260150#comment-15260150
 ] 

Hudson commented on HDFS-10330:
---

FAILURE: Integrated in Hadoop-trunk-Commit #9680 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9680/])
HDFS-10330. Add Corrupt Blocks Information in Metasave output. (kihwal: rev 
919a1d824a0a61145dc7ae59cfba3f34d91f2681)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CorruptReplicasMap.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestMetaSave.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java


> Add Corrupt Blocks Information in Metasave Output
> -
>
> Key: HDFS-10330
> URL: https://issues.apache.org/jira/browse/HDFS-10330
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.2
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: HDFS-10330.001.patch, HDFS-10330.002.patch
>
>
> Along with Datanode information and other vital block information, it would 
> be useful to have corrupt blocks' detailed info as part of metasave, since 
> JMX currently tracks only the count of corrupt nodes. This JIRA addresses 
> this improvement. CC: [~kihwal], [~daryn].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10330) Add Corrupt Blocks Information in Metasave Output

2016-04-27 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260111#comment-15260111
 ] 

Kihwal Lee commented on HDFS-10330:
---

Sample output
{noformat}
Corrupt Blocks:
Block=123412345 Node=10.0.0.1:1004  
StorageID=DS-27b3aa33-4052-4625-86d0-999234270a3f   StorageState=NORMAL 
TotalReplicas=4 Reason=GENSTAMP_MISMATCH
{noformat}
+1 Looks good
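
A rough sketch (hypothetical types and field names, not the committed patch) of the 
kind of dump that produces lines like the sample above: one line per corrupt replica 
with its node, storage, state, replica count and reason.
{code}
import java.io.PrintWriter;
import java.util.LinkedHashMap;
import java.util.Map;

// Dump corrupt-replica details in a metasave-style format.
public class MetasaveCorruptDumpSketch {
  static class CorruptReplica {
    final String node, storageId, storageState, reason;
    final int totalReplicas;
    CorruptReplica(String node, String storageId, String storageState,
                   int totalReplicas, String reason) {
      this.node = node;
      this.storageId = storageId;
      this.storageState = storageState;
      this.totalReplicas = totalReplicas;
      this.reason = reason;
    }
  }

  static void dump(Map<Long, CorruptReplica> corrupt, PrintWriter out) {
    out.println("Corrupt Blocks:");
    for (Map.Entry<Long, CorruptReplica> e : corrupt.entrySet()) {
      CorruptReplica r = e.getValue();
      out.printf("Block=%d Node=%s StorageID=%s StorageState=%s TotalReplicas=%d Reason=%s%n",
          e.getKey(), r.node, r.storageId, r.storageState, r.totalReplicas, r.reason);
    }
    out.flush();
  }

  public static void main(String[] args) {
    Map<Long, CorruptReplica> corrupt = new LinkedHashMap<>();
    corrupt.put(123412345L, new CorruptReplica("10.0.0.1:1004",
        "DS-27b3aa33-4052-4625-86d0-999234270a3f", "NORMAL", 4, "GENSTAMP_MISMATCH"));
    dump(corrupt, new PrintWriter(System.out));
  }
}
{code}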

> Add Corrupt Blocks Information in Metasave Output
> -
>
> Key: HDFS-10330
> URL: https://issues.apache.org/jira/browse/HDFS-10330
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.2
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: HDFS-10330.001.patch, HDFS-10330.002.patch
>
>
> Along with Datanode information and other vital block information, it would 
> be useful to have corrupt blocks' detailed info as part of metasave, since 
> JMX currently tracks only the count of corrupt nodes. This JIRA addresses 
> this improvement. CC: [~kihwal], [~daryn].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8449) Add tasks count metrics to datanode for ECWorker

2016-04-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259993#comment-15259993
 ] 

Hadoop QA commented on HDFS-8449:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
37s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s 
{color} | {color:green} trunk passed with JDK v1.8.0_92 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
20s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
54s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s 
{color} | {color:green} trunk passed with JDK v1.8.0_92 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 44s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
44s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed with JDK v1.8.0_92 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 19s 
{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs: patch generated 2 new + 
61 unchanged - 0 fixed = 63 total (was 61) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 7s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 0s 
{color} | {color:green} the patch passed with JDK v1.8.0_92 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 41s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 58m 33s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_92. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 55m 26s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
26s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 138m 49s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_92 Failed junit tests | hadoop.hdfs.TestHFlush |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl |
| JDK v1.7.0_95 Failed junit tests | hadoop.hdfs.TestHFlush |
|   | hadoop.hdfs.TestLeaseRecovery2 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:fbe3e86 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12800976/HDFS-8449-006.patch |
| JIRA Issue | HDFS-8449 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux fcfa7aab1bbc 3.13.0-36-low

[jira] [Commented] (HDFS-9958) BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed storages.

2016-04-27 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259918#comment-15259918
 ] 

Walter Su commented on HDFS-9958:
-

Failed tests are not related. Will commit shortly if there's no further comment.

> BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed 
> storages.
> 
>
> Key: HDFS-9958
> URL: https://issues.apache.org/jira/browse/HDFS-9958
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: HDFS-9958-Test-v1.txt, HDFS-9958.001.patch, 
> HDFS-9958.002.patch, HDFS-9958.003.patch, HDFS-9958.004.patch, 
> HDFS-9958.005.patch
>
>
> In a scenario where the corrupt replica is on a failed storage, before it is 
> taken out of blocksMap, there is a race which causes the creation of 
> LocatedBlock on a {{machines}} array element that is not populated. 
> Following is the root cause,
> {code}
> final int numCorruptNodes = countNodes(blk).corruptReplicas();
> {code}
> countNodes only looks at nodes whose storage state is NORMAL, so in the case 
> where the corrupt replica is on a failed storage, numCorruptNodes comes out 
> as zero. 
> {code}
> final int numNodes = blocksMap.numNodes(blk);
> {code}
> However, numNodes counts all nodes/storages irrespective of the state of the 
> storage, so numMachines will include such (failed) nodes. The assert fails 
> only if assertions are enabled; otherwise the code goes on to create a 
> LocatedBlock object from a {{machines}} slot that was never populated.
> Here is the stack trace:
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:45)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:40)
>   at 
> org.apache.hadoop.hdfs.protocol.LocatedBlock.<init>(LocatedBlock.java:84)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:878)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:826)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlockList(BlockManager.java:799)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlocks(BlockManager.java:899)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1849)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:588)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6187) Update the document of hftp / hsftp in branch-2

2016-04-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259891#comment-15259891
 ] 

Hadoop QA commented on HDFS-6187:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 9m 40s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
15s {color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 23s 
{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 6s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
24s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 29m 10s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:babe025 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12800980/HDFS-6187.branch-2.001.patch
 |
| JIRA Issue | HDFS-6187 |
| Optional Tests |  asflicense  mvnsite  |
| uname | Linux 4d3c5ea17b91 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | branch-2 / 9d3ddb0 |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15308/console |
| Powered by | Apache Yetus 0.2.0   http://yetus.apache.org |


This message was automatically generated.



> Update the document of hftp / hsftp in branch-2
> ---
>
> Key: HDFS-6187
> URL: https://issues.apache.org/jira/browse/HDFS-6187
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Haohui Mai
>  Labels: newbie
> Attachments: HDFS-6187.001.patch, HDFS-6187.branch-2.001.patch
>
>
> HDFS-5570 has removed hftp / hsftp from trunk. The documentation of hftp / 
> hsftp in branch-2 needs to be updated to indicate that these two filesystems 
> are deprecated in 2.x and will be unavailable in 3.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10332) hdfs-native-client fails to build with CMake 2.8.11 or earlier

2016-04-27 Thread Tibor Kiss (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tibor Kiss updated HDFS-10332:
--
Attachment: (was: HDFS-10332-HDFS-8707.001.patch)

> hdfs-native-client fails to build with CMake 2.8.11 or earlier
> --
>
> Key: HDFS-10332
> URL: https://issues.apache.org/jira/browse/HDFS-10332
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Tibor Kiss
>Assignee: Tibor Kiss
>Priority: Minor
> Attachments: HDFS-10332.01.patch, HDFS-10332.HDFS-8707.001.patch
>
>
> Due to new syntax introduced in CMake 2.8.12 (the DIRECTORY mode of 
> get_filename_component), the native client won't build with older CMake.
> RHEL 6 & 7 currently ship an older version of CMake. 
> Error log:
> {noformat}
> [INFO] --- maven-antrun-plugin:1.7:run (make) @ hadoop-hdfs-native-client ---
> [INFO] Executing tasks
> main:
>  [exec] JAVA_HOME=, 
> JAVA_JVM_LIBRARY=/usr/java/jdk1.7.0_79/jre/lib/amd64/server/libjvm.so
>  [exec] JAVA_INCLUDE_PATH=/usr/java/jdk1.7.0_79/include, 
> JAVA_INCLUDE_PATH2=/usr/java/jdk1.7.0_79/include/linux
>  [exec] Located all JNI components successfully.
>  [exec] -- Could NOT find PROTOBUF (missing:  PROTOBUF_LIBRARY 
> PROTOBUF_INCLUDE_DIR)
>  [exec] -- valgrind location: MEMORYCHECK_COMMAND-NOTFOUND
>  [exec] -- checking for module 'fuse'
>  [exec] --   package 'fuse' not found
>  [exec] -- Failed to find Linux FUSE libraries or include files.  Will 
> not build FUSE client.
>  [exec] -- Configuring incomplete, errors occurred!
>  [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 
> (get_filename_component):
>  [exec]   get_filename_component unknown component DIRECTORY
>  [exec] Call Stack (most recent call first):
>  [exec]   main/native/libhdfspp/CMakeLists.txt:95 (copy_on_demand)
>  [exec]
>  [exec]
>  [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 
> (get_filename_component):
>  [exec]   get_filename_component unknown component DIRECTORY
>  [exec] Call Stack (most recent call first):
>  [exec]   main/native/libhdfspp/CMakeLists.txt:96 (copy_on_demand)
>  [exec]
>  [exec]
>  [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 
> (get_filename_component):
>  [exec]   get_filename_component unknown component DIRECTORY
>  [exec] Call Stack (most recent call first):
>  [exec]   main/native/libhdfspp/CMakeLists.txt:97 (copy_on_demand)
>  [exec]
>  [exec]
>  [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 
> (get_filename_component):
>  [exec]   get_filename_component unknown component DIRECTORY
>  [exec] Call Stack (most recent call first):
>  [exec]   main/native/libhdfspp/CMakeLists.txt:98 (copy_on_demand)
>  [exec]
>  [exec]
>  [exec] CMake Error: The following variables are used in this project, 
> but they are set to NOTFOUND.
>  [exec] Please set them or make sure they are set and tested correctly in 
> the CMake files:
>  [exec] PROTOBUF_LIBRARY (ADVANCED)
>  [exec] linked by target "hdfspp" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp
>  [exec] linked by target "hdfspp_static" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp
>  [exec] linked by target "protoc-gen-hrpc" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/proto
>  [exec] linked by target "bad_datanode_test" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests
>  [exec] linked by target "hdfs_builder_test" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests
>  [exec] linked by target "hdfspp_errors_test" in directory 
> /home/tiborkiss/devel/workspace
>  [exec] 
> /hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests
>  [exec] linked by target "libhdfs_threaded_hdfspp_test_shim_static" 
> in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/test
>  [exec] linked by target "logging_test" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests
>  [exec] linked by target "node_exclusion_test" in directory 
> /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhd

[jira] [Commented] (HDFS-10175) add per-operation stats to FileSystem.Statistics

2016-04-27 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259851#comment-15259851
 ] 

Steve Loughran commented on HDFS-10175:
---

bq. One thing that is a bit concerning about metrics2 is that I think people 
feel that this interface should be stable (i.e. don't remove or alter things 
once they're in), which would be a big constraint on us. 

Ops teams don't like metrics they rely on being taken away; they also view 
published metrics as the API. As the compatibility docs say "Metrics should 
preserve compatibility within the major release."

bq. Perhaps we could document that per-fs stats were @Public @Evolving rather 
than stable

+1 to that, though it'll be important not to break binary compatibility with 
external filesystems.
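
As a small illustration of that marking, a hypothetical per-filesystem statistics 
holder annotated Public/Evolving (assumes the hadoop-annotations classes on the 
classpath; the class and its fields are placeholders, not the HDFS-10175 patch):
{code}
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;

// Public audience, but Evolving rather than Stable: downstream users are told
// the surface may still change within a major release.
@InterfaceAudience.Public
@InterfaceStability.Evolving
public class PerFsStatisticsSketch {
  private long readOps;   // placeholder fields; the real shape is still under discussion
  private long writeOps;

  public long getReadOps()  { return readOps; }
  public long getWriteOps() { return writeOps; }
}
{code}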

FWIW, the issue I have with metrics2, apart from "steve doesn't understand the 
design fully", is its preference for singletons registered with JMX. That makes 
sense in deployed services, but not for tests.

bq. Do we have any ideas about how Spark will consume these metrics in the 
longer term?

Spark is a Coda Hale-instrumented codebase, as are many other apps these days. 
Integration between hadoop metrics of any form and the Coda Hale libraries 
would be something to address, but not here.



> add per-operation stats to FileSystem.Statistics
> 
>
> Key: HDFS-10175
> URL: https://issues.apache.org/jira/browse/HDFS-10175
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Ram Venkatesh
>Assignee: Mingliang Liu
> Attachments: HDFS-10175.000.patch, HDFS-10175.001.patch, 
> HDFS-10175.002.patch, HDFS-10175.003.patch, HDFS-10175.004.patch, 
> HDFS-10175.005.patch, HDFS-10175.006.patch, TestStatisticsOverhead.java
>
>
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in turn exposed as job counters by MapReduce and other frameworks. 
> There is logic within DfsClient to map operations to these counters that can 
> be confusing, for instance, mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation including create, append, 
> createSymlink, delete, exists, mkdirs, rename and expose them as new 
> properties on the Statistics object. The operation-specific counters can be 
> used for analyzing the load imposed by a particular job on HDFS. 
> For example, we can use them to identify jobs that end up creating a large 
> number of files.
> Once this information is available in the Statistics object, the app 
> frameworks like MapReduce can expose them as additional counters to be 
> aggregated and recorded as part of job summary.
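
One possible shape for such per-operation counters, purely as an illustration and not 
the attached patches: an enum of client operations backed by LongAdder counters, cheap 
to bump on the client path and easy for frameworks to snapshot as job counters.
{code}
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

// Per-operation counters keyed by an operation enum.
public class PerOpStatisticsSketch {
  enum Op { CREATE, APPEND, CREATE_SYMLINK, DELETE, EXISTS, MKDIRS, RENAME }

  private final Map<Op, LongAdder> counters = new EnumMap<>(Op.class);

  public PerOpStatisticsSketch() {
    for (Op op : Op.values()) {
      counters.put(op, new LongAdder());
    }
  }

  public void increment(Op op) {
    counters.get(op).increment();   // contention-friendly update on the hot path
  }

  public long get(Op op) {
    return counters.get(op).sum();  // snapshot read for reporting
  }

  public static void main(String[] args) {
    PerOpStatisticsSketch stats = new PerOpStatisticsSketch();
    stats.increment(Op.MKDIRS);
    stats.increment(Op.MKDIRS);
    System.out.println("mkdirs=" + stats.get(Op.MKDIRS));  // mkdirs=2
  }
}
{code}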



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10175) add per-operation stats to FileSystem.Statistics

2016-04-27 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259838#comment-15259838
 ] 

Steve Loughran commented on HDFS-10175:
---

Wednesday April 26, 10:30 AM PST; 18:30 UK, 17:30 GMT works for me.

 Webex binding?

if you don't specify one, mine is at https://hortonworks.webex.com/meet/stevel



> add per-operation stats to FileSystem.Statistics
> 
>
> Key: HDFS-10175
> URL: https://issues.apache.org/jira/browse/HDFS-10175
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Ram Venkatesh
>Assignee: Mingliang Liu
> Attachments: HDFS-10175.000.patch, HDFS-10175.001.patch, 
> HDFS-10175.002.patch, HDFS-10175.003.patch, HDFS-10175.004.patch, 
> HDFS-10175.005.patch, HDFS-10175.006.patch, TestStatisticsOverhead.java
>
>
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in turn exposed as job counters by MapReduce and other frameworks. 
> There is logic within DfsClient to map operations to these counters that can 
> be confusing, for instance, mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation including create, append, 
> createSymlink, delete, exists, mkdirs, rename and expose them as new 
> properties on the Statistics object. The operation-specific counters can be 
> used for analyzing the load imposed by a particular job on HDFS. 
> For example, we can use them to identify jobs that end up creating a large 
> number of files.
> Once this information is available in the Statistics object, the app 
> frameworks like MapReduce can expose them as additional counters to be 
> aggregated and recorded as part of job summary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6187) Update the document of hftp / hsftp in branch-2

2016-04-27 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HDFS-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gergely Novák updated HDFS-6187:

Attachment: HDFS-6187.branch-2.001.patch

> Update the document of hftp / hsftp in branch-2
> ---
>
> Key: HDFS-6187
> URL: https://issues.apache.org/jira/browse/HDFS-6187
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Haohui Mai
>  Labels: newbie
> Attachments: HDFS-6187.001.patch, HDFS-6187.branch-2.001.patch
>
>
> HDFS-5570 has removed hftp / hsftp from trunk. The documentation of hftp / 
> hsftp in branch-2 needs to be updated to indicate that these two filesystems 
> are deprecated in 2.x and will be unavailable in 3.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6187) Update the document of hftp / hsftp in branch-2

2016-04-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259827#comment-15259827
 ] 

Hadoop QA commented on HDFS-6187:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s {color} 
| {color:red} HDFS-6187 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12800971/HDFS-6187.001.patch |
| JIRA Issue | HDFS-6187 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15306/console |
| Powered by | Apache Yetus 0.2.0   http://yetus.apache.org |


This message was automatically generated.



> Update the document of hftp / hsftp in branch-2
> ---
>
> Key: HDFS-6187
> URL: https://issues.apache.org/jira/browse/HDFS-6187
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Haohui Mai
>  Labels: newbie
> Attachments: HDFS-6187.001.patch
>
>
> HDFS-5570 has removed hftp / hsftp from trunk. The documentation of hftp / 
> hsftp in branch-2 needs to be updated to indicate that these two filesystems 
> are deprecated in 2.x and will be unavailable in 3.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8449) Add tasks count metrics to datanode for ECWorker

2016-04-27 Thread Li Bo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Bo updated HDFS-8449:

Attachment: HDFS-8449-006.patch

> Add tasks count metrics to datanode for ECWorker
> 
>
> Key: HDFS-8449
> URL: https://issues.apache.org/jira/browse/HDFS-8449
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Li Bo
>Assignee: Li Bo
> Attachments: HDFS-8449-000.patch, HDFS-8449-001.patch, 
> HDFS-8449-002.patch, HDFS-8449-003.patch, HDFS-8449-004.patch, 
> HDFS-8449-005.patch, HDFS-8449-006.patch
>
>
> This subtask tries to record the EC recovery tasks that a datanode has done, 
> including total tasks, failed tasks, and successful tasks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6187) Update the document of hftp / hsftp in branch-2

2016-04-27 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HDFS-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gergely Novák updated HDFS-6187:

Status: Patch Available  (was: Open)

> Update the document of hftp / hsftp in branch-2
> ---
>
> Key: HDFS-6187
> URL: https://issues.apache.org/jira/browse/HDFS-6187
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Haohui Mai
>  Labels: newbie
> Attachments: HDFS-6187.001.patch
>
>
> HDFS-5570 has removed hftp / hsftp from trunk. The documentation of hftp / 
> hsftp in branch-2 needs to be updated to indicate that these two filesystems 
> are deprecated in 2.x and will be unavailable in 3.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6187) Update the document of hftp / hsftp in branch-2

2016-04-27 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HDFS-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259812#comment-15259812
 ] 

Gergely Novák commented on HDFS-6187:
-

Added a deprecation notice to the HFTP Guide document in patch #001. [~wheat9], is 
this what you meant?

> Update the document of hftp / hsftp in branch-2
> ---
>
> Key: HDFS-6187
> URL: https://issues.apache.org/jira/browse/HDFS-6187
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Haohui Mai
>  Labels: newbie
> Attachments: HDFS-6187.001.patch
>
>
> HDFS-5570 has removed hftp / hsftp from trunk. The documentation of hftp / 
> hsftp in branch-2 needs to be updated to indicate that these two filesystems 
> are deprecated in 2.x and will be unavailable in 3.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6187) Update the document of hftp / hsftp in branch-2

2016-04-27 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HDFS-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gergely Novák updated HDFS-6187:

Attachment: HDFS-6187.001.patch

> Update the document of hftp / hsftp in branch-2
> ---
>
> Key: HDFS-6187
> URL: https://issues.apache.org/jira/browse/HDFS-6187
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Haohui Mai
>  Labels: newbie
> Attachments: HDFS-6187.001.patch
>
>
> HDFS-5570 has removed hftp / hsftp from trunk. The documentation of hftp / 
> hsftp in branch-2 needs to be updated to indicate that these two filesystems 
> are deprecated in 2.x and will be unavailable in 3.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9758) libhdfs++: Implement Python bindings

2016-04-27 Thread Tibor Kiss (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259810#comment-15259810
 ] 

Tibor Kiss commented on HDFS-9758:
--

We have several options to implement Python bindings for the pure C++ HDFS 
Client:
 - CFFI (MIT License)
 - cppyy (MIT License)
 - Ctypes (MIT License)
 - Cython (Apache License)
 - SWIG (GPL License)
 - Boost.Python (Boost Software License)
 - pure python extensions

While CFFI is simple and clean, it does not support C++. 
cppyy would be a great choice, but at this time it only supports PyPy.
ctypes has been integrated into CPython since 2.5, but its C++ support is not great.
Cython supports both C and C++ and seems a reasonable choice.
SWIG also supports C and C++, and could later be used to bring in support for other 
scripting languages; its licensing could be a problem.
Boost.Python seems to have great C++ support at first glance. Its license 
needs to be studied.

Thoughts / feelings / preferences?

> libhdfs++: Implement Python bindings
> 
>
> Key: HDFS-9758
> URL: https://issues.apache.org/jira/browse/HDFS-9758
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: James Clampffer
>
> It'd be really useful to have bindings for various scripting languages.  
> Python would be a good start because of its popularity and how easy it is to 
> interact with shared libraries using the ctypes module.  I think bindings for 
> the V8 engine that nodeJS uses would be a close second in terms of expanding 
> the potential user base.
> Probably worth starting with just adding a synchronous API and building from 
> there to avoid interactions with python's garbage collector until the 
> bindings prove to be solid.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-04-27 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259688#comment-15259688
 ] 

Konstantin Shvachko commented on HDFS-10301:


??Maybe I'm misunderstanding the proposal, but don't we already do all of 
this???

Yes you misunderstood. This part is not my proposal. This is what we already 
do, and therefore I call them *Constraints*, because they complicate the 
*Problem*. The proposal is in the third bullet point titled *Approach*.

??What does the NameNode do if the DataNode is restarted while sending these 
RPCs, so that it never gets a chance to send all the storages that it claimed 
existed?  It seems like you will get stuck??

No, I will not get stuck. All br-RPCs are completely independent of each other. 
It's just that one of them has all storages, and indicates to the NameNode that 
it should update its storage list for the DataNode. The NN processes as many 
such RPCs as the DN sends. If the DN dies, the NN will declare it dead in due 
time; if the DN restarts within 10 minutes, it will send a new set of block 
reports from scratch. I do not see any inconsistencies.

You can think of it as a new operation SyncStorages, which does just that - 
updates NameNode's knowledge of DN's storages. I combined this operation with 
the first br-RPC. One can combine it with any other call, same as you propose 
to combine it with the heartbeat. Except it seems a poor idea, since we don't 
want to wait for removal of thousands of replicas on a heartbeat.
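
A toy model of that SyncStorages idea (hypothetical names, not the actual block-report 
protocol): only a report RPC that declares itself complete, i.e. carries the DataNode's 
full storage list, is allowed to prune storages on the NameNode side; partial 
per-storage RPCs never remove anything.
{code}
import java.util.HashSet;
import java.util.Set;

// NameNode-side view of one DataNode's storages: prune only on a full report.
public class SyncStoragesSketch {
  private final Set<String> knownStorages = new HashSet<>();

  void processReport(Set<String> reportedStorages, boolean reportIsComplete) {
    knownStorages.addAll(reportedStorages);       // always learn new storages
    if (reportIsComplete) {
      knownStorages.retainAll(reportedStorages);  // prune only when the full list is present
    }
  }

  public static void main(String[] args) {
    SyncStoragesSketch nn = new SyncStoragesSketch();
    nn.processReport(Set.of("DS-1", "DS-2"), true);   // full report with both storages
    nn.processReport(Set.of("DS-1"), false);          // partial RPC: nothing is pruned
    System.out.println(nn.knownStorages);             // still contains DS-1 and DS-2
  }
}
{code}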

??interleaved block reports are extremely rare??

You keep saying this. But it is not rare for me. Are you convincing me not to 
believe my eyes or that you checked the logs on your thousands of clusters? I 
did check mine.

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Colin Patrick McCabe
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.01.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out sending a block report and 
> then sends the report again. While processing these two reports at the same 
> time, the NameNode can interleave processing of storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10336) TestBalancer failing intermittently because of not reseting UserGroupInformation completely

2016-04-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259683#comment-15259683
 ] 

Hadoop QA commented on HDFS-10336:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
30s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 39s 
{color} | {color:green} trunk passed with JDK v1.8.0_92 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 42s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
5s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 47s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
27s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 0s 
{color} | {color:green} trunk passed with JDK v1.8.0_92 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 49s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
26s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 41s 
{color} | {color:green} the patch passed with JDK v1.8.0_92 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 41s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 41s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 41s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
4s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 47s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
58s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 59s 
{color} | {color:green} the patch passed with JDK v1.8.0_92 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 46s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 6m 52s 
{color} | {color:green} hadoop-common in the patch passed with JDK v1.8.0_92. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 57m 55s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_92. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 20s 
{color} | {color:green} hadoop-common in the patch passed with JDK v1.7.0_95. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 53m 30s 
{color} | {color:green} hadoop-hdfs in the patch passed with JDK v1.7.0_95. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
25s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 184m 14s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_92 Failed junit