[jira] [Commented] (HDFS-10285) Storage Policy Satisfier in Namenode
[ https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261637#comment-15261637 ] Kai Zheng commented on HDFS-10285: -- Thanks Uma. bq. Could you elaborate a bit? Sure. I meant that the current distcp tool can specify, via options, whether or not to preserve the original file settings like block size when copying a file from the source to the destination folder. I suggested we might also consider adding an option to specify whether to preserve the storage policy property, if that sounds helpful in the mentioned scenario. > Storage Policy Satisfier in Namenode > > > Key: HDFS-10285 > URL: https://issues.apache.org/jira/browse/HDFS-10285 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Affects Versions: 2.7.2 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G > > Heterogeneous storage in HDFS introduced the concept of storage policy. These > policies can be set on a directory/file to specify the user preference for where > to store the physical blocks. When the user sets the storage policy before writing > data, the blocks can take advantage of the storage policy preferences and the > physical blocks are stored accordingly. > If the user sets the storage policy after writing and completing the file, then > the blocks would have been written with the default storage policy (nothing but > DISK). The user has to run the ‘Mover tool’ explicitly, specifying all such > file names as a list. In some distributed system scenarios (ex: HBase) it > would be difficult to collect all the files and run the tool, as different > nodes can write files separately and the files can have different paths. > Another scenario is, when the user renames a file from a directory with one effective > storage policy (inherited from the parent directory) to a directory with another > effective storage policy, the inherited storage policy is not copied from the > source. So the effective policy comes from the destination file/dir's parent storage > policy. This rename operation is just a metadata change in the Namenode. The > physical blocks still remain with the source storage policy. > So, tracking all such business-logic-based file names from distributed nodes (ex: > region servers) and running the Mover tool could be difficult for admins. > Here the proposal is to provide an API from the Namenode itself to trigger the > storage policy satisfaction. A daemon thread inside the Namenode should track > such calls and issue movement commands to the DNs. > Will post the detailed design thoughts document soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
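To make the proposal concrete, here is a minimal client-side sketch of how the trigger could sit next to the existing storage policy API. The method name {{satisfyStoragePolicy}} and the path are assumptions for illustration only; the real API is to be defined in the design document mentioned above.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class SatisfyPolicySketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
    Path dir = new Path("/hbase/WALs");
    // Existing API: only a metadata change, physical blocks stay where they are.
    dfs.setStoragePolicy(dir, "ALL_SSD");
    // Proposed API (method name is hypothetical): ask the Namenode daemon to
    // move the existing blocks to match the policy, instead of running Mover.
    dfs.satisfyStoragePolicy(dir);
  }
}
{code}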
[jira] [Commented] (HDFS-10285) Storage Policy Satisfier in Namenode
[ https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261581#comment-15261581 ] Uma Maheswara Rao G commented on HDFS-10285: {quote} So does this mean there would be a need to preserve and copy the inherited storage policy in the distcp tool? {quote} The current implementation does not copy the source storage policy. What do you mean by preserve here? Sorry, I did not follow this. Could you elaborate a bit? {quote} Yeah, having an API to allow applications to trigger the mover behavior sounds good. As mentioned in the proposal, there is a need in HBase on HDFS HSM. Maybe Jingcheng Du and Wei Zhou could give a detailed description about this, as I know you have the relevant work. {quote} That will be great! Thanks a lot, Kai, for your comments. > Storage Policy Satisfier in Namenode > > > Key: HDFS-10285 > URL: https://issues.apache.org/jira/browse/HDFS-10285 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Affects Versions: 2.7.2 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G > > Heterogeneous storage in HDFS introduced the concept of storage policy. These > policies can be set on a directory/file to specify the user preference for where > to store the physical blocks. When the user sets the storage policy before writing > data, the blocks can take advantage of the storage policy preferences and the > physical blocks are stored accordingly. > If the user sets the storage policy after writing and completing the file, then > the blocks would have been written with the default storage policy (nothing but > DISK). The user has to run the ‘Mover tool’ explicitly, specifying all such > file names as a list. In some distributed system scenarios (ex: HBase) it > would be difficult to collect all the files and run the tool, as different > nodes can write files separately and the files can have different paths. > Another scenario is, when the user renames a file from a directory with one effective > storage policy (inherited from the parent directory) to a directory with another > effective storage policy, the inherited storage policy is not copied from the > source. So the effective policy comes from the destination file/dir's parent storage > policy. This rename operation is just a metadata change in the Namenode. The > physical blocks still remain with the source storage policy. > So, tracking all such business-logic-based file names from distributed nodes (ex: > region servers) and running the Mover tool could be difficult for admins. > Here the proposal is to provide an API from the Namenode itself to trigger the > storage policy satisfaction. A daemon thread inside the Namenode should track > such calls and issue movement commands to the DNs. > Will post the detailed design thoughts document soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10337) OfflineEditsViewer stats option should print 0 instead of null for the count of operations
[ https://issues.apache.org/jira/browse/HDFS-10337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated HDFS-10337: - Attachment: HDFS-10337.002.patch Thanks for the quick review. Updated the patch to address the comment. > OfflineEditsViewer stats option should print 0 instead of null for the count > of operations > -- > > Key: HDFS-10337 > URL: https://issues.apache.org/jira/browse/HDFS-10337 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Akira AJISAKA >Assignee: Lin Yiqun >Priority: Minor > Labels: newbie > Attachments: HDFS-10337.001.patch, HDFS-10337.002.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8449) Add tasks count metrics to datanode for ECWorker
[ https://issues.apache.org/jira/browse/HDFS-8449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261472#comment-15261472 ] Li Bo commented on HDFS-8449: - The two failed unit tests are unrelated to this patch and will be resolved in HDFS-10334. The checkstyle problems can also be ignored. Hi [~drankye], could you help me review the patch again? Thanks. > Add tasks count metrics to datanode for ECWorker > > > Key: HDFS-8449 > URL: https://issues.apache.org/jira/browse/HDFS-8449 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Li Bo >Assignee: Li Bo > Attachments: HDFS-8449-000.patch, HDFS-8449-001.patch, > HDFS-8449-002.patch, HDFS-8449-003.patch, HDFS-8449-004.patch, > HDFS-8449-005.patch, HDFS-8449-006.patch > > > This sub-task tries to record the EC recovery tasks that a datanode has done, > including total tasks, failed tasks and successful tasks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
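As a rough illustration of what such per-task counters usually look like on the datanode side, here is a metrics2-based sketch. The counter names are assumptions, not the names used in the attached patches, and the instance would still need to be registered with the metrics system.
{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;

@Metrics(name = "ECWorkerMetrics", context = "dfs")
public class ECWorkerMetricsSketch {
  // Counter names are illustrative only; register the instance with
  // DefaultMetricsSystem for the values to be published.
  @Metric("Total EC reconstruction tasks received")
  MutableCounterLong ecReconstructionTasks;
  @Metric("EC reconstruction tasks that failed")
  MutableCounterLong ecFailedReconstructionTasks;

  void incrTotal() { ecReconstructionTasks.incr(); }
  void incrFailed() { ecFailedReconstructionTasks.incr(); }
}
{code}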
[jira] [Commented] (HDFS-9276) Failed to Update HDFS Delegation Token for long running application in HA mode
[ https://issues.apache.org/jira/browse/HDFS-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261467#comment-15261467 ] Yi Liu commented on HDFS-9276: -- More than one person has told me they hit this issue on a real cluster and asked me to help push the fix. From my point of view, the approach in this patch is generally OK, though it may still need some refinement. [~daryn], [~ste...@apache.org], [~cnauroth], could you help to check too? > Failed to Update HDFS Delegation Token for long running application in HA mode > -- > > Key: HDFS-9276 > URL: https://issues.apache.org/jira/browse/HDFS-9276 > Project: Hadoop HDFS > Issue Type: Bug > Components: fs, ha, security >Affects Versions: 2.7.1 >Reporter: Liangliang Gu >Assignee: Liangliang Gu > Attachments: HDFS-9276.01.patch, HDFS-9276.02.patch, > HDFS-9276.03.patch, HDFS-9276.04.patch, HDFS-9276.05.patch, > HDFS-9276.06.patch, HDFS-9276.07.patch, HDFS-9276.08.patch, > HDFS-9276.09.patch, HDFS-9276.10.patch, HDFS-9276.11.patch, > HDFS-9276.12.patch, HDFS-9276.13.patch, debug1.PNG, debug2.PNG > > > The scenario is as follows: > 1. NameNode HA is enabled. > 2. Kerberos is enabled. > 3. HDFS Delegation Token (not Keytab or TGT) is used to communicate with > NameNode. > 4. We want to update the HDFS Delegation Token for long running applications. > The HDFS Client will generate private tokens for each NameNode. When we update > the HDFS Delegation Token, these private tokens will not be updated, which > will cause the token to expire. > This bug can be reproduced by the following program: > {code} > import java.security.PrivilegedExceptionAction > import org.apache.hadoop.conf.Configuration > import org.apache.hadoop.fs.{FileSystem, Path} > import org.apache.hadoop.security.UserGroupInformation > object HadoopKerberosTest { > def main(args: Array[String]): Unit = { > val keytab = "/path/to/keytab/xxx.keytab" > val principal = "x...@abc.com" > val creds1 = new org.apache.hadoop.security.Credentials() > val ugi1 = > UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab) > ugi1.doAs(new PrivilegedExceptionAction[Void] { > // Get a copy of the credentials > override def run(): Void = { > val fs = FileSystem.get(new Configuration()) > fs.addDelegationTokens("test", creds1) > null > } > }) > val ugi = UserGroupInformation.createRemoteUser("test") > ugi.addCredentials(creds1) > ugi.doAs(new PrivilegedExceptionAction[Void] { > // Get a copy of the credentials > override def run(): Void = { > var i = 0 > while (true) { > val creds1 = new org.apache.hadoop.security.Credentials() > val ugi1 = > UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab) > ugi1.doAs(new PrivilegedExceptionAction[Void] { > // Get a copy of the credentials > override def run(): Void = { > val fs = FileSystem.get(new Configuration()) > fs.addDelegationTokens("test", creds1) > null > } > }) > UserGroupInformation.getCurrentUser.addCredentials(creds1) > val fs = FileSystem.get( new Configuration()) > i += 1 > println() > println(i) > println(fs.listFiles(new Path("/user"), false)) > Thread.sleep(60 * 1000) > } > null > } > }) > } > } > {code} > To reproduce the bug, please set the following configuration on the NameNode: > {code} > dfs.namenode.delegation.token.max-lifetime = 10min > dfs.namenode.delegation.key.update-interval = 3min > dfs.namenode.delegation.token.renew-interval = 3min > {code} > The bug will occur after 3 minutes. 
> The stacktrace is: > {code} > Exception in thread "main" > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): > token (HDFS_DELEGATION_TOKEN token 330156 for test) is expired > at org.apache.hadoop.ipc.Client.call(Client.java:1347) > at org.apache.hadoop.ipc.Client.call(Client.java:1300) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) > at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) >
[jira] [Assigned] (HDFS-10338) DistCp masks potential CRC check failures
[ https://issues.apache.org/jira/browse/HDFS-10338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun reassigned HDFS-10338: Assignee: Lin Yiqun > DistCp masks potential CRC check failures > - > > Key: HDFS-10338 > URL: https://issues.apache.org/jira/browse/HDFS-10338 > Project: Hadoop HDFS > Issue Type: Bug > Components: distcp >Affects Versions: 2.7.1 >Reporter: Elliot West >Assignee: Lin Yiqun > > There appear to be edge cases whereby CRC checks may be circumvented when > requests for checksums from the source or target file system fail. In this > event CRCs could differ between the source and target and yet the DistCp copy > would succeed, even when the 'skip CRC check' option is not being used. > The code in question is contained in the method > [{{org.apache.hadoop.tools.util.DistCpUtils#checksumsAreEqual(...)}}|https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/DistCpUtils.java#L457] > Specifically this code block suggests that if there is a failure when trying > to read the source or target checksum then the method will return {{true}} > (i.e. the checksums are equal), implying that the check succeeded. In actual > fact we just failed to obtain the checksum and could not perform the check. > {code} > try { > sourceChecksum = sourceChecksum != null ? sourceChecksum : > sourceFS.getFileChecksum(source); > targetChecksum = targetFS.getFileChecksum(target); > } catch (IOException e) { > LOG.error("Unable to retrieve checksum for " + source + " or " > + target, e); > } > return (sourceChecksum == null || targetChecksum == null || > sourceChecksum.equals(targetChecksum)); > {code} > I believe that at the very least the caught {{IOException}} should be > re-thrown. If this is not deemed desirable then I believe an option > ({{--strictCrc}}?) should be added to enforce a strict check where we require > that both the source and target CRCs are retrieved, are not null, and are > then compared for equality. If for any reason either of the CRCs retrievals > fail then an exception is thrown. > Clearly some {{FileSystems}} do not support CRCs and invocations to > {{FileSystem.getFileChecksum(...)}} return {{null}} in these instances. I > would suggest that these should fail a strict CRC check to prevent users > developing a false sense of security in their copy pipeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10338) DistCp masks potential CRC check failures
[ https://issues.apache.org/jira/browse/HDFS-10338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261456#comment-15261456 ] Lin Yiqun commented on HDFS-10338: -- Hi, [~teabot], I have two comments for this: * It looks like the option {{ignoreFailures}} that [~liuml07] suggested would be better. In one sense, the {{strictCrc}} option has the same meaning as {{skipcrccheck}}, since both are about the crc check. However, once we do a strict crc check there will be more failures in the checksum comparison, so the new option {{ignoreFailures}} seems reasonable. * I agree with you that {{FileSystems}} which do not support CRCs should be treated as a failed case. Assigning this work to myself. If there are no other comments, I will post a patch later addressing the comments mentioned above. > DistCp masks potential CRC check failures > - > > Key: HDFS-10338 > URL: https://issues.apache.org/jira/browse/HDFS-10338 > Project: Hadoop HDFS > Issue Type: Bug > Components: distcp >Affects Versions: 2.7.1 >Reporter: Elliot West > > There appear to be edge cases whereby CRC checks may be circumvented when > requests for checksums from the source or target file system fail. In this > event CRCs could differ between the source and target and yet the DistCp copy > would succeed, even when the 'skip CRC check' option is not being used. > The code in question is contained in the method > [{{org.apache.hadoop.tools.util.DistCpUtils#checksumsAreEqual(...)}}|https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/DistCpUtils.java#L457] > Specifically this code block suggests that if there is a failure when trying > to read the source or target checksum then the method will return {{true}} > (i.e. the checksums are equal), implying that the check succeeded. In actual > fact we just failed to obtain the checksum and could not perform the check. > {code} > try { > sourceChecksum = sourceChecksum != null ? sourceChecksum : > sourceFS.getFileChecksum(source); > targetChecksum = targetFS.getFileChecksum(target); > } catch (IOException e) { > LOG.error("Unable to retrieve checksum for " + source + " or " > + target, e); > } > return (sourceChecksum == null || targetChecksum == null || > sourceChecksum.equals(targetChecksum)); > {code} > I believe that at the very least the caught {{IOException}} should be > re-thrown. If this is not deemed desirable then I believe an option > ({{--strictCrc}}?) should be added to enforce a strict check where we require > that both the source and target CRCs are retrieved, are not null, and are > then compared for equality. If for any reason either of the CRCs retrievals > fail then an exception is thrown. > Clearly some {{FileSystems}} do not support CRCs and invocations to > {{FileSystem.getFileChecksum(...)}} return {{null}} in these instances. I > would suggest that these should fail a strict CRC check to prevent users > developing a false sense of security in their copy pipeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9869) Erasure Coding: Rename replication-based names in BlockManager to more generic [part-2]
[ https://issues.apache.org/jira/browse/HDFS-9869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261451#comment-15261451 ] Rakesh R commented on HDFS-9869: Thanks a lot [~zhz], [~andrew.wang] for the useful discussions and help in resolving this jira. > Erasure Coding: Rename replication-based names in BlockManager to more > generic [part-2] > --- > > Key: HDFS-9869 > URL: https://issues.apache.org/jira/browse/HDFS-9869 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding >Reporter: Rakesh R >Assignee: Rakesh R > Labels: hdfs-ec-3.0-must-do > Fix For: 3.0.0 > > Attachments: HDFS-9869-001.patch, HDFS-9869-002.patch, > HDFS-9869-003.patch, HDFS-9869-004.patch, HDFS-9869-005.patch, > HDFS-9869-006.patch, HDFS-9869-007.patch > > > The idea of this jira is to rename the following entities in BlockManager as, > - {{PendingReplicationBlocks}} to {{PendingReconstructionBlocks}} > - {{excessReplicateMap}} to {{extraRedundancyMap}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10337) OfflineEditsViewer stats option should print 0 instead of null for the count of operations
[ https://issues.apache.org/jira/browse/HDFS-10337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261442#comment-15261442 ] Akira AJISAKA commented on HDFS-10337: -- Thank you for your patch. Yes, it fixes this issue. In addition, would you use StringBuilder instead of StringBuffer? I'm +1 if that is addressed. > OfflineEditsViewer stats option should print 0 instead of null for the count > of operations > -- > > Key: HDFS-10337 > URL: https://issues.apache.org/jira/browse/HDFS-10337 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Akira AJISAKA >Assignee: Lin Yiqun >Priority: Minor > Labels: newbie > Attachments: HDFS-10337.001.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
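For illustration, the requested behavior boils down to something like the following sketch. The map name is an assumption and this is not the attached patch, but it shows counts defaulting to 0 and a {{StringBuilder}} as suggested above.
{code}
import java.util.Map;
import org.apache.hadoop.hdfs.server.namenode.FSEditLogOpCodes;

final class StatsOutputSketch {
  static String format(Map<FSEditLogOpCodes, Long> opCodeCount) {
    StringBuilder sb = new StringBuilder();
    for (FSEditLogOpCodes opCode : FSEditLogOpCodes.values()) {
      Long count = opCodeCount.get(opCode);   // null if the op never occurred
      sb.append(String.format("    %-30.30s (%3d): %d%n",
          opCode, opCode.getOpCode(), count == null ? 0L : count));
    }
    return sb.toString();
  }
}
{code}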
[jira] [Commented] (HDFS-10337) OfflineEditsViewer stats option should print 0 instead of null for the count of operations
[ https://issues.apache.org/jira/browse/HDFS-10337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261402#comment-15261402 ] Hadoop QA commented on HDFS-10337: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 5s {color} | {color:red} Docker failed to build yetus/hadoop:7b1c37a. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12801167/HDFS-10337.001.patch | | JIRA Issue | HDFS-10337 | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/15312/console | | Powered by | Apache Yetus 0.2.0 http://yetus.apache.org | This message was automatically generated. > OfflineEditsViewer stats option should print 0 instead of null for the count > of operations > -- > > Key: HDFS-10337 > URL: https://issues.apache.org/jira/browse/HDFS-10337 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Akira AJISAKA >Assignee: Lin Yiqun >Priority: Minor > Labels: newbie > Attachments: HDFS-10337.001.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10337) OfflineEditsViewer stats option should print 0 instead of null for the count of operations
[ https://issues.apache.org/jira/browse/HDFS-10337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261401#comment-15261401 ] Lin Yiqun commented on HDFS-10337: -- Hi [~ajisakaa], I attached a patch for this; does it resolve your issue? > OfflineEditsViewer stats option should print 0 instead of null for the count > of operations > -- > > Key: HDFS-10337 > URL: https://issues.apache.org/jira/browse/HDFS-10337 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Akira AJISAKA >Assignee: Lin Yiqun >Priority: Minor > Labels: newbie > Attachments: HDFS-10337.001.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10337) OfflineEditsViewer stats option should print 0 instead of null for the count of operations
[ https://issues.apache.org/jira/browse/HDFS-10337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated HDFS-10337: - Attachment: HDFS-10337.001.patch > OfflineEditsViewer stats option should print 0 instead of null for the count > of operations > -- > > Key: HDFS-10337 > URL: https://issues.apache.org/jira/browse/HDFS-10337 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Akira AJISAKA >Assignee: Lin Yiqun >Priority: Minor > Labels: newbie > Attachments: HDFS-10337.001.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10337) OfflineEditsViewer stats option should print 0 instead of null for the count of operations
[ https://issues.apache.org/jira/browse/HDFS-10337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated HDFS-10337: - Status: Patch Available (was: Open) > OfflineEditsViewer stats option should print 0 instead of null for the count > of operations > -- > > Key: HDFS-10337 > URL: https://issues.apache.org/jira/browse/HDFS-10337 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Akira AJISAKA >Assignee: Lin Yiqun >Priority: Minor > Labels: newbie > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-10337) OfflineEditsViewer stats option should print 0 instead of null for the count of operations
[ https://issues.apache.org/jira/browse/HDFS-10337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun reassigned HDFS-10337: Assignee: Lin Yiqun > OfflineEditsViewer stats option should print 0 instead of null for the count > of operations > -- > > Key: HDFS-10337 > URL: https://issues.apache.org/jira/browse/HDFS-10337 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Akira AJISAKA >Assignee: Lin Yiqun >Priority: Minor > Labels: newbie > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9958) BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed storages.
[ https://issues.apache.org/jira/browse/HDFS-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261388#comment-15261388 ] Walter Su commented on HDFS-9958: - bq. I think only DFSClient currently reports storageID. No, it doesn't. {code} //DFSInputStream.java protected void reportCheckSumFailure(CorruptedBlocks corruptedBlocks, int dataNodeCount, boolean isStriped) { ... reportList.add(new LocatedBlock(blk, locs)); } } ... dfsClient.reportChecksumFailure(src, reportList.toArray(new LocatedBlock[reportList.size()])); {code} {{locs}} is actually {{DatanodeInfoWithStorage}}, so it has the storageIDs. But the {{LocatedBlock}} constructor is wrong. {code} public LocatedBlock(ExtendedBlock b, DatanodeInfo[] locs) { // By default, startOffset is unknown(-1) and corrupt is false. this(b, locs, null, null, -1, false, EMPTY_LOCS); } ... ... public LocatedBlock(ExtendedBlock b, DatanodeInfo[] locs, String[] storageIDs, StorageType[] storageTypes, long startOffset, boolean corrupt, DatanodeInfo[] cachedLocs) { ... DatanodeInfoWithStorage storage = new DatanodeInfoWithStorage(di, storageIDs != null ? storageIDs[i] : null, storageTypes != null ? storageTypes[i] : null); this.locs[i] = storage; {code} It loses the storageIDs. > BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed > storages. > > > Key: HDFS-9958 > URL: https://issues.apache.org/jira/browse/HDFS-9958 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Attachments: HDFS-9958-Test-v1.txt, HDFS-9958.001.patch, > HDFS-9958.002.patch, HDFS-9958.003.patch, HDFS-9958.004.patch, > HDFS-9958.005.patch > > > In a scenario where the corrupt replica is on a failed storage, before it is > taken out of blocksMap, there is a race which causes the creation of > LocatedBlock on a {{machines}} array element that is not populated. > Following is the root cause, > {code} > final int numCorruptNodes = countNodes(blk).corruptReplicas(); > {code} > countNodes only looks at nodes with storage state as NORMAL, which in the > case where the corrupt replica is on a failed storage will amount to > numCorruptNodes being zero. > {code} > final int numNodes = blocksMap.numNodes(blk); > {code} > However, numNodes will count all nodes/storages irrespective of the state of > the storage. Therefore numMachines will include such (failed) nodes. The > assert would fail only if the system is enabled to catch Assertion errors, > otherwise it goes ahead and tries to create a LocatedBlock object for an entry that is > not put in the {{machines}} array. 
> Here is the stack trace: > {code} > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:45) > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:40) > at > org.apache.hadoop.hdfs.protocol.LocatedBlock.(LocatedBlock.java:84) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:878) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:826) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlockList(BlockManager.java:799) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlocks(BlockManager.java:899) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1849) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:588) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.S
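Building on the snippets quoted above, a minimal sketch of how the report could carry the storage IDs through the richer {{LocatedBlock}} constructor instead of the two-argument one that drops them; this is only an illustration against the quoted code, not the attached patch.
{code}
// Illustration only, not the attached patch: keep the storage IDs when
// building the corrupt-block report, instead of using the 2-arg constructor
// that silently drops them.
String[] storageIDs = new String[locs.length];
StorageType[] storageTypes = new StorageType[locs.length];
for (int i = 0; i < locs.length; i++) {
  DatanodeInfoWithStorage loc = (DatanodeInfoWithStorage) locs[i];
  storageIDs[i] = loc.getStorageID();
  storageTypes[i] = loc.getStorageType();
}
reportList.add(new LocatedBlock(blk, locs, storageIDs, storageTypes,
    -1 /* startOffset unknown */, false /* not corrupt */,
    new DatanodeInfo[0] /* no cached locations */));
{code}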
[jira] [Updated] (HDFS-7877) Support maintenance state for datanodes
[ https://issues.apache.org/jira/browse/HDFS-7877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7877: -- Component/s: namenode datanode > Support maintenance state for datanodes > --- > > Key: HDFS-7877 > URL: https://issues.apache.org/jira/browse/HDFS-7877 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Ming Ma > Attachments: HDFS-7877-2.patch, HDFS-7877.patch, > Supportmaintenancestatefordatanodes-2.pdf, > Supportmaintenancestatefordatanodes.pdf > > > This requirement came up during the design for HDFS-7541. Given this feature > is mostly independent of upgrade domain feature, it is better to track it > under a separate jira. The design and draft patch will be available soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9389) Add maintenance states to AdminStates
[ https://issues.apache.org/jira/browse/HDFS-9389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261264#comment-15261264 ] Hadoop QA commented on HDFS-9389: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 3s {color} | {color:red} Docker failed to build yetus/hadoop:7b1c37a. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12801153/HDFS-9389.patch | | JIRA Issue | HDFS-9389 | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/15311/console | | Powered by | Apache Yetus 0.2.0 http://yetus.apache.org | This message was automatically generated. > Add maintenance states to AdminStates > - > > Key: HDFS-9389 > URL: https://issues.apache.org/jira/browse/HDFS-9389 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ming Ma >Assignee: Ming Ma > Attachments: HDFS-9389.patch > > > This jira will add {{ENTERING_MAINTENANCE}} and {{IN_MAINTENANCE}} to > {{AdminStates}} and protobuf. It will also provide the basic functionality to > transition DN into or out of maintenance state from DN's state machine's > point of view. The actual admins support and block management will be covered > by separate jiras. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10324) Trash directory in an encryption zone should be pre-created with sticky bit
[ https://issues.apache.org/jira/browse/HDFS-10324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261263#comment-15261263 ] Andrew Wang commented on HDFS-10324: Sounds good to me [~xyao]. Let's also do the provisionTrash flag as an EnumSet to future-proof. > Trash directory in an encryption zone should be pre-created with sticky bit > --- > > Key: HDFS-10324 > URL: https://issues.apache.org/jira/browse/HDFS-10324 > Project: Hadoop HDFS > Issue Type: Bug > Components: encryption >Affects Versions: 2.8.0 > Environment: CDH5.7.0 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > Attachments: HDFS-10324.001.patch, HDFS-10324.002.patch, > HDFS-10324.003.patch > > > We encountered a bug in HDFS-8831: > After HDFS-8831, a deleted file in an encryption zone is moved to a .Trash > subdirectory within the encryption zone. > However, if this .Trash subdirectory is not created beforehand, it will be > created and owned by the first user who deleted a file, with permission > drwx------. This creates a serious bug because any other non-privileged user > will not be able to delete any files within the encryption zone, because they > do not have the permission to move directories to the trash directory. > We should fix this bug by pre-creating the .Trash directory with the sticky bit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
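For reference, "pre-created with sticky bit" amounts to roughly the following sketch using the public FileSystem API; the path layout follows the per-zone {{.Trash}} convention from HDFS-8831 and the helper name is made up for illustration, not taken from the attached patches.
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

final class ProvisionTrashSketch {
  // Hypothetical helper: create <zoneRoot>/.Trash up front, world-writable
  // with the sticky bit (like /tmp), so each user can create its own
  // sub-directory but cannot remove other users' entries.
  static void provisionTrash(FileSystem fs, Path zoneRoot) throws IOException {
    Path trash = new Path(zoneRoot, ".Trash");
    fs.mkdirs(trash);
    fs.setPermission(trash, new FsPermission((short) 01777));
  }
}
{code}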
[jira] [Updated] (HDFS-10297) Increase default balance bandwidth and concurrent moves
[ https://issues.apache.org/jira/browse/HDFS-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-10297: --- Resolution: Fixed Fix Version/s: (was: 2.9.0) 2.8.0 Status: Resolved (was: Patch Available) Pushed, thanks John! > Increase default balance bandwidth and concurrent moves > --- > > Key: HDFS-10297 > URL: https://issues.apache.org/jira/browse/HDFS-10297 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer & mover >Affects Versions: 2.6.0 >Reporter: John Zhuge >Assignee: John Zhuge >Priority: Minor > Labels: supportability > Fix For: 2.8.0 > > Attachments: HDFS-10297.001.patch, HDFS-10297.002.patch, > HDFS-10297.003.branch-2.8.patch, HDFS-10297.003.patch > > > Adjust the default values to better support the current level of customer > host and network configurations. > Increase the default for property {{dfs.datanode.balance.bandwidthPerSec}} > from 1 to 10 MB. Apply to DN. 10 MB/s is about 10% of the GbE network. > Increase the default for property > {{dfs.datanode.balance.max.concurrent.moves}} from 5 to 50. Apply to DN and > Balancer. The default number of DN receiver threads is 4096. The default > number of balancer mover threads is 1000. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9389) Add maintenance states to AdminStates
[ https://issues.apache.org/jira/browse/HDFS-9389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-9389: -- Assignee: Ming Ma Status: Patch Available (was: Open) > Add maintenance states to AdminStates > - > > Key: HDFS-9389 > URL: https://issues.apache.org/jira/browse/HDFS-9389 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ming Ma >Assignee: Ming Ma > Attachments: HDFS-9389.patch > > > This jira will add {{ENTERING_MAINTENANCE}} and {{IN_MAINTENANCE}} to > {{AdminStates}} and protobuf. It will also provide the basic functionality to > transition DN into or out of maintenance state from DN's state machine's > point of view. The actual admins support and block management will be covered > by separate jiras. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9389) Add maintenance states to AdminStates
[ https://issues.apache.org/jira/browse/HDFS-9389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-9389: -- Attachment: HDFS-9389.patch Here is the draft patch. The rest of maintenance functionalities are covered by other jiras. > Add maintenance states to AdminStates > - > > Key: HDFS-9389 > URL: https://issues.apache.org/jira/browse/HDFS-9389 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ming Ma > Attachments: HDFS-9389.patch > > > This jira will add {{ENTERING_MAINTENANCE}} and {{IN_MAINTENANCE}} to > {{AdminStates}} and protobuf. It will also provide the basic functionality to > transition DN into or out of maintenance state from DN's state machine's > point of view. The actual admins support and block management will be covered > by separate jiras. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
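From the summary, the shape of the change is roughly the enum extension below; the display strings are guesses for illustration, the actual values are in the attached patch.
{code}
// Sketch of the enum extension described above; display strings are guesses.
public enum AdminStates {
  NORMAL("In Service"),
  DECOMMISSION_INPROGRESS("Decommission In Progress"),
  DECOMMISSIONED("Decommissioned"),
  ENTERING_MAINTENANCE("Entering Maintenance"),   // new
  IN_MAINTENANCE("In Maintenance");               // new

  private final String value;

  AdminStates(String value) { this.value = value; }

  @Override
  public String toString() { return value; }
}
{code}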
[jira] [Comment Edited] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client
[ https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261052#comment-15261052 ] Lei (Eddy) Xu edited comment on HDFS-3702 at 4/27/16 11:45 PM: --- Committed to trunk and branch-2. Thanks a lot for the detailed suggestions and kind reviews from [~andrew.wang], [~nkeywal], [~stack], [~cmccabe], [~arpitagarwal] and [~szetszwo]! was (Author: eddyxu): Committed to trunk and branch-2. Thanks a lot for the detailed suggestions and kind reviews from [~andrew.wang], [~nkeywal], [~stack], [~arpitagarwal] and [~szetszwo]! > Add an option for NOT writing the blocks locally if there is a datanode on > the same box as the client > - > > Key: HDFS-3702 > URL: https://issues.apache.org/jira/browse/HDFS-3702 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 2.5.1 >Reporter: Nicolas Liochon >Assignee: Lei (Eddy) Xu >Priority: Minor > Labels: BB2015-05-TBR > Fix For: 3.0.0, 2.9.0 > > Attachments: HDFS-3702.000.patch, HDFS-3702.001.patch, > HDFS-3702.002.patch, HDFS-3702.003.patch, HDFS-3702.004.patch, > HDFS-3702.005.patch, HDFS-3702.006.patch, HDFS-3702.007.patch, > HDFS-3702.008.patch, HDFS-3702.009.patch, HDFS-3702.010.patch, > HDFS-3702.011.patch, HDFS-3702.012.patch, HDFS-3702_Design.pdf > > > This is useful for Write-Ahead-Logs: these files are written for recovery > only, and are not read when there are no failures. > Taking HBase as an example, these files will be read only if the process that > wrote them (the 'HBase regionserver') dies. This will likely come from a > hardware failure, hence the corresponding datanode will be dead as well. So > we're writing 3 replicas, but in reality only 2 of them are really useful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client
[ https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261153#comment-15261153 ] Hudson commented on HDFS-3702: -- FAILURE: Integrated in Hadoop-trunk-Commit #9686 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9686/]) HDFS-3702. Fix missing imports from HDFS-3702 trunk patch. (lei: rev 8bd0bca0b1ea524132f564b3b8332506421f64b9) * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolTranslatorPB.java > Add an option for NOT writing the blocks locally if there is a datanode on > the same box as the client > - > > Key: HDFS-3702 > URL: https://issues.apache.org/jira/browse/HDFS-3702 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 2.5.1 >Reporter: Nicolas Liochon >Assignee: Lei (Eddy) Xu >Priority: Minor > Labels: BB2015-05-TBR > Fix For: 3.0.0, 2.9.0 > > Attachments: HDFS-3702.000.patch, HDFS-3702.001.patch, > HDFS-3702.002.patch, HDFS-3702.003.patch, HDFS-3702.004.patch, > HDFS-3702.005.patch, HDFS-3702.006.patch, HDFS-3702.007.patch, > HDFS-3702.008.patch, HDFS-3702.009.patch, HDFS-3702.010.patch, > HDFS-3702.011.patch, HDFS-3702.012.patch, HDFS-3702_Design.pdf > > > This is useful for Write-Ahead-Logs: these files are written for recovery > only, and are not read when there are no failures. > Taking HBase as an example, these files will be read only if the process that > wrote them (the 'HBase regionserver') dies. This will likely come from a > hardware failure, hence the corresponding datanode will be dead as well. So > we're writing 3 replicas, but in reality only 2 of them are really useful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10338) DistCp masks potential CRC check failures
[ https://issues.apache.org/jira/browse/HDFS-10338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261127#comment-15261127 ] Mingliang Liu commented on HDFS-10338: -- I'm in favor of propagating the IOException thrown by {{getFileChecksum()}}. The retriable command will take care of it, and after all retry attempts the copy mapper will handle failures accordingly. Moreover, I believe this is orthogonal to the {{ignoreFailures}} option. Another point to mention is that {{checksumsAreEqual()}} has a conflicting/confusing javadoc. It claims: {code} * @return If either checksum couldn't be retrieved, the function returns * false. If checksums are retrieved, the function returns true if they match, * and false otherwise. * @throws IOException if there's an exception while retrieving checksums. {code} While it has a {{throws IOException}} signature, it does not really throw any exception. > DistCp masks potential CRC check failures > - > > Key: HDFS-10338 > URL: https://issues.apache.org/jira/browse/HDFS-10338 > Project: Hadoop HDFS > Issue Type: Bug > Components: distcp >Affects Versions: 2.7.1 >Reporter: Elliot West > > There appear to be edge cases whereby CRC checks may be circumvented when > requests for checksums from the source or target file system fail. In this > event CRCs could differ between the source and target and yet the DistCp copy > would succeed, even when the 'skip CRC check' option is not being used. > The code in question is contained in the method > [{{org.apache.hadoop.tools.util.DistCpUtils#checksumsAreEqual(...)}}|https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/DistCpUtils.java#L457] > Specifically this code block suggests that if there is a failure when trying > to read the source or target checksum then the method will return {{true}} > (i.e. the checksums are equal), implying that the check succeeded. In actual > fact we just failed to obtain the checksum and could not perform the check. > {code} > try { > sourceChecksum = sourceChecksum != null ? sourceChecksum : > sourceFS.getFileChecksum(source); > targetChecksum = targetFS.getFileChecksum(target); > } catch (IOException e) { > LOG.error("Unable to retrieve checksum for " + source + " or " > + target, e); > } > return (sourceChecksum == null || targetChecksum == null || > sourceChecksum.equals(targetChecksum)); > {code} > I believe that at the very least the caught {{IOException}} should be > re-thrown. If this is not deemed desirable then I believe an option > ({{--strictCrc}}?) should be added to enforce a strict check where we require > that both the source and target CRCs are retrieved, are not null, and are > then compared for equality. If for any reason either of the CRCs retrievals > fail then an exception is thrown. > Clearly some {{FileSystems}} do not support CRCs and invocations to > {{FileSystem.getFileChecksum(...)}} return {{null}} in these instances. I > would suggest that these should fail a strict CRC check to prevent users > developing a false sense of security in their copy pipeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
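A sketch of the propagation idea, applied to the snippet quoted in the description; this only illustrates the rethrow, it is not the eventual fix.
{code}
// Illustration of the rethrow only, based on the snippet in the description.
try {
  sourceChecksum = sourceChecksum != null ? sourceChecksum
      : sourceFS.getFileChecksum(source);
  targetChecksum = targetFS.getFileChecksum(target);
} catch (IOException e) {
  LOG.error("Unable to retrieve checksum for " + source + " or " + target, e);
  // Let the retriable copy command / copy mapper decide how to handle it.
  throw new IOException("Could not determine checksum equality", e);
}
return (sourceChecksum == null || targetChecksum == null
    || sourceChecksum.equals(targetChecksum));
{code}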
[jira] [Commented] (HDFS-10220) Namenode failover due to too long locking in LeaseManager.Monitor
[ https://issues.apache.org/jira/browse/HDFS-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261064#comment-15261064 ] Nicolas Fraison commented on HDFS-10220: Thanks [~walter.k.su] for the catch. We can update the break as follows to avoid this issue: {code} } finally { if (isMaxLockHoldToReleaseLease(start)) { LOG.debug("Breaking out of checkLeases() after " + maxLockHoldToReleaseLease + "ms."); if (leaseToCheck.hasFiles()) { renewLease(leaseToCheck); } break; } } {code} > Namenode failover due to too long locking in LeaseManager.Monitor > > > Key: HDFS-10220 > URL: https://issues.apache.org/jira/browse/HDFS-10220 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Nicolas Fraison >Assignee: Nicolas Fraison >Priority: Minor > Attachments: HADOOP-10220.001.patch, HADOOP-10220.002.patch, > HADOOP-10220.003.patch, HADOOP-10220.004.patch, HADOOP-10220.005.patch, > threaddump_zkfc.txt > > > I have faced a namenode failover due to an unresponsive namenode detected by the > zkfc, with lots of WARN messages (5 million) like this one: > _org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All > existing blocks are COMPLETE, lease removed, file closed._ > On the threaddump taken by the zkfc there are lots of threads blocked due to a > lock. > Looking at the code, there is a lock taken by the LeaseManager.Monitor when > some leases must be released. Due to the really big number of leases to be > released, the namenode took too long to release them, blocking all > other tasks and making the zkfc think that the namenode was not > available/stuck. > The idea of this patch is to limit the number of leases released each time we > check for leases, so the lock won't be held for too long a time period. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client
[ https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261063#comment-15261063 ] Hudson commented on HDFS-3702: -- FAILURE: Integrated in Hadoop-trunk-Commit #9685 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9685/]) HDFS-3702. Add an option for NOT writing the blocks locally if there is (lei: rev 0a152103f19a3e8e1b7f33aeb9dd115ba231d7b7) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirWriteFileOp.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicy.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeleteRace.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/AddBlockFlag.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSOutputStream.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestBlockPlacementPolicyRackFaultTolerant.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolTranslatorPB.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileCreation.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/StripedDataStreamer.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/ReplicationWork.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestAvailableSpaceBlockPlacementPolicy.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestOpenFilesWithSnapshot.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestHASafeMode.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicy.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/ClientNamenodeProtocol.proto * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSClientRetries.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeadDatanode.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDefaultBlockPlacementPolicy.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolServerSideTranslatorPB.java * 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestSnapshotBlocksMap.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockStoragePolicy.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestUpgradeDomainBlockPlacementPolicy.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestAddBlockRetry.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CreateFlag.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/ErasureCodingWork.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BaseReplicationPolicyTest.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestAddStripedBlocks.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyWithUpgradeDomain.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyWithNodeGroup.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSStripe
[jira] [Updated] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client
[ https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-3702: Resolution: Fixed Fix Version/s: 2.9.0 3.0.0 Status: Resolved (was: Patch Available) Committed to trunk and branch-2. Thanks a lot for the detailed suggestions and kind reviews from [~andrew.wang], [~nkeywal], [~stack], [~arpitagarwal] and [~szetszwo]! > Add an option for NOT writing the blocks locally if there is a datanode on > the same box as the client > - > > Key: HDFS-3702 > URL: https://issues.apache.org/jira/browse/HDFS-3702 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 2.5.1 >Reporter: Nicolas Liochon >Assignee: Lei (Eddy) Xu >Priority: Minor > Labels: BB2015-05-TBR > Fix For: 3.0.0, 2.9.0 > > Attachments: HDFS-3702.000.patch, HDFS-3702.001.patch, > HDFS-3702.002.patch, HDFS-3702.003.patch, HDFS-3702.004.patch, > HDFS-3702.005.patch, HDFS-3702.006.patch, HDFS-3702.007.patch, > HDFS-3702.008.patch, HDFS-3702.009.patch, HDFS-3702.010.patch, > HDFS-3702.011.patch, HDFS-3702.012.patch, HDFS-3702_Design.pdf > > > This is useful for Write-Ahead-Logs: these files are written for recovery > only, and are not read when there are no failures. > Taking HBase as an example, these files will be read only if the process that > wrote them (the 'HBase regionserver') dies. This will likely come from a > hardware failure, hence the corresponding datanode will be dead as well. So > we're writing 3 replicas, but in reality only 2 of them are really useful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-10339) libhdfs++: Expose async operations through the C API
James Clampffer created HDFS-10339: -- Summary: libhdfs++: Expose async operations through the C API Key: HDFS-10339 URL: https://issues.apache.org/jira/browse/HDFS-10339 Project: Hadoop HDFS Issue Type: Sub-task Reporter: James Clampffer Assignee: James Clampffer I propose an API that looks like the following for doing async operations in C. (might be some typos, going off memory of what I tried, will clean up) {code} typedef struct { int status; ssize_t count; /* ... whatever else ... */ } async_context; typedef void* caller_context; typedef void (*capi_callback)(const async_context*, caller_context); void hdfsAsyncPread(hdfsFS fs, hdfsFile file, off_t offset, void *buf, size_t count, capi_callback cb, caller_context ctx); {code} When invoked we take a copy of the caller context that gets forwarded to the callback when the async op completes; this is where a user can keep a pointer to some state associated with the operation. The callback is invoked with a const async_context*, analogous to the Status object in the C++ API, so the callback code can check status, bytes read, and other stuff. Internally this can be implemented by a callable struct/lambda that captures the caller_context and invokes the capi_callback with the caller_context and result async_context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9758) libhdfs++: Implement Python bindings
[ https://issues.apache.org/jira/browse/HDFS-9758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Clampffer updated HDFS-9758: -- Attachment: hdfs_posix.py Adding my example using the C API and ctypes to implement File and FileSystem in Python. HdfsFile.readline isn't even half baked; the idea was to read 4KB blocks that looked like pages into a dict until a newline char was found. That dict could also function as a small cache. The rest of the implementation should work though. > libhdfs++: Implement Python bindings > > > Key: HDFS-9758 > URL: https://issues.apache.org/jira/browse/HDFS-9758 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: James Clampffer > Attachments: hdfs_posix.py > > > It'd be really useful to have bindings for various scripting languages. > Python would be a good start because of its popularity and how easy it is to > interact with shared libraries using the ctypes module. I think bindings for > the V8 engine that nodeJS uses would be a close second in terms of expanding > the potential user base. > Probably worth starting with just adding a synchronous API and building from > there to avoid interactions with python's garbage collector until the > bindings prove to be solid. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6489) DFS Used space is not correctly computed on frequent append operations
[ https://issues.apache.org/jira/browse/HDFS-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260954#comment-15260954 ] Ravi Prakash commented on HDFS-6489: The problem is here: https://github.com/apache/hadoop/blob/f16722d2ef31338a57a13e2c8d18c1c62d58bbaf/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/BlockPoolSlice.java#L323 . Even though this is an append, {{dfsUsage}} is incremented by the total block size every time. This can be easily seen by running {{testFrequentAppend}} (included in Weiwei's patch) and adding a log line after line 323. As far as I can see, this problem has existed since 2012, but only recently did it become problematic, because we started considering dfsUsed space in deciding whether to write a block or not. > DFS Used space is not correctly computed on frequent append operations > > > Key: HDFS-6489 > URL: https://issues.apache.org/jira/browse/HDFS-6489 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.2.0, 2.7.1, 2.7.2 >Reporter: stanley shi >Assignee: Weiwei Yang > Attachments: HDFS-6489.001.patch, HDFS-6489.002.patch, > HDFS-6489.003.patch, HDFS6489.java > > > The current implementation of the Datanode will increase the DFS used space > on each block write operation. This is correct in most scenarios (creating a new > file), but sometimes it behaves incorrectly (appending small data to a large > block). > For example, I have a file with only one block (say, 60M). Then I try to > append to it very frequently but each time I append only 10 bytes; > Then on each append, dfs used will be increased by the length of the > block (60M), not the actual data length (10 bytes). > Consider a scenario where I use many clients to append concurrently to a large > number of files (1000+); assuming the block size is 32M (half of the default > value), the dfs used will be increased by 1000*32M = 32G on each round of appends to > the files, but actually I only write 10K bytes; this will cause the datanode > to report insufficient disk space on data write. > {quote}2014-06-04 15:27:34,719 INFO > org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock > BP-1649188734-10.37.7.142-1398844098971:blk_1073742834_45306 received > exception org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: > Insufficient space for appending to FinalizedReplica, blk_1073742834_45306, > FINALIZED{quote} > But the actual disk usage: > {quote} > [root@hdsh143 ~]# df -h > Filesystem Size Used Avail Use% Mounted on > /dev/sda3 16G 2.9G 13G 20% / > tmpfs 1.9G 72K 1.9G 1% /dev/shm > /dev/sda1 97M 32M 61M 35% /boot > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
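The fix direction implied above can be sketched as follows; the variable names only approximate the code around the linked BlockPoolSlice line and this is not Weiwei's patch.
{code}
// Names approximate the code around BlockPoolSlice.java:323; not the patch.
// On append, account only for the bytes the replica actually grew by,
// not for the full block length again (that was counted when the block
// was first written).
long oldBlockBytes = replicaInfo.getNumBytes();            // before the append
long oldMetaBytes = replicaInfo.getMetaFile().length();
// ... append happens, block and meta files grow ...
long newBlockBytes = replicaInfo.getNumBytes();
long newMetaBytes = replicaInfo.getMetaFile().length();
dfsUsage.incDfsUsed((newBlockBytes - oldBlockBytes)
    + (newMetaBytes - oldMetaBytes));
{code}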
[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client
[ https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260953#comment-15260953 ] Tsz Wo Nicholas Sze commented on HDFS-3702: --- After a second thought, I agree that it is fine to add CreateFlag.NO_LOCAL_WRITE as LimitedPrivate to HBase. Thanks. > Add an option for NOT writing the blocks locally if there is a datanode on > the same box as the client > - > > Key: HDFS-3702 > URL: https://issues.apache.org/jira/browse/HDFS-3702 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 2.5.1 >Reporter: Nicolas Liochon >Assignee: Lei (Eddy) Xu >Priority: Minor > Labels: BB2015-05-TBR > Attachments: HDFS-3702.000.patch, HDFS-3702.001.patch, > HDFS-3702.002.patch, HDFS-3702.003.patch, HDFS-3702.004.patch, > HDFS-3702.005.patch, HDFS-3702.006.patch, HDFS-3702.007.patch, > HDFS-3702.008.patch, HDFS-3702.009.patch, HDFS-3702.010.patch, > HDFS-3702.011.patch, HDFS-3702.012.patch, HDFS-3702_Design.pdf > > > This is useful for Write-Ahead-Logs: these files are written for recovery > only, and are not read when there are no failures. > Taking HBase as an example, these files will be read only if the process that > wrote them (the 'HBase regionserver') dies. This will likely come from a > hardware failure, hence the corresponding datanode will be dead as well. So > we're writing 3 replicas, but in reality only 2 of them are really useful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
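Editor's note: for readers following along, a client-side usage sketch, assuming the flag lands as {{CreateFlag.NO_LOCAL_WRITE}} as discussed in this thread; the WAL path and tuning values below are made up for illustration.
{code}
import java.util.EnumSet;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.CreateFlag;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class NoLocalWriteExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path wal = new Path("/hbase/wal/region-0001.log");   // illustrative path

    // Ask the NameNode not to place the first replica on the local DataNode,
    // so losing this box does not also lose one of the WAL replicas.
    EnumSet<CreateFlag> flags =
        EnumSet.of(CreateFlag.CREATE, CreateFlag.OVERWRITE, CreateFlag.NO_LOCAL_WRITE);

    try (FSDataOutputStream out = fs.create(wal, FsPermission.getFileDefault(),
        flags, 4096, (short) 3, fs.getDefaultBlockSize(wal), null)) {
      out.writeBytes("wal entry\n");
    }
  }
}
{code}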
[jira] [Commented] (HDFS-10335) Mover$Processor#chooseTarget() always chooses the first matching target storage group
[ https://issues.apache.org/jira/browse/HDFS-10335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260943#comment-15260943 ] Mingliang Liu commented on HDFS-10335: -- Thanks for your discussion and review, [~szetszwo]. By the way, we did not see any failing unit tests locally; let's wait for Jenkins to verify. Meanwhile, we're testing the patch manually on a local cluster. > Mover$Processor#chooseTarget() always chooses the first matching target > storage group > - > > Key: HDFS-10335 > URL: https://issues.apache.org/jira/browse/HDFS-10335 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover >Affects Versions: 2.8.0 >Reporter: Mingliang Liu >Assignee: Mingliang Liu >Priority: Critical > Attachments: HDFS-10335.000.patch > > > Currently the > {{org.apache.hadoop.hdfs.server.mover.Mover$Processor#chooseTarget()}} always > chooses the first matching target datanode from the candidate list. This may > make the mover schedule a lot of task to a few of the datanodes (first > several datanodes of the candidate list). The overall performance will suffer > significantly from this because of the saturated network/disk usage. > Specially, if the {{dfs.datanode.balance.max.concurrent.moves}} is set, the > scheduled move task will be queued on a few of the storage group, regardless > of other available storage groups. We need an algorithm which can distribute > the move tasks approximately even across all the candidate target storage > groups. > Thanks [~szetszwo] for offline discussion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
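Editor's note: the distribution problem can be illustrated with a small, self-contained sketch. Picking the first match always hits the head of the candidate list; shuffling (or rotating through) the matching candidates spreads the scheduled moves. This is only an illustration of the idea, not the actual Mover code or the attached patch.
{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

public class ChooseTargetSketch {
  /** Always returning the first match (the behavior described above) piles work onto the head of the list. */
  static <T> T chooseFirstMatch(List<T> candidates, Predicate<T> matches) {
    for (T c : candidates) {
      if (matches.test(c)) {
        return c;
      }
    }
    return null;
  }

  /** One simple alternative: shuffle a copy so repeated calls spread the chosen targets roughly evenly. */
  static <T> T chooseRandomMatch(List<T> candidates, Predicate<T> matches) {
    List<T> copy = new ArrayList<>(candidates);
    Collections.shuffle(copy);
    return chooseFirstMatch(copy, matches);
  }

  public static void main(String[] args) {
    List<String> storageGroups = Arrays.asList("dn1:DISK", "dn2:DISK", "dn3:DISK");
    System.out.println(chooseRandomMatch(storageGroups, g -> g.endsWith("DISK")));
  }
}
{code}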
[jira] [Commented] (HDFS-10335) Mover$Processor#chooseTarget() always chooses the first matching target storage group
[ https://issues.apache.org/jira/browse/HDFS-10335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260940#comment-15260940 ] Mingliang Liu commented on HDFS-10335: -- {code} Step 16 : RUN cabal update && cabal install shellcheck --global ---> Running in 5438b8eb4d37 Config file path source is default config file. Config file /root/.cabal/config not found. Writing default configuration to /root/.cabal/config Downloading the latest package list from hackage.haskell.org [91mca[0m[91mbal: Failed to download http://hackage.haskell.org/packages/archive/00-index.tar.gz : ErrorMisc "Un[0m[91msucessful [0m[91mHTTP code: 502" [0mThe command '/bin/sh -c cabal update && cabal install shellcheck --global' returned a non-zero code: 1 Total Elapsed time: 0m 4s ERROR: Docker failed to build image. {code} It seems the Yetus is not happy, but not Jenkins. > Mover$Processor#chooseTarget() always chooses the first matching target > storage group > - > > Key: HDFS-10335 > URL: https://issues.apache.org/jira/browse/HDFS-10335 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover >Affects Versions: 2.8.0 >Reporter: Mingliang Liu >Assignee: Mingliang Liu >Priority: Critical > Attachments: HDFS-10335.000.patch > > > Currently the > {{org.apache.hadoop.hdfs.server.mover.Mover$Processor#chooseTarget()}} always > chooses the first matching target datanode from the candidate list. This may > make the mover schedule a lot of task to a few of the datanodes (first > several datanodes of the candidate list). The overall performance will suffer > significantly from this because of the saturated network/disk usage. > Specially, if the {{dfs.datanode.balance.max.concurrent.moves}} is set, the > scheduled move task will be queued on a few of the storage group, regardless > of other available storage groups. We need an algorithm which can distribute > the move tasks approximately even across all the candidate target storage > groups. > Thanks [~szetszwo] for offline discussion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10335) Mover$Processor#chooseTarget() always chooses the first matching target storage group
[ https://issues.apache.org/jira/browse/HDFS-10335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260934#comment-15260934 ] Hadoop QA commented on HDFS-10335: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 4s {color} | {color:red} Docker failed to build yetus/hadoop:7b1c37a. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12800935/HDFS-10335.000.patch | | JIRA Issue | HDFS-10335 | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/15310/console | | Powered by | Apache Yetus 0.2.0 http://yetus.apache.org | This message was automatically generated. > Mover$Processor#chooseTarget() always chooses the first matching target > storage group > - > > Key: HDFS-10335 > URL: https://issues.apache.org/jira/browse/HDFS-10335 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover >Affects Versions: 2.8.0 >Reporter: Mingliang Liu >Assignee: Mingliang Liu >Priority: Critical > Attachments: HDFS-10335.000.patch > > > Currently the > {{org.apache.hadoop.hdfs.server.mover.Mover$Processor#chooseTarget()}} always > chooses the first matching target datanode from the candidate list. This may > make the mover schedule a lot of task to a few of the datanodes (first > several datanodes of the candidate list). The overall performance will suffer > significantly from this because of the saturated network/disk usage. > Specially, if the {{dfs.datanode.balance.max.concurrent.moves}} is set, the > scheduled move task will be queued on a few of the storage group, regardless > of other available storage groups. We need an algorithm which can distribute > the move tasks approximately even across all the candidate target storage > groups. > Thanks [~szetszwo] for offline discussion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9902) Support different values of dfs.datanode.du.reserved per storage type
[ https://issues.apache.org/jira/browse/HDFS-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-9902: Summary: Support different values of dfs.datanode.du.reserved per storage type (was: dfs.datanode.du.reserved should be difference between StorageType DISK and RAM_DISK) > Support different values of dfs.datanode.du.reserved per storage type > - > > Key: HDFS-9902 > URL: https://issues.apache.org/jira/browse/HDFS-9902 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.7.2 >Reporter: Pan Yuxuan >Assignee: Brahma Reddy Battula > Attachments: HDFS-9902-02.patch, HDFS-9902.patch > > > Now Hadoop support different storage type for DISK, SSD, ARCHIVE and > RAM_DISK, but they share one configuration dfs.datanode.du.reserved. > The DISK size may be several TB and the RAM_DISK size may be only several > tens of GB. > The problem is that when I configure DISK and RAM_DISK (tmpfs) in the same > DN, and I set dfs.datanode.du.reserved values 10GB, this will waste a lot of > RAM_DISK size. > Since the usage of RAM_DISK can be 100%, so I don't want > dfs.datanode.du.reserved configured for DISK impacts the usage of tmpfs. > So can we make a new configuration for RAM_DISK or just skip this > configuration for RAM_DISK? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10335) Mover$Processor#chooseTarget() always chooses the first matching target storage group
[ https://issues.apache.org/jira/browse/HDFS-10335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-10335: --- Hadoop Flags: Reviewed +1 patch looks good. > Mover$Processor#chooseTarget() always chooses the first matching target > storage group > - > > Key: HDFS-10335 > URL: https://issues.apache.org/jira/browse/HDFS-10335 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover >Affects Versions: 2.8.0 >Reporter: Mingliang Liu >Assignee: Mingliang Liu >Priority: Critical > Attachments: HDFS-10335.000.patch > > > Currently the > {{org.apache.hadoop.hdfs.server.mover.Mover$Processor#chooseTarget()}} always > chooses the first matching target datanode from the candidate list. This may > make the mover schedule a lot of task to a few of the datanodes (first > several datanodes of the candidate list). The overall performance will suffer > significantly from this because of the saturated network/disk usage. > Specially, if the {{dfs.datanode.balance.max.concurrent.moves}} is set, the > scheduled move task will be queued on a few of the storage group, regardless > of other available storage groups. We need an algorithm which can distribute > the move tasks approximately even across all the candidate target storage > groups. > Thanks [~szetszwo] for offline discussion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10335) Mover$Processor#chooseTarget() always chooses the first matching target storage group
[ https://issues.apache.org/jira/browse/HDFS-10335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260873#comment-15260873 ] Hadoop QA commented on HDFS-10335: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 3s {color} | {color:red} Docker failed to build yetus/hadoop:7b1c37a. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12800935/HDFS-10335.000.patch | | JIRA Issue | HDFS-10335 | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/15309/console | | Powered by | Apache Yetus 0.2.0 http://yetus.apache.org | This message was automatically generated. > Mover$Processor#chooseTarget() always chooses the first matching target > storage group > - > > Key: HDFS-10335 > URL: https://issues.apache.org/jira/browse/HDFS-10335 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover >Affects Versions: 2.8.0 >Reporter: Mingliang Liu >Assignee: Mingliang Liu >Priority: Critical > Attachments: HDFS-10335.000.patch > > > Currently the > {{org.apache.hadoop.hdfs.server.mover.Mover$Processor#chooseTarget()}} always > chooses the first matching target datanode from the candidate list. This may > make the mover schedule a lot of task to a few of the datanodes (first > several datanodes of the candidate list). The overall performance will suffer > significantly from this because of the saturated network/disk usage. > Specially, if the {{dfs.datanode.balance.max.concurrent.moves}} is set, the > scheduled move task will be queued on a few of the storage group, regardless > of other available storage groups. We need an algorithm which can distribute > the move tasks approximately even across all the candidate target storage > groups. > Thanks [~szetszwo] for offline discussion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10335) Mover$Processor#chooseTarget() always chooses the first matching target storage group
[ https://issues.apache.org/jira/browse/HDFS-10335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-10335: - Status: Patch Available (was: In Progress) > Mover$Processor#chooseTarget() always chooses the first matching target > storage group > - > > Key: HDFS-10335 > URL: https://issues.apache.org/jira/browse/HDFS-10335 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover >Affects Versions: 2.8.0 >Reporter: Mingliang Liu >Assignee: Mingliang Liu >Priority: Critical > Attachments: HDFS-10335.000.patch > > > Currently the > {{org.apache.hadoop.hdfs.server.mover.Mover$Processor#chooseTarget()}} always > chooses the first matching target datanode from the candidate list. This may > make the mover schedule a lot of task to a few of the datanodes (first > several datanodes of the candidate list). The overall performance will suffer > significantly from this because of the saturated network/disk usage. > Specially, if the {{dfs.datanode.balance.max.concurrent.moves}} is set, the > scheduled move task will be queued on a few of the storage group, regardless > of other available storage groups. We need an algorithm which can distribute > the move tasks approximately even across all the candidate target storage > groups. > Thanks [~szetszwo] for offline discussion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client
[ https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260856#comment-15260856 ] Lei (Eddy) Xu commented on HDFS-3702: - None of the test failures are related. All tests pass locally except TestHFlush, which was reported in HDFS-2043 and is thus unrelated. If there are no further objections, I will commit this by EOD. Thanks. > Add an option for NOT writing the blocks locally if there is a datanode on > the same box as the client > - > > Key: HDFS-3702 > URL: https://issues.apache.org/jira/browse/HDFS-3702 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 2.5.1 >Reporter: Nicolas Liochon >Assignee: Lei (Eddy) Xu >Priority: Minor > Labels: BB2015-05-TBR > Attachments: HDFS-3702.000.patch, HDFS-3702.001.patch, > HDFS-3702.002.patch, HDFS-3702.003.patch, HDFS-3702.004.patch, > HDFS-3702.005.patch, HDFS-3702.006.patch, HDFS-3702.007.patch, > HDFS-3702.008.patch, HDFS-3702.009.patch, HDFS-3702.010.patch, > HDFS-3702.011.patch, HDFS-3702.012.patch, HDFS-3702_Design.pdf > > > This is useful for Write-Ahead-Logs: these files are written for recovery > only, and are not read when there are no failures. > Taking HBase as an example, these files will be read only if the process that > wrote them (the 'HBase regionserver') dies. This will likely come from a > hardware failure, hence the corresponding datanode will be dead as well. So > we're writing 3 replicas, but in reality only 2 of them are really useful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10220) Namenode failover due to too long locking in LeaseManager.Monitor
[ https://issues.apache.org/jira/browse/HDFS-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260674#comment-15260674 ] Ravi Prakash commented on HDFS-10220: - Good catch, Walter! That seems broken prior to this patch too, doesn't it? [~wheat9], could you please comment on whether {{leaseToCheck}} should be put back into sortedLeases in case {{completed}} is {{false}}? Otherwise, the lease will never be checked again. Or perhaps I am not understanding some other mechanism through which it would be. > Namenode failover due to too long locking in LeaseManager.Monitor > > > Key: HDFS-10220 > URL: https://issues.apache.org/jira/browse/HDFS-10220 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Nicolas Fraison >Assignee: Nicolas Fraison >Priority: Minor > Attachments: HADOOP-10220.001.patch, HADOOP-10220.002.patch, > HADOOP-10220.003.patch, HADOOP-10220.004.patch, HADOOP-10220.005.patch, > threaddump_zkfc.txt > > > I have faced a namenode failover due to an unresponsive namenode detected by the > zkfc, with lots of WARN messages (5 million) like this one: > _org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All > existing blocks are COMPLETE, lease removed, file closed._ > In the threaddump taken by the zkfc there are lots of threads blocked due to a > lock. > Looking at the code, there is a lock taken by the LeaseManager.Monitor when > some leases must be released. Due to the really big number of leases to be > released, the namenode took too long to release them, blocking all > other tasks and making the zkfc think that the namenode was not > available/stuck. > The idea of this patch is to limit the number of leases released each time we > check for leases, so the lock won't be held for too long a time period. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
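Editor's note: the shape of the fix under discussion is roughly the following: release at most N leases per monitor pass while holding the lock, and leave the rest for the next pass. This is a schematic sketch with invented names ({{MAX_LEASES_PER_CHECK}}, {{releaseLease}}), not the actual LeaseManager code or any of the attached patches.
{code}
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.locks.ReentrantLock;

public class LeaseMonitorSketch {
  private static final int MAX_LEASES_PER_CHECK = 1000;  // invented cap

  private final ReentrantLock fsLock = new ReentrantLock();
  private final Queue<String> expiredLeases = new ArrayDeque<>();

  /** One monitor pass: release at most MAX_LEASES_PER_CHECK leases under the lock. */
  void checkLeases() {
    fsLock.lock();
    try {
      int released = 0;
      while (!expiredLeases.isEmpty() && released < MAX_LEASES_PER_CHECK) {
        releaseLease(expiredLeases.poll());
        released++;
      }
      // Remaining leases are handled on the next pass, so the lock is never
      // held long enough for the zkfc health check to time out.
    } finally {
      fsLock.unlock();
    }
  }

  private void releaseLease(String leaseHolder) {
    // placeholder for the real internalReleaseLease(...) work
  }
}
{code}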
[jira] [Commented] (HDFS-9902) dfs.datanode.du.reserved should be difference between StorageType DISK and RAM_DISK
[ https://issues.apache.org/jira/browse/HDFS-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260499#comment-15260499 ] Xiaoyu Yao commented on HDFS-9902: -- Agree with [~arpitagarwal], we need to document the new keys {{dfs.datanode.du.#storagetype#.reserved}}. > dfs.datanode.du.reserved should be difference between StorageType DISK and > RAM_DISK > --- > > Key: HDFS-9902 > URL: https://issues.apache.org/jira/browse/HDFS-9902 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.7.2 >Reporter: Pan Yuxuan >Assignee: Brahma Reddy Battula > Attachments: HDFS-9902-02.patch, HDFS-9902.patch > > > Now Hadoop support different storage type for DISK, SSD, ARCHIVE and > RAM_DISK, but they share one configuration dfs.datanode.du.reserved. > The DISK size may be several TB and the RAM_DISK size may be only several > tens of GB. > The problem is that when I configure DISK and RAM_DISK (tmpfs) in the same > DN, and I set dfs.datanode.du.reserved values 10GB, this will waste a lot of > RAM_DISK size. > Since the usage of RAM_DISK can be 100%, so I don't want > dfs.datanode.du.reserved configured for DISK impacts the usage of tmpfs. > So can we make a new configuration for RAM_DISK or just skip this > configuration for RAM_DISK? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
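Editor's note: to illustrate what the documented behavior might look like for users, here is a sketch of a DataNode-side lookup that tries a per-storage-type key first and falls back to the shared {{dfs.datanode.du.reserved}} value. The exact key spelling follows the {{dfs.datanode.du.#storagetype#.reserved}} pattern mentioned in the comment above and should be treated as illustrative until the patch and docs are final.
{code}
import org.apache.hadoop.conf.Configuration;

public class ReservedSpaceSketch {
  /**
   * Illustrative lookup only: per-storage-type key first (key pattern taken
   * from the comment above), then the shared key as the fallback.
   */
  static long getReservedBytes(Configuration conf, String storageType) {
    long shared = conf.getLong("dfs.datanode.du.reserved", 0L);
    String perTypeKey = "dfs.datanode.du." + storageType.toLowerCase() + ".reserved";
    return conf.getLong(perTypeKey, shared);
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setLong("dfs.datanode.du.reserved", 10L * 1024 * 1024 * 1024);  // 10 GB for DISK
    conf.setLong("dfs.datanode.du.ram_disk.reserved", 0L);               // nothing held back on tmpfs
    System.out.println(getReservedBytes(conf, "RAM_DISK"));  // 0
    System.out.println(getReservedBytes(conf, "DISK"));      // 10737418240
  }
}
{code}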
[jira] [Commented] (HDFS-10175) add per-operation stats to FileSystem.Statistics
[ https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260498#comment-15260498 ] Colin Patrick McCabe commented on HDFS-10175: - BTW, sorry for the last-minute-ness of this scheduling, [~liuml07] and [~steve_l]. Webex here at 10:30: HDFS-10175 webex Wednesday, April 27, 2016 10:30 am | Pacific Daylight Time (San Francisco, GMT-07:00) | 1 hr JOIN WEBEX MEETING https://cloudera.webex.com/cloudera/j.php?MTID=mebca25435f158dec71b2589561e71b29 Meeting number: 294 963 170 Meeting password: 1234 JOIN BY PHONE 1-650-479-3208 Call-in toll number (US/Canada) Access code: 294 963 170 Global call-in numbers: https://cloudera.webex.com/cloudera/globalcallin.php?serviceType=MC&ED=45642173&tollFree=0 > add per-operation stats to FileSystem.Statistics > > > Key: HDFS-10175 > URL: https://issues.apache.org/jira/browse/HDFS-10175 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Reporter: Ram Venkatesh >Assignee: Mingliang Liu > Attachments: HDFS-10175.000.patch, HDFS-10175.001.patch, > HDFS-10175.002.patch, HDFS-10175.003.patch, HDFS-10175.004.patch, > HDFS-10175.005.patch, HDFS-10175.006.patch, TestStatisticsOverhead.java > > > Currently FileSystem.Statistics exposes the following statistics: > BytesRead > BytesWritten > ReadOps > LargeReadOps > WriteOps > These are in-turn exposed as job counters by MapReduce and other frameworks. > There is logic within DfsClient to map operations to these counters that can > be confusing, for instance, mkdirs counts as a writeOp. > Proposed enhancement: > Add a statistic for each DfsClient operation including create, append, > createSymlink, delete, exists, mkdirs, rename and expose them as new > properties on the Statistics object. The operation-specific counters can be > used for analyzing the load imposed by a particular job on HDFS. > For example, we can use them to identify jobs that end up creating a large > number of files. > Once this information is available in the Statistics object, the app > frameworks like MapReduce can expose them as additional counters to be > aggregated and recorded as part of job summary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-10338) DistCp masks potential CRC check failures
Elliot West created HDFS-10338: -- Summary: DistCp masks potential CRC check failures Key: HDFS-10338 URL: https://issues.apache.org/jira/browse/HDFS-10338 Project: Hadoop HDFS Issue Type: Bug Components: distcp Affects Versions: 2.7.1 Reporter: Elliot West There appear to be edge cases whereby CRC checks may be circumvented when requests for checksums from the source or target file system fail. In this event CRCs could differ between the source and target and yet the DistCp copy would succeed, even when the 'skip CRC check' option is not being used. The code in question is contained in the method [{{org.apache.hadoop.tools.util.DistCpUtils#checksumsAreEqual(...)}}|https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/DistCpUtils.java#L457] Specifically this code block suggests that if there is a failure when trying to read the source or target checksum then the method will return {{true}} (i.e. the checksums are equal), implying that the check succeeded. In actual fact we just failed to obtain the checksum and could not perform the check. {code} try { sourceChecksum = sourceChecksum != null ? sourceChecksum : sourceFS.getFileChecksum(source); targetChecksum = targetFS.getFileChecksum(target); } catch (IOException e) { LOG.error("Unable to retrieve checksum for " + source + " or " + target, e); } return (sourceChecksum == null || targetChecksum == null || sourceChecksum.equals(targetChecksum)); {code} I believe that at the very least the caught {{IOException}} should be re-thrown. If this is not deemed desirable then I believe an option ({{--strictCrc}}?) should be added to enforce a strict check where we require that both the source and target CRCs are retrieved, are not null, and are then compared for equality. If for any reason either of the CRCs retrievals fail then an exception is thrown. Clearly some {{FileSystems}} do not support CRCs and invocations to {{FileSystem.getFileChecksum(...)}} return {{null}} in these instances. I would suggest that these should fail a strict CRC check to prevent users developing a false sense of security in their copy pipeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
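Editor's note: as a concrete illustration of the suggested behavior (not the actual DistCp code or a proposed patch), a strict variant could rethrow the {{IOException}} and refuse to pass when either checksum is unavailable. Names such as {{StrictChecksumCheck}} are placeholders for whatever option is ultimately agreed on.
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class StrictChecksumCheck {
  /**
   * Sketch of a strict comparison: any failure to obtain a checksum, or a
   * null checksum from either side, fails the check instead of passing it.
   */
  public static boolean checksumsAreEqual(FileSystem sourceFS, Path source,
      FileSystem targetFS, Path target) throws IOException {
    FileChecksum sourceChecksum = sourceFS.getFileChecksum(source);  // may throw
    FileChecksum targetChecksum = targetFS.getFileChecksum(target);  // may throw
    if (sourceChecksum == null || targetChecksum == null) {
      // e.g. a FileSystem that does not support checksums
      throw new IOException("Checksum unavailable for " + source + " or " + target);
    }
    return sourceChecksum.equals(targetChecksum);
  }

  private StrictChecksumCheck() {
  }
}
{code}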
[jira] [Commented] (HDFS-10175) add per-operation stats to FileSystem.Statistics
[ https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260487#comment-15260487 ] Colin Patrick McCabe commented on HDFS-10175: - Great. Let me add a webex > add per-operation stats to FileSystem.Statistics > > > Key: HDFS-10175 > URL: https://issues.apache.org/jira/browse/HDFS-10175 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Reporter: Ram Venkatesh >Assignee: Mingliang Liu > Attachments: HDFS-10175.000.patch, HDFS-10175.001.patch, > HDFS-10175.002.patch, HDFS-10175.003.patch, HDFS-10175.004.patch, > HDFS-10175.005.patch, HDFS-10175.006.patch, TestStatisticsOverhead.java > > > Currently FileSystem.Statistics exposes the following statistics: > BytesRead > BytesWritten > ReadOps > LargeReadOps > WriteOps > These are in-turn exposed as job counters by MapReduce and other frameworks. > There is logic within DfsClient to map operations to these counters that can > be confusing, for instance, mkdirs counts as a writeOp. > Proposed enhancement: > Add a statistic for each DfsClient operation including create, append, > createSymlink, delete, exists, mkdirs, rename and expose them as new > properties on the Statistics object. The operation-specific counters can be > used for analyzing the load imposed by a particular job on HDFS. > For example, we can use them to identify jobs that end up creating a large > number of files. > Once this information is available in the Statistics object, the app > frameworks like MapReduce can expose them as additional counters to be > aggregated and recorded as part of job summary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-10337) OfflineEditsViewer stats option should print 0 instead of null for the count of operations
Akira AJISAKA created HDFS-10337: Summary: OfflineEditsViewer stats option should print 0 instead of null for the count of operations Key: HDFS-10337 URL: https://issues.apache.org/jira/browse/HDFS-10337 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.2 Reporter: Akira AJISAKA Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-9902) dfs.datanode.du.reserved should be difference between StorageType DISK and RAM_DISK
[ https://issues.apache.org/jira/browse/HDFS-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260434#comment-15260434 ] Arpit Agarwal edited comment on HDFS-9902 at 4/27/16 4:31 PM: -- Hi [~brahmareddy], thank you for reporting this. The fix lgtm. The unit test can be done more simply without MiniDFSCluster. Just instantiate "FsVolumeImpl" objects with different storage types and check {{#reserved}} is initialized correctly. Also could you please update the documentation of {{dfs.datanode.du.reserved}}? was (Author: arpitagarwal): Hi [~brahmareddy], thank you for reporting this. The fix lgtm. The unit test can be done more simply without MiniDFSCluster. Just instantiate "FsVolumeImpl" objects with different storage types and check that the value of {{#reserved}}. Also could you please update the documentation of {{dfs.datanode.du.reserved}}? > dfs.datanode.du.reserved should be difference between StorageType DISK and > RAM_DISK > --- > > Key: HDFS-9902 > URL: https://issues.apache.org/jira/browse/HDFS-9902 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.7.2 >Reporter: Pan Yuxuan >Assignee: Brahma Reddy Battula > Attachments: HDFS-9902-02.patch, HDFS-9902.patch > > > Now Hadoop support different storage type for DISK, SSD, ARCHIVE and > RAM_DISK, but they share one configuration dfs.datanode.du.reserved. > The DISK size may be several TB and the RAM_DISK size may be only several > tens of GB. > The problem is that when I configure DISK and RAM_DISK (tmpfs) in the same > DN, and I set dfs.datanode.du.reserved values 10GB, this will waste a lot of > RAM_DISK size. > Since the usage of RAM_DISK can be 100%, so I don't want > dfs.datanode.du.reserved configured for DISK impacts the usage of tmpfs. > So can we make a new configuration for RAM_DISK or just skip this > configuration for RAM_DISK? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9902) dfs.datanode.du.reserved should be difference between StorageType DISK and RAM_DISK
[ https://issues.apache.org/jira/browse/HDFS-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260434#comment-15260434 ] Arpit Agarwal commented on HDFS-9902: - Hi [~brahmareddy], thank you for reporting this. The fix lgtm. The unit test can be done more simply without MiniDFSCluster. Just instantiate "FsVolumeImpl" objects with different storage types and check that the value of {{#reserved}}. Also could you please update the documentation of {{dfs.datanode.du.reserved}}? > dfs.datanode.du.reserved should be difference between StorageType DISK and > RAM_DISK > --- > > Key: HDFS-9902 > URL: https://issues.apache.org/jira/browse/HDFS-9902 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.7.2 >Reporter: Pan Yuxuan >Assignee: Brahma Reddy Battula > Attachments: HDFS-9902-02.patch, HDFS-9902.patch > > > Now Hadoop support different storage type for DISK, SSD, ARCHIVE and > RAM_DISK, but they share one configuration dfs.datanode.du.reserved. > The DISK size may be several TB and the RAM_DISK size may be only several > tens of GB. > The problem is that when I configure DISK and RAM_DISK (tmpfs) in the same > DN, and I set dfs.datanode.du.reserved values 10GB, this will waste a lot of > RAM_DISK size. > Since the usage of RAM_DISK can be 100%, so I don't want > dfs.datanode.du.reserved configured for DISK impacts the usage of tmpfs. > So can we make a new configuration for RAM_DISK or just skip this > configuration for RAM_DISK? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-8746) Reduce the latency of streaming reads by re-using DN connections
[ https://issues.apache.org/jira/browse/HDFS-8746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Clampffer reassigned HDFS-8746: - Assignee: James Clampffer (was: Bob Hansen) > Reduce the latency of streaming reads by re-using DN connections > > > Key: HDFS-8746 > URL: https://issues.apache.org/jira/browse/HDFS-8746 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Bob Hansen >Assignee: James Clampffer > > The current libhdfspp implementation opens a new connection for each pread. > For streaming reads (especially streaming short-buffer reads coming from the > C API, and especially once we get SSL handshake overhead), our throughput > will be dominated by the connection latency of reconnecting to the DataNodes. > The target use case is a multi-block file that is being sequentially streamed > and processed by the client application, which consumes the data as it comes > from the DN and throws it away. The data is read into moderately small > buffers (~64k - ~1MB) owned by the consumer, and overall throughput is the > critical metric. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9758) libhdfs++: Implement Python bindings
[ https://issues.apache.org/jira/browse/HDFS-9758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260423#comment-15260423 ] James Clampffer commented on HDFS-9758: --- Here's a bunch of my thoughts about this, let me know what you think. I haven't done much in Python 3.x so some of my assumptions might not hold true there. My thinking was to focus on supporting CPython via CTypes, at least initially. I have a patch where I hacked together a demo of how this could be done that I'll dig up and post later today (doesn't support iterable files or readline() and isn't optimized but otherwise works well enough). My overall opinion about this is that we should make it as easy to access HDFS through python as possible so less configuration and fewer dependencies is really important to get people to use it. Naturally if some minor amount of configurations leads to a huge performance boost than it's worth considering. I think CPython is the best place to focus simply because of it's ubiquity. PyPy is a cool project but doesn't come installed by default on many linux distributions as far as I know. CPython ships with CTypes so that's one less dependency to bring in (unless CFFI is also included as a default library), but as you said you're pretty much stuck writing C wrapper functions for everything. I don't think that's a dealbreaker as forcing a C API walls off exceptions and things that shouldn't be getting into the interpreter anyway. Does Cython get you a whole lot of benefits over something like CTypes? I don't have experience with it. Boost.Python or a pure python extension would mostly likely be the cleanest and most performant way of doing this sort of thing at the expense of extra complexity. I've also heard that hadoop and boost generally don't mix but we've already made an exception for boost::asio (maybe that's different because it's header only?). The only concern I'd have with both would be that they tie the module to the libhdfs++ C++ ABI so we'd have to be careful about compatibility. I could see writing a module being a big benefit because then we could hook into the GC to properly support garbage collected async operations. I think it's important to make sure at least some this work can help implement bindings for other languages but I think most approaches would do that in one way or another. I'm partial to building language specific wrappers over the C API just because most scripting languages have a way of calling C functions. > libhdfs++: Implement Python bindings > > > Key: HDFS-9758 > URL: https://issues.apache.org/jira/browse/HDFS-9758 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: James Clampffer > > It'd be really useful to have bindings for various scripting languages. > Python would be a good start because of it's popularity and how easy it is to > interact with shared libraries using the ctypes module. I think bindings for > the V8 engine that nodeJS uses would be a close second in terms of expanding > the potential user base. > Probably worth starting with just adding a synchronous API and building from > there to avoid interactions with python's garbage collector until the > bindings prove to be solid. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10332) hdfs-native-client fails to build with CMake 2.8.11 or earlier
[ https://issues.apache.org/jira/browse/HDFS-10332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Clampffer updated HDFS-10332: --- Resolution: Fixed Status: Resolved (was: Patch Available) > hdfs-native-client fails to build with CMake 2.8.11 or earlier > -- > > Key: HDFS-10332 > URL: https://issues.apache.org/jira/browse/HDFS-10332 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Tibor Kiss >Assignee: Tibor Kiss >Priority: Minor > Attachments: HDFS-10332.01.patch, HDFS-10332.HDFS-8707.001.patch > > > Due to a new syntax introduced in CMake 2.8.12 (get_filename_component's > function when VAR=DIRECTORY) the native-client won't build. > Currently RHEL6 & 7 are using older version of CMake. > Error log: > {noformat} > [INFO] --- maven-antrun-plugin:1.7:run (make) @ hadoop-hdfs-native-client --- > [INFO] Executing tasks > main: > [exec] JAVA_HOME=, > JAVA_JVM_LIBRARY=/usr/java/jdk1.7.0_79/jre/lib/amd64/server/libjvm.so > [exec] JAVA_INCLUDE_PATH=/usr/java/jdk1.7.0_79/include, > JAVA_INCLUDE_PATH2=/usr/java/jdk1.7.0_79/include/linux > [exec] Located all JNI components successfully. > [exec] -- Could NOT find PROTOBUF (missing: PROTOBUF_LIBRARY > PROTOBUF_INCLUDE_DIR) > [exec] -- valgrind location: MEMORYCHECK_COMMAND-NOTFOUND > [exec] -- checking for module 'fuse' > [exec] -- package 'fuse' not found > [exec] -- Failed to find Linux FUSE libraries or include files. Will > not build FUSE client. > [exec] -- Configuring incomplete, errors occurred! > [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 > (get_filename_component): > [exec] get_filename_component unknown component DIRECTORY > [exec] Call Stack (most recent call first): > [exec] main/native/libhdfspp/CMakeLists.txt:95 (copy_on_demand) > [exec] > [exec] > [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 > (get_filename_component): > [exec] get_filename_component unknown component DIRECTORY > [exec] Call Stack (most recent call first): > [exec] main/native/libhdfspp/CMakeLists.txt:96 (copy_on_demand) > [exec] > [exec] > [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 > (get_filename_component): > [exec] get_filename_component unknown component DIRECTORY > [exec] Call Stack (most recent call first): > [exec] main/native/libhdfspp/CMakeLists.txt:97 (copy_on_demand) > [exec] > [exec] > [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 > (get_filename_component): > [exec] get_filename_component unknown component DIRECTORY > [exec] Call Stack (most recent call first): > [exec] main/native/libhdfspp/CMakeLists.txt:98 (copy_on_demand) > [exec] > [exec] > [exec] CMake Error: The following variables are used in this project, > but they are set to NOTFOUND. 
> [exec] Please set them or make sure they are set and tested correctly in > the CMake files: > [exec] PROTOBUF_LIBRARY (ADVANCED) > [exec] linked by target "hdfspp" in directory > /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp > [exec] linked by target "hdfspp_static" in directory > /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp > [exec] linked by target "protoc-gen-hrpc" in directory > /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/proto > [exec] linked by target "bad_datanode_test" in directory > /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests > [exec] linked by target "hdfs_builder_test" in directory > /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests > [exec] linked by target "hdfspp_errors_test" in directory > /home/tiborkiss/devel/workspace > [exec] > /hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests > [exec] linked by target "libhdfs_threaded_hdfspp_test_shim_static" > in directory > /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/test > [exec] linked by target "logging_test" in directory > /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests > [exec] linked by target "node_exclusion_test" in directory > /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-clien
[jira] [Commented] (HDFS-10332) hdfs-native-client fails to build with CMake 2.8.11 or earlier
[ https://issues.apache.org/jira/browse/HDFS-10332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260355#comment-15260355 ] James Clampffer commented on HDFS-10332: Hi Tibor, Thanks for finding this problem and fixing it! I've committed it to HDFS-8707. Making sure libhdfs++ builds/runs as expected on RHEL 6/7 is a priority of mine as well; I had been building a newer version of CMake from source so I hadn't noticed this. > hdfs-native-client fails to build with CMake 2.8.11 or earlier > -- > > Key: HDFS-10332 > URL: https://issues.apache.org/jira/browse/HDFS-10332 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Tibor Kiss >Assignee: Tibor Kiss >Priority: Minor > Attachments: HDFS-10332.01.patch, HDFS-10332.HDFS-8707.001.patch > > > Due to a new syntax introduced in CMake 2.8.12 (get_filename_component's > function when VAR=DIRECTORY) the native-client won't build. > Currently RHEL6 & 7 are using older version of CMake. > Error log: > {noformat} > [INFO] --- maven-antrun-plugin:1.7:run (make) @ hadoop-hdfs-native-client --- > [INFO] Executing tasks > main: > [exec] JAVA_HOME=, > JAVA_JVM_LIBRARY=/usr/java/jdk1.7.0_79/jre/lib/amd64/server/libjvm.so > [exec] JAVA_INCLUDE_PATH=/usr/java/jdk1.7.0_79/include, > JAVA_INCLUDE_PATH2=/usr/java/jdk1.7.0_79/include/linux > [exec] Located all JNI components successfully. > [exec] -- Could NOT find PROTOBUF (missing: PROTOBUF_LIBRARY > PROTOBUF_INCLUDE_DIR) > [exec] -- valgrind location: MEMORYCHECK_COMMAND-NOTFOUND > [exec] -- checking for module 'fuse' > [exec] -- package 'fuse' not found > [exec] -- Failed to find Linux FUSE libraries or include files. Will > not build FUSE client. > [exec] -- Configuring incomplete, errors occurred! > [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 > (get_filename_component): > [exec] get_filename_component unknown component DIRECTORY > [exec] Call Stack (most recent call first): > [exec] main/native/libhdfspp/CMakeLists.txt:95 (copy_on_demand) > [exec] > [exec] > [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 > (get_filename_component): > [exec] get_filename_component unknown component DIRECTORY > [exec] Call Stack (most recent call first): > [exec] main/native/libhdfspp/CMakeLists.txt:96 (copy_on_demand) > [exec] > [exec] > [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 > (get_filename_component): > [exec] get_filename_component unknown component DIRECTORY > [exec] Call Stack (most recent call first): > [exec] main/native/libhdfspp/CMakeLists.txt:97 (copy_on_demand) > [exec] > [exec] > [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 > (get_filename_component): > [exec] get_filename_component unknown component DIRECTORY > [exec] Call Stack (most recent call first): > [exec] main/native/libhdfspp/CMakeLists.txt:98 (copy_on_demand) > [exec] > [exec] > [exec] CMake Error: The following variables are used in this project, > but they are set to NOTFOUND. 
> [exec] Please set them or make sure they are set and tested correctly in > the CMake files: > [exec] PROTOBUF_LIBRARY (ADVANCED) > [exec] linked by target "hdfspp" in directory > /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp > [exec] linked by target "hdfspp_static" in directory > /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp > [exec] linked by target "protoc-gen-hrpc" in directory > /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/proto > [exec] linked by target "bad_datanode_test" in directory > /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests > [exec] linked by target "hdfs_builder_test" in directory > /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests > [exec] linked by target "hdfspp_errors_test" in directory > /home/tiborkiss/devel/workspace > [exec] > /hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests > [exec] linked by target "libhdfs_threaded_hdfspp_test_shim_static" > in directory > /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/test > [exec] linked by target "logging_test" in directory > /home/tiborkiss/devel
[jira] [Updated] (HDFS-10287) MiniDFSCluster should implement AutoCloseable
[ https://issues.apache.org/jira/browse/HDFS-10287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andras Bokor updated HDFS-10287: Attachment: HDFS-10287.01.patch > MiniDFSCluster should implement AutoCloseable > - > > Key: HDFS-10287 > URL: https://issues.apache.org/jira/browse/HDFS-10287 > Project: Hadoop HDFS > Issue Type: Improvement > Components: test >Affects Versions: 2.7.0 >Reporter: John Zhuge >Assignee: John Zhuge >Priority: Trivial > Attachments: HDFS-10287.01.patch > > > {{MiniDFSCluster}} should implement {{AutoCloseable}} in order to support > [try-with-resources|https://docs.oracle.com/javase/tutorial/essential/exceptions/tryResourceClose.html]. > It will make test code a little cleaner and more reliable. > Since {{AutoCloseable}} is only in Java 1.7 or later, this can not be > backported to Hadoop version prior to 2.7. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10287) MiniDFSCluster should implement AutoCloseable
[ https://issues.apache.org/jira/browse/HDFS-10287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andras Bokor updated HDFS-10287: Attachment: (was: HDFS-10287.01.patch) > MiniDFSCluster should implement AutoCloseable > - > > Key: HDFS-10287 > URL: https://issues.apache.org/jira/browse/HDFS-10287 > Project: Hadoop HDFS > Issue Type: Improvement > Components: test >Affects Versions: 2.7.0 >Reporter: John Zhuge >Assignee: John Zhuge >Priority: Trivial > Attachments: HDFS-10287.01.patch > > > {{MiniDFSCluster}} should implement {{AutoCloseable}} in order to support > [try-with-resources|https://docs.oracle.com/javase/tutorial/essential/exceptions/tryResourceClose.html]. > It will make test code a little cleaner and more reliable. > Since {{AutoCloseable}} is only in Java 1.7 or later, this can not be > backported to Hadoop version prior to 2.7. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10287) MiniDFSCluster should implement AutoCloseable
[ https://issues.apache.org/jira/browse/HDFS-10287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andras Bokor updated HDFS-10287: Attachment: HDFS-10287.01.patch I added AutoCloseable to the class and also modified the related test class as evidence that it works. > MiniDFSCluster should implement AutoCloseable > - > > Key: HDFS-10287 > URL: https://issues.apache.org/jira/browse/HDFS-10287 > Project: Hadoop HDFS > Issue Type: Improvement > Components: test >Affects Versions: 2.7.0 >Reporter: John Zhuge >Assignee: John Zhuge >Priority: Trivial > Attachments: HDFS-10287.01.patch > > > {{MiniDFSCluster}} should implement {{AutoCloseable}} in order to support > [try-with-resources|https://docs.oracle.com/javase/tutorial/essential/exceptions/tryResourceClose.html]. > It will make test code a little cleaner and more reliable. > Since {{AutoCloseable}} is only in Java 1.7 or later, this can not be > backported to Hadoop version prior to 2.7. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
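Editor's note: for reference, once {{MiniDFSCluster}} implements {{AutoCloseable}}, a test could read roughly like the sketch below (assuming {{close()}} simply delegates to {{shutdown()}}; the test body is illustrative).
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class TryWithResourcesExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new HdfsConfiguration();
    // The cluster is shut down automatically when the try block exits,
    // even if the file operations or assertions throw.
    try (MiniDFSCluster cluster =
             new MiniDFSCluster.Builder(conf).numDataNodes(1).build()) {
      cluster.waitActive();
      FileSystem fs = cluster.getFileSystem();
      Path p = new Path("/tmp/testfile");
      fs.create(p).close();
      System.out.println("exists: " + fs.exists(p));
    }
  }
}
{code}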
[jira] [Comment Edited] (HDFS-9958) BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed storages.
[ https://issues.apache.org/jira/browse/HDFS-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260200#comment-15260200 ] Kihwal Lee edited comment on HDFS-9958 at 4/27/16 2:20 PM: --- I was on and off helping Kuhu with the patch. I'll take a quick pass over it today. was (Author: daryn): I was on and off helping Ku with the patch. I'll take a quick pass over it today. > BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed > storages. > > > Key: HDFS-9958 > URL: https://issues.apache.org/jira/browse/HDFS-9958 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Attachments: HDFS-9958-Test-v1.txt, HDFS-9958.001.patch, > HDFS-9958.002.patch, HDFS-9958.003.patch, HDFS-9958.004.patch, > HDFS-9958.005.patch > > > In a scenario where the corrupt replica is on a failed storage, before it is > taken out of blocksMap, there is a race which causes the creation of > LocatedBlock on a {{machines}} array element that is not populated. > Following is the root cause, > {code} > final int numCorruptNodes = countNodes(blk).corruptReplicas(); > {code} > countNodes only looks at nodes with storage state as NORMAL, which in the > case where corrupt replica is on failed storage will amount to > numCorruptNodes being zero. > {code} > final int numNodes = blocksMap.numNodes(blk); > {code} > However, numNodes will count all nodes/storages irrespective of the state of > the storage. Therefore numMachines will include such (failed) nodes. The > assert would fail only if the system is enabled to catch Assertion errors, > otherwise it goes ahead and tries to create LocatedBlock object for that is > not put in the {{machines}} array. > Here is the stack trace: > {code} > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:45) > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:40) > at > org.apache.hadoop.hdfs.protocol.LocatedBlock.(LocatedBlock.java:84) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:878) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:826) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlockList(BlockManager.java:799) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlocks(BlockManager.java:899) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1849) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:588) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049) > at 
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9958) BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed storages.
[ https://issues.apache.org/jira/browse/HDFS-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260226#comment-15260226 ] Kihwal Lee commented on HDFS-9958: -- bq. I'm surprised that most of the time, {{storageID}} is null. The {{storageID}} is not always available. If a corruption is detected by the block/volume scanner, storageID can be filled in. But when bad blocks are reported by {{reportRemoteBadBlock()}} during re-replication or balancing, the reporting node won't know the ID. If we blindly make it report the locally available id, it will end up reporting a wrong id. I think only {{DFSClient}} currently reports {{storageID}}. This shouldn't be a problem as long as the assumption that a datanode stores only one replica/stripe of a block holds. > BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed > storages. > > > Key: HDFS-9958 > URL: https://issues.apache.org/jira/browse/HDFS-9958 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Attachments: HDFS-9958-Test-v1.txt, HDFS-9958.001.patch, > HDFS-9958.002.patch, HDFS-9958.003.patch, HDFS-9958.004.patch, > HDFS-9958.005.patch > > > In a scenario where the corrupt replica is on a failed storage, before it is > taken out of blocksMap, there is a race which causes the creation of > LocatedBlock on a {{machines}} array element that is not populated. > Following is the root cause, > {code} > final int numCorruptNodes = countNodes(blk).corruptReplicas(); > {code} > countNodes only looks at nodes with storage state as NORMAL, which in the > case where corrupt replica is on failed storage will amount to > numCorruptNodes being zero. > {code} > final int numNodes = blocksMap.numNodes(blk); > {code} > However, numNodes will count all nodes/storages irrespective of the state of > the storage. Therefore numMachines will include such (failed) nodes. The > assert would fail only if the system is enabled to catch Assertion errors, > otherwise it goes ahead and tries to create LocatedBlock object for that is > not put in the {{machines}} array. 
> Here is the stack trace: > {code} > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:45) > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:40) > at > org.apache.hadoop.hdfs.protocol.LocatedBlock.(LocatedBlock.java:84) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:878) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:826) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlockList(BlockManager.java:799) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlocks(BlockManager.java:899) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1849) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:588) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10330) Add Corrupt Blocks Information in Metasave Output
[ https://issues.apache.org/jira/browse/HDFS-10330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260215#comment-15260215 ] Kuhu Shukla commented on HDFS-10330: Thanks a lot Kihwal! > Add Corrupt Blocks Information in Metasave Output > - > > Key: HDFS-10330 > URL: https://issues.apache.org/jira/browse/HDFS-10330 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.7.2 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Fix For: 2.8.0 > > Attachments: HDFS-10330.001.patch, HDFS-10330.002.patch > > > Along with Datanode information and other vital block information, it would > be useful to have corruptblocks' detailed info as part of metasave since > currently the jmx tracks only the count of corrupt nodes. This JIRA addresses > this improvement. CC: [~kihwal], [~daryn]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9958) BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed storages.
[ https://issues.apache.org/jira/browse/HDFS-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260200#comment-15260200 ] Daryn Sharp commented on HDFS-9958: --- I was on and off helping Ku with the patch. I'll take a quick pass over it today. > BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed > storages. > > > Key: HDFS-9958 > URL: https://issues.apache.org/jira/browse/HDFS-9958 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Attachments: HDFS-9958-Test-v1.txt, HDFS-9958.001.patch, > HDFS-9958.002.patch, HDFS-9958.003.patch, HDFS-9958.004.patch, > HDFS-9958.005.patch > > > In a scenario where the corrupt replica is on a failed storage, before it is > taken out of blocksMap, there is a race which causes the creation of > LocatedBlock on a {{machines}} array element that is not populated. > Following is the root cause, > {code} > final int numCorruptNodes = countNodes(blk).corruptReplicas(); > {code} > countNodes only looks at nodes with storage state as NORMAL, which in the > case where corrupt replica is on failed storage will amount to > numCorruptNodes being zero. > {code} > final int numNodes = blocksMap.numNodes(blk); > {code} > However, numNodes will count all nodes/storages irrespective of the state of > the storage. Therefore numMachines will include such (failed) nodes. The > assert would fail only if the system is enabled to catch Assertion errors, > otherwise it goes ahead and tries to create LocatedBlock object for that is > not put in the {{machines}} array. > Here is the stack trace: > {code} > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:45) > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:40) > at > org.apache.hadoop.hdfs.protocol.LocatedBlock.(LocatedBlock.java:84) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:878) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:826) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlockList(BlockManager.java:799) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlocks(BlockManager.java:899) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1849) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:588) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045) > at java.security.AccessController.doPrivileged(Native Method) > at 
javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10330) Add Corrupt Blocks Information in Metasave Output
[ https://issues.apache.org/jira/browse/HDFS-10330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-10330: -- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) I've committed this to trunk, branch-2 and branch-2.8. Thanks for the patch, [~kshukla]. > Add Corrupt Blocks Information in Metasave Output > - > > Key: HDFS-10330 > URL: https://issues.apache.org/jira/browse/HDFS-10330 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.7.2 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Fix For: 2.8.0 > > Attachments: HDFS-10330.001.patch, HDFS-10330.002.patch > > > Along with Datanode information and other vital block information, it would > be useful to have corrupt blocks' detailed info as part of metasave, since > currently JMX tracks only the count of corrupt nodes. This JIRA addresses > this improvement. CC: [~kihwal], [~daryn]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10330) Add Corrupt Blocks Information in Metasave Output
[ https://issues.apache.org/jira/browse/HDFS-10330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260150#comment-15260150 ] Hudson commented on HDFS-10330: --- FAILURE: Integrated in Hadoop-trunk-Commit #9680 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9680/]) HDFS-10330. Add Corrupt Blocks Information in Metasave output. (kihwal: rev 919a1d824a0a61145dc7ae59cfba3f34d91f2681) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CorruptReplicasMap.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestMetaSave.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java > Add Corrupt Blocks Information in Metasave Output > - > > Key: HDFS-10330 > URL: https://issues.apache.org/jira/browse/HDFS-10330 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.7.2 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Attachments: HDFS-10330.001.patch, HDFS-10330.002.patch > > > Along with Datanode information and other vital block information, it would > be useful to have corrupt blocks' detailed info as part of metasave, since > currently JMX tracks only the count of corrupt nodes. This JIRA addresses > this improvement. CC: [~kihwal], [~daryn]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10330) Add Corrupt Blocks Information in Metasave Output
[ https://issues.apache.org/jira/browse/HDFS-10330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260111#comment-15260111 ] Kihwal Lee commented on HDFS-10330: --- Sample output: {noformat} Corrupt Blocks: Block=123412345 Node=10.0.0.1:1004 StorageID=DS-27b3aa33-4052-4625-86d0-999234270a3f StorageState=NORMAL TotalReplicas=4 Reason=GENSTAMP_MISMATCH {noformat} +1, looks good. > Add Corrupt Blocks Information in Metasave Output > - > > Key: HDFS-10330 > URL: https://issues.apache.org/jira/browse/HDFS-10330 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.7.2 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Attachments: HDFS-10330.001.patch, HDFS-10330.002.patch > > > Along with Datanode information and other vital block information, it would > be useful to have corrupt blocks' detailed info as part of metasave, since > currently JMX tracks only the count of corrupt nodes. This JIRA addresses > this improvement. CC: [~kihwal], [~daryn]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
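For readers who want to reproduce or parse lines like the sample above, here is a minimal sketch of the formatting; the helper class and its parameters are hypothetical, only the field layout mirrors the sample metasave output.
{code}
// Hypothetical helper, not HDFS code: builds one metasave-style line for a
// corrupt replica with the same fields as the sample output above.
public class CorruptBlockLineFormatter {
  static String format(long blockId, String node, String storageId,
                       String storageState, int totalReplicas, String reason) {
    return "Block=" + blockId
        + " Node=" + node
        + " StorageID=" + storageId
        + " StorageState=" + storageState
        + " TotalReplicas=" + totalReplicas
        + " Reason=" + reason;
  }

  public static void main(String[] args) {
    System.out.println(format(123412345L, "10.0.0.1:1004",
        "DS-27b3aa33-4052-4625-86d0-999234270a3f", "NORMAL", 4,
        "GENSTAMP_MISMATCH"));
  }
}
{code}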
[jira] [Commented] (HDFS-8449) Add tasks count metrics to datanode for ECWorker
[ https://issues.apache.org/jira/browse/HDFS-8449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259993#comment-15259993 ] Hadoop QA commented on HDFS-8449: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 37s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s {color} | {color:green} trunk passed with JDK v1.8.0_92 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 54s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s {color} | {color:green} trunk passed with JDK v1.8.0_92 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 44s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 44s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s {color} | {color:green} the patch passed with JDK v1.8.0_92 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 36s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 19s {color} | {color:red} hadoop-hdfs-project/hadoop-hdfs: patch generated 2 new + 61 unchanged - 0 fixed = 63 total (was 61) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 7s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 0s {color} | {color:green} the patch passed with JDK v1.8.0_92 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 41s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 58m 33s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_92. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 55m 26s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 138m 49s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_92 Failed junit tests | hadoop.hdfs.TestHFlush | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl | | JDK v1.7.0_95 Failed junit tests | hadoop.hdfs.TestHFlush | | | hadoop.hdfs.TestLeaseRecovery2 | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:fbe3e86 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12800976/HDFS-8449-006.patch | | JIRA Issue | HDFS-8449 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux fcfa7aab1bbc 3.13.0-36-low
[jira] [Commented] (HDFS-9958) BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed storages.
[ https://issues.apache.org/jira/browse/HDFS-9958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259918#comment-15259918 ] Walter Su commented on HDFS-9958: - Failed tests are not related. Will commit shortly if there's no further comment. > BlockManager#createLocatedBlocks can throw NPE for corruptBlocks on failed > storages. > > > Key: HDFS-9958 > URL: https://issues.apache.org/jira/browse/HDFS-9958 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Attachments: HDFS-9958-Test-v1.txt, HDFS-9958.001.patch, > HDFS-9958.002.patch, HDFS-9958.003.patch, HDFS-9958.004.patch, > HDFS-9958.005.patch > > > In a scenario where the corrupt replica is on a failed storage, before it is > taken out of blocksMap, there is a race which causes the creation of > LocatedBlock on a {{machines}} array element that is not populated. > Following is the root cause, > {code} > final int numCorruptNodes = countNodes(blk).corruptReplicas(); > {code} > countNodes only looks at nodes with storage state as NORMAL, which in the > case where corrupt replica is on failed storage will amount to > numCorruptNodes being zero. > {code} > final int numNodes = blocksMap.numNodes(blk); > {code} > However, numNodes will count all nodes/storages irrespective of the state of > the storage. Therefore numMachines will include such (failed) nodes. The > assert would fail only if the system is enabled to catch Assertion errors, > otherwise it goes ahead and tries to create LocatedBlock object for that is > not put in the {{machines}} array. > Here is the stack trace: > {code} > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:45) > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.toDatanodeInfos(DatanodeStorageInfo.java:40) > at > org.apache.hadoop.hdfs.protocol.LocatedBlock.(LocatedBlock.java:84) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:878) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlock(BlockManager.java:826) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlockList(BlockManager.java:799) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.createLocatedBlocks(BlockManager.java:899) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1849) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:588) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045) > at java.security.AccessController.doPrivileged(Native Method) > at 
javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6187) Update the document of hftp / hsftp in branch-2
[ https://issues.apache.org/jira/browse/HDFS-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259891#comment-15259891 ] Hadoop QA commented on HDFS-6187: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 9m 40s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 15s {color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 23s {color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 6s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 29m 10s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:babe025 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12800980/HDFS-6187.branch-2.001.patch | | JIRA Issue | HDFS-6187 | | Optional Tests | asflicense mvnsite | | uname | Linux 4d3c5ea17b91 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | branch-2 / 9d3ddb0 | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/15308/console | | Powered by | Apache Yetus 0.2.0 http://yetus.apache.org | This message was automatically generated. > Update the document of hftp / hsftp in branch-2 > --- > > Key: HDFS-6187 > URL: https://issues.apache.org/jira/browse/HDFS-6187 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Haohui Mai > Labels: newbie > Attachments: HDFS-6187.001.patch, HDFS-6187.branch-2.001.patch > > > HDFS-5570 has removed hftp / hsftp from trunk. The documentation of hftp / > hsftp in branch-2 need to be updated to indicate that these two filesystems > are deprecated in 2.x and will be unavailable in 3.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10332) hdfs-native-client fails to build with CMake 2.8.11 or earlier
[ https://issues.apache.org/jira/browse/HDFS-10332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tibor Kiss updated HDFS-10332: -- Attachment: (was: HDFS-10332-HDFS-8707.001.patch) > hdfs-native-client fails to build with CMake 2.8.11 or earlier > -- > > Key: HDFS-10332 > URL: https://issues.apache.org/jira/browse/HDFS-10332 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Tibor Kiss >Assignee: Tibor Kiss >Priority: Minor > Attachments: HDFS-10332.01.patch, HDFS-10332.HDFS-8707.001.patch > > > Due to a new syntax introduced in CMake 2.8.12 (get_filename_component's > function when VAR=DIRECTORY) the native-client won't build. > Currently RHEL6 & 7 are using older version of CMake. > Error log: > {noformat} > [INFO] --- maven-antrun-plugin:1.7:run (make) @ hadoop-hdfs-native-client --- > [INFO] Executing tasks > main: > [exec] JAVA_HOME=, > JAVA_JVM_LIBRARY=/usr/java/jdk1.7.0_79/jre/lib/amd64/server/libjvm.so > [exec] JAVA_INCLUDE_PATH=/usr/java/jdk1.7.0_79/include, > JAVA_INCLUDE_PATH2=/usr/java/jdk1.7.0_79/include/linux > [exec] Located all JNI components successfully. > [exec] -- Could NOT find PROTOBUF (missing: PROTOBUF_LIBRARY > PROTOBUF_INCLUDE_DIR) > [exec] -- valgrind location: MEMORYCHECK_COMMAND-NOTFOUND > [exec] -- checking for module 'fuse' > [exec] -- package 'fuse' not found > [exec] -- Failed to find Linux FUSE libraries or include files. Will > not build FUSE client. > [exec] -- Configuring incomplete, errors occurred! > [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 > (get_filename_component): > [exec] get_filename_component unknown component DIRECTORY > [exec] Call Stack (most recent call first): > [exec] main/native/libhdfspp/CMakeLists.txt:95 (copy_on_demand) > [exec] > [exec] > [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 > (get_filename_component): > [exec] get_filename_component unknown component DIRECTORY > [exec] Call Stack (most recent call first): > [exec] main/native/libhdfspp/CMakeLists.txt:96 (copy_on_demand) > [exec] > [exec] > [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 > (get_filename_component): > [exec] get_filename_component unknown component DIRECTORY > [exec] Call Stack (most recent call first): > [exec] main/native/libhdfspp/CMakeLists.txt:97 (copy_on_demand) > [exec] > [exec] > [exec] CMake Error at main/native/libhdfspp/CMakeLists.txt:71 > (get_filename_component): > [exec] get_filename_component unknown component DIRECTORY > [exec] Call Stack (most recent call first): > [exec] main/native/libhdfspp/CMakeLists.txt:98 (copy_on_demand) > [exec] > [exec] > [exec] CMake Error: The following variables are used in this project, > but they are set to NOTFOUND. 
> [exec] Please set them or make sure they are set and tested correctly in > the CMake files: > [exec] PROTOBUF_LIBRARY (ADVANCED) > [exec] linked by target "hdfspp" in directory > /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp > [exec] linked by target "hdfspp_static" in directory > /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp > [exec] linked by target "protoc-gen-hrpc" in directory > /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/proto > [exec] linked by target "bad_datanode_test" in directory > /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests > [exec] linked by target "hdfs_builder_test" in directory > /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests > [exec] linked by target "hdfspp_errors_test" in directory > /home/tiborkiss/devel/workspace > [exec] > /hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests > [exec] linked by target "libhdfs_threaded_hdfspp_test_shim_static" > in directory > /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/test > [exec] linked by target "logging_test" in directory > /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests > [exec] linked by target "node_exclusion_test" in directory > /home/tiborkiss/devel/workspace/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhd
[jira] [Commented] (HDFS-10175) add per-operation stats to FileSystem.Statistics
[ https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259851#comment-15259851 ] Steve Loughran commented on HDFS-10175: --- bq. One thing that is a bit concerning about metrics2 is that I think people feel that this interface should be stable (i.e. don't remove or alter things once they're in), which would be a big constraint on us. Ops teams don't like metrics they rely on being taken away; they also view published metrics as the API. As the compatibility docs say, "Metrics should preserve compatibility within the major release." bq. Perhaps we could document that per-fs stats were @Public @Evolving rather than stable +1 to that, though it'll be important not to break binary compatibility with external filesystems. FWIW, the main issue I have with metrics2, apart from "steve doesn't understand the design fully", is its preference for singletons registered with JMX. This makes sense in deployed services, not for tests. bq. Do we have any ideas about how Spark will consume these metrics in the longer term? Spark is a Coda Hale-instrumented codebase, as are many other apps these days. Integration between Hadoop metrics of any form and the Coda Hale libraries would be something to address, but not here. > add per-operation stats to FileSystem.Statistics > > > Key: HDFS-10175 > URL: https://issues.apache.org/jira/browse/HDFS-10175 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Reporter: Ram Venkatesh >Assignee: Mingliang Liu > Attachments: HDFS-10175.000.patch, HDFS-10175.001.patch, > HDFS-10175.002.patch, HDFS-10175.003.patch, HDFS-10175.004.patch, > HDFS-10175.005.patch, HDFS-10175.006.patch, TestStatisticsOverhead.java > > > Currently FileSystem.Statistics exposes the following statistics: > BytesRead > BytesWritten > ReadOps > LargeReadOps > WriteOps > These are in turn exposed as job counters by MapReduce and other frameworks. > There is logic within DfsClient to map operations to these counters, which can > be confusing; for instance, mkdirs counts as a writeOp. > Proposed enhancement: > Add a statistic for each DfsClient operation, including create, append, > createSymlink, delete, exists, mkdirs and rename, and expose them as new > properties on the Statistics object. The operation-specific counters can be > used for analyzing the load imposed by a particular job on HDFS. > For example, we can use them to identify jobs that end up creating a large > number of files. > Once this information is available in the Statistics object, app frameworks > like MapReduce can expose them as additional counters to be aggregated and > recorded as part of the job summary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
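The @Public/@Evolving idea discussed above would look roughly like the sketch below. The annotations (org.apache.hadoop.classification.InterfaceAudience and InterfaceStability) are existing Hadoop annotations; the class and method names are assumptions made for illustration, not the HDFS-10175 patch.
{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;

// Hypothetical per-operation counter holder: public to external consumers,
// but explicitly allowed to evolve between releases.
@InterfaceAudience.Public
@InterfaceStability.Evolving
public class PerOperationStatistics {
  private final ConcurrentHashMap<String, AtomicLong> counters =
      new ConcurrentHashMap<String, AtomicLong>();

  /** Bump the counter for one client operation, e.g. "mkdirs" or "rename". */
  public void incrementOpCounter(String op) {
    AtomicLong c = counters.get(op);
    if (c == null) {
      AtomicLong created = new AtomicLong();
      AtomicLong existing = counters.putIfAbsent(op, created);
      c = (existing == null) ? created : existing;
    }
    c.incrementAndGet();
  }

  public long getOpCount(String op) {
    AtomicLong c = counters.get(op);
    return c == null ? 0L : c.get();
  }
}
{code}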
[jira] [Commented] (HDFS-10175) add per-operation stats to FileSystem.Statistics
[ https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259838#comment-15259838 ] Steve Loughran commented on HDFS-10175: --- Wednesday, April 26, 10:30 AM PST; 18:30 UK, 17:30 GMT works for me. Webex binding? If you don't specify one, mine is at https://hortonworks.webex.com/meet/stevel > add per-operation stats to FileSystem.Statistics > > > Key: HDFS-10175 > URL: https://issues.apache.org/jira/browse/HDFS-10175 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Reporter: Ram Venkatesh >Assignee: Mingliang Liu > Attachments: HDFS-10175.000.patch, HDFS-10175.001.patch, > HDFS-10175.002.patch, HDFS-10175.003.patch, HDFS-10175.004.patch, > HDFS-10175.005.patch, HDFS-10175.006.patch, TestStatisticsOverhead.java > > > Currently FileSystem.Statistics exposes the following statistics: > BytesRead > BytesWritten > ReadOps > LargeReadOps > WriteOps > These are in turn exposed as job counters by MapReduce and other frameworks. > There is logic within DfsClient to map operations to these counters, which can > be confusing; for instance, mkdirs counts as a writeOp. > Proposed enhancement: > Add a statistic for each DfsClient operation, including create, append, > createSymlink, delete, exists, mkdirs and rename, and expose them as new > properties on the Statistics object. The operation-specific counters can be > used for analyzing the load imposed by a particular job on HDFS. > For example, we can use them to identify jobs that end up creating a large > number of files. > Once this information is available in the Statistics object, app frameworks > like MapReduce can expose them as additional counters to be aggregated and > recorded as part of the job summary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6187) Update the document of hftp / hsftp in branch-2
[ https://issues.apache.org/jira/browse/HDFS-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gergely Novák updated HDFS-6187: Attachment: HDFS-6187.branch-2.001.patch > Update the document of hftp / hsftp in branch-2 > --- > > Key: HDFS-6187 > URL: https://issues.apache.org/jira/browse/HDFS-6187 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Haohui Mai > Labels: newbie > Attachments: HDFS-6187.001.patch, HDFS-6187.branch-2.001.patch > > > HDFS-5570 has removed hftp / hsftp from trunk. The documentation of hftp / > hsftp in branch-2 needs to be updated to indicate that these two filesystems > are deprecated in 2.x and will be unavailable in 3.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6187) Update the document of hftp / hsftp in branch-2
[ https://issues.apache.org/jira/browse/HDFS-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259827#comment-15259827 ] Hadoop QA commented on HDFS-6187: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s {color} | {color:red} HDFS-6187 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12800971/HDFS-6187.001.patch | | JIRA Issue | HDFS-6187 | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/15306/console | | Powered by | Apache Yetus 0.2.0 http://yetus.apache.org | This message was automatically generated. > Update the document of hftp / hsftp in branch-2 > --- > > Key: HDFS-6187 > URL: https://issues.apache.org/jira/browse/HDFS-6187 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Haohui Mai > Labels: newbie > Attachments: HDFS-6187.001.patch > > > HDFS-5570 has removed hftp / hsftp from trunk. The documentation of hftp / > hsftp in branch-2 need to be updated to indicate that these two filesystems > are deprecated in 2.x and will be unavailable in 3.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8449) Add tasks count metrics to datanode for ECWorker
[ https://issues.apache.org/jira/browse/HDFS-8449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Bo updated HDFS-8449: Attachment: HDFS-8449-006.patch > Add tasks count metrics to datanode for ECWorker > > > Key: HDFS-8449 > URL: https://issues.apache.org/jira/browse/HDFS-8449 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Li Bo >Assignee: Li Bo > Attachments: HDFS-8449-000.patch, HDFS-8449-001.patch, > HDFS-8449-002.patch, HDFS-8449-003.patch, HDFS-8449-004.patch, > HDFS-8449-005.patch, HDFS-8449-006.patch > > > This subtask tries to record the EC recovery tasks that a datanode has done, > including total tasks, failed tasks and successful tasks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6187) Update the document of hftp / hsftp in branch-2
[ https://issues.apache.org/jira/browse/HDFS-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gergely Novák updated HDFS-6187: Status: Patch Available (was: Open) > Update the document of hftp / hsftp in branch-2 > --- > > Key: HDFS-6187 > URL: https://issues.apache.org/jira/browse/HDFS-6187 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Haohui Mai > Labels: newbie > Attachments: HDFS-6187.001.patch > > > HDFS-5570 has removed hftp / hsftp from trunk. The documentation of hftp / > hsftp in branch-2 needs to be updated to indicate that these two filesystems > are deprecated in 2.x and will be unavailable in 3.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6187) Update the document of hftp / hsftp in branch-2
[ https://issues.apache.org/jira/browse/HDFS-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259812#comment-15259812 ] Gergely Novák commented on HDFS-6187: - Added a deprecation notice to the HFTP Guide document in patch #001. [~wheat9] is this what you meant? > Update the document of hftp / hsftp in branch-2 > --- > > Key: HDFS-6187 > URL: https://issues.apache.org/jira/browse/HDFS-6187 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Haohui Mai > Labels: newbie > Attachments: HDFS-6187.001.patch > > > HDFS-5570 has removed hftp / hsftp from trunk. The documentation of hftp / > hsftp in branch-2 needs to be updated to indicate that these two filesystems > are deprecated in 2.x and will be unavailable in 3.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6187) Update the document of hftp / hsftp in branch-2
[ https://issues.apache.org/jira/browse/HDFS-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gergely Novák updated HDFS-6187: Attachment: HDFS-6187.001.patch > Update the document of hftp / hsftp in branch-2 > --- > > Key: HDFS-6187 > URL: https://issues.apache.org/jira/browse/HDFS-6187 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Haohui Mai > Labels: newbie > Attachments: HDFS-6187.001.patch > > > HDFS-5570 has removed hftp / hsftp from trunk. The documentation of hftp / > hsftp in branch-2 needs to be updated to indicate that these two filesystems > are deprecated in 2.x and will be unavailable in 3.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9758) libhdfs++: Implement Python bindings
[ https://issues.apache.org/jira/browse/HDFS-9758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259810#comment-15259810 ] Tibor Kiss commented on HDFS-9758: -- We have several options to implement Python bindings for the pure C++ HDFS client: - CFFI (MIT License) - cppyy (MIT License) - Ctypes (MIT License) - Cython (Apache License) - SWIG (GPL License) - Boost.Python (Boost Software License) - pure Python extensions While CFFI is simple and clean, it does not support C++. cppyy would be a great choice, but it only supports PyPy at this time. Ctypes has been integrated into CPython since 2.5, but its C++ support is not great. Cython supports both C and C++ and seems a reasonable choice. SWIG also supports C and C++, and it could later be used to bring support for other scripting languages; its licensing could be a problem. Boost.Python seems to have great C++ support at first glance; its license needs to be studied. Thoughts / feelings / preferences? > libhdfs++: Implement Python bindings > > > Key: HDFS-9758 > URL: https://issues.apache.org/jira/browse/HDFS-9758 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: James Clampffer > > It'd be really useful to have bindings for various scripting languages. > Python would be a good start because of its popularity and how easy it is to > interact with shared libraries using the ctypes module. I think bindings for > the V8 engine that nodeJS uses would be a close second in terms of expanding > the potential user base. > Probably worth starting with just adding a synchronous API and building from > there, to avoid interactions with Python's garbage collector until the > bindings prove to be solid. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259688#comment-15259688 ] Konstantin Shvachko commented on HDFS-10301: ??Maybe I'm misunderstanding the proposal, but don't we already do all of this??? Yes, you misunderstood. This part is not my proposal. This is what we already do, and therefore I call them *Constraints*, because they complicate the *Problem*. The proposal is in the third bullet point titled *Approach*. ??What does the NameNode do if the DataNode is restarted while sending these RPCs, so that it never gets a chance to send all the storages that it claimed existed? It seems like you will get stuck?? No, I will not get stuck. All br-RPCs are completely independent of each other. It's just that one of them carries all storages and indicates to the NameNode that it should update its storage list for the DataNode. The NN processes as many such RPCs as the DN sends. If the DN dies, the NN will declare it dead in due time; if the DN restarts within 10 minutes, it will send a new set of block reports from scratch. I do not see any inconsistencies. You can think of it as a new operation, SyncStorages, which does just that: it updates the NameNode's knowledge of the DN's storages. I combined this operation with the first br-RPC. One can combine it with any other call, same as you propose to combine it with the heartbeat. Except that seems a poor idea, since we don't want to wait for removal of thousands of replicas on a heartbeat. ??interleaved block reports are extremely rare?? You keep saying this. But it is not rare for me. Are you convincing me not to believe my eyes, or that you checked the logs on your thousands of clusters? I did check mine. > BlockReport retransmissions may lead to storages falsely being declared > zombie if storage report processing happens out of order > > > Key: HDFS-10301 > URL: https://issues.apache.org/jira/browse/HDFS-10301 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.1 >Reporter: Konstantin Shvachko >Assignee: Colin Patrick McCabe >Priority: Critical > Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, > HDFS-10301.01.patch, zombieStorageLogs.rtf > > > When the NameNode is busy, a DataNode can time out sending a block report and > then sends the block report again. The NameNode, while processing these two > reports at the same time, can interleave processing of storages from different > reports. This screws up the blockReportId field, which makes the NameNode think > that some storages are zombie. Replicas from zombie storages are immediately > removed, causing missing blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
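A minimal sketch of the SyncStorages idea as described above, assuming a simple flag that marks the report RPC carrying every storage of the DataNode; this is a conceptual illustration only, not the HDFS-10301 patch or the actual DatanodeProtocol.
{code}
import java.util.HashSet;
import java.util.Set;

// Conceptual sketch only: when one block-report RPC enumerates every storage
// of a DataNode, the NameNode can prune stale storages in one step instead of
// relying on zombie-storage heuristics across interleaved reports.
public class NamenodeStorageView {
  private final Set<String> knownStorages = new HashSet<String>();

  void processReport(Set<String> reportedStorages, boolean hasAllStorages) {
    if (hasAllStorages) {
      // Drop any storage the DataNode no longer reports; safe because this
      // single RPC is the full list, independent of other in-flight reports.
      knownStorages.retainAll(reportedStorages);
    }
    knownStorages.addAll(reportedStorages);
  }

  Set<String> getKnownStorages() {
    return knownStorages;
  }
}
{code}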
[jira] [Commented] (HDFS-10336) TestBalancer failing intermittently because of not reseting UserGroupInformation completely
[ https://issues.apache.org/jira/browse/HDFS-10336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259683#comment-15259683 ] Hadoop QA commented on HDFS-10336: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 30s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 39s {color} | {color:green} trunk passed with JDK v1.8.0_92 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 42s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 5s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 47s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 27s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 28s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 0s {color} | {color:green} trunk passed with JDK v1.8.0_92 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 49s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 41s {color} | {color:green} the patch passed with JDK v1.8.0_92 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 41s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 41s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 41s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 4s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 47s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 27s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 58s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 59s {color} | {color:green} the patch passed with JDK v1.8.0_92 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 46s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 6m 52s {color} | {color:green} hadoop-common in the patch passed with JDK v1.8.0_92. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 57m 55s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_92. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 20s {color} | {color:green} hadoop-common in the patch passed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 53m 30s {color} | {color:green} hadoop-hdfs in the patch passed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 184m 14s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_92 Failed junit