[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419823#comment-15419823 ] Colin P. McCabe commented on HDFS-10301: I don't think the heartbeat is the right place to handle reconciling the block storages. One reason is because this adds extra complexity and time to the heartbeat, which happens far more frequently than an FBR. We even talked about making the heartbeat lockless-- clearly you can't do that if you are traversing all the block storages. Taking the FSN lock is expensive and heartbeats are sent quite frequently from each DN-- every few seconds. Another reason reconciling storages in heartbeats is bad is because if the heartbeat tells you about a new storage, you won't know what blocks are in it until the FBR arrives. So the NN may end up assigning a bunch of new blocks to a storage which looks empty, but really is full. I came up with what I believe is the correct patch to fix this problem months ago. It's here as https://issues.apache.org/jira/secure/attachment/12805931/HDFS-10301.005.patch . It doesn't modify any RPCs or add any new mechanisms. Instead, it just fixes the obvious bug in the HDFS-7960 logic. The only counter-argument to applying patch 005 that anyone ever came up with is that it doesn't eliminate zombies when FBRs get interleaved. But this is not a good counter-argument, since FBR interleaving is extremely, extremely rare in well-run clusters. The proof should be obvious-- if FBR interleaving happened on more clusters, more people would hit this serious data loss bug. This JIRA has been extremely frustrating. It seems like most, if not all, of the points that I brought up in my reviews were ignored. I talked about the obvious problems with compatibility with [~shv]'s solution and even explicitly asked him to test the upgrade case. I told him that this JIRA was a bad one to give to a promising new contributor such as [~redvine], because it required a lot of context and was extremely tricky. Both myself and [~andrew.wang] commented that overloading BlockListAsLongs was confusing and not necessary. The patch confused "not modifying the .proto file" with "not modifying the RPC content" which are two very separate concepts, as I commented over and over. Clearly these comments were ignored. If anything, I think [~shv] got very lucky that the bug manifested itself quickly rather than creating a serious data loss situation a few months down the road, like the one I had to debug when fixing HDFS-7960. Again I would urge you to just commit patch 005. Or at least evaluate it. > BlockReport retransmissions may lead to storages falsely being declared > zombie if storage report processing happens out of order > > > Key: HDFS-10301 > URL: https://issues.apache.org/jira/browse/HDFS-10301 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.1 >Reporter: Konstantin Shvachko >Assignee: Vinitha Reddy Gankidi >Priority: Critical > Fix For: 2.7.4 > > Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, > HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, > HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, > HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, > HDFS-10301.012.patch, HDFS-10301.013.patch, HDFS-10301.branch-2.7.patch, > HDFS-10301.branch-2.patch, HDFS-10301.sample.patch, zombieStorageLogs.rtf > > > When NameNode is busy a DataNode can timeout sending a block report. 
Then it > sends the block report again. Then the NameNode, while processing these two reports > at the same time, can interleave processing of storages from different reports. > This screws up the blockReportId field, which makes the NameNode think that some > storages are zombies. Replicas from zombie storages are immediately removed, > causing missing blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
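To make the failure mode in the description concrete, the following is a minimal, self-contained sketch of the HDFS-7960 zombie-storage idea the comment refers to; the class and field names are illustrative, not the actual NameNode code. Each storage is stamped with the id of the block report that last touched it, and any storage left with a stale stamp once a full report finishes is treated as a zombie. If a retransmitted report and the original are processed interleaved, storages stamped by one report look stale to the other, which is exactly the false-zombie case described above.

{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative model of per-storage block-report stamping (not NameNode code). */
class ZombieDetectionSketch {
  static class Storage {
    final String id;
    long lastBlockReportId;   // stamped when a report for this storage is processed
    Storage(String id) { this.id = id; }
  }

  private final Map<String, Storage> storages = new HashMap<>();

  void processStorageReport(long blockReportId, String storageId) {
    storages.computeIfAbsent(storageId, Storage::new).lastBlockReportId = blockReportId;
  }

  /** Called after the last storage of a full block report has been processed. */
  List<Storage> findZombies(long blockReportId) {
    List<Storage> zombies = new ArrayList<>();
    for (Storage s : storages.values()) {
      // A storage not stamped by the current report looks like it no longer exists.
      // If two reports (say id=1 and id=2) are processed interleaved, storages
      // stamped by report 1 are falsely flagged here while report 2 finishes.
      if (s.lastBlockReportId != blockReportId) {
        zombies.add(s);
      }
    }
    return zombies;
  }
}
{code}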
[jira] [Updated] (HDFS-10752) Several log refactoring/improvement suggestions in HDFS
[ https://issues.apache.org/jira/browse/HDFS-10752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemo Chen updated HDFS-10752: - Description: As per conversation with [~vrushalic], we merged HDFS-10749, HDFS-10750, HDFS-10751, HDFS-10753, under this issue. HDFS-10749 *Method invocation in logs can be replaced by variable* Similar to the fix for HDFS-409. In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java In code block: {code:borderStyle=solid} lastQueuedSeqno = currentPacket.getSeqno(); if (DFSClient.LOG.isDebugEnabled()) { DFSClient.LOG.debug("Queued packet " + currentPacket.getSeqno()); } {code} currentPacket.getSeqno() is better to be replaced by variable lastQueuedSeqno. HDFS-10750 Similar to the fix for AVRO-115. In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java in line 695, the logging code: {code:borderStyle=solid} LOG.info(getRole() + " RPC up at: " + rpcServer.getRpcAddress()); {code} In the same class, there is a method in line 907: {code:borderStyle=solid} /** * @return NameNode RPC address */ public InetSocketAddress getNameNodeAddress() { return rpcServer.getRpcAddress(); } {code} We can tell that rpcServer.getRpcAddress() could be replaced by method getNameNodeAddress() for the case of readability and simplicity HDFS-10751 Similar to the fix for AVRO-115. In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtxCache.java in line 72, the logging code: {code:borderStyle=solid} LOG.trace("openFileMap size:" + openFileMap.size()); {code} In the same class, there is a method in line 189: {code:borderStyle=solid} int size() { return openFileMap.size(); } {code} We can tell that openFileMap.size() could be replaced by method size() for the case of readability and simplicity *Print variable in byte* Similar to the fix for HBASE-623, in file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupImage.java In the following method, the log printed variable data (in byte[]). A possible fix is add Bytes.toString(data). {code} /** * Write the batch of edits to the local copy of the edit logs. */ private void logEditsLocally(long firstTxId, int numTxns, byte[] data) { long expectedTxId = editLog.getLastWrittenTxId() + 1; Preconditions.checkState(firstTxId == expectedTxId, "received txid batch starting at %s but expected txn %s", firstTxId, expectedTxId); editLog.setNextTxId(firstTxId + numTxns - 1); editLog.logEdit(data.length, data); editLog.logSync(); } {code} HDFS-10753 *MethodInvocation replaced by variable due to toString method* Similar to the fix in HADOOP-6419, in file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CorruptReplicasMap.java in line 76, the blk.getBlockName() method invocation is invoked on variable blk. "blk" is the class instance of Block. {code} void addToCorruptReplicasMap(Block blk, DatanodeDescriptor dn, String reason, Reason reasonCode) { ... 
NameNode.blockStateChangeLog.info( "BLOCK NameSystem.addToCorruptReplicasMap: {} added as corrupt on " + "{} by {} {}", blk.getBlockName(), dn, Server.getRemoteIp(), reasonText); {code} In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/Block.java {code} @Override public String toString() { return getBlockName() + "_" + getGenerationStamp(); } {code} The toString() method contain not only getBlockName() but also getGenerationStamp which may be helpful for debugging purpose. Therefore blk.getBlockName() can be replaced by blk was: As per conversation with [~vrushalic], we merged HDFS-10749, HDFS-10750, HDFS-10751, HDFS-10753, under this issue. --- HDFS-10749 *Method invocation in logs can be replaced by variable* Similar to the fix for HDFS-409. In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java In code block: {code:borderStyle=solid} lastQueuedSeqno = currentPacket.getSeqno(); if (DFSClient.LOG.isDebugEnabled()) { DFSClient.LOG.debug("Queued packet " + currentPacket.getSeqno()); } {code} currentPacket.getSeqno() is better to be replaced by variable lastQueuedSeqno. --- HDFS-10750 Similar to the fix for AVRO-115. In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java in line 695, the logging code: {code:borderStyle=solid} LOG.info(getRole() + " RPC up at: " + rpcServer.getRpcAddress()); {code} In the same class, there is
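Taken together, the suggestions above amount to small rewrites of the individual log statements. As a rough sketch of what the refactored calls could look like (reusing the variable and accessor names from the snippets quoted above; this illustrates the proposal, not a committed change):

{code}
// DFSOutputStream: reuse the variable that was just assigned instead of
// calling getSeqno() a second time.
lastQueuedSeqno = currentPacket.getSeqno();
if (DFSClient.LOG.isDebugEnabled()) {
  DFSClient.LOG.debug("Queued packet " + lastQueuedSeqno);
}

// NameNode: log through the public accessor rather than reaching into rpcServer.
LOG.info(getRole() + " RPC up at: " + getNameNodeAddress());

// OpenFileCtxCache: use the existing size() helper.
LOG.trace("openFileMap size:" + size());

// CorruptReplicasMap: pass the Block itself so its toString() also carries the
// generation stamp in the message.
NameNode.blockStateChangeLog.info(
    "BLOCK NameSystem.addToCorruptReplicasMap: {} added as corrupt on {} by {} {}",
    blk, dn, Server.getRemoteIp(), reasonText);
{code}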
[jira] [Updated] (HDFS-10752) Several log refactoring/improvement suggestions in HDFS
[ https://issues.apache.org/jira/browse/HDFS-10752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemo Chen updated HDFS-10752: - Description: As per conversation with [~vrushalic], we merged HDFS-10749, HDFS-10750, HDFS-10751, HDFS-10753, under this issue. --- HDFS-10749 *Method invocation in logs can be replaced by variable* Similar to the fix for HDFS-409. In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java In code block: {code:borderStyle=solid} lastQueuedSeqno = currentPacket.getSeqno(); if (DFSClient.LOG.isDebugEnabled()) { DFSClient.LOG.debug("Queued packet " + currentPacket.getSeqno()); } {code} currentPacket.getSeqno() is better to be replaced by variable lastQueuedSeqno. --- HDFS-10750 Similar to the fix for AVRO-115. In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java in line 695, the logging code: {code:borderStyle=solid} LOG.info(getRole() + " RPC up at: " + rpcServer.getRpcAddress()); {code} In the same class, there is a method in line 907: {code:borderStyle=solid} /** * @return NameNode RPC address */ public InetSocketAddress getNameNodeAddress() { return rpcServer.getRpcAddress(); } {code} We can tell that rpcServer.getRpcAddress() could be replaced by method getNameNodeAddress() for the case of readability and simplicity --- HDFS-10751 Similar to the fix for AVRO-115. In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtxCache.java in line 72, the logging code: {code:borderStyle=solid} LOG.trace("openFileMap size:" + openFileMap.size()); {code} In the same class, there is a method in line 189: {code:borderStyle=solid} int size() { return openFileMap.size(); } {code} We can tell that openFileMap.size() could be replaced by method size() for the case of readability and simplicity --- *Print variable in byte* Similar to the fix for HBASE-623, in file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupImage.java In the following method, the log printed variable data (in byte[]). A possible fix is add Bytes.toString(data). {code} /** * Write the batch of edits to the local copy of the edit logs. */ private void logEditsLocally(long firstTxId, int numTxns, byte[] data) { long expectedTxId = editLog.getLastWrittenTxId() + 1; Preconditions.checkState(firstTxId == expectedTxId, "received txid batch starting at %s but expected txn %s", firstTxId, expectedTxId); editLog.setNextTxId(firstTxId + numTxns - 1); editLog.logEdit(data.length, data); editLog.logSync(); } {code} HDFS-10753 *MethodInvocation replaced by variable due to toString method* Similar to the fix in HADOOP-6419, in file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CorruptReplicasMap.java in line 76, the blk.getBlockName() method invocation is invoked on variable blk. "blk" is the class instance of Block. {code} void addToCorruptReplicasMap(Block blk, DatanodeDescriptor dn, String reason, Reason reasonCode) { ... 
NameNode.blockStateChangeLog.info( "BLOCK NameSystem.addToCorruptReplicasMap: {} added as corrupt on " + "{} by {} {}", blk.getBlockName(), dn, Server.getRemoteIp(), reasonText); {code} In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/Block.java {code} @Override public String toString() { return getBlockName() + "_" + getGenerationStamp(); } {code} The toString() method contain not only getBlockName() but also getGenerationStamp which may be helpful for debugging purpose. Therefore blk.getBlockName() can be replaced by blk was: As per conversation with [~vrushalic], we merged HDFS-10749, HDFS-10750, HDFS-10751, HDFS-10753, under this issue. --- HDFS-10749 *Method invocation in logs can be replaced by variable* Similar to the fix for HDFS-409. In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java In code block: {code:borderStyle=solid} lastQueuedSeqno = currentPacket.getSeqno(); if (DFSClient.LOG.isDebugEnabled()) { DFSClient.LOG.debug("Queued packet " + currentPacket.getSeqno()); } {code} currentPacket.getSeqno() is better to be replaced by variable lastQueuedSeqno. --- HDFS-10750 Similar to the fix for AVRO-115. In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java in line 695, the logging code: {code:borderStyle=solid} LOG.info(getRole() + " RPC up at: " + rpcServer.getRpcAddress()); {code} In the same class, there is a me
[jira] [Updated] (HDFS-10752) Several log refactoring/improvement suggestions in HDFS
[ https://issues.apache.org/jira/browse/HDFS-10752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemo Chen updated HDFS-10752: - Description: As per conversation with [~vrushalic], we merged HDFS-10749, HDFS-10750, HDFS-10751, HDFS-10753, under this issue. --- HDFS-10749 *Method invocation in logs can be replaced by variable* Similar to the fix for HDFS-409. In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java In code block: {code:borderStyle=solid} lastQueuedSeqno = currentPacket.getSeqno(); if (DFSClient.LOG.isDebugEnabled()) { DFSClient.LOG.debug("Queued packet " + currentPacket.getSeqno()); } {code} currentPacket.getSeqno() is better to be replaced by variable lastQueuedSeqno. --- *Print variable in byte* Similar to the fix for HBASE-623, in file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupImage.java In the following method, the log printed variable data (in byte[]). A possible fix is add Bytes.toString(data). {code} /** * Write the batch of edits to the local copy of the edit logs. */ private void logEditsLocally(long firstTxId, int numTxns, byte[] data) { long expectedTxId = editLog.getLastWrittenTxId() + 1; Preconditions.checkState(firstTxId == expectedTxId, "received txid batch starting at %s but expected txn %s", firstTxId, expectedTxId); editLog.setNextTxId(firstTxId + numTxns - 1); editLog.logEdit(data.length, data); editLog.logSync(); } {code} HDFS-10753 *MethodInvocation replaced by variable due to toString method* Similar to the fix in HADOOP-6419, in file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CorruptReplicasMap.java in line 76, the blk.getBlockName() method invocation is invoked on variable blk. "blk" is the class instance of Block. {code} void addToCorruptReplicasMap(Block blk, DatanodeDescriptor dn, String reason, Reason reasonCode) { ... NameNode.blockStateChangeLog.info( "BLOCK NameSystem.addToCorruptReplicasMap: {} added as corrupt on " + "{} by {} {}", blk.getBlockName(), dn, Server.getRemoteIp(), reasonText); {code} In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/Block.java {code} @Override public String toString() { return getBlockName() + "_" + getGenerationStamp(); } {code} The toString() method contain not only getBlockName() but also getGenerationStamp which may be helpful for debugging purpose. Therefore blk.getBlockName() can be replaced by blk was: As per conversation with [~vrushalic], we merged HDFS-10749, HDFS-10750, HDFS-10751, HDFS-10753, under this issue. --- HDFS-10749 *Method invocation in logs can be replaced by variable* Similar to the fix for HDFS-409. In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java In code block: {code:borderStyle=solid} lastQueuedSeqno = currentPacket.getSeqno(); if (DFSClient.LOG.isDebugEnabled()) { DFSClient.LOG.debug("Queued packet " + currentPacket.getSeqno()); } {code} currentPacket.getSeqno() is better to be replaced by variable lastQueuedSeqno. --- *Print variable in byte* Similar to the fix for HBASE-623, in file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupImage.java In the following method, the log printed variable data (in byte[]). 
A possible fix is add Bytes.toString(data). {code} /** * Write the batch of edits to the local copy of the edit logs. */ private void logEditsLocally(long firstTxId, int numTxns, byte[] data) { long expectedTxId = editLog.getLastWrittenTxId() + 1; Preconditions.checkState(firstTxId == expectedTxId, "received txid batch starting at %s but expected txn %s", firstTxId, expectedTxId); editLog.setNextTxId(firstTxId + numTxns - 1); editLog.logEdit(data.length, data); editLog.logSync(); } {code} *MethodInvocation replaced by variable due to toString method* Similar to the fix in HADOOP-6419, in file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CorruptReplicasMap.java in line 76, the blk.getBlockName() method invocation is invoked on variable blk. "blk" is the class instance of Block. {code} void addToCorruptReplicasMap(Block blk, DatanodeDescriptor dn, String reason, Reason reasonCode) { ... NameNode.blockStateChangeLog.info( "BLOCK NameSystem.addToCorruptReplicasMap: {} added as corrupt on " + "{} by {} {}", blk.getBlockName(), dn, Server.getRemoteIp(), reasonText); {code} In file: hadoop-rel-release-2.7.2/hadoop-h
[jira] [Updated] (HDFS-10752) Several log refactoring/improvement suggestions in HDFS
[ https://issues.apache.org/jira/browse/HDFS-10752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemo Chen updated HDFS-10752: - Description: As per conversation with [~vrushalic], we merged HDFS-10749, HDFS-10750, HDFS-10751, HDFS-10753, under this issue. --- HDFS-10749 *Method invocation in logs can be replaced by variable* Similar to the fix for HDFS-409. In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java In code block: {code:borderStyle=solid} lastQueuedSeqno = currentPacket.getSeqno(); if (DFSClient.LOG.isDebugEnabled()) { DFSClient.LOG.debug("Queued packet " + currentPacket.getSeqno()); } {code} currentPacket.getSeqno() is better to be replaced by variable lastQueuedSeqno. --- HDFS-10750 Similar to the fix for AVRO-115. In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java in line 695, the logging code: {code:borderStyle=solid} LOG.info(getRole() + " RPC up at: " + rpcServer.getRpcAddress()); {code} In the same class, there is a method in line 907: {code:borderStyle=solid} /** * @return NameNode RPC address */ public InetSocketAddress getNameNodeAddress() { return rpcServer.getRpcAddress(); } {code} We can tell that rpcServer.getRpcAddress() could be replaced by method getNameNodeAddress() for the case of readability and simplicity --- HDFS-10751 --- *Print variable in byte* Similar to the fix for HBASE-623, in file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupImage.java In the following method, the log printed variable data (in byte[]). A possible fix is add Bytes.toString(data). {code} /** * Write the batch of edits to the local copy of the edit logs. */ private void logEditsLocally(long firstTxId, int numTxns, byte[] data) { long expectedTxId = editLog.getLastWrittenTxId() + 1; Preconditions.checkState(firstTxId == expectedTxId, "received txid batch starting at %s but expected txn %s", firstTxId, expectedTxId); editLog.setNextTxId(firstTxId + numTxns - 1); editLog.logEdit(data.length, data); editLog.logSync(); } {code} HDFS-10753 *MethodInvocation replaced by variable due to toString method* Similar to the fix in HADOOP-6419, in file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CorruptReplicasMap.java in line 76, the blk.getBlockName() method invocation is invoked on variable blk. "blk" is the class instance of Block. {code} void addToCorruptReplicasMap(Block blk, DatanodeDescriptor dn, String reason, Reason reasonCode) { ... NameNode.blockStateChangeLog.info( "BLOCK NameSystem.addToCorruptReplicasMap: {} added as corrupt on " + "{} by {} {}", blk.getBlockName(), dn, Server.getRemoteIp(), reasonText); {code} In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/Block.java {code} @Override public String toString() { return getBlockName() + "_" + getGenerationStamp(); } {code} The toString() method contain not only getBlockName() but also getGenerationStamp which may be helpful for debugging purpose. Therefore blk.getBlockName() can be replaced by blk was: As per conversation with [~vrushalic], we merged HDFS-10749, HDFS-10750, HDFS-10751, HDFS-10753, under this issue. --- HDFS-10749 *Method invocation in logs can be replaced by variable* Similar to the fix for HDFS-409. 
In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java In code block: {code:borderStyle=solid} lastQueuedSeqno = currentPacket.getSeqno(); if (DFSClient.LOG.isDebugEnabled()) { DFSClient.LOG.debug("Queued packet " + currentPacket.getSeqno()); } {code} currentPacket.getSeqno() is better to be replaced by variable lastQueuedSeqno. --- *Print variable in byte* Similar to the fix for HBASE-623, in file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupImage.java In the following method, the log printed variable data (in byte[]). A possible fix is add Bytes.toString(data). {code} /** * Write the batch of edits to the local copy of the edit logs. */ private void logEditsLocally(long firstTxId, int numTxns, byte[] data) { long expectedTxId = editLog.getLastWrittenTxId() + 1; Preconditions.checkState(firstTxId == expectedTxId, "received txid batch starting at %s but expected txn %s", firstTxId, expectedTxId); editLog.setNextTxId(firstTxId + numTxns - 1); editLog.logEdit(data.length, data); editLog.logSync(); } {code} HDFS-10753 *MethodInvocation replaced by variable due to toSt
[jira] [Updated] (HDFS-10752) Several log refactoring/improvement suggestions in HDFS
[ https://issues.apache.org/jira/browse/HDFS-10752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemo Chen updated HDFS-10752: - Description: As per conversation with [~vrushalic], we merged HDFS-10749, HDFS-10750, HDFS-10751, HDFS-10753, under this issue. --- HDFS-10749 *Method invocation in logs can be replaced by variable* Similar to the fix for HDFS-409. In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java In code block: {code:borderStyle=solid} lastQueuedSeqno = currentPacket.getSeqno(); if (DFSClient.LOG.isDebugEnabled()) { DFSClient.LOG.debug("Queued packet " + currentPacket.getSeqno()); } {code} currentPacket.getSeqno() is better to be replaced by variable lastQueuedSeqno. --- *Print variable in byte* Similar to the fix for HBASE-623, in file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupImage.java In the following method, the log printed variable data (in byte[]). A possible fix is add Bytes.toString(data). {code} /** * Write the batch of edits to the local copy of the edit logs. */ private void logEditsLocally(long firstTxId, int numTxns, byte[] data) { long expectedTxId = editLog.getLastWrittenTxId() + 1; Preconditions.checkState(firstTxId == expectedTxId, "received txid batch starting at %s but expected txn %s", firstTxId, expectedTxId); editLog.setNextTxId(firstTxId + numTxns - 1); editLog.logEdit(data.length, data); editLog.logSync(); } {code} *MethodInvocation replaced by variable due to toString method* Similar to the fix in HADOOP-6419, in file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CorruptReplicasMap.java in line 76, the blk.getBlockName() method invocation is invoked on variable blk. "blk" is the class instance of Block. {code} void addToCorruptReplicasMap(Block blk, DatanodeDescriptor dn, String reason, Reason reasonCode) { ... NameNode.blockStateChangeLog.info( "BLOCK NameSystem.addToCorruptReplicasMap: {} added as corrupt on " + "{} by {} {}", blk.getBlockName(), dn, Server.getRemoteIp(), reasonText); {code} In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/Block.java {code} @Override public String toString() { return getBlockName() + "_" + getGenerationStamp(); } {code} The toString() method contain not only getBlockName() but also getGenerationStamp which may be helpful for debugging purpose. Therefore blk.getBlockName() can be replaced by blk was: As per conversation with [~vrushalic], we merged HDFS-10749, HDFS-10750, HDFS-10751, HDFS-10753, under this issue. *Print variable in byte* Similar to the fix for HBASE-623, in file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupImage.java In the following method, the log printed variable data (in byte[]). A possible fix is add Bytes.toString(data). {code} /** * Write the batch of edits to the local copy of the edit logs. 
*/ private void logEditsLocally(long firstTxId, int numTxns, byte[] data) { long expectedTxId = editLog.getLastWrittenTxId() + 1; Preconditions.checkState(firstTxId == expectedTxId, "received txid batch starting at %s but expected txn %s", firstTxId, expectedTxId); editLog.setNextTxId(firstTxId + numTxns - 1); editLog.logEdit(data.length, data); editLog.logSync(); } {code} *Method invocation in logs can be replaced by variable* Similar to the fix for HDFS-409. In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java In code block: {code:borderStyle=solid} lastQueuedSeqno = currentPacket.getSeqno(); if (DFSClient.LOG.isDebugEnabled()) { DFSClient.LOG.debug("Queued packet " + currentPacket.getSeqno()); } {code} currentPacket.getSeqno() is better to be replaced by variable lastQueuedSeqno. *MethodInvocation replaced by variable due to toString method* Similar to the fix in HADOOP-6419, in file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CorruptReplicasMap.java in line 76, the blk.getBlockName() method invocation is invoked on variable blk. "blk" is the class instance of Block. {code} void addToCorruptReplicasMap(Block blk, DatanodeDescriptor dn, String reason, Reason reasonCode) { ... NameNode.blockStateChangeLog.info( "BLOCK NameSystem.addToCorruptReplicasMap: {} added as corrupt on " + "{} by {} {}", blk.getBlockName(), dn, Server.getRemoteIp(), reasonText); {code} In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/mai
[jira] [Updated] (HDFS-10752) Several log refactoring/improvement suggestions in HDFS
[ https://issues.apache.org/jira/browse/HDFS-10752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemo Chen updated HDFS-10752: - Description: As per conversation with [~vrushalic], we merged HDFS-10749, HDFS-10750, HDFS-10751, HDFS-10753, under this issue. *Print variable in byte* Similar to the fix for HBASE-623, in file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupImage.java In the following method, the log printed variable data (in byte[]). A possible fix is add Bytes.toString(data). {code} /** * Write the batch of edits to the local copy of the edit logs. */ private void logEditsLocally(long firstTxId, int numTxns, byte[] data) { long expectedTxId = editLog.getLastWrittenTxId() + 1; Preconditions.checkState(firstTxId == expectedTxId, "received txid batch starting at %s but expected txn %s", firstTxId, expectedTxId); editLog.setNextTxId(firstTxId + numTxns - 1); editLog.logEdit(data.length, data); editLog.logSync(); } {code} *Method invocation in logs can be replaced by variable* Similar to the fix for HDFS-409. In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java In code block: {code:borderStyle=solid} lastQueuedSeqno = currentPacket.getSeqno(); if (DFSClient.LOG.isDebugEnabled()) { DFSClient.LOG.debug("Queued packet " + currentPacket.getSeqno()); } {code} currentPacket.getSeqno() is better to be replaced by variable lastQueuedSeqno. *MethodInvocation replaced by variable due to toString method* Similar to the fix in HADOOP-6419, in file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CorruptReplicasMap.java in line 76, the blk.getBlockName() method invocation is invoked on variable blk. "blk" is the class instance of Block. {code} void addToCorruptReplicasMap(Block blk, DatanodeDescriptor dn, String reason, Reason reasonCode) { ... NameNode.blockStateChangeLog.info( "BLOCK NameSystem.addToCorruptReplicasMap: {} added as corrupt on " + "{} by {} {}", blk.getBlockName(), dn, Server.getRemoteIp(), reasonText); {code} In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/Block.java {code} @Override public String toString() { return getBlockName() + "_" + getGenerationStamp(); } {code} The toString() method contain not only getBlockName() but also getGenerationStamp which may be helpful for debugging purpose. Therefore blk.getBlockName() can be replaced by blk was: As per conversation with [~vrushalic], we merged HDFS-10753, HDFS-10749 under this issue. *Print variable in byte* Similar to the fix for HBASE-623, in file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupImage.java In the following method, the log printed variable data (in byte[]). A possible fix is add Bytes.toString(data). {code} /** * Write the batch of edits to the local copy of the edit logs. */ private void logEditsLocally(long firstTxId, int numTxns, byte[] data) { long expectedTxId = editLog.getLastWrittenTxId() + 1; Preconditions.checkState(firstTxId == expectedTxId, "received txid batch starting at %s but expected txn %s", firstTxId, expectedTxId); editLog.setNextTxId(firstTxId + numTxns - 1); editLog.logEdit(data.length, data); editLog.logSync(); } {code} *Method invocation in logs can be replaced by variable* Similar to the fix for HDFS-409. 
In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java In code block: {code:borderStyle=solid} lastQueuedSeqno = currentPacket.getSeqno(); if (DFSClient.LOG.isDebugEnabled()) { DFSClient.LOG.debug("Queued packet " + currentPacket.getSeqno()); } {code} currentPacket.getSeqno() is better to be replaced by variable lastQueuedSeqno. *MethodInvocation replaced by variable due to toString method* Similar to the fix in HADOOP-6419, in file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CorruptReplicasMap.java in line 76, the blk.getBlockName() method invocation is invoked on variable blk. "blk" is the class instance of Block. {code} void addToCorruptReplicasMap(Block blk, DatanodeDescriptor dn, String reason, Reason reasonCode) { ... NameNode.blockStateChangeLog.info( "BLOCK NameSystem.addToCorruptReplicasMap: {} added as corrupt on " + "{} by {} {}", blk.getBlockName(), dn, Server.getRemoteIp(), reasonText); {code} In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/Block.j
[jira] [Commented] (HDFS-9696) Garbage snapshot records lingering forever
[ https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419709#comment-15419709 ] Hadoop QA commented on HDFS-9696: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 90m 43s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}111m 12s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.TestPersistBlocks | | Timed out junit tests | org.apache.hadoop.hdfs.TestLeaseRecovery2 | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12823548/HDFS-9696.v2.patch | | JIRA Issue | HDFS-9696 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 6fa8fece0684 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 23c6e3c | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/16417/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16417/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16417/console | | Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Garbage snapshot records lingering forever > -- > > Key: HDFS-9696 > URL: https://issues.apache.org/jira/browse/HDFS-9696 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kihwal Lee >Assignee: Kih
[jira] [Commented] (HDFS-10757) KMSClientProvider combined with KeyProviderCache can result in wrong UGI being used
[ https://issues.apache.org/jira/browse/HDFS-10757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419682#comment-15419682 ] Jitendra Nath Pandey commented on HDFS-10757: - [~asuresh], thanks for explaining the context. In this context it works because the server has a login user that is stored as the actualUgi and that is the one always needed, but in some other scenarios, as in HADOOP-13381, the actualUgi becomes incorrect. Many servers that are processing an incoming request that was authenticated via a proxy mechanism set up a proxy-UGI with a real user without credentials, because the credentials of the real-user are not really available on the server. Therefore, the proxy-ugi is relevant for real authentication only in the context of a client. The proxyUgi set up by the server in this context should not be propagated for further calls to other services. That means a new proxy user should be explicitly set up to make further calls. Suppose a general flow goes like this: (===> denotes a remote call) client1 > Server1 (Hive, Oozie) Authenticates and creates ugi1> Server1 Processes ---> Server1 creates client2 to read encrypted data ===> Server2 (NN or KMS) When Server1 authenticates client1 it creates a ugi1 (which may be a proxy ugi) to preserve the context in which authentication of client1 was performed. Now when Server1 instantiates a client2 to make a call to Server2 it should not use ugi1, because the authentication context in ugi1 is not relevant for this call. In my opinion a new ugi2 should be explicitly set up, which has the right credentials. > KMSClientProvider combined with KeyProviderCache can result in wrong UGI > being used > --- > > Key: HDFS-10757 > URL: https://issues.apache.org/jira/browse/HDFS-10757 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Sergey Shelukhin >Priority: Critical > > ClientContext::get gets the context from CACHE via a config setting based > name, then KeyProviderCache stored in ClientContext gets the key provider > cached by URI from the configuration, too. These would return the same > KeyProvider regardless of current UGI. > KMSClientProvider caches the UGI (actualUgi) in ctor; that means in > particular that all the users of DFS with KMSClientProvider in a process will > get the KMS token (along with other credentials) of the first user, via the > above cache. > Either KMSClientProvider shouldn't store the UGI, or one of the caches should > be UGI-aware, like the FS object cache. > Side note: the comment in createConnection that purports to handle the > different UGI doesn't seem to cover what it says it covers. In our case, we > have two unrelated UGIs with no auth (createRemoteUser) with a bunch of tokens, > including a KMS token, added. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
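A minimal sketch of the pattern being argued for here, i.e. explicitly building a fresh proxy UGI on top of the service's login user for the outgoing call instead of reusing the UGI that authenticated the incoming request (the helper name and wiring below are illustrative, not code from any of the servers mentioned):

{code}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;

public class ProxyUgiSketch {
  /**
   * Server1 has authenticated "client1" and holds ugi1 for that request.
   * For its own outgoing call to Server2 (NN/KMS) it should not blindly reuse
   * ugi1; instead it builds a new proxy UGI on top of its login (service) user,
   * which actually holds the credentials needed for the remote call.
   */
  static <T> T callAsProxy(String endUser, PrivilegedExceptionAction<T> call)
      throws Exception {
    UserGroupInformation realUser = UserGroupInformation.getLoginUser();
    UserGroupInformation proxyUgi =
        UserGroupInformation.createProxyUser(endUser, realUser);
    return proxyUgi.doAs(call);
  }
}
{code}

Server1 would then wrap its call to the NN or KMS in something like callAsProxy("client1", ...) so the remote service sees the end user while authentication itself is performed with Server1's own credentials.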
[jira] [Updated] (HDFS-9696) Garbage snapshot records lingering forever
[ https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-9696: - Attachment: HDFS-9696.v2.patch > Garbage snapshot records lingering forever > -- > > Key: HDFS-9696 > URL: https://issues.apache.org/jira/browse/HDFS-9696 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > Attachments: HDFS-9696.patch, HDFS-9696.v2.patch > > > We have a cluster where the snapshot feature might have been tested years > ago. The HDFS does not have any snapshots, but I see filediff records > persisted in its fsimage. Since it has been restarted many times and > checkpointed over 100 times since then, they must have been persisted and > carried over since then. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9696) Garbage snapshot records lingering forever
[ https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419584#comment-15419584 ] Kihwal Lee commented on HDFS-9696: -- It looks like these tests failed because the snapshot section wasn't present. When the existing namenode reloads such an image, the snapshot manager state may not be properly reset. I made it skip only the diff section and they seem to pass. > Garbage snapshot records lingering forever > -- > > Key: HDFS-9696 > URL: https://issues.apache.org/jira/browse/HDFS-9696 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > Attachments: HDFS-9696.patch, HDFS-9696.v2.patch > > > We have a cluster where the snapshot feature might have been tested years > ago. The HDFS does not have any snapshots, but I see filediff records > persisted in its fsimage. Since it has been restarted many times and > checkpointed over 100 times since then, they must have been persisted and > carried over since then. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9696) Garbage snapshot records lingering forever
[ https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419494#comment-15419494 ] Tsz Wo Nicholas Sze commented on HDFS-9696: --- The test failures seem related. Please take a look. :) > Garbage snapshot records lingering forever > -- > > Key: HDFS-9696 > URL: https://issues.apache.org/jira/browse/HDFS-9696 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > Attachments: HDFS-9696.patch > > > We have a cluster where the snapshot feature might have been tested years > ago. The HDFS does not have any snapshots, but I see filediff records > persisted in its fsimage. Since it has been restarted many times and > checkpointed over 100 times since then, they must have been persisted and > carried over since then. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10747) o.a.h.hdfs.tools.DebugAdmin usage message is misleading
[ https://issues.apache.org/jira/browse/HDFS-10747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419480#comment-15419480 ] Mingliang Liu commented on HDFS-10747: -- Thanks for the review, [~jojochuang]. > o.a.h.hdfs.tools.DebugAdmin usage message is misleading > --- > > Key: HDFS-10747 > URL: https://issues.apache.org/jira/browse/HDFS-10747 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 2.7.0 >Reporter: Mingliang Liu >Assignee: Mingliang Liu >Priority: Minor > Attachments: HDFS-10747.000.patch > > > [HDFS-6917] added a helpful hdfs debug command to validate blocks and call > recoverLease. The usage doc is kinda misleading, as follows: > {code} > $ hdfs debug verify > creating a new configuration > verify [-meta <metadata-file>] [-block <block-file>] > Verify HDFS metadata and block files. If a block file is specified, we > will verify that the checksums in the metadata file match the block > file. > {code} > Actually the {{-meta <metadata-file>}} is necessary. {{[]}} is for optional > arguments, if we follow the > [convention|http://pubs.opengroup.org/onlinepubs/9699919799]. > {code} > $ hdfs debug recoverLease > creating a new configuration > recoverLease [-path <path>] [-retries <num-retries>] > Recover the lease on the specified path. The path must reside on an > HDFS filesystem. The default number of retries is 1. > {code} > {{-path <path>}} is also the same case. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
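Under the convention cited in the description, arguments that are actually required would be shown without the square brackets. Purely as an illustration of that convention (not the wording of the attached patch), the two usage lines could read:

{code}
verify -meta <metadata-file> [-block <block-file>]
recoverLease -path <path> [-retries <num-retries>]
{code}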
[jira] [Commented] (HDFS-9696) Garbage snapshot records lingering forever
[ https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419449#comment-15419449 ] Hadoop QA commented on HDFS-9696: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 56m 57s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 76m 14s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.namenode.snapshot.TestCheckpointsWithSnapshots | | | hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12823518/HDFS-9696.patch | | JIRA Issue | HDFS-9696 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 44f817c68730 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 23c6e3c | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/16416/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16416/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16416/console | | Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Garbage snapshot records lingering forever > -- > > Key: HDFS-9696 > URL: https://issues.apache.org/jira/browse/HDFS-9696 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kihwal Lee >
[jira] [Updated] (HDFS-9696) Garbage snapshot records lingering forever
[ https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-9696: - Attachment: HDFS-9696.patch Attaching a patch containing a unit test. > Garbage snapshot records lingering forever > -- > > Key: HDFS-9696 > URL: https://issues.apache.org/jira/browse/HDFS-9696 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > Attachments: HDFS-9696.patch > > > We have a cluster where the snapshot feature might have been tested years > ago. The HDFS does not have any snapshots, but I see filediff records > persisted in its fsimage. Since it has been restarted many times and > checkpointed over 100 times since then, they must have been persisted and > carried over since then. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-9696) Garbage snapshot records lingering forever
[ https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-9696: - Status: Patch Available (was: Reopened) > Garbage snapshot records lingering forever > -- > > Key: HDFS-9696 > URL: https://issues.apache.org/jira/browse/HDFS-9696 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > Attachments: HDFS-9696.patch > > > We have a cluster where the snapshot feature might have been tested years > ago. The HDFS does not have any snapshots, but I see filediff records > persisted in its fsimage. Since it has been restarted many times and > checkpointed over 100 times since then, they must have been persisted and > carried over since then. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10760) DataXceiver#run() should not log InvalidToken exception as an error
[ https://issues.apache.org/jira/browse/HDFS-10760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419311#comment-15419311 ] Hadoop QA commented on HDFS-10760: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 12m 21s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 23s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 58 unchanged - 0 fixed = 59 total (was 58) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 58m 42s{color} | {color:green} hadoop-hdfs in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 90m 27s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12823430/HADOOP-13492.patch | | JIRA Issue | HDFS-10760 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux e56684d45254 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 23c6e3c | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/16415/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16415/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16415/console | | Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > DataXceiver#run() should not log InvalidToken exception as an error > --- > > Key: HDFS-10760 > URL: https://issues.apache.org/jira/browse/HDFS-10760 > Proj
[jira] [Commented] (HDFS-10636) Modify ReplicaInfo to remove the assumption that replica metadata and data are stored in java.io.File.
[ https://issues.apache.org/jira/browse/HDFS-10636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419293#comment-15419293 ] Virajith Jalaparti commented on HDFS-10636: --- Hi [~eddyxu], Thank you for the comments. I agree with points 1 and 2, and will fix them. bq. {{breakHardlinksIfNeeded}}, {{copyMetadata}} and {{copyBlockdata}} should not be in {{ReplicaInfo}}. Or should not use {{File}} as input. Agree that {{copyMetadata}} and {{copyBlockdata}} should not have {{File}} as a parameter. I will change it to {{URI}} to be more general. {{breakHardLinksIfNeeded}} has always been in {{ReplicaInfo}}. I made it abstract and moved the implementation to {{LocalReplica}}. bq. {{ReplicaUnderRecovery}}. Is there a way to avoid casting {{ReplicaInfo}} to {{LocalReplica}}? The only place where the fact that {{original}} is a {{LocalReplica}} matters is in {{ReplicaUnderRecovery::setDir()}}. One way to address this would be to add the cast only when {{original.setDir()}} is called. The other way to deal with this would be to add {{setDir}} to {{ReplicaInfo}}, but to avoid {{File}} as a parameter it should take a {{URI}}. Which do you think is better? bq. In general, there are places in this patch that return {{ReplicaInfo}} for {{FinalizedReplica}}. which would makes type system weaker and is not future-proof. Is it necessary to be changed? This was intentional. The way I was thinking about it was that the state of a {{ReplicaInfo}} should be determined using {{ReplicaInfo::getState()}}, and not using the type system. The current code does the latter -- it uses the type system to ensure that replicas are in a certain state. Not relying on the type system and using the former ({{ReplicaInfo::getState()}}) seems like a cleaner way of doing this. What do you think? Also, {{FinalizedReplica}} in the current type hierarchy is a {{LocalReplica}}, so referring to replicas as {{FinalizedReplica}} assumes that they are {{LocalReplica}}s and hence backed by {{File}}s. > Modify ReplicaInfo to remove the assumption that replica metadata and data > are stored in java.io.File. > -- > > Key: HDFS-10636 > URL: https://issues.apache.org/jira/browse/HDFS-10636 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, fs >Reporter: Virajith Jalaparti >Assignee: Virajith Jalaparti > Attachments: HDFS-10636.001.patch, HDFS-10636.002.patch, > HDFS-10636.003.patch, HDFS-10636.004.patch, HDFS-10636.005.patch > > > Replace java.io.File related APIs from {{ReplicaInfo}}, and enable the > definition of new {{ReplicaInfo}} sub-classes whose metadata and data can be > present on external storages (HDFS-9806). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
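For illustration, a minimal sketch of the state-check style versus the type-check style discussed above, plus URI-based copy methods; the classes below are simplified stand-ins, not the actual {{ReplicaInfo}} hierarchy:
{code}
import java.io.IOException;
import java.net.URI;

// Simplified stand-ins for the real HDFS classes; illustration only.
enum ReplicaState { FINALIZED, RBW, RUR }

abstract class ReplicaInfoSketch {
  abstract ReplicaState getState();
  // URI-based copy methods keep the API free of java.io.File assumptions,
  // so subclasses can place data on external storages.
  abstract void copyMetadata(URI destination) throws IOException;
  abstract void copyBlockdata(URI destination) throws IOException;
}

final class ReplicaChecks {
  // Style the patch moves toward: ask the replica for its state...
  static boolean isFinalized(ReplicaInfoSketch replica) {
    return replica.getState() == ReplicaState.FINALIZED;
  }
  // ...instead of relying on the concrete type (e.g. an instanceof check
  // against FinalizedReplica), which also ties callers to a File-backed
  // subclass of the replica hierarchy.
}
{code}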
[jira] [Commented] (HDFS-10755) TestDecommissioningStatus BindException Failure
[ https://issues.apache.org/jira/browse/HDFS-10755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419248#comment-15419248 ] Eric Badger commented on HDFS-10755: The pre-commit test failure is unrelated to the patch. I believe the patch is ready for review. [~kihwal], could you take a look? > TestDecommissioningStatus BindException Failure > --- > > Key: HDFS-10755 > URL: https://issues.apache.org/jira/browse/HDFS-10755 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Eric Badger >Assignee: Eric Badger > Attachments: HDFS-10755.001.patch, HDFS-10755.002.patch > > > Tests in TestDecomissioningStatus call MiniDFSCluster.dataNodeRestart(). They > are required to come back up on the same (initially ephemeral) port that they > were on before being shutdown. Because of this, there is an inherent race > condition where another process could bind to the port while the datanode is > down. If this happens then we get a BindException failure. However, all of > the tests in TestDecommissioningStatus depend on the cluster being up and > running for them to run correctly. So if a test blows up the cluster, the > subsequent tests will also fail. Below I show the BindException failure as > well as the subsequent test failure that occurred. > {noformat} > java.net.BindException: Problem binding to [localhost:35370] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:436) > at sun.nio.ch.Net.bind(Net.java:428) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at org.apache.hadoop.ipc.Server.bind(Server.java:430) > at org.apache.hadoop.ipc.Server$Listener.(Server.java:768) > at org.apache.hadoop.ipc.Server.(Server.java:2391) > at org.apache.hadoop.ipc.RPC$Server.(RPC.java:951) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server.(ProtobufRpcEngine.java:523) > at > org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:498) > at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:796) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.initIpcServer(DataNode.java:802) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1134) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.(DataNode.java:429) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2387) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2274) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2321) > at > org.apache.hadoop.hdfs.MiniDFSCluster.restartDataNode(MiniDFSCluster.java:2037) > at > org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.testDecommissionDeadDN(TestDecommissioningStatus.java:426) > {noformat} > {noformat} > java.lang.AssertionError: Number of Datanodes expected:<2> but was:<1> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at > org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.testDecommissionStatus(TestDecommissioningStatus.java:275) > {noformat} > I don't think there's any way to avoid the inherent race condition with > getting the same ephemeral port, but we can definitely fix the tests so that > it doesn't 
cause subsequent tests to fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
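For illustration, a minimal sketch (not the attached patch) of one way to keep a single BindException from cascading: verify the shared cluster before each test and rebuild it if an earlier test left it broken. {{cluster}}, {{conf}} and {{NUM_DATANODES}} are the usual test fixture fields and are assumed here.
{code}
// Hypothetical JUnit guard inside the test class.
@Before
public void verifyClusterBeforeTest() throws Exception {
  // If an earlier test hit the BindException while restarting a datanode,
  // the shared cluster may be short a node; rebuild it so one unlucky bind
  // failure does not turn into a chain of unrelated assertion failures.
  if (cluster == null || !cluster.isClusterUp()
      || cluster.getDataNodes().size() < NUM_DATANODES) {
    if (cluster != null) {
      cluster.shutdown();
    }
    cluster = new MiniDFSCluster.Builder(conf)
        .numDataNodes(NUM_DATANODES)
        .build();
    cluster.waitActive();
  }
}
{code}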
[jira] [Commented] (HDFS-10757) KMSClientProvider combined with KeyProviderCache can result in wrong UGI being used
[ https://issues.apache.org/jira/browse/HDFS-10757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419234#comment-15419234 ] Arun Suresh commented on HDFS-10757: Hmmm... I think I remember the context for why it was implemented as such. bq. If the currentUgi is a proxy user it will have a real UGI. currentUgi.getRealUser() should give us the actual ugi. That is true, but the KMSCP was being implemented around the same time as HADOOP-10835. That JIRA was meant to plumb proxy user through HTTP. If you look at this [snippet|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/web/DelegationTokenAuthenticationFilter.java#L247-L267] of code, you will notice that if the currentUser is authenticated via a delegation token, the realUser is actually a dummy user created via {{ UserGroupInformation.createRemoteUser()}} and does not have any credentials to create the connection, which is why I guess it was decided to have a loginUgi/actualUgi created in the KMSCP constructor. > KMSClientProvider combined with KeyProviderCache can result in wrong UGI > being used > --- > > Key: HDFS-10757 > URL: https://issues.apache.org/jira/browse/HDFS-10757 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Sergey Shelukhin >Priority: Critical > > ClientContext::get gets the context from CACHE via a config setting based > name, then KeyProviderCache stored in ClientContext gets the key provider > cached by URI from the configuration, too. These would return the same > KeyProvider regardless of current UGI. > KMSClientProvider caches the UGI (actualUgi) in ctor; that means in > particular that all the users of DFS with KMSClientProvider in a process will > get the KMS token (along with other credentials) of the first user, via the > above cache. > Either KMSClientProvider shouldn't store the UGI, or one of the caches should > be UGI-aware, like the FS object cache. > Side note: the comment in createConnection that purports to handle the > different UGI doesn't seem to cover what it says it covers. In our case, we > have two unrelated UGIs with no auth (createRemoteUser) with bunch of tokens, > including a KMS token, added. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-10757) KMSClientProvider combined with KeyProviderCache can result in wrong UGI being used
[ https://issues.apache.org/jira/browse/HDFS-10757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419234#comment-15419234 ] Arun Suresh edited comment on HDFS-10757 at 8/12/16 6:09 PM: - Hmmm... I think I remember the context for why it was implemented as such. bq. If the currentUgi is a proxy user it will have a real UGI. currentUgi.getRealUser() should give us the actual ugi. That is true, but the KMSCP was being implemented around the same time as HADOOP-10835. That JIRA was meant to plumb proxy user through HTTP. If you look at this [snippet|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/web/DelegationTokenAuthenticationFilter.java#L247-L267] of code, you will notice that if the currentUser is authenticated via a delegation token, the realUser is actually a dummy user created via {{UserGroupInformation.createRemoteUser()}} and does not have any credentials to create the connection, which is why I guess it was decided to have a loginUgi/actualUgi created in the KMSCP constructor. was (Author: asuresh): Hmmm... I think I remember the context for why it was implemented as such. bq. If the currentUgi is a proxy user it will have a real UGI. currentUgi.getRealUser() should give us the actual ugi. That is true, but the KMSCP was being implemented around the same time as HADOOP-10835. That JIRA was meant to plumb proxy user through HTTP. If you look at this [snippet|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/web/DelegationTokenAuthenticationFilter.java#L247-L267] of code, you will notice that if the currentUser is authenticated via a delegation token, the realUser is actually a dummy user created via {{ UserGroupInformation.createRemoteUser()}} and does not have any credentials to create the connection, which is why I guess it was decided to have a loginUgi/actualUgi created in the KMSCP constructor. > KMSClientProvider combined with KeyProviderCache can result in wrong UGI > being used > --- > > Key: HDFS-10757 > URL: https://issues.apache.org/jira/browse/HDFS-10757 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Sergey Shelukhin >Priority: Critical > > ClientContext::get gets the context from CACHE via a config setting based > name, then KeyProviderCache stored in ClientContext gets the key provider > cached by URI from the configuration, too. These would return the same > KeyProvider regardless of current UGI. > KMSClientProvider caches the UGI (actualUgi) in ctor; that means in > particular that all the users of DFS with KMSClientProvider in a process will > get the KMS token (along with other credentials) of the first user, via the > above cache. > Either KMSClientProvider shouldn't store the UGI, or one of the caches should > be UGI-aware, like the FS object cache. > Side note: the comment in createConnection that purports to handle the > different UGI doesn't seem to cover what it says it covers. In our case, we > have two unrelated UGIs with no auth (createRemoteUser) with bunch of tokens, > including a KMS token, added. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
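For illustration, a rough sketch of the UGI resolution being debated above; the helper name is made up and this is not the existing KMSClientProvider code. It picks the credential-bearing UGI: the real user behind a proxy UGI when one exists, otherwise the current UGI itself.
{code}
import java.io.IOException;
import org.apache.hadoop.security.UserGroupInformation;

final class UgiResolution {
  // Hypothetical helper: choose the UGI whose credentials should back the
  // KMS connection instead of caching whatever UGI constructed the provider.
  static UserGroupInformation resolveActualUgi() throws IOException {
    UserGroupInformation current = UserGroupInformation.getCurrentUser();
    UserGroupInformation real = current.getRealUser();
    // For a proxy user the real user holds the login credentials; as noted
    // above, a token-authenticated remote user may still lack them, which is
    // the case the constructor-cached loginUgi/actualUgi was meant to cover.
    return (real != null) ? real : current;
  }
}
{code}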
[jira] [Commented] (HDFS-10746) libhdfs++: synchronize access to working_directory and bytes_read_.
[ https://issues.apache.org/jira/browse/HDFS-10746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419201#comment-15419201 ] Hadoop QA commented on HDFS-10746: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 12m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 27s{color} | {color:green} HDFS-8707 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 10s{color} | {color:green} HDFS-8707 passed with JDK v1.8.0_101 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 35s{color} | {color:green} HDFS-8707 passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 14s{color} | {color:green} HDFS-8707 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} HDFS-8707 passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 36s{color} | {color:green} the patch passed with JDK v1.8.0_101 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 5m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 44s{color} | {color:green} the patch passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 5m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 36s{color} | {color:green} hadoop-hdfs-native-client in the patch passed with JDK v1.7.0_101. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 63m 5s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0cf5e66 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12823502/HDFS-10746.HDFS-8707.001.patch | | JIRA Issue | HDFS-10746 | | Optional Tests | asflicense compile cc mvnsite javac unit | | uname | Linux cef2c55eb629 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | HDFS-8707 / ea932e7 | | Default Java | 1.7.0_101 | | Multi-JDK versions | /usr/lib/jvm/java-8-oracle:1.8.0_101 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_101 | | JDK v1.7.0_101 Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16414/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs-native-client U: hadoop-hdfs-project/hadoop-hdfs-native-client | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16414/console | | Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > libhdfs++: synchronize access to working_directory and bytes_read_. > --- > > Key: HDFS-10746 > URL: https://issues.apache.org/jira/browse/HDFS-10746 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Anatoli Shein >Assignee: A
[jira] [Updated] (HDFS-10760) DataXceiver#run() should not log InvalidToken exception as an error
[ https://issues.apache.org/jira/browse/HDFS-10760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-10760: Status: Patch Available (was: Open) > DataXceiver#run() should not log InvalidToken exception as an error > --- > > Key: HDFS-10760 > URL: https://issues.apache.org/jira/browse/HDFS-10760 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0-alpha1 >Reporter: Pan Yuxuan >Assignee: Pan Yuxuan > Attachments: HADOOP-13492.patch > > > DataXceiver#run() just log InvalidToken exception as an error. > When client has an expired token and just refetch a new token, the DN log > will has an error like below: > {noformat} > 2016-08-11 02:41:09,817 ERROR datanode.DataNode (DataXceiver.java:run(269)) - > XXX:50010:DataXceiver error processing READ_BLOCK operation src: > /10.17.1.5:38844 dst: /10.17.1.5:50010 > org.apache.hadoop.security.token.SecretManager$InvalidToken: Block token with > block_token_identifier (expiryDate=1470850746803, keyId=-2093956963, > userId=hbase, blockPoolId=BP-641703426-10.17.1.2-1468517918886, > blockId=1077120201, access modes=[READ]) is expired. > at > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.checkAccess(BlockTokenSecretManager.java:280) > at > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.checkAccess(BlockTokenSecretManager.java:301) > at > org.apache.hadoop.hdfs.security.token.block.BlockPoolTokenSecretManager.checkAccess(BlockPoolTokenSecretManager.java:97) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAccess(DataXceiver.java:1236) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:481) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:242) > at java.lang.Thread.run(Thread.java:745) > {noformat} > This is not a server error and the DataXceiver#checkAccess() has already > loged the InvalidToken as a warning. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10760) DataXceiver#run() should not log InvalidToken exception as an error
[ https://issues.apache.org/jira/browse/HDFS-10760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-10760: Assignee: Pan Yuxuan > DataXceiver#run() should not log InvalidToken exception as an error > --- > > Key: HDFS-10760 > URL: https://issues.apache.org/jira/browse/HDFS-10760 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0-alpha1 >Reporter: Pan Yuxuan >Assignee: Pan Yuxuan > Attachments: HADOOP-13492.patch > > > DataXceiver#run() just log InvalidToken exception as an error. > When client has an expired token and just refetch a new token, the DN log > will has an error like below: > {noformat} > 2016-08-11 02:41:09,817 ERROR datanode.DataNode (DataXceiver.java:run(269)) - > XXX:50010:DataXceiver error processing READ_BLOCK operation src: > /10.17.1.5:38844 dst: /10.17.1.5:50010 > org.apache.hadoop.security.token.SecretManager$InvalidToken: Block token with > block_token_identifier (expiryDate=1470850746803, keyId=-2093956963, > userId=hbase, blockPoolId=BP-641703426-10.17.1.2-1468517918886, > blockId=1077120201, access modes=[READ]) is expired. > at > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.checkAccess(BlockTokenSecretManager.java:280) > at > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.checkAccess(BlockTokenSecretManager.java:301) > at > org.apache.hadoop.hdfs.security.token.block.BlockPoolTokenSecretManager.checkAccess(BlockPoolTokenSecretManager.java:97) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAccess(DataXceiver.java:1236) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:481) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:242) > at java.lang.Thread.run(Thread.java:745) > {noformat} > This is not a server error and the DataXceiver#checkAccess() has already > loged the InvalidToken as a warning. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Moved] (HDFS-10760) DataXceiver#run() should not log InvalidToken exception as an error
[ https://issues.apache.org/jira/browse/HDFS-10760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer moved HADOOP-13492 to HDFS-10760: -- Affects Version/s: (was: 3.0.0-alpha1) 3.0.0-alpha1 Key: HDFS-10760 (was: HADOOP-13492) Project: Hadoop HDFS (was: Hadoop Common) > DataXceiver#run() should not log InvalidToken exception as an error > --- > > Key: HDFS-10760 > URL: https://issues.apache.org/jira/browse/HDFS-10760 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0-alpha1 >Reporter: Pan Yuxuan > Attachments: HADOOP-13492.patch > > > DataXceiver#run() just log InvalidToken exception as an error. > When client has an expired token and just refetch a new token, the DN log > will has an error like below: > {noformat} > 2016-08-11 02:41:09,817 ERROR datanode.DataNode (DataXceiver.java:run(269)) - > XXX:50010:DataXceiver error processing READ_BLOCK operation src: > /10.17.1.5:38844 dst: /10.17.1.5:50010 > org.apache.hadoop.security.token.SecretManager$InvalidToken: Block token with > block_token_identifier (expiryDate=1470850746803, keyId=-2093956963, > userId=hbase, blockPoolId=BP-641703426-10.17.1.2-1468517918886, > blockId=1077120201, access modes=[READ]) is expired. > at > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.checkAccess(BlockTokenSecretManager.java:280) > at > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.checkAccess(BlockTokenSecretManager.java:301) > at > org.apache.hadoop.hdfs.security.token.block.BlockPoolTokenSecretManager.checkAccess(BlockPoolTokenSecretManager.java:97) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAccess(DataXceiver.java:1236) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:481) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:242) > at java.lang.Thread.run(Thread.java:745) > {noformat} > This is not a server error and the DataXceiver#checkAccess() has already > loged the InvalidToken as a warning. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
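To illustrate the logging change being proposed, a small self-contained sketch rather than the actual DataXceiver code; the helper class, logger wiring and messages are assumptions:
{code}
import org.apache.hadoop.security.token.SecretManager.InvalidToken;
import org.slf4j.Logger;

final class XceiverErrorLogging {
  // Hypothetical helper: pick a log level based on whether the failure is a
  // server-side error or an expected client condition such as an expired
  // block token.
  static void logProcessingFailure(Logger log, String op, Throwable t) {
    if (t instanceof InvalidToken) {
      // checkAccess() already warned about the invalid token; avoid a second
      // ERROR entry with a full stack trace for a normal client retry path.
      log.trace("Invalid block token while processing {}", op, t);
    } else {
      log.error("DataXceiver error processing {} operation", op, t);
    }
  }
}
{code}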
[jira] [Commented] (HDFS-10679) libhdfs++: Implement parallel find with wildcards tool
[ https://issues.apache.org/jira/browse/HDFS-10679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419141#comment-15419141 ] James Clampffer commented on HDFS-10679: Awesome patch! Your benchmarks demonstrate a lot of the reasons this library was built in the first place. Not only is it way faster than the Java client, it's using significant fewer resources. Things like page faults and contexts switches slow down the program but also have significant externalized costs in the form of cache/TLB pollution, extra use of bus bandwidth, and extra IRQ handling that impact everything else running on the system. Not much you can do if you're stuck in a java environment because cpu/memory bound things are never going to win there, but for people writing new code in C/C++ minimizing those costs is a huge win. About the code: 1) {code} void FileSystemImpl::FindShim(const Status &stat, std::shared_ptr> stat_infos, bool directory_has_more, std::string path, const std::string &name, std::shared_ptr recursion_counter, std::shared_ptr lock, std::shared_ptr> dirs, uint32_t position, bool searchPath, const std::function>, bool)> &handler) { {code} Using this many arguments for a large function makes it really hard to distinguish what are local variables and what was passed in. Could you bundle these up into a struct/class that represents the state? That way any time a developer sees some_arg_struct->lock they can infer that it was passed in as an argument. The other benefit that this gives is later on, when you do lambda capture by \[=\] you could just explicitly bind to that struct type for the recursion. This code tends to be very dense so being explicit with capture lists and type names e.g. avoiding "auto" in non-trivial statements can do a lot to improve maintainability. 2) You have a lot of good comments about the control flow. Could you add a few about higher level design? > libhdfs++: Implement parallel find with wildcards tool > -- > > Key: HDFS-10679 > URL: https://issues.apache.org/jira/browse/HDFS-10679 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Anatoli Shein >Assignee: Anatoli Shein > Attachments: HDFS-10679.HDFS-8707.000.patch, > HDFS-10679.HDFS-8707.001.patch, HDFS-10679.HDFS-8707.002.patch, > HDFS-10679.HDFS-8707.003.patch, HDFS-10679.HDFS-8707.004.patch, > HDFS-10679.HDFS-8707.005.patch, HDFS-10679.HDFS-8707.006.patch, > HDFS-10679.HDFS-8707.007.patch, HDFS-10679.HDFS-8707.008.patch, > HDFS-10679.HDFS-8707.009.patch > > > The find tool will issue the GetListing namenode operation on a given > directory, and filter the results using posix globbing library. > If the recursive option is selected, for each returned entry that is a > directory the tool will issue another asynchronous call GetListing and repeat > the result processing in a recursive fashion. > One implementation issue that needs to be addressed is the way how results > are returned back to the user: we can either buffer the results and return > them to the user in bulk, or we can return results continuously as they > arrive. While buffering would be an easier solution, returning results as > they arrive would be more beneficial to the user in terms of performance, > since the result processing can start as soon as the first results arrive > without any delay. In order to do that we need the user to use a loop to > process arriving results, and we need to send a special message back to the > user when the search is over. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9696) Garbage snapshot records lingering forever
[ https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419139#comment-15419139 ] Tsz Wo Nicholas Sze commented on HDFS-9696: --- [~kihwal], the idea is simple and great! Please submit a patch. Thanks! > Garbage snapshot records lingering forever > -- > > Key: HDFS-9696 > URL: https://issues.apache.org/jira/browse/HDFS-9696 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > > We have a cluster where the snapshot feature might have been tested years > ago. When the HDFS does not have any snapshot, but I see filediff records > persisted in its fsimage. Since it has been restarted many times and > checkpointed over 100 times since then, it must haven been persisted and > carried over since then. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-9696) Garbage snapshot records lingering forever
[ https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-9696: - Target Version/s: 2.6.5, 2.7.4 > Garbage snapshot records lingering forever > -- > > Key: HDFS-9696 > URL: https://issues.apache.org/jira/browse/HDFS-9696 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > > We have a cluster where the snapshot feature might have been tested years > ago. When the HDFS does not have any snapshot, but I see filediff records > persisted in its fsimage. Since it has been restarted many times and > checkpointed over 100 times since then, it must haven been persisted and > carried over since then. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9696) Garbage snapshot records lingering forever
[ https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419099#comment-15419099 ] Kihwal Lee commented on HDFS-9696: -- Does something like this make sense? Saving a diff section involves iterating the entire inode map. When there is no snapshot, we can potentially cut down fsimage saving time and reduce java object generation. {code} --- a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java +++ b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java @@ -496,7 +496,10 @@ private void saveInternal(FileOutputStream fout, Step step = new Step(StepType.INODES, filePath); prog.beginStep(Phase.SAVING_CHECKPOINT, step); saveInodes(b); - saveSnapshots(b); + if (context.getSourceNamesystem().getSnapshotManager() + .getNumSnapshots() > 0) { +saveSnapshots(b); + } prog.endStep(Phase.SAVING_CHECKPOINT, step); {code} If no one objects, I will add a test case and submit a patch. > Garbage snapshot records lingering forever > -- > > Key: HDFS-9696 > URL: https://issues.apache.org/jira/browse/HDFS-9696 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kihwal Lee >Priority: Critical > > We have a cluster where the snapshot feature might have been tested years > ago. When the HDFS does not have any snapshot, but I see filediff records > persisted in its fsimage. Since it has been restarted many times and > checkpointed over 100 times since then, it must haven been persisted and > carried over since then. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-9696) Garbage snapshot records lingering forever
[ https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee reassigned HDFS-9696: Assignee: Kihwal Lee > Garbage snapshot records lingering forever > -- > > Key: HDFS-9696 > URL: https://issues.apache.org/jira/browse/HDFS-9696 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > > We have a cluster where the snapshot feature might have been tested years > ago. When the HDFS does not have any snapshot, but I see filediff records > persisted in its fsimage. Since it has been restarted many times and > checkpointed over 100 times since then, it must haven been persisted and > carried over since then. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10746) libhdfs++: synchronize access to working_directory and bytes_read_.
[ https://issues.apache.org/jira/browse/HDFS-10746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anatoli Shein updated HDFS-10746: - Attachment: HDFS-10746.HDFS-8707.001.patch Reattaching for a CI run > libhdfs++: synchronize access to working_directory and bytes_read_. > --- > > Key: HDFS-10746 > URL: https://issues.apache.org/jira/browse/HDFS-10746 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Anatoli Shein >Assignee: Anatoli Shein > Attachments: HDFS-10746.HDFS-8707.000.patch, > HDFS-10746.HDFS-8707.001.patch > > > std::string working_directory is located in hdfs.cc and access to it should > be synchronized with locks. > uint64_t bytes_read_; is located in filehandle.h and it should be made atomic > in order to be thread safe when multithreading becomes available. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9696) Garbage snapshot records lingering forever
[ https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419080#comment-15419080 ] Kihwal Lee commented on HDFS-9696: -- One basic sanity check can be done for cases where there is no snapshot. When saving snapshot diff section, we can call {{getNumSnapshots()}} to check whether there is any snapshot. If none, saving diff section can be skipped. > Garbage snapshot records lingering forever > -- > > Key: HDFS-9696 > URL: https://issues.apache.org/jira/browse/HDFS-9696 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kihwal Lee >Priority: Critical > > We have a cluster where the snapshot feature might have been tested years > ago. When the HDFS does not have any snapshot, but I see filediff records > persisted in its fsimage. Since it has been restarted many times and > checkpointed over 100 times since then, it must haven been persisted and > carried over since then. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10746) libhdfs++: synchronize access to working_directory and bytes_read_.
[ https://issues.apache.org/jira/browse/HDFS-10746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419077#comment-15419077 ] James Clampffer commented on HDFS-10746: Looks good to me, +1. Will commit once CI runs. > libhdfs++: synchronize access to working_directory and bytes_read_. > --- > > Key: HDFS-10746 > URL: https://issues.apache.org/jira/browse/HDFS-10746 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Anatoli Shein >Assignee: Anatoli Shein > Attachments: HDFS-10746.HDFS-8707.000.patch > > > std::string working_directory is located in hdfs.cc and access to it should > be synchronized with locks. > uint64_t bytes_read_; is located in filehandle.h and it should be made atomic > in order to be thread safe when multithreading becomes available. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10740) libhdfs++: Implement recursive directory generator
[ https://issues.apache.org/jira/browse/HDFS-10740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Clampffer updated HDFS-10740: --- Resolution: Fixed Status: Resolved (was: Patch Available) > libhdfs++: Implement recursive directory generator > -- > > Key: HDFS-10740 > URL: https://issues.apache.org/jira/browse/HDFS-10740 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Anatoli Shein >Assignee: Anatoli Shein > Attachments: HDFS-10740.HDFS-8707.000.patch, > HDFS-10740.HDFS-8707.001.patch, HDFS-10740.HDFS-8707.002.patch > > > This tool will allow us to do benchmarking/testing of our find functionality, and > will be a good example showing how to call a large number of namenode > operations recursively. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10740) libhdfs++: Implement recursive directory generator
[ https://issues.apache.org/jira/browse/HDFS-10740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419033#comment-15419033 ] James Clampffer commented on HDFS-10740: Committed to HDFS-8707. Thanks [~anatoli.shein]! > libhdfs++: Implement recursive directory generator > -- > > Key: HDFS-10740 > URL: https://issues.apache.org/jira/browse/HDFS-10740 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Anatoli Shein >Assignee: Anatoli Shein > Attachments: HDFS-10740.HDFS-8707.000.patch, > HDFS-10740.HDFS-8707.001.patch, HDFS-10740.HDFS-8707.002.patch > > > This tool will allow us to do benchmarking/testing of our find functionality, and > will be a good example showing how to call a large number of namenode > operations recursively. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9696) Garbage snapshot records lingering forever
[ https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419007#comment-15419007 ] Yongjun Zhang commented on HDFS-9696: - Thanks for the info [~kihwal]! > Garbage snapshot records lingering forever > -- > > Key: HDFS-9696 > URL: https://issues.apache.org/jira/browse/HDFS-9696 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kihwal Lee >Priority: Critical > > We have a cluster where the snapshot feature might have been tested years > ago. When the HDFS does not have any snapshot, but I see filediff records > persisted in its fsimage. Since it has been restarted many times and > checkpointed over 100 times since then, it must haven been persisted and > carried over since then. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-10759) Change fsimage bool isStriped from boolean to an enum
Ewan Higgs created HDFS-10759: - Summary: Change fsimage bool isStriped from boolean to an enum Key: HDFS-10759 URL: https://issues.apache.org/jira/browse/HDFS-10759 Project: Hadoop HDFS Issue Type: Bug Components: hdfs Affects Versions: 3.0.0-alpha1, 3.0.0-beta1, 3.0.0-alpha2 Reporter: Ewan Higgs The new erasure coding project has updated the protocol for fsimage such that the {{INodeFile}} has a boolean '{{isStriped}}'. I think this is better as an enum or integer since a boolean precludes any future block types. For example: {code} enum BlockType { CONTIGUOUS = 0, STRIPED = 1, } {code} We can also make this more robust to future changes where there are different block types supported in a staged rollout. Here, we would use {{UNKNOWN_BLOCK_TYPE}} as the first value since this is the default value. See [here|http://androiddevblog.com/protocol-buffers-pitfall-adding-enum-values/] for more discussion. {code} enum BlockType { UNKNOWN_BLOCK_TYPE = 0, CONTIGUOUS = 1, STRIPED = 2, } {code} But I'm not convinced this is necessary since there are other enums that don't use this approach. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
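As a small illustration of why an enum is easier to evolve than a boolean on the read path; {{BlockType}} here mirrors the proposed proto enum and is not an existing Hadoop class:
{code}
// Mirrors the proposed proto enum; illustration only.
enum BlockType { CONTIGUOUS, STRIPED }

final class BlockTypeExample {
  static boolean usesStripedLayout(BlockType type) {
    switch (type) {
      case CONTIGUOUS:
        return false;
      case STRIPED:
        return true;
      default:
        // A boolean field silently collapses any future layout into one of
        // two cases; an enum lets older readers fail loudly (or map to an
        // UNKNOWN value) when a newer fsimage introduces another block type.
        throw new IllegalArgumentException("Unhandled block type: " + type);
    }
  }
}
{code}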
[jira] [Commented] (HDFS-10755) TestDecommissioningStatus BindException Failure
[ https://issues.apache.org/jira/browse/HDFS-10755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15418985#comment-15418985 ] Hadoop QA commented on HDFS-10755: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 10s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 76m 41s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 99m 59s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12823464/HDFS-10755.002.patch | | JIRA Issue | HDFS-10755 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 5e3df241e573 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 9019606 | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/16413/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16413/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16413/console | | Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > TestDecommissioningStatus BindException Failure > --- > > Key: HDFS-10755 > URL: https://issues.apache.org/jira/browse/HDFS-10755 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Eric Badger >Assignee: Eric Badger > Attachments: HDFS-10755.001.pa
[jira] [Commented] (HDFS-10758) ReconfigurableBase can log sensitive information
[ https://issues.apache.org/jira/browse/HDFS-10758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15418982#comment-15418982 ] Sean Busbey commented on HDFS-10758: bq. I think a generic mechanism for redacting sensitive information for textual display will be useful to some of the web UIs too. Should this be in the Hadoop Common tracker so that the solution can be leveraged by both HDFS and YARN? > ReconfigurableBase can log sensitive information > > > Key: HDFS-10758 > URL: https://issues.apache.org/jira/browse/HDFS-10758 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Sean Mackrory >Assignee: Sean Mackrory > > ReconfigurableBase will log old and new configuration values, which may cause > sensitive parameters (most notably cloud storage keys, though there may be > other instances) to get included in the logs. > Given the currently small list of reconfigurable properties, an argument > could be made for simply not logging the property values at all, but this is > not the only instance where potentially sensitive configuration gets written > somewhere else in plaintext. I think a generic mechanism for redacting > sensitive information for textual display will be useful to some of the web > UIs too. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-10758) ReconfigurableBase can log sensitive information
Sean Mackrory created HDFS-10758: Summary: ReconfigurableBase can log sensitive information Key: HDFS-10758 URL: https://issues.apache.org/jira/browse/HDFS-10758 Project: Hadoop HDFS Issue Type: Bug Reporter: Sean Mackrory Assignee: Sean Mackrory ReconfigurableBase will log old and new configuration values, which may cause sensitive parameters (most notably cloud storage keys, though there may be other instances) to get included in the logs. Given the currently small list of reconfigurable properties, an argument could be made for simply not logging the property values at all, but this is not the only instance where potentially sensitive configuration gets written somewhere else in plaintext. I think a generic mechanism for redacting sensitive information for textual display will be useful to some of the web UIs too. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
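For illustration, a minimal sketch of the generic redaction idea, under the assumption that sensitive keys can be recognized by a list of name markers; the class name and marker list are made up for the example:
{code}
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper; not an existing Hadoop class.
final class SensitiveConfigRedactor {
  private static final List<String> SENSITIVE_MARKERS =
      Arrays.asList("password", "secret", "token", "access.key");

  // Return the value unchanged unless the property name looks sensitive,
  // in which case substitute a fixed placeholder before logging/display.
  static String redactIfSensitive(String propertyName, String value) {
    String lower = propertyName.toLowerCase(Locale.ROOT);
    for (String marker : SENSITIVE_MARKERS) {
      if (lower.contains(marker)) {
        return "<redacted>";
      }
    }
    return value;
  }
}
{code}
With something along these lines, ReconfigurableBase (and the web UIs) could log {{redactIfSensitive(property, newVal)}} rather than the raw value.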
[jira] [Commented] (HDFS-9696) Garbage snapshot records lingering forever
[ https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15418942#comment-15418942 ] Kihwal Lee commented on HDFS-9696: -- It turns out that HDFS-9406 is not related to this issue. The garbage snapshot filediffs with snapshotId=-1 were being generated by a bug fixed in HDFS-7056 by [~zero45]. {code} /** Is this inode in the latest snapshot? */ public final boolean isInLatestSnapshot(final int latestSnapshotId) { -if (latestSnapshotId == Snapshot.CURRENT_STATE_ID) { +if (latestSnapshotId == Snapshot.CURRENT_STATE_ID || +latestSnapshotId == Snapshot.NO_SNAPSHOT_ID) { return false; } {code} [~shv] explained, {quote} (7) Plamen says this is because Snapshot.findLatestSnapshot() may return NO_SNAPSHOT_ID, which breaks recordModification() if you don't have that additional check. We see it when commitBlockSynchronization() is called for truncated block. {quote} We have actually traced the generation of these filediff entries to {{commitBlockSynchronization()}} activities when the NN was running 2.5. This stops in 2.7 thanks to HDFS-7056. However, the garbage lives on until those files are deleted. Can we have a sanity check during snapshot diff loading so that these entries can be discarded? > Garbage snapshot records lingering forever > -- > > Key: HDFS-9696 > URL: https://issues.apache.org/jira/browse/HDFS-9696 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kihwal Lee >Priority: Critical > > We have a cluster where the snapshot feature might have been tested years > ago. When the HDFS does not have any snapshot, but I see filediff records > persisted in its fsimage. Since it has been restarted many times and > checkpointed over 100 times since then, it must haven been persisted and > carried over since then. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
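For illustration, a sketch of the kind of load-time sanity check asked about above; the record shape and constant below are simplified stand-ins (the constant corresponds to the snapshotId=-1 garbage records described), not the actual fsimage loader code:
{code}
import java.util.Iterator;
import java.util.List;

final class SnapshotDiffLoadCheck {
  // Stand-in for Snapshot.NO_SNAPSHOT_ID, the -1 id seen on the garbage records.
  static final int NO_SNAPSHOT_ID = -1;

  // Minimal stand-in for a loaded file diff record.
  static final class FileDiffRecord {
    final int snapshotId;
    FileDiffRecord(int snapshotId) { this.snapshotId = snapshotId; }
  }

  // Drop diffs that do not belong to any snapshot instead of re-persisting
  // them at every checkpoint.
  static void discardGarbageDiffs(List<FileDiffRecord> loadedDiffs) {
    for (Iterator<FileDiffRecord> it = loadedDiffs.iterator(); it.hasNext();) {
      if (it.next().snapshotId == NO_SNAPSHOT_ID) {
        it.remove(); // garbage left behind by the pre-HDFS-7056 bug
      }
    }
  }
}
{code}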
[jira] [Reopened] (HDFS-9696) Garbage snapshot records lingering forever
[ https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee reopened HDFS-9696: -- Assignee: (was: Yongjun Zhang) > Garbage snapshot records lingering forever > -- > > Key: HDFS-9696 > URL: https://issues.apache.org/jira/browse/HDFS-9696 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kihwal Lee >Priority: Critical > > We have a cluster where the snapshot feature might have been tested years > ago. When the HDFS does not have any snapshot, but I see filediff records > persisted in its fsimage. Since it has been restarted many times and > checkpointed over 100 times since then, it must haven been persisted and > carried over since then. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10755) TestDecommissioningStatus BindException Failure
[ https://issues.apache.org/jira/browse/HDFS-10755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated HDFS-10755: --- Attachment: HDFS-10755.002.patch Attaching patch to address the checkstyle comments. Both of the test failures seem unrelated and they did not fail locally when I ran them with this patch. > TestDecommissioningStatus BindException Failure > --- > > Key: HDFS-10755 > URL: https://issues.apache.org/jira/browse/HDFS-10755 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Eric Badger >Assignee: Eric Badger > Attachments: HDFS-10755.001.patch, HDFS-10755.002.patch > > > Tests in TestDecomissioningStatus call MiniDFSCluster.dataNodeRestart(). They > are required to come back up on the same (initially ephemeral) port that they > were on before being shutdown. Because of this, there is an inherent race > condition where another process could bind to the port while the datanode is > down. If this happens then we get a BindException failure. However, all of > the tests in TestDecommissioningStatus depend on the cluster being up and > running for them to run correctly. So if a test blows up the cluster, the > subsequent tests will also fail. Below I show the BindException failure as > well as the subsequent test failure that occurred. > {noformat} > java.net.BindException: Problem binding to [localhost:35370] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:436) > at sun.nio.ch.Net.bind(Net.java:428) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at org.apache.hadoop.ipc.Server.bind(Server.java:430) > at org.apache.hadoop.ipc.Server$Listener.(Server.java:768) > at org.apache.hadoop.ipc.Server.(Server.java:2391) > at org.apache.hadoop.ipc.RPC$Server.(RPC.java:951) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server.(ProtobufRpcEngine.java:523) > at > org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:498) > at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:796) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.initIpcServer(DataNode.java:802) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1134) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.(DataNode.java:429) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2387) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2274) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2321) > at > org.apache.hadoop.hdfs.MiniDFSCluster.restartDataNode(MiniDFSCluster.java:2037) > at > org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.testDecommissionDeadDN(TestDecommissioningStatus.java:426) > {noformat} > {noformat} > java.lang.AssertionError: Number of Datanodes expected:<2> but was:<1> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at > org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.testDecommissionStatus(TestDecommissioningStatus.java:275) > {noformat} > I don't think there's any way to avoid the inherent race condition with > getting the same ephemeral port, but we can definitely fix the tests so that > 
it doesn't cause subsequent tests to fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10747) o.a.h.hdfs.tools.DebugAdmin usage message is misleading
[ https://issues.apache.org/jira/browse/HDFS-10747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15418830#comment-15418830 ] Wei-Chiu Chuang commented on HDFS-10747: Yes the usage is misleading. Thanks [~liuml07] for bringing this up. I think the patch looks good to me. > o.a.h.hdfs.tools.DebugAdmin usage message is misleading > --- > > Key: HDFS-10747 > URL: https://issues.apache.org/jira/browse/HDFS-10747 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 2.7.0 >Reporter: Mingliang Liu >Assignee: Mingliang Liu >Priority: Minor > Attachments: HDFS-10747.000.patch > > > [HDFS-6917] added a helpful hdfs debug command to validate blocks and call > recoverlease. The usage doc is kinda misleading, as following: > {code} > $ hdfs debug verify > creating a new configuration > verify [-meta <metadata-file>] [-block <block-file>] > Verify HDFS metadata and block files. If a block file is specified, we > will verify that the checksums in the metadata file match the block > file. > {code} > Actually the {{-meta <metadata-file>}} is necessary. {{[]}} is for optional > arguments, if we follow the > [convention|http://pubs.opengroup.org/onlinepubs/9699919799]. > {code} > $ hdfs debug recoverLease > creating a new configuration > recoverLease [-path <path>] [-retries <num-retries>] > Recover the lease on the specified path. The path must reside on an > HDFS filesystem. The default number of retries is 1. > {code} > {{-path <path>}} is also the same case. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
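Following the bracket convention from the linked spec, the two usage lines would presumably read as below (the exact wording in the attached patch may differ):
{noformat}
verify -meta <metadata-file> [-block <block-file>]
recoverLease -path <path> [-retries <num-retries>]
{noformat}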
[jira] [Commented] (HDFS-10731) FSDirectory#verifyMaxDirItems does not log path name
[ https://issues.apache.org/jira/browse/HDFS-10731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15418813#comment-15418813 ] Hudson commented on HDFS-10731: --- SUCCESS: Integrated in Hadoop-trunk-Commit #10268 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/10268/]) HDFS-10731. FSDirectory#verifyMaxDirItems does not log path name. (weichiu: rev 9019606b69bfb7019c8642b6cbcbb93645cc19e3) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/FSLimitException.java > FSDirectory#verifyMaxDirItems does not log path name > > > Key: HDFS-10731 > URL: https://issues.apache.org/jira/browse/HDFS-10731 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.2 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Minor > Labels: supportability > Fix For: 2.8.0 > > Attachments: HDFS-10731.001.patch > > > {quote} > 2016-08-05 14:42:04,687 ERROR > org.apache.hadoop.hdfs.server.namenode.NameNode: > FSDirectory.verifyMaxDirItems: The directory item limit of null is exceeded: > limit=1048576 items=1048576 > {quote} > The error message above logs the path name incorrectly (null). Without the > path name it is hard to tell which directory is in trouble. The exception > should set the path name before being logged. > This bug was seen on a CDH 5.5.2 cluster, but CDH5.5.2 is roughly up to date > with Apache Hadoop 2.7.2. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
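A standalone sketch of the gist of the fix (hypothetical class, not the committed patch): include the offending path in the message before the exception is thrown and logged, so the log names the directory instead of printing {{null}}.
{code}
import java.io.IOException;

class DirItemLimitCheckSketch {
  // Carry the parent path in the message so the operator can tell which
  // directory hit the limit.
  static void verifyMaxDirItems(String parentPath, int numItems, int maxItems)
      throws IOException {
    if (numItems >= maxItems) {
      throw new IOException("The directory item limit of " + parentPath
          + " is exceeded: limit=" + maxItems + " items=" + numItems);
    }
  }
}
{code}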
[jira] [Updated] (HDFS-10731) FSDirectory#verifyMaxDirItems does not log path name
[ https://issues.apache.org/jira/browse/HDFS-10731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-10731: --- Resolution: Fixed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) Committed to trunk, branch-2 and branch-2.8 Thanks [~xiaochen] for reviewing the patch! > FSDirectory#verifyMaxDirItems does not log path name > > > Key: HDFS-10731 > URL: https://issues.apache.org/jira/browse/HDFS-10731 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.2 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Minor > Labels: supportability > Fix For: 2.8.0 > > Attachments: HDFS-10731.001.patch > > > {quote} > 2016-08-05 14:42:04,687 ERROR > org.apache.hadoop.hdfs.server.namenode.NameNode: > FSDirectory.verifyMaxDirItems: The directory item limit of null is exceeded: > limit=1048576 items=1048576 > {quote} > The error message above logs the path name incorrectly (null). Without the > path name it is hard to tell which directory is in trouble. The exception > should set the path name before being logged. > This bug was seen on a CDH 5.5.2 cluster, but CDH5.5.2 is roughly up to date > with Apache Hadoop 2.7.2. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-10754) libhdfs++: Create tools directory and implement hdfs_cat
[ https://issues.apache.org/jira/browse/HDFS-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anatoli Shein reassigned HDFS-10754: Assignee: Anatoli Shein > libhdfs++: Create tools directory and implement hdfs_cat > > > Key: HDFS-10754 > URL: https://issues.apache.org/jira/browse/HDFS-10754 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Anatoli Shein >Assignee: Anatoli Shein > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10609) Uncaught InvalidEncryptionKeyException during pipeline recovery may abort downstream applications
[ https://issues.apache.org/jira/browse/HDFS-10609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15418759#comment-15418759 ] Wei-Chiu Chuang commented on HDFS-10609: [~xiaochen] would you mind to take a look at this patch? Thx! > Uncaught InvalidEncryptionKeyException during pipeline recovery may abort > downstream applications > - > > Key: HDFS-10609 > URL: https://issues.apache.org/jira/browse/HDFS-10609 > Project: Hadoop HDFS > Issue Type: Bug > Components: encryption >Affects Versions: 2.6.0 > Environment: CDH5.8.0 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > Attachments: HDFS-10609.001.patch, HDFS-10609.002.patch > > > In normal operations, if SASL negotiation fails due to > {{InvalidEncryptionKeyException}}, it is typically a benign exception, which > is caught and retried : > {code:title=SaslDataTransferServer#doSaslHandshake} > if (ioe instanceof SaslException && > ioe.getCause() != null && > ioe.getCause() instanceof InvalidEncryptionKeyException) { > // This could just be because the client is long-lived and hasn't gotten > // a new encryption key from the NN in a while. Upon receiving this > // error, the client will get a new encryption key from the NN and retry > // connecting to this DN. > sendInvalidKeySaslErrorMessage(out, ioe.getCause().getMessage()); > } > {code} > {code:title=DFSOutputStream.DataStreamer#createBlockOutputStream} > if (ie instanceof InvalidEncryptionKeyException && refetchEncryptionKey > 0) { > DFSClient.LOG.info("Will fetch a new encryption key and retry, " > + "encryption key was invalid when connecting to " > + nodes[0] + " : " + ie); > {code} > However, if the exception is thrown during pipeline recovery, the > corresponding code does not handle it properly, and the exception is spilled > out to downstream applications, such as SOLR, aborting its operation: > {quote} > 2016-07-06 12:12:51,992 ERROR org.apache.solr.update.HdfsTransactionLog: > Exception closing tlog. > org.apache.hadoop.hdfs.protocol.datatransfer.InvalidEncryptionKeyException: > Can't re-compute encryption key for nonce, since the required block key > (keyID=557709482) doesn't exist. 
Current key: 1350592619 > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.readSaslMessageAndNegotiatedCipherOption(DataTransferSaslUtil.java:417) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:474) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:299) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:242) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:211) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.socketSend(SaslDataTransferClient.java:183) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.transfer(DFSOutputStream.java:1308) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1272) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1433) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1147) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:632) > 2016-07-06 12:12:51,997 ERROR org.apache.solr.update.CommitTracker: auto > commit error...:org.apache.solr.common.SolrException: > org.apache.hadoop.hdfs.protocol.datatransfer.InvalidEncryptionKeyException: > Can't re-compute encryption key for nonce, since the required block key > (keyID=557709482) doesn't exist. Current key: 1350592619 > at > org.apache.solr.update.HdfsTransactionLog.close(HdfsTransactionLog.java:316) > at > org.apache.solr.update.TransactionLog.decref(TransactionLog.java:505) > at org.apache.solr.update.UpdateLog.addOldLog(UpdateLog.java:380) > at org.apache.solr.update.UpdateLog.postCommit(UpdateLog.java:676) > at > org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:623) > at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurr
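A rough, hypothetical sketch of the direction of the fix (names are illustrative, not the actual DFSOutputStream code): give the pipeline-recovery path the same refetch-and-retry treatment that {{createBlockOutputStream}} already has, so the exception is retried instead of escaping to applications such as SOLR.
{code}
import java.io.IOException;

import org.apache.hadoop.hdfs.protocol.datatransfer.InvalidEncryptionKeyException;

class PipelineRecoveryRetrySketch {
  interface TransferAttempt {
    void run() throws IOException;
  }

  // Retry the transfer once after refreshing the cached data encryption key,
  // mirroring the handling already present in createBlockOutputStream.
  static void transferWithKeyRefresh(TransferAttempt attempt, Runnable clearCachedKey)
      throws IOException {
    int refetchEncryptionKey = 1;
    while (true) {
      try {
        attempt.run();
        return;
      } catch (InvalidEncryptionKeyException e) {
        if (refetchEncryptionKey-- <= 0) {
          throw e; // already retried with a fresh key; give up
        }
        clearCachedKey.run(); // force fetching a new encryption key from the NN
      }
    }
  }
}
{code}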
[jira] [Updated] (HDFS-8312) Trash does not descent into child directories to check for permissions
[ https://issues.apache.org/jira/browse/HDFS-8312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-8312: -- Priority: Critical (was: Major) > Trash does not descent into child directories to check for permissions > -- > > Key: HDFS-8312 > URL: https://issues.apache.org/jira/browse/HDFS-8312 > Project: Hadoop HDFS > Issue Type: Bug > Components: fs, security >Affects Versions: 2.2.0, 2.6.0, 2.7.2 >Reporter: Eric Yang >Assignee: Weiwei Yang >Priority: Critical > Attachments: HDFS-8312-testcase.patch > > > HDFS trash does not descent into child directory to check if user has > permission to delete files. For example: > Run the following command to initialize directory structure as super user: > {code} > hadoop fs -mkdir /BSS/level1 > hadoop fs -mkdir /BSS/level1/level2 > hadoop fs -mkdir /BSS/level1/level2/level3 > hadoop fs -put /tmp/appConfig.json /BSS/level1/level2/level3/testfile.txt > hadoop fs -chown user1:users /BSS/level1/level2/level3/testfile.txt > hadoop fs -chown -R user1:users /BSS/level1 > hadoop fs -chown -R 750 /BSS/level1 > hadoop fs -chmod -R 640 /BSS/level1/level2/level3/testfile.txt > hadoop fs -chmod 775 /BSS > {code} > Change to a normal user called user2. > When trash is enabled: > {code} > sudo su user2 - > hadoop fs -rm -r /BSS/level1 > 15/05/01 16:51:20 INFO fs.TrashPolicyDefault: Namenode trash configuration: > Deletion interval = 3600 minutes, Emptier interval = 0 minutes. > Moved: 'hdfs://bdvs323.svl.ibm.com:9000/BSS/level1' to trash at: > hdfs://bdvs323.svl.ibm.com:9000/user/user2/.Trash/Current > {code} > When trash is disabled: > {code} > /opt/ibm/biginsights/IHC/bin/hadoop fs -Dfs.trash.interval=0 -rm -r > /BSS/level1 > 15/05/01 16:58:31 INFO fs.TrashPolicyDefault: Namenode trash configuration: > Deletion interval = 0 minutes, Emptier interval = 0 minutes. > rm: Permission denied: user=user2, access=ALL, > inode="/BSS/level1":user1:users:drwxr-x--- > {code} > There is inconsistency between trash behavior and delete behavior. When > trash is enabled, files owned by user1 is deleted by user2. It looks like > trash does not recursively validate if the child directory files can be > removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10757) KMSClientProvider combined with KeyProviderCache can result in wrong UGI being used
[ https://issues.apache.org/jira/browse/HDFS-10757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15418409#comment-15418409 ] Jitendra Nath Pandey commented on HDFS-10757: - I think storing the {{actualUgi}} in KMSClientProvider is incorrect because the providers are cached for a long time, and the currentUGI may be completely different from the actualUGI. Therefore, it may be a good idea to consider removing actualUgi from KMSClientProvider. I am inclined to say that setting up the UGI should be done by the client code using the FileSystem. On every call, KMSClientProvider should only check the following: if the currentUGI has a realUgi, use the realUgi as the actualUgi; otherwise use the currentUgi as the actualUgi. I may not have the whole context on why actualUgi was added in the constructor of KMSClientProvider, but would like to understand. > KMSClientProvider combined with KeyProviderCache can result in wrong UGI > being used > --- > > Key: HDFS-10757 > URL: https://issues.apache.org/jira/browse/HDFS-10757 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Sergey Shelukhin >Priority: Critical > > ClientContext::get gets the context from CACHE via a config setting based > name, then KeyProviderCache stored in ClientContext gets the key provider > cached by URI from the configuration, too. These would return the same > KeyProvider regardless of current UGI. > KMSClientProvider caches the UGI (actualUgi) in ctor; that means in > particular that all the users of DFS with KMSClientProvider in a process will > get the KMS token (along with other credentials) of the first user, via the > above cache. > Either KMSClientProvider shouldn't store the UGI, or one of the caches should > be UGI-aware, like the FS object cache. > Side note: the comment in createConnection that purports to handle the > different UGI doesn't seem to cover what it says it covers. In our case, we > have two unrelated UGIs with no auth (createRemoteUser) with bunch of tokens, > including a KMS token, added. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
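A minimal sketch of the per-call check suggested above (illustrative only, not a patch): resolve the effective UGI on every request instead of caching one in the constructor.
{code}
import java.io.IOException;

import org.apache.hadoop.security.UserGroupInformation;

class UgiResolutionSketch {
  // If the current UGI is a proxy user it carries a real user; use that as
  // the actual UGI, otherwise fall back to the current UGI itself.
  static UserGroupInformation resolveActualUgi() throws IOException {
    UserGroupInformation current = UserGroupInformation.getCurrentUser();
    UserGroupInformation real = current.getRealUser();
    return (real != null) ? real : current;
  }
}
{code}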
[jira] [Updated] (HDFS-8668) Erasure Coding: revisit buffer used for encoding and decoding.
[ https://issues.apache.org/jira/browse/HDFS-8668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SammiChen updated HDFS-8668: Attachment: HDFS-8668-v9.patch > Erasure Coding: revisit buffer used for encoding and decoding. > -- > > Key: HDFS-8668 > URL: https://issues.apache.org/jira/browse/HDFS-8668 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Yi Liu >Assignee: SammiChen > Attachments: HDFS-8668-v1.patch, HDFS-8668-v2.patch, > HDFS-8668-v3.patch, HDFS-8668-v4.patch, HDFS-8668-v5.patch, > HDFS-8668-v6.patch, HDFS-8668-v7.patch, HDFS-8668-v8.patch, HDFS-8668-v9.patch > > > For encoding and decoding buffers, currently some places use java heap > ByteBuffer, some use direct byteBUffer, and some use java byte array. If > the coder implementation is native, we should use direct ByteBuffer. This > jira is to revisit all encoding/decoding buffers and improve them. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
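Not from the attached patch, just a sketch of the buffer-selection idea in the description: a native coder (for example one backed by ISA-L) generally wants direct ByteBuffers to avoid extra copies across the JNI boundary, while a pure-Java coder works fine on heap buffers or byte arrays.
{code}
import java.nio.ByteBuffer;

class CoderBufferSketch {
  // Pick the buffer type based on the coder implementation.
  static ByteBuffer allocate(int size, boolean nativeCoder) {
    return nativeCoder ? ByteBuffer.allocateDirect(size) : ByteBuffer.allocate(size);
  }
}
{code}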
[jira] [Commented] (HDFS-8312) Trash does not descent into child directories to check for permissions
[ https://issues.apache.org/jira/browse/HDFS-8312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15418538#comment-15418538 ] Weiwei Yang commented on HDFS-8312: --- Attached a test case demonstrating this issue, see [^HDFS-8312-testcase.patch]. Also thinking about how to fix it. There are generally two options: 1) Trash currently calls {{FileSystem.rename}} to move the file/dir to the trash dir, so only rename permission is checked. A fix is to add a new method that checks whether delete is permitted, expose it from the FileSystem API, and call it in the trash code before the rename (see the sketch following this message). 2) Improve the {{Emptier}} logic so the emptier runs per user; then even if a user moves somebody else's files to trash, the emptier will still not be able to remove them, because that user is not permitted to. This is better than delete ... I personally prefer option 1, because 2 looks like a partial fix; we should prevent a user from moving things to trash if they are not allowed to delete them in the first place. Any suggestions? Appreciated! > Trash does not descent into child directories to check for permissions > -- > > Key: HDFS-8312 > URL: https://issues.apache.org/jira/browse/HDFS-8312 > Project: Hadoop HDFS > Issue Type: Bug > Components: fs, security >Affects Versions: 2.2.0, 2.6.0, 2.7.2 >Reporter: Eric Yang >Assignee: Weiwei Yang > Attachments: HDFS-8312-testcase.patch > > > HDFS trash does not descent into child directory to check if user has > permission to delete files. For example: > Run the following command to initialize directory structure as super user: > {code} > hadoop fs -mkdir /BSS/level1 > hadoop fs -mkdir /BSS/level1/level2 > hadoop fs -mkdir /BSS/level1/level2/level3 > hadoop fs -put /tmp/appConfig.json /BSS/level1/level2/level3/testfile.txt > hadoop fs -chown user1:users /BSS/level1/level2/level3/testfile.txt > hadoop fs -chown -R user1:users /BSS/level1 > hadoop fs -chown -R 750 /BSS/level1 > hadoop fs -chmod -R 640 /BSS/level1/level2/level3/testfile.txt > hadoop fs -chmod 775 /BSS > {code} > Change to a normal user called user2. > When trash is enabled: > {code} > sudo su user2 - > hadoop fs -rm -r /BSS/level1 > 15/05/01 16:51:20 INFO fs.TrashPolicyDefault: Namenode trash configuration: > Deletion interval = 3600 minutes, Emptier interval = 0 minutes. > Moved: 'hdfs://bdvs323.svl.ibm.com:9000/BSS/level1' to trash at: > hdfs://bdvs323.svl.ibm.com:9000/user/user2/.Trash/Current > {code} > When trash is disabled: > {code} > /opt/ibm/biginsights/IHC/bin/hadoop fs -Dfs.trash.interval=0 -rm -r > /BSS/level1 > 15/05/01 16:58:31 INFO fs.TrashPolicyDefault: Namenode trash configuration: > Deletion interval = 0 minutes, Emptier interval = 0 minutes. > rm: Permission denied: user=user2, access=ALL, > inode="/BSS/level1":user1:users:drwxr-x--- > {code} > There is inconsistency between trash behavior and delete behavior. When > trash is enabled, files owned by user1 is deleted by user2. It looks like > trash does not recursively validate if the child directory files can be > removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
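A rough sketch of option 1 (hypothetical helper, not a patch): before renaming a path into trash, walk the subtree and ask the NameNode whether the caller could actually delete its contents, reusing the existing {{FileSystem#access}} check.
{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;

class TrashDeleteCheckSketch {
  // Removing a directory's children requires write access on that directory,
  // so check WRITE on every directory in the subtree and recurse.
  static void checkCanDelete(FileSystem fs, Path path) throws IOException {
    FileStatus status = fs.getFileStatus(path);
    if (status.isDirectory()) {
      fs.access(path, FsAction.WRITE); // throws AccessControlException if denied
      for (FileStatus child : fs.listStatus(path)) {
        checkCanDelete(fs, child.getPath());
      }
    }
  }
}
{code}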
[jira] [Updated] (HDFS-8312) Trash does not descent into child directories to check for permissions
[ https://issues.apache.org/jira/browse/HDFS-8312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-8312: -- Attachment: HDFS-8312-testcase.patch > Trash does not descent into child directories to check for permissions > -- > > Key: HDFS-8312 > URL: https://issues.apache.org/jira/browse/HDFS-8312 > Project: Hadoop HDFS > Issue Type: Bug > Components: fs, security >Affects Versions: 2.2.0, 2.6.0, 2.7.2 >Reporter: Eric Yang >Assignee: Weiwei Yang > Attachments: HDFS-8312-testcase.patch > > > HDFS trash does not descent into child directory to check if user has > permission to delete files. For example: > Run the following command to initialize directory structure as super user: > {code} > hadoop fs -mkdir /BSS/level1 > hadoop fs -mkdir /BSS/level1/level2 > hadoop fs -mkdir /BSS/level1/level2/level3 > hadoop fs -put /tmp/appConfig.json /BSS/level1/level2/level3/testfile.txt > hadoop fs -chown user1:users /BSS/level1/level2/level3/testfile.txt > hadoop fs -chown -R user1:users /BSS/level1 > hadoop fs -chown -R 750 /BSS/level1 > hadoop fs -chmod -R 640 /BSS/level1/level2/level3/testfile.txt > hadoop fs -chmod 775 /BSS > {code} > Change to a normal user called user2. > When trash is enabled: > {code} > sudo su user2 - > hadoop fs -rm -r /BSS/level1 > 15/05/01 16:51:20 INFO fs.TrashPolicyDefault: Namenode trash configuration: > Deletion interval = 3600 minutes, Emptier interval = 0 minutes. > Moved: 'hdfs://bdvs323.svl.ibm.com:9000/BSS/level1' to trash at: > hdfs://bdvs323.svl.ibm.com:9000/user/user2/.Trash/Current > {code} > When trash is disabled: > {code} > /opt/ibm/biginsights/IHC/bin/hadoop fs -Dfs.trash.interval=0 -rm -r > /BSS/level1 > 15/05/01 16:58:31 INFO fs.TrashPolicyDefault: Namenode trash configuration: > Deletion interval = 0 minutes, Emptier interval = 0 minutes. > rm: Permission denied: user=user2, access=ALL, > inode="/BSS/level1":user1:users:drwxr-x--- > {code} > There is inconsistency between trash behavior and delete behavior. When > trash is enabled, files owned by user1 is deleted by user2. It looks like > trash does not recursively validate if the child directory files can be > removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9530) ReservedSpace is not cleared for abandoned Blocks
[ https://issues.apache.org/jira/browse/HDFS-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15418460#comment-15418460 ] Brahma Reddy Battula commented on HDFS-9530: Ok.thanks for feedback allen. > ReservedSpace is not cleared for abandoned Blocks > - > > Key: HDFS-9530 > URL: https://issues.apache.org/jira/browse/HDFS-9530 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0, 2.7.1 >Reporter: Fei Hui >Assignee: Brahma Reddy Battula >Priority: Critical > Fix For: 2.7.3 > > Attachments: HDFS-9530-01.patch, HDFS-9530-02.patch, > HDFS-9530-03.patch, HDFS-9530-branch-2.6.patch, > HDFS-9530-branch-2.7-001.patch, HDFS-9530-branch-2.7-002.patch > > > i think there are bugs in HDFS > === > here is config > > dfs.datanode.data.dir > > > file:///mnt/disk4,file:///mnt/disk1,file:///mnt/disk3,file:///mnt/disk2 > > > here is dfsadmin report > [hadoop@worker-1 ~]$ hadoop dfsadmin -report > DEPRECATED: Use of this script to execute hdfs command is deprecated. > Instead use the hdfs command for it. > Configured Capacity: 240769253376 (224.23 GB) > Present Capacity: 238604832768 (222.22 GB) > DFS Remaining: 215772954624 (200.95 GB) > DFS Used: 22831878144 (21.26 GB) > DFS Used%: 9.57% > Under replicated blocks: 4 > Blocks with corrupt replicas: 0 > Missing blocks: 0 > - > Live datanodes (3): > Name: 10.117.60.59:50010 (worker-2) > Hostname: worker-2 > Decommission Status : Normal > Configured Capacity: 80256417792 (74.74 GB) > DFS Used: 7190958080 (6.70 GB) > Non DFS Used: 721473536 (688.05 MB) > DFS Remaining: 72343986176 (67.38 GB) > DFS Used%: 8.96% > DFS Remaining%: 90.14% > Configured Cache Capacity: 0 (0 B) > Cache Used: 0 (0 B) > Cache Remaining: 0 (0 B) > Cache Used%: 100.00% > Cache Remaining%: 0.00% > Xceivers: 1 > Last contact: Wed Dec 09 15:55:02 CST 2015 > Name: 10.168.156.0:50010 (worker-3) > Hostname: worker-3 > Decommission Status : Normal > Configured Capacity: 80256417792 (74.74 GB) > DFS Used: 7219073024 (6.72 GB) > Non DFS Used: 721473536 (688.05 MB) > DFS Remaining: 72315871232 (67.35 GB) > DFS Used%: 9.00% > DFS Remaining%: 90.11% > Configured Cache Capacity: 0 (0 B) > Cache Used: 0 (0 B) > Cache Remaining: 0 (0 B) > Cache Used%: 100.00% > Cache Remaining%: 0.00% > Xceivers: 1 > Last contact: Wed Dec 09 15:55:03 CST 2015 > Name: 10.117.15.38:50010 (worker-1) > Hostname: worker-1 > Decommission Status : Normal > Configured Capacity: 80256417792 (74.74 GB) > DFS Used: 8421847040 (7.84 GB) > Non DFS Used: 721473536 (688.05 MB) > DFS Remaining: 71113097216 (66.23 GB) > DFS Used%: 10.49% > DFS Remaining%: 88.61% > Configured Cache Capacity: 0 (0 B) > Cache Used: 0 (0 B) > Cache Remaining: 0 (0 B) > Cache Used%: 100.00% > Cache Remaining%: 0.00% > Xceivers: 1 > Last contact: Wed Dec 09 15:55:03 CST 2015 > > when running hive job , dfsadmin report as follows > [hadoop@worker-1 ~]$ hadoop dfsadmin -report > DEPRECATED: Use of this script to execute hdfs command is deprecated. > Instead use the hdfs command for it. 
> Configured Capacity: 240769253376 (224.23 GB) > Present Capacity: 108266011136 (100.83 GB) > DFS Remaining: 80078416384 (74.58 GB) > DFS Used: 28187594752 (26.25 GB) > DFS Used%: 26.04% > Under replicated blocks: 7 > Blocks with corrupt replicas: 0 > Missing blocks: 0 > - > Live datanodes (3): > Name: 10.117.60.59:50010 (worker-2) > Hostname: worker-2 > Decommission Status : Normal > Configured Capacity: 80256417792 (74.74 GB) > DFS Used: 9015627776 (8.40 GB) > Non DFS Used: 44303742464 (41.26 GB) > DFS Remaining: 26937047552 (25.09 GB) > DFS Used%: 11.23% > DFS Remaining%: 33.56% > Configured Cache Capacity: 0 (0 B) > Cache Used: 0 (0 B) > Cache Remaining: 0 (0 B) > Cache Used%: 100.00% > Cache Remaining%: 0.00% > Xceivers: 693 > Last contact: Wed Dec 09 15:37:35 CST 2015 > Name: 10.168.156.0:50010 (worker-3) > Hostname: worker-3 > Decommission Status : Normal > Configured Capacity: 80256417792 (74.74 GB) > DFS Used: 9163116544 (8.53 GB) > Non DFS Used: 47895897600 (44.61 GB) > DFS Remaining: 23197403648 (21.60 GB) > DFS Used%: 11.42% > DFS Remaining%: 28.90% > Configured Cache Capacity: 0 (0 B) > Cache Used: 0 (0 B) > Cache Remaining: 0 (0 B) > Cache Used%: 100.00% > Cache Remaining%: 0.00% > Xceivers: 750 > Last contact: Wed Dec 09 15:37:36 CST 2015 > Name: 10.117.15.38:50010 (worker-1) > Hostname: worker-1 > Decommission Status : Normal > Configured Capacity: 80256417792 (74.74 GB) > DFS Used: 10008850432 (9.32 GB) > Non DFS Used: 40303602176 (3
[jira] [Commented] (HDFS-9530) ReservedSpace is not cleared for abandoned Blocks
[ https://issues.apache.org/jira/browse/HDFS-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15418458#comment-15418458 ] Brahma Reddy Battula commented on HDFS-9530: Ok.. thanks arpit.. Even I ran before uploading the patch,did not induced any test failure. > ReservedSpace is not cleared for abandoned Blocks > - > > Key: HDFS-9530 > URL: https://issues.apache.org/jira/browse/HDFS-9530 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0, 2.7.1 >Reporter: Fei Hui >Assignee: Brahma Reddy Battula >Priority: Critical > Fix For: 2.7.3 > > Attachments: HDFS-9530-01.patch, HDFS-9530-02.patch, > HDFS-9530-03.patch, HDFS-9530-branch-2.6.patch, > HDFS-9530-branch-2.7-001.patch, HDFS-9530-branch-2.7-002.patch > > > i think there are bugs in HDFS > === > here is config > > dfs.datanode.data.dir > > > file:///mnt/disk4,file:///mnt/disk1,file:///mnt/disk3,file:///mnt/disk2 > > > here is dfsadmin report > [hadoop@worker-1 ~]$ hadoop dfsadmin -report > DEPRECATED: Use of this script to execute hdfs command is deprecated. > Instead use the hdfs command for it. > Configured Capacity: 240769253376 (224.23 GB) > Present Capacity: 238604832768 (222.22 GB) > DFS Remaining: 215772954624 (200.95 GB) > DFS Used: 22831878144 (21.26 GB) > DFS Used%: 9.57% > Under replicated blocks: 4 > Blocks with corrupt replicas: 0 > Missing blocks: 0 > - > Live datanodes (3): > Name: 10.117.60.59:50010 (worker-2) > Hostname: worker-2 > Decommission Status : Normal > Configured Capacity: 80256417792 (74.74 GB) > DFS Used: 7190958080 (6.70 GB) > Non DFS Used: 721473536 (688.05 MB) > DFS Remaining: 72343986176 (67.38 GB) > DFS Used%: 8.96% > DFS Remaining%: 90.14% > Configured Cache Capacity: 0 (0 B) > Cache Used: 0 (0 B) > Cache Remaining: 0 (0 B) > Cache Used%: 100.00% > Cache Remaining%: 0.00% > Xceivers: 1 > Last contact: Wed Dec 09 15:55:02 CST 2015 > Name: 10.168.156.0:50010 (worker-3) > Hostname: worker-3 > Decommission Status : Normal > Configured Capacity: 80256417792 (74.74 GB) > DFS Used: 7219073024 (6.72 GB) > Non DFS Used: 721473536 (688.05 MB) > DFS Remaining: 72315871232 (67.35 GB) > DFS Used%: 9.00% > DFS Remaining%: 90.11% > Configured Cache Capacity: 0 (0 B) > Cache Used: 0 (0 B) > Cache Remaining: 0 (0 B) > Cache Used%: 100.00% > Cache Remaining%: 0.00% > Xceivers: 1 > Last contact: Wed Dec 09 15:55:03 CST 2015 > Name: 10.117.15.38:50010 (worker-1) > Hostname: worker-1 > Decommission Status : Normal > Configured Capacity: 80256417792 (74.74 GB) > DFS Used: 8421847040 (7.84 GB) > Non DFS Used: 721473536 (688.05 MB) > DFS Remaining: 71113097216 (66.23 GB) > DFS Used%: 10.49% > DFS Remaining%: 88.61% > Configured Cache Capacity: 0 (0 B) > Cache Used: 0 (0 B) > Cache Remaining: 0 (0 B) > Cache Used%: 100.00% > Cache Remaining%: 0.00% > Xceivers: 1 > Last contact: Wed Dec 09 15:55:03 CST 2015 > > when running hive job , dfsadmin report as follows > [hadoop@worker-1 ~]$ hadoop dfsadmin -report > DEPRECATED: Use of this script to execute hdfs command is deprecated. > Instead use the hdfs command for it. 
> Configured Capacity: 240769253376 (224.23 GB) > Present Capacity: 108266011136 (100.83 GB) > DFS Remaining: 80078416384 (74.58 GB) > DFS Used: 28187594752 (26.25 GB) > DFS Used%: 26.04% > Under replicated blocks: 7 > Blocks with corrupt replicas: 0 > Missing blocks: 0 > - > Live datanodes (3): > Name: 10.117.60.59:50010 (worker-2) > Hostname: worker-2 > Decommission Status : Normal > Configured Capacity: 80256417792 (74.74 GB) > DFS Used: 9015627776 (8.40 GB) > Non DFS Used: 44303742464 (41.26 GB) > DFS Remaining: 26937047552 (25.09 GB) > DFS Used%: 11.23% > DFS Remaining%: 33.56% > Configured Cache Capacity: 0 (0 B) > Cache Used: 0 (0 B) > Cache Remaining: 0 (0 B) > Cache Used%: 100.00% > Cache Remaining%: 0.00% > Xceivers: 693 > Last contact: Wed Dec 09 15:37:35 CST 2015 > Name: 10.168.156.0:50010 (worker-3) > Hostname: worker-3 > Decommission Status : Normal > Configured Capacity: 80256417792 (74.74 GB) > DFS Used: 9163116544 (8.53 GB) > Non DFS Used: 47895897600 (44.61 GB) > DFS Remaining: 23197403648 (21.60 GB) > DFS Used%: 11.42% > DFS Remaining%: 28.90% > Configured Cache Capacity: 0 (0 B) > Cache Used: 0 (0 B) > Cache Remaining: 0 (0 B) > Cache Used%: 100.00% > Cache Remaining%: 0.00% > Xceivers: 750 > Last contact: Wed Dec 09 15:37:36 CST 2015 > Name: 10.117.15.38:50010 (worker-1) > Hostname: worker-1 > Decommission Status : Normal > Configured Capacity: 80256417792 (74.74 GB) >
[jira] [Commented] (HDFS-10757) KMSClientProvider combined with KeyProviderCache can result in wrong UGI being used
[ https://issues.apache.org/jira/browse/HDFS-10757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15418447#comment-15418447 ] Jitendra Nath Pandey commented on HDFS-10757: - If the currentUgi is a proxy user it will have a real UGI. {{currentUgi.getRealUser()}} should give us the actual ugi. > KMSClientProvider combined with KeyProviderCache can result in wrong UGI > being used > --- > > Key: HDFS-10757 > URL: https://issues.apache.org/jira/browse/HDFS-10757 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Sergey Shelukhin >Priority: Critical > > ClientContext::get gets the context from CACHE via a config setting based > name, then KeyProviderCache stored in ClientContext gets the key provider > cached by URI from the configuration, too. These would return the same > KeyProvider regardless of current UGI. > KMSClientProvider caches the UGI (actualUgi) in ctor; that means in > particular that all the users of DFS with KMSClientProvider in a process will > get the KMS token (along with other credentials) of the first user, via the > above cache. > Either KMSClientProvider shouldn't store the UGI, or one of the caches should > be UGI-aware, like the FS object cache. > Side note: the comment in createConnection that purports to handle the > different UGI doesn't seem to cover what it says it covers. In our case, we > have two unrelated UGIs with no auth (createRemoteUser) with bunch of tokens, > including a KMS token, added. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org