[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419823#comment-15419823 ] Colin P. McCabe commented on HDFS-10301: I don't think the heartbeat is the right place to handle reconciling the block storages. One reason is because this adds extra complexity and time to the heartbeat, which happens far more frequently than an FBR. We even talked about making the heartbeat lockless-- clearly you can't do that if you are traversing all the block storages. Taking the FSN lock is expensive and heartbeats are sent quite frequently from each DN-- every few seconds. Another reason reconciling storages in heartbeats is bad is because if the heartbeat tells you about a new storage, you won't know what blocks are in it until the FBR arrives. So the NN may end up assigning a bunch of new blocks to a storage which looks empty, but really is full. I came up with what I believe is the correct patch to fix this problem months ago. It's here as https://issues.apache.org/jira/secure/attachment/12805931/HDFS-10301.005.patch . It doesn't modify any RPCs or add any new mechanisms. Instead, it just fixes the obvious bug in the HDFS-7960 logic. The only counter-argument to applying patch 005 that anyone ever came up with is that it doesn't eliminate zombies when FBRs get interleaved. But this is not a good counter-argument, since FBR interleaving is extremely, extremely rare in well-run clusters. The proof should be obvious-- if FBR interleaving happened on more clusters, more people would hit this serious data loss bug. This JIRA has been extremely frustrating. It seems like most, if not all, of the points that I brought up in my reviews were ignored. I talked about the obvious problems with compatibility with [~shv]'s solution and even explicitly asked him to test the upgrade case. I told him that this JIRA was a bad one to give to a promising new contributor such as [~redvine], because it required a lot of context and was extremely tricky. Both myself and [~andrew.wang] commented that overloading BlockListAsLongs was confusing and not necessary. The patch confused "not modifying the .proto file" with "not modifying the RPC content" which are two very separate concepts, as I commented over and over. Clearly these comments were ignored. If anything, I think [~shv] got very lucky that the bug manifested itself quickly rather than creating a serious data loss situation a few months down the road, like the one I had to debug when fixing HDFS-7960. Again I would urge you to just commit patch 005. Or at least evaluate it. > BlockReport retransmissions may lead to storages falsely being declared > zombie if storage report processing happens out of order > > > Key: HDFS-10301 > URL: https://issues.apache.org/jira/browse/HDFS-10301 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.1 >Reporter: Konstantin Shvachko >Assignee: Vinitha Reddy Gankidi >Priority: Critical > Fix For: 2.7.4 > > Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, > HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, > HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, > HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, > HDFS-10301.012.patch, HDFS-10301.013.patch, HDFS-10301.branch-2.7.patch, > HDFS-10301.branch-2.patch, HDFS-10301.sample.patch, zombieStorageLogs.rtf > > > When NameNode is busy a DataNode can timeout sending a block report. 
Then it > sends the block report again. Then the NameNode, while processing these two reports > at the same time, can interleave processing of storages from different reports. > This screws up the blockReportId field, which makes the NameNode think that some > storages are zombies. Replicas from zombie storages are immediately removed, > causing missing blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
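To make the failure mode in the description concrete, the following is a minimal, self-contained sketch of the HDFS-7960 zombie-storage idea the comment refers to; the class and field names are illustrative, not the actual NameNode code. Each storage is stamped with the id of the block report that last touched it, and any storage left with a stale stamp once a full report finishes is treated as a zombie. If a retransmitted report and the original are processed interleaved, storages stamped by one report look stale to the other, which is exactly the false-zombie case described above.

{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative model of per-storage block-report stamping (not NameNode code). */
class ZombieDetectionSketch {
  static class Storage {
    final String id;
    long lastBlockReportId;   // stamped when a report for this storage is processed
    Storage(String id) { this.id = id; }
  }

  private final Map<String, Storage> storages = new HashMap<>();

  void processStorageReport(long blockReportId, String storageId) {
    storages.computeIfAbsent(storageId, Storage::new).lastBlockReportId = blockReportId;
  }

  /** Called after the last storage of a full block report has been processed. */
  List<Storage> findZombies(long blockReportId) {
    List<Storage> zombies = new ArrayList<>();
    for (Storage s : storages.values()) {
      // A storage not stamped by the current report looks like it no longer exists.
      // If two reports (say id=1 and id=2) are processed interleaved, storages
      // stamped by report 1 are falsely flagged here while report 2 finishes.
      if (s.lastBlockReportId != blockReportId) {
        zombies.add(s);
      }
    }
    return zombies;
  }
}
{code}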
[jira] [Updated] (HDFS-10752) Several log refactoring/improvement suggestions in HDFS
[ https://issues.apache.org/jira/browse/HDFS-10752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemo Chen updated HDFS-10752: - Description: As per conversation with [~vrushalic], we merged HDFS-10749, HDFS-10750, HDFS-10751, HDFS-10753, under this issue. HDFS-10749 *Method invocation in logs can be replaced by variable* Similar to the fix for HDFS-409. In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java In code block: {code:borderStyle=solid} lastQueuedSeqno = currentPacket.getSeqno(); if (DFSClient.LOG.isDebugEnabled()) { DFSClient.LOG.debug("Queued packet " + currentPacket.getSeqno()); } {code} currentPacket.getSeqno() is better to be replaced by variable lastQueuedSeqno. HDFS-10750 Similar to the fix for AVRO-115. In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java in line 695, the logging code: {code:borderStyle=solid} LOG.info(getRole() + " RPC up at: " + rpcServer.getRpcAddress()); {code} In the same class, there is a method in line 907: {code:borderStyle=solid} /** * @return NameNode RPC address */ public InetSocketAddress getNameNodeAddress() { return rpcServer.getRpcAddress(); } {code} We can tell that rpcServer.getRpcAddress() could be replaced by method getNameNodeAddress() for the case of readability and simplicity HDFS-10751 Similar to the fix for AVRO-115. In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtxCache.java in line 72, the logging code: {code:borderStyle=solid} LOG.trace("openFileMap size:" + openFileMap.size()); {code} In the same class, there is a method in line 189: {code:borderStyle=solid} int size() { return openFileMap.size(); } {code} We can tell that openFileMap.size() could be replaced by method size() for the case of readability and simplicity *Print variable in byte* Similar to the fix for HBASE-623, in file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupImage.java In the following method, the log printed variable data (in byte[]). A possible fix is add Bytes.toString(data). {code} /** * Write the batch of edits to the local copy of the edit logs. */ private void logEditsLocally(long firstTxId, int numTxns, byte[] data) { long expectedTxId = editLog.getLastWrittenTxId() + 1; Preconditions.checkState(firstTxId == expectedTxId, "received txid batch starting at %s but expected txn %s", firstTxId, expectedTxId); editLog.setNextTxId(firstTxId + numTxns - 1); editLog.logEdit(data.length, data); editLog.logSync(); } {code} HDFS-10753 *MethodInvocation replaced by variable due to toString method* Similar to the fix in HADOOP-6419, in file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CorruptReplicasMap.java in line 76, the blk.getBlockName() method invocation is invoked on variable blk. "blk" is the class instance of Block. {code} void addToCorruptReplicasMap(Block blk, DatanodeDescriptor dn, String reason, Reason reasonCode) { ... 
NameNode.blockStateChangeLog.info( "BLOCK NameSystem.addToCorruptReplicasMap: {} added as corrupt on " + "{} by {} {}", blk.getBlockName(), dn, Server.getRemoteIp(), reasonText); {code} In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/Block.java {code} @Override public String toString() { return getBlockName() + "_" + getGenerationStamp(); } {code} The toString() method contain not only getBlockName() but also getGenerationStamp which may be helpful for debugging purpose. Therefore blk.getBlockName() can be replaced by blk was: As per conversation with [~vrushalic], we merged HDFS-10749, HDFS-10750, HDFS-10751, HDFS-10753, under this issue. --- HDFS-10749 *Method invocation in logs can be replaced by variable* Similar to the fix for HDFS-409. In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java In code block: {code:borderStyle=solid} lastQueuedSeqno = currentPacket.getSeqno(); if (DFSClient.LOG.isDebugEnabled()) { DFSClient.LOG.debug("Queued packet " + currentPacket.getSeqno()); } {code} currentPacket.getSeqno() is better to be replaced by variable lastQueuedSeqno. --- HDFS-10750 Similar to the fix for AVRO-115. In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java in line 695, the logging code: {code:borderStyle=solid} LOG.info(getRole() + " RPC up at: " + rpcServer.getRpcAddress()); {code} In the same class, there is
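Taken together, the suggestions above amount to small rewrites of the individual log statements. As a rough sketch of what the refactored calls could look like (reusing the variable and accessor names from the snippets quoted above; this illustrates the proposal, not a committed change):

{code}
// DFSOutputStream: reuse the variable that was just assigned instead of
// calling getSeqno() a second time.
lastQueuedSeqno = currentPacket.getSeqno();
if (DFSClient.LOG.isDebugEnabled()) {
  DFSClient.LOG.debug("Queued packet " + lastQueuedSeqno);
}

// NameNode: log through the public accessor rather than reaching into rpcServer.
LOG.info(getRole() + " RPC up at: " + getNameNodeAddress());

// OpenFileCtxCache: use the existing size() helper.
LOG.trace("openFileMap size:" + size());

// CorruptReplicasMap: pass the Block itself so its toString() also carries the
// generation stamp in the message.
NameNode.blockStateChangeLog.info(
    "BLOCK NameSystem.addToCorruptReplicasMap: {} added as corrupt on {} by {} {}",
    blk, dn, Server.getRemoteIp(), reasonText);
{code}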
[jira] [Updated] (HDFS-10752) Several log refactoring/improvement suggestions in HDFS
[ https://issues.apache.org/jira/browse/HDFS-10752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemo Chen updated HDFS-10752: - Description: As per conversation with [~vrushalic], we merged HDFS-10749, HDFS-10750, HDFS-10751, HDFS-10753, under this issue. --- HDFS-10749 *Method invocation in logs can be replaced by variable* Similar to the fix for HDFS-409. In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java In code block: {code:borderStyle=solid} lastQueuedSeqno = currentPacket.getSeqno(); if (DFSClient.LOG.isDebugEnabled()) { DFSClient.LOG.debug("Queued packet " + currentPacket.getSeqno()); } {code} currentPacket.getSeqno() is better to be replaced by variable lastQueuedSeqno. --- HDFS-10750 Similar to the fix for AVRO-115. In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java in line 695, the logging code: {code:borderStyle=solid} LOG.info(getRole() + " RPC up at: " + rpcServer.getRpcAddress()); {code} In the same class, there is a method in line 907: {code:borderStyle=solid} /** * @return NameNode RPC address */ public InetSocketAddress getNameNodeAddress() { return rpcServer.getRpcAddress(); } {code} We can tell that rpcServer.getRpcAddress() could be replaced by method getNameNodeAddress() for the case of readability and simplicity --- HDFS-10751 Similar to the fix for AVRO-115. In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtxCache.java in line 72, the logging code: {code:borderStyle=solid} LOG.trace("openFileMap size:" + openFileMap.size()); {code} In the same class, there is a method in line 189: {code:borderStyle=solid} int size() { return openFileMap.size(); } {code} We can tell that openFileMap.size() could be replaced by method size() for the case of readability and simplicity --- *Print variable in byte* Similar to the fix for HBASE-623, in file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupImage.java In the following method, the log printed variable data (in byte[]). A possible fix is add Bytes.toString(data). {code} /** * Write the batch of edits to the local copy of the edit logs. */ private void logEditsLocally(long firstTxId, int numTxns, byte[] data) { long expectedTxId = editLog.getLastWrittenTxId() + 1; Preconditions.checkState(firstTxId == expectedTxId, "received txid batch starting at %s but expected txn %s", firstTxId, expectedTxId); editLog.setNextTxId(firstTxId + numTxns - 1); editLog.logEdit(data.length, data); editLog.logSync(); } {code} HDFS-10753 *MethodInvocation replaced by variable due to toString method* Similar to the fix in HADOOP-6419, in file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CorruptReplicasMap.java in line 76, the blk.getBlockName() method invocation is invoked on variable blk. "blk" is the class instance of Block. {code} void addToCorruptReplicasMap(Block blk, DatanodeDescriptor dn, String reason, Reason reasonCode) { ... 
NameNode.blockStateChangeLog.info( "BLOCK NameSystem.addToCorruptReplicasMap: {} added as corrupt on " + "{} by {} {}", blk.getBlockName(), dn, Server.getRemoteIp(), reasonText); {code} In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/Block.java {code} @Override public String toString() { return getBlockName() + "_" + getGenerationStamp(); } {code} The toString() method contain not only getBlockName() but also getGenerationStamp which may be helpful for debugging purpose. Therefore blk.getBlockName() can be replaced by blk was: As per conversation with [~vrushalic], we merged HDFS-10749, HDFS-10750, HDFS-10751, HDFS-10753, under this issue. --- HDFS-10749 *Method invocation in logs can be replaced by variable* Similar to the fix for HDFS-409. In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java In code block: {code:borderStyle=solid} lastQueuedSeqno = currentPacket.getSeqno(); if (DFSClient.LOG.isDebugEnabled()) { DFSClient.LOG.debug("Queued packet " + currentPacket.getSeqno()); } {code} currentPacket.getSeqno() is better to be replaced by variable lastQueuedSeqno. --- HDFS-10750 Similar to the fix for AVRO-115. In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java in line 695, the logging code: {code:borderStyle=solid} LOG.info(getRole() + " RPC up at: " + rpcServer.getRpcAddress()); {code} In the same class, there is a me
[jira] [Updated] (HDFS-10752) Several log refactoring/improvement suggestions in HDFS
[ https://issues.apache.org/jira/browse/HDFS-10752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemo Chen updated HDFS-10752: - Description: As per conversation with [~vrushalic], we merged HDFS-10749, HDFS-10750, HDFS-10751, HDFS-10753, under this issue. --- HDFS-10749 *Method invocation in logs can be replaced by variable* Similar to the fix for HDFS-409. In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java In code block: {code:borderStyle=solid} lastQueuedSeqno = currentPacket.getSeqno(); if (DFSClient.LOG.isDebugEnabled()) { DFSClient.LOG.debug("Queued packet " + currentPacket.getSeqno()); } {code} currentPacket.getSeqno() is better to be replaced by variable lastQueuedSeqno. --- *Print variable in byte* Similar to the fix for HBASE-623, in file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupImage.java In the following method, the log printed variable data (in byte[]). A possible fix is add Bytes.toString(data). {code} /** * Write the batch of edits to the local copy of the edit logs. */ private void logEditsLocally(long firstTxId, int numTxns, byte[] data) { long expectedTxId = editLog.getLastWrittenTxId() + 1; Preconditions.checkState(firstTxId == expectedTxId, "received txid batch starting at %s but expected txn %s", firstTxId, expectedTxId); editLog.setNextTxId(firstTxId + numTxns - 1); editLog.logEdit(data.length, data); editLog.logSync(); } {code} HDFS-10753 *MethodInvocation replaced by variable due to toString method* Similar to the fix in HADOOP-6419, in file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CorruptReplicasMap.java in line 76, the blk.getBlockName() method invocation is invoked on variable blk. "blk" is the class instance of Block. {code} void addToCorruptReplicasMap(Block blk, DatanodeDescriptor dn, String reason, Reason reasonCode) { ... NameNode.blockStateChangeLog.info( "BLOCK NameSystem.addToCorruptReplicasMap: {} added as corrupt on " + "{} by {} {}", blk.getBlockName(), dn, Server.getRemoteIp(), reasonText); {code} In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/Block.java {code} @Override public String toString() { return getBlockName() + "_" + getGenerationStamp(); } {code} The toString() method contain not only getBlockName() but also getGenerationStamp which may be helpful for debugging purpose. Therefore blk.getBlockName() can be replaced by blk was: As per conversation with [~vrushalic], we merged HDFS-10749, HDFS-10750, HDFS-10751, HDFS-10753, under this issue. --- HDFS-10749 *Method invocation in logs can be replaced by variable* Similar to the fix for HDFS-409. In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java In code block: {code:borderStyle=solid} lastQueuedSeqno = currentPacket.getSeqno(); if (DFSClient.LOG.isDebugEnabled()) { DFSClient.LOG.debug("Queued packet " + currentPacket.getSeqno()); } {code} currentPacket.getSeqno() is better to be replaced by variable lastQueuedSeqno. --- *Print variable in byte* Similar to the fix for HBASE-623, in file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupImage.java In the following method, the log printed variable data (in byte[]). 
A possible fix is add Bytes.toString(data). {code} /** * Write the batch of edits to the local copy of the edit logs. */ private void logEditsLocally(long firstTxId, int numTxns, byte[] data) { long expectedTxId = editLog.getLastWrittenTxId() + 1; Preconditions.checkState(firstTxId == expectedTxId, "received txid batch starting at %s but expected txn %s", firstTxId, expectedTxId); editLog.setNextTxId(firstTxId + numTxns - 1); editLog.logEdit(data.length, data); editLog.logSync(); } {code} *MethodInvocation replaced by variable due to toString method* Similar to the fix in HADOOP-6419, in file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CorruptReplicasMap.java in line 76, the blk.getBlockName() method invocation is invoked on variable blk. "blk" is the class instance of Block. {code} void addToCorruptReplicasMap(Block blk, DatanodeDescriptor dn, String reason, Reason reasonCode) { ... NameNode.blockStateChangeLog.info( "BLOCK NameSystem.addToCorruptReplicasMap: {} added as corrupt on " + "{} by {} {}", blk.getBlockName(), dn, Server.getRemoteIp(), reasonText); {code} In file: hadoop-rel-release-2.7.2/hadoop-h
[jira] [Updated] (HDFS-10752) Several log refactoring/improvement suggestions in HDFS
[ https://issues.apache.org/jira/browse/HDFS-10752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemo Chen updated HDFS-10752: - Description: As per conversation with [~vrushalic], we merged HDFS-10749, HDFS-10750, HDFS-10751, HDFS-10753, under this issue. --- HDFS-10749 *Method invocation in logs can be replaced by variable* Similar to the fix for HDFS-409. In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java In code block: {code:borderStyle=solid} lastQueuedSeqno = currentPacket.getSeqno(); if (DFSClient.LOG.isDebugEnabled()) { DFSClient.LOG.debug("Queued packet " + currentPacket.getSeqno()); } {code} currentPacket.getSeqno() is better to be replaced by variable lastQueuedSeqno. --- HDFS-10750 Similar to the fix for AVRO-115. In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java in line 695, the logging code: {code:borderStyle=solid} LOG.info(getRole() + " RPC up at: " + rpcServer.getRpcAddress()); {code} In the same class, there is a method in line 907: {code:borderStyle=solid} /** * @return NameNode RPC address */ public InetSocketAddress getNameNodeAddress() { return rpcServer.getRpcAddress(); } {code} We can tell that rpcServer.getRpcAddress() could be replaced by method getNameNodeAddress() for the case of readability and simplicity --- HDFS-10751 --- *Print variable in byte* Similar to the fix for HBASE-623, in file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupImage.java In the following method, the log printed variable data (in byte[]). A possible fix is add Bytes.toString(data). {code} /** * Write the batch of edits to the local copy of the edit logs. */ private void logEditsLocally(long firstTxId, int numTxns, byte[] data) { long expectedTxId = editLog.getLastWrittenTxId() + 1; Preconditions.checkState(firstTxId == expectedTxId, "received txid batch starting at %s but expected txn %s", firstTxId, expectedTxId); editLog.setNextTxId(firstTxId + numTxns - 1); editLog.logEdit(data.length, data); editLog.logSync(); } {code} HDFS-10753 *MethodInvocation replaced by variable due to toString method* Similar to the fix in HADOOP-6419, in file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CorruptReplicasMap.java in line 76, the blk.getBlockName() method invocation is invoked on variable blk. "blk" is the class instance of Block. {code} void addToCorruptReplicasMap(Block blk, DatanodeDescriptor dn, String reason, Reason reasonCode) { ... NameNode.blockStateChangeLog.info( "BLOCK NameSystem.addToCorruptReplicasMap: {} added as corrupt on " + "{} by {} {}", blk.getBlockName(), dn, Server.getRemoteIp(), reasonText); {code} In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/Block.java {code} @Override public String toString() { return getBlockName() + "_" + getGenerationStamp(); } {code} The toString() method contain not only getBlockName() but also getGenerationStamp which may be helpful for debugging purpose. Therefore blk.getBlockName() can be replaced by blk was: As per conversation with [~vrushalic], we merged HDFS-10749, HDFS-10750, HDFS-10751, HDFS-10753, under this issue. --- HDFS-10749 *Method invocation in logs can be replaced by variable* Similar to the fix for HDFS-409. 
In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java In code block: {code:borderStyle=solid} lastQueuedSeqno = currentPacket.getSeqno(); if (DFSClient.LOG.isDebugEnabled()) { DFSClient.LOG.debug("Queued packet " + currentPacket.getSeqno()); } {code} currentPacket.getSeqno() is better to be replaced by variable lastQueuedSeqno. --- *Print variable in byte* Similar to the fix for HBASE-623, in file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupImage.java In the following method, the log printed variable data (in byte[]). A possible fix is add Bytes.toString(data). {code} /** * Write the batch of edits to the local copy of the edit logs. */ private void logEditsLocally(long firstTxId, int numTxns, byte[] data) { long expectedTxId = editLog.getLastWrittenTxId() + 1; Preconditions.checkState(firstTxId == expectedTxId, "received txid batch starting at %s but expected txn %s", firstTxId, expectedTxId); editLog.setNextTxId(firstTxId + numTxns - 1); editLog.logEdit(data.length, data); editLog.logSync(); } {code} HDFS-10753 *MethodInvocation replaced by variable due to toSt
[jira] [Updated] (HDFS-10752) Several log refactoring/improvement suggestions in HDFS
[ https://issues.apache.org/jira/browse/HDFS-10752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemo Chen updated HDFS-10752: - Description: As per conversation with [~vrushalic], we merged HDFS-10749, HDFS-10750, HDFS-10751, HDFS-10753, under this issue. --- HDFS-10749 *Method invocation in logs can be replaced by variable* Similar to the fix for HDFS-409. In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java In code block: {code:borderStyle=solid} lastQueuedSeqno = currentPacket.getSeqno(); if (DFSClient.LOG.isDebugEnabled()) { DFSClient.LOG.debug("Queued packet " + currentPacket.getSeqno()); } {code} currentPacket.getSeqno() is better to be replaced by variable lastQueuedSeqno. --- *Print variable in byte* Similar to the fix for HBASE-623, in file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupImage.java In the following method, the log printed variable data (in byte[]). A possible fix is add Bytes.toString(data). {code} /** * Write the batch of edits to the local copy of the edit logs. */ private void logEditsLocally(long firstTxId, int numTxns, byte[] data) { long expectedTxId = editLog.getLastWrittenTxId() + 1; Preconditions.checkState(firstTxId == expectedTxId, "received txid batch starting at %s but expected txn %s", firstTxId, expectedTxId); editLog.setNextTxId(firstTxId + numTxns - 1); editLog.logEdit(data.length, data); editLog.logSync(); } {code} *MethodInvocation replaced by variable due to toString method* Similar to the fix in HADOOP-6419, in file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CorruptReplicasMap.java in line 76, the blk.getBlockName() method invocation is invoked on variable blk. "blk" is the class instance of Block. {code} void addToCorruptReplicasMap(Block blk, DatanodeDescriptor dn, String reason, Reason reasonCode) { ... NameNode.blockStateChangeLog.info( "BLOCK NameSystem.addToCorruptReplicasMap: {} added as corrupt on " + "{} by {} {}", blk.getBlockName(), dn, Server.getRemoteIp(), reasonText); {code} In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/Block.java {code} @Override public String toString() { return getBlockName() + "_" + getGenerationStamp(); } {code} The toString() method contain not only getBlockName() but also getGenerationStamp which may be helpful for debugging purpose. Therefore blk.getBlockName() can be replaced by blk was: As per conversation with [~vrushalic], we merged HDFS-10749, HDFS-10750, HDFS-10751, HDFS-10753, under this issue. *Print variable in byte* Similar to the fix for HBASE-623, in file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupImage.java In the following method, the log printed variable data (in byte[]). A possible fix is add Bytes.toString(data). {code} /** * Write the batch of edits to the local copy of the edit logs. 
*/ private void logEditsLocally(long firstTxId, int numTxns, byte[] data) { long expectedTxId = editLog.getLastWrittenTxId() + 1; Preconditions.checkState(firstTxId == expectedTxId, "received txid batch starting at %s but expected txn %s", firstTxId, expectedTxId); editLog.setNextTxId(firstTxId + numTxns - 1); editLog.logEdit(data.length, data); editLog.logSync(); } {code} *Method invocation in logs can be replaced by variable* Similar to the fix for HDFS-409. In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java In code block: {code:borderStyle=solid} lastQueuedSeqno = currentPacket.getSeqno(); if (DFSClient.LOG.isDebugEnabled()) { DFSClient.LOG.debug("Queued packet " + currentPacket.getSeqno()); } {code} currentPacket.getSeqno() is better to be replaced by variable lastQueuedSeqno. *MethodInvocation replaced by variable due to toString method* Similar to the fix in HADOOP-6419, in file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CorruptReplicasMap.java in line 76, the blk.getBlockName() method invocation is invoked on variable blk. "blk" is the class instance of Block. {code} void addToCorruptReplicasMap(Block blk, DatanodeDescriptor dn, String reason, Reason reasonCode) { ... NameNode.blockStateChangeLog.info( "BLOCK NameSystem.addToCorruptReplicasMap: {} added as corrupt on " + "{} by {} {}", blk.getBlockName(), dn, Server.getRemoteIp(), reasonText); {code} In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/mai
[jira] [Updated] (HDFS-10752) Several log refactoring/improvement suggestions in HDFS
[ https://issues.apache.org/jira/browse/HDFS-10752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemo Chen updated HDFS-10752: - Description: As per conversation with [~vrushalic], we merged HDFS-10749, HDFS-10750, HDFS-10751, HDFS-10753, under this issue. *Print variable in byte* Similar to the fix for HBASE-623, in file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupImage.java In the following method, the log printed variable data (in byte[]). A possible fix is add Bytes.toString(data). {code} /** * Write the batch of edits to the local copy of the edit logs. */ private void logEditsLocally(long firstTxId, int numTxns, byte[] data) { long expectedTxId = editLog.getLastWrittenTxId() + 1; Preconditions.checkState(firstTxId == expectedTxId, "received txid batch starting at %s but expected txn %s", firstTxId, expectedTxId); editLog.setNextTxId(firstTxId + numTxns - 1); editLog.logEdit(data.length, data); editLog.logSync(); } {code} *Method invocation in logs can be replaced by variable* Similar to the fix for HDFS-409. In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java In code block: {code:borderStyle=solid} lastQueuedSeqno = currentPacket.getSeqno(); if (DFSClient.LOG.isDebugEnabled()) { DFSClient.LOG.debug("Queued packet " + currentPacket.getSeqno()); } {code} currentPacket.getSeqno() is better to be replaced by variable lastQueuedSeqno. *MethodInvocation replaced by variable due to toString method* Similar to the fix in HADOOP-6419, in file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CorruptReplicasMap.java in line 76, the blk.getBlockName() method invocation is invoked on variable blk. "blk" is the class instance of Block. {code} void addToCorruptReplicasMap(Block blk, DatanodeDescriptor dn, String reason, Reason reasonCode) { ... NameNode.blockStateChangeLog.info( "BLOCK NameSystem.addToCorruptReplicasMap: {} added as corrupt on " + "{} by {} {}", blk.getBlockName(), dn, Server.getRemoteIp(), reasonText); {code} In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/Block.java {code} @Override public String toString() { return getBlockName() + "_" + getGenerationStamp(); } {code} The toString() method contain not only getBlockName() but also getGenerationStamp which may be helpful for debugging purpose. Therefore blk.getBlockName() can be replaced by blk was: As per conversation with [~vrushalic], we merged HDFS-10753, HDFS-10749 under this issue. *Print variable in byte* Similar to the fix for HBASE-623, in file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupImage.java In the following method, the log printed variable data (in byte[]). A possible fix is add Bytes.toString(data). {code} /** * Write the batch of edits to the local copy of the edit logs. */ private void logEditsLocally(long firstTxId, int numTxns, byte[] data) { long expectedTxId = editLog.getLastWrittenTxId() + 1; Preconditions.checkState(firstTxId == expectedTxId, "received txid batch starting at %s but expected txn %s", firstTxId, expectedTxId); editLog.setNextTxId(firstTxId + numTxns - 1); editLog.logEdit(data.length, data); editLog.logSync(); } {code} *Method invocation in logs can be replaced by variable* Similar to the fix for HDFS-409. 
In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java In code block: {code:borderStyle=solid} lastQueuedSeqno = currentPacket.getSeqno(); if (DFSClient.LOG.isDebugEnabled()) { DFSClient.LOG.debug("Queued packet " + currentPacket.getSeqno()); } {code} currentPacket.getSeqno() is better to be replaced by variable lastQueuedSeqno. *MethodInvocation replaced by variable due to toString method* Similar to the fix in HADOOP-6419, in file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CorruptReplicasMap.java in line 76, the blk.getBlockName() method invocation is invoked on variable blk. "blk" is the class instance of Block. {code} void addToCorruptReplicasMap(Block blk, DatanodeDescriptor dn, String reason, Reason reasonCode) { ... NameNode.blockStateChangeLog.info( "BLOCK NameSystem.addToCorruptReplicasMap: {} added as corrupt on " + "{} by {} {}", blk.getBlockName(), dn, Server.getRemoteIp(), reasonText); {code} In file: hadoop-rel-release-2.7.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/Block.j
[jira] [Commented] (HDFS-9696) Garbage snapshot records lingering forever
[ https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419709#comment-15419709 ] Hadoop QA commented on HDFS-9696: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 90m 43s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}111m 12s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.TestPersistBlocks | | Timed out junit tests | org.apache.hadoop.hdfs.TestLeaseRecovery2 | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12823548/HDFS-9696.v2.patch | | JIRA Issue | HDFS-9696 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 6fa8fece0684 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 23c6e3c | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/16417/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16417/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16417/console | | Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Garbage snapshot records lingering forever > -- > > Key: HDFS-9696 > URL: https://issues.apache.org/jira/browse/HDFS-9696 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kihwal Lee >Assignee: Kih
[jira] [Commented] (HDFS-10757) KMSClientProvider combined with KeyProviderCache can result in wrong UGI being used
[ https://issues.apache.org/jira/browse/HDFS-10757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419682#comment-15419682 ] Jitendra Nath Pandey commented on HDFS-10757: - [~asuresh], thanks for explaining the context. In this context it works because the server has a login user that is stored as the actualUgi and that is the one always needed, but in some other scenarios, as in HADOOP-13381, the actualUgi becomes incorrect. Many servers that are processing an incoming request that was authenticated via a proxy mechanism set up a proxy-UGI with a real user without credentials, because the credentials of the real-user are not really available on the server. Therefore, the proxy-ugi is relevant for real authentication only in the context of a client. The proxyUgi set up by the server in this context should not be propagated for further calls to other services. That means a new proxy user should be explicitly set up to make further calls. Suppose a general flow goes like this: (===> denotes a remote call) client1 > Server1 (Hive, Oozie) Authenticates and creates ugi1> Server1 Processes ---> Server1 creates client2 to read encrypted data ===> Server2 (NN or KMS) When Server1 authenticates client1 it creates a ugi1 (which may be a proxy ugi) to preserve the context in which authentication of client1 was performed. Now when Server1 instantiates a client2 to make a call to Server2 it should not use ugi1, because the authentication context in ugi1 is not relevant for this call. In my opinion a new ugi2 should be explicitly set up, which has the right credentials. > KMSClientProvider combined with KeyProviderCache can result in wrong UGI > being used > --- > > Key: HDFS-10757 > URL: https://issues.apache.org/jira/browse/HDFS-10757 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Sergey Shelukhin >Priority: Critical > > ClientContext::get gets the context from CACHE via a config setting based > name, then KeyProviderCache stored in ClientContext gets the key provider > cached by URI from the configuration, too. These would return the same > KeyProvider regardless of current UGI. > KMSClientProvider caches the UGI (actualUgi) in ctor; that means in > particular that all the users of DFS with KMSClientProvider in a process will > get the KMS token (along with other credentials) of the first user, via the > above cache. > Either KMSClientProvider shouldn't store the UGI, or one of the caches should > be UGI-aware, like the FS object cache. > Side note: the comment in createConnection that purports to handle the > different UGI doesn't seem to cover what it says it covers. In our case, we > have two unrelated UGIs with no auth (createRemoteUser) with a bunch of tokens, > including a KMS token, added. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
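A minimal sketch of the pattern being argued for here, i.e. explicitly building a fresh proxy UGI on top of the service's login user for the outgoing call instead of reusing the UGI that authenticated the incoming request (the helper name and wiring below are illustrative, not code from any of the servers mentioned):

{code}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;

public class ProxyUgiSketch {
  /**
   * Server1 has authenticated "client1" and holds ugi1 for that request.
   * For its own outgoing call to Server2 (NN/KMS) it should not blindly reuse
   * ugi1; instead it builds a new proxy UGI on top of its login (service) user,
   * which actually holds the credentials needed for the remote call.
   */
  static <T> T callAsProxy(String endUser, PrivilegedExceptionAction<T> call)
      throws Exception {
    UserGroupInformation realUser = UserGroupInformation.getLoginUser();
    UserGroupInformation proxyUgi =
        UserGroupInformation.createProxyUser(endUser, realUser);
    return proxyUgi.doAs(call);
  }
}
{code}

Server1 would then wrap its call to the NN or KMS in something like callAsProxy("client1", ...) so the remote service sees the end user while authentication itself is performed with Server1's own credentials.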
[jira] [Updated] (HDFS-9696) Garbage snapshot records lingering forever
[ https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-9696: - Attachment: HDFS-9696.v2.patch > Garbage snapshot records lingering forever > -- > > Key: HDFS-9696 > URL: https://issues.apache.org/jira/browse/HDFS-9696 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > Attachments: HDFS-9696.patch, HDFS-9696.v2.patch > > > We have a cluster where the snapshot feature might have been tested years > ago. The HDFS does not have any snapshots, but I see filediff records > persisted in its fsimage. Since it has been restarted many times and > checkpointed over 100 times since then, they must have been persisted and > carried over since then. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9696) Garbage snapshot records lingering forever
[ https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419584#comment-15419584 ] Kihwal Lee commented on HDFS-9696: -- It looks like these tests failed because the snapshot section wasn't present. When the existing namenode reloads such an image, the snapshot manager state may not be properly reset. I made it skip only the diff section and they seem to pass. > Garbage snapshot records lingering forever > -- > > Key: HDFS-9696 > URL: https://issues.apache.org/jira/browse/HDFS-9696 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > Attachments: HDFS-9696.patch, HDFS-9696.v2.patch > > > We have a cluster where the snapshot feature might have been tested years > ago. The HDFS does not have any snapshots, but I see filediff records > persisted in its fsimage. Since it has been restarted many times and > checkpointed over 100 times since then, they must have been persisted and > carried over since then. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9696) Garbage snapshot records lingering forever
[ https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419494#comment-15419494 ] Tsz Wo Nicholas Sze commented on HDFS-9696: --- The test failures seem related. Please take a look. :) > Garbage snapshot records lingering forever > -- > > Key: HDFS-9696 > URL: https://issues.apache.org/jira/browse/HDFS-9696 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > Attachments: HDFS-9696.patch > > > We have a cluster where the snapshot feature might have been tested years > ago. The HDFS does not have any snapshots, but I see filediff records > persisted in its fsimage. Since it has been restarted many times and > checkpointed over 100 times since then, they must have been persisted and > carried over since then. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10747) o.a.h.hdfs.tools.DebugAdmin usage message is misleading
[ https://issues.apache.org/jira/browse/HDFS-10747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419480#comment-15419480 ] Mingliang Liu commented on HDFS-10747: -- Thanks for the review, [~jojochuang]. > o.a.h.hdfs.tools.DebugAdmin usage message is misleading > --- > > Key: HDFS-10747 > URL: https://issues.apache.org/jira/browse/HDFS-10747 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 2.7.0 >Reporter: Mingliang Liu >Assignee: Mingliang Liu >Priority: Minor > Attachments: HDFS-10747.000.patch > > > [HDFS-6917] added a helpful hdfs debug command to validate blocks and call > recoverLease. The usage doc is kinda misleading, as follows: > {code} > $ hdfs debug verify > creating a new configuration > verify [-meta <metadata-file>] [-block <block-file>] > Verify HDFS metadata and block files. If a block file is specified, we > will verify that the checksums in the metadata file match the block > file. > {code} > Actually the {{-meta <metadata-file>}} is necessary. {{[]}} is for optional > arguments, if we follow the > [convention|http://pubs.opengroup.org/onlinepubs/9699919799]. > {code} > $ hdfs debug recoverLease > creating a new configuration > recoverLease [-path <path>] [-retries <num-retries>] > Recover the lease on the specified path. The path must reside on an > HDFS filesystem. The default number of retries is 1. > {code} > {{-path <path>}} is also the same case. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
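Under the convention cited in the description, arguments that are actually required would be shown without the square brackets. Purely as an illustration of that convention (not the wording of the attached patch), the two usage lines could read:

{code}
verify -meta <metadata-file> [-block <block-file>]
recoverLease -path <path> [-retries <num-retries>]
{code}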
[jira] [Commented] (HDFS-9696) Garbage snapshot records lingering forever
[ https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419449#comment-15419449 ] Hadoop QA commented on HDFS-9696: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 56m 57s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 76m 14s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.namenode.snapshot.TestCheckpointsWithSnapshots | | | hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12823518/HDFS-9696.patch | | JIRA Issue | HDFS-9696 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 44f817c68730 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 23c6e3c | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/16416/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16416/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16416/console | | Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Garbage snapshot records lingering forever > -- > > Key: HDFS-9696 > URL: https://issues.apache.org/jira/browse/HDFS-9696 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kihwal Lee >
[jira] [Updated] (HDFS-9696) Garbage snapshot records lingering forever
[ https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-9696: - Attachment: HDFS-9696.patch Attaching a patch containing a unit test. > Garbage snapshot records lingering forever > -- > > Key: HDFS-9696 > URL: https://issues.apache.org/jira/browse/HDFS-9696 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > Attachments: HDFS-9696.patch > > > We have a cluster where the snapshot feature might have been tested years > ago. The HDFS does not have any snapshots, but I see filediff records > persisted in its fsimage. Since it has been restarted many times and > checkpointed over 100 times since then, they must have been persisted and > carried over since then. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-9696) Garbage snapshot records lingering forever
[ https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-9696: - Status: Patch Available (was: Reopened) > Garbage snapshot records lingering forever > -- > > Key: HDFS-9696 > URL: https://issues.apache.org/jira/browse/HDFS-9696 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > Attachments: HDFS-9696.patch > > > We have a cluster where the snapshot feature might have been tested years > ago. The HDFS does not have any snapshots, but I see filediff records > persisted in its fsimage. Since it has been restarted many times and > checkpointed over 100 times since then, they must have been persisted and > carried over since then. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10760) DataXceiver#run() should not log InvalidToken exception as an error
[ https://issues.apache.org/jira/browse/HDFS-10760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419311#comment-15419311 ] Hadoop QA commented on HDFS-10760: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 12m 21s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 23s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 58 unchanged - 0 fixed = 59 total (was 58) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 58m 42s{color} | {color:green} hadoop-hdfs in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 90m 27s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12823430/HADOOP-13492.patch | | JIRA Issue | HDFS-10760 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux e56684d45254 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 23c6e3c | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/16415/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16415/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16415/console | | Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > DataXceiver#run() should not log InvalidToken exception as an error > --- > > Key: HDFS-10760 > URL: https://issues.apache.org/jira/browse/HDFS-10760 > Proj
[jira] [Commented] (HDFS-10636) Modify ReplicaInfo to remove the assumption that replica metadata and data are stored in java.io.File.
[ https://issues.apache.org/jira/browse/HDFS-10636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419293#comment-15419293 ] Virajith Jalaparti commented on HDFS-10636: --- Hi [~eddyxu], Thank you for the comments. I agree with points 1 and 2, and will fix them. bq. {{breakHardlinksIfNeeded}}, {{copyMetadata}} and {{copyBlockdata}} should not be in {{ReplicaInfo}}. Or should not use {{File}} as input. Agree that {{copyMetadata}} and {{copyBlockdata}} should not have {{File}} as a parameter. I will change it to {{URI}} to be more general. {{breakHardLinksIfNeeded}} has always been in {{ReplicaInfo}}. I made it abstract and moved the implementation to {{LocalReplica}}. bq. {{ReplicaUnderRecovery}}. Is there a way to avoid casting {{ReplicaInfo}} to {{LocalReplica}}? The only place where the fact that {{original}} is a {{LocalReplica}} matters is in {{ReplicaUnderRecovery::setDir()}}. One way to address this would be to add the cast only when {{original.setDir()}} is called. The other way to deal with this would be to add {{setDir}} to {{ReplicaInfo}}, but to avoid {{File}} as a parameter it should take a {{URI}}. Which do you think is better? bq. In general, there are places in this patch that return {{ReplicaInfo}} for {{FinalizedReplica}}. which would makes type system weaker and is not future-proof. Is it necessary to be changed? This was intentional. The way I was thinking about it was that the state of a {{ReplicaInfo}} should be determined using {{ReplicaInfo::getState()}}, and not using the type system. The current code does the latter -- it uses the type system to ensure that replicas are in a certain state. Not relying on the type system and using the former ({{ReplicaInfo::getState()}}) seems like a cleaner way of doing this. What do you think? Also, {{FinalizedReplica}} in the current type hierarchy is a {{LocalReplica}}, so referring to replicas as {{FinalizedReplica}} assumes that they are {{LocalReplica}}s and hence backed by {{File}}s. > Modify ReplicaInfo to remove the assumption that replica metadata and data > are stored in java.io.File. > -- > > Key: HDFS-10636 > URL: https://issues.apache.org/jira/browse/HDFS-10636 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, fs >Reporter: Virajith Jalaparti >Assignee: Virajith Jalaparti > Attachments: HDFS-10636.001.patch, HDFS-10636.002.patch, > HDFS-10636.003.patch, HDFS-10636.004.patch, HDFS-10636.005.patch > > > Replace java.io.File related APIs from {{ReplicaInfo}}, and enable the > definition of new {{ReplicaInfo}} sub-classes whose metadata and data can be > present on external storages (HDFS-9806). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
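For illustration, a minimal sketch of the state-check style versus the type-check style discussed above, plus URI-based copy methods; the classes below are simplified stand-ins, not the actual {{ReplicaInfo}} hierarchy:
{code}
import java.io.IOException;
import java.net.URI;

// Simplified stand-ins for the real HDFS classes; illustration only.
enum ReplicaState { FINALIZED, RBW, RUR }

abstract class ReplicaInfoSketch {
  abstract ReplicaState getState();
  // URI-based copy methods keep the API free of java.io.File assumptions,
  // so subclasses can place data on external storages.
  abstract void copyMetadata(URI destination) throws IOException;
  abstract void copyBlockdata(URI destination) throws IOException;
}

final class ReplicaChecks {
  // Style the patch moves toward: ask the replica for its state...
  static boolean isFinalized(ReplicaInfoSketch replica) {
    return replica.getState() == ReplicaState.FINALIZED;
  }
  // ...instead of relying on the concrete type (e.g. an instanceof check
  // against FinalizedReplica), which also ties callers to a File-backed
  // subclass of the replica hierarchy.
}
{code}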
[jira] [Commented] (HDFS-10755) TestDecommissioningStatus BindException Failure
[ https://issues.apache.org/jira/browse/HDFS-10755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419248#comment-15419248 ] Eric Badger commented on HDFS-10755: The pre-commit test failure is unrelated to the patch. I believe the patch is ready for review. [~kihwal], could you take a look? > TestDecommissioningStatus BindException Failure > --- > > Key: HDFS-10755 > URL: https://issues.apache.org/jira/browse/HDFS-10755 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Eric Badger >Assignee: Eric Badger > Attachments: HDFS-10755.001.patch, HDFS-10755.002.patch > > > Tests in TestDecomissioningStatus call MiniDFSCluster.dataNodeRestart(). They > are required to come back up on the same (initially ephemeral) port that they > were on before being shutdown. Because of this, there is an inherent race > condition where another process could bind to the port while the datanode is > down. If this happens then we get a BindException failure. However, all of > the tests in TestDecommissioningStatus depend on the cluster being up and > running for them to run correctly. So if a test blows up the cluster, the > subsequent tests will also fail. Below I show the BindException failure as > well as the subsequent test failure that occurred. > {noformat} > java.net.BindException: Problem binding to [localhost:35370] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:436) > at sun.nio.ch.Net.bind(Net.java:428) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at org.apache.hadoop.ipc.Server.bind(Server.java:430) > at org.apache.hadoop.ipc.Server$Listener.(Server.java:768) > at org.apache.hadoop.ipc.Server.(Server.java:2391) > at org.apache.hadoop.ipc.RPC$Server.(RPC.java:951) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server.(ProtobufRpcEngine.java:523) > at > org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:498) > at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:796) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.initIpcServer(DataNode.java:802) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1134) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.(DataNode.java:429) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2387) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2274) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2321) > at > org.apache.hadoop.hdfs.MiniDFSCluster.restartDataNode(MiniDFSCluster.java:2037) > at > org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.testDecommissionDeadDN(TestDecommissioningStatus.java:426) > {noformat} > {noformat} > java.lang.AssertionError: Number of Datanodes expected:<2> but was:<1> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at > org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.testDecommissionStatus(TestDecommissioningStatus.java:275) > {noformat} > I don't think there's any way to avoid the inherent race condition with > getting the same ephemeral port, but we can definitely fix the tests so that > it doesn't 
cause subsequent tests to fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
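For illustration, a minimal sketch (not the attached patch) of one way to keep a single BindException from cascading: verify the shared cluster before each test and rebuild it if an earlier test left it broken. {{cluster}}, {{conf}} and {{NUM_DATANODES}} are the usual test fixture fields and are assumed here.
{code}
// Hypothetical JUnit guard inside the test class.
@Before
public void verifyClusterBeforeTest() throws Exception {
  // If an earlier test hit the BindException while restarting a datanode,
  // the shared cluster may be short a node; rebuild it so one unlucky bind
  // failure does not turn into a chain of unrelated assertion failures.
  if (cluster == null || !cluster.isClusterUp()
      || cluster.getDataNodes().size() < NUM_DATANODES) {
    if (cluster != null) {
      cluster.shutdown();
    }
    cluster = new MiniDFSCluster.Builder(conf)
        .numDataNodes(NUM_DATANODES)
        .build();
    cluster.waitActive();
  }
}
{code}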
[jira] [Commented] (HDFS-10757) KMSClientProvider combined with KeyProviderCache can result in wrong UGI being used
[ https://issues.apache.org/jira/browse/HDFS-10757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419234#comment-15419234 ] Arun Suresh commented on HDFS-10757: Hmmm... I think I remember the context for why it was implemented as such. bq. If the currentUgi is a proxy user it will have a real UGI. currentUgi.getRealUser() should give us the actual ugi. That is true, but the KMSCP was being implemented around the same time as HADOOP-10835. That JIRA was meant to plumb proxy user through HTTP. If you look at this [snippet|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/web/DelegationTokenAuthenticationFilter.java#L247-L267] of code, you will notice that if the currentUser is authenticated via a delegation token, the realUser is actually a dummy user created via {{ UserGroupInformation.createRemoteUser()}} and does not have any credentials to create the connection, which is why I guess it was decided to have a loginUgi/actualUgi created in the KMSCP constructor. > KMSClientProvider combined with KeyProviderCache can result in wrong UGI > being used > --- > > Key: HDFS-10757 > URL: https://issues.apache.org/jira/browse/HDFS-10757 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Sergey Shelukhin >Priority: Critical > > ClientContext::get gets the context from CACHE via a config setting based > name, then KeyProviderCache stored in ClientContext gets the key provider > cached by URI from the configuration, too. These would return the same > KeyProvider regardless of current UGI. > KMSClientProvider caches the UGI (actualUgi) in ctor; that means in > particular that all the users of DFS with KMSClientProvider in a process will > get the KMS token (along with other credentials) of the first user, via the > above cache. > Either KMSClientProvider shouldn't store the UGI, or one of the caches should > be UGI-aware, like the FS object cache. > Side note: the comment in createConnection that purports to handle the > different UGI doesn't seem to cover what it says it covers. In our case, we > have two unrelated UGIs with no auth (createRemoteUser) with bunch of tokens, > including a KMS token, added. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-10757) KMSClientProvider combined with KeyProviderCache can result in wrong UGI being used
[ https://issues.apache.org/jira/browse/HDFS-10757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419234#comment-15419234 ] Arun Suresh edited comment on HDFS-10757 at 8/12/16 6:09 PM: - Hmmm... I think I remember the context for why it was implemented as such. bq. If the currentUgi is a proxy user it will have a real UGI. currentUgi.getRealUser() should give us the actual ugi. That is true, but the KMSCP was being implemented around the same time as HADOOP-10835. That JIRA was meant to plumb proxy user through HTTP. If you look at this [snippet|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/web/DelegationTokenAuthenticationFilter.java#L247-L267] of code, you will notice that if the currentUser is authenticated via a delegation token, the realUser is actually a dummy user created via {{UserGroupInformation.createRemoteUser()}} and does not have any credentials to create the connection, which is why I guess it was decided to have a loginUgi/actualUgi created in the KMSCP constructor. was (Author: asuresh): Hmmm... I think I remember the context for why it was implemented as such. bq. If the currentUgi is a proxy user it will have a real UGI. currentUgi.getRealUser() should give us the actual ugi. That is true, but the KMSCP was being implemented around the same time as HADOOP-10835. That JIRA was meant to plumb proxy user through HTTP. If you look at this [snippet|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/web/DelegationTokenAuthenticationFilter.java#L247-L267] of code, you will notice that if the currentUser is authenticated via a delegation token, the realUser is actually a dummy user created via {{ UserGroupInformation.createRemoteUser()}} and does not have any credentials to create the connection, which is why I guess it was decided to have a loginUgi/actualUgi created in the KMSCP constructor. > KMSClientProvider combined with KeyProviderCache can result in wrong UGI > being used > --- > > Key: HDFS-10757 > URL: https://issues.apache.org/jira/browse/HDFS-10757 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Sergey Shelukhin >Priority: Critical > > ClientContext::get gets the context from CACHE via a config setting based > name, then KeyProviderCache stored in ClientContext gets the key provider > cached by URI from the configuration, too. These would return the same > KeyProvider regardless of current UGI. > KMSClientProvider caches the UGI (actualUgi) in ctor; that means in > particular that all the users of DFS with KMSClientProvider in a process will > get the KMS token (along with other credentials) of the first user, via the > above cache. > Either KMSClientProvider shouldn't store the UGI, or one of the caches should > be UGI-aware, like the FS object cache. > Side note: the comment in createConnection that purports to handle the > different UGI doesn't seem to cover what it says it covers. In our case, we > have two unrelated UGIs with no auth (createRemoteUser) with bunch of tokens, > including a KMS token, added. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
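For illustration, a rough sketch of the UGI resolution being debated above; the helper name is made up and this is not the existing KMSClientProvider code. It picks the credential-bearing UGI: the real user behind a proxy UGI when one exists, otherwise the current UGI itself.
{code}
import java.io.IOException;
import org.apache.hadoop.security.UserGroupInformation;

final class UgiResolution {
  // Hypothetical helper: choose the UGI whose credentials should back the
  // KMS connection instead of caching whatever UGI constructed the provider.
  static UserGroupInformation resolveActualUgi() throws IOException {
    UserGroupInformation current = UserGroupInformation.getCurrentUser();
    UserGroupInformation real = current.getRealUser();
    // For a proxy user the real user holds the login credentials; as noted
    // above, a token-authenticated remote user may still lack them, which is
    // the case the constructor-cached loginUgi/actualUgi was meant to cover.
    return (real != null) ? real : current;
  }
}
{code}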
[jira] [Commented] (HDFS-10746) libhdfs++: synchronize access to working_directory and bytes_read_.
[ https://issues.apache.org/jira/browse/HDFS-10746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419201#comment-15419201 ] Hadoop QA commented on HDFS-10746: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 12m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 27s{color} | {color:green} HDFS-8707 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 10s{color} | {color:green} HDFS-8707 passed with JDK v1.8.0_101 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 35s{color} | {color:green} HDFS-8707 passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 14s{color} | {color:green} HDFS-8707 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} HDFS-8707 passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 36s{color} | {color:green} the patch passed with JDK v1.8.0_101 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 5m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 44s{color} | {color:green} the patch passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 5m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 36s{color} | {color:green} hadoop-hdfs-native-client in the patch passed with JDK v1.7.0_101. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 63m 5s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0cf5e66 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12823502/HDFS-10746.HDFS-8707.001.patch | | JIRA Issue | HDFS-10746 | | Optional Tests | asflicense compile cc mvnsite javac unit | | uname | Linux cef2c55eb629 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | HDFS-8707 / ea932e7 | | Default Java | 1.7.0_101 | | Multi-JDK versions | /usr/lib/jvm/java-8-oracle:1.8.0_101 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_101 | | JDK v1.7.0_101 Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16414/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs-native-client U: hadoop-hdfs-project/hadoop-hdfs-native-client | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16414/console | | Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > libhdfs++: synchronize access to working_directory and bytes_read_. > --- > > Key: HDFS-10746 > URL: https://issues.apache.org/jira/browse/HDFS-10746 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Anatoli Shein >Assignee: A
[jira] [Updated] (HDFS-10760) DataXceiver#run() should not log InvalidToken exception as an error
[ https://issues.apache.org/jira/browse/HDFS-10760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-10760: Status: Patch Available (was: Open) > DataXceiver#run() should not log InvalidToken exception as an error > --- > > Key: HDFS-10760 > URL: https://issues.apache.org/jira/browse/HDFS-10760 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0-alpha1 >Reporter: Pan Yuxuan >Assignee: Pan Yuxuan > Attachments: HADOOP-13492.patch > > > DataXceiver#run() just log InvalidToken exception as an error. > When client has an expired token and just refetch a new token, the DN log > will has an error like below: > {noformat} > 2016-08-11 02:41:09,817 ERROR datanode.DataNode (DataXceiver.java:run(269)) - > XXX:50010:DataXceiver error processing READ_BLOCK operation src: > /10.17.1.5:38844 dst: /10.17.1.5:50010 > org.apache.hadoop.security.token.SecretManager$InvalidToken: Block token with > block_token_identifier (expiryDate=1470850746803, keyId=-2093956963, > userId=hbase, blockPoolId=BP-641703426-10.17.1.2-1468517918886, > blockId=1077120201, access modes=[READ]) is expired. > at > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.checkAccess(BlockTokenSecretManager.java:280) > at > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.checkAccess(BlockTokenSecretManager.java:301) > at > org.apache.hadoop.hdfs.security.token.block.BlockPoolTokenSecretManager.checkAccess(BlockPoolTokenSecretManager.java:97) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAccess(DataXceiver.java:1236) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:481) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:242) > at java.lang.Thread.run(Thread.java:745) > {noformat} > This is not a server error and the DataXceiver#checkAccess() has already > loged the InvalidToken as a warning. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10760) DataXceiver#run() should not log InvalidToken exception as an error
[ https://issues.apache.org/jira/browse/HDFS-10760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-10760: Assignee: Pan Yuxuan > DataXceiver#run() should not log InvalidToken exception as an error > --- > > Key: HDFS-10760 > URL: https://issues.apache.org/jira/browse/HDFS-10760 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0-alpha1 >Reporter: Pan Yuxuan >Assignee: Pan Yuxuan > Attachments: HADOOP-13492.patch > > > DataXceiver#run() just log InvalidToken exception as an error. > When client has an expired token and just refetch a new token, the DN log > will has an error like below: > {noformat} > 2016-08-11 02:41:09,817 ERROR datanode.DataNode (DataXceiver.java:run(269)) - > XXX:50010:DataXceiver error processing READ_BLOCK operation src: > /10.17.1.5:38844 dst: /10.17.1.5:50010 > org.apache.hadoop.security.token.SecretManager$InvalidToken: Block token with > block_token_identifier (expiryDate=1470850746803, keyId=-2093956963, > userId=hbase, blockPoolId=BP-641703426-10.17.1.2-1468517918886, > blockId=1077120201, access modes=[READ]) is expired. > at > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.checkAccess(BlockTokenSecretManager.java:280) > at > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.checkAccess(BlockTokenSecretManager.java:301) > at > org.apache.hadoop.hdfs.security.token.block.BlockPoolTokenSecretManager.checkAccess(BlockPoolTokenSecretManager.java:97) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAccess(DataXceiver.java:1236) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:481) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:242) > at java.lang.Thread.run(Thread.java:745) > {noformat} > This is not a server error and the DataXceiver#checkAccess() has already > loged the InvalidToken as a warning. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Moved] (HDFS-10760) DataXceiver#run() should not log InvalidToken exception as an error
[ https://issues.apache.org/jira/browse/HDFS-10760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer moved HADOOP-13492 to HDFS-10760: -- Affects Version/s: (was: 3.0.0-alpha1) 3.0.0-alpha1 Key: HDFS-10760 (was: HADOOP-13492) Project: Hadoop HDFS (was: Hadoop Common) > DataXceiver#run() should not log InvalidToken exception as an error > --- > > Key: HDFS-10760 > URL: https://issues.apache.org/jira/browse/HDFS-10760 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0-alpha1 >Reporter: Pan Yuxuan > Attachments: HADOOP-13492.patch > > > DataXceiver#run() just log InvalidToken exception as an error. > When client has an expired token and just refetch a new token, the DN log > will has an error like below: > {noformat} > 2016-08-11 02:41:09,817 ERROR datanode.DataNode (DataXceiver.java:run(269)) - > XXX:50010:DataXceiver error processing READ_BLOCK operation src: > /10.17.1.5:38844 dst: /10.17.1.5:50010 > org.apache.hadoop.security.token.SecretManager$InvalidToken: Block token with > block_token_identifier (expiryDate=1470850746803, keyId=-2093956963, > userId=hbase, blockPoolId=BP-641703426-10.17.1.2-1468517918886, > blockId=1077120201, access modes=[READ]) is expired. > at > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.checkAccess(BlockTokenSecretManager.java:280) > at > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.checkAccess(BlockTokenSecretManager.java:301) > at > org.apache.hadoop.hdfs.security.token.block.BlockPoolTokenSecretManager.checkAccess(BlockPoolTokenSecretManager.java:97) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAccess(DataXceiver.java:1236) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:481) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:242) > at java.lang.Thread.run(Thread.java:745) > {noformat} > This is not a server error and the DataXceiver#checkAccess() has already > loged the InvalidToken as a warning. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
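To illustrate the logging change being proposed, a small self-contained sketch rather than the actual DataXceiver code; the helper class, logger wiring and messages are assumptions:
{code}
import org.apache.hadoop.security.token.SecretManager.InvalidToken;
import org.slf4j.Logger;

final class XceiverErrorLogging {
  // Hypothetical helper: pick a log level based on whether the failure is a
  // server-side error or an expected client condition such as an expired
  // block token.
  static void logProcessingFailure(Logger log, String op, Throwable t) {
    if (t instanceof InvalidToken) {
      // checkAccess() already warned about the invalid token; avoid a second
      // ERROR entry with a full stack trace for a normal client retry path.
      log.trace("Invalid block token while processing {}", op, t);
    } else {
      log.error("DataXceiver error processing {} operation", op, t);
    }
  }
}
{code}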
[jira] [Commented] (HDFS-10679) libhdfs++: Implement parallel find with wildcards tool
[ https://issues.apache.org/jira/browse/HDFS-10679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419141#comment-15419141 ] James Clampffer commented on HDFS-10679: Awesome patch! Your benchmarks demonstrate a lot of the reasons this library was built in the first place. Not only is it way faster than the Java client, it's using significant fewer resources. Things like page faults and contexts switches slow down the program but also have significant externalized costs in the form of cache/TLB pollution, extra use of bus bandwidth, and extra IRQ handling that impact everything else running on the system. Not much you can do if you're stuck in a java environment because cpu/memory bound things are never going to win there, but for people writing new code in C/C++ minimizing those costs is a huge win. About the code: 1) {code} void FileSystemImpl::FindShim(const Status &stat, std::shared_ptr> stat_infos, bool directory_has_more, std::string path, const std::string &name, std::shared_ptr recursion_counter, std::shared_ptr lock, std::shared_ptr> dirs, uint32_t position, bool searchPath, const std::function>, bool)> &handler) { {code} Using this many arguments for a large function makes it really hard to distinguish what are local variables and what was passed in. Could you bundle these up into a struct/class that represents the state? That way any time a developer sees some_arg_struct->lock they can infer that it was passed in as an argument. The other benefit that this gives is later on, when you do lambda capture by \[=\] you could just explicitly bind to that struct type for the recursion. This code tends to be very dense so being explicit with capture lists and type names e.g. avoiding "auto" in non-trivial statements can do a lot to improve maintainability. 2) You have a lot of good comments about the control flow. Could you add a few about higher level design? > libhdfs++: Implement parallel find with wildcards tool > -- > > Key: HDFS-10679 > URL: https://issues.apache.org/jira/browse/HDFS-10679 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Anatoli Shein >Assignee: Anatoli Shein > Attachments: HDFS-10679.HDFS-8707.000.patch, > HDFS-10679.HDFS-8707.001.patch, HDFS-10679.HDFS-8707.002.patch, > HDFS-10679.HDFS-8707.003.patch, HDFS-10679.HDFS-8707.004.patch, > HDFS-10679.HDFS-8707.005.patch, HDFS-10679.HDFS-8707.006.patch, > HDFS-10679.HDFS-8707.007.patch, HDFS-10679.HDFS-8707.008.patch, > HDFS-10679.HDFS-8707.009.patch > > > The find tool will issue the GetListing namenode operation on a given > directory, and filter the results using posix globbing library. > If the recursive option is selected, for each returned entry that is a > directory the tool will issue another asynchronous call GetListing and repeat > the result processing in a recursive fashion. > One implementation issue that needs to be addressed is the way how results > are returned back to the user: we can either buffer the results and return > them to the user in bulk, or we can return results continuously as they > arrive. While buffering would be an easier solution, returning results as > they arrive would be more beneficial to the user in terms of performance, > since the result processing can start as soon as the first results arrive > without any delay. In order to do that we need the user to use a loop to > process arriving results, and we need to send a special message back to the > user when the search is over. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9696) Garbage snapshot records lingering forever
[ https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419139#comment-15419139 ] Tsz Wo Nicholas Sze commented on HDFS-9696: --- [~kihwal], the idea is simple and great! Please submit a patch. Thanks! > Garbage snapshot records lingering forever > -- > > Key: HDFS-9696 > URL: https://issues.apache.org/jira/browse/HDFS-9696 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > > We have a cluster where the snapshot feature might have been tested years > ago. When the HDFS does not have any snapshot, but I see filediff records > persisted in its fsimage. Since it has been restarted many times and > checkpointed over 100 times since then, it must haven been persisted and > carried over since then. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-9696) Garbage snapshot records lingering forever
[ https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-9696: - Target Version/s: 2.6.5, 2.7.4 > Garbage snapshot records lingering forever > -- > > Key: HDFS-9696 > URL: https://issues.apache.org/jira/browse/HDFS-9696 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > > We have a cluster where the snapshot feature might have been tested years > ago. When the HDFS does not have any snapshot, but I see filediff records > persisted in its fsimage. Since it has been restarted many times and > checkpointed over 100 times since then, it must haven been persisted and > carried over since then. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9696) Garbage snapshot records lingering forever
[ https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419099#comment-15419099 ] Kihwal Lee commented on HDFS-9696: -- Does something like this make sense? Saving a diff section involves iterating the entire inode map. When there is no snapshot, we can potentially cut down fsimage saving time and reduce java object generation. {code} --- a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java +++ b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java @@ -496,7 +496,10 @@ private void saveInternal(FileOutputStream fout, Step step = new Step(StepType.INODES, filePath); prog.beginStep(Phase.SAVING_CHECKPOINT, step); saveInodes(b); - saveSnapshots(b); + if (context.getSourceNamesystem().getSnapshotManager() + .getNumSnapshots() > 0) { +saveSnapshots(b); + } prog.endStep(Phase.SAVING_CHECKPOINT, step); {code} If no one objects, I will add a test case and submit a patch. > Garbage snapshot records lingering forever > -- > > Key: HDFS-9696 > URL: https://issues.apache.org/jira/browse/HDFS-9696 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kihwal Lee >Priority: Critical > > We have a cluster where the snapshot feature might have been tested years > ago. When the HDFS does not have any snapshot, but I see filediff records > persisted in its fsimage. Since it has been restarted many times and > checkpointed over 100 times since then, it must haven been persisted and > carried over since then. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-9696) Garbage snapshot records lingering forever
[ https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee reassigned HDFS-9696: Assignee: Kihwal Lee > Garbage snapshot records lingering forever > -- > > Key: HDFS-9696 > URL: https://issues.apache.org/jira/browse/HDFS-9696 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > > We have a cluster where the snapshot feature might have been tested years > ago. When the HDFS does not have any snapshot, but I see filediff records > persisted in its fsimage. Since it has been restarted many times and > checkpointed over 100 times since then, it must haven been persisted and > carried over since then. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10746) libhdfs++: synchronize access to working_directory and bytes_read_.
[ https://issues.apache.org/jira/browse/HDFS-10746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anatoli Shein updated HDFS-10746: - Attachment: HDFS-10746.HDFS-8707.001.patch Reattaching for a CI run > libhdfs++: synchronize access to working_directory and bytes_read_. > --- > > Key: HDFS-10746 > URL: https://issues.apache.org/jira/browse/HDFS-10746 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Anatoli Shein >Assignee: Anatoli Shein > Attachments: HDFS-10746.HDFS-8707.000.patch, > HDFS-10746.HDFS-8707.001.patch > > > std::string working_directory is located in hdfs.cc and access to it should > be synchronized with locks. > uint64_t bytes_read_; is located in filehandle.h and it should be made atomic > in order to be thread safe when multithreading becomes available. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9696) Garbage snapshot records lingering forever
[ https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419080#comment-15419080 ] Kihwal Lee commented on HDFS-9696: -- One basic sanity check can be done for cases where there is no snapshot. When saving snapshot diff section, we can call {{getNumSnapshots()}} to check whether there is any snapshot. If none, saving diff section can be skipped. > Garbage snapshot records lingering forever > -- > > Key: HDFS-9696 > URL: https://issues.apache.org/jira/browse/HDFS-9696 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kihwal Lee >Priority: Critical > > We have a cluster where the snapshot feature might have been tested years > ago. When the HDFS does not have any snapshot, but I see filediff records > persisted in its fsimage. Since it has been restarted many times and > checkpointed over 100 times since then, it must haven been persisted and > carried over since then. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10746) libhdfs++: synchronize access to working_directory and bytes_read_.
[ https://issues.apache.org/jira/browse/HDFS-10746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419077#comment-15419077 ] James Clampffer commented on HDFS-10746: Looks good to me, +1. Will commit once CI runs. > libhdfs++: synchronize access to working_directory and bytes_read_. > --- > > Key: HDFS-10746 > URL: https://issues.apache.org/jira/browse/HDFS-10746 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Anatoli Shein >Assignee: Anatoli Shein > Attachments: HDFS-10746.HDFS-8707.000.patch > > > std::string working_directory is located in hdfs.cc and access to it should > be synchronized with locks. > uint64_t bytes_read_; is located in filehandle.h and it should be made atomic > in order to be thread safe when multithreading becomes available. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10740) libhdfs++: Implement recursive directory generator
[ https://issues.apache.org/jira/browse/HDFS-10740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Clampffer updated HDFS-10740: --- Resolution: Fixed Status: Resolved (was: Patch Available) > libhdfs++: Implement recursive directory generator > -- > > Key: HDFS-10740 > URL: https://issues.apache.org/jira/browse/HDFS-10740 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Anatoli Shein >Assignee: Anatoli Shein > Attachments: HDFS-10740.HDFS-8707.000.patch, > HDFS-10740.HDFS-8707.001.patch, HDFS-10740.HDFS-8707.002.patch > > > This tool will allow us to do benchmarking/testing of our find functionality, and > will be a good example showing how to call a large number of namenode > operations recursively. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10740) libhdfs++: Implement recursive directory generator
[ https://issues.apache.org/jira/browse/HDFS-10740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419033#comment-15419033 ] James Clampffer commented on HDFS-10740: Committed to HDFS-8707. Thanks [~anatoli.shein]! > libhdfs++: Implement recursive directory generator > -- > > Key: HDFS-10740 > URL: https://issues.apache.org/jira/browse/HDFS-10740 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Anatoli Shein >Assignee: Anatoli Shein > Attachments: HDFS-10740.HDFS-8707.000.patch, > HDFS-10740.HDFS-8707.001.patch, HDFS-10740.HDFS-8707.002.patch > > > This tool will allow us to do benchmarking/testing of our find functionality, and > will be a good example showing how to call a large number of namenode > operations recursively. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9696) Garbage snapshot records lingering forever
[ https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419007#comment-15419007 ] Yongjun Zhang commented on HDFS-9696: - Thanks for the info [~kihwal]! > Garbage snapshot records lingering forever > -- > > Key: HDFS-9696 > URL: https://issues.apache.org/jira/browse/HDFS-9696 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kihwal Lee >Priority: Critical > > We have a cluster where the snapshot feature might have been tested years > ago. When the HDFS does not have any snapshot, but I see filediff records > persisted in its fsimage. Since it has been restarted many times and > checkpointed over 100 times since then, it must haven been persisted and > carried over since then. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-10759) Change fsimage bool isStriped from boolean to an enum
Ewan Higgs created HDFS-10759: - Summary: Change fsimage bool isStriped from boolean to an enum Key: HDFS-10759 URL: https://issues.apache.org/jira/browse/HDFS-10759 Project: Hadoop HDFS Issue Type: Bug Components: hdfs Affects Versions: 3.0.0-alpha1, 3.0.0-beta1, 3.0.0-alpha2 Reporter: Ewan Higgs The new erasure coding project has updated the protocol for fsimage such that the {{INodeFile}} has a boolean '{{isStriped}}'. I think this is better as an enum or integer since a boolean precludes any future block types. For example: {code} enum BlockType { CONTIGUOUS = 0, STRIPED = 1, } {code} We can also make this more robust to future changes where there are different block types supported in a staged rollout. Here, we would use {{UNKNOWN_BLOCK_TYPE}} as the first value since this is the default value. See [here|http://androiddevblog.com/protocol-buffers-pitfall-adding-enum-values/] for more discussion. {code} enum BlockType { UNKNOWN_BLOCK_TYPE = 0, CONTIGUOUS = 1, STRIPED = 2, } {code} But I'm not convinced this is necessary since there are other enums that don't use this approach. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
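As a small illustration of why an enum is easier to evolve than a boolean on the read path; {{BlockType}} here mirrors the proposed proto enum and is not an existing Hadoop class:
{code}
// Mirrors the proposed proto enum; illustration only.
enum BlockType { CONTIGUOUS, STRIPED }

final class BlockTypeExample {
  static boolean usesStripedLayout(BlockType type) {
    switch (type) {
      case CONTIGUOUS:
        return false;
      case STRIPED:
        return true;
      default:
        // A boolean field silently collapses any future layout into one of
        // two cases; an enum lets older readers fail loudly (or map to an
        // UNKNOWN value) when a newer fsimage introduces another block type.
        throw new IllegalArgumentException("Unhandled block type: " + type);
    }
  }
}
{code}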
[jira] [Commented] (HDFS-10755) TestDecommissioningStatus BindException Failure
[ https://issues.apache.org/jira/browse/HDFS-10755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15418985#comment-15418985 ] Hadoop QA commented on HDFS-10755: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 10s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 76m 41s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 99m 59s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12823464/HDFS-10755.002.patch | | JIRA Issue | HDFS-10755 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 5e3df241e573 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 9019606 | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/16413/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16413/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16413/console | | Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > TestDecommissioningStatus BindException Failure > --- > > Key: HDFS-10755 > URL: https://issues.apache.org/jira/browse/HDFS-10755 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Eric Badger >Assignee: Eric Badger > Attachments: HDFS-10755.001.pa
[jira] [Commented] (HDFS-10758) ReconfigurableBase can log sensitive information
[ https://issues.apache.org/jira/browse/HDFS-10758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15418982#comment-15418982 ] Sean Busbey commented on HDFS-10758: bq. I think a generic mechanism for redacting sensitive information for textual display will be useful to some of the web UIs too. Should this be in the Hadoop Common tracker so that the solution can be leveraged by both HDFS and YARN? > ReconfigurableBase can log sensitive information > > > Key: HDFS-10758 > URL: https://issues.apache.org/jira/browse/HDFS-10758 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Sean Mackrory >Assignee: Sean Mackrory > > ReconfigurableBase will log old and new configuration values, which may cause > sensitive parameters (most notably cloud storage keys, though there may be > other instances) to get included in the logs. > Given the currently small list of reconfigurable properties, an argument > could be made for simply not logging the property values at all, but this is > not the only instance where potentially sensitive configuration gets written > somewhere else in plaintext. I think a generic mechanism for redacting > sensitive information for textual display will be useful to some of the web > UIs too. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-10758) ReconfigurableBase can log sensitive information
Sean Mackrory created HDFS-10758: Summary: ReconfigurableBase can log sensitive information Key: HDFS-10758 URL: https://issues.apache.org/jira/browse/HDFS-10758 Project: Hadoop HDFS Issue Type: Bug Reporter: Sean Mackrory Assignee: Sean Mackrory ReconfigurableBase will log old and new configuration values, which may cause sensitive parameters (most notably cloud storage keys, though there may be other instances) to get included in the logs. Given the currently small list of reconfigurable properties, an argument could be made for simply not logging the property values at all, but this is not the only instance where potentially sensitive configuration gets written somewhere else in plaintext. I think a generic mechanism for redacting sensitive information for textual display will be useful to some of the web UIs too. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
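For illustration, a minimal sketch of the generic redaction idea, under the assumption that sensitive keys can be recognized by a list of name markers; the class name and marker list are made up for the example:
{code}
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper; not an existing Hadoop class.
final class SensitiveConfigRedactor {
  private static final List<String> SENSITIVE_MARKERS =
      Arrays.asList("password", "secret", "token", "access.key");

  // Return the value unchanged unless the property name looks sensitive,
  // in which case substitute a fixed placeholder before logging/display.
  static String redactIfSensitive(String propertyName, String value) {
    String lower = propertyName.toLowerCase(Locale.ROOT);
    for (String marker : SENSITIVE_MARKERS) {
      if (lower.contains(marker)) {
        return "<redacted>";
      }
    }
    return value;
  }
}
{code}
With something along these lines, ReconfigurableBase (and the web UIs) could log {{redactIfSensitive(property, newVal)}} rather than the raw value.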
[jira] [Commented] (HDFS-9696) Garbage snapshot records lingering forever
[ https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15418942#comment-15418942 ] Kihwal Lee commented on HDFS-9696: -- It turns out that HDFS-9406 is not related to this issue. The garbage snapshot filediffs with snapshotId=-1 were being generated by a bug fixed in HDFS-7056 by [~zero45]. {code} /** Is this inode in the latest snapshot? */ public final boolean isInLatestSnapshot(final int latestSnapshotId) { -if (latestSnapshotId == Snapshot.CURRENT_STATE_ID) { +if (latestSnapshotId == Snapshot.CURRENT_STATE_ID || +latestSnapshotId == Snapshot.NO_SNAPSHOT_ID) { return false; } {code} [~shv] explained, {quote} (7) Plamen says this is because Snapshot.findLatestSnapshot() may return NO_SNAPSHOT_ID, which breaks recordModification() if you don't have that additional check. We see it when commitBlockSynchronization() is called for truncated block. {quote} We have actually traced the generation of these filediff entries to {{commitBlockSynchronization()}} activities when the NN was running 2.5. This stops in 2.7 thanks to HDFS-7056. However, the garbage lives on until those files are deleted. Can we have a sanity check during snapshot diff loading so that these entries can be discarded? > Garbage snapshot records lingering forever > -- > > Key: HDFS-9696 > URL: https://issues.apache.org/jira/browse/HDFS-9696 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kihwal Lee >Priority: Critical > > We have a cluster where the snapshot feature might have been tested years > ago. When the HDFS does not have any snapshot, but I see filediff records > persisted in its fsimage. Since it has been restarted many times and > checkpointed over 100 times since then, it must haven been persisted and > carried over since then. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
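For illustration, a sketch of the kind of load-time sanity check asked about above; the record shape and constant below are simplified stand-ins (the constant corresponds to the snapshotId=-1 garbage records described), not the actual fsimage loader code:
{code}
import java.util.Iterator;
import java.util.List;

final class SnapshotDiffLoadCheck {
  // Stand-in for Snapshot.NO_SNAPSHOT_ID, the -1 id seen on the garbage records.
  static final int NO_SNAPSHOT_ID = -1;

  // Minimal stand-in for a loaded file diff record.
  static final class FileDiffRecord {
    final int snapshotId;
    FileDiffRecord(int snapshotId) { this.snapshotId = snapshotId; }
  }

  // Drop diffs that do not belong to any snapshot instead of re-persisting
  // them at every checkpoint.
  static void discardGarbageDiffs(List<FileDiffRecord> loadedDiffs) {
    for (Iterator<FileDiffRecord> it = loadedDiffs.iterator(); it.hasNext();) {
      if (it.next().snapshotId == NO_SNAPSHOT_ID) {
        it.remove(); // garbage left behind by the pre-HDFS-7056 bug
      }
    }
  }
}
{code}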
[jira] [Reopened] (HDFS-9696) Garbage snapshot records lingering forever
[ https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee reopened HDFS-9696: -- Assignee: (was: Yongjun Zhang) > Garbage snapshot records lingering forever > -- > > Key: HDFS-9696 > URL: https://issues.apache.org/jira/browse/HDFS-9696 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.2 >Reporter: Kihwal Lee >Priority: Critical > > We have a cluster where the snapshot feature might have been tested years > ago. When the HDFS does not have any snapshot, but I see filediff records > persisted in its fsimage. Since it has been restarted many times and > checkpointed over 100 times since then, it must haven been persisted and > carried over since then. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10755) TestDecommissioningStatus BindException Failure
[ https://issues.apache.org/jira/browse/HDFS-10755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated HDFS-10755: --- Attachment: HDFS-10755.002.patch Attaching patch to address the checkstyle comments. Both of the test failures seem unrelated and they did not fail locally when I ran them with this patch. > TestDecommissioningStatus BindException Failure > --- > > Key: HDFS-10755 > URL: https://issues.apache.org/jira/browse/HDFS-10755 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Eric Badger >Assignee: Eric Badger > Attachments: HDFS-10755.001.patch, HDFS-10755.002.patch > > > Tests in TestDecomissioningStatus call MiniDFSCluster.dataNodeRestart(). They > are required to come back up on the same (initially ephemeral) port that they > were on before being shutdown. Because of this, there is an inherent race > condition where another process could bind to the port while the datanode is > down. If this happens then we get a BindException failure. However, all of > the tests in TestDecommissioningStatus depend on the cluster being up and > running for them to run correctly. So if a test blows up the cluster, the > subsequent tests will also fail. Below I show the BindException failure as > well as the subsequent test failure that occurred. > {noformat} > java.net.BindException: Problem binding to [localhost:35370] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:436) > at sun.nio.ch.Net.bind(Net.java:428) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at org.apache.hadoop.ipc.Server.bind(Server.java:430) > at org.apache.hadoop.ipc.Server$Listener.(Server.java:768) > at org.apache.hadoop.ipc.Server.(Server.java:2391) > at org.apache.hadoop.ipc.RPC$Server.(RPC.java:951) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server.(ProtobufRpcEngine.java:523) > at > org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:498) > at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:796) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.initIpcServer(DataNode.java:802) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1134) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.(DataNode.java:429) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2387) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2274) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2321) > at > org.apache.hadoop.hdfs.MiniDFSCluster.restartDataNode(MiniDFSCluster.java:2037) > at > org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.testDecommissionDeadDN(TestDecommissioningStatus.java:426) > {noformat} > {noformat} > java.lang.AssertionError: Number of Datanodes expected:<2> but was:<1> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at > org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.testDecommissionStatus(TestDecommissioningStatus.java:275) > {noformat} > I don't think there's any way to avoid the inherent race condition with > getting the same ephemeral port, but we can definitely fix the tests so that > 
it doesn't cause subsequent tests to fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10747) o.a.h.hdfs.tools.DebugAdmin usage message is misleading
[ https://issues.apache.org/jira/browse/HDFS-10747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15418830#comment-15418830 ] Wei-Chiu Chuang commented on HDFS-10747: Yes the usage is misleading. Thanks [~liuml07] for bringing this up. I think the patch looks good to me. > o.a.h.hdfs.tools.DebugAdmin usage message is misleading > --- > > Key: HDFS-10747 > URL: https://issues.apache.org/jira/browse/HDFS-10747 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 2.7.0 >Reporter: Mingliang Liu >Assignee: Mingliang Liu >Priority: Minor > Attachments: HDFS-10747.000.patch > > > [HDFS-6917] added a helpful hdfs debug command to validate blocks and call > recoverlease. The usage doc is kinda misleading, as following: > {code} > $ hdfs debug verify > creating a new configuration > verify [-meta <metadata-file>] [-block <block-file>] > Verify HDFS metadata and block files. If a block file is specified, we > will verify that the checksums in the metadata file match the block > file. > {code} > Actually the {{-meta <metadata-file>}} is necessary. {{[]}} is for optional > arguments, if we follow the > [convention|http://pubs.opengroup.org/onlinepubs/9699919799]. > {code} > $ hdfs debug recoverLease > creating a new configuration > recoverLease [-path <path>] [-retries <num-retries>] > Recover the lease on the specified path. The path must reside on an > HDFS filesystem. The default number of retries is 1. > {code} > {{-path <path>}} is also the same case. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
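Following the bracket convention from the linked spec, the two usage lines would presumably read as below (the exact wording in the attached patch may differ):
{noformat}
verify -meta <metadata-file> [-block <block-file>]
recoverLease -path <path> [-retries <num-retries>]
{noformat}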
[jira] [Commented] (HDFS-10731) FSDirectory#verifyMaxDirItems does not log path name
[ https://issues.apache.org/jira/browse/HDFS-10731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15418813#comment-15418813 ] Hudson commented on HDFS-10731: --- SUCCESS: Integrated in Hadoop-trunk-Commit #10268 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/10268/]) HDFS-10731. FSDirectory#verifyMaxDirItems does not log path name. (weichiu: rev 9019606b69bfb7019c8642b6cbcbb93645cc19e3) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/FSLimitException.java > FSDirectory#verifyMaxDirItems does not log path name > > > Key: HDFS-10731 > URL: https://issues.apache.org/jira/browse/HDFS-10731 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.2 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Minor > Labels: supportability > Fix For: 2.8.0 > > Attachments: HDFS-10731.001.patch > > > {quote} > 2016-08-05 14:42:04,687 ERROR > org.apache.hadoop.hdfs.server.namenode.NameNode: > FSDirectory.verifyMaxDirItems: The directory item limit of null is exceeded: > limit=1048576 items=1048576 > {quote} > The error message above logs the path name incorrectly (null). Without the > path name it is hard to tell which directory is in trouble. The exception > should set the path name before being logged. > This bug was seen on a CDH 5.5.2 cluster, but CDH5.5.2 is roughly up to date > with Apache Hadoop 2.7.2. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
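A standalone sketch of the gist of the fix (hypothetical class, not the committed patch): include the offending path in the message before the exception is thrown and logged, so the log names the directory instead of printing {{null}}.
{code}
import java.io.IOException;

class DirItemLimitCheckSketch {
  // Carry the parent path in the message so the operator can tell which
  // directory hit the limit.
  static void verifyMaxDirItems(String parentPath, int numItems, int maxItems)
      throws IOException {
    if (numItems >= maxItems) {
      throw new IOException("The directory item limit of " + parentPath
          + " is exceeded: limit=" + maxItems + " items=" + numItems);
    }
  }
}
{code}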
[jira] [Updated] (HDFS-10731) FSDirectory#verifyMaxDirItems does not log path name
[ https://issues.apache.org/jira/browse/HDFS-10731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-10731: --- Resolution: Fixed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) Committed to trunk, branch-2 and branch-2.8 Thanks [~xiaochen] for reviewing the patch! > FSDirectory#verifyMaxDirItems does not log path name > > > Key: HDFS-10731 > URL: https://issues.apache.org/jira/browse/HDFS-10731 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.2 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Minor > Labels: supportability > Fix For: 2.8.0 > > Attachments: HDFS-10731.001.patch > > > {quote} > 2016-08-05 14:42:04,687 ERROR > org.apache.hadoop.hdfs.server.namenode.NameNode: > FSDirectory.verifyMaxDirItems: The directory item limit of null is exceeded: > limit=1048576 items=1048576 > {quote} > The error message above logs the path name incorrectly (null). Without the > path name it is hard to tell which directory is in trouble. The exception > should set the path name before being logged. > This bug was seen on a CDH 5.5.2 cluster, but CDH5.5.2 is roughly up to date > with Apache Hadoop 2.7.2. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-10754) libhdfs++: Create tools directory and implement hdfs_cat
[ https://issues.apache.org/jira/browse/HDFS-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anatoli Shein reassigned HDFS-10754: Assignee: Anatoli Shein > libhdfs++: Create tools directory and implement hdfs_cat > > > Key: HDFS-10754 > URL: https://issues.apache.org/jira/browse/HDFS-10754 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Anatoli Shein >Assignee: Anatoli Shein > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10609) Uncaught InvalidEncryptionKeyException during pipeline recovery may abort downstream applications
[ https://issues.apache.org/jira/browse/HDFS-10609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15418759#comment-15418759 ] Wei-Chiu Chuang commented on HDFS-10609: [~xiaochen] would you mind to take a look at this patch? Thx! > Uncaught InvalidEncryptionKeyException during pipeline recovery may abort > downstream applications > - > > Key: HDFS-10609 > URL: https://issues.apache.org/jira/browse/HDFS-10609 > Project: Hadoop HDFS > Issue Type: Bug > Components: encryption >Affects Versions: 2.6.0 > Environment: CDH5.8.0 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > Attachments: HDFS-10609.001.patch, HDFS-10609.002.patch > > > In normal operations, if SASL negotiation fails due to > {{InvalidEncryptionKeyException}}, it is typically a benign exception, which > is caught and retried : > {code:title=SaslDataTransferServer#doSaslHandshake} > if (ioe instanceof SaslException && > ioe.getCause() != null && > ioe.getCause() instanceof InvalidEncryptionKeyException) { > // This could just be because the client is long-lived and hasn't gotten > // a new encryption key from the NN in a while. Upon receiving this > // error, the client will get a new encryption key from the NN and retry > // connecting to this DN. > sendInvalidKeySaslErrorMessage(out, ioe.getCause().getMessage()); > } > {code} > {code:title=DFSOutputStream.DataStreamer#createBlockOutputStream} > if (ie instanceof InvalidEncryptionKeyException && refetchEncryptionKey > 0) { > DFSClient.LOG.info("Will fetch a new encryption key and retry, " > + "encryption key was invalid when connecting to " > + nodes[0] + " : " + ie); > {code} > However, if the exception is thrown during pipeline recovery, the > corresponding code does not handle it properly, and the exception is spilled > out to downstream applications, such as SOLR, aborting its operation: > {quote} > 2016-07-06 12:12:51,992 ERROR org.apache.solr.update.HdfsTransactionLog: > Exception closing tlog. > org.apache.hadoop.hdfs.protocol.datatransfer.InvalidEncryptionKeyException: > Can't re-compute encryption key for nonce, since the required block key > (keyID=557709482) doesn't exist. 
Current key: 1350592619 > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.readSaslMessageAndNegotiatedCipherOption(DataTransferSaslUtil.java:417) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:474) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:299) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:242) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:211) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.socketSend(SaslDataTransferClient.java:183) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.transfer(DFSOutputStream.java:1308) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1272) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1433) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1147) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:632) > 2016-07-06 12:12:51,997 ERROR org.apache.solr.update.CommitTracker: auto > commit error...:org.apache.solr.common.SolrException: > org.apache.hadoop.hdfs.protocol.datatransfer.InvalidEncryptionKeyException: > Can't re-compute encryption key for nonce, since the required block key > (keyID=557709482) doesn't exist. Current key: 1350592619 > at > org.apache.solr.update.HdfsTransactionLog.close(HdfsTransactionLog.java:316) > at > org.apache.solr.update.TransactionLog.decref(TransactionLog.java:505) > at org.apache.solr.update.UpdateLog.addOldLog(UpdateLog.java:380) > at org.apache.solr.update.UpdateLog.postCommit(UpdateLog.java:676) > at > org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:623) > at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurr
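A rough, hypothetical sketch of the direction of the fix (names are illustrative, not the actual DFSOutputStream code): give the pipeline-recovery path the same refetch-and-retry treatment that {{createBlockOutputStream}} already has, so the exception is retried instead of escaping to applications such as SOLR.
{code}
import java.io.IOException;

import org.apache.hadoop.hdfs.protocol.datatransfer.InvalidEncryptionKeyException;

class PipelineRecoveryRetrySketch {
  interface TransferAttempt {
    void run() throws IOException;
  }

  // Retry the transfer once after refreshing the cached data encryption key,
  // mirroring the handling already present in createBlockOutputStream.
  static void transferWithKeyRefresh(TransferAttempt attempt, Runnable clearCachedKey)
      throws IOException {
    int refetchEncryptionKey = 1;
    while (true) {
      try {
        attempt.run();
        return;
      } catch (InvalidEncryptionKeyException e) {
        if (refetchEncryptionKey-- <= 0) {
          throw e; // already retried with a fresh key; give up
        }
        clearCachedKey.run(); // force fetching a new encryption key from the NN
      }
    }
  }
}
{code}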
[jira] [Updated] (HDFS-8312) Trash does not descent into child directories to check for permissions
[ https://issues.apache.org/jira/browse/HDFS-8312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-8312: -- Priority: Critical (was: Major) > Trash does not descent into child directories to check for permissions > -- > > Key: HDFS-8312 > URL: https://issues.apache.org/jira/browse/HDFS-8312 > Project: Hadoop HDFS > Issue Type: Bug > Components: fs, security >Affects Versions: 2.2.0, 2.6.0, 2.7.2 >Reporter: Eric Yang >Assignee: Weiwei Yang >Priority: Critical > Attachments: HDFS-8312-testcase.patch > > > HDFS trash does not descent into child directory to check if user has > permission to delete files. For example: > Run the following command to initialize directory structure as super user: > {code} > hadoop fs -mkdir /BSS/level1 > hadoop fs -mkdir /BSS/level1/level2 > hadoop fs -mkdir /BSS/level1/level2/level3 > hadoop fs -put /tmp/appConfig.json /BSS/level1/level2/level3/testfile.txt > hadoop fs -chown user1:users /BSS/level1/level2/level3/testfile.txt > hadoop fs -chown -R user1:users /BSS/level1 > hadoop fs -chown -R 750 /BSS/level1 > hadoop fs -chmod -R 640 /BSS/level1/level2/level3/testfile.txt > hadoop fs -chmod 775 /BSS > {code} > Change to a normal user called user2. > When trash is enabled: > {code} > sudo su user2 - > hadoop fs -rm -r /BSS/level1 > 15/05/01 16:51:20 INFO fs.TrashPolicyDefault: Namenode trash configuration: > Deletion interval = 3600 minutes, Emptier interval = 0 minutes. > Moved: 'hdfs://bdvs323.svl.ibm.com:9000/BSS/level1' to trash at: > hdfs://bdvs323.svl.ibm.com:9000/user/user2/.Trash/Current > {code} > When trash is disabled: > {code} > /opt/ibm/biginsights/IHC/bin/hadoop fs -Dfs.trash.interval=0 -rm -r > /BSS/level1 > 15/05/01 16:58:31 INFO fs.TrashPolicyDefault: Namenode trash configuration: > Deletion interval = 0 minutes, Emptier interval = 0 minutes. > rm: Permission denied: user=user2, access=ALL, > inode="/BSS/level1":user1:users:drwxr-x--- > {code} > There is inconsistency between trash behavior and delete behavior. When > trash is enabled, files owned by user1 is deleted by user2. It looks like > trash does not recursively validate if the child directory files can be > removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10757) KMSClientProvider combined with KeyProviderCache can result in wrong UGI being used
[ https://issues.apache.org/jira/browse/HDFS-10757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15418409#comment-15418409 ] Jitendra Nath Pandey commented on HDFS-10757: - I think storing the {{actualUgi}} in KMSClientProvider is incorrect because the providers are cached for a long time, and the currentUGI may be completely different from the actualUGI. Therefore, it may be a good idea to consider removing actualUgi from KMSClientProvider. I am inclined to say that setting up the UGI should be done by the client code using the FileSystem. On every call, KMSClientProvider should only check the following: if the currentUGI has a realUgi, use the realUgi as the actualUgi; otherwise use the currentUgi as the actualUgi. I may not have the whole context on why actualUgi was added in the constructor of KMSClientProvider, but would like to understand. > KMSClientProvider combined with KeyProviderCache can result in wrong UGI > being used > --- > > Key: HDFS-10757 > URL: https://issues.apache.org/jira/browse/HDFS-10757 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Sergey Shelukhin >Priority: Critical > > ClientContext::get gets the context from CACHE via a config setting based > name, then KeyProviderCache stored in ClientContext gets the key provider > cached by URI from the configuration, too. These would return the same > KeyProvider regardless of current UGI. > KMSClientProvider caches the UGI (actualUgi) in ctor; that means in > particular that all the users of DFS with KMSClientProvider in a process will > get the KMS token (along with other credentials) of the first user, via the > above cache. > Either KMSClientProvider shouldn't store the UGI, or one of the caches should > be UGI-aware, like the FS object cache. > Side note: the comment in createConnection that purports to handle the > different UGI doesn't seem to cover what it says it covers. In our case, we > have two unrelated UGIs with no auth (createRemoteUser) with bunch of tokens, > including a KMS token, added. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
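A minimal sketch of the per-call check suggested above (illustrative only, not a patch): resolve the effective UGI on every request instead of caching one in the constructor.
{code}
import java.io.IOException;

import org.apache.hadoop.security.UserGroupInformation;

class UgiResolutionSketch {
  // If the current UGI is a proxy user it carries a real user; use that as
  // the actual UGI, otherwise fall back to the current UGI itself.
  static UserGroupInformation resolveActualUgi() throws IOException {
    UserGroupInformation current = UserGroupInformation.getCurrentUser();
    UserGroupInformation real = current.getRealUser();
    return (real != null) ? real : current;
  }
}
{code}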
[jira] [Updated] (HDFS-8668) Erasure Coding: revisit buffer used for encoding and decoding.
[ https://issues.apache.org/jira/browse/HDFS-8668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SammiChen updated HDFS-8668: Attachment: HDFS-8668-v9.patch > Erasure Coding: revisit buffer used for encoding and decoding. > -- > > Key: HDFS-8668 > URL: https://issues.apache.org/jira/browse/HDFS-8668 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Yi Liu >Assignee: SammiChen > Attachments: HDFS-8668-v1.patch, HDFS-8668-v2.patch, > HDFS-8668-v3.patch, HDFS-8668-v4.patch, HDFS-8668-v5.patch, > HDFS-8668-v6.patch, HDFS-8668-v7.patch, HDFS-8668-v8.patch, HDFS-8668-v9.patch > > > For encoding and decoding buffers, currently some places use java heap > ByteBuffer, some use direct byteBUffer, and some use java byte array. If > the coder implementation is native, we should use direct ByteBuffer. This > jira is to revisit all encoding/decoding buffers and improve them. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
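Not from the attached patch, just a sketch of the buffer-selection idea in the description: a native coder (for example one backed by ISA-L) generally wants direct ByteBuffers to avoid extra copies across the JNI boundary, while a pure-Java coder works fine on heap buffers or byte arrays.
{code}
import java.nio.ByteBuffer;

class CoderBufferSketch {
  // Pick the buffer type based on the coder implementation.
  static ByteBuffer allocate(int size, boolean nativeCoder) {
    return nativeCoder ? ByteBuffer.allocateDirect(size) : ByteBuffer.allocate(size);
  }
}
{code}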
[jira] [Commented] (HDFS-8312) Trash does not descent into child directories to check for permissions
[ https://issues.apache.org/jira/browse/HDFS-8312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15418538#comment-15418538 ] Weiwei Yang commented on HDFS-8312: --- Attached a test case demonstrating this issue, see [^HDFS-8312-testcase.patch]. Also thinking about how to fix it. There are generally two options: 1) Trash currently calls {{FileSystem.rename}} to move the file/dir to the trash dir, so only rename permission is checked. A fix is to add a new method that checks whether delete is permitted, expose it from the FileSystem API, and call it in the trash code before the rename (see the sketch following this message). 2) Improve the {{Emptier}} logic so the emptier runs per user; then even if a user moves somebody else's files to trash, the emptier will still not be able to remove them, because that user is not permitted to. This is better than delete ... I personally prefer option 1, because 2 looks like a partial fix; we should prevent a user from moving things to trash if they are not allowed to delete them in the first place. Any suggestions? Appreciated! > Trash does not descent into child directories to check for permissions > -- > > Key: HDFS-8312 > URL: https://issues.apache.org/jira/browse/HDFS-8312 > Project: Hadoop HDFS > Issue Type: Bug > Components: fs, security >Affects Versions: 2.2.0, 2.6.0, 2.7.2 >Reporter: Eric Yang >Assignee: Weiwei Yang > Attachments: HDFS-8312-testcase.patch > > > HDFS trash does not descent into child directory to check if user has > permission to delete files. For example: > Run the following command to initialize directory structure as super user: > {code} > hadoop fs -mkdir /BSS/level1 > hadoop fs -mkdir /BSS/level1/level2 > hadoop fs -mkdir /BSS/level1/level2/level3 > hadoop fs -put /tmp/appConfig.json /BSS/level1/level2/level3/testfile.txt > hadoop fs -chown user1:users /BSS/level1/level2/level3/testfile.txt > hadoop fs -chown -R user1:users /BSS/level1 > hadoop fs -chown -R 750 /BSS/level1 > hadoop fs -chmod -R 640 /BSS/level1/level2/level3/testfile.txt > hadoop fs -chmod 775 /BSS > {code} > Change to a normal user called user2. > When trash is enabled: > {code} > sudo su user2 - > hadoop fs -rm -r /BSS/level1 > 15/05/01 16:51:20 INFO fs.TrashPolicyDefault: Namenode trash configuration: > Deletion interval = 3600 minutes, Emptier interval = 0 minutes. > Moved: 'hdfs://bdvs323.svl.ibm.com:9000/BSS/level1' to trash at: > hdfs://bdvs323.svl.ibm.com:9000/user/user2/.Trash/Current > {code} > When trash is disabled: > {code} > /opt/ibm/biginsights/IHC/bin/hadoop fs -Dfs.trash.interval=0 -rm -r > /BSS/level1 > 15/05/01 16:58:31 INFO fs.TrashPolicyDefault: Namenode trash configuration: > Deletion interval = 0 minutes, Emptier interval = 0 minutes. > rm: Permission denied: user=user2, access=ALL, > inode="/BSS/level1":user1:users:drwxr-x--- > {code} > There is inconsistency between trash behavior and delete behavior. When > trash is enabled, files owned by user1 is deleted by user2. It looks like > trash does not recursively validate if the child directory files can be > removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
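A rough sketch of option 1 (hypothetical helper, not a patch): before renaming a path into trash, walk the subtree and ask the NameNode whether the caller could actually delete its contents, reusing the existing {{FileSystem#access}} check.
{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;

class TrashDeleteCheckSketch {
  // Removing a directory's children requires write access on that directory,
  // so check WRITE on every directory in the subtree and recurse.
  static void checkCanDelete(FileSystem fs, Path path) throws IOException {
    FileStatus status = fs.getFileStatus(path);
    if (status.isDirectory()) {
      fs.access(path, FsAction.WRITE); // throws AccessControlException if denied
      for (FileStatus child : fs.listStatus(path)) {
        checkCanDelete(fs, child.getPath());
      }
    }
  }
}
{code}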
[jira] [Updated] (HDFS-8312) Trash does not descent into child directories to check for permissions
[ https://issues.apache.org/jira/browse/HDFS-8312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-8312: -- Attachment: HDFS-8312-testcase.patch > Trash does not descent into child directories to check for permissions > -- > > Key: HDFS-8312 > URL: https://issues.apache.org/jira/browse/HDFS-8312 > Project: Hadoop HDFS > Issue Type: Bug > Components: fs, security >Affects Versions: 2.2.0, 2.6.0, 2.7.2 >Reporter: Eric Yang >Assignee: Weiwei Yang > Attachments: HDFS-8312-testcase.patch > > > HDFS trash does not descent into child directory to check if user has > permission to delete files. For example: > Run the following command to initialize directory structure as super user: > {code} > hadoop fs -mkdir /BSS/level1 > hadoop fs -mkdir /BSS/level1/level2 > hadoop fs -mkdir /BSS/level1/level2/level3 > hadoop fs -put /tmp/appConfig.json /BSS/level1/level2/level3/testfile.txt > hadoop fs -chown user1:users /BSS/level1/level2/level3/testfile.txt > hadoop fs -chown -R user1:users /BSS/level1 > hadoop fs -chown -R 750 /BSS/level1 > hadoop fs -chmod -R 640 /BSS/level1/level2/level3/testfile.txt > hadoop fs -chmod 775 /BSS > {code} > Change to a normal user called user2. > When trash is enabled: > {code} > sudo su user2 - > hadoop fs -rm -r /BSS/level1 > 15/05/01 16:51:20 INFO fs.TrashPolicyDefault: Namenode trash configuration: > Deletion interval = 3600 minutes, Emptier interval = 0 minutes. > Moved: 'hdfs://bdvs323.svl.ibm.com:9000/BSS/level1' to trash at: > hdfs://bdvs323.svl.ibm.com:9000/user/user2/.Trash/Current > {code} > When trash is disabled: > {code} > /opt/ibm/biginsights/IHC/bin/hadoop fs -Dfs.trash.interval=0 -rm -r > /BSS/level1 > 15/05/01 16:58:31 INFO fs.TrashPolicyDefault: Namenode trash configuration: > Deletion interval = 0 minutes, Emptier interval = 0 minutes. > rm: Permission denied: user=user2, access=ALL, > inode="/BSS/level1":user1:users:drwxr-x--- > {code} > There is inconsistency between trash behavior and delete behavior. When > trash is enabled, files owned by user1 is deleted by user2. It looks like > trash does not recursively validate if the child directory files can be > removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9530) ReservedSpace is not cleared for abandoned Blocks
[ https://issues.apache.org/jira/browse/HDFS-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15418460#comment-15418460 ] Brahma Reddy Battula commented on HDFS-9530: Ok.thanks for feedback allen. > ReservedSpace is not cleared for abandoned Blocks > - > > Key: HDFS-9530 > URL: https://issues.apache.org/jira/browse/HDFS-9530 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0, 2.7.1 >Reporter: Fei Hui >Assignee: Brahma Reddy Battula >Priority: Critical > Fix For: 2.7.3 > > Attachments: HDFS-9530-01.patch, HDFS-9530-02.patch, > HDFS-9530-03.patch, HDFS-9530-branch-2.6.patch, > HDFS-9530-branch-2.7-001.patch, HDFS-9530-branch-2.7-002.patch > > > i think there are bugs in HDFS > === > here is config > > dfs.datanode.data.dir > > > file:///mnt/disk4,file:///mnt/disk1,file:///mnt/disk3,file:///mnt/disk2 > > > here is dfsadmin report > [hadoop@worker-1 ~]$ hadoop dfsadmin -report > DEPRECATED: Use of this script to execute hdfs command is deprecated. > Instead use the hdfs command for it. > Configured Capacity: 240769253376 (224.23 GB) > Present Capacity: 238604832768 (222.22 GB) > DFS Remaining: 215772954624 (200.95 GB) > DFS Used: 22831878144 (21.26 GB) > DFS Used%: 9.57% > Under replicated blocks: 4 > Blocks with corrupt replicas: 0 > Missing blocks: 0 > - > Live datanodes (3): > Name: 10.117.60.59:50010 (worker-2) > Hostname: worker-2 > Decommission Status : Normal > Configured Capacity: 80256417792 (74.74 GB) > DFS Used: 7190958080 (6.70 GB) > Non DFS Used: 721473536 (688.05 MB) > DFS Remaining: 72343986176 (67.38 GB) > DFS Used%: 8.96% > DFS Remaining%: 90.14% > Configured Cache Capacity: 0 (0 B) > Cache Used: 0 (0 B) > Cache Remaining: 0 (0 B) > Cache Used%: 100.00% > Cache Remaining%: 0.00% > Xceivers: 1 > Last contact: Wed Dec 09 15:55:02 CST 2015 > Name: 10.168.156.0:50010 (worker-3) > Hostname: worker-3 > Decommission Status : Normal > Configured Capacity: 80256417792 (74.74 GB) > DFS Used: 7219073024 (6.72 GB) > Non DFS Used: 721473536 (688.05 MB) > DFS Remaining: 72315871232 (67.35 GB) > DFS Used%: 9.00% > DFS Remaining%: 90.11% > Configured Cache Capacity: 0 (0 B) > Cache Used: 0 (0 B) > Cache Remaining: 0 (0 B) > Cache Used%: 100.00% > Cache Remaining%: 0.00% > Xceivers: 1 > Last contact: Wed Dec 09 15:55:03 CST 2015 > Name: 10.117.15.38:50010 (worker-1) > Hostname: worker-1 > Decommission Status : Normal > Configured Capacity: 80256417792 (74.74 GB) > DFS Used: 8421847040 (7.84 GB) > Non DFS Used: 721473536 (688.05 MB) > DFS Remaining: 71113097216 (66.23 GB) > DFS Used%: 10.49% > DFS Remaining%: 88.61% > Configured Cache Capacity: 0 (0 B) > Cache Used: 0 (0 B) > Cache Remaining: 0 (0 B) > Cache Used%: 100.00% > Cache Remaining%: 0.00% > Xceivers: 1 > Last contact: Wed Dec 09 15:55:03 CST 2015 > > when running hive job , dfsadmin report as follows > [hadoop@worker-1 ~]$ hadoop dfsadmin -report > DEPRECATED: Use of this script to execute hdfs command is deprecated. > Instead use the hdfs command for it. 
> Configured Capacity: 240769253376 (224.23 GB) > Present Capacity: 108266011136 (100.83 GB) > DFS Remaining: 80078416384 (74.58 GB) > DFS Used: 28187594752 (26.25 GB) > DFS Used%: 26.04% > Under replicated blocks: 7 > Blocks with corrupt replicas: 0 > Missing blocks: 0 > - > Live datanodes (3): > Name: 10.117.60.59:50010 (worker-2) > Hostname: worker-2 > Decommission Status : Normal > Configured Capacity: 80256417792 (74.74 GB) > DFS Used: 9015627776 (8.40 GB) > Non DFS Used: 44303742464 (41.26 GB) > DFS Remaining: 26937047552 (25.09 GB) > DFS Used%: 11.23% > DFS Remaining%: 33.56% > Configured Cache Capacity: 0 (0 B) > Cache Used: 0 (0 B) > Cache Remaining: 0 (0 B) > Cache Used%: 100.00% > Cache Remaining%: 0.00% > Xceivers: 693 > Last contact: Wed Dec 09 15:37:35 CST 2015 > Name: 10.168.156.0:50010 (worker-3) > Hostname: worker-3 > Decommission Status : Normal > Configured Capacity: 80256417792 (74.74 GB) > DFS Used: 9163116544 (8.53 GB) > Non DFS Used: 47895897600 (44.61 GB) > DFS Remaining: 23197403648 (21.60 GB) > DFS Used%: 11.42% > DFS Remaining%: 28.90% > Configured Cache Capacity: 0 (0 B) > Cache Used: 0 (0 B) > Cache Remaining: 0 (0 B) > Cache Used%: 100.00% > Cache Remaining%: 0.00% > Xceivers: 750 > Last contact: Wed Dec 09 15:37:36 CST 2015 > Name: 10.117.15.38:50010 (worker-1) > Hostname: worker-1 > Decommission Status : Normal > Configured Capacity: 80256417792 (74.74 GB) > DFS Used: 10008850432 (9.32 GB) > Non DFS Used: 40303602176 (3
[jira] [Commented] (HDFS-9530) ReservedSpace is not cleared for abandoned Blocks
[ https://issues.apache.org/jira/browse/HDFS-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15418458#comment-15418458 ] Brahma Reddy Battula commented on HDFS-9530: Ok.. thanks arpit.. Even I ran before uploading the patch,did not induced any test failure. > ReservedSpace is not cleared for abandoned Blocks > - > > Key: HDFS-9530 > URL: https://issues.apache.org/jira/browse/HDFS-9530 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0, 2.7.1 >Reporter: Fei Hui >Assignee: Brahma Reddy Battula >Priority: Critical > Fix For: 2.7.3 > > Attachments: HDFS-9530-01.patch, HDFS-9530-02.patch, > HDFS-9530-03.patch, HDFS-9530-branch-2.6.patch, > HDFS-9530-branch-2.7-001.patch, HDFS-9530-branch-2.7-002.patch > > > i think there are bugs in HDFS > === > here is config > > dfs.datanode.data.dir > > > file:///mnt/disk4,file:///mnt/disk1,file:///mnt/disk3,file:///mnt/disk2 > > > here is dfsadmin report > [hadoop@worker-1 ~]$ hadoop dfsadmin -report > DEPRECATED: Use of this script to execute hdfs command is deprecated. > Instead use the hdfs command for it. > Configured Capacity: 240769253376 (224.23 GB) > Present Capacity: 238604832768 (222.22 GB) > DFS Remaining: 215772954624 (200.95 GB) > DFS Used: 22831878144 (21.26 GB) > DFS Used%: 9.57% > Under replicated blocks: 4 > Blocks with corrupt replicas: 0 > Missing blocks: 0 > - > Live datanodes (3): > Name: 10.117.60.59:50010 (worker-2) > Hostname: worker-2 > Decommission Status : Normal > Configured Capacity: 80256417792 (74.74 GB) > DFS Used: 7190958080 (6.70 GB) > Non DFS Used: 721473536 (688.05 MB) > DFS Remaining: 72343986176 (67.38 GB) > DFS Used%: 8.96% > DFS Remaining%: 90.14% > Configured Cache Capacity: 0 (0 B) > Cache Used: 0 (0 B) > Cache Remaining: 0 (0 B) > Cache Used%: 100.00% > Cache Remaining%: 0.00% > Xceivers: 1 > Last contact: Wed Dec 09 15:55:02 CST 2015 > Name: 10.168.156.0:50010 (worker-3) > Hostname: worker-3 > Decommission Status : Normal > Configured Capacity: 80256417792 (74.74 GB) > DFS Used: 7219073024 (6.72 GB) > Non DFS Used: 721473536 (688.05 MB) > DFS Remaining: 72315871232 (67.35 GB) > DFS Used%: 9.00% > DFS Remaining%: 90.11% > Configured Cache Capacity: 0 (0 B) > Cache Used: 0 (0 B) > Cache Remaining: 0 (0 B) > Cache Used%: 100.00% > Cache Remaining%: 0.00% > Xceivers: 1 > Last contact: Wed Dec 09 15:55:03 CST 2015 > Name: 10.117.15.38:50010 (worker-1) > Hostname: worker-1 > Decommission Status : Normal > Configured Capacity: 80256417792 (74.74 GB) > DFS Used: 8421847040 (7.84 GB) > Non DFS Used: 721473536 (688.05 MB) > DFS Remaining: 71113097216 (66.23 GB) > DFS Used%: 10.49% > DFS Remaining%: 88.61% > Configured Cache Capacity: 0 (0 B) > Cache Used: 0 (0 B) > Cache Remaining: 0 (0 B) > Cache Used%: 100.00% > Cache Remaining%: 0.00% > Xceivers: 1 > Last contact: Wed Dec 09 15:55:03 CST 2015 > > when running hive job , dfsadmin report as follows > [hadoop@worker-1 ~]$ hadoop dfsadmin -report > DEPRECATED: Use of this script to execute hdfs command is deprecated. > Instead use the hdfs command for it. 
> Configured Capacity: 240769253376 (224.23 GB) > Present Capacity: 108266011136 (100.83 GB) > DFS Remaining: 80078416384 (74.58 GB) > DFS Used: 28187594752 (26.25 GB) > DFS Used%: 26.04% > Under replicated blocks: 7 > Blocks with corrupt replicas: 0 > Missing blocks: 0 > - > Live datanodes (3): > Name: 10.117.60.59:50010 (worker-2) > Hostname: worker-2 > Decommission Status : Normal > Configured Capacity: 80256417792 (74.74 GB) > DFS Used: 9015627776 (8.40 GB) > Non DFS Used: 44303742464 (41.26 GB) > DFS Remaining: 26937047552 (25.09 GB) > DFS Used%: 11.23% > DFS Remaining%: 33.56% > Configured Cache Capacity: 0 (0 B) > Cache Used: 0 (0 B) > Cache Remaining: 0 (0 B) > Cache Used%: 100.00% > Cache Remaining%: 0.00% > Xceivers: 693 > Last contact: Wed Dec 09 15:37:35 CST 2015 > Name: 10.168.156.0:50010 (worker-3) > Hostname: worker-3 > Decommission Status : Normal > Configured Capacity: 80256417792 (74.74 GB) > DFS Used: 9163116544 (8.53 GB) > Non DFS Used: 47895897600 (44.61 GB) > DFS Remaining: 23197403648 (21.60 GB) > DFS Used%: 11.42% > DFS Remaining%: 28.90% > Configured Cache Capacity: 0 (0 B) > Cache Used: 0 (0 B) > Cache Remaining: 0 (0 B) > Cache Used%: 100.00% > Cache Remaining%: 0.00% > Xceivers: 750 > Last contact: Wed Dec 09 15:37:36 CST 2015 > Name: 10.117.15.38:50010 (worker-1) > Hostname: worker-1 > Decommission Status : Normal > Configured Capacity: 80256417792 (74.74 GB) >
[jira] [Commented] (HDFS-10757) KMSClientProvider combined with KeyProviderCache can result in wrong UGI being used
[ https://issues.apache.org/jira/browse/HDFS-10757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15418447#comment-15418447 ] Jitendra Nath Pandey commented on HDFS-10757: - If the currentUgi is a proxy user it will have a real UGI. {{currentUgi.getRealUser()}} should give us the actual ugi. > KMSClientProvider combined with KeyProviderCache can result in wrong UGI > being used > --- > > Key: HDFS-10757 > URL: https://issues.apache.org/jira/browse/HDFS-10757 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Sergey Shelukhin >Priority: Critical > > ClientContext::get gets the context from CACHE via a config setting based > name, then KeyProviderCache stored in ClientContext gets the key provider > cached by URI from the configuration, too. These would return the same > KeyProvider regardless of current UGI. > KMSClientProvider caches the UGI (actualUgi) in ctor; that means in > particular that all the users of DFS with KMSClientProvider in a process will > get the KMS token (along with other credentials) of the first user, via the > above cache. > Either KMSClientProvider shouldn't store the UGI, or one of the caches should > be UGI-aware, like the FS object cache. > Side note: the comment in createConnection that purports to handle the > different UGI doesn't seem to cover what it says it covers. In our case, we > have two unrelated UGIs with no auth (createRemoteUser) with bunch of tokens, > including a KMS token, added. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org