[jira] [Created] (HDFS-13731) Investigate TestReencryption timeouts

2018-07-11 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-13731:


 Summary: Investigate TestReencryption timeouts
 Key: HDFS-13731
 URL: https://issues.apache.org/jira/browse/HDFS-13731
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: encryption, test
Affects Versions: 3.0.0
Reporter: Xiao Chen


HDFS-12837 fixed some flakiness in the re-encryption related tests, but as noted in 
[~zvenczel]'s comment, a few timeouts remain. We should investigate them.




[jira] [Created] (HDFS-13730) BlockReaderRemote.sendReadResult throws NPE

2018-07-11 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-13730:
--

 Summary: BlockReaderRemote.sendReadResult throws NPE
 Key: HDFS-13730
 URL: https://issues.apache.org/jira/browse/HDFS-13730
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
 Environment: Hadoop 3.0.0, HBase 2.0.0 + HBASE-20403.
Reporter: Wei-Chiu Chuang


Found the following exception thrown in an HBase RegionServer log (Hadoop 3.0.0 
+ HBase 2.0.0; the HBase prefetch bug HBASE-20403 was fixed on this cluster, 
but I am not sure whether that is related at all):
{noformat}
2018-07-11 11:10:44,462 WARN org.apache.hadoop.hbase.io.hfile.HFileReaderImpl: Stream moved/closed or prefetch cancelled?path=hdfs://ns1/hbase/data/default/IntegrationTestBigLinkedList_20180711003954/449fa9bf5a7483295493258b5af50abc/meta/e9de0683f8a9413a94183c752bea0ca5, offset=216505135, end=2309991906
java.lang.NullPointerException
at org.apache.hadoop.hdfs.net.NioInetPeer.getRemoteAddressString(NioInetPeer.java:99)
at org.apache.hadoop.hdfs.net.EncryptedPeer.getRemoteAddressString(EncryptedPeer.java:105)
at org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.sendReadResult(BlockReaderRemote.java:330)
at org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.readNextPacket(BlockReaderRemote.java:233)
at org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.read(BlockReaderRemote.java:165)
at org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1050)
at org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:992)
at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1348)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1312)
at org.apache.hadoop.crypto.CryptoInputStream.read(CryptoInputStream.java:331)
at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92)
at org.apache.hadoop.hbase.io.hfile.HFileBlock.positionalReadWithExtra(HFileBlock.java:805)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readAtOffset(HFileBlock.java:1565)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1769)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1594)
at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1488)
at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$1.run(HFileReaderImpl.java:278)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{noformat}
The relevant Hadoop code:
{code:java|title=BlockReaderRemote#sendReadResult}
void sendReadResult(Status statusCode) {
  assert !sentStatusCode : "already sent status code to " + peer;
  try {
    writeReadResult(peer.getOutputStream(), statusCode);
    sentStatusCode = true;
  } catch (IOException e) {
    // It's ok not to be able to send this. But something is probably wrong.
    LOG.info("Could not send read status (" + statusCode + ") to datanode " +
        peer.getRemoteAddressString() + ": " + e.getMessage());
  }
}
{code}
So the NPE was thrown within an exception handler. A possible explanation is 
that the socket was closed, so the client couldn't write, and 
Socket#getRemoteSocketAddress() returns null once the socket is closed.

I suggest checking for null and returning an empty string in {noformat}NioInetPeer.getRemoteAddressString{noformat}.
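
For illustration, here is a minimal sketch of what such a guard could look like (this assumes NioInetPeer keeps the underlying java.net.Socket in a field named socket; the actual field name and surrounding code may differ):

{code:java|title=NioInetPeer#getRemoteAddressString (hypothetical null-safe sketch)}
@Override
public String getRemoteAddressString() {
  // Socket#getRemoteSocketAddress() returns null once the socket is closed,
  // so guard against it instead of letting the implicit toString() throw an NPE.
  java.net.SocketAddress remote = socket.getRemoteSocketAddress();
  return remote == null ? "" : remote.toString();
}
{code}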






[jira] [Created] (HDDS-251) Integrate BlockDeletingService in KeyValueHandler

2018-07-11 Thread Lokesh Jain (JIRA)
Lokesh Jain created HDDS-251:


 Summary: Integrate BlockDeletingService in KeyValueHandler
 Key: HDDS-251
 URL: https://issues.apache.org/jira/browse/HDDS-251
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Reporter: Lokesh Jain
Assignee: Lokesh Jain
 Fix For: 0.2.1


This Jira aims to integrate BlockDeletingService in KeyValueHandler. It also 
fixes the unit tests related to block deletion.






[jira] [Created] (HDFS-13728) Disk Balancer should not fail if volume usage is greater than capacity

2018-07-11 Thread Stephen O'Donnell (JIRA)
Stephen O'Donnell created HDFS-13728:


 Summary: Disk Balancer should not fail if volume usage is greater than capacity
 Key: HDFS-13728
 URL: https://issues.apache.org/jira/browse/HDFS-13728
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: diskbalancer
Affects Versions: 3.0.3
Reporter: Stephen O'Donnell


We have seen a couple of scenarios where the disk balancer fails because a 
datanode reports more space used on a disk than its capacity, which should not 
be possible.

This is due to the check below in DiskBalancerVolume.java:

{code}
  public void setUsed(long dfsUsedSpace) {
    Preconditions.checkArgument(dfsUsedSpace < this.getCapacity(),
        "DiskBalancerVolume.setUsed: dfsUsedSpace(%s) < capacity(%s)",
        dfsUsedSpace, getCapacity());
    this.used = dfsUsedSpace;
  }
{code}
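
For reference, a minimal self-contained illustration of how this check blows up when a datanode reports more usage than capacity (the values below are made up purely to demonstrate the behaviour):

{code:java}
import com.google.common.base.Preconditions;

public class CheckArgumentDemo {
  public static void main(String[] args) {
    long capacity = 1000L;
    long dfsUsedSpace = 1200L;  // hypothetical over-reported usage

    // Throws java.lang.IllegalArgumentException because the condition is false,
    // which is the failure the Disk Balancer surfaces.
    Preconditions.checkArgument(dfsUsedSpace < capacity,
        "DiskBalancerVolume.setUsed: dfsUsedSpace(%s) < capacity(%s)",
        dfsUsedSpace, capacity);
  }
}
{code}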

While I agree that it should not be possible for a DN to report more usage on a 
volume than its capacity, there seems to be some issue that causes this to 
occur sometimes.

In general, it is exactly this full disk that prompts someone to run the Disk 
Balancer, only to find it fails with this error.

There appears to be nothing you can do to force the Disk Balancer to run at 
that point. In the scenarios I saw, some data was later removed from the disk, 
the usage dropped below the capacity, and the issue resolved itself.

Could we consider relaxing the above check so that, if the usage is greater 
than the capacity, the usage is simply capped at the capacity and the 
calculations all still work?

E.g. something like this:

{code}
   public void setUsed(long dfsUsedSpace) {
-    Preconditions.checkArgument(dfsUsedSpace < this.getCapacity());
-    this.used = dfsUsedSpace;
+    if (dfsUsedSpace > this.getCapacity()) {
+      this.used = this.getCapacity();
+    } else {
+      this.used = dfsUsedSpace;
+    }
   }
{code}







[jira] [Created] (HDFS-13727) Log full stack trace if DiskBalancer exits with an unhandled exception

2018-07-11 Thread Stephen O'Donnell (JIRA)
Stephen O'Donnell created HDFS-13727:


 Summary: Log full stack trace if DiskBalancer exits with an unhandled exception
 Key: HDFS-13727
 URL: https://issues.apache.org/jira/browse/HDFS-13727
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: diskbalancer
Affects Versions: 3.0.3
Reporter: Stephen O'Donnell


In HDFS-13175 it was discovered that when a DN reports the usage on a volume to 
be greater than the volume capacity, the disk balancer will fail with an 
unhelpful error:

{code}
$ hdfs diskbalancer -report -top 5

18/06/11 10:19:43 INFO command.Command: Processing report command
18/06/11 10:19:44 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
18/06/11 10:19:44 INFO block.BlockTokenSecretManager: Setting block keys
18/06/11 10:19:44 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
18/06/11 10:19:44 ERROR tools.DiskBalancerCLI: java.lang.IllegalArgumentException
{code}

In HDFS-13175, a change was made to include more details in the exception 
message, so after the change the code is:

{code}
  public void setUsed(long dfsUsedSpace) {
    Preconditions.checkArgument(dfsUsedSpace < this.getCapacity(),
        "DiskBalancerVolume.setUsed: dfsUsedSpace(%s) < capacity(%s)",
        dfsUsedSpace, getCapacity());
    this.used = dfsUsedSpace;
  }
{code}

There may, however, be other scenarios that cause the balancer to exit with an 
unhandled exception, and it would be helpful if the tool logged the full stack 
trace on error rather than just the exception's name and message.

In DiskBalancerCLI.java, the relevant code is:

{code}
  public static void main(String[] argv) throws Exception {
    DiskBalancerCLI shell = new DiskBalancerCLI(new HdfsConfiguration());
    int res = 0;
    try {
      res = ToolRunner.run(shell, argv);
    } catch (Exception ex) {
      LOG.error(ex.toString());
      res = 1;
    }
    System.exit(res);
  }
{code}

We should change the error logging in the catch block to log the full stack 
trace, giving more information on all unhandled errors, e.g.:

{code}
LOG.error(ex.toString(), ex);
{code}
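
For clarity, a sketch of the main() method with that one-line change applied (everything else unchanged from the snippet above):

{code:java}
  public static void main(String[] argv) throws Exception {
    DiskBalancerCLI shell = new DiskBalancerCLI(new HdfsConfiguration());
    int res = 0;
    try {
      res = ToolRunner.run(shell, argv);
    } catch (Exception ex) {
      // Passing the exception as the second argument makes the logger emit the
      // full stack trace rather than only the exception's toString().
      LOG.error(ex.toString(), ex);
      res = 1;
    }
    System.exit(res);
  }
{code}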






[jira] [Created] (HDFS-13726) RBF: Fix RBF configuration links

2018-07-11 Thread Takanobu Asanuma (JIRA)
Takanobu Asanuma created HDFS-13726:
---

 Summary: RBF: Fix RBF configuration links
 Key: HDFS-13726
 URL: https://issues.apache.org/jira/browse/HDFS-13726
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: documentation
Reporter: Takanobu Asanuma
Assignee: Takanobu Asanuma


The RBF configuration properties moved from hdfs-default.xml to hdfs-rbf-default.xml, so the documentation links need to be updated.


