[jira] [Created] (HDFS-13731) Investigate TestReencryption timeouts
Xiao Chen created HDFS-13731:
---------------------------------

Summary: Investigate TestReencryption timeouts
Key: HDFS-13731
URL: https://issues.apache.org/jira/browse/HDFS-13731
Project: Hadoop HDFS
Issue Type: Bug
Components: encryption, test
Affects Versions: 3.0.0
Reporter: Xiao Chen

HDFS-12837 fixed some flakiness in the Reencryption-related tests, but as noted in [~zvenczel]'s comment, a few timeouts remain. We should investigate them.
[jira] [Created] (HDFS-13730) BlockReaderRemote.sendReadResult throws NPE
Wei-Chiu Chuang created HDFS-13730:
---------------------------------

Summary: BlockReaderRemote.sendReadResult throws NPE
Key: HDFS-13730
URL: https://issues.apache.org/jira/browse/HDFS-13730
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs-client
Environment: Hadoop 3.0.0, HBase 2.0.0 + HBASE-20403.
Reporter: Wei-Chiu Chuang

Found the following exception thrown in an HBase RegionServer log (Hadoop 3.0.0 + HBase 2.0.0; the HBase prefetch bug HBASE-20403 was fixed on this cluster, but I am not sure whether that is related at all):

{noformat}
2018-07-11 11:10:44,462 WARN org.apache.hadoop.hbase.io.hfile.HFileReaderImpl: Stream moved/closed or prefetch cancelled?path=hdfs://ns1/hbase/data/default/IntegrationTestBigLinkedList_20180711003954/449fa9bf5a7483295493258b5af50abc/meta/e9de0683f8a9413a94183c752bea0ca5, offset=216505135, end=2309991906
java.lang.NullPointerException
    at org.apache.hadoop.hdfs.net.NioInetPeer.getRemoteAddressString(NioInetPeer.java:99)
    at org.apache.hadoop.hdfs.net.EncryptedPeer.getRemoteAddressString(EncryptedPeer.java:105)
    at org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.sendReadResult(BlockReaderRemote.java:330)
    at org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.readNextPacket(BlockReaderRemote.java:233)
    at org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.read(BlockReaderRemote.java:165)
    at org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1050)
    at org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:992)
    at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1348)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1312)
    at org.apache.hadoop.crypto.CryptoInputStream.read(CryptoInputStream.java:331)
    at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock.positionalReadWithExtra(HFileBlock.java:805)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readAtOffset(HFileBlock.java:1565)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1769)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1594)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1488)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$1.run(HFileReaderImpl.java:278)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
{noformat}

The relevant Hadoop code:

{code:java|title=BlockReaderRemote#sendReadResult}
void sendReadResult(Status statusCode) {
  assert !sentStatusCode : "already sent status code to " + peer;
  try {
    writeReadResult(peer.getOutputStream(), statusCode);
    sentStatusCode = true;
  } catch (IOException e) {
    // It's ok not to be able to send this. But something is probably wrong.
    LOG.info("Could not send read status (" + statusCode + ") to datanode " +
        peer.getRemoteAddressString() + ": " + e.getMessage());
  }
}
{code}

So the NPE was thrown within an exception handler.
A possible explanation is that the socket was closed, so the client couldn't write, and Socket#getRemoteSocketAddress() returns null once the socket is closed. Suggest checking for null and returning an empty string in {noformat}NioInetPeer.getRemoteAddressString{noformat}.
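A minimal sketch of the suggested guard, assuming NioInetPeer wraps a java.net.Socket in a field named {noformat}socket{noformat} (the field name is an assumption for illustration, not copied from the actual source):

{code:java}
// Hypothetical sketch of the suggested null check in NioInetPeer; the
// "socket" field name is assumed for illustration.
public String getRemoteAddressString() {
  java.net.SocketAddress remote = socket.getRemoteSocketAddress();
  // Socket#getRemoteSocketAddress() returns null once the socket is
  // closed, which is what triggered the NPE inside the IOException
  // handler above; fall back to an empty string instead.
  return remote == null ? "" : remote.toString();
}
{code}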
[jira] [Created] (HDDS-251) Integrate BlockDeletingService in KeyValueHandler
Lokesh Jain created HDDS-251:
---------------------------------

Summary: Integrate BlockDeletingService in KeyValueHandler
Key: HDDS-251
URL: https://issues.apache.org/jira/browse/HDDS-251
Project: Hadoop Distributed Data Store
Issue Type: Bug
Components: Ozone Datanode
Reporter: Lokesh Jain
Assignee: Lokesh Jain
Fix For: 0.2.1

This Jira aims to integrate BlockDeletingService in KeyValueHandler. It also fixes the unit tests related to block deletion.
[jira] [Created] (HDFS-13728) Disk Balancer should not fail if volume usage is greater than capacity
Stephen O'Donnell created HDFS-13728:
---------------------------------

Summary: Disk Balancer should not fail if volume usage is greater than capacity
Key: HDFS-13728
URL: https://issues.apache.org/jira/browse/HDFS-13728
Project: Hadoop HDFS
Issue Type: Improvement
Components: diskbalancer
Affects Versions: 3.0.3
Reporter: Stephen O'Donnell

We have seen a couple of scenarios where the disk balancer fails because a datanode reports more space used on a disk than the disk's capacity, which should not be possible. This is due to the check below in DiskBalancerVolume.java:

{code}
public void setUsed(long dfsUsedSpace) {
  Preconditions.checkArgument(dfsUsedSpace < this.getCapacity(),
      "DiskBalancerVolume.setUsed: dfsUsedSpace(%s) < capacity(%s)",
      dfsUsedSpace, getCapacity());
  this.used = dfsUsedSpace;
}
{code}

While I agree that it should not be possible for a DN to report more usage on a volume than its capacity, there seems to be some issue that causes this to occur sometimes. In general, this full disk is what prompts someone to run the Disk Balancer in the first place, only to find it fails with this error. There appears to be nothing you can do to force the Disk Balancer to run at that point; in the scenarios I saw, the issue was resolved only after some data was removed from the disk and usage dropped below the capacity.

Can we consider relaxing the above check, so that if the usage is greater than the capacity, the usage is simply set to the capacity and the calculations all work OK? E.g. something like this:

{code}
public void setUsed(long dfsUsedSpace) {
-  Preconditions.checkArgument(dfsUsedSpace < this.getCapacity());
-  this.used = dfsUsedSpace;
+  if (dfsUsedSpace > this.getCapacity()) {
+    this.used = this.getCapacity();
+  } else {
+    this.used = dfsUsedSpace;
+  }
}
{code}
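For what it's worth, a compact way to write the same relaxation is to clamp the reported value to the capacity; a sketch only, under the assumption that DiskBalancerVolume has (or gains) a LOG field so the inconsistent DataNode report stays visible:

{code:java}
// Sketch only, not a committed patch: clamp the reported usage to the
// volume capacity instead of failing. The LOG field is an assumption of
// this sketch, added so the bad DataNode report is still surfaced.
public void setUsed(long dfsUsedSpace) {
  if (dfsUsedSpace > this.getCapacity()) {
    LOG.warn("Volume usage ({}) exceeds capacity ({}); clamping to capacity.",
        dfsUsedSpace, this.getCapacity());
  }
  this.used = Math.min(dfsUsedSpace, this.getCapacity());
}
{code}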
[jira] [Created] (HDFS-13727) Log full stack trace if DiskBalancer exits with an unhandled exception
Stephen O'Donnell created HDFS-13727:
---------------------------------

Summary: Log full stack trace if DiskBalancer exits with an unhandled exception
Key: HDFS-13727
URL: https://issues.apache.org/jira/browse/HDFS-13727
Project: Hadoop HDFS
Issue Type: Improvement
Components: diskbalancer
Affects Versions: 3.0.3
Reporter: Stephen O'Donnell

In HDFS-13175 it was discovered that when a DN reports the usage on a volume to be greater than the volume capacity, the disk balancer will fail with an unhelpful error:

{code}
$ hdfs diskbalancer -report -top 5
18/06/11 10:19:43 INFO command.Command: Processing report command
18/06/11 10:19:44 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
18/06/11 10:19:44 INFO block.BlockTokenSecretManager: Setting block keys
18/06/11 10:19:44 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
18/06/11 10:19:44 ERROR tools.DiskBalancerCLI: java.lang.IllegalArgumentException
{code}

In HDFS-13175, a change was made to include more details in the exception message, so after the change the code is:

{code}
public void setUsed(long dfsUsedSpace) {
  Preconditions.checkArgument(dfsUsedSpace < this.getCapacity(),
      "DiskBalancerVolume.setUsed: dfsUsedSpace(%s) < capacity(%s)",
      dfsUsedSpace, getCapacity());
  this.used = dfsUsedSpace;
}
{code}

There may, however, be other scenarios that cause the balancer to exit with an unhandled exception, and it would be helpful if the tool logged the full stack trace on error rather than just the exception name. In DiskBalancerCLI.java, the relevant code is:

{code}
public static void main(String[] argv) throws Exception {
  DiskBalancerCLI shell = new DiskBalancerCLI(new HdfsConfiguration());
  int res = 0;
  try {
    res = ToolRunner.run(shell, argv);
  } catch (Exception ex) {
    LOG.error(ex.toString());
    res = 1;
  }
  System.exit(res);
}
{code}

We should change the error logged in the exception block to log the full stack trace, giving more information on all unhandled errors, e.g.:

{code}
LOG.error(ex.toString(), ex);
{code}
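To illustrate the difference the two-argument overload makes, here is a minimal standalone sketch (assuming an SLF4J logger, which recent Hadoop versions use; the class name and message are made up for the demo):

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical standalone demo class, for illustration only.
public class LogDemo {
  private static final Logger LOG = LoggerFactory.getLogger(LogDemo.class);

  public static void main(String[] args) {
    try {
      throw new IllegalArgumentException("usage exceeds capacity");
    } catch (Exception ex) {
      // Logs only "java.lang.IllegalArgumentException: usage exceeds capacity".
      LOG.error(ex.toString());
      // Passing the Throwable as the last argument also logs the full
      // stack trace, which is what this Jira proposes.
      LOG.error(ex.toString(), ex);
    }
  }
}
{code}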
[jira] [Created] (HDFS-13726) RBF: Fix RBF configuration links
Takanobu Asanuma created HDFS-13726:
---------------------------------

Summary: RBF: Fix RBF configuration links
Key: HDFS-13726
URL: https://issues.apache.org/jira/browse/HDFS-13726
Project: Hadoop HDFS
Issue Type: Sub-task
Components: documentation
Reporter: Takanobu Asanuma
Assignee: Takanobu Asanuma

The RBF configuration properties moved from hdfs-default.xml to hdfs-rbf-default.xml, so the documentation links need to be updated.