[jira] [Commented] (HDDS-2152) Ozone client fails with OOM while writing a large (~300MB) key.
[ https://issues.apache.org/jira/browse/HDDS-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937255#comment-16937255 ]

Tsz Wo Nicholas Sze commented on HDDS-2152:
-------------------------------------------

HDDS-2169 has a patch that will address a buffer copy.

> Ozone client fails with OOM while writing a large (~300MB) key.
> ---------------------------------------------------------------
>
> Key: HDDS-2152
> URL: https://issues.apache.org/jira/browse/HDDS-2152
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Client
> Reporter: Aravindan Vijayan
> Assignee: YiSheng Lien
> Priority: Major
> Attachments: largekey.png
>
> {code}
> dd if=/dev/zero of=testfile bs=1024 count=307200
> ozone sh key put /vol1/bucket1/key testfile
> {code}
> {code}
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>   at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
>   at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
>   at org.apache.hadoop.hdds.scm.storage.BufferPool.allocateBufferIfNeeded(BufferPool.java:66)
>   at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.write(BlockOutputStream.java:234)
>   at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.write(BlockOutputStreamEntry.java:129)
>   at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:211)
>   at org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:193)
>   at org.apache.hadoop.ozone.client.io.OzoneOutputStream.write(OzoneOutputStream.java:49)
>   at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:96)
>   at org.apache.hadoop.ozone.web.ozShell.keys.PutKeyHandler.call(PutKeyHandler.java:117)
>   at org.apache.hadoop.ozone.web.ozShell.keys.PutKeyHandler.call(PutKeyHandler.java:55)
>   at picocli.CommandLine.execute(CommandLine.java:1173)
>   at picocli.CommandLine.access$800(CommandLine.java:141)
> {code}

--
This message was sent by Atlassian Jira (v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
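The stack trace above points at BufferPool.allocateBufferIfNeeded allocating a fresh heap buffer for each incoming write until the client heap is exhausted. As a rough illustration of the failure mode (the class below is a hypothetical sketch, not the actual Ozone BufferPool), a pool that caps allocation and reuses its buffers after a flush cannot grow without bound:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a capped buffer pool; not the actual Ozone BufferPool. */
class CappedBufferPool {
    private final int capacity;      // maximum number of buffers ever allocated
    private final int bufferSize;    // bytes per buffer
    private final List<ByteBuffer> buffers = new ArrayList<>();
    private int next = 0;            // index of the buffer to hand out next

    CappedBufferPool(int capacity, int bufferSize) {
        this.capacity = capacity;
        this.bufferSize = bufferSize;
    }

    /** Returns a buffer, allocating lazily but never exceeding the cap. */
    ByteBuffer acquire() {
        if (next < buffers.size()) {
            return buffers.get(next++);   // reuse an already-allocated buffer
        }
        if (buffers.size() >= capacity) {
            // The caller must flush and release before writing more;
            // an unbounded pool would keep allocating here and eventually OOM.
            throw new IllegalStateException("pool exhausted; flush first");
        }
        ByteBuffer b = ByteBuffer.allocate(bufferSize);
        buffers.add(b);
        next++;
        return b;
    }

    /** Makes all buffers reusable again, e.g. after a flush completes. */
    void releaseAll() {
        for (ByteBuffer b : buffers) {
            b.clear();
        }
        next = 0;
    }

    int allocatedBuffers() {
        return buffers.size();
    }
}
```

With a cap of 2 buffers, a third acquire fails fast instead of growing the heap; after releaseAll() the same 2 buffers are handed out again.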
[jira] [Commented] (HDDS-2169) Avoid buffer copies while submitting client requests in Ratis
[ https://issues.apache.org/jira/browse/HDDS-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937249#comment-16937249 ]

Tsz Wo Nicholas Sze commented on HDDS-2169:
-------------------------------------------

[~msingh], thanks for taking a look. The patch does apply. Have you tried it? Anyway, I have just submitted a pull request: https://github.com/apache/hadoop/pull/1517

> Also this problem needs to be fixed for appendEntries from leader to follower as well.

Sure, let's fix it in a separate JIRA.

> Avoid buffer copies while submitting client requests in Ratis
> -------------------------------------------------------------
>
> Key: HDDS-2169
> URL: https://issues.apache.org/jira/browse/HDDS-2169
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Reporter: Shashikant Banerjee
> Assignee: Tsz Wo Nicholas Sze
> Priority: Major
> Labels: pull-request-available
> Attachments: o2169_20190923.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently, while sending write requests to Ratis from Ozone, a protobuf object containing the data is encoded, and the resulting protobuf is then converted to a ByteString, which internally copies the buffer embedded inside the protobuf again so that it can be submitted to the Ratis client. Likewise, while building the appendRequestProto for the appendRequest, the data may be copied yet again. The idea here is to let the client pass the raw data (stateMachine data) separately to the Ratis client, without the copying overhead.
>
> {code:java}
> private CompletableFuture<RaftClientReply> sendRequestAsync(
>     ContainerCommandRequestProto request) {
>   try (Scope scope = GlobalTracer.get()
>       .buildSpan("XceiverClientRatis." + request.getCmdType().name())
>       .startActive(true)) {
>     ContainerCommandRequestProto finalPayload =
>         ContainerCommandRequestProto.newBuilder(request)
>             .setTraceID(TracingUtil.exportCurrentSpan())
>             .build();
>     boolean isReadOnlyRequest = HddsUtils.isReadOnly(finalPayload);
>     // finalPayload already has the byteString data embedded.
>     ByteString byteString = finalPayload.toByteString(); // <- It involves a copy again.
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("sendCommandAsync {} {}", isReadOnlyRequest,
>           sanitizeForDebug(finalPayload));
>     }
>     return isReadOnlyRequest ?
>         getClient().sendReadOnlyAsync(() -> byteString) :
>         getClient().sendAsync(() -> byteString);
>   }
> }
> {code}
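The comment above flags finalPayload.toByteString() as an extra copy of bytes the message already holds. The stdlib sketch below (plain java.nio, not protobuf) illustrates the underlying trade-off: copying yields a buffer that is independent of the source, while wrapping shares the caller's memory and costs nothing. In protobuf-java, an API along the lines of UnsafeByteOperations.unsafeWrap plays the wrapping role; treat that name as an assumption here, not a confirmed part of the patch.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Stdlib analogy for copy vs. zero-copy wrapping; protobuf specifics are assumptions. */
class CopyVsWrap {
    /** Copies the array: the result is independent of later mutations to the source. */
    static ByteBuffer copyOf(byte[] data) {
        return ByteBuffer.wrap(Arrays.copyOf(data, data.length));
    }

    /** Wraps the array: shares the backing memory, so no copy is made. */
    static ByteBuffer wrapOf(byte[] data) {
        return ByteBuffer.wrap(data);
    }
}
```

The trade-off is safety versus cost: a wrapped buffer observes later mutations of the source, which is only acceptable when the caller promises not to touch the bytes while the request is in flight.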
[jira] [Work started] (HDDS-2169) Avoid buffer copies while submitting client requests in Ratis
[ https://issues.apache.org/jira/browse/HDDS-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HDDS-2169 started by Tsz Wo Nicholas Sze.
-------------------------------------------------

> Avoid buffer copies while submitting client requests in Ratis
> Key: HDDS-2169
[jira] [Updated] (HDDS-2169) Avoid buffer copies while submitting client requests in Ratis
[ https://issues.apache.org/jira/browse/HDDS-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo Nicholas Sze updated HDDS-2169:
--------------------------------------
Status: Patch Available (was: Open)

o2169_20190923.patch: 1st patch.

> Avoid buffer copies while submitting client requests in Ratis
> Key: HDDS-2169
[jira] [Work stopped] (HDDS-2169) Avoid buffer copies while submitting client requests in Ratis
[ https://issues.apache.org/jira/browse/HDDS-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HDDS-2169 stopped by Tsz Wo Nicholas Sze.
-------------------------------------------------

> Avoid buffer copies while submitting client requests in Ratis
> Key: HDDS-2169
[jira] [Updated] (HDDS-2169) Avoid buffer copies while submitting client requests in Ratis
[ https://issues.apache.org/jira/browse/HDDS-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo Nicholas Sze updated HDDS-2169:
--------------------------------------
Attachment: o2169_20190923.patch

> Avoid buffer copies while submitting client requests in Ratis
> Key: HDDS-2169
[jira] [Moved] (HDDS-2169) Avoid buffer copies while submitting client requests in Ratis
[ https://issues.apache.org/jira/browse/HDDS-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo Nicholas Sze moved RATIS-688 to HDDS-2169:
-------------------------------------------------
Component/s: (was: server)
             (was: client)
Fix Version/s: (was: 0.4.0)
Key: HDDS-2169 (was: RATIS-688)
Target Version/s: (was: 0.4.0)
Workflow: patch-available, re-open possible (was: no-reopen-closed, patch-avail)
Issue Type: Improvement (was: Bug)
Project: Hadoop Distributed Data Store (was: Ratis)

> Avoid buffer copies while submitting client requests in Ratis
> Key: HDDS-2169
[jira] [Commented] (HDFS-13101) Yet another fsimage corruption related to snapshot
[ https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908523#comment-16908523 ]

Tsz Wo Nicholas Sze commented on HDFS-13101:
--------------------------------------------

> ... Do you plan to cherrypick the commit into lower branches? I am happy to help out ...

[~jojochuang], sounds good. Please help. Thanks a lot!

> Yet another fsimage corruption related to snapshot
> --------------------------------------------------
>
> Key: HDFS-13101
> URL: https://issues.apache.org/jira/browse/HDFS-13101
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: snapshots
> Reporter: Yongjun Zhang
> Assignee: Shashikant Banerjee
> Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-13101.001.patch, HDFS-13101.002.patch, HDFS-13101.003.patch, HDFS-13101.004.patch, HDFS-13101.corruption_repro.patch, HDFS-13101.corruption_repro_simplified.patch
>
> Lately we saw a case similar to HDFS-9406. Even though the HDFS-9406 fix is present, it is likely another case not covered by that fix. We are currently trying to collect a good fsimage + editlogs to replay to reproduce it and investigate.
[jira] [Commented] (HDFS-13101) Yet another fsimage corruption related to snapshot
[ https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902330#comment-16902330 ]

Tsz Wo Nicholas Sze commented on HDFS-13101:
--------------------------------------------

[~shashikant], great work on the patch! Could you fix the checkstyle warnings and see if the unit test failures are related?

> Yet another fsimage corruption related to snapshot
> Key: HDFS-13101
[jira] [Commented] (HDFS-13101) Yet another fsimage corruption related to snapshot
[ https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900382#comment-16900382 ]

Tsz Wo Nicholas Sze commented on HDFS-13101:
--------------------------------------------

+1 the 003 patch looks good. Pending Jenkins.

> Yet another fsimage corruption related to snapshot
> Key: HDFS-13101
[jira] [Commented] (HDFS-13101) Yet another fsimage corruption related to snapshot
[ https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899171#comment-16899171 ]

Tsz Wo Nicholas Sze commented on HDFS-13101:
--------------------------------------------

The fix could be
{code}
// DirectoryWithSnapshotFeature#cleanDirectory()
 if (priorCreated != null) {
-  // we only check the node originally in prior's created list
-  for (INode cNode : priorDiff.diff.getCreatedUnmodifiable()) {
-    if (priorCreated.containsKey(cNode)) {
-      cNode.cleanSubtree(reclaimContext, snapshot, NO_SNAPSHOT_ID);
+  if (currentINode.isLastReference()) {
+    // if this is the last reference, the created list can be destroyed.
+    priorDiff.getChildrenDiff().destroyCreatedList(
+        reclaimContext, currentINode);
+  } else {
+    // we only check the node originally in prior's created list
+    for (INode cNode : priorDiff.diff.getCreatedUnmodifiable()) {
+      if (priorCreated.containsKey(cNode)) {
+        cNode.cleanSubtree(reclaimContext, snapshot, NO_SNAPSHOT_ID);
+      }
     }
   }
 }
{code}
where isLastReference() is a new method in INode.
{code}
// INode.java
/**
 * @return true if this is a reference and the reference count is 1;
 *         otherwise, return false.
 */
public boolean isLastReference() {
  final INodeReference ref = getParentReference();
  if (!(ref instanceof WithCount)) {
    return false;
  }
  return ((WithCount) ref).getReferenceCount() == 1;
}
{code}

> Yet another fsimage corruption related to snapshot
> Key: HDFS-13101
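The patch above destroys the whole created list only when the directory being cleaned is the last remaining reference to it; otherwise it falls back to per-node cleanup. A toy model of that decision (hypothetical names, not the HDFS INode/INodeReference API):

```java
/** Toy model of the last-reference decision; not the actual HDFS INode API. */
class RefCounted {
    private final int referenceCount;

    RefCounted(int count) {
        this.referenceCount = count;
    }

    /** Mirrors the intent of the new INode#isLastReference() in the patch. */
    boolean isLastReference() {
        return referenceCount == 1;
    }

    /** Wholesale destruction is safe only when nothing else can still reach the list. */
    String cleanAction() {
        return isLastReference() ? "destroyCreatedList" : "cleanSubtreePerNode";
    }
}
```

The safety argument is the same as in the patch: with more than one live reference, destroying the created list would corrupt state still reachable through the other references.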
[jira] [Commented] (HDFS-13101) Yet another fsimage corruption related to snapshot
[ https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899168#comment-16899168 ]

Tsz Wo Nicholas Sze commented on HDFS-13101:
--------------------------------------------

I agree that the bug is in DirectoryWithSnapshotFeature#cleanDirectory(). When the directory is the last reference, the entire created list should be destroyed instead of cleaning individual cNodes.

> Yet another fsimage corruption related to snapshot
> Key: HDFS-13101
[jira] [Assigned] (HDFS-13101) Yet another fsimage corruption related to snapshot
[ https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo Nicholas Sze reassigned HDFS-13101:
------------------------------------------
Assignee: Shashikant Banerjee (was: Siyao Meng)
Component/s: snapshots

[~shashikant], it is great that you have come up with a small unit test showing the bug! Assigning this to you ...

> Yet another fsimage corruption related to snapshot
> Key: HDFS-13101
[jira] [Commented] (HDFS-14499) Misleading REM_QUOTA value with snapshot and trash feature enabled for a directory
[ https://issues.apache.org/jira/browse/HDFS-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16883334#comment-16883334 ]

Tsz Wo Nicholas Sze commented on HDFS-14499:
--------------------------------------------

+1 the 003 patch looks good.

> Misleading REM_QUOTA value with snapshot and trash feature enabled for a directory
> ----------------------------------------------------------------------------------
>
> Key: HDFS-14499
> URL: https://issues.apache.org/jira/browse/HDFS-14499
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: snapshots
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Attachments: HDFS-14499.000.patch, HDFS-14499.001.patch, HDFS-14499.002.patch
>
> This is the flow of steps where we see a discrepancy between REM_QUOTA and a new file operation failing. REM_QUOTA shows a value of 1, but the file creation operation does not succeed.
> {code:java}
> hdfs@c3265-node3 root$ hdfs dfs -mkdir /dir1
> hdfs@c3265-node3 root$ hdfs dfsadmin -setQuota 2 /dir1
> hdfs@c3265-node3 root$ hdfs dfsadmin -allowSnapshot /dir1
> Allowing snaphot on /dir1 succeeded
> hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1
> hdfs@c3265-node3 root$ hdfs dfs -createSnapshot /dir1 snap1
> Created snapshot /dir1/.snapshot/snap1
> hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1
> QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
>     2         0        none             inf         1          1            0 /dir1
> hdfs@c3265-node3 root$ hdfs dfs -rm /dir1/file1
> 19/03/26 11:20:25 INFO fs.TrashPolicyDefault: Moved: 'hdfs://smajetinn/dir1/file1' to trash at: hdfs://smajetinn/user/hdfs/.Trash/Current/dir1/file11553599225772
> hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1
> QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
>     2         1        none             inf         1          0            0 /dir1
> hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1
> touchz: The NameSpace quota (directories and files) of directory /dir1 is exceeded: quota=2 file count=3
> {code}
> The issue here is that the count command takes only files and directories into account, not the inode references. When trash is enabled, deleting a file inside a directory actually does a rename, as a result of which an inode reference is maintained in the deleted list of the snapshot diff. That reference is taken into account while computing the namespace quota, but the count command (getContentSummary()) considers just the files and directories, not the referenced entity, when calculating REM_QUOTA. The referenced entity is taken into account for the space quota only.
>
> InodeReference.java:
> {code:java}
> @Override
> public final ContentSummaryComputationContext computeContentSummary(
>     int snapshotId, ContentSummaryComputationContext summary) {
>   final int s = snapshotId < lastSnapshotId ? snapshotId : lastSnapshotId;
>   // only count storagespace for WithName
>   final QuotaCounts q = computeQuotaUsage(
>       summary.getBlockStoragePolicySuite(), getStoragePolicyID(), false, s);
>   summary.getCounts().addContent(Content.DISKSPACE, q.getStorageSpace());
>   summary.getCounts().addTypeSpaces(q.getTypeSpaces());
>   return summary;
> }
> {code}
[jira] [Commented] (HDFS-14499) Misleading REM_QUOTA value with snapshot and trash feature enabled for a directory
[ https://issues.apache.org/jira/browse/HDFS-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16882340#comment-16882340 ]

Tsz Wo Nicholas Sze commented on HDFS-14499:
--------------------------------------------

{code}
+    int id = lastSnapshotId != Snapshot.CURRENT_STATE_ID ? snapshotId :
+        this.lastSnapshotId;
{code}
It should be {{snapshotId != Snapshot.CURRENT_STATE_ID}}. The patch looks good other than that.

> Misleading REM_QUOTA value with snapshot and trash feature enabled for a directory
> Key: HDFS-14499
[jira] [Commented] (HDFS-14499) Misleading REM_QUOTA value with snasphot and trash feature enabled for a directory
[ https://issues.apache.org/jira/browse/HDFS-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852747#comment-16852747 ] Tsz Wo Nicholas Sze commented on HDFS-14499: Thanks [~shashikant]. The parameter name in computeContentSummary is snapshotId, not lastSnapshotId. So the code needs to be updated.
> Misleading REM_QUOTA value with snapshot and trash feature enabled for a directory
> --
>
> Key: HDFS-14499
> URL: https://issues.apache.org/jira/browse/HDFS-14499
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: snapshots
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Attachments: HDFS-14499.000.patch
>
> This is the flow of steps where we see a discrepancy between REM_QUOTA and new file operation failure. REM_QUOTA shows a value of 1 but the file creation operation does not succeed.
> {code:java}
> hdfs@c3265-node3 root$ hdfs dfs -mkdir /dir1
> hdfs@c3265-node3 root$ hdfs dfsadmin -setQuota 2 /dir1
> hdfs@c3265-node3 root$ hdfs dfsadmin -allowSnapshot /dir1
> Allowing snaphot on /dir1 succeeded
> hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1
> hdfs@c3265-node3 root$ hdfs dfs -createSnapshot /dir1 snap1
> Created snapshot /dir1/.snapshot/snap1
> hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1
> QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
> 2 0 none inf 1 1 0 /dir1
> hdfs@c3265-node3 root$ hdfs dfs -rm /dir1/file1
> 19/03/26 11:20:25 INFO fs.TrashPolicyDefault: Moved: 'hdfs://smajetinn/dir1/file1' to trash at: hdfs://smajetinn/user/hdfs/.Trash/Current/dir1/file11553599225772
> hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1
> QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
> 2 1 none inf 1 0 0 /dir1
> hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1
> touchz: The NameSpace quota (directories and files) of directory /dir1 is exceeded: quota=2 file count=3{code}
> The issue here is that the count command takes only files and directories into account, not the inode references. When trash is enabled, deleting a file inside a directory actually performs a rename, as a result of which an inode reference is maintained in the deleted list of the snapshot diff. That reference is taken into account while computing the namespace quota, but the count command (getContentSummary()) takes into account only the files and directories, not the referenced entity, when calculating REM_QUOTA. The referenced entity is taken into account for the space quota only.
> InodeReference.java:
> {code:java}
> @Override
> public final ContentSummaryComputationContext computeContentSummary(
>     int snapshotId, ContentSummaryComputationContext summary) {
>   final int s = snapshotId < lastSnapshotId ? snapshotId : lastSnapshotId;
>   // only count storagespace for WithName
>   final QuotaCounts q = computeQuotaUsage(
>       summary.getBlockStoragePolicySuite(), getStoragePolicyID(), false, s);
>   summary.getCounts().addContent(Content.DISKSPACE, q.getStorageSpace());
>   summary.getCounts().addTypeSpaces(q.getTypeSpaces());
>   return summary;
> }
> {code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
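The quota mismatch described in this issue can be reduced to a toy model (hypothetical class and method names; this is not the HDFS implementation): quota enforcement charges files, directories, and snapshot inode references, while getContentSummary() counts only files and directories.

```java
// Toy model of the REM_QUOTA discrepancy (hypothetical sketch, not HDFS code).
public class QuotaSketch {
    // What quota enforcement charges: files + dirs + snapshot inode references.
    static int quotaUsage(int dirs, int files, int snapshotRefs) {
        return dirs + files + snapshotRefs;
    }
    // What `hdfs dfs -count -q` reports as used: files + dirs only.
    static int contentSummaryCount(int dirs, int files) {
        return dirs + files;
    }
    public static void main(String[] args) {
        final int quota = 2;
        // After `rm` moved file1 to trash: /dir1 itself (1 dir), 0 files,
        // and 1 inode reference kept in the snapshot diff's deleted list.
        int remQuotaShown = quota - contentSummaryCount(1, 0);   // 1
        int remQuotaReal  = quota - quotaUsage(1, 0, 1);         // 0
        System.out.println(remQuotaShown + " vs " + remQuotaReal);
    }
}
```

With the numbers from the transcript above, the reported remaining quota is 1 while the enforced remaining quota is 0, which is exactly why the second touchz fails.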
[jira] [Commented] (HDDS-372) There are three buffer copies in BlockOutputStream
[ https://issues.apache.org/jira/browse/HDDS-372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809946#comment-16809946 ] Tsz Wo Nicholas Sze commented on HDDS-372: -- +1 the 005 patch looks good. Pending Jenkins.
> There are three buffer copies in BlockOutputStream
> --
>
> Key: HDDS-372
> URL: https://issues.apache.org/jira/browse/HDDS-372
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Components: Ozone Client
> Reporter: Tsz Wo Nicholas Sze
> Assignee: Shashikant Banerjee
> Priority: Major
> Attachments: HDDS-372.001.patch, HDDS-372.002.patch, HDDS-372.003.patch, HDDS-372.004.patch, HDDS-372.005.patch, HDDS-372.20180829.patch
>
> Currently, there are three buffer copies in ChunkOutputStream:
> # from byte[] to ByteBuffer,
> # from ByteBuffer to ByteString, and
> # from ByteString to ByteBuffer for checksum computation.
> We should eliminate the ByteBuffer in the middle.
> For zero-copy IO, we should support WritableByteChannel instead of OutputStream. It won't be done in this JIRA.
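The distinction behind the three-copy issue is copy versus zero-copy wrap. A minimal java.nio sketch (hypothetical class; Ozone's real code goes through protobuf ByteString, which is not used here): a copy is isolated from later mutation of the source array, while a wrap shares the backing memory.

```java
import java.nio.ByteBuffer;

// Copy vs. zero-copy wrap (plain java.nio sketch, not Ozone code).
public class CopyVsWrap {
    static ByteBuffer copyOf(byte[] src) {   // analogous to ByteString.copyFrom
        ByteBuffer b = ByteBuffer.allocate(src.length);
        b.put(src).flip();                   // copies the bytes, then rewinds
        return b;
    }
    static ByteBuffer wrapOf(byte[] src) {   // analogous to unsafeWrap
        return ByteBuffer.wrap(src);         // shares the backing array
    }
    public static void main(String[] args) {
        byte[] data = {1, 2, 3};
        ByteBuffer copied = copyOf(data);
        ByteBuffer wrapped = wrapOf(data);
        data[0] = 9;                         // mutate the source afterwards
        System.out.println(copied.get(0));   // 1: the copy is isolated
        System.out.println(wrapped.get(0));  // 9: the wrap sees the change
    }
}
```

This is also why wrapping is "unsafe": it is only correct when the caller guarantees the source buffer is no longer mutated.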
[jira] [Commented] (HDDS-372) There are three buffer copies in BlockOutputStream
[ https://issues.apache.org/jira/browse/HDDS-372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809795#comment-16809795 ] Tsz Wo Nicholas Sze commented on HDDS-372: --
{code}
    // flip the buffer so as to read the data starting from pos 0 again
    // for checksum computation in case there is actual copy involved
    // in the ByteString conversion
    if (!ByteStringHelper.isUnsafeByteOperationsEnabled()) {
      chunk.flip();
    }
{code}
- Let's flip the buffer anyway. Otherwise, it is hard to use the ByteStringHelper.getByteString(ByteBuffer) API.
{code}
  //ByteStringHelper
  private static ByteString copyFrom(ByteBuffer buffer) {
    final ByteString bytes = ByteString.copyFrom(buffer);
    buffer.flip();
    return bytes;
  }

  public static ByteString getByteString(ByteBuffer buffer) {
    return isUnsafeByteOperationsEnabled ?
        UnsafeByteOperations.unsafeWrap(buffer) : copyFrom(buffer);
  }
{code}
- Please fix the checkstyle warnings and see if the test failures are related. The patch looks good other than that. Thanks.
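The flip-inside-copyFrom pattern suggested in this comment can be sketched with plain java.nio (hypothetical class name; ByteString.copyFrom is simulated by a consuming read). A consuming read advances the buffer's position to its limit; flipping afterwards makes the same bytes readable again, so callers need not know whether a copy happened.

```java
import java.nio.ByteBuffer;

// Why flipping after a consuming read keeps the API uniform
// (java.nio sketch of the copyFrom(ByteBuffer) pattern).
public class FlipAfterRead {
    // Simulates ByteString.copyFrom(buffer): reads all remaining bytes,
    // leaving position == limit, then flips so the caller can re-read.
    static byte[] copyAndFlip(ByteBuffer buffer) {
        byte[] copy = new byte[buffer.remaining()];
        buffer.get(copy);          // advances position to the limit
        buffer.flip();             // limit = old position, position = 0
        return copy;
    }
    public static void main(String[] args) {
        ByteBuffer chunk = ByteBuffer.wrap(new byte[]{10, 20, 30});
        byte[] copy = copyAndFlip(chunk);
        // The buffer is fully readable again, e.g. for checksum computation.
        System.out.println(chunk.remaining());   // 3
        System.out.println(copy.length);         // 3
    }
}
```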
[jira] [Commented] (HDDS-372) There are three buffer copies in BlockOutputStream
[ https://issues.apache.org/jira/browse/HDDS-372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809535#comment-16809535 ] Tsz Wo Nicholas Sze commented on HDDS-372: -- BlockOutputStreamEntry is still using safeBufferByteStringCopy and it is built by isUnsafeByteOperationsEnabled, i.e. unsafe becomes safe. To avoid this kind of bug, let's avoid passing the boolean around. We may initialize ByteStringHelper as below.
{code}
public class ByteStringHelper {
  private static final AtomicBoolean initialized = new AtomicBoolean();
  private static volatile boolean isUnsafeByteOperationsEnabled;

  public static void init(boolean isUnsafeByteOperationsEnabled) {
    final boolean set = initialized.compareAndSet(false, true);
    if (set) {
      ByteStringHelper.isUnsafeByteOperationsEnabled =
          isUnsafeByteOperationsEnabled;
    } else {
      // already initialized, check values
      Preconditions.checkState(ByteStringHelper.isUnsafeByteOperationsEnabled
          == isUnsafeByteOperationsEnabled);
    }
  }

  public static ByteString getByteString(ByteBuffer buffer) {
    return isUnsafeByteOperationsEnabled ?
        UnsafeByteOperations.unsafeWrap(buffer) : ByteString.copyFrom(buffer);
  }

  public static ByteString getByteString(byte[] bytes) {
    return isUnsafeByteOperationsEnabled ?
        UnsafeByteOperations.unsafeWrap(bytes) : ByteString.copyFrom(bytes);
  }
}
{code}
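The initialize-once pattern proposed above can be exercised standalone (hypothetical class; Guava's Preconditions is replaced by a plain IllegalStateException, and there is no protobuf dependency): the first init wins via compareAndSet, and later inits must agree with it.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Initialize-once holder mirroring the ByteStringHelper.init(..) pattern
// (standalone sketch; not the HDDS class).
public class InitOnce {
    private static final AtomicBoolean INITIALIZED = new AtomicBoolean();
    private static volatile boolean unsafeEnabled;

    public static void init(boolean value) {
        if (INITIALIZED.compareAndSet(false, true)) {
            unsafeEnabled = value;      // first caller wins
        } else if (unsafeEnabled != value) {
            // already initialized: later callers must pass the same value
            throw new IllegalStateException("conflicting initialization");
        }
    }
    public static boolean isUnsafeEnabled() {
        return unsafeEnabled;
    }
    public static void main(String[] args) {
        init(true);
        init(true);                     // ok: same value as the first init
        System.out.println(isUnsafeEnabled());   // true
    }
}
```

Because the flag lives in one place, callers such as BlockOutputStreamEntry no longer thread the boolean through constructors, which is how the unsafe-becomes-safe inversion slipped in.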
[jira] [Commented] (HDDS-372) There are three buffer copies in BlockOutputStream
[ https://issues.apache.org/jira/browse/HDDS-372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808414#comment-16808414 ] Tsz Wo Nicholas Sze commented on HDDS-372: -- Thanks [~shashikant]. Some quick comments:
- Do not change Checksum to use UnsafeByteOperations since (1) the checksum size is very small compared with the data, and (2) the checksum is used to detect data change -- if there is a bug involving UnsafeByteOperations, the checksum may be able to detect it.
- How about renaming the new conf "ozone.safe.buffer.bytestring.copy" to "ozone.client.UnsafeByteOperations.enabled"?
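The argument for keeping the checksum path on safe copies is that a checksum over the payload catches later corruption. A minimal sketch, with java.util.zip.CRC32 standing in for Ozone's Checksum class (an assumption for illustration only):

```java
import java.util.zip.CRC32;

// A checksum over the payload detects a corrupted byte
// (CRC32 stands in for Ozone's Checksum class in this sketch).
public class ChecksumCatch {
    static long crc(byte[] data) {
        CRC32 c = new CRC32();
        c.update(data, 0, data.length);
        return c.getValue();
    }
    public static void main(String[] args) {
        byte[] payload = "hello".getBytes();
        long expected = crc(payload);
        payload[0] ^= 0x1;   // corrupt a single bit after checksumming
        System.out.println(expected != crc(payload));   // true
    }
}
```

If an unsafe wrap ever let the data buffer mutate after checksumming, this is exactly the mismatch the checksum would surface.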
[jira] [Commented] (HDDS-699) Detect Ozone Network topology
[ https://issues.apache.org/jira/browse/HDDS-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794909#comment-16794909 ] Tsz Wo Nicholas Sze commented on HDDS-699: -- Thank you, [~Sammi]. > Detect Ozone Network topology > - > > Key: HDDS-699 > URL: https://issues.apache.org/jira/browse/HDDS-699 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Assignee: Sammi Chen >Priority: Major > Fix For: 0.5.0 > > Attachments: HDDS-699.00.patch, HDDS-699.01.patch, HDDS-699.02.patch, > HDDS-699.03.patch, HDDS-699.04.patch, HDDS-699.05.patch, HDDS-699.06.patch, > HDDS-699.07.patch, HDDS-699.08.patch, HDDS-699.09.patch > > > Traditionally this has been implemented in Hadoop via script or customizable > java class. One thing we want to add here is the flexible multi-level support > instead of fixed levels like DC/Rack/NG/Node. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-699) Detect Ozone Network topology
[ https://issues.apache.org/jira/browse/HDDS-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794683#comment-16794683 ] Tsz Wo Nicholas Sze commented on HDDS-699: -- +1 the 09 patch looks good. The test failures do not seem related. Let me start another Jenkins build.
[jira] [Commented] (HDDS-699) Detect Ozone Network topology
[ https://issues.apache.org/jira/browse/HDDS-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793352#comment-16793352 ] Tsz Wo Nicholas Sze commented on HDDS-699: --
> ... In testConcurrentAccess, all the individual tests in the class are scheduled to run concurrently in different threads to test the robustness of the NetworkTopologyImpl. ...
It seems that testConcurrentAccess does not work well. The test does not fail even if there is an AssertionError or IllegalArgumentException. If it never fails, how could we tell if there is a bug? How about removing it for the moment? We may add it back when we have a better design later on.
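The reason the test can never fail: an AssertionError thrown in a spawned thread kills only that thread, so JUnit's main thread sees nothing. A concurrent test has to collect worker failures explicitly, for example via an uncaught-exception handler (hypothetical standalone sketch, not the HDDS test):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Why assertions in spawned threads don't fail the spawning thread,
// and one way a concurrent test could actually collect them.
public class ThreadFailures {
    public static List<Throwable> runAndCollect(Runnable task)
            throws InterruptedException {
        List<Throwable> failures = new CopyOnWriteArrayList<>();
        Thread t = new Thread(task);
        t.setUncaughtExceptionHandler((thread, e) -> failures.add(e));
        t.start();
        t.join();
        return failures;   // the test must check this, or failures vanish
    }
    public static void main(String[] args) throws InterruptedException {
        List<Throwable> failures = runAndCollect(() -> {
            throw new AssertionError("silently lost without a handler");
        });
        System.out.println(failures.size());   // 1
    }
}
```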
[jira] [Comment Edited] (HDDS-699) Detect Ozone Network topology
[ https://issues.apache.org/jira/browse/HDDS-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793053#comment-16793053 ] Tsz Wo Nicholas Sze edited comment on HDDS-699 at 3/14/19 8:43 PM: --- Tried to run TestNetworkTopologyImpl locally. There are a lot of exceptions and errors although the tests do not fail. {code:java} org.junit.internal.AssumptionViolatedException: got: , expected: is at org.junit.Assume.assumeThat(Assume.java:95) at org.junit.Assume.assumeTrue(Assume.java:41) at org.apache.hadoop.hdds.scm.net.TestNetworkTopologyImpl.testAncestor(TestNetworkTopologyImpl.java:238) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) Exception in thread "Thread-19" org.junit.internal.AssumptionViolatedException: got: , expected: is at org.junit.Assume.assumeThat(Assume.java:95) at org.junit.Assume.assumeTrue(Assume.java:41) at org.apache.hadoop.hdds.scm.net.TestNetworkTopologyImpl.testAncestor(TestNetworkTopologyImpl.java:238) at org.apache.hadoop.hdds.scm.net.TestNetworkTopologyImpl.lambda$testConcurrentAccess$9(TestNetworkTopologyImpl.java:853) at java.lang.Thread.run(Thread.java:748) Exception in thread "Thread-18" java.lang.IllegalArgumentException: affinityNode /1.1.1.1 doesn't have ancestor on generation 1 at 
org.apache.hadoop.hdds.scm.net.NetworkTopologyImpl.chooseNodeInternal(NetworkTopologyImpl.java:498) at org.apache.hadoop.hdds.scm.net.NetworkTopologyImpl.getNode(NetworkTopologyImpl.java:481) at org.apache.hadoop.hdds.scm.net.TestNetworkTopologyImpl.pickNodes(TestNetworkTopologyImpl.java:972) at org.apache.hadoop.hdds.scm.net.TestNetworkTopologyImpl.testChooseRandomWithAffinityNode(TestNetworkTopologyImpl.java:596) at org.apache.hadoop.hdds.scm.net.TestNetworkTopologyImpl.lambda$testConcurrentAccess$8(TestNetworkTopologyImpl.java:849) at java.lang.Thread.run(Thread.java:748) Exception in thread "Thread-45" java.lang.IllegalArgumentException: Affinity node /r1/1.1.1.1 is not a member of topology at org.apache.hadoop.hdds.scm.net.NetworkTopologyImpl.checkAffinityNode(NetworkTopologyImpl.java:767) at org.apache.hadoop.hdds.scm.net.NetworkTopologyImpl.getNode(NetworkTopologyImpl.java:476) at org.apache.hadoop.hdds.scm.net.TestNetworkTopologyImpl.pickNodes(TestNetworkTopologyImpl.java:972) at org.apache.hadoop.hdds.scm.net.TestNetworkTopologyImpl.testChooseRandomWithAffinityNode(TestNetworkTopologyImpl.java:596) at org.apache.hadoop.hdds.scm.net.TestNetworkTopologyImpl.lambda$testConcurrentAccess$8(TestNetworkTopologyImpl.java:849) at java.lang.Thread.run(Thread.java:748) Exception in thread "Thread-41" java.lang.AssertionError at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.hdds.scm.net.TestNetworkTopologyImpl.testChooseRandomExcludedNode(TestNetworkTopologyImpl.java:454) at org.apache.hadoop.hdds.scm.net.TestNetworkTopologyImpl.lambda$testConcurrentAccess$4(TestNetworkTopologyImpl.java:833) at java.lang.Thread.run(Thread.java:748) Exception in thread "Thread-72" java.lang.IllegalArgumentException: Affinity node /d1/r1/1.1.1.1 is not a member of topology at org.apache.hadoop.hdds.scm.net.NetworkTopologyImpl.checkAffinityNode(NetworkTopologyImpl.java:767) at 
org.apache.hadoop.hdds.scm.net.NetworkTopologyImpl.getNode(NetworkTopologyImpl.java:476) at org.apache.hadoop.hdds.scm.net.TestNetworkTopologyImpl.pickNodes(TestNetworkTopologyImpl.java:972) at org.apache.hadoop.hdds.scm.net.TestNetworkTopologyImpl.testChooseRandomWithAffinityNode(TestNetworkTopologyImpl.java:596) at org.apache.hadoop.hdds.scm.net.TestNetworkTopologyImpl.lambda$testConcurrentAccess$8(TestNetworkTopologyImpl.java:849) at java.lang.Thread.run(Thread.java:748) Exception in thread "Thread-76" java.lang.AssertionError: reader:/d1/r1/1.1.1.1,node1:/d2/r3/6.6.6.6,node2:/d1/r1/2.2.2.2,cost1:6,cost2:2
[jira] [Commented] (HDDS-699) Detect Ozone Network topology
[ https://issues.apache.org/jira/browse/HDDS-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793055#comment-16793055 ] Tsz Wo Nicholas Sze commented on HDDS-699: -- The 08 patch looks good other than the TestNetworkTopologyImpl problems. +1
[jira] [Commented] (HDDS-699) Detect Ozone Network topology
[ https://issues.apache.org/jira/browse/HDDS-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793053#comment-16793053 ] Tsz Wo Nicholas Sze commented on HDDS-699: -- Tried to run TestNetworkTopologyImpl locally. There are a lot of exceptions and errors although the tests does not fail. {code} /Library/Java/JavaVirtualMachines/jdk1.8.0_191.jdk/Contents/Home/bin/java -ea -Dhadoop.log.dir=/Users/szetszwo/hadoop/h1-readonly/hadoop-hdds/common/target/log -Dhadoop.tmp.dir=/Users/szetszwo/hadoop/h1-readonly/hadoop-hdds/common/target/tmp -Dtest.build.dir=/Users/szetszwo/hadoop/h1-readonly/hadoop-hdds/common/target/test-dir -Dtest.build.data=/Users/szetszwo/hadoop/h1-readonly/hadoop-hdds/common/target/test-dir -Dtest.build.classes=/Users/szetszwo/hadoop/h1-readonly/hadoop-hdds/common/target/test-classes -Djava.net.preferIPv4Stack=true -Djava.security.krb5.conf=/Users/szetszwo/hadoop/h1-readonly/hadoop-hdds/common/target/test-classes/krb5.conf -Djava.security.egd=file:///dev/urandom -Xmx2048m -XX:+HeapDumpOnOutOfMemoryError -Didea.test.cyclic.buffer.size=1048576 "-javaagent:/Applications/IntelliJ IDEA.app/Contents/lib/idea_rt.jar=57589:/Applications/IntelliJ IDEA.app/Contents/bin" -Dfile.encoding=UTF-8 -classpath "/Applications/IntelliJ IDEA.app/Contents/lib/idea_rt.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/junit/lib/junit-rt.jar:/Applications/IntelliJ
[jira] [Commented] (HDDS-699) Detect Ozone Network topology
[ https://issues.apache.org/jira/browse/HDDS-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791100#comment-16791100 ] Tsz Wo Nicholas Sze commented on HDDS-699: -- Some final comments:
- There is some code duplication in NetworkTopologyImpl:
-* getNode(..) and one of the chooseRandom(..) methods are mostly the same. We should refactor them.
-* The different versions of chooseRandom(..) should just call the most general chooseRandom(..) method.
- Some items in [this comment|https://issues.apache.org/jira/browse/HDDS-699?focusedCommentId=16786253=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16786253] are not yet addressed.
- There are a few checkstyle warnings.
Thanks!
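The delegation style suggested above, where narrower overloads funnel into the most general one so the selection logic lives in exactly one place, can be sketched standalone (hypothetical Chooser class over plain strings, not the HDDS NetworkTopologyImpl):

```java
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;

// Overloads delegating to the most general form, as suggested for
// chooseRandom(..) (hypothetical standalone sketch).
public class Chooser {
    private final Random random = new Random();

    public String chooseRandom(List<String> nodes) {
        // narrower overload: just delegate with an empty exclusion set
        return chooseRandom(nodes, Collections.emptySet());
    }
    // Most general version: all other overloads funnel into this one,
    // so the exclusion logic exists in exactly one place.
    public String chooseRandom(List<String> nodes, Collection<String> excluded) {
        List<String> candidates = nodes.stream()
            .filter(n -> !excluded.contains(n))
            .collect(Collectors.toList());
        return candidates.isEmpty() ? null
            : candidates.get(random.nextInt(candidates.size()));
    }
    public static void main(String[] args) {
        Chooser c = new Chooser();
        // only one candidate survives the exclusion, so the pick is fixed
        System.out.println(c.chooseRandom(
            java.util.Arrays.asList("n1", "n2"),
            Collections.singleton("n1")));   // n2
    }
}
```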
[jira] [Commented] (HDDS-699) Detect Ozone Network topology
[ https://issues.apache.org/jira/browse/HDDS-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791097#comment-16791097 ] Tsz Wo Nicholas Sze commented on HDDS-699: -- Just found that NetUtils.removeDuplicate has already taken care of my previous comment. Thanks.
[jira] [Commented] (HDDS-699) Detect Ozone Network topology
[ https://issues.apache.org/jira/browse/HDDS-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791051#comment-16791051 ] Tsz Wo Nicholas Sze commented on HDDS-699: -- Thanks [~Sammi] for the 06 patch. The example in the javadoc of getLeaf is very useful!
{code}
 *                  root
 *                /      \
 *              /          \
 *            dc1           dc2
 *           /   \         /   \
 *        rack1 rack2   rack1 rack2
 *        /  \   /  \   /  \   /  \
 *       n1  n2 n3  n4 n5  n6 n7  n8
{code}
Consider the following two sets of input:
# leafIndex = 2, excludedScope = /dc1/rack1, excludedNodes = \{/dc1/rack1/n1}, ancestorGen = 0
# leafIndex = 2, excludedScope = /dc1/rack1, excludedNodes = \{/dc1/rack1/n1}, ancestorGen = 2
In #1, the entire /dc1/rack1 is excluded so that the output is n4. In #2, the entire /dc1 is excluded so that the output is n6. Therefore, we should calculate the overlap and remove it, if there is any.
{code}
  public Node getLeaf(int leafIndex, String excludedScope,
      Collection<Node> excludedNodes, int ancestorGen) {
    ...
    // build an ancestor(children) to exclude node count map
    Map<Node, Integer> countMap =
        getAncestorCountMap(excludedNodes, ancestorGen, currentGen);

    // check overlap between excludedScope and countMap
    if (excludedScope != null) {
      for (Iterator<Map.Entry<Node, Integer>> i =
          countMap.entrySet().iterator(); i.hasNext(); ) {
        final Map.Entry<Node, Integer> entry = i.next();
        final String path = entry.getKey().getNetworkFullPath();
        if (path.startsWith(excludedScope)) {
          // this node is a part of the excludedScope
          i.remove();
        } else if (excludedScope.startsWith(path)) {
          // the excludedScope is already excluded by this node
          excludedScope = null;
        }
      }
    }

    // nodes covered by excluded scope
    int excludedNodeCount = getExcludedScopeNodeCount(excludedScope);
    ...
  }
{code}
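The overlap rule in the comment above can be exercised over plain path strings (hypothetical standalone sketch of the pruning step, not the HDDS class): an excluded ancestor that lies inside the excluded scope is redundant and must be dropped, and a scope already covered by an excluded ancestor can itself be dropped.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Removing the overlap between excludedScope and the per-ancestor
// exclude-count map, using path-prefix checks (standalone sketch).
public class OverlapPrune {
    static String prune(Map<String, Integer> countMap, String excludedScope) {
        if (excludedScope == null) {
            return null;
        }
        for (Iterator<Map.Entry<String, Integer>> i =
                countMap.entrySet().iterator(); i.hasNext(); ) {
            String path = i.next().getKey();
            if (path.startsWith(excludedScope)) {
                i.remove();          // this ancestor lies inside the scope
            } else if (excludedScope.startsWith(path)) {
                return null;         // scope already covered by this ancestor
            }
        }
        return excludedScope;
    }
    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("/dc1/rack1", 1);   // inside the scope: dropped
        counts.put("/dc2/rack1", 1);   // disjoint: kept
        String scope = prune(counts, "/dc1/rack1");
        System.out.println(scope + " " + counts.size());   // /dc1/rack1 1
    }
}
```

Without this pruning, the leaves under /dc1/rack1 would be subtracted twice, once for the scope and once for the excluded ancestor, skewing the leafIndex arithmetic.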
[jira] [Commented] (HDDS-699) Detect Ozone Network topology
[ https://issues.apache.org/jira/browse/HDDS-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16787126#comment-16787126 ] Tsz Wo Nicholas Sze commented on HDDS-699: --
> I think here you mean getAncestorCounts. ...
I do mean getAncestorNodeMap. It only returns one of the nodes of an ancestor, which is a bug, since two excluded nodes under the same ancestor may or may not overlap, depending on the numLevelToExclude (ancestorGen). In the 05 patch, getAncestorNodeMap and getAncestorCountMap should be combined into one method, as shown in [getAncestorCounts in this comment|https://issues.apache.org/jira/browse/HDDS-699?focusedCommentId=16786248=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16786248].
> would like to keep current behavior to fit its function name. Otherwise people may have questions.
It is common to define a node to be its own ancestor. For example, see https://en.wikipedia.org/wiki/Lowest_common_ancestor
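The convention argued for here, that a node is its own generation-0 ancestor, is easy to demonstrate with a minimal parent-pointer node (hypothetical standalone class, not the HDDS InnerNodeImpl):

```java
// A node is its own generation-0 ancestor (standalone sketch).
public class AncestorDemo {
    static class Node {
        final Node parent;
        final String name;
        Node(String name, Node parent) {
            this.name = name;
            this.parent = parent;
        }
        Node getAncestor(int generation) {
            Node current = this;
            while (generation > 0 && current != null) {
                current = current.parent;   // walk one generation up
                generation--;
            }
            return current;                 // generation 0 returns this
        }
    }
    public static void main(String[] args) {
        Node root = new Node("/", null);
        Node rack = new Node("rack1", root);
        Node leaf = new Node("n1", rack);
        System.out.println(leaf.getAncestor(0) == leaf);   // true
        System.out.println(leaf.getAncestor(2) == root);   // true
        System.out.println(leaf.getAncestor(5) == null);   // true: past the root
    }
}
```

With this convention, callers like getAncestorCounts can treat levelToCount = 0 uniformly instead of special-casing the node itself.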
[jira] [Comment Edited] (HDDS-699) Detect Ozone Network topology
[ https://issues.apache.org/jira/browse/HDDS-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786248#comment-16786248 ] Tsz Wo Nicholas Sze edited comment on HDDS-699 at 3/7/19 12:48 AM: --- Thanks [~Sammi]. Just found that the logic for getLeaf(int leafIndex, String excludedScope, Collection<Node> excludedNodes, int ancestorGen) is quite complicated.
- First of all, let's consistently use "level" instead of "generation" in the code. In the getLeaf methods, let's rename ancestorGen to numLevelToExclude.
- excludedScope and excludedNodes may overlap so that we should filter out the overlapped nodes.
{code}
  public Node getLeaf(int leafIndex, String excludedScope,
      Collection<Node> excludedNodes, int numLevelToExclude) {
    Preconditions.checkArgument(leafIndex >= 0 && numLevelToExclude >= 0);
    if (excludedScope != null) {
      excludedNodes = excludedNodes.stream()
          .filter(n -> !n.getNetworkFullPath().startsWith(excludedScope))
          .collect(Collectors.toList());
    }
    return getLeafRecursively(leafIndex, excludedScope, excludedNodes,
        numLevelToExclude); // see below for getLeafRecursively
  }
{code}
- Let's change getAncestor(0) to return this. It will simplify the code.
{code}
  public Node getAncestor(int generation) {
    Preconditions.checkArgument(generation >= 0);
    Node current = this;
    while (generation > 0 && current != null) {
      current = current.getParent();
      generation--;
    }
    return current;
  }
{code}
- Then, we need to take care of the excluded node counting with numLevelToExclude. getAncestorNodeMap seems incorrect since it does not consider numLevelToExclude. When considering numLevelToExclude, two excluded nodes under the same ancestor may or may not overlap. We should filter out the overlap first as below.
{code}
  /**
   * @return a map: ancestor-node -> node-count, where
   *         the ancestor-node corresponds to the levelToReturn, and
   *         the node-count corresponds to the levelToCount.
   */
  private Map<Node, Integer> getAncestorCounts(Collection<Node> nodes,
      int levelToReturn, int levelToCount) {
    Preconditions.checkState(levelToReturn >= levelToCount);
    if (nodes == null || nodes.size() == 0) {
      return Collections.emptyMap();
    }
    // map: levelToCount -> levelToReturn
    final Map<Node, Node> map = new HashMap<>();
    for (Node node : nodes) {
      final Node toCount = node.getAncestor(levelToCount);
      final Node toReturn = node.getAncestor(levelToReturn - levelToCount);
      map.putIfAbsent(toCount, toReturn);
    }
    // map: levelToReturn -> counts
    final Map<Node, Integer> counts = new HashMap<>();
    for (Map.Entry<Node, Node> entry : map.entrySet()) {
      final Node toCount = entry.getKey();
      final Node toReturn = entry.getValue();
      counts.compute(toReturn,
          (key, n) -> (n == null ? 0 : n) + toCount.getNumOfLeaves());
    }
    return counts;
  }
{code}
- Finally, here is getLeafRecursively(..). The other getLeaf methods can be removed.
{code}
  private Node getLeafRecursively(int leafIndex, String excludedScope,
      Collection<Node> excludedNodes, int numLevelToExclude) {
    if (isLeafParent()) {
      return getLeafOnLeafParent(leafIndex, excludedScope, excludedNodes);
    }

    final int levelToReturn =
        NodeSchemaManager.getInstance().getMaxLevel() - getLevel() - 1;
    final Map<Node, Integer> excludedAncestors = getAncestorCounts(
        excludedNodes, levelToReturn, numLevelToExclude);
    final int excludedScopeCount = getScopeNodeCount(excludedScope);

    for (Node node : childrenMap.values()) {
      int leafCount = node.getNumOfLeaves();
      if (excludedScope != null
          && excludedScope.startsWith(node.getNetworkFullPath())) {
        leafCount -= excludedScopeCount;
      }
      leafCount -= excludedAncestors.get(node);
      if (leafCount > 0) {
        if (leafIndex < leafCount) {
          return ((InnerNodeImpl) node).getLeafRecursively(
              leafIndex, excludedScope, excludedNodes, numLevelToExclude);
        }
        leafIndex -= leafCount;
      }
    }
    return null;
  }
{code}
- First of all, let's consistently use "level" instead of "generation" in the code. In the getLeaf methods, let's rename ancestorGen to numLevelToExclude. - excludedScope and excludedNodes may overlap so that we should filter out the overlapped nodes. {code} public Node getLeaf(int leafIndex, String excludedScope, Collection excludedNodes, int numLevelToExclude) { Preconditions.checkArgument(leafIndex >= 0 && numLevelToExclude >= 0); if (excludedScope != null) { excludedNodes = excludedNodes.stream() .filter(n -> !n.getNetworkFullPath().startsWith(excludedScope)) .collect(Collectors.toList()); }
[jira] [Comment Edited] (HDDS-699) Detect Ozone Network topology
[ https://issues.apache.org/jira/browse/HDDS-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786248#comment-16786248 ] Tsz Wo Nicholas Sze edited comment on HDDS-699 at 3/7/19 12:50 AM: ---

Thanks [~Sammi]. I just found that the logic for getLeaf(int leafIndex, String excludedScope, Collection excludedNodes, int ancestorGen) is quite complicated.
- First of all, let's consistently use "level" instead of "generation" in the code. In the getLeaf methods, let's rename ancestorGen to numLevelToExclude.
- excludedScope and excludedNodes may overlap, so we should filter out the overlapping nodes.
{code}
public Node getLeaf(int leafIndex, String excludedScope,
    Collection<Node> excludedNodes, int numLevelToExclude) {
  Preconditions.checkArgument(leafIndex >= 0 && numLevelToExclude >= 0);
  if (excludedScope != null) {
    excludedNodes = excludedNodes.stream()
        .filter(n -> !n.getNetworkFullPath().startsWith(excludedScope))
        .collect(Collectors.toList());
  }
  return getLeafRecursively(leafIndex, excludedScope, excludedNodes,
      numLevelToExclude); // see below for getLeafRecursively
}
{code}
- Let's change getAncestor(0) to return this. It will simplify the code.
{code}
public Node getAncestor(int generation) {
  Preconditions.checkArgument(generation >= 0);
  Node current = this;
  while (generation > 0 && current != null) {
    current = current.getParent();
    generation--;
  }
  return current;
}
{code}
- Then, we need to take care of the excluded node counting with numLevelToExclude. getAncestorNodeMap seems incorrect since it does not consider numLevelToExclude. When considering numLevelToExclude, two excluded nodes under the same ancestor may or may not overlap. We should filter out the overlap first, as below.
{code}
/**
 * @return a map: ancestor-node -> node-count, where
 *         the ancestor-node corresponds to the levelToReturn, and
 *         the node-count corresponds to the levelToCount.
 */
private Map<Node, Integer> getAncestorCounts(Collection<Node> nodes,
    int levelToReturn, int levelToCount) {
  Preconditions.checkState(levelToReturn >= levelToCount);
  if (nodes == null || nodes.size() == 0) {
    return Collections.emptyMap();
  }
  // map: levelToCount -> levelToReturn
  final Map<Node, Node> map = new HashMap<>();
  for (Node node : nodes) {
    final Node toCount = node.getAncestor(levelToCount);
    final Node toReturn = toCount.getAncestor(levelToReturn - levelToCount);
    map.putIfAbsent(toCount, toReturn);
  }
  // map: levelToReturn -> counts
  final Map<Node, Integer> counts = new HashMap<>();
  for (Map.Entry<Node, Node> entry : map.entrySet()) {
    final Node toCount = entry.getKey();
    final Node toReturn = entry.getValue();
    counts.compute(toReturn,
        (key, n) -> (n == null ? 0 : n) + toCount.getNumOfLeaves());
  }
  return counts;
}
{code}
- Finally, here is getLeafRecursively(..). The other getLeaf methods can be removed.
{code}
private Node getLeafRecursively(int leafIndex, String excludedScope,
    Collection<Node> excludedNodes, int numLevelToExclude) {
  if (isLeafParent()) {
    return getLeafOnLeafParent(leafIndex, excludedScope, excludedNodes);
  }
  final int levelToReturn = NodeSchemaManager.getInstance().getMaxLevel()
      - getLevel() - 1;
  final Map<Node, Integer> excludedAncestors = getAncestorCounts(
      excludedNodes, levelToReturn, numLevelToExclude);
  final int excludedScopeCount = getScopeNodeCount(excludedScope);
  for (Node node : childrenMap.values()) {
    int leafCount = node.getNumOfLeaves();
    if (excludedScope != null
        && excludedScope.startsWith(node.getNetworkFullPath())) {
      leafCount -= excludedScopeCount;
    }
    leafCount -= excludedAncestors.get(node);
    if (leafCount > 0) {
      if (leafIndex < leafCount) {
        return ((InnerNodeImpl) node).getLeafRecursively(
            leafIndex, excludedScope, excludedNodes, numLevelToExclude);
      }
      leafIndex -= leafCount;
    }
  }
  return null;
}
{code}
[jira] [Commented] (HDDS-699) Detect Ozone Network topology
[ https://issues.apache.org/jira/browse/HDDS-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786253#comment-16786253 ] Tsz Wo Nicholas Sze commented on HDDS-699: --

Some other comments:
- Move getNumOfLeaves() from InnerNode to Node. It returns 1 for a non-InnerNode.
- In Node, getParent() should return InnerNode instead of Node.
- Add a new field fullPath to NodeImpl. getNetworkFullPath() then just returns it, in order to avoid constructing the string many times.
- In InnerNodeImpl, remove getNodes(int level) and getChildren(). They are unused.
- In InnerNodeImpl.isParent(node), it seems wrong to return true when node.getNetworkLocation().equals(NetConstants.ROOT).
- In InnerNodeImpl.getNode(String loc), we should first check if loc is absolute and then return null if the prefix does not match.
{code}
// InnerNodeImpl.getNode(String loc),
if (loc.startsWith(PATH_SEPARATOR_STR)) {
  // remove this node's location from loc
  if (loc.startsWith(this.getNetworkFullPath())) {
    loc = loc.substring(this.getNetworkFullPath().length());
  } else {
    return null;
  }
}
{code}
- Add \@Override for the overridden methods.
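The fullPath-caching suggestion above can be sketched as follows. This is a minimal illustration, not the patch's NodeImpl; the "/" separator and the CachedPathNode name are assumptions for the example:

```java
// Hypothetical sketch of caching the full network path at construction
// time so getNetworkFullPath() does not rebuild the string on every call.
class CachedPathNode {
    private final String name;
    private final String fullPath; // computed once, never rebuilt

    CachedPathNode(String name, String location) {
        this.name = name;
        // assumption: "/" is the path separator, as with PATH_SEPARATOR_STR
        this.fullPath = location.endsWith("/")
            ? location + name
            : location + "/" + name;
    }

    // Returns the cached value instead of concatenating strings each call.
    String getNetworkFullPath() { return fullPath; }

    String getName() { return name; }
}
```

Since both fields are final, the cached path can never go stale, which is exactly why it is safe to precompute it.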
[jira] [Commented] (HDDS-699) Detect Ozone Network topology
[ https://issues.apache.org/jira/browse/HDDS-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786248#comment-16786248 ] Tsz Wo Nicholas Sze commented on HDDS-699: --

Thanks [~Sammi]. I just found that the logic for getLeaf(int leafIndex, String excludedScope, Collection excludedNodes, int ancestorGen) is quite complicated.
- First of all, let's consistently use "level" instead of "generation" in the code. In the getLeaf methods, let's rename ancestorGen to numLevelToExclude.
- excludedScope and excludedNodes may overlap, so we should filter out the overlapping nodes.
{code}
public Node getLeaf(int leafIndex, String excludedScope,
    Collection<Node> excludedNodes, int numLevelToExclude) {
  Preconditions.checkArgument(leafIndex >= 0 && numLevelToExclude >= 0);
  if (excludedScope != null) {
    excludedNodes = excludedNodes.stream()
        .filter(n -> !n.getNetworkFullPath().startsWith(excludedScope))
        .collect(Collectors.toList());
  }
  return getLeafRecursively(leafIndex, excludedScope, excludedNodes,
      numLevelToExclude); // see below for getLeafRecursively
}
{code}
- Let's change getAncestor(0) to return this. It will simplify the code.
{code}
public Node getAncestor(int generation) {
  Preconditions.checkArgument(generation >= 0);
  Node current = this;
  while (generation > 0 && current != null) {
    current = current.getParent();
    generation--;
  }
  return current;
}
{code}
- Then, we need to take care of the excluded node counting with numLevelToExclude. getAncestorNodeMap seems incorrect since it does not consider numLevelToExclude. When considering numLevelToExclude, two excluded nodes under the same ancestor may or may not overlap. We should filter out the overlap first, as below.
{code}
/**
 * @return a map: ancestor-node -> node-count, where
 *         the ancestor-node corresponds to the levelToReturn, and
 *         the node-count corresponds to the levelToCount.
 */
private Map<Node, Integer> getAncestorCounts(Collection<Node> nodes,
    int levelToReturn, int levelToCount) {
  Preconditions.checkState(levelToReturn >= levelToCount);
  if (nodes == null || nodes.size() == 0) {
    return Collections.emptyMap();
  }
  // map: levelToCount -> levelToReturn
  final Map<Node, Node> map = new HashMap<>();
  for (Node node : nodes) {
    final Node toReturn = node.getAncestor(levelToReturn);
    final Node toCount = levelToCount == levelToReturn
        ? toReturn : node.getAncestor(levelToCount);
    map.putIfAbsent(toCount, toReturn);
  }
  // map: levelToReturn -> counts
  final Map<Node, Integer> counts = new HashMap<>();
  for (Map.Entry<Node, Node> entry : map.entrySet()) {
    final Node toCount = entry.getKey();
    final Node toReturn = entry.getValue();
    counts.compute(toReturn,
        (key, n) -> (n == null ? 0 : n) + toCount.getNumOfLeaves());
  }
  return counts;
}
{code}
- Finally, here is getLeafRecursively(..). The other getLeaf methods can be removed.
{code}
private Node getLeafRecursively(int leafIndex, String excludedScope,
    Collection<Node> excludedNodes, int numLevelToExclude) {
  if (isLeafParent()) {
    return getLeafOnLeafParent(leafIndex, excludedScope, excludedNodes);
  }
  final int levelToReturn = NodeSchemaManager.getInstance().getMaxLevel()
      - getLevel() - 1;
  final Map<Node, Integer> excludedAncestors = getAncestorCounts(
      excludedNodes, levelToReturn, numLevelToExclude);
  final int excludedScopeCount = getScopeNodeCount(excludedScope);
  for (Node node : childrenMap.values()) {
    int leafCount = node.getNumOfLeaves();
    if (excludedScope != null
        && excludedScope.startsWith(node.getNetworkFullPath())) {
      leafCount -= excludedScopeCount;
    }
    leafCount -= excludedAncestors.get(node);
    if (leafCount > 0) {
      if (leafIndex < leafCount) {
        return ((InnerNodeImpl) node).getLeafRecursively(
            leafIndex, excludedScope, excludedNodes, numLevelToExclude);
      }
      leafIndex -= leafCount;
    }
  }
  return null;
}
{code}
[jira] [Commented] (HDDS-699) Detect Ozone Network topology
[ https://issues.apache.org/jira/browse/HDDS-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784986#comment-16784986 ] Tsz Wo Nicholas Sze commented on HDDS-699: --

Some initial comments on the 04 patch. (I will continue reviewing it.)
- Remove setCost from Node since it is never used.
- In NodeImpl, change name, location and cost to final. We should remove the set(..) method, which is only used in constructors, and refactor the code as below.
{code}
// host:port#
private final String name;
// string representation of this node's location
private final String location;
// the cost to go through this node
private final int cost;
// which level of the tree the node resides, starting from 1 for the root
private int level;
// node's parent
private Node parent;

/**
 * Construct a node from its name and its location.
 * @param name this node's name (can be null, must not contain
 *             {@link NetConstants#PATH_SEPARATOR})
 * @param location this node's location
 */
public NodeImpl(String name, String location, int cost) {
  if (name != null && name.contains(PATH_SEPARATOR_STR)) {
    throw new IllegalArgumentException(
        "Network location name:" + name + " should not contain "
        + PATH_SEPARATOR_STR);
  }
  this.name = (name == null) ? ROOT : name;
  this.location = NetUtils.normalize(location);
  this.cost = cost;
}

/**
 * Construct a node from its name and its location.
 * @param name this node's name (can be null, must not contain
 *             {@link NetConstants#PATH_SEPARATOR})
 * @param location this node's location
 * @param parent this node's parent node
 * @param level this node's level in the tree
 * @param cost this node's cost if traffic goes through it
 */
public NodeImpl(String name, String location, Node parent, int level,
    int cost) {
  this(name, location, cost);
  this.parent = parent;
  this.level = level;
}

// Note that the other constructors are removed.
{code}
- In InnerNode, remove the following methods and change them to private in InnerNodeImpl. They are only used internally in InnerNodeImpl.
-* getChildren(),
-* getNodes(int level),
-* getLeaf(int leafIndex, Collection excludedNodes),
-* getLeaf(int leafIndex, String excludedScope),
-* getLeaf(int leafIndex, Collection excludedNodes, int ancestorGen),
-* getLeaf(int leafIndex, String excludedScope, Collection excludedNodes)
-* isParent(Node n),
-* isLeafParent().
- In InnerNodeImpl,
-* rename getAncestorsI to getAncestorsCounts
-* rename getAncestorsII to getAncestorNodes
- NetworkTopologyImpl.toString should acquire the readLock.
[jira] [Commented] (HDDS-699) Detect Ozone Network topology
[ https://issues.apache.org/jira/browse/HDDS-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784946#comment-16784946 ] Tsz Wo Nicholas Sze commented on HDDS-699: --

> cp: cannot stat '/testptch/hadoop/hadoop-ozone/objectstore-service/target/hadoop-ozone-objectstore-service-0.4.0-SNAPSHOT-plugin.jar': No such file or directory

It has nothing to do with the patch here since it also fails without the patch.
bq. -1 compile 19m 1s root in trunk failed.
[jira] [Commented] (HDDS-699) Detect Ozone Network topology
[ https://issues.apache.org/jira/browse/HDDS-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783831#comment-16783831 ] Tsz Wo Nicholas Sze commented on HDDS-699: --

[~Sammi], I have checked the 03 patch. It looks good in general! Please fix the findbugs and other warnings. I will check the new patch in more detail. Thanks a lot.
[jira] [Commented] (HDDS-699) Detect Ozone Network topology
[ https://issues.apache.org/jira/browse/HDDS-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780842#comment-16780842 ] Tsz Wo Nicholas Sze commented on HDDS-699: --

> ... I think it's a trade off between performance and accurancy. Using a single RW-lock at network topology level has better accurancy while lower performance. The question is whether accurancy can be sacrified in some cases without big impact to other modules.

That's a good point! It is perfectly fine if we provide a high-performance but sometimes inaccurate API. In such a case, we should specify the behavior carefully.
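The trade-off discussed above can be made concrete with a small sketch. TinyTopology below is hypothetical (not the patch's NetworkTopologyImpl): the locked read is always accurate, while the unlocked read is the kind of "high performance but possibly stale" API whose behavior would have to be specified carefully:

```java
// Sketch of the accuracy/performance trade-off: reads under the read lock
// see a consistent state; an unlocked read may observe a stale or
// mid-update value under concurrency, which must be documented.
class TinyTopology {
    private final java.util.concurrent.locks.ReadWriteLock lock =
        new java.util.concurrent.locks.ReentrantReadWriteLock();
    private final java.util.List<String> leaves = new java.util.ArrayList<>();

    void add(String leaf) {
        lock.writeLock().lock();
        try {
            leaves.add(leaf);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Accurate: the read lock excludes concurrent writers.
    int numLeaves() {
        lock.readLock().lock();
        try {
            return leaves.size();
        } finally {
            lock.readLock().unlock();
        }
    }

    // Fast but possibly stale: no lock, so a concurrent writer may be
    // mid-update when this reads. Single-threaded it matches numLeaves().
    int approxNumLeaves() {
        return leaves.size();
    }
}
```

In a single-threaded caller the two methods agree; the divergence only appears under concurrent mutation, which is exactly the case the API contract needs to spell out.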
[jira] [Resolved] (HDDS-451) PutKey failed due to error "Rejecting write chunk request. Chunk overwrite without explicit request"
[ https://issues.apache.org/jira/browse/HDDS-451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze resolved HDDS-451. -- Resolution: Cannot Reproduce Resolving as "Cannot Reproduce". > PutKey failed due to error "Rejecting write chunk request. Chunk overwrite > without explicit request" > > > Key: HDDS-451 > URL: https://issues.apache.org/jira/browse/HDDS-451 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Affects Versions: 0.2.1 >Reporter: Nilotpal Nandi >Assignee: Shashikant Banerjee >Priority: Blocker > Labels: alpha2 > Attachments: all-node-ozone-logs-1536841590.tar.gz > > > steps taken : > -- > # Ran Put Key command to write 50GB data. Put Key client operation failed > after 17 mins. > error seen ozone.log : > > > {code} > 2018-09-13 12:11:53,734 [ForkJoinPool.commonPool-worker-20] DEBUG > (ChunkManagerImpl.java:85) - writing > chunk:bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_1 > chunk stage:COMMIT_DATA chunk > file:/tmp/hadoop-root/dfs/data/hdds/de0a9e01-4a12-40e3-b567-51b9bd83248e/current/containerDir0/16/chunks/bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_1 > tmp chunk file > 2018-09-13 12:11:56,576 [pool-3-thread-60] DEBUG (ChunkManagerImpl.java:85) - > writing > chunk:bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2 > chunk stage:WRITE_DATA chunk > file:/tmp/hadoop-root/dfs/data/hdds/de0a9e01-4a12-40e3-b567-51b9bd83248e/current/containerDir0/16/chunks/bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2 > tmp chunk file > 2018-09-13 12:11:56,739 [ForkJoinPool.commonPool-worker-20] DEBUG > (ChunkManagerImpl.java:85) - writing > chunk:bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2 > chunk stage:COMMIT_DATA chunk > 
file:/tmp/hadoop-root/dfs/data/hdds/de0a9e01-4a12-40e3-b567-51b9bd83248e/current/containerDir0/16/chunks/bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2 > tmp chunk file > 2018-09-13 12:12:21,410 [Datanode State Machine Thread - 0] DEBUG > (DatanodeStateMachine.java:148) - Executing cycle Number : 206 > 2018-09-13 12:12:51,411 [Datanode State Machine Thread - 0] DEBUG > (DatanodeStateMachine.java:148) - Executing cycle Number : 207 > 2018-09-13 12:12:53,525 [BlockDeletingService#1] DEBUG > (TopNOrderedContainerDeletionChoosingPolicy.java:79) - Stop looking for next > container, there is no pending deletion block contained in remaining > containers. > 2018-09-13 12:12:55,048 [Datanode ReportManager Thread - 1] DEBUG > (ContainerSet.java:191) - Starting container report iteration. > 2018-09-13 12:13:02,626 [pool-3-thread-1] ERROR (ChunkUtils.java:244) - > Rejecting write chunk request. Chunk overwrite without explicit request. > ChunkInfo{chunkName='bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2, > offset=0, len=16777216} > 2018-09-13 12:13:03,035 [pool-3-thread-1] INFO (ContainerUtils.java:149) - > Operation: WriteChunk : Trace ID: 54834b29-603d-4ba9-9d68-0885215759d8 : > Message: Rejecting write chunk request. OverWrite flag > required.ChunkInfo{chunkName='bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2, > offset=0, len=16777216} : Result: OVERWRITE_FLAG_REQUIRED > 2018-09-13 12:13:03,037 [ForkJoinPool.commonPool-worker-11] ERROR > (ChunkUtils.java:244) - Rejecting write chunk request. Chunk overwrite > without explicit request. 
> ChunkInfo{chunkName='bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2, > offset=0, len=16777216} > 2018-09-13 12:13:03,037 [ForkJoinPool.commonPool-worker-11] INFO > (ContainerUtils.java:149) - Operation: WriteChunk : Trace ID: > 54834b29-603d-4ba9-9d68-0885215759d8 : Message: Rejecting write chunk > request. OverWrite flag > required.ChunkInfo{chunkName='bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2, > offset=0, len=16777216} : Result: OVERWRITE_FLAG_REQUIRED > > {code}
[jira] [Commented] (HDDS-372) There are three buffer copies in BlockOutputStream
[ https://issues.apache.org/jira/browse/HDDS-372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779903#comment-16779903 ] Tsz Wo Nicholas Sze commented on HDDS-372: --

[~shashikant], it is great that you have picked this up. Thanks.

> There are three buffer copies in BlockOutputStream
> --
>
> Key: HDDS-372
> URL: https://issues.apache.org/jira/browse/HDDS-372
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Components: Ozone Client
> Reporter: Tsz Wo Nicholas Sze
> Assignee: Shashikant Banerjee
> Priority: Major
> Attachments: HDDS-372.20180829.patch
>
> Currently, there are three buffer copies in ChunkOutputStream:
> # from byte[] to ByteBuffer,
> # from ByteBuffer to ByteString, and
> # from ByteString to ByteBuffer for checksum computation
> We should eliminate the ByteBuffer in the middle.
> For zero copy io, we should support WritableByteChannel instead of OutputStream. It won't be done in this JIRA.
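The idea behind eliminating the middle copy can be shown with a JDK-only sketch: wrap the caller's byte[] once (no copy) and feed that same buffer to the checksum, instead of allocating and filling a second buffer. This is an illustration of the principle, not the BlockOutputStream code; it uses CRC32 where Ozone uses its own checksum implementation:

```java
// Zero-copy view over the caller's array: ByteBuffer.wrap allocates no
// new storage, it just points at the existing byte[].
byte[] data = "hello chunk".getBytes(java.nio.charset.StandardCharsets.UTF_8);
java.nio.ByteBuffer view = java.nio.ByteBuffer.wrap(data);

// Checksum computed directly from the shared buffer. duplicate() shares
// the backing array and copies no bytes; it only clones position/limit,
// so consuming it leaves `view` untouched.
java.util.zip.CRC32 crc = new java.util.zip.CRC32();
crc.update(view.duplicate());
long checksum = crc.getValue();
```

The same bytes serve both the write path and the checksum, which is the copy the issue wants to remove.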
[jira] [Commented] (HDDS-451) PutKey failed due to error "Rejecting write chunk request. Chunk overwrite without explicit request"
[ https://issues.apache.org/jira/browse/HDDS-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779883#comment-16779883 ] Tsz Wo Nicholas Sze commented on HDDS-451: --

Let's resolve this then? The description and stack trace have become stale. We should file a new JIRA if we see a problem in the future.
[jira] [Commented] (HDDS-699) Detect Ozone Network topology
[ https://issues.apache.org/jira/browse/HDDS-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16777522#comment-16777522 ] Tsz Wo Nicholas Sze commented on HDDS-699: --

[~Sammi], thanks for the update and for adding many tests.

Using AtomicInteger and AtomicReference may not work since individual fields may be mutated during the computation of a method. For example, when calling sortByDistanceCost, all nodes are in the topology initially. Then, some of the nodes may be removed during the computation, so sortByDistanceCost may return incorrect results.

I have just realized that the patches here mostly add new code but do not yet change the existing code to use the new NetworkTopology. Then, the 01 patch is actually better. We may make NetworkTopology pluggable and improve it later. How about you address the previous comments from the 01 patch (using a single RW-lock)?
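The reason per-field atomics are insufficient can be sketched as follows: the whole sort needs one consistent snapshot, so the read lock must span the entire computation rather than individual field reads. LockedSorter, setCost and sortByCost are hypothetical names for illustration, not the patch's API:

```java
// Sketch: holding the read lock across the whole sort means no writer can
// remove or re-cost a node mid-computation, which is the consistency that
// individual AtomicInteger/AtomicReference fields cannot provide.
class LockedSorter {
    private final java.util.concurrent.locks.ReadWriteLock lock =
        new java.util.concurrent.locks.ReentrantReadWriteLock();
    private final java.util.Map<String, Integer> cost =
        new java.util.HashMap<>();

    void setCost(String node, int c) {
        lock.writeLock().lock();
        try {
            cost.put(node, c);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // The read lock spans the entire sort, so the cost map cannot change
    // between comparisons and the result is internally consistent.
    java.util.List<String> sortByCost(java.util.List<String> nodes) {
        lock.readLock().lock();
        try {
            java.util.List<String> sorted = new java.util.ArrayList<>(nodes);
            sorted.sort(java.util.Comparator.comparingInt(
                (String n) -> cost.getOrDefault(n, Integer.MAX_VALUE)));
            return sorted;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

With atomics instead of the lock, a writer could mutate costs between two comparisons, yielding an ordering that corresponds to no single state of the topology.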
[jira] [Commented] (HDDS-699) Detect Ozone Network topology
[ https://issues.apache.org/jira/browse/HDDS-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16755460#comment-16755460 ] Tsz Wo Nicholas Sze commented on HDDS-699: -- Two suggestions: # Move the root-level locking to the second level. The root node does not cache aggregate information, so a write lock at the root is needed only when the second level is changed. Each node in the second level maintains a lock to protect its subtree. # Separate the NetworkTopology interface and implementation so that replacing the implementation in the future becomes possible. #2 may not be easy. If we have #2, I am fine with any implementation today. > Detect Ozone Network topology > - > > Key: HDDS-699 > URL: https://issues.apache.org/jira/browse/HDDS-699 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Assignee: Sammi Chen >Priority: Major > Attachments: HDDS-699.00.patch, HDDS-699.01.patch > > > Traditionally this has been implemented in Hadoop via script or customizable > java class. One thing we want to add here is the flexible multi-level support > instead of fixed levels like DC/Rack/NG/Node. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
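Suggestion #1 (a lock per second-level subtree instead of one global lock) can be sketched as follows. The class and method names are hypothetical, not from the patch: each subtree guards its own children with a ReentrantReadWriteLock, so a write in one rack or datacenter never blocks reads in another.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Hypothetical sketch of per-subtree locking: each second-level node owns a
 * read-write lock that protects only its own children, not the whole tree.
 */
class SubtreeNode {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<String, SubtreeNode> children = new HashMap<>();

  void add(String name, SubtreeNode child) {
    lock.writeLock().lock();          // writers exclude other subtree access
    try {
      children.put(name, child);
    } finally {
      lock.writeLock().unlock();
    }
  }

  SubtreeNode get(String name) {
    lock.readLock().lock();           // readers of this subtree run in parallel
    try {
      return children.get(name);
    } finally {
      lock.readLock().unlock();
    }
  }

  int size() {
    lock.readLock().lock();
    try {
      return children.size();
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

Under this scheme the root would take a write lock only when second-level nodes themselves are added or removed, matching the comment above.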
[jira] [Commented] (HDDS-699) Detect Ozone Network topology
[ https://issues.apache.org/jira/browse/HDDS-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16755329#comment-16755329 ] Tsz Wo Nicholas Sze commented on HDDS-699: -- > ... For the NetworkTopology performance, at the beginning I thought most of > the access are reads after the network topology is built, so a read write > reentrant single netlock approach may not cause much performance penalty. ... In a large cluster, datanodes keep going up and down, so the NetworkTopology keeps changing, and there is a large number of NetworkTopology queries; NetworkTopology becomes a scalability bottleneck. Considering that Ozone is to support small objects, we can foresee that the problem will be even worse than in HDFS. > If it will really cause big performance issue, then we'd better do some > improvement. ... When we see the problem in production clusters, it is hard to make the improvement since it is very risky to change such critical code at that time. There is also an API incompatibility -- it is impossible to change NodeImpl from mutable to immutable. So, it is now or never. :) > Detect Ozone Network topology > - > > Key: HDDS-699 > URL: https://issues.apache.org/jira/browse/HDDS-699 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Assignee: Sammi Chen >Priority: Major > Attachments: HDDS-699.00.patch, HDDS-699.01.patch > > > Traditionally this has been implemented in Hadoop via script or customizable > java class. One thing we want to add here is the flexible multi-level support > instead of fixed levels like DC/Rack/NG/Node. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-699) Detect Ozone Network topology
[ https://issues.apache.org/jira/browse/HDDS-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16754403#comment-16754403 ] Tsz Wo Nicholas Sze commented on HDDS-699: -- Thanks [~Sammi] for working on the patch. Some comments/questions: - Do you expect NetConf to be set by users/admins? If not, let's rename it to something like NetConstants. In Hadoop, conf is supposed to be set by users/admins. - NetworkTopology uses a single netlock for the entire data structure. This has been a performance bottleneck in HDFS for a long time. I wonder if we could make Node and InnerNode thread-safe instead: -* NodeImpl can be immutable so that accesses do not need any lock. -* childrenMap in InnerNodeImpl can be changed to a ConcurrentHashMap. -* Do not maintain numOfLeaves in Root. For the other InnerNodes, numOfLeaves is protected by a lock in getNumOfLeaves(), add(..) and remove(..). - Remove NetworkTopology.random since it has a race condition; use ThreadLocalRandom instead. > Detect Ozone Network topology > - > > Key: HDDS-699 > URL: https://issues.apache.org/jira/browse/HDDS-699 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Assignee: Sammi Chen >Priority: Major > Attachments: HDDS-699.00.patch, HDDS-699.01.patch > > > Traditionally this has been implemented in Hadoop via script or customizable > java class. One thing we want to add here is the flexible multi-level support > instead of fixed levels like DC/Rack/NG/Node. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
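The lock-free direction suggested in that comment (immutable leaf nodes, a ConcurrentHashMap of children, ThreadLocalRandom instead of a shared Random) can be sketched as below. All names are hypothetical; they do not match the real NodeImpl/InnerNodeImpl classes, and the snapshot-copy in chooseRandom is just one simple way to pick a random child without a lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

/** Immutable leaf: once constructed, reads need no lock at all. */
final class LeafNode {
  final String name;
  final String location;
  LeafNode(String name, String location) {
    this.name = name;
    this.location = location;
  }
}

/** Hypothetical inner node built on a concurrent children map. */
class InnerNodeSketch {
  // ConcurrentHashMap: put/remove/get are thread-safe without a global lock
  final ConcurrentHashMap<String, LeafNode> childrenMap = new ConcurrentHashMap<>();

  void add(LeafNode n) { childrenMap.put(n.name, n); }

  void remove(String name) { childrenMap.remove(name); }

  /** Pick a random child, e.g. for replica placement. */
  LeafNode chooseRandom() {
    // copy the current view so the index stays valid even if the map changes
    List<LeafNode> snapshot = new ArrayList<>(childrenMap.values());
    if (snapshot.isEmpty()) {
      return null;
    }
    // ThreadLocalRandom avoids the race on a shared java.util.Random instance
    return snapshot.get(ThreadLocalRandom.current().nextInt(snapshot.size()));
  }
}
```

A shared java.util.Random is thread-safe but contended (its seed is a single CAS-updated field), and the original NetworkTopology.random reportedly also had a race; ThreadLocalRandom sidesteps both issues.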
[jira] [Commented] (HDDS-698) Support Topology Awareness for Ozone
[ https://issues.apache.org/jira/browse/HDDS-698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741485#comment-16741485 ] Tsz Wo Nicholas Sze commented on HDDS-698: -- [~Sammi] and [~djp], thanks for working on this. Are you planning to post patches to the subtasks? Or just to post a patch here? Please let me know. I am happy to review the patches. > Support Topology Awareness for Ozone > > > Key: HDDS-698 > URL: https://issues.apache.org/jira/browse/HDDS-698 > Project: Hadoop Distributed Data Store > Issue Type: New Feature >Reporter: Xiaoyu Yao >Assignee: Sammi Chen >Priority: Major > Attachments: HDDS-698.000.patch, network-topology-default.xml, > network-topology-nodegroup.xml > > > This is an umbrella JIRA to add topology aware support for Ozone Pipelines, > Containers and Blocks. Long time since HDFS is created, we provide > rack/nodegroup awareness for reliability and high performance for data > access. Ozone need a similar mechanism and this can be more flexible for > cloud scenarios. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-870) Avoid creating block sized buffer in ChunkGroupOutputStream
[ https://issues.apache.org/jira/browse/HDDS-870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713455#comment-16713455 ] Tsz Wo Nicholas Sze commented on HDDS-870: -- Filed RATIS-453 to fix the retry behavior in Ratis. > Avoid creating block sized buffer in ChunkGroupOutputStream > --- > > Key: HDDS-870 > URL: https://issues.apache.org/jira/browse/HDDS-870 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Client >Affects Versions: 0.4.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.4.0 > > Attachments: HDDS-870.000.patch, HDDS-870.001.patch, > HDDS-870.002.patch, HDDS-870.003.patch, HDDS-870.004.patch, > HDDS-870.005.patch, HDDS-870.006.patch, HDDS-870.007.patch, > HDDS-870.008.patch, HDDS-870.009.patch > > > Currently, for a key, we create a block size byteBuffer in order for caching > data. This can be replaced with an array of buffers of size flush buffer size > configured for handling 2 node failures as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
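The idea in the HDDS-870 description (replace the single block-sized buffer with an array of flush-sized buffers, allocated lazily) can be sketched as follows. The names are illustrative only and do not match the real org.apache.hadoop.hdds.scm.storage.BufferPool; the point is that memory grows with data actually written, not with the block size, which is also what the OOM in the header above traces back to.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: a pool of small flush-sized buffers, each allocated
 * only when the writer actually reaches it.
 */
class FlushSizedBufferPool {
  private final int capacity;     // max number of buffers in the pool
  private final int bufferSize;   // flush buffer size, much smaller than a block
  private final List<ByteBuffer> buffers = new ArrayList<>();
  private int currentIndex = -1;

  FlushSizedBufferPool(int capacity, int bufferSize) {
    this.capacity = capacity;
    this.bufferSize = bufferSize;
  }

  /** Return the current buffer, allocating a new one only when needed. */
  ByteBuffer allocateBufferIfNeeded() {
    if (currentIndex >= 0 && buffers.get(currentIndex).hasRemaining()) {
      return buffers.get(currentIndex);
    }
    if (buffers.size() >= capacity) {
      throw new IllegalStateException("pool exhausted; flush and release first");
    }
    buffers.add(ByteBuffer.allocate(bufferSize));
    return buffers.get(++currentIndex);
  }

  int allocatedBytes() {
    return buffers.size() * bufferSize;
  }
}
```

In the real design the pool is bounded to cover retries after up to two node failures; here the bound is just the capacity argument.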
[jira] [Updated] (HDFS-14112) Avoid recursive call to external authorizer for getContentSummary.
[ https://issues.apache.org/jira/browse/HDFS-14112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-14112: --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.2.1 Status: Resolved (was: Patch Available) Thanks [~jnp] for reviewing the patch. I have committed this. > Avoid recursive call to external authorizer for getContentSummary. > -- > > Key: HDFS-14112 > URL: https://issues.apache.org/jira/browse/HDFS-14112 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Jitendra Nath Pandey >Assignee: Tsz Wo Nicholas Sze >Priority: Critical > Fix For: 3.2.1 > > Attachments: h14112_20181128.patch, h14112_20181129.patch > > > HDFS-12130 optimizes permission check, and invokes permission checker > recursively for each component of the tree, which works well for FSPermission > checker. > But for certain external authorizers it may be more efficient to make one > call with {{subaccess}}, because often they don't have to evaluate for each > and every component of the path. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14112) Avoid recursive call to external authorizer for getContentSummary.
[ https://issues.apache.org/jira/browse/HDFS-14112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16703670#comment-16703670 ] Tsz Wo Nicholas Sze commented on HDFS-14112: h14112_20181129.patch: adds the new conf to hdfs-default.xml > Avoid recursive call to external authorizer for getContentSummary. > -- > > Key: HDFS-14112 > URL: https://issues.apache.org/jira/browse/HDFS-14112 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Jitendra Nath Pandey >Assignee: Tsz Wo Nicholas Sze >Priority: Critical > Attachments: h14112_20181128.patch, h14112_20181129.patch > > > HDFS-12130 optimizes permission check, and invokes permission checker > recursively for each component of the tree, which works well for FSPermission > checker. > But for certain external authorizers it may be more efficient to make one > call with {{subaccess}}, because often they don't have to evaluate for each > and every component of the path. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14112) Avoid recursive call to external authorizer for getContentSummary.
[ https://issues.apache.org/jira/browse/HDFS-14112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-14112: --- Attachment: h14112_20181129.patch > Avoid recursive call to external authorizer for getContentSummary. > -- > > Key: HDFS-14112 > URL: https://issues.apache.org/jira/browse/HDFS-14112 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Jitendra Nath Pandey >Assignee: Tsz Wo Nicholas Sze >Priority: Critical > Attachments: h14112_20181128.patch, h14112_20181129.patch > > > HDFS-12130 optimizes permission check, and invokes permission checker > recursively for each component of the tree, which works well for FSPermission > checker. > But for certain external authorizers it may be more efficient to make one > call with {{subaccess}}, because often they don't have to evaluate for each > and every component of the path. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14112) Avoid recursive call to external authorizer for getContentSummary.
[ https://issues.apache.org/jira/browse/HDFS-14112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-14112: --- Component/s: namenode > Avoid recursive call to external authorizer for getContentSummary. > -- > > Key: HDFS-14112 > URL: https://issues.apache.org/jira/browse/HDFS-14112 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Jitendra Nath Pandey >Assignee: Tsz Wo Nicholas Sze >Priority: Critical > Attachments: h14112_20181128.patch > > > HDFS-12130 optimizes permission check, and invokes permission checker > recursively for each component of the tree, which works well for FSPermission > checker. > But for certain external authorizers it may be more efficient to make one > call with {{subaccess}}, because often they don't have to evaluate for each > and every component of the path. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14112) Avoid recursive call to external authorizer for getContentSummary.
[ https://issues.apache.org/jira/browse/HDFS-14112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-14112: --- Status: Patch Available (was: Open) h14112_20181128.patch: adds back the pre-HDFS-12130 subAccess check and a conf. > Avoid recursive call to external authorizer for getContentSummary. > -- > > Key: HDFS-14112 > URL: https://issues.apache.org/jira/browse/HDFS-14112 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Jitendra Nath Pandey >Assignee: Tsz Wo Nicholas Sze >Priority: Critical > Attachments: h14112_20181128.patch > > > HDFS-12130 optimizes permission check, and invokes permission checker > recursively for each component of the tree, which works well for FSPermission > checker. > But for certain external authorizers it may be more efficient to make one > call with {{subaccess}}, because often they don't have to evaluate for each > and every component of the path. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14112) Avoid recursive call to external authorizer for getContentSummary.
[ https://issues.apache.org/jira/browse/HDFS-14112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-14112: --- Attachment: h14112_20181128.patch > Avoid recursive call to external authorizer for getContentSummary. > -- > > Key: HDFS-14112 > URL: https://issues.apache.org/jira/browse/HDFS-14112 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Jitendra Nath Pandey >Assignee: Tsz Wo Nicholas Sze >Priority: Critical > Attachments: h14112_20181128.patch > > > HDFS-12130 optimizes permission check, and invokes permission checker > recursively for each component of the tree, which works well for FSPermission > checker. > But for certain external authorizers it may be more efficient to make one > call with {{subaccess}}, because often they don't have to evaluate for each > and every component of the path. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-826) Update Ratis to 0.3.0-6f3419a-SNAPSHOT
[ https://issues.apache.org/jira/browse/HDDS-826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDDS-826: - Resolution: Fixed Fix Version/s: 0.3.0 Status: Resolved (was: Patch Available) Thanks [~jnp] and [~msingh] for reviewing the patches. I have committed this. > Update Ratis to 0.3.0-6f3419a-SNAPSHOT > -- > > Key: HDDS-826 > URL: https://issues.apache.org/jira/browse/HDDS-826 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze >Priority: Major > Fix For: 0.3.0 > > Attachments: HDDS-826.20181109b.patch, HDDS-826.20181109c.patch > > > RATIS-404 fixed a deadlock bug. We should update Ratis here. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-826) Update Ratis to 0.3.0-6f3419a-SNAPSHOT
[ https://issues.apache.org/jira/browse/HDDS-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682117#comment-16682117 ] Tsz Wo Nicholas Sze commented on HDDS-826: -- HDDS-826.20181109c.patch: changes also hadoop-ozone/pom.xml. > Update Ratis to 0.3.0-6f3419a-SNAPSHOT > -- > > Key: HDDS-826 > URL: https://issues.apache.org/jira/browse/HDDS-826 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze >Priority: Major > Attachments: HDDS-826.20181109b.patch, HDDS-826.20181109c.patch > > > RATIS-404 fixed a deadlock bug. We should update Ratis here. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-826) Update Ratis to 0.3.0-6f3419a-SNAPSHOT
[ https://issues.apache.org/jira/browse/HDDS-826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDDS-826: - Attachment: HDDS-826.20181109c.patch > Update Ratis to 0.3.0-6f3419a-SNAPSHOT > -- > > Key: HDDS-826 > URL: https://issues.apache.org/jira/browse/HDDS-826 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze >Priority: Major > Attachments: HDDS-826.20181109b.patch, HDDS-826.20181109c.patch > > > RATIS-404 fixed a deadlock bug. We should update Ratis here. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-826) Update Ratis to 0.3.0-6f3419a-SNAPSHOT
[ https://issues.apache.org/jira/browse/HDDS-826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDDS-826: - Attachment: (was: HDDS-826.001.patch) > Update Ratis to 0.3.0-6f3419a-SNAPSHOT > -- > > Key: HDDS-826 > URL: https://issues.apache.org/jira/browse/HDDS-826 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze >Priority: Major > Attachments: HDDS-826.20181109b.patch > > > RATIS-404 fixed a deadlock bug. We should update Ratis here. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-826) Update Ratis to 0.3.0-6f3419a-SNAPSHOT
[ https://issues.apache.org/jira/browse/HDDS-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682095#comment-16682095 ] Tsz Wo Nicholas Sze commented on HDDS-826: -- Thanks. I somehow thought that the problem is in the file name. Here is a new patch for trunk. HDDS-826.20181109b.patch > Update Ratis to 0.3.0-6f3419a-SNAPSHOT > -- > > Key: HDDS-826 > URL: https://issues.apache.org/jira/browse/HDDS-826 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze >Priority: Major > Attachments: HDDS-826.20181109b.patch > > > RATIS-404 fixed a deadlock bug. We should update Ratis here. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-826) Update Ratis to 0.3.0-6f3419a-SNAPSHOT
[ https://issues.apache.org/jira/browse/HDDS-826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDDS-826: - Attachment: HDDS-826.20181109b.patch > Update Ratis to 0.3.0-6f3419a-SNAPSHOT > -- > > Key: HDDS-826 > URL: https://issues.apache.org/jira/browse/HDDS-826 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze >Priority: Major > Attachments: HDDS-826.20181109b.patch > > > RATIS-404 fixed a deadlock bug. We should update Ratis here. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-826) Update Ratis to 0.3.0-6f3419a-SNAPSHOT
[ https://issues.apache.org/jira/browse/HDDS-826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDDS-826: - Attachment: (was: HDDS-826.20181109.patch) > Update Ratis to 0.3.0-6f3419a-SNAPSHOT > -- > > Key: HDDS-826 > URL: https://issues.apache.org/jira/browse/HDDS-826 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze >Priority: Major > Attachments: HDDS-826.20181109b.patch > > > RATIS-404 fixed a deadlock bug. We should update Ratis here. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-826) Update Ratis to 0.3.0-6f3419a-SNAPSHOT
[ https://issues.apache.org/jira/browse/HDDS-826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDDS-826: - Attachment: HDDS-826.001.patch > Update Ratis to 0.3.0-6f3419a-SNAPSHOT > -- > > Key: HDDS-826 > URL: https://issues.apache.org/jira/browse/HDDS-826 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze >Priority: Major > Attachments: HDDS-826.001.patch, HDDS-826.20181109.patch > > > RATIS-404 fixed a deadlock bug. We should update Ratis here. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-826) Update Ratis to 0.3.0-6f3419a-SNAPSHOT
[ https://issues.apache.org/jira/browse/HDDS-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16681972#comment-16681972 ] Tsz Wo Nicholas Sze commented on HDDS-826: -- HDDS-826.20181109.patch: Re-upload the patch using . instead of _ in the file name. > Update Ratis to 0.3.0-6f3419a-SNAPSHOT > -- > > Key: HDDS-826 > URL: https://issues.apache.org/jira/browse/HDDS-826 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze >Priority: Major > Attachments: HDDS-826.20181109.patch > > > RATIS-404 fixed a deadlock bug. We should update Ratis here. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-826) Update Ratis to 0.3.0-6f3419a-SNAPSHOT
[ https://issues.apache.org/jira/browse/HDDS-826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDDS-826: - Attachment: HDDS-826.20181109.patch > Update Ratis to 0.3.0-6f3419a-SNAPSHOT > -- > > Key: HDDS-826 > URL: https://issues.apache.org/jira/browse/HDDS-826 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze >Priority: Major > Attachments: HDDS-826.20181109.patch > > > RATIS-404 fixed a deadlock bug. We should update Ratis here. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-826) Update Ratis to 0.3.0-6f3419a-SNAPSHOT
[ https://issues.apache.org/jira/browse/HDDS-826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDDS-826: - Attachment: (was: HDDS-826_20181109.patch) > Update Ratis to 0.3.0-6f3419a-SNAPSHOT > -- > > Key: HDDS-826 > URL: https://issues.apache.org/jira/browse/HDDS-826 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze >Priority: Major > Attachments: HDDS-826.20181109.patch > > > RATIS-404 fixed a deadlock bug. We should update Ratis here. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-826) Update Ratis to 0.3.0-6f3419a-SNAPSHOT
[ https://issues.apache.org/jira/browse/HDDS-826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDDS-826: - Status: Patch Available (was: Open) > Update Ratis to 0.3.0-6f3419a-SNAPSHOT > -- > > Key: HDDS-826 > URL: https://issues.apache.org/jira/browse/HDDS-826 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze >Priority: Major > Attachments: HDDS-826_20181109.patch > > > RATIS-404 fixed a deadlock bug. We should update Ratis here. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-826) Update Ratis to 0.3.0-6f3419a-SNAPSHOT
[ https://issues.apache.org/jira/browse/HDDS-826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDDS-826: - Attachment: HDDS-826_20181109.patch > Update Ratis to 0.3.0-6f3419a-SNAPSHOT > -- > > Key: HDDS-826 > URL: https://issues.apache.org/jira/browse/HDDS-826 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze >Priority: Major > Attachments: HDDS-826_20181109.patch > > > RATIS-404 fixed a deadlock bug. We should update Ratis here. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-826) Update Ratis to 0.3.0-6f3419a-SNAPSHOT
Tsz Wo Nicholas Sze created HDDS-826: Summary: Update Ratis to 0.3.0-6f3419a-SNAPSHOT Key: HDDS-826 URL: https://issues.apache.org/jira/browse/HDDS-826 Project: Hadoop Distributed Data Store Issue Type: Improvement Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze RATIS-404 fixed a deadlock bug. We should update Ratis here. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-806) Update Ratis to latest snapshot version in ozone
[ https://issues.apache.org/jira/browse/HDDS-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680392#comment-16680392 ] Tsz Wo Nicholas Sze commented on HDDS-806: -- [~msingh], thanks a lot for the followup works. [~shashikant], thank you for reviewing and committing the patches. > Update Ratis to latest snapshot version in ozone > > > Key: HDDS-806 > URL: https://issues.apache.org/jira/browse/HDDS-806 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Affects Versions: 0.3.0 >Reporter: Nilotpal Nandi >Assignee: Tsz Wo Nicholas Sze >Priority: Blocker > Fix For: 0.3.0, 0.4.0 > > Attachments: HDDS-806.001.patch, HDDS-806.002.patch, > HDDS-806_20181107.patch, all-node-ozone-logs-1540979056.tar.gz > > > datanode stopped due to following error : > datanode.log > {noformat} > 2018-10-31 09:12:04,517 INFO org.apache.ratis.server.impl.RaftServerImpl: > 9fab9937-fbcd-4196-8014-cb165045724b: set configuration 169: > [9fab9937-fbcd-4196-8014-cb165045724b:172.27.15.131:9858, > ce0084c2-97cd-4c97-9378-e5175daad18b:172.27.15.139:9858, > f0291cb4-7a48-456a-847f-9f91a12aa850:172.27.38.9:9858], old=null at 169 > 2018-10-31 09:12:22,187 ERROR org.apache.ratis.server.storage.RaftLogWorker: > Terminating with exit status 1: > 9fab9937-fbcd-4196-8014-cb165045724b-RaftLogWorker failed. 
> org.apache.ratis.protocol.TimeoutIOException: Timeout: WriteLog:182: (t:10, > i:182), STATEMACHINELOGENTRY, client-611073BBFA46, > cid=127-writeStateMachineData > at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:87) > at > org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:310) > at org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:182) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.util.concurrent.TimeoutException > at > java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771) > at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) > at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:79) > ... 3 more{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-691) Dependency convergence error for org.apache.hadoop:hadoop-annotations
[ https://issues.apache.org/jira/browse/HDDS-691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDDS-691: - Resolution: Not A Problem Status: Resolved (was: Patch Available) This seems not a problem anymore. Resolving ... > Dependency convergence error for org.apache.hadoop:hadoop-annotations > - > > Key: HDDS-691 > URL: https://issues.apache.org/jira/browse/HDDS-691 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Affects Versions: 0.2.1 >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze >Priority: Major > Attachments: HDDS-691_20181018.patch, HDDS-691_20181019.patch > > > {code} > [WARNING] > Dependency convergence error for > org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT paths to dependency are: > +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT > +-org.apache.hadoop:hadoop-common:3.3.0-20181017.235917-140 > +-org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT > and > +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT > +-org.apache.hadoop:hadoop-common:3.3.0-20181017.235917-140 > +-org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT > and > +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT > +-org.apache.hadoop:hadoop-annotations:3.3.0-20181017.235840-140 > [WARNING] Rule 0: org.apache.maven.plugins.enforcer.DependencyConvergence > failed with message: > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-806) writeStateMachineData times out because chunk executors are not scheduled
[ https://issues.apache.org/jira/browse/HDDS-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679066#comment-16679066 ] Tsz Wo Nicholas Sze commented on HDDS-806: -- BTW, [~jnp] has suggested to reduce the log queue size in Ozone. How about setting it to 1024? {code} raft.server.log.queue.size (int, default=4096) {code} > writeStateMachineData times out because chunk executors are not scheduled > - > > Key: HDDS-806 > URL: https://issues.apache.org/jira/browse/HDDS-806 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Affects Versions: 0.3.0 >Reporter: Nilotpal Nandi >Assignee: Mukul Kumar Singh >Priority: Blocker > Fix For: 0.3.0 > > Attachments: all-node-ozone-logs-1540979056.tar.gz > > > datanode stopped due to following error : > datanode.log > {noformat} > 2018-10-31 09:12:04,517 INFO org.apache.ratis.server.impl.RaftServerImpl: > 9fab9937-fbcd-4196-8014-cb165045724b: set configuration 169: > [9fab9937-fbcd-4196-8014-cb165045724b:172.27.15.131:9858, > ce0084c2-97cd-4c97-9378-e5175daad18b:172.27.15.139:9858, > f0291cb4-7a48-456a-847f-9f91a12aa850:172.27.38.9:9858], old=null at 169 > 2018-10-31 09:12:22,187 ERROR org.apache.ratis.server.storage.RaftLogWorker: > Terminating with exit status 1: > 9fab9937-fbcd-4196-8014-cb165045724b-RaftLogWorker failed. 
> org.apache.ratis.protocol.TimeoutIOException: Timeout: WriteLog:182: (t:10, > i:182), STATEMACHINELOGENTRY, client-611073BBFA46, > cid=127-writeStateMachineData > at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:87) > at > org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:310) > at org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:182) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.util.concurrent.TimeoutException > at > java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771) > at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) > at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:79) > ... 3 more{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
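If the server were configured through a plain properties map, the reduction suggested in the comment above might look like the following sketch. The key name and default come from the comment; the helper class and method here are illustrative only, and the actual Ozone/Ratis configuration API may differ.

```java
import java.util.Properties;

public class RatisLogQueueConfig {
    // Property name taken from the comment above; its default is 4096.
    static final String LOG_QUEUE_SIZE_KEY = "raft.server.log.queue.size";

    // Reduce the log queue size to bound the memory held by pending log entries.
    static Properties withReducedQueue(Properties props) {
        props.setProperty(LOG_QUEUE_SIZE_KEY, Integer.toString(1024));
        return props;
    }
}
```

A smaller queue trades some write throughput for a tighter bound on heap used by queued log entries, which is the concern in this timeout scenario.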
[jira] [Updated] (HDDS-806) writeStateMachineData times out because chunk executors are not scheduled
[ https://issues.apache.org/jira/browse/HDDS-806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDDS-806: - Attachment: HDDS-806_20181107.patch > writeStateMachineData times out because chunk executors are not scheduled > - > > Key: HDDS-806 > URL: https://issues.apache.org/jira/browse/HDDS-806 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Affects Versions: 0.3.0 >Reporter: Nilotpal Nandi >Assignee: Mukul Kumar Singh >Priority: Blocker > Fix For: 0.3.0 > > Attachments: HDDS-806_20181107.patch, > all-node-ozone-logs-1540979056.tar.gz > > > datanode stopped due to following error : > datanode.log > {noformat} > 2018-10-31 09:12:04,517 INFO org.apache.ratis.server.impl.RaftServerImpl: > 9fab9937-fbcd-4196-8014-cb165045724b: set configuration 169: > [9fab9937-fbcd-4196-8014-cb165045724b:172.27.15.131:9858, > ce0084c2-97cd-4c97-9378-e5175daad18b:172.27.15.139:9858, > f0291cb4-7a48-456a-847f-9f91a12aa850:172.27.38.9:9858], old=null at 169 > 2018-10-31 09:12:22,187 ERROR org.apache.ratis.server.storage.RaftLogWorker: > Terminating with exit status 1: > 9fab9937-fbcd-4196-8014-cb165045724b-RaftLogWorker failed. > org.apache.ratis.protocol.TimeoutIOException: Timeout: WriteLog:182: (t:10, > i:182), STATEMACHINELOGENTRY, client-611073BBFA46, > cid=127-writeStateMachineData > at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:87) > at > org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:310) > at org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:182) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.util.concurrent.TimeoutException > at > java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771) > at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) > at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:79) > ... 
3 more{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-806) writeStateMachineData times out because chunk executors are not scheduled
[ https://issues.apache.org/jira/browse/HDDS-806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDDS-806: - Assignee: Tsz Wo Nicholas Sze (was: Mukul Kumar Singh) Status: Patch Available (was: Open) HDDS-806_20181107.patch: updates Ratis version > writeStateMachineData times out because chunk executors are not scheduled > - > > Key: HDDS-806 > URL: https://issues.apache.org/jira/browse/HDDS-806 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Affects Versions: 0.3.0 >Reporter: Nilotpal Nandi >Assignee: Tsz Wo Nicholas Sze >Priority: Blocker > Fix For: 0.3.0 > > Attachments: HDDS-806_20181107.patch, > all-node-ozone-logs-1540979056.tar.gz > > > datanode stopped due to following error : > datanode.log > {noformat} > 2018-10-31 09:12:04,517 INFO org.apache.ratis.server.impl.RaftServerImpl: > 9fab9937-fbcd-4196-8014-cb165045724b: set configuration 169: > [9fab9937-fbcd-4196-8014-cb165045724b:172.27.15.131:9858, > ce0084c2-97cd-4c97-9378-e5175daad18b:172.27.15.139:9858, > f0291cb4-7a48-456a-847f-9f91a12aa850:172.27.38.9:9858], old=null at 169 > 2018-10-31 09:12:22,187 ERROR org.apache.ratis.server.storage.RaftLogWorker: > Terminating with exit status 1: > 9fab9937-fbcd-4196-8014-cb165045724b-RaftLogWorker failed. 
> org.apache.ratis.protocol.TimeoutIOException: Timeout: WriteLog:182: (t:10, > i:182), STATEMACHINELOGENTRY, client-611073BBFA46, > cid=127-writeStateMachineData > at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:87) > at > org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:310) > at org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:182) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.util.concurrent.TimeoutException > at > java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771) > at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) > at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:79) > ... 3 more{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-806) writeStateMachineData times out because chunk executors are not scheduled
[ https://issues.apache.org/jira/browse/HDDS-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678935#comment-16678935 ] Tsz Wo Nicholas Sze commented on HDDS-806: -- [~msingh], RATIS-396 is now committed. Also have deployed 0.3.0-1d07b18-SNAPSHOT . > writeStateMachineData times out because chunk executors are not scheduled > - > > Key: HDDS-806 > URL: https://issues.apache.org/jira/browse/HDDS-806 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Affects Versions: 0.3.0 >Reporter: Nilotpal Nandi >Assignee: Mukul Kumar Singh >Priority: Blocker > Fix For: 0.3.0 > > Attachments: all-node-ozone-logs-1540979056.tar.gz > > > datanode stopped due to following error : > datanode.log > {noformat} > 2018-10-31 09:12:04,517 INFO org.apache.ratis.server.impl.RaftServerImpl: > 9fab9937-fbcd-4196-8014-cb165045724b: set configuration 169: > [9fab9937-fbcd-4196-8014-cb165045724b:172.27.15.131:9858, > ce0084c2-97cd-4c97-9378-e5175daad18b:172.27.15.139:9858, > f0291cb4-7a48-456a-847f-9f91a12aa850:172.27.38.9:9858], old=null at 169 > 2018-10-31 09:12:22,187 ERROR org.apache.ratis.server.storage.RaftLogWorker: > Terminating with exit status 1: > 9fab9937-fbcd-4196-8014-cb165045724b-RaftLogWorker failed. > org.apache.ratis.protocol.TimeoutIOException: Timeout: WriteLog:182: (t:10, > i:182), STATEMACHINELOGENTRY, client-611073BBFA46, > cid=127-writeStateMachineData > at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:87) > at > org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:310) > at org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:182) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.util.concurrent.TimeoutException > at > java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771) > at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) > at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:79) > ... 
3 more{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13999) Bogus missing block warning if the file is under construction when NN starts
[ https://issues.apache.org/jira/browse/HDFS-13999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677472#comment-16677472 ] Tsz Wo Nicholas Sze commented on HDFS-13999: +1 the 001 patch looks good. > Bogus missing block warning if the file is under construction when NN starts > > > Key: HDFS-13999 > URL: https://issues.apache.org/jira/browse/HDFS-13999 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.0 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Major > Attachments: HDFS-13999.branch-2.7.001.patch, webui missing blocks.png > > > We found an interesting case where web UI displays a few missing blocks, but > it doesn't state which files are corrupt. What'll also happen is that fsck > states the file system is healthy. This bug is similar to HDFS-10827 and > HDFS-8533. > (See the attachment for an example) > Using Dynamometer, I was able to reproduce the bug, and realized that the > "missing" blocks are actually healthy, but somehow neededReplications doesn't > get updated when NN receives block reports. What's more interesting is that > the files associated with the "missing" blocks are under construction when NN > starts, and so after a while NN prints a file recovery log. > Given that, I determined the following code is the source of the bug: > {code:java|title=BlockManager#addStoredBlock} > >// if file is under construction, then done for now > if (bc.isUnderConstruction()) { > return storedBlock; > } > {code} > which is wrong, because a file may have multiple blocks, and the first block > is complete. In which case, the neededReplications structure doesn't get > updated for the first block, and thus the missing block warning on the web > UI. More appropriately, it should check the state of the block itself, not > the file. 
> Fortunately, it was unintentionally fixed via HDFS-9754: > {code:java} > // if block is still under construction, then done for now > if (!storedBlock.isCompleteOrCommitted()) { > return storedBlock; > } > {code} > We should bring this fix into branch-2.7 too. That said, this is a harmless > warning, and should go away after the under-construction-files are recovered, > and the NN restarts (or force full block reports). > Kudos to Dynamometer! It would be impossible to reproduce this bug without > the tool. And thanks [~smeng] for helping with the reproduction. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
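The distinction discussed above can be shown with a toy model. The enum and method names here are illustrative, not the actual NameNode classes: the old rule skips replication accounting whenever the *file* is under construction, which wrongly skips a first block that is already complete, while the fixed rule looks at the *block's* own state.

```java
public class BlockStateCheck {
    enum BlockState { COMPLETE, COMMITTED, UNDER_CONSTRUCTION }

    // Old, buggy rule: skip accounting whenever the file is under construction.
    static boolean skipByFileState(boolean fileUnderConstruction) {
        return fileUnderConstruction;
    }

    // Fixed rule, mirroring !storedBlock.isCompleteOrCommitted():
    // skip only while the block itself is still under construction.
    static boolean skipByBlockState(BlockState blockState) {
        return blockState != BlockState.COMPLETE
            && blockState != BlockState.COMMITTED;
    }
}
```

For a complete first block of a file that is still open, the old rule skips it (so neededReplications never gets updated, producing the bogus missing-block warning) while the fixed rule processes it.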
[jira] [Commented] (HDDS-722) ozone datanodes failed to start on few nodes
[ https://issues.apache.org/jira/browse/HDDS-722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663280#comment-16663280 ] Tsz Wo Nicholas Sze commented on HDDS-722: -- Ratis should tolerate the last half written log entry; filed RATIS-373. > ozone datanodes failed to start on few nodes > > > Key: HDDS-722 > URL: https://issues.apache.org/jira/browse/HDDS-722 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.3.0 >Reporter: Nilotpal Nandi >Priority: Critical > Attachments: all-node-ozone-logs-1540356965.tar.gz > > > steps taken : > -- > # put few keys using ozonefs. > # stopped all services of the cluster. > # started om and scm. > # After sometime , started datanodes. > All datanodes failed to start . Out of 12 datanodes, 4 datanodes failed to > start. > > Here is the datanode log snippet : > > > {noformat} > 2018-10-24 04:49:30,594 ERROR > org.apache.ratis.server.impl.StateMachineUpdater: Terminating with exit > status 2: StateMachineUpdater-9524f4e2-9031-4852-ab7c-11c2da3460db: the > StateMachineUpdater hits Throwable > org.apache.ratis.server.storage.RaftLogIOException: java.io.IOException: > Premature EOF from inputStream > at org.apache.ratis.server.storage.LogSegment.loadCache(LogSegment.java:299) > at > org.apache.ratis.server.storage.SegmentedRaftLog.get(SegmentedRaftLog.java:192) > at > org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:142) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: Premature EOF from inputStream > at org.apache.ratis.util.IOUtils.readFully(IOUtils.java:100) > at org.apache.ratis.server.storage.LogReader.decodeEntry(LogReader.java:250) > at org.apache.ratis.server.storage.LogReader.readEntry(LogReader.java:155) > at > org.apache.ratis.server.storage.LogInputStream.nextEntry(LogInputStream.java:128) > at > org.apache.ratis.server.storage.LogSegment.readSegmentFile(LogSegment.java:110) > at 
org.apache.ratis.server.storage.LogSegment.access$400(LogSegment.java:43) > at > org.apache.ratis.server.storage.LogSegment$LogEntryLoader.load(LogSegment.java:167) > at > org.apache.ratis.server.storage.LogSegment$LogEntryLoader.load(LogSegment.java:161) > at org.apache.ratis.server.storage.LogSegment.loadCache(LogSegment.java:295) > ... 3 more > 2018-10-24 04:49:30,598 INFO org.apache.hadoop.ozone.HddsDatanodeService: > SHUTDOWN_MSG: > / > SHUTDOWN_MSG: Shutting down HddsDatanodeService at > ctr-e138-1518143905142-541661-01-03.hwx.site/172.27.57.0 > / > 2018-10-24 04:49:30,598 WARN org.apache.hadoop.fs.CachingGetSpaceUsed: Thread > Interrupted waiting to refresh disk information: sleep interrupted > > {noformat} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-638) enable ratis snapshots for HDDS datanodes
[ https://issues.apache.org/jira/browse/HDDS-638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658294#comment-16658294 ] Tsz Wo Nicholas Sze commented on HDDS-638: -- [~msingh], thanks for the update. +1 the 002 patch looks good. The findbugs warning is not related. > enable ratis snapshots for HDDS datanodes > - > > Key: HDDS-638 > URL: https://issues.apache.org/jira/browse/HDDS-638 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.3.0 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Blocker > Attachments: HDDS-638.001.patch, HDDS-638.002.patch > > > Currently on a restart, an HDDS datanode starts applying log entries from the > start of the log. > This can be avoided by taking a ratis snapshot to persist the last > stable state, and on restart the datanodes start applying the log from that index. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-638) enable ratis snapshots for HDDS datanodes
[ https://issues.apache.org/jira/browse/HDDS-638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658188#comment-16658188 ] Tsz Wo Nicholas Sze commented on HDDS-638: -- Some comments on ContainerStateMachine: - For updating lastAppliedTermIndex: -* ConcurrentHashMap does not support null values. It won't work since addRequest always returns null. Just remove addRequest. -* Remove addRequest(..) since it does not seem useful. -* lastSuccessfullyAppliedIndex does not seem useful. How about removing it? Just use lastAppliedTermIndex in BaseStateMachine. The code should look like below: {code} + private void updateLastAppliedTermIndex() { +Long appliedTerm = null; +long appliedIndex = -1; +for(long i = getLastAppliedTermIndex().getIndex() + 1;; i++) { + final Long removed = containerCommandCompletionMap.remove(i); + if (removed == null) { +break; + } + appliedTerm = removed; + appliedIndex = i; +} +if (appliedTerm != null) { + updateLastAppliedTermIndex(appliedIndex, appliedTerm); +} + } + /* * ApplyTransaction calls in Ratis are sequential. 
*/ @Override public CompletableFuture applyTransaction(TransactionContext trx) { +final long index = trx.getLogEntry().getIndex(); try { metrics.incNumApplyTransactionsOps(); ContainerCommandRequestProto requestProto = @@ -418,7 +476,7 @@ private ByteString readStateMachineData(LogEntryProto entry, blockDataProto.getBlockID()); return completeExceptionally(ioe); } -blockData.setBlockCommitSequenceId(trx.getLogEntry().getIndex()); +blockData.setBlockCommitSequenceId(index); final ContainerProtos.PutBlockRequestProto putBlockRequestProto = ContainerProtos.PutBlockRequestProto .newBuilder(requestProto.getPutBlock()) @@ -440,6 +498,13 @@ private ByteString readStateMachineData(LogEntryProto entry, future.thenApply( r -> createContainerFutureMap.remove(containerID).complete(null)); } + + future.thenAccept( m -> { +final Long previous = containerCommandCompletionMap.put(index, trx.getLogEntry().getTerm()); +Preconditions.checkState(previous == null); +updateLastAppliedTermIndex(); + }); + return future; } catch (IOException e) { metrics.incNumApplyTransactionsFails(); {code} - Why "TODO persist open containers in snapshots"? Open containers should be persisted if the index is applied to the state machine. No? - In loadSnapshot, remove the snapshotFile.exists() check. It must exist by storage.getLatestSnapshot(). -* Remove the warning from the snapshot == null case. It is normal when the storage is newly formatted. - Add @Override to takeSnapshot() and it should throw IOException when createNewFile() fails. - In the test, it should check if the expected snapshot files exist. Some other comments: - flushStateMachineData is expensive since it loops through the entire map. It should be rewritten (probably in a separate JIRA.) - The following TODO in initialize(..) can be removed. BaseStateMachine.getId() will return the server id iff initialize has been called; otherwise, it returns null. {code} // TODO: Add a flag that tells you that initialize has been called. 
// Check with Ratis if this feature is done in Ratis. {code} > enable ratis snapshots for HDDS datanodes > - > > Key: HDDS-638 > URL: https://issues.apache.org/jira/browse/HDDS-638 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.3.0 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Blocker > Attachments: HDDS-638.001.patch > > > Currently on a restart, an HDDS datanode starts applying log entries from the > start of the log. > This can be avoided by taking a ratis snapshot to persist the last > stable state, and on restart the datanodes start applying the log from that index. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
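The update rule reviewed above can be sketched as a small standalone class: completed (index, term) pairs are recorded in a map, and the last applied index only advances while the completed indices are contiguous, stopping at the first gap. Names here are illustrative, not the real ContainerStateMachine code.

```java
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class AppliedIndexTracker {
    // index -> term of transactions that have finished applying
    private final ConcurrentNavigableMap<Long, Long> completionMap =
        new ConcurrentSkipListMap<>();
    private volatile long lastAppliedIndex = -1;
    private volatile long lastAppliedTerm = 0;

    // Record that the transaction at (term, index) finished applying.
    public void complete(long index, long term) {
        completionMap.put(index, term);
        updateLastApplied();
    }

    private synchronized void updateLastApplied() {
        Long appliedTerm = null;
        long appliedIndex = -1;
        // Drain contiguous completed indices starting just past the last applied one.
        for (long i = lastAppliedIndex + 1; ; i++) {
            final Long term = completionMap.remove(i);
            if (term == null) {
                break; // stop at the first gap; later completions stay queued
            }
            appliedTerm = term;
            appliedIndex = i;
        }
        if (appliedTerm != null) {
            lastAppliedIndex = appliedIndex;
            lastAppliedTerm = appliedTerm;
        }
    }

    public long getLastAppliedIndex() {
        return lastAppliedIndex;
    }
}
```

Out-of-order completions are held back until the gap fills, so the applied index never skips an index whose transaction has not finished.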
[jira] [Commented] (HDDS-691) Dependency convergence error for org.apache.hadoop:hadoop-annotations
[ https://issues.apache.org/jira/browse/HDDS-691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656992#comment-16656992 ] Tsz Wo Nicholas Sze commented on HDDS-691: -- Sure, removing uniqueVersions sounds good. Here is a new patch: HDDS-691_20181019.patch > Dependency convergence error for org.apache.hadoop:hadoop-annotations > - > > Key: HDDS-691 > URL: https://issues.apache.org/jira/browse/HDDS-691 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze >Priority: Major > Attachments: HDDS-691_20181018.patch, HDDS-691_20181019.patch > > > {code} > [WARNING] > Dependency convergence error for > org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT paths to dependency are: > +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT > +-org.apache.hadoop:hadoop-common:3.3.0-20181017.235917-140 > +-org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT > and > +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT > +-org.apache.hadoop:hadoop-common:3.3.0-20181017.235917-140 > +-org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT > and > +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT > +-org.apache.hadoop:hadoop-annotations:3.3.0-20181017.235840-140 > [WARNING] Rule 0: org.apache.maven.plugins.enforcer.DependencyConvergence > failed with message: > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-691) Dependency convergence error for org.apache.hadoop:hadoop-annotations
[ https://issues.apache.org/jira/browse/HDDS-691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDDS-691: - Attachment: HDDS-691_20181019.patch > Dependency convergence error for org.apache.hadoop:hadoop-annotations > - > > Key: HDDS-691 > URL: https://issues.apache.org/jira/browse/HDDS-691 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze >Priority: Major > Attachments: HDDS-691_20181018.patch, HDDS-691_20181019.patch > > > {code} > [WARNING] > Dependency convergence error for > org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT paths to dependency are: > +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT > +-org.apache.hadoop:hadoop-common:3.3.0-20181017.235917-140 > +-org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT > and > +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT > +-org.apache.hadoop:hadoop-common:3.3.0-20181017.235917-140 > +-org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT > and > +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT > +-org.apache.hadoop:hadoop-annotations:3.3.0-20181017.235840-140 > [WARNING] Rule 0: org.apache.maven.plugins.enforcer.DependencyConvergence > failed with message: > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-691) Dependency convergence error for org.apache.hadoop:hadoop-annotations
[ https://issues.apache.org/jira/browse/HDDS-691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656265#comment-16656265 ] Tsz Wo Nicholas Sze commented on HDDS-691: -- Hi [~elek], below are the rationale behind the patch - First of all, the hadoop-common dependency in hadoop-hdds/common/pom.xml is obviously redundant since the parent hadoop-hdds/pom.xml already has it. - From the dependency convergence error, the grandparent hadoop-project-dist/pom.xml already has the hadoop-annotations dependency. hadoop-hdds/common/pom.xml gets hadoop-annotations again from the hadoop-common dependency. If we set the scope to "provided" for hadoop-common in hadoop-hdds/common/pom.xml, the dependency becomes non-transitive so that it won't get hadoop-annotations again. BTW, the hadoop-common dependency in hadoop-hdfs/pom.xml is also "provided". We probably should do the same for hadoop-hdds? > Dependency convergence error for org.apache.hadoop:hadoop-annotations > - > > Key: HDDS-691 > URL: https://issues.apache.org/jira/browse/HDDS-691 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze >Priority: Major > Attachments: HDDS-691_20181018.patch > > > {code} > [WARNING] > Dependency convergence error for > org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT paths to dependency are: > +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT > +-org.apache.hadoop:hadoop-common:3.3.0-20181017.235917-140 > +-org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT > and > +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT > +-org.apache.hadoop:hadoop-common:3.3.0-20181017.235917-140 > +-org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT > and > +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT > +-org.apache.hadoop:hadoop-annotations:3.3.0-20181017.235840-140 > [WARNING] Rule 0: org.apache.maven.plugins.enforcer.DependencyConvergence > failed with message: > {code} -- This message was sent by 
Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
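As a sketch of the change described above (not necessarily the exact patch content), the hadoop-common dependency in hadoop-hdds/common/pom.xml would carry the "provided" scope, so that it stops re-exporting hadoop-annotations transitively:

{code:xml}
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <scope>provided</scope>
</dependency>
{code}

With "provided", the artifact is available at compile time but is not propagated to dependents, which is what breaks the second path to hadoop-annotations in the convergence error.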
[jira] [Updated] (HDDS-691) Dependency convergence error for org.apache.hadoop:hadoop-annotations
[ https://issues.apache.org/jira/browse/HDDS-691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDDS-691: - Status: Patch Available (was: Open) > Dependency convergence error for org.apache.hadoop:hadoop-annotations > - > > Key: HDDS-691 > URL: https://issues.apache.org/jira/browse/HDDS-691 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze >Priority: Major > Attachments: HDDS-691_20181018.patch > > > {code} > [WARNING] > Dependency convergence error for > org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT paths to dependency are: > +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT > +-org.apache.hadoop:hadoop-common:3.3.0-20181017.235917-140 > +-org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT > and > +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT > +-org.apache.hadoop:hadoop-common:3.3.0-20181017.235917-140 > +-org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT > and > +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT > +-org.apache.hadoop:hadoop-annotations:3.3.0-20181017.235840-140 > [WARNING] Rule 0: org.apache.maven.plugins.enforcer.DependencyConvergence > failed with message: > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-691) Dependency convergence error for org.apache.hadoop:hadoop-annotations
[ https://issues.apache.org/jira/browse/HDDS-691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654998#comment-16654998 ] Tsz Wo Nicholas Sze commented on HDDS-691: -- HDDS-691_20181018.patch: changes the scope to "provided". > Dependency convergence error for org.apache.hadoop:hadoop-annotations > - > > Key: HDDS-691 > URL: https://issues.apache.org/jira/browse/HDDS-691 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze >Priority: Major > Attachments: HDDS-691_20181018.patch > > > {code} > [WARNING] > Dependency convergence error for > org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT paths to dependency are: > +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT > +-org.apache.hadoop:hadoop-common:3.3.0-20181017.235917-140 > +-org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT > and > +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT > +-org.apache.hadoop:hadoop-common:3.3.0-20181017.235917-140 > +-org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT > and > +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT > +-org.apache.hadoop:hadoop-annotations:3.3.0-20181017.235840-140 > [WARNING] Rule 0: org.apache.maven.plugins.enforcer.DependencyConvergence > failed with message: > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-691) Dependency convergence error for org.apache.hadoop:hadoop-annotations
[ https://issues.apache.org/jira/browse/HDDS-691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDDS-691: - Attachment: HDDS-691_20181018.patch > Dependency convergence error for org.apache.hadoop:hadoop-annotations > - > > Key: HDDS-691 > URL: https://issues.apache.org/jira/browse/HDDS-691 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze >Priority: Major > Attachments: HDDS-691_20181018.patch > > > {code} > [WARNING] > Dependency convergence error for > org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT paths to dependency are: > +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT > +-org.apache.hadoop:hadoop-common:3.3.0-20181017.235917-140 > +-org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT > and > +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT > +-org.apache.hadoop:hadoop-common:3.3.0-20181017.235917-140 > +-org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT > and > +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT > +-org.apache.hadoop:hadoop-annotations:3.3.0-20181017.235840-140 > [WARNING] Rule 0: org.apache.maven.plugins.enforcer.DependencyConvergence > failed with message: > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-676) Enable Read from open Containers via Standalone Protocol
[ https://issues.apache.org/jira/browse/HDDS-676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654995#comment-16654995 ] Tsz Wo Nicholas Sze commented on HDDS-676: -- {code} [WARNING] Dependency convergence error for org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT paths to dependency are: +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT +-org.apache.hadoop:hadoop-common:3.3.0-20181017.235917-140 +-org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT and +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT +-org.apache.hadoop:hadoop-common:3.3.0-20181017.235917-140 +-org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT and +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT +-org.apache.hadoop:hadoop-annotations:3.3.0-20181017.235840-140 [WARNING] Rule 0: org.apache.maven.plugins.enforcer.DependencyConvergence failed with message: {code} It seems that the pom files have some bugs; filed HDDS-691. > Enable Read from open Containers via Standalone Protocol > > > Key: HDDS-676 > URL: https://issues.apache.org/jira/browse/HDDS-676 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDDS-676.001.patch > > > With BlockCommitSequenceId getting updated per block commit on open > containers in OM as well as datanode, Ozone Client reads can go through the Standalone > protocol without necessarily requiring Ratis. Client should verify the BCSID of > the container which has the data block, which should always be greater than > or equal to the BCSID of the block to be read, and the existing block BCSID > should exactly match that of the block to be read. As a part of this, the Client > can try to read from a replica with a supplied BCSID and fail over to the next > one in case the block does not exist on one replica. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
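The read rule described above can be sketched as follows. A replica is eligible when its container BCSID has caught up to the requested block BCSID and its stored copy of the block carries exactly that BCSID; otherwise the client fails over to the next replica. The types and names here are illustrative, not the real Ozone client classes.

```java
import java.util.List;
import java.util.Optional;

public class BcsIdRead {
    static class Replica {
        final long containerBcsId;
        final Long blockBcsId; // null if the block is absent on this replica
        Replica(long containerBcsId, Long blockBcsId) {
            this.containerBcsId = containerBcsId;
            this.blockBcsId = blockBcsId;
        }
    }

    // Return the first replica that can serve the block, failing over otherwise.
    static Optional<Replica> pickReplica(List<Replica> replicas, long requestedBcsId) {
        for (Replica r : replicas) {
            if (r.containerBcsId >= requestedBcsId   // container has caught up
                && r.blockBcsId != null              // block exists on this replica
                && r.blockBcsId == requestedBcsId) { // stored BCSID matches exactly
                return Optional.of(r);
            }
        }
        return Optional.empty(); // no replica has the block at this BCSID
    }
}
```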
[jira] [Updated] (HDDS-691) Dependency convergence error for org.apache.hadoop:hadoop-annotations
[ https://issues.apache.org/jira/browse/HDDS-691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDDS-691: - Description: {code} [WARNING] Dependency convergence error for org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT paths to dependency are: +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT +-org.apache.hadoop:hadoop-common:3.3.0-20181017.235917-140 +-org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT and +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT +-org.apache.hadoop:hadoop-common:3.3.0-20181017.235917-140 +-org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT and +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT +-org.apache.hadoop:hadoop-annotations:3.3.0-20181017.235840-140 [WARNING] Rule 0: org.apache.maven.plugins.enforcer.DependencyConvergence failed with message: {code} > Dependency convergence error for org.apache.hadoop:hadoop-annotations > - > > Key: HDDS-691 > URL: https://issues.apache.org/jira/browse/HDDS-691 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze >Priority: Major > > {code} > [WARNING] > Dependency convergence error for > org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT paths to dependency are: > +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT > +-org.apache.hadoop:hadoop-common:3.3.0-20181017.235917-140 > +-org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT > and > +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT > +-org.apache.hadoop:hadoop-common:3.3.0-20181017.235917-140 > +-org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT > and > +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT > +-org.apache.hadoop:hadoop-annotations:3.3.0-20181017.235840-140 > [WARNING] Rule 0: org.apache.maven.plugins.enforcer.DependencyConvergence > failed with message: > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: 
hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-691) Dependency convergence error for org.apache.hadoop:hadoop-annotations
Tsz Wo Nicholas Sze created HDDS-691: Summary: Dependency convergence error for org.apache.hadoop:hadoop-annotations Key: HDDS-691 URL: https://issues.apache.org/jira/browse/HDDS-691 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze
[jira] [Updated] (HDDS-625) putKey hangs for a long time after completion, sometimes forever
[ https://issues.apache.org/jira/browse/HDDS-625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDDS-625: - Resolution: Fixed Fix Version/s: 0.3.0 Status: Resolved (was: Patch Available) I have committed this. Thanks, [~arpitagarwal]! > putKey hangs for a long time after completion, sometimes forever > > > Key: HDDS-625 > URL: https://issues.apache.org/jira/browse/HDDS-625 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal >Priority: Blocker > Fix For: 0.3.0 > > Attachments: HDDS-625.01.patch, HDDS-625.02.patch, > ozone-shell-thread-dump.txt > > > putKey hangs, sometimes forever. > TRACE log output in comment below.
[jira] [Commented] (HDDS-625) putKey hangs for a long time after completion, sometimes forever
[ https://issues.apache.org/jira/browse/HDDS-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647319#comment-16647319 ] Tsz Wo Nicholas Sze commented on HDDS-625: -- +1 the 02 patch looks good. > putKey hangs for a long time after completion, sometimes forever > > > Key: HDDS-625 > URL: https://issues.apache.org/jira/browse/HDDS-625 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal >Priority: Blocker > Attachments: HDDS-625.01.patch, HDDS-625.02.patch, > ozone-shell-thread-dump.txt > > > putKey hangs, sometimes forever. > TRACE log output in comment below.
[jira] [Commented] (HDDS-625) putKey hangs for a long time after completion, sometimes forever
[ https://issues.apache.org/jira/browse/HDDS-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647152#comment-16647152 ] Tsz Wo Nicholas Sze commented on HDDS-625: -- I have just deployed Ratis 0.3.0-9b2d7b6-SNAPSHOT. > putKey hangs for a long time after completion, sometimes forever > > > Key: HDDS-625 > URL: https://issues.apache.org/jira/browse/HDDS-625 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Reporter: Arpit Agarwal >Priority: Blocker > Attachments: ozone-shell-thread-dump.txt > > > putKey hangs, sometimes forever. > TRACE log output in comment below.
[jira] [Commented] (HDDS-625) putKey hangs for a long time after completion, sometimes forever
[ https://issues.apache.org/jira/browse/HDDS-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1664#comment-1664 ] Tsz Wo Nicholas Sze commented on HDDS-625: -- Filed RATIS-348. > putKey hangs for a long time after completion, sometimes forever > > > Key: HDDS-625 > URL: https://issues.apache.org/jira/browse/HDDS-625 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Reporter: Arpit Agarwal >Priority: Blocker > Attachments: ozone-shell-thread-dump.txt > > > putKey hangs, sometimes forever. > TRACE log output in comment below.
[jira] [Created] (HDDS-632) TimeoutScheduler and SlidingWindow should use daemon threads
Tsz Wo Nicholas Sze created HDDS-632: Summary: TimeoutScheduler and SlidingWindow should use daemon threads Key: HDDS-632 URL: https://issues.apache.org/jira/browse/HDDS-632 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze In HDDS-625, we found that the Ozone client does not terminate. The SlidingWindow (debug) thread and the TimeoutScheduler threads are holding up process termination.
[jira] [Commented] (HDDS-625) putKey hangs for a long time after completion, sometimes forever
[ https://issues.apache.org/jira/browse/HDDS-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646661#comment-16646661 ] Tsz Wo Nicholas Sze commented on HDDS-625: -- It seems that the SlidingWindow (debug) thread and the TimeoutScheduler threads are holding up process termination. Setting them to daemon should fix the problem. > putKey hangs for a long time after completion, sometimes forever > > > Key: HDDS-625 > URL: https://issues.apache.org/jira/browse/HDDS-625 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Reporter: Arpit Agarwal >Priority: Blocker > Attachments: ozone-shell-thread-dump.txt > > > putKey hangs, sometimes forever. > TRACE log output in comment below.
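The daemon-thread fix described in the comment above can be sketched as follows. This is an illustrative sketch only: the class `DaemonSchedulerDemo`, the `daemonFactory` helper, and the thread names are hypothetical stand-ins, not the actual Ratis TimeoutScheduler/SlidingWindow code.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;

public class DaemonSchedulerDemo {
    // Build threads marked as daemon so they cannot keep the JVM alive
    // once the main thread returns.
    static ThreadFactory daemonFactory(String name) {
        return runnable -> {
            Thread t = new Thread(runnable, name);
            t.setDaemon(true);
            return t;
        };
    }

    public static void main(String[] args) {
        // A scheduler backed by a daemon thread no longer holds up
        // process termination; a non-daemon thread here would reproduce
        // the putKey hang reported in HDDS-625.
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(
                daemonFactory("TimeoutScheduler"));
        System.out.println(daemonFactory("SlidingWindow").newThread(() -> { }).isDaemon());
        scheduler.shutdown();
    }
}
```

With plain `Executors.defaultThreadFactory()` the scheduler's worker thread is non-daemon, which is exactly what keeps a short-lived client process from exiting.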
[jira] [Updated] (HDDS-554) In XceiverClientSpi, implements sendCommand(..) using sendCommandAsync(..)
[ https://issues.apache.org/jira/browse/HDDS-554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDDS-554: - Status: Patch Available (was: Open) > In XceiverClientSpi, implements sendCommand(..) using sendCommandAsync(..) > -- > > Key: HDDS-554 > URL: https://issues.apache.org/jira/browse/HDDS-554 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Client >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze >Priority: Major > Attachments: HDDS-554_20180925.patch > > > The advantages are two-fold -- > # it simplifies the code, and > # the async API is more efficient.
[jira] [Updated] (HDDS-554) In XceiverClientSpi, implements sendCommand(..) using sendCommandAsync(..)
[ https://issues.apache.org/jira/browse/HDDS-554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDDS-554: - Attachment: HDDS-554_20180925.patch > In XceiverClientSpi, implements sendCommand(..) using sendCommandAsync(..) > -- > > Key: HDDS-554 > URL: https://issues.apache.org/jira/browse/HDDS-554 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Client >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze >Priority: Major > Attachments: HDDS-554_20180925.patch > > > The advantages are two-fold -- > # it simplifies the code, and > # the async API is more efficient.
[jira] [Created] (HDDS-554) In XceiverClientSpi, implements sendCommand(..) using sendCommandAsync(..)
Tsz Wo Nicholas Sze created HDDS-554: Summary: In XceiverClientSpi, implements sendCommand(..) using sendCommandAsync(..) Key: HDDS-554 URL: https://issues.apache.org/jira/browse/HDDS-554 Project: Hadoop Distributed Data Store Issue Type: Improvement Components: Ozone Client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze The advantages are two-fold -- # it simplifies the code, and # the async API is more efficient.
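The pattern proposed in HDDS-554 -- deriving the blocking call from the async one -- can be sketched as below. This is a hedged sketch, not the real XceiverClientSpi API: the `Client` interface and its `String`-based signatures are hypothetical simplifications.

```java
import java.util.concurrent.CompletableFuture;

public class SyncOverAsyncDemo {
    // Hypothetical stand-in for an async client SPI; the real
    // XceiverClientSpi signatures use protobuf request/response types.
    interface Client {
        CompletableFuture<String> sendCommandAsync(String request);

        // The synchronous variant simply waits on the future, so there
        // is a single code path for issuing commands.
        default String sendCommand(String request) throws Exception {
            return sendCommandAsync(request).get();
        }
    }

    public static void main(String[] args) throws Exception {
        // A toy async implementation that acknowledges every request.
        Client client = req -> CompletableFuture.supplyAsync(() -> "ack:" + req);
        System.out.println(client.sendCommand("putKey"));
    }
}
```

Implementing only the async primitive and blocking on it for the sync case is what the issue description means by "it simplifies the code": callers that can pipeline use the future directly, and the blocking path adds no duplicate logic.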
[jira] [Commented] (HDDS-451) PutKey failed due to error "Rejecting write chunk request. Chunk overwrite without explicit request"
[ https://issues.apache.org/jira/browse/HDDS-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624208#comment-16624208 ] Tsz Wo Nicholas Sze commented on HDDS-451: -- {code} //StateMachine.java /** * Notify the state machine that the raft peer is no longer leader. */ void notifyNotLeader(Collection pendingEntries) throws IOException; {code} ContainerStateMachine should override the above notifyNotLeader(..) so that it can clean up the not-yet-committed stateMachineData. I will check the details further. > PutKey failed due to error "Rejecting write chunk request. Chunk overwrite > without explicit request" > > > Key: HDDS-451 > URL: https://issues.apache.org/jira/browse/HDDS-451 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Affects Versions: 0.2.1 >Reporter: Nilotpal Nandi >Assignee: Shashikant Banerjee >Priority: Blocker > Labels: alpha2 > Attachments: all-node-ozone-logs-1536841590.tar.gz > > > steps taken : > -- > # Ran Put Key command to write 50GB data. Put Key client operation failed > after 17 mins. 
> error seen ozone.log : > > > {code} > 2018-09-13 12:11:53,734 [ForkJoinPool.commonPool-worker-20] DEBUG > (ChunkManagerImpl.java:85) - writing > chunk:bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_1 > chunk stage:COMMIT_DATA chunk > file:/tmp/hadoop-root/dfs/data/hdds/de0a9e01-4a12-40e3-b567-51b9bd83248e/current/containerDir0/16/chunks/bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_1 > tmp chunk file > 2018-09-13 12:11:56,576 [pool-3-thread-60] DEBUG (ChunkManagerImpl.java:85) - > writing > chunk:bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2 > chunk stage:WRITE_DATA chunk > file:/tmp/hadoop-root/dfs/data/hdds/de0a9e01-4a12-40e3-b567-51b9bd83248e/current/containerDir0/16/chunks/bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2 > tmp chunk file > 2018-09-13 12:11:56,739 [ForkJoinPool.commonPool-worker-20] DEBUG > (ChunkManagerImpl.java:85) - writing > chunk:bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2 > chunk stage:COMMIT_DATA chunk > file:/tmp/hadoop-root/dfs/data/hdds/de0a9e01-4a12-40e3-b567-51b9bd83248e/current/containerDir0/16/chunks/bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2 > tmp chunk file > 2018-09-13 12:12:21,410 [Datanode State Machine Thread - 0] DEBUG > (DatanodeStateMachine.java:148) - Executing cycle Number : 206 > 2018-09-13 12:12:51,411 [Datanode State Machine Thread - 0] DEBUG > (DatanodeStateMachine.java:148) - Executing cycle Number : 207 > 2018-09-13 12:12:53,525 [BlockDeletingService#1] DEBUG > (TopNOrderedContainerDeletionChoosingPolicy.java:79) - Stop looking for next > container, there is no pending deletion block contained in remaining > containers. > 2018-09-13 12:12:55,048 [Datanode ReportManager Thread - 1] DEBUG > (ContainerSet.java:191) - Starting container report iteration. 
> 2018-09-13 12:13:02,626 [pool-3-thread-1] ERROR (ChunkUtils.java:244) - > Rejecting write chunk request. Chunk overwrite without explicit request. > ChunkInfo{chunkName='bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2, > offset=0, len=16777216} > 2018-09-13 12:13:03,035 [pool-3-thread-1] INFO (ContainerUtils.java:149) - > Operation: WriteChunk : Trace ID: 54834b29-603d-4ba9-9d68-0885215759d8 : > Message: Rejecting write chunk request. OverWrite flag > required.ChunkInfo{chunkName='bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2, > offset=0, len=16777216} : Result: OVERWRITE_FLAG_REQUIRED > 2018-09-13 12:13:03,037 [ForkJoinPool.commonPool-worker-11] ERROR > (ChunkUtils.java:244) - Rejecting write chunk request. Chunk overwrite > without explicit request. > ChunkInfo{chunkName='bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2, > offset=0, len=16777216} > 2018-09-13 12:13:03,037 [ForkJoinPool.commonPool-worker-11] INFO > (ContainerUtils.java:149) - Operation: WriteChunk : Trace ID: > 54834b29-603d-4ba9-9d68-0885215759d8 : Message: Rejecting write chunk > request. OverWrite flag > required.ChunkInfo{chunkName='bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2, > offset=0, len=16777216} : Result: OVERWRITE_FLAG_REQUIRED > > {code} >
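The cleanup idea from the comment above -- override `notifyNotLeader(..)` to drop cached data for entries that will never commit -- can be sketched as follows. This is a hypothetical simplification: `SketchStateMachine`, the `Long` log indices, and the method names other than `notifyNotLeader` are illustrative stand-ins, not the real Ratis/ContainerStateMachine types.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class NotLeaderCleanupDemo {
    // Toy state machine that caches per-log-index state machine data
    // before the corresponding entries are committed.
    static class SketchStateMachine {
        private final List<Long> pendingStateMachineData = new ArrayList<>();

        void cacheStateMachineData(long logIndex) {
            pendingStateMachineData.add(logIndex);
        }

        // Called when this peer loses leadership: the cached data for
        // the still-pending entries can be discarded, since those
        // entries will never commit under this peer's leadership.
        void notifyNotLeader(Collection<Long> pendingEntries) throws IOException {
            pendingStateMachineData.removeAll(pendingEntries);
        }

        int pendingCount() {
            return pendingStateMachineData.size();
        }
    }

    public static void main(String[] args) throws IOException {
        SketchStateMachine sm = new SketchStateMachine();
        sm.cacheStateMachineData(7);
        sm.cacheStateMachineData(8);
        sm.notifyNotLeader(List.of(8L));   // entry 8 never committed
        System.out.println(sm.pendingCount());
    }
}
```

Without such cleanup, stale uncommitted chunk data can survive a leadership change and later collide with a re-sent write, which matches the OVERWRITE_FLAG_REQUIRED rejections in the log above.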
[jira] [Commented] (HDDS-368) all tests in TestOzoneRestClient failed due to "zh_CN" OS language
[ https://issues.apache.org/jira/browse/HDDS-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16622823#comment-16622823 ] Tsz Wo Nicholas Sze commented on HDDS-368: -- > Java version: 1.8.0_111 Could you also try updating it? Mine is 1.8.0_172. > FYI, once a string transferred over HTTP contains a Chinese character (or any character outside English letters and digits), "string".length() will be shorter than "string".getBytes().length, so the data gets truncated in transfer and the error occurs. Do you see a way to fix it? > all tests in TestOzoneRestClient failed due to "zh_CN" OS language > -- > > Key: HDDS-368 > URL: https://issues.apache.org/jira/browse/HDDS-368 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Affects Versions: 0.2.1 >Reporter: LiXin Ge >Priority: Critical > Labels: alpha2 > > OS: Ubuntu 16.04.1 LTS (GNU/Linux 4.4.0-116-generic x86_64) > java version: 1.8.0_111 > mvn: Apache Maven 3.3.9 > Default locale: zh_CN, platform encoding: UTF-8 > Test command: mvn test -Dtest=TestOzoneRestClient -Phdds > > All the tests in TestOzoneRestClient failed in my local machine with > exception like below, does it mean anybody who have runtime environment like > me can't run the Ozone Rest test now? > {noformat} > [ERROR] > testCreateBucket(org.apache.hadoop.ozone.client.rest.TestOzoneRestClient) > Time elapsed: 0.01 s <<< ERROR! 
> java.io.IOException: org.apache.hadoop.ozone.client.rest.OzoneException: > Unparseable date: "m, 28 1970 19:23:50 GMT" > at > org.apache.hadoop.ozone.client.rest.RestClient.executeHttpRequest(RestClient.java:853) > at > org.apache.hadoop.ozone.client.rest.RestClient.createVolume(RestClient.java:252) > at > org.apache.hadoop.ozone.client.rest.RestClient.createVolume(RestClient.java:210) > at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.ozone.client.OzoneClientInvocationHandler.invoke(OzoneClientInvocationHandler.java:54) > at com.sun.proxy.$Proxy73.createVolume(Unknown Source) > at > org.apache.hadoop.ozone.client.ObjectStore.createVolume(ObjectStore.java:66) > at > org.apache.hadoop.ozone.client.rest.TestOzoneRestClient.testCreateBucket(TestOzoneRestClient.java:174) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > Caused by: org.apache.hadoop.ozone.client.rest.OzoneException: Unparseable > date: "m, 28 1970 19:23:50 GMT" > at sun.reflect.GeneratedConstructorAccessor27.newInstance(Unknown > Source) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > com.fasterxml.jackson.databind.introspect.AnnotatedConstructor.call(AnnotatedConstructor.java:119) > at > com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.createUsingDefault(StdValueInstantiator.java:270) > at > 
com.fasterxml.jackson.databind.deser.std.ThrowableDeserializer.deserializeFromObject(ThrowableDeserializer.java:149) > at > com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:159) > at > com.fasterxml.jackson.databind.ObjectReader._bindAndClose(ObjectReader.java:1611) > at > com.fasterxml.jackson.databind.ObjectReader.readValue(ObjectReader.java:1219) > at > org.apache.hadoop.ozone.client.rest.OzoneException.parse(OzoneException.java:265) > ... 39 more > {noformat} > or like: > {noformat} > [ERROR] Failures: > [ERROR] TestOzoneRestClient.testDeleteKey > Expected: exception with message a string containing "Lookup key failed, > error" > but: message was "Unexpected end-of-input within/between Object entries > at [Source: (String)"{ > "owner" : { > "name" : "hadoop" > }, > "quota" : { > "unit" : "TB", > "size" : 1048576 > }, > "volumeName" : "f93ed82d-dff6-4b75-a1c5-6a0fef5aa6dd", > "createdOn" : "���, 06 ��� +50611 08:28:21 GMT", > "createdBy" "; line: 11, column: 251]" > Stacktrace was: com.fasterxml.jackson.core.io.JsonEOFException: Unexpected > end-of-input within/between Object entries > at [Source: (String)"{ > "owner" : { > "name" : "hadoop" >
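The length-versus-byte-count mismatch quoted in the HDDS-368 comment above can be demonstrated directly. This is an illustrative sketch (class name `LengthMismatchDemo` is hypothetical); it shows why sizing an HTTP payload with `String.length()` truncates non-ASCII data.

```java
import java.nio.charset.StandardCharsets;

public class LengthMismatchDemo {
    public static void main(String[] args) {
        String ascii = "hadoop";
        String chinese = "\u4e2d\u6587"; // "中文": two characters

        // For ASCII, UTF-16 code units and UTF-8 bytes agree (6 vs 6)...
        System.out.println(ascii.length() + " vs "
                + ascii.getBytes(StandardCharsets.UTF_8).length);
        // ...but each of these Chinese characters takes 3 bytes in UTF-8
        // (2 vs 6), so a buffer sized by length() drops most of the data.
        System.out.println(chinese.length() + " vs "
                + chinese.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

This is consistent with the garbled `createdOn` dates in the failures above: once the body is truncated mid-payload, the JSON parser sees an unexpected end-of-input or an unparseable date string.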
[jira] [Commented] (HDDS-451) PutKey failed due to error "Rejecting write chunk request. Chunk overwrite without explicit request"
[ https://issues.apache.org/jira/browse/HDDS-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16621011#comment-16621011 ] Tsz Wo Nicholas Sze commented on HDDS-451: -- Then, we should log it for easier debugging. > PutKey failed due to error "Rejecting write chunk request. Chunk overwrite > without explicit request" > > > Key: HDDS-451 > URL: https://issues.apache.org/jira/browse/HDDS-451 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Affects Versions: 0.2.1 >Reporter: Nilotpal Nandi >Assignee: Shashikant Banerjee >Priority: Blocker > Labels: alpha2 > Attachments: all-node-ozone-logs-1536841590.tar.gz > > > steps taken : > -- > # Ran Put Key command to write 50GB data. Put Key client operation failed > after 17 mins. > error seen ozone.log : > > > {code} > 2018-09-13 12:11:53,734 [ForkJoinPool.commonPool-worker-20] DEBUG > (ChunkManagerImpl.java:85) - writing > chunk:bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_1 > chunk stage:COMMIT_DATA chunk > file:/tmp/hadoop-root/dfs/data/hdds/de0a9e01-4a12-40e3-b567-51b9bd83248e/current/containerDir0/16/chunks/bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_1 > tmp chunk file > 2018-09-13 12:11:56,576 [pool-3-thread-60] DEBUG (ChunkManagerImpl.java:85) - > writing > chunk:bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2 > chunk stage:WRITE_DATA chunk > file:/tmp/hadoop-root/dfs/data/hdds/de0a9e01-4a12-40e3-b567-51b9bd83248e/current/containerDir0/16/chunks/bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2 > tmp chunk file > 2018-09-13 12:11:56,739 [ForkJoinPool.commonPool-worker-20] DEBUG > (ChunkManagerImpl.java:85) - writing > chunk:bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2 > chunk stage:COMMIT_DATA chunk > 
file:/tmp/hadoop-root/dfs/data/hdds/de0a9e01-4a12-40e3-b567-51b9bd83248e/current/containerDir0/16/chunks/bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2 > tmp chunk file > 2018-09-13 12:12:21,410 [Datanode State Machine Thread - 0] DEBUG > (DatanodeStateMachine.java:148) - Executing cycle Number : 206 > 2018-09-13 12:12:51,411 [Datanode State Machine Thread - 0] DEBUG > (DatanodeStateMachine.java:148) - Executing cycle Number : 207 > 2018-09-13 12:12:53,525 [BlockDeletingService#1] DEBUG > (TopNOrderedContainerDeletionChoosingPolicy.java:79) - Stop looking for next > container, there is no pending deletion block contained in remaining > containers. > 2018-09-13 12:12:55,048 [Datanode ReportManager Thread - 1] DEBUG > (ContainerSet.java:191) - Starting container report iteration. > 2018-09-13 12:13:02,626 [pool-3-thread-1] ERROR (ChunkUtils.java:244) - > Rejecting write chunk request. Chunk overwrite without explicit request. > ChunkInfo{chunkName='bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2, > offset=0, len=16777216} > 2018-09-13 12:13:03,035 [pool-3-thread-1] INFO (ContainerUtils.java:149) - > Operation: WriteChunk : Trace ID: 54834b29-603d-4ba9-9d68-0885215759d8 : > Message: Rejecting write chunk request. OverWrite flag > required.ChunkInfo{chunkName='bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2, > offset=0, len=16777216} : Result: OVERWRITE_FLAG_REQUIRED > 2018-09-13 12:13:03,037 [ForkJoinPool.commonPool-worker-11] ERROR > (ChunkUtils.java:244) - Rejecting write chunk request. Chunk overwrite > without explicit request. 
> ChunkInfo{chunkName='bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2, > offset=0, len=16777216} > 2018-09-13 12:13:03,037 [ForkJoinPool.commonPool-worker-11] INFO > (ContainerUtils.java:149) - Operation: WriteChunk : Trace ID: > 54834b29-603d-4ba9-9d68-0885215759d8 : Message: Rejecting write chunk > request. OverWrite flag > required.ChunkInfo{chunkName='bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2, > offset=0, len=16777216} : Result: OVERWRITE_FLAG_REQUIRED > > {code} >
[jira] [Comment Edited] (HDDS-368) all tests in TestOzoneRestClient failed due to "zh_CN" OS language
[ https://issues.apache.org/jira/browse/HDDS-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620987#comment-16620987 ] Tsz Wo Nicholas Sze edited comment on HDDS-368 at 9/19/18 6:22 PM: --- {code} $mvn --version Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T11:33:14-07:00) Maven home: /usr/local/Cellar/maven/3.5.4/libexec Java version: 1.8.0_172, vendor: Oracle Corporation, runtime: /Library/Java/JavaVirtualMachines/jdk1.8.0_172.jdk/Contents/Home/jre Default locale: zh_CN, platform encoding: UTF-8 OS name: "mac os x", version: "10.13.6", arch: "x86_64", family: "mac" {code} [~GeLiXin], I have set my locale to zh_CN. I can see some compiler warnings in Chinese but TestOzoneRestClient has not failed. Could you try updating your maven/java versions? {code} [INFO] Compiling 23 source files to /Users/szetszwo/hadoop/apache-hadoop/hadoop-ozone/integration-test/target/test-classes [WARNING] /Users/szetszwo/hadoop/apache-hadoop/hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/web/client/TestKeys.java: 某些输入文件使用了未经检查或不安全的操作。 [WARNING] /Users/szetszwo/hadoop/apache-hadoop/hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/web/client/TestKeys.java: 有关详细信息, 请使用 -Xlint:unchecked 重新编译。 [INFO] [INFO] --- maven-surefire-plugin:2.21.0:test (default-test) @ hadoop-ozone-integration-test --- [INFO] [INFO] --- [INFO] T E S T S [INFO] --- [INFO] Running org.apache.hadoop.ozone.client.rest.TestOzoneRestClient [INFO] Tests run: 21, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 12.136 s - in org.apache.hadoop.ozone.client.rest.TestOzoneRestClient [INFO] [INFO] Results: [INFO] [INFO] Tests run: 21, Failures: 0, Errors: 0, Skipped: 0 {code} was (Author: szetszwo): {code} $mvn --version Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T11:33:14-07:00) Maven home: /usr/local/Cellar/maven/3.5.4/libexec Java version: 1.8.0_172, vendor: Oracle Corporation, runtime: 
/Library/Java/JavaVirtualMachines/jdk1.8.0_172.jdk/Contents/Home/jre Default locale: zh_CN, platform encoding: UTF-8 OS name: "mac os x", version: "10.13.6", arch: "x86_64", family: "mac" {code} [~GeLiXin], I have set my locale to zh_CN. I can see some compiler warnings in Chinese but TestOzoneRestClient have not failed. Could you try updating your maven/java versions? {code} [INFO] Compiling 23 source files to /Users/szetszwo/hadoop/apache-hadoop/hadoop-ozone/integration-test/target/test-classes [WARNING] /Users/szetszwo/hadoop/apache-hadoop/hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/web/client/TestKeys.java: 某些输入文件使用了未经检查或不安全的操作。 [WARNING] /Users/szetszwo/hadoop/apache-hadoop/hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/web/client/TestKeys.java: 有关详细信息, 请使用 -Xlint:unchecked 重新编译。 [INFO] [INFO] --- maven-surefire-plugin:2.21.0:test (default-test) @ hadoop-ozone-integration-test --- [INFO] [INFO] --- [INFO] T E S T S [INFO] --- [INFO] Running org.apache.hadoop.ozone.client.rest.TestOzoneRestClient [INFO] Tests run: 21, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 12.136 s - in org.apache.hadoop.ozone.client.rest.TestOzoneRestClient [INFO] [INFO] Results: [INFO] [INFO] Tests run: 21, Failures: 0, Errors: 0, Skipped: 0 [INFO] {code} > all tests in TestOzoneRestClient failed due to "zh_CN" OS language > -- > > Key: HDDS-368 > URL: https://issues.apache.org/jira/browse/HDDS-368 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Affects Versions: 0.2.1 >Reporter: LiXin Ge >Priority: Critical > Labels: alpha2 > > OS: Ubuntu 16.04.1 LTS (GNU/Linux 4.4.0-116-generic x86_64) > java version: 1.8.0_111 > mvn: Apache Maven 3.3.9 > Default locale: zh_CN, platform encoding: UTF-8 > Test command: mvn test -Dtest=TestOzoneRestClient -Phdds > > All the tests in TestOzoneRestClient failed in my local machine with > exception like below, does it mean anybody who have runtime environment like > me can't 
run the Ozone Rest test now? > {noformat} > [ERROR] > testCreateBucket(org.apache.hadoop.ozone.client.rest.TestOzoneRestClient) > Time elapsed: 0.01 s <<< ERROR! > java.io.IOException: org.apache.hadoop.ozone.client.rest.OzoneException: > Unparseable date: "m, 28 1970 19:23:50 GMT" > at > org.apache.hadoop.ozone.client.rest.RestClient.executeHttpRequest(RestClient.java:853) > at > org.apache.hadoop.ozone.client.rest.RestClient.createVolume(RestClient.java:252) > at >