[jira] [Commented] (HDDS-2152) Ozone client fails with OOM while writing a large (~300MB) key.

2019-09-24 Thread Tsz Wo Nicholas Sze (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937255#comment-16937255
 ] 

Tsz Wo Nicholas Sze commented on HDDS-2152:
---

HDDS-2169 has a patch that will address a buffer copy.

> Ozone client fails with OOM while writing a large (~300MB) key.
> ---
>
> Key: HDDS-2152
> URL: https://issues.apache.org/jira/browse/HDDS-2152
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Aravindan Vijayan
>Assignee: YiSheng Lien
>Priority: Major
> Attachments: largekey.png
>
>
> {code}
> dd if=/dev/zero of=testfile bs=1024 count=307200
> ozone sh key put /vol1/bucket1/key testfile
> {code}
> {code}
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>   at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
>   at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
>   at org.apache.hadoop.hdds.scm.storage.BufferPool.allocateBufferIfNeeded(BufferPool.java:66)
>   at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.write(BlockOutputStream.java:234)
>   at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.write(BlockOutputStreamEntry.java:129)
>   at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:211)
>   at org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:193)
>   at org.apache.hadoop.ozone.client.io.OzoneOutputStream.write(OzoneOutputStream.java:49)
>   at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:96)
>   at org.apache.hadoop.ozone.web.ozShell.keys.PutKeyHandler.call(PutKeyHandler.java:117)
>   at org.apache.hadoop.ozone.web.ozShell.keys.PutKeyHandler.call(PutKeyHandler.java:55)
>   at picocli.CommandLine.execute(CommandLine.java:1173)
>   at picocli.CommandLine.access$800(CommandLine.java:141)
> {code}
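
For reference, the dd command above produces a 1024 × 307200 = 314,572,800 byte 
(300 MiB) file, matching the ~300MB key size in the summary.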



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2169) Avoid buffer copies while submitting client requests in Ratis

2019-09-24 Thread Tsz Wo Nicholas Sze (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937249#comment-16937249
 ] 

Tsz Wo Nicholas Sze commented on HDDS-2169:
---

[~msingh], thanks for taking a look.  The patch does apply.  Have you tried it?

Anyway, I have just submitted a pull request: 
https://github.com/apache/hadoop/pull/1517

> Also this problem needs to be fixed for appendEntries from leader to follower 
> as well.

Sure, let's fix it in a separate JIRA.
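
For context, here is a minimal sketch of the idea (the class and field names 
are hypothetical; this is not the code in the pull request): keep the bulk data 
as a separate ByteString and serialize only the small header, so the large 
payload is never re-copied when the Ratis Message content is built.
{code:java}
import org.apache.ratis.protocol.Message;
import org.apache.ratis.thirdparty.com.google.protobuf.ByteString;

// Sketch only: a Message that defers combining header and data until
// getContent() is called, and even then uses rope concatenation, which
// does not copy the underlying buffers.
final class LazyDataMessage implements Message {
  private final ByteString header; // small, serialized request metadata
  private final ByteString data;   // large chunk payload, wrapped, not copied

  LazyDataMessage(ByteString header, ByteString data) {
    this.header = header;
    this.data = data;
  }

  @Override
  public ByteString getContent() {
    return header.concat(data); // rope concat: no buffer copy
  }
}
{code}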



> Avoid buffer copies while submitting client requests in Ratis
> -
>
> Key: HDDS-2169
> URL: https://issues.apache.org/jira/browse/HDDS-2169
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Shashikant Banerjee
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
>  Labels: pull-request-available
> Attachments: o2169_20190923.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, while sending write requests to Ratis from Ozone, the data is 
> encoded into a protobuf object, and the resultant protobuf is then converted 
> to a ByteString, which internally copies the buffer embedded inside the 
> protobuf again so that it can be submitted to the Ratis client. Similarly, 
> while building up the appendRequestProto for the appendRequest, the data 
> might be copied yet again. The idea here is to let the client pass the raw 
> data (stateMachine data) separately to the Ratis client without the copying 
> overhead.
>  
> {code:java}
> private CompletableFuture<RaftClientReply> sendRequestAsync(
> ContainerCommandRequestProto request) {
>   try (Scope scope = GlobalTracer.get()
>   .buildSpan("XceiverClientRatis." + request.getCmdType().name())
>   .startActive(true)) {
> ContainerCommandRequestProto finalPayload =
> ContainerCommandRequestProto.newBuilder(request)
> .setTraceID(TracingUtil.exportCurrentSpan())
> .build();
> boolean isReadOnlyRequest = HddsUtils.isReadOnly(finalPayload);
> // finalPayload already has the byteString data embedded;
> // toByteString() involves a copy again.
> ByteString byteString = finalPayload.toByteString();
> if (LOG.isDebugEnabled()) {
>   LOG.debug("sendCommandAsync {} {}", isReadOnlyRequest,
>   sanitizeForDebug(finalPayload));
> }
> return isReadOnlyRequest ?
> getClient().sendReadOnlyAsync(() -> byteString) :
> getClient().sendAsync(() -> byteString);
>   }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work started] (HDDS-2169) Avoid buffer copies while submitting client requests in Ratis

2019-09-23 Thread Tsz Wo Nicholas Sze (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDDS-2169 started by Tsz Wo Nicholas Sze.
-
> Avoid buffer copies while submitting client requests in Ratis
> -
>
> Key: HDDS-2169
> URL: https://issues.apache.org/jira/browse/HDDS-2169
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Shashikant Banerjee
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Attachments: o2169_20190923.patch
>
>
> Currently, while sending write requests to Ratis from Ozone, the data is 
> encoded into a protobuf object, and the resultant protobuf is then converted 
> to a ByteString, which internally copies the buffer embedded inside the 
> protobuf again so that it can be submitted to the Ratis client. Similarly, 
> while building up the appendRequestProto for the appendRequest, the data 
> might be copied yet again. The idea here is to let the client pass the raw 
> data (stateMachine data) separately to the Ratis client without the copying 
> overhead.
>  
> {code:java}
> private CompletableFuture<RaftClientReply> sendRequestAsync(
> ContainerCommandRequestProto request) {
>   try (Scope scope = GlobalTracer.get()
>   .buildSpan("XceiverClientRatis." + request.getCmdType().name())
>   .startActive(true)) {
> ContainerCommandRequestProto finalPayload =
> ContainerCommandRequestProto.newBuilder(request)
> .setTraceID(TracingUtil.exportCurrentSpan())
> .build();
> boolean isReadOnlyRequest = HddsUtils.isReadOnly(finalPayload);
> // finalPayload already has the byteString data embedded;
> // toByteString() involves a copy again.
> ByteString byteString = finalPayload.toByteString();
> if (LOG.isDebugEnabled()) {
>   LOG.debug("sendCommandAsync {} {}", isReadOnlyRequest,
>   sanitizeForDebug(finalPayload));
> }
> return isReadOnlyRequest ?
> getClient().sendReadOnlyAsync(() -> byteString) :
> getClient().sendAsync(() -> byteString);
>   }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2169) Avoid buffer copies while submitting client requests in Ratis

2019-09-23 Thread Tsz Wo Nicholas Sze (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDDS-2169:
--
Status: Patch Available  (was: Open)

o2169_20190923.patch: 1st patch.

> Avoid buffer copies while submitting client requests in Ratis
> -
>
> Key: HDDS-2169
> URL: https://issues.apache.org/jira/browse/HDDS-2169
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Shashikant Banerjee
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Attachments: o2169_20190923.patch
>
>
> Currently, while sending write requests to Ratis from Ozone, the data is 
> encoded into a protobuf object, and the resultant protobuf is then converted 
> to a ByteString, which internally copies the buffer embedded inside the 
> protobuf again so that it can be submitted to the Ratis client. Similarly, 
> while building up the appendRequestProto for the appendRequest, the data 
> might be copied yet again. The idea here is to let the client pass the raw 
> data (stateMachine data) separately to the Ratis client without the copying 
> overhead.
>  
> {code:java}
> private CompletableFuture<RaftClientReply> sendRequestAsync(
> ContainerCommandRequestProto request) {
>   try (Scope scope = GlobalTracer.get()
>   .buildSpan("XceiverClientRatis." + request.getCmdType().name())
>   .startActive(true)) {
> ContainerCommandRequestProto finalPayload =
> ContainerCommandRequestProto.newBuilder(request)
> .setTraceID(TracingUtil.exportCurrentSpan())
> .build();
> boolean isReadOnlyRequest = HddsUtils.isReadOnly(finalPayload);
> // finalPayload already has the byteString data embedded;
> // toByteString() involves a copy again.
> ByteString byteString = finalPayload.toByteString();
> if (LOG.isDebugEnabled()) {
>   LOG.debug("sendCommandAsync {} {}", isReadOnlyRequest,
>   sanitizeForDebug(finalPayload));
> }
> return isReadOnlyRequest ?
> getClient().sendReadOnlyAsync(() -> byteString) :
> getClient().sendAsync(() -> byteString);
>   }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work stopped] (HDDS-2169) Avoid buffer copies while submitting client requests in Ratis

2019-09-23 Thread Tsz Wo Nicholas Sze (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDDS-2169 stopped by Tsz Wo Nicholas Sze.
-
> Avoid buffer copies while submitting client requests in Ratis
> -
>
> Key: HDDS-2169
> URL: https://issues.apache.org/jira/browse/HDDS-2169
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Shashikant Banerjee
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Attachments: o2169_20190923.patch
>
>
> Currently, while sending write requests to Ratis from Ozone, the data is 
> encoded into a protobuf object, and the resultant protobuf is then converted 
> to a ByteString, which internally copies the buffer embedded inside the 
> protobuf again so that it can be submitted to the Ratis client. Similarly, 
> while building up the appendRequestProto for the appendRequest, the data 
> might be copied yet again. The idea here is to let the client pass the raw 
> data (stateMachine data) separately to the Ratis client without the copying 
> overhead.
>  
> {code:java}
> private CompletableFuture<RaftClientReply> sendRequestAsync(
> ContainerCommandRequestProto request) {
>   try (Scope scope = GlobalTracer.get()
>   .buildSpan("XceiverClientRatis." + request.getCmdType().name())
>   .startActive(true)) {
> ContainerCommandRequestProto finalPayload =
> ContainerCommandRequestProto.newBuilder(request)
> .setTraceID(TracingUtil.exportCurrentSpan())
> .build();
> boolean isReadOnlyRequest = HddsUtils.isReadOnly(finalPayload);
> // finalPayload already has the byteString data embedded;
> // toByteString() involves a copy again.
> ByteString byteString = finalPayload.toByteString();
> if (LOG.isDebugEnabled()) {
>   LOG.debug("sendCommandAsync {} {}", isReadOnlyRequest,
>   sanitizeForDebug(finalPayload));
> }
> return isReadOnlyRequest ?
> getClient().sendReadOnlyAsync(() -> byteString) :
> getClient().sendAsync(() -> byteString);
>   }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2169) Avoid buffer copies while submitting client requests in Ratis

2019-09-23 Thread Tsz Wo Nicholas Sze (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDDS-2169:
--
Attachment: o2169_20190923.patch

> Avoid buffer copies while submitting client requests in Ratis
> -
>
> Key: HDDS-2169
> URL: https://issues.apache.org/jira/browse/HDDS-2169
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Shashikant Banerjee
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Attachments: o2169_20190923.patch
>
>
> Currently, while sending write requests to Ratis from Ozone, the data is 
> encoded into a protobuf object, and the resultant protobuf is then converted 
> to a ByteString, which internally copies the buffer embedded inside the 
> protobuf again so that it can be submitted to the Ratis client. Similarly, 
> while building up the appendRequestProto for the appendRequest, the data 
> might be copied yet again. The idea here is to let the client pass the raw 
> data (stateMachine data) separately to the Ratis client without the copying 
> overhead.
>  
> {code:java}
> private CompletableFuture<RaftClientReply> sendRequestAsync(
> ContainerCommandRequestProto request) {
>   try (Scope scope = GlobalTracer.get()
>   .buildSpan("XceiverClientRatis." + request.getCmdType().name())
>   .startActive(true)) {
> ContainerCommandRequestProto finalPayload =
> ContainerCommandRequestProto.newBuilder(request)
> .setTraceID(TracingUtil.exportCurrentSpan())
> .build();
> boolean isReadOnlyRequest = HddsUtils.isReadOnly(finalPayload);
> // finalPayload already has the byteString data embedded;
> // toByteString() involves a copy again.
> ByteString byteString = finalPayload.toByteString();
> if (LOG.isDebugEnabled()) {
>   LOG.debug("sendCommandAsync {} {}", isReadOnlyRequest,
>   sanitizeForDebug(finalPayload));
> }
> return isReadOnlyRequest ?
> getClient().sendReadOnlyAsync(() -> byteString) :
> getClient().sendAsync(() -> byteString);
>   }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Moved] (HDDS-2169) Avoid buffer copies while submitting client requests in Ratis

2019-09-23 Thread Tsz Wo Nicholas Sze (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze moved RATIS-688 to HDDS-2169:
-

 Component/s: (was: server)
  (was: client)
   Fix Version/s: (was: 0.4.0)
 Key: HDDS-2169  (was: RATIS-688)
Target Version/s:   (was: 0.4.0)
Workflow: patch-available, re-open possible  (was: 
no-reopen-closed, patch-avail)
  Issue Type: Improvement  (was: Bug)
 Project: Hadoop Distributed Data Store  (was: Ratis)

> Avoid buffer copies while submitting client requests in Ratis
> -
>
> Key: HDDS-2169
> URL: https://issues.apache.org/jira/browse/HDDS-2169
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Shashikant Banerjee
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
>
> Currently, while sending write requests to Ratis from Ozone, the data is 
> encoded into a protobuf object, and the resultant protobuf is then converted 
> to a ByteString, which internally copies the buffer embedded inside the 
> protobuf again so that it can be submitted to the Ratis client. Similarly, 
> while building up the appendRequestProto for the appendRequest, the data 
> might be copied yet again. The idea here is to let the client pass the raw 
> data (stateMachine data) separately to the Ratis client without the copying 
> overhead.
>  
> {code:java}
> private CompletableFuture<RaftClientReply> sendRequestAsync(
> ContainerCommandRequestProto request) {
>   try (Scope scope = GlobalTracer.get()
>   .buildSpan("XceiverClientRatis." + request.getCmdType().name())
>   .startActive(true)) {
> ContainerCommandRequestProto finalPayload =
> ContainerCommandRequestProto.newBuilder(request)
> .setTraceID(TracingUtil.exportCurrentSpan())
> .build();
> boolean isReadOnlyRequest = HddsUtils.isReadOnly(finalPayload);
> // finalPayload already has the byteString data embedded;
> // toByteString() involves a copy again.
> ByteString byteString = finalPayload.toByteString();
> if (LOG.isDebugEnabled()) {
>   LOG.debug("sendCommandAsync {} {}", isReadOnlyRequest,
>   sanitizeForDebug(finalPayload));
> }
> return isReadOnlyRequest ?
> getClient().sendReadOnlyAsync(() -> byteString) :
> getClient().sendAsync(() -> byteString);
>   }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13101) Yet another fsimage corruption related to snapshot

2019-08-15 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908523#comment-16908523
 ] 

Tsz Wo Nicholas Sze commented on HDFS-13101:


> ... Do you plan to cherrypick the commit into lower branches?  I am happy to 
> help out ...

[~jojochuang], sounds good.  Please help.  Thanks a lot!

> Yet another fsimage corruption related to snapshot
> --
>
> Key: HDFS-13101
> URL: https://issues.apache.org/jira/browse/HDFS-13101
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Yongjun Zhang
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-13101.001.patch, HDFS-13101.002.patch, 
> HDFS-13101.003.patch, HDFS-13101.004.patch, 
> HDFS-13101.corruption_repro.patch, 
> HDFS-13101.corruption_repro_simplified.patch
>
>
> Lately we saw a case similar to HDFS-9406 even though the HDFS-9406 fix is 
> present, so it's likely another case not covered by that fix. We are 
> currently trying to collect a good fsimage + editlogs to replay, so that we 
> can reproduce and investigate it.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13101) Yet another fsimage corruption related to snapshot

2019-08-07 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902330#comment-16902330
 ] 

Tsz Wo Nicholas Sze commented on HDFS-13101:


[~shashikant], great work on the patch!  Could you fix the checkstyle warnings 
and see if the unit test failures are related?

> Yet another fsimage corruption related to snapshot
> --
>
> Key: HDFS-13101
> URL: https://issues.apache.org/jira/browse/HDFS-13101
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Yongjun Zhang
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-13101.001.patch, HDFS-13101.002.patch, 
> HDFS-13101.003.patch, HDFS-13101.corruption_repro.patch, 
> HDFS-13101.corruption_repro_simplified.patch
>
>
> Lately we saw a case similar to HDFS-9406 even though the HDFS-9406 fix is 
> present, so it's likely another case not covered by that fix. We are 
> currently trying to collect a good fsimage + editlogs to replay, so that we 
> can reproduce and investigate it.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13101) Yet another fsimage corruption related to snapshot

2019-08-05 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900382#comment-16900382
 ] 

Tsz Wo Nicholas Sze commented on HDFS-13101:


+1 the 003 patch looks good.  Pending Jenkins.

> Yet another fsimage corruption related to snapshot
> --
>
> Key: HDFS-13101
> URL: https://issues.apache.org/jira/browse/HDFS-13101
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Yongjun Zhang
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-13101.001.patch, HDFS-13101.002.patch, 
> HDFS-13101.003.patch, HDFS-13101.corruption_repro.patch, 
> HDFS-13101.corruption_repro_simplified.patch
>
>
> Lately we saw a case similar to HDFS-9406 even though the HDFS-9406 fix is 
> present, so it's likely another case not covered by that fix. We are 
> currently trying to collect a good fsimage + editlogs to replay, so that we 
> can reproduce and investigate it.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13101) Yet another fsimage corruption related to snapshot

2019-08-02 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899171#comment-16899171
 ] 

Tsz Wo Nicholas Sze commented on HDFS-13101:


The fix could be
{code}
// DirectoryWithSnapshotFeature#cleanDirectory().
    if (priorCreated != null) {
-     // we only check the node originally in prior's created list
-     for (INode cNode : priorDiff.diff.getCreatedUnmodifiable()) {
-       if (priorCreated.containsKey(cNode)) {
-         cNode.cleanSubtree(reclaimContext, snapshot, NO_SNAPSHOT_ID);
+     if (currentINode.isLastReference()) {
+       // if this is the last reference, the created list can be destroyed.
+       priorDiff.getChildrenDiff().destroyCreatedList(
+           reclaimContext, currentINode);
+     } else {
+       // we only check the node originally in prior's created list
+       for (INode cNode : priorDiff.diff.getCreatedUnmodifiable()) {
+         if (priorCreated.containsKey(cNode)) {
+           cNode.cleanSubtree(reclaimContext, snapshot, NO_SNAPSHOT_ID);
+         }
        }
      }
    }
{code}
where isLastReference() is a new method in INode.
{code}
//INode.java
  /**
   * @return true if this is a reference and the reference count is 1;
   * otherwise, return false.
   */
  public boolean isLastReference() {
final INodeReference ref = getParentReference();
if (!(ref instanceof WithCount)) {
  return false;
}
return ((WithCount)ref).getReferenceCount() == 1;
  }
{code}

> Yet another fsimage corruption related to snapshot
> --
>
> Key: HDFS-13101
> URL: https://issues.apache.org/jira/browse/HDFS-13101
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Yongjun Zhang
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-13101.001.patch, HDFS-13101.corruption_repro.patch, 
> HDFS-13101.corruption_repro_simplified.patch
>
>
> Lately we saw a case similar to HDFS-9406 even though the HDFS-9406 fix is 
> present, so it's likely another case not covered by that fix. We are 
> currently trying to collect a good fsimage + editlogs to replay, so that we 
> can reproduce and investigate it.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13101) Yet another fsimage corruption related to snapshot

2019-08-02 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899168#comment-16899168
 ] 

Tsz Wo Nicholas Sze commented on HDFS-13101:


I agree that the bug is in DirectoryWithSnapshotFeature#cleanDirectory().  When 
the directory is the last reference, the entire created list should be 
destroyed instead of cleaning the individual cNodes.

> Yet another fsimage corruption related to snapshot
> --
>
> Key: HDFS-13101
> URL: https://issues.apache.org/jira/browse/HDFS-13101
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Yongjun Zhang
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-13101.001.patch, HDFS-13101.corruption_repro.patch, 
> HDFS-13101.corruption_repro_simplified.patch
>
>
> Lately we saw a case similar to HDFS-9406 even though the HDFS-9406 fix is 
> present, so it's likely another case not covered by that fix. We are 
> currently trying to collect a good fsimage + editlogs to replay, so that we 
> can reproduce and investigate it.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-13101) Yet another fsimage corruption related to snapshot

2019-08-02 Thread Tsz Wo Nicholas Sze (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze reassigned HDFS-13101:
--

   Assignee: Shashikant Banerjee  (was: Siyao Meng)
Component/s: snapshots

[~shashikant], it is great that you have come up with a small unit test showing 
the bug!  Assigning this to you ...

> Yet another fsimage corruption related to snapshot
> --
>
> Key: HDFS-13101
> URL: https://issues.apache.org/jira/browse/HDFS-13101
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Yongjun Zhang
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-13101.001.patch, HDFS-13101.corruption_repro.patch, 
> HDFS-13101.corruption_repro_simplified.patch
>
>
> Lately we saw a case similar to HDFS-9406 even though the HDFS-9406 fix is 
> present, so it's likely another case not covered by that fix. We are 
> currently trying to collect a good fsimage + editlogs to replay, so that we 
> can reproduce and investigate it.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14499) Misleading REM_QUOTA value with snapshot and trash feature enabled for a directory

2019-07-11 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16883334#comment-16883334
 ] 

Tsz Wo Nicholas Sze commented on HDFS-14499:


+1 the 003 patch looks good.

> Misleading REM_QUOTA value with snapshot and trash feature enabled for a 
> directory
> --
>
> Key: HDFS-14499
> URL: https://issues.apache.org/jira/browse/HDFS-14499
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-14499.000.patch, HDFS-14499.001.patch, 
> HDFS-14499.002.patch
>
>
> This is the flow of steps where we see a discrepancy between REM_QUOTA and a 
> new file operation failure. REM_QUOTA shows a value of 1, but the file 
> creation operation does not succeed.
> {code:java}
> hdfs@c3265-node3 root$ hdfs dfs -mkdir /dir1
> hdfs@c3265-node3 root$ hdfs dfsadmin -setQuota 2 /dir1
> hdfs@c3265-node3 root$ hdfs dfsadmin -allowSnapshot /dir1
> Allowing snaphot on /dir1 succeeded
> hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1
> hdfs@c3265-node3 root$ hdfs dfs -createSnapshot /dir1 snap1
> Created snapshot /dir1/.snapshot/snap1
> hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1
> QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE 
> PATHNAME
> 2 0 none inf 1 1 0 /dir1
> hdfs@c3265-node3 root$ hdfs dfs -rm /dir1/file1
> 19/03/26 11:20:25 INFO fs.TrashPolicyDefault: Moved: 
> 'hdfs://smajetinn/dir1/file1' to trash at: 
> hdfs://smajetinn/user/hdfs/.Trash/Current/dir1/file11553599225772
> hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1
> QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE 
> PATHNAME
> 2 1 none inf 1 0 0 /dir1
> hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1
> touchz: The NameSpace quota (directories and files) of directory /dir1 is 
> exceeded: quota=2 file count=3{code}
> The issue here is that the count command takes only files and directories 
> into account, not the inode references. When trash is enabled, deleting a 
> file inside a directory actually does a rename operation, as a result of 
> which an inode reference is maintained in the deleted list of the snapshot 
> diff. That reference is taken into account while computing the namespace 
> quota, but the count command (getContentSummary()) considers just the files 
> and directories, not the referenced entity, when calculating REM_QUOTA. The 
> referenced entity is taken into account for the space quota only.
> InodeReference.java:
> ---
> {code:java}
>  @Override
> public final ContentSummaryComputationContext computeContentSummary(
> int snapshotId, ContentSummaryComputationContext summary) {
>   final int s = snapshotId < lastSnapshotId ? snapshotId : lastSnapshotId;
>   // only count storagespace for WithName
>   final QuotaCounts q = computeQuotaUsage(
>   summary.getBlockStoragePolicySuite(), getStoragePolicyID(), false, 
> s);
>   summary.getCounts().addContent(Content.DISKSPACE, q.getStorageSpace());
>   summary.getCounts().addTypeSpaces(q.getTypeSpaces());
>   return summary;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14499) Misleading REM_QUOTA value with snapshot and trash feature enabled for a directory

2019-07-10 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16882340#comment-16882340
 ] 

Tsz Wo Nicholas Sze commented on HDFS-14499:


{code}
+  int id = lastSnapshotId != Snapshot.CURRENT_STATE_ID ? snapshotId :
+  this.lastSnapshotId;
{code}
It should be {{snapshotId != Snapshot.CURRENT_STATE_ID}}.  The patch looks good 
other than that.
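
Spelled out, the corrected lines would read (a sketch of the suggested change, 
not the committed patch):
{code:java}
// use the method parameter snapshotId, not the field lastSnapshotId,
// when testing for the current state
int id = snapshotId != Snapshot.CURRENT_STATE_ID ? snapshotId :
    this.lastSnapshotId;
{code}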

> Misleading REM_QUOTA value with snapshot and trash feature enabled for a 
> directory
> --
>
> Key: HDFS-14499
> URL: https://issues.apache.org/jira/browse/HDFS-14499
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-14499.000.patch, HDFS-14499.001.patch
>
>
> This is the flow of steps where we see a discrepancy between REM_QUOTA and a 
> new file operation failure. REM_QUOTA shows a value of 1, but the file 
> creation operation does not succeed.
> {code:java}
> hdfs@c3265-node3 root$ hdfs dfs -mkdir /dir1
> hdfs@c3265-node3 root$ hdfs dfsadmin -setQuota 2 /dir1
> hdfs@c3265-node3 root$ hdfs dfsadmin -allowSnapshot /dir1
> Allowing snaphot on /dir1 succeeded
> hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1
> hdfs@c3265-node3 root$ hdfs dfs -createSnapshot /dir1 snap1
> Created snapshot /dir1/.snapshot/snap1
> hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1
> QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE 
> PATHNAME
> 2 0 none inf 1 1 0 /dir1
> hdfs@c3265-node3 root$ hdfs dfs -rm /dir1/file1
> 19/03/26 11:20:25 INFO fs.TrashPolicyDefault: Moved: 
> 'hdfs://smajetinn/dir1/file1' to trash at: 
> hdfs://smajetinn/user/hdfs/.Trash/Current/dir1/file11553599225772
> hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1
> QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE 
> PATHNAME
> 2 1 none inf 1 0 0 /dir1
> hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1
> touchz: The NameSpace quota (directories and files) of directory /dir1 is 
> exceeded: quota=2 file count=3{code}
> The issue here is that the count command takes only files and directories 
> into account, not the inode references. When trash is enabled, deleting a 
> file inside a directory actually does a rename operation, as a result of 
> which an inode reference is maintained in the deleted list of the snapshot 
> diff. That reference is taken into account while computing the namespace 
> quota, but the count command (getContentSummary()) considers just the files 
> and directories, not the referenced entity, when calculating REM_QUOTA. The 
> referenced entity is taken into account for the space quota only.
> InodeReference.java:
> ---
> {code:java}
>  @Override
> public final ContentSummaryComputationContext computeContentSummary(
> int snapshotId, ContentSummaryComputationContext summary) {
>   final int s = snapshotId < lastSnapshotId ? snapshotId : lastSnapshotId;
>   // only count storagespace for WithName
>   final QuotaCounts q = computeQuotaUsage(
>   summary.getBlockStoragePolicySuite(), getStoragePolicyID(), false, 
> s);
>   summary.getCounts().addContent(Content.DISKSPACE, q.getStorageSpace());
>   summary.getCounts().addTypeSpaces(q.getTypeSpaces());
>   return summary;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14499) Misleading REM_QUOTA value with snapshot and trash feature enabled for a directory

2019-05-31 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852747#comment-16852747
 ] 

Tsz Wo Nicholas Sze commented on HDFS-14499:


Thanks [~shashikant].

The parameter name in computeContentSummary is snapshotId, not lastSnapshotId, 
so the code needs to be updated.

> Misleading REM_QUOTA value with snapshot and trash feature enabled for a 
> directory
> --
>
> Key: HDFS-14499
> URL: https://issues.apache.org/jira/browse/HDFS-14499
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-14499.000.patch
>
>
> This is the flow of steps where we see a discrepancy between REM_QUOTA and a 
> new file operation failure. REM_QUOTA shows a value of 1, but the file 
> creation operation does not succeed.
> {code:java}
> hdfs@c3265-node3 root$ hdfs dfs -mkdir /dir1
> hdfs@c3265-node3 root$ hdfs dfsadmin -setQuota 2 /dir1
> hdfs@c3265-node3 root$ hdfs dfsadmin -allowSnapshot /dir1
> Allowing snaphot on /dir1 succeeded
> hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1
> hdfs@c3265-node3 root$ hdfs dfs -createSnapshot /dir1 snap1
> Created snapshot /dir1/.snapshot/snap1
> hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1
> QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE 
> PATHNAME
> 2 0 none inf 1 1 0 /dir1
> hdfs@c3265-node3 root$ hdfs dfs -rm /dir1/file1
> 19/03/26 11:20:25 INFO fs.TrashPolicyDefault: Moved: 
> 'hdfs://smajetinn/dir1/file1' to trash at: 
> hdfs://smajetinn/user/hdfs/.Trash/Current/dir1/file11553599225772
> hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1
> QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE 
> PATHNAME
> 2 1 none inf 1 0 0 /dir1
> hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1
> touchz: The NameSpace quota (directories and files) of directory /dir1 is 
> exceeded: quota=2 file count=3{code}
> The issue here is that the count command takes only files and directories 
> into account, not the inode references. When trash is enabled, deleting a 
> file inside a directory actually does a rename operation, as a result of 
> which an inode reference is maintained in the deleted list of the snapshot 
> diff. That reference is taken into account while computing the namespace 
> quota, but the count command (getContentSummary()) considers just the files 
> and directories, not the referenced entity, when calculating REM_QUOTA. The 
> referenced entity is taken into account for the space quota only.
> InodeReference.java:
> ---
> {code:java}
>  @Override
> public final ContentSummaryComputationContext computeContentSummary(
> int snapshotId, ContentSummaryComputationContext summary) {
>   final int s = snapshotId < lastSnapshotId ? snapshotId : lastSnapshotId;
>   // only count storagespace for WithName
>   final QuotaCounts q = computeQuotaUsage(
>   summary.getBlockStoragePolicySuite(), getStoragePolicyID(), false, 
> s);
>   summary.getCounts().addContent(Content.DISKSPACE, q.getStorageSpace());
>   summary.getCounts().addTypeSpaces(q.getTypeSpaces());
>   return summary;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-372) There are three buffer copies in BlockOutputStream

2019-04-04 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809946#comment-16809946
 ] 

Tsz Wo Nicholas Sze commented on HDDS-372:
--

+1 the 005 patch looks good.  Pending Jenkins.

> There are three buffer copies in BlockOutputStream
> --
>
> Key: HDDS-372
> URL: https://issues.apache.org/jira/browse/HDDS-372
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Client
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDDS-372.001.patch, HDDS-372.002.patch, 
> HDDS-372.003.patch, HDDS-372.004.patch, HDDS-372.005.patch, 
> HDDS-372.20180829.patch
>
>
> Currently, there are three buffer copies in ChunkOutputStream:
>  # from byte[] to ByteBuffer,
>  # from ByteBuffer to ByteString, and
>  # from ByteString to ByteBuffer for checksum computation.
> We should eliminate the ByteBuffer in the middle.
> For zero-copy IO, we should support WritableByteChannel instead of 
> OutputStream. It won't be done in this JIRA.
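
A minimal sketch of the zero-copy direction mentioned in the description 
(explicitly out of scope for this JIRA; the helper below is hypothetical): a 
WritableByteChannel consumes a ByteBuffer directly, avoiding the byte[] 
staging that an OutputStream-based API forces.
{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

// Hypothetical helper, for illustration only: write a chunk buffer through
// a channel without copying it into an intermediate byte[].
static void writeChunk(WritableByteChannel channel, ByteBuffer chunk)
    throws IOException {
  while (chunk.hasRemaining()) {
    channel.write(chunk); // the channel reads directly from the buffer
  }
}
{code}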



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-372) There are three buffer copies in BlockOutputStream

2019-04-04 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809795#comment-16809795
 ] 

Tsz Wo Nicholas Sze commented on HDDS-372:
--

{code}
// flip the buffer so as to read the data starting from pos 0 again
// for checksum computation in case there is actual copy involved
// in the ByteString conversion
if (!ByteStringHelper.isUnsafeByteOperationsEnabled()) {
  chunk.flip();
}
{code}
- Let's flip the buffer anyway.  Otherwise, it is hard to use the 
ByteStringHelper.getByteString(ByteBuffer) API.
{code}
//ByteStringHelper
  private static ByteString copyFrom(ByteBuffer buffer) {
final ByteString bytes = ByteString.copyFrom(buffer);
buffer.flip();
return bytes;
  }

  public static ByteString getByteString(ByteBuffer buffer) {
return isUnsafeByteOperationsEnabled ?
UnsafeByteOperations.unsafeWrap(buffer) : copyFrom(buffer);
  }
{code}

- Please fix the checkstyle warnings and see if the test failures are related.

Patch looks good other than that.  Thanks.
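
To illustrate the flip suggestion (a hypothetical call site, assuming the 
copyFrom() variant above that flips the buffer back; 'chunk' stands for the 
just-filled buffer in BlockOutputStream):
{code:java}
// The caller can always flip before conversion, whether or not unsafe
// byte operations are enabled.
chunk.flip(); // position -> 0, limit -> end of written data
ByteString data = ByteStringHelper.getByteString(chunk);
// 'chunk' is readable from position 0 again here: unsafeWrap() does not
// consume the buffer, and copyFrom() flips it back after copying.
{code}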


> There are three buffer copies in BlockOutputStream
> --
>
> Key: HDDS-372
> URL: https://issues.apache.org/jira/browse/HDDS-372
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Client
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDDS-372.001.patch, HDDS-372.002.patch, 
> HDDS-372.003.patch, HDDS-372.004.patch, HDDS-372.20180829.patch
>
>
> Currently, there are three buffer copies in ChunkOutputStream:
>  # from byte[] to ByteBuffer,
>  # from ByteBuffer to ByteString, and
>  # from ByteString to ByteBuffer for checksum computation.
> We should eliminate the ByteBuffer in the middle.
> For zero-copy IO, we should support WritableByteChannel instead of 
> OutputStream. It won't be done in this JIRA.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-372) There are three buffer copies in BlockOutputStream

2019-04-03 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809535#comment-16809535
 ] 

Tsz Wo Nicholas Sze commented on HDDS-372:
--

BlockOutputStreamEntry is still using safeBufferByteStringCopy, and that flag 
is built from isUnsafeByteOperationsEnabled, i.e. unsafe becomes safe.  To 
avoid this kind of bug, let's avoid passing the boolean around.  We may 
initialize ByteStringHelper as below.
{code}
public class ByteStringHelper {
  private static final AtomicBoolean initialized = new AtomicBoolean();
  private static volatile boolean isUnsafeByteOperationsEnabled;

  public static void init(boolean isUnsafeByteOperationsEnabled) {
final boolean set = initialized.compareAndSet(false, true);
if (set) {
  ByteStringHelper.isUnsafeByteOperationsEnabled = 
isUnsafeByteOperationsEnabled;
} else {
  // already initialized, check values
  Preconditions.checkState(ByteStringHelper.isUnsafeByteOperationsEnabled 
== isUnsafeByteOperationsEnabled);
}
  }

  public static ByteString getByteString(ByteBuffer buffer) {
return isUnsafeByteOperationsEnabled ?
UnsafeByteOperations.unsafeWrap(buffer) : ByteString.copyFrom(buffer);
  }

  public static ByteString getByteString(byte[] bytes) {
return isUnsafeByteOperationsEnabled ?
UnsafeByteOperations.unsafeWrap(bytes) : ByteString.copyFrom(bytes);
  }
}
{code}
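
A hypothetical call site for the helper above (the conf key follows the 
renaming suggested in an earlier comment on this JIRA; the actual key and 
default in the committed patch may differ):
{code:java}
// Initialize once during client setup ...
boolean unsafeEnabled = conf.getBoolean(
    "ozone.client.UnsafeByteOperations.enabled", false); // default assumed
ByteStringHelper.init(unsafeEnabled);

// ... then no boolean needs to be threaded through to the call sites.
ByteString byteString = ByteStringHelper.getByteString(chunk);
{code}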


> There are three buffer copies in BlockOutputStream
> --
>
> Key: HDDS-372
> URL: https://issues.apache.org/jira/browse/HDDS-372
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Client
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDDS-372.001.patch, HDDS-372.002.patch, 
> HDDS-372.003.patch, HDDS-372.20180829.patch
>
>
> Currently, there are three buffer copies in ChunkOutputStream:
>  # from byte[] to ByteBuffer,
>  # from ByteBuffer to ByteString, and
>  # from ByteString to ByteBuffer for checksum computation.
> We should eliminate the ByteBuffer in the middle.
> For zero-copy IO, we should support WritableByteChannel instead of 
> OutputStream. It won't be done in this JIRA.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-372) There are three buffer copies in BlockOutputStream

2019-04-03 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808414#comment-16808414
 ] 

Tsz Wo Nicholas Sze commented on HDDS-372:
--

Thanks [~shashikant].  Some quick comments:

- Do not change Checksum to use UnsafeByteOperations since (1) checksum size is 
very small compared with the data and (2) checksum is used to detect data 
change -- if there is a bug involving UnsafeByteOperations, the checksum may be 
able to detect it.
- How about renaming the new conf "ozone.safe.buffer.bytestring.copy" to 
"ozone.client.UnsafeByteOperations.enabled"?

> There are three buffer copies in BlockOutputStream
> --
>
> Key: HDDS-372
> URL: https://issues.apache.org/jira/browse/HDDS-372
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Client
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDDS-372.001.patch, HDDS-372.002.patch, 
> HDDS-372.20180829.patch
>
>
> Currently, there are three buffer copies in ChunkOutputStream:
>  # from byte[] to ByteBuffer,
>  # from ByteBuffer to ByteString, and
>  # from ByteString to ByteBuffer for checksum computation.
> We should eliminate the ByteBuffer in the middle.
> For zero-copy IO, we should support WritableByteChannel instead of 
> OutputStream. It won't be done in this JIRA.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-699) Detect Ozone Network topology

2019-03-18 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794909#comment-16794909
 ] 

Tsz Wo Nicholas Sze commented on HDDS-699:
--

Thank you, [~Sammi].

> Detect Ozone Network topology
> -
>
> Key: HDDS-699
> URL: https://issues.apache.org/jira/browse/HDDS-699
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Sammi Chen
>Priority: Major
> Fix For: 0.5.0
>
> Attachments: HDDS-699.00.patch, HDDS-699.01.patch, HDDS-699.02.patch, 
> HDDS-699.03.patch, HDDS-699.04.patch, HDDS-699.05.patch, HDDS-699.06.patch, 
> HDDS-699.07.patch, HDDS-699.08.patch, HDDS-699.09.patch
>
>
> Traditionally this has been implemented in Hadoop via script or customizable 
> java class. One thing we want to add here is the flexible multi-level support 
> instead of fixed levels like DC/Rack/NG/Node.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-699) Detect Ozone Network topology

2019-03-17 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794683#comment-16794683
 ] 

Tsz Wo Nicholas Sze commented on HDDS-699:
--

+1 the 09 patch looks good.

The test failures do not seem related.  Let me start another Jenkins build.
 

> Detect Ozone Network topology
> -
>
> Key: HDDS-699
> URL: https://issues.apache.org/jira/browse/HDDS-699
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Sammi Chen
>Priority: Major
> Attachments: HDDS-699.00.patch, HDDS-699.01.patch, HDDS-699.02.patch, 
> HDDS-699.03.patch, HDDS-699.04.patch, HDDS-699.05.patch, HDDS-699.06.patch, 
> HDDS-699.07.patch, HDDS-699.08.patch, HDDS-699.09.patch
>
>
> Traditionally this has been implemented in Hadoop via script or customizable 
> java class. One thing we want to add here is the flexible multi-level support 
> instead of fixed levels like DC/Rack/NG/Node.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-699) Detect Ozone Network topology

2019-03-15 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793352#comment-16793352
 ] 

Tsz Wo Nicholas Sze commented on HDDS-699:
--

> ... In testConcurrentAccess, all the individual tests in the class are 
> scheduled to run concurrently in different threads to test the robustness of 
> the NetworkTopologyImpl. ...

It seems that testConcurrentAccess does not work well.  The test does not fail 
even if there is an AssertionError or IllegalArgumentException.  If it never 
fails, how could we tell if there is a bug?

How about removing it for the moment?  We may add it when we have a better 
design later on.
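
If we re-add it later, here is a minimal sketch of one way to make such 
failures visible (names are hypothetical, not a proposal in the patch): 
collect throwables from the worker threads and assert on them in the test 
thread.
{code:java}
// Sketch: propagate failures from worker threads back to the JUnit thread.
// Assumes it runs inside a test method declared as "throws Exception".
final List<Throwable> failures =
    Collections.synchronizedList(new ArrayList<>());
final List<Thread> threads = new ArrayList<>();
for (Runnable testCase : testCases) { // hypothetical list of sub-tests
  final Thread t = new Thread(() -> {
    try {
      testCase.run();
    } catch (Throwable e) {
      failures.add(e); // AssertionError is a Throwable, so it is caught too
    }
  });
  t.start();
  threads.add(t);
}
for (Thread t : threads) {
  t.join();
}
Assert.assertTrue("Concurrent failures: " + failures, failures.isEmpty());
{code}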

> Detect Ozone Network topology
> -
>
> Key: HDDS-699
> URL: https://issues.apache.org/jira/browse/HDDS-699
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Sammi Chen
>Priority: Major
> Attachments: HDDS-699.00.patch, HDDS-699.01.patch, HDDS-699.02.patch, 
> HDDS-699.03.patch, HDDS-699.04.patch, HDDS-699.05.patch, HDDS-699.06.patch, 
> HDDS-699.07.patch, HDDS-699.08.patch
>
>
> Traditionally this has been implemented in Hadoop via script or customizable 
> java class. One thing we want to add here is the flexible multi-level support 
> instead of fixed levels like DC/Rack/NG/Node.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDDS-699) Detect Ozone Network topology

2019-03-14 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793053#comment-16793053
 ] 

Tsz Wo Nicholas Sze edited comment on HDDS-699 at 3/14/19 8:43 PM:
---

Tried to run TestNetworkTopologyImpl locally. There are a lot of exceptions and 
errors although the tests do not fail.
{code:java}
org.junit.internal.AssumptionViolatedException: got: <false>, expected: is <true>
at org.junit.Assume.assumeThat(Assume.java:95)
at org.junit.Assume.assumeTrue(Assume.java:41)
at 
org.apache.hadoop.hdds.scm.net.TestNetworkTopologyImpl.testAncestor(TestNetworkTopologyImpl.java:238)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)

Exception in thread "Thread-19" org.junit.internal.AssumptionViolatedException: 
got: , expected: is 
at org.junit.Assume.assumeThat(Assume.java:95)
at org.junit.Assume.assumeTrue(Assume.java:41)
at 
org.apache.hadoop.hdds.scm.net.TestNetworkTopologyImpl.testAncestor(TestNetworkTopologyImpl.java:238)
at 
org.apache.hadoop.hdds.scm.net.TestNetworkTopologyImpl.lambda$testConcurrentAccess$9(TestNetworkTopologyImpl.java:853)
at java.lang.Thread.run(Thread.java:748)
Exception in thread "Thread-18" java.lang.IllegalArgumentException: 
affinityNode /1.1.1.1 doesn't have ancestor on generation  1
at 
org.apache.hadoop.hdds.scm.net.NetworkTopologyImpl.chooseNodeInternal(NetworkTopologyImpl.java:498)
at 
org.apache.hadoop.hdds.scm.net.NetworkTopologyImpl.getNode(NetworkTopologyImpl.java:481)
at 
org.apache.hadoop.hdds.scm.net.TestNetworkTopologyImpl.pickNodes(TestNetworkTopologyImpl.java:972)
at 
org.apache.hadoop.hdds.scm.net.TestNetworkTopologyImpl.testChooseRandomWithAffinityNode(TestNetworkTopologyImpl.java:596)
at 
org.apache.hadoop.hdds.scm.net.TestNetworkTopologyImpl.lambda$testConcurrentAccess$8(TestNetworkTopologyImpl.java:849)
at java.lang.Thread.run(Thread.java:748)
Exception in thread "Thread-45" java.lang.IllegalArgumentException: Affinity 
node /r1/1.1.1.1 is not a member of topology
at 
org.apache.hadoop.hdds.scm.net.NetworkTopologyImpl.checkAffinityNode(NetworkTopologyImpl.java:767)
at 
org.apache.hadoop.hdds.scm.net.NetworkTopologyImpl.getNode(NetworkTopologyImpl.java:476)
at 
org.apache.hadoop.hdds.scm.net.TestNetworkTopologyImpl.pickNodes(TestNetworkTopologyImpl.java:972)
at 
org.apache.hadoop.hdds.scm.net.TestNetworkTopologyImpl.testChooseRandomWithAffinityNode(TestNetworkTopologyImpl.java:596)
at 
org.apache.hadoop.hdds.scm.net.TestNetworkTopologyImpl.lambda$testConcurrentAccess$8(TestNetworkTopologyImpl.java:849)
at java.lang.Thread.run(Thread.java:748)
Exception in thread "Thread-41" java.lang.AssertionError
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at 
org.apache.hadoop.hdds.scm.net.TestNetworkTopologyImpl.testChooseRandomExcludedNode(TestNetworkTopologyImpl.java:454)
at 
org.apache.hadoop.hdds.scm.net.TestNetworkTopologyImpl.lambda$testConcurrentAccess$4(TestNetworkTopologyImpl.java:833)
at java.lang.Thread.run(Thread.java:748)
Exception in thread "Thread-72" java.lang.IllegalArgumentException: Affinity 
node /d1/r1/1.1.1.1 is not a member of topology
at 
org.apache.hadoop.hdds.scm.net.NetworkTopologyImpl.checkAffinityNode(NetworkTopologyImpl.java:767)
at 
org.apache.hadoop.hdds.scm.net.NetworkTopologyImpl.getNode(NetworkTopologyImpl.java:476)
at 
org.apache.hadoop.hdds.scm.net.TestNetworkTopologyImpl.pickNodes(TestNetworkTopologyImpl.java:972)
at 
org.apache.hadoop.hdds.scm.net.TestNetworkTopologyImpl.testChooseRandomWithAffinityNode(TestNetworkTopologyImpl.java:596)
at 
org.apache.hadoop.hdds.scm.net.TestNetworkTopologyImpl.lambda$testConcurrentAccess$8(TestNetworkTopologyImpl.java:849)
at java.lang.Thread.run(Thread.java:748)
Exception in thread "Thread-76" java.lang.AssertionError: 
reader:/d1/r1/1.1.1.1,node1:/d2/r3/6.6.6.6,node2:/d1/r1/2.2.2.2,cost1:6,cost2:2
   

[jira] [Commented] (HDDS-699) Detect Ozone Network topology

2019-03-14 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793055#comment-16793055
 ] 

Tsz Wo Nicholas Sze commented on HDDS-699:
--

The 08 patch look good other than the TestNetworkTopologyImpl problems.
+1

> Detect Ozone Network topology
> -
>
> Key: HDDS-699
> URL: https://issues.apache.org/jira/browse/HDDS-699
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Sammi Chen
>Priority: Major
> Attachments: HDDS-699.00.patch, HDDS-699.01.patch, HDDS-699.02.patch, 
> HDDS-699.03.patch, HDDS-699.04.patch, HDDS-699.05.patch, HDDS-699.06.patch, 
> HDDS-699.07.patch, HDDS-699.08.patch
>
>
> Traditionally this has been implemented in Hadoop via script or customizable 
> java class. One thing we want to add here is the flexible multi-level support 
> instead of fixed levels like DC/Rack/NG/Node.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDDS-699) Detect Ozone Network topology

2019-03-14 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793053#comment-16793053
 ] 

Tsz Wo Nicholas Sze edited comment on HDDS-699 at 3/14/19 8:42 PM:
---

Tried to run TestNetworkTopologyImpl locally.  There are a lot of exceptions 
and errors although the tests do not fail.  
{code}

org.junit.internal.AssumptionViolatedException: got: <false>, expected: is <true>
at org.junit.Assume.assumeThat(Assume.java:95)
at org.junit.Assume.assumeTrue(Assume.java:41)
at 
org.apache.hadoop.hdds.scm.net.TestNetworkTopologyImpl.testAncestor(TestNetworkTopologyImpl.java:238)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)

Exception in thread "Thread-19" org.junit.internal.AssumptionViolatedException: 
got: , expected: is 
at org.junit.Assume.assumeThat(Assume.java:95)
at org.junit.Assume.assumeTrue(Assume.java:41)
at 
org.apache.hadoop.hdds.scm.net.TestNetworkTopologyImpl.testAncestor(TestNetworkTopologyImpl.java:238)
at 
org.apache.hadoop.hdds.scm.net.TestNetworkTopologyImpl.lambda$testConcurrentAccess$9(TestNetworkTopologyImpl.java:853)
at java.lang.Thread.run(Thread.java:748)
Exception in thread "Thread-18" java.lang.IllegalArgumentException: 
affinityNode /1.1.1.1 doesn't have ancestor on generation  1
at 
org.apache.hadoop.hdds.scm.net.NetworkTopologyImpl.chooseNodeInternal(NetworkTopologyImpl.java:498)
at 
org.apache.hadoop.hdds.scm.net.NetworkTopologyImpl.getNode(NetworkTopologyImpl.java:481)
at 
org.apache.hadoop.hdds.scm.net.TestNetworkTopologyImpl.pickNodes(TestNetworkTopologyImpl.java:972)
at 
org.apache.hadoop.hdds.scm.net.TestNetworkTopologyImpl.testChooseRandomWithAffinityNode(TestNetworkTopologyImpl.java:596)
at 
org.apache.hadoop.hdds.scm.net.TestNetworkTopologyImpl.lambda$testConcurrentAccess$8(TestNetworkTopologyImpl.java:849)
at java.lang.Thread.run(Thread.java:748)
Exception in thread "Thread-45" java.lang.IllegalArgumentException: Affinity 
node /r1/1.1.1.1 is not a member of topology
at 
org.apache.hadoop.hdds.scm.net.NetworkTopologyImpl.checkAffinityNode(NetworkTopologyImpl.java:767)
at 
org.apache.hadoop.hdds.scm.net.NetworkTopologyImpl.getNode(NetworkTopologyImpl.java:476)
at 
org.apache.hadoop.hdds.scm.net.TestNetworkTopologyImpl.pickNodes(TestNetworkTopologyImpl.java:972)
at 
org.apache.hadoop.hdds.scm.net.TestNetworkTopologyImpl.testChooseRandomWithAffinityNode(TestNetworkTopologyImpl.java:596)
at 
org.apache.hadoop.hdds.scm.net.TestNetworkTopologyImpl.lambda$testConcurrentAccess$8(TestNetworkTopologyImpl.java:849)
at java.lang.Thread.run(Thread.java:748)
Exception in thread "Thread-41" java.lang.AssertionError
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at 
org.apache.hadoop.hdds.scm.net.TestNetworkTopologyImpl.testChooseRandomExcludedNode(TestNetworkTopologyImpl.java:454)
at 
org.apache.hadoop.hdds.scm.net.TestNetworkTopologyImpl.lambda$testConcurrentAccess$4(TestNetworkTopologyImpl.java:833)
at java.lang.Thread.run(Thread.java:748)
Exception in thread "Thread-72" java.lang.IllegalArgumentException: Affinity 
node /d1/r1/1.1.1.1 is not a member of topology
at 
org.apache.hadoop.hdds.scm.net.NetworkTopologyImpl.checkAffinityNode(NetworkTopologyImpl.java:767)
at 
org.apache.hadoop.hdds.scm.net.NetworkTopologyImpl.getNode(NetworkTopologyImpl.java:476)
at 
org.apache.hadoop.hdds.scm.net.TestNetworkTopologyImpl.pickNodes(TestNetworkTopologyImpl.java:972)
at 
org.apache.hadoop.hdds.scm.net.TestNetworkTopologyImpl.testChooseRandomWithAffinityNode(TestNetworkTopologyImpl.java:596)
at 
org.apache.hadoop.hdds.scm.net.TestNetworkTopologyImpl.lambda$testConcurrentAccess$8(TestNetworkTopologyImpl.java:849)
at java.lang.Thread.run(Thread.java:748)
Exception in thread "Thread-76" java.lang.AssertionError: 
reader:/d1/r1/1.1.1.1,node1:/d2/r3/6.6.6.6,node2:/d1/r1/2.2.2.2,cost1:6,cost2:2
  

[jira] [Commented] (HDDS-699) Detect Ozone Network topology

2019-03-14 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793053#comment-16793053
 ] 

Tsz Wo Nicholas Sze commented on HDDS-699:
--

Tried to run TestNetworkTopologyImpl locally.  There are a lot of exceptions 
and errors although the tests do not fail.  
{code}
/Library/Java/JavaVirtualMachines/jdk1.8.0_191.jdk/Contents/Home/bin/java -ea 
-Dhadoop.log.dir=/Users/szetszwo/hadoop/h1-readonly/hadoop-hdds/common/target/log
 
-Dhadoop.tmp.dir=/Users/szetszwo/hadoop/h1-readonly/hadoop-hdds/common/target/tmp
 
-Dtest.build.dir=/Users/szetszwo/hadoop/h1-readonly/hadoop-hdds/common/target/test-dir
 
-Dtest.build.data=/Users/szetszwo/hadoop/h1-readonly/hadoop-hdds/common/target/test-dir
 
-Dtest.build.classes=/Users/szetszwo/hadoop/h1-readonly/hadoop-hdds/common/target/test-classes
 -Djava.net.preferIPv4Stack=true 
-Djava.security.krb5.conf=/Users/szetszwo/hadoop/h1-readonly/hadoop-hdds/common/target/test-classes/krb5.conf
 -Djava.security.egd=file:///dev/urandom -Xmx2048m 
-XX:+HeapDumpOnOutOfMemoryError -Didea.test.cyclic.buffer.size=1048576 
"-javaagent:/Applications/IntelliJ 
IDEA.app/Contents/lib/idea_rt.jar=57589:/Applications/IntelliJ 
IDEA.app/Contents/bin" -Dfile.encoding=UTF-8 -classpath "/Applications/IntelliJ 
IDEA.app/Contents/lib/idea_rt.jar:/Applications/IntelliJ 
IDEA.app/Contents/plugins/junit/lib/junit-rt.jar:/Applications/IntelliJ 
{code}

[jira] [Commented] (HDDS-699) Detect Ozone Network topology

2019-03-12 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791100#comment-16791100
 ] 

Tsz Wo Nicholas Sze commented on HDDS-699:
--

Some final comments:
- There is some code duplication in NetworkTopologyImpl
-* getNode(..) and one of the chooseRandom(..) methods are mostly the same.  We 
should refactor them.
-* Different versions of chooseRandom(..) should just call the most general 
chooseRandom(..) method (see the sketch below).
- Some items in [this 
comment|https://issues.apache.org/jira/browse/HDDS-699?focusedCommentId=16786253=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16786253]
 are not yet addressed.
- There are a few checkstyle warnings.

Thanks!
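
For the chooseRandom(..) cleanup, the shape I have in mind is plain overload 
delegation (a rough sketch with hypothetical simplified signatures, not the 
actual patch code):
{code}
// Sketch only: every convenience overload forwards to the single most
// general method, so the selection logic lives in exactly one place.
public Node chooseRandom(String scope) {
  return chooseRandom(scope, null, Collections.emptyList(), 0);
}

public Node chooseRandom(String scope, Collection<Node> excludedNodes) {
  return chooseRandom(scope, null, excludedNodes, 0);
}

public Node chooseRandom(String scope, String excludedScope,
    Collection<Node> excludedNodes, int ancestorGen) {
  // the only place that implements the actual random selection
  ...
}
{code}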

> Detect Ozone Network topology
> -
>
> Key: HDDS-699
> URL: https://issues.apache.org/jira/browse/HDDS-699
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Sammi Chen
>Priority: Major
> Attachments: HDDS-699.00.patch, HDDS-699.01.patch, HDDS-699.02.patch, 
> HDDS-699.03.patch, HDDS-699.04.patch, HDDS-699.05.patch, HDDS-699.06.patch
>
>
> Traditionally this has been implemented in Hadoop via script or customizable 
> java class. One thing we want to add here is the flexible multi-level support 
> instead of fixed levels like DC/Rack/NG/Node.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-699) Detect Ozone Network topology

2019-03-12 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791097#comment-16791097
 ] 

Tsz Wo Nicholas Sze commented on HDDS-699:
--

Just found that NetUtils.removeDuplicate has already taken care of my previous 
comment.  Thanks.

> Detect Ozone Network topology
> -
>
> Key: HDDS-699
> URL: https://issues.apache.org/jira/browse/HDDS-699
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Sammi Chen
>Priority: Major
> Attachments: HDDS-699.00.patch, HDDS-699.01.patch, HDDS-699.02.patch, 
> HDDS-699.03.patch, HDDS-699.04.patch, HDDS-699.05.patch, HDDS-699.06.patch
>
>
> Traditionally this has been implemented in Hadoop via script or customizable 
> java class. One thing we want to add here is the flexible multi-level support 
> instead of fixed levels like DC/Rack/NG/Node.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-699) Detect Ozone Network topology

2019-03-12 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791051#comment-16791051
 ] 

Tsz Wo Nicholas Sze commented on HDDS-699:
--

Thanks [~Sammi] for the 06 patch.  The example in the javadoc of getLeaf is 
very useful!
{code}
   *               root
   *              /    \
   *             /      \
   *            /        \
   *           /          \
   *         dc1          dc2
   *        /   \        /   \
   *       /     \      /     \
   *   rack1   rack2  rack1   rack2
   *    / \     / \    / \     / \
   *   n1  n2  n3  n4 n5  n6  n7  n8
{code}
Consider the following two sets of input:
#  leafIndex = 2, excludedScope = /dc1/rack1, excludedNodes = \{/dc1/rack1/n1}, 
ancestorGen = 0
#  leafIndex = 2, excludedScope = /dc1/rack1, excludedNodes = \{/dc1/rack1/n1}, 
ancestorGen = 2

In #1, the entire /dc1/rack1 is excluded so that the output is n4.  

In #2, the entire /dc1 is excluded so that the output is n6.

Therefore, we should calculate the overlap and remove it, if there is any.
{code}
  public Node getLeaf(int leafIndex, String excludedScope,
      Collection<Node> excludedNodes, int ancestorGen) {
    ...

    // build an ancestor(children) to exclude node count map
    Map<Node, Integer> countMap =
        getAncestorCountMap(excludedNodes, ancestorGen, currentGen);

    // check overlap between excludedScope and countMap
    if (excludedScope != null) {
      for (Iterator<Map.Entry<Node, Integer>> i =
          countMap.entrySet().iterator(); i.hasNext(); ) {
        final Map.Entry<Node, Integer> entry = i.next();
        final String path = entry.getKey().getNetworkFullPath();
        if (path.startsWith(excludedScope)) {
          // this node is a part of the excludedScope
          i.remove();
        } else if (excludedScope.startsWith(path)) {
          // the excludedScope is already excluded by this node
          excludedScope = null;
        }
      }
    }

    // nodes covered by excluded scope
    int excludedNodeCount = getExcludedScopeNodeCount(excludedScope);
    ...
  }
{code}
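
To make the two overlap cases concrete, here is a tiny standalone illustration 
using plain path strings (toy code, nothing from the patch):
{code}
public class OverlapDemo {
  public static void main(String[] args) {
    String excludedScope = "/dc1/rack1";

    // Case 1: an excluded node inside the excluded scope is redundant,
    // so it should be removed from countMap.
    String nodeInScope = "/dc1/rack1/n1";
    System.out.println(nodeInScope.startsWith(excludedScope));      // true

    // Case 2: an excluded ancestor that covers the scope makes the
    // scope itself redundant, so excludedScope can be set to null.
    String ancestorOfScope = "/dc1";
    System.out.println(excludedScope.startsWith(ancestorOfScope));  // true
  }
}
{code}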


> Detect Ozone Network topology
> -
>
> Key: HDDS-699
> URL: https://issues.apache.org/jira/browse/HDDS-699
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Sammi Chen
>Priority: Major
> Attachments: HDDS-699.00.patch, HDDS-699.01.patch, HDDS-699.02.patch, 
> HDDS-699.03.patch, HDDS-699.04.patch, HDDS-699.05.patch, HDDS-699.06.patch
>
>
> Traditionally this has been implemented in Hadoop via script or customizable 
> java class. One thing we want to add here is the flexible multi-level support 
> instead of fixed levels like DC/Rack/NG/Node.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-699) Detect Ozone Network topology

2019-03-07 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16787126#comment-16787126
 ] 

Tsz Wo Nicholas Sze commented on HDDS-699:
--

> I think here you mean getAncestorCounts. ...

I do mean getAncestorNodeMap.  It only returns one of the nodes of an ancestor, 
which is a bug, since two excluded nodes under the same ancestor may or may not 
overlap, depending on the numLevelToExclude (ancestorGen).

In the 05 patch, getAncestorNodeMap and getAncestorCountMap should be combined 
into one method, as shown in [getAncestorCounts in this 
comment|https://issues.apache.org/jira/browse/HDDS-699?focusedCommentId=16786248=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16786248].

> I would like to keep the current behavior to fit its function name. Otherwise 
> people may have questions.

It is common to define a node to be its own ancestor.  For example, see 
https://en.wikipedia.org/wiki/Lowest_common_ancestor

> Detect Ozone Network topology
> -
>
> Key: HDDS-699
> URL: https://issues.apache.org/jira/browse/HDDS-699
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Sammi Chen
>Priority: Major
> Attachments: HDDS-699.00.patch, HDDS-699.01.patch, HDDS-699.02.patch, 
> HDDS-699.03.patch, HDDS-699.04.patch, HDDS-699.05.patch
>
>
> Traditionally this has been implemented in Hadoop via script or customizable 
> java class. One thing we want to add here is the flexible multi-level support 
> instead of fixed levels like DC/Rack/NG/Node.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDDS-699) Detect Ozone Network topology

2019-03-06 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786248#comment-16786248
 ] 

Tsz Wo Nicholas Sze edited comment on HDDS-699 at 3/7/19 12:48 AM:
---

Thanks [~Sammi].  Just found that the logic for getLeaf(int leafIndex, String 
excludedScope, Collection<Node> excludedNodes, int ancestorGen) is quite 
complicated.
- First of all, let's consistently use "level" instead of "generation" in the 
code.  In the getLeaf methods, let's rename ancestorGen to numLevelToExclude.
- excludedScope and excludedNodes may overlap so that we should filter out the 
overlapped nodes.
{code}
  public Node getLeaf(int leafIndex, String excludedScope,
      Collection<Node> excludedNodes, int numLevelToExclude) {
    Preconditions.checkArgument(leafIndex >= 0 && numLevelToExclude >= 0);
    if (excludedScope != null) {
      excludedNodes = excludedNodes.stream()
          .filter(n -> !n.getNetworkFullPath().startsWith(excludedScope))
          .collect(Collectors.toList());
    }
    return getLeafRecursively(leafIndex, excludedScope, excludedNodes,
        numLevelToExclude);
    // see below for getLeafRecursively
  }
{code}
- Let's change getAncestor(0) to return this. It will simplify the code.
{code}
  public Node getAncestor(int generation) {
Preconditions.checkArgument(generation >= 0);
Node current = this;
while (generation > 0 && current != null) {
  current = current.getParent();
  generation--;
}
return current;
  }
{code}
- Then, we need to take care of the excluded node counting with 
numLevelToExclude.  getAncestorNodeMap seems incorrect since it does not 
consider numLevelToExclude.  When considering numLevelToExclude, two excluded 
nodes under the same ancestor may or may not overlap.  We should filter out the 
overlap first as below.
{code}
  /**
   * @return a map: ancestor-node -> node-count, where
   * the ancestor-node corresponds to the levelToReturn, and
   * the node-count corresponds to the levelToCount.
   */
  private Map<Node, Integer> getAncestorCounts(Collection<Node> nodes,
      int levelToReturn, int levelToCount) {
    Preconditions.checkState(levelToReturn >= levelToCount);
    if (nodes == null || nodes.size() == 0) {
      return Collections.emptyMap();
    }

    // map: levelToCount ancestor -> levelToReturn ancestor
    final Map<Node, Node> map = new HashMap<>();
    for (Node node : nodes) {
      final Node toCount = node.getAncestor(levelToCount);
      final Node toReturn = node.getAncestor(levelToReturn - levelToCount);
      map.putIfAbsent(toCount, toReturn);
    }

    // map: levelToReturn ancestor -> counts
    final Map<Node, Integer> counts = new HashMap<>();
    for (Map.Entry<Node, Node> entry : map.entrySet()) {
      final Node toCount = entry.getKey();
      final Node toReturn = entry.getValue();
      counts.compute(toReturn, (key, n) -> (n == null ? 0 : n)
          + toCount.getNumOfLeaves());
    }

    return counts;
  }
{code}

- Finally, here is the getLeafRecursively(..).  The other getLeaf methods can 
be removed.
{code}
  private Node getLeafRecursively(int leafIndex, String excludedScope,
      Collection<Node> excludedNodes, int numLevelToExclude) {
    if (isLeafParent()) {
      return getLeafOnLeafParent(leafIndex, excludedScope, excludedNodes);
    }

    final int levelToReturn = NodeSchemaManager.getInstance().getMaxLevel()
        - getLevel() - 1;
    final Map<Node, Integer> excludedAncestors = getAncestorCounts(
        excludedNodes, levelToReturn, numLevelToExclude);
    final int excludedScopeCount = getScopeNodeCount(excludedScope);

    for (Node node : childrenMap.values()) {
      int leafCount = node.getNumOfLeaves();
      if (excludedScope != null
          && excludedScope.startsWith(node.getNetworkFullPath())) {
        leafCount -= excludedScopeCount;
      }
      leafCount -= excludedAncestors.getOrDefault(node, 0);
      if (leafCount > 0) {
        if (leafIndex < leafCount) {
          return ((InnerNodeImpl) node).getLeafRecursively(
              leafIndex, excludedScope, excludedNodes, numLevelToExclude);
        }
        leafIndex -= leafCount;
      }
    }
    return null;
  }
{code}
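
The index bookkeeping in the loop above is the usual order-statistic descent.  
For intuition, the same pattern over a plain array of per-child available leaf 
counts (toy code only):
{code}
public class DescentDemo {
  public static void main(String[] args) {
    int[] availableLeaves = {2, 0, 3, 4};  // leaves per child after exclusions
    int leafIndex = 4;                     // global index among available leaves
    for (int child = 0; child < availableLeaves.length; child++) {
      int leafCount = availableLeaves[child];
      if (leafCount > 0) {
        if (leafIndex < leafCount) {
          // prints: descend into child 2 with index 2
          System.out.println("descend into child " + child
              + " with index " + leafIndex);
          break;
        }
        leafIndex -= leafCount;            // skip this child's leaves
      }
    }
  }
}
{code}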



was (Author: szetszwo):
Thanks [~Sammi].  Just found that the logic for getLeaf(int leafIndex, String 
excludedScope, Collection<Node> excludedNodes, int ancestorGen) is quite 
complicated.
- First of all, let's consistently use "level" instead of "generation" in the 
code.  In the getLeaf methods, let's rename ancestorGen to numLevelToExclude.
- excludedScope and excludedNodes may overlap so that we should filter out the 
overlapped nodes.
{code}
  public Node getLeaf(int leafIndex, String excludedScope,
      Collection<Node> excludedNodes, int numLevelToExclude) {
    Preconditions.checkArgument(leafIndex >= 0 && numLevelToExclude >= 0);
    if (excludedScope != null) {
      excludedNodes = excludedNodes.stream()
          .filter(n -> !n.getNetworkFullPath().startsWith(excludedScope))
          .collect(Collectors.toList());
    }

[jira] [Comment Edited] (HDDS-699) Detect Ozone Network topology

2019-03-06 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786248#comment-16786248
 ] 

Tsz Wo Nicholas Sze edited comment on HDDS-699 at 3/7/19 12:50 AM:
---

Thanks [~Sammi].  Just found that the logic for getLeaf(int leafIndex, String 
excludedScope, Collection<Node> excludedNodes, int ancestorGen) is quite 
complicated.
- First of all, let's consistently use "level" instead of "generation" in the 
code.  In the getLeaf methods, let's rename ancestorGen to numLevelToExclude.
- excludedScope and excludedNodes may overlap so that we should filter out the 
overlapped nodes.
{code}
  public Node getLeaf(int leafIndex, String excludedScope,
      Collection<Node> excludedNodes, int numLevelToExclude) {
    Preconditions.checkArgument(leafIndex >= 0 && numLevelToExclude >= 0);
    if (excludedScope != null) {
      excludedNodes = excludedNodes.stream()
          .filter(n -> !n.getNetworkFullPath().startsWith(excludedScope))
          .collect(Collectors.toList());
    }
    return getLeafRecursively(leafIndex, excludedScope, excludedNodes,
        numLevelToExclude);
    // see below for getLeafRecursively
  }
{code}
- Let's change getAncestor(0) to return this. It will simplify the code.
{code}
  public Node getAncestor(int generation) {
Preconditions.checkArgument(generation >= 0);
Node current = this;
while (generation > 0 && current != null) {
  current = current.getParent();
  generation--;
}
return current;
  }
{code}
- Then, we need to take care of the excluded node counting with 
numLevelToExclude.  getAncestorNodeMap seems incorrect since it does not 
consider numLevelToExclude.  When considering numLevelToExclude, two excluded 
nodes under the same ancestor may or may not overlap.  We should filter out the 
overlap first as below.
{code}
  /**
   * @return a map: ancestor-node -> node-count, where
   * the ancestor-node corresponds to the levelToReturn, and
   * the node-count corresponds to the levelToCount.
   */
  private Map<Node, Integer> getAncestorCounts(Collection<Node> nodes,
      int levelToReturn, int levelToCount) {
    Preconditions.checkState(levelToReturn >= levelToCount);
    if (nodes == null || nodes.size() == 0) {
      return Collections.emptyMap();
    }

    // map: levelToCount ancestor -> levelToReturn ancestor
    final Map<Node, Node> map = new HashMap<>();
    for (Node node : nodes) {
      final Node toCount = node.getAncestor(levelToCount);
      final Node toReturn = toCount.getAncestor(levelToReturn - levelToCount);
      map.putIfAbsent(toCount, toReturn);
    }

    // map: levelToReturn ancestor -> counts
    final Map<Node, Integer> counts = new HashMap<>();
    for (Map.Entry<Node, Node> entry : map.entrySet()) {
      final Node toCount = entry.getKey();
      final Node toReturn = entry.getValue();
      counts.compute(toReturn, (key, n) -> (n == null ? 0 : n)
          + toCount.getNumOfLeaves());
    }

    return counts;
  }
{code}

- Finally, here is the getLeafRecursively(..).  The other getLeaf methods can 
be removed.
{code}
  private Node getLeafRecursively(int leafIndex, String excludedScope,
      Collection<Node> excludedNodes, int numLevelToExclude) {
    if (isLeafParent()) {
      return getLeafOnLeafParent(leafIndex, excludedScope, excludedNodes);
    }

    final int levelToReturn = NodeSchemaManager.getInstance().getMaxLevel()
        - getLevel() - 1;
    final Map<Node, Integer> excludedAncestors = getAncestorCounts(
        excludedNodes, levelToReturn, numLevelToExclude);
    final int excludedScopeCount = getScopeNodeCount(excludedScope);

    for (Node node : childrenMap.values()) {
      int leafCount = node.getNumOfLeaves();
      if (excludedScope != null
          && excludedScope.startsWith(node.getNetworkFullPath())) {
        leafCount -= excludedScopeCount;
      }
      leafCount -= excludedAncestors.getOrDefault(node, 0);
      if (leafCount > 0) {
        if (leafIndex < leafCount) {
          return ((InnerNodeImpl) node).getLeafRecursively(
              leafIndex, excludedScope, excludedNodes, numLevelToExclude);
        }
        leafIndex -= leafCount;
      }
    }
    return null;
  }
{code}
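
On the getAncestor(0)-returns-this convention above, the resulting call 
semantics for a hypothetical node at /dc1/rack1/n1 would be:
{code}
// Hypothetical walk, assuming n1 sits at /dc1/rack1/n1:
n1.getAncestor(0);   // n1 itself
n1.getAncestor(1);   // rack1
n1.getAncestor(2);   // dc1
n1.getAncestor(5);   // null once the walk passes the root
{code}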



was (Author: szetszwo):
Thanks [~Sammi].  Just found that the logic for getLeaf(int leafIndex, String 
excludedScope, Collection<Node> excludedNodes, int ancestorGen) is quite 
complicated.
- First of all, let's consistently use "level" instead of "generation" in the 
code.  In the getLeaf methods, let's rename ancestorGen to numLevelToExclude.
- excludedScope and excludedNodes may overlap so that we should filter out the 
overlapped nodes.
{code}
  public Node getLeaf(int leafIndex, String excludedScope,
      Collection<Node> excludedNodes, int numLevelToExclude) {
    Preconditions.checkArgument(leafIndex >= 0 && numLevelToExclude >= 0);
    if (excludedScope != null) {
      excludedNodes = excludedNodes.stream()
          .filter(n -> !n.getNetworkFullPath().startsWith(excludedScope))
          .collect(Collectors.toList());
    }

[jira] [Commented] (HDDS-699) Detect Ozone Network topology

2019-03-06 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786253#comment-16786253
 ] 

Tsz Wo Nicholas Sze commented on HDDS-699:
--

Some other comments:
- Move getNumOfLeaves() from InnerNode to Node. It returns 1 for non-InnerNode.
- In Node, getParent() should return InnerNode instead of Node.
- Add a new field fullPath to NodeImpl.  The getNetworkFullPath() should just 
return it, in order to avoid constructing the string many times.
- In InnerNodeImpl, remove getNodes(int level) and getChildren().  They are 
unused.
- In InnerNodeImpl.isParent(node),  it seems wrong to return true when 
node.getNetworkLocation().equals(NetConstants.ROOT).
- In InnerNodeImpl.getNode(String loc), we should first check if loc is 
absolute and then return null if the prefix does not match (a tiny illustration 
follows after this list).
{code}
// InnerNodeImpl.getNode(String loc),
if (loc.startsWith(PATH_SEPARATOR_STR)) {
  // remove this node's location from loc
  if (loc.startsWith(this.getNetworkFullPath())) {
loc = loc.substring(this.getNetworkFullPath().length());
  } else {
return null;
  }
}
{code}
- Add \@Override for the overridden methods
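
For the getNode(String loc) point, a standalone toy model of the intended 
check order (plain strings, hypothetical locations):
{code}
public class GetNodeDemo {
  public static void main(String[] args) {
    String fullPath = "/dc1";  // this node's network full path
    for (String loc : new String[] {"/dc1/rack1/n1", "/dc2/rack1/n1", "rack1/n1"}) {
      String resolved;
      if (loc.startsWith("/")) {
        // absolute location: must match our own prefix, else resolve to null
        resolved = loc.startsWith(fullPath)
            ? loc.substring(fullPath.length()) : null;
      } else {
        // relative location: resolved among the children as-is
        resolved = loc;
      }
      System.out.println(loc + " -> " + resolved);
    }
  }
}
{code}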


> Detect Ozone Network topology
> -
>
> Key: HDDS-699
> URL: https://issues.apache.org/jira/browse/HDDS-699
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Sammi Chen
>Priority: Major
> Attachments: HDDS-699.00.patch, HDDS-699.01.patch, HDDS-699.02.patch, 
> HDDS-699.03.patch, HDDS-699.04.patch, HDDS-699.05.patch
>
>
> Traditionally this has been implemented in Hadoop via script or customizable 
> java class. One thing we want to add here is the flexible multi-level support 
> instead of fixed levels like DC/Rack/NG/Node.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-699) Detect Ozone Network topology

2019-03-06 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786248#comment-16786248
 ] 

Tsz Wo Nicholas Sze commented on HDDS-699:
--

Thanks [~Sammi].  Just found that the logic for getLeaf(int leafIndex, String 
excludedScope, Collection<Node> excludedNodes, int ancestorGen) is quite 
complicated.
- First of all, let's consistently use "level" instead of "generation" in the 
code.  In the getLeaf methods, let's rename ancestorGen to numLevelToExclude.
- excludedScope and excludedNodes may overlap so that we should filter out the 
overlapped nodes.
{code}
  public Node getLeaf(int leafIndex, String excludedScope,
      Collection<Node> excludedNodes, int numLevelToExclude) {
    Preconditions.checkArgument(leafIndex >= 0 && numLevelToExclude >= 0);
    if (excludedScope != null) {
      excludedNodes = excludedNodes.stream()
          .filter(n -> !n.getNetworkFullPath().startsWith(excludedScope))
          .collect(Collectors.toList());
    }
    return getLeafRecursively(leafIndex, excludedScope, excludedNodes,
        numLevelToExclude);
    // see below for getLeafRecursively
  }
{code}
- Let's change getAncestor(0) to return this. It will simplify the code.
{code}
  public Node getAncestor(int generation) {
Preconditions.checkArgument(generation >= 0);
Node current = this;
while (generation > 0 && current != null) {
  current = current.getParent();
  generation--;
}
return current;
  }
{code}
- Then, we need to take care of the excluded node counting with 
numLevelToExclude.  getAncestorNodeMap seems incorrect since it does not 
consider numLevelToExclude.  When considering numLevelToExclude, two excluded 
nodes under the same ancestor may or may not overlap.  We should filter out the 
overlap first as below.
{code}
  /**
   * @return a map: ancestor-node -> node-count, where
   * the ancestor-node corresponds to the levelToReturn, and
   * the node-count corresponds to the levelToCount.
   */
  private Map<Node, Integer> getAncestorCounts(Collection<Node> nodes,
      int levelToReturn, int levelToCount) {
    Preconditions.checkState(levelToReturn >= levelToCount);
    if (nodes == null || nodes.size() == 0) {
      return Collections.emptyMap();
    }

    // map: levelToCount ancestor -> levelToReturn ancestor
    final Map<Node, Node> map = new HashMap<>();
    for (Node node : nodes) {
      final Node toReturn = node.getAncestor(levelToReturn);
      final Node toCount = levelToCount == levelToReturn ? toReturn
          : node.getAncestor(levelToCount);
      map.putIfAbsent(toCount, toReturn);
    }

    // map: levelToReturn ancestor -> counts
    final Map<Node, Integer> counts = new HashMap<>();
    for (Map.Entry<Node, Node> entry : map.entrySet()) {
      final Node toCount = entry.getKey();
      final Node toReturn = entry.getValue();
      counts.compute(toReturn, (key, n) -> (n == null ? 0 : n)
          + toCount.getNumOfLeaves());
    }

    return counts;
  }
{code}

- Finally, here is the getLeafRecursively(..).  The other getLeaf methods can 
be removed.
{code}
  private Node getLeafRecursively(int leafIndex, String excludedScope,
      Collection<Node> excludedNodes, int numLevelToExclude) {
    if (isLeafParent()) {
      return getLeafOnLeafParent(leafIndex, excludedScope, excludedNodes);
    }

    final int levelToReturn = NodeSchemaManager.getInstance().getMaxLevel()
        - getLevel() - 1;
    final Map<Node, Integer> excludedAncestors = getAncestorCounts(
        excludedNodes, levelToReturn, numLevelToExclude);
    final int excludedScopeCount = getScopeNodeCount(excludedScope);

    for (Node node : childrenMap.values()) {
      int leafCount = node.getNumOfLeaves();
      if (excludedScope != null
          && excludedScope.startsWith(node.getNetworkFullPath())) {
        leafCount -= excludedScopeCount;
      }
      leafCount -= excludedAncestors.getOrDefault(node, 0);
      if (leafCount > 0) {
        if (leafIndex < leafCount) {
          return ((InnerNodeImpl) node).getLeafRecursively(
              leafIndex, excludedScope, excludedNodes, numLevelToExclude);
        }
        leafIndex -= leafCount;
      }
    }
    return null;
  }
{code}
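
To sanity-check the two-phase counting in getAncestorCounts, here is a 
self-contained toy model (NOT the patch classes) showing how two excluded 
nodes under the same counted ancestor collapse to a single contribution:
{code}
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class AncestorCountDemo {
  // toy node: name, parent and leaf count only
  static class N {
    final String name; final N parent; final int leaves;
    N(String name, N parent, int leaves) {
      this.name = name; this.parent = parent; this.leaves = leaves;
    }
    N up(int g) {  // getAncestor with up(0) == this
      N c = this;
      while (g-- > 0 && c != null) { c = c.parent; }
      return c;
    }
  }

  public static void main(String[] args) {
    final N dc1 = new N("dc1", null, 4);
    final N rack1 = new N("rack1", dc1, 2), rack2 = new N("rack2", dc1, 2);
    final N n1 = new N("n1", rack1, 1), n2 = new N("n2", rack1, 1);
    final N n3 = new N("n3", rack2, 1);

    final int levelToCount = 1, levelToReturn = 1;  // both at the rack level

    // phase 1: collapse excluded nodes sharing the same levelToCount ancestor
    final Map<N, N> map = new HashMap<>();
    for (N n : Arrays.asList(n1, n2, n3)) {
      final N toCount = n.up(levelToCount);
      final N toReturn = toCount.up(levelToReturn - levelToCount);
      map.putIfAbsent(toCount, toReturn);
    }

    // phase 2: sum the leaf counts per returned ancestor
    final Map<N, Integer> counts = new HashMap<>();
    for (Map.Entry<N, N> e : map.entrySet()) {
      counts.merge(e.getValue(), e.getKey().leaves, Integer::sum);
    }

    // n1 and n2 collapse to rack1, so rack1 contributes its 2 leaves once:
    System.out.println(counts.get(rack1) + ", " + counts.get(rack2));  // 2, 2
  }
}
{code}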


> Detect Ozone Network topology
> -
>
> Key: HDDS-699
> URL: https://issues.apache.org/jira/browse/HDDS-699
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Sammi Chen
>Priority: Major
> Attachments: HDDS-699.00.patch, HDDS-699.01.patch, HDDS-699.02.patch, 
> HDDS-699.03.patch, HDDS-699.04.patch, HDDS-699.05.patch
>
>
> Traditionally this has been implemented in Hadoop via script or customizable 
> java class. One thing we want to add here is the flexible multi-level support 
> instead of fixed levels like DC/Rack/NG/Node.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: 

[jira] [Commented] (HDDS-699) Detect Ozone Network topology

2019-03-05 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784986#comment-16784986
 ] 

Tsz Wo Nicholas Sze commented on HDDS-699:
--

Some initial comments on the 04 patch.  (will continue reviewing it.)
- remove setCost from Node since it is never used.
- In NodeImpl, change name, location and cost to final.  We should remove the 
set(..) method, which is only used in constructors, and refactor the code as 
below.
{code}
  // host:port#
  private final String name;
  // string representation of this node's location
  private final String location;
  // the cost to go through this node
  private final int cost;
  // which level of the tree the node resides, start from 1 for root
  private int level;
  // node's parent
  private Node parent;

  /**
   * Construct a node from its name and its location.
   * @param name this node's name (can be null, must not contain
   * {@link NetConstants#PATH_SEPARATOR})
   * @param location this node's location
   */
  public NodeImpl(String name, String location, int cost) {
if (name != null && name.contains(PATH_SEPARATOR_STR)) {
  throw new IllegalArgumentException(
  "Network location name:" + name + " should not contain " +
  PATH_SEPARATOR_STR);
}
this.name = (name == null) ? ROOT : name;
this.location = NetUtils.normalize(location);
this.cost = cost;
  }

  /**
   * Construct a node from its name and its location.
   * @param name this node's name (can be null, must not contain
   * {@link NetConstants#PATH_SEPARATOR})
   * @param location this node's location
   * @param parent this node's parent node
   * @param level this node's level in the tree
   * @param cost this node's cost if traffic goes through it
   */
  public NodeImpl(String name, String location, Node parent, int level,
  int cost) {
this(name, location, cost);
this.parent = parent;
this.level = level;
  }

// Note that the other constructors are removed.
{code}
- In InnerNode, remove the following methods and change them to private in 
InnerNodeImpl.  They are only used internally in InnerNodeImpl. 
-* getChildren(),
-* getNodes(int level),
-* getLeaf(int leafIndex, Collection<Node> excludedNodes),
-* getLeaf(int leafIndex, String excludedScope), 
-* getLeaf(int leafIndex, Collection<Node> excludedNodes, int ancestorGen),
-* getLeaf(int leafIndex, String excludedScope, Collection<Node> excludedNodes)
-* isParent(Node n),
-* isLeafParent().
- In InnerNodeImpl,
-* rename getAncestorsI to getAncestorsCounts
-* rename getAncestorsII to getAncestorNodes
- NetworkTopologyImpl.toString should acquire readLock.
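
For the last point, the usual shape (a sketch, assuming the class's existing 
read-write lock field is called netlock; buildTreeString() is a hypothetical 
stand-in for whatever the current method body does):
{code}
@Override
public String toString() {
  netlock.readLock().lock();
  try {
    return buildTreeString();  // read the tree under the read lock
  } finally {
    netlock.readLock().unlock();
  }
}
{code}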


> Detect Ozone Network topology
> -
>
> Key: HDDS-699
> URL: https://issues.apache.org/jira/browse/HDDS-699
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Sammi Chen
>Priority: Major
> Attachments: HDDS-699.00.patch, HDDS-699.01.patch, HDDS-699.02.patch, 
> HDDS-699.03.patch, HDDS-699.04.patch
>
>
> Traditionally this has been implemented in Hadoop via script or customizable 
> java class. One thing we want to add here is the flexible multi-level support 
> instead of fixed levels like DC/Rack/NG/Node.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-699) Detect Ozone Network topology

2019-03-05 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784946#comment-16784946
 ] 

Tsz Wo Nicholas Sze commented on HDDS-699:
--

> cp: cannot stat 
> '/testptch/hadoop/hadoop-ozone/objectstore-service/target/hadoop-ozone-objectstore-service-0.4.0-SNAPSHOT-plugin.jar':
>  No such file or directory

It has nothing to do with the patch here since it also fails without the patch.

bq.  -1 compile 19m 1s  root in trunk failed. 

> Detect Ozone Network topology
> -
>
> Key: HDDS-699
> URL: https://issues.apache.org/jira/browse/HDDS-699
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Sammi Chen
>Priority: Major
> Attachments: HDDS-699.00.patch, HDDS-699.01.patch, HDDS-699.02.patch, 
> HDDS-699.03.patch, HDDS-699.04.patch
>
>
> Traditionally this has been implemented in Hadoop via script or customizable 
> java class. One thing we want to add here is the flexible multi-level support 
> instead of fixed levels like DC/Rack/NG/Node.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-699) Detect Ozone Network topology

2019-03-04 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783831#comment-16783831
 ] 

Tsz Wo Nicholas Sze commented on HDDS-699:
--

[~Sammi], I have checked the 03 patch.  It looks good in general!  Please fix 
the findbugs and other warnings.  I will check the new patch in more detail.  
Thanks a lot.

> Detect Ozone Network topology
> -
>
> Key: HDDS-699
> URL: https://issues.apache.org/jira/browse/HDDS-699
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Sammi Chen
>Priority: Major
> Attachments: HDDS-699.00.patch, HDDS-699.01.patch, HDDS-699.02.patch, 
> HDDS-699.03.patch
>
>
> Traditionally this has been implemented in Hadoop via script or customizable 
> java class. One thing we want to add here is the flexible multi-level support 
> instead of fixed levels like DC/Rack/NG/Node.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-699) Detect Ozone Network topology

2019-02-28 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780842#comment-16780842
 ] 

Tsz Wo Nicholas Sze commented on HDDS-699:
--

> ...  I think it's a trade-off between performance and accuracy. Using a 
> single RW-lock at network topology level has better accuracy while lower 
> performance. The question is whether accuracy can be sacrificed in some cases 
> without big impact to other modules.

That's a good point!  It is perfectly fine to provide a high-performance but 
sometimes inaccurate API.  In such a case, we should specify the behavior 
carefully. 
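
For example, the javadoc could spell the contract out explicitly (a sketch, 
with a hypothetical method as the subject):
{code}
/**
 * Sorts the given nodes by distance cost from the reader.
 *
 * Note: for performance, this method does not hold the topology lock for
 * the whole computation.  If nodes are added or removed concurrently, the
 * result may reflect a mix of old and new states of the topology.  Callers
 * must tolerate a slightly stale ordering.
 */
{code}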

> Detect Ozone Network topology
> -
>
> Key: HDDS-699
> URL: https://issues.apache.org/jira/browse/HDDS-699
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Sammi Chen
>Priority: Major
> Attachments: HDDS-699.00.patch, HDDS-699.01.patch, HDDS-699.02.patch
>
>
> Traditionally this has been implemented in Hadoop via script or customizable 
> java class. One thing we want to add here is the flexible multi-level support 
> instead of fixed levels like DC/Rack/NG/Node.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-451) PutKey failed due to error "Rejecting write chunk request. Chunk overwrite without explicit request"

2019-02-27 Thread Tsz Wo Nicholas Sze (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze resolved HDDS-451.
--
Resolution: Cannot Reproduce

Resolving as "Cannot Reproduce".

> PutKey failed due to error "Rejecting write chunk request. Chunk overwrite 
> without explicit request"
> 
>
> Key: HDDS-451
> URL: https://issues.apache.org/jira/browse/HDDS-451
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.2.1
>Reporter: Nilotpal Nandi
>Assignee: Shashikant Banerjee
>Priority: Blocker
>  Labels: alpha2
> Attachments: all-node-ozone-logs-1536841590.tar.gz
>
>
> steps taken :
> --
>  # Ran Put Key command to write 50GB data. Put Key client operation failed 
> after 17 mins.
> error seen in ozone.log:
> 
>  
> {code}
> 2018-09-13 12:11:53,734 [ForkJoinPool.commonPool-worker-20] DEBUG 
> (ChunkManagerImpl.java:85) - writing 
> chunk:bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_1
>  chunk stage:COMMIT_DATA chunk 
> file:/tmp/hadoop-root/dfs/data/hdds/de0a9e01-4a12-40e3-b567-51b9bd83248e/current/containerDir0/16/chunks/bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_1
>  tmp chunk file
> 2018-09-13 12:11:56,576 [pool-3-thread-60] DEBUG (ChunkManagerImpl.java:85) - 
> writing 
> chunk:bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2
>  chunk stage:WRITE_DATA chunk 
> file:/tmp/hadoop-root/dfs/data/hdds/de0a9e01-4a12-40e3-b567-51b9bd83248e/current/containerDir0/16/chunks/bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2
>  tmp chunk file
> 2018-09-13 12:11:56,739 [ForkJoinPool.commonPool-worker-20] DEBUG 
> (ChunkManagerImpl.java:85) - writing 
> chunk:bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2
>  chunk stage:COMMIT_DATA chunk 
> file:/tmp/hadoop-root/dfs/data/hdds/de0a9e01-4a12-40e3-b567-51b9bd83248e/current/containerDir0/16/chunks/bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2
>  tmp chunk file
> 2018-09-13 12:12:21,410 [Datanode State Machine Thread - 0] DEBUG 
> (DatanodeStateMachine.java:148) - Executing cycle Number : 206
> 2018-09-13 12:12:51,411 [Datanode State Machine Thread - 0] DEBUG 
> (DatanodeStateMachine.java:148) - Executing cycle Number : 207
> 2018-09-13 12:12:53,525 [BlockDeletingService#1] DEBUG 
> (TopNOrderedContainerDeletionChoosingPolicy.java:79) - Stop looking for next 
> container, there is no pending deletion block contained in remaining 
> containers.
> 2018-09-13 12:12:55,048 [Datanode ReportManager Thread - 1] DEBUG 
> (ContainerSet.java:191) - Starting container report iteration.
> 2018-09-13 12:13:02,626 [pool-3-thread-1] ERROR (ChunkUtils.java:244) - 
> Rejecting write chunk request. Chunk overwrite without explicit request. 
> ChunkInfo{chunkName='bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2,
>  offset=0, len=16777216}
> 2018-09-13 12:13:03,035 [pool-3-thread-1] INFO (ContainerUtils.java:149) - 
> Operation: WriteChunk : Trace ID: 54834b29-603d-4ba9-9d68-0885215759d8 : 
> Message: Rejecting write chunk request. OverWrite flag 
> required.ChunkInfo{chunkName='bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2,
>  offset=0, len=16777216} : Result: OVERWRITE_FLAG_REQUIRED
> 2018-09-13 12:13:03,037 [ForkJoinPool.commonPool-worker-11] ERROR 
> (ChunkUtils.java:244) - Rejecting write chunk request. Chunk overwrite 
> without explicit request. 
> ChunkInfo{chunkName='bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2,
>  offset=0, len=16777216}
> 2018-09-13 12:13:03,037 [ForkJoinPool.commonPool-worker-11] INFO 
> (ContainerUtils.java:149) - Operation: WriteChunk : Trace ID: 
> 54834b29-603d-4ba9-9d68-0885215759d8 : Message: Rejecting write chunk 
> request. OverWrite flag 
> required.ChunkInfo{chunkName='bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2,
>  offset=0, len=16777216} : Result: OVERWRITE_FLAG_REQUIRED
>  
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-372) There are three buffer copies in BlockOutputStream

2019-02-27 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779903#comment-16779903
 ] 

Tsz Wo Nicholas Sze commented on HDDS-372:
--

[~shashikant], it is great that you have picked this up.  Thanks.
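
For reference, the middle copy (ByteBuffer to ByteString) can in principle be 
avoided by wrapping instead of copying (a sketch, assuming a protobuf 3.x 
runtime that provides UnsafeByteOperations):
{code}
import com.google.protobuf.ByteString;
import com.google.protobuf.UnsafeByteOperations;
import java.nio.ByteBuffer;

public class WrapDemo {
  static ByteString toByteString(ByteBuffer chunk) {
    // zero-copy wrap; the caller must not mutate chunk afterwards
    return UnsafeByteOperations.unsafeWrap(chunk);
  }
}
{code}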

> There are three buffer copies in BlockOutputStream
> --
>
> Key: HDDS-372
> URL: https://issues.apache.org/jira/browse/HDDS-372
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Client
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDDS-372.20180829.patch
>
>
> Currently, there are three buffer copies in ChunkOutputStream
>  # from byte[] to ByteBuffer, and
>  # from ByteBuffer to ByteString.
> # from ByteString to ByteBuffer for checksum computation
> We should eliminate the ByteBuffer in the middle.
> For zero copy io, we should support WritableByteChannel instead of 
> OutputStream. It won't be done in this JIRA.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-451) PutKey failed due to error "Rejecting write chunk request. Chunk overwrite without explicit request"

2019-02-27 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779883#comment-16779883
 ] 

Tsz Wo Nicholas Sze commented on HDDS-451:
--

Let's resolve this then?  The description and stack trace have become stale.  
We should file a new JIRA if we see a problem in the future.

> PutKey failed due to error "Rejecting write chunk request. Chunk overwrite 
> without explicit request"
> 
>
> Key: HDDS-451
> URL: https://issues.apache.org/jira/browse/HDDS-451
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.2.1
>Reporter: Nilotpal Nandi
>Assignee: Shashikant Banerjee
>Priority: Blocker
>  Labels: alpha2
> Attachments: all-node-ozone-logs-1536841590.tar.gz
>
>
> steps taken :
> --
>  # Ran Put Key command to write 50GB data. Put Key client operation failed 
> after 17 mins.
> error seen in ozone.log:
> 
>  
> {code}
> 2018-09-13 12:11:53,734 [ForkJoinPool.commonPool-worker-20] DEBUG 
> (ChunkManagerImpl.java:85) - writing 
> chunk:bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_1
>  chunk stage:COMMIT_DATA chunk 
> file:/tmp/hadoop-root/dfs/data/hdds/de0a9e01-4a12-40e3-b567-51b9bd83248e/current/containerDir0/16/chunks/bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_1
>  tmp chunk file
> 2018-09-13 12:11:56,576 [pool-3-thread-60] DEBUG (ChunkManagerImpl.java:85) - 
> writing 
> chunk:bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2
>  chunk stage:WRITE_DATA chunk 
> file:/tmp/hadoop-root/dfs/data/hdds/de0a9e01-4a12-40e3-b567-51b9bd83248e/current/containerDir0/16/chunks/bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2
>  tmp chunk file
> 2018-09-13 12:11:56,739 [ForkJoinPool.commonPool-worker-20] DEBUG 
> (ChunkManagerImpl.java:85) - writing 
> chunk:bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2
>  chunk stage:COMMIT_DATA chunk 
> file:/tmp/hadoop-root/dfs/data/hdds/de0a9e01-4a12-40e3-b567-51b9bd83248e/current/containerDir0/16/chunks/bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2
>  tmp chunk file
> 2018-09-13 12:12:21,410 [Datanode State Machine Thread - 0] DEBUG 
> (DatanodeStateMachine.java:148) - Executing cycle Number : 206
> 2018-09-13 12:12:51,411 [Datanode State Machine Thread - 0] DEBUG 
> (DatanodeStateMachine.java:148) - Executing cycle Number : 207
> 2018-09-13 12:12:53,525 [BlockDeletingService#1] DEBUG 
> (TopNOrderedContainerDeletionChoosingPolicy.java:79) - Stop looking for next 
> container, there is no pending deletion block contained in remaining 
> containers.
> 2018-09-13 12:12:55,048 [Datanode ReportManager Thread - 1] DEBUG 
> (ContainerSet.java:191) - Starting container report iteration.
> 2018-09-13 12:13:02,626 [pool-3-thread-1] ERROR (ChunkUtils.java:244) - 
> Rejecting write chunk request. Chunk overwrite without explicit request. 
> ChunkInfo{chunkName='bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2,
>  offset=0, len=16777216}
> 2018-09-13 12:13:03,035 [pool-3-thread-1] INFO (ContainerUtils.java:149) - 
> Operation: WriteChunk : Trace ID: 54834b29-603d-4ba9-9d68-0885215759d8 : 
> Message: Rejecting write chunk request. OverWrite flag 
> required.ChunkInfo{chunkName='bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2,
>  offset=0, len=16777216} : Result: OVERWRITE_FLAG_REQUIRED
> 2018-09-13 12:13:03,037 [ForkJoinPool.commonPool-worker-11] ERROR 
> (ChunkUtils.java:244) - Rejecting write chunk request. Chunk overwrite 
> without explicit request. 
> ChunkInfo{chunkName='bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2,
>  offset=0, len=16777216}
> 2018-09-13 12:13:03,037 [ForkJoinPool.commonPool-worker-11] INFO 
> (ContainerUtils.java:149) - Operation: WriteChunk : Trace ID: 
> 54834b29-603d-4ba9-9d68-0885215759d8 : Message: Rejecting write chunk 
> request. OverWrite flag 
> required.ChunkInfo{chunkName='bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2,
>  offset=0, len=16777216} : Result: OVERWRITE_FLAG_REQUIRED
>  
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-699) Detect Ozone Network topology

2019-02-25 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16777522#comment-16777522
 ] 

Tsz Wo Nicholas Sze commented on HDDS-699:
--

[~Sammi], thanks for the update and for adding many tests.

Using AtomicInteger and AtomicReference may not work since individual fields 
may be mutated during the computation of a method.  For example, when calling 
sortByDistanceCost, all nodes are in the topology initially.  Then, some of the 
nodes may be removed during the computation of sortByDistanceCost, so it may 
return incorrect results.

I have just realized that the patches here mostly add new code but do not yet 
change the existing code to use the new NetworkTopology.  In that case, the 01 
patch actually is better.  We may make NetworkTopology pluggable and improve it 
later.

How about addressing the previous comments on the 01 patch (using a single 
RW-lock)?
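
To spell out the consistency problem with per-field atomics: each field is 
individually atomic, but a reader can still observe a combination of values 
that never existed at any single point in time.  A deterministic toy 
illustration (nothing from the patch):
{code}
import java.util.concurrent.atomic.AtomicInteger;

public class TornReadDemo {
  static final AtomicInteger numNodes = new AtomicInteger(8);
  static final AtomicInteger numRacks = new AtomicInteger(2);

  public static void main(String[] args) throws InterruptedException {
    Thread mutator = new Thread(() -> {  // a datanode and its rack are removed
      numNodes.decrementAndGet();
      numRacks.decrementAndGet();
    });
    int nodes = numNodes.get();          // reads 8 ...
    mutator.start();
    mutator.join();
    int racks = numRacks.get();          // ... then reads 1
    // prints "8 nodes / 1 racks": a state the topology never had
    System.out.println(nodes + " nodes / " + racks + " racks");
  }
}
{code}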

> Detect Ozone Network topology
> -
>
> Key: HDDS-699
> URL: https://issues.apache.org/jira/browse/HDDS-699
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Sammi Chen
>Priority: Major
> Attachments: HDDS-699.00.patch, HDDS-699.01.patch, HDDS-699.02.patch
>
>
> Traditionally this has been implemented in Hadoop via script or customizable 
> java class. One thing we want to add here is the flexible multi-level support 
> instead of fixed levels like DC/Rack/NG/Node.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-699) Detect Ozone Network topology

2019-01-29 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16755460#comment-16755460
 ] 

Tsz Wo Nicholas Sze commented on HDDS-699:
--

Two suggestions:

# Move the root-level locking to the second level.  The root node does not 
cache aggregate information, so a write lock at the root is needed only when 
the second level is changed.  Each node in the second level maintains a lock to 
protect its subtree.
# Separate the NetworkTopology interface and implementation so that replacing 
the implementation in the future becomes possible.

#2 may not be easy.  If we have #2, I am fine with any implementation today.
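
For #2, the split could be as small as the following (a sketch with a 
hypothetical subset of methods):
{code}
public interface NetworkTopology {
  void add(Node node);
  void remove(Node node);
  Node chooseRandom(String scope);
}

public class NetworkTopologyImpl implements NetworkTopology {
  // the current tree + locking implementation lives here and can be
  // swapped out later without touching any caller
  ...
}
{code}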

> Detect Ozone Network topology
> -
>
> Key: HDDS-699
> URL: https://issues.apache.org/jira/browse/HDDS-699
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Sammi Chen
>Priority: Major
> Attachments: HDDS-699.00.patch, HDDS-699.01.patch
>
>
> Traditionally this has been implemented in Hadoop via script or customizable 
> java class. One thing we want to add here is the flexible multi-level support 
> instead of fixed levels like DC/Rack/NG/Node.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-699) Detect Ozone Network topology

2019-01-29 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16755329#comment-16755329
 ] 

Tsz Wo Nicholas Sze commented on HDDS-699:
--

> ... For the NetworkTopology performance, at the beginning I thought most of 
> the accesses are reads after the network topology is built, so a read-write 
> reentrant single-netlock approach may not cause much performance penalty. ...

In a large cluster, datanodes keep going up and down, so the NetworkTopology 
keeps changing.  There is a large amount of NetworkTopology queries, and 
NetworkTopology becomes a scalability bottleneck.

Consider that Ozone is to support small objects.  We can foresee that the 
problem will be even worse than in HDFS.

> If it will really cause a big performance issue, then we'd better do some 
> improvement. ...

When we see the problem in production clusters, it is hard to do the 
improvement since it is very risky to change such critical code at that time.  
Also, there is an API incompatibility -- it is impossible to change NodeImpl 
from mutable to immutable.  So, it is now or never.  :)


> Detect Ozone Network topology
> -
>
> Key: HDDS-699
> URL: https://issues.apache.org/jira/browse/HDDS-699
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Sammi Chen
>Priority: Major
> Attachments: HDDS-699.00.patch, HDDS-699.01.patch
>
>
> Traditionally this has been implemented in Hadoop via script or customizable 
> java class. One thing we want to add here is the flexible multi-level support 
> instead of fixed levels like DC/Rack/NG/Node.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-699) Detect Ozone Network topology

2019-01-28 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16754403#comment-16754403
 ] 

Tsz Wo Nicholas Sze commented on HDDS-699:
--

Thanks [~Sammi] for working on the patch.  Some comments/questions:
- Do you expect NetConf to be set by users/admins?  If not, let's rename it to 
something like NetConstants.  In Hadoop, conf is supposed to be set by 
users/admins.

- NetworkTopology uses the single-netlock approach for the entire data 
structure.  It has been a performance bottleneck in HDFS for a long time.  I 
wonder if we could make Node and InnerNode thread-safe:
-* the NodeImpl can be immutable so that the accesses do not need any lock.
-* childrenMap in InnerNodeImpl can be changed to ConcurrentHashMap
-* Do not maintain numOfLeaves in Root.  For the other InnerNode, numOfLeaves 
is protected by a lock in getNumOfLeaves(), add(..) and remove(..)

- Remove NetworkTopology.random since it has a race condition.  Use 
ThreadLocalRandom.
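
A sketch of the lock-free pieces (field and method names are hypothetical 
simplifications, not the patch code):
{code}
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

public class InnerNodeSketch {
  // concurrent map: lookups never block and need no netlock
  private final Map<String, InnerNodeSketch> childrenMap =
      new ConcurrentHashMap<>();

  static <T> T randomChild(List<T> children) {
    // ThreadLocalRandom keeps no shared state, so there is no race
    // and no contention between concurrent callers
    return children.get(ThreadLocalRandom.current().nextInt(children.size()));
  }
}
{code}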

> Detect Ozone Network topology
> -
>
> Key: HDDS-699
> URL: https://issues.apache.org/jira/browse/HDDS-699
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Sammi Chen
>Priority: Major
> Attachments: HDDS-699.00.patch, HDDS-699.01.patch
>
>
> Traditionally this has been implemented in Hadoop via script or customizable 
> java class. One thing we want to add here is the flexible multi-level support 
> instead of fixed levels like DC/Rack/NG/Node.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-698) Support Topology Awareness for Ozone

2019-01-12 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741485#comment-16741485
 ] 

Tsz Wo Nicholas Sze commented on HDDS-698:
--

[~Sammi] and [~djp], thanks for working on this.

Are you planning to post patches to the subtasks?  Or just to post a patch 
here?  Please let me know.  I am happy to review the patches.

> Support Topology Awareness for Ozone
> 
>
> Key: HDDS-698
> URL: https://issues.apache.org/jira/browse/HDDS-698
> Project: Hadoop Distributed Data Store
>  Issue Type: New Feature
>Reporter: Xiaoyu Yao
>Assignee: Sammi Chen
>Priority: Major
> Attachments: HDDS-698.000.patch, network-topology-default.xml, 
> network-topology-nodegroup.xml
>
>
> This is an umbrella JIRA to add topology awareness support for Ozone 
> Pipelines, Containers and Blocks. Ever since HDFS was created, we have 
> provided rack/nodegroup awareness for reliability and high performance for 
> data access. Ozone needs a similar mechanism, and it can be more flexible for 
> cloud scenarios. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-870) Avoid creating block sized buffer in ChunkGroupOutputStream

2018-12-07 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713455#comment-16713455
 ] 

Tsz Wo Nicholas Sze commented on HDDS-870:
--

Filed RATIS-453 to fix the retry behavior in Ratis.

> Avoid creating block sized buffer in ChunkGroupOutputStream
> ---
>
> Key: HDDS-870
> URL: https://issues.apache.org/jira/browse/HDDS-870
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Client
>Affects Versions: 0.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.4.0
>
> Attachments: HDDS-870.000.patch, HDDS-870.001.patch, 
> HDDS-870.002.patch, HDDS-870.003.patch, HDDS-870.004.patch, 
> HDDS-870.005.patch, HDDS-870.006.patch, HDDS-870.007.patch, 
> HDDS-870.008.patch, HDDS-870.009.patch
>
>
> Currently, for a key, we create a block-sized byteBuffer in order to cache 
> data. This can be replaced with an array of buffers of the flush buffer size 
> configured for handling 2 node failures as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14112) Avoid recursive call to external authorizer for getContentSummary.

2018-11-29 Thread Tsz Wo Nicholas Sze (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-14112:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.2.1
   Status: Resolved  (was: Patch Available)

Thanks [~jnp] for reviewing the patch.

I have committed this.

> Avoid recursive call to external authorizer for getContentSummary.
> --
>
> Key: HDFS-14112
> URL: https://issues.apache.org/jira/browse/HDFS-14112
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Jitendra Nath Pandey
>Assignee: Tsz Wo Nicholas Sze
>Priority: Critical
> Fix For: 3.2.1
>
> Attachments: h14112_20181128.patch, h14112_20181129.patch
>
>
> HDFS-12130 optimizes permission check, and invokes permission checker 
> recursively for each component of the tree, which works well for FSPermission 
> checker.
> But for certain external authorizers it may be more efficient to make one 
> call with {{subaccess}}, because often they don't have to evaluate for each 
> and every component of the path.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14112) Avoid recursive call to external authorizer for getContentSummary.

2018-11-29 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16703670#comment-16703670
 ] 

Tsz Wo Nicholas Sze commented on HDFS-14112:


h14112_20181129.patch: adds the new conf to hdfs-default.xml


> Avoid recursive call to external authorizer for getContentSummary.
> --
>
> Key: HDFS-14112
> URL: https://issues.apache.org/jira/browse/HDFS-14112
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Jitendra Nath Pandey
>Assignee: Tsz Wo Nicholas Sze
>Priority: Critical
> Attachments: h14112_20181128.patch, h14112_20181129.patch
>
>
> HDFS-12130 optimizes the permission check and invokes the permission checker 
> recursively for each component of the tree, which works well for the 
> FSPermission checker.
> But for certain external authorizers it may be more efficient to make one 
> call with {{subaccess}}, because often they don't have to evaluate each 
> and every component of the path.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14112) Avoid recursive call to external authorizer for getContentSummary.

2018-11-29 Thread Tsz Wo Nicholas Sze (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-14112:
---
Attachment: h14112_20181129.patch

> Avoid recursive call to external authorizer for getContentSummary.
> --
>
> Key: HDFS-14112
> URL: https://issues.apache.org/jira/browse/HDFS-14112
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Jitendra Nath Pandey
>Assignee: Tsz Wo Nicholas Sze
>Priority: Critical
> Attachments: h14112_20181128.patch, h14112_20181129.patch
>
>
> HDFS-12130 optimizes the permission check and invokes the permission checker 
> recursively for each component of the tree, which works well for the 
> FSPermission checker.
> But for certain external authorizers it may be more efficient to make one 
> call with {{subaccess}}, because often they don't have to evaluate each 
> and every component of the path.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14112) Avoid recursive call to external authorizer for getContentSummary.

2018-11-28 Thread Tsz Wo Nicholas Sze (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-14112:
---
Component/s: namenode

> Avoid recursive call to external authorizer for getContentSummary.
> --
>
> Key: HDFS-14112
> URL: https://issues.apache.org/jira/browse/HDFS-14112
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Jitendra Nath Pandey
>Assignee: Tsz Wo Nicholas Sze
>Priority: Critical
> Attachments: h14112_20181128.patch
>
>
> HDFS-12130 optimizes the permission check and invokes the permission checker 
> recursively for each component of the tree, which works well for the 
> FSPermission checker.
> But for certain external authorizers it may be more efficient to make one 
> call with {{subaccess}}, because often they don't have to evaluate each 
> and every component of the path.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14112) Avoid recursive call to external authorizer for getContentSummary.

2018-11-28 Thread Tsz Wo Nicholas Sze (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-14112:
---
Status: Patch Available  (was: Open)

h14112_20181128.patch: adds back the pre-HDFS-12130 subAccess check, together 
with a conf to enable it.
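
A rough sketch of the difference, with a made-up Authorizer interface standing 
in for the real hook, INodeAttributeProvider.AccessControlEnforcer#checkPermission:
{code}
// Illustrative only; the subAccess argument asks for a permission on every
// node of a subtree in a single call.
interface Authorizer {
  void check(String path, String access, String subAccess);
}

class ContentSummaryChecks {
  // Post-HDFS-12130 style: one authorizer call per subtree component.
  static void recursive(Authorizer az, String dir, String[] children) {
    az.check(dir, "READ_EXECUTE", null);
    for (String child : children) {
      az.check(dir + "/" + child, "READ_EXECUTE", null);
    }
  }

  // Pre-HDFS-12130 style (restored behind the conf): a single call with
  // subAccess, so an external authorizer can evaluate the subtree at once.
  static void withSubAccess(Authorizer az, String dir) {
    az.check(dir, null, "READ_EXECUTE");
  }
}
{code}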

> Avoid recursive call to external authorizer for getContentSummary.
> --
>
> Key: HDFS-14112
> URL: https://issues.apache.org/jira/browse/HDFS-14112
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Jitendra Nath Pandey
>Assignee: Tsz Wo Nicholas Sze
>Priority: Critical
> Attachments: h14112_20181128.patch
>
>
> HDFS-12130 optimizes the permission check and invokes the permission checker 
> recursively for each component of the tree, which works well for the 
> FSPermission checker.
> But for certain external authorizers it may be more efficient to make one 
> call with {{subaccess}}, because often they don't have to evaluate each 
> and every component of the path.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14112) Avoid recursive call to external authorizer for getContentSummary.

2018-11-28 Thread Tsz Wo Nicholas Sze (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-14112:
---
Attachment: h14112_20181128.patch

> Avoid recursive call to external authorizer for getContentSummary.
> --
>
> Key: HDFS-14112
> URL: https://issues.apache.org/jira/browse/HDFS-14112
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Jitendra Nath Pandey
>Assignee: Tsz Wo Nicholas Sze
>Priority: Critical
> Attachments: h14112_20181128.patch
>
>
> HDFS-12130 optimizes the permission check and invokes the permission checker 
> recursively for each component of the tree, which works well for the 
> FSPermission checker.
> But for certain external authorizers it may be more efficient to make one 
> call with {{subaccess}}, because often they don't have to evaluate each 
> and every component of the path.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-826) Update Ratis to 0.3.0-6f3419a-SNAPSHOT

2018-11-09 Thread Tsz Wo Nicholas Sze (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDDS-826:
-
   Resolution: Fixed
Fix Version/s: 0.3.0
   Status: Resolved  (was: Patch Available)

Thanks [~jnp] and [~msingh] for reviewing the patches.

I have committed this.

> Update Ratis to 0.3.0-6f3419a-SNAPSHOT
> --
>
> Key: HDDS-826
> URL: https://issues.apache.org/jira/browse/HDDS-826
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Fix For: 0.3.0
>
> Attachments: HDDS-826.20181109b.patch, HDDS-826.20181109c.patch
>
>
> RATIS-404 fixed a deadlock bug.  We should update Ratis here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-826) Update Ratis to 0.3.0-6f3419a-SNAPSHOT

2018-11-09 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682117#comment-16682117
 ] 

Tsz Wo Nicholas Sze commented on HDDS-826:
--

HDDS-826.20181109c.patch: changes also hadoop-ozone/pom.xml.

> Update Ratis to 0.3.0-6f3419a-SNAPSHOT
> --
>
> Key: HDDS-826
> URL: https://issues.apache.org/jira/browse/HDDS-826
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Attachments: HDDS-826.20181109b.patch, HDDS-826.20181109c.patch
>
>
> RATIS-404 fixed a deadlock bug.  We should update Ratis here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-826) Update Ratis to 0.3.0-6f3419a-SNAPSHOT

2018-11-09 Thread Tsz Wo Nicholas Sze (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDDS-826:
-
Attachment: HDDS-826.20181109c.patch

> Update Ratis to 0.3.0-6f3419a-SNAPSHOT
> --
>
> Key: HDDS-826
> URL: https://issues.apache.org/jira/browse/HDDS-826
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Attachments: HDDS-826.20181109b.patch, HDDS-826.20181109c.patch
>
>
> RATIS-404 fixed a deadlock bug.  We should update Ratis here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-826) Update Ratis to 0.3.0-6f3419a-SNAPSHOT

2018-11-09 Thread Tsz Wo Nicholas Sze (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDDS-826:
-
Attachment: (was: HDDS-826.001.patch)

> Update Ratis to 0.3.0-6f3419a-SNAPSHOT
> --
>
> Key: HDDS-826
> URL: https://issues.apache.org/jira/browse/HDDS-826
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Attachments: HDDS-826.20181109b.patch
>
>
> RATIS-404 fixed a deadlock bug.  We should update Ratis here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-826) Update Ratis to 0.3.0-6f3419a-SNAPSHOT

2018-11-09 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682095#comment-16682095
 ] 

Tsz Wo Nicholas Sze commented on HDDS-826:
--

Thanks.  I somehow thought that the problem was in the file name.  Here is a new 
patch for trunk.

HDDS-826.20181109b.patch

> Update Ratis to 0.3.0-6f3419a-SNAPSHOT
> --
>
> Key: HDDS-826
> URL: https://issues.apache.org/jira/browse/HDDS-826
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Attachments: HDDS-826.20181109b.patch
>
>
> RATIS-404 fixed a deadlock bug.  We should update Ratis here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-826) Update Ratis to 0.3.0-6f3419a-SNAPSHOT

2018-11-09 Thread Tsz Wo Nicholas Sze (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDDS-826:
-
Attachment: HDDS-826.20181109b.patch

> Update Ratis to 0.3.0-6f3419a-SNAPSHOT
> --
>
> Key: HDDS-826
> URL: https://issues.apache.org/jira/browse/HDDS-826
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Attachments: HDDS-826.20181109b.patch
>
>
> RATIS-404 fixed a deadlock bug.  We should update Ratis here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-826) Update Ratis to 0.3.0-6f3419a-SNAPSHOT

2018-11-09 Thread Tsz Wo Nicholas Sze (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDDS-826:
-
Attachment: (was: HDDS-826.20181109.patch)

> Update Ratis to 0.3.0-6f3419a-SNAPSHOT
> --
>
> Key: HDDS-826
> URL: https://issues.apache.org/jira/browse/HDDS-826
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Attachments: HDDS-826.20181109b.patch
>
>
> RATIS-404 fixed a deadlock bug.  We should update Ratis here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-826) Update Ratis to 0.3.0-6f3419a-SNAPSHOT

2018-11-09 Thread Tsz Wo Nicholas Sze (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDDS-826:
-
Attachment: HDDS-826.001.patch

> Update Ratis to 0.3.0-6f3419a-SNAPSHOT
> --
>
> Key: HDDS-826
> URL: https://issues.apache.org/jira/browse/HDDS-826
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Attachments: HDDS-826.001.patch, HDDS-826.20181109.patch
>
>
> RATIS-404 fixed a deadlock bug.  We should update Ratis here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-826) Update Ratis to 0.3.0-6f3419a-SNAPSHOT

2018-11-09 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16681972#comment-16681972
 ] 

Tsz Wo Nicholas Sze commented on HDDS-826:
--

HDDS-826.20181109.patch: re-uploaded the patch using '.' instead of '_' in the 
file name.

> Update Ratis to 0.3.0-6f3419a-SNAPSHOT
> --
>
> Key: HDDS-826
> URL: https://issues.apache.org/jira/browse/HDDS-826
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Attachments: HDDS-826.20181109.patch
>
>
> RATIS-404 fixed a deadlock bug.  We should update Ratis here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-826) Update Ratis to 0.3.0-6f3419a-SNAPSHOT

2018-11-09 Thread Tsz Wo Nicholas Sze (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDDS-826:
-
Attachment: HDDS-826.20181109.patch

> Update Ratis to 0.3.0-6f3419a-SNAPSHOT
> --
>
> Key: HDDS-826
> URL: https://issues.apache.org/jira/browse/HDDS-826
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Attachments: HDDS-826.20181109.patch
>
>
> RATIS-404 fixed a deadlock bug.  We should update Ratis here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-826) Update Ratis to 0.3.0-6f3419a-SNAPSHOT

2018-11-09 Thread Tsz Wo Nicholas Sze (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDDS-826:
-
Attachment: (was: HDDS-826_20181109.patch)

> Update Ratis to 0.3.0-6f3419a-SNAPSHOT
> --
>
> Key: HDDS-826
> URL: https://issues.apache.org/jira/browse/HDDS-826
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Attachments: HDDS-826.20181109.patch
>
>
> RATIS-404 fixed a deadlock bug.  We should update Ratis here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-826) Update Ratis to 0.3.0-6f3419a-SNAPSHOT

2018-11-09 Thread Tsz Wo Nicholas Sze (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDDS-826:
-
Status: Patch Available  (was: Open)

> Update Ratis to 0.3.0-6f3419a-SNAPSHOT
> --
>
> Key: HDDS-826
> URL: https://issues.apache.org/jira/browse/HDDS-826
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Attachments: HDDS-826_20181109.patch
>
>
> RATIS-404 fixed a deadlock bug.  We should update Ratis here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-826) Update Ratis to 0.3.0-6f3419a-SNAPSHOT

2018-11-09 Thread Tsz Wo Nicholas Sze (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDDS-826:
-
Attachment: HDDS-826_20181109.patch

> Update Ratis to 0.3.0-6f3419a-SNAPSHOT
> --
>
> Key: HDDS-826
> URL: https://issues.apache.org/jira/browse/HDDS-826
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Attachments: HDDS-826_20181109.patch
>
>
> RATIS-404 fixed a deadlock bug.  We should update Ratis here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-826) Update Ratis to 0.3.0-6f3419a-SNAPSHOT

2018-11-09 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDDS-826:


 Summary: Update Ratis to 0.3.0-6f3419a-SNAPSHOT
 Key: HDDS-826
 URL: https://issues.apache.org/jira/browse/HDDS-826
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze


RATIS-404 fixed a deadlock bug.  We should update Ratis here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-806) Update Ratis to latest snapshot version in ozone

2018-11-08 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680392#comment-16680392
 ] 

Tsz Wo Nicholas Sze commented on HDDS-806:
--

[~msingh], thanks a lot for the follow-up work.

[~shashikant], thank you for reviewing and committing the patches.

> Update Ratis to latest snapshot version in ozone
> 
>
> Key: HDDS-806
> URL: https://issues.apache.org/jira/browse/HDDS-806
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.3.0
>Reporter: Nilotpal Nandi
>Assignee: Tsz Wo Nicholas Sze
>Priority: Blocker
> Fix For: 0.3.0, 0.4.0
>
> Attachments: HDDS-806.001.patch, HDDS-806.002.patch, 
> HDDS-806_20181107.patch, all-node-ozone-logs-1540979056.tar.gz
>
>
> datanode stopped due to the following error:
> datanode.log
> {noformat}
> 2018-10-31 09:12:04,517 INFO org.apache.ratis.server.impl.RaftServerImpl: 
> 9fab9937-fbcd-4196-8014-cb165045724b: set configuration 169: 
> [9fab9937-fbcd-4196-8014-cb165045724b:172.27.15.131:9858, 
> ce0084c2-97cd-4c97-9378-e5175daad18b:172.27.15.139:9858, 
> f0291cb4-7a48-456a-847f-9f91a12aa850:172.27.38.9:9858], old=null at 169
> 2018-10-31 09:12:22,187 ERROR org.apache.ratis.server.storage.RaftLogWorker: 
> Terminating with exit status 1: 
> 9fab9937-fbcd-4196-8014-cb165045724b-RaftLogWorker failed.
> org.apache.ratis.protocol.TimeoutIOException: Timeout: WriteLog:182: (t:10, 
> i:182), STATEMACHINELOGENTRY, client-611073BBFA46, 
> cid=127-writeStateMachineData
>  at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:87)
>  at 
> org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:310)
>  at org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:182)
>  at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.concurrent.TimeoutException
>  at 
> java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
>  at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:79)
>  ... 3 more{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-691) Dependency convergence error for org.apache.hadoop:hadoop-annotations

2018-11-08 Thread Tsz Wo Nicholas Sze (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDDS-691:
-
Resolution: Not A Problem
Status: Resolved  (was: Patch Available)

This no longer seems to be a problem.  Resolving ...

> Dependency convergence error for org.apache.hadoop:hadoop-annotations
> -
>
> Key: HDDS-691
> URL: https://issues.apache.org/jira/browse/HDDS-691
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.2.1
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Attachments: HDDS-691_20181018.patch, HDDS-691_20181019.patch
>
>
> {code}
> [WARNING] 
> Dependency convergence error for 
> org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT paths to dependency are:
> +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT
>   +-org.apache.hadoop:hadoop-common:3.3.0-20181017.235917-140
> +-org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT
> and
> +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT
>   +-org.apache.hadoop:hadoop-common:3.3.0-20181017.235917-140
> +-org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT
> and
> +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT
>   +-org.apache.hadoop:hadoop-annotations:3.3.0-20181017.235840-140
> [WARNING] Rule 0: org.apache.maven.plugins.enforcer.DependencyConvergence 
> failed with message:
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-806) writeStateMachineData times out because chunk executors are not scheduled

2018-11-08 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679066#comment-16679066
 ] 

Tsz Wo Nicholas Sze commented on HDDS-806:
--

BTW, [~jnp] has suggested reducing the log queue size in Ozone.  How about 
setting it to 1024?
{code}
raft.server.log.queue.size (int, default=4096)
{code}
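
From the Ozone side this would just be a property override; a sketch, assuming 
the key above and RaftProperties#setInt (the exact key name may differ across 
Ratis versions):
{code}
import org.apache.ratis.conf.RaftProperties;

public class RaftLogQueueTuning {
  public static RaftProperties withSmallerLogQueue() {
    final RaftProperties properties = new RaftProperties();
    // Lower the default 4096 to 1024 to bound the memory held by
    // pending log entries.
    properties.setInt("raft.server.log.queue.size", 1024);
    return properties;
  }
}
{code}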


> writeStateMachineData times out because chunk executors are not scheduled
> -
>
> Key: HDDS-806
> URL: https://issues.apache.org/jira/browse/HDDS-806
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.3.0
>Reporter: Nilotpal Nandi
>Assignee: Mukul Kumar Singh
>Priority: Blocker
> Fix For: 0.3.0
>
> Attachments: all-node-ozone-logs-1540979056.tar.gz
>
>
> datanode stopped due to the following error:
> datanode.log
> {noformat}
> 2018-10-31 09:12:04,517 INFO org.apache.ratis.server.impl.RaftServerImpl: 
> 9fab9937-fbcd-4196-8014-cb165045724b: set configuration 169: 
> [9fab9937-fbcd-4196-8014-cb165045724b:172.27.15.131:9858, 
> ce0084c2-97cd-4c97-9378-e5175daad18b:172.27.15.139:9858, 
> f0291cb4-7a48-456a-847f-9f91a12aa850:172.27.38.9:9858], old=null at 169
> 2018-10-31 09:12:22,187 ERROR org.apache.ratis.server.storage.RaftLogWorker: 
> Terminating with exit status 1: 
> 9fab9937-fbcd-4196-8014-cb165045724b-RaftLogWorker failed.
> org.apache.ratis.protocol.TimeoutIOException: Timeout: WriteLog:182: (t:10, 
> i:182), STATEMACHINELOGENTRY, client-611073BBFA46, 
> cid=127-writeStateMachineData
>  at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:87)
>  at 
> org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:310)
>  at org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:182)
>  at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.concurrent.TimeoutException
>  at 
> java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
>  at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:79)
>  ... 3 more{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-806) writeStateMachineData times out because chunk executors are not scheduled

2018-11-08 Thread Tsz Wo Nicholas Sze (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDDS-806:
-
Attachment: HDDS-806_20181107.patch

> writeStateMachineData times out because chunk executors are not scheduled
> -
>
> Key: HDDS-806
> URL: https://issues.apache.org/jira/browse/HDDS-806
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.3.0
>Reporter: Nilotpal Nandi
>Assignee: Mukul Kumar Singh
>Priority: Blocker
> Fix For: 0.3.0
>
> Attachments: HDDS-806_20181107.patch, 
> all-node-ozone-logs-1540979056.tar.gz
>
>
> datanode stopped due to the following error:
> datanode.log
> {noformat}
> 2018-10-31 09:12:04,517 INFO org.apache.ratis.server.impl.RaftServerImpl: 
> 9fab9937-fbcd-4196-8014-cb165045724b: set configuration 169: 
> [9fab9937-fbcd-4196-8014-cb165045724b:172.27.15.131:9858, 
> ce0084c2-97cd-4c97-9378-e5175daad18b:172.27.15.139:9858, 
> f0291cb4-7a48-456a-847f-9f91a12aa850:172.27.38.9:9858], old=null at 169
> 2018-10-31 09:12:22,187 ERROR org.apache.ratis.server.storage.RaftLogWorker: 
> Terminating with exit status 1: 
> 9fab9937-fbcd-4196-8014-cb165045724b-RaftLogWorker failed.
> org.apache.ratis.protocol.TimeoutIOException: Timeout: WriteLog:182: (t:10, 
> i:182), STATEMACHINELOGENTRY, client-611073BBFA46, 
> cid=127-writeStateMachineData
>  at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:87)
>  at 
> org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:310)
>  at org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:182)
>  at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.concurrent.TimeoutException
>  at 
> java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
>  at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:79)
>  ... 3 more{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-806) writeStateMachineData times out because chunk executors are not scheduled

2018-11-08 Thread Tsz Wo Nicholas Sze (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDDS-806:
-
Assignee: Tsz Wo Nicholas Sze  (was: Mukul Kumar Singh)
  Status: Patch Available  (was: Open)

HDDS-806_20181107.patch: updates Ratis version

> writeStateMachineData times out because chunk executors are not scheduled
> -
>
> Key: HDDS-806
> URL: https://issues.apache.org/jira/browse/HDDS-806
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.3.0
>Reporter: Nilotpal Nandi
>Assignee: Tsz Wo Nicholas Sze
>Priority: Blocker
> Fix For: 0.3.0
>
> Attachments: HDDS-806_20181107.patch, 
> all-node-ozone-logs-1540979056.tar.gz
>
>
> datanode stopped due to the following error:
> datanode.log
> {noformat}
> 2018-10-31 09:12:04,517 INFO org.apache.ratis.server.impl.RaftServerImpl: 
> 9fab9937-fbcd-4196-8014-cb165045724b: set configuration 169: 
> [9fab9937-fbcd-4196-8014-cb165045724b:172.27.15.131:9858, 
> ce0084c2-97cd-4c97-9378-e5175daad18b:172.27.15.139:9858, 
> f0291cb4-7a48-456a-847f-9f91a12aa850:172.27.38.9:9858], old=null at 169
> 2018-10-31 09:12:22,187 ERROR org.apache.ratis.server.storage.RaftLogWorker: 
> Terminating with exit status 1: 
> 9fab9937-fbcd-4196-8014-cb165045724b-RaftLogWorker failed.
> org.apache.ratis.protocol.TimeoutIOException: Timeout: WriteLog:182: (t:10, 
> i:182), STATEMACHINELOGENTRY, client-611073BBFA46, 
> cid=127-writeStateMachineData
>  at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:87)
>  at 
> org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:310)
>  at org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:182)
>  at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.concurrent.TimeoutException
>  at 
> java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
>  at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:79)
>  ... 3 more{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-806) writeStateMachineData times out because chunk executors are not scheduled

2018-11-07 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678935#comment-16678935
 ] 

Tsz Wo Nicholas Sze commented on HDDS-806:
--

[~msingh], RATIS-396 is now committed.  I have also deployed 
0.3.0-1d07b18-SNAPSHOT.

> writeStateMachineData times out because chunk executors are not scheduled
> -
>
> Key: HDDS-806
> URL: https://issues.apache.org/jira/browse/HDDS-806
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.3.0
>Reporter: Nilotpal Nandi
>Assignee: Mukul Kumar Singh
>Priority: Blocker
> Fix For: 0.3.0
>
> Attachments: all-node-ozone-logs-1540979056.tar.gz
>
>
> datanode stopped due to the following error:
> datanode.log
> {noformat}
> 2018-10-31 09:12:04,517 INFO org.apache.ratis.server.impl.RaftServerImpl: 
> 9fab9937-fbcd-4196-8014-cb165045724b: set configuration 169: 
> [9fab9937-fbcd-4196-8014-cb165045724b:172.27.15.131:9858, 
> ce0084c2-97cd-4c97-9378-e5175daad18b:172.27.15.139:9858, 
> f0291cb4-7a48-456a-847f-9f91a12aa850:172.27.38.9:9858], old=null at 169
> 2018-10-31 09:12:22,187 ERROR org.apache.ratis.server.storage.RaftLogWorker: 
> Terminating with exit status 1: 
> 9fab9937-fbcd-4196-8014-cb165045724b-RaftLogWorker failed.
> org.apache.ratis.protocol.TimeoutIOException: Timeout: WriteLog:182: (t:10, 
> i:182), STATEMACHINELOGENTRY, client-611073BBFA46, 
> cid=127-writeStateMachineData
>  at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:87)
>  at 
> org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:310)
>  at org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:182)
>  at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.concurrent.TimeoutException
>  at 
> java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
>  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
>  at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:79)
>  ... 3 more{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13999) Bogus missing block warning if the file is under construction when NN starts

2018-11-06 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677472#comment-16677472
 ] 

Tsz Wo Nicholas Sze commented on HDFS-13999:


+1 the 001 patch looks good.

> Bogus missing block warning if the file is under construction when NN starts
> 
>
> Key: HDFS-13999
> URL: https://issues.apache.org/jira/browse/HDFS-13999
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Attachments: HDFS-13999.branch-2.7.001.patch, webui missing blocks.png
>
>
> We found an interesting case where web UI displays a few missing blocks, but 
> it doesn't state which files are corrupt. What'll also happen is that fsck 
> states the file system is healthy. This bug is similar to HDFS-10827 and 
> HDFS-8533. 
>  (See the attachment for an example)
> Using Dynamometer, I was able to reproduce the bug, and realized that the 
> "missing" blocks are actually healthy, but somehow neededReplications doesn't 
> get updated when the NN receives block reports. What's more interesting is that 
> the files associated with the "missing" blocks are under construction when the 
> NN starts, and so after a while the NN prints file recovery logs.
> Given that, I determined the following code is the source of bug:
> {code:java|title=BlockManager#addStoredBlock}
> 
>// if file is under construction, then done for now
> if (bc.isUnderConstruction()) {
>   return storedBlock;
> }
> {code}
> which is wrong, because a file may have multiple blocks, and the first block 
> is complete. In which case, the neededReplications structure doesn't get 
> updated for the first block, and thus the missing block warning on the web 
> UI. More appropriately, it should check the state of the block itself, not 
> the file.
> Fortunately, it was unintentionally fixed via HDFS-9754:
> {code:java}
> // if block is still under construction, then done for now
> if (!storedBlock.isCompleteOrCommitted()) {
>   return storedBlock;
> }
> {code}
> We should bring this fix into branch-2.7 too. That said, this is a harmless 
> warning, and it should go away after the under-construction files are recovered 
> and the NN restarts (or after forcing full block reports).
> Kudos to Dynamometer! It would be impossible to reproduce this bug without 
> the tool. And thanks [~smeng] for helping with the reproduction.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-722) ozone datanodes failed to start on few nodes

2018-10-24 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663280#comment-16663280
 ] 

Tsz Wo Nicholas Sze commented on HDDS-722:
--

Ratis should tolerate the last half-written log entry; filed RATIS-373.
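
The tolerance could look roughly like the sketch below (illustrative types, 
not the actual Ratis LogReader API): a premature EOF on the trailing entry is 
treated as a partial write and the segment is cut there, instead of 
terminating the StateMachineUpdater.
{code}
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class TolerantSegmentLoader {
  interface EntryReader {
    // Returns the next entry, or null at a clean end of the segment.
    byte[] readEntry() throws IOException;
  }

  static List<byte[]> load(EntryReader reader) throws IOException {
    final List<byte[]> entries = new ArrayList<>();
    while (true) {
      try {
        final byte[] e = reader.readEntry();
        if (e == null) {
          return entries;  // clean end of segment
        }
        entries.add(e);
      } catch (EOFException halfWrittenTail) {
        // The last entry was only partially written before a crash:
        // keep the complete prefix and let the caller truncate the file.
        return entries;
      }
    }
  }
}
{code}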

> ozone datanodes failed to start on few nodes
> 
>
> Key: HDDS-722
> URL: https://issues.apache.org/jira/browse/HDDS-722
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.3.0
>Reporter: Nilotpal Nandi
>Priority: Critical
> Attachments: all-node-ozone-logs-1540356965.tar.gz
>
>
> Steps taken:
> --
>  # put a few keys using ozonefs.
>  # stopped all services of the cluster.
>  # started om and scm.
>  # After some time, started datanodes.
> Datanodes failed to start: out of 12 datanodes, 4 failed to 
> start.
>  
> Here is the datanode log snippet:
> 
>  
> {noformat}
> 2018-10-24 04:49:30,594 ERROR 
> org.apache.ratis.server.impl.StateMachineUpdater: Terminating with exit 
> status 2: StateMachineUpdater-9524f4e2-9031-4852-ab7c-11c2da3460db: the 
> StateMachineUpdater hits Throwable
> org.apache.ratis.server.storage.RaftLogIOException: java.io.IOException: 
> Premature EOF from inputStream
>  at org.apache.ratis.server.storage.LogSegment.loadCache(LogSegment.java:299)
>  at 
> org.apache.ratis.server.storage.SegmentedRaftLog.get(SegmentedRaftLog.java:192)
>  at 
> org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:142)
>  at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Premature EOF from inputStream
>  at org.apache.ratis.util.IOUtils.readFully(IOUtils.java:100)
>  at org.apache.ratis.server.storage.LogReader.decodeEntry(LogReader.java:250)
>  at org.apache.ratis.server.storage.LogReader.readEntry(LogReader.java:155)
>  at 
> org.apache.ratis.server.storage.LogInputStream.nextEntry(LogInputStream.java:128)
>  at 
> org.apache.ratis.server.storage.LogSegment.readSegmentFile(LogSegment.java:110)
>  at org.apache.ratis.server.storage.LogSegment.access$400(LogSegment.java:43)
>  at 
> org.apache.ratis.server.storage.LogSegment$LogEntryLoader.load(LogSegment.java:167)
>  at 
> org.apache.ratis.server.storage.LogSegment$LogEntryLoader.load(LogSegment.java:161)
>  at org.apache.ratis.server.storage.LogSegment.loadCache(LogSegment.java:295)
>  ... 3 more
> 2018-10-24 04:49:30,598 INFO org.apache.hadoop.ozone.HddsDatanodeService: 
> SHUTDOWN_MSG:
> /
> SHUTDOWN_MSG: Shutting down HddsDatanodeService at 
> ctr-e138-1518143905142-541661-01-03.hwx.site/172.27.57.0
> /
> 2018-10-24 04:49:30,598 WARN org.apache.hadoop.fs.CachingGetSpaceUsed: Thread 
> Interrupted waiting to refresh disk information: sleep interrupted
>  
> {noformat}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-638) enable ratis snapshots for HDDS datanodes

2018-10-21 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658294#comment-16658294
 ] 

Tsz Wo Nicholas Sze commented on HDDS-638:
--

[~msingh], thanks for the update.

+1 the 002 patch looks good.  The findbugs warning is not related.

> enable ratis snapshots for HDDS datanodes
> -
>
> Key: HDDS-638
> URL: https://issues.apache.org/jira/browse/HDDS-638
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Blocker
> Attachments: HDDS-638.001.patch, HDDS-638.002.patch
>
>
> Currently, on a restart, an hdds datanode starts applying log entries from the 
> start of the log.
> This can be avoided by taking a ratis snapshot to persist the last 
> stable state, so that on restart the datanodes start applying the log from that index.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-638) enable ratis snapshots for HDDS datanodes

2018-10-21 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658188#comment-16658188
 ] 

Tsz Wo Nicholas Sze commented on HDDS-638:
--

Some comments on ContainerStateMachine:
- For updating lastAppliedTermIndex:
-* ConcurrentHashMap does not support null values, so it won't work since 
addRequest always returns null.  Just remove addRequest(..); it does not seem 
useful anyway.
-* lastSuccessfullyAppliedIndex does not seem useful. How about removing it? 
Just use lastAppliedTermIndex in BaseStateMachine.

The code should look like below:
{code}
+  private void updateLastAppliedTermIndex() {
+    Long appliedTerm = null;
+    long appliedIndex = -1;
+    for(long i = getLastAppliedTermIndex().getIndex() + 1;; i++) {
+      final Long removed = containerCommandCompletionMap.remove(i);
+      if (removed == null) {
+        break;
+      }
+      appliedTerm = removed;
+      appliedIndex = i;
+    }
+    if (appliedTerm != null) {
+      updateLastAppliedTermIndex(appliedIndex, appliedTerm);
+    }
+  }
+
   /*
    * ApplyTransaction calls in Ratis are sequential.
    */
   @Override
   public CompletableFuture<Message> applyTransaction(TransactionContext trx) {
+    final long index = trx.getLogEntry().getIndex();
     try {
       metrics.incNumApplyTransactionsOps();
       ContainerCommandRequestProto requestProto =
@@ -418,7 +476,7 @@ private ByteString readStateMachineData(LogEntryProto entry,
           blockDataProto.getBlockID());
       return completeExceptionally(ioe);
     }
-    blockData.setBlockCommitSequenceId(trx.getLogEntry().getIndex());
+    blockData.setBlockCommitSequenceId(index);
     final ContainerProtos.PutBlockRequestProto putBlockRequestProto =
         ContainerProtos.PutBlockRequestProto
             .newBuilder(requestProto.getPutBlock())
@@ -440,6 +498,13 @@ private ByteString readStateMachineData(LogEntryProto entry,
       future.thenApply(
           r -> createContainerFutureMap.remove(containerID).complete(null));
     }
+
+    future.thenAccept(m -> {
+      final Long previous = containerCommandCompletionMap.put(index,
+          trx.getLogEntry().getTerm());
+      Preconditions.checkState(previous == null);
+      updateLastAppliedTermIndex();
+    });
+
     return future;
   } catch (IOException e) {
     metrics.incNumApplyTransactionsFails();
{code}


- Why "TODO persist open containers in snapshots"?  Open containers should be 
persisted if the index is applied to state machine.  No?

- In loadSnapshot, remove the snapshotFile.exists() check.  It must exist by 
storage.getLatestSnapshot().
-* Remove the warning from the snapshot == null case.  It is normal when the 
storage is newly formatted.

- Add @Override to takeSnapshot(), and it should throw an IOException when 
createNewFile() fails; see the sketch after this list.

- In the test, it should check that the expected snapshot files exist.
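
For the takeSnapshot() point, a sketch under the assumption that the state 
machine keeps a SimpleStateMachineStorage and inherits 
getLastAppliedTermIndex() from BaseStateMachine (the wrapper class here is 
only for illustration):
{code}
import java.io.File;
import java.io.IOException;

import org.apache.ratis.server.protocol.TermIndex;
import org.apache.ratis.statemachine.impl.SimpleStateMachineStorage;

// Illustrative; in ContainerStateMachine this method would carry @Override
// since BaseStateMachine declares takeSnapshot().
class SnapshotSketch {
  private final SimpleStateMachineStorage storage =
      new SimpleStateMachineStorage();

  private TermIndex getLastAppliedTermIndex() {
    throw new UnsupportedOperationException("provided by BaseStateMachine");
  }

  public long takeSnapshot() throws IOException {
    final TermIndex ti = getLastAppliedTermIndex();
    final File snapshotFile =
        storage.getSnapshotFile(ti.getTerm(), ti.getIndex());
    // Fail loudly instead of returning silently when the file cannot be made.
    if (!snapshotFile.createNewFile()) {
      throw new IOException("Failed to create snapshot file " + snapshotFile);
    }
    return ti.getIndex();
  }
}
{code}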


Some other comments:

- flushStateMachineData is expensive since it loops through the entire map.  It 
should be rewritten (probably in a separate JIRA); see the sketch after these comments.

- The following TODO in initialize(..) can be removed.  
BaseStateMachine.getId() will return the server id iff initialize has been called; 
otherwise, it returns null. 
{code}
// TODO: Add a flag that tells you that initialize has been called.
// Check with Ratis if this feature is done in Ratis.
{code}
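
For the flushStateMachineData rewrite, one option is to key the write futures 
by log index in a sorted map, so that a flush only touches the prefix up to 
the flush index; a sketch with illustrative names:
{code}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

class WriteChunkFutureTracker {
  private final ConcurrentSkipListMap<Long, CompletableFuture<Void>>
      writeFutures = new ConcurrentSkipListMap<>();

  void track(long logIndex, CompletableFuture<Void> f) {
    writeFutures.put(logIndex, f);
  }

  // Wait only for writes at or below the given index, then drop them,
  // instead of scanning the entire map on every flush.
  CompletableFuture<Void> flushUpTo(long logIndex) {
    final ConcurrentNavigableMap<Long, CompletableFuture<Void>> head =
        writeFutures.headMap(logIndex, true);
    return CompletableFuture
        .allOf(head.values().toArray(new CompletableFuture[0]))
        .thenRun(head::clear);
  }
}
{code}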

> enable ratis snapshots for HDDS datanodes
> -
>
> Key: HDDS-638
> URL: https://issues.apache.org/jira/browse/HDDS-638
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Blocker
> Attachments: HDDS-638.001.patch
>
>
> Currently, on a restart, an hdds datanode starts applying log entries from the 
> start of the log.
> This can be avoided by taking a ratis snapshot to persist the last 
> stable state, so that on restart the datanodes start applying the log from that index.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-691) Dependency convergence error for org.apache.hadoop:hadoop-annotations

2018-10-19 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656992#comment-16656992
 ] 

Tsz Wo Nicholas Sze commented on HDDS-691:
--

Sure, removing uniqueVersions sounds good.

Here is a new patch: HDDS-691_20181019.patch 


> Dependency convergence error for org.apache.hadoop:hadoop-annotations
> -
>
> Key: HDDS-691
> URL: https://issues.apache.org/jira/browse/HDDS-691
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Attachments: HDDS-691_20181018.patch, HDDS-691_20181019.patch
>
>
> {code}
> [WARNING] 
> Dependency convergence error for 
> org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT paths to dependency are:
> +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT
>   +-org.apache.hadoop:hadoop-common:3.3.0-20181017.235917-140
> +-org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT
> and
> +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT
>   +-org.apache.hadoop:hadoop-common:3.3.0-20181017.235917-140
> +-org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT
> and
> +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT
>   +-org.apache.hadoop:hadoop-annotations:3.3.0-20181017.235840-140
> [WARNING] Rule 0: org.apache.maven.plugins.enforcer.DependencyConvergence 
> failed with message:
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-691) Dependency convergence error for org.apache.hadoop:hadoop-annotations

2018-10-19 Thread Tsz Wo Nicholas Sze (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDDS-691:
-
Attachment: HDDS-691_20181019.patch

> Dependency convergence error for org.apache.hadoop:hadoop-annotations
> -
>
> Key: HDDS-691
> URL: https://issues.apache.org/jira/browse/HDDS-691
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Attachments: HDDS-691_20181018.patch, HDDS-691_20181019.patch
>
>
> {code}
> [WARNING] 
> Dependency convergence error for 
> org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT paths to dependency are:
> +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT
>   +-org.apache.hadoop:hadoop-common:3.3.0-20181017.235917-140
> +-org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT
> and
> +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT
>   +-org.apache.hadoop:hadoop-common:3.3.0-20181017.235917-140
> +-org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT
> and
> +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT
>   +-org.apache.hadoop:hadoop-annotations:3.3.0-20181017.235840-140
> [WARNING] Rule 0: org.apache.maven.plugins.enforcer.DependencyConvergence 
> failed with message:
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-691) Dependency convergence error for org.apache.hadoop:hadoop-annotations

2018-10-18 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656265#comment-16656265
 ] 

Tsz Wo Nicholas Sze commented on HDDS-691:
--

Hi [~elek], below is the rationale behind the patch:
- First of all, the hadoop-common dependency in hadoop-hdds/common/pom.xml is 
obviously redundant since the parent hadoop-hdds/pom.xml already has it.
- From the dependency convergence error, the grandparent 
hadoop-project-dist/pom.xml already has the hadoop-annotations dependency.   
hadoop-hdds/common/pom.xml gets hadoop-annotations again from the hadoop-common 
dependency.  If we set the scope to "provided" for hadoop-common in 
hadoop-hdds/common/pom.xml, the dependency becomes non-transitive so that it 
won't get hadoop-annotations again.

BTW, the hadoop-common dependency in hadoop-hdfs/pom.xml is also "provided". We 
probably should do the same for hadoop-hdds?

> Dependency convergence error for org.apache.hadoop:hadoop-annotations
> -
>
> Key: HDDS-691
> URL: https://issues.apache.org/jira/browse/HDDS-691
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Attachments: HDDS-691_20181018.patch
>
>
> {code}
> [WARNING] 
> Dependency convergence error for 
> org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT paths to dependency are:
> +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT
>   +-org.apache.hadoop:hadoop-common:3.3.0-20181017.235917-140
> +-org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT
> and
> +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT
>   +-org.apache.hadoop:hadoop-common:3.3.0-20181017.235917-140
> +-org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT
> and
> +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT
>   +-org.apache.hadoop:hadoop-annotations:3.3.0-20181017.235840-140
> [WARNING] Rule 0: org.apache.maven.plugins.enforcer.DependencyConvergence 
> failed with message:
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-691) Dependency convergence error for org.apache.hadoop:hadoop-annotations

2018-10-18 Thread Tsz Wo Nicholas Sze (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDDS-691:
-
Status: Patch Available  (was: Open)

> Dependency convergence error for org.apache.hadoop:hadoop-annotations
> -
>
> Key: HDDS-691
> URL: https://issues.apache.org/jira/browse/HDDS-691
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Attachments: HDDS-691_20181018.patch
>
>
> {code}
> [WARNING] 
> Dependency convergence error for 
> org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT paths to dependency are:
> +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT
>   +-org.apache.hadoop:hadoop-common:3.3.0-20181017.235917-140
> +-org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT
> and
> +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT
>   +-org.apache.hadoop:hadoop-common:3.3.0-20181017.235917-140
> +-org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT
> and
> +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT
>   +-org.apache.hadoop:hadoop-annotations:3.3.0-20181017.235840-140
> [WARNING] Rule 0: org.apache.maven.plugins.enforcer.DependencyConvergence 
> failed with message:
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-691) Dependency convergence error for org.apache.hadoop:hadoop-annotations

2018-10-18 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654998#comment-16654998
 ] 

Tsz Wo Nicholas Sze commented on HDDS-691:
--

HDDS-691_20181018.patch: changes the scope to "provided".

> Dependency convergence error for org.apache.hadoop:hadoop-annotations
> -
>
> Key: HDDS-691
> URL: https://issues.apache.org/jira/browse/HDDS-691
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Attachments: HDDS-691_20181018.patch
>
>
> {code}
> [WARNING] 
> Dependency convergence error for 
> org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT paths to dependency are:
> +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT
>   +-org.apache.hadoop:hadoop-common:3.3.0-20181017.235917-140
> +-org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT
> and
> +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT
>   +-org.apache.hadoop:hadoop-common:3.3.0-20181017.235917-140
> +-org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT
> and
> +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT
>   +-org.apache.hadoop:hadoop-annotations:3.3.0-20181017.235840-140
> [WARNING] Rule 0: org.apache.maven.plugins.enforcer.DependencyConvergence 
> failed with message:
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-691) Dependency convergence error for org.apache.hadoop:hadoop-annotations

2018-10-18 Thread Tsz Wo Nicholas Sze (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDDS-691:
-
Attachment: HDDS-691_20181018.patch

> Dependency convergence error for org.apache.hadoop:hadoop-annotations
> -
>
> Key: HDDS-691
> URL: https://issues.apache.org/jira/browse/HDDS-691
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Attachments: HDDS-691_20181018.patch
>
>
> {code}
> [WARNING] 
> Dependency convergence error for 
> org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT paths to dependency are:
> +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT
>   +-org.apache.hadoop:hadoop-common:3.3.0-20181017.235917-140
> +-org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT
> and
> +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT
>   +-org.apache.hadoop:hadoop-common:3.3.0-20181017.235917-140
> +-org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT
> and
> +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT
>   +-org.apache.hadoop:hadoop-annotations:3.3.0-20181017.235840-140
> [WARNING] Rule 0: org.apache.maven.plugins.enforcer.DependencyConvergence 
> failed with message:
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-676) Enable Read from open Containers via Standalone Protocol

2018-10-18 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654995#comment-16654995
 ] 

Tsz Wo Nicholas Sze commented on HDDS-676:
--

{code}
[WARNING] 
Dependency convergence error for 
org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT paths to dependency are:
+-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT
  +-org.apache.hadoop:hadoop-common:3.3.0-20181017.235917-140
+-org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT
and
+-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT
  +-org.apache.hadoop:hadoop-common:3.3.0-20181017.235917-140
+-org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT
and
+-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT
  +-org.apache.hadoop:hadoop-annotations:3.3.0-20181017.235840-140

[WARNING] Rule 0: org.apache.maven.plugins.enforcer.DependencyConvergence 
failed with message:
{code}
It seems that the pom files have some bugs; filed HDDS-691.

> Enable Read from open Containers via Standalone Protocol
> 
>
> Key: HDDS-676
> URL: https://issues.apache.org/jira/browse/HDDS-676
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDDS-676.001.patch
>
>
> With BlockCommitSequenceId getting updated per block commit on open 
> containers in OM as well as on the datanode, Ozone Client reads can go 
> through the Standalone protocol, not necessarily requiring Ratis. The client 
> should verify the BCSID of the container which has the data block, which 
> should always be greater than or equal to the BCSID of the block to be read, 
> and the existing block BCSID should exactly match that of the block to be 
> read. As a part of this, the client can try to read from a replica with a 
> supplied BCSID and fail over to the next one in case the block does not 
> exist on one replica.
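> A sketch of that read path (the types are illustrative, not the actual 
> client classes): accept a replica only if its container BCSID has caught up 
> to the block's BCSID and its copy of the block carries exactly that BCSID, 
> otherwise fail over.
> {code}
> import java.io.IOException;
> import java.util.List;
>
> class BcsidAwareReader {
>   interface Replica {
>     long getContainerBcsid(long containerId) throws IOException;
>     long getBlockBcsid(long containerId, long localId) throws IOException;
>     byte[] readBlock(long containerId, long localId) throws IOException;
>   }
>
>   static byte[] read(List<Replica> replicas, long containerId, long localId,
>       long expectedBcsid) throws IOException {
>     IOException last = null;
>     for (Replica r : replicas) {
>       try {
>         // The container must have committed at least up to the block...
>         if (r.getContainerBcsid(containerId) < expectedBcsid
>             // ...and the block's own BCSID must match exactly.
>             || r.getBlockBcsid(containerId, localId) != expectedBcsid) {
>           continue;
>         }
>         return r.readBlock(containerId, localId);
>       } catch (IOException e) {
>         last = e;  // block missing here: fail over to the next replica
>       }
>     }
>     throw last != null ? last
>         : new IOException("no replica matched BCSID " + expectedBcsid);
>   }
> }
> {code}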



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-691) Dependency convergence error for org.apache.hadoop:hadoop-annotations

2018-10-18 Thread Tsz Wo Nicholas Sze (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDDS-691:
-
Description: 
{code}
[WARNING] 
Dependency convergence error for 
org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT paths to dependency are:
+-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT
  +-org.apache.hadoop:hadoop-common:3.3.0-20181017.235917-140
+-org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT
and
+-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT
  +-org.apache.hadoop:hadoop-common:3.3.0-20181017.235917-140
+-org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT
and
+-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT
  +-org.apache.hadoop:hadoop-annotations:3.3.0-20181017.235840-140

[WARNING] Rule 0: org.apache.maven.plugins.enforcer.DependencyConvergence 
failed with message:
{code}


> Dependency convergence error for org.apache.hadoop:hadoop-annotations
> -
>
> Key: HDDS-691
> URL: https://issues.apache.org/jira/browse/HDDS-691
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
>
> {code}
> [WARNING] 
> Dependency convergence error for 
> org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT paths to dependency are:
> +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT
>   +-org.apache.hadoop:hadoop-common:3.3.0-20181017.235917-140
> +-org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT
> and
> +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT
>   +-org.apache.hadoop:hadoop-common:3.3.0-20181017.235917-140
> +-org.apache.hadoop:hadoop-annotations:3.3.0-SNAPSHOT
> and
> +-org.apache.hadoop:hadoop-hdds-common:0.4.0-SNAPSHOT
>   +-org.apache.hadoop:hadoop-annotations:3.3.0-20181017.235840-140
> [WARNING] Rule 0: org.apache.maven.plugins.enforcer.DependencyConvergence 
> failed with message:
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-691) Dependency convergence error for org.apache.hadoop:hadoop-annotations

2018-10-18 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDDS-691:


 Summary: Dependency convergence error for 
org.apache.hadoop:hadoop-annotations
 Key: HDDS-691
 URL: https://issues.apache.org/jira/browse/HDDS-691
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-625) putKey hangs for a long time after completion, sometimes forever

2018-10-11 Thread Tsz Wo Nicholas Sze (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDDS-625:
-
   Resolution: Fixed
Fix Version/s: 0.3.0
   Status: Resolved  (was: Patch Available)

I have committed this.  Thanks, [~arpitagarwal]!

> putKey hangs for a long time after completion, sometimes forever
> 
>
> Key: HDDS-625
> URL: https://issues.apache.org/jira/browse/HDDS-625
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
>Priority: Blocker
> Fix For: 0.3.0
>
> Attachments: HDDS-625.01.patch, HDDS-625.02.patch, 
> ozone-shell-thread-dump.txt
>
>
> putKey hangs, sometimes forever.
> TRACE log output in comment below.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-625) putKey hangs for a long time after completion, sometimes forever

2018-10-11 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647319#comment-16647319
 ] 

Tsz Wo Nicholas Sze commented on HDDS-625:
--

+1 the 02 patch looks good.

> putKey hangs for a long time after completion, sometimes forever
> 
>
> Key: HDDS-625
> URL: https://issues.apache.org/jira/browse/HDDS-625
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
>Priority: Blocker
> Attachments: HDDS-625.01.patch, HDDS-625.02.patch, 
> ozone-shell-thread-dump.txt
>
>
> putKey hangs, sometimes forever.
> TRACE log output in comment below.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-625) putKey hangs for a long time after completion, sometimes forever

2018-10-11 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647152#comment-16647152
 ] 

Tsz Wo Nicholas Sze commented on HDDS-625:
--

I have just deployed Ratis 0.3.0-9b2d7b6-SNAPSHOT.

> putKey hangs for a long time after completion, sometimes forever
> 
>
> Key: HDDS-625
> URL: https://issues.apache.org/jira/browse/HDDS-625
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Arpit Agarwal
>Priority: Blocker
> Attachments: ozone-shell-thread-dump.txt
>
>
> putKey hangs, sometimes forever.
> TRACE log output in comment below.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-625) putKey hangs for a long time after completion, sometimes forever

2018-10-11 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1664#comment-1664
 ] 

Tsz Wo Nicholas Sze commented on HDDS-625:
--

Filed RATIS-348.

> putKey hangs for a long time after completion, sometimes forever
> 
>
> Key: HDDS-625
> URL: https://issues.apache.org/jira/browse/HDDS-625
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Arpit Agarwal
>Priority: Blocker
> Attachments: ozone-shell-thread-dump.txt
>
>
> putKey hangs, sometimes forever.
> TRACE log output in comment below.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-632) TimeoutScheduler and SlidingWindow should use daemon threads

2018-10-11 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDDS-632:


 Summary: TimeoutScheduler and SlidingWindow should use daemon 
threads
 Key: HDDS-632
 URL: https://issues.apache.org/jira/browse/HDDS-632
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze


In HDDS-625, we found that the Ozone client does not terminate.  The 
SlidingWindow (debug) thread and the TimeoutScheduler threads are holding up 
process termination.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-625) putKey hangs for a long time after completion, sometimes forever

2018-10-11 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646661#comment-16646661
 ] 

Tsz Wo Nicholas Sze commented on HDDS-625:
--

It seems that the SlidingWindow (debug) thread and the TimeoutScheduler threads 
are holding up process termination.  Setting them to daemon should fix the 
problem.
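
For reference, a minimal sketch of that kind of fix: a ThreadFactory that 
marks its threads as daemon.  The class and thread names below are 
illustrative, not the actual Ratis code.
{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;

// Schedulers backed by daemon threads no longer keep the JVM alive.
final class DaemonThreadFactory implements ThreadFactory {
  private final String name;

  DaemonThreadFactory(String name) {
    this.name = name;
  }

  @Override
  public Thread newThread(Runnable r) {
    final Thread t = new Thread(r, name);
    t.setDaemon(true);  // daemon threads do not hold up process termination
    return t;
  }
}

class SchedulerExample {
  // Usage: a single-threaded scheduler whose worker is a daemon thread.
  static final ScheduledExecutorService SCHEDULER =
      Executors.newSingleThreadScheduledExecutor(
          new DaemonThreadFactory("TimeoutScheduler"));
}
{code}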

> putKey hangs for a long time after completion, sometimes forever
> 
>
> Key: HDDS-625
> URL: https://issues.apache.org/jira/browse/HDDS-625
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Arpit Agarwal
>Priority: Blocker
> Attachments: ozone-shell-thread-dump.txt
>
>
> putKey hangs, sometimes forever.
> TRACE log output in comment below.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-554) In XceiverClientSpi, implements sendCommand(..) using sendCommandAsync(..)

2018-09-25 Thread Tsz Wo Nicholas Sze (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDDS-554:
-
Status: Patch Available  (was: Open)

> In XceiverClientSpi, implements sendCommand(..) using sendCommandAsync(..)
> --
>
> Key: HDDS-554
> URL: https://issues.apache.org/jira/browse/HDDS-554
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Client
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Attachments: HDDS-554_20180925.patch
>
>
> The advantages are two-fold --
> # it simplifies the code, and
> # the async API is more efficient.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-554) In XceiverClientSpi, implements sendCommand(..) using sendCommandAsync(..)

2018-09-25 Thread Tsz Wo Nicholas Sze (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDDS-554:
-
Attachment: HDDS-554_20180925.patch

> In XceiverClientSpi, implements sendCommand(..) using sendCommandAsync(..)
> --
>
> Key: HDDS-554
> URL: https://issues.apache.org/jira/browse/HDDS-554
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Client
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
>Priority: Major
> Attachments: HDDS-554_20180925.patch
>
>
> The advantages are two-fold --
> # it simplifies the code, and
> # the async API is more efficient.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-554) In XceiverClientSpi, implements sendCommand(..) using sendCommandAsync(..)

2018-09-25 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDDS-554:


 Summary: In XceiverClientSpi, implements sendCommand(..) using 
sendCommandAsync(..)
 Key: HDDS-554
 URL: https://issues.apache.org/jira/browse/HDDS-554
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: Ozone Client
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze


The advantages are two-fold --
# it simplifies the code, and
# the async API is more efficient.
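
For illustration, a hedged sketch of how the blocking sendCommand(..) can be 
derived from sendCommandAsync(..); the signatures are simplified and may not 
match the actual XceiverClientSpi.
{code}
// Simplified sketch (assumes java.io.IOException and
// java.util.concurrent.ExecutionException are imported): the blocking call
// just waits on the future returned by the async variant.
public ContainerCommandResponseProto sendCommand(
    ContainerCommandRequestProto request) throws IOException {
  try {
    return sendCommandAsync(request).get();  // block until the reply arrives
  } catch (InterruptedException e) {
    Thread.currentThread().interrupt();      // restore the interrupt flag
    throw new IOException("Interrupted waiting for command", e);
  } catch (ExecutionException e) {
    throw new IOException("Command failed", e);
  }
}
{code}
Deriving the blocking call from the async one leaves a single request path, 
which is the simplification described above.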



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-451) PutKey failed due to error "Rejecting write chunk request. Chunk overwrite without explicit request"

2018-09-21 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624208#comment-16624208
 ] 

Tsz Wo Nicholas Sze commented on HDDS-451:
--

{code}
//StateMachine.java
  /**
   * Notify the state machine that the raft peer is no longer leader.
   */
  void notifyNotLeader(Collection<TransactionContext> pendingEntries) throws IOException;
{code}
ContainerStateMachine should override the above notifyNotLeader(..) so that it 
can clean up the not-yet-committed stateMachineData.  I will check the details 
further.
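
A rough sketch of such an override, assuming a hypothetical 
removeStateMachineData(..) cleanup helper; the actual ContainerStateMachine 
change may look different.
{code}
// Rough sketch only: removeStateMachineData is a hypothetical helper that
// discards write-chunk data cached for an entry that was never committed.
@Override
public void notifyNotLeader(Collection<TransactionContext> pendingEntries)
    throws IOException {
  for (TransactionContext entry : pendingEntries) {
    // Drop the uncommitted stateMachineData so that a retried write is not
    // later rejected as a chunk overwrite.
    removeStateMachineData(entry);
  }
}
{code}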

> PutKey failed due to error "Rejecting write chunk request. Chunk overwrite 
> without explicit request"
> 
>
> Key: HDDS-451
> URL: https://issues.apache.org/jira/browse/HDDS-451
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.2.1
>Reporter: Nilotpal Nandi
>Assignee: Shashikant Banerjee
>Priority: Blocker
>  Labels: alpha2
> Attachments: all-node-ozone-logs-1536841590.tar.gz
>
>
> steps taken :
> --
>  # Ran Put Key command to write 50GB data. Put Key client operation failed 
> after 17 mins.
> error seen  ozone.log :
> 
>  
> {code}
> 2018-09-13 12:11:53,734 [ForkJoinPool.commonPool-worker-20] DEBUG 
> (ChunkManagerImpl.java:85) - writing 
> chunk:bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_1
>  chunk stage:COMMIT_DATA chunk 
> file:/tmp/hadoop-root/dfs/data/hdds/de0a9e01-4a12-40e3-b567-51b9bd83248e/current/containerDir0/16/chunks/bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_1
>  tmp chunk file
> 2018-09-13 12:11:56,576 [pool-3-thread-60] DEBUG (ChunkManagerImpl.java:85) - 
> writing 
> chunk:bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2
>  chunk stage:WRITE_DATA chunk 
> file:/tmp/hadoop-root/dfs/data/hdds/de0a9e01-4a12-40e3-b567-51b9bd83248e/current/containerDir0/16/chunks/bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2
>  tmp chunk file
> 2018-09-13 12:11:56,739 [ForkJoinPool.commonPool-worker-20] DEBUG 
> (ChunkManagerImpl.java:85) - writing 
> chunk:bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2
>  chunk stage:COMMIT_DATA chunk 
> file:/tmp/hadoop-root/dfs/data/hdds/de0a9e01-4a12-40e3-b567-51b9bd83248e/current/containerDir0/16/chunks/bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2
>  tmp chunk file
> 2018-09-13 12:12:21,410 [Datanode State Machine Thread - 0] DEBUG 
> (DatanodeStateMachine.java:148) - Executing cycle Number : 206
> 2018-09-13 12:12:51,411 [Datanode State Machine Thread - 0] DEBUG 
> (DatanodeStateMachine.java:148) - Executing cycle Number : 207
> 2018-09-13 12:12:53,525 [BlockDeletingService#1] DEBUG 
> (TopNOrderedContainerDeletionChoosingPolicy.java:79) - Stop looking for next 
> container, there is no pending deletion block contained in remaining 
> containers.
> 2018-09-13 12:12:55,048 [Datanode ReportManager Thread - 1] DEBUG 
> (ContainerSet.java:191) - Starting container report iteration.
> 2018-09-13 12:13:02,626 [pool-3-thread-1] ERROR (ChunkUtils.java:244) - 
> Rejecting write chunk request. Chunk overwrite without explicit request. 
> ChunkInfo{chunkName='bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2,
>  offset=0, len=16777216}
> 2018-09-13 12:13:03,035 [pool-3-thread-1] INFO (ContainerUtils.java:149) - 
> Operation: WriteChunk : Trace ID: 54834b29-603d-4ba9-9d68-0885215759d8 : 
> Message: Rejecting write chunk request. OverWrite flag 
> required.ChunkInfo{chunkName='bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2,
>  offset=0, len=16777216} : Result: OVERWRITE_FLAG_REQUIRED
> 2018-09-13 12:13:03,037 [ForkJoinPool.commonPool-worker-11] ERROR 
> (ChunkUtils.java:244) - Rejecting write chunk request. Chunk overwrite 
> without explicit request. 
> ChunkInfo{chunkName='bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2,
>  offset=0, len=16777216}
> 2018-09-13 12:13:03,037 [ForkJoinPool.commonPool-worker-11] INFO 
> (ContainerUtils.java:149) - Operation: WriteChunk : Trace ID: 
> 54834b29-603d-4ba9-9d68-0885215759d8 : Message: Rejecting write chunk 
> request. OverWrite flag 
> required.ChunkInfo{chunkName='bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2,
>  offset=0, len=16777216} : Result: OVERWRITE_FLAG_REQUIRED
>  
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDDS-368) all tests in TestOzoneRestClient failed due to "zh_CN" OS language

2018-09-20 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16622823#comment-16622823
 ] 

Tsz Wo Nicholas Sze commented on HDDS-368:
--

> Java version: 1.8.0_111

Could you also try updating it?  Mine is 1.8.0_172.

> FYI, once a string transferred over HTTP contains Chinese characters (or any 
> characters outside English letters and numbers), "string".length() will be 
> shorter than "string".getBytes().length, so the data gets truncated in 
> transfer and the error occurs.

Do you see a way to fix it?
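
For illustration, a minimal, self-contained demonstration of that mismatch; 
sizing a buffer or header from the character count instead of the encoded 
byte length truncates the payload.
{code}
import java.nio.charset.StandardCharsets;

public class LengthDemo {
  public static void main(String[] args) {
    String date = "周四, 01 一月 1970";  // a date string containing Chinese
    int chars = date.length();             // 14 UTF-16 code units
    int bytes = date.getBytes(StandardCharsets.UTF_8).length;  // 22 UTF-8 bytes
    System.out.println(chars + " chars vs " + bytes + " bytes");
    // Any length computed from chars cuts the UTF-8 payload short in
    // transit; the encoded byte length must be used instead.
  }
}
{code}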

> all tests in TestOzoneRestClient failed due to "zh_CN" OS language
> --
>
> Key: HDDS-368
> URL: https://issues.apache.org/jira/browse/HDDS-368
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.2.1
>Reporter: LiXin Ge
>Priority: Critical
>  Labels: alpha2
>
> OS: Ubuntu 16.04.1 LTS (GNU/Linux 4.4.0-116-generic x86_64)
> java version: 1.8.0_111
> mvn: Apache Maven 3.3.9
> Default locale: zh_CN, platform encoding: UTF-8
> Test command: mvn test -Dtest=TestOzoneRestClient -Phdds
>  
>  All the tests in TestOzoneRestClient failed on my local machine with 
> exceptions like the one below.  Does it mean that anybody whose runtime 
> environment is like mine can't run the Ozone REST tests now?
> {noformat}
> [ERROR] 
> testCreateBucket(org.apache.hadoop.ozone.client.rest.TestOzoneRestClient) 
> Time elapsed: 0.01 s <<< ERROR!
> java.io.IOException: org.apache.hadoop.ozone.client.rest.OzoneException: 
> Unparseable date: "m, 28 1970 19:23:50 GMT"
>  at 
> org.apache.hadoop.ozone.client.rest.RestClient.executeHttpRequest(RestClient.java:853)
>  at 
> org.apache.hadoop.ozone.client.rest.RestClient.createVolume(RestClient.java:252)
>  at 
> org.apache.hadoop.ozone.client.rest.RestClient.createVolume(RestClient.java:210)
>  at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.apache.hadoop.ozone.client.OzoneClientInvocationHandler.invoke(OzoneClientInvocationHandler.java:54)
>  at com.sun.proxy.$Proxy73.createVolume(Unknown Source)
>  at 
> org.apache.hadoop.ozone.client.ObjectStore.createVolume(ObjectStore.java:66)
>  at 
> org.apache.hadoop.ozone.client.rest.TestOzoneRestClient.testCreateBucket(TestOzoneRestClient.java:174)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> Caused by: org.apache.hadoop.ozone.client.rest.OzoneException: Unparseable 
> date: "m, 28 1970 19:23:50 GMT"
> at sun.reflect.GeneratedConstructorAccessor27.newInstance(Unknown 
> Source)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at 
> com.fasterxml.jackson.databind.introspect.AnnotatedConstructor.call(AnnotatedConstructor.java:119)
> at 
> com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.createUsingDefault(StdValueInstantiator.java:270)
> at 
> com.fasterxml.jackson.databind.deser.std.ThrowableDeserializer.deserializeFromObject(ThrowableDeserializer.java:149)
> at 
> com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:159)
> at 
> com.fasterxml.jackson.databind.ObjectReader._bindAndClose(ObjectReader.java:1611)
> at 
> com.fasterxml.jackson.databind.ObjectReader.readValue(ObjectReader.java:1219)
> at 
> org.apache.hadoop.ozone.client.rest.OzoneException.parse(OzoneException.java:265)
> ... 39 more
> {noformat}
> or like:
> {noformat}
> [ERROR] Failures:
> [ERROR]   TestOzoneRestClient.testDeleteKey
> Expected: exception with message a string containing "Lookup key failed, 
> error"
>  but: message was "Unexpected end-of-input within/between Object entries
>  at [Source: (String)"{
>   "owner" : {
> "name" : "hadoop"
>   },
>   "quota" : {
> "unit" : "TB",
> "size" : 1048576
>   },
>   "volumeName" : "f93ed82d-dff6-4b75-a1c5-6a0fef5aa6dd",
>   "createdOn" : "���, 06 ��� +50611 08:28:21 GMT",
>   "createdBy" "; line: 11, column: 251]"
> Stacktrace was: com.fasterxml.jackson.core.io.JsonEOFException: Unexpected 
> end-of-input within/between Object entries
>  at [Source: (String)"{
>   "owner" : {
> "name" : "hadoop"
>   

[jira] [Commented] (HDDS-451) PutKey failed due to error "Rejecting write chunk request. Chunk overwrite without explicit request"

2018-09-19 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16621011#comment-16621011
 ] 

Tsz Wo Nicholas Sze commented on HDDS-451:
--

Then, we should log it for easier debugging.

> PutKey failed due to error "Rejecting write chunk request. Chunk overwrite 
> without explicit request"
> 
>
> Key: HDDS-451
> URL: https://issues.apache.org/jira/browse/HDDS-451
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.2.1
>Reporter: Nilotpal Nandi
>Assignee: Shashikant Banerjee
>Priority: Blocker
>  Labels: alpha2
> Attachments: all-node-ozone-logs-1536841590.tar.gz
>
>
> steps taken :
> --
>  # Ran Put Key command to write 50GB data. Put Key client operation failed 
> after 17 mins.
> error seen  ozone.log :
> 
>  
> {code}
> 2018-09-13 12:11:53,734 [ForkJoinPool.commonPool-worker-20] DEBUG 
> (ChunkManagerImpl.java:85) - writing 
> chunk:bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_1
>  chunk stage:COMMIT_DATA chunk 
> file:/tmp/hadoop-root/dfs/data/hdds/de0a9e01-4a12-40e3-b567-51b9bd83248e/current/containerDir0/16/chunks/bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_1
>  tmp chunk file
> 2018-09-13 12:11:56,576 [pool-3-thread-60] DEBUG (ChunkManagerImpl.java:85) - 
> writing 
> chunk:bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2
>  chunk stage:WRITE_DATA chunk 
> file:/tmp/hadoop-root/dfs/data/hdds/de0a9e01-4a12-40e3-b567-51b9bd83248e/current/containerDir0/16/chunks/bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2
>  tmp chunk file
> 2018-09-13 12:11:56,739 [ForkJoinPool.commonPool-worker-20] DEBUG 
> (ChunkManagerImpl.java:85) - writing 
> chunk:bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2
>  chunk stage:COMMIT_DATA chunk 
> file:/tmp/hadoop-root/dfs/data/hdds/de0a9e01-4a12-40e3-b567-51b9bd83248e/current/containerDir0/16/chunks/bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2
>  tmp chunk file
> 2018-09-13 12:12:21,410 [Datanode State Machine Thread - 0] DEBUG 
> (DatanodeStateMachine.java:148) - Executing cycle Number : 206
> 2018-09-13 12:12:51,411 [Datanode State Machine Thread - 0] DEBUG 
> (DatanodeStateMachine.java:148) - Executing cycle Number : 207
> 2018-09-13 12:12:53,525 [BlockDeletingService#1] DEBUG 
> (TopNOrderedContainerDeletionChoosingPolicy.java:79) - Stop looking for next 
> container, there is no pending deletion block contained in remaining 
> containers.
> 2018-09-13 12:12:55,048 [Datanode ReportManager Thread - 1] DEBUG 
> (ContainerSet.java:191) - Starting container report iteration.
> 2018-09-13 12:13:02,626 [pool-3-thread-1] ERROR (ChunkUtils.java:244) - 
> Rejecting write chunk request. Chunk overwrite without explicit request. 
> ChunkInfo{chunkName='bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2,
>  offset=0, len=16777216}
> 2018-09-13 12:13:03,035 [pool-3-thread-1] INFO (ContainerUtils.java:149) - 
> Operation: WriteChunk : Trace ID: 54834b29-603d-4ba9-9d68-0885215759d8 : 
> Message: Rejecting write chunk request. OverWrite flag 
> required.ChunkInfo{chunkName='bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2,
>  offset=0, len=16777216} : Result: OVERWRITE_FLAG_REQUIRED
> 2018-09-13 12:13:03,037 [ForkJoinPool.commonPool-worker-11] ERROR 
> (ChunkUtils.java:244) - Rejecting write chunk request. Chunk overwrite 
> without explicit request. 
> ChunkInfo{chunkName='bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2,
>  offset=0, len=16777216}
> 2018-09-13 12:13:03,037 [ForkJoinPool.commonPool-worker-11] INFO 
> (ContainerUtils.java:149) - Operation: WriteChunk : Trace ID: 
> 54834b29-603d-4ba9-9d68-0885215759d8 : Message: Rejecting write chunk 
> request. OverWrite flag 
> required.ChunkInfo{chunkName='bd80b58a5eba888200a4832a0f2aafb3_stream_5f3b2505-6964-45c9-a7ad-827388a1e6a0_chunk_2,
>  offset=0, len=16777216} : Result: OVERWRITE_FLAG_REQUIRED
>  
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDDS-368) all tests in TestOzoneRestClient failed due to "zh_CN" OS language

2018-09-19 Thread Tsz Wo Nicholas Sze (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620987#comment-16620987
 ] 

Tsz Wo Nicholas Sze edited comment on HDDS-368 at 9/19/18 6:22 PM:
---

{code}
$mvn --version
Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 
2018-06-17T11:33:14-07:00)
Maven home: /usr/local/Cellar/maven/3.5.4/libexec
Java version: 1.8.0_172, vendor: Oracle Corporation, runtime: 
/Library/Java/JavaVirtualMachines/jdk1.8.0_172.jdk/Contents/Home/jre
Default locale: zh_CN, platform encoding: UTF-8
OS name: "mac os x", version: "10.13.6", arch: "x86_64", family: "mac"
{code}
[~GeLiXin], I have set my locale to zh_CN.  I can see some compiler warnings in 
Chinese but TestOzoneRestClient has not failed.  Could you try updating your 
maven/java versions?
{code}
[INFO] Compiling 23 source files to 
/Users/szetszwo/hadoop/apache-hadoop/hadoop-ozone/integration-test/target/test-classes
[WARNING] 
/Users/szetszwo/hadoop/apache-hadoop/hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/web/client/TestKeys.java:
 某些输入文件使用了未经检查或不安全的操作。 (Some input files use unchecked or unsafe operations.)
[WARNING] 
/Users/szetszwo/hadoop/apache-hadoop/hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/web/client/TestKeys.java:
 有关详细信息, 请使用 -Xlint:unchecked 重新编译。 (For details, recompile with -Xlint:unchecked.)
[INFO] 
[INFO] --- maven-surefire-plugin:2.21.0:test (default-test) @ 
hadoop-ozone-integration-test ---
[INFO] 
[INFO] ---
[INFO]  T E S T S
[INFO] ---
[INFO] Running org.apache.hadoop.ozone.client.rest.TestOzoneRestClient
[INFO] Tests run: 21, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 12.136 
s - in org.apache.hadoop.ozone.client.rest.TestOzoneRestClient
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 21, Failures: 0, Errors: 0, Skipped: 0
{code}



was (Author: szetszwo):
{code}
$mvn --version
Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 
2018-06-17T11:33:14-07:00)
Maven home: /usr/local/Cellar/maven/3.5.4/libexec
Java version: 1.8.0_172, vendor: Oracle Corporation, runtime: 
/Library/Java/JavaVirtualMachines/jdk1.8.0_172.jdk/Contents/Home/jre
Default locale: zh_CN, platform encoding: UTF-8
OS name: "mac os x", version: "10.13.6", arch: "x86_64", family: "mac"
{code}
[~GeLiXin], I have set my locale to zh_CN.  I can see some compiler warnings in 
Chinese but TestOzoneRestClient have not failed.  Could you try updating your 
maven/java versions?
{code}
[INFO] Compiling 23 source files to 
/Users/szetszwo/hadoop/apache-hadoop/hadoop-ozone/integration-test/target/test-classes
[WARNING] 
/Users/szetszwo/hadoop/apache-hadoop/hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/web/client/TestKeys.java:
 某些输入文件使用了未经检查或不安全的操作。 (Some input files use unchecked or unsafe operations.)
[WARNING] 
/Users/szetszwo/hadoop/apache-hadoop/hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/web/client/TestKeys.java:
 有关详细信息, 请使用 -Xlint:unchecked 重新编译。 (For details, recompile with -Xlint:unchecked.)
[INFO] 
[INFO] --- maven-surefire-plugin:2.21.0:test (default-test) @ 
hadoop-ozone-integration-test ---
[INFO] 
[INFO] ---
[INFO]  T E S T S
[INFO] ---
[INFO] Running org.apache.hadoop.ozone.client.rest.TestOzoneRestClient
[INFO] Tests run: 21, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 12.136 
s - in org.apache.hadoop.ozone.client.rest.TestOzoneRestClient
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 21, Failures: 0, Errors: 0, Skipped: 0
[INFO] 
{code}


> all tests in TestOzoneRestClient failed due to "zh_CN" OS language
> --
>
> Key: HDDS-368
> URL: https://issues.apache.org/jira/browse/HDDS-368
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.2.1
>Reporter: LiXin Ge
>Priority: Critical
>  Labels: alpha2
>
> OS: Ubuntu 16.04.1 LTS (GNU/Linux 4.4.0-116-generic x86_64)
> java version: 1.8.0_111
> mvn: Apache Maven 3.3.9
> Default locale: zh_CN, platform encoding: UTF-8
> Test command: mvn test -Dtest=TestOzoneRestClient -Phdds
>  
>  All the tests in TestOzoneRestClient failed on my local machine with 
> exceptions like the one below.  Does it mean that anybody whose runtime 
> environment is like mine can't run the Ozone REST tests now?
> {noformat}
> [ERROR] 
> testCreateBucket(org.apache.hadoop.ozone.client.rest.TestOzoneRestClient) 
> Time elapsed: 0.01 s <<< ERROR!
> java.io.IOException: org.apache.hadoop.ozone.client.rest.OzoneException: 
> Unparseable date: "m, 28 1970 19:23:50 GMT"
>  at 
> org.apache.hadoop.ozone.client.rest.RestClient.executeHttpRequest(RestClient.java:853)
>  at 
> org.apache.hadoop.ozone.client.rest.RestClient.createVolume(RestClient.java:252)
>  at 
> 
