[GitHub] [hadoop-ozone] vivekratnavel commented on issue #675: HDDS-3170. Fix issues in File count by size task.

2020-03-14 Thread GitBox
vivekratnavel commented on issue #675: HDDS-3170. Fix issues in File count by 
size task.
URL: https://github.com/apache/hadoop-ozone/pull/675#issuecomment-599175045
 
 
   +1 LGTM


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-3181) Intermittent failure in TestReconWithOzoneManager due to BindException

2020-03-14 Thread Attila Doroszlai (Jira)
Attila Doroszlai created HDDS-3181:
--

 Summary: Intermittent failure in TestReconWithOzoneManager due to 
BindException
 Key: HDDS-3181
 URL: https://issues.apache.org/jira/browse/HDDS-3181
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: test
Reporter: Attila Doroszlai
Assignee: Attila Doroszlai


TestReconWithOzoneManager may fail with BindException:

{code:title=https://github.com/apache/hadoop-ozone/pull/677/checks?check_run_id=507376007}
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 19.707 s <<< 
FAILURE! - in org.apache.hadoop.ozone.recon.TestReconWithOzoneManager
org.apache.hadoop.ozone.recon.TestReconWithOzoneManager  Time elapsed: 19.706 s 
 <<< ERROR!
picocli.CommandLine$ExecutionException: Error while calling command 
(org.apache.hadoop.ozone.recon.ReconServer@23f74a49): java.net.BindException: 
Port in use: 0.0.0.0:36263
...
at 
org.apache.hadoop.ozone.MiniOzoneClusterImpl$Builder.build(MiniOzoneClusterImpl.java:534)
at 
org.apache.hadoop.ozone.recon.TestReconWithOzoneManager.init(TestReconWithOzoneManager.java:109)
...
Caused by: java.net.BindException: Port in use: 0.0.0.0:36263
at 
org.apache.hadoop.hdds.server.http.HttpServer2.constructBindException(HttpServer2.java:1200)
at 
org.apache.hadoop.hdds.server.http.HttpServer2.bindForSinglePort(HttpServer2.java:1222)
at 
org.apache.hadoop.hdds.server.http.HttpServer2.openListeners(HttpServer2.java:1281)
at 
org.apache.hadoop.hdds.server.http.HttpServer2.start(HttpServer2.java:1136)
at 
org.apache.hadoop.hdds.server.http.BaseHttpServer.start(BaseHttpServer.java:252)
at org.apache.hadoop.ozone.recon.ReconServer.start(ReconServer.java:128)
at org.apache.hadoop.ozone.recon.ReconServer.call(ReconServer.java:106)
at org.apache.hadoop.ozone.recon.ReconServer.call(ReconServer.java:50)
at picocli.CommandLine.execute(CommandLine.java:1173)
... 27 more
{code}

{code:title=test output}
2020-03-14 06:17:08,677 [main] INFO  http.BaseHttpServer 
(BaseHttpServer.java:updateConnectorAddress(284)) - HTTP server of ozoneManager 
listening at http://0.0.0.0:36263
...
2020-03-14 06:17:11,589 [main] INFO  http.BaseHttpServer 
(BaseHttpServer.java:newHttpServer2BuilderForOzone(170)) - Starting Web-server 
for recon at: http://0.0.0.0:36263
...
2020-03-14 06:17:12,756 [main] INFO  recon.ReconServer 
(ReconServer.java:start(125)) - Starting Recon server
2020-03-14 06:17:12,757 [main] INFO  http.HttpServer2 
(HttpServer2.java:start(1139)) - HttpServer.start() threw a non Bind IOException
java.net.BindException: Port in use: 0.0.0.0:36263
...
{code}
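The logs show the collision: the OM HTTP server and the Recon web server both ended up on port 36263. A common mitigation in tests is to ask the OS for an ephemeral port by binding to port 0; the sketch below uses plain `java.net.ServerSocket`, not the actual MiniOzoneCluster/HttpServer2 API, and only narrows (does not eliminate) the race window, since another process can still grab the port before the server starts.

```java
import java.io.IOException;
import java.net.ServerSocket;

public class FreePort {
    // Bind to port 0 so the OS assigns a currently-free ephemeral port,
    // then release it and return the number for the test server to use.
    static int freePort() throws IOException {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(freePort() > 0); // prints "true"
    }
}
```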



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Resolved] (HDDS-2848) Recon changes to make snapshots work with OM HA

2020-03-14 Thread Aravindan Vijayan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aravindan Vijayan resolved HDDS-2848.
-
Resolution: Fixed

PR merged. Thanks for the patch [~swagle]. 

> Recon changes to make snapshots work with OM HA
> ---
>
> Key: HDDS-2848
> URL: https://issues.apache.org/jira/browse/HDDS-2848
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Recon
>Reporter: Aravindan Vijayan
>Assignee: Siddharth Wagle
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Recon talks to OM in two ways: through HTTP to get the DB snapshot, and
> through RPC to get delta updates.
> Since Recon already uses the OzoneManagerClientProtocol to query the
> OzoneManager RPC, the RPC client automatically routes the request to the
> leader on an OM HA cluster. Recon only needs the updates from the OM RocksDB
> store, and does not need the in-flight updates in the OM DoubleBuffer. Due to
> the guarantee from Ratis that the leader's RocksDB will always be up to date,
> Recon does not need to worry about going back in time when the current OM
> leader goes down. We have to pass the OM service ID to the Ozone Manager
> client in Recon, and the failover works internally. Currently we pass in
> 'null'.
> To make the HTTP call work against OM HA, Recon has to find out the
> current OM leader and download the snapshot from that OM instance. We can use
> the way this has been implemented in
> org.apache.hadoop.ozone.admin.om.GetServiceRolesSubcommand: get the
> roles of the OM instances and then determine the leader from that.
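The leader-discovery step described above can be sketched as follows. The host-to-role map and the "LEADER" string are illustrative stand-ins for whatever GetServiceRolesSubcommand actually returns, not the real Ozone API:

```java
import java.util.Map;
import java.util.Optional;

public class OmLeaderLookup {
    // Return the host whose reported role is "LEADER"; empty if no leader
    // is known yet (e.g. during a Ratis election). The caller would then
    // download the DB snapshot over HTTP from that host.
    static Optional<String> findLeader(Map<String, String> roles) {
        return roles.entrySet().stream()
                .filter(e -> "LEADER".equals(e.getValue()))
                .map(Map.Entry::getKey)
                .findFirst();
    }

    public static void main(String[] args) {
        Map<String, String> roles = Map.of(
                "om1", "FOLLOWER", "om2", "LEADER", "om3", "FOLLOWER");
        System.out.println(findLeader(roles).orElse("none")); // prints "om2"
    }
}
```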






[GitHub] [hadoop-ozone] avijayanhwx merged pull request #666: HDDS-2848. Recon changes to make snapshots work with OM HA.

2020-03-14 Thread GitBox
avijayanhwx merged pull request #666: HDDS-2848. Recon changes to make 
snapshots work with OM HA.
URL: https://github.com/apache/hadoop-ozone/pull/666
 
 
   





[GitHub] [hadoop-ozone] avijayanhwx commented on issue #666: HDDS-2848. Recon changes to make snapshots work with OM HA.

2020-03-14 Thread GitBox
avijayanhwx commented on issue #666: HDDS-2848. Recon changes to make snapshots 
work with OM HA.
URL: https://github.com/apache/hadoop-ozone/pull/666#issuecomment-599150054
 
 
   Thank you for working on this @swagle.





[GitHub] [hadoop-ozone] avijayanhwx commented on issue #666: HDDS-2848. Recon changes to make snapshots work with OM HA.

2020-03-14 Thread GitBox
avijayanhwx commented on issue #666: HDDS-2848. Recon changes to make snapshots 
work with OM HA.
URL: https://github.com/apache/hadoop-ozone/pull/666#issuecomment-599150010
 
 
   `org.apache.hadoop.ozone.freon.TestDataValidateWithUnsafeByteOperations`
   Failure seems unrelated.
   





[GitHub] [hadoop-ozone] avijayanhwx commented on a change in pull request #681: HDDS-3153. Create REST API to serve Recon Dashboard and integrate with UI in Recon.

2020-03-14 Thread GitBox
avijayanhwx commented on a change in pull request #681: HDDS-3153. Create REST 
API to serve Recon Dashboard and integrate with UI in Recon.
URL: https://github.com/apache/hadoop-ozone/pull/681#discussion_r392628718
 
 

 ##
 File path: 
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/api/ClusterStateEndpoint.java
 ##
 @@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.api;
+
+import org.apache.hadoop.hdds.protocol.DatanodeDetails;
+import org.apache.hadoop.hdds.protocol.proto.HddsProtos.NodeState;
+import org.apache.hadoop.hdds.scm.container.placement.metrics.SCMNodeStat;
+import org.apache.hadoop.hdds.scm.server.OzoneStorageContainerManager;
+import org.apache.hadoop.ozone.recon.api.types.ClusterStateResponse;
+import org.apache.hadoop.ozone.recon.api.types.DatanodeStorageReport;
+import org.apache.hadoop.ozone.recon.api.types.DatanodesCount;
+import org.apache.hadoop.ozone.recon.recovery.ReconOMMetadataManager;
+import org.apache.hadoop.ozone.recon.scm.ReconContainerManager;
+import org.apache.hadoop.ozone.recon.scm.ReconNodeManager;
+import org.apache.hadoop.ozone.recon.scm.ReconPipelineManager;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.inject.Inject;
+import javax.ws.rs.GET;
+import javax.ws.rs.Path;
+import javax.ws.rs.Produces;
+import javax.ws.rs.core.MediaType;
+import javax.ws.rs.core.Response;
+import java.util.List;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.concurrent.atomic.AtomicLong;
+
+/**
+ * Endpoint to fetch current state of ozone cluster.
+ */
+@Path("/clusterState")
+@Produces(MediaType.APPLICATION_JSON)
+public class ClusterStateEndpoint {
+
+  private static final Logger LOG =
+  LoggerFactory.getLogger(ClusterStateEndpoint.class);
+
+  private ReconNodeManager nodeManager;
+  private ReconPipelineManager pipelineManager;
+  private ReconContainerManager containerManager;
+  private ReconOMMetadataManager omMetadataManager;
+
+  @Inject
+  ClusterStateEndpoint(OzoneStorageContainerManager reconSCM,
+   ReconOMMetadataManager omMetadataManager) {
+this.nodeManager =
+(ReconNodeManager) reconSCM.getScmNodeManager();
+this.pipelineManager = (ReconPipelineManager) 
reconSCM.getPipelineManager();
+this.containerManager =
+(ReconContainerManager) reconSCM.getContainerManager();
+this.omMetadataManager = omMetadataManager;
+  }
+
+  /**
+   * Return a summary report on current cluster state.
+   * @return {@link Response}
+   */
+  @GET
+  public Response getClusterState() {
+List&lt;DatanodeDetails&gt; datanodeDetails = nodeManager.getAllNodes();
+AtomicInteger healthyDatanodes = new AtomicInteger();
+int containers = this.containerManager.getContainerIDs().size();
+int pipelines = this.pipelineManager.getPipelines().size();
+long volumes;
+long buckets;
+long keys;
+AtomicLong capacity = new AtomicLong(0L);
+AtomicLong used = new AtomicLong(0L);
 
 Review comment:
   Why AtomicLong? Simple long may be enough.
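For context on this suggestion: the AtomicLongs in the quoted code are only needed because the totals are mutated inside a `forEach` lambda, where captured locals must be effectively final. With an ordinary loop, plain longs suffice, as in this minimal sketch with hypothetical capacity values:

```java
import java.util.Arrays;
import java.util.List;

public class PlainLongSum {
    // Sum capacities with a plain long accumulator; an enhanced-for loop
    // involves no lambda capture, so no AtomicLong workaround is required.
    static long totalCapacity(List<Long> capacities) {
        long total = 0L;
        for (long c : capacities) {
            total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalCapacity(Arrays.asList(10L, 20L, 30L))); // prints 60
    }
}
```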





[GitHub] [hadoop-ozone] avijayanhwx commented on a change in pull request #681: HDDS-3153. Create REST API to serve Recon Dashboard and integrate with UI in Recon.

2020-03-14 Thread GitBox
avijayanhwx commented on a change in pull request #681: HDDS-3153. Create REST 
API to serve Recon Dashboard and integrate with UI in Recon.
URL: https://github.com/apache/hadoop-ozone/pull/681#discussion_r392627568
 
 

 ##
 File path: 
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/api/types/PipelineMetadata.java
 ##
 @@ -167,9 +149,19 @@ public PipelineMetadata build() {
   Preconditions.checkNotNull(datanodes);
   Preconditions.checkNotNull(replicationType);
 
-  return new PipelineMetadata(pipelineId, status, leaderNode, datanodes,
 
 Review comment:
   Why was this changed from using the constructor? Setting the class fields 
directly is non-standard.





[GitHub] [hadoop-ozone] avijayanhwx commented on a change in pull request #681: HDDS-3153. Create REST API to serve Recon Dashboard and integrate with UI in Recon.

2020-03-14 Thread GitBox
avijayanhwx commented on a change in pull request #681: HDDS-3153. Create REST 
API to serve Recon Dashboard and integrate with UI in Recon.
URL: https://github.com/apache/hadoop-ozone/pull/681#discussion_r392628837
 
 

 ##
 File path: 
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/api/ClusterStateEndpoint.java
 ##
 @@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.api;
+
+import org.apache.hadoop.hdds.protocol.DatanodeDetails;
+import org.apache.hadoop.hdds.protocol.proto.HddsProtos.NodeState;
+import org.apache.hadoop.hdds.scm.container.placement.metrics.SCMNodeStat;
+import org.apache.hadoop.hdds.scm.server.OzoneStorageContainerManager;
+import org.apache.hadoop.ozone.recon.api.types.ClusterStateResponse;
+import org.apache.hadoop.ozone.recon.api.types.DatanodeStorageReport;
+import org.apache.hadoop.ozone.recon.api.types.DatanodesCount;
+import org.apache.hadoop.ozone.recon.recovery.ReconOMMetadataManager;
+import org.apache.hadoop.ozone.recon.scm.ReconContainerManager;
+import org.apache.hadoop.ozone.recon.scm.ReconNodeManager;
+import org.apache.hadoop.ozone.recon.scm.ReconPipelineManager;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.inject.Inject;
+import javax.ws.rs.GET;
+import javax.ws.rs.Path;
+import javax.ws.rs.Produces;
+import javax.ws.rs.core.MediaType;
+import javax.ws.rs.core.Response;
+import java.util.List;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.concurrent.atomic.AtomicLong;
+
+/**
+ * Endpoint to fetch current state of ozone cluster.
+ */
+@Path("/clusterState")
+@Produces(MediaType.APPLICATION_JSON)
+public class ClusterStateEndpoint {
+
+  private static final Logger LOG =
+  LoggerFactory.getLogger(ClusterStateEndpoint.class);
+
+  private ReconNodeManager nodeManager;
+  private ReconPipelineManager pipelineManager;
+  private ReconContainerManager containerManager;
+  private ReconOMMetadataManager omMetadataManager;
+
+  @Inject
+  ClusterStateEndpoint(OzoneStorageContainerManager reconSCM,
+   ReconOMMetadataManager omMetadataManager) {
+this.nodeManager =
+(ReconNodeManager) reconSCM.getScmNodeManager();
+this.pipelineManager = (ReconPipelineManager) 
reconSCM.getPipelineManager();
+this.containerManager =
+(ReconContainerManager) reconSCM.getContainerManager();
+this.omMetadataManager = omMetadataManager;
+  }
+
+  /**
+   * Return a summary report on current cluster state.
+   * @return {@link Response}
+   */
+  @GET
+  public Response getClusterState() {
+List&lt;DatanodeDetails&gt; datanodeDetails = nodeManager.getAllNodes();
+AtomicInteger healthyDatanodes = new AtomicInteger();
+int containers = this.containerManager.getContainerIDs().size();
+int pipelines = this.pipelineManager.getPipelines().size();
+long volumes;
+long buckets;
+long keys;
+AtomicLong capacity = new AtomicLong(0L);
+AtomicLong used = new AtomicLong(0L);
+AtomicLong remaining = new AtomicLong(0L);
+datanodeDetails.forEach(datanode -> {
+  NodeState nodeState = nodeManager.getNodeState(datanode);
+  SCMNodeStat nodeStat = nodeManager.getNodeStat(datanode).get();
 
 Review comment:
   Can we use SCMNodeManager#getStats here? It is supposed to give the 
aggregate stats from all nodes. 





[GitHub] [hadoop-ozone] avijayanhwx commented on a change in pull request #681: HDDS-3153. Create REST API to serve Recon Dashboard and integrate with UI in Recon.

2020-03-14 Thread GitBox
avijayanhwx commented on a change in pull request #681: HDDS-3153. Create REST 
API to serve Recon Dashboard and integrate with UI in Recon.
URL: https://github.com/apache/hadoop-ozone/pull/681#discussion_r392628566
 
 

 ##
 File path: 
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/api/ClusterStateEndpoint.java
 ##
 @@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.api;
+
+import org.apache.hadoop.hdds.protocol.DatanodeDetails;
+import org.apache.hadoop.hdds.protocol.proto.HddsProtos.NodeState;
+import org.apache.hadoop.hdds.scm.container.placement.metrics.SCMNodeStat;
+import org.apache.hadoop.hdds.scm.server.OzoneStorageContainerManager;
+import org.apache.hadoop.ozone.recon.api.types.ClusterStateResponse;
+import org.apache.hadoop.ozone.recon.api.types.DatanodeStorageReport;
+import org.apache.hadoop.ozone.recon.api.types.DatanodesCount;
+import org.apache.hadoop.ozone.recon.recovery.ReconOMMetadataManager;
+import org.apache.hadoop.ozone.recon.scm.ReconContainerManager;
+import org.apache.hadoop.ozone.recon.scm.ReconNodeManager;
+import org.apache.hadoop.ozone.recon.scm.ReconPipelineManager;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.inject.Inject;
+import javax.ws.rs.GET;
+import javax.ws.rs.Path;
+import javax.ws.rs.Produces;
+import javax.ws.rs.core.MediaType;
+import javax.ws.rs.core.Response;
+import java.util.List;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.concurrent.atomic.AtomicLong;
+
+/**
+ * Endpoint to fetch current state of ozone cluster.
+ */
+@Path("/clusterState")
+@Produces(MediaType.APPLICATION_JSON)
+public class ClusterStateEndpoint {
+
+  private static final Logger LOG =
+  LoggerFactory.getLogger(ClusterStateEndpoint.class);
+
+  private ReconNodeManager nodeManager;
+  private ReconPipelineManager pipelineManager;
+  private ReconContainerManager containerManager;
+  private ReconOMMetadataManager omMetadataManager;
+
+  @Inject
+  ClusterStateEndpoint(OzoneStorageContainerManager reconSCM,
+   ReconOMMetadataManager omMetadataManager) {
+this.nodeManager =
+(ReconNodeManager) reconSCM.getScmNodeManager();
+this.pipelineManager = (ReconPipelineManager) 
reconSCM.getPipelineManager();
+this.containerManager =
+(ReconContainerManager) reconSCM.getContainerManager();
+this.omMetadataManager = omMetadataManager;
+  }
+
+  /**
+   * Return a summary report on current cluster state.
+   * @return {@link Response}
+   */
+  @GET
+  public Response getClusterState() {
+List&lt;DatanodeDetails&gt; datanodeDetails = nodeManager.getAllNodes();
+AtomicInteger healthyDatanodes = new AtomicInteger();
+int containers = this.containerManager.getContainerIDs().size();
+int pipelines = this.pipelineManager.getPipelines().size();
+long volumes;
+long buckets;
+long keys;
+AtomicLong capacity = new AtomicLong(0L);
+AtomicLong used = new AtomicLong(0L);
+AtomicLong remaining = new AtomicLong(0L);
+datanodeDetails.forEach(datanode -> {
 
 Review comment:
   We can use SCMNodeManager#getNodeCount(NodeState)  to get the number of 
nodes in a specific state.
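The single-call alternative this comment suggests can be approximated as below; the `NodeState` enum and the node list are illustrative stand-ins, not the real SCMNodeManager types:

```java
import java.util.List;

public class NodeCounter {
    enum NodeState { HEALTHY, STALE, DEAD }

    // Count nodes in a given state in one pass, instead of incrementing a
    // counter inside the per-node storage-stats loop.
    static long countByState(List<NodeState> states, NodeState wanted) {
        return states.stream().filter(s -> s == wanted).count();
    }

    public static void main(String[] args) {
        List<NodeState> states = List.of(
                NodeState.HEALTHY, NodeState.HEALTHY, NodeState.STALE);
        System.out.println(countByState(states, NodeState.HEALTHY)); // prints 2
    }
}
```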





[GitHub] [hadoop-ozone] avijayanhwx commented on a change in pull request #681: HDDS-3153. Create REST API to serve Recon Dashboard and integrate with UI in Recon.

2020-03-14 Thread GitBox
avijayanhwx commented on a change in pull request #681: HDDS-3153. Create REST 
API to serve Recon Dashboard and integrate with UI in Recon.
URL: https://github.com/apache/hadoop-ozone/pull/681#discussion_r392627516
 
 

 ##
 File path: 
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/api/types/DatanodesCount.java
 ##
 @@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.ozone.recon.api.types;
+
+import javax.xml.bind.annotation.XmlAccessType;
+import javax.xml.bind.annotation.XmlAccessorType;
+import javax.xml.bind.annotation.XmlElement;
+
+/**
+ * Metadata object that contains datanode counts based on its state.
+ */
+@XmlAccessorType(XmlAccessType.FIELD)
+public class DatanodesCount {
 
 Review comment:
   IMO this class is redundant. We can just capture the number of datanodes 
(healthy and total) as two longs in the response.





[GitHub] [hadoop-ozone] avijayanhwx commented on a change in pull request #681: HDDS-3153. Create REST API to serve Recon Dashboard and integrate with UI in Recon.

2020-03-14 Thread GitBox
avijayanhwx commented on a change in pull request #681: HDDS-3153. Create REST 
API to serve Recon Dashboard and integrate with UI in Recon.
URL: https://github.com/apache/hadoop-ozone/pull/681#discussion_r392628661
 
 

 ##
 File path: 
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/api/ClusterStateEndpoint.java
 ##
 @@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.api;
+
+import org.apache.hadoop.hdds.protocol.DatanodeDetails;
+import org.apache.hadoop.hdds.protocol.proto.HddsProtos.NodeState;
+import org.apache.hadoop.hdds.scm.container.placement.metrics.SCMNodeStat;
+import org.apache.hadoop.hdds.scm.server.OzoneStorageContainerManager;
+import org.apache.hadoop.ozone.recon.api.types.ClusterStateResponse;
+import org.apache.hadoop.ozone.recon.api.types.DatanodeStorageReport;
+import org.apache.hadoop.ozone.recon.api.types.DatanodesCount;
+import org.apache.hadoop.ozone.recon.recovery.ReconOMMetadataManager;
+import org.apache.hadoop.ozone.recon.scm.ReconContainerManager;
+import org.apache.hadoop.ozone.recon.scm.ReconNodeManager;
+import org.apache.hadoop.ozone.recon.scm.ReconPipelineManager;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.inject.Inject;
+import javax.ws.rs.GET;
+import javax.ws.rs.Path;
+import javax.ws.rs.Produces;
+import javax.ws.rs.core.MediaType;
+import javax.ws.rs.core.Response;
+import java.util.List;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.concurrent.atomic.AtomicLong;
+
+/**
+ * Endpoint to fetch current state of ozone cluster.
+ */
+@Path("/clusterState")
+@Produces(MediaType.APPLICATION_JSON)
+public class ClusterStateEndpoint {
+
+  private static final Logger LOG =
+  LoggerFactory.getLogger(ClusterStateEndpoint.class);
+
+  private ReconNodeManager nodeManager;
+  private ReconPipelineManager pipelineManager;
+  private ReconContainerManager containerManager;
+  private ReconOMMetadataManager omMetadataManager;
+
+  @Inject
+  ClusterStateEndpoint(OzoneStorageContainerManager reconSCM,
+   ReconOMMetadataManager omMetadataManager) {
+this.nodeManager =
+(ReconNodeManager) reconSCM.getScmNodeManager();
+this.pipelineManager = (ReconPipelineManager) 
reconSCM.getPipelineManager();
+this.containerManager =
+(ReconContainerManager) reconSCM.getContainerManager();
+this.omMetadataManager = omMetadataManager;
+  }
+
+  /**
+   * Return a summary report on current cluster state.
+   * @return {@link Response}
+   */
+  @GET
+  public Response getClusterState() {
+List&lt;DatanodeDetails&gt; datanodeDetails = nodeManager.getAllNodes();
+AtomicInteger healthyDatanodes = new AtomicInteger();
+int containers = this.containerManager.getContainerIDs().size();
+int pipelines = this.pipelineManager.getPipelines().size();
+long volumes;
+long buckets;
+long keys;
+AtomicLong capacity = new AtomicLong(0L);
+AtomicLong used = new AtomicLong(0L);
+AtomicLong remaining = new AtomicLong(0L);
+datanodeDetails.forEach(datanode -> {
+  NodeState nodeState = nodeManager.getNodeState(datanode);
+  SCMNodeStat nodeStat = nodeManager.getNodeStat(datanode).get();
+  if (nodeState.equals(NodeState.HEALTHY)) {
+healthyDatanodes.getAndIncrement();
+  }
+  capacity.getAndAdd(nodeStat.getCapacity().get());
+  used.getAndAdd(nodeStat.getScmUsed().get());
+  remaining.getAndAdd(nodeStat.getRemaining().get());
+});
+DatanodeStorageReport storageReport =
+new DatanodeStorageReport(capacity.get(), used.get(), remaining.get());
+DatanodesCount datanodesCount = new DatanodesCount(datanodeDetails.size(),
+healthyDatanodes.get());
+ClusterStateResponse.Builder builder = ClusterStateResponse.newBuilder();
+try {
+  volumes = omMetadataManager.getVolumeTable().getEstimatedKeyCount();
+  builder.setVolumes(volumes);
+} catch (Exception ex) {
+  LOG.error("Unable to get Volumes count in ClusterStateResponse.", ex);
+}
+try {
+  buckets = omMetadataManager.getBucketTable().getEstimatedKeyCount();
 
 Review comment:
   Nit. We can directly use

[GitHub] [hadoop-ozone] vivekratnavel commented on issue #681: HDDS-3153. Create REST API to serve Recon Dashboard and integrate with UI in Recon.

2020-03-14 Thread GitBox
vivekratnavel commented on issue #681: HDDS-3153. Create REST API to serve 
Recon Dashboard and integrate with UI in Recon.
URL: https://github.com/apache/hadoop-ozone/pull/681#issuecomment-599142315
 
 
   @avijayanhwx @elek Please review





[jira] [Updated] (HDDS-3153) Create REST API to serve Recon Dashboard and integrate with UI in Recon.

2020-03-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3153:
-
Labels: pull-request-available  (was: )

> Create REST API to serve Recon Dashboard and integrate with UI in Recon.
> 
>
> Key: HDDS-3153
> URL: https://issues.apache.org/jira/browse/HDDS-3153
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Recon
>Affects Versions: 0.5.0
>Reporter: Vivek Ratnavel Subramanian
>Assignee: Vivek Ratnavel Subramanian
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screen Shot 2020-03-10 at 12.10.41 PM.png
>
>
> Add a REST API to serve information required for recon dashboard
> !Screen Shot 2020-03-10 at 12.10.41 PM.png!






[GitHub] [hadoop-ozone] vivekratnavel opened a new pull request #681: HDDS-3153. Create REST API to serve Recon Dashboard and integrate with UI in Recon.

2020-03-14 Thread GitBox
vivekratnavel opened a new pull request #681: HDDS-3153. Create REST API to 
serve Recon Dashboard and integrate with UI in Recon.
URL: https://github.com/apache/hadoop-ozone/pull/681
 
 
   ## What changes were proposed in this pull request?
   
   - Add REST api endpoint to serve cluster state (/api/v1/clusterState)
   - Integrate UI with clusterState API
   - Remove dangling javadoc comment in LICENSE block from all Recon files
   - Add unit and integration tests
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-3153
   
   ## How was this patch tested?
   
   unit tests, integration tests and manual tests using docker-compose.
   





[GitHub] [hadoop-ozone] xiaoyuyao commented on issue #665: HDDS-3160. Disable index and filter block cache for RocksDB.

2020-03-14 Thread GitBox
xiaoyuyao commented on issue #665: HDDS-3160. Disable index and filter block 
cache for RocksDB.
URL: https://github.com/apache/hadoop-ozone/pull/665#issuecomment-599118517
 
 
   Thanks @elek for the patch. Given the block cache size (256MB) and the per-SST 
index/filter cost of ~5.5MB, it would handle up to about 50 SST files before 
thrashing happens. I agree that mixing the index/filter blocks with the data 
blocks could impact the performance of the block cache. 
   
   There is one catch to moving them out of the block cache with this change: we 
lose the [control over the max memory usage of those 
index/filters](https://github.com/facebook/rocksdb/wiki/Block-Cache). We should 
consider adding profiles with a larger block cache size (e.g., change the current 
256MB to 8GB or higher for an OM DB with 100M+ keys) for scenarios where taming 
RocksDB memory usage is required, or adding support for [partitioned index 
filters](https://github.com/facebook/rocksdb/wiki/Partitioned-Index-Filters). 
   
   I'm +1 on this change with one minor ask: it would be great if you could 
attach some RocksDB block cache metrics on block cache hit/miss ratio before 
and after this change, for index/filter/data blocks respectively. 
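   The capacity estimate above can be sanity-checked with a few lines. This is only a back-of-the-envelope sketch: the 256MB cache size and the ~5.5MB per-SST index/filter cost are taken from the comment, not measured here.

```java
public class BlockCacheEstimate {
    public static void main(String[] args) {
        double cacheMb = 256.0;           // block cache size from the discussion
        double perSstIndexFilterMb = 5.5; // approximate index+filter cost per SST file
        // Roughly how many SST files fit before their index/filter blocks
        // start evicting each other from the cache (thrashing).
        int maxSstFiles = (int) (cacheMb / perSstIndexFilterMb);
        System.out.println("SST files before thrashing: " + maxSstFiles); // 46
    }
}
```

   That gives 46 SST files, consistent with the "up to about 50" figure in the comment.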





[jira] [Updated] (HDDS-3177) Periodic dependency update (Java)

2020-03-14 Thread Attila Doroszlai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Doroszlai updated HDDS-3177:
---
Status: Patch Available  (was: In Progress)

> Periodic dependency update (Java)
> -
>
> Key: HDDS-3177
> URL: https://issues.apache.org/jira/browse/HDDS-3177
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>Reporter: Wei-Chiu Chuang
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
> Attachments: dependency-check-report.html
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Must:
> jackson-databind 2.9.9 --> 2.10.3
> netty-all 4.0.52 --> 4.1.46
> nimbus-jose-jwt 4.41.1 --> 7.9 (or remove it?)
> Nice to have:
> cdi-api 1.2 --> 2.0.SP1 (major version change)
> hadoop 3.2.0 --> 3.2.1
> ===
> protobuf 2.5.0 --> ? this is more controversial 






[jira] [Updated] (HDDS-3177) Periodic dependency update (Java)

2020-03-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3177:
-
Labels: pull-request-available  (was: )

> Periodic dependency update (Java)
> -
>
> Key: HDDS-3177
> URL: https://issues.apache.org/jira/browse/HDDS-3177
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>Reporter: Wei-Chiu Chuang
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
> Attachments: dependency-check-report.html
>
>






[GitHub] [hadoop-ozone] adoroszlai opened a new pull request #680: HDDS-3177. Periodic dependency update (Java)

2020-03-14 Thread GitBox
adoroszlai opened a new pull request #680: HDDS-3177. Periodic dependency 
update (Java)
URL: https://github.com/apache/hadoop-ozone/pull/680
 
 
   ## What changes were proposed in this pull request?
   
   1. Upgrade:
* jackson-databind 2.9.9 --> 2.10.3
* netty-all 4.0.52 --> 4.1.47
* nimbus-jose-jwt 4.41.1 --> 7.9
   2. Use existing `bouncycastle.version` property (= 1.60)
   3. Remove unused dependencies:
* `org.apache.htrace:*`
* `org.apache.hbase:*`
   4. Remove unused/useless properties:
* `hadoop.assemblies.version`
* `hadoop.common.build.dir` (hadoop-common source no longer present in the 
same repo)
* `hbase.*.version`
   5. Exclude `ratis-examples` (accidental dependency of `ratis-tools`)
   
   https://issues.apache.org/jira/browse/HDDS-3177
   
   ## How was this patch tested?
   
   https://github.com/adoroszlai/hadoop-ozone/runs/507994482





[GitHub] [hadoop-ozone] bshashikant commented on issue #673: HDDS-3064. Get Key is hung when READ delay is injected in chunk file path.

2020-03-14 Thread GitBox
bshashikant commented on issue #673: HDDS-3064. Get Key is hung when READ delay 
is injected in chunk file path.
URL: https://github.com/apache/hadoop-ozone/pull/673#issuecomment-599111792
 
 
   > > It was verified in the actual test up where the problem originated.
   > 
   > Can you please share the details how was this patch tested and how the 
problem can be reproduced?
   
   It was reproduced in a fault injection testing environment. @nilotpalnandi, 
can you please add some details?





[GitHub] [hadoop-ozone] bshashikant commented on issue #673: HDDS-3064. Get Key is hung when READ delay is injected in chunk file path.

2020-03-14 Thread GitBox
bshashikant commented on issue #673: HDDS-3064. Get Key is hung when READ delay 
is injected in chunk file path.
URL: https://github.com/apache/hadoop-ozone/pull/673#issuecomment-599111536
 
 
   > @adoroszlai @bshashikant I am not saying we wait for the entire block to 
be read. The client will be reused for all the chunk reads. Let's take a 
scenario.
   > Chunk1 fails on dn1. Succeeded on dn2.
   > Chunk2 -> we should try reading from either dn2 or dn3. We will need to 
mark dn1 timed out in the code. Currently we can still retry dn1.
   > Chunk3 -> dn2, dn3 fail. We should retry dn1 before failing the read.
   > For retrying dn1 we will need to reestablish the connection. I think we 
are handling this.
   > 
   > Regarding the deadline, if we set a deadline at t=0, the stub will fail at 
t=30 even if the client is able to read chunks from the stub. Please verify if 
this is how the deadline works. I think we will need to differentiate between 
this scenario and scenario where chunk read is timing out.
   
   @lokeshj1703, in the read path, once a read op fails on a datanode, the 
same op never gets retried on the same datanode. 
   > Regarding the deadline, if we set a deadline at t=0, the stub will 
fail at t=30 even if the client is able to read chunks from the stub. Please 
verify if this is how the deadline works. I think we will need to differentiate 
between this scenario and the scenario where a chunk read is timing out.
   
   The deadline is set on the rpc call. If the rpc response doesn't complete 
within the deadline, the call is marked deadline-exceeded. The same deadline is 
used in Ratis as well. The client itself doesn't maintain a 
timer as such.
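   The per-call deadline semantics being discussed can be illustrated with a stdlib-only sketch. This is not the actual Ozone/gRPC/Ratis API (`readChunk` is a hypothetical stand-in): it only shows that a deadline bounds a single call, so a retry, e.g. against another datanode, starts with a fresh deadline rather than inheriting the expired one.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class DeadlinePerCall {
    // Hypothetical stand-in for a chunk-read RPC with a per-call deadline.
    public static String readChunk(long chunkDelayMs, long deadlineMs) {
        CompletableFuture<String> rpc = CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(chunkDelayMs); // simulated datanode read latency
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "chunk-data";
        });
        try {
            // The deadline applies to this single call only.
            return rpc.get(deadlineMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return "DEADLINE_EXCEEDED";
        } catch (Exception e) {
            return "ERROR";
        }
    }

    public static void main(String[] args) {
        // A slow read exceeds its own deadline...
        System.out.println(readChunk(200, 50));
        // ...but a retry (e.g. against another datanode) gets a fresh deadline.
        System.out.println(readChunk(10, 50));
    }
}
```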
   





[jira] [Updated] (HDDS-3180) Datanode fails to start due to confused inconsistent volume state

2020-03-14 Thread Yiqun Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDDS-3180:

Status: Patch Available  (was: Open)

> Datanode fails to start due to confused inconsistent volume state
> -
>
> Key: HDDS-3180
> URL: https://issues.apache.org/jira/browse/HDDS-3180
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Affects Versions: 0.4.1
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I hit an error in my test Ozone cluster when I restarted a datanode. The 
> log reports an inconsistent volume state but without other helpful 
> details:
> {noformat}
> 2020-03-14 02:31:46,204 [main] INFO  (LogAdapter.java:51) - registered 
> UNIX signal handlers for [TERM, HUP, INT]
> 2020-03-14 02:31:46,736 [main] INFO  (HddsDatanodeService.java:204) - 
> HddsDatanodeService host:lyq-xx.xx.xx.xx ip:xx.xx.xx.xx
> 2020-03-14 02:31:46,784 [main] INFO  (HddsVolume.java:177) - Creating 
> Volume: /tmp/hadoop-hdfs/dfs/data/hdds of storage type : DISK and capacity : 
> 20063645696
> 2020-03-14 02:31:46,786 [main] ERROR (MutableVolumeSet.java:202) - Failed 
> to parse the storage location: file:///tmp/hadoop-hdfs/dfs/data
> java.io.IOException: Volume is in an INCONSISTENT state. Skipped loading 
> volume: /tmp/hadoop-hdfs/dfs/data/hdds
> at 
> org.apache.hadoop.ozone.container.common.volume.HddsVolume.initialize(HddsVolume.java:226)
> at 
> org.apache.hadoop.ozone.container.common.volume.HddsVolume.&lt;init&gt;(HddsVolume.java:180)
> at 
> org.apache.hadoop.ozone.container.common.volume.HddsVolume.&lt;init&gt;(HddsVolume.java:71)
> at 
> org.apache.hadoop.ozone.container.common.volume.HddsVolume$Builder.build(HddsVolume.java:158)
> at 
> org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.createVolume(MutableVolumeSet.java:336)
> at 
> org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.initializeVolumeSet(MutableVolumeSet.java:183)
> at 
> org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.&lt;init&gt;(MutableVolumeSet.java:139)
> at 
> org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.&lt;init&gt;(MutableVolumeSet.java:111)
> at 
> org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.&lt;init&gt;(OzoneContainer.java:97)
> at 
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.&lt;init&gt;(DatanodeStateMachine.java:128)
> at 
> org.apache.hadoop.ozone.HddsDatanodeService.start(HddsDatanodeService.java:235)
> at 
> org.apache.hadoop.ozone.HddsDatanodeService.start(HddsDatanodeService.java:179)
> at 
> org.apache.hadoop.ozone.HddsDatanodeService.call(HddsDatanodeService.java:154)
> at 
> org.apache.hadoop.ozone.HddsDatanodeService.call(HddsDatanodeService.java:78)
> at picocli.CommandLine.execute(CommandLine.java:1173)
> at picocli.CommandLine.access$800(CommandLine.java:141)
> at picocli.CommandLine$RunLast.handle(CommandLine.java:1367)
> at picocli.CommandLine$RunLast.handle(CommandLine.java:1335)
> at 
> picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:1243)
> at picocli.CommandLine.parseWithHandlers(CommandLine.java:1526)
> at picocli.CommandLine.parseWithHandler(CommandLine.java:1465)
> at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:65)
> at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:56)
> at 
> org.apache.hadoop.ozone.HddsDatanodeService.main(HddsDatanodeService.java:137)
> 2020-03-14 02:31:46,795 [shutdown-hook-0] INFO  (LogAdapter.java:51) - 
> SHUTDOWN_MSG:
> {noformat}
> Looking into the code, the root cause is that the VERSION file was 
> lost on that node.
> We need to log a key message as well, to help users quickly identify the 
> root cause of this.






[jira] [Comment Edited] (HDDS-3180) Datanode fails to start due to confused inconsistent volume state

2020-03-14 Thread Yiqun Lin (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17059327#comment-17059327
 ] 

Yiqun Lin edited comment on HDDS-3180 at 3/14/20, 12:29 PM:


We need to additionally log the cause of the inconsistent state, because this 
state leads to the Datanode failing to start.

A friendlier message, tested in my local environment:
{noformat}
2020-03-14 04:41:27,249 [main] INFO  (HddsVolume.java:177) - Creating 
Volume: /tmp/hadoop-hdfs/dfs/data/hdds of storage type : DISK and capacity : 
9997713408
2020-03-14 04:41:27,250 [main] WARN  (HddsVolume.java:252) - VERSION file 
does not exist in volume /tmp/hadoop-hdfs/dfs/data/hdds, current volume state: 
INCONSISTENT.
2020-03-14 04:41:27,257 [main] ERROR (MutableVolumeSet.java:202) - Failed 
to parse the storage location: file:///tmp/hadoop-hdfs/dfs/data
java.io.IOException: Volume is in an INCONSISTENT state. Skipped loading 
volume: /tmp/hadoop-hdfs/dfs/data/hdds
at 
org.apache.hadoop.ozone.container.common.volume.HddsVolume.initialize(HddsVolume.java:226)
at 
org.apache.hadoop.ozone.container.common.volume.HddsVolume.&lt;init&gt;(HddsVolume.java:180)
at 
org.apache.hadoop.ozone.container.common.volume.HddsVolume.&lt;init&gt;(HddsVolume.java:71)
at 
org.apache.hadoop.ozone.container.common.volume.HddsVolume$Builder.build(HddsVolume.java:158)
at 
org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.createVolume(MutableVolumeSet.java:336)
{noformat}
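The kind of pre-check and warning shown in the log above can be sketched as follows. This is a minimal, hypothetical illustration (class and method names are invented); the actual fix lives in HddsVolume's initialization path.

```java
import java.io.File;
import java.io.IOException;

public class VolumeVersionCheck {
    // Hypothetical helper mirroring the friendlier message above: log the
    // actual cause (missing VERSION file) before the generic INCONSISTENT error.
    public static void checkVersionFile(File volumeRoot) throws IOException {
        File versionFile = new File(volumeRoot, "VERSION");
        if (!versionFile.exists()) {
            System.err.println("VERSION file does not exist in volume "
                + volumeRoot + ", current volume state: INCONSISTENT.");
            throw new IOException("Volume is in an INCONSISTENT state. "
                + "Skipped loading volume: " + volumeRoot);
        }
    }

    public static void main(String[] args) {
        File tmp = new File(System.getProperty("java.io.tmpdir"), "hdds-demo");
        tmp.mkdirs();
        try {
            checkVersionFile(tmp); // no VERSION file: warning + IOException
        } catch (IOException expected) {
            System.out.println("caught: " + expected.getMessage());
        }
    }
}
```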


was (Author: linyiqun):
We need to additionally add log for the inconsistent state because this state 
will lead Datanode failed to start.


[jira] [Commented] (HDDS-3180) Datanode fails to start due to confused inconsistent volume state

2020-03-14 Thread Yiqun Lin (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17059327#comment-17059327
 ] 

Yiqun Lin commented on HDDS-3180:
-

We need to additionally log the cause of the inconsistent state, because this 
state leads to the Datanode failing to start.






[jira] [Updated] (HDDS-3180) Datanode fails to start due to confused inconsistent volume state

2020-03-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3180:
-
Labels: pull-request-available  (was: )







[GitHub] [hadoop-ozone] linyiqun opened a new pull request #679: HDDS-3180. Datanode fails to start due to confused inconsistent volum…

2020-03-14 Thread GitBox
linyiqun opened a new pull request #679: HDDS-3180. Datanode fails to start due 
to confused inconsistent volum…
URL: https://github.com/apache/hadoop-ozone/pull/679
 
 
   ## What changes were proposed in this pull request?
   Add helpful error message for root cause of Datanode startup failure.
   
   ## What is the link to the Apache JIRA
   https://issues.apache.org/jira/browse/HDDS-3180
   
   ## How was this patch tested?
   Tested manually in the local env.
   





[jira] [Updated] (HDDS-3180) Datanode fails to start due to confused inconsistent volume state

2020-03-14 Thread Yiqun Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDDS-3180:

Summary: Datanode fails to start due to confused inconsistent volume state  
(was: Datanode fails to start due to inconsistent volume state without helpful 
error message)

> Datanode fails to start due to confused inconsistent volume state
> -
>
> Key: HDDS-3180
> URL: https://issues.apache.org/jira/browse/HDDS-3180
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Affects Versions: 0.4.1
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Major
>
> I meet an error in my testing ozone cluster when I restart datanode. From the 
> log, it throws inconsistent volume state but without other detailed helpful 
> info:
> {noformat}
> 2020-03-14 02:31:46,204 [main] INFO  (LogAdapter.java:51) - registered 
> UNIX signal handlers for [TERM, HUP, INT]
> 2020-03-14 02:31:46,736 [main] INFO  (HddsDatanodeService.java:204) - 
> HddsDatanodeService host:lyq-xx.xx.xx.xx ip:xx.xx.xx.xx
> 2020-03-14 02:31:46,784 [main] INFO  (HddsVolume.java:177) - Creating 
> Volume: /tmp/hadoop-hdfs/dfs/data/hdds of storage type : DISK and capacity : 
> 20063645696
> 2020-03-14 02:31:46,786 [main] ERROR (MutableVolumeSet.java:202) - Failed 
> to parse the storage location: file:///tmp/hadoop-hdfs/dfs/data
> java.io.IOException: Volume is in an INCONSISTENT state. Skipped loading 
> volume: /tmp/hadoop-hdfs/dfs/data/hdds
> at 
> org.apache.hadoop.ozone.container.common.volume.HddsVolume.initialize(HddsVolume.java:226)
> at 
> org.apache.hadoop.ozone.container.common.volume.HddsVolume.(HddsVolume.java:180)
> at 
> org.apache.hadoop.ozone.container.common.volume.HddsVolume.(HddsVolume.java:71)
> at 
> org.apache.hadoop.ozone.container.common.volume.HddsVolume$Builder.build(HddsVolume.java:158)
> at 
> org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.createVolume(MutableVolumeSet.java:336)
> at 
> org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.initializeVolumeSet(MutableVolumeSet.java:183)
> at 
> org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.(MutableVolumeSet.java:139)
> at 
> org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.(MutableVolumeSet.java:111)
> at 
> org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.(OzoneContainer.java:97)
> at 
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.(DatanodeStateMachine.java:128)
> at 
> org.apache.hadoop.ozone.HddsDatanodeService.start(HddsDatanodeService.java:235)
> at 
> org.apache.hadoop.ozone.HddsDatanodeService.start(HddsDatanodeService.java:179)
> at 
> org.apache.hadoop.ozone.HddsDatanodeService.call(HddsDatanodeService.java:154)
> at 
> org.apache.hadoop.ozone.HddsDatanodeService.call(HddsDatanodeService.java:78)
> at picocli.CommandLine.execute(CommandLine.java:1173)
> at picocli.CommandLine.access$800(CommandLine.java:141)
> at picocli.CommandLine$RunLast.handle(CommandLine.java:1367)
> at picocli.CommandLine$RunLast.handle(CommandLine.java:1335)
> at 
> picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:1243)
> at picocli.CommandLine.parseWithHandlers(CommandLine.java:1526)
> at picocli.CommandLine.parseWithHandler(CommandLine.java:1465)
> at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:65)
> at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:56)
> at 
> org.apache.hadoop.ozone.HddsDatanodeService.main(HddsDatanodeService.java:137)
> 2020-03-14 02:31:46,795 [shutdown-hook-0] INFO  (LogAdapter.java:51) - 
> SHUTDOWN_MSG:
> {noformat}
> After looking into the code, I found the root cause: the VERSION file was 
> missing on that node.
> We should also log a clear message so that users can quickly identify the 
> root cause.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3180) Datanode fails to start due to inconsistent volume state without helpful error message

2020-03-14 Thread Yiqun Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDDS-3180:

Summary: Datanode fails to start due to inconsistent volume state without 
helpful error message  (was: Datanode shutdown due to inconsistent volume state 
without helpful error message)

> Datanode fails to start due to inconsistent volume state without helpful 
> error message
> --
>
> Key: HDDS-3180
> URL: https://issues.apache.org/jira/browse/HDDS-3180
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Affects Versions: 0.4.1
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Major
>
> I encountered an error in my test Ozone cluster when restarting a datanode. 
> The log reports an inconsistent volume state, but without any other helpful 
> detail:
> {noformat}
> 2020-03-14 02:31:46,204 [main] INFO  (LogAdapter.java:51) - registered 
> UNIX signal handlers for [TERM, HUP, INT]
> 2020-03-14 02:31:46,736 [main] INFO  (HddsDatanodeService.java:204) - 
> HddsDatanodeService host:lyq-xx.xx.xx.xx ip:xx.xx.xx.xx
> 2020-03-14 02:31:46,784 [main] INFO  (HddsVolume.java:177) - Creating 
> Volume: /tmp/hadoop-hdfs/dfs/data/hdds of storage type : DISK and capacity : 
> 20063645696
> 2020-03-14 02:31:46,786 [main] ERROR (MutableVolumeSet.java:202) - Failed 
> to parse the storage location: file:///tmp/hadoop-hdfs/dfs/data
> java.io.IOException: Volume is in an INCONSISTENT state. Skipped loading 
> volume: /tmp/hadoop-hdfs/dfs/data/hdds
> at 
> org.apache.hadoop.ozone.container.common.volume.HddsVolume.initialize(HddsVolume.java:226)
> at 
> org.apache.hadoop.ozone.container.common.volume.HddsVolume.&lt;init&gt;(HddsVolume.java:180)
> at 
> org.apache.hadoop.ozone.container.common.volume.HddsVolume.&lt;init&gt;(HddsVolume.java:71)
> at 
> org.apache.hadoop.ozone.container.common.volume.HddsVolume$Builder.build(HddsVolume.java:158)
> at 
> org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.createVolume(MutableVolumeSet.java:336)
> at 
> org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.initializeVolumeSet(MutableVolumeSet.java:183)
> at 
> org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.&lt;init&gt;(MutableVolumeSet.java:139)
> at 
> org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.&lt;init&gt;(MutableVolumeSet.java:111)
> at 
> org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.&lt;init&gt;(OzoneContainer.java:97)
> at 
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.&lt;init&gt;(DatanodeStateMachine.java:128)
> at 
> org.apache.hadoop.ozone.HddsDatanodeService.start(HddsDatanodeService.java:235)
> at 
> org.apache.hadoop.ozone.HddsDatanodeService.start(HddsDatanodeService.java:179)
> at 
> org.apache.hadoop.ozone.HddsDatanodeService.call(HddsDatanodeService.java:154)
> at 
> org.apache.hadoop.ozone.HddsDatanodeService.call(HddsDatanodeService.java:78)
> at picocli.CommandLine.execute(CommandLine.java:1173)
> at picocli.CommandLine.access$800(CommandLine.java:141)
> at picocli.CommandLine$RunLast.handle(CommandLine.java:1367)
> at picocli.CommandLine$RunLast.handle(CommandLine.java:1335)
> at 
> picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:1243)
> at picocli.CommandLine.parseWithHandlers(CommandLine.java:1526)
> at picocli.CommandLine.parseWithHandler(CommandLine.java:1465)
> at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:65)
> at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:56)
> at 
> org.apache.hadoop.ozone.HddsDatanodeService.main(HddsDatanodeService.java:137)
> 2020-03-14 02:31:46,795 [shutdown-hook-0] INFO  (LogAdapter.java:51) - 
> SHUTDOWN_MSG:
> {noformat}
> After looking into the code, I found the root cause: the VERSION file was 
> missing on that node.
> We should also log a clear message so that users can quickly identify the 
> root cause.






[jira] [Created] (HDDS-3180) Datanode shutdown due to inconsistent volume state without helpful error message

2020-03-14 Thread Yiqun Lin (Jira)
Yiqun Lin created HDDS-3180:
---

 Summary: Datanode shutdown due to inconsistent volume state 
without helpful error message
 Key: HDDS-3180
 URL: https://issues.apache.org/jira/browse/HDDS-3180
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Affects Versions: 0.4.1
Reporter: Yiqun Lin
Assignee: Yiqun Lin


I encountered an error in my test Ozone cluster when restarting a datanode. 
The log reports an inconsistent volume state, but without any other helpful 
detail:
{noformat}
2020-03-14 02:31:46,204 [main] INFO  (LogAdapter.java:51) - registered UNIX 
signal handlers for [TERM, HUP, INT]
2020-03-14 02:31:46,736 [main] INFO  (HddsDatanodeService.java:204) - 
HddsDatanodeService host:lyq-xx.xx.xx.xx ip:xx.xx.xx.xx
2020-03-14 02:31:46,784 [main] INFO  (HddsVolume.java:177) - Creating 
Volume: /tmp/hadoop-hdfs/dfs/data/hdds of storage type : DISK and capacity : 
20063645696
2020-03-14 02:31:46,786 [main] ERROR (MutableVolumeSet.java:202) - Failed 
to parse the storage location: file:///tmp/hadoop-hdfs/dfs/data
java.io.IOException: Volume is in an INCONSISTENT state. Skipped loading 
volume: /tmp/hadoop-hdfs/dfs/data/hdds
at 
org.apache.hadoop.ozone.container.common.volume.HddsVolume.initialize(HddsVolume.java:226)
at 
org.apache.hadoop.ozone.container.common.volume.HddsVolume.&lt;init&gt;(HddsVolume.java:180)
at 
org.apache.hadoop.ozone.container.common.volume.HddsVolume.&lt;init&gt;(HddsVolume.java:71)
at 
org.apache.hadoop.ozone.container.common.volume.HddsVolume$Builder.build(HddsVolume.java:158)
at 
org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.createVolume(MutableVolumeSet.java:336)
at 
org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.initializeVolumeSet(MutableVolumeSet.java:183)
at 
org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.&lt;init&gt;(MutableVolumeSet.java:139)
at 
org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.&lt;init&gt;(MutableVolumeSet.java:111)
at 
org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.&lt;init&gt;(OzoneContainer.java:97)
at 
org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.&lt;init&gt;(DatanodeStateMachine.java:128)
at 
org.apache.hadoop.ozone.HddsDatanodeService.start(HddsDatanodeService.java:235)
at 
org.apache.hadoop.ozone.HddsDatanodeService.start(HddsDatanodeService.java:179)
at 
org.apache.hadoop.ozone.HddsDatanodeService.call(HddsDatanodeService.java:154)
at 
org.apache.hadoop.ozone.HddsDatanodeService.call(HddsDatanodeService.java:78)
at picocli.CommandLine.execute(CommandLine.java:1173)
at picocli.CommandLine.access$800(CommandLine.java:141)
at picocli.CommandLine$RunLast.handle(CommandLine.java:1367)
at picocli.CommandLine$RunLast.handle(CommandLine.java:1335)
at 
picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:1243)
at picocli.CommandLine.parseWithHandlers(CommandLine.java:1526)
at picocli.CommandLine.parseWithHandler(CommandLine.java:1465)
at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:65)
at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:56)
at 
org.apache.hadoop.ozone.HddsDatanodeService.main(HddsDatanodeService.java:137)
2020-03-14 02:31:46,795 [shutdown-hook-0] INFO  (LogAdapter.java:51) - 
SHUTDOWN_MSG:
{noformat}

After looking into the code, I found the root cause: the VERSION file was 
missing on that node.
We should also log a clear message so that users can quickly identify the root 
cause.
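
To illustrate what a more helpful check might look like, here is a minimal, 
hypothetical Java sketch. It is not the actual HddsVolume code; the class 
name, method name, and message wording are all assumptions. The point is 
simply that the error should name the missing VERSION file directly:

```java
import java.io.File;

// Hypothetical sketch, not the real HddsVolume implementation: when the
// volume is inconsistent because its VERSION file is missing, say so
// explicitly instead of only reporting an "INCONSISTENT state".
public class VolumeCheckSketch {

  static String describeVolumeState(File hddsRoot) {
    File versionFile = new File(hddsRoot, "VERSION");
    if (!hddsRoot.isDirectory()) {
      return "Volume root does not exist: " + hddsRoot.getPath();
    }
    if (!versionFile.isFile()) {
      // The key message: point directly at the missing file.
      return "Volume is in an INCONSISTENT state: missing version file "
          + versionFile.getPath() + ". Skipped loading volume: "
          + hddsRoot.getPath();
    }
    return "Volume OK: " + hddsRoot.getPath();
  }

  public static void main(String[] args) {
    System.out.println(
        describeVolumeState(new File("/tmp/hadoop-hdfs/dfs/data/hdds")));
  }
}
```

With a message like this, an operator restarting a datanode would see at a 
glance that the VERSION file is gone, rather than having to read the source.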






[jira] [Assigned] (HDDS-3176) Remove unused dependency version strings

2020-03-14 Thread Attila Doroszlai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Doroszlai reassigned HDDS-3176:
--

Assignee: Attila Doroszlai

> Remove unused dependency version strings
> 
>
> Key: HDDS-3176
> URL: https://issues.apache.org/jira/browse/HDDS-3176
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>Affects Versions: 0.5.0
>Reporter: Wei-Chiu Chuang
>Assignee: Attila Doroszlai
>Priority: Minor
>  Labels: newbie
>
> After the repo was split from hadoop, there are a few unused 
> dependencies/version strings left in pom.xml. They can be removed.
> Example: 
> {code}
> 1.2.6
> 2.0.0-beta-1
> {code}
> There may be more.






[GitHub] [hadoop-ozone] elek commented on issue #673: HDDS-3064. Get Key is hung when READ delay is injected in chunk file path.

2020-03-14 Thread GitBox
elek commented on issue #673: HDDS-3064. Get Key is hung when READ delay is 
injected in chunk file path.
URL: https://github.com/apache/hadoop-ozone/pull/673#issuecomment-599027682
 
 
   > It was verified in the actual test up where the problem originated.
   
   Can you please share details of how this patch was tested and how the 
problem can be reproduced?





[jira] [Commented] (HDDS-3148) Logs cluttered by AlreadyExistsException from Ratis

2020-03-14 Thread Attila Doroszlai (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17059207#comment-17059207
 ] 

Attila Doroszlai commented on HDDS-3148:


Opened RATIS-828 for Ratis follow-up.

> Logs cluttered by AlreadyExistsException from Ratis
> ---
>
> Key: HDDS-3148
> URL: https://issues.apache.org/jira/browse/HDDS-3148
> Project: Hadoop Distributed Data Store
>  Issue Type: Wish
>  Components: Ozone Datanode
>Reporter: Attila Doroszlai
>Assignee: Siddharth Wagle
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Ozone startup logs are cluttered by printing stack trace of 
> AlreadyExistsException related to group addition.  Example:
> {code}
> 2020-03-09 13:53:01,563 [grpc-default-executor-0] WARN  impl.RaftServerProxy 
> (RaftServerProxy.java:lambda$groupAddAsync$11(390)) - 
> 7a07f161-9144-44b2-8baa-73f0e9299675: Failed groupAdd* 
> GroupManagementRequest:client-27FB1A91809E->7a07f161-9144-44b2-8baa-73f0e9299675@group-E151028E3AC0,
>  cid=2, seq=0, RW, null, 
> Add:group-E151028E3AC0:[18f4e257-bf09-482e-b1bb-a2408a093ff7:172.17.0.2:43845,
>  7a07f161-9144-44b2-8baa-73f0e9299675:172.17.0.2:41551, 
> 8a66c80e-ab55-4975-92a9-8aaf06ab418a:172.17.0.2:36921]
> java.util.concurrent.CompletionException: 
> org.apache.ratis.protocol.AlreadyExistsException: 
> 7a07f161-9144-44b2-8baa-73f0e9299675: Failed to add 
> group-E151028E3AC0:[18f4e257-bf09-482e-b1bb-a2408a093ff7:172.17.0.2:43845, 
> 7a07f161-9144-44b2-8baa-73f0e9299675:172.17.0.2:41551, 
> 8a66c80e-ab55-4975-92a9-8aaf06ab418a:172.17.0.2:36921] since the group 
> already exists in the map.
>   at 
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
>   at 
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
>   at 
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607)
>   at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
>   at 
> java.util.concurrent.CompletableFuture.uniApplyStage(CompletableFuture.java:631)
>   at 
> java.util.concurrent.CompletableFuture.thenApplyAsync(CompletableFuture.java:2006)
>   at 
> org.apache.ratis.server.impl.RaftServerProxy.groupAddAsync(RaftServerProxy.java:379)
>   at 
> org.apache.ratis.server.impl.RaftServerProxy.groupManagementAsync(RaftServerProxy.java:363)
>   at 
> org.apache.ratis.grpc.server.GrpcAdminProtocolService.lambda$groupManagement$0(GrpcAdminProtocolService.java:42)
>   at org.apache.ratis.grpc.GrpcUtil.asyncCall(GrpcUtil.java:160)
>   at 
> org.apache.ratis.grpc.server.GrpcAdminProtocolService.groupManagement(GrpcAdminProtocolService.java:42)
>   at 
> org.apache.ratis.proto.grpc.AdminProtocolServiceGrpc$MethodHandlers.invoke(AdminProtocolServiceGrpc.java:358)
>   at 
> org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:172)
>   at 
> org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:331)
>   at 
> org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:814)
>   at 
> org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
>   at 
> org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.ratis.protocol.AlreadyExistsException: 
> 7a07f161-9144-44b2-8baa-73f0e9299675: Failed to add 
> group-E151028E3AC0:[18f4e257-bf09-482e-b1bb-a2408a093ff7:172.17.0.2:43845, 
> 7a07f161-9144-44b2-8baa-73f0e9299675:172.17.0.2:41551, 
> 8a66c80e-ab55-4975-92a9-8aaf06ab418a:172.17.0.2:36921] since the group 
> already exists in the map.
>   at 
> org.apache.ratis.server.impl.RaftServerProxy$ImplMap.addNew(RaftServerProxy.java:83)
>   at 
> org.apache.ratis.server.impl.RaftServerProxy.groupAddAsync(RaftServerProxy.java:378)
>   ... 13 more
> {code}
> Since these are "normal" occurrences, I think the stack trace should be 
> suppressed.
> CC [~nanda]
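
One hedged way to achieve this, sketched below, is to unwrap the 
CompletionException and log only the message for the expected 
AlreadyExistsException, keeping full stack traces for anything else. This is 
illustration only: the class and method names here are invented, and this is 
not the actual Ratis or Ozone code.

```java
import java.util.concurrent.CompletionException;

// Illustrative sketch only (names invented; not the actual Ratis/Ozone fix):
// treat AlreadyExistsException from groupAdd as expected and log it without
// a stack trace, reserving the full trace for genuinely unexpected failures.
public class GroupAddLogging {

  static class AlreadyExistsException extends RuntimeException {
    AlreadyExistsException(String message) { super(message); }
  }

  static String logLine(Throwable t) {
    // groupAddAsync wraps failures in a CompletionException; unwrap it first.
    Throwable cause =
        (t instanceof CompletionException && t.getCause() != null)
            ? t.getCause() : t;
    if (cause instanceof AlreadyExistsException) {
      // Expected during startup: log the message only, no stack trace.
      return "WARN Failed groupAdd: " + cause.getMessage();
    }
    // Unexpected failure: here the real code would log with the stack trace.
    return "ERROR Failed groupAdd (unexpected): " + cause;
  }

  public static void main(String[] args) {
    System.out.println(logLine(new CompletionException(
        new AlreadyExistsException("group already exists in the map"))));
  }
}
```

The design choice is simply to classify the exception before logging: the 
"already exists" case carries no diagnostic value in its trace, so a one-line 
WARN is enough.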


