[GitHub] [hadoop-ozone] vivekratnavel commented on issue #675: HDDS-3170. Fix issues in File count by size task.
vivekratnavel commented on issue #675: HDDS-3170. Fix issues in File count by size task. URL: https://github.com/apache/hadoop-ozone/pull/675#issuecomment-599175045

+1 LGTM

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services

To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-3181) Intermittent failure in TestReconWithOzoneManager due to BindException
Attila Doroszlai created HDDS-3181:
--------------------------------------
Summary: Intermittent failure in TestReconWithOzoneManager due to BindException
Key: HDDS-3181
URL: https://issues.apache.org/jira/browse/HDDS-3181
Project: Hadoop Distributed Data Store
Issue Type: Bug
Components: test
Reporter: Attila Doroszlai
Assignee: Attila Doroszlai

TestReconWithOzoneManager may fail with BindException:

{code:title=https://github.com/apache/hadoop-ozone/pull/677/checks?check_run_id=507376007}
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 19.707 s <<< FAILURE! - in org.apache.hadoop.ozone.recon.TestReconWithOzoneManager
org.apache.hadoop.ozone.recon.TestReconWithOzoneManager  Time elapsed: 19.706 s  <<< ERROR!
picocli.CommandLine$ExecutionException: Error while calling command (org.apache.hadoop.ozone.recon.ReconServer@23f74a49): java.net.BindException: Port in use: 0.0.0.0:36263
...
	at org.apache.hadoop.ozone.MiniOzoneClusterImpl$Builder.build(MiniOzoneClusterImpl.java:534)
	at org.apache.hadoop.ozone.recon.TestReconWithOzoneManager.init(TestReconWithOzoneManager.java:109)
...
Caused by: java.net.BindException: Port in use: 0.0.0.0:36263
	at org.apache.hadoop.hdds.server.http.HttpServer2.constructBindException(HttpServer2.java:1200)
	at org.apache.hadoop.hdds.server.http.HttpServer2.bindForSinglePort(HttpServer2.java:1222)
	at org.apache.hadoop.hdds.server.http.HttpServer2.openListeners(HttpServer2.java:1281)
	at org.apache.hadoop.hdds.server.http.HttpServer2.start(HttpServer2.java:1136)
	at org.apache.hadoop.hdds.server.http.BaseHttpServer.start(BaseHttpServer.java:252)
	at org.apache.hadoop.ozone.recon.ReconServer.start(ReconServer.java:128)
	at org.apache.hadoop.ozone.recon.ReconServer.call(ReconServer.java:106)
	at org.apache.hadoop.ozone.recon.ReconServer.call(ReconServer.java:50)
	at picocli.CommandLine.execute(CommandLine.java:1173)
	... 27 more
{code}

{code:title=test output}
2020-03-14 06:17:08,677 [main] INFO http.BaseHttpServer (BaseHttpServer.java:updateConnectorAddress(284)) - HTTP server of ozoneManager listening at http://0.0.0.0:36263
...
2020-03-14 06:17:11,589 [main] INFO http.BaseHttpServer (BaseHttpServer.java:newHttpServer2BuilderForOzone(170)) - Starting Web-server for recon at: http://0.0.0.0:36263
...
2020-03-14 06:17:12,756 [main] INFO recon.ReconServer (ReconServer.java:start(125)) - Starting Recon server
2020-03-14 06:17:12,757 [main] INFO http.HttpServer2 (HttpServer2.java:start(1139)) - HttpServer.start() threw a non Bind IOException
java.net.BindException: Port in use: 0.0.0.0:36263
...
{code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
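A common way to reduce this kind of hard-coded or reused-port collision in tests is to ask the OS for an ephemeral port by binding to port 0. A minimal Java sketch (this is not Ozone's actual test utility; note that a check-then-use race remains if the port is released and grabbed by another process before the real server binds, which resembles the collision in the log above, where the OM HTTP server and Recon ended up on the same port):

```java
import java.io.IOException;
import java.net.ServerSocket;

public class FreePort {
    // Binding to port 0 lets the kernel pick a currently unused port,
    // avoiding collisions from hard-coded port numbers in tests.
    public static int getFreePort() throws IOException {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("free port: " + getFreePort());
    }
}
```

The safest variant is to let the server under test itself bind to port 0 and then query it for the actual port, which closes the race entirely.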
[jira] [Resolved] (HDDS-2848) Recon changes to make snapshots work with OM HA
[ https://issues.apache.org/jira/browse/HDDS-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aravindan Vijayan resolved HDDS-2848.
-------------------------------------
Resolution: Fixed

PR merged. Thanks for the patch [~swagle].

> Recon changes to make snapshots work with OM HA
> -----------------------------------------------
>
> Key: HDDS-2848
> URL: https://issues.apache.org/jira/browse/HDDS-2848
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Components: Ozone Recon
> Reporter: Aravindan Vijayan
> Assignee: Siddharth Wagle
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Recon talks to OM in two ways: through HTTP to get a DB snapshot, and through RPC to get delta updates.
> Since Recon already uses the OzoneManagerClientProtocol to query the OzoneManager RPC, the RPC client automatically routes requests to the leader on an OM HA cluster. Recon only needs the updates from the OM RocksDB store, and does not need the in-flight updates in the OM DoubleBuffer. Because Ratis guarantees that the leader's RocksDB is always up to date, Recon does not need to worry about going back in time when the current OM leader goes down. We have to pass the OM service ID to the Ozone Manager client in Recon, and the failover works internally. Currently we pass in 'null'.
> To make the HTTP call work against OM HA, Recon has to find out the current OM leader and download the snapshot from that OM instance. We can reuse the approach implemented in org.apache.hadoop.ozone.admin.om.GetServiceRolesSubcommand: get the roles of the OM instances and then determine the leader from them.
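The leader-discovery step described in the last paragraph reduces to scanning the reported roles for the LEADER entry. A minimal sketch with hypothetical types: in the real code the roles would come from the OM service-roles RPC used by GetServiceRolesSubcommand, and the names below are illustrative only, not the actual Ozone API:

```java
import java.util.Map;
import java.util.Optional;

public class LeaderPick {
    // Hypothetical role listing keyed by OM host:port. A real implementation
    // would populate this from the OM service-roles RPC mentioned above.
    static Optional<String> findLeader(Map<String, String> rolesByHost) {
        return rolesByHost.entrySet().stream()
            .filter(e -> "LEADER".equals(e.getValue()))
            .map(Map.Entry::getKey)
            .findFirst();
    }

    public static void main(String[] args) {
        Map<String, String> roles = Map.of(
            "om1:9862", "FOLLOWER",
            "om2:9862", "LEADER",
            "om3:9862", "FOLLOWER");
        System.out.println(findLeader(roles).orElse("none"));
    }
}
```

The snapshot HTTP request would then be issued against whichever host this lookup returns, falling back to a retry if no leader is currently reported.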
[GitHub] [hadoop-ozone] avijayanhwx merged pull request #666: HDDS-2848. Recon changes to make snapshots work with OM HA.
avijayanhwx merged pull request #666: HDDS-2848. Recon changes to make snapshots work with OM HA. URL: https://github.com/apache/hadoop-ozone/pull/666
[GitHub] [hadoop-ozone] avijayanhwx commented on issue #666: HDDS-2848. Recon changes to make snapshots work with OM HA.
avijayanhwx commented on issue #666: HDDS-2848. Recon changes to make snapshots work with OM HA. URL: https://github.com/apache/hadoop-ozone/pull/666#issuecomment-599150054

Thank you for working on this @swagle.
[GitHub] [hadoop-ozone] avijayanhwx commented on issue #666: HDDS-2848. Recon changes to make snapshots work with OM HA.
avijayanhwx commented on issue #666: HDDS-2848. Recon changes to make snapshots work with OM HA. URL: https://github.com/apache/hadoop-ozone/pull/666#issuecomment-599150010

The `org.apache.hadoop.ozone.freon.TestDataValidateWithUnsafeByteOperations` failure seems unrelated.
[GitHub] [hadoop-ozone] avijayanhwx commented on a change in pull request #681: HDDS-3153. Create REST API to serve Recon Dashboard and integrate with UI in Recon.
avijayanhwx commented on a change in pull request #681: HDDS-3153. Create REST API to serve Recon Dashboard and integrate with UI in Recon. URL: https://github.com/apache/hadoop-ozone/pull/681#discussion_r392628718

## File path: hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/api/ClusterStateEndpoint.java

## @@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.api;
+
+import org.apache.hadoop.hdds.protocol.DatanodeDetails;
+import org.apache.hadoop.hdds.protocol.proto.HddsProtos.NodeState;
+import org.apache.hadoop.hdds.scm.container.placement.metrics.SCMNodeStat;
+import org.apache.hadoop.hdds.scm.server.OzoneStorageContainerManager;
+import org.apache.hadoop.ozone.recon.api.types.ClusterStateResponse;
+import org.apache.hadoop.ozone.recon.api.types.DatanodeStorageReport;
+import org.apache.hadoop.ozone.recon.api.types.DatanodesCount;
+import org.apache.hadoop.ozone.recon.recovery.ReconOMMetadataManager;
+import org.apache.hadoop.ozone.recon.scm.ReconContainerManager;
+import org.apache.hadoop.ozone.recon.scm.ReconNodeManager;
+import org.apache.hadoop.ozone.recon.scm.ReconPipelineManager;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.inject.Inject;
+import javax.ws.rs.GET;
+import javax.ws.rs.Path;
+import javax.ws.rs.Produces;
+import javax.ws.rs.core.MediaType;
+import javax.ws.rs.core.Response;
+import java.util.List;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.concurrent.atomic.AtomicLong;
+
+/**
+ * Endpoint to fetch current state of ozone cluster.
+ */
+@Path("/clusterState")
+@Produces(MediaType.APPLICATION_JSON)
+public class ClusterStateEndpoint {
+
+  private static final Logger LOG =
+      LoggerFactory.getLogger(ClusterStateEndpoint.class);
+
+  private ReconNodeManager nodeManager;
+  private ReconPipelineManager pipelineManager;
+  private ReconContainerManager containerManager;
+  private ReconOMMetadataManager omMetadataManager;
+
+  @Inject
+  ClusterStateEndpoint(OzoneStorageContainerManager reconSCM,
+                       ReconOMMetadataManager omMetadataManager) {
+    this.nodeManager =
+        (ReconNodeManager) reconSCM.getScmNodeManager();
+    this.pipelineManager = (ReconPipelineManager) reconSCM.getPipelineManager();
+    this.containerManager =
+        (ReconContainerManager) reconSCM.getContainerManager();
+    this.omMetadataManager = omMetadataManager;
+  }
+
+  /**
+   * Return a summary report on current cluster state.
+   * @return {@link Response}
+   */
+  @GET
+  public Response getClusterState() {
+    List<DatanodeDetails> datanodeDetails = nodeManager.getAllNodes();
+    AtomicInteger healthyDatanodes = new AtomicInteger();
+    int containers = this.containerManager.getContainerIDs().size();
+    int pipelines = this.pipelineManager.getPipelines().size();
+    long volumes;
+    long buckets;
+    long keys;
+    AtomicLong capacity = new AtomicLong(0L);
+    AtomicLong used = new AtomicLong(0L);

Review comment: Why AtomicLong? Simple long may be enough.
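The review point about AtomicLong can be illustrated with a small sketch: in the quoted code, AtomicLong is needed only because Java lambdas cannot mutate captured local variables, not for thread safety; a plain loop with long accumulators avoids it. NodeStat below is a hypothetical stand-in for SCMNodeStat, used only to keep the example self-contained:

```java
import java.util.List;

public class Aggregate {
    // Hypothetical stand-in for SCMNodeStat's per-node numbers.
    record NodeStat(long capacity, long used, long remaining) {}

    // A plain for-each loop lets us use simple long accumulators; no
    // AtomicLong is required when the iteration is single-threaded.
    static long[] totals(List<NodeStat> stats) {
        long capacity = 0, used = 0, remaining = 0;
        for (NodeStat s : stats) {
            capacity += s.capacity();
            used += s.used();
            remaining += s.remaining();
        }
        return new long[] {capacity, used, remaining};
    }

    public static void main(String[] args) {
        long[] t = totals(List.of(new NodeStat(100, 40, 60),
                                  new NodeStat(200, 50, 150)));
        System.out.println(t[0] + " " + t[1] + " " + t[2]);
    }
}
```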
[GitHub] [hadoop-ozone] avijayanhwx commented on a change in pull request #681: HDDS-3153. Create REST API to serve Recon Dashboard and integrate with UI in Recon.
avijayanhwx commented on a change in pull request #681: HDDS-3153. Create REST API to serve Recon Dashboard and integrate with UI in Recon. URL: https://github.com/apache/hadoop-ozone/pull/681#discussion_r392627568

## File path: hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/api/types/PipelineMetadata.java

## @@ -167,9 +149,19 @@ public PipelineMetadata build() {
     Preconditions.checkNotNull(datanodes);
     Preconditions.checkNotNull(replicationType);
-    return new PipelineMetadata(pipelineId, status, leaderNode, datanodes,

Review comment: Why was this changed from using the constructor? Setting the class fields directly is non-standard.
[GitHub] [hadoop-ozone] avijayanhwx commented on a change in pull request #681: HDDS-3153. Create REST API to serve Recon Dashboard and integrate with UI in Recon.
avijayanhwx commented on a change in pull request #681: HDDS-3153. Create REST API to serve Recon Dashboard and integrate with UI in Recon. URL: https://github.com/apache/hadoop-ozone/pull/681#discussion_r392628837

## File path: hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/api/ClusterStateEndpoint.java

+    datanodeDetails.forEach(datanode -> {
+      NodeState nodeState = nodeManager.getNodeState(datanode);
+      SCMNodeStat nodeStat = nodeManager.getNodeStat(datanode).get();

Review comment: Can we use SCMNodeManager#getStats here? It is supposed to give the aggregate stats from all nodes.
[GitHub] [hadoop-ozone] avijayanhwx commented on a change in pull request #681: HDDS-3153. Create REST API to serve Recon Dashboard and integrate with UI in Recon.
avijayanhwx commented on a change in pull request #681: HDDS-3153. Create REST API to serve Recon Dashboard and integrate with UI in Recon. URL: https://github.com/apache/hadoop-ozone/pull/681#discussion_r392628566

## File path: hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/api/ClusterStateEndpoint.java

+    AtomicLong remaining = new AtomicLong(0L);
+    datanodeDetails.forEach(datanode -> {

Review comment: We can use SCMNodeManager#getNodeCount(NodeState) to get the number of nodes in a specific state.
[GitHub] [hadoop-ozone] avijayanhwx commented on a change in pull request #681: HDDS-3153. Create REST API to serve Recon Dashboard and integrate with UI in Recon.
avijayanhwx commented on a change in pull request #681: HDDS-3153. Create REST API to serve Recon Dashboard and integrate with UI in Recon. URL: https://github.com/apache/hadoop-ozone/pull/681#discussion_r392627516

## File path: hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/api/types/DatanodesCount.java

+@XmlAccessorType(XmlAccessType.FIELD)
+public class DatanodesCount {

Review comment: IMO this class is redundant. We can just capture the num datanodes (healthy and total) as 2 longs in the response.
[GitHub] [hadoop-ozone] avijayanhwx commented on a change in pull request #681: HDDS-3153. Create REST API to serve Recon Dashboard and integrate with UI in Recon.
avijayanhwx commented on a change in pull request #681: HDDS-3153. Create REST API to serve Recon Dashboard and integrate with UI in Recon. URL: https://github.com/apache/hadoop-ozone/pull/681#discussion_r392628661

## File path: hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/api/ClusterStateEndpoint.java

+    ClusterStateResponse.Builder builder = ClusterStateResponse.newBuilder();
+    try {
+      volumes = omMetadataManager.getVolumeTable().getEstimatedKeyCount();
+      builder.setVolumes(volumes);
+    } catch (Exception ex) {
+      LOG.error("Unable to get Volumes count in ClusterStateResponse.", ex);
+    }
+    try {
+      buckets = omMetadataManager.getBucketTable().getEstimatedKeyCount();

Review comment: Nit. We can directly use
[GitHub] [hadoop-ozone] vivekratnavel commented on issue #681: HDDS-3153. Create REST API to serve Recon Dashboard and integrate with UI in Recon.
vivekratnavel commented on issue #681: HDDS-3153. Create REST API to serve Recon Dashboard and integrate with UI in Recon. URL: https://github.com/apache/hadoop-ozone/pull/681#issuecomment-599142315

@avijayanhwx @elek Please review
[jira] [Updated] (HDDS-3153) Create REST API to serve Recon Dashboard and integrate with UI in Recon.
[ https://issues.apache.org/jira/browse/HDDS-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDDS-3153:
---------------------------------
Labels: pull-request-available (was: )

> Create REST API to serve Recon Dashboard and integrate with UI in Recon.
> ------------------------------------------------------------------------
>
> Key: HDDS-3153
> URL: https://issues.apache.org/jira/browse/HDDS-3153
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Components: Ozone Recon
> Affects Versions: 0.5.0
> Reporter: Vivek Ratnavel Subramanian
> Assignee: Vivek Ratnavel Subramanian
> Priority: Major
> Labels: pull-request-available
> Attachments: Screen Shot 2020-03-10 at 12.10.41 PM.png
>
> Add a REST API to serve information required for the Recon dashboard.
> !Screen Shot 2020-03-10 at 12.10.41 PM.png!
[GitHub] [hadoop-ozone] vivekratnavel opened a new pull request #681: HDDS-3153. Create REST API to serve Recon Dashboard and integrate with UI in Recon.
vivekratnavel opened a new pull request #681: HDDS-3153. Create REST API to serve Recon Dashboard and integrate with UI in Recon. URL: https://github.com/apache/hadoop-ozone/pull/681

## What changes were proposed in this pull request?

- Add a REST API endpoint to serve cluster state (/api/v1/clusterState)
- Integrate the UI with the clusterState API
- Remove the dangling javadoc comment in the LICENSE block from all Recon files
- Add unit and integration tests

## What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-3153

## How was this patch tested?

Unit tests, integration tests, and manual tests using docker-compose.
[GitHub] [hadoop-ozone] xiaoyuyao commented on issue #665: HDDS-3160. Disable index and filter block cache for RocksDB.
xiaoyuyao commented on issue #665: HDDS-3160. Disable index and filter block cache for RocksDB. URL: https://github.com/apache/hadoop-ozone/pull/665#issuecomment-599118517 Thanks @elek for the patch. Given the block cache size (256MB) and the per-SST index/filter cost of ~5.5MB, it would handle up to about 50 SST files before thrashing happens. I agree that mixing the index/filter blocks with data blocks could impact the performance of the block cache. There is one catch to moving them out of the block cache with this change: we will lose the [control over the max memory usage of those index/filters](https://github.com/facebook/rocksdb/wiki/Block-Cache). We should consider adding profiles for larger block cache sizes (e.g., change the current 256MB to 8GB or higher for an OM DB with 100M+ keys) for scenarios where taming RocksDB memory usage is required, or adding support for [partitioned index filters](https://github.com/facebook/rocksdb/wiki/Partitioned-Index-Filters). I'm +1 on this change with one minor ask: it would be great if you could attach some RocksDB block cache metrics on hit/miss ratio before and after this change, on index/filter/data respectively.
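The sizing argument in the comment above is simple division; a minimal sketch of the arithmetic (the 256 MB cache size and ~5.5 MB per-SST index/filter cost are the figures quoted in the comment; it deliberately ignores data blocks, so the real number of SST files before thrashing is lower):

```java
public class BlockCacheSizing {
    public static void main(String[] args) {
        // Figures quoted in the review comment above.
        double blockCacheMb = 256.0;        // total block cache
        double indexFilterPerSstMb = 5.5;   // index + filter cost per SST file

        // Rough upper bound on SST files whose index/filter blocks fit in the
        // cache at once; data blocks would compete for the same space.
        int maxSstFiles = (int) (blockCacheMb / indexFilterPerSstMb);
        System.out.println("~" + maxSstFiles + " SST files");
    }
}
```

With 100M+ keys the OM DB has far more SST files than this, which is why the comment suggests either a larger cache profile or partitioned index/filters.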
[jira] [Updated] (HDDS-3177) Periodic dependency update (Java)
[ https://issues.apache.org/jira/browse/HDDS-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Doroszlai updated HDDS-3177: --- Status: Patch Available (was: In Progress) > Periodic dependency update (Java) > - > > Key: HDDS-3177 > URL: https://issues.apache.org/jira/browse/HDDS-3177 > Project: Hadoop Distributed Data Store > Issue Type: Task >Reporter: Wei-Chiu Chuang >Assignee: Attila Doroszlai >Priority: Major > Labels: pull-request-available > Attachments: dependency-check-report.html > > Time Spent: 10m > Remaining Estimate: 0h > > Must: > jackson-databind 2.9.9 --> 2.10.3 > netty-all 4.0.52 --> 4.1.46 > nimbus-jose-jwt 4.41.1 --> 7.9 (or remove it?) > Nice to have: > cdi-api 1.2 --> 2.0.SP1 (major version change) > hadoop 3.2.0 --> 3.2.1 > === > protobuf 2.5.0 --> ? this is more controversial
[jira] [Updated] (HDDS-3177) Periodic dependency update (Java)
[ https://issues.apache.org/jira/browse/HDDS-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3177: - Labels: pull-request-available (was: )
[GitHub] [hadoop-ozone] adoroszlai opened a new pull request #680: HDDS-3177. Periodic dependency update (Java)
adoroszlai opened a new pull request #680: HDDS-3177. Periodic dependency update (Java) URL: https://github.com/apache/hadoop-ozone/pull/680 ## What changes were proposed in this pull request? 1. Upgrade: * jackson-databind 2.9.9 --> 2.10.3 * netty-all 4.0.52 --> 4.1.47 * nimbus-jose-jwt 4.41.1 --> 7.9 2. Use existing `bouncycastle.version` property (= 1.60) 3. Remove unused dependencies: * `org.apache.htrace:*` * `org.apache.hbase:*` 4. Remove unused/useless properties: * `hadoop.assemblies.version` * `hadoop.common.build.dir` (hadoop-common source no longer present in the same repo) * `hbase.*.version` 5. Exclude `ratis-examples` (accidental dependency of `ratis-tools`) ## What is the link to the Apache JIRA https://issues.apache.org/jira/browse/HDDS-3177 ## How was this patch tested? https://github.com/adoroszlai/hadoop-ozone/runs/507994482
[GitHub] [hadoop-ozone] bshashikant commented on issue #673: HDDS-3064. Get Key is hung when READ delay is injected in chunk file path.
bshashikant commented on issue #673: HDDS-3064. Get Key is hung when READ delay is injected in chunk file path. URL: https://github.com/apache/hadoop-ozone/pull/673#issuecomment-599111792 > > It was verified in the actual test setup where the problem originated. > > Can you please share the details of how this patch was tested and how the problem can be reproduced? It was reproduced in a fault injection testing environment. @nilotpalnandi, can you please add some details?
[GitHub] [hadoop-ozone] bshashikant commented on issue #673: HDDS-3064. Get Key is hung when READ delay is injected in chunk file path.
bshashikant commented on issue #673: HDDS-3064. Get Key is hung when READ delay is injected in chunk file path. URL: https://github.com/apache/hadoop-ozone/pull/673#issuecomment-599111536 > @adoroszlai @bshashikant I am not saying we wait for the entire block to be read. The client will be reused for all the chunk reads. Let's take a scenario. > Chunk1 fails on dn1. Succeeded on dn2. > Chunk2 -> we should try reading from either dn2 or dn3. We will need to mark dn1 timed out in the code. Currently we can still retry dn1. > Chunk3 -> dn2, dn3 fail. We should retry dn1 before failing the read. > For retrying dn1 we will need to reestablish the connection. I think we are handling this. > > Regarding the deadline, if we set a deadline at t=0, the stub will fail at t=30 even if the client is able to read chunks from the stub. Please verify if this is how the deadline works. I think we will need to differentiate between this scenario and the scenario where a chunk read is timing out. @lokeshj1703, in the read path, once a read op fails on a datanode, the same op never gets retried on the same datanode. > Regarding the deadline, if we set a deadline at t=0, the stub will fail at t=30 even if the client is able to read chunks from the stub. Please verify if this is how the deadline works. I think we will need to differentiate between this scenario and the scenario where a chunk read is timing out. The deadline is set on the rpc call. If the rpc response doesn't complete within the deadline, it is marked as deadline exceeded. The same deadline is being used in Ratis as well. The client itself doesn't maintain a timer as such.
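The distinction debated above can be illustrated with a small, self-contained sketch. The 30 s budget and 12 s chunk-read times are made-up numbers; the point is the difference between an absolute deadline armed once at t=0 (how a gRPC per-call deadline behaves, e.g. via `stub.withDeadlineAfter(...)`) and a timeout re-armed for every chunk read:

```java
import java.util.concurrent.TimeUnit;

public class DeadlineSketch {
    // True if 'now' is past 'start + budget'.
    static boolean exceeded(long startNanos, long nowNanos, long budgetNanos) {
        return nowNanos - startNanos > budgetNanos;
    }

    public static void main(String[] args) {
        long budget = TimeUnit.SECONDS.toNanos(30);
        // Three sequential chunk reads, each taking 12 s, each succeeding.
        long[] callStart = {0, TimeUnit.SECONDS.toNanos(12), TimeUnit.SECONDS.toNanos(24)};
        long[] callEnd   = {TimeUnit.SECONDS.toNanos(12), TimeUnit.SECONDS.toNanos(24),
                            TimeUnit.SECONDS.toNanos(36)};
        for (int i = 0; i < 3; i++) {
            // Absolute deadline armed once at t=0: chunk 3 trips it at t=36 s
            // even though that read itself only took 12 s.
            boolean absolute = exceeded(0, callEnd[i], budget);
            // Per-read timeout re-armed at the start of each chunk read:
            // never trips here, since no single read exceeds 30 s.
            boolean perRead = exceeded(callStart[i], callEnd[i], budget);
            System.out.println("chunk" + (i + 1) + ": absolute=" + absolute
                + " perRead=" + perRead);
        }
    }
}
```

This is why a single deadline set at t=0 cannot distinguish a genuinely stuck chunk read from a long but steadily progressing block read.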
[jira] [Updated] (HDDS-3180) Datanode fails to start due to confused inconsistent volume state
[ https://issues.apache.org/jira/browse/HDDS-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yiqun Lin updated HDDS-3180: Status: Patch Available (was: Open) > Datanode fails to start due to confused inconsistent volume state > - > > Key: HDDS-3180 > URL: https://issues.apache.org/jira/browse/HDDS-3180 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Affects Versions: 0.4.1 >Reporter: Yiqun Lin >Assignee: Yiqun Lin >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > I met an error in my testing Ozone cluster when I restarted a datanode. From the > log, it throws an inconsistent volume state error but without other detailed helpful > info: > {noformat} > 2020-03-14 02:31:46,204 [main] INFO (LogAdapter.java:51) - registered > UNIX signal handlers for [TERM, HUP, INT] > 2020-03-14 02:31:46,736 [main] INFO (HddsDatanodeService.java:204) - > HddsDatanodeService host:lyq-xx.xx.xx.xx ip:xx.xx.xx.xx > 2020-03-14 02:31:46,784 [main] INFO (HddsVolume.java:177) - Creating > Volume: /tmp/hadoop-hdfs/dfs/data/hdds of storage type : DISK and capacity : > 20063645696 > 2020-03-14 02:31:46,786 [main] ERROR (MutableVolumeSet.java:202) - Failed > to parse the storage location: file:///tmp/hadoop-hdfs/dfs/data > java.io.IOException: Volume is in an INCONSISTENT state. 
Skipped loading > volume: /tmp/hadoop-hdfs/dfs/data/hdds > at > org.apache.hadoop.ozone.container.common.volume.HddsVolume.initialize(HddsVolume.java:226) > at > org.apache.hadoop.ozone.container.common.volume.HddsVolume.(HddsVolume.java:180) > at > org.apache.hadoop.ozone.container.common.volume.HddsVolume.(HddsVolume.java:71) > at > org.apache.hadoop.ozone.container.common.volume.HddsVolume$Builder.build(HddsVolume.java:158) > at > org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.createVolume(MutableVolumeSet.java:336) > at > org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.initializeVolumeSet(MutableVolumeSet.java:183) > at > org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.(MutableVolumeSet.java:139) > at > org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.(MutableVolumeSet.java:111) > at > org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.(OzoneContainer.java:97) > at > org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.(DatanodeStateMachine.java:128) > at > org.apache.hadoop.ozone.HddsDatanodeService.start(HddsDatanodeService.java:235) > at > org.apache.hadoop.ozone.HddsDatanodeService.start(HddsDatanodeService.java:179) > at > org.apache.hadoop.ozone.HddsDatanodeService.call(HddsDatanodeService.java:154) > at > org.apache.hadoop.ozone.HddsDatanodeService.call(HddsDatanodeService.java:78) > at picocli.CommandLine.execute(CommandLine.java:1173) > at picocli.CommandLine.access$800(CommandLine.java:141) > at picocli.CommandLine$RunLast.handle(CommandLine.java:1367) > at picocli.CommandLine$RunLast.handle(CommandLine.java:1335) > at > picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:1243) > at picocli.CommandLine.parseWithHandlers(CommandLine.java:1526) > at picocli.CommandLine.parseWithHandler(CommandLine.java:1465) > at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:65) > at 
org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:56) > at > org.apache.hadoop.ozone.HddsDatanodeService.main(HddsDatanodeService.java:137) > 2020-03-14 02:31:46,795 [shutdown-hook-0] INFO (LogAdapter.java:51) - > SHUTDOWN_MSG: > {noformat} > Then I looked into the code; the root cause is that the VERSION file was > lost on that node. > We need to log a key message as well to help users quickly identify the root cause > of this.
[jira] [Comment Edited] (HDDS-3180) Datanode fails to start due to confused inconsistent volume state
[ https://issues.apache.org/jira/browse/HDDS-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17059327#comment-17059327 ] Yiqun Lin edited comment on HDDS-3180 at 3/14/20, 12:29 PM: We need to additionally log the inconsistent state because it will cause the Datanode to fail to start. A more friendly message, tested in my local environment: {noformat} 2020-03-14 04:41:27,249 [main] INFO (HddsVolume.java:177) - Creating Volume: /tmp/hadoop-hdfs/dfs/data/hdds of storage type : DISK and capacity : 9997713408 2020-03-14 04:41:27,250 [main] WARN (HddsVolume.java:252) - VERSION file does not exist in volume /tmp/hadoop-hdfs/dfs/data/hdds, current volume state: INCONSISTENT. 2020-03-14 04:41:27,257 [main] ERROR (MutableVolumeSet.java:202) - Failed to parse the storage location: file:///tmp/hadoop-hdfs/dfs/data java.io.IOException: Volume is in an INCONSISTENT state. Skipped loading volume: /tmp/hadoop-hdfs/dfs/data/hdds at org.apache.hadoop.ozone.container.common.volume.HddsVolume.initialize(HddsVolume.java:226) at org.apache.hadoop.ozone.container.common.volume.HddsVolume.(HddsVolume.java:180) at org.apache.hadoop.ozone.container.common.volume.HddsVolume.(HddsVolume.java:71) at org.apache.hadoop.ozone.container.common.volume.HddsVolume$Builder.build(HddsVolume.java:158) at org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.createVolume(MutableVolumeSet.java:336) {noformat} was (Author: linyiqun): We need to additionally log the inconsistent state because it will cause the Datanode to fail to start. 
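The friendlier check described in the comment above could be sketched as follows. This is a hypothetical illustration, not Ozone's actual `HddsVolume` code: the class and method names are invented, and the real implementation would use the volume's SLF4J logger rather than `System.err`.

```java
import java.io.File;
import java.io.IOException;

public class VolumeVersionCheck {
    // Sketch: name the missing VERSION file and the resulting state before
    // failing, instead of only throwing "Volume is in an INCONSISTENT state".
    static String checkVersionFile(File volumeRoot) throws IOException {
        File versionFile = new File(volumeRoot, "VERSION");
        if (!versionFile.exists()) {
            // The extra warning that makes the root cause obvious to the user.
            System.err.println("WARN VERSION file does not exist in volume "
                + volumeRoot + ", current volume state: INCONSISTENT.");
            throw new IOException("Volume is in an INCONSISTENT state. "
                + "Skipped loading volume: " + volumeRoot);
        }
        return "NORMAL";
    }

    public static void main(String[] args) {
        File tmp = new File(System.getProperty("java.io.tmpdir"), "hdds-volume-demo");
        tmp.mkdirs(); // empty volume dir with no VERSION file
        try {
            checkVersionFile(tmp);
        } catch (IOException e) {
            System.out.println(e.getMessage()); // warning was printed first
        }
    }
}
```

Logging the missing-file detail before the exception is thrown means a restart failure points directly at the lost VERSION file instead of a generic inconsistent-state error.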
[jira] [Commented] (HDDS-3180) Datanode fails to start due to confused inconsistent volume state
[ https://issues.apache.org/jira/browse/HDDS-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17059327#comment-17059327 ] Yiqun Lin commented on HDDS-3180: - We need to additionally log the inconsistent state because it will cause the Datanode to fail to start.
[jira] [Updated] (HDDS-3180) Datanode fails to start due to confused inconsistent volume state
[ https://issues.apache.org/jira/browse/HDDS-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3180: - Labels: pull-request-available (was: )
[GitHub] [hadoop-ozone] linyiqun opened a new pull request #679: HDDS-3180. Datanode fails to start due to confused inconsistent volum…
linyiqun opened a new pull request #679: HDDS-3180. Datanode fails to start due to confused inconsistent volum… URL: https://github.com/apache/hadoop-ozone/pull/679 ## What changes were proposed in this pull request? Add helpful error message for root cause of Datanode startup failure. ## What is the link to the Apache JIRA https://issues.apache.org/jira/browse/HDDS-3180 ## How was this patch tested? Tested manually in the local env.
[jira] [Updated] (HDDS-3180) Datanode fails to start due to confused inconsistent volume state
[ https://issues.apache.org/jira/browse/HDDS-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yiqun Lin updated HDDS-3180: Summary: Datanode fails to start due to confused inconsistent volume state (was: Datanode fails to start due to inconsistent volume state without helpful error message)
[jira] [Updated] (HDDS-3180) Datanode fails to start due to inconsistent volume state without helpful error message
[ https://issues.apache.org/jira/browse/HDDS-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yiqun Lin updated HDDS-3180: Summary: Datanode fails to start due to inconsistent volume state without helpful error message (was: Datanode shutdown due to inconsistent volume state without helpful error message)
Skipped loading > volume: /tmp/hadoop-hdfs/dfs/data/hdds > at > org.apache.hadoop.ozone.container.common.volume.HddsVolume.initialize(HddsVolume.java:226) > at > org.apache.hadoop.ozone.container.common.volume.HddsVolume.(HddsVolume.java:180) > at > org.apache.hadoop.ozone.container.common.volume.HddsVolume.(HddsVolume.java:71) > at > org.apache.hadoop.ozone.container.common.volume.HddsVolume$Builder.build(HddsVolume.java:158) > at > org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.createVolume(MutableVolumeSet.java:336) > at > org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.initializeVolumeSet(MutableVolumeSet.java:183) > at > org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.(MutableVolumeSet.java:139) > at > org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.(MutableVolumeSet.java:111) > at > org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.(OzoneContainer.java:97) > at > org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.(DatanodeStateMachine.java:128) > at > org.apache.hadoop.ozone.HddsDatanodeService.start(HddsDatanodeService.java:235) > at > org.apache.hadoop.ozone.HddsDatanodeService.start(HddsDatanodeService.java:179) > at > org.apache.hadoop.ozone.HddsDatanodeService.call(HddsDatanodeService.java:154) > at > org.apache.hadoop.ozone.HddsDatanodeService.call(HddsDatanodeService.java:78) > at picocli.CommandLine.execute(CommandLine.java:1173) > at picocli.CommandLine.access$800(CommandLine.java:141) > at picocli.CommandLine$RunLast.handle(CommandLine.java:1367) > at picocli.CommandLine$RunLast.handle(CommandLine.java:1335) > at > picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:1243) > at picocli.CommandLine.parseWithHandlers(CommandLine.java:1526) > at picocli.CommandLine.parseWithHandler(CommandLine.java:1465) > at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:65) > at 
org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:56) > at > org.apache.hadoop.ozone.HddsDatanodeService.main(HddsDatanodeService.java:137) > 2020-03-14 02:31:46,795 [shutdown-hook-0] INFO (LogAdapter.java:51) - > SHUTDOWN_MSG: > {noformat} > Then I looked into the code; the root cause is that the version file was > lost on that node. > We need to log that key detail as well, to help users quickly identify the > root cause. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-3180) Datanode shutdown due to inconsistent volume state without helpful error message
Yiqun Lin created HDDS-3180: --- Summary: Datanode shutdown due to inconsistent volume state without helpful error message Key: HDDS-3180 URL: https://issues.apache.org/jira/browse/HDDS-3180 Project: Hadoop Distributed Data Store Issue Type: Improvement Affects Versions: 0.4.1 Reporter: Yiqun Lin Assignee: Yiqun Lin I hit an error in my test Ozone cluster when I restarted a datanode. The log reports an inconsistent volume state but gives no other helpful detail: {noformat} 2020-03-14 02:31:46,204 [main] INFO (LogAdapter.java:51) - registered UNIX signal handlers for [TERM, HUP, INT] 2020-03-14 02:31:46,736 [main] INFO (HddsDatanodeService.java:204) - HddsDatanodeService host:lyq-xx.xx.xx.xx ip:xx.xx.xx.xx 2020-03-14 02:31:46,784 [main] INFO (HddsVolume.java:177) - Creating Volume: /tmp/hadoop-hdfs/dfs/data/hdds of storage type : DISK and capacity : 20063645696 2020-03-14 02:31:46,786 [main] ERROR (MutableVolumeSet.java:202) - Failed to parse the storage location: file:///tmp/hadoop-hdfs/dfs/data java.io.IOException: Volume is in an INCONSISTENT state. 
Skipped loading volume: /tmp/hadoop-hdfs/dfs/data/hdds at org.apache.hadoop.ozone.container.common.volume.HddsVolume.initialize(HddsVolume.java:226) at org.apache.hadoop.ozone.container.common.volume.HddsVolume.&lt;init&gt;(HddsVolume.java:180) at org.apache.hadoop.ozone.container.common.volume.HddsVolume.&lt;init&gt;(HddsVolume.java:71) at org.apache.hadoop.ozone.container.common.volume.HddsVolume$Builder.build(HddsVolume.java:158) at org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.createVolume(MutableVolumeSet.java:336) at org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.initializeVolumeSet(MutableVolumeSet.java:183) at org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.&lt;init&gt;(MutableVolumeSet.java:139) at org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.&lt;init&gt;(MutableVolumeSet.java:111) at org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.&lt;init&gt;(OzoneContainer.java:97) at org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.&lt;init&gt;(DatanodeStateMachine.java:128) at org.apache.hadoop.ozone.HddsDatanodeService.start(HddsDatanodeService.java:235) at org.apache.hadoop.ozone.HddsDatanodeService.start(HddsDatanodeService.java:179) at org.apache.hadoop.ozone.HddsDatanodeService.call(HddsDatanodeService.java:154) at org.apache.hadoop.ozone.HddsDatanodeService.call(HddsDatanodeService.java:78) at picocli.CommandLine.execute(CommandLine.java:1173) at picocli.CommandLine.access$800(CommandLine.java:141) at picocli.CommandLine$RunLast.handle(CommandLine.java:1367) at picocli.CommandLine$RunLast.handle(CommandLine.java:1335) at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:1243) at picocli.CommandLine.parseWithHandlers(CommandLine.java:1526) at picocli.CommandLine.parseWithHandler(CommandLine.java:1465) at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:65) at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:56) at 
org.apache.hadoop.ozone.HddsDatanodeService.main(HddsDatanodeService.java:137) 2020-03-14 02:31:46,795 [shutdown-hook-0] INFO (LogAdapter.java:51) - SHUTDOWN_MSG: {noformat} Then I looked into the code; the root cause is that the version file was lost on that node. We need to log that key detail as well, to help users quickly identify the root cause.
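A hedged sketch of the improvement HDDS-3180 asks for: when the VERSION file is missing, name it in the error instead of only reporting the bare INCONSISTENT state. The class and method names below (`VolumeCheck`, `describeInconsistency`) are illustrative stand-ins, not the actual `HddsVolume` code; only the shape of the message change is the point.

```java
import java.io.File;
import java.io.IOException;

// Illustrative stand-in for the check in HddsVolume.initialize(); the real
// fix would live there, but this shows what a descriptive message looks like.
class VolumeCheck {

  /** Builds an error message that names the missing version file. */
  static String describeInconsistency(File volumeRoot) {
    File versionFile = new File(volumeRoot, "VERSION");
    if (!versionFile.exists()) {
      return "Volume " + volumeRoot + " is in an INCONSISTENT state: missing "
          + "version file " + versionFile + ". Skipped loading volume.";
    }
    return "Volume " + volumeRoot + " is in an INCONSISTENT state.";
  }

  /** Fails with the descriptive message instead of the bare state name. */
  static void checkVolume(File volumeRoot) throws IOException {
    if (!new File(volumeRoot, "VERSION").exists()) {
      throw new IOException(describeInconsistency(volumeRoot));
    }
  }
}
```

With a message like this, the operator can see at a glance that the version file is gone, rather than having to read the source to find the cause.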
[jira] [Assigned] (HDDS-3176) Remove unused dependency version strings
[ https://issues.apache.org/jira/browse/HDDS-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Doroszlai reassigned HDDS-3176: -- Assignee: Attila Doroszlai > Remove unused dependency version strings > > > Key: HDDS-3176 > URL: https://issues.apache.org/jira/browse/HDDS-3176 > Project: Hadoop Distributed Data Store > Issue Type: Task > Affects Versions: 0.5.0 > Reporter: Wei-Chiu Chuang > Assignee: Attila Doroszlai > Priority: Minor > Labels: newbie > > After the repo was split from Hadoop, a few unused dependency version strings > were left in pom.xml. They can be removed. > Example: > {code} > 1.2.6 > 2.0.0-beta-1 > {code} > There may be more.
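One way to hunt for such leftovers mechanically, sketched below: collect every entry under `<properties>` in the pom and flag those whose `${name}` placeholder never appears elsewhere in the file. This is only a heuristic (it ignores parent poms, profiles, and plugin-injected properties), and the property names in the test data are invented for illustration, since the example above lost its XML tags.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Heuristic scan for unused <properties> entries in a Maven pom: a property
// is flagged when its ${name} placeholder occurs nowhere else in the file.
class UnusedPomProperties {

  static List<String> findUnused(String pom) {
    List<String> unused = new ArrayList<>();
    // Grab the body of the <properties> block.
    Matcher props = Pattern.compile("<properties>(.*?)</properties>",
        Pattern.DOTALL).matcher(pom);
    if (!props.find()) {
      return unused;
    }
    // Each entry looks like <some.version>1.2.3</some.version>.
    Matcher entry = Pattern.compile("<([\\w.\\-]+)>[^<]*</\\1>")
        .matcher(props.group(1));
    while (entry.find()) {
      String name = entry.group(1);
      if (!pom.contains("${" + name + "}")) {
        unused.add(name);
      }
    }
    return unused;
  }
}
```

Anything the scan reports still needs a manual check before deletion, since a property may be consumed by a child module or a build plugin rather than by the pom itself.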
[GitHub] [hadoop-ozone] elek commented on issue #673: HDDS-3064. Get Key is hung when READ delay is injected in chunk file path.
elek commented on issue #673: HDDS-3064. Get Key is hung when READ delay is injected in chunk file path. URL: https://github.com/apache/hadoop-ozone/pull/673#issuecomment-599027682 > It was verified in the actual test setup where the problem originated. Can you please share the details of how this patch was tested and how the problem can be reproduced?
[jira] [Commented] (HDDS-3148) Logs cluttered by AlreadyExistsException from Ratis
[ https://issues.apache.org/jira/browse/HDDS-3148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17059207#comment-17059207 ] Attila Doroszlai commented on HDDS-3148: Opened RATIS-828 for Ratis follow-up. > Logs cluttered by AlreadyExistsException from Ratis > --- > > Key: HDDS-3148 > URL: https://issues.apache.org/jira/browse/HDDS-3148 > Project: Hadoop Distributed Data Store > Issue Type: Wish > Components: Ozone Datanode > Reporter: Attila Doroszlai > Assignee: Siddharth Wagle > Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Ozone startup logs are cluttered with the stack traces of > AlreadyExistsException raised during group addition. Example: > {code} > 2020-03-09 13:53:01,563 [grpc-default-executor-0] WARN impl.RaftServerProxy > (RaftServerProxy.java:lambda$groupAddAsync$11(390)) - > 7a07f161-9144-44b2-8baa-73f0e9299675: Failed groupAdd* > GroupManagementRequest:client-27FB1A91809E->7a07f161-9144-44b2-8baa-73f0e9299675@group-E151028E3AC0, > cid=2, seq=0, RW, null, > Add:group-E151028E3AC0:[18f4e257-bf09-482e-b1bb-a2408a093ff7:172.17.0.2:43845, > 7a07f161-9144-44b2-8baa-73f0e9299675:172.17.0.2:41551, > 8a66c80e-ab55-4975-92a9-8aaf06ab418a:172.17.0.2:36921] > java.util.concurrent.CompletionException: > org.apache.ratis.protocol.AlreadyExistsException: > 7a07f161-9144-44b2-8baa-73f0e9299675: Failed to add > group-E151028E3AC0:[18f4e257-bf09-482e-b1bb-a2408a093ff7:172.17.0.2:43845, > 7a07f161-9144-44b2-8baa-73f0e9299675:172.17.0.2:41551, > 8a66c80e-ab55-4975-92a9-8aaf06ab418a:172.17.0.2:36921] since the group > already exists in the map. 
> at > java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) > at > java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) > at > java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607) > at > java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591) > at > java.util.concurrent.CompletableFuture.uniApplyStage(CompletableFuture.java:631) > at > java.util.concurrent.CompletableFuture.thenApplyAsync(CompletableFuture.java:2006) > at > org.apache.ratis.server.impl.RaftServerProxy.groupAddAsync(RaftServerProxy.java:379) > at > org.apache.ratis.server.impl.RaftServerProxy.groupManagementAsync(RaftServerProxy.java:363) > at > org.apache.ratis.grpc.server.GrpcAdminProtocolService.lambda$groupManagement$0(GrpcAdminProtocolService.java:42) > at org.apache.ratis.grpc.GrpcUtil.asyncCall(GrpcUtil.java:160) > at > org.apache.ratis.grpc.server.GrpcAdminProtocolService.groupManagement(GrpcAdminProtocolService.java:42) > at > org.apache.ratis.proto.grpc.AdminProtocolServiceGrpc$MethodHandlers.invoke(AdminProtocolServiceGrpc.java:358) > at > org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:172) > at > org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:331) > at > org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:814) > at > org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) > at > org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: 
org.apache.ratis.protocol.AlreadyExistsException: > 7a07f161-9144-44b2-8baa-73f0e9299675: Failed to add > group-E151028E3AC0:[18f4e257-bf09-482e-b1bb-a2408a093ff7:172.17.0.2:43845, > 7a07f161-9144-44b2-8baa-73f0e9299675:172.17.0.2:41551, > 8a66c80e-ab55-4975-92a9-8aaf06ab418a:172.17.0.2:36921] since the group > already exists in the map. > at > org.apache.ratis.server.impl.RaftServerProxy$ImplMap.addNew(RaftServerProxy.java:83) > at > org.apache.ratis.server.impl.RaftServerProxy.groupAddAsync(RaftServerProxy.java:378) > ... 13 more > {code} > Since these are "normal", I think the stack trace should be suppressed. > CC [~nanda]
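The suppression could look something like the sketch below: unwrap the CompletionException and, when the cause is an AlreadyExistsException, log only the message instead of the full trace. The `AlreadyExistsException` class here is a local stand-in for `org.apache.ratis.protocol.AlreadyExistsException` so the example stays self-contained, and the method names are invented; the actual change tracked in RATIS-828 may differ.

```java
import java.util.concurrent.CompletionException;

// Local stand-in for org.apache.ratis.protocol.AlreadyExistsException so the
// sketch compiles without Ratis on the classpath.
class AlreadyExistsException extends RuntimeException {
  AlreadyExistsException(String message) { super(message); }
}

class GroupAddLogging {

  /** True when the failure is an "expected" duplicate-group error. */
  static boolean isExpectedFailure(Throwable t) {
    // groupAddAsync reports failures wrapped in a CompletionException.
    Throwable cause = (t instanceof CompletionException && t.getCause() != null)
        ? t.getCause() : t;
    return cause instanceof AlreadyExistsException;
  }

  /** One-line text for expected failures; null means "log with full trace". */
  static String messageOnly(Throwable t) {
    if (isExpectedFailure(t)) {
      Throwable cause = t.getCause() != null ? t.getCause() : t;
      return "Failed groupAdd: " + cause.getMessage();
    }
    return null;
  }
}
```

The caller would then do `LOG.warn(line)` for expected failures and keep the existing `LOG.warn(msg, t)` path, with the stack trace, for everything else.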