[jira] [Updated] (HDDS-3027) Ozone: Ensure usage of parameterized slf4j log syntax for ozone
[ https://issues.apache.org/jira/browse/HDDS-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDDS-3027:
---------------------------------
    Labels: newbie pull-request-available  (was: newbie)

> Ozone: Ensure usage of parameterized slf4j log syntax for ozone
> ---
>
>         Key: HDDS-3027
>         URL: https://issues.apache.org/jira/browse/HDDS-3027
>     Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>    Reporter: Xiaoyu Yao
>    Priority: Trivial
>      Labels: newbie, pull-request-available
>     Fix For: 0.5.0
>
> Various places use LOG.info("text " + something); they should all move to
> LOG.info("text {}", something).

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
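A minimal sketch of why the parameterized form is preferred (this is not Ozone code; the helper below merely mimics slf4j's single-argument `{}` substitution): with `+`, the message string is built even when the log level is disabled, while `{}` lets slf4j defer formatting until the message is actually logged.

```java
// Sketch: stand-in for slf4j's "{}" placeholder substitution (one argument).
// With LOG.info("text " + x) the concatenation runs even if INFO is disabled;
// with LOG.info("text {}", x) slf4j formats only when the message is emitted.
public class ParamLogDemo {
  public static String format(String pattern, Object arg) {
    int i = pattern.indexOf("{}");
    return i < 0 ? pattern
        : pattern.substring(0, i) + arg + pattern.substring(i + 2);
  }

  public static void main(String[] args) {
    System.out.println(format("Allocated {} containers", 42));
  }
}
```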
[jira] [Updated] (HDDS-2860) Cluster disk space metrics should reflect decommission and maintenance states
[ https://issues.apache.org/jira/browse/HDDS-2860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDDS-2860:
---------------------------------
    Labels: pull-request-available  (was: )

> Cluster disk space metrics should reflect decommission and maintenance states
> ---
>
>             Key: HDDS-2860
>             URL: https://issues.apache.org/jira/browse/HDDS-2860
>         Project: Hadoop Distributed Data Store
>      Issue Type: Sub-task
>      Components: SCM
> Affects Versions: 0.5.0
>        Reporter: Stephen O'Donnell
>        Assignee: Stephen O'Donnell
>        Priority: Major
>          Labels: pull-request-available
>
> Now that we have decommission states, we need to adjust the cluster capacity,
> space used and available metrics which are exposed via JMX.
> For a decommissioning node, the space used on the node effectively needs to
> be transferred to other nodes via container replication before decommission can
> complete, but this is difficult to track from a space usage perspective. When
> a node completes decommission, we can assume it provides no capacity to the
> cluster and uses none. Therefore, for decommissioning and decommissioned nodes,
> the simplest calculation is to exclude the node completely, in a similar way
> to a dead node.
> For maintenance nodes, things are even less clear. A maintenance node is
> read only, so it cannot provide capacity to the cluster, but it is expected
> to return to service, so excluding it completely probably does not make
> sense. However, perhaps the simplest solution is to do the following:
> 1. For any node not IN_SERVICE, do not include its usage or space in the
> cluster capacity totals.
> 2. Introduce some new metrics to account for the maintenance and perhaps
> decommission capacity, so it is not lost, e.g.:
> {code}
> # Existing metrics
> "DiskCapacity" : 62725623808,
> "DiskUsed" : 4096,
> "DiskRemaining" : 50459619328,
>
> # Suggested additional new ones, with the above only considering IN_SERVICE nodes:
> "MaintenanceDiskCapacity": 0
> "MaintenanceDiskUsed": 0
> "MaintenanceDiskRemaining": 0
> "DecommissionedDiskCapacity": 0
> "DecommissionedDiskUsed": 0
> "DecommissionedDiskRemaining": 0
> ...
> {code}
> That way, the cluster totals are only what is currently "online", but we have
> the other metrics to track what has been removed etc. The key advantage of
> this is that it is easy to understand.
> There could also be an argument that the new DecommissionedDisk metrics are
> not needed, as that capacity is technically lost from the cluster forever.
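A sketch of the proposed accounting (type and field names are assumed for illustration; this is not SCM code): only IN_SERVICE nodes contribute to the existing cluster totals, while each other operational state gets its own bucket.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class NodeSpaceMetrics {
  public enum OpState { IN_SERVICE, DECOMMISSIONING, DECOMMISSIONED, IN_MAINTENANCE }

  public record Node(OpState state, long capacity, long used) {}

  // Sum capacity per operational state. A caller would expose the
  // IN_SERVICE bucket as DiskCapacity and the other buckets as the
  // suggested Maintenance*/Decommissioned* metrics.
  public static Map<OpState, Long> capacityByState(List<Node> nodes) {
    Map<OpState, Long> totals = new EnumMap<>(OpState.class);
    for (OpState s : OpState.values()) {
      totals.put(s, 0L);
    }
    for (Node n : nodes) {
      totals.merge(n.state(), n.capacity(), Long::sum);
    }
    return totals;
  }

  public static void main(String[] args) {
    List<Node> nodes = List.of(
        new Node(OpState.IN_SERVICE, 100L, 10L),
        new Node(OpState.DECOMMISSIONED, 50L, 5L));
    System.out.println(capacityByState(nodes));
  }
}
```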
[jira] [Updated] (HDDS-3028) Use own version from InterfaceAudience/Stability version
[ https://issues.apache.org/jira/browse/HDDS-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDDS-3028:
---------------------------------
    Labels: pull-request-available  (was: )

> Use own version from InterfaceAudience/Stability version
> ---
>
>         Key: HDDS-3028
>         URL: https://issues.apache.org/jira/browse/HDDS-3028
>     Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>    Reporter: Marton Elek
>    Assignee: Marton Elek
>    Priority: Major
>      Labels: pull-request-available
>
> Current Ozone code uses the Hadoop version of the @InterfaceAudience and
> @InterfaceStability annotations.
> While Hadoop uses the annotations during javadoc generation, in Ozone
> they are used only as markers, as Ozone doesn't generate javadoc during the
> releases.
> The two annotations are in the Hadoop common project. I propose to copy them
> and use the copied annotations instead of the original ones. It would help us
> to reduce the dependencies on Hadoop (hadoop-common, which contains the
> original annotations, has 87 transitive dependencies!!)
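A sketch of what a project-local copy could look like (package layout and nested names are assumptions, not the actual Hadoop annotation definitions): marker annotations are self-contained, so keeping copies in Ozone avoids depending on hadoop-common just for them.

```java
// Sketch of project-local marker annotations; since Ozone uses them only
// as markers (no javadoc processing), plain copies are sufficient.
import java.lang.annotation.Documented;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class AudienceAnnotations {
  @Documented
  @Retention(RetentionPolicy.RUNTIME)
  public @interface Private {}

  @Documented
  @Retention(RetentionPolicy.RUNTIME)
  public @interface Evolving {}

  // Example consumer of the copied annotations.
  @Private
  @Evolving
  public static class Example {}

  public static void main(String[] args) {
    // With RUNTIME retention the markers stay queryable via reflection.
    System.out.println(Example.class.isAnnotationPresent(Private.class));
  }
}
```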
[jira] [Updated] (HDDS-3031) OM HA- Client requests get LeaderNotReadyException after OM restart
[ https://issues.apache.org/jira/browse/HDDS-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDDS-3031:
---------------------------------
    Labels: pull-request-available  (was: )

> OM HA- Client requests get LeaderNotReadyException after OM restart
> ---
>
>         Key: HDDS-3031
>         URL: https://issues.apache.org/jira/browse/HDDS-3031
>     Project: Hadoop Distributed Data Store
>  Issue Type: New Feature
>    Reporter: Bharat Viswanadham
>    Assignee: Bharat Viswanadham
>    Priority: Major
>      Labels: pull-request-available
>
> Scenario:
> 1. Set up an OM HA cluster.
> 2. Perform some write operations.
> 3. Restart the OMs.
> 4. Now try any write operation.
> The error below will be thrown 15 times, and finally the client request will
> fail.
> {code:java}
> 2020-02-15 10:11:23,244 [qtp2025269734-19] INFO org.apache.hadoop.io.retry.RetryInvocationHandler: com.google.protobuf.ServiceException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ozone.om.exceptions.OMLeaderNotReadyException): om1@group-D0D586AF6951 is in LEADER state but not ready yet.
>     at org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.processReply(OzoneManagerRatisServer.java:177)
>     at org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.submitRequest(OzoneManagerRatisServer.java:136)
>     at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestToRatis(OzoneManagerProtocolServerSideTranslatorPB.java:162)
>     at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:118)
>     at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72)
>     at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:97)
>     at org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
> , while invoking $Proxy81.submitRequest over nodeId=om1,nodeAddress=om-ha-1.vpc.cloudera.com:9862 after 1 failover attempts. Trying to failover immediately.
> {code}
[jira] [Updated] (HDDS-3030) Key Rename should preserve the ObjectID
[ https://issues.apache.org/jira/browse/HDDS-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDDS-3030:
---------------------------------
    Labels: pull-request-available  (was: )

> Key Rename should preserve the ObjectID
> ---
>
>         Key: HDDS-3030
>         URL: https://issues.apache.org/jira/browse/HDDS-3030
>     Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: om
>    Reporter: Hanisha Koneru
>    Assignee: Bharat Viswanadham
>    Priority: Major
>      Labels: pull-request-available
>
> On Key Renames, the objectID should be preserved from the original Key.
> Currently it is being set to the new transactionLogIndex of the rename
> request.
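A toy sketch of the intended behavior (field names assumed; not the OmKeyInfo API): the renamed key keeps the original objectID, and only the updateID reflects the rename transaction's log index.

```java
public class RenamePreservesObjectId {
  // Minimal stand-in for the key metadata involved in a rename.
  public record KeyInfo(long objectId, long updateId, String keyName) {}

  // On rename, carry the objectID over from the source key; only the
  // updateID should take the rename request's transactionLogIndex.
  public static KeyInfo rename(KeyInfo source, String newName, long txLogIndex) {
    return new KeyInfo(source.objectId(), txLogIndex, newName);
  }

  public static void main(String[] args) {
    KeyInfo k = new KeyInfo(1001L, 5L, "vol/bucket/a");
    System.out.println(rename(k, "vol/bucket/b", 42L));
  }
}
```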
[jira] [Updated] (HDDS-3034) Broken return code check in unit/integration
[ https://issues.apache.org/jira/browse/HDDS-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDDS-3034:
---------------------------------
    Labels: pull-request-available  (was: )

> Broken return code check in unit/integration
> ---
>
>         Key: HDDS-3034
>         URL: https://issues.apache.org/jira/browse/HDDS-3034
>     Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: build, test
>    Reporter: Attila Doroszlai
>    Assignee: Attila Doroszlai
>    Priority: Major
>      Labels: pull-request-available
>
> HDDS-2915 fixed the unit/integration check result in case of Maven error.
> However, the return code check was broken by output redirection via pipeline,
> added in HDDS-2833 and HDDS-2960:
> bq. The return status of a pipeline is the exit status of the last command,
> unless the pipefail option is enabled.
[jira] [Updated] (HDDS-3037) Hide JooQ welcome message on start
[ https://issues.apache.org/jira/browse/HDDS-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDDS-3037:
---------------------------------
    Labels: pull-request-available  (was: )

> Hide JooQ welcome message on start
> ---
>
>             Key: HDDS-3037
>             URL: https://issues.apache.org/jira/browse/HDDS-3037
>         Project: Hadoop Distributed Data Store
>      Issue Type: Wish
>      Components: Ozone Recon
> Affects Versions: 0.5.0
>        Reporter: Siddharth Wagle
>        Assignee: Siddharth Wagle
>        Priority: Minor
>          Labels: pull-request-available
>         Fix For: 0.5.0
>
> Ozone Recon start prints out this self-advertisement message:
> {code}
> 2020-02-19 13:23:18,671 [main] INFO jooq.Constants (JooqLogger.java:info(338)) -
> [jOOQ ASCII-art logo banner]
> @@  Thank you for using jOOQ 3.11.9
> {code}
[jira] [Updated] (HDDS-3040) Update Ratis version to 0.5.0 released.
[ https://issues.apache.org/jira/browse/HDDS-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDDS-3040:
---------------------------------
    Labels: pull-request-available  (was: )

> Update Ratis version to 0.5.0 released.
> ---
>
>         Key: HDDS-3040
>         URL: https://issues.apache.org/jira/browse/HDDS-3040
>     Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>    Reporter: Xiaoyu Yao
>    Assignee: Xiaoyu Yao
>    Priority: Major
>      Labels: pull-request-available
>
> Update Ozone to use the latest released version of Ratis, 0.5.0.
[jira] [Updated] (HDDS-3035) Add ability to enable Ratis metrics in OzoneManager
[ https://issues.apache.org/jira/browse/HDDS-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDDS-3035:
---------------------------------
    Labels: pull-request-available  (was: )

> Add ability to enable Ratis metrics in OzoneManager
> ---
>
>             Key: HDDS-3035
>             URL: https://issues.apache.org/jira/browse/HDDS-3035
>         Project: Hadoop Distributed Data Store
>      Issue Type: Bug
>      Components: Ozone Manager
> Affects Versions: 0.5.0
>        Reporter: Aravindan Vijayan
>        Assignee: Siddharth Wagle
>        Priority: Major
>          Labels: pull-request-available
>         Fix For: 0.5.0
>
> Whenever OM uses Ratis, we may need the ability to collect its metrics
> through OM JMX. This should be a straightforward change, similar to
> org.apache.hadoop.ozone.HddsDatanodeService#start().
> {code}
> public void start() {
>   // All the Ratis metrics (registered from now) will be published via JMX and
>   // via the prometheus exporter (used by the /prom servlet)
>   MetricRegistries.global()
>       .addReporterRegistration(MetricsReporting.jmxReporter());
>   MetricRegistries.global().addReporterRegistration(
>       registry -> CollectorRegistry.defaultRegistry.register(
>           new RatisDropwizardExports(
>               registry.getDropWizardMetricRegistry())));
> }
> {code}
[jira] [Updated] (HDDS-3042) Support running full Ratis pipeline from IDE (IntelliJ)
[ https://issues.apache.org/jira/browse/HDDS-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDDS-3042:
---------------------------------
    Labels: pull-request-available  (was: )

> Support running full Ratis pipeline from IDE (IntelliJ)
> ---
>
>         Key: HDDS-3042
>         URL: https://issues.apache.org/jira/browse/HDDS-3042
>     Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>    Reporter: Marton Elek
>    Priority: Major
>      Labels: pull-request-available
>
> HDDS-1522 introduced a method to run a full cluster in IntelliJ. The runner
> configurations can be copied with a shell script, and a basic ozone-site.xml
> and log configuration make it easy to run Ozone from the IDE.
> Unfortunately this setup supports only one Datanode, and it's harder to debug
> the full Ozone pipeline (3 datanodes) from the IDE.
> This patch provides 3 different configurations for 3 datanodes with different
> ports to make it possible to run them on the same host from the IDE.
[jira] [Updated] (HDDS-3040) Update Ratis version to 0.5.0 released.
[ https://issues.apache.org/jira/browse/HDDS-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDDS-3040:
---------------------------------
    Labels: pull-request-available  (was: )

> Update Ratis version to 0.5.0 released.
> ---
>
>             Key: HDDS-3040
>             URL: https://issues.apache.org/jira/browse/HDDS-3040
>         Project: Hadoop Distributed Data Store
>      Issue Type: Bug
>        Reporter: Xiaoyu Yao
>        Assignee: Xiaoyu Yao
>        Priority: Major
>          Labels: pull-request-available
>        Time Spent: 10m
> Remaining Estimate: 0h
>
> Update Ozone to use the latest released version of Ratis, 0.5.0.
[jira] [Updated] (HDDS-3045) Integration test crashes due to ReconServer NPE
[ https://issues.apache.org/jira/browse/HDDS-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDDS-3045:
---------------------------------
    Labels: pull-request-available  (was: )

> Integration test crashes due to ReconServer NPE
> ---
>
>         Key: HDDS-3045
>         URL: https://issues.apache.org/jira/browse/HDDS-3045
>     Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>    Reporter: Attila Doroszlai
>    Assignee: Attila Doroszlai
>    Priority: Major
>      Labels: pull-request-available
>
> {code:title=https://github.com/apache/hadoop-ozone/runs/457951373}
> [ERROR] Crashed tests:
> [ERROR] org.apache.hadoop.fs.ozone.TestOzoneFileInterfaces
> {code}
> The log ends with an NPE starting Recon, but it is not clear whether it is a cause or an effect:
> {code:title=https://github.com/apache/hadoop-ozone/suites/470859679/artifacts/2058465}
> 2020-02-20 14:45:25,041 [main] INFO tasks.ReconTaskControllerImpl (ReconTaskControllerImpl.java:start(230)) - Starting Recon Task Controller.
> java.lang.NullPointerException
>     at org.apache.hadoop.ozone.recon.ReconServer.start(ReconServer.java:118)
>     at org.apache.hadoop.ozone.recon.ReconServer.call(ReconServer.java:95)
>     at org.apache.hadoop.ozone.recon.ReconServer.call(ReconServer.java:39)
> {code}
> In [another run|https://github.com/apache/hadoop-ozone/runs/457338931]
> TestOzoneClientRetriesOnException crashed; the
> [log|https://github.com/apache/hadoop-ozone/suites/470219052/artifacts/2048272]
> also ends with the same NPE.
[jira] [Updated] (HDDS-3016) Fix TestMultiBlockWritesWithDnFailures.java
[ https://issues.apache.org/jira/browse/HDDS-3016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDDS-3016:
---------------------------------
    Labels: pull-request-available  (was: )

> Fix TestMultiBlockWritesWithDnFailures.java
> ---
>
>             Key: HDDS-3016
>             URL: https://issues.apache.org/jira/browse/HDDS-3016
>         Project: Hadoop Distributed Data Store
>      Issue Type: Sub-task
>      Components: Ozone Client
> Affects Versions: 0.5.0
>        Reporter: Shashikant Banerjee
>        Assignee: Shashikant Banerjee
>        Priority: Major
>          Labels: pull-request-available
>         Fix For: 0.5.0
[jira] [Updated] (HDDS-3044) Fix TestDeleteWithSlowFollower.java
[ https://issues.apache.org/jira/browse/HDDS-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDDS-3044:
---------------------------------
    Labels: pull-request-available  (was: )

> Fix TestDeleteWithSlowFollower.java
> ---
>
>         Key: HDDS-3044
>         URL: https://issues.apache.org/jira/browse/HDDS-3044
>     Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Datanode
>    Reporter: Shashikant Banerjee
>    Assignee: Shashikant Banerjee
>    Priority: Major
>      Labels: pull-request-available
[jira] [Updated] (HDDS-818) OzoneConfiguration uses an existing XMLRoot value
[ https://issues.apache.org/jira/browse/HDDS-818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDDS-818:
--------------------------------
    Labels: pull-request-available  (was: )

> OzoneConfiguration uses an existing XMLRoot value
> ---
>
>         Key: HDDS-818
>         URL: https://issues.apache.org/jira/browse/HDDS-818
>     Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>    Reporter: Giovanni Matteo Fumarola
>    Priority: Major
>      Labels: pull-request-available
> Attachments: HDDS-818.v0.patch
>
> OzoneConfiguration and ConfInfo both have
> @XmlRootElement(name = "configuration")
> This makes the REST Client crash for XML calls.
[jira] [Updated] (HDDS-3046) Fix Retry handling in Hadoop RPC Client
[ https://issues.apache.org/jira/browse/HDDS-3046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDDS-3046:
---------------------------------
    Labels: OMHA pull-request-available  (was: OMHA)

> Fix Retry handling in Hadoop RPC Client
> ---
>
>         Key: HDDS-3046
>         URL: https://issues.apache.org/jira/browse/HDDS-3046
>     Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>    Reporter: Bharat Viswanadham
>    Assignee: Bharat Viswanadham
>    Priority: Major
>      Labels: OMHA, pull-request-available
>
> Right now, for all exceptions other than ServiceException we use
> FailOverOnNetworkException. This exception policy is created with 15 max
> failovers and 15 retries.
> {code:java}
> retryPolicyOnNetworkException.shouldRetry(
>     exception, retries, failovers, isIdempotentOrAtMostOnce);
> {code}
> *2 issues with this:*
> # When shouldRetry returns the action FAILOVER_AND_RETRY, the client stays
> stuck on the same OM and does not fail over to the next OM, because
> OMFailoverProxyProvider#performFailover() is a dummy call that does not
> perform any failover.
> # When ozone.client.failover.max.attempts is set to 15, with 2 policies
> each set to 15 we will retry 15*2 times in the worst scenario.
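A sketch of what a real performFailover needs to do (this is not the OMFailoverProxyProvider API; names are assumed): advance to the next OM in round-robin order instead of staying on the current proxy.

```java
public class OmFailoverSketch {
  private final String[] omNodes;
  private int current = 0;

  public OmFailoverSketch(String... omNodes) {
    this.omNodes = omNodes;
  }

  public String currentProxy() {
    return omNodes[current];
  }

  // A no-op here is the bug described above: FAILOVER_AND_RETRY would keep
  // hitting the same OM forever. Advancing the index actually fails over.
  public void performFailover() {
    current = (current + 1) % omNodes.length;
  }

  public static void main(String[] args) {
    OmFailoverSketch p = new OmFailoverSketch("om1", "om2", "om3");
    p.performFailover();
    System.out.println(p.currentProxy());
  }
}
```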
[jira] [Updated] (HDDS-3051) Periodic HDDS volume checker thread should be a daemon
[ https://issues.apache.org/jira/browse/HDDS-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDDS-3051:
---------------------------------
    Labels: pull-request-available  (was: )

> Periodic HDDS volume checker thread should be a daemon
> ---
>
>         Key: HDDS-3051
>         URL: https://issues.apache.org/jira/browse/HDDS-3051
>     Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>    Reporter: Marton Elek
>    Assignee: Marton Elek
>    Priority: Major
>      Labels: pull-request-available
>
> The periodic HDDS volume checker is an auxiliary thread. It can be stopped
> when the main threads are stopped. Therefore we need to mark it as a daemon
> thread.
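A minimal sketch of the fix (thread name assumed for illustration): marking the checker thread as a daemon lets the JVM exit once the non-daemon main threads finish, instead of being kept alive by the auxiliary checker.

```java
public class DaemonCheckerSketch {
  public static Thread newChecker() {
    Thread volumeChecker = new Thread(() -> {
      // The periodic disk-check loop would live here.
    }, "Periodic HDDS volume checker");
    // Daemon threads do not keep the JVM alive: when the main threads
    // stop, this auxiliary checker is stopped with them.
    volumeChecker.setDaemon(true);
    return volumeChecker;
  }

  public static void main(String[] args) {
    Thread t = newChecker();
    t.start();
    System.out.println(t.isDaemon());
  }
}
```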
[jira] [Updated] (HDDS-3053) Decrease the number of the chunk writer threads
[ https://issues.apache.org/jira/browse/HDDS-3053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDDS-3053:
---------------------------------
    Labels: pull-request-available  (was: )

> Decrease the number of the chunk writer threads
> ---
>
>         Key: HDDS-3053
>         URL: https://issues.apache.org/jira/browse/HDDS-3053
>     Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>    Reporter: Marton Elek
>    Priority: Major
>      Labels: pull-request-available
>
> As of now we create 60 threads
> (dfs.container.ratis.num.write.chunk.threads) to write chunk data to the disk.
> As writes are limited by IO, I can't see any benefit to having so many
> threads. A high number of threads means high context-switch overhead,
> therefore it seems more reasonable to use only a limited number of threads.
> For example, 10 threads should be enough even with 5 external disks.
> If you know any reason to keep the number at 60, please let me know...
[jira] [Updated] (HDDS-3050) Use meaningful name for ChunkWriter threads
[ https://issues.apache.org/jira/browse/HDDS-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDDS-3050:
---------------------------------
    Labels: pull-request-available  (was: )

> Use meaningful name for ChunkWriter threads
> ---
>
>         Key: HDDS-3050
>         URL: https://issues.apache.org/jira/browse/HDDS-3050
>     Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>    Reporter: Marton Elek
>    Assignee: Marton Elek
>    Priority: Major
>      Labels: pull-request-available
>
> ChunkWriter threads are created with the naming schema 'pool-[x]-thread-[y]'.
> We can use better naming (especially as we have 60 threads...)
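A sketch of the usual way to get meaningful pool thread names (the prefix "ChunkWriter" is assumed; this is not the Ozone patch): pass a custom ThreadFactory to the executor instead of accepting the default 'pool-[x]-thread-[y]' scheme.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class NamedThreadsSketch {
  // Factory that names threads "ChunkWriter-0", "ChunkWriter-1", ...
  public static ThreadFactory named(String prefix) {
    AtomicInteger counter = new AtomicInteger();
    return runnable -> {
      Thread t = new Thread(runnable);
      t.setName(prefix + "-" + counter.getAndIncrement());
      return t;
    };
  }

  public static void main(String[] args) {
    ExecutorService pool = Executors.newFixedThreadPool(2, named("ChunkWriter"));
    pool.submit(() -> System.out.println(Thread.currentThread().getName()));
    pool.shutdown();
  }
}
```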
[jira] [Updated] (HDDS-3052) Test ChunkManagerImpl performance with long-running freon tests
[ https://issues.apache.org/jira/browse/HDDS-3052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDDS-3052:
---------------------------------
    Labels: pull-request-available  (was: )

> Test ChunkManagerImpl performance with long-running freon tests
> ---
>
>         Key: HDDS-3052
>         URL: https://issues.apache.org/jira/browse/HDDS-3052
>     Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>    Reporter: Marton Elek
>    Assignee: Marton Elek
>    Priority: Major
>      Labels: pull-request-available
>
> ChunkManagerImpl is the core of the data write path. It would be great to
> test it with the standard Freon toolset. It can provide a baseline disk speed
> and can also validate different behaviors of ChunkManagerImpl (e.g. is it
> faster with 60 threads?)
[jira] [Updated] (HDDS-3055) SCM crash during startup does not print any error message to log
[ https://issues.apache.org/jira/browse/HDDS-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDDS-3055:
---------------------------------
    Labels: pull-request-available  (was: )

> SCM crash during startup does not print any error message to log
> ---
>
>         Key: HDDS-3055
>         URL: https://issues.apache.org/jira/browse/HDDS-3055
>     Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>    Reporter: Bharat Viswanadham
>    Assignee: Bharat Viswanadham
>    Priority: Major
>      Labels: pull-request-available
>
> SCM startup failed due to a PipelineNotFoundException, but there is no error
> message logged in the SCM log.
> In the log file, we can see just the below log message; no reason for the
> crash is logged.
> {code:java}
> 2020-02-20 15:37:56,079 [shutdown-hook-0] INFO org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter: SHUTDOWN_MSG:
> /
> SHUTDOWN_MSG: Shutting down StorageContainerManager at xx.xx.xx/10.65.51.49
> {code}
> In the out file, we can see the below, but not the complete exception
> message.
> {code:java}
> PipelineID=x not found
> {code}
> The actual reason for the failure is not clearly logged if an exception
> occurs during SCM startup.
[jira] [Updated] (HDDS-3047) ObjectStore#listVolumesByUser and CreateVolumeHandler#call should get user's full principal name instead of login name by default
[ https://issues.apache.org/jira/browse/HDDS-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDDS-3047:
---------------------------------
    Labels: pull-request-available  (was: )

> ObjectStore#listVolumesByUser and CreateVolumeHandler#call should get user's
> full principal name instead of login name by default
> ---
>
>         Key: HDDS-3047
>         URL: https://issues.apache.org/jira/browse/HDDS-3047
>     Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>    Reporter: Siyao Meng
>    Assignee: Siyao Meng
>    Priority: Major
>      Labels: pull-request-available
>
> [{{ObjectStore#listVolumesByUser}}|https://github.com/apache/hadoop-ozone/blob/2fa37ef99b8fb4575169ba8326eeb677b3d2ed74/hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/ObjectStore.java#L249-L256]
> is using {{getShortUserName()}} by default (when user is empty or null):
> {code:java|title=ObjectStore#listVolumesByUser}
> public Iterator listVolumesByUser(String user,
>     String volumePrefix, String prevVolume)
>     throws IOException {
>   if (Strings.isNullOrEmpty(user)) {
>     user = UserGroupInformation.getCurrentUser().getShortUserName(); // <--
>   }
>   return new VolumeIterator(user, volumePrefix, prevVolume);
> }
> {code}
> It should use {{getUserName()}} instead.
> For a quick reference, the difference between {{getUserName()}} and
> {{getShortUserName()}}:
> {code:java|title=UserGroupInformation#getUserName}
> /**
>  * Get the user's full principal name.
>  * @return the user's full principal name.
>  */
> @InterfaceAudience.Public
> @InterfaceStability.Evolving
> public String getUserName() {
>   return user.getName();
> }
> {code}
> {code:java|title=UserGroupInformation#getShortUserName}
> /**
>  * Get the user's login name.
>  * @return the user's name up to the first '/' or '@'.
>  */
> public String getShortUserName() {
>   return user.getShortName();
> }
> {code}
> This won't cause issues if Kerberos is not in use. However, once Kerberos is
> enabled, the {{getUserName()}} and {{getShortUserName()}} results differ and
> can cause some issues.
> When Kerberos is enabled, {{getUserName()}} returns the full principal name,
> e.g. {{om/o...@example.com}}, but {{getShortUserName()}} will return the
> login name, e.g. {{hadoop}}.
> If {{hadoop.security.auth_to_local}} is set, the {{getShortUserName()}}
> result can become very different from the full principal name.
> For example, when {{hadoop.security.auth_to_local = RULE:[2:$1@$0](.*)s/.*/root/}},
> {{getShortUserName()}} returns {{root}}, while {{getUserName()}} still gives
> {{om/o...@example.com}}.)
> This can lead to a user experience issue (when Kerberos is enabled) where the
> user creates a volume with ozone shell ([uses
> {{getUserName()}}|https://github.com/apache/hadoop-ozone/blob/ecb5bf4df1d80723835a1500d595102f3f861708/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/web/ozShell/volume/CreateVolumeHandler.java#L63-L65]
> internally) and then tries to list it with {{ObjectStore#listVolumesByUser(null, ...)}}
> ([uses {{getShortUserName()}} by
> default|https://github.com/apache/hadoop-ozone/blob/2fa37ef99b8fb4575169ba8326eeb677b3d2ed74/hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/ObjectStore.java#L238-L256]
> when the user param is empty or null): the user won't see any volumes because
> of the mismatch.
> We should also double-check *all* usages of {{getShortUserName()}}.
> *Update:*
> Xiaoyu and I checked that the usage of {{getShortUserName()}} on the server
> side shouldn't become a problem, because the server should maintain its own
> auth_to_local rules (the admin should make sure they separate each user into
> different short names; as long as multiple principal names are not mapped to
> the same short name, it won't be a problem).
> The usage in {{BasicOzoneFileSystem}} itself also seems valid, because that
> {{getShortUserName()}} is only used for client-side purposes (to set
> {{workingDir}}, etc.).
> But the usage in {{ObjectStore#listVolumesByUser}} is confirmed problematic
> at the moment, which needs to be fixed. Same for
> [{{CreateVolumeHandler#call}}|https://github.com/apache/hadoop-ozone/blob/ecb5bf4df1d80723835a1500d595102f3f861708/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/web/ozShell/volume/CreateVolumeHandler.java#L81-L83]:
> {code:java|title=CreateVolumeHandler#call}
> } else {
>   rootName = UserGroupInformation.getCurrentUser().getShortUserName();
> }
> {code}
> It should pass the full principal name to the server.
> CC [~xyao] [~aengineer] [~arp] [~bharat]
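A toy illustration of why the two differ under Kerberos (this is not the UGI implementation; the sample principal is made up): the short name truncates the principal at the first '/' or '@', so distinct principals can collapse to the same login name, even before auth_to_local rules are applied.

```java
public class PrincipalNames {
  // Mimics the documented behavior of getShortUserName(): the name up to
  // the first '/' or '@'. The real UGI additionally applies auth_to_local rules.
  public static String shortName(String fullPrincipal) {
    int cut = fullPrincipal.length();
    for (int i = 0; i < fullPrincipal.length(); i++) {
      char c = fullPrincipal.charAt(i);
      if (c == '/' || c == '@') {
        cut = i;
        break;
      }
    }
    return fullPrincipal.substring(0, cut);
  }

  public static void main(String[] args) {
    System.out.println(shortName("om/om-host@EXAMPLE.COM")); // "om"
  }
}
```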
[jira] [Updated] (HDDS-3057) Improve Ozone Shell ACL operations' help text readability
[ https://issues.apache.org/jira/browse/HDDS-3057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3057: - Labels: pull-request-available (was: ) > Improve Ozone Shell ACL operations' help text readability > - > > Key: HDDS-3057 > URL: https://issues.apache.org/jira/browse/HDDS-3057 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Labels: pull-request-available > > Currently: > {code:bash|title=ozone sh volume addacl -h} > $ ozone sh volume addacl -h > Usage: ozone sh volume addacl [-hV] -a= [-s=] > Add a new Acl. >URI of the volume/bucket. > Ozone URI could start with o3:// or without > prefix. URI > may contain the host and port of the OM server. > Both > are optional. If they are not specified it will > be > identified from the config files. > -a, --acl= Add acl.r = READ,w = WRITE,c = CREATE,d = > DELETE,l = > LIST,a = ALL,n = NONE,x = READ_AC,y = > WRITE_ACEx user: > user1:rw or group:hadoop:rw > -h, --helpShow this help message and exit. > -s, --store= store type. i.e OZONE or S3 > -V, --version Print version information and exit. > {code} > {code:bash|title=ozone sh bucket addacl -h} > $ ozone sh bucket addacl -h > Usage: ozone sh bucket addacl [-hV] -a= [-s=] > Add a new Acl. >URI of the volume/bucket. > Ozone URI could start with o3:// or without > prefix. URI > may contain the host and port of the OM server. > Both > are optional. If they are not specified it will > be > identified from the config files. > -a, --acl= new acl.r = READ,w = WRITE,c = CREATE,d = > DELETE,l = > LIST,a = ALL,n = NONE,x = READ_AC,y = > WRITE_ACEx user: > user1:rw or group:hadoop:rw > -h, --helpShow this help message and exit. > -s, --store= store type. i.e OZONE or S3 > -V, --version Print version information and exit. 
> {code} > Same for {{ozone sh (volume|bucket|key) (addacl|removeacl|setacl|-getacl-)}} > It would look much nicer to have a line separator or space between {{acl.}} > and {{.r = READ,...}}. > Also improve the prompt on error and overall readability, and correct the typos: > {{READ_AC -> READ_ACL}}, {{WRITE_AC -> WRITE_ACL}}.
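The corrected shorthand table can be illustrated with a small sketch. This is a hypothetical plain-Java rendering, not the actual picocli description template Ozone uses; the letter-to-name mapping mirrors the help output above, with the READ_AC/WRITE_AC typos fixed and one entry per line for readability.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AclHelpSketch {
    // Render the ACL shorthand legend one entry per line instead of a
    // single comma-run, with the corrected names READ_ACL / WRITE_ACL.
    static String renderAclHelp() {
        Map<String, String> acls = new LinkedHashMap<>();
        acls.put("r", "READ");
        acls.put("w", "WRITE");
        acls.put("c", "CREATE");
        acls.put("d", "DELETE");
        acls.put("l", "LIST");
        acls.put("a", "ALL");
        acls.put("n", "NONE");
        acls.put("x", "READ_ACL");   // help text currently mis-prints "READ_AC"
        acls.put("y", "WRITE_ACL");  // help text currently mis-prints "WRITE_AC"
        StringBuilder sb = new StringBuilder("Add a new ACL.\n");
        acls.forEach((k, v) ->
            sb.append("  ").append(k).append(" = ").append(v).append('\n'));
        sb.append("Example: user:user1:rw or group:hadoop:rw\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(renderAclHelp());
    }
}
```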
[jira] [Updated] (HDDS-3054) OzoneFileStatus#getModificationTime should return actual directory modification time when its OmKeyInfo is available
[ https://issues.apache.org/jira/browse/HDDS-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3054: - Labels: pull-request-available (was: ) > OzoneFileStatus#getModificationTime should return actual directory > modification time when its OmKeyInfo is available > > > Key: HDDS-3054 > URL: https://issues.apache.org/jira/browse/HDDS-3054 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Labels: pull-request-available > > As of the current implementation, > [{{getModificationTime()}}|https://github.com/apache/hadoop-ozone/blob/c9f26ccf9f93a052c5c0c042c57b6f87709597ae/hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/helpers/OzoneFileStatus.java#L90-L107] > always returns a "fake" modification time (the current time) for a directory, > because a directory in Ozone might be faked from a file key. > But there are cases where a real directory key exists in the OzoneBucket, for > example when the user calls {{fs.mkdirs(directory)}}. In this case, a reasonable > thing to do would be to get the modification time from the OmKeyInfo, rather > than fake it. 
> CC [~xyao] > My POC for the fix: > {code:java|title=Diff} > diff --git > a/hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/helpers/OzoneFileStatus.java > > b/hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/helpers/OzoneFileStatus.java > index 8717946512..708e62d692 100644 > --- > a/hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/helpers/OzoneFileStatus.java > +++ > b/hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/helpers/OzoneFileStatus.java > @@ -93,7 +93,7 @@ public FileStatus makeQualified(URI defaultUri, Path parent, > */ >@Override >public long getModificationTime(){ > -if (isDirectory()) { > +if (isDirectory() && super.getModificationTime() == 0) { >return System.currentTimeMillis(); > } else { >return super.getModificationTime(); > diff --git > a/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java > > b/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java > index 1be5fb3f3c..cb8f647a41 100644 > --- > a/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java > +++ > b/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java > @@ -2004,8 +2004,14 @@ public OmKeyInfo lookupFile(OmKeyArgs args, String > clientAddress) >} else { > // if entry is a directory > if (!deletedKeySet.contains(entryInDb)) { > - cacheKeyMap.put(entryInDb, > - new OzoneFileStatus(immediateChild)); > + if (!entryKeyName.equals(immediateChild)) { > +cacheKeyMap.put(entryInDb, > +new OzoneFileStatus(immediateChild)); > + } else { > +// If entryKeyName matches dir name, we have the info > +cacheKeyMap.put(entryInDb, > +new OzoneFileStatus(value, 0, true)); > + } >countEntries++; > } > // skip the other descendants of this child directory. 
> {code}
[jira] [Updated] (HDDS-3026) OzoneManager#listStatus should be audited as READ operation instead of WRITE operation
[ https://issues.apache.org/jira/browse/HDDS-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3026: - Labels: pull-request-available (was: ) > OzoneManager#listStatus should be audited as READ operation instead of WRITE > operation > --- > > Key: HDDS-3026 > URL: https://issues.apache.org/jira/browse/HDDS-3026 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Sammi Chen >Assignee: Jan Hentschel >Priority: Major > Labels: pull-request-available > > Currently, listStatus uses AUDIT.logWriteSuccess and AUDIT.logWriteFailure > to log AUDIT info. It should use AUDIT.logReadSuccess and > AUDIT.logReadFailure instead.
[jira] [Updated] (HDDS-3058) OzoneFileSystem should override unsupported set type FileSystem API
[ https://issues.apache.org/jira/browse/HDDS-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3058: - Labels: pull-request-available (was: ) > OzoneFileSystem should override unsupported set type FileSystem API > --- > > Key: HDDS-3058 > URL: https://issues.apache.org/jira/browse/HDDS-3058 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Filesystem >Affects Versions: 0.4.1 >Reporter: Yiqun Lin >Assignee: Yiqun Lin >Priority: Major > Labels: pull-request-available > > Currently, OzoneFileSystem only implements some commonly used FileSystem APIs, > and most other APIs are not overridden but inherited from the parent class > FileSystem by default. However, FileSystem does nothing in some set-type > methods, like setReplication and setOwner. > {code:java} > public void setVerifyChecksum(boolean verifyChecksum) { > //doesn't do anything > } > public void setWriteChecksum(boolean writeChecksum) { > //doesn't do anything > } > public boolean setReplication(Path src, short replication) > throws IOException { > return true; > } > public void setPermission(Path p, FsPermission permission > ) throws IOException { > } > public void setOwner(Path p, String username, String groupname > ) throws IOException { > } > public void setTimes(Path p, long mtime, long atime > ) throws IOException { > } > {code} > These set-type methods depend on the sub-filesystem implementation. We need > to throw an unsupported-operation exception if the sub-filesystem cannot > support them. Otherwise, users will be confused when using the hadoop fs > -setrep command or calling the setReplication API: they will not see any > exception, and the command/API appears to execute fine. This happened when I > tested the OzoneFileSystem via the hadoop fs command.
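The proposed fix can be sketched as a fail-fast override. BaseFs below is a hypothetical stand-in for org.apache.hadoop.fs.FileSystem (whose setReplication silently reports success); it is an illustration of the pattern, not the actual Ozone patch.

```java
// BaseFs stands in for org.apache.hadoop.fs.FileSystem, whose
// setReplication silently returns true without doing anything.
class BaseFs {
    public boolean setReplication(String src, short replication) {
        return true; // parent's no-op: "succeeds" but has no effect
    }
}

// Sketch of the proposed override: fail fast so `hadoop fs -setrep`
// surfaces an error instead of appearing to succeed.
class OzoneFsSketch extends BaseFs {
    @Override
    public boolean setReplication(String src, short replication) {
        throw new UnsupportedOperationException(
            "setReplication is not supported by this filesystem");
    }
}

public class SetReplicationDemo {
    public static void main(String[] args) {
        boolean threw = false;
        try {
            new OzoneFsSketch().setReplication("/vol/bucket/key", (short) 3);
        } catch (UnsupportedOperationException e) {
            threw = true;
        }
        if (!threw) {
            throw new AssertionError("expected UnsupportedOperationException");
        }
        System.out.println("ok"); // prints "ok"
    }
}
```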
[jira] [Updated] (HDDS-3059) OzoneManager#listFileStatus should be audited as READ operation instead of WRITE operation
[ https://issues.apache.org/jira/browse/HDDS-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3059: - Labels: pull-request-available (was: ) > OzoneManager#listFileStatus should be audited as READ operation instead of > WRITE operation > --- > > Key: HDDS-3059 > URL: https://issues.apache.org/jira/browse/HDDS-3059 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Jan Hentschel >Assignee: Jan Hentschel >Priority: Minor > Labels: pull-request-available > > Currently, listFileStatus uses AUDIT.logWriteSuccess and AUDIT.logWriteFailure > to log AUDIT info. It should use AUDIT.logReadSuccess and > AUDIT.logReadFailure instead.
[jira] [Updated] (HDDS-2886) parse and dump datanode segment file to printable text
[ https://issues.apache.org/jira/browse/HDDS-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-2886: - Labels: pull-request-available (was: ) > parse and dump datanode segment file to printable text > - > > Key: HDDS-2886 > URL: https://issues.apache.org/jira/browse/HDDS-2886 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.5.0 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Labels: pull-request-available > Attachments: HDDS-2886.001.patch, log_dump, log_inprogress_0 > > > Add a tool to parse datanode Ratis log files and dump them in a printable > string format. > This tool will help in debugging Ratis issues.
[jira] [Updated] (HDDS-3049) Replication factor passed in create API doesn't take effect
[ https://issues.apache.org/jira/browse/HDDS-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3049: - Labels: pull-request-available (was: ) > Replication factor passed in create API doesn't take effect > --- > > Key: HDDS-3049 > URL: https://issues.apache.org/jira/browse/HDDS-3049 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Sammi Chen >Assignee: mingchao zhao >Priority: Major > Labels: pull-request-available >
[jira] [Updated] (HDDS-3043) Fix TestFailureHandlingByClient.java
[ https://issues.apache.org/jira/browse/HDDS-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3043: - Labels: pull-request-available (was: ) > Fix TestFailureHandlingByClient.java > > > Key: HDDS-3043 > URL: https://issues.apache.org/jira/browse/HDDS-3043 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Client >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > >
[jira] [Updated] (HDDS-3013) Fix TestBlockOutputStreamWithFailures.java
[ https://issues.apache.org/jira/browse/HDDS-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3013: - Labels: pull-request-available (was: ) > Fix TestBlockOutputStreamWithFailures.java > -- > > Key: HDDS-3013 > URL: https://issues.apache.org/jira/browse/HDDS-3013 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Client >Affects Versions: 0.5.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > >
[jira] [Updated] (HDDS-3063) Add test to verify replication factor of ozone fs
[ https://issues.apache.org/jira/browse/HDDS-3063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3063: - Labels: pull-request-available (was: ) > Add test to verify replication factor of ozone fs > - > > Key: HDDS-3063 > URL: https://issues.apache.org/jira/browse/HDDS-3063 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: test >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Minor > Labels: pull-request-available > > Currently no test verifies that {{ozone fs}} creates keys for files with the > expected replication factor.
[jira] [Updated] (HDDS-2995) Add integration test for Recon's Passive SCM state.
[ https://issues.apache.org/jira/browse/HDDS-2995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-2995: - Labels: pull-request-available (was: ) > Add integration test for Recon's Passive SCM state. > --- > > Key: HDDS-2995 > URL: https://issues.apache.org/jira/browse/HDDS-2995 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Recon >Affects Versions: 0.5.0 >Reporter: Aravindan Vijayan >Assignee: Aravindan Vijayan >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Attachments: HDDS-2995-001.patch > > > * Verify Recon gets pipeline, node and container report from Datanode. > * Verify SCM metadata state == Recon metadata state (Create pipeline, Close > pipeline, create container)
[jira] [Updated] (HDDS-3065) Ozone Filesystem should return real default replication
[ https://issues.apache.org/jira/browse/HDDS-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3065: - Labels: pull-request-available (was: ) > Ozone Filesystem should return real default replication > --- > > Key: HDDS-3065 > URL: https://issues.apache.org/jira/browse/HDDS-3065 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Filesystem >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > Labels: pull-request-available > > Ozone {{FileSystem}} implementation should return the actual configured > replication factor for {{getDefaultReplication()}}.
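A minimal sketch of the intended behavior, using a plain Map as a stand-in for Hadoop's Configuration. The property name `ozone.replication` and the fallback value of 3 are assumptions for illustration, not confirmed from the ticket or the eventual patch.

```java
import java.util.HashMap;
import java.util.Map;

public class DefaultReplicationSketch {
    // Assumed property name, for illustration; the real key may differ.
    static final String REPLICATION_KEY = "ozone.replication";

    // Stand-in for Hadoop's Configuration object.
    private final Map<String, String> conf = new HashMap<>();

    void set(String key, String value) {
        conf.put(key, value);
    }

    // Return the configured replication instead of a hard-coded constant,
    // falling back to a default only when nothing is configured.
    short getDefaultReplication() {
        return Short.parseShort(conf.getOrDefault(REPLICATION_KEY, "3"));
    }

    public static void main(String[] args) {
        DefaultReplicationSketch fs = new DefaultReplicationSketch();
        System.out.println(fs.getDefaultReplication()); // falls back to 3
        fs.set(REPLICATION_KEY, "1");
        System.out.println(fs.getDefaultReplication()); // reflects the config: 1
    }
}
```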
[jira] [Updated] (HDDS-3066) SCM startup failed during loading containers from DB
[ https://issues.apache.org/jira/browse/HDDS-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3066: - Labels: pull-request-available (was: ) > SCM startup failed during loading containers from DB > - > > Key: HDDS-3066 > URL: https://issues.apache.org/jira/browse/HDDS-3066 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Labels: pull-request-available > > This happens because the pipeline scrubber removed the pipeline: it closed the > pipeline, removed it from the DB, and triggered close-container commands to set > the containers to CLOSING. When SCM is restarted before the close-container > command is handled and the state changes to CLOSING, the issue below can happen. > > This can also happen in other scenarios, for example when safeModeHandler calls > finalizeAndDestroyPipeline and SCM is then restarted. > > The root cause is that the pipeline is removed from the DB while the container > is still in the open state; when trying to get the pipeline, SCM crashes with a > {{PipelineNotFoundException}} error. > {code:java} > 2020-02-21 13:57:34,888 [main] ERROR > org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter: SCM start > failed with exception > org.apache.hadoop.hdds.scm.pipeline.PipelineNotFoundException: > PipelineID=35dff62d-9bfa-449b-b6e8-6f00cc8c1b6e not found at > org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.getPipeline(PipelineStateMap.java:133) > at > org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.addContainerToPipeline(PipelineStateMap.java:110) > at > org.apache.hadoop.hdds.scm.pipeline.PipelineStateManager.addContainerToPipeline(PipelineStateManager.java:59) > at > org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.addContainerToPipeline(SCMPipelineManager.java:309) > at > org.apache.hadoop.hdds.scm.container.SCMContainerManager.loadExistingContainers(SCMContainerManager.java:121) > at > 
org.apache.hadoop.hdds.scm.container.SCMContainerManager.(SCMContainerManager.java:107) > at > org.apache.hadoop.hdds.scm.server.StorageContainerManager.initializeSystemManagers(StorageContainerManager.java:412) > at > org.apache.hadoop.hdds.scm.server.StorageContainerManager.(StorageContainerManager.java:283) > at > org.apache.hadoop.hdds.scm.server.StorageContainerManager.(StorageContainerManager.java:215) > at > org.apache.hadoop.hdds.scm.server.StorageContainerManager.createSCM(StorageContainerManager.java:612) > at > org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter$SCMStarterHelper.start(StorageContainerManagerStarter.java:142) > at > org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.startScm(StorageContainerManagerStarter.java:117) > at > org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.call(StorageContainerManagerStarter.java:66) > at > org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.call(StorageContainerManagerStarter.java:42) > at picocli.CommandLine.execute(CommandLine.java:1173) at > picocli.CommandLine.access$800(CommandLine.java:141) at > picocli.CommandLine$RunLast.handle(CommandLine.java:1367) at > picocli.CommandLine$RunLast.handle(CommandLine.java:1335) at > picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:1243) > at picocli.CommandLine.parseWithHandlers(CommandLine.java:1526) at > picocli.CommandLine.parseWithHandler(CommandLine.java:1465) at > org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:65) at > org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:56) at > org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.main(StorageContainerManagerStarter.java:55) > 2020-02-21 13:57:34,892 [shutdown-hook-0] INFO > org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter: > SHUTDOWN_MSG: / > SHUTDOWN_MSG: Shutting down StorageContainerManager at > om-ha-1.vpc.cloudera.com/10.65.51.49 > /{code}
[jira] [Updated] (HDDS-3067) Fix Bug in Scrub Pipeline causing pipelines to be destroyed after SCM restart
[ https://issues.apache.org/jira/browse/HDDS-3067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3067: - Labels: OMHATest pull-request-available (was: OMHATest) > Fix Bug in Scrub Pipeline causing pipelines to be destroyed after SCM restart > - > > Key: HDDS-3067 > URL: https://issues.apache.org/jira/browse/HDDS-3067 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Labels: OMHATest, pull-request-available > > Currently, the scrubber is run as part of pipeline creation. > When SCM is started, the scrubber comes up and cleans up all the pipelines > in SCM, because when loading pipelines, the pipelineCreationTimeStamp is set > to when the pipeline was originally created. > > Because of this, the condition below is satisfied and all the pipelines are > destroyed when SCM is restarted. This can be easily reproduced: start SCM, > wait for 10 minutes, and restart SCM. > > {code:java} > List needToSrubPipelines = stateManager.getPipelines(type, factor, > Pipeline.PipelineState.ALLOCATED).stream() > .filter(p -> currentTime.toEpochMilli() - p.getCreationTimestamp() > .toEpochMilli() >= pipelineScrubTimeoutInMills) > .collect(Collectors.toList()); > for (Pipeline p : needToSrubPipelines) { > LOG.info("srubbing pipeline: id: " + p.getId().toString() + > " since it stays at ALLOCATED stage for " + > Duration.between(currentTime, p.getCreationTimestamp()).toMinutes() + > " mins."); > finalizeAndDestroyPipeline(p, false); > }{code} > > *Log showing scrubbing of pipeline* > > {code:java} > 2020-02-20 12:42:18,946 [RatisPipelineUtilsThread] INFO > org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager: srubbing pipeline: > id: PipelineID=35dff62d-9bfa-449b-b6e8-6f00cc8c1b6e since it stays at > ALLOCATED stage for -1003 mins.{code} > > >
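The timing bug above reduces to a small sketch: the scrub filter compares now minus the persisted creation timestamp against the timeout, so a pipeline reloaded after a restart looks instantly stale. One possible direction, resetting the timestamp when pipelines are loaded back, is shown below; PipelineStub is a hypothetical stand-in for the SCM Pipeline class, and this is an illustration, not the actual patch.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for the SCM Pipeline class.
class PipelineStub {
    private Instant creationTimestamp;
    PipelineStub(Instant created) { this.creationTimestamp = created; }
    Instant getCreationTimestamp() { return creationTimestamp; }
    // Possible fix direction: restart the scrub clock when loading from DB.
    void resetCreationTimestamp() { this.creationTimestamp = Instant.now(); }
}

public class ScrubSketch {
    // Mirrors the scrub filter from the snippet above.
    static List<PipelineStub> needToScrub(List<PipelineStub> allocated,
            Instant currentTime, long scrubTimeoutMillis) {
        return allocated.stream()
            .filter(p -> currentTime.toEpochMilli()
                - p.getCreationTimestamp().toEpochMilli() >= scrubTimeoutMillis)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        long timeout = Duration.ofMinutes(5).toMillis();
        // A pipeline "reloaded from DB" whose creation time is 20 minutes old.
        PipelineStub reloaded =
            new PipelineStub(Instant.now().minus(Duration.ofMinutes(20)));
        // Without a reset it is scrubbed immediately after restart.
        System.out.println(
            needToScrub(Arrays.asList(reloaded), Instant.now(), timeout).size()); // 1
        // Resetting the timestamp on load gives a fresh grace period.
        reloaded.resetCreationTimestamp();
        System.out.println(
            needToScrub(Arrays.asList(reloaded), Instant.now(), timeout).size()); // 0
    }
}
```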
[jira] [Updated] (HDDS-3068) OM crash during startup does not print any error message to log
[ https://issues.apache.org/jira/browse/HDDS-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3068: - Labels: OMHATest pull-request-available (was: OMHATest) > OM crash during startup does not print any error message to log > --- > > Key: HDDS-3068 > URL: https://issues.apache.org/jira/browse/HDDS-3068 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Manager >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Labels: OMHATest, pull-request-available > > During a code read, a similar issue was found: we also don't log errors for OM > startup, as OM uses similar startup code. >
[jira] [Updated] (HDDS-2648) TestOzoneManagerDoubleBufferWithOMResponse
[ https://issues.apache.org/jira/browse/HDDS-2648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-2648: - Labels: pull-request-available (was: ) > TestOzoneManagerDoubleBufferWithOMResponse > -- > > Key: HDDS-2648 > URL: https://issues.apache.org/jira/browse/HDDS-2648 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Reporter: Marton Elek >Assignee: Bharat Viswanadham >Priority: Blocker > Labels: pull-request-available > > The test is flaky: > > Example run: [https://github.com/apache/hadoop-ozone/runs/325281277] > > Failure: > {code:java} > --- > Test set: > org.apache.hadoop.ozone.om.ratis.TestOzoneManagerDoubleBufferWithOMResponse > --- > Tests run: 3, Failures: 1, Errors: 0, Skipped: 1, Time elapsed: 5.31 s <<< > FAILURE! - in > org.apache.hadoop.ozone.om.ratis.TestOzoneManagerDoubleBufferWithOMResponse > testDoubleBufferWithMixOfTransactionsParallel(org.apache.hadoop.ozone.om.ratis.TestOzoneManagerDoubleBufferWithOMResponse) > Time elapsed: 0.282 s <<< FAILURE! > java.lang.AssertionError: expected:<32> but was:<29> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.ozone.om.ratis.TestOzoneManagerDoubleBufferWithOMResponse.testDoubleBufferWithMixOfTransactionsParallel(TestOzoneManagerDoubleBufferWithOMResponse.java:247) > {code}
[jira] [Updated] (HDDS-2816) Fix shell description for --start parameter of listing keys
[ https://issues.apache.org/jira/browse/HDDS-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-2816: - Labels: newbie pull-request-available (was: newbie) > Fix shell description for --start parameter of listing keys > --- > > Key: HDDS-2816 > URL: https://issues.apache.org/jira/browse/HDDS-2816 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: YiSheng Lien >Priority: Minor > Labels: newbie, pull-request-available > > On the master branch, key listing shows the description > *--start= The first key to start the listing* > We should update the description to "*The key to start the listing from. This > will be excluded from the result.*"
[jira] [Updated] (HDDS-2984) Allocate Block failing with NPE
[ https://issues.apache.org/jira/browse/HDDS-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-2984: - Labels: pull-request-available teragentest (was: teragentest) > Allocate Block failing with NPE > --- > > Key: HDDS-2984 > URL: https://issues.apache.org/jira/browse/HDDS-2984 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Attila Doroszlai >Priority: Major > Labels: pull-request-available, teragentest > Attachments: Screen Shot 2020-02-05 at 2.48.56 PM.png > > > When running teragen, this error was observed in one of the runs. > {code:java} > 05 14:43:16,635 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception > running child : INTERNAL_ERROR > org.apache.hadoop.ozone.om.exceptions.OMException: > java.lang.NullPointerException at > java.util.Objects.requireNonNull(Objects.java:203) at > java.util.Optional.(Optional.java:96) at > java.util.Optional.of(Optional.java:108) at > org.apache.hadoop.hdds.scm.pipeline.SCMPipelineMetrics.incNumBlocksAllocated(SCMPipelineMetrics.java:118) > at > org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.incNumBlocksAllocatedMetric(SCMPipelineManager.java:520) > at > org.apache.hadoop.hdds.scm.block.BlockManagerImpl.newBlock(BlockManagerImpl.java:265) > at > org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:233) > at > org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:188) > at > org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:159) > at > org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.processMessage(ScmBlockLocationProtocolServerSideTranslatorPB.java:117) > at > org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72) > at > 
org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.send(ScmBlockLocationProtocolServerSideTranslatorPB.java:98) > at > org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:13157) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070) at > org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:984) at > org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:912) at > java.security.AccessController.doPrivileged(Native Method) at > javax.security.auth.Subject.doAs(Subject.java:422) at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2882) at > org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:792) > at > org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.createFile(OzoneManagerProtocolClientSideTranslatorPB.java:1596) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > org.apache.hadoop.hdds.tracing.TraceAllMethod.invoke(TraceAllMethod.java:71) > at com.sun.proxy.$Proxy18.createFile(Unknown Source) at > org.apache.hadoop.ozone.client.rpc.RpcClient.createFile(RpcClient.java:1071) > at > org.apache.hadoop.ozone.client.OzoneBucket.createFile(OzoneBucket.java:538) > at > org.apache.hadoop.fs.ozone.BasicOzoneClientAdapterImpl.createFile(BasicOzoneClientAdapterImpl.java:208) > at > org.apache.hadoop.fs.ozone.BasicOzoneFileSystem.createOutputStream(BasicOzoneFileSystem.java:256) > 
at > org.apache.hadoop.fs.ozone.BasicOzoneFileSystem.create(BasicOzoneFileSystem.java:237) > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1133) at > org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1113) at > org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1002) at > org.apache.hadoop.fs.FileSystem.create(FileSystem.java:990) at > org.apache.hadoop.examples.terasort.TeraOutputFormat.getRecordWriter(TeraOutputFormat.java:141) > at > org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.(MapTask.java:659) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:779) at > org.apache.hadoop.mapred.MapTask.run(MapTask.java:347) at > org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174) at > java.security.AccessController.doPriv
[jira] [Updated] (HDDS-3070) NPE when stop recon server while recon server was not really started before
[ https://issues.apache.org/jira/browse/HDDS-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3070: - Labels: pull-request-available (was: ) > NPE when stop recon server while recon server was not really started before > --- > > Key: HDDS-3070 > URL: https://issues.apache.org/jira/browse/HDDS-3070 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Recon >Affects Versions: 0.4.1 >Reporter: Yiqun Lin >Assignee: Yiqun Lin >Priority: Minor > Labels: pull-request-available > > I hit an NPE while testing Ozone. The root cause seems to be that the Recon > server was never actually started, yet we still try to stop it. > {noformat} > 2020-02-25 20:22:44,296 [Thread-0] ERROR ozone.MiniOzoneClusterImpl > (MiniOzoneClusterImpl.java:build(525)) - Exception while shutting down the > Recon. > java.lang.NullPointerException > at > org.apache.hadoop.ozone.recon.tasks.ReconTaskControllerImpl.stop(ReconTaskControllerImpl.java:237) > at > org.apache.hadoop.ozone.recon.spi.impl.OzoneManagerServiceProviderImpl.stop(OzoneManagerServiceProviderImpl.java:229) > at org.apache.hadoop.ozone.recon.ReconServer.stop(ReconServer.java:132) > at > org.apache.hadoop.ozone.MiniOzoneClusterImpl.stopRecon(MiniOzoneClusterImpl.java:470) > at > org.apache.hadoop.ozone.MiniOzoneClusterImpl.access$200(MiniOzoneClusterImpl.java:87) > at > org.apache.hadoop.ozone.MiniOzoneClusterImpl$Builder.build(MiniOzoneClusterImpl.java:523) > at > org.apache.hadoop.fs.ozone.TestOzoneFileSystem.testFileSystem(TestOzoneFileSystem.java:72) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > {noformat}
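The stack trace points at ReconTaskControllerImpl.stop dereferencing state that is only initialized by start(). A minimal sketch of the null guard is shown below; the field and class names are hypothetical stand-ins, not the real Recon classes.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for a Recon component: the executor field is only
// assigned in start(), so stop() must tolerate it being null.
public class ReconStopSketch {
    private ExecutorService executorService;

    void start() {
        executorService = Executors.newSingleThreadExecutor();
    }

    void stop() {
        // Guard against stop() being called when start() never ran.
        if (executorService != null) {
            executorService.shutdownNow();
            executorService = null;
        }
    }

    public static void main(String[] args) {
        ReconStopSketch recon = new ReconStopSketch();
        recon.stop(); // no NPE even though start() was never called
        recon.start();
        recon.stop(); // normal shutdown path still works
        System.out.println("ok"); // prints "ok"
    }
}
```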
[jira] [Updated] (HDDS-3038) TestRatisPipelineLeader fails since we no longer wait for leader in the HealthyPipelineSafeModeExitRule
[ https://issues.apache.org/jira/browse/HDDS-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3038: - Labels: pull-request-available (was: ) > TestRatisPipelineLeader fails since we no longer wait for leader in the > HealthyPipelineSafeModeExitRule > --- > > Key: HDDS-3038 > URL: https://issues.apache.org/jira/browse/HDDS-3038 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Attila Doroszlai >Assignee: Siddharth Wagle >Priority: Major > Labels: pull-request-available > > {code:title=https://github.com/apache/hadoop-ozone/runs/456217344} > [ERROR] Tests run: 2, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: > 29.823 s <<< FAILURE! - in org.apache.hadoop.hdds.scm.TestRatisPipelineLeader > [ERROR] > testLeaderIdAfterLeaderChange(org.apache.hadoop.hdds.scm.TestRatisPipelineLeader) > Time elapsed: 5.367 s <<< FAILURE! > java.lang.AssertionError > ... > at > org.apache.hadoop.hdds.scm.TestRatisPipelineLeader.verifyLeaderInfo(TestRatisPipelineLeader.java:125) > at > org.apache.hadoop.hdds.scm.TestRatisPipelineLeader.testLeaderIdAfterLeaderChange(TestRatisPipelineLeader.java:106) > [ERROR] > testLeaderIdUsedOnFirstCall(org.apache.hadoop.hdds.scm.TestRatisPipelineLeader) > Time elapsed: 0.008 s <<< FAILURE! > java.lang.AssertionError > ... > at > org.apache.hadoop.hdds.scm.TestRatisPipelineLeader.verifyLeaderInfo(TestRatisPipelineLeader.java:125) > at > org.apache.hadoop.hdds.scm.TestRatisPipelineLeader.testLeaderIdUsedOnFirstCall(TestRatisPipelineLeader.java:76) > {code}
[jira] [Updated] (HDDS-3072) SCM scrub pipeline should be started after coming out of safe mode
[ https://issues.apache.org/jira/browse/HDDS-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3072: - Labels: pull-request-available (was: ) > SCM scrub pipeline should be started after coming out of safe mode > -- > > Key: HDDS-3072 > URL: https://issues.apache.org/jira/browse/HDDS-3072 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Labels: pull-request-available > > We should start scrubbing pipelines only after SCM is out of safe mode. > Reasons to do this: > # Right now, pipeline scrubbing runs as part of triggerPipelineCreation. When we scrub > pipelines that have been in the ALLOCATED state for more than > "ozone.scm.pipeline.allocated.timeout", we might close some pipelines and > then never come out of safe mode, because the SafeModeRules read the > pipeline count from the pipeline DB during initialization. > Example scenario: > # Stop 3 Datanodes. > # Restart SCM. > # Start the Datanodes after 6 minutes. SCM will never come out of safe mode, because > the pipelines in ALLOCATED state will hit the scrubber timeout. > To avoid these scenarios, the better approach is to scrub pipelines only > after SCM is out of safe mode. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3073) Implement ofs://: Fix listStatus continuation
[ https://issues.apache.org/jira/browse/HDDS-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3073: - Labels: pull-request-available (was: ) > Implement ofs://: Fix listStatus continuation > - > > Key: HDDS-3073 > URL: https://issues.apache.org/jira/browse/HDDS-3073 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Labels: pull-request-available > > Supplement to HDDS-2928 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3002) Make the Mountd working for Ozone
[ https://issues.apache.org/jira/browse/HDDS-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3002: - Labels: pull-request-available (was: ) > Make the Mountd working for Ozone > - > > Key: HDDS-3002 > URL: https://issues.apache.org/jira/browse/HDDS-3002 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Prashant Pogde >Assignee: Prashant Pogde >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2996) Create REST API to serve Node information and integrate with UI in Recon.
[ https://issues.apache.org/jira/browse/HDDS-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-2996: - Labels: pull-request-available (was: ) > Create REST API to serve Node information and integrate with UI in Recon. > - > > Key: HDDS-2996 > URL: https://issues.apache.org/jira/browse/HDDS-2996 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Recon >Affects Versions: 0.5.0 >Reporter: Aravindan Vijayan >Assignee: Vivek Ratnavel Subramanian >Priority: Major > Labels: pull-request-available > > We need a REST API in Recon to serve up information for the Datanodes page > (HDDS-2827). The REST API can also include other useful methods present in > NodeManager that gives the user information about the Nodes in the cluster. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3084) Smoketest to write data on network aware cluster
[ https://issues.apache.org/jira/browse/HDDS-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3084: - Labels: pull-request-available (was: ) > Smoketest to write data on network aware cluster > > > Key: HDDS-3084 > URL: https://issues.apache.org/jira/browse/HDDS-3084 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Affects Versions: 0.6.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Labels: pull-request-available > > It would be good to create a smoke test which: > 1. Writes some data on a network aware cluster > 2. Stops 1 rack and ensures the data is still readable > 3. Restart the rack and stop the other rack and again check the data is > readable > That way we can have some confidence the data is being written to both racks > OK. > One issue with a test like this on a small cluster, is that there is a high > chance the data will end up on 2 racks naturally, even if no network topology > is configured. If that was the case, we would expect intermittent test > failures. > However, if network topology is working fine, then we would not expect any > failures. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3069) Delete key is failing
[ https://issues.apache.org/jira/browse/HDDS-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3069: - Labels: pull-request-available (was: ) > Delete key is failing > - > > Key: HDDS-3069 > URL: https://issues.apache.org/jira/browse/HDDS-3069 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Reporter: Nilotpal Nandi >Assignee: Bharat Viswanadham >Priority: Blocker > Labels: pull-request-available > > Delete key is failing . Here is the stack trace of the failure: > > > {noformat} > INFO retry.RetryInvocationHandler: com.google.protobuf.ServiceException: > org.apache.hadoop.ipc.RemoteException(java.lang.IllegalArgumentException): > Trying to set updateID to 26 which is not greater than the current value of > 433 for OMKeyInfo{volume='vol-test-restartcomponentozonereaddata-1582093704', > bucket='buck-test-restartcomponentozonereaddata-1582093704', > key='ReadOzoneFile_1582093709', dataSize='10485760', > creationTime='1582093712218', type='RATIS', factor='THREE'} E at > com.google.common.base.Preconditions.checkArgument(Preconditions.java:142) E > at > org.apache.hadoop.ozone.om.helpers.WithObjectID.setUpdateID(WithObjectID.java:79) > E at > org.apache.hadoop.ozone.om.request.key.OMKeyDeleteRequest.validateAndUpdateCache(OMKeyDeleteRequest.java:147) > E at > org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:230) > E at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB.java:210) > E at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:130) > E at > org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72) > E at > 
org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:98) > E at > org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java) > E at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528) > E at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070) E at > org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:984) E at > org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:912) E at > java.base/java.security.AccessController.doPrivileged(Native Method) E at > java.base/javax.security.auth.Subject.doAs(Subject.java:423) E at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) > E at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2882) E , while > invoking $Proxy16.submitRequest over > nodeId=null,nodeAddress=quasar-vbncen-3.quasar-vbncen.root.hwx.site:9862. > Trying to failover immediately. > > .. > .. > .. > .. 
> > 20/02/19 03:37:17 INFO retry.RetryInvocationHandler: > com.google.protobuf.ServiceException: > org.apache.hadoop.ipc.RemoteException(java.lang.IllegalArgumentException): > Trying to set updateID to 22 which is not greater than the current value of > 1143 for OMKeyInfo{volume='vol-test-kill-datanode-1582075168', > bucket='buck-test-kill-datanode-1582075168', > key='replication_test1_1582075173', dataSize='104857600', > creationTime='1582075177268', type='RATIS', factor='THREE'} E at > com.google.common.base.Preconditions.checkArgument(Preconditions.java:142) E > at > org.apache.hadoop.ozone.om.helpers.WithObjectID.setUpdateID(WithObjectID.java:79) > E at > org.apache.hadoop.ozone.om.request.key.OMKeyDeleteRequest.validateAndUpdateCache(OMKeyDeleteRequest.java:147) > E at > org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:230) > E at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB.java:210) > E at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:130) > E at > org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72) > E at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:98) > E at > org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java) > E at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(Proto
[jira] [Updated] (HDDS-2929) Implement ofs://: temp directory mount
[ https://issues.apache.org/jira/browse/HDDS-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-2929: - Labels: pull-request-available (was: ) > Implement ofs://: temp directory mount > -- > > Key: HDDS-2929 > URL: https://issues.apache.org/jira/browse/HDDS-2929 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Labels: pull-request-available > > Because the ofs:// filesystem hierarchy starts with a volume and then a bucket, an > application typically won't be able to write directly under a first-level > folder, e.g. ofs://service-id1/tmp/. /tmp/ is a special case that we need to > handle, since it is the default location most legacy Hadoop applications > write to for swap/temporary files. To address this, we would > introduce /tmp/ as a client-side "mount" to another Ozone bucket. > Note that the preliminary implementation would only allow /tmp/ to be a > mount, not any user-defined path. > This depends on HDDS-2840 and HDDS-2928. > Demo PR on my fork: https://github.com/smengcl/hadoop-ozone/pull/1 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3041) memory leak of s3g
[ https://issues.apache.org/jira/browse/HDDS-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3041: - Labels: pull-request-available (was: ) > memory leak of s3g > -- > > Key: HDDS-3041 > URL: https://issues.apache.org/jira/browse/HDDS-3041 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: S3 >Affects Versions: 0.6.0 >Reporter: JieWang >Priority: Major > Labels: pull-request-available > Attachments: image-2020-02-24-12-06-22-248.png, > image-2020-02-24-12-10-09-552.png, image-2020-02-26-17-11-31-834.png, > screenshot-1.png, screenshot-2.png, screenshot-3.png, screenshot-4.png, > screenshot-5.png > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3090) Fix logging in OMFileRequest and OzoneManager
[ https://issues.apache.org/jira/browse/HDDS-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3090: - Labels: pull-request-available (was: ) > Fix logging in OMFileRequest and OzoneManager > - > > Key: HDDS-3090 > URL: https://issues.apache.org/jira/browse/HDDS-3090 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Manager >Reporter: Supratim Deka >Assignee: Supratim Deka >Priority: Trivial > Labels: pull-request-available > > HDDS-2940 introduced a INFO level log in > hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/file/OMFileCreateRequest.java > This needs to be a TRACE because it occurs in the regular file create path. > Also, trace logs introduced in OzoneManager and OMFileRequest.java need to be > parameterized. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
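The parameterization asked for in HDDS-3090 (and HDDS-3027 above) matters because slf4j only renders a `("text {}", arg)` message when the level is enabled, whereas string concatenation always pays the cost of building the message. A minimal self-contained sketch of the difference, using a counter and an explicit guard as a stand-in for what slf4j does internally (names here are illustrative, not Ozone code):

```java
// Stand-in demo for slf4j parameterized logging: the expensive argument is
// only evaluated when the log level is actually enabled.
public class ParamLogDemo {
    public static int expensiveCalls = 0;

    // Simulates an expensive toString()/serialization of a log argument.
    public static String expensiveDescribe() {
        expensiveCalls++;
        return "big-container-report";
    }

    public static final boolean TRACE_ENABLED = false;

    // Concatenation style, LOG.trace("report: " + arg): the message string
    // is assembled even though TRACE is disabled.
    public static void logConcat() {
        String msg = "report: " + expensiveDescribe();
        if (TRACE_ENABLED) {
            System.out.println(msg);
        }
    }

    // Parameterized style, LOG.trace("report: {}", arg): slf4j checks the
    // level first, so the argument is never rendered when TRACE is off.
    // The explicit guard here mimics that internal check.
    public static void logParam() {
        if (TRACE_ENABLED) {
            System.out.println("report: " + expensiveDescribe());
        }
    }

    public static void main(String[] args) {
        logConcat(); // pays the formatting cost
        logParam();  // pays nothing
        System.out.println("expensive calls: " + expensiveCalls);
    }
}
```

Note that for log arguments whose `toString()` is cheap, the difference is mostly cosmetic; it matters on hot paths such as the file-create path this ticket targets.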
[jira] [Updated] (HDDS-3078) Include output of timed out test in bundle
[ https://issues.apache.org/jira/browse/HDDS-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3078: - Labels: pull-request-available (was: ) > Include output of timed out test in bundle > -- > > Key: HDDS-3078 > URL: https://issues.apache.org/jira/browse/HDDS-3078 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: build, test >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > Labels: pull-request-available > > Sometimes a unit/integration test does not complete, nor does it crash. We > should collect the output of such tests in the result bundle for analysis. > Example: > {code:title=https://github.com/adoroszlai/hadoop-ozone/runs/469172863} > 2020-02-26T08:15:58.2297584Z [INFO] Running > org.apache.hadoop.ozone.freon.TestRandomKeyGenerator > 2020-02-26T08:30:59.6189916Z [INFO] Running > org.apache.hadoop.ozone.freon.TestDataValidateWithUnsafeByteOperations > ... > 2020-02-26T08:32:47.6155975Z [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M1:test (default-test) > on project hadoop-ozone-integration-test: There was a timeout or other error > in the fork > {code} > In this case TestRandomKeyGenerator had this problem. It might be a bit > tricky to find such tests, since these are not explicitly listed at the end, > unlike failed or crashed tests. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
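One way to spot such tests, sketched from the log excerpt above: a timed-out class appears in a "Running <class>" line but never in a matching "- in <class>" summary line. A rough, hypothetical helper (the log format is assumed from the excerpt, not from any surefire guarantee):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Finds test classes that started ("Running X") but never produced a
// result line ("... - in X") in surefire console output.
public class TimedOutTestFinder {
    public static List<String> unfinished(List<String> logLines) {
        Set<String> started = new LinkedHashSet<>();
        Set<String> finished = new HashSet<>();
        for (String line : logLines) {
            int r = line.indexOf("Running ");
            if (r >= 0) {
                started.add(line.substring(r + "Running ".length()).trim());
            }
            int f = line.indexOf(" - in ");
            if (f >= 0) {
                finished.add(line.substring(f + " - in ".length()).trim());
            }
        }
        started.removeAll(finished); // whatever never finished remains
        return new ArrayList<>(started);
    }
}
```

In the run quoted above, this would flag TestRandomKeyGenerator, since it is the class that started but never reported a result.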
[jira] [Updated] (HDDS-3092) Duplicate large key test
[ https://issues.apache.org/jira/browse/HDDS-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3092: - Labels: pull-request-available (was: ) > Duplicate large key test > > > Key: HDDS-3092 > URL: https://issues.apache.org/jira/browse/HDDS-3092 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Minor > Labels: pull-request-available > > {{TestDataValidate}} has 2 large key tests: > * {{ratisTestLargeKey}} > * {{standaloneTestLargeKey}} > But both of these test RATIS/3 replication since HDDS-675. I think > {{standaloneTestLargeKey}} can be removed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3108) Remove unused ForkJoinPool in RatisPipelineProvider
[ https://issues.apache.org/jira/browse/HDDS-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3108: - Labels: pull-request-available (was: ) > Remove unused ForkJoinPool in RatisPipelineProvider > --- > > Key: HDDS-3108 > URL: https://issues.apache.org/jira/browse/HDDS-3108 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Affects Versions: 0.6.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Labels: pull-request-available > > The RatisPipelineProvider has a ForkJoinPool that is never used anywhere > except when it is shutdown. > I suspect it was used previously and then some refactoring has made it > redundant and it got left behind. > This jira is to remove that unused code. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3109) Refactor 'Recon' in MiniOzoneCluster to use ephemeral port.
[ https://issues.apache.org/jira/browse/HDDS-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3109: - Labels: pull-request-available (was: ) > Refactor 'Recon' in MiniOzoneCluster to use ephemeral port. > --- > > Key: HDDS-3109 > URL: https://issues.apache.org/jira/browse/HDDS-3109 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Aravindan Vijayan >Assignee: Aravindan Vijayan >Priority: Major > Labels: pull-request-available > Fix For: 0.6.0 > > > Currently, Recon uses an ephemeral port only in the integration test for > Recon. In all other integration tests, we end up using the default (9888) > that causes failures in other integration tests that start up a Mini ozone > cluster. In addition, we want to start up Recon in MiniOzoneCluster by > explicitly requesting it rather than by default. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3085) OM Delta updates request in Recon should work with secure Ozone Manager.
[ https://issues.apache.org/jira/browse/HDDS-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3085: - Labels: pull-request-available (was: ) > OM Delta updates request in Recon should work with secure Ozone Manager. > > > Key: HDDS-3085 > URL: https://issues.apache.org/jira/browse/HDDS-3085 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Recon >Reporter: Aravindan Vijayan >Assignee: Vivek Ratnavel Subramanian >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2911) lastUsed and stateEnterTime value in container info is not human friendly
[ https://issues.apache.org/jira/browse/HDDS-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-2911: - Labels: newbie pull-request-available (was: newbie) > lastUsed and stateEnterTime value in container info is not human friendly > - > > Key: HDDS-2911 > URL: https://issues.apache.org/jira/browse/HDDS-2911 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Sammi Chen >Assignee: mingchao zhao >Priority: Major > Labels: newbie, pull-request-available > > ozone scmcli container list -s=7 > { > "state" : "CLOSED", > "replicationFactor" : "THREE", > "replicationType" : "RATIS", > "usedBytes" : 4794248299, > "numberOfKeys" : 7649, > "lastUsed" : 5388521335, > "stateEnterTime" : 808947405, > "owner" : "a46123a8-be63-4736-9478-ce4d8ac845cc", > "containerID" : 8, > "deleteTransactionId" : 0, > "sequenceId" : 0, > "open" : false > } -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
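A sketch of the kind of rendering the ticket asks for, assuming the field holds epoch milliseconds. Note the sample values above (e.g. 808947405) map to January 1970 under that assumption, which suggests they may actually be monotonic-clock readings; a conversion like this only makes sense once the field stores wall-clock epoch millis:

```java
import java.time.Instant;

// Renders a raw millisecond timestamp as an ISO-8601 UTC string.
public class ContainerTimeFormat {
    public static String humanReadable(long epochMillis) {
        // Instant.toString() produces ISO-8601 in UTC, e.g. 2020-02-26T08:15:58Z
        return Instant.ofEpochMilli(epochMillis).toString();
    }

    public static void main(String[] args) {
        System.out.println(humanReadable(0L)); // epoch start, rendered in UTC
    }
}
```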
[jira] [Updated] (HDDS-3111) Add unit test for container replication behavior under different container placement policy
[ https://issues.apache.org/jira/browse/HDDS-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3111: - Labels: pull-request-available (was: ) > Add unit test for container replication behavior under different container > placement policy > --- > > Key: HDDS-3111 > URL: https://issues.apache.org/jira/browse/HDDS-3111 > Project: Hadoop Distributed Data Store > Issue Type: Test >Reporter: Yiqun Lin >Assignee: Yiqun Lin >Priority: Major > Labels: pull-request-available > > Currently, the unit tests for ReplicationManager only cover container > state changes, and the container placement policy tests only cover the policy > algorithm. > We also lack an integration test of container replication behavior under > the different container placement policies, including corner cases such as > not having enough candidate nodes and fallback in the rack-awareness > policy. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2877) Fix description of return type
[ https://issues.apache.org/jira/browse/HDDS-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-2877: - Labels: newbie pull-request-available (was: newbie) > Fix description of return type > -- > > Key: HDDS-2877 > URL: https://issues.apache.org/jira/browse/HDDS-2877 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: YiSheng Lien >Priority: Minor > Labels: newbie, pull-request-available > > In this > [method|https://github.com/apache/hadoop-ozone/blob/master/hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/OzoneBucket.java#L616#L623], > the return type is *List*. > The description of return type is *List* for now, we should > update it to *List* -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3106) Intermittent timeout in TestOzoneManagerDoubleBufferWithOMResponse#testDoubleBuffer
[ https://issues.apache.org/jira/browse/HDDS-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3106: - Labels: pull-request-available (was: ) > Intermittent timeout in > TestOzoneManagerDoubleBufferWithOMResponse#testDoubleBuffer > --- > > Key: HDDS-3106 > URL: https://issues.apache.org/jira/browse/HDDS-3106 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Critical > Labels: pull-request-available > > {code:title=https://github.com/adoroszlai/hadoop-ozone/runs/474452740} > [ERROR] Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: > 505.227 s <<< FAILURE! - in > org.apache.hadoop.ozone.om.ratis.TestOzoneManagerDoubleBufferWithOMResponse > [ERROR] > testDoubleBuffer(org.apache.hadoop.ozone.om.ratis.TestOzoneManagerDoubleBufferWithOMResponse) > Time elapsed: 500.142 s <<< ERROR! > java.lang.Exception: test timed out after 50 milliseconds > at > org.apache.hadoop.ozone.om.ratis.TestOzoneManagerDoubleBufferWithOMResponse.testDoubleBuffer(TestOzoneManagerDoubleBufferWithOMResponse.java:394) > at > org.apache.hadoop.ozone.om.ratis.TestOzoneManagerDoubleBufferWithOMResponse.testDoubleBuffer(TestOzoneManagerDoubleBufferWithOMResponse.java:130) > {code} > Also in: > https://github.com/apache/hadoop-ozone/pull/590/checks?check_run_id=467388979 > CC [~bharat] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3113) Add new Freon test for putBlock
[ https://issues.apache.org/jira/browse/HDDS-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3113: - Labels: pull-request-available (was: ) > Add new Freon test for putBlock > --- > > Key: HDDS-3113 > URL: https://issues.apache.org/jira/browse/HDDS-3113 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: test >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > Labels: pull-request-available > > The goal of this task is to introduce a new Freon test that issues putBlock > commands. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2941) file create : create key table entries for intermediate directories in the path
[ https://issues.apache.org/jira/browse/HDDS-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-2941: - Labels: pull-request-available (was: ) > file create : create key table entries for intermediate directories in the > path > --- > > Key: HDDS-2941 > URL: https://issues.apache.org/jira/browse/HDDS-2941 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Manager >Reporter: Supratim Deka >Assignee: Supratim Deka >Priority: Major > Labels: pull-request-available > > similar to and a follow-up pf HDDS-2940 > this change covers the file create request handler in the OM. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2980) Delete replayed entry from OpenKeyTable during commit
[ https://issues.apache.org/jira/browse/HDDS-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-2980: - Labels: pull-request-available (was: ) > Delete replayed entry from OpenKeyTable during commit > - > > Key: HDDS-2980 > URL: https://issues.apache.org/jira/browse/HDDS-2980 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru >Priority: Blocker > Labels: pull-request-available > > During KeyCreate (and S3InitiateMultipartUpload), we do not check the > OpenKeyTable to see if the key already exists. If it does exist and the transaction > is replayed, we just overwrite the key in OpenKeyTable. This is done to avoid > extra DB reads. > During KeyCommit (or S3MultipartUploadCommit), if the key was already > committed, then we do not replay the transaction. This would result in the > OpenKeyTable entry remaining in the DB until it is garbage collected. > To avoid storing stale entries in OpenKeyTable, during commit replays we > should check the OpenKeyTable and delete the entry if it exists. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
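The proposed cleanup can be sketched as follows, with in-memory maps standing in for the OM RocksDB tables and hypothetical method names (this is not the actual OMKeyCommitRequest code):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: on a replayed commit, still delete any stale open-key entry
// instead of returning early.
public class CommitReplayCleanup {
    public final Map<String, String> openKeyTable = new HashMap<>();
    public final Map<String, String> keyTable = new HashMap<>();

    public void commitKey(String key, String value, boolean isReplay) {
        if (isReplay) {
            // Replayed transaction: the key was already committed. Remove
            // the leftover open-key entry so it does not linger until GC.
            openKeyTable.remove(key);
            return;
        }
        // Normal commit: move the entry from the open-key table to the key table.
        openKeyTable.remove(key);
        keyTable.put(key, value);
    }

    public static void main(String[] args) {
        CommitReplayCleanup om = new CommitReplayCleanup();
        om.openKeyTable.put("k1", "pending");
        om.commitKey("k1", "v1", true); // replay still cleans up
        System.out.println("stale open-key entries: " + om.openKeyTable.size());
    }
}
```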
[jira] [Updated] (HDDS-3115) NPE seen in datanode log as ApplyTransaction failed
[ https://issues.apache.org/jira/browse/HDDS-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3115: - Labels: pull-request-available (was: ) > NPE seen in datanode log as ApplyTransaction failed > --- > > Key: HDDS-3115 > URL: https://issues.apache.org/jira/browse/HDDS-3115 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Nilotpal Nandi >Assignee: Siddharth Wagle >Priority: Major > Labels: pull-request-available > > Errors seen in datanode log. > > {noformat} > 1:04:32.860 PM ERROR ContainerStateMachine > gid group-00234F8B3578 : ApplyTransaction failed. cmd PutBlock logIndex 56 > msg : ContainerID 16 does not exist Container Result: CONTAINER_NOT_FOUND > 1:04:32.860 PM ERROR XceiverServerRatis > pipeline Action CLOSE on pipeline > PipelineID=b9601efc-f8bf-4b72-8077-00234f8b3578.Reason : Ratis Transaction > failure in datanode 2ba0ecb0-0739-4da9-9541-5fef23479f28 with role FOLLOWER > .Triggering pipeline close action. > 1:04:32.860 PM ERROR ContainerStateMachine > gid group-00234F8B3578 : ApplyTransaction failed. 
cmd WriteChunk logIndex 59 > exception {} > java.lang.NullPointerException > at > org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:226) > at > org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:162) > at > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:396) > at > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:406) > at > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$applyTransaction$6(ContainerStateMachine.java:745) > at > java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:834){noformat} > > {noformat} > 1:04:32.861 PM ERROR ContainerStateMachine > gid group-00234F8B3578 : ApplyTransaction failed. cmd PutBlock logIndex 60 > msg : ContainerID 16 does not exist Container Result: CONTAINER_NOT_FOUND > 1:04:32.861 PM ERROR XceiverServerRatis > pipeline Action CLOSE on pipeline > PipelineID=b9601efc-f8bf-4b72-8077-00234f8b3578.Reason : Ratis Transaction > failure in datanode 2ba0ecb0-0739-4da9-9541-5fef23479f28 with role FOLLOWER > .Triggering pipeline close action. > 1:04:32.904 PM ERROR StateContext > Critical error occurred in StateMachine, setting shutDownMachine > 1:04:34.862 PM ERROR DatanodeStateMachine > DatanodeStateMachine Shutdown due to an critical error{noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3118) Possible deadlock in LockManager
[ https://issues.apache.org/jira/browse/HDDS-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3118: - Labels: pull-request-available (was: ) > Possible deadlock in LockManager > > > Key: HDDS-3118 > URL: https://issues.apache.org/jira/browse/HDDS-3118 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Attila Doroszlai >Assignee: Bharat Viswanadham >Priority: Blocker > Labels: pull-request-available > Attachments: repro.log, repro.patch > > > {{LockManager}} has a possible deadlock. > # Number of locks is limited by using a {{GenericObjectPool}}. If N locks > are already acquired, new requestors need to wait. This wait in > {{getLockForLocking}} happens in a callback executed from > {{ConcurrentHashMap#compute}} while holding a lock on a map entry. > # While releasing a lock, {{decrementActiveLockCount}} implicitly requires a > lock on an entry in {{ConcurrentHashMap}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
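The hazard described above can be sketched with hypothetical names (this is not the real LockManager code): blocking on a bounded pool inside `ConcurrentHashMap.compute()` means waiting while holding the internal lock on that map bin, and if the release path also needs the same map, the permit can never be returned. The usual fix is to take the blocking wait out of the compute callback:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Sketch of the deadlock-prone pattern and a safe reshaping of it.
public class LockPoolHazard {
    public static final ConcurrentHashMap<String, Semaphore> LOCKS =
        new ConcurrentHashMap<>();
    public static final Semaphore POOL = new Semaphore(1); // bounded lock supply

    // UNSAFE shape (do not use): waiting for a permit inside compute()
    // blocks while the map-bin lock is held:
    //   LOCKS.compute(res, (k, v) -> { POOL.acquireUninterruptibly(); ... });

    // Safe shape: acquire the bounded resource first, then touch the map,
    // so no thread ever blocks inside a compute() callback.
    public static Semaphore acquire(String resource) {
        POOL.acquireUninterruptibly();
        return LOCKS.computeIfAbsent(resource, k -> new Semaphore(1));
    }

    public static void release(String resource) {
        LOCKS.remove(resource); // map update no longer races with a blocked compute()
        POOL.release();
    }

    public static void main(String[] args) {
        acquire("volume-lock");
        release("volume-lock");
        System.out.println("permits available: " + POOL.availablePermits());
    }
}
```

The `ConcurrentHashMap` javadoc explicitly warns that remapping functions should be short and must not block, which is exactly the constraint the reported code path violates.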
[jira] [Updated] (HDDS-1008) Invalidate closed container replicas on a failed volume
[ https://issues.apache.org/jira/browse/HDDS-1008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-1008: - Labels: pull-request-available (was: ) > Invalidate closed container replicas on a failed volume > --- > > Key: HDDS-1008 > URL: https://issues.apache.org/jira/browse/HDDS-1008 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Datanode >Reporter: Arpit Agarwal >Assignee: Supratim Deka >Priority: Major > Labels: pull-request-available > > When a volume is detected as failed, all closed containers on the volume > should be marked as invalid. > Open containers will be handled separately. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3110) Fix race condition in Recon's container and pipeline handling.
[ https://issues.apache.org/jira/browse/HDDS-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3110: - Labels: pull-request-available (was: ) > Fix race condition in Recon's container and pipeline handling. > -- > > Key: HDDS-3110 > URL: https://issues.apache.org/jira/browse/HDDS-3110 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Recon >Reporter: Aravindan Vijayan >Assignee: Aravindan Vijayan >Priority: Major > Labels: pull-request-available > Fix For: 0.6.0 > > > Fix the following issues in Recon: > * Both the incremental container report handler and the regular container report handler add new containers from SCM whenever they see a new container. This test-and-add step must be synchronized between the two handlers to avoid any inconsistent metadata state. > * NodeStateMap does not allow addition of a single container to the Map of Node -> Set of Containers, since it instantiates the value with Collections.emptySet() and then relies on map.put() to replace it. Changing this to a "new HashSet" allows containers to be added one by one, which is possible in Recon. > * Improve logging in Recon Container Manager when it receives a container report from a node before receiving the pipeline report for a newly created pipeline. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
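The Collections.emptySet() point can be illustrated in isolation (hypothetical map and names, not the actual NodeStateMap code): a value seeded with an immutable empty set can never have a single container added in place, whereas seeding with a new HashSet permits one-by-one addition.

```java
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;

public class NodeContainerMapDemo {
    public static void main(String[] args) {
        Map<String, Set<Long>> nodeContainers = new ConcurrentHashMap<>();

        // Seeding with Collections.emptySet(): the value is immutable,
        // so a container can never be added to it in place.
        nodeContainers.put("node-1", Collections.emptySet());
        try {
            nodeContainers.get("node-1").add(16L);
        } catch (UnsupportedOperationException e) {
            System.out.println("emptySet is immutable: " + e.getClass().getSimpleName());
        }

        // Seeding with a mutable HashSet: containers can be added one by one.
        nodeContainers.put("node-2", new HashSet<>());
        nodeContainers.get("node-2").add(16L);
        nodeContainers.get("node-2").add(17L);
        System.out.println("node-2 containers: " + nodeContainers.get("node-2").size());
    }
}
```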
[jira] [Updated] (HDDS-3071) Datanodes unable to connect to recon in Secure Environment
[ https://issues.apache.org/jira/browse/HDDS-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3071: - Labels: pull-request-available (was: ) > Datanodes unable to connect to recon in Secure Environment > -- > > Key: HDDS-3071 > URL: https://issues.apache.org/jira/browse/HDDS-3071 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Recon >Affects Versions: 0.6.0 >Reporter: Vivek Ratnavel Subramanian >Assignee: Aravindan Vijayan >Priority: Major > Labels: pull-request-available > Attachments: HDDS-3071.patch > > > Datanodes throw this exception while connecting to recon. > {code:java} > datanode_1 | java.io.IOException: DestHost:destPort recon:9891 , > LocalHost:localPort 6a99ad69685d/192.168.48.4:0. Failed on local exception: > java.io.IOException: Couldn't set up IO streams: > java.lang.IllegalArgumentException: Empty nameString not alloweddatanode_1 | > java.io.IOException: DestHost:destPort recon:9891 , LocalHost:localPort > 6a99ad69685d/192.168.48.4:0. 
Failed on local exception: java.io.IOException: > Couldn't set up IO streams: java.lang.IllegalArgumentException: Empty > nameString not alloweddatanode_1 | at > java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method)datanode_1 | at > java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)datanode_1 > | at > java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)datanode_1 > | at > java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)datanode_1 > | at > org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)datanode_1 > | at > org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:806)datanode_1 | > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1515)datanode_1 | > at org.apache.hadoop.ipc.Client.call(Client.java:1457)datanode_1 | at > org.apache.hadoop.ipc.Client.call(Client.java:1367)datanode_1 | at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)datanode_1 > | at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)datanode_1 > | at com.sun.proxy.$Proxy40.submitRequest(Unknown Source)datanode_1 | at > org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.submitRequest(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:116)datanode_1 > | at > org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.getVersion(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:132)datanode_1 > | at > org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:71)datanode_1 > | at > org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42)datanode_1 > | at > java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)datanode_1 > 
| at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)datanode_1 > | at > java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)datanode_1 > | at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)datanode_1 > | at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)datanode_1 > | at java.base/java.lang.Thread.run(Thread.java:834)datanode_1 | Caused > by: java.io.IOException: Couldn't set up IO streams: > java.lang.IllegalArgumentException: Empty nameString not alloweddatanode_1 | > at > org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:866)datanode_1 > | at > org.apache.hadoop.ipc.Client$Connection.access$3700(Client.java:411)datanode_1 > | at > org.apache.hadoop.ipc.Client.getConnection(Client.java:1572)datanode_1 | at > org.apache.hadoop.ipc.Client.call(Client.java:1403)datanode_1 | ... 14 > moredatanode_1 | Caused by: java.lang.IllegalArgumentException: Empty > nameString not alloweddatanode_1 | at > java.security.jgss/sun.security.krb5.PrincipalName.validateNameStrings(PrincipalName.java:174)datanode_1 > | at > java.security.jgss/sun.security.krb5.PrincipalName.(PrincipalName.java:397)datanode_1 > | at > java.security.jgss/sun.security.krb5.PrincipalName.(PrincipalName.java:471)datanode_1 > | at > java.security.jgss/javax.security.auth.kerberos.KerberosPrincipal.(KerberosPrincipal.java:172)datanode_1 > | at > org.apache.hadoop.security.SaslRpcClient.getServerPrincipal(SaslRpcClient.java:305)datanode_1 > | at > or
[jira] [Updated] (HDDS-3116) Datanode sometimes fails to start with NPE when starting Ratis xceiver server
[ https://issues.apache.org/jira/browse/HDDS-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3116: - Labels: pull-request-available (was: ) > Datanode sometimes fails to start with NPE when starting Ratis xceiver server > - > > Key: HDDS-3116 > URL: https://issues.apache.org/jira/browse/HDDS-3116 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.5.0 >Reporter: Stephen O'Donnell >Assignee: Nanda kumar >Priority: Major > Labels: pull-request-available > Attachments: full_logs.txt > > > While working on a network topology test (HDDS-3084) which does the following: > 1. Start a cluster with 6 DNs and 2 racks. > 2. Create a volume, bucket and a single key. > 3. Stop one rack of hosts using "docker-compose down" > 4. Read the data from the single key > 5. Start the 3 down hosts > 6. Stop the other 3 hosts > 7. Attempt to read the key again. > At step 5 I sometimes see this stack trace in one of the DNs and it fails to fully come up: > {code} > 2020-03-02 13:01:31,887 [Datanode State Machine Thread - 0] INFO ozoneimpl.OzoneContainer: Attempting to start container services. > 2020-03-02 13:01:31,887 [Datanode State Machine Thread - 0] INFO ozoneimpl.OzoneContainer: Background container scanner has been disabled. > 2020-03-02 13:01:31,887 [Datanode State Machine Thread - 0] INFO ratis.XceiverServerRatis: Starting XceiverServerRatis 8c1178dd-c44d-49d1-b899-cc3e40ae8f23 at port 9858 > 2020-03-02 13:01:31,887 [Datanode State Machine Thread - 0] WARN statemachine.EndpointStateMachine: Unable to communicate to SCM server at scm:9861 for past 15000 seconds. 
> java.io.IOException: java.lang.NullPointerException > at org.apache.ratis.util.IOUtils.asIOException(IOUtils.java:54) > at org.apache.ratis.util.IOUtils.toIOException(IOUtils.java:61) > at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:70) > at > org.apache.ratis.server.impl.RaftServerProxy.getImpls(RaftServerProxy.java:284) > at > org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:296) > at > org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.start(XceiverServerRatis.java:418) > at > org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.start(OzoneContainer.java:232) > at > org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:113) > at > org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:834) > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.sendPipelineReport(XceiverServerRatis.java:757) > at > org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.notifyGroupAdd(XceiverServerRatis.java:739) > at > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.initialize(ContainerStateMachine.java:218) > at > org.apache.ratis.server.impl.ServerState.initStatemachine(ServerState.java:160) > at org.apache.ratis.server.impl.ServerState.(ServerState.java:112) > at > 
org.apache.ratis.server.impl.RaftServerImpl.(RaftServerImpl.java:112) > at > org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$2(RaftServerProxy.java:208) > at > java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700) > ... 3 more > {code} > The DN does not recover from this automatically, although I confirmed that a full cluster restart fixed it (docker-compose stop; docker-compose start). I will also try to confirm whether a restart of just the stuck DN would fix it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3119) When ratis is enabled in OM, double Buffer metrics not getting updated
[ https://issues.apache.org/jira/browse/HDDS-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3119: - Labels: pull-request-available (was: ) > When ratis is enabled in OM, double Buffer metrics not getting updated > --- > > Key: HDDS-3119 > URL: https://issues.apache.org/jira/browse/HDDS-3119 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Reporter: Bharat Viswanadham >Assignee: Aravindan Vijayan >Priority: Blocker > Labels: pull-request-available > > DoubleBuffer metrics are not getting updated when ratis is enabled in OM. > There is no issue when ratis is not enabled, double buffer metrics are > updating fine. > {code:java} > {"name": > "Hadoop:service=OzoneManager,name=OzoneManagerDoubleBufferMetrics","modelerType": > "OzoneManagerDoubleBufferMetrics","tag.Hostname": > "hw13865.hitronhub.home","TotalNumOfFlushOperations": > 0,"TotalNumOfFlushedTransactions": > 0,"MaxNumberOfTransactionsFlushedInOneIteration": 0,"FlushTimeNumOps": > 0,"FlushTimeAvgTime": 0,"AvgFlushTransactionsInOneIteration": 0},{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3124) Time interval calculate error
[ https://issues.apache.org/jira/browse/HDDS-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3124: - Labels: pull-request-available (was: ) > Time interval calculate error > -- > > Key: HDDS-3124 > URL: https://issues.apache.org/jira/browse/HDDS-3124 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Affects Versions: 0.5.0 >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Labels: pull-request-available > Attachments: screenshot-1.png > > > As the image shows, the time intervals in the log message "Unable to communicate to SCM server at scm-0.scm:9861 for past " are reported as 0, 3000 seconds, but they should actually be 0, 300 seconds. > !screenshot-1.png! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3129) Skip KeyTable check in OMKeyCommit
[ https://issues.apache.org/jira/browse/HDDS-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3129: - Labels: pull-request-available (was: ) > Skip KeyTable check in OMKeyCommit > -- > > Key: HDDS-3129 > URL: https://issues.apache.org/jira/browse/HDDS-3129 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Labels: pull-request-available > > With the replay logic, we have an additional KeyTable check to detect whether a request is a replay or not. > In the non-HA case, we don't need this check, so this Jira is to skip it in the non-HA case when ratis is not enabled. > > *Ran a simple test to gauge the perf impact:* > > 2295 keys/sec with the additional KeyTable check > 2824 keys/sec with that check removed -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2874) Fix description of return in getNextListOfBuckets
[ https://issues.apache.org/jira/browse/HDDS-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-2874: - Labels: newbie pull-request-available (was: newbie) > Fix description of return in getNextListOfBuckets > - > > Key: HDDS-2874 > URL: https://issues.apache.org/jira/browse/HDDS-2874 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: YiSheng Lien >Priority: Minor > Labels: newbie, pull-request-available > > In this > [line|https://github.com/apache/hadoop-ozone/blob/master/hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/OzoneVolume.java#L319], > the description of return should be *List* -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2780) Fix javadoc of OMVolume response classes
[ https://issues.apache.org/jira/browse/HDDS-2780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-2780: - Labels: newbie pull-request-available (was: newbie) > Fix javadoc of OMVolume response classes > > > Key: HDDS-2780 > URL: https://issues.apache.org/jira/browse/HDDS-2780 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Bharat Viswanadham >Priority: Minor > Labels: newbie, pull-request-available > > Fix javadoc of OMVolumeCreateResponse and OMVolumeDeleteResponse > {code:java} > /** > * Response for CreateBucket request. > */ > public class OMVolumeCreateResponse extends OMClientResponse{code} > This should be "Response for CreateVolume request". > > {code:java} > /** > * Response for CreateVolume request. > */ > public class OMVolumeDeleteResponse extends OMClientResponse{code} > > This should be "Response for DeleteVolume request". -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3123) Create REST API to serve Pipeline information and integrate with UI in Recon.
[ https://issues.apache.org/jira/browse/HDDS-3123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3123: - Labels: pull-request-available (was: ) > Create REST API to serve Pipeline information and integrate with UI in Recon. > - > > Key: HDDS-3123 > URL: https://issues.apache.org/jira/browse/HDDS-3123 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Recon >Affects Versions: 0.5.0 >Reporter: Vivek Ratnavel Subramanian >Assignee: Vivek Ratnavel Subramanian >Priority: Major > Labels: pull-request-available > > We need a REST API to serve Pipeline information in Recon and integrate with > existing Recon UI. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3121) Fix TestSCMPipelineBytesWrittenMetrics
[ https://issues.apache.org/jira/browse/HDDS-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3121: - Labels: pull-request-available (was: ) > Fix TestSCMPipelineBytesWrittenMetrics > -- > > Key: HDDS-3121 > URL: https://issues.apache.org/jira/browse/HDDS-3121 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Labels: pull-request-available > > In the test, we have a Thread.sleep and then check the metric value. It would be better to use GenericTestUtils.waitFor to check the value of the metric. In a few of the runs we have seen this test fail. > {code:java} > Thread.sleep(100 * 1000L); > metrics = getMetrics(SCMPipelineMetrics.class.getSimpleName()); > for (Pipeline pipeline : cluster.getStorageContainerManager() > .getPipelineManager().getPipelines()) { > Assert.assertEquals(bytesWritten, getLongCounter( > SCMPipelineMetrics.getBytesWrittenMetricName(pipeline), metrics)); > }{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
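Hadoop's GenericTestUtils.waitFor polls a condition until it holds or a timeout expires; a minimal stand-in (a plain-Java sketch, not the real Hadoop utility) shows the pattern the report suggests in place of a fixed 100-second sleep:

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

public class WaitForDemo {
    // Minimal poll-until-true helper in the spirit of Hadoop's
    // GenericTestUtils.waitFor(check, checkEveryMillis, waitForMillis).
    static void waitFor(BooleanSupplier check, long intervalMs, long timeoutMs)
            throws InterruptedException, TimeoutException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!check.getAsBoolean()) {
            if (System.currentTimeMillis() > deadline) {
                throw new TimeoutException("condition not met within " + timeoutMs + " ms");
            }
            Thread.sleep(intervalMs);
        }
    }

    public static void main(String[] args) throws Exception {
        long start = System.currentTimeMillis();
        // Stand-in for the metric check: becomes true after roughly 200 ms.
        // The test would poll the metric value here instead of asserting once
        // after a long fixed sleep.
        waitFor(() -> System.currentTimeMillis() - start >= 200, 50, 5_000);
        System.out.println("metric reached expected value");
    }
}
```

Polling returns as soon as the metric catches up, so the test both runs faster on average and tolerates slow flushes up to the timeout.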
[jira] [Updated] (HDDS-3075) Improve query result of container info in scmcli when container doesn't exist
[ https://issues.apache.org/jira/browse/HDDS-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3075: - Labels: pull-request-available (was: ) > Improve query result of container info in scmcli when container doesn't exist > - > > Key: HDDS-3075 > URL: https://issues.apache.org/jira/browse/HDDS-3075 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: YiSheng Lien >Assignee: Siddharth Wagle >Priority: Major > Labels: pull-request-available > Attachments: scmcli container info.png > > > When *ozone scmcli container info* queries a ** that doesn't exist, the query result only shows *#*. > I propose that we should inform the user that the container doesn't exist. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2339) Add OzoneManager to MiniOzoneChaosCluster
[ https://issues.apache.org/jira/browse/HDDS-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-2339: - Labels: pull-request-available (was: ) > Add OzoneManager to MiniOzoneChaosCluster > - > > Key: HDDS-2339 > URL: https://issues.apache.org/jira/browse/HDDS-2339 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: om >Reporter: Mukul Kumar Singh >Assignee: Hanisha Koneru >Priority: Major > Labels: pull-request-available > > Now that the Ozone HA implementation is done, this jira proposes to add OzoneManager to MiniOzoneChaosCluster. This will help in discovering bugs in Ozone Manager HA. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3131) TestMiniChaosOzoneCluster timeout
[ https://issues.apache.org/jira/browse/HDDS-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3131: - Labels: pull-request-available (was: ) > TestMiniChaosOzoneCluster timeout > - > > Key: HDDS-3131 > URL: https://issues.apache.org/jira/browse/HDDS-3131 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Attila Doroszlai >Priority: Critical > Labels: pull-request-available > Attachments: unit (1).zip, unit (2).zip > > > TestMiniChaosOzoneCluster times out in CI runs rather frequently: > https://github.com/apache/hadoop-ozone/runs/486890736 > https://github.com/apache/hadoop-ozone/runs/486890004 > https://github.com/apache/hadoop-ozone/runs/486836962 > Logs are available in "unit" artifacts. > CC [~msingh] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3130) Add jaeger trace span in s3gateway
[ https://issues.apache.org/jira/browse/HDDS-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3130: - Labels: pull-request-available (was: ) > Add jaeger trace span in s3gateway > -- > > Key: HDDS-3130 > URL: https://issues.apache.org/jira/browse/HDDS-3130 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: runzhiwang >Assignee: runzhiwang >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3132) NPE when create RPC client
[ https://issues.apache.org/jira/browse/HDDS-3132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3132: - Labels: pull-request-available (was: ) > NPE when create RPC client > --- > > Key: HDDS-3132 > URL: https://issues.apache.org/jira/browse/HDDS-3132 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Sammi Chen >Assignee: Sammi Chen >Priority: Major > Labels: pull-request-available > > java.io.IOException: Couldn't create RpcClient protocol > at > org.apache.hadoop.ozone.client.OzoneClientFactory.getClientProtocol(OzoneClientFactory.java:197) > at > org.apache.hadoop.ozone.client.OzoneClientFactory.getClientProtocol(OzoneClientFactory.java:173) > at > org.apache.hadoop.ozone.client.OzoneClientFactory.getClient(OzoneClientFactory.java:74) > at > org.preta.tools.ozone.benchmark.om.OmReadBenchmark.getInputKeyNames(OmReadBenchmark.java:101) > at > org.preta.tools.ozone.benchmark.om.OmReadBenchmark.execute(OmReadBenchmark.java:76) > at > org.preta.tools.ozone.benchmark.om.AbstractOmBenchmark.run(AbstractOmBenchmark.java:63) > at picocli.CommandLine.executeUserObject(CommandLine.java:1729) > at picocli.CommandLine.access$900(CommandLine.java:145) > at picocli.CommandLine$RunLast.handle(CommandLine.java:2101) > at picocli.CommandLine$RunLast.handle(CommandLine.java:2068) > at > picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:1935) > at picocli.CommandLine.execute(CommandLine.java:1864) > at org.preta.tools.ozone.Main.execute(Main.java:50) > at org.preta.tools.ozone.Main.main(Main.java:54) > Caused by: java.lang.NullPointerException: Name is null > at java.lang.Enum.valueOf(Enum.java:236) > at > org.apache.hadoop.ozone.security.acl.IAccessAuthorizer$ACLType.valueOf(IAccessAuthorizer.java:48) > at > org.apache.hadoop.ozone.security.acl.OzoneAclConfig.getUserDefaultRights(OzoneAclConfig.java:52) > at > org.apache.hadoop.ozone.client.rpc.RpcClient.(RpcClient.java:148) > at > 
org.apache.hadoop.ozone.client.OzoneClientFactory.getClientProtocol(OzoneClientFactory.java:190 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3117) Recon throws InterruptedException while getting new snapshot from OM
[ https://issues.apache.org/jira/browse/HDDS-3117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3117: - Labels: pull-request-available (was: ) > Recon throws InterruptedException while getting new snapshot from OM > > > Key: HDDS-3117 > URL: https://issues.apache.org/jira/browse/HDDS-3117 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Recon >Affects Versions: 0.5.0 >Reporter: Vivek Ratnavel Subramanian >Assignee: Aravindan Vijayan >Priority: Major > Labels: pull-request-available > > Recon throws the following exception in a cluster with 21 datanodes and 1 million keys: > {code:java} > 12:46:23.482 PM INFO OzoneManagerServiceProviderImpl Obtaining full snapshot from Ozone Manager > 12:47:08.072 PM INFO OzoneManagerServiceProviderImpl Got new checkpoint from OM : /var/lib/hadoop-ozone/recon/om/data/om.snapshot.db_1583181983482 > 12:47:08.072 PM INFO ReconOmMetadataManagerImpl Cleaning up old OM snapshot db at /var/lib/hadoop-ozone/recon/om/data/om.snapshot.db_1583174381836. > 12:47:08.166 PM INFO ReconOmMetadataManagerImpl Created OM DB handle from snapshot at /var/lib/hadoop-ozone/recon/om/data/om.snapshot.db_1583181983482. > 12:47:08.276 PM INFO OzoneManagerServiceProviderImpl Calling reprocess on Recon tasks. 
> 12:47:08.276 PM ERROR OzoneManagerServiceProviderImpl Unable to update Recon's OM DB with new snapshot > java.lang.InterruptedException > at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1302) > at java.util.concurrent.Semaphore.acquire(Semaphore.java:312) > at org.apache.hadoop.ozone.recon.tasks.ReconTaskControllerImpl.reInitializeTasks(ReconTaskControllerImpl.java:175) > at org.apache.hadoop.ozone.recon.spi.impl.OzoneManagerServiceProviderImpl.syncDataFromOM(OzoneManagerServiceProviderImpl.java:375) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
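The stack trace shows Semaphore.acquire being interrupted, presumably when the scheduled sync task is cancelled or its executor shuts down; the mechanism can be reproduced in isolation (a simplified demo, not the actual ReconTaskControllerImpl code):

```java
import java.util.concurrent.Semaphore;

public class InterruptDemo {
    public static void main(String[] args) throws Exception {
        Semaphore sem = new Semaphore(0);  // no permits: acquire() blocks
        Thread worker = new Thread(() -> {
            try {
                sem.acquire();  // parks until a permit arrives or an interrupt
                System.out.println("acquired");
            } catch (InterruptedException e) {
                System.out.println("acquire interrupted");
            }
        });
        worker.start();
        Thread.sleep(200);   // let the worker park inside acquire()
        worker.interrupt();  // e.g. what shutdownNow() does to running tasks
        worker.join();
    }
}
```

The fix for such cases is usually to treat the InterruptedException as an orderly-shutdown signal (restore the interrupt flag and return) rather than logging it as an error.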
[jira] [Updated] (HDDS-3120) Freon work with OM HA
[ https://issues.apache.org/jira/browse/HDDS-3120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3120: - Labels: pull-request-available (was: ) > Freon work with OM HA > - > > Key: HDDS-3120 > URL: https://issues.apache.org/jira/browse/HDDS-3120 > Project: Hadoop Distributed Data Store > Issue Type: New Feature >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Labels: pull-request-available > > Make Freon commands work with OM HA -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3089) TestSCMNodeManager intermittent crash
[ https://issues.apache.org/jira/browse/HDDS-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3089: - Labels: pull-request-available (was: ) > TestSCMNodeManager intermittent crash > - > > Key: HDDS-3089 > URL: https://issues.apache.org/jira/browse/HDDS-3089 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > Labels: pull-request-available > Attachments: hs_err_pid9082.log, > org.apache.hadoop.hdds.scm.node.TestSCMNodeManager-output.txt > > > TestSCMNodeManager crashed in one of the runs, although it passes usually: > {code:title=https://github.com/apache/hadoop-ozone/pull/601/checks?check_run_id=471611827} > [ERROR] Crashed tests: > [ERROR] org.apache.hadoop.hdds.scm.node.TestSCMNodeManager > {code} > {code:title=hs_err_pid9082.log} > siginfo: si_signo: 11 (SIGSEGV), si_code: 2 (SEGV_ACCERR), si_addr: > 0x7f378cf6f340 > Stack: [0x7f37626fb000,0x7f37627fc000], sp=0x7f37627f9e48, free > space=1019k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native > code) > C 0x7f378cf6f340 > C [librocksdbjni3775377216204452319.so+0x2a05dd] > rocksdb::DB::Delete(rocksdb::WriteOptions const&, > rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&)+0x4d > C [librocksdbjni3775377216204452319.so+0x2a0641] > rocksdb::DBImpl::Delete(rocksdb::WriteOptions const&, > rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&)+0x11 > C [librocksdbjni3775377216204452319.so+0x1a931a] > rocksdb::DB::Delete(rocksdb::WriteOptions const&, rocksdb::Slice const&)+0xba > C [librocksdbjni3775377216204452319.so+0x19f3e0] > rocksdb_delete_helper(JNIEnv_*, rocksdb::DB*, rocksdb::WriteOptions const&, > rocksdb::ColumnFamilyHandle*, _jbyteArray*, int, int)+0x130 > C [librocksdbjni3775377216204452319.so+0x19f4a1] > Java_org_rocksdb_RocksDB_delete__J_3BII+0x41 > j org.rocksdb.RocksDB.delete(J[BII)V+0 > j org.rocksdb.RocksDB.delete([B)V+13 > 
j org.apache.hadoop.hdds.utils.RocksDBStore.delete([B)V+9 > j > org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.removePipeline(Lorg/apache/hadoop/hdds/scm/pipeline/PipelineID;)V+35 > j > org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.destroyPipeline(Lorg/apache/hadoop/hdds/scm/pipeline/Pipeline;)V+27 > ... > j > org.apache.hadoop.hdds.scm.node.DeadNodeHandler.destroyPipelines(Lorg/apache/hadoop/hdds/protocol/DatanodeDetails;)V+28 > j > org.apache.hadoop.hdds.scm.node.DeadNodeHandler.onMessage(Lorg/apache/hadoop/hdds/protocol/DatanodeDetails;Lorg/apache/hadoop/hdds/server/events/EventPublisher;)V+6 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3140) Remove hard-coded SNAPSHOT version from GitHub workflows
[ https://issues.apache.org/jira/browse/HDDS-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3140: - Labels: pull-request-available (was: ) > Remove hard-coded SNAPSHOT version from GitHub workflows > > > Key: HDDS-3140 > URL: https://issues.apache.org/jira/browse/HDDS-3140 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: build >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > Labels: pull-request-available > > Ozone's GitHub Actions workflows only work with SNAPSHOT versions due to > hard-coded {{ozone-*-SNAPSHOT}} in target path. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3141) Unit check fails to execute insight and mini-chaos-tests modules
[ https://issues.apache.org/jira/browse/HDDS-3141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3141: - Labels: pull-request-available (was: ) > Unit check fails to execute insight and mini-chaos-tests modules > > > Key: HDDS-3141 > URL: https://issues.apache.org/jira/browse/HDDS-3141 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: build, test >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > Labels: pull-request-available > > {code:title=https://github.com/apache/hadoop-ozone/runs/490978126?check_suite_focus=true} > 2020-03-06T19:13:08.6122969Z [ERROR] Failed to execute goal on project > hadoop-ozone-insight: Could not resolve dependencies for project > org.apache.hadoop:hadoop-ozone-insight:jar:0.5.0-beta: Could not find > artifact org.apache.hadoop:hadoop-ozone-integration-test:jar:tests:0.5.0-beta > in apache.snapshots.https > (https://repository.apache.org/content/repositories/snapshots) -> [Help 1] > 2020-03-06T19:13:08.6180318Z [ERROR] Failed to execute goal on project > mini-chaos-tests: Could not resolve dependencies for project > org.apache.hadoop:mini-chaos-tests:jar:0.5.0-beta: Failure to find > org.apache.hadoop:hadoop-ozone-integration-test:jar:tests:0.5.0-beta in > https://repository.apache.org/content/repositories/snapshots was cached in > the local repository, resolution will not be reattempted until the update > interval of apache.snapshots.https has elapsed or updates are forced -> [Help > 1] > {code} > Unit check skips {{integration-test}}, but these 2 modules depend on it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3143) Rename silently ignored tests
[ https://issues.apache.org/jira/browse/HDDS-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3143: - Labels: pull-request-available (was: ) > Rename silently ignored tests > - > > Key: HDDS-3143 > URL: https://issues.apache.org/jira/browse/HDDS-3143 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Minor > Labels: pull-request-available > > Surefire plugin is configured to run {{Test*}} classes, but there are two > test classes named {{*Test}}: > {code} > $ find */*/src/test/java -name '*Test.java' | xargs grep -l '@Test' > hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/HddsServerUtilTest.java > hadoop-ozone/insight/src/test/java/org/apache/hadoop/ozone/insight/LogSubcommandTest.java > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
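An alternative to renaming the classes would be widening Surefire's include patterns. A hypothetical pom fragment (the actual Ozone surefire configuration is not shown in the ticket) that picks up both naming conventions:

```xml
<!-- Hypothetical surefire configuration: run both Test* and *Test
     classes so neither naming convention is silently skipped. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <includes>
      <include>**/Test*.java</include>
      <include>**/*Test.java</include>
    </includes>
  </configuration>
</plugin>
```

Renaming, as the ticket proposes, keeps the configuration narrow and makes the convention explicit; widening the includes risks picking up unintended classes.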
[jira] [Updated] (HDDS-3150) Implement getIfExist in Table and use it in CreateKey/File
[ https://issues.apache.org/jira/browse/HDDS-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3150: - Labels: pull-request-available (was: ) > Implement getIfExist in Table and use it in CreateKey/File > -- > > Key: HDDS-3150 > URL: https://issues.apache.org/jira/browse/HDDS-3150 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Labels: pull-request-available > > With replay, we now use the get() API directly. > Previously the code was: > OMKeyRequest.java > > {code:java} > else if (omMetadataManager.getKeyTable().isExist(dbKeyName)) { > // TODO: Need to be fixed, as when key already exists, we are > // appending new blocks to existing key. > keyInfo = omMetadataManager.getKeyTable().get(dbKeyName);{code} > > Now for every create key/file we use the get() API; this was changed for replay: > {code:java} > OmKeyInfo dbKeyInfo = > omMetadataManager.getKeyTable().get(dbKeyName); > if (dbKeyInfo != null) {{code} > The proposal is to replace get with getIfExist, making use of keyMayExist. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
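A minimal sketch of the proposed getIfExist semantics (hypothetical names; not Ozone's actual Table API). A HashMap stands in for the RocksDB-backed key table, and an exact membership check stands in for RocksDB's keyMayExist, which is allowed false positives but never false negatives:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Sketch of the proposed getIfExist: consult a cheap membership check
// (standing in for RocksDB's keyMayExist) before paying for a full get.
public class GetIfExistSketch {
    private static final Map<String, String> KEY_TABLE = new HashMap<>();

    static {
        KEY_TABLE.put("/vol/bucket/key1", "keyInfo1");
    }

    // keyMayExist may report false positives in RocksDB; here it is exact
    // for simplicity, since a HashMap has no bloom filter.
    static boolean keyMayExist(String dbKeyName) {
        return KEY_TABLE.containsKey(dbKeyName);
    }

    // Combined lookup: skip the expensive get when the key definitely
    // does not exist -- the common case for create key/file.
    static Optional<String> getIfExist(String dbKeyName) {
        if (!keyMayExist(dbKeyName)) {
            return Optional.empty();
        }
        return Optional.ofNullable(KEY_TABLE.get(dbKeyName));
    }

    public static void main(String[] args) {
        System.out.println(getIfExist("/vol/bucket/key1").orElse("absent"));
        System.out.println(getIfExist("/vol/bucket/other").orElse("absent"));
    }
}
```

The benefit in the real implementation comes from keyMayExist answering from the in-memory bloom filter, avoiding a disk read for keys that are absent, which is the expected case on create.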
[jira] [Updated] (HDDS-3100) Fix TestDeadNodeHandler.
[ https://issues.apache.org/jira/browse/HDDS-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3100: - Labels: pull-request-available (was: ) > Fix TestDeadNodeHandler. > > > Key: HDDS-3100 > URL: https://issues.apache.org/jira/browse/HDDS-3100 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Affects Versions: 0.5.0 >Reporter: Aravindan Vijayan >Assignee: Aravindan Vijayan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3142) Create isolated environment for OM to test it without SCM
[ https://issues.apache.org/jira/browse/HDDS-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3142: - Labels: pull-request-available (was: ) > Create isolated environment for OM to test it without SCM > - > > Key: HDDS-3142 > URL: https://issues.apache.org/jira/browse/HDDS-3142 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Marton Elek >Assignee: Marton Elek >Priority: Major > Labels: pull-request-available > > The OmKeyGenerator class from Freon can generate keys (open key + commit key). > But this test exercises both OM and SCM performance. It would be useful to > have a method to test only the OM performance by faking the response from > SCM. > This can be done easily with the same approach that we have in HDDS-3023: a simple > utility class can be implemented, and with byteman we can replace the client > calls with the fake method. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3152) Reduce number of chunkwriter threads in integration tests
[ https://issues.apache.org/jira/browse/HDDS-3152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3152: - Labels: pull-request-available (was: ) > Reduce number of chunkwriter threads in integration tests > - > > Key: HDDS-3152 > URL: https://issues.apache.org/jira/browse/HDDS-3152 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: test >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Minor > Labels: pull-request-available > > Integration tests run multiple datanodes in the same JVM. Each datanode > comes with 60 chunk writer threads by default (may be decreased in > HDDS-3053). This makes thread dumps (eg. produced by > {{GenericTestUtils.waitFor}} on timeout) really hard to navigate, as there > may be 300+ such threads. > Since integration tests are generally run with a single disk which is even > shared among the datanodes, a few threads per datanode should be enough. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3095) Intermittent failure in TestFailureHandlingByClient#testDatanodeExclusionWithMajorityCommit
[ https://issues.apache.org/jira/browse/HDDS-3095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3095: - Labels: pull-request-available (was: ) > Intermittent failure in > TestFailureHandlingByClient#testDatanodeExclusionWithMajorityCommit > --- > > Key: HDDS-3095 > URL: https://issues.apache.org/jira/browse/HDDS-3095 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > Labels: pull-request-available > > {code:title=https://github.com/apache/hadoop-ozone/pull/614/checks?check_run_id=472938597} > [ERROR] Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: > 284.887 s <<< FAILURE! - in > org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient > [ERROR] > testDatanodeExclusionWithMajorityCommit(org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient) > Time elapsed: 66.589 s <<< FAILURE! > java.lang.AssertionError > ... >at > org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient.testDatanodeExclusionWithMajorityCommit(TestFailureHandlingByClient.java:336) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2989) Intermittent timeout in TestBlockManager
[ https://issues.apache.org/jira/browse/HDDS-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-2989: - Labels: pull-request-available (was: ) > Intermittent timeout in TestBlockManager > > > Key: HDDS-2989 > URL: https://issues.apache.org/jira/browse/HDDS-2989 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Major > Labels: pull-request-available > > {code:title=https://github.com/apache/hadoop-ozone/runs/430663688} > 2020-02-06T21:44:53.5319531Z [ERROR] Tests run: 9, Failures: 0, Errors: 1, > Skipped: 0, Time elapsed: 5.344 s <<< FAILURE! - in > org.apache.hadoop.hdds.scm.block.TestBlockManager > 2020-02-06T21:44:53.5319796Z [ERROR] > testMultipleBlockAllocation(org.apache.hadoop.hdds.scm.block.TestBlockManager) > Time elapsed: 1.167 s <<< ERROR! > 2020-02-06T21:44:53.5319942Z java.util.concurrent.TimeoutException: > 2020-02-06T21:44:53.5320496Z Timed out waiting for condition. 
Thread > diagnostics: > 2020-02-06T21:44:53.5320839Z Timestamp: 2020-02-06 09:44:52,261 > 2020-02-06T21:44:53.5320901Z > 2020-02-06T21:44:53.5321178Z "Thread-26" prio=5 tid=46 runnable > 2020-02-06T21:44:53.5321292Z java.lang.Thread.State: RUNNABLE > 2020-02-06T21:44:53.5321391Z at java.lang.Thread.dumpThreads(Native > Method) > 2020-02-06T21:44:53.5326891Z at > java.lang.Thread.getAllStackTraces(Thread.java:1610) > 2020-02-06T21:44:53.5327144Z at > org.apache.hadoop.test.TimedOutTestsListener.buildThreadDump(TimedOutTestsListener.java:87) > 2020-02-06T21:44:53.5327309Z at > org.apache.hadoop.test.TimedOutTestsListener.buildThreadDiagnosticString(TimedOutTestsListener.java:73) > 2020-02-06T21:44:53.5327465Z at > org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:389) > 2020-02-06T21:44:53.5327618Z at > org.apache.hadoop.hdds.scm.block.TestBlockManager.testMultipleBlockAllocation(TestBlockManager.java:280) > 2020-02-06T21:44:53.5388042Z at > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 2020-02-06T21:44:53.5388702Z at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > 2020-02-06T21:44:53.5388905Z at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 2020-02-06T21:44:53.5389045Z at > java.lang.reflect.Method.invoke(Method.java:498) > 2020-02-06T21:44:53.5389195Z at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > 2020-02-06T21:44:53.5389331Z at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > 2020-02-06T21:44:53.5389662Z at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > 2020-02-06T21:44:53.5389776Z at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > 2020-02-06T21:44:53.5389916Z at > org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) > 2020-02-06T21:44:53.5390040Z 
"Signal Dispatcher" daemon prio=9 tid=4 runnable > 2020-02-06T21:44:53.5390156Z java.lang.Thread.State: RUNNABLE > 2020-02-06T21:44:53.5390783Z > "EventQueue-CloseContainerForCloseContainerEventHandler" prio=5 tid=32 in > Object.wait() > 2020-02-06T21:44:53.5390916Z java.lang.Thread.State: WAITING (on object > monitor) > 2020-02-06T21:44:53.5391019Z at sun.misc.Unsafe.park(Native Method) > 2020-02-06T21:44:53.5391149Z at > java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > 2020-02-06T21:44:53.5391299Z at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) > 2020-02-06T21:44:53.5391448Z at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) > 2020-02-06T21:44:53.5391587Z at > java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074) > 2020-02-06T21:44:53.5391721Z at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134) > 2020-02-06T21:44:53.5391844Z at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > 2020-02-06T21:44:53.5391971Z at java.lang.Thread.run(Thread.java:748) > 2020-02-06T21:44:53.5392100Z "IPC Server idle connection scanner for port > 43801" daemon prio=5 tid=24 in Object.wait() > 2020-02-06T21:44:53.5392227Z java.lang.Thread.State: WAITING (on object > monitor) > 2020-02-06T21:44:53.5392347Z at java.lang.Object.wait(Native Method) > 2020-02-06T21:44:53.5392463Z at java.lang.Object.wait(Object.java:502) > 2
[jira] [Updated] (HDDS-3104) Integration test crashes due to critical error in datanode
[ https://issues.apache.org/jira/browse/HDDS-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3104: - Labels: pull-request-available (was: ) > Integration test crashes due to critical error in datanode > -- > > Key: HDDS-3104 > URL: https://issues.apache.org/jira/browse/HDDS-3104 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Attila Doroszlai >Assignee: Siddharth Wagle >Priority: Major > Labels: pull-request-available > Attachments: HDDS-3104.patch, > org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailureOnRead-output.txt > > > {code:title=test log} > 2020-02-28 07:36:17,759 [Datanode State Machine Thread - 0] ERROR > statemachine.StateContext (StateContext.java:execute(420)) - Critical error > occurred in StateMachine, setting shutDownMachine > ... > 2020-02-28 07:36:21,216 [Datanode State Machine Thread - 0] INFO > util.ExitUtil (ExitUtil.java:terminate(210)) - Exiting with status 1: > ExitException > {code} > {code:title=build output} > [ERROR] ExecutionException The forked VM terminated without properly saying > goodbye. VM crash or System.exit called? > {code} > https://github.com/adoroszlai/hadoop-ozone/runs/474218807 > https://github.com/adoroszlai/hadoop-ozone/suites/487650271/artifacts/2327174 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3148) Logs cluttered by AlreadyExistsException from Ratis
[ https://issues.apache.org/jira/browse/HDDS-3148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3148: - Labels: pull-request-available (was: ) > Logs cluttered by AlreadyExistsException from Ratis > --- > > Key: HDDS-3148 > URL: https://issues.apache.org/jira/browse/HDDS-3148 > Project: Hadoop Distributed Data Store > Issue Type: Wish > Components: Ozone Datanode >Reporter: Attila Doroszlai >Assignee: Siddharth Wagle >Priority: Major > Labels: pull-request-available > > Ozone startup logs are cluttered by printing stack trace of > AlreadyExistsException related to group addition. Example: > {code} > 2020-03-09 13:53:01,563 [grpc-default-executor-0] WARN impl.RaftServerProxy > (RaftServerProxy.java:lambda$groupAddAsync$11(390)) - > 7a07f161-9144-44b2-8baa-73f0e9299675: Failed groupAdd* > GroupManagementRequest:client-27FB1A91809E->7a07f161-9144-44b2-8baa-73f0e9299675@group-E151028E3AC0, > cid=2, seq=0, RW, null, > Add:group-E151028E3AC0:[18f4e257-bf09-482e-b1bb-a2408a093ff7:172.17.0.2:43845, > 7a07f161-9144-44b2-8baa-73f0e9299675:172.17.0.2:41551, > 8a66c80e-ab55-4975-92a9-8aaf06ab418a:172.17.0.2:36921] > java.util.concurrent.CompletionException: > org.apache.ratis.protocol.AlreadyExistsException: > 7a07f161-9144-44b2-8baa-73f0e9299675: Failed to add > group-E151028E3AC0:[18f4e257-bf09-482e-b1bb-a2408a093ff7:172.17.0.2:43845, > 7a07f161-9144-44b2-8baa-73f0e9299675:172.17.0.2:41551, > 8a66c80e-ab55-4975-92a9-8aaf06ab418a:172.17.0.2:36921] since the group > already exists in the map. 
> at > java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) > at > java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) > at > java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607) > at > java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591) > at > java.util.concurrent.CompletableFuture.uniApplyStage(CompletableFuture.java:631) > at > java.util.concurrent.CompletableFuture.thenApplyAsync(CompletableFuture.java:2006) > at > org.apache.ratis.server.impl.RaftServerProxy.groupAddAsync(RaftServerProxy.java:379) > at > org.apache.ratis.server.impl.RaftServerProxy.groupManagementAsync(RaftServerProxy.java:363) > at > org.apache.ratis.grpc.server.GrpcAdminProtocolService.lambda$groupManagement$0(GrpcAdminProtocolService.java:42) > at org.apache.ratis.grpc.GrpcUtil.asyncCall(GrpcUtil.java:160) > at > org.apache.ratis.grpc.server.GrpcAdminProtocolService.groupManagement(GrpcAdminProtocolService.java:42) > at > org.apache.ratis.proto.grpc.AdminProtocolServiceGrpc$MethodHandlers.invoke(AdminProtocolServiceGrpc.java:358) > at > org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:172) > at > org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:331) > at > org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:814) > at > org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) > at > org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: 
org.apache.ratis.protocol.AlreadyExistsException: > 7a07f161-9144-44b2-8baa-73f0e9299675: Failed to add > group-E151028E3AC0:[18f4e257-bf09-482e-b1bb-a2408a093ff7:172.17.0.2:43845, > 7a07f161-9144-44b2-8baa-73f0e9299675:172.17.0.2:41551, > 8a66c80e-ab55-4975-92a9-8aaf06ab418a:172.17.0.2:36921] since the group > already exists in the map. > at > org.apache.ratis.server.impl.RaftServerProxy$ImplMap.addNew(RaftServerProxy.java:83) > at > org.apache.ratis.server.impl.RaftServerProxy.groupAddAsync(RaftServerProxy.java:378) > ... 13 more > {code} > Since these are "normal", I think stack trace should be suppressed. > CC [~nanda] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
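With slf4j, passing the Throwable as the last argument (LOG.warn("Failed groupAdd", ex)) prints the full stack trace, while formatting only ex.toString() into the message keeps the one-line summary. Since slf4j is not on the classpath here, a plain-JDK sketch (names hypothetical) of the two variants:

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Sketch of the proposed suppression. withStackTrace mirrors what the
// logger emits today; summaryOnly mirrors the proposed one-line message
// for exceptions that are expected in normal operation.
public class SuppressStackTraceSketch {
    // Full stack trace, as currently logged by passing the Throwable.
    static String withStackTrace(Exception ex) {
        StringWriter sw = new StringWriter();
        ex.printStackTrace(new PrintWriter(sw));
        return "Failed groupAdd\n" + sw;
    }

    // One-line summary, as proposed for the "normal" AlreadyExistsException.
    static String summaryOnly(Exception ex) {
        return "Failed groupAdd: " + ex;
    }

    public static void main(String[] args) {
        Exception ex =
            new IllegalStateException("group already exists in the map");
        System.out.println(summaryOnly(ex));
    }
}
```

The exception class and message above are still visible in the summary, so grepping the logs for the failure remains possible without the twenty-line trace.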
[jira] [Updated] (HDDS-3157) Fix docker startup command in README.md
[ https://issues.apache.org/jira/browse/HDDS-3157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3157: - Labels: pull-request-available (was: ) > Fix docker startup command in README.md > > > Key: HDDS-3157 > URL: https://issues.apache.org/jira/browse/HDDS-3157 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Neo Yang >Assignee: Neo Yang >Priority: Minor > Labels: pull-request-available > > There are some errors in the docker-compose startup commands in [this > block|https://github.com/apache/hadoop-ozone#build-from-source]; they should be: > cd hadoop-ozone/dist/target/ozone-*/compose{color:#FF}*/ozone*{color} > docker-compose up -d {color:#FF}*--*{color}scale datanode=3 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3156) update allocateContainer to remove additional createPipeline step.
[ https://issues.apache.org/jira/browse/HDDS-3156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3156: - Labels: pull-request-available (was: ) > update allocateContainer to remove additional createPipeline step. > -- > > Key: HDDS-3156 > URL: https://issues.apache.org/jira/browse/HDDS-3156 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: SCM >Reporter: Li Cheng >Assignee: Li Cheng >Priority: Major > Labels: pull-request-available > > allocateContainer currently also tries to allocate pipelines. With multi-raft, it > should not need to worry about whether pipelines are available. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3159) Bump RocksDB version to the latest one
[ https://issues.apache.org/jira/browse/HDDS-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3159: - Labels: pull-request-available (was: ) > Bump RocksDB version to the latest one > -- > > Key: HDDS-3159 > URL: https://issues.apache.org/jira/browse/HDDS-3159 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Marton Elek >Assignee: Marton Elek >Priority: Minor > Labels: pull-request-available > > 6.0.1 -- our current RocksDB version -- was released one year ago. Since > then, many new versions have been released with important bug fixes. > I propose updating to the latest one... -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3160) Disable index and filter block cache for RocksDB
[ https://issues.apache.org/jira/browse/HDDS-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-3160: - Labels: pull-request-available (was: ) > Disable index and filter block cache for RocksDB > > > Key: HDDS-3160 > URL: https://issues.apache.org/jira/browse/HDDS-3160 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Marton Elek >Assignee: Marton Elek >Priority: Minor > Labels: pull-request-available > Attachments: key_allocation_after.png, key_allocation_before.png, > profile.png > > > During performance tests it was noticed that OM performance drops after > 10-20 million keys (see the screenshot). > By default cache_index_and_filter_blocks is enabled for all of our RocksDB > instances (see DBProfile), which is not the best option. (For example see this > thread: https://github.com/facebook/rocksdb/issues/3961#) > With this cache turned on, the indexes and bloom filters are cached **inside > the block cache**, which slows the cache down when we have significant data. > Without turning it on (based on my understanding) all the indexes will remain > open without any cache. With our current settings we have only a small number > of sst files (even with millions of keys), therefore it seems safe to > turn this option off. > With this option turned off I was able to write >100M keys with high > throughput. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
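A sketch of the proposed setting using the RocksJava API (the actual change would live in Ozone's DBProfile; the block cache size below is an illustrative value, not Ozone's configuration):

```java
// Keep index/filter blocks out of the block cache so they stay resident
// instead of competing with data blocks for cache space.
BlockBasedTableConfig tableConfig = new BlockBasedTableConfig()
    .setCacheIndexAndFilterBlocks(false)   // the proposed change
    .setBlockCacheSize(256L * 1024 * 1024); // illustrative size only
ColumnFamilyOptions cfOptions = new ColumnFamilyOptions()
    .setTableFormatConfig(tableConfig);
```

With the option off, every open sst file keeps its index and filter blocks in memory outside the cache; as the description notes, this is safe here because the current settings produce only a small number of sst files.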
[jira] [Updated] (HDDS-2848) Recon changes to make snapshots work with OM HA
[ https://issues.apache.org/jira/browse/HDDS-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-2848: - Labels: pull-request-available (was: ) > Recon changes to make snapshots work with OM HA > --- > > Key: HDDS-2848 > URL: https://issues.apache.org/jira/browse/HDDS-2848 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Recon >Reporter: Aravindan Vijayan >Assignee: Siddharth Wagle >Priority: Major > Labels: pull-request-available > > Recon talks to OM in 2 ways - Through HTTP to get DB snapshot, and through > RPC to get delta updates. > Since Recon already uses the OzoneManagerClientProtocol to query the > OzoneManager RPC, the RPC client automatically routes the request to the > leader on an OM HA cluster. Recon only needs the updates from the OM RocksDB > store, and does not need the in flight updates in the OM DoubleBuffer. Due to > the guarantee from Ratis that the leader’s RocksDB will always be up to date, > Recon does not need to worry about going back in time when a current OM > leader goes down. We have to pass in the om service ID to the Ozone Manager > client in Recon, and the failover works internally. Currently we pass in > 'null'. > To make the HTTP call to work against OM HA, Recon has to find out the > current OM leader and download the snapshot from that OM instance. We can use > the way this has been implemented in > org.apache.hadoop.ozone.admin.om.GetServiceRolesSubcommand. We can get the > roles of OM instances and then determine the leader from that. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org