[jira] [Updated] (HDDS-3027) Ozone: Ensure usage of parameterized slf4j log syntax for ozone

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3027:
-
Labels: newbie pull-request-available  (was: newbie)

> Ozone: Ensure usage of parameterized slf4j log syntax for ozone
> ---
>
> Key: HDDS-3027
> URL: https://issues.apache.org/jira/browse/HDDS-3027
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Xiaoyu Yao
>Priority: Trivial
>  Labels: newbie, pull-request-available
> Fix For: 0.5.0
>
>
> Various places use LOG.info("text " + something); they should all move to
> LOG.info("text {}", something).
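The cost difference can be sketched with a stand-in logger (a hypothetical class for illustration, not slf4j itself): with concatenation, the argument's toString() runs even when the level is disabled, while the {} placeholder defers formatting until the message is actually emitted.

```java
// Minimal stand-in for an slf4j-style logger (hypothetical, for
// illustration only) with the INFO level disabled.
public class LogParamDemo {
    static int toStringCalls = 0;

    static class Expensive {
        @Override
        public String toString() {
            toStringCalls++;           // track how often formatting happens
            return "value";
        }
    }

    static final boolean INFO_ENABLED = false;

    // Concatenation form: the message string (and toString()) is built
    // by the caller before this method is even entered.
    static void info(String message) {
        if (INFO_ENABLED) System.out.println(message);
    }

    // Parameterized form: formatting only happens if the level is enabled.
    static void info(String format, Object arg) {
        if (INFO_ENABLED) {
            System.out.println(format.replace("{}", String.valueOf(arg)));
        }
    }

    public static void main(String[] args) {
        Expensive e = new Expensive();
        info("text " + e);        // toString() runs although INFO is off
        info("text {}", e);       // toString() never runs here
        System.out.println(toStringCalls); // prints 1
    }
}
```

With the real slf4j API the same reasoning applies: the concatenated string is constructed on every call, the parameterized one only when the level is enabled.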



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2860) Cluster disk space metrics should reflect decommission and maintenance states

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2860:
-
Labels: pull-request-available  (was: )

> Cluster disk space metrics should reflect decommission and maintenance states
> -
>
> Key: HDDS-2860
> URL: https://issues.apache.org/jira/browse/HDDS-2860
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Affects Versions: 0.5.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
>
> Now that we have decommission states, we need to adjust the cluster capacity,
> space used and available metrics which are exposed via JMX.
> For a decommissioning node, the space used on the node effectively needs to
> be transferred to other nodes via container replication before decommission
> can complete, but this is difficult to track from a space usage perspective.
> When a node completes decommission, we can assume it provides no capacity to
> the cluster and uses none. Therefore, for decommissioning and decommissioned
> nodes, the simplest calculation is to exclude the node completely, in a
> similar way to a dead node.
> For maintenance nodes, things are even less clear. For a maintenance node, it 
> is read only so it cannot provide capacity to the cluster, but it is expected 
> to return to service, so excluding it completely probably does not make 
> sense. However, perhaps the simplest solution is to do the following:
> 1. For any node not IN_SERVICE, do not include its usage or space in the 
> cluster capacity totals.
> 2. Introduce some new metrics to account for the maintenance and perhaps 
> decommission capacity, so it is not lost eg:
> {code}
> # Existing metrics
> "DiskCapacity" : 62725623808,
> "DiskUsed" : 4096,
> "DiskRemaining" : 50459619328,
> # Suggested additional new ones, with the above only considering IN_SERVICE 
> nodes:
> "MaintenanceDiskCapacity": 0
> "MaintenanceDiskUsed": 0
> "MaintenanceDiskRemaining": 0
> "DecommissionedDiskCapacity": 0
> "DecommissionedDiskUsed": 0
> "DecommissionedDiskRemaining": 0
> ...
> {code}
> That way, the cluster totals are only what is currently "online", but we have 
> the other metrics to track what has been removed etc. The key advantage of 
> this, is that it is easy to understand.
> There could also be an argument that the new decommissionedDisk metrics are 
> not needed as that capacity is technically lost from the cluster forever.
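Point 1 above (only IN_SERVICE nodes count toward the main totals, with separate buckets per state) can be sketched as follows; all the types here are hypothetical stand-ins, not the actual SCM implementation:

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class CapacityByState {
    enum OpState { IN_SERVICE, DECOMMISSIONING, DECOMMISSIONED, IN_MAINTENANCE }

    // Hypothetical per-node stats; the real SCM tracks these per datanode.
    static class Node {
        final OpState state;
        final long capacity, used, remaining;
        Node(OpState state, long capacity, long used, long remaining) {
            this.state = state;
            this.capacity = capacity;
            this.used = used;
            this.remaining = remaining;
        }
    }

    // Sum capacity/used/remaining per operational state, so the IN_SERVICE
    // totals stay separate from the maintenance/decommission buckets.
    static Map<OpState, long[]> aggregate(List<Node> nodes) {
        Map<OpState, long[]> totals = new EnumMap<>(OpState.class);
        for (Node n : nodes) {
            long[] t = totals.computeIfAbsent(n.state, s -> new long[3]);
            t[0] += n.capacity;
            t[1] += n.used;
            t[2] += n.remaining;
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(
            new Node(OpState.IN_SERVICE, 100, 40, 60),
            new Node(OpState.IN_SERVICE, 100, 20, 80),
            new Node(OpState.IN_MAINTENANCE, 100, 10, 90));
        long[] online = aggregate(nodes).get(OpState.IN_SERVICE);
        // "DiskCapacity"/"DiskUsed"/"DiskRemaining" would report only these:
        System.out.println(online[0] + " " + online[1] + " " + online[2]);
    }
}
```

The maintenance and decommissioned buckets would feed the proposed `MaintenanceDisk*` / `DecommissionedDisk*` metrics without inflating the "online" totals.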






[jira] [Updated] (HDDS-3028) Use own version from InterfaceAudience/Stability version

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3028:
-
Labels: pull-request-available  (was: )

> Use own version from InterfaceAudience/Stability version
> 
>
> Key: HDDS-3028
> URL: https://issues.apache.org/jira/browse/HDDS-3028
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Marton Elek
>Assignee: Marton Elek
>Priority: Major
>  Labels: pull-request-available
>
> Current Ozone code uses the Hadoop versions of the @InterfaceAudience and
> @InterfaceStability annotations.
> While Hadoop uses the annotations during javadoc generation, in Ozone
> they are used only as markers, as Ozone doesn't generate javadoc during the
> releases.
> The two annotations are in the Hadoop common project. I propose to copy them
> and use the copied annotations instead of the original ones. It would help us
> to reduce the dependencies on Hadoop (hadoop-common, which contains the
> original annotations, has 87 transitive dependencies!)
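A self-contained copy could look roughly like this (the class layout mirrors Hadoop's nested-annotation style, but the names and retention shown here are an illustrative sketch, not the exact patch):

```java
import java.lang.annotation.Documented;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Hypothetical stand-alone copy of the audience markers; having these in
// an Ozone-owned module removes the compile-time need for hadoop-common.
public final class InterfaceAudience {
    private InterfaceAudience() {}   // marker holder, never instantiated

    @Documented
    @Retention(RetentionPolicy.RUNTIME)
    public @interface Public {}

    @Documented
    @Retention(RetentionPolicy.RUNTIME)
    public @interface Private {}

    @Documented
    @Retention(RetentionPolicy.RUNTIME)
    public @interface LimitedPrivate {
        String[] value();   // which components may use the annotated API
    }
}
```

Since the annotations carry no behavior, swapping the import on annotated classes is a mechanical change.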






[jira] [Updated] (HDDS-3031) OM HA- Client requests get LeaderNotReadyException after OM restart

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3031:
-
Labels: pull-request-available  (was: )

> OM HA- Client requests get LeaderNotReadyException after OM restart
> ---
>
> Key: HDDS-3031
> URL: https://issues.apache.org/jira/browse/HDDS-3031
> Project: Hadoop Distributed Data Store
>  Issue Type: New Feature
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: pull-request-available
>
> Scenario:
> 1. Set up an OM HA cluster.
> 2. Perform some write operations.
> 3. Restart the OMs.
> 4. Now try any write operation.
> The error below will be thrown 15 times, and finally the client request will
> fail.
> {code:java}
>  
> 2020-02-15 10:11:23,244 [qtp2025269734-19] INFO 
> org.apache.hadoop.io.retry.RetryInvocationHandler: 
> com.google.protobuf.ServiceException: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ozone.om.exceptions.OMLeaderNotReadyException):
>  om1@group-D0D586AF6951 is in LEADER state but not ready yet.
>         at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.processReply(OzoneManagerRatisServer.java:177)
>         at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.submitRequest(OzoneManagerRatisServer.java:136)
>         at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestToRatis(OzoneManagerProtocolServerSideTranslatorPB.java:162)
>         at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:118)
>         at 
> org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72)
>         at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:97)
>         at 
> org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
> , while invoking $Proxy81.submitRequest over 
> nodeId=om1,nodeAddress=om-ha-1.vpc.cloudera.com:9862 after 1 failover 
> attempts. Trying to failover immediately.
> {code}
>  






[jira] [Updated] (HDDS-3030) Key Rename should preserve the ObjectID

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3030:
-
Labels: pull-request-available  (was: )

> Key Rename should preserve the ObjectID
> ---
>
> Key: HDDS-3030
> URL: https://issues.apache.org/jira/browse/HDDS-3030
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: om
>Reporter: Hanisha Koneru
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: pull-request-available
>
> On Key Renames, objectID should be preserved from the original Key. 
> Currently it is being set to the new transactionLogIndex of the rename 
> request.






[jira] [Updated] (HDDS-3034) Broken return code check in unit/integration

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3034:
-
Labels: pull-request-available  (was: )

> Broken return code check in unit/integration
> 
>
> Key: HDDS-3034
> URL: https://issues.apache.org/jira/browse/HDDS-3034
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: build, test
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
>
> HDDS-2915 fixed the unit/integration check result in case of Maven error.
> However, the return code check was broken by the output redirection via
> pipeline added in HDDS-2833 and HDDS-2960:
> bq. The return status of a pipeline is the exit status of the last command,
> unless the pipefail option is enabled.






[jira] [Updated] (HDDS-3037) Hide JooQ welcome message on start

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3037:
-
Labels: pull-request-available  (was: )

> Hide JooQ welcome message on start
> --
>
> Key: HDDS-3037
> URL: https://issues.apache.org/jira/browse/HDDS-3037
> Project: Hadoop Distributed Data Store
>  Issue Type: Wish
>  Components: Ozone Recon
>Affects Versions: 0.5.0
>Reporter: Siddharth Wagle
>Assignee: Siddharth Wagle
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>
> Ozone Recon prints this advertisement banner at startup:
> {code}
> 2020-02-19 13:23:18,671 [main] INFO  jooq.Constants 
> (JooqLogger.java:info(338)) - 
>   
> @@
> @@
>   @@@@
> @@
>   @@  @@@@
> @@    @@  @@@@
> @@@@@@
> @@
> @@
> @@@@@@
> @@@@  @@    @@
> @@@@  @@    @@
> @@@@  @  @  @@
> @@@@@@
> @@@  @
> @@
> @@  Thank you for using jOOQ 3.11.9
> {code}






[jira] [Updated] (HDDS-3040) Update Ratis version to 0.5.0 released.

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3040:
-
Labels: pull-request-available  (was: )

> Update Ratis version to 0.5.0 released.
> ---
>
> Key: HDDS-3040
> URL: https://issues.apache.org/jira/browse/HDDS-3040
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
>  Labels: pull-request-available
>
> Update Ozone to use latest released version of Ratis 0.5.0.






[jira] [Updated] (HDDS-3035) Add ability to enable Ratis metrics in OzoneManager

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3035:
-
Labels: pull-request-available  (was: )

> Add ability to enable Ratis metrics in OzoneManager
> ---
>
> Key: HDDS-3035
> URL: https://issues.apache.org/jira/browse/HDDS-3035
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.5.0
>Reporter: Aravindan Vijayan
>Assignee: Siddharth Wagle
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>
> Whenever OM uses Ratis, we may need the ability to collect its metrics 
> through OM JMX. This should be a straightforward change, similar to 
> org.apache.hadoop.ozone.HddsDatanodeService#start(). 
> {code}
>   public void start() {
>     // All the Ratis metrics (registered from now on) will be published via
>     // JMX and via the prometheus exporter (used by the /prom servlet)
>     MetricRegistries.global()
>         .addReporterRegistration(MetricsReporting.jmxReporter());
>     MetricRegistries.global().addReporterRegistration(
>         registry -> CollectorRegistry.defaultRegistry.register(
>             new RatisDropwizardExports(
>                 registry.getDropWizardMetricRegistry())));
>   }
> {code}






[jira] [Updated] (HDDS-3042) Support running full Ratis pipeline from IDE (IntelliJ)

2020-02-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3042:
-
Labels: pull-request-available  (was: )

> Support running full Ratis pipeline from IDE (IntelliJ) 
> 
>
> Key: HDDS-3042
> URL: https://issues.apache.org/jira/browse/HDDS-3042
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Marton Elek
>Priority: Major
>  Labels: pull-request-available
>
> HDDS-1522 introduced a method to run a full cluster in IntelliJ. The runner
> configurations can be copied with a shell script, and a basic ozone-site.xml
> and log configuration make it easy to run Ozone from the IDE.
> Unfortunately this setup supports only one datanode, and it's harder to debug
> a full Ozone pipeline (3 datanodes) from the IDE.
> This patch provides 3 different configurations for 3 datanodes with different
> ports to make it possible to run them on the same host from the IDE.






[jira] [Updated] (HDDS-3040) Update Ratis version to 0.5.0 released.

2020-02-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3040:
-
Labels: pull-request-available  (was: )

> Update Ratis version to 0.5.0 released.
> ---
>
> Key: HDDS-3040
> URL: https://issues.apache.org/jira/browse/HDDS-3040
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Update Ozone to use latest released version of Ratis 0.5.0.






[jira] [Updated] (HDDS-3045) Integration test crashes due to ReconServer NPE

2020-02-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3045:
-
Labels: pull-request-available  (was: )

> Integration test crashes due to ReconServer NPE
> ---
>
> Key: HDDS-3045
> URL: https://issues.apache.org/jira/browse/HDDS-3045
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
>
> {code:title=https://github.com/apache/hadoop-ozone/runs/457951373}
> [ERROR] Crashed tests:
> [ERROR] org.apache.hadoop.fs.ozone.TestOzoneFileInterfaces
> {code}
> The log ends with an NPE starting Recon, but it is not clear whether that is
> a cause or an effect:
> {code:title=https://github.com/apache/hadoop-ozone/suites/470859679/artifacts/2058465}
> 2020-02-20 14:45:25,041 [main] INFO  tasks.ReconTaskControllerImpl 
> (ReconTaskControllerImpl.java:start(230)) - Starting Recon Task Controller.
> java.lang.NullPointerException
> at 
> org.apache.hadoop.ozone.recon.ReconServer.start(ReconServer.java:118)
> at org.apache.hadoop.ozone.recon.ReconServer.call(ReconServer.java:95)
> at org.apache.hadoop.ozone.recon.ReconServer.call(ReconServer.java:39)
> {code}
> In [another run|https://github.com/apache/hadoop-ozone/runs/457338931]
> TestOzoneClientRetriesOnException crashed; its
> [log|https://github.com/apache/hadoop-ozone/suites/470219052/artifacts/2048272]
> also ends with the same NPE.






[jira] [Updated] (HDDS-3016) Fix TestMultiBlockWritesWithDnFailures.java

2020-02-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3016:
-
Labels: pull-request-available  (was: )

> Fix TestMultiBlockWritesWithDnFailures.java
> ---
>
> Key: HDDS-3016
> URL: https://issues.apache.org/jira/browse/HDDS-3016
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Client
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>







[jira] [Updated] (HDDS-3044) Fix TestDeleteWithSlowFollower.java

2020-02-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3044:
-
Labels: pull-request-available  (was: )

> Fix TestDeleteWithSlowFollower.java
> ---
>
> Key: HDDS-3044
> URL: https://issues.apache.org/jira/browse/HDDS-3044
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Datanode
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (HDDS-818) OzoneConfiguration uses an existing XMLRoot value

2020-02-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-818:

Labels: pull-request-available  (was: )

> OzoneConfiguration uses an existing XMLRoot value
> -
>
> Key: HDDS-818
> URL: https://issues.apache.org/jira/browse/HDDS-818
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Giovanni Matteo Fumarola
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-818.v0.patch
>
>
> OzoneConfiguration and ConfInfo both have
> @XmlRootElement(name = "configuration").
> This makes the REST client crash on XML calls.






[jira] [Updated] (HDDS-3046) Fix Retry handling in Hadoop RPC Client

2020-02-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3046:
-
Labels: OMHA pull-request-available  (was: OMHA)

> Fix Retry handling in Hadoop RPC Client
> ---
>
> Key: HDDS-3046
> URL: https://issues.apache.org/jira/browse/HDDS-3046
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: OMHA, pull-request-available
>
> Right now, for all exceptions other than ServiceException, we use
> FailoverOnNetworkException.
> This exception policy is created with 15 max failovers and 15 retries.
>  
> {code:java}
> retryPolicyOnNetworkException.shouldRetry(
>  exception, retries, failovers, isIdempotentOrAtMostOnce);{code}
> *2 issues with this:*
>  # When shouldRetry returns the action FAILOVER_AND_RETRY, the client gets
> stuck with the same OM and does not fail over to the next OM, as
> OMFailoverProxyProvider#performFailover() is a dummy call that does not
> perform any failover.
>  # When ozone.client.failover.max.attempts is set to 15, with 2 policies
> each set to 15, we will retry 15*2 times in the worst case.
>  
>  
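Issue 2 is simple arithmetic, sketched here with hypothetical helper names: two independent policies of 15 attempts each give a worst case of 30 attempts unless a single shared cap is enforced.

```java
public class RetryCapDemo {
    // Two policies consulted independently: each keeps its own counter,
    // so the worst case is the SUM of their limits.
    static int worstCase(int policyARetries, int policyBRetries) {
        return policyARetries + policyBRetries;
    }

    // Desired behavior: one shared counter capped by the configured
    // ozone.client.failover.max.attempts, regardless of which policy fires.
    static int withSharedCap(int policyARetries, int policyBRetries, int cap) {
        return Math.min(worstCase(policyARetries, policyBRetries), cap);
    }

    public static void main(String[] args) {
        System.out.println(worstCase(15, 15));          // prints 30
        System.out.println(withSharedCap(15, 15, 15));  // prints 15
    }
}
```

The fix therefore needs both a real failover in performFailover() and a single attempt budget shared across the policies.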






[jira] [Updated] (HDDS-3051) Periodic HDDS volume checker thread should be a daemon

2020-02-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3051:
-
Labels: pull-request-available  (was: )

> Periodic HDDS volume checker thread should be a daemon
> --
>
> Key: HDDS-3051
> URL: https://issues.apache.org/jira/browse/HDDS-3051
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Marton Elek
>Assignee: Marton Elek
>Priority: Major
>  Labels: pull-request-available
>
> Periodic HDDS volume checker is an auxiliary thread. It can be stopped when
> the main threads are stopped; therefore we need to mark it as a daemon thread.
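The change amounts to marking the checker thread as a daemon before starting it, so the JVM can exit once only auxiliary threads remain (the thread body here is an illustrative stand-in):

```java
public class DaemonCheckerDemo {
    public static void main(String[] args) {
        // Stand-in for the periodic volume checker loop.
        Thread volumeChecker = new Thread(() -> {
            while (true) {
                try {
                    Thread.sleep(60_000);   // periodic check interval
                } catch (InterruptedException e) {
                    return;
                }
            }
        }, "Periodic HDDS volume checker");

        // Daemon threads do not keep the JVM alive: once all non-daemon
        // threads finish, the process exits even if this loop is running.
        volumeChecker.setDaemon(true);
        volumeChecker.start();

        System.out.println(volumeChecker.isDaemon()); // prints true
        // main() returns here; the JVM exits despite the running checker.
    }
}
```

Note that setDaemon must be called before start(); calling it afterwards throws IllegalThreadStateException.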






[jira] [Updated] (HDDS-3053) Decrease the number of the chunk writer threads

2020-02-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3053:
-
Labels: pull-request-available  (was: )

> Decrease the number of the chunk writer threads
> ---
>
> Key: HDDS-3053
> URL: https://issues.apache.org/jira/browse/HDDS-3053
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Marton Elek
>Priority: Major
>  Labels: pull-request-available
>
> As of now we create 60 threads
> (dfs.container.ratis.num.write.chunk.threads) to write chunk data to the
> disk. As the write is limited by IO, I can't see any benefit to having so
> many threads. A high number of threads means high context-switch overhead,
> therefore it seems more reasonable to use only a limited number of threads.
> For example, 10 threads should be enough even with 5 external disks.
> If you know any reason to keep the number at 60, please let me know...






[jira] [Updated] (HDDS-3050) Use meaningful name for ChunkWriter threads

2020-02-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3050:
-
Labels: pull-request-available  (was: )

> Use meaningful name for ChunkWriter threads
> ---
>
> Key: HDDS-3050
> URL: https://issues.apache.org/jira/browse/HDDS-3050
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Marton Elek
>Assignee: Marton Elek
>Priority: Major
>  Labels: pull-request-available
>
> ChunkWriter threads are created with the naming schema 'pool-[x]-thread-[y]'.
> We can use better naming (especially as we have 60 threads...)
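A named ThreadFactory is the usual fix; the prefix and helper below are suggestions for illustration, not the actual patch:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class NamedChunkWriterThreads {
    // Produces "ChunkWriter-0", "ChunkWriter-1", ... instead of the
    // default executor names "pool-[x]-thread-[y]".
    static ThreadFactory named(String prefix) {
        AtomicInteger counter = new AtomicInteger();
        return runnable -> {
            Thread t = new Thread(runnable);
            t.setName(prefix + "-" + counter.getAndIncrement());
            return t;
        };
    }

    public static void main(String[] args) {
        ExecutorService pool =
            Executors.newFixedThreadPool(2, named("ChunkWriter"));
        // Thread dumps and logs now show which pool a thread belongs to.
        pool.submit(() -> System.out.println(Thread.currentThread().getName()));
        pool.shutdown();
    }
}
```

With 60 threads in play, meaningful names make thread dumps and per-thread log lines far easier to attribute.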






[jira] [Updated] (HDDS-3052) Test ChunkManagerImpl performance with long-running freon tests

2020-02-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3052:
-
Labels: pull-request-available  (was: )

> Test ChunkManagerImpl performance with long-running freon tests
> ---
>
> Key: HDDS-3052
> URL: https://issues.apache.org/jira/browse/HDDS-3052
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Marton Elek
>Assignee: Marton Elek
>Priority: Major
>  Labels: pull-request-available
>
> ChunkManagerImpl is the core of the data write path. It would be great to
> test it with the standard Freon toolset. It can provide a baseline disk
> speed and also can validate different behaviors of ChunkManagerImpl (e.g.
> is it faster with 60 threads?)






[jira] [Updated] (HDDS-3055) SCM crash during startup does not print any error message to log

2020-02-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3055:
-
Labels: pull-request-available  (was: )

> SCM crash during startup does not print any error message to log
> 
>
> Key: HDDS-3055
> URL: https://issues.apache.org/jira/browse/HDDS-3055
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: pull-request-available
>
> SCM startup failed due to a PipelineNotFoundException, but there is no error
> message logged in the SCM log.
> In the log file we can see just the log message below; no reason for the
> crash is logged.
>  
>  
> {code:java}
> 2020-02-20 15:37:56,079 [shutdown-hook-0] INFO 
> org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter: 
> SHUTDOWN_MSG:
> /
> SHUTDOWN_MSG: Shutting down StorageContainerManager at xx.xx.xx/10.65.51.49
> {code}
> In the .out file, we can see the message below, but not the complete
> exception:
> {code:java}
> PipelineID=x not found{code}
>  
> The actual reason for the failure is not clearly logged when an exception
> occurs during SCM startup.
>  
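The usual remedy, sketched here with hypothetical names rather than the actual StorageContainerManagerStarter code, is to log the exception with its stack trace at the entry point before exiting, so the cause lands in the log file rather than only in the .out file:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class StarterSketch {
    private static final Logger LOG = Logger.getLogger("SCMStarter");

    // Log the full exception (message + stack trace) and hand the caller
    // an exit code, instead of letting the cause escape only to stderr.
    static int handleStartupFailure(Exception e) {
        LOG.log(Level.SEVERE, "SCM startup failed", e);
        return 1;
    }

    public static void main(String[] args) {
        try {
            start();
        } catch (Exception e) {
            System.exit(handleStartupFailure(e));
        }
    }

    static void start() throws Exception {
        // Stand-in for SCM initialization, which may throw e.g. a
        // PipelineNotFoundException("PipelineID=... not found").
    }
}
```

Keeping the catch at the outermost entry point guarantees that any startup exception, not just this one, is recorded before the process dies.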






[jira] [Updated] (HDDS-3047) ObjectStore#listVolumesByUser and CreateVolumeHandler#call should get user's full principal name instead of login name by default

2020-02-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3047:
-
Labels: pull-request-available  (was: )

> ObjectStore#listVolumesByUser and CreateVolumeHandler#call should get user's 
> full principal name instead of login name by default
> -
>
> Key: HDDS-3047
> URL: https://issues.apache.org/jira/browse/HDDS-3047
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
>
> [{{ObjectStore#listVolumesByUser}}|https://github.com/apache/hadoop-ozone/blob/2fa37ef99b8fb4575169ba8326eeb677b3d2ed74/hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/ObjectStore.java#L249-L256]
>  is using {{getShortUserName()}} by default (when user is empty or null):
> {code:java|title=ObjectStore#listVolumesByUser}
>   public Iterator<? extends OzoneVolume> listVolumesByUser(String user,
>   String volumePrefix, String prevVolume)
>   throws IOException {
> if(Strings.isNullOrEmpty(user)) {
>   user = UserGroupInformation.getCurrentUser().getShortUserName();  // <--
> }
> return new VolumeIterator(user, volumePrefix, prevVolume);
>   }
> {code}
> It should use {{getUserName()}} instead.
> For a quick reference for the difference between {{getUserName()}} and 
> {{getShortUserName()}}:
> {code:java|title=UserGroupInformation#getUserName}
>   /**
>* Get the user's full principal name.
>* @return the user's full principal name.
>*/
>   @InterfaceAudience.Public
>   @InterfaceStability.Evolving
>   public String getUserName() {
> return user.getName();
>   }
> {code}
> {code:java|title=UserGroupInformation#getShortUserName}
>   /**
>* Get the user's login name.
>* @return the user's name up to the first '/' or '@'.
>*/
>   public String getShortUserName() {
> return user.getShortName();
>   }
> {code}
> This won't cause issues if Kerberos is not in use. However, once Kerberos is
> enabled, the {{getUserName()}} and {{getShortUserName()}} results differ and
> can cause some issues.
> When Kerberos is enabled, {{getUserName()}} returns the full principal name
> e.g. {{om/o...@example.com}}, but {{getShortUserName()}} will return the
> login name e.g. {{hadoop}}.
> If {{hadoop.security.auth_to_local}} is set, the {{getShortUserName()}}
> result can become very different from the full principal name.
> For example, when {{hadoop.security.auth_to_local =
> RULE:[2:$1@$0](.*)s/.*/root/}},
> {{getShortUserName()}} returns {{root}}, while {{getUserName()}} still gives
> {{om/o...@example.com}}.
> This can lead to a user experience issue (when Kerberos is enabled) where the
> user creates a volume with the ozone shell ([uses 
> {{getUserName()}}|https://github.com/apache/hadoop-ozone/blob/ecb5bf4df1d80723835a1500d595102f3f861708/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/web/ozShell/volume/CreateVolumeHandler.java#L63-L65]
>  internally) and then tries to list it with
> {{ObjectStore#listVolumesByUser(null, ...)}} ([uses {{getShortUserName()}} by 
> default|https://github.com/apache/hadoop-ozone/blob/2fa37ef99b8fb4575169ba8326eeb677b3d2ed74/hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/ObjectStore.java#L238-L256]
>  when the user param is empty or null): the user won't see any volumes
> because of the mismatch.
> We should also double check *all* usages of {{getShortUserName()}}.
> *Update:*
> Xiaoyu and I checked that the usage of {{getShortUserName()}} on the server 
> side shouldn't become a problem, because the server should maintain its own 
> auth_to_local rules (the admin should make sure each user maps to a distinct 
> short name; as long as multiple principal names are not mapped to the same 
> short name, it won't be a problem).
> The usage in {{BasicOzoneFileSystem}} itself also seems valid, because there 
> {{getShortUserName()}} is only used for client-side purposes (to set 
> {{workingDir}}, etc.).
> But the usage in {{ObjectStore#listVolumesByUser}} is confirmed problematic 
> at the moment, which needs to be fixed. Same for 
> [{{CreateVolumeHandler#call}}|https://github.com/apache/hadoop-ozone/blob/ecb5bf4df1d80723835a1500d595102f3f861708/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/web/ozShell/volume/CreateVolumeHandler.java#L81-L83]:
> {code:java|title=CreateVolumeHandler#call}
>   } else {
> rootName = UserGroupInformation.getCurrentUser().getShortUserName();
>   }
> {code}
> It should pass the full principal name to the server.
> CC [~xyao] [~aengineer] [~arp] [~bharat]
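The mismatch described above can be illustrated with a small stand-alone sketch. This is not Ozone code: `shortName` merely mimics the documented "up to the first '/' or '@'" rule, and `volumesOwnedBy` is a simplified stand-in for matching volume owners the way {{ObjectStore#listVolumesByUser}} does.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ShortNameMismatch {

    /** Mimics getShortUserName(): the name up to the first '/' or '@'. */
    static String shortName(String principal) {
        int cut = principal.length();
        int slash = principal.indexOf('/');
        int at = principal.indexOf('@');
        if (slash >= 0) cut = Math.min(cut, slash);
        if (at >= 0) cut = Math.min(cut, at);
        return principal.substring(0, cut);
    }

    /** Volume owners are stored as full principal names at create time. */
    static List<String> volumesOwnedBy(List<String> owners, String user) {
        return owners.stream().filter(o -> o.equals(user)).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String fullPrincipal = "om/om-host@EXAMPLE.COM"; // illustrative principal
        List<String> owners = Arrays.asList(fullPrincipal);
        // Matching by short name finds nothing; matching by full name works.
        System.out.println(volumesOwnedBy(owners, shortName(fullPrincipal))); // []
        System.out.println(volumesOwnedBy(owners, fullPrincipal));
    }
}
```

With a volume created under the full principal name, listing by the derived short name returns an empty result, which is exactly the user-visible symptom above.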



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-

[jira] [Updated] (HDDS-3057) Improve Ozone Shell ACL operations' help text readability

2020-02-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3057:
-
Labels: pull-request-available  (was: )

> Improve Ozone Shell ACL operations' help text readability
> -
>
> Key: HDDS-3057
> URL: https://issues.apache.org/jira/browse/HDDS-3057
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
>
> Currently:
> {code:bash|title=ozone sh volume addacl -h}
> $ ozone sh volume addacl -h
> Usage: ozone sh volume addacl [-hV] -a= [-s=] 
> Add a new Acl.
>URI of the volume/bucket.
> Ozone URI could start with o3:// or without 
> prefix. URI
>   may contain the host and port of the OM server. 
> Both
>   are optional. If they are not specified it will 
> be
>   identified from the config files.
>   -a, --acl=   Add acl.r = READ,w = WRITE,c = CREATE,d = 
> DELETE,l =
>   LIST,a = ALL,n = NONE,x = READ_AC,y = 
> WRITE_ACEx user:
>   user1:rw or group:hadoop:rw
>   -h, --helpShow this help message and exit.
>   -s, --store=   store type. i.e OZONE or S3
>   -V, --version Print version information and exit.
> {code}
> {code:bash|title=ozone sh bucket addacl -h}
> $ ozone sh bucket addacl -h
> Usage: ozone sh bucket addacl [-hV] -a= [-s=] 
> Add a new Acl.
>URI of the volume/bucket.
> Ozone URI could start with o3:// or without 
> prefix. URI
>   may contain the host and port of the OM server. 
> Both
>   are optional. If they are not specified it will 
> be
>   identified from the config files.
>   -a, --acl=   new acl.r = READ,w = WRITE,c = CREATE,d = 
> DELETE,l =
>   LIST,a = ALL,n = NONE,x = READ_AC,y = 
> WRITE_ACEx user:
>   user1:rw or group:hadoop:rw
>   -h, --helpShow this help message and exit.
>   -s, --store=   store type. i.e OZONE or S3
>   -V, --version Print version information and exit.
> {code}
> Same for {{ozone sh (volume|bucket|key) (addacl|removeacl|setacl|-getacl-)}}
> It would look much nicer to have a line separator or space between {{acl.}} 
> and {{.r = READ,...}}.
> Also improve the prompt on error and the overall readability, and correct the 
> typos: {{READ_AC -> READ_ACL}}, {{WRITE_AC -> WRITE_ACL}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3054) OzoneFileStatus#getModificationTime should return actual directory modification time when its OmKeyInfo is available

2020-02-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3054:
-
Labels: pull-request-available  (was: )

> OzoneFileStatus#getModificationTime should return actual directory 
> modification time when its OmKeyInfo is available
> 
>
> Key: HDDS-3054
> URL: https://issues.apache.org/jira/browse/HDDS-3054
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
>
> In the current implementation, 
> [{{getModificationTime()}}|https://github.com/apache/hadoop-ozone/blob/c9f26ccf9f93a052c5c0c042c57b6f87709597ae/hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/helpers/OzoneFileStatus.java#L90-L107]
>  always returns a "fake" modification time (the current time) for directories, 
> because a directory in Ozone might be faked from a file key.
> But there are cases where a real directory key exists in the OzoneBucket, for 
> example when the user calls {{fs.mkdirs(directory)}}. In this case, a reasonable 
> thing to do would be to get the modification time from the OmKeyInfo rather 
> than faking it.
> CC [~xyao]
> My POC for the fix:
> {code:java|title=Diff}
> diff --git 
> a/hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/helpers/OzoneFileStatus.java
>  
> b/hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/helpers/OzoneFileStatus.java
> index 8717946512..708e62d692 100644
> --- 
> a/hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/helpers/OzoneFileStatus.java
> +++ 
> b/hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/helpers/OzoneFileStatus.java
> @@ -93,7 +93,7 @@ public FileStatus makeQualified(URI defaultUri, Path parent,
> */
>@Override
>public long getModificationTime(){
> -if (isDirectory()) {
> +if (isDirectory() && super.getModificationTime() == 0) {
>return System.currentTimeMillis();
>  } else {
>return super.getModificationTime();
> diff --git 
> a/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java
>  
> b/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java
> index 1be5fb3f3c..cb8f647a41 100644
> --- 
> a/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java
> +++ 
> b/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java
> @@ -2004,8 +2004,14 @@ public OmKeyInfo lookupFile(OmKeyArgs args, String 
> clientAddress)
>} else {
>  // if entry is a directory
>  if (!deletedKeySet.contains(entryInDb)) {
> -  cacheKeyMap.put(entryInDb,
> -  new OzoneFileStatus(immediateChild));
> +  if (!entryKeyName.equals(immediateChild)) {
> +cacheKeyMap.put(entryInDb,
> +new OzoneFileStatus(immediateChild));
> +  } else {
> +// If entryKeyName matches dir name, we have the info
> +cacheKeyMap.put(entryInDb,
> +new OzoneFileStatus(value, 0, true));
> +  }
>countEntries++;
>  }
>  // skip the other descendants of this child directory.
> {code}
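The guard in the POC above can be shown in isolation. The class below is a simplified stand-in, not the real {{OzoneFileStatus}}: a stored modification time of 0 is treated as "no real mtime recorded" and replaced with the current time, while a real value (e.g. taken from the directory's OmKeyInfo) survives unchanged.

```java
public class DirModTime {

    /** Simplified stand-in for the patched getModificationTime(): only fake
     *  the mtime for directories whose stored value is the 0 sentinel. */
    static long getModificationTime(boolean isDirectory, long storedMtime) {
        if (isDirectory && storedMtime == 0) {
            return System.currentTimeMillis(); // no real mtime known: fake it
        }
        return storedMtime; // real mtime is preserved
    }

    public static void main(String[] args) {
        long real = 1582243200000L; // some recorded mtime
        System.out.println(getModificationTime(true, real) == real); // true
        System.out.println(getModificationTime(true, 0) > 0);        // true (faked)
    }
}
```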






[jira] [Updated] (HDDS-3026) OzoneManager#listStatus should be audited as READ operation instead of WRITE operation

2020-02-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3026:
-
Labels: pull-request-available  (was: )

> OzoneManager#listStatus should be audited as READ operation instead of WRITE 
> operation
> ---
>
> Key: HDDS-3026
> URL: https://issues.apache.org/jira/browse/HDDS-3026
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Sammi Chen
>Assignee: Jan Hentschel
>Priority: Major
>  Labels: pull-request-available
>
> Currently, listStatus uses AUDIT.logWriteSuccess and AUDIT.logWriteFailure 
> to log audit info. It should use AUDIT.logReadSuccess and 
> AUDIT.logReadFailure instead.






[jira] [Updated] (HDDS-3058) OzoneFileSystem should override unsupported set type FileSystem API

2020-02-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3058:
-
Labels: pull-request-available  (was: )

> OzoneFileSystem should override unsupported set type FileSystem API
> ---
>
> Key: HDDS-3058
> URL: https://issues.apache.org/jira/browse/HDDS-3058
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Filesystem
>Affects Versions: 0.4.1
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Major
>  Labels: pull-request-available
>
> Currently, OzoneFileSystem only implements some commonly used FileSystem APIs; 
> most other APIs are not supported and are inherited from the parent class 
> FileSystem by default. However, FileSystem does nothing in some set-type 
> methods, like setReplication and setOwner.
> {code:java}
>  public void setVerifyChecksum(boolean verifyChecksum) {
> //doesn't do anything
>   }
>   public void setWriteChecksum(boolean writeChecksum) {
> //doesn't do anything
>   }
>   public boolean setReplication(Path src, short replication)
> throws IOException {
> return true;
>   }
>   public void setPermission(Path p, FsPermission permission
>   ) throws IOException {
>   }
>   public void setOwner(Path p, String username, String groupname
>   ) throws IOException {
>   }
>   public void setTimes(Path p, long mtime, long atime
>   ) throws IOException {
>   }
> {code}
> These set-type functions depend on the sub-filesystem implementation. We need 
> to throw an unsupported-operation exception if the sub-filesystem cannot 
> support them. Otherwise, it confuses users who run the hadoop fs -setrep 
> command or call the setReplication API: they will not see any exception, and 
> the command/API appears to execute fine. This happened when I tested 
> OzoneFileSystem via the hadoop fs command.
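A minimal sketch of the fail-fast behavior proposed here, using simplified stand-in classes rather than the real Hadoop {{FileSystem}} hierarchy (method names mirror the ones quoted above, but signatures are reduced for illustration):

```java
// BaseFs mimics FileSystem's silent no-op setters; StrictFs mimics the
// proposed OzoneFileSystem behavior of rejecting unsupported operations.
class BaseFs {
    public boolean setReplication(String path, short replication) {
        return true; // silently "succeeds" without doing anything
    }

    public void setOwner(String path, String user, String group) {
        // does nothing
    }
}

class StrictFs extends BaseFs {
    @Override
    public boolean setReplication(String path, short replication) {
        throw new UnsupportedOperationException(
            "setReplication is not supported by this filesystem");
    }

    @Override
    public void setOwner(String path, String user, String group) {
        throw new UnsupportedOperationException(
            "setOwner is not supported by this filesystem");
    }
}

public class StrictFsDemo {
    public static void main(String[] args) {
        // Misleading: reports success although nothing happened.
        System.out.println(new BaseFs().setReplication("/key", (short) 3));
        try {
            new StrictFs().setReplication("/key", (short) 3);
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With the override in place, `hadoop fs -setrep` style callers get an immediate exception instead of a false success.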






[jira] [Updated] (HDDS-3059) OzoneManager#listFileStatus should be audited as READ operation instead of WRITE operation

2020-02-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3059:
-
Labels: pull-request-available  (was: )

> OzoneManager#listFileStatus should be audited as READ operation instead of 
> WRITE operation
> ---
>
> Key: HDDS-3059
> URL: https://issues.apache.org/jira/browse/HDDS-3059
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Jan Hentschel
>Assignee: Jan Hentschel
>Priority: Minor
>  Labels: pull-request-available
>
> Currently, listFileStatus uses AUDIT.logWriteSuccess and AUDIT.logWriteFailure 
> to log audit info. It should use AUDIT.logReadSuccess and 
> AUDIT.logReadFailure instead.






[jira] [Updated] (HDDS-2886) parse and dump datanode segment file to printable text

2020-02-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2886:
-
Labels: pull-request-available  (was: )

> parse and dump datanode segment file to printable text
> -
>
> Key: HDDS-2886
> URL: https://issues.apache.org/jira/browse/HDDS-2886
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.5.0
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-2886.001.patch, log_dump, log_inprogress_0
>
>
> Add a tool to parse datanode ratis log files and dump them in a printable 
> string format.
> This tool will help in debugging ratis issues.






[jira] [Updated] (HDDS-3049) Replication factor passed in create API doesn't take effect

2020-02-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3049:
-
Labels: pull-request-available  (was: )

> Replication factor passed in create API doesn't take effect
> ---
>
> Key: HDDS-3049
> URL: https://issues.apache.org/jira/browse/HDDS-3049
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Sammi Chen
>Assignee: mingchao zhao
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (HDDS-3043) Fix TestFailureHandlingByClient.java

2020-02-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3043:
-
Labels: pull-request-available  (was: )

> Fix TestFailureHandlingByClient.java
> 
>
> Key: HDDS-3043
> URL: https://issues.apache.org/jira/browse/HDDS-3043
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Client
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>







[jira] [Updated] (HDDS-3013) Fix TestBlockOutputStreamWithFailures.java

2020-02-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3013:
-
Labels: pull-request-available  (was: )

> Fix TestBlockOutputStreamWithFailures.java
> --
>
> Key: HDDS-3013
> URL: https://issues.apache.org/jira/browse/HDDS-3013
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Client
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>







[jira] [Updated] (HDDS-3063) Add test to verify replication factor of ozone fs

2020-02-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3063:
-
Labels: pull-request-available  (was: )

> Add test to verify replication factor of ozone fs
> -
>
> Key: HDDS-3063
> URL: https://issues.apache.org/jira/browse/HDDS-3063
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: test
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Minor
>  Labels: pull-request-available
>
> Currently no test verifies that {{ozone fs}} creates keys for files with the 
> expected replication factor.






[jira] [Updated] (HDDS-2995) Add integration test for Recon's Passive SCM state.

2020-02-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2995:
-
Labels: pull-request-available  (was: )

> Add integration test for Recon's Passive SCM state.
> ---
>
> Key: HDDS-2995
> URL: https://issues.apache.org/jira/browse/HDDS-2995
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Recon
>Affects Versions: 0.5.0
>Reporter: Aravindan Vijayan
>Assignee: Aravindan Vijayan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
> Attachments: HDDS-2995-001.patch
>
>
> * Verify Recon gets pipeline, node and container report from Datanode.
> * Verify SCM metadata state == Recon metadata state (Create pipeline, Close 
> pipeline, Create container)






[jira] [Updated] (HDDS-3065) Ozone Filesystem should return real default replication

2020-02-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3065:
-
Labels: pull-request-available  (was: )

> Ozone Filesystem should return real default replication
> ---
>
> Key: HDDS-3065
> URL: https://issues.apache.org/jira/browse/HDDS-3065
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Filesystem
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
>
> Ozone {{FileSystem}} implementation should return the actual configured 
> replication factor for {{getDefaultReplication()}}.
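A minimal sketch of the idea, reading the factor from configuration instead of hard-coding it. The {{ozone.replication}} key string and the plain map-based config are assumptions for illustration; the real implementation would go through Ozone's configuration classes.

```java
import java.util.HashMap;
import java.util.Map;

public class DefaultReplicationSketch {

    /** Returns the configured replication factor, falling back to 3.
     *  "ozone.replication" is used here purely as an illustrative key. */
    static short getDefaultReplication(Map<String, String> conf) {
        return Short.parseShort(conf.getOrDefault("ozone.replication", "3"));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(getDefaultReplication(conf)); // 3 (fallback)
        conf.put("ozone.replication", "1");
        System.out.println(getDefaultReplication(conf)); // 1 (configured)
    }
}
```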






[jira] [Updated] (HDDS-3066) SCM startup failed during loading containers from DB

2020-02-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3066:
-
Labels: pull-request-available  (was: )

> SCM startup failed  during loading containers from DB
> -
>
> Key: HDDS-3066
> URL: https://issues.apache.org/jira/browse/HDDS-3066
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: pull-request-available
>
> This is happening because the pipeline scrubber closed the pipeline, removed 
> it from the DB, and triggered close-container commands to set the containers 
> to CLOSING. If SCM is restarted before the close-container command is handled 
> and the state changes to CLOSING, the issue below can happen.
>  
> This can also happen in other scenarios, for example when safeModeHandler 
> calls finalizeAndDestroyPipeline and SCM is then restarted.
>  
> The root cause is that the pipeline was removed from the DB while the 
> container is still in the OPEN state; when trying to get the pipeline, SCM 
> crashes with a {{PipelineNotFoundException}}.
> {code:java}
>  2020-02-21 13:57:34,888 [main] ERROR 
> org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter: SCM start 
> failed with exception 
> org.apache.hadoop.hdds.scm.pipeline.PipelineNotFoundException: 
> PipelineID=35dff62d-9bfa-449b-b6e8-6f00cc8c1b6e not found at 
> org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.getPipeline(PipelineStateMap.java:133)
>  at 
> org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.addContainerToPipeline(PipelineStateMap.java:110)
>  at 
> org.apache.hadoop.hdds.scm.pipeline.PipelineStateManager.addContainerToPipeline(PipelineStateManager.java:59)
>  at 
> org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.addContainerToPipeline(SCMPipelineManager.java:309)
>  at 
> org.apache.hadoop.hdds.scm.container.SCMContainerManager.loadExistingContainers(SCMContainerManager.java:121)
>  at 
> org.apache.hadoop.hdds.scm.container.SCMContainerManager.(SCMContainerManager.java:107)
>  at 
> org.apache.hadoop.hdds.scm.server.StorageContainerManager.initializeSystemManagers(StorageContainerManager.java:412)
>  at 
> org.apache.hadoop.hdds.scm.server.StorageContainerManager.(StorageContainerManager.java:283)
>  at 
> org.apache.hadoop.hdds.scm.server.StorageContainerManager.(StorageContainerManager.java:215)
>  at 
> org.apache.hadoop.hdds.scm.server.StorageContainerManager.createSCM(StorageContainerManager.java:612)
>  at 
> org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter$SCMStarterHelper.start(StorageContainerManagerStarter.java:142)
>  at 
> org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.startScm(StorageContainerManagerStarter.java:117)
>  at 
> org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.call(StorageContainerManagerStarter.java:66)
>  at 
> org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.call(StorageContainerManagerStarter.java:42)
>  at picocli.CommandLine.execute(CommandLine.java:1173) at 
> picocli.CommandLine.access$800(CommandLine.java:141) at 
> picocli.CommandLine$RunLast.handle(CommandLine.java:1367) at 
> picocli.CommandLine$RunLast.handle(CommandLine.java:1335) at 
> picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:1243)
>  at picocli.CommandLine.parseWithHandlers(CommandLine.java:1526) at 
> picocli.CommandLine.parseWithHandler(CommandLine.java:1465) at 
> org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:65) at 
> org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:56) at 
> org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.main(StorageContainerManagerStarter.java:55)
>  2020-02-21 13:57:34,892 [shutdown-hook-0] INFO 
> org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter: 
> SHUTDOWN_MSG: / 
> SHUTDOWN_MSG: Shutting down StorageContainerManager at 
> om-ha-1.vpc.cloudera.com/10.65.51.49 
> /{code}






[jira] [Updated] (HDDS-3067) Fix Bug in Scrub Pipeline causing destroyed pipelines after SCM restart

2020-02-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3067:
-
Labels: OMHATest pull-request-available  (was: OMHATest)

> Fix Bug in Scrub Pipeline causing destroyed pipelines after SCM restart
> -
>
> Key: HDDS-3067
> URL: https://issues.apache.org/jira/browse/HDDS-3067
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: OMHATest, pull-request-available
>
> Currently, the scrubber runs as part of pipeline creation. 
> When SCM is started, the scrubber comes up and cleans up all the ALLOCATED 
> pipelines in SCM, because when loading pipelines the pipelineCreationTimeStamp 
> is set to when the pipeline was originally created.
>  
> Because of this, the condition below is satisfied and all such pipelines are 
> destroyed when SCM is restarted. This can be easily reproduced: start SCM, 
> wait for 10 minutes, and restart SCM.
>  
> {code:java}
> List<Pipeline> needToSrubPipelines = stateManager.getPipelines(type, factor,
>  Pipeline.PipelineState.ALLOCATED).stream()
>  .filter(p -> currentTime.toEpochMilli() - p.getCreationTimestamp()
>  .toEpochMilli() >= pipelineScrubTimeoutInMills)
>  .collect(Collectors.toList());
> for (Pipeline p : needToSrubPipelines) {
>  LOG.info("srubbing pipeline: id: " + p.getId().toString() +
>  " since it stays at ALLOCATED stage for " +
>  Duration.between(currentTime, p.getCreationTimestamp()).toMinutes() +
>  " mins.");
>  finalizeAndDestroyPipeline(p, false);
> }{code}
>  
> *Log showing scrubbing of pipeline*
>  
> {code:java}
> 2020-02-20 12:42:18,946 [RatisPipelineUtilsThread] INFO 
> org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager: srubbing pipeline: 
> id: PipelineID=35dff62d-9bfa-449b-b6e8-6f00cc8c1b6e since it stays at 
> ALLOCATED stage for -1003 mins.{code}
>  
>  
>  
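One possible direction for a fix, sketched below, is to measure the ALLOCATED age from a timestamp that is reset when SCM loads the pipelines, so a restart does not immediately make old pipelines eligible for scrubbing. This is an assumption about the fix, not the actual patch; the class and method names are illustrative only.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ScrubSketch {

    static final Duration SCRUB_TIMEOUT = Duration.ofMinutes(5);

    static class Pipeline {
        final String id;
        final Instant allocatedSince; // reset to "now" when SCM (re)loads it

        Pipeline(String id, Instant allocatedSince) {
            this.id = id;
            this.allocatedSince = allocatedSince;
        }
    }

    /** Scrub only pipelines that stayed ALLOCATED longer than the timeout,
     *  measured from the (re)load-aware timestamp, not original creation. */
    static List<Pipeline> needToScrub(List<Pipeline> allocated, Instant now) {
        List<Pipeline> result = new ArrayList<>();
        for (Pipeline p : allocated) {
            if (Duration.between(p.allocatedSince, now).compareTo(SCRUB_TIMEOUT) >= 0) {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        Pipeline fresh = new Pipeline("p1", now); // just (re)loaded, kept
        Pipeline stale = new Pipeline("p2", now.minus(Duration.ofMinutes(10)));
        for (Pipeline p : needToScrub(Arrays.asList(fresh, stale), now)) {
            System.out.println("scrubbing " + p.id); // prints: scrubbing p2
        }
    }
}
```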






[jira] [Updated] (HDDS-3068) OM crash during startup does not print any error message to log

2020-02-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3068:
-
Labels: OMHATest pull-request-available  (was: OMHATest)

> OM crash during startup does not print any error message to log
> ---
>
> Key: HDDS-3068
> URL: https://issues.apache.org/jira/browse/HDDS-3068
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Manager
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: OMHATest, pull-request-available
>
> During a code read I found a similar issue: we don't log errors for OM startup 
> either, as OM startup uses similar code.
>  






[jira] [Updated] (HDDS-2648) TestOzoneManagerDoubleBufferWithOMResponse

2020-02-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2648:
-
Labels: pull-request-available  (was: )

> TestOzoneManagerDoubleBufferWithOMResponse
> --
>
> Key: HDDS-2648
> URL: https://issues.apache.org/jira/browse/HDDS-2648
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Reporter: Marton Elek
>Assignee: Bharat Viswanadham
>Priority: Blocker
>  Labels: pull-request-available
>
> The test is flaky:
>  
> Example run: [https://github.com/apache/hadoop-ozone/runs/325281277]
>  
> Failure:
> {code:java}
> ---
> Test set: 
> org.apache.hadoop.ozone.om.ratis.TestOzoneManagerDoubleBufferWithOMResponse
> ---
> Tests run: 3, Failures: 1, Errors: 0, Skipped: 1, Time elapsed: 5.31 s <<< 
> FAILURE! - in 
> org.apache.hadoop.ozone.om.ratis.TestOzoneManagerDoubleBufferWithOMResponse
> testDoubleBufferWithMixOfTransactionsParallel(org.apache.hadoop.ozone.om.ratis.TestOzoneManagerDoubleBufferWithOMResponse)
>   Time elapsed: 0.282 s  <<< FAILURE!
> java.lang.AssertionError: expected:<32> but was:<29>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:555)
> at org.junit.Assert.assertEquals(Assert.java:542)
> at 
> org.apache.hadoop.ozone.om.ratis.TestOzoneManagerDoubleBufferWithOMResponse.testDoubleBufferWithMixOfTransactionsParallel(TestOzoneManagerDoubleBufferWithOMResponse.java:247)
>  {code}






[jira] [Updated] (HDDS-2816) Fix shell description for --start parameter of listing keys

2020-02-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2816:
-
Labels: newbie pull-request-available  (was: newbie)

> Fix shell description for --start parameter of listing keys
> ---
>
> Key: HDDS-2816
> URL: https://issues.apache.org/jira/browse/HDDS-2816
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: YiSheng Lien
>Priority: Minor
>  Labels: newbie, pull-request-available
>
> In the master branch, listing keys shows the description 
>  *--start= The first key to start the listing*
> We should update the description to "*The key to start the listing from. This 
> will be excluded from the result.*"






[jira] [Updated] (HDDS-2984) Allocate Block failing with NPE

2020-02-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2984:
-
Labels: pull-request-available teragentest  (was: teragentest)

> Allocate Block failing with NPE
> ---
>
> Key: HDDS-2984
> URL: https://issues.apache.org/jira/browse/HDDS-2984
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Bharat Viswanadham
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available, teragentest
> Attachments: Screen Shot 2020-02-05 at 2.48.56 PM.png
>
>
> While running teragen, this error was observed in one of the runs.
> {code:java}
> 05 14:43:16,635 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception 
> running child : INTERNAL_ERROR 
> org.apache.hadoop.ozone.om.exceptions.OMException: 
> java.lang.NullPointerException at 
> java.util.Objects.requireNonNull(Objects.java:203) at 
> java.util.Optional.(Optional.java:96) at 
> java.util.Optional.of(Optional.java:108) at 
> org.apache.hadoop.hdds.scm.pipeline.SCMPipelineMetrics.incNumBlocksAllocated(SCMPipelineMetrics.java:118)
>  at 
> org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.incNumBlocksAllocatedMetric(SCMPipelineManager.java:520)
>  at 
> org.apache.hadoop.hdds.scm.block.BlockManagerImpl.newBlock(BlockManagerImpl.java:265)
>  at 
> org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:233)
>  at 
> org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServer.java:188)
>  at 
> org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.allocateScmBlock(ScmBlockLocationProtocolServerSideTranslatorPB.java:159)
>  at 
> org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.processMessage(ScmBlockLocationProtocolServerSideTranslatorPB.java:117)
>  at 
> org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72)
>  at 
> org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.send(ScmBlockLocationProtocolServerSideTranslatorPB.java:98)
>  at 
> org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:13157)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070) at 
> org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:984) at 
> org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:912) at 
> java.security.AccessController.doPrivileged(Native Method) at 
> javax.security.auth.Subject.doAs(Subject.java:422) at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2882) at 
> org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:792)
>  at 
> org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.createFile(OzoneManagerProtocolClientSideTranslatorPB.java:1596)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.apache.hadoop.hdds.tracing.TraceAllMethod.invoke(TraceAllMethod.java:71) 
> at com.sun.proxy.$Proxy18.createFile(Unknown Source) at 
> org.apache.hadoop.ozone.client.rpc.RpcClient.createFile(RpcClient.java:1071) 
> at 
> org.apache.hadoop.ozone.client.OzoneBucket.createFile(OzoneBucket.java:538) 
> at 
> org.apache.hadoop.fs.ozone.BasicOzoneClientAdapterImpl.createFile(BasicOzoneClientAdapterImpl.java:208)
>  at 
> org.apache.hadoop.fs.ozone.BasicOzoneFileSystem.createOutputStream(BasicOzoneFileSystem.java:256)
>  at 
> org.apache.hadoop.fs.ozone.BasicOzoneFileSystem.create(BasicOzoneFileSystem.java:237)
>  at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1133) at 
> org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1113) at 
> org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1002) at 
> org.apache.hadoop.fs.FileSystem.create(FileSystem.java:990) at 
> org.apache.hadoop.examples.terasort.TeraOutputFormat.getRecordWriter(TeraOutputFormat.java:141)
>  at 
> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.(MapTask.java:659)
>  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:779) at 
> org.apache.hadoop.mapred.MapTask.run(MapTask.java:347) at 
> org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174) at 
> java.security.AccessController.doPriv

[jira] [Updated] (HDDS-3070) NPE when stop recon server while recon server was not really started before

2020-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3070:
-
Labels: pull-request-available  (was: )

> NPE when stop recon server while recon server was not really started before
> ---
>
> Key: HDDS-3070
> URL: https://issues.apache.org/jira/browse/HDDS-3070
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Recon
>Affects Versions: 0.4.1
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Minor
>  Labels: pull-request-available
>
> I hit an NPE while testing Ozone. The root cause seems to be that the 
> Recon server was never actually started, yet we still try to stop it.
> {noformat}
> 2020-02-25 20:22:44,296 [Thread-0] ERROR ozone.MiniOzoneClusterImpl 
> (MiniOzoneClusterImpl.java:build(525)) - Exception while shutting down the 
> Recon.
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.ozone.recon.tasks.ReconTaskControllerImpl.stop(ReconTaskControllerImpl.java:237)
>   at 
> org.apache.hadoop.ozone.recon.spi.impl.OzoneManagerServiceProviderImpl.stop(OzoneManagerServiceProviderImpl.java:229)
>   at org.apache.hadoop.ozone.recon.ReconServer.stop(ReconServer.java:132)
>   at 
> org.apache.hadoop.ozone.MiniOzoneClusterImpl.stopRecon(MiniOzoneClusterImpl.java:470)
>   at 
> org.apache.hadoop.ozone.MiniOzoneClusterImpl.access$200(MiniOzoneClusterImpl.java:87)
>   at 
> org.apache.hadoop.ozone.MiniOzoneClusterImpl$Builder.build(MiniOzoneClusterImpl.java:523)
>   at 
> org.apache.hadoop.fs.ozone.TestOzoneFileSystem.testFileSystem(TestOzoneFileSystem.java:72)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
> {noformat}
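A minimal sketch of one way to avoid this NPE: make stop() tolerate a component that was never started. The class and field names below are hypothetical, not the actual ReconTaskControllerImpl API.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ReconStopGuard {
    // May still be null if start() was never called, e.g. when cluster
    // startup failed partway through.
    private ExecutorService executorService;

    public void start() {
        executorService = Executors.newSingleThreadExecutor();
    }

    // Null-check before shutdown so stop() is safe to call in any state.
    public void stop() {
        if (executorService != null) {
            executorService.shutdownNow();
            executorService = null;
        }
    }

    public static void main(String[] args) {
        new ReconStopGuard().stop();  // no NPE even though start() never ran
        System.out.println("stop() before start() is a no-op");
    }
}
```

The same guard (or an AtomicBoolean "started" flag) would apply to each component that MiniOzoneClusterImpl shuts down.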






[jira] [Updated] (HDDS-3038) TestRatisPipelineLeader fails since we no longer wait for leader in the HealthyPipelineSafeModeExitRule

2020-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3038:
-
Labels: pull-request-available  (was: )

> TestRatisPipelineLeader fails since we no longer wait for leader in the 
> HealthyPipelineSafeModeExitRule
> ---
>
> Key: HDDS-3038
> URL: https://issues.apache.org/jira/browse/HDDS-3038
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Reporter: Attila Doroszlai
>Assignee: Siddharth Wagle
>Priority: Major
>  Labels: pull-request-available
>
> {code:title=https://github.com/apache/hadoop-ozone/runs/456217344}
> [ERROR] Tests run: 2, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 
> 29.823 s <<< FAILURE! - in org.apache.hadoop.hdds.scm.TestRatisPipelineLeader
> [ERROR] 
> testLeaderIdAfterLeaderChange(org.apache.hadoop.hdds.scm.TestRatisPipelineLeader)
>   Time elapsed: 5.367 s  <<< FAILURE!
> java.lang.AssertionError
>   ...
>   at 
> org.apache.hadoop.hdds.scm.TestRatisPipelineLeader.verifyLeaderInfo(TestRatisPipelineLeader.java:125)
>   at 
> org.apache.hadoop.hdds.scm.TestRatisPipelineLeader.testLeaderIdAfterLeaderChange(TestRatisPipelineLeader.java:106)
> [ERROR] 
> testLeaderIdUsedOnFirstCall(org.apache.hadoop.hdds.scm.TestRatisPipelineLeader)
>   Time elapsed: 0.008 s  <<< FAILURE!
> java.lang.AssertionError
>   ...
>   at 
> org.apache.hadoop.hdds.scm.TestRatisPipelineLeader.verifyLeaderInfo(TestRatisPipelineLeader.java:125)
>   at 
> org.apache.hadoop.hdds.scm.TestRatisPipelineLeader.testLeaderIdUsedOnFirstCall(TestRatisPipelineLeader.java:76)
> {code}






[jira] [Updated] (HDDS-3072) SCM scrub pipeline should be started after coming out of safe mode

2020-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3072:
-
Labels: pull-request-available  (was: )

> SCM scrub pipeline should be started after coming out of safe mode
> --
>
> Key: HDDS-3072
> URL: https://issues.apache.org/jira/browse/HDDS-3072
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: pull-request-available
>
> We should start scrubbing pipelines only after SCM is out of safe mode.
> Reason: right now we scrub pipelines as part of triggerPipelineCreation. If 
> we scrub pipelines that have been in the ALLOCATED state for longer than 
> "ozone.scm.pipeline.allocated.timeout", we might close some of them and then 
> never come out of safe mode, because the safe-mode rules take the pipeline 
> count from pipelineDB during initialization.
> Example scenario:
>  # Stop 3 Datanodes. 
>  # Restart SCM.
>  # Start the Datanodes after 6 minutes. SCM will never come out of safe 
> mode, because the pipelines in ALLOCATED state hit the scrubber timeout.
> To avoid this kind of scenario, it is better to scrub pipelines only after 
> SCM is out of safe mode.
>  
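As an illustration of the proposed fix, here is a minimal sketch of gating the scrubber on safe-mode exit. The class and method names are hypothetical, not the actual SCM API.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SafeModeGatedScrubber {
    private final AtomicBoolean inSafeMode = new AtomicBoolean(true);
    private int scrubbedRuns = 0;

    // Invoked by a hypothetical safe-mode listener once SCM exits safe mode.
    public void onSafeModeExit() {
        inSafeMode.set(false);
    }

    // Scrubbing is a no-op while in safe mode, so ALLOCATED pipelines are
    // not closed before the safe-mode rules have counted them.
    public boolean scrubPipelines() {
        if (inSafeMode.get()) {
            return false;
        }
        scrubbedRuns++;
        return true;
    }

    public int getScrubbedRuns() {
        return scrubbedRuns;
    }

    public static void main(String[] args) {
        SafeModeGatedScrubber s = new SafeModeGatedScrubber();
        s.scrubPipelines();       // ignored: still in safe mode
        s.onSafeModeExit();
        s.scrubPipelines();       // now takes effect
        System.out.println("runs after exit: " + s.getScrubbedRuns());
    }
}
```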






[jira] [Updated] (HDDS-3073) Implement ofs://: Fix listStatus continuation

2020-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3073:
-
Labels: pull-request-available  (was: )

> Implement ofs://: Fix listStatus continuation
> -
>
> Key: HDDS-3073
> URL: https://issues.apache.org/jira/browse/HDDS-3073
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
>
> Supplement to HDDS-2928






[jira] [Updated] (HDDS-3002) Make the Mountd working for Ozone

2020-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3002:
-
Labels: pull-request-available  (was: )

> Make the Mountd working for Ozone
> -
>
> Key: HDDS-3002
> URL: https://issues.apache.org/jira/browse/HDDS-3002
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Prashant Pogde
>Assignee: Prashant Pogde
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (HDDS-2996) Create REST API to serve Node information and integrate with UI in Recon.

2020-02-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2996:
-
Labels: pull-request-available  (was: )

> Create REST API to serve Node information and integrate with UI in Recon.
> -
>
> Key: HDDS-2996
> URL: https://issues.apache.org/jira/browse/HDDS-2996
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Recon
>Affects Versions: 0.5.0
>Reporter: Aravindan Vijayan
>Assignee: Vivek Ratnavel Subramanian
>Priority: Major
>  Labels: pull-request-available
>
> We need a REST API in Recon to serve up information for the Datanodes page 
> (HDDS-2827). The REST API can also include other useful methods present in 
> NodeManager that give the user information about the nodes in the cluster.






[jira] [Updated] (HDDS-3084) Smoketest to write data on network aware cluster

2020-02-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3084:
-
Labels: pull-request-available  (was: )

> Smoketest to write data on network aware cluster
> 
>
> Key: HDDS-3084
> URL: https://issues.apache.org/jira/browse/HDDS-3084
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Affects Versions: 0.6.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
>
> It would be good to create a smoke test which:
> 1. Writes some data on a network-aware cluster
> 2. Stops one rack and ensures the data is still readable
> 3. Restarts that rack, stops the other rack, and again checks the data is 
> readable
> That way we can have some confidence the data is being written to both racks 
> correctly.
> One issue with a test like this on a small cluster is that there is a high 
> chance the data will end up on 2 racks naturally, even if no network topology 
> is configured. In that case we would expect intermittent test failures; 
> however, if network topology is working correctly, we would not expect any 
> failures.






[jira] [Updated] (HDDS-3069) Delete key is failing

2020-02-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3069:
-
Labels: pull-request-available  (was: )

> Delete key is failing
> -
>
> Key: HDDS-3069
> URL: https://issues.apache.org/jira/browse/HDDS-3069
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Reporter: Nilotpal Nandi
>Assignee: Bharat Viswanadham
>Priority: Blocker
>  Labels: pull-request-available
>
> Delete key is failing . Here is the stack trace of the failure:
>  
>  
> {noformat}
> INFO retry.RetryInvocationHandler: com.google.protobuf.ServiceException: 
> org.apache.hadoop.ipc.RemoteException(java.lang.IllegalArgumentException): 
> Trying to set updateID to 26 which is not greater than the current value of 
> 433 for OMKeyInfo{volume='vol-test-restartcomponentozonereaddata-1582093704', 
> bucket='buck-test-restartcomponentozonereaddata-1582093704', 
> key='ReadOzoneFile_1582093709', dataSize='10485760', 
> creationTime='1582093712218', type='RATIS', factor='THREE'} E at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:142) E 
> at 
> org.apache.hadoop.ozone.om.helpers.WithObjectID.setUpdateID(WithObjectID.java:79)
>  E at 
> org.apache.hadoop.ozone.om.request.key.OMKeyDeleteRequest.validateAndUpdateCache(OMKeyDeleteRequest.java:147)
>  E at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:230)
>  E at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB.java:210)
>  E at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:130)
>  E at 
> org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72)
>  E at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:98)
>  E at 
> org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java)
>  E at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
>  E at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070) E at 
> org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:984) E at 
> org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:912) E at 
> java.base/java.security.AccessController.doPrivileged(Native Method) E at 
> java.base/javax.security.auth.Subject.doAs(Subject.java:423) E at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>  E at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2882) E , while 
> invoking $Proxy16.submitRequest over 
> nodeId=null,nodeAddress=quasar-vbncen-3.quasar-vbncen.root.hwx.site:9862. 
> Trying to failover immediately.
>  
> ..
> ..
> ..
> ..
>  
> 20/02/19 03:37:17 INFO retry.RetryInvocationHandler: 
> com.google.protobuf.ServiceException: 
> org.apache.hadoop.ipc.RemoteException(java.lang.IllegalArgumentException): 
> Trying to set updateID to 22 which is not greater than the current value of 
> 1143 for OMKeyInfo{volume='vol-test-kill-datanode-1582075168', 
> bucket='buck-test-kill-datanode-1582075168', 
> key='replication_test1_1582075173', dataSize='104857600', 
> creationTime='1582075177268', type='RATIS', factor='THREE'} E at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:142) E 
> at 
> org.apache.hadoop.ozone.om.helpers.WithObjectID.setUpdateID(WithObjectID.java:79)
>  E at 
> org.apache.hadoop.ozone.om.request.key.OMKeyDeleteRequest.validateAndUpdateCache(OMKeyDeleteRequest.java:147)
>  E at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:230)
>  E at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB.java:210)
>  E at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:130)
>  E at 
> org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72)
>  E at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:98)
>  E at 
> org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java)
>  E at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(Proto

[jira] [Updated] (HDDS-2929) Implement ofs://: temp directory mount

2020-02-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2929:
-
Labels: pull-request-available  (was: )

> Implement ofs://: temp directory mount
> --
>
> Key: HDDS-2929
> URL: https://issues.apache.org/jira/browse/HDDS-2929
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
>
> Because the ofs:// filesystem hierarchy starts with a volume and then a 
> bucket, an application typically won't be able to write directly under a 
> first-level folder, e.g. ofs://service-id1/tmp/. /tmp/ is a special case 
> that we need to handle, since it is the default location most legacy Hadoop 
> applications write to for swap/temporary files. To address this, we would 
> introduce /tmp/ as a client-side "mount" of another Ozone bucket.
> Note that the preliminary implementation only allows /tmp/ to be a mount, 
> not an arbitrary user-defined path.
> This depends on HDDS-2840 and HDDS-2928.
> Demo PR on my fork: https://github.com/smengcl/hadoop-ozone/pull/1
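To make the mount idea concrete, here is a hedged sketch of client-side path resolution, where "/tmp/..." is redirected to a backing volume/bucket. The mount target names and the helper itself are illustrative assumptions, not the actual ofs implementation.

```java
public class OfsTmpMountSketch {
    // Hypothetical volume/bucket backing the client-side /tmp mount.
    static final String TMP_MOUNT_VOLUME = "tmp";
    static final String TMP_MOUNT_BUCKET = "tmp";

    // Resolve an ofs path into {volume, bucket, key}. "/tmp/..." is
    // redirected to the mount bucket; everything else is parsed as
    // /volume/bucket/key.
    static String[] resolve(String path) {
        String p = path.startsWith("/") ? path.substring(1) : path;
        if (p.equals("tmp") || p.startsWith("tmp/")) {
            String key = p.length() > 4 ? p.substring(4) : "";
            return new String[] {TMP_MOUNT_VOLUME, TMP_MOUNT_BUCKET, key};
        }
        String[] parts = p.split("/", 3);
        return new String[] {
            parts[0],
            parts.length > 1 ? parts[1] : "",
            parts.length > 2 ? parts[2] : ""
        };
    }

    public static void main(String[] args) {
        String[] parts = resolve("/tmp/app1/part-0");
        System.out.println(parts[0] + "/" + parts[1] + "/" + parts[2]);
    }
}
```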






[jira] [Updated] (HDDS-3041) memory leak of s3g

2020-02-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3041:
-
Labels: pull-request-available  (was: )

> memory leak of s3g
> --
>
> Key: HDDS-3041
> URL: https://issues.apache.org/jira/browse/HDDS-3041
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: S3
>Affects Versions: 0.6.0
>Reporter: JieWang
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2020-02-24-12-06-22-248.png, 
> image-2020-02-24-12-10-09-552.png, image-2020-02-26-17-11-31-834.png, 
> screenshot-1.png, screenshot-2.png, screenshot-3.png, screenshot-4.png, 
> screenshot-5.png
>
>







[jira] [Updated] (HDDS-3090) Fix logging in OMFileRequest and OzoneManager

2020-02-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3090:
-
Labels: pull-request-available  (was: )

> Fix logging in OMFileRequest and OzoneManager
> -
>
> Key: HDDS-3090
> URL: https://issues.apache.org/jira/browse/HDDS-3090
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Manager
>Reporter: Supratim Deka
>Assignee: Supratim Deka
>Priority: Trivial
>  Labels: pull-request-available
>
> HDDS-2940 introduced an INFO-level log in 
> hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/file/OMFileCreateRequest.java
> This needs to be TRACE, because it occurs on the regular file-create path.
> Also, the trace logs introduced in OzoneManager and OMFileRequest.java need 
> to be parameterized.
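Parameterized logging matters because slf4j's `LOG.trace("text {}", arg)` defers formatting until the level is known to be enabled, whereas `LOG.trace("text " + arg)` builds the string on every call. The tiny stand-in below (not slf4j itself) demonstrates the difference:

```java
public class ParameterizedLogDemo {
    static boolean traceEnabled = false;
    static int messagesBuilt = 0;

    // Stand-in for slf4j's parameterized logging: the message is only
    // formatted when the level is enabled, mirroring
    //   LOG.trace("created file {}", path);
    // rather than
    //   LOG.trace("created file " + path);  // concatenates even when disabled
    static void trace(String template, Object arg) {
        if (!traceEnabled) {
            return;  // argument formatting is skipped entirely
        }
        messagesBuilt++;
        System.out.println(template.replace("{}", String.valueOf(arg)));
    }

    public static void main(String[] args) {
        trace("created file {}", "key1");   // TRACE off: nothing built or printed
        traceEnabled = true;
        trace("created file {}", "key1");   // TRACE on: prints "created file key1"
    }
}
```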






[jira] [Updated] (HDDS-3078) Include output of timed out test in bundle

2020-02-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3078:
-
Labels: pull-request-available  (was: )

> Include output of timed out test in bundle
> --
>
> Key: HDDS-3078
> URL: https://issues.apache.org/jira/browse/HDDS-3078
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: build, test
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
>
> Sometimes a unit/integration test does not complete, nor does it crash.  We 
> should collect the output of such tests in the result bundle for analysis.
> Example:
> {code:title=https://github.com/adoroszlai/hadoop-ozone/runs/469172863}
> 2020-02-26T08:15:58.2297584Z [INFO] Running 
> org.apache.hadoop.ozone.freon.TestRandomKeyGenerator
> 2020-02-26T08:30:59.6189916Z [INFO] Running 
> org.apache.hadoop.ozone.freon.TestDataValidateWithUnsafeByteOperations
> ...
> 2020-02-26T08:32:47.6155975Z [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M1:test (default-test) 
> on project hadoop-ozone-integration-test: There was a timeout or other error 
> in the fork
> {code}
> In this case TestRandomKeyGenerator had this problem.  It might be a bit 
> tricky to find such tests, since these are not explicitly listed at the end, 
> unlike failed or crashed tests.






[jira] [Updated] (HDDS-3092) Duplicate large key test

2020-02-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3092:
-
Labels: pull-request-available  (was: )

> Duplicate large key test
> 
>
> Key: HDDS-3092
> URL: https://issues.apache.org/jira/browse/HDDS-3092
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Minor
>  Labels: pull-request-available
>
> {{TestDataValidate}} has 2 large key tests:
> * {{ratisTestLargeKey}}
> * {{standaloneTestLargeKey}}
> But both of these test RATIS/3 replication since HDDS-675.  I think 
> {{standaloneTestLargeKey}} can be removed.






[jira] [Updated] (HDDS-3108) Remove unused ForkJoinPool in RatisPipelineProvider

2020-02-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3108:
-
Labels: pull-request-available  (was: )

> Remove unused ForkJoinPool in RatisPipelineProvider
> ---
>
> Key: HDDS-3108
> URL: https://issues.apache.org/jira/browse/HDDS-3108
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.6.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
>
> The RatisPipelineProvider has a ForkJoinPool that is never used anywhere 
> except where it is shut down.
> I suspect it was used previously, some refactoring made it redundant, and it 
> got left behind.
> This jira is to remove that unused code.






[jira] [Updated] (HDDS-3109) Refactor 'Recon' in MiniOzoneCluster to use ephemeral port.

2020-02-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3109:
-
Labels: pull-request-available  (was: )

> Refactor 'Recon' in MiniOzoneCluster to use ephemeral port.
> ---
>
> Key: HDDS-3109
> URL: https://issues.apache.org/jira/browse/HDDS-3109
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Aravindan Vijayan
>Assignee: Aravindan Vijayan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>
> Currently, Recon uses an ephemeral port only in its own integration test. 
> All other integration tests end up using the default (9888), which causes 
> failures in tests that start up a MiniOzoneCluster. In addition, we want to 
> start Recon in MiniOzoneCluster only when explicitly requested, rather than 
> by default.
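The usual way to get an ephemeral port in a mini-cluster is to bind to port 0 and let the OS pick. A sketch of that pattern (the helper name is illustrative; in practice MiniOzoneCluster could equally set Recon's HTTP port config to 0):

```java
import java.io.IOException;
import java.net.ServerSocket;

public class EphemeralPortSketch {
    // Binding to port 0 asks the OS for any free port, so tests that start
    // several clusters concurrently never collide on Recon's default 9888.
    static int pickFreePort() throws IOException {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Recon HTTP port for this run: " + pickFreePort());
    }
}
```

Note that closing the probe socket before the real server binds leaves a small race window; binding the server itself to port 0 avoids it entirely.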






[jira] [Updated] (HDDS-3085) OM Delta updates request in Recon should work with secure Ozone Manager.

2020-02-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3085:
-
Labels: pull-request-available  (was: )

> OM Delta updates request in Recon should work with secure Ozone Manager.
> 
>
> Key: HDDS-3085
> URL: https://issues.apache.org/jira/browse/HDDS-3085
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Recon
>Reporter: Aravindan Vijayan
>Assignee: Vivek Ratnavel Subramanian
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (HDDS-2911) lastUsed and stateEnterTime value in container info is not human friendly

2020-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2911:
-
Labels: newbie pull-request-available  (was: newbie)

> lastUsed and stateEnterTime value in container info is not human friendly
> -
>
> Key: HDDS-2911
> URL: https://issues.apache.org/jira/browse/HDDS-2911
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Sammi Chen
>Assignee: mingchao zhao
>Priority: Major
>  Labels: newbie, pull-request-available
>
> ozone scmcli container list -s=7
> {
>   "state" : "CLOSED",
>   "replicationFactor" : "THREE",
>   "replicationType" : "RATIS",
>   "usedBytes" : 4794248299,
>   "numberOfKeys" : 7649,
>   "lastUsed" : 5388521335,
>   "stateEnterTime" : 808947405,
>   "owner" : "a46123a8-be63-4736-9478-ce4d8ac845cc",
>   "containerID" : 8,
>   "deleteTransactionId" : 0,
>   "sequenceId" : 0,
>   "open" : false
> }
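One way to make these fields human-friendly is to render them as ISO-8601 timestamps instead of raw longs. This sketch assumes the stored value is epoch millis; if it is actually a relative/monotonic clock reading (which the small numbers above suggest), it would first need to be rebased onto the wall clock.

```java
import java.time.Instant;

public class ContainerTimeFormat {
    // Render a raw epoch-millis value as an ISO-8601 instant string.
    static String human(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).toString();
    }

    public static void main(String[] args) {
        // Values taken from the container info above, assumed to be millis.
        System.out.println("lastUsed: " + human(5388521335L));
        System.out.println("stateEnterTime: " + human(808947405L));
    }
}
```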






[jira] [Updated] (HDDS-3111) Add unit test for container replication behavior under different container placement policy

2020-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3111:
-
Labels: pull-request-available  (was: )

> Add unit test for container replication behavior under different container 
> placement policy
> ---
>
> Key: HDDS-3111
> URL: https://issues.apache.org/jira/browse/HDDS-3111
> Project: Hadoop Distributed Data Store
>  Issue Type: Test
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Major
>  Labels: pull-request-available
>
> Currently, the unit tests for ReplicationManager only cover container state 
> changes, and the container placement policy tests only focus on the 
> placement algorithm itself.
> We lack an integration test of container replication behavior under the 
> different container placement policies, including corner cases such as not 
> having enough candidate nodes, or fallback in the rack awareness policy.






[jira] [Updated] (HDDS-2877) Fix description of return type

2020-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2877:
-
Labels: newbie pull-request-available  (was: newbie)

> Fix description of return type
> --
>
> Key: HDDS-2877
> URL: https://issues.apache.org/jira/browse/HDDS-2877
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: YiSheng Lien
>Priority: Minor
>  Labels: newbie, pull-request-available
>
> In this 
> [method|https://github.com/apache/hadoop-ozone/blob/master/hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/OzoneBucket.java#L616#L623],
>  the return type is *List*.
> The description of the return type is currently *List*; we should 
> update it to *List*






[jira] [Updated] (HDDS-3106) Intermittent timeout in TestOzoneManagerDoubleBufferWithOMResponse#testDoubleBuffer

2020-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3106:
-
Labels: pull-request-available  (was: )

> Intermittent timeout in 
> TestOzoneManagerDoubleBufferWithOMResponse#testDoubleBuffer
> ---
>
> Key: HDDS-3106
> URL: https://issues.apache.org/jira/browse/HDDS-3106
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Critical
>  Labels: pull-request-available
>
> {code:title=https://github.com/adoroszlai/hadoop-ozone/runs/474452740}
> [ERROR] Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 505.227 s <<< FAILURE! - in 
> org.apache.hadoop.ozone.om.ratis.TestOzoneManagerDoubleBufferWithOMResponse
> [ERROR] 
> testDoubleBuffer(org.apache.hadoop.ozone.om.ratis.TestOzoneManagerDoubleBufferWithOMResponse)
>   Time elapsed: 500.142 s  <<< ERROR!
> java.lang.Exception: test timed out after 50 milliseconds
>   at 
> org.apache.hadoop.ozone.om.ratis.TestOzoneManagerDoubleBufferWithOMResponse.testDoubleBuffer(TestOzoneManagerDoubleBufferWithOMResponse.java:394)
>   at 
> org.apache.hadoop.ozone.om.ratis.TestOzoneManagerDoubleBufferWithOMResponse.testDoubleBuffer(TestOzoneManagerDoubleBufferWithOMResponse.java:130)
> {code}
> Also in: 
> https://github.com/apache/hadoop-ozone/pull/590/checks?check_run_id=467388979
> CC [~bharat]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3113) Add new Freon test for putBlock

2020-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3113:
-
Labels: pull-request-available  (was: )

> Add new Freon test for putBlock
> ---
>
> Key: HDDS-3113
> URL: https://issues.apache.org/jira/browse/HDDS-3113
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: test
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
>
> The goal of this task is to introduce a new Freon test that issues putBlock 
> commands.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2941) file create : create key table entries for intermediate directories in the path

2020-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2941:
-
Labels: pull-request-available  (was: )

> file create : create key table entries for intermediate directories in the 
> path
> ---
>
> Key: HDDS-2941
> URL: https://issues.apache.org/jira/browse/HDDS-2941
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Manager
>Reporter: Supratim Deka
>Assignee: Supratim Deka
>Priority: Major
>  Labels: pull-request-available
>
> Similar to, and a follow-up of, HDDS-2940,
> this change covers the file create request handler in the OM.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2980) Delete replayed entry from OpenKeyTable during commit

2020-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2980:
-
Labels: pull-request-available  (was: )

> Delete replayed entry from OpenKeyTable during commit
> -
>
> Key: HDDS-2980
> URL: https://issues.apache.org/jira/browse/HDDS-2980
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Blocker
>  Labels: pull-request-available
>
> During KeyCreate (and S3InitiateMultipartUpload), we do not check the 
> OpenKeyTable to see whether the key already exists. If it does exist and the 
> transaction is replayed, we simply overwrite the key in OpenKeyTable. This is 
> done to avoid extra DB reads.
> During KeyCommit (or S3MultipartUploadCommit), if the key was already 
> committed, then we do not replay the transaction. This would result in the 
> OpenKeyTable entry remaining in the DB until it is garbage collected. 
> To avoid storing stale entries in OpenKeyTable, during commit replays we 
> should check the OpenKeyTable and delete the entry if it exists.
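
A minimal sketch of the replay-safe commit described above. This is not the real OM code: the table layout (plain in-memory maps) and the method and key names are assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch (assumed names, in-memory maps standing in for RocksDB tables) of a
 * commit handler that stays idempotent under Ratis replay: a replayed commit
 * is not re-applied, but any stale OpenKeyTable entry is still deleted.
 */
public class ReplaySafeCommit {
  public final Map<String, String> openKeyTable = new HashMap<>();
  public final Map<String, String> keyTable = new HashMap<>();

  public void commitKey(String openKey, String committedKey, String keyInfo) {
    if (keyTable.containsKey(committedKey)) {
      // Replayed transaction: the key is already committed. Skip the commit
      // itself, but purge the stale open entry a replayed create left behind.
      openKeyTable.remove(openKey);
      return;
    }
    // Normal path: move the key from the open table to the key table.
    keyTable.put(committedKey, keyInfo);
    openKeyTable.remove(openKey);
  }
}
```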



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3115) NPE seen in datanode log as ApplyTransaction failed

2020-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3115:
-
Labels: pull-request-available  (was: )

> NPE seen in datanode log as ApplyTransaction failed
> ---
>
> Key: HDDS-3115
> URL: https://issues.apache.org/jira/browse/HDDS-3115
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Nilotpal Nandi
>Assignee: Siddharth Wagle
>Priority: Major
>  Labels: pull-request-available
>
> Errors seen in datanode log. 
>  
> {noformat}
> 1:04:32.860 PM ERROR ContainerStateMachine
> gid group-00234F8B3578 : ApplyTransaction failed. cmd PutBlock logIndex 56 
> msg : ContainerID 16 does not exist Container Result: CONTAINER_NOT_FOUND
> 1:04:32.860 PM ERROR XceiverServerRatis
> pipeline Action CLOSE on pipeline 
> PipelineID=b9601efc-f8bf-4b72-8077-00234f8b3578.Reason : Ratis Transaction 
> failure in datanode 2ba0ecb0-0739-4da9-9541-5fef23479f28 with role FOLLOWER 
> .Triggering pipeline close action.
> 1:04:32.860 PM ERROR ContainerStateMachine
> gid group-00234F8B3578 : ApplyTransaction failed. cmd WriteChunk logIndex 59 
> exception {}
> java.lang.NullPointerException
>  at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:226)
>  at 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:162)
>  at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:396)
>  at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:406)
>  at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$applyTransaction$6(ContainerStateMachine.java:745)
>  at 
> java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  at java.base/java.lang.Thread.run(Thread.java:834){noformat}
>  
> {noformat}
> 1:04:32.861 PM ERROR ContainerStateMachine
> gid group-00234F8B3578 : ApplyTransaction failed. cmd PutBlock logIndex 60 
> msg : ContainerID 16 does not exist Container Result: CONTAINER_NOT_FOUND
> 1:04:32.861 PM ERROR XceiverServerRatis
> pipeline Action CLOSE on pipeline 
> PipelineID=b9601efc-f8bf-4b72-8077-00234f8b3578.Reason : Ratis Transaction 
> failure in datanode 2ba0ecb0-0739-4da9-9541-5fef23479f28 with role FOLLOWER 
> .Triggering pipeline close action.
> 1:04:32.904 PM ERROR StateContext
> Critical error occurred in StateMachine, setting shutDownMachine
> 1:04:34.862 PM ERROR DatanodeStateMachine
> DatanodeStateMachine Shutdown due to an critical error{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3118) Possible deadlock in LockManager

2020-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3118:
-
Labels: pull-request-available  (was: )

> Possible deadlock in LockManager
> 
>
> Key: HDDS-3118
> URL: https://issues.apache.org/jira/browse/HDDS-3118
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Attila Doroszlai
>Assignee: Bharat Viswanadham
>Priority: Blocker
>  Labels: pull-request-available
> Attachments: repro.log, repro.patch
>
>
> {{LockManager}} has a possible deadlock.
> # Number of locks is limited by using a {{GenericObjectPool}}.  If N locks 
> are already acquired, new requestors need to wait.  This wait in 
> {{getLockForLocking}} happens in a callback executed from 
> {{ConcurrentHashMap#compute}} while holding a lock on a map entry.
> # While releasing a lock, {{decrementActiveLockCount}} implicitly requires a 
> lock on an entry in {{ConcurrentHashMap}}.
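
The usual fix for this class of deadlock is to never block inside a {{ConcurrentHashMap#compute}} callback. Below is a minimal sketch of that pattern, not the actual LockManager fix: the class and method names are invented, and a {{Semaphore}} stands in for {{GenericObjectPool}}'s size limit.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

/**
 * Sketch of the safe ordering: the potentially-blocking pool wait happens
 * BEFORE touching the map, so no map-entry lock is held while waiting, and
 * release never needs a map-entry lock that an acquirer is sitting on.
 */
public class PooledLockManager {
  private final ConcurrentHashMap<String, Object> activeLocks =
      new ConcurrentHashMap<>();
  private final Semaphore pool; // stands in for the bounded lock pool

  public PooledLockManager(int maxLocks) {
    pool = new Semaphore(maxLocks);
  }

  public Object acquire(String resource) throws InterruptedException {
    pool.acquire(); // may block, but outside any map computation
    return activeLocks.computeIfAbsent(resource, r -> new Object());
  }

  public void release(String resource) {
    activeLocks.remove(resource);
    pool.release();
  }

  public int available() {
    return pool.availablePermits();
  }
}
```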



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1008) Invalidate closed container replicas on a failed volume

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-1008:
-
Labels: pull-request-available  (was: )

> Invalidate closed container replicas on a failed volume
> ---
>
> Key: HDDS-1008
> URL: https://issues.apache.org/jira/browse/HDDS-1008
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Reporter: Arpit Agarwal
>Assignee: Supratim Deka
>Priority: Major
>  Labels: pull-request-available
>
> When a volume is detected as failed, all closed containers on the volume 
> should be marked as invalid.
> Open containers will be handled separately.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3110) Fix race condition in Recon's container and pipeline handling.

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3110:
-
Labels: pull-request-available  (was: )

> Fix race condition in Recon's container and pipeline handling.
> --
>
> Key: HDDS-3110
> URL: https://issues.apache.org/jira/browse/HDDS-3110
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Recon
>Reporter: Aravindan Vijayan
>Assignee: Aravindan Vijayan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>
> Fix the following issues in Recon:
> * Both the incremental container report handler and the regular container 
> report handler add new containers from SCM whenever they see a new container. 
> This test-and-add step must be synchronized between the two handlers to avoid 
> any inconsistent metadata state.
> * NodeStateMap does not allow adding a single container to the Map of 
> Node -> Set of Containers, since it instantiates the value with 
> Collections.emptySet() and then relies on a map.put() to update it. Changing 
> this to a "new HashSet" allows containers to be added one by one, which is 
> possible in Recon.
> * Improve logging in Recon Container Manager when it receives a container 
> report from a node before receiving the pipeline report for a newly created 
> pipeline.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3071) Datanodes unable to connect to recon in Secure Environment

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3071:
-
Labels: pull-request-available  (was: )

> Datanodes unable to connect to recon in Secure Environment
> --
>
> Key: HDDS-3071
> URL: https://issues.apache.org/jira/browse/HDDS-3071
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Recon
>Affects Versions: 0.6.0
>Reporter: Vivek Ratnavel Subramanian
>Assignee: Aravindan Vijayan
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-3071.patch
>
>
> Datanodes throw this exception while connecting to recon.
> {code:java}
> datanode_1  | java.io.IOException: DestHost:destPort recon:9891 , 
> LocalHost:localPort 6a99ad69685d/192.168.48.4:0. Failed on local exception: 
> java.io.IOException: Couldn't set up IO streams: 
> java.lang.IllegalArgumentException: Empty nameString not alloweddatanode_1  | 
> java.io.IOException: DestHost:destPort recon:9891 , LocalHost:localPort 
> 6a99ad69685d/192.168.48.4:0. Failed on local exception: java.io.IOException: 
> Couldn't set up IO streams: java.lang.IllegalArgumentException: Empty 
> nameString not alloweddatanode_1  |  at 
> java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native
>  Method)datanode_1  |  at 
> java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)datanode_1
>   |  at 
> java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)datanode_1
>   |  at 
> java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)datanode_1
>   |  at 
> org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)datanode_1  
> |  at 
> org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:806)datanode_1  |  
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1515)datanode_1  | 
>  at org.apache.hadoop.ipc.Client.call(Client.java:1457)datanode_1  |  at 
> org.apache.hadoop.ipc.Client.call(Client.java:1367)datanode_1  |  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)datanode_1
>   |  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)datanode_1
>   |  at com.sun.proxy.$Proxy40.submitRequest(Unknown Source)datanode_1  |  at 
> org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.submitRequest(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:116)datanode_1
>   |  at 
> org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.getVersion(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:132)datanode_1
>   |  at 
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:71)datanode_1
>   |  at 
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42)datanode_1
>   |  at 
> java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)datanode_1  
> |  at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)datanode_1
>   |  at 
> java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)datanode_1  
> |  at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)datanode_1
>   |  at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)datanode_1
>   |  at java.base/java.lang.Thread.run(Thread.java:834)datanode_1  | Caused 
> by: java.io.IOException: Couldn't set up IO streams: 
> java.lang.IllegalArgumentException: Empty nameString not alloweddatanode_1  | 
>  at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:866)datanode_1
>   |  at 
> org.apache.hadoop.ipc.Client$Connection.access$3700(Client.java:411)datanode_1
>   |  at 
> org.apache.hadoop.ipc.Client.getConnection(Client.java:1572)datanode_1  |  at 
> org.apache.hadoop.ipc.Client.call(Client.java:1403)datanode_1  |  ... 14 
> moredatanode_1  | Caused by: java.lang.IllegalArgumentException: Empty 
> nameString not alloweddatanode_1  |  at 
> java.security.jgss/sun.security.krb5.PrincipalName.validateNameStrings(PrincipalName.java:174)datanode_1
>   |  at 
> java.security.jgss/sun.security.krb5.PrincipalName.(PrincipalName.java:397)datanode_1
>   |  at 
> java.security.jgss/sun.security.krb5.PrincipalName.(PrincipalName.java:471)datanode_1
>   |  at 
> java.security.jgss/javax.security.auth.kerberos.KerberosPrincipal.(KerberosPrincipal.java:172)datanode_1
>   |  at 
> org.apache.hadoop.security.SaslRpcClient.getServerPrincipal(SaslRpcClient.java:305)datanode_1
>   |  at 
> or

[jira] [Updated] (HDDS-3116) Datanode sometimes fails to start with NPE when starting Ratis xceiver server

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3116:
-
Labels: pull-request-available  (was: )

> Datanode sometimes fails to start with NPE when starting Ratis xceiver server
> -
>
> Key: HDDS-3116
> URL: https://issues.apache.org/jira/browse/HDDS-3116
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.5.0
>Reporter: Stephen O'Donnell
>Assignee: Nanda kumar
>Priority: Major
>  Labels: pull-request-available
> Attachments: full_logs.txt
>
>
> While working on a network Topology test (HDDS-3084) which does the following:
> 1. Start a cluster with 6 DNs and 2 racks.
> 2. Create a volume, bucket and a single key.
> 3. Stop one rack of hosts using "docker-compose down"
> 4. Read the data from the single key
> 5. Start the 3 down hosts
> 6. Stop the other 3 hosts
> 7. Attempt to read the key again.
> At step 5 I sometimes see this stack trace in one of the DNs, and it fails to 
> fully come up:
> {code}
> 2020-03-02 13:01:31,887 [Datanode State Machine Thread - 0] INFO 
> ozoneimpl.OzoneContainer: Attempting to start container services.
> 2020-03-02 13:01:31,887 [Datanode State Machine Thread - 0] INFO 
> ozoneimpl.OzoneContainer: Background container scanner has been disabled.
> 2020-03-02 13:01:31,887 [Datanode State Machine Thread - 0] INFO 
> ratis.XceiverServerRatis: Starting XceiverServerRatis 
> 8c1178dd-c44d-49d1-b899-cc3e40ae8f23 at port 9858
> 2020-03-02 13:01:31,887 [Datanode State Machine Thread - 0] WARN 
> statemachine.EndpointStateMachine: Unable to communicate to SCM server at 
> scm:9861 for past 15000 seconds.
> java.io.IOException: java.lang.NullPointerException
>   at org.apache.ratis.util.IOUtils.asIOException(IOUtils.java:54)
>   at org.apache.ratis.util.IOUtils.toIOException(IOUtils.java:61)
>   at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:70)
>   at 
> org.apache.ratis.server.impl.RaftServerProxy.getImpls(RaftServerProxy.java:284)
>   at 
> org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:296)
>   at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.start(XceiverServerRatis.java:418)
>   at 
> org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.start(OzoneContainer.java:232)
>   at 
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:113)
>   at 
> org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42)
>   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>   at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.sendPipelineReport(XceiverServerRatis.java:757)
>   at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.notifyGroupAdd(XceiverServerRatis.java:739)
>   at 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.initialize(ContainerStateMachine.java:218)
>   at 
> org.apache.ratis.server.impl.ServerState.initStatemachine(ServerState.java:160)
>   at org.apache.ratis.server.impl.ServerState.(ServerState.java:112)
>   at 
> org.apache.ratis.server.impl.RaftServerImpl.(RaftServerImpl.java:112)
>   at 
> org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$2(RaftServerProxy.java:208)
>   at 
> java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
>   ... 3 more
> {code}
> The DN does not recover from this automatically, although I confirmed that a 
> full cluster restart fixed it (docker-compose stop; docker-compose start). I 
> will try to confirm if a restart of the stuck DN would fix it or not too.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3119) When ratis is enabled in OM, double Buffer metrics not getting updated

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3119:
-
Labels: pull-request-available  (was: )

> When ratis is enabled in OM, double Buffer metrics not getting updated 
> ---
>
> Key: HDDS-3119
> URL: https://issues.apache.org/jira/browse/HDDS-3119
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Reporter: Bharat Viswanadham
>Assignee: Aravindan Vijayan
>Priority: Blocker
>  Labels: pull-request-available
>
> DoubleBuffer metrics are not getting updated when Ratis is enabled in OM.
> When Ratis is not enabled there is no issue; the double buffer metrics 
> update fine.
> {code:java}
> {"name": 
> "Hadoop:service=OzoneManager,name=OzoneManagerDoubleBufferMetrics","modelerType":
>  "OzoneManagerDoubleBufferMetrics","tag.Hostname": 
> "hw13865.hitronhub.home","TotalNumOfFlushOperations": 
> 0,"TotalNumOfFlushedTransactions": 
> 0,"MaxNumberOfTransactionsFlushedInOneIteration": 0,"FlushTimeNumOps": 
> 0,"FlushTimeAvgTime": 0,"AvgFlushTransactionsInOneIteration": 0},{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3124) Time interval calculate error

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3124:
-
Labels: pull-request-available  (was: )

> Time interval calculate error 
> --
>
> Key: HDDS-3124
> URL: https://issues.apache.org/jira/browse/HDDS-3124
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
>  Labels: pull-request-available
> Attachments: screenshot-1.png
>
>
> As the screenshot shows, the time interval in the log message "Unable to 
> communicate to SCM server at scm-0.scm:9861 for past " is reported as 0, 
> 3000 seconds, but it is actually 0, 300 seconds.
>  !screenshot-1.png! 
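
An interval inflated by a factor of ten like this typically comes from mixing millisecond and second units when formatting the elapsed time. A hedged sketch of the defensive pattern (the method name is invented; this is not the actual patch):

```java
import java.util.concurrent.TimeUnit;

public class ElapsedSeconds {
  /**
   * Convert an elapsed millisecond interval to seconds explicitly via
   * TimeUnit, rather than ad-hoc division that is easy to get wrong.
   */
  static long elapsedSeconds(long startMillis, long nowMillis) {
    return TimeUnit.MILLISECONDS.toSeconds(nowMillis - startMillis);
  }
}
```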



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3129) Skip KeyTable check in OMKeyCommit

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3129:
-
Labels: pull-request-available  (was: )

> Skip KeyTable check in OMKeyCommit
> --
>
> Key: HDDS-3129
> URL: https://issues.apache.org/jira/browse/HDDS-3129
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: pull-request-available
>
> With the replay logic, we have an additional KeyTable check to detect 
> whether a transaction is a replay or not.
> In the non-HA case we don't need this check, so this Jira skips that check 
> when Ratis is not enabled.
>  
> *Ran a simple test to measure the perf impact:*
>  
> 2295 Keys/sec with Additional Key Table check
> 2824 Keys/sec with removing that check
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2874) Fix description of return in getNextListOfBuckets

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2874:
-
Labels: newbie pull-request-available  (was: newbie)

> Fix description of return in getNextListOfBuckets
> -
>
> Key: HDDS-2874
> URL: https://issues.apache.org/jira/browse/HDDS-2874
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: YiSheng Lien
>Priority: Minor
>  Labels: newbie, pull-request-available
>
> In this 
> [line|https://github.com/apache/hadoop-ozone/blob/master/hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/OzoneVolume.java#L319],
>  the description of return should be *List*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2780) Fix javadoc of OMVolume response classes

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2780:
-
Labels: newbie pull-request-available  (was: newbie)

> Fix javadoc of OMVolume response classes
> 
>
> Key: HDDS-2780
> URL: https://issues.apache.org/jira/browse/HDDS-2780
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Bharat Viswanadham
>Priority: Minor
>  Labels: newbie, pull-request-available
>
> Fix javadoc of OMVolumeCreateResponse and OMVolumeDeleteResponse
> {code:java}
> /**
>  * Response for CreateBucket request.
>  */
> public class OMVolumeCreateResponse extends OMClientResponse{code}
> This should be "Response for CreateVolume request".
>  
> {code:java}
> /**
>  * Response for CreateVolume request.
>  */
> public class OMVolumeDeleteResponse extends OMClientResponse{code}
>  
> This should be "Response for DeleteVolume request".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3123) Create REST API to serve Pipeline information and integrate with UI in Recon.

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3123:
-
Labels: pull-request-available  (was: )

> Create REST API to serve Pipeline information and integrate with UI in Recon.
> -
>
> Key: HDDS-3123
> URL: https://issues.apache.org/jira/browse/HDDS-3123
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Recon
>Affects Versions: 0.5.0
>Reporter: Vivek Ratnavel Subramanian
>Assignee: Vivek Ratnavel Subramanian
>Priority: Major
>  Labels: pull-request-available
>
> We need a REST API to serve Pipeline information in Recon and integrate with 
> existing Recon UI.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3121) Fix TestSCMPipelineBytesWrittenMetrics

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3121:
-
Labels: pull-request-available  (was: )

> Fix TestSCMPipelineBytesWrittenMetrics
> --
>
> Key: HDDS-3121
> URL: https://issues.apache.org/jira/browse/HDDS-3121
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: pull-request-available
>
> In the test, we have a Thread.sleep and then check the metric value. It would 
> be better to use GenericTestUtils.waitFor and check the value of the metric. 
> In a few of the runs we have seen this test fail.
> {code:java}
> Thread.sleep(100 * 1000L);
> metrics =
>  getMetrics(SCMPipelineMetrics.class.getSimpleName());
> for (Pipeline pipeline : cluster.getStorageContainerManager()
>  .getPipelineManager().getPipelines()) {
>  Assert.assertEquals(bytesWritten, getLongCounter(
>  SCMPipelineMetrics.getBytesWrittenMetricName(pipeline), metrics));
> }{code}
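
The suggested pattern can be sketched with a minimal stand-in for GenericTestUtils.waitFor (the real Hadoop helper has a similar shape, but this self-contained version is illustrative only): poll the condition at a short interval up to a timeout, so the assertion runs as soon as the metric catches up instead of sleeping a fixed 100 seconds.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

public class WaitFor {
  /**
   * Poll {@code check} every {@code intervalMs} until it returns true,
   * failing with TimeoutException after {@code timeoutMs}.
   */
  public static void waitFor(BooleanSupplier check, long intervalMs,
      long timeoutMs) throws InterruptedException, TimeoutException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!check.getAsBoolean()) {
      if (System.currentTimeMillis() > deadline) {
        throw new TimeoutException(
            "Condition not met within " + timeoutMs + " ms");
      }
      Thread.sleep(intervalMs);
    }
  }
}
```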



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3075) Improve query result of container info in scmcli when container doesn't exist

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3075:
-
Labels: pull-request-available  (was: )

> Improve query result of container info in scmcli when container doesn't exist
> -
>
> Key: HDDS-3075
> URL: https://issues.apache.org/jira/browse/HDDS-3075
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: YiSheng Lien
>Assignee: Siddharth Wagle
>Priority: Major
>  Labels: pull-request-available
> Attachments: scmcli container info.png
>
>
> When *ozone scmcli container info* queries a ** that doesn't 
> exist, the query result only shows *#*.
> I propose that we inform the user that the container doesn't exist.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2339) Add OzoneManager to MiniOzoneChaosCluster

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2339:
-
Labels: pull-request-available  (was: )

> Add OzoneManager to MiniOzoneChaosCluster
> -
>
> Key: HDDS-2339
> URL: https://issues.apache.org/jira/browse/HDDS-2339
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: om
>Reporter: Mukul Kumar Singh
>Assignee: Hanisha Koneru
>Priority: Major
>  Labels: pull-request-available
>
> This Jira proposes to add OzoneManager to MiniOzoneChaosCluster, now that 
> the Ozone HA implementation is done. This will help in discovering bugs in 
> Ozone Manager HA.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3131) TestMiniChaosOzoneCluster timeout

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3131:
-
Labels: pull-request-available  (was: )

> TestMiniChaosOzoneCluster timeout
> -
>
> Key: HDDS-3131
> URL: https://issues.apache.org/jira/browse/HDDS-3131
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Reporter: Attila Doroszlai
>Priority: Critical
>  Labels: pull-request-available
> Attachments: unit (1).zip, unit (2).zip
>
>
> TestMiniChaosOzoneCluster times out in CI runs rather frequently:
> https://github.com/apache/hadoop-ozone/runs/486890736
> https://github.com/apache/hadoop-ozone/runs/486890004
> https://github.com/apache/hadoop-ozone/runs/486836962
> Logs are available in "unit" artifacts.
> CC [~msingh]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-3130) Add jaeger trace span in s3gateway

2020-03-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3130:
-
Labels: pull-request-available  (was: )

> Add jaeger trace span in s3gateway
> --
>
> Key: HDDS-3130
> URL: https://issues.apache.org/jira/browse/HDDS-3130
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: runzhiwang
>Assignee: runzhiwang
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (HDDS-3132) NPE when creating RPC client

2020-03-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3132:
-
Labels: pull-request-available  (was: )

> NPE when creating RPC client
> ---
>
> Key: HDDS-3132
> URL: https://issues.apache.org/jira/browse/HDDS-3132
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Sammi Chen
>Assignee: Sammi Chen
>Priority: Major
>  Labels: pull-request-available
>
> java.io.IOException: Couldn't create RpcClient protocol
>   at 
> org.apache.hadoop.ozone.client.OzoneClientFactory.getClientProtocol(OzoneClientFactory.java:197)
>   at 
> org.apache.hadoop.ozone.client.OzoneClientFactory.getClientProtocol(OzoneClientFactory.java:173)
>   at 
> org.apache.hadoop.ozone.client.OzoneClientFactory.getClient(OzoneClientFactory.java:74)
>   at 
> org.preta.tools.ozone.benchmark.om.OmReadBenchmark.getInputKeyNames(OmReadBenchmark.java:101)
>   at 
> org.preta.tools.ozone.benchmark.om.OmReadBenchmark.execute(OmReadBenchmark.java:76)
>   at 
> org.preta.tools.ozone.benchmark.om.AbstractOmBenchmark.run(AbstractOmBenchmark.java:63)
>   at picocli.CommandLine.executeUserObject(CommandLine.java:1729)
>   at picocli.CommandLine.access$900(CommandLine.java:145)
>   at picocli.CommandLine$RunLast.handle(CommandLine.java:2101)
>   at picocli.CommandLine$RunLast.handle(CommandLine.java:2068)
>   at 
> picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:1935)
>   at picocli.CommandLine.execute(CommandLine.java:1864)
>   at org.preta.tools.ozone.Main.execute(Main.java:50)
>   at org.preta.tools.ozone.Main.main(Main.java:54)
> Caused by: java.lang.NullPointerException: Name is null
>   at java.lang.Enum.valueOf(Enum.java:236)
>   at 
> org.apache.hadoop.ozone.security.acl.IAccessAuthorizer$ACLType.valueOf(IAccessAuthorizer.java:48)
>   at 
> org.apache.hadoop.ozone.security.acl.OzoneAclConfig.getUserDefaultRights(OzoneAclConfig.java:52)
>   at 
> org.apache.hadoop.ozone.client.rpc.RpcClient.(RpcClient.java:148)
>   at 
> org.apache.hadoop.ozone.client.OzoneClientFactory.getClientProtocol(OzoneClientFactory.java:190






[jira] [Updated] (HDDS-3117) Recon throws InterruptedException while getting new snapshot from OM

2020-03-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3117:
-
Labels: pull-request-available  (was: )

> Recon throws InterruptedException while getting new snapshot from OM
> 
>
> Key: HDDS-3117
> URL: https://issues.apache.org/jira/browse/HDDS-3117
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Recon
>Affects Versions: 0.5.0
>Reporter: Vivek Ratnavel Subramanian
>Assignee: Aravindan Vijayan
>Priority: Major
>  Labels: pull-request-available
>
> Recon throws the following exception in a cluster with 21 datanodes and 1 
> million keys
> {code:java}
> 12:46:23.482 PMINFOOzoneManagerServiceProviderImplObtaining full snapshot 
> from Ozone Manager
> 12:47:08.072 PMINFOOzoneManagerServiceProviderImplGot new checkpoint from OM 
> : /var/lib/hadoop-ozone/recon/om/data/om.snapshot.db_1583181983482
> 12:47:08.072 PMINFOReconOmMetadataManagerImplCleaning up old OM snapshot db 
> at /var/lib/hadoop-ozone/recon/om/data/om.snapshot.db_1583174381836.
> 12:47:08.166 PMINFOReconOmMetadataManagerImplCreated OM DB handle from 
> snapshot at /var/lib/hadoop-ozone/recon/om/data/om.snapshot.db_1583181983482.
> 12:47:08.276 PMINFOOzoneManagerServiceProviderImplCalling reprocess on Recon 
> tasks.
> 12:47:08.276 PMERROROzoneManagerServiceProviderImplUnable to update Recon's 
> OM DB with new snapshot 
> java.lang.InterruptedException
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1302)
>   at java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
>   at 
> org.apache.hadoop.ozone.recon.tasks.ReconTaskControllerImpl.reInitializeTasks(ReconTaskControllerImpl.java:175)
>   at 
> org.apache.hadoop.ozone.recon.spi.impl.OzoneManagerServiceProviderImpl.syncDataFromOM(OzoneManagerServiceProviderImpl.java:375)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}






[jira] [Updated] (HDDS-3120) Freon work with OM HA

2020-03-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3120:
-
Labels: pull-request-available  (was: )

> Freon work with OM HA
> -
>
> Key: HDDS-3120
> URL: https://issues.apache.org/jira/browse/HDDS-3120
> Project: Hadoop Distributed Data Store
>  Issue Type: New Feature
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: pull-request-available
>
> Make Freon commands work with OM HA






[jira] [Updated] (HDDS-3089) TestSCMNodeManager intermittent crash

2020-03-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3089:
-
Labels: pull-request-available  (was: )

> TestSCMNodeManager intermittent crash
> -
>
> Key: HDDS-3089
> URL: https://issues.apache.org/jira/browse/HDDS-3089
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
> Attachments: hs_err_pid9082.log, 
> org.apache.hadoop.hdds.scm.node.TestSCMNodeManager-output.txt
>
>
> TestSCMNodeManager crashed in one of the runs, although it usually passes:
> {code:title=https://github.com/apache/hadoop-ozone/pull/601/checks?check_run_id=471611827}
> [ERROR] Crashed tests:
> [ERROR] org.apache.hadoop.hdds.scm.node.TestSCMNodeManager
> {code}
> {code:title=hs_err_pid9082.log}
> siginfo: si_signo: 11 (SIGSEGV), si_code: 2 (SEGV_ACCERR), si_addr: 
> 0x7f378cf6f340
> Stack: [0x7f37626fb000,0x7f37627fc000],  sp=0x7f37627f9e48,  free 
> space=1019k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native 
> code)
> C  0x7f378cf6f340
> C  [librocksdbjni3775377216204452319.so+0x2a05dd]  
> rocksdb::DB::Delete(rocksdb::WriteOptions const&, 
> rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&)+0x4d
> C  [librocksdbjni3775377216204452319.so+0x2a0641]  
> rocksdb::DBImpl::Delete(rocksdb::WriteOptions const&, 
> rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&)+0x11
> C  [librocksdbjni3775377216204452319.so+0x1a931a]  
> rocksdb::DB::Delete(rocksdb::WriteOptions const&, rocksdb::Slice const&)+0xba
> C  [librocksdbjni3775377216204452319.so+0x19f3e0]  
> rocksdb_delete_helper(JNIEnv_*, rocksdb::DB*, rocksdb::WriteOptions const&, 
> rocksdb::ColumnFamilyHandle*, _jbyteArray*, int, int)+0x130
> C  [librocksdbjni3775377216204452319.so+0x19f4a1]  
> Java_org_rocksdb_RocksDB_delete__J_3BII+0x41
> j  org.rocksdb.RocksDB.delete(J[BII)V+0
> j  org.rocksdb.RocksDB.delete([B)V+13
> j  org.apache.hadoop.hdds.utils.RocksDBStore.delete([B)V+9
> j  
> org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.removePipeline(Lorg/apache/hadoop/hdds/scm/pipeline/PipelineID;)V+35
> j  
> org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.destroyPipeline(Lorg/apache/hadoop/hdds/scm/pipeline/Pipeline;)V+27
> ...
> j  
> org.apache.hadoop.hdds.scm.node.DeadNodeHandler.destroyPipelines(Lorg/apache/hadoop/hdds/protocol/DatanodeDetails;)V+28
> j  
> org.apache.hadoop.hdds.scm.node.DeadNodeHandler.onMessage(Lorg/apache/hadoop/hdds/protocol/DatanodeDetails;Lorg/apache/hadoop/hdds/server/events/EventPublisher;)V+6
> {code}






[jira] [Updated] (HDDS-3140) Remove hard-coded SNAPSHOT version from GitHub workflows

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3140:
-
Labels: pull-request-available  (was: )

> Remove hard-coded SNAPSHOT version from GitHub workflows
> 
>
> Key: HDDS-3140
> URL: https://issues.apache.org/jira/browse/HDDS-3140
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: build
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
>
> Ozone's GitHub Actions workflows only work with SNAPSHOT versions due to 
> hard-coded {{ozone-*-SNAPSHOT}} in the target path.






[jira] [Updated] (HDDS-3141) Unit check fails to execute insight and mini-chaos-tests modules

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3141:
-
Labels: pull-request-available  (was: )

> Unit check fails to execute insight and mini-chaos-tests modules
> 
>
> Key: HDDS-3141
> URL: https://issues.apache.org/jira/browse/HDDS-3141
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: build, test
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
>
> {code:title=https://github.com/apache/hadoop-ozone/runs/490978126?check_suite_focus=true}
> 2020-03-06T19:13:08.6122969Z [ERROR] Failed to execute goal on project 
> hadoop-ozone-insight: Could not resolve dependencies for project 
> org.apache.hadoop:hadoop-ozone-insight:jar:0.5.0-beta: Could not find 
> artifact org.apache.hadoop:hadoop-ozone-integration-test:jar:tests:0.5.0-beta 
> in apache.snapshots.https 
> (https://repository.apache.org/content/repositories/snapshots) -> [Help 1]
> 2020-03-06T19:13:08.6180318Z [ERROR] Failed to execute goal on project 
> mini-chaos-tests: Could not resolve dependencies for project 
> org.apache.hadoop:mini-chaos-tests:jar:0.5.0-beta: Failure to find 
> org.apache.hadoop:hadoop-ozone-integration-test:jar:tests:0.5.0-beta in 
> https://repository.apache.org/content/repositories/snapshots was cached in 
> the local repository, resolution will not be reattempted until the update 
> interval of apache.snapshots.https has elapsed or updates are forced -> [Help 
> 1]
> {code}
> Unit check skips {{integration-test}}, but these two modules depend on it.






[jira] [Updated] (HDDS-3143) Rename silently ignored tests

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3143:
-
Labels: pull-request-available  (was: )

> Rename silently ignored tests
> -
>
> Key: HDDS-3143
> URL: https://issues.apache.org/jira/browse/HDDS-3143
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Minor
>  Labels: pull-request-available
>
> The Surefire plugin is configured to run {{Test*}} classes, but there are two 
> test classes named {{*Test}}:
> {code}
> $ find */*/src/test/java -name '*Test.java' | xargs grep -l '@Test'
> hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/HddsServerUtilTest.java
> hadoop-ozone/insight/src/test/java/org/apache/hadoop/ozone/insight/LogSubcommandTest.java
> {code}






[jira] [Updated] (HDDS-3150) Implement getIfExist in Table and use it in CreateKey/File

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3150:
-
Labels: pull-request-available  (was: )

> Implement getIfExist in Table and use it in CreateKey/File
> --
>
> Key: HDDS-3150
> URL: https://issues.apache.org/jira/browse/HDDS-3150
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: pull-request-available
>
> With replay support, we now call the get() API directly.
> Previously, the code in OMKeyRequest.java was:
>  
> {code:java}
> else if (omMetadataManager.getKeyTable().isExist(dbKeyName)) {
>  // TODO: Need to be fixed, as when key already exists, we are
>  // appending new blocks to existing key.
>  keyInfo = omMetadataManager.getKeyTable().get(dbKeyName);{code}
>  
> Now, for every create key/file, we use the get API; this was changed for replay:
> {code:java}
> OmKeyInfo dbKeyInfo =
>  omMetadataManager.getKeyTable().get(dbKeyName);
> if (dbKeyInfo != null) {{code}
> The proposal is to replace get with getIfExist, making use of keyMayExist.
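For illustration, a minimal sketch of how a getIfExist could short-circuit on a keyMayExist-style check before paying for a full get(). The class and method names below are assumptions for illustration only, not Ozone's actual Table API:

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a getIfExist helper. A RocksDB-backed table would consult
// RocksDB's keyMayExist (a cheap, bloom-filter-backed check that can return
// false positives but never false negatives) before the more expensive get().
class SketchTable {
  private final Map<String, byte[]> store = new ConcurrentHashMap<>();

  void put(String key, byte[] value) {
    store.put(key, value);
  }

  // Stand-in for RocksDB's keyMayExist: may report false positives,
  // but never false negatives.
  boolean keyMayExist(String key) {
    return store.containsKey(key);
  }

  // Skip the full lookup when the key definitely does not exist.
  Optional<byte[]> getIfExist(String key) {
    if (!keyMayExist(key)) {
      return Optional.empty(); // definite miss: no expensive read needed
    }
    return Optional.ofNullable(store.get(key)); // verify the maybe-hit
  }
}
```

In a create key/file path, where most lookups are expected to miss, this shape avoids a guaranteed-miss full read on every request.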






[jira] [Updated] (HDDS-3100) Fix TestDeadNodeHandler.

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3100:
-
Labels: pull-request-available  (was: )

> Fix TestDeadNodeHandler.
> 
>
> Key: HDDS-3100
> URL: https://issues.apache.org/jira/browse/HDDS-3100
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Affects Versions: 0.5.0
>Reporter: Aravindan Vijayan
>Assignee: Aravindan Vijayan
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (HDDS-3142) Create isolated environment for OM to test it without SCM

2020-03-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3142:
-
Labels: pull-request-available  (was: )

> Create isolated environment for OM to test it without SCM
> -
>
> Key: HDDS-3142
> URL: https://issues.apache.org/jira/browse/HDDS-3142
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Marton Elek
>Assignee: Marton Elek
>Priority: Major
>  Labels: pull-request-available
>
> The OmKeyGenerator class from Freon can generate keys (open key + commit key), 
> but this test exercises both OM and SCM performance. It would be useful to 
> have a way to test only the OM performance by faking the response from 
> SCM.  
> This can be done easily with the same approach as in HDDS-3023: a simple 
> utility class can be implemented, and with byteman we can replace the client 
> calls with the fake method.






[jira] [Updated] (HDDS-3152) Reduce number of chunkwriter threads in integration tests

2020-03-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3152:
-
Labels: pull-request-available  (was: )

> Reduce number of chunkwriter threads in integration tests
> -
>
> Key: HDDS-3152
> URL: https://issues.apache.org/jira/browse/HDDS-3152
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: test
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Minor
>  Labels: pull-request-available
>
> Integration tests run multiple datanodes in the same JVM.  Each datanode 
> comes with 60 chunk writer threads by default (may be decreased in 
> HDDS-3053).  This makes thread dumps (e.g. produced by 
> {{GenericTestUtils.waitFor}} on timeout) really hard to navigate, as there 
> may be 300+ such threads.
> Since integration tests are generally run with a single disk which is even 
> shared among the datanodes, a few threads per datanode should be enough.






[jira] [Updated] (HDDS-3095) Intermittent failure in TestFailureHandlingByClient#testDatanodeExclusionWithMajorityCommit

2020-03-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3095:
-
Labels: pull-request-available  (was: )

> Intermittent failure in 
> TestFailureHandlingByClient#testDatanodeExclusionWithMajorityCommit
> ---
>
> Key: HDDS-3095
> URL: https://issues.apache.org/jira/browse/HDDS-3095
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
>
> {code:title=https://github.com/apache/hadoop-ozone/pull/614/checks?check_run_id=472938597}
> [ERROR] Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 
> 284.887 s <<< FAILURE! - in 
> org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient
> [ERROR] 
> testDatanodeExclusionWithMajorityCommit(org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient)
>   Time elapsed: 66.589 s  <<< FAILURE!
> java.lang.AssertionError
> ...
>at 
> org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient.testDatanodeExclusionWithMajorityCommit(TestFailureHandlingByClient.java:336)
> {code}






[jira] [Updated] (HDDS-2989) Intermittent timeout in TestBlockManager

2020-03-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2989:
-
Labels: pull-request-available  (was: )

> Intermittent timeout in TestBlockManager
> 
>
> Key: HDDS-2989
> URL: https://issues.apache.org/jira/browse/HDDS-2989
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
>
> {code:title=https://github.com/apache/hadoop-ozone/runs/430663688}
> 2020-02-06T21:44:53.5319531Z [ERROR] Tests run: 9, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 5.344 s <<< FAILURE! - in 
> org.apache.hadoop.hdds.scm.block.TestBlockManager
> 2020-02-06T21:44:53.5319796Z [ERROR] 
> testMultipleBlockAllocation(org.apache.hadoop.hdds.scm.block.TestBlockManager)
>   Time elapsed: 1.167 s  <<< ERROR!
> 2020-02-06T21:44:53.5319942Z java.util.concurrent.TimeoutException: 
> 2020-02-06T21:44:53.5320496Z Timed out waiting for condition. Thread 
> diagnostics:
> 2020-02-06T21:44:53.5320839Z Timestamp: 2020-02-06 09:44:52,261
> 2020-02-06T21:44:53.5320901Z 
> 2020-02-06T21:44:53.5321178Z "Thread-26"  prio=5 tid=46 runnable
> 2020-02-06T21:44:53.5321292Z java.lang.Thread.State: RUNNABLE
> 2020-02-06T21:44:53.5321391Z at java.lang.Thread.dumpThreads(Native 
> Method)
> 2020-02-06T21:44:53.5326891Z at 
> java.lang.Thread.getAllStackTraces(Thread.java:1610)
> 2020-02-06T21:44:53.5327144Z at 
> org.apache.hadoop.test.TimedOutTestsListener.buildThreadDump(TimedOutTestsListener.java:87)
> 2020-02-06T21:44:53.5327309Z at 
> org.apache.hadoop.test.TimedOutTestsListener.buildThreadDiagnosticString(TimedOutTestsListener.java:73)
> 2020-02-06T21:44:53.5327465Z at 
> org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:389)
> 2020-02-06T21:44:53.5327618Z at 
> org.apache.hadoop.hdds.scm.block.TestBlockManager.testMultipleBlockAllocation(TestBlockManager.java:280)
> 2020-02-06T21:44:53.5388042Z at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2020-02-06T21:44:53.5388702Z at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2020-02-06T21:44:53.5388905Z at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2020-02-06T21:44:53.5389045Z at 
> java.lang.reflect.Method.invoke(Method.java:498)
> 2020-02-06T21:44:53.5389195Z at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> 2020-02-06T21:44:53.5389331Z at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 2020-02-06T21:44:53.5389662Z at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> 2020-02-06T21:44:53.5389776Z at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 2020-02-06T21:44:53.5389916Z at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> 2020-02-06T21:44:53.5390040Z "Signal Dispatcher" daemon prio=9 tid=4 runnable
> 2020-02-06T21:44:53.5390156Z java.lang.Thread.State: RUNNABLE
> 2020-02-06T21:44:53.5390783Z 
> "EventQueue-CloseContainerForCloseContainerEventHandler"  prio=5 tid=32 in 
> Object.wait()
> 2020-02-06T21:44:53.5390916Z java.lang.Thread.State: WAITING (on object 
> monitor)
> 2020-02-06T21:44:53.5391019Z at sun.misc.Unsafe.park(Native Method)
> 2020-02-06T21:44:53.5391149Z at 
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> 2020-02-06T21:44:53.5391299Z at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> 2020-02-06T21:44:53.5391448Z at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> 2020-02-06T21:44:53.5391587Z at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
> 2020-02-06T21:44:53.5391721Z at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
> 2020-02-06T21:44:53.5391844Z at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 2020-02-06T21:44:53.5391971Z at java.lang.Thread.run(Thread.java:748)
> 2020-02-06T21:44:53.5392100Z "IPC Server idle connection scanner for port 
> 43801" daemon prio=5 tid=24 in Object.wait()
> 2020-02-06T21:44:53.5392227Z java.lang.Thread.State: WAITING (on object 
> monitor)
> 2020-02-06T21:44:53.5392347Z at java.lang.Object.wait(Native Method)
> 2020-02-06T21:44:53.5392463Z at java.lang.Object.wait(Object.java:502)
> 2

[jira] [Updated] (HDDS-3104) Integration test crashes due to critical error in datanode

2020-03-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3104:
-
Labels: pull-request-available  (was: )

> Integration test crashes due to critical error in datanode
> --
>
> Key: HDDS-3104
> URL: https://issues.apache.org/jira/browse/HDDS-3104
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Attila Doroszlai
>Assignee: Siddharth Wagle
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-3104.patch, 
> org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailureOnRead-output.txt
>
>
> {code:title=test log}
> 2020-02-28 07:36:17,759 [Datanode State Machine Thread - 0] ERROR 
> statemachine.StateContext (StateContext.java:execute(420)) - Critical error 
> occurred in StateMachine, setting shutDownMachine
> ...
> 2020-02-28 07:36:21,216 [Datanode State Machine Thread - 0] INFO  
> util.ExitUtil (ExitUtil.java:terminate(210)) - Exiting with status 1: 
> ExitException
> {code}
> {code:title=build output}
> [ERROR] ExecutionException The forked VM terminated without properly saying 
> goodbye. VM crash or System.exit called?
> {code}
> https://github.com/adoroszlai/hadoop-ozone/runs/474218807
> https://github.com/adoroszlai/hadoop-ozone/suites/487650271/artifacts/2327174






[jira] [Updated] (HDDS-3148) Logs cluttered by AlreadyExistsException from Ratis

2020-03-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3148:
-
Labels: pull-request-available  (was: )

> Logs cluttered by AlreadyExistsException from Ratis
> ---
>
> Key: HDDS-3148
> URL: https://issues.apache.org/jira/browse/HDDS-3148
> Project: Hadoop Distributed Data Store
>  Issue Type: Wish
>  Components: Ozone Datanode
>Reporter: Attila Doroszlai
>Assignee: Siddharth Wagle
>Priority: Major
>  Labels: pull-request-available
>
> Ozone startup logs are cluttered by printing stack trace of 
> AlreadyExistsException related to group addition.  Example:
> {code}
> 2020-03-09 13:53:01,563 [grpc-default-executor-0] WARN  impl.RaftServerProxy 
> (RaftServerProxy.java:lambda$groupAddAsync$11(390)) - 
> 7a07f161-9144-44b2-8baa-73f0e9299675: Failed groupAdd* 
> GroupManagementRequest:client-27FB1A91809E->7a07f161-9144-44b2-8baa-73f0e9299675@group-E151028E3AC0,
>  cid=2, seq=0, RW, null, 
> Add:group-E151028E3AC0:[18f4e257-bf09-482e-b1bb-a2408a093ff7:172.17.0.2:43845,
>  7a07f161-9144-44b2-8baa-73f0e9299675:172.17.0.2:41551, 
> 8a66c80e-ab55-4975-92a9-8aaf06ab418a:172.17.0.2:36921]
> java.util.concurrent.CompletionException: 
> org.apache.ratis.protocol.AlreadyExistsException: 
> 7a07f161-9144-44b2-8baa-73f0e9299675: Failed to add 
> group-E151028E3AC0:[18f4e257-bf09-482e-b1bb-a2408a093ff7:172.17.0.2:43845, 
> 7a07f161-9144-44b2-8baa-73f0e9299675:172.17.0.2:41551, 
> 8a66c80e-ab55-4975-92a9-8aaf06ab418a:172.17.0.2:36921] since the group 
> already exists in the map.
>   at 
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
>   at 
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
>   at 
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607)
>   at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
>   at 
> java.util.concurrent.CompletableFuture.uniApplyStage(CompletableFuture.java:631)
>   at 
> java.util.concurrent.CompletableFuture.thenApplyAsync(CompletableFuture.java:2006)
>   at 
> org.apache.ratis.server.impl.RaftServerProxy.groupAddAsync(RaftServerProxy.java:379)
>   at 
> org.apache.ratis.server.impl.RaftServerProxy.groupManagementAsync(RaftServerProxy.java:363)
>   at 
> org.apache.ratis.grpc.server.GrpcAdminProtocolService.lambda$groupManagement$0(GrpcAdminProtocolService.java:42)
>   at org.apache.ratis.grpc.GrpcUtil.asyncCall(GrpcUtil.java:160)
>   at 
> org.apache.ratis.grpc.server.GrpcAdminProtocolService.groupManagement(GrpcAdminProtocolService.java:42)
>   at 
> org.apache.ratis.proto.grpc.AdminProtocolServiceGrpc$MethodHandlers.invoke(AdminProtocolServiceGrpc.java:358)
>   at 
> org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:172)
>   at 
> org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:331)
>   at 
> org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:814)
>   at 
> org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
>   at 
> org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.ratis.protocol.AlreadyExistsException: 
> 7a07f161-9144-44b2-8baa-73f0e9299675: Failed to add 
> group-E151028E3AC0:[18f4e257-bf09-482e-b1bb-a2408a093ff7:172.17.0.2:43845, 
> 7a07f161-9144-44b2-8baa-73f0e9299675:172.17.0.2:41551, 
> 8a66c80e-ab55-4975-92a9-8aaf06ab418a:172.17.0.2:36921] since the group 
> already exists in the map.
>   at 
> org.apache.ratis.server.impl.RaftServerProxy$ImplMap.addNew(RaftServerProxy.java:83)
>   at 
> org.apache.ratis.server.impl.RaftServerProxy.groupAddAsync(RaftServerProxy.java:378)
>   ... 13 more
> {code}
> Since these are "normal", I think the stack trace should be suppressed.
> CC [~nanda]
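A sketch of one way to do this suppression (not the actual Ratis or Ozone code; the local AlreadyExistsException stand-in and the method name are assumptions for illustration): unwrap the async wrapper and emit only a one-line message for this expected failure.

```java
import java.util.concurrent.CompletionException;

// Sketch: produce a one-line summary for the expected "group already exists"
// failure so callers can log it without a stack trace; anything else returns
// null and should be logged with the full trace. The exception class here is
// a local stand-in for org.apache.ratis.protocol.AlreadyExistsException.
class QuietGroupAddLogging {
  static class AlreadyExistsException extends RuntimeException {
    AlreadyExistsException(String message) {
      super(message);
    }
  }

  static String summarize(Throwable t) {
    // Async groupAdd failures typically arrive wrapped in a CompletionException.
    Throwable cause =
        (t instanceof CompletionException && t.getCause() != null)
            ? t.getCause()
            : t;
    if (cause instanceof AlreadyExistsException) {
      return "Failed groupAdd (group already exists): " + cause.getMessage();
    }
    return null; // unexpected: let the caller log the full stack trace
  }
}
```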






[jira] [Updated] (HDDS-3157) Fix docker startup command in README.md

2020-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3157:
-
Labels: pull-request-available  (was: )

>  Fix docker startup command in README.md
> 
>
> Key: HDDS-3157
> URL: https://issues.apache.org/jira/browse/HDDS-3157
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Neo Yang
>Assignee: Neo Yang
>Priority: Minor
>  Labels: pull-request-available
>
> The docker-compose startup commands in [this block|https://github.com/apache/hadoop-ozone#build-from-source] are wrong; they should be:
> cd hadoop-ozone/dist/target/ozone-*/compose/ozone
> docker-compose up -d --scale datanode=3






[jira] [Updated] (HDDS-3156) update allocateContainer to remove additional createPipeline step.

2020-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3156:
-
Labels: pull-request-available  (was: )

> update allocateContainer to remove additional createPipeline step.
> --
>
> Key: HDDS-3156
> URL: https://issues.apache.org/jira/browse/HDDS-3156
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Reporter: Li Cheng
>Assignee: Li Cheng
>Priority: Major
>  Labels: pull-request-available
>
> allocateContainer currently also tries to allocate pipelines, but with multi-raft it should not need to worry about whether pipelines are available.






[jira] [Updated] (HDDS-3159) Bump RocksDB version to the latest one

2020-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3159:
-
Labels: pull-request-available  (was: )

> Bump RocksDB version to the latest one
> --
>
> Key: HDDS-3159
> URL: https://issues.apache.org/jira/browse/HDDS-3159
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Marton Elek
>Assignee: Marton Elek
>Priority: Minor
>  Labels: pull-request-available
>
> 6.0.1 -- our current RocksDB version -- was released one year ago. Since then, many new versions have been released with important bug fixes.
> I propose updating to the latest one...






[jira] [Updated] (HDDS-3160) Disable index and filter block cache for RocksDB

2020-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-3160:
-
Labels: pull-request-available  (was: )

> Disable index and filter block cache for RocksDB
> 
>
> Key: HDDS-3160
> URL: https://issues.apache.org/jira/browse/HDDS-3160
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Marton Elek
>Assignee: Marton Elek
>Priority: Minor
>  Labels: pull-request-available
> Attachments: key_allocation_after.png, key_allocation_before.png, 
> profile.png
>
>
> During performance tests it was noticed that OM performance drops after 10-20 million keys (see the screenshot).
> By default cache_index_and_filter_blocks is enabled for all of our RocksDB instances (see DBProfile), which is not the best option. (For example, see this thread: https://github.com/facebook/rocksdb/issues/3961#)
> With this option turned on, the indexes and bloom filters are cached **inside the block cache**, which slows the cache down once there is a significant amount of data.
> Without it (based on my understanding), all the indexes simply remain open without any caching. With our current settings we have only a small number of SST files (even with millions of keys), so it seems safe to turn this option off.
> With this option turned off I was able to write >100M keys with high throughput.
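For reference, the change amounts to roughly the following table-config tweak in rocksdbjni terms. This is a sketch of the idea, not the exact DBProfile patch, and it is not runnable standalone since it needs the rocksdbjni dependency and surrounding options objects:

```java
// Sketch only: assumes org.rocksdb.BlockBasedTableConfig and an existing
// columnFamilyOptions instance from the rocksdbjni API.
BlockBasedTableConfig tableConfig = new BlockBasedTableConfig();
// Keep indexes and bloom filters out of the block cache, so data blocks
// are not evicted by them once the key count grows large.
tableConfig.setCacheIndexAndFilterBlocks(false);
columnFamilyOptions.setTableFormatConfig(tableConfig);
```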






[jira] [Updated] (HDDS-2848) Recon changes to make snapshots work with OM HA

2020-03-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2848:
-
Labels: pull-request-available  (was: )

> Recon changes to make snapshots work with OM HA
> ---
>
> Key: HDDS-2848
> URL: https://issues.apache.org/jira/browse/HDDS-2848
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Recon
>Reporter: Aravindan Vijayan
>Assignee: Siddharth Wagle
>Priority: Major
>  Labels: pull-request-available
>
> Recon talks to OM in two ways: through HTTP to get a DB snapshot, and through RPC to get delta updates.
> Since Recon already uses the OzoneManagerClientProtocol to query the OzoneManager over RPC, the RPC client automatically routes requests to the leader on an OM HA cluster. Recon only needs the updates from the OM RocksDB store, and does not need the in-flight updates in the OM DoubleBuffer. Due to the guarantee from Ratis that the leader's RocksDB is always up to date, Recon does not need to worry about going back in time when the current OM leader goes down. We only have to pass the OM service ID to the Ozone Manager client in Recon (currently we pass 'null'), and failover works internally.
> To make the HTTP call work against OM HA, Recon has to find out the current OM leader and download the snapshot from that OM instance. We can follow the approach implemented in org.apache.hadoop.ozone.admin.om.GetServiceRolesSubcommand: get the roles of the OM instances and determine the leader from them.
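A rough sketch of that leader lookup. The `OmRole` holder, the role string "LEADER", and the addresses below are illustrative assumptions standing in for the service-role info the real subcommand retrieves, not Recon's actual API:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class OmLeaderLookup {

    // Hypothetical holder for one OM instance's reported role.
    static final class OmRole {
        final String nodeId;
        final String httpAddress;
        final String role;

        OmRole(String nodeId, String httpAddress, String role) {
            this.nodeId = nodeId;
            this.httpAddress = httpAddress;
            this.role = role;
        }
    }

    // Pick the OM instance currently reporting itself as leader, if any.
    static Optional<OmRole> findLeader(List<OmRole> roles) {
        return roles.stream()
            .filter(r -> "LEADER".equals(r.role))
            .findFirst();
    }

    public static void main(String[] args) {
        List<OmRole> roles = Arrays.asList(
            new OmRole("om1", "om1:9874", "FOLLOWER"),
            new OmRole("om2", "om2:9874", "LEADER"),
            new OmRole("om3", "om3:9874", "FOLLOWER"));
        // Recon would then download the DB snapshot over HTTP from this address.
        System.out.println(
            findLeader(roles).map(r -> r.httpAddress).orElse("no-leader"));
    }
}
```

If no instance reports itself as leader (e.g. mid-election), the lookup returns empty and the snapshot download would have to be retried.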





