[jira] [Updated] (HDDS-4191) Add failover proxy for SCM container client

2020-10-27 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng updated HDDS-4191:
---
Status: Patch Available  (was: In Progress)

https://github.com/apache/hadoop-ozone/pull/1514

> Add failover proxy for SCM container client
> ---
>
> Key: HDDS-4191
> URL: https://issues.apache.org/jira/browse/HDDS-4191
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Li Cheng
>Assignee: Li Cheng
>Priority: Major
>  Labels: pull-request-available
>
> Take advantage of the failover proxy introduced in HDDS-3188 and add a 
> failover proxy for the SCM container client as well.
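The failover behavior requested above can be sketched as a simple retry loop over multiple SCM endpoints. This is a minimal illustration under assumed names, not the actual Ozone implementation: the real change wires a Hadoop-style FailoverProxyProvider into the container-protocol RPC client, and `ScmContainerClient` here is a hypothetical stand-in.

```java
import java.util.List;

// Hedged sketch: round-robin failover across several SCM endpoints.
// ScmContainerClient is a stand-in for the real container-protocol proxy.
public class FailoverSketch {
  interface ScmContainerClient {
    String allocateContainer() throws Exception;
  }

  static String callWithFailover(List<ScmContainerClient> proxies,
                                 int maxAttempts) throws Exception {
    Exception last = null;
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      // Cycle through the configured SCMs until one answers.
      ScmContainerClient proxy = proxies.get(attempt % proxies.size());
      try {
        return proxy.allocateContainer();
      } catch (Exception e) {
        last = e;  // remember the failure and fail over to the next SCM
      }
    }
    throw last;
  }

  public static void main(String[] args) throws Exception {
    ScmContainerClient down = () -> { throw new Exception("scm1 down"); };
    ScmContainerClient up = () -> "container-42";
    // First endpoint fails; the loop fails over to the second.
    System.out.println(callWithFailover(List.of(down, up), 3));
  }
}
```

The real provider would also track the last-known leader instead of blindly round-robining.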



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-4393) Fix CI and test failures after force push on 2020/10/26

2020-10-27 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17221168#comment-17221168
 ] 

Li Cheng commented on HDDS-4393:


[https://github.com/apache/hadoop-ozone/pull/1522] shows that the current 
feature branch HDDS-2823 has CI failures.

> Fix CI and test failures after force push on 2020/10/26
> ---
>
> Key: HDDS-4393
> URL: https://issues.apache.org/jira/browse/HDDS-4393
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM HA
>Reporter: Li Cheng
>Priority: Blocker
>







[jira] [Created] (HDDS-4393) Fix CI and test failures after force push on 2020/10/26

2020-10-26 Thread Li Cheng (Jira)
Li Cheng created HDDS-4393:
--

 Summary: Fix CI and test failures after force push on 2020/10/26
 Key: HDDS-4393
 URL: https://issues.apache.org/jira/browse/HDDS-4393
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
  Components: SCM HA
Reporter: Li Cheng









[jira] [Commented] (HDDS-4339) Ozone S3 gateway throws NPE with goofys

2020-10-22 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-4339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17219456#comment-17219456
 ] 

Li Cheng commented on HDDS-4339:


I created https://issues.apache.org/jira/browse/HDDS-4361 to track s3g error 
messages.

> Ozone S3 gateway throws NPE with goofys
> ---
>
> Key: HDDS-4339
> URL: https://issues.apache.org/jira/browse/HDDS-4339
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Li Cheng
>Assignee: Li Cheng
>Priority: Blocker
>  Labels: pull-request-available
> Attachments: image-2020-10-13-15-23-49-864.png
>
>
> Configured goofys and s3g on different hosts; Fiotest writes files to the 
> goofys mount point. AWS secrets are exported on the s3g host. A number of 
> NPEs appear in the s3g logs.
>  # It looks like a missing AWS auth header can cause an NPE: 
> AWSSignatureProcessor.init() does not handle a missing header, which leads 
> to the NPE.
>  # Why the AWS auth header is missing is also unknown.
> Note that some files have been successfully written into Ozone via 
> goofys, but not all of them succeed.
>  
> 2020-10-13 11:18:43,425 [qtp1686100174-1238] ERROR 
> org.apache.hadoop.ozone.s3.OzoneClientProducer: Error: 
> org.jboss.weld.exceptions.WeldException: WELD-49: Unable to invoke public 
> void org.apache.hadoop.ozone.s3.AWSSignatureProcessor.init() throws 
> java.lang.Exception on 
> org.apache.hadoop.ozone.s3.AWSSignatureProcessor@5535155b
>  at 
> org.jboss.weld.injection.producer.DefaultLifecycleCallbackInvoker.invokeMethods(DefaultLifecycleCallbackInvoker.java:99)
>  at 
> org.jboss.weld.injection.producer.DefaultLifecycleCallbackInvoker.postConstruct(DefaultLifecycleCallbackInvoker.java:80)
>  at 
> org.jboss.weld.injection.producer.BasicInjectionTarget.postConstruct(BasicInjectionTarget.java:122)
>  at 
> org.glassfish.jersey.ext.cdi1x.internal.CdiComponentProvider$InjectionManagerInjectedCdiTarget.postConstruct(CdiComponentProvider.java:887)
>  at org.jboss.weld.bean.ManagedBean.create(ManagedBean.java:162)
>  at org.jboss.weld.context.AbstractContext.get(AbstractContext.java:96)
>  at 
> org.jboss.weld.bean.ContextualInstanceStrategy$DefaultContextualInstanceStrategy.get(ContextualInstanceStrategy.java:100)
>  at 
> org.jboss.weld.bean.ContextualInstanceStrategy$CachingContextualInstanceStrategy.get(ContextualInstanceStrategy.java:177)
>  at org.jboss.weld.bean.ContextualInstance.get(ContextualInstance.java:50)
>  at 
> org.jboss.weld.bean.proxy.ContextBeanInstance.getInstance(ContextBeanInstance.java:99)
>  at 
> org.jboss.weld.bean.proxy.ProxyMethodHandler.getInstance(ProxyMethodHandler.java:125)
>  at 
> org.apache.hadoop.ozone.s3.AWSSignatureProcessor$Proxy$_$$_WeldClientProxy.getAwsAccessId(Unknown
>  Source)
>  at 
> org.apache.hadoop.ozone.s3.OzoneClientProducer.getClient(OzoneClientProducer.java:79)
>  at 
> org.apache.hadoop.ozone.s3.OzoneClientProducer.createClient(OzoneClientProducer.java:68)
>  at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.jboss.weld.injection.StaticMethodInjectionPoint.invoke(StaticMethodInjectionPoint.java:88)
>  at 
> org.jboss.weld.injection.StaticMethodInjectionPoint.invoke(StaticMethodInjectionPoint.java:78)
>  at 
> org.jboss.weld.injection.producer.ProducerMethodProducer.produce(ProducerMethodProducer.java:100)
>  at 
> org.jboss.weld.injection.producer.AbstractMemberProducer.produce(AbstractMemberProducer.java:161)
>  at 
> org.jboss.weld.bean.AbstractProducerBean.create(AbstractProducerBean.java:180)
>  at 
> org.jboss.weld.context.unbound.DependentContextImpl.get(DependentContextImpl.java:70)
>  at 
> org.jboss.weld.bean.ContextualInstanceStrategy$DefaultContextualInstanceStrategy.get(ContextualInstanceStrategy.java:100)
>  at org.jboss.weld.bean.ContextualInstance.get(ContextualInstance.java:50)
>  at 
> org.jboss.weld.manager.BeanManagerImpl.getReference(BeanManagerImpl.java:785)
>  at 
> org.jboss.weld.manager.BeanManagerImpl.getInjectableReference(BeanManagerImpl.java:885)
>  at 
> org.jboss.weld.injection.FieldInjectionPoint.inject(FieldInjectionPoint.java:92)
>  at org.jboss.weld.util.Beans.injectBoundFields(Beans.java:358)
>  at org.jboss.weld.util.Beans.injectFieldsAndInitializers(Beans.java:369)
>  at 
> org.jboss.weld.injection.producer.ResourceInjector$1.proceed(ResourceInjector.java:70)
>  at 
> org.jboss.weld.injection.InjectionContextImpl.run(InjectionContextImpl.java:48)
>  at 
> org.jboss.weld.injection.producer.ResourceInjector.inject(ResourceInjector.java:72)
>  at 
> 

[jira] [Commented] (HDDS-4365) SCMBlockLocationFailoverProxyProvider should use ScmBlockLocationProtocolPB.class in RPC.setProtocolEngine

2020-10-22 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17218793#comment-17218793
 ] 

Li Cheng commented on HDDS-4365:


Merged. Resolving this...

> SCMBlockLocationFailoverProxyProvider should use 
> ScmBlockLocationProtocolPB.class in RPC.setProtocolEngine
> --
>
> Key: HDDS-4365
> URL: https://issues.apache.org/jira/browse/HDDS-4365
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Glen Geng
>Assignee: Glen Geng
>Priority: Minor
>  Labels: pull-request-available
>
> In SCMBlockLocationFailoverProxyProvider, it is currently:
> {code:java}
> private ScmBlockLocationProtocolPB createSCMProxy(
> InetSocketAddress scmAddress) throws IOException {
>   ...
>   RPC.setProtocolEngine(hadoopConf, ScmBlockLocationProtocol.class,
>   ProtobufRpcEngine.class);
>   ...{code}
>  It should be:
> {code:java}
> private ScmBlockLocationProtocolPB createSCMProxy(
> InetSocketAddress scmAddress) throws IOException {
>   ...
>   RPC.setProtocolEngine(hadoopConf, ScmBlockLocationProtocolPB.class,
>   ProtobufRpcEngine.class);
>   ...{code}
>  
> FYI, this follows the non-HA version:
> {code:java}
> private static ScmBlockLocationProtocol getScmBlockClient(
> OzoneConfiguration conf) throws IOException {
>   RPC.setProtocolEngine(conf, ScmBlockLocationProtocolPB.class,
>   ProtobufRpcEngine.class);
>   long scmVersion =
>   RPC.getProtocolVersion(ScmBlockLocationProtocolPB.class);
>   InetSocketAddress scmBlockAddress =
>   getScmAddressForBlockClients(conf);
>   ScmBlockLocationProtocolClientSideTranslatorPB scmBlockLocationClient =
>   new ScmBlockLocationProtocolClientSideTranslatorPB(
>   RPC.getProxy(ScmBlockLocationProtocolPB.class, scmVersion,
>   scmBlockAddress, UserGroupInformation.getCurrentUser(), conf,
>   NetUtils.getDefaultSocketFactory(conf),
>   Client.getRpcTimeout(conf)));
>   return TracingUtil
>   .createProxy(scmBlockLocationClient, ScmBlockLocationProtocol.class,
>   conf);
> }
> {code}






[jira] [Resolved] (HDDS-4365) SCMBlockLocationFailoverProxyProvider should use ScmBlockLocationProtocolPB.class in RPC.setProtocolEngine

2020-10-22 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng resolved HDDS-4365.

Resolution: Fixed

> SCMBlockLocationFailoverProxyProvider should use 
> ScmBlockLocationProtocolPB.class in RPC.setProtocolEngine
> --
>
> Key: HDDS-4365
> URL: https://issues.apache.org/jira/browse/HDDS-4365
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Glen Geng
>Assignee: Glen Geng
>Priority: Minor
>  Labels: pull-request-available
>
> In SCMBlockLocationFailoverProxyProvider, it is currently:
> {code:java}
> private ScmBlockLocationProtocolPB createSCMProxy(
> InetSocketAddress scmAddress) throws IOException {
>   ...
>   RPC.setProtocolEngine(hadoopConf, ScmBlockLocationProtocol.class,
>   ProtobufRpcEngine.class);
>   ...{code}
>  It should be:
> {code:java}
> private ScmBlockLocationProtocolPB createSCMProxy(
> InetSocketAddress scmAddress) throws IOException {
>   ...
>   RPC.setProtocolEngine(hadoopConf, ScmBlockLocationProtocolPB.class,
>   ProtobufRpcEngine.class);
>   ...{code}
>  
> FYI, this follows the non-HA version:
> {code:java}
> private static ScmBlockLocationProtocol getScmBlockClient(
> OzoneConfiguration conf) throws IOException {
>   RPC.setProtocolEngine(conf, ScmBlockLocationProtocolPB.class,
>   ProtobufRpcEngine.class);
>   long scmVersion =
>   RPC.getProtocolVersion(ScmBlockLocationProtocolPB.class);
>   InetSocketAddress scmBlockAddress =
>   getScmAddressForBlockClients(conf);
>   ScmBlockLocationProtocolClientSideTranslatorPB scmBlockLocationClient =
>   new ScmBlockLocationProtocolClientSideTranslatorPB(
>   RPC.getProxy(ScmBlockLocationProtocolPB.class, scmVersion,
>   scmBlockAddress, UserGroupInformation.getCurrentUser(), conf,
>   NetUtils.getDefaultSocketFactory(conf),
>   Client.getRpcTimeout(conf)));
>   return TracingUtil
>   .createProxy(scmBlockLocationClient, ScmBlockLocationProtocol.class,
>   conf);
> }
> {code}






[jira] [Created] (HDDS-4361) S3 native error messages when header is illegal

2020-10-21 Thread Li Cheng (Jira)
Li Cheng created HDDS-4361:
--

 Summary: S3 native error messages when header is illegal
 Key: HDDS-4361
 URL: https://issues.apache.org/jira/browse/HDDS-4361
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: S3
Affects Versions: 1.0.0
Reporter: Li Cheng


Following up on https://issues.apache.org/jira/browse/HDDS-4339 and 
https://issues.apache.org/jira/browse/HDDS-3843: missing auth or other 
information in the header may cause the S3 client to throw an NPE or log an 
error message.

Instead, s3g should return S3-native error messages for requests with an 
invalid header or other problems. The S3 client, however, should still be 
able to initialize.
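As a rough illustration of "S3 native error messages": instead of an NPE, s3g could map a missing or illegal header to an S3-style XML error body. The error code and payload shape below are assumptions for illustration, not the exact responses s3g emits.

```java
// Hedged sketch: build an S3-style XML error payload for a request whose
// Authorization header is missing. "AccessDenied" and the element layout
// follow the general S3 error format but are illustrative here.
public class S3ErrorSketch {
  static String s3Error(String code, String message, String resource) {
    return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<Error>\n"
        + "  <Code>" + code + "</Code>\n"
        + "  <Message>" + message + "</Message>\n"
        + "  <Resource>" + resource + "</Resource>\n"
        + "</Error>";
  }

  public static void main(String[] args) {
    String authHeader = null;  // simulate a request without an auth header
    if (authHeader == null) {
      // Return a typed S3 error body instead of letting an NPE escape.
      System.out.println(
          s3Error("AccessDenied", "Authorization header is missing.",
              "/bucket/key"));
    }
  }
}
```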






[jira] [Commented] (HDDS-3188) Add failover proxy to SCM block protocol

2020-10-19 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216508#comment-17216508
 ] 

Li Cheng commented on HDDS-3188:


PR is merged. Resolving

> Add failover proxy to SCM block protocol
> 
>
> Key: HDDS-3188
> URL: https://issues.apache.org/jira/browse/HDDS-3188
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Li Cheng
>Assignee: Li Cheng
>Priority: Major
>  Labels: pull-request-available
>
> Need to support 2N + 1 SCMs. Add configs and logic to support multiple SCMs.
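For illustration, a 2N + 1 deployment (N = 1, i.e. three SCMs) might be expressed in ozone-site.xml roughly as follows; ozone.scm.names is the config key discussed in HDDS-4192, while the hostnames are placeholders.

```xml
<!-- Hedged sketch: three SCMs forming a 2N+1 quorum. Hostnames are
     placeholders; only ozone.scm.names is taken from this thread. -->
<property>
  <name>ozone.scm.names</name>
  <value>scm1.example.com,scm2.example.com,scm3.example.com</value>
</property>
```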






[jira] [Updated] (HDDS-3188) Add failover proxy to SCM block protocol

2020-10-19 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng updated HDDS-3188:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Add failover proxy to SCM block protocol
> 
>
> Key: HDDS-3188
> URL: https://issues.apache.org/jira/browse/HDDS-3188
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Li Cheng
>Assignee: Li Cheng
>Priority: Major
>  Labels: pull-request-available
>
> Need to support 2N + 1 SCMs. Add configs and logic to support multiple SCMs.






[jira] [Commented] (HDDS-4192) enable SCM Raft Group based on config ozone.scm.names

2020-10-19 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-4192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216507#comment-17216507
 ] 

Li Cheng commented on HDDS-4192:


PR is merged. Thanks [~glengeng] for the contribution. Resolving.

> enable SCM Raft Group based on config ozone.scm.names
> -
>
> Key: HDDS-4192
> URL: https://issues.apache.org/jira/browse/HDDS-4192
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Glen Geng
>Assignee: Glen Geng
>Priority: Major
>  Labels: pull-request-available
>
>  
> Say ozone.scm.names is "ip1,ip2,ip3": the SCM with ip1 identifies its 
> RaftPeerId as scm1, the SCM with ip2 as scm2, and the SCM with ip3 as 
> scm3. They will automatically form a Raft group.
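The index-to-RaftPeerId convention described above can be sketched as follows. This only illustrates the naming rule; the real code builds Ratis RaftPeer objects from these ids and the corresponding addresses.

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch: position in ozone.scm.names determines the RaftPeerId
// (first host -> scm1, second -> scm2, ...), as described in HDDS-4192.
public class RaftPeerIdSketch {
  static List<String> peerIds(String scmNames) {
    List<String> ids = new ArrayList<>();
    String[] hosts = scmNames.split(",");
    for (int i = 0; i < hosts.length; i++) {
      ids.add("scm" + (i + 1));  // ip1 -> scm1, ip2 -> scm2, ip3 -> scm3
    }
    return ids;
  }

  public static void main(String[] args) {
    System.out.println(peerIds("ip1,ip2,ip3")); // [scm1, scm2, scm3]
  }
}
```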






[jira] [Resolved] (HDDS-4192) enable SCM Raft Group based on config ozone.scm.names

2020-10-19 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng resolved HDDS-4192.

Resolution: Fixed

> enable SCM Raft Group based on config ozone.scm.names
> -
>
> Key: HDDS-4192
> URL: https://issues.apache.org/jira/browse/HDDS-4192
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Glen Geng
>Assignee: Glen Geng
>Priority: Major
>  Labels: pull-request-available
>
>  
> Say ozone.scm.names is "ip1,ip2,ip3": the SCM with ip1 identifies its 
> RaftPeerId as scm1, the SCM with ip2 as scm2, and the SCM with ip3 as 
> scm3. They will automatically form a Raft group.






[jira] [Assigned] (HDDS-4339) Ozone S3 gateway throws NPE with goofys

2020-10-14 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng reassigned HDDS-4339:
--

Assignee: Li Cheng

> Ozone S3 gateway throws NPE with goofys
> ---
>
> Key: HDDS-4339
> URL: https://issues.apache.org/jira/browse/HDDS-4339
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Li Cheng
>Assignee: Li Cheng
>Priority: Blocker
>  Labels: pull-request-available
> Attachments: image-2020-10-13-15-23-49-864.png
>
>
> Configured goofys and s3g on different hosts; Fiotest writes files to the 
> goofys mount point. AWS secrets are exported on the s3g host. A number of 
> NPEs appear in the s3g logs.
>  # It looks like a missing AWS auth header can cause an NPE: 
> AWSSignatureProcessor.init() does not handle a missing header, which leads 
> to the NPE.
>  # Why the AWS auth header is missing is also unknown.
> Note that some files have been successfully written into Ozone via 
> goofys, but not all of them succeed.
>  
> 2020-10-13 11:18:43,425 [qtp1686100174-1238] ERROR 
> org.apache.hadoop.ozone.s3.OzoneClientProducer: Error: 
> org.jboss.weld.exceptions.WeldException: WELD-49: Unable to invoke public 
> void org.apache.hadoop.ozone.s3.AWSSignatureProcessor.init() throws 
> java.lang.Exception on 
> org.apache.hadoop.ozone.s3.AWSSignatureProcessor@5535155b
>  at 
> org.jboss.weld.injection.producer.DefaultLifecycleCallbackInvoker.invokeMethods(DefaultLifecycleCallbackInvoker.java:99)
>  at 
> org.jboss.weld.injection.producer.DefaultLifecycleCallbackInvoker.postConstruct(DefaultLifecycleCallbackInvoker.java:80)
>  at 
> org.jboss.weld.injection.producer.BasicInjectionTarget.postConstruct(BasicInjectionTarget.java:122)
>  at 
> org.glassfish.jersey.ext.cdi1x.internal.CdiComponentProvider$InjectionManagerInjectedCdiTarget.postConstruct(CdiComponentProvider.java:887)
>  at org.jboss.weld.bean.ManagedBean.create(ManagedBean.java:162)
>  at org.jboss.weld.context.AbstractContext.get(AbstractContext.java:96)
>  at 
> org.jboss.weld.bean.ContextualInstanceStrategy$DefaultContextualInstanceStrategy.get(ContextualInstanceStrategy.java:100)
>  at 
> org.jboss.weld.bean.ContextualInstanceStrategy$CachingContextualInstanceStrategy.get(ContextualInstanceStrategy.java:177)
>  at org.jboss.weld.bean.ContextualInstance.get(ContextualInstance.java:50)
>  at 
> org.jboss.weld.bean.proxy.ContextBeanInstance.getInstance(ContextBeanInstance.java:99)
>  at 
> org.jboss.weld.bean.proxy.ProxyMethodHandler.getInstance(ProxyMethodHandler.java:125)
>  at 
> org.apache.hadoop.ozone.s3.AWSSignatureProcessor$Proxy$_$$_WeldClientProxy.getAwsAccessId(Unknown
>  Source)
>  at 
> org.apache.hadoop.ozone.s3.OzoneClientProducer.getClient(OzoneClientProducer.java:79)
>  at 
> org.apache.hadoop.ozone.s3.OzoneClientProducer.createClient(OzoneClientProducer.java:68)
>  at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.jboss.weld.injection.StaticMethodInjectionPoint.invoke(StaticMethodInjectionPoint.java:88)
>  at 
> org.jboss.weld.injection.StaticMethodInjectionPoint.invoke(StaticMethodInjectionPoint.java:78)
>  at 
> org.jboss.weld.injection.producer.ProducerMethodProducer.produce(ProducerMethodProducer.java:100)
>  at 
> org.jboss.weld.injection.producer.AbstractMemberProducer.produce(AbstractMemberProducer.java:161)
>  at 
> org.jboss.weld.bean.AbstractProducerBean.create(AbstractProducerBean.java:180)
>  at 
> org.jboss.weld.context.unbound.DependentContextImpl.get(DependentContextImpl.java:70)
>  at 
> org.jboss.weld.bean.ContextualInstanceStrategy$DefaultContextualInstanceStrategy.get(ContextualInstanceStrategy.java:100)
>  at org.jboss.weld.bean.ContextualInstance.get(ContextualInstance.java:50)
>  at 
> org.jboss.weld.manager.BeanManagerImpl.getReference(BeanManagerImpl.java:785)
>  at 
> org.jboss.weld.manager.BeanManagerImpl.getInjectableReference(BeanManagerImpl.java:885)
>  at 
> org.jboss.weld.injection.FieldInjectionPoint.inject(FieldInjectionPoint.java:92)
>  at org.jboss.weld.util.Beans.injectBoundFields(Beans.java:358)
>  at org.jboss.weld.util.Beans.injectFieldsAndInitializers(Beans.java:369)
>  at 
> org.jboss.weld.injection.producer.ResourceInjector$1.proceed(ResourceInjector.java:70)
>  at 
> org.jboss.weld.injection.InjectionContextImpl.run(InjectionContextImpl.java:48)
>  at 
> org.jboss.weld.injection.producer.ResourceInjector.inject(ResourceInjector.java:72)
>  at 
> org.jboss.weld.injection.producer.BasicInjectionTarget.inject(BasicInjectionTarget.java:117)
>  at 
> 

[jira] [Commented] (HDDS-4339) Ozone S3 gateway throws NPE with goofys

2020-10-14 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-4339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213856#comment-17213856
 ] 

Li Cheng commented on HDDS-4339:


[~bharat] The issue is slightly different here. Before the header goes into 
AWSSignatureProcessor, we need to validate that the auth field is present in 
the header, since the subsequent validation depends on it. However, we are 
currently missing a place to handle requests that don't contain an auth 
field.
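A minimal sketch of the guard described above, assuming a hypothetical header map and exception type (not s3g's actual API): check the Authorization header up front and fail with a typed error the caller can turn into an S3 response, instead of letting init() hit a NullPointerException.

```java
import java.util.Map;

// Hedged sketch: validate the auth header before signature parsing.
// Header name and exception type are assumptions for illustration.
public class AuthGuardSketch {
  static class MalformedRequestException extends Exception {
    MalformedRequestException(String m) { super(m); }
  }

  static String parseAwsAccessId(Map<String, String> headers)
      throws MalformedRequestException {
    String auth = headers.get("Authorization");
    if (auth == null || auth.isEmpty()) {
      // Fail fast with a typed error instead of dereferencing null later.
      throw new MalformedRequestException("Missing Authorization header");
    }
    // ... continue with AWS signature parsing on a non-null value ...
    return auth.split(" ")[0];
  }

  public static void main(String[] args) {
    try {
      parseAwsAccessId(Map.of());  // request with no Authorization header
    } catch (MalformedRequestException expected) {
      System.out.println("guarded: " + expected.getMessage());
    }
  }
}
```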

> Ozone S3 gateway throws NPE with goofys
> ---
>
> Key: HDDS-4339
> URL: https://issues.apache.org/jira/browse/HDDS-4339
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Li Cheng
>Priority: Blocker
> Attachments: image-2020-10-13-15-23-49-864.png
>
>
> Configured goofys and s3g on different hosts; Fiotest writes files to the 
> goofys mount point. AWS secrets are exported on the s3g host. A number of 
> NPEs appear in the s3g logs.
>  # It looks like a missing AWS auth header can cause an NPE: 
> AWSSignatureProcessor.init() does not handle a missing header, which leads 
> to the NPE.
>  # Why the AWS auth header is missing is also unknown.
> Note that some files have been successfully written into Ozone via 
> goofys, but not all of them succeed.
>  
> 2020-10-13 11:18:43,425 [qtp1686100174-1238] ERROR 
> org.apache.hadoop.ozone.s3.OzoneClientProducer: Error: 
> org.jboss.weld.exceptions.WeldException: WELD-49: Unable to invoke public 
> void org.apache.hadoop.ozone.s3.AWSSignatureProcessor.init() throws 
> java.lang.Exception on 
> org.apache.hadoop.ozone.s3.AWSSignatureProcessor@5535155b
>  at 
> org.jboss.weld.injection.producer.DefaultLifecycleCallbackInvoker.invokeMethods(DefaultLifecycleCallbackInvoker.java:99)
>  at 
> org.jboss.weld.injection.producer.DefaultLifecycleCallbackInvoker.postConstruct(DefaultLifecycleCallbackInvoker.java:80)
>  at 
> org.jboss.weld.injection.producer.BasicInjectionTarget.postConstruct(BasicInjectionTarget.java:122)
>  at 
> org.glassfish.jersey.ext.cdi1x.internal.CdiComponentProvider$InjectionManagerInjectedCdiTarget.postConstruct(CdiComponentProvider.java:887)
>  at org.jboss.weld.bean.ManagedBean.create(ManagedBean.java:162)
>  at org.jboss.weld.context.AbstractContext.get(AbstractContext.java:96)
>  at 
> org.jboss.weld.bean.ContextualInstanceStrategy$DefaultContextualInstanceStrategy.get(ContextualInstanceStrategy.java:100)
>  at 
> org.jboss.weld.bean.ContextualInstanceStrategy$CachingContextualInstanceStrategy.get(ContextualInstanceStrategy.java:177)
>  at org.jboss.weld.bean.ContextualInstance.get(ContextualInstance.java:50)
>  at 
> org.jboss.weld.bean.proxy.ContextBeanInstance.getInstance(ContextBeanInstance.java:99)
>  at 
> org.jboss.weld.bean.proxy.ProxyMethodHandler.getInstance(ProxyMethodHandler.java:125)
>  at 
> org.apache.hadoop.ozone.s3.AWSSignatureProcessor$Proxy$_$$_WeldClientProxy.getAwsAccessId(Unknown
>  Source)
>  at 
> org.apache.hadoop.ozone.s3.OzoneClientProducer.getClient(OzoneClientProducer.java:79)
>  at 
> org.apache.hadoop.ozone.s3.OzoneClientProducer.createClient(OzoneClientProducer.java:68)
>  at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.jboss.weld.injection.StaticMethodInjectionPoint.invoke(StaticMethodInjectionPoint.java:88)
>  at 
> org.jboss.weld.injection.StaticMethodInjectionPoint.invoke(StaticMethodInjectionPoint.java:78)
>  at 
> org.jboss.weld.injection.producer.ProducerMethodProducer.produce(ProducerMethodProducer.java:100)
>  at 
> org.jboss.weld.injection.producer.AbstractMemberProducer.produce(AbstractMemberProducer.java:161)
>  at 
> org.jboss.weld.bean.AbstractProducerBean.create(AbstractProducerBean.java:180)
>  at 
> org.jboss.weld.context.unbound.DependentContextImpl.get(DependentContextImpl.java:70)
>  at 
> org.jboss.weld.bean.ContextualInstanceStrategy$DefaultContextualInstanceStrategy.get(ContextualInstanceStrategy.java:100)
>  at org.jboss.weld.bean.ContextualInstance.get(ContextualInstance.java:50)
>  at 
> org.jboss.weld.manager.BeanManagerImpl.getReference(BeanManagerImpl.java:785)
>  at 
> org.jboss.weld.manager.BeanManagerImpl.getInjectableReference(BeanManagerImpl.java:885)
>  at 
> org.jboss.weld.injection.FieldInjectionPoint.inject(FieldInjectionPoint.java:92)
>  at org.jboss.weld.util.Beans.injectBoundFields(Beans.java:358)
>  at org.jboss.weld.util.Beans.injectFieldsAndInitializers(Beans.java:369)
>  at 
> org.jboss.weld.injection.producer.ResourceInjector$1.proceed(ResourceInjector.java:70)
>  at 
> org.jboss.weld.injection.InjectionContextImpl.run(InjectionContextImpl.java:48)
>  at 
> 

[jira] [Created] (HDDS-4339) Ozone S3 gateway throws NPE with goofys

2020-10-13 Thread Li Cheng (Jira)
Li Cheng created HDDS-4339:
--

 Summary: Ozone S3 gateway throws NPE with goofys
 Key: HDDS-4339
 URL: https://issues.apache.org/jira/browse/HDDS-4339
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Li Cheng
 Attachments: image-2020-10-13-15-23-49-864.png

Configured goofys and s3g on different hosts; Fiotest writes files to the 
goofys mount point. AWS secrets are exported on the s3g host. A number of 
NPEs appear in the s3g logs.
 # It looks like a missing AWS auth header can cause an NPE: 
AWSSignatureProcessor.init() does not handle a missing header, which leads 
to the NPE.
 # Why the AWS auth header is missing is also unknown.

Note that some files have been successfully written into Ozone via goofys, 
but not all of them succeed.

 

2020-10-13 11:18:43,425 [qtp1686100174-1238] ERROR 
org.apache.hadoop.ozone.s3.OzoneClientProducer: Error: 
org.jboss.weld.exceptions.WeldException: WELD-49: Unable to invoke public 
void org.apache.hadoop.ozone.s3.AWSSignatureProcessor.init() throws 
java.lang.Exception on org.apache.hadoop.ozone.s3.AWSSignatureProcessor@5535155b
 at 
org.jboss.weld.injection.producer.DefaultLifecycleCallbackInvoker.invokeMethods(DefaultLifecycleCallbackInvoker.java:99)
 at 
org.jboss.weld.injection.producer.DefaultLifecycleCallbackInvoker.postConstruct(DefaultLifecycleCallbackInvoker.java:80)
 at 
org.jboss.weld.injection.producer.BasicInjectionTarget.postConstruct(BasicInjectionTarget.java:122)
 at 
org.glassfish.jersey.ext.cdi1x.internal.CdiComponentProvider$InjectionManagerInjectedCdiTarget.postConstruct(CdiComponentProvider.java:887)
 at org.jboss.weld.bean.ManagedBean.create(ManagedBean.java:162)
 at org.jboss.weld.context.AbstractContext.get(AbstractContext.java:96)
 at 
org.jboss.weld.bean.ContextualInstanceStrategy$DefaultContextualInstanceStrategy.get(ContextualInstanceStrategy.java:100)
 at 
org.jboss.weld.bean.ContextualInstanceStrategy$CachingContextualInstanceStrategy.get(ContextualInstanceStrategy.java:177)
 at org.jboss.weld.bean.ContextualInstance.get(ContextualInstance.java:50)
 at 
org.jboss.weld.bean.proxy.ContextBeanInstance.getInstance(ContextBeanInstance.java:99)
 at 
org.jboss.weld.bean.proxy.ProxyMethodHandler.getInstance(ProxyMethodHandler.java:125)
 at 
org.apache.hadoop.ozone.s3.AWSSignatureProcessor$Proxy$_$$_WeldClientProxy.getAwsAccessId(Unknown
 Source)
 at 
org.apache.hadoop.ozone.s3.OzoneClientProducer.getClient(OzoneClientProducer.java:79)
 at 
org.apache.hadoop.ozone.s3.OzoneClientProducer.createClient(OzoneClientProducer.java:68)
 at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at 
org.jboss.weld.injection.StaticMethodInjectionPoint.invoke(StaticMethodInjectionPoint.java:88)
 at 
org.jboss.weld.injection.StaticMethodInjectionPoint.invoke(StaticMethodInjectionPoint.java:78)
 at 
org.jboss.weld.injection.producer.ProducerMethodProducer.produce(ProducerMethodProducer.java:100)
 at 
org.jboss.weld.injection.producer.AbstractMemberProducer.produce(AbstractMemberProducer.java:161)
 at 
org.jboss.weld.bean.AbstractProducerBean.create(AbstractProducerBean.java:180)
 at 
org.jboss.weld.context.unbound.DependentContextImpl.get(DependentContextImpl.java:70)
 at 
org.jboss.weld.bean.ContextualInstanceStrategy$DefaultContextualInstanceStrategy.get(ContextualInstanceStrategy.java:100)
 at org.jboss.weld.bean.ContextualInstance.get(ContextualInstance.java:50)
 at 
org.jboss.weld.manager.BeanManagerImpl.getReference(BeanManagerImpl.java:785)
 at 
org.jboss.weld.manager.BeanManagerImpl.getInjectableReference(BeanManagerImpl.java:885)
 at 
org.jboss.weld.injection.FieldInjectionPoint.inject(FieldInjectionPoint.java:92)
 at org.jboss.weld.util.Beans.injectBoundFields(Beans.java:358)
 at org.jboss.weld.util.Beans.injectFieldsAndInitializers(Beans.java:369)
 at 
org.jboss.weld.injection.producer.ResourceInjector$1.proceed(ResourceInjector.java:70)
 at 
org.jboss.weld.injection.InjectionContextImpl.run(InjectionContextImpl.java:48)
 at 
org.jboss.weld.injection.producer.ResourceInjector.inject(ResourceInjector.java:72)
 at 
org.jboss.weld.injection.producer.BasicInjectionTarget.inject(BasicInjectionTarget.java:117)
 at 
org.glassfish.jersey.ext.cdi1x.internal.CdiComponentProvider$InjectionManagerInjectedCdiTarget.inject(CdiComponentProvider.java:873)
 at org.jboss.weld.bean.ManagedBean.create(ManagedBean.java:159)
 at 
org.jboss.weld.context.unbound.DependentContextImpl.get(DependentContextImpl.java:70)
 at 
org.jboss.weld.bean.ContextualInstanceStrategy$DefaultContextualInstanceStrategy.get(ContextualInstanceStrategy.java:100)
 at org.jboss.weld.bean.ContextualInstance.get(ContextualInstance.java:50)
 at 

[jira] [Assigned] (HDDS-3103) Have multi-raft pipeline calculator to recommend best pipeline number per datanode

2020-10-13 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng reassigned HDDS-3103:
--

Assignee: (was: Li Cheng)

> Have multi-raft pipeline calculator to recommend best pipeline number per 
> datanode
> --
>
> Key: HDDS-3103
> URL: https://issues.apache.org/jira/browse/HDDS-3103
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Affects Versions: 0.5.0
>Reporter: Li Cheng
>Priority: Critical
>
> PipelinePlacementPolicy should have a calculator method to recommend a better 
> value for the pipeline number per node. The number used to come from 
> ozone.datanode.pipeline.limit in the config. SCM should be able to consider how 
> many Ratis directories there are and the Ratis retry timeout to recommend the 
> best pipeline number for every node.
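A minimal sketch of what such a calculator could look like, assuming a simple heuristic of a fixed number of pipelines per Ratis directory, capped by the configured limit. All names here (`PipelineLimitSketch`, `PIPELINES_PER_RATIS_DIR`, `recommendedLimit`) are hypothetical, not the actual PipelinePlacementPolicy API:

```java
// Hypothetical sketch: recommend a per-datanode pipeline limit from the
// configured value and the number of Ratis metadata directories.
// None of these names come from the actual Ozone code base.
class PipelineLimitSketch {

    // Assumption: each Ratis log directory can comfortably serve a fixed
    // number of pipelines before write latency starts tripping retries.
    static final int PIPELINES_PER_RATIS_DIR = 2;

    static int recommendedLimit(int configuredLimit, int ratisDirCount) {
        int heuristic = Math.max(1, ratisDirCount * PIPELINES_PER_RATIS_DIR);
        // Never exceed what the operator configured explicitly.
        return Math.min(configuredLimit, heuristic);
    }

    public static void main(String[] args) {
        // configured limit 10, but only 2 Ratis dirs -> recommend 4
        System.out.println(recommendedLimit(10, 2));
    }
}
```

The real calculator would presumably also factor in the Ratis retry timeout; this sketch only illustrates the capping structure.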



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-4295) SCM ServiceManager

2020-09-29 Thread Li Cheng (Jira)
Li Cheng created HDDS-4295:
--

 Summary: SCM ServiceManager 
 Key: HDDS-4295
 URL: https://issues.apache.org/jira/browse/HDDS-4295
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
  Components: SCM
Reporter: Li Cheng


SCM ServiceManager is going to control all the SCM background services so that 
they only serve while this SCM is the leader. 

ServiceManager would also bootstrap all the background services and protocol 
servers. 

It also needs to run validation steps when the SCM comes up as the leader.






[jira] [Resolved] (HDDS-3206) Make sure AllocateBlock can only be executed on leader SCM

2020-09-29 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng resolved HDDS-3206.

Resolution: Duplicate

> Make sure AllocateBlock can only be executed on leader SCM
> --
>
> Key: HDDS-3206
> URL: https://issues.apache.org/jira/browse/HDDS-3206
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Li Cheng
>Priority: Major
>
> Check if the current SCM is the leader. If not, return a NonLeaderException.






[jira] [Resolved] (HDDS-3199) Handle PipelineAction and OpenPipline from DN to SCM

2020-09-29 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng resolved HDDS-3199.

Resolution: Duplicate

> Handle PipelineAction and OpenPipline from DN to SCM
> 
>
> Key: HDDS-3199
> URL: https://issues.apache.org/jira/browse/HDDS-3199
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Li Cheng
>Priority: Major
>
> PipelineAction and OpenPipeline should only be sent to the leader SCM, and the 
> leader SCM will take action to close or open pipelines. Pipeline state changes 
> will be propagated to followers via Ratis. If an action is sent to a follower, 
> the follower SCM will reject it with a NonLeaderException and the DN will retry.
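The datanode-side behavior described above can be sketched as a simple retry loop: submit the action to one SCM; if a follower rejects it with a "not leader" error, retry against the next SCM. `NonLeaderException` here is a stand-in class, and `ScmTarget` is an illustrative interface, not a real Ozone type:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a DN retrying a pipeline action until it
// reaches the leader SCM. Followers reject with NonLeaderException.
class PipelineActionRetrySketch {

    static class NonLeaderException extends Exception {}

    // Stand-in for one SCM endpoint; not a real Ozone interface.
    interface ScmTarget {
        void submitPipelineAction(String action) throws NonLeaderException;
    }

    // Try each SCM in turn; the first successful submission identifies
    // the current leader (its index is returned).
    static int sendToLeader(List<ScmTarget> scms, String action)
            throws NonLeaderException {
        for (int i = 0; i < scms.size(); i++) {
            try {
                scms.get(i).submitPipelineAction(action);
                return i; // this SCM accepted, i.e. the current leader
            } catch (NonLeaderException e) {
                // follower rejected; retry against the next SCM
            }
        }
        throw new NonLeaderException(); // no leader reachable
    }

    public static void main(String[] args) throws NonLeaderException {
        ScmTarget follower = a -> { throw new NonLeaderException(); };
        ScmTarget leader = a -> { };
        System.out.println(
            sendToLeader(Arrays.asList(follower, follower, leader), "CLOSE_PIPELINE"));
    }
}
```

A production proxy would add backoff and a retry budget rather than looping once over the list.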






[jira] [Created] (HDDS-4294) Backport updates from ContainerManager(V1)

2020-09-29 Thread Li Cheng (Jira)
Li Cheng created HDDS-4294:
--

 Summary: Backport updates from ContainerManager(V1)
 Key: HDDS-4294
 URL: https://issues.apache.org/jira/browse/HDDS-4294
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Li Cheng









[jira] [Created] (HDDS-4293) Backport updates from PipelineManager(V1)

2020-09-29 Thread Li Cheng (Jira)
Li Cheng created HDDS-4293:
--

 Summary: Backport updates from PipelineManager(V1)
 Key: HDDS-4293
 URL: https://issues.apache.org/jira/browse/HDDS-4293
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Li Cheng









[jira] [Updated] (HDDS-3211) Design for SCM HA configuration

2020-09-29 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng updated HDDS-3211:
---
Summary: Design for SCM HA configuration  (was: Make SCM HA configurable)

> Design for SCM HA configuration
> ---
>
> Key: HDDS-3211
> URL: https://issues.apache.org/jira/browse/HDDS-3211
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Li Cheng
>Priority: Major
>
> Need a switch in all paths to turn SCM HA on/off.






[jira] [Resolved] (HDDS-3200) Handle NodeReport from DN to SCMs

2020-09-29 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng resolved HDDS-3200.

Resolution: Duplicate

> Handle NodeReport from DN to SCMs
> -
>
> Key: HDDS-3200
> URL: https://issues.apache.org/jira/browse/HDDS-3200
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Li Cheng
>Priority: Major
>
> NodeReports are sent to all SCMs. Only the leader SCM can take action to 
> change node status.






[jira] [Resolved] (HDDS-3193) Handle ContainerReport and IncrementalContainerReport

2020-09-29 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng resolved HDDS-3193.

Resolution: Duplicate

> Handle ContainerReport and IncrementalContainerReport
> -
>
> Key: HDDS-3193
> URL: https://issues.apache.org/jira/browse/HDDS-3193
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Li Cheng
>Priority: Major
>
> Let the DataNode send ContainerReport and IncrementalContainerReport to all 
> SCMs. SCM should be aware of the BCSID in reports to know the version of the 
> report. SCM will NOT applyTransaction for container reports, but only record 
> the sequenceId (like the BCSID) in reports.






[jira] [Commented] (HDDS-3211) Make SCM HA configurable

2020-09-29 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17204409#comment-17204409
 ] 

Li Cheng commented on HDDS-3211:


[~nicholasjiang] Hey Nicholas, this issue would require an overall design for 
SCM HA configuration considering multi-SCMs as well as allowing federation. 
Also, this HA config may apply to the entire Ozone, which means we would need to 
update what OM HA does now.

> Make SCM HA configurable
> 
>
> Key: HDDS-3211
> URL: https://issues.apache.org/jira/browse/HDDS-3211
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Li Cheng
>Priority: Major
>
> Need a switch in all paths to turn SCM HA on/off.






[jira] [Commented] (HDDS-4115) CLI command to show current SCM leader and follower status

2020-09-29 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17203727#comment-17203727
 ] 

Li Cheng commented on HDDS-4115:


Patch is merged. Resolving

> CLI command to show current SCM leader and follower status
> --
>
> Key: HDDS-4115
> URL: https://issues.apache.org/jira/browse/HDDS-4115
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Li Cheng
>Assignee: Rui Wang
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (HDDS-4115) CLI command to show current SCM leader and follower status

2020-09-29 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng updated HDDS-4115:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> CLI command to show current SCM leader and follower status
> --
>
> Key: HDDS-4115
> URL: https://issues.apache.org/jira/browse/HDDS-4115
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Li Cheng
>Assignee: Rui Wang
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Assigned] (HDDS-3661) Add Snapshot into new SCMRatisServer and SCMStateMachine

2020-09-27 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng reassigned HDDS-3661:
--

Assignee: Rui Wang  (was: Li Cheng)

> Add Snapshot into new SCMRatisServer  and SCMStateMachine
> -
>
> Key: HDDS-3661
> URL: https://issues.apache.org/jira/browse/HDDS-3661
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Li Cheng
>Assignee: Rui Wang
>Priority: Major
>
> Now we have prototype SCMRatisServer and SCMStateMachine under Ratis and HA 
> path. Implement Snapshot support into new SCMRatisServer and SCMStateMachine 
> as well.






[jira] [Resolved] (HDDS-4132) Switch to ContainerManagerV2

2020-09-27 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng resolved HDDS-4132.

Resolution: Duplicate

https://issues.apache.org/jira/browse/HDDS-4133

> Switch to ContainerManagerV2
> 
>
> Key: HDDS-4132
> URL: https://issues.apache.org/jira/browse/HDDS-4132
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM HA
>Reporter: Li Cheng
>Assignee: Li Cheng
>Priority: Major
>
> Use the new ContainerManagerV2 API 






[jira] [Resolved] (HDDS-3203) Replication can only be executed on leader

2020-09-27 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng resolved HDDS-3203.

Resolution: Duplicate

> Replication can only be executed on leader
> --
>
> Key: HDDS-3203
> URL: https://issues.apache.org/jira/browse/HDDS-3203
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Li Cheng
>Priority: Major
>
> Replication should only be executed on the leader. 
> If the leader has changed, the new leader will initialize new tasks based on 
> its current view.






[jira] [Updated] (HDDS-3188) Add failover proxy to SCM block protocol

2020-09-27 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng updated HDDS-3188:
---
Status: Patch Available  (was: In Progress)

> Add failover proxy to SCM block protocol
> 
>
> Key: HDDS-3188
> URL: https://issues.apache.org/jira/browse/HDDS-3188
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Li Cheng
>Assignee: Li Cheng
>Priority: Major
>  Labels: pull-request-available
>
> Need to support 2N + 1 SCMs. Add configs and logic to support multiple SCMs.






[jira] [Assigned] (HDDS-3188) Add failover proxy to SCM block protocol

2020-09-27 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng reassigned HDDS-3188:
--

Assignee: Li Cheng  (was: Li Cheng)

> Add failover proxy to SCM block protocol
> 
>
> Key: HDDS-3188
> URL: https://issues.apache.org/jira/browse/HDDS-3188
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Li Cheng
>Assignee: Li Cheng
>Priority: Major
>  Labels: pull-request-available
>
> Need to support 2N + 1 SCMs. Add configs and logic to support multiple SCMs.






[jira] [Assigned] (HDDS-2823) SCM HA Support

2020-09-26 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng reassigned HDDS-2823:
--

Assignee: Li Cheng  (was: Li Cheng)

> SCM HA Support 
> ---
>
> Key: HDDS-2823
> URL: https://issues.apache.org/jira/browse/HDDS-2823
> Project: Hadoop Distributed Data Store
>  Issue Type: New Feature
>  Components: SCM HA
>Reporter: Sammi Chen
>Assignee: Li Cheng
>Priority: Major
>
> OM HA is close to feature complete now. It's time to support SCM HA, to make 
> sure there is no SPoF in the system.
>  
> Design doc: 
> https://docs.google.com/document/d/1vr_z6mQgtS1dtI0nANoJlzvF1oLV-AtnNJnxAgg69rM/edit?usp=sharing






[jira] [Created] (HDDS-4281) Use suggestedLeader for SCM failover proxy performing failover

2020-09-26 Thread Li Cheng (Jira)
Li Cheng created HDDS-4281:
--

 Summary: Use suggestedLeader for SCM failover proxy performing 
failover
 Key: HDDS-4281
 URL: https://issues.apache.org/jira/browse/HDDS-4281
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Li Cheng


Related to HDDS-3188.

Use the suggestedLeader from the SCM failover proxy response.
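A minimal sketch of the failover strategy this implies: on a rejection the proxy normally rotates round-robin through the configured SCMs, but if the reply carries a suggested leader it jumps straight to that node. The class and method names are illustrative, not the real Ozone proxy provider:

```java
// Hypothetical sketch of suggested-leader-aware failover. On failover,
// prefer the leader hinted by the last response; otherwise round-robin.
class SuggestedLeaderFailoverSketch {

    private final String[] scmNodeIds;
    private int current = 0;

    SuggestedLeaderFailoverSketch(String... scmNodeIds) {
        this.scmNodeIds = scmNodeIds;
    }

    String currentProxy() {
        return scmNodeIds[current];
    }

    // suggestedLeader may be null when the follower does not know the leader.
    void performFailover(String suggestedLeader) {
        if (suggestedLeader != null) {
            for (int i = 0; i < scmNodeIds.length; i++) {
                if (scmNodeIds[i].equals(suggestedLeader)) {
                    current = i; // jump directly to the hinted leader
                    return;
                }
            }
        }
        current = (current + 1) % scmNodeIds.length; // plain round-robin
    }

    public static void main(String[] args) {
        SuggestedLeaderFailoverSketch p =
            new SuggestedLeaderFailoverSketch("scm1", "scm2", "scm3");
        p.performFailover("scm3"); // follower told us scm3 is the leader
        System.out.println(p.currentProxy());
    }
}
```

The hint avoids wasted retries against followers, which matters when each attempt carries an RPC timeout.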






[jira] [Commented] (HDDS-4221) Support extra large storage capacity server as datanode

2020-09-24 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-4221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201902#comment-17201902
 ] 

Li Cheng commented on HDDS-4221:


There is discussion over RaftClient sharing one gRPC channel on every datanode:

https://issues.apache.org/jira/browse/RATIS-1072


https://issues.apache.org/jira/browse/RATIS-1074

> Support extra large storage capacity server as datanode
> ---
>
> Key: HDDS-4221
> URL: https://issues.apache.org/jira/browse/HDDS-4221
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Sammi Chen
>Priority: Major
> Attachments: image-2020-09-25-12-41-38-113.png
>
>
> There is a customer request to support high-density storage servers as 
> datanodes; hardware configuration for example: 96 Core, 32G DDR4 *8, 480G SATA 
> SSD, 25GbE *2, 60 * 12TB HDD.  
> How to fully utilize the hardware resources and unleash their power is a big 
> challenge. 
> This umbrella JIRA is created to host all the discussions and next-step 
> actions towards the final goal. 






[jira] [Updated] (HDDS-4221) Support extra large storage capacity server as datanode

2020-09-24 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng updated HDDS-4221:
---
Attachment: image-2020-09-25-12-41-38-113.png

> Support extra large storage capacity server as datanode
> ---
>
> Key: HDDS-4221
> URL: https://issues.apache.org/jira/browse/HDDS-4221
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Sammi Chen
>Priority: Major
> Attachments: image-2020-09-25-12-41-38-113.png
>
>
> There is a customer request to support high-density storage servers as 
> datanodes; hardware configuration for example: 96 Core, 32G DDR4 *8, 480G SATA 
> SSD, 25GbE *2, 60 * 12TB HDD.  
> How to fully utilize the hardware resources and unleash their power is a big 
> challenge. 
> This umbrella JIRA is created to host all the discussions and next-step 
> actions towards the final goal. 






[jira] [Commented] (HDDS-4221) Support extra large storage capacity server as datanode

2020-09-24 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-4221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201901#comment-17201901
 ] 

Li Cheng commented on HDDS-4221:


In the cosbench-via-S3 test, it looks like the write performance differs when 
we test against a single bucket versus multiple buckets.

The above one is for a single bucket and the below one is for 4 buckets.

!image-2020-09-25-12-41-38-113.png!

> Support extra large storage capacity server as datanode
> ---
>
> Key: HDDS-4221
> URL: https://issues.apache.org/jira/browse/HDDS-4221
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Sammi Chen
>Priority: Major
> Attachments: image-2020-09-25-12-41-38-113.png
>
>
> There is a customer request to support high-density storage servers as 
> datanodes; hardware configuration for example: 96 Core, 32G DDR4 *8, 480G SATA 
> SSD, 25GbE *2, 60 * 12TB HDD.  
> How to fully utilize the hardware resources and unleash their power is a big 
> challenge. 
> This umbrella JIRA is created to host all the discussions and next-step 
> actions towards the final goal. 






[jira] [Resolved] (HDDS-4228) add field 'num' to ALLOCATE_BLOCK of scm audit log.

2020-09-10 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng resolved HDDS-4228.

Fix Version/s: 1.1.0
   Resolution: Fixed

> add field 'num' to ALLOCATE_BLOCK of scm audit log.
> ---
>
> Key: HDDS-4228
> URL: https://issues.apache.org/jira/browse/HDDS-4228
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Glen Geng
>Assignee: Glen Geng
>Priority: Minor
>  Labels: pull-request-available, pull-requests-available
> Fix For: 1.1.0
>
>
>  
> The scm audit log for ALLOCATE_BLOCK is as follows:
> {code:java}
> 2020-09-10 03:42:08,196 | INFO | SCMAudit | user=root | ip=172.16.90.221 | 
> op=ALLOCATE_BLOCK {owner=7da0b4c4-d053-4fa0-8648-44ff0b8ba1bf, 
> size=268435456, type=RATIS, factor=THREE} | ret=SUCCESS |{code}
>  
> One might be interested in the number of blocks allocated, so it is better to 
> add a field 'num' to the ALLOCATE_BLOCK entry of the scm audit log.






[jira] [Commented] (HDDS-4228) add field 'num' to ALLOCATE_BLOCK of scm audit log.

2020-09-10 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17193955#comment-17193955
 ] 

Li Cheng commented on HDDS-4228:


PR is merged. Thanks [~glengeng] for working on this.

> add field 'num' to ALLOCATE_BLOCK of scm audit log.
> ---
>
> Key: HDDS-4228
> URL: https://issues.apache.org/jira/browse/HDDS-4228
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Glen Geng
>Assignee: Glen Geng
>Priority: Minor
>  Labels: pull-request-available, pull-requests-available
>
>  
> The scm audit log for ALLOCATE_BLOCK is as follows:
> {code:java}
> 2020-09-10 03:42:08,196 | INFO | SCMAudit | user=root | ip=172.16.90.221 | 
> op=ALLOCATE_BLOCK {owner=7da0b4c4-d053-4fa0-8648-44ff0b8ba1bf, 
> size=268435456, type=RATIS, factor=THREE} | ret=SUCCESS |{code}
>  
> One might be interested in the number of blocks allocated, so it is better to 
> add a field 'num' to the ALLOCATE_BLOCK entry of the scm audit log.
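The change amounts to adding one more key to the audit parameter map so the log line shown in the description grows a `num=...` entry. A hedged sketch follows; the method and class names are illustrative, not the real SCM audit code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: build the ALLOCATE_BLOCK audit parameters,
// now including the number of blocks requested. LinkedHashMap keeps
// the fields in the same order as the audit line in the description.
class AllocateBlockAuditSketch {

    static Map<String, String> auditParams(String owner, long size,
                                           String type, String factor, int num) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("owner", owner);
        params.put("size", String.valueOf(size));
        params.put("type", type);
        params.put("factor", factor);
        params.put("num", String.valueOf(num)); // the newly added field
        return params;
    }

    public static void main(String[] args) {
        System.out.println(auditParams(
            "7da0b4c4-d053-4fa0-8648-44ff0b8ba1bf",
            268435456L, "RATIS", "THREE", 1));
    }
}
```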






[jira] [Created] (HDDS-4191) Add failover proxy for SCM container client

2020-09-02 Thread Li Cheng (Jira)
Li Cheng created HDDS-4191:
--

 Summary: Add failover proxy for SCM container client
 Key: HDDS-4191
 URL: https://issues.apache.org/jira/browse/HDDS-4191
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Li Cheng
Assignee: Li Cheng


Take advantage of the failover proxy in HDDS-3188 and add a failover proxy for 
the SCM container client as well.






[jira] [Commented] (HDDS-3188) Add failover proxy to SCM block protocol

2020-09-02 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189070#comment-17189070
 ] 

Li Cheng commented on HDDS-3188:


Dividing 'Enable Multiple SCMs' into smaller tasks.

> Add failover proxy to SCM block protocol
> 
>
> Key: HDDS-3188
> URL: https://issues.apache.org/jira/browse/HDDS-3188
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Li Cheng
>Assignee: Li Cheng
>Priority: Major
>  Labels: pull-request-available
>
> Need to support 2N + 1 SCMs. Add configs and logic to support multiple SCMs.






[jira] [Updated] (HDDS-3188) Add failover proxy to SCM block protocol

2020-09-02 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng updated HDDS-3188:
---
Summary: Add failover proxy to SCM block protocol  (was: Enable Multiple 
SCMs)

> Add failover proxy to SCM block protocol
> 
>
> Key: HDDS-3188
> URL: https://issues.apache.org/jira/browse/HDDS-3188
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Li Cheng
>Assignee: Li Cheng
>Priority: Major
>  Labels: pull-request-available
>
> Need to support 2N + 1 SCMs. Add configs and logic to support multiple SCMs.






[jira] [Resolved] (HDDS-3677) Handle events fired from PipelineManager to close container

2020-08-20 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng resolved HDDS-3677.

Resolution: Duplicate

This is resolved in PipelineManagerV2

> Handle events fired from PipelineManager to close container
> ---
>
> Key: HDDS-3677
> URL: https://issues.apache.org/jira/browse/HDDS-3677
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Li Cheng
>Priority: Major
>
> finalizePipeline used to fire events to close containers. In the new 
> interface, we should decide where to fire these events.






[jira] [Created] (HDDS-4132) Switch to ContainerManagerV2

2020-08-20 Thread Li Cheng (Jira)
Li Cheng created HDDS-4132:
--

 Summary: Switch to ContainerManagerV2
 Key: HDDS-4132
 URL: https://issues.apache.org/jira/browse/HDDS-4132
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
  Components: SCM HA
Reporter: Li Cheng
Assignee: Li Cheng


Use the new ContainerManagerV2 API 






[jira] [Commented] (HDDS-4116) SCM CLI command towards certain IP

2020-08-17 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17179307#comment-17179307
 ] 

Li Cheng commented on HDDS-4116:


[~adoroszlai] The goal of this task is to enable the admin CLI to send 
SCM-related commands to a certain SCM by IP address. Right now it sends commands 
just to the SCM service, but with SCM HA we would have a leader SCM and followers.

> SCM CLI command towards certain IP
> --
>
> Key: HDDS-4116
> URL: https://issues.apache.org/jira/browse/HDDS-4116
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Li Cheng
>Priority: Major
>







[jira] [Updated] (HDDS-3837) Add isLeader check for SCM state updates

2020-08-17 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng updated HDDS-3837:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Add isLeader check for SCM state updates
> 
>
> Key: HDDS-3837
> URL: https://issues.apache.org/jira/browse/HDDS-3837
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Li Cheng
>Assignee: Li Cheng
>Priority: Major
>  Labels: pull-request-available
>
> We only allow the leader to make decisions to update the map and DB and to 
> fire events to DNs.






[jira] [Created] (HDDS-4116) SCM CLI command towards certain IP

2020-08-13 Thread Li Cheng (Jira)
Li Cheng created HDDS-4116:
--

 Summary: SCM CLI command towards certain IP
 Key: HDDS-4116
 URL: https://issues.apache.org/jira/browse/HDDS-4116
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Li Cheng









[jira] [Created] (HDDS-4115) CLI command to show current SCM leader and follower status

2020-08-13 Thread Li Cheng (Jira)
Li Cheng created HDDS-4115:
--

 Summary: CLI command to show current SCM leader and follower status
 Key: HDDS-4115
 URL: https://issues.apache.org/jira/browse/HDDS-4115
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Li Cheng









[jira] [Commented] (HDDS-3190) SCM needs to replay RaftLog for recovery

2020-08-12 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17176716#comment-17176716
 ] 

Li Cheng commented on HDDS-3190:


[~amaliujia] Hey Rui,

Welcome to the Ozone community! Let's find some time to have a sync-up 
offline. I shall introduce the vision as well as the design of SCM HA :)

> SCM needs to replay RaftLog for recovery
> 
>
> Key: HDDS-3190
> URL: https://issues.apache.org/jira/browse/HDDS-3190
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Li Cheng
>Assignee: Rui Wang
>Priority: Major
>
> Need to add a big proto file for all types of requests to store in the 
> RaftLog. SCM needs to replay the RaftLog for recovery.
> Note that the proto may have further changes. Until all data structures are 
> finished, we need to leave some room for compatibility. 






[jira] [Created] (HDDS-3962) Use getRoleInfoProto() in isLeader check

2020-07-15 Thread Li Cheng (Jira)
Li Cheng created HDDS-3962:
--

 Summary: Use getRoleInfoProto() in isLeader check
 Key: HDDS-3962
 URL: https://issues.apache.org/jira/browse/HDDS-3962
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Li Cheng
Assignee: Glen Geng


{{RATIS-1001}} is going to include the term in the leadership check. SCM should 
check whether it's the leader and at which term. The current isLeader check 
doesn't report the term.

 

[https://github.com/apache/hadoop-ozone/pull/1191/files/0ca2ff54d496ee9c74273a79a1d33e0dd998eecf#diff-0282ededa84a94d13dbed6fbb7ee159bR75]
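The motivation above can be sketched as follows: instead of a bare boolean, the leadership check reports the term at which this SCM believes it is leader, so state updates stamped with an older term can be fenced out. The names are illustrative; the real check would query the Ratis role info proto:

```java
// Hypothetical sketch of a term-aware leadership check. A leader status
// without a term cannot fence stale updates; pairing the flag with the
// term makes the check safe across leader changes.
class TermAwareLeaderCheckSketch {

    static final class LeaderStatus {
        final boolean isLeader;
        final long term;

        LeaderStatus(boolean isLeader, long term) {
            this.isLeader = isLeader;
            this.term = term;
        }
    }

    // A state update prepared at a different (older) term must be rejected,
    // even if this SCM is leader again at a newer term.
    static boolean canApplyUpdate(LeaderStatus status, long updateTerm) {
        return status.isLeader && updateTerm == status.term;
    }

    public static void main(String[] args) {
        LeaderStatus s = new LeaderStatus(true, 7);
        System.out.println(canApplyUpdate(s, 7)); // same term: allowed
        System.out.println(canApplyUpdate(s, 6)); // stale term: rejected
    }
}
```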






[jira] [Updated] (HDDS-3837) Add isLeader check for SCM state updates

2020-07-12 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng updated HDDS-3837:
---
Status: Patch Available  (was: In Progress)

[https://github.com/apache/hadoop-ozone/pull/1191]

> Add isLeader check for SCM state updates
> 
>
> Key: HDDS-3837
> URL: https://issues.apache.org/jira/browse/HDDS-3837
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Li Cheng
>Assignee: Li Cheng
>Priority: Major
>  Labels: pull-request-available
>
> We only allow the leader to make decisions to update maps and the DB, and to fire events to DNs.






[jira] [Assigned] (HDDS-3188) Enable Multiple SCMs

2020-07-12 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng reassigned HDDS-3188:
--

Assignee: Li Cheng

> Enable Multiple SCMs
> 
>
> Key: HDDS-3188
> URL: https://issues.apache.org/jira/browse/HDDS-3188
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Li Cheng
>Assignee: Li Cheng
>Priority: Major
>
> Need to support 2N SCMs. Add configs and logic to support multiple SCMs.






[jira] [Assigned] (HDDS-3837) Add isLeader check for SCM state updates

2020-07-03 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng reassigned HDDS-3837:
--

Assignee: Li Cheng

> Add isLeader check for SCM state updates
> 
>
> Key: HDDS-3837
> URL: https://issues.apache.org/jira/browse/HDDS-3837
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Li Cheng
>Assignee: Li Cheng
>Priority: Major
>
> We only allow the leader to make decisions to update maps and the DB, and to fire events to DNs.






[jira] [Comment Edited] (HDDS-3911) Compile error in acceptance test on HDDS-2823

2020-07-02 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17150672#comment-17150672
 ] 

Li Cheng edited comment on HDDS-3911 at 7/3/20, 2:09 AM:
-

[https://github.com/apache/hadoop-ozone/pull/1157] is merged. Thanks for the 
contribution.


was (Author: licheng):
[https://github.com/apache/hadoop-ozone/pull/1157] is merged. Closing this 
JIRA. 

> Compile error in acceptance test on HDDS-2823
> -
>
> Key: HDDS-3911
> URL: https://issues.apache.org/jira/browse/HDDS-3911
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM HA
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Blocker
>  Labels: pull-request-available
>
> {code}
> [INFO] --- hadoop-maven-plugins:3.2.1:protoc (compile-protoc) @ 
> hadoop-hdds-server-scm ---
> [WARNING] [protoc, --version] failed: java.io.IOException: Cannot run program 
> "protoc": error=2, No such file or directory
> [ERROR] stdout: []
> {code}
> https://github.com/apache/hadoop-ozone/runs/814218639






[jira] [Commented] (HDDS-3911) Compile error in acceptance test on HDDS-2823

2020-07-02 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17150672#comment-17150672
 ] 

Li Cheng commented on HDDS-3911:


[https://github.com/apache/hadoop-ozone/pull/1157] is merged. Closing this 
JIRA. 

> Compile error in acceptance test on HDDS-2823
> -
>
> Key: HDDS-3911
> URL: https://issues.apache.org/jira/browse/HDDS-3911
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM HA
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Blocker
>  Labels: pull-request-available
>
> {code}
> [INFO] --- hadoop-maven-plugins:3.2.1:protoc (compile-protoc) @ 
> hadoop-hdds-server-scm ---
> [WARNING] [protoc, --version] failed: java.io.IOException: Cannot run program 
> "protoc": error=2, No such file or directory
> [ERROR] stdout: []
> {code}
> https://github.com/apache/hadoop-ozone/runs/814218639






[jira] [Created] (HDDS-3838) Handle stale leader issue

2020-06-19 Thread Li Cheng (Jira)
Li Cheng created HDDS-3838:
--

 Summary: Handle stale leader issue
 Key: HDDS-3838
 URL: https://issues.apache.org/jira/browse/HDDS-3838
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Li Cheng


There could be a stale SCM leader and a new SCM leader, and both can communicate 
with DNs. We need to handle the resulting consistency issues.

 

https://docs.google.com/document/d/1-5-KpR2GYIwWXGRH_C8IUVbFsm8RiETOVNYsMB5W8Ic/edit?usp=sharing






[jira] [Created] (HDDS-3837) Add isLeader check for SCM state updates

2020-06-19 Thread Li Cheng (Jira)
Li Cheng created HDDS-3837:
--

 Summary: Add isLeader check for SCM state updates
 Key: HDDS-3837
 URL: https://issues.apache.org/jira/browse/HDDS-3837
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Li Cheng


We only allow the leader to make decisions to update maps and the DB, and to fire events to DNs.






[jira] [Assigned] (HDDS-3191) Switch current pipeline interface to the new Replication based interface to write to Ratis

2020-06-17 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng reassigned HDDS-3191:
--

Assignee: Li Cheng

> Switch current pipeline interface to the new Replication based interface to 
> write to Ratis
> --
>
> Key: HDDS-3191
> URL: https://issues.apache.org/jira/browse/HDDS-3191
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Li Cheng
>Assignee: Li Cheng
>Priority: Major
>
> Due to consistency concerns, SCM needs to applyTransaction to the RaftLog 
> before it writes to the local database and in-memory maps. Need to refactor 
> the current code to move this part into Ratis.
> Ratis will write to the DB on behalf of SCM.
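The apply-before-write flow described in this issue can be sketched as below. This is an illustrative plain-Python sketch under assumed names (`InMemoryStore`, `SCMStateMachineSketch`), not the real Ozone state machine: mutations go through the replicated log first, and the state machine, not the caller, performs the store write when the entry commits.

```python
class InMemoryStore:
    """Toy stand-in for SCM's DB plus in-memory maps (illustrative only)."""
    def __init__(self):
        self.pipelines = {}

class SCMStateMachineSketch:
    """Applies committed log entries to the store, in log order."""
    def __init__(self, store):
        self.store = store
        self.log = []           # committed entries, in order
        self.applied_index = -1

    def append_and_commit(self, op, pipeline_id, value=None):
        # In real Ratis, append plus quorum replication would happen here.
        self.log.append((op, pipeline_id, value))
        self.apply_transactions()

    def apply_transactions(self):
        # Replay committed entries into the store (the applyTransaction step),
        # so every replica applies the same sequence of mutations.
        while self.applied_index + 1 < len(self.log):
            self.applied_index += 1
            op, pid, value = self.log[self.applied_index]
            if op == "CREATE":
                self.store.pipelines[pid] = value
            elif op == "DESTROY":
                self.store.pipelines.pop(pid, None)
```

A caller never touches the store directly; it only submits `CREATE`/`DESTROY` entries and lets the state machine apply them.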






[jira] [Assigned] (HDDS-3191) Switch current pipeline interface to the new Replication based interface to write to Ratis

2020-06-17 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng reassigned HDDS-3191:
--

Assignee: (was: Li Cheng)

> Switch current pipeline interface to the new Replication based interface to 
> write to Ratis
> --
>
> Key: HDDS-3191
> URL: https://issues.apache.org/jira/browse/HDDS-3191
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Li Cheng
>Priority: Major
>
> Due to consistency concerns, SCM needs to applyTransaction to the RaftLog 
> before it writes to the local database and in-memory maps. Need to refactor 
> the current code to move this part into Ratis.
> Ratis will write to the DB on behalf of SCM.






[jira] [Resolved] (HDDS-3679) Add unit tests for new PipelineManager interface

2020-06-16 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng resolved HDDS-3679.

Fix Version/s: 0.6.0
   Resolution: Fixed

> Add unit tests for new PipelineManager interface
> 
>
> Key: HDDS-3679
> URL: https://issues.apache.org/jira/browse/HDDS-3679
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Li Cheng
>Assignee: Li Cheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>







[jira] [Created] (HDDS-3776) Upgrading RocksDB version to avoid java heap issue

2020-06-10 Thread Li Cheng (Jira)
Li Cheng created HDDS-3776:
--

 Summary: Upgrading RocksDB version to avoid java heap issue
 Key: HDDS-3776
 URL: https://issues.apache.org/jira/browse/HDDS-3776
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: upgrade
Affects Versions: 0.5.0
Reporter: Li Cheng


Currently we have RocksDB 6.6.4 as the major version, and there are some JVM 
issues in tests (seen in [https://github.com/apache/hadoop-ozone/pull/1019]) 
related to a RocksDB core dump. We may upgrade to 6.8.1 to avoid this issue.

{{JRE version: Java(TM) SE Runtime Environment (8.0_211-b12) (build 
1.8.0_211-b12)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.211-b12 mixed mode bsd-amd64 
compressed oops)
# Problematic frame:
# C  [librocksdbjni2954960755376440018.jnilib+0x602b8]  
rocksdb::GetColumnFamilyID(rocksdb::ColumnFamilyHandle*)+0x8

See full dump at 
[https://the-asf.slack.com/files/U0159PV5Z6U/F0152UAJF0S/hs_err_pid90655.log?origin_team=T4S1WH2J3_channel=D014L2URB6E]}}






[jira] [Commented] (HDDS-3499) Address compatibility issue by SCM DB instances change

2020-06-10 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17132853#comment-17132853
 ] 

Li Cheng commented on HDDS-3499:


[~arp] Our internal production deployment is still on schedule, but we have 
done internal tests to verify the steps work for us. Resolving this now...

> Address compatibility issue by SCM DB instances change
> --
>
> Key: HDDS-3499
> URL: https://issues.apache.org/jira/browse/HDDS-3499
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Li Cheng
>Assignee: Marton Elek
>Priority: Blocker
>  Labels: Triaged
>
> After https://issues.apache.org/jira/browse/HDDS-3172, SCM now has a single 
> RocksDB instance instead of multiple DB instances. 
> For running Ozone clusters, we need to address compatibility issues. One 
> possible way is to have a separate tool to migrate the old metadata from the 
> multiple DBs to the current single DB.






[jira] [Resolved] (HDDS-3499) Address compatibility issue by SCM DB instances change

2020-06-10 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng resolved HDDS-3499.

Fix Version/s: 0.6.0
   Resolution: Fixed

> Address compatibility issue by SCM DB instances change
> --
>
> Key: HDDS-3499
> URL: https://issues.apache.org/jira/browse/HDDS-3499
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Li Cheng
>Assignee: Marton Elek
>Priority: Blocker
>  Labels: Triaged
> Fix For: 0.6.0
>
>
> After https://issues.apache.org/jira/browse/HDDS-3172, SCM now has a single 
> RocksDB instance instead of multiple DB instances. 
> For running Ozone clusters, we need to address compatibility issues. One 
> possible way is to have a separate tool to migrate the old metadata from the 
> multiple DBs to the current single DB.






[jira] [Assigned] (HDDS-3662) decouple finalize and destroy pipeline

2020-06-04 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng reassigned HDDS-3662:
--

Assignee: Li Cheng

> decouple finalize and destroy pipeline
> --
>
> Key: HDDS-3662
> URL: https://issues.apache.org/jira/browse/HDDS-3662
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Li Cheng
>Assignee: Li Cheng
>Priority: Major
>
> We have to decouple finalize and destroy pipeline. We should have two 
> separate calls, closePipeline and destroyPipeline.
> Close pipeline should only update the pipeline state; it's the job of the 
> caller to issue close-container commands to all the containers in the 
> pipeline.
> Destroy pipeline should be called from the pipeline scrubber: once a pipeline 
> has spent enough time in the closed state, the scrubber should call destroy 
> pipeline.
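The close/destroy split above can be sketched as below. This is a plain-Python illustration with hypothetical names (`PipelineSketch`, `close_pipeline`, `scrub`), not the Ozone implementation: closing only changes state, and the scrubber destroys a pipeline only after it has stayed closed past a grace period.

```python
class PipelineSketch:
    """Toy pipeline record for the close/destroy split (illustrative only)."""
    def __init__(self, pid):
        self.pid = pid
        self.state = "OPEN"
        self.closed_at = None

def close_pipeline(pipeline, now):
    # closePipeline: a state change only; the caller is separately
    # responsible for issuing close-container commands.
    pipeline.state = "CLOSED"
    pipeline.closed_at = now

def scrub(pipelines, now, grace_seconds):
    """Pipeline scrubber: destroy pipelines that have been CLOSED long enough."""
    destroyed = []
    for p in list(pipelines.values()):
        if p.state == "CLOSED" and now - p.closed_at >= grace_seconds:
            del pipelines[p.pid]     # destroyPipeline happens only here
            destroyed.append(p.pid)
    return destroyed
```

Running the scrubber immediately after a close is a no-op; only a later run, past the grace period, removes the pipeline.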






[jira] [Resolved] (HDDS-3693) Switch to PipelineStateManagerV2 and put PipelineFactory in PipelineManager

2020-06-02 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng resolved HDDS-3693.

Release Note: PR is merged
  Resolution: Fixed

> Switch to PipelineStateManagerV2 and put PipelineFactory in PipelineManager
> ---
>
> Key: HDDS-3693
> URL: https://issues.apache.org/jira/browse/HDDS-3693
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Li Cheng
>Assignee: Li Cheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (HDDS-3693) Switch to PipelineStateManagerV2 and put PipelineFactory in PipelineManager

2020-06-01 Thread Li Cheng (Jira)
Li Cheng created HDDS-3693:
--

 Summary: Switch to PipelineStateManagerV2 and put PipelineFactory 
in PipelineManager
 Key: HDDS-3693
 URL: https://issues.apache.org/jira/browse/HDDS-3693
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Li Cheng
Assignee: Li Cheng









[jira] [Commented] (HDDS-3499) Address compatibility issue by SCM DB instances change

2020-05-29 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119406#comment-17119406
 ] 

Li Cheng commented on HDDS-3499:


Hey, I've tested this in a test cluster and it works for us. [~elek] [~arp]

We will prepare a deployment next week and hopefully our production cluster can 
migrate safely.

> Address compatibility issue by SCM DB instances change
> --
>
> Key: HDDS-3499
> URL: https://issues.apache.org/jira/browse/HDDS-3499
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Li Cheng
>Assignee: Marton Elek
>Priority: Blocker
>
> After https://issues.apache.org/jira/browse/HDDS-3172, SCM now has a single 
> RocksDB instance instead of multiple DB instances. 
> For running Ozone clusters, we need to address compatibility issues. One 
> possible way is to have a separate tool to migrate the old metadata from the 
> multiple DBs to the current single DB.






[jira] [Created] (HDDS-3684) Add tests for replication annotation

2020-05-28 Thread Li Cheng (Jira)
Li Cheng created HDDS-3684:
--

 Summary: Add tests for replication annotation
 Key: HDDS-3684
 URL: https://issues.apache.org/jira/browse/HDDS-3684
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Li Cheng









[jira] [Assigned] (HDDS-3191) Switch current pipeline interface to the new Replication based interface to write to Ratis

2020-05-28 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng reassigned HDDS-3191:
--

Assignee: Li Cheng

> Switch current pipeline interface to the new Replication based interface to 
> write to Ratis
> --
>
> Key: HDDS-3191
> URL: https://issues.apache.org/jira/browse/HDDS-3191
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Li Cheng
>Assignee: Li Cheng
>Priority: Major
>
> Due to consistency concerns, SCM needs to applyTransaction to the RaftLog 
> before it writes to the local database and in-memory maps. Need to refactor 
> the current code to move this part into Ratis.
> Ratis will write to the DB on behalf of SCM.






[jira] [Updated] (HDDS-3191) Switch current pipeline interface to the new Replication based interface to write to Ratis

2020-05-28 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng updated HDDS-3191:
---
Summary: Switch current pipeline interface to the new Replication based 
interface to write to Ratis  (was: Interface to write to Ratis before write to 
SCM DB)

> Switch current pipeline interface to the new Replication based interface to 
> write to Ratis
> --
>
> Key: HDDS-3191
> URL: https://issues.apache.org/jira/browse/HDDS-3191
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Li Cheng
>Priority: Major
>
> Due to consistency concerns, SCM needs to applyTransaction to the RaftLog 
> before it writes to the local database and in-memory maps. Need to refactor 
> the current code to move this part into Ratis.
> Ratis will write to the DB on behalf of SCM.






[jira] [Updated] (HDDS-3196) New PipelineManager interface to persist to RatisServer

2020-05-28 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng updated HDDS-3196:
---
Status: Patch Available  (was: Open)

[https://github.com/apache/hadoop-ozone/pull/980]

> New PipelineManager interface to persist to RatisServer
> ---
>
> Key: HDDS-3196
> URL: https://issues.apache.org/jira/browse/HDDS-3196
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Li Cheng
>Assignee: Li Cheng
>Priority: Major
>
> This applies to DestroyPipeline as well as createPipeline






[jira] [Created] (HDDS-3679) Add unit tests for new PipelineManager interface

2020-05-28 Thread Li Cheng (Jira)
Li Cheng created HDDS-3679:
--

 Summary: Add unit tests for new PipelineManager interface
 Key: HDDS-3679
 URL: https://issues.apache.org/jira/browse/HDDS-3679
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Li Cheng
Assignee: Li Cheng









[jira] [Updated] (HDDS-3192) Handle AllocateContainer operation for HA

2020-05-28 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng updated HDDS-3192:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Handle AllocateContainer operation for HA
> -
>
> Key: HDDS-3192
> URL: https://issues.apache.org/jira/browse/HDDS-3192
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM HA
>Reporter: Li Cheng
>Assignee: Nanda kumar
>Priority: Major
>
> Allocate container calls should make sure that the newly created container 
> information is replicated to the followers via Ratis.






[jira] [Commented] (HDDS-3192) Handle AllocateContainer operation for HA

2020-05-28 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118613#comment-17118613
 ] 

Li Cheng commented on HDDS-3192:


PR is merged. Thanks Nanda for this contribution.

> Handle AllocateContainer operation for HA
> -
>
> Key: HDDS-3192
> URL: https://issues.apache.org/jira/browse/HDDS-3192
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM HA
>Reporter: Li Cheng
>Assignee: Nanda kumar
>Priority: Major
>
> Allocate container calls should make sure that the newly created container 
> information is replicated to the followers via Ratis.






[jira] [Created] (HDDS-3677) Handle events fired from PipelineManager to close container

2020-05-28 Thread Li Cheng (Jira)
Li Cheng created HDDS-3677:
--

 Summary: Handle events fired from PipelineManager to close 
container
 Key: HDDS-3677
 URL: https://issues.apache.org/jira/browse/HDDS-3677
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Li Cheng


finalizePipeline used to fire events to close containers. In the new interface, 
we should decide where to fire these events.






[jira] [Updated] (HDDS-3196) New PipelineManager interface to persist to RatisServer

2020-05-26 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng updated HDDS-3196:
---
Summary: New PipelineManager interface to persist to RatisServer  (was: 
Pipeline mutation needs to applyTransaction before writing to DB)

> New PipelineManager interface to persist to RatisServer
> ---
>
> Key: HDDS-3196
> URL: https://issues.apache.org/jira/browse/HDDS-3196
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Li Cheng
>Assignee: Li Cheng
>Priority: Major
>
> This applies to DestroyPipeline as well as createPipeline






[jira] [Updated] (HDDS-3661) Add Snapshot into new SCMRatisServer and SCMStateMachine

2020-05-26 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng updated HDDS-3661:
---
Summary: Add Snapshot into new SCMRatisServer  and SCMStateMachine  (was: 
Combine different versions of SCMRatisServer and SCMStateMachine)

> Add Snapshot into new SCMRatisServer  and SCMStateMachine
> -
>
> Key: HDDS-3661
> URL: https://issues.apache.org/jira/browse/HDDS-3661
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Li Cheng
>Assignee: Li Cheng
>Priority: Major
>
> Now we have prototype SCMRatisServer and SCMStateMachine under the Ratis and 
> HA paths. We should combine them.






[jira] [Created] (HDDS-3661) Combine different versions of SCMRatisServer and SCMStateMachine

2020-05-26 Thread Li Cheng (Jira)
Li Cheng created HDDS-3661:
--

 Summary: Combine different versions of SCMRatisServer and 
SCMStateMachine
 Key: HDDS-3661
 URL: https://issues.apache.org/jira/browse/HDDS-3661
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Li Cheng
Assignee: Li Cheng


Now we have prototype SCMRatisServer and SCMStateMachine under the Ratis and HA 
paths. We should combine them.






[jira] [Created] (HDDS-3660) Arrange Util classes for SCM HA

2020-05-26 Thread Li Cheng (Jira)
Li Cheng created HDDS-3660:
--

 Summary: Arrange Util classes for SCM HA
 Key: HDDS-3660
 URL: https://issues.apache.org/jira/browse/HDDS-3660
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Li Cheng
Assignee: Nanda kumar


Now we have SCMHAUtils and RatisUtil. We need to organize the util classes for 
SCM HA better.






[jira] [Resolved] (HDDS-3186) Introduce generic SCMRatisRequest and SCMRatisResponse

2020-05-26 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng resolved HDDS-3186.

Resolution: Fixed

> Introduce generic SCMRatisRequest and SCMRatisResponse
> --
>
> Key: HDDS-3186
> URL: https://issues.apache.org/jira/browse/HDDS-3186
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Li Cheng
>Assignee: Nanda kumar
>Priority: Major
>  Labels: pull-request-available
>
> This jira will introduce generic SCMRatisRequest and SCMRatisResponse which 
> will be used by all the Ratis operations inside SCM. We will also have a 
> generic StateMachine which will dispatch the request to registered handlers.






[jira] [Commented] (HDDS-3186) Introduce generic SCMRatisRequest and SCMRatisResponse

2020-05-26 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17116536#comment-17116536
 ] 

Li Cheng commented on HDDS-3186:


PR is merged. Thanks Nanda for the contribution.

> Introduce generic SCMRatisRequest and SCMRatisResponse
> --
>
> Key: HDDS-3186
> URL: https://issues.apache.org/jira/browse/HDDS-3186
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Li Cheng
>Assignee: Nanda kumar
>Priority: Major
>  Labels: pull-request-available
>
> This jira will introduce generic SCMRatisRequest and SCMRatisResponse which 
> will be used by all the Ratis operations inside SCM. We will also have a 
> generic StateMachine which will dispatch the request to registered handlers.






[jira] [Commented] (HDDS-3556) Refactor configuration in SCMRatisServer to Java-based configuration

2020-05-19 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111719#comment-17111719
 ] 

Li Cheng commented on HDDS-3556:


PR is merged.

> Refactor configuration in SCMRatisServer to Java-based configuration
> 
>
> Key: HDDS-3556
> URL: https://issues.apache.org/jira/browse/HDDS-3556
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Affects Versions: 0.5.0
>Reporter: Li Cheng
>Assignee: Li Cheng
>Priority: Major
>  Labels: pull-request-available
>
> [https://cwiki.apache.org/confluence/display/HADOOP/Java-based+configuration+API]






[jira] [Resolved] (HDDS-3556) Refactor configuration in SCMRatisServer to Java-based configuration

2020-05-19 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng resolved HDDS-3556.

Release Note: PR is merged.
  Resolution: Fixed

> Refactor configuration in SCMRatisServer to Java-based configuration
> 
>
> Key: HDDS-3556
> URL: https://issues.apache.org/jira/browse/HDDS-3556
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Affects Versions: 0.5.0
>Reporter: Li Cheng
>Assignee: Li Cheng
>Priority: Major
>  Labels: pull-request-available
>
> [https://cwiki.apache.org/confluence/display/HADOOP/Java-based+configuration+API]






[jira] [Commented] (HDDS-3499) Address compatibility issue by SCM DB instances change

2020-05-19 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111086#comment-17111086
 ] 

Li Cheng commented on HDDS-3499:


[~elek] Thanks for the testing. I will test with a VM cluster with 1 master and 
3 datanodes. We'll see how it goes.

 

[~arp] Sure, I'll start to test it soon. Will get back to you.

> Address compatibility issue by SCM DB instances change
> --
>
> Key: HDDS-3499
> URL: https://issues.apache.org/jira/browse/HDDS-3499
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Li Cheng
>Assignee: Marton Elek
>Priority: Blocker
>
> After https://issues.apache.org/jira/browse/HDDS-3172, SCM has a single 
> RocksDB instance instead of multiple DB instances. 
> For a running Ozone cluster, we need to address the compatibility issue. One 
> possible way is to have a separate tool that migrates the old metadata from 
> the multiple DBs into the current single DB.






[jira] [Commented] (HDDS-3499) Address compatibility issue by SCM DB instances change

2020-05-18 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110144#comment-17110144
 ] 

Li Cheng commented on HDDS-3499:


[~elek] Hey Marton, our production cluster's upgrade is pending on this Jira. 
We are expecting to upgrade soon. Could you share some progress here?

> Address compatibility issue by SCM DB instances change
> --
>
> Key: HDDS-3499
> URL: https://issues.apache.org/jira/browse/HDDS-3499
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Li Cheng
>Assignee: Marton Elek
>Priority: Blocker
>
> After https://issues.apache.org/jira/browse/HDDS-3172, SCM now has one single 
> rocksdb instance instead of multiple db instances. 
> For running Ozone cluster, we need to address compatibility issues. One 
> possible way is to have a side-way tool to migrate old metadata from multiple 
> dbs to current single db.






[jira] [Assigned] (HDDS-3196) Pipeline mutation needs to applyTransaction before writing to DB

2020-05-15 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng reassigned HDDS-3196:
--

Assignee: Li Cheng

> Pipeline mutation needs to applyTransaction before writing to DB
> 
>
> Key: HDDS-3196
> URL: https://issues.apache.org/jira/browse/HDDS-3196
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Li Cheng
>Assignee: Li Cheng
>Priority: Major
>
> This applies to DestroyPipeline as well as createPipeline.






[jira] [Created] (HDDS-3577) Reusable Ratis configuration among OM, SCM, DN and container

2020-05-12 Thread Li Cheng (Jira)
Li Cheng created HDDS-3577:
--

 Summary: Reusable Ratis configuration among OM, SCM, DN and 
container
 Key: HDDS-3577
 URL: https://issues.apache.org/jira/browse/HDDS-3577
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Affects Versions: 0.5.0
Reporter: Li Cheng


Currently OM HA, the container layer and the DN all use Ratis for consistency 
and redundancy, and each has its own Ratis configuration. SCM HA is ongoing, 
so SCM is also going to have Ratis support.

Also, we are moving to Java-based configuration: 
[https://cwiki.apache.org/confluence/display/HADOOP/Java-based+configuration+API]

We should clean up the naming convention for all these configs and consider 
reusing some of them. We now have ozone.om.ratis, ozone.scm.ratis, 
hdds.datanode.ratis, dfs.container.ratis and even dfs.ratis. We could name 
them all ozone.ratis.* and let the annotation look up the 'ozone.ratis' 
prefix, then reuse the shared configs across OM, SCM and the container layer.
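The prefix lookup proposed above could work roughly as follows. This is a self-contained sketch, not the actual hdds annotation API; the class name and config keys are hypothetical, chosen only to illustrate "component-specific key first, shared ozone.ratis.* key as fallback":

```java
import java.util.Map;

// Sketch of the proposed prefix scheme: a shared "ozone.ratis.*" namespace
// that OM, SCM and the container layer resolve with their own override
// prefix first, falling back to the shared key.
public class RatisConfigDemo {

  // Resolve "<component>.ratis.<key>" first, then fall back to
  // "ozone.ratis.<key>".
  static String resolve(Map<String, String> conf, String component,
                        String key) {
    String specific = conf.get(component + ".ratis." + key);
    return specific != null ? specific : conf.get("ozone.ratis." + key);
  }

  public static void main(String[] args) {
    Map<String, String> conf = Map.of(
        "ozone.ratis.rpc.type", "GRPC",        // shared default
        "ozone.scm.ratis.rpc.type", "NETTY");  // SCM-specific override

    System.out.println(resolve(conf, "ozone.scm", "rpc.type")); // NETTY
    System.out.println(resolve(conf, "ozone.om", "rpc.type"));  // GRPC
  }
}
```

With this scheme a component only declares keys where it genuinely differs; everything else is inherited from the shared prefix.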






[jira] [Commented] (HDDS-3559) Datanode doesn't handle java heap OutOfMemory exception

2020-05-08 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17102314#comment-17102314
 ] 

Li Cheng commented on HDDS-3559:


!http://file.tapd.oa.com//tfl/captures/2020-05/tapd_20417861_base64_1588909049_64.png!

> Datanode doesn't handle java heap OutOfMemory exception 
> 
>
> Key: HDDS-3559
> URL: https://issues.apache.org/jira/browse/HDDS-3559
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.5.0
>Reporter: Li Cheng
>Priority: Major
>
> 2020-05-05 15:47:41,568 [Datanode State Machine Thread - 167] WARN 
> org.apache.hadoop.ozone.container.common.statemachine.Endpoi
> ntStateMachine: Unable to communicate to SCM server at host-10-51-87-181:9861 
> for past 0 seconds.
> java.io.IOException: com.google.protobuf.ServiceException: 
> java.lang.OutOfMemoryError: Java heap space
>         at 
> org.apache.hadoop.ipc.ProtobufHelper.getRemoteException(ProtobufHelper.java:47)
>         at 
> org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.submitRequest(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:118)
>         at 
> org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.sendHeartbeat(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:148)
>         at 
> org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask.call(HeartbeatEndpointTask.java:145)
>         at 
> org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask.call(HeartbeatEndpointTask.java:76)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: com.google.protobuf.ServiceException: java.lang.OutOfMemoryError: 
> Java heap space
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.getReturnMessage(ProtobufRpcEngine.java:293)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:270)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>         at com.sun.proxy.$Proxy38.submitRequest(Unknown Source)
>         at 
> org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.submitRequest(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:116)
>  
> On a cluster, one datanode stops reporting to SCM with no visible cause. The 
> datanode process is still running. The log shows a Java heap OOM while it is 
> serializing the protobuf RPC message, yet the datanode silently stops 
> reporting to SCM and the process goes stale.






[jira] [Created] (HDDS-3559) Datanode doesn't handle java heap OutOfMemory exception

2020-05-08 Thread Li Cheng (Jira)
Li Cheng created HDDS-3559:
--

 Summary: Datanode doesn't handle java heap OutOfMemory exception 
 Key: HDDS-3559
 URL: https://issues.apache.org/jira/browse/HDDS-3559
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Affects Versions: 0.5.0
Reporter: Li Cheng


2020-05-05 15:47:41,568 [Datanode State Machine Thread - 167] WARN 
org.apache.hadoop.ozone.container.common.statemachine.Endpoi
ntStateMachine: Unable to communicate to SCM server at host-10-51-87-181:9861 
for past 0 seconds.
java.io.IOException: com.google.protobuf.ServiceException: 
java.lang.OutOfMemoryError: Java heap space
        at 
org.apache.hadoop.ipc.ProtobufHelper.getRemoteException(ProtobufHelper.java:47)
        at 
org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.submitRequest(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:118)
        at 
org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.sendHeartbeat(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:148)
        at 
org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask.call(HeartbeatEndpointTask.java:145)
        at 
org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask.call(HeartbeatEndpointTask.java:76)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: com.google.protobuf.ServiceException: java.lang.OutOfMemoryError: 
Java heap space
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.getReturnMessage(ProtobufRpcEngine.java:293)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:270)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
        at com.sun.proxy.$Proxy38.submitRequest(Unknown Source)
        at 
org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.submitRequest(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:116)
 
On a cluster, one datanode stops reporting to SCM with no visible cause. The 
datanode process is still running. The log shows a Java heap OOM while it is 
serializing the protobuf RPC message, yet the datanode silently stops 
reporting to SCM and the process goes stale.
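One reason the failure is silent can be seen in the stack trace: the heartbeat runs as a FutureTask on an executor, and FutureTask captures any Throwable, including Errors such as OutOfMemoryError, as the task's outcome rather than letting it propagate, so nothing is logged unless the future is inspected. A self-contained sketch of that behavior (simulating the OOM with a plain Error instead of actually exhausting the heap):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class SilentErrorDemo {

  // Submit a task that dies with an Error and show that the executor keeps
  // running later tasks while nothing is ever printed for the Error:
  // FutureTask catches Throwable and stores it as the task's outcome, so
  // unless someone calls get(), the failure is invisible.
  public static boolean executorSurvivesError() {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    AtomicBoolean laterTaskRan = new AtomicBoolean(false);

    pool.submit((Runnable) () -> {
      throw new Error("simulated: Java heap space");
    });
    pool.submit(() -> laterTaskRan.set(true));

    pool.shutdown();
    try {
      pool.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return laterTaskRan.get();
  }

  public static void main(String[] args) {
    // Analogue of the datanode staying up while heartbeats silently fail.
    System.out.println("later task ran: " + executorSurvivesError());
  }
}
```

A common mitigation, where the deployment allows it, is to run the JVM with -XX:+ExitOnOutOfMemoryError so the process fails fast and gets restarted instead of staying up in a stale state.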






[jira] [Assigned] (HDDS-3556) Refactor configuration in SCMRatisServer to Java-based configuration

2020-05-07 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng reassigned HDDS-3556:
--

Assignee: Li Cheng

> Refactor configuration in SCMRatisServer to Java-based configuration
> 
>
> Key: HDDS-3556
> URL: https://issues.apache.org/jira/browse/HDDS-3556
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Affects Versions: 0.5.0
>Reporter: Li Cheng
>Assignee: Li Cheng
>Priority: Major
>
> [https://cwiki.apache.org/confluence/display/HADOOP/Java-based+configuration+API]






[jira] [Created] (HDDS-3556) Refactor configuration in SCMRatisServer to Java-based configuration

2020-05-07 Thread Li Cheng (Jira)
Li Cheng created HDDS-3556:
--

 Summary: Refactor configuration in SCMRatisServer to Java-based 
configuration
 Key: HDDS-3556
 URL: https://issues.apache.org/jira/browse/HDDS-3556
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
  Components: SCM
Affects Versions: 0.5.0
Reporter: Li Cheng


[https://cwiki.apache.org/confluence/display/HADOOP/Java-based+configuration+API]






[jira] [Assigned] (HDDS-3186) Client requests to SCM RatisServer

2020-05-06 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng reassigned HDDS-3186:
--

Assignee: Nanda kumar  (was: Li Cheng)

> Client requests to SCM RatisServer
> --
>
> Key: HDDS-3186
> URL: https://issues.apache.org/jira/browse/HDDS-3186
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Li Cheng
>Assignee: Nanda kumar
>Priority: Major
>
> Refactor requests to be handled by SCM RatisServer






[jira] [Created] (HDDS-3499) Address compatibility issue by SCM DB instances change

2020-04-28 Thread Li Cheng (Jira)
Li Cheng created HDDS-3499:
--

 Summary: Address compatibility issue by SCM DB instances change
 Key: HDDS-3499
 URL: https://issues.apache.org/jira/browse/HDDS-3499
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Li Cheng


After https://issues.apache.org/jira/browse/HDDS-3172, SCM has a single 
RocksDB instance instead of multiple DB instances. 

For a running Ozone cluster, we need to address the compatibility issue. One 
possible way is to have a separate tool that migrates the old metadata from 
the multiple DBs into the current single DB.






[jira] [Created] (HDDS-3491) SCM Invoke Handler for Ratis calls

2020-04-27 Thread Li Cheng (Jira)
Li Cheng created HDDS-3491:
--

 Summary: SCM Invoke Handler for Ratis calls
 Key: HDDS-3491
 URL: https://issues.apache.org/jira/browse/HDDS-3491
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Li Cheng









[jira] [Commented] (HDDS-3186) Client requests to SCM RatisServer

2020-04-21 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17088481#comment-17088481
 ] 

Li Cheng commented on HDDS-3186:


https://docs.google.com/document/d/1NIf7GypgHFvznB_nb1An-vNfZc8BvxBVCoIpkI_JnQs/edit?usp=sharing

> Client requests to SCM RatisServer
> --
>
> Key: HDDS-3186
> URL: https://issues.apache.org/jira/browse/HDDS-3186
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Li Cheng
>Assignee: Li Cheng
>Priority: Major
>
> Refactor requests to be handled by SCM RatisServer






[jira] [Created] (HDDS-3466) Improve filterViableNodes performance in pipeline creation

2020-04-20 Thread Li Cheng (Jira)
Li Cheng created HDDS-3466:
--

 Summary: Improve filterViableNodes performance in pipeline creation
 Key: HDDS-3466
 URL: https://issues.apache.org/jira/browse/HDDS-3466
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: SCM
Affects Versions: 0.5.0
Reporter: Li Cheng


Per [~sodonnell]'s investigation, pipeline creation may have a performance 
issue once the load-sorting algorithm in 
https://issues.apache.org/jira/browse/HDDS-3139 lands.

This task tracks the potential performance bottleneck that this sorting 
operation causes for pipeline creation in large-scale clusters.

 

I am a little concerned about the expense of forming the list of healthy nodes 
on large clusters. We have to do quite a lot of work to form a list and then 
only use 3 nodes from the list. Even the method {{currentPipelineCount()}} 
needs to do a few map lookups per node to get the current pipeline count. This 
is the case even before this change. Creating a pipeline on a large cluster 
would be expensive already, but this change probably makes it worse, due to the 
sort needed. I know it was me who suggested the sort.

I think the code as it is will work OK up to about 1000 nodes, and then the 
performance will drop off as the number of nodes goes toward 10k.

E.g. here are some benchmarks I created using this test code, which is similar 
to what we are doing in filterViableNodes():

 

  public List sortingWithMap(BenchmarkState state) {
    return state.otherList.stream()
        .map(o -> new Mock(o, state.rand.nextInt(20)))
        .filter(o -> o.getSize() <= 20)
        .sorted(Comparator.comparingInt(Mock::getSize))
        .map(o -> o.getObject())
        .collect(Collectors.toList());
  }

The OPs per second for various list sizes are:

 

Benchmark               (listSize)   Mode  Cnt       Score     Error  Units
Sorting.sortingWithMap         100  thrpt    3  113948.345 ± 446.426  ops/s
Sorting.sortingWithMap        1000  thrpt    3    9468.507 ± 894.138  ops/s
Sorting.sortingWithMap        5000  thrpt    3    1931.612 ± 263.919  ops/s
Sorting.sortingWithMap       10000  thrpt    3     970.745 ±  25.823  ops/s
Sorting.sortingWithMap      100000  thrpt    3      87.684 ±  35.438  ops/s

For a 1000 node cluster, with 10 pipelines per node, we would be looking at 
about 1 second to form all the pipelines.

For a 5k node cluster, it would be about 25 seconds.

For a 10k node cluster it would be 103 seconds, but even here, that would be at 
close to 1000 pipelines per second.
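Those estimates can be reproduced with simple arithmetic, assuming (as stated above) 10 pipelines per node and one filter+sort pass per pipeline creation:

```java
public class PipelineEstimate {

  // Rough wall-clock time to create all pipelines: one filter+sort pass per
  // pipeline creation, nodes * pipelinesPerNode creations in total, at the
  // benchmarked ops/s for a candidate list of that cluster's size.
  static double secondsToFormPipelines(int nodes, int pipelinesPerNode,
                                       double sortOpsPerSecond) {
    return nodes * pipelinesPerNode / sortOpsPerSecond;
  }

  public static void main(String[] args) {
    // ops/s figures taken from the JMH results above at matching list sizes.
    System.out.printf("1k nodes:  %.1f s%n",
        secondsToFormPipelines(1_000, 10, 9468.507));   // ~1.1 s
    System.out.printf("5k nodes:  %.1f s%n",
        secondsToFormPipelines(5_000, 10, 1931.612));   // ~25.9 s
    System.out.printf("10k nodes: %.1f s%n",
        secondsToFormPipelines(10_000, 10, 970.745));   // ~103.0 s
  }
}
```

The dominant cost is that every creation rebuilds and sorts the full candidate list, so total work grows roughly as nodes squared (times log n for the sort).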






[jira] [Commented] (HDDS-3186) Client requests to SCM RatisServer

2020-04-20 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17087524#comment-17087524
 ] 

Li Cheng commented on HDDS-3186:


Nanda's doc: 
[https://docs.google.com/document/d/1YGdROzaWn8RqIjnvMafH0P0hu6M_b_2AQHMLWYVXXb8/edit?invite=CIfyodwC=5e97cf44#]

> Client requests to SCM RatisServer
> --
>
> Key: HDDS-3186
> URL: https://issues.apache.org/jira/browse/HDDS-3186
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Li Cheng
>Assignee: Li Cheng
>Priority: Major
>
> Refactor requests to be handled by SCM RatisServer






[jira] [Resolved] (HDDS-3187) SCM StateMachine

2020-04-20 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng resolved HDDS-3187.

Release Note: PR is merged
  Resolution: Fixed

> SCM StateMachine
> 
>
> Key: HDDS-3187
> URL: https://issues.apache.org/jira/browse/HDDS-3187
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Li Cheng
>Assignee: Li Cheng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> SCM needs a StateMachine to manage states. StateMachine supports 
> applyTransaction and call RatisServer API.






[jira] [Commented] (HDDS-3329) Ozone cluster expansion: Block deletion mismatch

2020-04-02 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-3329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17073654#comment-17073654
 ] 

Li Cheng commented on HDDS-3329:


We had different network topology configs on SCM and the datanodes; we updated 
the config on the datanodes and then restarted them. Not sure if that matters.

> Ozone cluster expansion: Block deletion mismatch
> 
>
> Key: HDDS-3329
> URL: https://issues.apache.org/jira/browse/HDDS-3329
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.4.1
>Reporter: Li Cheng
>Assignee: Lokesh Jain
>Priority: Major
>
> SCM logs keep printing this when we expand Ozone cluster with more datanodes.
>  
> 2020-04-02 19:45:42,745 
> [EventQueue-PendingDeleteStatusForPendingDeleteHandler] INFO 
> org.apache.hadoop.hdds.scm.block.SCMBlockDeletingService: Block deletion 
> txnID mismatch in datanode 1eacbd89-a835-438e-aa4b-5bc78adb7c8c for 
> containerID 314. Datanode delete txnID: 0, SCM txnID: 1208
> 2020-04-02 19:45:42,745 
> [EventQueue-PendingDeleteStatusForPendingDeleteHandler] INFO 
> org.apache.hadoop.hdds.scm.block.SCMBlockDeletingService: Block deletion 
> txnID mismatch in datanode 1eacbd89-a835-438e-aa4b-5bc78adb7c8c for 
> containerID 351. Datanode delete txnID: 0, SCM txnID: 662
> 2020-04-02 19:45:42,745 
> [EventQueue-PendingDeleteStatusForPendingDeleteHandler] INFO 
> org.apache.hadoop.hdds.scm.block.SCMBlockDeletingService: Block deletion 
> txnID mismatch in datanode 1eacbd89-a835-438e-aa4b-5bc78adb7c8c for 
> containerID 352. Datanode delete txnID: 0, SCM txnID: 1085






[jira] [Created] (HDDS-3329) Ozone cluster expansion: Block deletion mismatch

2020-04-02 Thread Li Cheng (Jira)
Li Cheng created HDDS-3329:
--

 Summary: Ozone cluster expansion: Block deletion mismatch
 Key: HDDS-3329
 URL: https://issues.apache.org/jira/browse/HDDS-3329
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: SCM
Affects Versions: 0.4.1
Reporter: Li Cheng


SCM logs keep printing this when we expand Ozone cluster with more datanodes.

 

2020-04-02 19:45:42,745 [EventQueue-PendingDeleteStatusForPendingDeleteHandler] 
INFO org.apache.hadoop.hdds.scm.block.SCMBlockDeletingService: Block deletion 
txnID mismatch in datanode 1eacbd89-a835-438e-aa4b-5bc78adb7c8c for containerID 
314. Datanode delete txnID: 0, SCM txnID: 1208
2020-04-02 19:45:42,745 [EventQueue-PendingDeleteStatusForPendingDeleteHandler] 
INFO org.apache.hadoop.hdds.scm.block.SCMBlockDeletingService: Block deletion 
txnID mismatch in datanode 1eacbd89-a835-438e-aa4b-5bc78adb7c8c for containerID 
351. Datanode delete txnID: 0, SCM txnID: 662
2020-04-02 19:45:42,745 [EventQueue-PendingDeleteStatusForPendingDeleteHandler] 
INFO org.apache.hadoop.hdds.scm.block.SCMBlockDeletingService: Block deletion 
txnID mismatch in datanode 1eacbd89-a835-438e-aa4b-5bc78adb7c8c for containerID 
352. Datanode delete txnID: 0, SCM txnID: 1085






[jira] [Assigned] (HDDS-3187) SCM StateMachine

2020-03-31 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng reassigned HDDS-3187:
--

Assignee: Li Cheng

> SCM StateMachine
> 
>
> Key: HDDS-3187
> URL: https://issues.apache.org/jira/browse/HDDS-3187
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Li Cheng
>Assignee: Li Cheng
>Priority: Major
>
> SCM needs a StateMachine to manage states. StateMachine supports 
> applyTransaction and call RatisServer API.





