[jira] [Updated] (HDDS-4191) Add failover proxy for SCM container client
[ https://issues.apache.org/jira/browse/HDDS-4191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Cheng updated HDDS-4191:
---------------------------
    Status: Patch Available  (was: In Progress)

https://github.com/apache/hadoop-ozone/pull/1514

> Add failover proxy for SCM container client
> --------------------------------------------
>
> Key: HDDS-4191
> URL: https://issues.apache.org/jira/browse/HDDS-4191
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Reporter: Li Cheng
> Assignee: Li Cheng
> Priority: Major
> Labels: pull-request-available
>
> Take advantage of the failover proxy introduced in HDDS-3188 and add a
> failover proxy for the SCM container client as well.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
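[Editor's note] For readers unfamiliar with the failover-proxy pattern this ticket builds on, here is a minimal, self-contained Java sketch of the retry-and-rotate control flow. The class and method names are illustrative only, not the actual Ozone SCM client classes.

```java
import java.util.List;
import java.util.function.Function;

// Minimal failover proxy sketch: try the proxy we currently believe is
// healthy, and rotate to the next one on failure. Illustrative names only.
public class FailoverProxySketch<T> {
    private final List<T> proxies; // one client proxy per SCM instance
    private int current = 0;       // index of the currently preferred proxy

    public FailoverProxySketch(List<T> proxies) {
        this.proxies = proxies;
    }

    // Invoke an operation, failing over to the next proxy on error.
    public <R> R invoke(Function<T, R> op) {
        RuntimeException last = null;
        for (int attempt = 0; attempt < proxies.size(); attempt++) {
            try {
                return op.apply(proxies.get(current));
            } catch (RuntimeException e) {
                last = e;                                  // remember failure
                current = (current + 1) % proxies.size();  // rotate proxy
            }
        }
        throw last; // every endpoint failed
    }
}
```

In the real client each proxy wraps an RPC connection to one SCM and the rotation is also driven by not-leader responses; the sketch only shows the retry-and-rotate skeleton.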
[jira] [Commented] (HDDS-4393) Fix CI and test failures after force push on 2020/10/26
[ https://issues.apache.org/jira/browse/HDDS-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17221168#comment-17221168 ]

Li Cheng commented on HDDS-4393:
--------------------------------
[https://github.com/apache/hadoop-ozone/pull/1522] shows the current feature branch HDDS-2823 has issues in CI.

> Fix CI and test failures after force push on 2020/10/26
> ---------------------------------------------------------
>
> Key: HDDS-4393
> URL: https://issues.apache.org/jira/browse/HDDS-4393
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Components: SCM HA
> Reporter: Li Cheng
> Priority: Blocker
[jira] [Created] (HDDS-4393) Fix CI and test failures after force push on 2020/10/26
Li Cheng created HDDS-4393:
---------------------------
    Summary: Fix CI and test failures after force push on 2020/10/26
    Key: HDDS-4393
    URL: https://issues.apache.org/jira/browse/HDDS-4393
    Project: Hadoop Distributed Data Store
    Issue Type: Sub-task
    Components: SCM HA
    Reporter: Li Cheng
[jira] [Commented] (HDDS-4339) Ozone S3 gateway throws NPE with goofys
[ https://issues.apache.org/jira/browse/HDDS-4339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17219456#comment-17219456 ]

Li Cheng commented on HDDS-4339:
--------------------------------
I created https://issues.apache.org/jira/browse/HDDS-4361 to track s3g error messages.

> Ozone S3 gateway throws NPE with goofys
> -----------------------------------------
>
> Key: HDDS-4339
> URL: https://issues.apache.org/jira/browse/HDDS-4339
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Affects Versions: 1.0.0
> Reporter: Li Cheng
> Assignee: Li Cheng
> Priority: Blocker
> Labels: pull-request-available
> Attachments: image-2020-10-13-15-23-49-864.png
>
> Configured goofys and s3g on different hosts; Fiotest writes files on the goofys mount point, and AWS secrets are exported on the s3g host. A bunch of NPEs appear in the s3g logs.
> # A missing AWS auth header appears to cause the NPE: AWSSignatureProcessor.init() does not handle the missing header.
> # Why the AWS auth header is missing is also unknown.
> Note that some files have been successfully written into Ozone via goofys, but not all of them succeeded.
> > 2020-10-13 11:18:43,425 [qtp1686100174-1238] ERROR > org.apache.hadoop.ozone.s3.OzoneClientProducer: Error: > org.jboss.weld.exceptions.WeldException: WELD-49: Unable to invoke public > void org.apache.hadoop.ozone.s3.AWSSignatureProcessor.init() throws > java.lang.Exception on > org.apache.hadoop.ozone.s3.AWSSignatureProcessor@5535155b > at > org.jboss.weld.injection.producer.DefaultLifecycleCallbackInvoker.invokeMethods(DefaultLifecycleCallbackInvoker.java:99) > at > org.jboss.weld.injection.producer.DefaultLifecycleCallbackInvoker.postConstruct(DefaultLifecycleCallbackInvoker.java:80) > at > org.jboss.weld.injection.producer.BasicInjectionTarget.postConstruct(BasicInjectionTarget.java:122) > at > org.glassfish.jersey.ext.cdi1x.internal.CdiComponentProvider$InjectionManagerInjectedCdiTarget.postConstruct(CdiComponentProvider.java:887) > at org.jboss.weld.bean.ManagedBean.create(ManagedBean.java:162) > at org.jboss.weld.context.AbstractContext.get(AbstractContext.java:96) > at > org.jboss.weld.bean.ContextualInstanceStrategy$DefaultContextualInstanceStrategy.get(ContextualInstanceStrategy.java:100) > at > org.jboss.weld.bean.ContextualInstanceStrategy$CachingContextualInstanceStrategy.get(ContextualInstanceStrategy.java:177) > at org.jboss.weld.bean.ContextualInstance.get(ContextualInstance.java:50) > at > org.jboss.weld.bean.proxy.ContextBeanInstance.getInstance(ContextBeanInstance.java:99) > at > org.jboss.weld.bean.proxy.ProxyMethodHandler.getInstance(ProxyMethodHandler.java:125) > at > org.apache.hadoop.ozone.s3.AWSSignatureProcessor$Proxy$_$$_WeldClientProxy.getAwsAccessId(Unknown > Source) > at > org.apache.hadoop.ozone.s3.OzoneClientProducer.getClient(OzoneClientProducer.java:79) > at > org.apache.hadoop.ozone.s3.OzoneClientProducer.createClient(OzoneClientProducer.java:68) > at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at 
java.lang.reflect.Method.invoke(Method.java:498) > at > org.jboss.weld.injection.StaticMethodInjectionPoint.invoke(StaticMethodInjectionPoint.java:88) > at > org.jboss.weld.injection.StaticMethodInjectionPoint.invoke(StaticMethodInjectionPoint.java:78) > at > org.jboss.weld.injection.producer.ProducerMethodProducer.produce(ProducerMethodProducer.java:100) > at > org.jboss.weld.injection.producer.AbstractMemberProducer.produce(AbstractMemberProducer.java:161) > at > org.jboss.weld.bean.AbstractProducerBean.create(AbstractProducerBean.java:180) > at > org.jboss.weld.context.unbound.DependentContextImpl.get(DependentContextImpl.java:70) > at > org.jboss.weld.bean.ContextualInstanceStrategy$DefaultContextualInstanceStrategy.get(ContextualInstanceStrategy.java:100) > at org.jboss.weld.bean.ContextualInstance.get(ContextualInstance.java:50) > at > org.jboss.weld.manager.BeanManagerImpl.getReference(BeanManagerImpl.java:785) > at > org.jboss.weld.manager.BeanManagerImpl.getInjectableReference(BeanManagerImpl.java:885) > at > org.jboss.weld.injection.FieldInjectionPoint.inject(FieldInjectionPoint.java:92) > at org.jboss.weld.util.Beans.injectBoundFields(Beans.java:358) > at org.jboss.weld.util.Beans.injectFieldsAndInitializers(Beans.java:369) > at > org.jboss.weld.injection.producer.ResourceInjector$1.proceed(ResourceInjector.java:70) > at > org.jboss.weld.injection.InjectionContextImpl.run(InjectionContextImpl.java:48) > at > org.jboss.weld.injection.producer.ResourceInjector.inject(ResourceInjector.java:72) > at >
[jira] [Commented] (HDDS-4365) SCMBlockLocationFailoverProxyProvider should use ScmBlockLocationProtocolPB.class in RPC.setProtocolEngine
[ https://issues.apache.org/jira/browse/HDDS-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17218793#comment-17218793 ]

Li Cheng commented on HDDS-4365:
--------------------------------
Merged. Resolving this...

> SCMBlockLocationFailoverProxyProvider should use ScmBlockLocationProtocolPB.class in RPC.setProtocolEngine
> ------------------------------------------------------------------------------------------------------------
>
> Key: HDDS-4365
> URL: https://issues.apache.org/jira/browse/HDDS-4365
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Components: SCM
> Reporter: Glen Geng
> Assignee: Glen Geng
> Priority: Minor
> Labels: pull-request-available
>
> In SCMBlockLocationFailoverProxyProvider, it is currently:
> {code:java}
> private ScmBlockLocationProtocolPB createSCMProxy(
>     InetSocketAddress scmAddress) throws IOException {
>   ...
>   RPC.setProtocolEngine(hadoopConf, ScmBlockLocationProtocol.class,
>       ProtobufRpcEngine.class);
>   ...
> {code}
> It should be:
> {code:java}
> private ScmBlockLocationProtocolPB createSCMProxy(
>     InetSocketAddress scmAddress) throws IOException {
>   ...
>   RPC.setProtocolEngine(hadoopConf, ScmBlockLocationProtocolPB.class,
>       ProtobufRpcEngine.class);
>   ...
> {code}
> FYI, per the non-HA version:
> {code:java}
> private static ScmBlockLocationProtocol getScmBlockClient(
>     OzoneConfiguration conf) throws IOException {
>   RPC.setProtocolEngine(conf, ScmBlockLocationProtocolPB.class,
>       ProtobufRpcEngine.class);
>   long scmVersion =
>       RPC.getProtocolVersion(ScmBlockLocationProtocolPB.class);
>   InetSocketAddress scmBlockAddress =
>       getScmAddressForBlockClients(conf);
>   ScmBlockLocationProtocolClientSideTranslatorPB scmBlockLocationClient =
>       new ScmBlockLocationProtocolClientSideTranslatorPB(
>           RPC.getProxy(ScmBlockLocationProtocolPB.class, scmVersion,
>               scmBlockAddress, UserGroupInformation.getCurrentUser(), conf,
>               NetUtils.getDefaultSocketFactory(conf),
>               Client.getRpcTimeout(conf)));
>   return TracingUtil
>       .createProxy(scmBlockLocationClient, ScmBlockLocationProtocol.class,
>           conf);
> }
> {code}
[jira] [Resolved] (HDDS-4365) SCMBlockLocationFailoverProxyProvider should use ScmBlockLocationProtocolPB.class in RPC.setProtocolEngine
[ https://issues.apache.org/jira/browse/HDDS-4365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Cheng resolved HDDS-4365.
----------------------------
    Resolution: Fixed

> SCMBlockLocationFailoverProxyProvider should use ScmBlockLocationProtocolPB.class in RPC.setProtocolEngine
> ------------------------------------------------------------------------------------------------------------
>
> Key: HDDS-4365
> URL: https://issues.apache.org/jira/browse/HDDS-4365
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Components: SCM
> Reporter: Glen Geng
> Assignee: Glen Geng
> Priority: Minor
> Labels: pull-request-available
>
> (description and code snippets identical to those quoted earlier in this digest)
[jira] [Created] (HDDS-4361) S3 native error messages when header is illegal
Li Cheng created HDDS-4361:
---------------------------
    Summary: S3 native error messages when header is illegal
    Key: HDDS-4361
    URL: https://issues.apache.org/jira/browse/HDDS-4361
    Project: Hadoop Distributed Data Store
    Issue Type: Bug
    Components: S3
    Affects Versions: 1.0.0
    Reporter: Li Cheng

Following up on https://issues.apache.org/jira/browse/HDDS-4339 and https://issues.apache.org/jira/browse/HDDS-3843: missing auth or other info in the header may cause s3g to throw an NPE or merely log an error message. Instead, s3g should return S3-native error messages for requests with an invalid header or other problems, and the S3 client should still be able to initialize.
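[Editor's note] A hedged sketch of the behavior this ticket asks for: map a missing or empty Authorization header to an S3-style XML error body instead of letting signature parsing fail with an NPE. The error code, XML shape, and helper names below are illustrative assumptions, not the actual s3g code.

```java
// Illustrative only: turn a missing Authorization header into an
// S3-style <Error> body rather than letting signature parsing NPE.
public class S3ErrorSketch {
    // Returns an error body for a bad header, or null to signal that
    // normal signature processing should continue.
    public static String checkAuthHeader(String authHeader) {
        if (authHeader == null || authHeader.isEmpty()) {
            return s3Error("MissingSecurityHeader",
                "Your request was missing a required header.");
        }
        return null;
    }

    // Build a minimal S3-style XML error payload.
    static String s3Error(String code, String message) {
        return "<Error><Code>" + code + "</Code><Message>"
            + message + "</Message></Error>";
    }
}
```

A filter calling this check before AWSSignatureProcessor.init() would return the error body to the client while leaving valid requests untouched.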
[jira] [Commented] (HDDS-3188) Add failover proxy to SCM block protocol
[ https://issues.apache.org/jira/browse/HDDS-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216508#comment-17216508 ]

Li Cheng commented on HDDS-3188:
--------------------------------
PR is merged. Resolving

> Add failover proxy to SCM block protocol
> ------------------------------------------
>
> Key: HDDS-3188
> URL: https://issues.apache.org/jira/browse/HDDS-3188
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Components: SCM
> Reporter: Li Cheng
> Assignee: Li Cheng
> Priority: Major
> Labels: pull-request-available
>
> Need to support 2N + 1 SCMs. Add configs and logic to support multiple SCMs.
[jira] [Updated] (HDDS-3188) Add failover proxy to SCM block protocol
[ https://issues.apache.org/jira/browse/HDDS-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Cheng updated HDDS-3188:
---------------------------
    Resolution: Fixed
    Status: Resolved  (was: Patch Available)

> Add failover proxy to SCM block protocol
> ------------------------------------------
>
> Key: HDDS-3188
> URL: https://issues.apache.org/jira/browse/HDDS-3188
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Components: SCM
> Reporter: Li Cheng
> Assignee: Li Cheng
> Priority: Major
> Labels: pull-request-available
>
> Need to support 2N + 1 SCMs. Add configs and logic to support multiple SCMs.
[jira] [Commented] (HDDS-4192) enable SCM Raft Group based on config ozone.scm.names
[ https://issues.apache.org/jira/browse/HDDS-4192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216507#comment-17216507 ]

Li Cheng commented on HDDS-4192:
--------------------------------
PR is merged. Thanks [~glengeng] for the contribution. Resolving

> enable SCM Raft Group based on config ozone.scm.names
> -------------------------------------------------------
>
> Key: HDDS-4192
> URL: https://issues.apache.org/jira/browse/HDDS-4192
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Components: SCM
> Reporter: Glen Geng
> Assignee: Glen Geng
> Priority: Major
> Labels: pull-request-available
>
> Say ozone.scm.names is "ip1,ip2,ip3": the SCM with ip1 identifies its RaftPeerId as scm1, the SCM with ip2 identifies its RaftPeerId as scm2, and the SCM with ip3 identifies its RaftPeerId as scm3. They will automatically form a raft group.
[jira] [Resolved] (HDDS-4192) enable SCM Raft Group based on config ozone.scm.names
[ https://issues.apache.org/jira/browse/HDDS-4192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Cheng resolved HDDS-4192.
----------------------------
    Resolution: Fixed

> enable SCM Raft Group based on config ozone.scm.names
> -------------------------------------------------------
>
> Key: HDDS-4192
> URL: https://issues.apache.org/jira/browse/HDDS-4192
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Components: SCM
> Reporter: Glen Geng
> Assignee: Glen Geng
> Priority: Major
> Labels: pull-request-available
>
> Say ozone.scm.names is "ip1,ip2,ip3": the SCM with ip1 identifies its RaftPeerId as scm1, the SCM with ip2 identifies its RaftPeerId as scm2, and the SCM with ip3 identifies its RaftPeerId as scm3. They will automatically form a raft group.
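[Editor's note] The address-to-RaftPeerId convention described in this ticket can be sketched as follows: the i-th address in ozone.scm.names becomes peer id "scm<i+1>", so every SCM derives the same group membership from shared config. This is a plain-Java stand-in with illustrative names; the actual SCM HA implementation is more involved.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: map each address in ozone.scm.names to a deterministic peer id.
public class ScmPeerIdSketch {
    public static Map<String, String> peerIds(String scmNames) {
        Map<String, String> ids = new LinkedHashMap<>(); // keep config order
        String[] addrs = scmNames.split(",");
        for (int i = 0; i < addrs.length; i++) {
            ids.put(addrs[i].trim(), "scm" + (i + 1));   // scm1, scm2, ...
        }
        return ids;
    }
}
```

Because the mapping is positional, it only works if every SCM sees the identical ozone.scm.names value; that is the trade-off of deriving group membership from config rather than a registration protocol.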
[jira] [Assigned] (HDDS-4339) Ozone S3 gateway throws NPE with goofys
[ https://issues.apache.org/jira/browse/HDDS-4339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Cheng reassigned HDDS-4339:
------------------------------
    Assignee: Li Cheng

> Ozone S3 gateway throws NPE with goofys
> -----------------------------------------
>
> Key: HDDS-4339
> URL: https://issues.apache.org/jira/browse/HDDS-4339
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Affects Versions: 1.0.0
> Reporter: Li Cheng
> Assignee: Li Cheng
> Priority: Blocker
> Labels: pull-request-available
> Attachments: image-2020-10-13-15-23-49-864.png
>
> Configured goofys and s3g on different hosts; Fiotest writes files on the goofys mount point, and AWS secrets are exported on the s3g host. A bunch of NPEs appear in the s3g logs.
> # A missing AWS auth header appears to cause the NPE: AWSSignatureProcessor.init() does not handle the missing header.
> # Why the AWS auth header is missing is also unknown.
> Note that some files have been successfully written into Ozone via goofys, but not all of them succeeded.
> (WeldException stack trace omitted; identical to the trace quoted earlier in this digest)
[jira] [Commented] (HDDS-4339) Ozone S3 gateway throws NPE with goofys
[ https://issues.apache.org/jira/browse/HDDS-4339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213856#comment-17213856 ]

Li Cheng commented on HDDS-4339:
--------------------------------
[~bharat] The issue is slightly different here. Before the header goes into AWSSignatureProcessor, we need to validate that the auth field is present in the header, since the subsequent validation depends on it. However, we are missing a place to handle requests that don't contain the auth field.

> Ozone S3 gateway throws NPE with goofys
> -----------------------------------------
>
> Key: HDDS-4339
> URL: https://issues.apache.org/jira/browse/HDDS-4339
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Affects Versions: 1.0.0
> Reporter: Li Cheng
> Priority: Blocker
> Attachments: image-2020-10-13-15-23-49-864.png
>
> Configured goofys and s3g on different hosts; Fiotest writes files on the goofys mount point, and AWS secrets are exported on the s3g host. A bunch of NPEs appear in the s3g logs.
> # A missing AWS auth header appears to cause the NPE: AWSSignatureProcessor.init() does not handle the missing header.
> # Why the AWS auth header is missing is also unknown.
> Note that some files have been successfully written into Ozone via goofys, but not all of them succeeded.
> (WeldException stack trace omitted; identical to the trace quoted earlier in this digest)
[jira] [Created] (HDDS-4339) Ozone S3 gateway throws NPE with goofys
Li Cheng created HDDS-4339:
---------------------------
    Summary: Ozone S3 gateway throws NPE with goofys
    Key: HDDS-4339
    URL: https://issues.apache.org/jira/browse/HDDS-4339
    Project: Hadoop Distributed Data Store
    Issue Type: Bug
    Affects Versions: 1.0.0
    Reporter: Li Cheng
    Attachments: image-2020-10-13-15-23-49-864.png

Configured goofys and s3g on different hosts; Fiotest writes files on the goofys mount point, and AWS secrets are exported on the s3g host. A bunch of NPEs appear in the s3g logs.
# A missing AWS auth header appears to cause the NPE: AWSSignatureProcessor.init() does not handle the missing header.
# Why the AWS auth header is missing is also unknown.
Note that some files have been successfully written into Ozone via goofys, but not all of them succeeded.

2020-10-13 11:18:43,425 [qtp1686100174-1238] ERROR org.apache.hadoop.ozone.s3.OzoneClientProducer: Error: org.jboss.weld.exceptions.WeldException: WELD-49: Unable to invoke public void org.apache.hadoop.ozone.s3.AWSSignatureProcessor.init() throws java.lang.Exception on org.apache.hadoop.ozone.s3.AWSSignatureProcessor@5535155b
(stack trace omitted; identical to the trace quoted earlier in this digest)
[jira] [Assigned] (HDDS-3103) Have multi-raft pipeline calculator to recommend best pipeline number per datanode
[ https://issues.apache.org/jira/browse/HDDS-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Cheng reassigned HDDS-3103:
------------------------------
    Assignee: (was: Li Cheng)

> Have multi-raft pipeline calculator to recommend best pipeline number per datanode
> ------------------------------------------------------------------------------------
>
> Key: HDDS-3103
> URL: https://issues.apache.org/jira/browse/HDDS-3103
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Components: SCM
> Affects Versions: 0.5.0
> Reporter: Li Cheng
> Priority: Critical
>
> PipelinePlacementPolicy should have a calculator method to recommend a better pipeline count per node. The number currently comes from ozone.datanode.pipeline.limit in the config. SCM should be able to consider the number of Ratis directories and the Ratis retry timeout to recommend the best pipeline count for every node.
[jira] [Created] (HDDS-4295) SCM ServiceManager
Li Cheng created HDDS-4295:
---------------------------
    Summary: SCM ServiceManager
    Key: HDDS-4295
    URL: https://issues.apache.org/jira/browse/HDDS-4295
    Project: Hadoop Distributed Data Store
    Issue Type: Sub-task
    Components: SCM
    Reporter: Li Cheng

SCM ServiceManager is going to control all the SCM background services so that they serve only while this SCM is the leader. ServiceManager will also bootstrap all the background services and protocol servers, and it needs to perform validation steps when the SCM comes up as the leader.
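[Editor's note] The leader-gated lifecycle described in this ticket can be sketched as follows: services are registered once, started when this SCM gains leadership, and stopped when it loses leadership. Interface and class names are illustrative assumptions, not the HDDS-4295 design.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a leader-gated service manager for SCM background services.
public class ScmServiceManagerSketch {
    public interface BackgroundService {
        void start();
        void stop();
    }

    private final List<BackgroundService> services = new ArrayList<>();
    private boolean leader = false; // current leadership state

    public void register(BackgroundService s) {
        services.add(s);
    }

    // Called on leadership change, e.g. from a Ratis state-machine callback.
    public void onLeaderChange(boolean isLeader) {
        if (isLeader && !leader) {
            services.forEach(BackgroundService::start); // gained leadership
        } else if (!isLeader && leader) {
            services.forEach(BackgroundService::stop);  // lost leadership
        }
        leader = isLeader;
    }
}
```

The validation steps the ticket mentions would slot in between the leadership notification and the `start()` calls, before the services begin serving.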
[jira] [Resolved] (HDDS-3206) Make sure AllocateBlock can only be executed on leader SCM
[ https://issues.apache.org/jira/browse/HDDS-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng resolved HDDS-3206. Resolution: Duplicate > Make sure AllocateBlock can only be executed on leader SCM > -- > > Key: HDDS-3206 > URL: https://issues.apache.org/jira/browse/HDDS-3206 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Reporter: Li Cheng >Priority: Major > > Check if the current SCM is the leader. If not, return NonLeaderException. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
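The leader-only execution described in HDDS-3206 can be sketched as a simple guard. This is a minimal illustration only; the class and exception names are assumptions, not the actual Ozone API.

```java
// Hypothetical sketch of a leader-only guard for allocateBlock: a follower
// rejects the request so the client can retry against the leader SCM.
public class LeaderGuard {
    // Illustrative stand-in for the real NonLeaderException type.
    static class NotLeaderException extends RuntimeException {
        NotLeaderException(String msg) { super(msg); }
    }

    private final boolean isLeader;

    LeaderGuard(boolean isLeader) { this.isLeader = isLeader; }

    String allocateBlock(long size) {
        if (!isLeader) {
            // Follower: refuse and let the client fail over to the leader.
            throw new NotLeaderException("not the leader SCM, retry against leader");
        }
        return "block-" + size; // placeholder for real block allocation
    }
}
```

A client seeing the exception would retry against another SCM instead of treating it as a hard failure.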
[jira] [Resolved] (HDDS-3199) Handle PipelineAction and OpenPipeline from DN to SCM
[ https://issues.apache.org/jira/browse/HDDS-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng resolved HDDS-3199. Resolution: Duplicate > Handle PipelineAction and OpenPipeline from DN to SCM > > > Key: HDDS-3199 > URL: https://issues.apache.org/jira/browse/HDDS-3199 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Reporter: Li Cheng >Priority: Major > > PipelineAction and OpenPipeline should only be sent to the leader SCM, and the leader SCM > will take action to close or open pipelines. Pipeline state changes will be > propagated to followers via Ratis. If an action is sent to a follower, the follower SCM > will reject it with NonLeaderException and the DN will retry. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-4294) Backport updates from ContainerManager(V1)
Li Cheng created HDDS-4294: -- Summary: Backport updates from ContainerManager(V1) Key: HDDS-4294 URL: https://issues.apache.org/jira/browse/HDDS-4294 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: Li Cheng -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-4293) Backport updates from PipelineManager(V1)
Li Cheng created HDDS-4293: -- Summary: Backport updates from PipelineManager(V1) Key: HDDS-4293 URL: https://issues.apache.org/jira/browse/HDDS-4293 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: Li Cheng -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3211) Design for SCM HA configuration
[ https://issues.apache.org/jira/browse/HDDS-3211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng updated HDDS-3211: --- Summary: Design for SCM HA configuration (was: Make SCM HA configurable) > Design for SCM HA configuration > --- > > Key: HDDS-3211 > URL: https://issues.apache.org/jira/browse/HDDS-3211 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Reporter: Li Cheng >Priority: Major > > Need a switch in all paths to turn SCM HA on/off. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-3200) Handle NodeReport from DN to SCMs
[ https://issues.apache.org/jira/browse/HDDS-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng resolved HDDS-3200. Resolution: Duplicate > Handle NodeReport from DN to SCMs > - > > Key: HDDS-3200 > URL: https://issues.apache.org/jira/browse/HDDS-3200 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Reporter: Li Cheng >Priority: Major > > NodeReport is sent to all SCMs. Only the leader SCM can take action to change node > status. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-3193) Handle ContainerReport and IncrementalContainerReport
[ https://issues.apache.org/jira/browse/HDDS-3193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng resolved HDDS-3193. Resolution: Duplicate > Handle ContainerReport and IncrementalContainerReport > - > > Key: HDDS-3193 > URL: https://issues.apache.org/jira/browse/HDDS-3193 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Li Cheng >Priority: Major > > Let the DataNode send ContainerReport and > IncrementalContainerReport to all SCMs. And SCM should be aware of the BCSID in reports to > know the version of the report. SCM will NOT applyTransaction for container > reports, but only record the sequenceId (like BCSID) in reports. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-3211) Make SCM HA configurable
[ https://issues.apache.org/jira/browse/HDDS-3211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17204409#comment-17204409 ] Li Cheng commented on HDDS-3211: [~nicholasjiang] Hey Nicolas, this issue would require an overall design for SCM HA configuration considering multiple SCMs as well as allowing federation. Also this HA config may apply to the entire Ozone, which means we would need to update what OM HA does now. > Make SCM HA configurable > > > Key: HDDS-3211 > URL: https://issues.apache.org/jira/browse/HDDS-3211 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Reporter: Li Cheng >Priority: Major > > Need a switch in all paths to turn SCM HA on/off. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-4115) CLI command to show current SCM leader and follower status
[ https://issues.apache.org/jira/browse/HDDS-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17203727#comment-17203727 ] Li Cheng commented on HDDS-4115: Patch is merged. Resolving > CLI command to show current SCM leader and follower status > -- > > Key: HDDS-4115 > URL: https://issues.apache.org/jira/browse/HDDS-4115 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Li Cheng >Assignee: Rui Wang >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-4115) CLI command to show current SCM leader and follower status
[ https://issues.apache.org/jira/browse/HDDS-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng updated HDDS-4115: --- Resolution: Fixed Status: Resolved (was: Patch Available) > CLI command to show current SCM leader and follower status > -- > > Key: HDDS-4115 > URL: https://issues.apache.org/jira/browse/HDDS-4115 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Li Cheng >Assignee: Rui Wang >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-3661) Add Snapshot into new SCMRatisServer and SCMStateMachine
[ https://issues.apache.org/jira/browse/HDDS-3661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng reassigned HDDS-3661: -- Assignee: Rui Wang (was: Li Cheng) > Add Snapshot into new SCMRatisServer and SCMStateMachine > - > > Key: HDDS-3661 > URL: https://issues.apache.org/jira/browse/HDDS-3661 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Li Cheng >Assignee: Rui Wang >Priority: Major > > Now we have prototype SCMRatisServer and SCMStateMachine under Ratis and HA > path. Implement Snapshot support into new SCMRatisServer and SCMStateMachine > as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-4132) Switch to ContainerManagerV2
[ https://issues.apache.org/jira/browse/HDDS-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng resolved HDDS-4132. Resolution: Duplicate https://issues.apache.org/jira/browse/HDDS-4133 > Switch to ContainerManagerV2 > > > Key: HDDS-4132 > URL: https://issues.apache.org/jira/browse/HDDS-4132 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM HA >Reporter: Li Cheng >Assignee: Li Cheng >Priority: Major > > Use the new ContainerManagerV2 API -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-3203) Replication can only be executed on leader
[ https://issues.apache.org/jira/browse/HDDS-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng resolved HDDS-3203. Resolution: Duplicate > Replication can only be executed on leader > -- > > Key: HDDS-3203 > URL: https://issues.apache.org/jira/browse/HDDS-3203 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Reporter: Li Cheng >Priority: Major > > Replication should only execute on leader. > If the leader has changed, the new leader will initialize new tasks based on > its current view. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3188) Add failover proxy to SCM block protocol
[ https://issues.apache.org/jira/browse/HDDS-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng updated HDDS-3188: --- Status: Patch Available (was: In Progress) > Add failover proxy to SCM block protocol > > > Key: HDDS-3188 > URL: https://issues.apache.org/jira/browse/HDDS-3188 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Reporter: Li Cheng >Assignee: Li Cheng >Priority: Major > Labels: pull-request-available > > Need to support 2N + 1 SCMs. Add configs and logic to support multiple SCMs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-3188) Add failover proxy to SCM block protocol
[ https://issues.apache.org/jira/browse/HDDS-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng reassigned HDDS-3188: -- Assignee: Li Cheng (was: Li Cheng) > Add failover proxy to SCM block protocol > > > Key: HDDS-3188 > URL: https://issues.apache.org/jira/browse/HDDS-3188 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Reporter: Li Cheng >Assignee: Li Cheng >Priority: Major > Labels: pull-request-available > > Need to support 2N + 1 SCMs. Add configs and logic to support multiple SCMs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-2823) SCM HA Support
[ https://issues.apache.org/jira/browse/HDDS-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng reassigned HDDS-2823: -- Assignee: Li Cheng (was: Li Cheng) > SCM HA Support > --- > > Key: HDDS-2823 > URL: https://issues.apache.org/jira/browse/HDDS-2823 > Project: Hadoop Distributed Data Store > Issue Type: New Feature > Components: SCM HA >Reporter: Sammi Chen >Assignee: Li Cheng >Priority: Major > > OM HA is close to feature complete now. It's time to support SCM HA, to make > sure there is no SPoF in the system. > > Design doc: > https://docs.google.com/document/d/1vr_z6mQgtS1dtI0nANoJlzvF1oLV-AtnNJnxAgg69rM/edit?usp=sharing -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-4281) Use suggestedLeader for SCM failover proxy performing failover
Li Cheng created HDDS-4281: -- Summary: Use suggestedLeader for SCM failover proxy performing failover Key: HDDS-4281 URL: https://issues.apache.org/jira/browse/HDDS-4281 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: Li Cheng Related to HDDS-3188. Use the suggestedLeader from the response when the SCM failover proxy performs failover. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-4221) Support extra large storage capacity server as datanode
[ https://issues.apache.org/jira/browse/HDDS-4221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201902#comment-17201902 ] Li Cheng commented on HDDS-4221: There is discussion over RaftClient sharing one gRPC channel on every datanode: https://issues.apache.org/jira/browse/RATIS-1072 https://issues.apache.org/jira/browse/RATIS-1074 > Support extra large storage capacity server as datanode > --- > > Key: HDDS-4221 > URL: https://issues.apache.org/jira/browse/HDDS-4221 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Sammi Chen >Priority: Major > Attachments: image-2020-09-25-12-41-38-113.png > > > There is customer request to support high density storage server as datanode, > hardware configuration for example, 96 Core, 32G DDR4 *8, 480G SATA SSD, > 25GbE *2 , 60 * 12TB HDD. > How to fully utilize the hardware resource and unleash it's power is a big > challenge. > This umbrella JIRA is created to host all the discussions and next step > actions towards the final goal. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-4221) Support extra large storage capacity server as datanode
[ https://issues.apache.org/jira/browse/HDDS-4221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng updated HDDS-4221: --- Attachment: image-2020-09-25-12-41-38-113.png > Support extra large storage capacity server as datanode > --- > > Key: HDDS-4221 > URL: https://issues.apache.org/jira/browse/HDDS-4221 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Sammi Chen >Priority: Major > Attachments: image-2020-09-25-12-41-38-113.png > > > There is customer request to support high density storage server as datanode, > hardware configuration for example, 96 Core, 32G DDR4 *8, 480G SATA SSD, > 25GbE *2 , 60 * 12TB HDD. > How to fully utilize the hardware resource and unleash it's power is a big > challenge. > This umbrella JIRA is created to host all the discussions and next step > actions towards the final goal. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-4221) Support extra large storage capacity server as datanode
[ https://issues.apache.org/jira/browse/HDDS-4221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201901#comment-17201901 ] Li Cheng commented on HDDS-4221: In a cosbench test via S3, it looks like the write performance differs when we test against a single bucket versus multiple buckets. The above one is for a single bucket and the below one is for 4 buckets. !image-2020-09-25-12-41-38-113.png! > Support extra large storage capacity server as datanode > --- > > Key: HDDS-4221 > URL: https://issues.apache.org/jira/browse/HDDS-4221 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Sammi Chen >Priority: Major > Attachments: image-2020-09-25-12-41-38-113.png > > > There is customer request to support high density storage server as datanode, > hardware configuration for example, 96 Core, 32G DDR4 *8, 480G SATA SSD, > 25GbE *2 , 60 * 12TB HDD. > How to fully utilize the hardware resource and unleash it's power is a big > challenge. > This umbrella JIRA is created to host all the discussions and next step > actions towards the final goal. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-4228) add field 'num' to ALLOCATE_BLOCK of scm audit log.
[ https://issues.apache.org/jira/browse/HDDS-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng resolved HDDS-4228. Fix Version/s: 1.1.0 Resolution: Fixed > add field 'num' to ALLOCATE_BLOCK of scm audit log. > --- > > Key: HDDS-4228 > URL: https://issues.apache.org/jira/browse/HDDS-4228 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Glen Geng >Assignee: Glen Geng >Priority: Minor > Labels: pull-request-available, pull-requests-available > Fix For: 1.1.0 > > > > The scm audit log for ALLOCATE_BLOCK is as follows: > {code:java} > 2020-09-10 03:42:08,196 | INFO | SCMAudit | user=root | ip=172.16.90.221 | > op=ALLOCATE_BLOCK {owner=7da0b4c4-d053-4fa0-8648-44ff0b8ba1bf, > size=268435456, type=RATIS, factor=THREE} | ret=SUCCESS |{code} > > One might be interested in the number of blocks allocated; better to add a field > 'num' to ALLOCATE_BLOCK of the scm audit log. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
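The audit-log change in HDDS-4228 amounts to adding one more key to the parameter map logged for ALLOCATE_BLOCK. A minimal sketch, assuming a plain string map; the class and method names are illustrative, not Ozone's actual audit API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Builds the ALLOCATE_BLOCK audit parameters, mirroring the keys in the
// quoted log line (owner, size, type, factor) plus the new 'num' field.
public class AuditParams {
    public static Map<String, String> allocateBlockParams(
            String owner, long size, String type, String factor, int num) {
        Map<String, String> p = new LinkedHashMap<>(); // keep insertion order for the log line
        p.put("owner", owner);
        p.put("size", String.valueOf(size));
        p.put("type", type);
        p.put("factor", factor);
        p.put("num", String.valueOf(num)); // the newly added field
        return p;
    }
}
```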
[jira] [Commented] (HDDS-4228) add field 'num' to ALLOCATE_BLOCK of scm audit log.
[ https://issues.apache.org/jira/browse/HDDS-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17193955#comment-17193955 ] Li Cheng commented on HDDS-4228: PR is merged. Thanks [~glengeng] for working on this. > add field 'num' to ALLOCATE_BLOCK of scm audit log. > --- > > Key: HDDS-4228 > URL: https://issues.apache.org/jira/browse/HDDS-4228 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Glen Geng >Assignee: Glen Geng >Priority: Minor > Labels: pull-request-available, pull-requests-available > > > The scm audit log for ALLOCATE_BLOCK is as follows: > {code:java} > 2020-09-10 03:42:08,196 | INFO | SCMAudit | user=root | ip=172.16.90.221 | > op=ALLOCATE_BLOCK {owner=7da0b4c4-d053-4fa0-8648-44ff0b8ba1bf, > size=268435456, type=RATIS, factor=THREE} | ret=SUCCESS |{code} > > One might be interested in the number of blocks allocated; better to add a field > 'num' to ALLOCATE_BLOCK of the scm audit log. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-4191) Add failover proxy for SCM container client
Li Cheng created HDDS-4191: -- Summary: Add failover proxy for SCM container client Key: HDDS-4191 URL: https://issues.apache.org/jira/browse/HDDS-4191 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: Li Cheng Assignee: Li Cheng Take advantage of failover proxy in HDDS-3188 and have failover proxy for SCM container client as well -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
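The failover-proxy idea behind HDDS-4191 and HDDS-3188 can be sketched as a client that rotates through the configured SCM addresses until a call succeeds. This is a hedged illustration only, not Hadoop's actual RetryProxy/FailoverProxyProvider machinery; all names are assumptions.

```java
import java.util.List;
import java.util.function.Function;

// Minimal round-robin failover over multiple SCM addresses: on any failure
// (e.g. a not-leader rejection), advance to the next SCM and retry.
public class FailoverClient {
    private final List<String> scmAddresses;
    private int current = 0;

    public FailoverClient(List<String> scmAddresses) {
        this.scmAddresses = scmAddresses; // assumed non-empty
    }

    // Try each SCM in turn until one call succeeds or all have failed.
    public <R> R call(Function<String, R> rpc) {
        RuntimeException last = null;
        for (int i = 0; i < scmAddresses.size(); i++) {
            String addr = scmAddresses.get(current);
            try {
                return rpc.apply(addr);
            } catch (RuntimeException e) {
                last = e;                                      // e.g. NotLeaderException
                current = (current + 1) % scmAddresses.size(); // fail over to next SCM
            }
        }
        throw last;
    }
}
```

A refinement (as in HDDS-4281) would jump straight to the suggestedLeader carried in the rejection, instead of blindly rotating.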
[jira] [Commented] (HDDS-3188) Add failover proxy to SCM block protocol
[ https://issues.apache.org/jira/browse/HDDS-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189070#comment-17189070 ] Li Cheng commented on HDDS-3188: Dividing 'Enable Multiple SCMs' into smaller tasks > Add failover proxy to SCM block protocol > > > Key: HDDS-3188 > URL: https://issues.apache.org/jira/browse/HDDS-3188 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Reporter: Li Cheng >Assignee: Li Cheng >Priority: Major > Labels: pull-request-available > > Need to support 2N + 1 SCMs. Add configs and logic to support multiple SCMs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3188) Add failover proxy to SCM block protocol
[ https://issues.apache.org/jira/browse/HDDS-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng updated HDDS-3188: --- Summary: Add failover proxy to SCM block protocol (was: Enable Multiple SCMs) > Add failover proxy to SCM block protocol > > > Key: HDDS-3188 > URL: https://issues.apache.org/jira/browse/HDDS-3188 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Reporter: Li Cheng >Assignee: Li Cheng >Priority: Major > Labels: pull-request-available > > Need to support 2N + 1 SCMs. Add configs and logic to support multiple SCMs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-3677) Handle events fired from PipelineManager to close container
[ https://issues.apache.org/jira/browse/HDDS-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng resolved HDDS-3677. Resolution: Duplicate This is resolved in PipelineManagerV2 > Handle events fired from PipelineManager to close container > --- > > Key: HDDS-3677 > URL: https://issues.apache.org/jira/browse/HDDS-3677 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Li Cheng >Priority: Major > > finalizePipeline used to fire events to close containers. In the new interface, > we should decide where to fire these events. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-4132) Switch to ContainerManagerV2
Li Cheng created HDDS-4132: -- Summary: Switch to ContainerManagerV2 Key: HDDS-4132 URL: https://issues.apache.org/jira/browse/HDDS-4132 Project: Hadoop Distributed Data Store Issue Type: Sub-task Components: SCM HA Reporter: Li Cheng Assignee: Li Cheng Use the new ContainerManagerV2 API -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-4116) SCM CLI command towards certain IP
[ https://issues.apache.org/jira/browse/HDDS-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17179307#comment-17179307 ] Li Cheng commented on HDDS-4116: [~adoroszlai] The goal of this task is to enable the admin CLI to send SCM-related commands to a certain SCM by IP address. Right now it sends commands just to the SCM service, but with SCM HA we would have a leader SCM and followers. > SCM CLI command towards certain IP > -- > > Key: HDDS-4116 > URL: https://issues.apache.org/jira/browse/HDDS-4116 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Li Cheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3837) Add isLeader check for SCM state updates
[ https://issues.apache.org/jira/browse/HDDS-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng updated HDDS-3837: --- Resolution: Fixed Status: Resolved (was: Patch Available) > Add isLeader check for SCM state updates > > > Key: HDDS-3837 > URL: https://issues.apache.org/jira/browse/HDDS-3837 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Li Cheng >Assignee: Li Cheng >Priority: Major > Labels: pull-request-available > > We only allow leader to make decisions to update map, DB and fire events to DN -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-4116) SCM CLI command towards certain IP
Li Cheng created HDDS-4116: -- Summary: SCM CLI command towards certain IP Key: HDDS-4116 URL: https://issues.apache.org/jira/browse/HDDS-4116 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: Li Cheng -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-4115) CLI command to show current SCM leader and follower status
Li Cheng created HDDS-4115: -- Summary: CLI command to show current SCM leader and follower status Key: HDDS-4115 URL: https://issues.apache.org/jira/browse/HDDS-4115 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: Li Cheng -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-3190) SCM needs to replay RaftLog for recovery
[ https://issues.apache.org/jira/browse/HDDS-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17176716#comment-17176716 ] Li Cheng commented on HDDS-3190: [~amaliujia] Hey Rui, Welcome to the Ozone community! Let's find some time to have a sync-up offline. I shall introduce the vision as well as the design of SCM HA :) > SCM needs to replay RaftLog for recovery > > > Key: HDDS-3190 > URL: https://issues.apache.org/jira/browse/HDDS-3190 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Reporter: Li Cheng >Assignee: Rui Wang >Priority: Major > > Need to add a big Proto file for all types of requests to store in the RaftLog. > SCM needs to replay the RaftLog for recovery. > Note that the Proto may have further changes. Until all data structures are > finished, need to leave some room for compatibility. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-3962) Use getRoleInfoProto() in isLeader check
Li Cheng created HDDS-3962: -- Summary: Use getRoleInfoProto() in isLeader check Key: HDDS-3962 URL: https://issues.apache.org/jira/browse/HDDS-3962 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: Li Cheng Assignee: Glen Geng {{RATIS-1001}} is going to have term in leadership check. SCM should check whether it's leader and at which term. Current isLeader check doesn't report term. [https://github.com/apache/hadoop-ozone/pull/1191/files/0ca2ff54d496ee9c74273a79a1d33e0dd998eecf#diff-0282ededa84a94d13dbed6fbb7ee159bR75] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
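The point of HDDS-3962 is that a bare boolean isLeader check is not enough: the caller also needs to know at which term leadership holds. A minimal sketch of a term-aware check; the Optional-returning shape is an assumption for illustration, not the actual Ratis getRoleInfoProto() API.

```java
import java.util.Optional;

// Leader check that also reports the term: empty when not leader, otherwise
// the term at which leadership holds, so callers can fence stale-leader
// state updates by comparing terms.
public class LeaderCheck {
    private final boolean leader;
    private final long term;

    public LeaderCheck(boolean leader, long term) {
        this.leader = leader;
        this.term = term;
    }

    public Optional<Long> leaderTerm() {
        return leader ? Optional.of(term) : Optional.empty();
    }
}
```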
[jira] [Updated] (HDDS-3837) Add isLeader check for SCM state updates
[ https://issues.apache.org/jira/browse/HDDS-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng updated HDDS-3837: --- Status: Patch Available (was: In Progress) [https://github.com/apache/hadoop-ozone/pull/1191] > Add isLeader check for SCM state updates > > > Key: HDDS-3837 > URL: https://issues.apache.org/jira/browse/HDDS-3837 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Li Cheng >Assignee: Li Cheng >Priority: Major > Labels: pull-request-available > > We only allow leader to make decisions to update map, DB and fire events to DN -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-3188) Enable Multiple SCMs
[ https://issues.apache.org/jira/browse/HDDS-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng reassigned HDDS-3188: -- Assignee: Li Cheng > Enable Multiple SCMs > > > Key: HDDS-3188 > URL: https://issues.apache.org/jira/browse/HDDS-3188 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Reporter: Li Cheng >Assignee: Li Cheng >Priority: Major > > Need to support 2N SCMs. Add configs and logic to support multiple SCMs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-3837) Add isLeader check for SCM state updates
[ https://issues.apache.org/jira/browse/HDDS-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng reassigned HDDS-3837: -- Assignee: Li Cheng > Add isLeader check for SCM state updates > > > Key: HDDS-3837 > URL: https://issues.apache.org/jira/browse/HDDS-3837 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Li Cheng >Assignee: Li Cheng >Priority: Major > > We only allow leader to make decisions to update map, DB and fire events to DN -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDDS-3911) Compile error in acceptance test on HDDS-2823
[ https://issues.apache.org/jira/browse/HDDS-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17150672#comment-17150672 ] Li Cheng edited comment on HDDS-3911 at 7/3/20, 2:09 AM: - [https://github.com/apache/hadoop-ozone/pull/1157] is merged. Thanks for the contribution. was (Author: licheng): [https://github.com/apache/hadoop-ozone/pull/1157] is merged. Closing this JIRA. > Compile error in acceptance test on HDDS-2823 > - > > Key: HDDS-3911 > URL: https://issues.apache.org/jira/browse/HDDS-3911 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM HA >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Blocker > Labels: pull-request-available > > {code} > [INFO] --- hadoop-maven-plugins:3.2.1:protoc (compile-protoc) @ > hadoop-hdds-server-scm --- > [WARNING] [protoc, --version] failed: java.io.IOException: Cannot run program > "protoc": error=2, No such file or directory > [ERROR] stdout: [] > {code} > https://github.com/apache/hadoop-ozone/runs/814218639 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-3911) Compile error in acceptance test on HDDS-2823
[ https://issues.apache.org/jira/browse/HDDS-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17150672#comment-17150672 ] Li Cheng commented on HDDS-3911: [https://github.com/apache/hadoop-ozone/pull/1157] is merged. Closing this JIRA. > Compile error in acceptance test on HDDS-2823 > - > > Key: HDDS-3911 > URL: https://issues.apache.org/jira/browse/HDDS-3911 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM HA >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Blocker > Labels: pull-request-available > > {code} > [INFO] --- hadoop-maven-plugins:3.2.1:protoc (compile-protoc) @ > hadoop-hdds-server-scm --- > [WARNING] [protoc, --version] failed: java.io.IOException: Cannot run program > "protoc": error=2, No such file or directory > [ERROR] stdout: [] > {code} > https://github.com/apache/hadoop-ozone/runs/814218639 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-3838) Handle stale leader issue
Li Cheng created HDDS-3838: -- Summary: Handle stale leader issue Key: HDDS-3838 URL: https://issues.apache.org/jira/browse/HDDS-3838 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: Li Cheng There could be a stale SCM leader and a new SCM leader at the same time, and both can communicate with DNs. We need to handle the resulting consistency issue. https://docs.google.com/document/d/1-5-KpR2GYIwWXGRH_C8IUVbFsm8RiETOVNYsMB5W8Ic/edit?usp=sharing -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
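One way to guard against the stale-leader scenario described above is a monotonic term check on the DN side: commands carrying a Raft term lower than the highest term already seen are dropped. The sketch below is purely illustrative; the class and method names are not part of the actual Ozone code.

```java
// Hypothetical sketch: a datanode-side guard that rejects commands from a
// stale SCM leader by comparing the Raft term attached to each command.
public class LeaderTermGuard {
    private long highestTermSeen = 0;

    /** Accept a command only if its term is at least the highest term seen so far. */
    public synchronized boolean accept(long commandTerm) {
        if (commandTerm < highestTermSeen) {
            return false; // command came from a stale leader; drop it
        }
        highestTermSeen = commandTerm;
        return true;
    }
}
```

A stale leader's commands (carrying an older term) are rejected even if that leader can still reach the DN over the network.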
[jira] [Created] (HDDS-3837) Add isLeader check for SCM state updates
Li Cheng created HDDS-3837: -- Summary: Add isLeader check for SCM state updates Key: HDDS-3837 URL: https://issues.apache.org/jira/browse/HDDS-3837 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: Li Cheng We should only allow the leader to update the in-memory maps and the DB, and to fire events to DNs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
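The isLeader check described above can be sketched as a guard at the top of every state-mutating call: followers reject the update and leave replication to the leader. The `RatisServer` interface below is a hypothetical stand-in, not the real SCM API.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of an isLeader guard for SCM state updates.
public class ScmStateUpdater {
    public interface RatisServer { boolean isLeader(); }

    private final RatisServer ratisServer;
    private final Map<String, String> containerMap = new HashMap<>();

    public ScmStateUpdater(RatisServer ratisServer) { this.ratisServer = ratisServer; }

    /** Update in-memory state only when this SCM instance is the leader. */
    public boolean updateContainer(String id, String state) {
        if (!ratisServer.isLeader()) {
            return false; // follower: reject the update, the leader replicates it via Ratis
        }
        containerMap.put(id, state);
        return true;
    }

    public String get(String id) { return containerMap.get(id); }
}
```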
[jira] [Assigned] (HDDS-3191) Switch current pipeline interface to the new Replication based interface to write to Ratis
[ https://issues.apache.org/jira/browse/HDDS-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng reassigned HDDS-3191: -- Assignee: Li Cheng > Switch current pipeline interface to the new Replication based interface to > write to Ratis > -- > > Key: HDDS-3191 > URL: https://issues.apache.org/jira/browse/HDDS-3191 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Reporter: Li Cheng >Assignee: Li Cheng >Priority: Major > > Due to consistency concerns, SCM needs to applyTransaction to the RaftLog before > it writes to the local database and in-memory maps. We need to refactor the current > code to move this part into Ratis. > Ratis will write to the DB on behalf of SCM. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
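The ordering described in this issue (replicate through the Raft log first, then let applyTransaction write the DB and in-memory state) can be sketched as follows. The `raftLog` and `db` fields are simple stand-ins for the RaftLog and RocksDB, and all names are illustrative rather than the actual SCM classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: a pipeline mutation is appended to the Raft log first; only the
// applyTransaction step (run after commit) touches the local DB.
public class ReplicatedPipelineStore {
    private final List<String> raftLog = new ArrayList<>(); // stand-in for the RaftLog
    private final Map<String, String> db = new HashMap<>(); // stand-in for RocksDB

    /** Client-facing call: replicate first, never touch the DB directly. */
    public void createPipeline(String id) {
        String entry = "CREATE:" + id;
        raftLog.add(entry);       // 1. append to the Raft log (replicated to followers)
        applyTransaction(entry);  // 2. apply only once the entry is committed
    }

    /** Invoked by the state machine for a committed log entry. */
    private void applyTransaction(String entry) {
        String id = entry.substring("CREATE:".length());
        db.put(id, "ALLOCATED"); // "Ratis will write to the DB on behalf of SCM"
    }

    public String state(String id) { return db.get(id); }
    public int logSize() { return raftLog.size(); }
}
```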
[jira] [Assigned] (HDDS-3191) Switch current pipeline interface to the new Replication based interface to write to Ratis
[ https://issues.apache.org/jira/browse/HDDS-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng reassigned HDDS-3191: -- Assignee: (was: Li Cheng) > Switch current pipeline interface to the new Replication based interface to > write to Ratis > -- > > Key: HDDS-3191 > URL: https://issues.apache.org/jira/browse/HDDS-3191 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Reporter: Li Cheng >Priority: Major > > Due to consistency concerns, SCM needs to applyTransaction to the RaftLog before > it writes to the local database and in-memory maps. We need to refactor the current > code to move this part into Ratis. > Ratis will write to the DB on behalf of SCM. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-3679) Add unit tests for new PipelineManager interface
[ https://issues.apache.org/jira/browse/HDDS-3679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng resolved HDDS-3679. Fix Version/s: 0.6.0 Resolution: Fixed > Add unit tests for new PipelineManager interface > > > Key: HDDS-3679 > URL: https://issues.apache.org/jira/browse/HDDS-3679 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Li Cheng >Assignee: Li Cheng >Priority: Major > Labels: pull-request-available > Fix For: 0.6.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-3776) Upgrading RocksDB version to avoid java heap issue
Li Cheng created HDDS-3776: -- Summary: Upgrading RocksDB version to avoid java heap issue Key: HDDS-3776 URL: https://issues.apache.org/jira/browse/HDDS-3776 Project: Hadoop Distributed Data Store Issue Type: Bug Components: upgrade Affects Versions: 0.5.0 Reporter: Li Cheng We currently use RocksDB 6.6.4 as the major version, and there are some JVM issues in tests (seen in [https://github.com/apache/hadoop-ozone/pull/1019]) related to a RocksDB core dump. We may upgrade to 6.8.1 to avoid this issue. {{JRE version: Java(TM) SE Runtime Environment (8.0_211-b12) (build 1.8.0_211-b12) # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.211-b12 mixed mode bsd-amd64 compressed oops) # Problematic frame: # C [librocksdbjni2954960755376440018.jnilib+0x602b8] rocksdb::GetColumnFamilyID(rocksdb::ColumnFamilyHandle*)+0x8 See full dump at [https://the-asf.slack.com/files/U0159PV5Z6U/F0152UAJF0S/hs_err_pid90655.log?origin_team=T4S1WH2J3_channel=D014L2URB6E](url)}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-3499) Address compatibility issue by SCM DB instances change
[ https://issues.apache.org/jira/browse/HDDS-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17132853#comment-17132853 ] Li Cheng commented on HDDS-3499: [~arp] Our internal production deployment is still on schedule, and we have run internal tests to verify the migration steps work for us. Resolving this now... > Address compatibility issue by SCM DB instances change > -- > > Key: HDDS-3499 > URL: https://issues.apache.org/jira/browse/HDDS-3499 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Reporter: Li Cheng >Assignee: Marton Elek >Priority: Blocker > Labels: Triaged > > After https://issues.apache.org/jira/browse/HDDS-3172, SCM now has one single > rocksdb instance instead of multiple db instances. > For running Ozone cluster, we need to address compatibility issues. One > possible way is to have a side-way tool to migrate old metadata from multiple > dbs to current single db. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-3499) Address compatibility issue by SCM DB instances change
[ https://issues.apache.org/jira/browse/HDDS-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng resolved HDDS-3499. Fix Version/s: 0.6.0 Resolution: Fixed > Address compatibility issue by SCM DB instances change > -- > > Key: HDDS-3499 > URL: https://issues.apache.org/jira/browse/HDDS-3499 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Reporter: Li Cheng >Assignee: Marton Elek >Priority: Blocker > Labels: Triaged > Fix For: 0.6.0 > > > After https://issues.apache.org/jira/browse/HDDS-3172, SCM now has one single > rocksdb instance instead of multiple db instances. > For running Ozone cluster, we need to address compatibility issues. One > possible way is to have a side-way tool to migrate old metadata from multiple > dbs to current single db. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-3662) decouple finalize and destroy pipeline
[ https://issues.apache.org/jira/browse/HDDS-3662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng reassigned HDDS-3662: -- Assignee: Li Cheng > decouple finalize and destroy pipeline > -- > > Key: HDDS-3662 > URL: https://issues.apache.org/jira/browse/HDDS-3662 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Li Cheng >Assignee: Li Cheng >Priority: Major > > We have to decouple finalize and destroy pipeline. We should have two > separate calls, closePipeline and destroyPipeline. > closePipeline should only update the pipeline state; it's the job of the > caller to issue close-container commands to all the containers in the > pipeline. > destroyPipeline should be called from the pipeline scrubber: once a pipeline has > spent enough time in the closed state, the pipeline scrubber should call destroy > pipeline. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
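The decoupling proposed above can be sketched as two independent calls: closePipeline only flips the state and records the close time, while the scrubber later invokes destroyPipeline once a grace period has elapsed. The class name, time handling, and grace-period mechanism are assumptions for illustration, not the actual PipelineManager code.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of decoupled close/destroy pipeline calls.
public class PipelineLifecycle {
    public enum State { OPEN, CLOSED }

    private final Map<String, State> pipelines = new HashMap<>();
    private final Map<String, Long> closedAt = new HashMap<>();

    public void create(String id) { pipelines.put(id, State.OPEN); }

    /** Only updates state; issuing close-container commands is the caller's job. */
    public void closePipeline(String id, long nowMillis) {
        pipelines.put(id, State.CLOSED);
        closedAt.put(id, nowMillis);
    }

    /** Called by the scrubber once a pipeline has been CLOSED long enough. */
    public boolean destroyPipeline(String id, long nowMillis, long graceMillis) {
        Long closedTime = closedAt.get(id);
        if (closedTime == null || nowMillis - closedTime < graceMillis) {
            return false; // not closed yet, or still inside the grace period
        }
        pipelines.remove(id);
        closedAt.remove(id);
        return true;
    }

    public State state(String id) { return pipelines.get(id); }
}
```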
[jira] [Resolved] (HDDS-3693) Switch to PipelineStateManagerV2 and put PipelineFactory in PipelineManager
[ https://issues.apache.org/jira/browse/HDDS-3693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng resolved HDDS-3693. Release Note: PR is merged Resolution: Fixed > Switch to PipelineStateManagerV2 and put PipelineFactory in PipelineManager > --- > > Key: HDDS-3693 > URL: https://issues.apache.org/jira/browse/HDDS-3693 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Li Cheng >Assignee: Li Cheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-3693) Switch to PipelineStateManagerV2 and put PipelineFactory in PipelineManager
Li Cheng created HDDS-3693: -- Summary: Switch to PipelineStateManagerV2 and put PipelineFactory in PipelineManager Key: HDDS-3693 URL: https://issues.apache.org/jira/browse/HDDS-3693 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: Li Cheng Assignee: Li Cheng -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-3499) Address compatibility issue by SCM DB instances change
[ https://issues.apache.org/jira/browse/HDDS-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119406#comment-17119406 ] Li Cheng commented on HDDS-3499: Hey, I've tested this in a test cluster and it works for us. [~elek] [~arp] We will prepare a deployment next week and hopefully our production cluster can migrate safely. > Address compatibility issue by SCM DB instances change > -- > > Key: HDDS-3499 > URL: https://issues.apache.org/jira/browse/HDDS-3499 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Li Cheng >Assignee: Marton Elek >Priority: Blocker > > After https://issues.apache.org/jira/browse/HDDS-3172, SCM now has one single > rocksdb instance instead of multiple db instances. > For running Ozone cluster, we need to address compatibility issues. One > possible way is to have a side-way tool to migrate old metadata from multiple > dbs to current single db. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-3684) Add tests for replication annotation
Li Cheng created HDDS-3684: -- Summary: Add tests for replication annotation Key: HDDS-3684 URL: https://issues.apache.org/jira/browse/HDDS-3684 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: Li Cheng -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-3191) Switch current pipeline interface to the new Replication based interface to write to Ratis
[ https://issues.apache.org/jira/browse/HDDS-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng reassigned HDDS-3191: -- Assignee: Li Cheng > Switch current pipeline interface to the new Replication based interface to > write to Ratis > -- > > Key: HDDS-3191 > URL: https://issues.apache.org/jira/browse/HDDS-3191 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Reporter: Li Cheng >Assignee: Li Cheng >Priority: Major > > Due to consistency concerns, SCM needs to applyTransaction to the RaftLog before > it writes to the local database and in-memory maps. We need to refactor the current > code to move this part into Ratis. > Ratis will write to the DB on behalf of SCM. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3191) Switch current pipeline interface to the new Replication based interface to write to Ratis
[ https://issues.apache.org/jira/browse/HDDS-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng updated HDDS-3191: --- Summary: Switch current pipeline interface to the new Replication based interface to write to Ratis (was: Interface to write to Ratis before write to SCM DB) > Switch current pipeline interface to the new Replication based interface to > write to Ratis > -- > > Key: HDDS-3191 > URL: https://issues.apache.org/jira/browse/HDDS-3191 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Reporter: Li Cheng >Priority: Major > > Due to consistency concerns, SCM needs to applyTransaction to the RaftLog before > it writes to the local database and in-memory maps. We need to refactor the current > code to move this part into Ratis. > Ratis will write to the DB on behalf of SCM. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3196) New PipelineManager interface to persist to RatisServer
[ https://issues.apache.org/jira/browse/HDDS-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng updated HDDS-3196: --- Status: Patch Available (was: Open) [https://github.com/apache/hadoop-ozone/pull/980] > New PipelineManager interface to persist to RatisServer > --- > > Key: HDDS-3196 > URL: https://issues.apache.org/jira/browse/HDDS-3196 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Reporter: Li Cheng >Assignee: Li Cheng >Priority: Major > > This applies to DestroyPipeline as well as createPipeline -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-3679) Add unit tests for new PipelineManager interface
Li Cheng created HDDS-3679: -- Summary: Add unit tests for new PipelineManager interface Key: HDDS-3679 URL: https://issues.apache.org/jira/browse/HDDS-3679 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: Li Cheng Assignee: Li Cheng -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3192) Handle AllocateContainer operation for HA
[ https://issues.apache.org/jira/browse/HDDS-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng updated HDDS-3192: --- Resolution: Fixed Status: Resolved (was: Patch Available) > Handle AllocateContainer operation for HA > - > > Key: HDDS-3192 > URL: https://issues.apache.org/jira/browse/HDDS-3192 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM HA >Reporter: Li Cheng >Assignee: Nanda kumar >Priority: Major > > Allocate container calls should make sure that the newly created container > information is replicated to the followers via Ratis. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-3192) Handle AllocateContainer operation for HA
[ https://issues.apache.org/jira/browse/HDDS-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118613#comment-17118613 ] Li Cheng commented on HDDS-3192: PR is merged. Thanks Nanda for this contribution. > Handle AllocateContainer operation for HA > - > > Key: HDDS-3192 > URL: https://issues.apache.org/jira/browse/HDDS-3192 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM HA >Reporter: Li Cheng >Assignee: Nanda kumar >Priority: Major > > Allocate container calls should make sure that the newly created container > information is replicated to the followers via Ratis. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-3677) Handle events fired from PipelineManager to close container
Li Cheng created HDDS-3677: -- Summary: Handle events fired from PipelineManager to close container Key: HDDS-3677 URL: https://issues.apache.org/jira/browse/HDDS-3677 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: Li Cheng finalizePipeline used to fire events to close containers. In the new interface, we should decide where to fire these events. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
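One possible shape for the event flow discussed above: on pipeline close, the manager fires one close-container event per member container to whichever handlers have subscribed. The minimal event bus below is illustrative only, not the actual SCM EventQueue API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch: firing close-container events when a pipeline is closed.
public class PipelineEventSketch {
    private final List<Consumer<String>> closeContainerHandlers = new ArrayList<>();

    /** Register a handler that reacts to close-container events. */
    public void subscribe(Consumer<String> handler) {
        closeContainerHandlers.add(handler);
    }

    /** On pipeline close, fire one close-container event per member container. */
    public void closePipeline(List<String> containerIds) {
        for (String containerId : containerIds) {
            for (Consumer<String> handler : closeContainerHandlers) {
                handler.accept(containerId);
            }
        }
    }
}
```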
[jira] [Updated] (HDDS-3196) New PipelineManager interface to persist to RatisServer
[ https://issues.apache.org/jira/browse/HDDS-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng updated HDDS-3196: --- Summary: New PipelineManager interface to persist to RatisServer (was: Pipeline mutation needs to applyTransaction before writing to DB) > New PipelineManager interface to persist to RatisServer > --- > > Key: HDDS-3196 > URL: https://issues.apache.org/jira/browse/HDDS-3196 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Reporter: Li Cheng >Assignee: Li Cheng >Priority: Major > > This applies to DestroyPipeline as well as createPipeline -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-3661) Add Snapshot into new SCMRatisServer and SCMStateMachine
[ https://issues.apache.org/jira/browse/HDDS-3661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng updated HDDS-3661: --- Summary: Add Snapshot into new SCMRatisServer and SCMStateMachine (was: Combine different versions of SCMRatisServer and SCMStateMachine) > Add Snapshot into new SCMRatisServer and SCMStateMachine > - > > Key: HDDS-3661 > URL: https://issues.apache.org/jira/browse/HDDS-3661 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Li Cheng >Assignee: Li Cheng >Priority: Major > > We now have prototype SCMRatisServer and SCMStateMachine implementations under the Ratis and HA > paths. We should combine them. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-3661) Combine different versions of SCMRatisServer and SCMStateMachine
Li Cheng created HDDS-3661: -- Summary: Combine different versions of SCMRatisServer and SCMStateMachine Key: HDDS-3661 URL: https://issues.apache.org/jira/browse/HDDS-3661 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: Li Cheng Assignee: Li Cheng We now have prototype SCMRatisServer and SCMStateMachine implementations under the Ratis and HA paths. We should combine them. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-3660) Arrange Util classes for SCM HA
Li Cheng created HDDS-3660: -- Summary: Arrange Util classes for SCM HA Key: HDDS-3660 URL: https://issues.apache.org/jira/browse/HDDS-3660 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: Li Cheng Assignee: Nanda kumar We now have SCMHAUtils and RatisUtil. We need to organize the util classes for SCM HA better. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-3186) Introduce generic SCMRatisRequest and SCMRatisResponse
[ https://issues.apache.org/jira/browse/HDDS-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng resolved HDDS-3186. Resolution: Fixed > Introduce generic SCMRatisRequest and SCMRatisResponse > -- > > Key: HDDS-3186 > URL: https://issues.apache.org/jira/browse/HDDS-3186 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Reporter: Li Cheng >Assignee: Nanda kumar >Priority: Major > Labels: pull-request-available > > This jira will introduce generic SCMRatisRequest and SCMRatisResponse which > will be used by all the Ratis operations inside SCM. We will also have a > generic StateMachine which will dispatch the request to registered handlers. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
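The generic dispatch described in this issue can be sketched as a handler registry keyed by request type: the state machine looks up the handler registered for a committed request and forwards it. The string-based request/response shapes below are assumptions for illustration, not the real SCMRatisRequest/SCMRatisResponse classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch: a generic state machine dispatching requests to registered handlers.
public class RatisDispatchSketch {
    // Handler registry keyed by request type, e.g. "PIPELINE" or "CONTAINER".
    private final Map<String, Function<String, String>> handlers = new HashMap<>();

    public void register(String type, Function<String, String> handler) {
        handlers.put(type, handler);
    }

    /** Dispatch a committed request to the handler registered for its type. */
    public String apply(String type, String payload) {
        Function<String, String> handler = handlers.get(type);
        if (handler == null) {
            return "ERROR: no handler for " + type;
        }
        return handler.apply(payload);
    }
}
```

New operation types only require registering another handler; the dispatch path stays unchanged.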
[jira] [Commented] (HDDS-3186) Introduce generic SCMRatisRequest and SCMRatisResponse
[ https://issues.apache.org/jira/browse/HDDS-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17116536#comment-17116536 ] Li Cheng commented on HDDS-3186: PR is merged. Thanks Nanda for the contribution. > Introduce generic SCMRatisRequest and SCMRatisResponse > -- > > Key: HDDS-3186 > URL: https://issues.apache.org/jira/browse/HDDS-3186 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Reporter: Li Cheng >Assignee: Nanda kumar >Priority: Major > Labels: pull-request-available > > This jira will introduce generic SCMRatisRequest and SCMRatisResponse which > will be used by all the Ratis operations inside SCM. We will also have a > generic StateMachine which will dispatch the request to registered handlers. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-3556) Refactor configuration in SCMRatisServer to Java-based configuration
[ https://issues.apache.org/jira/browse/HDDS-3556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111719#comment-17111719 ] Li Cheng commented on HDDS-3556: PR is merged. > Refactor configuration in SCMRatisServer to Java-based configuration > > > Key: HDDS-3556 > URL: https://issues.apache.org/jira/browse/HDDS-3556 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Affects Versions: 0.5.0 >Reporter: Li Cheng >Assignee: Li Cheng >Priority: Major > Labels: pull-request-available > > [https://cwiki.apache.org/confluence/display/HADOOP/Java-based+configuration+API] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-3556) Refactor configuration in SCMRatisServer to Java-based configuration
[ https://issues.apache.org/jira/browse/HDDS-3556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng resolved HDDS-3556. Release Note: PR is merged. Resolution: Fixed > Refactor configuration in SCMRatisServer to Java-based configuration > > > Key: HDDS-3556 > URL: https://issues.apache.org/jira/browse/HDDS-3556 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Affects Versions: 0.5.0 >Reporter: Li Cheng >Assignee: Li Cheng >Priority: Major > Labels: pull-request-available > > [https://cwiki.apache.org/confluence/display/HADOOP/Java-based+configuration+API] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-3499) Address compatibility issue by SCM DB instances change
[ https://issues.apache.org/jira/browse/HDDS-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111086#comment-17111086 ] Li Cheng commented on HDDS-3499: [~elek] Thanks for the testing. I will test with a VM cluster with 1 master and 3 datanodes and see how it goes. [~arp] Sure, I'll start testing it soon and will get back to you. > Address compatibility issue by SCM DB instances change > -- > > Key: HDDS-3499 > URL: https://issues.apache.org/jira/browse/HDDS-3499 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Li Cheng >Assignee: Marton Elek >Priority: Blocker > > After https://issues.apache.org/jira/browse/HDDS-3172, SCM now has one single > rocksdb instance instead of multiple db instances. > For running Ozone cluster, we need to address compatibility issues. One > possible way is to have a side-way tool to migrate old metadata from multiple > dbs to current single db. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-3499) Address compatibility issue by SCM DB instances change
[ https://issues.apache.org/jira/browse/HDDS-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110144#comment-17110144 ] Li Cheng commented on HDDS-3499: [~elek] Hey Marton, our production cluster's upcoming upgrade is blocked on this Jira, and we are expecting to upgrade soon. Could you share some progress here? > Address compatibility issue by SCM DB instances change > -- > > Key: HDDS-3499 > URL: https://issues.apache.org/jira/browse/HDDS-3499 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Li Cheng >Assignee: Marton Elek >Priority: Blocker > > After https://issues.apache.org/jira/browse/HDDS-3172, SCM now has one single > rocksdb instance instead of multiple db instances. > For running Ozone cluster, we need to address compatibility issues. One > possible way is to have a side-way tool to migrate old metadata from multiple > dbs to current single db. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-3196) Pipeline mutation needs to applyTransaction before writing to DB
[ https://issues.apache.org/jira/browse/HDDS-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng reassigned HDDS-3196: -- Assignee: Li Cheng > Pipeline mutation needs to applyTransaction before writing to DB > > > Key: HDDS-3196 > URL: https://issues.apache.org/jira/browse/HDDS-3196 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Reporter: Li Cheng >Assignee: Li Cheng >Priority: Major > > This applies to DestroyPipeline as well as createPipeline -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-3577) Reusable Ratis configuration among OM, SCM, DN and container
Li Cheng created HDDS-3577: -- Summary: Reusable Ratis configuration among OM, SCM, DN and container Key: HDDS-3577 URL: https://issues.apache.org/jira/browse/HDDS-3577 Project: Hadoop Distributed Data Store Issue Type: Improvement Affects Versions: 0.5.0 Reporter: Li Cheng Currently OM HA, the container layer, and the DN all use Ratis for consistency and redundancy, and each has its own Ratis configuration. SCM HA is ongoing, so SCM is also going to have Ratis support. We are also moving to Java-based configuration: [https://cwiki.apache.org/confluence/display/HADOOP/Java-based+configuration+API] We should consider cleaning up the naming conventions for all these configs and reusing some of them. We now have ozone.om.ratis, ozone.scm.ratis, hdds.datanode.ratis, dfs.container.ratis and even dfs.ratis. We could name them ozone.ratis.* and let the annotation look for the 'ozone.ratis' prefix, then reuse some configs across OM, SCM and the container layer. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
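The prefix-based reuse suggested above could look like a single lookup helper that resolves shared 'ozone.ratis.' keys with a per-call fallback. The key names and the helper itself are hypothetical, sketched only to show the idea of one shared prefix across OM, SCM and the container layer.

```java
import java.util.Map;

// Sketch: resolving shared Ratis settings under a single "ozone.ratis." prefix.
public class RatisConfigPrefix {
    static final String PREFIX = "ozone.ratis.";

    /** Resolve a shared Ratis key, falling back to a default value if unset. */
    public static String get(Map<String, String> conf, String suffix, String defaultValue) {
        return conf.getOrDefault(PREFIX + suffix, defaultValue);
    }
}
```

With this, OM, SCM, and the container layer would all read `ozone.ratis.rpc.type` instead of maintaining three separately-named copies of the same setting.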
[jira] [Commented] (HDDS-3559) Datanode doesn't handle java heap OutOfMemory exception
[ https://issues.apache.org/jira/browse/HDDS-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17102314#comment-17102314 ] Li Cheng commented on HDDS-3559: !http://file.tapd.oa.com//tfl/captures/2020-05/tapd_20417861_base64_1588909049_64.png! > Datanode doesn't handle java heap OutOfMemory exception > > > Key: HDDS-3559 > URL: https://issues.apache.org/jira/browse/HDDS-3559 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.5.0 >Reporter: Li Cheng >Priority: Major > > 2020-05-05 15:47:41,568 [Datanode State Machine Thread - 167] WARN > org.apache.hadoop.ozone.container.common.statemachine.Endpoi > ntStateMachine: Unable to communicate to SCM server at host-10-51-87-181:9861 > for past 0 seconds. > java.io.IOException: com.google.protobuf.ServiceException: > java.lang.OutOfMemoryError: Java heap space > at > org.apache.hadoop.ipc.ProtobufHelper.getRemoteException(ProtobufHelper.java:47) > at > org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.submitRequest(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:118) > at > org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.sendHeartbeat(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:148) > at > org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask.call(HeartbeatEndpointTask.java:145) > at > org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask.call(HeartbeatEndpointTask.java:76) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) 
> Caused by: com.google.protobuf.ServiceException: java.lang.OutOfMemoryError: > Java heap space > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.getReturnMessage(ProtobufRpcEngine.java:293) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:270) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) > at com.sun.proxy.$Proxy38.submitRequest(Unknown Source) > at > org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.submitRequest(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:116) > > On a cluster, one datanode stopped reporting to SCM and stayed in an unknown state, although the datanode process was still running. The log shows a Java heap OOM while serializing a protobuf RPC message; the datanode silently stops reporting to SCM and the process becomes stale. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
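A fail-fast wrapper is one way to avoid the silent stall reported here: any Error thrown by the heartbeat task triggers an escalation hook (for example halting the JVM so the datanode can be restarted cleanly) instead of being swallowed by the executor. The HotSpot flag -XX:+ExitOnOutOfMemoryError offers similar fail-fast behavior without code changes. The wrapper below is a sketch, not the actual datanode code.

```java
// Sketch: escalate fatal Errors (e.g. OutOfMemoryError) from a background task
// instead of letting the executor swallow them and leave the process stale.
public class FailFastTask {
    /** Wrap a task so any Error triggers an escalation callback before rethrowing. */
    public static Runnable wrap(Runnable task, Runnable onFatalError) {
        return () -> {
            try {
                task.run();
            } catch (Error e) {
                onFatalError.run(); // e.g. Runtime.getRuntime().halt(1) to force a restart
                throw e;
            }
        };
    }
}
```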
[jira] [Created] (HDDS-3559) Datanode doesn't handle java heap OutOfMemory exception
Li Cheng created HDDS-3559:
--
Summary: Datanode doesn't handle java heap OutOfMemory exception
Key: HDDS-3559
URL: https://issues.apache.org/jira/browse/HDDS-3559
Project: Hadoop Distributed Data Store
Issue Type: Bug
Components: Ozone Datanode
Affects Versions: 0.5.0
Reporter: Li Cheng

2020-05-05 15:47:41,568 [Datanode State Machine Thread - 167] WARN org.apache.hadoop.ozone.container.common.statemachine.EndpointStateMachine: Unable to communicate to SCM server at host-10-51-87-181:9861 for past 0 seconds.
java.io.IOException: com.google.protobuf.ServiceException: java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.ipc.ProtobufHelper.getRemoteException(ProtobufHelper.java:47)
at org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.submitRequest(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:118)
at org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.sendHeartbeat(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:148)
at org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask.call(HeartbeatEndpointTask.java:145)
at org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask.call(HeartbeatEndpointTask.java:76)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.google.protobuf.ServiceException: java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.getReturnMessage(ProtobufRpcEngine.java:293)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:270)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
at com.sun.proxy.$Proxy38.submitRequest(Unknown Source)
at org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.submitRequest(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:116)

On a cluster, one datanode stops reporting to SCM while the cause remains unknown. The datanode process is still running. Logs show a Java heap OOM while serializing a protobuf RPC message; the datanode then silently stops reporting to SCM and goes stale.
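The stack trace above shows the OutOfMemoryError arriving wrapped in ServiceException/IOException, which is why a catch block can swallow it and leave the process stale. As a minimal sketch (not the actual datanode code), one common mitigation is to detect an OOM anywhere in the cause chain and fail fast so a supervisor can restart the process:

```java
// Minimal sketch (not the actual datanode code) of failing fast on heap
// exhaustion instead of letting the process linger in a stale state.
public class FailFastOnOom {

  // An OOM raised during RPC serialization arrives wrapped in
  // ServiceException/IOException (as in the stack trace above), so an
  // instanceof check on the top-level exception is not enough: walk the
  // cause chain.
  public static boolean causedByOom(Throwable t) {
    for (Throwable c = t; c != null; c = c.getCause()) {
      if (c instanceof OutOfMemoryError) {
        return true;
      }
    }
    return false;
  }

  // Install a last-resort handler that halts the JVM on any OutOfMemoryError.
  // Runtime.halt skips shutdown hooks, which may themselves need memory.
  public static void install() {
    Thread.setDefaultUncaughtExceptionHandler((thread, err) -> {
      if (causedByOom(err)) {
        Runtime.getRuntime().halt(1);
      }
    });
  }
}
```

Alternatively, the JVM flags -XX:+ExitOnOutOfMemoryError or -XX:+CrashOnOutOfMemoryError (available in modern HotSpot JVMs) achieve a similar fail-fast behavior without code changes, though they do not help when the OOM is caught and wrapped before propagating.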
[jira] [Assigned] (HDDS-3556) Refactor configuration in SCMRatisServer to Java-based configuration
[ https://issues.apache.org/jira/browse/HDDS-3556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng reassigned HDDS-3556: -- Assignee: Li Cheng > Refactor configuration in SCMRatisServer to Java-based configuration > > > Key: HDDS-3556 > URL: https://issues.apache.org/jira/browse/HDDS-3556 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Affects Versions: 0.5.0 >Reporter: Li Cheng >Assignee: Li Cheng >Priority: Major > > [https://cwiki.apache.org/confluence/display/HADOOP/Java-based+configuration+API]
[jira] [Created] (HDDS-3556) Refactor configuration in SCMRatisServer to Java-based configuration
Li Cheng created HDDS-3556: -- Summary: Refactor configuration in SCMRatisServer to Java-based configuration Key: HDDS-3556 URL: https://issues.apache.org/jira/browse/HDDS-3556 Project: Hadoop Distributed Data Store Issue Type: Sub-task Components: SCM Affects Versions: 0.5.0 Reporter: Li Cheng [https://cwiki.apache.org/confluence/display/HADOOP/Java-based+configuration+API]
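The Java-based configuration API linked above replaces string-keyed lookups with annotated, typed configuration classes. The sketch below uses simplified stand-in annotations (the real ones live in org.apache.hadoop.hdds.conf and carry more attributes such as type and tags), and the keys and defaults shown are hypothetical, not SCMRatisServer's actual settings:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Simplified stand-ins for the annotations of the Java-based configuration
// API; the real @ConfigGroup/@Config live in org.apache.hadoop.hdds.conf.
@Retention(RetentionPolicy.RUNTIME)
@interface ConfigGroup { String prefix(); }

@Retention(RetentionPolicy.RUNTIME)
@interface Config { String key(); String defaultValue(); String description(); }

// Hypothetical SCMRatisServer settings expressed as one typed config class.
// The effective key is prefix + "." + key, e.g. "ozone.scm.ratis.rpc.type".
@ConfigGroup(prefix = "ozone.scm.ratis")
public class ScmRatisServerConfig {

  @Config(key = "rpc.type", defaultValue = "GRPC",
      description = "RPC type used by the SCM Ratis server.")
  private String rpcType = "GRPC";

  @Config(key = "segment.size", defaultValue = "16KB",
      description = "Size of a single Raft log segment.")
  private String segmentSize = "16KB";

  public String getRpcType() { return rpcType; }

  public String getSegmentSize() { return segmentSize; }
}
```

The framework can then discover defaults and descriptions by reflection instead of duplicating them in ozone-default.xml and the reading code.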
[jira] [Assigned] (HDDS-3186) Client requests to SCM RatisServer
[ https://issues.apache.org/jira/browse/HDDS-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng reassigned HDDS-3186: -- Assignee: Nanda kumar (was: Li Cheng) > Client requests to SCM RatisServer > -- > > Key: HDDS-3186 > URL: https://issues.apache.org/jira/browse/HDDS-3186 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Reporter: Li Cheng >Assignee: Nanda kumar >Priority: Major > > Refactor requests to be handled by SCM RatisServer
[jira] [Created] (HDDS-3499) Address compatibility issue by SCM DB instances change
Li Cheng created HDDS-3499: -- Summary: Address compatibility issue by SCM DB instances change Key: HDDS-3499 URL: https://issues.apache.org/jira/browse/HDDS-3499 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Li Cheng After https://issues.apache.org/jira/browse/HDDS-3172, SCM now has one single RocksDB instance instead of multiple DB instances. For running Ozone clusters, we need to address compatibility issues. One possible approach is a standalone migration tool that moves the old metadata from the multiple DBs into the current single DB.
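A migration tool of the kind suggested above could map each legacy DB to one column family of the new single DB. The sketch below shows only that shape, with plain maps standing in for the RocksDB instances (the DB names and types are illustrative, not SCM's actual schema); a real tool would iterate each legacy DB with a RocksDB iterator and batch-write into the matching column family:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the migration shape only: plain maps stand in for the legacy
// per-component RocksDB instances and for the column families of the new
// single instance.
public class ScmDbMigrator {

  // Copy every key/value pair from each legacy DB into a column family of
  // the new DB named after that legacy DB, keeping key spaces disjoint.
  public static Map<String, Map<String, String>> migrate(
      Map<String, Map<String, String>> legacyDbs) {
    Map<String, Map<String, String>> columnFamilies = new LinkedHashMap<>();
    for (Map.Entry<String, Map<String, String>> db : legacyDbs.entrySet()) {
      // One column family per legacy DB avoids key collisions between
      // formerly separate key spaces.
      columnFamilies.put(db.getKey(), new LinkedHashMap<>(db.getValue()));
    }
    return columnFamilies;
  }
}
```

Keeping one column family per legacy DB means existing key encodings need no rewriting during migration.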
[jira] [Created] (HDDS-3491) SCM Invoke Handler for Ratis calls
Li Cheng created HDDS-3491: -- Summary: SCM Invoke Handler for Ratis calls Key: HDDS-3491 URL: https://issues.apache.org/jira/browse/HDDS-3491 Project: Hadoop Distributed Data Store Issue Type: Sub-task Reporter: Li Cheng
[jira] [Commented] (HDDS-3186) Client requests to SCM RatisServer
[ https://issues.apache.org/jira/browse/HDDS-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17088481#comment-17088481 ] Li Cheng commented on HDDS-3186: https://docs.google.com/document/d/1NIf7GypgHFvznB_nb1An-vNfZc8BvxBVCoIpkI_JnQs/edit?usp=sharing > Client requests to SCM RatisServer > -- > > Key: HDDS-3186 > URL: https://issues.apache.org/jira/browse/HDDS-3186 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Reporter: Li Cheng >Assignee: Li Cheng >Priority: Major > > Refactor requests to be handled by SCM RatisServer
[jira] [Created] (HDDS-3466) Improve filterViableNodes performance in pipeline creation
Li Cheng created HDDS-3466: -- Summary: Improve filterViableNodes performance in pipeline creation Key: HDDS-3466 URL: https://issues.apache.org/jira/browse/HDDS-3466 Project: Hadoop Distributed Data Store Issue Type: Improvement Components: SCM Affects Versions: 0.5.0 Reporter: Li Cheng Per [~sodonnell]'s investigation, pipeline creation may have a performance issue once the load-sorting algorithm from https://issues.apache.org/jira/browse/HDDS-3139 lands. This task is to track the potential performance bottleneck caused by this sorting operation during pipeline creation in large-scale clusters. I am a little concerned about the expense of forming the list of healthy nodes on large clusters. We have to do quite a lot of work to form a list and then only use 3 nodes from the list. Even the method {{currentPipelineCount()}} needs to do a few map lookups per node to get the current pipeline count. This is the case even before this change. Creating a pipeline on a large cluster would be expensive already, but this change probably makes it worse, due to the sort needed. I know it was me who suggested the sort. I think the code as it is will work OK up to about 1000 nodes, and then the performance will drop off as the number of nodes goes toward 10k.
E.g. here are some benchmarks I created using this test code, which is similar to what we are doing in {{filterViableNodes()}}:

    public List sortingWithMap(BenchmarkState state) {
      return state.otherList.stream()
          .map(o -> new Mock(o, state.rand.nextInt(20)))
          .filter(o -> o.getSize() <= 20)
          .sorted(Comparator.comparingInt(Mock::getSize))
          .map(o -> o.getObject())
          .collect(Collectors.toList());
    }

The OPs per second for various list sizes are:

    Benchmark               (listSize)   Mode  Cnt       Score     Error  Units
    Sorting.sortingWithMap         100  thrpt    3  113948.345 ± 446.426  ops/s
    Sorting.sortingWithMap        1000  thrpt    3    9468.507 ± 894.138  ops/s
    Sorting.sortingWithMap        5000  thrpt    3    1931.612 ± 263.919  ops/s
    Sorting.sortingWithMap       10000  thrpt    3     970.745 ±  25.823  ops/s
    Sorting.sortingWithMap      100000  thrpt    3      87.684 ±  35.438  ops/s

For a 1000 node cluster, with 10 pipelines per node, we would be looking at about 1 second to form all the pipelines. For a 5k node cluster, it would be about 25 seconds. For a 10k node cluster it would be 103 seconds, but even here, that would be at close to 1000 pipelines per second.
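Since the benchmark above sorts the whole filtered list even though only three nodes are used per pipeline, one way to avoid the O(n log n) cost is a bounded heap that keeps just the k least-loaded viable candidates in O(n log k). The sketch below is illustrative only: the Node record and pipelineCount field are stand-ins, not the actual NodeManager API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.stream.Collectors;

public class NodeSelection {

  // Illustrative node record (not the actual NodeManager API): an id plus
  // the node's current pipeline count, used as its load.
  public record Node(String id, int pipelineCount) {}

  // Full-sort approach, mirroring the benchmarked filterViableNodes() shape:
  // O(n log n) even though only k nodes are used.
  public static List<Node> pickBySort(List<Node> nodes, int k, int limit) {
    return nodes.stream()
        .filter(n -> n.pipelineCount() <= limit)
        .sorted(Comparator.comparingInt(Node::pipelineCount))
        .limit(k)
        .collect(Collectors.toList());
  }

  // Bounded-heap alternative: a max-heap of size k retains only the k
  // least-loaded viable nodes, O(n log k) with k = 3 in practice.
  public static List<Node> pickByHeap(List<Node> nodes, int k, int limit) {
    PriorityQueue<Node> heap = new PriorityQueue<>(
        Comparator.comparingInt(Node::pipelineCount).reversed());
    for (Node n : nodes) {
      if (n.pipelineCount() > limit) {
        continue; // not viable
      }
      heap.offer(n);
      if (heap.size() > k) {
        heap.poll(); // evict the most-loaded of the k + 1 candidates
      }
    }
    List<Node> picked = new ArrayList<>(heap);
    picked.sort(Comparator.comparingInt(Node::pipelineCount));
    return picked;
  }
}
```

With k fixed at 3, the log k factor is effectively constant, so selection stays near-linear even toward 10k nodes.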
[jira] [Commented] (HDDS-3186) Client requests to SCM RatisServer
[ https://issues.apache.org/jira/browse/HDDS-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17087524#comment-17087524 ] Li Cheng commented on HDDS-3186: Nanda's doc: [https://docs.google.com/document/d/1YGdROzaWn8RqIjnvMafH0P0hu6M_b_2AQHMLWYVXXb8/edit?invite=CIfyodwC=5e97cf44#] > Client requests to SCM RatisServer > -- > > Key: HDDS-3186 > URL: https://issues.apache.org/jira/browse/HDDS-3186 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Reporter: Li Cheng >Assignee: Li Cheng >Priority: Major > > Refactor requests to be handled by SCM RatisServer
[jira] [Resolved] (HDDS-3187) SCM StateMachine
[ https://issues.apache.org/jira/browse/HDDS-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng resolved HDDS-3187. Release Note: PR is merged Resolution: Fixed > SCM StateMachine > > > Key: HDDS-3187 > URL: https://issues.apache.org/jira/browse/HDDS-3187 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Reporter: Li Cheng >Assignee: Li Cheng >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > SCM needs a StateMachine to manage states. The StateMachine supports > applyTransaction and calls the RatisServer API.
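The applyTransaction flow described in HDDS-3187 can be sketched as below. This is a stand-in sketch only: a real SCMStateMachine would extend Ratis' BaseStateMachine and decode a protobuf request from the Raft log entry, whereas here a committed entry is simply a "key=value" string applied to in-memory state.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Stand-in sketch of a Ratis-style state machine: committed log entries are
// applied to in-memory state, and the result is reported asynchronously,
// mirroring the CompletableFuture contract of Ratis' applyTransaction.
public class ScmStateMachineSketch {

  private final Map<String, String> state = new LinkedHashMap<>();

  // Apply one committed log entry of the form "key=value".
  public CompletableFuture<String> applyTransaction(String entry) {
    int eq = entry.indexOf('=');
    if (eq < 0) {
      return CompletableFuture.failedFuture(
          new IllegalArgumentException("malformed entry: " + entry));
    }
    String key = entry.substring(0, eq);
    state.put(key, entry.substring(eq + 1));
    return CompletableFuture.completedFuture("APPLIED " + key);
  }

  public String get(String key) {
    return state.get(key);
  }
}
```

Because every replica applies the same committed entries in the same order, the in-memory state stays consistent across the SCM Ratis group.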
[jira] [Commented] (HDDS-3329) Ozone cluster expansion: Block deletion mismatch
[ https://issues.apache.org/jira/browse/HDDS-3329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17073654#comment-17073654 ] Li Cheng commented on HDDS-3329: SCM and the datanodes had different network-topology configs; we updated the config on the datanodes and then restarted them. Not sure if it matters. > Ozone cluster expansion: Block deletion mismatch > > > Key: HDDS-3329 > URL: https://issues.apache.org/jira/browse/HDDS-3329 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Affects Versions: 0.4.1 >Reporter: Li Cheng >Assignee: Lokesh Jain >Priority: Major > > SCM logs keep printing this when we expand the Ozone cluster with more datanodes. > > 2020-04-02 19:45:42,745 > [EventQueue-PendingDeleteStatusForPendingDeleteHandler] INFO > org.apache.hadoop.hdds.scm.block.SCMBlockDeletingService: Block deletion > txnID mismatch in datanode 1eacbd89-a835-438e-aa4b-5bc78adb7c8c for > containerID 314. Datanode delete txnID: 0, SCM txnID: 1208 > 2020-04-02 19:45:42,745 > [EventQueue-PendingDeleteStatusForPendingDeleteHandler] INFO > org.apache.hadoop.hdds.scm.block.SCMBlockDeletingService: Block deletion > txnID mismatch in datanode 1eacbd89-a835-438e-aa4b-5bc78adb7c8c for > containerID 351. Datanode delete txnID: 0, SCM txnID: 662 > 2020-04-02 19:45:42,745 > [EventQueue-PendingDeleteStatusForPendingDeleteHandler] INFO > org.apache.hadoop.hdds.scm.block.SCMBlockDeletingService: Block deletion > txnID mismatch in datanode 1eacbd89-a835-438e-aa4b-5bc78adb7c8c for > containerID 352. Datanode delete txnID: 0, SCM txnID: 1085
[jira] [Created] (HDDS-3329) Ozone cluster expansion: Block deletion mismatch
Li Cheng created HDDS-3329: -- Summary: Ozone cluster expansion: Block deletion mismatch Key: HDDS-3329 URL: https://issues.apache.org/jira/browse/HDDS-3329 Project: Hadoop Distributed Data Store Issue Type: Bug Components: SCM Affects Versions: 0.4.1 Reporter: Li Cheng SCM logs keep printing this when we expand the Ozone cluster with more datanodes. 2020-04-02 19:45:42,745 [EventQueue-PendingDeleteStatusForPendingDeleteHandler] INFO org.apache.hadoop.hdds.scm.block.SCMBlockDeletingService: Block deletion txnID mismatch in datanode 1eacbd89-a835-438e-aa4b-5bc78adb7c8c for containerID 314. Datanode delete txnID: 0, SCM txnID: 1208 2020-04-02 19:45:42,745 [EventQueue-PendingDeleteStatusForPendingDeleteHandler] INFO org.apache.hadoop.hdds.scm.block.SCMBlockDeletingService: Block deletion txnID mismatch in datanode 1eacbd89-a835-438e-aa4b-5bc78adb7c8c for containerID 351. Datanode delete txnID: 0, SCM txnID: 662 2020-04-02 19:45:42,745 [EventQueue-PendingDeleteStatusForPendingDeleteHandler] INFO org.apache.hadoop.hdds.scm.block.SCMBlockDeletingService: Block deletion txnID mismatch in datanode 1eacbd89-a835-438e-aa4b-5bc78adb7c8c for containerID 352. Datanode delete txnID: 0, SCM txnID: 1085
[jira] [Assigned] (HDDS-3187) SCM StateMachine
[ https://issues.apache.org/jira/browse/HDDS-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng reassigned HDDS-3187: -- Assignee: Li Cheng > SCM StateMachine > > > Key: HDDS-3187 > URL: https://issues.apache.org/jira/browse/HDDS-3187 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: SCM >Reporter: Li Cheng >Assignee: Li Cheng >Priority: Major > > SCM needs a StateMachine to manage states. The StateMachine supports > applyTransaction and calls the RatisServer API.