Re: Re[4]: Failed to perform cache operation (cache is stopped)

2021-10-17 Thread Akash Shinde
Could someone please help me out here to find out the root cause of
this problem?
This is now happening so frequently.

On Wed, Oct 13, 2021 at 3:56 PM Akash Shinde  wrote:

> Yes, I have set  failureDetectionTimeout  = 6.
> There is no long GC pause.
> Core1 GC report
> <https://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMjEvMTAvMTMvLS1jb3JlXzFfZ2MtMjAyMS0xMC0wNV8wOS0zMS0yNi5sb2cuMC5jdXJyZW50LS05LTU3LTE3=WEB>
> Core 2 GC Report
> <https://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMjEvMTAvMTMvLS1DT1JFXzJfZ2MtMjAyMS0xMC0wNV8wOS00MC0xMC5sb2cuMC0tMTAtMjAtNTQ==WEB>
>
> On Wed, Oct 13, 2021 at 1:16 PM Zhenya Stanilovsky 
> wrote:
>
>>
>> Ok, additionally there is info about node segmentation:
>> Node FAILED: TcpDiscoveryNode [id=fb67a5fd-f1ab-441d-a38e-bab975cd1037,
>> consistentId=0:0:0:0:0:0:0:1%lo,XX.XX.XX.XX,127.0.0.1:47500,
>> addrs=ArrayList [0:0:0:0:0:0:0:1%lo, XX.XX.XX.XX, 127.0.0.1],
>> sockAddrs=HashSet [qagmscore02.xyz.com/XX.XX.XX.XX:47500,
>> /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=25,
>> intOrder=16, lastExchangeTime=1633426750418, loc=false,
>> ver=2.10.0#20210310-sha1:bc24f6ba, isClient=false]
>>
>> Local node SEGMENTED: TcpDiscoveryNode
>> [id=7f357ca2-0ae2-4af0-bfa4-d18e7bcb3797
>>
>> Possible too long JVM pause: 1052 milliseconds.
>>
>> Have you changed the default networking timeout settings? If not, try to
>> recheck the failureDetectionTimeout setting.
>> If you have a GC pause longer than 10 seconds, the node will be dropped from
>> the cluster (by default).
>>
>>
>> This is the code of AgmsCacheJdbcStoreSessionListener.java.
>> The null pointer occurs because the "dataSource" bean is not found.
>> The cluster was working fine; what could make the datasource bean
>> unavailable in a running cluster?
>>
>>
>>
>> public class AgmsCacheJdbcStoreSessionListener extends CacheJdbcStoreSessionListener {
>>
>>   @SpringApplicationContextResource
>>   public void setupDataSourceFromSpringContext(Object appCtx) {
>>     ApplicationContext appContext = (ApplicationContext) appCtx;
>>     setDataSource((DataSource) appContext.getBean("dataSource"));
>>   }
>> }
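>>
>> [Editorial note] A guard on the injected context would turn the opaque NullPointerException reported at AgmsCacheJdbcStoreSessionListener.java:14 into an actionable error. The sketch below is plain Java with a hypothetical `BeanLookup` stand-in for Spring's `ApplicationContext` (so it runs without Spring or Ignite on the classpath); it illustrates the defensive pattern only, not the project's actual listener.

```java
import java.util.HashMap;
import java.util.Map;

public class ListenerGuardSketch {
    // Hypothetical stand-in for ApplicationContext#getBean(String).
    public interface BeanLookup { Object getBean(String name); }

    // Fails fast with a clear message instead of an NPE when the context
    // was never injected or the bean is missing.
    public static Object resolveDataSource(BeanLookup ctx) {
        if (ctx == null)
            throw new IllegalStateException(
                "Spring context was not injected; cannot resolve 'dataSource'");
        Object ds = ctx.getBean("dataSource");
        if (ds == null)
            throw new IllegalStateException("'dataSource' bean not found");
        return ds;
    }

    public static void main(String[] args) {
        Map<String, Object> beans = new HashMap<>();
        beans.put("dataSource", new Object());
        System.out.println(resolveDataSource(beans::get) != null); // true

        try {
            resolveDataSource(null); // simulates the failure seen in the logs
        } catch (IllegalStateException e) {
            System.out.println("guarded: " + e.getMessage());
        }
    }
}
```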
>>
>>
>>
>> I can see one log line that tells us about a problem on the network side.
>> Is this the possible reason?
>>
>> 2021-10-07 16:28:22,889 197776202 [tcp-disco-msg-worker-[fb67a5fd
>> XX.XX.XX.XX:47500 crd]-#2%springDataNode%-#69%springDataNode%] WARN
>>  o.a.i.s.d.tcp.TcpDiscoverySpi - Node is out of topology (probably, due to
>> short-time network problems).
>>
>>
>>
>>
>> On Mon, Oct 11, 2021 at 7:15 PM stanilovsky evgeny <
>> estanilovs...@gridgain.com> wrote:
>>
>> may be this ?
>>
>> Caused by: java.lang.NullPointerException: null
>> at
>> com.xyz.agms.grid.cache.loader.AgmsCacheJdbcStoreSessionListener.setupDataSourceFromSpringContext(AgmsCacheJdbcStoreSessionListener.java:14)
>> ... 23 common frames omitted
>>
>>
>>
>> Hi Zhenya,
>> CacheStoppedException occurred again on our Ignite cluster. I have
>> captured logs with IGNITE_QUIET = false.
>> There are four core nodes in the cluster and two nodes have gone down. I am
>> attaching the logs for the two failed nodes.
>> Please let me know if you need any further details.
>>
>> Thanks,
>> Akash
>>
>> On Tue, Sep 7, 2021 at 12:19 PM Zhenya Stanilovsky wrote:
>>
>> Please share these logs somehow; if you have no idea how to share them, you
>> can send them directly to arzamas...@mail.ru
>> 
>>
>>
>>
>> Meanwhile I will grep the logs at the next occurrence of the cache stopped
>> exception. Can someone highlight whether there is any known bug related to this?
>> I want to check the possible reason for this cache stopped exception.
>>
>> On Mon, Sep 6, 2021 at 6:27 PM Akash Shinde wrote:
>>
>> Hi Zhenya,
>> Thanks for the quick response.
>> I believe you are talking about Ignite instances. There is a
>> single Ignite instance in use in the application.
>> I also want to point out that I am not using the destroyCache() method
>> anywhere in the application.
>>
>> I will set IGNITE_QUIET = false and try to grep the required logs.
>> This issue occurs at random and there is no way to reproduce it.
>>
>> Thanks,
>> Akash
>>
>>
>>
>> On Mon, Sep 6, 2021 at 5:33 PM Zhenya Stanilovsky wrote:

Re: Re[4]: Failed to perform cache operation (cache is stopped)

2021-10-13 Thread Akash Shinde
Yes, I have set  failureDetectionTimeout  = 6.
There is no long GC pause.
Core1 GC report
<https://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMjEvMTAvMTMvLS1jb3JlXzFfZ2MtMjAyMS0xMC0wNV8wOS0zMS0yNi5sb2cuMC5jdXJyZW50LS05LTU3LTE3=WEB>
Core 2 GC Report
<https://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMjEvMTAvMTMvLS1DT1JFXzJfZ2MtMjAyMS0xMC0wNV8wOS00MC0xMC5sb2cuMC0tMTAtMjAtNTQ==WEB>

On Wed, Oct 13, 2021 at 1:16 PM Zhenya Stanilovsky 
wrote:

>
> Ok, additionally there is info about node segmentation:
> Node FAILED: TcpDiscoveryNode [id=fb67a5fd-f1ab-441d-a38e-bab975cd1037,
> consistentId=0:0:0:0:0:0:0:1%lo,XX.XX.XX.XX,127.0.0.1:47500,
> addrs=ArrayList [0:0:0:0:0:0:0:1%lo, XX.XX.XX.XX, 127.0.0.1],
> sockAddrs=HashSet [qagmscore02.xyz.com/XX.XX.XX.XX:47500,
> /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=25,
> intOrder=16, lastExchangeTime=1633426750418, loc=false,
> ver=2.10.0#20210310-sha1:bc24f6ba, isClient=false]
>
> Local node SEGMENTED: TcpDiscoveryNode
> [id=7f357ca2-0ae2-4af0-bfa4-d18e7bcb3797
>
> Possible too long JVM pause: 1052 milliseconds.
>
> Have you changed the default networking timeout settings? If not, try to
> recheck the failureDetectionTimeout setting.
> If you have a GC pause longer than 10 seconds, the node will be dropped from
> the cluster (by default).
>
>
> This is the code of AgmsCacheJdbcStoreSessionListener.java.
> The null pointer occurs because the "dataSource" bean is not found.
> The cluster was working fine; what could make the datasource bean
> unavailable in a running cluster?
>
>
>
> public class AgmsCacheJdbcStoreSessionListener extends CacheJdbcStoreSessionListener {
>
>   @SpringApplicationContextResource
>   public void setupDataSourceFromSpringContext(Object appCtx) {
>     ApplicationContext appContext = (ApplicationContext) appCtx;
>     setDataSource((DataSource) appContext.getBean("dataSource"));
>   }
> }
>
>
>
> I can see one log line that tells us about a problem on the network side.
> Is this the possible reason?
>
> 2021-10-07 16:28:22,889 197776202 [tcp-disco-msg-worker-[fb67a5fd
> XX.XX.XX.XX:47500 crd]-#2%springDataNode%-#69%springDataNode%] WARN
>  o.a.i.s.d.tcp.TcpDiscoverySpi - Node is out of topology (probably, due to
> short-time network problems).
>
>
>
>
> On Mon, Oct 11, 2021 at 7:15 PM stanilovsky evgeny <
> estanilovs...@gridgain.com> wrote:
>
> may be this ?
>
> Caused by: java.lang.NullPointerException: null
> at
> com.xyz.agms.grid.cache.loader.AgmsCacheJdbcStoreSessionListener.setupDataSourceFromSpringContext(AgmsCacheJdbcStoreSessionListener.java:14)
> ... 23 common frames omitted
>
>
>
> Hi Zhenya,
> CacheStoppedException occurred again on our Ignite cluster. I have
> captured logs with IGNITE_QUIET = false.
> There are four core nodes in the cluster and two nodes have gone down. I am
> attaching the logs for the two failed nodes.
> Please let me know if you need any further details.
>
> Thanks,
> Akash
>
> On Tue, Sep 7, 2021 at 12:19 PM Zhenya Stanilovsky wrote:
>
> Please share these logs somehow; if you have no idea how to share them, you
> can send them directly to arzamas...@mail.ru
> 
>
>
>
> Meanwhile I will grep the logs at the next occurrence of the cache stopped
> exception. Can someone highlight whether there is any known bug related to this?
> I want to check the possible reason for this cache stopped exception.
>
> On Mon, Sep 6, 2021 at 6:27 PM Akash Shinde wrote:
>
> Hi Zhenya,
> Thanks for the quick response.
> I believe you are talking about Ignite instances. There is a
> single Ignite instance in use in the application.
> I also want to point out that I am not using the destroyCache() method
> anywhere in the application.
>
> I will set IGNITE_QUIET = false and try to grep the required logs.
> This issue occurs at random and there is no way to reproduce it.
>
> Thanks,
> Akash
>
>
>
> On Mon, Sep 6, 2021 at 5:33 PM Zhenya Stanilovsky wrote:
>
> Hi, Akash
> You can run into such a case, for example, when you have several instances:
> inst1:
> cache = inst1.getOrCreateCache("cache1");
>
> then, after inst2's destroy call:
>
> cache._some_method_call_
>
> inst2:
>  inst2.destroyCache("cache1");
>
> In short: you are still using a cache instance that has already been destroyed. You can
> simply grep your logs and find the time when the cache was stopped.
> You probably need to set IGNITE_QUIET = false.
> [1] https://ignite.apache.org/docs/latest/logging
>
>
>
>
>
>
> Hi,

Re: Failed to perform cache operation (cache is stopped)

2021-09-07 Thread Akash Shinde
Meanwhile I will grep the logs at the next occurrence of the cache stopped
exception. Can someone highlight whether there is any known bug related to this?
I want to check the possible reason for this cache stopped exception.

On Mon, Sep 6, 2021 at 6:27 PM Akash Shinde  wrote:

> Hi Zhenya,
> Thanks for the quick response.
> I believe you are talking about Ignite instances. There is a
> single Ignite instance in use in the application.
> I also want to point out that I am not using the destroyCache() method
> anywhere in the application.
>
> I will set IGNITE_QUIET = false and try to grep the required logs.
> This issue occurs at random and there is no way to reproduce it.
>
> Thanks,
> Akash
>
>
>
> On Mon, Sep 6, 2021 at 5:33 PM Zhenya Stanilovsky 
> wrote:
>
>> Hi, Akash
>> You can run into such a case, for example, when you have several instances:
>> inst1:
>> cache = inst1.getOrCreateCache("cache1");
>>
>> then, after inst2's destroy call:
>>
>> cache._some_method_call_
>>
>> inst2:
>>  inst2.destroyCache("cache1");
>>
>> In short: you are still using a cache instance that has already been destroyed. You can
>> simply grep your logs and find the time when the cache was stopped.
>> You probably need to set IGNITE_QUIET = false.
>> [1] https://ignite.apache.org/docs/latest/logging
>>
>>
>>
>>
>>
>>
>> Hi,
>> I have four server nodes and six client nodes in the Ignite cluster. I am
>> using Ignite version 2.10.
>> Some operations are failing due to a CacheStoppedException on
>> the server nodes. This has become a blocker issue.
>> Could someone please help me resolve this issue?
>>
>> *Cache Configuration*
>>
>> CacheConfiguration subscriptionCacheCfg = new 
>> CacheConfiguration<>(CacheName.SUBSCRIPTION_CACHE.name());
>> subscriptionCacheCfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
>> subscriptionCacheCfg.setWriteThrough(false);
>> subscriptionCacheCfg.setReadThrough(true);
>> subscriptionCacheCfg.setRebalanceMode(CacheRebalanceMode.ASYNC);
>> subscriptionCacheCfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
>> subscriptionCacheCfg.setBackups(2);
>> Factory storeFactory = 
>> FactoryBuilder.factoryOf(SubscriptionDataLoader.class);
>> subscriptionCacheCfg.setCacheStoreFactory(storeFactory);
>> subscriptionCacheCfg.setIndexedTypes(DefaultDataKey.class, 
>> SubscriptionData.class);
>> subscriptionCacheCfg.setSqlIndexMaxInlineSize(47);
>> RendezvousAffinityFunction affinityFunction = new 
>> RendezvousAffinityFunction();
>> affinityFunction.setExcludeNeighbors(true);
>> subscriptionCacheCfg.setAffinity(affinityFunction);
>> subscriptionCacheCfg.setStatisticsEnabled(true);
>> subscriptionCacheCfg.setPartitionLossPolicy(PartitionLossPolicy.READ_WRITE_SAFE);
>>
>>
>> *Exception stack trace*
>>
>> ERROR c.q.dgms.kafka.TaskRequestListener - Error occurred while consuming
>> the object
>> com.baidu.unbiz.fluentvalidator.exception.RuntimeValidateException:
>> java.lang.IllegalStateException: class
>> org.apache.ignite.internal.processors.cache.CacheStoppedException: Failed
>> to perform cache operation (cache is stopped): SUBSCRIPTION_CACHE
>> at
>> com.baidu.unbiz.fluentvalidator.FluentValidator.doValidate(FluentValidator.java:506)
>> at
>> com.baidu.unbiz.fluentvalidator.FluentValidator.doValidate(FluentValidator.java:461)
>> at
>> com.xyz.dgms.service.UserManagementServiceImpl.deleteUser(UserManagementServiceImpl.java:710)
>> at
>> com.xyz.dgms.kafka.TaskRequestListener.processRequest(TaskRequestListener.java:190)
>> at
>> com.xyz.dgms.kafka.TaskRequestListener.process(TaskRequestListener.java:89)
>> at
>> com.xyz.libraries.mom.kafka.consumer.TopicConsumer.lambda$run$3(TopicConsumer.java:162)
>> at net.jodah.failsafe.Functions$12.call(Functions.java:274)
>> at net.jodah.failsafe.SyncFailsafe.call(SyncFailsafe.java:145)
>> at net.jodah.failsafe.SyncFailsafe.run(SyncFailsafe.java:93)
>> at
>> com.xyz.libraries.mom.kafka.consumer.TopicConsumer.run(TopicConsumer.java:159)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>> at java.lang.Thread.run(Thread.java:748)
>> Caused by: java.lang.IllegalStateException: class
>> org.apache.ignite.internal.processors.cache.CacheStoppedException: Failed
>> to perform cache operation (cache is stopped): SUBSCRIPTION_CACHE
>> at
>> org

Re: Failed to perform cache operation (cache is stopped)

2021-09-06 Thread Akash Shinde
Hi Zhenya,
Thanks for the quick response.
I believe you are talking about Ignite instances. There is a
single Ignite instance in use in the application.
I also want to point out that I am not using the destroyCache() method
anywhere in the application.

I will set IGNITE_QUIET = false and try to grep the required logs.
This issue occurs at random and there is no way to reproduce it.

Thanks,
Akash



On Mon, Sep 6, 2021 at 5:33 PM Zhenya Stanilovsky 
wrote:

> Hi, Akash
> You can run into such a case, for example, when you have several instances:
> inst1:
> cache = inst1.getOrCreateCache("cache1");
>
> then, after inst2's destroy call:
>
> cache._some_method_call_
>
> inst2:
>  inst2.destroyCache("cache1");
>
> In short: you are still using a cache instance that has already been destroyed. You can
> simply grep your logs and find the time when the cache was stopped.
> You probably need to set IGNITE_QUIET = false.
> [1] https://ignite.apache.org/docs/latest/logging
>
>
>
>
>
>
> Hi,
> I have four server nodes and six client nodes in the Ignite cluster. I am
> using Ignite version 2.10.
> Some operations are failing due to a CacheStoppedException on
> the server nodes. This has become a blocker issue.
> Could someone please help me resolve this issue?
>
> *Cache Configuration*
>
> CacheConfiguration subscriptionCacheCfg = new 
> CacheConfiguration<>(CacheName.SUBSCRIPTION_CACHE.name());
> subscriptionCacheCfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
> subscriptionCacheCfg.setWriteThrough(false);
> subscriptionCacheCfg.setReadThrough(true);
> subscriptionCacheCfg.setRebalanceMode(CacheRebalanceMode.ASYNC);
> subscriptionCacheCfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
> subscriptionCacheCfg.setBackups(2);
> Factory storeFactory = 
> FactoryBuilder.factoryOf(SubscriptionDataLoader.class);
> subscriptionCacheCfg.setCacheStoreFactory(storeFactory);
> subscriptionCacheCfg.setIndexedTypes(DefaultDataKey.class, 
> SubscriptionData.class);
> subscriptionCacheCfg.setSqlIndexMaxInlineSize(47);
> RendezvousAffinityFunction affinityFunction = new 
> RendezvousAffinityFunction();
> affinityFunction.setExcludeNeighbors(true);
> subscriptionCacheCfg.setAffinity(affinityFunction);
> subscriptionCacheCfg.setStatisticsEnabled(true);
> subscriptionCacheCfg.setPartitionLossPolicy(PartitionLossPolicy.READ_WRITE_SAFE);
>
>
> *Exception stack trace*
>
> ERROR c.q.dgms.kafka.TaskRequestListener - Error occurred while consuming
> the object
> com.baidu.unbiz.fluentvalidator.exception.RuntimeValidateException:
> java.lang.IllegalStateException: class
> org.apache.ignite.internal.processors.cache.CacheStoppedException: Failed
> to perform cache operation (cache is stopped): SUBSCRIPTION_CACHE
> at
> com.baidu.unbiz.fluentvalidator.FluentValidator.doValidate(FluentValidator.java:506)
> at
> com.baidu.unbiz.fluentvalidator.FluentValidator.doValidate(FluentValidator.java:461)
> at
> com.xyz.dgms.service.UserManagementServiceImpl.deleteUser(UserManagementServiceImpl.java:710)
> at
> com.xyz.dgms.kafka.TaskRequestListener.processRequest(TaskRequestListener.java:190)
> at
> com.xyz.dgms.kafka.TaskRequestListener.process(TaskRequestListener.java:89)
> at
> com.xyz.libraries.mom.kafka.consumer.TopicConsumer.lambda$run$3(TopicConsumer.java:162)
> at net.jodah.failsafe.Functions$12.call(Functions.java:274)
> at net.jodah.failsafe.SyncFailsafe.call(SyncFailsafe.java:145)
> at net.jodah.failsafe.SyncFailsafe.run(SyncFailsafe.java:93)
> at
> com.xyz.libraries.mom.kafka.consumer.TopicConsumer.run(TopicConsumer.java:159)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.IllegalStateException: class
> org.apache.ignite.internal.processors.cache.CacheStoppedException: Failed
> to perform cache operation (cache is stopped): SUBSCRIPTION_CACHE
> at
> org.apache.ignite.internal.processors.cache.GridCacheGateway.enter(GridCacheGateway.java:166)
> at
> org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.onEnter(GatewayProtectedCacheProxy.java:1625)
> at
> org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.get(GatewayProtectedCacheProxy.java:673)
> at
> com.xyz.dgms.grid.dao.AbstractDataGridDAO.getData(AbstractDataGridDAO.java:39)
> at
> com.xyz.dgms.grid.dao.AbstractDataGridDAO.getData(AbstractDataGridDAO.java:28)
> at
> com.xyz.dgms.grid.dataservice.DefaultDataGridService.getData(DefaultDataGridService.java:22)
> at
> com.xyz.dgms.grid.dataservice.DefaultDataGridService.getData(DefaultDataGridService.java:10)
> at
> com.xyz.dgms.validators.common.validators.UserDataValidator.validateSubscription(UserDataValidator.java:226)
> at
> com.xyz.dgms.validators.common.validators.UserDataValidator.validateRequest(UserDataValidator.java:124)
> at
> 

Failed to perform cache operation (cache is stopped)

2021-09-06 Thread Akash Shinde
Hi,
I have four server nodes and six client nodes in the Ignite cluster. I am
using Ignite version 2.10.
Some operations are failing due to a CacheStoppedException on
the server nodes. This has become a blocker issue.
Could someone please help me resolve this issue?

*Cache Configuration*

CacheConfiguration subscriptionCacheCfg =
    new CacheConfiguration<>(CacheName.SUBSCRIPTION_CACHE.name());
subscriptionCacheCfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
subscriptionCacheCfg.setWriteThrough(false);
subscriptionCacheCfg.setReadThrough(true);
subscriptionCacheCfg.setRebalanceMode(CacheRebalanceMode.ASYNC);
subscriptionCacheCfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
subscriptionCacheCfg.setBackups(2);
Factory storeFactory = FactoryBuilder.factoryOf(SubscriptionDataLoader.class);
subscriptionCacheCfg.setCacheStoreFactory(storeFactory);
subscriptionCacheCfg.setIndexedTypes(DefaultDataKey.class, SubscriptionData.class);
subscriptionCacheCfg.setSqlIndexMaxInlineSize(47);
RendezvousAffinityFunction affinityFunction = new RendezvousAffinityFunction();
affinityFunction.setExcludeNeighbors(true);
subscriptionCacheCfg.setAffinity(affinityFunction);
subscriptionCacheCfg.setStatisticsEnabled(true);
subscriptionCacheCfg.setPartitionLossPolicy(PartitionLossPolicy.READ_WRITE_SAFE);


*Exception stack trace*

ERROR c.q.dgms.kafka.TaskRequestListener - Error occurred while consuming
the object
com.baidu.unbiz.fluentvalidator.exception.RuntimeValidateException:
java.lang.IllegalStateException: class
org.apache.ignite.internal.processors.cache.CacheStoppedException: Failed
to perform cache operation (cache is stopped): SUBSCRIPTION_CACHE
at
com.baidu.unbiz.fluentvalidator.FluentValidator.doValidate(FluentValidator.java:506)
at
com.baidu.unbiz.fluentvalidator.FluentValidator.doValidate(FluentValidator.java:461)
at
com.xyz.dgms.service.UserManagementServiceImpl.deleteUser(UserManagementServiceImpl.java:710)
at
com.xyz.dgms.kafka.TaskRequestListener.processRequest(TaskRequestListener.java:190)
at
com.xyz.dgms.kafka.TaskRequestListener.process(TaskRequestListener.java:89)
at
com.xyz.libraries.mom.kafka.consumer.TopicConsumer.lambda$run$3(TopicConsumer.java:162)
at net.jodah.failsafe.Functions$12.call(Functions.java:274)
at net.jodah.failsafe.SyncFailsafe.call(SyncFailsafe.java:145)
at net.jodah.failsafe.SyncFailsafe.run(SyncFailsafe.java:93)
at
com.xyz.libraries.mom.kafka.consumer.TopicConsumer.run(TopicConsumer.java:159)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: class
org.apache.ignite.internal.processors.cache.CacheStoppedException: Failed
to perform cache operation (cache is stopped): SUBSCRIPTION_CACHE
at
org.apache.ignite.internal.processors.cache.GridCacheGateway.enter(GridCacheGateway.java:166)
at
org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.onEnter(GatewayProtectedCacheProxy.java:1625)
at
org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.get(GatewayProtectedCacheProxy.java:673)
at
com.xyz.dgms.grid.dao.AbstractDataGridDAO.getData(AbstractDataGridDAO.java:39)
at
com.xyz.dgms.grid.dao.AbstractDataGridDAO.getData(AbstractDataGridDAO.java:28)
at
com.xyz.dgms.grid.dataservice.DefaultDataGridService.getData(DefaultDataGridService.java:22)
at
com.xyz.dgms.grid.dataservice.DefaultDataGridService.getData(DefaultDataGridService.java:10)
at
com.xyz.dgms.validators.common.validators.UserDataValidator.validateSubscription(UserDataValidator.java:226)
at
com.xyz.dgms.validators.common.validators.UserDataValidator.validateRequest(UserDataValidator.java:124)
at
com.xyz.dgms.validators.common.validators.UserDataValidator.validate(UserDataValidator.java:346)
at
com.xyz.dgms.validators.common.validators.UserDataValidator.validate(UserDataValidator.java:41)
at
com.baidu.unbiz.fluentvalidator.FluentValidator.doValidate(FluentValidator.java:490)
... 12 common frames omitted
Caused by:
org.apache.ignite.internal.processors.cache.CacheStoppedException: Failed
to perform cache operation (cache is stopped): SUBSCRIPTION_CACHE
... 24 common frames omitted


Thanks,
Akash
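
[Editorial note] The failure mode in the stack trace above can be modeled in miniature: a cache proxy obtained earlier keeps working until the cache stops, after which every operation on that same handle throws. This is a plain-Java sketch only; the `MiniCache` class is invented for illustration and is not Ignite's GridCacheGateway.

```java
public class StoppedCacheSketch {
    // Invented stand-in for an Ignite cache proxy; the stopped flag mimics
    // the check that raises CacheStoppedException on entry to an operation.
    public static class MiniCache {
        private final java.util.Map<String, String> data =
            new java.util.concurrent.ConcurrentHashMap<>();
        private volatile boolean stopped;

        public void put(String k, String v) { ensureActive(); data.put(k, v); }
        public String get(String k) { ensureActive(); return data.get(k); }
        public void stop() { stopped = true; }

        private void ensureActive() {
            if (stopped)
                throw new IllegalStateException(
                    "Failed to perform cache operation (cache is stopped)");
        }
    }

    public static void main(String[] args) {
        MiniCache proxy = new MiniCache(); // handle held by application code
        proxy.put("k", "v");
        System.out.println(proxy.get("k")); // v
        proxy.stop(); // cache destroyed or node stopping elsewhere
        try {
            proxy.get("k"); // the same handle now fails on every operation
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In the real cluster the equivalent trigger is a destroyCache() call, or the node segmenting and stopping its caches, while the application still holds the old proxy.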


Re: Failing client node due to not receiving metrics updates-IGNITE-10354

2021-06-10 Thread Akash Shinde
Hi Zhenya, Thanks for the quick response.
I am checking with the network team as well for any network glitch
occurrence.

Thanks,
Akash

On Thu, Jun 10, 2021 at 1:06 PM Zhenya Stanilovsky 
wrote:

> Hello Akash !
> I found that fix mentioned by you is ok.
> Why do you think that your network between server and client is ok ?
> Can you add some network monitoring here ?
> thanks.
>
>
>
>
>
>
> Hi, there is a cluster of four server nodes and six client nodes in
> production. I was using Ignite version 2.6.0 and all six client nodes were
> failing with the below error:
>
> WARN o.a.i.s.d.tcp.TcpDiscoverySpi - Failing client node due to not
> receiving metrics updates from client node within
> 'IgniteConfiguration.clientFailureDetectionTimeout' (consider increasing
> configuration property) [timeout=9, node=TcpDiscoveryNode
> [id=12f9809d-95be-47e3-81fe-d7ffcaab064c,
> consistentId=12f9809d-95be-47e3-81fe-d7ffcaab064c, addrs=ArrayList
> [0:0:0:0:0:0:0:1%lo, 127.0.0.1, ], sockAddrs=HashSet
> [/0:0:0:0:0:0:0:1%lo:0, /127.0.0.1:0, /:0],
> discPort=0, order=155, intOrder=82, lastExchangeTime=1623154238808,
> loc=false, ver=2.10.0#20210310-sha1:bc24f6ba, isClient=true]]
>
> Then I upgraded the Ignite version to 2.10.0 to get the fix for the known
> issue IGNITE-10354.
> But I am still facing the issue even after upgrading to the 2.10.0
> Ignite version.
>
> Could someone help here.
>
> Thanks,
> Akash
>
>
>
>
>
>
>
>


Failing client node due to not receiving metrics updates-IGNITE-10354

2021-06-09 Thread Akash Shinde
Hi, there is a cluster of four server nodes and six client nodes in
production. I was using Ignite version 2.6.0 and all six client nodes were
failing with the below error:

WARN o.a.i.s.d.tcp.TcpDiscoverySpi - Failing client node due to not
receiving metrics updates from client node within
'IgniteConfiguration.clientFailureDetectionTimeout' (consider increasing
configuration property) [timeout=9, node=TcpDiscoveryNode
[id=12f9809d-95be-47e3-81fe-d7ffcaab064c,
consistentId=12f9809d-95be-47e3-81fe-d7ffcaab064c, addrs=ArrayList
[0:0:0:0:0:0:0:1%lo, 127.0.0.1, ], sockAddrs=HashSet
[/0:0:0:0:0:0:0:1%lo:0, /127.0.0.1:0, /:0],
discPort=0, order=155, intOrder=82, lastExchangeTime=1623154238808,
loc=false, ver=2.10.0#20210310-sha1:bc24f6ba, isClient=true]]

Then I upgraded the Ignite version to 2.10.0 to get the fix for the known
issue IGNITE-10354.
But I am still facing the issue even after upgrading to the 2.10.0
Ignite version.

Could someone help here.

Thanks,
Akash
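
[Editorial note] The warning's own suggestion is to raise `IgniteConfiguration.clientFailureDetectionTimeout`. A minimal configuration sketch follows; the timeout values are illustrative only (the stated defaults are 30 000 ms for clients and 10 000 ms for servers in Ignite 2.x), and it requires ignite-core on the classpath.

```java
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class ClientTimeoutConfig {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();
        // Client nodes: raise if clients pause (GC) or the network is flaky.
        cfg.setClientFailureDetectionTimeout(60_000);
        // Server-to-server failure detection.
        cfg.setFailureDetectionTimeout(30_000);
        Ignition.start(cfg);
    }
}
```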


Re: Countdown latch issue with 2.6.0

2020-06-08 Thread Akash Shinde
Hi, I have created a Jira issue for this:
https://issues.apache.org/jira/browse/IGNITE-13132

Thanks,
Akash

On Sun, Jun 7, 2020 at 9:29 AM Akash Shinde  wrote:

> Can someone please help me with this issue.
>
> On Sat, Jun 6, 2020 at 6:45 PM Akash Shinde  wrote:
>
>> Hi,
>> Issue: the countdown latch gets reinitialized to its original value (4) when one
>> or more (but not all) nodes go down. (Partition loss happened.)
>>
>> We are using Ignite's distributed countdown latch to make sure that cache
>> loading is completed on all server nodes. We do this to make sure that our
>> Kafka consumers start only after cache loading is complete on all server
>> nodes. This is the basic criterion which needs to be fulfilled before actual
>> processing starts.
>>
>> We have 4 server nodes and the countdown latch is initialized to 4. We use
>> the "cache.loadCache" method to start the cache loading. When each server
>> completes cache loading it reduces the count by 1 using the countDown method.
>> So when all the nodes complete cache loading, the count reaches zero.
>> When this count reaches zero we start Kafka consumers on all server
>> nodes.
>>
>> But we saw weird behavior in the prod env. Three server nodes were shut down
>> at the same time, but one node was still alive. When this happened the
>> countdown was reinitialized to its original value, i.e. 4. But I am not able
>> to reproduce this in the dev env.
>>
>> Is this a bug: when one or more (but not all) nodes go down, does the count
>> reinitialize back to its original value?
>>
>> Thanks,
>> Akash
>>
>


Re: Countdown latch issue with 2.6.0

2020-06-06 Thread Akash Shinde
Can someone please help me with this issue.

On Sat, Jun 6, 2020 at 6:45 PM Akash Shinde  wrote:

> Hi,
> Issue: the countdown latch gets reinitialized to its original value (4) when one
> or more (but not all) nodes go down. (Partition loss happened.)
>
> We are using Ignite's distributed countdown latch to make sure that cache
> loading is completed on all server nodes. We do this to make sure that our
> Kafka consumers start only after cache loading is complete on all server
> nodes. This is the basic criterion which needs to be fulfilled before actual
> processing starts.
>
> We have 4 server nodes and the countdown latch is initialized to 4. We use
> the "cache.loadCache" method to start the cache loading. When each server
> completes cache loading it reduces the count by 1 using the countDown method.
> So when all the nodes complete cache loading, the count reaches zero.
> When this count reaches zero we start Kafka consumers on all server
> nodes.
>
> But we saw weird behavior in the prod env. Three server nodes were shut down
> at the same time, but one node was still alive. When this happened the
> countdown was reinitialized to its original value, i.e. 4. But I am not able
> to reproduce this in the dev env.
>
> Is this a bug: when one or more (but not all) nodes go down, does the count
> reinitialize back to its original value?
>
> Thanks,
> Akash
>


CountDownLatch issue in Ignite 2.6 version

2020-06-06 Thread Akash Shinde
*Issue:* The countdown latch gets reinitialized to its original value (4) when one
or more (but not all) nodes go down. *(Partition loss happened.)*

We are using Ignite's distributed countdown latch to make sure that cache
loading is completed on all server nodes. We do this to make sure that our
Kafka consumers start only after cache loading is complete on all server
nodes. This is the basic criterion which needs to be fulfilled before actual
processing starts.

We have 4 server nodes and the countdown latch is initialized to 4. We use
the cache.loadCache method to start the cache loading. When each server
completes cache loading it reduces the count by 1 using the countDown method.
So when all the nodes complete cache loading, the count reaches zero.
When this count reaches zero we start Kafka consumers on all server
nodes.

But we saw weird behavior in the prod env. Three server nodes were shut down
at the same time, but one node was still alive. When this happened the
countdown was reinitialized to its original value, i.e. 4. But I am not able
to reproduce this in the dev env.

Is this a bug: when one or more (but not all) nodes go down, does the count
reinitialize back to its original value?

Thanks,
Akash


Countdown latch issue with 2.6.0

2020-06-06 Thread Akash Shinde
Hi,
Issue: the countdown latch gets reinitialized to its original value (4) when one
or more (but not all) nodes go down. (Partition loss happened.)

We are using Ignite's distributed countdown latch to make sure that cache
loading is completed on all server nodes. We do this to make sure that our
Kafka consumers start only after cache loading is complete on all server
nodes. This is the basic criterion which needs to be fulfilled before actual
processing starts.

We have 4 server nodes and the countdown latch is initialized to 4. We use
the "cache.loadCache" method to start the cache loading. When each server
completes cache loading it reduces the count by 1 using the countDown method.
So when all the nodes complete cache loading, the count reaches zero.
When this count reaches zero we start Kafka consumers on all server
nodes.

But we saw weird behavior in the prod env. Three server nodes were shut down
at the same time, but one node was still alive. When this happened the
countdown was reinitialized to its original value, i.e. 4. But I am not able
to reproduce this in the dev env.

Is this a bug: when one or more (but not all) nodes go down, does the count
reinitialize back to its original value?

Thanks,
Akash
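
[Editorial note] The gating pattern described above maps directly onto java.util.concurrent.CountDownLatch. The sketch below is JVM-local only; Ignite's IgniteCountDownLatch is the distributed analogue, and the reinitialization-on-node-failure behavior reported in this thread is not modeled here.

```java
import java.util.concurrent.CountDownLatch;

public class CacheLoadGate {
    public static void main(String[] args) throws InterruptedException {
        int serverNodes = 4;
        CountDownLatch loaded = new CountDownLatch(serverNodes);

        for (int i = 0; i < serverNodes; i++) {
            final int node = i;
            new Thread(() -> {
                // Stand-in for cache.loadCache(...) completing on this node.
                System.out.println("node " + node + " finished loading");
                loaded.countDown();
            }).start();
        }

        loaded.await(); // blocks until the count reaches zero
        System.out.println("all caches loaded; starting Kafka consumers");
    }
}
```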


Test Mail

2020-06-06 Thread Akash Shinde



Re: Reloading of cache not working

2020-05-12 Thread Akash Shinde
Hi,
My question is specifically about the clo.apply(key, data) call that I invoke
in the CacheStoreAdapter.loadCache method.
Does this method (clo.apply) override the value for keys which are
already present in the cache, or does it just skip them?
My observation is that it is not overriding the value for keys which are
already present, and is adding data only for new keys.

Thanks,
Akash

On Wed, May 13, 2020 at 1:15 AM akorensh  wrote:

> Hi,
> 1) loadCache() is implementation-dependent; by default it just adds new
> records to the cache.
>   see example:
>
> https://github.com/apache/ignite/blob/master/examples/src/main/java/org/apache/ignite/examples/datagrid/store/CacheLoadOnlyStoreExample.java
>
>
>   Take a look at jdbc example as well:
>
> https://github.com/apache/ignite/blob/master/examples/src/main/java/org/apache/ignite/examples/datagrid/store/jdbc/CacheJdbcStoreExample.java
>
>   more info:
> https://apacheignite.readme.io/docs/data-loading#ignitecacheloadcache
>
> 2) You do not need to clear the cache in order to call loadCache
>
> 3)  This is implementation dependent.   By default it does not overwrite
> existing entries.
>
> you can experiment w/the above examples by putting:
>   cache.put(1L,new Person(1L,"A", "B") ) before the loadCache() statement
> to
> get a better feel for its behavior.
>
> Thanks, Alex
>
>
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>
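
Alex's suggested experiment might be sketched like this (assuming the
Person model from the linked examples; the cache name and class name are
illustrative):

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;

public class LoadCacheOverwriteCheck {
    public static void run(Ignite ignite) {
        IgniteCache<Long, Person> cache = ignite.cache("personCache");

        // Seed an entry first, then load from the store.
        cache.put(1L, new Person(1L, "A", "B"));
        cache.loadCache(null); // by default, does not overwrite existing keys

        // If the store holds a different Person for key 1L, this shows
        // whether loadCache replaced the seeded value or skipped the key.
        System.out.println(cache.get(1L));
    }
}
```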


Reloading of cache not working

2020-05-12 Thread Akash Shinde
Hi,
I am using Ignite 2.6. I am trying to refresh a few caches that are already
loaded during server startup using cache loaders.
To refresh the caches I am invoking the IgniteCache.loadCache method, but
it seems that it is not updating the data in the caches.
1) Is it the expected behavior?
2) Do I have to clear the cache before calling IgniteCache.loadCache method
to refresh the cache ?
3) Does the clo.apply(key, data) (invoked from CacheStoreAdapter.loadCache
method) updates the cache only for the keys which are not already present
in cache?

Thanks,
Akash
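
For what it's worth, given that loadCache by default only adds new records
(per the earlier reply in this archive), one possible refresh approach is
to clear the cache first. This is only a sketch; the cache name and types
are illustrative, and it assumes briefly serving an empty cache is
acceptable:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;

public class CacheRefresher {
    public static void refresh(Ignite ignite, String cacheName) {
        IgniteCache<Object, Object> cache = ignite.cache(cacheName);
        cache.clear();         // drop in-memory entries; the DB is untouched
        cache.loadCache(null); // reload from the store; no keys left to skip
    }
}
```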


cache.containsKey returns false

2020-02-28 Thread Akash Shinde
Hi,
I am using Ignite 2.6 version.

I have partitioned cache, read-through and write-through is enabled.
Back-up count is 1 and total number of server nodes in cluster are  3.

When I try to get the data from a cache for a key using cache.get(key)
method, ignite reads the value from database using provided cache loader
and returns the value by read-through approach.

But when I execute cache().containsKey(key) on client node, I get false.

But the strange thing is that this behavior is not the same for all keys of
the same cache.
For key1 I get false, but for key2 I get true, even though both keys are
present in the cache.

I executed the SQL on every node (one node at a time) using the web
console, and the data was present on only one node out of three. That node
seems to be the primary for this particular key.

Can someone please advise why this is happening? Is it a bug in ignite?
This seems to be a very basic case.



Thanks,
Akash


Re: Local node terminated after segmentation

2019-12-24 Thread Akash Shinde
Can someone please help me on this?

On Thu, Dec 12, 2019 at 1:11 PM Akash Shinde  wrote:

> Hi,
>
> Can you please explain on high level how GridGain implementations protects
> from having  two segments that are alive at the same time which could lead
> to data inconsistency over time? What exactly does it do to achieve this?
>
> Regards,
> A.
>
> On Wed, Dec 11, 2019 at 5:48 PM Stanislav Lukyanov 
> wrote:
>
>> In Ignite a node can go into "segmented" state in two cases really: 1. A
>> node was unavailable (sleeping. hanging in full GC, etc) for a long time 2.
>> Cluster detected a possible split-brain situation and marked the node as
>> "segmented".
>>
>> Yes, split-brain protection (in GridGain implementation and in theory
>> too) doesn't protect your node from stopping. It protects you from having
>> two segments that are alive at the same time which could lead to data
>> inconsistency over time.
>>
>> Regarding Discovery and large clusters. If your cluster is too big for
>> the ring-based TcpDiscoverySpi to work well then you should use Zookeeper
>> Discovery which was created specifically to support large clusters.
>>
>> Stan
>>
>> On Mon, Dec 9, 2019 at 4:02 PM Prasad Bhalerao <
>> prasadbhalerao1...@gmail.com> wrote:
>>
>>>
>>> Can someone please advise on this?
>>>>
>>>> -- Forwarded message -
>>>> From: Prasad Bhalerao 
>>>> Date: Fri, Nov 29, 2019 at 7:53 AM
>>>> Subject: Re: Local node terminated after segmentation
>>>> To: 
>>>>
>>>>
>>>> I had checked the resource you mentioned, but I was confused by the
>>>> GridGain doc describing it as protection against split-brain, because
>>>> if a node is segmented the only things one can do are stop/restart/noop.
>>>> I was just wondering how it provides protection against split-brain.
>>>> Now I think that by "protection" it means killing the segmented
>>>> node/nodes, or restarting them and bringing them back into the cluster.
>>>>
>>>> Ignite uses TcpDiscoverySpi to send a heartbeat to the next node in the
>>>> ring to check whether that node is reachable.
>>>> So in what situation does one need additional ways to check whether a
>>>> node is reachable, using different resolvers?
>>>>
>>>> Please let me know if my understanding is correct.
>>>>
>>>> The article you mentioned -- I had checked that code. It requires a
>>>> node to be configured in advance so that the resolver can check whether
>>>> that node is reachable from the local host. It does not check whether
>>>> all the nodes are reachable from the local host.
>>>>
>>>> E.g.: node1 will check node2, node2 will check node3, and node3 will
>>>> check node1 to complete the ring.
>>>> I am just wondering how to configure this plugin in a prod environment
>>>> with a large cluster.
>>>> I checked the GridGain doc to see if they provide any sample code for
>>>> configuring their plugins, just to get an idea, but did not find any.
>>>>
>>>> Can you please advise?
>>>>
>>>>
>>>> Thanks,
>>>> Prasad
>>>>
>>>> On Thu 28 Nov, 2019, 11:41 PM akurbanov >>>
>>>>> Hello,
>>>>>
>>>>> Basically this is a mechanism to implement custom logical/network
>>>>> split-brain protection. Segmentation resolvers allow you to implement
>>>>> a way
>>>>> to determine if node has to be segmented/stopped/etc in method
>>>>> isValidSegment() and possibly use different combinations of resolvers
>>>>> within
>>>>> processor.
>>>>>
>>>>> If you want to check out how it could be done, some articles/source
>>>>> samples
>>>>> that might give you a good insight may be easily found on the web,
>>>>> like:
>>>>>
>>>>> https://medium.com/@aamargajbhiye/how-to-handle-network-segmentation-in-apache-ignite-35dc5fa6f239
>>>>>
>>>>> http://apache-ignite-users.70518.x6.nabble.com/Segmentation-Plugin-blog-or-article-td27955.html
>>>>>
>>>>> 2-3 are described in the documentation, copying the link just to point
>>>>> out
>>>>> which one:
>>>>> https://apacheignite.readme.io/docs/critical-failures-handling
>>>
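
A segmentation resolver along the lines discussed above might look like
this. This is only a sketch: the reference-host ping check, the class name,
and the constructor parameters are assumptions, not code from the linked
articles. The resolver implements Ignite's SegmentationResolver interface
and is plugged in via IgniteConfiguration.setSegmentationResolvers(...).

```java
import java.io.Serializable;
import java.net.InetAddress;

import org.apache.ignite.IgniteCheckedException;
import org.apache.ignite.plugin.segmentation.SegmentationResolver;

/** Treats the local segment as valid while a reference host answers pings. */
public class ReachabilityResolver implements SegmentationResolver, Serializable {
    private final String referenceHost; // e.g. a gateway host; illustrative
    private final int timeoutMs;

    public ReachabilityResolver(String referenceHost, int timeoutMs) {
        this.referenceHost = referenceHost;
        this.timeoutMs = timeoutMs;
    }

    @Override public boolean isValidSegment() throws IgniteCheckedException {
        try {
            // If the reference host is unreachable, assume this node is
            // on the wrong side of a network split.
            return InetAddress.getByName(referenceHost).isReachable(timeoutMs);
        }
        catch (Exception e) {
            throw new IgniteCheckedException("Reachability check failed", e);
        }
    }
}
```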

How to identify if re-balancing completed

2019-12-02 Thread Akash Shinde
Hi,

If there are multiple nodes in the cluster and one or more nodes go down,
or one or more new nodes are added, how can I make sure that rebalancing of
all cache partitions has completed successfully? I need this to implement a
rolling restart.

Thanks,
Akash
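
One possible way to observe rebalance completion is to listen for the
rebalance-stopped event per cache. This is only a sketch: it assumes event
recording for EVT_CACHE_REBALANCE_STOPPED has been enabled via
IgniteConfiguration.setIncludeEventTypes(...), and the class/method names
are illustrative.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.events.CacheRebalancingEvent;
import org.apache.ignite.events.Event;
import org.apache.ignite.events.EventType;
import org.apache.ignite.lang.IgnitePredicate;

public class RebalanceWatcher {
    public static void watch(Ignite ignite) {
        IgnitePredicate<Event> lsnr = evt -> {
            CacheRebalancingEvent e = (CacheRebalancingEvent) evt;
            // Fires once per cache when its rebalance routine finishes.
            System.out.println("Rebalance finished for cache: " + e.cacheName());
            return true; // keep listening
        };
        ignite.events().localListen(lsnr, EventType.EVT_CACHE_REBALANCE_STOPPED);
    }
}
```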


Primary Entry Count are not correct in cache

2019-11-29 Thread Akash Shinde
Hi, I have created a cache with the following configuration. I started four
nodes, each on a different machine, and loaded the cache with a loader.
Issue: I am not performing any operations on this cache, but the primary
key count is not constant; it keeps changing over time. I am taking this
key count from the GridGain web console. Ideally, my loader query result
count should match the number of primary entries in the cache.
Ignite version is 2.6.0.
Could someone suggest why this is happening?

CacheConfiguration subscriptionCacheCfg = new CacheConfiguration<>(CacheName.SUBSCRIPTION_CACHE.name());
subscriptionCacheCfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
subscriptionCacheCfg.setWriteThrough(false);
subscriptionCacheCfg.setReadThrough(true);
subscriptionCacheCfg.setRebalanceMode(CacheRebalanceMode.ASYNC);
subscriptionCacheCfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
subscriptionCacheCfg.setBackups(2);
Factory storeFactory = FactoryBuilder.factoryOf(SubscriptionDataLoader.class);
subscriptionCacheCfg.setCacheStoreFactory(storeFactory);
subscriptionCacheCfg.setIndexedTypes(DefaultDataKey.class, SubscriptionData.class);
subscriptionCacheCfg.setSqlIndexMaxInlineSize(47);
RendezvousAffinityFunction affinityFunction = new RendezvousAffinityFunction();
affinityFunction.setExcludeNeighbors(true);
subscriptionCacheCfg.setAffinity(affinityFunction);
subscriptionCacheCfg.setStatisticsEnabled(true);
subscriptionCacheCfg.setPartitionLossPolicy(PartitionLossPolicy.READ_WRITE_SAFE);


Thanks,

Akash


Re: resetLostPartitions is blocked inside event listener

2019-11-14 Thread Akash Shinde
Thank you Ilya.

On Fri, Nov 8, 2019 at 12:43 AM Ilya Kasnacheev 
wrote:

> Hello!
>
> Event listener is invoked synchronously from internal threads. If
> partition reset has to happen from the same thread, then obviously there
> will be a deadlock.
>
> Cache listeners have same property, i.e., you should avoid doing cache
> operations from them.
>
> This is tradeoff between performance and usability which was resolved in
> favor of former.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> чт, 7 нояб. 2019 г. в 20:30, Prasad Bhalerao  >:
>
>> Do you mean to say, spawn a different thread from event listener and
>> reset the lost partition in that thread?
>>
>> I tried this and it works.
>>
>> But wanted to understand the reason, why this call get blocked in event
>> listener?
>>
>> Thanks,
>> Prasad
>>
>> On Thu 7 Nov, 2019, 9:28 PM Ilya Kasnacheev > wrote:
>>
>>> Hello!
>>>
>>> It is not advisable to call any blocking methods from event listeners.
>>> Just fire resetLostPartitions from another thread.
>>>
>>> Regards,
>>> --
>>> Ilya Kasnacheev
>>>
>>>
>>> чт, 7 нояб. 2019 г. в 15:17, Akash Shinde :
>>>
>>>> Hi,
>>>> I am trying to handle lost partition scenario.
>>>> I have written event listener listening  to
>>>> EventType.EVT_CACHE_REBALANCE_PART_DATA_LOST event.
>>>> I want to reset lost partition state of cache after cache loading  is
>>>> done.
>>>> *Issue:* ignite.resetLostPartitions(caheName) is getting blocked and
>>>> not completing.
>>>>
>>>> Please find the code for Event Listener. Someone can help on this. *Why
>>>> this resetLostPartitions getting blocked.*
>>>>
>>>> public class IgniteEventListner implements 
>>>> IgnitePredicate {
>>>>private static final Logger LOGGER = 
>>>> LoggerFactory.getLogger(IgniteEventListner.class);
>>>>
>>>>   private final Ignite ignite;
>>>>
>>>>   public IgniteEventListner(Ignite ignite) {
>>>> this.ignite = ignite;
>>>>   }
>>>>
>>>>   @Override
>>>>   public boolean apply(CacheRebalancingEvent evt) {
>>>>
>>>> IgniteCache cache = 
>>>> ignite.getOrCreateCache(CacheName.ASSET_GROUP_CACHE.name());
>>>> Collection lostPartitions = cache.lostPartitions();
>>>> reloadCache(lostPartitions); //perform partition based cache loading
>>>>
>>>> ignite.resetLostPartitions(Arrays.asList(CacheName.ASSET_GROUP_CACHE.name())); // Reset partitions
>>>>
>>>> System.out.println("Check-1, Partition lost event processed");
>>>>
>>>> return true;
>>>>   }
>>>> }
>>>>
>>>> *Cache Configuration*
>>>>
>>>> private CacheConfiguration assetGroupCacheCfg() {
>>>> CacheConfiguration assetGroupCacheCfg = new 
>>>> CacheConfiguration<>(CacheName.ASSET_GROUP_CACHE.name());
>>>> assetGroupCacheCfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
>>>> assetGroupCacheCfg.setWriteThrough(false);
>>>> assetGroupCacheCfg.setReadThrough(false);
>>>> assetGroupCacheCfg.setRebalanceMode(CacheRebalanceMode.ASYNC);
>>>> assetGroupCacheCfg.setBackups(0);
>>>> assetGroupCacheCfg.setCacheMode(CacheMode.PARTITIONED);
>>>> assetGroupCacheCfg.setIndexedTypes(DefaultDataAffinityKey.class, 
>>>> AssetGroupData.class);
>>>> assetGroupCacheCfg.setSqlIndexMaxInlineSize(100);
>>>> RendezvousAffinityFunction affinityFunction = new 
>>>> RendezvousAffinityFunction();
>>>> assetGroupCacheCfg.setAffinity(affinityFunction);
>>>> assetGroupCacheCfg.setStatisticsEnabled(true);
>>>>
>>>> assetGroupCacheCfg.setPartitionLossPolicy(PartitionLossPolicy.READ_WRITE_SAFE);
>>>> return assetGroupCacheCfg;
>>>>   }
>>>>
>>>> *Ignite Configuration*
>>>>
>>>> private IgniteConfiguration getIgniteConfiguration() {
>>>>
>>>>   TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
>>>>   String[] hosts = {"127.0.0.1:47500..47509"};
>>>>   ipFinder.setAddresses(Arrays.asList(hosts));
>>>>
>>>
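
Ilya's advice above (fire resetLostPartitions from another thread) can be
sketched as follows. The class name, the single-thread executor choice, and
the commented-out reload hook are illustrative assumptions, not code from
the original post.

```java
import java.util.Collections;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.ignite.Ignite;
import org.apache.ignite.events.CacheRebalancingEvent;
import org.apache.ignite.lang.IgnitePredicate;

public class NonBlockingLossListener implements IgnitePredicate<CacheRebalancingEvent> {
    private final Ignite ignite;
    private final ExecutorService exec = Executors.newSingleThreadExecutor();

    public NonBlockingLossListener(Ignite ignite) {
        this.ignite = ignite;
    }

    @Override public boolean apply(CacheRebalancingEvent evt) {
        // Do NOT call blocking cache/cluster operations on the event thread;
        // hand the reset off to a separate thread instead.
        exec.submit(() -> {
            // reloadCache(...); // perform partition-based cache loading first
            ignite.resetLostPartitions(Collections.singleton(evt.cacheName()));
        });
        return true; // keep the listener subscribed
    }
}
```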

resetLostPartitions is blocked inside event listener

2019-11-07 Thread Akash Shinde
Hi,
I am trying to handle lost partition scenario.
I have written event listener listening  to
EventType.EVT_CACHE_REBALANCE_PART_DATA_LOST event.
I want to reset lost partition state of cache after cache loading  is done.
*Issue:* ignite.resetLostPartitions(cacheName) is getting blocked and never
completes.

Please find the Event Listener code below. Can someone help with this? *Why
is resetLostPartitions getting blocked?*

public class IgniteEventListner implements
IgnitePredicate {
   private static final Logger LOGGER =
LoggerFactory.getLogger(IgniteEventListner.class);

  private final Ignite ignite;

  public IgniteEventListner(Ignite ignite) {
this.ignite = ignite;
  }

  @Override
  public boolean apply(CacheRebalancingEvent evt) {

IgniteCache cache =
ignite.getOrCreateCache(CacheName.ASSET_GROUP_CACHE.name());
Collection lostPartitions = cache.lostPartitions();
reloadCache(lostPartitions); //perform partition based cache loading

ignite.resetLostPartitions(Arrays.asList(CacheName.ASSET_GROUP_CACHE.name())); // Reset partitions

System.out.println("Check-1, Partition lost event processed");

return true;
  }
}

*Cache Configuration*

private CacheConfiguration assetGroupCacheCfg() {
CacheConfiguration assetGroupCacheCfg = new
CacheConfiguration<>(CacheName.ASSET_GROUP_CACHE.name());
assetGroupCacheCfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
assetGroupCacheCfg.setWriteThrough(false);
assetGroupCacheCfg.setReadThrough(false);
assetGroupCacheCfg.setRebalanceMode(CacheRebalanceMode.ASYNC);
assetGroupCacheCfg.setBackups(0);
assetGroupCacheCfg.setCacheMode(CacheMode.PARTITIONED);
assetGroupCacheCfg.setIndexedTypes(DefaultDataAffinityKey.class,
AssetGroupData.class);
assetGroupCacheCfg.setSqlIndexMaxInlineSize(100);
RendezvousAffinityFunction affinityFunction = new
RendezvousAffinityFunction();
assetGroupCacheCfg.setAffinity(affinityFunction);
assetGroupCacheCfg.setStatisticsEnabled(true);
assetGroupCacheCfg.setPartitionLossPolicy(PartitionLossPolicy.READ_WRITE_SAFE);
return assetGroupCacheCfg;
  }

*Ignite Configuration*

private IgniteConfiguration getIgniteConfiguration() {

  TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
  String[] hosts = {"127.0.0.1:47500..47509"};
  ipFinder.setAddresses(Arrays.asList(hosts));

  TcpDiscoverySpi discoSpi = new TcpDiscoverySpi();
  discoSpi.setIpFinder(ipFinder);

  IgniteConfiguration cfg = new IgniteConfiguration();
  cfg.setDiscoverySpi(discoSpi);
  cfg.setIgniteInstanceName("springDataNode");
  cfg.setPeerClassLoadingEnabled(false);
  cfg.setRebalanceThreadPoolSize(4);
  DataStorageConfiguration storageCfg = new DataStorageConfiguration();
  DataRegionConfiguration regionConfiguration = new DataRegionConfiguration();
  regionConfiguration.setInitialSize(3L * 1024 * 1024 * 1024);
  regionConfiguration.setMaxSize(3L * 1024 * 1024 * 1024);
  regionConfiguration.setMetricsEnabled(true);

  storageCfg.setDefaultDataRegionConfiguration(regionConfiguration);
  storageCfg.setStoragePath("c:/ignite-storage/storage");
  storageCfg.setWalPath("c:/ignite-storage/storage/wal");
  storageCfg.setWalArchivePath("c:/ignite-storage/storage/wal-archive");
  storageCfg.setMetricsEnabled(true);
  cfg.setDataStorageConfiguration(storageCfg);
  cfg.setIncludeEventTypes(EventType.EVT_CACHE_REBALANCE_PART_DATA_LOST, EventType.EVT_NODE_FAILED);
  cfg.setCacheConfiguration(getCacheConfigurations());
  return cfg;
}


Thanks,

Akash


Re: Read through not working as expected in case of Replicated cache

2019-10-29 Thread Akash Shinde
Hi,
I tried this scenario with version 2.7.6 and the issue is still there.
I cannot move to version 2.7.6 anyway due to IGNITE-10884. That issue
(IGNITE-10884) is fixed, but the fix has not yet been released.
Could you please let me know what the workaround is for the replicated
cache issue?

Thanks,
Akash


On Tue, Oct 29, 2019 at 8:53 PM Ilya Kasnacheev 
wrote:

> Hello!
>
> I remember that we had this issue. Have you tried 2.7.6 yet?
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> вт, 29 окт. 2019 г. в 18:18, Akash Shinde :
>
>> I am using Ignite 2.6 version.
>>
>> I am starting 3 server nodes with a replicated cache and 1 client node.
>> Cache configuration is as follows.
>> Read-through true on but write-through is false. Load data by key is
>> implemented as given below in cache-loader.
>>
>> Steps to reproduce issue:
>> 1) Delete an entry from cache using IgniteCache.remove() method. (Entry
>> is just removed from cache but present in DB as write-through is false)
>> 2) Invoke IgniteCache.get() method for the same key in step 1.
>> 3) Now query the cache from client node. Every invocation returns
>> different results.
>> Sometimes it returns reloaded entry, sometime returns the results
>> without reloaded entry.
>>
>> Looks like read-through is not replicating the reloaded entry on all
>> nodes in case of REPLICATED cache.
>>
>> So to investigate further I changed the cache mode to PARTITIONED and set
>> the backup count to 3 i.e. total number of nodes present in cluster (to
>> mimic REPLICATED behavior).
>> This time it worked as expected.
>> Every invocation returned the same result with reloaded entry.
>>
>> private CacheConfiguration networkCacheCfg() {
>> CacheConfiguration networkCacheCfg = new CacheConfiguration<>(CacheName.NETWORK_CACHE.name());
>> networkCacheCfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
>> networkCacheCfg.setWriteThrough(false);
>> networkCacheCfg.setReadThrough(true);
>> networkCacheCfg.setRebalanceMode(CacheRebalanceMode.ASYNC);
>> networkCacheCfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
>> //networkCacheCfg.setBackups(3);
>> networkCacheCfg.setCacheMode(CacheMode.REPLICATED);
>> Factory storeFactory = FactoryBuilder.factoryOf(NetworkDataCacheLoader.class);
>> networkCacheCfg.setCacheStoreFactory(storeFactory);
>> networkCacheCfg.setIndexedTypes(DefaultDataAffinityKey.class, NetworkData.class);
>> networkCacheCfg.setSqlIndexMaxInlineSize(65);
>> RendezvousAffinityFunction affinityFunction = new RendezvousAffinityFunction();
>> affinityFunction.setExcludeNeighbors(false);
>> networkCacheCfg.setAffinity(affinityFunction);
>> networkCacheCfg.setStatisticsEnabled(true);
>> //networkCacheCfg.setNearConfiguration(nearCacheConfiguration());
>> return networkCacheCfg;
>> }
>>
>> @Override
>> public V load(K k) throws CacheLoaderException {
>> V value = null;
>> DataSource dataSource = springCtx.getBean(DataSource.class);
>> try (Connection connection = dataSource.getConnection();
>>  PreparedStatement statement = 
>> connection.prepareStatement(loadByKeySql)) {
>> //statement.setObject(1, k.getId());
>> setPreparedStatement(statement,k);
>> try (ResultSet rs = statement.executeQuery()) {
>> if (rs.next()) {
>> value = rowMapper.mapRow(rs, 0);
>> }
>> }
>> } catch (SQLException e) {
>>
>> throw new CacheLoaderException(e.getMessage(), e);
>> }
>>
>> return value;
>> }
>>
>>
>> Thanks,
>>
>> Akash
>>
>>


Read through not working as expected in case of Replicated cache

2019-10-29 Thread Akash Shinde
I am using Ignite 2.6 version.

I am starting 3 server nodes with a replicated cache and 1 client node.
Cache configuration is as follows.
Read-through is on but write-through is off. Load data by key is
implemented in the cache loader as given below.

Steps to reproduce issue:
1) Delete an entry from cache using IgniteCache.remove() method. (Entry is
just removed from cache but present in DB as write-through is false)
2) Invoke IgniteCache.get() method for the same key in step 1.
3) Now query the cache from client node. Every invocation returns different
results.
Sometimes it returns the reloaded entry, and sometimes it returns the
results without the reloaded entry.

Looks like read-through is not replicating the reloaded entry on all nodes
in case of REPLICATED cache.

So to investigate further I changed the cache mode to PARTITIONED and set
the backup count to 3 i.e. total number of nodes present in cluster (to
mimic REPLICATED behavior).
This time it worked as expected.
Every invocation returned the same result with reloaded entry.

private CacheConfiguration networkCacheCfg() {
    CacheConfiguration networkCacheCfg = new CacheConfiguration<>(CacheName.NETWORK_CACHE.name());
    networkCacheCfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
    networkCacheCfg.setWriteThrough(false);
    networkCacheCfg.setReadThrough(true);
    networkCacheCfg.setRebalanceMode(CacheRebalanceMode.ASYNC);
    networkCacheCfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
    //networkCacheCfg.setBackups(3);
    networkCacheCfg.setCacheMode(CacheMode.REPLICATED);
    Factory storeFactory = FactoryBuilder.factoryOf(NetworkDataCacheLoader.class);
    networkCacheCfg.setCacheStoreFactory(storeFactory);
    networkCacheCfg.setIndexedTypes(DefaultDataAffinityKey.class, NetworkData.class);
    networkCacheCfg.setSqlIndexMaxInlineSize(65);
    RendezvousAffinityFunction affinityFunction = new RendezvousAffinityFunction();
    affinityFunction.setExcludeNeighbors(false);
    networkCacheCfg.setAffinity(affinityFunction);
    networkCacheCfg.setStatisticsEnabled(true);
    //networkCacheCfg.setNearConfiguration(nearCacheConfiguration());
    return networkCacheCfg;
}

@Override
public V load(K k) throws CacheLoaderException {
V value = null;
DataSource dataSource = springCtx.getBean(DataSource.class);
try (Connection connection = dataSource.getConnection();
 PreparedStatement statement =
connection.prepareStatement(loadByKeySql)) {
//statement.setObject(1, k.getId());
setPreparedStatement(statement,k);
try (ResultSet rs = statement.executeQuery()) {
if (rs.next()) {
value = rowMapper.mapRow(rs, 0);
}
}
} catch (SQLException e) {

throw new CacheLoaderException(e.getMessage(), e);
}

return value;
}


Thanks,

Akash


Re: Partition loss due to node segmentation

2019-10-09 Thread Akash Shinde
Looks like I did not ask the question correctly, so let me rephrase it.

If the cluster is segmented and 2 or more of the 5 nodes are thrown out of
the cluster, then data is lost (the backup count is set to 1).

Now my question is: does Ignite fire the
"EVT_CACHE_REBALANCE_PART_DATA_LOST" event when partitions are lost due to
segmentation?
I am trying to implement a recovery/fallback mechanism for partition loss
due to any reason. Using this event I reload the lost partitions.

Thanks,
A.


On Wed, Oct 9, 2019 at 1:07 PM Gaurav Bajaj  wrote:

> Hello Akash,
>
> Yes of course. As cluster is segmented, it is very well possible that
> cluster segment doesn't have all partitions in that segment.
> If you have backup partitions in the same segment, then only you have
> complete partitions.
>
> Best Regards,
> Gaurav
>
> On Wed, Oct 9, 2019 at 7:20 AM Akash Shinde  wrote:
>
>> Hi,
>>  Are there any possibilities of partition loss in case of node
>> segmentation?
>>
>> Thanks,
>> Akash
>>
>


Partition loss due to node segmentation

2019-10-08 Thread Akash Shinde
Hi,
 Are there any possibilities of partition loss in case of node segmentation?

Thanks,
Akash


Re: Handling Of Partition loss

2019-09-18 Thread Akash Shinde
Hi Maxim,
Thanks for your input.
I want to reload lost data into the cache when lost partitions occur. I am
handling this cache loading in an event listener (which listens to the
*EVT_CACHE_REBALANCE_PART_DATA_LOST* event).
I use the IgniteCache.lostPartitions() method to get the lost partitions
and pass this partition list to the cache loader. The issue is that I
receive this event once per partition, but I want to start cache loading
once for all the lost partitions.

After loading the data successfully, the next step is to reset the
lost-partition state using the resetLostPartitions() method.

Can you suggest a better way to reload data lost due to lost partitions?

Thanks,
Akash

On Thu, Sep 19, 2019 at 12:48 AM Maxim Muzafarov  wrote:

> Hello,
>
> 1) I'm not sure that I've caught your idea right, but is this the method
> you are looking for?
> org.apache.ignite.IgniteCache#lostPartitions [1]
>
> 2) I doubt that it is possible since resetLostPartitions method
> accepts only cache names.
>
> [1]
> https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/IgniteCache.html#lostPartitions--
>
> On Tue, 17 Sep 2019 at 16:14, Akash Shinde  wrote:
> >
> > Can someone please help me on this?
> >
> > Thanks,
> > Akash
> >
> > On Tue, Sep 17, 2019 at 10:40 AM Akash Shinde 
> wrote:
> >>
> >> Hi,
> >> I am trying to recover lost data in case of partition loss.
> >> In my ignite configuration native persistence is off.
> >> I have started event listener on EVT_CACHE_REBALANCE_PART_DATA_LOST
> event. This listener will get lost partition list using
> cache.lostPartitions() method.
> >> The issue is that listener gets call per partition. So if there 100
> partition loss due to single node termination then 100 time this listener
> will get called and last multiple calls to the listener will fetch all lost
> partition list.
> >>
> >> Lets take a scenario:
> >> Started two server nodes  Node A and Node B.  Started cache with
> partition mode and the number of backup set to 0 in order to facilitate
> simulation of partition loss scenarios
> >> Started event listener on both node listening  to
> EVT_CACHE_REBALANCE_PART_DATA_LOST  event.
> >>
> >> Number of partitions on node A = 500
> >> Number of partitions on node B = 524
> >>
> >> Now stop node B. After termination of node B listener running on node A
> gets call multiple time per partition.
> >> I have printed logs on listener
> >>
> >> primary partition size after loss:1024
> >> Lost partion Nos.1
> >> IgniteThread [compositeRwLockIdx=1, stripe=-1, plc=-1,
> name=exchange-worker-#42%springDataNode%]::[0]
> >> Event Detail:CacheRebalancingEvent [cacheName=ASSET_GROUP_CACHE,
> part=0, discoNode=TcpDiscoveryNode
> [id=1bb17828-3556-499f-a4e6-98cfdc1d11fb, addrs=[0:0:0:0:0:0:0:1,
> 10.113.14.98, 127.0.0.1], sockAddrs=[], discPort=47501, order=2,
> intOrder=2, lastExchangeTime=1568357181089, loc=false,
> ver=2.6.0#20180710-sha1:669feacc, isClient=false], discoEvtType=12,
> discoTs=1568357376683, discoEvtName=NODE_FAILED, nodeId8=499400ac,
> msg=Cache rebalancing event., type=CACHE_REBALANCE_PART_DATA_LOST,
> tstamp=1568357376714]
> >> primary partition size after loss:1024
> >> Lost partion Nos.2
> >> IgniteThread [compositeRwLockIdx=1, stripe=-1, plc=-1,
> name=exchange-worker-#42%springDataNode%]::[0, 1]
> >> Event Detail:CacheRebalancingEvent [cacheName=ASSET_GROUP_CACHE,
> part=1, discoNode=TcpDiscoveryNode
> [id=1bb17828-3556-499f-a4e6-98cfdc1d11fb, addrs=[0:0:0:0:0:0:0:1,
> 10.113.14.98, 127.0.0.1], sockAddrs=[], discPort=47501, order=2,
> intOrder=2, lastExchangeTime=1568357181089, loc=false,
> ver=2.6.0#20180710-sha1:669feacc, isClient=false], discoEvtType=12,
> discoTs=1568357376683, discoEvtName=NODE_FAILED, nodeId8=499400ac,
> msg=Cache rebalancing event., type=CACHE_REBALANCE_PART_DATA_LOST,
> tstamp=1568357376726]
> >> primary partition size after loss:1024
> >> Lost partion Nos.3
> >> IgniteThread [compositeRwLockIdx=1, stripe=-1, plc=-1,
> name=exchange-worker-#42%springDataNode%]::[0, 1, 2]
> >> Event Detail:CacheRebalancingEvent [cacheName=ASSET_GROUP_CACHE,
> part=2, discoNode=TcpDiscoveryNode
> [id=1bb17828-3556-499f-a4e6-98cfdc1d11fb, addrs=[0:0:0:0:0:0:0:1,
> 10.113.14.98, 127.0.0.1], sockAddrs=[], discPort=47501, order=2,
> intOrder=2, lastExchangeTime=1568357181089, loc=false,
> ver=2.6.0#20180710-sha1:669feacc, isClient=false], discoEvtType=12,
> discoTs=1568357376683, discoEvtName=NODE_FAILED, nodeId8=499400ac,
> msg=Cache rebalancing event., type=CACHE_REBALANCE_PART_DATA_LOST,
> tstamp=15
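
The per-partition event flood discussed above can be coalesced into a
single reload on the application side. Below is a minimal,
Ignite-independent sketch (all names are illustrative): it collects
partition ids and triggers one reload callback after a quiet period with
no further events, so the reload and the subsequent resetLostPartitions()
call run once for the whole batch.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/**
 * Collects per-partition loss notifications and fires a single reload
 * callback once no new notifications have arrived for quietMillis.
 */
public class LostPartitionAggregator {
    private final Set<Integer> lost = ConcurrentHashMap.newKeySet();
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();
    private final Consumer<Set<Integer>> reload;
    private final long quietMillis;
    private ScheduledFuture<?> pending;

    public LostPartitionAggregator(Consumer<Set<Integer>> reload, long quietMillis) {
        this.reload = reload;
        this.quietMillis = quietMillis;
    }

    /** Call once per EVT_CACHE_REBALANCE_PART_DATA_LOST event. */
    public synchronized void onPartitionLost(int part) {
        lost.add(part);
        if (pending != null)
            pending.cancel(false); // restart the quiet-period timer
        pending = scheduler.schedule(this::flush, quietMillis, TimeUnit.MILLISECONDS);
    }

    private void flush() {
        Set<Integer> batch;
        synchronized (this) {
            batch = new HashSet<>(lost);
            lost.clear();
        }
        if (!batch.isEmpty())
            reload.accept(batch); // reload + resetLostPartitions goes here
    }
}
```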

Re: Handling Of Partition loss

2019-09-17 Thread Akash Shinde
Can someone please help me on this?

Thanks,
Akash

On Tue, Sep 17, 2019 at 10:40 AM Akash Shinde  wrote:

> Hi,
> I am trying to recover lost data in case of partition loss.
> In my ignite configuration native persistence is *off*.
> I have started event listener on EVT_CACHE_REBALANCE_PART_DATA_LOST
> event. This listener will get lost partition list
> using cache.lostPartitions() method.
> The issue is that the listener gets called once per partition. So if 100
> partitions are lost due to a single node termination, this listener gets
> called 100 times, and only the last of those calls fetches the full list
> of lost partitions.
>
> *Lets take a scenario:*
> Started two server nodes  Node A and Node B.  Started cache with
> partition mode and the number of backup set to 0 in order to facilitate
> simulation of partition loss scenarios
> Started event listener on both node listening  to
> EVT_CACHE_REBALANCE_PART_DATA_LOST  event.
>
> Number of partitions on node A = 500
> Number of partitions on node B = 524
>
> Now stop node B. After termination of node B listener running on node A
> gets call multiple time per partition.
> I have printed logs on listener
>
> primary partition size after loss:1024
> *Lost partion Nos.1*
> IgniteThread [compositeRwLockIdx=1, stripe=-1, plc=-1,
> name=exchange-worker-#42%springDataNode%]::*[0]*
> Event Detail:CacheRebalancingEvent [cacheName=ASSET_GROUP_CACHE, part=0,
> discoNode=TcpDiscoveryNode [id=1bb17828-3556-499f-a4e6-98cfdc1d11fb,
> addrs=[0:0:0:0:0:0:0:1, 10.113.14.98, 127.0.0.1], sockAddrs=[],
> discPort=47501, order=2, intOrder=2, lastExchangeTime=1568357181089,
> loc=false, ver=2.6.0#20180710-sha1:669feacc, isClient=false],
> discoEvtType=12, discoTs=1568357376683, discoEvtName=NODE_FAILED,
> nodeId8=499400ac, msg=Cache rebalancing event.,
> type=CACHE_REBALANCE_PART_DATA_LOST, tstamp=1568357376714]
> primary partition size after loss:1024
> *Lost partion Nos.2*
> IgniteThread [compositeRwLockIdx=1, stripe=-1, plc=-1,
> name=exchange-worker-#42%springDataNode%]::*[0, 1]*
> Event Detail:CacheRebalancingEvent [cacheName=ASSET_GROUP_CACHE, part=1,
> discoNode=TcpDiscoveryNode [id=1bb17828-3556-499f-a4e6-98cfdc1d11fb,

Handling Of Partition loss

2019-09-16 Thread Akash Shinde
Hi,
I am trying to recover lost data in the case of a partition loss.
In my Ignite configuration native persistence is *off*.
I have registered an event listener for the EVT_CACHE_REBALANCE_PART_DATA_LOST event.
The listener fetches the list of lost partitions using the
cache.lostPartitions() method.
The issue is that the listener is called once per lost partition. So if 100
partitions are lost due to a single node termination, the listener is called
100 times, and only the last of those calls sees the complete lost-partition
list.
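Since the listener fires once per lost partition, one practical option is to coalesce the per-partition callbacks and act once on the accumulated set. The sketch below is plain JDK code (no Ignite imports); each EVT_CACHE_REBALANCE_PART_DATA_LOST callback would only record the partition id, and the recovery logic later drains the whole set in one pass. The class and method names are illustrative, not Ignite API.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentSkipListSet;

/**
 * Sketch: coalesce per-partition loss callbacks into one batch.
 * The event listener (invoked once per partition) only records the id;
 * recovery runs once, on the drained set, instead of once per callback.
 */
public class LostPartitionCoalescer {
    private final Set<Integer> lost = new ConcurrentSkipListSet<>();

    /** Called from the event listener, once per lost partition. */
    public void onPartitionLost(int part) {
        lost.add(part);
    }

    /** Called once (e.g. after a quiet period): drain and return the full set. */
    public Set<Integer> drain() {
        Set<Integer> snapshot = new TreeSet<>(lost);
        lost.clear();
        return Collections.unmodifiableSet(snapshot);
    }

    public static void main(String[] args) {
        LostPartitionCoalescer c = new LostPartitionCoalescer();
        for (int p : new int[] {0, 1, 2, 4}) {
            c.onPartitionLost(p); // simulates four separate listener invocations
        }
        System.out.println("lost partitions: " + c.drain());
    }
}
```

In a real listener the drain could be triggered after a quiet period or from a separate recovery task, so the reload logic runs once rather than 100 times.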

*Let's take a scenario:*
I started two server nodes, Node A and Node B, and created a cache in
partitioned mode with the number of backups set to 0 in order to simulate
partition-loss scenarios.
The event listener for EVT_CACHE_REBALANCE_PART_DATA_LOST is registered on
both nodes.

Number of partitions on node A = 500
Number of partitions on node B = 524

Now stop node B. After node B terminates, the listener running on node A is
called multiple times, once per lost partition.
I have printed logs in the listener:

primary partition size after loss:1024
*Lost partion Nos.1*
IgniteThread [compositeRwLockIdx=1, stripe=-1, plc=-1,
name=exchange-worker-#42%springDataNode%]::*[0]*
Event Detail:CacheRebalancingEvent [cacheName=ASSET_GROUP_CACHE, part=0,
discoNode=TcpDiscoveryNode [id=1bb17828-3556-499f-a4e6-98cfdc1d11fb,
addrs=[0:0:0:0:0:0:0:1, 10.113.14.98, 127.0.0.1], sockAddrs=[],
discPort=47501, order=2, intOrder=2, lastExchangeTime=1568357181089,
loc=false, ver=2.6.0#20180710-sha1:669feacc, isClient=false],
discoEvtType=12, discoTs=1568357376683, discoEvtName=NODE_FAILED,
nodeId8=499400ac, msg=Cache rebalancing event.,
type=CACHE_REBALANCE_PART_DATA_LOST, tstamp=1568357376714]
primary partition size after loss:1024
*Lost partion Nos.2*
IgniteThread [compositeRwLockIdx=1, stripe=-1, plc=-1,
name=exchange-worker-#42%springDataNode%]::*[0, 1]*
Event Detail:CacheRebalancingEvent [cacheName=ASSET_GROUP_CACHE, part=1,
discoNode=TcpDiscoveryNode [id=1bb17828-3556-499f-a4e6-98cfdc1d11fb,
addrs=[0:0:0:0:0:0:0:1, 10.113.14.98, 127.0.0.1], sockAddrs=[],
discPort=47501, order=2, intOrder=2, lastExchangeTime=1568357181089,
loc=false, ver=2.6.0#20180710-sha1:669feacc, isClient=false],
discoEvtType=12, discoTs=1568357376683, discoEvtName=NODE_FAILED,
nodeId8=499400ac, msg=Cache rebalancing event.,
type=CACHE_REBALANCE_PART_DATA_LOST, tstamp=1568357376726]
primary partition size after loss:1024
*Lost partion Nos.3*
IgniteThread [compositeRwLockIdx=1, stripe=-1, plc=-1,
name=exchange-worker-#42%springDataNode%]::*[0, 1, 2]*
Event Detail:CacheRebalancingEvent [cacheName=ASSET_GROUP_CACHE, part=2,
discoNode=TcpDiscoveryNode [id=1bb17828-3556-499f-a4e6-98cfdc1d11fb,
addrs=[0:0:0:0:0:0:0:1, 10.113.14.98, 127.0.0.1], sockAddrs=[],
discPort=47501, order=2, intOrder=2, lastExchangeTime=1568357181089,
loc=false, ver=2.6.0#20180710-sha1:669feacc, isClient=false],
discoEvtType=12, discoTs=1568357376683, discoEvtName=NODE_FAILED,
nodeId8=499400ac, msg=Cache rebalancing event.,
type=CACHE_REBALANCE_PART_DATA_LOST, tstamp=1568357376726]
primary partition size after loss:1024
*Lost partion Nos.4*
IgniteThread [compositeRwLockIdx=1, stripe=-1, plc=-1,
name=exchange-worker-#42%springDataNode%]::*[0, 1, 2, 4]*
Event Detail:CacheRebalancingEvent [cacheName=ASSET_GROUP_CACHE, part=4,
discoNode=TcpDiscoveryNode [id=1bb17828-3556-499f-a4e6-98cfdc1d11fb,
addrs=[0:0:0:0:0:0:0:1, 10.113.14.98, 127.0.0.1], sockAddrs=[],
discPort=47501, order=2, intOrder=2, lastExchangeTime=1568357181089,
loc=false, ver=2.6.0#20180710-sha1:669feacc, isClient=false],
discoEvtType=12, discoTs=1568357376683, discoEvtName=NODE_FAILED,
nodeId8=499400ac, msg=Cache rebalancing event.,
type=CACHE_REBALANCE_PART_DATA_LOST, tstamp=1568357376736]
primary partition size after loss:1024
*Lost partion Nos.5*
*.*
*.*
*.*
*.*
IgniteThread [compositeRwLockIdx=1, stripe=-1, plc=-1,
name=exchange-worker-#42%springDataNode%]::[0, 1, 2, 4, 5, 6, 7, 11, 13,
17, 22, 26, 28, 29, 30, 33, 34, 37, 38, 41, 43, 45, 47, 48, 49, 50, 55, 58,
61, 62, 64, 65, 68, 70, 71, 75, 77, 79, 81, 82, 85, 87, 88, 89, 90, 93,
100, 101, 102, 104, 110, 112, 114, 116, 121, 123, 125, 126, 132, 133, 135,
137, 138, 139, 140, 144, 145, 146, 147, 149, 150, 151, 154, 156, 157, 158,
163, 164, 165, 169, 170, 172, 173, 176, 178, 180, 182, 183, 184, 185, 195,
196, 198, 199, 203, 204, 212, 213, 215, 217, 219, 220, 222, 223, 224, 226,
227, 230, 233, 234, 236, 237, 240, 242, 245, 248, 250, 251, 253, 255, 257,
258, 263, 265, 266, 267, 269, 270, 272, 273, 275, 276, 277, 278, 281, 282,
283, 287, 288, 292, 293, 295, 296, 297, 298, 300, 301, 302, 305, 308, 309,
310, 311, 313, 314, 315, 318, 319, 320, 322, 323, 324, 326, 327, 328, 329,
330, 331, 332, 333, 336, 340, 342, 344, 347, 348, 349, 351, 352, 353, 354,
355, 357, 362, 364, 369, 370, 371, 373, 374, 375, 376, 380, 382, 383, 387,
389, 394, 395, 396, 397, 398, 401, 402, 403, 407, 408, 409, 410, 411, 412,
413, 416, 417, 421, 424, 425, 427, 430, 431, 433, 435, 437, 438, 439, 440,
441, 

Re: Server Nodes Stopped Unexpectedly

2019-09-09 Thread Akash Shinde
Hi,
Sorry for late reply. I was out of town.
I am trying fetch the logs. Meanwhile could you please answer the questions
from last mail ?

Thanks,
Akash

On Thu, Aug 29, 2019 at 6:51 PM Evgenii Zhuravlev 
wrote:

> Hi,
> Can you please share new logs? It will help to understand the possible
> reason of the issue.
>
> Thanks,
> Evgenii
>
> ср, 28 авг. 2019 г. в 17:56, Akash Shinde :
>
>> Hi,
>>
>> Now I have set the failure detection timeout to 12 mills and I am
>> still getting this error message intermittently on Ignite 2.6 version.
>> It could be the network issue but I am not able to confirm that this is
>> happening because of network issue.
>>
>> 1)  What are all possible reasons for following error? Could you please
>> mention it, it might help to narrow down the issue.
>>  [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException:
>> Thread tcp-disco-srvr-#3%springDataNode% is terminated unexpectedly.]
>>
>> 2) Will upgrading to latest Ignite version 2.7.5 or 2.7.6 solve this
>> problem?
>>
>> 3) How do you monitor the network. Can you please suggest any tool?
>>
>> 4) I understand that node gets segmented because of long GC pause or
>> network connectivity. Is my understanding correct?
>>
>> 5) What is the purpose of networkTimeout configuration? In my case it is
>> set to 1 .
>>
>> Regards,
>> Akash
>>
>> On Mon, Jul 29, 2019 at 2:28 PM Evgenii Zhuravlev <
>> e.zhuravlev...@gmail.com> wrote:
>>
>>> >Does network issue make JVM  halt?
>>> There is a failureDetectionTimeout, which will help other nodes in the
>>> cluster to detect that node is unreachable and to exclude this node from
>>> topology. So, I believe it could be something like a temporary network
>>> problem. I would recommend to add some network monitoring to be prepared
>>> for the next failure.
>>>
>>> Best Regards,
>>> Evgenii
>>>
>>> пт, 26 июл. 2019 г. в 16:01, Akash Shinde :
>>>
>>>> This issue is not consistent and but occurs sometimes. Does network
>>>> issue make JVM  halt?. As per my understanding node will disconnects from
>>>> cluster if network issue happens.
>>>> But in this case multiple JVMs were terminated.Can it be a bug in
>>>> Ignite 2.6 version?
>>>>
>>>> Thanks,
>>>> Akash
>>>>
>>>> On Fri, Jul 26, 2019 at 4:00 PM Evgenii Zhuravlev <
>>>> e.zhuravlev...@gmail.com> wrote:
>>>>
>>>>> I don't see any specific errors in the logs. For me, it looks like
>>>>> network problems, moreover, on client nodes it prints messages about
>>>>> connection problems. Is this issue reproducible?
>>>>> Evgenii
>>>>>
>>>>> пт, 26 июл. 2019 г. в 09:21, Akash Shinde :
>>>>>
>>>>>> Can someone please help me on this issue ?
>>>>>>
>>>>>> On Wed, Jul 24, 2019 at 12:04 PM Akash Shinde 
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>> Please find attached logs from all server and client nodes.Also
>>>>>>> attached gc logs for each node.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Akash
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jul 23, 2019 at 8:59 PM Evgenii Zhuravlev <
>>>>>>> e.zhuravlev...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Can you please share full logs from the node start from all nodes
>>>>>>>> in the cluster?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Evgenii
>>>>>>>>
>>>>>>>> вт, 23 июл. 2019 г. в 16:51, Akash Shinde :
>>>>>>>>
>>>>>>>>> I am using Ignite 2.6 version.  I have created a cluster of 7
>>>>>>>>> server nodes and three client nodes. Out of seven nodes five nodes 
>>>>>>>>> stopped
>>>>>>>>> unexpectedly with below error logs lines.
>>>>>>>>> I have attached logs of two such server nodes.
>>>>>>>>>
>>>>>>>>> FailureDetectionTimeout is set to 3 ms  in Ignite
>>>>>>>>> configuration.
&g

Re: Server Nodes Stopped Unexpectedly

2019-08-28 Thread Akash Shinde
Hi,

Now I have set the failure detection timeout to 12 mills and I am still
getting this error message intermittently on Ignite 2.6.
It could be a network issue, but I am not able to confirm that the failures
are caused by the network.

1) What are all the possible reasons for the following error? Listing them
might help to narrow down the issue.
 [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException:
Thread tcp-disco-srvr-#3%springDataNode% is terminated unexpectedly.]

2) Will upgrading to the latest Ignite version, 2.7.5 or 2.7.6, solve this
problem?

3) How do you monitor the network? Can you please suggest a tool?

4) I understand that a node gets segmented because of a long GC pause or a
network connectivity problem. Is my understanding correct?

5) What is the purpose of the networkTimeout configuration? In my case it is
set to 1 .

Regards,
Akash

On Mon, Jul 29, 2019 at 2:28 PM Evgenii Zhuravlev 
wrote:

> >Does network issue make JVM  halt?
> There is a failureDetectionTimeout, which will help other nodes in the
> cluster to detect that node is unreachable and to exclude this node from
> topology. So, I believe it could be something like a temporary network
> problem. I would recommend to add some network monitoring to be prepared
> for the next failure.
>
> Best Regards,
> Evgenii
>
> пт, 26 июл. 2019 г. в 16:01, Akash Shinde :
>
>> This issue is not consistent and but occurs sometimes. Does network issue
>> make JVM  halt?. As per my understanding node will disconnects from cluster
>> if network issue happens.
>> But in this case multiple JVMs were terminated.Can it be a bug in Ignite
>> 2.6 version?
>>
>> Thanks,
>> Akash
>>
>> On Fri, Jul 26, 2019 at 4:00 PM Evgenii Zhuravlev <
>> e.zhuravlev...@gmail.com> wrote:
>>
>>> I don't see any specific errors in the logs. For me, it looks like
>>> network problems, moreover, on client nodes it prints messages about
>>> connection problems. Is this issue reproducible?
>>> Evgenii
>>>
>>> пт, 26 июл. 2019 г. в 09:21, Akash Shinde :
>>>
>>>> Can someone please help me on this issue ?
>>>>
>>>> On Wed, Jul 24, 2019 at 12:04 PM Akash Shinde 
>>>> wrote:
>>>>
>>>>> Hi,
>>>>> Please find attached logs from all server and client nodes.Also
>>>>> attached gc logs for each node.
>>>>>
>>>>> Thanks,
>>>>> Akash
>>>>>
>>>>>
>>>>> On Tue, Jul 23, 2019 at 8:59 PM Evgenii Zhuravlev <
>>>>> e.zhuravlev...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Can you please share full logs from the node start from all nodes in
>>>>>> the cluster?
>>>>>>
>>>>>> Thanks,
>>>>>> Evgenii
>>>>>>
>>>>>> вт, 23 июл. 2019 г. в 16:51, Akash Shinde :
>>>>>>
>>>>>>> I am using Ignite 2.6 version.  I have created a cluster of 7 server
>>>>>>> nodes and three client nodes. Out of seven nodes five nodes stopped
>>>>>>> unexpectedly with below error logs lines.
>>>>>>> I have attached logs of two such server nodes.
>>>>>>>
>>>>>>> FailureDetectionTimeout is set to 3 ms  in Ignite configuration.
>>>>>>> Network time out is default.
>>>>>>> ClientFailureDetectionTimeout is set to 3 ms.
>>>>>>>
>>>>>>> I check gc logs but it does not seem to be GC pause issue. I have
>>>>>>> attached GC logs too.
>>>>>>>
>>>>>>> 1) Can someone please help me to identify the reason for this issue?
>>>>>>> 2) Are there any specific reasons which causes this issue or it is a
>>>>>>> bug in Ignite 2.6 version?
>>>>>>>
>>>>>>>
>>>>>>> *ERROR LOGS LINES*
>>>>>>> 2019-07-22 09:22:47,281 19417675 [tcp-disco-srvr-#3%springDataNode%]
>>>>>>> ERROR  - Critical system error detected. Will be handled accordingly to
>>>>>>> configured handler [hnd=class 
>>>>>>> o.a.i.failure.StopNodeOrHaltFailureHandler,
>>>>>>> failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION,
>>>>>>> err=java.lang.IllegalStateException: Thread
>>>>>>> tcp-disco-srvr-#3%springDataNode% is terminated unexpectedly.]]
>>>>>>> java.lang.IllegalStateException: Thread
>>>>>>> tcp-disco-srvr-#3%springDataNode% is terminated unexpectedly.
>>>>>>> at
>>>>>>> org.apache.ignite.spi.discovery.tcp.ServerImpl$TcpServer.body(ServerImpl.java:5686)
>>>>>>> at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
>>>>>>> 2019-07-22 09:22:47,281 19417675 [tcp-disco-srvr-#3%springDataNode%]
>>>>>>> ERROR  - JVM will be halted immediately due to the failure:
>>>>>>> [failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION,
>>>>>>> err=java.lang.IllegalStateException: Thread
>>>>>>> tcp-disco-srvr-#3%springDataNode% is terminated unexpectedly.]]
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Akash
>>>>>>>
>>>>>>


Re: Server Nodes Stopped Unexpectedly

2019-07-26 Thread Akash Shinde
This issue is not consistent; it occurs only sometimes. Does a network issue
make the JVM halt? My understanding is that a node disconnects from the
cluster when a network issue happens.
But in this case multiple JVMs were terminated. Could it be a bug in Ignite
2.6?

Thanks,
Akash
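For what it's worth, the behaviour described above, where the whole JVM halts once a discovery worker dies, matches what the configured StopNodeOrHaltFailureHandler is meant to do on SYSTEM_WORKER_TERMINATION: a failure in a system-critical thread is escalated rather than silently swallowed. A plain-JDK sketch of that escalation pattern (illustrative only, not Ignite code; the thread name is borrowed from the log):

```java
/**
 * Sketch: a dying system-critical thread is escalated to a process-level
 * failure instead of being ignored. The real failure handler would call
 * Runtime.halt() here; this sketch records a verdict string instead so it
 * stays safe to run.
 */
public class CriticalWorkerWatchdog {
    public static String runCriticalWorker(Runnable task) throws InterruptedException {
        StringBuilder verdict = new StringBuilder("worker exited normally");
        Thread worker = new Thread(task, "tcp-disco-srvr-sketch");
        worker.setUncaughtExceptionHandler((t, e) ->
            // A real handler escalates (stop node / halt JVM); we just record it.
            verdict.replace(0, verdict.length(),
                "critical worker " + t.getName() + " terminated: " + e.getMessage()));
        worker.start();
        worker.join(); // handler has run by the time join() returns
        return verdict.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runCriticalWorker(() -> {
            throw new IllegalStateException("terminated unexpectedly");
        }));
    }
}
```

Escalating to a halt instead of recording a message is consistent with the "JVM will be halted immediately due to the failure" lines in the error log, which is why the nodes disappear without a clean shutdown.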

On Fri, Jul 26, 2019 at 4:00 PM Evgenii Zhuravlev 
wrote:

> I don't see any specific errors in the logs. For me, it looks like network
> problems, moreover, on client nodes it prints messages about connection
> problems. Is this issue reproducible?
> Evgenii
>
> пт, 26 июл. 2019 г. в 09:21, Akash Shinde :
>
>> Can someone please help me on this issue ?
>>
>> On Wed, Jul 24, 2019 at 12:04 PM Akash Shinde 
>> wrote:
>>
>>> Hi,
>>> Please find attached logs from all server and client nodes.Also attached
>>> gc logs for each node.
>>>
>>> Thanks,
>>> Akash
>>>
>>>
>>> On Tue, Jul 23, 2019 at 8:59 PM Evgenii Zhuravlev <
>>> e.zhuravlev...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Can you please share full logs from the node start from all nodes in
>>>> the cluster?
>>>>
>>>> Thanks,
>>>> Evgenii
>>>>
>>>> вт, 23 июл. 2019 г. в 16:51, Akash Shinde :
>>>>
>>>>> I am using Ignite 2.6 version.  I have created a cluster of 7 server
>>>>> nodes and three client nodes. Out of seven nodes five nodes stopped
>>>>> unexpectedly with below error logs lines.
>>>>> I have attached logs of two such server nodes.
>>>>>
>>>>> FailureDetectionTimeout is set to 3 ms  in Ignite configuration.
>>>>> Network time out is default.
>>>>> ClientFailureDetectionTimeout is set to 3 ms.
>>>>>
>>>>> I check gc logs but it does not seem to be GC pause issue. I have
>>>>> attached GC logs too.
>>>>>
>>>>> 1) Can someone please help me to identify the reason for this issue?
>>>>> 2) Are there any specific reasons which causes this issue or it is a
>>>>> bug in Ignite 2.6 version?
>>>>>
>>>>>
>>>>> *ERROR LOGS LINES*
>>>>> 2019-07-22 09:22:47,281 19417675 [tcp-disco-srvr-#3%springDataNode%]
>>>>> ERROR  - Critical system error detected. Will be handled accordingly to
>>>>> configured handler [hnd=class o.a.i.failure.StopNodeOrHaltFailureHandler,
>>>>> failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION,
>>>>> err=java.lang.IllegalStateException: Thread
>>>>> tcp-disco-srvr-#3%springDataNode% is terminated unexpectedly.]]
>>>>> java.lang.IllegalStateException: Thread
>>>>> tcp-disco-srvr-#3%springDataNode% is terminated unexpectedly.
>>>>> at
>>>>> org.apache.ignite.spi.discovery.tcp.ServerImpl$TcpServer.body(ServerImpl.java:5686)
>>>>> at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
>>>>> 2019-07-22 09:22:47,281 19417675 [tcp-disco-srvr-#3%springDataNode%]
>>>>> ERROR  - JVM will be halted immediately due to the failure:
>>>>> [failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION,
>>>>> err=java.lang.IllegalStateException: Thread
>>>>> tcp-disco-srvr-#3%springDataNode% is terminated unexpectedly.]]
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Akash
>>>>>
>>>>


Re: Server Nodes Stopped Unexpectedly

2019-07-26 Thread Akash Shinde
Can someone please help me with this issue?

On Wed, Jul 24, 2019 at 12:04 PM Akash Shinde  wrote:

> Hi,
> Please find attached logs from all server and client nodes.Also attached
> gc logs for each node.
>
> Thanks,
> Akash
>
>
> On Tue, Jul 23, 2019 at 8:59 PM Evgenii Zhuravlev <
> e.zhuravlev...@gmail.com> wrote:
>
>> Hi,
>>
>> Can you please share full logs from the node start from all nodes in the
>> cluster?
>>
>> Thanks,
>> Evgenii
>>
>> вт, 23 июл. 2019 г. в 16:51, Akash Shinde :
>>
>>> I am using Ignite 2.6 version.  I have created a cluster of 7 server
>>> nodes and three client nodes. Out of seven nodes five nodes stopped
>>> unexpectedly with below error logs lines.
>>> I have attached logs of two such server nodes.
>>>
>>> FailureDetectionTimeout is set to 3 ms  in Ignite configuration.
>>> Network time out is default.
>>> ClientFailureDetectionTimeout is set to 3 ms.
>>>
>>> I check gc logs but it does not seem to be GC pause issue. I have
>>> attached GC logs too.
>>>
>>> 1) Can someone please help me to identify the reason for this issue?
>>> 2) Are there any specific reasons which causes this issue or it is a bug
>>> in Ignite 2.6 version?
>>>
>>>
>>> *ERROR LOGS LINES*
>>> 2019-07-22 09:22:47,281 19417675 [tcp-disco-srvr-#3%springDataNode%]
>>> ERROR  - Critical system error detected. Will be handled accordingly to
>>> configured handler [hnd=class o.a.i.failure.StopNodeOrHaltFailureHandler,
>>> failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION,
>>> err=java.lang.IllegalStateException: Thread
>>> tcp-disco-srvr-#3%springDataNode% is terminated unexpectedly.]]
>>> java.lang.IllegalStateException: Thread
>>> tcp-disco-srvr-#3%springDataNode% is terminated unexpectedly.
>>> at
>>> org.apache.ignite.spi.discovery.tcp.ServerImpl$TcpServer.body(ServerImpl.java:5686)
>>> at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
>>> 2019-07-22 09:22:47,281 19417675 [tcp-disco-srvr-#3%springDataNode%]
>>> ERROR  - JVM will be halted immediately due to the failure:
>>> [failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION,
>>> err=java.lang.IllegalStateException: Thread
>>> tcp-disco-srvr-#3%springDataNode% is terminated unexpectedly.]]
>>>
>>>
>>> Thanks,
>>> Akash
>>>
>>


Re: After upgrading 2.7 getting Unexpected error occurred during unmarshalling

2019-01-08 Thread Akash Shinde
Hi,

No, both nodes, client and server, are running Ignite 2.7. I am starting
both the server and the client from the IntelliJ IDE.

Version printed in Server node log:
Ignite ver. 2.7.0#20181201-sha1:256ae4012cb143b4855b598b740a6f3499ead4db

Version in client node log:
Ignite ver. 2.7.0#20181201-sha1:256ae4012cb143b4855b598b740a6f3499ead4db

Thanks,
Akash

On Tue, Jan 8, 2019 at 5:18 PM Mikael  wrote:

> Hi!
>
> Any chance you might have one node running 2.6 or something like that ?
>
> It looks like it get a different object that does not match the one
> expected in 2.7
>
> Mikael
> Den 2019-01-08 kl. 12:21, skrev Akash Shinde:
>
> Before submitting the affinity task ignite first gets the affinity cached
> function (AffinityInfo) by submitting the cluster wide task "AffinityJob".
> But while in the process of retrieving the output of this AffinityJob,
> ignite deserializes this output. I am getting exception while deserailizing
> this output.
> In TcpDiscoveryNode.readExternal() method while deserailizing the
> CacheMetrics object from input stream on 14th iteration I am getting
> following exception. Complete stack trace is given in this mail chain.
>
> Caused by: java.io.IOException: Unexpected error occurred during
> unmarshalling of an instance of the class:
> org.apache.ignite.internal.processors.cache.CacheMetricsSnapshot.
>
> This is working fine on Ignite 2.6 version but giving problem on 2.7.
>
> Is this a bug or am I doing something wrong?
>
> Can someone please help?
>
> On Mon, Jan 7, 2019 at 9:41 PM Akash Shinde  wrote:
>
>> Hi,
>>
>> When execute affinity.partition(key), I am getting following exception on
>> Ignite  2.7.
>>
>> Stacktrace:
>>
>> 2019-01-07 21:23:03,093 6699878 [mgmt-#67%springDataNode%] ERROR
>> o.a.i.i.p.task.GridTaskWorker - Error deserializing job response:
>> GridJobExecuteResponse [nodeId=c0c832cb-33b0-4139-b11d-5cafab2fd046,
>> sesId=4778e982861-31445139-523d-4d44-b071-9ca1eb2d73df,
>> jobId=5778e982861-31445139-523d-4d44-b071-9ca1eb2d73df, gridEx=null,
>> isCancelled=false, retry=null]
>> org.apache.ignite.IgniteCheckedException: Failed to unmarshal object with
>> optimized marshaller
>>  at
>> org.apache.ignite.internal.util.IgniteUtils.unmarshal(IgniteUtils.java:10146)
>>  at
>> org.apache.ignite.internal.processors.task.GridTaskWorker.onResponse(GridTaskWorker.java:831)
>>  at
>> org.apache.ignite.internal.processors.task.GridTaskProcessor.processJobExecuteResponse(GridTaskProcessor.java:1081)
>>  at
>> org.apache.ignite.internal.processors.task.GridTaskProcessor$JobMessageListener.onMessage(GridTaskProcessor.java:1316)
>>  at
>> org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1569)
>>  at
>> org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1197)
>>  at
>> org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:127)
>>  at
>> org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1093)
>>  at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>  at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>  at java.lang.Thread.run(Thread.java:748)
>> Caused by: org.apache.ignite.binary.BinaryObjectException: Failed to
>> unmarshal object with optimized marshaller
>>  at
>> org.apache.ignite.internal.binary.BinaryUtils.doReadOptimized(BinaryUtils.java:1765)
>>  at
>> org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1964)
>>  at
>> org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1716)
>>  at
>> org.apache.ignite.internal.binary.GridBinaryMarshaller.deserialize(GridBinaryMarshaller.java:313)
>>  at
>> org.apache.ignite.internal.binary.BinaryMarshaller.unmarshal0(BinaryMarshaller.java:102)
>>  at
>> org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.unmarshal(AbstractNodeNameAwareMarshaller.java:82)
>>  at
>> org.apache.ignite.internal.util.IgniteUtils.unmarshal(IgniteUtils.java:10140)
>>  ... 10 common frames omitted
>> Caused by: org.apache.ignite.IgniteCheckedException: Failed to
>> deserialize object with given class loader:
>> [clsLdr=sun.misc.Launcher$AppClassLoader@18b4aac2, err=Failed to
>> deserialize object
>> [typeName=org.apache.ignite.internal.util.lang.GridTuple3]]
>>  at
>> org.apache.ignite.in

Re: After upgrading 2.7 getting Unexpected error occurred during unmarshalling

2019-01-08 Thread Akash Shinde
Before submitting the affinity task, Ignite first gets the cached affinity
function (AffinityInfo) by submitting the cluster-wide task "AffinityJob".
While retrieving the output of this AffinityJob, Ignite deserializes it, and
that deserialization is where the exception occurs.
In the TcpDiscoveryNode.readExternal() method, while deserializing the
CacheMetrics object from the input stream, on the 14th iteration I get the
following exception. The complete stack trace is given in this mail chain.

Caused by: java.io.IOException: Unexpected error occurred during
unmarshalling of an instance of the class:
org.apache.ignite.internal.processors.cache.CacheMetricsSnapshot.

This works fine on Ignite 2.6 but fails on 2.7.

Is this a bug or am I doing something wrong?

Can someone please help?
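This "Unexpected error occurred during unmarshalling" failure mode is typical of an Externalizable format mismatch: readExternal must consume exactly the fields, in the order, that writeExternal produced, and any extra or missing field corrupts the rest of the stream. A minimal plain-JDK illustration of the mechanism (not Ignite code; the field layout is invented):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

/**
 * Sketch: reader and writer disagreeing on an Externalizable-style field
 * layout. The reader expects one field more than the writer produced, so
 * deserialization fails mid-stream -- the same shape of failure as a
 * version mismatch between nodes.
 */
public class ExternalizableMismatch {
    /** "Old format" writer: an int followed by a long. */
    static byte[] write(int a, long b) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeInt(a);
            out.writeLong(b);
        }
        return bos.toByteArray();
    }

    /** "New format" reader: expects an extra field the writer never wrote. */
    static void readExpectingExtraField(byte[] data) throws IOException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            in.readInt();
            in.readLong();
            in.readInt(); // field added in the newer version -> EOFException
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = write(42, 7L);
        try {
            readExpectingExtraField(data);
            System.out.println("deserialized cleanly");
        } catch (EOFException e) {
            System.out.println("stream ended early: version mismatch");
        }
    }
}
```

This is why mixed node versions, or stale classes on a classpath, commonly produce this class of error; verifying that every JVM in the topology runs the same Ignite binaries is the usual first check.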

On Mon, Jan 7, 2019 at 9:41 PM Akash Shinde  wrote:

> Hi,
>
> When execute affinity.partition(key), I am getting following exception on
> Ignite  2.7.
>
> Stacktrace:
>
> 2019-01-07 21:23:03,093 6699878 [mgmt-#67%springDataNode%] ERROR
> o.a.i.i.p.task.GridTaskWorker - Error deserializing job response:
> GridJobExecuteResponse [nodeId=c0c832cb-33b0-4139-b11d-5cafab2fd046,
> sesId=4778e982861-31445139-523d-4d44-b071-9ca1eb2d73df,
> jobId=5778e982861-31445139-523d-4d44-b071-9ca1eb2d73df, gridEx=null,
> isCancelled=false, retry=null]
> org.apache.ignite.IgniteCheckedException: Failed to unmarshal object with
> optimized marshaller
>  at
> org.apache.ignite.internal.util.IgniteUtils.unmarshal(IgniteUtils.java:10146)
>  at
> org.apache.ignite.internal.processors.task.GridTaskWorker.onResponse(GridTaskWorker.java:831)
>  at
> org.apache.ignite.internal.processors.task.GridTaskProcessor.processJobExecuteResponse(GridTaskProcessor.java:1081)
>  at
> org.apache.ignite.internal.processors.task.GridTaskProcessor$JobMessageListener.onMessage(GridTaskProcessor.java:1316)
>  at
> org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1569)
>  at
> org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1197)
>  at
> org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:127)
>  at
> org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1093)
>  at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.ignite.binary.BinaryObjectException: Failed to
> unmarshal object with optimized marshaller
>  at
> org.apache.ignite.internal.binary.BinaryUtils.doReadOptimized(BinaryUtils.java:1765)
>  at
> org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1964)
>  at
> org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1716)
>  at
> org.apache.ignite.internal.binary.GridBinaryMarshaller.deserialize(GridBinaryMarshaller.java:313)
>  at
> org.apache.ignite.internal.binary.BinaryMarshaller.unmarshal0(BinaryMarshaller.java:102)
>  at
> org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.unmarshal(AbstractNodeNameAwareMarshaller.java:82)
>  at
> org.apache.ignite.internal.util.IgniteUtils.unmarshal(IgniteUtils.java:10140)
>  ... 10 common frames omitted
> Caused by: org.apache.ignite.IgniteCheckedException: Failed to deserialize
> object with given class loader:
> [clsLdr=sun.misc.Launcher$AppClassLoader@18b4aac2, err=Failed to
> deserialize object
> [typeName=org.apache.ignite.internal.util.lang.GridTuple3]]
>  at
> org.apache.ignite.internal.marshaller.optimized.OptimizedMarshaller.unmarshal0(OptimizedMarshaller.java:237)
>  at
> org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.unmarshal(AbstractNodeNameAwareMarshaller.java:94)
>  at
> org.apache.ignite.internal.binary.BinaryUtils.doReadOptimized(BinaryUtils.java:1762)
>  ... 16 common frames omitted
> Caused by: java.io.IOException: Failed to deserialize object
> [typeName=org.apache.ignite.internal.util.lang.GridTuple3]
>  at
> org.apache.ignite.internal.marshaller.optimized.OptimizedObjectInputStream.readObject0(OptimizedObjectInputStream.java:350)
>  at
> org.apache.ignite.internal.marshaller.optimized.OptimizedObjectInputStream.readObjectOverride(OptimizedObjectInputStream.java:198)
>  at java.io.ObjectInputStream.readObject(ObjectInputStream.java:425)
>  at
> org.apache.ignite.internal.marshaller.optimized.OptimizedMarshaller.unmarshal0(OptimizedMarshaller.java:228)
>  ... 18 common frames omitted
> Caused by: java.io.IOException: Fail

Re: nodes getting disconnected from cluster

2019-01-07 Thread Akash Shinde
Hi,
Could someone please help me with this issue?

Thanks,
Akash

On Thu, Jan 3, 2019 at 5:46 PM Akash Shinde  wrote:

> Hi,
>
> I am getting " Timed out waiting for message delivery receipt" WARN
> message in my logs.
> But I am sure that it is not happening because of long GC pause. I have
> check the memory utilization and it is very low.
>
> I also tried to check the connectivity between two nodes between which the
> timeout is happening.
> bandwidth is as shown below.
>
> [ ID] Interval   Transfer Bandwidth
> [  4]  0.0-10.1 sec   855 MBytes   708 Mbits/sec
>
> Many times I get following message in my logs. Is it because two nodes are
> not able communicate within given time limit?
>
> *ERROR:*
>  Blocked system-critical thread has been detected. This can lead to
> cluster-wide undefined behaviour [threadName=tcp-disco-msg-worker,
> blockedFor=14s]
>
> I have also attached log snippet. Can some one please help to narrow down
> the issue?
>
> Thanks,
> Akash
>


After upgrading 2.7 getting Unexpected error occurred during unmarshalling

2019-01-07 Thread Akash Shinde
Hi,

When I execute affinity.partition(key), I get the following exception on
Ignite 2.7.

Stacktrace:

2019-01-07 21:23:03,093 6699878 [mgmt-#67%springDataNode%] ERROR
o.a.i.i.p.task.GridTaskWorker - Error deserializing job response:
GridJobExecuteResponse [nodeId=c0c832cb-33b0-4139-b11d-5cafab2fd046,
sesId=4778e982861-31445139-523d-4d44-b071-9ca1eb2d73df,
jobId=5778e982861-31445139-523d-4d44-b071-9ca1eb2d73df, gridEx=null,
isCancelled=false, retry=null]
org.apache.ignite.IgniteCheckedException: Failed to unmarshal object with
optimized marshaller
 at
org.apache.ignite.internal.util.IgniteUtils.unmarshal(IgniteUtils.java:10146)
 at
org.apache.ignite.internal.processors.task.GridTaskWorker.onResponse(GridTaskWorker.java:831)
 at
org.apache.ignite.internal.processors.task.GridTaskProcessor.processJobExecuteResponse(GridTaskProcessor.java:1081)
 at
org.apache.ignite.internal.processors.task.GridTaskProcessor$JobMessageListener.onMessage(GridTaskProcessor.java:1316)
 at
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1569)
 at
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1197)
 at
org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:127)
 at
org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1093)
 at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.ignite.binary.BinaryObjectException: Failed to
unmarshal object with optimized marshaller
 at
org.apache.ignite.internal.binary.BinaryUtils.doReadOptimized(BinaryUtils.java:1765)
 at
org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1964)
 at
org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1716)
 at
org.apache.ignite.internal.binary.GridBinaryMarshaller.deserialize(GridBinaryMarshaller.java:313)
 at
org.apache.ignite.internal.binary.BinaryMarshaller.unmarshal0(BinaryMarshaller.java:102)
 at
org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.unmarshal(AbstractNodeNameAwareMarshaller.java:82)
 at
org.apache.ignite.internal.util.IgniteUtils.unmarshal(IgniteUtils.java:10140)
 ... 10 common frames omitted
Caused by: org.apache.ignite.IgniteCheckedException: Failed to deserialize
object with given class loader:
[clsLdr=sun.misc.Launcher$AppClassLoader@18b4aac2, err=Failed to
deserialize object
[typeName=org.apache.ignite.internal.util.lang.GridTuple3]]
 at
org.apache.ignite.internal.marshaller.optimized.OptimizedMarshaller.unmarshal0(OptimizedMarshaller.java:237)
 at
org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.unmarshal(AbstractNodeNameAwareMarshaller.java:94)
 at
org.apache.ignite.internal.binary.BinaryUtils.doReadOptimized(BinaryUtils.java:1762)
 ... 16 common frames omitted
Caused by: java.io.IOException: Failed to deserialize object
[typeName=org.apache.ignite.internal.util.lang.GridTuple3]
 at
org.apache.ignite.internal.marshaller.optimized.OptimizedObjectInputStream.readObject0(OptimizedObjectInputStream.java:350)
 at
org.apache.ignite.internal.marshaller.optimized.OptimizedObjectInputStream.readObjectOverride(OptimizedObjectInputStream.java:198)
 at java.io.ObjectInputStream.readObject(ObjectInputStream.java:425)
 at
org.apache.ignite.internal.marshaller.optimized.OptimizedMarshaller.unmarshal0(OptimizedMarshaller.java:228)
 ... 18 common frames omitted
Caused by: java.io.IOException: Failed to deserialize object
[typeName=org.apache.ignite.internal.processors.affinity.GridAffinityAssignment]
 at
org.apache.ignite.internal.marshaller.optimized.OptimizedObjectInputStream.readObject0(OptimizedObjectInputStream.java:350)
 at
org.apache.ignite.internal.marshaller.optimized.OptimizedObjectInputStream.readObjectOverride(OptimizedObjectInputStream.java:198)
 at java.io.ObjectInputStream.readObject(ObjectInputStream.java:425)
 at
org.apache.ignite.internal.util.lang.GridTuple3.readExternal(GridTuple3.java:197)
 at
org.apache.ignite.internal.marshaller.optimized.OptimizedObjectInputStream.readExternalizable(OptimizedObjectInputStream.java:555)
 at
org.apache.ignite.internal.marshaller.optimized.OptimizedClassDescriptor.read(OptimizedClassDescriptor.java:949)
 at
org.apache.ignite.internal.marshaller.optimized.OptimizedObjectInputStream.readObject0(OptimizedObjectInputStream.java:346)
 ... 21 common frames omitted
Caused by: java.io.IOException: Failed to deserialize field
[name=assignment]
 at
org.apache.ignite.internal.marshaller.optimized.OptimizedObjectInputStream.readFields(OptimizedObjectInputStream.java:526)
 at
org.apache.ignite.internal.marshaller.optimized.OptimizedObjectInputStream.readSerializable(OptimizedObjectInputStream.java:611)
 at

Error in write-through

2018-11-12 Thread Akash Shinde
Hi,
I have started four Ignite nodes and configured a cache in distributed mode.
When I initiated thousands of requests to write data to this cache
(write-through enabled), I hit the error below.
From the logs we can see this error occurs while writing to the Oracle
database (using cache write-through).
The error is not consistent: the node stalls for a while after it occurs and
then continues to pick up the next Ignite tasks.
Could someone please advise what the following log means?


2018-11-13 05:52:05,577 2377545 [core-1] INFO
c.q.a.a.s.AssetManagementService - Add asset request processing started,
requestId ADD_Ip_483, subscriptionId =262604, userId=547159
2018-11-13 05:52:06,647 2378615 [grid-timeout-worker-#39%springDataNode%]
WARN  o.a.ignite.internal.util.typedef.G - >>> Possible starvation in
striped pool.
Thread name: sys-stripe-11-#12%springDataNode%
Queue: [Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE,
topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false,
msg=GridNearSingleGetResponse [futId=1542085977929, res=BinaryObjectImpl
[arr= true, ctx=false, start=0], topVer=null, err=null, flags=0]]], Message
closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8,
ordered=false, timeout=0, skipOnTimeout=false, msg=GridDhtTxPrepareResponse
[nearEvicted=null, futId=b67ac7b0761-93ebea72-bf4e-40d8-8a19-d3258be94ce9,
miniId=1, super=GridDistributedTxPrepareResponse [txState=null, part=-1,
err=null, super=GridDistributedBaseMessage [ver=GridCacheVersion
[topVer=153565953, order=1542089536997, nodeOrder=3], committedVers=null,
rolledbackVers=null, cnt=0, super=GridCacheIdMessage [cacheId=0]],
Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8,
ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearSingleGetRequest
[futId=1542085976178, key=BinaryObjectImpl [arr= true, ctx=false, start=0],
flags=1, topVer=AffinityTopologyVersion [topVer=7, minorTopVer=0],
subjId=9e8db7e7-48ba-4161-881b-ad4fcfc175a0, taskNameHash=0, createTtl=-1,
accessTtl=-1
Deadlock: false
Completed: 703
Thread [name="sys-stripe-11-#12%springDataNode%", id=41, state=RUNNABLE,
blockCnt=37, waitCnt=729]
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at oracle.net.ns.Packet.receive(Packet.java:311)
at oracle.net.ns.DataPacket.receive(DataPacket.java:105)
at
oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:305)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:249)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:171)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:89)
at
oracle.jdbc.driver.T4CSocketInputStreamWrapper.readNextPacket(T4CSocketInputStreamWrapper.java:123)
at
oracle.jdbc.driver.T4CSocketInputStreamWrapper.read(T4CSocketInputStreamWrapper.java:79)
at
oracle.jdbc.driver.T4CMAREngineStream.unmarshalUB1(T4CMAREngineStream.java:429)
at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:397)
at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:257)
at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:587)
at
oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:225)
at
oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:53)
at
oracle.jdbc.driver.T4CPreparedStatement.executeForRows(T4CPreparedStatement.java:943)
at
oracle.jdbc.driver.OraclePreparedStatement.executeForRowsWithTimeout(OraclePreparedStatement.java:12029)
at
oracle.jdbc.driver.OraclePreparedStatement.executeBatch(OraclePreparedStatement.java:12140)
at
oracle.jdbc.driver.OracleStatementWrapper.executeBatch(OracleStatementWrapper.java:246)
at
com.zaxxer.hikari.pool.ProxyStatement.executeBatch(ProxyStatement.java:128)
at
com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeBatch(HikariProxyPreparedStatement.java)
at
com.qualys.agms.grid.cache.loader.AbstractDefaultCacheStore.writeAll(AbstractDefaultCacheStore.java:126)
at
o.a.i.i.processors.cache.store.GridCacheStoreManagerAdapter.putAll(GridCacheStoreManagerAdapter.java:641)
at
o.a.i.i.processors.cache.transactions.IgniteTxAdapter.batchStoreCommit(IgniteTxAdapter.java:1422)
at
o.a.i.i.processors.cache.transactions.IgniteTxLocalAdapter.userCommit(IgniteTxLocalAdapter.java:502)
at
o.a.i.i.processors.cache.distributed.near.GridNearTxLocal.localFinish(GridNearTxLocal.java:3185)
at
o.a.i.i.processors.cache.distributed.near.GridNearTxFinishFuture.doFinish(GridNearTxFinishFuture.java:467)
at
o.a.i.i.processors.cache.distributed.near.GridNearTxFinishFuture.finish(GridNearTxFinishFuture.java:417)
at
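
The stack above shows a striped-pool thread blocked in a synchronous Oracle batch write inside transaction commit, which is exactly what triggers the starvation warning. One way to take the database write off the striped pool is Ignite's write-behind mode. This is a sketch only: the cache name and flush settings are illustrative, not tuned values, and write-behind changes durability semantics because writes reach Oracle asynchronously.

```java
import org.apache.ignite.configuration.CacheConfiguration;

public class WriteBehindCacheConfig {
    static CacheConfiguration<Long, Object> create() {
        CacheConfiguration<Long, Object> cfg = new CacheConfiguration<>("assetCache");

        cfg.setWriteThrough(true);           // updates still reach the CacheStore...
        cfg.setWriteBehindEnabled(true);     // ...but asynchronously, off the striped pool
        cfg.setWriteBehindFlushSize(10_240);     // flush once this many entries are buffered
        cfg.setWriteBehindFlushFrequency(5_000); // ...or every 5 seconds, whichever comes first
        cfg.setWriteBehindBatchSize(512);        // entries per writeAll() batch

        return cfg;
    }
}
```

With this in place, slow JDBC batches happen on the flusher threads rather than inside the commit path, at the cost of a window where committed entries have not yet been persisted.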

Cache Metrics

2018-10-26 Thread Akash Shinde
Hi,
I have captured the two Ignite cache metrics below:

1) igniteCache.metrics().getSize()
2) igniteCache.metrics().getOffHeapEntriesCount()

I started three nodes with a distributed load.
When I started filling the cache, I observed that the off-heap entry count is
approximately double the cache size.

Why is the offHeapEntries count approximately double?
Does offHeapEntries include the primary count plus the backup counts?
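
With one backup configured, the off-heap entry metric counts backup copies as well as primaries on each node, which would account for the roughly 2x ratio. One way to confirm this (a sketch against a running cluster, not tested here; the cache name is illustrative) is to compare sizes by peek mode:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.CachePeekMode;

public class SizeCheck {
    // With backups=1, ALL should be roughly PRIMARY + BACKUP ≈ 2 * PRIMARY.
    static void printSizes(Ignite ignite, String cacheName) {
        IgniteCache<Object, Object> cache = ignite.cache(cacheName);
        int primary = cache.size(CachePeekMode.PRIMARY); // logical entries only
        int backup  = cache.size(CachePeekMode.BACKUP);  // backup copies
        int all     = cache.size(CachePeekMode.ALL);     // primary + backup
        System.out.printf("primary=%d backup=%d all=%d%n", primary, backup, all);
    }
}
```

If `all` is about twice `primary`, the doubling is simply the backup copies being counted.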

Thanks,
Akash


Re: Query execution too long even after providing index

2018-09-10 Thread Akash Shinde
Hello guys,

I am also facing a similar problem. Do any community users have a
solution for this?

This has become a blocking issue for me. Can someone please help?

Thanks,
Akash


On Mon, Sep 10, 2018 at 8:33 AM Prasad Bhalerao <
prasadbhalerao1...@gmail.com> wrote:

> Guys, is there any solution for this?
> Can someone please respond?
>
> Thanks,
> Prasad
>
>
> -- Forwarded message -
> From: Prasad Bhalerao 
> Date: Fri, Sep 7, 2018, 8:04 AM
> Subject: Fwd: Query execution too long even after providing index
> To: 
>
>
> Can we have an update on this?
>
> -- Forwarded message -
> From: Prasad Bhalerao 
> Date: Wed, Sep 5, 2018, 11:34 AM
> Subject: Re: Query execution too long even after providing index
> To: 
>
>
> Hi Andrey,
>
> Can you please help me with this?
>
> Thanks,
> Prasad
>
> On Tue, Sep 4, 2018 at 2:08 PM Prasad Bhalerao <
> prasadbhalerao1...@gmail.com> wrote:
>
>>
>> I tried changing SqlIndexMaxInlineSize to 32 bytes and 100 bytes using the
>> cache config.
>>
>> ipContainerIpV4CacheCfg.setSqlIndexMaxInlineSize(32/100);
>>
>> But it did not improve the SQL execution time. The execution time
>> increases as the cache size grows.
>>
>> It is a simple range scan query. Which part of the execution process
>> might take time in this case?
>>
>> Can you please advise?
>>
>> Thanks,
>> Prasad
>>
>> On Mon, Sep 3, 2018 at 8:06 PM Andrey Mashenkov <
>> andrey.mashen...@gmail.com> wrote:
>>
>>> HI,
>>>
>>> Have you tried to increase index inlineSize? It is 10 bytes by default.
>>>
>>> Your indices use simple value types (Java primitives), so all columns
>>> can be easily inlined.
>>> It should be enough to increase inlineSize to 32 bytes for idx1 (3 longs + 1
>>> int = 3*(8 /*long*/ + 1 /*type code*/) + (4 /*int*/ + 1 /*type code*/)) to
>>> inline all of its columns, and to 27 bytes (3 longs) for idx2.
>>>
>>> You can try to benchmark queries with different inline sizes to find
>>> optimal ratio between speedup and index size.
>>>
>>>
>>>
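
The inline-size arithmetic suggested above (one type-code byte per inlined value) can be written out explicitly. The helper class name is made up for illustration; the resulting value would be applied via `CacheConfiguration.setSqlIndexMaxInlineSize(...)`, which is a single cache-wide setting, so the larger of the two values covers both indexes.

```java
public class InlineSizeCalc {
    // Each inlined column costs its value width plus one type-code byte:
    // 8 + 1 bytes per long column, 4 + 1 bytes per int column.
    static int inlineSize(int longCols, int intCols) {
        return longCols * (8 + 1) + intCols * (4 + 1);
    }

    public static void main(String[] args) {
        // idx1: subscriptionId, ipStart, ipEnd (longs) + moduleId (int)
        System.out.println("idx1 inline size: " + inlineSize(3, 1));
        // idx2: subscriptionId, ipStart, ipEnd (longs)
        System.out.println("idx2 inline size: " + inlineSize(3, 0));
    }
}
```

Benchmarking with a few values around these (e.g. 32 and 64) is still worthwhile, since a larger inline size also grows the index pages.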
>>> On Mon, Sep 3, 2018 at 5:12 PM Prasad Bhalerao <
>>> prasadbhalerao1...@gmail.com> wrote:
>>>
 Hi,
 My cache has 1 million rows and the SQL is as follows.
 This SQL takes around 1.836 seconds to execute, and the time
 increases as I add data to the cache. Sometimes it takes more
 than 4 seconds.

 Is there any way to improve the execution time?

 *SQL:*
 SELECT id, moduleId, ipEnd, ipStart
 FROM IpContainerIpV4Data USE INDEX(ip_container_ipv4_idx1)
 WHERE subscriptionId = ? AND moduleId = ? AND (ipStart <= ? AND ipEnd >= ?)
 UNION ALL
 SELECT id, moduleId, ipEnd, ipStart
 FROM IpContainerIpV4Data USE INDEX(ip_container_ipv4_idx1)
 WHERE subscriptionId = ? AND moduleId = ? AND (ipStart <= ? AND ipEnd >= ?)
 UNION ALL
 SELECT id, moduleId, ipEnd, ipStart
 FROM IpContainerIpV4Data USE INDEX(ip_container_ipv4_idx1)
 WHERE subscriptionId = ? AND moduleId = ? AND (ipStart >= ? AND ipEnd <= ?)

 *Indexes are as follows:*

 public class IpContainerIpV4Data implements Data, 
 UpdatableData {

   @QuerySqlField
   private long id;

   @QuerySqlField(orderedGroups = {@QuerySqlField.Group(name = 
 "ip_container_ipv4_idx1", order = 1)})
   private int moduleId;

   @QuerySqlField(orderedGroups = {@QuerySqlField.Group(name = 
 "ip_container_ipv4_idx1", order = 0),
   @QuerySqlField.Group(name = "ip_container_ipv4_idx2", order = 0)})
   private long subscriptionId;

   @QuerySqlField(orderedGroups = {@QuerySqlField.Group(name = 
 "ip_container_ipv4_idx1", order = 3, descending = true),
   @QuerySqlField.Group(name = "ip_container_ipv4_idx2", order = 2, 
 descending = true)})
   private long ipEnd;

   @QuerySqlField(orderedGroups = {@QuerySqlField.Group(name = 
 "ip_container_ipv4_idx1", order = 2),
   @QuerySqlField.Group(name = "ip_container_ipv4_idx2", order = 1)})
   private long ipStart;

 }


 *Execution Plan:*

 2018-09-03 19:32:03,098 232176 [pub-#78%springDataNode%] INFO
 c.q.a.g.d.IpContainerIpV4DataGridDaoImpl - SELECT
 __Z0.ID AS __C0_0,
 __Z0.MODULEID AS __C0_1,
 __Z0.IPEND AS __C0_2,
 __Z0.IPSTART AS __C0_3
 FROM IP_CONTAINER_IPV4_CACHE.IPCONTAINERIPV4DATA __Z0 USE INDEX
 (IP_CONTAINER_IPV4_IDX1)
 /* IP_CONTAINER_IPV4_CACHE.IP_CONTAINER_IPV4_IDX1: SUBSCRIPTIONID =
 ?1
 AND MODULEID = ?2
 AND IPSTART <= ?3
 AND IPEND >= ?4
  */
 WHERE ((__Z0.SUBSCRIPTIONID = ?1)
 AND (__Z0.MODULEID = ?2))
 AND ((__Z0.IPSTART <= ?3)
 AND (__Z0.IPEND >= ?4))
 2018-09-03 19:32:03,098 232176 [pub-#78%springDataNode%] INFO
 c.q.a.g.d.IpContainerIpV4DataGridDaoImpl - SELECT
 __Z1.ID AS __C1_0,
 

Re: Partitions distribution across nodes

2018-08-08 Thread akash shinde
Hi,

I introduced a delay of 5 seconds and it worked.

1) What is the exchange process, and how can I tell when it has
finished?

2) I am doing partition-key-aware data loading, and I want to start the load
process from server nodes only, not from client nodes. I want to initiate
the load only after all the configured nodes are up and
running.
For that I am using a distributed count-down latch. Each node, when started,
reduces the count on the LifecycleEventType.AFTER_NODE_START event. When the
latch count reaches zero, the cache.loadCache() method is invoked, and it is
always executed from the node that joined the cluster
last.
Is there a better way to achieve this?

3) I also want to make sure that if another node joins the cluster after
the data loading is complete, cache.loadCache() is not
invoked again and the data is made available to the new node via the
rebalancing process.
I am thinking of using some variable to indicate that cache loading is
complete. Does Ignite have any built-in feature to achieve this?


Code is as shown below to get the ignite partitions.

private List getPrimaryParitionIdsLocalToNode() {
  Affinity affinity = igniteSpringBean.affinity(cacheName);
  ClusterNode locNode = igniteSpringBean.cluster().localNode();
  List primaryPartitionIds =
Arrays.stream(affinity.primaryPartitions(locNode)).boxed()
  .collect(Collectors.toList());
  LOGGER.info("Primary Partition Ids for Node {} are {}",
locNode.id(), primaryPartitionIds);
  LOGGER.info("Number of Primary Partition Ids for Node {} are {}",
locNode.id(), primaryPartitionIds.size());
  return primaryPartitionIds;
}

private List getBackupParitionIdsLocalToNode() {
  Affinity affinity = igniteSpringBean.affinity(cacheName);
  ClusterNode locNode = igniteSpringBean.cluster().localNode();
  List backPartitionIds =
Arrays.stream(affinity.backupPartitions(locNode)).boxed()
  .collect(Collectors.toList());
  LOGGER.info("Backup Partition Ids for Node {} are {}", locNode.id(),
backPartitionIds);
  LOGGER.info("Number of Backup Partition Ids for Node {} are {}",
locNode.id(), backPartitionIds.size());
  return backPartitionIds;
}
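
For questions (2) and (3) above, one approach (a sketch only; the names "load-latch" and "load-done" are made up, and LOADER_NODES stands for the expected server-node count) combines Ignite's distributed count-down latch with a distributed atomic flag, so the load starts only after all expected nodes join, runs on exactly one node, and is skipped by late joiners:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteAtomicLong;
import org.apache.ignite.IgniteCountDownLatch;

public class CoordinatedLoad {
    static final int LOADER_NODES = 4; // expected server-node count (assumption)

    static void loadOnce(Ignite ignite) {
        // Every server node counts down once it is up...
        IgniteCountDownLatch latch =
            ignite.countDownLatch("load-latch", LOADER_NODES, false, true);
        latch.countDown();
        latch.await(); // ...and blocks until all expected nodes have joined.

        // The 0 -> 1 transition succeeds on exactly one node;
        // nodes that join later see 1 and skip the load.
        IgniteAtomicLong done = ignite.atomicLong("load-done", 0, true);
        if (done.compareAndSet(0, 1))
            ignite.cache("myCache").loadCache(null); // cache name is illustrative
    }
}
```

A node that joins after the flag is set simply receives its partitions through rebalancing, which is the behavior asked for in (3).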


Thanks,
Akash


On Wed, Aug 8, 2018 at 1:16 PM dkarachentsev 
wrote:

> Hi Akash,
>
> How do you measure partition distribution? Can you provide code for that
> test? I can assume that you get partitions before exchange process if
> finished. Try to use delay in 5 sec after all nodes are started and check
> again.
>
> Thanks!
> -Dmitry
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


Partitions distribution across nodes

2018-08-07 Thread akash shinde
Hi,
I am loading the cache in partition-aware mode. I have started four nodes.
Out of these four nodes, two are loading only backup partitions and the
other two are loading only primary partitions.

As per my understanding, each node should have both primary and backup
partitions.

But in my cluster distributions of partitions looks like this

Node     Primary partitions   Backup partitions
NODE 1   518                  0
NODE 2   0                    498
NODE 3   506                  0
NODE 4   0                    503

*Configuration of Cache*

CacheConfiguration ipv4AssetGroupDetailCacheCfg = new CacheConfiguration<>(
CacheName.IPV4_ASSET_GROUP_DETAIL_CACHE.name());
ipv4AssetGroupDetailCacheCfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
ipv4AssetGroupDetailCacheCfg.setWriteThrough(true);
ipv4AssetGroupDetailCacheCfg.setReadThrough(false);
ipv4AssetGroupDetailCacheCfg.setRebalanceMode(CacheRebalanceMode.ASYNC);
ipv4AssetGroupDetailCacheCfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
ipv4AssetGroupDetailCacheCfg.setBackups(1);
ipv4AssetGroupDetailCacheCfg.setIndexedTypes(DefaultDataAffinityKey.class,
IpV4AssetGroupData.class);

Factory storeFactory =
FactoryBuilder.factoryOf(IpV4AssetGroupCacheStore.class);
ipv4AssetGroupDetailCacheCfg.setCacheStoreFactory(storeFactory);
ipv4AssetGroupDetailCacheCfg.setCacheStoreSessionListenerFactories(cacheStoreSessionListenerFactory());

RendezvousAffinityFunction affinityFunction = new RendezvousAffinityFunction();
affinityFunction.setExcludeNeighbors(false);
ipv4AssetGroupDetailCacheCfg.setAffinity(affinityFunction);



*Could someone please advise why there is no balanced distribution of
primary and backup partitions?*

*Thanks,*
*Akash*


Re: ***UNCHECKED*** Executing SQL on cache using affinnity key

2018-07-25 Thread akash shinde
Hi Evgenii,

Are you saying that if the affinityKey column is present in the WHERE clause
of the SQL statement, the SQL engine will detect the node where this key is
placed and execute the query on that one node only?

I just want to make sure that the SQL is executed only on the node where the
affinity key is placed.

Thanks,
Akash

On Wed, Jul 25, 2018 at 3:45 PM, Ilya Kasnacheev 
wrote:

> Hello!
>
> I hope that Ignite statement planner will make use of affinity key if you
> provide it in the request. I.e. it will not be broadcast if you provide
> affinity key.
>
> Regards,
>
> --
> Ilya Kasnacheev
>
> 2018-07-25 10:58 GMT+03:00 Stanislav Lukyanov :
>
>> What do you mean by “execute select query on cache using affinity key”
>>
>> and what is the problem you’re trying to solve?
>>
>>
>>
>> Stan
>>
>>
>>
>> *From: *Prasad Bhalerao 
>> *Sent: *25 июля 2018 г. 10:03
>> *To: *user@ignite.apache.org
>> *Subject: UNCHECKED*** Executing SQL on cache using affinnity key
>>
>>
>>
>> Hi,
>>
>>
>>
>> Is there any way to execute select query on cache using affinity key?
>>
>>
>>
>> As per this link: https://apacheignite.readme.io/docs/collocate-compute-
>> and-data#section-affinity-call-and-run-methods
>>
>>
>>
>> It can be done as follows:
>>
>>
>>
>> compute.affinityCall(CACHE_NAME, affinityKey, () -> {
>>   SqlFieldsQuery sqlFieldsQuery = new
>> SqlFieldsQuery(SELECT_STATEMENT).setArgs(args);
>>   sqlFieldsQuery.setDistributedJoins(false);
>>   sqlFieldsQuery.setLocal(true);
>>   sqlFieldsQuery.setCollocated(true);
>>   return cache().query(sqlFieldsQuery);
>> });
>>
>> Is there any other way to do it?
>>
>>
>>
>> Thanks,
>>
>> Prasad
>>
>>
>>
>
>


Exception while running sql inside ignite transaction

2018-07-13 Thread akash shinde
Hi,
I am facing the exception below while executing a query on the cache. The
exception is not consistent.
I am starting the transaction in optimistic mode. Could someone please advise?

try (Transaction transaction = igniteTx
        .txStart(TransactionConcurrency.OPTIMISTIC,
                 TransactionIsolation.SERIALIZABLE)) {
    // ... cache operations / SQL query (elided in the original post) ...
    transaction.commit(); // closing brace and commit restored for readability
}



2018-07-13 19:50:39,877 325145
[grid-nio-worker-tcp-comm-1-#26%springDataNode%] INFO
o.a.i.s.c.tcp.TcpCommunicationSpi - Accepted incoming communication
connection [locAddr=/0:0:0:0:0:0:0:1:47100, rmtAddr=/0:0:0:0:0:0:0:1:56792]
2018-07-13 19:50:39,925 325193 [pub-#69%springDataNode%] ERROR
o.a.i.i.p.q.h.t.GridMapQueryExecutor - Failed to execute local query.
org.apache.ignite.IgniteCheckedException: Failed to execute SQL query.
General error: "class org.apache.ignite.IgniteInterruptedException: Thread
got interrupted while trying to acquire table lock."; SQL statement:
SELECT
__Z0.ID __C0_0
FROM ASSET_GROUP_DOMAIN_CACHE.ASSETGROUPDOMAINDATA __Z0
WHERE __Z0.DOMAINID = ?1 [5-196]
at
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.executeSqlQuery(IgniteH2Indexing.java:1088)
at
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.executeSqlQueryWithTimer(IgniteH2Indexing.java:1149)
at
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.executeSqlQueryWithTimer(IgniteH2Indexing.java:1127)
at
org.apache.ignite.internal.processors.query.h2.twostep.GridMapQueryExecutor.onQueryRequest0(GridMapQueryExecutor.java:670)
at
org.apache.ignite.internal.processors.query.h2.twostep.GridMapQueryExecutor.onQueryRequest(GridMapQueryExecutor.java:516)
at
org.apache.ignite.internal.processors.query.h2.twostep.GridMapQueryExecutor.onMessage(GridMapQueryExecutor.java:214)
at
org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor$1.applyx(GridReduceQueryExecutor.java:154)
at
org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor$1.applyx(GridReduceQueryExecutor.java:152)
at
org.apache.ignite.internal.util.lang.IgniteInClosure2X.apply(IgniteInClosure2X.java:38)
at
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.send(IgniteH2Indexing.java:2555)
at
org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.send(GridReduceQueryExecutor.java:1419)
at
org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.query(GridReduceQueryExecutor.java:732)
at
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing$8.iterator(IgniteH2Indexing.java:1339)
at
org.apache.ignite.internal.processors.cache.QueryCursorImpl.iterator(QueryCursorImpl.java:95)
at
com.qualys.agms.grid.dao.AssetGroupDomainDataDaoImpl.getAssetGroupDomainsIds(AssetGroupDomainDataDaoImpl.java:69)
at
com.qualys.agms.grid.dataservice.AssetGroupDomainDataGridServiceImpl.getAssetGroupDomainsIds(AssetGroupDomainDataGridServiceImpl.java:21)
at
com.qualys.agms.task.handler.DomainNetBlockEditor.editDomainNetblockOverride(DomainNetBlockEditor.java:64)
at
com.qualys.agms.task.handler.DomainNetBlockEditor.edit(DomainNetBlockEditor.java:58)
at
com.qualys.agms.task.ignite.EditDomainIgniteTask.call(EditDomainIgniteTask.java:56)
at
com.qualys.agms.task.ignite.EditDomainIgniteTask.call(EditDomainIgniteTask.java:22)
at
org.apache.ignite.internal.processors.closure.GridClosureProcessor$C2.execute(GridClosureProcessor.java:1855)
at
org.apache.ignite.internal.processors.job.GridJobWorker$2.call(GridJobWorker.java:566)
at
org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6623)
at
org.apache.ignite.internal.processors.job.GridJobWorker.execute0(GridJobWorker.java:560)
at
org.apache.ignite.internal.processors.job.GridJobWorker.body(GridJobWorker.java:489)
at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
at
org.apache.ignite.internal.processors.job.GridJobProcessor.processJobExecuteRequest(GridJobProcessor.java:1189)
at
org.apache.ignite.internal.processors.job.GridJobProcessor$JobExecutionListener.onMessage(GridJobProcessor.java:1921)
at
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1555)
at
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1183)
at
org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:126)
at
org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1090)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.h2.jdbc.JdbcSQLException: General error: "class
org.apache.ignite.IgniteInterruptedException: Thread got interrupted while
trying to acquire table lock."; SQL statement:

Thanks,
Akash


code hangs up on cache().removeAll(set) operation

2018-06-04 Thread akash shinde
Hi,

My application hangs when I execute the following code. I tried to debug the
Ignite source code, but no luck. The complete thread dump is attached to this
mail.

cache().removeAll(set)

Can someone please advise?
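
The dump below shows removeAll() parked on an internal future while an HTTP thread on the same grid is also blocked in affinityRun(); whether that interaction is the root cause here is not certain. As a defensive sketch (not a fix for the underlying hang), the async variant with a bounded wait at least turns an indefinite stall into a diagnosable timeout; the 30-second bound is an illustrative value:

```java
import java.util.Set;
import java.util.concurrent.TimeUnit;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.lang.IgniteFuture;

public class BoundedRemove {
    static void removeAllBounded(IgniteCache<Long, String> cache, Set<Long> keys) {
        IgniteFuture<Void> fut = cache.removeAllAsync(keys);
        // Throws IgniteFutureTimeoutException instead of hanging forever,
        // leaving the caller free to log, retry, or dump threads.
        fut.get(30, TimeUnit.SECONDS);
    }
}
```

A timeout firing here would also be a good moment to capture thread dumps on all nodes, since the blocking side may be remote.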

The thread dump is as follows.

Name: local-task-pool-0
State: WAITING
Total blocked: 0  Total waited: 1

Stack trace:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
org.apache.ignite.internal.processors.cache.GridCacheAdapter$39.inOp(GridCacheAdapter.java:3011)
org.apache.ignite.internal.processors.cache.GridCacheAdapter$SyncInOp.op(GridCacheAdapter.java:5076)
org.apache.ignite.internal.processors.cache.GridCacheAdapter.syncOp(GridCacheAdapter.java:4088)
org.apache.ignite.internal.processors.cache.GridCacheAdapter.removeAll0(GridCacheAdapter.java:3004)
org.apache.ignite.internal.processors.cache.GridCacheAdapter.removeAll(GridCacheAdapter.java:2993)
org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.removeAll(IgniteCacheProxyImpl.java:1254)
org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.removeAll(GatewayProtectedCacheProxy.java:1166)
com.qualys.agms.grid.dao.AbstractDataGridDAO.removeAll(AbstractDataGridDAO.java:61)
com.qualys.agms.grid.dataservice.DefaultDataGridService.removeAll(DefaultDataGridService.java:47)
com.qualys.agms.task.local.RemoveIPsFromAssetGroupTaskV1.removeIpsFromAssetGroup(RemoveIPsFromAssetGroupTaskV1.java:70)
com.qualys.agms.task.local.RemoveIPsFromAssetGroupTaskV1.run(RemoveIPsFromAssetGroupTaskV1.java:48)
java.util.concurrent.Executors$RunnableAdapter.call$$$capture(Executors.java:511)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java)
java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
java.util.concurrent.FutureTask.run(FutureTask.java)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)


Thanks,
Akash
"http-nio-8080-exec-9@8387" daemon prio=5 tid=0x59 nid=NA waiting
  java.lang.Thread.State: WAITING
  at sun.misc.Unsafe.park(Unsafe.java:-1)
  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
  at 
org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
  at 
org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
  at 
org.apache.ignite.internal.AsyncSupportAdapter.saveOrGet(AsyncSupportAdapter.java:112)
  at 
org.apache.ignite.internal.IgniteComputeImpl.affinityRun(IgniteComputeImpl.java:144)
  at 
com.qualys.agms.grid.service.GridServiceImpl.submitAffinityTask(GridServiceImpl.java:29)
  at 
com.qualys.agms.service.AssetGroupManagementServiceImpl.editAssetToGroup(AssetGroupManagementServiceImpl.java:133)
  at 
com.qualys.agms.web.controller.AssetGroupManagementController.updateAssetGroup(AssetGroupManagementController.java:71)
  at 
sun.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-1)
  at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at 
org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:76)
  at 
org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$$Lambda$413.1604032818.invoke(Unknown
 Source:-1)
  at 
org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:148)
  at 
org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:191)
  at 
org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:200)
  at 
org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:103)
  at 
org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:493)
  at 
org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:415)
  at 
org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:104)
  at 
org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:277)
  at org.glassfish.jersey.internal.Errors$1.call(Errors.java:272)
  at 

Off heap and JVM heap memory allocation

2018-04-11 Thread akash shinde
Hi,
I have a 32 GB RAM machine on which to configure an Ignite node. Out of this
32 GB, I want to allocate 16 GB for *off-heap* space and the remaining 16 GB
for *JVM heap* space (processing). My cache will be configured as off-heap
only, and the JVM heap will be used for processing.
Is there any way to do this configuration in Ignite?
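
Assuming Ignite 2.x, this split is typically done in two places: the off-heap budget via a data-region maximum size in the node configuration, and the heap via ordinary JVM flags. A sketch (the region name is made up; the sizes are the values from the question):

```java
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class NodeConfig {
    static IgniteConfiguration create() {
        DataRegionConfiguration region = new DataRegionConfiguration()
            .setName("16g_Region")                   // illustrative name
            .setInitialSize(1L * 1024 * 1024 * 1024) // start at 1 GB
            .setMaxSize(16L * 1024 * 1024 * 1024);   // cap off-heap at 16 GB

        DataStorageConfiguration storage = new DataStorageConfiguration()
            .setDefaultDataRegionConfiguration(region);

        return new IgniteConfiguration().setDataStorageConfiguration(storage);
    }
}
```

The JVM heap is then capped separately, e.g. with `-Xms16g -Xmx16g` on the node's command line; on Ignite versions before 2.3, which predate DataStorageConfiguration, the equivalent knobs live in the older MemoryConfiguration API instead.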