Re: Ignite client node hangs while IgniteAtomicLong is created

2020-08-11 Thread Вячеслав Коптилин
Hello Ilya,

I have had a look at your logs and thread dumps. This is definitely a bug.
The data structure processor is not properly initialized when a client
node connects to a cluster that is changing its own state (a state
transition from inactive to active).
I have created a ticket to address this issue:
https://issues.apache.org/jira/browse/IGNITE-13348
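
Until that ticket is resolved, a possible client-side mitigation (a sketch of my own, not part of the fix) is to postpone data-structure creation until the cluster reports the active state. The helper below is plain JDK and Ignite-agnostic; on Ignite 2.8.x the condition could be, for example, `ignite.cluster().active()`:

```java
import java.util.function.BooleanSupplier;

class ClusterReadiness {
    /**
     * Polls the given condition until it becomes true or the timeout
     * expires. Returns the final value of the condition.
     */
    static boolean awaitTrue(BooleanSupplier cond, long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!cond.getAsBoolean()) {
            if (System.currentTimeMillis() >= deadline)
                return cond.getAsBoolean();   // one last check at the deadline
            Thread.sleep(pollMs);
        }
        return true;
    }
}
```

A client would then call something like `ClusterReadiness.awaitTrue(() -> ignite.cluster().active(), 30_000, 200)` before `ignite.atomicLong(...)`. Note this only narrows the window of the race described in IGNITE-13348; it does not eliminate it.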

Thanks,
S.

Tue, 11 Aug 2020 at 05:00, Ilya Roublev:

> Hello, Ilya,
>
> In the post above, one week ago, I attached the necessary thread dumps.
> Could you please say whether you have sufficient information to
> investigate the problem with the hanging of IgniteAtomicLong? I think the
> issue is not all that harmless: it concerns the latest Ignite 2.8.1, and
> fixing it may IMHO be important for the community (I think the cause is
> in the initialization of ignite-sys-atomic-cache simultaneously in
> several nodes, but I may certainly be mistaken). But unfortunately I have
> seen no reaction on this for a week. Could you please give at least a
> hint that the problem is under investigation and that there is the
> slightest chance it can be resolved? Or is it better to work out some
> workarounds?
>
> Thank you very much in advance for any response.
>
> My best regards,
> Ilya
>
>
> ilya.kasnacheev wrote
> > Hello!
> >
> > Can you collect thread dumps from all nodes once you get them hanging?
> >
> > Can you throw together a reproducer project?
> >
> > Regards,
> > --
> > Ilya Kasnacheev
> >
> >
> > Tue, 4 Aug 2020 at 12:51, Ilya Roublev <
>
> > iroublev@
>
> > >:
> >
> >> We are developing a Jira cloud app using Apache Ignite both as data
> >> storage and as a job scheduler. This is done via a standard Ignite
> >> client node. But we need to use Atlassian Connect Spring Boot to be
> >> able to communicate with Jira. In short, all is done exactly as in our
> >> article Boosting Jira Cloud app development with Apache Ignite
> >> <https://medium.com/alliedium/boosting-jira-cloud-app-development-with-apache-ignite-7eebc7bb3d48>.
> >> At first we used the plain Ignite JDBC driver just for Atlassian
> >> Connect Spring Boot along with a separate Ignite client node for our
> >> own purposes. But this turned out to be very unstable when deployed in
> >> our local Kubernetes cluster (built via Kubespray) due to constant
> >> exceptions
> >>
> >> java.net.SocketException: Connection reset
> >>
> >> occurring from time to time (in fact, this showed up only in our local
> >> cluster; in AWS EKS all worked fine). To make all this more stable we
> >> tried to use the Ignite JDBC Client driver exactly as described in the
> >> article mentioned above. Thus, our backend now uses two Ignite client
> >> nodes per single JVM: the first one for JDBC used by Atlassian Connect
> >> Spring Boot, the second one for our own purposes. This solution turned
> >> out to be good enough, because our app now works very stably both in
> >> our local cluster and in AWS EKS. But when we deploy our app in Docker
> >> for testing and development purposes, our Ignite client nodes hang
> >> from time to time. After some investigation we were able to see that
> >> this occurs exactly at the instant when an IgniteAtomicLong object is
> >> created. Below are logs both for successful initialization of our app
> >> and for the case when the Ignite client nodes hung.
> >>
> >> Logs when all is ok:
> >> ignite-appclientnode-successful.log
> >> <http://apache-ignite-users.70518.x6.nabble.com/file/t2262/ignite-appclientnode-successful.log>
> >> ignite-jdbcclientnode-successful.log
> >> <http://apache-ignite-users.70518.x6.nabble.com/file/t2262/ignite-jdbcclientnode-successful.log>
> >>
> >> Logs when both client nodes hang:
> >> ignite-appclientnode-failed.log
> >> <http://apache-ignite-users.70518.x6.nabble.com/file/t2262/ignite-appclientnode-failed.log>
> >> ignite-jdbcclientnode-failed.log
> >> <http://apache-ignite-users.70518.x6.nabble.com/file/t2262/ignite-jdbcclientnode-failed.log>
> >>
> >> Some analysis and questions. From the logs one can see that the caches
> >> default, tenants, atlassian_host_audit and SQL_PUBLIC_ATLASSIAN_HOST
> >> are manipulated; in fact, default is given in the client
> >> configuration: client.xml
> >> <http://apache-ignite-users.70518.x6.nabble.com/file/t2262/client.xml>.
> >> The cache SQL_PUBLIC_ATLASSIAN_HOST contains the atlassian_host table
> >> mentioned in Boosting Jira Cloud app development with Apache Ignite
> >> <https://medium.com/alliedium/boosting-jira-cloud-app-development-with-apache-ignite-7eebc7bb3d48>
> >> and is created in advance even before the app starts. Further,
> >> atlassian_host_audit is a copy of atlassian_host; in any case it is
> >> not yet created when the app hangs. As for the other entities
> >> processed by Ignite, they are created by the following code:
> >>
>

Re: Ignite client node hangs while IgniteAtomicLong is created

2020-08-10 Thread Ilya Roublev
Hello, Ilya,

In the post above, one week ago, I attached the necessary thread dumps.
Could you please say whether you have sufficient information to investigate
the problem with the hanging of IgniteAtomicLong? I think the issue is not
all that harmless: it concerns the latest Ignite 2.8.1, and fixing it may
IMHO be important for the community (I think the cause is in the
initialization of ignite-sys-atomic-cache simultaneously in several nodes,
but I may certainly be mistaken). But unfortunately I have seen no reaction
on this for a week. Could you please give at least a hint that the problem
is under investigation and that there is the slightest chance it can be
resolved? Or is it better to work out some workarounds?

Thank you very much in advance for any response.

My best regards,
Ilya


ilya.kasnacheev wrote
> Hello!
> 
> Can you collect thread dumps from all nodes once you get them hanging?
> 
> Can you throw together a reproducer project?
> 
> Regards,
> -- 
> Ilya Kasnacheev
> 
> 
> Tue, 4 Aug 2020 at 12:51, Ilya Roublev <

> iroublev@

> >:
> 
>> We are developing a Jira cloud app using Apache Ignite both as data
>> storage and as a job scheduler. This is done via a standard Ignite
>> client node. But we need to use Atlassian Connect Spring Boot to be able
>> to communicate with Jira. In short, all is done exactly as in our
>> article Boosting Jira Cloud app development with Apache Ignite.
>> At first we used the plain Ignite JDBC driver just for Atlassian Connect
>> Spring Boot along with a separate Ignite client node for our own
>> purposes. But this turned out to be very unstable when deployed in our
>> local Kubernetes cluster (built via Kubespray) due to constant
>> exceptions
>>
>> java.net.SocketException: Connection reset
>>
>> occurring from time to time (in fact, this showed up only in our local
>> cluster; in AWS EKS all worked fine). To make all this more stable we
>> tried to use the Ignite JDBC Client driver exactly as described in the
>> article mentioned above. Thus, our backend now uses two Ignite client
>> nodes per single JVM: the first one for JDBC used by Atlassian Connect
>> Spring Boot, the second one for our own purposes. This solution turned
>> out to be good enough, because our app now works very stably both in our
>> local cluster and in AWS EKS. But when we deploy our app in Docker for
>> testing and development purposes, our Ignite client nodes hang from time
>> to time. After some investigation we were able to see that this occurs
>> exactly at the instant when an IgniteAtomicLong object is created. Below
>> are logs both for successful initialization of our app and for the case
>> when the Ignite client nodes hung.
>>
>> Logs when all is ok: ignite-appclientnode-successful.log,
>> ignite-jdbcclientnode-successful.log. Logs when both client nodes hang:
>> ignite-appclientnode-failed.log, ignite-jdbcclientnode-failed.log.
>>
>> Some analysis and questions. From the logs one can see that the caches
>> default, tenants, atlassian_host_audit and SQL_PUBLIC_ATLASSIAN_HOST are
>> manipulated; in fact, default is given in the client configuration
>> (client.xml). The cache SQL_PUBLIC_ATLASSIAN_HOST contains the
>> atlassian_host table mentioned in Boosting Jira Cloud app development
>> with Apache Ignite and is created in advance even before the app starts.
>> Further, atlassian_host_audit is a copy of atlassian_host; in any case
>> it is not yet created when the app hangs. As for the other entities
>> processed by Ignite, they are created by the following code:
>>
>> CacheConfiguration<Long, Tenant> tenantCacheCfg = new
>> CacheConfiguration<>();
>> tenantCacheCfg.setName("tenants");
>> tenantCacheCfg.setSqlSchema("PROD");
>> tenantCacheCfg.setIndexedTypes(Long.class, Tenant.class);
>>
>> tenantCacheCfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
>> tenantCacheCfg.setEncryptionEnabled(true);
>>
>> IgniteCache<Long, Tenant> tenantCache =
>> ignite.getOrCreateCache(tenantCacheCfg);
>> IgniteAtomicLong idGen =
>> ignite.atomicLong("PROD_tenants_seq", 0, true);
>>
>> And from the logs of the app itself it is clear that the app hangs
>> exactly on the last line.

Re: Ignite client node hangs while IgniteAtomicLong is created

2020-08-06 Thread Ilya Roublev
Hello, Ilya,

Attached are two thread dumps, the second taken 13 minutes after the first
one: threaddump.txt, threaddump2.txt.

The hanging occurs in the main thread (in fact the same output appears in a
thread dump taken after 8 hours). The differences between the two thread
dumps are minor.

As for a reproducer project, this is not an easy task, because it is
difficult to understand which factors may be treated as significant. Our
initial project is in general stable; the matter is that we have dozens of
builds on our build server per day and only some of these builds fail. It
is very difficult to catch this situation: I had to launch 5 builds one
after another before it really occurred. And it may be that this situation
requires launching very specific containers in Docker, each at a very
specific time. And we cannot share our original project; all I can do is
give you those parts of the code that deal with Ignite. For example, the
full code of the start method from DbManager is as follows:

And we have logs for all containers of our app, including those for Ignite
server nodes; if you like I can provide them.

Thank you very much for your help in advance.

My best regards,
Ilya



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Ignite client node hangs while IgniteAtomicLong is created

2020-08-05 Thread Ilya Roublev
Hello, Ilya,

Attached are two thread dumps, the second taken 13 minutes after the first
one: threaddump.txt, threaddump2.txt.

The hanging occurs in the main thread (in fact the same output appears in a
thread dump taken after 8 hours):


The differences between the two thread dumps are minor; one of them is as
follows:
in the first thread dump

in the second


As for a reproducer project, this is not an easy task, because it is
difficult to understand which factors may be treated as significant. Our
initial project is in general stable; the matter is that we have dozens of
builds per day and only some of these builds fail. It is very difficult to
catch this situation: I had to launch 5 builds one after another before it
really occurred. And it may be that this situation requires launching very
specific containers in Docker, each at a very specific time. And we cannot
share our original project; all I can do is give you those parts of the
code that deal with Ignite. For example, the full code of the start method
from DbManager is as follows:



And we have logs for all containers of our app including those for Ignite
server nodes, if you like I can provide them.
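
Since timing seems to be the crux, a reproducer would likely need a simultaneous-start barrier so that several clients hit the data-structure creation at the same instant. The sketch below is plain JDK, not Ignite: the once-only initialization is simulated with a naive check-then-act on a map, and the class and names are illustrative. In a real reproducer, each task would instead start an Ignite client node and call ignite.atomicLong(...):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class RacingInit {
    /** Counts how many threads performed the (supposedly once-only) init. */
    static final AtomicInteger initCount = new AtomicInteger();

    static void naiveGetOrCreate(ConcurrentHashMap<String, Long> store, String name) {
        // Check-then-act without atomicity: several racers can observe "absent".
        if (!store.containsKey(name)) {
            initCount.incrementAndGet();
            store.put(name, 0L);
        }
    }

    /** Releases all threads at once via a latch, then waits for them to finish. */
    static int race(int threads) throws Exception {
        ConcurrentHashMap<String, Long> store = new ConcurrentHashMap<>();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++)
            pool.submit(() -> {
                start.await();  // all tasks block here until countDown()
                naiveGetOrCreate(store, "PROD_tenants_seq");
                return null;
            });
        start.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return initCount.get();
    }
}
```

The latch is what makes the timing reproducible: without it, thread startup skew alone often hides the race, which may be why only some builds fail.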

Thank you very much for your help in advance.

My best regards,
Ilya





Re: Ignite client node hangs while IgniteAtomicLong is created

2020-08-04 Thread Ilya Kasnacheev
Hello!

Can you collect thread dumps from all nodes once you get them hanging?

Can you throw together a reproducer project?

Regards,
-- 
Ilya Kasnacheev
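
For reference, thread dumps can be captured with `jstack <pid>` from the JDK, or programmatically from inside the JVM. A minimal stdlib-only sketch (the class name is illustrative):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;

class ThreadDumper {
    /** Builds a jstack-like dump of all live threads, with lock info. */
    static String dump() {
        StringBuilder sb = new StringBuilder();
        // true, true: include locked monitors and ownable synchronizers
        for (ThreadInfo ti : ManagementFactory.getThreadMXBean()
                .dumpAllThreads(true, true))
            sb.append(ti);  // ThreadInfo.toString() is a readable stack trace
        return sb.toString();
    }
}
```

Calling this from a watchdog thread on a timer is one way to capture dumps from a hung client node automatically. Note that `ThreadInfo.toString()` truncates deep stacks; `jstack` gives the complete frames.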


Tue, 4 Aug 2020 at 12:51, Ilya Roublev:

> We are developing a Jira cloud app using Apache Ignite both as data
> storage and as a job scheduler. This is done via a standard Ignite client
> node. But we need to use Atlassian Connect Spring Boot to be able to
> communicate with Jira. In short, all is done exactly as in our article
> Boosting Jira Cloud app development with Apache Ignite.
> At first we used the plain Ignite JDBC driver just for Atlassian Connect
> Spring Boot along with a separate Ignite client node for our own
> purposes. But this turned out to be very unstable when deployed in our
> local Kubernetes cluster (built via Kubespray) due to constant exceptions
>
> java.net.SocketException: Connection reset
>
> occurring from time to time (in fact, this showed up only in our local
> cluster; in AWS EKS all worked fine). To make all this more stable we
> tried to use the Ignite JDBC Client driver exactly as described in the
> article mentioned above. Thus, our backend now uses two Ignite client
> nodes per single JVM: the first one for JDBC used by Atlassian Connect
> Spring Boot, the second one for our own purposes. This solution turned
> out to be good enough, because our app now works very stably both in our
> local cluster and in AWS EKS. But when we deploy our app in Docker for
> testing and development purposes, our Ignite client nodes hang from time
> to time. After some investigation we were able to see that this occurs
> exactly at the instant when an IgniteAtomicLong object is created. Below
> are logs both for successful initialization of our app and for the case
> when the Ignite client nodes hung.
>
> Logs when all is ok: ignite-appclientnode-successful.log,
> ignite-jdbcclientnode-successful.log. Logs when both client nodes hang:
> ignite-appclientnode-failed.log, ignite-jdbcclientnode-failed.log.
>
> Some analysis and questions. From the logs one can see that the caches
> default, tenants, atlassian_host_audit and SQL_PUBLIC_ATLASSIAN_HOST are
> manipulated; in fact, default is given in the client configuration
> (client.xml). The cache SQL_PUBLIC_ATLASSIAN_HOST contains the
> atlassian_host table mentioned in Boosting Jira Cloud app development
> with Apache Ignite and is created in advance even before the app starts.
> Further, atlassian_host_audit is a copy of atlassian_host; in any case it
> is not yet created when the app hangs. As for the other entities
> processed by Ignite, they are created by the following code:
>
> CacheConfiguration<Long, Tenant> tenantCacheCfg = new
> CacheConfiguration<>();
> tenantCacheCfg.setName("tenants");
> tenantCacheCfg.setSqlSchema("PROD");
> tenantCacheCfg.setIndexedTypes(Long.class, Tenant.class);
> tenantCacheCfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
> tenantCacheCfg.setEncryptionEnabled(true);
>
> IgniteCache<Long, Tenant> tenantCache =
> ignite.getOrCreateCache(tenantCacheCfg);
> IgniteAtomicLong idGen = ignite.atomicLong("PROD_tenants_seq", 0, 
> true);
>
> And from the logs of the app itself it is clear that the app hangs
> exactly on the last line. This is confirmed by the fact that in
> ignite-jdbcclientnode-successful.log we have the following lines:
>
> [2020-07-31 09:52:12,237][INFO ][exchange-worker-#43][GridCacheProcessor] 
> Started cache [name=ignite-sys-atomic-cache@default-ds-group, id=1481046058, 
> group=default-ds-group, dataRegionName=null, mode=PARTITIONED, 
> atomicity=TRANSACTIONAL, backups=1, mvcc=false]
> [2020-07-31 09:52:12,263][INFO ][exchange-worker-#43][time] Finished exchange 
> init [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=2], crd=false]
> [2020-07-31 09:52:12,631][INFO 
> ][sys-#109%ignite-jdbc-driver-e1f562e1-5127-4f55-80d2-18661982769c%][GridDhtPartitionsExchangeFuture]
>  Received full message, will finish exchange 
> [node=c7d1e091-da48-46a3-98ea-6971e8a811e0, resVer=AffinityTopologyVersion 
> [topVer=6, minorTopVer=2]]
> [2020-07-31 09:52:12,634][INFO ][sys-#54][GridDhtPartitionsExchangeFuture] 
> Received full messag

Ignite client node hangs while IgniteAtomicLong is created

2020-08-04 Thread Ilya Roublev
We are developing a Jira cloud app using Apache Ignite both as data storage
and as a job scheduler. This is done via a standard Ignite client node. But
we need to use Atlassian Connect Spring Boot to be able to communicate with
Jira. In short, all is done exactly as in our article Boosting Jira Cloud
app development with Apache Ignite
<https://medium.com/alliedium/boosting-jira-cloud-app-development-with-apache-ignite-7eebc7bb3d48>.

At first we used the plain Ignite JDBC driver just for Atlassian Connect
Spring Boot along with a separate Ignite client node for our own purposes.
But this turned out to be very unstable when deployed in our local
Kubernetes cluster (built via Kubespray) due to constant exceptions
occurring from time to time (in fact, this showed up only in our local
cluster; in AWS EKS all worked fine). To make all this more stable we tried
to use the Ignite JDBC Client driver exactly as described in the article
mentioned above. Thus, our backend now uses two Ignite client nodes per
single JVM: the first one for JDBC used by Atlassian Connect Spring Boot,
the second one for our own purposes. This solution turned out to be good
enough, because our app now works very stably both in our local cluster and
in AWS EKS. But when we deploy our app in Docker for testing and
development purposes, our Ignite client nodes hang from time to time. After
some investigation we were able to see that this occurs exactly at the
instant when an IgniteAtomicLong object is created. Below are logs both for
successful initialization of our app and for the case when the Ignite
client nodes hung.

Logs when all is ok:
ignite-appclientnode-successful.log
ignite-jdbcclientnode-successful.log

Logs when both client nodes hang:
ignite-appclientnode-failed.log
ignite-jdbcclientnode-failed.log

Some analysis and questions

From the logs one can see that the caches default, tenants,
atlassian_host_audit and SQL_PUBLIC_ATLASSIAN_HOST are manipulated; in
fact, default is given in the client configuration (client.xml). The cache
SQL_PUBLIC_ATLASSIAN_HOST contains the atlassian_host table mentioned in
Boosting Jira Cloud app development with Apache Ignite and is created in
advance even before the app starts. Further, atlassian_host_audit is a copy
of atlassian_host; in any case it is not yet created when the app hangs.

As for the other entities processed by Ignite, they are created by the
following code:

And from the logs of the app itself it is clear that the app hangs exactly
on the last line. This is confirmed by the fact that in
ignite-jdbcclientnode-successful.log we have the following lines:

while in ignite-jdbcclientnode-failed.log all the lines starting from the
first time the cache ignite-sys-atomic-cache@default-ds-group (the cache
used for atomics) was mentioned are as follows:

In particular, the following line from ignite-jdbcclientnode-successful.log
is absent in ignite-jdbcclientnode-failed.log:

But it should be noted that for the failure case there are other client
nodes, executed in separate containers simultaneously with the backend app
and with the same code creating the cache tenants and the IgniteAtomicLong
idGen. As concerns the logs below (see above for the code), their node ids
are 653143b2-6e80-49ff-9e9a-ae10237b32e8 and
30e24e06-ab76-4053-a36e-548e87ffe5d1, respectively (and it can be easily
seen that all the lines in ignite-jdbcclientnode-failed.log with
ignite-sys-atomic-cache@default-ds-group relate namely to these nodes); the
logs for the time segment when the code with tenants and idGen is executed
are as follows:

And the code creating tenants and idGen is executed successfully there. But
is it possible that this simultaneous creation of idGen may hang some
nodes? (As for the case when all was executed successfully, there we also
have two separate containers, but they are executed strictly after all is
done in the main app, so the simultaneous execution of the same code in
several client nodes may be the reason for the hanging, isn't it?) And if
the answer is positive, what is to be done? Certainly it is possible to set
a delay for those separate containers, but this does not look like a very
safe solution...

And we have another small question: when we have two separate client nodes
in our app, both configured for logging, why, starting from some instant,
does only the log