Grid failure on frequent cache creation/destroying

2019-09-09 Thread Abhishek Gupta (BLOOMBERG/ 731 LEX)
Hello,
 We have a grid of 6 nodes with a main cache (mainCache). We noticed something
interesting today while regular ingestion into mainCache was running: we have an
operational tool that creates and destroys a cache (tempCacheByExplorerApp) via
the REST API on each of the 6 nodes, and while it was doing this today, all the
nodes hit a critical error and died. Attached is a log snippet from one of the
nodes when this happened.

This issue looks like the one described at
http://apache-ignite-users.70518.x6.nabble.com/Ignite-2-7-0-server-node-null-pointer-exception-td28899.html
but there isn't any topology change as such happening here, just cache
creation/destruction.
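
For reference, the create/destroy cycle the tool performs over REST corresponds
roughly to the following programmatic calls (a minimal sketch; only the cache name
is taken from the report above, everything else is an assumption):

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;

public class TempCacheCycleSketch {
    public static void main(String[] args) {
        // Start a node (in the real setup the tool talks to the running grid over REST).
        Ignite ignite = Ignition.start();

        // Roughly what the REST "create cache" request does for this cache name.
        IgniteCache<String, String> tmp =
            ignite.getOrCreateCache("tempCacheByExplorerApp");

        // ... the operational tool does its work with the temporary cache here ...

        // Roughly what the REST "destroy cache" request does.
        ignite.destroyCache("tempCacheByExplorerApp");
    }
}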


Appreciate your help.

-Abhi



[Attachment: ignite-log-forum.log (binary data)]


Re: Cache expiry policy not deleting records from disk(native persistence)

2019-09-09 Thread Shiva Kumar
Hi,
I have deployed Ignite on Kubernetes and configured two separate persistent
volumes, one for the WAL and one for persistence.
The issue I am facing is the same as
https://issues.apache.org/jira/browse/IGNITE-10862

Thanks
Shiva

On Mon, 9 Sep, 2019, 10:47 PM Andrei Aleksandrov wrote:

> Hello,
>
> I guess that the generated WAL is taking this disk space. Please read about
> WAL here:
>
> https://apacheignite.readme.io/docs/write-ahead-log
>
> Please provide the size of every folder under /opt/ignite/persistence.
>
> BR,
> Andrei
> On 9/6/2019 9:45 PM, Shiva Kumar wrote:
>
> Hi all,
> I have set the cache expiry policy like this:
>
>
>
>
> [The Spring XML cache configuration was stripped by the mail archive. What
> survives shows a CacheConfiguration bean
> (class="org.apache.ignite.configuration.CacheConfiguration") with an
> expiry-policy factory bean created via factory-method="factoryOf".]
>
>
>
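
For readers of the archive: the stripped configuration above corresponds roughly
to the following programmatic setup (a sketch only; the template name is
hypothetical and the 10-minute duration is inferred from the behaviour described
below, not taken from the original XML):

import java.util.concurrent.TimeUnit;
import javax.cache.expiry.CreatedExpiryPolicy;
import javax.cache.expiry.Duration;
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.CacheConfiguration;

public class ExpiryTemplateSketch {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start();

        // Cache template whose entries expire some time after creation.
        CacheConfiguration<Object, Object> template =
            new CacheConfiguration<>("expiringTemplate");   // hypothetical name
        template.setExpiryPolicyFactory(
            CreatedExpiryPolicy.factoryOf(new Duration(TimeUnit.MINUTES, 10)));
        template.setEagerTtl(true);  // true by default: expired entries are purged in the background

        // Register the configuration as a template that SQL tables can reference.
        ignite.addCacheConfiguration(template);
    }
}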
> And I am batch-inserting records into one of the tables created with the
> above cache template.
> Over about 10 minutes I ingested ~1.5 GB of data, and after 10 minutes
> records started reducing (expiring) when I monitored from sqlline.
>
> 0: jdbc:ignite:thin://192.168.*.*:10800> select count(ID) from DIMENSIONS;
> (the same query was re-run over the next several minutes)
>
>   COUNT(ID)    query time
>   ---------    -------------
>      248896    0.86 seconds
>      222174    0.313 seconds
>      118154    0.15 seconds
>       76061    0.106 seconds
>       41671    0.063 seconds
>       18455    0.037 seconds
>           0    0.014 seconds
>
>
> But in the meantime, the disk space used by the persistence store stayed at
> the same level instead of decreasing.
>
>
> [ignite@ignite-cluster-ign-shiv-0 ignite]$ while true ; do df -h
> /opt/ignite/persistence/; sleep 1s; done
> Filesystem Size Used Avail Use% Mounted on
> /dev/vdj 15G 1.6G 14G 11% /opt/ignite/persistence
> Filesystem Size Used Avail Use% Mounted on
> /dev/vdj 15G 1.6G 14G 11% /opt/ignite/persistence
> Filesystem Size Used Avail Use% Mounted on
> /dev/vdj 15G 1.6G 14G 11% /opt/ignite/persistence
> Filesystem Size Used Avail Use% Mounted on
> /dev/vdj 15G 1.6G 14G 11% /opt/ignite/persistence
> Filesystem Size Used Avail Use% Mounted on
> /dev/vdj 15G 1.6G 14G 11% /opt/ignite/persistence
> Filesystem Size Used Avail Use% Mounted on
> /dev/vdj 15G 1.6G 14G 11% /opt/ignite/persistence
> Filesystem Size Used Avail Use% Mounted on
> /dev/vdj 15G 1.6G 14G 11% /opt/ignite/persistence
>
>
>
> This means the expiry policy is not deleting records from disk, but the
> Ignite documentation says that when an expiry policy is set and native
> persistence is enabled, records are deleted from disk as well.
> Am I missing some configuration?
> Any help is appreciated.
>
> Shiva
>
>


Re: IgniteCache.invoke deadlock example

2019-09-09 Thread Andrei Aleksandrov

Hello,

When you use an entry processor, only the provided key is locked. So if you
try to work with *other keys* (different from the provided one) that are being
processed in other threads, a deadlock is possible: the other thread can take
a lock on these *other keys* and wait for the provided one, while your entry
processor waits for these *other keys*. It's a typical deadlock.


Sorry, I won't provide a full example, but I hope the explanation is clear.
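
A minimal sketch of the cross-key access pattern described above, for
illustration only (the cache name, keys and value types are hypothetical; two
threads run entry processors on each other's keys and may block forever):

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheEntryProcessor;
import javax.cache.processor.MutableEntry;

public class InvokeDeadlockSketch {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start();   // single local node, default config
        IgniteCache<String, Integer> cache = ignite.getOrCreateCache("demoCache");
        cache.put("A", 0);
        cache.put("B", 0);

        // Thread 1: locks "A" inside the processor, then touches "B".
        Runnable aThenB = () -> cache.invoke("A",
            new CacheEntryProcessor<String, Integer, Object>() {
                @Override public Object process(MutableEntry<String, Integer> e, Object... arg) {
                    // Re-acquire the cache on whatever node runs the processor.
                    IgniteCache<String, Integer> c = Ignition.localIgnite().cache("demoCache");
                    c.put("B", 1);  // touching another key while holding the lock on "A"
                    return null;
                }
            });

        // Thread 2: locks "B", then touches "A" -- the reverse order.
        Runnable bThenA = () -> cache.invoke("B",
            new CacheEntryProcessor<String, Integer, Object>() {
                @Override public Object process(MutableEntry<String, Integer> e, Object... arg) {
                    IgniteCache<String, Integer> c = Ignition.localIgnite().cache("demoCache");
                    c.put("A", 1);  // may wait forever for the lock held by thread 1
                    return null;
                }
            });

        new Thread(aThenB).start();
        new Thread(bThenA).start();
    }
}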


BR,
Andrei

On 9/7/2019 6:31 PM, Evangelos Morakis wrote:


Dear Igniters,

I would like to ask for your expert advice on how Ignite differs, as far as
deadlocks are concerned, between a call to:
1) IgniteCompute.affinityRun(...)
and
2) IgniteCache.invoke(...)

According to the documentation, the main difference is that method 2 above
operates within a lock. Specifically, the docs say:
"EntryProcessors are executed atomically within a lock on the given
cache key."
It even comes with a warning that is meant to show how it is
supposed to be used (or conversely NOT to be used):
"You should not access *other keys* from within the
EntryProcessor logic as it may cause a deadlock."
But this phrase "*other keys*" - what kind of keys does it refer to?
The remaining keys of the cache? For example:

Assume a persons cache...
IgniteCache<String, Person> personsCache = ...;

personsCache.invoke("personKey", new EntryProcessor<String, Person, Void>() {

    @Override public Void process(MutableEntry<String, Person> entry,
                                  Object... args) {

        Person person = entry.getValue();
        person.setOccupation("foo");
        entry.setValue(person);

        return null;
    }
});
In other words, can someone provide an example, based on the above dummy
code, that would make invoke deadlock, so that I can get an
understanding of what the documentation refers to?


Thanks

Evangelos Morakis



Re: Cache expiry policy not deleting records from disk(native persistence)

2019-09-09 Thread Andrei Aleksandrov

Hello,

I guess that the generated WAL is taking this disk space. Please read about
WAL here:


https://apacheignite.readme.io/docs/write-ahead-log

Please provide the size of every folder under /opt/ignite/persistence.

BR,
Andrei
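
If the WAL archive turns out to be what is consuming the space, it can be capped
via the data storage configuration. A minimal sketch (assuming Ignite 2.7+; the
paths mirror the two volumes described in this thread and the 2 GB cap is purely
illustrative, not a recommendation):

import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class WalCapSketch {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();
        DataStorageConfiguration storageCfg = new DataStorageConfiguration();

        storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);

        // Separate locations for data files and WAL (paths are hypothetical).
        storageCfg.setStoragePath("/opt/ignite/persistence");
        storageCfg.setWalPath("/opt/ignite/wal");
        storageCfg.setWalArchivePath("/opt/ignite/wal/archive");

        // Cap how much disk the WAL archive may occupy (value is illustrative only).
        storageCfg.setMaxWalArchiveSize(2L * 1024 * 1024 * 1024);

        cfg.setDataStorageConfiguration(storageCfg);
        Ignition.start(cfg);
    }
}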

On 9/6/2019 9:45 PM, Shiva Kumar wrote:

Hi all,
I have set the cache expiry policy like this:


 
 
          
[The Spring XML cache configuration was stripped by the mail archive. What
survives shows a CacheConfiguration bean
(class="org.apache.ignite.configuration.CacheConfiguration") with an
expiry-policy factory bean created via factory-method="factoryOf".]


And I am batch-inserting records into one of the tables created with the
above cache template.
Over about 10 minutes I ingested ~1.5 GB of data, and after 10 minutes
records started reducing (expiring) when I monitored from sqlline.


0: jdbc:ignite:thin://192.168.*.*:10800> select count(ID) from DIMENSIONS;
(the same query was re-run over the next several minutes)

  COUNT(ID)    query time
  ---------    -------------
     248896    0.86 seconds
     222174    0.313 seconds
     118154    0.15 seconds
      76061    0.106 seconds
      41671    0.063 seconds
      18455    0.037 seconds
          0    0.014 seconds


But in the meantime, the disk space used by the persistence store stayed at
the same level instead of decreasing.



[ignite@ignite-cluster-ign-shiv-0 ignite]$ while true ; do df -h 
/opt/ignite/persistence/; sleep 1s; done

Filesystem Size Used Avail Use% Mounted on
/dev/vdj 15G 1.6G 14G 11% /opt/ignite/persistence
Filesystem Size Used Avail Use% Mounted on
/dev/vdj 15G 1.6G 14G 11% /opt/ignite/persistence
Filesystem Size Used Avail Use% Mounted on
/dev/vdj 15G 1.6G 14G 11% /opt/ignite/persistence
Filesystem Size Used Avail Use% Mounted on
/dev/vdj 15G 1.6G 14G 11% /opt/ignite/persistence
Filesystem Size Used Avail Use% Mounted on
/dev/vdj 15G 1.6G 14G 11% /opt/ignite/persistence
Filesystem Size Used Avail Use% Mounted on
/dev/vdj 15G 1.6G 14G 11% /opt/ignite/persistence
Filesystem Size Used Avail Use% Mounted on
/dev/vdj 15G 1.6G 14G 11% /opt/ignite/persistence



This means the expiry policy is not deleting records from disk, but the
Ignite documentation says that when an expiry policy is set and native
persistence is enabled, records are deleted from disk as well.

Am I missing some configuration?
Any help is appreciated.

Shiva


Re: Ignite ignores cache config when putting entries through near cache

2019-09-09 Thread Andrei Aleksandrov

Hi Bartłomiej,

Yes, it looks like a bug. Thank you for filing the JIRA ticket.

It's possible that http://apache-ignite-developers.2346864.n4.nabble.com
is a better place to discuss issues in the product. You can start a thread there.


BR,
Andrei

On 9/6/2019 12:36 PM, Bartłomiej Stefański wrote:

Hi,
I have a problem with putting entries into a partitioned or replicated
cache through a near cache on a client node. Even though I configured the
cache on the server to put values into off-heap space, they are stored on heap.


I already described it on JIRA:
https://issues.apache.org/jira/projects/IGNITE/issues/IGNITE-12142.
I'm also writing here - the mailing list seems to be more active.


Is it a bug in Ignite or a problem with configuration?
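
For context, a minimal sketch of the access pattern being described: a client
node working through a near cache on top of an existing server-side cache (the
cache name and types are hypothetical, and the server cache is assumed to exist
already):

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.configuration.NearCacheConfiguration;

public class NearCachePutSketch {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration().setClientMode(true);
        Ignite client = Ignition.start(cfg);

        // Wrap the existing server-side cache with a near cache on this client.
        NearCacheConfiguration<Integer, byte[]> nearCfg = new NearCacheConfiguration<>();
        IgniteCache<Integer, byte[]> cache =
            client.getOrCreateNearCache("dataCache", nearCfg);

        // Puts go through the near cache to the server copies; the question in
        // this thread is why the server-side entries end up on heap even though
        // the server cache is configured for off-heap storage.
        cache.put(1, new byte[1024]);
    }
}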

--
Bartłomiej Stefański


Re: Server Nodes Stopped Unexpectedly

2019-09-09 Thread Akash Shinde
Hi,
Sorry for the late reply. I was out of town.
I am trying to fetch the logs. Meanwhile, could you please answer the questions
from the last mail?

Thanks,
Akash

On Thu, Aug 29, 2019 at 6:51 PM Evgenii Zhuravlev wrote:

> Hi,
> Can you please share the new logs? It will help to understand the possible
> reason for the issue.
>
> Thanks,
> Evgenii
>
> On Wed, Aug 28, 2019 at 17:56, Akash Shinde wrote:
>
>> Hi,
>>
>> Now I have set the failure detection timeout to 12 mills and I am
>> still getting this error message intermittently on Ignite 2.6.
>> It could be a network issue, but I am not able to confirm that this is
>> happening because of a network issue.
>>
>> 1) What are all the possible reasons for the following error? Could you please
>> list them; it might help to narrow down the issue.
>>  [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException:
>> Thread tcp-disco-srvr-#3%springDataNode% is terminated unexpectedly.]
>>
>> 2) Will upgrading to the latest Ignite version, 2.7.5 or 2.7.6, solve this
>> problem?
>>
>> 3) How do you monitor the network? Can you please suggest a tool?
>>
>> 4) I understand that a node gets segmented because of a long GC pause or
>> network connectivity problems. Is my understanding correct?
>>
>> 5) What is the purpose of the networkTimeout configuration? In my case it is
>> set to 1 .
>>
>> Regards,
>> Akash
>>
>> On Mon, Jul 29, 2019 at 2:28 PM Evgenii Zhuravlev <
>> e.zhuravlev...@gmail.com> wrote:
>>
>>> >Does network issue make JVM  halt?
>>> There is a failureDetectionTimeout, which will help other nodes in the
>>> cluster to detect that node is unreachable and to exclude this node from
>>> topology. So, I believe it could be something like a temporary network
>>> problem. I would recommend to add some network monitoring to be prepared
>>> for the next failure.
>>>
>>> Best Regards,
>>> Evgenii
>>>
>>> On Fri, Jul 26, 2019 at 16:01, Akash Shinde wrote:
>>>
 This issue is not consistent but occurs sometimes. Does a network
 issue make the JVM halt? As per my understanding, a node disconnects from
 the cluster if a network issue happens.
 But in this case multiple JVMs were terminated. Can it be a bug in
 Ignite 2.6?

 Thanks,
 Akash

 On Fri, Jul 26, 2019 at 4:00 PM Evgenii Zhuravlev <
 e.zhuravlev...@gmail.com> wrote:

> I don't see any specific errors in the logs. For me, it looks like
> network problems, moreover, on client nodes it prints messages about
> connection problems. Is this issue reproducible?
> Evgenii
>
> On Fri, Jul 26, 2019 at 09:21, Akash Shinde wrote:
>
>> Can someone please help me on this issue ?
>>
>> On Wed, Jul 24, 2019 at 12:04 PM Akash Shinde 
>> wrote:
>>
>>> Hi,
>>> Please find attached logs from all server and client nodes.Also
>>> attached gc logs for each node.
>>>
>>> Thanks,
>>> Akash
>>>
>>>
>>> On Tue, Jul 23, 2019 at 8:59 PM Evgenii Zhuravlev <
>>> e.zhuravlev...@gmail.com> wrote:
>>>
 Hi,

 Can you please share full logs from the node start from all nodes
 in the cluster?

 Thanks,
 Evgenii

 On Tue, Jul 23, 2019 at 16:51, Akash Shinde wrote:

> I am using Ignite 2.6. I have created a cluster of 7
> server nodes and three client nodes. Out of the seven nodes, five
> stopped unexpectedly with the error log lines below.
> I have attached logs of two such server nodes.
>
> FailureDetectionTimeout is set to 3 ms  in Ignite
> configuration.
> Network time out is default.
> ClientFailureDetectionTimeout is set to 3 ms.
>
> I checked the GC logs but it does not seem to be a GC pause issue. I have
> attached the GC logs too.
>
> 1) Can someone please help me to identify the reason for this
> issue?
> 2) Are there any specific reasons which causes this issue or it is
> a bug in Ignite 2.6 version?
>
>
> *ERROR LOGS LINES*
> 2019-07-22 09:22:47,281 19417675
> [tcp-disco-srvr-#3%springDataNode%] ERROR  - Critical system error
> detected. Will be handled accordingly to configured handler [hnd=class
> o.a.i.failure.StopNodeOrHaltFailureHandler, failureCtx=FailureContext
> [type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException:
> Thread tcp-disco-srvr-#3%springDataNode% is terminated unexpectedly.]]
> java.lang.IllegalStateException: Thread
> tcp-disco-srvr-#3%springDataNode% is terminated unexpectedly.
> at
> org.apache.ignite.spi.discovery.tcp.ServerImpl$TcpServer.body(ServerImpl.java:5686)
> at
> org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
> 2019-07-22 09:22:47,281 19417675
> [tcp-disco-srvr-#3%springDataNode%] ERROR  - JVM will be halted 
>>
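
For reference, the timeouts discussed in this thread are ordinary
IgniteConfiguration and discovery SPI properties. A minimal sketch with
placeholder values (the actual values used in the thread are truncated in the
archive, so the numbers below are illustrative only):

import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;

public class TimeoutSketch {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // How long other nodes wait before considering a server node failed.
        cfg.setFailureDetectionTimeout(60_000);        // placeholder value

        // Same idea, applied to client nodes.
        cfg.setClientFailureDetectionTimeout(60_000);  // placeholder value

        // networkTimeout lives on the discovery SPI: the maximum time allowed
        // for discovery-level network operations (e.g. joining the ring).
        TcpDiscoverySpi discoSpi = new TcpDiscoverySpi();
        discoSpi.setNetworkTimeout(10_000);            // placeholder value
        cfg.setDiscoverySpi(discoSpi);

        Ignition.start(cfg);
    }
}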

Re: Ignite Performance - Adding new node is not improving SQL query performance

2019-09-09 Thread Mikael

Hi!

Well, as I said, do not take my word as 100% truth; I could be wrong.

Yes, nodes that are not part of the baseline will still handle
everything except persisted data, so you can still use them for the compute
grid, reading/writing KV, ML, services and so on. But in your case you
are running SQL queries on persisted data, so you have no use for
the new node unless it is part of the baseline; that is my understanding
of it.
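
A minimal sketch of extending the baseline once the new node has joined,
assuming the cluster is active (the same operation is also available through
the control script):

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class BaselineExtendSketch {
    public static void main(String[] args) {
        // Start (or connect as) a node in the existing cluster; config is omitted here.
        Ignite ignite = Ignition.start(new IgniteConfiguration());

        // Reset the baseline to the current set of server nodes, which now
        // includes the newly added node; persisted data will then rebalance to it.
        ignite.cluster().setBaselineTopology(ignite.cluster().forServers().nodes());
    }
}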


Hopefully someone with better knowledge than me will step in and give 
you a more detailed answer (and correct me if I am wrong).


Mikael


On 2019-09-09 12:05, Muhammed Favas wrote:


Thanks Mikael for the response.

So in that case, is it necessary to add all the new nodes to the baseline
to make use of the resources efficiently? But in the Ignite docs, it is
not described that way: a subset of the nodes in a cluster can be part
of the baseline.


What I thought was that when I run a query, the data would load into the
memory of all these 5 nodes and use the computing power of all of them.
But now it seems it is not working like that.


Regards,

Favas

From: Mikael
Sent: Monday, September 9, 2019 3:21 PM
To: user@ignite.apache.org
Subject: Re: Ignite Performance - Adding new node is not improving
SQL query performance


Hi!

If the new node is not part of the baseline topology it will not have 
any persisted data stored so any SQL query will not be of any use on 
the node as it does not have any of the data (at least that is how I 
understand it, I could be wrong here).


If so you would need to add the new node to the baseline topology to 
see any performance improvement, and of course wait for a complete 
rebalance of the data.


From docs:

"The same tools and APIs can be used to adjust the baseline topology 
throughout the cluster lifetime. It's required if you decide to scale 
out or scale in an existing topology by setting more or fewer nodes 
that will store the data. The sections below show how to use the APIs 
and tools."


Mikael

On 2019-09-09 11:31, Muhammed Favas wrote:

Hi,

I have an Ignite cluster with 4 nodes (each with 8 cores, 32 GB RAM
and 30 GB disk), with native persistence enabled and all nodes added to
the baseline topology. There are two SQL tables created and loaded with
120 GB of data.

One of my test SQL queries takes 8 seconds with this setup.
Currently I am trying various options to reduce the execution time
of the query.

For that, I have added one more node (with the same configuration) to
the cluster (non-baselined), with the impression that it would
reduce the execution time, but it didn't. When I checked the CPU
utilization of each node, all 4 previously added nodes' CPUs are
fully utilized, but the CPU of the newly added node is not
used much.

Can you please help me figure out why this is so, and how I can
make sure all the nodes' CPUs are utilized when I run a distributed
query, so that my query runs faster?

Also, what additional things do I need to do to make my query
run faster?

Regards,

Favas



RE: Ignite Performance - Adding new node is not improving SQL query performance

2019-09-09 Thread Muhammed Favas
Thanks Mikael for the response.

So in that case, is it necessary to add all the new nodes to the baseline to make
use of the resources efficiently? But in the Ignite docs, it is not described
that way: a subset of the nodes in a cluster can be part of the baseline.

What I thought was that when I run a query, the data would load into the memory
of all these 5 nodes and use the computing power of all of them. But now it seems
it is not working like that.

Regards,
Favas

From: Mikael 
Sent: Monday, September 9, 2019 3:21 PM
To: user@ignite.apache.org
Subject: Re: Ignite Performance - Adding new node is not improving SQL query 
performance


Hi!

If the new node is not part of the baseline topology it will not have any 
persisted data stored so any SQL query will not be of any use on the node as it 
does not have any of the data (at least that is how I understand it, I could be 
wrong here).

If so you would need to add the new node to the baseline topology to see any 
performance improvement, and of course wait for a complete rebalance of the 
data.

From docs:

"The same tools and APIs can be used to adjust the baseline topology throughout 
the cluster lifetime. It's required if you decide to scale out or scale in an 
existing topology by setting more or fewer nodes that will store the data. The 
sections below show how to use the APIs and tools."

Mikael
On 2019-09-09 11:31, Muhammed Favas wrote:
Hi,

I have an Ignite cluster with 4 nodes (each with 8 cores, 32 GB RAM and 30 GB
disk), with native persistence enabled and all nodes added to the baseline
topology. There are two SQL tables created and loaded with 120 GB of data.
One of my test SQL queries takes 8 seconds with this setup. Currently I am
trying various options to reduce the execution time of the query.

For that, I have added one more node (with the same configuration) to the cluster
(non-baselined), with the impression that it would reduce the execution time,
but it didn't. When I checked the CPU utilization of each node, all 4
previously added nodes' CPUs are fully utilized, but the CPU of the newly
added node is not used much.

Can you please help me figure out why this is so, and how I can make sure all
the nodes' CPUs are utilized when I run a distributed query, so that my query
runs faster?
Also, what additional things do I need to do to make my query run faster?

Regards,
Favas



Re: Ignite Performance - Adding new node is not improving SQL query performance

2019-09-09 Thread Mikael

Hi!

If the new node is not part of the baseline topology it will not have 
any persisted data stored so any SQL query will not be of any use on the 
node as it does not have any of the data (at least that is how I 
understand it, I could be wrong here).


If so you would need to add the new node to the baseline topology to see 
any performance improvement, and of course wait for a complete rebalance 
of the data.


From docs:

"The same tools and APIs can be used to adjust the baseline topology 
throughout the cluster lifetime. It's required if you decide to scale 
out or scale in an existing topology by setting more or fewer nodes that 
will store the data. The sections below show how to use the APIs and tools."


Mikael

On 2019-09-09 11:31, Muhammed Favas wrote:


Hi,

I have an Ignite cluster with 4 nodes (each with 8 cores, 32 GB RAM and
30 GB disk), with native persistence enabled and all nodes added to the
baseline topology. There are two SQL tables created and loaded with 120 GB of data.


One of my test SQL queries takes 8 seconds with this setup.
Currently I am trying various options to reduce the execution time of
the query.


For that, I have added one more node (with the same configuration) to the
cluster (non-baselined), with the impression that it would reduce the
execution time, but it didn't. When I checked the CPU utilization of
each node, all 4 previously added nodes' CPUs are fully utilized,
but the CPU of the newly added node is not used much.


Can you please help me figure out why this is so, and how I can make
sure all the nodes' CPUs are utilized when I run a distributed query, so
that my query runs faster?


Also, what additional things do I need to do to make my query
run faster?


Regards,

Favas



Ignite Performance - Adding new node is not improving SQL query performance

2019-09-09 Thread Muhammed Favas
Hi,

I have an Ignite cluster with 4 nodes (each with 8 cores, 32 GB RAM and 30 GB
disk), with native persistence enabled and all nodes added to the baseline
topology. There are two SQL tables created and loaded with 120 GB of data.
One of my test SQL queries takes 8 seconds with this setup. Currently I am
trying various options to reduce the execution time of the query.

For that, I have added one more node (with the same configuration) to the cluster
(non-baselined), with the impression that it would reduce the execution time,
but it didn't. When I checked the CPU utilization of each node, all 4
previously added nodes' CPUs are fully utilized, but the CPU of the newly
added node is not used much.

Can you please help me figure out why this is so, and how I can make sure all
the nodes' CPUs are utilized when I run a distributed query, so that my query
runs faster?
Also, what additional things do I need to do to make my query run faster?

Regards,
Favas
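
Besides extending the baseline, another knob that sometimes affects CPU
utilization for a single heavy SQL query is per-cache query parallelism. A
hedged sketch (the cache name and the parallelism value are illustrative only;
this is a static property, so the cache or table has to be created with it):

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.CacheConfiguration;

public class QueryParallelismSketch {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start();

        // Query parallelism must be set when the cache (or the template behind a
        // SQL table) is created; higher values split index scans across more
        // threads on each node for a single query. The value 4 is illustrative.
        CacheConfiguration<Long, Object> cfg = new CacheConfiguration<>("dimensionsCache");
        cfg.setQueryParallelism(4);

        IgniteCache<Long, Object> cache = ignite.getOrCreateCache(cfg);
    }
}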