Re: Ignite thick client triggering PME in 2.8.0 ?

2021-02-05 Thread Shiva Kumar
Hi Ilya,
In my case I have the cache group configuration attached in the config file,
and I connect to visor using this config file. Does this config cause PME?

Regards,
Shiva

On Fri, Feb 5, 2021 at 3:42 PM Ilya Kasnacheev 
wrote:

> Hello!
>
> Do you have any caches defined in client configuration? If you have any
> caches there then PME will be triggered.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> чт, 4 февр. 2021 г. в 14:37, Shiva Kumar :
>
>> Even I observed the same: when a thick client or visor joins the cluster,
>> something related to PME appears to happen (but not data rebalancing), and
>> it also puts a lock on a WAL archive segment which never gets released,
>> causing the WAL disk to run out of space.
>>
>> On Thu, 4 Feb, 2021, 4:59 pm Hemambara,  wrote:
>>
>>> Hi, can anyone please check and respond on this? I appreciate your help
>>> in advance.
>>>
>>>
>>>
>>> --
>>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>>>
>>
  
 
[Attached XML cache configuration; nearly all of the XML was stripped by
the archive. Only a few values survive, e.g. MY_SCHEMA and INVENTORY.]
 



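For reference: per Ilya's reply above, PME on client join can be avoided by not defining any caches in the client configuration. Below is a minimal, hedged sketch of such a thick-client config; the bean id, discovery addresses, and all other values are illustrative assumptions, not the configuration from the attachment.

```xml
<!-- Sketch: a thick-client IgniteConfiguration that defines NO
     cacheConfiguration entries, so joining should not trigger PME.
     All values here are illustrative assumptions. -->
<bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
  <property name="clientMode" value="true"/>
  <!-- Deliberately no cacheConfiguration property: defining caches here
       is what forces a PME on client join, per the reply above. -->
  <property name="discoverySpi">
    <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
      <property name="ipFinder">
        <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
          <property name="addresses">
            <list>
              <value>127.0.0.1:47500..47509</value>
            </list>
          </property>
        </bean>
      </property>
    </bean>
  </property>
</bean>
```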

Re: Ignite thick client triggering PME in 2.8.0 ?

2021-02-04 Thread Shiva Kumar
Even I observed the same: when a thick client or visor joins the cluster,
something related to PME appears to happen (but not data rebalancing), and
it also puts a lock on a WAL archive segment which never gets released,
causing the WAL disk to run out of space.

On Thu, 4 Feb, 2021, 4:59 pm Hemambara,  wrote:

> Hi, can anyone please check and respond on this? I appreciate your help
> in advance.
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


Unsubscribe

2020-01-07 Thread Shiva Kumar
Unsubscribe


Does Leap Day 29.02.2020 have any impact on Apache Ignite?

2019-11-04 Thread Shiva Kumar
Hi all,

I wanted to know if Leap Day 29.02.2020 has any impact on Apache Ignite?

best regards,
shiva


Re: nodes in the baseline topology is going to OFFLINE state

2019-10-18 Thread Shiva Kumar
Hi Ilya,
My goal is to deactivate the cluster, not to restart it! There is an issue
with deactivating the cluster in my deployment, so I am going with a restart.

I have Ignite deployed on Kubernetes, and during deactivation the entire
cluster hangs, and even the deactivate request (REST or control.sh) hangs,
because a few applications are connected to this Ignite cluster over JDBC,
running queries and inserting records into many tables in parallel. If I
issue a deactivate request at this time, it hangs for more than 25 minutes.
My impression is that the many clients with established TCP connections
running queries are causing the cluster to hang, so I am thinking of
restarting the cluster so that I can proceed with deactivation easily once
the restart is done.
Any suggestions are appreciated.

Regards,
Shiva


On Fri, 18 Oct, 2019, 6:37 PM Ilya Kasnacheev, 
wrote:

> Hello!
>
> If cluster is persistent, you can deactivate it and then restart.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> пт, 18 окт. 2019 г. в 09:51, shivakumar :
>
>> Hi Ilya Kasnacheev,
>> Is there any other way to gracefully shut down or restart the entire
>> cluster?
>>
>> regards,
>> shiva
>>
>>
>>
>> --
>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>>
>


Gracefully shutting down the data grid

2019-09-30 Thread Shiva Kumar
Hi all,

I am trying to deactivate a cluster to which a few clients are connected
over JDBC.
These client connections insert records into many tables and run some
long-running queries.
At this time I am trying to deactivate the cluster [basically I am trying
to take a data backup, so before that I need to deactivate the cluster],
but deactivation hangs and control.sh never returns control; it hangs
indefinitely.
When I check the current cluster state with REST API calls, it sometimes
says the cluster is inactive.
After some time I try to activate the cluster, but it returns this error:

[root@ignite-test]# curl "
http://ignite-service-shiv.ignite.svc.cluster.local:8080/ignite?cmd=activate&user=ignite&password=ignite";
 | jq
  % Total% Received % Xferd  Average Speed   TimeTime Time
 Current
 Dload  Upload   Total   SpentLeft
 Speed
100   207  100   2070 0   2411  0 --:--:-- --:--:-- --:--:--
 2406
{
  "successStatus": 0,
  "sessionToken": "654F094484E24232AA74F35AC5E83481",
  "error": "*Failed to activate, because another state change operation is
currently in progress: deactivate\nsuppressed: \n*",
  "response": null
}


This means that my earlier deactivation did not complete properly.
Is there any other way to deactivate the cluster, to terminate the existing
client connections, or to terminate the running queries?
I tried "kill -k -ar" from the visor shell, but it restarted a few nodes
and ended up with an exception related to page corruption.
Note: my Ignite deployment is on Kubernetes.

Any help is appreciated.

regards,
shiva
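For what it's worth, a hedged sketch of the state-check and deactivation commands involved; the host, port, and credentials are placeholders, and flags may differ between Ignite versions:

```
# Check the current cluster state, then request deactivation via control.sh:
./control.sh --host 127.0.0.1 --state
./control.sh --host 127.0.0.1 --deactivate

# Or via the REST API (same style of command the message above uses for activate):
curl "http://127.0.0.1:8080/ignite?cmd=currentState"
curl "http://127.0.0.1:8080/ignite?cmd=deactivate"
```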


Re: distributed sql join not working as mentioned in documentation

2019-09-26 Thread Shiva Kumar
Hi Evgenii,
Even with a *where condition*, I am getting the same error.
I have a use case where I can't collocate the tables' data. Since the
Ignite docs say non-collocated distributed joins (and cross joins) are
supported, I am trying to use them, but I get this exception when I create
the tables in replicated mode.
I have filed a bug: https://issues.apache.org/jira/browse/IGNITE-12201

regards,
shiva

On Mon, Sep 23, 2019 at 3:57 PM Evgenii Zhuravlev 
wrote:

> Hi,
>
> To make this query work, you can add a where clause or a join condition to
> the query, for example: where c.id = city_id. I don't really understand
> why you want to run a fully distributed cross join on these tables - it
> doesn't make sense; moreover, it will lead to a lot of data movement
> between nodes.
>
> What are you trying to achieve?
>
> Best Regards,
> Evgenii
>
> чт, 19 сент. 2019 г. в 16:18, Shiva Kumar :
>
>> Hi all,
>> I am trying to do a simple cross join on two tables with non-collocated
>> data (without affinity key),
>> This non-collocated distributed join always fails with the error message:
>>
>> *"java.sql.SQLException: javax.cache.CacheException: Failed to prepare
>> distributed join query: join condition does not use index "*
>>
>> If I create one of the tables in replicated mode and the other in
>> partitioned mode, this join operation works, but the documentation says
>> that Ignite supports non-collocated joins without any such condition.
>> We also tried with 3 tables, 1 replicated and the other 2 partitioned,
>> and observed that it failed.
>> We are running the join operations with *distributedJoins=true.*
>> *We observed that if there are N tables in a join operation, then (N-1)
>> should be in replicated mode; is our understanding right?*
>> *If so, then to do a join operation the dimensioning of the cluster
>> increases many fold, which can't be used in a production environment.*
>> *To reproduce:*
>> *Ignite 4-node cluster with native persistence enabled.*
>> *Create the following tables:*
>> *To reproduce:*
>> *Ignite with 4 node cluster with native persistence enabled.*
>> *create the following tables*
>>
>> CREATE TABLE City (
>>
>>   id LONG PRIMARY KEY, name VARCHAR)
>>
>>   WITH "backups=1";
>>
>> CREATE TABLE Person (
>>
>>   id LONG, name VARCHAR, city_id LONG, PRIMARY KEY (id, city_id))
>>
>>   WITH "backups=1";
>>
>> CREATE INDEX idx_city_name ON City (name);
>>
>> CREATE INDEX idx_person_name ON Person (name);
>>
>>
>> INSERT INTO City (id, name) VALUES (1, 'Forest Hill');
>>
>> INSERT INTO City (id, name) VALUES (2, 'Denver');
>>
>> INSERT INTO City (id, name) VALUES (3, 'St. Petersburg');
>>
>> INSERT INTO Person (id, name, city_id) VALUES (1, 'John Doe', 3);
>>
>> INSERT INTO Person (id, name, city_id) VALUES (2, 'Jane Roe', 2);
>>
>> INSERT INTO Person (id, name, city_id) VALUES (3, 'Mary Major', 1);
>>
>> INSERT INTO Person (id, name, city_id) VALUES (4, 'Richard Miles', 2);
>>
>>
>> Query to be run:
>>
>> select * from City c, Person p;
>>
>> or
>> *SELECT* * *FROM* City *AS* c *CROSS* *join* Person *AS* p;
>>
>>
>>
>>


Re: nodes are restarting when i try to drop a table created with persistence enabled

2019-09-26 Thread Shiva Kumar
Hi dmagda,

When I insert many records (~10 or 20 million) into the same table and then
try to drop the table or delete records from it, nodes restart; the
restarts happen in the middle of the drop or delete operation.
According to the logs, the cause of the restart looks like OOM in the data
region.

regards,
shiva

On Wed, Sep 25, 2019 at 1:12 PM Denis Mekhanikov 
wrote:

> I think the issue is that Ignite can't recover from
> IgniteOutOfMemory, even by removing data.
> Shiva, did IgniteOutOfMemory occur for the first time when you did the
> DROP TABLE, or before that?
>
> Denis
>
> ср, 25 сент. 2019 г. в 02:30, Denis Magda :
> >
> > Shiva,
> >
> > Does this issue still exist? Ignite Dev how do we debug this sort of
> thing?
> >
> > -
> > Denis
> >
> >
> > On Tue, Sep 17, 2019 at 7:22 AM Shiva Kumar 
> wrote:
> >>
> >> Hi dmagda,
> >>
> >> I am trying to drop the table which has around 10 million records and I
> am seeing "Out of memory in data region" error messages in Ignite logs and
> ignite node [Ignite pod on kubernetes] is restarting.
> >> I have configured 3GB for default data region, 7GB for JVM and total
> 15GB for Ignite container and enabled native persistence.
> >> Earlier I was under the impression that the restart was caused by
> "SYSTEM_WORKER_BLOCKED" errors, but now I realize that
> "SYSTEM_WORKER_BLOCKED" is in the ignored-failures list and the actual
> cause is "CRITICAL_ERROR" due to "Out of memory in data region"
> >>
> >> This is the error messages in logs:
> >>
> >> ""[2019-09-17T08:25:35,054][ERROR][sys-#773][] JVM will be halted
> immediately due to the failure: [failureCtx=FailureContext
> [type=CRITICAL_ERROR, err=class o.a.i.i.mem.IgniteOutOfMemoryException:
> Failed to find a page for eviction [segmentCapacity=971652, loaded=381157,
> maxDirtyPages=285868, dirtyPages=381157, cpPages=0, pinnedInSegment=3,
> failedToPrepare=381155]
> >> Out of memory in data region [name=Default_Region, initSize=500.0 MiB,
> maxSize=3.0 GiB, persistenceEnabled=true] Try the following:
> >>   ^-- Increase maximum off-heap memory size
> (DataRegionConfiguration.maxSize)
> >>   ^-- Enable Ignite persistence
> (DataRegionConfiguration.persistenceEnabled)
> >>   ^-- Enable eviction or expiration policies]]
> >>
> >> Could you please help me understand why the drop table operation causes
> "Out of memory in data region", and how I can avoid it?
> >>
> >> We have a use case where application inserts records to many tables in
> Ignite simultaneously for some time period and other applications run a
> query on that time period data and update the dashboard. we need to delete
> the records inserted in the previous time period before inserting new
> records.
> >>
> >> even during delete from table operation, I have seen:
> >>
> >> "Critical system error detected. Will be handled accordingly to
> configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false,
> timeout=0, super=AbstractFailureHandler
> [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]], failureCtx=FailureContext
> [type=CRITICAL_ERROR, err=class o.a.i.IgniteException: Checkpoint read lock
> acquisition has been timed out.]] class org.apache.ignite.IgniteException:
> Checkpoint read lock acquisition has been timed out.|
> >>
> >>
> >>
> >> On Mon, Apr 29, 2019 at 12:17 PM Denis Magda  wrote:
> >>>
> >>> Hi Shiva,
> >>>
> >>> That was designed to prevent global cluster performance degradation or
> other outages. Have you tried to apply my recommendation of turning off
> the failure handler for these system threads?
> >>>
> >>> -
> >>> Denis
> >>>
> >>>
> >>> On Sun, Apr 28, 2019 at 10:28 AM shivakumar 
> wrote:
> >>>>
> >>>> HI Denis,
> >>>>
> >>>> is there any specific reason for the blocking of critical thread,
> like CPU
> >>>> is full or Heap is full ?
> >>>> We are again and again hitting this issue.
> >>>> is there any other way to drop tables/cache ?
> >>>> This looks like a critical issue.
> >>>>
> >>>> regards,
> >>>> shiva
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


index corrupted error : org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException: Runtime failure on search row

2019-09-20 Thread Shiva Kumar
Hi all,
I have deployed a 3-node Ignite cluster with native persistence on
Kubernetes, and one of the nodes crashed with the error message below:

*org.h2.message.DbException: General error: "class
org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
Runtime failure on search row: Row@8cfe967[ key: epro_model_abcdKey
[idHash=822184780, hash=737706081, NE_ID=, NAME=], val: epro_model_abcd
[idHash=60444003, hash=1539928610, epro_ID=51, LONGITUDE=null,
DELETE_TIME=null, VENDOR=null, CREATE_TIME=2019-09-19T20:38:32.361929Z,
UPDATE_TIME=2019-09-19T20:40:05.821447Z, ADDITIONAL_INFO=null,
VALID_UNTIL=2019-11-18T20:38:32.362036Z, TYPE=null, LATITUDE=null], ver:
GridCacheVersion [topVer=180326822, order=1568925345552, nodeOrder=6] ][
51, 2019-09-19T20:38:32.361929Z, 2019-09-19T20:40:05.821447Z, null,
2019-11-18T20:38:32.362036Z, , , null, null, null, null, null ]"
[5-197]|*

Please find attached file [index_corruption.txt] for complete backtrace.

It looks like the index got corrupted; I am not sure what exactly caused
the corruption. Are there any known issues related to this?

In my cluster, many applications write into many tables simultaneously,
queries run on many tables simultaneously, and the application frequently
deletes unwanted rows [old data] from the tables using *delete from table*
SQL operations. The failed node also logged:
Failed to reinitialize local partitions (rebalancing will be stopped): 
GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=21, 
minorTopVer=0], discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode 
[id=f3d7fb8c-0cda-42d0-a171-0155a171405b, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 
192.168.*.*], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, 
ignite-cluster-ignite-epro-0.ignite-service.default.svc.cluster.local/192.168.*.*:47500],
 discPort=47500, order=21, intOrder=12, lastExchangeTime=1568926175782, 
loc=true, ver=2.7.0#19700101-sha1:, isClient=false], topVer=21, 
nodeId8=f3d7fb8c, msg=null, type=NODE_JOINED, tstamp=1568926160054], 
nodeId=f3d7fb8c, evt=NODE_JOINED] org.h2.message.DbException: General error: 
"class 
org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
 Runtime failure on search row: Row@8cfe967[ key: epro_model_abcdKey 
[idHash=822184780, hash=737706081, NE_ID=, NAME=], val: epro_model_abcd 
[idHash=60444003, hash=1539928610, epro_ID=51, LONGITUDE=null, 
DELETE_TIME=null, VENDOR=null, CREATE_TIME=2019-09-19T20:38:32.361929Z, 
UPDATE_TIME=2019-09-19T20:40:05.821447Z, ADDITIONAL_INFO=null, 
VALID_UNTIL=2019-11-18T20:38:32.362036Z, TYPE=null, LATITUDE=null], ver: 
GridCacheVersion [topVer=180326822, order=1568925345552, nodeOrder=6] ][ 51, 
2019-09-19T20:38:32.361929Z, 2019-09-19T20:40:05.821447Z, null, 
2019-11-18T20:38:32.362036Z, , , null, null, null, null, null ]" [5-197]|   
   at org.h2.message.DbException.get(DbException.java:168)|at 
org.h2.message.DbException.convert(DbException.java:307)|at 
org.apache.ignite.internal.processors.query.h2.database.H2TreeIndex.removex(H2TreeIndex.java:348)|
   at 
org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.addToIndex(GridH2Table.java:550)|
 at 
org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.update(GridH2Table.java:479)|
 at 
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.store(IgniteH2Indexing.java:768)|
at 
org.apache.ignite.internal.processors.query.GridQueryProcessor.store(GridQueryProcessor.java:1905)|
  at 
org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.store(GridCacheQueryManager.java:404)|
   at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.finishUpdate(IgniteCacheOffheapManagerImpl.java:2633)|
  at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.update(IgniteCacheOffheapManagerImpl.java:2524)|
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.update(GridCacheOffheapManager.java:1759)|
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.update(IgniteCacheOffheapManagerImpl.java:443)|
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.applyUpdate(GridCacheDatabaseSharedManager.java:2653)|
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.applyLastUpdates(GridCacheDatabaseSharedManager.java:2339)|
   at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreState(GridCacheDatabaseSharedManager.java:1628)|
   at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.beforeExchange(GridCacheDatabaseSharedManager.java:1302)|
 at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1453)|
at 
org.

distributed sql join not working as mentioned in documentation

2019-09-19 Thread Shiva Kumar
Hi all,
I am trying to do a simple cross join on two tables with non-collocated
data (no affinity key).
This non-collocated distributed join always fails with the error message:

*"java.sql.SQLException: javax.cache.CacheException: Failed to prepare
distributed join query: join condition does not use index "*

If I create one of the tables in replicated mode and the other in
partitioned mode, this join operation works, but the documentation says
that Ignite supports non-collocated joins without any such condition.
We also tried with 3 tables, 1 replicated and the other 2 partitioned, and
observed that it failed.
We are running the join operations with *distributedJoins=true.*
*We observed that if there are N tables in a join operation, then (N-1)
should be in replicated mode; is our understanding right?*
*If so, then to do a join operation the dimensioning of the cluster
increases many fold, which can't be used in a production environment.*
*To reproduce:*
*Ignite 4-node cluster with native persistence enabled.*
*Create the following tables:*

CREATE TABLE City (

  id LONG PRIMARY KEY, name VARCHAR)

  WITH "backups=1";

CREATE TABLE Person (

  id LONG, name VARCHAR, city_id LONG, PRIMARY KEY (id, city_id))

  WITH "backups=1";

CREATE INDEX idx_city_name ON City (name);

CREATE INDEX idx_person_name ON Person (name);


INSERT INTO City (id, name) VALUES (1, 'Forest Hill');

INSERT INTO City (id, name) VALUES (2, 'Denver');

INSERT INTO City (id, name) VALUES (3, 'St. Petersburg');

INSERT INTO Person (id, name, city_id) VALUES (1, 'John Doe', 3);

INSERT INTO Person (id, name, city_id) VALUES (2, 'Jane Roe', 2);

INSERT INTO Person (id, name, city_id) VALUES (3, 'Mary Major', 1);

INSERT INTO Person (id, name, city_id) VALUES (4, 'Richard Miles', 2);


Query to be run:

select * from City c, Person p;

or
*SELECT* * *FROM* City *AS* c *CROSS* *join* Person *AS* p;
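For comparison, here is a hedged sketch of the same join written with an explicit join condition on the indexed key columns, which is the usual workaround for the "join condition does not use index" error; it uses the City/Person tables created above:

```sql
-- Sketch: the join condition references City's primary-key column,
-- so the planner can use an index for the distributed join.
-- Run with distributedJoins=true on the JDBC connection.
SELECT p.name AS person, c.name AS city
FROM Person AS p
JOIN City AS c ON c.id = p.city_id;
```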


liveness and rediness probe configuration for Ignite on kubernetes

2019-09-18 Thread Shiva Kumar
Hi all,
I have deployed Ignite on Kubernetes and configured liveness and readiness
probe like this
readinessProbe:
  tcpSocket:
port: 10800
  initialDelaySeconds: 10
  periodSeconds: 2
  failureThreshold: 60
livenessProbe:
  httpGet:
   scheme: HTTP
   path: /ignite?cmd=version&user=ignite&password=ignite
   port: 8080
  initialDelaySeconds: 60
  periodSeconds: 20

where 10800 is SQL port and 8080 is rest port.

The problem I am facing is: if the pod restarts for some reason, the
liveness probe fails, because during recovery the Ignite node (pod) fails
to respond to the REST API (liveness probe); this leads to another restart
by Kubernetes, and the pod goes into the CrashLoopBackOff state.

Is there a better way of configuring the liveness probe?
And why does the Ignite node fail to respond to a simple REST query
(/ignite?cmd=version) during recovery?

regards,
shiva
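One common option is to give the node time to finish WAL recovery before the liveness probe takes over. A hedged YAML sketch follows; the thresholds are illustrative assumptions, and `startupProbe` requires Kubernetes 1.16+:

```yaml
# Sketch: startupProbe runs first and tolerates a long recovery;
# livenessProbe only takes effect once the startup probe has succeeded.
startupProbe:
  httpGet:
    path: /ignite?cmd=version
    port: 8080
  periodSeconds: 10
  failureThreshold: 60        # allows up to ~10 minutes of recovery
livenessProbe:
  httpGet:
    path: /ignite?cmd=version
    port: 8080
  periodSeconds: 20
  failureThreshold: 5         # tolerate brief unresponsiveness
readinessProbe:
  tcpSocket:
    port: 10800               # SQL/thin-client port
  periodSeconds: 5
  failureThreshold: 60
```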


Re: nodes are restarting when i try to drop a table created with persistence enabled

2019-09-17 Thread Shiva Kumar
Hi dmagda,

I am trying to drop a table which has around 10 million records, and I see
"*Out of memory in data region*" error messages in the Ignite logs, and the
Ignite node [an Ignite pod on Kubernetes] restarts.
I have configured 3GB for the default data region, 7GB for the JVM, and
15GB total for the Ignite container, with native persistence enabled.
Earlier I was under the impression that the restart was caused by "
*SYSTEM_WORKER_BLOCKED*" errors, but now I realize that "
*SYSTEM_WORKER_BLOCKED*" is in the ignored-failures list and the actual
cause is "*CRITICAL_ERROR*" due to "*Out of memory in data region*"

This is the error messages in logs:

""[2019-09-17T08:25:35,054][ERROR][sys-#773][] *JVM will be halted
immediately due to the failure: [failureCtx=FailureContext
[type=CRITICAL_ERROR, err=class o.a.i.i.mem.IgniteOutOfMemoryException:
Failed to find a page for eviction* [segmentCapacity=971652, loaded=381157,
maxDirtyPages=285868, dirtyPages=381157, cpPages=0, pinnedInSegment=3,
failedToPrepare=381155]
*Out of memory in data region* [name=Default_Region, initSize=500.0 MiB,
maxSize=3.0 GiB, persistenceEnabled=true] Try the following:
  ^-- Increase maximum off-heap memory size
(DataRegionConfiguration.maxSize)
  ^-- Enable Ignite persistence (DataRegionConfiguration.persistenceEnabled)
  ^-- Enable eviction or expiration policies]]

Could you please help me understand why the *drop table operation* causes
"*Out of memory in data region*", and how I can avoid it?

We have a use case where an application inserts records into many tables in
Ignite simultaneously for some time period, and other applications run
queries on that time period's data and update a dashboard. We need to
delete the records inserted in the previous time period before inserting
new records.

Even during a *delete from table* operation, I have seen:

"Critical system error detected. Will be handled accordingly to
configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false,
timeout=0, super=AbstractFailureHandler
[ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]],
failureCtx=FailureContext [*type=CRITICAL_ERROR*, err=class
o.a.i.IgniteException: *Checkpoint read lock acquisition has been
timed out*.]] class org.apache.ignite.IgniteException: Checkpoint read
lock acquisition has been timed out.|
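For reference, a hedged sketch of the data-storage settings that are usually tuned when this error appears; the sizes below are illustrative assumptions, not recommendations for this cluster:

```xml
<!-- Sketch: enlarge the data region and checkpoint buffer so that
     checkpoint/eviction pressure during bulk DROP/DELETE is less likely
     to hit IgniteOutOfMemoryException. All sizes are assumptions. -->
<property name="dataStorageConfiguration">
  <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
    <property name="defaultDataRegionConfiguration">
      <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
        <property name="name" value="Default_Region"/>
        <property name="persistenceEnabled" value="true"/>
        <property name="initialSize" value="#{512L * 1024 * 1024}"/>
        <property name="maxSize" value="#{6L * 1024 * 1024 * 1024}"/>
        <!-- A larger checkpoint buffer gives the checkpointer headroom. -->
        <property name="checkpointPageBufferSize" value="#{1024L * 1024 * 1024}"/>
      </bean>
    </property>
  </bean>
</property>
```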



On Mon, Apr 29, 2019 at 12:17 PM Denis Magda  wrote:

> Hi Shiva,
>
> That was designed to prevent global cluster performance degradation or
> other outages. Have you tried to apply my recommendation of turning off
> the failure handler for these system threads?
>
> -
> Denis
>
>
> On Sun, Apr 28, 2019 at 10:28 AM shivakumar 
> wrote:
>
>> HI Denis,
>>
>> Is there any specific reason for the blocking of the critical thread,
>> such as the CPU being full or the heap being full?
>> We are hitting this issue again and again.
>> Is there any other way to drop tables/caches?
>> This looks like a critical issue.
>>
>> regards,
>> shiva
>>
>>
>>
>> --
>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>>
>


Re: Cache expiry policy not deleting records from disk(native persistence)

2019-09-10 Thread Shiva Kumar
I have filed a bug, https://issues.apache.org/jira/browse/IGNITE-12152, but
it is the same as https://issues.apache.org/jira/browse/IGNITE-10862.
Any idea on the timeline of these tickets?
The documentation,
https://apacheignite.readme.io/v2.7/docs/expiry-policies,
says that when native persistence is enabled, "*expired entries are removed
from both memory and disk tiers*". But on disk, Ignite just marks the pages
as unwanted; the disk space used by these unwanted pages is reused to store
new pages, but the pages themselves are not removed from disk, so the disk
space they occupy is never released.

Here is the developers' discussion link:
http://apache-ignite-developers.2346864.n4.nabble.com/How-to-free-up-space-on-disc-after-removing-entries-from-IgniteCache-with-enabled-PDS-td39839.html


On Mon, Sep 9, 2019 at 11:53 PM Shiva Kumar 
wrote:

> Hi
> I have deployed Ignite on Kubernetes and configured two separate
> persistent volumes for WAL and persistence.
> The issue I am facing is the same as
> https://issues.apache.org/jira/browse/IGNITE-10862
>
> Thanks
> Shiva
>
> On Mon, 9 Sep, 2019, 10:47 PM Andrei Aleksandrov, 
> wrote:
>
>> Hello,
>>
>> I guess that generated WAL will take this disk space. Please read about
>> WAL here:
>>
>> https://apacheignite.readme.io/docs/write-ahead-log
>>
>> Please provide the size of every folder under /opt/ignite/persistence.
>>
>> BR,
>> Andrei
>> 9/6/2019 9:45 PM, Shiva Kumar пишет:
>>
>> Hi all,
>> I have set cache expiry policy like this
>>
>>
>>
>>
>> 
>> [XML cache template configuration (a CacheConfiguration bean with an
>> expiry-policy factory, factory-method="factoryOf"); most of the XML was
>> stripped by the archive.]
>>
>>
>>
>> And I am batch-inserting records into one of the tables created with the
>> above cache template.
>> In around 10 minutes I ingested ~1.5GB of data, and after 10 minutes
>> records started reducing (expiring) when I monitored from sqlline.
>>
>> 0: jdbc:ignite:thin://192.168.*.*:10800> select count(ID) from DIMENSIONS;
>> 
>>
>> COUNT(ID)
>> 
>>
>> 248896
>> 
>> 1 row selected (0.86 seconds)
>> 0: jdbc:ignite:thin://192.168.*.*:10800> select count(ID) from DIMENSIONS;
>> 
>>
>> COUNT(ID)
>> 
>>
>> 222174
>> 
>> 1 row selected (0.313 seconds)
>> 0: jdbc:ignite:thin://192.168.*.*:10800> select count(ID) from DIMENSIONS;
>> 
>>
>> COUNT(ID)
>> 
>>
>> 118154
>> 
>> 1 row selected (0.15 seconds)
>> 0: jdbc:ignite:thin://192.168.*.*:10800>
>> 0: jdbc:ignite:thin://192.168.*.*:10800> select count(ID) from DIMENSIONS;
>> 
>>
>> COUNT(ID)
>> 
>>
>> 76061
>> 
>> 1 row selected (0.106 seconds)
>> 0: jdbc:ignite:thin://192.168.*.*:10800>
>> 0: jdbc:ignite:thin://192.168.*.*:10800> select count(ID) from DIMENSIONS;
>> 
>>
>> COUNT(ID)
>> 
>>
>> 41671
>> 
>> 1 row selected (0.063 seconds)
>> 0: jdbc:ignite:thin://192.168.*.*:10800> select count(ID) from DIMENSIONS;
>> 
>>
>> COUNT(ID)
>> 
>>
>> 18455
>> 
>> 1 row selected (0.037 seconds)
>> 0: jdbc:ignite:thin://192.168.*.*:10800> select count(ID) from DIMENSIONS;
>> 
>>
>> COUNT(ID)
>> 
>>
>> 0
>> 
>> 1 row selected (0.014 seconds)
>>
>>
>> But in the meantime, the disk space used by the persistence store stayed
>> at the same usage level instead of decreasing.

Re: Cache expiry policy not deleting records from disk(native persistence)

2019-09-09 Thread Shiva Kumar
Hi
I have deployed Ignite on Kubernetes and configured two separate persistent
volumes for WAL and persistence.
The issue I am facing is the same as
https://issues.apache.org/jira/browse/IGNITE-10862

Thanks
Shiva

On Mon, 9 Sep, 2019, 10:47 PM Andrei Aleksandrov, 
wrote:

> Hello,
>
> I guess that generated WAL will take this disk space. Please read about
> WAL here:
>
> https://apacheignite.readme.io/docs/write-ahead-log
>
> Please provide the size of every folder under /opt/ignite/persistence.
>
> BR,
> Andrei
> 9/6/2019 9:45 PM, Shiva Kumar пишет:
>
> Hi all,
> I have set cache expiry policy like this
>
>
>
>
> [XML cache template configuration (a CacheConfiguration bean with an
> expiry-policy factory, factory-method="factoryOf"); most of the XML was
> stripped by the archive.]
>
>
>
> And I am batch-inserting records into one of the tables created with the
> above cache template.
> In around 10 minutes I ingested ~1.5GB of data, and after 10 minutes
> records started reducing (expiring) when I monitored from sqlline.
>
> 0: jdbc:ignite:thin://192.168.*.*:10800> select count(ID) from DIMENSIONS;
> 
>
> COUNT(ID)
> 
>
> 248896
> 
> 1 row selected (0.86 seconds)
> 0: jdbc:ignite:thin://192.168.*.*:10800> select count(ID) from DIMENSIONS;
> 
>
> COUNT(ID)
> 
>
> 222174
> 
> 1 row selected (0.313 seconds)
> 0: jdbc:ignite:thin://192.168.*.*:10800> select count(ID) from DIMENSIONS;
> 
>
> COUNT(ID)
> 
>
> 118154
> 
> 1 row selected (0.15 seconds)
> 0: jdbc:ignite:thin://192.168.*.*:10800>
> 0: jdbc:ignite:thin://192.168.*.*:10800> select count(ID) from DIMENSIONS;
> 
>
> COUNT(ID)
> 
>
> 76061
> 
> 1 row selected (0.106 seconds)
> 0: jdbc:ignite:thin://192.168.*.*:10800>
> 0: jdbc:ignite:thin://192.168.*.*:10800> select count(ID) from DIMENSIONS;
> 
>
> COUNT(ID)
> 
>
> 41671
> 
> 1 row selected (0.063 seconds)
> 0: jdbc:ignite:thin://192.168.*.*:10800> select count(ID) from DIMENSIONS;
> 
>
> COUNT(ID)
> 
>
> 18455
> 
> 1 row selected (0.037 seconds)
> 0: jdbc:ignite:thin://192.168.*.*:10800> select count(ID) from DIMENSIONS;
> 
>
> COUNT(ID)
> 
>
> 0
> 
> 1 row selected (0.014 seconds)
>
>
> But in the meantime, the disk space used by the persistence store stayed
> at the same usage level instead of decreasing.
>
>
> [ignite@ignite-cluster-ign-shiv-0 ignite]$ while true ; do df -h
> /opt/ignite/persistence/; sleep 1s; done
> Filesystem Size Used Avail Use% Mounted on
> /dev/vdj 15G 1.6G 14G 11% /opt/ignite/persistence
> Filesystem Size Used Avail Use% Mounted on
> /dev/vdj 15G 1.6G 14G 11% /opt/ignite/persistence
> Filesystem Size Used Avail Use% Mounted on
> /dev/vdj 15G 1.6G 14G 11% /opt/ignite/persistence
> Filesystem Size Used Avail Use% Mounted on
> /dev/vdj 15G 1.6G 14G 11% /opt/ignite/persistence
> Filesystem Size Used Avail Use% Mounted on
> /dev/vdj 15G 1.6G 14G 11% /opt/ignite/persistence
> Filesystem Size Used Avail Use% Mounted on
> /dev/vdj 15G 1.6G 14G 11% /opt/ignite/persistence
> Filesystem Size Used Avail Use% Mounted on
> /dev/vdj 15G 1.6G 14G 11% /opt/ignite/persistence
>
>
>
> This means that the expiry policy is not deleting records from the disk,
> but the Ignite documentation says that when an expiry policy is set and
> native persistence is enabled, records are deleted from disk as well.
> Am I missing some configuration?
> Any help is appreciated.
>
> Shiva
>
>


Cache expiry policy not deleting records from disk(native persistence)

2019-09-06 Thread Shiva Kumar
Hi all,
I have set cache expiry policy like this

[XML cache template configuration with an expiry-policy factory; the XML
was stripped by the archive.]

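The XML above did not survive the archive, so here is a hedged reconstruction of what a cache template with a created-expiry policy can look like; the template name and the 10-minute TTL are assumptions for illustration only:

```xml
<!-- Sketch: a cache template (the trailing "*" makes it usable as a
     template) whose entries expire 10 minutes after creation.
     Name and TTL are assumed, not taken from the original post. -->
<property name="cacheConfiguration">
  <list>
    <bean class="org.apache.ignite.configuration.CacheConfiguration">
      <property name="name" value="expiryTemplate*"/>
      <property name="expiryPolicyFactory">
        <bean class="javax.cache.expiry.CreatedExpiryPolicy" factory-method="factoryOf">
          <constructor-arg>
            <bean class="javax.cache.expiry.Duration">
              <constructor-arg value="MINUTES"/>
              <constructor-arg value="10"/>
            </bean>
          </constructor-arg>
        </bean>
      </property>
    </bean>
  </list>
</property>
```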

And I am batch-inserting records into one of the tables created with the
above cache template.
In around 10 minutes I ingested ~1.5GB of data, and after 10 minutes
records started reducing (expiring) when I monitored from sqlline.

0: jdbc:ignite:thin://192.168.*.*:10800> select count(ID) from DIMENSIONS;


COUNT(ID)


248896

1 row selected (0.86 seconds)
0: jdbc:ignite:thin://192.168.*.*:10800> select count(ID) from DIMENSIONS;


COUNT(ID)


222174

1 row selected (0.313 seconds)
0: jdbc:ignite:thin://192.168.*.*:10800> select count(ID) from DIMENSIONS;


COUNT(ID)


118154

1 row selected (0.15 seconds)
0: jdbc:ignite:thin://192.168.*.*:10800>
0: jdbc:ignite:thin://192.168.*.*:10800> select count(ID) from DIMENSIONS;


COUNT(ID)


76061

1 row selected (0.106 seconds)
0: jdbc:ignite:thin://192.168.*.*:10800>
0: jdbc:ignite:thin://192.168.*.*:10800> select count(ID) from DIMENSIONS;


COUNT(ID)


41671

1 row selected (0.063 seconds)
0: jdbc:ignite:thin://192.168.*.*:10800> select count(ID) from DIMENSIONS;


COUNT(ID)


18455

1 row selected (0.037 seconds)
0: jdbc:ignite:thin://192.168.*.*:10800> select count(ID) from DIMENSIONS;


COUNT(ID)


0

1 row selected (0.014 seconds)


But in the meantime, the disk space used by the persistence store stayed at
the same usage level instead of decreasing.


[ignite@ignite-cluster-ign-shiv-0 ignite]$ while true ; do df -h
/opt/ignite/persistence/; sleep 1s; done
Filesystem Size Used Avail Use% Mounted on
/dev/vdj 15G 1.6G 14G 11% /opt/ignite/persistence
Filesystem Size Used Avail Use% Mounted on
/dev/vdj 15G 1.6G 14G 11% /opt/ignite/persistence
Filesystem Size Used Avail Use% Mounted on
/dev/vdj 15G 1.6G 14G 11% /opt/ignite/persistence
Filesystem Size Used Avail Use% Mounted on
/dev/vdj 15G 1.6G 14G 11% /opt/ignite/persistence
Filesystem Size Used Avail Use% Mounted on
/dev/vdj 15G 1.6G 14G 11% /opt/ignite/persistence
Filesystem Size Used Avail Use% Mounted on
/dev/vdj 15G 1.6G 14G 11% /opt/ignite/persistence
Filesystem Size Used Avail Use% Mounted on
/dev/vdj 15G 1.6G 14G 11% /opt/ignite/persistence



This means that the expiry policy is not deleting records from the disk, but
the Ignite documentation says that when an expiry policy is set and native
persistence is enabled, records are deleted from disk as well.
Am I missing some configuration?
Any help is appreciated.

Shiva


Re: Capacity planning for production deployment on kubernetes

2019-08-22 Thread Shiva Kumar
Hi Denis,

Thanks for your response.
Yes, in our tests we have also seen OOM errors and pod crashes, so we will
follow the recommendation for RAM requirements. I was also checking the Ignite
documentation on the disk space required for the WAL + WAL archive.
Here in this link
https://apacheignite.readme.io/docs/write-ahead-log#section-wal-archive

it says the archive size is defined as 4 times the size of the checkpointing
buffer, and the checkpointing buffer is a function of the data region size (
https://apacheignite.readme.io/docs/durable-memory-tuning#section-checkpointing-buffer-size
)

but in this link
https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood#IgnitePersistentStore-underthehood-SubfoldersGeneration

the *Estimating disk space* section explains how to estimate the disk space
required for the WAL, but it is not clear. Can you please share the correct
recommendation for calculating the disk space required for the WAL + WAL
archive?
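Putting the two linked pages together, a rough back-of-the-envelope calculation
(a sketch based on the default checkpoint-buffer sizing described in the
durable-memory-tuning page; the thresholds below are assumptions that should be
verified against your Ignite version):

```python
def checkpoint_buffer_gb(data_region_gb):
    # Default checkpoint buffer size as a function of data region size
    # (per the durable-memory-tuning docs):
    #   < 1GB  -> min(256MB, region size)
    #   1-8GB  -> region size / 4
    #   > 8GB  -> 2GB
    if data_region_gb < 1:
        return min(0.25, data_region_gb)
    if data_region_gb < 8:
        return data_region_gb / 4
    return 2.0

def wal_archive_gb(data_region_gb):
    # The WAL archive is sized as 4x the checkpoint buffer by default.
    return 4 * checkpoint_buffer_gb(data_region_gb)

for region in (4, 60):
    print(f"region={region}GB buffer={checkpoint_buffer_gb(region)}GB "
          f"archive={wal_archive_gb(region)}GB")
```

For the 4GB data region mentioned below, this gives a ~1GB checkpoint buffer
and a ~4GB default WAL archive, with the active WAL segments coming on top of
that.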

In one of my tests, I configured 4GB for the data region and 10GB for the WAL +
WAL archive, but our pods are crashing as the disk mounted for the WAL + WAL
archive runs out of space.

[ignite@ignite-cluster-ignite-node-2 ignite]$ df -h
Filesystem  Size  Used Avail Use% Mounted on
overlay 158G   39G  112G  26% /
tmpfs63G 0   63G   0% /dev
tmpfs63G 0   63G   0% /sys/fs/cgroup
/dev/vda1   158G   39G  112G  26% /etc/hosts
shm  64M 0   64M   0% /dev/shm
/dev/vdq9.8G  9.7G   44M 100% /opt/ignite/wal
/dev/vdr 50G  1.4G   48G   3% /opt/ignite/persistence
tmpfs63G   12K   63G   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs63G 0   63G   0% /proc/acpi
tmpfs63G 0   63G   0% /proc/scsi
tmpfs63G 0   63G   0% /sys/firmware


and this is the error message on the Ignite node:

"ERROR","JVM will be halted immediately due to the failure:
[failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=class
o.a.i.IgniteCheckedException: Failed to archive WAL segment
[srcFile=/opt/ignite/wal/node00-37ea8ba6-3198-46a1-9e9e-38aff27ed9c9/0006.wal,
dstFile=/opt/ignite/wal/archive/node00-37ea8ba6-3198-46a1-9e9e-38aff27ed9c9/0236.wal.tmp]]]"
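If unbounded archive growth is part of the problem, one knob worth checking is
a cap on the archive size via DataStorageConfiguration. This is a sketch, not a
confirmed fix for the segment-lock issue described above, and the 4 GiB cap is
an arbitrary example value:

```xml
<property name="dataStorageConfiguration">
  <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
    <property name="walPath" value="/opt/ignite/wal"/>
    <property name="walArchivePath" value="/opt/ignite/wal/archive"/>
    <!-- cap the WAL archive at 4 GiB (example value; property available since Ignite 2.7) -->
    <property name="maxWalArchiveSize" value="#{4L * 1024 * 1024 * 1024}"/>
  </bean>
</property>
```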


On Thu, Aug 22, 2019 at 8:04 PM Denis Mekhanikov 
wrote:

> Shivakumar,
>
> Such allocation doesn’t allow full memory utilization, so it’s possible
> that nodes will crash because of out-of-memory errors.
> So, it’s better to follow the given recommendation.
>
> If you want us to investigate reasons of the failures, please provide logs
> and configuration of the failed nodes.
>
> Denis
> On 21 Aug 2019, 16:17 +0300, Shiva Kumar ,
> wrote:
>
> Hi all,
> We are testing a field use case before deploying in the field, and we want to
> know whether the resource limits below are suitable for production.
> There are 3 nodes (3 pods on Kubernetes) running, each with the configuration
> below:
>
>DefaultDataRegion: 60GB
> JVM: 32GB
> Resource allocated for each container: 64GB
>
> And the Ignite documentation says (JVM + all data regions) should not exceed
> 70% of the total RAM allocated to each node (container).
> But we started testing with the above configuration, and for up to 9 days the
> Ignite cluster was running successfully with some data ingestion, but
> suddenly the pods crashed and they were unable to recover from the crash.
> Is the above resource configuration not good for node recovery?
>
>


Capacity planning for production deployment on kubernetes

2019-08-21 Thread Shiva Kumar
Hi all,
We are testing a field use case before deploying in the field, and we want to
know whether the resource limits below are suitable for production.
There are 3 nodes (3 pods on Kubernetes) running, each with the configuration
below:

   DefaultDataRegion: 60GB
JVM: 32GB
Resource allocated for each container: 64GB

And the Ignite documentation says (JVM + all data regions) should not exceed
70% of the total RAM allocated to each node (container).
But we started testing with the above configuration, and for up to 9 days the
Ignite cluster was running successfully with some data ingestion, but suddenly
the pods crashed and they were unable to recover from the crash.
Is the above resource configuration not good for node recovery?
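As a quick sanity check of the figures above against the 70% guideline (plain
arithmetic using only the numbers quoted in this thread):

```python
container_gb = 64        # RAM allocated to each container
jvm_gb = 32              # JVM heap
data_region_gb = 60      # DefaultDataRegion (off-heap)

required_gb = jvm_gb + data_region_gb  # memory Ignite may actually use
budget_gb = 0.7 * container_gb         # recommended ceiling per the docs

print(f"required={required_gb}GB budget={budget_gb:.1f}GB "
      f"within_budget={required_gb <= budget_gb}")
```

With 92GB of potential usage against a ~44.8GB recommended budget on a 64GB
container, this allocation is well over the guideline, which would explain the
OOM-driven crashes.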