Re: Ignite thick client triggering PME in 2.8.0 ?
Hi Ilya, In my case I have the cache group configuration as attached in the config file, and I connect to Visor using this config file. Does this config cause PME? Regards, Shiva

On Fri, Feb 5, 2021 at 3:42 PM Ilya Kasnacheev wrote:
> Hello!
>
> Do you have any caches defined in the client configuration? If you have any
> caches there then PME will be triggered.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> Thu, 4 Feb 2021 at 14:37, Shiva Kumar:
>
>> Even I observed the same: during thick client or Visor join, something
>> related to PME happens (but not data rebalancing), and it also puts a lock
>> on a WAL archive segment which never gets released, causing the WAL disk
>> to run out of space.
>>
>> On Thu, 4 Feb, 2021, 4:59 pm Hemambara, wrote:
>>
>>> Hi, can anyone please check and respond on this? Appreciate your help in
>>> advance.
>>>
>>> --
>>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
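For reference, a minimal thick-client configuration along the lines of Ilya's point might look like the sketch below. It deliberately contains no cacheConfiguration or cache group beans, since caches defined on the client side are what make the joining client trigger PME; the discovery address is illustrative only.

```xml
<bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
    <!-- Join the cluster as a thick client -->
    <property name="clientMode" value="true"/>

    <!-- NOTE: no cacheConfiguration property here; defining caches in the
         client config is what triggers PME on join -->

    <property name="discoverySpi">
        <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
            <property name="ipFinder">
                <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
                    <property name="addresses">
                        <list>
                            <!-- illustrative server address -->
                            <value>127.0.0.1:47500..47509</value>
                        </list>
                    </property>
                </bean>
            </property>
        </bean>
    </property>
</bean>
```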
Re: Ignite thick client triggering PME in 2.8.0 ?
Even I observed the same: during thick client or Visor join, something related to PME happens (but not data rebalancing), and it also puts a lock on a WAL archive segment which never gets released, causing the WAL disk to run out of space.

On Thu, 4 Feb, 2021, 4:59 pm Hemambara, wrote:
> Hi, can anyone please check and respond on this? Appreciate your help in
> advance.
Does Leap Day 29.02.2020 have any impact on Apache Ignite?
Hi all, I wanted to know whether Leap Day 29.02.2020 has any impact on Apache Ignite. Best regards, shiva
Re: nodes in the baseline topology is going to OFFLINE state
Hi Ilya,

My goal is to deactivate the cluster, not to restart it! There is an issue with deactivating the cluster in my deployment, so I am going with a restart. I have Ignite deployed on Kubernetes, and during deactivation the entire cluster hangs; even the deactivate request (REST or control.sh) hangs, because a few applications connected to this Ignite cluster over JDBC run queries and insert records into many tables in parallel. If I issue a deactivate request at this time, it hangs for more than 25 minutes. My impression is that the many established client TCP connections and running queries are causing the cluster to hang, so I am thinking of restarting the cluster first, so that I can proceed with deactivation easily once the restart is done. Any suggestions are appreciated.

Regards, Shiva

On Fri, 18 Oct, 2019, 6:37 PM Ilya Kasnacheev, wrote:
> Hello!
>
> If the cluster is persistent, you can deactivate it and then restart.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> Fri, 18 Oct 2019 at 09:51, shivakumar:
>
>> Hi Ilya Kasnacheev,
>> Is there any other way of gracefully shutting down/restarting the entire
>> cluster?
>>
>> regards,
>> shiva
Gracefully shutting down the data grid
Hi all, I am trying to deactivate a cluster to which a few clients are connected over JDBC. These clients insert records into many tables and run some long-running queries. At this time I am trying to deactivate the cluster [basically to take a data backup, so before this I need to deactivate the cluster]. But deactivation hangs: control.sh never returns. When I check the current cluster state via REST API calls, it sometimes reports that the cluster is inactive. When I later try to activate the cluster, it returns this error:

[root@ignite-test]# curl "http://ignite-service-shiv.ignite.svc.cluster.local:8080/ignite?cmd=activate&user=ignite&password=ignite" | jq
{
  "successStatus": 0,
  "sessionToken": "654F094484E24232AA74F35AC5E83481",
  "error": "Failed to activate, because another state change operation is currently in progress: deactivate\nsuppressed: \n",
  "response": null
}

This means my earlier deactivation did not complete properly. Is there any other way to deactivate the cluster, to terminate the existing client connections, or to terminate the running queries? I tried "kill -k -ar" from the Visor shell, but it restarted a few nodes and ended up with an exception related to page corruption.

Note: My Ignite deployment is on Kubernetes.

Any help is appreciated.

regards, shiva
Re: distributed sql join not working as mentioned in documentation
Hi Evgenii,

Even with a where condition, I am getting the same error. I have a use case where I can't collocate the tables' data. Since the Ignite documentation says non-collocated distributed joins (including cross joins) are supported, I am trying to use that, but I get this exception when I create the tables in replicated mode. I have filed a bug: https://issues.apache.org/jira/browse/IGNITE-12201

regards, shiva

On Mon, Sep 23, 2019 at 3:57 PM Evgenii Zhuravlev wrote:
> Hi,
>
> To make this query work, you can add a where clause or join condition to
> the query, for example: where c.id = city_id. I don't really understand
> why you want to run a fully distributed cross join on these tables; it
> doesn't make sense, and moreover it will lead to a lot of data movement
> between nodes.
>
> What are you trying to achieve?
>
> Best Regards,
> Evgenii
>
> Thu, 19 Sep 2019 at 16:18, Shiva Kumar:
>
>> Hi all,
>> I am trying to do a simple cross join on two tables with non-collocated
>> data (without an affinity key). This non-collocated distributed join
>> always fails with the error message:
>>
>> "java.sql.SQLException: javax.cache.CacheException: Failed to prepare
>> distributed join query: join condition does not use index"
>>
>> If I create one of the tables in replicated mode and the other in
>> partitioned mode, the join works, but the documentation says Ignite
>> supports non-collocated joins without any such condition. We also tried
>> with 3 tables, 1 replicated and the other 2 partitioned, and observed
>> that it failed.
>> We are running the join operations with distributedJoins=true.
>> We observed that if there are N tables in the join then (N-1) must be in
>> replicated mode; is our understanding right? If so, then to do joins the
>> dimensioning of the cluster increases many-fold, which can't be used in
>> a production environment.
>>
>> To reproduce:
>> Ignite with a 4-node cluster with native persistence enabled.
>> Create the following tables:
>>
>> CREATE TABLE City (
>>   id LONG PRIMARY KEY, name VARCHAR)
>> WITH "backups=1";
>>
>> CREATE TABLE Person (
>>   id LONG, name VARCHAR, city_id LONG, PRIMARY KEY (id, city_id))
>> WITH "backups=1";
>>
>> CREATE INDEX idx_city_name ON City (name);
>> CREATE INDEX idx_person_name ON Person (name);
>>
>> INSERT INTO City (id, name) VALUES (1, 'Forest Hill');
>> INSERT INTO City (id, name) VALUES (2, 'Denver');
>> INSERT INTO City (id, name) VALUES (3, 'St. Petersburg');
>> INSERT INTO Person (id, name, city_id) VALUES (1, 'John Doe', 3);
>> INSERT INTO Person (id, name, city_id) VALUES (2, 'Jane Roe', 2);
>> INSERT INTO Person (id, name, city_id) VALUES (3, 'Mary Major', 1);
>> INSERT INTO Person (id, name, city_id) VALUES (4, 'Richard Miles', 2);
>>
>> Query to be run:
>>
>> SELECT * FROM City c, Person p;
>> or
>> SELECT * FROM City AS c CROSS JOIN Person AS p;
Re: nodes are restarting when i try to drop a table created with persistence enabled
Hi dmagda,

When I insert many records (~10 or 20 million) into the same table and then try to drop the table or delete records from it, nodes restart; the restarts happen in the middle of the drop or delete operation. According to the logs, the cause of the restarts looks like OOM in the data region.

regards, shiva

On Wed, Sep 25, 2019 at 1:12 PM Denis Mekhanikov wrote:
> I think the issue is that Ignite can't recover from
> IgniteOutOfMemory, even by removing data.
> Shiva, did IgniteOutOfMemory occur for the first time when you did the
> DROP TABLE, or before that?
>
> Denis
>
> Wed, 25 Sep 2019 at 02:30, Denis Magda:
> >
> > Shiva,
> >
> > Does this issue still exist? Ignite Dev how do we debug this sort of
> thing?
> >
> > -
> > Denis
> >
> >
> > On Tue, Sep 17, 2019 at 7:22 AM Shiva Kumar wrote:
> >>
> >> Hi dmagda,
> >>
> >> I am trying to drop a table which has around 10 million records; I am
> seeing "Out of memory in data region" error messages in the Ignite logs,
> and the Ignite node [Ignite pod on Kubernetes] restarts.
> >> I have configured 3GB for the default data region, 7GB for the JVM, and
> 15GB total for the Ignite container, with native persistence enabled.
> >> Earlier I was in an impression that restart was caused by > "SYSTEM_WORKER_BLOCKED" errors but now I am realized that > "SYSTEM_WORKER_BLOCKED" is added to ignore failure list and the actual > cause is " CRITICAL_ERROR " due to "Out of memory in data region" > >> > >> This is the error messages in logs: > >> > >> ""[2019-09-17T08:25:35,054][ERROR][sys-#773][] JVM will be halted > immediately due to the failure: [failureCtx=FailureContext > [type=CRITICAL_ERROR, err=class o.a.i.i.mem.IgniteOutOfMemoryException: > Failed to find a page for eviction [segmentCapacity=971652, loaded=381157, > maxDirtyPages=285868, dirtyPages=381157, cpPages=0, pinnedInSegment=3, > failedToPrepare=381155] > >> Out of memory in data region [name=Default_Region, initSize=500.0 MiB, > maxSize=3.0 GiB, persistenceEnabled=true] Try the following: > >> ^-- Increase maximum off-heap memory size > (DataRegionConfiguration.maxSize) > >> ^-- Enable Ignite persistence > (DataRegionConfiguration.persistenceEnabled) > >> ^-- Enable eviction or expiration policies]] > >> > >> Could you please help me on why drop table operation causing "Out of > memory in data region"? and how I can avoid it? > >> > >> We have a use case where application inserts records to many tables in > Ignite simultaneously for some time period and other applications run a > query on that time period data and update the dashboard. we need to delete > the records inserted in the previous time period before inserting new > records. > >> > >> even during delete from table operation, I have seen: > >> > >> "Critical system error detected. 
Will be handled accordingly to > configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, > timeout=0, super=AbstractFailureHandler > [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]], failureCtx=FailureContext > [type=CRITICAL_ERROR, err=class o.a.i.IgniteException: Checkpoint read lock > acquisition has been timed out.]] class org.apache.ignite.IgniteException: > Checkpoint read lock acquisition has been timed out.| > >> > >> > >> > >> On Mon, Apr 29, 2019 at 12:17 PM Denis Magda wrote: > >>> > >>> Hi Shiva, > >>> > >>> That was designed to prevent global cluster performance degradation or > other outages. Have you tried to apply my recommendation of turning of the > failure handler for this system threads? > >>> > >>> - > >>> Denis > >>> > >>> > >>> On Sun, Apr 28, 2019 at 10:28 AM shivakumar > wrote: > >>>> > >>>> HI Denis, > >>>> > >>>> is there any specific reason for the blocking of critical thread, > like CPU > >>>> is full or Heap is full ? > >>>> We are again and again hitting this issue. > >>>> is there any other way to drop tables/cache ? > >>>> This looks like a critical issue. > >>>> > >>>> regards, > >>>> shiva > >>>> > >>>> > >>>> > >>>> -- > >>>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/ >
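For what it's worth, the first suggestion printed in the error message (increasing DataRegionConfiguration.maxSize) corresponds to a configuration fragment roughly like the sketch below; the 6 GiB figure is purely illustrative, not a recommendation:

```xml
<property name="dataStorageConfiguration">
    <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
        <property name="defaultDataRegionConfiguration">
            <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                <property name="name" value="Default_Region"/>
                <!-- illustrative: raise the off-heap limit from 3 GiB to 6 GiB -->
                <property name="maxSize" value="#{6L * 1024 * 1024 * 1024}"/>
                <property name="persistenceEnabled" value="true"/>
            </bean>
        </property>
    </bean>
</property>
```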
index corrupted error : org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException: Runtime failure on search row
Hi all, I have deployed a 3-node Ignite cluster with native persistence on Kubernetes, and one of the nodes crashed with the error message below:

org.h2.message.DbException: General error: "class org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException: Runtime failure on search row: Row@8cfe967[ key: epro_model_abcdKey [idHash=822184780, hash=737706081, NE_ID=, NAME=], val: epro_model_abcd [idHash=60444003, hash=1539928610, epro_ID=51, LONGITUDE=null, DELETE_TIME=null, VENDOR=null, CREATE_TIME=2019-09-19T20:38:32.361929Z, UPDATE_TIME=2019-09-19T20:40:05.821447Z, ADDITIONAL_INFO=null, VALID_UNTIL=2019-11-18T20:38:32.362036Z, TYPE=null, LATITUDE=null], ver: GridCacheVersion [topVer=180326822, order=1568925345552, nodeOrder=6] ][ 51, 2019-09-19T20:38:32.361929Z, 2019-09-19T20:40:05.821447Z, null, 2019-11-18T20:38:32.362036Z, , , null, null, null, null, null ]" [5-197]

Please find the attached file [index_corruption.txt] for the complete backtrace. It looks like the index got corrupted; I am not sure what exactly caused this. Are there any known issues related to this? In my cluster, many applications write into many tables simultaneously, some queries run on many tables simultaneously, and frequently the application deletes unwanted rows [old data] from the tables using delete from table SQL operations.
Failed to reinitialize local partitions (rebalancing will be stopped): GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=21, minorTopVer=0], discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=f3d7fb8c-0cda-42d0-a171-0155a171405b, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 192.168.*.*], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, ignite-cluster-ignite-epro-0.ignite-service.default.svc.cluster.local/192.168.*.*:47500], discPort=47500, order=21, intOrder=12, lastExchangeTime=1568926175782, loc=true, ver=2.7.0#19700101-sha1:, isClient=false], topVer=21, nodeId8=f3d7fb8c, msg=null, type=NODE_JOINED, tstamp=1568926160054], nodeId=f3d7fb8c, evt=NODE_JOINED]
org.h2.message.DbException: General error: "class org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException: Runtime failure on search row: Row@8cfe967[ key: epro_model_abcdKey [idHash=822184780, hash=737706081, NE_ID=, NAME=], val: epro_model_abcd [idHash=60444003, hash=1539928610, epro_ID=51, LONGITUDE=null, DELETE_TIME=null, VENDOR=null, CREATE_TIME=2019-09-19T20:38:32.361929Z, UPDATE_TIME=2019-09-19T20:40:05.821447Z, ADDITIONAL_INFO=null, VALID_UNTIL=2019-11-18T20:38:32.362036Z, TYPE=null, LATITUDE=null], ver: GridCacheVersion [topVer=180326822, order=1568925345552, nodeOrder=6] ][ 51, 2019-09-19T20:38:32.361929Z, 2019-09-19T20:40:05.821447Z, null, 2019-11-18T20:38:32.362036Z, , , null, null, null, null, null ]" [5-197]
    at org.h2.message.DbException.get(DbException.java:168)
    at org.h2.message.DbException.convert(DbException.java:307)
    at org.apache.ignite.internal.processors.query.h2.database.H2TreeIndex.removex(H2TreeIndex.java:348)
    at org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.addToIndex(GridH2Table.java:550)
    at org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.update(GridH2Table.java:479)
    at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.store(IgniteH2Indexing.java:768)
    at org.apache.ignite.internal.processors.query.GridQueryProcessor.store(GridQueryProcessor.java:1905)
    at org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.store(GridCacheQueryManager.java:404)
    at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.finishUpdate(IgniteCacheOffheapManagerImpl.java:2633)
    at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.update(IgniteCacheOffheapManagerImpl.java:2524)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.update(GridCacheOffheapManager.java:1759)
    at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.update(IgniteCacheOffheapManagerImpl.java:443)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.applyUpdate(GridCacheDatabaseSharedManager.java:2653)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.applyLastUpdates(GridCacheDatabaseSharedManager.java:2339)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreState(GridCacheDatabaseSharedManager.java:1628)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.beforeExchange(GridCacheDatabaseSharedManager.java:1302)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1453)
    at org.
distributed sql join not working as mentioned in documentation
Hi all,
I am trying to do a simple cross join on two tables with non-collocated data (without an affinity key). This non-collocated distributed join always fails with the error message:

"java.sql.SQLException: javax.cache.CacheException: Failed to prepare distributed join query: join condition does not use index"

If I create one of the tables in replicated mode and the other in partitioned mode, the join works, but the documentation says Ignite supports non-collocated joins without any such condition. We also tried with 3 tables, 1 replicated and the other 2 partitioned, and observed that it failed. We are running the join operations with distributedJoins=true.

We observed that if there are N tables in the join then (N-1) must be in replicated mode; is our understanding right? If so, then to do joins the dimensioning of the cluster increases many-fold, which can't be used in a production environment.

To reproduce: Ignite with a 4-node cluster with native persistence enabled. Create the following tables:

CREATE TABLE City (
  id LONG PRIMARY KEY, name VARCHAR)
WITH "backups=1";

CREATE TABLE Person (
  id LONG, name VARCHAR, city_id LONG, PRIMARY KEY (id, city_id))
WITH "backups=1";

CREATE INDEX idx_city_name ON City (name);
CREATE INDEX idx_person_name ON Person (name);

INSERT INTO City (id, name) VALUES (1, 'Forest Hill');
INSERT INTO City (id, name) VALUES (2, 'Denver');
INSERT INTO City (id, name) VALUES (3, 'St. Petersburg');
INSERT INTO Person (id, name, city_id) VALUES (1, 'John Doe', 3);
INSERT INTO Person (id, name, city_id) VALUES (2, 'Jane Roe', 2);
INSERT INTO Person (id, name, city_id) VALUES (3, 'Mary Major', 1);
INSERT INTO Person (id, name, city_id) VALUES (4, 'Richard Miles', 2);

Query to be run:

SELECT * FROM City c, Person p;
or
SELECT * FROM City AS c CROSS JOIN Person AS p;
liveness and rediness probe configuration for Ignite on kubernetes
Hi all, I have deployed Ignite on Kubernetes and configured the liveness and readiness probes like this:

readinessProbe:
  tcpSocket:
    port: 10800
  initialDelaySeconds: 10
  periodSeconds: 2
  failureThreshold: 60
livenessProbe:
  httpGet:
    scheme: HTTP
    path: /ignite?cmd=version&user=ignite&password=ignite
    port: 8080
  initialDelaySeconds: 60
  periodSeconds: 20

where 10800 is the SQL port and 8080 is the REST port. The problem I am facing is: if the pod restarts for some reason, the liveness probe fails, because during recovery the Ignite node (pod) does not respond to the REST API (liveness probe). This leads to another restart by Kubernetes, and the pod goes into CrashLoopBackOff state. Is there a better way of configuring the liveness probe? And why does the node fail to respond to a simple REST query (/ignite?cmd=version) during recovery?

regards, shiva
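One common workaround (a sketch, not a tested recommendation) is to give the node time to finish WAL recovery before liveness checks take effect: either raise failureThreshold on the liveness probe, or, on Kubernetes 1.16+, add a startupProbe, which suppresses the liveness probe until the first successful check. The thresholds below are illustrative:

```yaml
# Suppresses the liveness probe until the first success, allowing up to
# periodSeconds * failureThreshold (~20 min here) for node recovery.
startupProbe:
  httpGet:
    scheme: HTTP
    path: /ignite?cmd=version&user=ignite&password=ignite
    port: 8080
  periodSeconds: 10
  failureThreshold: 120
livenessProbe:
  httpGet:
    scheme: HTTP
    path: /ignite?cmd=version&user=ignite&password=ignite
    port: 8080
  periodSeconds: 20
  failureThreshold: 3
```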
Re: nodes are restarting when i try to drop a table created with persistence enabled
Hi dmagda,

I am trying to drop a table which has around 10 million records; I am seeing "Out of memory in data region" error messages in the Ignite logs, and the Ignite node [Ignite pod on Kubernetes] restarts. I have configured 3GB for the default data region, 7GB for the JVM, and 15GB total for the Ignite container, with native persistence enabled. Earlier I was under the impression that the restart was caused by "SYSTEM_WORKER_BLOCKED" errors, but I have now realized that "SYSTEM_WORKER_BLOCKED" is on the ignored-failures list and the actual cause is "CRITICAL_ERROR" due to "Out of memory in data region".

These are the error messages in the logs:

[2019-09-17T08:25:35,054][ERROR][sys-#773][] JVM will be halted immediately due to the failure: [failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.i.mem.IgniteOutOfMemoryException: Failed to find a page for eviction [segmentCapacity=971652, loaded=381157, maxDirtyPages=285868, dirtyPages=381157, cpPages=0, pinnedInSegment=3, failedToPrepare=381155]
Out of memory in data region [name=Default_Region, initSize=500.0 MiB, maxSize=3.0 GiB, persistenceEnabled=true] Try the following:
  ^-- Increase maximum off-heap memory size (DataRegionConfiguration.maxSize)
  ^-- Enable Ignite persistence (DataRegionConfiguration.persistenceEnabled)
  ^-- Enable eviction or expiration policies]]

Could you please help me understand why a drop table operation causes "Out of memory in data region", and how I can avoid it?

We have a use case where an application inserts records into many tables in Ignite simultaneously for some time period, and other applications run queries on that time period's data and update a dashboard. We need to delete the records inserted in the previous time period before inserting new records.

Even during a delete from table operation, I have seen: "Critical system error detected.
Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.IgniteException: Checkpoint read lock acquisition has been timed out.]] class org.apache.ignite.IgniteException: Checkpoint read lock acquisition has been timed out.

On Mon, Apr 29, 2019 at 12:17 PM Denis Magda wrote:
> Hi Shiva,
>
> That was designed to prevent global cluster performance degradation or
> other outages. Have you tried to apply my recommendation of turning off
> the failure handler for these system threads?
>
> -
> Denis
>
>
> On Sun, Apr 28, 2019 at 10:28 AM shivakumar wrote:
>
>> Hi Denis,
>>
>> Is there any specific reason for the blocking of the critical thread,
>> like CPU being full or heap being full? We are hitting this issue again
>> and again. Is there any other way to drop tables/caches? This looks like
>> a critical issue.
>>
>> regards,
>> shiva
Re: Cache expiry policy not deleting records from disk(native persistence)
I have filed a bug https://issues.apache.org/jira/browse/IGNITE-12152, but this is the same as https://issues.apache.org/jira/browse/IGNITE-10862. Any idea on the timeline of these tickets? The documentation https://apacheignite.readme.io/v2.7/docs/expiry-policies says that when native persistence is enabled, "expired entries are removed from both memory and disk tiers". But on disk it just marks the pages as unwanted; the disk space used by these unwanted pages is reused to store new pages, but the pages themselves are not removed from disk, so the disk space they occupy is never released. Here is the developers' discussion link: http://apache-ignite-developers.2346864.n4.nabble.com/How-to-free-up-space-on-disc-after-removing-entries-from-IgniteCache-with-enabled-PDS-td39839.html

On Mon, Sep 9, 2019 at 11:53 PM Shiva Kumar wrote:
> Hi,
> I have deployed Ignite on Kubernetes and configured two separate
> persistent volumes for WAL and persistence.
> The issue I am facing is the same as
> https://issues.apache.org/jira/browse/IGNITE-10862
>
> Thanks
> Shiva
>
> On Mon, 9 Sep, 2019, 10:47 PM Andrei Aleksandrov, wrote:
>
>> Hello,
>>
>> I guess that the generated WAL will take this disk space. Please read
>> about WAL here:
>>
>> https://apacheignite.readme.io/docs/write-ahead-log
>>
>> Please provide the size of every folder under /opt/ignite/persistence.
>>
>> BR,
>> Andrei
>> On 9/6/2019 9:45 PM, Shiva Kumar wrote:
>>
>> Hi all,
>> I have set a cache expiry policy like this:
>> [XML cache configuration not preserved in the archive]
>> And batch-inserted records into one of the tables created with the above
>> cache template. In around 10 minutes I ingested ~1.5GB of data, and after
>> 10 minutes records started reducing (expiring) when I monitored from
>> sqlline.
>> 0: jdbc:ignite:thin://192.168.*.*:10800> select count(ID) from DIMENSIONS;
>> COUNT(ID)
>> 248896
>> 1 row selected (0.86 seconds)
>> 0: jdbc:ignite:thin://192.168.*.*:10800> select count(ID) from DIMENSIONS;
>> COUNT(ID)
>> 222174
>> 1 row selected (0.313 seconds)
>> 0: jdbc:ignite:thin://192.168.*.*:10800> select count(ID) from DIMENSIONS;
>> COUNT(ID)
>> 118154
>> 1 row selected (0.15 seconds)
>> 0: jdbc:ignite:thin://192.168.*.*:10800> select count(ID) from DIMENSIONS;
>> COUNT(ID)
>> 76061
>> 1 row selected (0.106 seconds)
>> 0: jdbc:ignite:thin://192.168.*.*:10800> select count(ID) from DIMENSIONS;
>> COUNT(ID)
>> 41671
>> 1 row selected (0.063 seconds)
>> 0: jdbc:ignite:thin://192.168.*.*:10800> select count(ID) from DIMENSIONS;
>> COUNT(ID)
>> 18455
>> 1 row selected (0.037 seconds)
>> 0: jdbc:ignite:thin://192.168.*.*:10800> select count(ID) from DIMENSIONS;
>> COUNT(ID)
>> 0
>> 1 row selected (0.014 seconds)
>>
>> But in the meantime, the disk space used by the persistence store was in
>> the same usage level instead of decreasing.
Re: Cache expiry policy not deleting records from disk(native persistence)
Hi,
I have deployed Ignite on Kubernetes and configured two separate persistent volumes for WAL and persistence. The issue I am facing is the same as https://issues.apache.org/jira/browse/IGNITE-10862

Thanks
Shiva

On Mon, 9 Sep, 2019, 10:47 PM Andrei Aleksandrov, wrote:
> Hello,
>
> I guess that the generated WAL will take this disk space. Please read
> about WAL here:
>
> https://apacheignite.readme.io/docs/write-ahead-log
>
> Please provide the size of every folder under /opt/ignite/persistence.
>
> BR,
> Andrei
> On 9/6/2019 9:45 PM, Shiva Kumar wrote:
>
> Hi all,
> I have set a cache expiry policy like this:
> [XML cache configuration not preserved in the archive]
> And batch-inserted records into one of the tables created with the above
> cache template. In around 10 minutes I ingested ~1.5GB of data, and after
> 10 minutes records started reducing (expiring) when I monitored from
> sqlline:
>
> 0: jdbc:ignite:thin://192.168.*.*:10800> select count(ID) from DIMENSIONS;
> COUNT(ID)
> 248896
> 1 row selected (0.86 seconds)
> 0: jdbc:ignite:thin://192.168.*.*:10800> select count(ID) from DIMENSIONS;
> COUNT(ID)
> 222174
> 1 row selected (0.313 seconds)
> 0: jdbc:ignite:thin://192.168.*.*:10800> select count(ID) from DIMENSIONS;
> COUNT(ID)
> 118154
> 1 row selected (0.15 seconds)
> 0: jdbc:ignite:thin://192.168.*.*:10800> select count(ID) from DIMENSIONS;
> COUNT(ID)
> 76061
> 1 row selected (0.106 seconds)
> 0: jdbc:ignite:thin://192.168.*.*:10800> select count(ID) from DIMENSIONS;
> COUNT(ID)
> 41671
> 1 row selected (0.063 seconds)
> 0: jdbc:ignite:thin://192.168.*.*:10800> select count(ID) from DIMENSIONS;
> COUNT(ID)
> 18455
> 1 row selected (0.037 seconds)
> 0: jdbc:ignite:thin://192.168.*.*:10800> select count(ID) from DIMENSIONS;
> COUNT(ID)
> 0
> 1 row selected (0.014 seconds)
>
> But in the meantime, the disk space used by the persistence store was in
> the same usage level instead of decreasing:
>
> [ignite@ignite-cluster-ign-shiv-0 ignite]$ while true ; do df -h /opt/ignite/persistence/; sleep 1s; done
> Filesystem  Size  Used  Avail  Use%  Mounted on
> /dev/vdj     15G  1.6G    14G   11%  /opt/ignite/persistence
> (the same two lines repeat for every iteration)
>
> This means the expiry policy is not deleting records from the disk, but
> the Ignite documentation says that when an expiry policy is set and native
> persistence is enabled, it deletes records from disk as well.
> Am I missing some configuration?
> Any help is appreciated.
>
> Shiva
Cache expiry policy not deleting records from disk(native persistence)
Hi all, I have set a cache expiry policy like this: [XML cache configuration not preserved in the archive]

And batch-inserted records into one of the tables created with the above cache template. In around 10 minutes I ingested ~1.5GB of data, and after 10 minutes records started reducing (expiring) when I monitored from sqlline:

0: jdbc:ignite:thin://192.168.*.*:10800> select count(ID) from DIMENSIONS;
COUNT(ID)
248896
1 row selected (0.86 seconds)
0: jdbc:ignite:thin://192.168.*.*:10800> select count(ID) from DIMENSIONS;
COUNT(ID)
222174
1 row selected (0.313 seconds)
0: jdbc:ignite:thin://192.168.*.*:10800> select count(ID) from DIMENSIONS;
COUNT(ID)
118154
1 row selected (0.15 seconds)
0: jdbc:ignite:thin://192.168.*.*:10800> select count(ID) from DIMENSIONS;
COUNT(ID)
76061
1 row selected (0.106 seconds)
0: jdbc:ignite:thin://192.168.*.*:10800> select count(ID) from DIMENSIONS;
COUNT(ID)
41671
1 row selected (0.063 seconds)
0: jdbc:ignite:thin://192.168.*.*:10800> select count(ID) from DIMENSIONS;
COUNT(ID)
18455
1 row selected (0.037 seconds)
0: jdbc:ignite:thin://192.168.*.*:10800> select count(ID) from DIMENSIONS;
COUNT(ID)
0
1 row selected (0.014 seconds)

But in the meantime, the disk space used by the persistence store was in the same usage level instead of decreasing.
[ignite@ignite-cluster-ign-shiv-0 ignite]$ while true ; do df -h /opt/ignite/persistence/; sleep 1s; done
Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/vdj     15G  1.6G    14G   11%  /opt/ignite/persistence
(the same two lines repeat for every iteration)

This means the expiry policy is not deleting records from the disk, but the Ignite documentation says that when an expiry policy is set and native persistence is enabled, it deletes records from disk as well. Am I missing some configuration? Any help is appreciated.

Shiva
Re: Capacity planning for production deployment on kubernetes
Hi Denis,

Thanks for your response. Yes, in our tests we have also seen OOM errors and pod crashes, so we will follow the recommendation for RAM requirements. I was also checking the Ignite documentation on the disk space required for WAL + WAL archive. This link https://apacheignite.readme.io/docs/write-ahead-log#section-wal-archive says the archive size is defined as 4 times the size of the checkpointing buffer, and the checkpointing buffer is a function of the data region size (https://apacheignite.readme.io/docs/durable-memory-tuning#section-checkpointing-buffer-size). But this link https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood#IgnitePersistentStore-underthehood-SubfoldersGeneration, under the "Estimating disk space" section, explains another way to estimate the disk space required for WAL, and it is not clear. Can you please point me to the correct recommendation for calculating the disk space required for WAL + WAL archive? In one of my tests I configured 4GB for the data region and 10GB for WAL + WAL archive, but our pods crashed as the disk mounted for WAL + WAL archive ran out of space.
[ignite@ignite-cluster-ignite-node-2 ignite]$ df -h
Filesystem  Size  Used  Avail  Use%  Mounted on
overlay     158G   39G  112G    26%  /
tmpfs        63G    0    63G     0%  /dev
tmpfs        63G    0    63G     0%  /sys/fs/cgroup
/dev/vda1   158G   39G  112G    26%  /etc/hosts
shm          64M    0    64M     0%  /dev/shm
/dev/vdq    9.8G  9.7G   44M   100%  /opt/ignite/wal
/dev/vdr     50G  1.4G   48G     3%  /opt/ignite/persistence
tmpfs        63G   12K   63G     1%  /run/secrets/kubernetes.io/serviceaccount
tmpfs        63G    0    63G     0%  /proc/acpi
tmpfs        63G    0    63G     0%  /proc/scsi
tmpfs        63G    0    63G     0%  /sys/firmware

and this is the error message on the Ignite node:

"ERROR","JVM will be halted immediately due to the failure: [failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=class o.a.i.IgniteCheckedException: Failed to archive WAL segment [srcFile=/opt/ignite/wal/node00-37ea8ba6-3198-46a1-9e9e-38aff27ed9c9/0006.wal, dstFile=/opt/ignite/wal/archive/node00-37ea8ba6-3198-46a1-9e9e-38aff27ed9c9/0236.wal.tmp]]]"

On Thu, Aug 22, 2019 at 8:04 PM Denis Mekhanikov wrote:
> Shivakumar,
>
> Such an allocation doesn't allow full memory utilization, so it's possible
> that nodes will crash because of out-of-memory errors.
> So it's better to follow the given recommendation.
>
> If you want us to investigate the reasons for the failures, please provide
> the logs and configuration of the failed nodes.
>
> Denis
> On 21 Aug 2019, 16:17 +0300, Shiva Kumar wrote:
>
> Hi all,
> We are testing a field use case before deploying in the field, and we want
> to know whether the resource limits below are suitable for production.
> There are 3 nodes (3 pods on Kubernetes) running, each with the
> configuration below:
>
>   DefaultDataRegion: 60GB
>   JVM: 32GB
>   Resource allocated for each container: 64GB
>
> And the Ignite documentation says (JVM + all data regions) should not
> exceed 70% of the total RAM allocated to each node (container).
> But we started testing with the above configuration, and for up to 9 days
> the Ignite cluster ran successfully with some data ingestion, but suddenly
> the pods crashed and were unable to recover from the crash.
> Is the above resource configuration not good for node recovery?
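Reading the two linked pages together, a back-of-the-envelope WAL sizing estimate can be sketched as below. The checkpoint-buffer defaults (min(256 MB, region size) below 1 GB, region size / 4 between 1 GB and 8 GB, 2 GB above that) and the "archive = 4 x checkpoint buffer" rule are taken from the linked documentation pages; treat the result as a rough lower bound, since active WAL segments also consume space, and this is a sketch of the documented formula, not Ignite's actual sizing logic.

```python
GiB = 1024 ** 3
MiB = 1024 ** 2

def checkpoint_buffer_default(region_size):
    """Default checkpointing buffer size as a function of data region size
    (per the durable-memory-tuning page linked above)."""
    if region_size < 1 * GiB:
        return min(256 * MiB, region_size)
    if region_size < 8 * GiB:
        return region_size // 4
    return 2 * GiB

def wal_archive_estimate(region_size):
    """WAL archive estimate: 4x the checkpointing buffer (per the WAL page)."""
    return 4 * checkpoint_buffer_default(region_size)

# The 4 GB data region from the failed test above:
region = 4 * GiB
print(checkpoint_buffer_default(region) // GiB)  # checkpoint buffer in GiB -> 1
print(wal_archive_estimate(region) // GiB)       # WAL archive estimate in GiB -> 4
```

So a 4 GB region already implies roughly a 4 GB archive before counting active WAL segments, which would make a 10 GB WAL + archive volume tight under sustained ingestion.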
Capacity planning for production deployment on kubernetes
Hi all, we are testing a field use case before deploying in the field, and we want to know whether the resource limits below are suitable for production. There are 3 nodes (3 pods on Kubernetes) running, each with the configuration below:

  DefaultDataRegion: 60GB
  JVM: 32GB
  Resource allocated for each container: 64GB

And the Ignite documentation says (JVM + all data regions) should not exceed 70% of the total RAM allocated to each node (container). But we started testing with the above configuration, and for up to 9 days the Ignite cluster ran successfully with some data ingestion; then suddenly the pods crashed and were unable to recover from the crash. Is the above resource configuration not good for node recovery?
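The 70% rule quoted above can be checked with simple arithmetic; a small sketch using the figures from this message (the helper name is ours, not an Ignite API):

```python
def fits_ram_budget(data_region_gb, jvm_heap_gb, container_gb, budget_fraction=0.7):
    """True if (JVM heap + data regions) stays within the recommended
    fraction of container RAM (70% per the Ignite docs cited above)."""
    return data_region_gb + jvm_heap_gb <= budget_fraction * container_gb

# Figures from the message: 60 GB region + 32 GB heap on a 64 GB container.
print(fits_ram_budget(60, 32, 64))  # 92 GB needed vs a 44.8 GB budget -> False
```

With these numbers the configured 92 GB exceeds not only the 44.8 GB budget but the container's physical RAM itself, which is consistent with the out-of-memory crashes described in the reply: checkpointing, the OS page cache, and Ignite's internal overhead all need headroom outside the data region and heap.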