Unsubscribe me from mailing list
Running Node removal from baseline
The documentation describes the use case where a node is stopped and then removed from the baseline, which reduces the number of backups/replicas while the node is down. I assume there is currently no support for removing a running node from the baseline first, so that at least the desired number of backups is maintained at all times? Are there any plans for this? -- Sent from: http://apache-ignite-users.70518.x6.nabble.com/
RE: Deadlock during cache loading
In our case we're only using the receiver as you describe, to update the key that it was invoked for. Our actual use case is that the incoming stream of data sometimes sends us old data, which we want to discard rather than cache. So the StreamReceiver examines the value already in the cache and either applies the update or discards it. I re-examined our code, and it only operates on keys that were supplied in the second arg to the receive() function (the Collection<Map.Entry<K, V>> of updated entries). From your description it seems like this should be safe, and yet it seems to be hitting a deadlock somewhere...
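The discard-if-stale rule described in this message can be sketched in plain Java, independent of Ignite. Everything named here is hypothetical: the `Reading` type, its `timestamp` field, and the `StaleUpdateFilter` class are illustration only, and a plain `Map` stands in for the cache. In a real receiver the same comparison would presumably run inside `receive()` against the value currently cached for each supplied key.

```java
import java.util.Map;

/** Hypothetical sketch: keep an incoming value only if it is newer than the cached one. */
final class StaleUpdateFilter {

    /** Hypothetical value type carrying the timestamp used to order updates. */
    static final class Reading {
        final long timestamp;
        final String payload;

        Reading(long timestamp, String payload) {
            this.timestamp = timestamp;
            this.payload = payload;
        }
    }

    /** Returns the value that should remain cached after seeing an incoming update. */
    static Reading resolve(Reading cached, Reading incoming) {
        if (cached == null || incoming.timestamp > cached.timestamp) {
            return incoming; // first value for this key, or strictly newer data
        }
        return cached; // stale or duplicate update: discard it
    }

    /** Applies a batch of updates, touching only the keys present in the batch. */
    static void applyBatch(Map<String, Reading> cache, Map<String, Reading> batch) {
        for (Map.Entry<String, Reading> e : batch.entrySet()) {
            // merge() inserts when the key is absent, otherwise calls resolve(cached, incoming)
            cache.merge(e.getKey(), e.getValue(), StaleUpdateFilter::resolve);
        }
    }
}
```

The point of the sketch is that the decision depends only on the key being updated, which matches the "only operate on the keys you were handed" constraint discussed in this thread.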
Re: running Apache Ignite in docker with cgroups
Thanks! Wasn't aware of these!

On Fri, Jun 22, 2018 at 7:14 AM, David Wimsey wrote:
> Are you enabling the extra flags required for the JVM to detect memory and work properly?
>
> Specifically, adding the following options to the JVM options when starting Ignite:
>
> -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap
>
> See: https://blogs.oracle.com/java-platform-group/java-se-support-for-docker-cpu-and-memory-limits
>
> > On Jun 21, 2018, at 5:32 PM, Andrew Fung wrote:
> >
> > I can see from the docs for IgniteConfiguration that some properties auto-size based on visible OS resources. In docker, "visible" ends up being the host values, which will exceed any CPU/memory limits applied via cgroups to the container.
> >
> > I see on IgniteConfiguration.DataStorageConfiguration.DataRegionConfiguration the properties initialSize and maxSize, which I've set substantially lower than the container memory allocation, which helped avoid being killed due to exceeding memory limits, but I'm still seeing the oom reaper kick in occasionally. Am I missing some other configuration value that controls off-heap memory use?
> >
> > I've set the DataRegionConfiguration initial/max size to 16GB, the JVM heap to 8GB, and the container is currently allocated 32GB, which seems more than enough for OS and other ancillary uses. In case relevant, I've enabled persistence, and set the relevant CacheConfiguration to partitioned mode with backups=2 and writeSynchronizationMode=full_sync. Running Ignite 2.5.0, Oracle JDK 8u172.
> >
> > Thanks!
> > Andrew.

--
*Andrew Fung*
Engineering | ❖ Medallia
af...@medallia.com
https://lwn.net/2000/0824/a/esr-sharing.php3
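One quick way to see which resources the JVM believes it has inside the container is to print what `Runtime` reports, once with the flags discussed in this thread and once without. The class name `JvmResourceReport` is hypothetical; the sketch uses only standard `java.lang.Runtime` calls. Note that when `-Xmx` is set explicitly, the reported maximum heap follows that setting rather than any container limit.

```java
/** Hypothetical helper: reports the heap ceiling and CPU count the JVM sees. */
final class JvmResourceReport {

    /** Maximum heap the JVM will attempt to use, in bytes. */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** Number of processors the JVM considers available. */
    static int visibleCpus() {
        return Runtime.getRuntime().availableProcessors();
    }

    /** One-line summary suitable for a startup log. */
    static String describe() {
        return String.format("max heap: %d MiB, CPUs: %d",
            maxHeapBytes() / (1024L * 1024L), visibleCpus());
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If the values printed inside the container match the host rather than the cgroup limits, anything that auto-sizes from them is probably being sized for the host as well.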
Re: running Apache Ignite in docker with cgroups
One node per container. Configuration below, key values come from env vars.

IGNITE_DATA_REGION_MAX_SIZE_MB=16384
IGNITE_DATA_REGION_MAX_SIZE=$(( $PIPE_IGNITE_DATA_REGION_MAX_SIZE_MB * 1024 * 1024 )) # 17179869184
IGNITE_JVM_OPTS='-Xms8g -Xmx8g'
IGNITE_PERSISTENCE_ENABLED=true
IGNITE_TEST_CACHE_BACKUPS=2

<beans xmlns="http://www.springframework.org/schema/beans" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd">
[the rest of the Spring XML configuration was not preserved in the archive]

On Fri, Jun 22, 2018 at 7:11 AM, aealexsandrov wrote:
> Hi,
>
> Could you please provide your configuration files? How many nodes did you start in your container?
>
> BR,
> Andrei
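The sizes quoted in this thread (16384 MB data region, 8 GB heap, 32 GB container) can be checked with a few lines of arithmetic. This is only a sketch: the `ContainerMemoryBudget` class and its method names are hypothetical, and the "headroom" it computes is simply what remains after heap and data region. That remainder has to cover the OS, JVM metaspace, thread stacks, direct buffers, and any off-heap structures allocated outside the data region. The constants are `long` because `16384 * 1024 * 1024` would overflow in `int` arithmetic.

```java
/** Hypothetical helper for checking a container memory budget. */
final class ContainerMemoryBudget {

    static final long MIB = 1024L * 1024L;
    static final long GIB = 1024L * MIB;

    /** Converts a data region size given in MiB to bytes. */
    static long dataRegionBytes(long sizeMib) {
        return sizeMib * MIB;
    }

    /** Memory left in the container after the JVM heap and the data region. */
    static long headroomBytes(long containerBytes, long heapBytes, long dataRegionBytes) {
        return containerBytes - heapBytes - dataRegionBytes;
    }

    public static void main(String[] args) {
        long region = dataRegionBytes(16384);
        long headroom = headroomBytes(32 * GIB, 8 * GIB, region);
        System.out.println("data region bytes: " + region);
        System.out.println("headroom GiB: " + headroom / GIB);
    }
}
```

With the thread's numbers the data region comes to 17179869184 bytes, matching the comment in the env vars, and 8 GiB remains for everything else.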
RE: Deadlock during cache loading
Well, that’s diving a bit deeper than the “don’t do cache operations” rule of thumb, but let’s do that.

When the receiver is invoked for key K, it holds the lock for K. It is safe to call invoke() on that K (especially if you control the invoked code), since K is already locked. But it is not safe to call invoke() on another key J, because someone holding the lock for J might be doing the same for K, leading to a deadlock.

I believe it is really awkward to micromanage these locks, so the best practice is to avoid starting any cache operations (or, more generally, any locking operations, including put()/get()) from the system pool threads, i.e. when executing things like a StreamReceiver, a Continuous Query listener, an invoke() closure, etc.: basically anything that intercepts a cache operation.

Thanks,
Stan

From: breischl
Sent: June 22, 2018, 18:09
To: user@ignite.apache.org
Subject: RE: Deadlock during cache loading

Hi Stan,

Thanks for taking a look. I'm having trouble finding anywhere that it's documented what I can or can't call inside a receiver. Is it just put()/get() that are allowed?

Also, I noticed that the default StreamTransformer implementation calls invoke() from within a receiver. So is that broken/deadlock-prone as well? https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/stream/StreamTransformer.java#L50-L53

Thanks!
BKR
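The K/J scenario can be reproduced outside Ignite with two ordinary locks. The sketch below is not Ignite's locking code: `CrossKeyLockDemo` and its methods are hypothetical, and `ReentrantLock` stands in for the per-key locks. Each thread takes its own key's lock, waits until the other thread holds its lock too, and then tries to take the other key's lock with a timeout. With a plain `lock()` call instead of `tryLock()` both threads would block forever; the timeout is there only so the demo terminates.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical demo of the cross-key lock cycle; not Ignite code. */
final class CrossKeyLockDemo {

    /** Returns how many of the two threads failed to get the second lock in time. */
    static int runCrossLocking(long timeoutMillis) {
        ReentrantLock lockK = new ReentrantLock();
        ReentrantLock lockJ = new ReentrantLock();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        AtomicInteger failures = new AtomicInteger();

        Thread forK = new Thread(() ->
            holdThenTry(lockK, lockJ, bothHoldFirst, bothTried, failures, timeoutMillis));
        Thread forJ = new Thread(() ->
            holdThenTry(lockJ, lockK, bothHoldFirst, bothTried, failures, timeoutMillis));
        forK.start();
        forJ.start();
        try {
            forK.join();
            forJ.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return failures.get();
    }

    private static void holdThenTry(ReentrantLock own, ReentrantLock other,
                                    CountDownLatch bothHoldFirst, CountDownLatch bothTried,
                                    AtomicInteger failures, long timeoutMillis) {
        own.lock(); // like the receiver already holding the lock for its own key
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await(); // make sure the peer holds its key too
            if (other.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                other.unlock();
            } else {
                failures.incrementAndGet(); // this is where lock() would hang
            }
            bothTried.countDown();
            bothTried.await(); // keep our own lock until the peer has finished trying
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            own.unlock();
        }
    }
}
```

Calling `CrossKeyLockDemo.runCrossLocking(100)` reports that both threads failed to acquire the other key's lock, which is the cycle described above. Touching only the key that is already locked avoids the cycle entirely.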
RE: High cpu on ignite server nodes
Hi Stan,

Thanks for your analysis. We have increased the on-heap cache size to 50 and added an expiry policy (30 minutes). The expiry policy is expiring the entries, and the cache never reaches its max size. But now we see high heap usage, and because of that GCs are happening frequently. A full GC has happened only once in 2 days; since then there has been no full GC, only frequent ordinary GCs. Heap usage stays above 59% on all the nodes and reaches 94% after 40 to 60 minutes. Once a GC happens, it comes back down to 60%. Following are the GC logs.

Desired survivor size 10485760 bytes, new threshold 1 (max 15) [PSYoungGen: 2086889K->10213K(2086912K)] 5665084K->3680259K(6281216K), 0.0704050 secs] [Times: user=0.54 sys=0.00, real=0.07 secs]
2018-06-22T09:53:38.873-0400: 374604.772: Total time for which application threads were stopped: 0.0794010 seconds
2018-06-22T09:55:00.332-0400: 374686.231: Total time for which application threads were stopped: 0.0084890 seconds
2018-06-22T09:55:00.340-0400: 374686.239: Total time for which application threads were stopped: 0.0075450 seconds
2018-06-22T09:55:00.348-0400: 374686.247: Total time for which application threads were stopped: 0.0078560 seconds
2018-06-22T09:55:26.847-0400: 374712.746: Total time for which application threads were stopped: 0.0090060 seconds
2018-06-22T10:00:26.857-0400: 375012.756: Total time for which application threads were stopped: 0.0105490 seconds
2018-06-22T10:02:48.740-0400: 375154.639: Total time for which application threads were stopped: 0.0093160 seconds
2018-06-22T10:02:48.748-0400: 375154.647: Total time for which application threads were stopped: 0.000 seconds
2018-06-22T10:02:48.757-0400: 375154.656: Total time for which application threads were stopped: 0.0092110 seconds
2018-06-22T10:05:26.867-0400: 375312.766: Total time for which application threads were stopped: 0.0098100 seconds
2018-06-22T10:05:52.775-0400: 375338.674: Total time for which application threads were stopped: 0.0083580 seconds
2018-06-22T10:05:52.783-0400: 375338.682: Total time for which application threads were stopped: 0.0074860 seconds
2018-06-22T10:05:52.790-0400: 375338.689: Total time for which application threads were stopped: 0.0073980 seconds
2018-06-22T10:06:48.756-0400: 375394.655: Total time for which application threads were stopped: 0.0086660 seconds
2018-06-22T10:06:48.764-0400: 375394.662: Total time for which application threads were stopped: 0.0076080 seconds
2018-06-22T10:06:48.771-0400: 375394.670: Total time for which application threads were stopped: 0.0076890 seconds
2018-06-22T10:07:05.603-0400: 375411.501: Total time for which application threads were stopped: 0.0077390 seconds
2018-06-22T10:07:05.610-0400: 375411.509: Total time for which application threads were stopped: 0.0074570 seconds
2018-06-22T10:07:05.617-0400: 375411.516: Total time for which application threads were stopped: 0.0073410 seconds
2018-06-22T10:07:05.626-0400: 375411.525: Total time for which application threads were stopped: 0.0072380 seconds
2018-06-22T10:07:05.633-0400: 375411.532: Total time for which application threads were stopped: 0.0073070 seconds
2018-06-22T10:10:26.876-0400: 375612.775: Total time for which application threads were stopped: 0.0091690 seconds
2018-06-22T10:15:26.887-0400: 375912.786: Total time for which application threads were stopped: 0.0111650 seconds
2018-06-22T10:20:26.897-0400: 376212.796: Total time for which application threads were stopped: 0.0099680 seconds
2018-06-22T10:22:30.917-0400: 376336.816: Total time for which application threads were stopped: 0.0085330 seconds
2018-06-22T10:25:26.907-0400: 376512.806: Total time for which application threads were stopped: 0.0094760 seconds
2018-06-22T10:26:04.247-0400: 376550.145: Total time for which application threads were stopped: 0.0077120 seconds
2018-06-22T10:26:04.254-0400: 376550.153: Total time for which application threads were stopped: 0.0075380 seconds
2018-06-22T10:26:04.262-0400: 376550.161: Total time for which application threads were stopped: 0.0073460 seconds
2018-06-22T10:30:26.918-0400: 376812.817: Total time for which application threads were stopped: 0.0107140 seconds
2018-06-22T10:35:26.929-0400: 377112.827: Total time for which application threads were stopped: 0.0102250 seconds
2018-06-22T10:40:26.939-0400: 377412.838: Total time for which application threads were stopped: 0.0096620 seconds
2018-06-22T10:41:06.178-0400: 377452.077: Total time for which application threads were stopped: 0.0085630 seconds
2018-06-22T10:41:06.186-0400: 377452.085: Total time for which application threads were stopped: 0.0079250 seconds
2018-06-22T10:41:06.194-0400: 377452.092: Total time for which application threads were stopped: 0.0074940 seconds
2018-06-22T10:42:57.088-0400: 377562.987: Total time for which application threads were stopped: 0.0090560 seconds
2018-06-22T10:42:57.096-0400: 377562.995: Total time for which
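A log like the one above is easier to judge when the pauses are totalled. The following is a minimal sketch, assuming the log uses exactly the "Total time for which application threads were stopped" wording shown in this message; `GcPauseSummary` is a hypothetical helper, not part of any GC tooling. Incomplete trailing entries, such as the last one in this message, are simply not matched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: totals the safepoint pauses found in a GC log excerpt. */
final class GcPauseSummary {

    // Whitespace is matched loosely so entries wrapped across lines still count.
    private static final Pattern STOPPED = Pattern.compile(
        "Total time for which\\s+application threads were stopped:"
            + "\\s+([0-9]+(?:\\.[0-9]+)?)\\s+seconds");

    final int count;
    final double totalSeconds;
    final double maxSeconds;

    private GcPauseSummary(int count, double totalSeconds, double maxSeconds) {
        this.count = count;
        this.totalSeconds = totalSeconds;
        this.maxSeconds = maxSeconds;
    }

    /** Scans the text and accumulates every complete pause entry. */
    static GcPauseSummary parse(String log) {
        Matcher m = STOPPED.matcher(log);
        int count = 0;
        double total = 0.0;
        double max = 0.0;
        while (m.find()) {
            double seconds = Double.parseDouble(m.group(1));
            count++;
            total += seconds;
            max = Math.max(max, seconds);
        }
        return new GcPauseSummary(count, total, max);
    }
}
```

Feeding the excerpt to `GcPauseSummary.parse(...)` gives the number of pauses, their sum, and the longest one, which helps separate "many short pauses" from "a few long ones" when reading heap-pressure reports like this.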
RE: Deadlock during cache loading
Hi Stan, Thanks for taking a look. I'm having trouble finding anywhere that it's documented what I can or can't call inside a receiver. Is it just put()/get() that are allowed? Also, I noticed that the default StreamTransformer implementation calls invoke() from within a receiver. So is that broken/deadlock-prone as well? https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/stream/StreamTransformer.java#L50-L53 Thanks! BKR
Re: running Apache Ignite in docker with cgroups
Hi, Could you please provide your configuration files? How many nodes did you start in your container? BR, Andrei
Re: And again... Failed to get page IO instance (page content is corrupted)
Hi,

We've found and fixed a few issues related to ExpiryPolicy usage. Most likely, your issue is [1], and the fix is planned for the Ignite 2.6 release.

[1] https://issues.apache.org/jira/browse/IGNITE-8659

On Fri, Jun 22, 2018 at 8:43 AM Olexandr K wrote:
> Hi Team,
>
> Issue is still there in 2.5.0
>
> Steps to reproduce:
> 1) start 2 servers + 2 clients topology
> 2) start load testing on client nodes
> 3) stop server 1
> 4) start server 1
> 5) stop server 1 again when rebalancing is in progress
> => and we got data corrupted here, see error below
> => we were not able to restart Ignite cluster after that and need to perform data folders cleanup...
>
> 2018-06-21 11:28:01.684 [ttl-cleanup-worker-#43] ERROR - Critical system error detected. Will be handled accordingly to configured handler [hnd=class o.a.i.failure.StopNodeOrHaltFailureHandler, failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=class o.a.i.IgniteException: Runtime failure on bounds: [lower=null, upper=PendingRow [
> org.apache.ignite.IgniteException: Runtime failure on bounds: [lower=null, upper=PendingRow []]
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.find(BPlusTree.java:971) ~[ignite-core-2.5.0.jar:2.5.0]
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.find(BPlusTree.java:950) ~[ignite-core-2.5.0.jar:2.5.0]
>     at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.expire(IgniteCacheOffheapManagerImpl.java:1024) ~[ignite-core-2.5.0.jar:2.5.0]
>     at org.apache.ignite.internal.processors.cache.GridCacheTtlManager.expire(GridCacheTtlManager.java:197) ~[ignite-core-2.5.0.jar:2.5.0]
>     at org.apache.ignite.internal.processors.cache.GridCacheSharedTtlCleanupManager$CleanupWorker.body(GridCacheSharedTtlCleanupManager.java:137) [ignite-core-2.5.0.jar:2.5.0]
>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) [ignite-core-2.5.0.jar:2.5.0]
>     at java.lang.Thread.run(Thread.java:748) [?:1.8.0_162]
> Caused by: java.lang.IllegalStateException: Item not found: 2
>     at org.apache.ignite.internal.processors.cache.persistence.tree.io.AbstractDataPageIO.findIndirectItemIndex(AbstractDataPageIO.java:341) ~[ignite-core-2.5.0.jar:2.5.0]
>     at org.apache.ignite.internal.processors.cache.persistence.tree.io.AbstractDataPageIO.getDataOffset(AbstractDataPageIO.java:450) ~[ignite-core-2.5.0.jar:2.5.0]
>     at org.apache.ignite.internal.processors.cache.persistence.tree.io.AbstractDataPageIO.readPayload(AbstractDataPageIO.java:492) ~[ignite-core-2.5.0.jar:2.5.0]
>     at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:150) ~[ignite-core-2.5.0.jar:2.5.0]
>     at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:102) ~[ignite-core-2.5.0.j
>
> BR, Oleksandr
>
> On Thu, Jun 14, 2018 at 2:51 PM, Olexandr K wrote:
>
>> Upgraded to 2.5.0 and didn't get such error so far..
>> Thanks!
>>
>> On Wed, Jun 13, 2018 at 4:58 PM, dkarachentsev <dkarachent...@gridgain.com> wrote:
>>
>>> It would be better to upgrade to 2.5, where it is fixed.
>>> But if you want to overcome this issue in your's version, you need to add ignite-indexing dependency to your classpath and configure SQL indexes. For example [1], just modify it to work with Spring in XML:
>>>
>>> org.your.KeyObject
>>> org.your.ValueObject
>>>
>>> [1] https://apacheignite-sql.readme.io/docs/schema-and-indexes#section-registering-indexed-types
>>>
>>> Thanks!
>>> -Dmitry

--
Best regards,
Andrey V. Mashenkov
Re: setting baseline topology in kubernetes
Hi, No, 11211 is the default Ignite TCP port. For every new node, it will be incremented: 11211, 11212, 11213, etc. Also, please check that you didn't override it. https://www.gridgain.com/sdk/pe/latest/javadoc/org/apache/ignite/spi/discovery/tcp/TcpDiscoverySpi.html#setLocalPort-int- And yes, the Ignite TCP port should be exposed for every node if you are going to work with the nodes from outside. BR, Andrei
Re: Is there any way to remove a node from cluster safely?
Hi, At least one backup, got it. Thank you. -- Sent from: http://apache-ignite-users.70518.x6.nabble.com/
Re: setting baseline topology in kubernetes
Hi Alex, It accepts an IP and port as an argument. Do I need to enable Ignite REST and expose the REST endpoints on cluster nodes for this to work? Thanks, Arun