Unsubscribe me from mailing list

2018-06-22 Thread shaun_m



Running Node removal from baseline

2018-06-22 Thread Dave Harvey
The documentation describes the use case where a node is stopped and then
removed from the baseline, which reduces the number of backups/replicas while
the node is stopped.

I assume there is currently no code to support removing the node from the
baseline first, so that at least the desired number of backups is maintained
at all times? Any plans for this?





--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


RE: Deadlock during cache loading

2018-06-22 Thread breischl
In our case we're only using the receiver as you describe, to update the key
that it was invoked for. Our actual use case is that the incoming stream of
data sometimes sends us old data, which we want to discard rather than
cache. So the StreamReceiver examines the value already in the cache and
either applies the update or discards it. I re-examined our code, and it
only operates on keys that were supplied in the second argument to the receive()
function (the Collection<Map.Entry<K, V>> of updated entries).
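The examine-then-discard logic described above can be sketched with plain JDK collections (no Ignite APIs; the class and method names here are illustrative, not the poster's actual code):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LastWriteWins {
    static final Map<String, Long> cache = new ConcurrentHashMap<>();

    // Apply the incoming version only if it is newer than the cached one,
    // mirroring the receiver's discard-stale-data rule.
    static boolean applyIfNewer(String key, long version) {
        final boolean[] applied = {false};
        cache.compute(key, (k, cur) -> {
            if (cur == null || version > cur) {
                applied[0] = true;
                return version;
            }
            return cur; // stale update: keep the existing value
        });
        return applied[0];
    }

    public static void main(String[] args) {
        System.out.println(applyIfNewer("k", 2)); // true  (first write)
        System.out.println(applyIfNewer("k", 1)); // false (stale, discarded)
        System.out.println(applyIfNewer("k", 3)); // true  (newer)
    }
}
```

The compute() call keeps the compare-and-apply atomic per key, which is what the per-key lock inside a StreamReceiver provides in the real setup.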

From your description it seems like this should be safe, and yet it seems to
be hitting a deadlock somewhere...



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: running Apache Ignite in docker with cgroups

2018-06-22 Thread Andrew Fung
Thanks! Wasn't aware of these!

On Fri, Jun 22, 2018 at 7:14 AM, David Wimsey  wrote:

> Are you enabling the extra flags required for the JVM to detect memory
> limits and work properly?
>
> Specifically, add the following options to the JVM options when starting
> Ignite:
>
> -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap
>
> See: https://blogs.oracle.com/java-platform-group/java-se-support-for-docker-cpu-and-memory-limits
>
>
> > On Jun 21, 2018, at 5:32 PM, Andrew Fung  wrote:
> >
> > I can see from the docs for IgniteConfiguration that some properties
> auto-size based on visible OS resources. In docker, "visible" ends up being
> the host values, which will exceed any CPU/memory limits applied via
> cgroups to the container.
> >
> > I see on 
> > IgniteConfiguration.DataStorageConfiguration.DataRegionConfiguration
> the properties initialSize and maxSize, which I've set substantially lower
> than the container memory allocation, which helped avoid being killed due
> to exceeding memory limits, but I'm still seeing the oom reaper kick in
> occasionally. Am I missing some other configuration value that controls
> off-heap memory use?
> >
> > I've set the DataRegionConfiguration initial/max size to 16GB, the JVM
> heap to 8GB, and the container is currently allocated 32GB, which seems
> more than enough for OS and other ancillary uses. In case relevant, I've
> enabled persistence, and set the relevant CacheConfiguration to partitioned
> mode with backups=2 and writeSynchronizationMode=full_sync. Running
> Ignite 2.5.0, Oracle JDK 8u172.
> >
> > Thanks!
> > Andrew.
>
>


-- 
*Andrew Fung*
Engineering  | ❖ Medallia
af...@medallia.com

https://lwn.net/2000/0824/a/esr-sharing.php3


Re: running Apache Ignite in docker with cgroups

2018-06-22 Thread Andrew Fung
One node per container. Configuration below, key values come from env vars.

IGNITE_DATA_REGION_MAX_SIZE_MB=16384
IGNITE_DATA_REGION_MAX_SIZE=$(( $PIPE_IGNITE_DATA_REGION_MAX_SIZE_MB * 1024
* 1024 ))  # 17179869184
IGNITE_JVM_OPTS='-Xms8g -Xmx8g'
IGNITE_PERSISTENCE_ENABLED=true
IGNITE_TEST_CACHE_BACKUPS=2
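The sizing in this thread can be sanity-checked with simple arithmetic (a sketch; the overhead items named in the comments are assumptions, not a complete accounting of Ignite's off-heap use):

```java
public class IgniteMemoryBudget {
    static final long GIB = 1024L * 1024 * 1024;

    // Values from the configuration above.
    static final long DATA_REGION_MAX = 16384L * 1024 * 1024; // IGNITE_DATA_REGION_MAX_SIZE
    static final long JVM_HEAP        = 8 * GIB;              // -Xms8g -Xmx8g
    static final long CONTAINER_LIMIT = 32 * GIB;

    // What is left for everything else: checkpoint buffer, WAL buffers,
    // metaspace, thread stacks, OS page cache, and so on.
    static long headroomGib() {
        return (CONTAINER_LIMIT - DATA_REGION_MAX - JVM_HEAP) / GIB;
    }

    public static void main(String[] args) {
        System.out.println(DATA_REGION_MAX); // 17179869184, matches the env var math
        System.out.println(headroomGib());   // 8
    }
}
```

If the OOM reaper still fires with ~8 GiB of nominal headroom, the overhead items above (particularly the checkpoint buffer, which defaults to a fraction of the data region size) are the usual suspects to budget explicitly.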



<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="
           http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans.xsd">
    <!-- bean definitions stripped by the mailing list archive -->
</beans>

On Fri, Jun 22, 2018 at 7:11 AM, aealexsandrov 
wrote:

> Hi,
>
> Could you please provide your configuration files? How many nodes did you
> start in your container?
>
> BR,
> Andrei
>
>
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>



-- 
*Andrew Fung*
Engineering  | ❖ Medallia
af...@medallia.com

https://lwn.net/2000/0824/a/esr-sharing.php3


RE: Deadlock during cache loading

2018-06-22 Thread Stanislav Lukyanov
Well, that’s diving a bit deeper than the “don’t do cache operations” rule of
thumb, but let’s do that.

When a receiver is invoked for key K, it is holding the lock for K.
It is safe to call invoke() on that same K (especially if you control the
invoked code) since it is locked already.
But it is not safe to call invoke() on another key J, because someone holding
the lock for J might be doing the same for K, leading to a deadlock.

I believe it is really awkward to micromanage these locks, so the best practice
is to avoid starting any cache operations (or, more generally, any locking
operations, including put()/get()) from the system pool threads, i.e. when
executing things like a StreamReceiver, Continuous Query listener, invoke()
closure, etc. - basically anything that is intercepting a cache operation.
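The K/J cycle described above is the classic lock-ordering deadlock. As a plain-JDK illustration (not Ignite API; names here are illustrative) of the standard remedy, acquire locks in one global order no matter which key the caller started from:

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockOrdering {
    static final ReentrantLock lockK = new ReentrantLock();
    static final ReentrantLock lockJ = new ReentrantLock();
    static int counter;

    // Always take K before J. With a single global order, the
    // K->J vs J->K cycle cannot form, so no deadlock is possible.
    static void updateBoth(Runnable action) {
        lockK.lock();
        try {
            lockJ.lock();
            try {
                action.run();
            } finally {
                lockJ.unlock();
            }
        } finally {
            lockK.unlock();
        }
    }

    public static void main(String[] args) {
        counter = 0;
        Thread t1 = new Thread(() -> updateBoth(() -> counter++));
        Thread t2 = new Thread(() -> updateBoth(() -> counter++));
        t1.start(); t2.start();
        try {
            t1.join(); t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        System.out.println(counter); // 2: both threads completed, no deadlock
    }
}
```

Inside a StreamReceiver you do not control the order in which Ignite's system threads take per-key locks, which is exactly why the rule of thumb is to avoid locking operations there altogether rather than trying to impose an order.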

Thanks,
Stan

From: breischl
Sent: June 22, 2018 18:09
To: user@ignite.apache.org
Subject: RE: Deadlock during cache loading

Hi Stan,
  Thanks for taking a look. I'm having trouble finding anywhere that
documents what I can or can't call inside a receiver. Is it just
put()/get() that are allowed?

  Also, I noticed that the default StreamTransformer implementation calls
invoke() from within a receiver. So is that broken/deadlock-prone as well?

https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/stream/StreamTransformer.java#L50-L53

Thanks!
BKR



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/



RE: High cpu on ignite server nodes

2018-06-22 Thread praveeng
Hi Stan,

Thanks for your analysis.
We have increased the on-heap cache size to 50 and added an expiry policy
[30 mins].
The expiry policy is expiring entries, so the cache never reaches its max
size.

But now we see high heap usage, because of which GCs are happening frequently.
A full GC happened only once in two days; after that no full GC occurred, only
frequent minor GCs.
The heap usage is always above 59% on all nodes and reaches 94% after 40 to 60
minutes. Once a GC happens it comes back down to about 60%.


Following are the gc logs.
Desired survivor size 10485760 bytes, new threshold 1 (max 15)
 [PSYoungGen: 2086889K->10213K(2086912K)] 5665084K->3680259K(6281216K),
0.0704050 secs] [Times: user=0.54 sys=0.00, real=0.07 secs]
2018-06-22T09:53:38.873-0400: 374604.772: Total time for which application
threads were stopped: 0.0794010 seconds
2018-06-22T09:55:00.332-0400: 374686.231: Total time for which application
threads were stopped: 0.0084890 seconds
2018-06-22T09:55:00.340-0400: 374686.239: Total time for which application
threads were stopped: 0.0075450 seconds
2018-06-22T09:55:00.348-0400: 374686.247: Total time for which application
threads were stopped: 0.0078560 seconds
2018-06-22T09:55:26.847-0400: 374712.746: Total time for which application
threads were stopped: 0.0090060 seconds
2018-06-22T10:00:26.857-0400: 375012.756: Total time for which application
threads were stopped: 0.0105490 seconds
2018-06-22T10:02:48.740-0400: 375154.639: Total time for which application
threads were stopped: 0.0093160 seconds
2018-06-22T10:02:48.748-0400: 375154.647: Total time for which application
threads were stopped: 0.000 seconds
2018-06-22T10:02:48.757-0400: 375154.656: Total time for which application
threads were stopped: 0.0092110 seconds
2018-06-22T10:05:26.867-0400: 375312.766: Total time for which application
threads were stopped: 0.0098100 seconds
2018-06-22T10:05:52.775-0400: 375338.674: Total time for which application
threads were stopped: 0.0083580 seconds
2018-06-22T10:05:52.783-0400: 375338.682: Total time for which application
threads were stopped: 0.0074860 seconds
2018-06-22T10:05:52.790-0400: 375338.689: Total time for which application
threads were stopped: 0.0073980 seconds
2018-06-22T10:06:48.756-0400: 375394.655: Total time for which application
threads were stopped: 0.0086660 seconds
2018-06-22T10:06:48.764-0400: 375394.662: Total time for which application
threads were stopped: 0.0076080 seconds
2018-06-22T10:06:48.771-0400: 375394.670: Total time for which application
threads were stopped: 0.0076890 seconds
2018-06-22T10:07:05.603-0400: 375411.501: Total time for which application
threads were stopped: 0.0077390 seconds
2018-06-22T10:07:05.610-0400: 375411.509: Total time for which application
threads were stopped: 0.0074570 seconds
2018-06-22T10:07:05.617-0400: 375411.516: Total time for which application
threads were stopped: 0.0073410 seconds
2018-06-22T10:07:05.626-0400: 375411.525: Total time for which application
threads were stopped: 0.0072380 seconds
2018-06-22T10:07:05.633-0400: 375411.532: Total time for which application
threads were stopped: 0.0073070 seconds
2018-06-22T10:10:26.876-0400: 375612.775: Total time for which application
threads were stopped: 0.0091690 seconds
2018-06-22T10:15:26.887-0400: 375912.786: Total time for which application
threads were stopped: 0.0111650 seconds
2018-06-22T10:20:26.897-0400: 376212.796: Total time for which application
threads were stopped: 0.0099680 seconds
2018-06-22T10:22:30.917-0400: 376336.816: Total time for which application
threads were stopped: 0.0085330 seconds
2018-06-22T10:25:26.907-0400: 376512.806: Total time for which application
threads were stopped: 0.0094760 seconds
2018-06-22T10:26:04.247-0400: 376550.145: Total time for which application
threads were stopped: 0.0077120 seconds
2018-06-22T10:26:04.254-0400: 376550.153: Total time for which application
threads were stopped: 0.0075380 seconds
2018-06-22T10:26:04.262-0400: 376550.161: Total time for which application
threads were stopped: 0.0073460 seconds
2018-06-22T10:30:26.918-0400: 376812.817: Total time for which application
threads were stopped: 0.0107140 seconds
2018-06-22T10:35:26.929-0400: 377112.827: Total time for which application
threads were stopped: 0.0102250 seconds
2018-06-22T10:40:26.939-0400: 377412.838: Total time for which application
threads were stopped: 0.0096620 seconds
2018-06-22T10:41:06.178-0400: 377452.077: Total time for which application
threads were stopped: 0.0085630 seconds
2018-06-22T10:41:06.186-0400: 377452.085: Total time for which application
threads were stopped: 0.0079250 seconds
2018-06-22T10:41:06.194-0400: 377452.092: Total time for which application
threads were stopped: 0.0074940 seconds
2018-06-22T10:42:57.088-0400: 377562.987: Total time for which application
threads were stopped: 0.0090560 seconds
2018-06-22T10:42:57.096-0400: 377562.995: Total time for which 

RE: Deadlock during cache loading

2018-06-22 Thread breischl
Hi Stan,
  Thanks for taking a look. I'm having trouble finding anywhere that
documents what I can or can't call inside a receiver. Is it just
put()/get() that are allowed?

  Also, I noticed that the default StreamTransformer implementation calls
invoke() from within a receiver. So is that broken/deadlock-prone as well?

https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/stream/StreamTransformer.java#L50-L53

Thanks!
BKR



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: running Apache Ignite in docker with cgroups

2018-06-22 Thread aealexsandrov
Hi,

Could you please provide your configuration files? How many nodes did you
start in your container?

BR,
Andrei





--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: And again... Failed to get page IO instance (page content is corrupted)

2018-06-22 Thread Andrey Mashenkov
Hi,

We've found and fixed a few issues related to ExpiryPolicy usage.
Most likely your issue is [1]; the fix is planned for the Ignite 2.6 release.

[1] https://issues.apache.org/jira/browse/IGNITE-8659


On Fri, Jun 22, 2018 at 8:43 AM Olexandr K 
wrote:

> Hi Team,
>
> Issue is still there in 2.5.0
>
> Steps to reproduce:
> 1) start 2 servers + 2 clients topology
> 2) start load testing on client nodes
> 3) stop server 1
> 4) start server 1
> 5) stop server 1 again when rebalancing is in progress
> => and we got data corrupted here, see error below
> => we were not able to restart Ignite cluster after that and need to
> perform data folders cleanup...
>
> 2018-06-21 11:28:01.684 [ttl-cleanup-worker-#43] ERROR  - Critical system
> error detected. Will be handled accordingly to configured handler
> [hnd=class o.a.i.failure.StopNodeOrHaltFailureHandler,
> failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=class
> o.a.i.IgniteException: Runtime failure on bounds: [lower=null,
> upper=PendingRow [
> org.apache.ignite.IgniteException: Runtime failure on bounds: [lower=null,
> upper=PendingRow []]
> at
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.find(BPlusTree.java:971)
> ~[ignite-core-2.5.0.jar:2.5.0]
> at
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.find(BPlusTree.java:950)
> ~[ignite-core-2.5.0.jar:2.5.0]
> at
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.expire(IgniteCacheOffheapManagerImpl.java:1024)
> ~[ignite-core-2.5.0.jar:2.5.0]
> at
> org.apache.ignite.internal.processors.cache.GridCacheTtlManager.expire(GridCacheTtlManager.java:197)
> ~[ignite-core-2.5.0.jar:2.5.0]
> at
> org.apache.ignite.internal.processors.cache.GridCacheSharedTtlCleanupManager$CleanupWorker.body(GridCacheSharedTtlCleanupManager.java:137)
> [ignite-core-2.5.0.jar:2.5.0]
> at
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
> [ignite-core-2.5.0.jar:2.5.0]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_162]
> Caused by: java.lang.IllegalStateException: Item not found: 2
> at
> org.apache.ignite.internal.processors.cache.persistence.tree.io.AbstractDataPageIO.findIndirectItemIndex(AbstractDataPageIO.java:341)
> ~[ignite-core-2.5.0.jar:2.5.0]
> at
> org.apache.ignite.internal.processors.cache.persistence.tree.io.AbstractDataPageIO.getDataOffset(AbstractDataPageIO.java:450)
> ~[ignite-core-2.5.0.jar:2.5.0]
> at
> org.apache.ignite.internal.processors.cache.persistence.tree.io.AbstractDataPageIO.readPayload(AbstractDataPageIO.java:492)
> ~[ignite-core-2.5.0.jar:2.5.0]
> at
> org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:150)
> ~[ignite-core-2.5.0.jar:2.5.0]
> at
> org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:102)
> ~[ignite-core-2.5.0.j
>
> BR, Oleksandr
>
> On Thu, Jun 14, 2018 at 2:51 PM, Olexandr K  > wrote:
>
>> Upgraded to 2.5.0 and didn't get such error so far..
>> Thanks!
>>
>> On Wed, Jun 13, 2018 at 4:58 PM, dkarachentsev <
>> dkarachent...@gridgain.com> wrote:
>>
>>> It would be better to upgrade to 2.5, where it is fixed.
>>> But if you want to overcome this issue in your's version, you need to add
>>> ignite-indexing dependency to your classpath and configure SQL indexes.
>>> For
>>> example [1], just modify it to work with Spring in XML:
>>> <property name="indexedTypes">
>>> <list>
>>> <value>org.your.KeyObject</value>
>>> <value>org.your.ValueObject</value>
>>> </list>
>>> </property>
>>>
>>> [1]
>>>
>>> https://apacheignite-sql.readme.io/docs/schema-and-indexes#section-registering-indexed-types
>>>
>>> Thanks!
>>> -Dmitry
>>>
>>>
>>>
>>> --
>>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>>>
>>
>>
>

-- 
Best regards,
Andrey V. Mashenkov


Re: setting baseline topology in kubernetes

2018-06-22 Thread aealexsandrov
Hi,

No, 11211 is the default Ignite TCP port. For every new node it will be
incremented: 11211, 11212, 11213, etc.

Also, please check that you didn't override it.

https://www.gridgain.com/sdk/pe/latest/javadoc/org/apache/ignite/spi/discovery/tcp/TcpDiscoverySpi.html#setLocalPort-int-

And yes, the Ignite TCP port should be exposed for every node if you are
going to access it from outside.
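For completeness, the discovery port can be pinned via the setter linked above so the same port can be exposed from a container; a Spring XML sketch (the value shown is the default, used here only as an example):

```xml
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="discoverySpi">
        <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
            <!-- Fixed, well-known discovery port for this node. -->
            <property name="localPort" value="47500"/>
        </bean>
    </property>
</bean>
```

With a fixed port, the container can map it 1:1 instead of relying on the default port-range scan.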

BR,
Andrei



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Is there any way to remove a node from cluster safely?

2018-06-22 Thread Hu Hailin
Hi,

At least one backup, got it.

Thank you.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: setting baseline topology in kubernetes

2018-06-22 Thread arunkjn
Hi Alex,

It accepts an IP and port as arguments. Do I need to enable the Ignite REST
API and expose REST endpoints on cluster nodes for this to work?

Thanks,
Arun



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/