Re: Ignite DataStreamer Memory Problems

2019-04-30 Thread kellan
At this point I've spent enough time on this problem and can move on with my
project without using @QueryTextField--I'm just letting anyone who's
concerned know what I've seen in case you want to probe into this issue any
further. 

I've taken the time to write a reproducer that can be easily run on any
machine; run it following my instructions and you can see whatever logs
you'd like for yourself. It runs with 4GB of heap by default, not 1GB,
though feel free to adjust that. With 10GB of durable memory, 4GB of heap,
and a 22GB memory limit on the container, it consumes memory up to the
limit, triggering an OOM kill in Docker.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Ignite DataStreamer Memory Problems

2019-04-30 Thread Evgenii Zhuravlev
Hi,

Lucene indexes are stored on the heap, while I see that in the reproducer
you've limited the heap size to 1GB. Are you sure that you used these JVM
opts? Can you please share the logs from your run, so I can check the heap
usage?
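
A quick way to watch the heap while the reproducer runs, if that's easier
than full logs (standard JDK tooling; <pid> is the Ignite process id):

    jstat -gc <pid> 5000

That prints heap and GC stats every 5 seconds.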

Best Regards,
Evgenii

Tue, 30 Apr 2019 at 00:23, kellan :

> The issue seems to be with the @QueryTextField annotation. Unless Lucene
> indexes are supposed to be eating up all this memory, in which case it
> might
> be worth improving your documentation.
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


Re: Ignite DataStreamer Memory Problems

2019-04-29 Thread kellan
The issue seems to be with the @QueryTextField annotation. Unless Lucene
indexes are supposed to be eating up all this memory, in which case it might
be worth improving your documentation.
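
For context, the annotation is applied to cache value classes roughly like
this (a simplified sketch; class and field names here are made up):

    import org.apache.ignite.cache.query.annotations.QueryTextField;

    public class Doc {
        // Tells Ignite to maintain a Lucene full-text index over this field.
        @QueryTextField
        private String body;

        public Doc(String body) { this.body = body; }
    }

Text queries then go through something like
cache.query(new TextQuery<>(Doc.class, "word")), and the index that backs
them is maintained separately from the configured data region, which would
fit the extra memory I'm seeing.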



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Ignite DataStreamer Memory Problems

2019-04-25 Thread kellan
Here is a reproducible example of the DataStreamer memory leak:
https://github.com/kellanburket/ignite-leak

I've also added a public image to DockerHub: miraco/ignite:leak

This can be run on a machine with at least 22GB of memory available to
Docker and probably 50GB of storage between WAL and persistent storage, just
to be safe.
I'm following the guidelines here:
https://apacheignite.readme.io/docs/durable-memory-tuning#section-share-ram

10GB of Durable Memory
4GB of Heap

with a 22GB memory limit in Docker that adds up to about 63% of overall RAM

Now run this container (adjust the cpus as needed; I'm using AWS r4.4xl
nodes with 16 cores running Amazon Linux):

docker run -v $LOCAL_STORAGE:$CONTAINER_STORAGE -v $LOCAL_WAL:$CONTAINER_WAL
-m 22G --cpus=12 --memory-swappiness 0 --name ignite.leak -d
miraco/ignite:leak

I would expect memory usage to stabilize somewhere around 18-19GB (4GB heap
+ 10GB durable + 640MB WAL + 2GB checkpoint buffer + 1-2GB JDK overhead),
but instead usage per docker stats rises to the container limit, forcing an
OOM kill. Feel free to increase the memory limit above 22GB; results should
be the same, though it may take longer to get there.

Now this is interesting: if I replace the cache value type, which is
Array[Byte], with a Long and run it again, memory usage eventually
stabilizes at around 19-20GB:

docker run -v $LOCAL_STORAGE:$CONTAINER_STORAGE -v $LOCAL_WAL:$CONTAINER_WAL
-e VALUE_TYPE=ValueLong -m 22G --cpus=12 --memory-swappiness 0 --name
ignite.leak -d miraco/ignite:leak

Is there something I'm missing here, or is this a bug?



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Ignite DataStreamer Memory Problems

2019-04-24 Thread kellan
Ignite Version: 2.7.0

Ignite Config:
https://gist.github.com/kellanburket/73971d076a9b2d4f001b073d02e2343a

Java Process: /opt/jdk/bin/java -XX:+AggressiveOpts
-XX:NativeMemoryTracking=detail -Xms24G -Xmx24G
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/dumps/oom.bin
-XX:+AlwaysPreTouch -XX:+UseG1GC -XX:+ScavengeBeforeFullGC
-XX:MaxDirectMemorySize=256M -Duser.timezone=GMT
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.local.only=false
-Dcom.sun.management.jmxremote.port=49112
-Dcom.sun.management.jmxremote.rmi.port=49112
-Djava.rmi.server.hostname=127.0.0.1 -DIGNITE_WAL_MMAP=true
-Djdk.nio.maxCachedBufferSize=262144 -DIGNITE_QUIET=true
-DIGNITE_SUCCESS_FILE=/opt/ignite/apache-ignite-2.7.0-bin/work/ignite_success_77a36388-73e4-4de6-9988-27e62775c3fc
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=49112
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
-DIGNITE_HOME=/opt/ignite/apache-ignite-2.7.0-bin
-DIGNITE_PROG_NAME=/opt/ignite/apache-ignite-2.7.0-bin/bin/ignite.sh -cp
/opt/ignite/apache-ignite-2.7.0-bin/libs/*:/opt/ignite/apache-ignite-2.7.0-bin/libs/ignite-indexing/*:/opt/ignite/apache-ignite-2.7.0-bin/libs/ignite-kubernetes/*:/opt/ignite/apache-ignite-2.7.0-bin/libs/ignite-spark/*:/opt/ignite/apache-ignite-2.7.0-bin/libs/ignite-spring/*:/opt/ignite/apache-ignite-2.7.0-bin/libs/ignite-zookeeper/*:/opt/ignite/apache-ignite-2.7.0-bin/libs/licenses/*
org.apache.ignite.startup.cmdline.CommandLineStartup
/opt/ignite/apache-ignite-2.7.0-bin/config/default-config.xml

I've already tried running with walMode=NONE, but I'll try it again just to
confirm.

I'll put together a shareable reproducer today.





--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Ignite DataStreamer Memory Problems

2019-04-23 Thread Stanislav Lukyanov
Can you share your full configuration (Ignite config and JVM options) and
the server logs of Ignite?

Which version of Ignite do you use?

Can you confirm that on this version and configuration simply disabling
Ignite persistence removes the problem?
If yes, can you try running with walMode=NONE? It will help to rule out at
least some possibilities.
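
For reference, that switch looks like this in code (a minimal sketch; the
same walMode property can be set in the XML config):

    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;
    import org.apache.ignite.configuration.WALMode;

    public class NoWalConfig {
        public static IgniteConfiguration create() {
            DataStorageConfiguration storage = new DataStorageConfiguration();
            storage.setWalMode(WALMode.NONE); // disable the write-ahead log entirely
            IgniteConfiguration cfg = new IgniteConfiguration();
            cfg.setDataStorageConfiguration(storage);
            return cfg;
        }
    }

Note that WALMode.NONE gives no crash-recovery guarantees - it's only for
ruling out the WAL as the source of the growth.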

Also, if you can share a reproducer for this problem, it should be easy for
us to debug.

Stan

On Tue, Apr 23, 2019 at 6:42 AM kellan wrote:

> Any suggestions on where to go from here? I'd like to find a way to
> isolate this problem before I have to look into other storage/grid
> solutions. A lot of work has gone into integrating Ignite into our
> platform, and I'd really hate to start from scratch. I can provide as much
> information as needed to help pinpoint this problem/do additional tests on
> my end.
>
> Are there any projects out there that have successfully run Ignite on
> Kubernetes with Persistence and a high-volume write load?
>
> I've been looking into using third-party persistence, but we require SQL
> queries to fetch the bulk of our data, and it seems like this isn't really
> possible with Cassandra et al. unless I know in advance what data needs to
> be loaded into memory. Is that a safe assumption to make?
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


Re: Ignite DataStreamer Memory Problems

2019-04-23 Thread kellan
Any suggestions on where to go from here? I'd like to find a way to isolate
this problem before I have to look into other storage/grid solutions. A lot
of work has gone into integrating Ignite into our platform, and I'd really
hate to start from scratch. I can provide as much information as needed to
help pinpoint this problem/do additional tests on my end.

Are there any projects out there that have successfully run Ignite on
Kubernetes with Persistence and a high-volume write load?

I've been looking into using third-party persistence, but we require SQL
queries to fetch the bulk of our data, and it seems like this isn't really
possible with Cassandra et al. unless I know in advance what data needs to
be loaded into memory. Is that a safe assumption to make?



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Ignite DataStreamer Memory Problems

2019-04-21 Thread kellan
No luck with the changed configuration. Memory still continues to rise until
the Kubernetes limit (110GB) is hit, and then the node crashes. This is
output I pulled from jcmd at some point before the crash. I can post the
detailed memory report if that helps.

Total: reserved=84645150KB, committed=83359362KB
-                 Java Heap (reserved=25165824KB, committed=25165824KB)
                            (mmap: reserved=25165824KB, committed=25165824KB)

-                     Class (reserved=1121992KB, committed=80356KB)
                            (classes #11821)
                            (malloc=1736KB #20912)
                            (mmap: reserved=1120256KB, committed=78620KB)

-                    Thread (reserved=198099KB, committed=198099KB)
                            (thread #193)
                            (stack: reserved=197248KB, committed=197248KB)
                            (malloc=626KB #975)
                            (arena=225KB #380)

-                      Code (reserved=260571KB, committed=65571KB)
                            (malloc=10971KB #16284)
                            (mmap: reserved=249600KB, committed=54600KB)

-                        GC (reserved=1047369KB, committed=1047369KB)
                            (malloc=80713KB #57810)
                            (mmap: reserved=966656KB, committed=966656KB)

-                  Compiler (reserved=597KB, committed=597KB)
                            (malloc=467KB #1235)
                            (arena=131KB #7)

-                  Internal (reserved=56763248KB, committed=56763248KB)
                            (malloc=56763216KB #1063361)
                            (mmap: reserved=32KB, committed=32KB)

-                    Symbol (reserved=17245KB, committed=17245KB)
                            (malloc=14680KB #138104)
                            (arena=2565KB #1)

-    Native Memory Tracking (reserved=20852KB, committed=20852KB)
                            (malloc=453KB #6407)
                            (tracking overhead=20399KB)

-               Arena Chunk (reserved=201KB, committed=201KB)
                            (malloc=201KB)

-                   Unknown (reserved=49152KB, committed=0KB)
                            (mmap: reserved=49152KB, committed=0KB)



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Ignite DataStreamer Memory Problems

2019-04-21 Thread Stanislav Lukyanov
I've put a full answer on SO:
https://stackoverflow.com/questions/55752357/possible-memory-leak-in-ignite-datastreamer/55786023#55786023

In short, so far it doesn't look like a memory leak to me - just a
misconfiguration.
There is a memory pool in the JVM for direct memory buffers which is by
default bounded by the value of `-Xmx`. Most applications would use a
minuscule amount of it, but in some it can grow - and grow to the size of
the heap, making your total Java usage not roughly `heap + data region` but
`heap * 2 + data region`.

Set walSegmentSize=64mb and -XX:MaxDirectMemorySize=256mb and I think it's
going to be OK.
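
In code the WAL part of that looks like this (a minimal sketch; the direct
memory cap is just the -XX:MaxDirectMemorySize=256m JVM flag on the command
line):

    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class SmallWalSegments {
        public static IgniteConfiguration create() {
            DataStorageConfiguration storage = new DataStorageConfiguration();
            storage.setWalSegmentSize(64 * 1024 * 1024); // walSegmentSize is in bytes
            IgniteConfiguration cfg = new IgniteConfiguration();
            cfg.setDataStorageConfiguration(storage);
            return cfg;
        }
    }

As I understand it, with IGNITE_WAL_MMAP=true each WAL segment is mapped as
a direct buffer, so smaller segments keep the direct memory pool bounded.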

Stan

On Sun, Apr 21, 2019 at 11:51 AM Denis Magda wrote:

> Hello,
>
> Copying Evgeniy and Stan, our community experts who'd guide you through.
> In the meantime, please try to capture the OOM with this approach:
>
> https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/tooldescr007.html
>
> -
> Denis
>
>
> On Sun, Apr 21, 2019 at 8:49 AM kellan wrote:
>
>> Update: I've been able to confirm a couple more details:
>>
>> 1. I'm experiencing the same leak with put/putAll as I am with the
>> DataStreamer.
>> 2. The problem is resolved when persistence is turned off.
>>
>>
>>
>> --
>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>>
>


Re: Ignite DataStreamer Memory Problems

2019-04-21 Thread Denis Magda
Hello,

Copying Evgeniy and Stan, our community experts who'd guide you through. In
the meantime, please try to capture the OOM with this approach:
https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/tooldescr007.html

-
Denis


On Sun, Apr 21, 2019 at 8:49 AM kellan wrote:

> Update: I've been able to confirm a couple more details:
>
> 1. I'm experiencing the same leak with put/putAll as I am with the
> DataStreamer.
> 2. The problem is resolved when persistence is turned off.
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


Re: Ignite DataStreamer Memory Problems

2019-04-21 Thread kellan
Update: I've been able to confirm a couple more details:

1. I'm experiencing the same leak with put/putAll as I am with the
DataStreamer.
2. The problem is resolved when persistence is turned off.
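
For anyone reproducing this, "persistence turned off" means the data region
is configured roughly like this (a minimal sketch, not my exact config):

    import org.apache.ignite.configuration.DataRegionConfiguration;
    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class InMemoryOnlyConfig {
        public static IgniteConfiguration create() {
            DataRegionConfiguration region = new DataRegionConfiguration();
            region.setPersistenceEnabled(false); // in-memory only: no WAL, no checkpoints
            DataStorageConfiguration storage = new DataStorageConfiguration();
            storage.setDefaultDataRegionConfiguration(region);
            IgniteConfiguration cfg = new IgniteConfiguration();
            cfg.setDataStorageConfiguration(storage);
            return cfg;
        }
    }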



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Ignite DataStreamer Memory Problems

2019-04-19 Thread Denis Magda
Looping in the dev list.

Community, does this remind you of any memory leak addressed in master?
What do we need to get to the bottom of this issue?

Denis

On Friday, April 19, 2019, kellan wrote:

> After doing additional tests to isolate the issue, it looks like Ignite is
> having a problem releasing Internal memory of cache objects passed into the
> NIO ByteBuffers that back the DataStreamer objects. At first I thought this
> might be on account of my Avro ByteBuffers that get transformed into byte
> arrays before being loaded into the Ignite DataStreamers, but I can run my
> application without the DataStreamers (otherwise exactly the same) and
> there is no memory leak.
>
> I've posted more about it on StackOverflow:
> https://stackoverflow.com/questions/55752357/possible-memory-leak-in-ignite-datastreamer
>
> I'm trying to productionize an Ignite cluster in Kubernetes and can't move
> forward until I can solve this problem. Is there anyone who's used
> DataStreamers to do heavy write loads in a k8s environment who has any
> insight into what would be causing this?
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


--
Denis Magda


Re: Ignite DataStreamer Memory Problems

2019-04-19 Thread kellan
After doing additional tests to isolate the issue, it looks like Ignite is
having a problem releasing Internal memory of cache objects passed into the
NIO ByteBuffers that back the DataStreamer objects. At first I thought this
might be on account of my Avro ByteBuffers that get transformed into byte
arrays before being loaded into the Ignite DataStreamers, but I can run my
application without the DataStreamers (otherwise exactly the same) and there
is no memory leak.

I've posted more about it on StackOverflow:
https://stackoverflow.com/questions/55752357/possible-memory-leak-in-ignite-datastreamer

I'm trying to productionize an Ignite cluster in Kubernetes and can't move
forward until I can solve this problem. Is there anyone who's used
DataStreamers to do heavy write loads in a k8s environment who has any
insight into what would be causing this?



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Ignite DataStreamer Memory Problems

2019-04-17 Thread kellan
So I've done a heap dump and recorded heap metrics while running my
DataStreamers, and the heap doesn't appear to be the problem here. Ignite
operates normally for several hours without the heap size ever reaching its
max. My durable memory also seems to be behaving as expected. While looking
at the output of top, however, I notice a gradual increase in memory above
the sum of heap + durable memory, which continues for several hours until my
Kubernetes pod hits its memory limit and is killed. My guess is this is an
NIO problem.

I suppose this could originate from the Avro files I'm loading from S3, and
I'm investigating this, but I'd like to rule out there being a problem on
the Ignite end. Do DataStreamers use NIO, and is there any way they could
end up "leaking" memory? If so, are there configuration parameters or best
practices I could use to prevent this?
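
To narrow this down on my end, I'm planning to use native memory tracking:
start the JVM with -XX:NativeMemoryTracking=summary, then pull the breakdown
with standard JDK tooling (<pid> is the Ignite process id):

    jcmd <pid> VM.native_memory summary

That should show whether the growth is in the Internal/direct-buffer pool or
somewhere else.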



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Ignite DataStreamer Memory Problems

2019-04-16 Thread kellan
A heap dump won't address non-heap memory issues, which are what I'm most
often running into. Where can memory build up in Ignite outside of durable
memory and heap memory?



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Ignite DataStreamer Memory Problems

2019-04-16 Thread Ilya Kasnacheev
Hello!

I suggest collecting a heap dump and taking a long look at it.
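
For example, with standard JDK tooling (<pid> is the Ignite process id):

    jmap -dump:live,format=b,file=/tmp/ignite-heap.bin <pid>

The resulting file can be opened in Eclipse MAT or VisualVM to see what is
retaining memory.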

Regards,
-- 
Ilya Kasnacheev


Mon, 15 Apr 2019 at 15:35, kellan :

> I'm confused. If the DataStreamer blocks until all data is loaded into
> remote caches, and I'm only ever running a fixed number of DataStreamers
> (4 max), which close after they read a single file of a more or less fixed
> length each time (no more than 200MB; i.e. I shouldn't have more than
> 800MB + additional Ignite metadata in my DataStreamers at any point), I
> shouldn't be seeing a gradual build-up of memory. But that's what I'm
> seeing.
>
> Maybe I should have said before that this is a persistent cache and the
> problem starts at some point after I've run out of memory in my data
> regions (not immediately, but hours later).
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


Re: Ignite DataStreamer Memory Problems

2019-04-15 Thread kellan
I'm confused. If the DataStreamer blocks until all data is loaded into
remote caches, and I'm only ever running a fixed number of DataStreamers (4
max), which close after they read a single file of a more or less fixed
length each time (no more than 200MB; i.e. I shouldn't have more than 800MB
+ additional Ignite metadata in my DataStreamers at any point), I shouldn't
be seeing a gradual build-up of memory. But that's what I'm seeing.

Maybe I should have said before that this is a persistent cache and the
problem starts at some point after I've run out of memory in my data regions
(not immediately, but hours later).



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Ignite DataStreamer Memory Problems

2019-04-15 Thread Ilya Kasnacheev
Hello!

DataStreamer WILL block until all data is loaded in caches.

The recommendation here is probably to reduce perNodeParallelOperations(),
streamerBufferSize() and perThreadBufferSize(), and to flush() your
DataStreamer frequently to avoid data build-up in the DataStreamer's
temporary data structures. Or maybe, if you have a few entries which are
very large, you can just use the Cache API to populate those.
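
A minimal sketch of that kind of tuning (cache name and numbers are
illustrative only - the right values need experimenting):

    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteDataStreamer;

    public class TunedLoad {
        public static void load(Ignite ignite, byte[][] payloads) {
            try (IgniteDataStreamer<Long, byte[]> streamer =
                     ignite.dataStreamer("myCache")) {
                streamer.perNodeParallelOperations(4); // cap in-flight batches per node
                streamer.perNodeBufferSize(512);       // entries buffered before send
                for (long i = 0; i < payloads.length; i++) {
                    streamer.addData(i, payloads[(int) i]);
                    if (i % 10_000 == 0)
                        streamer.flush(); // push buffered entries out periodically
                }
            } // close() flushes the remainder and blocks until it is loaded
        }
    }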

Regards,
-- 
Ilya Kasnacheev


Sun, 14 Apr 2019 at 18:45, kellan :

> I seem to be running into some sort of memory issue with my DataStreamers
> and I'd like to get a better idea of how they work behind the scenes to
> troubleshoot my problem.
>
> I have a cluster of 4 nodes, each of which is pulling files from S3 over an
> extended period of time and loading the contents. Each new file opens a new
> DataStreamer, loads its contents, and closes the DataStreamer. At most,
> each node has 4 DataStreamers writing to 4 different caches simultaneously.
> A new DataStreamer isn't created until the last one on that thread is
> closed. I wait for the futures to complete, then close the DataStreamer. So
> far so good.
>
> After my nodes are running for a few hours, one or more inevitably ends up
> crashing. Sometimes the Java heap overflows and Java exits, and sometimes
> Java is killed by the kernel because of an OOM error.
>
> Here are my specs per node:
> Total Available Memory: 110GB
> Memory Assigned to All Data Regions: 50GB
> Total Checkpoint Page Buffers: 5GB
> Java Heap: 25GB
>
> Does DataStreamer.close block until data is loaded into the cache on
> remote nodes (I'm assuming it doesn't)? If not, is there any way to
> monitor the progress of data loading in the cache on the remote
> nodes/replicas, so I can slow down my DataStreamers to keep pace?
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


Ignite DataStreamer Memory Problems

2019-04-14 Thread kellan
I seem to be running into some sort of memory issue with my DataStreamers
and I'd like to get a better idea of how they work behind the scenes to
troubleshoot my problem.

I have a cluster of 4 nodes, each of which is pulling files from S3 over an
extended period of time and loading the contents. Each new file opens a new
DataStreamer, loads its contents, and closes the DataStreamer. At most, each
node has 4 DataStreamers writing to 4 different caches simultaneously. A new
DataStreamer isn't created until the last one on that thread is closed. I
wait for the futures to complete, then close the DataStreamer. So far so
good.
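
In code, the pattern per file is roughly this (a simplified sketch; the real
version parses Avro pulled from S3):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteDataStreamer;
    import org.apache.ignite.lang.IgniteFuture;

    public class FileLoad {
        public static void loadFile(Ignite ignite, Map<String, byte[]> contents) {
            List<IgniteFuture<?>> futures = new ArrayList<>();
            try (IgniteDataStreamer<String, byte[]> streamer =
                     ignite.dataStreamer("records")) {
                for (Map.Entry<String, byte[]> e : contents.entrySet())
                    futures.add(streamer.addData(e.getKey(), e.getValue()));
                futures.forEach(IgniteFuture::get); // wait for every addData future
            } // try-with-resources closes (and flushes) the streamer
        }
    }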

After my nodes are running for a few hours, one or more inevitably ends up
crashing. Sometimes the Java heap overflows and Java exits, and sometimes
Java is killed by the kernel because of an OOM error.

Here are my specs per node:
Total Available Memory: 110GB
Memory Assigned to All Data Regions: 50GB
Total Checkpoint Page Buffers: 5GB
Java Heap: 25GB

Does DataStreamer.close block until data is loaded into the cache on remote
nodes (I'm assuming it doesn't)? If not, is there any way to monitor the
progress of data loading in the cache on the remote nodes/replicas, so I can
slow down my DataStreamers to keep pace?



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/