Re: Ignite DataStreamer Memory Problems
At this point I've spent enough time on this problem and can move on with my project without using @QueryTextField. I'm just letting anyone who's concerned know what I've seen, in case you want to probe into this issue any further. I've taken the time to write a reproducer that can be easily run on any machine; go ahead and run it based on my instructions and you can see whatever logs you'd like for yourself. It runs with 4GB of heap by default, not 1GB, though feel free to adjust that. With 10GB of durable memory, 4GB of heap, and a 22GB memory limit on the container, it consumes memory up to the limit, triggering an OOM kill in Docker.
Re: Ignite DataStreamer Memory Problems
Hi,

Lucene indexes are stored in the heap, but I see that in the reproducer you've limited the heap size to 1GB. Are you sure that you used these JVM opts? Can you please share the logs from your run, so I can check the heap usage?

Best Regards,
Evgenii

On Tue, Apr 30, 2019 at 00:23, kellan wrote:
> The issue seems to be with the @QueryTextField annotation. Unless Lucene
> indexes are supposed to be eating up all this memory, in which case it
> might be worth improving your documentation.
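For reference, a minimal sketch of the kind of mapping under discussion: a value class with a @QueryTextField field, which asks the ignite-indexing module to maintain an on-heap Lucene index for that field. The class, cache name, and query string here are illustrative assumptions, not taken from the reproducer, and the ignite-indexing module is assumed to be on the classpath.

    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.cache.query.TextQuery;
    import org.apache.ignite.cache.query.annotations.QueryTextField;
    import org.apache.ignite.configuration.CacheConfiguration;

    public class TextIndexSketch {
        // Hypothetical value type: @QueryTextField enables a Lucene (full-text)
        // index for this field, kept on the Java heap.
        static class Event {
            @QueryTextField
            private String description;

            Event(String description) {
                this.description = description;
            }
        }

        public static void main(String[] args) {
            try (Ignite ignite = Ignition.start()) {
                CacheConfiguration<Long, Event> cfg = new CacheConfiguration<>("events");
                // Registering the value type turns on annotation-driven indexing.
                cfg.setIndexedTypes(Long.class, Event.class);

                IgniteCache<Long, Event> cache = ignite.getOrCreateCache(cfg);
                cache.put(1L, new Event("datastreamer memory usage grows"));

                // Full-text query served from the Lucene index.
                cache.query(new TextQuery<Long, Event>(Event.class, "memory")).getAll()
                     .forEach(e -> System.out.println(e.getKey()));
            }
        }
    }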
Re: Ignite DataStreamer Memory Problems
The issue seems to be with the @QueryTextField annotation. Unless Lucene indexes are supposed to be eating up all this memory, in which case it might be worth improving your documentation.
Re: Ignite DataStreamer Memory Problems
Here is a reproducible example of the DataStreamer memory leak: https://github.com/kellanburket/ignite-leak

I've also added a public image to DockerHub: miraco/ignite:leak

This can be run on a machine with at least 22GB of memory available to Docker and probably 50GB of storage between the WAL and persistent storage, just to be safe. I'm following the guidelines here: https://apacheignite.readme.io/docs/durable-memory-tuning#section-share-ram

- 10GB of durable memory
- 4GB of heap
- a 22GB memory limit in Docker

That adds up to about 63% of the container's RAM.

Now run this container (adjust the cpus as needed; I'm using AWS r4.4xl nodes with 16 cores running Amazon Linux):

docker run -v $LOCAL_STORAGE:$CONTAINER_STORAGE -v $LOCAL_WAL:$CONTAINER_WAL -m 22G --cpus=12 --memory-swappiness 0 --name ignite.leak -d miraco/ignite:leak

I would expect memory usage to stabilize somewhere around 18-19GB (4GB heap + 10GB durable + 640MB WAL + 2GB checkpoint buffer + 1-2GB JDK overhead), but instead usage per docker stats rises to the container limit, forcing an OOM kill. Feel free to increase the memory limit above 22GB; results should be the same, though it may take longer to get there.

Now this is interesting: if I replace the cache value type, which is Array[Byte], with a Long and run it again, memory usage eventually stabilizes at around 19-20GB:

docker run -v $LOCAL_STORAGE:$CONTAINER_STORAGE -v $LOCAL_WAL:$CONTAINER_WAL -e VALUE_TYPE=ValueLong -m 22G --cpus=12 --memory-swappiness 0 --name ignite.leak -d miraco/ignite:leak

Is there something I'm missing here, or is this a bug?
Re: Ignite DataStreamer Memory Problems
Ignite Version: 2.7.0

Ignite Config: https://gist.github.com/kellanburket/73971d076a9b2d4f001b073d02e2343a

Java Process:

/opt/jdk/bin/java -XX:+AggressiveOpts -XX:NativeMemoryTracking=detail -Xms24G -Xmx24G
  -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/dumps/oom.bin -XX:+AlwaysPreTouch
  -XX:+UseG1GC -XX:+ScavengeBeforeFullGC -XX:MaxDirectMemorySize=256M -Duser.timezone=GMT
  -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false
  -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false
  -Dcom.sun.management.jmxremote.port=49112 -Dcom.sun.management.jmxremote.rmi.port=49112
  -Djava.rmi.server.hostname=127.0.0.1 -DIGNITE_WAL_MMAP=true -Djdk.nio.maxCachedBufferSize=262144
  -DIGNITE_QUIET=true
  -DIGNITE_SUCCESS_FILE=/opt/ignite/apache-ignite-2.7.0-bin/work/ignite_success_77a36388-73e4-4de6-9988-27e62775c3fc
  -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=49112
  -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false
  -DIGNITE_HOME=/opt/ignite/apache-ignite-2.7.0-bin
  -DIGNITE_PROG_NAME=/opt/ignite/apache-ignite-2.7.0-bin/bin/ignite.sh
  -cp /opt/ignite/apache-ignite-2.7.0-bin/libs/*:/opt/ignite/apache-ignite-2.7.0-bin/libs/ignite-indexing/*:/opt/ignite/apache-ignite-2.7.0-bin/libs/ignite-kubernetes/*:/opt/ignite/apache-ignite-2.7.0-bin/libs/ignite-spark/*:/opt/ignite/apache-ignite-2.7.0-bin/libs/ignite-spring/*:/opt/ignite/apache-ignite-2.7.0-bin/libs/ignite-zookeeper/*:/opt/ignite/apache-ignite-2.7.0-bin/libs/licenses/*
  org.apache.ignite.startup.cmdline.CommandLineStartup /opt/ignite/apache-ignite-2.7.0-bin/config/default-config.xml

I've already tried running with walMode=NONE, but I'll try it again just to confirm. I'll put together a shareable reproducer today.
Re: Ignite DataStreamer Memory Problems
Can you share your full configuration (Ignite config and JVM options) and the server logs of Ignite? Which version of Ignite do you use?

Can you confirm that on this version and configuration simply disabling Ignite persistence removes the problem? If yes, can you try running with walMode=NONE? It will help to rule out at least some possibilities.

Also, if you can share a reproducer for this problem it should be easy for us to debug this.

Stan

On Tue, Apr 23, 2019 at 6:42 AM kellan wrote:
> Any suggestions on where I can go from here? I'd like to find a way to
> isolate this problem before I have to look into another storage/grid
> solution. A lot of work has gone into integrating Ignite into our
> platform, and I'd really hate to start from scratch. I can provide as
> much information as needed to help pinpoint this problem and do
> additional tests on my end.
>
> Are there any projects out there that have successfully run Ignite on
> Kubernetes with Persistence and a high-volume write load?
>
> I've been looking into using third-party persistence, but we require SQL
> queries to fetch the bulk of our data and it seems like this isn't really
> possible with Cassandra, et al., unless I can know in advance what data
> needs to be loaded into memory. Is that a safe assumption to make?
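For what it's worth, a minimal sketch of how the WAL could be disabled for such a test, assuming programmatic configuration rather than the Spring XML used in this thread:

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;
    import org.apache.ignite.configuration.WALMode;

    public class WalOffSketch {
        public static void main(String[] args) {
            DataStorageConfiguration storageCfg = new DataStorageConfiguration();
            // Keep Ignite persistence enabled but turn the write-ahead log off,
            // purely to rule the WAL in or out as the source of the growth.
            storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);
            storageCfg.setWalMode(WALMode.NONE);

            IgniteConfiguration cfg = new IgniteConfiguration()
                .setDataStorageConfiguration(storageCfg);

            try (Ignite ignite = Ignition.start(cfg)) {
                ignite.cluster().active(true); // persistence requires explicit activation
            }
        }
    }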
Re: Ignite DataStreamer Memory Problems
Any suggestions on where I can go from here? I'd like to find a way to isolate this problem before I have to look into another storage/grid solution. A lot of work has gone into integrating Ignite into our platform, and I'd really hate to start from scratch. I can provide as much information as needed to help pinpoint this problem and do additional tests on my end.

Are there any projects out there that have successfully run Ignite on Kubernetes with Persistence and a high-volume write load?

I've been looking into using third-party persistence, but we require SQL queries to fetch the bulk of our data and it seems like this isn't really possible with Cassandra, et al., unless I can know in advance what data needs to be loaded into memory. Is that a safe assumption to make?
Re: Ignite DataStreamer Memory Problems
No luck with the changed configuration. Memory still continues to rise until the Kubernetes limit (110GB), then crashes. This is output I pulled from jcmd at some point before the crash. I can post the detailed memory report if that helps.

Total: reserved=84645150KB, committed=83359362KB
- Java Heap (reserved=25165824KB, committed=25165824KB)
             (mmap: reserved=25165824KB, committed=25165824KB)
- Class (reserved=1121992KB, committed=80356KB)
             (classes #11821)
             (malloc=1736KB #20912)
             (mmap: reserved=1120256KB, committed=78620KB)
- Thread (reserved=198099KB, committed=198099KB)
             (thread #193)
             (stack: reserved=197248KB, committed=197248KB)
             (malloc=626KB #975)
             (arena=225KB #380)
- Code (reserved=260571KB, committed=65571KB)
             (malloc=10971KB #16284)
             (mmap: reserved=249600KB, committed=54600KB)
- GC (reserved=1047369KB, committed=1047369KB)
             (malloc=80713KB #57810)
             (mmap: reserved=966656KB, committed=966656KB)
- Compiler (reserved=597KB, committed=597KB)
             (malloc=467KB #1235)
             (arena=131KB #7)
- Internal (reserved=56763248KB, committed=56763248KB)
             (malloc=56763216KB #1063361)
             (mmap: reserved=32KB, committed=32KB)
- Symbol (reserved=17245KB, committed=17245KB)
             (malloc=14680KB #138104)
             (arena=2565KB #1)
- Native Memory Tracking (reserved=20852KB, committed=20852KB)
             (malloc=453KB #6407)
             (tracking overhead=20399KB)
- Arena Chunk (reserved=201KB, committed=201KB)
             (malloc=201KB)
- Unknown (reserved=49152KB, committed=0KB)
             (mmap: reserved=49152KB, committed=0KB)
Re: Ignite DataStreamer Memory Problems
I've put a full answer on SO: https://stackoverflow.com/questions/55752357/possible-memory-leak-in-ignite-datastreamer/55786023#55786023

In short, so far it doesn't look like a memory leak to me - just a misconfiguration. There is a memory pool in the JVM for direct memory buffers which is by default bounded by the value of `-Xmx`. Most applications would use a minuscule amount of it, but in some it can grow - and grow to the size of the heap, making your total Java usage not roughly `heap + data region` but `heap * 2 + data region`.

Set walSegmentSize=64mb and -XX:MaxDirectMemorySize=256mb and I think it's going to be OK.

Stan

On Sun, Apr 21, 2019 at 11:51 AM Denis Magda wrote:
> Hello,
>
> Copying Evgeniy and Stan, our community experts who'd guide you through.
> In the meantime, please try to capture the OOM with this approach:
> https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/tooldescr007.html
>
> -
> Denis
>
> On Sun, Apr 21, 2019 at 8:49 AM kellan wrote:
>> Update: I've been able to confirm a couple more details:
>>
>> 1. I'm experiencing the same leak with put, putAll as I am with the
>>    DataStreamer.
>> 2. The problem is resolved when persistence is turned off.
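A minimal sketch of what that suggestion could look like, assuming programmatic configuration; the 64 MB value is simply Stan's recommendation expressed in bytes, and the JVM is assumed to be started with -XX:MaxDirectMemorySize=256m as he suggests:

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class WalSegmentSketch {
        public static void main(String[] args) {
            DataStorageConfiguration storageCfg = new DataStorageConfiguration()
                // Stan's suggestion: 64 MB WAL segments.
                .setWalSegmentSize(64 * 1024 * 1024);

            IgniteConfiguration cfg = new IgniteConfiguration()
                .setDataStorageConfiguration(storageCfg);

            // Combined with the JVM option -XX:MaxDirectMemorySize=256m so the
            // direct (NIO) buffer pool cannot grow toward the heap size.
            try (Ignite ignite = Ignition.start(cfg)) {
                // run the streaming workload as before
            }
        }
    }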
Re: Ignite DataStreamer Memory Problems
Hello,

Copying Evgeniy and Stan, our community experts who'd guide you through. In the meantime, please try to capture the OOM with this approach:
https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/tooldescr007.html

-
Denis

On Sun, Apr 21, 2019 at 8:49 AM kellan wrote:
> Update: I've been able to confirm a couple more details:
>
> 1. I'm experiencing the same leak with put, putAll as I am with the
>    DataStreamer.
> 2. The problem is resolved when persistence is turned off.
Re: Ignite DataStreamer Memory Problems
Update: I've been able to confirm a couple more details:

1. I'm experiencing the same leak with put, putAll as I am with the DataStreamer.
2. The problem is resolved when persistence is turned off.
Re: Ignite DataStreamer Memory Problems
Looping in the dev list.

Community, does this remind you of any memory leak addressed in the master? What do we need to get to the bottom of the issue?

Denis

On Friday, April 19, 2019, kellan wrote:
> After doing additional tests to isolate the issue, it looks like Ignite
> is having a problem releasing Internal memory of cache objects passed
> into the NIO ByteBuffers that back the DataStreamer objects. At first I
> thought this might be on account of my Avro ByteBuffers that get
> transformed into byte arrays before being loaded into the Ignite
> DataStreamers, but I can run my application without the DataStreamers
> (otherwise exactly the same) and there is no memory leak.
>
> I've posted more about it on StackOverflow:
> https://stackoverflow.com/questions/55752357/possible-memory-leak-in-ignite-datastreamer
>
> I'm trying to productionize an Ignite cluster in Kubernetes and can't
> move forward until I can solve this problem. Is there anyone who's used
> DataStreamers to do heavy write loads in a k8s environment who has any
> insight into what would be causing this?

--
Denis Magda
Re: Ignite DataStreamer Memory Problems
After doing additional tests to isolate the issue, it looks like Ignite is having a problem releasing Internal memory of cache objects passed into the NIO ByteBuffers that back the DataStreamer objects. At first I thought this might be on account of my Avro ByteBuffers that get transformed into byte arrays before being loaded into the Ignite DataStreamers, but I can run my application without the DataStreamers (otherwise exactly the same) and there is no memory leak.

I've posted more about it on StackOverflow:
https://stackoverflow.com/questions/55752357/possible-memory-leak-in-ignite-datastreamer

I'm trying to productionize an Ignite cluster in Kubernetes and can't move forward until I can solve this problem. Is there anyone who's used DataStreamers to do heavy write loads in a k8s environment who has any insight into what would be causing this?
Re: Ignite DataStreamer Memory Problems
So I've done a heap dump and recorded heap metrics while running my DataStreamers, and the heap doesn't appear to be the problem here. Ignite operates normally for several hours without the heap size ever reaching its max. My durable memory also seems to be behaving as expected.

While looking at the output of top, however, I notice a gradual increase in memory above the sum total of heap + durable memory, which continues to increase for several hours until my Kubernetes pod hits its memory limit and is killed.

My guess is this is an NIO problem. I suppose this could originate from the Avro files I'm loading from S3, and I'm investigating this, but I'd like to rule out there being a problem on the Ignite end. Do DataStreamers use NIO, and is there any way these could end up "leaking" memory? If so, are there configuration parameters or best practices I could use to prevent this?
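One way to check that suspicion without a full native-memory report is to poll the JVM's own accounting of NIO buffer pools. A small sketch, using only standard java.lang.management APIs (nothing Ignite-specific); it could run inside the same JVM, or the same beans can be read over the JMX port already exposed in the configuration above:

    import java.lang.management.BufferPoolMXBean;
    import java.lang.management.ManagementFactory;
    import java.util.List;

    public class DirectBufferWatch {
        public static void main(String[] args) throws InterruptedException {
            // The "direct" and "mapped" pools cover NIO ByteBuffers allocated
            // outside the heap; a steadily growing "direct" pool would point at
            // direct-buffer usage rather than the heap or the Ignite data region.
            while (true) {
                List<BufferPoolMXBean> pools =
                    ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);

                for (BufferPoolMXBean pool : pools) {
                    System.out.printf("%-8s count=%d used=%dMB capacity=%dMB%n",
                        pool.getName(),
                        pool.getCount(),
                        pool.getMemoryUsed() / (1024 * 1024),
                        pool.getTotalCapacity() / (1024 * 1024));
                }

                Thread.sleep(10_000); // sample every 10 seconds
            }
        }
    }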
Re: Ignite DataStreamer Memory Problems
A heap dump won't address non-heap memory issues, which is what I'm most often running into. Where can memory build up in Ignite outside of durable memory and heap memory?
Re: Ignite DataStreamer Memory Problems
Hello!

I suggest collecting a heap dump and taking a long look at it.

Regards,
--
Ilya Kasnacheev

On Mon, Apr 15, 2019 at 15:35, kellan wrote:
> I'm confused. If the DataStreamer blocks until all data is loaded into
> remote caches, and I'm only ever running a fixed number of DataStreamers
> (4 max), which close after they read a single file of a more or less
> fixed length each time (no more than 200MB each; i.e. I shouldn't have
> more than 800MB plus additional Ignite metadata in my DataStreamers at
> any point), I shouldn't be seeing a gradual build-up of memory, but
> that's what I'm seeing.
>
> Maybe I should have said before that this is a persistent cache and the
> problem starts at some point after I've run out of memory in my data
> regions (not immediately, but hours later).
Re: Ignite DataStreamer Memory Problems
I'm confused. If the DataStreamer blocks until all data is loaded into remote caches, and I'm only ever running a fixed number of DataStreamers (4 max), which close after they read a single file of a more or less fixed length each time (no more than 200MB each; i.e. I shouldn't have more than 800MB plus additional Ignite metadata in my DataStreamers at any point), I shouldn't be seeing a gradual build-up of memory, but that's what I'm seeing.

Maybe I should have said before that this is a persistent cache and the problem starts at some point after I've run out of memory in my data regions (not immediately, but hours later).
Re: Ignite DataStreamer Memory Problems
Hello!

DataStreamer WILL block until all data is loaded in caches.

The recommendation here is probably reducing perNodeParallelOperations(), streamerBufferSize() and perThreadBufferSize(), and flush()ing your DataStreamer frequently to avoid data build-ups in the temporary data structures of the DataStreamer. Or maybe, if you have a few entries which are very large, you can just use the Cache API to populate those.

Regards,
--
Ilya Kasnacheev

On Sun, Apr 14, 2019 at 18:45, kellan wrote:
> I seem to be running into some sort of memory issue with my
> DataStreamers, and I'd like to get a better idea of how they work behind
> the scenes to troubleshoot my problem.
>
> I have a cluster of 4 nodes, each of which is pulling files from S3 over
> an extended period of time and loading the contents. Each new file opens
> up a new DataStreamer, loads its contents, and closes the DataStreamer.
> At most each node has 4 DataStreamers writing to 4 different caches
> simultaneously. A new DataStreamer isn't created until the last one on
> that thread is closed. I wait for the futures to complete, then close
> the DataStreamer. So far so good.
>
> After my nodes are running for a few hours, one or more inevitably ends
> up crashing. Sometimes the Java heap overflows and Java exits, and
> sometimes Java is killed by the kernel because of an OOM error.
>
> Here are my specs per node:
> Total Available Memory: 110GB
> Memory Assigned to All Data Regions: 50GB
> Total Checkpoint Page Buffers: 5GB
> Java Heap: 25GB
>
> Does DataStreamer.close block until data is loaded into the cache on
> remote nodes (I'm assuming it doesn't), and if not, is there any way to
> monitor the progress of loading data into the cache on the remote
> nodes/replicas, so I can slow down my DataStreamers to keep pace?
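To make that concrete, here is a minimal sketch of a per-file streamer tuned along the lines Ilya suggests. The cache name, buffer sizes, and flush interval are illustrative assumptions, not values from this thread, and perNodeBufferSize() is used for the buffer-size knob:

    import java.util.Map;
    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteDataStreamer;

    public class TunedStreamerSketch {
        // Hypothetical helper: streams one file's worth of already-parsed records.
        static void streamFile(Ignite ignite, Map<Long, byte[]> records) {
            try (IgniteDataStreamer<Long, byte[]> streamer = ignite.dataStreamer("myCache")) {
                streamer.perNodeParallelOperations(2); // fewer in-flight batches per node
                streamer.perNodeBufferSize(256);       // smaller per-node buffer (entries)
                streamer.autoFlushFrequency(5_000);    // or call flush() explicitly below

                int count = 0;
                for (Map.Entry<Long, byte[]> e : records.entrySet()) {
                    streamer.addData(e.getKey(), e.getValue());

                    // Periodic flush keeps the streamer's temporary buffers small
                    // instead of letting a whole file accumulate.
                    if (++count % 10_000 == 0)
                        streamer.flush();
                }
            } // close() flushes remaining data and blocks until it is loaded
        }
    }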
Ignite DataStreamer Memory Problems
I seem to be running into some sort of memory issue with my DataStreamers, and I'd like to get a better idea of how they work behind the scenes to troubleshoot my problem.

I have a cluster of 4 nodes, each of which is pulling files from S3 over an extended period of time and loading the contents. Each new file opens up a new DataStreamer, loads its contents, and closes the DataStreamer. At most each node has 4 DataStreamers writing to 4 different caches simultaneously. A new DataStreamer isn't created until the last one on that thread is closed. I wait for the futures to complete, then close the DataStreamer. So far so good.

After my nodes are running for a few hours, one or more inevitably ends up crashing. Sometimes the Java heap overflows and Java exits, and sometimes Java is killed by the kernel because of an OOM error.

Here are my specs per node:
Total Available Memory: 110GB
Memory Assigned to All Data Regions: 50GB
Total Checkpoint Page Buffers: 5GB
Java Heap: 25GB

Does DataStreamer.close block until data is loaded into the cache on remote nodes (I'm assuming it doesn't), and if not, is there any way to monitor the progress of loading data into the cache on the remote nodes/replicas, so I can slow down my DataStreamers to keep pace?
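For anyone following along, a stripped-down sketch of the loading pattern described above, with hypothetical names for the S3-reading pieces; it waits on the addData() futures and then closes the streamer, as in the description:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteDataStreamer;
    import org.apache.ignite.lang.IgniteFuture;

    public class PerFileStreamerSketch {
        // Hypothetical: one call per downloaded S3 file, at most a handful running
        // at once, each on its own thread with its own streamer.
        static void loadFile(Ignite ignite, String cacheName, Map<Long, byte[]> fileRecords) {
            List<IgniteFuture<?>> futures = new ArrayList<>();

            try (IgniteDataStreamer<Long, byte[]> streamer = ignite.dataStreamer(cacheName)) {
                for (Map.Entry<Long, byte[]> e : fileRecords.entrySet())
                    futures.add(streamer.addData(e.getKey(), e.getValue()));

                // Wait for every batch the streamer has handed off...
                for (IgniteFuture<?> f : futures)
                    f.get();
            } // ...then close(), which flushes and blocks until the data is loaded.
        }
    }

Note that holding a future per entry keeps those objects referenced until the streamer closes, which is one reason the periodic-flush variant in Ilya's reply can behave better for large files.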