Hi All, I've created a StackOverflow post (http://stackoverflow.com/questions/34815652/igniterdd-freezes-at-savepairs) but thought I might share it here also.
I have a Spark cluster of three machines and am trying to use Apache Ignite for caching data. An Ignite node runs on each Spark machine, and I am using the Spark REPL for testing (the problem was originally found using spark-submit, so it is not specific to the REPL). The problem is that execution freezes at IgniteRDD.savePairs.

Here is my cache configuration:

    <bean class="org.apache.ignite.configuration.CacheConfiguration">
        <property name="name" value="myRddCache"/>
        <property name="cacheMode" value="PARTITIONED"/>
    </bean>

I previously had this working but ran out of memory in Ignite, so I (temporarily) added some options for tiered storage:

    <!-- Store cache entries on-heap. -->
    <property name="memoryMode" value="ONHEAP_TIERED"/>
    <!-- Enable off-heap memory with max size of 10 gigabytes (0 for unlimited). -->
    <property name="offHeapMaxMemory" value="#{10 * 1024L * 1024L * 1024L}"/>
    <!-- Configure eviction policy. -->
    <property name="evictionPolicy">
        <bean class="org.apache.ignite.cache.eviction.fifo.FifoEvictionPolicy">
            <!-- Evict to off-heap after cache size reaches maxSize. -->
            <property name="maxSize" value="100000"/>
        </bean>
    </property>

savePairs stopped working after I added these changes. I have since removed them to debug the current issue. I have not found anything in the logs.

Has anyone come across this issue? Are there any workarounds or solutions? I believe some hidden state could be involved. Is there a way to reset my cluster (delete a certain directory, etc.)? I have restarted the whole cluster numerous times.

Notes: HDFS is configured as the under file system. When I create my IgniteContext (in the REPL) with an Ignite node running on the same machine, I get a warning that the IGFS/IGFS-management endpoints are already in use. I have tested this both with and without an Ignite node running on the driver machine.

P.S. Here is the thread trace:

    Frozen threads found (potential deadlock)
    It seems that the following threads have not changed their stack for more than 10 seconds.
    These threads are possibly (but not necessarily!) in a deadlock or hung.

    shmem-worker-#93%null% <--- Frozen for at least 21 sec
    org.apache.ignite.internal.util.ipc.shmem.IpcSharedMemoryUtils.readSharedMemory(long, byte[], long, long, long) IpcSharedMemoryUtils.java (native)
    org.apache.ignite.internal.util.ipc.shmem.IpcSharedMemorySpace.read(byte[], int, int, long) IpcSharedMemorySpace.java:220
    org.apache.ignite.internal.util.ipc.shmem.IpcSharedMemoryInputStream.read(byte[], int, int) IpcSharedMemoryInputStream.java:62
    org.apache.ignite.internal.util.ipc.IpcToNioAdapter.serve() IpcToNioAdapter.java:114
    org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$ShmemWorker.body() TcpCommunicationSpi.java:2943
    org.apache.ignite.internal.util.worker.GridWorker.run() GridWorker.java:110
    java.lang.Thread.run() Thread.java:745

I see references to Shmem; FYI, the endpoint that is configured is a TCP endpoint.

P.P.S. Forget the 21 seconds above: the thread was still hung after an hour.

--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Execution-hangs-at-IgniteRDD-savePairs-tp2615.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.
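For reference, here is a sketch of the two configuration fragments from this message combined into a single bean. It assumes the tiered-storage properties were added directly inside the myRddCache CacheConfiguration bean; it is a reconstruction for illustration, not a copy of the actual file. The property names are exactly those used in the fragments (Ignite 1.x), and the SpEL expression for offHeapMaxMemory evaluates to 10 * 1024^3 = 10737418240 bytes (10 GiB).

```xml
<!-- Assumed placement: tiered-storage options inside the same cache bean. -->
<bean class="org.apache.ignite.configuration.CacheConfiguration">
    <property name="name" value="myRddCache"/>
    <property name="cacheMode" value="PARTITIONED"/>

    <!-- Store cache entries on-heap. -->
    <property name="memoryMode" value="ONHEAP_TIERED"/>

    <!-- Off-heap limit: 10 * 1024^3 = 10737418240 bytes (0 for unlimited). -->
    <property name="offHeapMaxMemory" value="#{10 * 1024L * 1024L * 1024L}"/>

    <!-- Evict to off-heap after the on-heap cache size reaches maxSize. -->
    <property name="evictionPolicy">
        <bean class="org.apache.ignite.cache.eviction.fifo.FifoEvictionPolicy">
            <property name="maxSize" value="100000"/>
        </bean>
    </property>
</bean>
```

With these properties removed, the bean reduces to the plain PARTITIONED configuration shown first.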