Hi all, I've created a StackOverflow post
(http://stackoverflow.com/questions/34815652/igniterdd-freezes-at-savepairs)
but thought I would share it here as well.

I have a Spark cluster of three machines and am trying to use Apache Ignite
for caching data. On each Spark machine I have an Ignite node running, and I
am using the Spark REPL for testing (the problem was originally found using
spark-submit, so it is not specific to the REPL).

The problem is that execution freezes at IgniteRDD.savePairs. Here is my
cache configuration:

    <bean class="org.apache.ignite.configuration.CacheConfiguration">
        <property name="name" value="myRddCache"/>
        <property name="cacheMode" value="PARTITIONED"/>
    </bean>

I previously had this working, but Ignite ran out of memory, so I
(temporarily) added some options for tiered storage:

    <!-- Store cache entries on-heap. -->
    <property name="memoryMode" value="ONHEAP_TIERED"/>

    <!-- Enable off-heap memory with a max size of 10 gigabytes (0 for unlimited). -->
    <property name="offHeapMaxMemory" value="#{10 * 1024L * 1024L * 1024L}"/>

    <!-- Configure eviction policy. -->
    <property name="evictionPolicy">
        <bean class="org.apache.ignite.cache.eviction.fifo.FifoEvictionPolicy">
            <!-- Evict to off-heap after cache size reaches maxSize. -->
            <property name="maxSize" value="100000"/>
        </bean>
    </property>
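For context, this is a sketch of how those tiered-storage properties sit inside the CacheConfiguration bean shown earlier, written with valid XML comments. It is only an illustrative assembly of the two snippets above (assuming the Ignite 1.x cache properties memoryMode, offHeapMaxMemory and evictionPolicy), not necessarily the exact file in use. The SpEL expression evaluates to 10 * 1024^3 = 10737418240 bytes.

```xml
<bean class="org.apache.ignite.configuration.CacheConfiguration">
    <property name="name" value="myRddCache"/>
    <property name="cacheMode" value="PARTITIONED"/>

    <!-- Entries live on-heap; entries evicted from the heap move to off-heap memory. -->
    <property name="memoryMode" value="ONHEAP_TIERED"/>

    <!-- Off-heap limit: 10 * 1024^3 = 10737418240 bytes (0 would mean unlimited). -->
    <property name="offHeapMaxMemory" value="#{10 * 1024L * 1024L * 1024L}"/>

    <!-- FIFO eviction: once more than maxSize entries are on-heap, the oldest are evicted to off-heap. -->
    <property name="evictionPolicy">
        <bean class="org.apache.ignite.cache.eviction.fifo.FifoEvictionPolicy">
            <property name="maxSize" value="100000"/>
        </bean>
    </property>
</bean>
```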

These options have since been removed to help debug the current issue;
savePairs stopped working right after I added them. I have not found anything
relevant in the logs.

Has anyone come across this issue? Are there any workarounds or solutions?

I suspect some hidden state may be involved. Is there a way to reset my
cluster to a clean state (for example, by deleting a particular directory)?
I have restarted the whole cluster numerous times.

Notes: HDFS is configured as the underlying (secondary) file system. When I
create my IgniteContext in the REPL while an Ignite node is running on the
same machine, I get a warning that the IGFS and IGFS-management endpoints are
already in use. I have tested this both with and without an Ignite node
running on the driver machine.

P.S. Here is the thread trace:

    Frozen threads found (potential deadlock)

    It seems that the following threads have not changed their stack for more than 10 seconds.
    These threads are possibly (but not necessarily!) in a deadlock or hung.

    shmem-worker-#93%null% <--- Frozen for at least 21 sec
    org.apache.ignite.internal.util.ipc.shmem.IpcSharedMemoryUtils.readSharedMemory(long, byte[], long, long, long) IpcSharedMemoryUtils.java (native)
    org.apache.ignite.internal.util.ipc.shmem.IpcSharedMemorySpace.read(byte[], int, int, long) IpcSharedMemorySpace.java:220
    org.apache.ignite.internal.util.ipc.shmem.IpcSharedMemoryInputStream.read(byte[], int, int) IpcSharedMemoryInputStream.java:62
    org.apache.ignite.internal.util.ipc.IpcToNioAdapter.serve() IpcToNioAdapter.java:114
    org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$ShmemWorker.body() TcpCommunicationSpi.java:2943
    org.apache.ignite.internal.util.worker.GridWorker.run() GridWorker.java:110
    java.lang.Thread.run() Thread.java:745

I see references to shared memory (shmem) in the trace; FYI, the endpoint I
have configured is a TCP endpoint.
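Note that the frozen thread in the trace belongs to TcpCommunicationSpi (its ShmemWorker), i.e. the shared-memory transport used for node-to-node communication on the same host, which is configured separately from the IGFS IPC endpoint. One thing worth ruling out is the node configuration sketched below, assuming Ignite 1.x, where TcpCommunicationSpi exposes a sharedMemoryPort property and a value of -1 disables shared-memory communication. This is an untested sketch; it has not been confirmed to avoid the hang.

```xml
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <!-- Existing cache, discovery and IGFS settings go here unchanged. -->

    <property name="communicationSpi">
        <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
            <!-- -1 disables shared-memory communication between nodes on the
                 same host, so local node-to-node traffic also goes over TCP. -->
            <property name="sharedMemoryPort" value="-1"/>
        </bean>
    </property>
</bean>
```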

P.P.S. Forget the 21 seconds above: the thread was still hung after an hour.

--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Execution-hangs-at-IgniteRDD-savePairs-tp2615.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.