RE: Spark Ignite Data Load failing On Large Cache

2018-10-09 Thread Stanislav Lukyanov
Hi,

Please share configurations and full logs from all nodes.

Stan

From: ApacheUser
Sent: 8 October 2018 17:49
To: user@ignite.apache.org
Subject: Spark Ignite Data Load failing On Large Cache

Hi, I am testing a large Ignite cache of 900 GB on a 4-node Spark/Ignite cluster of VMs (each with 96 GB RAM, 8 CPUs and 500 GB SAN storage). It has happened twice now: after the load reached 350 GB or more, one or two nodes stopped processing the data load and the load halted. Please advise. The cluster, server and client logs are below.
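A quick sanity check of the sizing in the question: with data spread evenly across the 4 server nodes, each node stores its share of the 900 GB once per copy. This is a rough estimate only. The backup count is not stated in the thread, so it appears here as a hypothetical parameter, and write-ahead log, index and page overhead are ignored, so real disk usage will be higher.

```java
// Rough per-node storage estimate for a partitioned cache.
// Assumptions (not from the original post): data is spread evenly across
// server nodes, and every entry is stored once plus `backups` extra copies.
// WAL, indexes and page overhead are ignored, so real usage will be higher.
final class CapacityEstimate {

    /** Gigabytes each server node must hold for the given total data size. */
    static double perNodeGb(double totalGb, int serverNodes, int backups) {
        if (serverNodes <= 0 || backups < 0) {
            throw new IllegalArgumentException("serverNodes must be > 0 and backups >= 0");
        }
        return totalGb * (1 + backups) / serverNodes;
    }

    public static void main(String[] args) {
        double diskPerNodeGb = 500; // SAN storage per VM, from the post
        for (int backups = 0; backups <= 2; backups++) {
            double need = perNodeGb(900, 4, backups); // 900 GB cache on 4 servers
            System.out.printf("backups=%d -> %.1f GB per node (%.0f%% of %.0f GB disk)%n",
                    backups, need, 100 * need / diskPerNodeGb, diskPerNodeGb);
        }
    }
}
```

This prints 225.0 GB per node with no backups, 450.0 GB with one backup (90% of the 500 GB SAN volume) and 675.0 GB with two, so even a single backup brings the per-node footprint close to the disk size before any persistence overhead is counted.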

<http://apache-ignite-users.70518.x6.nabble.com/file/t1842/IgniteClusterSnapshot.png>
 


Server Logs:

[11:59:34] Topology snapshot [ver=121, servers=4, clients=9, CPUs=32, offheap=1000.0GB, heap=78.0GB]
[11:59:34]   ^-- Node [id=F6605E96-47C9-479B-A840-03316500C9A3, clusterState=ACTIVE]
[11:59:34]   ^-- Baseline [id=0, size=4, online=4, offline=0]
[11:59:34] Data Regions Configured:
[11:59:34]   ^-- default_mem_region [initSize=256.0 MiB, maxSize=20.0 GiB, persistenceEnabled=true]
[11:59:34]   ^-- q_major [initSize=10.0 GiB, maxSize=30.0 GiB, persistenceEnabled=true]
[11:59:34]   ^-- q_minor [initSize=10.0 GiB, maxSize=30.0 GiB, persistenceEnabled=true]
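The data-region sizes in this startup output bound how much RAM each server's page memory may take (with persistence enabled, data beyond maxSize lives only on disk). A small sketch, assuming every server node runs the same three regions (only one node's startup output is shown): the maxSize values add up to 80 GiB, about 85.9 GB, of off-heap memory per server. That is before counting the server's Java heap, the OS, and the client JVMs that the visor output later in the thread shows running on the same hosts.

```java
// Sums the data-region maxSize limits printed in the server log.
// Assumption: every server node uses the same three regions (the log shows one node).
final class RegionBudget {
    static final long GIB = 1L << 30;

    /** Total off-heap bytes a node may use if every region grows to its maxSize. */
    static long offHeapMaxBytes(long... regionMaxGib) {
        long total = 0;
        for (long gib : regionMaxGib) {
            total += gib * GIB;
        }
        return total;
    }

    public static void main(String[] args) {
        // default_mem_region = 20 GiB, q_major = 30 GiB, q_minor = 30 GiB
        long maxBytes = offHeapMaxBytes(20, 30, 30);
        System.out.printf("off-heap max per server: %d GiB (%.1f GB)%n",
                maxBytes / GIB, maxBytes / 1e9);
    }
}
```

Running it prints `off-heap max per server: 80 GiB (85.9 GB)`. Note that heap=78.0GB in the topology snapshot is the total across all 13 nodes, not a per-node figure.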
[14:33:15,872][SEVERE][grid-nio-worker-client-listener-3-#33][ClientListenerProcessor]
Failed to process selector key [ses=GridSelectorNioSessionImpl
[worker=ByteBufferNioClientWorker [readBuf=java.nio.HeapByteBuffer[pos=0
lim=8192 cap=8192], super=AbstractNioClientWorker [idx=3, bytesRcvd=0,
bytesSent=0, bytesRcvd0=0, bytesSent0=0, select=true, super=GridWorker
[name=grid-nio-worker-client-listener-3, igniteInstanceName=null,
finished=false, hashCode=254322881, interrupted=false,
runner=grid-nio-worker-client-listener-3-#33]]], writeBuf=null,
readBuf=null, inRecovery=null, outRecovery=null, super=GridNioSessionImpl
[locAddr=/64.102.213.190:10800, rmtAddr=/10.82.249.225:51449,
createTime=1538740798912, closeTime=0, bytesSent=397, bytesRcvd=302,
bytesSent0=0, bytesRcvd0=0, sndSchedTime=1538742789216,
lastSndTime=1538742789216, lastRcvTime=1538742789216, readsPaused=false,
filterChain=FilterChain[filters=[GridNioAsyncNotifyFilter,
GridNioCodecFilter [parser=ClientListenerBufferedParser, directMode=false]],
accepted=true]]]
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:197)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at org.apache.ignite.internal.util.nio.GridNioServer$ByteBufferNioClientWorker.processRead(GridNioServer.java:1085)
at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2339)
at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2110)
at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1764)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
at java.lang.Thread.run(Thread.java:748)
[21:43:26,312][SEVERE][grid-nio-worker-client-listener-0-#30][ClientListenerProcessor]
Failed to process selector key [ses=GridSelectorNioSessionImpl
[worker=ByteBufferNioClientWorker [readBuf=java.nio.HeapByteBuffer[pos=0
lim=8192 cap=8192], super=AbstractNioClientWorker [idx=0, bytesRcvd=0,
bytesSent=0, bytesRcvd0=0, bytesSent0=0, select=true, super=GridWorker
[name=grid-nio-worker-client-listener-0, igniteInstanceName=null,
finished=false, hashCode=2211598, interrupted=false,
runner=grid-nio-worker-client-listener-0-#30]]], writeBuf=null,
readBuf=null, inRecovery=null, outRecovery=null, super=GridNioSessionImpl
[locAddr=/64.102.213.190:10800, rmtAddr=/10.82.32.114:59525,
createTime=1538746249024, closeTime=0, bytesSent=2035, bytesRcvd=1532,
bytesSent0=0, bytesRcvd0=0, sndSchedTime=1538767916701,
lastSndTime=1538767916701, lastRcvTime=1538767916701, readsPaused=false,
filterChain=FilterChain[filters=[GridNioAsyncNotifyFilter,
GridNioCodecFilter [parser=ClientListenerBufferedParser, directMode=false]],
accepted=true]]]
java.io.IOException: Connection timed out
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:197)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at org.apache.ignite.internal.util.nio.GridNioServer$ByteBufferNioClientWorker.processRead(GridNioServer.java:1085)
at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2339)
at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.jav
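Both SEVERE entries come from the client listener on port 10800 (the connector used by JDBC, ODBC and thin clients), and both report a client connection that was dropped: one reset by the peer, one timed out. The session's createTime and lastRcvTime fields can be decoded to see how long each connection had been open. The sketch below assumes these fields are Unix epoch milliseconds; the class and method names are hypothetical helpers, not part of Ignite.

```java
import java.time.Duration;
import java.time.Instant;

// Decodes the GridNioSessionImpl timestamps from the two SEVERE entries.
// Assumption: createTime and lastRcvTime are Unix epoch milliseconds.
final class SessionTimes {

    /** How long the session had been open when it last received data. */
    static Duration activeFor(long createTimeMs, long lastRcvTimeMs) {
        return Duration.ofMillis(lastRcvTimeMs - createTimeMs);
    }

    public static void main(String[] args) {
        long[][] sessions = {
            // {createTime, lastRcvTime} copied from the log
            {1538740798912L, 1538742789216L}, // rmtAddr=/10.82.249.225:51449, reset by peer
            {1538746249024L, 1538767916701L}, // rmtAddr=/10.82.32.114:59525, timed out
        };
        for (long[] s : sessions) {
            System.out.println("created " + Instant.ofEpochMilli(s[0])
                    + ", last received " + Instant.ofEpochMilli(s[1])
                    + ", active for " + activeFor(s[0], s[1]));
        }
    }
}
```

For the first session this prints a creation time of 2018-10-05T11:59:58.912Z and an active period of PT33M10.304S (about 33 minutes); the second session had been open for PT6H1M7.677S, roughly six hours.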


Spark Ignite Data Load failing On Large Cache

2018-10-08 Thread ApacheUser
Hi, I am testing a large Ignite cache of 900 GB on a 4-node Spark/Ignite cluster of VMs (each with 96 GB RAM, 8 CPUs and 500 GB SAN storage). It has happened twice now: after the load reached 350 GB or more, one or two nodes stopped processing the data load and the load halted. Please advise. The cluster, server and client logs are below.

visor> top
Hosts: 4
+====================+===================+===========+========================================+======+===================+==========+
| Int./Ext. IPs      | Node ID8(@)       | Node Type | OS                                     | CPUs | MACs              | CPU Load |
+====================+===================+===========+========================================+======+===================+==========+
| 0:0:0:0:0:0:0:1%lo | 1: F6605E96(@n1)  | Server    | Linux amd64 3.10.0-862.11.6.el7.x86_64 | 8    | FA:16:3E:52:96:C4 | 0.14 %   |
| 127.0.0.1          | 2: 2760B50C(@n11) | Client    |                                        |      |                   |          |
| 64.102.213.190     | 3: 81855FF0(@n12) | Client    |                                        |      |                   |          |
+--------------------+-------------------+-----------+----------------------------------------+------+-------------------+----------+
| 0:0:0:0:0:0:0:1%lo | 1: 512609AB(@n0)  | Server    | Linux amd64 3.10.0-862.11.6.el7.x86_64 | 8    | FA:16:3E:E5:27:36 | 2.13 %   |
| 127.0.0.1          | 2: 72AA1490(@n5)  | Client    |                                        |      |                   |          |
| 64.102.212.151     | 3: E218A964(@n6)  | Client    |                                        |      |                   |          |
+--------------------+-------------------+-----------+----------------------------------------+------+-------------------+----------+
| 0:0:0:0:0:0:0:1%lo | 1: 4470553B(@n2)  | Server    | Linux amd64 3.10.0-862.11.6.el7.x86_64 | 8    | FA:16:3E:C4:F4:98 | 0.10 %   |
| 127.0.0.1          | 2: F0D1625A(@n7)  | Client    |                                        |      |                   |          |
| 64.102.213.13      | 3: EF0C5A13(@n8)  | Client    |                                        |      |                   |          |
+--------------------+-------------------+-----------+----------------------------------------+------+-------------------+----------+
| 0:0:0:0:0:0:0:1%lo | 1: F44497FE(@n3)  | Server    | Linux amd64 3.10.0-862.11.6.el7.x86_64 | 8    | FA:16:3E:26:72:FD | 0.21 %   |
| 127.0.0.1          | 2: DBA60939(@n4)  | Client    |                                        |      |                   |          |
| 64.102.213.220     | 3: 65FA421F(@n9)  | Client    |                                        |      |                   |          |
|                    | 4: 8CBFE426(@n10) | Client    |                                        |      |                   |          |
+--------------------+-------------------+-----------+----------------------------------------+------+-------------------+----------+

Summary:
+----------------+---------------------+
| Active         | true                |
| Total hosts    | 4                   |
| Total nodes    | 13                  |
| Total CPUs     | 32                  |
| Avg. CPU load  | 0.61 %              |
| Avg. free heap | 71.00 %             |
| Avg. Up time   | 30:22:52            |
| Snapshot time  | 2018-10-08 14:19:47 |
+----------------+---------------------+

visor> node
Select node from:
+====+================================+===========+==========+======+==========+===========+
| #  | Node ID8(@), IP                | Node Type | Up Time  | CPUs | CPU Load | Free Heap |
+====+================================+===========+==========+======+==========+===========+
| 0  | 512609AB(@n0), 64.102.212.151  | Server    | 30:23:14 | 8    | 4.33 %   | 36.00 %   |
| 1  | F6605E96(@n1), 64.102.213.190  | Server    | 30:23:10 | 8    | 0.90 %   | 56.00 %   |
| 2  | 4470553B(@n2), 64.102.213.13   | Server    | 30:23:07 | 8    | 0.20 %   | 78.00 %   |
| 3  | F44497FE(@n3), 64.102.213.220  | Server    | 30:23:03 | 8    | 0.17 %   | 44.00 %   |
| 4  | DBA60939(@n4), 64.102.213.220  | Client    | 14:21:12 | 8    | 0.17 %   | 66.00 %   |
| 5  | 72AA1490(@n5), 64.102.212.151  | Client    | 14:21:06 | 8    | 0.17 %   | 78.00 %   |
| 6  | E218A964(@n6), 64.102.212.151  | Client    | 14:21:07 | 8    | 0.17 %   | 71.00 %   |
| 7  | F0D1625A(@n7), 64.102.213.13   | Client    | 14:21:06 | 8    | 0.07 %   | 84.00 %   |
| 8  | EF0C5A13(@n8), 64.102.213.13   | Client    | 14:21:06 | 8    | 0.07 %   | 83.00 %   |
| 9  | 65FA421F(@n9), 64.102.213.220  | Client    | 14:21:07 | 8    | 0.10 %   | 64.00 %   |
| 10 | 8CBFE426(@n10), 64.102.213.220 | Client    | 14:21:06 | 8    | 0.13 %   | 76.00 %   |
| 11 | 2760B50C(@n11), 64.102.213.190 | Client    | 14:21:07 | 8    | 0.13 %   | 78.00 %   |
| 12 | 81855FF0(@n12), 64.102.213.190 | Client    | 14:21:06 | 8    | 0.10 %   | 81.00 %   |
+----+--------------------------------+-----------+----------+------+----------+-----------+
*Server
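The node listing shows that each VM runs one server node plus two or three client JVMs, so the clients share each host's 96 GB of RAM with the server's data regions and heap. A small sketch that tallies the IP column of the listing (values copied from the visor output; the class and field names are hypothetical helpers, not a visor feature):

```java
import java.util.Map;
import java.util.TreeMap;

// Counts Ignite nodes per host from the "visor> node" listing.
final class NodesPerHost {

    // IP column of nodes #0..#12, in the order visor printed them
    static final String[] VISOR_NODE_IPS = {
        "64.102.212.151", "64.102.213.190", "64.102.213.13", "64.102.213.220", // servers #0-#3
        "64.102.213.220", "64.102.212.151", "64.102.212.151", "64.102.213.13", // clients #4-#7
        "64.102.213.13", "64.102.213.220", "64.102.213.220", "64.102.213.190", // clients #8-#11
        "64.102.213.190",                                                       // client #12
    };

    /** Maps each IP to the number of Ignite nodes running on it, sorted by IP. */
    static Map<String, Integer> countByHost(String[] nodeIps) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String ip : nodeIps) {
            counts.merge(ip, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countByHost(VISOR_NODE_IPS));
    }
}
```

This prints `{64.102.212.151=3, 64.102.213.13=3, 64.102.213.190=3, 64.102.213.220=4}`, matching the 13 total nodes in the visor summary.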