Re: Ignite Cluster getting stuck when new node Join or release

2018-06-26 Thread dkarachentsev
Hi,

Thread dumps look healthy. Please share the full logs from the time when you
took those thread dumps, or take new ones (thread dumps + logs).

Thanks!
-Dmitry



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Ignite Cluster getting stuck when new node Join or release

2018-06-07 Thread Andrey Mashenkov
Hi,

It is OK if you kill a client node. The grid will wait for
failureDetectionTimeout before dropping the failed node from the topology.
All topology operations will be stuck during that time, as Ignite nodes will
wait for an answer from the failed node until they detect the failure.
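
For reference, a minimal sketch of where that timeout is configured; this is not
from the thread, and the 10-second values are only illustrative (the defaults
differ for server and client nodes):

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class FailureTimeoutExample {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();
        // Time servers wait before dropping an unresponsive server node (ms).
        cfg.setFailureDetectionTimeout(10_000);
        // Separate timeout applied to unresponsive client nodes (ms).
        cfg.setClientFailureDetectionTimeout(10_000);
        try (Ignite ignite = Ignition.start(cfg)) {
            // A killed client is now removed from the topology after at most
            // clientFailureDetectionTimeout, unblocking pending topology operations.
        }
    }
}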

On Thu, Jun 7, 2018 at 8:22 AM, Sambhaji Sawant 
wrote:

> The issue occurred when we abnormally stopped the Spark Java application that
> has the Ignite client running inside its Spark context. When we kill the Spark
> application, it abnormally stops the Ignite client, and when we restart our
> application and the client tries to reconnect to the Ignite cluster, it gets
> stuck.
>
> On Mon, Jun 4, 2018 at 6:32 PM, dkarachentsev 
> wrote:
>
>> Hi,
>>
>> It's hard to tell what's going wrong from your question.
>> Please attach full logs and thread dumps from all server nodes.
>>
>> Thanks!
>> -Dmitry
>>
>>
>>
>> --
>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>>
>
>


-- 
Best regards,
Andrey V. Mashenkov


Re: Ignite Cluster getting stuck when new node Join or release

2018-06-06 Thread Sambhaji Sawant
The issue occurred when we abnormally stopped the Spark Java application that
has the Ignite client running inside its Spark context. When we kill the Spark
application, it abnormally stops the Ignite client, and when we restart our
application and the client tries to reconnect to the Ignite cluster, it gets
stuck.
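
A minimal sketch (an assumption on my side, not something from this thread) of
stopping the embedded client cleanly when the Spark driver exits, so the servers
do not have to wait out the failure detection timeout. A shutdown hook covers a
normal SIGTERM but not kill -9:

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class SparkIgniteClient {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration().setClientMode(true);
        Ignite client = Ignition.start(cfg);

        // Leave the topology gracefully when the JVM shuts down;
        // 'true' cancels in-flight compute jobs on this node.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> Ignition.stopAll(true)));

        // ... Spark work that uses 'client' goes here ...
    }
}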

On Mon, Jun 4, 2018 at 6:32 PM, dkarachentsev 
wrote:

> Hi,
>
> It's hard to tell what's going wrong from your question.
> Please attach full logs and thread dumps from all server nodes.
>
> Thanks!
> -Dmitry
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


Re: Ignite Cluster getting stuck when new node Join or release

2018-06-04 Thread dkarachentsev
Hi,

It's hard to tell what's going wrong from your question.
Please attach full logs and thread dumps from all server nodes.

Thanks!
-Dmitry



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Ignite Cluster getting stuck when new node Join or release

2018-06-03 Thread Sambhaji Sawant
I have a 3-node cluster with 20+ clients, running in a Spark context. Initially
it works fine, but we randomly hit an issue whenever a new node (i.e. a client)
tries to connect to the cluster: the cluster becomes inoperative. I got the
following logs when it was stuck. If I explicitly restart any Ignite server,
the cluster is released and works fine again. I am using Ignite version 2.4.0;
the same issue is reproduced with Ignite 2.5.0 too.
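
For context, a minimal sketch of how such a client node is typically started
inside the Spark application; the discovery addresses below are placeholders,
not our actual hosts:

import java.util.Arrays;
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;

public class ClientNodeStartup {
    public static void main(String[] args) {
        TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
        ipFinder.setAddresses(Arrays.asList(
            "server1:47500..47509", "server2:47500..47509", "server3:47500..47509"));

        TcpDiscoverySpi discoSpi = new TcpDiscoverySpi();
        discoSpi.setIpFinder(ipFinder);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setClientMode(true);   // join the 3-node cluster as a client
        cfg.setDiscoverySpi(discoSpi);

        // The join triggers a partition map exchange; this is the step that
        // hangs in the logs below.
        Ignite client = Ignition.start(cfg);
    }
}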

Client-side logs

Failed to wait for partition map exchange [topVer=AffinityTopologyVersion
[topVer=44, minorTopVer=0], node=4d885cfd-45ed-43a2-8088-f35c9469797f].
Dumping pending objects that might be the cause:

GridDhtPartitionsExchangeFuture
[topVer=AffinityTopologyVersion [topVer=44, minorTopVer=0],
evt=NODE_JOINED, evtNode=TcpDiscoveryNode
[id=4d885cfd-45ed-43a2-8088-f35c9469797f, addrs=[0:0:0:0:0:0:0:1%lo,
10.13.10.179, 127.0.0.1], sockAddrs=[/0:0:0:0:0:0:0:1%lo:0,
/127.0.0.1:0, hdn6.mstorm.com/10.13.10.179:0], discPort=0, order=44,
intOrder=0, lastExchangeTime=1527651620413, loc=true,
ver=2.4.0#20180305-sha1:aa342270, isClient=true], done=false]

Failed to wait for partition map exchange [topVer=AffinityTopologyVersion
[topVer=44, minorTopVer=0], node=4d885cfd-45ed-43a2-8088-f35c9469797f].
Dumping pending objects that might be the cause:

GridDhtPartitionsExchangeFuture
[topVer=AffinityTopologyVersion [topVer=44, minorTopVer=0],
evt=NODE_JOINED, evtNode=TcpDiscoveryNode
[id=4d885cfd-45ed-43a2-8088-f35c9469797f, addrs=[0:0:0:0:0:0:0:1%lo,
10.13.10.179, 127.0.0.1], sockAddrs=[/0:0:0:0:0:0:0:1%lo:0,
/127.0.0.1:0, hdn6.mstorm.com/10.13.10.179:0], discPort=0, order=44,
intOrder=0, lastExchangeTime=1527651620413, loc=true,
ver=2.4.0#20180305-sha1:aa342270, isClient=true], done=false]

Failed to wait for initial partition map exchange. Possible reasons are:
  ^-- Transactions in deadlock.
  ^-- Long running transactions (ignore if this is the case).
  ^-- Unreleased explicit locks.

Still waiting for initial partition map exchange
[fut=GridDhtPartitionsExchangeFuture [firstDiscoEvt=DiscoveryEvent
[evtNode=TcpDiscoveryNode [id=4d885cfd-45ed-43a2-8088-f35c9469797f, addrs=

Server-side logs

Possible starvation in striped pool. Thread name: sys-stripe-0-#1 Queue:
[Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8,
ordered=false, timeout=0, skipOnTimeout=false, msg=GridDhtTxPrepareResponse
[nearEvicted=null, futId=869dd4ca361-fe7e167d-4d80-4f57-b004-13359a9f2c11,
miniId=1, super=GridDistributedTxPrepareResponse [txState=null, part=-1,
err=null, super=GridDistributedBaseMessage [ver=GridCacheVersion
[topVer=139084030, order=1527604094903, nodeOrder=1], committedVers=null,
rolledbackVers=null, cnt=0, super=GridCacheIdMessage [cacheId=0]],
Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8,
ordered=false, timeout=0, skipOnTimeout=false,
msg=GridDhtAtomicSingleUpdateRequest [key=KeyCacheObjectImpl [part=984,
val=null, hasValBytes=true], val=BinaryObjectImpl [arr= true, ctx=false,
start=0], prevVal=null, super=GridDhtAtomicAbstractUpdateRequest
[onRes=false, nearNodeId=null, nearFutId=0, flags=,
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$DeferredUpdateTimeout@2735c674,
Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8,
ordered=false, timeout=0, skipOnTimeout=false, msg=GridDhtTxPrepareRequest
[nearNodeId=628e3078-17fd-4e49-b9ae-ad94ad97a2f1,
futId=6576e4ca361-6e7cdac2-d5a3-4624-9ad3-b93f25546cc3, miniId=1,
topVer=AffinityTopologyVersion [topVer=20, minorTopVer=0],
invalidateNearEntries={}, nearWrites=null, owned=null,
nearXidVer=GridCacheVersion [topVer=139084030, order=1527604094933,
nodeOrder=2], subjId=628e3078-17fd-4e49-b9ae-ad94ad97a2f1, taskNameHash=0,
preloadKeys=null, super=GridDistributedTxPrepareRequest [threadId=86,
concurrency=OPTIMISTIC, isolation=READ_COMMITTED, writeVer=GridCacheVersion
[topVer=139084030, order=1527604094935, nodeOrder=2], timeout=0,
reads=null, writes=[IgniteTxEntry [key=BinaryObjectImpl [arr= true,
ctx=false, start=0], cacheId=-1755241537, txKey=null, val=[op=UPDATE,
val=BinaryObjectImpl [arr= true, ctx=false, start=0]], prevVal=[op=NOOP,
val=null], oldVal=[op=NOOP, val=null], entryProcessorsCol=null, ttl=-1,
conflictExpireTime=-1, conflictVer=null, explicitVer=null, dhtVer=null,
filters=null, filtersPassed=false, filtersSet=false, entry=null,
prepared=0, locked=false, nodeId=null, locMapped=false, expiryPlc=null,
transferExpiryPlc=false, flags=0, partUpdateCntr=0, serReadVer=null,
xidVer=null]], dhtVers=null, txSize=0, plc=2, txState=null,
flags=onePhase|last, super=GridDistributedBaseMessage [ver=GridCacheVersion
[topVer=139084030, order=1527604094933, nodeOrder=2], committedVers=null,
rolledbackVers=null, cnt=0, super=GridCacheIdMessage [cacheId=0]],
Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8,
ordered=false, timeout=0, skipOnTimeout=false,
msg=GridDhtAtomicDeferredUpdateResponse [futIds=GridLongList [idx=2,
arr=[65774,65775],