[jira] [Commented] (IGNITE-9283) [ML] Add Discrete Cosine preprocessor

2019-07-30 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896482#comment-16896482
 ] 

Ilya Lantukh commented on IGNITE-9283:
--

[~zaleslaw] Please review https://github.com/apache/ignite/pull/6735.

> [ML] Add Discrete Cosine preprocessor
> -
>
> Key: IGNITE-9283
> URL: https://issues.apache.org/jira/browse/IGNITE-9283
> Project: Ignite
>  Issue Type: Sub-task
>  Components: ml
>Reporter: Aleksey Zinoviev
>Assignee: Ilya Lantukh
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Add [https://en.wikipedia.org/wiki/Discrete_cosine_transform]
> Please look at the MinMaxScaler or Normalization packages in the 
> preprocessing package.
> Add the following classes if required:
> 1) Preprocessor
> 2) Trainer
> 3) a custom PartitionData, if shuffling is a step of the algorithm
>  
> Requirements for successful PR:
>  # PartitionedDataset usage
>  # Trainer-Model paradigm support
>  # Tests for Model and for Trainer (and other stuff)
>  # Example of usage with a small but famous dataset like IRIS, Titanic or 
> House Prices
>  # Javadocs/codestyle according to the guidelines
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (IGNITE-9283) [ML] Add Discrete Cosine preprocessor

2019-07-18 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16887941#comment-16887941
 ] 

Ilya Lantukh commented on IGNITE-9283:
--

[~zaleslaw], I will prepare a PR in a few days.

> [ML] Add Discrete Cosine preprocessor
> -
>
> Key: IGNITE-9283
> URL: https://issues.apache.org/jira/browse/IGNITE-9283
> Project: Ignite
>  Issue Type: Sub-task
>  Components: ml
>Reporter: Aleksey Zinoviev
>Assignee: Ilya Lantukh
>Priority: Major
>
> Add [https://en.wikipedia.org/wiki/Discrete_cosine_transform]
> Please look at the MinMaxScaler or Normalization packages in the 
> preprocessing package.
> Add the following classes if required:
> 1) Preprocessor
> 2) Trainer
> 3) a custom PartitionData, if shuffling is a step of the algorithm
>  
> Requirements for successful PR:
>  # PartitionedDataset usage
>  # Trainer-Model paradigm support
>  # Tests for Model and for Trainer (and other stuff)
>  # Example of usage with a small but famous dataset like IRIS, Titanic or 
> House Prices
>  # Javadocs/codestyle according to the guidelines
>  
>  





[jira] [Commented] (IGNITE-8828) Detecting and stopping unresponsive nodes during Partition Map Exchange

2019-03-29 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16805255#comment-16805255
 ] 

Ilya Lantukh commented on IGNITE-8828:
--

Hi [~agoncharuk],

Thanks for the review!

I don't think I will be able to address your remarks in the near future. 
Feel free to take over this ticket.

> Detecting and stopping unresponsive nodes during Partition Map Exchange
> ---
>
> Key: IGNITE-8828
> URL: https://issues.apache.org/jira/browse/IGNITE-8828
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Reporter: Sergey Chugunov
>Assignee: Ilya Lantukh
>Priority: Major
>  Labels: iep-25
> Fix For: 2.8
>
>   Original Estimate: 264h
>  Remaining Estimate: 264h
>
> During the PME process the coordinator (1) gathers local partition maps from 
> all nodes and (2) sends the calculated full partition map back to all nodes 
> in the topology.
> However, if one or more nodes fail to send local information on step 1 for 
> any reason, the PME process hangs, blocking all operations. The only solution 
> is to manually identify and stop the nodes which failed to send info to the 
> coordinator.
> This should be done by the coordinator itself: in case it doesn't receive 
> local partition maps from some nodes in time, it should check that stopping 
> these nodes won't lead to data loss and then stop them forcibly.
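The safety check described above can be sketched as a small, self-contained routine (hypothetical names, not Ignite's actual API): the coordinator collects the set of nodes that responded with their local partition maps, and considers the silent ones stoppable only if every partition they own still has a copy on a responsive node.

```java
import java.util.*;

// Hypothetical sketch: decide which unresponsive nodes are safe to stop
// during an exchange without losing data. "owners" maps each node to the
// partitions it holds; "responded" is the set of nodes whose single
// messages reached the coordinator in time.
public class UnresponsiveNodeCheck {
    /** Returns silent nodes safe to stop, or an empty set if stopping any would lose data. */
    static Set<String> nodesSafeToStop(Map<String, Set<Integer>> owners, Set<String> responded) {
        Set<String> silent = new HashSet<>(owners.keySet());
        silent.removeAll(responded);

        // Partitions still covered by the nodes that did respond.
        Set<Integer> covered = new HashSet<>();
        for (String node : responded) {
            Set<Integer> parts = owners.get(node);
            if (parts != null)
                covered.addAll(parts);
        }

        // If a silent node owns a partition with no copy elsewhere, stopping it loses data.
        for (String node : silent)
            if (!covered.containsAll(owners.get(node)))
                return Collections.emptySet();

        return silent;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> owners = new HashMap<>();
        owners.put("A", new HashSet<>(Arrays.asList(0, 1)));
        owners.put("B", new HashSet<>(Arrays.asList(1, 2)));
        owners.put("C", new HashSet<>(Arrays.asList(0, 2)));

        // B never sent its map; partitions 1 and 2 still have copies on A and C.
        System.out.println(nodesSafeToStop(owners, new HashSet<>(Arrays.asList("A", "C")))); // prints [B]
    }
}
```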



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (IGNITE-8828) Detecting and stopping unresponsive nodes during Partition Map Exchange

2019-03-29 Thread Ilya Lantukh (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Lantukh reassigned IGNITE-8828:


Assignee: (was: Ilya Lantukh)

> Detecting and stopping unresponsive nodes during Partition Map Exchange
> ---
>
> Key: IGNITE-8828
> URL: https://issues.apache.org/jira/browse/IGNITE-8828
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Reporter: Sergey Chugunov
>Priority: Major
>  Labels: iep-25
> Fix For: 2.8
>
>   Original Estimate: 264h
>  Remaining Estimate: 264h
>
> During the PME process the coordinator (1) gathers local partition maps from 
> all nodes and (2) sends the calculated full partition map back to all nodes 
> in the topology.
> However, if one or more nodes fail to send local information on step 1 for 
> any reason, the PME process hangs, blocking all operations. The only solution 
> is to manually identify and stop the nodes which failed to send info to the 
> coordinator.
> This should be done by the coordinator itself: in case it doesn't receive 
> local partition maps from some nodes in time, it should check that stopping 
> these nodes won't lead to data loss and then stop them forcibly.





[jira] [Commented] (IGNITE-9913) Prevent data updates blocking in case of backup BLT server node leave

2019-03-28 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16803972#comment-16803972
 ] 

Ilya Lantukh commented on IGNITE-9913:
--

Hi [~NSAmelchev],

 

Thanks for the contribution! I've added some comments on your PR on github.

In general, I think that what you have done doesn't match the ticket's 
description. PME should definitely be faster now, because you removed the 
distributed exchange phase from it. But cache operations might still be 
blocked until PME is finished on all nodes. For large clusters it might take a 
significant amount of time for the NODE_LEFT event to reach all nodes, and 
during that time some nodes will have topVer == X, while others will have it 
== X-1. If a cache operation involves nodes from both subsets, it will be 
blocked until the node with the lower version updates to the higher one.

 

[~ivan.glukos], do you agree with that?

> Prevent data updates blocking in case of backup BLT server node leave
> -
>
> Key: IGNITE-9913
> URL: https://issues.apache.org/jira/browse/IGNITE-9913
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Reporter: Ivan Rakov
>Assignee: Amelchev Nikita
>Priority: Major
> Fix For: 2.8
>
> Attachments: 9913_yardstick.png, master_yardstick.png
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Ignite cluster performs a distributed partition map exchange when any server 
> node leaves or joins the topology.
> Distributed PME blocks all updates and may take a long time. If all 
> partitions are assigned according to the baseline topology and a server node 
> leaves, there's no actual need to perform a distributed PME: every cluster 
> node is able to recalculate the new affinity assignments and partition states 
> locally. If we implement such a lightweight PME and handle mapping and lock 
> requests on the new topology version correctly, updates won't be stopped 
> (except updates of partitions that lost their primary copy).
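The "recalculate locally" idea can be illustrated with a deterministic assignment function. This sketch uses rendezvous (highest-random-weight) hashing, which is not Ignite's actual affinity function; the point is only that every node computes the same result from the same inputs, so no distributed exchange round is needed to agree on new primaries.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch, not Ignite code: a deterministic primary-selection
// function that any node can evaluate locally after a baseline node leaves.
public class LocalAffinityRecalc {
    /** Deterministically picks a primary for a partition among alive baseline nodes. */
    static String primary(int partition, List<String> aliveBaselineNodes) {
        String best = null;
        long bestWeight = Long.MIN_VALUE;
        for (String node : aliveBaselineNodes) {
            // Rendezvous hashing: the highest mixed hash of (node, partition) wins.
            long w = (node + "#" + partition).hashCode() * 2654435761L;
            if (best == null || w > bestWeight) {
                best = node;
                bestWeight = w;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> before = Arrays.asList("n1", "n2", "n3");
        List<String> after = Arrays.asList("n1", "n3"); // n2 left the topology

        // Every node computes the same reassignment without any messaging.
        for (int p = 0; p < 4; p++)
            System.out.println("p" + p + ": " + primary(p, before) + " -> " + primary(p, after));
    }
}
```

A nice property of this scheme is that only partitions whose primary was the departed node get reassigned; all others keep their primary, which matches the goal of not disturbing unaffected updates.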





[jira] [Assigned] (IGNITE-8877) PartitionsExchangeOnDiscoveryHistoryOverflowTest.testDynamicCacheCreation leads to OutOfMemoryError

2019-03-27 Thread Ilya Lantukh (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-8877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Lantukh reassigned IGNITE-8877:


Assignee: (was: Ilya Lantukh)

>  PartitionsExchangeOnDiscoveryHistoryOverflowTest.testDynamicCacheCreation 
> leads to OutOfMemoryError
> 
>
> Key: IGNITE-8877
> URL: https://issues.apache.org/jira/browse/IGNITE-8877
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ilya Lantukh
>Priority: Major
>
> TC history: 
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=7685795678405642188&branch=%3Cdefault%3E&tab=testDetails





[jira] [Assigned] (IGNITE-4003) Slow or faulty client can stall the whole cluster.

2019-03-27 Thread Ilya Lantukh (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Lantukh reassigned IGNITE-4003:


Assignee: (was: Ilya Lantukh)

> Slow or faulty client can stall the whole cluster.
> --
>
> Key: IGNITE-4003
> URL: https://issues.apache.org/jira/browse/IGNITE-4003
> Project: Ignite
>  Issue Type: Bug
>  Components: cache, general
>Affects Versions: 1.7
>Reporter: Vladimir Ozerov
>Priority: Critical
>
> Steps to reproduce:
> 1) Start two server nodes and add some data to the cache.
> 2) Start a client from a Docker subnet which is not visible from the outside. 
> The client will join the cluster.
> 3) Try to put something into the cache or start another node to force 
> rebalance.
> The cluster is stuck at this moment. Root cause: servers are constantly 
> trying to establish an outgoing connection to the client, but fail as the 
> Docker subnet is not visible from the outside. This may stop virtually all 
> cluster operations.
> Typical thread dump:
> {code}
> org.apache.ignite.IgniteCheckedException: Failed to send message (node may 
> have left the grid or TCP connection cannot be established due to firewall 
> issues) [node=TcpDiscoveryNode [id=a15d74c2-1ec2-4349-9640-aeacd70d8714, 
> addrs=[127.0.0.1, 172.17.0.6], sockAddrs=[/127.0.0.1:0, /127.0.0.1:0, 
> /172.17.0.6:0], discPort=0, order=7241, intOrder=3707, 
> lastExchangeTime=1474096941045, loc=false, ver=1.5.23#20160526-sha1:259146da, 
> isClient=true], topic=T4 [topic=TOPIC_CACHE, 
> id1=949732fd-1360-3a58-8d9e-0ff6ea6182cc, 
> id2=a15d74c2-1ec2-4349-9640-aeacd70d8714, id3=2], msg=GridContinuousMessage 
> [type=MSG_EVT_NOTIFICATION, routineId=7e13c48e-6933-48b2-9f15-8d92007930db, 
> data=null, futId=null], policy=2]
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1129)
>  [ignite-core-1.5.23.jar:1.5.23]
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager.sendOrderedMessage(GridIoManager.java:1347)
>  [ignite-core-1.5.23.jar:1.5.23]
>   at 
> org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1227)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>   at 
> org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1198)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>   at 
> org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1180)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>   at 
> org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendNotification(GridContinuousProcessor.java:841)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>   at 
> org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.addNotification(GridContinuousProcessor.java:800)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>   at 
> org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.onEntryUpdate(CacheContinuousQueryHandler.java:787)
>  [ignite-core-1.5.23.jar:1.5.23]
>   at 
> org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.access$700(CacheContinuousQueryHandler.java:91)
>  [ignite-core-1.5.23.jar:1.5.23]
>   at 
> org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler$1.onEntryUpdated(CacheContinuousQueryHandler.java:412)
>  [ignite-core-1.5.23.jar:1.5.23]
>   at 
> org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.onEntryUpdated(CacheContinuousQueryManager.java:343)
>  [ignite-core-1.5.23.jar:1.5.23]
>   at 
> org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.onEntryUpdated(CacheContinuousQueryManager.java:250)
>  [ignite-core-1.5.23.jar:1.5.23]
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheMapEntry.initialValue(GridCacheMapEntry.java:3476)
>  [ignite-core-1.5.23.jar:1.5.23]
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtForceKeysFuture$MiniFuture.onResult(GridDhtForceKeysFuture.java:548)
>  [ignite-core-1.5.23.jar:1.5.23]
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtForceKeysFuture.onResult(GridDhtForceKeysFuture.java:207)
>  [ignite-core-1.5.23.jar:1.5.23]
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.processForceKeyResponse(GridDhtPreloader.java:636)
>  [ignite-core-1.5.23.jar:1.5.23]
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.access$1000(GridDhtPreloader.java:81)
>  [ignite-core-1.5.23.jar:1.5.23]
>   at 
> org.apache.ignite.internal.processors.cache.distribute

[jira] [Commented] (IGNITE-11457) Prewarming of page memory after node restart

2019-03-08 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-11457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16787867#comment-16787867
 ] 

Ilya Lantukh commented on IGNITE-11457:
---

[~cyberdemon],

Thanks for the contribution!

To me it makes more sense to pre-warm page memory based on physical records in 
the last N WAL segments, without any LoadedPageTrackers. What do you think?

> Prewarming of page memory after node restart
> 
>
> Key: IGNITE-11457
> URL: https://issues.apache.org/jira/browse/IGNITE-11457
> Project: Ignite
>  Issue Type: New Feature
>Reporter: Dmitriy Sorokin
>Assignee: Dmitriy Sorokin
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The essence of the page memory prewarming feature is that, after a restart, 
> the node loads into memory those pages that were loaded before the last 
> shutdown. This approach makes it possible to get a fully prewarmed node, or 
> even cluster, right after it has been restarted.
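The mechanism can be sketched in a few lines (names here are illustrative, not Ignite's actual page-store or LoadedPageTracker API): record the IDs of in-memory pages at shutdown, then load exactly those pages back from the persistent store on startup.

```java
import java.util.*;

// Hypothetical sketch of the prewarming idea: maps stand in for page memory
// and the on-disk page store; a real implementation would persist the ID
// list and read pages from disk files.
public class PrewarmSketch {
    /** Snapshot of hot page IDs, taken at shutdown (persisted to disk in a real impl). */
    static List<Long> dumpLoadedPageIds(Map<Long, byte[]> pageMemory) {
        return new ArrayList<>(pageMemory.keySet());
    }

    /** Reloads the previously hot pages on startup; returns how many were restored. */
    static int prewarm(List<Long> savedIds, Map<Long, byte[]> pageStore, Map<Long, byte[]> pageMemory) {
        int restored = 0;
        for (long id : savedIds) {
            byte[] page = pageStore.get(id); // disk read in a real implementation
            if (page != null) {
                pageMemory.put(id, page);
                restored++;
            }
        }
        return restored;
    }

    public static void main(String[] args) {
        Map<Long, byte[]> store = new HashMap<>();
        store.put(1L, new byte[]{1});
        store.put(2L, new byte[]{2});
        store.put(3L, new byte[]{3});

        Map<Long, byte[]> memBeforeShutdown = new HashMap<>();
        memBeforeShutdown.put(1L, store.get(1L)); // only pages 1 and 3 were hot
        memBeforeShutdown.put(3L, store.get(3L));

        List<Long> saved = dumpLoadedPageIds(memBeforeShutdown);

        Map<Long, byte[]> memAfterRestart = new HashMap<>();
        System.out.println("restored " + prewarm(saved, store, memAfterRestart) + " pages"); // prints restored 2 pages
    }
}
```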





[jira] [Commented] (IGNITE-9739) Critical exception in transaction processing in case we have nodes out of baseline and non-persisted cache

2019-01-16 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16744028#comment-16744028
 ] 

Ilya Lantukh commented on IGNITE-9739:
--

Looks good.

> Critical exception in transaction processing in case we have nodes out of 
> baseline and non-persisted cache
> --
>
> Key: IGNITE-9739
> URL: https://issues.apache.org/jira/browse/IGNITE-9739
> Project: Ignite
>  Issue Type: Bug
>  Components: persistence
>Reporter: Sergey Kosarev
>Assignee: Sergey Kosarev
>Priority: Major
>
> Activation finished
> {code:java}
> 2018-09-20 20:47:05.169 [INFO 
> ][sys-#307%DPL_GRID%DplGridNodeName%][o.g.g.i.p.c.d.GridSnapshotAwareClusterStateProcessorImpl]
>  Successfully performed final activation steps 
> [nodeId=382437eb-fd8a-4f92-acd5-d9ea562c8557, client=false, 
> topVer=AffinityTopologyVersion [topVer=160, minorTopVer=1]]
> {code}
> But we have nodes that are not in the baseline:
> {code:java}
> 2018-09-20 20:45:36.116 [INFO 
> ][sys-#305%DPL_GRID%DplGridNodeName%][o.g.g.i.p.c.d.GridSnapshotAwareClusterStateProcessorImpl]
>  Local node is not included in Baseline Topology and will not be used for 
> persistent data storage. Use control.(sh|bat) script or IgniteCluster 
> interface to include the node to Baseline Topology.
> {code}
> And we have a cache (869481129) in a data region with persistenceEnabled=false:
> {code:java}
> 2018-09-20 20:49:01.825 [INFO 
> ][exchange-worker-#154%DPL_GRID%DplGridNodeName%][o.a.i.i.p.cache.GridCacheProcessor]
>  Started cache [name=DPL_PUBLISHED_CACHES_REGISTRY$, *id=869481129*, group=SY
> STEM_CACHEGROUP_PUBLISHED_REGISTRY, memoryPolicyName=not-persisted, 
> mode=PARTITIONED, atomicity=TRANSACTIONAL, backups=3]
> {code}
> A transaction on this cache (869481129)
> {code:java}
> 869481129{code}
> leads to a critical error, causing node failures via the failure handler:
> {code:java}
> 2018-09-20 20:50:24.275 
> [ERROR][sys-stripe-41-#42%DPL_GRID%DplGridNodeName%][o.a.i.i.p.cache.GridCacheIoManager]
>  Failed processing message [senderId=62e986f0-62b5-4ec8-8cc7-27b74d345235, 
> msg=GridDhtTxPrepareRequest [nearNodeId=814af7c4-2de5-4511-b1ea-065b91eaa774, 
> futId=520e308f561-255fdea5-a996-4102-a120-afa380c54570, miniId=1, 
> topVer=AffinityTopologyVersion [topVer=160, minorTopVer=2], 
> invalidateNearEntries={}, nearWrites=null, owned=null, 
> nearXidVer=GridCacheVersion [topVer=148944365, order=1537511036821, 
> nodeOrder=132], subjId=814af7c4-2de5-4511-b1ea-065b91eaa774, taskNameHash=0, 
> preloadKeys=null, skipCompletedVers=false, 
> super=GridDistributedTxPrepareRequest [threadId=58, concurrency=PESSIMISTIC, 
> isolation=READ_COMMITTED, writeVer=GridCacheVersion [topVer=148944365, 
> order=1537511036824, nodeOrder=7], timeout=299970, reads=null, 
> writes=ArrayList [
> IgniteTxEntry [key=KeyCacheObjectImpl [part=27254, 
> val=com.sbt.api.entities.out.IPublishedDocType, hasValBytes=true], 
> *cacheId=869481129*,
> txKey=IgniteTxKey [key=KeyCacheObjectImpl [part=27254, 
> val=com.sbt.api.entities.out.IPublishedDocType, hasValBytes=true], 
> *cacheId=869481129*], val=[op=CREATE, 
> val=com.sbt.dpl.gridgain.PublishedRegistry$PublishedCacheTuple 
> [idHash=811765531, hash=1522508040, 
> cacheName=com.sbt.gbk.entities.DocType_DPL_union-module,indexes=ArrayList 
> {com.sbt.dpl.gridgain.newModel.base.indexes.PublishedIndexType
> [idHash=1583970836, hash=363194492, isSoftReference=false, 
> unselectiveBuckets=4096, fieldNames=ArrayList 
> \{isDeleted},moduleName=union-module
> , cachedUnselectives=1, selectors=ArrayList {isDeleted}, 
> exceptUnselectives=false, primitiveCollection=false, isVersioned=false, 
> isComposite=false, isSystemTypeBelongs=false,
> name=com.sbt.gbk.entities.DocType_DPL_isDeleted, isIndexedCollection=false, 
> isGlobal=false, maxSelective=1000], 
> com.sbt.dpl.gridgain.newModel.base.indexes.PublishedIndexType
> [idHash=2060926101, hash=1983794578, isSoftReference=false, 
> unselectiveBuckets=4096, fieldNames=ArrayList ,moduleName=union-module, 
> cachedUnselectives=1, selectors=ArrayList, exceptUnselectives=false, 
> primitiveCollection=false, isVersioned=false, isComposite=false, 
> isSystemTypeBelongs=false, name=com.sbt.gbk.entities.DocType_DPL_code, 
> isIndexedCollection=false, isGlobal=true, maxSelective=1000]
> , com.sbt.dpl.gridgain.newModel.base.indexes.PublishedIndexType
> [idHash=1821682714, hash=-1245813786, isSoftReference=false, 
> unselectiveBuckets=4096, fieldNames=ArrayList {globalId},
> moduleName=union-module, cachedUnselectives=1, selectors=ArrayList 
> {globalId}, exceptUnselectives=false, primitiveCollection=false, 
> isVersioned=false, isComposite=false, isSystemTypeBelongs=false,
> name=com.sbt.gbk.entities.DocType_DPL_globalId, isIndexedCollection=fa

[jira] [Assigned] (IGNITE-10898) Exchange coordinator failover breaks in some cases when node filter is used

2019-01-11 Thread Ilya Lantukh (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Lantukh reassigned IGNITE-10898:
-

Assignee: Ilya Lantukh

> Exchange coordinator failover breaks in some cases when node filter is used
> ---
>
> Key: IGNITE-10898
> URL: https://issues.apache.org/jira/browse/IGNITE-10898
> Project: Ignite
>  Issue Type: Bug
>Reporter: Alexey Goncharuk
>Assignee: Ilya Lantukh
>Priority: Critical
> Fix For: 2.8
>
> Attachments: NodeWithFilterRestartTest.java
>
>
> Currently, if a node does not pass a cache's node filter, we do not store 
> that cache's affinity on the node unless the node is coordinator. This, 
> however, may fail in the following scenario:
> 1) A node passing the node filter joins the cluster
> 2) During the join the coordinator fails, and a new coordinator is selected 
> for which the previous exchange is completed
> 3) The next coordinator attempts to fetch the affinity, and the joining node 
> resends its partitions single message, but there are two problems here. 
> First, exchange fast-reply does not wait for the new affinity initialization, 
> which results in {{IllegalStateException}}. Second, such an attempt to fetch 
> affinity may lead either to a deadlock or to incorrectly fetched affinity 
> (basically, the coordinator must be in consensus with the other nodes passing 
> the node filter).
> The attached test reproduces the issue.
> I suggest always calculating and keeping affinity on all nodes, even ones 
> not passing the filter. In this case, there will be no need to fetch and 
> recalculate affinity ({{initCoordinatorCaches}} will go away).





[jira] [Commented] (IGNITE-10290) Map.Entry interface for key cache may lead to incorrect hash code calculation

2018-12-24 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16728404#comment-16728404
 ] 

Ilya Lantukh commented on IGNITE-10290:
---

[~DmitriyGovorukhin], thanks for the contribution!

Please add test that reproduces this problem to the pull request.

> Map.Entry interface for key cache may lead to incorrect hash code calculation
> -
>
> Key: IGNITE-10290
> URL: https://issues.apache.org/jira/browse/IGNITE-10290
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitriy Govorukhin
>Assignee: Dmitriy Govorukhin
>Priority: Major
> Fix For: 2.8
>
> Attachments: Reproducer.java
>
>
>  If the Map.Entry interface is used for a key, we may try to find (key, 
> value) in the store with an incorrectly calculated hash code for the binary 
> representation, which leads to a null result.
> The problem is that 
> GridPartitionedSingleGetFuture#localGet() and 
> GridPartitionedGetFuture#localGet() do not execute prepareForCache before 
> reading cacheDataRow from the row store.
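The failure mode can be shown in miniature (an illustrative sketch, not Ignite's binary marshalling): an entry is indexed under a hash computed from the key's binary form, and a lookup that derives the hash from a different representation of the same key, e.g. from a Map.Entry object's own hashCode(), misses and yields null.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the bug pattern: a store indexed by a hash of the
// key's binary form returns null when lookup computes the hash differently.
public class HashMismatchSketch {
    /** Hash over the key's serialized bytes, as stored at write time. */
    static int binaryHash(byte[] keyBytes) {
        int h = 1;
        for (byte b : keyBytes)
            h = 31 * h + b;
        return h;
    }

    public static void main(String[] args) {
        Map<Integer, String> store = new HashMap<>();

        byte[] keyBytes = {1, 2, 3};                   // pretend-serialized form of the key
        store.put(binaryHash(keyBytes), "value");      // written under the binary hash

        // Lookup uses the deserialized Map.Entry key's own hashCode() instead
        // of the binary hash -- a different number, so the store misses.
        SimpleEntry<Integer, Integer> key = new SimpleEntry<>(1, 2);
        System.out.println(store.get(key.hashCode())); // prints null (the hashes disagree)
    }
}
```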





[jira] [Commented] (IGNITE-9149) Get rid of logging remaining supplier nodes rebalance time

2018-12-24 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16728396#comment-16728396
 ] 

Ilya Lantukh commented on IGNITE-9149:
--

Thanks for the contribution! Changes look good.

> Get rid of logging remaining supplier nodes rebalance time
> --
>
> Key: IGNITE-9149
> URL: https://issues.apache.org/jira/browse/IGNITE-9149
> Project: Ignite
>  Issue Type: Task
>Reporter: Maxim Muzafarov
>Assignee: PetrovMikhail
>Priority: Minor
>  Labels: rebalance
>
> Logging rebalance execution time in a section for each supplier node makes 
> no sense and provides no helpful info for analyzing logs. It also 
> overcomplicates {{GridDhtPartitionDemander}}.
> I'm suggesting to remove it by simplifying {{Map<UUID, T2<Long, 
> IgniteDhtDemandedPartitionsMap>>}} to {{Map<UUID, 
> IgniteDhtDemandedPartitionsMap>}}.
> {code:java}
> /** Remaining. T2: startTime, partitions */
> private final Map<UUID, T2<Long, IgniteDhtDemandedPartitionsMap>> remaining = 
> new HashMap<>();
> {code}
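The proposed simplification could look roughly like this (a sketch with Set<Integer> standing in for IgniteDhtDemandedPartitionsMap): a single start time for the whole rebalance instead of a T2 pair per supplier, so finishing is still detectable without per-supplier timing.

```java
import java.util.*;

// Illustrative sketch of the simplified "remaining" bookkeeping: track only
// the demanded partitions per supplier, plus one start time for the whole
// rebalance, instead of a (startTime, partitions) pair per supplier.
public class RemainingMapSketch {
    // Before: Map<UUID, T2<Long, IgniteDhtDemandedPartitionsMap>>
    // After:  Map<UUID, Set<Integer>> plus a single rebalanceStartTime field.
    final Map<UUID, Set<Integer>> remaining = new HashMap<>();
    long rebalanceStartTime;

    /** Registers partitions demanded from a supplier node. */
    void onDemand(UUID supplier, Set<Integer> parts) {
        if (remaining.isEmpty())
            rebalanceStartTime = System.currentTimeMillis();
        remaining.put(supplier, new HashSet<>(parts));
    }

    /** Marks a partition as rebalanced; returns true when the whole rebalance is done. */
    boolean onPartitionDone(UUID supplier, int part) {
        Set<Integer> parts = remaining.get(supplier);
        if (parts != null && parts.remove(part) && parts.isEmpty())
            remaining.remove(supplier);
        return remaining.isEmpty();
    }

    public static void main(String[] args) {
        RemainingMapSketch r = new RemainingMapSketch();
        UUID supplier = UUID.randomUUID();
        r.onDemand(supplier, new HashSet<>(Arrays.asList(1, 2)));
        r.onPartitionDone(supplier, 1);
        System.out.println("rebalance finished: " + r.onPartitionDone(supplier, 2)); // prints rebalance finished: true
    }
}
```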





[jira] [Commented] (IGNITE-9303) PageSnapshot can contain wrong pageId tag when not dirty page is recycling

2018-12-20 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16725845#comment-16725845
 ] 

Ilya Lantukh commented on IGNITE-9303:
--

[~agoncharuk], thanks for the review.

I've addressed all your remarks except {restore=true} flag. While it isn't 
necessary, it is semantically correct. 

> PageSnapshot can contain wrong pageId tag when not dirty page is recycling
> --
>
> Key: IGNITE-9303
> URL: https://issues.apache.org/jira/browse/IGNITE-9303
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.6
>Reporter: Aleksey Plekhanov
>Assignee: Ilya Lantukh
>Priority: Major
> Fix For: 2.8
>
>
> When a page is being recycled (for example in {{BPlusTree.Remove#freePage()}} -> 
> {{DataStructure#recyclePage()}}), the tag of {{pageId}} is modified, but the 
> original {{pageId}} is passed to the {{writeUnlock()}} method, and this passed 
> {{pageId}} is stored in the PageSnapshot WAL record.
> This bug may lead to errors in WAL applying during crash recovery.
> Reproducer (ignite-indexing module must be in classpath):
> {code:java}
> public class WalFailReproducer extends AbstractWalDeltaConsistencyTest {
>     @Override protected boolean checkPagesOnCheckpoint() {
>         return true;
>     }
>
>     public final void testPutRemoveCacheDestroy() throws Exception {
>         CacheConfiguration<Integer, Integer> ccfg = new CacheConfiguration<>("cache0");
>         ccfg.setIndexedTypes(Integer.class, Integer.class);
>
>         IgniteEx ignite = startGrid(0);
>         ignite.cluster().active(true);
>
>         IgniteCache<Integer, Integer> cache0 = ignite.getOrCreateCache(ccfg);
>
>         for (int i = 0; i < 5_000; i++)
>             cache0.put(i, i);
>
>         forceCheckpoint();
>
>         for (int i = 1_000; i < 4_000; i++)
>             cache0.remove(i);
>
>         forceCheckpoint();
>
>         stopAllGrids();
>     }
> }
> {code}





[jira] [Comment Edited] (IGNITE-9303) PageSnapshot can contain wrong pageId tag when not dirty page is recycling

2018-12-20 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16725845#comment-16725845
 ] 

Ilya Lantukh edited comment on IGNITE-9303 at 12/20/18 1:44 PM:


[~agoncharuk], thanks for the review.

I've addressed all your remarks except {{restore=true}} flag. While it isn't 
necessary, it is semantically correct.


was (Author: ilantukh):
[~agoncharuk], thanks for the review.

I've addressed all your remarks except {restore=true} flag. While it isn't 
necessary, it is semantically correct. 

> PageSnapshot can contain wrong pageId tag when not dirty page is recycling
> --
>
> Key: IGNITE-9303
> URL: https://issues.apache.org/jira/browse/IGNITE-9303
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.6
>Reporter: Aleksey Plekhanov
>Assignee: Ilya Lantukh
>Priority: Major
> Fix For: 2.8
>
>
> When a page is being recycled (for example in {{BPlusTree.Remove#freePage()}} -> 
> {{DataStructure#recyclePage()}}), the tag of {{pageId}} is modified, but the 
> original {{pageId}} is passed to the {{writeUnlock()}} method, and this passed 
> {{pageId}} is stored in the PageSnapshot WAL record.
> This bug may lead to errors in WAL applying during crash recovery.
> Reproducer (ignite-indexing module must be in classpath):
> {code:java}
> public class WalFailReproducer extends AbstractWalDeltaConsistencyTest {
>     @Override protected boolean checkPagesOnCheckpoint() {
>         return true;
>     }
>
>     public final void testPutRemoveCacheDestroy() throws Exception {
>         CacheConfiguration<Integer, Integer> ccfg = new CacheConfiguration<>("cache0");
>         ccfg.setIndexedTypes(Integer.class, Integer.class);
>
>         IgniteEx ignite = startGrid(0);
>         ignite.cluster().active(true);
>
>         IgniteCache<Integer, Integer> cache0 = ignite.getOrCreateCache(ccfg);
>
>         for (int i = 0; i < 5_000; i++)
>             cache0.put(i, i);
>
>         forceCheckpoint();
>
>         for (int i = 1_000; i < 4_000; i++)
>             cache0.remove(i);
>
>         forceCheckpoint();
>
>         stopAllGrids();
>     }
> }
> {code}





[jira] [Commented] (IGNITE-10058) resetLostPartitions() leaves an additional copy of a partition in the cluster

2018-12-19 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16724935#comment-16724935
 ] 

Ilya Lantukh commented on IGNITE-10058:
---

Hi [~xtern],

I think that GridDhtPartitionTopologyImpl.resetLostPartitions(...) shouldn't 
be called on non-coordinator nodes at all. They should simply send their local 
partition states and counters, just as in every other PME type. It is the 
coordinator's job to change those states after it receives all single messages.

> resetLostPartitions() leaves an additional copy of a partition in the cluster
> -
>
> Key: IGNITE-10058
> URL: https://issues.apache.org/jira/browse/IGNITE-10058
> Project: Ignite
>  Issue Type: Bug
>Reporter: Stanislav Lukyanov
>Assignee: Pavel Pereslegin
>Priority: Major
> Fix For: 2.8
>
>
> If there are several copies of a LOST partition, resetLostPartitions() will 
> leave all of them in the cluster as OWNING.
> Scenario:
> 1) Start 4 nodes, a cache with backups=0 and READ_WRITE_SAFE, fill the cache
> 2) Stop one node - some partitions are recreated on the remaining nodes as 
> LOST
> 3) Start one node - the LOST partitions are being rebalanced to the new node 
> from the existing ones
> 4) Wait for rebalance to complete
> 5) Call resetLostPartitions()
> After that the partitions that were LOST become OWNING on all nodes that had 
> them. Eviction of these partitions doesn't start.
> Need to correctly evict additional copies of LOST partitions either after 
> rebalance on step 4 or after resetLostPartitions() call on step 5.
> Current resetLostPartitions() implementation does call checkEvictions(), but 
> the ready affinity assignment contains several nodes per partition for some 
> reason.





[jira] [Assigned] (IGNITE-10589) Multiple server node failure after a client node stopping

2018-12-07 Thread Ilya Lantukh (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Lantukh reassigned IGNITE-10589:
-

Assignee: Ilya Lantukh

> Multiple server node failure after a client node stopping
> -
>
> Key: IGNITE-10589
> URL: https://issues.apache.org/jira/browse/IGNITE-10589
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Kosarev
>Assignee: Ilya Lantukh
>Priority: Critical
> Attachments: 16_02.tar
>
>
> After stopping a client,
> we see the topology change and PME finish on the coordinator,
> but soon after, other nodes still don't see the new topology and instead hit 
> a critical error, resulting in node failures.
> crd log
> {code}
> 2018-12-06 15:55:23.660 [WARN 
> ][disco-event-worker-#159%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
>  Node FAILED: ZookeeperClusterNode [id=979f03db-f858-44f6-8646-12034dfd5c93, 
> addrs=[10.116.206.1], order=129, loc=false, client=true]
> 2018-12-06 15:55:23.660 [INFO 
> ][disco-event-worker-#159%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
>  Topology snapshot [ver=162, servers=128, clients=0, CPUs=7168, 
> offheap=14.0GB, heap=4000.0GB]
> 2018-12-06 15:55:23.660 [INFO 
> ][disco-event-worker-#159%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
>^-- Node [id=44D27930-80E5-4EB7-B377-8B07C02C2033, clusterState=ACTIVE]
> 2018-12-06 15:55:23.660 [INFO 
> ][zk-DPL_GRID%DplGridNodeName-EventThread][o.a.i.s.d.z.i.ZookeeperDiscoveryImpl]
>  Process alive nodes change [alives=128]
> 2018-12-06 15:55:23.661 [INFO 
> ][disco-event-worker-#159%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
>^-- Baseline [id=0, size=128, online=128, offline=0]
> 2018-12-06 15:55:23.661 [INFO 
> ][disco-event-worker-#159%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
>  Data Regions Configured:
> 2018-12-06 15:55:23.661 [INFO 
> ][disco-event-worker-#159%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
>^-- dpl_mem_plc [initSize=256.0 MiB, maxSize=556.6 GiB, 
> persistenceEnabled=true]
> 2018-12-06 15:55:23.661 [INFO 
> ][disco-event-worker-#159%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDiscoveryManager]
>^-- not-persisted [initSize=256.0 MiB, maxSize=556.6 GiB, 
> persistenceEnabled=false]
> 2018-12-06 15:55:23.670 
> [DEBUG][sys-#564%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.l.ExchangeLatchManager]
>  Process node left 979f03db-f858-44f6-8646-12034dfd5c93
> 2018-12-06 15:55:23.670 [INFO 
> ][exchange-worker-#160%DPL_GRID%DplGridNodeName%][o.a.ignite.internal.exchange.time]
>  Started exchange init [topVer=AffinityTopologyVersion [topVer=162, 
> minorTopVer=0], crd=true, evt=NODE_FAILED, 
> evtNode=979f03db-f858-44f6-8646-12034dfd5c93, customEvt=null, allowMerge=true]
> 2018-12-06 15:55:23.712 [INFO 
> ][exchange-worker-#160%DPL_GRID%DplGridNodeName%][o.a.ignite.internal.exchange.time]
>  Finished exchange init [topVer=AffinityTopologyVersion [topVer=162, 
> minorTopVer=0], crd=true]
> 2018-12-06 15:55:23.699 [INFO 
> ][exchange-worker-#160%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture]
>  Finish exchange future [startVer=AffinityTopologyVersion [topVer=162, 
> minorTopVer=0], resVer=AffinityTopologyVersion [topVer=162, minorTopVer=0], 
> err=null]
> {code}
> on a node(1) we have critical error(1)
> {code}
> 2018-12-06 15:55:23.727 
> [ERROR][utility-#432%DPL_GRID%DplGridNodeName%][o.a.i.i.p.cache.GridCacheIoManager]
>  Failed processing message [senderId=1e17c56a-5213-4a1b-b94b-4575a95a2c81, 
> msg=GridDhtTxPrepareRequest [nearNodeId=44d27930-80e5-4eb7-b377-8b07c02c2033,
>  futId=1d225238761-05eea259-5c25-4a4b-8469-9dd8980e218c, miniId=105, 
> topVer=AffinityTopologyVersion [topVer=162, minorTopVer=0], 
> invalidateNearEntries={}, nearWrites=null, owned=null, 
> nearXidVer=GridCacheVersion [topVer=155571374, order=1545423626166, nodeOrd
> er=1], subjId=44d27930-80e5-4eb7-b377-8b07c02c2033, taskNameHash=0, 
> preloadKeys=null, skipCompletedVers=false, 
> super=GridDistributedTxPrepareRequest [threadId=1281, 
> concurrency=PESSIMISTIC, isolation=REPEATABLE_READ, writeVer=GridCacheVersion 
> [topVer=15557137
> 4, order=1545423626614, nodeOrder=96], timeout=0, reads=null, 
> writes=ArrayList [IgniteTxEntry [key=KeyCacheObjectImpl [part=65, 
> val=GridServiceAssignmentsKey [name=DPLThreadManager_service], 
> hasValBytes=true], cacheId=-2100569601, txKey=IgniteTxKey [key=KeyCa
> cheObjectImpl [part=65, val=GridServiceAssignmentsKey 
> [name=DPLThreadManager_service], hasValBytes=true], cacheId=-2100569601], 
> val=CacheObjectImpl [val=GridServiceAssignments 
> [nodeId=426a4a51-1af3-4019-9769-4a58d8ece426, topVer=162, 
> cfg=LazyServiceConfigurat
> ion [srvcClsName=com.sbt.dpl.gridgain.thread.DPLThreadManager, svcCls=, 
> 

[jira] [Commented] (IGNITE-10058) resetLostPartitions() leaves an additional copy of a partition in the cluster

2018-12-06 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711600#comment-16711600
 ] 

Ilya Lantukh commented on IGNITE-10058:
---

[~xtern],

Thanks for your efforts and willingness to solve this problem!

Unfortunately, our current implementation of partition loss mechanics has a 
number of complex flaws that result in strange behavior. To solve them we must 
rework, redesign and improve this particular mechanism; adding hacks to other 
pieces of code will just make things worse.

To solve this particular issue, I suggest the following:
1. Deprecate PartitionLossPolicy.READ_WRITE_ALL. If we assume that it's 
possible to modify data in LOST partitions, we should prepare for very weird 
scenarios that are impossible to solve with current architecture.
2. Modify GridDhtPartitionTopologyImpl.resetLostPartitions(...) - it should 
reset update counters to 0 only if there was at least one partition owner at 
the moment the method was called. Also, add special logic for the case when 
all LOST partitions already have update counter 0 - transfer state to OWNING 
only on affinity nodes.
3. Ensure that a resetLostPartitions(...) call always leads to rebalance, and 
that afterwards all non-affinity nodes evict their partition instances.
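The reset rule proposed in point 2 can be sketched roughly as follows. All 
names and structures here are hypothetical simplifications, not the actual 
Ignite API; the real GridDhtPartitionTopologyImpl works on partition maps and 
update counters per topology version.

```java
import java.util.List;

// Hypothetical model of the proposed rule: reset a LOST partition's update
// counter to 0 only if at least one owner existed when resetLostPartitions()
// was called; if every LOST copy already has counter 0, transfer to OWNING
// only on affinity nodes.
enum PartState { OWNING, LOST, MOVING }

class PartCopy {
    final String nodeId;
    PartState state;
    long updateCntr;
    final boolean affinityNode;

    PartCopy(String nodeId, PartState state, long cntr, boolean aff) {
        this.nodeId = nodeId;
        this.state = state;
        this.updateCntr = cntr;
        this.affinityNode = aff;
    }
}

public class ResetLostPartitionsSketch {
    /** Applies the proposed reset rule to all copies of one partition. */
    static void resetLostCopies(List<PartCopy> copies) {
        boolean hasOwner = copies.stream().anyMatch(c -> c.state == PartState.OWNING);
        boolean allZero = copies.stream()
            .filter(c -> c.state == PartState.LOST)
            .allMatch(c -> c.updateCntr == 0);

        for (PartCopy c : copies) {
            if (c.state != PartState.LOST)
                continue;

            if (hasOwner) {
                // An owner exists: reset the counter and rebalance from the owner.
                c.updateCntr = 0;
                c.state = PartState.MOVING;
            }
            else if (allZero && c.affinityNode)
                c.state = PartState.OWNING; // empty partition, keep only on affinity nodes
        }
    }

    public static void main(String[] args) {
        List<PartCopy> copies = List.of(
            new PartCopy("A", PartState.OWNING, 42, true),
            new PartCopy("B", PartState.LOST, 17, true));

        resetLostCopies(copies);

        if (copies.get(1).state != PartState.MOVING || copies.get(1).updateCntr != 0)
            throw new AssertionError("reset rule failed");
        System.out.println("ok");
    }
}
```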

> resetLostPartitions() leaves an additional copy of a partition in the cluster
> -
>
> Key: IGNITE-10058
> URL: https://issues.apache.org/jira/browse/IGNITE-10058
> Project: Ignite
>  Issue Type: Bug
>Reporter: Stanislav Lukyanov
>Assignee: Pavel Pereslegin
>Priority: Major
> Fix For: 2.8
>
>
> If there are several copies of a LOST partition, resetLostPartitions() will 
> leave all of them in the cluster as OWNING.
> Scenario:
> 1) Start 4 nodes, a cache with backups=0 and READ_WRITE_SAFE, fill the cache
> 2) Stop one node - some partitions are recreated on the remaining nodes as 
> LOST
> 3) Start one node - the LOST partitions are being rebalanced to the new node 
> from the existing ones
> 4) Wait for rebalance to complete
> 5) Call resetLostPartitions()
> After that the partitions that were LOST become OWNING on all nodes that had 
> them. Eviction of these partitions doesn't start.
> Need to correctly evict additional copies of LOST partitions either after 
> rebalance on step 4 or after resetLostPartitions() call on step 5.
> Current resetLostPartitions() implementation does call checkEvictions(), but 
> the ready affinity assignment contains several nodes per partition for some 
> reason.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-9303) PageSnapshot can contain wrong pageId tag when not dirty page is recycling

2018-12-05 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16710108#comment-16710108
 ] 

Ilya Lantukh commented on IGNITE-9303:
--

Actually, there was a bug in the page recycling mechanism, which led to test 
failures. One such test is 
IgniteLogicalRecoveryTest.testRecoveryOnCrushDuringCheckpointOnNodeStart.

> PageSnapshot can contain wrong pageId tag when not dirty page is recycling
> --
>
> Key: IGNITE-9303
> URL: https://issues.apache.org/jira/browse/IGNITE-9303
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.6
>Reporter: Aleksey Plekhanov
>Assignee: Ilya Lantukh
>Priority: Major
> Fix For: 2.8
>
>
> When a page is recycled (for example in {{BPlusTree.Remove#freePage()}} -> 
> {{DataStructure#recyclePage()}}), the tag of {{pageId}} is modified, but the 
> original {{pageId}} is passed to the {{writeUnlock()}} method, and this 
> {{pageId}} is stored in the PageSnapshot WAL record.
> This bug may lead to errors in WAL application during crash recovery.
> Reproducer (ignite-indexing module must be in classpath):
> {code:java}
> public class WalFailReproducer extends AbstractWalDeltaConsistencyTest {
> @Override protected boolean checkPagesOnCheckpoint() {
> return true;
> }
> public final void testPutRemoveCacheDestroy() throws Exception {
> CacheConfiguration ccfg = new 
> CacheConfiguration<>("cache0");
> ccfg.setIndexedTypes(Integer.class, Integer.class);
> IgniteEx ignite = startGrid(0);
> ignite.cluster().active(true);
> IgniteCache cache0 = ignite.getOrCreateCache(ccfg);
> for (int i = 0; i < 5_000; i++)
> cache0.put(i, i);
> forceCheckpoint();
> for (int i = 1_000; i < 4_000; i++)
> cache0.remove(i);
> forceCheckpoint();
> stopAllGrids();
> }
> }
> {code}





[jira] [Commented] (IGNITE-9290) Make remove explicit locks async when node left.

2018-12-05 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16710008#comment-16710008
 ] 

Ilya Lantukh commented on IGNITE-9290:
--

[~amashenkov], why did you remove the onLeft() call in ExchangeFuture.init(...) 
for the case when the event node is a client?

And why do we still need to call removeExplicitNodeLocks(...) from 
ExchangeFuture if it is now handled by a listener in GridCacheMvccManager?

> Make remove explicit locks async when node left.
> 
>
> Key: IGNITE-9290
> URL: https://issues.apache.org/jira/browse/IGNITE-9290
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Reporter: Andrew Mashenkov
>Assignee: Andrew Mashenkov
>Priority: Critical
>  Labels: deadlock, iep-25
> Fix For: 2.8
>
>
> GridCacheMvccManager.removeExplicitNodeLocks() runs synchronously in discovery 
> and exchange threads. This introduces unnecessary delays in the discovery and 
> exchange process.
> Also, this may cause a deadlock on node stop if a user transaction holds an 
> entry lock and awaits a response from some Ignite manager (e.g. cache store, 
> DR or CQ): managers stop right after the last exchange has finished, so they 
> can't detect that the node is stopping. 
>  
> [1] 
> [http://apache-ignite-developers.2346864.n4.nabble.com/Synchronous-tx-entries-unlocking-in-discovery-exchange-threads-td33827.html]
>  





[jira] [Commented] (IGNITE-10437) GridCacheWriteBehindStoreMultithreadedSelfTest.testFlushFromTheSameThreadWithCoalescing is flaky

2018-12-05 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709992#comment-16709992
 ] 

Ilya Lantukh commented on IGNITE-10437:
---

Thanks, looks good now.

> GridCacheWriteBehindStoreMultithreadedSelfTest.testFlushFromTheSameThreadWithCoalescing
>  is flaky
> 
>
> Key: IGNITE-10437
> URL: https://issues.apache.org/jira/browse/IGNITE-10437
> Project: Ignite
>  Issue Type: Test
>Reporter: Ryabov Dmitrii
>Assignee: Ryabov Dmitrii
>Priority: Minor
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.8
>
>
> Fails periodically on 
> [TeamCity|https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=-2991182438861864832&tab=testDetails&branch_IgniteTests24Java8=%3Cdefault%3E].
> {code:java}
> junit.framework.AssertionFailedError: No cache overflows detected (a bug or 
> too few keys or too few delay?)
>   at junit.framework.Assert.fail(Assert.java:57)
>   at junit.framework.Assert.assertTrue(Assert.java:22)
>   at junit.framework.TestCase.assertTrue(TestCase.java:192)
>   at 
> org.apache.ignite.internal.processors.cache.store.GridCacheWriteBehindStoreMultithreadedSelfTest.testFlushFromTheSameThread(GridCacheWriteBehindStoreMultithreadedSelfTest.java:215)
>   at 
> org.apache.ignite.internal.processors.cache.store.GridCacheWriteBehindStoreMultithreadedSelfTest.testFlushFromTheSameThreadWithCoalescing(GridCacheWriteBehindStoreMultithreadedSelfTest.java:166)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at junit.framework.TestCase.runTest(TestCase.java:176)
>   at 
> org.apache.ignite.testframework.junits.GridAbstractTest.access$001(GridAbstractTest.java:150)
>   at 
> org.apache.ignite.testframework.junits.GridAbstractTest$6.evaluate(GridAbstractTest.java:2104)
>   at 
> org.apache.ignite.testframework.junits.GridAbstractTest$7.run(GridAbstractTest.java:2119)
>   at java.lang.Thread.run(Thread.java:748)
> {code}





[jira] [Assigned] (IGNITE-9213) CacheLockReleaseNodeLeaveTest.testLockTopologyChange hangs sometimes, leading to TC timeout

2018-12-04 Thread Ilya Lantukh (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Lantukh reassigned IGNITE-9213:


Assignee: (was: Ilya Lantukh)

> CacheLockReleaseNodeLeaveTest.testLockTopologyChange hangs sometimes, leading 
> to TC timeout
> ---
>
> Key: IGNITE-9213
> URL: https://issues.apache.org/jira/browse/IGNITE-9213
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ilya Lantukh
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
> Attachments: ignite-9213-threaddump.txt
>
>
> Probability is quite low, < 5%.
> One thread gets stuck in GridCacheAdapter.lockAll(...), holding the gateway 
> read lock and waiting for a future that never completes. Another thread 
> cannot acquire the gateway write lock.
> {code}
> "test-runner-#123405%distributed.CacheLockReleaseNodeLeaveTest%" #136172 
> prio=5 os_prio=0 tid=0x7f20cd3d7000 nid=0x356f 
> sleeping[0x7f1eae48b000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.ignite.internal.util.IgniteUtils.sleep(IgniteUtils.java:7678)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheGateway.onStopped(GridCacheGateway.java:318)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.blockGateways(GridCacheProcessor.java:970)
>   at org.apache.ignite.internal.IgniteKernal.stop0(IgniteKernal.java:2195)
>   at org.apache.ignite.internal.IgniteKernal.stop(IgniteKernal.java:2082)
>   at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2595)
>   - locked <0xc2e69580> (a 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance)
>   at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2558)
>   at org.apache.ignite.internal.IgnitionEx.stop(IgnitionEx.java:374)
>   at org.apache.ignite.Ignition.stop(Ignition.java:229)
>   at 
> org.apache.ignite.testframework.junits.GridAbstractTest.stopGrid(GridAbstractTest.java:1153)
>   at 
> org.apache.ignite.testframework.junits.GridAbstractTest.stopAllGrids(GridAbstractTest.java:1196)
>   at 
> org.apache.ignite.testframework.junits.GridAbstractTest.stopAllGrids(GridAbstractTest.java:1174)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.CacheLockReleaseNodeLeaveTest.testLockTopologyChange(CacheLockReleaseNodeLeaveTest.java:177)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at junit.framework.TestCase.runTest(TestCase.java:176)
>   at 
> org.apache.ignite.testframework.junits.GridAbstractTest.runTestInternal(GridAbstractTest.java:2156)
>   at 
> org.apache.ignite.testframework.junits.GridAbstractTest.access$000(GridAbstractTest.java:143)
>   at 
> org.apache.ignite.testframework.junits.GridAbstractTest$5.run(GridAbstractTest.java:2071)
>   at java.lang.Thread.run(Thread.java:745)
> "test-lock-thread-4" #136488 prio=5 os_prio=0 tid=0x7f208802a000 
> nid=0x36a5 waiting on condition [0x7f1ea81c3000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
>   at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
>   at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheAdapter.lockAll(GridCacheAdapter.java:3405)
>   at 
> org.apache.ignite.internal.processors.cache.CacheLockImpl.lock(CacheLockImpl.java:74)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.CacheLockReleaseNodeLeaveTest$3.run(CacheLockReleaseNodeLeaveTest.java:154)
>   at 
> org.apache.ignite.testframework.GridTestUtils$6.call(GridTestUtils.java:1254)
>   at 
> org.apache.ignite.testframework.GridTestThread.run(GridTestThread.java:86)
> {code}





[jira] [Commented] (IGNITE-9303) PageSnapshot can contain wrong pageId tag when not dirty page is recycling

2018-12-03 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707262#comment-16707262
 ] 

Ilya Lantukh commented on IGNITE-9303:
--

I don't see any problem in PageMemory code, only in PageMemoryTracker (which is 
just a testing utility).

> PageSnapshot can contain wrong pageId tag when not dirty page is recycling
> --
>
> Key: IGNITE-9303
> URL: https://issues.apache.org/jira/browse/IGNITE-9303
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.6
>Reporter: Aleksey Plekhanov
>Assignee: Ilya Lantukh
>Priority: Major
> Fix For: 2.8
>
>
> When a page is recycled (for example in {{BPlusTree.Remove#freePage()}} -> 
> {{DataStructure#recyclePage()}}), the tag of {{pageId}} is modified, but the 
> original {{pageId}} is passed to the {{writeUnlock()}} method, and this 
> {{pageId}} is stored in the PageSnapshot WAL record.
> This bug may lead to errors in WAL application during crash recovery.
> Reproducer (ignite-indexing module must be in classpath):
> {code:java}
> public class WalFailReproducer extends AbstractWalDeltaConsistencyTest {
> @Override protected boolean checkPagesOnCheckpoint() {
> return true;
> }
> public final void testPutRemoveCacheDestroy() throws Exception {
> CacheConfiguration ccfg = new 
> CacheConfiguration<>("cache0");
> ccfg.setIndexedTypes(Integer.class, Integer.class);
> IgniteEx ignite = startGrid(0);
> ignite.cluster().active(true);
> IgniteCache cache0 = ignite.getOrCreateCache(ccfg);
> for (int i = 0; i < 5_000; i++)
> cache0.put(i, i);
> forceCheckpoint();
> for (int i = 1_000; i < 4_000; i++)
> cache0.remove(i);
> forceCheckpoint();
> stopAllGrids();
> }
> }
> {code}





[jira] [Commented] (IGNITE-10437) GridCacheWriteBehindStoreMultithreadedSelfTest.testFlushFromTheSameThreadWithCoalescing is flaky

2018-12-03 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707174#comment-16707174
 ] 

Ilya Lantukh commented on IGNITE-10437:
---

[~SomeFire], it seems to me that your patch doesn't solve the issue, but just 
reduces its probability. I would prefer to re-write this test (and 
GridCacheTestStore if necessary) to ensure that overflow always happens. For 
example, you can insert a CyclicBarrier or CountDownLatch into the 
GridCacheTestStore.write method to make all flusher threads get stuck, or you 
can put data in the test in a *while* loop until overflow happens.
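The latch idea can be sketched like this. The class and method names below are 
illustrative only; the real GridCacheTestStore hook and the write-behind 
flusher wiring differ, so treat this as a minimal model of "park every flusher 
thread until the test releases them".

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Sketch of the suggestion above: a write-behind test store whose write()
// parks every flusher thread on a latch, so puts accumulate and the
// write-behind queue is guaranteed to overflow.
public class BlockingTestStore {
    private final CountDownLatch unblock = new CountDownLatch(1);
    private final long timeoutMs;

    public BlockingTestStore(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    /** Simulates CacheStore.write(): blocks until the test releases it. */
    public boolean write(Object entry) throws InterruptedException {
        // Flusher threads stall here; returns true once release() was called.
        return unblock.await(timeoutMs, TimeUnit.MILLISECONDS);
    }

    /** Called by the test once cache overflow has been observed. */
    public void release() {
        unblock.countDown();
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingTestStore store = new BlockingTestStore(50);
        // Not released yet: a flusher's write() times out instead of completing.
        boolean beforeRelease = store.write("key1");
        store.release();
        boolean afterRelease = store.write("key2");
        if (beforeRelease || !afterRelease)
            throw new AssertionError("latch behavior unexpected");
        System.out.println("ok");
    }
}
```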

> GridCacheWriteBehindStoreMultithreadedSelfTest.testFlushFromTheSameThreadWithCoalescing
>  is flaky
> 
>
> Key: IGNITE-10437
> URL: https://issues.apache.org/jira/browse/IGNITE-10437
> Project: Ignite
>  Issue Type: Test
>Reporter: Ryabov Dmitrii
>Assignee: Ryabov Dmitrii
>Priority: Minor
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.8
>
>
> Fails periodically on 
> [TeamCity|https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=-2991182438861864832&tab=testDetails&branch_IgniteTests24Java8=%3Cdefault%3E].
> {code:java}
> junit.framework.AssertionFailedError: No cache overflows detected (a bug or 
> too few keys or too few delay?)
>   at junit.framework.Assert.fail(Assert.java:57)
>   at junit.framework.Assert.assertTrue(Assert.java:22)
>   at junit.framework.TestCase.assertTrue(TestCase.java:192)
>   at 
> org.apache.ignite.internal.processors.cache.store.GridCacheWriteBehindStoreMultithreadedSelfTest.testFlushFromTheSameThread(GridCacheWriteBehindStoreMultithreadedSelfTest.java:215)
>   at 
> org.apache.ignite.internal.processors.cache.store.GridCacheWriteBehindStoreMultithreadedSelfTest.testFlushFromTheSameThreadWithCoalescing(GridCacheWriteBehindStoreMultithreadedSelfTest.java:166)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at junit.framework.TestCase.runTest(TestCase.java:176)
>   at 
> org.apache.ignite.testframework.junits.GridAbstractTest.access$001(GridAbstractTest.java:150)
>   at 
> org.apache.ignite.testframework.junits.GridAbstractTest$6.evaluate(GridAbstractTest.java:2104)
>   at 
> org.apache.ignite.testframework.junits.GridAbstractTest$7.run(GridAbstractTest.java:2119)
>   at java.lang.Thread.run(Thread.java:748)
> {code}





[jira] [Assigned] (IGNITE-9303) PageSnapshot can contain wrong pageId tag when not dirty page is recycling

2018-12-03 Thread Ilya Lantukh (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Lantukh reassigned IGNITE-9303:


Assignee: Ilya Lantukh

> PageSnapshot can contain wrong pageId tag when not dirty page is recycling
> --
>
> Key: IGNITE-9303
> URL: https://issues.apache.org/jira/browse/IGNITE-9303
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.6
>Reporter: Aleksey Plekhanov
>Assignee: Ilya Lantukh
>Priority: Major
> Fix For: 2.8
>
>
> When a page is recycled (for example in {{BPlusTree.Remove#freePage()}} -> 
> {{DataStructure#recyclePage()}}), the tag of {{pageId}} is modified, but the 
> original {{pageId}} is passed to the {{writeUnlock()}} method, and this 
> {{pageId}} is stored in the PageSnapshot WAL record.
> This bug may lead to errors in WAL application during crash recovery.
> Reproducer (ignite-indexing module must be in classpath):
> {code:java}
> public class WalFailReproducer extends AbstractWalDeltaConsistencyTest {
> @Override protected boolean checkPagesOnCheckpoint() {
> return true;
> }
> public final void testPutRemoveCacheDestroy() throws Exception {
> CacheConfiguration ccfg = new 
> CacheConfiguration<>("cache0");
> ccfg.setIndexedTypes(Integer.class, Integer.class);
> IgniteEx ignite = startGrid(0);
> ignite.cluster().active(true);
> IgniteCache cache0 = ignite.getOrCreateCache(ccfg);
> for (int i = 0; i < 5_000; i++)
> cache0.put(i, i);
> forceCheckpoint();
> for (int i = 1_000; i < 4_000; i++)
> cache0.remove(i);
> forceCheckpoint();
> stopAllGrids();
> }
> }
> {code}





[jira] [Commented] (IGNITE-10427) GridClusterStateProcessor#changeGlobalState0() should wrap future before sending ChangeGlobalStateMessage

2018-11-30 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16705013#comment-16705013
 ] 

Ilya Lantukh commented on IGNITE-10427:
---

Thanks for the contribution!

Looks good.

> GridClusterStateProcessor#changeGlobalState0() should wrap future before 
> sending ChangeGlobalStateMessage
> -
>
> Key: IGNITE-10427
> URL: https://issues.apache.org/jira/browse/IGNITE-10427
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.6
>Reporter: Sergey Antonov
>Assignee: Sergey Antonov
>Priority: Major
> Fix For: 2.8
>
>






[jira] [Commented] (IGNITE-10437) GridCacheWriteBehindStoreMultithreadedSelfTest.testFlushFromTheSameThreadWithCoalescing is flaky

2018-11-29 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16703180#comment-16703180
 ] 

Ilya Lantukh commented on IGNITE-10437:
---

[~SomeFire],

Can you please explain what the reason for the problem was and how your patch 
fixes it?

> GridCacheWriteBehindStoreMultithreadedSelfTest.testFlushFromTheSameThreadWithCoalescing
>  is flaky
> 
>
> Key: IGNITE-10437
> URL: https://issues.apache.org/jira/browse/IGNITE-10437
> Project: Ignite
>  Issue Type: Test
>Reporter: Ryabov Dmitrii
>Assignee: Ryabov Dmitrii
>Priority: Minor
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.8
>
>
> Fails periodically on 
> [TeamCity|https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=-2991182438861864832&tab=testDetails&branch_IgniteTests24Java8=%3Cdefault%3E].
> {code:java}
> junit.framework.AssertionFailedError: No cache overflows detected (a bug or 
> too few keys or too few delay?)
>   at junit.framework.Assert.fail(Assert.java:57)
>   at junit.framework.Assert.assertTrue(Assert.java:22)
>   at junit.framework.TestCase.assertTrue(TestCase.java:192)
>   at 
> org.apache.ignite.internal.processors.cache.store.GridCacheWriteBehindStoreMultithreadedSelfTest.testFlushFromTheSameThread(GridCacheWriteBehindStoreMultithreadedSelfTest.java:215)
>   at 
> org.apache.ignite.internal.processors.cache.store.GridCacheWriteBehindStoreMultithreadedSelfTest.testFlushFromTheSameThreadWithCoalescing(GridCacheWriteBehindStoreMultithreadedSelfTest.java:166)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at junit.framework.TestCase.runTest(TestCase.java:176)
>   at 
> org.apache.ignite.testframework.junits.GridAbstractTest.access$001(GridAbstractTest.java:150)
>   at 
> org.apache.ignite.testframework.junits.GridAbstractTest$6.evaluate(GridAbstractTest.java:2104)
>   at 
> org.apache.ignite.testframework.junits.GridAbstractTest$7.run(GridAbstractTest.java:2119)
>   at java.lang.Thread.run(Thread.java:748)
> {code}





[jira] [Commented] (IGNITE-4111) Communication fails to send message if target node did not finish join process

2018-11-29 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16703178#comment-16703178
 ] 

Ilya Lantukh commented on IGNITE-4111:
--

Thanks, looks good now.

> Communication fails to send message if target node did not finish join process
> --
>
> Key: IGNITE-4111
> URL: https://issues.apache.org/jira/browse/IGNITE-4111
> Project: Ignite
>  Issue Type: Bug
>  Components: general
>Reporter: Semen Boikov
>Assignee: Amelchev Nikita
>Priority: Minor
> Fix For: 2.8
>
> Attachments: test onFirstMessage hang.log
>
>
> Currently this scenario is possible:
> - joining node sent join request and waits for 
> TcpDiscoveryNodeAddFinishedMessage inside ServerImpl.joinTopology
> - other nodes already see this node and can send messages to it (for example 
> try to run a compute job on this node)
> - the joining node can not receive the message: TcpCommunicationSpi will hang 
> inside 'onFirstMessage' on the 'getSpiContext' call, so the sending node will 
> get an error trying to establish a connection
> Possible fix: if the SPI context is not available in onFirstMessage(), then 
> TcpCommunicationSpi should send a special response indicating that this node 
> is not ready yet, and the sender should retry after some time.
> Also need to check internal code for places where a message can be 
> unnecessarily sent to a node: one such place is 
> GridCachePartitionExchangeManager.refreshPartitions - the message is sent to 
> all known nodes, but here we can filter by node order / finished exchange 
> version.
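The retry idea from the possible fix above can be sketched as follows. All 
names, constants, and methods here are hypothetical; the real 
TcpCommunicationSpi handshake is far more involved, so this only models 
"answer NODE_NOT_READY instead of hanging, and retry on the sender side".

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch of the proposed fix: if the receiver's SPI context is
// not yet available, answer with NODE_NOT_READY instead of hanging, and have
// the sender retry after a pause.
public class HandshakeSketch {
    static final int ACCEPTED = 0;
    static final int NODE_NOT_READY = 1;

    final AtomicBoolean spiCtxAvailable = new AtomicBoolean(false);

    /** Receiver side: answer the first message without blocking on join. */
    int onFirstMessage() {
        return spiCtxAvailable.get() ? ACCEPTED : NODE_NOT_READY;
    }

    /** Sender side: retry while the target node is still joining. */
    int connectWithRetry(int maxRetries, long backoffMs) throws InterruptedException {
        for (int i = 0; i < maxRetries; i++) {
            int res = onFirstMessage();
            if (res == ACCEPTED)
                return res;
            Thread.sleep(backoffMs); // target hasn't finished join yet; back off
        }
        return NODE_NOT_READY;
    }

    public static void main(String[] args) throws InterruptedException {
        HandshakeSketch spi = new HandshakeSketch();
        // Simulate the join finishing on another thread after a short delay.
        new Thread(() -> {
            try { Thread.sleep(30); } catch (InterruptedException ignored) {}
            spi.spiCtxAvailable.set(true);
        }).start();

        if (spi.connectWithRetry(100, 10) != ACCEPTED)
            throw new AssertionError("expected connection to succeed after retries");
        System.out.println("ok");
    }
}
```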





[jira] [Commented] (IGNITE-10392) Client broken cluster where try to connect. Server nodes drop by handler

2018-11-23 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16697322#comment-16697322
 ] 

Ilya Lantukh commented on IGNITE-10392:
---

The aforementioned test failures are known issues.

> Client broken cluster where try to connect. Server nodes drop by handler
> 
>
> Key: IGNITE-10392
> URL: https://issues.apache.org/jira/browse/IGNITE-10392
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ilya Lantukh
>Assignee: Ilya Lantukh
>Priority: Blocker
> Fix For: 2.8
>
>
> {noformat}
> org.apache.ignite.IgniteException: Failed to resolve nodes topology 
> [cacheGrp=N/A, topVer=AffinityTopologyVersion [topVer=133, minorTopVer=0], 
> history=[AffinityTopologyVersion [topVer=35, minorTopVer=0], 
> AffinityTopologyVersion [topVer=36, minorTopVer=0], AffinityTopologyVersion 
> [topVer=37, minorTopVer=0], AffinityTopologyVersion [topVer=38, 
> minorTopVer=0], AffinityTopologyVersion [topVer=39, minorTopVer=0], 
> AffinityTopologyVersion [topVer=40, minorTopVer=0], AffinityTopologyVersion 
> [topVer=41, minorTopVer=0], AffinityTopologyVersion [topVer=42, 
> minorTopVer=0], AffinityTopologyVersion [topVer=43, minorTopVer=0], 
> AffinityTopologyVersion [topVer=44, minorTopVer=0], AffinityTopologyVersion 
> [topVer=45, minorTopVer=0], AffinityTopologyVersion [topVer=46, 
> minorTopVer=0], AffinityTopologyVersion [topVer=47, minorTopVer=0], 
> AffinityTopologyVersion [topVer=48, minorTopVer=0], AffinityTopologyVersion 
> [topVer=49, minorTopVer=0], AffinityTopologyVersion [topVer=50, 
> minorTopVer=0], AffinityTopologyVersion [topVer=51, minorTopVer=0], 
> AffinityTopologyVersion [topVer=52, minorTopVer=0], AffinityTopologyVersion 
> [topVer=53, minorTopVer=0], AffinityTopologyVersion [topVer=54, 
> minorTopVer=0], AffinityTopologyVersion [topVer=55, minorTopVer=0], 
> AffinityTopologyVersion [topVer=56, minorTopVer=0], AffinityTopologyVersion 
> [topVer=57, minorTopVer=0], AffinityTopologyVersion [topVer=58, 
> minorTopVer=0], AffinityTopologyVersion [topVer=59, minorTopVer=0], 
> AffinityTopologyVersion [topVer=60, minorTopVer=0], AffinityTopologyVersion 
> [topVer=61, minorTopVer=0], AffinityTopologyVersion [topVer=62, 
> minorTopVer=0], AffinityTopologyVersion [topVer=63, minorTopVer=0], 

[jira] [Created] (IGNITE-10392) Client breaks cluster when trying to connect; server nodes are dropped by failure handler

2018-11-23 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-10392:
-

 Summary: Client breaks cluster when trying to connect; server nodes 
are dropped by failure handler
 Key: IGNITE-10392
 URL: https://issues.apache.org/jira/browse/IGNITE-10392
 Project: Ignite
  Issue Type: Bug
Reporter: Ilya Lantukh
Assignee: Ilya Lantukh


{noformat}
org.apache.ignite.IgniteException: Failed to resolve nodes topology 
[cacheGrp=N/A, topVer=AffinityTopologyVersion [topVer=133, minorTopVer=0], 
history=[AffinityTopologyVersion [topVer=35, minorTopVer=0], 
AffinityTopologyVersion [topVer=36, minorTopVer=0], AffinityTopologyVersion 
[topVer=37, minorTopVer=0], AffinityTopologyVersion [topVer=38, minorTopVer=0], 
AffinityTopologyVersion [topVer=39, minorTopVer=0], AffinityTopologyVersion 
[topVer=40, minorTopVer=0], AffinityTopologyVersion [topVer=41, minorTopVer=0], 
AffinityTopologyVersion [topVer=42, minorTopVer=0], AffinityTopologyVersion 
[topVer=43, minorTopVer=0], AffinityTopologyVersion [topVer=44, minorTopVer=0], 
AffinityTopologyVersion [topVer=45, minorTopVer=0], AffinityTopologyVersion 
[topVer=46, minorTopVer=0], AffinityTopologyVersion [topVer=47, minorTopVer=0], 
AffinityTopologyVersion [topVer=48, minorTopVer=0], AffinityTopologyVersion 
[topVer=49, minorTopVer=0], AffinityTopologyVersion [topVer=50, minorTopVer=0], 
AffinityTopologyVersion [topVer=51, minorTopVer=0], AffinityTopologyVersion 
[topVer=52, minorTopVer=0], AffinityTopologyVersion [topVer=53, minorTopVer=0], 
AffinityTopologyVersion [topVer=54, minorTopVer=0], AffinityTopologyVersion 
[topVer=55, minorTopVer=0], AffinityTopologyVersion [topVer=56, minorTopVer=0], 
AffinityTopologyVersion [topVer=57, minorTopVer=0], AffinityTopologyVersion 
[topVer=58, minorTopVer=0], AffinityTopologyVersion [topVer=59, minorTopVer=0], 
AffinityTopologyVersion [topVer=60, minorTopVer=0], AffinityTopologyVersion 
[topVer=61, minorTopVer=0], AffinityTopologyVersion [topVer=62, minorTopVer=0], 
AffinityTopologyVersion [topVer=63, minorTopVer=0], AffinityTopologyVersion 
[topVer=64, minorTopVer=0], AffinityTopologyVersion [topVer=65, minorTopVer=0], 
AffinityTopologyVersion [topVer=66, minorTopVer=0], AffinityTopologyVersion 
[topVer=67, minorTopVer=0], AffinityTopologyVersion [topVer=68, minorTopVer=0], 
AffinityTopologyVersion [topVer=69, minorTopVer=0], AffinityTopologyVersion 
[topVer=70, minorTopVer=0], AffinityTopologyVersion [topVer=71, minorTopVer=0], 
AffinityTopologyVersion [topVer=72, minorTopVer=0], AffinityTopologyVersion 
[topVer=73, minorTopVer=0], AffinityTopologyVersion [topVer=74, minorTopVer=0], 
AffinityTopologyVersion [topVer=75, minorTopVer=0], AffinityTopologyVersion 
[topVer=76, minorTopVer=0], AffinityTopologyVersion [topVer=77, minorTopVer=0], 
AffinityTopologyVersion [topVer=78, minorTopVer=0], AffinityTopologyVersion 
[topVer=79, minorTopVer=0], AffinityTopologyVersion [topVer=80, minorTopVer=0], 
AffinityTopologyVersion [topVer=81, minorTopVer=0], AffinityTopologyVersion 
[topVer=82, minorTopVer=0], AffinityTopologyVersion [topVer=83, minorTopVer=0], 
AffinityTopologyVersion [topVer=84, minorTopVer=0], AffinityTopologyVersion 
[topVer=85, minorTopVer=0], AffinityTopologyVersion [topVer=86, minorTopVer=0], 
AffinityTopologyVersion [topVer=87, minorTopVer=0], AffinityTopologyVersion 
[topVer=88, minorTopVer=0], AffinityTopologyVersion [topVer=89, minorTopVer=0], 
AffinityTopologyVersion [topVer=90, minorTopVer=0], AffinityTopologyVersion 
[topVer=91, minorTopVer=0], AffinityTopologyVersion [topVer=92, minorTopVer=0], 
AffinityTopologyVersion [topVer=93, minorTopVer=0], AffinityTopologyVersion 
[topVer=94, minorTopVer=0], AffinityTopologyVersion [topVer=95, minorTopVer=0], 
AffinityTopologyVersion [topVer=96, minorTopVer=0], AffinityTopologyVersion 
[topVer=97, minorTopVer=0], AffinityTopologyVersion [topVer=98, minorTopVer=0], 
AffinityTopologyVersion [topVer=99, minorTopVer=0], AffinityTopologyVersion 
[topVer=100, minorTopVer=0], AffinityTopologyVersion [topVer=101, 
minorTopVer=0], AffinityTopologyVersion [topVer=102, minorTopVer=0], 
AffinityTopologyVersion [topVer=103, minorTopVer=0], AffinityTopologyVersion 
[topVer=104, minorTopVer=0], AffinityTopologyVersion [topVer=105, 
minorTopVer=0], AffinityTopologyVersion [topVer=106, minorTopVer=0], 
AffinityTopologyVersion [topVer=107, minorTopVer=0], AffinityTopologyVersion 
[topVer=108, minorTopVer=0], AffinityTopologyVersion [topVer=109, 
minorTopVer=0], AffinityTopologyVersion [topVer=110, minorTopVer=0], 
AffinityTopologyVersion [topVer=111, minorTopVer=0], AffinityTopologyVersion 
[topVer=112, minorTopVer=0], AffinityTopologyVersion [topVer=113, 
minorTopVer=0], AffinityTopologyVersion [topVer=114, minorTopVer=0], 
AffinityTopologyVersion [topVer=115, minorTopVer=0], AffinityTopologyVersion 
[topVer=116, minorTopVer=0], AffinityTopologyVersion [topVer=117, 
minorTopVer=0], AffinityTopologyVersion [topVer=118, m

[jira] [Assigned] (IGNITE-9283) [ML] Add Discrete Cosine preprocessor

2018-11-22 Thread Ilya Lantukh (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Lantukh reassigned IGNITE-9283:


Assignee: Ilya Lantukh

> [ML] Add Discrete Cosine preprocessor
> -
>
> Key: IGNITE-9283
> URL: https://issues.apache.org/jira/browse/IGNITE-9283
> Project: Ignite
>  Issue Type: Sub-task
>  Components: ml
>Reporter: Aleksey Zinoviev
>Assignee: Ilya Lantukh
>Priority: Major
>
> Add [https://en.wikipedia.org/wiki/Discrete_cosine_transform]
> Please look at the MinMaxScaler or Normalization packages in the 
> preprocessing package.
> Add the following classes if required:
> 1) Preprocessor
> 2) Trainer
> 3) custom PartitionData if shuffling is a step of the algorithm
>  
> Requirements for a successful PR:
>  # PartitionedDataset usage
>  # Trainer-Model paradigm support
>  # Tests for Model and for Trainer (and other stuff)
>  # Example of usage with a small but well-known dataset such as Iris, 
> Titanic or House Prices
>  # Javadocs/codestyle according to guidelines
>  
>  
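For reference, the core transform the preprocessor would apply per feature vector is the DCT-II from the linked article. A minimal sketch (plain O(n^2), hypothetical class name, not part of the Ignite ML API) could look like this:

```java
/**
 * Illustrative sketch of the DCT-II a Discrete Cosine preprocessor could
 * apply to each feature vector. Class and method names are hypothetical.
 */
public class DctSketch {
    /** Plain O(n^2) DCT-II: X_k = sum_i x_i * cos(pi/n * (i + 1/2) * k). */
    public static double[] dct(double[] x) {
        int n = x.length;
        double[] out = new double[n];

        for (int k = 0; k < n; k++) {
            double sum = 0;

            for (int i = 0; i < n; i++)
                sum += x[i] * Math.cos(Math.PI / n * (i + 0.5) * k);

            out[k] = sum;
        }

        return out;
    }
}
```

A production preprocessor would likely use an FFT-based O(n log n) variant and plug into the trainer/preprocessor chain over the partitioned dataset, as the requirements above describe.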



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-4111) Communication fails to send message if target node did not finish join process

2018-11-22 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16695805#comment-16695805
 ] 

Ilya Lantukh commented on IGNITE-4111:
--

[~NSAmelchev], thanks for the contribution!

I've reviewed your PR and it looks good. However, I would prefer a more 
precise test. Currently, IgniteTcpCommunicationBigClusterTest just introduces 
artificial latency and starts multiple nodes, hoping to end up in the scenario 
described in the ticket. Please check whether it can be rewritten to guarantee 
that scenario using synchronization primitives (such as CountDownLatch), 
making it more deterministic. Also, please give the test a more meaningful 
name.
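As an illustration of the suggestion above, a deterministic two-thread handshake with CountDownLatch might look like this (a minimal sketch with hypothetical names, not the actual test):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/**
 * Sketch of driving a two-thread scenario with CountDownLatch instead of
 * artificial latency: each step waits for an explicit signal, so the
 * interleaving is fixed rather than timing-dependent.
 */
public class DeterministicScenario {
    public static boolean run() {
        CountDownLatch joinStarted = new CountDownLatch(1);
        CountDownLatch messageSent = new CountDownLatch(1);

        Thread joiningNode = new Thread(() -> {
            joinStarted.countDown();        // Signal: the node began joining.

            try {
                messageSent.await();        // Block until the message is sent.
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        joiningNode.start();

        try {
            // Wait for the exact state instead of sleeping and hoping.
            if (!joinStarted.await(5, TimeUnit.SECONDS))
                return false;

            messageSent.countDown();        // Now "send" the message.

            joiningNode.join(5_000);
        }
        catch (InterruptedException e) {
            return false;
        }

        return !joiningNode.isAlive();
    }
}
```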

> Communication fails to send message if target node did not finish join process
> --
>
> Key: IGNITE-4111
> URL: https://issues.apache.org/jira/browse/IGNITE-4111
> Project: Ignite
>  Issue Type: Bug
>  Components: general
>Reporter: Semen Boikov
>Assignee: Amelchev Nikita
>Priority: Minor
> Fix For: 2.8
>
> Attachments: test onFirstMessage hang.log
>
>
> Currently this scenario is possible:
> - the joining node sent a join request and waits for 
> TcpDiscoveryNodeAddFinishedMessage inside ServerImpl.joinTopology
> - other nodes already see this node and can send messages to it (for example, 
> try to run a compute job on this node)
> - the joining node cannot receive the message: TcpCommunicationSpi will hang 
> inside 'onFirstMessage' on the 'getSpiContext' call, so the sending node will 
> get an error trying to establish a connection
> Possible fix: if the SPI context is not available in onFirstMessage(), 
> TcpCommunicationSpi should send a special response indicating that this node 
> is not ready yet, and the sender should retry after some time.
> We also need to check internal code for places where a message can be sent to 
> a node unnecessarily: one such place is 
> GridCachePartitionExchangeManager.refreshPartitions - the message is sent to 
> all known nodes, but here we can filter by node order / finished exchange 
> version.
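The proposed "not ready yet, retry later" handshake can be sketched as a toy model (hypothetical names and return values, not the real TcpCommunicationSpi wire format):

```java
import java.util.Iterator;

/**
 * Toy model of the proposed fix: when the SPI context is not yet available,
 * the receiver answers NODE_NOT_READY and the sender retries later instead
 * of failing the connection attempt. Names are illustrative only.
 */
public class NotReadyRetry {
    enum Response { ACCEPTED, NODE_NOT_READY }

    /** Receiver side of onFirstMessage(): reject until the join has finished. */
    static Response onFirstMessage(boolean spiContextAvailable) {
        return spiContextAvailable ? Response.ACCEPTED : Response.NODE_NOT_READY;
    }

    /** Sender side: retry while the target reports that it is not ready yet. */
    static boolean sendWithRetries(Iterator<Boolean> readyStates, int maxRetries) {
        for (int attempt = 0; attempt <= maxRetries && readyStates.hasNext(); attempt++) {
            if (onFirstMessage(readyStates.next()) == Response.ACCEPTED)
                return true; // Message delivered.

            // Otherwise back off and retry on the next iteration.
        }

        return false; // Retries exhausted.
    }
}
```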





[jira] [Commented] (IGNITE-9558) Avoid changing AffinityTopologyVersion on client connect when possible

2018-11-11 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16682961#comment-16682961
 ] 

Ilya Lantukh commented on IGNITE-9558:
--

[~agoncharuk], thanks for uncovering that issue!

I've fixed it, please review my PR again.

> Avoid changing AffinityTopologyVersion on client connect when possible
> --
>
> Key: IGNITE-9558
> URL: https://issues.apache.org/jira/browse/IGNITE-9558
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.0
>Reporter: Alexey Goncharuk
>Assignee: Ilya Lantukh
>Priority: Major
> Fix For: 2.8
>
>
> Currently a client join event changes discovery topology version which, in 
> turn, changes AffinityTopologyVersion.
> When a client maps transaction on new AffinityTopologyVersion, corresponding 
> message is not processed on remote node until remote node receives the 
> corresponding discovery event. If discovery event delivery is delayed for 
> some reason, this will result in transaction stalls on client joins.
> Since the client node does not change partition affinity, we can safely map 
> transactions on the previous topology version and do not change the affinity 
> topology version at all.
> Some cases need special care and probably do not qualify for this 
> optimization, such as when the client has a near cache or hosts partitions 
> for a REPLICATED cache.





[jira] [Assigned] (IGNITE-9840) Possible deadlock on transactional future on client node in case of network problems or long GC pauses

2018-11-08 Thread Ilya Lantukh (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Lantukh reassigned IGNITE-9840:


Assignee: Ilya Lantukh  (was: Alexey Stelmak)

> Possible deadlock on transactional future on client node in case of network 
> problems or long GC pauses
> --
>
> Key: IGNITE-9840
> URL: https://issues.apache.org/jira/browse/IGNITE-9840
> Project: Ignite
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 2.6
>Reporter: Andrey Aleksandrov
>Assignee: Ilya Lantukh
>Priority: Critical
> Fix For: 2.8
>
>
> Steps to reproduce:
> 1) Start the server node with the following timeouts. DefaultTxTimeout 
> should be greater than the others:
>  
> {code:java}
> 
> 
> 
> 
> 
> 
> 
>     
>         
>     
> 
> 
> 
> 
> {code}
> On the server side, create a cache with the following parameters:
>  
>  
> {code:java}
> 
>     
>     
>     
>     
>     
>     {code}
> 2) After that, start the client with the following code:
> {code:java}
> IgniteCache cache = ignite.getOrCreateCache("CACHE");
> try (Transaction tx = ignite.transactions().txStart()) {
> cache.put("Key", new Object());
> System.out.println("Stop me");
> //here we will get long GC pause on server side
> Thread.sleep(1);
> // Commit the transaction.
> tx.commitAsync().get();
> }
> {code}
>  
> At the "Stop me" step, suspend all threads on the server side to emulate a 
> network problem or a long GC pause.
> Finally, you will see the following on the client node:
> {code:java}
> [2018-10-10 16:46:10,157][ERROR][nio-acceptor-tcp-comm-#28%GRIDC1%][root] 
> Critical system error detected. Will be handled accordingly to configured 
> handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet 
> [SYSTEM_WORKER_BLOCKED]]], failureCtx=FailureContext 
> [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker 
> [name=grid-timeout-worker, igniteInstanceName=GRIDC1, finished=false, 
> heartbeatTs=1539179057570]]]
> {code}
> A similar issue can be reproduced in 2.4. In both cases it looks like we 
> have a deadlock while trying to render the TxEntryValueHolder as a string. 
> It looks like these values are already held by the transaction with the 
> long DefaultTxTimeout.
> {code:java}
> java.lang.Thread.State: WAITING
> at sun.misc.Unsafe.park(Unsafe.java:-1)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
> at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
> at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
> at 
> org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.metadata0(CacheObjectBinaryProcessorImpl.java:526)
> at 
> org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.metadata(CacheObjectBinaryProcessorImpl.java:510)
> at 
> org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl$2.metadata(CacheObjectBinaryProcessorImpl.java:193)
> at 
> org.apache.ignite.internal.binary.BinaryContext.metadata(BinaryContext.java:1265)
> at org.apache.ignite.internal.binary.BinaryUtils.type(BinaryUtils.java:2407)
> at 
> org.apache.ignite.internal.binary.BinaryObjectImpl.rawType(BinaryObjectImpl.java:302)
> at 
> org.apache.ignite.internal.binary.BinaryObjectExImpl.toString(BinaryObjectExImpl.java:205)
> at 
> org.apache.ignite.internal.binary.BinaryObjectExImpl.toString(BinaryObjectExImpl.java:186)
> at 
> org.apache.ignite.internal.binary.BinaryObjectImpl.toString(BinaryObjectImpl.java:919)
> at java.lang.String.valueOf(String.java:2994)
> at java.lang.StringBuilder.append(StringBuilder.java:131)
> at 
> org.apache.ignite.internal.processors.cache.transactions.TxEntryValueHolder.toString(TxEntryValueHolder.java:161)
> ...{code}
> On the client side, it can look like a hanging transaction because we are 
> waiting on:
> {code:java}
> tx.commitAsync().get();{code}
>  





[jira] [Assigned] (IGNITE-4003) Slow or faulty client can stall the whole cluster.

2018-11-08 Thread Ilya Lantukh (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Lantukh reassigned IGNITE-4003:


Assignee: Ilya Lantukh  (was: Semen Boikov)

> Slow or faulty client can stall the whole cluster.
> --
>
> Key: IGNITE-4003
> URL: https://issues.apache.org/jira/browse/IGNITE-4003
> Project: Ignite
>  Issue Type: Bug
>  Components: cache, general
>Affects Versions: 1.7
>Reporter: Vladimir Ozerov
>Assignee: Ilya Lantukh
>Priority: Critical
>
> Steps to reproduce:
> 1) Start two server nodes and put some data into the cache.
> 2) Start a client from a Docker subnet that is not visible from the outside. 
> The client will join the cluster.
> 3) Try to put something into the cache, or start another node to force 
> rebalance.
> The cluster is stuck at this moment. Root cause: the servers constantly try 
> to establish an outgoing connection to the client, but fail because the 
> Docker subnet is not visible from the outside. It may stop virtually all 
> cluster operations.
> Typical thread dump:
> {code}
> org.apache.ignite.IgniteCheckedException: Failed to send message (node may 
> have left the grid or TCP connection cannot be established due to firewall 
> issues) [node=TcpDiscoveryNode [id=a15d74c2-1ec2-4349-9640-aeacd70d8714, 
> addrs=[127.0.0.1, 172.17.0.6], sockAddrs=[/127.0.0.1:0, /127.0.0.1:0, 
> /172.17.0.6:0], discPort=0, order=7241, intOrder=3707, 
> lastExchangeTime=1474096941045, loc=false, ver=1.5.23#20160526-sha1:259146da, 
> isClient=true], topic=T4 [topic=TOPIC_CACHE, 
> id1=949732fd-1360-3a58-8d9e-0ff6ea6182cc, 
> id2=a15d74c2-1ec2-4349-9640-aeacd70d8714, id3=2], msg=GridContinuousMessage 
> [type=MSG_EVT_NOTIFICATION, routineId=7e13c48e-6933-48b2-9f15-8d92007930db, 
> data=null, futId=null], policy=2]
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1129)
>  [ignite-core-1.5.23.jar:1.5.23]
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager.sendOrderedMessage(GridIoManager.java:1347)
>  [ignite-core-1.5.23.jar:1.5.23]
>   at 
> org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1227)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>   at 
> org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1198)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>   at 
> org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1180)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>   at 
> org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendNotification(GridContinuousProcessor.java:841)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>   at 
> org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.addNotification(GridContinuousProcessor.java:800)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>   at 
> org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.onEntryUpdate(CacheContinuousQueryHandler.java:787)
>  [ignite-core-1.5.23.jar:1.5.23]
>   at 
> org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.access$700(CacheContinuousQueryHandler.java:91)
>  [ignite-core-1.5.23.jar:1.5.23]
>   at 
> org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler$1.onEntryUpdated(CacheContinuousQueryHandler.java:412)
>  [ignite-core-1.5.23.jar:1.5.23]
>   at 
> org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.onEntryUpdated(CacheContinuousQueryManager.java:343)
>  [ignite-core-1.5.23.jar:1.5.23]
>   at 
> org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.onEntryUpdated(CacheContinuousQueryManager.java:250)
>  [ignite-core-1.5.23.jar:1.5.23]
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheMapEntry.initialValue(GridCacheMapEntry.java:3476)
>  [ignite-core-1.5.23.jar:1.5.23]
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtForceKeysFuture$MiniFuture.onResult(GridDhtForceKeysFuture.java:548)
>  [ignite-core-1.5.23.jar:1.5.23]
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtForceKeysFuture.onResult(GridDhtForceKeysFuture.java:207)
>  [ignite-core-1.5.23.jar:1.5.23]
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.processForceKeyResponse(GridDhtPreloader.java:636)
>  [ignite-core-1.5.23.jar:1.5.23]
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.access$1000(GridDhtPreloader.java:81)
>  [ignite-core-1.5.23.jar:1.5.23]
>   at 
> org.apac

[jira] [Created] (IGNITE-10186) Revise ability to merge client-only exchanges

2018-11-08 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-10186:
-

 Summary: Revise ability to merge client-only exchanges
 Key: IGNITE-10186
 URL: https://issues.apache.org/jira/browse/IGNITE-10186
 Project: Ignite
  Issue Type: Improvement
Reporter: Ilya Lantukh


In IGNITE-9558 the ability to merge client exchanges was disabled because it 
could create complex technical problems. 

Now we need to decide whether to re-enable this functionality and fix all 
related issues, or to remove it completely, including the test 
CacheExchangeMergeTest.testMergeServerAndClientJoin.





[jira] [Commented] (IGNITE-9883) Do not block get/getAll during start/stop operations on other cache.

2018-11-06 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16676888#comment-16676888
 ] 

Ilya Lantukh commented on IGNITE-9883:
--

[~ibessonov], thanks for the contribution!

I have reviewed your pull request and see two problems:
1. Your solution only works when the DHT node has already received the 
discovery event for the next topology version. What if it hasn't? First, we 
will have to wait for it. Second, even after we receive that event, all cache 
messages received earlier will keep waiting until the exchange finishes. I 
don't think we can consider this ticket complete without fixing at least the 
second problem.
2. Currently shouldWaitForAffinityReadyFuture(...) always returns true if the 
versions are equal, without taking the event type into consideration. That 
looks dangerous to me; I suggest restricting it to DynamicCacheChange events.

> Do not block get/getAll during start/stop operations on other cache.
> 
>
> Key: IGNITE-9883
> URL: https://issues.apache.org/jira/browse/IGNITE-9883
> Project: Ignite
>  Issue Type: Sub-task
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
> Fix For: 2.8
>
>
> Do not block get/getAll during start/stop operations on other cache.





[jira] [Commented] (IGNITE-9769) IgniteCacheAtomicProtocolTest.testPutReaderUpdate1 is flaky

2018-10-30 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16668642#comment-16668642
 ] 

Ilya Lantukh commented on IGNITE-9769:
--

OK, patch looks good to me. Thanks for the contribution!

> IgniteCacheAtomicProtocolTest.testPutReaderUpdate1 is flaky
> ---
>
> Key: IGNITE-9769
> URL: https://issues.apache.org/jira/browse/IGNITE-9769
> Project: Ignite
>  Issue Type: Task
>Reporter: Ryabov Dmitrii
>Assignee: Ryabov Dmitrii
>Priority: Trivial
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.8
>
>
> {{IgniteCacheAtomicProtocolTest.testPutReaderUpdate1}} and 
> {{IgniteCacheAtomicProtocolTest.testPutReaderUpdate2}} are flaky.
> In the {{#readerUpdateDhtFails}} method we block 
> {{GridDhtAtomicNearResponse}} messages and perform a put operation. The put 
> should always hang, but sometimes it doesn't.





[jira] [Commented] (IGNITE-9769) IgniteCacheAtomicProtocolTest.testPutReaderUpdate1 is flaky

2018-10-30 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16668625#comment-16668625
 ] 

Ilya Lantukh commented on IGNITE-9769:
--

[~SomeFire],
Do we still need changes in IgniteCacheAtomicProtocolTest to resolve this 
ticket?

> IgniteCacheAtomicProtocolTest.testPutReaderUpdate1 is flaky
> ---
>
> Key: IGNITE-9769
> URL: https://issues.apache.org/jira/browse/IGNITE-9769
> Project: Ignite
>  Issue Type: Task
>Reporter: Ryabov Dmitrii
>Assignee: Ryabov Dmitrii
>Priority: Trivial
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.8
>
>
> {{IgniteCacheAtomicProtocolTest.testPutReaderUpdate1}} and 
> {{IgniteCacheAtomicProtocolTest.testPutReaderUpdate2}} are flaky.
> In the {{#readerUpdateDhtFails}} method we block 
> {{GridDhtAtomicNearResponse}} messages and perform a put operation. The put 
> should always hang, but sometimes it doesn't.





[jira] [Resolved] (IGNITE-10046) MVCC: IgniteCachePrimaryNodeFailureRecoveryAbstractTest.testPessimisticPrimaryNodeFailureRollback hangs sometimes.

2018-10-29 Thread Ilya Lantukh (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Lantukh resolved IGNITE-10046.
---
Resolution: Invalid

> MVCC: 
> IgniteCachePrimaryNodeFailureRecoveryAbstractTest.testPessimisticPrimaryNodeFailureRollback
>  hangs sometimes.
> --
>
> Key: IGNITE-10046
> URL: https://issues.apache.org/jira/browse/IGNITE-10046
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ilya Lantukh
>Priority: Major
> Fix For: 2.7
>
> Attachments: 
> Ignite_Tests_2.4_Java_8_9_10_11_MVCC_Queries_1323.log(1).zip
>
>
> The following exception can be found in the log before the hang:
> {noformat}
> [14:51:43]W:   [org.apache.ignite:ignite-indexing] 
> java.lang.NullPointerException
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.processors.cache.GridCacheMvccManager.future(GridCacheMvccManager.java:754)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processPartitionCountersResponse(IgniteTxHandler.java:2204)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.access$1100(IgniteTxHandler.java:120)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$13.apply(IgniteTxHandler.java:276)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$13.apply(IgniteTxHandler.java:274)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1060)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:585)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:384)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:310)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:100)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:299)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1569)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1197)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:127)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1093)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:505)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> java.lang.Thread.run(Thread.java:748)
> {noformat}
> Full log is attached.





[jira] [Updated] (IGNITE-10046) MVCC: IgniteCachePrimaryNodeFailureRecoveryAbstractTest.testPessimisticPrimaryNodeFailureRollback hangs sometimes.

2018-10-29 Thread Ilya Lantukh (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Lantukh updated IGNITE-10046:
--
Fix Version/s: (was: 2.7)

> MVCC: 
> IgniteCachePrimaryNodeFailureRecoveryAbstractTest.testPessimisticPrimaryNodeFailureRollback
>  hangs sometimes.
> --
>
> Key: IGNITE-10046
> URL: https://issues.apache.org/jira/browse/IGNITE-10046
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ilya Lantukh
>Priority: Major
> Attachments: 
> Ignite_Tests_2.4_Java_8_9_10_11_MVCC_Queries_1323.log(1).zip
>
>
> The following exception can be found in the log before the hang:
> {noformat}
> [14:51:43]W:   [org.apache.ignite:ignite-indexing] 
> java.lang.NullPointerException
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.processors.cache.GridCacheMvccManager.future(GridCacheMvccManager.java:754)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processPartitionCountersResponse(IgniteTxHandler.java:2204)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.access$1100(IgniteTxHandler.java:120)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$13.apply(IgniteTxHandler.java:276)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$13.apply(IgniteTxHandler.java:274)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1060)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:585)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:384)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:310)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:100)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:299)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1569)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1197)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:127)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1093)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:505)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> java.lang.Thread.run(Thread.java:748)
> {noformat}
> Full log is attached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-10046) MVCC: IgniteCachePrimaryNodeFailureRecoveryAbstractTest.testPessimisticPrimaryNodeFailureRollback hangs sometimes.

2018-10-29 Thread Ilya Lantukh (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Lantukh updated IGNITE-10046:
--
Issue Type: Bug  (was: Improvement)

> MVCC: 
> IgniteCachePrimaryNodeFailureRecoveryAbstractTest.testPessimisticPrimaryNodeFailureRollback
>  hangs sometimes.
> --
>
> Key: IGNITE-10046
> URL: https://issues.apache.org/jira/browse/IGNITE-10046
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ilya Lantukh
>Priority: Major
> Fix For: 2.7
>
> Attachments: 
> Ignite_Tests_2.4_Java_8_9_10_11_MVCC_Queries_1323.log(1).zip
>
>
> The following exception can be found in log before the hangup:
> {noformat}
> [14:51:43]W:   [org.apache.ignite:ignite-indexing] 
> java.lang.NullPointerException
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.processors.cache.GridCacheMvccManager.future(GridCacheMvccManager.java:754)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processPartitionCountersResponse(IgniteTxHandler.java:2204)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.access$1100(IgniteTxHandler.java:120)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$13.apply(IgniteTxHandler.java:276)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$13.apply(IgniteTxHandler.java:274)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1060)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:585)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:384)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:310)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:100)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:299)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1569)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1197)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:127)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1093)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:505)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
> [14:51:43]W:   [org.apache.ignite:ignite-indexing]at 
> java.lang.Thread.run(Thread.java:748)
> {noformat}
> Full log is attached.





[jira] [Created] (IGNITE-10046) MVCC: IgniteCachePrimaryNodeFailureRecoveryAbstractTest.testPessimisticPrimaryNodeFailureRollback hangs sometimes.

2018-10-29 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-10046:
-

 Summary: MVCC: 
IgniteCachePrimaryNodeFailureRecoveryAbstractTest.testPessimisticPrimaryNodeFailureRollback
 hangs sometimes.
 Key: IGNITE-10046
 URL: https://issues.apache.org/jira/browse/IGNITE-10046
 Project: Ignite
  Issue Type: Improvement
Reporter: Ilya Lantukh
 Fix For: 2.7
 Attachments: 
Ignite_Tests_2.4_Java_8_9_10_11_MVCC_Queries_1323.log(1).zip

The following exception can be found in log before the hangup:
{noformat}
[14:51:43]W: [org.apache.ignite:ignite-indexing] 
java.lang.NullPointerException
[14:51:43]W: [org.apache.ignite:ignite-indexing]at 
java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
[14:51:43]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.GridCacheMvccManager.future(GridCacheMvccManager.java:754)
[14:51:43]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processPartitionCountersResponse(IgniteTxHandler.java:2204)
[14:51:43]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.access$1100(IgniteTxHandler.java:120)
[14:51:43]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$13.apply(IgniteTxHandler.java:276)
[14:51:43]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$13.apply(IgniteTxHandler.java:274)
[14:51:43]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1060)
[14:51:43]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:585)
[14:51:43]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:384)
[14:51:43]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:310)
[14:51:43]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:100)
[14:51:43]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:299)
[14:51:43]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1569)
[14:51:43]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1197)
[14:51:43]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:127)
[14:51:43]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1093)
[14:51:43]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:505)
[14:51:43]W: [org.apache.ignite:ignite-indexing]at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
[14:51:43]W: [org.apache.ignite:ignite-indexing]at 
java.lang.Thread.run(Thread.java:748)
{noformat}
Full log is attached.





[jira] [Commented] (IGNITE-9558) Avoid changing AffinityTopologyVersion on client connect when possible

2018-10-27 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1095#comment-1095
 ] 

Ilya Lantukh commented on IGNITE-9558:
--

Done.

> Avoid changing AffinityTopologyVersion on client connect when possible
> --
>
> Key: IGNITE-9558
> URL: https://issues.apache.org/jira/browse/IGNITE-9558
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.0
>Reporter: Alexey Goncharuk
>Assignee: Ilya Lantukh
>Priority: Major
> Fix For: 2.8
>
>
> Currently a client join event changes the discovery topology version, which, in 
> turn, changes the AffinityTopologyVersion.
> When a client maps a transaction on a new AffinityTopologyVersion, the 
> corresponding message is not processed on a remote node until that node 
> receives the corresponding discovery event. If discovery event delivery is 
> delayed for some reason, this results in transaction stalls on client joins.
> Since a client node does not change partition affinity, we can safely map 
> transactions on the previous topology version and not change the affinity 
> topology version at all.
> Some cases need special care and probably do not qualify for this 
> optimization, such as when the client has a near cache or hosts partitions 
> for a REPLICATED cache.





[jira] [Commented] (IGNITE-9091) IEP-25: creating documentation

2018-10-26 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16665182#comment-16665182
 ] 

Ilya Lantukh commented on IGNITE-9091:
--

[~agoncharuk], yes, everything is correct.

> IEP-25: creating documentation
> --
>
> Key: IGNITE-9091
> URL: https://issues.apache.org/jira/browse/IGNITE-9091
> Project: Ignite
>  Issue Type: Task
>  Components: documentation
>Reporter: Alex Volkov
>Assignee: Alexey Goncharuk
>Priority: Major
>  Labels: iep-25
> Fix For: 2.8
>
>
> It would be great to have proper documentation for IEP-25:
> [https://cwiki.apache.org/confluence/display/IGNITE/IEP-25:+Partition+Map+Exchange+hangs+resolving]
>  
>  





[jira] [Commented] (IGNITE-9753) Control.sh validate index work long and with errors

2018-10-26 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16665155#comment-16665155
 ] 

Ilya Lantukh commented on IGNITE-9753:
--

Looks good.

> Control.sh validate index work long and with errors
> ---
>
> Key: IGNITE-9753
> URL: https://issues.apache.org/jira/browse/IGNITE-9753
> Project: Ignite
>  Issue Type: Bug
>  Components: general
>Affects Versions: 2.5
>Reporter: ARomantsov
>Assignee: Ivan Daschinskiy
>Priority: Major
> Fix For: 2.8
>
>
> Errors - 
> [12:19:54][:666] IndexValidationIssue [key=null, cacheName=cache_name_1, 
> idxName=_key_PK_hash], class java.lang.NullPointerException: null
> [12:19:54][:666] IndexValidationIssue [key=null, cacheName=cache_name_2, 
> idxName=_key_PK_hash], class java.lang.NullPointerException: null





[jira] [Updated] (IGNITE-9558) Avoid changing AffinityTopologyVersion on client connect when possible

2018-10-26 Thread Ilya Lantukh (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Lantukh updated IGNITE-9558:
-
Fix Version/s: 2.8

> Avoid changing AffinityTopologyVersion on client connect when possible
> --
>
> Key: IGNITE-9558
> URL: https://issues.apache.org/jira/browse/IGNITE-9558
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.0
>Reporter: Alexey Goncharuk
>Assignee: Ilya Lantukh
>Priority: Major
> Fix For: 2.8
>
>
> Currently a client join event changes the discovery topology version, which, in 
> turn, changes the AffinityTopologyVersion.
> When a client maps a transaction on a new AffinityTopologyVersion, the 
> corresponding message is not processed on a remote node until that node 
> receives the corresponding discovery event. If discovery event delivery is 
> delayed for some reason, this results in transaction stalls on client joins.
> Since a client node does not change partition affinity, we can safely map 
> transactions on the previous topology version and not change the affinity 
> topology version at all.
> Some cases need special care and probably do not qualify for this 
> optimization, such as when the client has a near cache or hosts partitions 
> for a REPLICATED cache.





[jira] [Commented] (IGNITE-9769) IgniteCacheAtomicProtocolTest.testPutReaderUpdate1 is flaky

2018-10-25 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16663626#comment-16663626
 ] 

Ilya Lantukh commented on IGNITE-9769:
--

[~SomeFire],
The name *equalsPrimaryAffNodes* is misleading, because it will be *true* if 
primary nodes are *not equal*. Also, I think it will be more readable if you 
simply add something like this to the *if* condition:
{noformat}
|| !dht.context().affinity().primaryByPartition(p, 
readyVer).equals(affNodes.get(0))
{noformat}

With this check added, you should be able to roll back the other changes you made 
in this PR, and the test should pass.
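The suggested condition can be captured in a small standalone helper. Everything below (the class name, plain node ids standing in for cluster nodes) is hypothetical; it is a sketch of the check being discussed, not Ignite's actual API:

```java
import java.util.List;

/** Hypothetical sketch of the topology-stability check discussed above. */
public class TopologyCheck {
    /**
     * The topology is considered stable only when the owner set equals the
     * affinity set AND the first affinity node (the primary) is also the
     * first owner, mirroring the extra primary-node comparison suggested above.
     */
    public static <T> boolean stable(List<T> affNodes, List<T> owners) {
        return !affNodes.isEmpty()
            && affNodes.size() == owners.size()
            && affNodes.containsAll(owners)
            && affNodes.get(0).equals(owners.get(0)); // primaries must match too
    }
}
```

Without the last comparison, two lists with the same members but a different head node (i.e. a different primary) would wrongly pass the check.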

> IgniteCacheAtomicProtocolTest.testPutReaderUpdate1 is flaky
> ---
>
> Key: IGNITE-9769
> URL: https://issues.apache.org/jira/browse/IGNITE-9769
> Project: Ignite
>  Issue Type: Task
>Reporter: Ryabov Dmitrii
>Assignee: Ryabov Dmitrii
>Priority: Trivial
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.8
>
>
> {{IgniteCacheAtomicProtocolTest.testPutReaderUpdate1}} and 
> {{IgniteCacheAtomicProtocolTest.testPutReaderUpdate2}} are flaky.
> In the {{#readerUpdateDhtFails}} method we block 
> {{GridDhtAtomicNearResponse}} messages and do a put operation. The put should 
> always hang, but sometimes it doesn't.





[jira] [Commented] (IGNITE-9673) Timeout in Java Client suite.

2018-10-24 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16662194#comment-16662194
 ] 

Ilya Lantukh commented on IGNITE-9673:
--

[~sergey-chugunov],

These changes affect only handlers related to Redis commands. They do not belong 
to any specific "stripe", so they shouldn't be executed by the striped sys pool.
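The starvation pattern from the thread dump quoted below can be reproduced in miniature: a task on a single-thread "stripe" that blocks on a result produced by another task queued on the same stripe can never finish. The class below is a hypothetical standalone illustration of that effect, not Ignite code:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration of striped-pool starvation (not Ignite code). */
public class StripeStarvationDemo {
    /**
     * Submits a task that blocks on a reply produced by a second task sent to
     * the same executor. With one thread, the reply task is queued behind the
     * blocked task, so the wait times out; with two threads, it completes.
     */
    public static boolean completes(ExecutorService stripe, long timeoutMs) throws Exception {
        CompletableFuture<String> reply = new CompletableFuture<>();
        Future<String> outer = stripe.submit(() -> {
            stripe.submit(() -> reply.complete("done")); // queued behind us on the stripe
            return reply.get(timeoutMs, TimeUnit.MILLISECONDS); // blocks the stripe thread
        });
        try {
            return "done".equals(outer.get(timeoutMs * 2, TimeUnit.MILLISECONDS));
        }
        catch (ExecutionException e) {
            return false; // inner reply.get() timed out -> starvation
        }
    }
}
```

This is why work that blocks on cache futures, like the Redis command handlers mentioned above, should not run on the striped system pool.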

> Timeout in Java Client suite.
> -
>
> Key: IGNITE-9673
> URL: https://issues.apache.org/jira/browse/IGNITE-9673
> Project: Ignite
>  Issue Type: Bug
>Reporter: Amelchev Nikita
>Assignee: Amelchev Nikita
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.8
>
> Attachments: ThreadDump.txt
>
>
> Example of timeout: [TC 
> build|[https://ci.ignite.apache.org/viewLog.html?buildId=1919405&tab=buildResultsDiv&buildTypeId=IgniteTests24Java8_JavaClient].]
> The possible reason is a non-interruptible future and starvation in the striped 
> pool:
> {noformat}
> "test-runner-#2440%redis.RedisProtocolStringSelfTest%" #3843 prio=5 os_prio=0 
> tid=0x7f8f053fb000 nid=0x7b19 waiting on condition [0x7f8d74f8f000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
>   at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
>   at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheAdapter$22.op(GridCacheAdapter.java:2465)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheAdapter$22.op(GridCacheAdapter.java:2463)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheAdapter.syncOp(GridCacheAdapter.java:4228)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheAdapter.put0(GridCacheAdapter.java:2463)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheAdapter.put(GridCacheAdapter.java:2444)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheAdapter.put(GridCacheAdapter.java:2421)
>   at 
> org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.put(IgniteCacheProxyImpl.java:1089)
>   at 
> org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.put(GatewayProtectedCacheProxy.java:820)
>   at 
> org.apache.ignite.internal.processors.rest.protocols.tcp.redis.RedisProtocolStringSelfTest.testStrlen(RedisProtocolStringSelfTest.java:310)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at junit.framework.TestCase.runTest(TestCase.java:176)
>   at 
> org.apache.ignite.testframework.junits.GridAbstractTest.runTestInternal(GridAbstractTest.java:2177)
>   at 
> org.apache.ignite.testframework.junits.GridAbstractTest.access$000(GridAbstractTest.java:143)
>   at 
> org.apache.ignite.testframework.junits.GridAbstractTest$5.run(GridAbstractTest.java:2092)
>   at java.lang.Thread.run(Thread.java:748)
> [grid-timeout-worker-#2323%redis.RedisProtocolStringSelfTest0%][G] >>> 
> Possible starvation in striped pool.
> Thread name: sys-stripe-3-#2304%redis.RedisProtocolStringSelfTest0%
> Queue: [Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, 
> topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, 
> msg=GridCacheIdMessage [cacheId=1481046058]GridDistributedBaseMessage 
> [ver=GridCacheVersion [topVer=148979816, order=1537499815759, nodeOrder=1], 
> committedVers=ArrayList [], rolledbackVers=ArrayList [], cnt=0, 
> super=]GridDistributedLockResponse 
> [futId=e739f1af561-9bc10183-74c7-4b9a-a525-aef32c002efc, err=null, 
> vals=ArrayList [null], super=]GridNearLockResponse [pending=ArrayList [], 
> miniId=1, dhtVers=GridCacheVersion[] [GridCacheVersion [topVer=0, order=0, 
> nodeOrder=0]], mappedVers=GridCacheVersion[] [GridCacheVersion 
> [topVer=148979816, order=1537499815760, nodeOrder=2]], clientRemapVer=null, 
> super=]]], Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, 
> topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, 
> msg=GridDistributedTxFinishResponse [txId=GridCacheVersion [topVer=148979816, 
> order=1537499815765, nodeOrder=1], 
> futId=e849f1af561-9bc10183-74c7-4b9a-a525-aef32c002efc, 
> part=-1]GridNearTxFinishResponse [err=null, miniId=1, nearThreadId=3843, 
> super=
> Deadlock: false
> Completed: 5
> "sys-stripe-3-#2304%redis.RedisProtocolStringSelfTest0%" #3628 prio=5 
> os_prio=0 tid=0x7f8f054e1800 nid=0x7a41 w

[jira] [Commented] (IGNITE-8020) Rebalancing for persistent caches should transfer file store over network instead of using existing supply/demand protocol

2018-10-22 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658757#comment-16658757
 ] 

Ilya Lantukh commented on IGNITE-8020:
--

Of course, you can!

> Rebalancing for persistent caches should transfer file store over network 
> instead of using existing supply/demand protocol
> --
>
> Key: IGNITE-8020
> URL: https://issues.apache.org/jira/browse/IGNITE-8020
> Project: Ignite
>  Issue Type: Improvement
>  Components: persistence
>Reporter: Ilya Lantukh
>Assignee: Ilya Lantukh
>Priority: Major
>  Labels: iep-16
>
> The existing rebalancing protocol is suitable for in-memory data storage, but for 
> data persisted in files it is sub-optimal and requires a lot of unnecessary 
> steps. Efforts to optimize it led to the necessity of completely reworking the 
> protocol: instead of sending batches (SupplyMessages) with cache entries, it 
> is possible to send data files directly.
> The algorithm should look like this:
> 1. The demander node sends requests with the required partition IDs (as now).
> 2. The supplier node receives the request and performs a checkpoint.
> 3. After the checkpoint is done, the supplier sends files with the demanded 
> partitions using the low-level NIO API.
> 4. During steps 2-3, the demander node should work in a special mode: it should 
> temporarily store all incoming updates in such a way that they can be quickly 
> applied later.
> 5. After the files are transferred, the demander applies the updates stored at 
> step 4.
> The tricky part here is switching the work modes of the demander node while 
> avoiding all possible race conditions. Also, the aforementioned algorithm should 
> be extended to transfer or rebuild query indexes.
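Step 3 of the algorithm above can be sketched in a minimal standalone form; the class and method names are hypothetical, not Ignite's actual implementation:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch: streaming a checkpointed partition file with NIO. */
public class PartitionFileSender {
    /** Streams the whole partition file into {@code dst}; returns bytes sent. */
    public static long send(Path partitionFile, WritableByteChannel dst) throws IOException {
        try (FileChannel src = FileChannel.open(partitionFile, StandardOpenOption.READ)) {
            long size = src.size();
            long sent = 0;
            // transferTo() may transfer fewer bytes than requested, so loop until done.
            while (sent < size)
                sent += src.transferTo(sent, size - sent, dst);
            return sent;
        }
    }
}
```

FileChannel.transferTo lets the OS copy file pages toward the destination channel without staging them in the Java heap, which is the point of sending partition files directly instead of per-entry supply messages.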





[jira] [Assigned] (IGNITE-9905) After transaction load cluster inconsistent

2018-10-19 Thread Ilya Lantukh (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Lantukh reassigned IGNITE-9905:


Assignee: Ilya Lantukh

> After transaction load cluster inconsistent
> ---
>
> Key: IGNITE-9905
> URL: https://issues.apache.org/jira/browse/IGNITE-9905
> Project: Ignite
>  Issue Type: Bug
>  Components: persistence
>Affects Versions: 2.7
>Reporter: ARomantsov
>Assignee: Ilya Lantukh
>Priority: Critical
>
> Loaded data into the cluster using transactions consisting of two gets / two 
> puts.
> Test env: one server, two server nodes, one client.
> {code:java}
> idle_verify check has finished, found 60 conflict partitions: 
> [counterConflicts=45, hashConflicts=15]
> Update counter conflicts:
> Conflict partition: PartitionKeyV2 [grpId=-1903385190, 
> grpName=CACHEGROUP_PARTICLE_1, partId=98]
> Partition instances: [PartitionHashRecordV2 [isPrimary=true, 
> consistentId=node2, updateCntr=1519, size=596, partHash=-1167688484], 
> PartitionHashRecordV2 [isPrimary=false, consistentId=node1, updateCntr=1520, 
> size=596, partHash=-1167688484]]
> Conflict partition: PartitionKeyV2 [grpId=-1903385190, 
> grpName=CACHEGROUP_PARTICLE_1, partId=34]
> Partition instances: [PartitionHashRecordV2 [isPrimary=false, 
> consistentId=node2, updateCntr=1539, size=596, partHash=-99631005], 
> PartitionHashRecordV2 [isPrimary=true, consistentId=node1, updateCntr=1537, 
> size=596, partHash=-1284437377]]
> Conflict partition: PartitionKeyV2 [grpId=770187303, 
> grpName=CACHEGROUP_PARTICLE_1, partId=31]
> Partition instances: [PartitionHashRecordV2 [isPrimary=true, 
> consistentId=node2, updateCntr=15, size=4, partHash=-1125172674], 
> PartitionHashRecordV2 [isPrimary=false, consistentId=node1, updateCntr=16, 
> size=4, partHash=-1125172674]]
> Conflict partition: PartitionKeyV2 [grpId=-1903385190, 
> grpName=CACHEGROUP_PARTICLE_1, partId=39]
> Partition instances: [PartitionHashRecordV2 [isPrimary=true, 
> consistentId=node2, updateCntr=1555, size=596, partHash=-40303136], 
> PartitionHashRecordV2 [isPrimary=false, consistentId=node1, updateCntr=1556, 
> size=596, partHash=-40303136]]
> Conflict partition: PartitionKeyV2 [grpId=-1903385190, 
> grpName=CACHEGROUP_PARTICLE_1, partId=90]
> Partition instances: [PartitionHashRecordV2 [isPrimary=false, 
> consistentId=node2, updateCntr=1557, size=596, partHash=-1295145299], 
> PartitionHashRecordV2 [isPrimary=true, consistentId=node1, updateCntr=1556, 
> size=596, partHash=-1221175703]]
> ...
> {code}





[jira] [Commented] (IGNITE-9769) IgniteCacheAtomicProtocolTest.testPutReaderUpdate1 is flaky

2018-10-19 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16656725#comment-16656725
 ] 

Ilya Lantukh commented on IGNITE-9769:
--

{noformat}
if (affNodesCnt != ownerNodesCnt || !affNodes.containsAll(owners) ||
    (waitEvicts && loc != null && loc.state() != GridDhtPartitionState.OWNING)) {
    if (i % 50 == 0)
        LT.warn(log(), "Waiting for topology map update [" +
            "igniteInstanceName=" + g.name() +
            ", cache=" + cfg.getName() +
            ", cacheId=" + dht.context().cacheId() +
            ", topVer=" + top.readyTopologyVersion() +
            ", p=" + p +
            ", affNodesCnt=" + affNodesCnt +
            ", ownersCnt=" + ownerNodesCnt +
            ", affNodes=" + F.nodeIds(affNodes) +
            ", owners=" + F.nodeIds(owners) +
            ", topFut=" + topFut +
            ", locNode=" + g.cluster().localNode() + ']');
{noformat}
Looks like we need to check not only that affNodes.containsAll(owners), but 
also that the order of the nodes in these collections (or at least of the first 
nodes, which are the primaries) is the same.

> IgniteCacheAtomicProtocolTest.testPutReaderUpdate1 is flaky
> ---
>
> Key: IGNITE-9769
> URL: https://issues.apache.org/jira/browse/IGNITE-9769
> Project: Ignite
>  Issue Type: Task
>Reporter: Ryabov Dmitrii
>Assignee: Ryabov Dmitrii
>Priority: Trivial
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.8
>
>
> {{IgniteCacheAtomicProtocolTest.testPutReaderUpdate1}} and 
> {{IgniteCacheAtomicProtocolTest.testPutReaderUpdate2}} are flaky.
> In the {{#readerUpdateDhtFails}} method we block 
> {{GridDhtAtomicNearResponse}} messages and do a put operation. The put should 
> always hang, but sometimes it doesn't.





[jira] [Updated] (IGNITE-9917) Write proper tests for start/stop client.

2018-10-17 Thread Ilya Lantukh (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Lantukh updated IGNITE-9917:
-
Description: 
Write proper tests for start/stop client. For now we only have "start client" 
tests that don't actually check blocking on exchange in a proper way.

To correctly block client-only PME, it is necessary to block Discovery messages 
for node join/left events from reaching some nodes in the cluster. This creates a 
situation where nodes have different AffinityTopologyVersion values, but affinity 
assignments are still the same. After 
https://issues.apache.org/jira/browse/IGNITE-9558 is done, all cache operations 
should work successfully on such a cluster.

  was:Write proper tests for start/stop client. For now we only have "start 
client" tests that don't actually check blocking on exchange in proper way.


> Write proper tests for start/stop client.
> -
>
> Key: IGNITE-9917
> URL: https://issues.apache.org/jira/browse/IGNITE-9917
> Project: Ignite
>  Issue Type: Sub-task
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>
> Write proper tests for start/stop client. For now we only have "start client" 
> tests that don't actually check blocking on exchange in a proper way.
> To correctly block client-only PME, it is necessary to block Discovery 
> messages for node join/left events from reaching some nodes in the cluster. 
> This creates a situation where nodes have different AffinityTopologyVersion 
> values, but affinity assignments are still the same. After 
> https://issues.apache.org/jira/browse/IGNITE-9558 is done, all cache 
> operations should work successfully on such a cluster.





[jira] [Commented] (IGNITE-9694) Add tests to check that reading queries are not blocked on exchange events that don't change data visibility

2018-10-17 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16653589#comment-16653589
 ] 

Ilya Lantukh commented on IGNITE-9694:
--

[~DmitriyGovorukhin] [~ibessonov],
We still don't have correct tests for client-only PME. Please either fix them 
in this ticket or create another one for that.

> Add tests to check that reading queries are not blocked on exchange events 
> that don't change data visibility
> 
>
> Key: IGNITE-9694
> URL: https://issues.apache.org/jira/browse/IGNITE-9694
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
> Fix For: 2.8
>
>
> In the current implementation there might be situations where a reading 
> operation waits for, for example, the exchange of a client join event. Such 
> events should not block read operations.
> In theory, the only operation that has to block reading (except for writing) 
> is "node left" for a server (or a baseline server in case of a persistent setup).
> The table shows the current state of blocking, covered by tests in this ticket:
>  
> Partitioned cache:
> || ||Start
>  Client||Stop
>  Client||Start
>  Server||Stop
>  Server||Start
>  Baseline||Stop
>  Baseline||Add
>  Baseline||Start
>  Cache||Stop
>  Cache||Create
>  Sql Index||Drop
>  Sql Index||
> |Get|   (/)|   (?)|   (x)|   (x)|     (x)|     (x)|     (/)|   (x)|   (x)|    
>   (/)|  (/)|
> |Get All|   (/)|   (?)|   (x)|   (x)|     (x)|     (x)|     (/)|   (x)|   
> (x)|      (/)|  (/)|
> |Scan|   (/)|   (?)|   (/)|   (/)|     (/)|     (/)|     (/)|   (/)|   (/)|   
>    (/)|      (/)|
> |Sql Query|   (/)|   (?)|   (x)|   (x)|     (/)|     (x)|     (?)|   (/)|   
> (/)|  (/)|      (/)|
> Replicated cache:
> || ||Start
>  Client||Stop
>  Client||Start
>  Server||Stop
>  Server||Start
>  Baseline||Stop
>  Baseline||Add
>  Baseline||Start
>  Cache||Stop
>  Cache||Create
>  Sql Index||Drop
>  Sql Index||
> |Get|   (/)|   (?)|   (x)|   (x)|     (x)|     (x)|     (/)|   (x)|   (x)|    
>   (/)|  (/)|
> |Get All|   (/)|   (?)|   (x)|   (x)|     (x)|     (x)|     (/)|   (x)|   
> (x)|      (/)|  (/)|
> |Scan|   (/)|   (?)|   (/)|   (/)|     (/)|     (/)|     (/)|   (/)|   (/)|   
>    (/)|      (/)|
> |Sql Query|   (/)|   (?)|   (/)|   (/)|     (/)|     (?)|     (/)|   (/)|   
> (/)|  (/)|      (/)|
>  





[jira] [Comment Edited] (IGNITE-9694) Add tests to check that reading queries are not blocked on exchange events that don't change data visibility

2018-10-17 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16653589#comment-16653589
 ] 

Ilya Lantukh edited comment on IGNITE-9694 at 10/17/18 2:00 PM:


[~DmitriyGovorukhin], [~ibessonov],
We still don't have correct tests for client-only PME. Please either fix them 
in this ticket or create another one for that.


was (Author: ilantukh):
[~DmitriyGovorukhin] [~ibessonov],
We still don't have correct tests for client-only PME. Please either fix them 
in this ticket or create another one for that.

> Add tests to check that reading queries are not blocked on exchange events 
> that don't change data visibility
> 
>
> Key: IGNITE-9694
> URL: https://issues.apache.org/jira/browse/IGNITE-9694
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
> Fix For: 2.8
>
>
> In the current implementation there might be situations where a reading 
> operation waits for, for example, the exchange of a client join event. Such 
> events should not block read operations.
> In theory, the only operation that has to block reading (except for writing) 
> is "node left" for a server (or a baseline server in case of a persistent setup).
> The table shows the current state of blocking, covered by tests in this ticket:
>  
> Partitioned cache:
> || ||Start
>  Client||Stop
>  Client||Start
>  Server||Stop
>  Server||Start
>  Baseline||Stop
>  Baseline||Add
>  Baseline||Start
>  Cache||Stop
>  Cache||Create
>  Sql Index||Drop
>  Sql Index||
> |Get|   (/)|   (?)|   (x)|   (x)|     (x)|     (x)|     (/)|   (x)|   (x)|    
>   (/)|  (/)|
> |Get All|   (/)|   (?)|   (x)|   (x)|     (x)|     (x)|     (/)|   (x)|   
> (x)|      (/)|  (/)|
> |Scan|   (/)|   (?)|   (/)|   (/)|     (/)|     (/)|     (/)|   (/)|   (/)|   
>    (/)|      (/)|
> |Sql Query|   (/)|   (?)|   (x)|   (x)|     (/)|     (x)|     (?)|   (/)|   
> (/)|  (/)|      (/)|
> Replicated cache:
> || ||Start
>  Client||Stop
>  Client||Start
>  Server||Stop
>  Server||Start
>  Baseline||Stop
>  Baseline||Add
>  Baseline||Start
>  Cache||Stop
>  Cache||Create
>  Sql Index||Drop
>  Sql Index||
> |Get|   (/)|   (?)|   (x)|   (x)|     (x)|     (x)|     (/)|   (x)|   (x)|    
>   (/)|  (/)|
> |Get All|   (/)|   (?)|   (x)|   (x)|     (x)|     (x)|     (/)|   (x)|   
> (x)|      (/)|  (/)|
> |Scan|   (/)|   (?)|   (/)|   (/)|     (/)|     (/)|     (/)|   (/)|   (/)|   
>    (/)|      (/)|
> |Sql Query|   (/)|   (?)|   (/)|   (/)|     (/)|     (?)|     (/)|   (/)|   
> (/)|  (/)|      (/)|
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-9868) Refactor [GridCachePartitionExchangeManager] Sending Full Message/Full Message creating

2018-10-17 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16653238#comment-16653238
 ] 

Ilya Lantukh commented on IGNITE-9868:
--

Thanks for the contribution! Looks good.

> Refactor [GridCachePartitionExchangeManager] Sending Full Message/Full 
> Message creating
> ---
>
> Key: IGNITE-9868
> URL: https://issues.apache.org/jira/browse/IGNITE-9868
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Sergey Antonov
>Assignee: Sergey Antonov
>Priority: Major
>
> Make messages more informative and, if possible, reduce the number of messages 
> in the log file.





[jira] [Comment Edited] (IGNITE-9868) Refactor [GridCachePartitionExchangeManager] Sending Full Message/Full Message creating

2018-10-17 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16653238#comment-16653238
 ] 

Ilya Lantukh edited comment on IGNITE-9868 at 10/17/18 9:21 AM:


Looks good. Thanks for the contribution!


was (Author: ilantukh):
Thanks for the contribution! Looks good.

> Refactor [GridCachePartitionExchangeManager] Sending Full Message/Full 
> Message creating
> ---
>
> Key: IGNITE-9868
> URL: https://issues.apache.org/jira/browse/IGNITE-9868
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Sergey Antonov
>Assignee: Sergey Antonov
>Priority: Major
>
> Make messages more informative and, if possible, reduce the number of messages 
> in the log file.





[jira] [Commented] (IGNITE-9794) Registration of a binary type with POJO field under topology lock leads to UnregisteredBinaryTypeException

2018-10-05 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16639794#comment-16639794
 ] 

Ilya Lantukh commented on IGNITE-9794:
--

[~dmekhanikov],

Thanks! Looks good.

> Registration of a binary type with POJO field under topology lock leads to 
> UnregisteredBinaryTypeException
> --
>
> Key: IGNITE-9794
> URL: https://issues.apache.org/jira/browse/IGNITE-9794
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.6
>Reporter: Denis Mekhanikov
>Assignee: Denis Mekhanikov
>Priority: Major
> Fix For: 2.8
>
> Attachments: BinaryMetadataRegistrationInsideEntryProcessorTest.java
>
>
> Please find attached test class with a reproducer.
> The exception was introduced in IGNITE-8926. Metadata registration should be 
> retried when this exception is thrown, but it doesn't happen.





[jira] [Commented] (IGNITE-9561) Optimize affinity initialization for started cache groups

2018-10-01 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16633898#comment-16633898
 ] 

Ilya Lantukh commented on IGNITE-9561:
--

[~Jokser], thanks for the contribution!

I have a few questions and remarks:
1. IgniteThrowableConsumer - what's the difference between this class and 
IgniteInClosureX?
2. CacheAffinitySharedManager.applyAffinityFromFullMessage - I didn't 
understand your comment. What do you mean by *pattern of code (nodesByOrder, 
affCache)*? Why can't I use it? Please write a more detailed comment.
3. CacheGroupDescriptor - why do you need equals and hashCode methods? At least 
fix codestyle for them.
4. IgniteUtils.doInParallel - I find it very inconvenient that, in case of an 
Exception, you won't have any information about which particular tasks failed.
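To illustrate point 4, here is a minimal sketch of the behavior I would expect: collect per-task failures so the caller knows exactly which inputs failed. All names here are illustrative, not the actual IgniteUtils.doInParallel API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelDemo {
    interface ThrowingConsumer<T> { void accept(T t) throws Exception; }

    /** Runs the body for each input in parallel and returns the inputs whose
     *  task threw, so the caller knows exactly which tasks failed. */
    static <T> List<T> runCollectingFailures(List<T> inputs, ThrowingConsumer<T> body)
        throws InterruptedException {
        ExecutorService exec = Executors.newFixedThreadPool(4);
        try {
            List<Future<?>> futs = new ArrayList<>();
            for (T in : inputs)
                futs.add(exec.submit(() -> { body.accept(in); return null; }));

            List<T> failed = new ArrayList<>();
            for (int i = 0; i < futs.size(); i++) {
                try {
                    futs.get(i).get();
                }
                catch (ExecutionException e) {
                    failed.add(inputs.get(i)); // remember which input failed
                }
            }
            return failed;
        }
        finally {
            exec.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Integer> failed = runCollectingFailures(List.of(1, 2, 3, 4),
            n -> { if (n % 2 == 0) throw new IllegalStateException("boom on " + n); });

        System.out.println("Failed inputs: " + failed); // prints: Failed inputs: [2, 4]
    }
}
```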

> Optimize affinity initialization for started cache groups
> -
>
> Key: IGNITE-9561
> URL: https://issues.apache.org/jira/browse/IGNITE-9561
> Project: Ignite
>  Issue Type: Improvement
>  Components: cache
>Affects Versions: 2.5
>Reporter: Pavel Kovalenko
>Assignee: Pavel Kovalenko
>Priority: Major
>  Labels: cache
> Fix For: 2.8
>
>
> At the end of
> {noformat}
> org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager#processCacheStartRequests
>  
> {noformat}
> method we initialize affinity for cache groups that start at the current 
> exchange.
> We do it one by one and synchronously wait for an AffinityFetchResponse for each 
> of the starting groups. This is inefficient. We may parallelize this process 
> and speed up the cache start procedure.
> NOTE: There are also a lot of affinity recalculation methods in: 
> {noformat}
> CacheAffinitySharedManager
> {noformat}
> which all iterate over cache groups and recalculate affinity for each of 
> them. We can easily speed up each of these methods by executing the affinity 
> re-calculation for each cache group in parallel.
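A rough sketch of the proposed direction, assuming hypothetical CacheGroup/recalculateAffinity names (not the real CacheAffinitySharedManager internals): submit one task per cache group and join them all, so the wall-clock time is bounded by the slowest group rather than the sum over all groups.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelAffinityDemo {
    /** Hypothetical stand-in for a cache group descriptor. */
    record CacheGroup(String name, int parts) {}

    /** Stand-in for the per-group affinity recalculation work. */
    static int recalculateAffinity(CacheGroup grp) {
        return grp.parts();
    }

    /** Recalculates affinity for all groups in parallel and returns the
     *  total number of processed partitions. */
    static int recalcAll(List<CacheGroup> grps) {
        ExecutorService exec = Executors.newFixedThreadPool(4);
        try {
            List<CompletableFuture<Integer>> futs = grps.stream()
                .map(g -> CompletableFuture.supplyAsync(() -> recalculateAffinity(g), exec))
                .toList();

            // Join all groups; total time is bounded by the slowest group.
            return futs.stream().mapToInt(CompletableFuture::join).sum();
        }
        finally {
            exec.shutdown();
        }
    }

    public static void main(String[] args) {
        int total = recalcAll(List.of(new CacheGroup("group-a", 1024), new CacheGroup("group-b", 512)));
        System.out.println(total); // prints: 1536
    }
}
```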





[jira] [Commented] (IGNITE-8619) Remote node could not start in ssh connection

2018-09-17 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-8619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617419#comment-16617419
 ] 

Ilya Lantukh commented on IGNITE-8619:
--

[~ivanan.fed], thanks for detailed explanation, but by "add comments" I meant 
comments in the source code, not in Jira.

> Remote node could not start in ssh connection
> -
>
> Key: IGNITE-8619
> URL: https://issues.apache.org/jira/browse/IGNITE-8619
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.6
>Reporter: Ivan Fedotov
>Assignee: Ivan Fedotov
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.7
>
>
> There is currently a problem with launching a remote node via SSH. The initial 
> assumption was that the remote process does not have enough time to write 
> information into the log: 
> [IGNITE-8085|https://issues.apache.org/jira/browse/IGNITE-8085]. But this 
> correction didn't fix [TeamCity 
> |https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=6814497542781613621&tab=testDetails]
>  (IgniteProjectionStartStopRestartSelfTest.testStartFiveNodesInTwoCalls). 
> So it is necessary to make launching a remote node via SSH always successful.





[jira] [Commented] (IGNITE-9392) CacheAsyncOperationsFailoverTxTest hangs on TC

2018-09-14 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16614799#comment-16614799
 ] 

Ilya Lantukh commented on IGNITE-9392:
--

Done.

> CacheAsyncOperationsFailoverTxTest hangs on TC
> --
>
> Key: IGNITE-9392
> URL: https://issues.apache.org/jira/browse/IGNITE-9392
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ilya Lantukh
>Assignee: Ilya Lantukh
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.7
>
>
> Exchange worker hangs while waiting for partition release:
> {code}
> [13:42:50] :   [Step 3/4] Thread 
> [name="exchange-worker-#214881%distributed.CacheAsyncOperationsFailoverTxTest0%",
>  id=245275, state=TIMED_WAITING, blockCnt=135, waitCnt=176]
> [13:42:50] :   [Step 3/4] at sun.misc.Unsafe.park(Native Method)
> [13:42:50] :   [Step 3/4] at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:338)
> [13:42:50] :   [Step 3/4] at 
> o.a.i.i.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:217)
> [13:42:50] :   [Step 3/4] at 
> o.a.i.i.util.future.GridFutureAdapter.get(GridFutureAdapter.java:159)
> [13:42:50] :   [Step 3/4] at 
> o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.waitPartitionRelease(GridDhtPartitionsExchangeFuture.java:1367)
> [13:42:50] :   [Step 3/4] at 
> o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1211)
> [13:42:50] :   [Step 3/4] at 
> o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:752)
> [13:42:50] :   [Step 3/4] at 
> o.a.i.i.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2525)
> [13:42:50] :   [Step 3/4] at 
> o.a.i.i.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2405)
> [13:42:50] :   [Step 3/4] at 
> o.a.i.i.util.worker.GridWorker.run(GridWorker.java:110)
> [13:42:50] :   [Step 3/4] at java.lang.Thread.run(Thread.java:748)
> {code}
> At that moment there are lots of pending transactions and one pending TX 
> finish future:
> {code}
> [13:43:14]W:   [org.apache.ignite:ignite-core] [2018-08-27 
> 10:43:14,632][WARN 
> ][exchange-worker-#214881%distributed.CacheAsyncOperationsFailoverTxTest0%][diagnostic]
>  Failed to wait for partition map exchange [topVer=AffinityTopologyVersion 
> [topVer=37, minorTopVer=0], node=98909049-bca4-4cba-b659-768ccfe0]. 
> Dumping pending objects that might be the cause: 
> [13:43:14]W:   [org.apache.ignite:ignite-core] [2018-08-27 
> 10:43:14,632][WARN 
> ][exchange-worker-#214881%distributed.CacheAsyncOperationsFailoverTxTest0%][diagnostic]
>  Ready affinity version: AffinityTopologyVersion [topVer=36, minorTopVer=0]
> [13:43:14]W:   [org.apache.ignite:ignite-core] [2018-08-27 
> 10:43:14,633][WARN 
> ][exchange-worker-#214881%distributed.CacheAsyncOperationsFailoverTxTest0%][diagnostic]
>  Last exchange future: GridDhtPartitionsExchangeFuture 
> [firstDiscoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode 
> [id=387bab47-9542-4d98-8dda-03e3a3f3, addrs=ArrayList [127.0.0.1], 
> sockAddrs=HashSet [/127.0.0.1:47503], discPort=47503, order=4, intOrder=4, 
> lastExchangeTime=1535366199108, loc=false, ver=2.7.0#20180824-sha1:b0175cf6, 
> isClient=false], topVer=37, nodeId8=98909049, msg=Node left: TcpDiscoveryNode 
> [id=387bab47-9542-4d98-8dda-03e3a3f3, addrs=ArrayList [127.0.0.1], 
> sockAddrs=HashSet [/127.0.0.1:47503], discPort=47503, order=4, intOrder=4, 
> lastExchangeTime=1535366199108, loc=false, ver=2.7.0#20180824-sha1:b0175cf6, 
> isClient=false], type=NODE_LEFT, tstamp=1535366275135], crd=TcpDiscoveryNode 
> [id=98909049-bca4-4cba-b659-768ccfe0, addrs=ArrayList [127.0.0.1], 
> sockAddrs=HashSet [/127.0.0.1:47500], discPort=47500, order=1, intOrder=1, 
> lastExchangeTime=1535366575460, loc=true, ver=2.7.0#20180824-sha1:b0175cf6, 
> isClient=false], exchId=GridDhtPartitionExchangeId 
> [topVer=AffinityTopologyVersion [topVer=37, minorTopVer=0], 
> discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode 
> [id=387bab47-9542-4d98-8dda-03e3a3f3, addrs=ArrayList [127.0.0.1], 
> sockAddrs=HashSet [/127.0.0.1:47503], discPort=47503, order=4, intOrder=4, 
> lastExchangeTime=1535366199108, loc=false, ver=2.7.0#20180824-sha1:b0175cf6, 
> isClient=false], topVer=37, nodeId8=98909049, msg=Node left: TcpDiscoveryNode 
> [id=387bab47-9542-4d98-8dda-03e3a3f3, addrs=ArrayList [127.0.0.1], 
> sockAddrs=HashSet [/127.0.0.1:47503], discPort=47503, order=4, intOrder=4, 
> lastExchangeTime=1535

[jira] [Commented] (IGNITE-9249) Tests hang when different threads try to start and stop nodes at the same time.

2018-09-13 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16613471#comment-16613471
 ] 

Ilya Lantukh commented on IGNITE-9249:
--

[~agoncharuk], I've pushed fix for join timeout handling into this PR, please 
merge it into master.

> Tests hang when different threads try to start and stop nodes at the same 
> time.
> ---
>
> Key: IGNITE-9249
> URL: https://issues.apache.org/jira/browse/IGNITE-9249
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ilya Lantukh
>Assignee: Ilya Lantukh
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
>
> An example of such test is 
> GridCachePartitionedNearDisabledOptimisticTxNodeRestartTest.testRestartWithPutFourNodesOneBackupsOffheapEvict().
> Hung threads:
> {code}
> "restart-worker-1@63424" prio=5 tid=0x7f5e nid=NA waiting
>   java.lang.Thread.State: WAITING
> at java.lang.Object.wait(Object.java:-1)
> at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:949)
> at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:389)
> at 
> org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:2002)
> at 
> org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:297)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:916)
> at 
> org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1754)
> at 
> org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1050)
> at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2020)
> at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1725)
> - locked <0xfc36> (a 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance)
> at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1153)
> at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:651)
> at 
> org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:920)
> at 
> org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:858)
> at 
> org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:846)
> at 
> org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:812)
> at 
> org.apache.ignite.internal.processors.cache.distributed.GridCacheAbstractNodeRestartSelfTest.access$1000(GridCacheAbstractNodeRestartSelfTest.java:64)
> at 
> org.apache.ignite.internal.processors.cache.distributed.GridCacheAbstractNodeRestartSelfTest$2.run(GridCacheAbstractNodeRestartSelfTest.java:665)
> at java.lang.Thread.run(Thread.java:748)
> "restart-worker-0@63423" prio=5 tid=0x7f5d nid=NA waiting
>   java.lang.Thread.State: WAITING
> at sun.misc.Unsafe.park(Unsafe.java:-1)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
> at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
> at 
> org.apache.ignite.internal.util.IgniteUtils.awaitQuiet(IgniteUtils.java:7584)
> at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.grid(IgnitionEx.java:1666)
> at 
> org.apache.ignite.internal.IgnitionEx.allGrids(IgnitionEx.java:1284)
> at 
> org.apache.ignite.internal.IgnitionEx.allGrids(IgnitionEx.java:1262)
> at org.apache.ignite.Ignition.allGrids(Ignition.java:502)
> at 
> org.apache.ignite.testframework.junits.GridAbstractTest.awaitTopologyChange(GridAbstractTest.java:2258)
> at 
> org.apache.ignite.testframework.junits.GridAbstractTest.stopGrid(GridAbstractTest.java:1158)
> at 
> org.apache.ignite.testframework.junits.GridAbstractTest.stopGrid(GridAbstractTest.java:1133)
> at 
> org.apache.ignite.testframework.junits.GridAbstractTest.stopGrid(GridAbstractTest.java:1433)
> at 
> org.apache.ignite.internal.processors.cache.distributed.GridCacheAbstractNodeRestartSelfTest.access$800(GridCacheAbstractNodeRestartSelfTest.java:64)
> at 
> org.apache.ignite.internal.processors.cache.distributed.GridCacheAbstractNodeRestartSelfTest$2.run

[jira] [Commented] (IGNITE-9249) Tests hang when different threads try to start and stop nodes at the same time.

2018-09-12 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16612277#comment-16612277
 ] 

Ilya Lantukh commented on IGNITE-9249:
--

https://ci.ignite.apache.org/viewQueued.html?itemId=1850938

> Tests hang when different threads try to start and stop nodes at the same 
> time.
> ---
>
> Key: IGNITE-9249
> URL: https://issues.apache.org/jira/browse/IGNITE-9249
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ilya Lantukh
>Assignee: Ilya Lantukh
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
>
> An example of such test is 
> GridCachePartitionedNearDisabledOptimisticTxNodeRestartTest.testRestartWithPutFourNodesOneBackupsOffheapEvict().
> Hung threads:
> {code}
> "restart-worker-1@63424" prio=5 tid=0x7f5e nid=NA waiting
>   java.lang.Thread.State: WAITING
> at java.lang.Object.wait(Object.java:-1)
> at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:949)
> at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:389)
> at 
> org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:2002)
> at 
> org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:297)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:916)
> at 
> org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1754)
> at 
> org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1050)
> at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2020)
> at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1725)
> - locked <0xfc36> (a 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance)
> at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1153)
> at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:651)
> at 
> org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:920)
> at 
> org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:858)
> at 
> org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:846)
> at 
> org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:812)
> at 
> org.apache.ignite.internal.processors.cache.distributed.GridCacheAbstractNodeRestartSelfTest.access$1000(GridCacheAbstractNodeRestartSelfTest.java:64)
> at 
> org.apache.ignite.internal.processors.cache.distributed.GridCacheAbstractNodeRestartSelfTest$2.run(GridCacheAbstractNodeRestartSelfTest.java:665)
> at java.lang.Thread.run(Thread.java:748)
> "restart-worker-0@63423" prio=5 tid=0x7f5d nid=NA waiting
>   java.lang.Thread.State: WAITING
> at sun.misc.Unsafe.park(Unsafe.java:-1)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
> at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
> at 
> org.apache.ignite.internal.util.IgniteUtils.awaitQuiet(IgniteUtils.java:7584)
> at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.grid(IgnitionEx.java:1666)
> at 
> org.apache.ignite.internal.IgnitionEx.allGrids(IgnitionEx.java:1284)
> at 
> org.apache.ignite.internal.IgnitionEx.allGrids(IgnitionEx.java:1262)
> at org.apache.ignite.Ignition.allGrids(Ignition.java:502)
> at 
> org.apache.ignite.testframework.junits.GridAbstractTest.awaitTopologyChange(GridAbstractTest.java:2258)
> at 
> org.apache.ignite.testframework.junits.GridAbstractTest.stopGrid(GridAbstractTest.java:1158)
> at 
> org.apache.ignite.testframework.junits.GridAbstractTest.stopGrid(GridAbstractTest.java:1133)
> at 
> org.apache.ignite.testframework.junits.GridAbstractTest.stopGrid(GridAbstractTest.java:1433)
> at 
> org.apache.ignite.internal.processors.cache.distributed.GridCacheAbstractNodeRestartSelfTest.access$800(GridCacheAbstractNodeRestartSelfTest.java:64)
> at 
> org.apache.ignite.internal.processors.cache.distributed.GridCacheAbstractNodeRestartSelfTest$2.run(GridCacheAbstractNodeRestartSelfTest.jav

[jira] [Commented] (IGNITE-9558) Avoid changing AffinityTopologyVersion on client connect when possible

2018-09-12 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16612152#comment-16612152
 ] 

Ilya Lantukh commented on IGNITE-9558:
--

It looks like special handling for near caches isn't required.

Also, it should be theoretically possible to make similar optimizations for 
join/leave events of server nodes that are not in baseline.

> Avoid changing AffinityTopologyVersion on client connect when possible
> --
>
> Key: IGNITE-9558
> URL: https://issues.apache.org/jira/browse/IGNITE-9558
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.0
>Reporter: Alexey Goncharuk
>Assignee: Ilya Lantukh
>Priority: Major
>
> Currently a client join event changes the discovery topology version, which, in 
> turn, changes the AffinityTopologyVersion.
> When a client maps a transaction on a new AffinityTopologyVersion, the corresponding 
> message is not processed on a remote node until that node receives the 
> corresponding discovery event. If discovery event delivery is delayed for 
> some reason, this will result in transaction stalls on client joins.
> Since a client node does not change partition affinity, we can safely map 
> transactions on the previous topology version and not change the affinity 
> topology version at all.
> Some cases need special care and probably do not qualify for this 
> optimization, such as when the client has a near cache or the client hosts partitions 
> for a REPLICATED cache.
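The decision described above can be sketched as follows; the event names and helpers are purely illustrative, not Ignite's internal API.

```java
public class AffinityVersionDemo {
    enum NodeEvent { CLIENT_JOIN, CLIENT_LEAVE, SERVER_JOIN, SERVER_LEAVE }

    /** A plain client join/leave does not move partitions, so transactions can
     *  keep mapping on the previous version; the special cases from the
     *  description (near cache, REPLICATED partitions on the client) still
     *  force a version bump. */
    static boolean changesAffinity(NodeEvent evt, boolean hasNearCache, boolean hostsReplicated) {
        if (evt == NodeEvent.CLIENT_JOIN || evt == NodeEvent.CLIENT_LEAVE)
            return hasNearCache || hostsReplicated;

        return true; // server join/leave always recalculates affinity
    }

    static long nextAffinityVersion(long cur, NodeEvent evt, boolean nearCache, boolean replicated) {
        return changesAffinity(evt, nearCache, replicated) ? cur + 1 : cur;
    }

    public static void main(String[] args) {
        System.out.println(nextAffinityVersion(36, NodeEvent.CLIENT_JOIN, false, false));  // prints: 36
        System.out.println(nextAffinityVersion(36, NodeEvent.SERVER_LEAVE, false, false)); // prints: 37
    }
}
```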





[jira] [Assigned] (IGNITE-9558) Avoid changing AffinityTopologyVersion on client connect when possible

2018-09-12 Thread Ilya Lantukh (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Lantukh reassigned IGNITE-9558:


Assignee: Ilya Lantukh

> Avoid changing AffinityTopologyVersion on client connect when possible
> --
>
> Key: IGNITE-9558
> URL: https://issues.apache.org/jira/browse/IGNITE-9558
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.0
>Reporter: Alexey Goncharuk
>Assignee: Ilya Lantukh
>Priority: Major
>
> Currently a client join event changes the discovery topology version, which, in 
> turn, changes the AffinityTopologyVersion.
> When a client maps a transaction on a new AffinityTopologyVersion, the corresponding 
> message is not processed on a remote node until that node receives the 
> corresponding discovery event. If discovery event delivery is delayed for 
> some reason, this will result in transaction stalls on client joins.
> Since a client node does not change partition affinity, we can safely map 
> transactions on the previous topology version and not change the affinity 
> topology version at all.
> Some cases need special care and probably do not qualify for this 
> optimization, such as when the client has a near cache or the client hosts partitions 
> for a REPLICATED cache.





[jira] [Commented] (IGNITE-9249) Tests hang when different threads try to start and stop nodes at the same time.

2018-09-11 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16610545#comment-16610545
 ] 

Ilya Lantukh commented on IGNITE-9249:
--

https://ci.ignite.apache.org/viewQueued.html?itemId=1841812

> Tests hang when different threads try to start and stop nodes at the same 
> time.
> ---
>
> Key: IGNITE-9249
> URL: https://issues.apache.org/jira/browse/IGNITE-9249
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ilya Lantukh
>Assignee: Ilya Lantukh
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
>
> An example of such test is 
> GridCachePartitionedNearDisabledOptimisticTxNodeRestartTest.testRestartWithPutFourNodesOneBackupsOffheapEvict().
> Hung threads:
> {code}
> "restart-worker-1@63424" prio=5 tid=0x7f5e nid=NA waiting
>   java.lang.Thread.State: WAITING
> at java.lang.Object.wait(Object.java:-1)
> at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:949)
> at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:389)
> at 
> org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:2002)
> at 
> org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:297)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:916)
> at 
> org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1754)
> at 
> org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1050)
> at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2020)
> at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1725)
> - locked <0xfc36> (a 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance)
> at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1153)
> at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:651)
> at 
> org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:920)
> at 
> org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:858)
> at 
> org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:846)
> at 
> org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:812)
> at 
> org.apache.ignite.internal.processors.cache.distributed.GridCacheAbstractNodeRestartSelfTest.access$1000(GridCacheAbstractNodeRestartSelfTest.java:64)
> at 
> org.apache.ignite.internal.processors.cache.distributed.GridCacheAbstractNodeRestartSelfTest$2.run(GridCacheAbstractNodeRestartSelfTest.java:665)
> at java.lang.Thread.run(Thread.java:748)
> "restart-worker-0@63423" prio=5 tid=0x7f5d nid=NA waiting
>   java.lang.Thread.State: WAITING
> at sun.misc.Unsafe.park(Unsafe.java:-1)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
> at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
> at 
> org.apache.ignite.internal.util.IgniteUtils.awaitQuiet(IgniteUtils.java:7584)
> at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.grid(IgnitionEx.java:1666)
> at 
> org.apache.ignite.internal.IgnitionEx.allGrids(IgnitionEx.java:1284)
> at 
> org.apache.ignite.internal.IgnitionEx.allGrids(IgnitionEx.java:1262)
> at org.apache.ignite.Ignition.allGrids(Ignition.java:502)
> at 
> org.apache.ignite.testframework.junits.GridAbstractTest.awaitTopologyChange(GridAbstractTest.java:2258)
> at 
> org.apache.ignite.testframework.junits.GridAbstractTest.stopGrid(GridAbstractTest.java:1158)
> at 
> org.apache.ignite.testframework.junits.GridAbstractTest.stopGrid(GridAbstractTest.java:1133)
> at 
> org.apache.ignite.testframework.junits.GridAbstractTest.stopGrid(GridAbstractTest.java:1433)
> at 
> org.apache.ignite.internal.processors.cache.distributed.GridCacheAbstractNodeRestartSelfTest.access$800(GridCacheAbstractNodeRestartSelfTest.java:64)
> at 
> org.apache.ignite.internal.processors.cache.distributed.GridCacheAbstractNodeRestartSelfTest$2.run(GridCacheAbstractNodeRestartSelfTest.jav

[jira] [Commented] (IGNITE-8619) Remote node could not start in ssh connection

2018-09-10 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-8619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609360#comment-16609360
 ] 

Ilya Lantukh commented on IGNITE-8619:
--

[~ivanan.fed], I've reviewed your PR from a Java developer perspective. It's 
not obvious how and why your solution fixes the problem, please refactor it to 
be more self-explanatory or add comments.

> Remote node could not start in ssh connection
> -
>
> Key: IGNITE-8619
> URL: https://issues.apache.org/jira/browse/IGNITE-8619
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.6
>Reporter: Ivan Fedotov
>Assignee: Ivan Fedotov
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.7
>
>
> There is currently a problem with launching a remote node via SSH. The initial 
> assumption was that the remote process does not have enough time to write 
> information into the log: 
> [IGNITE-8085|https://issues.apache.org/jira/browse/IGNITE-8085]. But this 
> correction didn't fix [TeamCity 
> |https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=6814497542781613621&tab=testDetails]
>  (IgniteProjectionStartStopRestartSelfTest.testStartFiveNodesInTwoCalls). 
> So it is necessary to make launching a remote node via SSH always successful.





[jira] [Resolved] (IGNITE-8830) Non-coordinator node unable to finish local exchange should detect it and stop

2018-09-10 Thread Ilya Lantukh (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-8830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Lantukh resolved IGNITE-8830.
--
Resolution: Duplicate

> Non-coordinator node unable to finish local exchange should detect it and stop
> --
>
> Key: IGNITE-8830
> URL: https://issues.apache.org/jira/browse/IGNITE-8830
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Reporter: Sergey Chugunov
>Priority: Major
>  Labels: iep-25
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> At the final stage of Partition Map Exchange the coordinator node sends the 
> full partition map to all other nodes.
> If some node fails to apply this message and finish its local exchange, it 
> won't be able to operate correctly.
> To prevent this, the node should be able to check the status of this exchange 
> on the coordinator (by sending some diagnostic message). If the exchange has 
> already finished on the coordinator, the node should stop.





[jira] [Commented] (IGNITE-8830) Non-coordinator node unable to finish local exchange should detect it and stop

2018-09-10 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-8830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609262#comment-16609262
 ] 

Ilya Lantukh commented on IGNITE-8830:
--

Implemented in the scope of IGNITE-8828.

> Non-coordinator node unable to finish local exchange should detect it and stop
> --
>
> Key: IGNITE-8830
> URL: https://issues.apache.org/jira/browse/IGNITE-8830
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Reporter: Sergey Chugunov
>Priority: Major
>  Labels: iep-25
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> At the final stage of Partition Map Exchange coordinator node sends full 
> partitions map to all other nodes.
> If some node fails to apply this message and finish its local exchange, it 
> won't be able to operate correctly.
> To prevent this, the node should be able to check the status of this exchange 
> on the coordinator (by sending some diagnostic message). If the exchange has 
> already finished on the coordinator, the node should stop.





[jira] [Commented] (IGNITE-8832) Detecting hanging of coordinator during processing local partition maps messages

2018-09-10 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-8832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609254#comment-16609254
 ] 

Ilya Lantukh commented on IGNITE-8832:
--

This situation is already covered by IGNITE-6890 and IGNITE-8828.

> Detecting hanging of coordinator during processing local partition maps 
> messages
> 
>
> Key: IGNITE-8832
> URL: https://issues.apache.org/jira/browse/IGNITE-8832
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Reporter: Sergey Chugunov
>Priority: Major
>  Labels: iep-25
>   Original Estimate: 264h
>  Remaining Estimate: 264h
>
> After the coordinator has gathered local partition maps from all other server 
> nodes, it should prepare the full partition map and send it back.
> If for some reason (e.g. a bug in the code) the coordinator fails to prepare 
> the full map, it should detect this situation and shut itself down.





[jira] [Resolved] (IGNITE-8832) Detecting hanging of coordinator during processing local partition maps messages

2018-09-10 Thread Ilya Lantukh (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-8832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Lantukh resolved IGNITE-8832.
--
Resolution: Not A Problem

> Detecting hanging of coordinator during processing local partition maps 
> messages
> 
>
> Key: IGNITE-8832
> URL: https://issues.apache.org/jira/browse/IGNITE-8832
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Reporter: Sergey Chugunov
>Priority: Major
>  Labels: iep-25
>   Original Estimate: 264h
>  Remaining Estimate: 264h
>
> After the coordinator has gathered local partition maps from all other server 
> nodes, it should prepare the full partition map and send it back.
> If for some reason (e.g. a bug in the code) the coordinator fails to prepare 
> the full map, it should detect this situation and shut itself down.





[jira] [Commented] (IGNITE-9475) Closures that have been created on client do not provide real class name to TASK_* permissions

2018-09-07 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16607210#comment-16607210
 ] 

Ilya Lantukh commented on IGNITE-9475:
--

Thanks for contribution, changes look good to me.

> Closures that have been created on client do not provide real class name to 
> TASK_* permissions
> ---
>
> Key: IGNITE-9475
> URL: https://issues.apache.org/jira/browse/IGNITE-9475
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
> Fix For: 2.7
>
>
> A broadcast job, for example, gets the 
> org.apache.ignite.internal.processors.closure.GridClosureProcessor$T6 task 
> name.
> This combination of Java + XML config
> {code:java}
> ignite.compute(ignite.cluster().forServers()).broadcast(new 
> DistributedJob(cacheName));
> {code}
> {code:java}
>{
>
> task:'org.apache.ignite.piclient.operations.DistributedChecksumOperation$DistributedJob',
>
> permissions:[TASK_EXECUTE]
>},
> {code}
> produces the following error
> {code:java}
> Authorization failed [perm=TASK_EXECUTE, 
> name=org.apache.ignite.internal.processors.closure.GridClosureProcessor$T6, 
> ... ]
> {code}
>  
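The failure mode above can be demonstrated without Ignite: closures compiled as anonymous classes (or wrapped by the compute machinery, as with `GridClosureProcessor$T6`) report synthetic runtime class names, so a permission entry keyed on the user-visible class name never matches. A minimal, self-contained sketch of that mismatch (class names here are illustrative, not Ignite's actual API):

```java
public class TaskNameDemo {
    interface Job { void run(); }

    // A named nested class keeps a stable, predictable runtime name.
    static class NamedJob implements Job {
        public void run() { /* no-op */ }
    }

    /** Mirrors what a permission check sees: the runtime class name. */
    static String taskName(Job job) {
        return job.getClass().getName();
    }

    public static void main(String[] args) {
        Job named = new NamedJob();
        Job anon = new Job() { public void run() { /* no-op */ } };

        System.out.println(taskName(named)); // TaskNameDemo$NamedJob
        System.out.println(taskName(anon));  // TaskNameDemo$1 -- synthetic name
    }
}
```

A permission list that names `DistributedChecksumOperation$DistributedJob` can therefore never match the synthetic wrapper name the authorization check actually receives.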





[jira] [Comment Edited] (IGNITE-9419) Avoid saving cache configuration synchronously during PME

2018-09-06 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16605879#comment-16605879
 ] 

Ilya Lantukh edited comment on IGNITE-9419 at 9/6/18 3:48 PM:
--

[~Jokser],
Thanks for contribution!

In my opinion, adding public-mutable 
GridDhtPartitionsExchangeFuture.registerCachesFuture breaks encapsulation. Is 
it possible to re-design your solution to make this field modifiable only by 
ExchangeFuture itself?

Also, please add tests for this functionality.


was (Author: ilantukh):
[~Jokser],
Thanks for contribution!

In my opinion, adding publicly-mutable 
GridDhtPartitionsExchangeFuture.registerCachesFuture breaks encapsulation. Is 
it possible to re-design your solution to make this field modifiable only by 
ExchangeFuture itself?

Also, please add tests for this functionality.

> Avoid saving cache configuration synchronously during PME
> -
>
> Key: IGNITE-9419
> URL: https://issues.apache.org/jira/browse/IGNITE-9419
> Project: Ignite
>  Issue Type: Improvement
>  Components: cache
>Affects Versions: 2.5
>Reporter: Pavel Kovalenko
>Assignee: Pavel Kovalenko
>Priority: Major
> Fix For: 2.7
>
>
> Currently, we save cache configuration during PME at the activation phase 
> here 
> org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.CachesInfo#updateCachesInfo
>  . We should avoid this, as it performs operations to a disk. We should save 
> it asynchronously or lazy.
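The suggested direction (moving the configuration write off the PME critical path) can be sketched generically: perform the disk write on a separate I/O executor and return a future that later stages can wait on. This is purely illustrative, not the Ignite implementation; the class and method names are assumptions:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncConfigSaver {
    // A dedicated single-threaded pool keeps disk I/O off the caller's thread.
    private final ExecutorService ioPool = Executors.newSingleThreadExecutor();

    /** Schedules the write instead of blocking the caller (e.g. the PME thread). */
    CompletableFuture<Void> saveAsync(Path file, byte[] cfgBytes) {
        return CompletableFuture.runAsync(() -> {
            try {
                Files.write(file, cfgBytes);
            } catch (Exception e) {
                throw new RuntimeException("Failed to save configuration", e);
            }
        }, ioPool);
    }

    void shutdown() { ioPool.shutdown(); }
}
```

Any stage that must observe the configuration on disk can then join the returned future, instead of every exchange paying the write latency synchronously.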





[jira] [Issue Comment Deleted] (IGNITE-9084) Trash in WAL after node stop may affect WAL rebalance

2018-09-06 Thread Ilya Lantukh (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Lantukh updated IGNITE-9084:
-
Comment: was deleted

(was: [~Jokser],
Thanks for contribution!

I've reviewed your PR, you can find my comments here: 
https://reviews.ignite.apache.org/ignite/review/IGNT-CR-760.)

> Trash in WAL after node stop may affect WAL rebalance
> -
>
> Key: IGNITE-9084
> URL: https://issues.apache.org/jira/browse/IGNITE-9084
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Affects Versions: 2.6
>Reporter: Pavel Kovalenko
>Assignee: Pavel Kovalenko
>Priority: Major
> Fix For: 2.7
>
>
> During iteration over the WAL we can encounter trash in a WAL segment, which 
> can remain after node restart. We should handle this situation in the WAL 
> rebalance iterator and gracefully stop the iteration process.
> {noformat}
> [2018-07-25 
> 17:18:21,152][ERROR][sys-#25385%persistence.IgnitePdsTxHistoricalRebalancingTest0%][GridCacheIoManager]
>  Failed to process message [senderId=f0d35df7-ff93-4b6c-b699-45f3e7c3, 
> messageType=class 
> o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemandMessage]
> class org.apache.ignite.IgniteException: Failed to read WAL record at 
> position: 19346739 size: 67108864
>   at 
> org.apache.ignite.internal.util.lang.GridIteratorAdapter.next(GridIteratorAdapter.java:38)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$WALHistoricalIterator.advance(GridCacheOffheapManager.java:1033)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$WALHistoricalIterator.next(GridCacheOffheapManager.java:948)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$WALHistoricalIterator.nextX(GridCacheOffheapManager.java:917)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$WALHistoricalIterator.nextX(GridCacheOffheapManager.java:842)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.IgniteRebalanceIteratorImpl.nextX(IgniteRebalanceIteratorImpl.java:130)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.IgniteRebalanceIteratorImpl.next(IgniteRebalanceIteratorImpl.java:185)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.IgniteRebalanceIteratorImpl.next(IgniteRebalanceIteratorImpl.java:37)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionSupplier.handleDemandMessage(GridDhtPartitionSupplier.java:348)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.handleDemandMessage(GridDhtPreloader.java:370)
>   at 
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:380)
>   at 
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:365)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1056)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:581)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$700(GridCacheIoManager.java:101)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager$OrderedMessageListener.onMessage(GridCacheIoManager.java:1613)
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager.access$4100(GridIoManager.java:125)
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager$GridCommunicationMessageSet.unwind(GridIoManager.java:2752)
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager.unwindMessageSet(GridIoManager.java:1516)
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager.access$4400(GridIoManager.java:125)
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager$10.run(GridIoManager.java:1485)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: class org.apache.ignite.IgniteCheckedException: Failed to read WAL 
> record at position: 19346739 size: 67108864
>   at 
> org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.handleRecordException(AbstractWalRec

[jira] [Commented] (IGNITE-9084) Trash in WAL after node stop may affect WAL rebalance

2018-09-06 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16605924#comment-16605924
 ] 

Ilya Lantukh commented on IGNITE-9084:
--

[~Jokser],
Thanks for contribution!

I've reviewed your PR, you can find my comments here: 
https://reviews.ignite.apache.org/ignite/review/IGNT-CR-760.

> Trash in WAL after node stop may affect WAL rebalance
> -
>
> Key: IGNITE-9084
> URL: https://issues.apache.org/jira/browse/IGNITE-9084
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Affects Versions: 2.6
>Reporter: Pavel Kovalenko
>Assignee: Pavel Kovalenko
>Priority: Major
> Fix For: 2.7
>
>
> During iteration over the WAL we can encounter trash in a WAL segment, which 
> can remain after node restart. We should handle this situation in the WAL 
> rebalance iterator and gracefully stop the iteration process.
> {noformat}
> [2018-07-25 
> 17:18:21,152][ERROR][sys-#25385%persistence.IgnitePdsTxHistoricalRebalancingTest0%][GridCacheIoManager]
>  Failed to process message [senderId=f0d35df7-ff93-4b6c-b699-45f3e7c3, 
> messageType=class 
> o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemandMessage]
> class org.apache.ignite.IgniteException: Failed to read WAL record at 
> position: 19346739 size: 67108864
>   at 
> org.apache.ignite.internal.util.lang.GridIteratorAdapter.next(GridIteratorAdapter.java:38)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$WALHistoricalIterator.advance(GridCacheOffheapManager.java:1033)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$WALHistoricalIterator.next(GridCacheOffheapManager.java:948)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$WALHistoricalIterator.nextX(GridCacheOffheapManager.java:917)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$WALHistoricalIterator.nextX(GridCacheOffheapManager.java:842)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.IgniteRebalanceIteratorImpl.nextX(IgniteRebalanceIteratorImpl.java:130)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.IgniteRebalanceIteratorImpl.next(IgniteRebalanceIteratorImpl.java:185)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.IgniteRebalanceIteratorImpl.next(IgniteRebalanceIteratorImpl.java:37)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionSupplier.handleDemandMessage(GridDhtPartitionSupplier.java:348)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.handleDemandMessage(GridDhtPreloader.java:370)
>   at 
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:380)
>   at 
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:365)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1056)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:581)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$700(GridCacheIoManager.java:101)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager$OrderedMessageListener.onMessage(GridCacheIoManager.java:1613)
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager.access$4100(GridIoManager.java:125)
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager$GridCommunicationMessageSet.unwind(GridIoManager.java:2752)
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager.unwindMessageSet(GridIoManager.java:1516)
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager.access$4400(GridIoManager.java:125)
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager$10.run(GridIoManager.java:1485)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: class org.apache.ignite.IgniteCheckedException: Failed to read WAL 
> record at position: 19346739 size: 67108864
>   at 
> org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.handleRecor

[jira] [Commented] (IGNITE-9419) Avoid saving cache configuration synchronously during PME

2018-09-06 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16605879#comment-16605879
 ] 

Ilya Lantukh commented on IGNITE-9419:
--

[~Jokser],
Thanks for contribution!

In my opinion, adding publicly-mutable 
GridDhtPartitionsExchangeFuture.registerCachesFuture breaks encapsulation. Is 
it possible to re-design your solution to make this field modifiable only by 
ExchangeFuture itself?

Also, please add tests for this functionality.

> Avoid saving cache configuration synchronously during PME
> -
>
> Key: IGNITE-9419
> URL: https://issues.apache.org/jira/browse/IGNITE-9419
> Project: Ignite
>  Issue Type: Improvement
>  Components: cache
>Affects Versions: 2.5
>Reporter: Pavel Kovalenko
>Assignee: Pavel Kovalenko
>Priority: Major
> Fix For: 2.7
>
>
> Currently, we save cache configuration during PME at the activation phase 
> here 
> org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.CachesInfo#updateCachesInfo
>  . We should avoid this, as it performs operations to a disk. We should save 
> it asynchronously or lazy.





[jira] [Commented] (IGNITE-9472) REST API has no permission checks for cluster activation/deactivation

2018-09-05 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16604357#comment-16604357
 ] 

Ilya Lantukh commented on IGNITE-9472:
--

[~ibessonov], thanks for contribution!

Please add test for this functionality.
Also, you should run your PR on Ignite TeamCity (https://ci.ignite.apache.org/) 
before moving the ticket to the "Patch Available" state, even if the changes 
look trivial.

> REST API has no permission checks for cluster activation/deactivation
> -
>
> Key: IGNITE-9472
> URL: https://issues.apache.org/jira/browse/IGNITE-9472
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>
> ADMIN_OPS permission should be required for CLUSTER_ACTIVE / CLUSTER_INACTIVE 
> commands. This has to be done in the GridRestProcessor.authorize method.
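The shape of the requested check can be modeled without the Ignite codebase: map the activation/deactivation REST commands to an admin-level permission and reject requests lacking it. A self-contained sketch under those assumptions (the enum values and method signature are illustrative, not Ignite's actual API):

```java
import java.util.EnumSet;
import java.util.Set;

public class RestAuthDemo {
    enum Command { CLUSTER_ACTIVE, CLUSTER_INACTIVE, CACHE_GET }
    enum Permission { ADMIN_OPS, CACHE_READ }

    // Commands that change cluster state require the admin permission.
    static final Set<Command> ADMIN_COMMANDS =
        EnumSet.of(Command.CLUSTER_ACTIVE, Command.CLUSTER_INACTIVE);

    /** Returns true if the caller's granted permissions allow the command. */
    static boolean authorize(Command cmd, Set<Permission> granted) {
        if (ADMIN_COMMANDS.contains(cmd))
            return granted.contains(Permission.ADMIN_OPS);

        return true; // non-admin commands are checked elsewhere
    }
}
```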





[jira] [Commented] (IGNITE-9068) Node fails to stop when CacheObjectBinaryProcessor.addMeta() is executed inside guard()/unguard()

2018-09-03 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16602213#comment-16602213
 ] 

Ilya Lantukh commented on IGNITE-9068:
--

We need a full thread dump from all nodes if such a situation happens again.

> Node fails to stop when CacheObjectBinaryProcessor.addMeta() is executed 
> inside guard()/unguard()
> -
>
> Key: IGNITE-9068
> URL: https://issues.apache.org/jira/browse/IGNITE-9068
> Project: Ignite
>  Issue Type: Bug
>  Components: binary, managed services
>Affects Versions: 2.5
>Reporter: Ilya Kasnacheev
>Assignee: Ilya Lantukh
>Priority: Blocker
>  Labels: test
> Fix For: 2.7
>
> Attachments: GridServiceDeadlockTest.java, MyService.java
>
>
> When addMeta is called, e.g. during service deployment, it is executed inside 
> guard()/unguard().
> If the node is stopped at this point, Ignite.stop() will hang.
> Consider the following thread dump:
> {code}
> "Thread-1" #57 prio=5 os_prio=0 tid=0x7f7780005000 nid=0x7f26 runnable 
> [0x7f766cbef000]
>java.lang.Thread.State: TIMED_WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0005cb7b0468> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:934)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1247)
>   at 
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1115)
>   at 
> org.apache.ignite.internal.util.StripedCompositeReadWriteLock$WriteLock.tryLock(StripedCompositeReadWriteLock.java:220)
>   at 
> org.apache.ignite.internal.GridKernalGatewayImpl.tryWriteLock(GridKernalGatewayImpl.java:143)
> // Waiting for lock to cancel futures of BinaryMetadataTransport
>   at org.apache.ignite.internal.IgniteKernal.stop0(IgniteKernal.java:2171)
>   at org.apache.ignite.internal.IgniteKernal.stop(IgniteKernal.java:2094)
>   at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2545)
>   - locked <0x0005cb423f00> (a 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance)
>   at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2508)
>   at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.run(IgnitionEx.java:2033)
> "test-runner-#1%service.GridServiceDeadlockTest%" #13 prio=5 os_prio=0 
> tid=0x7f77b87d5800 nid=0x7eb8 waiting on condition [0x7f778cdfc000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
> // May never return if there's discovery problems
>   at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
>   at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
>   at 
> org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.addMeta(CacheObjectBinaryProcessorImpl.java:463)
>   at 
> org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl$2.addMeta(CacheObjectBinaryProcessorImpl.java:188)
>   at 
> org.apache.ignite.internal.binary.BinaryContext.registerUserClassDescriptor(BinaryContext.java:802)
>   at 
> org.apache.ignite.internal.binary.BinaryContext.registerClassDescriptor(BinaryContext.java:761)
>   at 
> org.apache.ignite.internal.binary.BinaryContext.descriptorForClass(BinaryContext.java:627)
>   at 
> org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal0(BinaryWriterExImpl.java:174)
>   at 
> org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:157)
>   at 
> org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:144)
>   at 
> org.apache.ignite.internal.binary.GridBinaryMarshaller.marshal(GridBinaryMarshaller.java:254)
>   at 
> org.apache.ignite.internal.binary.BinaryMarshaller.marshal0(BinaryMarshaller.java:82)
>   at 
> org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.marshal(AbstractNodeNameAwareMarshaller.java:58)
>   at 
> org.apache.ignite.internal.util.IgniteUtils.marshal(IgniteUtils.java:10069)
>   at 
> org.apache.ignite.internal.processors.service.GridServiceProcessor.prepareServiceConfigurations(GridServiceProcessor.java:570)
>   at 
> org.apache.ignite.internal.processo

[jira] [Commented] (IGNITE-8286) ScanQuery ignore setLocal with non local partition

2018-08-31 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-8286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16598756#comment-16598756
 ] 

Ilya Lantukh commented on IGNITE-8286:
--

[~agoncharuk], I think we can merge it into master and include it in 2.7 now.

> ScanQuery ignore setLocal with non local partition
> --
>
> Key: IGNITE-8286
> URL: https://issues.apache.org/jira/browse/IGNITE-8286
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.4
>Reporter: Alexander Belyak
>Assignee: Roman Shtykh
>Priority: Major
> Fix For: 2.7
>
>
> 1) Create partitioned cache on 2+ nodes cluster
> 2) Select some partition N, local node should not be OWNER of partition N
> 3) execute: cache.query(new ScanQuery<>().setLocal(true).setPartition(N))
> Expected result:
> empty result (probably with logging something like "Trying to execute local 
> query with non-local partition N") or even throw an exception
> Actual result:
> executing the query (with ScanQueryFallbackClosableIterator) on a remote node.
> The problem is that we execute a local query on a remote node.
> The same behaviour can occur if we get an empty node list from 
> GridCacheQueryAdapter.node() for any reason, for example if we run a "local" 
> query from a non-data node for the given cache (see 
> GridDiscoveryManager.cacheAffinityNode(ClusterNode node, String cacheName) in 
> GridCacheQueryAdapter.executeScanQuery())
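The expected guard can be sketched independently of Ignite internals: before running a partition-scoped local query, check whether the local node actually owns that partition, and return an empty result (with a warning) otherwise. A hedged, self-contained model of that behavior (names and the ownership set are illustrative assumptions, not Ignite's API):

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

public class LocalScanGuard {
    /**
     * Models the expected fix: a "local" scan over a partition the node does
     * not own yields an empty result instead of silently falling back to a
     * remote node.
     */
    static <T> List<T> localScan(Set<Integer> ownedParts, int part, List<T> partData) {
        if (!ownedParts.contains(part)) {
            System.out.println(
                "Trying to execute local query with non-local partition " + part);
            return Collections.emptyList();
        }

        return partData;
    }

    public static void main(String[] args) {
        Set<Integer> owned = Set.of(0, 1, 2); // partitions this node owns

        System.out.println(localScan(owned, 1, List.of("a", "b")).size()); // 2
        System.out.println(localScan(owned, 7, List.of("a", "b")).size()); // 0
    }
}
```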





[jira] [Comment Edited] (IGNITE-9093) IgniteDbPutGetWithCacheStoreTest.testReadThrough fails every time when run on master

2018-08-30 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16597456#comment-16597456
 ] 

Ilya Lantukh edited comment on IGNITE-9093 at 8/30/18 1:34 PM:
---

[~ilyak],
This is not a big issue and your fix looks OK. We do not store data in the WAL 
that was read from an external store using the read-through mechanics.


was (Author: ilantukh):
[~ilyak],
This is not a big issue and your fix looks OK.

> IgniteDbPutGetWithCacheStoreTest.testReadThrough fails every time when run on 
> master
> 
>
> Key: IGNITE-9093
> URL: https://issues.apache.org/jira/browse/IGNITE-9093
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.7
>Reporter: Ilya Kasnacheev
>Assignee: Ilya Kasnacheev
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
>
> Such as in 
> https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_Pds1&branch=pull%2F4420%2Fhead&tab=buildTypeStatusDiv
> Used to work every time in 2.6 release.





[jira] [Commented] (IGNITE-9093) IgniteDbPutGetWithCacheStoreTest.testReadThrough fails every time when run on master

2018-08-30 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16597456#comment-16597456
 ] 

Ilya Lantukh commented on IGNITE-9093:
--

[~ilyak],
This is not a big issue and your fix looks OK.

> IgniteDbPutGetWithCacheStoreTest.testReadThrough fails every time when run on 
> master
> 
>
> Key: IGNITE-9093
> URL: https://issues.apache.org/jira/browse/IGNITE-9093
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.7
>Reporter: Ilya Kasnacheev
>Assignee: Ilya Kasnacheev
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
>
> Such as in 
> https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_Pds1&branch=pull%2F4420%2Fhead&tab=buildTypeStatusDiv
> Used to work every time in 2.6 release.





[jira] [Commented] (IGNITE-8286) ScanQuery ignore setLocal with non local partition

2018-08-30 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-8286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16597347#comment-16597347
 ] 

Ilya Lantukh commented on IGNITE-8286:
--

[~roman_s], thanks!

There is still one minor issue - the new field is now immutable, but is still 
marked as *volatile* instead of *final*.

> ScanQuery ignore setLocal with non local partition
> --
>
> Key: IGNITE-8286
> URL: https://issues.apache.org/jira/browse/IGNITE-8286
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.4
>Reporter: Alexander Belyak
>Assignee: Roman Shtykh
>Priority: Major
> Fix For: 2.7
>
>
> 1) Create partitioned cache on 2+ nodes cluster
> 2) Select some partition N, local node should not be OWNER of partition N
> 3) execute: cache.query(new ScanQuery<>().setLocal(true).setPartition(N))
> Expected result:
> empty result (probably with logging something like "Trying to execute local 
> query with non-local partition N") or even throw an exception
> Actual result:
> executing the query (with ScanQueryFallbackClosableIterator) on a remote node.
> The problem is that we execute a local query on a remote node.
> The same behaviour can occur if we get an empty node list from 
> GridCacheQueryAdapter.node() for any reason, for example if we run a "local" 
> query from a non-data node for the given cache (see 
> GridDiscoveryManager.cacheAffinityNode(ClusterNode node, String cacheName) in 
> GridCacheQueryAdapter.executeScanQuery())





[jira] [Commented] (IGNITE-9326) IgniteCacheFailedUpdateResponseTest hangs in master

2018-08-29 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596604#comment-16596604
 ] 

Ilya Lantukh commented on IGNITE-9326:
--

Looks good.

> IgniteCacheFailedUpdateResponseTest hangs in master
> ---
>
> Key: IGNITE-9326
> URL: https://issues.apache.org/jira/browse/IGNITE-9326
> Project: Ignite
>  Issue Type: Test
>Reporter: Alexey Goncharuk
>Assignee: Alexey Goncharuk
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.7
>
>
> The test started to hang after IGNITE-8926 was merged.
> The reason is that the entry processor result started to be lazily serialized 
> during the message send, which results in a failure handler invocation. 
> However, the test checks that the exception is rethrown to the user.
> The issue affects only ATOMIC caches.
> One of the possible fixes is to marshal the result after the topology lock is 
> released.
> Muting the test in master for now.





[jira] [Created] (IGNITE-9400) TC bot: add progress bar to history page

2018-08-28 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-9400:


 Summary: TC bot: add progress bar to history page
 Key: IGNITE-9400
 URL: https://issues.apache.org/jira/browse/IGNITE-9400
 Project: Ignite
  Issue Type: Improvement
Reporter: Ilya Lantukh


The history page (like */all.html?branch=master*) takes a significant amount of 
time to load, and it would be helpful to replace the spinning wheel with a 
progress bar.





[jira] [Updated] (IGNITE-9392) CacheAsyncOperationsFailoverTxTest hangs on TC

2018-08-28 Thread Ilya Lantukh (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Lantukh updated IGNITE-9392:
-
Labels: MakeTeamcityGreenAgain  (was: )

> CacheAsyncOperationsFailoverTxTest hangs on TC
> --
>
> Key: IGNITE-9392
> URL: https://issues.apache.org/jira/browse/IGNITE-9392
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ilya Lantukh
>Assignee: Ilya Lantukh
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
>
> Exchange worker hangs while waiting for partition release:
> {code}
> [13:42:50] :   [Step 3/4] Thread 
> [name="exchange-worker-#214881%distributed.CacheAsyncOperationsFailoverTxTest0%",
>  id=245275, state=TIMED_WAITING, blockCnt=135, waitCnt=176]
> [13:42:50] :   [Step 3/4] at sun.misc.Unsafe.park(Native Method)
> [13:42:50] :   [Step 3/4] at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:338)
> [13:42:50] :   [Step 3/4] at 
> o.a.i.i.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:217)
> [13:42:50] :   [Step 3/4] at 
> o.a.i.i.util.future.GridFutureAdapter.get(GridFutureAdapter.java:159)
> [13:42:50] :   [Step 3/4] at 
> o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.waitPartitionRelease(GridDhtPartitionsExchangeFuture.java:1367)
> [13:42:50] :   [Step 3/4] at 
> o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1211)
> [13:42:50] :   [Step 3/4] at 
> o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:752)
> [13:42:50] :   [Step 3/4] at 
> o.a.i.i.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2525)
> [13:42:50] :   [Step 3/4] at 
> o.a.i.i.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2405)
> [13:42:50] :   [Step 3/4] at 
> o.a.i.i.util.worker.GridWorker.run(GridWorker.java:110)
> [13:42:50] :   [Step 3/4] at java.lang.Thread.run(Thread.java:748)
> {code}
> At that moment there are lots of pending transactions and one pending TX 
> finish future:
> {code}
> [13:43:14]W:   [org.apache.ignite:ignite-core] [2018-08-27 
> 10:43:14,632][WARN 
> ][exchange-worker-#214881%distributed.CacheAsyncOperationsFailoverTxTest0%][diagnostic]
>  Failed to wait for partition map exchange [topVer=AffinityTopologyVersion 
> [topVer=37, minorTopVer=0], node=98909049-bca4-4cba-b659-768ccfe0]. 
> Dumping pending objects that might be the cause: 
> [13:43:14]W:   [org.apache.ignite:ignite-core] [2018-08-27 
> 10:43:14,632][WARN 
> ][exchange-worker-#214881%distributed.CacheAsyncOperationsFailoverTxTest0%][diagnostic]
>  Ready affinity version: AffinityTopologyVersion [topVer=36, minorTopVer=0]
> [13:43:14]W:   [org.apache.ignite:ignite-core] [2018-08-27 
> 10:43:14,633][WARN 
> ][exchange-worker-#214881%distributed.CacheAsyncOperationsFailoverTxTest0%][diagnostic]
>  Last exchange future: GridDhtPartitionsExchangeFuture 
> [firstDiscoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode 
> [id=387bab47-9542-4d98-8dda-03e3a3f3, addrs=ArrayList [127.0.0.1], 
> sockAddrs=HashSet [/127.0.0.1:47503], discPort=47503, order=4, intOrder=4, 
> lastExchangeTime=1535366199108, loc=false, ver=2.7.0#20180824-sha1:b0175cf6, 
> isClient=false], topVer=37, nodeId8=98909049, msg=Node left: TcpDiscoveryNode 
> [id=387bab47-9542-4d98-8dda-03e3a3f3, addrs=ArrayList [127.0.0.1], 
> sockAddrs=HashSet [/127.0.0.1:47503], discPort=47503, order=4, intOrder=4, 
> lastExchangeTime=1535366199108, loc=false, ver=2.7.0#20180824-sha1:b0175cf6, 
> isClient=false], type=NODE_LEFT, tstamp=1535366275135], crd=TcpDiscoveryNode 
> [id=98909049-bca4-4cba-b659-768ccfe0, addrs=ArrayList [127.0.0.1], 
> sockAddrs=HashSet [/127.0.0.1:47500], discPort=47500, order=1, intOrder=1, 
> lastExchangeTime=1535366575460, loc=true, ver=2.7.0#20180824-sha1:b0175cf6, 
> isClient=false], exchId=GridDhtPartitionExchangeId 
> [topVer=AffinityTopologyVersion [topVer=37, minorTopVer=0], 
> discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode 
> [id=387bab47-9542-4d98-8dda-03e3a3f3, addrs=ArrayList [127.0.0.1], 
> sockAddrs=HashSet [/127.0.0.1:47503], discPort=47503, order=4, intOrder=4, 
> lastExchangeTime=1535366199108, loc=false, ver=2.7.0#20180824-sha1:b0175cf6, 
> isClient=false], topVer=37, nodeId8=98909049, msg=Node left: TcpDiscoveryNode 
> [id=387bab47-9542-4d98-8dda-03e3a3f3, addrs=ArrayList [127.0.0.1], 
> sockAddrs=HashSet [/127.0.0.1:47503], discPort=47503, order=4, intOrder=4, 
> lastExchangeTime=1535366199108, loc=false, ver=2.7.0#20180824-sha1:b01

[jira] [Updated] (IGNITE-9392) CacheAsyncOperationsFailoverTxTest hangs on TC

2018-08-28 Thread Ilya Lantukh (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Lantukh updated IGNITE-9392:
-
Ignite Flags:   (was: Docs Required)

> CacheAsyncOperationsFailoverTxTest hangs on TC
> --
>
> Key: IGNITE-9392
> URL: https://issues.apache.org/jira/browse/IGNITE-9392
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ilya Lantukh
>Assignee: Ilya Lantukh
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
>
> Exchange worker hangs while waiting for partition release:
> {code}
> [13:42:50] :   [Step 3/4] Thread 
> [name="exchange-worker-#214881%distributed.CacheAsyncOperationsFailoverTxTest0%",
>  id=245275, state=TIMED_WAITING, blockCnt=135, waitCnt=176]
> [13:42:50] :   [Step 3/4] at sun.misc.Unsafe.park(Native Method)
> [13:42:50] :   [Step 3/4] at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:338)
> [13:42:50] :   [Step 3/4] at 
> o.a.i.i.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:217)
> [13:42:50] :   [Step 3/4] at 
> o.a.i.i.util.future.GridFutureAdapter.get(GridFutureAdapter.java:159)
> [13:42:50] :   [Step 3/4] at 
> o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.waitPartitionRelease(GridDhtPartitionsExchangeFuture.java:1367)
> [13:42:50] :   [Step 3/4] at 
> o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1211)
> [13:42:50] :   [Step 3/4] at 
> o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:752)
> [13:42:50] :   [Step 3/4] at 
> o.a.i.i.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2525)
> [13:42:50] :   [Step 3/4] at 
> o.a.i.i.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2405)
> [13:42:50] :   [Step 3/4] at 
> o.a.i.i.util.worker.GridWorker.run(GridWorker.java:110)
> [13:42:50] :   [Step 3/4] at java.lang.Thread.run(Thread.java:748)
> {code}
> At that moment there are lots of pending transactions and one pending TX 
> finish future:
> {code}
> [13:43:14]W:   [org.apache.ignite:ignite-core] [2018-08-27 
> 10:43:14,632][WARN 
> ][exchange-worker-#214881%distributed.CacheAsyncOperationsFailoverTxTest0%][diagnostic]
>  Failed to wait for partition map exchange [topVer=AffinityTopologyVersion 
> [topVer=37, minorTopVer=0], node=98909049-bca4-4cba-b659-768ccfe0]. 
> Dumping pending objects that might be the cause: 
> [13:43:14]W:   [org.apache.ignite:ignite-core] [2018-08-27 
> 10:43:14,632][WARN 
> ][exchange-worker-#214881%distributed.CacheAsyncOperationsFailoverTxTest0%][diagnostic]
>  Ready affinity version: AffinityTopologyVersion [topVer=36, minorTopVer=0]
> [13:43:14]W:   [org.apache.ignite:ignite-core] [2018-08-27 
> 10:43:14,633][WARN 
> ][exchange-worker-#214881%distributed.CacheAsyncOperationsFailoverTxTest0%][diagnostic]
>  Last exchange future: GridDhtPartitionsExchangeFuture 
> [firstDiscoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode 
> [id=387bab47-9542-4d98-8dda-03e3a3f3, addrs=ArrayList [127.0.0.1], 
> sockAddrs=HashSet [/127.0.0.1:47503], discPort=47503, order=4, intOrder=4, 
> lastExchangeTime=1535366199108, loc=false, ver=2.7.0#20180824-sha1:b0175cf6, 
> isClient=false], topVer=37, nodeId8=98909049, msg=Node left: TcpDiscoveryNode 
> [id=387bab47-9542-4d98-8dda-03e3a3f3, addrs=ArrayList [127.0.0.1], 
> sockAddrs=HashSet [/127.0.0.1:47503], discPort=47503, order=4, intOrder=4, 
> lastExchangeTime=1535366199108, loc=false, ver=2.7.0#20180824-sha1:b0175cf6, 
> isClient=false], type=NODE_LEFT, tstamp=1535366275135], crd=TcpDiscoveryNode 
> [id=98909049-bca4-4cba-b659-768ccfe0, addrs=ArrayList [127.0.0.1], 
> sockAddrs=HashSet [/127.0.0.1:47500], discPort=47500, order=1, intOrder=1, 
> lastExchangeTime=1535366575460, loc=true, ver=2.7.0#20180824-sha1:b0175cf6, 
> isClient=false], exchId=GridDhtPartitionExchangeId 
> [topVer=AffinityTopologyVersion [topVer=37, minorTopVer=0], 
> discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode 
> [id=387bab47-9542-4d98-8dda-03e3a3f3, addrs=ArrayList [127.0.0.1], 
> sockAddrs=HashSet [/127.0.0.1:47503], discPort=47503, order=4, intOrder=4, 
> lastExchangeTime=1535366199108, loc=false, ver=2.7.0#20180824-sha1:b0175cf6, 
> isClient=false], topVer=37, nodeId8=98909049, msg=Node left: TcpDiscoveryNode 
> [id=387bab47-9542-4d98-8dda-03e3a3f3, addrs=ArrayList [127.0.0.1], 
> sockAddrs=HashSet [/127.0.0.1:47503], discPort=47503, order=4, intOrder=4, 
> lastExchangeTime=1535366199108, loc=false, ver=2.7.0#20180824-sha1:b0175c

[jira] [Commented] (IGNITE-9392) CacheAsyncOperationsFailoverTxTest hangs on TC

2018-08-28 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16594903#comment-16594903
 ] 

Ilya Lantukh commented on IGNITE-9392:
--

It looks like setting a TX timeout resolves this issue.

> CacheAsyncOperationsFailoverTxTest hangs on TC
> --
>
> Key: IGNITE-9392
> URL: https://issues.apache.org/jira/browse/IGNITE-9392
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ilya Lantukh
>Assignee: Ilya Lantukh
>Priority: Major
>
> Exchange worker hangs while waiting for partition release:
> {code}
> [13:42:50] :   [Step 3/4] Thread 
> [name="exchange-worker-#214881%distributed.CacheAsyncOperationsFailoverTxTest0%",
>  id=245275, state=TIMED_WAITING, blockCnt=135, waitCnt=176]
> [13:42:50] :   [Step 3/4] at sun.misc.Unsafe.park(Native Method)
> [13:42:50] :   [Step 3/4] at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:338)
> [13:42:50] :   [Step 3/4] at 
> o.a.i.i.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:217)
> [13:42:50] :   [Step 3/4] at 
> o.a.i.i.util.future.GridFutureAdapter.get(GridFutureAdapter.java:159)
> [13:42:50] :   [Step 3/4] at 
> o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.waitPartitionRelease(GridDhtPartitionsExchangeFuture.java:1367)
> [13:42:50] :   [Step 3/4] at 
> o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1211)
> [13:42:50] :   [Step 3/4] at 
> o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:752)
> [13:42:50] :   [Step 3/4] at 
> o.a.i.i.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2525)
> [13:42:50] :   [Step 3/4] at 
> o.a.i.i.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2405)
> [13:42:50] :   [Step 3/4] at 
> o.a.i.i.util.worker.GridWorker.run(GridWorker.java:110)
> [13:42:50] :   [Step 3/4] at java.lang.Thread.run(Thread.java:748)
> {code}
> At that moment there are lots of pending transactions and one pending TX 
> finish future:
> {code}
> [13:43:14]W:   [org.apache.ignite:ignite-core] [2018-08-27 
> 10:43:14,632][WARN 
> ][exchange-worker-#214881%distributed.CacheAsyncOperationsFailoverTxTest0%][diagnostic]
>  Failed to wait for partition map exchange [topVer=AffinityTopologyVersion 
> [topVer=37, minorTopVer=0], node=98909049-bca4-4cba-b659-768ccfe0]. 
> Dumping pending objects that might be the cause: 
> [13:43:14]W:   [org.apache.ignite:ignite-core] [2018-08-27 
> 10:43:14,632][WARN 
> ][exchange-worker-#214881%distributed.CacheAsyncOperationsFailoverTxTest0%][diagnostic]
>  Ready affinity version: AffinityTopologyVersion [topVer=36, minorTopVer=0]
> [13:43:14]W:   [org.apache.ignite:ignite-core] [2018-08-27 
> 10:43:14,633][WARN 
> ][exchange-worker-#214881%distributed.CacheAsyncOperationsFailoverTxTest0%][diagnostic]
>  Last exchange future: GridDhtPartitionsExchangeFuture 
> [firstDiscoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode 
> [id=387bab47-9542-4d98-8dda-03e3a3f3, addrs=ArrayList [127.0.0.1], 
> sockAddrs=HashSet [/127.0.0.1:47503], discPort=47503, order=4, intOrder=4, 
> lastExchangeTime=1535366199108, loc=false, ver=2.7.0#20180824-sha1:b0175cf6, 
> isClient=false], topVer=37, nodeId8=98909049, msg=Node left: TcpDiscoveryNode 
> [id=387bab47-9542-4d98-8dda-03e3a3f3, addrs=ArrayList [127.0.0.1], 
> sockAddrs=HashSet [/127.0.0.1:47503], discPort=47503, order=4, intOrder=4, 
> lastExchangeTime=1535366199108, loc=false, ver=2.7.0#20180824-sha1:b0175cf6, 
> isClient=false], type=NODE_LEFT, tstamp=1535366275135], crd=TcpDiscoveryNode 
> [id=98909049-bca4-4cba-b659-768ccfe0, addrs=ArrayList [127.0.0.1], 
> sockAddrs=HashSet [/127.0.0.1:47500], discPort=47500, order=1, intOrder=1, 
> lastExchangeTime=1535366575460, loc=true, ver=2.7.0#20180824-sha1:b0175cf6, 
> isClient=false], exchId=GridDhtPartitionExchangeId 
> [topVer=AffinityTopologyVersion [topVer=37, minorTopVer=0], 
> discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode 
> [id=387bab47-9542-4d98-8dda-03e3a3f3, addrs=ArrayList [127.0.0.1], 
> sockAddrs=HashSet [/127.0.0.1:47503], discPort=47503, order=4, intOrder=4, 
> lastExchangeTime=1535366199108, loc=false, ver=2.7.0#20180824-sha1:b0175cf6, 
> isClient=false], topVer=37, nodeId8=98909049, msg=Node left: TcpDiscoveryNode 
> [id=387bab47-9542-4d98-8dda-03e3a3f3, addrs=ArrayList [127.0.0.1], 
> sockAddrs=HashSet [/127.0.0.1:47503], discPort=47503, order=4, intOrder=4, 
> lastExchangeTime=1535366199108, loc=false, ver=2.7.

[jira] [Created] (IGNITE-9392) CacheAsyncOperationsFailoverTxTest hangs on TC

2018-08-27 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-9392:


 Summary: CacheAsyncOperationsFailoverTxTest hangs on TC
 Key: IGNITE-9392
 URL: https://issues.apache.org/jira/browse/IGNITE-9392
 Project: Ignite
  Issue Type: Bug
Reporter: Ilya Lantukh
Assignee: Ilya Lantukh








[jira] [Commented] (IGNITE-9249) Tests hang when different threads try to start and stop nodes at the same time.

2018-08-10 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16576286#comment-16576286
 ] 

Ilya Lantukh commented on IGNITE-9249:
--

https://ci.ignite.apache.org/viewQueued.html?itemId=1628472

> Tests hang when different threads try to start and stop nodes at the same 
> time.
> ---
>
> Key: IGNITE-9249
> URL: https://issues.apache.org/jira/browse/IGNITE-9249
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ilya Lantukh
>Assignee: Ilya Lantukh
>Priority: Major
>
> An example of such a test is 
> GridCachePartitionedNearDisabledOptimisticTxNodeRestartTest.testRestartWithPutFourNodesOneBackupsOffheapEvict().
> Hung threads:
> {code}
> "restart-worker-1@63424" prio=5 tid=0x7f5e nid=NA waiting
>   java.lang.Thread.State: WAITING
> at java.lang.Object.wait(Object.java:-1)
> at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:949)
> at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:389)
> at 
> org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:2002)
> at 
> org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:297)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:916)
> at 
> org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1754)
> at 
> org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1050)
> at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2020)
> at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1725)
> - locked <0xfc36> (a 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance)
> at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1153)
> at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:651)
> at 
> org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:920)
> at 
> org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:858)
> at 
> org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:846)
> at 
> org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:812)
> at 
> org.apache.ignite.internal.processors.cache.distributed.GridCacheAbstractNodeRestartSelfTest.access$1000(GridCacheAbstractNodeRestartSelfTest.java:64)
> at 
> org.apache.ignite.internal.processors.cache.distributed.GridCacheAbstractNodeRestartSelfTest$2.run(GridCacheAbstractNodeRestartSelfTest.java:665)
> at java.lang.Thread.run(Thread.java:748)
> "restart-worker-0@63423" prio=5 tid=0x7f5d nid=NA waiting
>   java.lang.Thread.State: WAITING
> at sun.misc.Unsafe.park(Unsafe.java:-1)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
> at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
> at 
> org.apache.ignite.internal.util.IgniteUtils.awaitQuiet(IgniteUtils.java:7584)
> at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.grid(IgnitionEx.java:1666)
> at 
> org.apache.ignite.internal.IgnitionEx.allGrids(IgnitionEx.java:1284)
> at 
> org.apache.ignite.internal.IgnitionEx.allGrids(IgnitionEx.java:1262)
> at org.apache.ignite.Ignition.allGrids(Ignition.java:502)
> at 
> org.apache.ignite.testframework.junits.GridAbstractTest.awaitTopologyChange(GridAbstractTest.java:2258)
> at 
> org.apache.ignite.testframework.junits.GridAbstractTest.stopGrid(GridAbstractTest.java:1158)
> at 
> org.apache.ignite.testframework.junits.GridAbstractTest.stopGrid(GridAbstractTest.java:1133)
> at 
> org.apache.ignite.testframework.junits.GridAbstractTest.stopGrid(GridAbstractTest.java:1433)
> at 
> org.apache.ignite.internal.processors.cache.distributed.GridCacheAbstractNodeRestartSelfTest.access$800(GridCacheAbstractNodeRestartSelfTest.java:64)
> at 
> org.apache.ignite.internal.processors.cache.distributed.GridCacheAbstractNodeRestartSelfTest$2.run(GridCacheAbstractNodeRestartSelfTest.java:661)
> at java.lang.Thread.run(Threa

[jira] [Commented] (IGNITE-9249) Tests hang when different threads try to start and stop nodes at the same time.

2018-08-10 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16576277#comment-16576277
 ] 

Ilya Lantukh commented on IGNITE-9249:
--

As a temporary solution, I suggest setting a join timeout in GridAbstractTest, 
so tests will fail instead of hanging.
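For reference, the suggested workaround could look like the following configuration fragment. This is only a sketch under the assumption that it would be applied when building the test's IgniteConfiguration; it requires Ignite on the classpath and is not runnable standalone, and the 60-second value is illustrative.

{code:java}
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;

// Bound the join wait so a node that cannot join the topology
// fails with an exception instead of blocking forever.
IgniteConfiguration cfg = new IgniteConfiguration();

TcpDiscoverySpi disco = new TcpDiscoverySpi();
disco.setJoinTimeout(60_000); // milliseconds; 0 (the default) means wait indefinitely

cfg.setDiscoverySpi(disco);
{code}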

> Tests hang when different threads try to start and stop nodes at the same 
> time.
> ---
>
> Key: IGNITE-9249
> URL: https://issues.apache.org/jira/browse/IGNITE-9249
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ilya Lantukh
>Assignee: Ilya Lantukh
>Priority: Major
>
> An example of such a test is 
> GridCachePartitionedNearDisabledOptimisticTxNodeRestartTest.testRestartWithPutFourNodesOneBackupsOffheapEvict().
> Hung threads:
> {code}
> "restart-worker-1@63424" prio=5 tid=0x7f5e nid=NA waiting
>   java.lang.Thread.State: WAITING
> at java.lang.Object.wait(Object.java:-1)
> at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:949)
> at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:389)
> at 
> org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:2002)
> at 
> org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:297)
> at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:916)
> at 
> org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1754)
> at 
> org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1050)
> at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2020)
> at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1725)
> - locked <0xfc36> (a 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance)
> at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1153)
> at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:651)
> at 
> org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:920)
> at 
> org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:858)
> at 
> org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:846)
> at 
> org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:812)
> at 
> org.apache.ignite.internal.processors.cache.distributed.GridCacheAbstractNodeRestartSelfTest.access$1000(GridCacheAbstractNodeRestartSelfTest.java:64)
> at 
> org.apache.ignite.internal.processors.cache.distributed.GridCacheAbstractNodeRestartSelfTest$2.run(GridCacheAbstractNodeRestartSelfTest.java:665)
> at java.lang.Thread.run(Thread.java:748)
> "restart-worker-0@63423" prio=5 tid=0x7f5d nid=NA waiting
>   java.lang.Thread.State: WAITING
> at sun.misc.Unsafe.park(Unsafe.java:-1)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
> at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
> at 
> org.apache.ignite.internal.util.IgniteUtils.awaitQuiet(IgniteUtils.java:7584)
> at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.grid(IgnitionEx.java:1666)
> at 
> org.apache.ignite.internal.IgnitionEx.allGrids(IgnitionEx.java:1284)
> at 
> org.apache.ignite.internal.IgnitionEx.allGrids(IgnitionEx.java:1262)
> at org.apache.ignite.Ignition.allGrids(Ignition.java:502)
> at 
> org.apache.ignite.testframework.junits.GridAbstractTest.awaitTopologyChange(GridAbstractTest.java:2258)
> at 
> org.apache.ignite.testframework.junits.GridAbstractTest.stopGrid(GridAbstractTest.java:1158)
> at 
> org.apache.ignite.testframework.junits.GridAbstractTest.stopGrid(GridAbstractTest.java:1133)
> at 
> org.apache.ignite.testframework.junits.GridAbstractTest.stopGrid(GridAbstractTest.java:1433)
> at 
> org.apache.ignite.internal.processors.cache.distributed.GridCacheAbstractNodeRestartSelfTest.access$800(GridCacheAbstractNodeRestartSelfTest.java:64)
> at 
> org.apache.ignite.internal.processors.cache.distributed.GridCacheAbstractNodeRestartSelfTest$2.run(GridCacheAbstractNodeRestart

[jira] [Created] (IGNITE-9249) Tests hang when different threads try to start and stop nodes at the same time.

2018-08-10 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-9249:


 Summary: Tests hang when different threads try to start and stop 
nodes at the same time.
 Key: IGNITE-9249
 URL: https://issues.apache.org/jira/browse/IGNITE-9249
 Project: Ignite
  Issue Type: Bug
Reporter: Ilya Lantukh
Assignee: Ilya Lantukh


An example of such a test is 
GridCachePartitionedNearDisabledOptimisticTxNodeRestartTest.testRestartWithPutFourNodesOneBackupsOffheapEvict().

Hung threads:
{code}
"restart-worker-1@63424" prio=5 tid=0x7f5e nid=NA waiting
  java.lang.Thread.State: WAITING
  at java.lang.Object.wait(Object.java:-1)
  at 
org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:949)
  at 
org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:389)
  at 
org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:2002)
  at 
org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:297)
  at 
org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:916)
  at 
org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1754)
  at 
org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1050)
  at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2020)
  at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1725)
  - locked <0xfc36> (a 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance)
  at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1153)
  at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:651)
  at 
org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:920)
  at 
org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:858)
  at 
org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:846)
  at 
org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:812)
  at 
org.apache.ignite.internal.processors.cache.distributed.GridCacheAbstractNodeRestartSelfTest.access$1000(GridCacheAbstractNodeRestartSelfTest.java:64)
  at 
org.apache.ignite.internal.processors.cache.distributed.GridCacheAbstractNodeRestartSelfTest$2.run(GridCacheAbstractNodeRestartSelfTest.java:665)
  at java.lang.Thread.run(Thread.java:748)

"restart-worker-0@63423" prio=5 tid=0x7f5d nid=NA waiting
  java.lang.Thread.State: WAITING
  at sun.misc.Unsafe.park(Unsafe.java:-1)
  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
  at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
  at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
  at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
  at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
  at 
org.apache.ignite.internal.util.IgniteUtils.awaitQuiet(IgniteUtils.java:7584)
  at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.grid(IgnitionEx.java:1666)
  at 
org.apache.ignite.internal.IgnitionEx.allGrids(IgnitionEx.java:1284)
  at 
org.apache.ignite.internal.IgnitionEx.allGrids(IgnitionEx.java:1262)
  at org.apache.ignite.Ignition.allGrids(Ignition.java:502)
  at 
org.apache.ignite.testframework.junits.GridAbstractTest.awaitTopologyChange(GridAbstractTest.java:2258)
  at 
org.apache.ignite.testframework.junits.GridAbstractTest.stopGrid(GridAbstractTest.java:1158)
  at 
org.apache.ignite.testframework.junits.GridAbstractTest.stopGrid(GridAbstractTest.java:1133)
  at 
org.apache.ignite.testframework.junits.GridAbstractTest.stopGrid(GridAbstractTest.java:1433)
  at 
org.apache.ignite.internal.processors.cache.distributed.GridCacheAbstractNodeRestartSelfTest.access$800(GridCacheAbstractNodeRestartSelfTest.java:64)
  at 
org.apache.ignite.internal.processors.cache.distributed.GridCacheAbstractNodeRestartSelfTest$2.run(GridCacheAbstractNodeRestartSelfTest.java:661)
  at java.lang.Thread.run(Thread.java:748)
{code}

Full thread dump:
{code}
"test-runner-#26488%dht.GridCachePartitionedNearDisabledOptimisticTxNodeRestartTest%@63124"
 prio=5 tid=0x7e6a nid=NA waiting
  java.lang.Thread.State: WAITING
  at sun.misc.Unsafe.park(Unsafe.java:-1)
  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
  at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInte

[jira] [Commented] (IGNITE-8724) Skip logging 3-rd parameter while calling U.warn with initialized logger.

2018-08-10 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-8724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16576224#comment-16576224
 ] 

Ilya Lantukh commented on IGNITE-8724:
--

Thanks! Looks good now.

> Skip logging 3-rd parameter while calling U.warn with initialized logger.
> -
>
> Key: IGNITE-8724
> URL: https://issues.apache.org/jira/browse/IGNITE-8724
> Project: Ignite
>  Issue Type: Bug
>  Components: general
>Affects Versions: 2.5
>Reporter: Stanilovsky Evgeny
>Assignee: Stanilovsky Evgeny
>Priority: Major
> Fix For: 2.7
>
> Attachments: tc.png
>
>
> There are a lot of places where an exception needs to be logged, for example:
> {code:java}
> U.warn(log,"Unable to await partitions release future", e);
> {code}
> but the current U.warn implementation silently swallows it.
> {code:java}
> public static void warn(@Nullable IgniteLogger log, Object longMsg, 
> Object shortMsg) {
> assert longMsg != null;
> assert shortMsg != null;
> if (log != null)
> log.warning(compact(longMsg.toString()));
> else
> X.println("[" + SHORT_DATE_FMT.format(new java.util.Date()) + "] 
> (wrn) " +
> compact(shortMsg.toString()));
> }
> {code}
> The fix looks like a simple addition:
> {code:java}
> public static void warn(@Nullable IgniteLogger log, Object longMsg, 
> Throwable ex) {
> {code}
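The proposed overload can be sketched outside Ignite as follows. This is a minimal, self-contained sketch: {{Logger}} here is a simplified stand-in for Ignite's {{IgniteLogger}} (not the real interface), and the fallback printing is simplified; the point is only that the Throwable is forwarded rather than silently dropped.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class WarnFix {
    /** Simplified stand-in for IgniteLogger (assumption: not the real Ignite interface). */
    public interface Logger {
        void warning(String msg, Throwable cause);
    }

    /**
     * Sketch of the proposed overload: the Throwable is forwarded to the
     * logger (or printed with its stack trace) instead of being swallowed.
     */
    public static void warn(Logger log, Object msg, Throwable ex) {
        if (msg == null)
            throw new IllegalArgumentException("msg must not be null");

        if (log != null)
            log.warning(msg.toString(), ex); // cause is preserved, not stringified away
        else {
            // No logger configured: print the message and the full stack trace.
            StringWriter sw = new StringWriter();
            ex.printStackTrace(new PrintWriter(sw, true));
            System.err.println("(wrn) " + msg + System.lineSeparator() + sw);
        }
    }
}
```

With this shape, a call site such as {{U.warn(log, "Unable to await partitions release future", e)}} binds to the Throwable overload and the exception reaches the logger.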





[jira] [Commented] (IGNITE-9178) Partition lost event are not triggered if multiple nodes left cluster

2018-08-10 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16576209#comment-16576209
 ] 

Ilya Lantukh commented on IGNITE-9178:
--

[~agoncharuk],

I've double-checked this PR, and it looks correct to me. {{leftNode2Part}} in 
this case is just a temporary map that is used to fire partition-lost events. 
There is no need to update {{diffFromAffinity}} in that part of the code, 
because it will be recalculated later.

[~pvinokurov],

Thanks for the contribution!

> Partition lost event are not triggered if multiple nodes left cluster
> -
>
> Key: IGNITE-9178
> URL: https://issues.apache.org/jira/browse/IGNITE-9178
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Affects Versions: 2.4
>Reporter: Pavel Vinokurov
>Assignee: Pavel Vinokurov
>Priority: Blocker
> Fix For: 2.7
>
>
> If multiple nodes leave the cluster simultaneously, the partitions of the 
> left nodes are removed from GridDhtPartitionTopologyImpl#node2part without 
> being added to leftNode2Part in the GridDhtPartitionTopologyImpl#update 
> method. Thus GridDhtPartitionTopologyImpl#detectLostPartitions can't detect 
> lost partitions.





[jira] [Commented] (IGNITE-8724) Skip logging 3-rd parameter while calling U.warn with initialized logger.

2018-08-08 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-8724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16573354#comment-16573354
 ] 

Ilya Lantukh commented on IGNITE-8724:
--

Hi [~zstan],

Thanks for the contribution!
I think that fixing this problem is very important, and I don't see any reason 
to leave the old method in our codebase. Please remove it. If it has any usages 
where the last argument isn't a Throwable, feel free to change them.

> Skip logging 3-rd parameter while calling U.warn with initialized logger.
> -
>
> Key: IGNITE-8724
> URL: https://issues.apache.org/jira/browse/IGNITE-8724
> Project: Ignite
>  Issue Type: Bug
>  Components: general
>Affects Versions: 2.5
>Reporter: Stanilovsky Evgeny
>Assignee: Stanilovsky Evgeny
>Priority: Major
> Fix For: 2.7
>
> Attachments: tc.png
>
>
> There are a lot of places where an exception needs to be logged, for example:
> {code:java}
> U.warn(log,"Unable to await partitions release future", e);
> {code}
> but the current U.warn implementation silently swallows it.
> {code:java}
> public static void warn(@Nullable IgniteLogger log, Object longMsg, 
> Object shortMsg) {
> assert longMsg != null;
> assert shortMsg != null;
> if (log != null)
> log.warning(compact(longMsg.toString()));
> else
> X.println("[" + SHORT_DATE_FMT.format(new java.util.Date()) + "] 
> (wrn) " +
> compact(shortMsg.toString()));
> }
> {code}
> The fix looks like a simple addition:
> {code:java}
> public static void warn(@Nullable IgniteLogger log, Object longMsg, 
> Throwable ex) {
> {code}





[jira] [Commented] (IGNITE-9236) Handshake timeout never completes in some tests (GridCacheReplicatedFailoverSelfTest in particular)

2018-08-08 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16573255#comment-16573255
 ] 

Ilya Lantukh commented on IGNITE-9236:
--

https://ci.ignite.apache.org/viewLog.html?buildId=1612771&tab=buildResultsDiv&buildTypeId=IgniteTests24Java8_RunAll

> Handshake timeout never completes in some tests 
> (GridCacheReplicatedFailoverSelfTest in particular)
> ---
>
> Key: IGNITE-9236
> URL: https://issues.apache.org/jira/browse/IGNITE-9236
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ilya Lantukh
>Assignee: Ilya Lantukh
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
>
> In GridCacheReplicatedFailoverSelfTest, one thread tries to establish a TCP 
> connection and hangs on the handshake forever, holding a lock on RebalanceFuture:
> {code}
> [11:51:55] :   [Step 3/4] Locked synchronizers:
> [11:51:55] :   [Step 3/4] 
> java.util.concurrent.ThreadPoolExecutor$Worker@5b17b883
> [11:51:55] :   [Step 3/4] Thread 
> [name="sys-#68921%new-node-topology-change-thread-1%", id=77410, 
> state=RUNNABLE, blockCnt=3, waitCnt=0]
> [11:51:55] :   [Step 3/4] at 
> sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> [11:51:55] :   [Step 3/4] at 
> sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> [11:51:55] :   [Step 3/4] at 
> sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> [11:51:55] :   [Step 3/4] at sun.nio.ch.IOUtil.read(IOUtil.java:197)
> [11:51:55] :   [Step 3/4] at 
> sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
> [11:51:55] :   [Step 3/4] - locked java.lang.Object@23aaa756
> [11:51:55] :   [Step 3/4] at 
> o.a.i.spi.communication.tcp.TcpCommunicationSpi.safeTcpHandshake(TcpCommunicationSpi.java:3647)
> [11:51:55] :   [Step 3/4] at 
> o.a.i.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3293)
> [11:51:55] :   [Step 3/4] at 
> o.a.i.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2967)
> [11:51:55] :   [Step 3/4] at 
> o.a.i.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2850)
> [11:51:55] :   [Step 3/4] at 
> o.a.i.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2693)
> [11:51:55] :   [Step 3/4] at 
> o.a.i.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2652)
> [11:51:55] :   [Step 3/4] at 
> o.a.i.i.managers.communication.GridIoManager.send(GridIoManager.java:1643)
> [11:51:55] :   [Step 3/4] at 
> o.a.i.i.managers.communication.GridIoManager.sendOrderedMessage(GridIoManager.java:1750)
> [11:51:55] :   [Step 3/4] at 
> o.a.i.i.processors.cache.GridCacheIoManager.sendOrderedMessage(GridCacheIoManager.java:1231)
> [11:51:55] :   [Step 3/4] at 
> o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander$RebalanceFuture.cleanupRemoteContexts(GridDhtPartitionDemander.java:)
> [11:51:55] :   [Step 3/4] at 
> o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander$RebalanceFuture.cancel(GridDhtPartitionDemander.java:1041)
> [11:51:55] :   [Step 3/4] - locked 
> o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander$RebalanceFuture@7e28f150
> [11:51:55] :   [Step 3/4] at 
> o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander.lambda$null$2(GridDhtPartitionDemander.java:534)
> [11:51:55] :   [Step 3/4] at 
> o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander$$Lambda$41/603501511.run(Unknown
>  Source)
> [11:51:55] :   [Step 3/4] at 
> o.a.i.i.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6800)
> [11:51:55] :   [Step 3/4] at 
> o.a.i.i.processors.closure.GridClosureProcessor$1.body(GridClosureProcessor.java:827)
> [11:51:55] :   [Step 3/4] at 
> o.a.i.i.util.worker.GridWorker.run(GridWorker.java:110)
> [11:51:55] :   [Step 3/4] at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [11:51:55] :   [Step 3/4] at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [11:51:55] :   [Step 3/4] at java.lang.Thread.run(Thread.java:748)
> {code}
> Because of that, the exchange worker hangs forever while trying to acquire 
> that lock:
> {code}
> [11:51:55] :   [Step 3/4] Thread 
> [name="exchange-worker-#68894%new-node-topology-change-thread-1%", id=77379, 
> state=BLOCKED, blockCnt=11, waitCnt=7]
> [11:51:55] :   [Step 3/4] Lock 
> [object=o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander$RebalanceFuture@
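
The two dumps above show a classic hold-and-block pattern: one thread takes the RebalanceFuture monitor and then performs blocking socket I/O inside it, so the exchange worker parks on the same monitor. A minimal, self-contained Java sketch of that pattern (the thread names and the CountDownLatch are illustrative stand-ins for the real Ignite classes and the TCP handshake; this is not Ignite code):

```java
import java.util.concurrent.CountDownLatch;

public class RebalanceHangSketch {
    static final Object rebalanceFut = new Object();               // stands in for RebalanceFuture
    static final CountDownLatch handshake = new CountDownLatch(1); // a "handshake" that never completes

    static Thread.State demo() throws InterruptedException {
        Thread sysWorker = new Thread(() -> {
            synchronized (rebalanceFut) {   // cancel() takes the future's monitor...
                try {
                    handshake.await();      // ...then blocks, as in safeTcpHandshake()
                }
                catch (InterruptedException ignored) {
                    // interrupted only to end the sketch
                }
            }
        }, "sys-worker");

        sysWorker.start();
        Thread.sleep(200);                  // let sys-worker take the monitor

        Thread exchangeWorker = new Thread(() -> {
            synchronized (rebalanceFut) {   // addAssignments() -> cancel() needs the same monitor
                // entered only once sys-worker releases the lock
            }
        }, "exchange-worker");

        exchangeWorker.start();
        exchangeWorker.join(500);           // give it time to block

        Thread.State s = exchangeWorker.getState(); // BLOCKED, matching the dump

        sysWorker.interrupt();              // release the monitor and finish the sketch
        sysWorker.join();
        exchangeWorker.join();
        return s;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("exchange-worker: " + demo());
    }
}
```

The usual cure for this pattern is to not perform blocking I/O while holding the monitor, e.g. by sending the cleanup messages outside the synchronized section.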

[jira] [Created] (IGNITE-9236) Handshake timeout never completes in some tests (GridCacheReplicatedFailoverSelfTest in particular)

2018-08-08 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-9236:


 Summary: Handshake timeout never completes in some tests 
(GridCacheReplicatedFailoverSelfTest in particular)
 Key: IGNITE-9236
 URL: https://issues.apache.org/jira/browse/IGNITE-9236
 Project: Ignite
  Issue Type: Bug
Reporter: Ilya Lantukh
Assignee: Ilya Lantukh


In GridCacheReplicatedFailoverSelfTest, one thread tries to establish a TCP 
connection and hangs on the handshake forever, holding a lock on RebalanceFuture:
{code}
[11:51:55] : [Step 3/4] Locked synchronizers:
[11:51:55] : [Step 3/4] 
java.util.concurrent.ThreadPoolExecutor$Worker@5b17b883
[11:51:55] : [Step 3/4] Thread 
[name="sys-#68921%new-node-topology-change-thread-1%", id=77410, 
state=RUNNABLE, blockCnt=3, waitCnt=0]
[11:51:55] : [Step 3/4] at 
sun.nio.ch.FileDispatcherImpl.read0(Native Method)
[11:51:55] : [Step 3/4] at 
sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
[11:51:55] : [Step 3/4] at 
sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
[11:51:55] : [Step 3/4] at sun.nio.ch.IOUtil.read(IOUtil.java:197)
[11:51:55] : [Step 3/4] at 
sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
[11:51:55] : [Step 3/4] - locked java.lang.Object@23aaa756
[11:51:55] : [Step 3/4] at 
o.a.i.spi.communication.tcp.TcpCommunicationSpi.safeTcpHandshake(TcpCommunicationSpi.java:3647)
[11:51:55] : [Step 3/4] at 
o.a.i.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3293)
[11:51:55] : [Step 3/4] at 
o.a.i.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2967)
[11:51:55] : [Step 3/4] at 
o.a.i.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2850)
[11:51:55] : [Step 3/4] at 
o.a.i.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2693)
[11:51:55] : [Step 3/4] at 
o.a.i.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2652)
[11:51:55] : [Step 3/4] at 
o.a.i.i.managers.communication.GridIoManager.send(GridIoManager.java:1643)
[11:51:55] : [Step 3/4] at 
o.a.i.i.managers.communication.GridIoManager.sendOrderedMessage(GridIoManager.java:1750)
[11:51:55] : [Step 3/4] at 
o.a.i.i.processors.cache.GridCacheIoManager.sendOrderedMessage(GridCacheIoManager.java:1231)
[11:51:55] : [Step 3/4] at 
o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander$RebalanceFuture.cleanupRemoteContexts(GridDhtPartitionDemander.java:)
[11:51:55] : [Step 3/4] at 
o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander$RebalanceFuture.cancel(GridDhtPartitionDemander.java:1041)
[11:51:55] : [Step 3/4] - locked 
o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander$RebalanceFuture@7e28f150
[11:51:55] : [Step 3/4] at 
o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander.lambda$null$2(GridDhtPartitionDemander.java:534)
[11:51:55] : [Step 3/4] at 
o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander$$Lambda$41/603501511.run(Unknown
 Source)
[11:51:55] : [Step 3/4] at 
o.a.i.i.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6800)
[11:51:55] : [Step 3/4] at 
o.a.i.i.processors.closure.GridClosureProcessor$1.body(GridClosureProcessor.java:827)
[11:51:55] : [Step 3/4] at 
o.a.i.i.util.worker.GridWorker.run(GridWorker.java:110)
[11:51:55] : [Step 3/4] at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[11:51:55] : [Step 3/4] at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[11:51:55] : [Step 3/4] at java.lang.Thread.run(Thread.java:748)
{code}

Because of that, the exchange worker hangs forever while trying to acquire 
that lock:
{code}
[11:51:55] : [Step 3/4] Thread 
[name="exchange-worker-#68894%new-node-topology-change-thread-1%", id=77379, 
state=BLOCKED, blockCnt=11, waitCnt=7]
[11:51:55] : [Step 3/4] Lock 
[object=o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander$RebalanceFuture@7e28f150,
 ownerName=sys-#68921%new-node-topology-change-thread-1%, ownerId=77410]
[11:51:55] : [Step 3/4] at 
o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander$RebalanceFuture.cancel(GridDhtPartitionDemander.java:1033)
[11:51:55] : [Step 3/4] at 
o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander.addAssignments(GridDhtPartitionDemander.java:302)
[11:51:55] : [Step 3/4] at 
o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPreloader.addAssign

[jira] [Updated] (IGNITE-9213) CacheLockReleaseNodeLeaveTest.testLockTopologyChange hangs sometimes, leading to TC timeout

2018-08-07 Thread Ilya Lantukh (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Lantukh updated IGNITE-9213:
-
Attachment: ignite-9213-threaddump.txt

> CacheLockReleaseNodeLeaveTest.testLockTopologyChange hangs sometimes, leading 
> to TC timeout
> ---
>
> Key: IGNITE-9213
> URL: https://issues.apache.org/jira/browse/IGNITE-9213
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ilya Lantukh
>Assignee: Ilya Lantukh
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
> Attachments: ignite-9213-threaddump.txt
>
>
> Probability is quite low, < 5%.
> One thread gets stuck in GridCacheAdapter.lockAll(...), holding the gw read 
> lock and waiting for a future that never completes. Another one cannot acquire 
> the gw write lock.
> {code}
> "test-runner-#123405%distributed.CacheLockReleaseNodeLeaveTest%" #136172 
> prio=5 os_prio=0 tid=0x7f20cd3d7000 nid=0x356f 
> sleeping[0x7f1eae48b000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.ignite.internal.util.IgniteUtils.sleep(IgniteUtils.java:7678)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheGateway.onStopped(GridCacheGateway.java:318)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.blockGateways(GridCacheProcessor.java:970)
>   at org.apache.ignite.internal.IgniteKernal.stop0(IgniteKernal.java:2195)
>   at org.apache.ignite.internal.IgniteKernal.stop(IgniteKernal.java:2082)
>   at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2595)
>   - locked <0xc2e69580> (a 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance)
>   at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2558)
>   at org.apache.ignite.internal.IgnitionEx.stop(IgnitionEx.java:374)
>   at org.apache.ignite.Ignition.stop(Ignition.java:229)
>   at 
> org.apache.ignite.testframework.junits.GridAbstractTest.stopGrid(GridAbstractTest.java:1153)
>   at 
> org.apache.ignite.testframework.junits.GridAbstractTest.stopAllGrids(GridAbstractTest.java:1196)
>   at 
> org.apache.ignite.testframework.junits.GridAbstractTest.stopAllGrids(GridAbstractTest.java:1174)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.CacheLockReleaseNodeLeaveTest.testLockTopologyChange(CacheLockReleaseNodeLeaveTest.java:177)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at junit.framework.TestCase.runTest(TestCase.java:176)
>   at 
> org.apache.ignite.testframework.junits.GridAbstractTest.runTestInternal(GridAbstractTest.java:2156)
>   at 
> org.apache.ignite.testframework.junits.GridAbstractTest.access$000(GridAbstractTest.java:143)
>   at 
> org.apache.ignite.testframework.junits.GridAbstractTest$5.run(GridAbstractTest.java:2071)
>   at java.lang.Thread.run(Thread.java:745)
> "test-lock-thread-4" #136488 prio=5 os_prio=0 tid=0x7f208802a000 
> nid=0x36a5 waiting on condition [0x7f1ea81c3000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
>   at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
>   at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheAdapter.lockAll(GridCacheAdapter.java:3405)
>   at 
> org.apache.ignite.internal.processors.cache.CacheLockImpl.lock(CacheLockImpl.java:74)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.CacheLockReleaseNodeLeaveTest$3.run(CacheLockReleaseNodeLeaveTest.java:154)
>   at 
> org.apache.ignite.testframework.GridTestUtils$6.call(GridTestUtils.java:1254)
>   at 
> org.apache.ignite.testframework.GridTestThread.run(GridTestThread.java:86)
> {code}
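
The pattern in this dump can be sketched with a plain ReentrantReadWriteLock standing in for GridCacheGateway. In the real code the stop path sleeps in a loop inside GridCacheGateway.onStopped() rather than parking on a write lock, but the effect is the same: the lock thread holds the read lock while waiting on a future that never completes, so the stopping thread can never get exclusive access. All names below are illustrative, not Ignite code:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class GatewayHangSketch {
    static final ReentrantReadWriteLock gw = new ReentrantReadWriteLock(); // stands in for GridCacheGateway
    static final CountDownLatch lockFut = new CountDownLatch(1);           // a future that never completes

    static Thread.State demo() throws InterruptedException {
        Thread lockThread = new Thread(() -> {
            gw.readLock().lock();       // lockAll() runs under the gateway read lock...
            try {
                lockFut.await();        // ...and waits on a future that never completes
            }
            catch (InterruptedException ignored) {
                // interrupted only to end the sketch
            }
            finally {
                gw.readLock().unlock();
            }
        }, "test-lock-thread");

        lockThread.start();
        Thread.sleep(200);              // let the reader take the read lock

        Thread stopThread = new Thread(() -> {
            gw.writeLock().lock();      // stop() needs exclusive access to the gateway
            gw.writeLock().unlock();
        }, "test-runner");

        stopThread.start();
        stopThread.join(500);           // give it time to park on the write lock

        Thread.State s = stopThread.getState(); // WAITING: parked until all readers leave

        lockThread.interrupt();         // complete the "future" path and finish the sketch
        lockThread.join();
        stopThread.join();
        return s;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("stop thread: " + demo());
    }
}
```

A fix along these lines would either complete (or fail) the lock future when the node stops, or avoid waiting on it while the gateway read lock is held.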





[jira] [Created] (IGNITE-9213) CacheLockReleaseNodeLeaveTest.testLockTopologyChange hangs sometimes, leading to TC timeout

2018-08-07 Thread Ilya Lantukh (JIRA)
Ilya Lantukh created IGNITE-9213:


 Summary: CacheLockReleaseNodeLeaveTest.testLockTopologyChange 
hangs sometimes, leading to TC timeout
 Key: IGNITE-9213
 URL: https://issues.apache.org/jira/browse/IGNITE-9213
 Project: Ignite
  Issue Type: Bug
Reporter: Ilya Lantukh
Assignee: Ilya Lantukh


Probability is quite low, < 5%.

One thread gets stuck in GridCacheAdapter.lockAll(...), holding the gw read lock 
and waiting for a future that never completes. Another one cannot acquire the gw 
write lock.

{code}
"test-runner-#123405%distributed.CacheLockReleaseNodeLeaveTest%" #136172 prio=5 
os_prio=0 tid=0x7f20cd3d7000 nid=0x356f sleeping[0x7f1eae48b000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at 
org.apache.ignite.internal.util.IgniteUtils.sleep(IgniteUtils.java:7678)
at 
org.apache.ignite.internal.processors.cache.GridCacheGateway.onStopped(GridCacheGateway.java:318)
at 
org.apache.ignite.internal.processors.cache.GridCacheProcessor.blockGateways(GridCacheProcessor.java:970)
at org.apache.ignite.internal.IgniteKernal.stop0(IgniteKernal.java:2195)
at org.apache.ignite.internal.IgniteKernal.stop(IgniteKernal.java:2082)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2595)
- locked <0xc2e69580> (a 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2558)
at org.apache.ignite.internal.IgnitionEx.stop(IgnitionEx.java:374)
at org.apache.ignite.Ignition.stop(Ignition.java:229)
at 
org.apache.ignite.testframework.junits.GridAbstractTest.stopGrid(GridAbstractTest.java:1153)
at 
org.apache.ignite.testframework.junits.GridAbstractTest.stopAllGrids(GridAbstractTest.java:1196)
at 
org.apache.ignite.testframework.junits.GridAbstractTest.stopAllGrids(GridAbstractTest.java:1174)
at 
org.apache.ignite.internal.processors.cache.distributed.CacheLockReleaseNodeLeaveTest.testLockTopologyChange(CacheLockReleaseNodeLeaveTest.java:177)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at junit.framework.TestCase.runTest(TestCase.java:176)
at 
org.apache.ignite.testframework.junits.GridAbstractTest.runTestInternal(GridAbstractTest.java:2156)
at 
org.apache.ignite.testframework.junits.GridAbstractTest.access$000(GridAbstractTest.java:143)
at 
org.apache.ignite.testframework.junits.GridAbstractTest$5.run(GridAbstractTest.java:2071)
at java.lang.Thread.run(Thread.java:745)

"test-lock-thread-4" #136488 prio=5 os_prio=0 tid=0x7f208802a000 nid=0x36a5 
waiting on condition [0x7f1ea81c3000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
at 
org.apache.ignite.internal.processors.cache.GridCacheAdapter.lockAll(GridCacheAdapter.java:3405)
at 
org.apache.ignite.internal.processors.cache.CacheLockImpl.lock(CacheLockImpl.java:74)
at 
org.apache.ignite.internal.processors.cache.distributed.CacheLockReleaseNodeLeaveTest$3.run(CacheLockReleaseNodeLeaveTest.java:154)
at 
org.apache.ignite.testframework.GridTestUtils$6.call(GridTestUtils.java:1254)
at 
org.apache.ignite.testframework.GridTestThread.run(GridTestThread.java:86)
{code}





[jira] [Commented] (IGNITE-9184) Cluster hangs during concurrent node restart and continuous query registration

2018-08-07 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16571580#comment-16571580
 ] 

Ilya Lantukh commented on IGNITE-9184:
--

Discovered the following exception while running this test, but it wasn't 
present in the attached log.
{code}
class org.apache.ignite.IgniteCheckedException: Failed to connect to node (is 
node still alive?). Make sure that each ComputeTask and cache Transaction has a 
timeout set in order to prevent parties from waiting forever in case of network 
issues [nodeId=c7cb7c10-a793-407a-9300-2693757d26fe, 
addrs=[/0:0:0:0:0:0:0:1:47101, /127.0.0.1:47101, /172.25.4.114:47101]]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3439)
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2967)
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2850)
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2693)
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2652)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1643)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:1715)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.sendNoRetry(GridCacheIoManager.java:1282)
at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.sendAllPartitions(GridCachePartitionExchangeManager.java:1081)
at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.refreshPartitions(GridCachePartitionExchangeManager.java:1053)
at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ResendTimeoutObject$1.run(GridCachePartitionExchangeManager.java:2771)
at 
org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6800)
at 
org.apache.ignite.internal.processors.closure.GridClosureProcessor$1.body(GridClosureProcessor.java:827)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to 
connect to address [addr=/0:0:0:0:0:0:0:1:47101, err=Remote node ID is not as 
expected [expected=c7cb7c10-a793-407a-9300-2693757d26fe, 
rcvd=15f11c3b-9a53-4c2a-b9a9-3ff4ba2f425a]]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3442)
... 16 more
Caused by: class 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$HandshakeException: 
Remote node ID is not as expected 
[expected=c7cb7c10-a793-407a-9300-2693757d26fe, 
rcvd=15f11c3b-9a53-4c2a-b9a9-3ff4ba2f425a]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeTcpHandshake(TcpCommunicationSpi.java:3659)
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3293)
... 16 more
Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to 
connect to address [addr=/0:0:0:0:0:0:0:1:47101, err=Remote node ID is not as 
expected [expected=c7cb7c10-a793-407a-9300-2693757d26fe, 
rcvd=15f11c3b-9a53-4c2a-b9a9-3ff4ba2f425a]]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3442)
... 16 more
Caused by: class 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$HandshakeException: 
Remote node ID is not as expected 
[expected=c7cb7c10-a793-407a-9300-2693757d26fe, 
rcvd=15f11c3b-9a53-4c2a-b9a9-3ff4ba2f425a]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeTcpHandshake(TcpCommunicationSpi.java:3659)
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3293)
... 16 more
Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to 
connect to address [addr=/0:0:0:0:0:0:0:1:47101, err=Remote node ID is not as 
expected [expected=c7cb7c10-a793-407a-9300-2693757d26fe, 
rcvd=15f11c3b-9a53-4c2a-b9a9-3ff4ba2f425a]]
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3442)
... 16 more
Caused by: class 
org.apache.ignite.spi.communication.tcp

[jira] [Comment Edited] (IGNITE-9184) Cluster hangs during concurrent node restart and continuous query registration

2018-08-07 Thread Ilya Lantukh (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16571513#comment-16571513
 ] 

Ilya Lantukh edited comment on IGNITE-9184 at 8/7/18 12:01 PM:
---

[~mcherkas],
I cannot understand the problem from the attached log and thread dump, and I 
cannot reproduce it by running your test. Please perform a deeper investigation 
of this issue and write a more detailed description.


was (Author: ilantukh):
[~mcherkas],
I cannot understand the problem from the attached logs and thread dump, and I 
cannot reproduce it by running your test. Please perform a deeper investigation 
of this issue and write more detailed description.

> Cluster hangs during concurrent node restart and continuous query registration
> -
>
> Key: IGNITE-9184
> URL: https://issues.apache.org/jira/browse/IGNITE-9184
> Project: Ignite
>  Issue Type: Bug
>  Components: general
>Affects Versions: 2.6
>Reporter: Mikhail Cherkasov
>Assignee: Ilya Lantukh
>Priority: Blocker
> Fix For: 2.7
>
> Attachments: StressTest.java, logs, stacktrace
>
>
> Please check the attached test case and stack trace.
> I can see: "Failed to wait for initial partition map exchange" message.
>  
>  





[jira] [Assigned] (IGNITE-9184) Cluster hangs during concurrent node restart and continuous query registration

2018-08-07 Thread Ilya Lantukh (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Lantukh reassigned IGNITE-9184:


Assignee: Mikhail Cherkasov  (was: Ilya Lantukh)

> Cluster hangs during concurrent node restart and continuous query registration
> -
>
> Key: IGNITE-9184
> URL: https://issues.apache.org/jira/browse/IGNITE-9184
> Project: Ignite
>  Issue Type: Bug
>  Components: general
>Affects Versions: 2.6
>Reporter: Mikhail Cherkasov
>Assignee: Mikhail Cherkasov
>Priority: Blocker
> Fix For: 2.7
>
> Attachments: StressTest.java, logs, stacktrace
>
>
> Please check the attached test case and stack trace.
> I can see: "Failed to wait for initial partition map exchange" message.
>  
>  




