[jira] [Closed] (IGNITE-15317) Cache.put/get Jepsen Elle test failed for Java Thin Client

2021-08-17 Thread Dmitry Sherstobitov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-15317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sherstobitov closed IGNITE-15317.

Ignite Flags:   (was: Docs Required,Release Notes Required)

> Cache.put/get Jepsen Elle test failed for Java Thin Client
> --
>
> Key: IGNITE-15317
> URL: https://issues.apache.org/jira/browse/IGNITE-15317
> Project: Ignite
>  Issue Type: Bug
>  Components: thin client
>Affects Versions: 2.10
>Reporter: Dmitry Sherstobitov
>Priority: Critical
>
> I've created a simple Jepsen Elle test for basic cache.get/put functionality. 
> There are two versions of the client: one for the thick client and one for the 
> thin client. The invocation code is exactly the same.
> The test passed for the thick client and failed for the thin client. Note that 
> this issue is reproducible even with the noop nemesis algorithm.
> {code:java}
> (reduce
>   (fn [txn' [f k v :as micro-op]]
>     (case f
>       :r (let [value (read-value cache k)]
>            (conj txn' [f k value]))
>       :w (do
>            (let [contain-key (.containsKey cache k)
>                  value       (read-value cache k)]
>              (if (or (not contain-key) (and contain-key (< value v)))
>                (.put cache k v)
>                (vreset! tx-state false))) ; bye functional programming, we are saving state here to fail tx later
>            (conj txn' micro-op
> {code}
> GitHub repo with the Jepsen code: https://github.com/qvad/jepsen (see the ignite 
> folder there; it also contains a small guide on how to launch a Hyper-V VM and 
> run the test).
> Elle reports the following anomalies (:anomaly-types (:G1a :G1b :internal)), 
> i.e. aborted reads and intermediate reads.
> Anomalies description: 
> https://sitano.github.io/theory/databases/2019/07/30/tx-isolation-anomalies/
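For reference, a minimal Java sketch of the kind of transactional put/get invocation such a 
test performs against the thin client (the cache name, server address, key/value types and the 
read-then-conditional-write logic are assumptions for illustration, not taken from the test code):
{code:java}
import org.apache.ignite.Ignition;
import org.apache.ignite.client.ClientCache;
import org.apache.ignite.client.ClientTransaction;
import org.apache.ignite.client.IgniteClient;
import org.apache.ignite.configuration.ClientConfiguration;
import org.apache.ignite.transactions.TransactionConcurrency;
import org.apache.ignite.transactions.TransactionIsolation;

public class ThinClientTxSketch {
    public static void main(String[] args) throws Exception {
        ClientConfiguration cfg = new ClientConfiguration().setAddresses("127.0.0.1:10800");

        try (IgniteClient client = Ignition.startClient(cfg)) {
            ClientCache<Long, Long> cache = client.getOrCreateCache("jepsen");

            // Read-then-write inside a single thin-client transaction, mirroring
            // the :r / :w micro-ops of the Clojure snippet above.
            try (ClientTransaction tx = client.transactions().txStart(
                TransactionConcurrency.PESSIMISTIC, TransactionIsolation.REPEATABLE_READ)) {
                Long cur = cache.get(1L);

                // Only write monotonically increasing values, as the :w branch does.
                if (cur == null || cur < 42L)
                    cache.put(1L, 42L);

                tx.commit();
            }
        }
    }
}
{code}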



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (IGNITE-15317) Cache.put/get Jepsen Elle test failed for Java Thin Client

2021-08-17 Thread Dmitry Sherstobitov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-15317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sherstobitov resolved IGNITE-15317.
--
Resolution: Not A Problem

> Cache.put/get Jepsen Elle test failed for Java Thin Client
> --
>
> Key: IGNITE-15317
> URL: https://issues.apache.org/jira/browse/IGNITE-15317
> Project: Ignite
>  Issue Type: Bug
>  Components: thin client
>Affects Versions: 2.10
>Reporter: Dmitry Sherstobitov
>Priority: Critical
>
> I've created a simple Jepsen Elle test for basic cache.get/put functionality. 
> There are two versions of the client: one for the thick client and one for the 
> thin client. The invocation code is exactly the same.
> The test passed for the thick client and failed for the thin client. Note that 
> this issue is reproducible even with the noop nemesis algorithm.
> {code:java}
> (reduce
>   (fn [txn' [f k v :as micro-op]]
>     (case f
>       :r (let [value (read-value cache k)]
>            (conj txn' [f k value]))
>       :w (do
>            (let [contain-key (.containsKey cache k)
>                  value       (read-value cache k)]
>              (if (or (not contain-key) (and contain-key (< value v)))
>                (.put cache k v)
>                (vreset! tx-state false))) ; bye functional programming, we are saving state here to fail tx later
>            (conj txn' micro-op
> {code}
> GitHub repo with the Jepsen code: https://github.com/qvad/jepsen (see the ignite 
> folder there; it also contains a small guide on how to launch a Hyper-V VM and 
> run the test).
> Elle reports the following anomalies (:anomaly-types (:G1a :G1b :internal)), 
> i.e. aborted reads and intermediate reads.
> Anomalies description: 
> https://sitano.github.io/theory/databases/2019/07/30/tx-isolation-anomalies/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-15317) Cache.put/get Jepsen Elle test failed for Java Thin Client

2021-08-17 Thread Dmitry Sherstobitov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-15317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17400493#comment-17400493
 ] 

Dmitry Sherstobitov commented on IGNITE-15317:
--

[~alex_pl] You were right, it was a problem in the test.
I will close this ticket and continue testing; maybe I will add a few more nemesis 
algorithms.

> Cache.put/get Jepsen Elle test failed for Java Thin Client
> --
>
> Key: IGNITE-15317
> URL: https://issues.apache.org/jira/browse/IGNITE-15317
> Project: Ignite
>  Issue Type: Bug
>  Components: thin client
>Affects Versions: 2.10
>Reporter: Dmitry Sherstobitov
>Priority: Critical
>
> I've created a simple Jepsen Elle test for basic cache.get/put functionality. 
> There are two versions of the client: one for the thick client and one for the 
> thin client. The invocation code is exactly the same.
> The test passed for the thick client and failed for the thin client. Note that 
> this issue is reproducible even with the noop nemesis algorithm.
> {code:java}
> (reduce
>   (fn [txn' [f k v :as micro-op]]
>     (case f
>       :r (let [value (read-value cache k)]
>            (conj txn' [f k value]))
>       :w (do
>            (let [contain-key (.containsKey cache k)
>                  value       (read-value cache k)]
>              (if (or (not contain-key) (and contain-key (< value v)))
>                (.put cache k v)
>                (vreset! tx-state false))) ; bye functional programming, we are saving state here to fail tx later
>            (conj txn' micro-op
> {code}
> GitHub repo with the Jepsen code: https://github.com/qvad/jepsen (see the ignite 
> folder there; it also contains a small guide on how to launch a Hyper-V VM and 
> run the test).
> Elle reports the following anomalies (:anomaly-types (:G1a :G1b :internal)), 
> i.e. aborted reads and intermediate reads.
> Anomalies description: 
> https://sitano.github.io/theory/databases/2019/07/30/tx-isolation-anomalies/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-15317) Cache.put/get Jepsen Elle test failed for Java Thin Client

2021-08-17 Thread Dmitry Sherstobitov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-15317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17400454#comment-17400454
 ] 

Dmitry Sherstobitov commented on IGNITE-15317:
--

[~alex_pl] Thanks, yeah, it's a valid point. I will fix the test and rerun it 
then.

> Cache.put/get Jepsen Elle test failed for Java Thin Client
> --
>
> Key: IGNITE-15317
> URL: https://issues.apache.org/jira/browse/IGNITE-15317
> Project: Ignite
>  Issue Type: Bug
>  Components: thin client
>Affects Versions: 2.10
>Reporter: Dmitry Sherstobitov
>Priority: Critical
>
> I've created a simple Jepsen Elle test for basic cache.get/put functionality. 
> There are two versions of the client: one for the thick client and one for the 
> thin client. The invocation code is exactly the same.
> The test passed for the thick client and failed for the thin client. Note that 
> this issue is reproducible even with the noop nemesis algorithm.
> {code:java}
> (reduce
>   (fn [txn' [f k v :as micro-op]]
>     (case f
>       :r (let [value (read-value cache k)]
>            (conj txn' [f k value]))
>       :w (do
>            (let [contain-key (.containsKey cache k)
>                  value       (read-value cache k)]
>              (if (or (not contain-key) (and contain-key (< value v)))
>                (.put cache k v)
>                (vreset! tx-state false))) ; bye functional programming, we are saving state here to fail tx later
>            (conj txn' micro-op
> {code}
> GitHub repo with the Jepsen code: https://github.com/qvad/jepsen (see the ignite 
> folder there; it also contains a small guide on how to launch a Hyper-V VM and 
> run the test).
> Elle reports the following anomalies (:anomaly-types (:G1a :G1b :internal)), 
> i.e. aborted reads and intermediate reads.
> Anomalies description: 
> https://sitano.github.io/theory/databases/2019/07/30/tx-isolation-anomalies/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-15317) Cache.put/get Jepsen Elle test failed for Java Thin Client

2021-08-17 Thread Dmitry Sherstobitov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-15317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sherstobitov updated IGNITE-15317:
-
Summary: Cache.put/get Jepsen Elle test failed for Java Thin Client  (was: 
Simple Jepsen Elle test failed for Java Thin Client)

> Cache.put/get Jepsen Elle test failed for Java Thin Client
> --
>
> Key: IGNITE-15317
> URL: https://issues.apache.org/jira/browse/IGNITE-15317
> Project: Ignite
>  Issue Type: Bug
>  Components: thin client
>Affects Versions: 2.10
>Reporter: Dmitry Sherstobitov
>Priority: Critical
>
> I've created a simple Jepsen Elle test for basic cache.get/put functionality. 
> There are two versions of the client: one for the thick client and one for the 
> thin client. The invocation code is exactly the same.
> The test passed for the thick client and failed for the thin client. Note that 
> this issue is reproducible even with the noop nemesis algorithm.
> {code:java}
> (reduce
>   (fn [txn' [f k v :as micro-op]]
>     (case f
>       :r (let [value (read-value cache k)]
>            (conj txn' [f k value]))
>       :w (do
>            (let [contain-key (.containsKey cache k)
>                  value       (read-value cache k)]
>              (if (or (not contain-key) (and contain-key (< value v)))
>                (.put cache k v)
>                (vreset! tx-state false))) ; bye functional programming, we are saving state here to fail tx later
>            (conj txn' micro-op
> {code}
> GitHub repo with the Jepsen code: https://github.com/qvad/jepsen (see the ignite 
> folder there; it also contains a small guide on how to launch a Hyper-V VM and 
> run the test).
> Elle reports the following anomalies (:anomaly-types (:G1a :G1b :internal)), 
> i.e. aborted reads and intermediate reads.
> Anomalies description: 
> https://sitano.github.io/theory/databases/2019/07/30/tx-isolation-anomalies/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-15317) Simple Jepsen Elle test failed for Java Thin Client

2021-08-16 Thread Dmitry Sherstobitov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-15317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sherstobitov updated IGNITE-15317:
-
Description: 
I've created a simple Jepsen Elle test for basic cache.get/put functionality. 
There are two versions of the client: one for the thick client and one for the 
thin client. The invocation code is exactly the same.
The test passed for the thick client and failed for the thin client. Note that 
this issue is reproducible even with the noop nemesis algorithm.


{code:java}
(reduce
  (fn [txn' [f k v :as micro-op]]
    (case f
      :r (let [value (read-value cache k)]
           (conj txn' [f k value]))

      :w (do
           (let [contain-key (.containsKey cache k)
                 value       (read-value cache k)]
             (if (or (not contain-key) (and contain-key (< value v)))
               (.put cache k v)
               (vreset! tx-state false))) ; bye functional programming, we are saving state here to fail tx later
           (conj txn' micro-op
{code}


GitHub repo with the Jepsen code: https://github.com/qvad/jepsen (see the ignite folder 
there; it also contains a small guide on how to launch a Hyper-V VM and run the test).
Elle reports the following anomalies (:anomaly-types (:G1a :G1b :internal)), 
i.e. aborted reads and intermediate reads.
Anomalies description: 
https://sitano.github.io/theory/databases/2019/07/30/tx-isolation-anomalies/


  was:
I've created a simple Jepsen Elle test for basic cache.get/put functionality. 
There are two versions of the client: one for the thick client and one for the 
thin client. The invocation code is exactly the same.
The test passed for the thick client and failed for the thin client. Note that 
this issue is reproducible even with the noop nemesis algorithm.

GitHub repo with the Jepsen code: https://github.com/qvad/jepsen (see the ignite folder 
there; it also contains a small guide on how to launch a Hyper-V VM and run the test).
Elle reports the following anomalies (:anomaly-types (:G1a :G1b :internal)), 
i.e. aborted reads and intermediate reads.
Anomalies description: 
https://sitano.github.io/theory/databases/2019/07/30/tx-isolation-anomalies/



> Simple Jepsen Elle test failed for Java Thin Client
> ---
>
> Key: IGNITE-15317
> URL: https://issues.apache.org/jira/browse/IGNITE-15317
> Project: Ignite
>  Issue Type: Bug
>  Components: thin client
>Affects Versions: 2.10
>Reporter: Dmitry Sherstobitov
>Priority: Critical
>
> I've created a simple Jepsen Elle test for basic cache.get/put functionality. 
> There are two versions of the client: one for the thick client and one for the 
> thin client. The invocation code is exactly the same.
> The test passed for the thick client and failed for the thin client. Note that 
> this issue is reproducible even with the noop nemesis algorithm.
> {code:java}
> (reduce
>   (fn [txn' [f k v :as micro-op]]
>     (case f
>       :r (let [value (read-value cache k)]
>            (conj txn' [f k value]))
>       :w (do
>            (let [contain-key (.containsKey cache k)
>                  value       (read-value cache k)]
>              (if (or (not contain-key) (and contain-key (< value v)))
>                (.put cache k v)
>                (vreset! tx-state false))) ; bye functional programming, we are saving state here to fail tx later
>            (conj txn' micro-op
> {code}
> GitHub repo with the Jepsen code: https://github.com/qvad/jepsen (see the ignite 
> folder there; it also contains a small guide on how to launch a Hyper-V VM and 
> run the test).
> Elle reports the following anomalies (:anomaly-types (:G1a :G1b :internal)), 
> i.e. aborted reads and intermediate reads.
> Anomalies description: 
> https://sitano.github.io/theory/databases/2019/07/30/tx-isolation-anomalies/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-15317) Simple Jepsen Elle test failed for Java Thin Client

2021-08-16 Thread Dmitry Sherstobitov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-15317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sherstobitov updated IGNITE-15317:
-
Affects Version/s: 2.10

> Simple Jepsen Elle test failed for Java Thin Client
> ---
>
> Key: IGNITE-15317
> URL: https://issues.apache.org/jira/browse/IGNITE-15317
> Project: Ignite
>  Issue Type: Bug
>  Components: thin client
>Affects Versions: 2.10
>Reporter: Dmitry Sherstobitov
>Priority: Critical
>
> I've created a simple Jepsen Elle test for basic cache.get/put functionality. 
> There are two versions of the client: one for the thick client and one for the 
> thin client. The invocation code is exactly the same.
> The test passed for the thick client and failed for the thin client. Note that 
> this issue is reproducible even with the noop nemesis algorithm.
> GitHub repo with the Jepsen code: https://github.com/qvad/jepsen (see the ignite 
> folder there; it also contains a small guide on how to launch a Hyper-V VM and 
> run the test).
> Elle reports the following anomalies (:anomaly-types (:G1a :G1b :internal)), 
> i.e. aborted reads and intermediate reads.
> Anomalies description: 
> https://sitano.github.io/theory/databases/2019/07/30/tx-isolation-anomalies/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-15317) Simple Jepsen Elle test failed for Java Thin Client

2021-08-16 Thread Dmitry Sherstobitov (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-15317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sherstobitov updated IGNITE-15317:
-
Component/s: thin client

> Simple Jepsen Elle test failed for Java Thin Client
> ---
>
> Key: IGNITE-15317
> URL: https://issues.apache.org/jira/browse/IGNITE-15317
> Project: Ignite
>  Issue Type: Bug
>  Components: thin client
>Reporter: Dmitry Sherstobitov
>Priority: Critical
>
> I've created a simple Jepsen Elle test for basic cache.get/put functionality. 
> There are two versions of the client: one for the thick client and one for the 
> thin client. The invocation code is exactly the same.
> The test passed for the thick client and failed for the thin client. Note that 
> this issue is reproducible even with the noop nemesis algorithm.
> GitHub repo with the Jepsen code: https://github.com/qvad/jepsen (see the ignite 
> folder there; it also contains a small guide on how to launch a Hyper-V VM and 
> run the test).
> Elle reports the following anomalies (:anomaly-types (:G1a :G1b :internal)), 
> i.e. aborted reads and intermediate reads.
> Anomalies description: 
> https://sitano.github.io/theory/databases/2019/07/30/tx-isolation-anomalies/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-15317) Simple Jepsen Elle test failed for Java Thin Client

2021-08-16 Thread Dmitry Sherstobitov (Jira)
Dmitry Sherstobitov created IGNITE-15317:


 Summary: Simple Jepsen Elle test failed for Java Thin Client
 Key: IGNITE-15317
 URL: https://issues.apache.org/jira/browse/IGNITE-15317
 Project: Ignite
  Issue Type: Bug
Reporter: Dmitry Sherstobitov


I've created a simple Jepsen Elle test for basic cache.get/put functionality. 
There are two versions of the client: one for the thick client and one for the 
thin client. The invocation code is exactly the same.
The test passed for the thick client and failed for the thin client. Note that 
this issue is reproducible even with the noop nemesis algorithm.

GitHub repo with the Jepsen code: https://github.com/qvad/jepsen (see the ignite folder 
there; it also contains a small guide on how to launch a Hyper-V VM and run the test).
Elle reports the following anomalies (:anomaly-types (:G1a :G1b :internal)), 
i.e. aborted reads and intermediate reads.
Anomalies description: 
https://sitano.github.io/theory/databases/2019/07/30/tx-isolation-anomalies/




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (IGNITE-11838) Improve usability of UriDeploymentSpi documentation

2019-05-07 Thread Dmitry Sherstobitov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sherstobitov updated IGNITE-11838:
-
Description: 
I was trying to use the UriDeploymentSpi feature and failed at it. I only 
managed to make sense of it by reading the actual Java code.

Here are some issues I've found in the documentation:
1. It is not clear what a GAR file is and how a user can create it (manually? using 
some utility?)
2. Local disk folder containing only compiled Java classes - this doesn't work 
for me (and, according to the Java code, it shouldn't work)
3. Local disk folder with the structure of an unpacked GAR file - this DOES work, but 
META-INF/ is actually an optional folder (and for xyz.class, see the previous item). The only 
thing a user needs is to put a lib/ folder in the deployment URI and place a .jar file there
4. It is not clear what the ignite.xml descriptor file is or how a user can create it
5. I don't like the Windows paths in the examples (I think Linux paths are more common 
in the Ignite context; we may add a note with Windows path examples)
6. For a Linux path the user should write something like this: 
file:///tmp/path/deployment (3 slashes instead of 2)
7. 
https://apacheignite.readme.io/docs/service-grid-28#section-service-updates-redeployment
 - the URI link here looks strange and doesn't work
8. Previous page: the example temporaryDirectoryPath value is optional, so we may 
remove it

  was:
I was trying to use the UriDeploymentSpi feature and failed at it. I only 
managed to make sense of it using Java code.

Here are some issues I've found in the documentation:
1. It is not clear what a GAR file is and how a user can create it (manually? using 
some utility?)
2. Local disk folder containing only compiled Java classes - this doesn't work 
for me (and, according to the Java code, it shouldn't work)
3. Local disk folder with the structure of an unpacked GAR file - this DOES work, but 
META-INF/ is actually an optional folder (and for xyz.class, see the previous item). The only 
thing a user needs is to put a lib/ folder in the deployment URI and place a .jar file there
4. It is not clear what the ignite.xml descriptor file is or how a user can create it
5. I don't like the Windows paths in the examples (I think Linux paths are more common 
in the Ignite context; we may add a note with Windows path examples)
6. For a Linux path the user should write something like this: 
file:///tmp/path/deployment (3 slashes instead of 2)
7. 
https://apacheignite.readme.io/docs/service-grid-28#section-service-updates-redeployment
 - the URI link here looks strange and doesn't work
8. Previous page: the example temporaryDirectoryPath value is optional, so we may 
remove it


> Improve usability of UriDeploymentSpi documentation 
> 
>
> Key: IGNITE-11838
> URL: https://issues.apache.org/jira/browse/IGNITE-11838
> Project: Ignite
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.7
>Reporter: Dmitry Sherstobitov
>Priority: Critical
>
> I was trying to use the UriDeploymentSpi feature and failed at it. I only managed 
> to make sense of it by reading the actual Java code.
> Here are some issues I've found in the documentation:
> 1. It is not clear what a GAR file is and how a user can create it (manually? using 
> some utility?)
> 2. Local disk folder containing only compiled Java classes - this doesn't 
> work for me (and, according to the Java code, it shouldn't work)
> 3. Local disk folder with the structure of an unpacked GAR file - this DOES work, 
> but META-INF/ is actually an optional folder (and for xyz.class, see the previous item). 
> The only thing a user needs is to put a lib/ folder in the deployment URI and place a 
> .jar file there
> 4. It is not clear what the ignite.xml descriptor file is or how a user can create it
> 5. I don't like the Windows paths in the examples (I think Linux paths are more common 
> in the Ignite context; we may add a note with Windows path examples)
> 6. For a Linux path the user should write something like this: 
> file:///tmp/path/deployment (3 slashes instead of 2)
> 7. 
> https://apacheignite.readme.io/docs/service-grid-28#section-service-updates-redeployment
>  - the URI link here looks strange and doesn't work
> 8. Previous page: the example temporaryDirectoryPath value is optional, so we may 
> remove it
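For item 6, a minimal Java sketch of pointing UriDeploymentSpi at a local folder using a 
three-slash file URI (the folder path is an assumption; per item 3 it would contain a lib/ 
subfolder with the deployment .jar):
{code:java}
import java.util.Arrays;

import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.deployment.uri.UriDeploymentSpi;

public class UriDeploymentSketch {
    public static void main(String[] args) {
        UriDeploymentSpi deploymentSpi = new UriDeploymentSpi();

        // "file://" + absolute path "/tmp/path/deployment" => three slashes in total.
        deploymentSpi.setUriList(Arrays.asList("file:///tmp/path/deployment"));

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDeploymentSpi(deploymentSpi);

        Ignition.start(cfg);
    }
}
{code}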



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11838) Improve usability of UriDeploymentSpi documentation

2019-05-07 Thread Dmitry Sherstobitov (JIRA)
Dmitry Sherstobitov created IGNITE-11838:


 Summary: Improve usability of UriDeploymentSpi documentation 
 Key: IGNITE-11838
 URL: https://issues.apache.org/jira/browse/IGNITE-11838
 Project: Ignite
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.7
Reporter: Dmitry Sherstobitov


I was trying to use the UriDeploymentSpi feature and failed at it. I only 
managed to make sense of it using Java code.

Here are some issues I've found in the documentation:
1. It is not clear what a GAR file is and how a user can create it (manually? using 
some utility?)
2. Local disk folder containing only compiled Java classes - this doesn't work 
for me (and, according to the Java code, it shouldn't work)
3. Local disk folder with the structure of an unpacked GAR file - this DOES work, but 
META-INF/ is actually an optional folder (and for xyz.class, see the previous item). The only 
thing a user needs is to put a lib/ folder in the deployment URI and place a .jar file there
4. It is not clear what the ignite.xml descriptor file is or how a user can create it
5. I don't like the Windows paths in the examples (I think Linux paths are more common 
in the Ignite context; we may add a note with Windows path examples)
6. For a Linux path the user should write something like this: 
file:///tmp/path/deployment (3 slashes instead of 2)
7. 
https://apacheignite.readme.io/docs/service-grid-28#section-service-updates-redeployment
 - the URI link here looks strange and doesn't work
8. Previous page: the example temporaryDirectoryPath value is optional, so we may 
remove it



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11667) OPTIMISTIC REPEATABLE_READ transactions do not guarantee transactional consistency in blinking node scenario

2019-04-01 Thread Dmitry Sherstobitov (JIRA)
Dmitry Sherstobitov created IGNITE-11667:


 Summary: OPTIMISTIC REPEATABLE_READ transactions do not 
guarantee transactional consistency in blinking node scenario
 Key: IGNITE-11667
 URL: https://issues.apache.org/jira/browse/IGNITE-11667
 Project: Ignite
  Issue Type: Bug
Reporter: Dmitry Sherstobitov


The following scenario:

Start the cluster, load data.
Start transactional loading (a simple transfer task with PESSIMISTIC and 
OPTIMISTIC, REPEATABLE_READ transactions).
Repeat 10 times:
  Stop one node, sleep 10 seconds, start it again.
  Wait for the rebalance to finish (LocalNodeMovingPartitionsCount == 0 for each 
cache/cache group).

Validate that there are no conflicts in the sums of the fields (the verify action of 
the transfer task).

For OPTIMISTIC/REPEATABLE_READ transactions there is no guarantee that 
transactional consistency is preserved (the last validation step fails).
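For illustration, a minimal sketch of the kind of transfer transaction such a scenario runs 
(the cache name, key types and amounts are assumptions; the actual transfer task is not part 
of this ticket):
{code:java}
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.transactions.Transaction;
import org.apache.ignite.transactions.TransactionConcurrency;
import org.apache.ignite.transactions.TransactionIsolation;

public class TransferTaskSketch {
    /** Moves {@code amount} between two accounts; the total sum must stay constant. */
    static void transfer(Ignite ignite, IgniteCache<Integer, Long> accounts, int from, int to, long amount) {
        try (Transaction tx = ignite.transactions().txStart(
            TransactionConcurrency.OPTIMISTIC, TransactionIsolation.REPEATABLE_READ)) {
            long fromBalance = accounts.get(from);
            long toBalance = accounts.get(to);

            accounts.put(from, fromBalance - amount);
            accounts.put(to, toBalance + amount);

            // In OPTIMISTIC mode entry locks are acquired only during the commit phase.
            tx.commit();
        }
    }
}
{code}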




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11609) Add support of authentication and SSL in yardstick IgniteThinClient benchmark

2019-03-22 Thread Dmitry Sherstobitov (JIRA)
Dmitry Sherstobitov created IGNITE-11609:


 Summary: Add support of authentication and SSL in yardstick 
IgniteThinClient benchmark
 Key: IGNITE-11609
 URL: https://issues.apache.org/jira/browse/IGNITE-11609
 Project: Ignite
  Issue Type: New Feature
Affects Versions: 2.7
Reporter: Dmitry Sherstobitov
 Fix For: 2.8


Add support for the following keys:

Mandatory authentication:
USER
PASSWORD

Mandatory SSL: 
SSL_KEY_PASSWORD
SSL_KEY_PATH

Optional SSL: 
SSL_CLIENT_STORE_TYPE (default JKS)
SSL_SERVER_STORE_TYPE (default JKS)
SSL_KEY_ALGORITHM (default SunX509)
SSL_TRUST_ALL (default false) 
SSL_PROTOCOL (default TLS)
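For reference, these settings roughly correspond to the following ClientConfiguration calls 
on the thin-client side (a hedged sketch; the mapping of the yardstick property names onto 
these setters, and all paths and passwords, are assumptions):
{code:java}
import org.apache.ignite.Ignition;
import org.apache.ignite.client.IgniteClient;
import org.apache.ignite.client.SslMode;
import org.apache.ignite.client.SslProtocol;
import org.apache.ignite.configuration.ClientConfiguration;

public class ThinClientAuthSslSketch {
    public static void main(String[] args) throws Exception {
        ClientConfiguration cfg = new ClientConfiguration()
            .setAddresses("127.0.0.1:10800")
            // Mandatory authentication (USER / PASSWORD).
            .setUserName("ignite")
            .setUserPassword("ignite")
            // Mandatory SSL (SSL_KEY_PATH / SSL_KEY_PASSWORD).
            .setSslMode(SslMode.REQUIRED)
            .setSslClientCertificateKeyStorePath("/path/to/client.jks")
            .setSslClientCertificateKeyStorePassword("keyStorePassword")
            // Optional SSL settings with their defaults.
            .setSslClientCertificateKeyStoreType("JKS")
            .setSslKeyAlgorithm("SunX509")
            .setSslTrustAll(false)
            .setSslProtocol(SslProtocol.TLS);

        try (IgniteClient client = Ignition.startClient(cfg)) {
            client.getOrCreateCache("benchmark");
        }
    }
}
{code}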



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-11461) Automatic modules support for Apache Ignite: find and resolve packages conflicts

2019-03-15 Thread Dmitry Sherstobitov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-11461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793498#comment-16793498
 ] 

Dmitry Sherstobitov commented on IGNITE-11461:
--

[~dpavlov] Looks good to me.

> Automatic modules support for Apache Ignite: find and resolve packages 
> conflicts
> 
>
> Key: IGNITE-11461
> URL: https://issues.apache.org/jira/browse/IGNITE-11461
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Dmitriy Pavlov
>Assignee: Dmitriy Pavlov
>Priority: Critical
> Fix For: 2.8
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Example of a failure in a modular environment:
> Error:java: the unnamed module reads package 
> org.apache.ignite.internal.processors.cache.persistence.file from both 
> ignite.core and ignite.direct.io
> This type of failure is called package interference, and it is strictly prohibited: 
> http://openjdk.java.net/projects/jigsaw/spec/reqs/#non-interference 
> Ignite compatibility with Jigsaw is tested in a separate project. See details in
> https://github.com/apache/ignite/tree/ignite-11461-java11/modules/dev-utils/ignite-modules-test#ignite-modular-environment-test-project
> The following table contains the Ignite modules currently investigated for their 
> applicability as automatic modules:
> ||Module||Run In Modular Environment||Changeable using private API only||Notes||
> |ignite-core|(/)|(/)| |
> |ignite-indexing|(x) [IGNITE-11464]|(?) Refactoring to use ignite-indexing-text may be a breaking change|Lucene artifacts exclusion is required by the user manually.|
> |ignite-compress|(x)|(/) not released|org.apache.ignite.internal.processors.compress package conflict|
> |ignite-direct-io|(x) blocked by indexing|(/)|org.apache.ignite.internal.processors.cache.persistence.file package conflict|
> |ignite-spring|(x) [IGNITE-11467] blocked by indexing|(x) org.apache.ignite.IgniteSpringBean affected| |
> |ignite-ml|(x) blocked by indexing| | |
> |ignite-log4j|(/)|(/)|But may not compile with other logging dependencies - EOL https://blogs.apache.org/logging/entry/moving_on_to_log4j_2|
> |ignite-log4j2|(/)|(/)| |
> |ignite-slf4j|(/)|(/)| |
> |ignite-rest-http|(x) IGNITE-11469 & Migrate to log4j 2.x [IGNITE-11486]|(/)|Usage with slf4j may break compilation because of a conflict of packages|
> |ignite-hibernate_5.3 and others|(x) [IGNITE-11485]|(?)|Avoiding an API break is possible if Hibernate core classes are not used by third-party code|
> |ignite-zookeeper|(x) IGNITE-11486|(/)| |
> |ignite-spring-data_2-0|(x) blocked by spring|org.apache.commons.logging from both commons.logging and spring.jcl conflict|https://jira.spring.io/browse/SPR-16605|
> |ignite-ml|(/) master (x) 2.7| | |
> |ignite-cassandra-store|(x) [IGNITE-11467] blocked by spring|(/)|Only spring needs to be fixed|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11407) AssertionError may occur on server start

2019-02-25 Thread Dmitry Sherstobitov (JIRA)
Dmitry Sherstobitov created IGNITE-11407:


 Summary: AssertionError may occur on server start
 Key: IGNITE-11407
 URL: https://issues.apache.org/jira/browse/IGNITE-11407
 Project: Ignite
  Issue Type: Bug
Reporter: Dmitry Sherstobitov


See https://issues.apache.org/jira/browse/IGNITE-11406 (same scenario)

On the 5th iteration (each iteration performs 50 rounds of cluster node restarts):
{code:java}
java.lang.AssertionError
at 
org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.stopRoutine(GridContinuousProcessor.java:743)
at 
org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.executeQuery0(CacheContinuousQueryManager.java:705)
at 
org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.executeInternalQuery(CacheContinuousQueryManager.java:542)
at 
org.apache.ignite.internal.processors.datastructures.DataStructuresProcessor.startQuery(DataStructuresProcessor.java:213)
at 
org.apache.ignite.internal.processors.datastructures.DataStructuresProcessor.getAtomic(DataStructuresProcessor.java:541)
at 
org.apache.ignite.internal.processors.datastructures.DataStructuresProcessor.atomicLong(DataStructuresProcessor.java:457)
at org.apache.ignite.internal.IgniteKernal.atomicLong(IgniteKernal.java:3468)
at org.apache.ignite.internal.IgniteKernal.atomicLong(IgniteKernal.java:3457)
at 
org.apache.ignite.piclient.bean.LifecycleAtomicLongBean.onLifecycleEvent(LifecycleAtomicLongBean.java:48)
at 
org.apache.ignite.internal.IgniteKernal.notifyLifecycleBeans(IgniteKernal.java:655)
at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1064)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1973)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1716)
at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1144)
at 
org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1062)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:948)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:847)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:717)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:686)
at org.apache.ignite.Ignition.start(Ignition.java:352)
at 
org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:302)
Failed to start grid: null{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11408) AssertionError may occur on client start

2019-02-25 Thread Dmitry Sherstobitov (JIRA)
Dmitry Sherstobitov created IGNITE-11408:


 Summary: AssertionError may occur on client start
 Key: IGNITE-11408
 URL: https://issues.apache.org/jira/browse/IGNITE-11408
 Project: Ignite
  Issue Type: Bug
Reporter: Dmitry Sherstobitov


Scenario from: https://issues.apache.org/jira/browse/IGNITE-11406

An AssertionError may occur on client start:
{code}
2019-02-23T18:26:27,317][DEBUG][tcp-client-disco-msg-worker-#4][TcpDiscoverySpi]
 Grid runnable finished normally: tcp-client-disco-msg-worker-#4
Exception in thread “tcp-client-disco-msg-worker-#4” java.lang.AssertionError: 
TcpDiscoveryClientReconnectMessage 
[routerNodeId=76b33f1b-bef6-4805-bcca-0ea32df641ac, lastMsgId=null, 
super=TcpDiscoveryAbstractMessage 
[sndNodeId=76b33f1b-bef6-4805-bcca-0ea32df641ac, 
id=57c55fa1961-99d3d909-fa44-4b74-aea4-d375ad85e53e, 
verifierNodeId=6ba6bd09-4bc0-400c-ba11-a06d2507e983, topVer=0, pendingIdx=0, 
failedNodes=null, isClient=false]]
   at 
org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.processClientReconnectMessage(ClientImpl.java:2311)
   at 
org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.processDiscoveryMessage(ClientImpl.java:1914)
   at 
org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.body(ClientImpl.java:1798)
   at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
{code}


Other trace
{code:java}
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) 
[piclient-2.7.jar:?]
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) 
[piclient-2.7.jar:?]
at py4j.Gateway.invoke(Gateway.java:282) [piclient-2.7.jar:?]
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) 
[piclient-2.7.jar:?]
at py4j.commands.CallCommand.execute(CallCommand.java:79) 
[piclient-2.7.jar:?]
at py4j.GatewayConnection.run(GatewayConnection.java:238) 
[piclient-2.7.jar:?]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
Caused by: org.apache.ignite.IgniteCheckedException: Failed to start SPI: 
TcpDiscoverySpi [addrRslvr=null, sockTimeout=3, ackTimeout=6, 
marsh=JdkMarshaller 
[clsFilter=org.apache.ignite.marshaller.MarshallerUtils$1@59f2595b], 
reconCnt=2, reconDelay=2000, maxAckTimeout=30, forceSrvMode=false, 
clientReconnectDisabled=false, internalLsnr=null, skipAddrsRandomization=false]
at 
org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:300)
 ~[ignite-core-2.4.15.jar:2.4.15]
at 
org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:901)
 ~[ignite-core-2.4.15.jar:2.4.15]
at 
org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1672) 
~[ignite-core-2.4.15.jar:2.4.15]
... 22 more
Caused by: org.apache.ignite.spi.IgniteSpiException: Some error in join process.
at 
org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.body(ClientImpl.java:1809)
 ~[ignite-core-2.4.15.jar:2.4.15]
at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62) 
~[ignite-core-2.4.15.jar:2.4.15]
2019-02-23T18:26:27,320][ERROR][tcp-client-disco-sock-reader-#3][TcpDiscoverySpi]
 Connection failed [sock=Socket[addr=/172.25.1.34,port=47503,localport=60675], 
locNodeId=99d3d909-fa44-4b74-aea4-d375ad85e53e]
2019-02-23T18:26:27,320][ERROR][Thread-2][IgniteKernal] Got exception while 
starting (will rollback startup routine).{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11407) AssertionError may occur on server start

2019-02-25 Thread Dmitry Sherstobitov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sherstobitov updated IGNITE-11407:
-
Description: 
See https://issues.apache.org/jira/browse/IGNITE-11406 (same scenario)

On the 5th iteration (each iteration performs 50 rounds of cluster node restarts).

There is an atomic long started in a lifecycle bean:
{code:java}
public class LifecycleAtomicLongBean implements LifecycleBean {
    /** Auto-injected Ignite instance. */
    @IgniteInstanceResource
    private Ignite ignite;

    /** Name of the atomic long to create. */
    private String atomicLongName;

    /** Lifecycle event type to react to. */
    private LifecycleEventType eventType;

    /** Logger. */
    private static final Logger log = LogManager.getLogger(IgniteService.class);

    public LifecycleAtomicLongBean(String atomicLongName, LifecycleEventType eventType) {
        this.atomicLongName = atomicLongName;
        this.eventType = eventType;
    }

    /** {@inheritDoc} */
    @Override public void onLifecycleEvent(LifecycleEventType evt) {
        System.out.println();
        System.out.println(">>> Lifecycle event occurred: " + evt);
        System.out.println(">>> Ignite name: " + ignite.name());

        if (evt == eventType) {
            IgniteAtomicLong atomicLong = ignite.atomicLong(atomicLongName, 0, true);

            log.info(">>> Ignite Atomic Long");

            log.info("Atomic long initial value : " + atomicLong.getAndIncrement() + '.');
        }
    }
}{code}

Configuration:
{code:xml}
<!-- The Spring bean XML did not survive in this message; the recoverable detail is the
     lifecycle event type used: AFTER_NODE_START -->
{code}
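A sketch of the equivalent Java-based configuration of this lifecycle bean (constructor 
arguments are inferred from the bean code above; the atomic long name is an assumption):
{code:java}
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.lifecycle.LifecycleEventType;

public class LifecycleBeanStartupSketch {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // LifecycleAtomicLongBean is the bean shown above; it creates/increments
        // the atomic long right after the local node starts.
        cfg.setLifecycleBeans(new LifecycleAtomicLongBean("testAtomicLong", LifecycleEventType.AFTER_NODE_START));

        Ignition.start(cfg);
    }
}
{code}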

Error on start 
{code:java}
java.lang.AssertionError
at 
org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.stopRoutine(GridContinuousProcessor.java:743)
at 
org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.executeQuery0(CacheContinuousQueryManager.java:705)
at 
org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.executeInternalQuery(CacheContinuousQueryManager.java:542)
at 
org.apache.ignite.internal.processors.datastructures.DataStructuresProcessor.startQuery(DataStructuresProcessor.java:213)
at 
org.apache.ignite.internal.processors.datastructures.DataStructuresProcessor.getAtomic(DataStructuresProcessor.java:541)
at 
org.apache.ignite.internal.processors.datastructures.DataStructuresProcessor.atomicLong(DataStructuresProcessor.java:457)
at org.apache.ignite.internal.IgniteKernal.atomicLong(IgniteKernal.java:3468)
at org.apache.ignite.internal.IgniteKernal.atomicLong(IgniteKernal.java:3457)
at 
org.apache.ignite.piclient.bean.LifecycleAtomicLongBean.onLifecycleEvent(LifecycleAtomicLongBean.java:48)
at 
org.apache.ignite.internal.IgniteKernal.notifyLifecycleBeans(IgniteKernal.java:655)
at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1064)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1973)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1716)
at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1144)
at 
org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1062)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:948)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:847)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:717)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:686)
at org.apache.ignite.Ignition.start(Ignition.java:352)
at 
org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:302)
Failed to start grid: null{code}

  was:
See https://issues.apache.org/jira/browse/IGNITE-11406 (same scenario)

On the 5th iteration (each iteration performs 50 rounds of cluster node restarts):
{code:java}
java.lang.AssertionError
at 
org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.stopRoutine(GridContinuousProcessor.java:743)
at 
org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.executeQuery0(CacheContinuousQueryManager.java:705)
at 
org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.executeInternalQuery(CacheContinuousQueryManager.java:542)
at 
org.apache.ignite.internal.processors.datastructures.DataStructuresProcessor.startQuery(DataStructuresProcessor.java:213)
at 
org.apache.ignite.internal.processors.datastructures.DataStructuresProcessor.getAtomic(DataStructuresProcessor.java:541)
at 
org.apache.ignite.internal.processors.datastructures.DataStructuresProcessor.atomicLong(DataStructuresProcessor.java:457)
at org.apache.ignite.internal.IgniteKernal.atomicLong(IgniteKernal.java:3468)
at org.apache.ignite.internal.IgniteKernal.atomicLong(IgniteKernal.java:3457)
at 
org.apache.ignite.piclient.bean.LifecycleAtomicLongBean.onLifecycleEvent(LifecycleAtomicLongBean.java:48)
at 
org.apache.ignite.internal.IgniteKernal.notifyLifecycleBeans(IgniteKernal.java:655)
at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1064)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance

[jira] [Created] (IGNITE-11406) NullPointerException may occur on client start

2019-02-25 Thread Dmitry Sherstobitov (JIRA)
Dmitry Sherstobitov created IGNITE-11406:


 Summary: NullPointerException may occur on client start
 Key: IGNITE-11406
 URL: https://issues.apache.org/jira/browse/IGNITE-11406
 Project: Ignite
  Issue Type: Bug
Reporter: Dmitry Sherstobitov


While testing the fixes for https://issues.apache.org/jira/browse/IGNITE-10878:
 # Start the cluster, create caches with no persistence and load data into them
 # Restart each node in the cluster in order (coordinator first); do not wait until 
the topology message occurs
 # Try to run the utilities: activate, baseline (to check that the cluster is alive)
 # Run clients and load data into the alive caches

On the 4th step one of the clients throws an NPE on start
{code:java}
2019-02-23T18:36:24,045][DEBUG][tcp-client-disco-msg-worker-#4][TcpDiscoverySpi]
 Connection closed, local node received force fail message, will not try to 
restore connection
2019-02-23T18:36:24,045][DEBUG][tcp-client-disco-msg-worker-#4][TcpDiscoverySpi]
 Failed to restore closed connection, will try to reconnect 
[networkTimeout=5000, joinTimeout=0, failMsg=TcpDiscoveryNodeFailedMessage 
[failedNodeId=80f8b6ee-6a6d-4235-86e9-1b66ea310eb6, order=90, warning=Client 
node considered as unreachable and will be dropped from cluster, because no 
metrics update messages received in interval: 
TcpDiscoverySpi.clientFailureDetectionTimeout() ms. It may be caused by network 
problems or long GC pause on client node, try to increase this parameter. 
[nodeId=80f8b6ee-6a6d-4235-86e9-1b66ea310eb6, 
clientFailureDetectionTimeout=3], super=TcpDiscoveryAbstractMessage 
[sndNodeId=987d4a03-8233-4130-af5b-c06900bdb6d7, 
id=3642cfa1961-987d4a03-8233-4130-af5b-c06900bdb6d7, 
verifierNodeId=d9abbff3-4b4d-4a13-9cb1-0ca4d2436164, topVer=167, pendingIdx=0, 
failedNodes=null, isClient=false]]]
2019-02-23T18:36:24,046][DEBUG][tcp-client-disco-msg-worker-#4][TcpDiscoverySpi]
 Discovery notification [node=TcpDiscoveryNode 
[id=80f8b6ee-6a6d-4235-86e9-1b66ea310eb6, addrs=[172.25.1.34], 
sockAddrs=[lab34.gridgain.local/172.25.1.34:0], discPort=0, order=165, 
intOrder=0, lastExchangeTime=1550936128313, loc=true, 
ver=2.4.15#20190222-sha1:36b1d676, isClient=true], 
type=CLIENT_NODE_DISCONNECTED, topVer=166]
2019-02-23T18:36:24,049][INFO 
][tcp-client-disco-msg-worker-#4][GridDhtPartitionsExchangeFuture] Finish 
exchange future [startVer=AffinityTopologyVersion [topVer=165, minorTopVer=0], 
resVer=null, err=class 
org.apache.ignite.internal.IgniteClientDisconnectedCheckedException: Client 
node disconnected: null]
[2019-02-23T18:36:24,061][ERROR][Thread-2][IgniteKernal] Got exception while 
starting (will rollback startup routine).
java.lang.NullPointerException: null
at 
org.apache.ignite.internal.processors.cache.GridCacheProcessor.internalCacheEx(GridCacheProcessor.java:3886)
 ~[ignite-core-2.4.15.jar:2.4.15]
at 
org.apache.ignite.internal.processors.cache.GridCacheProcessor.utilityCache(GridCacheProcessor.java:3858)
 ~[ignite-core-2.4.15.jar:2.4.15]
at 
org.apache.ignite.internal.processors.service.GridServiceProcessor.updateUtilityCache(GridServiceProcessor.java:290)
 ~[ignite-core-2.4.15.jar:2.4.15]
at 
org.apache.ignite.internal.processors.service.GridServiceProcessor.onKernalStart0(GridServiceProcessor.java:233)
 ~[ignite-core-2.4.15.jar:2.4.15]
at 
org.apache.ignite.internal.processors.service.GridServiceProcessor.onKernalStart(GridServiceProcessor.java:221)
 ~[ignite-core-2.4.15.jar:2.4.15]
at 
org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1038) 
[ignite-core-2.4.15.jar:2.4.15]
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1973)
 [ignite-core-2.4.15.jar:2.4.15]
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1716)
 [ignite-core-2.4.15.jar:2.4.15]
at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1144) 
[ignite-core-2.4.15.jar:2.4.15]
at 
org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1062) 
[ignite-core-2.4.15.jar:2.4.15]
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:948) 
[ignite-core-2.4.15.jar:2.4.15]
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:847) 
[ignite-core-2.4.15.jar:2.4.15]
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:717) 
[ignite-core-2.4.15.jar:2.4.15]
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:686) 
[ignite-core-2.4.15.jar:2.4.15]
at org.apache.ignite.Ignition.start(Ignition.java:352) 
[ignite-core-2.4.15.jar:2.4.15]
at 
org.apache.ignite.piclient.api.IgniteService.startIgniteClientNode(IgniteService.java:86)
 [piclient-2.7.jar:?]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
~[?:1.8.0_181]
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAcce

[jira] [Updated] (IGNITE-11292) There is no way to disable WAL for cache in group

2019-02-12 Thread Dmitry Sherstobitov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sherstobitov updated IGNITE-11292:
-
Description: 
The following code doesn't work if the cache is in a cache group:
{code:java}
ignite.cluster().disableWal(cacheName){code}
cacheName == cacheName:
{code:java}
Caused by: class org.apache.ignite.IgniteCheckedException: Cannot change WAL 
mode because not all cache names belonging to the group are provided 
[group=cache_group_1, missingCaches=[cache_group_1_005, cache_group_3_063, 
cache_group_1_003, cache_group_3_064, cache_group_1_004, cache_group_3_061, 
cache_group_3_062, cache_group_1_001, cache_group_1_002]]
{code}
cacheName == groupName:
{code:java}
Caused by: class org.apache.ignite.IgniteCheckedException: Cache doesn't exist: 
cache_group_1{code}
Also there is no javadoc about this behaviour

  was:
Following code doesn't work if cache is in a cacheGroup:
{code:java}
ignite.cluster().disableWal(cacheName){code}
cacheName == cacheName:
{code:java}
Caused by: class org.apache.ignite.IgniteCheckedException: Cannot change WAL 
mode because not all cache names belonging to the group are provided 
[group=cache_group_1, missingCaches=[cache_group_1_005, cache_group_3_063, 
cache_group_1_003, cache_group_3_064, cache_group_1_004, cache_group_3_061, 
cache_group_3_062, cache_group_1_001, cache_group_1_002]]
{code}
cacheName == groupName:
{code:java}
Caused by: class org.apache.ignite.IgniteCheckedException: Cache doesn't exist: 
cache_group_1{code}

Also there is no javadoc about this behavious


> There is no way to disable WAL for cache in group
> -
>
> Key: IGNITE-11292
> URL: https://issues.apache.org/jira/browse/IGNITE-11292
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitry Sherstobitov
>Priority: Critical
>
> The following code doesn't work if the cache is in a cache group:
> {code:java}
> ignite.cluster().disableWal(cacheName){code}
> cacheName == cacheName:
> {code:java}
> Caused by: class org.apache.ignite.IgniteCheckedException: Cannot change WAL 
> mode because not all cache names belonging to the group are provided 
> [group=cache_group_1, missingCaches=[cache_group_1_005, cache_group_3_063, 
> cache_group_1_003, cache_group_3_064, cache_group_1_004, cache_group_3_061, 
> cache_group_3_062, cache_group_1_001, cache_group_1_002]]
> {code}
> cacheName == groupName:
> {code:java}
> Caused by: class org.apache.ignite.IgniteCheckedException: Cache doesn't 
> exist: cache_group_1{code}
> Also there is no javadoc about this behaviour
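A minimal sketch that reproduces the situation (cache and group names are assumptions): two 
caches share one cache group, and disabling WAL by a single cache name fails with the 
exception quoted above.
{code:java}
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteException;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.CacheConfiguration;

public class DisableWalInGroupRepro {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            // Two caches sharing one cache group.
            ignite.getOrCreateCache(new CacheConfiguration<Integer, Integer>("cache_group_1_001")
                .setGroupName("cache_group_1"));
            ignite.getOrCreateCache(new CacheConfiguration<Integer, Integer>("cache_group_1_002")
                .setGroupName("cache_group_1"));

            try {
                // Fails: WAL mode is changed per group, but the API accepts only one cache name.
                ignite.cluster().disableWal("cache_group_1_001");
            }
            catch (IgniteException e) {
                System.err.println("disableWal failed: " + e.getMessage());
            }
        }
    }
}
{code}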



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11292) There is no way to disable WAL for cache in group

2019-02-12 Thread Dmitry Sherstobitov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sherstobitov updated IGNITE-11292:
-
Description: 
The following code doesn't work if the cache is in a cache group:
{code:java}
ignite.cluster().disableWal(cacheName){code}
cacheName == cacheName:
{code:java}
Caused by: class org.apache.ignite.IgniteCheckedException: Cannot change WAL 
mode because not all cache names belonging to the group are provided 
[group=cache_group_1, missingCaches=[cache_group_1_005, cache_group_3_063, 
cache_group_1_003, cache_group_3_064, cache_group_1_004, cache_group_3_061, 
cache_group_3_062, cache_group_1_001, cache_group_1_002]]
{code}
cacheName == groupName:
{code:java}
Caused by: class org.apache.ignite.IgniteCheckedException: Cache doesn't exist: 
cache_group_1{code}

Also there is no javadoc about this behaviour

  was:
Following code doesn't work if cache is in a cacheGroup:
{code:java}
ignite.cluster().disableWal(cacheName){code}
cacheName == cacheName:
{code:java}
Caused by: class org.apache.ignite.IgniteCheckedException: Cannot change WAL 
mode because not all cache names belonging to the group are provided 
[group=cache_group_1, missingCaches=[cache_group_1_005, cache_group_3_063, 
cache_group_1_003, cache_group_3_064, cache_group_1_004, cache_group_3_061, 
cache_group_3_062, cache_group_1_001, cache_group_1_002]]
{code}
cacheName == groupName:
{code:java}
Caused by: class org.apache.ignite.IgniteCheckedException: Cache doesn't exist: 
cache_group_1
{code}


> There is no way to disable WAL for cache in group
> -
>
> Key: IGNITE-11292
> URL: https://issues.apache.org/jira/browse/IGNITE-11292
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitry Sherstobitov
>Priority: Critical
>
> The following code doesn't work if the cache is in a cache group:
> {code:java}
> ignite.cluster().disableWal(cacheName){code}
> cacheName == cacheName:
> {code:java}
> Caused by: class org.apache.ignite.IgniteCheckedException: Cannot change WAL 
> mode because not all cache names belonging to the group are provided 
> [group=cache_group_1, missingCaches=[cache_group_1_005, cache_group_3_063, 
> cache_group_1_003, cache_group_3_064, cache_group_1_004, cache_group_3_061, 
> cache_group_3_062, cache_group_1_001, cache_group_1_002]]
> {code}
> cacheName == groupName:
> {code:java}
> Caused by: class org.apache.ignite.IgniteCheckedException: Cache doesn't 
> exist: cache_group_1{code}
> Also there is no javadoc about this behaviour



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11292) There is no way to disable WAL for cache in group

2019-02-11 Thread Dmitry Sherstobitov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sherstobitov updated IGNITE-11292:
-
Description: 
The following code doesn't work if the cache is in a cache group:
{code:java}
ignite.cluster().disableWal(cacheName){code}
cacheName == cacheName:
{code:java}
Caused by: class org.apache.ignite.IgniteCheckedException: Cannot change WAL 
mode because not all cache names belonging to the group are provided 
[group=cache_group_1, missingCaches=[cache_group_1_005, cache_group_3_063, 
cache_group_1_003, cache_group_3_064, cache_group_1_004, cache_group_3_061, 
cache_group_3_062, cache_group_1_001, cache_group_1_002]]
{code}
cacheName == groupName:
{code:java}
Caused by: class org.apache.ignite.IgniteCheckedException: Cache doesn't exist: 
cache_group_1
{code}

  was:
Following code doesn't work if cache is in cacheGroup:
{code}ignite.cluster().disableWal(cacheName){code}

cacheName == cacheName:
{code}
Caused by: class org.apache.ignite.IgniteCheckedException: Cannot change WAL 
mode because not all cache names belonging to the group are provided 
[group=cache_group_1, missingCaches=[cache_group_1_005, cache_group_3_063, 
cache_group_1_003, cache_group_3_064, cache_group_1_004, cache_group_3_061, 
cache_group_3_062, cache_group_1_001, cache_group_1_002]]
{code}

cacheName == groupName:
{code}
Caused by: class org.apache.ignite.IgniteCheckedException: Cache doesn't exist: 
cache_group_1
{code}


> There is no way to disable WAL for cache in group
> -
>
> Key: IGNITE-11292
> URL: https://issues.apache.org/jira/browse/IGNITE-11292
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitry Sherstobitov
>Priority: Critical
>
> The following code doesn't work if the cache is in a cache group:
> {code:java}
> ignite.cluster().disableWal(cacheName){code}
> cacheName == cacheName:
> {code:java}
> Caused by: class org.apache.ignite.IgniteCheckedException: Cannot change WAL 
> mode because not all cache names belonging to the group are provided 
> [group=cache_group_1, missingCaches=[cache_group_1_005, cache_group_3_063, 
> cache_group_1_003, cache_group_3_064, cache_group_1_004, cache_group_3_061, 
> cache_group_3_062, cache_group_1_001, cache_group_1_002]]
> {code}
> cacheName == groupName:
> {code:java}
> Caused by: class org.apache.ignite.IgniteCheckedException: Cache doesn't 
> exist: cache_group_1
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-11292) There is no way to disable WAL for cache in group

2019-02-11 Thread Dmitry Sherstobitov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-11292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765037#comment-16765037
 ] 

Dmitry Sherstobitov commented on IGNITE-11292:
--

[~avinogradov] Could you please look at this issue?

> There is no way to disable WAL for cache in group
> -
>
> Key: IGNITE-11292
> URL: https://issues.apache.org/jira/browse/IGNITE-11292
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitry Sherstobitov
>Priority: Critical
>
> The following code doesn't work if the cache is in a cache group:
> {code}ignite.cluster().disableWal(cacheName){code}
> cacheName == cacheName:
> {code}
> Caused by: class org.apache.ignite.IgniteCheckedException: Cannot change WAL 
> mode because not all cache names belonging to the group are provided 
> [group=cache_group_1, missingCaches=[cache_group_1_005, cache_group_3_063, 
> cache_group_1_003, cache_group_3_064, cache_group_1_004, cache_group_3_061, 
> cache_group_3_062, cache_group_1_001, cache_group_1_002]]
> {code}
> cacheName == groupName:
> {code}
> Caused by: class org.apache.ignite.IgniteCheckedException: Cache doesn't 
> exist: cache_group_1
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11292) There is no way to disable WAL for cache in group

2019-02-11 Thread Dmitry Sherstobitov (JIRA)
Dmitry Sherstobitov created IGNITE-11292:


 Summary: There is no way to disable WAL for cache in group
 Key: IGNITE-11292
 URL: https://issues.apache.org/jira/browse/IGNITE-11292
 Project: Ignite
  Issue Type: Bug
Reporter: Dmitry Sherstobitov


The following code doesn't work if the cache is in a cache group:
{code}ignite.cluster().disableWal(cacheName){code}

cacheName == cacheName:
{code}
Caused by: class org.apache.ignite.IgniteCheckedException: Cannot change WAL 
mode because not all cache names belonging to the group are provided 
[group=cache_group_1, missingCaches=[cache_group_1_005, cache_group_3_063, 
cache_group_1_003, cache_group_3_064, cache_group_1_004, cache_group_3_061, 
cache_group_3_062, cache_group_1_001, cache_group_1_002]]
{code}

cacheName == groupName:
{code}
Caused by: class org.apache.ignite.IgniteCheckedException: Cache doesn't exist: 
cache_group_1
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11100) AssertionError LocalJoinCachesContext occurs in sequential cluster restart

2019-01-28 Thread Dmitry Sherstobitov (JIRA)
Dmitry Sherstobitov created IGNITE-11100:


 Summary: AssertionError LocalJoinCachesContext occurs in 
sequential cluster restart
 Key: IGNITE-11100
 URL: https://issues.apache.org/jira/browse/IGNITE-11100
 Project: Ignite
  Issue Type: Bug
Reporter: Dmitry Sherstobitov


Same scenario as in https://issues.apache.org/jira/browse/IGNITE-10878

{code}
[2019-01-26T03:32:22,226][ERROR][tcp-disco-msg-worker-#2][TcpDiscoverySpi] 
TcpDiscoverSpi's message worker thread failed abnormally. Stopping the node in 
order to prevent cluster wide instability.
java.lang.AssertionError: LocalJoinCachesContext 
[locJoinStartCaches=[IgniteBiTuple [val1=DynamicCacheDescriptor 
[deploymentId=bc3e0978861-fb98885f-92a5-47d2-9475-00173fab8ee1, staticCfg=true, 
sql=false, cacheType=UTILITY, template=false, updatesAllowed=true, 
cacheId=-2100569601, rcvdFrom=f97e4743-6cf2-488e-a7fc-14707e9a8eb0, 
objCtx=null, rcvdOnDiscovery=false, startTopVer=null, rcvdFromVer=null, 
clientCacheStartVer=null, schema=QuerySchema [], grpDesc=CacheGroupDescriptor 
[grpId=-2100569601, grpName=null, startTopVer=null, 
rcvdFrom=f97e4743-6cf2-488e-a7fc-14707e9a8eb0, 
deploymentId=bc3e0978861-fb98885f-92a5-47d2-9475-00173fab8ee1, 
caches={ignite-sys-cache=-2100569601}, rcvdFromVer=null, 
persistenceEnabled=false, walEnabled=false, cacheName=ignite-sys-cache], 
cacheName=ignite-sys-cache], val2=null], IgniteBiTuple 
[val1=DynamicCacheDescriptor 
[deploymentId=60771978861-398164df-6240-4d19-ad0b-308768d2a095, 
staticCfg=false, sql=false, cacheType=USER, template=false, 
updatesAllowed=true, cacheId=-1901084566, 
rcvdFrom=f00ec506-fc6c-45c5-b550-9308d17a39cf, objCtx=null, 
rcvdOnDiscovery=true, startTopVer=null, rcvdFromVer=null, 
clientCacheStartVer=null, schema=QuerySchema [], grpDesc=CacheGroupDescriptor 
[grpId=-1901084566, grpName=null, startTopVer=AffinityTopologyVersion 
[topVer=13, minorTopVer=20], rcvdFrom=f00ec506-fc6c-45c5-b550-9308d17a39cf, 
deploymentId=60771978861-398164df-6240-4d19-ad0b-308768d2a095, 
caches={config_third_copy=-1901084566}, rcvdFromVer=null, 
persistenceEnabled=false, walEnabled=false, cacheName=config_third_copy], 
cacheName=config_third_copy], val2=null], IgniteBiTuple 
[val1=DynamicCacheDescriptor 
[deploymentId=01771978861-398164df-6240-4d19-ad0b-308768d2a095, 
staticCfg=false, sql=false, cacheType=USER, template=false, 
updatesAllowed=true, cacheId=-1858528402, 
rcvdFrom=f00ec506-fc6c-45c5-b550-9308d17a39cf, objCtx=null, 
rcvdOnDiscovery=true, startTopVer=null, rcvdFromVer=null, 
clientCacheStartVer=null, schema=QuerySchema [], grpDesc=CacheGroupDescriptor 
[grpId=-1858528402, grpName=null, startTopVer=AffinityTopologyVersion 
[topVer=13, minorTopVer=22], rcvdFrom=f00ec506-fc6c-45c5-b550-9308d17a39cf, 
deploymentId=01771978861-398164df-6240-4d19-ad0b-308768d2a095, 
caches={trans_forth_copy=-1858528402}, rcvdFromVer=null, 
persistenceEnabled=false, walEnabled=false, cacheName=trans_forth_copy], 
cacheName=trans_forth_copy], val2=null], IgniteBiTuple 
[val1=DynamicCacheDescriptor 
[deploymentId=51771978861-398164df-6240-4d19-ad0b-308768d2a095, 
staticCfg=false, sql=false, cacheType=USER, template=false, 
updatesAllowed=true, cacheId=-1502999781, 
rcvdFrom=f00ec506-fc6c-45c5-b550-9308d17a39cf, objCtx=null, 
rcvdOnDiscovery=true, startTopVer=null, rcvdFromVer=null, 
clientCacheStartVer=null, schema=QuerySchema [], grpDesc=CacheGroupDescriptor 
[grpId=-1502999781, grpName=null, startTopVer=AffinityTopologyVersion 
[topVer=13, minorTopVer=23], rcvdFrom=f00ec506-fc6c-45c5-b550-9308d17a39cf, 
deploymentId=51771978861-398164df-6240-4d19-ad0b-308768d2a095, 
caches={id_forth_copy=-1502999781}, rcvdFromVer=null, persistenceEnabled=false, 
walEnabled=false, cacheName=id_forth_copy], cacheName=id_forth_copy], 
val2=null], IgniteBiTuple [val1=DynamicCacheDescriptor 
[deploymentId=8a671978861-398164df-6240-4d19-ad0b-308768d2a095, 
staticCfg=false, sql=false, cacheType=USER, template=false, 
updatesAllowed=true, cacheId=-1354792126, 
rcvdFrom=f00ec506-fc6c-45c5-b550-9308d17a39cf, objCtx=null, 
rcvdOnDiscovery=true, startTopVer=null, rcvdFromVer=null, 
clientCacheStartVer=null, schema=QuerySchema [], grpDesc=CacheGroupDescriptor 
[grpId=-1354792126, grpName=null, startTopVer=AffinityTopologyVersion 
[topVer=13, minorTopVer=5], rcvdFrom=f00ec506-fc6c-45c5-b550-9308d17a39cf, 
deploymentId=8a671978861-398164df-6240-4d19-ad0b-308768d2a095, 
caches={config=-1354792126}, rcvdFromVer=null, persistenceEnabled=false, 
walEnabled=false, cacheName=config], cacheName=config], val2=null], 
IgniteBiTuple [val1=DynamicCacheDescriptor 
[deploymentId=6d671978861-398164df-6240-4d19-ad0b-308768d2a095, 
staticCfg=false, sql=false, cacheType=USER, template=false, 
updatesAllowed=true, cacheId=-1176672452, 
rcvdFrom=f00ec506-fc6c-45c5-b550-9308d17a39cf, objCtx=null, 
rcvdOnDiscovery=true, startTopVer=null, rcvdFromVer=null, 
clientCacheStartVe

[jira] [Updated] (IGNITE-10995) GridDhtPartitionSupplier::handleDemandMessage suppress errors

2019-01-21 Thread Dmitry Sherstobitov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sherstobitov updated IGNITE-10995:
-
Description: 
Scenario:
 # Cluster with data
 # Triggered historical rebalance
 In this case, if an OOM occurs on the supplier, no failHandler is triggered and the cluster stays alive with inconsistent data (the target node has MOVING partitions, the supplier does nothing).

Target rebalance node log:
{code:java}
[15:00:31,418][WARNING][sys-#86][GridDhtPartitionDemander] Rebalancing from 
node cancelled [grp=cache_group_4, topVer=AffinityTopologyVersion [topVer=17, 
minorTopVer=0], supplier=4cbc66d3-9d2c-4396-8366-2839a8d0cdb6, topic=5]]. 
Supplier has failed with error: java.lang.OutOfMemoryError: Java heap 
space{code}
Supplier stack trace:
 !Screenshot 2019-01-20 at 23.19.08.png!

  was:
Scenario:
 # Cluster with data
 # Triggered historical rebalance
In this case if OOM occurs on supplier there is no triggered failHandler and 
cluster is alive with inconsistent data (target node have MOVING partitions, 
supplier do nothing)

Target rebalance node log:

{code:java}
[15:00:31,418][WARNING][sys-#86][GridDhtPartitionDemander] Rebalancing from 
node cancelled [grp=cache_group_4, topVer=AffinityTopologyVersion [topVer=17, 
minorTopVer=0], supplier=4cbc66d3-9d2c-4396-8366-2839a8d0cdb6, topic=5]]. 
Supplier has failed with error: java.lang.OutOfMemoryError: Java heap 
space{code}
Supplier stack trace:
!Screenshot 2019-01-20 at 23.19.08.png!


> GridDhtPartitionSupplier::handleDemandMessage suppress errors
> -
>
> Key: IGNITE-10995
> URL: https://issues.apache.org/jira/browse/IGNITE-10995
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitry Sherstobitov
>Priority: Major
> Attachments: Screenshot 2019-01-20 at 23.19.08.png
>
>
> Scenario:
>  # Cluster with data
>  # Triggered historical rebalance
>  In this case if OOM occurs on supplier there is no failHandler triggered and 
> cluster is alive with inconsistent data (target node have MOVING partitions, 
> supplier do nothing)
> Target rebalance node log:
> {code:java}
> [15:00:31,418][WARNING][sys-#86][GridDhtPartitionDemander] Rebalancing from 
> node cancelled [grp=cache_group_4, topVer=AffinityTopologyVersion [topVer=17, 
> minorTopVer=0], supplier=4cbc66d3-9d2c-4396-8366-2839a8d0cdb6, topic=5]]. 
> Supplier has failed with error: java.lang.OutOfMemoryError: Java heap 
> space{code}
> Supplier stack trace:
>  !Screenshot 2019-01-20 at 23.19.08.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-10995) GridDhtPartitionSupplier::handleDemandMessage suppress errors

2019-01-21 Thread Dmitry Sherstobitov (JIRA)
Dmitry Sherstobitov created IGNITE-10995:


 Summary: GridDhtPartitionSupplier::handleDemandMessage suppress 
errors
 Key: IGNITE-10995
 URL: https://issues.apache.org/jira/browse/IGNITE-10995
 Project: Ignite
  Issue Type: Bug
Reporter: Dmitry Sherstobitov
 Attachments: Screenshot 2019-01-20 at 23.19.08.png

Scenario:
 # Cluster with data
 # Triggered historical rebalance
In this case if OOM occurs on supplier there is no triggered failHandler and 
cluster is alive with inconsistent data (target node have MOVING partitions, 
supplier do nothing)

Target rebalance node log:

{code:java}
[15:00:31,418][WARNING][sys-#86][GridDhtPartitionDemander] Rebalancing from 
node cancelled [grp=cache_group_4, topVer=AffinityTopologyVersion [topVer=17, 
minorTopVer=0], supplier=4cbc66d3-9d2c-4396-8366-2839a8d0cdb6, topic=5]]. 
Supplier has failed with error: java.lang.OutOfMemoryError: Java heap 
space{code}
Supplier stack trace:
!Screenshot 2019-01-20 at 23.19.08.png!
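
The report implies that an error like this should reach the node's failure handler rather than be suppressed inside handleDemandMessage. For context, a node-level failure handler is configured roughly like this (a minimal sketch; it does not by itself change the suppression described above):

{code:java}
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.failure.StopNodeOrHaltFailureHandler;

public class FailureHandlerConfig {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Stop (or, as a last resort, halt) the node when a critical failure,
        // such as an OutOfMemoryError in a system worker, is reported.
        cfg.setFailureHandler(new StopNodeOrHaltFailureHandler());

        Ignition.start(cfg);
    }
}
{code}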



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-10943) "No next node in topology" infinite messages in log after cycle cluster nodes restart

2019-01-15 Thread Dmitry Sherstobitov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sherstobitov updated IGNITE-10943:
-
Attachment: grid.1.node.1.jstack.log

> "No next node in topology" infinite messages in log after cycle cluster nodes 
> restart
> -
>
> Key: IGNITE-10943
> URL: https://issues.apache.org/jira/browse/IGNITE-10943
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.4
>Reporter: Dmitry Sherstobitov
>Priority: Critical
> Attachments: grid.1.node.1.jstack.log
>
>
> Same scenario as in https://issues.apache.org/jira/browse/IGNITE-10878
> After cluster restarted here is one node with 100% CPU load and following 
> messages in log:
> {code:java}
> 2019-01-15T15:16:41,333][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] 
> Message has been added to queue: TcpDiscoveryNodeFailedMessage 
> [failedNodeId=e006e575-bbc8-4004-8ce3-ddc165d1748c, order=12, warning=null, 
> super=TcpDiscoveryAbstractMessage [sndNodeId=null, 
> id=3cfe0715861-24a27aff-e471-4db1-ac46-cda072de17b9, verifierNodeId=null, 
> topVer=0, pendingIdx=0, failedNodes=null, isClient=false]]
> 2019-01-15T15:16:41,333][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] 
> Pending messages will be resent to local node
> 2019-01-15T15:16:41,333][INFO ][tcp-disco-srvr-#3][TcpDiscoverySpi] TCP 
> discovery spawning a new thread for connection [rmtAddr=/172.25.1.40, 
> rmtPort=59236]
> 2019-01-15T15:16:41,333][INFO ][tcp-disco-sock-reader-#21][TcpDiscoverySpi] 
> Started serving remote node connection [rmtAddr=/172.25.1.40:59236, 
> rmtPort=59236]
> 2019-01-15T15:16:41,333][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] 
> Message has been added to queue: TcpDiscoveryStatusCheckMessage 
> [creatorNode=TcpDiscoveryNode [id=24a27aff-e471-4db1-ac46-cda072de17b9, 
> addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 172.17.0.1, 172.25.1.40], 
> sockAddrs=[/172.17.0.1:47500, /0:0:0:0:0:0:0:1%lo:47500, 
> lab40.gridgain.local/172.25.1.40:47500, /127.0.0.1:47500], discPort=47500, 
> order=0, intOrder=15, lastExchangeTime=1547554584282, loc=true, 
> ver=2.4.13#20190114-sha1:a7667ae6, isClient=false], failedNodeId=null, 
> status=0, super=TcpDiscoveryAbstractMessage [sndNodeId=null, 
> id=4cfe0715861-24a27aff-e471-4db1-ac46-cda072de17b9, verifierNodeId=null, 
> topVer=0, pendingIdx=0, failedNodes=null, isClient=false]]
> 2019-01-15T15:16:41,334][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] 
> Ignore message failed nodes, sender node is in fail list 
> [nodeId=e006e575-bbc8-4004-8ce3-ddc165d1748c, 
> failedNodes=[a251994d-8df6-4b2d-a28c-18ec55a3a48c, 
> a5fa9095-2e4b-48e5-803d-551a5ebde558]]
> 2019-01-15T15:16:41,334][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] No 
> next node in topology.
> 2019-01-15T15:16:41,334][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] No 
> next node in topology.
> 2019-01-15T15:16:41,334][DEBUG][tcp-disco-sock-reader-#21][TcpDiscoverySpi] 
> Initialized connection with remote node 
> [nodeId=6df245fe-6288-4d93-ab20-2b9ac1b35771, client=false]
> 2019-01-15T15:16:41,334][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] No 
> next node in topology.
> 2019-01-15T15:16:41,335][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] No 
> next node in topology.
> 2019-01-15T15:16:41,335][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] No 
> next node in topology.
> 2019-01-15T15:16:41,335][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] No 
> next node in topology.
> 2019-01-15T15:16:41,335][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] No 
> next node in topology.
> 2019-01-15T15:16:41,335][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] No 
> next node in topology.
> 2019-01-15T15:16:41,335][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] No 
> next node in topology.
> 2019-01-15T15:16:41,336][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] No 
> next node in topology.
> 2019-01-15T15:16:41,336][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] No 
> next node in topology.{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-10943) "No next node in topology" infinite messages in log after cycle cluster nodes restart

2019-01-15 Thread Dmitry Sherstobitov (JIRA)
Dmitry Sherstobitov created IGNITE-10943:


 Summary: "No next node in topology" infinite messages in log after 
cycle cluster nodes restart
 Key: IGNITE-10943
 URL: https://issues.apache.org/jira/browse/IGNITE-10943
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.4
Reporter: Dmitry Sherstobitov
 Attachments: grid.1.node.1.jstack.log

Same scenario as in https://issues.apache.org/jira/browse/IGNITE-10878
After the cluster restart, one node has 100% CPU load and the following messages appear in the log:
{code:java}
2019-01-15T15:16:41,333][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] 
Message has been added to queue: TcpDiscoveryNodeFailedMessage 
[failedNodeId=e006e575-bbc8-4004-8ce3-ddc165d1748c, order=12, warning=null, 
super=TcpDiscoveryAbstractMessage [sndNodeId=null, 
id=3cfe0715861-24a27aff-e471-4db1-ac46-cda072de17b9, verifierNodeId=null, 
topVer=0, pendingIdx=0, failedNodes=null, isClient=false]]
2019-01-15T15:16:41,333][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] 
Pending messages will be resent to local node
2019-01-15T15:16:41,333][INFO ][tcp-disco-srvr-#3][TcpDiscoverySpi] TCP 
discovery spawning a new thread for connection [rmtAddr=/172.25.1.40, 
rmtPort=59236]
2019-01-15T15:16:41,333][INFO ][tcp-disco-sock-reader-#21][TcpDiscoverySpi] 
Started serving remote node connection [rmtAddr=/172.25.1.40:59236, 
rmtPort=59236]
2019-01-15T15:16:41,333][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] 
Message has been added to queue: TcpDiscoveryStatusCheckMessage 
[creatorNode=TcpDiscoveryNode [id=24a27aff-e471-4db1-ac46-cda072de17b9, 
addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 172.17.0.1, 172.25.1.40], 
sockAddrs=[/172.17.0.1:47500, /0:0:0:0:0:0:0:1%lo:47500, 
lab40.gridgain.local/172.25.1.40:47500, /127.0.0.1:47500], discPort=47500, 
order=0, intOrder=15, lastExchangeTime=1547554584282, loc=true, 
ver=2.4.13#20190114-sha1:a7667ae6, isClient=false], failedNodeId=null, 
status=0, super=TcpDiscoveryAbstractMessage [sndNodeId=null, 
id=4cfe0715861-24a27aff-e471-4db1-ac46-cda072de17b9, verifierNodeId=null, 
topVer=0, pendingIdx=0, failedNodes=null, isClient=false]]
2019-01-15T15:16:41,334][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] 
Ignore message failed nodes, sender node is in fail list 
[nodeId=e006e575-bbc8-4004-8ce3-ddc165d1748c, 
failedNodes=[a251994d-8df6-4b2d-a28c-18ec55a3a48c, 
a5fa9095-2e4b-48e5-803d-551a5ebde558]]
2019-01-15T15:16:41,334][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] No 
next node in topology.
2019-01-15T15:16:41,334][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] No 
next node in topology.
2019-01-15T15:16:41,334][DEBUG][tcp-disco-sock-reader-#21][TcpDiscoverySpi] 
Initialized connection with remote node 
[nodeId=6df245fe-6288-4d93-ab20-2b9ac1b35771, client=false]
2019-01-15T15:16:41,334][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] No 
next node in topology.
2019-01-15T15:16:41,335][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] No 
next node in topology.
2019-01-15T15:16:41,335][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] No 
next node in topology.
2019-01-15T15:16:41,335][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] No 
next node in topology.
2019-01-15T15:16:41,335][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] No 
next node in topology.
2019-01-15T15:16:41,335][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] No 
next node in topology.
2019-01-15T15:16:41,335][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] No 
next node in topology.
2019-01-15T15:16:41,336][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] No 
next node in topology.
2019-01-15T15:16:41,336][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] No 
next node in topology.{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-10935) "Invalid node order" error occurs while cycle cluster nodes restart

2019-01-14 Thread Dmitry Sherstobitov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16742397#comment-16742397
 ] 

Dmitry Sherstobitov commented on IGNITE-10935:
--

Looks like this is another exception that may occur in this scenario.

> "Invalid node order" error occurs while cycle cluster nodes restart
> ---
>
> Key: IGNITE-10935
> URL: https://issues.apache.org/jira/browse/IGNITE-10935
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitry Sherstobitov
>Priority: Critical
>
> Same scenario as in https://issues.apache.org/jira/browse/IGNITE-10878
> {code:java}
> Exception in thread "tcp-disco-msg-worker-#2" java.lang.AssertionError: 
> Invalid node order: TcpDiscoveryNode 
> [id=9a332aa3-3d60-469a-9ff5-3deee8918451, addrs=[0:0:0:0:0:0:0:1%lo, 
> 127.0.0.1, 172.17.0.1, 172.25.1.40], sockAddrs=[/172.25.1.40:47501, 
> /0:0:0:0:0:0:0:1%lo:47501, /127.0.0.1:47501, /172.17.0.1:47501], 
> discPort=47501, order=0, intOrder=16, lastExchangeTime=1547486771047, 
> loc=false, ver=2.4.13#20190114-sha1:a7667ae6, isClient=false]
> at 
> org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing$1.apply(TcpDiscoveryNodesRing.java:51)
> at 
> org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing$1.apply(TcpDiscoveryNodesRing.java:48)
> at org.apache.ignite.internal.util.lang.GridFunc.isAll(GridFunc.java:2030)
> at 
> org.apache.ignite.internal.util.IgniteUtils.arrayList(IgniteUtils.java:9635)
> at 
> org.apache.ignite.internal.util.IgniteUtils.arrayList(IgniteUtils.java:9608)
> at 
> org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing.nodes(TcpDiscoveryNodesRing.java:625)
> at 
> org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing.visibleNodes(TcpDiscoveryNodesRing.java:145)
> at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl.notifyDiscovery(ServerImpl.java:1429)
> at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl.access$2400(ServerImpl.java:176)
> at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processNodeAddFinishedMessage(ServerImpl.java:4565)
> at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2732)
> at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2554)
> at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerAdapter.body(ServerImpl.java:6955)
> at 
> org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2634)
> at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-10935) "Invalid node order" error occurs while cycle cluster nodes restart

2019-01-14 Thread Dmitry Sherstobitov (JIRA)
Dmitry Sherstobitov created IGNITE-10935:


 Summary: "Invalid node order" error occurs while cycle cluster 
nodes restart
 Key: IGNITE-10935
 URL: https://issues.apache.org/jira/browse/IGNITE-10935
 Project: Ignite
  Issue Type: Bug
Reporter: Dmitry Sherstobitov


Same scenario as in https://issues.apache.org/jira/browse/IGNITE-10878
{code:java}
Exception in thread "tcp-disco-msg-worker-#2" java.lang.AssertionError: Invalid 
node order: TcpDiscoveryNode [id=9a332aa3-3d60-469a-9ff5-3deee8918451, 
addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 172.17.0.1, 172.25.1.40], 
sockAddrs=[/172.25.1.40:47501, /0:0:0:0:0:0:0:1%lo:47501, /127.0.0.1:47501, 
/172.17.0.1:47501], discPort=47501, order=0, intOrder=16, 
lastExchangeTime=1547486771047, loc=false, ver=2.4.13#20190114-sha1:a7667ae6, 
isClient=false]
at 
org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing$1.apply(TcpDiscoveryNodesRing.java:51)
at 
org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing$1.apply(TcpDiscoveryNodesRing.java:48)
at org.apache.ignite.internal.util.lang.GridFunc.isAll(GridFunc.java:2030)
at org.apache.ignite.internal.util.IgniteUtils.arrayList(IgniteUtils.java:9635)
at org.apache.ignite.internal.util.IgniteUtils.arrayList(IgniteUtils.java:9608)
at 
org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing.nodes(TcpDiscoveryNodesRing.java:625)
at 
org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing.visibleNodes(TcpDiscoveryNodesRing.java:145)
at 
org.apache.ignite.spi.discovery.tcp.ServerImpl.notifyDiscovery(ServerImpl.java:1429)
at 
org.apache.ignite.spi.discovery.tcp.ServerImpl.access$2400(ServerImpl.java:176)
at 
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processNodeAddFinishedMessage(ServerImpl.java:4565)
at 
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2732)
at 
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2554)
at 
org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerAdapter.body(ServerImpl.java:6955)
at 
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2634)
at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)




{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-10878) "Failed to find completed exchange future" error occurs in test with round cluster restart

2019-01-10 Thread Dmitry Sherstobitov (JIRA)
Dmitry Sherstobitov created IGNITE-10878:


 Summary: "Failed to find completed exchange future" error occurs 
in test with round cluster restart
 Key: IGNITE-10878
 URL: https://issues.apache.org/jira/browse/IGNITE-10878
 Project: Ignite
  Issue Type: Bug
Reporter: Dmitry Sherstobitov


# Start a cluster, create caches with no persistence and load data into them
 # Restart each node in the cluster in order (coordinator first); do not wait until the topology message occurs
 # At some point the error below may occur (roughly 1 out of 20 runs)

This is the case when the topology version has had time to be reset.
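
A single-JVM sketch of the restart loop described above (instance names and the cache name are illustrative; the original test restarts separate server processes, so this only approximates the scenario):

{code:java}
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class RollingRestart {
    private static final int NODES = 4; // assumed cluster size

    public static void main(String[] args) {
        for (int i = 0; i < NODES; i++)
            Ignition.start(cfg(i));

        // Non-persistent cache with some data (the name is taken from the log below).
        Ignition.ignite("node-0").getOrCreateCache("ENTITY_CONFIG").put(1, "value");

        // Restart each node in order, coordinator (node-0) first,
        // without waiting for the topology message between restarts.
        for (int i = 0; i < NODES; i++) {
            Ignition.stop("node-" + i, true);
            Ignition.start(cfg(i));
        }
    }

    private static IgniteConfiguration cfg(int idx) {
        return new IgniteConfiguration().setIgniteInstanceName("node-" + idx);
    }
}
{code}
The log below shows the failure observed in one such run.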
{code:java}
[23:27:17,218][INFO][exchange-worker-#62][GridCacheProcessor] Started cache 
[name=ENTITY_CONFIG, id=23889694, memoryPolicyName=no-evict, mode=REPLICATED, 
atomicity=ATOMIC, backups=2147483647]
[23:27:17,222][SEVERE][exchange-worker-#62][GridDhtPartitionsExchangeFuture] 
Failed to reinitialize local partitions (preloading will be stopped): 
GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=1, 
minorTopVer=0], discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode 
[id=83bd0a25-4574-4723-9594-b95ddaab19be, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 
172.17.0.1, 172.25.1.40], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47503, 
/127.0.0.1:47503, /172.17.0.1:47503, lab40.gridgain.local/172.25.1.40:47503], 
discPort=47503, order=1, intOrder=1, lastExchangeTime=1547065626462, loc=true, 
ver=2.4.13#20181228-sha1:9033812f, isClient=false], topVer=1, nodeId8=83bd0a25, 
msg=null, type=NODE_JOINED, tstamp=1547065636782], nodeId=83bd0a25, 
evt=NODE_JOINED]
class org.apache.ignite.IgniteCheckedException: Failed to find completed 
exchange future to fetch affinity.
at 
org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager$18.applyx(CacheAffinitySharedManager.java:1798)
at 
org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager$18.applyx(CacheAffinitySharedManager.java:1743)
at 
org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.forAllRegisteredCacheGroups(CacheAffinitySharedManager.java:1107)
at 
org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.initCoordinatorCaches(CacheAffinitySharedManager.java:1743)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.initCoordinatorCaches(GridDhtPartitionsExchangeFuture.java:573)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:679)
at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2398)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
at java.lang.Thread.run(Thread.java:748)
[23:27:17,222][INFO][exchange-worker-#62][GridDhtPartitionsExchangeFuture] 
Finish exchange future [startVer=AffinityTopologyVersion [topVer=1, 
minorTopVer=0], resVer=null, err=class 
org.apache.ignite.IgniteCheckedException: Failed to find completed exchange 
future to fetch affinity.]
[23:27:17,238][SEVERE][main][IgniteKernal] Got exception while starting (will 
rollback startup routine).
class org.apache.ignite.IgniteCheckedException: Failed to find completed 
exchange future to fetch affinity.
at 
org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager$18.applyx(CacheAffinitySharedManager.java:1798)
at 
org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager$18.applyx(CacheAffinitySharedManager.java:1743)
at 
org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.forAllRegisteredCacheGroups(CacheAffinitySharedManager.java:1107)
at 
org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.initCoordinatorCaches(CacheAffinitySharedManager.java:1743)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.initCoordinatorCaches(GridDhtPartitionsExchangeFuture.java:573)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:679)
at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2398)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
at java.lang.Thread.run(Thread.java:748)
[23:27:17,238][INFO][exchange-worker-#62][GridDhtPartitionsExchangeFuture] 
Completed partition exchange [localNode=83bd0a25-4574-4723-9594-b95ddaab19be, 
exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion 
[topVer=1, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode 
[id=83bd0a25-4574-4723-9594-b95ddaa

[jira] [Updated] (IGNITE-10672) Changing walSegments property leads to fallen node on start

2018-12-20 Thread Dmitry Sherstobitov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sherstobitov updated IGNITE-10672:
-
Summary: Changing walSegments property leads to fallen node on start  (was: 
Changing walSegments property leads to fallen node)

> Changing walSegments property leads to fallen node on start
> ---
>
> Key: IGNITE-10672
> URL: https://issues.apache.org/jira/browse/IGNITE-10672
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitry Sherstobitov
>Priority: Major
>
> Start cluster with
> {code}
>  
>  class="org.apache.ignite.configuration.DataStorageConfiguration">
> 
> 
>  class="org.apache.ignite.configuration.DataRegionConfiguration">
> 
> 
> 
> 
> 
> {code}
> Load some data and then restart cluster with new config:
> {code}
>  
>  class="org.apache.ignite.configuration.DataStorageConfiguration">
> 
> 
>  class="org.apache.ignite.configuration.DataRegionConfiguration">
> 
> 
> 
> 
> 
> 
> {code}
> This will lead nodes to fail on start
> {code}
> [14:51:00,852][SEVERE][main][IgniteKernal] Got exception while starting (will 
> rollback startup routine).
> class org.apache.ignite.IgniteCheckedException: Failed to start processor: 
> GridProcessorAdapter []
> at 
> org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1784)
> at 
> org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1008)
> at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2020)
> at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1725)
> at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1153)
> at 
> org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1071)
> at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:957)
> at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:856)
> at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:726)
> at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:695)
> at org.apache.ignite.Ignition.start(Ignition.java:348)
> at 
> org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:301)
> Caused by: class 
> org.apache.ignite.internal.processors.cache.persistence.StorageException: 
> Failed to initialize wal (work directory contains incorrect number of 
> segments) [cur=10, expected=5]
> at 
> org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.checkOrPrepareFiles(FileWriteAheadLogManager.java:1408)
> at 
> org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.start0(FileWriteAheadLogManager.java:435)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter.start(GridCacheSharedManagerAdapter.java:61)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.start(GridCacheProcessor.java:741)
> at 
> org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1781)
> ... 11 more
> [14:51:00,853][WARNING][main][IgniteKernal] Attempt to stop starting grid. 
> This operation cannot be guaranteed to be successful.
> [14:51:00,855][SEVERE][main][IgniteKernal] Failed to stop component 
> (ignoring): GridProcessorAdapter []
> java.lang.NullPointerException
> at 
> org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.stop0(FileWriteAheadLogManager.java:631)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter.stop(GridCacheSharedManagerAdapter.java:94)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.stop(GridCacheProcessor.java:980)
> at 
> org.apache.ignite.internal.IgniteKernal.stop0(IgniteKernal.java:2312)
> at 
> org.apache.ignite.internal.IgniteKernal.stop(IgniteKernal.java:2190)
> at 
> org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1164)
> at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2020)
> at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1725)
> at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1153)
> at 
> org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1071)
> at org.apache.ignite.internal.IgnitionEx.

[jira] [Updated] (IGNITE-10672) Changing walSegments property leads to fallen node

2018-12-13 Thread Dmitry Sherstobitov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sherstobitov updated IGNITE-10672:
-
Description: 
Start the cluster with:

{code}
 










{code}

Load some data and then restart the cluster with the new config:
{code}
 










{code}
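
Assuming the two configurations above differ only in DataStorageConfiguration's walSegments value (10 on the first start, 5 after the restart, which matches the error below), a programmatic equivalent is roughly:

{code:java}
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class WalSegmentsChange {
    public static void main(String[] args) {
        // First start: the WAL work directory is created with 10 segments.
        Ignition.start(config(10)).close();

        // Restart with a smaller value: FileWriteAheadLogManager finds 10
        // segments on disk while expecting 5 and fails the node on start.
        Ignition.start(config(5));
    }

    private static IgniteConfiguration config(int walSegments) {
        DataStorageConfiguration storage = new DataStorageConfiguration()
            .setWalSegments(walSegments)
            .setDefaultDataRegionConfiguration(
                new DataRegionConfiguration().setPersistenceEnabled(true));

        return new IgniteConfiguration().setDataStorageConfiguration(storage);
    }
}
{code}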

This causes the nodes to fail on start:
{code}
[14:51:00,852][SEVERE][main][IgniteKernal] Got exception while starting (will 
rollback startup routine).
class org.apache.ignite.IgniteCheckedException: Failed to start processor: 
GridProcessorAdapter []
at 
org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1784)
at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1008)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2020)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1725)
at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1153)
at 
org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1071)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:957)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:856)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:726)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:695)
at org.apache.ignite.Ignition.start(Ignition.java:348)
at 
org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:301)
Caused by: class 
org.apache.ignite.internal.processors.cache.persistence.StorageException: 
Failed to initialize wal (work directory contains incorrect number of segments) 
[cur=10, expected=5]
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.checkOrPrepareFiles(FileWriteAheadLogManager.java:1408)
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.start0(FileWriteAheadLogManager.java:435)
at 
org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter.start(GridCacheSharedManagerAdapter.java:61)
at 
org.apache.ignite.internal.processors.cache.GridCacheProcessor.start(GridCacheProcessor.java:741)
at 
org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1781)
... 11 more
[14:51:00,853][WARNING][main][IgniteKernal] Attempt to stop starting grid. This 
operation cannot be guaranteed to be successful.
[14:51:00,855][SEVERE][main][IgniteKernal] Failed to stop component (ignoring): 
GridProcessorAdapter []
java.lang.NullPointerException
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.stop0(FileWriteAheadLogManager.java:631)
at 
org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter.stop(GridCacheSharedManagerAdapter.java:94)
at 
org.apache.ignite.internal.processors.cache.GridCacheProcessor.stop(GridCacheProcessor.java:980)
at org.apache.ignite.internal.IgniteKernal.stop0(IgniteKernal.java:2312)
at org.apache.ignite.internal.IgniteKernal.stop(IgniteKernal.java:2190)
at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1164)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2020)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1725)
at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1153)
at 
org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1071)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:957)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:856)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:726)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:695)
at org.apache.ignite.Ignition.start(Ignition.java:348)
at 
org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:301)
{code}

  was:
Start cluster with

{code}
 










{code}

Load some data and then restart cluster with new config:
{code}
 










{code}

This will lead node 

[jira] [Updated] (IGNITE-10672) Changing walSegments property leads to fallen node

2018-12-13 Thread Dmitry Sherstobitov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sherstobitov updated IGNITE-10672:
-
Description: 
Start cluster with

{code}
 









{code}

Load some data and then restart cluster with new config:
{code}
 










{code}

This will lead nodes to fail on start
{code}
[14:51:00,852][SEVERE][main][IgniteKernal] Got exception while starting (will 
rollback startup routine).
class org.apache.ignite.IgniteCheckedException: Failed to start processor: 
GridProcessorAdapter []
at 
org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1784)
at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1008)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2020)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1725)
at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1153)
at 
org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1071)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:957)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:856)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:726)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:695)
at org.apache.ignite.Ignition.start(Ignition.java:348)
at 
org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:301)
Caused by: class 
org.apache.ignite.internal.processors.cache.persistence.StorageException: 
Failed to initialize wal (work directory contains incorrect number of segments) 
[cur=10, expected=5]
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.checkOrPrepareFiles(FileWriteAheadLogManager.java:1408)
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.start0(FileWriteAheadLogManager.java:435)
at 
org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter.start(GridCacheSharedManagerAdapter.java:61)
at 
org.apache.ignite.internal.processors.cache.GridCacheProcessor.start(GridCacheProcessor.java:741)
at 
org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1781)
... 11 more
[14:51:00,853][WARNING][main][IgniteKernal] Attempt to stop starting grid. This 
operation cannot be guaranteed to be successful.
[14:51:00,855][SEVERE][main][IgniteKernal] Failed to stop component (ignoring): 
GridProcessorAdapter []
java.lang.NullPointerException
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.stop0(FileWriteAheadLogManager.java:631)
at 
org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter.stop(GridCacheSharedManagerAdapter.java:94)
at 
org.apache.ignite.internal.processors.cache.GridCacheProcessor.stop(GridCacheProcessor.java:980)
at org.apache.ignite.internal.IgniteKernal.stop0(IgniteKernal.java:2312)
at org.apache.ignite.internal.IgniteKernal.stop(IgniteKernal.java:2190)
at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1164)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2020)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1725)
at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1153)
at 
org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1071)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:957)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:856)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:726)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:695)
at org.apache.ignite.Ignition.start(Ignition.java:348)
at 
org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:301)
{code}

  was:
Start cluster with

{code}
 










{code}

Load some data and then restart cluster with new config:
{code}
 










{code}

This will lead nodes to fail on start

[jira] [Created] (IGNITE-10672) Changing walSegments property leads to fallen node

2018-12-13 Thread Dmitry Sherstobitov (JIRA)
Dmitry Sherstobitov created IGNITE-10672:


 Summary: Changing walSegments property leads to fallen node
 Key: IGNITE-10672
 URL: https://issues.apache.org/jira/browse/IGNITE-10672
 Project: Ignite
  Issue Type: Bug
Reporter: Dmitry Sherstobitov


Start cluster with

{code}
 










{code}

Load some data and then restart cluster with new config:
{code}
 










{code}

This will lead node to error on start
{code}
[14:51:00,852][SEVERE][main][IgniteKernal] Got exception while starting (will 
rollback startup routine).
class org.apache.ignite.IgniteCheckedException: Failed to start processor: 
GridProcessorAdapter []
at 
org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1784)
at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1008)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2020)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1725)
at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1153)
at 
org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1071)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:957)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:856)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:726)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:695)
at org.apache.ignite.Ignition.start(Ignition.java:348)
at 
org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:301)
Caused by: class 
org.apache.ignite.internal.processors.cache.persistence.StorageException: 
Failed to initialize wal (work directory contains incorrect number of segments) 
[cur=10, expected=5]
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.checkOrPrepareFiles(FileWriteAheadLogManager.java:1408)
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.start0(FileWriteAheadLogManager.java:435)
at 
org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter.start(GridCacheSharedManagerAdapter.java:61)
at 
org.apache.ignite.internal.processors.cache.GridCacheProcessor.start(GridCacheProcessor.java:741)
at 
org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1781)
... 11 more
[14:51:00,853][WARNING][main][IgniteKernal] Attempt to stop starting grid. This 
operation cannot be guaranteed to be successful.
[14:51:00,855][SEVERE][main][IgniteKernal] Failed to stop component (ignoring): 
GridProcessorAdapter []
java.lang.NullPointerException
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.stop0(FileWriteAheadLogManager.java:631)
at 
org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter.stop(GridCacheSharedManagerAdapter.java:94)
at 
org.apache.ignite.internal.processors.cache.GridCacheProcessor.stop(GridCacheProcessor.java:980)
at org.apache.ignite.internal.IgniteKernal.stop0(IgniteKernal.java:2312)
at org.apache.ignite.internal.IgniteKernal.stop(IgniteKernal.java:2190)
at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1164)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2020)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1725)
at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1153)
at 
org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1071)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:957)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:856)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:726)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:695)
at org.apache.ignite.Ignition.start(Ignition.java:348)
at 
org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:301)
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-10189) SslContextFactory's ciphers doesn't work with control.sh utility

2018-11-08 Thread Dmitry Sherstobitov (JIRA)
Dmitry Sherstobitov created IGNITE-10189:


 Summary: SslContextFactory's ciphers doesn't work with control.sh 
utility
 Key: IGNITE-10189
 URL: https://issues.apache.org/jira/browse/IGNITE-10189
 Project: Ignite
  Issue Type: Bug
Reporter: Dmitry Sherstobitov


There are no matching options in the control.sh utility when the cipher suites feature is enabled on the server.

If this property is enabled on the server:
{code}


   ...

   ...


{code}

The control.sh utility doesn't work.
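
Assuming the configuration above sets SslContextFactory's cipherSuites property, the server-side setting looks roughly like this in code (key store paths, passwords and the suite name are placeholders):

{code:java}
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.ssl.SslContextFactory;

public class CipherSuitesConfig {
    public static IgniteConfiguration serverConfig() {
        SslContextFactory ssl = new SslContextFactory();

        ssl.setKeyStoreFilePath("keystore/server.jks");   // placeholder path
        ssl.setKeyStorePassword("123456".toCharArray());  // placeholder password
        ssl.setTrustStoreFilePath("keystore/trust.jks");  // placeholder path
        ssl.setTrustStorePassword("123456".toCharArray());
        ssl.setCipherSuites("TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256"); // example suite

        return new IgniteConfiguration().setSslContextFactory(ssl);
    }
}
{code}
Once the server is restricted to specific cipher suites like this, control.sh has no corresponding option to restrict its own suites, so it cannot connect.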



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-8895) Update yardstick libraries

2018-11-01 Thread Dmitry Sherstobitov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-8895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sherstobitov updated IGNITE-8895:

Description: 
There are some conflicts in the yardstick libraries at the moment:
||yardstick||core||problem||
|jline-0.9.94.jar|bin/include/sqlline/jline-2.4.3.jar|./sqlline.sh unable to 
start because of yardstick libraries in the PATH|

 

 

  was:
There is some conflicts in yardstick libraries for now
||yardstick||core||problem||
|jline-0.9.94.jar|bin/include/sqlline/jline-2.4.3.jar|./sqlline.sh unable to 
start if yardstick libraries in path|

 

 


> Update yardstick libraries 
> ---
>
> Key: IGNITE-8895
> URL: https://issues.apache.org/jira/browse/IGNITE-8895
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.5
>Reporter: Dmitry Sherstobitov
>Priority: Major
>
> There is some conflicts in yardstick libraries for now
> ||yardstick||core||problem||
> |jline-0.9.94.jar|bin/include/sqlline/jline-2.4.3.jar|./sqlline.sh unable to 
> start because of yardstick libraries in the PATH|
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-6167) Ability to enabled TLS protocols and cipher suites

2018-10-29 Thread Dmitry Sherstobitov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16667039#comment-16667039
 ] 

Dmitry Sherstobitov commented on IGNITE-6167:
-

Duplicate javadoc: getProtocols() reuses the getCipherSuites() description.

{code:java}
/**
 * Gets enabled cipher suites
 * @return enabled cipher suites
 */
public String[] getCipherSuites() {
return cipherSuites;
}

/**
 * Gets enabled cipher suites
 * @return enabled cipher suites
 */
public String[] getProtocols() {
return protocols;
}
{code}
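
A sketch of what the second comment presumably should say:

{code:java}
/**
 * Gets enabled protocols.
 *
 * @return Enabled protocols.
 */
public String[] getProtocols() {
    return protocols;
}
{code}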

> Ability to enabled TLS protocols and cipher suites
> --
>
> Key: IGNITE-6167
> URL: https://issues.apache.org/jira/browse/IGNITE-6167
> Project: Ignite
>  Issue Type: Wish
>  Components: security
>Affects Versions: 2.1
>Reporter: Jens Borgland
>Assignee: Mikhail Cherkasov
>Priority: Major
> Fix For: 2.7
>
>
> It would be very useful to be able to, in addition to the 
> {{javax.net.ssl.SSLContext}}, either specify a custom 
> {{javax.net.ssl.SSLServerSocketFactory}} and a custom 
> {{javax.net.ssl.SSLSocketFactory}}, or to be able to at least specify the 
> enabled TLS protocols and cipher suites.
> I have noticed that the 
> {{org.apache.ignite.internal.util.nio.ssl.GridNioSslFilter}} has support for 
> the latter but I cannot find a way of getting a reference to the filter 
> instance. The {{GridNioSslFilter}} also isn't used by {{TcpDiscoverySpi}} as 
> far as I can tell.
> Currently (as far as I can tell) there is no way of specifying the enabled 
> cipher suites and protocols used by Ignite, without doing it globally for the 
> JRE.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-9752) Fix ODBC documentation

2018-10-05 Thread Dmitry Sherstobitov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16639915#comment-16639915
 ] 

Dmitry Sherstobitov commented on IGNITE-9752:
-

I've updated the description.
The localhost IP should be fixed too.

> Fix ODBC documentation
> --
>
> Key: IGNITE-9752
> URL: https://issues.apache.org/jira/browse/IGNITE-9752
> Project: Ignite
>  Issue Type: Bug
>  Components: documentation
>Reporter: Dmitry Sherstobitov
>Assignee: Prachi Garg
>Priority: Blocker
> Fix For: 2.7
>
> Attachments: image-2018-10-01-17-12-21-555.png
>
>
> See screen shot.
> There is no matching between default values and values in example 
> host in default - 0.0.0.0
>  port in default - 10800
> host in example 127.0.0.1 
>  port - 12345 
> Parameters in xml example will be not working for external connections 
> (because of 127.0.0.1 using instead of 0.0.0.0)
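
For reference, a programmatic sketch of binding the client connector (which serves ODBC, JDBC and thin clients) to all interfaces on the default port, assuming ClientConnectorConfiguration is the bean the documentation example describes:

{code:java}
import org.apache.ignite.configuration.ClientConnectorConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class OdbcConnectorConfig {
    public static IgniteConfiguration config() {
        ClientConnectorConfiguration connector = new ClientConnectorConfiguration()
            .setHost("0.0.0.0") // bind to all interfaces so external connections work
            .setPort(10800);    // documented default port

        return new IgniteConfiguration().setClientConnectorConfiguration(connector);
    }
}
{code}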



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-9752) Fix ODBC documentation

2018-10-05 Thread Dmitry Sherstobitov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sherstobitov updated IGNITE-9752:

Description: 
See the screenshot.

The default values do not match the values in the example:

host in default - 0.0.0.0
port in default - 10800

host in example - 127.0.0.1
port in example - 12345

The parameters in the XML example will not work for external connections (because 127.0.0.1 is used instead of 0.0.0.0).

  was:
See screen shot.

There is no matching between default values and values in example 

host in default - 0.0.0.0
 port in default - 10800

host in example 127.0.0.1 
 port - 12345 

Parameters in xml example will be not working for external connections


> Fix ODBC documentation
> --
>
> Key: IGNITE-9752
> URL: https://issues.apache.org/jira/browse/IGNITE-9752
> Project: Ignite
>  Issue Type: Bug
>  Components: documentation
>Reporter: Dmitry Sherstobitov
>Assignee: Prachi Garg
>Priority: Blocker
> Fix For: 2.7
>
> Attachments: image-2018-10-01-17-12-21-555.png
>
>
> See screen shot.
> There is no matching between default values and values in example 
> host in default - 0.0.0.0
>  port in default - 10800
> host in example 127.0.0.1 
>  port - 12345 
> Parameters in xml example will be not working for external connections 
> (because of 127.0.0.1 using instead of 0.0.0.0)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-9752) Fix ODBC documentation

2018-10-05 Thread Dmitry Sherstobitov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sherstobitov updated IGNITE-9752:

Description: 
See screen shot.

There is no matching between default values and values in example 

host in default - 0.0.0.0
 port in default - 10800

host in example 127.0.0.1 
 port - 12345 

Parameters in xml example will be not working for external connections

  was:
See screen shot.

There is no matching between default values and values in example 

host in default - 0.0.0.0
port in default - 10800

host in example 127.0.0.1 (does it visible inside machine?)
port - 12345 


> Fix ODBC documentation
> --
>
> Key: IGNITE-9752
> URL: https://issues.apache.org/jira/browse/IGNITE-9752
> Project: Ignite
>  Issue Type: Bug
>  Components: documentation
>Reporter: Dmitry Sherstobitov
>Assignee: Prachi Garg
>Priority: Blocker
> Fix For: 2.7
>
> Attachments: image-2018-10-01-17-12-21-555.png
>
>
> See screen shot.
> There is no matching between default values and values in example 
> host in default - 0.0.0.0
>  port in default - 10800
> host in example 127.0.0.1 
>  port - 12345 
> Parameters in xml example will be not working for external connections



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Issue Comment Deleted] (IGNITE-9298) control.sh does not support SSL (org.apache.ignite.internal.commandline.CommandHandler)

2018-10-04 Thread Dmitry Sherstobitov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sherstobitov updated IGNITE-9298:

Comment: was deleted

(was: We've increased chaos in args naming:
{code:java}
/** */
protected static final String CMD_PING_TIMEOUT = "--ping-timeout";
/** */
private static final String CMD_DUMP = "--dump";
/** */
private static final String CMD_SKIP_ZEROS = "--skipZeros";
// SSL configuration section
/** */
protected static final String CMD_SSL_ENABLED = "--ssl_enabled";
/** */
protected static final String CMD_SSL_PROTOCOL = "--ssl_protocol";{code}
Here is 3 different types of split word: with dash, with capital letter and 
with '_')

> control.sh does not support SSL 
> (org.apache.ignite.internal.commandline.CommandHandler)
> ---
>
> Key: IGNITE-9298
> URL: https://issues.apache.org/jira/browse/IGNITE-9298
> Project: Ignite
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 2.6
>Reporter: Paul Anderson
>Assignee: Paul Anderson
>Priority: Major
> Fix For: 2.7
>
> Attachments: Arguments.patch, CommandHandler.patch
>
>
> We required SSL on the connector port and to use control.sh to work with the 
> baseline configuration.
> This morning I added support, see attached patches against 2.6.0 for 
> org/apache/ignite/internal/commandline/CommandHandler.java
> org/apache/ignite/internal/commandline/Arguments.java
> No tests, no docs.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-9298) control.sh does not support SSL (org.apache.ignite.internal.commandline.CommandHandler)

2018-10-04 Thread Dmitry Sherstobitov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16638287#comment-16638287
 ] 

Dmitry Sherstobitov commented on IGNITE-9298:
-

We've increased the chaos in argument naming:
{code:java}
/** */
protected static final String CMD_PING_TIMEOUT = "--ping-timeout";
/** */
private static final String CMD_DUMP = "--dump";
/** */
private static final String CMD_SKIP_ZEROS = "--skipZeros";
// SSL configuration section
/** */
protected static final String CMD_SSL_ENABLED = "--ssl_enabled";
/** */
protected static final String CMD_SSL_PROTOCOL = "--ssl_protocol";{code}
Here are 3 different ways of splitting words: with a dash, with a capital letter, and with '_'.
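
A sketch of one consistent style (dash-separated), purely as an illustration rather than the actual constants:

{code:java}
/** */
protected static final String CMD_PING_TIMEOUT = "--ping-timeout";

/** */
private static final String CMD_DUMP = "--dump";

/** */
private static final String CMD_SKIP_ZEROS = "--skip-zeros";

// SSL configuration section

/** */
protected static final String CMD_SSL_ENABLED = "--ssl-enabled";

/** */
protected static final String CMD_SSL_PROTOCOL = "--ssl-protocol";
{code}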

> control.sh does not support SSL 
> (org.apache.ignite.internal.commandline.CommandHandler)
> ---
>
> Key: IGNITE-9298
> URL: https://issues.apache.org/jira/browse/IGNITE-9298
> Project: Ignite
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 2.6
>Reporter: Paul Anderson
>Assignee: Paul Anderson
>Priority: Major
> Fix For: 2.7
>
> Attachments: Arguments.patch, CommandHandler.patch
>
>
> We required SSL on the connector port and to use control.sh to work with the 
> baseline configuration.
> This morning I added support, see attached patches against 2.6.0 for 
> org/apache/ignite/internal/commandline/CommandHandler.java
> org/apache/ignite/internal/commandline/Arguments.java
> No tests, no docs.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-9752) Fix ODBC documentation

2018-10-01 Thread Dmitry Sherstobitov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sherstobitov updated IGNITE-9752:

Description: 
See screen shot.

There is no matching between default values and values in example 

  was:
See screen shot.

There is no matching between default values and values in example 
!image-2018-10-01-17-12-28-557.png!


> Fix ODBC documentation
> --
>
> Key: IGNITE-9752
> URL: https://issues.apache.org/jira/browse/IGNITE-9752
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitry Sherstobitov
>Priority: Major
> Attachments: image-2018-10-01-17-12-21-555.png
>
>
> See screen shot.
> There is no matching between default values and values in example 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-9752) Fix ODBC documentation

2018-10-01 Thread Dmitry Sherstobitov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sherstobitov updated IGNITE-9752:

Description: 
See screen shot.

There is no matching between default values and values in example 

host in default - 0.0.0.0
port in default - 10800

host in example 127.0.0.1 (does it visible inside machine?)
port - 12345 

  was:
See screen shot.

There is no matching between default values and values in example 


> Fix ODBC documentation
> --
>
> Key: IGNITE-9752
> URL: https://issues.apache.org/jira/browse/IGNITE-9752
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitry Sherstobitov
>Priority: Major
> Attachments: image-2018-10-01-17-12-21-555.png
>
>
> See screen shot.
> There is no matching between default values and values in example 
> host in default - 0.0.0.0
> port in default - 10800
> host in example 127.0.0.1 (does it visible inside machine?)
> port - 12345 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-9752) Fix ODBC documentation

2018-10-01 Thread Dmitry Sherstobitov (JIRA)
Dmitry Sherstobitov created IGNITE-9752:
---

 Summary: Fix ODBC documentation
 Key: IGNITE-9752
 URL: https://issues.apache.org/jira/browse/IGNITE-9752
 Project: Ignite
  Issue Type: Bug
Reporter: Dmitry Sherstobitov
 Attachments: image-2018-10-01-17-12-21-555.png

See screen shot.

There is no matching between default values and values in example 
!image-2018-10-01-17-12-28-557.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-9751) Fix odbc driver description

2018-10-01 Thread Dmitry Sherstobitov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sherstobitov updated IGNITE-9751:

Summary: Fix odbc driver description  (was: Fix odic driver description)

> Fix odbc driver description
> ---
>
> Key: IGNITE-9751
> URL: https://issues.apache.org/jira/browse/IGNITE-9751
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.7
>Reporter: Dmitry Sherstobitov
>Priority: Major
> Attachments: Screen Shot 2018-10-01 at 14.55.21.png
>
>
> There is no version and company 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-9751) Fix odic driver description

2018-10-01 Thread Dmitry Sherstobitov (JIRA)
Dmitry Sherstobitov created IGNITE-9751:
---

 Summary: Fix odic driver description
 Key: IGNITE-9751
 URL: https://issues.apache.org/jira/browse/IGNITE-9751
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.7
Reporter: Dmitry Sherstobitov
 Attachments: Screen Shot 2018-10-01 at 14.55.21.png

There is no version or company information.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-9309) LocalNodeMovingPartitionsCount metric may be calculated incorrectly due to processFullPartitionUpdate

2018-08-27 Thread Dmitry Sherstobitov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16593502#comment-16593502
 ] 

Dmitry Sherstobitov commented on IGNITE-9309:
-

[~avinogradov] [~Mmuzaf]

I've moved the files from IGNITE-7165 here.
Please look at [~Jokser]'s comment.

> LocalNodeMovingPartitionsCount metric may be calculated incorrectly due to 
> processFullPartitionUpdate
> -
>
> Key: IGNITE-9309
> URL: https://issues.apache.org/jira/browse/IGNITE-9309
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.6
>Reporter: Maxim Muzafarov
>Priority: Major
> Attachments: GridCacheRebalancingCancelTestNoReproduce.java, 
> node-2-jstack.log, node-NO_REBALANCE-7165.log
>
>
> [~qvad] has found an incorrect {{LocalNodeMovingPartitionsCount}} metric 
> calculation on client node {{JOIN\LEFT}}. A full issue reproducer is absent.
> Probable scenario:
> {code}
> Repeat 10 times:
> 1. stop node
> 2. clean lfs
> 3. add stopped node (trigger rebalance)
> 4. 3 times: start 2 clients, wait for topology snapshot, close clients
> 5. for each cache group check JMX metrics LocalNodeMovingPartitionsCount 
> (like waitForFinishRebalance())
> {code}
> Whole discussion and all configuration details can be found in comments of 
> [IGNITE-7165|https://issues.apache.org/jira/browse/IGNITE-7165].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-9309) LocalNodeMovingPartitionsCount metric may be calculated incorrectly due to processFullPartitionUpdate

2018-08-27 Thread Dmitry Sherstobitov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sherstobitov updated IGNITE-9309:

Attachment: node-NO_REBALANCE-7165.log
node-2-jstack.log
GridCacheRebalancingCancelTestNoReproduce.java

> LocalNodeMovingPartitionsCount metric may be calculated incorrectly due to 
> processFullPartitionUpdate
> -
>
> Key: IGNITE-9309
> URL: https://issues.apache.org/jira/browse/IGNITE-9309
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.6
>Reporter: Maxim Muzafarov
>Priority: Major
> Attachments: GridCacheRebalancingCancelTestNoReproduce.java, 
> node-2-jstack.log, node-NO_REBALANCE-7165.log
>
>
> [~qvad] has found an incorrect {{LocalNodeMovingPartitionsCount}} metric 
> calculation on client node {{JOIN\LEFT}}. A full issue reproducer is absent.
> Probable scenario:
> {code}
> Repeat 10 times:
> 1. stop node
> 2. clean lfs
> 3. add stopped node (trigger rebalance)
> 4. 3 times: start 2 clients, wait for topology snapshot, close clients
> 5. for each cache group check JMX metrics LocalNodeMovingPartitionsCount 
> (like waitForFinishRebalance())
> {code}
> Whole discussion and all configuration details can be found in comments of 
> [IGNITE-7165|https://issues.apache.org/jira/browse/IGNITE-7165].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-7165) Re-balancing is cancelled if client node joins

2018-08-16 Thread Dmitry Sherstobitov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16582657#comment-16582657
 ] 

Dmitry Sherstobitov commented on IGNITE-7165:
-

[~Mmuzaf]

Here is the code of the test. There is one big problem with it: it runs in a single JVM. The second problem is that it does not actually reproduce the issue.
[^GridCacheRebalancingCancelTestNoReproduce.java] 

 

> Re-balancing is cancelled if client node joins
> --
>
> Key: IGNITE-7165
> URL: https://issues.apache.org/jira/browse/IGNITE-7165
> Project: Ignite
>  Issue Type: Bug
>Reporter: Mikhail Cherkasov
>Assignee: Maxim Muzafarov
>Priority: Critical
>  Labels: rebalance
> Fix For: 2.7
>
> Attachments: GridCacheRebalancingCancelTestNoReproduce.java, 
> node-2-jstack.log, node-NO_REBALANCE-7165.log
>
>
> Re-balancing is canceled if client node joins. Re-balancing can take hours 
> and each time when client node joins it starts again:
> [15:10:05,700][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager]
>  Added new node to topology: TcpDiscoveryNode 
> [id=979cf868-1c37-424a-9ad1-12db501f32ef, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1, 
> 172.31.16.213], sockAddrs=[/0:0:0:0:0:0:0:1:0, /127.0.0.1:0, 
> /172.31.16.213:0], discPort=0, order=36, intOrder=24, 
> lastExchangeTime=1512907805688, loc=false, ver=2.3.1#20171129-sha1:4b1ec0fe, 
> isClient=true]
> [15:10:05,701][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager]
>  Topology snapshot [ver=36, servers=7, clients=5, CPUs=128, heap=160.0GB]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Started 
> exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> crd=false, evt=NODE_JOINED, evtNode=979cf868-1c37-424a-9ad1-12db501f32ef, 
> customEvt=null, allowMerge=true]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionsExchangeFuture]
>  Finish exchange future [startVer=AffinityTopologyVersion [topVer=36, 
> minorTopVer=0], resVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> err=null]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Finished 
> exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> crd=false]
> [15:10:05,703][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion 
> [topVer=36, minorTopVer=0], evt=NODE_JOINED, 
> node=979cf868-1c37-424a-9ad1-12db501f32ef]
> [15:10:08,706][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Cancelled rebalancing from all nodes [topology=AffinityTopologyVersion 
> [topVer=35, minorTopVer=0]]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Rebalancing scheduled [order=[statementp]]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Rebalancing started [top=null, evt=NODE_JOINED, 
> node=a8be3c14-9add-48c3-b099-3fd304cfdbf4]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=2f6bde48-ffb5-4815-bd32-df4e57dc13e0, partitionsCount=18, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=35d01141-4dce-47dd-adf6-a4f3b2bb9da9, partitionsCount=15, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=b3a8be53-e61f-4023-a906-a265923837ba, partitionsCount=15, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=f825cb4e-7dcc-405f-a40d-c1dc1a3ade5a, partitionsCount=12, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=4ae1db91-8b88-4180-a84b-127a303959e9, partitionsCount=11, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=7c286481-7638-49e4-8c68-fa6aa65d8b76, partitionsCount=18, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> so in clusters with a big amount of data and the frequent client left/join 

[jira] [Updated] (IGNITE-7165) Re-balancing is cancelled if client node joins

2018-08-16 Thread Dmitry Sherstobitov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sherstobitov updated IGNITE-7165:

Attachment: GridCacheRebalancingCancelTestNoReproduce.java

> Re-balancing is cancelled if client node joins
> --
>
> Key: IGNITE-7165
> URL: https://issues.apache.org/jira/browse/IGNITE-7165
> Project: Ignite
>  Issue Type: Bug
>Reporter: Mikhail Cherkasov
>Assignee: Maxim Muzafarov
>Priority: Critical
>  Labels: rebalance
> Fix For: 2.7
>
> Attachments: GridCacheRebalancingCancelTestNoReproduce.java, 
> node-2-jstack.log, node-NO_REBALANCE-7165.log
>
>
> Re-balancing is canceled if client node joins. Re-balancing can take hours 
> and each time when client node joins it starts again:
> [15:10:05,700][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager]
>  Added new node to topology: TcpDiscoveryNode 
> [id=979cf868-1c37-424a-9ad1-12db501f32ef, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1, 
> 172.31.16.213], sockAddrs=[/0:0:0:0:0:0:0:1:0, /127.0.0.1:0, 
> /172.31.16.213:0], discPort=0, order=36, intOrder=24, 
> lastExchangeTime=1512907805688, loc=false, ver=2.3.1#20171129-sha1:4b1ec0fe, 
> isClient=true]
> [15:10:05,701][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager]
>  Topology snapshot [ver=36, servers=7, clients=5, CPUs=128, heap=160.0GB]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Started 
> exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> crd=false, evt=NODE_JOINED, evtNode=979cf868-1c37-424a-9ad1-12db501f32ef, 
> customEvt=null, allowMerge=true]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionsExchangeFuture]
>  Finish exchange future [startVer=AffinityTopologyVersion [topVer=36, 
> minorTopVer=0], resVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> err=null]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Finished 
> exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> crd=false]
> [15:10:05,703][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion 
> [topVer=36, minorTopVer=0], evt=NODE_JOINED, 
> node=979cf868-1c37-424a-9ad1-12db501f32ef]
> [15:10:08,706][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Cancelled rebalancing from all nodes [topology=AffinityTopologyVersion 
> [topVer=35, minorTopVer=0]]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Rebalancing scheduled [order=[statementp]]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Rebalancing started [top=null, evt=NODE_JOINED, 
> node=a8be3c14-9add-48c3-b099-3fd304cfdbf4]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=2f6bde48-ffb5-4815-bd32-df4e57dc13e0, partitionsCount=18, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=35d01141-4dce-47dd-adf6-a4f3b2bb9da9, partitionsCount=15, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=b3a8be53-e61f-4023-a906-a265923837ba, partitionsCount=15, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=f825cb4e-7dcc-405f-a40d-c1dc1a3ade5a, partitionsCount=12, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=4ae1db91-8b88-4180-a84b-127a303959e9, partitionsCount=11, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=7c286481-7638-49e4-8c68-fa6aa65d8b76, partitionsCount=18, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> so in clusters with a big amount of data and the frequent client left/join 
> events this means that a new server will never receive its partitions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-7165) Re-balancing is cancelled if client node joins

2018-08-16 Thread Dmitry Sherstobitov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16582617#comment-16582617
 ] 

Dmitry Sherstobitov commented on IGNITE-7165:
-

It looks like we have an old issue with a hanging rebalance, and the IGNITE-7165 pull request increases the chance of hitting it.

Adding Thread.sleep(200) to this method works around the problem:
{code:java}
// GridCachePartitionExchangeManager.java (snippet, with the experimental delay added)

private void processFullPartitionUpdate(ClusterNode node, GridDhtPartitionsFullMessage msg) {
    if (!enterBusy())
        return;

    try {
        if (msg.exchangeId() == null) {
            if (log.isDebugEnabled())
                log.debug("Received full partition update [node=" + node.id() + ", msg=" + msg + ']');

            boolean updated = false;

            for (Map.Entry entry : msg.partitions().entrySet()) {
                // Experimental delay: with it the rebalance hang is no longer reproduced.
                try {
                    Thread.sleep(200);
                }
                catch (InterruptedException e) {
                    e.printStackTrace();
                }
{code}
I've recently tested 137dd06aaee9cc84104e6b4bf48306b050341e3a plus this change in my test environment, and it passed.

> Re-balancing is cancelled if client node joins
> --
>
> Key: IGNITE-7165
> URL: https://issues.apache.org/jira/browse/IGNITE-7165
> Project: Ignite
>  Issue Type: Bug
>Reporter: Mikhail Cherkasov
>Assignee: Maxim Muzafarov
>Priority: Critical
>  Labels: rebalance
> Fix For: 2.7
>
> Attachments: node-2-jstack.log, node-NO_REBALANCE-7165.log
>
>
> Re-balancing is canceled if client node joins. Re-balancing can take hours 
> and each time when client node joins it starts again:
> [15:10:05,700][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager]
>  Added new node to topology: TcpDiscoveryNode 
> [id=979cf868-1c37-424a-9ad1-12db501f32ef, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1, 
> 172.31.16.213], sockAddrs=[/0:0:0:0:0:0:0:1:0, /127.0.0.1:0, 
> /172.31.16.213:0], discPort=0, order=36, intOrder=24, 
> lastExchangeTime=1512907805688, loc=false, ver=2.3.1#20171129-sha1:4b1ec0fe, 
> isClient=true]
> [15:10:05,701][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager]
>  Topology snapshot [ver=36, servers=7, clients=5, CPUs=128, heap=160.0GB]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Started 
> exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> crd=false, evt=NODE_JOINED, evtNode=979cf868-1c37-424a-9ad1-12db501f32ef, 
> customEvt=null, allowMerge=true]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionsExchangeFuture]
>  Finish exchange future [startVer=AffinityTopologyVersion [topVer=36, 
> minorTopVer=0], resVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> err=null]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Finished 
> exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> crd=false]
> [15:10:05,703][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion 
> [topVer=36, minorTopVer=0], evt=NODE_JOINED, 
> node=979cf868-1c37-424a-9ad1-12db501f32ef]
> [15:10:08,706][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Cancelled rebalancing from all nodes [topology=AffinityTopologyVersion 
> [topVer=35, minorTopVer=0]]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Rebalancing scheduled [order=[statementp]]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Rebalancing started [top=null, evt=NODE_JOINED, 
> node=a8be3c14-9add-48c3-b099-3fd304cfdbf4]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=2f6bde48-ffb5-4815-bd32-df4e57dc13e0, partitionsCount=18, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=35d01141-4dce-47dd-adf6-a4f3b2bb9da9, partitionsCount=15, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=b3a8be53-e61f-4023-a906-a265923837ba, partitionsCount=15, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=f825cb4e-7dcc-405f-a40d-c1dc1a3ade5a, partitionsCount=12, 
> topology=AffinityTopologyVersio

[jira] [Commented] (IGNITE-7165) Re-balancing is cancelled if client node joins

2018-08-16 Thread Dmitry Sherstobitov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16582459#comment-16582459
 ] 

Dmitry Sherstobitov commented on IGNITE-7165:
-

[~avinogradov]

Yes.

Two builds were under test, both from apache-ignite master (revisions 137dd06aaee9cc84104e6b4bf48306b050341e3a and f6f731f575290b10d6d6bcb6869bb0a1b470455e).

> Re-balancing is cancelled if client node joins
> --
>
> Key: IGNITE-7165
> URL: https://issues.apache.org/jira/browse/IGNITE-7165
> Project: Ignite
>  Issue Type: Bug
>Reporter: Mikhail Cherkasov
>Assignee: Maxim Muzafarov
>Priority: Critical
>  Labels: rebalance
> Fix For: 2.7
>
> Attachments: node-2-jstack.log, node-NO_REBALANCE-7165.log
>
>
> Re-balancing is canceled if client node joins. Re-balancing can take hours 
> and each time when client node joins it starts again:
> [15:10:05,700][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager]
>  Added new node to topology: TcpDiscoveryNode 
> [id=979cf868-1c37-424a-9ad1-12db501f32ef, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1, 
> 172.31.16.213], sockAddrs=[/0:0:0:0:0:0:0:1:0, /127.0.0.1:0, 
> /172.31.16.213:0], discPort=0, order=36, intOrder=24, 
> lastExchangeTime=1512907805688, loc=false, ver=2.3.1#20171129-sha1:4b1ec0fe, 
> isClient=true]
> [15:10:05,701][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager]
>  Topology snapshot [ver=36, servers=7, clients=5, CPUs=128, heap=160.0GB]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Started 
> exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> crd=false, evt=NODE_JOINED, evtNode=979cf868-1c37-424a-9ad1-12db501f32ef, 
> customEvt=null, allowMerge=true]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionsExchangeFuture]
>  Finish exchange future [startVer=AffinityTopologyVersion [topVer=36, 
> minorTopVer=0], resVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> err=null]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Finished 
> exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> crd=false]
> [15:10:05,703][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion 
> [topVer=36, minorTopVer=0], evt=NODE_JOINED, 
> node=979cf868-1c37-424a-9ad1-12db501f32ef]
> [15:10:08,706][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Cancelled rebalancing from all nodes [topology=AffinityTopologyVersion 
> [topVer=35, minorTopVer=0]]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Rebalancing scheduled [order=[statementp]]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Rebalancing started [top=null, evt=NODE_JOINED, 
> node=a8be3c14-9add-48c3-b099-3fd304cfdbf4]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=2f6bde48-ffb5-4815-bd32-df4e57dc13e0, partitionsCount=18, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=35d01141-4dce-47dd-adf6-a4f3b2bb9da9, partitionsCount=15, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=b3a8be53-e61f-4023-a906-a265923837ba, partitionsCount=15, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=f825cb4e-7dcc-405f-a40d-c1dc1a3ade5a, partitionsCount=12, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=4ae1db91-8b88-4180-a84b-127a303959e9, partitionsCount=11, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=7c286481-7638-49e4-8c68-fa6aa65d8b76, partitionsCount=18, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> so in clusters with a big amount of data and the frequent client left/join 
> events this means that a new server will never receive its partition

[jira] [Commented] (IGNITE-7165) Re-balancing is cancelled if client node joins

2018-08-16 Thread Dmitry Sherstobitov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16582280#comment-16582280
 ] 

Dmitry Sherstobitov commented on IGNITE-7165:
-

[~avinogradov]
1) I'm trying to narrow this problem down by writing a JUnit reproducer.
2) I've tested with and without this pull request, and the problem is definitely introduced by this commit.
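
To make the scenario concrete, here is a minimal single-JVM sketch of the reproduction loop using only the public Ignition API (a hypothetical outline, not the attached jUnit test; instance names and the iteration count are placeholders):
{code:java}
// Hypothetical outline of the reproduction loop (public Ignition API only).
// Instance names and the iteration count are placeholders; this is not the attached jUnit test.
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class RebalanceCancelScenario {
    public static void main(String[] args) {
        // One embedded server node; the real scenario runs 4 server nodes as separate processes.
        Ignite server = Ignition.start(new IgniteConfiguration().setIgniteInstanceName("server-0"));

        for (int i = 0; i < 10; i++) {
            // In the real test, one server node is killed here, its LFS is cleaned and it is
            // restarted via ignite.sh, which triggers rebalancing towards the returning node.

            // Then a client node joins and immediately leaves - this is what cancels rebalancing.
            IgniteConfiguration clientCfg = new IgniteConfiguration()
                .setIgniteInstanceName("client-" + i)
                .setClientMode(true);

            try (Ignite client = Ignition.start(clientCfg)) {
                // No cache operations: a bare join/leave of the client is enough.
            }
        }

        server.close();
    }
}
{code}
The real environment starts every server node as a separate process via ignite.sh, which this single-JVM sketch intentionally does not capture.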

> Re-balancing is cancelled if client node joins
> --
>
> Key: IGNITE-7165
> URL: https://issues.apache.org/jira/browse/IGNITE-7165
> Project: Ignite
>  Issue Type: Bug
>Reporter: Mikhail Cherkasov
>Assignee: Maxim Muzafarov
>Priority: Critical
>  Labels: rebalance
> Fix For: 2.7
>
> Attachments: node-2-jstack.log, node-NO_REBALANCE-7165.log
>
>
> Re-balancing is canceled if client node joins. Re-balancing can take hours 
> and each time when client node joins it starts again:
> [15:10:05,700][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager]
>  Added new node to topology: TcpDiscoveryNode 
> [id=979cf868-1c37-424a-9ad1-12db501f32ef, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1, 
> 172.31.16.213], sockAddrs=[/0:0:0:0:0:0:0:1:0, /127.0.0.1:0, 
> /172.31.16.213:0], discPort=0, order=36, intOrder=24, 
> lastExchangeTime=1512907805688, loc=false, ver=2.3.1#20171129-sha1:4b1ec0fe, 
> isClient=true]
> [15:10:05,701][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager]
>  Topology snapshot [ver=36, servers=7, clients=5, CPUs=128, heap=160.0GB]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Started 
> exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> crd=false, evt=NODE_JOINED, evtNode=979cf868-1c37-424a-9ad1-12db501f32ef, 
> customEvt=null, allowMerge=true]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionsExchangeFuture]
>  Finish exchange future [startVer=AffinityTopologyVersion [topVer=36, 
> minorTopVer=0], resVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> err=null]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Finished 
> exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> crd=false]
> [15:10:05,703][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion 
> [topVer=36, minorTopVer=0], evt=NODE_JOINED, 
> node=979cf868-1c37-424a-9ad1-12db501f32ef]
> [15:10:08,706][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Cancelled rebalancing from all nodes [topology=AffinityTopologyVersion 
> [topVer=35, minorTopVer=0]]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Rebalancing scheduled [order=[statementp]]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Rebalancing started [top=null, evt=NODE_JOINED, 
> node=a8be3c14-9add-48c3-b099-3fd304cfdbf4]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=2f6bde48-ffb5-4815-bd32-df4e57dc13e0, partitionsCount=18, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=35d01141-4dce-47dd-adf6-a4f3b2bb9da9, partitionsCount=15, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=b3a8be53-e61f-4023-a906-a265923837ba, partitionsCount=15, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=f825cb4e-7dcc-405f-a40d-c1dc1a3ade5a, partitionsCount=12, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=4ae1db91-8b88-4180-a84b-127a303959e9, partitionsCount=11, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=7c286481-7638-49e4-8c68-fa6aa65d8b76, partitionsCount=18, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> so in clusters with a big amount of data and the frequent client left/join 
> events this means that a new server will never receive its part

[jira] [Comment Edited] (IGNITE-7165) Re-balancing is cancelled if client node joins

2018-08-15 Thread Dmitry Sherstobitov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16581164#comment-16581164
 ] 

Dmitry Sherstobitov edited comment on IGNITE-7165 at 8/15/18 2:44 PM:
--

The following set of caches leads to the bug in my test :) None of these caches is able to change its JMX properties after clients connect/disconnect.

I'm still trying to reduce this list, but for now this is the final set.
 
{code:xml}
<!-- The full list of cache configurations is not preserved in this archive. -->
{code}
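
Since the XML itself did not survive, here is a hypothetical Java equivalent of a single entry of that kind (the cache name, group name and backups value are illustrative placeholders, not the original settings):
{code:java}
// Hypothetical reconstruction of one cache definition of the kind listed above.
// Names and numbers are placeholders; the original XML is not preserved.
import org.apache.ignite.cache.CacheAtomicityMode;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.configuration.CacheConfiguration;

public class ExampleCacheConfig {
    public static CacheConfiguration<Integer, Integer> cacheGroup1_028() {
        return new CacheConfiguration<Integer, Integer>("cache_group_1_028")
            .setGroupName("cache_group_1")                      // caches in one group share partitions
            .setCacheMode(CacheMode.PARTITIONED)
            .setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL)
            .setBackups(2);
    }
}
{code}
Caches that share a group name share partition state, which is why LocalNodeMovingPartitionsCount is reported per cache group.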

This is the log from the test framework's output:
{code}
Current metric state for cache cache_group_1_028 on node 2: 19
Current metric state for cache cache_group_2_058 on node 2: 32
Current metric state for cache cache_group_5 on node 2: 128
Current metric state for cache cache_group_5 on node 2: 128
Current metric state for cache cache_group_4 on node 2: 512
Current metric state for cache cache_group_4_118 on node 2: 32
Current metric state for cache cache_group_6 on node 2: 64
Current metric state for cache cache_group_2_031 on node 2: 512
Current metric state for cache cache_group_6 on node 2: 64
[17:43:27][:568 :617] Wait rebalance to finish 7/240
Current metric state for cache cache_group_1_028 on node 2: 19
Current metric state for cache cache_group_2_058 on node 2: 32
Current metric state for cache cache_group_5 on node 2: 128
Current metric state for cache cache_group_5 on node 2: 128
Current metric state for cache cache_group_4 on node 2: 512
Current metric state for cache cache_group_4_118 on node 2: 32
Current metric state for cache cache_group_6 on node 2: 64
Current metric state for cache cache_group_2_031 on node 2: 512
Current metric state for cache cache_group_6 on node 2: 64
{code}


was (Author: qvad):
Following set of caches leads to bug in my test :) All this caches unable to 
change their JMX properties after clients connect/disconnect.

I'm still trying to reduce this list, but for now this is final set
 
{code:xml}
<!-- The full list of cache configurations is not preserved in this archive. -->
{code}

> Re-balancing is cancelled if client node joins
> --
>
> Key: IGNITE-7165
> URL: https://issues.apache.org/jira/browse/IGNITE-7165
> Project: Ignite
>  Issue Type: Bug
>Reporter: Mikhail Cherkasov
>Assignee: Maxim Muzafarov
>Priority: Critical
>  Labels: rebalance
> Fix For: 2.7
>
> Attachments: node-2-jstack.log, node-NO_REBALANCE-7165.log
>
>
> Re-balancing is canceled if client node joins. Re-balancing can take hours 
> and each time when client node joins it starts again:
> [15:10:05,700][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager]
>  Added new node to topology: TcpDiscoveryNode 
> [id=979cf868-1c37-424a-9ad1-12db501f32ef, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1, 
> 172.31.16.213], sockAddrs=[/0:0:0:0:0:0:0:1:0, /127.0.0.1:0, 
> /172.31.16.213:0], discPort=0, order=36,

[jira] [Comment Edited] (IGNITE-7165) Re-balancing is cancelled if client node joins

2018-08-15 Thread Dmitry Sherstobitov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16581164#comment-16581164
 ] 

Dmitry Sherstobitov edited comment on IGNITE-7165 at 8/15/18 2:42 PM:
--

The following set of caches leads to the bug in my test :) None of these caches is able to change its JMX properties after clients connect/disconnect.

I'm still trying to reduce this list, but for now this is the final set.
 
{code:xml}
<!-- The full list of cache configurations is not preserved in this archive. -->
{code}


was (Author: qvad):
Following set of caches leads to bug in my test :)

I'm still trying to reduce this list, but for now this is final set
 
{code:xml}
<!-- The full list of cache configurations is not preserved in this archive. -->
{code}

> Re-balancing is cancelled if client node joins
> --
>
> Key: IGNITE-7165
> URL: https://issues.apache.org/jira/browse/IGNITE-7165
> Project: Ignite
>  Issue Type: Bug
>Reporter: Mikhail Cherkasov
>Assignee: Maxim Muzafarov
>Priority: Critical
>  Labels: rebalance
> Fix For: 2.7
>
> Attachments: node-2-jstack.log, node-NO_REBALANCE-7165.log
>
>
> Re-balancing is canceled if client node joins. Re-balancing can take hours 
> and each time when client node joins it starts again:
> [15:10:05,700][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager]
>  Added new node to topology: TcpDiscoveryNode 
> [id=979cf868-1c37-424a-9ad1-12db501f32ef, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1, 
> 172.31.16.213], sockAddrs=[/0:0:0:0:0:0:0:1:0, /127.0.0.1:0, 
> /172.31.16.213:0], discPort=0, order=36, intOrder=24, 
> lastExchangeTime=1512907805688, loc=false, ver=2.3.1#20171129-sha1:4b1ec0fe, 
> isClient=true]
> [15:10:05,701][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager]
>  Topology snapshot [ver=36, servers=7, clients=5, CPUs=128, heap=160.0GB]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Started 
> exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> crd=false, evt=NODE_JOINED, evtNode=979cf868-1c37-424a-9ad1-12db501f32ef, 
> customEvt=null, allowMerge=true]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionsExchangeFuture]
>  Finish exchange future [startVer=AffinityTopologyVersion [topVer=36, 
> minorTopVer=0], resVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> err=null]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Finished 
> exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> crd=false]
> [15:10:05,703][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion 
> [topVer=36, minorTopVer=0], evt=NODE_JOINED, 
> node=979cf868-1c37-424a-9ad1-12db501f32ef]
> [15:10:08,706][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Cancelled rebalancing from all nodes [topology=AffinityTopologyVersion 
> [topVer=35, minorTopVer=0]]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  R

[jira] [Commented] (IGNITE-7165) Re-balancing is cancelled if client node joins

2018-08-15 Thread Dmitry Sherstobitov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16581164#comment-16581164
 ] 

Dmitry Sherstobitov commented on IGNITE-7165:
-

The following set of caches leads to the bug in my test :)

I'm still trying to reduce this list, but for now this is the final set.
 
{code:xml}
<!-- The full list of cache configurations is not preserved in this archive. -->
{code}

> Re-balancing is cancelled if client node joins
> --
>
> Key: IGNITE-7165
> URL: https://issues.apache.org/jira/browse/IGNITE-7165
> Project: Ignite
>  Issue Type: Bug
>Reporter: Mikhail Cherkasov
>Assignee: Maxim Muzafarov
>Priority: Critical
>  Labels: rebalance
> Fix For: 2.7
>
> Attachments: node-2-jstack.log, node-NO_REBALANCE-7165.log
>
>
> Re-balancing is canceled if client node joins. Re-balancing can take hours 
> and each time when client node joins it starts again:
> [15:10:05,700][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager]
>  Added new node to topology: TcpDiscoveryNode 
> [id=979cf868-1c37-424a-9ad1-12db501f32ef, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1, 
> 172.31.16.213], sockAddrs=[/0:0:0:0:0:0:0:1:0, /127.0.0.1:0, 
> /172.31.16.213:0], discPort=0, order=36, intOrder=24, 
> lastExchangeTime=1512907805688, loc=false, ver=2.3.1#20171129-sha1:4b1ec0fe, 
> isClient=true]
> [15:10:05,701][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager]
>  Topology snapshot [ver=36, servers=7, clients=5, CPUs=128, heap=160.0GB]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Started 
> exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> crd=false, evt=NODE_JOINED, evtNode=979cf868-1c37-424a-9ad1-12db501f32ef, 
> customEvt=null, allowMerge=true]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionsExchangeFuture]
>  Finish exchange future [startVer=AffinityTopologyVersion [topVer=36, 
> minorTopVer=0], resVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> err=null]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Finished 
> exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> crd=false]
> [15:10:05,703][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion 
> [topVer=36, minorTopVer=0], evt=NODE_JOINED, 
> node=979cf868-1c37-424a-9ad1-12db501f32ef]
> [15:10:08,706][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Cancelled rebalancing from all nodes [topology=AffinityTopologyVersion 
> [topVer=35, minorTopVer=0]]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Rebalancing scheduled [order=[statementp]]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Rebalancing started [top=null, evt=NODE_JOINED, 
> node=a8be3c14-9add-48c3-b099-3fd304cfdbf4]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=2f6bde48-ffb5-4815-bd32-df4e57dc13e0, partitionsCount=18, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=35d01141-4dce-47dd-adf6-a4f3b2bb9da9, partitionsCount=15, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=b3a8be53-e61f-4023-a906-a265923837ba, partitionsCount=15, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=f825cb4e-7dcc-405f-a40d-c1dc1a3ade5a, partitionsCount=12, 
> topology=AffinityTopologyVersion

[jira] [Commented] (IGNITE-7165) Re-balancing is cancelled if client node joins

2018-08-15 Thread Dmitry Sherstobitov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16581079#comment-16581079
 ] 

Dmitry Sherstobitov commented on IGNITE-7165:
-

[~Mmuzaf]

I'm looking into this issue too. I have some additional information: cache_group_1_028 is not the only cache in the test, and there is some dependency on the number of caches and/or their configs. I will provide more information when I get some results.

> Re-balancing is cancelled if client node joins
> --
>
> Key: IGNITE-7165
> URL: https://issues.apache.org/jira/browse/IGNITE-7165
> Project: Ignite
>  Issue Type: Bug
>Reporter: Mikhail Cherkasov
>Assignee: Maxim Muzafarov
>Priority: Critical
>  Labels: rebalance
> Fix For: 2.7
>
> Attachments: node-2-jstack.log, node-NO_REBALANCE-7165.log
>
>
> Re-balancing is canceled if client node joins. Re-balancing can take hours 
> and each time when client node joins it starts again:
> [15:10:05,700][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager]
>  Added new node to topology: TcpDiscoveryNode 
> [id=979cf868-1c37-424a-9ad1-12db501f32ef, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1, 
> 172.31.16.213], sockAddrs=[/0:0:0:0:0:0:0:1:0, /127.0.0.1:0, 
> /172.31.16.213:0], discPort=0, order=36, intOrder=24, 
> lastExchangeTime=1512907805688, loc=false, ver=2.3.1#20171129-sha1:4b1ec0fe, 
> isClient=true]
> [15:10:05,701][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager]
>  Topology snapshot [ver=36, servers=7, clients=5, CPUs=128, heap=160.0GB]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Started 
> exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> crd=false, evt=NODE_JOINED, evtNode=979cf868-1c37-424a-9ad1-12db501f32ef, 
> customEvt=null, allowMerge=true]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionsExchangeFuture]
>  Finish exchange future [startVer=AffinityTopologyVersion [topVer=36, 
> minorTopVer=0], resVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> err=null]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Finished 
> exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> crd=false]
> [15:10:05,703][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion 
> [topVer=36, minorTopVer=0], evt=NODE_JOINED, 
> node=979cf868-1c37-424a-9ad1-12db501f32ef]
> [15:10:08,706][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Cancelled rebalancing from all nodes [topology=AffinityTopologyVersion 
> [topVer=35, minorTopVer=0]]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Rebalancing scheduled [order=[statementp]]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Rebalancing started [top=null, evt=NODE_JOINED, 
> node=a8be3c14-9add-48c3-b099-3fd304cfdbf4]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=2f6bde48-ffb5-4815-bd32-df4e57dc13e0, partitionsCount=18, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=35d01141-4dce-47dd-adf6-a4f3b2bb9da9, partitionsCount=15, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=b3a8be53-e61f-4023-a906-a265923837ba, partitionsCount=15, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=f825cb4e-7dcc-405f-a40d-c1dc1a3ade5a, partitionsCount=12, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=4ae1db91-8b88-4180-a84b-127a303959e9, partitionsCount=11, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=7c286481-7638-49e4-8c68-fa6aa65d8b76, partitionsCount=18, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> so in clusters with a big amount of data and the frequent 

[jira] [Comment Edited] (IGNITE-7165) Re-balancing is cancelled if client node joins

2018-08-15 Thread Dmitry Sherstobitov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580970#comment-16580970
 ] 

Dmitry Sherstobitov edited comment on IGNITE-7165 at 8/15/18 12:15 PM:
---

[~Mmuzaf]

Config:
(UPD: Disabling persistence solves the problem; jstack is attached as [^node-2-jstack.log])
{code:java}
// Node configuration not preserved in this archive.
{code}
Test code:
{code:python}
def test_blinking_clients_clean_lfs(self):
    """
    IGNITE-7165
    """
    self.wait_for_running_clients_num(client_num=0, timeout=120)

    self.start_grid()  # start 4 nodes

    for _ in range(0, 10):
        log_print("Iteration %s" % str(_))

        self.assert_nodes_alive()  # check that no nodes left grid because of FailHandler

        self.ignite.kill_node(2)
        self._cleanup_lfs(2)
        self.ignite.start_node(2)

        # start Ignition.start() with client config and do nothing 3 times
        with PiClient(self.ignite, self.get_client_config()):
            pass

        with PiClient(self.ignite, self.get_client_config()):
            pass

        with PiClient(self.ignite, self.get_client_config()):
            pass

        # check LocalNodeMovingPartitionsCount metric for all cache groups in cluster
        # wait that for all cache groups this value will be 0
        self.wait_for_finish_rebalance()
{code}
self.start_grid() starts a real grid on distributed servers using the ignite.sh scripts. The "with PiClient" block starts a JVM and runs Ignition.start() with the client config (the major difference from the server config is clientMode=true).

The log file of this test contains the following information: the metric does not change its state within 240 seconds on current master. (I've recently checked this on the 15 Aug nightly build.)
{code:java}
Current metric state for cache cache_group_1_028 on node 2: 19
[14:44:58][:568 :617] Wait rebalance to finish 7/240
Current metric state for cache cache_group_1_028 on node 2: 19
[14:45:04][:568 :617] Wait rebalance to finish 13/240
Current metric state for cache cache_group_1_028 on node 2: 19

[14:48:47][:568 :617] Wait rebalance to finish 236/240
Current metric state for cache cache_group_1_028 on node 2: 19{code}
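
For reference, wait_for_finish_rebalance() polls this LocalNodeMovingPartitionsCount metric over JMX until every cache group reports 0. A minimal Java sketch of that kind of polling (the JMX service URL, the 240-second timeout and the attribute-based bean discovery are illustrative assumptions, not the actual framework code):
{code:java}
// Hypothetical sketch: poll LocalNodeMovingPartitionsCount over JMX until every
// cache group reports 0. The JMX URL and timeout are placeholders.
import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class WaitForRebalance {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://127.0.0.1:49112/jmxrmi");
        MBeanServerConnection con = JMXConnectorFactory.connect(url).getMBeanServerConnection();

        long deadline = System.currentTimeMillis() + 240_000;

        while (System.currentTimeMillis() < deadline) {
            long moving = 0;

            // Scan all registered beans and sum the attribute wherever it is exposed.
            Set<ObjectName> names = con.queryNames(null, null);

            for (ObjectName name : names) {
                try {
                    moving += ((Number)con.getAttribute(name, "LocalNodeMovingPartitionsCount")).longValue();
                }
                catch (Exception ignored) {
                    // Bean does not expose this attribute - not a cache group bean.
                }
            }

            if (moving == 0)
                return; // rebalance finished

            Thread.sleep(5_000);
        }

        throw new IllegalStateException("Rebalance did not finish within 240 seconds");
    }
}
{code}
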
Config of the cache that fails:
{code:xml}
<!-- Cache configuration XML not preserved in this archive. -->
{code}
I'm afraid that this is all the information I can provide for now. I've attached the jstack from node 2: [^node-2-jstack.log]


was (Author: qvad):
[~Mmuzaf]

Config:
{code:java}
// Node configuration not preserved in this archive.
{code}
Test code:
{code}
def test_blinking_clients_clean_lfs(self):
"""
IGNITE-7165
"""
self.wait_for_running_clients_num(client_num=0, timeout=120)

self.start_grid() # start 4 nodes

for _ in range(0, 10):
log_print("Iteration %s" % str(_))

self.assert_nodes_alive() # check that no nodes left grid because of 
FailHandler

self.ignite.kill_node(2)
self._cleanup_lfs(2)
self.ignite.start_node(2)

# start Ignition.start() with client config and do nothing 3 times
with PiClient(self.ignite, self.get_client_config()):
pass 

with PiClient(self.ignite, self.get_client_config()):
pass

with PiClient(self.ignite, self.get_client_config()):
pass

# check LocalNodeMovingPartitionsCount metric for all cache groups in 
cluster
# wait that for all cache groups this value will be 0
self.wait_for_finish_rebalance(){code}
self.start_grid() start real grid on distributed servers using ignite.sh 
scripts. with PiClient block start JVM and runs Ignition.start() with client 
config (major difference with server config is clientMode=true)

Log file of this test contains following information: metric dos not change 
their state in 240 seconds in current master. (I've recently check this on 15 
Aug nightly build)
{code}
Current metric state for cache cache_group_1_028 on node 2: 19
[14:44:58][:568 :617] Wait rebalance to finish 7/240
Current metric state for cache cache_group_1_028 on node 2: 19
[14:45:04][:568 :617] Wait rebalance to finish 13/240
Current metric state for cache cache_group_1_028 on node 2: 19

[14:48:47][:568 :617] Wait rebalance to finish 236/240
Current metric state for cache cache_group_1_028 on node 2: 19{code}
Config of the cache that fails:
{code:xml}
<!-- Cache configuration XML not preserved in this archive. -->
{code}
I'm afraid that this all information that I can provide for you for now. I've 
attached jstack from node2: [^node-2-jstack.log]^[^node-2-js

[jira] [Comment Edited] (IGNITE-7165) Re-balancing is cancelled if client node joins

2018-08-15 Thread Dmitry Sherstobitov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580970#comment-16580970
 ] 

Dmitry Sherstobitov edited comment on IGNITE-7165 at 8/15/18 12:15 PM:
---

[~Mmuzaf]

Config:
 (UPD: Disabling persistence solves the problem)
{code:java}
// Node configuration not preserved in this archive.
{code}
Test code:
{code:python}
def test_blinking_clients_clean_lfs(self):
    """
    IGNITE-7165
    """
    self.wait_for_running_clients_num(client_num=0, timeout=120)

    self.start_grid()  # start 4 nodes

    for _ in range(0, 10):
        log_print("Iteration %s" % str(_))

        self.assert_nodes_alive()  # check that no nodes left grid because of FailHandler

        self.ignite.kill_node(2)
        self._cleanup_lfs(2)
        self.ignite.start_node(2)

        # start Ignition.start() with client config and do nothing 3 times
        with PiClient(self.ignite, self.get_client_config()):
            pass

        with PiClient(self.ignite, self.get_client_config()):
            pass

        with PiClient(self.ignite, self.get_client_config()):
            pass

        # check LocalNodeMovingPartitionsCount metric for all cache groups in cluster
        # wait that for all cache groups this value will be 0
        self.wait_for_finish_rebalance()
{code}
self.start_grid() starts a real grid on distributed servers using the ignite.sh scripts. The "with PiClient" block starts a JVM and runs Ignition.start() with the client config (the major difference from the server config is clientMode=true).

The log file of this test contains the following information: the metric does not change its state within 240 seconds on current master. (I've recently checked this on the 15 Aug nightly build.)
{code:java}
Current metric state for cache cache_group_1_028 on node 2: 19
[14:44:58][:568 :617] Wait rebalance to finish 7/240
Current metric state for cache cache_group_1_028 on node 2: 19
[14:45:04][:568 :617] Wait rebalance to finish 13/240
Current metric state for cache cache_group_1_028 on node 2: 19

[14:48:47][:568 :617] Wait rebalance to finish 236/240
Current metric state for cache cache_group_1_028 on node 2: 19{code}
Config of the cache that fails:
{code:xml}
<!-- Cache configuration XML not preserved in this archive. -->
{code}
I'm afraid that this is all the information I can provide for now. I've attached the jstack from node 2: [^node-2-jstack.log]


was (Author: qvad):
[~Mmuzaf]

Config:
([^node-2-jstack.log]PD: Disabling persistance solves the problem)
{code:java}
// Node configuration not preserved in this archive.
{code}
Test code:
{code:java}
def test_blinking_clients_clean_lfs(self):
"""
IGNITE-7165
"""
self.wait_for_running_clients_num(client_num=0, timeout=120)

self.start_grid() # start 4 nodes

for _ in range(0, 10):
log_print("Iteration %s" % str(_))

self.assert_nodes_alive() # check that no nodes left grid because of 
FailHandler

self.ignite.kill_node(2)
self._cleanup_lfs(2)
self.ignite.start_node(2)

# start Ignition.start() with client config and do nothing 3 times
with PiClient(self.ignite, self.get_client_config()):
pass 

with PiClient(self.ignite, self.get_client_config()):
pass

with PiClient(self.ignite, self.get_client_config()):
pass

# check LocalNodeMovingPartitionsCount metric for all cache groups in 
cluster
# wait that for all cache groups this value will be 0
self.wait_for_finish_rebalance(){code}
self.start_grid() start real grid on distributed servers using ignite.sh 
scripts. with PiClient block start JVM and runs Ignition.start() with client 
config (major difference with server config is clientMode=true)

Log file of this test contains following information: metric dos not change 
their state in 240 seconds in current master. (I've recently check this on 15 
Aug nightly build)
{code:java}
Current metric state for cache cache_group_1_028 on node 2: 19
[14:44:58][:568 :617] Wait rebalance to finish 7/240
Current metric state for cache cache_group_1_028 on node 2: 19
[14:45:04][:568 :617] Wait rebalance to finish 13/240
Current metric state for cache cache_group_1_028 on node 2: 19

[14:48:47][:568 :617] Wait rebalance to finish 236/240
Current metric state for cache cache_group_1_028 on node 2: 19{code}
Config of the cache that fails:
{code:xml}
<!-- Cache configuration XML not preserved in this archive. -->
{code}
I'm afraid that this all information that I can provide for you for now. I've 
a

[jira] [Comment Edited] (IGNITE-7165) Re-balancing is cancelled if client node joins

2018-08-15 Thread Dmitry Sherstobitov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580970#comment-16580970
 ] 

Dmitry Sherstobitov edited comment on IGNITE-7165 at 8/15/18 12:04 PM:
---

[~Mmuzaf]

Config:
{code:java}
// Node configuration not preserved in this archive.
{code}
Test code:
{code:python}
def test_blinking_clients_clean_lfs(self):
    """
    IGNITE-7165
    """
    self.wait_for_running_clients_num(client_num=0, timeout=120)

    self.start_grid()  # start 4 nodes

    for _ in range(0, 10):
        log_print("Iteration %s" % str(_))

        self.assert_nodes_alive()  # check that no nodes left grid because of FailHandler

        self.ignite.kill_node(2)
        self._cleanup_lfs(2)
        self.ignite.start_node(2)

        # start Ignition.start() with client config and do nothing 3 times
        with PiClient(self.ignite, self.get_client_config()):
            pass

        with PiClient(self.ignite, self.get_client_config()):
            pass

        with PiClient(self.ignite, self.get_client_config()):
            pass

        # check LocalNodeMovingPartitionsCount metric for all cache groups in cluster
        # wait that for all cache groups this value will be 0
        self.wait_for_finish_rebalance()
{code}
self.start_grid() starts a real grid on distributed servers using the ignite.sh scripts. The "with PiClient" block starts a JVM and runs Ignition.start() with the client config (the major difference from the server config is clientMode=true).

The log file of this test contains the following information: the metric does not change its state within 240 seconds on current master. (I've recently checked this on the 15 Aug nightly build.)
{code}
Current metric state for cache cache_group_1_028 on node 2: 19
[14:44:58][:568 :617] Wait rebalance to finish 7/240
Current metric state for cache cache_group_1_028 on node 2: 19
[14:45:04][:568 :617] Wait rebalance to finish 13/240
Current metric state for cache cache_group_1_028 on node 2: 19

[14:48:47][:568 :617] Wait rebalance to finish 236/240
Current metric state for cache cache_group_1_028 on node 2: 19{code}
Config of the cache that fails:
{code:xml}
<!-- Cache configuration XML not preserved in this archive. -->
{code}
I'm afraid that this is all the information I can provide for now. I've attached the jstack from node 2: [^node-2-jstack.log]


was (Author: qvad):
[~Mmuzaf]

Config:
{code:java}
// Node configuration not preserved in this archive.
{code}

Test code:
{code:python}
def test_blinking_clients_clean_lfs(self):
"""
IGNITE-7165
"""
self.wait_for_running_clients_num(client_num=0, timeout=120)

self.start_grid() # start 4 nodes

for _ in range(0, 10):
log_print("Iteration %s" % str(_))

self.assert_nodes_alive() # check that no nodes left grid because of 
FailHandler

self.ignite.kill_node(2)
self._cleanup_lfs(2)
self.ignite.start_node(2)

# start Ignition.start() with client config and do nothing 3 times
with PiClient(self.ignite, self.get_client_config()):
pass 

with PiClient(self.ignite, self.get_client_config()):
pass

with PiClient(self.ignite, self.get_client_config()):
pass

# check LocalNodeMovingPartitionsCount metric for all cache groups in 
cluster
# wait that for all cache groups this value will be 0
self.wait_for_finish_rebalance(){code}
Here is code from our test on python. self.start_grid() start real grid on 
distributed servers using ignite.sh scripts. with PiClient block start JVM and 
runs Ignition.start() with client config (major difference with server config 
is clientMode=true)

Log file of this test contains following information: metric dos not change 
their state in 240 seconds in current master. (I've recently check this on 15 
Aug nightly build)
{code:sh}
Current metric state for cache cache_group_1_028 on node 2: 19
[14:44:58][:568 :617] Wait rebalance to finish 7/240
Current metric state for cache cache_group_1_028 on node 2: 19
[14:45:04][:568 :617] Wait rebalance to finish 13/240
Current metric state for cache cache_group_1_028 on node 2: 19

[14:48:47][:568 :617] Wait rebalance to finish 236/240
Current metric state for cache cache_group_1_028 on node 2: 19{code}
Config of the cache that fails:
{code:xml}
<!-- Cache configuration XML not preserved in this archive. -->
{code}
I'm afraid that this all information that I can provide for you for now. I've 
attached jstack from node2: [^node-2-jstack.log]

> Re-balancing is cancelled if client

[jira] [Comment Edited] (IGNITE-7165) Re-balancing is cancelled if client node joins

2018-08-15 Thread Dmitry Sherstobitov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580970#comment-16580970
 ] 

Dmitry Sherstobitov edited comment on IGNITE-7165 at 8/15/18 12:03 PM:
---

[~Mmuzaf]

Config:
{code:java}
// Node configuration not preserved in this archive.
{code}

Test code:
{code:python}
def test_blinking_clients_clean_lfs(self):
    """
    IGNITE-7165
    """
    self.wait_for_running_clients_num(client_num=0, timeout=120)

    self.start_grid()  # start 4 nodes

    for _ in range(0, 10):
        log_print("Iteration %s" % str(_))

        self.assert_nodes_alive()  # check that no nodes left grid because of FailHandler

        self.ignite.kill_node(2)
        self._cleanup_lfs(2)
        self.ignite.start_node(2)

        # start Ignition.start() with client config and do nothing 3 times
        with PiClient(self.ignite, self.get_client_config()):
            pass

        with PiClient(self.ignite, self.get_client_config()):
            pass

        with PiClient(self.ignite, self.get_client_config()):
            pass

        # check LocalNodeMovingPartitionsCount metric for all cache groups in cluster
        # wait that for all cache groups this value will be 0
        self.wait_for_finish_rebalance()
{code}
Here is the code from our Python test. self.start_grid() starts a real grid on distributed servers using ignite.sh scripts. Each "with PiClient" block starts a JVM and runs Ignition.start() with the client config (the major difference from the server config is clientMode=true).

The log file of this test contains the following information: the metric does not change its state within 240 seconds on current master (I've recently checked this on the 15 Aug nightly build).
{code:sh}
Current metric state for cache cache_group_1_028 on node 2: 19
[14:44:58][:568 :617] Wait rebalance to finish 7/240
Current metric state for cache cache_group_1_028 on node 2: 19
[14:45:04][:568 :617] Wait rebalance to finish 13/240
Current metric state for cache cache_group_1_028 on node 2: 19

[14:48:47][:568 :617] Wait rebalance to finish 236/240
Current metric state for cache cache_group_1_028 on node 2: 19{code}
Config of the cache that fails:
{code:xml}


















{code}
I'm afraid that's all the information I can provide for now. I've attached a jstack from node 2: [^node-2-jstack.log]


was (Author: qvad):
{code:java}
 {code}
[~Mmuzaf]
{code:java}


























{code}
{code:java}
def test_blinking_clients_clean_lfs(self):
    """
    IGNITE-7165
    """
    self.wait_for_running_clients_num(client_num=0, timeout=120)

    self.start_grid()  # start 4 nodes

    for _ in range(0, 10):
        log_print("Iteration %s" % str(_))

        self.assert_nodes_alive()  # check that no nodes left grid because of FailHandler

        self.ignite.kill_node(2)
        self._cleanup_lfs(2)
        self.ignite.start_node(2)

        # start Ignition.start() with client config and do nothing 3 times
        with PiClient(self.ignite, self.get_client_config()):
            pass

        with PiClient(self.ignite, self.get_client_config()):
            pass

        with PiClient(self.ignite, self.get_client_config()):
            pass

        # check LocalNodeMovingPartitionsCount metric for all cache groups in cluster
        # wait that for all cache groups this value will be 0
        self.wait_for_finish_rebalance(){code}
Here is the code from our Python test. self.start_grid() starts a real grid on distributed servers using ignite.sh scripts. Each "with PiClient" block starts a JVM and runs Ignition.start() with the client config (the major difference from the server config is clientMode=true).

The log file of this test contains the following information: the metric does not change its state within 240 seconds on current master (I've recently checked this on the 15 Aug nightly build).
{code:java}
Current metric state for cache cache_group_1_028 on node 2: 19
[14:44:58][:568 :617] Wait rebalance to finish 7/240
Current metric state for cache cache_group_1_028 on node 2: 19
[14:45:04][:568 :617] Wait rebalance to finish 13/240
Current metric state for cache cache_group_1_028 on node 2: 19

[14:48:47][:568 :617] Wait rebalance to finish 236/240
Current metric state for cache cache_group_1_028 on node 2: 19{code}
Config of the cache that fails:
{code:java}


















{code}
I'm afraid that's all the information I can provide for now. I've attached a jstack from node 2: [^node-2-jstack.log]

> Re-bala

[jira] [Comment Edited] (IGNITE-7165) Re-balancing is cancelled if client node joins

2018-08-15 Thread Dmitry Sherstobitov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580970#comment-16580970
 ] 

Dmitry Sherstobitov edited comment on IGNITE-7165 at 8/15/18 12:02 PM:
---

{code:java}
 {code}
[~Mmuzaf]
{code:java}


























{code}
{code:java}
def test_blinking_clients_clean_lfs(self):
    """
    IGNITE-7165
    """
    self.wait_for_running_clients_num(client_num=0, timeout=120)

    self.start_grid()  # start 4 nodes

    for _ in range(0, 10):
        log_print("Iteration %s" % str(_))

        self.assert_nodes_alive()  # check that no nodes left grid because of FailHandler

        self.ignite.kill_node(2)
        self._cleanup_lfs(2)
        self.ignite.start_node(2)

        # start Ignition.start() with client config and do nothing 3 times
        with PiClient(self.ignite, self.get_client_config()):
            pass

        with PiClient(self.ignite, self.get_client_config()):
            pass

        with PiClient(self.ignite, self.get_client_config()):
            pass

        # check LocalNodeMovingPartitionsCount metric for all cache groups in cluster
        # wait that for all cache groups this value will be 0
        self.wait_for_finish_rebalance(){code}
Here is the code from our Python test. self.start_grid() starts a real grid on distributed servers using ignite.sh scripts. Each "with PiClient" block starts a JVM and runs Ignition.start() with the client config (the major difference from the server config is clientMode=true).

The log file of this test contains the following information: the metric does not change its state within 240 seconds on current master (I've recently checked this on the 15 Aug nightly build).
{code:java}
Current metric state for cache cache_group_1_028 on node 2: 19
[14:44:58][:568 :617] Wait rebalance to finish 7/240
Current metric state for cache cache_group_1_028 on node 2: 19
[14:45:04][:568 :617] Wait rebalance to finish 13/240
Current metric state for cache cache_group_1_028 on node 2: 19

[14:48:47][:568 :617] Wait rebalance to finish 236/240
Current metric state for cache cache_group_1_028 on node 2: 19{code}
Config of the cache that fails:
{code:java}


















{code}
I'm afraid that's all the information I can provide for now. I've attached a jstack from node 2: [^node-2-jstack.log]


was (Author: qvad):
{code:java}


























{code}
{code:java}
def test_blinking_clients_clean_lfs(self):
    """
    IGNITE-7165
    """
    self.wait_for_running_clients_num(client_num=0, timeout=120)

    self.start_grid()  # start 4 nodes

    for _ in range(0, 10):
        log_print("Iteration %s" % str(_))

        self.assert_nodes_alive()  # check that no nodes left grid because of FailHandler

        self.ignite.kill_node(2)
        self._cleanup_lfs(2)
        self.ignite.start_node(2)

        # start Ignition.start() with client config and do nothing 3 times
        with PiClient(self.ignite, self.get_client_config()):
            pass

        with PiClient(self.ignite, self.get_client_config()):
            pass

        with PiClient(self.ignite, self.get_client_config()):
            pass

        # check LocalNodeMovingPartitionsCount metric for all cache groups in cluster
        # wait that for all cache groups this value will be 0
        self.wait_for_finish_rebalance(){code}
Here is the code from our Python test. self.start_grid() starts a real grid on distributed servers using ignite.sh scripts. Each "with PiClient" block starts a JVM and runs Ignition.start() with the client config (the major difference from the server config is clientMode=true).

The log file of this test contains the following information: the metric does not change its state within 240 seconds on current master (I've recently checked this on the 15 Aug nightly build).
{code:java}
Current metric state for cache cache_group_1_028 on node 2: 19
[14:44:58][:568 :617] Wait rebalance to finish 7/240
Current metric state for cache cache_group_1_028 on node 2: 19
[14:45:04][:568 :617] Wait rebalance to finish 13/240
Current metric state for cache cache_group_1_028 on node 2: 19

[14:48:47][:568 :617] Wait rebalance to finish 236/240
Current metric state for cache cache_group_1_028 on node 2: 19{code}
Config of the cache that fails:
{code:java}


















{code}
I'm afraid that's all the information I can provide for now. I've attached a jstack from node 2: [^node-2-jstack.log]

> Re-balancing is cancelled if client n

[jira] [Comment Edited] (IGNITE-7165) Re-balancing is cancelled if client node joins

2018-08-15 Thread Dmitry Sherstobitov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580970#comment-16580970
 ] 

Dmitry Sherstobitov edited comment on IGNITE-7165 at 8/15/18 12:02 PM:
---

{code:java}
 {code}
[~Mmuzaf]
{code:java}


























{code}
{code:java}
def test_blinking_clients_clean_lfs(self):
    """
    IGNITE-7165
    """
    self.wait_for_running_clients_num(client_num=0, timeout=120)

    self.start_grid()  # start 4 nodes

    for _ in range(0, 10):
        log_print("Iteration %s" % str(_))

        self.assert_nodes_alive()  # check that no nodes left grid because of FailHandler

        self.ignite.kill_node(2)
        self._cleanup_lfs(2)
        self.ignite.start_node(2)

        # start Ignition.start() with client config and do nothing 3 times
        with PiClient(self.ignite, self.get_client_config()):
            pass

        with PiClient(self.ignite, self.get_client_config()):
            pass

        with PiClient(self.ignite, self.get_client_config()):
            pass

        # check LocalNodeMovingPartitionsCount metric for all cache groups in cluster
        # wait that for all cache groups this value will be 0
        self.wait_for_finish_rebalance(){code}
Here is the code from our Python test. self.start_grid() starts a real grid on distributed servers using ignite.sh scripts. Each "with PiClient" block starts a JVM and runs Ignition.start() with the client config (the major difference from the server config is clientMode=true).

The log file of this test contains the following information: the metric does not change its state within 240 seconds on current master (I've recently checked this on the 15 Aug nightly build).
{code:java}
Current metric state for cache cache_group_1_028 on node 2: 19
[14:44:58][:568 :617] Wait rebalance to finish 7/240
Current metric state for cache cache_group_1_028 on node 2: 19
[14:45:04][:568 :617] Wait rebalance to finish 13/240
Current metric state for cache cache_group_1_028 on node 2: 19

[14:48:47][:568 :617] Wait rebalance to finish 236/240
Current metric state for cache cache_group_1_028 on node 2: 19{code}
Config of the cache that fails:
{code:java}


















{code}
I'm afraid that's all the information I can provide for now. I've attached a jstack from node 2: [^node-2-jstack.log]


was (Author: qvad):
{code:java}
 {code}
[~Mmuzaf]
{code:java}


























{code}
{code:java}
def test_blinking_clients_clean_lfs(self):
    """
    IGNITE-7165
    """
    self.wait_for_running_clients_num(client_num=0, timeout=120)

    self.start_grid()  # start 4 nodes

    for _ in range(0, 10):
        log_print("Iteration %s" % str(_))

        self.assert_nodes_alive()  # check that no nodes left grid because of FailHandler

        self.ignite.kill_node(2)
        self._cleanup_lfs(2)
        self.ignite.start_node(2)

        # start Ignition.start() with client config and do nothing 3 times
        with PiClient(self.ignite, self.get_client_config()):
            pass

        with PiClient(self.ignite, self.get_client_config()):
            pass

        with PiClient(self.ignite, self.get_client_config()):
            pass

        # check LocalNodeMovingPartitionsCount metric for all cache groups in cluster
        # wait that for all cache groups this value will be 0
        self.wait_for_finish_rebalance(){code}
Here is the code from our Python test. self.start_grid() starts a real grid on distributed servers using ignite.sh scripts. Each "with PiClient" block starts a JVM and runs Ignition.start() with the client config (the major difference from the server config is clientMode=true).

The log file of this test contains the following information: the metric does not change its state within 240 seconds on current master (I've recently checked this on the 15 Aug nightly build).
{code:java}
Current metric state for cache cache_group_1_028 on node 2: 19
[14:44:58][:568 :617] Wait rebalance to finish 7/240
Current metric state for cache cache_group_1_028 on node 2: 19
[14:45:04][:568 :617] Wait rebalance to finish 13/240
Current metric state for cache cache_group_1_028 on node 2: 19

[14:48:47][:568 :617] Wait rebalance to finish 236/240
Current metric state for cache cache_group_1_028 on node 2: 19{code}
Config of the cache that fails:
{code:java}


















{code}
I'm afraid that's all the information I can provide for now. I've attached a jstack from node 2: [^node-2-jstack.log]

> Re-bala

[jira] [Comment Edited] (IGNITE-7165) Re-balancing is cancelled if client node joins

2018-08-15 Thread Dmitry Sherstobitov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580970#comment-16580970
 ] 

Dmitry Sherstobitov edited comment on IGNITE-7165 at 8/15/18 12:01 PM:
---

{code:java}


























{code}
{code:java}
def test_blinking_clients_clean_lfs(self):
    """
    IGNITE-7165
    """
    self.wait_for_running_clients_num(client_num=0, timeout=120)

    self.start_grid()  # start 4 nodes

    for _ in range(0, 10):
        log_print("Iteration %s" % str(_))

        self.assert_nodes_alive()  # check that no nodes left grid because of FailHandler

        self.ignite.kill_node(2)
        self._cleanup_lfs(2)
        self.ignite.start_node(2)

        # start Ignition.start() with client config and do nothing 3 times
        with PiClient(self.ignite, self.get_client_config()):
            pass

        with PiClient(self.ignite, self.get_client_config()):
            pass

        with PiClient(self.ignite, self.get_client_config()):
            pass

        # check LocalNodeMovingPartitionsCount metric for all cache groups in cluster
        # wait that for all cache groups this value will be 0
        self.wait_for_finish_rebalance(){code}
Here is the code from our Python test. self.start_grid() starts a real grid on distributed servers using ignite.sh scripts. Each "with PiClient" block starts a JVM and runs Ignition.start() with the client config (the major difference from the server config is clientMode=true).

The log file of this test contains the following information: the metric does not change its state within 240 seconds on current master (I've recently checked this on the 15 Aug nightly build).
{code:java}
Current metric state for cache cache_group_1_028 on node 2: 19
[14:44:58][:568 :617] Wait rebalance to finish 7/240
Current metric state for cache cache_group_1_028 on node 2: 19
[14:45:04][:568 :617] Wait rebalance to finish 13/240
Current metric state for cache cache_group_1_028 on node 2: 19

[14:48:47][:568 :617] Wait rebalance to finish 236/240
Current metric state for cache cache_group_1_028 on node 2: 19{code}
Config of the cache that fails:
{code:java}


















{code}
I'm afraid that's all the information I can provide for now. I've attached a jstack from node 2: [^node-2-jstack.log]


was (Author: qvad):
{code:java}


























{code}
{code:java}
def test_blinking_clients_clean_lfs(self):
    """
    IGN-9159 (IGNITE-7165)
    """
    self.wait_for_running_clients_num(client_num=0, timeout=120)

    self.start_grid()  # start 4 nodes

    for _ in range(0, 10):
        log_print("Iteration %s" % str(_))

        self.assert_nodes_alive()  # check that no nodes left grid because of FailHandler

        self.ignite.kill_node(2)
        self._cleanup_lfs(2)
        self.ignite.start_node(2)

        # start Ignition.start() with client config and do nothing 3 times
        with PiClient(self.ignite, self.get_client_config()):
            pass

        with PiClient(self.ignite, self.get_client_config()):
            pass

        with PiClient(self.ignite, self.get_client_config()):
            pass

        # check LocalNodeMovingPartitionsCount metric for all cache groups in cluster
        # wait that for all cache groups this value will be 0
        self.wait_for_finish_rebalance(){code}
Here is the code from our Python test. self.start_grid() starts a real grid on distributed servers using ignite.sh scripts. Each "with PiClient" block starts a JVM and runs Ignition.start() with the client config (the major difference from the server config is clientMode=true).

The log file of this test contains the following information: the metric does not change its state within 240 seconds on current master (I've recently checked this on the 15 Aug nightly build).
{code:java}
Current metric state for cache cache_group_1_028 on node 2: 19
[14:44:58][:568 :617] Wait rebalance to finish 7/240
Current metric state for cache cache_group_1_028 on node 2: 19
[14:45:04][:568 :617] Wait rebalance to finish 13/240
Current metric state for cache cache_group_1_028 on node 2: 19

[14:48:47][:568 :617] Wait rebalance to finish 236/240
Current metric state for cache cache_group_1_028 on node 2: 19{code}
Config of the cache that fails:
{code:java}


















{code}
I'm afraid that's all the information I can provide for now. I've attached a jstack from node 2: [^node-2-jstack.log]

> Re-balancing is cancelled if client node joins
> ---

[jira] [Comment Edited] (IGNITE-7165) Re-balancing is cancelled if client node joins

2018-08-15 Thread Dmitry Sherstobitov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580970#comment-16580970
 ] 

Dmitry Sherstobitov edited comment on IGNITE-7165 at 8/15/18 11:54 AM:
---

{code:java}


























{code}
{code:java}
def test_blinking_clients_clean_lfs(self):
    """
    IGN-9159 (IGNITE-7165)
    """
    self.wait_for_running_clients_num(client_num=0, timeout=120)

    self.start_grid()  # start 4 nodes

    for _ in range(0, 10):
        log_print("Iteration %s" % str(_))

        self.assert_nodes_alive()  # check that no nodes left grid because of FailHandler

        self.ignite.kill_node(2)
        self._cleanup_lfs(2)
        self.ignite.start_node(2)

        # start Ignition.start() with client config and do nothing 3 times
        with PiClient(self.ignite, self.get_client_config()):
            pass

        with PiClient(self.ignite, self.get_client_config()):
            pass

        with PiClient(self.ignite, self.get_client_config()):
            pass

        # check LocalNodeMovingPartitionsCount metric for all cache groups in cluster
        # wait that for all cache groups this value will be 0
        self.wait_for_finish_rebalance(){code}
Here is the code from our Python test. self.start_grid() starts a real grid on distributed servers using ignite.sh scripts. Each "with PiClient" block starts a JVM and runs Ignition.start() with the client config (the major difference from the server config is clientMode=true).

The log file of this test contains the following information: the metric does not change its state within 240 seconds on current master (I've recently checked this on the 15 Aug nightly build).
{code:java}
Current metric state for cache cache_group_1_028 on node 2: 19
[14:44:58][:568 :617] Wait rebalance to finish 7/240
Current metric state for cache cache_group_1_028 on node 2: 19
[14:45:04][:568 :617] Wait rebalance to finish 13/240
Current metric state for cache cache_group_1_028 on node 2: 19

[14:48:47][:568 :617] Wait rebalance to finish 236/240
Current metric state for cache cache_group_1_028 on node 2: 19{code}
Config of the cache that fails:
{code:java}


















{code}
I'm afraid that's all the information I can provide for now. I've attached a jstack from node 2: [^node-2-jstack.log]


was (Author: qvad):
{code:java}
def test_blinking_clients_clean_lfs(self):
    """
    IGN-9159 (IGNITE-7165)
    """
    self.wait_for_running_clients_num(client_num=0, timeout=120)

    self.start_grid()  # start 4 nodes

    for _ in range(0, 10):
        log_print("Iteration %s" % str(_))

        self.assert_nodes_alive()  # check that no nodes left grid because of FailHandler

        self.ignite.kill_node(2)
        self._cleanup_lfs(2)
        self.ignite.start_node(2)

        # start Ignition.start() with client config and do nothing 3 times
        with PiClient(self.ignite, self.get_client_config()):
            pass

        with PiClient(self.ignite, self.get_client_config()):
            pass

        with PiClient(self.ignite, self.get_client_config()):
            pass

        # check LocalNodeMovingPartitionsCount metric for all cache groups in cluster
        # wait that for all cache groups this value will be 0
        self.wait_for_finish_rebalance(){code}
Here is the code from our Python test. self.start_grid() starts a real grid on distributed servers using ignite.sh scripts. Each "with PiClient" block starts a JVM and runs Ignition.start() with the client config (the major difference from the server config is clientMode=true).

The log file of this test contains the following information: the metric does not change its state within 240 seconds on current master (I've recently checked this on the 15 Aug nightly build).
{code:java}
Current metric state for cache cache_group_1_028 on node 2: 19
[14:44:58][:568 :617] Wait rebalance to finish 7/240
Current metric state for cache cache_group_1_028 on node 2: 19
[14:45:04][:568 :617] Wait rebalance to finish 13/240
Current metric state for cache cache_group_1_028 on node 2: 19

[14:48:47][:568 :617] Wait rebalance to finish 236/240
Current metric state for cache cache_group_1_028 on node 2: 19{code}
Config of the cache that fails:
{code:java}


















{code}

I'm afraid that's all the information I can provide for now. I've attached a jstack from node 2: [^node-2-jstack.log]

> Re-balancing is cancelled if client node joins
> --
>
> Key: IGNITE-7165
> URL: https://issues.apache.org/jira/browse/IGNITE-

[jira] [Commented] (IGNITE-7165) Re-balancing is cancelled if client node joins

2018-08-15 Thread Dmitry Sherstobitov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580970#comment-16580970
 ] 

Dmitry Sherstobitov commented on IGNITE-7165:
-

{code:java}
def test_blinking_clients_clean_lfs(self):
    """
    IGN-9159 (IGNITE-7165)
    """
    self.wait_for_running_clients_num(client_num=0, timeout=120)

    self.start_grid()  # start 4 nodes

    for _ in range(0, 10):
        log_print("Iteration %s" % str(_))

        self.assert_nodes_alive()  # check that no nodes left grid because of FailHandler

        self.ignite.kill_node(2)
        self._cleanup_lfs(2)
        self.ignite.start_node(2)

        # start Ignition.start() with client config and do nothing 3 times
        with PiClient(self.ignite, self.get_client_config()):
            pass

        with PiClient(self.ignite, self.get_client_config()):
            pass

        with PiClient(self.ignite, self.get_client_config()):
            pass

        # check LocalNodeMovingPartitionsCount metric for all cache groups in cluster
        # wait that for all cache groups this value will be 0
        self.wait_for_finish_rebalance(){code}
Here is the code from our Python test. self.start_grid() starts a real grid on distributed servers using ignite.sh scripts. Each "with PiClient" block starts a JVM and runs Ignition.start() with the client config (the major difference from the server config is clientMode=true).

The log file of this test contains the following information: the metric does not change its state within 240 seconds on current master (I've recently checked this on the 15 Aug nightly build).
{code:java}
Current metric state for cache cache_group_1_028 on node 2: 19
[14:44:58][:568 :617] Wait rebalance to finish 7/240
Current metric state for cache cache_group_1_028 on node 2: 19
[14:45:04][:568 :617] Wait rebalance to finish 13/240
Current metric state for cache cache_group_1_028 on node 2: 19

[14:48:47][:568 :617] Wait rebalance to finish 236/240
Current metric state for cache cache_group_1_028 on node 2: 19{code}
Config of the cache that fails:
{code:java}


















{code}

I'm afraid that's all the information I can provide for now. I've attached a jstack from node 2: [^node-2-jstack.log]

> Re-balancing is cancelled if client node joins
> --
>
> Key: IGNITE-7165
> URL: https://issues.apache.org/jira/browse/IGNITE-7165
> Project: Ignite
>  Issue Type: Bug
>Reporter: Mikhail Cherkasov
>Assignee: Maxim Muzafarov
>Priority: Critical
>  Labels: rebalance
> Fix For: 2.7
>
> Attachments: node-2-jstack.log, node-NO_REBALANCE-7165.log
>
>
> Re-balancing is canceled if client node joins. Re-balancing can take hours 
> and each time when client node joins it starts again:
> [15:10:05,700][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager]
>  Added new node to topology: TcpDiscoveryNode 
> [id=979cf868-1c37-424a-9ad1-12db501f32ef, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1, 
> 172.31.16.213], sockAddrs=[/0:0:0:0:0:0:0:1:0, /127.0.0.1:0, 
> /172.31.16.213:0], discPort=0, order=36, intOrder=24, 
> lastExchangeTime=1512907805688, loc=false, ver=2.3.1#20171129-sha1:4b1ec0fe, 
> isClient=true]
> [15:10:05,701][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager]
>  Topology snapshot [ver=36, servers=7, clients=5, CPUs=128, heap=160.0GB]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Started 
> exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> crd=false, evt=NODE_JOINED, evtNode=979cf868-1c37-424a-9ad1-12db501f32ef, 
> customEvt=null, allowMerge=true]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionsExchangeFuture]
>  Finish exchange future [startVer=AffinityTopologyVersion [topVer=36, 
> minorTopVer=0], resVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> err=null]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Finished 
> exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> crd=false]
> [15:10:05,703][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion 
> [topVer=36, minorTopVer=0], evt=NODE_JOINED, 
> node=979cf868-1c37-424a-9ad1-12db501f32ef]
> [15:10:08,706][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Cancelled rebalancing from all nodes [topology=AffinityTopologyVersion 
> [topVer=35, minorTopVer=0]]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Rebalancing scheduled [order=[statementp]]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][G

[jira] [Updated] (IGNITE-7165) Re-balancing is cancelled if client node joins

2018-08-15 Thread Dmitry Sherstobitov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sherstobitov updated IGNITE-7165:

Attachment: node-2-jstack.log

> Re-balancing is cancelled if client node joins
> --
>
> Key: IGNITE-7165
> URL: https://issues.apache.org/jira/browse/IGNITE-7165
> Project: Ignite
>  Issue Type: Bug
>Reporter: Mikhail Cherkasov
>Assignee: Maxim Muzafarov
>Priority: Critical
>  Labels: rebalance
> Fix For: 2.7
>
> Attachments: node-2-jstack.log, node-NO_REBALANCE-7165.log
>
>
> Re-balancing is canceled if client node joins. Re-balancing can take hours 
> and each time when client node joins it starts again:
> [15:10:05,700][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager]
>  Added new node to topology: TcpDiscoveryNode 
> [id=979cf868-1c37-424a-9ad1-12db501f32ef, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1, 
> 172.31.16.213], sockAddrs=[/0:0:0:0:0:0:0:1:0, /127.0.0.1:0, 
> /172.31.16.213:0], discPort=0, order=36, intOrder=24, 
> lastExchangeTime=1512907805688, loc=false, ver=2.3.1#20171129-sha1:4b1ec0fe, 
> isClient=true]
> [15:10:05,701][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager]
>  Topology snapshot [ver=36, servers=7, clients=5, CPUs=128, heap=160.0GB]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Started 
> exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> crd=false, evt=NODE_JOINED, evtNode=979cf868-1c37-424a-9ad1-12db501f32ef, 
> customEvt=null, allowMerge=true]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionsExchangeFuture]
>  Finish exchange future [startVer=AffinityTopologyVersion [topVer=36, 
> minorTopVer=0], resVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> err=null]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Finished 
> exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> crd=false]
> [15:10:05,703][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion 
> [topVer=36, minorTopVer=0], evt=NODE_JOINED, 
> node=979cf868-1c37-424a-9ad1-12db501f32ef]
> [15:10:08,706][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Cancelled rebalancing from all nodes [topology=AffinityTopologyVersion 
> [topVer=35, minorTopVer=0]]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Rebalancing scheduled [order=[statementp]]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Rebalancing started [top=null, evt=NODE_JOINED, 
> node=a8be3c14-9add-48c3-b099-3fd304cfdbf4]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=2f6bde48-ffb5-4815-bd32-df4e57dc13e0, partitionsCount=18, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=35d01141-4dce-47dd-adf6-a4f3b2bb9da9, partitionsCount=15, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=b3a8be53-e61f-4023-a906-a265923837ba, partitionsCount=15, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=f825cb4e-7dcc-405f-a40d-c1dc1a3ade5a, partitionsCount=12, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=4ae1db91-8b88-4180-a84b-127a303959e9, partitionsCount=11, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=7c286481-7638-49e4-8c68-fa6aa65d8b76, partitionsCount=18, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> so in clusters with a big amount of data and the frequent client left/join 
> events this means that a new server will never receive its partitions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (IGNITE-7165) Re-balancing is cancelled if client node joins

2018-08-14 Thread Dmitry Sherstobitov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579657#comment-16579657
 ] 

Dmitry Sherstobitov edited comment on IGNITE-7165 at 8/14/18 11:22 AM:
---

For now, I have no reproducer in Java.
I've investigated the persistent store in my test and found that the rebalanced data is present in storage on the node with the cleared LFS, but the LocalNodeMovingPartitionsCount metric is definitely broken after a client node joins the cluster. If I remove the client join event after the node is back, the rebalance finishes correctly.

Here is output from my test log (the rebalance didn't finish in 240 seconds, while in previous versions it was done in 10-15 seconds):

[13:14:17][:568 :617] Wait rebalance to finish 8/240
Current metric state for cache cache_group_3_088 on node 2: 19

[13:18:04][:568 :617] Wait rebalance to finish 235/240
Current metric state for cache cache_group_3_088 on node 2: 19

 

P.S. Test runs on a distributed environment, not on a single machine


was (Author: qvad):
For now, I have no reproducer in Java.
I've investigated the persistent store in my test and found that the rebalanced data is present in storage on the node with the cleared LFS, but the LocalNodeMovingPartitionsCount metric is definitely broken after a client node joins the cluster. If I remove the client join event after the node is back, the rebalance finishes correctly.

Here is output from my test log (the rebalance didn't finish in 240 seconds, while in previous versions it was done in 10-15 seconds):

[13:14:17][:568 :617] Wait rebalance to finish 8/240
Current metric state for cache cache_group_3_088 on node 2: 19

[13:18:04][:568 :617] Wait rebalance to finish 235/240
Current metric state for cache cache_group_3_088 on node 2: 19

> Re-balancing is cancelled if client node joins
> --
>
> Key: IGNITE-7165
> URL: https://issues.apache.org/jira/browse/IGNITE-7165
> Project: Ignite
>  Issue Type: Bug
>Reporter: Mikhail Cherkasov
>Assignee: Maxim Muzafarov
>Priority: Critical
>  Labels: rebalance
> Fix For: 2.7
>
> Attachments: node-NO_REBALANCE-7165.log
>
>
> Re-balancing is canceled if client node joins. Re-balancing can take hours 
> and each time when client node joins it starts again:
> [15:10:05,700][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager]
>  Added new node to topology: TcpDiscoveryNode 
> [id=979cf868-1c37-424a-9ad1-12db501f32ef, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1, 
> 172.31.16.213], sockAddrs=[/0:0:0:0:0:0:0:1:0, /127.0.0.1:0, 
> /172.31.16.213:0], discPort=0, order=36, intOrder=24, 
> lastExchangeTime=1512907805688, loc=false, ver=2.3.1#20171129-sha1:4b1ec0fe, 
> isClient=true]
> [15:10:05,701][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager]
>  Topology snapshot [ver=36, servers=7, clients=5, CPUs=128, heap=160.0GB]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Started 
> exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> crd=false, evt=NODE_JOINED, evtNode=979cf868-1c37-424a-9ad1-12db501f32ef, 
> customEvt=null, allowMerge=true]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionsExchangeFuture]
>  Finish exchange future [startVer=AffinityTopologyVersion [topVer=36, 
> minorTopVer=0], resVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> err=null]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Finished 
> exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> crd=false]
> [15:10:05,703][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion 
> [topVer=36, minorTopVer=0], evt=NODE_JOINED, 
> node=979cf868-1c37-424a-9ad1-12db501f32ef]
> [15:10:08,706][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Cancelled rebalancing from all nodes [topology=AffinityTopologyVersion 
> [topVer=35, minorTopVer=0]]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Rebalancing scheduled [order=[statementp]]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Rebalancing started [top=null, evt=NODE_JOINED, 
> node=a8be3c14-9add-48c3-b099-3fd304cfdbf4]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=2f6bde48-ffb5-4815-bd32-df4e57dc13e0, partitionsCount=18, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=35d01141-4d

[jira] [Commented] (IGNITE-7165) Re-balancing is cancelled if client node joins

2018-08-14 Thread Dmitry Sherstobitov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579657#comment-16579657
 ] 

Dmitry Sherstobitov commented on IGNITE-7165:
-

For now, I have no reproducer in Java.
I've investigated the persistent store in my test and found that the rebalanced data is present in storage on the node with the cleared LFS, but the LocalNodeMovingPartitionsCount metric is definitely broken after a client node joins the cluster. If I remove the client join event after the node is back, the rebalance finishes correctly.

Here is output from my test log (the rebalance didn't finish in 240 seconds, while in previous versions it was done in 10-15 seconds):

[13:14:17][:568 :617] Wait rebalance to finish 8/240
Current metric state for cache cache_group_3_088 on node 2: 19

[13:18:04][:568 :617] Wait rebalance to finish 235/240
Current metric state for cache cache_group_3_088 on node 2: 19

> Re-balancing is cancelled if client node joins
> --
>
> Key: IGNITE-7165
> URL: https://issues.apache.org/jira/browse/IGNITE-7165
> Project: Ignite
>  Issue Type: Bug
>Reporter: Mikhail Cherkasov
>Assignee: Maxim Muzafarov
>Priority: Critical
>  Labels: rebalance
> Fix For: 2.7
>
> Attachments: node-NO_REBALANCE-7165.log
>
>
> Re-balancing is canceled if client node joins. Re-balancing can take hours 
> and each time when client node joins it starts again:
> [15:10:05,700][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager]
>  Added new node to topology: TcpDiscoveryNode 
> [id=979cf868-1c37-424a-9ad1-12db501f32ef, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1, 
> 172.31.16.213], sockAddrs=[/0:0:0:0:0:0:0:1:0, /127.0.0.1:0, 
> /172.31.16.213:0], discPort=0, order=36, intOrder=24, 
> lastExchangeTime=1512907805688, loc=false, ver=2.3.1#20171129-sha1:4b1ec0fe, 
> isClient=true]
> [15:10:05,701][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager]
>  Topology snapshot [ver=36, servers=7, clients=5, CPUs=128, heap=160.0GB]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Started 
> exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> crd=false, evt=NODE_JOINED, evtNode=979cf868-1c37-424a-9ad1-12db501f32ef, 
> customEvt=null, allowMerge=true]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionsExchangeFuture]
>  Finish exchange future [startVer=AffinityTopologyVersion [topVer=36, 
> minorTopVer=0], resVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> err=null]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Finished 
> exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> crd=false]
> [15:10:05,703][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion 
> [topVer=36, minorTopVer=0], evt=NODE_JOINED, 
> node=979cf868-1c37-424a-9ad1-12db501f32ef]
> [15:10:08,706][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Cancelled rebalancing from all nodes [topology=AffinityTopologyVersion 
> [topVer=35, minorTopVer=0]]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Rebalancing scheduled [order=[statementp]]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Rebalancing started [top=null, evt=NODE_JOINED, 
> node=a8be3c14-9add-48c3-b099-3fd304cfdbf4]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=2f6bde48-ffb5-4815-bd32-df4e57dc13e0, partitionsCount=18, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=35d01141-4dce-47dd-adf6-a4f3b2bb9da9, partitionsCount=15, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=b3a8be53-e61f-4023-a906-a265923837ba, partitionsCount=15, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=f825cb4e-7dcc-405f-a40d-c1dc1a3ade5a, partitionsCount=12, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=4ae1db91-8b88-4180-a84b-127a303959e9, pa

[jira] [Issue Comment Deleted] (IGNITE-7165) Re-balancing is cancelled if client node joins

2018-08-14 Thread Dmitry Sherstobitov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sherstobitov updated IGNITE-7165:

Comment: was deleted

(was: I'm afraid I cannot give you a correct reproducer in Java

Attached log from node with cleared LFS [^node-NO_REBALANCE-7165.log]

There are some messages with "Skipping rebalancing (no affinity changes)" after the node joins the cluster, while in the previous version the following text appears in the log:

{code:java}
[12:53:44,127][INFO][disco-event-worker-#61][GridDiscoveryManager] Topology 
snapshot [ver=18, servers=4, clients=0, CPUs=32, offheap=75.0GB, heap=120.0GB]
[12:53:44,127][INFO][disco-event-worker-#61][GridDiscoveryManager]   ^-- Node 
[id=61E12BC1-31A0-473A-BF79-DDD51C879722, clusterState=ACTIVE]
[12:53:44,127][INFO][disco-event-worker-#61][GridDiscoveryManager]   ^-- 
Baseline [id=0, size=4, online=4, offline=0]
[12:53:44,127][INFO][disco-event-worker-#61][GridDiscoveryManager] Data Regions 
Configured:
[12:53:44,128][INFO][disco-event-worker-#61][GridDiscoveryManager]   ^-- 
default [initSize=256.0 MiB, maxSize=18.8 GiB, persistenceEnabled=true]
[12:53:44,128][INFO][exchange-worker-#62][time] Started exchange init 
[topVer=AffinityTopologyVersion [topVer=18, minorTopVer=0], crd=false, 
evt=NODE_FAILED, evtNode=02e72065-13c8-4b47-a905-874d723cc3c1, customEvt=null, 
allowMerge=true]
[12:53:44,129][INFO][exchange-worker-#62][GridDhtPartitionsExchangeFuture] 
Finish exchange future [startVer=AffinityTopologyVersion [topVer=18, 
minorTopVer=0], resVer=AffinityTopologyVersion [topVer=18, minorTopVer=0], 
err=null]
[12:53:44,130][INFO][exchange-worker-#62][time] Finished exchange init 
[topVer=AffinityTopologyVersion [topVer=18, minorTopVer=0], crd=false]
[12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Cancelled 
rebalancing from all nodes [topology=AffinityTopologyVersion [topVer=17, 
minorTopVer=0]]
[12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Completed 
rebalance future: RebalanceFuture [grp=CacheGroupContext 
[grp=cache_group_1_028], topVer=AffinityTopologyVersion [topVer=17, 
minorTopVer=0], rebalanceId=6]
[12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Cancelled 
rebalancing from all nodes [topology=AffinityTopologyVersion [topVer=17, 
minorTopVer=0]]
[12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Completed 
rebalance future: RebalanceFuture [grp=CacheGroupContext 
[grp=cache_group_3_088], topVer=AffinityTopologyVersion [topVer=17, 
minorTopVer=0], rebalanceId=6]
[12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Cancelled 
rebalancing from all nodes [topology=AffinityTopologyVersion [topVer=17, 
minorTopVer=0]]
[12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Completed 
rebalance future: RebalanceFuture [grp=CacheGroupContext 
[grp=cache_group_1_015], topVer=AffinityTopologyVersion [topVer=17, 
minorTopVer=0], rebalanceId=6]
[12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Cancelled 
rebalancing from all nodes [topology=AffinityTopologyVersion [topVer=17, 
minorTopVer=0]]
[12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Completed 
rebalance future: RebalanceFuture [grp=CacheGroupContext 
[grp=cache_group_4_118], topVer=AffinityTopologyVersion [topVer=17, 
minorTopVer=0], rebalanceId=6]
[12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Cancelled 
rebalancing from all nodes [topology=AffinityTopologyVersion [topVer=17, 
minorTopVer=0]]
[12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Completed 
rebalance future: RebalanceFuture [grp=CacheGroupContext 
[grp=cache_group_2_058], topVer=AffinityTopologyVersion [topVer=17, 
minorTopVer=0], rebalanceId=6]
[12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Cancelled 
rebalancing from all nodes [topology=AffinityTopologyVersion [topVer=17, 
minorTopVer=0]]
[12:53:44,142][INFO][exchange-worker-#62][GridDhtPartitionDemander] Completed 
rebalance future: RebalanceFuture [grp=CacheGroupContext [grp=cache_group_6], 
topVer=AffinityTopologyVersion [topVer=17, minorTopVer=0], rebalanceId=6]
[12:53:44,142][INFO][exchange-worker-#62][GridDhtPartitionDemander] Cancelled 
rebalancing from all nodes [topology=AffinityTopologyVersion [topVer=17, 
minorTopVer=0]]
[12:53:44,142][INFO][exchange-worker-#62][GridDhtPartitionDemander] Completed 
rebalance future: RebalanceFuture [grp=CacheGroupContext [grp=cache_group_5], 
topVer=AffinityTopologyVersion [topVer=17, minorTopVer=0], rebalanceId=6]
[12:53:44,142][INFO][exchange-worker-#62][GridDhtPartitionDemander] Cancelled 
rebalancing from all nodes [topology=AffinityTopologyVersion [topVer=17, 
minorTopVer=0]]
[12:53:44,142][INFO][exchange-worker-#62][GridDhtPartitionDemander] Completed 
rebalance future: RebalanceFuture [grp=CacheGroupContext [grp=cache_group_4], 
topVer=AffinityTopologyVersio

[jira] [Commented] (IGNITE-7165) Re-balancing is cancelled if client node joins

2018-08-14 Thread Dmitry Sherstobitov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579579#comment-16579579
 ] 

Dmitry Sherstobitov commented on IGNITE-7165:
-

I'm afraid I cannot give you a correct reproducer in Java

Attached log from node with cleared LFS [^node-NO_REBALANCE-7165.log]

There are some messages with "Skipping rebalancing (no affinity changes)" after the node joins the cluster, while in the previous version the following text appears in the log:

{code:java}
[12:53:44,127][INFO][disco-event-worker-#61][GridDiscoveryManager] Topology 
snapshot [ver=18, servers=4, clients=0, CPUs=32, offheap=75.0GB, heap=120.0GB]
[12:53:44,127][INFO][disco-event-worker-#61][GridDiscoveryManager]   ^-- Node 
[id=61E12BC1-31A0-473A-BF79-DDD51C879722, clusterState=ACTIVE]
[12:53:44,127][INFO][disco-event-worker-#61][GridDiscoveryManager]   ^-- 
Baseline [id=0, size=4, online=4, offline=0]
[12:53:44,127][INFO][disco-event-worker-#61][GridDiscoveryManager] Data Regions 
Configured:
[12:53:44,128][INFO][disco-event-worker-#61][GridDiscoveryManager]   ^-- 
default [initSize=256.0 MiB, maxSize=18.8 GiB, persistenceEnabled=true]
[12:53:44,128][INFO][exchange-worker-#62][time] Started exchange init 
[topVer=AffinityTopologyVersion [topVer=18, minorTopVer=0], crd=false, 
evt=NODE_FAILED, evtNode=02e72065-13c8-4b47-a905-874d723cc3c1, customEvt=null, 
allowMerge=true]
[12:53:44,129][INFO][exchange-worker-#62][GridDhtPartitionsExchangeFuture] 
Finish exchange future [startVer=AffinityTopologyVersion [topVer=18, 
minorTopVer=0], resVer=AffinityTopologyVersion [topVer=18, minorTopVer=0], 
err=null]
[12:53:44,130][INFO][exchange-worker-#62][time] Finished exchange init 
[topVer=AffinityTopologyVersion [topVer=18, minorTopVer=0], crd=false]
[12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Cancelled 
rebalancing from all nodes [topology=AffinityTopologyVersion [topVer=17, 
minorTopVer=0]]
[12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Completed 
rebalance future: RebalanceFuture [grp=CacheGroupContext 
[grp=cache_group_1_028], topVer=AffinityTopologyVersion [topVer=17, 
minorTopVer=0], rebalanceId=6]
[12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Cancelled 
rebalancing from all nodes [topology=AffinityTopologyVersion [topVer=17, 
minorTopVer=0]]
[12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Completed 
rebalance future: RebalanceFuture [grp=CacheGroupContext 
[grp=cache_group_3_088], topVer=AffinityTopologyVersion [topVer=17, 
minorTopVer=0], rebalanceId=6]
[12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Cancelled 
rebalancing from all nodes [topology=AffinityTopologyVersion [topVer=17, 
minorTopVer=0]]
[12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Completed 
rebalance future: RebalanceFuture [grp=CacheGroupContext 
[grp=cache_group_1_015], topVer=AffinityTopologyVersion [topVer=17, 
minorTopVer=0], rebalanceId=6]
[12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Cancelled 
rebalancing from all nodes [topology=AffinityTopologyVersion [topVer=17, 
minorTopVer=0]]
[12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Completed 
rebalance future: RebalanceFuture [grp=CacheGroupContext 
[grp=cache_group_4_118], topVer=AffinityTopologyVersion [topVer=17, 
minorTopVer=0], rebalanceId=6]
[12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Cancelled 
rebalancing from all nodes [topology=AffinityTopologyVersion [topVer=17, 
minorTopVer=0]]
[12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Completed 
rebalance future: RebalanceFuture [grp=CacheGroupContext 
[grp=cache_group_2_058], topVer=AffinityTopologyVersion [topVer=17, 
minorTopVer=0], rebalanceId=6]
[12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Cancelled 
rebalancing from all nodes [topology=AffinityTopologyVersion [topVer=17, 
minorTopVer=0]]
[12:53:44,142][INFO][exchange-worker-#62][GridDhtPartitionDemander] Completed 
rebalance future: RebalanceFuture [grp=CacheGroupContext [grp=cache_group_6], 
topVer=AffinityTopologyVersion [topVer=17, minorTopVer=0], rebalanceId=6]
[12:53:44,142][INFO][exchange-worker-#62][GridDhtPartitionDemander] Cancelled 
rebalancing from all nodes [topology=AffinityTopologyVersion [topVer=17, 
minorTopVer=0]]
[12:53:44,142][INFO][exchange-worker-#62][GridDhtPartitionDemander] Completed 
rebalance future: RebalanceFuture [grp=CacheGroupContext [grp=cache_group_5], 
topVer=AffinityTopologyVersion [topVer=17, minorTopVer=0], rebalanceId=6]
[12:53:44,142][INFO][exchange-worker-#62][GridDhtPartitionDemander] Cancelled 
rebalancing from all nodes [topology=AffinityTopologyVersion [topVer=17, 
minorTopVer=0]]
[12:53:44,142][INFO][exchange-worker-#62][GridDhtPartitionDemander] Completed 
rebalance future: RebalanceFuture [grp=CacheGroupContext [grp=cache_group_4], 
top

[jira] [Updated] (IGNITE-7165) Re-balancing is cancelled if client node joins

2018-08-14 Thread Dmitry Sherstobitov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sherstobitov updated IGNITE-7165:

Attachment: node-NO_REBALANCE-7165.log

> Re-balancing is cancelled if client node joins
> --
>
> Key: IGNITE-7165
> URL: https://issues.apache.org/jira/browse/IGNITE-7165
> Project: Ignite
>  Issue Type: Bug
>Reporter: Mikhail Cherkasov
>Assignee: Maxim Muzafarov
>Priority: Critical
>  Labels: rebalance
> Fix For: 2.7
>
> Attachments: node-NO_REBALANCE-7165.log
>
>
> Re-balancing is canceled if client node joins. Re-balancing can take hours 
> and each time when client node joins it starts again:
> [15:10:05,700][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager]
>  Added new node to topology: TcpDiscoveryNode 
> [id=979cf868-1c37-424a-9ad1-12db501f32ef, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1, 
> 172.31.16.213], sockAddrs=[/0:0:0:0:0:0:0:1:0, /127.0.0.1:0, 
> /172.31.16.213:0], discPort=0, order=36, intOrder=24, 
> lastExchangeTime=1512907805688, loc=false, ver=2.3.1#20171129-sha1:4b1ec0fe, 
> isClient=true]
> [15:10:05,701][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager]
>  Topology snapshot [ver=36, servers=7, clients=5, CPUs=128, heap=160.0GB]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Started 
> exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> crd=false, evt=NODE_JOINED, evtNode=979cf868-1c37-424a-9ad1-12db501f32ef, 
> customEvt=null, allowMerge=true]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionsExchangeFuture]
>  Finish exchange future [startVer=AffinityTopologyVersion [topVer=36, 
> minorTopVer=0], resVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> err=null]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Finished 
> exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> crd=false]
> [15:10:05,703][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion 
> [topVer=36, minorTopVer=0], evt=NODE_JOINED, 
> node=979cf868-1c37-424a-9ad1-12db501f32ef]
> [15:10:08,706][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Cancelled rebalancing from all nodes [topology=AffinityTopologyVersion 
> [topVer=35, minorTopVer=0]]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Rebalancing scheduled [order=[statementp]]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Rebalancing started [top=null, evt=NODE_JOINED, 
> node=a8be3c14-9add-48c3-b099-3fd304cfdbf4]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=2f6bde48-ffb5-4815-bd32-df4e57dc13e0, partitionsCount=18, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=35d01141-4dce-47dd-adf6-a4f3b2bb9da9, partitionsCount=15, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=b3a8be53-e61f-4023-a906-a265923837ba, partitionsCount=15, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=f825cb4e-7dcc-405f-a40d-c1dc1a3ade5a, partitionsCount=12, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=4ae1db91-8b88-4180-a84b-127a303959e9, partitionsCount=11, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=7c286481-7638-49e4-8c68-fa6aa65d8b76, partitionsCount=18, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> so in clusters with a big amount of data and the frequent client left/join 
> events this means that a new server will never receive its partitions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-7165) Re-balancing is cancelled if client node joins

2018-08-14 Thread Dmitry Sherstobitov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sherstobitov updated IGNITE-7165:

Attachment: (was: node-NO_REBALANCE-7165.log)

> Re-balancing is cancelled if client node joins
> --
>
> Key: IGNITE-7165
> URL: https://issues.apache.org/jira/browse/IGNITE-7165
> Project: Ignite
>  Issue Type: Bug
>Reporter: Mikhail Cherkasov
>Assignee: Maxim Muzafarov
>Priority: Critical
>  Labels: rebalance
> Fix For: 2.7
>
>
> Re-balancing is canceled if client node joins. Re-balancing can take hours 
> and each time when client node joins it starts again:
> [15:10:05,700][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager]
>  Added new node to topology: TcpDiscoveryNode 
> [id=979cf868-1c37-424a-9ad1-12db501f32ef, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1, 
> 172.31.16.213], sockAddrs=[/0:0:0:0:0:0:0:1:0, /127.0.0.1:0, 
> /172.31.16.213:0], discPort=0, order=36, intOrder=24, 
> lastExchangeTime=1512907805688, loc=false, ver=2.3.1#20171129-sha1:4b1ec0fe, 
> isClient=true]
> [15:10:05,701][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager]
>  Topology snapshot [ver=36, servers=7, clients=5, CPUs=128, heap=160.0GB]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Started 
> exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> crd=false, evt=NODE_JOINED, evtNode=979cf868-1c37-424a-9ad1-12db501f32ef, 
> customEvt=null, allowMerge=true]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionsExchangeFuture]
>  Finish exchange future [startVer=AffinityTopologyVersion [topVer=36, 
> minorTopVer=0], resVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> err=null]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Finished 
> exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> crd=false]
> [15:10:05,703][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion 
> [topVer=36, minorTopVer=0], evt=NODE_JOINED, 
> node=979cf868-1c37-424a-9ad1-12db501f32ef]
> [15:10:08,706][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Cancelled rebalancing from all nodes [topology=AffinityTopologyVersion 
> [topVer=35, minorTopVer=0]]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Rebalancing scheduled [order=[statementp]]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Rebalancing started [top=null, evt=NODE_JOINED, 
> node=a8be3c14-9add-48c3-b099-3fd304cfdbf4]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=2f6bde48-ffb5-4815-bd32-df4e57dc13e0, partitionsCount=18, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=35d01141-4dce-47dd-adf6-a4f3b2bb9da9, partitionsCount=15, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=b3a8be53-e61f-4023-a906-a265923837ba, partitionsCount=15, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=f825cb4e-7dcc-405f-a40d-c1dc1a3ade5a, partitionsCount=12, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=4ae1db91-8b88-4180-a84b-127a303959e9, partitionsCount=11, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=7c286481-7638-49e4-8c68-fa6aa65d8b76, partitionsCount=18, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> So in clusters with a large amount of data and frequent client leave/join 
> events, this means that a new server will never receive its partitions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-7165) Re-balancing is cancelled if client node joins

2018-08-14 Thread Dmitry Sherstobitov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sherstobitov updated IGNITE-7165:

Attachment: node-NO_REBALANCE-7165.log

> Re-balancing is cancelled if client node joins
> --
>
> Key: IGNITE-7165
> URL: https://issues.apache.org/jira/browse/IGNITE-7165
> Project: Ignite
>  Issue Type: Bug
>Reporter: Mikhail Cherkasov
>Assignee: Maxim Muzafarov
>Priority: Critical
>  Labels: rebalance
> Fix For: 2.7
>
> Attachments: node-NO_REBALANCE-7165.log
>
>
> Re-balancing is cancelled if a client node joins. Re-balancing can take hours, 
> and each time a client node joins it starts over:
> [15:10:05,700][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager]
>  Added new node to topology: TcpDiscoveryNode 
> [id=979cf868-1c37-424a-9ad1-12db501f32ef, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1, 
> 172.31.16.213], sockAddrs=[/0:0:0:0:0:0:0:1:0, /127.0.0.1:0, 
> /172.31.16.213:0], discPort=0, order=36, intOrder=24, 
> lastExchangeTime=1512907805688, loc=false, ver=2.3.1#20171129-sha1:4b1ec0fe, 
> isClient=true]
> [15:10:05,701][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager]
>  Topology snapshot [ver=36, servers=7, clients=5, CPUs=128, heap=160.0GB]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Started 
> exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> crd=false, evt=NODE_JOINED, evtNode=979cf868-1c37-424a-9ad1-12db501f32ef, 
> customEvt=null, allowMerge=true]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionsExchangeFuture]
>  Finish exchange future [startVer=AffinityTopologyVersion [topVer=36, 
> minorTopVer=0], resVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> err=null]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Finished 
> exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> crd=false]
> [15:10:05,703][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion 
> [topVer=36, minorTopVer=0], evt=NODE_JOINED, 
> node=979cf868-1c37-424a-9ad1-12db501f32ef]
> [15:10:08,706][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Cancelled rebalancing from all nodes [topology=AffinityTopologyVersion 
> [topVer=35, minorTopVer=0]]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Rebalancing scheduled [order=[statementp]]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Rebalancing started [top=null, evt=NODE_JOINED, 
> node=a8be3c14-9add-48c3-b099-3fd304cfdbf4]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=2f6bde48-ffb5-4815-bd32-df4e57dc13e0, partitionsCount=18, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=35d01141-4dce-47dd-adf6-a4f3b2bb9da9, partitionsCount=15, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=b3a8be53-e61f-4023-a906-a265923837ba, partitionsCount=15, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=f825cb4e-7dcc-405f-a40d-c1dc1a3ade5a, partitionsCount=12, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=4ae1db91-8b88-4180-a84b-127a303959e9, partitionsCount=11, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=7c286481-7638-49e4-8c68-fa6aa65d8b76, partitionsCount=18, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> So in clusters with a large amount of data and frequent client leave/join 
> events, this means that a new server will never receive its partitions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-7165) Re-balancing is cancelled if client node joins

2018-08-13 Thread Dmitry Sherstobitov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16578279#comment-16578279
 ] 

Dmitry Sherstobitov commented on IGNITE-7165:
-

I have a problem with the current solution.

The following test passed on the version before the fix, and hangs on current master on 
the first iteration.
The test hangs waiting on the JMX LocalNodeMovingPartitionsCount metric, and it looks 
like rebalance did not start at all.

Repeat 10 times:
1. stop a node
2. clean its LFS
3. add the stopped node back (triggers rebalance)
4. 3 times: start 2 clients, wait for the topology snapshot, close the clients
5. for each cache group check the JMX metric LocalNodeMovingPartitionsCount (like 
waitForFinishRebalance(); a sketch of such a wait loop is shown below)
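
Below is a minimal sketch (not the original test code) of the "waitForFinishRebalance()" 
style check mentioned in step 5: poll the LocalNodeMovingPartitionsCount JMX attribute of 
every cache-group MBean until all of them drop to 0 or a timeout expires. How the 
cache-group ObjectNames are discovered is left out and is an assumption of the example; 
in a real test they would be looked up from the node's JMX domain.
{code:java}
import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

public class RebalanceWait {
    /** Returns true once no cache group reports MOVING partitions, false on timeout. */
    public static boolean waitForFinishRebalance(MBeanServerConnection jmx,
        Set<ObjectName> cacheGroupBeans, long timeoutMs) throws Exception {
        long deadline = System.currentTimeMillis() + timeoutMs;

        while (System.currentTimeMillis() < deadline) {
            boolean moving = false;

            for (ObjectName bean : cacheGroupBeans) {
                // Attribute name as reported by the metric mentioned in the comment above.
                Number cnt = (Number)jmx.getAttribute(bean, "LocalNodeMovingPartitionsCount");

                if (cnt.longValue() > 0) {
                    moving = true;
                    break;
                }
            }

            if (!moving)
                return true; // No MOVING partitions left - rebalance has finished.

            Thread.sleep(500); // Poll again after a short pause.
        }

        return false; // Timed out - rebalance apparently never started or never finished.
    }
}
{code}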

> Re-balancing is cancelled if client node joins
> --
>
> Key: IGNITE-7165
> URL: https://issues.apache.org/jira/browse/IGNITE-7165
> Project: Ignite
>  Issue Type: Bug
>Reporter: Mikhail Cherkasov
>Assignee: Maxim Muzafarov
>Priority: Critical
>  Labels: rebalance
> Fix For: 2.7
>
>
> Re-balancing is cancelled if a client node joins. Re-balancing can take hours, 
> and each time a client node joins it starts over:
> [15:10:05,700][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager]
>  Added new node to topology: TcpDiscoveryNode 
> [id=979cf868-1c37-424a-9ad1-12db501f32ef, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1, 
> 172.31.16.213], sockAddrs=[/0:0:0:0:0:0:0:1:0, /127.0.0.1:0, 
> /172.31.16.213:0], discPort=0, order=36, intOrder=24, 
> lastExchangeTime=1512907805688, loc=false, ver=2.3.1#20171129-sha1:4b1ec0fe, 
> isClient=true]
> [15:10:05,701][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager]
>  Topology snapshot [ver=36, servers=7, clients=5, CPUs=128, heap=160.0GB]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Started 
> exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> crd=false, evt=NODE_JOINED, evtNode=979cf868-1c37-424a-9ad1-12db501f32ef, 
> customEvt=null, allowMerge=true]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionsExchangeFuture]
>  Finish exchange future [startVer=AffinityTopologyVersion [topVer=36, 
> minorTopVer=0], resVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> err=null]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Finished 
> exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> crd=false]
> [15:10:05,703][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion 
> [topVer=36, minorTopVer=0], evt=NODE_JOINED, 
> node=979cf868-1c37-424a-9ad1-12db501f32ef]
> [15:10:08,706][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Cancelled rebalancing from all nodes [topology=AffinityTopologyVersion 
> [topVer=35, minorTopVer=0]]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Rebalancing scheduled [order=[statementp]]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Rebalancing started [top=null, evt=NODE_JOINED, 
> node=a8be3c14-9add-48c3-b099-3fd304cfdbf4]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=2f6bde48-ffb5-4815-bd32-df4e57dc13e0, partitionsCount=18, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=35d01141-4dce-47dd-adf6-a4f3b2bb9da9, partitionsCount=15, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=b3a8be53-e61f-4023-a906-a265923837ba, partitionsCount=15, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=f825cb4e-7dcc-405f-a40d-c1dc1a3ade5a, partitionsCount=12, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=4ae1db91-8b88-4180-a84b-127a303959e9, partitionsCount=11, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=7c286481-7638-49e4-8c68-fa6aa65d8b76, partitions

[jira] [Created] (IGNITE-8895) Update yardstick libraries

2018-06-28 Thread Dmitry Sherstobitov (JIRA)
Dmitry Sherstobitov created IGNITE-8895:
---

 Summary: Update yardstick libraries 
 Key: IGNITE-8895
 URL: https://issues.apache.org/jira/browse/IGNITE-8895
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.5
Reporter: Dmitry Sherstobitov


There are some conflicts in the yardstick libraries at the moment:
||yardstick||core||problem||
|jline-0.9.94.jar|bin/include/sqlline/jline-2.4.3.jar|./sqlline.sh is unable to start 
if the yardstick libraries are on the path|

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-8893) Blinking node in baseline may corrupt own WAL records

2018-06-28 Thread Dmitry Sherstobitov (JIRA)
Dmitry Sherstobitov created IGNITE-8893:
---

 Summary: Blinking node in baseline may corrupt own WAL records
 Key: IGNITE-8893
 URL: https://issues.apache.org/jira/browse/IGNITE-8893
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.5
Reporter: Dmitry Sherstobitov


# Start cluster, load data
 # Start an additional node that is not in the BLT
 # Repeat 10 times: kill 1 node in the baseline and 1 node not in the baseline, then 
start a node in the BLT and a node not in the BLT

At some point a node in the baseline may be unable to start because of a corrupted WAL.
Note that there is no loading on the cluster at all, so there is no reason to corrupt 
the WAL; rebalance should be interruptible.

There is also another scenario that may cause the same error (and may also cause a JVM 
crash):
 # Start cluster, load data, start nodes
 # Repeat 10 times: kill 1 node in the baseline, clean its LFS, start the node again; 
while it rebalances, blink the node that should rebalance data to the previously killed 
node

The node that should rebalance data to the cleaned node may corrupt its own WAL. But 
this second scenario has a configuration "error": the number of backups in each case is 
1, so obviously two blinking nodes may cause data loss.
{code:java}
[2018-06-28 17:33:39,583][ERROR][wal-file-archiver%null-#63][root] Critical 
system error detected. Will be handled accordingly to configured handler 
[hnd=class o.a.i.failure.StopNodeOrHaltFailureHandler, 
failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, 
err=java.lang.AssertionError: lastArchived=757, current=42]]
java.lang.AssertionError: lastArchived=757, current=42
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileArchiver.body(FileWriteAheadLogManager.java:1629)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-8879) Blinking baseline node sometimes unable to connect to cluster

2018-06-26 Thread Dmitry Sherstobitov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-8879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sherstobitov updated IGNITE-8879:

Description: 
Almost the same scenario as in IGNITE-8874, but the node leaves the baseline while blinking.

All caches with 2 backups
 4 nodes in cluster
 # Start cluster, load data
 # Start transactional loading (8 threads, 100 put/get ops per second in each; a sketch 
of this kind of loading is shown after this list)
 # Repeat 10 times: kill one node, remove it from the baseline, start the node again 
(*with no LFS clean*), wait for rebalance
 # Check idle_verify, check for data corruption
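
The transactional loading in step 2 is not shown in the ticket; purely as an assumed 
sketch (the PutGetLoader name and key range are invented here), each of the 8 worker 
threads could look roughly like this, throttled to about the described rate:
{code:java}
import java.util.concurrent.ThreadLocalRandom;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.transactions.Transaction;

public class PutGetLoader implements Runnable {
    private final Ignite ignite;
    private final IgniteCache<Integer, Long> cache;
    private volatile boolean stopped;

    public PutGetLoader(Ignite ignite, IgniteCache<Integer, Long> cache) {
        this.ignite = ignite;
        this.cache = cache;
    }

    public void stop() { stopped = true; }

    @Override public void run() {
        ThreadLocalRandom rnd = ThreadLocalRandom.current();

        while (!stopped) {
            int key = rnd.nextInt(10_000);

            // One put/get pair per transaction against a TRANSACTIONAL cache.
            try (Transaction tx = ignite.transactions().txStart()) {
                Long val = cache.get(key);

                cache.put(key, val == null ? 1L : val + 1);

                tx.commit();
            }

            try {
                Thread.sleep(10); // Throttle to roughly the rate described above.
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();

                return;
            }
        }
    }
}
{code}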

 

At some point the killed node is unable to start and join the cluster because of an error.

(Attachments info: grid.1.node2.X.log - blinking node logs, X - iteration 
counter from step 3)
{code:java}
080ee8-END.bin]
[2018-06-26 19:01:43,039][INFO ][main][PageMemoryImpl] Started page memory 
[memoryAllocated=100.0 MiB, pages=24800, tableSize=1.9 MiB, 
checkpointBuffer=100.0 MiB]
[2018-06-26 19:01:43,039][INFO ][main][GridCacheDatabaseSharedManager] Checking 
memory state [lastValidPos=FileWALPointer [idx=0, fileOff=583691, len=119], 
lastMarked=FileWALPointer [idx=0, fileOff=583691, len=119], 
lastCheckpointId=7fca4dbb-8f01-4b63-95e2-43283b080ee8]
[2018-06-26 19:01:43,050][INFO ][main][GridCacheDatabaseSharedManager] Found 
last checkpoint marker [cpId=7fca4dbb-8f01-4b63-95e2-43283b080ee8, 
pos=FileWALPointer [idx=0, fileOff=583691, len=119]]
[2018-06-26 19:01:43,082][INFO ][main][FileWriteAheadLogManager] Stopping WAL 
iteration due to an exception: EOF at position [100] expected to read [1] 
bytes, ptr=FileWALPointer [idx=0, fileOff=100, len=0]
[2018-06-26 19:01:43,219][WARN ][main][FileWriteAheadLogManager] WAL segment 
tail is reached. [ Expected next state: {Index=19,Offset=794017}, Actual state 
: {Index=3602879702215753728,Offset=775434544} ]
[2018-06-26 19:01:43,243][INFO ][main][GridCacheDatabaseSharedManager] Applying 
lost cache updates since last checkpoint record [lastMarked=FileWALPointer 
[idx=0, fileOff=583691, len=119], 
lastCheckpointId=7fca4dbb-8f01-4b63-95e2-43283b080ee8]
[2018-06-26 19:01:43,246][INFO ][main][FileWriteAheadLogManager] Stopping WAL 
iteration due to an exception: EOF at position [100] expected to read [1] 
bytes, ptr=FileWALPointer [idx=0, fileOff=100, len=0]
[2018-06-26 19:01:43,336][WARN ][main][FileWriteAheadLogManager] WAL segment 
tail is reached. [ Expected next state: {Index=19,Offset=794017}, Actual state 
: {Index=3602879702215753728,Offset=775434544} ]
[2018-06-26 19:01:43,336][INFO ][main][GridCacheDatabaseSharedManager] Finished 
applying WAL changes [updatesApplied=0, time=101ms]
[2018-06-26 19:01:43,450][INFO 
][main][GridSnapshotAwareClusterStateProcessorImpl] Restoring history for 
BaselineTopology[id=4]
[2018-06-26 19:01:43,454][ERROR][main][IgniteKernal] Exception during start 
processors, node will be stopped and close connections
class org.apache.ignite.IgniteCheckedException: Failed to start processor: 
GridProcessorAdapter []
at 
org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1769)
at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1001)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2020)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1725)
at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1153)
at 
org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1071)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:957)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:856)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:726)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:695)
at org.apache.ignite.Ignition.start(Ignition.java:352)
at 
org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:301)
Caused by: class org.apache.ignite.IgniteCheckedException: Restoring of 
BaselineTopology history has failed, expected history item not found for id=1
at 
org.apache.ignite.internal.processors.cluster.BaselineTopologyHistory.restoreHistory(BaselineTopologyHistory.java:54)
at 
org.apache.ignite.internal.processors.cluster.GridClusterStateProcessor.onReadyForRead(GridClusterStateProcessor.java:222)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetastorageReadyForRead(GridCacheDatabaseSharedManager.java:381)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readMetastore(GridCacheDatabaseSharedManager.java:643)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.start0(GridCacheDatabaseSharedManager.java:486)
at 
org.apache.ignite.internal.processors.c

[jira] [Updated] (IGNITE-8879) Blinking baseline node sometimes unable to connect to cluster

2018-06-26 Thread Dmitry Sherstobitov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-8879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sherstobitov updated IGNITE-8879:

Attachment: IGNITE-8879.zip

> Blinking baseline node sometimes unable to connect to cluster
> -
>
> Key: IGNITE-8879
> URL: https://issues.apache.org/jira/browse/IGNITE-8879
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.5
>Reporter: Dmitry Sherstobitov
>Priority: Major
> Attachments: IGNITE-8879.zip
>
>
> Almost the same scenario as in IGNITE-8874, but the node leaves the baseline while 
> blinking
> All caches with 2 backups
> 4 nodes in cluster
>  # Start cluster, load data
>  # Start transactional loading (8 threads, 100 ops/second put/get in each op)
>  # Repeat 10 times: kill one node, remove from baseline, start node again 
> (*with no LFS clean*), wait for rebalance
>  # Check idle_verify, check data corruption
>  
> At some point the killed node is unable to start and join the cluster because of:
> {code:java}
> 080ee8-END.bin]
> [2018-06-26 19:01:43,039][INFO ][main][PageMemoryImpl] Started page memory 
> [memoryAllocated=100.0 MiB, pages=24800, tableSize=1.9 MiB, 
> checkpointBuffer=100.0 MiB]
> [2018-06-26 19:01:43,039][INFO ][main][GridCacheDatabaseSharedManager] 
> Checking memory state [lastValidPos=FileWALPointer [idx=0, fileOff=583691, 
> len=119], lastMarked=FileWALPointer [idx=0, fileOff=583691, len=119], 
> lastCheckpointId=7fca4dbb-8f01-4b63-95e2-43283b080ee8]
> [2018-06-26 19:01:43,050][INFO ][main][GridCacheDatabaseSharedManager] Found 
> last checkpoint marker [cpId=7fca4dbb-8f01-4b63-95e2-43283b080ee8, 
> pos=FileWALPointer [idx=0, fileOff=583691, len=119]]
> [2018-06-26 19:01:43,082][INFO ][main][FileWriteAheadLogManager] Stopping WAL 
> iteration due to an exception: EOF at position [100] expected to read [1] 
> bytes, ptr=FileWALPointer [idx=0, fileOff=100, len=0]
> [2018-06-26 19:01:43,219][WARN ][main][FileWriteAheadLogManager] WAL segment 
> tail is reached. [ Expected next state: {Index=19,Offset=794017}, Actual 
> state : {Index=3602879702215753728,Offset=775434544} ]
> [2018-06-26 19:01:43,243][INFO ][main][GridCacheDatabaseSharedManager] 
> Applying lost cache updates since last checkpoint record 
> [lastMarked=FileWALPointer [idx=0, fileOff=583691, len=119], 
> lastCheckpointId=7fca4dbb-8f01-4b63-95e2-43283b080ee8]
> [2018-06-26 19:01:43,246][INFO ][main][FileWriteAheadLogManager] Stopping WAL 
> iteration due to an exception: EOF at position [100] expected to read [1] 
> bytes, ptr=FileWALPointer [idx=0, fileOff=100, len=0]
> [2018-06-26 19:01:43,336][WARN ][main][FileWriteAheadLogManager] WAL segment 
> tail is reached. [ Expected next state: {Index=19,Offset=794017}, Actual 
> state : {Index=3602879702215753728,Offset=775434544} ]
> [2018-06-26 19:01:43,336][INFO ][main][GridCacheDatabaseSharedManager] 
> Finished applying WAL changes [updatesApplied=0, time=101ms]
> [2018-06-26 19:01:43,450][INFO 
> ][main][GridSnapshotAwareClusterStateProcessorImpl] Restoring history for 
> BaselineTopology[id=4]
> [2018-06-26 19:01:43,454][ERROR][main][IgniteKernal] Exception during start 
> processors, node will be stopped and close connections
> class org.apache.ignite.IgniteCheckedException: Failed to start processor: 
> GridProcessorAdapter []
> at 
> org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1769)
> at 
> org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1001)
> at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2020)
> at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1725)
> at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1153)
> at 
> org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1071)
> at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:957)
> at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:856)
> at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:726)
> at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:695)
> at org.apache.ignite.Ignition.start(Ignition.java:352)
> at 
> org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:301)
> Caused by: class org.apache.ignite.IgniteCheckedException: Restoring of 
> BaselineTopology history has failed, expected history item not found for id=1
> at 
> org.apache.ignite.internal.processors.cluster.BaselineTopologyHistory.restoreHistory(BaselineTopologyHistory.java:54)
> at 
> org.apache.ignite.internal.processors.cluster.GridClusterStateProcessor.onReadyForRead(GridClusterStateProcessor.java:222)
> at 
> org.apache.igni

[jira] [Created] (IGNITE-8879) Blinking baseline node sometimes unable to connect to cluster

2018-06-26 Thread Dmitry Sherstobitov (JIRA)
Dmitry Sherstobitov created IGNITE-8879:
---

 Summary: Blinking baseline node sometimes unable to connect to 
cluster
 Key: IGNITE-8879
 URL: https://issues.apache.org/jira/browse/IGNITE-8879
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.5
Reporter: Dmitry Sherstobitov


Almost the same scenario as in IGNITE-8874, but the node leaves the baseline while blinking.

All caches with 2 backups
4 nodes in cluster
 # Start cluster, load data
 # Start transactional loading (8 threads, 100 put/get ops per second in each)
 # Repeat 10 times: kill one node, remove it from the baseline (see the baseline-removal 
sketch after this list), start the node again (*with no LFS clean*), wait for rebalance
 # Check idle_verify, check for data corruption
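
The ticket does not say how the node is removed from the baseline in step 3 (the related 
reports use the control.sh utility for this); purely as an illustration, a programmatic 
sketch of that step via the public cluster API could look like the following. The 
BaselineUtil class name and the consistentId parameter are assumptions of the example.
{code:java}
import java.util.ArrayList;
import java.util.Collection;

import org.apache.ignite.Ignite;
import org.apache.ignite.cluster.BaselineNode;

public class BaselineUtil {
    /** Removes the node with the given consistent ID from the current baseline topology. */
    public static void removeFromBaseline(Ignite ignite, Object consistentId) {
        Collection<BaselineNode> cur = ignite.cluster().currentBaselineTopology();

        Collection<BaselineNode> next = new ArrayList<>();

        for (BaselineNode n : cur) {
            if (!n.consistentId().equals(consistentId))
                next.add(n);
        }

        // Applying the reduced collection changes the baseline and triggers rebalance.
        ignite.cluster().setBaselineTopology(next);
    }
}
{code}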

 

At some point the killed node is unable to start and join the cluster because of the following error:
{code:java}
080ee8-END.bin]
[2018-06-26 19:01:43,039][INFO ][main][PageMemoryImpl] Started page memory 
[memoryAllocated=100.0 MiB, pages=24800, tableSize=1.9 MiB, 
checkpointBuffer=100.0 MiB]
[2018-06-26 19:01:43,039][INFO ][main][GridCacheDatabaseSharedManager] Checking 
memory state [lastValidPos=FileWALPointer [idx=0, fileOff=583691, len=119], 
lastMarked=FileWALPointer [idx=0, fileOff=583691, len=119], 
lastCheckpointId=7fca4dbb-8f01-4b63-95e2-43283b080ee8]
[2018-06-26 19:01:43,050][INFO ][main][GridCacheDatabaseSharedManager] Found 
last checkpoint marker [cpId=7fca4dbb-8f01-4b63-95e2-43283b080ee8, 
pos=FileWALPointer [idx=0, fileOff=583691, len=119]]
[2018-06-26 19:01:43,082][INFO ][main][FileWriteAheadLogManager] Stopping WAL 
iteration due to an exception: EOF at position [100] expected to read [1] 
bytes, ptr=FileWALPointer [idx=0, fileOff=100, len=0]
[2018-06-26 19:01:43,219][WARN ][main][FileWriteAheadLogManager] WAL segment 
tail is reached. [ Expected next state: {Index=19,Offset=794017}, Actual state 
: {Index=3602879702215753728,Offset=775434544} ]
[2018-06-26 19:01:43,243][INFO ][main][GridCacheDatabaseSharedManager] Applying 
lost cache updates since last checkpoint record [lastMarked=FileWALPointer 
[idx=0, fileOff=583691, len=119], 
lastCheckpointId=7fca4dbb-8f01-4b63-95e2-43283b080ee8]
[2018-06-26 19:01:43,246][INFO ][main][FileWriteAheadLogManager] Stopping WAL 
iteration due to an exception: EOF at position [100] expected to read [1] 
bytes, ptr=FileWALPointer [idx=0, fileOff=100, len=0]
[2018-06-26 19:01:43,336][WARN ][main][FileWriteAheadLogManager] WAL segment 
tail is reached. [ Expected next state: {Index=19,Offset=794017}, Actual state 
: {Index=3602879702215753728,Offset=775434544} ]
[2018-06-26 19:01:43,336][INFO ][main][GridCacheDatabaseSharedManager] Finished 
applying WAL changes [updatesApplied=0, time=101ms]
[2018-06-26 19:01:43,450][INFO 
][main][GridSnapshotAwareClusterStateProcessorImpl] Restoring history for 
BaselineTopology[id=4]
[2018-06-26 19:01:43,454][ERROR][main][IgniteKernal] Exception during start 
processors, node will be stopped and close connections
class org.apache.ignite.IgniteCheckedException: Failed to start processor: 
GridProcessorAdapter []
at 
org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1769)
at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1001)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2020)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1725)
at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1153)
at 
org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1071)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:957)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:856)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:726)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:695)
at org.apache.ignite.Ignition.start(Ignition.java:352)
at 
org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:301)
Caused by: class org.apache.ignite.IgniteCheckedException: Restoring of 
BaselineTopology history has failed, expected history item not found for id=1
at 
org.apache.ignite.internal.processors.cluster.BaselineTopologyHistory.restoreHistory(BaselineTopologyHistory.java:54)
at 
org.apache.ignite.internal.processors.cluster.GridClusterStateProcessor.onReadyForRead(GridClusterStateProcessor.java:222)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetastorageReadyForRead(GridCacheDatabaseSharedManager.java:381)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readMetastore(GridCacheDatabaseSharedManager.java:643)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.start0(GridCacheDatabaseSharedM

[jira] [Created] (IGNITE-8874) Blinking node in cluster may cause data corruption

2018-06-26 Thread Dmitry Sherstobitov (JIRA)
Dmitry Sherstobitov created IGNITE-8874:
---

 Summary: Blinking node in cluster may cause data corruption
 Key: IGNITE-8874
 URL: https://issues.apache.org/jira/browse/IGNITE-8874
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.5
Reporter: Dmitry Sherstobitov


All caches with 2 backups
4 nodes in cluster
 # Start cluster, load data
 # Start transactional loading (8 threads, 100 put/get ops per second in each)
 # Repeat 10 times: kill one node, clean LFS, start node again, wait for 
rebalance
 # Check idle_verify, check data corruption

Here is the idle_verify report:
node2 is the node that was blinking during the test. The update counters are equal 
between the partition copies, but the data is different.
{code:java}
Conflict partition: PartitionKey [grpId=374280886, grpName=cache_group_3, 
partId=41]
Partition instances: [PartitionHashRecord [isPrimary=true, partHash=885018783, 
updateCntr=16, size=15, consistentId=node4], PartitionHashRecord 
[isPrimary=false, partHash=885018783, updateCntr=16, size=15, 
consistentId=node3], PartitionHashRecord [isPrimary=false, partHash=-357162793, 
updateCntr=16, size=15, consistentId=node2]]

Conflict partition: PartitionKey [grpId=1586135625, grpName=cache_group_1_015, 
partId=15]
Partition instances: [PartitionHashRecord [isPrimary=true, partHash=-562597978, 
updateCntr=22, size=16, consistentId=node3], PartitionHashRecord 
[isPrimary=false, partHash=-562597978, updateCntr=22, size=16, 
consistentId=node1], PartitionHashRecord [isPrimary=false, partHash=780813725, 
updateCntr=22, size=16, consistentId=node2]]

Conflict partition: PartitionKey [grpId=374280885, grpName=cache_group_2, 
partId=75]
Partition instances: [PartitionHashRecord [isPrimary=true, 
partHash=-1500797699, updateCntr=21, size=16, consistentId=node3], 
PartitionHashRecord [isPrimary=false, partHash=-1500797699, updateCntr=21, 
size=16, consistentId=node1], PartitionHashRecord [isPrimary=false, 
partHash=-1592034435, updateCntr=21, size=16, consistentId=node2]]

Conflict partition: PartitionKey [grpId=374280884, grpName=cache_group_1, 
partId=713]
Partition instances: [PartitionHashRecord [isPrimary=false, partHash=-63058826, 
updateCntr=4, size=2, consistentId=node3], PartitionHashRecord [isPrimary=true, 
partHash=-63058826, updateCntr=4, size=2, consistentId=node1], 
PartitionHashRecord [isPrimary=false, partHash=670869467, updateCntr=4, size=2, 
consistentId=node2]]

Conflict partition: PartitionKey [grpId=374280886, grpName=cache_group_3, 
partId=11]
Partition instances: [PartitionHashRecord [isPrimary=false, 
partHash=-224572810, updateCntr=17, size=16, consistentId=node3], 
PartitionHashRecord [isPrimary=true, partHash=-224572810, updateCntr=17, 
size=16, consistentId=node1], PartitionHashRecord [isPrimary=false, 
partHash=176419075, updateCntr=17, size=16, consistentId=node2]]{code}
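
To make explicit what the report flags as a conflict, here is a small illustrative 
sketch, not part of idle_verify itself (the PartCopy and isConflict names are invented 
for the example): all copies of a partition report the same update counter, yet one 
content hash differs, so the replicas have silently diverged.
{code:java}
import java.util.Arrays;
import java.util.List;

public class ConflictCheck {
    /** One PartitionHashRecord-like entry from the report above. */
    static final class PartCopy {
        final String consistentId;
        final int partHash;
        final long updateCntr;

        PartCopy(String consistentId, int partHash, long updateCntr) {
            this.consistentId = consistentId;
            this.partHash = partHash;
            this.updateCntr = updateCntr;
        }
    }

    /** True if all copies agree on the update counter but at least one content hash differs. */
    static boolean isConflict(List<PartCopy> copies) {
        long cntr = copies.get(0).updateCntr;
        int hash = copies.get(0).partHash;

        boolean sameCntr = copies.stream().allMatch(c -> c.updateCntr == cntr);
        boolean sameHash = copies.stream().allMatch(c -> c.partHash == hash);

        return sameCntr && !sameHash;
    }

    public static void main(String[] args) {
        // The grpId=374280886, partId=41 entry from the report above.
        List<PartCopy> part41 = Arrays.asList(
            new PartCopy("node4", 885018783, 16),
            new PartCopy("node3", 885018783, 16),
            new PartCopy("node2", -357162793, 16));

        System.out.println(isConflict(part41)); // prints: true
    }
}
{code}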



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-8620) Remove intOrder and lc keys from node info in control.sh --tx utility

2018-05-28 Thread Dmitry Sherstobitov (JIRA)
Dmitry Sherstobitov created IGNITE-8620:
---

 Summary: Remove intOrder and lc keys from node info in control.sh 
--tx utility
 Key: IGNITE-8620
 URL: https://issues.apache.org/jira/browse/IGNITE-8620
 Project: Ignite
  Issue Type: Improvement
Reporter: Dmitry Sherstobitov


Currently this information is displayed by the control.sh utility for each node:

TcpDiscoveryNode [id=2ed402d5-b5a7-4ade-a77a-12c2feea95ec, 
addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 172.25.1.47], 
sockAddrs=[/0:0:0:0:0:0:0:1%lo:0, /127.0.0.1:0, /172.25.1.47:0], discPort=0, 
order=6, intOrder=6, lastExchangeTime=1526482701193, loc=false, 
ver=2.5.1#20180510-sha1:ee417b82, isClient=true]

The loc and intOrder values are internal information and there is no need to 
display them.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-8602) Add support filter label=null for control.sh tx utility

2018-05-24 Thread Dmitry Sherstobitov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-8602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sherstobitov updated IGNITE-8602:

Description: Currently these transactions cannot be separated from the others by 
using the filter "label null"

> Add support filter label=null for control.sh tx utility
> ---
>
> Key: IGNITE-8602
> URL: https://issues.apache.org/jira/browse/IGNITE-8602
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.5
>Reporter: Dmitry Sherstobitov
>Priority: Major
>
> Currently these transactions cannot be separated from the others by using the filter 
> "label null"



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-8602) Add support filter label=null for control.sh tx utility

2018-05-24 Thread Dmitry Sherstobitov (JIRA)
Dmitry Sherstobitov created IGNITE-8602:
---

 Summary: Add support filter label=null for control.sh tx utility
 Key: IGNITE-8602
 URL: https://issues.apache.org/jira/browse/IGNITE-8602
 Project: Ignite
  Issue Type: Improvement
Affects Versions: 2.5
Reporter: Dmitry Sherstobitov






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-8601) Add to control.sh tx utility information about transaction start time

2018-05-24 Thread Dmitry Sherstobitov (JIRA)
Dmitry Sherstobitov created IGNITE-8601:
---

 Summary: Add to control.sh tx utility information about 
transaction start time
 Key: IGNITE-8601
 URL: https://issues.apache.org/jira/browse/IGNITE-8601
 Project: Ignite
  Issue Type: Improvement
Affects Versions: 2.5
Reporter: Dmitry Sherstobitov


This information will be useful



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (IGNITE-8466) Control.sh transacitions utility may hang on case with loading

2018-05-21 Thread Dmitry Sherstobitov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-8466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sherstobitov closed IGNITE-8466.
---

> Control.sh transacitions utility may hang on case with loading
> --
>
> Key: IGNITE-8466
> URL: https://issues.apache.org/jira/browse/IGNITE-8466
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.5
>Reporter: Dmitry Sherstobitov
>Priority: Critical
> Attachments: IGNITE-8466.zip
>
>
> Start nodes, activate and preload some Accounts data
>  Start client and run transactional loading (8 threads with ~1000ops/second - 
> moving some amount from one value to another)
>  Start 10 long running transactions (transactions with flexible sleep inside) 
> with label tx_*
>  Start control.sh --tx label tx kill
>  
> Last run of control.sh utility hangs. 
>  
> Attachment info:
>  grid1,2,3 - server logs
> grid20001 - client with preloading
>  grid20002 - client with transactional loading and LRTs (with stack trace)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IGNITE-8466) Control.sh transacitions utility may hang on case with loading

2018-05-21 Thread Dmitry Sherstobitov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-8466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sherstobitov resolved IGNITE-8466.
-
Resolution: Not A Bug

Incorrect test scenario: the --force flag was not used.

> Control.sh transacitions utility may hang on case with loading
> --
>
> Key: IGNITE-8466
> URL: https://issues.apache.org/jira/browse/IGNITE-8466
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.5
>Reporter: Dmitry Sherstobitov
>Priority: Critical
> Attachments: IGNITE-8466.zip
>
>
> Start nodes, activate and preload some Accounts data
>  Start client and run transactional loading (8 threads with ~1000ops/second - 
> moving some amount from one value to another)
>  Start 10 long running transactions (transactions with flexible sleep inside) 
> with label tx_*
>  Start control.sh --tx label tx kill
>  
> Last run of control.sh utility hangs. 
>  
> Attachment info:
>  grid1,2,3 - server logs
> grid20001 - client with preloading
>  grid20002 - client with transactional loading and LRTs (with stack trace)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (IGNITE-8476) AssertionError exception occurs when trying to remove node from baseline under loading

2018-05-18 Thread Dmitry Sherstobitov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-8476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16480807#comment-16480807
 ] 

Dmitry Sherstobitov edited comment on IGNITE-8476 at 5/18/18 3:33 PM:
--

Another test scenario with no transactional loading:
 # Start cluster, load data
 # Start client
 # Create transactional cache
 # Start long transactions (transactions with infinite sleep() and interrupt 
variable to call commit() on it)
 # Add new node in cluster that is not in baseline
 # Release transactions after some minor timeout
 # Try to get values from cluster that was affected by this long transactions

The first 3 cache.get() calls succeed; the next get() hangs and an AssertionError is 
thrown in the server logs


was (Author: qvad):
Another test scenario with no transactional loading:
 # Start cluster, load data
 # Start client
 # Create transactional cache
 # Start long transactions (transactions with infinite sleep() and interrupt 
variable to call commit() on it)
 # Add new node in cluster that not in baseline
 # Release transactions after some minor timeout
 # Try to get values from cluster that was affected by this long transactions

The first 3 cache.get() calls succeed; the next get() hangs and an AssertionError is 
thrown in the server logs

> AssertionError exception occurs when trying to remove node from baseline 
> under loading
> --
>
> Key: IGNITE-8476
> URL: https://issues.apache.org/jira/browse/IGNITE-8476
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.5
>Reporter: Dmitry Sherstobitov
>Assignee: Ivan Rakov
>Priority: Blocker
>
> Run 6 nodes, start loading (8 threads, 1000 2x cache.get() and 2x cache.put() 
> in each thread per second)
> Kill 2 nodes and try to remove one node from baseline using
> control.sh --baseline remove node1
>  control.sh --baseline version TOPOLOGY_VERSION
>  
> Utility hangs or connected client may hangs, this assertion appears in log 
> For transactional cache:
> {code:java}
> [16:32:58,960][SEVERE][sys-stripe-14-#15][G] Failed to execute runnable.
> java.lang.AssertionError: localNode = be945692-c750-4d72-b9a1-9ac4170ff125, 
> dhtNodes = [ZookeeperClusterNode [id=810226e6-656a-460d-8069-ca7d2dd294ef, 
> addrs=[172.17.0.1, 0:0:0:0:0:0:0:1%lo, 172.25.1.28, 127.0.0.1], order=1, 
> loc=false, client=false], ZookeeperClusterNode 
> [id=be945692-c750-4d72-b9a1-9ac4170ff125, addrs=[172.17.0.1, 172.25.1.28, 
> 0:0:0:0:0:0:0:1%lo, 127.0.0.1], order=3, loc=true, client=false], 
> ZookeeperClusterNode [id=db4503f6-9185-4673-b38c-8890dfa69511, 
> addrs=[172.17.0.1, 172.25.1.37, 0:0:0:0:0:0:0:1%lo, 127.0.0.1], order=5, 
> loc=false, client=false], ZookeeperClusterNode 
> [id=3b8d8d4f-3513-4d39-a1fd-7ec5b15fc653, addrs=[172.17.0.1, 172.25.1.37, 
> 127.0.0.1, 0:0:0:0:0:0:0:1%lo], order=4, loc=false, client=false], 
> ZookeeperClusterNode [id=2bfc8c2e-2f47-4126-9cc4-6f017ce7efde, 
> addrs=[172.17.0.1, 172.25.1.37, 0:0:0:0:0:0:0:1%lo, 127.0.0.1], order=6, 
> loc=false, client=false]]
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.map(GridDhtTxPrepareFuture.java:1520)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare0(GridDhtTxPrepareFuture.java:1239)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.mapIfLocked(GridDhtTxPrepareFuture.java:671)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare(GridDhtTxPrepareFuture.java:1048)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.prepareAsync(GridDhtTxLocal.java:397)
> at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.prepareNearTx(IgniteTxHandler.java:516)
> at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxPrepareRequest0(IgniteTxHandler.java:150)
> at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxPrepareRequest(IgniteTxHandler.java:135)
> at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.access$000(IgniteTxHandler.java:97)
> at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$1.apply(IgniteTxHandler.java:177)
> at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$1.apply(IgniteTxHandler.java:175)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1054)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:579)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378)
> at 
> org.apache.ignite.internal.p

[jira] [Comment Edited] (IGNITE-8476) AssertionError exception occurs when trying to remove node from baseline under loading

2018-05-18 Thread Dmitry Sherstobitov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-8476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16480807#comment-16480807
 ] 

Dmitry Sherstobitov edited comment on IGNITE-8476 at 5/18/18 3:33 PM:
--

Another test scenario with no transactional loading:
 # Start cluster, load data
 # Start client
 # Create transactional cache
 # Start long transactions (transactions with infinite sleep() and interrupt 
variable to call commit() on it)
 # Add new node in cluster that not in baseline
 # Release transactions after some minor timeout
 # Try to get values from cluster that was affected by this long transactions

The first 3 cache.get() calls succeed; the next get() hangs and an AssertionError is 
thrown in the server logs


was (Author: qvad):
Another test scenario with no transactional loading:
 # Start cluster, load data
 # Start client
 # Create transactional cache
 # Start long transactions (transactions with infinite sleep() and interrupt 
variable to call commit() on it)
 # Release transactions after some minor timeout
 # Try to get values from cluster that was affected by this long transactions

The first 3 cache.get() calls succeed; the next get() hangs and an AssertionError is 
thrown in the server logs

> AssertionError exception occurs when trying to remove node from baseline 
> under loading
> --
>
> Key: IGNITE-8476
> URL: https://issues.apache.org/jira/browse/IGNITE-8476
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.5
>Reporter: Dmitry Sherstobitov
>Assignee: Ivan Rakov
>Priority: Blocker
>
> Run 6 nodes, start loading (8 threads, 1000 2x cache.get() and 2x cache.put() 
> in each thread per second)
> Kill 2 nodes and try to remove one node from baseline using
> control.sh --baseline remove node1
>  control.sh --baseline version TOPOLOGY_VERSION
>  
> Utility hangs or connected client may hangs, this assertion appears in log 
> For transactional cache:
> {code:java}
> [16:32:58,960][SEVERE][sys-stripe-14-#15][G] Failed to execute runnable.
> java.lang.AssertionError: localNode = be945692-c750-4d72-b9a1-9ac4170ff125, 
> dhtNodes = [ZookeeperClusterNode [id=810226e6-656a-460d-8069-ca7d2dd294ef, 
> addrs=[172.17.0.1, 0:0:0:0:0:0:0:1%lo, 172.25.1.28, 127.0.0.1], order=1, 
> loc=false, client=false], ZookeeperClusterNode 
> [id=be945692-c750-4d72-b9a1-9ac4170ff125, addrs=[172.17.0.1, 172.25.1.28, 
> 0:0:0:0:0:0:0:1%lo, 127.0.0.1], order=3, loc=true, client=false], 
> ZookeeperClusterNode [id=db4503f6-9185-4673-b38c-8890dfa69511, 
> addrs=[172.17.0.1, 172.25.1.37, 0:0:0:0:0:0:0:1%lo, 127.0.0.1], order=5, 
> loc=false, client=false], ZookeeperClusterNode 
> [id=3b8d8d4f-3513-4d39-a1fd-7ec5b15fc653, addrs=[172.17.0.1, 172.25.1.37, 
> 127.0.0.1, 0:0:0:0:0:0:0:1%lo], order=4, loc=false, client=false], 
> ZookeeperClusterNode [id=2bfc8c2e-2f47-4126-9cc4-6f017ce7efde, 
> addrs=[172.17.0.1, 172.25.1.37, 0:0:0:0:0:0:0:1%lo, 127.0.0.1], order=6, 
> loc=false, client=false]]
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.map(GridDhtTxPrepareFuture.java:1520)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare0(GridDhtTxPrepareFuture.java:1239)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.mapIfLocked(GridDhtTxPrepareFuture.java:671)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare(GridDhtTxPrepareFuture.java:1048)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.prepareAsync(GridDhtTxLocal.java:397)
> at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.prepareNearTx(IgniteTxHandler.java:516)
> at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxPrepareRequest0(IgniteTxHandler.java:150)
> at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxPrepareRequest(IgniteTxHandler.java:135)
> at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.access$000(IgniteTxHandler.java:97)
> at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$1.apply(IgniteTxHandler.java:177)
> at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$1.apply(IgniteTxHandler.java:175)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1054)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:579)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(Gr

[jira] [Commented] (IGNITE-8476) AssertionError exception occurs when trying to remove node from baseline under loading

2018-05-18 Thread Dmitry Sherstobitov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-8476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16480807#comment-16480807
 ] 

Dmitry Sherstobitov commented on IGNITE-8476:
-

Another test scenario with no transactional loading:
 # Start cluster, load data
 # Start client
 # Create transactional cache
 # Start long transactions (transactions with infinite sleep() and interrupt 
variable to call commit() on it)
 # Release transactions after some minor timeout
 # Try to get values from cluster that was affected by this long transactions

The first 3 cache.get() calls succeed; the next get() hangs and an AssertionError is 
thrown in the server logs (a sketch of this kind of long transaction is shown below)
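
A minimal sketch of the kind of long transaction described above (an assumed shape, not 
the original test code; LongTxHolder is an invented name): the transaction enlists a 
key, sleeps until an external flag releases it, and only then calls commit().
{code:java}
import java.util.concurrent.atomic.AtomicBoolean;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.transactions.Transaction;

public class LongTxHolder implements Runnable {
    private final Ignite ignite;
    private final IgniteCache<Integer, Integer> cache;
    private final AtomicBoolean release = new AtomicBoolean(false);

    public LongTxHolder(Ignite ignite, IgniteCache<Integer, Integer> cache) {
        this.ignite = ignite;
        this.cache = cache;
    }

    /** Flipping this flag lets the transaction commit ("release"). */
    public void release() { release.set(true); }

    @Override public void run() {
        try (Transaction tx = ignite.transactions().txStart()) {
            cache.put(1, 42); // Enlist a key so the transaction actually holds locks.

            while (!release.get())
                Thread.sleep(100); // "Infinite" sleep until released from outside.

            tx.commit();
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // Tx is rolled back by the close() call.
        }
    }
}
{code}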

> AssertionError exception occurs when trying to remove node from baseline 
> under loading
> --
>
> Key: IGNITE-8476
> URL: https://issues.apache.org/jira/browse/IGNITE-8476
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.5
>Reporter: Dmitry Sherstobitov
>Assignee: Ivan Rakov
>Priority: Blocker
>
> Run 6 nodes, start loading (8 threads, 1000 2x cache.get() and 2x cache.put() 
> in each thread per second)
> Kill 2 nodes and try to remove one node from baseline using
> control.sh --baseline remove node1
>  control.sh --baseline version TOPOLOGY_VERSION
>  
> Utility hangs or connected client may hangs, this assertion appears in log 
> For transactional cache:
> {code:java}
> [16:32:58,960][SEVERE][sys-stripe-14-#15][G] Failed to execute runnable.
> java.lang.AssertionError: localNode = be945692-c750-4d72-b9a1-9ac4170ff125, 
> dhtNodes = [ZookeeperClusterNode [id=810226e6-656a-460d-8069-ca7d2dd294ef, 
> addrs=[172.17.0.1, 0:0:0:0:0:0:0:1%lo, 172.25.1.28, 127.0.0.1], order=1, 
> loc=false, client=false], ZookeeperClusterNode 
> [id=be945692-c750-4d72-b9a1-9ac4170ff125, addrs=[172.17.0.1, 172.25.1.28, 
> 0:0:0:0:0:0:0:1%lo, 127.0.0.1], order=3, loc=true, client=false], 
> ZookeeperClusterNode [id=db4503f6-9185-4673-b38c-8890dfa69511, 
> addrs=[172.17.0.1, 172.25.1.37, 0:0:0:0:0:0:0:1%lo, 127.0.0.1], order=5, 
> loc=false, client=false], ZookeeperClusterNode 
> [id=3b8d8d4f-3513-4d39-a1fd-7ec5b15fc653, addrs=[172.17.0.1, 172.25.1.37, 
> 127.0.0.1, 0:0:0:0:0:0:0:1%lo], order=4, loc=false, client=false], 
> ZookeeperClusterNode [id=2bfc8c2e-2f47-4126-9cc4-6f017ce7efde, 
> addrs=[172.17.0.1, 172.25.1.37, 0:0:0:0:0:0:0:1%lo, 127.0.0.1], order=6, 
> loc=false, client=false]]
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.map(GridDhtTxPrepareFuture.java:1520)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare0(GridDhtTxPrepareFuture.java:1239)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.mapIfLocked(GridDhtTxPrepareFuture.java:671)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare(GridDhtTxPrepareFuture.java:1048)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.prepareAsync(GridDhtTxLocal.java:397)
> at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.prepareNearTx(IgniteTxHandler.java:516)
> at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxPrepareRequest0(IgniteTxHandler.java:150)
> at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxPrepareRequest(IgniteTxHandler.java:135)
> at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.access$000(IgniteTxHandler.java:97)
> at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$1.apply(IgniteTxHandler.java:177)
> at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$1.apply(IgniteTxHandler.java:175)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1054)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:579)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293)
> at 
> org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
> at 
> org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
> at 
> org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoMan

[jira] [Comment Edited] (IGNITE-8476) AssertionError exception occurs when trying to remove node from baseline under loading

2018-05-18 Thread Dmitry Sherstobitov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-8476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16480756#comment-16480756
 ] 

Dmitry Sherstobitov edited comment on IGNITE-8476 at 5/18/18 2:51 PM:
--

[~ivan.glukos] I've got the same AssertionError while adding a node from the baseline 
with a clean LFS.
 # Start cluster, activate, start client with loading
 # Kill single node, clean LFS and start it again
 # AssertionError

Next steps in this case:
 # Add new single node in cluster
 # Add new node to baseline
 # Wait for the transactional loading to end
 # LRTs appear in the client logs and the transactional loading hangs (transactions with 
timeout=5 ms)


was (Author: qvad):
[~ivan.glukos] I've got the same AssertionError while adding a node from the baseline 
with a clean LFS.
 # Start cluster, activate, start client with loading
 # Kill single node, clean LFS and start it again
 # AssertionError

> AssertionError exception occurs when trying to remove node from baseline 
> under loading
> --
>
> Key: IGNITE-8476
> URL: https://issues.apache.org/jira/browse/IGNITE-8476
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.5
>Reporter: Dmitry Sherstobitov
>Assignee: Ivan Rakov
>Priority: Blocker
>
> Run 6 nodes, start loading (8 threads, 1000 2x cache.get() and 2x cache.put() 
> in each thread per second)
> Kill 2 nodes and try to remove one node from baseline using
> control.sh --baseline remove node1
>  control.sh --baseline version TOPOLOGY_VERSION
>  
> Utility hangs or connected client may hangs, this assertion appears in log 
> For transactional cache:
> {code:java}
> [16:32:58,960][SEVERE][sys-stripe-14-#15][G] Failed to execute runnable.
> java.lang.AssertionError: localNode = be945692-c750-4d72-b9a1-9ac4170ff125, 
> dhtNodes = [ZookeeperClusterNode [id=810226e6-656a-460d-8069-ca7d2dd294ef, 
> addrs=[172.17.0.1, 0:0:0:0:0:0:0:1%lo, 172.25.1.28, 127.0.0.1], order=1, 
> loc=false, client=false], ZookeeperClusterNode 
> [id=be945692-c750-4d72-b9a1-9ac4170ff125, addrs=[172.17.0.1, 172.25.1.28, 
> 0:0:0:0:0:0:0:1%lo, 127.0.0.1], order=3, loc=true, client=false], 
> ZookeeperClusterNode [id=db4503f6-9185-4673-b38c-8890dfa69511, 
> addrs=[172.17.0.1, 172.25.1.37, 0:0:0:0:0:0:0:1%lo, 127.0.0.1], order=5, 
> loc=false, client=false], ZookeeperClusterNode 
> [id=3b8d8d4f-3513-4d39-a1fd-7ec5b15fc653, addrs=[172.17.0.1, 172.25.1.37, 
> 127.0.0.1, 0:0:0:0:0:0:0:1%lo], order=4, loc=false, client=false], 
> ZookeeperClusterNode [id=2bfc8c2e-2f47-4126-9cc4-6f017ce7efde, 
> addrs=[172.17.0.1, 172.25.1.37, 0:0:0:0:0:0:0:1%lo, 127.0.0.1], order=6, 
> loc=false, client=false]]
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.map(GridDhtTxPrepareFuture.java:1520)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare0(GridDhtTxPrepareFuture.java:1239)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.mapIfLocked(GridDhtTxPrepareFuture.java:671)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare(GridDhtTxPrepareFuture.java:1048)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.prepareAsync(GridDhtTxLocal.java:397)
> at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.prepareNearTx(IgniteTxHandler.java:516)
> at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxPrepareRequest0(IgniteTxHandler.java:150)
> at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxPrepareRequest(IgniteTxHandler.java:135)
> at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.access$000(IgniteTxHandler.java:97)
> at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$1.apply(IgniteTxHandler.java:177)
> at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$1.apply(IgniteTxHandler.java:175)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1054)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:579)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293)
> at 
> org.apache.ignite.internal.managers.communication.GridIoM

[jira] [Commented] (IGNITE-8476) AssertionError exception occurs when trying to remove node from baseline under loading

2018-05-18 Thread Dmitry Sherstobitov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-8476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16480756#comment-16480756
 ] 

Dmitry Sherstobitov commented on IGNITE-8476:
-

[~ivan.glukos] I've got the same AssertionError while adding a node from the baseline 
with a clean LFS.
 # Start cluster, activate, start a client with loading
 # Kill a single node, clean its LFS and start it again (see the sketch below)
 # AssertionError
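
For reference, a minimal single-process sketch of steps 2-3 above (stop a 
persistence-enabled node, wipe its LFS and start it again). The instance name, 
consistent id and the default ${IGNITE_HOME}/work/db/<consistentId> layout are 
assumptions, not taken from the original environment:
{code:java}
// Sketch only: stop a persistent node, delete its local file storage and restart it.
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;

public class CleanLfsRestart {
    public static void main(String[] args) throws Exception {
        DataStorageConfiguration storageCfg = new DataStorageConfiguration();
        storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);

        IgniteConfiguration cfg = new IgniteConfiguration()
            .setIgniteInstanceName("node1")
            .setConsistentId("node1")
            .setDataStorageConfiguration(storageCfg);

        Ignite node = Ignition.start(cfg);
        node.cluster().active(true); // activate; client loading assumed to run elsewhere

        // Step 2: stop the node (graceful stop here for simplicity) and clean its LFS.
        Ignition.stop("node1", true);

        Path lfs = Paths.get(System.getenv("IGNITE_HOME"), "work", "db", "node1");
        if (Files.exists(lfs)) {
            Files.walk(lfs)
                .sorted(Comparator.reverseOrder())
                .forEach(p -> p.toFile().delete());
        }

        // Step 3: start the node again with an empty LFS; the AssertionError
        // was observed around this rejoin under load.
        Ignition.start(new IgniteConfiguration(cfg));
    }
}
{code}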

> AssertionError exception occurs when trying to remove node from baseline 
> under loading
> --
>
> Key: IGNITE-8476
> URL: https://issues.apache.org/jira/browse/IGNITE-8476
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.5
>Reporter: Dmitry Sherstobitov
>Assignee: Ivan Rakov
>Priority: Blocker
>
> Run 6 nodes, start loading (8 threads, 1000 iterations per second in each 
> thread, each iteration doing 2x cache.get() and 2x cache.put())
> Kill 2 nodes and try to remove one node from baseline using
> control.sh --baseline remove node1
>  control.sh --baseline version TOPOLOGY_VERSION
>  
> The utility hangs or a connected client may hang; this assertion appears in the log.
> For transactional cache:
> {code:java}
> [16:32:58,960][SEVERE][sys-stripe-14-#15][G] Failed to execute runnable.
> java.lang.AssertionError: localNode = be945692-c750-4d72-b9a1-9ac4170ff125, 
> dhtNodes = [ZookeeperClusterNode [id=810226e6-656a-460d-8069-ca7d2dd294ef, 
> addrs=[172.17.0.1, 0:0:0:0:0:0:0:1%lo, 172.25.1.28, 127.0.0.1], order=1, 
> loc=false, client=false], ZookeeperClusterNode 
> [id=be945692-c750-4d72-b9a1-9ac4170ff125, addrs=[172.17.0.1, 172.25.1.28, 
> 0:0:0:0:0:0:0:1%lo, 127.0.0.1], order=3, loc=true, client=false], 
> ZookeeperClusterNode [id=db4503f6-9185-4673-b38c-8890dfa69511, 
> addrs=[172.17.0.1, 172.25.1.37, 0:0:0:0:0:0:0:1%lo, 127.0.0.1], order=5, 
> loc=false, client=false], ZookeeperClusterNode 
> [id=3b8d8d4f-3513-4d39-a1fd-7ec5b15fc653, addrs=[172.17.0.1, 172.25.1.37, 
> 127.0.0.1, 0:0:0:0:0:0:0:1%lo], order=4, loc=false, client=false], 
> ZookeeperClusterNode [id=2bfc8c2e-2f47-4126-9cc4-6f017ce7efde, 
> addrs=[172.17.0.1, 172.25.1.37, 0:0:0:0:0:0:0:1%lo, 127.0.0.1], order=6, 
> loc=false, client=false]]
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.map(GridDhtTxPrepareFuture.java:1520)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare0(GridDhtTxPrepareFuture.java:1239)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.mapIfLocked(GridDhtTxPrepareFuture.java:671)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare(GridDhtTxPrepareFuture.java:1048)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.prepareAsync(GridDhtTxLocal.java:397)
> at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.prepareNearTx(IgniteTxHandler.java:516)
> at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxPrepareRequest0(IgniteTxHandler.java:150)
> at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxPrepareRequest(IgniteTxHandler.java:135)
> at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.access$000(IgniteTxHandler.java:97)
> at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$1.apply(IgniteTxHandler.java:177)
> at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$1.apply(IgniteTxHandler.java:175)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1054)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:579)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293)
> at 
> org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
> at 
> org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
> at 
> org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:125)
> at 
> org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1091)
> at 
> org.apache.ignite.internal.util.StripedExecutor$Stripe.run(StripedExecutor.java:511)
> at java.lang.Thread.run(Thr

[jira] [Commented] (IGNITE-8476) AssertionError exception occurs when trying to remove node from baseline under loading

2018-05-16 Thread Dmitry Sherstobitov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-8476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16477188#comment-16477188
 ] 

Dmitry Sherstobitov commented on IGNITE-8476:
-

{code:java}
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:util="http://www.springframework.org/schema/util"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
       http://www.springframework.org/schema/beans/spring-beans-2.5.xsd
       http://www.springframework.org/schema/util
       http://www.springframework.org/schema/util/spring-util-2.0.xsd">

    <!-- The bean definitions of the attached Ignite configuration were stripped
         by the mail archive and are not recoverable; only the schema header above
         survived in the original message. -->

</beans>
{code}

> AssertionError exception occurs when trying to remove node from baseline 
> under loading
> --
>
> Key: IGNITE-8476
> URL: https://issues.apache.org/jira/browse/IGNITE-8476
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.5
>Reporter: Dmitry Sherstobitov
>Assignee: Ivan Rakov
>Priority: Blocker
>
> Run 6 nodes, start loading (8 threads, 1000 iterations per second in each 
> thread, each iteration doing 2x cache.get() and 2x cache.put())
> Kill 2 nodes and try to remove one node from baseline using
> control.sh --base

[jira] [Updated] (IGNITE-8476) AssertionError exception occurs when trying to remove node from baseline under loading

2018-05-14 Thread Dmitry Sherstobitov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-8476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sherstobitov updated IGNITE-8476:

Description: 
Run 6 nodes, start loading (8 threads, 1000 iterations per second in each 
thread, each iteration doing 2x cache.get() and 2x cache.put())
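
A minimal sketch of such a load client follows; only the thread count and the 
2x get / 2x put pattern come from the description above, while the cache name, 
key range, value type and pacing are assumptions:
{code:java}
// Sketch of the load client: 8 threads, ~1000 iterations/sec per thread,
// each iteration doing 2x cache.get() and 2x cache.put().
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

import java.util.concurrent.ThreadLocalRandom;

public class BaselineLoadClient {
    public static void main(String[] args) {
        Ignite client = Ignition.start(new IgniteConfiguration().setClientMode(true));
        IgniteCache<Integer, Integer> cache = client.getOrCreateCache("tx-cache");

        for (int t = 0; t < 8; t++) {
            new Thread(() -> {
                ThreadLocalRandom rnd = ThreadLocalRandom.current();

                while (!Thread.currentThread().isInterrupted()) {
                    long start = System.currentTimeMillis();

                    for (int i = 0; i < 1000; i++) {
                        int k1 = rnd.nextInt(10_000);
                        int k2 = rnd.nextInt(10_000);

                        cache.get(k1);    // 2x cache.get()
                        cache.get(k2);
                        cache.put(k1, i); // 2x cache.put()
                        cache.put(k2, i);
                    }

                    // Crude pacing towards ~1000 iterations per second.
                    long left = 1000 - (System.currentTimeMillis() - start);
                    if (left > 0) {
                        try {
                            Thread.sleep(left);
                        }
                        catch (InterruptedException e) {
                            return;
                        }
                    }
                }
            }).start();
        }
    }
}
{code}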

Kill 2 nodes and try to remove one node from baseline using

control.sh --baseline remove node1
 control.sh --baseline version TOPOLOGY_VERSION
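
A rough programmatic counterpart of these two commands, sketched via the 
IgniteCluster API (treating "node1" as a consistent id is an assumption):
{code:java}
import org.apache.ignite.Ignite;
import org.apache.ignite.cluster.BaselineNode;

import java.util.ArrayList;
import java.util.Collection;

public class BaselineOps {
    /** Equivalent of: control.sh --baseline remove node1 */
    static void removeFromBaseline(Ignite ignite, Object consistentId) {
        Collection<BaselineNode> baseline =
            new ArrayList<>(ignite.cluster().currentBaselineTopology());

        baseline.removeIf(n -> n.consistentId().equals(consistentId));

        ignite.cluster().setBaselineTopology(baseline);
    }

    /** Equivalent of: control.sh --baseline version TOPOLOGY_VERSION */
    static void setBaselineToVersion(Ignite ignite, long topVer) {
        ignite.cluster().setBaselineTopology(topVer);
    }
}
{code}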

 

The utility hangs or a connected client may hang; this assertion appears in the log.

For transactional cache:
{code:java}
[16:32:58,960][SEVERE][sys-stripe-14-#15][G] Failed to execute runnable.
java.lang.AssertionError: localNode = be945692-c750-4d72-b9a1-9ac4170ff125, 
dhtNodes = [ZookeeperClusterNode [id=810226e6-656a-460d-8069-ca7d2dd294ef, 
addrs=[172.17.0.1, 0:0:0:0:0:0:0:1%lo, 172.25.1.28, 127.0.0.1], order=1, 
loc=false, client=false], ZookeeperClusterNode 
[id=be945692-c750-4d72-b9a1-9ac4170ff125, addrs=[172.17.0.1, 172.25.1.28, 
0:0:0:0:0:0:0:1%lo, 127.0.0.1], order=3, loc=true, client=false], 
ZookeeperClusterNode [id=db4503f6-9185-4673-b38c-8890dfa69511, 
addrs=[172.17.0.1, 172.25.1.37, 0:0:0:0:0:0:0:1%lo, 127.0.0.1], order=5, 
loc=false, client=false], ZookeeperClusterNode 
[id=3b8d8d4f-3513-4d39-a1fd-7ec5b15fc653, addrs=[172.17.0.1, 172.25.1.37, 
127.0.0.1, 0:0:0:0:0:0:0:1%lo], order=4, loc=false, client=false], 
ZookeeperClusterNode [id=2bfc8c2e-2f47-4126-9cc4-6f017ce7efde, 
addrs=[172.17.0.1, 172.25.1.37, 0:0:0:0:0:0:0:1%lo, 127.0.0.1], order=6, 
loc=false, client=false]]
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.map(GridDhtTxPrepareFuture.java:1520)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare0(GridDhtTxPrepareFuture.java:1239)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.mapIfLocked(GridDhtTxPrepareFuture.java:671)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare(GridDhtTxPrepareFuture.java:1048)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.prepareAsync(GridDhtTxLocal.java:397)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.prepareNearTx(IgniteTxHandler.java:516)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxPrepareRequest0(IgniteTxHandler.java:150)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxPrepareRequest(IgniteTxHandler.java:135)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.access$000(IgniteTxHandler.java:97)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$1.apply(IgniteTxHandler.java:177)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$1.apply(IgniteTxHandler.java:175)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1054)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:579)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:125)
at 
org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1091)
at 
org.apache.ignite.internal.util.StripedExecutor$Stripe.run(StripedExecutor.java:511)
at java.lang.Thread.run(Thread.java:748){code}
For atomic cache:
{code:java}
[18:40:12,858][SEVERE][sys-stripe-10-#11][GridCacheIoManager] Failed to process 
message [senderId=9fde40b1-3b21-49de-b1ad-cdd0d9d902e5, messageType=class 
o.a.i.i.processors.cache.distributed.near.GridNearSingleGetRequest]
java.lang.AssertionError: Wrong ready topology version for invalid partitions 
response [topVer=AffinityTopologyVersion [topVer=11, minorTopVer=0], 
req=GridNearSingleGetRequest [futId=1526053201329, key=KeyCacheObjectImpl 
[part=42, val=1514, hasValBytes=true], flags=1, topVer=AffinityTopologyVersion 
[topVer=11, minorTopVer=0], subjId=9fde40b1-3b21-49de-b1ad-cdd0d9d902e5, 
taskNameHash=0, createTtl=-1, accessTtl=-1]]
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter$6.apply(GridDhtCacheAdapter.java:943)
at 
org.apache.ignite.internal.processors.cache.distrib

[jira] [Updated] (IGNITE-8476) AssertionError exception occurs when trying to remove node from baseline under loading

2018-05-14 Thread Dmitry Sherstobitov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-8476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sherstobitov updated IGNITE-8476:

Description: 
Run 6 nodes, start loading (8 threads, 1000 cache.put() operations per second 
in each thread)

Kill 2 nodes and try to remove one node from baseline using

control.sh --baseline remove node1
 control.sh --baseline version TOPOLOGY_VERSION

 

The utility hangs or a connected client may hang; this assertion appears in the log:
{code:java}
[18:40:12,858][SEVERE][sys-stripe-10-#11][GridCacheIoManager] Failed to process 
message [senderId=9fde40b1-3b21-49de-b1ad-cdd0d9d902e5, messageType=class 
o.a.i.i.processors.cache.distributed.near.GridNearSingleGetRequest]
java.lang.AssertionError: Wrong ready topology version for invalid partitions 
response [topVer=AffinityTopologyVersion [topVer=11, minorTopVer=0], 
req=GridNearSingleGetRequest [futId=1526053201329, key=KeyCacheObjectImpl 
[part=42, val=1514, hasValBytes=true], flags=1, topVer=AffinityTopologyVersion 
[topVer=11, minorTopVer=0], subjId=9fde40b1-3b21-49de-b1ad-cdd0d9d902e5, 
taskNameHash=0, createTtl=-1, accessTtl=-1]]
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter$6.apply(GridDhtCacheAdapter.java:943)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter$6.apply(GridDhtCacheAdapter.java:906)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:353)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter.processNearSingleGetRequest(GridDhtCacheAdapter.java:906)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$300(GridDhtAtomicCache.java:130)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$4.apply(GridDhtAtomicCache.java:252)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$4.apply(GridDhtAtomicCache.java:247)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1054)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:579)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:125)
at 
org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1091)
at 
org.apache.ignite.internal.util.StripedExecutor$Stripe.run(StripedExecutor.java:511)
at java.lang.Thread.run(Thread.java:748){code}

  was:
Run 6 nodes, start loading (8 threads, 1000 cache.put() operations per second 
in each thread)

Kill 2 nodes and try to remove one node from baseline using

control.sh --baseline remove node1
control.sh --baseline version TOPOLOGY_VERSION

 

The utility hangs; this assertion appears in the log:
{code:java}
[18:40:12,858][SEVERE][sys-stripe-10-#11][GridCacheIoManager] Failed to process 
message [senderId=9fde40b1-3b21-49de-b1ad-cdd0d9d902e5, messageType=class 
o.a.i.i.processors.cache.distributed.near.GridNearSingleGetRequest]
java.lang.AssertionError: Wrong ready topology version for invalid partitions 
response [topVer=AffinityTopologyVersion [topVer=11, minorTopVer=0], 
req=GridNearSingleGetRequest [futId=1526053201329, key=KeyCacheObjectImpl 
[part=42, val=1514, hasValBytes=true], flags=1, topVer=AffinityTopologyVersion 
[topVer=11, minorTopVer=0], subjId=9fde40b1-3b21-49de-b1ad-cdd0d9d902e5, 
taskNameHash=0, createTtl=-1, accessTtl=-1]]
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter$6.apply(GridDhtCacheAdapter.java:943)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter$6.apply(GridDhtCacheAdapter.java:906)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:353)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter.processNearSingleGetRequest(GridDhtCacheAdapter.java:906)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCa

[jira] [Updated] (IGNITE-8476) AssertionError exception occurs when trying to remove node from baseline under loading

2018-05-14 Thread Dmitry Sherstobitov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-8476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sherstobitov updated IGNITE-8476:

Summary: AssertionError exception occurs when trying to remove node from 
baseline under loading  (was: AssertionError exception occurs when trying to 
remove node from baseline by consistentId under loading)

> AssertionError exception occurs when trying to remove node from baseline 
> under loading
> --
>
> Key: IGNITE-8476
> URL: https://issues.apache.org/jira/browse/IGNITE-8476
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.5
>Reporter: Dmitry Sherstobitov
>Priority: Blocker
>
> Run 6 nodes, start loading (8 threads, 1000 cache.put() operations per second 
> in each thread)
> Kill 2 nodes and try to remove one node from baseline using
> control.sh --baseline remove node1
> control.sh --baseline version TOPOLOGY_VERSION
>  
> The utility hangs; this assertion appears in the log:
> {code:java}
> [18:40:12,858][SEVERE][sys-stripe-10-#11][GridCacheIoManager] Failed to 
> process message [senderId=9fde40b1-3b21-49de-b1ad-cdd0d9d902e5, 
> messageType=class 
> o.a.i.i.processors.cache.distributed.near.GridNearSingleGetRequest]
> java.lang.AssertionError: Wrong ready topology version for invalid partitions 
> response [topVer=AffinityTopologyVersion [topVer=11, minorTopVer=0], 
> req=GridNearSingleGetRequest [futId=1526053201329, key=KeyCacheObjectImpl 
> [part=42, val=1514, hasValBytes=true], flags=1, 
> topVer=AffinityTopologyVersion [topVer=11, minorTopVer=0], 
> subjId=9fde40b1-3b21-49de-b1ad-cdd0d9d902e5, taskNameHash=0, createTtl=-1, 
> accessTtl=-1]]
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter$6.apply(GridDhtCacheAdapter.java:943)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter$6.apply(GridDhtCacheAdapter.java:906)
> at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383)
> at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:353)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter.processNearSingleGetRequest(GridDhtCacheAdapter.java:906)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$300(GridDhtAtomicCache.java:130)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$4.apply(GridDhtAtomicCache.java:252)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$4.apply(GridDhtAtomicCache.java:247)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1054)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:579)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293)
> at 
> org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
> at 
> org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
> at 
> org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:125)
> at 
> org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1091)
> at 
> org.apache.ignite.internal.util.StripedExecutor$Stripe.run(StripedExecutor.java:511)
> at java.lang.Thread.run(Thread.java:748){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-8476) AssertionError exception occurs when trying to remove node from baseline by consistentId under loading

2018-05-14 Thread Dmitry Sherstobitov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-8476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sherstobitov updated IGNITE-8476:

Description: 
Run 6 nodes, start loading (8 threads, 1000 cache.put() operations per second 
in each thread)

Kill 2 nodes and try to remove one node from baseline using

control.sh --baseline remove node1
control.sh --baseline version TOPOLOGY_VERSION

 

The utility hangs; this assertion appears in the log:
{code:java}
[18:40:12,858][SEVERE][sys-stripe-10-#11][GridCacheIoManager] Failed to process 
message [senderId=9fde40b1-3b21-49de-b1ad-cdd0d9d902e5, messageType=class 
o.a.i.i.processors.cache.distributed.near.GridNearSingleGetRequest]
java.lang.AssertionError: Wrong ready topology version for invalid partitions 
response [topVer=AffinityTopologyVersion [topVer=11, minorTopVer=0], 
req=GridNearSingleGetRequest [futId=1526053201329, key=KeyCacheObjectImpl 
[part=42, val=1514, hasValBytes=true], flags=1, topVer=AffinityTopologyVersion 
[topVer=11, minorTopVer=0], subjId=9fde40b1-3b21-49de-b1ad-cdd0d9d902e5, 
taskNameHash=0, createTtl=-1, accessTtl=-1]]
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter$6.apply(GridDhtCacheAdapter.java:943)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter$6.apply(GridDhtCacheAdapter.java:906)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:353)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter.processNearSingleGetRequest(GridDhtCacheAdapter.java:906)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$300(GridDhtAtomicCache.java:130)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$4.apply(GridDhtAtomicCache.java:252)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$4.apply(GridDhtAtomicCache.java:247)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1054)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:579)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:125)
at 
org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1091)
at 
org.apache.ignite.internal.util.StripedExecutor$Stripe.run(StripedExecutor.java:511)
at java.lang.Thread.run(Thread.java:748){code}

  was:
Run 6 nodes, start loading (8 threads, 1000 cache.put() operations per second 
in each thread)

Kill 2 nodes and try to remove one node from baseline using

control.sh --baseline remove node1

 

The utility hangs; this assertion appears in the log:
{code:java}
[18:40:12,858][SEVERE][sys-stripe-10-#11][GridCacheIoManager] Failed to process 
message [senderId=9fde40b1-3b21-49de-b1ad-cdd0d9d902e5, messageType=class 
o.a.i.i.processors.cache.distributed.near.GridNearSingleGetRequest]
java.lang.AssertionError: Wrong ready topology version for invalid partitions 
response [topVer=AffinityTopologyVersion [topVer=11, minorTopVer=0], 
req=GridNearSingleGetRequest [futId=1526053201329, key=KeyCacheObjectImpl 
[part=42, val=1514, hasValBytes=true], flags=1, topVer=AffinityTopologyVersion 
[topVer=11, minorTopVer=0], subjId=9fde40b1-3b21-49de-b1ad-cdd0d9d902e5, 
taskNameHash=0, createTtl=-1, accessTtl=-1]]
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter$6.apply(GridDhtCacheAdapter.java:943)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter$6.apply(GridDhtCacheAdapter.java:906)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:353)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter.processNearSingleGetRequest(GridDhtCacheAdapter.java:906)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$300(GridDhtAtomicCache.java:130)
at 
org.apache.ignite.internal.proc

[jira] [Updated] (IGNITE-8476) AssertionError exception occurs when trying to remove node from baseline by consistentId under loading

2018-05-14 Thread Dmitry Sherstobitov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-8476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Sherstobitov updated IGNITE-8476:

Priority: Blocker  (was: Critical)

> AssertionError exception occurs when trying to remove node from baseline by 
> consistentId under loading
> --
>
> Key: IGNITE-8476
> URL: https://issues.apache.org/jira/browse/IGNITE-8476
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.5
>Reporter: Dmitry Sherstobitov
>Priority: Blocker
>
> Run 6 nodes, start loading (8 threads, 1000 cache.put() operations per second 
> in each thread)
> Kill 2 nodes and try to remove one node from baseline using
> control.sh --baseline remove node1
>  
> The utility hangs; this assertion appears in the log:
> {code:java}
> [18:40:12,858][SEVERE][sys-stripe-10-#11][GridCacheIoManager] Failed to 
> process message [senderId=9fde40b1-3b21-49de-b1ad-cdd0d9d902e5, 
> messageType=class 
> o.a.i.i.processors.cache.distributed.near.GridNearSingleGetRequest]
> java.lang.AssertionError: Wrong ready topology version for invalid partitions 
> response [topVer=AffinityTopologyVersion [topVer=11, minorTopVer=0], 
> req=GridNearSingleGetRequest [futId=1526053201329, key=KeyCacheObjectImpl 
> [part=42, val=1514, hasValBytes=true], flags=1, 
> topVer=AffinityTopologyVersion [topVer=11, minorTopVer=0], 
> subjId=9fde40b1-3b21-49de-b1ad-cdd0d9d902e5, taskNameHash=0, createTtl=-1, 
> accessTtl=-1]]
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter$6.apply(GridDhtCacheAdapter.java:943)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter$6.apply(GridDhtCacheAdapter.java:906)
> at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383)
> at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:353)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter.processNearSingleGetRequest(GridDhtCacheAdapter.java:906)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$300(GridDhtAtomicCache.java:130)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$4.apply(GridDhtAtomicCache.java:252)
> at 
> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$4.apply(GridDhtAtomicCache.java:247)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1054)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:579)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99)
> at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293)
> at 
> org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
> at 
> org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
> at 
> org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:125)
> at 
> org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1091)
> at 
> org.apache.ignite.internal.util.StripedExecutor$Stripe.run(StripedExecutor.java:511)
> at java.lang.Thread.run(Thread.java:748){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

