[jira] [Closed] (IGNITE-15317) Cache.put/get Jepsen Elle test failed for Java Thin Client
[ https://issues.apache.org/jira/browse/IGNITE-15317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitry Sherstobitov closed IGNITE-15317.
----------------------------------------
Ignite Flags: (was: Docs Required, Release Notes Required)

> Cache.put/get Jepsen Elle test failed for Java Thin Client
> -----------------------------------------------------------
>
>              Key: IGNITE-15317
>              URL: https://issues.apache.org/jira/browse/IGNITE-15317
>          Project: Ignite
>       Issue Type: Bug
>       Components: thin client
> Affects Versions: 2.10
>         Reporter: Dmitry Sherstobitov
>         Priority: Critical
>
> I've created a simple Jepsen Elle test for basic cache.get/put functionality. There are two versions of the client: one for the thick client and another for the thin client. The invoke code is exactly the same.
> The test passed for the thick client and failed for the thin client. Note that this issue is reproducible even with the noop nemesis algorithm.
> {code:java}
> (reduce
>   (fn [txn' [f k v :as micro-op]]
>     (case f
>       :r (let [value (read-value cache k)]
>            (conj txn' [f k value]))
>       :w (do
>            (let [contain-key (.containsKey cache k)
>                  value (read-value cache k)]
>              (if (or (not contain-key) (and contain-key (< value v)))
>                (.put cache k v)
>                (vreset! tx-state false))) ; bye functional programming, we are saving state here to fail tx later
>            (conj txn' micro-op))))
> {code}
> GitHub repo with the Jepsen code: https://github.com/qvad/jepsen (see the ignite folder there; it also contains a small guide on how to launch a Hyper-V VM and run the test).
> Elle reports the following anomalies (:anomaly-types (:G1a :G1b :internal)), which are aborted reads and intermediate reads.
> Anomalies description: https://sitano.github.io/theory/databases/2019/07/30/tx-isolation-anomalies/
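For context, a minimal Java sketch of the two client variants the test compares; the cache name "elle" and the address are illustrative assumptions, not taken from the test repo. The per-operation logic in the Clojure snippet above runs identically against either cache, since both expose get/put/containsKey.

{code:java}
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.client.ClientCache;
import org.apache.ignite.client.IgniteClient;
import org.apache.ignite.configuration.ClientConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class ElleClients {
    /** Thick client: joins the cluster topology as a client node. */
    static IgniteCache<Long, Long> thickCache() {
        Ignite ignite = Ignition.start(new IgniteConfiguration().setClientMode(true));
        return ignite.getOrCreateCache("elle");
    }

    /** Thin client: connects over the binary protocol (default port 10800). */
    static ClientCache<Long, Long> thinCache() {
        IgniteClient client = Ignition.startClient(
            new ClientConfiguration().setAddresses("127.0.0.1:10800"));
        return client.getOrCreateCache("elle");
    }
}
{code}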
[jira] [Resolved] (IGNITE-15317) Cache.put/get Jepsen Elle test failed for Java Thin Client
[ https://issues.apache.org/jira/browse/IGNITE-15317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitry Sherstobitov resolved IGNITE-15317.
------------------------------------------
Resolution: Not A Problem
[jira] [Commented] (IGNITE-15317) Cache.put/get Jepsen Elle test failed for Java Thin Client
[ https://issues.apache.org/jira/browse/IGNITE-15317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17400493#comment-17400493 ]

Dmitry Sherstobitov commented on IGNITE-15317:
----------------------------------------------
[~alex_pl] You were right, it was a problem in the test. I will close this ticket and continue testing; maybe I will add a few more nemesis algorithms.
[jira] [Commented] (IGNITE-15317) Cache.put/get Jepsen Elle test failed for Java Thin Client
[ https://issues.apache.org/jira/browse/IGNITE-15317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17400454#comment-17400454 ]

Dmitry Sherstobitov commented on IGNITE-15317:
----------------------------------------------
[~alex_pl] Thanks, yeah, it's a valid point. I will fix the test and rerun it then.
[jira] [Updated] (IGNITE-15317) Cache.put/get Jepsen Elle test failed for Java Thin Client
[ https://issues.apache.org/jira/browse/IGNITE-15317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitry Sherstobitov updated IGNITE-15317:
-----------------------------------------
    Summary: Cache.put/get Jepsen Elle test failed for Java Thin Client  (was: Simple Jepsen Elle test failed for Java Thin Client)
[jira] [Updated] (IGNITE-15317) Simple Jepsen Elle test failed for Java Thin Client
[ https://issues.apache.org/jira/browse/IGNITE-15317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitry Sherstobitov updated IGNITE-15317:
-----------------------------------------
    Description: (updated to add the {code:java} test snippet; the full text is quoted above)
[jira] [Updated] (IGNITE-15317) Simple Jepsen Elle test failed for Java Thin Client
[ https://issues.apache.org/jira/browse/IGNITE-15317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitry Sherstobitov updated IGNITE-15317:
-----------------------------------------
    Affects Version/s: 2.10
[jira] [Updated] (IGNITE-15317) Simple Jepsen Elle test failed for Java Thin Client
[ https://issues.apache.org/jira/browse/IGNITE-15317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitry Sherstobitov updated IGNITE-15317:
-----------------------------------------
    Component/s: thin client
[jira] [Created] (IGNITE-15317) Simple Jepsen Elle test failed for Java Thin Client
Dmitry Sherstobitov created IGNITE-15317:
-----------------------------------------

        Summary: Simple Jepsen Elle test failed for Java Thin Client
            Key: IGNITE-15317
            URL: https://issues.apache.org/jira/browse/IGNITE-15317
        Project: Ignite
     Issue Type: Bug
       Reporter: Dmitry Sherstobitov
[jira] [Updated] (IGNITE-11838) Improve usability of UriDeploymentSpi documentation
[ https://issues.apache.org/jira/browse/IGNITE-11838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitry Sherstobitov updated IGNITE-11838:
-----------------------------------------
    Description:

I was trying to run the UriDeploymentSpi feature and actually failed at it. I've only managed to stop it using the actual Java code.

Here are some issues I've found in the documentation (a configuration sketch follows the list):
1. It is not clear what a GAR file is and how a user can create it (manually? using some utility?).
2. "Local disk folder containing only compiled Java classes": this doesn't work for me (and according to the Java code it shouldn't work).
3. "Local disk folder with structure of unpacked GAR file": this DOES work, but META-INF/ is actually an optional folder, and for xyz.class see the previous point. The only thing a user needs is to put a lib/ folder in the deployment URI and put the .jar file there.
4. It is not clear what the ignite.xml descriptor file is and how a user can create it.
5. I don't like the Windows paths in the examples (Linux paths are more common in the case of Ignite; we could add a note with Windows path examples).
6. In the case of a Linux path, the user should write something like this: file:///tmp/path/deployment (3 slashes instead of 2).
7. https://apacheignite.readme.io/docs/service-grid-28#section-service-updates-redeployment : the link to the URI here looks strange and doesn't work.
8. Previous page: the example temporaryDirectoryPath value is optional, so we may remove it.
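As referenced in the list above, a minimal configuration sketch, assuming a Linux file URI whose folder contains a lib/ subfolder with the deployment .jar; the path is an illustrative assumption.

{code:java}
import java.util.Collections;

import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.deployment.uri.UriDeploymentSpi;

public class UriDeploymentStartup {
    public static void main(String[] args) {
        UriDeploymentSpi deploymentSpi = new UriDeploymentSpi();

        // Linux file URI with three slashes (point 6); the deployment folder is
        // expected to contain a lib/ subfolder with the .jar file (point 3).
        deploymentSpi.setUriList(Collections.singletonList("file:///tmp/path/deployment"));

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDeploymentSpi(deploymentSpi);

        Ignition.start(cfg);
    }
}
{code}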
[jira] [Created] (IGNITE-11838) Improve usability of UriDeploymentSpi documentation
Dmitry Sherstobitov created IGNITE-11838:
-----------------------------------------

         Summary: Improve usability of UriDeploymentSpi documentation
             Key: IGNITE-11838
             URL: https://issues.apache.org/jira/browse/IGNITE-11838
         Project: Ignite
      Issue Type: Bug
      Components: documentation
Affects Versions: 2.7
        Reporter: Dmitry Sherstobitov
[jira] [Created] (IGNITE-11667) OPTIMISTIC REPEATABLE_READ transactions do not guarantee transactional consistency in blinking node scenario
Dmitry Sherstobitov created IGNITE-11667:
-----------------------------------------

        Summary: OPTIMISTIC REPEATABLE_READ transactions do not guarantee transactional consistency in blinking node scenario
            Key: IGNITE-11667
            URL: https://issues.apache.org/jira/browse/IGNITE-11667
        Project: Ignite
     Issue Type: Bug
       Reporter: Dmitry Sherstobitov

The following scenario:
1. Start the cluster, load data.
2. Start transactional loading (a simple transfer task with PESSIMISTIC and OPTIMISTIC REPEATABLE_READ transactions).
3. Repeat 10 times: stop one node, sleep 10 seconds, start it again.
4. Wait for rebalance to finish (LocalNodeMovingPartitionsCount == 0 for each cache/cache group).
5. Validate that there are no conflicts in the sum of the fields (the verify action of the transfer task).

In the case of OPTIMISTIC REPEATABLE_READ transactions, there is no guarantee that transactional consistency is preserved (the last validation step fails).
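A hypothetical sketch of the "simple transfer task" described above, assuming a TRANSACTIONAL cache of Long account balances; the class, cache, and parameter names are illustrative, not taken from the actual test code.

{code:java}
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.transactions.Transaction;

import static org.apache.ignite.transactions.TransactionConcurrency.OPTIMISTIC;
import static org.apache.ignite.transactions.TransactionIsolation.REPEATABLE_READ;

public class TransferTask {
    /** Moves 'amount' between two accounts inside one transaction. */
    public static void transfer(Ignite ignite, IgniteCache<Integer, Long> accounts,
        int from, int to, long amount) {
        try (Transaction tx = ignite.transactions().txStart(OPTIMISTIC, REPEATABLE_READ)) {
            long fromBalance = accounts.get(from);
            long toBalance = accounts.get(to);

            accounts.put(from, fromBalance - amount);
            accounts.put(to, toBalance + amount);

            // The verify step checks that the total of all balances is unchanged.
            tx.commit();
        }
    }
}
{code}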
[jira] [Created] (IGNITE-11609) Add support of authentication and SSL in yardstick IgniteThinClient benchmark
Dmitry Sherstobitov created IGNITE-11609:
-----------------------------------------

         Summary: Add support of authentication and SSL in yardstick IgniteThinClient benchmark
             Key: IGNITE-11609
             URL: https://issues.apache.org/jira/browse/IGNITE-11609
         Project: Ignite
      Issue Type: New Feature
Affects Versions: 2.7
        Reporter: Dmitry Sherstobitov
         Fix For: 2.8

Add support for the following keys:

Mandatory authentication:
  USER
  PASSWORD

Mandatory SSL:
  SSL_KEY_PASSWORD
  SSL_KEY_PATH

Optional SSL:
  SSL_CLIENT_STORE_TYPE (default JKS)
  SSL_SERVER_STORE_TYPE (default JKS)
  SSL_KEY_ALGORITHM (default SunX509)
  SSL_TRUST_ALL (default false)
  SSL_PROTOCOL (default TLS)
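A hedged sketch of how those keys would map onto the thin-client configuration the benchmark builds; the address, credentials, and keystore path are placeholder assumptions.

{code:java}
import org.apache.ignite.Ignition;
import org.apache.ignite.client.IgniteClient;
import org.apache.ignite.client.SslMode;
import org.apache.ignite.client.SslProtocol;
import org.apache.ignite.configuration.ClientConfiguration;

public class ThinClientAuthSslExample {
    public static void main(String[] args) {
        ClientConfiguration cfg = new ClientConfiguration()
            .setAddresses("127.0.0.1:10800")
            // Mandatory authentication keys (USER / PASSWORD).
            .setUserName("ignite")
            .setUserPassword("ignite")
            // Mandatory SSL keys (SSL_KEY_PATH / SSL_KEY_PASSWORD).
            .setSslMode(SslMode.REQUIRED)
            .setSslClientCertificateKeyStorePath("/path/to/client.jks")
            .setSslClientCertificateKeyStorePassword("123456")
            // Optional SSL keys with their documented defaults.
            .setSslClientCertificateKeyStoreType("JKS")
            .setSslKeyAlgorithm("SunX509")
            .setSslTrustAll(false)
            .setSslProtocol(SslProtocol.TLS);

        try (IgniteClient client = Ignition.startClient(cfg)) {
            // Benchmark operations would run against this client.
        }
    }
}
{code}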
[jira] [Commented] (IGNITE-11461) Automatic modules support for Apache Ignite: find and resolve package conflicts
[ https://issues.apache.org/jira/browse/IGNITE-11461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793498#comment-16793498 ]

Dmitry Sherstobitov commented on IGNITE-11461:
----------------------------------------------
[~dpavlov] Looks good to me.

> Automatic modules support for Apache Ignite: find and resolve package conflicts
> --------------------------------------------------------------------------------
>
>                  Key: IGNITE-11461
>                  URL: https://issues.apache.org/jira/browse/IGNITE-11461
>              Project: Ignite
>           Issue Type: Improvement
>             Reporter: Dmitriy Pavlov
>             Assignee: Dmitriy Pavlov
>             Priority: Critical
>              Fix For: 2.8
>           Time Spent: 50m
>   Remaining Estimate: 0h
>
> Example of a failure in a modular environment:
> Error:java: the unnamed module reads package org.apache.ignite.internal.processors.cache.persistence.file from both ignite.core and ignite.direct.io
> This type of failure is called package interference, and it is strictly prohibited: http://openjdk.java.net/projects/jigsaw/spec/reqs/#non-interference
> Ignite compatibility with Jigsaw is tested in a separate project. See details in https://github.com/apache/ignite/tree/ignite-11461-java11/modules/dev-utils/ignite-modules-test#ignite-modular-environment-test-project
>
> The following table contains the currently investigated Ignite modules and their applicability as automatic modules:
> ||Module||Runs in modular environment||Changeable using private API only||Notes||
> |ignite-core|(/)|(/)| |
> |ignite-indexing|(x) [IGNITE-11464]|(?) Refactoring to use ignite-indexing-text may be a breaking change|Lucene artifacts exclusion is required by the user manually.|
> |ignite-compress|(x)|(/) not released|org.apache.ignite.internal.processors.compress package conflict|
> |ignite-direct-io|(x) blocked by indexing|(/)|org.apache.ignite.internal.processors.cache.persistence.file package conflict|
> |ignite-spring|(x) [IGNITE-11467] blocked by indexing|(x) org.apache.ignite.IgniteSpringBean affected| |
> |ignite-ml|(x) blocked by indexing| | |
> |ignite-log4j|(/)|(/)|But may not compile with other logging dependencies - EOL https://blogs.apache.org/logging/entry/moving_on_to_log4j_2|
> |ignite-log4j2|(/)|(/)| |
> |ignite-slf4j|(/)|(/)| |
> |ignite-rest-http|(x) IGNITE-11469 & Migrate to log4j2x [IGNITE-11486]|(/)|Usage with slf4j may break compilation because of a conflict of packages|
> |ignite-hibernate_5.3 and others|(x) [IGNITE-11485]|(?)|avoiding an API break is possible if Hibernate core classes are not used by third-party code|
> |ignite-zookeeper|(x) IGNITE-11486|(/)| |
> |ignite-spring-data_2-0|(x) blocked by spring|org.apache.commons.logging from both commons.logging and spring.jcl conflict|https://jira.spring.io/browse/SPR-16605|
> |ignite-ml|(/) master (x) 2.7| | |
> |ignite-cassandra-store|(x) [IGNITE-11467] blocked by spring|(/)|Only spring needs to be fixed|
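To illustrate the split-package failure the table tracks, here is a hypothetical module descriptor; the automatic module names ignite.core and ignite.direct.io come from the quoted error message, while the application module name is made up.

{code:java}
// module-info.java of a hypothetical application module. Both the ignite-core
// and ignite-direct-io jars contain the package
// org.apache.ignite.internal.processors.cache.persistence.file, so reading both
// automatic modules is rejected by javac as package interference (split package).
module my.ignite.app {
    requires ignite.core;
    requires ignite.direct.io;
}
{code}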
[jira] [Created] (IGNITE-11407) AssertionError may occur on server start
Dmitry Sherstobitov created IGNITE-11407:
-----------------------------------------

        Summary: AssertionError may occur on server start
            Key: IGNITE-11407
            URL: https://issues.apache.org/jira/browse/IGNITE-11407
        Project: Ignite
     Issue Type: Bug
       Reporter: Dmitry Sherstobitov

See https://issues.apache.org/jira/browse/IGNITE-11406 (same scenario).
On the 5th iteration (each iteration is a 50-round cluster nodes restart):
{code:java}
java.lang.AssertionError
    at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.stopRoutine(GridContinuousProcessor.java:743)
    at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.executeQuery0(CacheContinuousQueryManager.java:705)
    at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.executeInternalQuery(CacheContinuousQueryManager.java:542)
    at org.apache.ignite.internal.processors.datastructures.DataStructuresProcessor.startQuery(DataStructuresProcessor.java:213)
    at org.apache.ignite.internal.processors.datastructures.DataStructuresProcessor.getAtomic(DataStructuresProcessor.java:541)
    at org.apache.ignite.internal.processors.datastructures.DataStructuresProcessor.atomicLong(DataStructuresProcessor.java:457)
    at org.apache.ignite.internal.IgniteKernal.atomicLong(IgniteKernal.java:3468)
    at org.apache.ignite.internal.IgniteKernal.atomicLong(IgniteKernal.java:3457)
    at org.apache.ignite.piclient.bean.LifecycleAtomicLongBean.onLifecycleEvent(LifecycleAtomicLongBean.java:48)
    at org.apache.ignite.internal.IgniteKernal.notifyLifecycleBeans(IgniteKernal.java:655)
    at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1064)
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1973)
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1716)
    at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1144)
    at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1062)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:948)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:847)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:717)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:686)
    at org.apache.ignite.Ignition.start(Ignition.java:352)
    at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:302)
Failed to start grid: null{code}
[jira] [Created] (IGNITE-11408) AssertionError may occur on client start
Dmitry Sherstobitov created IGNITE-11408:
-----------------------------------------

        Summary: AssertionError may occur on client start
            Key: IGNITE-11408
            URL: https://issues.apache.org/jira/browse/IGNITE-11408
        Project: Ignite
     Issue Type: Bug
       Reporter: Dmitry Sherstobitov

Scenario from: https://issues.apache.org/jira/browse/IGNITE-11406
An AssertionError may occur on client start:
{code}
2019-02-23T18:26:27,317][DEBUG][tcp-client-disco-msg-worker-#4][TcpDiscoverySpi] Grid runnable finished normally: tcp-client-disco-msg-worker-#4
Exception in thread "tcp-client-disco-msg-worker-#4" java.lang.AssertionError: TcpDiscoveryClientReconnectMessage [routerNodeId=76b33f1b-bef6-4805-bcca-0ea32df641ac, lastMsgId=null, super=TcpDiscoveryAbstractMessage [sndNodeId=76b33f1b-bef6-4805-bcca-0ea32df641ac, id=57c55fa1961-99d3d909-fa44-4b74-aea4-d375ad85e53e, verifierNodeId=6ba6bd09-4bc0-400c-ba11-a06d2507e983, topVer=0, pendingIdx=0, failedNodes=null, isClient=false]]
    at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.processClientReconnectMessage(ClientImpl.java:2311)
    at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.processDiscoveryMessage(ClientImpl.java:1914)
    at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.body(ClientImpl.java:1798)
    at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
{code}
Other trace:
{code:java}
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) [piclient-2.7.jar:?]
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) [piclient-2.7.jar:?]
    at py4j.Gateway.invoke(Gateway.java:282) [piclient-2.7.jar:?]
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) [piclient-2.7.jar:?]
    at py4j.commands.CallCommand.execute(CallCommand.java:79) [piclient-2.7.jar:?]
    at py4j.GatewayConnection.run(GatewayConnection.java:238) [piclient-2.7.jar:?]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
Caused by: org.apache.ignite.IgniteCheckedException: Failed to start SPI: TcpDiscoverySpi [addrRslvr=null, sockTimeout=3, ackTimeout=6, marsh=JdkMarshaller [clsFilter=org.apache.ignite.marshaller.MarshallerUtils$1@59f2595b], reconCnt=2, reconDelay=2000, maxAckTimeout=30, forceSrvMode=false, clientReconnectDisabled=false, internalLsnr=null, skipAddrsRandomization=false]
    at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:300) ~[ignite-core-2.4.15.jar:2.4.15]
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:901) ~[ignite-core-2.4.15.jar:2.4.15]
    at org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1672) ~[ignite-core-2.4.15.jar:2.4.15]
    ... 22 more
Caused by: org.apache.ignite.spi.IgniteSpiException: Some error in join process.
    at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.body(ClientImpl.java:1809) ~[ignite-core-2.4.15.jar:2.4.15]
    at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62) ~[ignite-core-2.4.15.jar:2.4.15]
2019-02-23T18:26:27,320][ERROR][tcp-client-disco-sock-reader-#3][TcpDiscoverySpi] Connection failed [sock=Socket[addr=/172.25.1.34,port=47503,localport=60675], locNodeId=99d3d909-fa44-4b74-aea4-d375ad85e53e]
2019-02-23T18:26:27,320][ERROR][Thread-2][IgniteKernal] Got exception while starting (will rollback startup routine).{code}
[jira] [Updated] (IGNITE-11407) AssertionError may occur on server start
[ https://issues.apache.org/jira/browse/IGNITE-11407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitry Sherstobitov updated IGNITE-11407:
-----------------------------------------
    Description:

See https://issues.apache.org/jira/browse/IGNITE-11406 (same scenario).
On the 5th iteration (each iteration is a 50-round cluster nodes restart).
There is an atomic long started in a lifecycle bean:
{code:java}
public class LifecycleAtomicLongBean implements LifecycleBean {
    /** Auto-inject ignite instance. */
    @IgniteInstanceResource
    private Ignite ignite;

    /** Atomic long name. */
    private String atomicLongName;

    /** Event type. */
    private LifecycleEventType eventType;

    /** Logger. */
    private static final Logger log = LogManager.getLogger(IgniteService.class);

    public LifecycleAtomicLongBean(String atomicLongName, LifecycleEventType eventType) {
        this.atomicLongName = atomicLongName;
        this.eventType = eventType;
    }

    /** {@inheritDoc} */
    @Override public void onLifecycleEvent(LifecycleEventType evt) {
        System.out.println();
        System.out.println(">>> Lifecycle event occurred: " + evt);
        System.out.println(">>> Ignite name: " + ignite.name());

        if (evt == eventType) {
            IgniteAtomicLong atomicLong = ignite.atomicLong(atomicLongName, 0, true);

            log.info(">>> Ignite Atomic Long");
            log.info("Atomic long initial value : " + atomicLong.getAndIncrement() + '.');
        }
    }
}{code}
Configuration:
{code:java}
AFTER_NODE_START
{code}
The error on start is the java.lang.AssertionError from GridContinuousProcessor.stopRoutine() quoted in the issue description above ("Failed to start grid: null").
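For completeness, a minimal sketch of how such a bean is typically registered, assuming the LifecycleAtomicLongBean class above is on the classpath; the atomic long name is an illustrative assumption.

{code:java}
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.lifecycle.LifecycleEventType;

public class StartNodeWithBean {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Fire the bean after the node has started, matching the AFTER_NODE_START
        // configuration value shown above.
        cfg.setLifecycleBeans(
            new LifecycleAtomicLongBean("testAtomicLong", LifecycleEventType.AFTER_NODE_START));

        try (Ignite ignite = Ignition.start(cfg)) {
            // On each start the bean calls ignite.atomicLong(...).getAndIncrement().
        }
    }
}
{code}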
[jira] [Created] (IGNITE-11406) NullPointerException may occur on client start
Dmitry Sherstobitov created IGNITE-11406:
-----------------------------------------

        Summary: NullPointerException may occur on client start
            Key: IGNITE-11406
            URL: https://issues.apache.org/jira/browse/IGNITE-11406
        Project: Ignite
     Issue Type: Bug
       Reporter: Dmitry Sherstobitov

Found during testing of fixes for https://issues.apache.org/jira/browse/IGNITE-10878:
# Start the cluster, create caches with no persistence and load data into them
# Restart each node in the cluster in order (coordinator first); do not wait until the topology message occurs
# Try to run the utilities: activate, baseline (to check that the cluster is alive)
# Run clients and load data into the alive caches

On the 4th step one of the clients throws an NPE on start:
{code:java}
2019-02-23T18:36:24,045][DEBUG][tcp-client-disco-msg-worker-#4][TcpDiscoverySpi] Connection closed, local node received force fail message, will not try to restore connection
2019-02-23T18:36:24,045][DEBUG][tcp-client-disco-msg-worker-#4][TcpDiscoverySpi] Failed to restore closed connection, will try to reconnect [networkTimeout=5000, joinTimeout=0, failMsg=TcpDiscoveryNodeFailedMessage [failedNodeId=80f8b6ee-6a6d-4235-86e9-1b66ea310eb6, order=90, warning=Client node considered as unreachable and will be dropped from cluster, because no metrics update messages received in interval: TcpDiscoverySpi.clientFailureDetectionTimeout() ms. It may be caused by network problems or long GC pause on client node, try to increase this parameter. [nodeId=80f8b6ee-6a6d-4235-86e9-1b66ea310eb6, clientFailureDetectionTimeout=3], super=TcpDiscoveryAbstractMessage [sndNodeId=987d4a03-8233-4130-af5b-c06900bdb6d7, id=3642cfa1961-987d4a03-8233-4130-af5b-c06900bdb6d7, verifierNodeId=d9abbff3-4b4d-4a13-9cb1-0ca4d2436164, topVer=167, pendingIdx=0, failedNodes=null, isClient=false]]]
2019-02-23T18:36:24,046][DEBUG][tcp-client-disco-msg-worker-#4][TcpDiscoverySpi] Discovery notification [node=TcpDiscoveryNode [id=80f8b6ee-6a6d-4235-86e9-1b66ea310eb6, addrs=[172.25.1.34], sockAddrs=[lab34.gridgain.local/172.25.1.34:0], discPort=0, order=165, intOrder=0, lastExchangeTime=1550936128313, loc=true, ver=2.4.15#20190222-sha1:36b1d676, isClient=true], type=CLIENT_NODE_DISCONNECTED, topVer=166]
2019-02-23T18:36:24,049][INFO ][tcp-client-disco-msg-worker-#4][GridDhtPartitionsExchangeFuture] Finish exchange future [startVer=AffinityTopologyVersion [topVer=165, minorTopVer=0], resVer=null, err=class org.apache.ignite.internal.IgniteClientDisconnectedCheckedException: Client node disconnected: null]
[2019-02-23T18:36:24,061][ERROR][Thread-2][IgniteKernal] Got exception while starting (will rollback startup routine).
java.lang.NullPointerException: null
    at org.apache.ignite.internal.processors.cache.GridCacheProcessor.internalCacheEx(GridCacheProcessor.java:3886) ~[ignite-core-2.4.15.jar:2.4.15]
    at org.apache.ignite.internal.processors.cache.GridCacheProcessor.utilityCache(GridCacheProcessor.java:3858) ~[ignite-core-2.4.15.jar:2.4.15]
    at org.apache.ignite.internal.processors.service.GridServiceProcessor.updateUtilityCache(GridServiceProcessor.java:290) ~[ignite-core-2.4.15.jar:2.4.15]
    at org.apache.ignite.internal.processors.service.GridServiceProcessor.onKernalStart0(GridServiceProcessor.java:233) ~[ignite-core-2.4.15.jar:2.4.15]
    at org.apache.ignite.internal.processors.service.GridServiceProcessor.onKernalStart(GridServiceProcessor.java:221) ~[ignite-core-2.4.15.jar:2.4.15]
    at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1038) [ignite-core-2.4.15.jar:2.4.15]
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1973) [ignite-core-2.4.15.jar:2.4.15]
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1716) [ignite-core-2.4.15.jar:2.4.15]
    at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1144) [ignite-core-2.4.15.jar:2.4.15]
    at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1062) [ignite-core-2.4.15.jar:2.4.15]
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:948) [ignite-core-2.4.15.jar:2.4.15]
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:847) [ignite-core-2.4.15.jar:2.4.15]
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:717) [ignite-core-2.4.15.jar:2.4.15]
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:686) [ignite-core-2.4.15.jar:2.4.15]
    at org.apache.ignite.Ignition.start(Ignition.java:352) [ignite-core-2.4.15.jar:2.4.15]
    at org.apache.ignite.piclient.api.IgniteService.startIgniteClientNode(IgniteService.java:86) [piclient-2.7.jar:?]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_181]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAcce
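A hedged sketch of step 4 of the scenario (client start plus cache load), the phase where the NPE was observed; the cache name and key range are illustrative assumptions.

{code:java}
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class ClientLoad {
    public static void main(String[] args) {
        // Client-mode node joining the cluster restarted in steps 1-3.
        IgniteConfiguration cfg = new IgniteConfiguration().setClientMode(true);

        try (Ignite client = Ignition.start(cfg)) { // The NPE was thrown from inside this start call.
            IgniteCache<Integer, Integer> cache = client.getOrCreateCache("alive-cache");

            for (int i = 0; i < 10_000; i++)
                cache.put(i, i);
        }
    }
}
{code}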
[jira] [Updated] (IGNITE-11292) There is no way to disable WAL for a cache in a group
[ https://issues.apache.org/jira/browse/IGNITE-11292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitry Sherstobitov updated IGNITE-11292:
-----------------------------------------
    Description:

The following code doesn't work if the cache is in a cache group:
{code:java}
ignite.cluster().disableWal(cacheName){code}
cacheName == cacheName:
{code:java}
Caused by: class org.apache.ignite.IgniteCheckedException: Cannot change WAL mode because not all cache names belonging to the group are provided [group=cache_group_1, missingCaches=[cache_group_1_005, cache_group_3_063, cache_group_1_003, cache_group_3_064, cache_group_1_004, cache_group_3_061, cache_group_3_062, cache_group_1_001, cache_group_1_002]]
{code}
cacheName == groupName:
{code:java}
Caused by: class org.apache.ignite.IgniteCheckedException: Cache doesn't exist: cache_group_1{code}
Also there is no javadoc about this behaviour.
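A minimal reproduction sketch of the behaviour above, assuming persistence is enabled so WAL mode applies; the cache and group names mirror those in the exception text, and the rest of the configuration is an illustrative assumption.

{code:java}
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class DisableWalRepro {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration().setDataStorageConfiguration(
            new DataStorageConfiguration().setDefaultDataRegionConfiguration(
                new DataRegionConfiguration().setPersistenceEnabled(true)));

        try (Ignite ignite = Ignition.start(cfg)) {
            ignite.cluster().active(true);

            ignite.getOrCreateCache(new CacheConfiguration<Integer, Integer>("cache_group_1_001")
                .setGroupName("cache_group_1"));
            ignite.getOrCreateCache(new CacheConfiguration<Integer, Integer>("cache_group_1_002")
                .setGroupName("cache_group_1"));

            try {
                ignite.cluster().disableWal("cache_group_1_001");
            }
            catch (Exception e) {
                // "Cannot change WAL mode because not all cache names belonging
                // to the group are provided".
                System.err.println(e.getMessage());
            }

            try {
                ignite.cluster().disableWal("cache_group_1");
            }
            catch (Exception e) {
                // "Cache doesn't exist: cache_group_1".
                System.err.println(e.getMessage());
            }
        }
    }
}
{code}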
[jira] [Updated] (IGNITE-11292) There is no way to disable WAL for a cache in a group
[ https://issues.apache.org/jira/browse/IGNITE-11292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitry Sherstobitov updated IGNITE-11292:
-----------------------------------------
    Description: (updated to add a note about the missing javadoc; the full text is quoted above)
[jira] [Updated] (IGNITE-11292) There is no way to disable WAL for a cache in a group
[ https://issues.apache.org/jira/browse/IGNITE-11292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitry Sherstobitov updated IGNITE-11292:
-----------------------------------------
    Description: (updated formatting of the code blocks; the full text is quoted above)
[jira] [Commented] (IGNITE-11292) There is no way to disable WAL for a cache in a group
[ https://issues.apache.org/jira/browse/IGNITE-11292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765037#comment-16765037 ]

Dmitry Sherstobitov commented on IGNITE-11292:
----------------------------------------------
[~avinogradov] Could you please look at this issue?
[jira] [Created] (IGNITE-11292) There is no way to disable WAL for a cache in a group
Dmitry Sherstobitov created IGNITE-11292:
-----------------------------------------

        Summary: There is no way to disable WAL for a cache in a group
            Key: IGNITE-11292
            URL: https://issues.apache.org/jira/browse/IGNITE-11292
        Project: Ignite
     Issue Type: Bug
       Reporter: Dmitry Sherstobitov
[jira] [Created] (IGNITE-11100) AssertionError LocalJoinCachesContext occurs during sequential cluster restart
Dmitry Sherstobitov created IGNITE-11100:
-----------------------------------------

        Summary: AssertionError LocalJoinCachesContext occurs during sequential cluster restart
            Key: IGNITE-11100
            URL: https://issues.apache.org/jira/browse/IGNITE-11100
        Project: Ignite
     Issue Type: Bug
       Reporter: Dmitry Sherstobitov

Same scenario as in https://issues.apache.org/jira/browse/IGNITE-10878
{code}
[2019-01-26T03:32:22,226][ERROR][tcp-disco-msg-worker-#2][TcpDiscoverySpi] TcpDiscoverSpi's message worker thread failed abnormally. Stopping the node in order to prevent cluster wide instability.
java.lang.AssertionError: LocalJoinCachesContext [locJoinStartCaches=[
IgniteBiTuple [val1=DynamicCacheDescriptor [deploymentId=bc3e0978861-fb98885f-92a5-47d2-9475-00173fab8ee1, staticCfg=true, sql=false, cacheType=UTILITY, template=false, updatesAllowed=true, cacheId=-2100569601, rcvdFrom=f97e4743-6cf2-488e-a7fc-14707e9a8eb0, objCtx=null, rcvdOnDiscovery=false, startTopVer=null, rcvdFromVer=null, clientCacheStartVer=null, schema=QuerySchema [], grpDesc=CacheGroupDescriptor [grpId=-2100569601, grpName=null, startTopVer=null, rcvdFrom=f97e4743-6cf2-488e-a7fc-14707e9a8eb0, deploymentId=bc3e0978861-fb98885f-92a5-47d2-9475-00173fab8ee1, caches={ignite-sys-cache=-2100569601}, rcvdFromVer=null, persistenceEnabled=false, walEnabled=false, cacheName=ignite-sys-cache], cacheName=ignite-sys-cache], val2=null],
IgniteBiTuple [val1=DynamicCacheDescriptor [deploymentId=60771978861-398164df-6240-4d19-ad0b-308768d2a095, staticCfg=false, sql=false, cacheType=USER, template=false, updatesAllowed=true, cacheId=-1901084566, rcvdFrom=f00ec506-fc6c-45c5-b550-9308d17a39cf, objCtx=null, rcvdOnDiscovery=true, startTopVer=null, rcvdFromVer=null, clientCacheStartVer=null, schema=QuerySchema [], grpDesc=CacheGroupDescriptor [grpId=-1901084566, grpName=null, startTopVer=AffinityTopologyVersion [topVer=13, minorTopVer=20], rcvdFrom=f00ec506-fc6c-45c5-b550-9308d17a39cf, deploymentId=60771978861-398164df-6240-4d19-ad0b-308768d2a095, caches={config_third_copy=-1901084566}, rcvdFromVer=null, persistenceEnabled=false, walEnabled=false, cacheName=config_third_copy], cacheName=config_third_copy], val2=null],
IgniteBiTuple [val1=DynamicCacheDescriptor [deploymentId=01771978861-398164df-6240-4d19-ad0b-308768d2a095, staticCfg=false, sql=false, cacheType=USER, template=false, updatesAllowed=true, cacheId=-1858528402, rcvdFrom=f00ec506-fc6c-45c5-b550-9308d17a39cf, objCtx=null, rcvdOnDiscovery=true, startTopVer=null, rcvdFromVer=null, clientCacheStartVer=null, schema=QuerySchema [], grpDesc=CacheGroupDescriptor [grpId=-1858528402, grpName=null, startTopVer=AffinityTopologyVersion [topVer=13, minorTopVer=22], rcvdFrom=f00ec506-fc6c-45c5-b550-9308d17a39cf, deploymentId=01771978861-398164df-6240-4d19-ad0b-308768d2a095, caches={trans_forth_copy=-1858528402}, rcvdFromVer=null, persistenceEnabled=false, walEnabled=false, cacheName=trans_forth_copy], cacheName=trans_forth_copy], val2=null],
IgniteBiTuple [val1=DynamicCacheDescriptor [deploymentId=51771978861-398164df-6240-4d19-ad0b-308768d2a095, staticCfg=false, sql=false, cacheType=USER, template=false, updatesAllowed=true, cacheId=-1502999781, rcvdFrom=f00ec506-fc6c-45c5-b550-9308d17a39cf, objCtx=null, rcvdOnDiscovery=true, startTopVer=null, rcvdFromVer=null, clientCacheStartVer=null, schema=QuerySchema [], grpDesc=CacheGroupDescriptor [grpId=-1502999781, grpName=null, startTopVer=AffinityTopologyVersion [topVer=13, minorTopVer=23], rcvdFrom=f00ec506-fc6c-45c5-b550-9308d17a39cf, deploymentId=51771978861-398164df-6240-4d19-ad0b-308768d2a095, caches={id_forth_copy=-1502999781}, rcvdFromVer=null, persistenceEnabled=false, walEnabled=false, cacheName=id_forth_copy], cacheName=id_forth_copy], val2=null],
IgniteBiTuple [val1=DynamicCacheDescriptor [deploymentId=8a671978861-398164df-6240-4d19-ad0b-308768d2a095, staticCfg=false, sql=false, cacheType=USER, template=false, updatesAllowed=true, cacheId=-1354792126, rcvdFrom=f00ec506-fc6c-45c5-b550-9308d17a39cf, objCtx=null, rcvdOnDiscovery=true, startTopVer=null, rcvdFromVer=null, clientCacheStartVer=null, schema=QuerySchema [], grpDesc=CacheGroupDescriptor [grpId=-1354792126, grpName=null, startTopVer=AffinityTopologyVersion [topVer=13, minorTopVer=5], rcvdFrom=f00ec506-fc6c-45c5-b550-9308d17a39cf, deploymentId=8a671978861-398164df-6240-4d19-ad0b-308768d2a095, caches={config=-1354792126}, rcvdFromVer=null, persistenceEnabled=false, walEnabled=false, cacheName=config], cacheName=config], val2=null],
IgniteBiTuple [val1=DynamicCacheDescriptor [deploymentId=6d671978861-398164df-6240-4d19-ad0b-308768d2a095, staticCfg=false, sql=false, cacheType=USER, template=false, updatesAllowed=true, cacheId=-1176672452, rcvdFrom=f00ec506-fc6c-45c5-b550-9308d17a39cf, objCtx=null, rcvdOnDiscovery=true, startTopVer=null, rcvdFromVer=null, clientCacheStartVe
{code}
[jira] [Updated] (IGNITE-10995) GridDhtPartitionSupplier::handleDemandMessage suppress errors
[ https://issues.apache.org/jira/browse/IGNITE-10995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Sherstobitov updated IGNITE-10995: - Description: Scenario: # Cluster with data # Triggered historical rebalance In this case, if an OOM occurs on the supplier, no failure handler is triggered and the cluster stays alive with inconsistent data (the target node has MOVING partitions while the supplier does nothing) Target rebalance node log: {code:java} [15:00:31,418][WARNING][sys-#86][GridDhtPartitionDemander] Rebalancing from node cancelled [grp=cache_group_4, topVer=AffinityTopologyVersion [topVer=17, minorTopVer=0], supplier=4cbc66d3-9d2c-4396-8366-2839a8d0cdb6, topic=5]]. Supplier has failed with error: java.lang.OutOfMemoryError: Java heap space{code} Supplier stack trace: !Screenshot 2019-01-20 at 23.19.08.png! was: Scenario: # Cluster with data # Triggered historical rebalance In this case if OOM occurs on supplier there is no triggered failHandler and cluster is alive with inconsistent data (target node have MOVING partitions, supplier do nothing) Target rebalance node log: {code:java} [15:00:31,418][WARNING][sys-#86][GridDhtPartitionDemander] Rebalancing from node cancelled [grp=cache_group_4, topVer=AffinityTopologyVersion [topVer=17, minorTopVer=0], supplier=4cbc66d3-9d2c-4396-8366-2839a8d0cdb6, topic=5]]. Supplier has failed with error: java.lang.OutOfMemoryError: Java heap space{code} Supplier stack trace: !Screenshot 2019-01-20 at 23.19.08.png! > GridDhtPartitionSupplier::handleDemandMessage suppress errors > - > > Key: IGNITE-10995 > URL: https://issues.apache.org/jira/browse/IGNITE-10995 > Project: Ignite > Issue Type: Bug >Reporter: Dmitry Sherstobitov >Priority: Major > Attachments: Screenshot 2019-01-20 at 23.19.08.png > > > Scenario: > # Cluster with data > # Triggered historical rebalance > In this case, if an OOM occurs on the supplier, no failure handler is triggered and > the cluster stays alive with inconsistent data (the target node has MOVING partitions > while the supplier does nothing) > Target rebalance node log: > {code:java} > [15:00:31,418][WARNING][sys-#86][GridDhtPartitionDemander] Rebalancing from > node cancelled [grp=cache_group_4, topVer=AffinityTopologyVersion [topVer=17, > minorTopVer=0], supplier=4cbc66d3-9d2c-4396-8366-2839a8d0cdb6, topic=5]]. > Supplier has failed with error: java.lang.OutOfMemoryError: Java heap > space{code} > Supplier stack trace: > !Screenshot 2019-01-20 at 23.19.08.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10995) GridDhtPartitionSupplier::handleDemandMessage suppress errors
Dmitry Sherstobitov created IGNITE-10995: Summary: GridDhtPartitionSupplier::handleDemandMessage suppress errors Key: IGNITE-10995 URL: https://issues.apache.org/jira/browse/IGNITE-10995 Project: Ignite Issue Type: Bug Reporter: Dmitry Sherstobitov Attachments: Screenshot 2019-01-20 at 23.19.08.png Scenario: # Cluster with data # Triggered historical rebalance In this case, if an OOM occurs on the supplier, no failure handler is triggered and the cluster stays alive with inconsistent data (the target node has MOVING partitions while the supplier does nothing) Target rebalance node log: {code:java} [15:00:31,418][WARNING][sys-#86][GridDhtPartitionDemander] Rebalancing from node cancelled [grp=cache_group_4, topVer=AffinityTopologyVersion [topVer=17, minorTopVer=0], supplier=4cbc66d3-9d2c-4396-8366-2839a8d0cdb6, topic=5]]. Supplier has failed with error: java.lang.OutOfMemoryError: Java heap space{code} Supplier stack trace: !Screenshot 2019-01-20 at 23.19.08.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
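For context: the failure handler the reporter expects to fire is configured on IgniteConfiguration (available since Ignite 2.5). A minimal sketch of such a setup, assuming the stock StopNodeOrHaltFailureHandler; nothing here is taken from the reporter's actual environment:

{code:java}
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.failure.StopNodeOrHaltFailureHandler;

public class FailureHandlerExample {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Stop the node (or halt the JVM) when a critical error, such as an
        // OutOfMemoryError in a system worker thread, is reported.
        cfg.setFailureHandler(new StopNodeOrHaltFailureHandler());

        Ignite ignite = Ignition.start(cfg);
    }
}
{code}

The bug reported above is precisely that the OOM raised inside GridDhtPartitionSupplier::handleDemandMessage never reaches this handler, so the node keeps running while the target node is left with MOVING partitions.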
[jira] [Updated] (IGNITE-10943) "No next node in topology" infinite messages in log after cycle cluster nodes restart
[ https://issues.apache.org/jira/browse/IGNITE-10943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Sherstobitov updated IGNITE-10943: - Attachment: grid.1.node.1.jstack.log > "No next node in topology" infinite messages in log after cycle cluster nodes > restart > - > > Key: IGNITE-10943 > URL: https://issues.apache.org/jira/browse/IGNITE-10943 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.4 >Reporter: Dmitry Sherstobitov >Priority: Critical > Attachments: grid.1.node.1.jstack.log > > > Same scenario as in https://issues.apache.org/jira/browse/IGNITE-10878 > After cluster restarted here is one node with 100% CPU load and following > messages in log: > {code:java} > 2019-01-15T15:16:41,333][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] > Message has been added to queue: TcpDiscoveryNodeFailedMessage > [failedNodeId=e006e575-bbc8-4004-8ce3-ddc165d1748c, order=12, warning=null, > super=TcpDiscoveryAbstractMessage [sndNodeId=null, > id=3cfe0715861-24a27aff-e471-4db1-ac46-cda072de17b9, verifierNodeId=null, > topVer=0, pendingIdx=0, failedNodes=null, isClient=false]] > 2019-01-15T15:16:41,333][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] > Pending messages will be resent to local node > 2019-01-15T15:16:41,333][INFO ][tcp-disco-srvr-#3][TcpDiscoverySpi] TCP > discovery spawning a new thread for connection [rmtAddr=/172.25.1.40, > rmtPort=59236] > 2019-01-15T15:16:41,333][INFO ][tcp-disco-sock-reader-#21][TcpDiscoverySpi] > Started serving remote node connection [rmtAddr=/172.25.1.40:59236, > rmtPort=59236] > 2019-01-15T15:16:41,333][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] > Message has been added to queue: TcpDiscoveryStatusCheckMessage > [creatorNode=TcpDiscoveryNode [id=24a27aff-e471-4db1-ac46-cda072de17b9, > addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 172.17.0.1, 172.25.1.40], > sockAddrs=[/172.17.0.1:47500, /0:0:0:0:0:0:0:1%lo:47500, > lab40.gridgain.local/172.25.1.40:47500, /127.0.0.1:47500], discPort=47500, > order=0, intOrder=15, lastExchangeTime=1547554584282, loc=true, > ver=2.4.13#20190114-sha1:a7667ae6, isClient=false], failedNodeId=null, > status=0, super=TcpDiscoveryAbstractMessage [sndNodeId=null, > id=4cfe0715861-24a27aff-e471-4db1-ac46-cda072de17b9, verifierNodeId=null, > topVer=0, pendingIdx=0, failedNodes=null, isClient=false]] > 2019-01-15T15:16:41,334][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] > Ignore message failed nodes, sender node is in fail list > [nodeId=e006e575-bbc8-4004-8ce3-ddc165d1748c, > failedNodes=[a251994d-8df6-4b2d-a28c-18ec55a3a48c, > a5fa9095-2e4b-48e5-803d-551a5ebde558]] > 2019-01-15T15:16:41,334][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] No > next node in topology. > 2019-01-15T15:16:41,334][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] No > next node in topology. > 2019-01-15T15:16:41,334][DEBUG][tcp-disco-sock-reader-#21][TcpDiscoverySpi] > Initialized connection with remote node > [nodeId=6df245fe-6288-4d93-ab20-2b9ac1b35771, client=false] > 2019-01-15T15:16:41,334][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] No > next node in topology. > 2019-01-15T15:16:41,335][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] No > next node in topology. > 2019-01-15T15:16:41,335][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] No > next node in topology. > 2019-01-15T15:16:41,335][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] No > next node in topology. > 2019-01-15T15:16:41,335][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] No > next node in topology. 
> 2019-01-15T15:16:41,335][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] No > next node in topology. > 2019-01-15T15:16:41,335][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] No > next node in topology. > 2019-01-15T15:16:41,336][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] No > next node in topology. > 2019-01-15T15:16:41,336][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] No > next node in topology.{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10943) "No next node in topology" infinite messages in log after cycle cluster nodes restart
Dmitry Sherstobitov created IGNITE-10943: Summary: "No next node in topology" infinite messages in log after cycle cluster nodes restart Key: IGNITE-10943 URL: https://issues.apache.org/jira/browse/IGNITE-10943 Project: Ignite Issue Type: Bug Affects Versions: 2.4 Reporter: Dmitry Sherstobitov Attachments: grid.1.node.1.jstack.log Same scenario as in https://issues.apache.org/jira/browse/IGNITE-10878 After the cluster restart there is one node with 100% CPU load and the following messages repeating in the log: {code:java} 2019-01-15T15:16:41,333][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] Message has been added to queue: TcpDiscoveryNodeFailedMessage [failedNodeId=e006e575-bbc8-4004-8ce3-ddc165d1748c, order=12, warning=null, super=TcpDiscoveryAbstractMessage [sndNodeId=null, id=3cfe0715861-24a27aff-e471-4db1-ac46-cda072de17b9, verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null, isClient=false]] 2019-01-15T15:16:41,333][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] Pending messages will be resent to local node 2019-01-15T15:16:41,333][INFO ][tcp-disco-srvr-#3][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/172.25.1.40, rmtPort=59236] 2019-01-15T15:16:41,333][INFO ][tcp-disco-sock-reader-#21][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/172.25.1.40:59236, rmtPort=59236] 2019-01-15T15:16:41,333][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] Message has been added to queue: TcpDiscoveryStatusCheckMessage [creatorNode=TcpDiscoveryNode [id=24a27aff-e471-4db1-ac46-cda072de17b9, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 172.17.0.1, 172.25.1.40], sockAddrs=[/172.17.0.1:47500, /0:0:0:0:0:0:0:1%lo:47500, lab40.gridgain.local/172.25.1.40:47500, /127.0.0.1:47500], discPort=47500, order=0, intOrder=15, lastExchangeTime=1547554584282, loc=true, ver=2.4.13#20190114-sha1:a7667ae6, isClient=false], failedNodeId=null, status=0, super=TcpDiscoveryAbstractMessage [sndNodeId=null, id=4cfe0715861-24a27aff-e471-4db1-ac46-cda072de17b9, verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null, isClient=false]] 2019-01-15T15:16:41,334][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] Ignore message failed nodes, sender node is in fail list [nodeId=e006e575-bbc8-4004-8ce3-ddc165d1748c, failedNodes=[a251994d-8df6-4b2d-a28c-18ec55a3a48c, a5fa9095-2e4b-48e5-803d-551a5ebde558]] 2019-01-15T15:16:41,334][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] No next node in topology. 2019-01-15T15:16:41,334][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] No next node in topology. 2019-01-15T15:16:41,334][DEBUG][tcp-disco-sock-reader-#21][TcpDiscoverySpi] Initialized connection with remote node [nodeId=6df245fe-6288-4d93-ab20-2b9ac1b35771, client=false] 2019-01-15T15:16:41,334][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] No next node in topology. 2019-01-15T15:16:41,335][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] No next node in topology. 2019-01-15T15:16:41,335][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] No next node in topology. 2019-01-15T15:16:41,335][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] No next node in topology. 2019-01-15T15:16:41,335][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] No next node in topology. 2019-01-15T15:16:41,335][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] No next node in topology. 2019-01-15T15:16:41,335][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] No next node in topology. 2019-01-15T15:16:41,336][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] No next node in topology. 
2019-01-15T15:16:41,336][DEBUG][tcp-disco-msg-worker-#2][TcpDiscoverySpi] No next node in topology.{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-10935) "Invalid node order" error occurs while cycle cluster nodes restart
[ https://issues.apache.org/jira/browse/IGNITE-10935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16742397#comment-16742397 ] Dmitry Sherstobitov commented on IGNITE-10935: -- Looks like this is another exception that may occur in this scenario > "Invalid node order" error occurs while cycle cluster nodes restart > --- > > Key: IGNITE-10935 > URL: https://issues.apache.org/jira/browse/IGNITE-10935 > Project: Ignite > Issue Type: Bug >Reporter: Dmitry Sherstobitov >Priority: Critical > > Same scenario as in https://issues.apache.org/jira/browse/IGNITE-10878 > {code:java} > Exception in thread "tcp-disco-msg-worker-#2" java.lang.AssertionError: > Invalid node order: TcpDiscoveryNode > [id=9a332aa3-3d60-469a-9ff5-3deee8918451, addrs=[0:0:0:0:0:0:0:1%lo, > 127.0.0.1, 172.17.0.1, 172.25.1.40], sockAddrs=[/172.25.1.40:47501, > /0:0:0:0:0:0:0:1%lo:47501, /127.0.0.1:47501, /172.17.0.1:47501], > discPort=47501, order=0, intOrder=16, lastExchangeTime=1547486771047, > loc=false, ver=2.4.13#20190114-sha1:a7667ae6, isClient=false] > at > org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing$1.apply(TcpDiscoveryNodesRing.java:51) > at > org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing$1.apply(TcpDiscoveryNodesRing.java:48) > at org.apache.ignite.internal.util.lang.GridFunc.isAll(GridFunc.java:2030) > at > org.apache.ignite.internal.util.IgniteUtils.arrayList(IgniteUtils.java:9635) > at > org.apache.ignite.internal.util.IgniteUtils.arrayList(IgniteUtils.java:9608) > at > org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing.nodes(TcpDiscoveryNodesRing.java:625) > at > org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing.visibleNodes(TcpDiscoveryNodesRing.java:145) > at > org.apache.ignite.spi.discovery.tcp.ServerImpl.notifyDiscovery(ServerImpl.java:1429) > at > org.apache.ignite.spi.discovery.tcp.ServerImpl.access$2400(ServerImpl.java:176) > at > org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processNodeAddFinishedMessage(ServerImpl.java:4565) > at > org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2732) > at > org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2554) > at > org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerAdapter.body(ServerImpl.java:6955) > at > org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2634) > at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10935) "Invalid node order" error occurs while cycle cluster nodes restart
Dmitry Sherstobitov created IGNITE-10935: Summary: "Invalid node order" error occurs while cycle cluster nodes restart Key: IGNITE-10935 URL: https://issues.apache.org/jira/browse/IGNITE-10935 Project: Ignite Issue Type: Bug Reporter: Dmitry Sherstobitov Same scenario as in https://issues.apache.org/jira/browse/IGNITE-10878 {code:java} Exception in thread "tcp-disco-msg-worker-#2" java.lang.AssertionError: Invalid node order: TcpDiscoveryNode [id=9a332aa3-3d60-469a-9ff5-3deee8918451, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 172.17.0.1, 172.25.1.40], sockAddrs=[/172.25.1.40:47501, /0:0:0:0:0:0:0:1%lo:47501, /127.0.0.1:47501, /172.17.0.1:47501], discPort=47501, order=0, intOrder=16, lastExchangeTime=1547486771047, loc=false, ver=2.4.13#20190114-sha1:a7667ae6, isClient=false] at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing$1.apply(TcpDiscoveryNodesRing.java:51) at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing$1.apply(TcpDiscoveryNodesRing.java:48) at org.apache.ignite.internal.util.lang.GridFunc.isAll(GridFunc.java:2030) at org.apache.ignite.internal.util.IgniteUtils.arrayList(IgniteUtils.java:9635) at org.apache.ignite.internal.util.IgniteUtils.arrayList(IgniteUtils.java:9608) at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing.nodes(TcpDiscoveryNodesRing.java:625) at org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing.visibleNodes(TcpDiscoveryNodesRing.java:145) at org.apache.ignite.spi.discovery.tcp.ServerImpl.notifyDiscovery(ServerImpl.java:1429) at org.apache.ignite.spi.discovery.tcp.ServerImpl.access$2400(ServerImpl.java:176) at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processNodeAddFinishedMessage(ServerImpl.java:4565) at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2732) at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2554) at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerAdapter.body(ServerImpl.java:6955) at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2634) at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10878) "Failed to find completed exchange future" error occurs in test with round cluster restart
Dmitry Sherstobitov created IGNITE-10878: Summary: "Failed to find completed exchange future" error occurs in test with round cluster restart Key: IGNITE-10878 URL: https://issues.apache.org/jira/browse/IGNITE-10878 Project: Ignite Issue Type: Bug Reporter: Dmitry Sherstobitov # Start the cluster, create caches with no persistence and load data into them # Restart each node in the cluster in order (coordinator first); do not wait until the topology message occurs # At some moment an error may occur (1 out of 20 runs); this is the case when the topology version has had time to be reset {code:java} [23:27:17,218][INFO][exchange-worker-#62][GridCacheProcessor] Started cache [name=ENTITY_CONFIG, id=23889694, memoryPolicyName=no-evict, mode=REPLICATED, atomicity=ATOMIC, backups=2147483647] [23:27:17,222][SEVERE][exchange-worker-#62][GridDhtPartitionsExchangeFuture] Failed to reinitialize local partitions (preloading will be stopped): GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=1, minorTopVer=0], discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=83bd0a25-4574-4723-9594-b95ddaab19be, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 172.17.0.1, 172.25.1.40], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47503, /127.0.0.1:47503, /172.17.0.1:47503, lab40.gridgain.local/172.25.1.40:47503], discPort=47503, order=1, intOrder=1, lastExchangeTime=1547065626462, loc=true, ver=2.4.13#20181228-sha1:9033812f, isClient=false], topVer=1, nodeId8=83bd0a25, msg=null, type=NODE_JOINED, tstamp=1547065636782], nodeId=83bd0a25, evt=NODE_JOINED] class org.apache.ignite.IgniteCheckedException: Failed to find completed exchange future to fetch affinity. at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager$18.applyx(CacheAffinitySharedManager.java:1798) at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager$18.applyx(CacheAffinitySharedManager.java:1743) at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.forAllRegisteredCacheGroups(CacheAffinitySharedManager.java:1107) at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.initCoordinatorCaches(CacheAffinitySharedManager.java:1743) at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.initCoordinatorCaches(GridDhtPartitionsExchangeFuture.java:573) at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:679) at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2398) at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) at java.lang.Thread.run(Thread.java:748) [23:27:17,222][INFO][exchange-worker-#62][GridDhtPartitionsExchangeFuture] Finish exchange future [startVer=AffinityTopologyVersion [topVer=1, minorTopVer=0], resVer=null, err=class org.apache.ignite.IgniteCheckedException: Failed to find completed exchange future to fetch affinity.] [23:27:17,238][SEVERE][main][IgniteKernal] Got exception while starting (will rollback startup routine). class org.apache.ignite.IgniteCheckedException: Failed to find completed exchange future to fetch affinity. 
at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager$18.applyx(CacheAffinitySharedManager.java:1798) at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager$18.applyx(CacheAffinitySharedManager.java:1743) at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.forAllRegisteredCacheGroups(CacheAffinitySharedManager.java:1107) at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.initCoordinatorCaches(CacheAffinitySharedManager.java:1743) at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.initCoordinatorCaches(GridDhtPartitionsExchangeFuture.java:573) at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:679) at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2398) at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) at java.lang.Thread.run(Thread.java:748) [23:27:17,238][INFO][exchange-worker-#62][GridDhtPartitionsExchangeFuture] Completed partition exchange [localNode=83bd0a25-4574-4723-9594-b95ddaab19be, exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=1, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode [id=83bd0a25-4574-4723-9594-b95ddaa
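The restart cycle from steps 1-2 above can be approximated in a single JVM roughly as follows. This is a hypothetical sketch only: the original test runs one node per JVM, and the cache creation and data loading are omitted here.

{code:java}
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class RoundRestart {
    public static void main(String[] args) {
        int nodes = 4;

        for (int i = 0; i < nodes; i++)
            Ignition.start(new IgniteConfiguration().setIgniteInstanceName("node-" + i));

        // Restart every node in start order (node-0 is the coordinator),
        // deliberately not waiting for the topology snapshot message in between.
        for (int i = 0; i < nodes; i++) {
            Ignition.stop("node-" + i, true);
            Ignition.start(new IgniteConfiguration().setIgniteInstanceName("node-" + i));
        }
    }
}
{code}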
[jira] [Updated] (IGNITE-10672) Changing walSegments property leads to fallen node on start
[ https://issues.apache.org/jira/browse/IGNITE-10672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Sherstobitov updated IGNITE-10672: - Summary: Changing walSegments property leads to fallen node on start (was: Changing walSegments property leads to fallen node) > Changing walSegments property leads to fallen node on start > --- > > Key: IGNITE-10672 > URL: https://issues.apache.org/jira/browse/IGNITE-10672 > Project: Ignite > Issue Type: Bug >Reporter: Dmitry Sherstobitov >Priority: Major > > Start cluster with > {code} > > class="org.apache.ignite.configuration.DataStorageConfiguration"> > > > class="org.apache.ignite.configuration.DataRegionConfiguration"> > > > > > > {code} > Load some data and then restart cluster with new config: > {code} > > class="org.apache.ignite.configuration.DataStorageConfiguration"> > > > class="org.apache.ignite.configuration.DataRegionConfiguration"> > > > > > > > {code} > This will lead nodes to fail on start > {code} > [14:51:00,852][SEVERE][main][IgniteKernal] Got exception while starting (will > rollback startup routine). > class org.apache.ignite.IgniteCheckedException: Failed to start processor: > GridProcessorAdapter [] > at > org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1784) > at > org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1008) > at > org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2020) > at > org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1725) > at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1153) > at > org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1071) > at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:957) > at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:856) > at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:726) > at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:695) > at org.apache.ignite.Ignition.start(Ignition.java:348) > at > org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:301) > Caused by: class > org.apache.ignite.internal.processors.cache.persistence.StorageException: > Failed to initialize wal (work directory contains incorrect number of > segments) [cur=10, expected=5] > at > org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.checkOrPrepareFiles(FileWriteAheadLogManager.java:1408) > at > org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.start0(FileWriteAheadLogManager.java:435) > at > org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter.start(GridCacheSharedManagerAdapter.java:61) > at > org.apache.ignite.internal.processors.cache.GridCacheProcessor.start(GridCacheProcessor.java:741) > at > org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1781) > ... 11 more > [14:51:00,853][WARNING][main][IgniteKernal] Attempt to stop starting grid. > This operation cannot be guaranteed to be successful. 
> [14:51:00,855][SEVERE][main][IgniteKernal] Failed to stop component > (ignoring): GridProcessorAdapter [] > java.lang.NullPointerException > at > org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.stop0(FileWriteAheadLogManager.java:631) > at > org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter.stop(GridCacheSharedManagerAdapter.java:94) > at > org.apache.ignite.internal.processors.cache.GridCacheProcessor.stop(GridCacheProcessor.java:980) > at > org.apache.ignite.internal.IgniteKernal.stop0(IgniteKernal.java:2312) > at > org.apache.ignite.internal.IgniteKernal.stop(IgniteKernal.java:2190) > at > org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1164) > at > org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2020) > at > org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1725) > at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1153) > at > org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1071) > at org.apache.ignite.internal.IgnitionEx.
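The two XML configurations were stripped from the message above, but the error [cur=10, expected=5] implies the cluster was first started with 10 WAL segments and then restarted with 5. A programmatic sketch of the same change; the segment counts are inferred from the error message, not quoted from the lost XML, and persistence is assumed to be enabled on the default region:

{code:java}
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class WalSegmentsChange {
    /** Builds a persistent-storage config with the given WAL segment count. */
    static IgniteConfiguration config(int walSegments) {
        return new IgniteConfiguration().setDataStorageConfiguration(
            new DataStorageConfiguration()
                .setWalSegments(walSegments)
                .setDefaultDataRegionConfiguration(
                    new DataRegionConfiguration().setPersistenceEnabled(true)));
    }

    public static void main(String[] args) {
        // First run: the WAL work directory is pre-allocated with 10 segments.
        Ignite ignite = Ignition.start(config(10));
        ignite.close();

        // Second run: walSegments = 5 no longer matches the on-disk layout, so
        // FileWriteAheadLogManager.checkOrPrepareFiles() fails the node start
        // with StorageException [cur=10, expected=5].
        Ignition.start(config(5));
    }
}
{code}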
[jira] [Updated] (IGNITE-10672) Changing walSegments property leads to fallen node
[ https://issues.apache.org/jira/browse/IGNITE-10672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Sherstobitov updated IGNITE-10672: - Description: Start cluster with {code} {code} Load some data and then restart cluster with new config: {code} {code} This will lead nodes to fail on start {code} [14:51:00,852][SEVERE][main][IgniteKernal] Got exception while starting (will rollback startup routine). class org.apache.ignite.IgniteCheckedException: Failed to start processor: GridProcessorAdapter [] at org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1784) at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1008) at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2020) at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1725) at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1153) at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1071) at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:957) at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:856) at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:726) at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:695) at org.apache.ignite.Ignition.start(Ignition.java:348) at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:301) Caused by: class org.apache.ignite.internal.processors.cache.persistence.StorageException: Failed to initialize wal (work directory contains incorrect number of segments) [cur=10, expected=5] at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.checkOrPrepareFiles(FileWriteAheadLogManager.java:1408) at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.start0(FileWriteAheadLogManager.java:435) at org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter.start(GridCacheSharedManagerAdapter.java:61) at org.apache.ignite.internal.processors.cache.GridCacheProcessor.start(GridCacheProcessor.java:741) at org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1781) ... 11 more [14:51:00,853][WARNING][main][IgniteKernal] Attempt to stop starting grid. This operation cannot be guaranteed to be successful. 
[14:51:00,855][SEVERE][main][IgniteKernal] Failed to stop component (ignoring): GridProcessorAdapter [] java.lang.NullPointerException at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.stop0(FileWriteAheadLogManager.java:631) at org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter.stop(GridCacheSharedManagerAdapter.java:94) at org.apache.ignite.internal.processors.cache.GridCacheProcessor.stop(GridCacheProcessor.java:980) at org.apache.ignite.internal.IgniteKernal.stop0(IgniteKernal.java:2312) at org.apache.ignite.internal.IgniteKernal.stop(IgniteKernal.java:2190) at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1164) at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2020) at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1725) at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1153) at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1071) at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:957) at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:856) at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:726) at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:695) at org.apache.ignite.Ignition.start(Ignition.java:348) at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:301) {code} was: Start cluster with {code} {code} Load some data and then restart cluster with new config: {code} {code} This will lead node
[jira] [Updated] (IGNITE-10672) Changing walSegments property leads to fallen node
[ https://issues.apache.org/jira/browse/IGNITE-10672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Sherstobitov updated IGNITE-10672: - Description: Start cluster with {code} {code} Load some data and then restart cluster with new config: {code} {code} This will lead nodes to fail on start {code} [14:51:00,852][SEVERE][main][IgniteKernal] Got exception while starting (will rollback startup routine). class org.apache.ignite.IgniteCheckedException: Failed to start processor: GridProcessorAdapter [] at org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1784) at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1008) at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2020) at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1725) at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1153) at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1071) at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:957) at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:856) at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:726) at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:695) at org.apache.ignite.Ignition.start(Ignition.java:348) at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:301) Caused by: class org.apache.ignite.internal.processors.cache.persistence.StorageException: Failed to initialize wal (work directory contains incorrect number of segments) [cur=10, expected=5] at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.checkOrPrepareFiles(FileWriteAheadLogManager.java:1408) at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.start0(FileWriteAheadLogManager.java:435) at org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter.start(GridCacheSharedManagerAdapter.java:61) at org.apache.ignite.internal.processors.cache.GridCacheProcessor.start(GridCacheProcessor.java:741) at org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1781) ... 11 more [14:51:00,853][WARNING][main][IgniteKernal] Attempt to stop starting grid. This operation cannot be guaranteed to be successful. 
[14:51:00,855][SEVERE][main][IgniteKernal] Failed to stop component (ignoring): GridProcessorAdapter [] java.lang.NullPointerException at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.stop0(FileWriteAheadLogManager.java:631) at org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter.stop(GridCacheSharedManagerAdapter.java:94) at org.apache.ignite.internal.processors.cache.GridCacheProcessor.stop(GridCacheProcessor.java:980) at org.apache.ignite.internal.IgniteKernal.stop0(IgniteKernal.java:2312) at org.apache.ignite.internal.IgniteKernal.stop(IgniteKernal.java:2190) at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1164) at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2020) at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1725) at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1153) at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1071) at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:957) at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:856) at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:726) at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:695) at org.apache.ignite.Ignition.start(Ignition.java:348) at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:301) {code} was: Start cluster with {code} {code} Load some data and then restart cluster with new config: {code} {code} This will lead nodes to fail on start
[jira] [Created] (IGNITE-10672) Changing walSegments property leads to fallen node
Dmitry Sherstobitov created IGNITE-10672: Summary: Changing walSegments property leads to fallen node Key: IGNITE-10672 URL: https://issues.apache.org/jira/browse/IGNITE-10672 Project: Ignite Issue Type: Bug Reporter: Dmitry Sherstobitov Start cluster with {code} {code} Load some data and then restart cluster with new config: {code} {code} This will lead node to error on start {code} [14:51:00,852][SEVERE][main][IgniteKernal] Got exception while starting (will rollback startup routine). class org.apache.ignite.IgniteCheckedException: Failed to start processor: GridProcessorAdapter [] at org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1784) at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1008) at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2020) at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1725) at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1153) at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1071) at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:957) at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:856) at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:726) at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:695) at org.apache.ignite.Ignition.start(Ignition.java:348) at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:301) Caused by: class org.apache.ignite.internal.processors.cache.persistence.StorageException: Failed to initialize wal (work directory contains incorrect number of segments) [cur=10, expected=5] at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.checkOrPrepareFiles(FileWriteAheadLogManager.java:1408) at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.start0(FileWriteAheadLogManager.java:435) at org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter.start(GridCacheSharedManagerAdapter.java:61) at org.apache.ignite.internal.processors.cache.GridCacheProcessor.start(GridCacheProcessor.java:741) at org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1781) ... 11 more [14:51:00,853][WARNING][main][IgniteKernal] Attempt to stop starting grid. This operation cannot be guaranteed to be successful. 
[14:51:00,855][SEVERE][main][IgniteKernal] Failed to stop component (ignoring): GridProcessorAdapter [] java.lang.NullPointerException at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.stop0(FileWriteAheadLogManager.java:631) at org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter.stop(GridCacheSharedManagerAdapter.java:94) at org.apache.ignite.internal.processors.cache.GridCacheProcessor.stop(GridCacheProcessor.java:980) at org.apache.ignite.internal.IgniteKernal.stop0(IgniteKernal.java:2312) at org.apache.ignite.internal.IgniteKernal.stop(IgniteKernal.java:2190) at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1164) at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2020) at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1725) at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1153) at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1071) at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:957) at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:856) at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:726) at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:695) at org.apache.ignite.Ignition.start(Ignition.java:348) at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:301) {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-10189) SslContextFactory's ciphers doesn't work with control.sh utility
Dmitry Sherstobitov created IGNITE-10189: Summary: SslContextFactory's ciphers doesn't work with control.sh utility Key: IGNITE-10189 URL: https://issues.apache.org/jira/browse/IGNITE-10189 Project: Ignite Issue Type: Bug Reporter: Dmitry Sherstobitov There are no options for the control.sh utility if the ciphers feature is enabled on the server. If this property is enabled on the server: {code} ... ... {code} the control.sh utility doesn't work. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
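The elided server-side property is presumably the cipherSuites setting of SslContextFactory (exposed in releases where IGNITE-6167 landed). A minimal sketch of what enabling it looks like; the key store path, password and suite name below are placeholders, not values from the report:

{code:java}
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.ssl.SslContextFactory;

public class CipherSuitesConfig {
    public static IgniteConfiguration serverConfig() {
        SslContextFactory sslFactory = new SslContextFactory();
        sslFactory.setKeyStoreFilePath("/path/to/server.jks");     // placeholder
        sslFactory.setKeyStorePassword("changeit".toCharArray());  // placeholder

        // Restrict the enabled cipher suites; control.sh has no matching
        // option, so its SSL handshake fails against such a server.
        sslFactory.setCipherSuites("TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256");

        return new IgniteConfiguration().setSslContextFactory(sslFactory);
    }
}
{code}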
[jira] [Updated] (IGNITE-8895) Update yardstick libraries
[ https://issues.apache.org/jira/browse/IGNITE-8895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Sherstobitov updated IGNITE-8895: Description: There are currently some conflicts in the yardstick libraries ||yardstick||core||problem|| |jline-0.9.94.jar|bin/include/sqlline/jline-2.4.3.jar|./sqlline.sh unable to start because of yardstick libraries in the PATH| was: There is some conflicts in yardstick libraries for now ||yardstick||core||problem|| |jline-0.9.94.jar|bin/include/sqlline/jline-2.4.3.jar|./sqlline.sh unable to start if yardstick libraries in path| > Update yardstick libraries > --- > > Key: IGNITE-8895 > URL: https://issues.apache.org/jira/browse/IGNITE-8895 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.5 >Reporter: Dmitry Sherstobitov >Priority: Major > > There are currently some conflicts in the yardstick libraries > ||yardstick||core||problem|| > |jline-0.9.94.jar|bin/include/sqlline/jline-2.4.3.jar|./sqlline.sh unable to > start because of yardstick libraries in the PATH| > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-6167) Ability to enabled TLS protocols and cipher suites
[ https://issues.apache.org/jira/browse/IGNITE-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16667039#comment-16667039 ] Dmitry Sherstobitov commented on IGNITE-6167: - Duplicate javadoc {code:java} /** * Gets enabled cipher suites * @return enabled cipher suites */ public String[] getCipherSuites() { return cipherSuites; } /** * Gets enabled cipher suites * @return enabled cipher suites */ public String[] getProtocols() { return protocols; } {code} > Ability to enabled TLS protocols and cipher suites > -- > > Key: IGNITE-6167 > URL: https://issues.apache.org/jira/browse/IGNITE-6167 > Project: Ignite > Issue Type: Wish > Components: security >Affects Versions: 2.1 >Reporter: Jens Borgland >Assignee: Mikhail Cherkasov >Priority: Major > Fix For: 2.7 > > > It would be very useful to be able to, in addition to the > {{javax.net.ssl.SSLContext}}, either specify a custom > {{javax.net.ssl.SSLServerSocketFactory}} and a custom > {{javax.net.ssl.SSLSocketFactory}}, or to be able to at least specify the > enabled TLS protocols and cipher suites. > I have noticed that the > {{org.apache.ignite.internal.util.nio.ssl.GridNioSslFilter}} has support for > the latter but I cannot find a way of getting a reference to the filter > instance. The {{GridNioSslFilter}} also isn't used by {{TcpDiscoverySpi}} as > far as I can tell. > Currently (as far as I can tell) there is no way of specifying the enabled > cipher suites and protocols used by Ignite, without doing it globally for the > JRE. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
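The second javadoc block quoted in the comment above was copy-pasted from getCipherSuites(); a corrected version would read something like this (suggested wording, not the text that was actually committed):

{code:java}
/**
 * Gets enabled protocols.
 *
 * @return Enabled protocols.
 */
public String[] getProtocols() {
    return protocols;
}
{code}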
[jira] [Commented] (IGNITE-9752) Fix ODBC documentation
[ https://issues.apache.org/jira/browse/IGNITE-9752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16639915#comment-16639915 ] Dmitry Sherstobitov commented on IGNITE-9752: - I've updated the description; the localhost IP should be fixed too > Fix ODBC documentation > -- > > Key: IGNITE-9752 > URL: https://issues.apache.org/jira/browse/IGNITE-9752 > Project: Ignite > Issue Type: Bug > Components: documentation >Reporter: Dmitry Sherstobitov >Assignee: Prachi Garg >Priority: Blocker > Fix For: 2.7 > > Attachments: image-2018-10-01-17-12-21-555.png > > > See screen shot. > The default values do not match the values in the example: > host in default - 0.0.0.0, port in default - 10800; > host in example - 127.0.0.1, port - 12345. > The parameters in the XML example will not work for external connections > (because 127.0.0.1 is used instead of 0.0.0.0) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
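For reference, a configuration that matches the documented defaults would bind the client connector to all interfaces on port 10800. This sketch assumes the connector is configured via ClientConnectorConfiguration, as in recent 2.x releases; it is not taken from the documentation being fixed:

{code:java}
import org.apache.ignite.configuration.ClientConnectorConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class OdbcDefaults {
    public static IgniteConfiguration config() {
        // 0.0.0.0:10800 accepts external ODBC connections; the example's
        // 127.0.0.1:12345 would accept connections from the local host only.
        return new IgniteConfiguration().setClientConnectorConfiguration(
            new ClientConnectorConfiguration()
                .setHost("0.0.0.0")
                .setPort(10800));
    }
}
{code}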
[jira] [Updated] (IGNITE-9752) Fix ODBC documentation
[ https://issues.apache.org/jira/browse/IGNITE-9752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Sherstobitov updated IGNITE-9752: Description: See screen shot. The default values do not match the values in the example: host in default - 0.0.0.0, port in default - 10800; host in example - 127.0.0.1, port - 12345. The parameters in the XML example will not work for external connections (because 127.0.0.1 is used instead of 0.0.0.0) was: See screen shot. There is no matching between default values and values in example host in default - 0.0.0.0 port in default - 10800 host in example 127.0.0.1 port - 12345 Parameters in xml example will be not working for external connections > Fix ODBC documentation > -- > > Key: IGNITE-9752 > URL: https://issues.apache.org/jira/browse/IGNITE-9752 > Project: Ignite > Issue Type: Bug > Components: documentation >Reporter: Dmitry Sherstobitov >Assignee: Prachi Garg >Priority: Blocker > Fix For: 2.7 > > Attachments: image-2018-10-01-17-12-21-555.png > > > See screen shot. > The default values do not match the values in the example: > host in default - 0.0.0.0, port in default - 10800; > host in example - 127.0.0.1, port - 12345. > The parameters in the XML example will not work for external connections > (because 127.0.0.1 is used instead of 0.0.0.0) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-9752) Fix ODBC documentation
[ https://issues.apache.org/jira/browse/IGNITE-9752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Sherstobitov updated IGNITE-9752: Description: See screen shot. The default values do not match the values in the example: host in default - 0.0.0.0, port in default - 10800; host in example - 127.0.0.1, port - 12345. The parameters in the XML example will not work for external connections was: See screen shot. There is no matching between default values and values in example host in default - 0.0.0.0 port in default - 10800 host in example 127.0.0.1 (does it visible inside machine?) port - 12345 > Fix ODBC documentation > -- > > Key: IGNITE-9752 > URL: https://issues.apache.org/jira/browse/IGNITE-9752 > Project: Ignite > Issue Type: Bug > Components: documentation >Reporter: Dmitry Sherstobitov >Assignee: Prachi Garg >Priority: Blocker > Fix For: 2.7 > > Attachments: image-2018-10-01-17-12-21-555.png > > > See screen shot. > The default values do not match the values in the example: > host in default - 0.0.0.0, port in default - 10800; > host in example - 127.0.0.1, port - 12345. > The parameters in the XML example will not work for external connections -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Issue Comment Deleted] (IGNITE-9298) control.sh does not support SSL (org.apache.ignite.internal.commandline.CommandHandler)
[ https://issues.apache.org/jira/browse/IGNITE-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Sherstobitov updated IGNITE-9298: Comment: was deleted (was: We've increased chaos in args naming: {code:java} /** */ protected static final String CMD_PING_TIMEOUT = "--ping-timeout"; /** */ private static final String CMD_DUMP = "--dump"; /** */ private static final String CMD_SKIP_ZEROS = "--skipZeros"; // SSL configuration section /** */ protected static final String CMD_SSL_ENABLED = "--ssl_enabled"; /** */ protected static final String CMD_SSL_PROTOCOL = "--ssl_protocol";{code} Here is 3 different types of split word: with dash, with capital letter and with '_') > control.sh does not support SSL > (org.apache.ignite.internal.commandline.CommandHandler) > --- > > Key: IGNITE-9298 > URL: https://issues.apache.org/jira/browse/IGNITE-9298 > Project: Ignite > Issue Type: Bug > Components: clients >Affects Versions: 2.6 >Reporter: Paul Anderson >Assignee: Paul Anderson >Priority: Major > Fix For: 2.7 > > Attachments: Arguments.patch, CommandHandler.patch > > > We required SSL on the connector port and to use control.sh to work with the > baseline configuration. > This morning I added support, see attached patches against 2.6.0 for > org/apache/ignite/internal/commandline/CommandHandler.java > org/apache/ignite/internal/commandline/Arguments.java > No tests, no docs. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-9298) control.sh does not support SSL (org.apache.ignite.internal.commandline.CommandHandler)
[ https://issues.apache.org/jira/browse/IGNITE-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16638287#comment-16638287 ] Dmitry Sherstobitov commented on IGNITE-9298: - We've increased the chaos in argument naming: {code:java} /** */ protected static final String CMD_PING_TIMEOUT = "--ping-timeout"; /** */ private static final String CMD_DUMP = "--dump"; /** */ private static final String CMD_SKIP_ZEROS = "--skipZeros"; // SSL configuration section /** */ protected static final String CMD_SSL_ENABLED = "--ssl_enabled"; /** */ protected static final String CMD_SSL_PROTOCOL = "--ssl_protocol";{code} Here are 3 different ways of splitting words: with a dash, with a capital letter and with '_' > control.sh does not support SSL > (org.apache.ignite.internal.commandline.CommandHandler) > --- > > Key: IGNITE-9298 > URL: https://issues.apache.org/jira/browse/IGNITE-9298 > Project: Ignite > Issue Type: Bug > Components: clients >Affects Versions: 2.6 >Reporter: Paul Anderson >Assignee: Paul Anderson >Priority: Major > Fix For: 2.7 > > Attachments: Arguments.patch, CommandHandler.patch > > > We required SSL on the connector port and to use control.sh to work with the > baseline configuration. > This morning I added support, see attached patches against 2.6.0 for > org/apache/ignite/internal/commandline/CommandHandler.java > org/apache/ignite/internal/commandline/Arguments.java > No tests, no docs. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
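A single convention would remove the inconsistency pointed out in the comment above, for example kebab-case for every flag. This is a suggestion only, not the constant names that were actually committed:

{code:java}
/** */ protected static final String CMD_PING_TIMEOUT = "--ping-timeout";

/** */ private static final String CMD_DUMP = "--dump";

/** */ private static final String CMD_SKIP_ZEROS = "--skip-zeros";

// SSL configuration section

/** */ protected static final String CMD_SSL_ENABLED = "--ssl-enabled";

/** */ protected static final String CMD_SSL_PROTOCOL = "--ssl-protocol";
{code}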
[jira] [Updated] (IGNITE-9752) Fix ODBC documentation
[ https://issues.apache.org/jira/browse/IGNITE-9752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Sherstobitov updated IGNITE-9752: Description: See screen shot. The default values do not match the values in the example was: See screen shot. There is no matching between default values and values in example !image-2018-10-01-17-12-28-557.png! > Fix ODBC documentation > -- > > Key: IGNITE-9752 > URL: https://issues.apache.org/jira/browse/IGNITE-9752 > Project: Ignite > Issue Type: Bug >Reporter: Dmitry Sherstobitov >Priority: Major > Attachments: image-2018-10-01-17-12-21-555.png > > > See screen shot. > The default values do not match the values in the example -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-9752) Fix ODBC documentation
[ https://issues.apache.org/jira/browse/IGNITE-9752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Sherstobitov updated IGNITE-9752: Description: See screen shot. The default values do not match the values in the example: host in default - 0.0.0.0, port in default - 10800; host in example - 127.0.0.1 (is it visible only inside the machine?), port - 12345 was: See screen shot. There is no matching between default values and values in example > Fix ODBC documentation > -- > > Key: IGNITE-9752 > URL: https://issues.apache.org/jira/browse/IGNITE-9752 > Project: Ignite > Issue Type: Bug >Reporter: Dmitry Sherstobitov >Priority: Major > Attachments: image-2018-10-01-17-12-21-555.png > > > See screen shot. > The default values do not match the values in the example: > host in default - 0.0.0.0, port in default - 10800; > host in example - 127.0.0.1 (is it visible only inside the machine?), > port - 12345 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9752) Fix ODBC documentation
Dmitry Sherstobitov created IGNITE-9752: --- Summary: Fix ODBC documentation Key: IGNITE-9752 URL: https://issues.apache.org/jira/browse/IGNITE-9752 Project: Ignite Issue Type: Bug Reporter: Dmitry Sherstobitov Attachments: image-2018-10-01-17-12-21-555.png See screen shot. The default values do not match the values in the example !image-2018-10-01-17-12-28-557.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-9751) Fix odbc driver description
[ https://issues.apache.org/jira/browse/IGNITE-9751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Sherstobitov updated IGNITE-9751: Summary: Fix odbc driver description (was: Fix odic driver description) > Fix odbc driver description > --- > > Key: IGNITE-9751 > URL: https://issues.apache.org/jira/browse/IGNITE-9751 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.7 >Reporter: Dmitry Sherstobitov >Priority: Major > Attachments: Screen Shot 2018-10-01 at 14.55.21.png > > > There is no version and company -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9751) Fix odic driver description
Dmitry Sherstobitov created IGNITE-9751: --- Summary: Fix odic driver description Key: IGNITE-9751 URL: https://issues.apache.org/jira/browse/IGNITE-9751 Project: Ignite Issue Type: Bug Affects Versions: 2.7 Reporter: Dmitry Sherstobitov Attachments: Screen Shot 2018-10-01 at 14.55.21.png The version and company are missing -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-9309) LocalNodeMovingPartitionsCount metrics may calculates incorrect due to processFullPartitionUpdate
[ https://issues.apache.org/jira/browse/IGNITE-9309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16593502#comment-16593502 ] Dmitry Sherstobitov commented on IGNITE-9309: - [~avinogradov] [~Mmuzaf] I've moved files from 7165 here. Please look at [~Jokser] comment > LocalNodeMovingPartitionsCount metrics may calculates incorrect due to > processFullPartitionUpdate > - > > Key: IGNITE-9309 > URL: https://issues.apache.org/jira/browse/IGNITE-9309 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.6 >Reporter: Maxim Muzafarov >Priority: Major > Attachments: GridCacheRebalancingCancelTestNoReproduce.java, > node-2-jstack.log, node-NO_REBALANCE-7165.log > > > [~qvad] have found incorrect {{LocalNodeMovingPartitionsCount}} metrics > calculation on client node {{JOIN\LEFT}}. Full issue reproducer is absent. > Probable scenario: > {code} > Repeat 10 times: > 1. stop node > 2. clean lfs > 3. add stopped node (trigger rebalance) > 4. 3 times: start 2 clients, wait for topology snapshot, close clients > 5. for each cache group check JMX metrics LocalNodeMovingPartitionsCount > (like waitForFinishRebalance()) > {code} > Whole discussion and all configuration details can be found in comments of > [IGNITE-7165|https://issues.apache.org/jira/browse/IGNITE-7165]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-9309) LocalNodeMovingPartitionsCount metrics may calculates incorrect due to processFullPartitionUpdate
[ https://issues.apache.org/jira/browse/IGNITE-9309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Sherstobitov updated IGNITE-9309: Attachment: node-NO_REBALANCE-7165.log node-2-jstack.log GridCacheRebalancingCancelTestNoReproduce.java > LocalNodeMovingPartitionsCount metrics may calculates incorrect due to > processFullPartitionUpdate > - > > Key: IGNITE-9309 > URL: https://issues.apache.org/jira/browse/IGNITE-9309 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.6 >Reporter: Maxim Muzafarov >Priority: Major > Attachments: GridCacheRebalancingCancelTestNoReproduce.java, > node-2-jstack.log, node-NO_REBALANCE-7165.log > > > [~qvad] have found incorrect {{LocalNodeMovingPartitionsCount}} metrics > calculation on client node {{JOIN\LEFT}}. Full issue reproducer is absent. > Probable scenario: > {code} > Repeat 10 times: > 1. stop node > 2. clean lfs > 3. add stopped node (trigger rebalance) > 4. 3 times: start 2 clients, wait for topology snapshot, close clients > 5. for each cache group check JMX metrics LocalNodeMovingPartitionsCount > (like waitForFinishRebalance()) > {code} > Whole discussion and all configuration details can be found in comments of > [IGNITE-7165|https://issues.apache.org/jira/browse/IGNITE-7165]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
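Step 5 of the probable scenario above polls the LocalNodeMovingPartitionsCount attribute over JMX. A hedged sketch of such a waitForFinishRebalance()-style helper; the ObjectName layout of the "Cache groups" MBean group varies between Ignite versions, so the wildcard query below is an assumption, not the exact pattern the test used:

{code:java}
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class RebalanceWait {
    /** Polls every cache group MBean until no partition is in MOVING state. */
    public static void waitForFinishRebalance() throws Exception {
        MBeanServer srv = ManagementFactory.getPlatformMBeanServer();

        // Assumed pattern: match any domain, any cache group bean.
        ObjectName filter = new ObjectName("*:group=\"Cache groups\",*");

        while (true) {
            boolean moving = false;

            for (ObjectName name : srv.queryNames(filter, null)) {
                Number cnt = (Number)srv.getAttribute(name, "LocalNodeMovingPartitionsCount");

                if (cnt.longValue() > 0)
                    moving = true;
            }

            if (!moving)
                return;

            Thread.sleep(500L);
        }
    }
}
{code}

The bug report is that this count stays above zero (or changes incorrectly) after clients join and leave, even though no rebalance is actually in progress.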
[jira] [Commented] (IGNITE-7165) Re-balancing is cancelled if client node joins
[ https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16582657#comment-16582657 ] Dmitry Sherstobitov commented on IGNITE-7165:
-
[~Mmuzaf] Here is the code of the test. There is one big problem with it: it is single-JVM. The second problem is that it is not a reproducer. [^GridCacheRebalancingCancelTestNoReproduce.java]

> Re-balancing is cancelled if client node joins
> --
>
> Key: IGNITE-7165
> URL: https://issues.apache.org/jira/browse/IGNITE-7165
> Project: Ignite
> Issue Type: Bug
> Reporter: Mikhail Cherkasov
> Assignee: Maxim Muzafarov
> Priority: Critical
> Labels: rebalance
> Fix For: 2.7
>
> Attachments: GridCacheRebalancingCancelTestNoReproduce.java, node-2-jstack.log, node-NO_REBALANCE-7165.log
>
> Re-balancing is cancelled if a client node joins. Re-balancing can take hours, and each time a client node joins it starts again:
> [15:10:05,700][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager] Added new node to topology: TcpDiscoveryNode [id=979cf868-1c37-424a-9ad1-12db501f32ef, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1, 172.31.16.213], sockAddrs=[/0:0:0:0:0:0:0:1:0, /127.0.0.1:0, /172.31.16.213:0], discPort=0, order=36, intOrder=24, lastExchangeTime=1512907805688, loc=false, ver=2.3.1#20171129-sha1:4b1ec0fe, isClient=true]
> [15:10:05,701][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager] Topology snapshot [ver=36, servers=7, clients=5, CPUs=128, heap=160.0GB]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], crd=false, evt=NODE_JOINED, evtNode=979cf868-1c37-424a-9ad1-12db501f32ef, customEvt=null, allowMerge=true]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionsExchangeFuture] Finish exchange future [startVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], err=null]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Finished exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], crd=false]
> [15:10:05,703][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=36, minorTopVer=0], evt=NODE_JOINED, node=979cf868-1c37-424a-9ad1-12db501f32ef]
> [15:10:08,706][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] Cancelled rebalancing from all nodes [topology=AffinityTopologyVersion [topVer=35, minorTopVer=0]]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager] Rebalancing scheduled [order=[statementp]]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager] Rebalancing started [top=null, evt=NODE_JOINED, node=a8be3c14-9add-48c3-b099-3fd304cfdbf4]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] Starting rebalancing [mode=ASYNC, fromNode=2f6bde48-ffb5-4815-bd32-df4e57dc13e0, partitionsCount=18, topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], updateSeq=-1754630006]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] Starting rebalancing [mode=ASYNC, fromNode=35d01141-4dce-47dd-adf6-a4f3b2bb9da9, partitionsCount=15, topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] Starting rebalancing [mode=ASYNC, fromNode=b3a8be53-e61f-4023-a906-a265923837ba, partitionsCount=15, topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] Starting rebalancing [mode=ASYNC, fromNode=f825cb4e-7dcc-405f-a40d-c1dc1a3ade5a, partitionsCount=12, topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] Starting rebalancing [mode=ASYNC, fromNode=4ae1db91-8b88-4180-a84b-127a303959e9, partitionsCount=11, topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] Starting rebalancing [mode=ASYNC, fromNode=7c286481-7638-49e4-8c68-fa6aa65d8b76, partitionsCount=18, topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], updateSeq=-1754630006]
> so in clusters with a big amount of data and frequent client left/join events this means that a new server will never receive its partitions.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
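Since the attached test is single-JVM and not a reproducer, here is a minimal sketch of the scenario the issue describes, using only the public Ignition API (same single-JVM limitation). This is an illustration under assumptions, not the attached test: the cache name, node names, backup count, data volume, and default discovery are all made up for the sketch.

{code:java}
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.cache.CacheRebalanceMode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class RebalanceCancelSketch {
    /** Node config; cache name, backups and rebalance mode are illustrative. */
    private static IgniteConfiguration cfg(String name, boolean client) {
        CacheConfiguration<Integer, byte[]> ccfg = new CacheConfiguration<Integer, byte[]>("test-cache")
            .setCacheMode(CacheMode.PARTITIONED)
            .setBackups(1)
            .setRebalanceMode(CacheRebalanceMode.ASYNC);

        return new IgniteConfiguration()
            .setIgniteInstanceName(name)
            .setClientMode(client)
            .setCacheConfiguration(ccfg);
    }

    public static void main(String[] args) {
        Ignite srv0 = Ignition.start(cfg("srv-0", false));

        // Preload enough data that rebalancing to a joining server takes a while.
        try (IgniteDataStreamer<Integer, byte[]> streamer = srv0.dataStreamer("test-cache")) {
            for (int i = 0; i < 100_000; i++)
                streamer.addData(i, new byte[1024]);
        }

        // A new server joins: asynchronous rebalancing towards it starts.
        Ignition.start(cfg("srv-1", false));

        // "Blink" client nodes while srv-1 is still receiving partitions. On affected
        // builds each client join/leave cancels and restarts the rebalance.
        for (int i = 0; i < 3; i++)
            Ignition.start(cfg("client-" + i, true)).close();
    }
}
{code}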
[jira] [Updated] (IGNITE-7165) Re-balancing is cancelled if client node joins
[ https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Sherstobitov updated IGNITE-7165:
Attachment: GridCacheRebalancingCancelTestNoReproduce.java

> Re-balancing is cancelled if client node joins
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-7165) Re-balancing is cancelled if client node joins
[ https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16582617#comment-16582617 ] Dmitry Sherstobitov commented on IGNITE-7165:
-
Looks like we have an old issue with a hanging rebalance; the IGNITE-7165 pull request increases the chance of hitting it. Adding Thread.sleep(200) in this method solves the problem:

{code:java}
// GridCachePartitionExchangeManager.java
private void processFullPartitionUpdate(ClusterNode node, GridDhtPartitionsFullMessage msg) {
    if (!enterBusy())
        return;

    try {
        if (msg.exchangeId() == null) {
            if (log.isDebugEnabled())
                log.debug("Received full partition update [node=" + node.id() + ", msg=" + msg + ']');

            boolean updated = false;

            for (Map.Entry entry : msg.partitions().entrySet()) {
                // Workaround: throttle processing of each partition map entry.
                try {
                    Thread.sleep(200);
                }
                catch (InterruptedException e) {
                    e.printStackTrace();
                }
                // ... rest of the method unchanged
{code}

I've recently tested 137dd06aaee9cc84104e6b4bf48306b050341e3a plus this change in my test environment, and the test passed.

> Re-balancing is cancelled if client node joins
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-7165) Re-balancing is cancelled if client node joins
[ https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16582459#comment-16582459 ] Dmitry Sherstobitov commented on IGNITE-7165:
-
[~avinogradov] Yes. Two builds were under test, both from apache-ignite master (revisions 137dd06aaee9cc84104e6b4bf48306b050341e3a and f6f731f575290b10d6d6bcb6869bb0a1b470455e).

> Re-balancing is cancelled if client node joins
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-7165) Re-balancing is cancelled if client node joins
[ https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16582280#comment-16582280 ] Dmitry Sherstobitov commented on IGNITE-7165:
-
[~avinogradov] 1) I'm trying to solve this problem by writing a jUnit reproducer. 2) I've tested with and without this pull request, and the problem is definitely introduced by this commit.

> Re-balancing is cancelled if client node joins
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (IGNITE-7165) Re-balancing is cancelled if client node joins
[ https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16581164#comment-16581164 ] Dmitry Sherstobitov edited comment on IGNITE-7165 at 8/15/18 2:44 PM:
--
The following set of caches leads to the bug in my test :) None of these caches can change their JMX properties after clients connect/disconnect. I'm still trying to reduce this list, but for now this is the final set:

{code:xml}
{code}

This is the log from the test framework's output:

{code}
Current metric state for cache cache_group_1_028 on node 2: 19
Current metric state for cache cache_group_2_058 on node 2: 32
Current metric state for cache cache_group_5 on node 2: 128
Current metric state for cache cache_group_5 on node 2: 128
Current metric state for cache cache_group_4 on node 2: 512
Current metric state for cache cache_group_4_118 on node 2: 32
Current metric state for cache cache_group_6 on node 2: 64
Current metric state for cache cache_group_2_031 on node 2: 512
Current metric state for cache cache_group_6 on node 2: 64
[17:43:27][:568 :617] Wait rebalance to finish 7/240
Current metric state for cache cache_group_1_028 on node 2: 19
Current metric state for cache cache_group_2_058 on node 2: 32
Current metric state for cache cache_group_5 on node 2: 128
Current metric state for cache cache_group_5 on node 2: 128
Current metric state for cache cache_group_4 on node 2: 512
Current metric state for cache cache_group_4_118 on node 2: 32
Current metric state for cache cache_group_6 on node 2: 64
Current metric state for cache cache_group_2_031 on node 2: 512
Current metric state for cache cache_group_6 on node 2: 64
{code}

> Re-balancing is cancelled if client node joins
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
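The metric being watched above can also be polled directly over JMX. Below is a hedged sketch using only the standard javax.management API; the service URL, the port, and the "org.apache:*" ObjectName pattern are assumptions about how the node exposes its cache-group MBeans, so adjust them to whatever jconsole actually shows for your node.

{code:java}
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class MovingPartitionsPoller {
    public static void main(String[] args) throws Exception {
        // Assumed JMX endpoint of node 2 (enable with -Dcom.sun.management.jmxremote.port=49112).
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:49112/jmxrmi");

        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();

            // Assumed pattern: the node registers its MBeans under the org.apache domain.
            for (ObjectName name : mbs.queryNames(new ObjectName("org.apache:*"), null)) {
                try {
                    Object moving = mbs.getAttribute(name, "LocalNodeMovingPartitionsCount");

                    // 0 means rebalancing has finished for this cache group.
                    System.out.println("Current metric state for " + name + ": " + moving);
                }
                catch (Exception ignored) {
                    // Not a cache-group MBean (no such attribute) - skip it.
                }
            }
        }
    }
}
{code}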
[jira] [Commented] (IGNITE-7165) Re-balancing is cancelled if client node joins
[ https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16581079#comment-16581079 ] Dmitry Sherstobitov commented on IGNITE-7165:
-
[~Mmuzaf] I'm looking at this issue too. I have some additional information: cache_group_1_028 is not the only cache in the test, and there is some dependency on the number of caches and/or their configs. I will provide more information once I have some results.

> Re-balancing is cancelled if client node joins
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (IGNITE-7165) Re-balancing is cancelled if client node joins
[ https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580970#comment-16580970 ] Dmitry Sherstobitov edited comment on IGNITE-7165 at 8/15/18 12:15 PM:
---
[~Mmuzaf] Config (UPD: disabling persistence solves the problem):

{code:java}
{code}

Test code:

{code:python}
def test_blinking_clients_clean_lfs(self):
    """ IGNITE-7165 """
    self.wait_for_running_clients_num(client_num=0, timeout=120)

    self.start_grid()  # start 4 nodes

    for _ in range(0, 10):
        log_print("Iteration %s" % str(_))

        self.assert_nodes_alive()  # check that no nodes left the grid because of the failure handler

        self.ignite.kill_node(2)
        self._cleanup_lfs(2)
        self.ignite.start_node(2)

        # run Ignition.start() with the client config and do nothing, 3 times
        with PiClient(self.ignite, self.get_client_config()):
            pass

        with PiClient(self.ignite, self.get_client_config()):
            pass

        with PiClient(self.ignite, self.get_client_config()):
            pass

        # check the LocalNodeMovingPartitionsCount metric for all cache groups in the cluster
        # and wait until this value is 0 for all cache groups
        self.wait_for_finish_rebalance()
{code}

self.start_grid() starts a real grid on distributed servers using the ignite.sh scripts. The "with PiClient" block starts a JVM and runs Ignition.start() with the client config (the major difference from the server config is clientMode=true).

The log file of this test shows that the metric does not change its state within 240 seconds on current master (I've recently checked this on the 15 Aug nightly build):

{code:java}
Current metric state for cache cache_group_1_028 on node 2: 19
[14:44:58][:568 :617] Wait rebalance to finish 7/240
Current metric state for cache cache_group_1_028 on node 2: 19
[14:45:04][:568 :617] Wait rebalance to finish 13/240
Current metric state for cache cache_group_1_028 on node 2: 19
[14:48:47][:568 :617] Wait rebalance to finish 236/240
Current metric state for cache cache_group_1_028 on node 2: 19
{code}

Config of the cache that fails:

{code:xml}
{code}

I'm afraid this is all the information I can provide for now. I've attached a jstack from node 2: [^node-2-jstack.log]
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
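For reference, a sketch of what each "with PiClient(...)" block above amounts to on the Java side, per the description in the comment: Ignition.start() with a client-mode config, followed by an immediate shutdown. The instance name is illustrative, and the sketch omits the rest of the server config that the client copy would share.

{code:java}
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class BlinkingClient {
    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            // The major difference from the server config is clientMode=true.
            IgniteConfiguration clientCfg = new IgniteConfiguration()
                .setIgniteInstanceName("blinking-client")
                .setClientMode(true);

            // Join the topology as a client and leave immediately, doing nothing:
            // on affected builds each join/leave restarts the servers' rebalance.
            try (Ignite client = Ignition.start(clientCfg)) {
                // Deliberately empty, mirroring the "pass" body of the Python context manager.
            }
        }
    }
}
{code}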
[jira] [Comment Edited] (IGNITE-7165) Re-balancing is cancelled if client node joins
[ https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580970#comment-16580970 ] Dmitry Sherstobitov edited comment on IGNITE-7165 at 8/15/18 12:15 PM: --- [~Mmuzaf] Config: (UPD: Disabling persistance solves the problem) {code:java} {code} Test code: {code:java} def test_blinking_clients_clean_lfs(self): """ IGNITE-7165 """ self.wait_for_running_clients_num(client_num=0, timeout=120) self.start_grid() # start 4 nodes for _ in range(0, 10): log_print("Iteration %s" % str(_)) self.assert_nodes_alive() # check that no nodes left grid because of FailHandler self.ignite.kill_node(2) self._cleanup_lfs(2) self.ignite.start_node(2) # start Ignition.start() with client config and do nothing 3 times with PiClient(self.ignite, self.get_client_config()): pass with PiClient(self.ignite, self.get_client_config()): pass with PiClient(self.ignite, self.get_client_config()): pass # check LocalNodeMovingPartitionsCount metric for all cache groups in cluster # wait that for all cache groups this value will be 0 self.wait_for_finish_rebalance(){code} self.start_grid() start real grid on distributed servers using ignite.sh scripts. with PiClient block start JVM and runs Ignition.start() with client config (major difference with server config is clientMode=true) Log file of this test contains following information: metric dos not change their state in 240 seconds in current master. (I've recently check this on 15 Aug nightly build) {code:java} Current metric state for cache cache_group_1_028 on node 2: 19 [14:44:58][:568 :617] Wait rebalance to finish 7/240 Current metric state for cache cache_group_1_028 on node 2: 19 [14:45:04][:568 :617] Wait rebalance to finish 13/240 Current metric state for cache cache_group_1_028 on node 2: 19 [14:48:47][:568 :617] Wait rebalance to finish 236/240 Current metric state for cache cache_group_1_028 on node 2: 19{code} Config of the cache that fails: {code:xml} {code} I'm afraid that this all information that I can provide for you for now. I've attached jstack from node2: [^node-2-jstack.log]^[^node-2-jstack.log]^ was (Author: qvad): [~Mmuzaf] Config: ([^node-2-jstack.log]PD: Disabling persistance solves the problem) {code:java} {code} Test code: {code:java} def test_blinking_clients_clean_lfs(self): """ IGNITE-7165 """ self.wait_for_running_clients_num(client_num=0, timeout=120) self.start_grid() # start 4 nodes for _ in range(0, 10): log_print("Iteration %s" % str(_)) self.assert_nodes_alive() # check that no nodes left grid because of FailHandler self.ignite.kill_node(2) self._cleanup_lfs(2) self.ignite.start_node(2) # start Ignition.start() with client config and do nothing 3 times with PiClient(self.ignite, self.get_client_config()): pass with PiClient(self.ignite, self.get_client_config()): pass with PiClient(self.ignite, self.get_client_config()): pass # check LocalNodeMovingPartitionsCount metric for all cache groups in cluster # wait that for all cache groups this value will be 0 self.wait_for_finish_rebalance(){code} self.start_grid() start real grid on distributed servers using ignite.sh scripts. with PiClient block start JVM and runs Ignition.start() with client config (major difference with server config is clientMode=true) Log file of this test contains following information: metric dos not change their state in 240 seconds in current master. 
(I've recently check this on 15 Aug nightly build) {code:java} Current metric state for cache cache_group_1_028 on node 2: 19 [14:44:58][:568 :617] Wait rebalance to finish 7/240 Current metric state for cache cache_group_1_028 on node 2: 19 [14:45:04][:568 :617] Wait rebalance to finish 13/240 Current metric state for cache cache_group_1_028 on node 2: 19 [14:48:47][:568 :617] Wait rebalance to finish 236/240 Current metric state for cache cache_group_1_028 on node 2: 19{code} Config of the cache that fails: {code:xml} {code} I'm afraid that this all information that I can provide for you for now. I've a
[jira] [Comment Edited] (IGNITE-7165) Re-balancing is cancelled if client node joins
[ https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580970#comment-16580970 ] Dmitry Sherstobitov edited comment on IGNITE-7165 at 8/15/18 12:04 PM: --- [~Mmuzaf] Config: {code:java} {code} Test code: {code} def test_blinking_clients_clean_lfs(self): """ IGNITE-7165 """ self.wait_for_running_clients_num(client_num=0, timeout=120) self.start_grid() # start 4 nodes for _ in range(0, 10): log_print("Iteration %s" % str(_)) self.assert_nodes_alive() # check that no nodes left grid because of FailHandler self.ignite.kill_node(2) self._cleanup_lfs(2) self.ignite.start_node(2) # start Ignition.start() with client config and do nothing 3 times with PiClient(self.ignite, self.get_client_config()): pass with PiClient(self.ignite, self.get_client_config()): pass with PiClient(self.ignite, self.get_client_config()): pass # check LocalNodeMovingPartitionsCount metric for all cache groups in cluster # wait that for all cache groups this value will be 0 self.wait_for_finish_rebalance(){code} self.start_grid() start real grid on distributed servers using ignite.sh scripts. with PiClient block start JVM and runs Ignition.start() with client config (major difference with server config is clientMode=true) Log file of this test contains following information: metric dos not change their state in 240 seconds in current master. (I've recently check this on 15 Aug nightly build) {code} Current metric state for cache cache_group_1_028 on node 2: 19 [14:44:58][:568 :617] Wait rebalance to finish 7/240 Current metric state for cache cache_group_1_028 on node 2: 19 [14:45:04][:568 :617] Wait rebalance to finish 13/240 Current metric state for cache cache_group_1_028 on node 2: 19 [14:48:47][:568 :617] Wait rebalance to finish 236/240 Current metric state for cache cache_group_1_028 on node 2: 19{code} Config of the cache that fails: {code:xml} {code} I'm afraid that this all information that I can provide for you for now. I've attached jstack from node2: [^node-2-jstack.log]^[^node-2-jstack.log]^ was (Author: qvad): [~Mmuzaf] Config: {code:java} {code} Test code: {code:python} def test_blinking_clients_clean_lfs(self): """ IGNITE-7165 """ self.wait_for_running_clients_num(client_num=0, timeout=120) self.start_grid() # start 4 nodes for _ in range(0, 10): log_print("Iteration %s" % str(_)) self.assert_nodes_alive() # check that no nodes left grid because of FailHandler self.ignite.kill_node(2) self._cleanup_lfs(2) self.ignite.start_node(2) # start Ignition.start() with client config and do nothing 3 times with PiClient(self.ignite, self.get_client_config()): pass with PiClient(self.ignite, self.get_client_config()): pass with PiClient(self.ignite, self.get_client_config()): pass # check LocalNodeMovingPartitionsCount metric for all cache groups in cluster # wait that for all cache groups this value will be 0 self.wait_for_finish_rebalance(){code} Here is code from our test on python. self.start_grid() start real grid on distributed servers using ignite.sh scripts. with PiClient block start JVM and runs Ignition.start() with client config (major difference with server config is clientMode=true) Log file of this test contains following information: metric dos not change their state in 240 seconds in current master. 
(I've recently check this on 15 Aug nightly build) {code:sh} Current metric state for cache cache_group_1_028 on node 2: 19 [14:44:58][:568 :617] Wait rebalance to finish 7/240 Current metric state for cache cache_group_1_028 on node 2: 19 [14:45:04][:568 :617] Wait rebalance to finish 13/240 Current metric state for cache cache_group_1_028 on node 2: 19 [14:48:47][:568 :617] Wait rebalance to finish 236/240 Current metric state for cache cache_group_1_028 on node 2: 19{code} Config of the cache that fails: {code:xml} {code} I'm afraid that this all information that I can provide for you for now. I've attached jstack from node2: [^node-2-jstack.log] > Re-balancing is cancelled if client
[jira] [Comment Edited] (IGNITE-7165) Re-balancing is cancelled if client node joins
[ https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580970#comment-16580970 ] Dmitry Sherstobitov edited comment on IGNITE-7165 at 8/15/18 12:03 PM: --- [~Mmuzaf] Config: {code:java} {code} Test code: {code:python} def test_blinking_clients_clean_lfs(self): """ IGNITE-7165 """ self.wait_for_running_clients_num(client_num=0, timeout=120) self.start_grid() # start 4 nodes for _ in range(0, 10): log_print("Iteration %s" % str(_)) self.assert_nodes_alive() # check that no nodes left grid because of FailHandler self.ignite.kill_node(2) self._cleanup_lfs(2) self.ignite.start_node(2) # start Ignition.start() with client config and do nothing 3 times with PiClient(self.ignite, self.get_client_config()): pass with PiClient(self.ignite, self.get_client_config()): pass with PiClient(self.ignite, self.get_client_config()): pass # check LocalNodeMovingPartitionsCount metric for all cache groups in cluster # wait that for all cache groups this value will be 0 self.wait_for_finish_rebalance(){code} Here is code from our test on python. self.start_grid() start real grid on distributed servers using ignite.sh scripts. with PiClient block start JVM and runs Ignition.start() with client config (major difference with server config is clientMode=true) Log file of this test contains following information: metric dos not change their state in 240 seconds in current master. (I've recently check this on 15 Aug nightly build) {code:sh} Current metric state for cache cache_group_1_028 on node 2: 19 [14:44:58][:568 :617] Wait rebalance to finish 7/240 Current metric state for cache cache_group_1_028 on node 2: 19 [14:45:04][:568 :617] Wait rebalance to finish 13/240 Current metric state for cache cache_group_1_028 on node 2: 19 [14:48:47][:568 :617] Wait rebalance to finish 236/240 Current metric state for cache cache_group_1_028 on node 2: 19{code} Config of the cache that fails: {code:xml} {code} I'm afraid that this all information that I can provide for you for now. I've attached jstack from node2: [^node-2-jstack.log] was (Author: qvad): {code:java} {code} [~Mmuzaf] {code:java} {code} {code:java} def test_blinking_clients_clean_lfs(self): """ IGNITE-7165 """ self.wait_for_running_clients_num(client_num=0, timeout=120) self.start_grid() # start 4 nodes for _ in range(0, 10): log_print("Iteration %s" % str(_)) self.assert_nodes_alive() # check that no nodes left grid because of FailHandler self.ignite.kill_node(2) self._cleanup_lfs(2) self.ignite.start_node(2) # start Ignition.start() with client config and do nothing 3 times with PiClient(self.ignite, self.get_client_config()): pass with PiClient(self.ignite, self.get_client_config()): pass with PiClient(self.ignite, self.get_client_config()): pass # check LocalNodeMovingPartitionsCount metric for all cache groups in cluster # wait that for all cache groups this value will be 0 self.wait_for_finish_rebalance(){code} Here is code from our test on python. self.start_grid() start real grid on distributed servers using ignite.sh scripts. with PiClient block start JVM and runs Ignition.start() with client config (major difference with server config is clientMode=true) Log file of this test contains following information: metric dos not change their state in 240 seconds in current master. 
(I've recently check this on 15 Aug nightly build) {code:java} Current metric state for cache cache_group_1_028 on node 2: 19 [14:44:58][:568 :617] Wait rebalance to finish 7/240 Current metric state for cache cache_group_1_028 on node 2: 19 [14:45:04][:568 :617] Wait rebalance to finish 13/240 Current metric state for cache cache_group_1_028 on node 2: 19 [14:48:47][:568 :617] Wait rebalance to finish 236/240 Current metric state for cache cache_group_1_028 on node 2: 19{code} Config of the cache that fails: {code:java} {code} I'm afraid that this all information that I can provide for you for now. I've attached jstack from node2: [^node-2-jstack.log] > Re-bala
[jira] [Comment Edited] (IGNITE-7165) Re-balancing is cancelled if client node joins
[ https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580970#comment-16580970 ] Dmitry Sherstobitov edited comment on IGNITE-7165 at 8/15/18 12:02 PM: --- {code:java} {code} [~Mmuzaf] {code:java} {code} {code:java} def test_blinking_clients_clean_lfs(self): """ IGNITE-7165 """ self.wait_for_running_clients_num(client_num=0, timeout=120) self.start_grid() # start 4 nodes for _ in range(0, 10): log_print("Iteration %s" % str(_)) self.assert_nodes_alive() # check that no nodes left grid because of FailHandler self.ignite.kill_node(2) self._cleanup_lfs(2) self.ignite.start_node(2) # start Ignition.start() with client config and do nothing 3 times with PiClient(self.ignite, self.get_client_config()): pass with PiClient(self.ignite, self.get_client_config()): pass with PiClient(self.ignite, self.get_client_config()): pass # check LocalNodeMovingPartitionsCount metric for all cache groups in cluster # wait that for all cache groups this value will be 0 self.wait_for_finish_rebalance(){code} Here is code from our test on python. self.start_grid() start real grid on distributed servers using ignite.sh scripts. with PiClient block start JVM and runs Ignition.start() with client config (major difference with server config is clientMode=true) Log file of this test contains following information: metric dos not change their state in 240 seconds in current master. (I've recently check this on 15 Aug nightly build) {code:java} Current metric state for cache cache_group_1_028 on node 2: 19 [14:44:58][:568 :617] Wait rebalance to finish 7/240 Current metric state for cache cache_group_1_028 on node 2: 19 [14:45:04][:568 :617] Wait rebalance to finish 13/240 Current metric state for cache cache_group_1_028 on node 2: 19 [14:48:47][:568 :617] Wait rebalance to finish 236/240 Current metric state for cache cache_group_1_028 on node 2: 19{code} Config of the cache that fails: {code:java} {code} I'm afraid that this all information that I can provide for you for now. I've attached jstack from node2: [^node-2-jstack.log] was (Author: qvad): {code:java} {code} {code:java} def test_blinking_clients_clean_lfs(self): """ IGNITE-7165 """ self.wait_for_running_clients_num(client_num=0, timeout=120) self.start_grid() # start 4 nodes for _ in range(0, 10): log_print("Iteration %s" % str(_)) self.assert_nodes_alive() # check that no nodes left grid because of FailHandler self.ignite.kill_node(2) self._cleanup_lfs(2) self.ignite.start_node(2) # start Ignition.start() with client config and do nothing 3 times with PiClient(self.ignite, self.get_client_config()): pass with PiClient(self.ignite, self.get_client_config()): pass with PiClient(self.ignite, self.get_client_config()): pass # check LocalNodeMovingPartitionsCount metric for all cache groups in cluster # wait that for all cache groups this value will be 0 self.wait_for_finish_rebalance(){code} Here is code from our test on python. self.start_grid() start real grid on distributed servers using ignite.sh scripts. with PiClient block start JVM and runs Ignition.start() with client config (major difference with server config is clientMode=true) Log file of this test contains following information: metric dos not change their state in 240 seconds in current master. 
(I've recently check this on 15 Aug nightly build) {code:java} Current metric state for cache cache_group_1_028 on node 2: 19 [14:44:58][:568 :617] Wait rebalance to finish 7/240 Current metric state for cache cache_group_1_028 on node 2: 19 [14:45:04][:568 :617] Wait rebalance to finish 13/240 Current metric state for cache cache_group_1_028 on node 2: 19 [14:48:47][:568 :617] Wait rebalance to finish 236/240 Current metric state for cache cache_group_1_028 on node 2: 19{code} Config of the cache that fails: {code:java} {code} I'm afraid that this all information that I can provide for you for now. I've attached jstack from node2: [^node-2-jstack.log] > Re-balancing is cancelled if client n
[jira] [Comment Edited] (IGNITE-7165) Re-balancing is cancelled if client node joins
[ https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580970#comment-16580970 ] Dmitry Sherstobitov edited comment on IGNITE-7165 at 8/15/18 12:02 PM: --- [~Mmuzaf]
{code:java}
def test_blinking_clients_clean_lfs(self):
    """ IGNITE-7165 """
    self.wait_for_running_clients_num(client_num=0, timeout=120)
    self.start_grid()  # start 4 nodes

    for _ in range(0, 10):
        log_print("Iteration %s" % str(_))
        self.assert_nodes_alive()  # check that no nodes left the grid because of the FailureHandler

        self.ignite.kill_node(2)
        self._cleanup_lfs(2)
        self.ignite.start_node(2)

        # run Ignition.start() with the client config and do nothing, 3 times
        with PiClient(self.ignite, self.get_client_config()):
            pass
        with PiClient(self.ignite, self.get_client_config()):
            pass
        with PiClient(self.ignite, self.get_client_config()):
            pass

        # check the LocalNodeMovingPartitionsCount metric for all cache groups
        # in the cluster and wait until it reaches 0 for all of them
        self.wait_for_finish_rebalance()
{code}
Here is the code from our test, written in Python. self.start_grid() starts a real grid on distributed servers using the ignite.sh scripts. Each with PiClient block starts a JVM and runs Ignition.start() with a client config (the major difference from the server config is clientMode=true). The log file of this test shows that the metric does not change its state within 240 seconds on current master. (I've recently checked this on the 15 Aug nightly build)
{code:java}
Current metric state for cache cache_group_1_028 on node 2: 19
[14:44:58][:568 :617] Wait rebalance to finish 7/240
Current metric state for cache cache_group_1_028 on node 2: 19
[14:45:04][:568 :617] Wait rebalance to finish 13/240
Current metric state for cache cache_group_1_028 on node 2: 19
[14:48:47][:568 :617] Wait rebalance to finish 236/240
Current metric state for cache cache_group_1_028 on node 2: 19
{code}
Config of the cache that fails:
{code:java}
{code}
I'm afraid this is all the information I can provide for now. I've attached a jstack from node 2: [^node-2-jstack.log]
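For readers unfamiliar with the test harness: each with PiClient block above corresponds, on the Java side, to starting and then stopping a client-mode node. A minimal sketch of that behaviour is shown below, assuming a plain IgniteConfiguration; the instance names and the loop are illustrative only and are not part of the original test.
{code:java}
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class BlinkingClients {
    public static void main(String[] args) {
        // Join and immediately leave the topology three times,
        // mirroring the three PiClient blocks in the Python test.
        for (int i = 0; i < 3; i++) {
            IgniteConfiguration cfg = new IgniteConfiguration();
            cfg.setIgniteInstanceName("blinking-client-" + i); // hypothetical name
            cfg.setClientMode(true); // the major difference from the server config

            try (Ignite client = Ignition.start(cfg)) {
                // Do nothing: the client join/leave events alone are enough
                // to influence rebalancing on the server nodes.
            }
        }
    }
}
{code}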
[jira] [Commented] (IGNITE-7165) Re-balancing is cancelled if client node joins
[ https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580970#comment-16580970 ] Dmitry Sherstobitov commented on IGNITE-7165: -
{code:java}
def test_blinking_clients_clean_lfs(self):
    """ IGN-9159 (IGNITE-7165) """
    self.wait_for_running_clients_num(client_num=0, timeout=120)
    self.start_grid()  # start 4 nodes

    for _ in range(0, 10):
        log_print("Iteration %s" % str(_))
        self.assert_nodes_alive()  # check that no nodes left the grid because of the FailureHandler

        self.ignite.kill_node(2)
        self._cleanup_lfs(2)
        self.ignite.start_node(2)

        # run Ignition.start() with the client config and do nothing, 3 times
        with PiClient(self.ignite, self.get_client_config()):
            pass
        with PiClient(self.ignite, self.get_client_config()):
            pass
        with PiClient(self.ignite, self.get_client_config()):
            pass

        # check the LocalNodeMovingPartitionsCount metric for all cache groups
        # in the cluster and wait until it reaches 0 for all of them
        self.wait_for_finish_rebalance()
{code}
Here is the code from our test, written in Python. self.start_grid() starts a real grid on distributed servers using the ignite.sh scripts. Each with PiClient block starts a JVM and runs Ignition.start() with a client config (the major difference from the server config is clientMode=true). The log file of this test shows that the metric does not change its state within 240 seconds on current master. (I've recently checked this on the 15 Aug nightly build)
{code:java}
Current metric state for cache cache_group_1_028 on node 2: 19
[14:44:58][:568 :617] Wait rebalance to finish 7/240
Current metric state for cache cache_group_1_028 on node 2: 19
[14:45:04][:568 :617] Wait rebalance to finish 13/240
Current metric state for cache cache_group_1_028 on node 2: 19
[14:48:47][:568 :617] Wait rebalance to finish 236/240
Current metric state for cache cache_group_1_028 on node 2: 19
{code}
Config of the cache that fails:
{code:java}
{code}
I'm afraid this is all the information I can provide for now. I've attached a jstack from node 2: [^node-2-jstack.log]
> Re-balancing is cancelled if client node joins > -- > > Key: IGNITE-7165 > URL: https://issues.apache.org/jira/browse/IGNITE-7165 > Project: Ignite > Issue Type: Bug >Reporter: Mikhail Cherkasov >Assignee: Maxim Muzafarov >Priority: Critical > Labels: rebalance > Fix For: 2.7 > > Attachments: node-2-jstack.log, node-NO_REBALANCE-7165.log > > > Re-balancing is canceled if client node joins.
Re-balancing can take hours > and each time when client node joins it starts again: > [15:10:05,700][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager] > Added new node to topology: TcpDiscoveryNode > [id=979cf868-1c37-424a-9ad1-12db501f32ef, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1, > 172.31.16.213], sockAddrs=[/0:0:0:0:0:0:0:1:0, /127.0.0.1:0, > /172.31.16.213:0], discPort=0, order=36, intOrder=24, > lastExchangeTime=1512907805688, loc=false, ver=2.3.1#20171129-sha1:4b1ec0fe, > isClient=true] > [15:10:05,701][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager] > Topology snapshot [ver=36, servers=7, clients=5, CPUs=128, heap=160.0GB] > [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Started > exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], > crd=false, evt=NODE_JOINED, evtNode=979cf868-1c37-424a-9ad1-12db501f32ef, > customEvt=null, allowMerge=true] > [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionsExchangeFuture] > Finish exchange future [startVer=AffinityTopologyVersion [topVer=36, > minorTopVer=0], resVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], > err=null] > [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Finished > exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], > crd=false] > [15:10:05,703][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager] > Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion > [topVer=36, minorTopVer=0], evt=NODE_JOINED, > node=979cf868-1c37-424a-9ad1-12db501f32ef] > [15:10:08,706][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] > Cancelled rebalancing from all nodes [topology=AffinityTopologyVersion > [topVer=35, minorTopVer=0]] > [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager] > Rebalancing scheduled [order=[statementp]] > [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][G
[jira] [Updated] (IGNITE-7165) Re-balancing is cancelled if client node joins
[ https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Sherstobitov updated IGNITE-7165: Attachment: node-2-jstack.log > Re-balancing is cancelled if client node joins > -- > > Key: IGNITE-7165 > URL: https://issues.apache.org/jira/browse/IGNITE-7165 > Project: Ignite > Issue Type: Bug >Reporter: Mikhail Cherkasov >Assignee: Maxim Muzafarov >Priority: Critical > Labels: rebalance > Fix For: 2.7 > > Attachments: node-2-jstack.log, node-NO_REBALANCE-7165.log > > > Re-balancing is canceled if client node joins. Re-balancing can take hours > and each time when client node joins it starts again: > [15:10:05,700][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager] > Added new node to topology: TcpDiscoveryNode > [id=979cf868-1c37-424a-9ad1-12db501f32ef, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1, > 172.31.16.213], sockAddrs=[/0:0:0:0:0:0:0:1:0, /127.0.0.1:0, > /172.31.16.213:0], discPort=0, order=36, intOrder=24, > lastExchangeTime=1512907805688, loc=false, ver=2.3.1#20171129-sha1:4b1ec0fe, > isClient=true] > [15:10:05,701][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager] > Topology snapshot [ver=36, servers=7, clients=5, CPUs=128, heap=160.0GB] > [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Started > exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], > crd=false, evt=NODE_JOINED, evtNode=979cf868-1c37-424a-9ad1-12db501f32ef, > customEvt=null, allowMerge=true] > [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionsExchangeFuture] > Finish exchange future [startVer=AffinityTopologyVersion [topVer=36, > minorTopVer=0], resVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], > err=null] > [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Finished > exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], > crd=false] > [15:10:05,703][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager] > Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion > [topVer=36, minorTopVer=0], evt=NODE_JOINED, > node=979cf868-1c37-424a-9ad1-12db501f32ef] > [15:10:08,706][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] > Cancelled rebalancing from all nodes [topology=AffinityTopologyVersion > [topVer=35, minorTopVer=0]] > [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager] > Rebalancing scheduled [order=[statementp]] > [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager] > Rebalancing started [top=null, evt=NODE_JOINED, > node=a8be3c14-9add-48c3-b099-3fd304cfdbf4] > [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] > Starting rebalancing [mode=ASYNC, > fromNode=2f6bde48-ffb5-4815-bd32-df4e57dc13e0, partitionsCount=18, > topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], > updateSeq=-1754630006] > [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] > Starting rebalancing [mode=ASYNC, > fromNode=35d01141-4dce-47dd-adf6-a4f3b2bb9da9, partitionsCount=15, > topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], > updateSeq=-1754630006] > [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] > Starting rebalancing [mode=ASYNC, > fromNode=b3a8be53-e61f-4023-a906-a265923837ba, partitionsCount=15, > topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], > updateSeq=-1754630006] > 
[15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] > Starting rebalancing [mode=ASYNC, > fromNode=f825cb4e-7dcc-405f-a40d-c1dc1a3ade5a, partitionsCount=12, > topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], > updateSeq=-1754630006] > [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] > Starting rebalancing [mode=ASYNC, > fromNode=4ae1db91-8b88-4180-a84b-127a303959e9, partitionsCount=11, > topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], > updateSeq=-1754630006] > [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] > Starting rebalancing [mode=ASYNC, > fromNode=7c286481-7638-49e4-8c68-fa6aa65d8b76, partitionsCount=18, > topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], > updateSeq=-1754630006] > so in clusters with a big amount of data and the frequent client left/join > events this means that a new server will never receive its partitions. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (IGNITE-7165) Re-balancing is cancelled if client node joins
[ https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579657#comment-16579657 ] Dmitry Sherstobitov edited comment on IGNITE-7165 at 8/14/18 11:22 AM: --- For now, I have no reproducer in Java. I've investigated the persistent store in my test and found that the rebalanced data is in storage on the node with the cleared LFS, but the LocalNodeMovingPartitionsCount metric is definitely broken after a client node joins the cluster. If I remove the client join event after the node is back, rebalance finishes correctly. Here is output from my test log (rebalance didn't finish in 240 seconds, while in previous versions it was done in 10-15 seconds):
[13:14:17][:568 :617] Wait rebalance to finish 8/240 Current metric state for cache cache_group_3_088 on node 2: 19
[13:18:04][:568 :617] Wait rebalance to finish 235/240 Current metric state for cache cache_group_3_088 on node 2: 19
P.S. The test runs on a distributed environment, not on a single machine

was (Author: qvad): For now, I have no reproducer in Java. I've investigated the persistent store in my test and found that the rebalanced data is in storage on the node with the cleared LFS, but the LocalNodeMovingPartitionsCount metric is definitely broken after a client node joins the cluster. If I remove the client join event after the node is back, rebalance finishes correctly. Here is output from my test log (rebalance didn't finish in 240 seconds, while in previous versions it was done in 10-15 seconds):
[13:14:17][:568 :617] Wait rebalance to finish 8/240 Current metric state for cache cache_group_3_088 on node 2: 19
[13:18:04][:568 :617] Wait rebalance to finish 235/240 Current metric state for cache cache_group_3_088 on node 2: 19

> Re-balancing is cancelled if client node joins > -- > > Key: IGNITE-7165 > URL: https://issues.apache.org/jira/browse/IGNITE-7165 > Project: Ignite > Issue Type: Bug >Reporter: Mikhail Cherkasov >Assignee: Maxim Muzafarov >Priority: Critical > Labels: rebalance > Fix For: 2.7 > > Attachments: node-NO_REBALANCE-7165.log > > > Re-balancing is canceled if client node joins.
Re-balancing can take hours > and each time when client node joins it starts again: > [15:10:05,700][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager] > Added new node to topology: TcpDiscoveryNode > [id=979cf868-1c37-424a-9ad1-12db501f32ef, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1, > 172.31.16.213], sockAddrs=[/0:0:0:0:0:0:0:1:0, /127.0.0.1:0, > /172.31.16.213:0], discPort=0, order=36, intOrder=24, > lastExchangeTime=1512907805688, loc=false, ver=2.3.1#20171129-sha1:4b1ec0fe, > isClient=true] > [15:10:05,701][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager] > Topology snapshot [ver=36, servers=7, clients=5, CPUs=128, heap=160.0GB] > [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Started > exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], > crd=false, evt=NODE_JOINED, evtNode=979cf868-1c37-424a-9ad1-12db501f32ef, > customEvt=null, allowMerge=true] > [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionsExchangeFuture] > Finish exchange future [startVer=AffinityTopologyVersion [topVer=36, > minorTopVer=0], resVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], > err=null] > [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Finished > exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], > crd=false] > [15:10:05,703][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager] > Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion > [topVer=36, minorTopVer=0], evt=NODE_JOINED, > node=979cf868-1c37-424a-9ad1-12db501f32ef] > [15:10:08,706][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] > Cancelled rebalancing from all nodes [topology=AffinityTopologyVersion > [topVer=35, minorTopVer=0]] > [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager] > Rebalancing scheduled [order=[statementp]] > [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager] > Rebalancing started [top=null, evt=NODE_JOINED, > node=a8be3c14-9add-48c3-b099-3fd304cfdbf4] > [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] > Starting rebalancing [mode=ASYNC, > fromNode=2f6bde48-ffb5-4815-bd32-df4e57dc13e0, partitionsCount=18, > topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], > updateSeq=-1754630006] > [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] > Starting rebalancing [mode=ASYNC, > fromNode=35d01141-4d
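The wait step referred to above polls the LocalNodeMovingPartitionsCount metric of every cache group until it drops to zero. Below is a rough Java sketch of such a wait loop over JMX; the JMX service URL and the ObjectName pattern for the cache-group beans are assumptions that vary between Ignite versions and setups, so this is an illustration rather than the project's actual helper.
{code:java}
import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class WaitForRebalance {
    public static void main(String[] args) throws Exception {
        // Hypothetical JMX endpoint of node 2.
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://node2:49112/jmxrmi");

        try (JMXConnector jmxc = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();

            // Assumed pattern for the cache group metrics beans; the exact
            // domain and properties depend on the Ignite version.
            ObjectName pattern = new ObjectName("org.apache:group=\"Cache groups\",*");

            long deadline = System.currentTimeMillis() + 240_000L; // 240 s, as in the test log
            while (System.currentTimeMillis() < deadline) {
                long moving = 0;
                Set<ObjectName> beans = mbs.queryNames(pattern, null);
                for (ObjectName bean : beans)
                    moving += ((Number) mbs.getAttribute(bean, "LocalNodeMovingPartitionsCount")).longValue();

                if (moving == 0) {
                    System.out.println("Rebalance finished for all cache groups");
                    return;
                }
                Thread.sleep(1_000L);
            }
            throw new IllegalStateException("Rebalance did not finish in 240 seconds");
        }
    }
}
{code}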
[jira] [Commented] (IGNITE-7165) Re-balancing is cancelled if client node joins
[ https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579657#comment-16579657 ] Dmitry Sherstobitov commented on IGNITE-7165: - For now, I have no reproducer in Java. I've investigated the persistent store in my test and found that the rebalanced data is in storage on the node with the cleared LFS, but the LocalNodeMovingPartitionsCount metric is definitely broken after a client node joins the cluster. If I remove the client join event after the node is back, rebalance finishes correctly. Here is output from my test log (rebalance didn't finish in 240 seconds, while in previous versions it was done in 10-15 seconds):
[13:14:17][:568 :617] Wait rebalance to finish 8/240 Current metric state for cache cache_group_3_088 on node 2: 19
[13:18:04][:568 :617] Wait rebalance to finish 235/240 Current metric state for cache cache_group_3_088 on node 2: 19
> Re-balancing is cancelled if client node joins > -- > > Key: IGNITE-7165 > URL: https://issues.apache.org/jira/browse/IGNITE-7165 > Project: Ignite > Issue Type: Bug >Reporter: Mikhail Cherkasov >Assignee: Maxim Muzafarov >Priority: Critical > Labels: rebalance > Fix For: 2.7 > > Attachments: node-NO_REBALANCE-7165.log > > > Re-balancing is canceled if client node joins. Re-balancing can take hours > and each time when client node joins it starts again: > [15:10:05,700][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager] > Added new node to topology: TcpDiscoveryNode > [id=979cf868-1c37-424a-9ad1-12db501f32ef, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1, > 172.31.16.213], sockAddrs=[/0:0:0:0:0:0:0:1:0, /127.0.0.1:0, > /172.31.16.213:0], discPort=0, order=36, intOrder=24, > lastExchangeTime=1512907805688, loc=false, ver=2.3.1#20171129-sha1:4b1ec0fe, > isClient=true] > [15:10:05,701][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager] > Topology snapshot [ver=36, servers=7, clients=5, CPUs=128, heap=160.0GB] > [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Started > exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], > crd=false, evt=NODE_JOINED, evtNode=979cf868-1c37-424a-9ad1-12db501f32ef, > customEvt=null, allowMerge=true] > [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionsExchangeFuture] > Finish exchange future [startVer=AffinityTopologyVersion [topVer=36, > minorTopVer=0], resVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], > err=null] > [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Finished > exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], > crd=false] > [15:10:05,703][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager] > Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion > [topVer=36, minorTopVer=0], evt=NODE_JOINED, > node=979cf868-1c37-424a-9ad1-12db501f32ef] > [15:10:08,706][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] > Cancelled rebalancing from all nodes [topology=AffinityTopologyVersion > [topVer=35, minorTopVer=0]] > [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager] > Rebalancing scheduled [order=[statementp]] > [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager] > Rebalancing started [top=null, evt=NODE_JOINED, > node=a8be3c14-9add-48c3-b099-3fd304cfdbf4] > [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] > Starting rebalancing [mode=ASYNC,
> fromNode=2f6bde48-ffb5-4815-bd32-df4e57dc13e0, partitionsCount=18, > topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], > updateSeq=-1754630006] > [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] > Starting rebalancing [mode=ASYNC, > fromNode=35d01141-4dce-47dd-adf6-a4f3b2bb9da9, partitionsCount=15, > topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], > updateSeq=-1754630006] > [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] > Starting rebalancing [mode=ASYNC, > fromNode=b3a8be53-e61f-4023-a906-a265923837ba, partitionsCount=15, > topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], > updateSeq=-1754630006] > [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] > Starting rebalancing [mode=ASYNC, > fromNode=f825cb4e-7dcc-405f-a40d-c1dc1a3ade5a, partitionsCount=12, > topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], > updateSeq=-1754630006] > [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] > Starting rebalancing [mode=ASYNC, > fromNode=4ae1db91-8b88-4180-a84b-127a303959e9, pa
[jira] [Issue Comment Deleted] (IGNITE-7165) Re-balancing is cancelled if client node joins
[ https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Sherstobitov updated IGNITE-7165: Comment: was deleted (was: I'm afraid I cannot give you a correct reproducer in Java. The attached log is from the node with the cleared LFS: [^node-NO_REBALANCE-7165.log] There are some messages with "Skipping rebalancing (no affinity changes)" after the node joins the cluster, while in the previous version the following text appeared in the log {code:java} [12:53:44,127][INFO][disco-event-worker-#61][GridDiscoveryManager] Topology snapshot [ver=18, servers=4, clients=0, CPUs=32, offheap=75.0GB, heap=120.0GB] [12:53:44,127][INFO][disco-event-worker-#61][GridDiscoveryManager] ^-- Node [id=61E12BC1-31A0-473A-BF79-DDD51C879722, clusterState=ACTIVE] [12:53:44,127][INFO][disco-event-worker-#61][GridDiscoveryManager] ^-- Baseline [id=0, size=4, online=4, offline=0] [12:53:44,127][INFO][disco-event-worker-#61][GridDiscoveryManager] Data Regions Configured: [12:53:44,128][INFO][disco-event-worker-#61][GridDiscoveryManager] ^-- default [initSize=256.0 MiB, maxSize=18.8 GiB, persistenceEnabled=true] [12:53:44,128][INFO][exchange-worker-#62][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=18, minorTopVer=0], crd=false, evt=NODE_FAILED, evtNode=02e72065-13c8-4b47-a905-874d723cc3c1, customEvt=null, allowMerge=true] [12:53:44,129][INFO][exchange-worker-#62][GridDhtPartitionsExchangeFuture] Finish exchange future [startVer=AffinityTopologyVersion [topVer=18, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=18, minorTopVer=0], err=null] [12:53:44,130][INFO][exchange-worker-#62][time] Finished exchange init [topVer=AffinityTopologyVersion [topVer=18, minorTopVer=0], crd=false] [12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Cancelled rebalancing from all nodes [topology=AffinityTopologyVersion [topVer=17, minorTopVer=0]] [12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Completed rebalance future: RebalanceFuture [grp=CacheGroupContext [grp=cache_group_1_028], topVer=AffinityTopologyVersion [topVer=17, minorTopVer=0], rebalanceId=6] [12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Cancelled rebalancing from all nodes [topology=AffinityTopologyVersion [topVer=17, minorTopVer=0]] [12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Completed rebalance future: RebalanceFuture [grp=CacheGroupContext [grp=cache_group_3_088], topVer=AffinityTopologyVersion [topVer=17, minorTopVer=0], rebalanceId=6] [12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Cancelled rebalancing from all nodes [topology=AffinityTopologyVersion [topVer=17, minorTopVer=0]] [12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Completed rebalance future: RebalanceFuture [grp=CacheGroupContext [grp=cache_group_1_015], topVer=AffinityTopologyVersion [topVer=17, minorTopVer=0], rebalanceId=6] [12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Cancelled rebalancing from all nodes [topology=AffinityTopologyVersion [topVer=17, minorTopVer=0]] [12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Completed rebalance future: RebalanceFuture [grp=CacheGroupContext [grp=cache_group_4_118], topVer=AffinityTopologyVersion [topVer=17, minorTopVer=0], rebalanceId=6] [12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Cancelled rebalancing from all nodes [topology=AffinityTopologyVersion [topVer=17, minorTopVer=0]] 
[12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Completed rebalance future: RebalanceFuture [grp=CacheGroupContext [grp=cache_group_2_058], topVer=AffinityTopologyVersion [topVer=17, minorTopVer=0], rebalanceId=6] [12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Cancelled rebalancing from all nodes [topology=AffinityTopologyVersion [topVer=17, minorTopVer=0]] [12:53:44,142][INFO][exchange-worker-#62][GridDhtPartitionDemander] Completed rebalance future: RebalanceFuture [grp=CacheGroupContext [grp=cache_group_6], topVer=AffinityTopologyVersion [topVer=17, minorTopVer=0], rebalanceId=6] [12:53:44,142][INFO][exchange-worker-#62][GridDhtPartitionDemander] Cancelled rebalancing from all nodes [topology=AffinityTopologyVersion [topVer=17, minorTopVer=0]] [12:53:44,142][INFO][exchange-worker-#62][GridDhtPartitionDemander] Completed rebalance future: RebalanceFuture [grp=CacheGroupContext [grp=cache_group_5], topVer=AffinityTopologyVersion [topVer=17, minorTopVer=0], rebalanceId=6] [12:53:44,142][INFO][exchange-worker-#62][GridDhtPartitionDemander] Cancelled rebalancing from all nodes [topology=AffinityTopologyVersion [topVer=17, minorTopVer=0]] [12:53:44,142][INFO][exchange-worker-#62][GridDhtPartitionDemander] Completed rebalance future: RebalanceFuture [grp=CacheGroupContext [grp=cache_group_4], topVer=AffinityTopologyVersio
[jira] [Commented] (IGNITE-7165) Re-balancing is cancelled if client node joins
[ https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579579#comment-16579579 ] Dmitry Sherstobitov commented on IGNITE-7165: - I'm afraid I cannot give you a correct reproducer in Java. The attached log is from the node with the cleared LFS: [^node-NO_REBALANCE-7165.log] There are some messages with "Skipping rebalancing (no affinity changes)" after the node joins the cluster, while in the previous version the following text appeared in the log {code:java} [12:53:44,127][INFO][disco-event-worker-#61][GridDiscoveryManager] Topology snapshot [ver=18, servers=4, clients=0, CPUs=32, offheap=75.0GB, heap=120.0GB] [12:53:44,127][INFO][disco-event-worker-#61][GridDiscoveryManager] ^-- Node [id=61E12BC1-31A0-473A-BF79-DDD51C879722, clusterState=ACTIVE] [12:53:44,127][INFO][disco-event-worker-#61][GridDiscoveryManager] ^-- Baseline [id=0, size=4, online=4, offline=0] [12:53:44,127][INFO][disco-event-worker-#61][GridDiscoveryManager] Data Regions Configured: [12:53:44,128][INFO][disco-event-worker-#61][GridDiscoveryManager] ^-- default [initSize=256.0 MiB, maxSize=18.8 GiB, persistenceEnabled=true] [12:53:44,128][INFO][exchange-worker-#62][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=18, minorTopVer=0], crd=false, evt=NODE_FAILED, evtNode=02e72065-13c8-4b47-a905-874d723cc3c1, customEvt=null, allowMerge=true] [12:53:44,129][INFO][exchange-worker-#62][GridDhtPartitionsExchangeFuture] Finish exchange future [startVer=AffinityTopologyVersion [topVer=18, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=18, minorTopVer=0], err=null] [12:53:44,130][INFO][exchange-worker-#62][time] Finished exchange init [topVer=AffinityTopologyVersion [topVer=18, minorTopVer=0], crd=false] [12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Cancelled rebalancing from all nodes [topology=AffinityTopologyVersion [topVer=17, minorTopVer=0]] [12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Completed rebalance future: RebalanceFuture [grp=CacheGroupContext [grp=cache_group_1_028], topVer=AffinityTopologyVersion [topVer=17, minorTopVer=0], rebalanceId=6] [12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Cancelled rebalancing from all nodes [topology=AffinityTopologyVersion [topVer=17, minorTopVer=0]] [12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Completed rebalance future: RebalanceFuture [grp=CacheGroupContext [grp=cache_group_3_088], topVer=AffinityTopologyVersion [topVer=17, minorTopVer=0], rebalanceId=6] [12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Cancelled rebalancing from all nodes [topology=AffinityTopologyVersion [topVer=17, minorTopVer=0]] [12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Completed rebalance future: RebalanceFuture [grp=CacheGroupContext [grp=cache_group_1_015], topVer=AffinityTopologyVersion [topVer=17, minorTopVer=0], rebalanceId=6] [12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Cancelled rebalancing from all nodes [topology=AffinityTopologyVersion [topVer=17, minorTopVer=0]] [12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Completed rebalance future: RebalanceFuture [grp=CacheGroupContext [grp=cache_group_4_118], topVer=AffinityTopologyVersion [topVer=17, minorTopVer=0], rebalanceId=6] [12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Cancelled rebalancing from all nodes [topology=AffinityTopologyVersion [topVer=17, minorTopVer=0]] 
[12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Completed rebalance future: RebalanceFuture [grp=CacheGroupContext [grp=cache_group_2_058], topVer=AffinityTopologyVersion [topVer=17, minorTopVer=0], rebalanceId=6] [12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Cancelled rebalancing from all nodes [topology=AffinityTopologyVersion [topVer=17, minorTopVer=0]] [12:53:44,142][INFO][exchange-worker-#62][GridDhtPartitionDemander] Completed rebalance future: RebalanceFuture [grp=CacheGroupContext [grp=cache_group_6], topVer=AffinityTopologyVersion [topVer=17, minorTopVer=0], rebalanceId=6] [12:53:44,142][INFO][exchange-worker-#62][GridDhtPartitionDemander] Cancelled rebalancing from all nodes [topology=AffinityTopologyVersion [topVer=17, minorTopVer=0]] [12:53:44,142][INFO][exchange-worker-#62][GridDhtPartitionDemander] Completed rebalance future: RebalanceFuture [grp=CacheGroupContext [grp=cache_group_5], topVer=AffinityTopologyVersion [topVer=17, minorTopVer=0], rebalanceId=6] [12:53:44,142][INFO][exchange-worker-#62][GridDhtPartitionDemander] Cancelled rebalancing from all nodes [topology=AffinityTopologyVersion [topVer=17, minorTopVer=0]] [12:53:44,142][INFO][exchange-worker-#62][GridDhtPartitionDemander] Completed rebalance future: RebalanceFuture [grp=CacheGroupContext [grp=cache_group_4], top
[jira] [Updated] (IGNITE-7165) Re-balancing is cancelled if client node joins
[ https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Sherstobitov updated IGNITE-7165: Attachment: node-NO_REBALANCE-7165.log > Re-balancing is cancelled if client node joins > -- > > Key: IGNITE-7165 > URL: https://issues.apache.org/jira/browse/IGNITE-7165 > Project: Ignite > Issue Type: Bug >Reporter: Mikhail Cherkasov >Assignee: Maxim Muzafarov >Priority: Critical > Labels: rebalance > Fix For: 2.7 > > Attachments: node-NO_REBALANCE-7165.log > > > Re-balancing is canceled if client node joins. Re-balancing can take hours > and each time when client node joins it starts again: > [15:10:05,700][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager] > Added new node to topology: TcpDiscoveryNode > [id=979cf868-1c37-424a-9ad1-12db501f32ef, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1, > 172.31.16.213], sockAddrs=[/0:0:0:0:0:0:0:1:0, /127.0.0.1:0, > /172.31.16.213:0], discPort=0, order=36, intOrder=24, > lastExchangeTime=1512907805688, loc=false, ver=2.3.1#20171129-sha1:4b1ec0fe, > isClient=true] > [15:10:05,701][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager] > Topology snapshot [ver=36, servers=7, clients=5, CPUs=128, heap=160.0GB] > [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Started > exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], > crd=false, evt=NODE_JOINED, evtNode=979cf868-1c37-424a-9ad1-12db501f32ef, > customEvt=null, allowMerge=true] > [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionsExchangeFuture] > Finish exchange future [startVer=AffinityTopologyVersion [topVer=36, > minorTopVer=0], resVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], > err=null] > [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Finished > exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], > crd=false] > [15:10:05,703][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager] > Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion > [topVer=36, minorTopVer=0], evt=NODE_JOINED, > node=979cf868-1c37-424a-9ad1-12db501f32ef] > [15:10:08,706][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] > Cancelled rebalancing from all nodes [topology=AffinityTopologyVersion > [topVer=35, minorTopVer=0]] > [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager] > Rebalancing scheduled [order=[statementp]] > [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager] > Rebalancing started [top=null, evt=NODE_JOINED, > node=a8be3c14-9add-48c3-b099-3fd304cfdbf4] > [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] > Starting rebalancing [mode=ASYNC, > fromNode=2f6bde48-ffb5-4815-bd32-df4e57dc13e0, partitionsCount=18, > topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], > updateSeq=-1754630006] > [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] > Starting rebalancing [mode=ASYNC, > fromNode=35d01141-4dce-47dd-adf6-a4f3b2bb9da9, partitionsCount=15, > topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], > updateSeq=-1754630006] > [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] > Starting rebalancing [mode=ASYNC, > fromNode=b3a8be53-e61f-4023-a906-a265923837ba, partitionsCount=15, > topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], > updateSeq=-1754630006] > 
[15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] > Starting rebalancing [mode=ASYNC, > fromNode=f825cb4e-7dcc-405f-a40d-c1dc1a3ade5a, partitionsCount=12, > topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], > updateSeq=-1754630006] > [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] > Starting rebalancing [mode=ASYNC, > fromNode=4ae1db91-8b88-4180-a84b-127a303959e9, partitionsCount=11, > topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], > updateSeq=-1754630006] > [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] > Starting rebalancing [mode=ASYNC, > fromNode=7c286481-7638-49e4-8c68-fa6aa65d8b76, partitionsCount=18, > topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], > updateSeq=-1754630006] > so in clusters with a big amount of data and the frequent client left/join > events this means that a new server will never receive its partitions. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-7165) Re-balancing is cancelled if client node joins
[ https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Sherstobitov updated IGNITE-7165: Attachment: (was: node-NO_REBALANCE-7165.log) > Re-balancing is cancelled if client node joins > -- > > Key: IGNITE-7165 > URL: https://issues.apache.org/jira/browse/IGNITE-7165 > Project: Ignite > Issue Type: Bug >Reporter: Mikhail Cherkasov >Assignee: Maxim Muzafarov >Priority: Critical > Labels: rebalance > Fix For: 2.7 > > > Re-balancing is canceled if client node joins. Re-balancing can take hours > and each time when client node joins it starts again: > [15:10:05,700][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager] > Added new node to topology: TcpDiscoveryNode > [id=979cf868-1c37-424a-9ad1-12db501f32ef, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1, > 172.31.16.213], sockAddrs=[/0:0:0:0:0:0:0:1:0, /127.0.0.1:0, > /172.31.16.213:0], discPort=0, order=36, intOrder=24, > lastExchangeTime=1512907805688, loc=false, ver=2.3.1#20171129-sha1:4b1ec0fe, > isClient=true] > [15:10:05,701][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager] > Topology snapshot [ver=36, servers=7, clients=5, CPUs=128, heap=160.0GB] > [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Started > exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], > crd=false, evt=NODE_JOINED, evtNode=979cf868-1c37-424a-9ad1-12db501f32ef, > customEvt=null, allowMerge=true] > [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionsExchangeFuture] > Finish exchange future [startVer=AffinityTopologyVersion [topVer=36, > minorTopVer=0], resVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], > err=null] > [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Finished > exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], > crd=false] > [15:10:05,703][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager] > Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion > [topVer=36, minorTopVer=0], evt=NODE_JOINED, > node=979cf868-1c37-424a-9ad1-12db501f32ef] > [15:10:08,706][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] > Cancelled rebalancing from all nodes [topology=AffinityTopologyVersion > [topVer=35, minorTopVer=0]] > [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager] > Rebalancing scheduled [order=[statementp]] > [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager] > Rebalancing started [top=null, evt=NODE_JOINED, > node=a8be3c14-9add-48c3-b099-3fd304cfdbf4] > [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] > Starting rebalancing [mode=ASYNC, > fromNode=2f6bde48-ffb5-4815-bd32-df4e57dc13e0, partitionsCount=18, > topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], > updateSeq=-1754630006] > [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] > Starting rebalancing [mode=ASYNC, > fromNode=35d01141-4dce-47dd-adf6-a4f3b2bb9da9, partitionsCount=15, > topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], > updateSeq=-1754630006] > [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] > Starting rebalancing [mode=ASYNC, > fromNode=b3a8be53-e61f-4023-a906-a265923837ba, partitionsCount=15, > topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], > updateSeq=-1754630006] > 
[15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] > Starting rebalancing [mode=ASYNC, > fromNode=f825cb4e-7dcc-405f-a40d-c1dc1a3ade5a, partitionsCount=12, > topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], > updateSeq=-1754630006] > [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] > Starting rebalancing [mode=ASYNC, > fromNode=4ae1db91-8b88-4180-a84b-127a303959e9, partitionsCount=11, > topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], > updateSeq=-1754630006] > [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] > Starting rebalancing [mode=ASYNC, > fromNode=7c286481-7638-49e4-8c68-fa6aa65d8b76, partitionsCount=18, > topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], > updateSeq=-1754630006] > so in clusters with a big amount of data and the frequent client left/join > events this means that a new server will never receive its partitions. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-7165) Re-balancing is cancelled if client node joins
[ https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Sherstobitov updated IGNITE-7165: Attachment: node-NO_REBALANCE-7165.log > Re-balancing is cancelled if client node joins > -- > > Key: IGNITE-7165 > URL: https://issues.apache.org/jira/browse/IGNITE-7165 > Project: Ignite > Issue Type: Bug >Reporter: Mikhail Cherkasov >Assignee: Maxim Muzafarov >Priority: Critical > Labels: rebalance > Fix For: 2.7 > > Attachments: node-NO_REBALANCE-7165.log > > > Re-balancing is canceled if client node joins. Re-balancing can take hours > and each time when client node joins it starts again: > [15:10:05,700][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager] > Added new node to topology: TcpDiscoveryNode > [id=979cf868-1c37-424a-9ad1-12db501f32ef, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1, > 172.31.16.213], sockAddrs=[/0:0:0:0:0:0:0:1:0, /127.0.0.1:0, > /172.31.16.213:0], discPort=0, order=36, intOrder=24, > lastExchangeTime=1512907805688, loc=false, ver=2.3.1#20171129-sha1:4b1ec0fe, > isClient=true] > [15:10:05,701][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager] > Topology snapshot [ver=36, servers=7, clients=5, CPUs=128, heap=160.0GB] > [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Started > exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], > crd=false, evt=NODE_JOINED, evtNode=979cf868-1c37-424a-9ad1-12db501f32ef, > customEvt=null, allowMerge=true] > [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionsExchangeFuture] > Finish exchange future [startVer=AffinityTopologyVersion [topVer=36, > minorTopVer=0], resVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], > err=null] > [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Finished > exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], > crd=false] > [15:10:05,703][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager] > Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion > [topVer=36, minorTopVer=0], evt=NODE_JOINED, > node=979cf868-1c37-424a-9ad1-12db501f32ef] > [15:10:08,706][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] > Cancelled rebalancing from all nodes [topology=AffinityTopologyVersion > [topVer=35, minorTopVer=0]] > [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager] > Rebalancing scheduled [order=[statementp]] > [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager] > Rebalancing started [top=null, evt=NODE_JOINED, > node=a8be3c14-9add-48c3-b099-3fd304cfdbf4] > [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] > Starting rebalancing [mode=ASYNC, > fromNode=2f6bde48-ffb5-4815-bd32-df4e57dc13e0, partitionsCount=18, > topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], > updateSeq=-1754630006] > [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] > Starting rebalancing [mode=ASYNC, > fromNode=35d01141-4dce-47dd-adf6-a4f3b2bb9da9, partitionsCount=15, > topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], > updateSeq=-1754630006] > [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] > Starting rebalancing [mode=ASYNC, > fromNode=b3a8be53-e61f-4023-a906-a265923837ba, partitionsCount=15, > topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], > updateSeq=-1754630006] > 
[15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] > Starting rebalancing [mode=ASYNC, > fromNode=f825cb4e-7dcc-405f-a40d-c1dc1a3ade5a, partitionsCount=12, > topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], > updateSeq=-1754630006] > [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] > Starting rebalancing [mode=ASYNC, > fromNode=4ae1db91-8b88-4180-a84b-127a303959e9, partitionsCount=11, > topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], > updateSeq=-1754630006] > [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] > Starting rebalancing [mode=ASYNC, > fromNode=7c286481-7638-49e4-8c68-fa6aa65d8b76, partitionsCount=18, > topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], > updateSeq=-1754630006] > so in clusters with a big amount of data and the frequent client left/join > events this means that a new server will never receive its partitions. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-7165) Re-balancing is cancelled if client node joins
[ https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16578279#comment-16578279 ] Dmitry Sherstobitov commented on IGNITE-7165: - I have a problem with the current solution. The following test passed on the version before the fix and hangs on current master on the first iteration. The test hangs on the JMX LocalNodeMovingPartitionsCount metric, and it looks like rebalance did not start at all.
Repeat 10 times:
1. stop a node
2. clean its LFS
3. add the stopped node back (triggering rebalance)
4. 3 times: start 2 clients, wait for the topology snapshot, close the clients
5. for each cache group, check the JMX LocalNodeMovingPartitionsCount metric (like waitForFinishRebalance())
> Re-balancing is cancelled if client node joins > -- > > Key: IGNITE-7165 > URL: https://issues.apache.org/jira/browse/IGNITE-7165 > Project: Ignite > Issue Type: Bug >Reporter: Mikhail Cherkasov >Assignee: Maxim Muzafarov >Priority: Critical > Labels: rebalance > Fix For: 2.7 > > > Re-balancing is canceled if client node joins. Re-balancing can take hours > and each time when client node joins it starts again: > [15:10:05,700][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager] > Added new node to topology: TcpDiscoveryNode > [id=979cf868-1c37-424a-9ad1-12db501f32ef, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1, > 172.31.16.213], sockAddrs=[/0:0:0:0:0:0:0:1:0, /127.0.0.1:0, > /172.31.16.213:0], discPort=0, order=36, intOrder=24, > lastExchangeTime=1512907805688, loc=false, ver=2.3.1#20171129-sha1:4b1ec0fe, > isClient=true] > [15:10:05,701][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager] > Topology snapshot [ver=36, servers=7, clients=5, CPUs=128, heap=160.0GB] > [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Started > exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], > crd=false, evt=NODE_JOINED, evtNode=979cf868-1c37-424a-9ad1-12db501f32ef, > customEvt=null, allowMerge=true] > [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionsExchangeFuture] > Finish exchange future [startVer=AffinityTopologyVersion [topVer=36, > minorTopVer=0], resVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], > err=null] > [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Finished > exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], > crd=false] > [15:10:05,703][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager] > Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion > [topVer=36, minorTopVer=0], evt=NODE_JOINED, > node=979cf868-1c37-424a-9ad1-12db501f32ef] > [15:10:08,706][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] > Cancelled rebalancing from all nodes [topology=AffinityTopologyVersion > [topVer=35, minorTopVer=0]] > [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager] > Rebalancing scheduled [order=[statementp]] > [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager] > Rebalancing started [top=null, evt=NODE_JOINED, > node=a8be3c14-9add-48c3-b099-3fd304cfdbf4] > [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] > Starting rebalancing [mode=ASYNC, > fromNode=2f6bde48-ffb5-4815-bd32-df4e57dc13e0, partitionsCount=18, > topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], > updateSeq=-1754630006] > [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] > Starting rebalancing 
[mode=ASYNC, > fromNode=35d01141-4dce-47dd-adf6-a4f3b2bb9da9, partitionsCount=15, > topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], > updateSeq=-1754630006] > [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] > Starting rebalancing [mode=ASYNC, > fromNode=b3a8be53-e61f-4023-a906-a265923837ba, partitionsCount=15, > topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], > updateSeq=-1754630006] > [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] > Starting rebalancing [mode=ASYNC, > fromNode=f825cb4e-7dcc-405f-a40d-c1dc1a3ade5a, partitionsCount=12, > topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], > updateSeq=-1754630006] > [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] > Starting rebalancing [mode=ASYNC, > fromNode=4ae1db91-8b88-4180-a84b-127a303959e9, partitionsCount=11, > topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], > updateSeq=-1754630006] > [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander] > Starting rebalancing [mode=ASYNC, > fromNode=7c286481-7638-49e4-8c68-fa6aa65d8b76, partitions
[jira] [Created] (IGNITE-8895) Update yardstick libraries
Dmitry Sherstobitov created IGNITE-8895: --- Summary: Update yardstick libraries Key: IGNITE-8895 URL: https://issues.apache.org/jira/browse/IGNITE-8895 Project: Ignite Issue Type: Bug Affects Versions: 2.5 Reporter: Dmitry Sherstobitov There are currently some conflicts in the yardstick libraries:
||yardstick||core||problem||
|jline-0.9.94.jar|bin/include/sqlline/jline-2.4.3.jar|./sqlline.sh is unable to start if the yardstick libraries are in the path|
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8893) Blinking node in baseline may corrupt own WAL records
Dmitry Sherstobitov created IGNITE-8893: --- Summary: Blinking node in baseline may corrupt own WAL records Key: IGNITE-8893 URL: https://issues.apache.org/jira/browse/IGNITE-8893 Project: Ignite Issue Type: Bug Affects Versions: 2.5 Reporter: Dmitry Sherstobitov # Start cluster, load data # Start an additional node that is not in the BLT # Repeat 10 times: kill 1 node in the baseline and 1 node not in the baseline, then start a node in the BLT and a node not in the BLT At some point a node in the baseline may be unable to start because of a corrupted WAL: Notice that there is no load on the cluster at all, so there is no reason to corrupt the WAL; rebalance should be interruptible. There is also another scenario that may cause the same error (but may also cause a JVM crash) # Start cluster, load data, start nodes # Repeat 10 times: kill 1 node in the baseline, clean its LFS, start the node again, and while it rebalances, blink the node that should rebalance data to the previously killed node (a sketch of this kill/clean/restart step follows this message) The node that should rebalance data to the cleaned node may corrupt its own WAL. But this second scenario has a configuration "error": the number of backups in each case is 1, so two blinking nodes may obviously cause data loss. {code:java} [2018-06-28 17:33:39,583][ERROR][wal-file-archiver%null-#63][root] Critical system error detected. Will be handled accordingly to configured handler [hnd=class o.a.i.failure.StopNodeOrHaltFailureHandler, failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.AssertionError: lastArchived=757, current=42]] java.lang.AssertionError: lastArchived=757, current=42 at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileArchiver.body(FileWriteAheadLogManager.java:1629) at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
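The kill / clean-LFS / restart step that both scenarios revolve around can be approximated in-JVM as follows. This is a hypothetical sketch, not the original test harness: it assumes native persistence is enabled, approximates killing the process with Ignition.stop(name, true), and blinkNode() and the lfs path are made-up names.

{code:java}
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class BlinkBaselineNode {
    static IgniteConfiguration cfg(String name) {
        return new IgniteConfiguration()
            .setIgniteInstanceName(name)
            .setConsistentId(name)
            .setDataStorageConfiguration(new DataStorageConfiguration()
                .setDefaultDataRegionConfiguration(
                    new DataRegionConfiguration().setPersistenceEnabled(true)));
    }

    /** Stops a node, optionally wipes its local file storage (LFS), and restarts it. */
    static void blinkNode(String name, Path lfs, boolean cleanLfs) throws Exception {
        Ignition.stop(name, true); // Approximates a kill: cancel=true, no graceful shutdown.

        if (cleanLfs && Files.exists(lfs)) {
            try (Stream<Path> files = Files.walk(lfs)) {
                files.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
            }
        }

        Ignition.start(cfg(name)); // Restart; rebalancing towards this node starts here.
    }
}
{code}

Blinking a second node while this rebalance is in flight is the step that, per the report, may leave the supplier node with a corrupted WAL.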
[jira] [Updated] (IGNITE-8879) Blinking baseline node sometimes unable to connect to cluster
[ https://issues.apache.org/jira/browse/IGNITE-8879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Sherstobitov updated IGNITE-8879: Description: Almost the same scenario as in IGNITE-8874, but the node leaves the baseline while blinking All caches have 2 backups 4 nodes in the cluster # Start cluster, load data # Start transactional loading (8 threads, 100 ops/second, put/get in each op) # Repeat 10 times: kill one node, remove it from the baseline (a programmatic sketch of this step follows this message), start the node again (*with no LFS clean*), wait for rebalance # Check idle_verify, check for data corruption At some point the killed node is unable to start and join the cluster because of this error (Attachment info: grid.1.node2.X.log - blinking node logs, X - iteration counter from step 3) {code:java} 080ee8-END.bin] [2018-06-26 19:01:43,039][INFO ][main][PageMemoryImpl] Started page memory [memoryAllocated=100.0 MiB, pages=24800, tableSize=1.9 MiB, checkpointBuffer=100.0 MiB] [2018-06-26 19:01:43,039][INFO ][main][GridCacheDatabaseSharedManager] Checking memory state [lastValidPos=FileWALPointer [idx=0, fileOff=583691, len=119], lastMarked=FileWALPointer [idx=0, fileOff=583691, len=119], lastCheckpointId=7fca4dbb-8f01-4b63-95e2-43283b080ee8] [2018-06-26 19:01:43,050][INFO ][main][GridCacheDatabaseSharedManager] Found last checkpoint marker [cpId=7fca4dbb-8f01-4b63-95e2-43283b080ee8, pos=FileWALPointer [idx=0, fileOff=583691, len=119]] [2018-06-26 19:01:43,082][INFO ][main][FileWriteAheadLogManager] Stopping WAL iteration due to an exception: EOF at position [100] expected to read [1] bytes, ptr=FileWALPointer [idx=0, fileOff=100, len=0] [2018-06-26 19:01:43,219][WARN ][main][FileWriteAheadLogManager] WAL segment tail is reached. [ Expected next state: {Index=19,Offset=794017}, Actual state : {Index=3602879702215753728,Offset=775434544} ] [2018-06-26 19:01:43,243][INFO ][main][GridCacheDatabaseSharedManager] Applying lost cache updates since last checkpoint record [lastMarked=FileWALPointer [idx=0, fileOff=583691, len=119], lastCheckpointId=7fca4dbb-8f01-4b63-95e2-43283b080ee8] [2018-06-26 19:01:43,246][INFO ][main][FileWriteAheadLogManager] Stopping WAL iteration due to an exception: EOF at position [100] expected to read [1] bytes, ptr=FileWALPointer [idx=0, fileOff=100, len=0] [2018-06-26 19:01:43,336][WARN ][main][FileWriteAheadLogManager] WAL segment tail is reached. 
[ Expected next state: {Index=19,Offset=794017}, Actual state : {Index=3602879702215753728,Offset=775434544} ] [2018-06-26 19:01:43,336][INFO ][main][GridCacheDatabaseSharedManager] Finished applying WAL changes [updatesApplied=0, time=101ms] [2018-06-26 19:01:43,450][INFO ][main][GridSnapshotAwareClusterStateProcessorImpl] Restoring history for BaselineTopology[id=4] [2018-06-26 19:01:43,454][ERROR][main][IgniteKernal] Exception during start processors, node will be stopped and close connections class org.apache.ignite.IgniteCheckedException: Failed to start processor: GridProcessorAdapter [] at org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1769) at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1001) at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2020) at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1725) at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1153) at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1071) at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:957) at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:856) at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:726) at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:695) at org.apache.ignite.Ignition.start(Ignition.java:352) at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:301) Caused by: class org.apache.ignite.IgniteCheckedException: Restoring of BaselineTopology history has failed, expected history item not found for id=1 at org.apache.ignite.internal.processors.cluster.BaselineTopologyHistory.restoreHistory(BaselineTopologyHistory.java:54) at org.apache.ignite.internal.processors.cluster.GridClusterStateProcessor.onReadyForRead(GridClusterStateProcessor.java:222) at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetastorageReadyForRead(GridCacheDatabaseSharedManager.java:381) at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readMetastore(GridCacheDatabaseSharedManager.java:643) at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.start0(GridCacheDatabaseSharedManager.java:486) at org.apache.ignite.internal.processors.c
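Step 3 above removes the killed node from the baseline, which can be done with control.sh --baseline remove <consistentId> or programmatically. A minimal sketch of the programmatic variant, assuming the cluster is active and the killed node has already left the topology:

{code:java}
import java.util.ArrayList;
import java.util.Collection;
import org.apache.ignite.Ignite;
import org.apache.ignite.cluster.BaselineNode;

public class RemoveFromBaseline {
    /** Resets the baseline to the currently alive server nodes, dropping the killed one. */
    static void dropDeadNodesFromBaseline(Ignite ignite) {
        Collection<BaselineNode> alive = new ArrayList<>(ignite.cluster().forServers().nodes());

        ignite.cluster().setBaselineTopology(alive);
    }
}
{code}

Alternatively, IgniteCluster.setBaselineTopology(long topVer) pins the baseline to a given topology version, which is what control.sh --baseline version TOPOLOGY_VERSION does.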
[jira] [Updated] (IGNITE-8879) Blinking baseline node sometimes unable to connect to cluster
[ https://issues.apache.org/jira/browse/IGNITE-8879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Sherstobitov updated IGNITE-8879: Attachment: IGNITE-8879.zip > Blinking baseline node sometimes unable to connect to cluster > - > > Key: IGNITE-8879 > URL: https://issues.apache.org/jira/browse/IGNITE-8879 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.5 >Reporter: Dmitry Sherstobitov >Priority: Major > Attachments: IGNITE-8879.zip > > > Almost the same scenario as in IGNITE-8874 but node left baseline while > blinking > All caches with 2 backups > 4 nodes in cluster > # Start cluster, load data > # Start transactional loading (8 threads, 100 ops/second put/get in each op) > # Repeat 10 times: kill one node, remove from baseline, start node again > (*with no LFS clean*), wait for rebalance > # Check idle_verify, check data corruption > > At some point killed node unable to start and join cluster because of > {code:java} > 080ee8-END.bin] > [2018-06-26 19:01:43,039][INFO ][main][PageMemoryImpl] Started page memory > [memoryAllocated=100.0 MiB, pages=24800, tableSize=1.9 MiB, > checkpointBuffer=100.0 MiB] > [2018-06-26 19:01:43,039][INFO ][main][GridCacheDatabaseSharedManager] > Checking memory state [lastValidPos=FileWALPointer [idx=0, fileOff=583691, > len=119], lastMarked=FileWALPointer [idx=0, fileOff=583691, len=119], > lastCheckpointId=7fca4dbb-8f01-4b63-95e2-43283b080ee8] > [2018-06-26 19:01:43,050][INFO ][main][GridCacheDatabaseSharedManager] Found > last checkpoint marker [cpId=7fca4dbb-8f01-4b63-95e2-43283b080ee8, > pos=FileWALPointer [idx=0, fileOff=583691, len=119]] > [2018-06-26 19:01:43,082][INFO ][main][FileWriteAheadLogManager] Stopping WAL > iteration due to an exception: EOF at position [100] expected to read [1] > bytes, ptr=FileWALPointer [idx=0, fileOff=100, len=0] > [2018-06-26 19:01:43,219][WARN ][main][FileWriteAheadLogManager] WAL segment > tail is reached. [ Expected next state: {Index=19,Offset=794017}, Actual > state : {Index=3602879702215753728,Offset=775434544} ] > [2018-06-26 19:01:43,243][INFO ][main][GridCacheDatabaseSharedManager] > Applying lost cache updates since last checkpoint record > [lastMarked=FileWALPointer [idx=0, fileOff=583691, len=119], > lastCheckpointId=7fca4dbb-8f01-4b63-95e2-43283b080ee8] > [2018-06-26 19:01:43,246][INFO ][main][FileWriteAheadLogManager] Stopping WAL > iteration due to an exception: EOF at position [100] expected to read [1] > bytes, ptr=FileWALPointer [idx=0, fileOff=100, len=0] > [2018-06-26 19:01:43,336][WARN ][main][FileWriteAheadLogManager] WAL segment > tail is reached. 
[ Expected next state: {Index=19,Offset=794017}, Actual > state : {Index=3602879702215753728,Offset=775434544} ] > [2018-06-26 19:01:43,336][INFO ][main][GridCacheDatabaseSharedManager] > Finished applying WAL changes [updatesApplied=0, time=101ms] > [2018-06-26 19:01:43,450][INFO > ][main][GridSnapshotAwareClusterStateProcessorImpl] Restoring history for > BaselineTopology[id=4] > [2018-06-26 19:01:43,454][ERROR][main][IgniteKernal] Exception during start > processors, node will be stopped and close connections > class org.apache.ignite.IgniteCheckedException: Failed to start processor: > GridProcessorAdapter [] > at > org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1769) > at > org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1001) > at > org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2020) > at > org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1725) > at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1153) > at > org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1071) > at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:957) > at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:856) > at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:726) > at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:695) > at org.apache.ignite.Ignition.start(Ignition.java:352) > at > org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:301) > Caused by: class org.apache.ignite.IgniteCheckedException: Restoring of > BaselineTopology history has failed, expected history item not found for id=1 > at > org.apache.ignite.internal.processors.cluster.BaselineTopologyHistory.restoreHistory(BaselineTopologyHistory.java:54) > at > org.apache.ignite.internal.processors.cluster.GridClusterStateProcessor.onReadyForRead(GridClusterStateProcessor.java:222) > at > org.apache.igni
[jira] [Created] (IGNITE-8879) Blinking baseline node sometimes unable to connect to cluster
Dmitry Sherstobitov created IGNITE-8879: --- Summary: Blinking baseline node sometimes unable to connect to cluster Key: IGNITE-8879 URL: https://issues.apache.org/jira/browse/IGNITE-8879 Project: Ignite Issue Type: Bug Affects Versions: 2.5 Reporter: Dmitry Sherstobitov Almost the same scenario as in IGNITE-8874 but node left baseline while blinking All caches with 2 backups 4 nodes in cluster # Start cluster, load data # Start transactional loading (8 threads, 100 ops/second put/get in each op) # Repeat 10 times: kill one node, remove from baseline, start node again (*with no LFS clean*), wait for rebalance # Check idle_verify, check data corruption At some point killed node unable to start and join cluster because of {code:java} 080ee8-END.bin] [2018-06-26 19:01:43,039][INFO ][main][PageMemoryImpl] Started page memory [memoryAllocated=100.0 MiB, pages=24800, tableSize=1.9 MiB, checkpointBuffer=100.0 MiB] [2018-06-26 19:01:43,039][INFO ][main][GridCacheDatabaseSharedManager] Checking memory state [lastValidPos=FileWALPointer [idx=0, fileOff=583691, len=119], lastMarked=FileWALPointer [idx=0, fileOff=583691, len=119], lastCheckpointId=7fca4dbb-8f01-4b63-95e2-43283b080ee8] [2018-06-26 19:01:43,050][INFO ][main][GridCacheDatabaseSharedManager] Found last checkpoint marker [cpId=7fca4dbb-8f01-4b63-95e2-43283b080ee8, pos=FileWALPointer [idx=0, fileOff=583691, len=119]] [2018-06-26 19:01:43,082][INFO ][main][FileWriteAheadLogManager] Stopping WAL iteration due to an exception: EOF at position [100] expected to read [1] bytes, ptr=FileWALPointer [idx=0, fileOff=100, len=0] [2018-06-26 19:01:43,219][WARN ][main][FileWriteAheadLogManager] WAL segment tail is reached. [ Expected next state: {Index=19,Offset=794017}, Actual state : {Index=3602879702215753728,Offset=775434544} ] [2018-06-26 19:01:43,243][INFO ][main][GridCacheDatabaseSharedManager] Applying lost cache updates since last checkpoint record [lastMarked=FileWALPointer [idx=0, fileOff=583691, len=119], lastCheckpointId=7fca4dbb-8f01-4b63-95e2-43283b080ee8] [2018-06-26 19:01:43,246][INFO ][main][FileWriteAheadLogManager] Stopping WAL iteration due to an exception: EOF at position [100] expected to read [1] bytes, ptr=FileWALPointer [idx=0, fileOff=100, len=0] [2018-06-26 19:01:43,336][WARN ][main][FileWriteAheadLogManager] WAL segment tail is reached. 
[ Expected next state: {Index=19,Offset=794017}, Actual state : {Index=3602879702215753728,Offset=775434544} ] [2018-06-26 19:01:43,336][INFO ][main][GridCacheDatabaseSharedManager] Finished applying WAL changes [updatesApplied=0, time=101ms] [2018-06-26 19:01:43,450][INFO ][main][GridSnapshotAwareClusterStateProcessorImpl] Restoring history for BaselineTopology[id=4] [2018-06-26 19:01:43,454][ERROR][main][IgniteKernal] Exception during start processors, node will be stopped and close connections class org.apache.ignite.IgniteCheckedException: Failed to start processor: GridProcessorAdapter [] at org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1769) at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1001) at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2020) at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1725) at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1153) at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1071) at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:957) at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:856) at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:726) at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:695) at org.apache.ignite.Ignition.start(Ignition.java:352) at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:301) Caused by: class org.apache.ignite.IgniteCheckedException: Restoring of BaselineTopology history has failed, expected history item not found for id=1 at org.apache.ignite.internal.processors.cluster.BaselineTopologyHistory.restoreHistory(BaselineTopologyHistory.java:54) at org.apache.ignite.internal.processors.cluster.GridClusterStateProcessor.onReadyForRead(GridClusterStateProcessor.java:222) at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetastorageReadyForRead(GridCacheDatabaseSharedManager.java:381) at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readMetastore(GridCacheDatabaseSharedManager.java:643) at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.start0(GridCacheDatabaseSharedM
[jira] [Created] (IGNITE-8874) Blinking node in cluster may cause data corruption
Dmitry Sherstobitov created IGNITE-8874: --- Summary: Blinking node in cluster may cause data corruption Key: IGNITE-8874 URL: https://issues.apache.org/jira/browse/IGNITE-8874 Project: Ignite Issue Type: Bug Affects Versions: 2.5 Reporter: Dmitry Sherstobitov All caches have 2 backups 4 nodes in the cluster # Start cluster, load data # Start transactional loading (8 threads, 100 ops/second, put/get in each op; a loader sketch follows this message) # Repeat 10 times: kill one node, clean its LFS, start the node again, wait for rebalance # Check idle_verify, check for data corruption Here is the idle_verify report; node2 is the node that was blinking during the test. Update counters are equal between the partition copies but the data is different. {code:java} Conflict partition: PartitionKey [grpId=374280886, grpName=cache_group_3, partId=41] Partition instances: [PartitionHashRecord [isPrimary=true, partHash=885018783, updateCntr=16, size=15, consistentId=node4], PartitionHashRecord [isPrimary=false, partHash=885018783, updateCntr=16, size=15, consistentId=node3], PartitionHashRecord [isPrimary=false, partHash=-357162793, updateCntr=16, size=15, consistentId=node2]] Conflict partition: PartitionKey [grpId=1586135625, grpName=cache_group_1_015, partId=15] Partition instances: [PartitionHashRecord [isPrimary=true, partHash=-562597978, updateCntr=22, size=16, consistentId=node3], PartitionHashRecord [isPrimary=false, partHash=-562597978, updateCntr=22, size=16, consistentId=node1], PartitionHashRecord [isPrimary=false, partHash=780813725, updateCntr=22, size=16, consistentId=node2]] Conflict partition: PartitionKey [grpId=374280885, grpName=cache_group_2, partId=75] Partition instances: [PartitionHashRecord [isPrimary=true, partHash=-1500797699, updateCntr=21, size=16, consistentId=node3], PartitionHashRecord [isPrimary=false, partHash=-1500797699, updateCntr=21, size=16, consistentId=node1], PartitionHashRecord [isPrimary=false, partHash=-1592034435, updateCntr=21, size=16, consistentId=node2]] Conflict partition: PartitionKey [grpId=374280884, grpName=cache_group_1, partId=713] Partition instances: [PartitionHashRecord [isPrimary=false, partHash=-63058826, updateCntr=4, size=2, consistentId=node3], PartitionHashRecord [isPrimary=true, partHash=-63058826, updateCntr=4, size=2, consistentId=node1], PartitionHashRecord [isPrimary=false, partHash=670869467, updateCntr=4, size=2, consistentId=node2]] Conflict partition: PartitionKey [grpId=374280886, grpName=cache_group_3, partId=11] Partition instances: [PartitionHashRecord [isPrimary=false, partHash=-224572810, updateCntr=17, size=16, consistentId=node3], PartitionHashRecord [isPrimary=true, partHash=-224572810, updateCntr=17, size=16, consistentId=node1], PartitionHashRecord [isPrimary=false, partHash=176419075, updateCntr=17, size=16, consistentId=node2]]{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
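The transactional loading in step 2 amounts to concurrent get/put pairs inside explicit transactions. A minimal sketch of one loader thread; the cache name "tx-cache", the 1000-key range, and the assumption that values are preloaded are all made up for illustration:

{code:java}
import java.util.concurrent.ThreadLocalRandom;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.transactions.Transaction;
import static org.apache.ignite.transactions.TransactionConcurrency.PESSIMISTIC;
import static org.apache.ignite.transactions.TransactionIsolation.REPEATABLE_READ;

public class TxLoader implements Runnable {
    private final Ignite ignite;

    TxLoader(Ignite ignite) { this.ignite = ignite; }

    @Override public void run() {
        IgniteCache<Integer, Long> cache = ignite.cache("tx-cache"); // Must be TRANSACTIONAL.

        while (!Thread.currentThread().isInterrupted()) {
            int from = ThreadLocalRandom.current().nextInt(1000);
            int to = (from + 1) % 1000;

            try (Transaction tx = ignite.transactions().txStart(PESSIMISTIC, REPEATABLE_READ)) {
                // Move one unit between two keys; the total sum stays constant.
                Long a = cache.get(from);
                Long b = cache.get(to);

                cache.put(from, a - 1);
                cache.put(to, b + 1);

                tx.commit();
            }
        }
    }
}
{code}

Under such load, equal updateCntr values with differing partHash values (as in the report above) mean each copy applied the same number of updates but with different contents.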
[jira] [Created] (IGNITE-8620) Remove intOrder and loc keys from node info in control.sh --tx utility
Dmitry Sherstobitov created IGNITE-8620: --- Summary: Remove intOrder and loc keys from node info in control.sh --tx utility Key: IGNITE-8620 URL: https://issues.apache.org/jira/browse/IGNITE-8620 Project: Ignite Issue Type: Improvement Reporter: Dmitry Sherstobitov Currently this information is displayed by the control.sh utility for each node: TcpDiscoveryNode [id=2ed402d5-b5a7-4ade-a77a-12c2feea95ec, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 172.25.1.47], sockAddrs=[/0:0:0:0:0:0:0:1%lo:0, /127.0.0.1:0, /172.25.1.47:0], discPort=0, order=6, intOrder=6, lastExchangeTime=1526482701193, loc=false, ver=2.5.1#20180510-sha1:ee417b82, isClient=true] The loc and intOrder values are internal information and there is no need to display them -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-8602) Add support filter label=null for control.sh tx utility
[ https://issues.apache.org/jira/browse/IGNITE-8602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Sherstobitov updated IGNITE-8602: Description: Currently, transactions without a label cannot be separated from the others by using the filter "label null" > Add support filter label=null for control.sh tx utility > --- > > Key: IGNITE-8602 > URL: https://issues.apache.org/jira/browse/IGNITE-8602 > Project: Ignite > Issue Type: Improvement >Affects Versions: 2.5 >Reporter: Dmitry Sherstobitov >Priority: Major > > Currently, transactions without a label cannot be separated from the others > by using the filter "label null" -- This message was sent by Atlassian JIRA (v7.6.3#76005)
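For context, a label is attached when a transaction is started, and any transaction started without withLabel(...) carries a null label, which is exactly what the requested filter would need to match. A short sketch (the cache name is made up):

{code:java}
import org.apache.ignite.Ignite;
import org.apache.ignite.transactions.Transaction;

public class LabeledTx {
    static void example(Ignite ignite) {
        // Labeled: can already be filtered with control.sh --tx label tx_1.
        try (Transaction tx = ignite.transactions().withLabel("tx_1").txStart()) {
            ignite.cache("accounts").put(1, 100);
            tx.commit();
        }

        // Unlabeled: the label is null, which control.sh currently cannot filter on.
        try (Transaction tx = ignite.transactions().txStart()) {
            ignite.cache("accounts").put(2, 200);
            tx.commit();
        }
    }
}
{code}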
[jira] [Created] (IGNITE-8602) Add support filter label=null for control.sh tx utility
Dmitry Sherstobitov created IGNITE-8602: --- Summary: Add support filter label=null for control.sh tx utility Key: IGNITE-8602 URL: https://issues.apache.org/jira/browse/IGNITE-8602 Project: Ignite Issue Type: Improvement Affects Versions: 2.5 Reporter: Dmitry Sherstobitov -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8601) Add to control.sh tx utility information about transaction start time
Dmitry Sherstobitov created IGNITE-8601: --- Summary: Add to control.sh tx utility information about transaction start time Key: IGNITE-8601 URL: https://issues.apache.org/jira/browse/IGNITE-8601 Project: Ignite Issue Type: Improvement Affects Versions: 2.5 Reporter: Dmitry Sherstobitov This information would be useful, for example, when identifying long-running transactions. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (IGNITE-8466) Control.sh transactions utility may hang under loading
[ https://issues.apache.org/jira/browse/IGNITE-8466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Sherstobitov closed IGNITE-8466. --- > Control.sh transactions utility may hang under loading > -- > > Key: IGNITE-8466 > URL: https://issues.apache.org/jira/browse/IGNITE-8466 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.5 >Reporter: Dmitry Sherstobitov >Priority: Critical > Attachments: IGNITE-8466.zip > > > Start nodes, activate and preload some Accounts data > Start a client and run transactional loading (8 threads with ~1000 ops/second - > moving some amount from one value to another) > Start 10 long-running transactions (transactions with a flexible sleep inside) > with label tx_* > Start control.sh --tx label tx kill > > The last run of the control.sh utility hangs. > > Attachment info: > grid1,2,3 - server logs > grid20001 - client with preloading > grid20002 - client with transactional loading and LRTs (with stack trace) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (IGNITE-8466) Control.sh transactions utility may hang under loading
[ https://issues.apache.org/jira/browse/IGNITE-8466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Sherstobitov resolved IGNITE-8466. - Resolution: Not A Bug Incorrect test scenario: the --force flag was not used > Control.sh transactions utility may hang under loading > -- > > Key: IGNITE-8466 > URL: https://issues.apache.org/jira/browse/IGNITE-8466 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.5 >Reporter: Dmitry Sherstobitov >Priority: Critical > Attachments: IGNITE-8466.zip > > > Start nodes, activate and preload some Accounts data > Start a client and run transactional loading (8 threads with ~1000 ops/second - > moving some amount from one value to another) > Start 10 long-running transactions (transactions with a flexible sleep inside) > with label tx_* > Start control.sh --tx label tx kill > > The last run of the control.sh utility hangs. > > Attachment info: > grid1,2,3 - server logs > grid20001 - client with preloading > grid20002 - client with transactional loading and LRTs (with stack trace) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (IGNITE-8476) AssertionError exception occurs when trying to remove node from baseline under loading
[ https://issues.apache.org/jira/browse/IGNITE-8476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16480807#comment-16480807 ] Dmitry Sherstobitov edited comment on IGNITE-8476 at 5/18/18 3:33 PM: -- Another test scenario with no transactional loading: # Start cluster, load data # Start client # Create transactional cache # Start long transactions (transactions with infinite sleep() and interrupt variable to call commit() on it) # Add new node in cluster that is not in baseline # Release transactions after some minor timeout # Try to get values from cluster that was affected by this long transactions First 3 cache.get() are successful, next get() hangs and throw Assertion in server logs was (Author: qvad): Another test scenario with no transactional loading: # Start cluster, load data # Start client # Create transactional cache # Start long transactions (transactions with infinite sleep() and interrupt variable to call commit() on it) # Add new node in cluster that not in baseline # Release transactions after some minor timeout # Try to get values from cluster that was affected by this long transactions First 3 cache.get() are successful, next get() hangs and throw Assertion in server logs > AssertionError exception occurs when trying to remove node from baseline > under loading > -- > > Key: IGNITE-8476 > URL: https://issues.apache.org/jira/browse/IGNITE-8476 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.5 >Reporter: Dmitry Sherstobitov >Assignee: Ivan Rakov >Priority: Blocker > > Run 6 nodes, start loading (8 threads, 1000 2x cache.get() and 2x cache.put() > in each thread per second) > Kill 2 nodes and try to remove one node from baseline using > control.sh --baseline remove node1 > control.sh --baseline version TOPOLOGY_VERSION > > Utility hangs or connected client may hangs, this assertion appears in log > For transactional cache: > {code:java} > [16:32:58,960][SEVERE][sys-stripe-14-#15][G] Failed to execute runnable. 
> java.lang.AssertionError: localNode = be945692-c750-4d72-b9a1-9ac4170ff125, > dhtNodes = [ZookeeperClusterNode [id=810226e6-656a-460d-8069-ca7d2dd294ef, > addrs=[172.17.0.1, 0:0:0:0:0:0:0:1%lo, 172.25.1.28, 127.0.0.1], order=1, > loc=false, client=false], ZookeeperClusterNode > [id=be945692-c750-4d72-b9a1-9ac4170ff125, addrs=[172.17.0.1, 172.25.1.28, > 0:0:0:0:0:0:0:1%lo, 127.0.0.1], order=3, loc=true, client=false], > ZookeeperClusterNode [id=db4503f6-9185-4673-b38c-8890dfa69511, > addrs=[172.17.0.1, 172.25.1.37, 0:0:0:0:0:0:0:1%lo, 127.0.0.1], order=5, > loc=false, client=false], ZookeeperClusterNode > [id=3b8d8d4f-3513-4d39-a1fd-7ec5b15fc653, addrs=[172.17.0.1, 172.25.1.37, > 127.0.0.1, 0:0:0:0:0:0:0:1%lo], order=4, loc=false, client=false], > ZookeeperClusterNode [id=2bfc8c2e-2f47-4126-9cc4-6f017ce7efde, > addrs=[172.17.0.1, 172.25.1.37, 0:0:0:0:0:0:0:1%lo, 127.0.0.1], order=6, > loc=false, client=false]] > at > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.map(GridDhtTxPrepareFuture.java:1520) > at > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare0(GridDhtTxPrepareFuture.java:1239) > at > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.mapIfLocked(GridDhtTxPrepareFuture.java:671) > at > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare(GridDhtTxPrepareFuture.java:1048) > at > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.prepareAsync(GridDhtTxLocal.java:397) > at > org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.prepareNearTx(IgniteTxHandler.java:516) > at > org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxPrepareRequest0(IgniteTxHandler.java:150) > at > org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxPrepareRequest(IgniteTxHandler.java:135) > at > org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.access$000(IgniteTxHandler.java:97) > at > org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$1.apply(IgniteTxHandler.java:177) > at > org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$1.apply(IgniteTxHandler.java:175) > at > org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1054) > at > org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:579) > at > org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378) > at > org.apache.ignite.internal.p
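The long-running transactions described in the scenario above hold a transaction open inside a sleep loop until an external flag releases them, at which point they commit. A minimal sketch of that pattern, not the original test code; the interrupt flag and cache name are made up:

{code:java}
import java.util.concurrent.atomic.AtomicBoolean;
import org.apache.ignite.Ignite;
import org.apache.ignite.transactions.Transaction;

public class LongRunningTx implements Runnable {
    private final Ignite ignite;

    /** Flipped by the test after the new node joins, releasing the transaction. */
    final AtomicBoolean interrupt = new AtomicBoolean();

    LongRunningTx(Ignite ignite) { this.ignite = ignite; }

    @Override public void run() {
        try (Transaction tx = ignite.transactions().txStart()) {
            ignite.cache("tx-cache").put(42, 42L); // Cache must be TRANSACTIONAL.

            // "Infinite" sleep: the transaction stays open across the topology change.
            while (!interrupt.get())
                Thread.sleep(100);

            tx.commit();
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
{code}

The point of the pattern is to keep a transaction spanning the topology change caused by the node join, which is the setup the scenario above relies on.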
[jira] [Comment Edited] (IGNITE-8476) AssertionError exception occurs when trying to remove node from baseline under loading
[ https://issues.apache.org/jira/browse/IGNITE-8476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16480807#comment-16480807 ] Dmitry Sherstobitov edited comment on IGNITE-8476 at 5/18/18 3:33 PM: -- Another test scenario with no transactional loading: # Start cluster, load data # Start client # Create transactional cache # Start long transactions (transactions with infinite sleep() and interrupt variable to call commit() on it) # Add new node in cluster that not in baseline # Release transactions after some minor timeout # Try to get values from cluster that was affected by this long transactions First 3 cache.get() are successful, next get() hangs and throw Assertion in server logs was (Author: qvad): Another test scenario with no transactional loading: # Start cluster, load data # Start client # Create transactional cache # Start long transactions (transactions with infinite sleep() and interrupt variable to call commit() on it) # Release transactions after some minor timeout # Try to get values from cluster that was affected by this long transactions First 3 cache.get() are successful, next get() hangs and throw Assertion in server logs > AssertionError exception occurs when trying to remove node from baseline > under loading > -- > > Key: IGNITE-8476 > URL: https://issues.apache.org/jira/browse/IGNITE-8476 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.5 >Reporter: Dmitry Sherstobitov >Assignee: Ivan Rakov >Priority: Blocker > > Run 6 nodes, start loading (8 threads, 1000 2x cache.get() and 2x cache.put() > in each thread per second) > Kill 2 nodes and try to remove one node from baseline using > control.sh --baseline remove node1 > control.sh --baseline version TOPOLOGY_VERSION > > Utility hangs or connected client may hangs, this assertion appears in log > For transactional cache: > {code:java} > [16:32:58,960][SEVERE][sys-stripe-14-#15][G] Failed to execute runnable. 
> java.lang.AssertionError: localNode = be945692-c750-4d72-b9a1-9ac4170ff125, > dhtNodes = [ZookeeperClusterNode [id=810226e6-656a-460d-8069-ca7d2dd294ef, > addrs=[172.17.0.1, 0:0:0:0:0:0:0:1%lo, 172.25.1.28, 127.0.0.1], order=1, > loc=false, client=false], ZookeeperClusterNode > [id=be945692-c750-4d72-b9a1-9ac4170ff125, addrs=[172.17.0.1, 172.25.1.28, > 0:0:0:0:0:0:0:1%lo, 127.0.0.1], order=3, loc=true, client=false], > ZookeeperClusterNode [id=db4503f6-9185-4673-b38c-8890dfa69511, > addrs=[172.17.0.1, 172.25.1.37, 0:0:0:0:0:0:0:1%lo, 127.0.0.1], order=5, > loc=false, client=false], ZookeeperClusterNode > [id=3b8d8d4f-3513-4d39-a1fd-7ec5b15fc653, addrs=[172.17.0.1, 172.25.1.37, > 127.0.0.1, 0:0:0:0:0:0:0:1%lo], order=4, loc=false, client=false], > ZookeeperClusterNode [id=2bfc8c2e-2f47-4126-9cc4-6f017ce7efde, > addrs=[172.17.0.1, 172.25.1.37, 0:0:0:0:0:0:0:1%lo, 127.0.0.1], order=6, > loc=false, client=false]] > at > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.map(GridDhtTxPrepareFuture.java:1520) > at > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare0(GridDhtTxPrepareFuture.java:1239) > at > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.mapIfLocked(GridDhtTxPrepareFuture.java:671) > at > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare(GridDhtTxPrepareFuture.java:1048) > at > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.prepareAsync(GridDhtTxLocal.java:397) > at > org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.prepareNearTx(IgniteTxHandler.java:516) > at > org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxPrepareRequest0(IgniteTxHandler.java:150) > at > org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxPrepareRequest(IgniteTxHandler.java:135) > at > org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.access$000(IgniteTxHandler.java:97) > at > org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$1.apply(IgniteTxHandler.java:177) > at > org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$1.apply(IgniteTxHandler.java:175) > at > org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1054) > at > org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:579) > at > org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378) > at > org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(Gr
[jira] [Commented] (IGNITE-8476) AssertionError exception occurs when trying to remove node from baseline under loading
[ https://issues.apache.org/jira/browse/IGNITE-8476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16480807#comment-16480807 ] Dmitry Sherstobitov commented on IGNITE-8476: - Another test scenario with no transactional loading: # Start cluster, load data # Start client # Create transactional cache # Start long transactions (transactions with infinite sleep() and interrupt variable to call commit() on it) # Release transactions after some minor timeout # Try to get values from cluster that was affected by this long transactions First 3 cache.get() are successful, next get() hangs and throw Assertion in server logs > AssertionError exception occurs when trying to remove node from baseline > under loading > -- > > Key: IGNITE-8476 > URL: https://issues.apache.org/jira/browse/IGNITE-8476 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.5 >Reporter: Dmitry Sherstobitov >Assignee: Ivan Rakov >Priority: Blocker > > Run 6 nodes, start loading (8 threads, 1000 2x cache.get() and 2x cache.put() > in each thread per second) > Kill 2 nodes and try to remove one node from baseline using > control.sh --baseline remove node1 > control.sh --baseline version TOPOLOGY_VERSION > > Utility hangs or connected client may hangs, this assertion appears in log > For transactional cache: > {code:java} > [16:32:58,960][SEVERE][sys-stripe-14-#15][G] Failed to execute runnable. > java.lang.AssertionError: localNode = be945692-c750-4d72-b9a1-9ac4170ff125, > dhtNodes = [ZookeeperClusterNode [id=810226e6-656a-460d-8069-ca7d2dd294ef, > addrs=[172.17.0.1, 0:0:0:0:0:0:0:1%lo, 172.25.1.28, 127.0.0.1], order=1, > loc=false, client=false], ZookeeperClusterNode > [id=be945692-c750-4d72-b9a1-9ac4170ff125, addrs=[172.17.0.1, 172.25.1.28, > 0:0:0:0:0:0:0:1%lo, 127.0.0.1], order=3, loc=true, client=false], > ZookeeperClusterNode [id=db4503f6-9185-4673-b38c-8890dfa69511, > addrs=[172.17.0.1, 172.25.1.37, 0:0:0:0:0:0:0:1%lo, 127.0.0.1], order=5, > loc=false, client=false], ZookeeperClusterNode > [id=3b8d8d4f-3513-4d39-a1fd-7ec5b15fc653, addrs=[172.17.0.1, 172.25.1.37, > 127.0.0.1, 0:0:0:0:0:0:0:1%lo], order=4, loc=false, client=false], > ZookeeperClusterNode [id=2bfc8c2e-2f47-4126-9cc4-6f017ce7efde, > addrs=[172.17.0.1, 172.25.1.37, 0:0:0:0:0:0:0:1%lo, 127.0.0.1], order=6, > loc=false, client=false]] > at > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.map(GridDhtTxPrepareFuture.java:1520) > at > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare0(GridDhtTxPrepareFuture.java:1239) > at > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.mapIfLocked(GridDhtTxPrepareFuture.java:671) > at > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare(GridDhtTxPrepareFuture.java:1048) > at > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.prepareAsync(GridDhtTxLocal.java:397) > at > org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.prepareNearTx(IgniteTxHandler.java:516) > at > org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxPrepareRequest0(IgniteTxHandler.java:150) > at > org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxPrepareRequest(IgniteTxHandler.java:135) > at > org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.access$000(IgniteTxHandler.java:97) > at > 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$1.apply(IgniteTxHandler.java:177) > at > org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$1.apply(IgniteTxHandler.java:175) > at > org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1054) > at > org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:579) > at > org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378) > at > org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304) > at > org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99) > at > org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293) > at > org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556) > at > org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184) > at > org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoMan
[jira] [Comment Edited] (IGNITE-8476) AssertionError exception occurs when trying to remove node from baseline under loading
[ https://issues.apache.org/jira/browse/IGNITE-8476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16480756#comment-16480756 ] Dmitry Sherstobitov edited comment on IGNITE-8476 at 5/18/18 2:51 PM: -- [~ivan.glukos] I've got same AssertionError while adding node from baseline with clean LFS. # Start cluster, activate, start client with loading # Kill single node, clean LFS and start it again # AssertionError Next steps in this case: # Add new single node in cluster # Add new node to baseline # Wait for transaction loading ends # LRTs in client logs and transactional loading hangs (transactions with timeout=5 ms) was (Author: qvad): [~ivan.glukos] I've got same AssertionError while adding node from baseline with clean LFS. # Start cluster, activate, start client with loading # Kill single node, clean LFS and start it again # AssertionError > AssertionError exception occurs when trying to remove node from baseline > under loading > -- > > Key: IGNITE-8476 > URL: https://issues.apache.org/jira/browse/IGNITE-8476 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.5 >Reporter: Dmitry Sherstobitov >Assignee: Ivan Rakov >Priority: Blocker > > Run 6 nodes, start loading (8 threads, 1000 2x cache.get() and 2x cache.put() > in each thread per second) > Kill 2 nodes and try to remove one node from baseline using > control.sh --baseline remove node1 > control.sh --baseline version TOPOLOGY_VERSION > > Utility hangs or connected client may hangs, this assertion appears in log > For transactional cache: > {code:java} > [16:32:58,960][SEVERE][sys-stripe-14-#15][G] Failed to execute runnable. > java.lang.AssertionError: localNode = be945692-c750-4d72-b9a1-9ac4170ff125, > dhtNodes = [ZookeeperClusterNode [id=810226e6-656a-460d-8069-ca7d2dd294ef, > addrs=[172.17.0.1, 0:0:0:0:0:0:0:1%lo, 172.25.1.28, 127.0.0.1], order=1, > loc=false, client=false], ZookeeperClusterNode > [id=be945692-c750-4d72-b9a1-9ac4170ff125, addrs=[172.17.0.1, 172.25.1.28, > 0:0:0:0:0:0:0:1%lo, 127.0.0.1], order=3, loc=true, client=false], > ZookeeperClusterNode [id=db4503f6-9185-4673-b38c-8890dfa69511, > addrs=[172.17.0.1, 172.25.1.37, 0:0:0:0:0:0:0:1%lo, 127.0.0.1], order=5, > loc=false, client=false], ZookeeperClusterNode > [id=3b8d8d4f-3513-4d39-a1fd-7ec5b15fc653, addrs=[172.17.0.1, 172.25.1.37, > 127.0.0.1, 0:0:0:0:0:0:0:1%lo], order=4, loc=false, client=false], > ZookeeperClusterNode [id=2bfc8c2e-2f47-4126-9cc4-6f017ce7efde, > addrs=[172.17.0.1, 172.25.1.37, 0:0:0:0:0:0:0:1%lo, 127.0.0.1], order=6, > loc=false, client=false]] > at > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.map(GridDhtTxPrepareFuture.java:1520) > at > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare0(GridDhtTxPrepareFuture.java:1239) > at > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.mapIfLocked(GridDhtTxPrepareFuture.java:671) > at > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare(GridDhtTxPrepareFuture.java:1048) > at > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.prepareAsync(GridDhtTxLocal.java:397) > at > org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.prepareNearTx(IgniteTxHandler.java:516) > at > org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxPrepareRequest0(IgniteTxHandler.java:150) > at > 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxPrepareRequest(IgniteTxHandler.java:135) > at > org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.access$000(IgniteTxHandler.java:97) > at > org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$1.apply(IgniteTxHandler.java:177) > at > org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$1.apply(IgniteTxHandler.java:175) > at > org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1054) > at > org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:579) > at > org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378) > at > org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304) > at > org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99) > at > org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293) > at > org.apache.ignite.internal.managers.communication.GridIoM
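The "transactions with timeout=5 ms" mentioned in the last step of the comment above can be started with the four-argument txStart overload. A sketch; the concurrency, isolation, and cache name are arbitrary choices for illustration:

{code:java}
import org.apache.ignite.Ignite;
import org.apache.ignite.transactions.Transaction;
import org.apache.ignite.transactions.TransactionConcurrency;
import org.apache.ignite.transactions.TransactionIsolation;

public class ShortTimeoutTx {
    static void put(Ignite ignite, int key, int val) {
        // txStart(concurrency, isolation, timeoutMs, txSize): may time out after 5 ms.
        try (Transaction tx = ignite.transactions().txStart(
            TransactionConcurrency.PESSIMISTIC, TransactionIsolation.REPEATABLE_READ, 5, 0)) {
            ignite.cache("tx-cache").put(key, val);

            tx.commit();
        }
    }
}
{code}

With such a short timeout, any stall on the prepare path (as in the AssertionError above) quickly surfaces as hanging or timed-out transactions on the client.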
[jira] [Commented] (IGNITE-8476) AssertionError exception occurs when trying to remove node from baseline under loading
[ https://issues.apache.org/jira/browse/IGNITE-8476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16480756#comment-16480756 ] Dmitry Sherstobitov commented on IGNITE-8476: - [~ivan.glukos] I've got same AssertionError while adding node from baseline with clean LFS. # Start cluster, activate, start client with loading # Kill single node, clean LFS and start it again # AssertionError > AssertionError exception occurs when trying to remove node from baseline > under loading > -- > > Key: IGNITE-8476 > URL: https://issues.apache.org/jira/browse/IGNITE-8476 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.5 >Reporter: Dmitry Sherstobitov >Assignee: Ivan Rakov >Priority: Blocker > > Run 6 nodes, start loading (8 threads, 1000 2x cache.get() and 2x cache.put() > in each thread per second) > Kill 2 nodes and try to remove one node from baseline using > control.sh --baseline remove node1 > control.sh --baseline version TOPOLOGY_VERSION > > Utility hangs or connected client may hangs, this assertion appears in log > For transactional cache: > {code:java} > [16:32:58,960][SEVERE][sys-stripe-14-#15][G] Failed to execute runnable. > java.lang.AssertionError: localNode = be945692-c750-4d72-b9a1-9ac4170ff125, > dhtNodes = [ZookeeperClusterNode [id=810226e6-656a-460d-8069-ca7d2dd294ef, > addrs=[172.17.0.1, 0:0:0:0:0:0:0:1%lo, 172.25.1.28, 127.0.0.1], order=1, > loc=false, client=false], ZookeeperClusterNode > [id=be945692-c750-4d72-b9a1-9ac4170ff125, addrs=[172.17.0.1, 172.25.1.28, > 0:0:0:0:0:0:0:1%lo, 127.0.0.1], order=3, loc=true, client=false], > ZookeeperClusterNode [id=db4503f6-9185-4673-b38c-8890dfa69511, > addrs=[172.17.0.1, 172.25.1.37, 0:0:0:0:0:0:0:1%lo, 127.0.0.1], order=5, > loc=false, client=false], ZookeeperClusterNode > [id=3b8d8d4f-3513-4d39-a1fd-7ec5b15fc653, addrs=[172.17.0.1, 172.25.1.37, > 127.0.0.1, 0:0:0:0:0:0:0:1%lo], order=4, loc=false, client=false], > ZookeeperClusterNode [id=2bfc8c2e-2f47-4126-9cc4-6f017ce7efde, > addrs=[172.17.0.1, 172.25.1.37, 0:0:0:0:0:0:0:1%lo, 127.0.0.1], order=6, > loc=false, client=false]] > at > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.map(GridDhtTxPrepareFuture.java:1520) > at > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare0(GridDhtTxPrepareFuture.java:1239) > at > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.mapIfLocked(GridDhtTxPrepareFuture.java:671) > at > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare(GridDhtTxPrepareFuture.java:1048) > at > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.prepareAsync(GridDhtTxLocal.java:397) > at > org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.prepareNearTx(IgniteTxHandler.java:516) > at > org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxPrepareRequest0(IgniteTxHandler.java:150) > at > org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxPrepareRequest(IgniteTxHandler.java:135) > at > org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.access$000(IgniteTxHandler.java:97) > at > org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$1.apply(IgniteTxHandler.java:177) > at > org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$1.apply(IgniteTxHandler.java:175) > at > 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1054) > at > org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:579) > at > org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378) > at > org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304) > at > org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99) > at > org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293) > at > org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556) > at > org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184) > at > org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:125) > at > org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1091) > at > org.apache.ignite.internal.util.StripedExecutor$Stripe.run(StripedExecutor.java:511) > at java.lang.Thread.run(Thr
[jira] [Commented] (IGNITE-8476) AssertionError exception occurs when trying to remove node from baseline under loading
[ https://issues.apache.org/jira/browse/IGNITE-8476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16477188#comment-16477188 ] Dmitry Sherstobitov commented on IGNITE-8476: - {code:xml} <!-- Spring beans configuration attached by the reporter. The mail archive stripped the XML tags; only the root <beans> element's schema declaration survives (spring-beans-2.5.xsd, spring-util-2.0.xsd). --> {code} > AssertionError exception occurs when trying to remove node from baseline > under loading > -- > > Key: IGNITE-8476 > URL: https://issues.apache.org/jira/browse/IGNITE-8476 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.5 >Reporter: Dmitry Sherstobitov >Assignee: Ivan Rakov >Priority: Blocker > > Run 6 nodes, start loading (8 threads, 1000 2x cache.get() and 2x cache.put() > in each thread per second) > Kill 2 nodes and try to remove one node from baseline using > control.sh --base
[jira] [Updated] (IGNITE-8476) AssertionError exception occurs when trying to remove node from baseline under loading
[ https://issues.apache.org/jira/browse/IGNITE-8476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Sherstobitov updated IGNITE-8476: Description: Run 6 nodes, start loading (8 threads, 1000 2x cache.get() and 2x cache.put() in each thread per second) Kill 2 nodes and try to remove one node from baseline using control.sh --baseline remove node1 control.sh --baseline version TOPOLOGY_VERSION Utility hangs or connected client may hangs, this assertion appears in log For transactional cache: {code:java} [16:32:58,960][SEVERE][sys-stripe-14-#15][G] Failed to execute runnable. java.lang.AssertionError: localNode = be945692-c750-4d72-b9a1-9ac4170ff125, dhtNodes = [ZookeeperClusterNode [id=810226e6-656a-460d-8069-ca7d2dd294ef, addrs=[172.17.0.1, 0:0:0:0:0:0:0:1%lo, 172.25.1.28, 127.0.0.1], order=1, loc=false, client=false], ZookeeperClusterNode [id=be945692-c750-4d72-b9a1-9ac4170ff125, addrs=[172.17.0.1, 172.25.1.28, 0:0:0:0:0:0:0:1%lo, 127.0.0.1], order=3, loc=true, client=false], ZookeeperClusterNode [id=db4503f6-9185-4673-b38c-8890dfa69511, addrs=[172.17.0.1, 172.25.1.37, 0:0:0:0:0:0:0:1%lo, 127.0.0.1], order=5, loc=false, client=false], ZookeeperClusterNode [id=3b8d8d4f-3513-4d39-a1fd-7ec5b15fc653, addrs=[172.17.0.1, 172.25.1.37, 127.0.0.1, 0:0:0:0:0:0:0:1%lo], order=4, loc=false, client=false], ZookeeperClusterNode [id=2bfc8c2e-2f47-4126-9cc4-6f017ce7efde, addrs=[172.17.0.1, 172.25.1.37, 0:0:0:0:0:0:0:1%lo, 127.0.0.1], order=6, loc=false, client=false]] at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.map(GridDhtTxPrepareFuture.java:1520) at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare0(GridDhtTxPrepareFuture.java:1239) at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.mapIfLocked(GridDhtTxPrepareFuture.java:671) at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare(GridDhtTxPrepareFuture.java:1048) at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.prepareAsync(GridDhtTxLocal.java:397) at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.prepareNearTx(IgniteTxHandler.java:516) at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxPrepareRequest0(IgniteTxHandler.java:150) at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxPrepareRequest(IgniteTxHandler.java:135) at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.access$000(IgniteTxHandler.java:97) at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$1.apply(IgniteTxHandler.java:177) at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$1.apply(IgniteTxHandler.java:175) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1054) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:579) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304) at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99) at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293) at 
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556) at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184) at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:125) at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1091) at org.apache.ignite.internal.util.StripedExecutor$Stripe.run(StripedExecutor.java:511) at java.lang.Thread.run(Thread.java:748){code} For atomic cache: {code:java} [18:40:12,858][SEVERE][sys-stripe-10-#11][GridCacheIoManager] Failed to process message [senderId=9fde40b1-3b21-49de-b1ad-cdd0d9d902e5, messageType=class o.a.i.i.processors.cache.distributed.near.GridNearSingleGetRequest] java.lang.AssertionError: Wrong ready topology version for invalid partitions response [topVer=AffinityTopologyVersion [topVer=11, minorTopVer=0], req=GridNearSingleGetRequest [futId=1526053201329, key=KeyCacheObjectImpl [part=42, val=1514, hasValBytes=true], flags=1, topVer=AffinityTopologyVersion [topVer=11, minorTopVer=0], subjId=9fde40b1-3b21-49de-b1ad-cdd0d9d902e5, taskNameHash=0, createTtl=-1, accessTtl=-1]] at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter$6.apply(GridDhtCacheAdapter.java:943) at org.apache.ignite.internal.processors.cache.distrib
[jira] [Updated] (IGNITE-8476) AssertionError exception occurs when trying to remove node from baseline under loading
[ https://issues.apache.org/jira/browse/IGNITE-8476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Sherstobitov updated IGNITE-8476:
Description:
Run 6 nodes and start load (8 threads, 1000 cache.put() operations per thread per second).
Kill 2 nodes and try to remove one node from the baseline using:
control.sh --baseline remove node1
control.sh --baseline version TOPOLOGY_VERSION
The utility hangs, or a connected client may hang, and the following assertion appears in the log:
{code:java}
[18:40:12,858][SEVERE][sys-stripe-10-#11][GridCacheIoManager] Failed to process message [senderId=9fde40b1-3b21-49de-b1ad-cdd0d9d902e5, messageType=class o.a.i.i.processors.cache.distributed.near.GridNearSingleGetRequest]
java.lang.AssertionError: Wrong ready topology version for invalid partitions response [topVer=AffinityTopologyVersion [topVer=11, minorTopVer=0], req=GridNearSingleGetRequest [futId=1526053201329, key=KeyCacheObjectImpl [part=42, val=1514, hasValBytes=true], flags=1, topVer=AffinityTopologyVersion [topVer=11, minorTopVer=0], subjId=9fde40b1-3b21-49de-b1ad-cdd0d9d902e5, taskNameHash=0, createTtl=-1, accessTtl=-1]]
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter$6.apply(GridDhtCacheAdapter.java:943)
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter$6.apply(GridDhtCacheAdapter.java:906)
at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383)
at org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:353)
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter.processNearSingleGetRequest(GridDhtCacheAdapter.java:906)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$300(GridDhtAtomicCache.java:130)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$4.apply(GridDhtAtomicCache.java:252)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$4.apply(GridDhtAtomicCache.java:247)
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1054)
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:579)
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378)
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304)
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99)
at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293)
at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:125)
at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1091)
at org.apache.ignite.internal.util.StripedExecutor$Stripe.run(StripedExecutor.java:511)
at java.lang.Thread.run(Thread.java:748)
{code}
was:
Run 6 nodes and start load (8 threads, 1000 cache.put() operations per thread per second).
Kill 2 nodes and try to remove one node from the baseline using:
control.sh --baseline remove node1
control.sh --baseline version TOPOLOGY_VERSION
The utility hangs, and the same assertion as above appears in the log.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
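For reference, the load phase described above can be approximated with a minimal Java harness such as the sketch below. This is an assumption-laden reconstruction, not the reporter's actual code: the cache name ("testCache"), key range, and node configuration are all hypothetical, since the ticket does not include the load generator.
{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;

public class PutLoad {
    public static void main(String[] args) {
        // Assumes a default configuration that joins the running 6-node cluster.
        Ignite ignite = Ignition.start();
        // Cache name is an assumption; the ticket does not name the cache.
        IgniteCache<Integer, Integer> cache = ignite.getOrCreateCache("testCache");

        // 8 load threads, as described in the report.
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int t = 0; t < 8; t++) {
            pool.submit(() -> {
                try {
                    while (true) {
                        int k = ThreadLocalRandom.current().nextInt(100_000);
                        cache.put(k, k);
                        Thread.sleep(1); // roughly 1000 puts per thread per second
                    }
                }
                catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
    }
}
{code}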
[jira] [Updated] (IGNITE-8476) AssertionError exception occurs when trying to remove node from baseline under loading
[ https://issues.apache.org/jira/browse/IGNITE-8476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Sherstobitov updated IGNITE-8476:
Summary: AssertionError exception occurs when trying to remove node from baseline under loading (was: AssertionError exception occurs when trying to remove node from baseline by consistentId under loading)
> AssertionError exception occurs when trying to remove node from baseline
> under loading
> --
>
> Key: IGNITE-8476
> URL: https://issues.apache.org/jira/browse/IGNITE-8476
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 2.5
> Reporter: Dmitry Sherstobitov
> Priority: Blocker
>
> Run 6 nodes and start load (8 threads, 1000 cache.put() operations per
> thread per second).
> Kill 2 nodes and try to remove one node from the baseline using:
> control.sh --baseline remove node1
> control.sh --baseline version TOPOLOGY_VERSION
>
> The utility hangs, and the following assertion appears in the log:
> {code:java}
> [18:40:12,858][SEVERE][sys-stripe-10-#11][GridCacheIoManager] Failed to process message [senderId=9fde40b1-3b21-49de-b1ad-cdd0d9d902e5, messageType=class o.a.i.i.processors.cache.distributed.near.GridNearSingleGetRequest]
> java.lang.AssertionError: Wrong ready topology version for invalid partitions response [topVer=AffinityTopologyVersion [topVer=11, minorTopVer=0], req=GridNearSingleGetRequest [futId=1526053201329, key=KeyCacheObjectImpl [part=42, val=1514, hasValBytes=true], flags=1, topVer=AffinityTopologyVersion [topVer=11, minorTopVer=0], subjId=9fde40b1-3b21-49de-b1ad-cdd0d9d902e5, taskNameHash=0, createTtl=-1, accessTtl=-1]]
> at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter$6.apply(GridDhtCacheAdapter.java:943)
> at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter$6.apply(GridDhtCacheAdapter.java:906)
> at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383)
> at org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:353)
> at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter.processNearSingleGetRequest(GridDhtCacheAdapter.java:906)
> at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$300(GridDhtAtomicCache.java:130)
> at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$4.apply(GridDhtAtomicCache.java:252)
> at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$4.apply(GridDhtAtomicCache.java:247)
> at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1054)
> at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:579)
> at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378)
> at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304)
> at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99)
> at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293)
> at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
> at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
> at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:125)
> at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1091)
> at org.apache.ignite.internal.util.StripedExecutor$Stripe.run(StripedExecutor.java:511)
> at java.lang.Thread.run(Thread.java:748){code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
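As background for the reproduction steps quoted above: the two control.sh invocations have public-API equivalents on IgniteCluster (setBaselineTopology accepts either an explicit node collection or a topology version). The sketch below is only an illustration of those equivalents under stated assumptions; the consistent id "node1" comes from the ticket text, and the configuration and class name are hypothetical.
{code:java}
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cluster.BaselineNode;

public class BaselineRemove {
    public static void main(String[] args) {
        // Assumes a configuration that joins the existing cluster.
        Ignite ignite = Ignition.start();

        removeByConsistentId(ignite, "node1"); // ~ control.sh --baseline remove node1
        // pinByTopologyVersion(ignite);       // ~ control.sh --baseline version TOPOLOGY_VERSION
    }

    /** Drops one node from the current baseline by its consistent id. */
    static void removeByConsistentId(Ignite ignite, String consistentId) {
        // currentBaselineTopology() is null until a baseline has been established
        // (e.g. on first cluster activation with persistence enabled).
        Collection<BaselineNode> cur = ignite.cluster().currentBaselineTopology();
        List<BaselineNode> newBaseline = cur.stream()
            .filter(n -> !consistentId.equals(String.valueOf(n.consistentId())))
            .collect(Collectors.toList());
        ignite.cluster().setBaselineTopology(newBaseline);
    }

    /** Sets the baseline to the server nodes of the given major topology version. */
    static void pinByTopologyVersion(Ignite ignite) {
        ignite.cluster().setBaselineTopology(ignite.cluster().topologyVersion());
    }
}
{code}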
[jira] [Updated] (IGNITE-8476) AssertionError exception occurs when trying to remove node from baseline by consistentId under loading
[ https://issues.apache.org/jira/browse/IGNITE-8476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Sherstobitov updated IGNITE-8476:
Description:
Run 6 nodes and start load (8 threads, 1000 cache.put() operations per thread per second).
Kill 2 nodes and try to remove one node from the baseline using:
control.sh --baseline remove node1
control.sh --baseline version TOPOLOGY_VERSION
The utility hangs, and the following assertion appears in the log:
{code:java}
[18:40:12,858][SEVERE][sys-stripe-10-#11][GridCacheIoManager] Failed to process message [senderId=9fde40b1-3b21-49de-b1ad-cdd0d9d902e5, messageType=class o.a.i.i.processors.cache.distributed.near.GridNearSingleGetRequest]
java.lang.AssertionError: Wrong ready topology version for invalid partitions response [topVer=AffinityTopologyVersion [topVer=11, minorTopVer=0], req=GridNearSingleGetRequest [futId=1526053201329, key=KeyCacheObjectImpl [part=42, val=1514, hasValBytes=true], flags=1, topVer=AffinityTopologyVersion [topVer=11, minorTopVer=0], subjId=9fde40b1-3b21-49de-b1ad-cdd0d9d902e5, taskNameHash=0, createTtl=-1, accessTtl=-1]]
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter$6.apply(GridDhtCacheAdapter.java:943)
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter$6.apply(GridDhtCacheAdapter.java:906)
at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383)
at org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:353)
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter.processNearSingleGetRequest(GridDhtCacheAdapter.java:906)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$300(GridDhtAtomicCache.java:130)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$4.apply(GridDhtAtomicCache.java:252)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$4.apply(GridDhtAtomicCache.java:247)
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1054)
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:579)
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378)
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304)
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99)
at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293)
at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:125)
at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1091)
at org.apache.ignite.internal.util.StripedExecutor$Stripe.run(StripedExecutor.java:511)
at java.lang.Thread.run(Thread.java:748)
{code}
was:
Run 6 nodes and start load (8 threads, 1000 cache.put() operations per thread per second).
Kill 2 nodes and try to remove one node from the baseline using:
control.sh --baseline remove node1
The utility hangs, and the same assertion as above appears in the log.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-8476) AssertionError exception occurs when trying to remove node from baseline by consistentId under loading
[ https://issues.apache.org/jira/browse/IGNITE-8476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Sherstobitov updated IGNITE-8476:
Priority: Blocker (was: Critical)
> AssertionError exception occurs when trying to remove node from baseline by
> consistentId under loading
> --
>
> Key: IGNITE-8476
> URL: https://issues.apache.org/jira/browse/IGNITE-8476
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 2.5
> Reporter: Dmitry Sherstobitov
> Priority: Blocker
>
> Run 6 nodes and start load (8 threads, 1000 cache.put() operations per
> thread per second).
> Kill 2 nodes and try to remove one node from the baseline using:
> control.sh --baseline remove node1
>
> The utility hangs, and the following assertion appears in the log:
> {code:java}
> [18:40:12,858][SEVERE][sys-stripe-10-#11][GridCacheIoManager] Failed to process message [senderId=9fde40b1-3b21-49de-b1ad-cdd0d9d902e5, messageType=class o.a.i.i.processors.cache.distributed.near.GridNearSingleGetRequest]
> java.lang.AssertionError: Wrong ready topology version for invalid partitions response [topVer=AffinityTopologyVersion [topVer=11, minorTopVer=0], req=GridNearSingleGetRequest [futId=1526053201329, key=KeyCacheObjectImpl [part=42, val=1514, hasValBytes=true], flags=1, topVer=AffinityTopologyVersion [topVer=11, minorTopVer=0], subjId=9fde40b1-3b21-49de-b1ad-cdd0d9d902e5, taskNameHash=0, createTtl=-1, accessTtl=-1]]
> at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter$6.apply(GridDhtCacheAdapter.java:943)
> at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter$6.apply(GridDhtCacheAdapter.java:906)
> at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383)
> at org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:353)
> at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter.processNearSingleGetRequest(GridDhtCacheAdapter.java:906)
> at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$300(GridDhtAtomicCache.java:130)
> at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$4.apply(GridDhtAtomicCache.java:252)
> at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$4.apply(GridDhtAtomicCache.java:247)
> at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1054)
> at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:579)
> at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378)
> at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304)
> at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99)
> at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293)
> at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
> at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
> at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:125)
> at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1091)
> at org.apache.ignite.internal.util.StripedExecutor$Stripe.run(StripedExecutor.java:511)
> at java.lang.Thread.run(Thread.java:748){code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
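One detail worth noting about the stack trace above: it originates from a Java assert inside GridDhtCacheAdapter, and asserts throw AssertionError only when the JVM runs with assertions enabled (-ea / -enableassertions), as is common on test clusters. The following is a minimal, self-contained illustration of that mechanism only; the class name and variable are hypothetical and unrelated to Ignite internals.
{code:java}
public class AssertDemo {
    public static void main(String[] args) {
        // Stand-in for Ignite's internal topology-version check.
        boolean readyTopVerValid = false;

        // Throws AssertionError only when run with: java -ea AssertDemo
        assert readyTopVerValid : "Wrong ready topology version for invalid partitions response";

        System.out.println("Assertions disabled; rerun with 'java -ea AssertDemo' to see the throw.");
    }
}
{code}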