[jira] [Commented] (IGNITE-11936) Avoid changing AffinityTopologyVersion on a server node join/left event from not baseline topology.
[ https://issues.apache.org/jira/browse/IGNITE-11936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16882305#comment-16882305 ] Ignite TC Bot commented on IGNITE-11936: {panel:title=-- Run :: All: No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel} [TeamCity *-- Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=4300185&buildTypeId=IgniteTests24Java8_RunAll] > Avoid changing AffinityTopologyVersion on a server node join/left event from > not baseline topology. > --- > > Key: IGNITE-11936 > URL: https://issues.apache.org/jira/browse/IGNITE-11936 > Project: Ignite > Issue Type: Improvement >Reporter: Amelchev Nikita >Assignee: Amelchev Nikita >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Currently, a client join/left event does not change AffinityTopologyVersion > (see IGNITE-9558). It shouldn't be changed on a server node join/left event > from outside the baseline topology either. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
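The rule requested above can be sketched as a tiny predicate. This is an illustrative sketch only, not Ignite code; the class and method names are hypothetical: only a server node that belongs to the baseline topology should bump the affinity topology version, while clients (per IGNITE-9558) and non-baseline servers should not.

```java
// Hypothetical sketch of the behavior the ticket asks for; not Ignite internals.
// A join/left event changes the AffinityTopologyVersion only when the node is a
// server that belongs to the baseline topology.
public class AffinityBumpRule {
    public static boolean affinityVersionChanges(boolean isClient, boolean inBaseline) {
        // Clients never change it (IGNITE-9558); non-baseline servers should not either.
        return !isClient && inBaseline;
    }
}
```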
[jira] [Commented] (IGNITE-11951) Set ThreadLocal node name only once in JdkMarshaller
[ https://issues.apache.org/jira/browse/IGNITE-11951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16882217#comment-16882217 ] Aleksey Plekhanov commented on IGNITE-11951: [~Pavlukhin], I've looked at your PR. It looks good to me. > Set ThreadLocal node name only once in JdkMarshaller > > > Key: IGNITE-11951 > URL: https://issues.apache.org/jira/browse/IGNITE-11951 > Project: Ignite > Issue Type: Improvement >Affects Versions: 2.7.5 >Reporter: Ivan Pavlukhin >Assignee: Ivan Pavlukhin >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Currently {{JdkMarshaller}} saves a node name twice in a couple of > marshal/unmarshal methods. The code can be improved to do it only once. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
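The improvement can be illustrated with a generic set-once thread-local pattern. This is a hedged sketch with hypothetical names, not the actual JdkMarshaller fix: only the outermost call installs and clears the value, so a nested marshal/unmarshal call neither re-sets nor prematurely removes it.

```java
import java.util.function.Supplier;

// Illustrative sketch, not JdkMarshaller code: the node name is installed into
// the ThreadLocal once, by the outermost call only; nested calls observe it
// already set and skip both the set and the remove.
public class NodeNameHolder {
    private static final ThreadLocal<String> NODE_NAME = new ThreadLocal<>();

    public static <T> T withNodeName(String name, Supplier<T> op) {
        boolean outermost = NODE_NAME.get() == null;
        if (outermost)
            NODE_NAME.set(name);
        try {
            return op.get();
        }
        finally {
            if (outermost)
                NODE_NAME.remove();
        }
    }

    public static String current() {
        return NODE_NAME.get();
    }
}
```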
[jira] [Commented] (IGNITE-11958) JDBC connection validation should use its own task instead of cache validation task
[ https://issues.apache.org/jira/browse/IGNITE-11958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16882079#comment-16882079 ] Yury Gerzhedovich commented on IGNITE-11958: Looks ok to me > JDBC connection validation should use its own task instead of cache > validation task > > > Key: IGNITE-11958 > URL: https://issues.apache.org/jira/browse/IGNITE-11958 > Project: Ignite > Issue Type: Bug >Reporter: Denis Chudov >Assignee: Denis Chudov >Priority: Major > Fix For: 2.8 > > Time Spent: 1h > Remaining Estimate: 0h > > JDBC connection is validated using GridCacheQueryJdbcValidationTask. We > should create our own validation task for this activity. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-11974) infinite loop and 100% cpu in GridDhtPartitionsEvictor: Eviction in progress ...
[ https://issues.apache.org/jira/browse/IGNITE-11974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Kamyshnikov updated IGNITE-11974: -- Attachment: server-node-restarts-1.png > infinite loop and 100% cpu in GridDhtPartitionsEvictor: Eviction in progress > ... > > > Key: IGNITE-11974 > URL: https://issues.apache.org/jira/browse/IGNITE-11974 > Project: Ignite > Issue Type: Bug > Components: cache >Affects Versions: 2.5 >Reporter: Igor Kamyshnikov >Priority: Blocker > Attachments: image-2019-07-10-16-07-37-185.png, > server-node-restarts-1.png > > > Note: RCA was not done: > Sometimes ignite server nodes fall into infinite loop and consume 100% cpu: > {noformat} > "sys-#260008" #260285 prio=5 os_prio=0 tid=0x7fabb020a800 nid=0x1e850 > runnable [0x7fab26fef000] >java.lang.Thread.State: RUNNABLE > at > java.util.concurrent.ConcurrentHashMap$Traverser.advance(ConcurrentHashMap.java:3339) > at > java.util.concurrent.ConcurrentHashMap$ValueIterator.next(ConcurrentHashMap.java:3439) > at > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionsEvictor$1.call(GridDhtPartitionsEvictor.java:84) > at > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionsEvictor$1.call(GridDhtPartitionsEvictor.java:73) > at > org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6695) > at > org.apache.ignite.internal.processors.closure.GridClosureProcessor$2.body(GridClosureProcessor.java:967) > at > org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) >Locked ownable synchronizers: > - <0x000649b9cba0> (a > java.util.concurrent.ThreadPoolExecutor$Worker) > {noformat} > the following appears in logs each 2 minutes: > {noformat} > INFO 2019-07-08 12:21:45.081 
(1562581305081) [sys-#98168] > [GridDhtPartitionsEvictor] > Eviction in progress > [grp=CUSTPRODINVOICEDISCUSAGE, remainingCnt=102] > {noformat} > remainingCnt remains the same once it reached 102 (the very first such line in the > logs had the value 101). > Some other facts: > we have a heapdump taken for *topVer = 900*. The problem appeared after > *topVer = 790*, but it looks like it was silently waiting from *topVer = 641* > (about 24 hours back). > There were 259 topology changes between 900 and 641. > All 102 GridDhtLocalPartitions can be found in the heapdump: > {noformat} > select * from > "org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition" > t where delayedRenting = true > {noformat} > They all have status = 65537, which means (according to > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition#state): > reservations(65537) = 1 > getPartState(65537) = OWNING > There are also 26968 instances of > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl$$Lambda$70, > that are created by the > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl#checkEvictions > method. > 26418 of 26968 refer to an AtomicInteger instance with value = 102: > 26418/102 = 259 = 900 - 641 (see topology info above). > The key thing seen from the heapdump is that topVer = 641 or topVer = 642 was > the last topology where these 102 partitions were assigned to the current > ignite server node. 
> {noformat} > select > t.this > ,t.this['clientEvtChange'] as clientEvtChange > ,t.this['topVer.topVer'] as topVer > > ,t.this['assignment.elementData'][555]['elementData'][0]['hostNames.elementData'][0] > as primary_part > > ,t.this['assignment.elementData'][555]['elementData'][1]['hostNames.elementData'][0] > as secondary_part > from org.apache.ignite.internal.processors.affinity.HistoryAffinityAssignment > t where length(t.this['assignment.elementData']) = 1024 > order by topVer > {noformat} > !image-2019-07-10-16-07-37-185.png! > The connection of a client node at topVer = 790 somehow triggered the > GridDhtPartitionsEvictor loop to execute. > Summary: > 1) it is seen that 102 partitions have one reservation and the OWNING state. > 2) they were backup partitions. > 3) for some reason their eviction has been silently delayed (because of > reservations), but each topology change seemed to trigger an eviction attempt. > 4) something made > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionsEvictor#evictPartitionAsync > run without ever exiting. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
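The status value 65537 quoted in the report decodes consistently with the stated facts if one assumes a `(reservations << 16) | stateOrdinal` packing. This is a guess from the numbers in the report, not a statement about GridDhtLocalPartition internals, and the OWNING ordinal of 1 is likewise an assumption:

```java
// Hedged decode of the observed partition status, assuming the packing
// (reservations << 16) | stateOrdinal. 65537 = 0x0001_0001, which matches the
// reported reservations(65537) = 1 and getPartState(65537) = OWNING if the
// OWNING ordinal is 1. The 26418/102 = 259 = 900 - 641 arithmetic from the
// report also checks out.
public class PartitionStatusDecode {
    public static int reservations(long status) {
        return (int) (status >>> 16);
    }

    public static int stateOrdinal(long status) {
        return (int) (status & 0xFFFF);
    }
}
```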
[jira] [Updated] (IGNITE-11974) infinite loop and 100% cpu in GridDhtPartitionsEvictor: Eviction in progress ...
[ https://issues.apache.org/jira/browse/IGNITE-11974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Kamyshnikov updated IGNITE-11974: -- Description: Note: RCA was not done: Sometimes ignite server nodes fall into infinite loop and consume 100% cpu: {noformat} "sys-#260008" #260285 prio=5 os_prio=0 tid=0x7fabb020a800 nid=0x1e850 runnable [0x7fab26fef000] java.lang.Thread.State: RUNNABLE at java.util.concurrent.ConcurrentHashMap$Traverser.advance(ConcurrentHashMap.java:3339) at java.util.concurrent.ConcurrentHashMap$ValueIterator.next(ConcurrentHashMap.java:3439) at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionsEvictor$1.call(GridDhtPartitionsEvictor.java:84) at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionsEvictor$1.call(GridDhtPartitionsEvictor.java:73) at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6695) at org.apache.ignite.internal.processors.closure.GridClosureProcessor$2.body(GridClosureProcessor.java:967) at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Locked ownable synchronizers: - <0x000649b9cba0> (a java.util.concurrent.ThreadPoolExecutor$Worker) {noformat} the following appears in logs each 2 minutes: {noformat} INFO 2019-07-08 12:21:45.081 (1562581305081) [sys-#98168] [GridDhtPartitionsEvictor] > Eviction in progress [grp=CUSTPRODINVOICEDISCUSAGE, remainingCnt=102] {noformat} remainingCnt remains the same once it reached 102 (the very first line in the logs was with value equal to 101). Some other facts: we have a heapdump taken for *topVer = 900* . the problem appeared after *topVer = 790*, but it looks like it was silently waiting from *topVer = 641* (about 24 hours back). 
There were 259 topology changes between 900 and 641. All 102 GridDhtLocalPartitions can be found in the heapdump: {noformat} select * from "org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition" t where delayedRenting = true {noformat} They all have status = 65537 , which means (according to org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition#state): reservations(65537) = 1 getPartState(65537) = OWNING There are also 26968 instances of org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl$$Lambda$70, that are created by org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl#checkEvictions method. 26418 of 26968 refer to AtomicInteger instance with value = 102: 26418/102 = 259 = 900 - 641 (see topology info above). The key thing seen from the heapdump is that topVer = 641 or topVer = 642 was the last topology where these 102 partitions were assigned to the current ignite server node. {noformat} select t.this ,t.this['clientEvtChange'] as clientEvtChange ,t.this['topVer.topVer'] as topVer ,t.this['assignment.elementData'][555]['elementData'][0]['hostNames.elementData'][0] as primary_part ,t.this['assignment.elementData'][555]['elementData'][1]['hostNames.elementData'][0] as secondary_part from org.apache.ignite.internal.processors.affinity.HistoryAffinityAssignment t where length(t.this['assignment.elementData']) = 1024 order by topVer {noformat} !image-2019-07-10-16-07-37-185.png! The connection of a client node at topVer = 790 somehow triggered the GridDhtPartitionsEvictor loop to execute. Summary: 1) it is seen that 102 partitions has one reservation and OWNING state. 2) they were backup partitions. 3) for some reason their eviction has been silently delaying (because of reservations), but each topology change seemed to trigger eviction attempt. 
4) something made org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionsEvictor#evictPartitionAsync run without ever exiting. Additional info: topVer = 641 was in a chain of server node restarts (not sure if rebalancing actually succeeded): !server-node-restarts-1.png! was: Note: RCA was not done: Sometimes ignite server nodes fall into infinite loop and consume 100% cpu: {noformat} "sys-#260008" #260285 prio=5 os_prio=0 tid=0x7fabb020a800 nid=0x1e850 runnable [0x7fab26fef000] java.lang.Thread.State: RUNNABLE at java.util.concurrent.ConcurrentHashMap$Traverser.advance(ConcurrentHashMap.java:3339) at java.util.concurrent.ConcurrentHashMap$ValueIterator.next(ConcurrentHashMap.java:3439) at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionsEvictor$1.call(GridDhtPartitionsEvictor.java:84)
[jira] [Created] (IGNITE-11974) infinite loop and 100% cpu in GridDhtPartitionsEvictor: Eviction in progress ...
Igor Kamyshnikov created IGNITE-11974: - Summary: infinite loop and 100% cpu in GridDhtPartitionsEvictor: Eviction in progress ... Key: IGNITE-11974 URL: https://issues.apache.org/jira/browse/IGNITE-11974 Project: Ignite Issue Type: Bug Components: cache Affects Versions: 2.5 Reporter: Igor Kamyshnikov Attachments: image-2019-07-10-16-07-37-185.png Note: RCA was not done: Sometimes ignite server nodes fall into infinite loop and consume 100% cpu: {noformat} "sys-#260008" #260285 prio=5 os_prio=0 tid=0x7fabb020a800 nid=0x1e850 runnable [0x7fab26fef000] java.lang.Thread.State: RUNNABLE at java.util.concurrent.ConcurrentHashMap$Traverser.advance(ConcurrentHashMap.java:3339) at java.util.concurrent.ConcurrentHashMap$ValueIterator.next(ConcurrentHashMap.java:3439) at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionsEvictor$1.call(GridDhtPartitionsEvictor.java:84) at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionsEvictor$1.call(GridDhtPartitionsEvictor.java:73) at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6695) at org.apache.ignite.internal.processors.closure.GridClosureProcessor$2.body(GridClosureProcessor.java:967) at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Locked ownable synchronizers: - <0x000649b9cba0> (a java.util.concurrent.ThreadPoolExecutor$Worker) {noformat} the following appears in logs each 2 minutes: {noformat} INFO 2019-07-08 12:21:45.081 (1562581305081) [sys-#98168] [GridDhtPartitionsEvictor] > Eviction in progress [grp=CUSTPRODINVOICEDISCUSAGE, remainingCnt=102] {noformat} remainingCnt remains the same once it reached 102 (the very first line in the logs was with value equal to 101). 
Some other facts: we have a heapdump taken for *topVer = 900* . the problem appeared after *topVer = 790*, but it looks like it was silently waiting from *topVer = 641* (about 24 hours back). There were 259 topology changes between 900 and 641. All 102 GridDhtLocalPartitions can be found in the heapdump: {noformat} select * from "org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition" t where delayedRenting = true {noformat} They all have status = 65537 , which means (according to org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition#state): reservations(65537) = 1 getPartState(65537) = OWNING There are also 26968 instances of org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl$$Lambda$70, that are created by org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl#checkEvictions method. 26418 of 26968 refer to AtomicInteger instance with value = 102: 26418/102 = 259 = 900 - 641 (see topology info above). The key thing seen from the heapdump is that topVer = 641 or topVer = 642 was the last topology where these 102 partitions were assigned to the current ignite server node. {noformat} select t.this ,t.this['clientEvtChange'] as clientEvtChange ,t.this['topVer.topVer'] as topVer ,t.this['assignment.elementData'][555]['elementData'][0]['hostNames.elementData'][0] as primary_part ,t.this['assignment.elementData'][555]['elementData'][1]['hostNames.elementData'][0] as secondary_part from org.apache.ignite.internal.processors.affinity.HistoryAffinityAssignment t where length(t.this['assignment.elementData']) = 1024 order by topVer {noformat} !image-2019-07-10-16-07-37-185.png! The connection of a client node at topVer = 790 somehow triggered the GridDhtPartitionsEvictor loop to execute. Summary: 1) it is seen that 102 partitions has one reservation and OWNING state. 2) they were backup partitions. 
3) for some reason their eviction has been silently delayed (because of reservations), but each topology change seemed to trigger an eviction attempt. 4) something made org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionsEvictor#evictPartitionAsync run without ever exiting. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-11907) Registration of continuous query should fail if nodes don't have remote filter class
[ https://issues.apache.org/jira/browse/IGNITE-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881891#comment-16881891 ] Ivan Pavlukhin commented on IGNITE-11907: - [~rkondakov], [~jooger], thank you for the review. I fixed all points except {quote} 1) IncompleteDeserializationExceptionTest - commented code at the end{quote} I would prefer to keep the commented lines because they explain how a file with a serialized object is generated. Unfortunately it is not possible to uncomment them because the test will not work as expected (the class should not be available at runtime). > Registration of continuous query should fail if nodes don't have remote > filter class > > > Key: IGNITE-11907 > URL: https://issues.apache.org/jira/browse/IGNITE-11907 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.7 >Reporter: Denis Mekhanikov >Assignee: Ivan Pavlukhin >Priority: Major > Attachments: > ContinuousQueryRemoteFilterMissingInClassPathSelfTest.java > > Time Spent: 10m > Remaining Estimate: 0h > > If one of the data nodes doesn't have a remote filter class, then registration of > continuous queries should fail with an exception. Currently, nodes fail > instead. > Reproducer is attached: > [^ContinuousQueryRemoteFilterMissingInClassPathSelfTest.java] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (IGNITE-11907) Registration of continuous query should fail if nodes don't have remote filter class
[ https://issues.apache.org/jira/browse/IGNITE-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881874#comment-16881874 ] Yury Gerzhedovich edited comment on IGNITE-11907 at 7/10/19 9:13 AM: - [~Pavlukhin] Pull request looks good. Just two additional minors: 1) IncompleteDeserializationExceptionTest - commented code at the end 2) ContinuousQueryRemoteFilterMissingInClassPathSelfTest#testServerMissingClassFailsRegistration method - doesn't check that the exception is not thrown. was (Author: jooger): [~Pavlukhin] Pull request looks good. Just two additional minors: 1) IncompleteDeserializationExceptionTest - commented code at the end 2) testServerMissingClassFailsRegistration method - doesn't check that the exception is not thrown. > Registration of continuous query should fail if nodes don't have remote > filter class > > > Key: IGNITE-11907 > URL: https://issues.apache.org/jira/browse/IGNITE-11907 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.7 >Reporter: Denis Mekhanikov >Assignee: Ivan Pavlukhin >Priority: Major > Attachments: > ContinuousQueryRemoteFilterMissingInClassPathSelfTest.java > > Time Spent: 10m > Remaining Estimate: 0h > > If one of the data nodes doesn't have a remote filter class, then registration of > continuous queries should fail with an exception. Currently, nodes fail > instead. > Reproducer is attached: > [^ContinuousQueryRemoteFilterMissingInClassPathSelfTest.java] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-11907) Registration of continuous query should fail if nodes don't have remote filter class
[ https://issues.apache.org/jira/browse/IGNITE-11907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881874#comment-16881874 ] Yury Gerzhedovich commented on IGNITE-11907: [~Pavlukhin] Pull request looks good. Just two additional minors: 1) IncompleteDeserializationExceptionTest - commented code at the end 2) testServerMissingClassFailsRegistration method - doesn't check that the exception is not thrown. > Registration of continuous query should fail if nodes don't have remote > filter class > > > Key: IGNITE-11907 > URL: https://issues.apache.org/jira/browse/IGNITE-11907 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.7 >Reporter: Denis Mekhanikov >Assignee: Ivan Pavlukhin >Priority: Major > Attachments: > ContinuousQueryRemoteFilterMissingInClassPathSelfTest.java > > Time Spent: 10m > Remaining Estimate: 0h > > If one of the data nodes doesn't have a remote filter class, then registration of > continuous queries should fail with an exception. Currently, nodes fail > instead. > Reproducer is attached: > [^ContinuousQueryRemoteFilterMissingInClassPathSelfTest.java] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
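The behavior requested in the ticket, failing the registration rather than the nodes, amounts to a fail-fast class availability check. The sketch below uses hypothetical names and is not the actual Ignite fix; it only illustrates the fail-fast shape:

```java
// Hypothetical sketch: verify the remote filter class is resolvable before
// registering the continuous query, and fail the registration with an
// exception instead of letting data nodes fail later.
public class RemoteFilterCheck {
    public static void requireFilterClass(String clsName) {
        try {
            Class.forName(clsName);
        }
        catch (ClassNotFoundException e) {
            throw new IllegalStateException(
                "Continuous query registration failed: remote filter class is missing: " + clsName, e);
        }
    }
}
```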
[jira] [Updated] (IGNITE-11973) testAccountTxNodeRestart causes unexpected repairs in case of ReadRepair usage
[ https://issues.apache.org/jira/browse/IGNITE-11973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Vinogradov updated IGNITE-11973: -- Ignite Flags: (was: Docs Required) > testAccountTxNodeRestart causes unexpected repairs in case of ReadRepair usage > - > > Key: IGNITE-11973 > URL: https://issues.apache.org/jira/browse/IGNITE-11973 > Project: Ignite > Issue Type: Task >Reporter: Anton Vinogradov >Assignee: Anton Vinogradov >Priority: Major > Labels: iep-31 > Fix For: 2.8 > > > Just add the withReadRepair() proxy to the test's cache and you'll see unexpected > data repairs (values differ on backups while the primary is locked). > To debug this, add a breakpoint in > {{GridNearReadRepairFuture#recordConsistencyViolation}} after the > fixedRaw.isEmpty() check. > A non-empty map means a repair happened. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-11973) testAccountTxNodeRestart causes unexpected repairs in case of ReadRepair usage
[ https://issues.apache.org/jira/browse/IGNITE-11973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Vinogradov updated IGNITE-11973: -- Fix Version/s: 2.8 > testAccountTxNodeRestart causes unexpected repairs in case of ReadRepair usage > - > > Key: IGNITE-11973 > URL: https://issues.apache.org/jira/browse/IGNITE-11973 > Project: Ignite > Issue Type: Task >Reporter: Anton Vinogradov >Assignee: Anton Vinogradov >Priority: Major > Labels: iep-31 > Fix For: 2.8 > > > Just add the withReadRepair() proxy to the test's cache and you'll see unexpected > data repairs (values differ on backups while the primary is locked). > To debug this, add a breakpoint in > {{GridNearReadRepairFuture#recordConsistencyViolation}} after the > fixedRaw.isEmpty() check. > A non-empty map means a repair happened. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-11971) Consistency check on test finish
[ https://issues.apache.org/jira/browse/IGNITE-11971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Vinogradov updated IGNITE-11971: -- Ignite Flags: (was: Docs Required) > Consistency check on test finish > > > Key: IGNITE-11971 > URL: https://issues.apache.org/jira/browse/IGNITE-11971 > Project: Ignite > Issue Type: Task >Reporter: Anton Vinogradov >Priority: Major > Labels: iep-31 > Fix For: 2.8 > > > Tests based on GridAbstractTest should automatically check that the cache's content > is consistent on test finish. > A good place to check this is the tearDown method. > An additional check can be added to the awaitPartitionMapExchange() method. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-11971) Consistency check on test finish
[ https://issues.apache.org/jira/browse/IGNITE-11971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Vinogradov updated IGNITE-11971: -- Fix Version/s: 2.8 > Consistency check on test finish > > > Key: IGNITE-11971 > URL: https://issues.apache.org/jira/browse/IGNITE-11971 > Project: Ignite > Issue Type: Task >Reporter: Anton Vinogradov >Priority: Major > Labels: iep-31 > Fix For: 2.8 > > > Tests based on GridAbstractTest should automatically check that the cache's content > is consistent on test finish. > A good place to check this is the tearDown method. > An additional check can be added to the awaitPartitionMapExchange() method. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-11972) Jepsen tests should check consistency
[ https://issues.apache.org/jira/browse/IGNITE-11972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Vinogradov updated IGNITE-11972: -- Fix Version/s: 2.8 > Jepsen tests should check consistency > - > > Key: IGNITE-11972 > URL: https://issues.apache.org/jira/browse/IGNITE-11972 > Project: Ignite > Issue Type: Task >Reporter: Anton Vinogradov >Assignee: Mikhail Filatov >Priority: Major > Labels: iep-31 > Fix For: 2.8 > > > We have to check that data is consistent during and after the tests. > A good approach is to use: > - idle_verify on test finish > - ReadRepair > -- during the test (some/(all?) gets should be with the RR proxy) > -- after the test finishes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-11972) Jepsen tests should check consistency
[ https://issues.apache.org/jira/browse/IGNITE-11972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Vinogradov updated IGNITE-11972: -- Ignite Flags: (was: Docs Required) > Jepsen tests should check consistency > - > > Key: IGNITE-11972 > URL: https://issues.apache.org/jira/browse/IGNITE-11972 > Project: Ignite > Issue Type: Task >Reporter: Anton Vinogradov >Assignee: Mikhail Filatov >Priority: Major > Labels: iep-31 > > We have to check data is consistent during and after the tests. > Good case is to use: > - idle_verify of test finish > - ReadRepair > -- during the test (some/(all?) gets should be with RR proxy) > -- after the test finish. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (IGNITE-11972) Jepsen tests should check consistency
[ https://issues.apache.org/jira/browse/IGNITE-11972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Vinogradov reassigned IGNITE-11972: - Assignee: Mikhail Filatov > Jepsen tests should check consistency > - > > Key: IGNITE-11972 > URL: https://issues.apache.org/jira/browse/IGNITE-11972 > Project: Ignite > Issue Type: Task >Reporter: Anton Vinogradov >Assignee: Mikhail Filatov >Priority: Major > Labels: iep-31 > > We have to check data is consistent during and after the tests. > Good case is to use: > - idle_verify of test finish > - ReadRepair > -- during the test (some/(all?) gets should be with RR proxy) > -- after the test finish. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11973) testAccountTxNodeRestart causes unexpected repairs in case of ReadRepair usage
Anton Vinogradov created IGNITE-11973: - Summary: testAccountTxNodeRestart causes unexpected repairs in case of ReadRepair usage Key: IGNITE-11973 URL: https://issues.apache.org/jira/browse/IGNITE-11973 Project: Ignite Issue Type: Task Reporter: Anton Vinogradov Assignee: Anton Vinogradov Just add the withReadRepair() proxy to the test's cache and you'll see unexpected data repairs (values differ on backups while the primary is locked). To debug this, add a breakpoint in {{GridNearReadRepairFuture#recordConsistencyViolation}} after the fixedRaw.isEmpty() check. A non-empty map means a repair happened. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11972) Jepsen tests should check consistency
Anton Vinogradov created IGNITE-11972: - Summary: Jepsen tests should check consistency Key: IGNITE-11972 URL: https://issues.apache.org/jira/browse/IGNITE-11972 Project: Ignite Issue Type: Task Reporter: Anton Vinogradov We have to check that data is consistent during and after the tests. A good approach is to use: - idle_verify on test finish - ReadRepair -- during the test (some/(all?) gets should be with the RR proxy) -- after the test finishes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-11927) [IEP-35] Add ability to configure subset of metrics
[ https://issues.apache.org/jira/browse/IGNITE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881779#comment-16881779 ] Nikolay Izhikov commented on IGNITE-11927: -- [~agura] Sorry, I should read the ticket description (written by myself) more carefully. I will add an enabling/disabling feature to this PR shortly. > [IEP-35] Add ability to configure subset of metrics > --- > > Key: IGNITE-11927 > URL: https://issues.apache.org/jira/browse/IGNITE-11927 > Project: Ignite > Issue Type: Improvement >Reporter: Nikolay Izhikov >Assignee: Nikolay Izhikov >Priority: Major > Labels: IEP-35 > Time Spent: 10m > Remaining Estimate: 0h > > Ignite should be able to: > * Enable or disable an arbitrary subset of the metrics. The user should be able > to do it at runtime. > * Configure Histogram metrics > * Configure HitRate metrics. > We should provide 2 ways to configure a metric: > 1. -Configuration file.- Discussed on the dev-list. Agreed to go with the > simplest solution - JMX method. > 2. JMX method. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
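The runtime enable/disable capability the ticket describes (to be exposed via a JMX method) can be sketched as a prefix-based toggle. All names below are hypothetical and do not reflect the IEP-35 API; the sketch only shows the shape of enabling or disabling an arbitrary subset of metrics at runtime:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical runtime toggle for a subset of metrics: disabling a prefix
// turns off every metric whose name starts with it. The same two operations
// could be exposed through a JMX method, as the ticket suggests.
public class MetricToggle {
    private final Set<String> disabledPrefixes = ConcurrentHashMap.newKeySet();

    public void disable(String prefix) {
        disabledPrefixes.add(prefix);
    }

    public void enable(String prefix) {
        disabledPrefixes.remove(prefix);
    }

    public boolean isEnabled(String metricName) {
        return disabledPrefixes.stream().noneMatch(metricName::startsWith);
    }
}
```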
[jira] [Commented] (IGNITE-11971) Consistency check on test finish
[ https://issues.apache.org/jira/browse/IGNITE-11971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881778#comment-16881778 ] Anton Vinogradov commented on IGNITE-11971: --- ReadRepair > Consistency check on test finish > > > Key: IGNITE-11971 > URL: https://issues.apache.org/jira/browse/IGNITE-11971 > Project: Ignite > Issue Type: Task >Reporter: Anton Vinogradov >Priority: Major > Labels: iep-31 > > Tests based on GridAbstractTest should automatically check that the cache's content > is consistent on test finish. > A good place to check this is the tearDown method. > An additional check can be added to the awaitPartitionMapExchange() method. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Issue Comment Deleted] (IGNITE-11971) Consistency check on test finish
[ https://issues.apache.org/jira/browse/IGNITE-11971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Vinogradov updated IGNITE-11971: -- Comment: was deleted (was: ReadRepair) > Consistency check on test finish > > > Key: IGNITE-11971 > URL: https://issues.apache.org/jira/browse/IGNITE-11971 > Project: Ignite > Issue Type: Task >Reporter: Anton Vinogradov >Priority: Major > Labels: iep-31 > > Tests based on GridAbstractTest should automatically check that the cache's content > is consistent on test finish. > A good place to check this is the tearDown method. > An additional check can be added to the awaitPartitionMapExchange() method. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11971) Consistency check on test finish
Anton Vinogradov created IGNITE-11971: - Summary: Consistency check on test finish Key: IGNITE-11971 URL: https://issues.apache.org/jira/browse/IGNITE-11971 Project: Ignite Issue Type: Task Reporter: Anton Vinogradov Tests based on GridAbstractTest should automatically check that the cache's content is consistent on test finish. A good place to check this is the tearDown method. An additional check can be added to the awaitPartitionMapExchange() method. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
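The check itself can be sketched independently of GridAbstractTest: on test finish (e.g. from tearDown), compare each cache's content across all replicas and fail on any mismatch. This is a hypothetical shape with invented names, not the actual hook:

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a consistency check to run on test finish: all
// replica snapshots of a cache must hold identical key/value sets. A test
// harness would fail the test when consistent(...) returns false.
public class ConsistencyCheck {
    public static boolean consistent(List<Map<String, String>> replicaSnapshots) {
        if (replicaSnapshots.isEmpty())
            return true;

        Map<String, String> reference = replicaSnapshots.get(0);

        // Every replica must match the first one exactly.
        return replicaSnapshots.stream().allMatch(reference::equals);
    }
}
```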
[jira] [Comment Edited] (IGNITE-10663) Implement cache mode allows reads with consistency check and fix
[ https://issues.apache.org/jira/browse/IGNITE-10663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881765#comment-16881765 ] Anton Vinogradov edited comment on IGNITE-10663 at 7/10/19 6:35 AM: Merged to the master branch. was (Author: avinogradov): Merger to the master branch. > Implement cache mode allows reads with consistency check and fix > > > Key: IGNITE-10663 > URL: https://issues.apache.org/jira/browse/IGNITE-10663 > Project: Ignite > Issue Type: Task >Reporter: Anton Vinogradov >Assignee: Anton Vinogradov >Priority: Major > Labels: iep-31 > Fix For: 2.8 > > Time Spent: 18h > Remaining Estimate: 0h > > The main idea is to provide a special "read from cache" mode which will read a > value from the primary and all backups and will check that the values are the same. > In case values differ, they should be fixed according to the appropriate > strategy. > ToDo list: > 1) {{cache.withReadRepair().get(key)}} should guarantee values will be > checked across the topology and fixed if necessary. > 2) The LWW (Last Write Wins) strategy should be used for validation. > 3) Since LWW (or any other strategy) does not guarantee that the correct value > will be chosen, > we have to record an event containing all values and the chosen one. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
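The LWW validation step from the ToDo list can be sketched as picking the replica value with the newest version. The names below are hypothetical, not Ignite internals, and the caveat from the ticket applies: LWW does not guarantee the correct value is chosen, which is why all candidates plus the chosen one should be recorded in an event.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical LWW (Last Write Wins) pick across the values read from the
// primary and all backups: the value carrying the greatest version wins.
public class LastWriteWins {
    public record Replica(long version, String value) {}

    public static String choose(List<Replica> replicas) {
        return replicas.stream()
            .max(Comparator.comparingLong(Replica::version))
            .map(Replica::value)
            .orElseThrow();
    }
}
```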