[jira] [Commented] (IGNITE-7338) can get value by entry.getValue but cann't get value by cache.get(entry.getKey)
[ https://issues.apache.org/jira/browse/IGNITE-7338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16306078#comment-16306078 ]

Andrey Kuznetsov commented on IGNITE-7338:
--

At the very least cache key class (CcbCdrChargeRuleKey) is needed to understand the issue. Cache and cluster configuration can also be useful.

> can get value by entry.getValue but cann't get value by cache.get(entry.getKey)
> ---
>
>                 Key: IGNITE-7338
>                 URL: https://issues.apache.org/jira/browse/IGNITE-7338
>             Project: Ignite
>          Issue Type: Bug
>          Components: cache
>    Affects Versions: 1.9
>            Reporter: dean
>            Priority: Critical
>
> bossapp@Linux-Power-NB-AltiDB2-XT:/home/bossapp6$java -version
> java version "1.7.0"
> Java(TM) SE Runtime Environment (build pxp6470_27sr4fp15-20171116_01(SR4 FP15))
> IBM J9 VM (build 2.7, JRE 1.7.0 Linux ppc64-64 Compressed References 20171011_366929 (JIT enabled, AOT enabled)
> J9VM - R27_Java727_SR4_20171011_1720_B366929
> JIT - tr.r13.java_20171011_366929
> GC - R27_Java727_SR4_20171011_1720_B366929_CMPRSS
> J9CL - 20171011_366929)
> JCL - 20171109_01 based on Oracle jdk7u161-b13
> bossapp@Linux-Power-NB-AltiDB2-XT:/home/bossapp6$lsb_release -a
> LSB Version: :core-4.0-noarch:core-4.0-ppc64:graphics-4.0-noarch:graphics-4.0-ppc64:printing-4.0-noarch:printing-4.0-ppc64
> Distributor ID: n/a
> Description: redhat-4
> Release: n/a
> Codename: n/a
> {color:#d04437}test code:{color}
> if(cacheName.equalsIgnoreCase("tariff-ccb_cdr_charge_rule")){
>     CcbCdrChargeRuleKey ccbCdrChargeRuleKey = new CcbCdrChargeRuleKey();
>     ccbCdrChargeRuleKey.setFileType("573");
>     ccbCdrChargeRuleKey.setSourceType("5");
>     Object cacheDate = ignite.cache(cacheName).get(ccbCdrChargeRuleKey);
>     logger.debug(LogProperty.LOGTYPE_DETAIL, ccbCdrChargeRuleKey+"Object eKey:" + cacheDate );
>     ccbCdrChargeRuleKey.setFileType("1233");
>     ccbCdrChargeRuleKey.setSourceType("14");
>     Object cacheDate1 = ignite.cache(cacheName).get(ccbCdrChargeRuleKey);
>     logger.debug(LogProperty.LOGTYPE_DETAIL, "Object1:" + cacheDate1 +"ccbCdrChargeRuleKey"+ccbCdrChargeRuleKey.hashCode());
>     IgniteCache cacheDate3 = ignite.cache(cacheName);
>     for (Cache.Entry e : cacheDate3) {
>         List cacheModelList = (List) cacheDate3.get(e.getKey());
>         {color:#d04437}logger.debug(LogProperty.LOGTYPE_DETAIL,"e.getKey():"+e.getKey() +" cacheModelList:" + cacheModelList );
>         logger.debug(LogProperty.LOGTYPE_DETAIL,"e.getValue():"+e.getValue() );{color}
>     }
> }
> results:
> 2017-12-2819:17:36,322||DEBUG||frame_thread_nodestart| com.newland.boss.cloud.commons.igniteclient.PlatformInitIgniteClient.test(PlatformInitIgniteClient.java:337)| {color:#d04437}e.getKey():573,5 cacheModelList:null{color}
> 2017-12-2819:17:36,323||DEBUG||frame_thread_nodestart| com.newland.boss.cloud.commons.igniteclient.PlatformInitIgniteClient.test(PlatformInitIgniteClient.java:338)| {color:#d04437}e.getValue():[CcbCdrChargeRule{ccbCdrChargeRuleKey=573,5, bizDomainCode='3', conditionGroupId=, fileType='573', preProcessUnitClass='PreProcessGprs', priority=1, rateItemTypes='6', ratingClass='RatingGprs', ruleDesc='国际出访GPRS 专网', sourceType='5', userTariffClass='GetUserTariffInfoGprs', version='0.0.1'}]{color}

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
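The comment above asks for the key class because the reported symptom (an entry is visible during iteration, yet a lookup by that entry's own key returns null) is typical of a key whose hash is not stable. The sketch below reproduces the same symptom with a plain `java.util.HashMap`, not with Ignite. The `RuleKey` class is a hypothetical stand-in for `CcbCdrChargeRuleKey`, whose real source is not shown in the ticket, so this is only one possible explanation and not a diagnosis.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class UnstableKeyHashSketch {
    /** Hypothetical mutable key; NOT the reporter's CcbCdrChargeRuleKey. */
    static final class RuleKey {
        String fileType;
        String sourceType;

        RuleKey(String fileType, String sourceType) {
            this.fileType = fileType;
            this.sourceType = sourceType;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof RuleKey)) return false;
            RuleKey k = (RuleKey) o;
            return Objects.equals(fileType, k.fileType) && Objects.equals(sourceType, k.sourceType);
        }

        @Override public int hashCode() {
            return Objects.hash(fileType, sourceType);
        }
    }

    /** Mutates a key after insertion, then compares iteration with lookup. */
    public static String demo() {
        Map<RuleKey, String> map = new HashMap<>();
        RuleKey key = new RuleKey("573", "5");
        map.put(key, "rule");

        // The hash stored in the map was computed from ("573", "5");
        // after this mutation the key hashes differently.
        key.fileType = "1233";

        Map.Entry<RuleKey, String> e = map.entrySet().iterator().next();
        String viaEntry = e.getValue();       // iteration still sees the value
        String viaGet = map.get(e.getKey());  // lookup uses the new hash and misses

        return "entry=" + viaEntry + " get=" + viaGet;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "entry=rule get=null"
    }
}
```

In a distributed cache the equivalent instability could also come from a `hashCode()` that depends on state that is not serialized, which is why the key class and the cache configuration are both worth inspecting.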
[jira] [Updated] (IGNITE-5553) Ignite PDS 2: IgnitePersistentStoreDataStructuresTest testSet assertion error
[ https://issues.apache.org/jira/browse/IGNITE-5553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrey Kuznetsov updated IGNITE-5553:
-
    Priority: Major (was: Critical)

> Ignite PDS 2: IgnitePersistentStoreDataStructuresTest testSet assertion error
> -
>
>                 Key: IGNITE-5553
>                 URL: https://issues.apache.org/jira/browse/IGNITE-5553
>             Project: Ignite
>          Issue Type: Bug
>          Components: data structures, persistence
>    Affects Versions: 2.1
>            Reporter: Dmitriy Pavlov
>            Assignee: Andrey Kuznetsov
>              Labels: MakeTeamcityGreenAgain, Muted_test, test-fail
>
> h2. Notes-4435
> When IgniteSet is restored from persistence, the size of the set is always 0, [link to test history|http://ci.ignite.apache.org/project.html?projectId=Ignite20Tests&testNameId=-7043871603266099589&tab=testDetails].
> h2. Detailed description
> Unlike *IgniteQueue*, which uses a separate cache key to store its size, *IgniteSet* stores it in a field of some class.
> The test from the link above shows very clearly that after restoring memory state from PDS all set values are restored correctly but the size is lost.
> h2. Proposed solution
> One possible solution might be to do the same thing as *IgniteQueue* does: the size of *IgniteSet* must be stored in the cache instead of volatile in-memory fields of random classes.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
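The proposed solution can be illustrated with a minimal sketch in which the size is kept under a reserved key of the same backing store as the items, so restoring the store also restores the size. Everything here is an assumption made for illustration: a `ConcurrentHashMap` stands in for the persistent cache, and `PersistedSizeSetSketch`, `SIZE_KEY` and the `item:` prefix are hypothetical names, not Ignite internals.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PersistedSizeSetSketch {
    /** Reserved key holding the element count (hypothetical name). */
    static final String SIZE_KEY = "__size__";

    /** Stand-in for a persistent cache: items and the size share one store. */
    private final Map<String, Integer> store;

    public PersistedSizeSetSketch(Map<String, Integer> store) {
        this.store = store;
    }

    /** Adds an item and bumps the stored size; returns false for duplicates. */
    public boolean add(String item) {
        if (store.putIfAbsent("item:" + item, 1) != null)
            return false;

        // Note: the two updates are not atomic together; a real implementation
        // would need a transaction or a similar mechanism.
        store.merge(SIZE_KEY, 1, Integer::sum);

        return true;
    }

    /** Reads the size from the store instead of an in-memory field. */
    public int size() {
        return store.getOrDefault(SIZE_KEY, 0);
    }

    public static void main(String[] args) {
        Map<String, Integer> store = new ConcurrentHashMap<>();
        PersistedSizeSetSketch set = new PersistedSizeSetSketch(store);

        set.add("a");
        set.add("b");
        set.add("a"); // duplicate, ignored

        // Simulate a restart: a new set object over a copy of the stored data.
        PersistedSizeSetSketch restored = new PersistedSizeSetSketch(new ConcurrentHashMap<>(store));

        System.out.println(restored.size()); // prints 2
    }
}
```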
[jira] [Assigned] (IGNITE-7386) Get rid of LongAdder8, ConcurrentHashMap8, etc
[ https://issues.apache.org/jira/browse/IGNITE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Kuznetsov reassigned IGNITE-7386: Assignee: Andrey Kuznetsov (was: Aleksey Plekhanov) > Get rid of LongAdder8, ConcurrentHashMap8, etc > -- > > Key: IGNITE-7386 > URL: https://issues.apache.org/jira/browse/IGNITE-7386 > Project: Ignite > Issue Type: Task >Reporter: Anton Vinogradov >Assignee: Andrey Kuznetsov >Priority: Major > Fix For: 2.5 > > > Since we're dropping java7 support there is no need now to use > {{LongAdder8}}, {{ConcurrentHashMap8}}, ... > We should remove all classes from {{org.jsr166}} namespace and use > corresponding classes from jdk8. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
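For reference, the JDK 8 counterparts of the backported classes named in this task live in `java.util.concurrent` and `java.util.concurrent.atomic`. The following is a small sketch of their use; the class and method names (`Jdk8ReplacementsSketch`, `countHits`) are made up for the example and do not come from the Ignite code base.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.LongAdder;

public class Jdk8ReplacementsSketch {
    /** Counts increments from several threads with the standard LongAdder. */
    public static long countHits(int threads, int perThread) {
        LongAdder hits = new LongAdder();
        Thread[] workers = new Thread[threads];

        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++)
                    hits.increment();
            });
            workers[i].start();
        }

        try {
            for (Thread t : workers)
                t.join();
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for workers", e);
        }

        return hits.sum();
    }

    public static void main(String[] args) {
        System.out.println(countHits(4, 1000)); // prints 4000

        // Standard ConcurrentHashMap with per-key counters.
        ConcurrentHashMap<String, LongAdder> perKey = new ConcurrentHashMap<>();
        perKey.computeIfAbsent("k", k -> new LongAdder()).add(5);
        System.out.println(perKey.get("k").sum()); // prints 5

        // Standard ThreadLocalRandom: a value in [0, 10).
        int r = ThreadLocalRandom.current().nextInt(10);
        System.out.println(r >= 0 && r < 10); // prints true
    }
}
```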
[jira] [Commented] (IGNITE-6711) DataRegionMetrics#totalAllocatedPages is not valid after node restart
[ https://issues.apache.org/jira/browse/IGNITE-6711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16332540#comment-16332540 ] Andrey Kuznetsov commented on IGNITE-6711: -- [~avinogradov], it look more elegant now, after you removed redundant mapping from data regions to their allocation counters. > DataRegionMetrics#totalAllocatedPages is not valid after node restart > - > > Key: IGNITE-6711 > URL: https://issues.apache.org/jira/browse/IGNITE-6711 > Project: Ignite > Issue Type: Bug > Components: general >Affects Versions: 2.2 >Reporter: Alexey Goncharuk >Assignee: Andrey Kuznetsov >Priority: Major > Labels: iep-6, newbie > Fix For: 2.4 > > > Currently, data region metric tracks total allocated pages by a callback on > page allocation. However, when a node with enabled persistence is started, > some of the pages are already allocated, which leads to an incorrect metric > value. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
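The problem described in the ticket can be sketched as a counter that is only ever incremented by an allocation callback and therefore starts from zero after a restart. One way to picture a fix is to seed the counter with the number of pages already present in the store. The class below is a hypothetical illustration under that assumption and does not reproduce the actual patch.

```java
import java.util.concurrent.atomic.LongAdder;

public class AllocatedPagesSketch {
    private final LongAdder totalAllocatedPages = new LongAdder();

    /**
     * @param pagesAlreadyAllocated pages found in the persistent store at start-up
     *     (hypothetical parameter; 0 for a node without persisted data).
     */
    public AllocatedPagesSketch(long pagesAlreadyAllocated) {
        totalAllocatedPages.add(pagesAlreadyAllocated);
    }

    /** Callback invoked on every new page allocation. */
    public void onPageAllocated() {
        totalAllocatedPages.increment();
    }

    public long getTotalAllocatedPages() {
        return totalAllocatedPages.sum();
    }

    public static void main(String[] args) {
        // A restarted node that already has 100 pages on disk.
        AllocatedPagesSketch metrics = new AllocatedPagesSketch(100);
        metrics.onPageAllocated();

        System.out.println(metrics.getTotalAllocatedPages()); // prints 101
    }
}
```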
[jira] [Comment Edited] (IGNITE-6711) DataRegionMetrics#totalAllocatedPages is not valid after node restart
[ https://issues.apache.org/jira/browse/IGNITE-6711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16332540#comment-16332540 ] Andrey Kuznetsov edited comment on IGNITE-6711 at 1/19/18 5:19 PM: --- [~avinogradov], it looks more elegant now, after you removed redundant mapping from data regions to their allocation counters. was (Author: andrey-kuznetsov): [~avinogradov], it look more elegant now, after you removed redundant mapping from data regions to their allocation counters. > DataRegionMetrics#totalAllocatedPages is not valid after node restart > - > > Key: IGNITE-6711 > URL: https://issues.apache.org/jira/browse/IGNITE-6711 > Project: Ignite > Issue Type: Bug > Components: general >Affects Versions: 2.2 >Reporter: Alexey Goncharuk >Assignee: Andrey Kuznetsov >Priority: Major > Labels: iep-6, newbie > Fix For: 2.4 > > > Currently, data region metric tracks total allocated pages by a callback on > page allocation. However, when a node with enabled persistence is started, > some of the pages are already allocated, which leads to an incorrect metric > value. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7491) Documentation: add new data region metrics description
Andrey Kuznetsov created IGNITE-7491:

             Summary: Documentation: add new data region metrics description
                 Key: IGNITE-7491
                 URL: https://issues.apache.org/jira/browse/IGNITE-7491
             Project: Ignite
          Issue Type: Task
          Components: documentation
            Reporter: Andrey Kuznetsov
            Assignee: Denis Magda
             Fix For: 2.4

Newly created data region metrics should be documented.
* `getTotalAllocatedSize` -- same as `getTotalAllocatedPages` but in bytes.
* `getPhysicalMemorySize` -- same as `getPhysicalMemoryPages` but in bytes.
* `getCheckpointBufferPages` -- gets checkpoint buffer size in pages.
* `getCheckpointBufferSize` -- gets checkpoint buffer size in bytes.
* `getPageSize` -- gets memory page size.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
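The relationship between the page-based and byte-based metrics listed above can be sketched as follows. The getter names are taken from the ticket, but the `View` interface, the `fixed` factory and the assumption that a byte size equals the page count multiplied by `getPageSize()` are illustrative only; the real implementation may compute sizes differently.

```java
public class RegionMetricsSketch {
    /** Hypothetical read-only view exposing the getters named in the ticket. */
    public interface View {
        long getTotalAllocatedPages();
        long getPhysicalMemoryPages();
        long getCheckpointBufferPages();
        int getPageSize();

        // Assumption: byte sizes are page counts multiplied by the page size.
        default long getTotalAllocatedSize() { return getTotalAllocatedPages() * getPageSize(); }
        default long getPhysicalMemorySize() { return getPhysicalMemoryPages() * getPageSize(); }
        default long getCheckpointBufferSize() { return getCheckpointBufferPages() * getPageSize(); }
    }

    /** Builds a view over fixed numbers, for demonstration purposes. */
    public static View fixed(long allocated, long physical, long checkpointBuf, int pageSize) {
        return new View() {
            @Override public long getTotalAllocatedPages() { return allocated; }
            @Override public long getPhysicalMemoryPages() { return physical; }
            @Override public long getCheckpointBufferPages() { return checkpointBuf; }
            @Override public int getPageSize() { return pageSize; }
        };
    }

    public static void main(String[] args) {
        View v = fixed(10, 4, 2, 4096);

        System.out.println(v.getTotalAllocatedSize());   // prints 40960
        System.out.println(v.getPhysicalMemorySize());   // prints 16384
        System.out.println(v.getCheckpointBufferSize()); // prints 8192
    }
}
```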
[jira] [Updated] (IGNITE-7386) Get rid of org.jsr166.LongAdder8, org.jsr166.ThreadLocalRandom
[ https://issues.apache.org/jira/browse/IGNITE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Kuznetsov updated IGNITE-7386: - Summary: Get rid of org.jsr166.LongAdder8, org.jsr166.ThreadLocalRandom (was: Get rid of LongAdder8, ConcurrentHashMap8, etc) > Get rid of org.jsr166.LongAdder8, org.jsr166.ThreadLocalRandom > -- > > Key: IGNITE-7386 > URL: https://issues.apache.org/jira/browse/IGNITE-7386 > Project: Ignite > Issue Type: Task >Reporter: Anton Vinogradov >Assignee: Andrey Kuznetsov >Priority: Major > Fix For: 2.5 > > > Since we're dropping java7 support there is no need now to use > {{LongAdder8}}, {{ConcurrentHashMap8}}, ... > We should remove all classes from {{org.jsr166}} namespace and use > corresponding classes from jdk8. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-7386) Get rid of org.jsr166.LongAdder8, org.jsr166.ThreadLocalRandom
[ https://issues.apache.org/jira/browse/IGNITE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Kuznetsov updated IGNITE-7386: - Description: Since we're switching to java8 there is no need now to use these classes anymore. (was: Since we're dropping java7 support there is no need now to use {{LongAdder8}}, {{ConcurrentHashMap8}}, ... We should remove all classes from {{org.jsr166}} namespace and use corresponding classes from jdk8.) > Get rid of org.jsr166.LongAdder8, org.jsr166.ThreadLocalRandom > -- > > Key: IGNITE-7386 > URL: https://issues.apache.org/jira/browse/IGNITE-7386 > Project: Ignite > Issue Type: Task >Reporter: Anton Vinogradov >Assignee: Andrey Kuznetsov >Priority: Major > Fix For: 2.5 > > > Since we're switching to java8 there is no need now to use these classes > anymore. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7513) Get rid of org.jsr166.ConcurrentHashMap8
Andrey Kuznetsov created IGNITE-7513: Summary: Get rid of org.jsr166.ConcurrentHashMap8 Key: IGNITE-7513 URL: https://issues.apache.org/jira/browse/IGNITE-7513 Project: Ignite Issue Type: Task Reporter: Andrey Kuznetsov Assignee: Andrey Kuznetsov Fix For: 2.5 This class was made of ConcurrentHashMapV8, an intermadiate implementation of Java8's ConcurrentHashMap. Now we should switch to standard CHM. Possibly, we'll have to check for performance implications. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-7513) Get rid of org.jsr166.ConcurrentHashMap8
[ https://issues.apache.org/jira/browse/IGNITE-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Kuznetsov updated IGNITE-7513: - Description: This class was made of ConcurrentHashMapV8, an intermediate implementation of Java8's ConcurrentHashMap. Now we should switch to standard CHM. Possibly, we'll have to check for performance implications. (was: This class was made of ConcurrentHashMapV8, an intermadiate implementation of Java8's ConcurrentHashMap. Now we should switch to standard CHM. Possibly, we'll have to check for performance implications.) > Get rid of org.jsr166.ConcurrentHashMap8 > > > Key: IGNITE-7513 > URL: https://issues.apache.org/jira/browse/IGNITE-7513 > Project: Ignite > Issue Type: Task >Reporter: Andrey Kuznetsov >Assignee: Andrey Kuznetsov >Priority: Major > Fix For: 2.5 > > > This class was made of ConcurrentHashMapV8, an intermediate implementation of > Java8's ConcurrentHashMap. Now we should switch to standard CHM. Possibly, > we'll have to check for performance implications. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-7513) Get rid of org.jsr166.ConcurrentHashMap8
[ https://issues.apache.org/jira/browse/IGNITE-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Kuznetsov updated IGNITE-7513: - Issue Type: Sub-task (was: Task) Parent: IGNITE-7386 > Get rid of org.jsr166.ConcurrentHashMap8 > > > Key: IGNITE-7513 > URL: https://issues.apache.org/jira/browse/IGNITE-7513 > Project: Ignite > Issue Type: Sub-task >Reporter: Andrey Kuznetsov >Assignee: Andrey Kuznetsov >Priority: Major > Fix For: 2.5 > > > This class was made of ConcurrentHashMapV8, an intermediate implementation of > Java8's ConcurrentHashMap. Now we should switch to standard CHM. Possibly, > we'll have to check for performance implications. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-7386) Get rid of LongAdder8, ConcurrentHashMap8, etc
[ https://issues.apache.org/jira/browse/IGNITE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Kuznetsov updated IGNITE-7386: - Summary: Get rid of LongAdder8, ConcurrentHashMap8, etc (was: Get rid of org.jsr166.LongAdder8, org.jsr166.ThreadLocalRandom) > Get rid of LongAdder8, ConcurrentHashMap8, etc > -- > > Key: IGNITE-7386 > URL: https://issues.apache.org/jira/browse/IGNITE-7386 > Project: Ignite > Issue Type: Task >Reporter: Anton Vinogradov >Assignee: Andrey Kuznetsov >Priority: Major > Fix For: 2.5 > > > Since we're switching to java8 there is no need now to use these classes > anymore. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-7386) Get rid of LongAdder8, ConcurrentHashMap8, etc
[ https://issues.apache.org/jira/browse/IGNITE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Kuznetsov updated IGNITE-7386: - Description: |Since we're dropping java7 support there is no need now to use \{{LongAdder8}}, \{{ConcurrentHashMap8}}, ... We should remove all classes from \{{org.jsr166}} namespace and use corresponding classes from jdk8.| was:Since we're switching to java8 there is no need now to use these classes anymore. > Get rid of LongAdder8, ConcurrentHashMap8, etc > -- > > Key: IGNITE-7386 > URL: https://issues.apache.org/jira/browse/IGNITE-7386 > Project: Ignite > Issue Type: Task >Reporter: Anton Vinogradov >Assignee: Andrey Kuznetsov >Priority: Major > Fix For: 2.5 > > > |Since we're dropping java7 support there is no need now to use > \{{LongAdder8}}, \{{ConcurrentHashMap8}}, ... > > We should remove all classes from \{{org.jsr166}} namespace and use > corresponding classes from jdk8.| -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7516) Get rid of org.jsr166.ConcurrentLinkedHashMap
Andrey Kuznetsov created IGNITE-7516: Summary: Get rid of org.jsr166.ConcurrentLinkedHashMap Key: IGNITE-7516 URL: https://issues.apache.org/jira/browse/IGNITE-7516 Project: Ignite Issue Type: Sub-task Reporter: Andrey Kuznetsov Assignee: Andrey Kuznetsov Fix For: 2.5 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7517) Get rid of org.jsr166.ConcurrentLinkedDeque8
Andrey Kuznetsov created IGNITE-7517: Summary: Get rid of org.jsr166.ConcurrentLinkedDeque8 Key: IGNITE-7517 URL: https://issues.apache.org/jira/browse/IGNITE-7517 Project: Ignite Issue Type: Sub-task Reporter: Andrey Kuznetsov Assignee: Andrey Kuznetsov Fix For: 2.5 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7518) Get rid of org.jsr166.LongAdder8, org.jsr166.ThreadLocalRandom
Andrey Kuznetsov created IGNITE-7518: Summary: Get rid of org.jsr166.LongAdder8, org.jsr166.ThreadLocalRandom Key: IGNITE-7518 URL: https://issues.apache.org/jira/browse/IGNITE-7518 Project: Ignite Issue Type: Sub-task Reporter: Andrey Kuznetsov Assignee: Andrey Kuznetsov Fix For: 2.5 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-7518) Get rid of org.jsr166.LongAdder8, org.jsr166.ThreadLocalRandom
[ https://issues.apache.org/jira/browse/IGNITE-7518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16342980#comment-16342980 ] Andrey Kuznetsov commented on IGNITE-7518: -- [~avinogradov], this change is ready for review, could you please take a look? > Get rid of org.jsr166.LongAdder8, org.jsr166.ThreadLocalRandom > -- > > Key: IGNITE-7518 > URL: https://issues.apache.org/jira/browse/IGNITE-7518 > Project: Ignite > Issue Type: Sub-task >Reporter: Andrey Kuznetsov >Assignee: Andrey Kuznetsov >Priority: Major > Fix For: 2.5 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-7685) Incorrect AllocationRate counting
[ https://issues.apache.org/jira/browse/IGNITE-7685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrey Kuznetsov updated IGNITE-7685:
-
    Description:
Each call of {{org.apache.ignite.internal.processors.cache.persistence.DataRegionMetricsImpl#updateTotalAllocatedPages}} performs an {{allocRate.onHit()}} call, which is not correct since delta can be negative or greater than 1.
Need to fix allocationRate counting. The fix should affect only "proper" allocations, as opposed to allocations made during recovery, storage initialization, etc.

  was:
Each call of {{org.apache.ignite.internal.processors.cache.persistence.DataRegionMetricsImpl#updateTotalAllocatedPages}} performs an {{allocRate.onHit()}} call, which is not correct since delta can be negative or greater than 1.
Need to fix allocationRate counting

> Incorrect AllocationRate counting
> -
>
>                 Key: IGNITE-7685
>                 URL: https://issues.apache.org/jira/browse/IGNITE-7685
>             Project: Ignite
>          Issue Type: Task
>    Affects Versions: 2.4
>            Reporter: Anton Vinogradov
>            Assignee: Andrey Kuznetsov
>            Priority: Major
>             Fix For: 2.5
>
> Each call of {{org.apache.ignite.internal.processors.cache.persistence.DataRegionMetricsImpl#updateTotalAllocatedPages}} performs an {{allocRate.onHit()}} call, which is not correct since delta can be negative or greater than 1.
> Need to fix allocationRate counting. The fix should affect only "proper" allocations, as opposed to allocations made during recovery, storage initialization, etc.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
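The counting issue can be sketched as follows: the total must follow every delta, while the rate counter should grow by the delta itself (not by one per call) and only for runtime allocations. This is a hypothetical illustration, not the actual `DataRegionMetricsImpl` code. The `runtimeAllocation` flag and the decision to ignore negative deltas for the rate are assumptions made for the example.

```java
import java.util.concurrent.atomic.LongAdder;

public class AllocationRateSketch {
    private final LongAdder totalAllocatedPages = new LongAdder();
    private final LongAdder allocatedSinceLastSample = new LongAdder();

    /**
     * @param delta change in the number of allocated pages; may be negative or greater than 1.
     * @param runtimeAllocation false for recovery or storage initialization (hypothetical flag).
     */
    public void updateTotalAllocatedPages(long delta, boolean runtimeAllocation) {
        totalAllocatedPages.add(delta);

        // Count the whole positive delta rather than a single hit per call.
        if (runtimeAllocation && delta > 0)
            allocatedSinceLastSample.add(delta);
    }

    public long getTotalAllocatedPages() {
        return totalAllocatedPages.sum();
    }

    /** Returns pages allocated since the previous sample and resets the counter. */
    public long sampleAllocations() {
        return allocatedSinceLastSample.sumThenReset();
    }

    public static void main(String[] args) {
        AllocationRateSketch m = new AllocationRateSketch();

        m.updateTotalAllocatedPages(100, false); // storage initialization
        m.updateTotalAllocatedPages(8, true);    // runtime allocation of 8 pages
        m.updateTotalAllocatedPages(-3, true);   // pages released

        System.out.println(m.getTotalAllocatedPages()); // prints 105
        System.out.println(m.sampleAllocations());      // prints 8
    }
}
```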
[jira] [Created] (IGNITE-7823) Enrich IgniteCache with asSet method
Andrey Kuznetsov created IGNITE-7823:

             Summary: Enrich IgniteCache with asSet method
                 Key: IGNITE-7823
                 URL: https://issues.apache.org/jira/browse/IGNITE-7823
             Project: Ignite
          Issue Type: New Feature
          Components: data structures
            Reporter: Andrey Kuznetsov
             Fix For: 2.5

The existing {{IgniteSet}} data structure is good enough for small sets. For big sets it is too expensive to maintain redundant on-heap data copies. Thus we had better add a new {{IgniteCache::asSet}} method returning a set adapter over an existing cache. The difference between these two kinds of sets should be properly documented afterwards.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
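The idea of a set adapter that keeps no data copy of its own can be sketched with plain JDK collections. Here a `ConcurrentHashMap` stands in for the cache, and `MapBackedSetSketch` is a hypothetical name; the proposed `IgniteCache::asSet` does not exist at the time of this ticket, so nothing below reflects its eventual signature. The JDK offers a similar ready-made adapter in `Collections.newSetFromMap`.

```java
import java.util.AbstractSet;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Set view over a map: every operation is delegated, no elements are copied. */
public class MapBackedSetSketch<E> extends AbstractSet<E> {
    private final Map<E, Boolean> backing;

    public MapBackedSetSketch(Map<E, Boolean> backing) {
        this.backing = backing;
    }

    @Override public boolean add(E e) {
        return backing.putIfAbsent(e, Boolean.TRUE) == null;
    }

    @Override public boolean contains(Object o) {
        return backing.containsKey(o);
    }

    @Override public boolean remove(Object o) {
        return backing.remove(o) != null;
    }

    @Override public Iterator<E> iterator() {
        return backing.keySet().iterator();
    }

    @Override public int size() {
        return backing.size();
    }

    public static void main(String[] args) {
        Map<String, Boolean> cacheStandIn = new ConcurrentHashMap<>();
        MapBackedSetSketch<String> set = new MapBackedSetSketch<>(cacheStandIn);

        System.out.println(set.add("x"));                  // prints true
        System.out.println(set.add("x"));                  // prints false
        System.out.println(cacheStandIn.containsKey("x")); // prints true
        System.out.println(set.size());                    // prints 1
    }
}
```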
[jira] [Commented] (IGNITE-7770) Ignite Cache 6: testRandomMixedTxConfigurations failed probably after jsr166 removal
[ https://issues.apache.org/jira/browse/IGNITE-7770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16387722#comment-16387722 ]

Andrey Kuznetsov commented on IGNITE-7770:
--

There are at least two possible flaky failure scenarios.

1 - Most frequent. NPE: the first {{get()}} in a transaction returns {{null}}; this should be impossible given the test code structure.

2 - Less frequent. Timeout while waiting for {{multithreadedAsync()}} completion. First, some optimistic transaction commits a changed entry with key K and parks forever:

    sun.misc.Unsafe.park(Native Method)
    java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
    org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
    org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
    org.apache.ignite.internal.processors.cache.transactions.TransactionProxyImpl.commit(TransactionProxyImpl.java:293)
    org.apache.ignite.internal.processors.cache.transactions.TxRollbackOnTimeoutTest$4.run(TxRollbackOnTimeoutTest.java:444)
    org.apache.ignite.testframework.GridTestUtils$9.call(GridTestUtils.java:1275)
    org.apache.ignite.testframework.GridTestThread.run(GridTestThread.java:86)

This transaction keeps hanging even after its timeout has occurred.

Next, a bunch of pessimistic transactions also park forever on {{put()}} for the same key K:

    sun.misc.Unsafe.park(Native Method)
    java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
    org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
    org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
    org.apache.ignite.internal.processors.cache.GridCacheAdapter$22.op(GridCacheAdapter.java:2390)
    org.apache.ignite.internal.processors.cache.GridCacheAdapter$22.op(GridCacheAdapter.java:2388)
    org.apache.ignite.internal.processors.cache.GridCacheAdapter.syncOp(GridCacheAdapter.java:4088)
    org.apache.ignite.internal.processors.cache.GridCacheAdapter.put0(GridCacheAdapter.java:2388)
    org.apache.ignite.internal.processors.cache.GridCacheAdapter.put(GridCacheAdapter.java:2369)
    org.apache.ignite.internal.processors.cache.GridCacheAdapter.put(GridCacheAdapter.java:2346)
    org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.put(IgniteCacheProxyImpl.java:1084)
    org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.put(GatewayProtectedCacheProxy.java:886)
    org.apache.ignite.internal.processors.cache.transactions.TxRollbackOnTimeoutTest$4.run(TxRollbackOnTimeoutTest.java:442)
    org.apache.ignite.testframework.GridTestUtils$9.call(GridTestUtils.java:1275)
    org.apache.ignite.testframework.GridTestThread.run(GridTestThread.java:86)

There are also a number of threads operating normally.

> Ignite Cache 6: testRandomMixedTxConfigurations failed probably after jsr166 removal
> -
>
>                 Key: IGNITE-7770
>                 URL: https://issues.apache.org/jira/browse/IGNITE-7770
>             Project: Ignite
>          Issue Type: Task
>            Reporter: Dmitriy Pavlov
>            Assignee: Andrey Kuznetsov
>            Priority: Major
>              Labels: MakeTeamcityGreenAgain, Muted_test
>             Fix For: 2.5
>
> Ignite Cache 6: testRandomMixedTxConfigurations failed probably after jsr166 removal
> After applying IGNITE-7518 Get rid of org.jsr166.LongAdder8,
> IgniteCacheTestSuite6: TxRollbackOnTimeoutNearCacheTest.testRandomMixedTxConfigurations (fail rate 38,6%)
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=-3733584033131292028&branch=%3Cdefault%3E&tab=testDetails

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7983) NPE in TxRollbackOnTimeoutNearCacheTest.testRandomMixedTxConfigurations
Andrey Kuznetsov created IGNITE-7983: Summary: NPE in TxRollbackOnTimeoutNearCacheTest.testRandomMixedTxConfigurations Key: IGNITE-7983 URL: https://issues.apache.org/jira/browse/IGNITE-7983 Project: Ignite Issue Type: Task Affects Versions: 2.4 Reporter: Andrey Kuznetsov Assignee: Andrey Kuznetsov Fix For: 2.5 {{get}} inside transaction sometimes returns {{null}}. This should be impossible. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-7770) Ignite Cache 6: testRandomMixedTxConfigurations failed probably after jsr166 removal
[ https://issues.apache.org/jira/browse/IGNITE-7770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16404659#comment-16404659 ] Andrey Kuznetsov commented on IGNITE-7770: -- Emerged another issue [1] for NPE scenario. [1] https://issues.apache.org/jira/browse/IGNITE-7983 > Ignite Cache 6: testRandomMixedTxConfigurations failed probably after jsr166 > removal > > > Key: IGNITE-7770 > URL: https://issues.apache.org/jira/browse/IGNITE-7770 > Project: Ignite > Issue Type: Task >Reporter: Dmitriy Pavlov >Assignee: Andrey Kuznetsov >Priority: Major > Labels: MakeTeamcityGreenAgain, Muted_test > Fix For: 2.5 > > > Ignite Cache 6: testRandomMixedTxConfigurations failed probably after jsr166 > removal > After appying IGNITE-7518 Get rid of org.jsr166.LongAdder8, > IgniteCacheTestSuite6: > TxRollbackOnTimeoutNearCacheTest.testRandomMixedTxConfigurations (fail rate > 38,6%) > https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=-3733584033131292028&branch=%3Cdefault%3E&tab=testDetails -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-7770) Ignite Cache 6: testRandomMixedTxConfigurations failed probably after jsr166 removal
[ https://issues.apache.org/jira/browse/IGNITE-7770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16406118#comment-16406118 ] Andrey Kuznetsov commented on IGNITE-7770: -- [~agura], I've prepared a fix for "infinite park" scenario, could you please take a look? NPE scenario is suppressed in the PR. > Ignite Cache 6: testRandomMixedTxConfigurations failed probably after jsr166 > removal > > > Key: IGNITE-7770 > URL: https://issues.apache.org/jira/browse/IGNITE-7770 > Project: Ignite > Issue Type: Task >Reporter: Dmitriy Pavlov >Assignee: Andrey Kuznetsov >Priority: Major > Labels: MakeTeamcityGreenAgain, Muted_test > Fix For: 2.5 > > > Ignite Cache 6: testRandomMixedTxConfigurations failed probably after jsr166 > removal > After appying IGNITE-7518 Get rid of org.jsr166.LongAdder8, > IgniteCacheTestSuite6: > TxRollbackOnTimeoutNearCacheTest.testRandomMixedTxConfigurations (fail rate > 38,6%) > https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=-3733584033131292028&branch=%3Cdefault%3E&tab=testDetails -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-5553) Ignite PDS 2: IgnitePersistentStoreDataStructuresTest testSet assertion error
[ https://issues.apache.org/jira/browse/IGNITE-5553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16406427#comment-16406427 ]

Andrey Kuznetsov commented on IGNITE-5553:
--

[~dpavlov], we discussed the PR with [~xtern] and replaced the two init flags with a single state field (that is, a single means of synchronization). Now I like the changes.

> Ignite PDS 2: IgnitePersistentStoreDataStructuresTest testSet assertion error
> -
>
>                 Key: IGNITE-5553
>                 URL: https://issues.apache.org/jira/browse/IGNITE-5553
>             Project: Ignite
>          Issue Type: Bug
>          Components: data structures, persistence
>    Affects Versions: 2.1
>            Reporter: Dmitriy Pavlov
>            Assignee: Pavel Pereslegin
>            Priority: Major
>              Labels: MakeTeamcityGreenAgain, Muted_test, test-fail
>             Fix For: 2.5
>
> h2. Notes-4435
> When IgniteSet is restored from persistence, the size of the set is always 0, [link to test history|http://ci.ignite.apache.org/project.html?projectId=Ignite20Tests&testNameId=-7043871603266099589&tab=testDetails].
> h2. Detailed description
> Unlike *IgniteQueue*, which uses a separate cache key to store its size, *IgniteSet* stores it in a field of some class.
> The test from the link above shows very clearly that after restoring memory state from PDS all set values are restored correctly but the size is lost.
> h2. Proposed solution
> One possible solution might be to do the same thing as *IgniteQueue* does: the size of *IgniteSet* must be stored in the cache instead of volatile in-memory fields of random classes.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8025) Result of GridTestUtils.runMultiThreadedAsync has a bug in cancel() implementation
Andrey Kuznetsov created IGNITE-8025: Summary: Result of GridTestUtils.runMultiThreadedAsync has a bug in cancel() implementation Key: IGNITE-8025 URL: https://issues.apache.org/jira/browse/IGNITE-8025 Project: Ignite Issue Type: Bug Affects Versions: 2.4 Reporter: Andrey Kuznetsov Fix For: 2.5 Attachments: BugRunMTAsyncTest.java GridTestUtils.runMultiThreadedAsync returns a future with cancel() support, but cancellation implementation never interrupts threads that execute user-provided tasks. That is, those threads can continue their execution even after test method finishes. The reproducer attached demonstrates activity from threads created by test0 after test0 finished and test1 is being run. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
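For comparison, the behavior the ticket expects from cancellation (worker threads are interrupted and stop running user code) is what `Future.cancel(true)` provides in `java.util.concurrent`. The sketch below uses only JDK classes and is not the `GridTestUtils` implementation; `InterruptingCancelSketch` and `cancelStopsWorker` are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class InterruptingCancelSketch {
    /** Returns true if cancelling the future interrupted the worker thread. */
    public static boolean cancelStopsWorker() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch interrupted = new CountDownLatch(1);

        // User-provided task: blocks for a long time unless interrupted.
        Future<?> fut = pool.submit(() -> {
            started.countDown();

            try {
                Thread.sleep(60_000);
            }
            catch (InterruptedException e) {
                interrupted.countDown();
            }
        });

        try {
            started.await();

            // mayInterruptIfRunning = true: the running thread gets interrupted.
            fut.cancel(true);

            return interrupted.await(5, TimeUnit.SECONDS);
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();

            return false;
        }
        finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(cancelStopsWorker()); // prints true
    }
}
```

Without the interrupt, the worker in this sketch would keep sleeping after the caller returned, which mirrors the leftover thread activity that the attached reproducer reports.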
[jira] [Assigned] (IGNITE-8025) Result of GridTestUtils.runMultiThreadedAsync has a bug in cancel() implementation
[ https://issues.apache.org/jira/browse/IGNITE-8025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Kuznetsov reassigned IGNITE-8025: Assignee: Andrey Kuznetsov > Result of GridTestUtils.runMultiThreadedAsync has a bug in cancel() > implementation > -- > > Key: IGNITE-8025 > URL: https://issues.apache.org/jira/browse/IGNITE-8025 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.4 >Reporter: Andrey Kuznetsov >Assignee: Andrey Kuznetsov >Priority: Major > Labels: MakeTeamcityGreenAgain, test > Fix For: 2.5 > > Attachments: BugRunMTAsyncTest.java > > > GridTestUtils.runMultiThreadedAsync returns a future with cancel() support, > but cancellation implementation never interrupts threads that execute > user-provided tasks. That is, those threads can continue their execution even > after test method finishes. > The reproducer attached demonstrates activity from threads created by test0 > after test0 finished and test1 is being run. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-7770) Ignite Cache 6: testRandomMixedTxConfigurations failed probably after jsr166 removal
[ https://issues.apache.org/jira/browse/IGNITE-7770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrey Kuznetsov updated IGNITE-7770:
-
    Description:
TxRollbackOnTimeoutNearCacheTest.testRandomMixedTxConfigurations fails flakily due to a timeout of the resulting future. It is caused by poorly reproducible infinite parks in optimistic TX commit:

  was:
Ignite Cache 6: testRandomMixedTxConfigurations failed probably after jsr166 removal
After applying IGNITE-7518 Get rid of org.jsr166.LongAdder8,
IgniteCacheTestSuite6: TxRollbackOnTimeoutNearCacheTest.testRandomMixedTxConfigurations (fail rate 38,6%)
https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=-3733584033131292028&branch=%3Cdefault%3E&tab=testDetails

> Ignite Cache 6: testRandomMixedTxConfigurations failed probably after jsr166 removal
> -
>
>                 Key: IGNITE-7770
>                 URL: https://issues.apache.org/jira/browse/IGNITE-7770
>             Project: Ignite
>          Issue Type: Task
>            Reporter: Dmitriy Pavlov
>            Assignee: Andrey Kuznetsov
>            Priority: Major
>              Labels: MakeTeamcityGreenAgain, Muted_test
>             Fix For: 2.5
>
> TxRollbackOnTimeoutNearCacheTest.testRandomMixedTxConfigurations fails flakily due to a timeout of the resulting future. It is caused by poorly reproducible infinite parks in optimistic TX commit:

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-7770) Ignite Cache 6: testRandomMixedTxConfigurations failed probably after jsr166 removal
[ https://issues.apache.org/jira/browse/IGNITE-7770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Kuznetsov updated IGNITE-7770: - Description: TxRollbackOnTimeoutNearCacheTest.testRandomMixedTxConfigurations fails flakily due to resulting future timeout. It's caused by poor reproducible infinite park's in optimistic TX commit: sun.misc.Unsafe.park(Native Method) java.util.concurrent.locks.LockSupport.park(LockSupport.java:304) org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177) org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140) org.apache.ignite.internal.processors.cache.transactions.TransactionProxyImpl.commit(TransactionProxyImpl.java:293) org.apache.ignite.internal.processors.cache.transactions.TxRollbackOnTimeoutTest$4.run(TxRollbackOnTimeoutTest.java:444) org.apache.ignite.testframework.GridTestUtils$9.call(GridTestUtils.java:1275) org.apache.ignite.testframework.GridTestThread.run(GridTestThread.java:86) There is also another failure for this test, it's described in separate ticket attached to this one. was: TxRollbackOnTimeoutNearCacheTest.testRandomMixedTxConfigurations fails flakily due to resulting future timeout. It's caused by poor reproducible infinite park's in optimistic TX commit: > Ignite Cache 6: testRandomMixedTxConfigurations failed probably after jsr166 > removal > > > Key: IGNITE-7770 > URL: https://issues.apache.org/jira/browse/IGNITE-7770 > Project: Ignite > Issue Type: Task >Reporter: Dmitriy Pavlov >Assignee: Andrey Kuznetsov >Priority: Major > Labels: MakeTeamcityGreenAgain, Muted_test > Fix For: 2.5 > > > TxRollbackOnTimeoutNearCacheTest.testRandomMixedTxConfigurations fails > flakily due to resulting future timeout. 
It's caused by hard-to-reproduce > infinite parks in optimistic TX commit: > sun.misc.Unsafe.park(Native Method) > java.util.concurrent.locks.LockSupport.park(LockSupport.java:304) > org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177) > org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140) > org.apache.ignite.internal.processors.cache.transactions.TransactionProxyImpl.commit(TransactionProxyImpl.java:293) > org.apache.ignite.internal.processors.cache.transactions.TxRollbackOnTimeoutTest$4.run(TxRollbackOnTimeoutTest.java:444) > org.apache.ignite.testframework.GridTestUtils$9.call(GridTestUtils.java:1275) > org.apache.ignite.testframework.GridTestThread.run(GridTestThread.java:86) > There is also another failure for this test; it's described in a separate > ticket attached to this one.
[jira] [Resolved] (IGNITE-7516) Get rid of org.jsr166.ConcurrentLinkedHashMap
[ https://issues.apache.org/jira/browse/IGNITE-7516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Kuznetsov resolved IGNITE-7516. -- Resolution: Won't Fix ConcurrentLinkedHashMap has no direct equivalent in the modern Java standard library. It's a customized class from older Java versions. We can't replace it with a standard class for performance reasons. This work will continue in smaller issues, each related to a particular class that uses CLHM. > Get rid of org.jsr166.ConcurrentLinkedHashMap > - > > Key: IGNITE-7516 > URL: https://issues.apache.org/jira/browse/IGNITE-7516 > Project: Ignite > Issue Type: Sub-task >Reporter: Andrey Kuznetsov >Assignee: Andrey Kuznetsov >Priority: Major > Fix For: 2.5 > >
[jira] [Created] (IGNITE-8061) GridCachePartitionedDataStructuresFailoverSelfTest.testCountDownLatchConstantMultipleTopologyChange may hang on TeamCity
Andrey Kuznetsov created IGNITE-8061: Summary: GridCachePartitionedDataStructuresFailoverSelfTest.testCountDownLatchConstantMultipleTopologyChange may hang on TeamCity Key: IGNITE-8061 URL: https://issues.apache.org/jira/browse/IGNITE-8061 Project: Ignite Issue Type: Bug Components: data structures Affects Versions: 2.4 Reporter: Andrey Kuznetsov Fix For: 2.5 Attachments: log.txt The log attached contains 'Test has been timed out and will be interrupted' message, but does not contain subsequent 'Test has been timed out [test=...'. Known facts: * There is pending GridDhtColocatedLockFuture in the log. * On timeout, InterruptedException comes to doTestCountDownLatch, but finally-block contains the code leading to distributed locking. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-8025) Result of GridTestUtils.runMultiThreadedAsync has a bug in cancel() implementation
[ https://issues.apache.org/jira/browse/IGNITE-8025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16417875#comment-16417875 ] Andrey Kuznetsov commented on IGNITE-8025: -- [~dpavlov], I've launched some suites (including Memory Leaks) separately to ensure test validity, so there is no need to wait for another run to complete. > Result of GridTestUtils.runMultiThreadedAsync has a bug in cancel() > implementation > -- > > Key: IGNITE-8025 > URL: https://issues.apache.org/jira/browse/IGNITE-8025 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.4 >Reporter: Andrey Kuznetsov >Assignee: Andrey Kuznetsov >Priority: Major > Labels: MakeTeamcityGreenAgain, test > Fix For: 2.5 > > Attachments: BugRunMTAsyncTest.java > > > GridTestUtils.runMultiThreadedAsync returns a future with cancel() support, > but cancellation implementation never interrupts threads that execute > user-provided tasks. That is, those threads can continue their execution even > after test method finishes. > The reproducer attached demonstrates activity from threads created by test0 > after test0 finished and test1 is being run.
[jira] [Commented] (IGNITE-8025) Result of GridTestUtils.runMultiThreadedAsync has a bug in cancel() implementation
[ https://issues.apache.org/jira/browse/IGNITE-8025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16418791#comment-16418791 ] Andrey Kuznetsov commented on IGNITE-8025: -- [~dpavlov], thanks for your alertness. I've executed this suite again, and it turned green. Probably, new flaky tests have been found. > Result of GridTestUtils.runMultiThreadedAsync has a bug in cancel() > implementation > -- > > Key: IGNITE-8025 > URL: https://issues.apache.org/jira/browse/IGNITE-8025 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.4 >Reporter: Andrey Kuznetsov >Assignee: Andrey Kuznetsov >Priority: Major > Labels: MakeTeamcityGreenAgain, test > Fix For: 2.5 > > Attachments: BugRunMTAsyncTest.java > > > GridTestUtils.runMultiThreadedAsync returns a future with cancel() support, > but cancellation implementation never interrupts threads that execute > user-provided tasks. That is, those threads can continue their execution even > after test method finishes. > The reproducer attached demonstrates activity from threads created by test0 > after test0 finished and test1 is being run.
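The cancellation problem described in IGNITE-8025 can be illustrated with a minimal sketch. This is not the GridTestUtils code and not the actual fix; the class name InterruptingRunner and its cancel(long) method are hypothetical. It only shows one way a cancel operation could interrupt the threads that run user-provided tasks and wait until they have actually stopped, so that no activity leaks into the next test method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not part of Ignite): runs a task in several threads and supports interrupting cancel. */
final class InterruptingRunner {
    /** Threads executing the user-provided task. */
    private final List<Thread> threads = new ArrayList<>();

    InterruptingRunner(Runnable task, int threadCnt) {
        for (int i = 0; i < threadCnt; i++) {
            Thread t = new Thread(task, "runner-" + i);

            threads.add(t);
            t.start();
        }
    }

    /**
     * Interrupts every worker thread and waits for termination.
     *
     * @return true if all threads terminated within the timeout.
     */
    boolean cancel(long timeoutMs) {
        for (Thread t : threads)
            t.interrupt();

        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);

        try {
            for (Thread t : threads) {
                long leftMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());

                // join(0) would wait forever, so join only while time is left.
                if (leftMs > 0)
                    t.join(leftMs);

                if (t.isAlive())
                    return false;
            }
        }
        catch (InterruptedException e) {
            // Preserve the caller's interrupt status and report failure.
            Thread.currentThread().interrupt();

            return false;
        }

        return true;
    }
}
```

The sketch relies on tasks that respond to interruption (for example, blocking calls that throw InterruptedException). A task that ignores interrupts would still outlive the test, which cancel() reports by returning false.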
[jira] [Assigned] (IGNITE-7772) All critical system workers health should be covered by IgniteFailureProcessor
[ https://issues.apache.org/jira/browse/IGNITE-7772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Kuznetsov reassigned IGNITE-7772: Assignee: Andrey Kuznetsov (was: Dmitriy Sorokin) > All critical system workers health should be covered by IgniteFailureProcessor > -- > > Key: IGNITE-7772 > URL: https://issues.apache.org/jira/browse/IGNITE-7772 > Project: Ignite > Issue Type: Task >Reporter: Anton Vinogradov >Assignee: Andrey Kuznetsov >Priority: Major > Labels: iep-14 > Fix For: 2.5 > > > List of system workers should be covered by this engine: > disco-event-worker > tcp-disco-sock-reader > tcp-disco-srvr > tcp-disco-msg-worker > tcp-comm-worker > grid-nio-worker-tcp-comm > exchange-worker > sys-stripe > grid-timeout-worker > db-checkpoint-thread > wal-file-archiver > ttl-cleanup-worker > nio-acceptor -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-7772) All critical system workers health should be covered by IgniteFailureProcessor
[ https://issues.apache.org/jira/browse/IGNITE-7772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Kuznetsov updated IGNITE-7772: - Description: List of system workers should be covered by this engine: * disco-event-worker * tcp-disco-sock-reader * tcp-disco-srvr * tcp-disco-msg-worker * tcp-comm-worker * grid-nio-worker-tcp-comm * exchange-worker * sys-stripe * grid-timeout-worker * db-checkpoint-thread * wal-file-archiver * wal-write-worker * ttl-cleanup-worker * nio-acceptor was: List of system workers should be covered by this engine: disco-event-worker tcp-disco-sock-reader tcp-disco-srvr tcp-disco-msg-worker tcp-comm-worker grid-nio-worker-tcp-comm exchange-worker sys-stripe grid-timeout-worker db-checkpoint-thread wal-file-archiver ttl-cleanup-worker nio-acceptor > All critical system workers health should be covered by IgniteFailureProcessor > -- > > Key: IGNITE-7772 > URL: https://issues.apache.org/jira/browse/IGNITE-7772 > Project: Ignite > Issue Type: Task >Reporter: Anton Vinogradov >Assignee: Andrey Kuznetsov >Priority: Major > Labels: iep-14 > Fix For: 2.5 > > > List of system workers should be covered by this engine: > * disco-event-worker > * tcp-disco-sock-reader > * tcp-disco-srvr > * tcp-disco-msg-worker > * tcp-comm-worker > * grid-nio-worker-tcp-comm > * exchange-worker > * sys-stripe > * grid-timeout-worker > * db-checkpoint-thread > * wal-file-archiver > * wal-write-worker > * ttl-cleanup-worker > * nio-acceptor -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-7772) All critical system workers health should be covered by IgniteFailureProcessor
[ https://issues.apache.org/jira/browse/IGNITE-7772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Kuznetsov updated IGNITE-7772: - Description: List of system workers should be covered by this engine: * disco-event-worker * tcp-disco-srvr * tcp-disco-msg-worker * tcp-comm-worker * grid-nio-worker-tcp-comm * exchange-worker * sys-stripe * grid-timeout-worker * db-checkpoint-thread * wal-file-archiver * wal-write-worker * ttl-cleanup-worker * nio-acceptor was: List of system workers should be covered by this engine: * disco-event-worker * tcp-disco-sock-reader * tcp-disco-srvr * tcp-disco-msg-worker * tcp-comm-worker * grid-nio-worker-tcp-comm * exchange-worker * sys-stripe * grid-timeout-worker * db-checkpoint-thread * wal-file-archiver * wal-write-worker * ttl-cleanup-worker * nio-acceptor > All critical system workers health should be covered by IgniteFailureProcessor > -- > > Key: IGNITE-7772 > URL: https://issues.apache.org/jira/browse/IGNITE-7772 > Project: Ignite > Issue Type: Task >Reporter: Anton Vinogradov >Assignee: Andrey Kuznetsov >Priority: Major > Labels: iep-14 > Fix For: 2.5 > > > List of system workers should be covered by this engine: > * disco-event-worker > * tcp-disco-srvr > * tcp-disco-msg-worker > * tcp-comm-worker > * grid-nio-worker-tcp-comm > * exchange-worker > * sys-stripe > * grid-timeout-worker > * db-checkpoint-thread > * wal-file-archiver > * wal-write-worker > * ttl-cleanup-worker > * nio-acceptor -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (IGNITE-5655) Introduce pluggable string encoder/decoder
[ https://issues.apache.org/jira/browse/IGNITE-5655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Kuznetsov resolved IGNITE-5655. -- Resolution: Won't Fix Completing the feature requires significant effort, and there is no active interest in it in the community. > Introduce pluggable string encoder/decoder > -- > > Key: IGNITE-5655 > URL: https://issues.apache.org/jira/browse/IGNITE-5655 > Project: Ignite > Issue Type: New Feature > Components: binary >Affects Versions: 2.0 >Reporter: Valentin Kulichenko >Assignee: Andrey Kuznetsov >Priority: Major > Labels: iep-2, important > Fix For: 2.5 > > > Currently binary marshaller encodes strings in UTF-8. However, sometimes it > makes sense to serialize strings with different encodings to save space. > Let's add global property to control String encoding and customize our binary > protocol to support it. For instance, we can add another flag > {{ENCODED_STRING}}, which will write strings as follows: > [flag][encoding_flag][str_len][str_bytes] > First implementation should set preferred encoding for strings in > BinaryConfiguration. This setting is optional, default encoding is UTF-8. > Currently, the same BinaryConfiguration is used for all cluster nodes, thus > no encoding clashes are possible.
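The [flag][encoding_flag][str_len][str_bytes] layout proposed in IGNITE-5655 can be sketched as follows. This is an illustration only, not Ignite's binary marshaller: the class name, the numeric values of ENCODED_STRING and the encoding flags, the 4-byte length field, and the byte order are all assumptions made for the example, since the ticket does not specify them.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of the proposed layout; all constants are assumed values. */
final class EncodedStringSketch {
    /** Assumed type flag; the real value is not given in the ticket. */
    static final byte ENCODED_STRING = 0x7F;

    /** Assumed encoding flags. */
    static final byte ENC_UTF_8 = 0;
    static final byte ENC_ISO_8859_1 = 1;

    /** Writes [flag][encoding_flag][str_len][str_bytes]. */
    static byte[] write(String s, byte encFlag) {
        byte[] bytes = s.getBytes(charset(encFlag));

        // 1 byte flag + 1 byte encoding flag + 4 bytes length + payload.
        ByteBuffer buf = ByteBuffer.allocate(1 + 1 + 4 + bytes.length);

        buf.put(ENCODED_STRING).put(encFlag).putInt(bytes.length).put(bytes);

        return buf.array();
    }

    /** Reads a string written by write(). */
    static String read(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);

        if (buf.get() != ENCODED_STRING)
            throw new IllegalArgumentException("Not an encoded string");

        byte encFlag = buf.get();
        byte[] bytes = new byte[buf.getInt()];

        buf.get(bytes);

        return new String(bytes, charset(encFlag));
    }

    /** Maps an assumed encoding flag to a charset. */
    static Charset charset(byte encFlag) {
        switch (encFlag) {
            case ENC_UTF_8:
                return StandardCharsets.UTF_8;

            case ENC_ISO_8859_1:
                return StandardCharsets.ISO_8859_1;

            default:
                throw new IllegalArgumentException("Unknown encoding flag: " + encFlag);
        }
    }
}
```

The space saving motivating the ticket shows up for non-ASCII text: a character such as "é" takes one byte in ISO-8859-1 but two in UTF-8, so the same string yields a shorter payload with the single-byte encoding.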
[jira] [Commented] (IGNITE-7772) All critical system workers health should be covered by IgniteFailureProcessor
[ https://issues.apache.org/jira/browse/IGNITE-7772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16425655#comment-16425655 ] Andrey Kuznetsov commented on IGNITE-7772: -- The approach to critical failure handling that I proposed is not perfect. I'm reworking it now. > All critical system workers health should be covered by IgniteFailureProcessor > -- > > Key: IGNITE-7772 > URL: https://issues.apache.org/jira/browse/IGNITE-7772 > Project: Ignite > Issue Type: Task >Reporter: Anton Vinogradov >Assignee: Andrey Kuznetsov >Priority: Major > Labels: iep-14 > Fix For: 2.5 > > > All critical workers listed in "IEP-14: Ignite failures handling" > (https://cwiki.apache.org/confluence/display/IGNITE/IEP-14+Ignite+failures+handling).
[jira] [Commented] (IGNITE-7499) DataRegionMetricsImpl#getPageSize returns ZERO for system data regions
[ https://issues.apache.org/jira/browse/IGNITE-7499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16426798#comment-16426798 ] Andrey Kuznetsov commented on IGNITE-7499: -- Unlike regular data regions, system data region configuration is hardcoded, and metrics are disabled there. [~kuaw26], do we have real reasons to query metrics for the system data region? Depending on the answer to this question, we can choose how to fix the issue. > DataRegionMetricsImpl#getPageSize returns ZERO for system data regions > -- > > Key: IGNITE-7499 > URL: https://issues.apache.org/jira/browse/IGNITE-7499 > Project: Ignite > Issue Type: Bug > Components: cache >Reporter: Alexey Kuznetsov >Assignee: Andrey Kuznetsov >Priority: Major > Fix For: 2.5 > > > Working on IGNITE-7492 I found that DataRegionMetricsImpl#getPageSize returns > ZERO for system data regions. > Meanwhile there is also > org.apache.ignite.internal.pagemem.PageMemory#systemPageSize method. > That looks a bit strange: why do we need both PageSize and SystemPageSize? >
[jira] [Commented] (IGNITE-9679) Document critical workers liveness checking implementation
[ https://issues.apache.org/jira/browse/IGNITE-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16668842#comment-16668842 ] Andrey Kuznetsov commented on IGNITE-9679: -- [~Artem Budnikov], thanks, great job! Please consider some minor remarks. * A blocked (aka hanging) worker could be included in the Critical Failures list. * Workers of the Data Streamer striped pool could be added to the mission-critical worker list. * Due to [1], blocked worker timeout configuration became a bit trickier. Should this be mentioned in the docs? [1] https://issues.apache.org/jira/browse/IGNITE-9737?focusedCommentId=16632210&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16632210 > Document critical workers liveness checking implementation > -- > > Key: IGNITE-9679 > URL: https://issues.apache.org/jira/browse/IGNITE-9679 > Project: Ignite > Issue Type: Task > Components: documentation >Reporter: Andrey Kuznetsov >Assignee: Andrey Kuznetsov >Priority: Major > Fix For: 2.7 > > > Newly implemented critical worker thread liveness checks should be mentioned > in Ignite Documentation. Brief description of the functionality follows. > Ignite node has a number of critical worker threads that should be alive and > responsive, otherwise node's health is not guaranteed. These threads monitor > each other periodically and track two aspects for a thread being checked: > - whether it's alive; > - whether it updates its internal heartbeat timestamp. > Whenever at least one of the above conditions is violated, checker thread > logs the error and calls currently configured {{FailureHandler}}. > {{IgniteConfiguration.SystemWorkerBlockedTimeout}} configuration property > affects monitoring behavior. At runtime monitoring settings can be changed > via {{FailureHandlingMxBean}}. 
> By default, liveness checks are enabled, but blocked system worker detection > will not lead to failure handler invocation, see > {{FailureProcessor#getDefaultFailureHandler}} .
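The two conditions tracked by the liveness check in IGNITE-9679 (the thread is alive, and it keeps updating its heartbeat timestamp) can be sketched as below. This is a simplified illustration, not the Ignite implementation: LivenessSketch, its nested Worker class, and findFailed are hypothetical names, and the time values are passed in explicitly to keep the example deterministic.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a heartbeat-based liveness check; not Ignite code. */
final class LivenessSketch {
    /** A monitored worker: a thread plus the timestamp of its last heartbeat. */
    static final class Worker {
        final String name;
        final Thread thread;

        /** Updated by the worker itself, read by the checker. */
        volatile long heartbeatTs;

        Worker(String name, Thread thread, long now) {
            this.name = name;
            this.thread = thread;
            this.heartbeatTs = now;
        }

        /** The worker is expected to call this regularly from its main loop. */
        void updateHeartbeat(long now) {
            heartbeatTs = now;
        }
    }

    /**
     * Checks both conditions for every worker.
     *
     * @return Descriptions of workers that are terminated or blocked.
     */
    static List<String> findFailed(Iterable<Worker> workers, long now, long blockedTimeoutMs) {
        List<String> failed = new ArrayList<>();

        for (Worker w : workers) {
            if (!w.thread.isAlive())
                failed.add(w.name + ": terminated");
            else if (now - w.heartbeatTs > blockedTimeoutMs)
                failed.add(w.name + ": blocked");
        }

        return failed;
    }
}
```

In this sketch a caller would log each returned entry and pass it to whatever failure handling is configured; the blockedTimeoutMs parameter plays the role that the blocked-worker timeout setting plays in the description above.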
[jira] [Created] (IGNITE-10079) FileWriteAheadLogManager may return invalid lastCompactedSegment
Andrey Kuznetsov created IGNITE-10079: - Summary: FileWriteAheadLogManager may return invalid lastCompactedSegment Key: IGNITE-10079 URL: https://issues.apache.org/jira/browse/IGNITE-10079 Project: Ignite Issue Type: Bug Components: persistence Reporter: Andrey Kuznetsov Assignee: Andrey Kuznetsov Fix For: 2.8 Attachments: WalCompactionAfterRestartTest.java As of current {{master}} branch, {{FileWriteAheadLogManager#lastCompactedSegment}} may report -1 even after some segments have been actually compressed. Reproducer is attached. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-9679) Document critical workers liveness checking implementation
[ https://issues.apache.org/jira/browse/IGNITE-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669859#comment-16669859 ] Andrey Kuznetsov commented on IGNITE-9679: -- [~Artem Budnikov], sorry. Maybe I overlooked something yesterday. Now I see that configuration description is up to date. > Document critical workers liveness checking implementation > -- > > Key: IGNITE-9679 > URL: https://issues.apache.org/jira/browse/IGNITE-9679 > Project: Ignite > Issue Type: Task > Components: documentation >Reporter: Andrey Kuznetsov >Assignee: Andrey Kuznetsov >Priority: Major > Fix For: 2.7 > > > Newly implemented critical worker thread liveness checks should be mentioned > in Ignite Documentation. Brief description of the functionality follows. > Ignite node has a number of critical worker threads that should be alive and > responsive, otherwise node's health is not guaranteed. These threads monitor > each other periodically and track two aspects for a thread being checked: > - whether it's alive; > - whether it updates its internal heartbeat timestamp. > Whenever at least one of the above conditions is violated, checker thread > logs the error and calls currently configured {{FailureHandler}}. > {{IgniteConfiguration.SystemWorkerBlockedTimeout}} configuration property > affects monitoring behavior. At runtime monitoring settings can be changed > via {{FailureHandlingMxBean}}. > By default, liveness checks are enabled, but blocked system worker detection > will not lead to failure handler invocation, see > {{FailureProcessor#getDefaultFailureHandler}} . -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-10079) FileWriteAheadLogManager may return invalid lastCompactedSegment
[ https://issues.apache.org/jira/browse/IGNITE-10079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16671237#comment-16671237 ] Andrey Kuznetsov commented on IGNITE-10079: --- [~ivandasch], got it. And this ordering works perfectly. There are still some possibilities to bring {{SegmentCompressStorage}} to inconsistent state. For example, in the fix I proposed, I just mitigate consequences of independent changes to {{SegmentAware.lastTruncatedArchiveIdx}} and {{SegmentCompressStorage.compressingSegments}}. Is it possible to place both into the same class and modify in atomic fashion? > FileWriteAheadLogManager may return invalid lastCompactedSegment > > > Key: IGNITE-10079 > URL: https://issues.apache.org/jira/browse/IGNITE-10079 > Project: Ignite > Issue Type: Bug > Components: persistence >Reporter: Andrey Kuznetsov >Assignee: Andrey Kuznetsov >Priority: Major > Fix For: 2.8 > > Attachments: WalCompactionAfterRestartTest.java > > > As of current {{master}} branch, > {{FileWriteAheadLogManager#lastCompactedSegment}} may report -1 even after > some segments have been actually compressed. Reproducer is attached. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-10079) FileWriteAheadLogManager may return invalid lastCompactedSegment
[ https://issues.apache.org/jira/browse/IGNITE-10079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16673306#comment-16673306 ] Andrey Kuznetsov commented on IGNITE-10079: --- [~akalashnikov], thanks for your notes and suggestions. I will work on them next week. > FileWriteAheadLogManager may return invalid lastCompactedSegment > > > Key: IGNITE-10079 > URL: https://issues.apache.org/jira/browse/IGNITE-10079 > Project: Ignite > Issue Type: Bug > Components: persistence >Reporter: Andrey Kuznetsov >Assignee: Andrey Kuznetsov >Priority: Major > Fix For: 2.8 > > Attachments: WalCompactionAfterRestartTest.java > > > As of current {{master}} branch, > {{FileWriteAheadLogManager#lastCompactedSegment}} may report -1 even after > some segments have been actually compressed. Reproducer is attached. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-10079) FileWriteAheadLogManager may return invalid lastCompactedSegment
[ https://issues.apache.org/jira/browse/IGNITE-10079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16678168#comment-16678168 ] Andrey Kuznetsov commented on IGNITE-10079: --- [~akalashnikov], I have accepted your suggestions in Upsource partly, let us agree on the rest. > FileWriteAheadLogManager may return invalid lastCompactedSegment > > > Key: IGNITE-10079 > URL: https://issues.apache.org/jira/browse/IGNITE-10079 > Project: Ignite > Issue Type: Bug > Components: persistence >Reporter: Andrey Kuznetsov >Assignee: Andrey Kuznetsov >Priority: Major > Fix For: 2.8 > > Attachments: WalCompactionAfterRestartTest.java > > > As of current {{master}} branch, > {{FileWriteAheadLogManager#lastCompactedSegment}} may report -1 even after > some segments have been actually compressed. Reproducer is attached. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-10079) FileWriteAheadLogManager may return invalid lastCompactedSegment
[ https://issues.apache.org/jira/browse/IGNITE-10079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16679562#comment-16679562 ] Andrey Kuznetsov commented on IGNITE-10079: --- [~akalashnikov], do you have any more comments on this change? > FileWriteAheadLogManager may return invalid lastCompactedSegment > > > Key: IGNITE-10079 > URL: https://issues.apache.org/jira/browse/IGNITE-10079 > Project: Ignite > Issue Type: Bug > Components: persistence >Reporter: Andrey Kuznetsov >Assignee: Andrey Kuznetsov >Priority: Major > Fix For: 2.8 > > Attachments: WalCompactionAfterRestartTest.java > > > As of current {{master}} branch, > {{FileWriteAheadLogManager#lastCompactedSegment}} may report -1 even after > some segments have been actually compressed. Reproducer is attached. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (IGNITE-10386) Add mode when WAL won't be disabled during rebalancing caused by BLT change
[ https://issues.apache.org/jira/browse/IGNITE-10386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Kuznetsov reassigned IGNITE-10386: - Assignee: Andrey Kuznetsov > Add mode when WAL won't be disabled during rebalancing caused by BLT change > --- > > Key: IGNITE-10386 > URL: https://issues.apache.org/jira/browse/IGNITE-10386 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Rakov >Assignee: Andrey Kuznetsov >Priority: Major > Fix For: 2.8 > > > Enabling IgniteSystemProperties#IGNITE_DISABLE_WAL_DURING_REBALANCING > disables WAL for cache group during rebalancing in case local node has no > OWNING partitions for this group. > We should add mode when in specific case (after BaselineTopology change) WAL > won't be disabled even if this property is switched on. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-8823) Incorrect transaction state in tx manager
[ https://issues.apache.org/jira/browse/IGNITE-8823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721565#comment-16721565 ] Andrey Kuznetsov commented on IGNITE-8823: -- [~dpavlov], I have just brought the branch up to date with master. Also, TeamCity tests have been re-triggered, waiting for results. > Incorrect transaction state in tx manager > - > > Key: IGNITE-8823 > URL: https://issues.apache.org/jira/browse/IGNITE-8823 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.6 >Reporter: Andrey Gura >Assignee: Andrey Kuznetsov >Priority: Major > Fix For: 2.8 > > Attachments: Ignite8823ReproducerTest.java > > > Reproducible by test method {{testCreateConsistencyMultithreaded}} in > {{IgfsPrimaryMultiNodeSelfTest}} and > {{IgfsPrimaryRelaxedConsistencyMultiNodeSelfTest}}: > {noformat} > 18:34:40,701][SEVERE][sys-stripe-0-#44%ignite%][GridCacheIoManager] Failed > processing message [senderId=e273c3f8-02ed-4201-9ac8-09f9ab6a1d31, > msg=GridNearTxPrepareResponse [pending=[], > futId=b4df8831461-9735f9d5-79a0-47a3-a951-e62a03af71ef, miniId=1, > dhtVer=GridCacheVersion [topVer=140816081, order=1529336085358, nodeOrder=3], > writeVer=GridCacheVersion [topVer=140816081, order=1529336085360, > nodeOrder=3], ownedVals=null, retVal=GridCacheReturn [v=null, cacheObj=null, > success=true, invokeRes=true, loc=true, cacheId=0], clientRemapVer=null, > super=GridDistributedTxPrepareResponse > [txState=IgniteTxImplicitSingleStateImpl [init=true, recovery=false], > part=-1, err=null, super=GridDistributedBaseMessage [ver=GridCacheVersion > [topVer=140816081, order=1529336085224, nodeOrder=1], committedVers=null, > rolledbackVers=null, cnt=0, super=GridCacheIdMessage [cacheId=0] > java.lang.AssertionError: true instead of GridCacheReturnCompletableWrapper > at > org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager.removeTxReturn(IgniteTxManager.java:1098) > at > 
org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxFinishFuture.ackBackup(GridNearTxFinishFuture.java:533) > at > org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxFinishFuture.doFinish(GridNearTxFinishFuture.java:500) > at > org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxFinishFuture.finish(GridNearTxFinishFuture.java:417) > at > org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal$19.apply(GridNearTxLocal.java:3341) > at > org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal$19.apply(GridNearTxLocal.java:3335) > at > org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383) > at > org.apache.ignite.internal.util.future.GridFutureAdapter.unblock(GridFutureAdapter.java:347) > at > org.apache.ignite.internal.util.future.GridFutureAdapter.unblockAll(GridFutureAdapter.java:335) > at > org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:495) > at > org.apache.ignite.internal.processors.cache.GridCacheCompoundFuture.onDone(GridCacheCompoundFuture.java:56) > at > org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:474) > at > org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFuture.onComplete(GridNearOptimisticTxPrepareFuture.java:310) > at > org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFuture.onDone(GridNearOptimisticTxPrepareFuture.java:288) > at > org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFuture.onDone(GridNearOptimisticTxPrepareFuture.java:78) > at > org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:451) > at > org.apache.ignite.internal.util.future.GridCompoundFuture.checkComplete(GridCompoundFuture.java:285) > at > 
org.apache.ignite.internal.util.future.GridCompoundFuture.apply(GridCompoundFuture.java:144) > at > org.apache.ignite.internal.util.future.GridCompoundFuture.apply(GridCompoundFuture.java:45) > at > org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383) > at > org.apache.ignite.internal.util.future.GridFutureAdapter.unblock(GridFutureAdapter.java:347) > at > org.apache.ignite.internal.util.future.GridFutureAdapter.unblockAll(GridFutureAdapter.java:335) > at > org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:495) > at > org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:474) > at > org.apache.ignite.internal.util.future.Gr
[jira] [Created] (IGNITE-9601) Write rollover WAL record as the last
Andrey Kuznetsov created IGNITE-9601: Summary: Write rollover WAL record as the last Key: IGNITE-9601 URL: https://issues.apache.org/jira/browse/IGNITE-9601 Project: Ignite Issue Type: Task Affects Versions: 2.6 Reporter: Andrey Kuznetsov Assignee: Andrey Kuznetsov Fix For: 2.8 Currently, rollover WAL record gets to the next segment when being logged. Moreover, the implementation does allows data races, and rollover record is not necessarily the first record in the next segment. We are to add an option to logging facility to allow writing rollover record to the end of the current segment; subsequent records should get to the next segment then. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-9601) Write rollover WAL record as the last record in current segment
[ https://issues.apache.org/jira/browse/IGNITE-9601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Kuznetsov updated IGNITE-9601: - Summary: Write rollover WAL record as the last record in current segment (was: Write rollover WAL record as the last ) > Write rollover WAL record as the last record in current segment > --- > > Key: IGNITE-9601 > URL: https://issues.apache.org/jira/browse/IGNITE-9601 > Project: Ignite > Issue Type: Task >Affects Versions: 2.6 >Reporter: Andrey Kuznetsov >Assignee: Andrey Kuznetsov >Priority: Major > Fix For: 2.8 > > > Currently, rollover WAL record gets to the next segment when being logged. > Moreover, the implementation does allows data races, and rollover record is > not necessarily the first record in the next segment. We are to add an option > to logging facility to allow writing rollover record to the end of the > current segment; subsequent records should get to the next segment then. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-9601) Write rollover WAL record as the last record in current segment
[ https://issues.apache.org/jira/browse/IGNITE-9601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Kuznetsov updated IGNITE-9601: - Description: Currently, rollover WAL record gets to the next segment when being logged. Moreover, the implementation allows data races, and rollover record is not necessarily the first record in the next segment. We are to add an option to logging facility to allow writing rollover record to the end of the current segment; subsequent records should get to the next segment then. (was: Currently, rollover WAL record gets to the next segment when being logged. Moreover, the implementation does allows data races, and rollover record is not necessarily the first record in the next segment. We are to add an option to logging facility to allow writing rollover record to the end of the current segment; subsequent records should get to the next segment then.) > Write rollover WAL record as the last record in current segment > --- > > Key: IGNITE-9601 > URL: https://issues.apache.org/jira/browse/IGNITE-9601 > Project: Ignite > Issue Type: Task >Affects Versions: 2.6 >Reporter: Andrey Kuznetsov >Assignee: Andrey Kuznetsov >Priority: Major > Fix For: 2.8 > > > Currently, rollover WAL record gets to the next segment when being logged. > Moreover, the implementation allows data races, and rollover record is not > necessarily the first record in the next segment. We are to add an option to > logging facility to allow writing rollover record to the end of the current > segment; subsequent records should get to the next segment then. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-6587) Ignite watchdog service
[ https://issues.apache.org/jira/browse/IGNITE-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618882#comment-16618882 ] Andrey Kuznetsov commented on IGNITE-6587: -- [~agura], I've updated the implementation after discussing your points, see [1]. Now it's waiting for your review. > Ignite watchdog service > --- > > Key: IGNITE-6587 > URL: https://issues.apache.org/jira/browse/IGNITE-6587 > Project: Ignite > Issue Type: Improvement > Components: general >Affects Versions: 2.2 >Reporter: Alexey Goncharuk >Assignee: Andrey Kuznetsov >Priority: Major > Labels: IEP-5 > Fix For: 2.7 > > Attachments: watchdog.sh > > > As described in [1], each Ignite node has a number of system-critical > threads. We should implement a periodic check that calls failure handler when > one of the following conditions has been detected: > * Critical thread is not alive anymore. > * Critical thread 'hangs' for a long time, e.g. while executing a task > extracted from task queue. > In case of failure condition, call stacks of all threads should be logged > before invoking failure handler. > Actual list of system-critical threads can be found at [1]. > Implementations based on separate diagnostic thread seem fragile, because this > thread becomes a vulnerable point with respect to thread termination and CPU > resource starvation. So we are to use self-monitoring approach: critical > threads themselves should monitor each other. > Currently we have {{o.a.i.internal.worker.WorkersRegistry}} facility that > fits best to store and track system critical threads. All of them should be > refactored to be {{GridWorker's}} and added to {{WorkersRegistry}}. Each > worker should periodically choose some subset of peer workers and check > whether > * All of them are alive. > * All of them are actively running. > A 'heartbeat' timestamp must be added to each worker in order to implement > the latter check. 
Additionally, infinite queue polls, waits on monitors or thread > parks should be refactored to their timed equivalents in system critical > threads. > Monitoring parameters (enable/disable, check interval, thread 'hang' > threshold, etc.) are to be set via system properties. > [1] > https://cwiki.apache.org/confluence/display/IGNITE/IEP-14+Ignite+failures+handling -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (IGNITE-6587) Ignite watchdog service
[ https://issues.apache.org/jira/browse/IGNITE-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618882#comment-16618882 ] Andrey Kuznetsov edited comment on IGNITE-6587 at 9/18/18 10:28 AM: [~agura], I've updated the implementation after discussing your points, see [1]. Now it's waiting for your review. [1] http://apache-ignite-developers.2346864.n4.nabble.com/Critical-worker-threads-liveness-checking-drawbacks-td34783.html was (Author: andrey-kuznetsov): [~agura], I've updated the implementation after discussing your points, see [1]. Now it's waiting for your review. > Ignite watchdog service > --- > > Key: IGNITE-6587 > URL: https://issues.apache.org/jira/browse/IGNITE-6587 > Project: Ignite > Issue Type: Improvement > Components: general >Affects Versions: 2.2 >Reporter: Alexey Goncharuk >Assignee: Andrey Kuznetsov >Priority: Major > Labels: IEP-5 > Fix For: 2.7 > > Attachments: watchdog.sh > > > As described in [1], each Ignite node has a number of system-critical > threads. We should implement a periodic check that calls failure handler when > one of the following conditions has been detected: > * Critical thread is not alive anymore. > * Critical thread 'hangs' for a long time, e.g. while executing a task > extracted from task queue. > In case of failure condition, call stacks of all threads should be logged > before invoking failure handler. > Actual list of system-critical threads can be found at [1]. > Implementations based on separate diagnostic thread seem fragile, cause this > thread become a vulnerable point with respect to thread termination and CPU > resource starvation. So we are to use self-monitoring approach: critical > threads themselves should monitor each other. > Currently we have {{o.a.i.internal.worker.WorkersRegistry}} facility that > fits best to store and track system critical threads. All of them should be > refactored to be {{GridWorker's}} and added to {{WorkersRegistry}}. 
Each > worker should periodically choose some subset of peer workers and check > whether > * All of them are alive. > * All of them are actively running. > It's required to add a 'heartbeat' timestamp to worker in order to implement > latter check. Additionally, infinite queue polls, waits on monitors or thread > parks should be refactored to their timed equivalents in system critical > threads. > Monitoring parameters (enable/disable, check interval, thread 'hang' > threshold, etc.) are to be set via system properties. > [1] > https://cwiki.apache.org/confluence/display/IGNITE/IEP-14+Ignite+failures+handling -- This message was sent by Atlassian JIRA (v7.6.3#76005)
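The 'heartbeat' timestamp check described in the issue can be sketched roughly as follows. This is a minimal illustration only: the class `HeartbeatRegistry` and its methods are hypothetical names, not Ignite's `WorkersRegistry` or `GridWorker` API, and the liveness check (for example via `Thread.isAlive()`) is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a worker registry; not Ignite's WorkersRegistry. */
class HeartbeatRegistry {
    /** Last heartbeat timestamp (milliseconds) per worker name. */
    private final Map<String, Long> heartbeats = new ConcurrentHashMap<>();

    /** A worker silent for longer than this is treated as hung. */
    private final long hangThresholdMs;

    HeartbeatRegistry(long hangThresholdMs) {
        this.hangThresholdMs = hangThresholdMs;
    }

    /** Called by a worker on each loop iteration, e.g. after every timed poll or wait. */
    void updateHeartbeat(String worker, long nowMs) {
        heartbeats.put(worker, nowMs);
    }

    /** Called by any worker to inspect its peers; returns names of workers that look hung. */
    List<String> findHung(long nowMs) {
        List<String> hung = new ArrayList<>();

        for (Map.Entry<String, Long> e : heartbeats.entrySet()) {
            if (nowMs - e.getValue() > hangThresholdMs)
                hung.add(e.getKey());
        }

        return hung;
    }
}
```

Passing the current time in explicitly keeps the sketch deterministic; a real worker would presumably read a clock itself, and timed waits are what guarantee it gets a chance to update its timestamp.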
[jira] [Created] (IGNITE-9640) [TC Bot] Determine repetitive failure types by analyzing build log
Andrey Kuznetsov created IGNITE-9640: Summary: [TC Bot] Determine repetitive failure types by analyzing build log Key: IGNITE-9640 URL: https://issues.apache.org/jira/browse/IGNITE-9640 Project: Ignite Issue Type: Task Reporter: Andrey Kuznetsov When someone is analyzing a flaky test failure, it's important to distinguish between a newly introduced failure and a pre-existing one. In the latter case, the bot should not draw the contributor's attention to the test. In more detail, TC build log fragments very often start with identical substrings for identical failures, e.g. {noformat} junit.framework.AssertionFailedError at org.apache.ignite.internal.GridVersionSelfTest.testVersions(GridVersionSelfTest.java:54) {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
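One way to read the idea above is to treat a fixed-length prefix of the log fragment as the failure's signature and remember the signatures already seen. The sketch below assumes exactly that; the class `FailureSignatures`, the method names and the prefix length are hypothetical and are not taken from the TC Bot code.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: classifies failures by the leading substring of their log fragment. */
class FailureSignatures {
    /** Signatures of failures seen so far. */
    private final Set<String> known = new HashSet<>();

    /** Number of leading characters that form a signature (an assumed tuning parameter). */
    private final int prefixLen;

    FailureSignatures(int prefixLen) {
        this.prefixLen = prefixLen;
    }

    /** Builds the signature: the trimmed fragment cut to at most prefixLen characters. */
    static String signature(String logFragment, int prefixLen) {
        String s = logFragment.trim();

        return s.length() <= prefixLen ? s : s.substring(0, prefixLen);
    }

    /** Returns true if this failure type was not seen before, and records it. */
    boolean isNew(String logFragment) {
        return known.add(signature(logFragment, prefixLen));
    }
}
```

A fixed prefix is the simplest possible choice; a short prefix merges distinct failures of the same exception type, while a long one splits a single failure whose line numbers or timestamps differ.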
[jira] [Commented] (IGNITE-8570) Create lighter version of GridStringLogger
[ https://issues.apache.org/jira/browse/IGNITE-8570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619398#comment-16619398 ] Andrey Kuznetsov commented on IGNITE-8570: -- Thanks, [~xtern]. I'll examine your change in a day. > Create lighter version of GridStringLogger > -- > > Key: IGNITE-8570 > URL: https://issues.apache.org/jira/browse/IGNITE-8570 > Project: Ignite > Issue Type: Improvement >Affects Versions: 2.4 >Reporter: Andrey Kuznetsov >Assignee: Pavel Pereslegin >Priority: Major > Fix For: 2.7 > > > _+Problem with current GridStringLogger implementation+_: > Most usages of {{GridStringLogger}} in tests assume the following scenario. > First, it is set as a logger for some Ignite node. > Then, after some activity on that node, log content is searched for some > predefined strings. > {{GridStringLogger}} uses {{StringBuilder}} of bounded size internally to > store log contents; older content gets dropped on exhaustion. > Thus, changes that add more logging may damage some independent tests that > use {{GridStringLogger}}. > > +_The suggestion for new implementation:_+ > The suggestion is to implement and use another test logger conforming to > these requirements: > * It does not accumulate any logs (actually, it will print no logs > anywhere) > * It allows setting a listener that fires when a log message matches a certain > regular expression; a {{Matcher}} can be passed to the listener > > _+Proposed design+_, pseudocode: > ``` > class GridRegexpLogger implements IgniteLogger { > … > debug(String str) { > if (/* str matches pattern. */) > { /* notify listeners. */ } > } > … > listen("regexp", IgniteInClosure loggerListener) // listener receives > message > { /* registers listener. */ } > listenDebug("regexp", loggerListener) > { /* registers listener for debug output only. 
*/ } > … > } > ``` > +_Sample regexp logger usage_+: > ``` > GridRegexpLogger logger; > logger.listen("regexp", new GridRegexpListener()); > logger.listenDebug("regexp", new GridRegexpListener()); > ``` -- This message was sent by Atlassian JIRA (v7.6.3#76005)
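The proposed design can be illustrated with a small standalone sketch. It deliberately does not implement `IgniteLogger`, and the class `RegexListeningLog` with its `listen` and `info` methods is a hypothetical simplification of the pseudocode in the issue: nothing is accumulated, and each listener receives the `Matcher` for the message that matched.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical test logger: stores no output, only notifies listeners on regex matches. */
class RegexListeningLog {
    /** A compiled pattern together with the listener to notify. */
    private static final class Registration {
        final Pattern pattern;
        final Consumer<Matcher> listener;

        Registration(Pattern pattern, Consumer<Matcher> listener) {
            this.pattern = pattern;
            this.listener = listener;
        }
    }

    /** Registered listeners; copy-on-write so that logging threads can iterate safely. */
    private final List<Registration> regs = new CopyOnWriteArrayList<>();

    /** Registers a listener that fires when a logged message contains a match for the regex. */
    void listen(String regex, Consumer<Matcher> listener) {
        regs.add(new Registration(Pattern.compile(regex), listener));
    }

    /** Logs a message: nothing is printed or stored, matching listeners are notified. */
    void info(String msg) {
        for (Registration r : regs) {
            Matcher m = r.pattern.matcher(msg);

            if (m.find())
                r.listener.accept(m);
        }
    }
}
```

Because no log text is retained, adding more logging elsewhere cannot push an expected message out of a bounded buffer, which is the problem the issue describes for `GridStringLogger`.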
[jira] [Updated] (IGNITE-8823) Incorrect transaction state in tx manager
[ https://issues.apache.org/jira/browse/IGNITE-8823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Kuznetsov updated IGNITE-8823: - Fix Version/s: 2.8 > Incorrect transaction state in tx manager > - > > Key: IGNITE-8823 > URL: https://issues.apache.org/jira/browse/IGNITE-8823 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.6 >Reporter: Andrey Gura >Assignee: Andrey Kuznetsov >Priority: Major > Fix For: 2.8 > > Attachments: Ignite8823ReproducerTest.java > > > Reproducable by test method {{testCreateConsistencyMultithreaded}} in > {{IgfsPrimaryMultiNodeSelfTest}} and > {{IgfsPrimaryRelaxedConsistencyMultiNodeSelfTest}}: > {noformat} > 18:34:40,701][SEVERE][sys-stripe-0-#44%ignite%][GridCacheIoManager] Failed > processing message [senderId=e273c3f8-02ed-4201-9ac8-09f9ab6a1d31, > msg=GridNearTxPrepareResponse [pending=[], > futId=b4df8831461-9735f9d5-79a0-47a3-a951-e62a03af71ef, miniId=1, > dhtVer=GridCacheVersion [topVer=140816081, order=1529336085358, nodeOrder=3], > writeVer=GridCacheVersion [topVer=140816081, order=1529336085360, > nodeOrder=3], ownedVals=null, retVal=GridCacheReturn [v=null, cacheObj=null, > success=true, invokeRes=true, loc=true, cacheId=0], clientRemapVer=null, > super=GridDistributedTxPrepareResponse > [txState=IgniteTxImplicitSingleStateImpl [init=true, recovery=false], > part=-1, err=null, super=GridDistributedBaseMessage [ver=GridCacheVersion > [topVer=140816081, order=1529336085224, nodeOrder=1], committedVers=null, > rolledbackVers=null, cnt=0, super=GridCacheIdMessage [cacheId=0] > java.lang.AssertionError: true instead of GridCacheReturnCompletableWrapper > at > org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager.removeTxReturn(IgniteTxManager.java:1098) > at > org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxFinishFuture.ackBackup(GridNearTxFinishFuture.java:533) > at > 
org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxFinishFuture.doFinish(GridNearTxFinishFuture.java:500) > at > org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxFinishFuture.finish(GridNearTxFinishFuture.java:417) > at > org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal$19.apply(GridNearTxLocal.java:3341) > at > org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal$19.apply(GridNearTxLocal.java:3335) > at > org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383) > at > org.apache.ignite.internal.util.future.GridFutureAdapter.unblock(GridFutureAdapter.java:347) > at > org.apache.ignite.internal.util.future.GridFutureAdapter.unblockAll(GridFutureAdapter.java:335) > at > org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:495) > at > org.apache.ignite.internal.processors.cache.GridCacheCompoundFuture.onDone(GridCacheCompoundFuture.java:56) > at > org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:474) > at > org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFuture.onComplete(GridNearOptimisticTxPrepareFuture.java:310) > at > org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFuture.onDone(GridNearOptimisticTxPrepareFuture.java:288) > at > org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFuture.onDone(GridNearOptimisticTxPrepareFuture.java:78) > at > org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:451) > at > org.apache.ignite.internal.util.future.GridCompoundFuture.checkComplete(GridCompoundFuture.java:285) > at > org.apache.ignite.internal.util.future.GridCompoundFuture.apply(GridCompoundFuture.java:144) > at > org.apache.ignite.internal.util.future.GridCompoundFuture.apply(GridCompoundFuture.java:45) > at > 
org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383) > at > org.apache.ignite.internal.util.future.GridFutureAdapter.unblock(GridFutureAdapter.java:347) > at > org.apache.ignite.internal.util.future.GridFutureAdapter.unblockAll(GridFutureAdapter.java:335) > at > org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:495) > at > org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:474) > at > org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:451) > at > org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPr
[jira] [Updated] (IGNITE-8823) Incorrect transaction state in tx manager
[ https://issues.apache.org/jira/browse/IGNITE-8823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Kuznetsov updated IGNITE-8823: - Affects Version/s: 2.6 > Incorrect transaction state in tx manager > - > > Key: IGNITE-8823 > URL: https://issues.apache.org/jira/browse/IGNITE-8823 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.6 >Reporter: Andrey Gura >Assignee: Andrey Kuznetsov >Priority: Major > Fix For: 2.8 > > Attachments: Ignite8823ReproducerTest.java > > > Reproducable by test method {{testCreateConsistencyMultithreaded}} in > {{IgfsPrimaryMultiNodeSelfTest}} and > {{IgfsPrimaryRelaxedConsistencyMultiNodeSelfTest}}: > {noformat} > 18:34:40,701][SEVERE][sys-stripe-0-#44%ignite%][GridCacheIoManager] Failed > processing message [senderId=e273c3f8-02ed-4201-9ac8-09f9ab6a1d31, > msg=GridNearTxPrepareResponse [pending=[], > futId=b4df8831461-9735f9d5-79a0-47a3-a951-e62a03af71ef, miniId=1, > dhtVer=GridCacheVersion [topVer=140816081, order=1529336085358, nodeOrder=3], > writeVer=GridCacheVersion [topVer=140816081, order=1529336085360, > nodeOrder=3], ownedVals=null, retVal=GridCacheReturn [v=null, cacheObj=null, > success=true, invokeRes=true, loc=true, cacheId=0], clientRemapVer=null, > super=GridDistributedTxPrepareResponse > [txState=IgniteTxImplicitSingleStateImpl [init=true, recovery=false], > part=-1, err=null, super=GridDistributedBaseMessage [ver=GridCacheVersion > [topVer=140816081, order=1529336085224, nodeOrder=1], committedVers=null, > rolledbackVers=null, cnt=0, super=GridCacheIdMessage [cacheId=0] > java.lang.AssertionError: true instead of GridCacheReturnCompletableWrapper > at > org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager.removeTxReturn(IgniteTxManager.java:1098) > at > org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxFinishFuture.ackBackup(GridNearTxFinishFuture.java:533) > at > 
org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxFinishFuture.doFinish(GridNearTxFinishFuture.java:500) > at > org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxFinishFuture.finish(GridNearTxFinishFuture.java:417) > at > org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal$19.apply(GridNearTxLocal.java:3341) > at > org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal$19.apply(GridNearTxLocal.java:3335) > at > org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383) > at > org.apache.ignite.internal.util.future.GridFutureAdapter.unblock(GridFutureAdapter.java:347) > at > org.apache.ignite.internal.util.future.GridFutureAdapter.unblockAll(GridFutureAdapter.java:335) > at > org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:495) > at > org.apache.ignite.internal.processors.cache.GridCacheCompoundFuture.onDone(GridCacheCompoundFuture.java:56) > at > org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:474) > at > org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFuture.onComplete(GridNearOptimisticTxPrepareFuture.java:310) > at > org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFuture.onDone(GridNearOptimisticTxPrepareFuture.java:288) > at > org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFuture.onDone(GridNearOptimisticTxPrepareFuture.java:78) > at > org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:451) > at > org.apache.ignite.internal.util.future.GridCompoundFuture.checkComplete(GridCompoundFuture.java:285) > at > org.apache.ignite.internal.util.future.GridCompoundFuture.apply(GridCompoundFuture.java:144) > at > org.apache.ignite.internal.util.future.GridCompoundFuture.apply(GridCompoundFuture.java:45) > at > 
org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383) > at > org.apache.ignite.internal.util.future.GridFutureAdapter.unblock(GridFutureAdapter.java:347) > at > org.apache.ignite.internal.util.future.GridFutureAdapter.unblockAll(GridFutureAdapter.java:335) > at > org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:495) > at > org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:474) > at > org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:451) > at > org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimistic
[jira] [Updated] (IGNITE-8862) IgniteChangeGlobalStateTest hangs on TeamCity
[ https://issues.apache.org/jira/browse/IGNITE-8862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Kuznetsov updated IGNITE-8862: - Fix Version/s: 2.8 > IgniteChangeGlobalStateTest hangs on TeamCity > - > > Key: IGNITE-8862 > URL: https://issues.apache.org/jira/browse/IGNITE-8862 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.5 >Reporter: Andrey Kuznetsov >Assignee: Andrey Kuznetsov >Priority: Major > Labels: MakeTeamcityGreenAgain > Fix For: 2.8 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-6893) Java Deadlocks monitoring
[ https://issues.apache.org/jira/browse/IGNITE-6893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Kuznetsov updated IGNITE-6893: - Fix Version/s: (was: 2.7) 2.8 > Java Deadlocks monitoring > - > > Key: IGNITE-6893 > URL: https://issues.apache.org/jira/browse/IGNITE-6893 > Project: Ignite > Issue Type: Improvement >Reporter: Anton Vinogradov >Assignee: Andrey Kuznetsov >Priority: Major > Labels: iep-7 > Fix For: 2.8 > > > Java Level Deadlocks > Description > This situation occurs if user or Ignite comes to a Java-level deadlock due to > a bug in code - reverse order synchronized(mux1) {synchronized (mux2) {}} > sections, reverse order reentrant locks, etc. > Detection and Solution > This most likely cannot be resolved automatically and will require JVM > restart. > We can implement periodical threaddumps analysis and detect the deadlock. > Report > Deadlock should be reported to the logs. > Web Console should fire an alert on java deadlock detection and display a > warning on UI. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
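For detection, the JDK already exposes deadlock discovery through `java.lang.management.ThreadMXBean`, so a periodic check does not necessarily have to parse thread dumps. The sketch below only shows the detection call; the class name `DeadlockProbe` is hypothetical, and how Ignite would schedule the check or report the result is not specified by the issue.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical probe that lists threads currently involved in a Java-level deadlock. */
class DeadlockProbe {
    /** Returns names of deadlocked threads, or an empty list if none are detected. */
    static List<String> deadlockedThreadNames() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();

        // Covers both monitor (synchronized) and ownable-synchronizer (e.g. ReentrantLock)
        // deadlocks; returns null when no deadlock is found.
        long[] ids = mx.findDeadlockedThreads();

        List<String> names = new ArrayList<>();

        if (ids == null)
            return names;

        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            // An entry is null if the thread terminated between the two calls.
            if (info != null)
                names.add(info.getThreadName());
        }

        return names;
    }
}
```

`findDeadlockedThreads()` can throw `UnsupportedOperationException` on a JVM that does not support monitoring of ownable synchronizers, so a production check would probably guard the call or fall back to `findMonitorDeadlockedThreads()`.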
[jira] [Updated] (IGNITE-7499) DataRegionMetricsImpl#getPageSize returns ZERO for system data regions
[ https://issues.apache.org/jira/browse/IGNITE-7499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Kuznetsov updated IGNITE-7499: - Fix Version/s: (was: 2.7) 2.8 > DataRegionMetricsImpl#getPageSize returns ZERO for system data regions > -- > > Key: IGNITE-7499 > URL: https://issues.apache.org/jira/browse/IGNITE-7499 > Project: Ignite > Issue Type: Bug > Components: cache >Reporter: Alexey Kuznetsov >Assignee: Andrey Kuznetsov >Priority: Major > Fix For: 2.8 > > > Working on IGNITE-7492 I found that DataRegionMetricsImpl#getPageSize returns > ZERO for system data regions. > Meanwhile there is also > org.apache.ignite.internal.pagemem.PageMemory#systemPageSize method. > That looks a bit strange, why we need PageSize and SystemPageSize ? > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-8290) Activation message handling fails with AssertionError sporadically.
[ https://issues.apache.org/jira/browse/IGNITE-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Kuznetsov updated IGNITE-8290: - Fix Version/s: 2.8 > Activation message handling fails with AssertionError sporadically. > --- > > Key: IGNITE-8290 > URL: https://issues.apache.org/jira/browse/IGNITE-8290 > Project: Ignite > Issue Type: Bug > Components: persistence >Reporter: Andrew Mashenkov >Assignee: Andrey Kuznetsov >Priority: Major > Labels: MakeTeamcityGreenAgain > Fix For: 2.8 > > Attachments: disco-msg-fails-2.stack, disco-msg-fails.stack > > > Some tests fail sporadically due to an AssertionError while processing a custom > discovery message, which can lead to grid and test hangs. > PFA stack traces. > org.apache.ignite.internal.processors.cache.persistence.db.IgnitePdsWholeClusterRestartTest > is a good starting point. > Although the test passes on master, every run logs a lot of > AssertionErrors. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-8570) Create lighter version of GridStringLogger
[ https://issues.apache.org/jira/browse/IGNITE-8570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620820#comment-16620820 ] Andrey Kuznetsov commented on IGNITE-8570: -- After discussion at Upsource and subsequent minor changes, I'm OK with the PR. > Create lighter version of GridStringLogger > -- > > Key: IGNITE-8570 > URL: https://issues.apache.org/jira/browse/IGNITE-8570 > Project: Ignite > Issue Type: Improvement >Affects Versions: 2.4 >Reporter: Andrey Kuznetsov >Assignee: Pavel Pereslegin >Priority: Major > Fix For: 2.8 > > > _+Problem with current GridStringLogger implementation+_: > Most usages of {{GridStringLogger}} in tests assume the following scenario. > First, it is set as a logger for some Ignite node. > Then, after some activity on that node, log content is searched for some > predefined strings. > {{GridStringLogger}} uses {{StringBuilder}} of bounded size internally to > store log contents; older content gets dropped on exhaustion. > Thus, changes that add more logging may damage some independent tests that > use {{GridStringLogger}}. > > +_The suggestion for new implementation:_+ > The suggestion is to implement and use another test logger conforming to > these requirements: > * It does not accumulate any logs (actually, it will print no logs > anywhere) > * It allows setting a listener that fires when a log message matches a certain > regular expression; a {{Matcher}} can be passed to the listener > > _+Proposed design+_, pseudocode: > ``` > class GridRegexpLogger implements IgniteLogger { > … > debug(String str) { > if (/* str matches pattern. */) > { /* notify listeners. */ } > } > … > listen("regexp", IgniteInClosure loggerListener) // listener receives > message > { /* registers listener. */ } > listenDebug("regexp", loggerListener) > { /* registers listener for debug output only. 
*/ } > … > } > ``` > +_Sample regexp logger usage_+: > ``` > GridRegexpLogger logger; > logger.listen("regexp", new GridRegexpListener()); > logger.listenDebug("regexp", new GridRegexpListener()); > ``` -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9653) StopNodeOrHaltFailureHandlerTest.testJvmHalted has flaky failures on master branch
Andrey Kuznetsov created IGNITE-9653: Summary: StopNodeOrHaltFailureHandlerTest.testJvmHalted has flaky failures on master branch Key: IGNITE-9653 URL: https://issues.apache.org/jira/browse/IGNITE-9653 Project: Ignite Issue Type: Bug Reporter: Andrey Kuznetsov Fix For: 2.8 ``` junit.framework.AssertionFailedError at org.apache.ignite.failure.StopNodeOrHaltFailureHandlerTest.testJvmHalted(StopNodeOrHaltFailureHandlerTest.java:93) ``` -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-9653) StopNodeOrHaltFailureHandlerTest.testJvmHalted has flaky failures on master branch
[ https://issues.apache.org/jira/browse/IGNITE-9653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Kuznetsov updated IGNITE-9653: - Description: {noformat} junit.framework.AssertionFailedError at org.apache.ignite.failure.StopNodeOrHaltFailureHandlerTest.testJvmHalted(StopNodeOrHaltFailureHandlerTest.java:93) {noformat} was: ``` junit.framework.AssertionFailedError at org.apache.ignite.failure.StopNodeOrHaltFailureHandlerTest.testJvmHalted(StopNodeOrHaltFailureHandlerTest.java:93) ``` > StopNodeOrHaltFailureHandlerTest.testJvmHalted has flaky failures on master > branch > -- > > Key: IGNITE-9653 > URL: https://issues.apache.org/jira/browse/IGNITE-9653 > Project: Ignite > Issue Type: Bug >Reporter: Andrey Kuznetsov >Priority: Major > Fix For: 2.8 > > > {noformat} > junit.framework.AssertionFailedError > at > org.apache.ignite.failure.StopNodeOrHaltFailureHandlerTest.testJvmHalted(StopNodeOrHaltFailureHandlerTest.java:93) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-9653) StopNodeOrHaltFailureHandlerTest.testJvmHalted has flaky failures on master branch
[ https://issues.apache.org/jira/browse/IGNITE-9653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Kuznetsov updated IGNITE-9653: - Ignite Flags: (was: Docs Required) > StopNodeOrHaltFailureHandlerTest.testJvmHalted has flaky failures on master > branch > -- > > Key: IGNITE-9653 > URL: https://issues.apache.org/jira/browse/IGNITE-9653 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.6 >Reporter: Andrey Kuznetsov >Priority: Major > Fix For: 2.8 > > > {noformat} > junit.framework.AssertionFailedError > at > org.apache.ignite.failure.StopNodeOrHaltFailureHandlerTest.testJvmHalted(StopNodeOrHaltFailureHandlerTest.java:93) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-9653) StopNodeOrHaltFailureHandlerTest.testJvmHalted has flaky failures on master branch
[ https://issues.apache.org/jira/browse/IGNITE-9653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Kuznetsov updated IGNITE-9653: - Affects Version/s: 2.6 > StopNodeOrHaltFailureHandlerTest.testJvmHalted has flaky failures on master > branch > -- > > Key: IGNITE-9653 > URL: https://issues.apache.org/jira/browse/IGNITE-9653 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.6 >Reporter: Andrey Kuznetsov >Priority: Major > Fix For: 2.8 > > > {noformat} > junit.framework.AssertionFailedError > at > org.apache.ignite.failure.StopNodeOrHaltFailureHandlerTest.testJvmHalted(StopNodeOrHaltFailureHandlerTest.java:93) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9660) Switch default test FailureHandler to StopNodeFailureHandler
Andrey Kuznetsov created IGNITE-9660: Summary: Switch default test FailureHandler to StopNodeFailureHandler Key: IGNITE-9660 URL: https://issues.apache.org/jira/browse/IGNITE-9660 Project: Ignite Issue Type: Test Affects Versions: 2.6 Reporter: Andrey Kuznetsov Fix For: 2.8 {{GridAbstractTest.getFailureHandler()}} returns a {{NoOpFailureHandler}} instance. This often leads to hiding bugs occurring in tests. {{getFailureHandler()}} should return {{StopNodeFailureHandler}} instead. The change assumes re-checking failed tests and setting the handler to {{NoOpFailureHandler}} in subclasses where it's really a must. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-9660) Switch default test FailureHandler to StopNodeFailureHandler
[ https://issues.apache.org/jira/browse/IGNITE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Kuznetsov updated IGNITE-9660: - Issue Type: Task (was: Test) > Switch default test FailureHandler to StopNodeFailureHandler > > > Key: IGNITE-9660 > URL: https://issues.apache.org/jira/browse/IGNITE-9660 > Project: Ignite > Issue Type: Task >Affects Versions: 2.6 >Reporter: Andrey Kuznetsov >Priority: Major > Fix For: 2.8 > > > {{GridAbstractTest.getFailureHandler()}} returns a {{NoOpFailureHandler}} > instance. This often leads to hiding bugs occurring in tests. > {{getFailureHandler()}} should return {{StopNodeFailureHandler}} > instead. > The change assumes re-checking failed tests and setting the handler to > {{NoOpFailureHandler}} in subclasses where it's really a must. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-9660) Switch default test FailureHandler to StopNodeFailureHandler
[ https://issues.apache.org/jira/browse/IGNITE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16623651#comment-16623651 ] Andrey Kuznetsov commented on IGNITE-9660: -- It's better to use a special test-scope failure handler instead of {{StopNodeFailureHandler}}. That handler should fail the current test method as gracefully as possible. > Switch default test FailureHandler to StopNodeFailureHandler > > > Key: IGNITE-9660 > URL: https://issues.apache.org/jira/browse/IGNITE-9660 > Project: Ignite > Issue Type: Task >Affects Versions: 2.6 >Reporter: Andrey Kuznetsov >Priority: Major > Fix For: 2.8 > > > {{GridAbstractTest.getFailureHandler()}} returns a {{NoOpFailureHandler}} > instance. This often leads to hiding bugs occurring in tests. > {{getFailureHandler()}} should return {{StopNodeFailureHandler}} > instead. > The change assumes re-checking failed tests and setting the handler to > {{NoOpFailureHandler}} in subclasses where it's really a must. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (IGNITE-9660) Switch default test FailureHandler to StopNodeFailureHandler
[ https://issues.apache.org/jira/browse/IGNITE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16623651#comment-16623651 ] Andrey Kuznetsov edited comment on IGNITE-9660 at 9/21/18 1:56 PM: --- It's better to use a special test-scope failure handler instead of {{StopNodeFailureHandler}}. That handler mentioned in [1] should fail the current test method as gracefully as possible. [1] https://issues.apache.org/jira/browse/IGNITE-8227 was (Author: andrey-kuznetsov): It's better to use special test-scope failure handler instead of {{StopNodeFailureHandler}}. That handler should fail current test method as graceful as possible. > Switch default test FailureHandler to StopNodeFailureHandler > > > Key: IGNITE-9660 > URL: https://issues.apache.org/jira/browse/IGNITE-9660 > Project: Ignite > Issue Type: Task >Affects Versions: 2.6 >Reporter: Andrey Kuznetsov >Priority: Major > Fix For: 2.8 > > > {{GridAbstractTest.getFailureHandler()}} returns a {{NoOpFailureHandler}} > instance. This often leads to hiding bugs occurring in tests. > {{getFailureHandler()}} should return {{StopNodeFailureHandler}} > instead. > The change assumes re-checking failed tests and setting the handler to > {{NoOpFailureHandler}} in subclasses where it's really a must. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (IGNITE-9660) Switch default test FailureHandler to StopNodeFailureHandler
[ https://issues.apache.org/jira/browse/IGNITE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16623651#comment-16623651 ] Andrey Kuznetsov edited comment on IGNITE-9660 at 9/21/18 1:57 PM: --- It's better to use a special test-scope failure handler instead of {{StopNodeFailureHandler}}. That handler (mentioned in [1]) should fail the current test method as gracefully as possible. [1] https://issues.apache.org/jira/browse/IGNITE-8227 was (Author: andrey-kuznetsov): It's better to use special test-scope failure handler instead of {{StopNodeFailureHandler}}. That handler mentioned in [1] should fail current test method as graceful as possible. [1] https://issues.apache.org/jira/browse/IGNITE-8227 > Switch default test FailureHandler to StopNodeFailureHandler > > > Key: IGNITE-9660 > URL: https://issues.apache.org/jira/browse/IGNITE-9660 > Project: Ignite > Issue Type: Task >Affects Versions: 2.6 >Reporter: Andrey Kuznetsov >Priority: Major > Fix For: 2.8 > > > {{GridAbstractTest.getFailureHandler()}} returns a {{NoOpFailureHandler}} > instance. This often leads to hiding bugs occurring in tests. > {{getFailureHandler()}} should return {{StopNodeFailureHandler}} > instead. > The change assumes re-checking failed tests and setting the handler to > {{NoOpFailureHandler}} in subclasses where it's really a must. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9666) TxPessimisticDeadlockDetectionCrossCacheTest.testDeadlockAnotherNear is flaky on master
Andrey Kuznetsov created IGNITE-9666: Summary: TxPessimisticDeadlockDetectionCrossCacheTest.testDeadlockAnotherNear is flaky on master Key: IGNITE-9666 URL: https://issues.apache.org/jira/browse/IGNITE-9666 Project: Ignite Issue Type: Bug Affects Versions: 2.6 Reporter: Andrey Kuznetsov Fix For: 2.8 Sometimes the test cannot pass {{assertTrue(deadlock.get())}}. Presumably, it's due to ignoring possible long JVM pauses. For example, one can see near the first 'put' pair (note timestamps) : {noformat} [2018-09-23 11:16:55,975][INFO ][tx-thread-1][root] >>> Performs put [node=TcpDiscoveryNode [id=dd46ab0e-ed28-4c67-b3c4-98900bb0, addrs=ArrayList [127.0.0.1], sockAddrs=HashSet [/127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1537690615852, loc=true, ver=2.7.0#19700101-sha1:, isClient=false], tx=TransactionProxyImpl [tx=IgniteTxAdapter [xidVer=GridCacheVersion [topVer=149170604, order=1537690611182, nodeOrder=1], writeVer=null, implicit=false, loc=true, threadId=129, startTime=1537690615791, nodeId=dd46ab0e-ed28-4c67-b3c4-98900bb0, startVer=GridCacheVersion [topVer=149170604, order=1537690611182, nodeOrder=1], endVer=null, isolation=REPEATABLE_READ, concurrency=PESSIMISTIC, timeout=500, sysInvalidate=false, sys=false, plc=2, commitVer=null, finalizing=NONE, invalidParts=null, state=ACTIVE, timedOut=false, topVer=AffinityTopologyVersion [topVer=-1, minorTopVer=0], txCounters=org.apache.ignite.internal.processors.cache.transactions.TxCounters@31c7393f, duration=155ms, onePhaseCommit=false]IgniteTxLocalAdapter [completedBase=null, sndTransformedVals=false, depEnabled=false, txState=IgniteTxStateImpl [activeCacheIds=[], recovery=null, mvccEnabled=null, txMap=EmptySet []], mvccWaitTxs=null, qryEnlisted=false, super=, size=0]GridDhtTxLocalAdapter [nearOnOriginatingNode=false, nearNodes=KeySetView [], dhtNodes=KeySetView [], explicitLock=false, super=]GridNearTxLocal [mappings=IgniteTxMappingsImpl [], nearLocallyMapped=false, 
colocatedLocallyMapped=false, needCheckBackup=null, hasRemoteLocks=false, trackTimeout=true, lb=null, mvccTracker=null, sql=null, thread=tx-thread-1, mappings=IgniteTxMappingsImpl [], super=], async=false, asyncRes=null], key=2, cache=cache0] [2018-09-23 11:16:55,975][INFO ][tx-thread-2][root] >>> Performs put [node=TcpDiscoveryNode [id=dd46ab0e-ed28-4c67-b3c4-98900bb0, addrs=ArrayList [127.0.0.1], sockAddrs=HashSet [/127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1537690615852, loc=true, ver=2.7.0#19700101-sha1:, isClient=false], tx=TransactionProxyImpl [tx=IgniteTxAdapter [xidVer=GridCacheVersion [topVer=149170604, order=1537690611181, nodeOrder=1], writeVer=null, implicit=false, loc=true, threadId=130, startTime=1537690615791, nodeId=dd46ab0e-ed28-4c67-b3c4-98900bb0, startVer=GridCacheVersion [topVer=149170604, order=1537690611182, nodeOrder=1], endVer=null, isolation=REPEATABLE_READ, concurrency=PESSIMISTIC, timeout=500, sysInvalidate=false, sys=false, plc=2, commitVer=null, finalizing=NONE, invalidParts=null, state=ACTIVE, timedOut=false, topVer=AffinityTopologyVersion [topVer=-1, minorTopVer=0], txCounters=org.apache.ignite.internal.processors.cache.transactions.TxCounters@14d54c9c, duration=155ms, onePhaseCommit=false]IgniteTxLocalAdapter [completedBase=null, sndTransformedVals=false, depEnabled=false, txState=IgniteTxStateImpl [activeCacheIds=[], recovery=null, mvccEnabled=null, txMap=EmptySet []], mvccWaitTxs=null, qryEnlisted=false, super=, size=0]GridDhtTxLocalAdapter [nearOnOriginatingNode=false, nearNodes=KeySetView [], dhtNodes=KeySetView [], explicitLock=false, super=]GridNearTxLocal [mappings=IgniteTxMappingsImpl [], nearLocallyMapped=false, colocatedLocallyMapped=false, needCheckBackup=null, hasRemoteLocks=false, trackTimeout=true, lb=null, mvccTracker=null, sql=null, thread=tx-thread-2, mappings=IgniteTxMappingsImpl [], super=], async=false, asyncRes=null], key=2, cache=cache1] [2018-09-23 11:16:56,378][INFO 
][exchange-worker-#38%transactions.TxPessimisticDeadlockDetectionCrossCacheTest0%][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=2, minorTopVer=3], mvccCrd=MvccCoordinator [nodeId=dd46ab0e-ed28-4c67-b3c4-98900bb0, crdVer=1537690602134, topVer=AffinityTopologyVersion [topVer=1, minorTopVer=0]], mvccCrdChange=false, crd=true, evt=DISCOVERY_CUSTOM_EVT, evtNode=dd46ab0e-ed28-4c67-b3c4-98900bb0, customEvt=CacheAffinityChangeMessage [id=d7540850661-799b6d10-6e53-4f8b-9595-98f8c060efa1, topVer=AffinityTopologyVersion [topVer=2, minorTopVer=0], exchId=null, partsMsg=null, exchangeNeeded=true], allowMerge=false] {noformat} And then, transactions have to roll back due to 500 ms timeout, leaving no possibility to produce deadlock. -- This message was sent by Atlassi
[jira] [Updated] (IGNITE-9640) [TC Bot] Determine repetitive failure types by analyzing build log
[ https://issues.apache.org/jira/browse/IGNITE-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Kuznetsov updated IGNITE-9640: - Description: When someone is analyzing flaky test failure, it's important to distinguish between newly created failure and pre-existing one. In the latter case, the bot should not attract contributor's attention to the test. In more detail, TC build log fragments start with identical substrings for identical failures very often, e.g. {noformat} junit.framework.AssertionFailedError at org.apache.ignite.internal.GridVersionSelfTest.testVersions(GridVersionSelfTest.java:54) {noformat} was: When someone is analyzing flaky test failure, it's important to distinguish between newly created failure and pre-existing one. In the latter case, the bot should not attract contributor's attention to the test. In more detail, TC build log fragments starts with identical substrings for identical failures very often, e.g. {noformat} junit.framework.AssertionFailedError at org.apache.ignite.internal.GridVersionSelfTest.testVersions(GridVersionSelfTest.java:54) {noformat} > [TC Bot] Determine repetitive failure types by analyzing build log > -- > > Key: IGNITE-9640 > URL: https://issues.apache.org/jira/browse/IGNITE-9640 > Project: Ignite > Issue Type: Task >Reporter: Andrey Kuznetsov >Priority: Minor > > When someone is analyzing flaky test failure, it's important to distinguish > between newly created failure and pre-existing one. In the latter case, the > bot should not attract contributor's attention to the test. > In more detail, TC build log fragments start with identical substrings for > identical failures very often, e.g. > {noformat} > junit.framework.AssertionFailedError > at > org.apache.ignite.internal.GridVersionSelfTest.testVersions(GridVersionSelfTest.java:54) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-9640) [TC Bot] Determine repetitive failure types by analyzing build log
[ https://issues.apache.org/jira/browse/IGNITE-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Kuznetsov updated IGNITE-9640: - Description: When the test is already flaky in master branch, developer has to check if possible failures in PR are the same as in master branch. As for now, this is done manually by analyzing build history on TeamCity. TC Bot can simplify this process in the following way: * When both error descriptions from PR/master build logs start with identical substring, then the failure should not be reported. * Otherwise test failure should be reported. Example of potentially identical substring: {noformat} junit.framework.AssertionFailedError at org.apache.ignite.internal.GridVersionSelfTest.testVersions(GridVersionSelfTest.java:54) {noformat} was: When someone is analyzing flaky test failure, it's important to distinguish between newly created failure and pre-existing one. In the latter case, the bot should not attract contributor's attention to the test. In more detail, TC build log fragments start with identical substrings for identical failures very often, e.g. {noformat} junit.framework.AssertionFailedError at org.apache.ignite.internal.GridVersionSelfTest.testVersions(GridVersionSelfTest.java:54) {noformat} > [TC Bot] Determine repetitive failure types by analyzing build log > -- > > Key: IGNITE-9640 > URL: https://issues.apache.org/jira/browse/IGNITE-9640 > Project: Ignite > Issue Type: Task >Reporter: Andrey Kuznetsov >Priority: Minor > > When the test is already flaky in master branch, developer has to check if > possible failures in PR are the same as in master branch. As for now, this is > done manually by analyzing build history on TeamCity. TC Bot can simplify > this process in the following way: > * When both error descriptions from PR/master build logs start with identical > substring, then the failure should not be reported. > * Otherwise test failure should be reported. 
> Example of potentially identical substring: > {noformat} > junit.framework.AssertionFailedError > at > org.apache.ignite.internal.GridVersionSelfTest.testVersions(GridVersionSelfTest.java:54) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
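The comparison rule proposed above could be sketched roughly as follows. This is only an illustration and not TC Bot code: the class name {{FailurePrefixMatcher}}, its methods, and the decision to compare the first two non-blank lines of each log fragment are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of TC Bot. */
public class FailurePrefixMatcher {
    /** Number of leading non-blank lines to compare (assumed value). */
    private static final int PREFIX_LINES = 2;

    /** Returns up to PREFIX_LINES leading non-blank lines of the fragment, trimmed. */
    public static List<String> prefix(String logFragment) {
        List<String> res = new ArrayList<>();

        for (String line : logFragment.split("\n")) {
            String trimmed = line.trim();

            if (!trimmed.isEmpty())
                res.add(trimmed);

            if (res.size() == PREFIX_LINES)
                break;
        }

        return res;
    }

    /**
     * Returns true when the PR failure starts with the same lines as the
     * master failure, i.e. the failure should not be reported.
     */
    public static boolean sameFailure(String prLog, String masterLog) {
        List<String> prPrefix = prefix(prLog);

        // An empty fragment carries no evidence, so it never counts as a match.
        return !prPrefix.isEmpty() && prPrefix.equals(prefix(masterLog));
    }
}
```

With this sketch, two fragments that both begin with the {{AssertionFailedError}} line and the {{GridVersionSelfTest.testVersions}} frame are treated as the same failure even if deeper stack frames differ.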
[jira] [Created] (IGNITE-9679) Document critical workers liveness checking implementation
Andrey Kuznetsov created IGNITE-9679: Summary: Document critical workers liveness checking implementation Key: IGNITE-9679 URL: https://issues.apache.org/jira/browse/IGNITE-9679 Project: Ignite Issue Type: Task Components: documentation Reporter: Andrey Kuznetsov Assignee: Denis Magda Fix For: 2.7 Newly implemented critical worker thread liveness checks should be mentioned in Ignite Documentation. Brief description of the functionality follows. Ignite node has a number of critical worker threads that should be alive and responsive, otherwise node's health is not guaranteed. These threads monitor each other periodically and track two aspects for a thread being checked: - whether it's alive; - whether it updates its internal heartbeat timestamp. Both checks use {{IgniteConfiguration.failureDetectionTimeout}} property as a threshold value. Whenever at least one of the above conditions is violated, checker thread logs the error and calls currently configured {{FailureHandler}}. Liveness checks are enabled by default, but can be disabled through {{WorkersControlMXBean.healthMonitoringEnabled}} property. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
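The two checks described above can be modelled with a small sketch. It is not Ignite code: {{LivenessCheckSketch}}, {{Worker}} and {{Verdict}} are hypothetical names, and the timeout argument merely stands in for the {{IgniteConfiguration.failureDetectionTimeout}} threshold mentioned in the description.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative model only; these are not Ignite classes. */
public class LivenessCheckSketch {
    /** Hypothetical stand-in for a monitored critical worker. */
    public static class Worker {
        private final Thread thread;
        private final AtomicLong heartbeatTs = new AtomicLong();

        public Worker(Thread thread, long nowMs) {
            this.thread = thread;
            heartbeatTs.set(nowMs);
        }

        /** Called by the worker itself to show that it makes progress. */
        public void updateHeartbeat(long nowMs) {
            heartbeatTs.set(nowMs);
        }
    }

    /** Possible outcomes of a single check (assumed names). */
    public enum Verdict { OK, TERMINATED, BLOCKED }

    /** Applies both checks: thread is alive, heartbeat is recent enough. */
    public static Verdict check(Worker w, long nowMs, long timeoutMs) {
        if (!w.thread.isAlive())
            return Verdict.TERMINATED;

        if (nowMs - w.heartbeatTs.get() > timeoutMs)
            return Verdict.BLOCKED;

        return Verdict.OK;
    }
}
```

In this model a checker thread would call {{check()}} periodically and, on any verdict other than {{OK}}, log the error and hand it to the configured failure handler.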
[jira] [Commented] (IGNITE-9683) Create manual pinger for ZK client
[ https://issues.apache.org/jira/browse/IGNITE-9683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627284#comment-16627284 ] Andrey Kuznetsov commented on IGNITE-9683: -- [~Jokser], Zookeeper docs [1] mention ping requests built into Zk client: {noformat} The session is kept alive by requests sent by the client. If the session is idle for a period of time that would timeout the session, the client will send a PING request to keep the session alive. This PING request not only allows the ZooKeeper server to know that the client is still active, but it also allows the client to verify that its connection to the ZooKeeper server is still active. The timing of the PING is conservative enough to ensure reasonable time to detect a dead connection and reconnect to a new server. {noformat} Is this statement outdated, or "reasonable time" is not suitable for Discovery SPI? > Create manual pinger for ZK client > -- > > Key: IGNITE-9683 > URL: https://issues.apache.org/jira/browse/IGNITE-9683 > Project: Ignite > Issue Type: Improvement > Components: cache, zookeeper >Affects Versions: 2.5 >Reporter: Pavel Kovalenko >Priority: Major > Fix For: 2.7 > > > Connection loss with Zookeeper more than ZK session timeout for server nodes > is unacceptable. To improve durability of connection, we need to keep session > with ZK as long as possible. We need to introduce manual pinger additionally to > ZK client and ping ZK server with simple request each tick time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
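For discussion purposes, the "manual pinger" idea from the issue description might look like the sketch below. Nothing here is taken from Ignite or from the ZooKeeper client: {{ManualPingerSketch}} is a hypothetical class, and the {{ping}} callback is an assumed placeholder for whatever simple request would be sent to the ZK server each tick.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical periodic pinger; the ping action itself is supplied by the caller. */
public class ManualPingerSketch implements AutoCloseable {
    private final ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();

    /**
     * @param ping   assumed callback that sends one simple request to the server
     * @param tickMs interval between pings, in milliseconds
     */
    public ManualPingerSketch(Runnable ping, long tickMs) {
        exec.scheduleAtFixedRate(() -> {
            try {
                ping.run();
            }
            catch (RuntimeException e) {
                // Swallow the failure: an uncaught exception would cancel further scheduling.
            }
        }, tickMs, tickMs, TimeUnit.MILLISECONDS);
    }

    /** Stops the pinger thread. */
    @Override public void close() {
        exec.shutdownNow();
    }
}
```

Whether such a pinger is needed at all depends on the answer to the question raised in the comment above, since the ZooKeeper client already sends PING requests on idle sessions.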
[jira] [Comment Edited] (IGNITE-9683) Create manual pinger for ZK client
[ https://issues.apache.org/jira/browse/IGNITE-9683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627284#comment-16627284 ] Andrey Kuznetsov edited comment on IGNITE-9683 at 9/25/18 12:51 PM: [~Jokser], Zookeeper docs [1] mention ping requests built into Zk client: "The session is kept alive by requests sent by the client. If the session is idle for a period of time that would timeout the session, the client will send a PING request to keep the session alive. This PING request not only allows the ZooKeeper server to know that the client is still active, but it also allows the client to verify that its connection to the ZooKeeper server is still active. The timing of the PING is conservative enough to ensure reasonable time to detect a dead connection and reconnect to a new server." Is this statement outdated, or "reasonable time" is not suitable for Discovery SPI? was (Author: andrey-kuznetsov): [~Jokser], Zookeeper docs [1] mention ping requests built into Zk client: {noformat} The session is kept alive by requests sent by the client. If the session is idle for a period of time that would timeout the session, the client will send a PING request to keep the session alive. This PING request not only allows the ZooKeeper server to know that the client is still active, but it also allows the client to verify that its connection to the ZooKeeper server is still active. The timing of the PING is conservative enough to ensure reasonable time to detect a dead connection and reconnect to a new server. {noformat} Is this statement outdated, or "reasonable time" is not suitable for Discovery SPI? 
> Create manual pinger for ZK client > -- > > Key: IGNITE-9683 > URL: https://issues.apache.org/jira/browse/IGNITE-9683 > Project: Ignite > Issue Type: Improvement > Components: cache, zookeeper >Affects Versions: 2.5 >Reporter: Pavel Kovalenko >Priority: Major > Fix For: 2.7 > > > Connection loss with Zookeeper more than ZK session timeout for server nodes > is unacceptable. To improve durability of connection, we need to keep session > with ZK as long as possible. We need to introduce manual pinger additionally to > ZK client and ping ZK server with simple request each tick time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (IGNITE-9683) Create manual pinger for ZK client
[ https://issues.apache.org/jira/browse/IGNITE-9683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627284#comment-16627284 ] Andrey Kuznetsov edited comment on IGNITE-9683 at 9/25/18 12:51 PM: [~Jokser], Zookeeper docs [1] mention ping requests built into Zk client: "The session is kept alive by requests sent by the client. If the session is idle for a period of time that would timeout the session, the client will send a PING request to keep the session alive. This PING request not only allows the ZooKeeper server to know that the client is still active, but it also allows the client to verify that its connection to the ZooKeeper server is still active. The timing of the PING is conservative enough to ensure reasonable time to detect a dead connection and reconnect to a new server." Is this statement outdated, or "reasonable time" is not suitable for Discovery SPI? [1] https://zookeeper.apache.org/doc/r3.4.13/zookeeperProgrammers.html was (Author: andrey-kuznetsov): [~Jokser], Zookeeper docs [1] mention ping requests built into Zk client: "The session is kept alive by requests sent by the client. If the session is idle for a period of time that would timeout the session, the client will send a PING request to keep the session alive. This PING request not only allows the ZooKeeper server to know that the client is still active, but it also allows the client to verify that its connection to the ZooKeeper server is still active. The timing of the PING is conservative enough to ensure reasonable time to detect a dead connection and reconnect to a new server." Is this statement outdated, or "reasonable time" is not suitable for Discovery SPI? 
> Create manual pinger for ZK client > -- > > Key: IGNITE-9683 > URL: https://issues.apache.org/jira/browse/IGNITE-9683 > Project: Ignite > Issue Type: Improvement > Components: cache, zookeeper >Affects Versions: 2.5 >Reporter: Pavel Kovalenko >Priority: Major > Fix For: 2.7 > > > Connection loss with Zookeeper more than ZK session timeout for server nodes > is unacceptable. To improve durability of connection, we need to keep session > with ZK as long as possible. We need to introduce manual pinger additionally to > ZK client and ping ZK server with simple request each tick time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9695) Add a way to prevent per-cache WAL disabling in WalStateManager
Andrey Kuznetsov created IGNITE-9695: Summary: Add a way to prevent per-cache WAL disabling in WalStateManager Key: IGNITE-9695 URL: https://issues.apache.org/jira/browse/IGNITE-9695 Project: Ignite Issue Type: Task Affects Versions: 2.6 Reporter: Andrey Kuznetsov Assignee: Andrey Kuznetsov Fix For: 2.8 When this prevention is on, {{WalStateManager.init()}} should return an error-holding future immediately. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9731) NPE is possible during WAL flushing
Andrey Kuznetsov created IGNITE-9731: Summary: NPE is possible during WAL flushing Key: IGNITE-9731 URL: https://issues.apache.org/jira/browse/IGNITE-9731 Project: Ignite Issue Type: Task Reporter: Andrey Kuznetsov Fix For: 2.7 Attachments: WalRolloverRecordLoggingTest.java {{FileWriteAheadLogManager.flush()}} seems to be not thread-safe anymore in master branch. The test attached produces the following NPE: {noformat} java.lang.NullPointerException at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileHandle.getSegmentId(FileWriteAheadLogManager.java:2371) at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.needFsync(FileWriteAheadLogManager.java:2642) at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.fsync(FileWriteAheadLogManager.java:2668) at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.access$1900(FileWriteAheadLogManager.java:2445) at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.flush(FileWriteAheadLogManager.java:866) at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.markCheckpointBegin(GridCacheDatabaseSharedManager.java:3633) at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.doCheckpoint(GridCacheDatabaseSharedManager.java:3126) at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.body(GridCacheDatabaseSharedManager.java:3025) at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) at java.lang.Thread.run(Thread.java:748) {noformat} This was possibly introduced by commit [1]. [1] https://github.com/apache/ignite/commit/2f72fe758d4256c4eb4610e5922ad3d174b43dc5 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-9731) NPE is possible during WAL flushing
[ https://issues.apache.org/jira/browse/IGNITE-9731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Kuznetsov updated IGNITE-9731: - Issue Type: Bug (was: Task) > NPE is possible during WAL flushing > --- > > Key: IGNITE-9731 > URL: https://issues.apache.org/jira/browse/IGNITE-9731 > Project: Ignite > Issue Type: Bug >Reporter: Andrey Kuznetsov >Priority: Critical > Fix For: 2.7 > > Attachments: WalRolloverRecordLoggingTest.java > > > {{FileWriteAheadLogManager.flush()}} seems to be not thread-safe anymore in > master branch. The test attached produces the following NPE: > {noformat} > java.lang.NullPointerException > at > org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileHandle.getSegmentId(FileWriteAheadLogManager.java:2371) > at > org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.needFsync(FileWriteAheadLogManager.java:2642) > at > org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.fsync(FileWriteAheadLogManager.java:2668) > at > org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.access$1900(FileWriteAheadLogManager.java:2445) > at > org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.flush(FileWriteAheadLogManager.java:866) > at > org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.markCheckpointBegin(GridCacheDatabaseSharedManager.java:3633) > at > org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.doCheckpoint(GridCacheDatabaseSharedManager.java:3126) > at > org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.body(GridCacheDatabaseSharedManager.java:3025) > at > org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) > at 
java.lang.Thread.run(Thread.java:748) > {noformat} > This was possibly introduced by commit [1]. > [1] > https://github.com/apache/ignite/commit/2f72fe758d4256c4eb4610e5922ad3d174b43dc5 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (IGNITE-9737) Ignite WatchDog service should be configurable
[ https://issues.apache.org/jira/browse/IGNITE-9737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Kuznetsov reassigned IGNITE-9737: Assignee: Andrey Kuznetsov > Ignite WatchDog service should be configurable > -- > > Key: IGNITE-9737 > URL: https://issues.apache.org/jira/browse/IGNITE-9737 > Project: Ignite > Issue Type: Bug >Reporter: Nikolay Izhikov >Assignee: Andrey Kuznetsov >Priority: Blocker > Fix For: 2.7 > > > At the moment, there is no way to disable Ignite WatchDog service from config > or JVM option. > In any corner case or bug in that feature Ignite can become fully unusable > due to unpredictable shutdown. > We should provide a way to enable/disable this feature from config or from > JVM option. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-9737) Ignite WatchDog service should be configurable
[ https://issues.apache.org/jira/browse/IGNITE-9737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16632210#comment-16632210 ] Andrey Kuznetsov commented on IGNITE-9737: -- Critical thread monitoring should be configurable in a more flexible way than was originally suggested in the issue description. # A separate timeout setting should be introduced for {{SYSTEM_WORKER_BLOCKED}} detection in {{IgniteConfiguration}}, falling back to {{IgniteConfiguration.failureDetectionTimeout}} if omitted. This can be overridden by a system property or changed at runtime by a newly created management bean. A timeout value of 0 should denote an infinite timeout, which effectively disables detection. # A separate timeout should be introduced for checkpoint read lock acquisition in the same manner. # When timed out, checkpoint read lock acquisition should not throw an exception unless the failure handler has invalidated the node. This will guarantee neutral behavior of {{NoOpFailureHandler}}. > Ignite WatchDog service should be configurable > -- > > Key: IGNITE-9737 > URL: https://issues.apache.org/jira/browse/IGNITE-9737 > Project: Ignite > Issue Type: Bug >Reporter: Nikolay Izhikov >Assignee: Andrey Kuznetsov >Priority: Blocker > Fix For: 2.7 > > > At the moment, there is no way to disable Ignite WatchDog service from config > or JVM option. > In any corner case or bug in that feature Ignite can become fully unusable > due to unpredictable shutdown. > We should provide a way to enable/disable this feature from config or from > JVM option. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
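The fallback chain and the "0 means infinite" rule from the first item could be expressed along these lines. This is a sketch under assumptions, not the actual patch: the class {{WorkerTimeoutSketch}} and both method names are invented for illustration, and the system property value is passed in as a plain string so that no real property name has to be guessed.

```java
/** Hypothetical sketch; none of these names are taken from Ignite. */
public class WorkerTimeoutSketch {
    /**
     * Resolves the effective blocked-worker timeout.
     *
     * @param configured explicit timeout from configuration, or null if omitted
     * @param sysProp raw value of the overriding system property, or null if not set
     * @param failureDetectionTimeout fallback used when nothing else is specified
     */
    public static long effectiveTimeout(Long configured, String sysProp, long failureDetectionTimeout) {
        // The system property overrides the configured value.
        if (sysProp != null)
            return Long.parseLong(sysProp.trim());

        if (configured != null)
            return configured;

        return failureDetectionTimeout;
    }

    /** A timeout of 0 denotes an infinite timeout, so detection never fires. */
    public static boolean isBlocked(long idleMs, long timeoutMs) {
        return timeoutMs > 0 && idleMs > timeoutMs;
    }
}
```

The runtime change through a management bean is left out of the sketch; it would simply replace the resolved value while the node is running.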
[jira] [Assigned] (IGNITE-9710) Ignite watchdog service handles longrunning cache creation
[ https://issues.apache.org/jira/browse/IGNITE-9710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Kuznetsov reassigned IGNITE-9710: Assignee: Andrey Kuznetsov > Ignite watchdog service handles longrunning cache creation > -- > > Key: IGNITE-9710 > URL: https://issues.apache.org/jira/browse/IGNITE-9710 > Project: Ignite > Issue Type: Bug > Components: cache >Affects Versions: 2.6 >Reporter: Vyacheslav Daradur >Assignee: Andrey Kuznetsov >Priority: Major > Fix For: 2.7 > > Attachments: LongRunningCacheCreationTest.java > > > Ignite watchdog service introduced by IGNITE-6587 handles long running cache > creation. > Action in {{GridDhtPartitionsExchangeFuture#init}} may take significant time > and possibly should be covered by blocking section of watchdog service. > Reproducer was attached: [^LongRunningCacheCreationTest.java]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-9710) Ignite watchdog service handles longrunning cache creation
[ https://issues.apache.org/jira/browse/IGNITE-9710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16632230#comment-16632230 ] Andrey Kuznetsov commented on IGNITE-9710: -- Wrapping the entire {{GridDhtPartitionsExchangeFuture.init()}} into a blocking section is too rough a solution. By doing this we in fact make the partition-exchanger thread non-critical. To fix the issue thoroughly, the {{init()}} body should be examined, and long-running operations should be interleaved with {{GridWorker.updateHeartbeat()}} calls or wrapped into critical sections, depending on the situation. > Ignite watchdog service handles longrunning cache creation > -- > > Key: IGNITE-9710 > URL: https://issues.apache.org/jira/browse/IGNITE-9710 > Project: Ignite > Issue Type: Bug > Components: cache >Affects Versions: 2.6 >Reporter: Vyacheslav Daradur >Assignee: Andrey Kuznetsov >Priority: Major > Fix For: 2.7 > > Attachments: LongRunningCacheCreationTest.java > > > Ignite watchdog service introduced by IGNITE-6587 handles long running cache > creation. > Action in {{GridDhtPartitionsExchangeFuture#init}} may take significant time > and possibly should be covered by blocking section of watchdog service. > Reproducer was attached: [^LongRunningCacheCreationTest.java]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
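The interleaving approach from the comment above can be illustrated with a toy model. It does not use the real {{GridWorker}} class: {{HeartbeatInterleavingSketch}} and its nested {{Heartbeat}} type are hypothetical stand-ins, and splitting the work into a list of steps is an assumption made only to keep the example short.

```java
import java.util.List;

/** Toy model of heartbeat interleaving; not Ignite code. */
public class HeartbeatInterleavingSketch {
    /** Hypothetical stand-in for a critical worker's heartbeat timestamp. */
    public static class Heartbeat {
        private volatile long ts;
        private int updates;

        /** Records that the worker is still making progress. */
        public void update() {
            ts = System.currentTimeMillis();
            updates++;
        }

        public int updates() {
            return updates;
        }

        public long lastTimestamp() {
            return ts;
        }
    }

    /**
     * Runs a long operation as a sequence of shorter steps and refreshes the
     * heartbeat after each one, so the worker stays monitored throughout.
     */
    public static void runSteps(List<Runnable> steps, Heartbeat hb) {
        for (Runnable step : steps) {
            step.run();

            hb.update();
        }
    }
}
```

Compared with wrapping the whole operation in a blocking section, this keeps the thread under watch: if any single step hangs, the heartbeat stops advancing and the checker can still notice.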
[jira] [Created] (IGNITE-9744) Fix SYSTEM_WORKER_TERMINATION detection in general case
Andrey Kuznetsov created IGNITE-9744: Summary: Fix SYSTEM_WORKER_TERMINATION detection in general case Key: IGNITE-9744 URL: https://issues.apache.org/jira/browse/IGNITE-9744 Project: Ignite Issue Type: Bug Reporter: Andrey Kuznetsov Assignee: Andrey Kuznetsov Fix For: 2.7 All existing critical workers handle unintended termination individually. This should be done for an arbitrary critical worker as well. There is a test to check this situation, {{SystemWorkersTerminationTest.testTermination}}, but it currently passes due to {{SYSTEM_WORKER_BLOCKED}} rather than {{SYSTEM_WORKER_TERMINATION}}, and this should be fixed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (IGNITE-9679) Document critical workers liveness checking implementation
[ https://issues.apache.org/jira/browse/IGNITE-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Kuznetsov reassigned IGNITE-9679: Assignee: Andrey Kuznetsov (was: Artem Budnikov) > Document critical workers liveness checking implementation > -- > > Key: IGNITE-9679 > URL: https://issues.apache.org/jira/browse/IGNITE-9679 > Project: Ignite > Issue Type: Task > Components: documentation >Reporter: Andrey Kuznetsov >Assignee: Andrey Kuznetsov >Priority: Major > Fix For: 2.7 > > > Newly implemented critical worker thread liveness checks should be mentioned > in Ignite Documentation. Brief description of the functionality follows. > Ignite node has a number of critical worker threads that should be alive and > responsive, otherwise node's health is not guaranteed. These threads monitor > each other periodically and track two aspects for a thread being checked: > - whether it's alive; > - whether it updates its internal heartbeat timestamp. > Both checks use {{IgniteConfiguration.failureDetectionTimeout}} property as a > threshold value. > Whenever at least one of the above conditions is violated, checker thread > logs the error and calls currently configured {{FailureHandler}}. > Liveness checks are enabled by default, but can be disabled through > {{WorkersControlMXBean.healthMonitoringEnabled}} property. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-9776) FsyncModeFileWriteAheadLogManager can block forever in log() call
Andrey Kuznetsov created IGNITE-9776: Summary: FsyncModeFileWriteAheadLogManager can block forever in log() call Key: IGNITE-9776 URL: https://issues.apache.org/jira/browse/IGNITE-9776 Project: Ignite Issue Type: Bug Reporter: Andrey Kuznetsov Fix For: 2.7 Attachments: FsyncWalRolloverDoesNotBlockTest.java If WAL archiver is disabled and WALRecord being logged has {{rollOver() == true}}, then {{log()}} call blocks forever in {{FileArchiver}}'s (!) method: {noformat} nextAbsoluteSegmentIndex:1707, FsyncModeFileWriteAheadLogManager$FileArchiver (org.apache.ignite.internal.processors.cache.persistence.wal) access$3200:1437, FsyncModeFileWriteAheadLogManager$FileArchiver (org.apache.ignite.internal.processors.cache.persistence.wal) pollNextFile:1384, FsyncModeFileWriteAheadLogManager (org.apache.ignite.internal.processors.cache.persistence.wal) initNextWriteHandle:1243, FsyncModeFileWriteAheadLogManager (org.apache.ignite.internal.processors.cache.persistence.wal) rollOver:1130, FsyncModeFileWriteAheadLogManager (org.apache.ignite.internal.processors.cache.persistence.wal) log:712, FsyncModeFileWriteAheadLogManager (org.apache.ignite.internal.processors.cache.persistence.wal) {noformat} Reporoducer is attached. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-9776) FsyncModeFileWriteAheadLogManager can block forever in log() call
[ https://issues.apache.org/jira/browse/IGNITE-9776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Kuznetsov updated IGNITE-9776: - Description: If WAL archiver is disabled and WALRecord being logged has {{rollOver() == true}}, then {{log()}} call blocks forever in {{FileArchiver}}'s (!) method: {noformat} nextAbsoluteSegmentIndex:1707, FsyncModeFileWriteAheadLogManager$FileArchiver (org.apache.ignite.internal.processors.cache.persistence.wal) access$3200:1437, FsyncModeFileWriteAheadLogManager$FileArchiver (org.apache.ignite.internal.processors.cache.persistence.wal) pollNextFile:1384, FsyncModeFileWriteAheadLogManager (org.apache.ignite.internal.processors.cache.persistence.wal) initNextWriteHandle:1243, FsyncModeFileWriteAheadLogManager (org.apache.ignite.internal.processors.cache.persistence.wal) rollOver:1130, FsyncModeFileWriteAheadLogManager (org.apache.ignite.internal.processors.cache.persistence.wal) log:712, FsyncModeFileWriteAheadLogManager (org.apache.ignite.internal.processors.cache.persistence.wal) {noformat} Reproducer is attached. was: If WAL archiver is disabled and WALRecord being logged has {{rollOver() == true}}, then {{log()}} call blocks forever in {{FileArchiver}}'s (!) 
method: {noformat} nextAbsoluteSegmentIndex:1707, FsyncModeFileWriteAheadLogManager$FileArchiver (org.apache.ignite.internal.processors.cache.persistence.wal) access$3200:1437, FsyncModeFileWriteAheadLogManager$FileArchiver (org.apache.ignite.internal.processors.cache.persistence.wal) pollNextFile:1384, FsyncModeFileWriteAheadLogManager (org.apache.ignite.internal.processors.cache.persistence.wal) initNextWriteHandle:1243, FsyncModeFileWriteAheadLogManager (org.apache.ignite.internal.processors.cache.persistence.wal) rollOver:1130, FsyncModeFileWriteAheadLogManager (org.apache.ignite.internal.processors.cache.persistence.wal) log:712, FsyncModeFileWriteAheadLogManager (org.apache.ignite.internal.processors.cache.persistence.wal) {noformat} Reporoducer is attached. > FsyncModeFileWriteAheadLogManager can block forever in log() call > - > > Key: IGNITE-9776 > URL: https://issues.apache.org/jira/browse/IGNITE-9776 > Project: Ignite > Issue Type: Bug >Reporter: Andrey Kuznetsov >Priority: Major > Fix For: 2.7 > > Attachments: FsyncWalRolloverDoesNotBlockTest.java > > > If WAL archiver is disabled and WALRecord being logged has {{rollOver() == > true}}, then {{log()}} call blocks forever in {{FileArchiver}}'s (!) 
method: > {noformat} > nextAbsoluteSegmentIndex:1707, FsyncModeFileWriteAheadLogManager$FileArchiver > (org.apache.ignite.internal.processors.cache.persistence.wal) > access$3200:1437, FsyncModeFileWriteAheadLogManager$FileArchiver > (org.apache.ignite.internal.processors.cache.persistence.wal) > pollNextFile:1384, FsyncModeFileWriteAheadLogManager > (org.apache.ignite.internal.processors.cache.persistence.wal) > initNextWriteHandle:1243, FsyncModeFileWriteAheadLogManager > (org.apache.ignite.internal.processors.cache.persistence.wal) > rollOver:1130, FsyncModeFileWriteAheadLogManager > (org.apache.ignite.internal.processors.cache.persistence.wal) > log:712, FsyncModeFileWriteAheadLogManager > (org.apache.ignite.internal.processors.cache.persistence.wal) > {noformat} > Reproducer is attached. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-9744) Fix SYSTEM_WORKER_TERMINATION detection in general case
[ https://issues.apache.org/jira/browse/IGNITE-9744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16637845#comment-16637845 ] Andrey Kuznetsov commented on IGNITE-9744: -- [~ivan.glukos], this change is ready. Could you please take a look? > Fix SYSTEM_WORKER_TERMINATION detection in general case > --- > > Key: IGNITE-9744 > URL: https://issues.apache.org/jira/browse/IGNITE-9744 > Project: Ignite > Issue Type: Bug >Reporter: Andrey Kuznetsov >Assignee: Andrey Kuznetsov >Priority: Major > Fix For: 2.7 > > > All existing critical workers handle unintended termination individually. > This should be done for an arbitrary critical worker as well. There is a test > to check this situation, {{SystemWorkersTerminationTest.testTermination}}, > but it currently passes due to {{SYSTEM_WORKER_BLOCKED}} rather than > {{SYSTEM_WORKER_TERMINATION}}, and this should be fixed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-9737) Ignite WatchDog service should be configurable
[ https://issues.apache.org/jira/browse/IGNITE-9737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16642935#comment-16642935 ] Andrey Kuznetsov commented on IGNITE-9737: -- Tests for the new parameters have been added. TeamCity results for the PR are OK. > Ignite WatchDog service should be configurable > -- > > Key: IGNITE-9737 > URL: https://issues.apache.org/jira/browse/IGNITE-9737 > Project: Ignite > Issue Type: Bug >Reporter: Nikolay Izhikov >Assignee: Andrey Kuznetsov >Priority: Blocker > Fix For: 2.7 > > > At the moment, there is no way to disable the Ignite WatchDog service from config > or a JVM option. > If that feature hits a corner case or a bug, Ignite can become fully unusable > due to an unpredictable shutdown. > We should provide a way to enable/disable this feature from config or from > a JVM option.
[jira] [Created] (IGNITE-9838) TxStateChangeEventTest fails sometimes on TeamCity
Andrey Kuznetsov created IGNITE-9838: Summary: TxStateChangeEventTest fails sometimes on TeamCity Key: IGNITE-9838 URL: https://issues.apache.org/jira/browse/IGNITE-9838 Project: Ignite Issue Type: Test Reporter: Andrey Kuznetsov Fix For: 2.8 Both test methods may fail to acquire a transaction lock. Presumably, increasing the timeout may be enough to fix this.
[jira] [Created] (IGNITE-9860) Unreliable listener invocation order in GridDhtPartitionsExchangeFuture#onDone
Andrey Kuznetsov created IGNITE-9860: Summary: Unreliable listener invocation order in GridDhtPartitionsExchangeFuture#onDone Key: IGNITE-9860 URL: https://issues.apache.org/jira/browse/IGNITE-9860 Project: Ignite Issue Type: Bug Reporter: Andrey Kuznetsov Fix For: 2.8 The listener added right before the {{super.onDone()}} call is intended to be invoked before all other listeners. There is a small probability that this guarantee breaks: another thread can call {{listen()}} before the future-completing thread enters {{super.onDone()}}.
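One generic way to make such an ordering guarantee independent of when other threads register their listeners is to keep the priority listener out of the shared listener list and hand it to the completing call directly. The sketch below only illustrates that idea; {{PriorityListenerFuture}} and all of its methods are hypothetical, it does not mirror {{GridFutureAdapter}}, and it is not necessarily the fix chosen for this issue. For brevity it runs listeners under the monitor, which production code would likely avoid.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical simplified future; it does not mirror Ignite's GridFutureAdapter. */
class PriorityListenerFuture {
    /** Ordinary listeners registered before completion. */
    private final List<Runnable> listeners = new ArrayList<>();

    /** Completion flag, guarded by the monitor of this object. */
    private boolean done;

    /** Registers a listener; runs it right away if the future is already done. */
    synchronized void listen(Runnable lsnr) {
        if (done)
            lsnr.run();
        else
            listeners.add(lsnr);
    }

    /**
     * Completes the future. The priority listener is passed in directly,
     * so it runs first no matter when other threads called listen().
     */
    synchronized void onDone(Runnable priorityLsnr) {
        if (done)
            return;

        done = true;

        priorityLsnr.run();

        for (Runnable lsnr : listeners)
            lsnr.run();

        listeners.clear();
    }

    /** Registers an ordinary listener before completion and returns the call order. */
    static List<String> demo() {
        List<String> order = new ArrayList<>();

        PriorityListenerFuture fut = new PriorityListenerFuture();

        fut.listen(() -> order.add("other"));
        fut.onDone(() -> order.add("priority"));
        fut.listen(() -> order.add("late"));

        return order;
    }
}
```

In this sketch the listener registered before completion still runs after the priority one, because the priority listener never competes for a slot in the shared list.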
[jira] [Commented] (IGNITE-9776) FsyncModeFileWriteAheadLogManager can block forever in log() call
[ https://issues.apache.org/jira/browse/IGNITE-9776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16649444#comment-16649444 ] Andrey Kuznetsov commented on IGNITE-9776: -- [~astelmak], could you please resurrect failing methods in {{WalRolloverTypesTest}} as a part of your PR? > FsyncModeFileWriteAheadLogManager can block forever in log() call > - > > Key: IGNITE-9776 > URL: https://issues.apache.org/jira/browse/IGNITE-9776 > Project: Ignite > Issue Type: Bug >Reporter: Andrey Kuznetsov >Assignee: Alexey Stelmak >Priority: Major > Fix For: 2.7 > > Attachments: FsyncWalRolloverDoesNotBlockTest.java > > > If WAL archiver is disabled and WALRecord being logged has {{rollOver() == > true}}, then {{log()}} call blocks forever in {{FileArchiver}}'s (!) method: > {noformat} > nextAbsoluteSegmentIndex:1707, FsyncModeFileWriteAheadLogManager$FileArchiver > (org.apache.ignite.internal.processors.cache.persistence.wal) > access$3200:1437, FsyncModeFileWriteAheadLogManager$FileArchiver > (org.apache.ignite.internal.processors.cache.persistence.wal) > pollNextFile:1384, FsyncModeFileWriteAheadLogManager > (org.apache.ignite.internal.processors.cache.persistence.wal) > initNextWriteHandle:1243, FsyncModeFileWriteAheadLogManager > (org.apache.ignite.internal.processors.cache.persistence.wal) > rollOver:1130, FsyncModeFileWriteAheadLogManager > (org.apache.ignite.internal.processors.cache.persistence.wal) > log:712, FsyncModeFileWriteAheadLogManager > (org.apache.ignite.internal.processors.cache.persistence.wal) > {noformat} > Reproducer is attached. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-9710) Ignite watchdog service handles longrunning cache creation
[ https://issues.apache.org/jira/browse/IGNITE-9710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16651936#comment-16651936 ] Andrey Kuznetsov commented on IGNITE-9710: -- [~agoncharuk], [~agura], thanks for your comments. I've made all the required changes. I had to update the progress marker for the exchanger worker from threads other than the exchanger thread. Are you OK with this? TeamCity (re)tests are in progress right now. > Ignite watchdog service handles longrunning cache creation > -- > > Key: IGNITE-9710 > URL: https://issues.apache.org/jira/browse/IGNITE-9710 > Project: Ignite > Issue Type: Bug > Components: cache >Affects Versions: 2.6 >Reporter: Vyacheslav Daradur >Assignee: Andrey Kuznetsov >Priority: Major > Fix For: 2.7 > > Attachments: LongRunningCacheCreationTest.java > > > Ignite watchdog service introduced by IGNITE-6587 handles long running cache > creation. > An action in {{GridDhtPartitionsExchangeFuture#init}} may take significant time > and possibly should be covered by a blocking section of the watchdog service. > Reproducer was attached: [^LongRunningCacheCreationTest.java].
[jira] [Commented] (IGNITE-9710) Ignite watchdog service handles longrunning cache creation
[ https://issues.apache.org/jira/browse/IGNITE-9710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16653436#comment-16653436 ] Andrey Kuznetsov commented on IGNITE-9710: -- Don't believe TC Bot this time: the test mentioned is flaky. > Ignite watchdog service handles longrunning cache creation > -- > > Key: IGNITE-9710 > URL: https://issues.apache.org/jira/browse/IGNITE-9710 > Project: Ignite > Issue Type: Bug > Components: cache >Affects Versions: 2.6 >Reporter: Vyacheslav Daradur >Assignee: Andrey Kuznetsov >Priority: Major > Fix For: 2.7 > > Attachments: LongRunningCacheCreationTest.java > > > Ignite watchdog service introduced by IGNITE-6587 handles long running cache > creation. > An action in {{GridDhtPartitionsExchangeFuture#init}} may take significant time > and possibly should be covered by a blocking section of the watchdog service. > Reproducer was attached: [^LongRunningCacheCreationTest.java].
[jira] [Created] (IGNITE-9932) Exchanger blocking session bounds can be accessed from invalid thread
Andrey Kuznetsov created IGNITE-9932: Summary: Exchanger blocking session bounds can be accessed from invalid thread Key: IGNITE-9932 URL: https://issues.apache.org/jira/browse/IGNITE-9932 Project: Ignite Issue Type: Bug Reporter: Andrey Kuznetsov Assignee: Andrey Kuznetsov

{{GridDhtPartitionsExchangeFuture}} uses critical sections delimited by {{exchangerBlockingSectionBegin}} and {{exchangerBlockingSectionEnd}}. Currently, these begin/end bounds assert that they are called from the partition-exchanger thread. It turned out that this assertion can fail in legitimate scenarios, so it is better to make the begin/end bounds a no-op unless they are called from the partition-exchanger thread.

{{IgniteStableBaselineBinObjFieldsQuerySelfTest#testQueryReplicatedTransactional}} may hang due to this issue, see [1]. The exception stack trace leading to the critical failure follows.

{noformat}
java.lang.AssertionError
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.exchangerBlockingSectionBegin(GridCachePartitionExchangeManager.java:2351)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.waitUntilNewCachesAreRegistered(GridDhtPartitionsExchangeFuture.java:2261)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onDone(GridDhtPartitionsExchangeFuture.java:2066)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.processFullMessage(GridDhtPartitionsExchangeFuture.java:3980)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.access$2100(GridDhtPartitionsExchangeFuture.java:141)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$7.apply(GridDhtPartitionsExchangeFuture.java:3667)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$7.apply(GridDhtPartitionsExchangeFuture.java:3655)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:385)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:355)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onReceiveFullMessage(GridDhtPartitionsExchangeFuture.java:3655)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.processFullPartitionUpdate(GridCachePartitionExchangeManager.java:1655)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$3.onMessage(GridCachePartitionExchangeManager.java:393)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$3.onMessage(GridCachePartitionExchangeManager.java:380)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:3178)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:3157)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1056)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:581)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:380)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:306)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:101)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:295)
    at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1569)
    at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1197)
    at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:127)
    at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1093)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
{noformat}

[1] https://ci.ignite.apache.org/viewLog.html?buildId=2111470&tab=buildResultsDiv&buildTypeId=IgniteTests24Java8_BinaryObjectsSimpleMapperQueries
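The change proposed in IGNITE-9932, turning the begin/end bounds into a no-op for foreign threads instead of asserting, can be sketched roughly as follows. {{BlockingSectionGuard}}, its fields and its {{demo()}} helper are hypothetical names used only for illustration; the real methods live in {{GridCachePartitionExchangeManager}} and may look quite different.

```java
/** Hypothetical sketch; names do not mirror GridCachePartitionExchangeManager. */
class BlockingSectionGuard {
    /** The only thread allowed to change the blocking-section state. */
    private final Thread owner;

    /** Whether the owner thread is currently inside a blocking section. */
    private volatile boolean inBlockingSection;

    BlockingSectionGuard(Thread owner) {
        this.owner = owner;
    }

    /** Marks the start of a blocking section; ignored when called from a foreign thread. */
    void blockingSectionBegin() {
        if (Thread.currentThread() != owner)
            return; // No-op instead of an assertion failure.

        inBlockingSection = true;
    }

    /** Marks the end of a blocking section; ignored when called from a foreign thread. */
    void blockingSectionEnd() {
        if (Thread.currentThread() != owner)
            return;

        inBlockingSection = false;
    }

    boolean inBlockingSection() {
        return inBlockingSection;
    }

    /** Returns the flag after a foreign begin, after the owner's begin, and after the owner's end. */
    static boolean[] demo() {
        BlockingSectionGuard guard = new BlockingSectionGuard(Thread.currentThread());

        Thread foreign = new Thread(guard::blockingSectionBegin);

        foreign.start();

        try {
            foreign.join();
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();

            throw new IllegalStateException(e);
        }

        boolean afterForeign = guard.inBlockingSection();

        guard.blockingSectionBegin();

        boolean afterOwner = guard.inBlockingSection();

        guard.blockingSectionEnd();

        return new boolean[] {afterForeign, afterOwner, guard.inBlockingSection()};
    }
}
```

With this shape, a call arriving from a message-processing thread, as in the stack trace above, would simply leave the state untouched.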
[jira] [Assigned] (IGNITE-9679) Document critical workers liveness checking implementation
[ https://issues.apache.org/jira/browse/IGNITE-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Kuznetsov reassigned IGNITE-9679: Assignee: (was: Andrey Kuznetsov) > Document critical workers liveness checking implementation > -- > > Key: IGNITE-9679 > URL: https://issues.apache.org/jira/browse/IGNITE-9679 > Project: Ignite > Issue Type: Task > Components: documentation >Reporter: Andrey Kuznetsov >Priority: Major > Fix For: 2.7 > > > Newly implemented critical worker thread liveness checks should be mentioned > in Ignite Documentation. Brief description of the functionality follows. > Ignite node has a number of critical worker threads that should be alive and > responsive, otherwise node's health is not guaranteed. These threads monitor > each other periodically and track two aspects for a thread being checked: > - whether it's alive; > - whether it updates its internal heartbeat timestamp. > Both checks use {{IgniteConfiguration.failureDetectionTimeout}} property as a > threshold value. > Whenever at least one of the above conditions is violated, checker thread > logs the error and calls currently configured {{FailureHandler}}. > Liveness checks are enabled by default, but can be disabled through > {{WorkersControlMXBean.healthMonitoringEnabled}} property. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-9679) Document critical workers liveness checking implementation
[ https://issues.apache.org/jira/browse/IGNITE-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Kuznetsov updated IGNITE-9679: - Description: Newly implemented critical worker thread liveness checks should be mentioned in Ignite Documentation. Brief description of the functionality follows. Ignite node has a number of critical worker threads that should be alive and responsive, otherwise node's health is not guaranteed. These threads monitor each other periodically and track two aspects for a thread being checked: - whether it's alive; - whether it updates its internal heartbeat timestamp. Whenever at least one of the above conditions is violated, checker thread logs the error and calls currently configured {{FailureHandler}}. {{IgniteConfiguration.SystemWorkerBlockedTimeout}} configuration property affects monitoring behavior. At runtime monitoring settings can be changed via {{FailureHandlingMxBean}}. By default, liveness checks are enabled, but blocked system worker detection will not lead to failure handler invocation, see {{FailureProcessor#getDefaultFailureHandler}} . was: Newly implemented critical worker thread liveness checks should be mentioned in Ignite Documentation. Brief description of the functionality follows. Ignite node has a number of critical worker threads that should be alive and responsive, otherwise node's health is not guaranteed. These threads monitor each other periodically and track two aspects for a thread being checked: - whether it's alive; - whether it updates its internal heartbeat timestamp. Both checks use {{IgniteConfiguration.failureDetectionTimeout}} property as a threshold value. Whenever at least one of the above conditions is violated, checker thread logs the error and calls currently configured {{FailureHandler}}. Liveness checks are enabled by default, but can be disabled through {{WorkersControlMXBean.healthMonitoringEnabled}} property. 
> Document critical workers liveness checking implementation > -- > > Key: IGNITE-9679 > URL: https://issues.apache.org/jira/browse/IGNITE-9679 > Project: Ignite > Issue Type: Task > Components: documentation >Reporter: Andrey Kuznetsov >Priority: Major > Fix For: 2.7 > > > Newly implemented critical worker thread liveness checks should be mentioned > in Ignite Documentation. Brief description of the functionality follows. > Ignite node has a number of critical worker threads that should be alive and > responsive, otherwise node's health is not guaranteed. These threads monitor > each other periodically and track two aspects for a thread being checked: > - whether it's alive; > - whether it updates its internal heartbeat timestamp. > Whenever at least one of the above conditions is violated, checker thread > logs the error and calls currently configured {{FailureHandler}}. > {{IgniteConfiguration.SystemWorkerBlockedTimeout}} configuration property > affects monitoring behavior. At runtime monitoring settings can be changed > via {{FailureHandlingMxBean}}. > By default, liveness checks are enabled, but blocked system worker detection > will not lead to failure handler invocation, see > {{FailureProcessor#getDefaultFailureHandler}} . -- This message was sent by Atlassian JIRA (v7.6.3#76005)
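The two checks described above (is the thread alive, and does it keep updating its heartbeat timestamp) can be illustrated with a minimal sketch. All names here ({{WorkerLivenessSketch}}, {{Status}}, {{check}}) are hypothetical and are not Ignite's API; the status names merely echo the {{SYSTEM_WORKER_TERMINATION}} and {{SYSTEM_WORKER_BLOCKED}} failure types mentioned in IGNITE-9744, and timestamps are passed in explicitly to keep the example deterministic.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of a per-worker liveness check; not Ignite's implementation. */
class WorkerLivenessSketch {
    /** Outcome of a single check. */
    enum Status { OK, TERMINATED, BLOCKED }

    /** Tells whether the worker thread is still alive (for example, Thread::isAlive). */
    private final BooleanSupplier alive;

    /** Last heartbeat timestamp reported by the worker, in milliseconds. */
    private volatile long heartbeatTs;

    WorkerLivenessSketch(BooleanSupplier alive, long startTs) {
        this.alive = alive;
        this.heartbeatTs = startTs;
    }

    /** Called by the worker itself to show that it is making progress. */
    void updateHeartbeat(long nowMs) {
        heartbeatTs = nowMs;
    }

    /** Called by a checker thread; a non-OK result would be logged and passed to a failure handler. */
    Status check(long nowMs, long blockedTimeoutMs) {
        if (!alive.getAsBoolean())
            return Status.TERMINATED;

        if (nowMs - heartbeatTs > blockedTimeoutMs)
            return Status.BLOCKED;

        return Status.OK;
    }
}
```

A checker would call {{check}} periodically for every monitored worker, using the configured blocked-worker timeout as the threshold.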
[jira] [Commented] (IGNITE-9679) Document critical workers liveness checking implementation
[ https://issues.apache.org/jira/browse/IGNITE-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16655548#comment-16655548 ] Andrey Kuznetsov commented on IGNITE-9679: -- [~Artem Budnikov], this issue is not blocked anymore, your help with it is appreciated. > Document critical workers liveness checking implementation > -- > > Key: IGNITE-9679 > URL: https://issues.apache.org/jira/browse/IGNITE-9679 > Project: Ignite > Issue Type: Task > Components: documentation >Reporter: Andrey Kuznetsov >Priority: Major > Fix For: 2.7 > > > Newly implemented critical worker thread liveness checks should be mentioned > in Ignite Documentation. Brief description of the functionality follows. > Ignite node has a number of critical worker threads that should be alive and > responsive, otherwise node's health is not guaranteed. These threads monitor > each other periodically and track two aspects for a thread being checked: > - whether it's alive; > - whether it updates its internal heartbeat timestamp. > Whenever at least one of the above conditions is violated, checker thread > logs the error and calls currently configured {{FailureHandler}}. > {{IgniteConfiguration.SystemWorkerBlockedTimeout}} configuration property > affects monitoring behavior. At runtime monitoring settings can be changed > via {{FailureHandlingMxBean}}. > By default, liveness checks are enabled, but blocked system worker detection > will not lead to failure handler invocation, see > {{FailureProcessor#getDefaultFailureHandler}} . -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-9695) Add a way to prevent WAL disabling in WalStateManager
[ https://issues.apache.org/jira/browse/IGNITE-9695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Kuznetsov updated IGNITE-9695: - Summary: Add a way to prevent WAL disabling in WalStateManager (was: Add a way to prevent per-cache WAL disabling in WalStateManager) > Add a way to prevent WAL disabling in WalStateManager > - > > Key: IGNITE-9695 > URL: https://issues.apache.org/jira/browse/IGNITE-9695 > Project: Ignite > Issue Type: Task >Affects Versions: 2.6 >Reporter: Andrey Kuznetsov >Assignee: Andrey Kuznetsov >Priority: Major > Fix For: 2.8 > > > When this prevention is on, {{WalStateManager.init()}} should return an > error-holding future immediately. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
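The requested behavior, returning an error-holding future immediately while the prevention is on, might look like the sketch below. It uses the JDK's {{CompletableFuture}} rather than Ignite's internal future types, and {{WalStateSketch}}, {{preventDisabling}} and {{changeWalMode}} are hypothetical names that stand in for whatever {{WalStateManager}} actually exposes.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical sketch; it does not reproduce WalStateManager's real signatures. */
class WalStateSketch {
    /** When true, requests to disable WAL are rejected. */
    private volatile boolean disablingPrevented;

    void preventDisabling(boolean prevent) {
        disablingPrevented = prevent;
    }

    /** Starts a WAL mode change; fails fast if disabling is currently prevented. */
    CompletableFuture<Boolean> changeWalMode(boolean enable) {
        if (disablingPrevented && !enable) {
            // Error-holding future, returned without starting any distributed work.
            CompletableFuture<Boolean> failed = new CompletableFuture<>();

            failed.completeExceptionally(new IllegalStateException("WAL disabling is prevented"));

            return failed;
        }

        // Placeholder for the real, asynchronous mode change.
        return CompletableFuture.completedFuture(true);
    }
}
```

Callers can then treat the rejection like any other failed mode change, for example by checking {{isCompletedExceptionally()}} or by handling the exception when the future is joined.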