[jira] [Commented] (IGNITE-7338) can get value by entry.getValue but can't get value by cache.get(entry.getKey)

2017-12-29 Thread Andrey Kuznetsov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-7338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16306078#comment-16306078
 ] 

Andrey Kuznetsov commented on IGNITE-7338:
--

At the very least, the cache key class (CcbCdrChargeRuleKey) is needed to understand 
the issue. The cache and cluster configuration could also be useful.

> can get value by entry.getValue but can't get value by 
> cache.get(entry.getKey)
> ---
>
> Key: IGNITE-7338
> URL: https://issues.apache.org/jira/browse/IGNITE-7338
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Affects Versions: 1.9
>Reporter: dean
>Priority: Critical
>
> bossapp@Linux-Power-NB-AltiDB2-XT:/home/bossapp6$java -version
> java version "1.7.0"
> Java(TM) SE Runtime Environment (build pxp6470_27sr4fp15-20171116_01(SR4 
> FP15))
> IBM J9 VM (build 2.7, JRE 1.7.0 Linux ppc64-64 Compressed References 
> 20171011_366929 (JIT enabled, AOT enabled)
> J9VM - R27_Java727_SR4_20171011_1720_B366929
> JIT  - tr.r13.java_20171011_366929
> GC   - R27_Java727_SR4_20171011_1720_B366929_CMPRSS
> J9CL - 20171011_366929)
> JCL - 20171109_01 based on Oracle jdk7u161-b13
> bossapp@Linux-Power-NB-AltiDB2-XT:/home/bossapp6$lsb_release -a
> LSB Version:  
> :core-4.0-noarch:core-4.0-ppc64:graphics-4.0-noarch:graphics-4.0-ppc64:printing-4.0-noarch:printing-4.0-ppc64
> Distributor ID:   n/a
> Description:  redhat-4 
> Release:  n/a
> Codename: n/a
> {color:#d04437}test code:{color}
> {code:java}
> if (cacheName.equalsIgnoreCase("tariff-ccb_cdr_charge_rule")) {
>     CcbCdrChargeRuleKey ccbCdrChargeRuleKey = new CcbCdrChargeRuleKey();
>     ccbCdrChargeRuleKey.setFileType("573");
>     ccbCdrChargeRuleKey.setSourceType("5");
>     Object cacheDate = ignite.cache(cacheName).get(ccbCdrChargeRuleKey);
>     logger.debug(LogProperty.LOGTYPE_DETAIL, ccbCdrChargeRuleKey + "Object eKey:" + cacheDate);
>     ccbCdrChargeRuleKey.setFileType("1233");
>     ccbCdrChargeRuleKey.setSourceType("14");
>     Object cacheDate1 = ignite.cache(cacheName).get(ccbCdrChargeRuleKey);
>     logger.debug(LogProperty.LOGTYPE_DETAIL, "Object1:" + cacheDate1
>         + "ccbCdrChargeRuleKey" + ccbCdrChargeRuleKey.hashCode());
>     IgniteCache cacheDate3 = ignite.cache(cacheName);
>     for (Cache.Entry e : cacheDate3) {
>         // Re-read each entry by its own key; per the results below, get() returns null here.
>         List cacheModelList = (List) cacheDate3.get(e.getKey());
>         logger.debug(LogProperty.LOGTYPE_DETAIL, "e.getKey():" + e.getKey()
>             + " cacheModelList:" + cacheModelList);
>         logger.debug(LogProperty.LOGTYPE_DETAIL, "e.getValue():" + e.getValue());
>     }
> }
> {code}
> results:
> 2017-12-2819:17:36,322||DEBUG||frame_thread_nodestart| 
> com.newland.boss.cloud.commons.igniteclient.PlatformInitIgniteClient.test(PlatformInitIgniteClient.java:337)|
>  {color:#d04437}e.getKey():573,5 cacheModelList:null{color}
> 2017-12-2819:17:36,323||DEBUG||frame_thread_nodestart| 
> com.newland.boss.cloud.commons.igniteclient.PlatformInitIgniteClient.test(PlatformInitIgniteClient.java:338)|
>  
> {color:#d04437}e.getValue():[CcbCdrChargeRule{ccbCdrChargeRuleKey=573,5,
>  bizDomainCode='3', conditionGroupId=, fileType='573', 
> preProcessUnitClass='PreProcessGprs', priority=1, rateItemTypes='6', 
> ratingClass='RatingGprs', ruleDesc='国际出访GPRS 专网', sourceType='5', 
> userTariffClass='GetUserTariffInfoGprs', version='0.0.1'}]{color}
> (ruleDesc '国际出访GPRS 专网' translates as "international outbound roaming GPRS, private network".)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (IGNITE-5553) Ignite PDS 2: IgnitePersistentStoreDataStructuresTest testSet assertion error

2018-01-10 Thread Andrey Kuznetsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-5553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Kuznetsov updated IGNITE-5553:
-
Priority: Major  (was: Critical)

> Ignite PDS 2: IgnitePersistentStoreDataStructuresTest testSet assertion error
> -
>
> Key: IGNITE-5553
> URL: https://issues.apache.org/jira/browse/IGNITE-5553
> Project: Ignite
>  Issue Type: Bug
>  Components: data structures, persistence
>Affects Versions: 2.1
>Reporter: Dmitriy Pavlov
>Assignee: Andrey Kuznetsov
>  Labels: MakeTeamcityGreenAgain, Muted_test, test-fail
>
> h2. Notes-4435
> When IgniteSet is restored from persistence, the set size is always 0, [link 
> to test 
> history|http://ci.ignite.apache.org/project.html?projectId=Ignite20Tests&testNameId=-7043871603266099589&tab=testDetails].
> h2. Detailed description
> Unlike *IgniteQueue*, which uses a separate cache key to store its size, 
> *IgniteSet* stores it in a field of some class.
> The test from the link above clearly shows that after the memory state is restored 
> from PDS, all set values are restored correctly but the size is lost.
> h2. Proposed solution
> One possible solution is to do the same thing as *IgniteQueue* does: 
> the size of *IgniteSet* must be stored in the cache instead of in volatile in-memory 
> fields of arbitrary classes.
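A minimal Java sketch of the proposed solution, assuming a plain `ConcurrentMap` as a stand-in for the persistent cache. The class `CacheBackedSet` and its reserved `Header.SIZE` key are hypothetical, not Ignite API; the point is only that the size lives next to the elements, so whatever restores the elements also restores the size.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Hypothetical sketch: a set that keeps its size in the backing map under a
 * reserved header key instead of a volatile in-memory field. The ConcurrentMap
 * stands in for a persistent cache; this is not Ignite's IgniteSet implementation.
 */
class CacheBackedSet<T> {
    /** Reserved key type, so the size entry can never collide with a set element. */
    private enum Header { SIZE }

    private final ConcurrentMap<Object, Object> cache;

    CacheBackedSet(ConcurrentMap<Object, Object> cache) {
        this.cache = cache;
        // Initialize the size only if the store does not already hold one,
        // so a size restored together with the elements is kept.
        cache.putIfAbsent(Header.SIZE, 0);
    }

    boolean add(T item) {
        if (cache.putIfAbsent(item, Boolean.TRUE) != null)
            return false; // already present: size unchanged
        // NB: element write and size update are two separate steps here;
        // a real cache would need a transaction to keep them consistent.
        cache.merge(Header.SIZE, 1, (a, b) -> (Integer) a + (Integer) b);
        return true;
    }

    boolean remove(T item) {
        if (cache.remove(item) == null)
            return false;
        cache.merge(Header.SIZE, -1, (a, b) -> (Integer) a + (Integer) b);
        return true;
    }

    int size() {
        return (Integer) cache.get(Header.SIZE);
    }
}
```

Creating a second `CacheBackedSet` over the same map (simulating a restore) reports the same size, which is exactly what the in-memory field loses.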





[jira] [Assigned] (IGNITE-7386) Get rid of LongAdder8, ConcurrentHashMap8, etc

2018-01-15 Thread Andrey Kuznetsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Kuznetsov reassigned IGNITE-7386:


Assignee: Andrey Kuznetsov  (was: Aleksey Plekhanov)

> Get rid of LongAdder8, ConcurrentHashMap8, etc
> --
>
> Key: IGNITE-7386
> URL: https://issues.apache.org/jira/browse/IGNITE-7386
> Project: Ignite
>  Issue Type: Task
>Reporter: Anton Vinogradov
>Assignee: Andrey Kuznetsov
>Priority: Major
> Fix For: 2.5
>
>
> Since we're dropping java7 support there is no need now to use 
> {{LongAdder8}}, {{ConcurrentHashMap8}}, ...
> We should remove all classes from {{org.jsr166}} namespace and use 
> corresponding classes from jdk8.
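For reference, JDK 8 ships standard classes covering the backports named here: `java.util.concurrent.atomic.LongAdder`, `java.util.concurrent.ConcurrentHashMap` and `java.util.concurrent.ThreadLocalRandom`. A minimal illustrative sketch of the counter and random-number use cases on the JDK classes (the class and method names are made up for the example):

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.LongAdder;

/** Illustrative only: JDK 8 counterparts of the org.jsr166 backports. */
class Jdk8ConcurrencyDemo {
    /** Increments a shared LongAdder from several threads and returns the total. */
    static long countInParallel(int threads, int incrementsPerThread) {
        LongAdder counter = new LongAdder(); // JDK class replacing org.jsr166.LongAdder8
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++)
                    counter.increment();
            });
            workers[i].start();
        }
        try {
            for (Thread w : workers)
                w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return counter.sum();
    }

    /** Picks a random index using the per-thread generator from java.util.concurrent. */
    static int randomIndex(int size) {
        return ThreadLocalRandom.current().nextInt(size);
    }
}
```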



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-6711) DataRegionMetrics#totalAllocatedPages is not valid after node restart

2018-01-19 Thread Andrey Kuznetsov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16332540#comment-16332540
 ] 

Andrey Kuznetsov commented on IGNITE-6711:
--

[~avinogradov], it looks more elegant now, after you removed the redundant mapping 
from data regions to their allocation counters.

> DataRegionMetrics#totalAllocatedPages is not valid after node restart
> -
>
> Key: IGNITE-6711
> URL: https://issues.apache.org/jira/browse/IGNITE-6711
> Project: Ignite
>  Issue Type: Bug
>  Components: general
>Affects Versions: 2.2
>Reporter: Alexey Goncharuk
>Assignee: Andrey Kuznetsov
>Priority: Major
>  Labels: iep-6, newbie
> Fix For: 2.4
>
>
> Currently, the data region metric tracks total allocated pages via a callback on 
> page allocation. However, when a node with persistence enabled is started, 
> some pages are already allocated, which leads to an incorrect metric 
> value.
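One way to keep such a metric correct across restarts is to seed the counter with the number of pages already present in storage before the allocation callback starts counting. A minimal sketch with hypothetical names (this is not Ignite's `DataRegionMetricsImpl`):

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical allocated-pages metric; not Ignite's implementation. */
class AllocatedPagesMetric {
    private final LongAdder totalAllocatedPages = new LongAdder();

    /** Called once on node start with the page count found in persistent storage. */
    void initFromStorage(long pagesAlreadyAllocated) {
        totalAllocatedPages.add(pagesAlreadyAllocated);
    }

    /** Allocation callback: only sees pages allocated after the node started. */
    void onPageAllocated() {
        totalAllocatedPages.increment();
    }

    long getTotalAllocatedPages() {
        return totalAllocatedPages.sum();
    }
}
```

Without the `initFromStorage` step, the metric reports only pages allocated since start, which is the symptom described above.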





[jira] [Comment Edited] (IGNITE-6711) DataRegionMetrics#totalAllocatedPages is not valid after node restart

2018-01-19 Thread Andrey Kuznetsov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16332540#comment-16332540
 ] 

Andrey Kuznetsov edited comment on IGNITE-6711 at 1/19/18 5:19 PM:
---

[~avinogradov], it looks more elegant now, after you removed redundant mapping 
from data regions to their allocation counters.


was (Author: andrey-kuznetsov):
[~avinogradov], it look more elegant now, after you removed redundant mapping 
from data regions to their allocation counters.

> DataRegionMetrics#totalAllocatedPages is not valid after node restart
> -
>
> Key: IGNITE-6711
> URL: https://issues.apache.org/jira/browse/IGNITE-6711
> Project: Ignite
>  Issue Type: Bug
>  Components: general
>Affects Versions: 2.2
>Reporter: Alexey Goncharuk
>Assignee: Andrey Kuznetsov
>Priority: Major
>  Labels: iep-6, newbie
> Fix For: 2.4
>
>
> Currently, the data region metric tracks total allocated pages via a callback on 
> page allocation. However, when a node with persistence enabled is started, 
> some pages are already allocated, which leads to an incorrect metric 
> value.





[jira] [Created] (IGNITE-7491) Documentation: add new data region metrics description

2018-01-22 Thread Andrey Kuznetsov (JIRA)
Andrey Kuznetsov created IGNITE-7491:


 Summary: Documentation: add new data region metrics description
 Key: IGNITE-7491
 URL: https://issues.apache.org/jira/browse/IGNITE-7491
 Project: Ignite
  Issue Type: Task
  Components: documentation
Reporter: Andrey Kuznetsov
Assignee: Denis Magda
 Fix For: 2.4


Newly created data region metrics should be documented.

* `getTotalAllocatedSize` -- same as `getTotalAllocatedPages` but in bytes.
* `getPhysicalMemorySize` -- same as `getPhysicalMemoryPages` but in bytes.
* `getCheckpointBufferPages` -- gets the checkpoint buffer size in pages.
* `getCheckpointBufferSize` -- gets the checkpoint buffer size in bytes.
* `getPageSize` -- gets the memory page size in bytes.
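Assuming each byte-valued metric is simply its page-valued counterpart multiplied by the page size, the relationship looks like this. `RegionSizes` is a hypothetical illustration, not the `DataRegionMetrics` interface itself:

```java
/** Hypothetical holder showing how the byte metrics relate to the page metrics. */
class RegionSizes {
    private final int pageSize; // bytes per memory page
    private final long totalAllocatedPages;
    private final long physicalMemoryPages;
    private final long checkpointBufferPages;

    RegionSizes(int pageSize, long totalAllocatedPages,
                long physicalMemoryPages, long checkpointBufferPages) {
        this.pageSize = pageSize;
        this.totalAllocatedPages = totalAllocatedPages;
        this.physicalMemoryPages = physicalMemoryPages;
        this.checkpointBufferPages = checkpointBufferPages;
    }

    int getPageSize() { return pageSize; }

    // Each size is computed in long arithmetic: pages * bytes-per-page.
    long getTotalAllocatedSize() { return totalAllocatedPages * pageSize; }

    long getPhysicalMemorySize() { return physicalMemoryPages * pageSize; }

    long getCheckpointBufferSize() { return checkpointBufferPages * pageSize; }
}
```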





[jira] [Updated] (IGNITE-7386) Get rid of org.jsr166.LongAdder8, org.jsr166.ThreadLocalRandom

2018-01-24 Thread Andrey Kuznetsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Kuznetsov updated IGNITE-7386:
-
Summary: Get rid of org.jsr166.LongAdder8, org.jsr166.ThreadLocalRandom  
(was: Get rid of LongAdder8, ConcurrentHashMap8, etc)

> Get rid of org.jsr166.LongAdder8, org.jsr166.ThreadLocalRandom
> --
>
> Key: IGNITE-7386
> URL: https://issues.apache.org/jira/browse/IGNITE-7386
> Project: Ignite
>  Issue Type: Task
>Reporter: Anton Vinogradov
>Assignee: Andrey Kuznetsov
>Priority: Major
> Fix For: 2.5
>
>
> Since we're dropping java7 support there is no need now to use 
> {{LongAdder8}}, {{ConcurrentHashMap8}}, ...
> We should remove all classes from {{org.jsr166}} namespace and use 
> corresponding classes from jdk8.





[jira] [Updated] (IGNITE-7386) Get rid of org.jsr166.LongAdder8, org.jsr166.ThreadLocalRandom

2018-01-24 Thread Andrey Kuznetsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Kuznetsov updated IGNITE-7386:
-
Description: Since we're switching to java8 there is no need now to use 
these classes anymore.  (was: Since we're dropping java7 support there is no 
need now to use {{LongAdder8}}, {{ConcurrentHashMap8}}, ...

We should remove all classes from {{org.jsr166}} namespace and use 
corresponding classes from jdk8.)

> Get rid of org.jsr166.LongAdder8, org.jsr166.ThreadLocalRandom
> --
>
> Key: IGNITE-7386
> URL: https://issues.apache.org/jira/browse/IGNITE-7386
> Project: Ignite
>  Issue Type: Task
>Reporter: Anton Vinogradov
>Assignee: Andrey Kuznetsov
>Priority: Major
> Fix For: 2.5
>
>
> Since we're switching to java8 there is no need now to use these classes 
> anymore.





[jira] [Created] (IGNITE-7513) Get rid of org.jsr166.ConcurrentHashMap8

2018-01-24 Thread Andrey Kuznetsov (JIRA)
Andrey Kuznetsov created IGNITE-7513:


 Summary: Get rid of org.jsr166.ConcurrentHashMap8
 Key: IGNITE-7513
 URL: https://issues.apache.org/jira/browse/IGNITE-7513
 Project: Ignite
  Issue Type: Task
Reporter: Andrey Kuznetsov
Assignee: Andrey Kuznetsov
 Fix For: 2.5


This class was made of ConcurrentHashMapV8, an intermediate implementation of 
Java8's ConcurrentHashMap. Now we should switch to standard CHM. Possibly, 
we'll have to check for performance implications.
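The standard JDK 8 `java.util.concurrent.ConcurrentHashMap` provides atomic compound operations such as `merge` and `computeIfAbsent`, plus the `long`-valued `mappingCount()`. A small illustrative example on the standard class (not taken from the Ignite code base); any performance comparison with the old class would still have to be measured:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative use of the standard JDK 8 ConcurrentHashMap. */
class ChmExample {
    /** Counts occurrences atomically with merge(). */
    static ConcurrentHashMap<String, Long> countWords(String... words) {
        ConcurrentHashMap<String, Long> counts = new ConcurrentHashMap<>();
        for (String w : words)
            counts.merge(w, 1L, Long::sum);
        return counts;
    }

    /** Groups words by first letter; computeIfAbsent creates each bucket at most once. */
    static ConcurrentHashMap<Character, Set<String>> groupByFirstLetter(String... words) {
        ConcurrentHashMap<Character, Set<String>> groups = new ConcurrentHashMap<>();
        for (String w : words)
            groups.computeIfAbsent(w.charAt(0), k -> ConcurrentHashMap.newKeySet()).add(w);
        return groups;
    }
}
```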





[jira] [Updated] (IGNITE-7513) Get rid of org.jsr166.ConcurrentHashMap8

2018-01-24 Thread Andrey Kuznetsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Kuznetsov updated IGNITE-7513:
-
Description: This class was made of ConcurrentHashMapV8, an intermediate 
implementation of Java8's ConcurrentHashMap. Now we should switch to standard 
CHM. Possibly, we'll have to check for performance implications.  (was: This 
class was made of ConcurrentHashMapV8, an intermadiate implementation of 
Java8's ConcurrentHashMap. Now we should switch to standard CHM. Possibly, 
we'll have to check for performance implications.)

> Get rid of org.jsr166.ConcurrentHashMap8
> 
>
> Key: IGNITE-7513
> URL: https://issues.apache.org/jira/browse/IGNITE-7513
> Project: Ignite
>  Issue Type: Task
>Reporter: Andrey Kuznetsov
>Assignee: Andrey Kuznetsov
>Priority: Major
> Fix For: 2.5
>
>
> This class was made of ConcurrentHashMapV8, an intermediate implementation of 
> Java8's ConcurrentHashMap. Now we should switch to standard CHM. Possibly, 
> we'll have to check for performance implications.





[jira] [Updated] (IGNITE-7513) Get rid of org.jsr166.ConcurrentHashMap8

2018-01-24 Thread Andrey Kuznetsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Kuznetsov updated IGNITE-7513:
-
Issue Type: Sub-task  (was: Task)
Parent: IGNITE-7386

> Get rid of org.jsr166.ConcurrentHashMap8
> 
>
> Key: IGNITE-7513
> URL: https://issues.apache.org/jira/browse/IGNITE-7513
> Project: Ignite
>  Issue Type: Sub-task
>Reporter: Andrey Kuznetsov
>Assignee: Andrey Kuznetsov
>Priority: Major
> Fix For: 2.5
>
>
> This class was made of ConcurrentHashMapV8, an intermediate implementation of 
> Java8's ConcurrentHashMap. Now we should switch to standard CHM. Possibly, 
> we'll have to check for performance implications.





[jira] [Updated] (IGNITE-7386) Get rid of LongAdder8, ConcurrentHashMap8, etc

2018-01-24 Thread Andrey Kuznetsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Kuznetsov updated IGNITE-7386:
-
Summary: Get rid of LongAdder8, ConcurrentHashMap8, etc  (was: Get rid of 
org.jsr166.LongAdder8, org.jsr166.ThreadLocalRandom)

> Get rid of LongAdder8, ConcurrentHashMap8, etc
> --
>
> Key: IGNITE-7386
> URL: https://issues.apache.org/jira/browse/IGNITE-7386
> Project: Ignite
>  Issue Type: Task
>Reporter: Anton Vinogradov
>Assignee: Andrey Kuznetsov
>Priority: Major
> Fix For: 2.5
>
>
> Since we're switching to java8 there is no need now to use these classes 
> anymore.





[jira] [Updated] (IGNITE-7386) Get rid of LongAdder8, ConcurrentHashMap8, etc

2018-01-24 Thread Andrey Kuznetsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Kuznetsov updated IGNITE-7386:
-
Description: 
Since we're dropping java7 support there is no need now to use 
{{LongAdder8}}, {{ConcurrentHashMap8}}, ... 

We should remove all classes from {{org.jsr166}} namespace and use 
corresponding classes from jdk8.

  was:Since we're switching to java8 there is no need now to use these classes 
anymore.


> Get rid of LongAdder8, ConcurrentHashMap8, etc
> --
>
> Key: IGNITE-7386
> URL: https://issues.apache.org/jira/browse/IGNITE-7386
> Project: Ignite
>  Issue Type: Task
>Reporter: Anton Vinogradov
>Assignee: Andrey Kuznetsov
>Priority: Major
> Fix For: 2.5
>
>
> Since we're dropping java7 support there is no need now to use 
> {{LongAdder8}}, {{ConcurrentHashMap8}}, ... 
> 
> We should remove all classes from {{org.jsr166}} namespace and use 
> corresponding classes from jdk8.





[jira] [Created] (IGNITE-7516) Get rid of org.jsr166.ConcurrentLinkedHashMap

2018-01-24 Thread Andrey Kuznetsov (JIRA)
Andrey Kuznetsov created IGNITE-7516:


 Summary: Get rid of org.jsr166.ConcurrentLinkedHashMap
 Key: IGNITE-7516
 URL: https://issues.apache.org/jira/browse/IGNITE-7516
 Project: Ignite
  Issue Type: Sub-task
Reporter: Andrey Kuznetsov
Assignee: Andrey Kuznetsov
 Fix For: 2.5








[jira] [Created] (IGNITE-7517) Get rid of org.jsr166.ConcurrentLinkedDeque8

2018-01-24 Thread Andrey Kuznetsov (JIRA)
Andrey Kuznetsov created IGNITE-7517:


 Summary: Get rid of org.jsr166.ConcurrentLinkedDeque8
 Key: IGNITE-7517
 URL: https://issues.apache.org/jira/browse/IGNITE-7517
 Project: Ignite
  Issue Type: Sub-task
Reporter: Andrey Kuznetsov
Assignee: Andrey Kuznetsov
 Fix For: 2.5








[jira] [Created] (IGNITE-7518) Get rid of org.jsr166.LongAdder8, org.jsr166.ThreadLocalRandom

2018-01-24 Thread Andrey Kuznetsov (JIRA)
Andrey Kuznetsov created IGNITE-7518:


 Summary: Get rid of org.jsr166.LongAdder8, 
org.jsr166.ThreadLocalRandom
 Key: IGNITE-7518
 URL: https://issues.apache.org/jira/browse/IGNITE-7518
 Project: Ignite
  Issue Type: Sub-task
Reporter: Andrey Kuznetsov
Assignee: Andrey Kuznetsov
 Fix For: 2.5








[jira] [Commented] (IGNITE-7518) Get rid of org.jsr166.LongAdder8, org.jsr166.ThreadLocalRandom

2018-01-28 Thread Andrey Kuznetsov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-7518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16342980#comment-16342980
 ] 

Andrey Kuznetsov commented on IGNITE-7518:
--

[~avinogradov], this change is ready for review, could you please take a look?

> Get rid of org.jsr166.LongAdder8, org.jsr166.ThreadLocalRandom
> --
>
> Key: IGNITE-7518
> URL: https://issues.apache.org/jira/browse/IGNITE-7518
> Project: Ignite
>  Issue Type: Sub-task
>Reporter: Andrey Kuznetsov
>Assignee: Andrey Kuznetsov
>Priority: Major
> Fix For: 2.5
>
>






[jira] [Updated] (IGNITE-7685) Incorrect AllocationRate counting

2018-02-15 Thread Andrey Kuznetsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-7685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Kuznetsov updated IGNITE-7685:
-
Description: 
Each call of 
{{org.apache.ignite.internal.processors.cache.persistence.DataRegionMetricsImpl#updateTotalAllocatedPages}}
 performs an {{allocRate.onHit()}} call, which is not correct since the delta can be 
negative or greater than 1.

The allocationRate counting needs to be fixed: the rate should only reflect "proper" 
allocations, as opposed to allocations made during recovery, storage 
initialization, etc. 

  was:
Each call of 
{{org.apache.ignite.internal.processors.cache.persistence.DataRegionMetricsImpl#updateTotalAllocatedPages}}
 performs {{allocRate.onHit()}} call which is not correct since delta can be 
negative or bigger that 1.

Need to fix allocationRate counting


> Incorrect AllocationRate counting
> -
>
> Key: IGNITE-7685
> URL: https://issues.apache.org/jira/browse/IGNITE-7685
> Project: Ignite
>  Issue Type: Task
>Affects Versions: 2.4
>Reporter: Anton Vinogradov
>Assignee: Andrey Kuznetsov
>Priority: Major
> Fix For: 2.5
>
>
> Each call of 
> {{org.apache.ignite.internal.processors.cache.persistence.DataRegionMetricsImpl#updateTotalAllocatedPages}}
>  performs an {{allocRate.onHit()}} call, which is not correct since the delta can be 
> negative or greater than 1.
> The allocationRate counting needs to be fixed: the rate should only reflect "proper" 
> allocations, as opposed to allocations made during recovery, storage 
> initialization, etc. 
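A minimal sketch of the intended counting, with hypothetical names: the rate is fed the number of newly allocated pages rather than one hit per call, negative deltas (released pages) are ignored, and allocations flagged as recovery or initialization work are excluded. A plain `LongAdder` stands in for the rate metric's hit counter; this is not the actual `DataRegionMetricsImpl` change.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch; names and structure are not Ignite's DataRegionMetricsImpl. */
class AllocationRateSketch {
    private final LongAdder totalAllocatedPages = new LongAdder();

    /** Stands in for the hit counter behind an allocation-rate metric. */
    private final LongAdder pagesCountedForRate = new LongAdder();

    /**
     * @param delta  change in allocated pages; may be negative or greater than 1
     * @param proper false for allocations made during recovery, storage initialization, etc.
     */
    void updateTotalAllocatedPages(long delta, boolean proper) {
        totalAllocatedPages.add(delta);
        if (proper && delta > 0)
            pagesCountedForRate.add(delta); // count pages, not calls
    }

    long totalAllocatedPages() { return totalAllocatedPages.sum(); }

    long pagesCountedForRate() { return pagesCountedForRate.sum(); }
}
```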





[jira] [Created] (IGNITE-7823) Enrich IgniteCache with asSet method

2018-02-27 Thread Andrey Kuznetsov (JIRA)
Andrey Kuznetsov created IGNITE-7823:


 Summary: Enrich IgniteCache with asSet method
 Key: IGNITE-7823
 URL: https://issues.apache.org/jira/browse/IGNITE-7823
 Project: Ignite
  Issue Type: New Feature
  Components: data structures
Reporter: Andrey Kuznetsov
 Fix For: 2.5


The existing {{IgniteSet}} data structure is good enough for small sets. For big 
sets, it's too expensive to maintain redundant on-heap copies of the data. Thus we'd 
better add a new {{IgniteCache::asSet}} method returning a set adapter over an 
existing cache. The difference between these two kinds of sets should be 
properly documented afterwards. 
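A minimal sketch of the adapter idea, assuming a `ConcurrentMap` stands in for the cache: the set stores each element only as a key of the existing map, so no separate on-heap copy is kept. `MapBackedSet` is hypothetical and not the eventual `IgniteCache` API. The JDK's `Collections.newSetFromMap` does not fit here, because it requires the map to be empty.

```java
import java.util.AbstractSet;
import java.util.Iterator;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical set view over an existing map; elements are stored only as map keys. */
class MapBackedSet<E> extends AbstractSet<E> {
    private final ConcurrentMap<E, Boolean> backing;

    MapBackedSet(ConcurrentMap<E, Boolean> backing) {
        this.backing = backing; // no copy: all operations go straight to the map
    }

    @Override public boolean add(E e) {
        return backing.putIfAbsent(e, Boolean.TRUE) == null;
    }

    @Override public boolean remove(Object o) {
        return backing.remove(o) != null;
    }

    @Override public boolean contains(Object o) {
        return backing.containsKey(o);
    }

    @Override public Iterator<E> iterator() {
        return backing.keySet().iterator(); // iterator.remove() also removes from the map
    }

    @Override public int size() {
        return backing.size();
    }
}
```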





[jira] [Commented] (IGNITE-7770) Ignite Cache 6: testRandomMixedTxConfigurations failed probably after jsr166 removal

2018-03-06 Thread Andrey Kuznetsov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-7770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16387722#comment-16387722
 ] 

Andrey Kuznetsov commented on IGNITE-7770:
--

There are at least two possible flaky failure scenarios.

1 - Most frequent. 

NPE: the first {{get()}} in a transaction returns {{null}}, which should be impossible 
given the test code structure.


2 - Less frequent. 

Timeout while waiting for {{multithreadedAsync()}} completion.
First, an optimistic transaction commits a changed entry with key K and parks 
forever:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
org.apache.ignite.internal.processors.cache.transactions.TransactionProxyImpl.commit(TransactionProxyImpl.java:293)
org.apache.ignite.internal.processors.cache.transactions.TxRollbackOnTimeoutTest$4.run(TxRollbackOnTimeoutTest.java:444)
org.apache.ignite.testframework.GridTestUtils$9.call(GridTestUtils.java:1275)
org.apache.ignite.testframework.GridTestThread.run(GridTestThread.java:86)
This transaction keeps hanging even after its timeout has occurred.

Next, a bunch of pessimistic transactions also park forever on {{put()}} for 
the same key K:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
org.apache.ignite.internal.processors.cache.GridCacheAdapter$22.op(GridCacheAdapter.java:2390)
org.apache.ignite.internal.processors.cache.GridCacheAdapter$22.op(GridCacheAdapter.java:2388)
org.apache.ignite.internal.processors.cache.GridCacheAdapter.syncOp(GridCacheAdapter.java:4088)
org.apache.ignite.internal.processors.cache.GridCacheAdapter.put0(GridCacheAdapter.java:2388)
org.apache.ignite.internal.processors.cache.GridCacheAdapter.put(GridCacheAdapter.java:2369)
org.apache.ignite.internal.processors.cache.GridCacheAdapter.put(GridCacheAdapter.java:2346)
org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.put(IgniteCacheProxyImpl.java:1084)
org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.put(GatewayProtectedCacheProxy.java:886)
org.apache.ignite.internal.processors.cache.transactions.TxRollbackOnTimeoutTest$4.run(TxRollbackOnTimeoutTest.java:442)
org.apache.ignite.testframework.GridTestUtils$9.call(GridTestUtils.java:1275)
org.apache.ignite.testframework.GridTestThread.run(GridTestThread.java:86)

There are also a number of threads operating normally.
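Both traces above end in an untimed `GridFutureAdapter.get()` that reaches `LockSupport.park`, so the caller stays parked until the future completes. As a generic JDK illustration (not Ignite code and not the fix), an untimed `get()` on a future that nobody completes never returns, while a timed `get()` gives up and lets the caller react:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Generic JDK illustration of bounded vs. unbounded waiting on a future. */
class BoundedWait {
    /**
     * Waits at most {@code millis} for the future.
     * Returns true if it completed (normally or exceptionally), false on timeout.
     */
    static boolean completesWithin(Future<?> fut, long millis) {
        try {
            fut.get(millis, TimeUnit.MILLISECONDS);
            return true;
        } catch (ExecutionException e) {
            return true; // completed, but with an exception
        } catch (TimeoutException e) {
            return false; // an untimed get() would still be parked at this point
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```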

> Ignite Cache 6: testRandomMixedTxConfigurations failed probably after jsr166 
> removal
> 
>
> Key: IGNITE-7770
> URL: https://issues.apache.org/jira/browse/IGNITE-7770
> Project: Ignite
>  Issue Type: Task
>Reporter: Dmitriy Pavlov
>Assignee: Andrey Kuznetsov
>Priority: Major
>  Labels: MakeTeamcityGreenAgain, Muted_test
> Fix For: 2.5
>
>
> Ignite Cache 6: testRandomMixedTxConfigurations failed probably after jsr166 
> removal
> After applying IGNITE-7518 Get rid of org.jsr166.LongAdder8, 
>  IgniteCacheTestSuite6: 
> TxRollbackOnTimeoutNearCacheTest.testRandomMixedTxConfigurations (fail rate 
> 38,6%) 
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=-3733584033131292028&branch=%3Cdefault%3E&tab=testDetails





[jira] [Created] (IGNITE-7983) NPE in TxRollbackOnTimeoutNearCacheTest.testRandomMixedTxConfigurations

2018-03-19 Thread Andrey Kuznetsov (JIRA)
Andrey Kuznetsov created IGNITE-7983:


 Summary: NPE in 
TxRollbackOnTimeoutNearCacheTest.testRandomMixedTxConfigurations
 Key: IGNITE-7983
 URL: https://issues.apache.org/jira/browse/IGNITE-7983
 Project: Ignite
  Issue Type: Task
Affects Versions: 2.4
Reporter: Andrey Kuznetsov
Assignee: Andrey Kuznetsov
 Fix For: 2.5


{{get}} inside a transaction sometimes returns {{null}}, which should be 
impossible.





[jira] [Commented] (IGNITE-7770) Ignite Cache 6: testRandomMixedTxConfigurations failed probably after jsr166 removal

2018-03-19 Thread Andrey Kuznetsov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-7770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16404659#comment-16404659
 ] 

Andrey Kuznetsov commented on IGNITE-7770:
--

I've filed a separate issue [1] for the NPE scenario.

[1] https://issues.apache.org/jira/browse/IGNITE-7983

> Ignite Cache 6: testRandomMixedTxConfigurations failed probably after jsr166 
> removal
> 
>
> Key: IGNITE-7770
> URL: https://issues.apache.org/jira/browse/IGNITE-7770
> Project: Ignite
>  Issue Type: Task
>Reporter: Dmitriy Pavlov
>Assignee: Andrey Kuznetsov
>Priority: Major
>  Labels: MakeTeamcityGreenAgain, Muted_test
> Fix For: 2.5
>
>
> Ignite Cache 6: testRandomMixedTxConfigurations failed probably after jsr166 
> removal
> After applying IGNITE-7518 Get rid of org.jsr166.LongAdder8, 
>  IgniteCacheTestSuite6: 
> TxRollbackOnTimeoutNearCacheTest.testRandomMixedTxConfigurations (fail rate 
> 38,6%) 
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=-3733584033131292028&branch=%3Cdefault%3E&tab=testDetails





[jira] [Commented] (IGNITE-7770) Ignite Cache 6: testRandomMixedTxConfigurations failed probably after jsr166 removal

2018-03-20 Thread Andrey Kuznetsov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-7770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16406118#comment-16406118
 ] 

Andrey Kuznetsov commented on IGNITE-7770:
--

[~agura], I've prepared a fix for the "infinite park" scenario, could you please 
take a look?
The NPE scenario is suppressed in the PR.

> Ignite Cache 6: testRandomMixedTxConfigurations failed probably after jsr166 
> removal
> 
>
> Key: IGNITE-7770
> URL: https://issues.apache.org/jira/browse/IGNITE-7770
> Project: Ignite
>  Issue Type: Task
>Reporter: Dmitriy Pavlov
>Assignee: Andrey Kuznetsov
>Priority: Major
>  Labels: MakeTeamcityGreenAgain, Muted_test
> Fix For: 2.5
>
>
> Ignite Cache 6: testRandomMixedTxConfigurations failed probably after jsr166 
> removal
> After applying IGNITE-7518 Get rid of org.jsr166.LongAdder8, 
>  IgniteCacheTestSuite6: 
> TxRollbackOnTimeoutNearCacheTest.testRandomMixedTxConfigurations (fail rate 
> 38,6%) 
> https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=-3733584033131292028&branch=%3Cdefault%3E&tab=testDetails





[jira] [Commented] (IGNITE-5553) Ignite PDS 2: IgnitePersistentStoreDataStructuresTest testSet assertion error

2018-03-20 Thread Andrey Kuznetsov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-5553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16406427#comment-16406427
 ] 

Andrey Kuznetsov commented on IGNITE-5553:
--

[~dpavlov], we discussed the PR with [~xtern] and replaced the two init flags with 
a single state field (that is, a single means of synchronization). Now I'm happy 
with the changes. 
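A minimal sketch of the "single state field" idea, with hypothetical names: instead of two boolean init flags that can be observed in inconsistent combinations, one atomic field moves through NEW, INITIALIZING and READY, and a compare-and-set decides which thread performs initialization. This is an illustration only, not the code from the PR.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical lazily initialized component guarded by a single state field. */
class LazyComponent {
    enum State { NEW, INITIALIZING, READY }

    private final AtomicReference<State> state = new AtomicReference<>(State.NEW);

    /**
     * Performs initialization at most once. Returns true only for the caller
     * that won the CAS and ran the init; other callers return false at once
     * (they do not wait for initialization to finish).
     */
    boolean initIfNeeded(Runnable init) {
        if (!state.compareAndSet(State.NEW, State.INITIALIZING))
            return false;
        try {
            init.run();
        } catch (RuntimeException e) {
            state.set(State.NEW); // allow a later retry after a failed init
            throw e;
        }
        state.set(State.READY);
        return true;
    }

    boolean isReady() {
        return state.get() == State.READY;
    }
}
```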

> Ignite PDS 2: IgnitePersistentStoreDataStructuresTest testSet assertion error
> -
>
> Key: IGNITE-5553
> URL: https://issues.apache.org/jira/browse/IGNITE-5553
> Project: Ignite
>  Issue Type: Bug
>  Components: data structures, persistence
>Affects Versions: 2.1
>Reporter: Dmitriy Pavlov
>Assignee: Pavel Pereslegin
>Priority: Major
>  Labels: MakeTeamcityGreenAgain, Muted_test, test-fail
> Fix For: 2.5
>
>
> h2. Notes-4435
> When IgniteSet is restored from persistence, the set size is always 0, [link 
> to test 
> history|http://ci.ignite.apache.org/project.html?projectId=Ignite20Tests&testNameId=-7043871603266099589&tab=testDetails].
> h2. Detailed description
> Unlike *IgniteQueue*, which uses a separate cache key to store its size, 
> *IgniteSet* stores it in a field of some class.
> The test from the link above clearly shows that after the memory state is restored 
> from PDS, all set values are restored correctly but the size is lost.
> h2. Proposed solution
> One possible solution is to do the same thing as *IgniteQueue* does: 
> the size of *IgniteSet* must be stored in the cache instead of in volatile in-memory 
> fields of arbitrary classes.





[jira] [Created] (IGNITE-8025) Result of GridTestUtils.runMultiThreadedAsync has a bug in cancel() implementation

2018-03-22 Thread Andrey Kuznetsov (JIRA)
Andrey Kuznetsov created IGNITE-8025:


 Summary: Result of GridTestUtils.runMultiThreadedAsync has a bug 
in cancel() implementation
 Key: IGNITE-8025
 URL: https://issues.apache.org/jira/browse/IGNITE-8025
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.4
Reporter: Andrey Kuznetsov
 Fix For: 2.5
 Attachments: BugRunMTAsyncTest.java

GridTestUtils.runMultiThreadedAsync returns a future with cancel() support, but 
the cancellation implementation never interrupts the threads that execute 
user-provided tasks. That is, those threads can continue executing even after 
the test method finishes.

The attached reproducer demonstrates activity from threads created by test0 
after test0 has finished, while test1 is running.
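A minimal sketch of the expected behaviour, assuming the runner keeps references to the threads it starts: `cancel()` interrupts every worker, so tasks that respond to interruption stop before the next test begins. `CancellableRun` is hypothetical and not the `GridTestUtils` API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical multi-threaded runner whose cancel() actually interrupts its workers. */
class CancellableRun {
    private final List<Thread> workers = new ArrayList<>();
    private final CountDownLatch finished;

    CancellableRun(Runnable task, int threads) {
        CountDownLatch latch = new CountDownLatch(threads);
        for (int i = 0; i < threads; i++)
            workers.add(new Thread(() -> {
                try {
                    task.run();
                } finally {
                    latch.countDown(); // signal completion even if the task throws
                }
            }, "worker-" + i));
        finished = latch;
    }

    void start() {
        workers.forEach(Thread::start);
    }

    /** Interrupts every worker thread; tasks must honour interruption to stop. */
    void cancel() {
        workers.forEach(Thread::interrupt);
    }

    /** Waits up to the given time for all workers to finish; false on timeout. */
    boolean awaitFinished(long millis) {
        try {
            return finished.await(millis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A test harness built this way would call `cancel()` and then `awaitFinished(...)` in its teardown, so no worker from one test method can still be running when the next one starts.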





[jira] [Assigned] (IGNITE-8025) Result of GridTestUtils.runMultiThreadedAsync has a bug in cancel() implementation

2018-03-22 Thread Andrey Kuznetsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-8025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Kuznetsov reassigned IGNITE-8025:


Assignee: Andrey Kuznetsov

> Result of GridTestUtils.runMultiThreadedAsync has a bug in cancel() 
> implementation
> --
>
> Key: IGNITE-8025
> URL: https://issues.apache.org/jira/browse/IGNITE-8025
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.4
>Reporter: Andrey Kuznetsov
>Assignee: Andrey Kuznetsov
>Priority: Major
>  Labels: MakeTeamcityGreenAgain, test
> Fix For: 2.5
>
> Attachments: BugRunMTAsyncTest.java
>
>
> GridTestUtils.runMultiThreadedAsync returns a future with cancel() support, 
> but the cancellation implementation never interrupts the threads that execute 
> user-provided tasks. That is, those threads can continue executing even 
> after the test method finishes.
> The attached reproducer demonstrates activity from threads created by test0 
> after test0 has finished, while test1 is running.





[jira] [Updated] (IGNITE-7770) Ignite Cache 6: testRandomMixedTxConfigurations failed probably after jsr166 removal

2018-03-23 Thread Andrey Kuznetsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-7770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Kuznetsov updated IGNITE-7770:
-
Description: 
TxRollbackOnTimeoutNearCacheTest.testRandomMixedTxConfigurations fails flakily 
due to a timeout of the resulting future. It is caused by poorly reproducible 
infinite parks in optimistic TX commit:



  was:
Ignite Cache 6: testRandomMixedTxConfigurations failed probably after jsr166 
removal
After applying IGNITE-7518 (Get rid of org.jsr166.LongAdder8), 
IgniteCacheTestSuite6: 
TxRollbackOnTimeoutNearCacheTest.testRandomMixedTxConfigurations (fail rate 
38.6%)

https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=-3733584033131292028&branch=%3Cdefault%3E&tab=testDetails



> Ignite Cache 6: testRandomMixedTxConfigurations failed probably after jsr166 
> removal
> 
>
> Key: IGNITE-7770
> URL: https://issues.apache.org/jira/browse/IGNITE-7770
> Project: Ignite
>  Issue Type: Task
>Reporter: Dmitriy Pavlov
>Assignee: Andrey Kuznetsov
>Priority: Major
>  Labels: MakeTeamcityGreenAgain, Muted_test
> Fix For: 2.5
>
>
> TxRollbackOnTimeoutNearCacheTest.testRandomMixedTxConfigurations fails 
> flakily due to a timeout of the resulting future. It is caused by poorly 
> reproducible infinite parks in optimistic TX commit:





[jira] [Updated] (IGNITE-7770) Ignite Cache 6: testRandomMixedTxConfigurations failed probably after jsr166 removal

2018-03-23 Thread Andrey Kuznetsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-7770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Kuznetsov updated IGNITE-7770:
-
Description: 
TxRollbackOnTimeoutNearCacheTest.testRandomMixedTxConfigurations fails flakily 
due to a timeout of the resulting future. It is caused by poorly reproducible 
infinite parks in optimistic TX commit:

sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
org.apache.ignite.internal.processors.cache.transactions.TransactionProxyImpl.commit(TransactionProxyImpl.java:293)
org.apache.ignite.internal.processors.cache.transactions.TxRollbackOnTimeoutTest$4.run(TxRollbackOnTimeoutTest.java:444)
org.apache.ignite.testframework.GridTestUtils$9.call(GridTestUtils.java:1275)
org.apache.ignite.testframework.GridTestThread.run(GridTestThread.java:86)

There is also another failure of this test; it is described in a separate 
ticket attached to this one.
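The trace shows the committing thread parked inside an unbounded GridFutureAdapter.get(). As a generic diagnostic aid that is not part of the ticket, the sketch below uses java.util.concurrent.CompletableFuture instead of Ignite's future classes, and the helper name completesWithin is hypothetical: a timed get() turns a future that never completes into a TimeoutException instead of an infinite park, which makes such hangs visible in test output.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BoundedWait {
    /**
     * Waits at most timeoutMs for the future. Returns true if it completed
     * (normally or exceptionally), false if the wait timed out.
     */
    static boolean completesWithin(Future<?> fut, long timeoutMs) {
        try {
            fut.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        }
        catch (ExecutionException e) {
            return true; // Completed, albeit with an error.
        }
        catch (TimeoutException e) {
            return false; // An unbounded get() would still be parked here.
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // Preserve the caller's interrupt status.
            return false;
        }
    }

    public static void main(String[] args) {
        CompletableFuture<Void> neverCompleted = new CompletableFuture<>();
        System.out.println(completesWithin(neverCompleted, 100)); // false
        System.out.println(completesWithin(CompletableFuture.completedFuture(null), 100)); // true
    }
}
```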

  was:
TxRollbackOnTimeoutNearCacheTest.testRandomMixedTxConfigurations fails flakily 
due to a timeout of the resulting future. It is caused by poorly reproducible 
infinite parks in optimistic TX commit:




> Ignite Cache 6: testRandomMixedTxConfigurations failed probably after jsr166 
> removal
> 
>
> Key: IGNITE-7770
> URL: https://issues.apache.org/jira/browse/IGNITE-7770
> Project: Ignite
>  Issue Type: Task
>Reporter: Dmitriy Pavlov
>Assignee: Andrey Kuznetsov
>Priority: Major
>  Labels: MakeTeamcityGreenAgain, Muted_test
> Fix For: 2.5
>
>
> TxRollbackOnTimeoutNearCacheTest.testRandomMixedTxConfigurations fails 
> flakily due to a timeout of the resulting future. It is caused by poorly 
> reproducible infinite parks in optimistic TX commit:
> sun.misc.Unsafe.park(Native Method)
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
> org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
> org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
> org.apache.ignite.internal.processors.cache.transactions.TransactionProxyImpl.commit(TransactionProxyImpl.java:293)
> org.apache.ignite.internal.processors.cache.transactions.TxRollbackOnTimeoutTest$4.run(TxRollbackOnTimeoutTest.java:444)
> org.apache.ignite.testframework.GridTestUtils$9.call(GridTestUtils.java:1275)
> org.apache.ignite.testframework.GridTestThread.run(GridTestThread.java:86)
> There is also another failure of this test; it is described in a separate 
> ticket attached to this one.





[jira] [Resolved] (IGNITE-7516) Get rid of org.jsr166.ConcurrentLinkedHashMap

2018-03-28 Thread Andrey Kuznetsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-7516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Kuznetsov resolved IGNITE-7516.
--
Resolution: Won't Fix

ConcurrentLinkedHashMap has no direct equivalent in modern Java standard 
libraries. It is a customized class from older Java versions, and we cannot 
replace it with a standard class for performance reasons.

This activity will be continued in smaller issues related to the particular 
classes that use CLHM.
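For comparison, the nearest standard-library construction is a LinkedHashMap wrapped in Collections.synchronizedMap. The sketch below is an illustration, not Ignite code, and the class name is hypothetical: it builds a bounded, insertion-ordered map, and every operation on it goes through one shared lock, which is presumably the kind of cost the resolution refers to for a heavily concurrent map.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative bounded, insertion-ordered map built only from standard classes. */
class BoundedOrderedMap {
    /** Creates a map that keeps insertion order and evicts the eldest entry beyond maxSize. */
    static <K, V> Map<K, V> create(int maxSize) {
        Map<K, V> map = new LinkedHashMap<K, V>(16, 0.75f, false) {
            @Override protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxSize;
            }
        };

        // Thread safety comes from a single lock that guards the whole map.
        return Collections.synchronizedMap(map);
    }

    public static void main(String[] args) {
        Map<String, Integer> m = create(2);
        m.put("a", 1);
        m.put("b", 2);
        m.put("c", 3); // Evicts "a", the eldest entry.
        System.out.println(m.keySet()); // [b, c]
    }
}
```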

> Get rid of org.jsr166.ConcurrentLinkedHashMap
> -
>
> Key: IGNITE-7516
> URL: https://issues.apache.org/jira/browse/IGNITE-7516
> Project: Ignite
>  Issue Type: Sub-task
>Reporter: Andrey Kuznetsov
>Assignee: Andrey Kuznetsov
>Priority: Major
> Fix For: 2.5
>
>






[jira] [Created] (IGNITE-8061) GridCachePartitionedDataStructuresFailoverSelfTest.testCountDownLatchConstantMultipleTopologyChange may hang on TeamCity

2018-03-28 Thread Andrey Kuznetsov (JIRA)
Andrey Kuznetsov created IGNITE-8061:


 Summary: 
GridCachePartitionedDataStructuresFailoverSelfTest.testCountDownLatchConstantMultipleTopologyChange
 may hang on TeamCity
 Key: IGNITE-8061
 URL: https://issues.apache.org/jira/browse/IGNITE-8061
 Project: Ignite
  Issue Type: Bug
  Components: data structures
Affects Versions: 2.4
Reporter: Andrey Kuznetsov
 Fix For: 2.5
 Attachments: log.txt

The attached log contains the 'Test has been timed out and will be interrupted' 
message, but does not contain the subsequent 'Test has been timed out [test=...' 
message.

Known facts:
* There is a pending GridDhtColocatedLockFuture in the log.
* On timeout, an InterruptedException reaches doTestCountDownLatch, but its 
finally block contains code that leads to distributed locking.
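The second fact describes a common hang pattern, shown here as a self-contained sketch. A local ReentrantLock stands in for the distributed lock, and the class and method names are hypothetical, not the test's actual code: the worker is interrupted, but its finally block calls the non-interruptible lock(), so it stays blocked for as long as another thread holds the lock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

class FinallyBlockHang {
    /**
     * Starts a worker whose finally block takes a lock held by the caller,
     * interrupts it, and reports whether it is still stuck after waitMs.
     */
    static boolean workerStuckAfterInterrupt(long waitMs) {
        ReentrantLock lock = new ReentrantLock(); // Stand-in for a distributed lock.
        lock.lock(); // Held by the calling thread for the whole check.

        CountDownLatch started = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            try {
                started.countDown();
                Thread.sleep(60_000); // Stands in for the test body.
            }
            catch (InterruptedException ignored) {
                // The timeout interrupt lands here and clears the interrupt flag...
            }
            finally {
                lock.lock(); // ...but cleanup blocks uninterruptibly on a lock it cannot get.
                lock.unlock();
            }
        }, "test-body");
        worker.setDaemon(true);
        worker.start();

        try {
            started.await();
            worker.interrupt(); // Like the framework's timeout interruption.
            worker.join(waitMs);
            return worker.isAlive();
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return worker.isAlive();
        }
        finally {
            lock.unlock(); // Release so the worker can finally exit.
        }
    }

    public static void main(String[] args) {
        System.out.println("stuck: " + workerStuckAfterInterrupt(500)); // stuck: true
    }
}
```

Acquiring the lock in the cleanup path with tryLock(timeout, unit) would let the thread give up instead of hanging; whether that is acceptable for the real distributed lock is a separate design question.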





[jira] [Commented] (IGNITE-8025) Result of GridTestUtils.runMultiThreadedAsync has a bug in cancel() implementation

2018-03-28 Thread Andrey Kuznetsov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-8025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16417875#comment-16417875
 ] 

Andrey Kuznetsov commented on IGNITE-8025:
--

[~dpavlov], I've launched some suites (including Memory Leaks) separately to 
ensure the tests are valid, so there is no need to wait for one more run to 
complete.

> Result of GridTestUtils.runMultiThreadedAsync has a bug in cancel() 
> implementation
> --
>
> Key: IGNITE-8025
> URL: https://issues.apache.org/jira/browse/IGNITE-8025
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.4
>Reporter: Andrey Kuznetsov
>Assignee: Andrey Kuznetsov
>Priority: Major
>  Labels: MakeTeamcityGreenAgain, test
> Fix For: 2.5
>
> Attachments: BugRunMTAsyncTest.java
>
>
> GridTestUtils.runMultiThreadedAsync returns a future with cancel() support, 
> but the cancellation implementation never interrupts the threads that execute 
> user-provided tasks. That is, those threads can continue their execution even 
> after the test method finishes.
> The attached reproducer demonstrates activity from threads created by test0 
> after test0 has finished, while test1 is running.





[jira] [Commented] (IGNITE-8025) Result of GridTestUtils.runMultiThreadedAsync has a bug in cancel() implementation

2018-03-29 Thread Andrey Kuznetsov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-8025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16418791#comment-16418791
 ] 

Andrey Kuznetsov commented on IGNITE-8025:
--

[~dpavlov], thanks for your alertness. I've executed this suite again, and it 
turned green. Most likely, new flaky tests have been found.

> Result of GridTestUtils.runMultiThreadedAsync has a bug in cancel() 
> implementation
> --
>
> Key: IGNITE-8025
> URL: https://issues.apache.org/jira/browse/IGNITE-8025
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.4
>Reporter: Andrey Kuznetsov
>Assignee: Andrey Kuznetsov
>Priority: Major
>  Labels: MakeTeamcityGreenAgain, test
> Fix For: 2.5
>
> Attachments: BugRunMTAsyncTest.java
>
>
> GridTestUtils.runMultiThreadedAsync returns a future with cancel() support, 
> but the cancellation implementation never interrupts the threads that execute 
> user-provided tasks. That is, those threads can continue their execution even 
> after the test method finishes.
> The attached reproducer demonstrates activity from threads created by test0 
> after test0 has finished, while test1 is running.





[jira] [Assigned] (IGNITE-7772) All critical system workers health should be covered by IgniteFailureProcessor

2018-03-29 Thread Andrey Kuznetsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-7772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Kuznetsov reassigned IGNITE-7772:


Assignee: Andrey Kuznetsov  (was: Dmitriy Sorokin)

> All critical system workers health should be covered by IgniteFailureProcessor
> --
>
> Key: IGNITE-7772
> URL: https://issues.apache.org/jira/browse/IGNITE-7772
> Project: Ignite
>  Issue Type: Task
>Reporter: Anton Vinogradov
>Assignee: Andrey Kuznetsov
>Priority: Major
>  Labels: iep-14
> Fix For: 2.5
>
>
> The following system workers should be covered by this engine:
> disco-event-worker
> tcp-disco-sock-reader
> tcp-disco-srvr
> tcp-disco-msg-worker
> tcp-comm-worker
> grid-nio-worker-tcp-comm
> exchange-worker
> sys-stripe
> grid-timeout-worker
> db-checkpoint-thread
> wal-file-archiver
> ttl-cleanup-worker
> nio-acceptor





[jira] [Updated] (IGNITE-7772) All critical system workers health should be covered by IgniteFailureProcessor

2018-03-29 Thread Andrey Kuznetsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-7772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Kuznetsov updated IGNITE-7772:
-
Description: 
The following system workers should be covered by this engine:

* disco-event-worker
* tcp-disco-sock-reader
* tcp-disco-srvr
* tcp-disco-msg-worker
* tcp-comm-worker
* grid-nio-worker-tcp-comm
* exchange-worker
* sys-stripe
* grid-timeout-worker
* db-checkpoint-thread
* wal-file-archiver
* wal-write-worker
* ttl-cleanup-worker
* nio-acceptor

  was:
The following system workers should be covered by this engine:

disco-event-worker
tcp-disco-sock-reader
tcp-disco-srvr
tcp-disco-msg-worker
tcp-comm-worker
grid-nio-worker-tcp-comm
exchange-worker
sys-stripe
grid-timeout-worker
db-checkpoint-thread
wal-file-archiver
ttl-cleanup-worker
nio-acceptor


> All critical system workers health should be covered by IgniteFailureProcessor
> --
>
> Key: IGNITE-7772
> URL: https://issues.apache.org/jira/browse/IGNITE-7772
> Project: Ignite
>  Issue Type: Task
>Reporter: Anton Vinogradov
>Assignee: Andrey Kuznetsov
>Priority: Major
>  Labels: iep-14
> Fix For: 2.5
>
>
> The following system workers should be covered by this engine:
> * disco-event-worker
> * tcp-disco-sock-reader
> * tcp-disco-srvr
> * tcp-disco-msg-worker
> * tcp-comm-worker
> * grid-nio-worker-tcp-comm
> * exchange-worker
> * sys-stripe
> * grid-timeout-worker
> * db-checkpoint-thread
> * wal-file-archiver
> * wal-write-worker
> * ttl-cleanup-worker
> * nio-acceptor





[jira] [Updated] (IGNITE-7772) All critical system workers health should be covered by IgniteFailureProcessor

2018-03-29 Thread Andrey Kuznetsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-7772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Kuznetsov updated IGNITE-7772:
-
Description: 
The following system workers should be covered by this engine:

* disco-event-worker
* tcp-disco-srvr
* tcp-disco-msg-worker
* tcp-comm-worker
* grid-nio-worker-tcp-comm
* exchange-worker
* sys-stripe
* grid-timeout-worker
* db-checkpoint-thread
* wal-file-archiver
* wal-write-worker
* ttl-cleanup-worker
* nio-acceptor

  was:
The following system workers should be covered by this engine:

* disco-event-worker
* tcp-disco-sock-reader
* tcp-disco-srvr
* tcp-disco-msg-worker
* tcp-comm-worker
* grid-nio-worker-tcp-comm
* exchange-worker
* sys-stripe
* grid-timeout-worker
* db-checkpoint-thread
* wal-file-archiver
* wal-write-worker
* ttl-cleanup-worker
* nio-acceptor


> All critical system workers health should be covered by IgniteFailureProcessor
> --
>
> Key: IGNITE-7772
> URL: https://issues.apache.org/jira/browse/IGNITE-7772
> Project: Ignite
>  Issue Type: Task
>Reporter: Anton Vinogradov
>Assignee: Andrey Kuznetsov
>Priority: Major
>  Labels: iep-14
> Fix For: 2.5
>
>
> The following system workers should be covered by this engine:
> * disco-event-worker
> * tcp-disco-srvr
> * tcp-disco-msg-worker
> * tcp-comm-worker
> * grid-nio-worker-tcp-comm
> * exchange-worker
> * sys-stripe
> * grid-timeout-worker
> * db-checkpoint-thread
> * wal-file-archiver
> * wal-write-worker
> * ttl-cleanup-worker
> * nio-acceptor





[jira] [Resolved] (IGNITE-5655) Introduce pluggable string encoder/decoder

2018-04-02 Thread Andrey Kuznetsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-5655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Kuznetsov resolved IGNITE-5655.
--
Resolution: Won't Fix

Completing the feature requires significant effort, while there is no active 
interest in it in the community.

> Introduce pluggable string encoder/decoder
> --
>
> Key: IGNITE-5655
> URL: https://issues.apache.org/jira/browse/IGNITE-5655
> Project: Ignite
>  Issue Type: New Feature
>  Components: binary
>Affects Versions: 2.0
>Reporter: Valentin Kulichenko
>Assignee: Andrey Kuznetsov
>Priority: Major
>  Labels: iep-2, important
> Fix For: 2.5
>
>
> Currently, the binary marshaller encodes strings in UTF-8. However, sometimes 
> it makes sense to serialize strings with different encodings to save space. 
> Let's add a global property to control String encoding and customize our 
> binary protocol to support it. For instance, we can add another flag 
> {{ENCODED_STRING}}, which will write strings as follows:
> [flag][encoding_flag][str_len][str_bytes]
> The first implementation should set the preferred encoding for strings in 
> BinaryConfiguration. This setting is optional; the default encoding is UTF-8. 
> Currently, the same BinaryConfiguration is used for all cluster nodes, so 
> no encoding clashes are possible.
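A minimal sketch of the proposed layout [flag][encoding_flag][str_len][str_bytes] using java.nio.ByteBuffer. Everything concrete here is an assumption, not the binary marshaller's code: the ENCODED_STRING flag value, the encoding ids, a 4-byte str_len holding the byte length of the encoded string, and the class name EncodedStringCodec.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodedStringCodec {
    /** Hypothetical type flag for an ENCODED_STRING value. */
    static final byte ENCODED_STRING = 100;

    /** Hypothetical encoding ids; a real protocol would have to fix these permanently. */
    static final Charset[] ENCODINGS = {
        StandardCharsets.UTF_8, StandardCharsets.UTF_16LE, StandardCharsets.ISO_8859_1
    };

    static byte encodingId(Charset cs) {
        for (byte i = 0; i < ENCODINGS.length; i++) {
            if (ENCODINGS[i].equals(cs))
                return i;
        }
        throw new IllegalArgumentException("Unsupported encoding: " + cs);
    }

    /** Writes [flag][encoding_flag][str_len][str_bytes]. */
    static byte[] write(String s, Charset cs) {
        byte[] bytes = s.getBytes(cs);
        ByteBuffer buf = ByteBuffer.allocate(1 + 1 + 4 + bytes.length);
        buf.put(ENCODED_STRING).put(encodingId(cs)).putInt(bytes.length).put(bytes);
        return buf.array();
    }

    /** Reads a value produced by write(). */
    static String read(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        if (buf.get() != ENCODED_STRING)
            throw new IllegalArgumentException("Not an encoded string");
        Charset cs = ENCODINGS[buf.get()];
        byte[] bytes = new byte[buf.getInt()];
        buf.get(bytes);
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        byte[] latin = write("Ignite", StandardCharsets.ISO_8859_1);
        System.out.println(latin.length + " " + read(latin)); // 12 Ignite

        // The accented string takes 5 bytes in UTF-8 but only 4 in ISO-8859-1.
        String cafe = "caf\u00e9";
        System.out.println(write(cafe, StandardCharsets.UTF_8).length + " vs "
            + write(cafe, StandardCharsets.ISO_8859_1).length); // 11 vs 10
    }
}
```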





[jira] [Commented] (IGNITE-7772) All critical system workers health should be covered by IgniteFailureProcessor

2018-04-04 Thread Andrey Kuznetsov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-7772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16425655#comment-16425655
 ] 

Andrey Kuznetsov commented on IGNITE-7772:
--

The approach to critical failure handling I proposed is not perfect. Reworking it now.

> All critical system workers health should be covered by IgniteFailureProcessor
> --
>
> Key: IGNITE-7772
> URL: https://issues.apache.org/jira/browse/IGNITE-7772
> Project: Ignite
>  Issue Type: Task
>Reporter: Anton Vinogradov
>Assignee: Andrey Kuznetsov
>Priority: Major
>  Labels: iep-14
> Fix For: 2.5
>
>
> All critical workers listed in "IEP-14: Ignite failures handling" 
> (https://cwiki.apache.org/confluence/display/IGNITE/IEP-14+Ignite+failures+handling) 
> should be covered.





[jira] [Commented] (IGNITE-7499) DataRegionMetricsImpl#getPageSize returns ZERO for system data regions

2018-04-05 Thread Andrey Kuznetsov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-7499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16426798#comment-16426798
 ] 

Andrey Kuznetsov commented on IGNITE-7499:
--

Unlike regular data regions, the system data region configuration is hardcoded, 
and metrics are disabled for it. [~kuaw26], is there a real reason to query 
metrics for the system data region? Depending on the answer, we can choose how 
to fix the issue.

> DataRegionMetricsImpl#getPageSize returns ZERO for system data regions
> --
>
> Key: IGNITE-7499
> URL: https://issues.apache.org/jira/browse/IGNITE-7499
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Reporter: Alexey Kuznetsov
>Assignee: Andrey Kuznetsov
>Priority: Major
> Fix For: 2.5
>
>
> While working on IGNITE-7492, I found that DataRegionMetricsImpl#getPageSize 
> returns ZERO for system data regions.
> Meanwhile, there is also the 
> org.apache.ignite.internal.pagemem.PageMemory#systemPageSize method.
> That looks a bit strange: why do we need both PageSize and SystemPageSize?
>  





[jira] [Commented] (IGNITE-9679) Document critical workers liveness checking implementation

2018-10-30 Thread Andrey Kuznetsov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16668842#comment-16668842
 ] 

Andrey Kuznetsov commented on IGNITE-9679:
--

[~Artem Budnikov], thanks, great job!

Please consider a few minor remarks:
* A blocked (aka hanging) worker could be included in the Critical Failures list.
* Workers of the Data Streamer striped pool could be added to the mission-critical 
worker list.
* Due to [1], blocked worker timeout configuration has become a bit trickier. 
Should this be mentioned in the docs?

[1] 
https://issues.apache.org/jira/browse/IGNITE-9737?focusedCommentId=16632210&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16632210

> Document critical workers liveness checking implementation
> --
>
> Key: IGNITE-9679
> URL: https://issues.apache.org/jira/browse/IGNITE-9679
> Project: Ignite
>  Issue Type: Task
>  Components: documentation
>Reporter: Andrey Kuznetsov
>Assignee: Andrey Kuznetsov
>Priority: Major
> Fix For: 2.7
>
>
> Newly implemented critical worker thread liveness checks should be mentioned 
> in the Ignite documentation. A brief description of the functionality follows.
> An Ignite node has a number of critical worker threads that should be alive and 
> responsive; otherwise, the node's health is not guaranteed. These threads 
> monitor each other periodically and track two aspects of a thread being checked:
> - whether it is alive;
> - whether it updates its internal heartbeat timestamp.
> Whenever at least one of the above conditions is violated, the checker thread 
> logs an error and calls the currently configured {{FailureHandler}}.
> The {{IgniteConfiguration.SystemWorkerBlockedTimeout}} configuration property 
> affects monitoring behavior. At runtime, monitoring settings can be changed 
> via {{FailureHandlingMxBean}}.
> By default, liveness checks are enabled, but blocked system worker detection 
> will not lead to failure handler invocation; see 
> {{FailureProcessor#getDefaultFailureHandler}}.
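The two conditions above can be sketched in plain Java as follows. The class and method names are hypothetical and do not mirror Ignite's internal worker classes; blockedTimeoutMs plays the role of SystemWorkerBlockedTimeout, the failure handler is reduced to a Consumer<String>, and timestamps are passed in explicitly so the check is deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical checker for the two liveness conditions: alive, and heartbeating in time. */
class WorkerLivenessChecker {
    /** A monitored worker: its thread plus the last heartbeat timestamp it published. */
    static final class Worker {
        final Thread thread;
        private volatile long heartbeatTs;

        Worker(Thread thread, long nowMs) {
            this.thread = thread;
            this.heartbeatTs = nowMs;
        }

        /** Called by the worker itself on every iteration of its main loop. */
        void heartbeat(long nowMs) {
            heartbeatTs = nowMs;
        }

        long heartbeatTs() {
            return heartbeatTs;
        }
    }

    private final long blockedTimeoutMs;
    private final Consumer<String> failureHnd;

    WorkerLivenessChecker(long blockedTimeoutMs, Consumer<String> failureHnd) {
        this.blockedTimeoutMs = blockedTimeoutMs;
        this.failureHnd = failureHnd;
    }

    /** Returns true if the worker is healthy; otherwise reports the failure and returns false. */
    boolean check(Worker w, long nowMs) {
        if (!w.thread.isAlive()) {
            failureHnd.accept("Worker terminated: " + w.thread.getName());
            return false;
        }

        if (nowMs - w.heartbeatTs() > blockedTimeoutMs) {
            failureHnd.accept("Worker blocked: " + w.thread.getName());
            return false;
        }

        return true;
    }

    public static void main(String[] args) {
        List<String> failures = new ArrayList<>();
        WorkerLivenessChecker checker = new WorkerLivenessChecker(1_000, failures::add);

        Worker self = new Worker(Thread.currentThread(), 0);
        checker.check(self, 500);   // Healthy: heartbeat is 500 ms old.
        checker.check(self, 2_000); // Blocked: heartbeat is 2000 ms old.

        Worker neverStarted = new Worker(new Thread(() -> { }, "never-started"), 0);
        checker.check(neverStarted, 0); // Not alive.

        failures.forEach(System.out::println);
    }
}
```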





[jira] [Created] (IGNITE-10079) FileWriteAheadLogManager may return invalid lastCompactedSegment

2018-10-31 Thread Andrey Kuznetsov (JIRA)
Andrey Kuznetsov created IGNITE-10079:
-

 Summary: FileWriteAheadLogManager may return invalid 
lastCompactedSegment
 Key: IGNITE-10079
 URL: https://issues.apache.org/jira/browse/IGNITE-10079
 Project: Ignite
  Issue Type: Bug
  Components: persistence
Reporter: Andrey Kuznetsov
Assignee: Andrey Kuznetsov
 Fix For: 2.8
 Attachments: WalCompactionAfterRestartTest.java

As of the current {{master}} branch, 
{{FileWriteAheadLogManager#lastCompactedSegment}} may report -1 even after some 
segments have actually been compressed. A reproducer is attached.





[jira] [Commented] (IGNITE-9679) Document critical workers liveness checking implementation

2018-10-31 Thread Andrey Kuznetsov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669859#comment-16669859
 ] 

Andrey Kuznetsov commented on IGNITE-9679:
--

[~Artem Budnikov], sorry, maybe I overlooked something yesterday. Now I see 
that the configuration description is up to date.

> Document critical workers liveness checking implementation
> --
>
> Key: IGNITE-9679
> URL: https://issues.apache.org/jira/browse/IGNITE-9679
> Project: Ignite
>  Issue Type: Task
>  Components: documentation
>Reporter: Andrey Kuznetsov
>Assignee: Andrey Kuznetsov
>Priority: Major
> Fix For: 2.7
>
>
> Newly implemented critical worker thread liveness checks should be mentioned 
> in the Ignite documentation. A brief description of the functionality follows.
> An Ignite node has a number of critical worker threads that should be alive and 
> responsive; otherwise, the node's health is not guaranteed. These threads 
> monitor each other periodically and track two aspects of a thread being checked:
> - whether it is alive;
> - whether it updates its internal heartbeat timestamp.
> Whenever at least one of the above conditions is violated, the checker thread 
> logs an error and calls the currently configured {{FailureHandler}}.
> The {{IgniteConfiguration.SystemWorkerBlockedTimeout}} configuration property 
> affects monitoring behavior. At runtime, monitoring settings can be changed 
> via {{FailureHandlingMxBean}}.
> By default, liveness checks are enabled, but blocked system worker detection 
> will not lead to failure handler invocation; see 
> {{FailureProcessor#getDefaultFailureHandler}}.





[jira] [Commented] (IGNITE-10079) FileWriteAheadLogManager may return invalid lastCompactedSegment

2018-11-01 Thread Andrey Kuznetsov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16671237#comment-16671237
 ] 

Andrey Kuznetsov commented on IGNITE-10079:
---

[~ivandasch], got it. And this ordering works perfectly. 

There are still some ways to bring {{SegmentCompressStorage}} into an 
inconsistent state. For example, the fix I proposed only mitigates the 
consequences of independent changes to {{SegmentAware.lastTruncatedArchiveIdx}} 
and {{SegmentCompressStorage.compressingSegments}}. Is it possible to place 
both into the same class and modify them atomically?
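One possible shape for the change suggested above, with hypothetical names (this is not the actual SegmentAware or SegmentCompressStorage code): both pieces of state live in one class and are only modified under the same monitor, so a reader never sees the truncation index advanced without the matching compression entries being dropped.

```java
import java.util.TreeSet;

/** Hypothetical holder that keeps truncation and compression state consistent. */
class CompressionState {
    private long lastTruncatedArchiveIdx = -1;
    private final TreeSet<Long> compressingSegments = new TreeSet<>();

    /** Registers a segment for compression unless it has already been truncated. */
    synchronized boolean startCompressing(long idx) {
        if (idx <= lastTruncatedArchiveIdx)
            return false;
        return compressingSegments.add(idx);
    }

    synchronized void finishCompressing(long idx) {
        compressingSegments.remove(idx);
    }

    /** Advances the truncation index and drops stale entries in one atomic step. */
    synchronized void onTruncated(long idx) {
        lastTruncatedArchiveIdx = Math.max(lastTruncatedArchiveIdx, idx);
        compressingSegments.headSet(lastTruncatedArchiveIdx, true).clear();
    }

    synchronized boolean isCompressing(long idx) {
        return compressingSegments.contains(idx);
    }

    public static void main(String[] args) {
        CompressionState st = new CompressionState();
        st.startCompressing(1);
        st.startCompressing(2);
        st.onTruncated(1);
        System.out.println(st.isCompressing(1) + " " + st.isCompressing(2)); // false true
    }
}
```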

> FileWriteAheadLogManager may return invalid lastCompactedSegment
> 
>
> Key: IGNITE-10079
> URL: https://issues.apache.org/jira/browse/IGNITE-10079
> Project: Ignite
>  Issue Type: Bug
>  Components: persistence
>Reporter: Andrey Kuznetsov
>Assignee: Andrey Kuznetsov
>Priority: Major
> Fix For: 2.8
>
> Attachments: WalCompactionAfterRestartTest.java
>
>
> As of the current {{master}} branch, 
> {{FileWriteAheadLogManager#lastCompactedSegment}} may report -1 even after 
> some segments have actually been compressed. A reproducer is attached.





[jira] [Commented] (IGNITE-10079) FileWriteAheadLogManager may return invalid lastCompactedSegment

2018-11-02 Thread Andrey Kuznetsov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16673306#comment-16673306
 ] 

Andrey Kuznetsov commented on IGNITE-10079:
---

[~akalashnikov], thanks for your notes and suggestions. I will work on them 
next week.

> FileWriteAheadLogManager may return invalid lastCompactedSegment
> 
>
> Key: IGNITE-10079
> URL: https://issues.apache.org/jira/browse/IGNITE-10079
> Project: Ignite
>  Issue Type: Bug
>  Components: persistence
>Reporter: Andrey Kuznetsov
>Assignee: Andrey Kuznetsov
>Priority: Major
> Fix For: 2.8
>
> Attachments: WalCompactionAfterRestartTest.java
>
>
> As of the current {{master}} branch, 
> {{FileWriteAheadLogManager#lastCompactedSegment}} may report -1 even after 
> some segments have actually been compressed. A reproducer is attached.





[jira] [Commented] (IGNITE-10079) FileWriteAheadLogManager may return invalid lastCompactedSegment

2018-11-07 Thread Andrey Kuznetsov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16678168#comment-16678168
 ] 

Andrey Kuznetsov commented on IGNITE-10079:
---

[~akalashnikov], I have partly accepted your suggestions in Upsource; let us 
agree on the rest.

> FileWriteAheadLogManager may return invalid lastCompactedSegment
> 
>
> Key: IGNITE-10079
> URL: https://issues.apache.org/jira/browse/IGNITE-10079
> Project: Ignite
>  Issue Type: Bug
>  Components: persistence
>Reporter: Andrey Kuznetsov
>Assignee: Andrey Kuznetsov
>Priority: Major
> Fix For: 2.8
>
> Attachments: WalCompactionAfterRestartTest.java
>
>
> As of the current {{master}} branch, 
> {{FileWriteAheadLogManager#lastCompactedSegment}} may report -1 even after 
> some segments have actually been compressed. A reproducer is attached.





[jira] [Commented] (IGNITE-10079) FileWriteAheadLogManager may return invalid lastCompactedSegment

2018-11-08 Thread Andrey Kuznetsov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16679562#comment-16679562
 ] 

Andrey Kuznetsov commented on IGNITE-10079:
---

[~akalashnikov], do you have any more comments on this change?

> FileWriteAheadLogManager may return invalid lastCompactedSegment
> 
>
> Key: IGNITE-10079
> URL: https://issues.apache.org/jira/browse/IGNITE-10079
> Project: Ignite
>  Issue Type: Bug
>  Components: persistence
>Reporter: Andrey Kuznetsov
>Assignee: Andrey Kuznetsov
>Priority: Major
> Fix For: 2.8
>
> Attachments: WalCompactionAfterRestartTest.java
>
>
> As of the current {{master}} branch, 
> {{FileWriteAheadLogManager#lastCompactedSegment}} may report -1 even after 
> some segments have actually been compressed. A reproducer is attached.





[jira] [Assigned] (IGNITE-10386) Add mode when WAL won't be disabled during rebalancing caused by BLT change

2018-11-26 Thread Andrey Kuznetsov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Kuznetsov reassigned IGNITE-10386:
-

Assignee: Andrey Kuznetsov

> Add mode when WAL won't be disabled during rebalancing caused by BLT change
> ---
>
> Key: IGNITE-10386
> URL: https://issues.apache.org/jira/browse/IGNITE-10386
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Rakov
>Assignee: Andrey Kuznetsov
>Priority: Major
> Fix For: 2.8
>
>
> Enabling IgniteSystemProperties#IGNITE_DISABLE_WAL_DURING_REBALANCING 
> disables WAL for a cache group during rebalancing if the local node has no 
> OWNING partitions for this group. 
> We should add a mode in which, in a specific case (after a BaselineTopology 
> change), WAL won't be disabled even if this property is switched on.
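The requested behaviour can be expressed as a small predicate. Only the system property name comes from the ticket; the class, the method, its parameters, and in particular the keepWalOnBaselineChange switch for the proposed mode are hypothetical.

```java
class WalRebalanceDecision {
    /**
     * Decides whether WAL should be disabled for a cache group during rebalancing.
     *
     * @param disableWalDuringRebalancing value of IGNITE_DISABLE_WAL_DURING_REBALANCING
     * @param hasOwningPartitions whether the local node owns any partition of the group
     * @param causedByBaselineChange whether rebalancing was triggered by a BaselineTopology change
     * @param keepWalOnBaselineChange hypothetical switch for the proposed new mode
     */
    static boolean shouldDisableWal(boolean disableWalDuringRebalancing,
                                    boolean hasOwningPartitions,
                                    boolean causedByBaselineChange,
                                    boolean keepWalOnBaselineChange) {
        // Existing behaviour: WAL is disabled only if the property is on and nothing is OWNING.
        if (!disableWalDuringRebalancing || hasOwningPartitions)
            return false;

        // Proposed mode: keep WAL enabled when rebalancing follows a BLT change.
        return !(keepWalOnBaselineChange && causedByBaselineChange);
    }

    public static void main(String[] args) {
        System.out.println(shouldDisableWal(true, false, false, true)); // true
        System.out.println(shouldDisableWal(true, false, true, true));  // false
    }
}
```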





[jira] [Commented] (IGNITE-8823) Incorrect transaction state in tx manager

2018-12-14 Thread Andrey Kuznetsov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-8823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721565#comment-16721565
 ] 

Andrey Kuznetsov commented on IGNITE-8823:
--

[~dpavlov], I have just updated the branch to the new master. Also, TeamCity 
tests have been re-triggered; waiting for results.

> Incorrect transaction state in tx manager
> -
>
> Key: IGNITE-8823
> URL: https://issues.apache.org/jira/browse/IGNITE-8823
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.6
>Reporter: Andrey Gura
>Assignee: Andrey Kuznetsov
>Priority: Major
> Fix For: 2.8
>
> Attachments: Ignite8823ReproducerTest.java
>
>
> Reproducible by test method {{testCreateConsistencyMultithreaded}} in 
> {{IgfsPrimaryMultiNodeSelfTest}} and 
> {{IgfsPrimaryRelaxedConsistencyMultiNodeSelfTest}}:
> {noformat}
> 18:34:40,701][SEVERE][sys-stripe-0-#44%ignite%][GridCacheIoManager] Failed 
> processing message [senderId=e273c3f8-02ed-4201-9ac8-09f9ab6a1d31, 
> msg=GridNearTxPrepareResponse [pending=[], 
> futId=b4df8831461-9735f9d5-79a0-47a3-a951-e62a03af71ef, miniId=1, 
> dhtVer=GridCacheVersion [topVer=140816081, order=1529336085358, nodeOrder=3], 
> writeVer=GridCacheVersion [topVer=140816081, order=1529336085360, 
> nodeOrder=3], ownedVals=null, retVal=GridCacheReturn [v=null, cacheObj=null, 
> success=true, invokeRes=true, loc=true, cacheId=0], clientRemapVer=null, 
> super=GridDistributedTxPrepareResponse 
> [txState=IgniteTxImplicitSingleStateImpl [init=true, recovery=false], 
> part=-1, err=null, super=GridDistributedBaseMessage [ver=GridCacheVersion 
> [topVer=140816081, order=1529336085224, nodeOrder=1], committedVers=null, 
> rolledbackVers=null, cnt=0, super=GridCacheIdMessage [cacheId=0]
> java.lang.AssertionError: true instead of GridCacheReturnCompletableWrapper
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager.removeTxReturn(IgniteTxManager.java:1098)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxFinishFuture.ackBackup(GridNearTxFinishFuture.java:533)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxFinishFuture.doFinish(GridNearTxFinishFuture.java:500)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxFinishFuture.finish(GridNearTxFinishFuture.java:417)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal$19.apply(GridNearTxLocal.java:3341)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal$19.apply(GridNearTxLocal.java:3335)
>   at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383)
>   at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.unblock(GridFutureAdapter.java:347)
>   at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.unblockAll(GridFutureAdapter.java:335)
>   at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:495)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheCompoundFuture.onDone(GridCacheCompoundFuture.java:56)
>   at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:474)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFuture.onComplete(GridNearOptimisticTxPrepareFuture.java:310)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFuture.onDone(GridNearOptimisticTxPrepareFuture.java:288)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFuture.onDone(GridNearOptimisticTxPrepareFuture.java:78)
>   at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:451)
>   at 
> org.apache.ignite.internal.util.future.GridCompoundFuture.checkComplete(GridCompoundFuture.java:285)
>   at 
> org.apache.ignite.internal.util.future.GridCompoundFuture.apply(GridCompoundFuture.java:144)
>   at 
> org.apache.ignite.internal.util.future.GridCompoundFuture.apply(GridCompoundFuture.java:45)
>   at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383)
>   at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.unblock(GridFutureAdapter.java:347)
>   at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.unblockAll(GridFutureAdapter.java:335)
>   at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:495)
>   at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:474)
>   at 
> org.apache.ignite.internal.util.future.Gr

[jira] [Created] (IGNITE-9601) Write rollover WAL record as the last

2018-09-14 Thread Andrey Kuznetsov (JIRA)
Andrey Kuznetsov created IGNITE-9601:


 Summary: Write rollover WAL record as the last 
 Key: IGNITE-9601
 URL: https://issues.apache.org/jira/browse/IGNITE-9601
 Project: Ignite
  Issue Type: Task
Affects Versions: 2.6
Reporter: Andrey Kuznetsov
Assignee: Andrey Kuznetsov
 Fix For: 2.8


Currently, rollover WAL record gets to the next segment when being logged. 
Moreover, the implementation does allows data races, and rollover record is not 
necessarily the first record in the next segment. We are to add an option to 
logging facility to allow writing rollover record to the end of the current 
segment; subsequent records should get to the next segment then.
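The ordering the task asks for can be illustrated with a minimal sketch. It assumes a toy in-memory segmented log; the `SegmentedLog` class, its `ROLLOVER` marker and the `rolloverInCurrentSegment` flag are hypothetical and are not Ignite's WAL API. A single lock makes "append the rollover marker, then switch segments" atomic. As a result, the marker is always the last record of the old segment, and every later record lands in the new one.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy WAL: illustrates record ordering only, not Ignite's WAL manager. */
public class SegmentedLog {
    /** Payload standing in for a rollover WAL record. */
    public static final String ROLLOVER = "ROLLOVER";

    private final List<List<String>> segments = new ArrayList<>();

    public SegmentedLog() {
        segments.add(new ArrayList<>());
    }

    /** Appends a regular record to the current (last) segment. */
    public synchronized void log(String rec) {
        current().add(rec);
    }

    /**
     * Rolls the log over. With rolloverInCurrentSegment == true the marker closes the
     * current segment; otherwise it opens the next one (the behaviour described above).
     * Holding the monitor for both steps prevents a concurrent log() call from
     * slipping a record between the marker and the segment switch.
     */
    public synchronized void rollOver(boolean rolloverInCurrentSegment) {
        if (rolloverInCurrentSegment) {
            current().add(ROLLOVER);
            segments.add(new ArrayList<>());
        }
        else {
            segments.add(new ArrayList<>());
            current().add(ROLLOVER);
        }
    }

    /** Returns a snapshot copy of segment idx. */
    public synchronized List<String> segment(int idx) {
        return new ArrayList<>(segments.get(idx));
    }

    private List<String> current() {
        return segments.get(segments.size() - 1);
    }

    public static void main(String[] args) {
        SegmentedLog wal = new SegmentedLog();
        wal.log("a");
        wal.rollOver(true);
        wal.log("b");
        System.out.println(wal.segment(0) + " " + wal.segment(1)); // prints "[a, ROLLOVER] [b]"
    }
}
```

A real WAL would also have to flush the marker before the segment is handed over for archiving. The sketch covers only the in-memory ordering guarantee.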



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-9601) Write rollover WAL record as the last record in current segment

2018-09-14 Thread Andrey Kuznetsov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Kuznetsov updated IGNITE-9601:
-
Summary: Write rollover WAL record as the last record in current segment  
(was: Write rollover WAL record as the last )

> Write rollover WAL record as the last record in current segment
> ---
>
> Key: IGNITE-9601
> URL: https://issues.apache.org/jira/browse/IGNITE-9601
> Project: Ignite
>  Issue Type: Task
>Affects Versions: 2.6
>Reporter: Andrey Kuznetsov
>Assignee: Andrey Kuznetsov
>Priority: Major
> Fix For: 2.8
>
>
> Currently, rollover WAL record gets to the next segment when being logged. 
> Moreover, the implementation does allows data races, and rollover record is 
> not necessarily the first record in the next segment. We are to add an option 
> to logging facility to allow writing rollover record to the end of the 
> current segment; subsequent records should get to the next segment then.





[jira] [Updated] (IGNITE-9601) Write rollover WAL record as the last record in current segment

2018-09-14 Thread Andrey Kuznetsov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Kuznetsov updated IGNITE-9601:
-
Description: Currently, rollover WAL record gets to the next segment when 
being logged. Moreover, the implementation allows data races, and rollover 
record is not necessarily the first record in the next segment. We are to add 
an option to logging facility to allow writing rollover record to the end of 
the current segment; subsequent records should get to the next segment then.  
(was: Currently, rollover WAL record gets to the next segment when being 
logged. Moreover, the implementation does allows data races, and rollover 
record is not necessarily the first record in the next segment. We are to add 
an option to logging facility to allow writing rollover record to the end of 
the current segment; subsequent records should get to the next segment then.)

> Write rollover WAL record as the last record in current segment
> ---
>
> Key: IGNITE-9601
> URL: https://issues.apache.org/jira/browse/IGNITE-9601
> Project: Ignite
>  Issue Type: Task
>Affects Versions: 2.6
>Reporter: Andrey Kuznetsov
>Assignee: Andrey Kuznetsov
>Priority: Major
> Fix For: 2.8
>
>
> Currently, rollover WAL record gets to the next segment when being logged. 
> Moreover, the implementation allows data races, and rollover record is not 
> necessarily the first record in the next segment. We are to add an option to 
> logging facility to allow writing rollover record to the end of the current 
> segment; subsequent records should get to the next segment then.





[jira] [Commented] (IGNITE-6587) Ignite watchdog service

2018-09-18 Thread Andrey Kuznetsov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618882#comment-16618882
 ] 

Andrey Kuznetsov commented on IGNITE-6587:
--

[~agura], I've updated the implementation after discussing your points, see 
[1]. Now it's waiting for your review.

> Ignite watchdog service
> ---
>
> Key: IGNITE-6587
> URL: https://issues.apache.org/jira/browse/IGNITE-6587
> Project: Ignite
>  Issue Type: Improvement
>  Components: general
>Affects Versions: 2.2
>Reporter: Alexey Goncharuk
>Assignee: Andrey Kuznetsov
>Priority: Major
>  Labels: IEP-5
> Fix For: 2.7
>
> Attachments: watchdog.sh
>
>
> As described in [1], each Ignite node has a number of system-critical 
> threads. We should implement a periodic check that calls the failure handler 
> when one of the following conditions has been detected:
> * A critical thread is no longer alive.
> * A critical thread 'hangs' for a long time, e.g. while executing a task 
> taken from the task queue.
> If a failure condition is detected, the call stacks of all threads should be 
> logged before the failure handler is invoked.
> The actual list of system-critical threads can be found at [1].
> Implementations based on a separate diagnostic thread seem fragile, because 
> that thread becomes a vulnerable point with respect to thread termination and 
> CPU resource starvation. So we should use a self-monitoring approach: critical 
> threads themselves should monitor each other.
> Currently the {{o.a.i.internal.worker.WorkersRegistry}} facility is the best 
> fit for storing and tracking system-critical threads. All of them should be 
> refactored to be {{GridWorker}} instances and added to {{WorkersRegistry}}. 
> Each worker should periodically choose some subset of peer workers and check 
> whether
> * All of them are alive.
> * All of them are actively running.
> A 'heartbeat' timestamp has to be added to each worker in order to implement 
> the latter check. Additionally, untimed queue polls, monitor waits and thread 
> parks in system-critical threads should be refactored to their timed 
> equivalents.
> Monitoring parameters (enable/disable, check interval, thread 'hang' 
> threshold, etc.) are to be set via system properties.
> [1] 
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-14+Ignite+failures+handling
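The heartbeat-based check described above can be sketched minimally as follows. The `HeartbeatWorker` class and its methods are hypothetical and are not Ignite's `GridWorker` or `WorkersRegistry`. Each worker refreshes a volatile timestamp on every loop iteration. It uses a timed `poll` rather than `take`, so an idle worker still proves it is alive. A peer check then flags workers that have died or whose heartbeat is older than a threshold. In the proposed design, each worker would periodically run `findFailedPeer` on some subset of its peers and pass a non-null result to the failure handler. That wiring is omitted here.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of heartbeat-based liveness checks (not Ignite's GridWorker). */
public class HeartbeatWorker implements Runnable {
    private final String name;
    private final BlockingQueue<Runnable> tasks = new LinkedBlockingQueue<>();
    /** Last time (System.currentTimeMillis) this worker showed progress. */
    private volatile long heartbeatTs = System.currentTimeMillis();
    private volatile Thread runner;

    public HeartbeatWorker(String name) { this.name = name; }

    public void submit(Runnable task) { tasks.add(task); }

    public long heartbeatTs() { return heartbeatTs; }

    private void updateHeartbeat() { heartbeatTs = System.currentTimeMillis(); }

    @Override public void run() {
        runner = Thread.currentThread();
        try {
            while (!Thread.currentThread().isInterrupted()) {
                // Timed poll instead of an unbounded take(): idle workers keep beating.
                Runnable task = tasks.poll(100, TimeUnit.MILLISECONDS);
                updateHeartbeat();
                if (task != null) {
                    task.run(); // A task that blocks here stops the heartbeat.
                    updateHeartbeat();
                }
            }
        }
        catch (InterruptedException ignored) {
            Thread.currentThread().interrupt();
        }
    }

    /**
     * Returns a description of the first peer that is dead or whose heartbeat is older
     * than hangThresholdMs, or null if all peers look healthy.
     */
    public static String findFailedPeer(List<HeartbeatWorker> peers, long hangThresholdMs) {
        long now = System.currentTimeMillis();
        for (HeartbeatWorker w : peers) {
            Thread t = w.runner;
            if (t != null && !t.isAlive())
                return w.name + " is not alive";
            long silence = now - w.heartbeatTs();
            if (silence > hangThresholdMs)
                return w.name + " is blocked for " + silence + " ms";
        }
        return null;
    }
}
```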





[jira] [Comment Edited] (IGNITE-6587) Ignite watchdog service

2018-09-18 Thread Andrey Kuznetsov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618882#comment-16618882
 ] 

Andrey Kuznetsov edited comment on IGNITE-6587 at 9/18/18 10:28 AM:


[~agura], I've updated the implementation after discussing your points, see 
[1]. Now it's waiting for your review.

[1] 
http://apache-ignite-developers.2346864.n4.nabble.com/Critical-worker-threads-liveness-checking-drawbacks-td34783.html



was (Author: andrey-kuznetsov):
[~agura], I've updated the implementation after discussing your points, see 
[1]. Now it's waiting for your review.






[jira] [Created] (IGNITE-9640) [TC Bot] Determine repetitive failure types by analyzing build log

2018-09-18 Thread Andrey Kuznetsov (JIRA)
Andrey Kuznetsov created IGNITE-9640:


 Summary: [TC Bot] Determine repetitive failure types by analyzing 
build log
 Key: IGNITE-9640
 URL: https://issues.apache.org/jira/browse/IGNITE-9640
 Project: Ignite
  Issue Type: Task
Reporter: Andrey Kuznetsov


When someone is analyzing a flaky test failure, it's important to distinguish 
between a newly introduced failure and a pre-existing one. In the latter case, 
the bot should not draw the contributor's attention to the test.

In more detail, TC build log fragments for identical failures very often start 
with identical substrings, e.g.

{noformat}
junit.framework.AssertionFailedError
at 
org.apache.ignite.internal.GridVersionSelfTest.testVersions(GridVersionSelfTest.java:54)
{noformat}
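The prefix-matching idea can be sketched under one assumption: a failure's signature is its exception line plus its first N stack frames, with all whitespace collapsed. Collapsing the whitespace makes a frame wrapped as "at" / "org..." compare equal to "at org...". The `FailureSignatures` class is hypothetical and is not the TC Bot's code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical helper: classifies a failure as new or already known by its log prefix. */
public class FailureSignatures {
    private final int frames;
    private final Set<String> known = new HashSet<>();

    /** @param frames Number of stack frames (after the exception line) kept in a signature. */
    public FailureSignatures(int frames) { this.frames = frames; }

    /** Exception line plus the first frames, with line breaks and indentation collapsed. */
    public String signature(String logFragment) {
        String flat = logFragment.trim().replaceAll("\\s+", " ");
        return Arrays.stream(flat.split(" at "))
            .limit(1 + frames)
            .collect(Collectors.joining(" at "));
    }

    /** Records the failure; returns true only if its signature has not been seen before. */
    public boolean isNew(String logFragment) {
        return known.add(signature(logFragment));
    }
}
```

Choosing `frames` is a trade-off. A small value groups failures that share a top frame but have different root causes. A large value separates failures that differ only deep in the stack.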






[jira] [Commented] (IGNITE-8570) Create lighter version of GridStringLogger

2018-09-18 Thread Andrey Kuznetsov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-8570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619398#comment-16619398
 ] 

Andrey Kuznetsov commented on IGNITE-8570:
--

Thanks, [~xtern]. I'll examine your change in a day.

> Create lighter version of GridStringLogger
> --
>
> Key: IGNITE-8570
> URL: https://issues.apache.org/jira/browse/IGNITE-8570
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.4
>Reporter: Andrey Kuznetsov
>Assignee: Pavel Pereslegin
>Priority: Major
> Fix For: 2.7
>
>
> _+Problem with current GridStringLogger implementation+_: 
>  Most usages of {{GridStringLogger}} in tests assume the following scenario. 
> First, it is set as a logger for some Ignite node. 
>  Then, after some activity on that node, log content is searched for some 
> predefined strings. 
>  {{GridStringLogger}} internally uses a {{StringBuilder}} of bounded size to 
> store log contents; older contents get dropped once that size is exhausted. 
>  Thus, changes that add more logging may break unrelated tests that 
> use {{GridStringLogger}}.
>  
> +_The suggestion for a new implementation:_+
>  The suggestion is to implement and use another test logger conforming to 
> these requirements:
>  * It does not accumulate any logs (actually, it will not print logs 
> anywhere)
>  * It allows setting a listener that fires when a log message matches a 
> certain regular expression; the {{Matcher}} can be passed to the listener
>  
> _+Proposed design+_, pseudocode:
> ```
> class GridRegexpLogger implements IgniteLogger {
>     …
>     debug(String str) {
>         if (/* str matches pattern. */)
>             { /* notify listeners. */ }
>     }
>     …
>     listen("regexp", IgniteInClosure loggerListener) // listener receives message
>     { /* registers listener. */ }
>     listenDebug("regexp", loggerListener)
>     { /* registers listener for debug output only. */ }
>     …
> }
> ```
> +_Sample regexp logger usage_+:
> ```
> GridRegexpLogger logger = new GridRegexpLogger();
> logger.listen("regexp", new GridRegexpListener());
> logger.listenDebug("regexp", new GridRegexpListener());
> ```
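The listener part of the proposed design can be sketched as a standalone class. The sketch assumes `java.util.function.Consumer<Matcher>` in place of `IgniteInClosure`. It does not implement `IgniteLogger`, whose full method set is out of scope here. `RegexpLogListeners` and its methods are hypothetical names. No message is stored. Each message is only matched against the registered patterns, and the `Matcher` is handed to the listener.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: keeps no log history, only dispatches matching messages. */
public class RegexpLogListeners {
    private static final class Registration {
        final Pattern pattern;
        final boolean debugOnly;
        final Consumer<Matcher> listener;

        Registration(Pattern pattern, boolean debugOnly, Consumer<Matcher> listener) {
            this.pattern = pattern;
            this.debugOnly = debugOnly;
            this.listener = listener;
        }
    }

    // Copy-on-write so listeners can be registered while other threads are logging.
    private final List<Registration> regs = new CopyOnWriteArrayList<>();

    /** Listens to messages of any level. */
    public void listen(String regexp, Consumer<Matcher> listener) {
        regs.add(new Registration(Pattern.compile(regexp), false, listener));
    }

    /** Listens to debug-level messages only. */
    public void listenDebug(String regexp, Consumer<Matcher> listener) {
        regs.add(new Registration(Pattern.compile(regexp), true, listener));
    }

    public void debug(String msg) { dispatch(msg, true); }

    public void info(String msg) { dispatch(msg, false); }

    private void dispatch(String msg, boolean isDebug) {
        for (Registration r : regs) {
            if (r.debugOnly && !isDebug)
                continue;
            Matcher m = r.pattern.matcher(msg);
            if (m.find())
                r.listener.accept(m); // The message itself is not retained anywhere.
        }
    }
}
```

Because nothing is buffered, adding more logging elsewhere cannot evict the line a test waits for. That is the failure mode of the bounded `StringBuilder` described above.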





[jira] [Updated] (IGNITE-8823) Incorrect transaction state in tx manager

2018-09-19 Thread Andrey Kuznetsov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-8823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Kuznetsov updated IGNITE-8823:
-
Fix Version/s: 2.8

> Incorrect transaction state in tx manager
> -
>
> Key: IGNITE-8823
> URL: https://issues.apache.org/jira/browse/IGNITE-8823
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.6
>Reporter: Andrey Gura
>Assignee: Andrey Kuznetsov
>Priority: Major
> Fix For: 2.8
>
> Attachments: Ignite8823ReproducerTest.java
>
>
> Reproducible by the test method {{testCreateConsistencyMultithreaded}} in 
> {{IgfsPrimaryMultiNodeSelfTest}} and 
> {{IgfsPrimaryRelaxedConsistencyMultiNodeSelfTest}}:
> {noformat}
> 18:34:40,701][SEVERE][sys-stripe-0-#44%ignite%][GridCacheIoManager] Failed 
> processing message [senderId=e273c3f8-02ed-4201-9ac8-09f9ab6a1d31, 
> msg=GridNearTxPrepareResponse [pending=[], 
> futId=b4df8831461-9735f9d5-79a0-47a3-a951-e62a03af71ef, miniId=1, 
> dhtVer=GridCacheVersion [topVer=140816081, order=1529336085358, nodeOrder=3], 
> writeVer=GridCacheVersion [topVer=140816081, order=1529336085360, 
> nodeOrder=3], ownedVals=null, retVal=GridCacheReturn [v=null, cacheObj=null, 
> success=true, invokeRes=true, loc=true, cacheId=0], clientRemapVer=null, 
> super=GridDistributedTxPrepareResponse 
> [txState=IgniteTxImplicitSingleStateImpl [init=true, recovery=false], 
> part=-1, err=null, super=GridDistributedBaseMessage [ver=GridCacheVersion 
> [topVer=140816081, order=1529336085224, nodeOrder=1], committedVers=null, 
> rolledbackVers=null, cnt=0, super=GridCacheIdMessage [cacheId=0]
> java.lang.AssertionError: true instead of GridCacheReturnCompletableWrapper
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager.removeTxReturn(IgniteTxManager.java:1098)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxFinishFuture.ackBackup(GridNearTxFinishFuture.java:533)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxFinishFuture.doFinish(GridNearTxFinishFuture.java:500)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxFinishFuture.finish(GridNearTxFinishFuture.java:417)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal$19.apply(GridNearTxLocal.java:3341)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal$19.apply(GridNearTxLocal.java:3335)
>   at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383)
>   at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.unblock(GridFutureAdapter.java:347)
>   at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.unblockAll(GridFutureAdapter.java:335)
>   at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:495)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheCompoundFuture.onDone(GridCacheCompoundFuture.java:56)
>   at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:474)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFuture.onComplete(GridNearOptimisticTxPrepareFuture.java:310)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFuture.onDone(GridNearOptimisticTxPrepareFuture.java:288)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFuture.onDone(GridNearOptimisticTxPrepareFuture.java:78)
>   at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:451)
>   at 
> org.apache.ignite.internal.util.future.GridCompoundFuture.checkComplete(GridCompoundFuture.java:285)
>   at 
> org.apache.ignite.internal.util.future.GridCompoundFuture.apply(GridCompoundFuture.java:144)
>   at 
> org.apache.ignite.internal.util.future.GridCompoundFuture.apply(GridCompoundFuture.java:45)
>   at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383)
>   at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.unblock(GridFutureAdapter.java:347)
>   at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.unblockAll(GridFutureAdapter.java:335)
>   at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:495)
>   at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:474)
>   at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:451)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPr

[jira] [Updated] (IGNITE-8823) Incorrect transaction state in tx manager

2018-09-19 Thread Andrey Kuznetsov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-8823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Kuznetsov updated IGNITE-8823:
-
Affects Version/s: 2.6


[jira] [Updated] (IGNITE-8862) IgniteChangeGlobalStateTest hangs on TeamCity

2018-09-19 Thread Andrey Kuznetsov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-8862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Kuznetsov updated IGNITE-8862:
-
Fix Version/s: 2.8

> IgniteChangeGlobalStateTest hangs on TeamCity
> -
>
> Key: IGNITE-8862
> URL: https://issues.apache.org/jira/browse/IGNITE-8862
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.5
>Reporter: Andrey Kuznetsov
>Assignee: Andrey Kuznetsov
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.8
>
>






[jira] [Updated] (IGNITE-6893) Java Deadlocks monitoring

2018-09-19 Thread Andrey Kuznetsov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-6893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Kuznetsov updated IGNITE-6893:
-
Fix Version/s: (was: 2.7)
   2.8

> Java Deadlocks monitoring
> -
>
> Key: IGNITE-6893
> URL: https://issues.apache.org/jira/browse/IGNITE-6893
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Anton Vinogradov
>Assignee: Andrey Kuznetsov
>Priority: Major
>  Labels: iep-7
> Fix For: 2.8
>
>
> Java Level Deadlocks
> Description
> This situation occurs when user code or Ignite itself runs into a Java-level 
> deadlock due to a bug: synchronized(mux1) {synchronized (mux2) {}} sections 
> nested in reverse order, reentrant locks acquired in reverse order, etc.
> Detection and Solution
> This most likely cannot be resolved automatically and will require a JVM 
> restart.
> We can implement periodic thread dump analysis to detect the deadlock. 
> Report
> Deadlock should be reported to the logs.
> Web Console should fire an alert on java deadlock detection and display a 
> warning on UI.
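The "periodic thread dump analysis" step can be sketched using only the JDK's `java.lang.management.ThreadMXBean`. Its `findDeadlockedThreads()` method reports cycles on both object monitors (`synchronized`) and ownable synchronizers such as `ReentrantLock`. `DeadlockDetector` is a hypothetical name. Logging the report and raising a Web Console alert are left out.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a Java-level deadlock check built on the JDK's ThreadMXBean. */
public class DeadlockDetector {
    private final ThreadMXBean mx = ManagementFactory.getThreadMXBean();

    /** Returns one report line per deadlocked thread, or an empty list if there is no deadlock. */
    public List<String> check() {
        List<String> report = new ArrayList<>();
        long[] ids = mx.findDeadlockedThreads(); // null when no thread is deadlocked
        if (ids == null)
            return report;
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info != null) // null if the thread terminated in the meantime
                report.add(info.getThreadName() + " is waiting for " + info.getLockName()
                    + " held by " + info.getLockOwnerName());
        }
        return report;
    }
}
```

To run the check periodically, a `ScheduledExecutorService` could call `check()` every few seconds and log any non-empty result. Since the deadlocked threads cannot be recovered, the report is mainly useful as input for the JVM restart decision.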





[jira] [Updated] (IGNITE-7499) DataRegionMetricsImpl#getPageSize returns ZERO for system data regions

2018-09-19 Thread Andrey Kuznetsov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-7499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Kuznetsov updated IGNITE-7499:
-
Fix Version/s: (was: 2.7)
   2.8

> DataRegionMetricsImpl#getPageSize returns ZERO for system data regions
> --
>
> Key: IGNITE-7499
> URL: https://issues.apache.org/jira/browse/IGNITE-7499
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Reporter: Alexey Kuznetsov
>Assignee: Andrey Kuznetsov
>Priority: Major
> Fix For: 2.8
>
>
> While working on IGNITE-7492, I found that DataRegionMetricsImpl#getPageSize 
> returns ZERO for system data regions.
> Meanwhile, there is also the 
> org.apache.ignite.internal.pagemem.PageMemory#systemPageSize method.
> That looks a bit strange: why do we need both PageSize and SystemPageSize?
>  





[jira] [Updated] (IGNITE-8290) Activation message handling fails with AssertionError sporadically.

2018-09-19 Thread Andrey Kuznetsov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Kuznetsov updated IGNITE-8290:
-
Fix Version/s: 2.8

> Activation message handling fails with AssertionError sporadically.
> ---
>
> Key: IGNITE-8290
> URL: https://issues.apache.org/jira/browse/IGNITE-8290
> Project: Ignite
>  Issue Type: Bug
>  Components: persistence
>Reporter: Andrew Mashenkov
>Assignee: Andrey Kuznetsov
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.8
>
> Attachments: disco-msg-fails-2.stack, disco-msg-fails.stack
>
>
> Some tests fail sporadically due to an AssertionError while processing a 
> custom discovery message, which can lead to grid and test hangs.
> PFA stack traces.
> org.apache.ignite.internal.processors.cache.persistence.db.IgnitePdsWholeClusterRestartTest
>  is a good starting point.
> Although the test passes on master, every run of it logs a lot of 
> AssertionErrors.





[jira] [Commented] (IGNITE-8570) Create lighter version of GridStringLogger

2018-09-19 Thread Andrey Kuznetsov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-8570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620820#comment-16620820
 ] 

Andrey Kuznetsov commented on IGNITE-8570:
--

After discussion at Upsource and subsequent minor changes, I'm OK with the PR.

> Create lighter version of GridStringLogger
> --
>
> Key: IGNITE-8570
> URL: https://issues.apache.org/jira/browse/IGNITE-8570
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.4
>Reporter: Andrey Kuznetsov
>Assignee: Pavel Pereslegin
>Priority: Major
> Fix For: 2.8
>
>
> _+Problem with current GridStringLogger implementation+_: 
>  Most usages of {{GridStringLogger}} in tests assume the following scenario. 
> First, it is set as a logger for some Ignite node. 
>  Then, after some activity on that node, log content is searched for some 
> predefined strings. 
>  {{GridStringLogger}} internally uses a {{StringBuilder}} of bounded size to 
> store log contents; older contents get dropped once that size is exhausted. 
>  Thus, changes that add more logging may break unrelated tests that 
> use {{GridStringLogger}}.
>  
> +_The suggestion for a new implementation:_+
>  The suggestion is to implement and use another test logger conforming to 
> these requirements:
>  * It does not accumulate any logs (actually, it will not print logs 
> anywhere)
>  * It allows setting a listener that fires when a log message matches a 
> certain regular expression; the {{Matcher}} can be passed to the listener
>  
> _+Proposed design+_, pseudocode:
> ```
> class GridRegexpLogger implements IgniteLogger {
>     …
>     debug(String str) {
>         if (/* str matches pattern. */)
>             { /* notify listeners. */ }
>     }
>     …
>     listen("regexp", IgniteInClosure loggerListener) // listener receives message
>     { /* registers listener. */ }
>     listenDebug("regexp", loggerListener)
>     { /* registers listener for debug output only. */ }
>     …
> }
> ```
> +_Sample regexp logger usage_+:
> ```
> GridRegexpLogger logger = new GridRegexpLogger();
> logger.listen("regexp", new GridRegexpListener());
> logger.listenDebug("regexp", new GridRegexpListener());
> ```





[jira] [Created] (IGNITE-9653) StopNodeOrHaltFailureHandlerTest.testJvmHalted has flaky failures on master branch

2018-09-20 Thread Andrey Kuznetsov (JIRA)
Andrey Kuznetsov created IGNITE-9653:


 Summary: StopNodeOrHaltFailureHandlerTest.testJvmHalted has flaky 
failures on master branch
 Key: IGNITE-9653
 URL: https://issues.apache.org/jira/browse/IGNITE-9653
 Project: Ignite
  Issue Type: Bug
Reporter: Andrey Kuznetsov
 Fix For: 2.8


```
junit.framework.AssertionFailedError
at 
org.apache.ignite.failure.StopNodeOrHaltFailureHandlerTest.testJvmHalted(StopNodeOrHaltFailureHandlerTest.java:93)
```





[jira] [Updated] (IGNITE-9653) StopNodeOrHaltFailureHandlerTest.testJvmHalted has flaky failures on master branch

2018-09-20 Thread Andrey Kuznetsov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Kuznetsov updated IGNITE-9653:
-
Description: 
{noformat}
junit.framework.AssertionFailedError
at 
org.apache.ignite.failure.StopNodeOrHaltFailureHandlerTest.testJvmHalted(StopNodeOrHaltFailureHandlerTest.java:93)

{noformat}


  was:
```
junit.framework.AssertionFailedError
at 
org.apache.ignite.failure.StopNodeOrHaltFailureHandlerTest.testJvmHalted(StopNodeOrHaltFailureHandlerTest.java:93)
```


> StopNodeOrHaltFailureHandlerTest.testJvmHalted has flaky failures on master 
> branch
> --
>
> Key: IGNITE-9653
> URL: https://issues.apache.org/jira/browse/IGNITE-9653
> Project: Ignite
>  Issue Type: Bug
>Reporter: Andrey Kuznetsov
>Priority: Major
> Fix For: 2.8
>
>
> {noformat}
> junit.framework.AssertionFailedError
> at 
> org.apache.ignite.failure.StopNodeOrHaltFailureHandlerTest.testJvmHalted(StopNodeOrHaltFailureHandlerTest.java:93)
> {noformat}





[jira] [Updated] (IGNITE-9653) StopNodeOrHaltFailureHandlerTest.testJvmHalted has flaky failures on master branch

2018-09-20 Thread Andrey Kuznetsov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Kuznetsov updated IGNITE-9653:
-
Ignite Flags:   (was: Docs Required)

> StopNodeOrHaltFailureHandlerTest.testJvmHalted has flaky failures on master 
> branch
> --
>
> Key: IGNITE-9653
> URL: https://issues.apache.org/jira/browse/IGNITE-9653
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.6
>Reporter: Andrey Kuznetsov
>Priority: Major
> Fix For: 2.8
>
>
> {noformat}
> junit.framework.AssertionFailedError
> at 
> org.apache.ignite.failure.StopNodeOrHaltFailureHandlerTest.testJvmHalted(StopNodeOrHaltFailureHandlerTest.java:93)
> {noformat}





[jira] [Updated] (IGNITE-9653) StopNodeOrHaltFailureHandlerTest.testJvmHalted has flaky failures on master branch

2018-09-20 Thread Andrey Kuznetsov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Kuznetsov updated IGNITE-9653:
-
Affects Version/s: 2.6

> StopNodeOrHaltFailureHandlerTest.testJvmHalted has flaky failures on master 
> branch
> --
>
> Key: IGNITE-9653
> URL: https://issues.apache.org/jira/browse/IGNITE-9653
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.6
>Reporter: Andrey Kuznetsov
>Priority: Major
> Fix For: 2.8
>
>
> {noformat}
> junit.framework.AssertionFailedError
> at 
> org.apache.ignite.failure.StopNodeOrHaltFailureHandlerTest.testJvmHalted(StopNodeOrHaltFailureHandlerTest.java:93)
> {noformat}





[jira] [Created] (IGNITE-9660) Switch default test FailureHandler to StopNodeFailureHandler

2018-09-20 Thread Andrey Kuznetsov (JIRA)
Andrey Kuznetsov created IGNITE-9660:


 Summary: Switch default test FailureHandler to 
StopNodeFailureHandler
 Key: IGNITE-9660
 URL: https://issues.apache.org/jira/browse/IGNITE-9660
 Project: Ignite
  Issue Type: Test
Affects Versions: 2.6
Reporter: Andrey Kuznetsov
 Fix For: 2.8


{{GridAbstractTest.getFailureHandler()}} returns a {{NoOpFailureHandler}} 
instance. This often hides bugs that occur in tests. 
{{getFailureHandler()}} should return {{StopNodeFailureHandler}} instead.

The change implies re-checking failed tests and setting the handler to 
{{NoOpFailureHandler}} only in subclasses where it is really needed.
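A minimal sketch of the proposed default, using simplified, hypothetical stand-ins for Ignite's types (the real {{FailureHandler}} interface receives the node and a failure context, and the real base class is {{GridAbstractTest}}). The base test class returns a stopping handler, and only tests that really need it override the method to return a no-op handler:

```java
// Simplified, hypothetical stand-ins for Ignite types; the real FailureHandler
// interface has a different signature (it receives the node and a failure context).
interface FailureHandler {
    /** Returns true if the node has been invalidated by the handler. */
    boolean onFailure(String failureDescription);
}

/** Ignores failures; as the default, this can hide bugs in tests. */
class NoOpFailureHandler implements FailureHandler {
    @Override public boolean onFailure(String failureDescription) {
        return false;
    }
}

/** Stops (invalidates) the node, so the failure becomes visible to the test. */
class StopNodeFailureHandler implements FailureHandler {
    @Override public boolean onFailure(String failureDescription) {
        return true;
    }
}

/** Hypothetical base class standing in for GridAbstractTest. */
abstract class AbstractTestBase {
    /** Proposed default: stop the node instead of ignoring the failure. */
    protected FailureHandler getFailureHandler() {
        return new StopNodeFailureHandler();
    }
}

/** An ordinary test simply inherits the stricter default. */
class RegularTest extends AbstractTestBase {
}

/** A test that really must tolerate critical failures opts out explicitly. */
class FailureTolerantTest extends AbstractTestBase {
    @Override protected FailureHandler getFailureHandler() {
        return new NoOpFailureHandler();
    }
}
```

With this layout, the opt-outs are explicit and easy to find when re-checking the tests that start failing after the switch.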





[jira] [Updated] (IGNITE-9660) Switch default test FailureHandler to StopNodeFailureHandler

2018-09-20 Thread Andrey Kuznetsov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Kuznetsov updated IGNITE-9660:
-
Issue Type: Task  (was: Test)

> Switch default test FailureHandler to StopNodeFailureHandler
> 
>
> Key: IGNITE-9660
> URL: https://issues.apache.org/jira/browse/IGNITE-9660
> Project: Ignite
>  Issue Type: Task
>Affects Versions: 2.6
>Reporter: Andrey Kuznetsov
>Priority: Major
> Fix For: 2.8
>
>
> {{GridAbstractTest.getFailureHandler()}} returns a {{NoOpFailureHandler}} 
> instance. This often hides bugs that occur in tests. 
> {{getFailureHandler()}} should return {{StopNodeFailureHandler}} 
> instead.
> The change implies re-checking failed tests and setting the handler to 
> {{NoOpFailureHandler}} only in subclasses where it is really needed.





[jira] [Commented] (IGNITE-9660) Switch default test FailureHandler to StopNodeFailureHandler

2018-09-21 Thread Andrey Kuznetsov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16623651#comment-16623651
 ] 

Andrey Kuznetsov commented on IGNITE-9660:
--

It's better to use a special test-scope failure handler instead of 
{{StopNodeFailureHandler}}. That handler should fail the current test method as 
gracefully as possible.
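The comment does not specify how such a test-scope handler would work. A minimal sketch, assuming the handler only records the first critical failure and the test framework checks it during tear-down; {{RecordingTestFailureHandler}} and its methods are hypothetical and are not the handler discussed in IGNITE-8227:

```java
import java.util.concurrent.atomic.AtomicReference;

/**
 * Hypothetical test-scope failure handler: instead of stopping the node,
 * it records the first critical failure so the current test can fail gracefully.
 */
final class RecordingTestFailureHandler {
    /** First failure reported since the last check; null if none. */
    private final AtomicReference<Throwable> firstFailure = new AtomicReference<>();

    /** Called on a critical failure, possibly from any node thread. */
    boolean onFailure(Throwable cause) {
        firstFailure.compareAndSet(null, cause); // Keep only the first failure.
        return false; // Do not invalidate the node; the test decides what happens next.
    }

    /** Called from the test tear-down on the test thread; resets state for the next test method. */
    void checkNoFailures() {
        Throwable failure = firstFailure.getAndSet(null);
        if (failure != null)
            throw new AssertionError("Critical failure detected during test", failure);
    }
}
```

Failing from the test thread keeps the failure attributed to the right test method, whereas stopping the node from a background thread tends to surface later as unrelated errors.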

> Switch default test FailureHandler to StopNodeFailureHandler
> 
>
> Key: IGNITE-9660
> URL: https://issues.apache.org/jira/browse/IGNITE-9660
> Project: Ignite
>  Issue Type: Task
>Affects Versions: 2.6
>Reporter: Andrey Kuznetsov
>Priority: Major
> Fix For: 2.8
>
>
> {{GridAbstractTest.getFailureHandler()}} returns a {{NoOpFailureHandler}} 
> instance. This often hides bugs that occur in tests. 
> {{getFailureHandler()}} should return {{StopNodeFailureHandler}} 
> instead.
> The change implies re-checking failed tests and setting the handler to 
> {{NoOpFailureHandler}} only in subclasses where it is really needed.





[jira] [Comment Edited] (IGNITE-9660) Switch default test FailureHandler to StopNodeFailureHandler

2018-09-21 Thread Andrey Kuznetsov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16623651#comment-16623651
 ] 

Andrey Kuznetsov edited comment on IGNITE-9660 at 9/21/18 1:56 PM:
---

It's better to use a special test-scope failure handler instead of 
{{StopNodeFailureHandler}}. That handler, mentioned in [1], should fail the 
current test method as gracefully as possible.

[1] https://issues.apache.org/jira/browse/IGNITE-8227


was (Author: andrey-kuznetsov):
It's better to use a special test-scope failure handler instead of 
{{StopNodeFailureHandler}}. That handler should fail the current test method as 
gracefully as possible.

> Switch default test FailureHandler to StopNodeFailureHandler
> 
>
> Key: IGNITE-9660
> URL: https://issues.apache.org/jira/browse/IGNITE-9660
> Project: Ignite
>  Issue Type: Task
>Affects Versions: 2.6
>Reporter: Andrey Kuznetsov
>Priority: Major
> Fix For: 2.8
>
>
> {{GridAbstractTest.getFailureHandler()}} returns a {{NoOpFailureHandler}} 
> instance. This often hides bugs that occur in tests. 
> {{getFailureHandler()}} should return {{StopNodeFailureHandler}} 
> instead.
> The change implies re-checking failed tests and setting the handler to 
> {{NoOpFailureHandler}} only in subclasses where it is really needed.





[jira] [Comment Edited] (IGNITE-9660) Switch default test FailureHandler to StopNodeFailureHandler

2018-09-21 Thread Andrey Kuznetsov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16623651#comment-16623651
 ] 

Andrey Kuznetsov edited comment on IGNITE-9660 at 9/21/18 1:57 PM:
---

It's better to use a special test-scope failure handler instead of 
{{StopNodeFailureHandler}}. That handler (mentioned in [1]) should fail the 
current test method as gracefully as possible.

[1] https://issues.apache.org/jira/browse/IGNITE-8227


was (Author: andrey-kuznetsov):
It's better to use a special test-scope failure handler instead of 
{{StopNodeFailureHandler}}. That handler, mentioned in [1], should fail the 
current test method as gracefully as possible.

[1] https://issues.apache.org/jira/browse/IGNITE-8227

> Switch default test FailureHandler to StopNodeFailureHandler
> 
>
> Key: IGNITE-9660
> URL: https://issues.apache.org/jira/browse/IGNITE-9660
> Project: Ignite
>  Issue Type: Task
>Affects Versions: 2.6
>Reporter: Andrey Kuznetsov
>Priority: Major
> Fix For: 2.8
>
>
> {{GridAbstractTest.getFailureHandler()}} returns a {{NoOpFailureHandler}} 
> instance. This often hides bugs that occur in tests. 
> {{getFailureHandler()}} should return {{StopNodeFailureHandler}} 
> instead.
> The change implies re-checking failed tests and setting the handler to 
> {{NoOpFailureHandler}} only in subclasses where it is really needed.





[jira] [Created] (IGNITE-9666) TxPessimisticDeadlockDetectionCrossCacheTest.testDeadlockAnotherNear is flaky on master

2018-09-23 Thread Andrey Kuznetsov (JIRA)
Andrey Kuznetsov created IGNITE-9666:


 Summary: 
TxPessimisticDeadlockDetectionCrossCacheTest.testDeadlockAnotherNear is flaky 
on master
 Key: IGNITE-9666
 URL: https://issues.apache.org/jira/browse/IGNITE-9666
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.6
Reporter: Andrey Kuznetsov
 Fix For: 2.8


Sometimes the test fails on {{assertTrue(deadlock.get())}}. 

Presumably, this is because the test ignores possible long JVM pauses. For 
example, note the timestamps around the first pair of 'put' operations:

{noformat}
[2018-09-23 11:16:55,975][INFO ][tx-thread-1][root] >>> Performs put 
[node=TcpDiscoveryNode [id=dd46ab0e-ed28-4c67-b3c4-98900bb0, 
addrs=ArrayList [127.0.0.1], sockAddrs=HashSet [/127.0.0.1:47500], 
discPort=47500, order=1, intOrder=1, lastExchangeTime=1537690615852, loc=true, 
ver=2.7.0#19700101-sha1:, isClient=false], tx=TransactionProxyImpl 
[tx=IgniteTxAdapter [xidVer=GridCacheVersion [topVer=149170604, 
order=1537690611182, nodeOrder=1], writeVer=null, implicit=false, loc=true, 
threadId=129, startTime=1537690615791, 
nodeId=dd46ab0e-ed28-4c67-b3c4-98900bb0, startVer=GridCacheVersion 
[topVer=149170604, order=1537690611182, nodeOrder=1], endVer=null, 
isolation=REPEATABLE_READ, concurrency=PESSIMISTIC, timeout=500, 
sysInvalidate=false, sys=false, plc=2, commitVer=null, finalizing=NONE, 
invalidParts=null, state=ACTIVE, timedOut=false, topVer=AffinityTopologyVersion 
[topVer=-1, minorTopVer=0], 
txCounters=org.apache.ignite.internal.processors.cache.transactions.TxCounters@31c7393f,
 duration=155ms, onePhaseCommit=false]IgniteTxLocalAdapter [completedBase=null, 
sndTransformedVals=false, depEnabled=false, txState=IgniteTxStateImpl 
[activeCacheIds=[], recovery=null, mvccEnabled=null, txMap=EmptySet []], 
mvccWaitTxs=null, qryEnlisted=false, super=, size=0]GridDhtTxLocalAdapter 
[nearOnOriginatingNode=false, nearNodes=KeySetView [], dhtNodes=KeySetView [], 
explicitLock=false, super=]GridNearTxLocal [mappings=IgniteTxMappingsImpl [], 
nearLocallyMapped=false, colocatedLocallyMapped=false, needCheckBackup=null, 
hasRemoteLocks=false, trackTimeout=true, lb=null, mvccTracker=null, sql=null, 
thread=tx-thread-1, mappings=IgniteTxMappingsImpl [], super=], async=false, 
asyncRes=null], key=2, cache=cache0]
[2018-09-23 11:16:55,975][INFO ][tx-thread-2][root] >>> Performs put 
[node=TcpDiscoveryNode [id=dd46ab0e-ed28-4c67-b3c4-98900bb0, 
addrs=ArrayList [127.0.0.1], sockAddrs=HashSet [/127.0.0.1:47500], 
discPort=47500, order=1, intOrder=1, lastExchangeTime=1537690615852, loc=true, 
ver=2.7.0#19700101-sha1:, isClient=false], tx=TransactionProxyImpl 
[tx=IgniteTxAdapter [xidVer=GridCacheVersion [topVer=149170604, 
order=1537690611181, nodeOrder=1], writeVer=null, implicit=false, loc=true, 
threadId=130, startTime=1537690615791, 
nodeId=dd46ab0e-ed28-4c67-b3c4-98900bb0, startVer=GridCacheVersion 
[topVer=149170604, order=1537690611182, nodeOrder=1], endVer=null, 
isolation=REPEATABLE_READ, concurrency=PESSIMISTIC, timeout=500, 
sysInvalidate=false, sys=false, plc=2, commitVer=null, finalizing=NONE, 
invalidParts=null, state=ACTIVE, timedOut=false, topVer=AffinityTopologyVersion 
[topVer=-1, minorTopVer=0], 
txCounters=org.apache.ignite.internal.processors.cache.transactions.TxCounters@14d54c9c,
 duration=155ms, onePhaseCommit=false]IgniteTxLocalAdapter [completedBase=null, 
sndTransformedVals=false, depEnabled=false, txState=IgniteTxStateImpl 
[activeCacheIds=[], recovery=null, mvccEnabled=null, txMap=EmptySet []], 
mvccWaitTxs=null, qryEnlisted=false, super=, size=0]GridDhtTxLocalAdapter 
[nearOnOriginatingNode=false, nearNodes=KeySetView [], dhtNodes=KeySetView [], 
explicitLock=false, super=]GridNearTxLocal [mappings=IgniteTxMappingsImpl [], 
nearLocallyMapped=false, colocatedLocallyMapped=false, needCheckBackup=null, 
hasRemoteLocks=false, trackTimeout=true, lb=null, mvccTracker=null, sql=null, 
thread=tx-thread-2, mappings=IgniteTxMappingsImpl [], super=], async=false, 
asyncRes=null], key=2, cache=cache1]
[2018-09-23 11:16:56,378][INFO 
][exchange-worker-#38%transactions.TxPessimisticDeadlockDetectionCrossCacheTest0%][time]
 Started exchange init [topVer=AffinityTopologyVersion [topVer=2, 
minorTopVer=3], mvccCrd=MvccCoordinator 
[nodeId=dd46ab0e-ed28-4c67-b3c4-98900bb0, crdVer=1537690602134, 
topVer=AffinityTopologyVersion [topVer=1, minorTopVer=0]], mvccCrdChange=false, 
crd=true, evt=DISCOVERY_CUSTOM_EVT, 
evtNode=dd46ab0e-ed28-4c67-b3c4-98900bb0, 
customEvt=CacheAffinityChangeMessage 
[id=d7540850661-799b6d10-6e53-4f8b-9595-98f8c060efa1, 
topVer=AffinityTopologyVersion [topVer=2, minorTopVer=0], exchId=null, 
partsMsg=null, exchangeNeeded=true], allowMerge=false]
{noformat}

Then the transactions have to roll back due to the 500 ms timeout, leaving no 
chance to produce a deadlock.
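Using the log timestamps above, a rough estimate shows why the timeout is reached. This assumes the reported {{duration=155ms}} was measured no later than 11:16:55,975 and that neither transaction made progress before the next log entry:

```latex
\begin{aligned}
\Delta t &= 56.378\,\mathrm{s} - 55.975\,\mathrm{s} = 403\,\mathrm{ms} \\
t_{\mathrm{tx}} &\ge 155\,\mathrm{ms} + \Delta t = 558\,\mathrm{ms} > 500\,\mathrm{ms} = \mathrm{timeout}
\end{aligned}
```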





[jira] [Updated] (IGNITE-9640) [TC Bot] Determine repetitive failure types by analyzing build log

2018-09-24 Thread Andrey Kuznetsov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Kuznetsov updated IGNITE-9640:
-
Description: 
When someone is analyzing a flaky test failure, it's important to distinguish 
between a newly introduced failure and a pre-existing one. In the latter case, 
the bot should not draw the contributor's attention to the test.

In more detail, TC build log fragments start with identical substrings for 
identical failures very often, e.g.

{noformat}
junit.framework.AssertionFailedError
at 
org.apache.ignite.internal.GridVersionSelfTest.testVersions(GridVersionSelfTest.java:54)
{noformat}


  was:
When someone is analyzing a flaky test failure, it's important to distinguish 
between a newly introduced failure and a pre-existing one. In the latter case, 
the bot should not draw the contributor's attention to the test.

In more detail, TC build log fragments starts with identical substrings for 
identical failures very often, e.g.

{noformat}
junit.framework.AssertionFailedError
at 
org.apache.ignite.internal.GridVersionSelfTest.testVersions(GridVersionSelfTest.java:54)
{noformat}



> [TC Bot] Determine repetitive failure types by analyzing build log
> --
>
> Key: IGNITE-9640
> URL: https://issues.apache.org/jira/browse/IGNITE-9640
> Project: Ignite
>  Issue Type: Task
>Reporter: Andrey Kuznetsov
>Priority: Minor
>
> When someone is analyzing a flaky test failure, it's important to distinguish 
> between a newly introduced failure and a pre-existing one. In the latter case, 
> the bot should not draw the contributor's attention to the test.
> In more detail, TC build log fragments start with identical substrings for 
> identical failures very often, e.g.
> {noformat}
> junit.framework.AssertionFailedError
> at 
> org.apache.ignite.internal.GridVersionSelfTest.testVersions(GridVersionSelfTest.java:54)
> {noformat}





[jira] [Updated] (IGNITE-9640) [TC Bot] Determine repetitive failure types by analyzing build log

2018-09-24 Thread Andrey Kuznetsov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Kuznetsov updated IGNITE-9640:
-
Description: 
When a test is already flaky in the master branch, the developer has to check 
whether failures in a PR are the same as in the master branch. Currently, this 
is done manually by analyzing the build history on TeamCity. TC Bot can simplify 
this process as follows:

* When the error descriptions from the PR and master build logs start with an 
identical substring, the failure should not be reported.
* Otherwise, the test failure should be reported.

Example of potentially identical substring:
{noformat}
junit.framework.AssertionFailedError
at 
org.apache.ignite.internal.GridVersionSelfTest.testVersions(GridVersionSelfTest.java:54)
{noformat}
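A minimal sketch of the proposed rule. The issue does not define how long the "identical substring" is, so this sketch assumes it is the first N non-blank lines of the error description (the exception line plus the top stack frames); {{FailureMatcher}} and its methods are hypothetical, not existing TC Bot code:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper sketching the proposed TC Bot rule; not existing TC Bot code. */
final class FailureMatcher {
    /** Number of leading non-blank lines treated as the failure "signature" (an assumption). */
    private final int signatureLines;

    FailureMatcher(int signatureLines) {
        this.signatureLines = signatureLines;
    }

    /** Extracts the first lines of an error description, trimmed, skipping blank lines. */
    List<String> signature(String errorDescription) {
        return Arrays.stream(errorDescription.split("\\R"))
            .map(String::trim)
            .filter(line -> !line.isEmpty())
            .limit(signatureLines)
            .collect(Collectors.toList());
    }

    /** Reports the PR failure only if its signature differs from the master one. */
    boolean shouldReport(String prError, String masterError) {
        return !signature(prError).equals(signature(masterError));
    }
}
```

Comparing whole trimmed lines rather than raw characters makes the check insensitive to indentation differences; the choice of N trades missed regressions against noisy reports.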


  was:
When someone is analyzing a flaky test failure, it's important to distinguish 
between a newly introduced failure and a pre-existing one. In the latter case, 
the bot should not draw the contributor's attention to the test.

In more detail, TC build log fragments start with identical substrings for 
identical failures very often, e.g.

{noformat}
junit.framework.AssertionFailedError
at 
org.apache.ignite.internal.GridVersionSelfTest.testVersions(GridVersionSelfTest.java:54)
{noformat}



> [TC Bot] Determine repetitive failure types by analyzing build log
> --
>
> Key: IGNITE-9640
> URL: https://issues.apache.org/jira/browse/IGNITE-9640
> Project: Ignite
>  Issue Type: Task
>Reporter: Andrey Kuznetsov
>Priority: Minor
>
> When a test is already flaky in the master branch, the developer has to check 
> whether failures in a PR are the same as in the master branch. Currently, this 
> is done manually by analyzing the build history on TeamCity. TC Bot can 
> simplify this process as follows:
> * When the error descriptions from the PR and master build logs start with an 
> identical substring, the failure should not be reported.
> * Otherwise, the test failure should be reported.
> Example of potentially identical substring:
> {noformat}
> junit.framework.AssertionFailedError
> at 
> org.apache.ignite.internal.GridVersionSelfTest.testVersions(GridVersionSelfTest.java:54)
> {noformat}





[jira] [Created] (IGNITE-9679) Document critical workers liveness checking implementation

2018-09-24 Thread Andrey Kuznetsov (JIRA)
Andrey Kuznetsov created IGNITE-9679:


 Summary: Document critical workers liveness checking implementation
 Key: IGNITE-9679
 URL: https://issues.apache.org/jira/browse/IGNITE-9679
 Project: Ignite
  Issue Type: Task
  Components: documentation
Reporter: Andrey Kuznetsov
Assignee: Denis Magda
 Fix For: 2.7


Newly implemented critical worker thread liveness checks should be mentioned in 
the Ignite documentation. A brief description of the functionality follows.

An Ignite node has a number of critical worker threads that should be alive and 
responsive; otherwise, the node's health is not guaranteed. These threads 
periodically monitor each other and track two aspects of the thread being 
checked:
- whether it's alive;
- whether it updates its internal heartbeat timestamp.
Both checks use the {{IgniteConfiguration.failureDetectionTimeout}} property as 
the threshold value.
Whenever at least one of the above conditions is violated, the checker thread 
logs an error and calls the currently configured {{FailureHandler}}.

Liveness checks are enabled by default, but can be disabled through 
{{WorkersControlMXBean.healthMonitoringEnabled}} property.
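The two liveness conditions can be sketched as follows. {{CriticalWorker}} and {{LivenessChecker}} are hypothetical, simplified stand-ins, not Ignite classes; as a simplification, the sketch flags a terminated thread immediately and applies the timeout only to the heartbeat check:

```java
/** Hypothetical critical worker exposing a heartbeat; a simplified stand-in, not Ignite's GridWorker. */
class CriticalWorker {
    final Thread thread;
    private volatile long heartbeatTs;

    CriticalWorker(Thread thread, long nowMs) {
        this.thread = thread;
        this.heartbeatTs = nowMs;
    }

    /** The worker calls this regularly while doing its job. */
    void updateHeartbeat(long nowMs) {
        heartbeatTs = nowMs;
    }

    long heartbeatTs() {
        return heartbeatTs;
    }
}

/** Hypothetical checker applying the two liveness conditions described above. */
final class LivenessChecker {
    private final long failureDetectionTimeoutMs;

    LivenessChecker(long failureDetectionTimeoutMs) {
        this.failureDetectionTimeoutMs = failureDetectionTimeoutMs;
    }

    /** Returns true if the worker violates at least one condition, i.e. the failure handler should be called. */
    boolean isFailed(CriticalWorker w, long nowMs) {
        boolean dead = !w.thread.isAlive();
        boolean stalled = nowMs - w.heartbeatTs() > failureDetectionTimeoutMs;
        return dead || stalled;
    }
}
```

Passing the current time explicitly keeps the check deterministic and easy to test; a real checker would read a clock on each round.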






[jira] [Commented] (IGNITE-9683) Create manual pinger for ZK client

2018-09-25 Thread Andrey Kuznetsov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627284#comment-16627284
 ] 

Andrey Kuznetsov commented on IGNITE-9683:
--

[~Jokser], ZooKeeper docs [1] mention PING requests built into the ZK client:

{noformat}
The session is kept alive by requests sent by the client. If the session is 
idle for a period of time that would timeout the session, the client will send 
a PING request to keep the session alive. This PING request not only allows the 
ZooKeeper server to know that the client is still active, but it also allows 
the client to verify that its connection to the ZooKeeper server is still 
active. The timing of the PING is conservative enough to ensure reasonable time 
to detect a dead connection and reconnect to a new server.
{noformat}

Is this statement outdated, or is "reasonable time" not suitable for Discovery 
SPI?


> Create manual pinger for ZK client
> --
>
> Key: IGNITE-9683
> URL: https://issues.apache.org/jira/browse/IGNITE-9683
> Project: Ignite
>  Issue Type: Improvement
>  Components: cache, zookeeper
>Affects Versions: 2.5
>Reporter: Pavel Kovalenko
>Priority: Major
> Fix For: 2.7
>
>
> Connection loss with ZooKeeper for longer than the ZK session timeout is 
> unacceptable for server nodes. To improve connection durability, we need to 
> keep the session with ZK alive as long as possible. We need to introduce a 
> manual pinger in addition to the ZK client and ping the ZK server with a 
> simple request once per tick time.
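A minimal sketch of such a manual pinger, assuming a fixed-rate schedule at the ZK tick time. The ping action is injected; with the ZooKeeper Java client, a cheap read such as {{zk.exists("/", false)}} could serve (an assumption; it would also need its checked exceptions wrapped). {{ManualPinger}} is a hypothetical name:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical manual pinger: runs a cheap request against ZooKeeper every tick.
 * The ping action is injected so the sketch does not depend on the ZooKeeper client API.
 */
final class ManualPinger implements AutoCloseable {
    private final ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "zk-manual-pinger");
        t.setDaemon(true); // Do not prevent JVM shutdown.
        return t;
    });

    private final ScheduledFuture<?> task;

    ManualPinger(Runnable ping, long tickTimeMs) {
        task = exec.scheduleAtFixedRate(() -> {
            try {
                ping.run();
            }
            catch (RuntimeException e) {
                // Swallow, so one failed ping does not cancel all future pings.
            }
        }, tickTimeMs, tickTimeMs, TimeUnit.MILLISECONDS);
    }

    @Override public void close() {
        task.cancel(false);
        exec.shutdownNow();
    }
}
```

The try/catch inside the scheduled task matters: {{ScheduledExecutorService.scheduleAtFixedRate}} suppresses all subsequent runs once a task throws, so an uncaught exception from a single failed ping would silently stop pinging.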





[jira] [Comment Edited] (IGNITE-9683) Create manual pinger for ZK client

2018-09-25 Thread Andrey Kuznetsov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627284#comment-16627284
 ] 

Andrey Kuznetsov edited comment on IGNITE-9683 at 9/25/18 12:51 PM:


[~Jokser], ZooKeeper docs [1] mention PING requests built into the ZK client:

"The session is kept alive by requests sent by the client. If the session is 
idle for a period of time that would timeout the session, the client will send 
a PING request to keep the session alive. This PING request not only allows the 
ZooKeeper server to know that the client is still active, but it also allows 
the client to verify that its connection to the ZooKeeper server is still 
active. The timing of the PING is conservative enough to ensure reasonable time 
to detect a dead connection and reconnect to a new server."

Is this statement outdated, or is "reasonable time" not suitable for Discovery 
SPI?



was (Author: andrey-kuznetsov):
[~Jokser], ZooKeeper docs [1] mention PING requests built into the ZK client:

{noformat}
The session is kept alive by requests sent by the client. If the session is 
idle for a period of time that would timeout the session, the client will send 
a PING request to keep the session alive. This PING request not only allows the 
ZooKeeper server to know that the client is still active, but it also allows 
the client to verify that its connection to the ZooKeeper server is still 
active. The timing of the PING is conservative enough to ensure reasonable time 
to detect a dead connection and reconnect to a new server.
{noformat}

Is this statement outdated, or is "reasonable time" not suitable for Discovery 
SPI?


> Create manual pinger for ZK client
> --
>
> Key: IGNITE-9683
> URL: https://issues.apache.org/jira/browse/IGNITE-9683
> Project: Ignite
>  Issue Type: Improvement
>  Components: cache, zookeeper
>Affects Versions: 2.5
>Reporter: Pavel Kovalenko
>Priority: Major
> Fix For: 2.7
>
>
> Connection loss with ZooKeeper for longer than the ZK session timeout is 
> unacceptable for server nodes. To improve connection durability, we need to 
> keep the session with ZK alive as long as possible. We need to introduce a 
> manual pinger in addition to the ZK client and ping the ZK server with a 
> simple request once per tick time.





[jira] [Comment Edited] (IGNITE-9683) Create manual pinger for ZK client

2018-09-25 Thread Andrey Kuznetsov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627284#comment-16627284
 ] 

Andrey Kuznetsov edited comment on IGNITE-9683 at 9/25/18 12:51 PM:


[~Jokser], ZooKeeper docs [1] mention PING requests built into the ZK client:

"The session is kept alive by requests sent by the client. If the session is 
idle for a period of time that would timeout the session, the client will send 
a PING request to keep the session alive. This PING request not only allows the 
ZooKeeper server to know that the client is still active, but it also allows 
the client to verify that its connection to the ZooKeeper server is still 
active. The timing of the PING is conservative enough to ensure reasonable time 
to detect a dead connection and reconnect to a new server."

Is this statement outdated, or is "reasonable time" not suitable for Discovery 
SPI?

[1] https://zookeeper.apache.org/doc/r3.4.13/zookeeperProgrammers.html



was (Author: andrey-kuznetsov):
[~Jokser], ZooKeeper docs [1] mention PING requests built into the ZK client:

"The session is kept alive by requests sent by the client. If the session is 
idle for a period of time that would timeout the session, the client will send 
a PING request to keep the session alive. This PING request not only allows the 
ZooKeeper server to know that the client is still active, but it also allows 
the client to verify that its connection to the ZooKeeper server is still 
active. The timing of the PING is conservative enough to ensure reasonable time 
to detect a dead connection and reconnect to a new server."

Is this statement outdated, or is "reasonable time" not suitable for Discovery 
SPI?


> Create manual pinger for ZK client
> --
>
> Key: IGNITE-9683
> URL: https://issues.apache.org/jira/browse/IGNITE-9683
> Project: Ignite
>  Issue Type: Improvement
>  Components: cache, zookeeper
>Affects Versions: 2.5
>Reporter: Pavel Kovalenko
>Priority: Major
> Fix For: 2.7
>
>
> Connection loss with ZooKeeper for longer than the ZK session timeout is 
> unacceptable for server nodes. To improve connection durability, we need to 
> keep the session with ZK alive as long as possible. We need to introduce a 
> manual pinger in addition to the ZK client and ping the ZK server with a 
> simple request once per tick time.





[jira] [Created] (IGNITE-9695) Add a way to prevent per-cache WAL disabling in WalStateManager

2018-09-25 Thread Andrey Kuznetsov (JIRA)
Andrey Kuznetsov created IGNITE-9695:


 Summary: Add a way to prevent per-cache WAL disabling in 
WalStateManager
 Key: IGNITE-9695
 URL: https://issues.apache.org/jira/browse/IGNITE-9695
 Project: Ignite
  Issue Type: Task
Affects Versions: 2.6
Reporter: Andrey Kuznetsov
Assignee: Andrey Kuznetsov
 Fix For: 2.8


When this prevention is enabled, {{WalStateManager.init()}} should immediately 
return a future completed with an error.
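A minimal sketch of the fail-fast behavior, using {{CompletableFuture}} in place of Ignite's internal futures. {{WalStateManagerSketch}} and its methods are hypothetical, and the sketch assumes that only requests disabling WAL are rejected, since that is what the issue title targets:

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical, heavily simplified stand-in for WalStateManager; not Ignite's actual class or signatures. */
final class WalStateManagerSketch {
    /** When true, per-cache WAL disabling requests are rejected. */
    private volatile boolean preventWalDisabling;

    void setPreventWalDisabling(boolean prevent) {
        preventWalDisabling = prevent;
    }

    /** Starts a WAL state change; returns a future that completes when the change is done. */
    CompletableFuture<Boolean> init(String cacheName, boolean enableWal) {
        if (!enableWal && preventWalDisabling) {
            // Fail fast: return an already failed future without starting any cluster-wide operation.
            CompletableFuture<Boolean> fut = new CompletableFuture<>();
            fut.completeExceptionally(new IllegalStateException(
                "WAL disabling is prevented [cache=" + cacheName + ']'));
            return fut;
        }

        // The real state change process is out of scope for this sketch.
        return CompletableFuture.completedFuture(true);
    }
}
```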





[jira] [Created] (IGNITE-9731) NPE is possible during WAL flushing

2018-09-27 Thread Andrey Kuznetsov (JIRA)
Andrey Kuznetsov created IGNITE-9731:


 Summary: NPE is possible during WAL flushing
 Key: IGNITE-9731
 URL: https://issues.apache.org/jira/browse/IGNITE-9731
 Project: Ignite
  Issue Type: Task
Reporter: Andrey Kuznetsov
 Fix For: 2.7
 Attachments: WalRolloverRecordLoggingTest.java

{{FileWriteAheadLogManager.flush()}} seems to be no longer thread-safe in the 
master branch. The attached test produces the following NPE:

{noformat}
java.lang.NullPointerException
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileHandle.getSegmentId(FileWriteAheadLogManager.java:2371)
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.needFsync(FileWriteAheadLogManager.java:2642)
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.fsync(FileWriteAheadLogManager.java:2668)
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.access$1900(FileWriteAheadLogManager.java:2445)
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.flush(FileWriteAheadLogManager.java:866)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.markCheckpointBegin(GridCacheDatabaseSharedManager.java:3633)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.doCheckpoint(GridCacheDatabaseSharedManager.java:3126)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.body(GridCacheDatabaseSharedManager.java:3025)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at java.lang.Thread.run(Thread.java:748)
{noformat}

This may have been introduced by commit [1].

[1] 
https://github.com/apache/ignite/commit/2f72fe758d4256c4eb4610e5922ad3d174b43dc5






[jira] [Updated] (IGNITE-9731) NPE is possible during WAL flushing

2018-09-27 Thread Andrey Kuznetsov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Kuznetsov updated IGNITE-9731:
-
Issue Type: Bug  (was: Task)

> NPE is possible during WAL flushing
> ---
>
> Key: IGNITE-9731
> URL: https://issues.apache.org/jira/browse/IGNITE-9731
> Project: Ignite
>  Issue Type: Bug
>Reporter: Andrey Kuznetsov
>Priority: Critical
> Fix For: 2.7
>
> Attachments: WalRolloverRecordLoggingTest.java
>
>
> {{FileWriteAheadLogManager.flush()}} seems to be no longer thread-safe in the 
> master branch. The attached test produces the following NPE:
> {noformat}
> java.lang.NullPointerException
>   at 
> org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileHandle.getSegmentId(FileWriteAheadLogManager.java:2371)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.needFsync(FileWriteAheadLogManager.java:2642)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.fsync(FileWriteAheadLogManager.java:2668)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.access$1900(FileWriteAheadLogManager.java:2445)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.flush(FileWriteAheadLogManager.java:866)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.markCheckpointBegin(GridCacheDatabaseSharedManager.java:3633)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.doCheckpoint(GridCacheDatabaseSharedManager.java:3126)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.body(GridCacheDatabaseSharedManager.java:3025)
>   at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> This may have been introduced by commit [1].
> [1] 
> https://github.com/apache/ignite/commit/2f72fe758d4256c4eb4610e5922ad3d174b43dc5





[jira] [Assigned] (IGNITE-9737) Ignite WatchDog service should be configurable

2018-09-28 Thread Andrey Kuznetsov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Kuznetsov reassigned IGNITE-9737:


Assignee: Andrey Kuznetsov

> Ignite WatchDog service should be configurable
> --
>
> Key: IGNITE-9737
> URL: https://issues.apache.org/jira/browse/IGNITE-9737
> Project: Ignite
>  Issue Type: Bug
>Reporter: Nikolay Izhikov
>Assignee: Andrey Kuznetsov
>Priority: Blocker
> Fix For: 2.7
>
>
> At the moment, there is no way to disable the Ignite WatchDog service via 
> configuration or a JVM option.
> In a corner case, or because of a bug in that feature, Ignite can become 
> completely unusable due to an unpredictable shutdown.
> We should provide a way to enable/disable this feature via configuration or 
> a JVM option.





[jira] [Commented] (IGNITE-9737) Ignite WatchDog service should be configurable

2018-09-28 Thread Andrey Kuznetsov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16632210#comment-16632210
 ] 

Andrey Kuznetsov commented on IGNITE-9737:
--

Critical thread monitoring should be configurable in a more flexible way than 
originally suggested in the issue description.

# A separate timeout setting should be introduced in {{IgniteConfiguration}} for 
{{SYSTEM_WORKER_BLOCKED}} detection, falling back to 
{{IgniteConfiguration.failureDetectionTimeout}} if omitted. It can be 
overridden by a system property or changed at runtime through a newly created 
management bean. A timeout value of 0 should denote an infinite timeout, which 
effectively disables detection.
# A separate timeout should be introduced for checkpoint read lock acquisition 
in the same manner.
# On timeout, checkpoint read lock acquisition should not throw an exception 
unless the failure handler has invalidated the node. This will guarantee 
neutral behavior of {{NoOpFailureHandler}}.
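A minimal sketch of the timeout resolution proposed in the first item, assuming the system property takes precedence over the configured value. The property name and {{BlockedWorkerTimeout}} are hypothetical, and runtime changes through a management bean are not modeled:

```java
/** Hypothetical resolution of the SYSTEM_WORKER_BLOCKED detection timeout, per the proposal above. */
final class BlockedWorkerTimeout {
    /** Hypothetical system property name; not an actual Ignite property. */
    static final String SYS_PROP = "HYPOTHETICAL_SYSTEM_WORKER_BLOCKED_TIMEOUT";

    private BlockedWorkerTimeout() {
    }

    /**
     * @param configured Value from configuration, or null if omitted.
     * @param failureDetectionTimeout Fallback value.
     * @param sysPropVal Raw system property value, or null if not set.
     * @return Effective timeout in ms; Long.MAX_VALUE means "never time out" (detection disabled).
     */
    static long resolve(Long configured, long failureDetectionTimeout, String sysPropVal) {
        long timeout = configured != null ? configured : failureDetectionTimeout;

        if (sysPropVal != null)
            timeout = Long.parseLong(sysPropVal.trim()); // The system property overrides configuration.

        return timeout == 0 ? Long.MAX_VALUE : timeout; // 0 denotes an infinite timeout.
    }

    /** Convenience overload reading the (hypothetical) system property. */
    static long resolve(Long configured, long failureDetectionTimeout) {
        return resolve(configured, failureDetectionTimeout, System.getProperty(SYS_PROP));
    }
}
```

Mapping 0 to {{Long.MAX_VALUE}} lets a checker keep a single "now - heartbeat > timeout" comparison while detection is effectively disabled.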



> Ignite WatchDog service should be configurable
> --
>
> Key: IGNITE-9737
> URL: https://issues.apache.org/jira/browse/IGNITE-9737
> Project: Ignite
>  Issue Type: Bug
>Reporter: Nikolay Izhikov
>Assignee: Andrey Kuznetsov
>Priority: Blocker
> Fix For: 2.7
>
>
> At the moment, there is no way to disable the Ignite WatchDog service via 
> configuration or a JVM option.
> In a corner case, or because of a bug in that feature, Ignite can become 
> completely unusable due to an unpredictable shutdown.
> We should provide a way to enable/disable this feature via configuration or 
> a JVM option.





[jira] [Assigned] (IGNITE-9710) Ignite watchdog service handles longrunning cache creation

2018-09-28 Thread Andrey Kuznetsov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Kuznetsov reassigned IGNITE-9710:


Assignee: Andrey Kuznetsov

> Ignite watchdog service handles longrunning cache creation
> --
>
> Key: IGNITE-9710
> URL: https://issues.apache.org/jira/browse/IGNITE-9710
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Affects Versions: 2.6
>Reporter: Vyacheslav Daradur
>Assignee: Andrey Kuznetsov
>Priority: Major
> Fix For: 2.7
>
> Attachments: LongRunningCacheCreationTest.java
>
>
> Ignite watchdog service introduced by IGNITE-6587 handles long running cache 
> creation.
> The action in {{GridDhtPartitionsExchangeFuture#init}} may take significant 
> time and possibly should be covered by a blocking section of the watchdog service.
> Reproducer was attached: [^LongRunningCacheCreationTest.java].





[jira] [Commented] (IGNITE-9710) Ignite watchdog service handles longrunning cache creation

2018-09-28 Thread Andrey Kuznetsov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16632230#comment-16632230
 ] 

Andrey Kuznetsov commented on IGNITE-9710:
--

Wrapping the entire {{GridDhtPartitionsExchangeFuture.init()}} into a blocking 
section is too coarse a solution: doing so effectively makes the 
partition-exchanger thread non-critical.

To fix the issue thoroughly, the {{init()}} body should be examined, and 
long-running operations should either be interleaved with 
{{GridWorker.updateHeartbeat()}} calls or wrapped into blocking sections, 
depending on the situation.
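To illustrate the difference between the two options, here is a hypothetical model of a watched worker. {{MonitoredWorker}} and its methods are assumptions, not Ignite's {{GridWorker}} API, and the clock is injected so the behavior is deterministic. A long operation split into bounded steps refreshes the heartbeat after each step, while a blocking section suppresses detection altogether; wrapping everything in a blocking section would make {{looksBlocked()}} always return false, which is the "not critical" effect described above.

```java
import java.util.List;
import java.util.function.LongSupplier;

/**
 * Hypothetical model of a watched worker (not Ignite's GridWorker): it records a
 * heartbeat timestamp and can mark stretches where waiting is expected.
 */
final class MonitoredWorker {
    private final LongSupplier clock; // injectable time source, in ms
    private volatile long heartbeatTs;
    private volatile boolean inBlockingSection;

    MonitoredWorker(LongSupplier clock) {
        this.clock = clock;
        this.heartbeatTs = clock.getAsLong();
    }

    /** Progress marker: called by the worker between bounded steps. */
    void updateHeartbeat() {
        heartbeatTs = clock.getAsLong();
    }

    /** Start of a wait the watchdog should tolerate. */
    void blockingSectionBegin() {
        inBlockingSection = true;
    }

    /** End of the tolerated wait: refresh the heartbeat before clearing the flag. */
    void blockingSectionEnd() {
        updateHeartbeat();
        inBlockingSection = false;
    }

    /** Long operation split into steps, with a heartbeat after each step. */
    void runSteps(List<Runnable> steps) {
        for (Runnable step : steps) {
            step.run();
            updateHeartbeat();
        }
    }

    /** Watchdog-side check: silent for too long and not inside a blocking section. */
    boolean looksBlocked(long timeoutMs) {
        return !inBlockingSection && clock.getAsLong() - heartbeatTs > timeoutMs;
    }
}
```

Refreshing the heartbeat before clearing the flag in {{blockingSectionEnd()}} avoids a window in which a watchdog could see the flag cleared together with a stale timestamp.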

> Ignite watchdog service handles longrunning cache creation
> --
>
> Key: IGNITE-9710
> URL: https://issues.apache.org/jira/browse/IGNITE-9710
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Affects Versions: 2.6
>Reporter: Vyacheslav Daradur
>Assignee: Andrey Kuznetsov
>Priority: Major
> Fix For: 2.7
>
> Attachments: LongRunningCacheCreationTest.java
>
>
> Ignite watchdog service introduced by IGNITE-6587 handles long running cache 
> creation.
> The action in {{GridDhtPartitionsExchangeFuture#init}} may take significant 
> time and possibly should be covered by a blocking section of the watchdog service.
> Reproducer was attached: [^LongRunningCacheCreationTest.java].





[jira] [Created] (IGNITE-9744) Fix SYSTEM_WORKER_TERMINATION detection in general case

2018-09-30 Thread Andrey Kuznetsov (JIRA)
Andrey Kuznetsov created IGNITE-9744:


 Summary: Fix SYSTEM_WORKER_TERMINATION detection in general case
 Key: IGNITE-9744
 URL: https://issues.apache.org/jira/browse/IGNITE-9744
 Project: Ignite
  Issue Type: Bug
Reporter: Andrey Kuznetsov
Assignee: Andrey Kuznetsov
 Fix For: 2.7


Currently, each existing critical worker handles unintended termination 
individually. Termination should be detected for an arbitrary critical worker 
as well. There is a test for this situation, 
{{SystemWorkersTerminationTest.testTermination}}, but at the moment it passes 
only because {{SYSTEM_WORKER_BLOCKED}} is detected instead of 
{{SYSTEM_WORKER_TERMINATION}}; this should be fixed.
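One way to detect termination uniformly is to wrap every critical worker body in a single runner that reports any exit not preceded by an intentional cancel. The sketch below is an assumption about how this could look; {{WorkerFailure}}, {{CriticalWorkerRunner}} and the handler callback are hypothetical stand-ins, not Ignite classes.

```java
import java.util.function.BiConsumer;

/** Stand-in for the failure types named in the issue (not Ignite's FailureType). */
enum WorkerFailure { SYSTEM_WORKER_TERMINATION, SYSTEM_WORKER_BLOCKED }

/**
 * Hypothetical wrapper that gives every critical worker the same termination
 * detection: any exit of the body, normal or exceptional, is reported unless
 * the worker was cancelled on purpose.
 */
final class CriticalWorkerRunner implements Runnable {
    private final String name;
    private final Runnable body;
    private final BiConsumer<String, WorkerFailure> failureHandler;
    private volatile boolean cancelled;

    CriticalWorkerRunner(String name, Runnable body, BiConsumer<String, WorkerFailure> failureHandler) {
        this.name = name;
        this.body = body;
        this.failureHandler = failureHandler;
    }

    /** Intended shutdown: exiting afterwards is not a failure. */
    void cancel() {
        cancelled = true;
    }

    @Override public void run() {
        try {
            body.run();
        }
        finally {
            // Reached on normal return and on any thrown exception or error.
            if (!cancelled)
                failureHandler.accept(name, WorkerFailure.SYSTEM_WORKER_TERMINATION);
        }
    }
}
```

Because the check lives in one wrapper, a test like the one mentioned above would observe {{SYSTEM_WORKER_TERMINATION}} directly instead of relying on a later {{SYSTEM_WORKER_BLOCKED}} detection.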





[jira] [Assigned] (IGNITE-9679) Document critical workers liveness checking implementation

2018-10-01 Thread Andrey Kuznetsov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Kuznetsov reassigned IGNITE-9679:


Assignee: Andrey Kuznetsov  (was: Artem Budnikov)

> Document critical workers liveness checking implementation
> --
>
> Key: IGNITE-9679
> URL: https://issues.apache.org/jira/browse/IGNITE-9679
> Project: Ignite
>  Issue Type: Task
>  Components: documentation
>Reporter: Andrey Kuznetsov
>Assignee: Andrey Kuznetsov
>Priority: Major
> Fix For: 2.7
>
>
> Newly implemented critical worker thread liveness checks should be mentioned 
> in Ignite Documentation. Brief description of the functionality follows.
> Ignite node has a number of critical worker threads that should be alive and 
> responsive, otherwise node's health is not guaranteed. These threads monitor 
> each other periodically and track two aspects for a thread being checked:
> - whether it's alive;
> - whether it updates its internal heartbeat timestamp.
> Both checks use {{IgniteConfiguration.failureDetectionTimeout}} property as a 
> threshold value.
> Whenever at least one of the above conditions is violated, checker thread 
> logs the error and calls currently configured {{FailureHandler}}.
> Liveness checks are enabled by default, but can be disabled through 
> {{WorkersControlMXBean.healthMonitoringEnabled}} property.
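The two checks described above can be sketched as a single monitoring pass. The names here ({{WorkerView}}, {{LivenessChecker}}) are hypothetical, and the threshold is passed in directly; in the description it corresponds to {{IgniteConfiguration.failureDetectionTimeout}}.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

/** Hypothetical snapshot of a worker as seen by a checker thread. */
interface WorkerView {
    String name();
    boolean isAlive();
    long lastHeartbeatMs();
}

/**
 * Sketch of one monitoring pass: a checked worker must be alive and must have
 * updated its heartbeat within the threshold. In the described design each
 * critical thread would run such a pass over its peers.
 */
final class LivenessChecker {
    private final LongSupplier clock;
    private final long thresholdMs;

    LivenessChecker(LongSupplier clock, long thresholdMs) {
        this.clock = clock;
        this.thresholdMs = thresholdMs;
    }

    /** Returns violations; a real checker would log them and call the FailureHandler. */
    List<String> check(List<WorkerView> workers) {
        List<String> violations = new ArrayList<>();
        long now = clock.getAsLong();

        for (WorkerView w : workers) {
            long silentMs = now - w.lastHeartbeatMs();

            if (!w.isAlive())
                violations.add(w.name() + ": terminated");
            else if (silentMs > thresholdMs)
                violations.add(w.name() + ": no heartbeat for " + silentMs + " ms");
        }

        return violations;
    }

    /** Convenience factory for fixed snapshots. */
    static WorkerView snapshot(String name, boolean alive, long lastHeartbeatMs) {
        return new WorkerView() {
            @Override public String name() { return name; }
            @Override public boolean isAlive() { return alive; }
            @Override public long lastHeartbeatMs() { return lastHeartbeatMs; }
        };
    }
}
```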





[jira] [Created] (IGNITE-9776) FsyncModeFileWriteAheadLogManager can block forever in log() call

2018-10-02 Thread Andrey Kuznetsov (JIRA)
Andrey Kuznetsov created IGNITE-9776:


 Summary: FsyncModeFileWriteAheadLogManager can block forever in 
log() call
 Key: IGNITE-9776
 URL: https://issues.apache.org/jira/browse/IGNITE-9776
 Project: Ignite
  Issue Type: Bug
Reporter: Andrey Kuznetsov
 Fix For: 2.7
 Attachments: FsyncWalRolloverDoesNotBlockTest.java

If WAL archiver is disabled and WALRecord being logged has {{rollOver() == 
true}}, then {{log()}} call blocks forever in {{FileArchiver}}'s (!) method:

{noformat}
nextAbsoluteSegmentIndex:1707, FsyncModeFileWriteAheadLogManager$FileArchiver 
(org.apache.ignite.internal.processors.cache.persistence.wal)
access$3200:1437, FsyncModeFileWriteAheadLogManager$FileArchiver 
(org.apache.ignite.internal.processors.cache.persistence.wal)
pollNextFile:1384, FsyncModeFileWriteAheadLogManager 
(org.apache.ignite.internal.processors.cache.persistence.wal)
initNextWriteHandle:1243, FsyncModeFileWriteAheadLogManager 
(org.apache.ignite.internal.processors.cache.persistence.wal)
rollOver:1130, FsyncModeFileWriteAheadLogManager 
(org.apache.ignite.internal.processors.cache.persistence.wal)
log:712, FsyncModeFileWriteAheadLogManager 
(org.apache.ignite.internal.processors.cache.persistence.wal)
{noformat}

Reproducer is attached.
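For illustration only, the shape of the hang can be modeled in a few lines of Java. {{SegmentRollover}} is a hypothetical, heavily simplified stand-in for the WAL manager and its archiver; the guarded variant is one conceivable approach, not necessarily the fix that will be committed. The {{completesWithin()}} helper mimics a "does not block" check with a deadline; the attached reproducer may work differently.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/**
 * Hypothetical model of the hang (not Ignite code): the writer waits for the
 * archiver to hand out the next segment index, but with archiving disabled no
 * archiver thread exists, so nobody ever notifies it.
 */
final class SegmentRollover {
    private final boolean archiverEnabled;
    private final Object mux = new Object();
    private long nextIdx = 1;      // guarded by mux
    private boolean archiverReady; // guarded by mux; set only by an archiver thread

    SegmentRollover(boolean archiverEnabled) {
        this.archiverEnabled = archiverEnabled;
    }

    /** Problematic shape: always waits for the archiver. */
    long nextIndexUnguarded() throws InterruptedException {
        synchronized (mux) {
            while (!archiverReady)
                mux.wait();

            return nextIdx++;
        }
    }

    /** One possible guard: without an archiver, the writer advances the index itself. */
    long nextIndexGuarded() throws InterruptedException {
        if (!archiverEnabled) {
            synchronized (mux) {
                return nextIdx++;
            }
        }

        return nextIndexUnguarded();
    }

    /** "Does not block" check: runs the call on a daemon thread with a deadline. */
    static boolean completesWithin(Callable<?> call, long timeoutMs) {
        ExecutorService exec = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        });

        try {
            exec.submit(call).get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        }
        catch (TimeoutException e) {
            return false;
        }
        catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        finally {
            exec.shutdownNow(); // interrupts a waiter that is still stuck
        }
    }
}
```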






[jira] [Updated] (IGNITE-9776) FsyncModeFileWriteAheadLogManager can block forever in log() call

2018-10-02 Thread Andrey Kuznetsov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Kuznetsov updated IGNITE-9776:
-
Description: 
If WAL archiver is disabled and WALRecord being logged has {{rollOver() == 
true}}, then {{log()}} call blocks forever in {{FileArchiver}}'s (!) method:

{noformat}
nextAbsoluteSegmentIndex:1707, FsyncModeFileWriteAheadLogManager$FileArchiver 
(org.apache.ignite.internal.processors.cache.persistence.wal)
access$3200:1437, FsyncModeFileWriteAheadLogManager$FileArchiver 
(org.apache.ignite.internal.processors.cache.persistence.wal)
pollNextFile:1384, FsyncModeFileWriteAheadLogManager 
(org.apache.ignite.internal.processors.cache.persistence.wal)
initNextWriteHandle:1243, FsyncModeFileWriteAheadLogManager 
(org.apache.ignite.internal.processors.cache.persistence.wal)
rollOver:1130, FsyncModeFileWriteAheadLogManager 
(org.apache.ignite.internal.processors.cache.persistence.wal)
log:712, FsyncModeFileWriteAheadLogManager 
(org.apache.ignite.internal.processors.cache.persistence.wal)
{noformat}

Reproducer is attached.


  was:
If WAL archiver is disabled and WALRecord being logged has {{rollOver() == 
true}}, then {{log()}} call blocks forever in {{FileArchiver}}'s (!) method:

{noformat}
nextAbsoluteSegmentIndex:1707, FsyncModeFileWriteAheadLogManager$FileArchiver 
(org.apache.ignite.internal.processors.cache.persistence.wal)
access$3200:1437, FsyncModeFileWriteAheadLogManager$FileArchiver 
(org.apache.ignite.internal.processors.cache.persistence.wal)
pollNextFile:1384, FsyncModeFileWriteAheadLogManager 
(org.apache.ignite.internal.processors.cache.persistence.wal)
initNextWriteHandle:1243, FsyncModeFileWriteAheadLogManager 
(org.apache.ignite.internal.processors.cache.persistence.wal)
rollOver:1130, FsyncModeFileWriteAheadLogManager 
(org.apache.ignite.internal.processors.cache.persistence.wal)
log:712, FsyncModeFileWriteAheadLogManager 
(org.apache.ignite.internal.processors.cache.persistence.wal)
{noformat}

Reporoducer is attached.



> FsyncModeFileWriteAheadLogManager can block forever in log() call
> -
>
> Key: IGNITE-9776
> URL: https://issues.apache.org/jira/browse/IGNITE-9776
> Project: Ignite
>  Issue Type: Bug
>Reporter: Andrey Kuznetsov
>Priority: Major
> Fix For: 2.7
>
> Attachments: FsyncWalRolloverDoesNotBlockTest.java
>
>
> If WAL archiver is disabled and WALRecord being logged has {{rollOver() == 
> true}}, then {{log()}} call blocks forever in {{FileArchiver}}'s (!) method:
> {noformat}
> nextAbsoluteSegmentIndex:1707, FsyncModeFileWriteAheadLogManager$FileArchiver 
> (org.apache.ignite.internal.processors.cache.persistence.wal)
> access$3200:1437, FsyncModeFileWriteAheadLogManager$FileArchiver 
> (org.apache.ignite.internal.processors.cache.persistence.wal)
> pollNextFile:1384, FsyncModeFileWriteAheadLogManager 
> (org.apache.ignite.internal.processors.cache.persistence.wal)
> initNextWriteHandle:1243, FsyncModeFileWriteAheadLogManager 
> (org.apache.ignite.internal.processors.cache.persistence.wal)
> rollOver:1130, FsyncModeFileWriteAheadLogManager 
> (org.apache.ignite.internal.processors.cache.persistence.wal)
> log:712, FsyncModeFileWriteAheadLogManager 
> (org.apache.ignite.internal.processors.cache.persistence.wal)
> {noformat}
> Reproducer is attached.





[jira] [Commented] (IGNITE-9744) Fix SYSTEM_WORKER_TERMINATION detection in general case

2018-10-03 Thread Andrey Kuznetsov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16637845#comment-16637845
 ] 

Andrey Kuznetsov commented on IGNITE-9744:
--

[~ivan.glukos], this change is ready. Could you please take a look?

> Fix SYSTEM_WORKER_TERMINATION detection in general case
> ---
>
> Key: IGNITE-9744
> URL: https://issues.apache.org/jira/browse/IGNITE-9744
> Project: Ignite
>  Issue Type: Bug
>Reporter: Andrey Kuznetsov
>Assignee: Andrey Kuznetsov
>Priority: Major
> Fix For: 2.7
>
>
> Currently, each existing critical worker handles unintended termination 
> individually. Termination should be detected for an arbitrary critical worker 
> as well. There is a test for this situation, 
> {{SystemWorkersTerminationTest.testTermination}}, but at the moment it passes 
> only because {{SYSTEM_WORKER_BLOCKED}} is detected instead of 
> {{SYSTEM_WORKER_TERMINATION}}; this should be fixed.





[jira] [Commented] (IGNITE-9737) Ignite WatchDog service should be configurable

2018-10-09 Thread Andrey Kuznetsov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16642935#comment-16642935
 ] 

Andrey Kuznetsov commented on IGNITE-9737:
--

Tests for the new parameters have been added. TeamCity results for the PR are OK.

> Ignite WatchDog service should be configurable
> --
>
> Key: IGNITE-9737
> URL: https://issues.apache.org/jira/browse/IGNITE-9737
> Project: Ignite
>  Issue Type: Bug
>Reporter: Nikolay Izhikov
>Assignee: Andrey Kuznetsov
>Priority: Blocker
> Fix For: 2.7
>
>
> At the moment, there is no way to disable the Ignite WatchDog service from the 
> config or via a JVM option.
> In a corner case, or because of a bug in that feature, Ignite can become fully 
> unusable due to an unpredictable shutdown.
> We should provide a way to enable/disable this feature from the config or via 
> a JVM option.





[jira] [Created] (IGNITE-9838) TxStateChangeEventTest fails sometimes on TeamCity

2018-10-10 Thread Andrey Kuznetsov (JIRA)
Andrey Kuznetsov created IGNITE-9838:


 Summary: TxStateChangeEventTest fails sometimes on TeamCity
 Key: IGNITE-9838
 URL: https://issues.apache.org/jira/browse/IGNITE-9838
 Project: Ignite
  Issue Type: Test
Reporter: Andrey Kuznetsov
 Fix For: 2.8


Both test methods may fail to acquire a transaction lock. Presumably, increasing 
the timeout may be enough to fix this.
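As a generic JDK illustration of why a longer timeout may help (this uses {{java.util.concurrent.locks.ReentrantLock}}, not Ignite's transaction API; {{LockTimeoutDemo}} is hypothetical): a {{tryLock()}} whose timeout is shorter than the holder's critical section fails even though the lock would become available shortly afterwards.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/** Generic illustration of lock acquisition with a timeout (not Ignite code). */
final class LockTimeoutDemo {
    /**
     * Another thread holds the lock for holdMs; we then try to take it, waiting
     * at most waitMs. Returns whether the acquisition succeeded.
     */
    static boolean acquireWhileHeld(long holdMs, long waitMs) {
        ReentrantLock lock = new ReentrantLock();
        CountDownLatch held = new CountDownLatch(1);

        Thread holder = new Thread(() -> {
            lock.lock();
            try {
                held.countDown();     // signal that the lock is now taken
                Thread.sleep(holdMs); // simulated long critical section
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            finally {
                lock.unlock();
            }
        });
        holder.setDaemon(true);
        holder.start();

        try {
            held.await();
            boolean acquired = lock.tryLock(waitMs, TimeUnit.MILLISECONDS);

            if (acquired)
                lock.unlock();

            return acquired;
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```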





[jira] [Created] (IGNITE-9860) Unreliable listener invocation order in GridDhtPartitionsExchangeFuture#onDone

2018-10-11 Thread Andrey Kuznetsov (JIRA)
Andrey Kuznetsov created IGNITE-9860:


 Summary: Unreliable listener invocation order in 
GridDhtPartitionsExchangeFuture#onDone
 Key: IGNITE-9860
 URL: https://issues.apache.org/jira/browse/IGNITE-9860
 Project: Ignite
  Issue Type: Bug
Reporter: Andrey Kuznetsov
 Fix For: 2.8


The listener added right before the {{super.onDone()}} call is intended to be 
invoked before all other listeners. There is a small probability of breaking 
this guarantee: another thread can call {{listen()}} before the 
future-completing thread enters {{super.onDone()}}.
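The race and one possible remedy can be illustrated with a minimal future. {{MiniFuture}} is hypothetical and much simpler than {{GridFutureAdapter}}; the remedy shown, placing the "first" listener at the head of the notification list under the same lock that completes the future, is only one option.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Minimal future sketch (not Ignite's GridFutureAdapter): listeners registered
 * before completion run in registration order when the future completes.
 */
final class MiniFuture<T> {
    private final List<Consumer<T>> listeners = new ArrayList<>(); // guarded by this
    private boolean done;                                           // guarded by this
    private T result;                                               // guarded by this

    void listen(Consumer<T> lsnr) {
        T res;

        synchronized (this) {
            if (!done) {
                listeners.add(lsnr);
                return;
            }

            res = result;
        }

        lsnr.accept(res); // already completed: notify immediately
    }

    /** Plain completion. */
    void onDone(T res) {
        onDone(res, null);
    }

    /**
     * Completion with a "first" listener that is placed ahead of all registered
     * listeners under the same lock, so a concurrent listen() cannot get in front.
     */
    void onDone(T res, Consumer<T> first) {
        List<Consumer<T>> toNotify = new ArrayList<>();

        synchronized (this) {
            if (done)
                return;

            done = true;
            result = res;

            if (first != null)
                toNotify.add(first);

            toNotify.addAll(listeners);
            listeners.clear();
        }

        for (Consumer<T> l : toNotify)
            l.accept(res);
    }
}
```

With the racy pattern (register, then complete), a {{listen()}} from another thread that lands between the two steps is notified first; passing the listener into {{onDone()}} removes that window.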





[jira] [Commented] (IGNITE-9776) FsyncModeFileWriteAheadLogManager can block forever in log() call

2018-10-14 Thread Andrey Kuznetsov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16649444#comment-16649444
 ] 

Andrey Kuznetsov commented on IGNITE-9776:
--

[~astelmak], could you please re-enable the failing methods in 
{{WalRolloverTypesTest}} as part of your PR?

> FsyncModeFileWriteAheadLogManager can block forever in log() call
> -
>
> Key: IGNITE-9776
> URL: https://issues.apache.org/jira/browse/IGNITE-9776
> Project: Ignite
>  Issue Type: Bug
>Reporter: Andrey Kuznetsov
>Assignee: Alexey Stelmak
>Priority: Major
> Fix For: 2.7
>
> Attachments: FsyncWalRolloverDoesNotBlockTest.java
>
>
> If WAL archiver is disabled and WALRecord being logged has {{rollOver() == 
> true}}, then {{log()}} call blocks forever in {{FileArchiver}}'s (!) method:
> {noformat}
> nextAbsoluteSegmentIndex:1707, FsyncModeFileWriteAheadLogManager$FileArchiver 
> (org.apache.ignite.internal.processors.cache.persistence.wal)
> access$3200:1437, FsyncModeFileWriteAheadLogManager$FileArchiver 
> (org.apache.ignite.internal.processors.cache.persistence.wal)
> pollNextFile:1384, FsyncModeFileWriteAheadLogManager 
> (org.apache.ignite.internal.processors.cache.persistence.wal)
> initNextWriteHandle:1243, FsyncModeFileWriteAheadLogManager 
> (org.apache.ignite.internal.processors.cache.persistence.wal)
> rollOver:1130, FsyncModeFileWriteAheadLogManager 
> (org.apache.ignite.internal.processors.cache.persistence.wal)
> log:712, FsyncModeFileWriteAheadLogManager 
> (org.apache.ignite.internal.processors.cache.persistence.wal)
> {noformat}
> Reproducer is attached.





[jira] [Commented] (IGNITE-9710) Ignite watchdog service handles longrunning cache creation

2018-10-16 Thread Andrey Kuznetsov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16651936#comment-16651936
 ] 

Andrey Kuznetsov commented on IGNITE-9710:
--

[~agoncharuk], [~agura], thanks for your comments. I've made all the required 
changes. I had to update the exchanger worker's progress marker from threads 
other than the exchanger thread. Are you OK with this?

TeamCity (re)tests are in progress right now.

> Ignite watchdog service handles longrunning cache creation
> --
>
> Key: IGNITE-9710
> URL: https://issues.apache.org/jira/browse/IGNITE-9710
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Affects Versions: 2.6
>Reporter: Vyacheslav Daradur
>Assignee: Andrey Kuznetsov
>Priority: Major
> Fix For: 2.7
>
> Attachments: LongRunningCacheCreationTest.java
>
>
> Ignite watchdog service introduced by IGNITE-6587 handles long running cache 
> creation.
> The action in {{GridDhtPartitionsExchangeFuture#init}} may take significant 
> time and possibly should be covered by a blocking section of the watchdog service.
> Reproducer was attached: [^LongRunningCacheCreationTest.java].





[jira] [Commented] (IGNITE-9710) Ignite watchdog service handles longrunning cache creation

2018-10-17 Thread Andrey Kuznetsov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16653436#comment-16653436
 ] 

Andrey Kuznetsov commented on IGNITE-9710:
--

Don't trust the TC Bot this time: the test it mentions is flaky.

> Ignite watchdog service handles longrunning cache creation
> --
>
> Key: IGNITE-9710
> URL: https://issues.apache.org/jira/browse/IGNITE-9710
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Affects Versions: 2.6
>Reporter: Vyacheslav Daradur
>Assignee: Andrey Kuznetsov
>Priority: Major
> Fix For: 2.7
>
> Attachments: LongRunningCacheCreationTest.java
>
>
> Ignite watchdog service introduced by IGNITE-6587 handles long running cache 
> creation.
> The action in {{GridDhtPartitionsExchangeFuture#init}} may take significant 
> time and possibly should be covered by a blocking section of the watchdog service.
> Reproducer was attached: [^LongRunningCacheCreationTest.java].





[jira] [Created] (IGNITE-9932) Exchanger blocking session bounds can be accessed from invalid thread

2018-10-18 Thread Andrey Kuznetsov (JIRA)
Andrey Kuznetsov created IGNITE-9932:


 Summary: Exchanger blocking session bounds can be accessed from 
invalid thread
 Key: IGNITE-9932
 URL: https://issues.apache.org/jira/browse/IGNITE-9932
 Project: Ignite
  Issue Type: Bug
Reporter: Andrey Kuznetsov
Assignee: Andrey Kuznetsov


{{GridDhtPartitionsExchangeFuture}} uses blocking sections delimited by 
{{exchangerBlockingSectionBegin}} and {{exchangerBlockingSectionEnd}}. 
Currently, these begin/end bounds assert that they are called from the 
partition-exchanger thread. It turns out that this assertion can legitimately 
fail, so it is better to make the begin/end bounds no-ops unless they are 
called from the partition-exchanger thread.
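The proposed no-op behavior could look roughly like this. {{ExchangerBlockingSections}} and its methods are hypothetical stand-ins for the begin/end bounds in {{GridCachePartitionExchangeManager}}; the point is the thread check that replaces the assertion.

```java
/**
 * Hypothetical sketch of the proposal: blocking-section bounds become no-ops when
 * called from any thread other than the exchanger thread, instead of asserting.
 */
final class ExchangerBlockingSections {
    private volatile Thread exchangerThread;
    private int depth; // touched only by the exchanger thread

    void bindExchangerThread(Thread t) {
        exchangerThread = t;
    }

    private boolean onExchangerThread() {
        return Thread.currentThread() == exchangerThread;
    }

    void blockingSectionBegin() {
        if (!onExchangerThread())
            return; // e.g. a message-processing thread completing the future

        depth++;
    }

    void blockingSectionEnd() {
        if (!onExchangerThread())
            return;

        depth--;
    }

    /** Nesting depth; read from the exchanger thread (or after a happens-before edge). */
    int depth() {
        return depth;
    }
}
```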

{{IgniteStableBaselineBinObjFieldsQuerySelfTest#testQueryReplicatedTransactional}} 
may hang due to this issue, see [1]. The exception stack trace leading to the 
critical failure follows.

{noformat}
java.lang.AssertionError
	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.exchangerBlockingSectionBegin(GridCachePartitionExchangeManager.java:2351)
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.waitUntilNewCachesAreRegistered(GridDhtPartitionsExchangeFuture.java:2261)
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onDone(GridDhtPartitionsExchangeFuture.java:2066)
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.processFullMessage(GridDhtPartitionsExchangeFuture.java:3980)
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.access$2100(GridDhtPartitionsExchangeFuture.java:141)
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$7.apply(GridDhtPartitionsExchangeFuture.java:3667)
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$7.apply(GridDhtPartitionsExchangeFuture.java:3655)
	at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:385)
	at org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:355)
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onReceiveFullMessage(GridDhtPartitionsExchangeFuture.java:3655)
	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.processFullPartitionUpdate(GridCachePartitionExchangeManager.java:1655)
	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$3.onMessage(GridCachePartitionExchangeManager.java:393)
	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$3.onMessage(GridCachePartitionExchangeManager.java:380)
	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:3178)
	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:3157)
	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1056)
	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:581)
	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:380)
	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:306)
	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:101)
	at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:295)
	at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1569)
	at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1197)
	at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:127)
	at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1093)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
{noformat}

[1] 
https://ci.ignite.apache.org/viewLog.html?buildId=2111470&tab=buildResultsDiv&buildTypeId=IgniteTests24Java8_BinaryObjectsSimpleMapperQueries





[jira] [Assigned] (IGNITE-9679) Document critical workers liveness checking implementation

2018-10-18 Thread Andrey Kuznetsov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Kuznetsov reassigned IGNITE-9679:


Assignee: (was: Andrey Kuznetsov)

> Document critical workers liveness checking implementation
> --
>
> Key: IGNITE-9679
> URL: https://issues.apache.org/jira/browse/IGNITE-9679
> Project: Ignite
>  Issue Type: Task
>  Components: documentation
>Reporter: Andrey Kuznetsov
>Priority: Major
> Fix For: 2.7
>
>
> Newly implemented critical worker thread liveness checks should be mentioned 
> in Ignite Documentation. Brief description of the functionality follows.
> Ignite node has a number of critical worker threads that should be alive and 
> responsive, otherwise node's health is not guaranteed. These threads monitor 
> each other periodically and track two aspects for a thread being checked:
> - whether it's alive;
> - whether it updates its internal heartbeat timestamp.
> Both checks use {{IgniteConfiguration.failureDetectionTimeout}} property as a 
> threshold value.
> Whenever at least one of the above conditions is violated, checker thread 
> logs the error and calls currently configured {{FailureHandler}}.
> Liveness checks are enabled by default, but can be disabled through 
> {{WorkersControlMXBean.healthMonitoringEnabled}} property.





[jira] [Updated] (IGNITE-9679) Document critical workers liveness checking implementation

2018-10-18 Thread Andrey Kuznetsov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Kuznetsov updated IGNITE-9679:
-
Description: 
Newly implemented critical worker thread liveness checks should be mentioned in 
the Ignite documentation. A brief description of the functionality follows.

An Ignite node has a number of critical worker threads that should be alive and 
responsive; otherwise, the node's health is not guaranteed. These threads 
monitor each other periodically and track two aspects of a thread being checked:
- whether it's alive;
- whether it updates its internal heartbeat timestamp.
Whenever at least one of the above conditions is violated, the checker thread 
logs the error and calls the currently configured {{FailureHandler}}.

The {{IgniteConfiguration.systemWorkerBlockedTimeout}} configuration property 
affects monitoring behavior. At runtime, monitoring settings can be changed via 
{{FailureHandlingMxBean}}.

By default, liveness checks are enabled, but detection of a blocked system 
worker will not lead to failure handler invocation; see 
{{FailureProcessor#getDefaultFailureHandler}}.


  was:
Newly implemented critical worker thread liveness checks should be mentioned in 
Ignite Documentation. Brief description of the functionality follows.

Ignite node has a number of critical worker threads that should be alive and 
responsive, otherwise node's health is not guaranteed. These threads monitor 
each other periodically and track two aspects for a thread being checked:
- whether it's alive;
- whether it updates its internal heartbeat timestamp.
Both checks use {{IgniteConfiguration.failureDetectionTimeout}} property as a 
threshold value.
Whenever at least one of the above conditions is violated, checker thread logs 
the error and calls currently configured {{FailureHandler}}.

Liveness checks are enabled by default, but can be disabled through 
{{WorkersControlMXBean.healthMonitoringEnabled}} property.



> Document critical workers liveness checking implementation
> --
>
> Key: IGNITE-9679
> URL: https://issues.apache.org/jira/browse/IGNITE-9679
> Project: Ignite
>  Issue Type: Task
>  Components: documentation
>Reporter: Andrey Kuznetsov
>Priority: Major
> Fix For: 2.7
>
>
> Newly implemented critical worker thread liveness checks should be mentioned 
> in Ignite Documentation. Brief description of the functionality follows.
> Ignite node has a number of critical worker threads that should be alive and 
> responsive, otherwise node's health is not guaranteed. These threads monitor 
> each other periodically and track two aspects for a thread being checked:
> - whether it's alive;
> - whether it updates its internal heartbeat timestamp.
> Whenever at least one of the above conditions is violated, checker thread 
> logs the error and calls currently configured {{FailureHandler}}.
> {{IgniteConfiguration.SystemWorkerBlockedTimeout}} configuration property 
> affects monitoring behavior. At runtime monitoring settings can be changed 
> via {{FailureHandlingMxBean}}.
> By default, liveness checks are enabled, but blocked system worker detection 
> will not lead to failure handler invocation, see 
> {{FailureProcessor#getDefaultFailureHandler}}.
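The default behavior described above (blocked workers are detected and logged, but do not trigger the failure action) can be sketched as a handler with a set of ignored failure types. The enum and class below are hypothetical stand-ins using the failure type names from this thread; {{FailureProcessor#getDefaultFailureHandler}} remains the reference for the actual defaults.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.function.Consumer;

/** Stand-in failure types using names from this thread (not Ignite's FailureType). */
enum SketchFailureType { SYSTEM_WORKER_TERMINATION, SYSTEM_WORKER_BLOCKED }

/**
 * Hypothetical handler: failures of ignored types are only logged, everything
 * else is passed to a delegate action (for example, stopping the node).
 */
final class IgnoringFailureHandler {
    private final Set<SketchFailureType> ignored;
    private final Consumer<SketchFailureType> delegate;

    IgnoringFailureHandler(Set<SketchFailureType> ignored, Consumer<SketchFailureType> delegate) {
        // EnumSet.copyOf rejects an empty non-EnumSet collection, hence the check.
        this.ignored = ignored.isEmpty() ? EnumSet.noneOf(SketchFailureType.class) : EnumSet.copyOf(ignored);
        this.delegate = delegate;
    }

    /** Returns true if the delegate was invoked. */
    boolean onFailure(SketchFailureType type) {
        if (ignored.contains(type)) {
            System.err.println("Critical failure detected, but ignored by configuration: " + type);
            return false;
        }

        delegate.accept(type);
        return true;
    }
}
```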





[jira] [Commented] (IGNITE-9679) Document critical workers liveness checking implementation

2018-10-18 Thread Andrey Kuznetsov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16655548#comment-16655548
 ] 

Andrey Kuznetsov commented on IGNITE-9679:
--

[~Artem Budnikov], this issue is no longer blocked; your help with it would be 
appreciated.

> Document critical workers liveness checking implementation
> --
>
> Key: IGNITE-9679
> URL: https://issues.apache.org/jira/browse/IGNITE-9679
> Project: Ignite
>  Issue Type: Task
>  Components: documentation
>Reporter: Andrey Kuznetsov
>Priority: Major
> Fix For: 2.7
>
>
> Newly implemented critical worker thread liveness checks should be mentioned 
> in Ignite Documentation. Brief description of the functionality follows.
> Ignite node has a number of critical worker threads that should be alive and 
> responsive, otherwise node's health is not guaranteed. These threads monitor 
> each other periodically and track two aspects for a thread being checked:
> - whether it's alive;
> - whether it updates its internal heartbeat timestamp.
> Whenever at least one of the above conditions is violated, checker thread 
> logs the error and calls currently configured {{FailureHandler}}.
> {{IgniteConfiguration.SystemWorkerBlockedTimeout}} configuration property 
> affects monitoring behavior. At runtime monitoring settings can be changed 
> via {{FailureHandlingMxBean}}.
> By default, liveness checks are enabled, but blocked system worker detection 
> will not lead to failure handler invocation, see 
> {{FailureProcessor#getDefaultFailureHandler}}.





[jira] [Updated] (IGNITE-9695) Add a way to prevent WAL disabling in WalStateManager

2018-10-24 Thread Andrey Kuznetsov (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Kuznetsov updated IGNITE-9695:
-
Summary: Add a way to prevent WAL disabling in WalStateManager  (was: Add a 
way to prevent per-cache WAL disabling in WalStateManager)

> Add a way to prevent WAL disabling in WalStateManager
> -
>
> Key: IGNITE-9695
> URL: https://issues.apache.org/jira/browse/IGNITE-9695
> Project: Ignite
>  Issue Type: Task
>Affects Versions: 2.6
>Reporter: Andrey Kuznetsov
>Assignee: Andrey Kuznetsov
>Priority: Major
> Fix For: 2.8
>
>
> When this prevention is on, {{WalStateManager.init()}} should return an 
> error-holding future immediately.
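A minimal sketch of the requested behavior, assuming a hypothetical {{WalStateManagerSketch}} whose {{init()}} takes the requested WAL state; the real {{WalStateManager.init()}} signature may differ. When prevention is on, a disable request gets an already-failed future instead of scheduling any work.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical stand-in for WalStateManager, reduced to the prevention switch. */
final class WalStateManagerSketch {
    private volatile boolean walDisablingPrevented;

    void preventWalDisabling(boolean prevent) {
        walDisablingPrevented = prevent;
    }

    /** Requests a WAL state change; true = disable WAL, false = enable WAL. */
    CompletableFuture<Boolean> init(boolean disableWal) {
        if (disableWal && walDisablingPrevented) {
            // Fail fast: return an error-holding future, schedule nothing.
            CompletableFuture<Boolean> fut = new CompletableFuture<>();
            fut.completeExceptionally(new IllegalStateException("WAL disabling is prevented on this node"));
            return fut;
        }

        // Placeholder: a real implementation would start the state change here.
        return CompletableFuture.completedFuture(true);
    }
}
```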




