Re: [IMPROVEMENT] Use IgniteSystemProperties.getString(key) instead of System.getenv(key)

2019-04-15 Thread Павлухин Иван
Hi,

+1

I cannot imagine a case where IgniteSystemProperties.getString could
do any real harm.

Mon, 15 Apr 2019 at 09:35, Ilya Kasnacheev wrote:
>
> Hello!
>
> I think it does indeed make sense to always use
> IgniteSystemProperties.getString.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> Thu, 11 Apr 2019 at 16:40, Alexey Kuznetsov wrote:
>
> > Hi, Igniters.
> >
> > What do you think about using IgniteSystemProperties.getString(key)
> > instead of System.getenv(key)?
> >
> > The IgniteSystemProperties.getString(key) method checks system properties
> > first and, if the key is not found, falls back to System.getenv(key).
> >
> > The main purpose is to use it in tests, because it is easy to set a system
> > property in Java code
> > and very non-trivial to do the same for environment variables.
> >
> > Does anybody have objections if I create a JIRA issue (with a pull request) to
> > replace all direct calls of System.getenv(key) with
> > IgniteSystemProperties.getString(key)?
> >
> > --
> > Alexey Kuznetsov
> >



-- 
Best regards,
Ivan Pavlukhin
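
For readers following the thread, the lookup order described in the proposal (system property first, environment variable second) can be sketched in plain Java. This is a minimal illustration and not the actual IgniteSystemProperties source; the class name PropertyLookupSketch and the key SKETCH_DEMO_KEY are hypothetical.

```java
public class PropertyLookupSketch {
    /**
     * Returns the system property for the key if it is set,
     * otherwise the environment variable with the same name (may be null).
     */
    public static String getString(String key) {
        String val = System.getProperty(key);

        return val != null ? val : System.getenv(key);
    }

    public static void main(String[] args) {
        String key = "SKETCH_DEMO_KEY"; // hypothetical key, used only for this demo

        // A test can set a system property from Java code, which is not
        // possible for environment variables of the running process.
        System.setProperty(key, "from-test");
        System.out.println(getString(key));

        System.clearProperty(key);
    }
}
```

A test would call System.setProperty(key, value) before the code under test runs, which is the convenience the proposal is after.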


Re: Impossible memory region created in TC test

2019-04-15 Thread Ilya Kasnacheev
Hello!

Ignite does not touch every page of data region, and until you touch a page
OS will not allocate any specific physical RAM to the virtual RAM address
of that page.

Moreover, AFAIK Ignite will not even allocate all the memory permitted by
data region until it is needed. It allocates memory in chunks, which
means your system will grind to a halt trying to find RAM for the next chunk
as you try to load data into such a node.

If you use the Pre-Touch feature, which was suggested on this developer
list a few months ago, you will see it fail explicitly.

Regards,
-- 
Ilya Kasnacheev


Mon, 15 Apr 2019 at 09:49, Nikolay Izhikov wrote:

> Hello, Igniters.
>
> Why can a test on TC allocate a Data Region bigger than the amount of
> available RAM?
> How can I disable this feature in a specific test?
>
> "memoryAllocated=1024.0 GiB ... required=1093413MB, available=96675MB"
>
> I have a test [1] that tries to create a Data Region of 1024 GiB.
> I want to check for the OOM exception that should occur when this DataRegion
> starts.
>
> Locally, I get it.
>
> But on TC that DataRegion is created successfully [2].
> I see the following log messages:
>
> ```
> [2019-04-13 21:31:30,426][INFO
> ][exchange-worker-#91818%test-server%][PageMemoryImpl] Started page memory
> [memoryAllocated=1024.0 GiB, pages=260204016, tableSize=19.8 GiB,
> checkpointBuffer=2.0 GiB]
> 
> [2019-04-13 21:31:18,004][WARN
> ][test-runner-#91737%pagemem.PageMemoryLazyAllocationWithPDSTest%][IgniteKernal%test-server]
> Nodes started on local machine require more than 80% of physical RAM what
> can lead to significant slowdown due to swapping (please decrease JVM heap
> size, data region size or checkpoint buffer size) [required=1093413MB,
> available=96675MB]
> ```
>
> [1]
> https://github.com/apache/ignite/pull/6388/files#diff-754f5ea2f3579b81c50662f3d20c9126R138
>
> [2]
> https://ci.ignite.apache.org/viewLog.html?buildId=3591587&buildTypeId=IgniteTests24Java8_RunAll#testNameId-7083016202406489951
>
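
To make the numbers in the quoted warning concrete, here is a small arithmetic sketch in Java. It assumes the check is a plain comparison of the required memory against 80% of physical RAM, which is inferred only from the wording of the log message; the class and method names are hypothetical.

```java
public class RamThresholdCheck {
    /** Returns true when the required memory exceeds the given fraction of physical RAM. */
    public static boolean exceeds(long requiredMb, long availableMb, double fraction) {
        return requiredMb > availableMb * fraction;
    }

    public static void main(String[] args) {
        long required = 1_093_413L; // MB, "required" value from the log line
        long available = 96_675L;   // MB, "available" value from the log line

        // 80% of 96675 MB is 77340 MB, so the warning condition holds.
        System.out.println(exceeds(required, available, 0.8));

        // How many times the request exceeds the physical RAM of the agent.
        System.out.println((double) required / available);
    }
}
```

With these values the request is roughly 11.3 times the physical RAM of the machine, yet only a warning is logged and the node still starts.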


[jira] [Created] (IGNITE-11742) Web console: grid-showing-rows component in grids with no selection

2019-04-15 Thread Ilya Borisov (JIRA)
Ilya Borisov created IGNITE-11742:
-

 Summary: Web console: grid-showing-rows component in grids with no selection
 Key: IGNITE-11742
 URL: https://issues.apache.org/jira/browse/IGNITE-11742
 Project: Ignite
  Issue Type: Bug
  Components: wizards
Reporter: Ilya Borisov
Assignee: Ilya Borisov


*The issue:*
Changes to grid-showing-rows component introduced by IGNITE-9981 assume that 
all grid instances have selection module enabled. This is not true in some 
cases (see ui-grid/template.pug), which causes "can't access property of 
undefined" type of runtime errors.

*What to do:*
Handle missing selection API on grid instances in grid-showing-rows.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Impossible memory region created in TC test

2019-04-15 Thread Nikolay Izhikov
Hello, Ilya.

> Ignite does not touch every page of data region, and until you touch a page 
> OS will not allocate any specific physical RAM to the virtual RAM address of 
> that page.

This is not true.
Take a look at this discussion [1].

> Moreover, AFAIK Ignite will not even allocate all the memory permitted by 
> data region until it is needed

Maybe I'm missing something, but

1. If persistenceEnabled = false, Ignite allocates 1 segment on start [2]
2. If persistenceEnabled = true, Ignite allocates all segments on start [3]
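
The two cases above can be modelled with a toy Java sketch. It is not Ignite code: the Region class, its constructor, and the segment counts are hypothetical and only mirror the claim that one segment is reserved up front without persistence and all segments with persistence.

```java
public class SegmentAllocationSketch {
    /** Hypothetical region that only counts how many segments have been reserved. */
    public static final class Region {
        private final int maxSegments;
        private int allocated;

        public Region(int maxSegments, boolean persistenceEnabled) {
            this.maxSegments = maxSegments;
            // Persistent mode reserves everything on start, in-memory mode one segment.
            this.allocated = persistenceEnabled ? maxSegments : 1;
        }

        /** Reserves one more segment on demand; returns false when the region is full. */
        public boolean allocateNext() {
            if (allocated == maxSegments)
                return false;

            allocated++;

            return true;
        }

        public int allocatedSegments() {
            return allocated;
        }
    }

    public static void main(String[] args) {
        Region inMemory = new Region(4, false);
        Region persistent = new Region(4, true);

        System.out.println(inMemory.allocatedSegments());   // prints 1
        System.out.println(persistent.allocatedSegments()); // prints 4
    }
}
```

Under this model, a start-up failure for an oversized region would be expected in the persistent case, which is why the successful start on TC looks surprising.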

> If you use the Pre-Touch feature, which was suggested on this developer list 
> a few months ago, you will see it fail explicitly.

Locally, the test already works as expected.
But on TC it fails, and a DataRegion of 1024 GiB can be created.

It seems there is some flag on TC that enables this behaviour.

[1] 
http://apache-ignite-developers.2346864.n4.nabble.com/Data-regions-on-client-nodes-td32834.html
[2] 
https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/pagemem/impl/PageMemoryNoStoreImpl.java#L245
[3] 
https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/pagemem/PageMemoryImpl.java#L386




Re: Consistency check and fix (review request)

2019-04-15 Thread Andrey Gura
Anton,

what does the expression "withConsistency" mean? From a user's standpoint it
means that all operations performed without this proxy are not
consistent. It means also that at least method name is bad.

Are there any guarantees that the withConsistency proxy will not contain
bugs that will lead to an inconsistent write after an inconsistency was
found? I think there are no such guarantees. Bugs are still possible.
So I must always use the withConsistency proxy because I don't have
any other choice - all ways are unreliable and withConsistency just sounds
better.

Eventually we will have two different ways of working with cache
values, each with a different set of bugs. What is the profit?



On Fri, Apr 12, 2019 at 2:49 PM Anton Vinogradov  wrote:
>
> Folks,
>
> I've checked the tx benchmarks and found no performance drop.
> Also, I see no issues in the TC results.
> So it seems the code is ready to be merged.
>
> Everyone interested, please share any objections about
> - public API
> - test coverage
> - implementation approach
>
> On Wed, Apr 3, 2019 at 5:46 PM Anton Vinogradov  wrote:
>
> > Nikolay,
> >
> > This is not a PoC, but the final solution (I hope so :) ) that requires
> > review.
> > LWW means Last Write Wins, detailed explanation can be found at IEP-31.
> >
> > On Wed, Apr 3, 2019 at 5:24 PM Nikolay Izhikov 
> > wrote:
> >
> >> Hello, Anton.
> >>
> >> Thanks for the PoC.
> >>
> >> > finds correct values according to LWW strategy
> >>
> >> Could you please clarify what the LWW strategy is?
> >>
> >> On Wed, 03/04/2019 at 17:19 +0300, Anton Vinogradov wrote:
> >> > Ilya,
> >> >
> >> > This is impossible due to a conflict between some isolation levels and
> >> > get-with-consistency expectations.
> >> > Basically, it's impossible to perform get-with-consistency after another
> >> > get in a !READ_COMMITTED transaction.
> >> > The problem here is that the value should be cached according to the
> >> > isolation level, so get-with-consistency is restricted in this case.
> >> > We have the same problem with get-with-consistency after put, so there is
> >> > a restriction here too.
> >> > So, the order matters. :)
> >> >
> >> > See OperationRestrictionsCacheConsistencyTest [1] for details.
> >> >
> >> > [1]
> >> > https://github.com/apache/ignite/blob/8b0b0c3e1bde93ff9c4eb5667d794dd64a8b06f0/modules/core/src/test/java/org/apache/ignite/internal/processors/cache/consistency/OperationRestrictionsCacheConsistencyTest.java
> >> >
> >> > On Wed, Apr 3, 2019 at 4:54 PM Ilya Kasnacheev <ilya.kasnach...@gmail.com>
> >> > wrote:
> >> >
> >> > > Hello!
> >> > >
> >> > > Sounds useful, especially for new feature development.
> >> > >
> >> > > Can you do a run of all tests with cache.forConsistency() and see if
> >> > > there are cases that fail?
> >> > >
> >> > > Regards,
> >> > > --
> >> > > Ilya Kasnacheev
> >> > >
> >> > >
> >> > > Wed, 3 Apr 2019 at 16:17, Anton Vinogradov wrote:
> >> > >
> >> > > > Igniters,
> >> > > >
> >> > > > Sometimes, in real deployments, we're faced with an inconsistent state
> >> > > > across the topology.
> >> > > > This means that somehow we have different values for the same key on
> >> > > > different nodes.
> >> > > > This is an extremely rare situation, but when you have thousands of
> >> > > > terabytes of data, it can be a real problem.
> >> > > >
> >> > > > Apache Ignite provides a consistency guarantee: each affinity node
> >> > > > should contain the same value for the same key, at least eventually.
> >> > > > But this guarantee can be violated because of bugs; see IEP-31 [1]
> >> > > > for details.
> >> > > >
> >> > > > So, I created the issue [2] to handle such situations.
> >> > > > The main idea is to have a special cache.withConsistency() proxy that
> >> > > > allows checking and fixing inconsistency on a get operation.
> >> > > >
> >> > > > I've created PR [3] with the following improvements (when the
> >> > > > cache.withConsistency() proxy is used):
> >> > > >
> >> > > > - PESSIMISTIC && !READ_COMMITTED transaction
> >> > > > -- checks values across the topology (under locks),
> >> > > > -- finds correct values according to LWW strategy,
> >> > > > -- records a special event in case a consistency violation is found
> >> > > > (contains inconsistent map > and last values ),
> >> > > > -- enlists writes with the latest value for each inconsistent key, so
> >> > > > they will be written on tx.commit().
> >> > > >
> >> > > > - OPTIMISTIC || READ_COMMITTED transactions
> >> > > > -- checks values across the topology (not under locks, so a
> >> > > > false-positive case is possible),
> >> > > > -- starts a PESSIMISTIC && SERIALIZABLE transaction (in a separate
> >> > > > thread) for each possibly broken key and fixes it on commit if
> >> > > > necessary,
> >> > > > -- the original transaction performs get-after-fix and can be continued
> >> > > > if the fix does not conflict with the isolation level.
> >> > > >
> >> > > > Future plans
> >> > > > - Consistency guard (special process peri

Re: [IMPROVEMENT] Use IgniteSystemProperties.getString(key) instead of System.getenv(key)

2019-04-15 Thread Alexey Kuznetsov
Thanks for responding. I will file a JIRA issue shortly.


-- 
Alexey Kuznetsov


[jira] [Created] (IGNITE-11743) Stopping caches concurrently with node join may lead to crash of the node

2019-04-15 Thread Sergey Chugunov (JIRA)
Sergey Chugunov created IGNITE-11743:


 Summary: Stopping caches concurrently with node join may lead to crash of the node
 Key: IGNITE-11743
 URL: https://issues.apache.org/jira/browse/IGNITE-11743
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.7
Reporter: Sergey Chugunov
Assignee: Sergey Chugunov
 Attachments: IgnitePdsNodeRestartCacheCreateTest.java

When an existing cache is stopped (e.g. via a call to Ignite#destroyCache(String 
name)), this action is distributed across the cluster by the discovery mechanism 
(and is processed from the *disco-notifier-worker* thread).
At the same time, a joining node prepares to start caches from the *exchange-thread*.

If a cache stop request arrives at the new node right in the middle of cache start 
preparation, it may lead to an exception in FilePageStoreManager like the one below 
and a node crash.

A test reproducing the issue is attached.

{noformat}
class org.apache.ignite.IgniteCheckedException: Failed to get page store for the given cache ID (cache has not been started): -1422502786
at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.getStore(FilePageStoreManager.java:1132)
at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.read(FilePageStoreManager.java:482)
at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.read(FilePageStoreManager.java:469)
at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:854)
at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:681)
at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.getOrAllocateCacheMetas(GridCacheOffheapManager.java:869)
at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.initDataStructures(GridCacheOffheapManager.java:128)
at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.start(IgniteCacheOffheapManagerImpl.java:193)
at org.apache.ignite.internal.processors.cache.CacheGroupContext.start(CacheGroupContext.java:1043)
at org.apache.ignite.internal.processors.cache.GridCacheProcessor.startCacheGroup(GridCacheProcessor.java:2829)
at org.apache.ignite.internal.processors.cache.GridCacheProcessor.getOrCreateCacheGroupContext(GridCacheProcessor.java:2557)
at org.apache.ignite.internal.processors.cache.GridCacheProcessor.prepareCacheContext(GridCacheProcessor.java:2387)
at org.apache.ignite.internal.processors.cache.GridCacheProcessor.lambda$null$6a5b31b9$1(GridCacheProcessor.java:2209)
at org.apache.ignite.internal.processors.cache.GridCacheProcessor.lambda$prepareStartCaches$5(GridCacheProcessor.java:2130)
at org.apache.ignite.internal.processors.cache.GridCacheProcessor.lambda$prepareStartCaches$926b6886$1(GridCacheProcessor.java:2206)
at org.apache.ignite.internal.util.IgniteUtils.lambda$null$1(IgniteUtils.java:10874)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
{noformat}





[jira] [Created] (IGNITE-11744) Configuration for explicit plugins providing.

2019-04-15 Thread PetrovMikhail (JIRA)
PetrovMikhail created IGNITE-11744:
--

 Summary: Configuration for explicit plugins providing.
 Key: IGNITE-11744
 URL: https://issues.apache.org/jira/browse/IGNITE-11744
 Project: Ignite
  Issue Type: Task
Reporter: PetrovMikhail
Assignee: PetrovMikhail








Re: Consistency check and fix (review request)

2019-04-15 Thread Anton Vinogradov
Andrey,

>> It means also that at least method name is bad.
Agreed, already discussed with Aleksey Plekhanov.
We decided that ".withConsistencyCheck()" is a proper name.

>> What is the profit?
This proxy lets you check whether there is any consistency violation
across the topology (and fix it).
The proxy checks that all backups contain the same values as the primary.
So, when it's possible (you're ready to spend resources on this check), you
will be able to read-with-consistency-check.
This will decrease the number of "inconsistency caused
war/strikes/devastation" situations, which is important for financial
systems.
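
As an illustration of the check-and-fix idea discussed in this thread, here is a minimal Java sketch of comparing copies of one key and resolving a mismatch with Last Write Wins. It is not the code from the PR: the Versioned class, the node names, and the use of a single numeric version are assumptions made only for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LwwCheckSketch {
    /** A value paired with a write version; a higher version means a later write (hypothetical). */
    public static final class Versioned {
        public final String value;
        public final long version;

        public Versioned(String value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    /** Returns true if every node holds the same value for the key. */
    public static boolean isConsistent(Map<String, Versioned> copies) {
        return copies.values().stream().map(v -> v.value).distinct().count() <= 1;
    }

    /** Last Write Wins: picks the copy with the highest version, or null for an empty map. */
    public static Versioned resolve(Map<String, Versioned> copies) {
        Versioned winner = null;

        for (Versioned v : copies.values()) {
            if (winner == null || v.version > winner.version)
                winner = v;
        }

        return winner;
    }

    public static void main(String[] args) {
        // Copies of one key as seen on the primary and on one backup.
        Map<String, Versioned> copies = new LinkedHashMap<>();
        copies.put("primary", new Versioned("v2", 2));
        copies.put("backup-1", new Versioned("v1", 1));

        if (!isConsistent(copies)) {
            Versioned fix = resolve(copies);

            // Overwrite every copy with the winning value.
            copies.replaceAll((node, old) -> fix);
        }

        System.out.println(isConsistent(copies)); // prints true
    }
}
```

In the real feature the comparison has to happen under locks or be re-checked in a separate transaction, as the original proposal describes; the sketch leaves that out.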


Re: Consistency check and fix (review request)

2019-04-15 Thread Andrey Gura
The method name is a minor problem. I still believe that there is no need
for this proxy, because there are no guarantees of a bugless
implementation of this functionality. A better way is reaching bugless
implementation of the current functionality.


Re: Consistency check and fix (review request)

2019-04-15 Thread Anton Vinogradov
It seems we have already fixed all the bugs that prompted this feature, but
there is no guarantee we will not create new ones :)
This proxy is just a checker that consistency is OK.

>> reaching bugless implementation
Not sure it's possible. Once you have software, it contains bugs.
This proxy will tell you whether these bugs lead to inconsistency.


[jira] [Created] (IGNITE-11745) Rebalancing overwhelmingly prefers some supplier nodes to others

2019-04-15 Thread Ilya Kasnacheev (JIRA)
Ilya Kasnacheev created IGNITE-11745:


 Summary: Rebalancing overwhelmingly prefers some supplier nodes to others
 Key: IGNITE-11745
 URL: https://issues.apache.org/jira/browse/IGNITE-11745
 Project: Ignite
  Issue Type: Bug
  Components: cache
Affects Versions: 2.7
Reporter: Ilya Kasnacheev
Assignee: Ilya Kasnacheev


When a cache has backups and you add a third node to the cluster, Ignite will 
rebalance data from only a single node.
When you add an n-th node, Ignite will not rebalance from some nodes at all, and 
it will pull 10x as much data from some nodes as from others.

This is because we filter a static list of nodes by partition availability and 
then pick the first one. Overwhelmingly, that is one of the first nodes in the 
list, and nodes towards the end of the list never get to supply partitions.
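
The selection bias described above can be illustrated with a small sketch. This is not Ignite's actual rebalancing code: the class name, method names and the rotation-based alternative are all hypothetical, and the only thing taken from the description is "filter a static node list by partition availability, then pick the first one".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; not Ignite's actual rebalancing code. */
final class SupplierSelectionSketch {
    /** Behavior from the description: first node in the static list that owns the partition. */
    static String pickFirst(List<String> nodes, Map<String, Set<Integer>> owned, int part) {
        for (String node : nodes) {
            if (owned.get(node).contains(part))
                return node;
        }
        return null;
    }

    /** Assumed alternative: rotate among the owners, offset by the partition id. */
    static String pickSpread(List<String> nodes, Map<String, Set<Integer>> owned, int part) {
        List<String> owners = new ArrayList<>();
        for (String node : nodes) {
            if (owned.get(node).contains(part))
                owners.add(node);
        }
        return owners.isEmpty() ? null : owners.get(part % owners.size());
    }

    /** Counts how many partitions each node would supply under the chosen strategy. */
    static Map<String, Integer> load(List<String> nodes, Map<String, Set<Integer>> owned,
        int parts, boolean spread) {
        Map<String, Integer> cnt = new LinkedHashMap<>();
        for (String node : nodes)
            cnt.put(node, 0);
        for (int p = 0; p < parts; p++) {
            String supplier = spread ? pickSpread(nodes, owned, p) : pickFirst(nodes, owned, p);
            if (supplier != null)
                cnt.merge(supplier, 1, Integer::sum);
        }
        return cnt;
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("A", "B", "C");
        // Every node holds a copy of all 6 partitions.
        Set<Integer> all = Set.of(0, 1, 2, 3, 4, 5);
        Map<String, Set<Integer>> owned = Map.of("A", all, "B", all, "C", all);

        System.out.println(load(nodes, owned, 6, false)); // {A=6, B=0, C=0}
        System.out.println(load(nodes, owned, 6, true));  // {A=2, B=2, C=2}
    }
}
```

With every node owning every partition, the "pick first" strategy sends all demand to node A, while the rotation spreads it evenly. Whether a rotation like this is the right fix for Ignite is a separate question; it is shown only to make the skew visible.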



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11746) It is necessary to work out the ability to read basic parameters from environment variables so that administrators would not have to enter them every time

2019-04-15 Thread Sergei Ryzhov (JIRA)
Sergei Ryzhov created IGNITE-11746:
--

 Summary: It is necessary to work out the ability to read basic parameters from environment variables so that administrators would not have to enter them every time
 Key: IGNITE-11746
 URL: https://issues.apache.org/jira/browse/IGNITE-11746
 Project: Ignite
  Issue Type: Wish
Reporter: Sergei Ryzhov


We need the ability to read basic parameters from environment variables, so 
that administrators do not have to enter them every time.

Applies to:

control.sh
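
A minimal sketch of the requested lookup order, assuming an explicit command-line value should win, then an environment variable, then a built-in default. The class name, the variable name CONTROL_HOST and the default value are hypothetical and are not existing control.sh options.

```java
import java.util.Map;

/** Hypothetical sketch; the variable names and defaults are invented for illustration. */
final class EnvDefaultsSketch {
    /**
     * Resolves a parameter: an explicit command-line value wins, then the
     * environment variable, then the supplied default.
     */
    static String resolve(String cliVal, String envName, Map<String, String> env, String dflt) {
        if (cliVal != null && !cliVal.isEmpty())
            return cliVal;

        String envVal = env.get(envName);

        return envVal != null && !envVal.isEmpty() ? envVal : dflt;
    }

    public static void main(String[] args) {
        // The first argument, if present, plays the role of an explicit host option.
        String cliHost = args.length > 0 ? args[0] : null;

        String host = resolve(cliHost, "CONTROL_HOST", System.getenv(), "127.0.0.1");

        System.out.println("host=" + host);
    }
}
```

Passing the environment in as a map keeps the lookup easy to test, since environment variables are awkward to set from Java code.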



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Ignite 2.7.5 Release scope

2019-04-15 Thread Dmitriy Pavlov
Hi Igniters,

Unfortunately, Ignite C++ examples cannot be built without
https://issues.apache.org/jira/browse/IGNITE-11195

So this issue should be included in the scope.

Sincerely,
Dmitriy Pavlov

On Fri, Apr 12, 2019 at 1:53 PM Dmitriy Pavlov wrote:

> Hi Igniters,
>
> A bit of an update: the last change fixing GridUnsafe for Java 12 was merged to
> master and to 2.7.5.
>
> The overnight run contains a number of failures considered blockers by the
> Apache Ignite Teamcity Bot (compared to 2.7):
> - Platform .NET (Inspections) always failed - Critical F.R.: 100,0%
> - Platform C++ (Linux Clang) & Platform C++ (Linux)* failed in most cases -
>   Critical F.R.: 60,0%
> - Disk Page Compressions - Module is N/A in 2.7.5
> - [Inspections] Core - always failed
> - Hibernate 5.3 - Module is N/A in 2.7.5
> - Platform C++ (Win x64 | Release) - also failed in 2.7
>
> And 2 more test failures are considered blockers (they are now in re-run
> state). If the re-run shows these tests are passing, I will proceed with RC
> preparation.
>
> Sincerely,
> Dmitriy Pavlov
>
> On Tue, Apr 9, 2019 at 10:03 AM Dmitriy Pavlov wrote:
>
>> Hi, Nikolay,
>>
>> thanks for offering help.
>>
>> Performance testing of new fix seems to be almost done.
>>
>> After merging of the last ticket (I hope it will be tomorrow) I will
>> continue with RC building. There can be some issues related to scripts.
>>
>> Sincerely,
>> Dmitriy Pavlov
>>
>> On Tue, Apr 9, 2019 at 9:53 AM Nikolay Izhikov wrote:
>>
>>> Hello, Dmitriy.
>>>
>>> Any news about release?
>>> Do you need assistance with it?
>>>
>>> On Tue, Apr 2, 2019 at 8:04 PM Dmitriy Pavlov wrote:
>>>
>>> > Ivan P., it seems the netty approach you've proposed works well. Thank
>>> > you.
>>> >
>>> > Igniters, please take a look at the following fix:
>>> > https://github.com/apache/ignite/pull/6384
>>> > It allows us to start under Java 12 and under Java 11- (as is).
>>> >
>>> > Sincerely,
>>> > Dmitriy Pavlov
>>> >
>>> > On Fri, Mar 29, 2019 at 10:57 PM Dmitriy Pavlov wrote:
>>> >
>>> > > Denis, the issue here is that we don't know for sure. We see just one
>>> > > blocking issue with accessing NioAccessObject.
>>> > >
>>> > > And there are 3 different scenarios related to the fix for this issue:
>>> > > - the fixes won't help, and we should find other options for creating
>>> > > a direct buffer from a pointer (needed for durable memory); Java 12 goes
>>> > > to a later release.
>>> > > - some fix helps, but other issues come up; Java 12 goes to some later
>>> > > release
>>> > > - some of the proposed fixes work, and nothing else needs to be done - 1-2
>>> > > days
>>> > >
>>> > > If it is the last scenario, I would include as much as we can (1-2 extra
>>> > > days are comparable with the minimal voting time).
>>> > >
>>> > > BTW, I've checked the scripts; they do not work for me. I will ask Andrey
>>> > > G/Peter for advice on Monday.
>>> > >
>>> > > Sincerely,
>>> > > Dmitriy Pavlov
>>> > >
>>> > >
>>> > > On Fri, Mar 29, 2019 at 8:47 PM Denis Magda wrote:
>>> > >
>>> > >> Folks,
>>> > >>
>>> > >> What are the efforts to support Java 12? Let's do 2.7.6 shortly if the
>>> > >> fixes are time-consuming.
>>> > >>
>>> > >> -
>>> > >> Denis
>>> > >>
>>> > >>
>>> > >> On Fri, Mar 29, 2019 at 10:08 AM Dmitriy Pavlov wrote:
>>> > >>
>>> > >> > Hi Igniters,
>>> > >> >
>>> > >> > I would like to announce code freeze for 2.7.5. Only one open ticket
>>> > >> > is there (reopened): https://issues.apache.org/jira/browse/IGNITE-11600
>>> > >> > (if we can't start using Java 12 we should clearly state it in
>>> > >> > scripts/code).
>>> > >> >
>>> > >> > We're entering the Stabilization phase for
>>> > >> > https://cwiki.apache.org/confluence/display/IGNITE/Apache+Ignite+2.7.5
>>> > >> > and only blockers may be included in the scope. See
>>> > >> > https://cwiki.apache.org/confluence/display/IGNITE/Release+Process
>>> > >> > for more details.
>>> > >> >
>>> > >> > Sincerely,
>>> > >> > Dmitriy Pavlov
>>> > >> >
>>> > >> > On Fri, Mar 29, 2019 at 1:51 PM Dmitriy Pavlov wrote:
>>> > >> >
>>> > >> > > Hi Denis,
>>> > >> > >
>>> > >> > > I'm not talking about months. In this discussion, Andrey and Ivan
>>> > >> > > proposed a couple of fixes that may help.
>>> > >> > >
>>> > >> > > It will require a day or two to check whether they help. If they do
>>> > >> > > not, then we should modify the startup scripts to say clearly that
>>> > >> > > Java 12 is not supported.
>>> > >> > >
>>> > >> > > Now, under Java 12, Ignite suggests setting startup parameters, but
>>> > >> > > even with correct parameters it fails and says please set parameters.
>>> > >> > > This is totally unclear for the end user.
>>> > >> > >
>>> > >> > > I've reopened https://issues.apache.org/jira/browse/IGNITE-11600
>>> > >> > >
>>> > >> > > Sincerely,
>>> > >> > > Dmitriy Pavlov
>>> > >> > >
>>> > >> > > On Thu, Mar 28, 2019 at 6:51 PM Denis Magda wrote:
>>> > >> > >
>>> > >> > >> If the failure handler improvements will lower down a n

[jira] [Created] (IGNITE-11747) Document --tx control script commands

2019-04-15 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-11747:
---

 Summary: Document --tx control script commands
 Key: IGNITE-11747
 URL: https://issues.apache.org/jira/browse/IGNITE-11747
 Project: Ignite
  Issue Type: Task
  Components: documentation
Reporter: Ivan Rakov


Along with consistency check utilities, the ./control.sh script has a --tx command 
that displays info about active transactions and can even kill hanging 
transactions directly.

./control.sh provides only a brief description of the options:
{code:java}
List or kill transactions:
control.sh --tx [--xid XID] [--min-duration SECONDS] [--min-size SIZE] [--label 
PATTERN_REGEX] [--servers|--clients] [--nodes 
consistentId1[,consistentId2,,consistentIdN]] [--limit NUMBER] [--order 
DURATION|SIZE|START_TIME] [--kill] [--info] [--yes]
{code}
We should document possible use cases and options of the command, possibly 
somewhere close to [https://apacheignite-tools.readme.io/docs/control-script]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Consistency check and fix (review request)

2019-04-15 Thread Andrey Gura
Anton,

I'm trying to tell you that this proxy can produce false positive results,
incorrect results, and just hide bugs. What will the next solution be?
A withNoBugs proxy?

You can perform a consistency check using the idle verify utility. A recovery
tool is a good idea, but the user should trigger this process, not some cache
proxy implementation.

On Mon, Apr 15, 2019 at 5:34 PM Anton Vinogradov  wrote:
>
> It seems we have already fixed all the bugs that led to this feature, but
> there is no guarantee we will not create new ones :)
> This proxy is just a checker that consistency is ok.
>
> >> reaching bugless implementation
> Not sure it's possible. Once you have software it contains bugs.
> This proxy will tell you whether these bugs lead to inconsistency.
>
> On Mon, Apr 15, 2019 at 5:19 PM Andrey Gura  wrote:
>
> > The method name is a minor problem. I still believe that there is no need
> > for this proxy because there are no guarantees that the implementation of
> > this functionality is bugless. A better way is reaching bugless
> > implementation of the current functionality.
> >
> > On Mon, Apr 15, 2019 at 4:51 PM Anton Vinogradov  wrote:
> > >
> > > Andrey,
> > >
> > > >> It means also that at least method name is bad.
> > > Agreed, already discussed with Aleksey Plekhanov.
> > > Decided that ".withConsistencyCheck()" is a proper name.
> > >
> > > >> What is the profit?
> > > > This proxy allows checking for (and fixing) any consistency violation
> > > > across the topology.
> > > > The proxy will check that all backups contain the same values as the
> > > > primary.
> > > > So, when it's possible (you're ready to spend resources on this check),
> > > > you will be able to read-with-consistency-check.
> > > > This will decrease the number of "inconsistency caused
> > > > war/strikes/devastation" situations, which is important for financial
> > > > systems.
> > >
> > > On Mon, Apr 15, 2019 at 3:58 PM Andrey Gura  wrote:
> > >
> > > > Anton,
> > > >
> > > > what does expression "withConsistency" mean? From user's standpoint it
> > > > means that all operations performed without this proxy are not
> > > > consistent. It means also that at least method name is bad.
> > > >
> > > > Are there any guarantees that withConsistency proxy will not contain
> > > > bugs that will lead to inconsistent write after inconsistency was
> > > > found? I think there are no such guarantees. Bugs still are possible.
> > > > > So I always must use the withConsistency proxy because I don't have
> > > > > any other choice - all ways are unreliable and withConsistency just
> > > > > sounds better.
> > > >
> > > > Eventually we will have two different ways for working with cache
> > > > values with different bugs set. What is the profit?
> > > >
> > > >
> > > >
> > > > On Fri, Apr 12, 2019 at 2:49 PM Anton Vinogradov 
> > wrote:
> > > > >
> > > > > Folks,
> > > > >
> > > > > > I've checked the tx benchmarks and found no performance drop.
> > > > > > Also, I see no issues in the TC results.
> > > > > > So it seems the code is ready to be merged.
> > > > >
> > > > > Everyone interested, please share any objections about
> > > > > - public API
> > > > > - test coverage
> > > > > - implementation approach
> > > > >
> > > > > On Wed, Apr 3, 2019 at 5:46 PM Anton Vinogradov 
> > wrote:
> > > > >
> > > > > > Nikolay,
> > > > > >
> > > > > > > This is not a PoC, but the final solution (I hope so :) ) that
> > > > > > > requires review.
> > > > > > > LWW means Last Write Wins; a detailed explanation can be found in
> > > > > > > IEP-31.
> > > > > >
> > > > > > On Wed, Apr 3, 2019 at 5:24 PM Nikolay Izhikov <
> > nizhi...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > >> Hello, Anton.
> > > > > >>
> > > > > >> Thanks for the PoC.
> > > > > >>
> > > > > >> > finds correct values according to LWW strategy
> > > > > >>
> > > > > >> Can you, please, clarify what is LWW strategy?
> > > > > >>
> > > > > > >> On Wed, 03/04/2019 at 17:19 +0300, Anton Vinogradov wrote:
> > > > > >> > Ilya,
> > > > > >> >
> > > > > >> > This is impossible due to a conflict between some isolation
> > levels
> > > > and
> > > > > >> > get-with-consistency expectations.
> > > > > >> > Basically, it's impossible to perform get-with-consistency
> > after the
> > > > > >> other
> > > > > >> > get at !READ_COMMITTED transaction.
> > > > > >> > The problem here is that value should be cached according to the
> > > > > >> isolation
> > > > > >> > level, so get-with-consistency is restricted in this case.
> > > > > >> > Same problem we have at case get-with-consistency after put, so
> > we
> > > > have
> > > > > >> > restriction here too.
> > > > > > >> > So, the order matters. :)
> > > > > >> >
> > > > > >> > See OperationRestrictionsCacheConsistencyTest [1] for details.
> > > > > >> >
> > > > > >> > [1]
> > > > > >> >
> > > > > >>
> > > >
> > https://github.com/apache/ignite/blob/8b0b0c3e1bde93ff9c4eb5667d794dd64a8b06f0/modules/core/src/test/java/org/apache/ignite/internal/processors/cache/consistency/OperationRestrictionsCacheConsistencyTest.java
> > > > > >> >
> > > > > >> > On Wed, Apr 3, 2019 at 4:54 

[jira] [Created] (IGNITE-11748) Node.js thin: auto-generated documentation stored in Git

2019-04-15 Thread Igor Sapego (JIRA)
Igor Sapego created IGNITE-11748:


 Summary: Node.js thin: auto-generated documentation stored in Git
 Key: IGNITE-11748
 URL: https://issues.apache.org/jira/browse/IGNITE-11748
 Project: Ignite
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.7
Reporter: Igor Sapego
 Fix For: 2.8


Currently, auto-generated documentation is stored in Git in 
https://github.com/apache/ignite/tree/master/modules/platforms/nodejs/api_spec

Only the conf.json file should be stored in Git. An Apache license header also 
needs to be added to it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: GridDhtInvalidPartitionException takes the cluster down

2019-04-15 Thread Denis Magda
Alright, it took me longer than expected to get back and look into it. Sorry
for the delay.
Overall, folks, things look creepy, seriously. I see 3 primary issues
ranked by priority.

1st, until the failure handler gets smart enough to deal with
SYSTEM_WORKER_BLOCKED/SYSTEM_CRITICAL_OPERATION_TIMEOUT events we have to
avoid false-positives and print out a warning message instead of stopping a
node. *Andrey*, that's the new behavior of the 2.7.5 release according to JIRA,
right?

2nd, the format of the warning/exception message gives neither hints for
troubleshooting nor a clue as to why this happened. I have no idea what to
suggest to those who see exceptions of this kind [1] and have to call for
help from Andrey and other committers. For instance, take [1] as a
reference:

Critical system error detected. Will be handled accordingly to
configured handler
[hnd=StopNodeOrHaltFailureHandler [*tryStop*=false, *timeout*=0,
super=AbstractFailureHandler
[*ignoredFailureTypes*=[SYSTEM_WORKER_BLOCKED]]],
failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class
o.a.i.IgniteException:
GridWorker [name=grid-timeout-worker,
igniteInstanceName=TravelInventoryTesting,
finished=false, *heartbeatTs*=1553481506244]]] class
org.apache.ignite.IgniteException: GridWorker [name=grid-timeout-worker,
igniteInstanceName=TravelInventoryTesting,
finished=false, *heartbeatTs*=1553481506244]

A lot of the details might be hidden but, unfortunately, the
interpretation of parameters like heartbeatTs, tryStop, finished, timeout,
etc. is hard. It seems like a message that has to be fed into a complementary
tool which will give me an answer. The format of the message has to help
the user (developer/devops/administrator/architect who has zero affiliation
with the Ignite community) with troubleshooting without calling for help on
the user list:

   - What happened - out of memory/critical error/hanging threads. We're
   already pretty good at that.
   - Why this happened - supply context in human language. For instance,
   "discovery thread was not responding within N seconds because of starvation
   or long GC pause."
   - Troubleshooting guidance - help the user work around the issue. For
   instance, "Check your GC logs, ensure that compute tasks are not
   oversaturating CPUs causing livelocks. Tune parameter Y and Z."
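
To make the three parts concrete, here is a minimal sketch of how such a message could be assembled. The class and method names are hypothetical and nothing here is an existing Ignite API; the sample strings are taken from the examples in the list above.

```java
import java.util.List;

/** Hypothetical sketch of a three-part diagnostic message; not an Ignite API. */
final class FailureMessageSketch {
    /** Builds a message with the three parts: what happened, why, and what to do. */
    static String format(String what, String why, List<String> hints) {
        StringBuilder sb = new StringBuilder();

        sb.append("What happened: ").append(what).append('\n');
        sb.append("Why: ").append(why).append('\n');
        sb.append("Troubleshooting:");

        for (String hint : hints)
            sb.append("\n  - ").append(hint);

        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(format(
            "hanging threads",
            "discovery thread was not responding within N seconds because of starvation or long GC pause",
            List.of(
                "Check your GC logs",
                "Ensure that compute tasks are not oversaturating CPUs causing livelocks",
                "Tune parameter Y and Z")));
    }
}
```

Keeping the "why" and the hints as separate inputs means each failure type can supply its own context instead of dumping internal field values.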

Do you see anything else? Let's design and enhance.

3rd, full cluster shutdown. Agreed, that's harder. Do we have stats on when it
usually happens?


[1]
http://apache-ignite-users.70518.x6.nabble.com/Replace-or-Put-after-PutAsync-causes-Ignite-to-hang-td27871.html#a27873

-
Denis


On Sat, Apr 6, 2019 at 11:37 AM Me via Boomerang 
wrote:

>