[jira] [Created] (IGNITE-11749) Implement automatic pages history dump on CorruptedTreeException
Alexey Goncharuk created IGNITE-11749: - Summary: Implement automatic pages history dump on CorruptedTreeException Key: IGNITE-11749 URL: https://issues.apache.org/jira/browse/IGNITE-11749 Project: Ignite Issue Type: Improvement Reporter: Alexey Goncharuk

Currently, the only way to debug possible bugs in checkpointer/recovery mechanics is to manually parse WAL files after the corruption has happened. This is not practical for several reasons. First, it requires manual actions which depend on the content of the exception. Second, it is not always possible to obtain WAL files (they may contain sensitive data).

We need to add a mechanism that will dump all the information required for a primary analysis of the corruption to the exception handler. For example, if an exception happened when materializing a link {{0xabcd}} written on an index page {{0xdcba}}, we need to dump the change history of both pages and the checkpoint records on the analysis interval. Possibly, we should also include the FreeList pages in which the aforementioned pages were included.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
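A rough sketch of what such an in-memory page-change history could look like. All names below are hypothetical illustrations, not actual Ignite internals: a bounded ring buffer of recent page modifications that can be filtered by page id when a CorruptedTreeException is handled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical tracker: keeps a bounded history of recent page changes for dump on corruption. */
public class PageHistoryTracker {
    /** One recorded page modification. */
    public static class Change {
        public final long pageId;
        public final String op;
        public final long walPtr;

        Change(long pageId, String op, long walPtr) {
            this.pageId = pageId;
            this.op = op;
            this.walPtr = walPtr;
        }
    }

    private final int capacity;
    private final Deque<Change> history = new ArrayDeque<>();

    public PageHistoryTracker(int capacity) {
        this.capacity = capacity;
    }

    /** Records a change, evicting the oldest entry when the buffer is full. */
    public synchronized void record(long pageId, String op, long walPtr) {
        if (history.size() == capacity)
            history.removeFirst();

        history.addLast(new Change(pageId, op, walPtr));
    }

    /** Collects the recorded history of the pages involved in a corruption. */
    public synchronized List<Change> dump(long... pageIds) {
        List<Change> res = new ArrayList<>();

        for (Change c : history)
            for (long id : pageIds)
                if (c.pageId == id)
                    res.add(c);

        return res;
    }
}
```

On a real node the entries would come from WAL records rather than be recorded separately; the sketch only illustrates the dump-by-page-id shape of the feature.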
[jira] [Created] (IGNITE-11750) Implement locked pages info for long-running B+Tree operations
Alexey Goncharuk created IGNITE-11750: - Summary: Implement locked pages info for long-running B+Tree operations Key: IGNITE-11750 URL: https://issues.apache.org/jira/browse/IGNITE-11750 Project: Ignite Issue Type: Improvement Reporter: Alexey Goncharuk

I've stumbled upon an incident where a batch of Ignite threads were hanging in BPlusTree operations, trying to acquire read or write locks on pages. From the thread dump it is impossible to tell whether there is an issue with {{OffheapReadWriteLock}} or a subtle deadlock in the tree.

I suggest we implement a timeout for page lock acquisition and tracking of locked pages. This should be relatively easy to implement in {{PageHandler}} (the only thing to consider is performance degradation). If a timeout occurs, we should print all the locks currently owned by the thread. This way we should be able to determine whether there is a deadlock in the {{BPlusTree}}.
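A minimal sketch of the suggested approach, with hypothetical names (this is not the actual {{PageHandler}} code): acquire page locks with a timeout and keep a per-thread set of held pages, printing it when the timeout fires.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical wrapper: per-page read/write locks with acquire timeout and held-lock tracking. */
public class TrackedPageLocks {
    private final ConcurrentHashMap<Long, ReentrantReadWriteLock> locks = new ConcurrentHashMap<>();

    /** Pages currently locked by each thread, in acquisition order. */
    private static final ThreadLocal<Set<Long>> HELD = ThreadLocal.withInitial(LinkedHashSet::new);

    /** Tries the write lock; on timeout dumps the pages this thread already holds. */
    public boolean writeLock(long pageId, long timeoutMs) {
        ReentrantReadWriteLock l = locks.computeIfAbsent(pageId, id -> new ReentrantReadWriteLock());

        try {
            if (l.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                HELD.get().add(pageId);
                return true;
            }
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }

        // Timeout: print everything this thread holds to help spot a deadlock cycle.
        System.err.println("Page lock timeout on 0x" + Long.toHexString(pageId)
            + ", locks held by " + Thread.currentThread().getName() + ": " + HELD.get());

        return false;
    }

    public void writeUnlock(long pageId) {
        HELD.get().remove(pageId);
        locks.get(pageId).writeLock().unlock();
    }

    public static Set<Long> heldByCurrentThread() {
        return HELD.get();
    }
}
```

The tracking cost here is one `ThreadLocal` set update per acquire/release, which is the kind of overhead the issue text says needs to be measured.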
[jira] [Created] (IGNITE-11751) Javadoc broken
Peter Ivanov created IGNITE-11751: - Summary: Javadoc broken Key: IGNITE-11751 URL: https://issues.apache.org/jira/browse/IGNITE-11751 Project: Ignite Issue Type: Task Reporter: Peter Ivanov Fix For: 2.8
{code}
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-javadoc-plugin:3.0.0:aggregate (core-javadoc) on project apache-ignite: An error has occurred in Javadoc report generation:
[ERROR] Exit code: 1 - ignite/modules/cassandra/store/src/main/java/org/apache/ignite/cache/store/cassandra/serializer/package-info.java:21: warning: a package-info.java file has already been seen for package org.apache.ignite.cache.store.cassandra.serializer
[ERROR] package org.apache.ignite.cache.store.cassandra.serializer;
[ERROR] ^
[ERROR] javadoc: warning - Multiple sources of package comments found for package "org.apache.ignite.cache.store.cassandra.serializer"
[ERROR] javadoc: error - Error - Exception java.lang.ClassNotFoundException thrown while trying to register Taglet org.apache.ignite.tools.javadoc.IgniteLinkTaglet...
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/Ignition.java:88: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/Ignition.java:88: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/Ignition.java:88: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/Ignition.java:88: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/configuration/IgniteConfiguration.java:828: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/configuration/IgniteConfiguration.java:828: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/cache/store/CacheStore.java:71: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/cache/store/CacheStore.java:71: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/cache/store/CacheStoreSessionListener.java:114: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/cache/store/CacheStoreSessionListener.java:114: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/cache/store/CacheStoreSessionListener.java:114: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/cache/store/CacheStoreSessionListener.java:114: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/transactions/Transaction.java:120: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/transactions/Transaction.java:120: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/spi/checkpoint/CheckpointSpi.java:60: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/spi/checkpoint/CheckpointSpi.java:60: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/spi/discovery/tcp/TcpDiscoverySpi.java:233: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/spi/discovery/tcp/TcpDiscoverySpi.java:233: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/spi/deployment/DeploymentSpi.java:61: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/spi/deployment/DeploymentSpi.java:61: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/compute/gridify/GridifySetToSet.java:154: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/compute/gridify/GridifySetToSet.java:154: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/compute/gridify/GridifySetToValue.java:152: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/compute/gridify/GridifySetToValue.java:152: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/spring/src/main/java/org/apache/ignite/cache/spring/SpringCacheManager.java:145: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/spring/src/main/java/org/apache/ignite/cache/spring/SpringCacheManager.java:145: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/spring/src/main/java/org/apache/ignite/transactions/spring/SpringTransactionManager.java:
[jira] [Created] (IGNITE-11752) Refactor usages of "System.getenv(key)" to IgniteSystemProperties.getString(key)
Alexey Kuznetsov created IGNITE-11752: - Summary: Refactor usages of "System.getenv(key)" to IgniteSystemProperties.getString(key) Key: IGNITE-11752 URL: https://issues.apache.org/jira/browse/IGNITE-11752 Project: Ignite Issue Type: Improvement Components: general Reporter: Alexey Kuznetsov Assignee: Alexey Kuznetsov

IgniteSystemProperties.getString(key) is implemented as:
1. Try to get the property from System.properties.
2. If not found, fall back to System.getenv.

In Java you can easily override system properties from code (for testing purposes, for example), but it is almost impossible to do the same for environment variables.
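The lookup order described above boils down to a sketch like this (a simplified stand-in, not the real {{IgniteSystemProperties}} source):

```java
/** Simplified sketch of the system-property-first, environment-second lookup. */
public class SysProps {
    /** Returns the JVM system property if set, otherwise falls back to the environment variable. */
    public static String getString(String name) {
        String v = System.getProperty(name);

        return v != null ? v : System.getenv(name);
    }
}
```

Because `System.setProperty` takes precedence, a test can override any lookup from code without touching the process environment, which is exactly the motivation stated in the issue.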
Re: Consistency check and fix (review request)
Anton,

Thank you for your efforts on improving the consistency guarantees provided by Ignite.

The subject sounds really vital. Could you please elaborate on why it comes as an on-demand enabled proxy and not as a mode enabled by some configuration property (or even as the default behavior)? How do you see the future development of such consistency checks? As for me, it would be great if we could improve the consistency guarantees provided by default.

Also, thinking aloud a bit:
1. It sounds suspicious that reads can cause writes (unexpected deadlocks might be possible).
2. I do not believe that it is possible to implement a (bugless?) feature which will fix other bugs.
3. A storage (or database) product's (Ignite in our case) consistency is not equal to a user application's consistency. So, it might be that the introduced checks are insufficient to make business applications happy.

Mon, 15 Apr 2019 at 19:27, Andrey Gura: > > Anton, > > I'm trying tell you that this proxy can produce false positive result, > incorrect result and just hide bugs. What will the next solution? > withNoBugs proxy? > > You can perform consistency check using idle verify utility. Recovery > tool is good idea but user should trigger this process, not some cache > proxy implementation. > > On Mon, Apr 15, 2019 at 5:34 PM Anton Vinogradov wrote: > > > > Seems, we already fixed all bugs caused this feature, but there is no > > warranty we will not create new :) > > This proxy is just checker that consistency is ok. > > > > >> reaching bugless implementation > > Not sure it's possible. Once you have software it contains bugs. > > This proxy will tell you whether these bugs lead to inconsistency. > > > > On Mon, Apr 15, 2019 at 5:19 PM Andrey Gura wrote: > > > > > Method name is minor problem. I still believe that there is no need > > > for this proxy because there are no any guarantees about bugless > > > implementation this functionality. Better way is reaching bugless > > > implementation of current functionality. 
> > > > > > On Mon, Apr 15, 2019 at 4:51 PM Anton Vinogradov wrote: > > > > > > > > Andrey, > > > > > > > > >> It means also that at least method name is bad. > > > > Agreed, already discussed with Aleksey Plekhanov. > > > > Decided that ".withConsistencyCheck()" is a proper name. > > > > > > > > >> What is the profit? > > > > This proxy allows to check (and fix) is there any consistency violation > > > > across the topology. > > > > The proxy will check all backups contain the same values as primary. > > > > So, when it's possible (you're ready to spend resources for this check) > > > you > > > > will be able to read-with-consistency-check. > > > > This will decrease the amount of "inconsistency caused > > > > war/strikes/devastation" situations, which is important for financial > > > > systems. > > > > > > > > On Mon, Apr 15, 2019 at 3:58 PM Andrey Gura wrote: > > > > > > > > > Anton, > > > > > > > > > > what does expression "withConsistency" mean? From user's standpoint it > > > > > means that all operations performed without this proxy are not > > > > > consistent. It means also that at least method name is bad. > > > > > > > > > > Are there any guarantees that withConsistency proxy will not contain > > > > > bugs that will lead to inconsistent write after inconsistency was > > > > > found? I think there are no such guarantees. Bugs still are possible. > > > > > So I always must use withConsistency proxy because I doesn't have > > > > > other choice - all ways are unreliable and withConsistency just sounds > > > > > better. > > > > > > > > > > Eventually we will have two different ways for working with cache > > > > > values with different bugs set. What is the profit? > > > > > > > > > > > > > > > > > > > > On Fri, Apr 12, 2019 at 2:49 PM Anton Vinogradov > > > wrote: > > > > > > > > > > > > Folks, > > > > > > > > > > > > I've checked the tx benchmarks and found no performance drop. > > > > > > Also, see no issues at TC results. 
> > > > > > So, seems, code ready to be merged. > > > > > > > > > > > > Everyone interested, please share any objections about > > > > > > - public API > > > > > > - test coverage > > > > > > - implementation approach > > > > > > > > > > > > On Wed, Apr 3, 2019 at 5:46 PM Anton Vinogradov > > > wrote: > > > > > > > > > > > > > Nikolay, > > > > > > > > > > > > > > This is not a PoC, but the final solution (I hope so:) ) required > > > the > > > > > > > review. > > > > > > > LWW means Last Write Wins, detailed explanation can be found at > > > IEP-31. > > > > > > > > > > > > > > On Wed, Apr 3, 2019 at 5:24 PM Nikolay Izhikov < > > > nizhi...@apache.org> > > > > > > > wrote: > > > > > > > > > > > > > >> Hello, Anton. > > > > > > >> > > > > > > >> Thanks for the PoC. > > > > > > >> > > > > > > >> > finds correct values according to LWW strategy > > > > > > >> > > > > > > >> Can you, please, clarify what is LWW strategy? > > > > > > >> > > > > > > >> В Ср, 03/04/2019 в 17:19 +0300, Anton Vinogradov пишет: > > > >
[jira] [Created] (IGNITE-11753) control.sh improve error message in case of connection to secured cluster without credentials.
Sergey Antonov created IGNITE-11753: --- Summary: control.sh improve error message in case of connection to secured cluster without credentials. Key: IGNITE-11753 URL: https://issues.apache.org/jira/browse/IGNITE-11753 Project: Ignite Issue Type: Improvement Reporter: Sergey Antonov

If control.sh tries to connect to a secured cluster without login/password, we currently get:
{noformat}
./control.sh --state
Failed to get cluster state.
Authentication error, try connection again.
user:
{noformat}
We should print information about the attempt to connect to a secured cluster and request the login/password if they are not set, i.e.:
{noformat}
./control.sh --state
Failed to get cluster state.
Cluster requires authentication.
user:
{noformat}
[jira] [Created] (IGNITE-11754) Memory leak on the GridCacheTxFinishSync#threadMap
Taras Ledkov created IGNITE-11754: - Summary: Memory leak on the GridCacheTxFinishSync#threadMap Key: IGNITE-11754 URL: https://issues.apache.org/jira/browse/IGNITE-11754 Project: Ignite Issue Type: Bug Components: general, mvcc Affects Versions: 2.7 Reporter: Taras Ledkov Fix For: 2.8

The {{GridCacheTxFinishSync#threadMap}} is not cleared when a tx thread terminates, so a memory leak happens when transactions are executed in newly started and then stopped threads.
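The leak pattern, and one possible cleanup, can be sketched like this. The class below is illustrative only, not the real {{GridCacheTxFinishSync}}:

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative sketch of per-thread state that is never removed when the thread dies. */
public class ThreadMapLeakSketch {
    /** Per-thread tx finish state, keyed by thread id; entries outlive their threads. */
    private final Map<Long, Object> threadMap = new ConcurrentHashMap<>();

    /** Called on tx finish: the entry stays in the map even after the thread terminates. */
    public void onTxFinish() {
        threadMap.computeIfAbsent(Thread.currentThread().getId(), id -> new Object());
    }

    /** One possible fix: periodically drop entries whose threads are no longer alive. */
    public void purgeDead(Set<Long> aliveThreadIds) {
        threadMap.keySet().removeIf(id -> !aliveThreadIds.contains(id));
    }

    public int size() {
        return threadMap.size();
    }
}
```

With short-lived threads (the scenario in the issue) the map grows by one entry per thread and never shrinks unless something like `purgeDead` runs.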
Re: New Committer: Vyacheslav Daradur
Thank you! I'm glad to contribute to the development of the project. On Fri, Apr 12, 2019 at 1:14 PM Denis Mekhanikov wrote: > > Well done Slava! > > It was great working with you on the service grid redesign. > Looking forward to seeing new commits from you! > > Denis > > Thu, 11 Apr 2019 at 18:27, Denis Magda: > > > Well deserved, Vyacheslav! Thanks for hardening Service Grid pushing it to > > a completely next level! > > > > - > > Denis > > > > > > On Thu, Apr 11, 2019 at 7:00 AM Dmitriy Pavlov wrote: > > > > > Dear Ignite Developers, > > > > > > The Project Management Committee (PMC) for Apache Ignite has invited > > > Vyacheslav Daradur to become a committer and we are pleased to announce > > > that he has accepted. Apache Ignite PMC appreciates Vyacheslav’s > > > contribution to service grid redesign (is was collaborative efforts. BTW, > > > thanks to everyone involved), compatibility test framework, contribution > > to > > > community development, and to abbreviation plugin. > > > > > > Being a committer enables easier contribution to the project since there > > is > > > no need to go via the patch submission process. This should enable better > > > productivity. > > > > > > Please join me in welcoming Vyacheslav, and congratulating him on the new > > > role in the Apache Ignite Community. > > > > > > Best Regards, > > > Dmitriy Pavlov > > > on behalf of the Apache Ignite PMC > > > > > -- Best Regards, Vyacheslav D.
[jira] [Created] (IGNITE-11755) Memory leak H2 connections at the ConnectionManager#detachedConns
Taras Ledkov created IGNITE-11755: - Summary: Memory leak H2 connections at the ConnectionManager#detachedConns Key: IGNITE-11755 URL: https://issues.apache.org/jira/browse/IGNITE-11755 Project: Ignite Issue Type: Bug Components: sql Affects Versions: 2.7 Reporter: Taras Ledkov Assignee: Taras Ledkov Fix For: 2.8

{{ConnectionManager#detachedConns}} leaks on MVCC transactional SELECTs. To reproduce:
1. CREATE TABLE with MVCC enabled.
2. Run SELECTs.
3. Execute each query on a new JDBC thin connection; the connection is closed after the query.
Re: Consistency check and fix (review request)
Andrey, thanks for the tips.

>> You can perform consistency check using idle verify utility.
Could you please point me to the utility's page?
According to its name, it requires stopping the cluster to perform the check?
That's impossible in real production, where downtime should be less than a few minutes per year.
So, the only case I see is to use an online check during periods of moderate activity.

>> Recovery tool is good idea
This tool is a part of my IEP. But a recovery tool (process)
- will allow you to check entries in memory only (otherwise you will warm up the cluster incorrectly), which is a problem when you have a persisted/in_memory ratio > 10:1,
- will cause a latency drop for some (e.g. 90+ percentile) requests, which is not acceptable in real production with a strict SLA,
- will not guarantee that each operation will use consistent data, which is sometimes extremely essential.
So, the process is a cool idea, but sometimes you may need more.

Ivan, thanks for the analysis.

>> why it comes as an on-demand enabled proxy but not as a mode enabled by some configuration property
It's a bad idea to have this feature permanently enabled; it slows down the system by design.
The customer should be able to change the strategy on the fly according to time periods or load.
Also, we're going to use this proxy for odd requests, or for every 5th, 10th, or 100th request, depending on the load/time/SLA/etc.
The goal is to perform as many gets-with-consistency operations as possible without stopping the cluster and never find a problem :)

>> As for me it will be great if we can improve consistency guarantees provided by default.
Once you check backups, you decrease throughput and increase latency.
This feature is required only for some financial, nuclear, or health systems where you should be additionally sure about consistency.
It's like
- reading from backups,
- data modification outside the transaction,
- using FULL_ASYNC instead of FULL_SYNC;
sometimes it's possible, sometimes not.

>> 1.
It sounds suspicious that reads can cause writes (unexpected deadlocks might be possible).
The code performs writes
- per key, in an additional transaction, in case the original tx was OPTIMISTIC || READ_COMMITTED,
- for all keys in the same tx in case the original tx was PESSIMISTIC && !READ_COMMITTED, since you have already obtained the locks,
so a deadlock should be impossible.

>> 2. I do not believe that it is possible to implement a (bugless?) feature which will fix other bugs.
It does not fix the bugs; it looks for inconsistency (no matter how it happened) and reports it using events (the previous state and how it was fixed).
This allows processing to continue for all entries, even inconsistent ones.
But each such fix should be rechecked manually, for sure.

On Tue, Apr 16, 2019 at 11:39 AM Павлухин Иван wrote: > Anton, > > Thank you for your effort for improving consistency guarantees > provided by Ignite. > > The subject sounds really vital. Could you please elaborate why it > comes as an on-demand enabled proxy but not as a mode enabled by some > configuration property (or even as a default behavior)? How do you see > the future development of such consistency checks? As for me it will > be great if we can improve consistency guarantees provided by default. > > Also thinking loud a bit: > 1. It sounds suspicious that reads can cause writes (unexpected > deadlocks might be possible). > 2. I do not believe that it is possible to implement a (bugless?) > feature which will fix other bugs. > 3. A storage (or database) product (Ignite in our case) consistency is > not equal to a user application consistency. So, it might be that > introduced checks are insufficient to make business applications > happy. > > Mon, 15 Apr 2019 at 19:27, Andrey Gura: > > > > Anton, > > > > I'm trying tell you that this proxy can produce false positive result, > > incorrect result and just hide bugs. What will the next solution? > > withNoBugs proxy? > > > > You can perform consistency check using idle verify utility. 
Recovery > > tool is good idea but user should trigger this process, not some cache > > proxy implementation. > > > > On Mon, Apr 15, 2019 at 5:34 PM Anton Vinogradov wrote: > > > > > > Seems, we already fixed all bugs caused this feature, but there is no > > > warranty we will not create new :) > > > This proxy is just checker that consistency is ok. > > > > > > >> reaching bugless implementation > > > Not sure it's possible. Once you have software it contains bugs. > > > This proxy will tell you whether these bugs lead to inconsistency. > > > > > > On Mon, Apr 15, 2019 at 5:19 PM Andrey Gura wrote: > > > > > > > Method name is minor problem. I still believe that there is no need > > > > for this proxy because there are no any guarantees about bugless > > > > implementation this functionality. Better way is reaching bugless > > > > implementation of current functionality. > > > > > > > > On Mon, Apr 15, 2019 at 4:51 PM Anton Vinogradov > wrote: > > > > > > > > > > Andrey, > > > > > > > > > > >>
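For reference, the check-and-fix semantics discussed in this thread (compare all copies of a key across the topology, repair using last-write-wins) can be sketched roughly as follows. The types and names are hypothetical, not the proposed Ignite API:

```java
import java.util.Map;

/** Hypothetical last-write-wins resolution across per-node copies of a cache entry. */
public class LwwCheck {
    /** A value with its version (e.g. an update counter or timestamp). */
    public static class Versioned {
        public final String value;
        public final long version;

        public Versioned(String value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    /** Picks the copy with the highest version: the "last write wins". */
    public static Versioned resolve(Map<String, Versioned> copiesByNode) {
        Versioned best = null;

        for (Versioned v : copiesByNode.values())
            if (best == null || v.version > best.version)
                best = v;

        return best;
    }

    /** True if any node holds a copy that differs from the winner, i.e. a repair is needed. */
    public static boolean inconsistent(Map<String, Versioned> copiesByNode) {
        Versioned winner = resolve(copiesByNode);

        for (Versioned v : copiesByNode.values())
            if (v.version != winner.version || !v.value.equals(winner.value))
                return true;

        return false;
    }
}
```

The real proxy would additionally write the winning value back to the lagging nodes and fire an event with the previous state, as described in the thread; the sketch only shows the detection/resolution step.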
Re: Consistency check and fix (review request)
Hello, Anton.

> Customer should be able to change strategy on the fly according to time
> periods or load.

I think we should allow the administrator to enable/disable the consistency check.
This option shouldn't be related to application code, because "Consistency check" is some kind of maintenance procedure.

What do you think?

On Tue, 16/04/2019 at 12:47 +0300, Anton Vinogradov wrote: > Andrey, thanks for tips > > > > You can perform consistency check using idle verify utility. > > Could you please point to utility's page? > According to its name, it requires to stop the cluster to perform the check? > That's impossible at real production when you should have downtime less > that some minutes per year. > So, the only case I see is to use online check during periods of moderate > activity. > > > > Recovery tool is good idea > > This tool is a part of my IEP. > But recovery tool (process) > - will allow you to check entries in memory only (otherwise, you will warm > up the cluster incorrectly), and that's a problem when you have > persisted/in_memory rate > 10:1 > - will cause latency drop for some (eg. 90+ percentile) requests, which is > not acceptable for real production, when we have strict SLA. > - will not guarantee that each operation will use consistent data, > sometimes it's extremely essential > so, the process is a cool idea, but, sometime you may need more. > > Ivan, thanks for analysis > > > > why it comes as an on-demand enabled proxy but not as a mode enabled by > > some configuration property > It's a bad idea to have this feature permanently enabled, it slows down the > system by design. > Customer should be able to change strategy on the fly according to time > periods or load. > Also, we're going to use this proxy for odd requests or for every 5-th, > 10-th, 100-th request depends on the load/time/SLA/etc. 
> The goal is to perform as much as possible gets-with-consistency operations > without stopping the cluster and never find a problem :) > > > > As for me it will be great if we can improve consistency guarantees > > provided by default. > Once you checked backups you decreased throughput and increased latency. > This feature requred only for some financial, nuclear, health systems when > you should be additionally sure about consistency. > It's like a > - read from backups > - data modification outside the transaction > - using FULL_ASYNC instead of FULL_SYNC, > sometimes it's possible, sometimes not. > > > > 1. It sounds suspicious that reads can cause writes (unexpected > > deadlocks might be possible). > Code performs writes > - key per additional transaction in case original tx was OPTIMISTIC || > READ_COMMITTED, > - all keys per same tx in case original tx was PESSIMISTIC && > !READ_COMMITTED, since you already obtain the locks, > so, deadlock should be impossible. > > > > 2. I do not believe that it is possible to implement a (bugless?) > > feature which will fix other bugs. > It does not fix the bugs, it looks for inconsistency (no matter how it > happened) and reports using events (previous state and how it was fixed). > This allows continuing processing for all the entries, even inconsistent. > But, each such fix should be rechecked manually, for sure. > > On Tue, Apr 16, 2019 at 11:39 AM Павлухин Иван wrote: > > > Anton, > > > > Thank you for your effort for improving consistency guarantees > > provided by Ignite. > > > > The subject sounds really vital. Could you please elaborate why it > > comes as an on-demand enabled proxy but not as a mode enabled by some > > configuration property (or even as a default behavior)? How do you see > > the future development of such consistency checks? As for me it will > > be great if we can improve consistency guarantees provided by default. > > > > Also thinking loud a bit: > > 1. 
It sounds suspicious that reads can cause writes (unexpected > > deadlocks might be possible). > > 2. I do not believe that it is possible to implement a (bugless?) > > feature which will fix other bugs. > > 3. A storage (or database) product (Ignite in our case) consistency is > > not equal to a user application consistency. So, it might be that > > introduced checks are insufficient to make business applications > > happy. > > > > пн, 15 апр. 2019 г. в 19:27, Andrey Gura : > > > > > > Anton, > > > > > > I'm trying tell you that this proxy can produce false positive result, > > > incorrect result and just hide bugs. What will the next solution? > > > withNoBugs proxy? > > > > > > You can perform consistency check using idle verify utility. Recovery > > > tool is good idea but user should trigger this process, not some cache > > > proxy implementation. > > > > > > On Mon, Apr 15, 2019 at 5:34 PM Anton Vinogradov wrote: > > > > > > > > Seems, we already fixed all bugs caused this feature, but there is no > > > > warranty we will not create new :) > > > > This proxy is just checker that consistency is ok. > > > > > > > > > > reaching bugles
Re: Consistency check and fix (review request)
Nikolay, that was the first approach.

>> I think we should allow to the administrator to enable/disable Consistency check.
In that case, we have to introduce a cluster-wide change-strategy operation, since every client node should be aware of the change.
Also, we would have to specify the list of caches, and for each one whether we should check every request or only every 5th, and so on.
The procedure and configuration become overcomplicated in this case.

My idea is that a specific service will be able to use a special proxy according to its own strategy (e.g. when the administrator is inside the building and the boss is sleeping, all operations on "cache[a,b,c]ed*" should check consistency).
All service clients will have the same guarantees in that case.
So, in other words, consistency should be guaranteed by the service, not by Ignite.
A service should guarantee consistency not only by using the new proxy but also, for example, by using the correct isolation for txs.
It's not a good idea to specify an isolation mode for Ignite; same situation with get-with-consistency-check.

On Tue, Apr 16, 2019 at 12:56 PM Nikolay Izhikov wrote: > Hello, Anton. > > > Customer should be able to change strategy on the fly according to time> > periods or load. > > I think we should allow to administrator to enable/disable Consistency > check. > This option shouldn't be related to application code because "Consistency > check" is some kind of maintance procedure. > > What do you think? > > On Tue, 16/04/2019 at 12:47 +0300, Anton Vinogradov wrote: > > Andrey, thanks for tips > > > > > > You can perform consistency check using idle verify utility. > > > > Could you please point to utility's page? > > According to its name, it requires to stop the cluster to perform the > check? > > That's impossible at real production when you should have downtime less > > that some minutes per year. > > So, the only case I see is to use online check during periods of moderate > > activity. > > > > > > Recovery tool is good idea > > > > This tool is a part of my IEP. 
> > But recovery tool (process) > > - will allow you to check entries in memory only (otherwise, you will > warm > > up the cluster incorrectly), and that's a problem when you have > > persisted/in_memory rate > 10:1 > > - will cause latency drop for some (eg. 90+ percentile) requests, which > is > > not acceptable for real production, when we have strict SLA. > > - will not guarantee that each operation will use consistent data, > > sometimes it's extremely essential > > so, the process is a cool idea, but, sometime you may need more. > > > > Ivan, thanks for analysis > > > > > > why it comes as an on-demand enabled proxy but not as a mode enabled > by > > > > some configuration property > > It's a bad idea to have this feature permanently enabled, it slows down > the > > system by design. > > Customer should be able to change strategy on the fly according to time > > periods or load. > > Also, we're going to use this proxy for odd requests or for every 5-th, > > 10-th, 100-th request depends on the load/time/SLA/etc. > > The goal is to perform as much as possible gets-with-consistency > operations > > without stopping the cluster and never find a problem :) > > > > > > As for me it will be great if we can improve consistency guarantees > > > > provided by default. > > Once you checked backups you decreased throughput and increased latency. > > This feature requred only for some financial, nuclear, health systems > when > > you should be additionally sure about consistency. > > It's like a > > - read from backups > > - data modification outside the transaction > > - using FULL_ASYNC instead of FULL_SYNC, > > sometimes it's possible, sometimes not. > > > > > > 1. It sounds suspicious that reads can cause writes (unexpected > > > > deadlocks might be possible). 
> > Code performs writes > > - key per additional transaction in case original tx was OPTIMISTIC || > > READ_COMMITTED, > > - all keys per same tx in case original tx was PESSIMISTIC && > > !READ_COMMITTED, since you already obtain the locks, > > so, deadlock should be impossible. > > > > > > 2. I do not believe that it is possible to implement a (bugless?) > > > > feature which will fix other bugs. > > It does not fix the bugs, it looks for inconsistency (no matter how it > > happened) and reports using events (previous state and how it was fixed). > > This allows continuing processing for all the entries, even inconsistent. > > But, each such fix should be rechecked manually, for sure. > > > > On Tue, Apr 16, 2019 at 11:39 AM Павлухин Иван > wrote: > > > > > Anton, > > > > > > Thank you for your effort for improving consistency guarantees > > > provided by Ignite. > > > > > > The subject sounds really vital. Could you please elaborate why it > > > comes as an on-demand enabled proxy but not as a mode enabled by some > > > configuration property (or even as a default behavior)? How do you see > > > the future development of such consistency checks? As for me it will > > > be great if we can im
[jira] [Created] (IGNITE-11756) SQL: implement a table row count statistics for the local queries
Roman Kondakov created IGNITE-11756: --- Summary: SQL: implement a table row count statistics for the local queries Key: IGNITE-11756 URL: https://issues.apache.org/jira/browse/IGNITE-11756 Project: Ignite Issue Type: Improvement Components: sql Reporter: Roman Kondakov

Row count statistics should help the H2 optimizer select a better query execution plan. Currently the row count supplied to the H2 engine is a hardcoded value == 1 (see {{org.h2.index.Index#getRowCountApproximation}}). As a first step we can provide the actual table size in the case of a local query. To avoid counting the size on each invocation, we can cache the row count value and invalidate it in some cases:
* Rebalancing
* Multiple updates (after the initial loading)
* On timeout (e.g. 1 minute)
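The caching scheme outlined above can be sketched as follows (names are hypothetical): recount only when the cached value is missing, too many updates have accumulated, or the timeout has expired.

```java
/** Sketch of a cached row-count statistic invalidated by updates, rebalance, or a timeout. */
public class RowCountStat {
    private final long timeoutNanos;
    private final int updatesThreshold;

    private long cached = -1;          // -1 means "unknown, must recount"
    private long cachedAtNanos;
    private int updatesSinceCache;

    public RowCountStat(long timeoutMs, int updatesThreshold) {
        this.timeoutNanos = timeoutMs * 1_000_000L;
        this.updatesThreshold = updatesThreshold;
    }

    /** Called on table updates; enough of them invalidate the cached count. */
    public synchronized void onUpdate() {
        updatesSinceCache++;
    }

    /** Called on rebalancing: drop the cached value entirely. */
    public synchronized void onRebalance() {
        cached = -1;
    }

    /** Returns the cached count, recounting (via the supplier) only when it is stale. */
    public synchronized long rowCount(java.util.function.LongSupplier actualCount) {
        long now = System.nanoTime();

        boolean stale = cached < 0
            || updatesSinceCache >= updatesThreshold
            || now - cachedAtNanos > timeoutNanos;

        if (stale) {
            cached = actualCount.getAsLong();
            cachedAtNanos = now;
            updatesSinceCache = 0;
        }

        return cached;
    }
}
```

An approximation like this is acceptable for the optimizer: {{getRowCountApproximation}} only needs a value of the right magnitude, not an exact count.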
Re: Impossible memory region created in TC test
Hello, Igniters. > Why can a test on TC allocate a Data Region bigger than the amount of available > RAM? It seems we are using `echo 1 > /proc/sys/vm/overcommit_memory` on TC, so I should fix my test :). On Mon, 15/04/2019 at 13:41 +0300, Nikolay Izhikov wrote: > Hello, Ilya. > > > Ignite does not touch every page of data region, and until you touch a page > > OS will not allocate any specific physical RAM to the virtual RAM address > > of that page. > > This is not true. > Take a look at this discussion [1] > > > Moreover, AFAIK Ignite will not even allocate all the memory permitted by > > data region until it is needed > > Maybe I am missing something, but > > 1. If persistenceEnabled = false, Ignite will allocate 1 segment on start [2] > 2. If persistenceEnabled = true, Ignite will allocate all segments on start [3] > > > If you would use Pre-Touch feature which was suggested in this developer > > list a few months ago, you will see it fail explicitly. > > Locally, the test already works as expected. > But on TC it fails and a DataRegion of 1024 GiB can be created. > > It seems there is some flag on TC that enables this behaviour. > > [1] > http://apache-ignite-developers.2346864.n4.nabble.com/Data-regions-on-client-nodes-td32834.html > [2] > https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/pagemem/impl/PageMemoryNoStoreImpl.java#L245 > [3] > https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/pagemem/PageMemoryImpl.java#L386 > > > On Mon, 15/04/2019 at 10:18 +0300, Ilya Kasnacheev wrote: > > Hello! > > > > Ignite does not touch every page of data region, and until you touch a page > > OS will not allocate any specific physical RAM to the virtual RAM address > > of that page. > > > > Moreover, AFAIK Ignite will not even allocate all the memory permitted by > > data region until it is needed. 
It will allocate memory in chunks, which > > means your system will slow to a crawl trying to find RAM for the next chunk > > as you try to load data into such a node. > > > > If you used the Pre-Touch feature which was suggested in this developer > > list a few months ago, you would see it fail explicitly. > > > > Regards,
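The overcommit policy the thread refers to can be inspected programmatically. A minimal sketch (assuming a Linux host; the class name is arbitrary) that reads the kernel's overcommit mode:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Reads the Linux virtual-memory overcommit policy:
 *   0 = heuristic overcommit (the default),
 *   1 = always overcommit (the TC setting suspected above),
 *   2 = strict accounting.
 * Returns -1 where /proc is unavailable (non-Linux platforms).
 */
public class OvercommitCheck {
    static int overcommitMode() {
        Path p = Paths.get("/proc/sys/vm/overcommit_memory");
        try {
            return Integer.parseInt(new String(Files.readAllBytes(p)).trim());
        }
        catch (Exception e) {
            return -1; // file missing or unreadable
        }
    }
}
```

With mode 1, `malloc`-style reservations of a 1024 GiB region succeed regardless of physical RAM, which would explain the "impossible" data region on TC.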
[jira] [Created] (IGNITE-11757) Missed partitions during rebalancing when new blank node joins
Ilya Kasnacheev created IGNITE-11757: Summary: Missed partitions during rebalancing when new blank node joins Key: IGNITE-11757 URL: https://issues.apache.org/jira/browse/IGNITE-11757 Project: Ignite Issue Type: Bug Components: cache Reporter: Ilya Kasnacheev Assignee: Ivan Rakov Please take a look at the newly added test GridCachePartitionedSupplyEventsSelfTest.testSupplyEvents. There's logging of missed partitions during rebalancing, and as you can see, partitions are missed even when a new node joins a stable topology, with no nodes leaving. The expected behavior is that in this case no partitions will be missed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11758) Python thin: a lot of documentation files without license header
Igor Sapego created IGNITE-11758: Summary: Python thin: a lot of documentation files without license header Key: IGNITE-11758 URL: https://issues.apache.org/jira/browse/IGNITE-11758 Project: Ignite Issue Type: Bug Components: documentation, thin client Affects Versions: 2.7 Reporter: Igor Sapego Fix For: 2.8 There are a lot of .rst documentation files in modules/platforms/python/docs/ that do not contain a license header. We need to either delete them if they are auto-generated or add headers to them if they are not. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
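A first step for the ticket is simply listing the offending files. A hedged sketch (the class name and the marker string are assumptions; Apache headers conventionally contain the phrase checked below) that walks a directory and flags .rst files without a license header:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Finds .rst files whose content does not mention the Apache license,
 * so they can be reviewed: deleted if auto-generated, fixed otherwise.
 */
public class LicenseHeaderCheck {
    static final String MARKER = "Licensed to the Apache Software Foundation";

    static List<Path> filesMissingHeader(Path root) throws IOException {
        try (Stream<Path> files = Files.walk(root)) {
            return files
                .filter(p -> p.toString().endsWith(".rst"))
                .filter(p -> {
                    try {
                        return !new String(Files.readAllBytes(p)).contains(MARKER);
                    }
                    catch (IOException e) {
                        return true; // unreadable file: flag it for review
                    }
                })
                .collect(Collectors.toList());
        }
    }
}
```

Running it over modules/platforms/python/docs/ would produce the list of files to triage.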
[jira] [Created] (IGNITE-11759) [ML] Duplicate dependencies for ml artifacts
Yury Babak created IGNITE-11759: --- Summary: [ML] Duplicate dependencies for ml artifacts Key: IGNITE-11759 URL: https://issues.apache.org/jira/browse/IGNITE-11759 Project: Ignite Issue Type: Improvement Components: ml Affects Versions: 2.7 Reporter: Yury Babak -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11760) [TC Bot] Support escaping or replacement of vertical dash in the suite name
Dmitriy Pavlov created IGNITE-11760: --- Summary: [TC Bot] Support escaping or replacement of vertical dash in the suite name Key: IGNITE-11760 URL: https://issues.apache.org/jira/browse/IGNITE-11760 Project: Ignite Issue Type: Task Reporter: Dmitriy Pavlov Assignee: Dmitriy Pavlov Usage of the same special symbol in JIRA makes the TC Bot visa unreadable. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
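The clash arises because the vertical bar delimits table cells in JIRA wiki markup, so a suite name containing `|` breaks the visa's table layout. One possible workaround, sketched below as an assumption rather than the Bot's actual fix, is to substitute a visually similar character before rendering:

```java
/**
 * Hypothetical escaper for suite names embedded in JIRA wiki markup.
 * Replaces the table-cell delimiter '|' with the look-alike broken-bar
 * character U+00A6 so the visa table stays readable. Whether a true
 * escape sequence is preferable depends on the JIRA renderer in use.
 */
public class SuiteNameEscaper {
    static String escapeForJira(String suiteName) {
        return suiteName.replace('|', '\u00A6'); // '¦' broken bar
    }
}
```

A replacement keeps the name legible; a removal would make distinct suite names collide.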
[MTCGA]: new failures in builds [3616767] needs to be handled
Hi Igniters, I've detected some new issues on TeamCity to be handled. You are more than welcome to help. If your changes can lead to these failure(s): We're grateful that you volunteered to make a contribution to this project, but things change and you may no longer be able to finalize your contribution. Could you respond to this email and indicate whether you wish to continue and fix the test failures, or step down so that some committer may revert your commit. *Recently contributed test failed in master GridP2PComputeWithNestedEntryProcessorTest.testContinuousMode https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=-266239113081528&branch=%3Cdefault%3E&tab=testDetails *Recently contributed test failed in master GridP2PComputeWithNestedEntryProcessorTest.testSharedMode https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=490604068578396435&branch=%3Cdefault%3E&tab=testDetails Changes that may have led to the failure were made by - vldpyatkov https://ci.ignite.apache.org/viewModification.html?modId=880634 - Here's a reminder of what contributors agreed to do https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute - Should you have any questions please contact dev@ignite.apache.org Best Regards, Apache Ignite TeamCity Bot https://github.com/apache/ignite-teamcity-bot Notification generated at 19:44:42 16-04-2019
[jira] [Created] (IGNITE-11761) Normalize encoding for Ignite .NET test file
Dmitriy Pavlov created IGNITE-11761: --- Summary: Normalize encoding for Ignite .NET test file Key: IGNITE-11761 URL: https://issues.apache.org/jira/browse/IGNITE-11761 Project: Ignite Issue Type: Task Reporter: Dmitriy Pavlov It is encoded in UTF-16, but all other files are UTF-8. IDEA blocks me from changing the encoding because a BOM exists. https://stackoverflow.com/questions/32986445/remove-a-bom-character-in-a-file -- This message was sent by Atlassian JIRA (v7.6.3#76005)
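The conversion itself is straightforward in Java. A minimal sketch (assuming, as the ticket states, that the file is UTF-16 with a BOM): the `UTF_16` charset detects the byte order from the BOM and strips it while decoding, so writing the decoded text back as UTF-8 yields a BOM-free file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Re-encodes a UTF-16 file (with BOM) as plain UTF-8 without a BOM. */
public class NormalizeEncoding {
    static void toUtf8(Path file) throws IOException {
        byte[] raw = Files.readAllBytes(file);

        // StandardCharsets.UTF_16 identifies the byte order from an optional
        // BOM and consumes it during decoding.
        String text = new String(raw, StandardCharsets.UTF_16);

        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
    }
}
```

This sidesteps the IDE entirely, so the BOM complaint never arises.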
[jira] [Created] (IGNITE-11762) Test testClientStartCloseServersRestart causes hang of the whole Cache 2 suite in master
Ivan Rakov created IGNITE-11762: --- Summary: Test testClientStartCloseServersRestart causes hang of the whole Cache 2 suite in master Key: IGNITE-11762 URL: https://issues.apache.org/jira/browse/IGNITE-11762 Project: Ignite Issue Type: Bug Reporter: Ivan Rakov Assignee: Pavel Kovalenko Fix For: 2.8 Attempt to restart server node in test hangs: {code:java} [2019-04-16 19:56:45,049][WARN ][restart-1][GridCachePartitionExchangeManager] Failed to wait for initial partition map exchange. Possible reasons are: ^-- Transactions in deadlock. ^-- Long running transactions (ignore if this is the case). ^-- Unreleased explicit locks. {code} The reason is that previous PME (late affinity assignment) still hangs due to pending transaction: {code:java} [2019-04-16 19:56:23,717][WARN ][exchange-worker-#1039%cache.IgniteClientCacheStartFailoverTest3%][diagnostic] Pending transactions: [2019-04-16 19:56:23,718][WARN ][exchange-worker-#1039%cache.IgniteClientCacheStartFailoverTest3%][diagnostic] >>> [txVer=AffinityTopologyVersion [topVer=11, minorTopVer=0], exchWait=true, tx=GridDhtTxLocal [nearNodeId=8559bfe0-3d4a-4090-a457-6df0eba5, nearFutId=1edc7172a61-941f9dde-2b60-4a1f-8213-7d23d738bf33, nearMiniId=1, nearFinFutId=null, nearFinMiniId=0, nearXidVer=GridCacheVersion [topVer=166913752, order=1555433759036, nodeOrder=6], lb=null, super=GridDhtTxLocalAdapter [nearOnOriginatingNode=false, nearNodes=KeySetView [], dhtNodes=KeySetView [9ef33532-0e4a-4561-b57e-042afe10], explicitLock=false, super=IgniteTxLocalAdapter [completedBase=null, sndTransformedVals=false, depEnabled=false, txState=IgniteTxStateImpl [activeCacheIds=[-1062368467], recovery=false, mvccEnabled=true, mvccCachingCacheIds=[], txMap=HashSet []], super=IgniteTxAdapter [xidVer=GridCacheVersion [topVer=166913752, order=1555433759045, nodeOrder=10], writeVer=null, implicit=false, loc=true, threadId=1210, startTime=1555433762847, nodeId=0088e9b8-f859-4d14-8071-6388e473, startVer=GridCacheVersion [topVer=166913752, 
order=1555433759045, nodeOrder=10], endVer=null, isolation=REPEATABLE_READ, concurrency=PESSIMISTIC, timeout=0, sysInvalidate=false, sys=false, plc=2, commitVer=GridCacheVersion [topVer=166913752, order=1555433759045, nodeOrder=10], finalizing=NONE, invalidParts=null, state=MARKED_ROLLBACK, timedOut=false, topVer=AffinityTopologyVersion [topVer=11, minorTopVer=0], mvccSnapshot=MvccSnapshotResponse [futId=292, crdVer=1555433741506, cntr=395, opCntr=1, txs=[394], cleanupVer=390, tracking=0], skipCompletedVers=false, parentTx=null, duration=20866ms, onePhaseCommit=false], size=0 {code} However, load threads don't start any explicit transactions: they either hang on put()/get() or on clientCache.close(). Rolling back IGNITE-10799 resolves the issue (however, test remains flaky with ~10% fail rate due to unhandled TransactionSerializationException). -- This message was sent by Atlassian JIRA (v7.6.3#76005)