[jira] [Created] (IGNITE-11749) Implement automatic pages history dump on CorruptedTreeException
Alexey Goncharuk created IGNITE-11749: - Summary: Implement automatic pages history dump on CorruptedTreeException Key: IGNITE-11749 URL: https://issues.apache.org/jira/browse/IGNITE-11749 Project: Ignite Issue Type: Improvement Reporter: Alexey Goncharuk

Currently, the only way to debug possible bugs in checkpointer/recovery mechanics is to manually parse WAL files after the corruption has happened. This is not practical for several reasons. First, it requires manual actions which depend on the content of the exception. Second, it is not always possible to obtain WAL files (they may contain sensitive data).

We need to add a mechanism that will dump all the information required for a primary analysis of the corruption to the exception handler. For example, if an exception happened when materializing a link {{0xabcd}} written on an index page {{0xdcba}}, we need to dump the change history of both pages and the checkpoint records on the analysis interval. Possibly, we should also include the FreeList pages in which the aforementioned pages were included.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
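A rough sketch of what such an in-memory page-change history could look like. All names below are hypothetical illustrations, not actual Ignite internals: a bounded ring buffer of recent page modifications that can be filtered by page id when a CorruptedTreeException is handled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical tracker: keeps a bounded history of recent page changes for dump on corruption. */
public class PageHistoryTracker {
    /** One recorded page modification. */
    public static class Change {
        public final long pageId;
        public final String op;
        public final long walPtr;

        Change(long pageId, String op, long walPtr) {
            this.pageId = pageId;
            this.op = op;
            this.walPtr = walPtr;
        }
    }

    private final int capacity;
    private final Deque<Change> history = new ArrayDeque<>();

    public PageHistoryTracker(int capacity) {
        this.capacity = capacity;
    }

    /** Records a change, evicting the oldest entry when the buffer is full. */
    public synchronized void record(long pageId, String op, long walPtr) {
        if (history.size() == capacity)
            history.removeFirst();

        history.addLast(new Change(pageId, op, walPtr));
    }

    /** Collects the recorded history of the pages involved in a corruption. */
    public synchronized List<Change> dump(long... pageIds) {
        List<Change> res = new ArrayList<>();

        for (Change c : history)
            for (long id : pageIds)
                if (c.pageId == id)
                    res.add(c);

        return res;
    }
}
```

On a real node the entries would come from WAL records rather than be recorded separately; the sketch only illustrates the dump-by-page-id shape of the feature.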
[jira] [Created] (IGNITE-11750) Implement locked pages info for long-running B+Tree operations
Alexey Goncharuk created IGNITE-11750: - Summary: Implement locked pages info for long-running B+Tree operations Key: IGNITE-11750 URL: https://issues.apache.org/jira/browse/IGNITE-11750 Project: Ignite Issue Type: Improvement Reporter: Alexey Goncharuk

I've stumbled upon an incident where a batch of Ignite threads were hanging in BPlusTree operations, trying to acquire read or write locks on pages. From the thread dump it is impossible to tell whether there is an issue with {{OffheapReadWriteLock}} or a subtle deadlock in the tree.

I suggest we implement a timeout for page lock acquisition and tracking of locked pages. This should be relatively easy to implement in {{PageHandler}} (the only thing to consider is performance degradation). If a timeout occurs, we should print all the locks currently owned by the thread. This way we should be able to determine whether there is a deadlock in the {{BPlusTree}}.
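A minimal sketch of the suggested approach, with hypothetical names (this is not the actual {{PageHandler}} code): acquire page locks with a timeout and keep a per-thread set of held pages, printing it when the timeout fires.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical wrapper: per-page read/write locks with acquire timeout and held-lock tracking. */
public class TrackedPageLocks {
    private final ConcurrentHashMap<Long, ReentrantReadWriteLock> locks = new ConcurrentHashMap<>();

    /** Pages currently locked by each thread, in acquisition order. */
    private static final ThreadLocal<Set<Long>> HELD = ThreadLocal.withInitial(LinkedHashSet::new);

    /** Tries the write lock; on timeout dumps the pages this thread already holds. */
    public boolean writeLock(long pageId, long timeoutMs) {
        ReentrantReadWriteLock l = locks.computeIfAbsent(pageId, id -> new ReentrantReadWriteLock());

        try {
            if (l.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                HELD.get().add(pageId);
                return true;
            }
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }

        // Timeout: print everything this thread holds to help spot a deadlock cycle.
        System.err.println("Page lock timeout on 0x" + Long.toHexString(pageId)
            + ", locks held by " + Thread.currentThread().getName() + ": " + HELD.get());

        return false;
    }

    public void writeUnlock(long pageId) {
        HELD.get().remove(pageId);
        locks.get(pageId).writeLock().unlock();
    }

    public static Set<Long> heldByCurrentThread() {
        return HELD.get();
    }
}
```

The tracking cost here is one `ThreadLocal` set update per acquire/release, which is the kind of overhead the issue text says needs to be measured.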
[jira] [Created] (IGNITE-11751) Javadoc broken
Peter Ivanov created IGNITE-11751: - Summary: Javadoc broken Key: IGNITE-11751 URL: https://issues.apache.org/jira/browse/IGNITE-11751 Project: Ignite Issue Type: Task Reporter: Peter Ivanov Fix For: 2.8
{code}
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-javadoc-plugin:3.0.0:aggregate (core-javadoc) on project apache-ignite: An error has occurred in Javadoc report generation:
[ERROR] Exit code: 1 - ignite/modules/cassandra/store/src/main/java/org/apache/ignite/cache/store/cassandra/serializer/package-info.java:21: warning: a package-info.java file has already been seen for package org.apache.ignite.cache.store.cassandra.serializer
[ERROR] package org.apache.ignite.cache.store.cassandra.serializer;
[ERROR] ^
[ERROR] javadoc: warning - Multiple sources of package comments found for package "org.apache.ignite.cache.store.cassandra.serializer"
[ERROR] javadoc: error - Error - Exception java.lang.ClassNotFoundException thrown while trying to register Taglet org.apache.ignite.tools.javadoc.IgniteLinkTaglet...
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/Ignition.java:88: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/Ignition.java:88: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/Ignition.java:88: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/Ignition.java:88: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/configuration/IgniteConfiguration.java:828: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/configuration/IgniteConfiguration.java:828: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/cache/store/CacheStore.java:71: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/cache/store/CacheStore.java:71: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/cache/store/CacheStoreSessionListener.java:114: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/cache/store/CacheStoreSessionListener.java:114: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/cache/store/CacheStoreSessionListener.java:114: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/cache/store/CacheStoreSessionListener.java:114: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/transactions/Transaction.java:120: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/transactions/Transaction.java:120: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/spi/checkpoint/CheckpointSpi.java:60: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/spi/checkpoint/CheckpointSpi.java:60: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/spi/discovery/tcp/TcpDiscoverySpi.java:233: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/spi/discovery/tcp/TcpDiscoverySpi.java:233: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/spi/deployment/DeploymentSpi.java:61: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/spi/deployment/DeploymentSpi.java:61: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/compute/gridify/GridifySetToSet.java:154: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/compute/gridify/GridifySetToSet.java:154: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/compute/gridify/GridifySetToValue.java:152: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/core/src/main/java/org/apache/ignite/compute/gridify/GridifySetToValue.java:152: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/spring/src/main/java/org/apache/ignite/cache/spring/SpringCacheManager.java:145: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/spring/src/main/java/org/apache/ignite/cache/spring/SpringCacheManager.java:145: warning - @ignitelink is an unknown tag.
[ERROR] ignite/modules/spring/src/main/java/org/apache/ignite/transactions/spring/SpringTransactionManager.java:
[jira] [Created] (IGNITE-11752) Refactor usages of "System.getenv(key)" to IgniteSystemProperties.getString(key)
Alexey Kuznetsov created IGNITE-11752: - Summary: Refactor usages of "System.getenv(key)" to IgniteSystemProperties.getString(key) Key: IGNITE-11752 URL: https://issues.apache.org/jira/browse/IGNITE-11752 Project: Ignite Issue Type: Improvement Components: general Reporter: Alexey Kuznetsov Assignee: Alexey Kuznetsov

IgniteSystemProperties.getString(key) is implemented as:
1. Try to get the property from System.properties.
2. If not found, fall back to System.getenv.

In Java you can easily override system properties from code (for testing purposes, for example), but it is almost impossible to do the same for environment variables.
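The lookup order described above boils down to a sketch like this (a simplified stand-in, not the real {{IgniteSystemProperties}} source):

```java
/** Simplified sketch of the system-property-first, environment-second lookup. */
public class SysProps {
    /** Returns the JVM system property if set, otherwise falls back to the environment variable. */
    public static String getString(String name) {
        String v = System.getProperty(name);

        return v != null ? v : System.getenv(name);
    }
}
```

Because `System.setProperty` takes precedence, a test can override any lookup from code without touching the process environment, which is exactly the motivation stated in the issue.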
Re: Consistency check and fix (review request)
Anton,

Thank you for your efforts on improving the consistency guarantees provided by Ignite.

The subject sounds really vital. Could you please elaborate on why it comes as an on-demand enabled proxy and not as a mode enabled by some configuration property (or even as the default behavior)? How do you see the future development of such consistency checks? As for me, it would be great if we could improve the consistency guarantees provided by default.

Also, thinking aloud a bit:
1. It sounds suspicious that reads can cause writes (unexpected deadlocks might be possible).
2. I do not believe that it is possible to implement a (bugless?) feature which will fix other bugs.
3. A storage (or database) product's (Ignite in our case) consistency is not equal to a user application's consistency. So, it might be that the introduced checks are insufficient to make business applications happy.

Mon, 15 Apr 2019 at 19:27, Andrey Gura: > > Anton, > > I'm trying tell you that this proxy can produce false positive result, > incorrect result and just hide bugs. What will the next solution? > withNoBugs proxy? > > You can perform consistency check using idle verify utility. Recovery > tool is good idea but user should trigger this process, not some cache > proxy implementation. > > On Mon, Apr 15, 2019 at 5:34 PM Anton Vinogradov wrote: > > > > Seems, we already fixed all bugs caused this feature, but there is no > > warranty we will not create new :) > > This proxy is just checker that consistency is ok. > > > > >> reaching bugless implementation > > Not sure it's possible. Once you have software it contains bugs. > > This proxy will tell you whether these bugs lead to inconsistency. > > > > On Mon, Apr 15, 2019 at 5:19 PM Andrey Gura wrote: > > > > > Method name is minor problem. I still believe that there is no need > > > for this proxy because there are no any guarantees about bugless > > > implementation this functionality. Better way is reaching bugless > > > implementation of current functionality. 
> > > > > > On Mon, Apr 15, 2019 at 4:51 PM Anton Vinogradov wrote: > > > > > > > > Andrey, > > > > > > > > >> It means also that at least method name is bad. > > > > Agreed, already discussed with Aleksey Plekhanov. > > > > Decided that ".withConsistencyCheck()" is a proper name. > > > > > > > > >> What is the profit? > > > > This proxy allows to check (and fix) is there any consistency violation > > > > across the topology. > > > > The proxy will check all backups contain the same values as primary. > > > > So, when it's possible (you're ready to spend resources for this check) > > > you > > > > will be able to read-with-consistency-check. > > > > This will decrease the amount of "inconsistency caused > > > > war/strikes/devastation" situations, which is important for financial > > > > systems. > > > > > > > > On Mon, Apr 15, 2019 at 3:58 PM Andrey Gura wrote: > > > > > > > > > Anton, > > > > > > > > > > what does expression "withConsistency" mean? From user's standpoint it > > > > > means that all operations performed without this proxy are not > > > > > consistent. It means also that at least method name is bad. > > > > > > > > > > Are there any guarantees that withConsistency proxy will not contain > > > > > bugs that will lead to inconsistent write after inconsistency was > > > > > found? I think there are no such guarantees. Bugs still are possible. > > > > > So I always must use withConsistency proxy because I doesn't have > > > > > other choice - all ways are unreliable and withConsistency just sounds > > > > > better. > > > > > > > > > > Eventually we will have two different ways for working with cache > > > > > values with different bugs set. What is the profit? > > > > > > > > > > > > > > > > > > > > On Fri, Apr 12, 2019 at 2:49 PM Anton Vinogradov > > > wrote: > > > > > > > > > > > > Folks, > > > > > > > > > > > > I've checked the tx benchmarks and found no performance drop. > > > > > > Also, see no issues at TC results. 
> > > > > > So, seems, code ready to be merged. > > > > > > > > > > > > Everyone interested, please share any objections about > > > > > > - public API > > > > > > - test coverage > > > > > > - implementation approach > > > > > > > > > > > > On Wed, Apr 3, 2019 at 5:46 PM Anton Vinogradov > > > wrote: > > > > > > > > > > > > > Nikolay, > > > > > > > > > > > > > > This is not a PoC, but the final solution (I hope so:) ) required > > > the > > > > > > > review. > > > > > > > LWW means Last Write Wins, detailed explanation can be found at > > > IEP-31. > > > > > > > > > > > > > > On Wed, Apr 3, 2019 at 5:24 PM Nikolay Izhikov < > > > nizhi...@apache.org> > > > > > > > wrote: > > > > > > > > > > > > > >> Hello, Anton. > > > > > > >> > > > > > > >> Thanks for the PoC. > > > > > > >> > > > > > > >> > finds correct values according to LWW strategy > > > > > > >> > > > > > > >> Can you, please, clarify what is LWW strategy? > > > > > > >> > > > > > > >> В Ср, 03/04/2019 в 17:19 +0300, Anton Vinogradov пишет: > > > >
[jira] [Created] (IGNITE-11753) control.sh improve error message in case of connection to secured cluster without credentials.
Sergey Antonov created IGNITE-11753: --- Summary: control.sh improve error message in case of connection to secured cluster without credentials. Key: IGNITE-11753 URL: https://issues.apache.org/jira/browse/IGNITE-11753 Project: Ignite Issue Type: Improvement Reporter: Sergey Antonov

If control.sh tries to connect to a secured cluster without login/password, we currently get:
{noformat}
./control.sh --state
Failed to get cluster state.
Authentication error, try connection again.
user:
{noformat}
We should print information about the attempt to connect to a secured cluster and request the login/password if they are not set, i.e.:
{noformat}
./control.sh --state
Failed to get cluster state.
Cluster requires authentication.
user:
{noformat}
[jira] [Created] (IGNITE-11754) Memory leak on the GridCacheTxFinishSync#threadMap
Taras Ledkov created IGNITE-11754: - Summary: Memory leak on the GridCacheTxFinishSync#threadMap Key: IGNITE-11754 URL: https://issues.apache.org/jira/browse/IGNITE-11754 Project: Ignite Issue Type: Bug Components: general, mvcc Affects Versions: 2.7 Reporter: Taras Ledkov Fix For: 2.8

The {{GridCacheTxFinishSync#threadMap}} is not cleared when a tx thread terminates, so a memory leak happens when transactions are executed in newly started and then stopped threads.
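The leak pattern, and one possible cleanup, can be sketched like this. The class below is illustrative only, not the real {{GridCacheTxFinishSync}}:

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative sketch of per-thread state that is never removed when the thread dies. */
public class ThreadMapLeakSketch {
    /** Per-thread tx finish state, keyed by thread id; entries outlive their threads. */
    private final Map<Long, Object> threadMap = new ConcurrentHashMap<>();

    /** Called on tx finish: the entry stays in the map even after the thread terminates. */
    public void onTxFinish() {
        threadMap.computeIfAbsent(Thread.currentThread().getId(), id -> new Object());
    }

    /** One possible fix: periodically drop entries whose threads are no longer alive. */
    public void purgeDead(Set<Long> aliveThreadIds) {
        threadMap.keySet().removeIf(id -> !aliveThreadIds.contains(id));
    }

    public int size() {
        return threadMap.size();
    }
}
```

With short-lived threads (the scenario in the issue) the map grows by one entry per thread and never shrinks unless something like `purgeDead` runs.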
Re: New Committer: Vyacheslav Daradur
Thank you! I'm glad to contribute to the development of the project. On Fri, Apr 12, 2019 at 1:14 PM Denis Mekhanikov wrote: > > Well done Slava! > > It was great working with you on the service grid redesign. > Looking forward to seeing new commits from you! > > Denis > > Thu, 11 Apr 2019 at 18:27, Denis Magda: > > > Well deserved, Vyacheslav! Thanks for hardening Service Grid pushing it to > > a completely next level! > > > > - > > Denis > > > > > > On Thu, Apr 11, 2019 at 7:00 AM Dmitriy Pavlov wrote: > > > > > Dear Ignite Developers, > > > > > > The Project Management Committee (PMC) for Apache Ignite has invited > > > Vyacheslav Daradur to become a committer and we are pleased to announce > > > that he has accepted. Apache Ignite PMC appreciates Vyacheslav’s > > > contribution to service grid redesign (is was collaborative efforts. BTW, > > > thanks to everyone involved), compatibility test framework, contribution > > to > > > community development, and to abbreviation plugin. > > > > > > Being a committer enables easier contribution to the project since there > > is > > > no need to go via the patch submission process. This should enable better > > > productivity. > > > > > > Please join me in welcoming Vyacheslav, and congratulating him on the new > > > role in the Apache Ignite Community. > > > > > > Best Regards, > > > Dmitriy Pavlov > > > on behalf of the Apache Ignite PMC > > > > > -- Best Regards, Vyacheslav D.
[jira] [Created] (IGNITE-11755) Memory leak H2 connections at the ConnectionManager#detachedConns
Taras Ledkov created IGNITE-11755: - Summary: Memory leak H2 connections at the ConnectionManager#detachedConns Key: IGNITE-11755 URL: https://issues.apache.org/jira/browse/IGNITE-11755 Project: Ignite Issue Type: Bug Components: sql Affects Versions: 2.7 Reporter: Taras Ledkov Assignee: Taras Ledkov Fix For: 2.8

{{ConnectionManager#detachedConns}} leaks on MVCC transactional SELECTs. To reproduce:
1. CREATE TABLE with MVCC enabled.
2. Run SELECTs.
3. Execute each query on a new JDBC thin connection; the connection is closed after the query.
Re: Consistency check and fix (review request)
Andrey, thanks for the tips.

>> You can perform consistency check using idle verify utility.
Could you please point me to the utility's page?
According to its name, it requires stopping the cluster to perform the check?
That's impossible in real production, where downtime should be less than a few minutes per year.
So, the only case I see is to use an online check during periods of moderate activity.

>> Recovery tool is good idea
This tool is a part of my IEP. But a recovery tool (process)
- will allow you to check entries in memory only (otherwise you will warm up the cluster incorrectly), which is a problem when you have a persisted/in_memory ratio > 10:1,
- will cause a latency drop for some (e.g. 90+ percentile) requests, which is not acceptable in real production with a strict SLA,
- will not guarantee that each operation will use consistent data, which is sometimes extremely essential.
So, the process is a cool idea, but sometimes you may need more.

Ivan, thanks for the analysis.

>> why it comes as an on-demand enabled proxy but not as a mode enabled by some configuration property
It's a bad idea to have this feature permanently enabled; it slows down the system by design.
The customer should be able to change the strategy on the fly according to time periods or load.
Also, we're going to use this proxy for odd requests, or for every 5th, 10th, or 100th request, depending on the load/time/SLA/etc.
The goal is to perform as many gets-with-consistency operations as possible without stopping the cluster and never find a problem :)

>> As for me it will be great if we can improve consistency guarantees provided by default.
Once you check backups, you decrease throughput and increase latency.
This feature is required only for some financial, nuclear, or health systems where you should be additionally sure about consistency.
It's like
- reading from backups,
- data modification outside the transaction,
- using FULL_ASYNC instead of FULL_SYNC;
sometimes it's possible, sometimes not.

>> 1.
It sounds suspicious that reads can cause writes (unexpected deadlocks might be possible).
The code performs writes
- per key, in an additional transaction, in case the original tx was OPTIMISTIC || READ_COMMITTED,
- for all keys in the same tx in case the original tx was PESSIMISTIC && !READ_COMMITTED, since you have already obtained the locks,
so a deadlock should be impossible.

>> 2. I do not believe that it is possible to implement a (bugless?) feature which will fix other bugs.
It does not fix the bugs; it looks for inconsistency (no matter how it happened) and reports it using events (the previous state and how it was fixed).
This allows processing to continue for all entries, even inconsistent ones.
But each such fix should be rechecked manually, for sure.

On Tue, Apr 16, 2019 at 11:39 AM Павлухин Иван wrote: > Anton, > > Thank you for your effort for improving consistency guarantees > provided by Ignite. > > The subject sounds really vital. Could you please elaborate why it > comes as an on-demand enabled proxy but not as a mode enabled by some > configuration property (or even as a default behavior)? How do you see > the future development of such consistency checks? As for me it will > be great if we can improve consistency guarantees provided by default. > > Also thinking loud a bit: > 1. It sounds suspicious that reads can cause writes (unexpected > deadlocks might be possible). > 2. I do not believe that it is possible to implement a (bugless?) > feature which will fix other bugs. > 3. A storage (or database) product (Ignite in our case) consistency is > not equal to a user application consistency. So, it might be that > introduced checks are insufficient to make business applications > happy. > > Mon, 15 Apr 2019 at 19:27, Andrey Gura: > > > > Anton, > > > > I'm trying tell you that this proxy can produce false positive result, > > incorrect result and just hide bugs. What will the next solution? > > withNoBugs proxy? > > > > You can perform consistency check using idle verify utility. 
Recovery > > tool is good idea but user should trigger this process, not some cache > > proxy implementation. > > > > On Mon, Apr 15, 2019 at 5:34 PM Anton Vinogradov wrote: > > > > > > Seems, we already fixed all bugs caused this feature, but there is no > > > warranty we will not create new :) > > > This proxy is just checker that consistency is ok. > > > > > > >> reaching bugless implementation > > > Not sure it's possible. Once you have software it contains bugs. > > > This proxy will tell you whether these bugs lead to inconsistency. > > > > > > On Mon, Apr 15, 2019 at 5:19 PM Andrey Gura wrote: > > > > > > > Method name is minor problem. I still believe that there is no need > > > > for this proxy because there are no any guarantees about bugless > > > > implementation this functionality. Better way is reaching bugless > > > > implementation of current functionality. > > > > > > > > On Mon, Apr 15, 2019 at 4:51 PM Anton Vinogradov > wrote: > > > > > > > > > > Andrey, > > > > > > > > > > >>
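For reference, the check-and-fix semantics discussed in this thread (compare all copies of a key across the topology, repair using last-write-wins) can be sketched roughly as follows. The types and names are hypothetical, not the proposed Ignite API:

```java
import java.util.Map;

/** Hypothetical last-write-wins resolution across per-node copies of a cache entry. */
public class LwwCheck {
    /** A value with its version (e.g. an update counter or timestamp). */
    public static class Versioned {
        public final String value;
        public final long version;

        public Versioned(String value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    /** Picks the copy with the highest version: the "last write wins". */
    public static Versioned resolve(Map<String, Versioned> copiesByNode) {
        Versioned best = null;

        for (Versioned v : copiesByNode.values())
            if (best == null || v.version > best.version)
                best = v;

        return best;
    }

    /** True if any node holds a copy that differs from the winner, i.e. a repair is needed. */
    public static boolean inconsistent(Map<String, Versioned> copiesByNode) {
        Versioned winner = resolve(copiesByNode);

        for (Versioned v : copiesByNode.values())
            if (v.version != winner.version || !v.value.equals(winner.value))
                return true;

        return false;
    }
}
```

The real proxy would additionally write the winning value back to the lagging nodes and fire an event with the previous state, as described in the thread; the sketch only shows the detection/resolution step.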
Re: Consistency check and fix (review request)
Hello, Anton.

> Customer should be able to change strategy on the fly according to time
> periods or load.

I think we should allow the administrator to enable/disable the consistency check.
This option shouldn't be related to application code, because "Consistency check" is some kind of maintenance procedure.

What do you think?

On Tue, 16/04/2019 at 12:47 +0300, Anton Vinogradov wrote: > Andrey, thanks for tips > > > > You can perform consistency check using idle verify utility. > > Could you please point to utility's page? > According to its name, it requires to stop the cluster to perform the check? > That's impossible at real production when you should have downtime less > that some minutes per year. > So, the only case I see is to use online check during periods of moderate > activity. > > > > Recovery tool is good idea > > This tool is a part of my IEP. > But recovery tool (process) > - will allow you to check entries in memory only (otherwise, you will warm > up the cluster incorrectly), and that's a problem when you have > persisted/in_memory rate > 10:1 > - will cause latency drop for some (eg. 90+ percentile) requests, which is > not acceptable for real production, when we have strict SLA. > - will not guarantee that each operation will use consistent data, > sometimes it's extremely essential > so, the process is a cool idea, but, sometime you may need more. > > Ivan, thanks for analysis > > > > why it comes as an on-demand enabled proxy but not as a mode enabled by > > some configuration property > It's a bad idea to have this feature permanently enabled, it slows down the > system by design. > Customer should be able to change strategy on the fly according to time > periods or load. > Also, we're going to use this proxy for odd requests or for every 5-th, > 10-th, 100-th request depends on the load/time/SLA/etc. 
> The goal is to perform as much as possible gets-with-consistency operations > without stopping the cluster and never find a problem :) > > > > As for me it will be great if we can improve consistency guarantees > > provided by default. > Once you checked backups you decreased throughput and increased latency. > This feature requred only for some financial, nuclear, health systems when > you should be additionally sure about consistency. > It's like a > - read from backups > - data modification outside the transaction > - using FULL_ASYNC instead of FULL_SYNC, > sometimes it's possible, sometimes not. > > > > 1. It sounds suspicious that reads can cause writes (unexpected > > deadlocks might be possible). > Code performs writes > - key per additional transaction in case original tx was OPTIMISTIC || > READ_COMMITTED, > - all keys per same tx in case original tx was PESSIMISTIC && > !READ_COMMITTED, since you already obtain the locks, > so, deadlock should be impossible. > > > > 2. I do not believe that it is possible to implement a (bugless?) > > feature which will fix other bugs. > It does not fix the bugs, it looks for inconsistency (no matter how it > happened) and reports using events (previous state and how it was fixed). > This allows continuing processing for all the entries, even inconsistent. > But, each such fix should be rechecked manually, for sure. > > On Tue, Apr 16, 2019 at 11:39 AM Павлухин Иван wrote: > > > Anton, > > > > Thank you for your effort for improving consistency guarantees > > provided by Ignite. > > > > The subject sounds really vital. Could you please elaborate why it > > comes as an on-demand enabled proxy but not as a mode enabled by some > > configuration property (or even as a default behavior)? How do you see > > the future development of such consistency checks? As for me it will > > be great if we can improve consistency guarantees provided by default. > > > > Also thinking loud a bit: > > 1. 
It sounds suspicious that reads can cause writes (unexpected > > deadlocks might be possible). > > 2. I do not believe that it is possible to implement a (bugless?) > > feature which will fix other bugs. > > 3. A storage (or database) product (Ignite in our case) consistency is > > not equal to a user application consistency. So, it might be that > > introduced checks are insufficient to make business applications > > happy. > > > > пн, 15 апр. 2019 г. в 19:27, Andrey Gura : > > > > > > Anton, > > > > > > I'm trying tell you that this proxy can produce false positive result, > > > incorrect result and just hide bugs. What will the next solution? > > > withNoBugs proxy? > > > > > > You can perform consistency check using idle verify utility. Recovery > > > tool is good idea but user should trigger this process, not some cache > > > proxy implementation. > > > > > > On Mon, Apr 15, 2019 at 5:34 PM Anton Vinogradov wrote: > > > > > > > > Seems, we already fixed all bugs caused this feature, but there is no > > > > warranty we will not create new :) > > > > This proxy is just checker that consistency is ok. > > > > > > > > > > reaching bugles
Re: Consistency check and fix (review request)
Nikolay, that was the first approach.

>> I think we should allow to the administrator to enable/disable Consistency check.
In that case, we have to introduce a cluster-wide change-strategy operation, since every client node should be aware of the change.
Also, we would have to specify the list of caches, and for each one whether we should check every request or only every 5th, and so on.
The procedure and configuration become overcomplicated in this case.

My idea is that a specific service will be able to use a special proxy according to its own strategy (e.g. when the administrator is inside the building and the boss is sleeping, all operations on "cache[a,b,c]ed*" should check consistency).
All service clients will have the same guarantees in that case.
So, in other words, consistency should be guaranteed by the service, not by Ignite.
A service should guarantee consistency not only by using the new proxy but also, for example, by using the correct isolation for txs.
It's not a good idea to specify an isolation mode for Ignite; same situation with get-with-consistency-check.

On Tue, Apr 16, 2019 at 12:56 PM Nikolay Izhikov wrote: > Hello, Anton. > > > Customer should be able to change strategy on the fly according to time> > periods or load. > > I think we should allow to administrator to enable/disable Consistency > check. > This option shouldn't be related to application code because "Consistency > check" is some kind of maintance procedure. > > What do you think? > > On Tue, 16/04/2019 at 12:47 +0300, Anton Vinogradov wrote: > > Andrey, thanks for tips > > > > > > You can perform consistency check using idle verify utility. > > > > Could you please point to utility's page? > > According to its name, it requires to stop the cluster to perform the > check? > > That's impossible at real production when you should have downtime less > > that some minutes per year. > > So, the only case I see is to use online check during periods of moderate > > activity. > > > > > > Recovery tool is good idea > > > > This tool is a part of my IEP. 
> > But recovery tool (process) > > - will allow you to check entries in memory only (otherwise, you will > warm > > up the cluster incorrectly), and that's a problem when you have > > persisted/in_memory rate > 10:1 > > - will cause latency drop for some (eg. 90+ percentile) requests, which > is > > not acceptable for real production, when we have strict SLA. > > - will not guarantee that each operation will use consistent data, > > sometimes it's extremely essential > > so, the process is a cool idea, but, sometime you may need more. > > > > Ivan, thanks for analysis > > > > > > why it comes as an on-demand enabled proxy but not as a mode enabled > by > > > > some configuration property > > It's a bad idea to have this feature permanently enabled, it slows down > the > > system by design. > > Customer should be able to change strategy on the fly according to time > > periods or load. > > Also, we're going to use this proxy for odd requests or for every 5-th, > > 10-th, 100-th request depends on the load/time/SLA/etc. > > The goal is to perform as much as possible gets-with-consistency > operations > > without stopping the cluster and never find a problem :) > > > > > > As for me it will be great if we can improve consistency guarantees > > > > provided by default. > > Once you checked backups you decreased throughput and increased latency. > > This feature requred only for some financial, nuclear, health systems > when > > you should be additionally sure about consistency. > > It's like a > > - read from backups > > - data modification outside the transaction > > - using FULL_ASYNC instead of FULL_SYNC, > > sometimes it's possible, sometimes not. > > > > > > 1. It sounds suspicious that reads can cause writes (unexpected > > > > deadlocks might be possible). 
> > Code performs writes > > - key per additional transaction in case original tx was OPTIMISTIC || > > READ_COMMITTED, > > - all keys per same tx in case original tx was PESSIMISTIC && > > !READ_COMMITTED, since you already obtain the locks, > > so, deadlock should be impossible. > > > > > > 2. I do not believe that it is possible to implement a (bugless?) > > > > feature which will fix other bugs. > > It does not fix the bugs, it looks for inconsistency (no matter how it > > happened) and reports using events (previous state and how it was fixed). > > This allows continuing processing for all the entries, even inconsistent. > > But, each such fix should be rechecked manually, for sure. > > > > On Tue, Apr 16, 2019 at 11:39 AM Павлухин Иван > wrote: > > > > > Anton, > > > > > > Thank you for your effort for improving consistency guarantees > > > provided by Ignite. > > > > > > The subject sounds really vital. Could you please elaborate why it > > > comes as an on-demand enabled proxy but not as a mode enabled by some > > > configuration property (or even as a default behavior)? How do you see > > > the future development of such consistency checks? As for me it will > > > be great if we can im
[jira] [Created] (IGNITE-11756) SQL: implement a table row count statistics for the local queries
Roman Kondakov created IGNITE-11756: --- Summary: SQL: implement a table row count statistics for the local queries Key: IGNITE-11756 URL: https://issues.apache.org/jira/browse/IGNITE-11756 Project: Ignite Issue Type: Improvement Components: sql Reporter: Roman Kondakov

Row count statistics should help the H2 optimizer select a better query execution plan. Currently the row count supplied to the H2 engine is a hardcoded value == 1 (see {{org.h2.index.Index#getRowCountApproximation}}). As a first step we can provide the actual table size in the case of a local query. To avoid counting the size on each invocation, we can cache the row count value and invalidate it in some cases:
* Rebalancing
* Multiple updates (after the initial loading)
* On timeout (e.g. 1 minute)
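The caching scheme outlined above can be sketched as follows (names are hypothetical): recount only when the cached value is missing, too many updates have accumulated, or the timeout has expired.

```java
/** Sketch of a cached row-count statistic invalidated by updates, rebalance, or a timeout. */
public class RowCountStat {
    private final long timeoutNanos;
    private final int updatesThreshold;

    private long cached = -1;          // -1 means "unknown, must recount"
    private long cachedAtNanos;
    private int updatesSinceCache;

    public RowCountStat(long timeoutMs, int updatesThreshold) {
        this.timeoutNanos = timeoutMs * 1_000_000L;
        this.updatesThreshold = updatesThreshold;
    }

    /** Called on table updates; enough of them invalidate the cached count. */
    public synchronized void onUpdate() {
        updatesSinceCache++;
    }

    /** Called on rebalancing: drop the cached value entirely. */
    public synchronized void onRebalance() {
        cached = -1;
    }

    /** Returns the cached count, recounting (via the supplier) only when it is stale. */
    public synchronized long rowCount(java.util.function.LongSupplier actualCount) {
        long now = System.nanoTime();

        boolean stale = cached < 0
            || updatesSinceCache >= updatesThreshold
            || now - cachedAtNanos > timeoutNanos;

        if (stale) {
            cached = actualCount.getAsLong();
            cachedAtNanos = now;
            updatesSinceCache = 0;
        }

        return cached;
    }
}
```

An approximation like this is acceptable for the optimizer: {{getRowCountApproximation}} only needs a value of the right magnitude, not an exact count.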
Re: Impossible memory region created in TC test
Hello, Igniters. > Why can a test on TC allocate a Data Region bigger than the amount of available > RAM? It seems we are using `echo 1 > /proc/sys/vm/overcommit_memory` on TC, so I should fix my test :). On Mon, 15/04/2019 at 13:41 +0300, Nikolay Izhikov wrote: > Hello, Ilya. > > > Ignite does not touch every page of data region, and until you touch a page > > OS will not allocate any specific physical RAM to the virtual RAM address > > of that page. > > This is not true. > Take a look at this discussion [1] > > > Moreover, AFAIK Ignite will not even allocate all the memory permitted by > > data region until it is needed > > Maybe I am missing something, but > > 1. If persistenceEnabled = false, Ignite will allocate 1 segment on start [2] > 2. If persistenceEnabled = true, Ignite will allocate all segments on start [3] > > > If you would use Pre-Touch feature which was suggested in this developer > > list a few months ago, you will see it fail explicitly. > > Locally, the test already works as expected. > But on TC it fails and a DataRegion of 1024 GiB can be created. > > It seems there is some flag on TC that enables this behaviour. > > [1] > http://apache-ignite-developers.2346864.n4.nabble.com/Data-regions-on-client-nodes-td32834.html > [2] > https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/pagemem/impl/PageMemoryNoStoreImpl.java#L245 > [3] > https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/pagemem/PageMemoryImpl.java#L386 > > > On Mon, 15/04/2019 at 10:18 +0300, Ilya Kasnacheev wrote: > > Hello! > > > > Ignite does not touch every page of data region, and until you touch a page > > OS will not allocate any specific physical RAM to the virtual RAM address > > of that page. > > > > Moreover, AFAIK Ignite will not even allocate all the memory permitted by > > data region until it is needed. 
It will allocate memory in chunks, which > > means your system will slow to a crawl trying to find RAM for the next chunk > > as you try to load data into such a node. > > > > If you used the Pre-Touch feature which was suggested in this developer > > list a few months ago, you would see it fail explicitly. > > > > Regards,
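The overcommit policy the thread refers to can be inspected programmatically. A minimal sketch (assuming a Linux host; the class name is arbitrary) that reads the kernel's overcommit mode:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Reads the Linux virtual-memory overcommit policy:
 *   0 = heuristic overcommit (the default),
 *   1 = always overcommit (the TC setting suspected above),
 *   2 = strict accounting.
 * Returns -1 where /proc is unavailable (non-Linux platforms).
 */
public class OvercommitCheck {
    static int overcommitMode() {
        Path p = Paths.get("/proc/sys/vm/overcommit_memory");
        try {
            return Integer.parseInt(new String(Files.readAllBytes(p)).trim());
        }
        catch (Exception e) {
            return -1; // file missing or unreadable
        }
    }
}
```

With mode 1, `malloc`-style reservations of a 1024 GiB region succeed regardless of physical RAM, which would explain the "impossible" data region on TC.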
[jira] [Created] (IGNITE-11757) Missed partitions during rebalancing when new blank node joins
Ilya Kasnacheev created IGNITE-11757: Summary: Missed partitions during rebalancing when new blank node joins Key: IGNITE-11757 URL: https://issues.apache.org/jira/browse/IGNITE-11757 Project: Ignite Issue Type: Bug Components: cache Reporter: Ilya Kasnacheev Assignee: Ivan Rakov Please take a look at the newly added test GridCachePartitionedSupplyEventsSelfTest.testSupplyEvents. There's logging of missed partitions during rebalancing, and as you can see, partitions are missed even when a new node joins a stable topology, with no nodes leaving. The expected behavior is that in this case no partitions will be missed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11758) Python thin: a lot of documentation files without license header
Igor Sapego created IGNITE-11758: Summary: Python thin: a lot of documentation files without license header Key: IGNITE-11758 URL: https://issues.apache.org/jira/browse/IGNITE-11758 Project: Ignite Issue Type: Bug Components: documentation, thin client Affects Versions: 2.7 Reporter: Igor Sapego Fix For: 2.8 There are a lot of .rst documentation files in modules/platforms/python/docs/ that do not contain a license header. We need to either delete them if they are auto-generated or add headers to them if they are not. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
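A first step for the ticket is simply listing the offending files. A hedged sketch (the class name and the marker string are assumptions; Apache headers conventionally contain the phrase checked below) that walks a directory and flags .rst files without a license header:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Finds .rst files whose content does not mention the Apache license,
 * so they can be reviewed: deleted if auto-generated, fixed otherwise.
 */
public class LicenseHeaderCheck {
    static final String MARKER = "Licensed to the Apache Software Foundation";

    static List<Path> filesMissingHeader(Path root) throws IOException {
        try (Stream<Path> files = Files.walk(root)) {
            return files
                .filter(p -> p.toString().endsWith(".rst"))
                .filter(p -> {
                    try {
                        return !new String(Files.readAllBytes(p)).contains(MARKER);
                    }
                    catch (IOException e) {
                        return true; // unreadable file: flag it for review
                    }
                })
                .collect(Collectors.toList());
        }
    }
}
```

Running it over modules/platforms/python/docs/ would produce the list of files to triage.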
[jira] [Created] (IGNITE-11759) [ML] Duplicate dependencies for ml artifacts
Yury Babak created IGNITE-11759: --- Summary: [ML] Duplicate dependencies for ml artifacts Key: IGNITE-11759 URL: https://issues.apache.org/jira/browse/IGNITE-11759 Project: Ignite Issue Type: Improvement Components: ml Affects Versions: 2.7 Reporter: Yury Babak -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11760) [TC Bot] Support escaping or replacement of vertical dash in the suite name
Dmitriy Pavlov created IGNITE-11760: --- Summary: [TC Bot] Support escaping or replacement of vertical dash in the suite name Key: IGNITE-11760 URL: https://issues.apache.org/jira/browse/IGNITE-11760 Project: Ignite Issue Type: Task Reporter: Dmitriy Pavlov Assignee: Dmitriy Pavlov Usage of the same special symbol in JIRA makes the TC Bot visa unreadable. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
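The clash arises because the vertical bar delimits table cells in JIRA wiki markup, so a suite name containing `|` breaks the visa's table layout. One possible workaround, sketched below as an assumption rather than the Bot's actual fix, is to substitute a visually similar character before rendering:

```java
/**
 * Hypothetical escaper for suite names embedded in JIRA wiki markup.
 * Replaces the table-cell delimiter '|' with the look-alike broken-bar
 * character U+00A6 so the visa table stays readable. Whether a true
 * escape sequence is preferable depends on the JIRA renderer in use.
 */
public class SuiteNameEscaper {
    static String escapeForJira(String suiteName) {
        return suiteName.replace('|', '\u00A6'); // '¦' broken bar
    }
}
```

A replacement keeps the name legible; a removal would make distinct suite names collide.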
[MTCGA]: new failures in builds [3616767] needs to be handled
Hi Igniters, I've detected some new issues on TeamCity to be handled. You are more than welcome to help. If your changes can lead to these failure(s): We're grateful that you volunteered to make a contribution to this project, but things change and you may no longer be able to finalize your contribution. Could you respond to this email and indicate whether you wish to continue and fix the test failures, or step down so that some committer may revert your commit. *Recently contributed test failed in master GridP2PComputeWithNestedEntryProcessorTest.testContinuousMode https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=-266239113081528&branch=%3Cdefault%3E&tab=testDetails *Recently contributed test failed in master GridP2PComputeWithNestedEntryProcessorTest.testSharedMode https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=490604068578396435&branch=%3Cdefault%3E&tab=testDetails Changes that may have led to the failure were made by - vldpyatkov https://ci.ignite.apache.org/viewModification.html?modId=880634 - Here's a reminder of what contributors agreed to do https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute - Should you have any questions please contact dev@ignite.apache.org Best Regards, Apache Ignite TeamCity Bot https://github.com/apache/ignite-teamcity-bot Notification generated at 19:44:42 16-04-2019
[jira] [Created] (IGNITE-11761) Normalize encoding for Ignite .NET test file
Dmitriy Pavlov created IGNITE-11761: --- Summary: Normalize encoding for Ignite .NET test file Key: IGNITE-11761 URL: https://issues.apache.org/jira/browse/IGNITE-11761 Project: Ignite Issue Type: Task Reporter: Dmitriy Pavlov It is encoded in UTF-16, but all other files are UTF-8. IDEA blocks me from changing the encoding because a BOM exists. https://stackoverflow.com/questions/32986445/remove-a-bom-character-in-a-file -- This message was sent by Atlassian JIRA (v7.6.3#76005)
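The conversion itself is straightforward in Java. A minimal sketch (assuming, as the ticket states, that the file is UTF-16 with a BOM): the `UTF_16` charset detects the byte order from the BOM and strips it while decoding, so writing the decoded text back as UTF-8 yields a BOM-free file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Re-encodes a UTF-16 file (with BOM) as plain UTF-8 without a BOM. */
public class NormalizeEncoding {
    static void toUtf8(Path file) throws IOException {
        byte[] raw = Files.readAllBytes(file);

        // StandardCharsets.UTF_16 identifies the byte order from an optional
        // BOM and consumes it during decoding.
        String text = new String(raw, StandardCharsets.UTF_16);

        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
    }
}
```

This sidesteps the IDE entirely, so the BOM complaint never arises.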
[jira] [Created] (IGNITE-11762) Test testClientStartCloseServersRestart causes hang of the whole Cache 2 suite in master
Ivan Rakov created IGNITE-11762: --- Summary: Test testClientStartCloseServersRestart causes hang of the whole Cache 2 suite in master Key: IGNITE-11762 URL: https://issues.apache.org/jira/browse/IGNITE-11762 Project: Ignite Issue Type: Bug Reporter: Ivan Rakov Assignee: Pavel Kovalenko Fix For: 2.8 Attempt to restart server node in test hangs: {code:java} [2019-04-16 19:56:45,049][WARN ][restart-1][GridCachePartitionExchangeManager] Failed to wait for initial partition map exchange. Possible reasons are: ^-- Transactions in deadlock. ^-- Long running transactions (ignore if this is the case). ^-- Unreleased explicit locks. {code} The reason is that previous PME (late affinity assignment) still hangs due to pending transaction: {code:java} [2019-04-16 19:56:23,717][WARN ][exchange-worker-#1039%cache.IgniteClientCacheStartFailoverTest3%][diagnostic] Pending transactions: [2019-04-16 19:56:23,718][WARN ][exchange-worker-#1039%cache.IgniteClientCacheStartFailoverTest3%][diagnostic] >>> [txVer=AffinityTopologyVersion [topVer=11, minorTopVer=0], exchWait=true, tx=GridDhtTxLocal [nearNodeId=8559bfe0-3d4a-4090-a457-6df0eba5, nearFutId=1edc7172a61-941f9dde-2b60-4a1f-8213-7d23d738bf33, nearMiniId=1, nearFinFutId=null, nearFinMiniId=0, nearXidVer=GridCacheVersion [topVer=166913752, order=1555433759036, nodeOrder=6], lb=null, super=GridDhtTxLocalAdapter [nearOnOriginatingNode=false, nearNodes=KeySetView [], dhtNodes=KeySetView [9ef33532-0e4a-4561-b57e-042afe10], explicitLock=false, super=IgniteTxLocalAdapter [completedBase=null, sndTransformedVals=false, depEnabled=false, txState=IgniteTxStateImpl [activeCacheIds=[-1062368467], recovery=false, mvccEnabled=true, mvccCachingCacheIds=[], txMap=HashSet []], super=IgniteTxAdapter [xidVer=GridCacheVersion [topVer=166913752, order=1555433759045, nodeOrder=10], writeVer=null, implicit=false, loc=true, threadId=1210, startTime=1555433762847, nodeId=0088e9b8-f859-4d14-8071-6388e473, startVer=GridCacheVersion [topVer=166913752, 
order=1555433759045, nodeOrder=10], endVer=null, isolation=REPEATABLE_READ, concurrency=PESSIMISTIC, timeout=0, sysInvalidate=false, sys=false, plc=2, commitVer=GridCacheVersion [topVer=166913752, order=1555433759045, nodeOrder=10], finalizing=NONE, invalidParts=null, state=MARKED_ROLLBACK, timedOut=false, topVer=AffinityTopologyVersion [topVer=11, minorTopVer=0], mvccSnapshot=MvccSnapshotResponse [futId=292, crdVer=1555433741506, cntr=395, opCntr=1, txs=[394], cleanupVer=390, tracking=0], skipCompletedVers=false, parentTx=null, duration=20866ms, onePhaseCommit=false], size=0 {code} However, load threads don't start any explicit transactions: they either hang on put()/get() or on clientCache.close(). Rolling back IGNITE-10799 resolves the issue (however, test remains flaky with ~10% fail rate due to unhandled TransactionSerializationException). -- This message was sent by Atlassian JIRA (v7.6.3#76005)