[jira] [Created] (IGNITE-13371) Sporadic partition inconsistency after historical rebalancing of updates with same key put-remove pattern
Ivan Rakov created IGNITE-13371:
-----------------------------------

             Summary: Sporadic partition inconsistency after historical rebalancing of updates with same key put-remove pattern
                 Key: IGNITE-13371
                 URL: https://issues.apache.org/jira/browse/IGNITE-13371
             Project: Ignite
          Issue Type: Bug
            Reporter: Ivan Rakov
            Assignee: Ivan Rakov
             Fix For: 2.10


h4. Scenario
# Start 3 servers and 3 clients, create caches.
# Clients start combined put + 1% remove of data in PESSIMISTIC/REPEATABLE_READ transactions.
## Kill one node.
## Restart one node.
# Ensure all transactions have completed.
# Run idle_verify.

Expected: no conflicts found.

Actual:
{noformat}
[12:03:18][:55 :230] Control utility --cache idle_verify --skip-zeros --cache-filter PERSISTENT
[12:03:20][:55 :230] Control utility [ver. 8.7.13#20200228-sha1:7b016d63]
[12:03:20][:55 :230] 2020 Copyright(C) GridGain Systems, Inc. and Contributors
[12:03:20][:55 :230] User: prtagent
[12:03:20][:55 :230] Time: 2020-03-03T12:03:19.836
[12:03:20][:55 :230] Command [CACHE] started
[12:03:20][:55 :230] Arguments: --host 172.25.1.11 --port 11211 --cache idle_verify --skip-zeros --cache-filter PERSISTENT
[12:03:20][:55 :230]
[12:03:20][:55 :230] idle_verify task was executed with the following args: caches=[], excluded=[], cacheFilter=[PERSISTENT]
[12:03:20][:55 :230] idle_verify check has finished, found 1 conflict partitions: [counterConflicts=0, hashConflicts=1]
[12:03:20][:55 :230] Hash conflicts:
[12:03:20][:55 :230] Conflict partition: PartitionKeyV2 [grpId=1338167321, grpName=cache_group_3_088_1, partId=24]
[12:03:20][:55 :230] Partition instances: [PartitionHashRecordV2 [isPrimary=false, consistentId=node_1_2, updateCntr=172349, partitionState=OWNING, size=6299, partHash=157875238], PartitionHashRecordV2 [isPrimary=true, consistentId=node_1_1, updateCntr=172349, partitionState=OWNING, size=6299, partHash=157875238], PartitionHashRecordV2 [isPrimary=false, consistentId=node_1_4, updateCntr=172349, partitionState=OWNING, size=6300, partHash=-944532882]]
[12:03:20][:55 :230] Command [CACHE] finished with code: 0
[12:03:20][:55 :230] Control utility has completed execution at: 2020-03-03T12:03:20.593
[12:03:20][:55 :230] Execution time: 757 ms
{noformat}


--
This message was sent by Atlassian Jira
(v8.3.4#803005)
Re: [DISCUSSION] Add index rebuild time metrics
I seem to be in the minority here :)
Fine, let's make it as clear as possible which metric method (localCacheSize)
should be called in order to retrieve a 100% progress milestone.
I've left comments in the PR.

On Tue, Aug 11, 2020 at 4:31 PM Nikolay Izhikov wrote:

> > I propose to stick with a cache-group level metric (e.g.
> > getIndexBuildProgress)
>
> +1
>
> > that returns a float from 0 to 1, which is calculated as [processedKeys]
> > / [localCacheSize].
>
> From my point of view, we shouldn't do calculations on the Ignite side if
> we can avoid it.
> I'd rather provide two separate metrics - processedKeys and localCacheSize.
>
> > On 11 Aug 2020, at 16:26, Ivan Rakov wrote:
> >
> >>
> >> As a compromise, I can add jmx methods (rebuilding indexes in the process
> >> and the percentage of rebuilding) for the entire node, but I tried to find
> >> a suitable place and did not find it, tell me where to add it?
> >
> > I have checked existing JMX beans. To be honest, I struggle to find a
> > suitable place as well.
> > We have ClusterMetrics that may represent the state of a local node, but
> > this class is also used for aggregated cluster metrics. I can't propose a
> > reasonable way to merge percentages from different nodes.
> > On the other hand, total index rebuild for all caches isn't a common
> > scenario. It's either performed after manual index.bin removal or after
> > index creation; both operations are performed on cache / cache-group level.
> > Also, all other similar metrics are provided on cache-group level.
> >
> > I propose to stick with a cache-group level metric (e.g.
> > getIndexBuildProgress) that returns a float from 0 to 1, which is
> > calculated as [processedKeys] / [localCacheSize]. Even if a user handles
> > metrics through Zabbix, I anticipate that he'll perform this calculation on
> > his own in order to estimate progress. Let's help him a bit and perform it
> > on the system side.
> > If a per-group percentage metric is present, I
> > think getIndexRebuildKeyProcessed becomes redundant.
> >
> > On Tue, Aug 11, 2020 at 8:20 AM ткаленко кирилл wrote:
> >
> >> Hi, Ivan!
> >>
> >> What precision would be sufficient?
> >>> If the progress is very slow, I don't see issues with tracking it if the
> >>> percentage float has enough precision.
> >>
> >> I think we can add a mention getting cache size.
> >>> 1. Gain an understanding that local cache size
> >>> (CacheMetricsImpl#getCacheSize) should be used as a 100% milestone (it
> >>> isn't mentioned neither in javadoc nor in JMX method description).
> >>
> >> Do you think users collect metrics with their hands? I think this is done
> >> by other systems, such as zabbix.
> >>> 2. Manually calculate sum of all metrics and divide to sum of all cache
> >>> sizes.
> >>
> >> As a compromise, I can add jmx methods (rebuilding indexes in the process
> >> and the percentage of rebuilding) for the entire node, but I tried to find
> >> a suitable place and did not find it, tell me where to add it?
> >>> On the other hand, % of index rebuild progress is self-descriptive. I don't
> >>> understand why we tend to make user's life harder.
> >>
> >> 10.08.2020, 21:57, "Ivan Rakov" :
> >>>> This metric can be used only for local node, to get size of cache use
> >>>> org.apache.ignite.internal.processors.cache.CacheMetricsImpl#getCacheSize.
> >>>
> >>> Got it, agree.
> >>>
> >>>> If there is a lot of data in node that can be rebuilt, percentage may
> >>>> change very rarely and may not give an estimate of how much time is left.
> >>>> If we see for example that 50_000 keys are rebuilt once a minute, and we
> >>>> have 1_000_000_000 keys, then we can have an approximate estimate. What do
> >>>> you think of that?
> >>>
> >>> If the progress is very slow, I don't see issues with tracking it if the
> >>> percentage float has enough precision.
> >>> Still, usability of the metric concerns me. In order to estimate remaining
> >>> time of index rebuild, user should:
> >>> 1. Gain an understanding that local cache size
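The metric proposed in this message — a float from 0 to 1 computed as [processedKeys] / [localCacheSize] — can be sketched as follows. This is only an illustration of the calculation being discussed; the class and method names are assumptions, not the final Ignite API.

```java
/**
 * Illustrative sketch of the cache-group level progress metric discussed
 * above (e.g. getIndexBuildProgress). Names are assumptions, not Ignite API.
 */
public final class IndexRebuildProgressSketch {
    /**
     * @param processedKeys Number of keys already passed through index rebuild on this node.
     * @param localCacheSize Local cache size, used as the 100% milestone
     *        (cf. CacheMetricsImpl#getCacheSize in the thread).
     * @return Progress as a float in [0, 1]; an empty cache reports 1 (nothing to rebuild).
     */
    public static float indexBuildProgress(long processedKeys, long localCacheSize) {
        if (localCacheSize <= 0)
            return 1.0f;

        // Clamp: processedKeys may transiently exceed the sampled cache size
        // if the cache shrinks concurrently with the rebuild.
        return Math.min(1.0f, (float)processedKeys / localCacheSize);
    }
}
```

Exposing the ready-made ratio (rather than two raw counters) is exactly the usability trade-off debated above: the division is trivial, but doing it on the system side spares every monitoring setup from repeating it.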
Re: [DISCUSSION] Add index rebuild time metrics
>
> As a compromise, I can add jmx methods (rebuilding indexes in the process
> and the percentage of rebuilding) for the entire node, but I tried to find
> a suitable place and did not find it, tell me where to add it?

I have checked existing JMX beans. To be honest, I struggle to find a
suitable place as well.
We have ClusterMetrics that may represent the state of a local node, but
this class is also used for aggregated cluster metrics. I can't propose a
reasonable way to merge percentages from different nodes.
On the other hand, total index rebuild for all caches isn't a common
scenario. It's either performed after manual index.bin removal or after
index creation; both operations are performed on cache / cache-group level.
Also, all other similar metrics are provided on cache-group level.

I propose to stick with a cache-group level metric (e.g.
getIndexBuildProgress) that returns a float from 0 to 1, which is
calculated as [processedKeys] / [localCacheSize]. Even if a user handles
metrics through Zabbix, I anticipate that he'll perform this calculation
on his own in order to estimate progress. Let's help him a bit and perform
it on the system side. If a per-group percentage metric is present, I
think getIndexRebuildKeyProcessed becomes redundant.

On Tue, Aug 11, 2020 at 8:20 AM ткаленко кирилл wrote:

> Hi, Ivan!
>
> What precision would be sufficient?
> > If the progress is very slow, I don't see issues with tracking it if the
> > percentage float has enough precision.
>
> I think we can add a mention getting cache size.
> > 1. Gain an understanding that local cache size
> > (CacheMetricsImpl#getCacheSize) should be used as a 100% milestone (it
> > isn't mentioned neither in javadoc nor in JMX method description).
>
> Do you think users collect metrics with their hands? I think this is done
> by other systems, such as zabbix.
> > 2. Manually calculate sum of all metrics and divide to sum of all cache
> > sizes.
>
> As a compromise, I can add jmx methods (rebuilding indexes in the process
> and the percentage of rebuilding) for the entire node, but I tried to find
> a suitable place and did not find it, tell me where to add it?
> > On the other hand, % of index rebuild progress is self-descriptive. I don't
> > understand why we tend to make user's life harder.
>
> 10.08.2020, 21:57, "Ivan Rakov" :
> >> This metric can be used only for local node, to get size of cache use
> >> org.apache.ignite.internal.processors.cache.CacheMetricsImpl#getCacheSize.
> >
> > Got it, agree.
> >
> >> If there is a lot of data in node that can be rebuilt, percentage may
> >> change very rarely and may not give an estimate of how much time is left.
> >> If we see for example that 50_000 keys are rebuilt once a minute, and we
> >> have 1_000_000_000 keys, then we can have an approximate estimate. What do
> >> you think of that?
> >
> > If the progress is very slow, I don't see issues with tracking it if the
> > percentage float has enough precision.
> > Still, usability of the metric concerns me. In order to estimate remaining
> > time of index rebuild, user should:
> > 1. Gain an understanding that local cache size
> > (CacheMetricsImpl#getCacheSize) should be used as a 100% milestone (it
> > isn't mentioned neither in javadoc nor in JMX method description).
> > 2. Manually calculate sum of all metrics and divide to sum of all cache
> > sizes.
> > On the other hand, % of index rebuild progress is self-descriptive. I don't
> > understand why we tend to make user's life harder.
> >
> > --
> > Best regards,
> > Ivan
> >
> > On Mon, Aug 10, 2020 at 8:53 PM ткаленко кирилл wrote:
> >
> >> Hi, Ivan!
> >>
> >> For this you can use
> >> org.apache.ignite.cache.CacheMetrics#IsIndexRebuildInProgress
> >> > How can a local number of processed keys can help us to understand when
> >> > index rebuild will be finished?
> >>
> >> This metric can be used only for local node, to get size of cache use
> >> org.apache.ignite.internal.processors.cache.CacheMetricsImpl#getCacheSize.
> >> > We can't compare metric value with cache.size(). First one is node-local,
> >> > while cache size covers all partitions in the cluster.
> >>
> >> If there is a lot of data in node that can be rebuilt, percentage may
> >> change very rarely and may not give an estimate of how much time is left.
> >> If we see for example that 50_000 keys are rebuilt once
Re: [DISCUSSION] Add index rebuild time metrics
>
> This metric can be used only for local node, to get size of cache use
> org.apache.ignite.internal.processors.cache.CacheMetricsImpl#getCacheSize.

Got it, agree.

> If there is a lot of data in node that can be rebuilt, percentage may
> change very rarely and may not give an estimate of how much time is left.
> If we see for example that 50_000 keys are rebuilt once a minute, and we
> have 1_000_000_000 keys, then we can have an approximate estimate. What do
> you think of that?

If the progress is very slow, I don't see issues with tracking it if the
percentage float has enough precision.
Still, usability of the metric concerns me. In order to estimate remaining
time of index rebuild, user should:
1. Gain an understanding that local cache size
(CacheMetricsImpl#getCacheSize) should be used as a 100% milestone (it
isn't mentioned neither in javadoc nor in JMX method description).
2. Manually calculate sum of all metrics and divide to sum of all cache
sizes.
On the other hand, % of index rebuild progress is self-descriptive. I don't
understand why we tend to make user's life harder.

--
Best regards,
Ivan

On Mon, Aug 10, 2020 at 8:53 PM ткаленко кирилл wrote:

> Hi, Ivan!
>
> For this you can use
> org.apache.ignite.cache.CacheMetrics#IsIndexRebuildInProgress
> > How can a local number of processed keys can help us to understand when
> > index rebuild will be finished?
>
> This metric can be used only for local node, to get size of cache use
> org.apache.ignite.internal.processors.cache.CacheMetricsImpl#getCacheSize.
> > We can't compare metric value with cache.size(). First one is node-local,
> > while cache size covers all partitions in the cluster.
>
> If there is a lot of data in node that can be rebuilt, percentage may
> change very rarely and may not give an estimate of how much time is left.
> If we see for example that 50_000 keys are rebuilt once a minute, and we
> have 1_000_000_000 keys, then we can have an approximate estimate. What do
> you think of that?
> > I find one single metric much more usable. It would be perfect if metric
> > value is represented in percentage, e.g. current progress of local node
> > index rebuild is 60%.
>
> 10.08.2020, 19:11, "Ivan Rakov" :
> > Folks,
> >
> > Sorry for coming late to the party. I've taken a look at this issue during
> > review.
> >
> > How can a local number of processed keys can help us to understand when
> > index rebuild will be finished?
> > We can't compare metric value with cache.size(). First one is node-local,
> > while cache size covers all partitions in the cluster.
> > Also, I don't understand why we need to keep separate metrics for all
> > caches. Of course, the metric becomes more fair, but obviously harder to
> > make conclusions on whether "the index rebuild" process is over (and the
> > cluster is ready to process queries quickly).
> >
> > I find one single metric much more usable. It would be perfect if metric
> > value is represented in percentage, e.g. current progress of local node
> > index rebuild is 60%.
> >
> > --
> > Best regards,
> > Ivan
> >
> > On Fri, Jul 24, 2020 at 1:35 PM Stanislav Lukyanov <stanlukya...@gmail.com>
> > wrote:
> >
> >> Got it. I thought that index building and index rebuilding are essentially
> >> the same,
> >> but now I see that they are different: index rebuilding cares about all
> >> indexes at once while index building cares about particular ones.
> >>
> >> Kirill's approach sounds good.
> >>
> >> Stan
> >>
> >> > On 20 Jul 2020, at 14:54, Alexey Goncharuk <alexey.goncha...@gmail.com>
> >> > wrote:
> >> >
> >> > Stan,
> >> >
> >> > Currently we never build indexes one-by-one - we always use a cache data
> >> > row visitor which either updates all indexes (see IndexRebuildFullClosure)
> >> > or updates a set of all indexes that need to catch up (see
> >> > IndexRebuildPartialClosure). Given that, I do not see any need for
> >> > per-index rebuild status as this status will be updated for all outdated
> >> > indexes simultaneously.
> >> >
> >> > Kirill's approach for the total number of processed keys per cache seems
> >> > reasonable to me.
> >> >
> >> > --AG
> >> >
> >> > On Fri, 3 Jul 2020 at 10:12, ткаленко кирилл wrote:
> >> >
> >> >> Hi, Stan!
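Kirill's rate-based estimate quoted above (50_000 keys rebuilt per minute against 1_000_000_000 keys) boils down to dividing the remaining keys by the observed rate. A minimal sketch of that arithmetic, with made-up names rather than any Ignite API:

```java
/**
 * Back-of-the-envelope estimate from the thread: observe the rebuild rate
 * and divide the remaining keys by it. Names are illustrative assumptions.
 */
public final class RebuildEtaSketch {
    /**
     * @param totalKeys Total keys to be rebuilt on the node.
     * @param processedKeys Keys rebuilt so far.
     * @param keysPerMinute Observed rebuild rate.
     * @return Estimated minutes until completion at the observed rate.
     */
    public static double minutesLeft(long totalKeys, long processedKeys, double keysPerMinute) {
        if (keysPerMinute <= 0)
            return Double.POSITIVE_INFINITY; // No observed progress yet: no estimate.

        return (totalKeys - processedKeys) / keysPerMinute;
    }
}
```

With the numbers from the thread, minutesLeft(1_000_000_000L, 0, 50_000) gives 20_000 minutes (roughly two weeks), which illustrates Kirill's point that a coarse percentage alone may barely move while a rate-based estimate stays informative.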
Re: [DISCUSSION] Add index rebuild time metrics
Folks,

Sorry for coming late to the party. I've taken a look at this issue during
review.

How can a local number of processed keys help us to understand when
index rebuild will be finished?
We can't compare the metric value with cache.size(). The first one is node-local,
while cache size covers all partitions in the cluster.
Also, I don't understand why we need to keep separate metrics for all
caches. Of course, the metric becomes more fair, but obviously harder to
make conclusions on whether "the index rebuild" process is over (and the
cluster is ready to process queries quickly).

I find one single metric much more usable. It would be perfect if the metric
value is represented in percentage, e.g. current progress of local node
index rebuild is 60%.

--
Best regards,
Ivan

On Fri, Jul 24, 2020 at 1:35 PM Stanislav Lukyanov wrote:

> Got it. I thought that index building and index rebuilding are essentially
> the same,
> but now I see that they are different: index rebuilding cares about all
> indexes at once while index building cares about particular ones.
>
> Kirill's approach sounds good.
>
> Stan
>
> > On 20 Jul 2020, at 14:54, Alexey Goncharuk wrote:
> >
> > Stan,
> >
> > Currently we never build indexes one-by-one - we always use a cache data
> > row visitor which either updates all indexes (see IndexRebuildFullClosure)
> > or updates a set of all indexes that need to catch up (see
> > IndexRebuildPartialClosure). Given that, I do not see any need for
> > per-index rebuild status as this status will be updated for all outdated
> > indexes simultaneously.
> >
> > Kirill's approach for the total number of processed keys per cache seems
> > reasonable to me.
> >
> > --AG
> >
> > On Fri, 3 Jul 2020 at 10:12, ткаленко кирилл wrote:
> >
> >> Hi, Stan!
> >>
> >> Perhaps it is worth clarifying what exactly I wanted to say.
> >> Now we have 2 processes: building and rebuilding indexes.
> >>
> >> At the moment, we have some metrics for rebuilding indexes:
> >> "IsIndexRebuildInProgress", "IndexBuildCountPartitionsLeft".
> >>
> >> I suggest adding another metric "IndexRebuildKeyProcessed", which will
> >> allow you to determine how many records are left to rebuild for a cache.
> >>
> >> I think your comments are more about building an index that may need more
> >> metrics, but I think you should do it in a separate ticket.
> >>
> >> 03.07.2020, 03:09, "Stanislav Lukyanov" :
> >>> If multiple indexes are to be built, a "number of indexed keys" metric may
> >>> be misleading.
> >>>
> >>> As a cluster admin, I'd like to know:
> >>> - Are all indexes ready on a node?
> >>> - How many indexes are to be built?
> >>> - How much resources are used by the index building (how many threads
> >>> are used)?
> >>> - Which index(es?) is being built right now?
> >>> - How much time until the current (single) index building finishes? Here
> >>> "time" can be a lot of things: partitions, entries, percent of the cache,
> >>> minutes and hours
> >>> - How much time until all indexes are built?
> >>> - How much does it take to build each of my indexes / a single index of
> >>> my cache on average?
> >>>
> >>> I think we need a set of metrics and/or log messages to solve all of
> >>> these questions.
> >>> I imagine something like:
> >>> - numberOfIndexesToBuild
> >>> - a standard set of metrics on the index building thread pool (do we
> >>> already have it?)
> >>> - currentlyBuiltIndexName (assuming we only build one at a time, which is
> >>> probably not true)
> >>> - for the "time" metrics I think percentage might be the best as it's
> >>> the easiest to understand; we may add multiple metrics though.
> >>> - For "time per each index" I'd add detailed log messages stating how
> >>> long did it take to build a particular index
> >>>
> >>> Thanks,
> >>> Stan
> >>>
> >>> > On 26 Jun 2020, at 12:49, ткаленко кирилл wrote:
> >>> >
> >>> > Hi, Igniters.
> >>> >
> >>> > I would like to know if it is possible to estimate how much the index
> >>> > rebuild will take?
> >>> > At the moment, I have found the following metrics [1] and [2] and
> >>> > since the rebuild is based on caches, I think it would be useful to know
> >>> > how many records are processed in indexing. This way we can estimate how
> >>> > long we have to wait for the index to be rebuilt by subtracting [3] and
> >>> > how many records are indexed.
> >>> > I think we should add this metric [4].
> >>> > Comments, suggestions?
> >>> >
> >>> > [1] - https://issues.apache.org/jira/browse/IGNITE-12184
> >>> > [2] - org.apache.ignite.internal.processors.cache.CacheGroupMetricsImpl#idxBuildCntPartitionsLeft
> >>> > [3] - org.apache.ignite.cache.CacheMetrics#getCacheSize
> >>> > [4] - org.apache.ignite.cache.CacheMetrics#getNumberIndexedKeys
Re: Re[2]: Apache Ignite 2.9.0 RELEASE [Time, Scope, Manager]
Hi Alex,

https://issues.apache.org/jira/browse/IGNITE-13306 is merged to master.
Can you please cherry-pick it to 2.9?

On Thu, Jul 30, 2020 at 7:42 PM Ilya Kasnacheev wrote:

> Hello!
>
> I don't think that IGNITE-13006
> <https://issues.apache.org/jira/browse/IGNITE-13006> is a blocker in any
> way. It is a good candidate for 3.0.
>
> ignite-spring will work with 4.x Spring as well as 5.x, and the user is free
> to bump the Spring version. I think bumping this dependency explicitly is
> infeasible since it may break existing code.
>
> Regards,
> --
> Ilya Kasnacheev
>
> On Wed, 22 Jul 2020 at 10:22, Alex Plehanov wrote:
>
> > Guys,
> >
> > We are in the code-freeze phase now. I've moved almost all non-blocker
> > unresolved tickets from 2.9 to the next release. If you think that
> > some ticket is a blocker and should be included into the 2.9 release, please
> > write a note in this thread.
> >
> > There are some tickets with "blocker" priority targeted to 2.9, some of
> > them in "open" state and still unassigned, and I'm not sure we need all of
> > these tickets in 2.9:
> >
> > IGNITE-13006 [1] (Apache Ignite spring libs upgrade from version 4x to
> > spring 5.2 version or later) - Is it really a blocker for the 2.9 release? If
> > yes, can somebody help with resolving this ticket?
> >
> > IGNITE-11942 [2] (IGFS and Hadoop Accelerator Discontinuation) - ticket in
> > "Patch available" state. There is a thread on the dev list related to this
> > ticket ([6]), but as far as I understand we still don't have consensus
> > about the version for this patch (2.9, 2.10, 3.0).
> >
> > IGNITE-12489 [3] (Error during purges by expiration: Unknown page type) -
> > perhaps the issue is already resolved by some related tickets; there is still
> > no reproducer, no additional details and no work in progress. I propose to
> > move this ticket to the next release.
> >
> > IGNITE-12911 [4] (B+Tree Corrupted exception when using a key extracted
> > from a BinaryObject value object --- and SQL enabled) - ticket in "Patch
> > available" state, but there has been no activity since May 2020. Anton
> > Kalashnikov, Ilya Kasnacheev, do we have any updates on this ticket? Is it
> > still in progress?
> >
> > IGNITE-12553 [5] ([IEP-35] public Java metric API) - since the new metrics
> > framework was already released in 2.8 and is still marked with the
> > @IgniteExperimental annotation, I think this ticket is not a blocker. I
> > propose to change the ticket priority and move it to the next release.
> >
> > [1]: https://issues.apache.org/jira/browse/IGNITE-13006
> > [2]: https://issues.apache.org/jira/browse/IGNITE-11942
> > [3]: https://issues.apache.org/jira/browse/IGNITE-12489
> > [4]: https://issues.apache.org/jira/browse/IGNITE-12911
> > [5]: https://issues.apache.org/jira/browse/IGNITE-12553
> > [6]: http://apache-ignite-developers.2346864.n4.nabble.com/DISCUSSION-Complete-Discontinuation-of-IGFS-and-Hadoop-Accelerator-td42282.html
> >
> > On Fri, 17 Jul 2020 at 11:50, Alex Plehanov wrote:
> >
> >> Ivan,
> >>
> >> Merged to 2.9.
> >>
> >> Thanks
> >>
> >> On Fri, 17 Jul 2020 at 01:35, Ivan Rakov wrote:
> >>
> >>> Alex,
> >>>
> >>> Tracing is merged to master:
> >>> https://issues.apache.org/jira/browse/IGNITE-13060
> >>>
> >>> Can you please port it to 2.9?
> >>> For your convenience, there's a PR versus 2.9 with conflicts resolved:
> >>> https://github.com/apache/ignite/pull/8046/files
> >>>
> >>> --
> >>> Best Regards,
> >>> Ivan Rakov
> >>>
> >>> On Wed, Jul 15, 2020 at 5:33 PM Alex Plehanov wrote:
> >>>
> >>>> Ivan,
> >>>>
> >>>> Looks like master is broken after IGNITE-13246 (but everything is ok in
> >>>> the 2.9 branch)
> >>>>
> >>>> On Wed, 15 Jul 2020 at 18:54, Alex Plehanov wrote:
> >>>>
> >>>> > Zhenya, Ivan,
> >>>> >
> >>>> > I've cherry-picked IGNITE-13229 and IGNITE-13246 to the ignite-2.9 branch.
> >>>> > Thank you.
> >>>> >
> >>>> > On Wed, 15 Jul 2020 at 18:31, Ivan Bessonov wrote:
> >>>> >
> >>>> >> Guys,
> >>>> >>
> >>>> >> can
Re: Re[2]: Apache Ignite 2.9.0 RELEASE [Time, Scope, Manager]
Alex,

Tracing is merged to master:
https://issues.apache.org/jira/browse/IGNITE-13060

Can you please port it to 2.9?
For your convenience, there's a PR versus 2.9 with conflicts resolved:
https://github.com/apache/ignite/pull/8046/files

--
Best Regards,
Ivan Rakov

On Wed, Jul 15, 2020 at 5:33 PM Alex Plehanov wrote:

> Ivan,
>
> Looks like master is broken after IGNITE-13246 (but everything is ok in the
> 2.9 branch)
>
> On Wed, 15 Jul 2020 at 18:54, Alex Plehanov wrote:
>
> > Zhenya, Ivan,
> >
> > I've cherry-picked IGNITE-13229 and IGNITE-13246 to the ignite-2.9 branch.
> > Thank you.
> >
> > On Wed, 15 Jul 2020 at 18:31, Ivan Bessonov wrote:
> >
> >> Guys,
> >>
> >> can you please backport
> >> https://issues.apache.org/jira/browse/IGNITE-13246
> >> to ignite-2.9? Me and Alexey Kuznetsov really want these new events in the
> >> release.
> >>
> >> This time I prepared a PR with resolved conflicts:
> >> https://github.com/apache/ignite/pull/8042
> >>
> >> Thank you!
> >>
> >> On Tue, 14 Jul 2020 at 19:39, Zhenya Stanilovsky wrote:
> >>
> >> > Alex, I also suggest merging
> >> > https://issues.apache.org/jira/browse/IGNITE-13229 too: GridClient
> >> > leakage and further TC OOM prevention.
> >> >
> >> > >Ivan,
> >> > >
> >> > >It was already in the release scope as discussed in this thread.
> >> > >
> >> > >On Tue, 14 Jul 2020 at 14:31, Ivan Rakov <ivan.glu...@gmail.com> wrote:
> >> > >
> >> > >> Hi,
> >> > >>
> >> > >> We are still waiting for a final review of Tracing functionality [1]
> >> > >> until the end of tomorrow (July 15).
> >> > >> We anticipate that it will be merged to Ignite master no later than
> >> > >> July 16.
> >> > >>
> >> > >> Sorry for being a bit late here. Alex P., can you include [1] in the
> >> > >> release scope?
> >> > >>
> >> > >> [1]: https://issues.apache.org/jira/browse/IGNITE-13060
> >> > >>
> >> > >> --
> >> > >> Best Regards,
> >> > >> Ivan Rakov
> >> > >>
> >> > >> On Tue, Jul 14, 2020 at 6:16 AM Alexey Kuznetsov <akuznet...@gridgain.com>
> >> > >> wrote:
> >> > >>
> >> > >>> Alex,
> >> > >>>
> >> > >>> Can you cherry-pick to Ignite 2.9 this issue:
> >> > >>> https://issues.apache.org/jira/browse/IGNITE-13246 ?
> >> > >>>
> >> > >>> This issue is about BASELINE events and it is very useful for notifying
> >> > >>> external tools about changes in the baseline.
> >> > >>>
> >> > >>> Thank you!
> >> > >>>
> >> > >>> ---
> >> > >>> Alexey Kuznetsov
> >> >
> >>
> >> --
> >> Sincerely yours,
> >> Ivan Bessonov
Re: Choosing historical rebalance heuristics
> > I think we can modify the heuristic so:
> > 1) Exclude partitions by threshold (IGNITE_PDS_WAL_REBALANCE_THRESHOLD -
> > reduce it to 500)
> > 2) Select only those partitions for historical rebalance where the difference
> > between counters is less than the partition size.

Agreed, let's go this way.

On Thu, Jul 16, 2020 at 11:03 AM Vladislav Pyatkov wrote:

> I completely forgot about another argument in favor of using historical
> rebalance where it is possible: when the cluster decides to use full
> rebalance, demander nodes should clear non-empty partitions.
> This can consume a long time, in some cases comparable with the time of
> the rebalance itself.
> This also supports the heuristics above.
>
> On Thu, Jul 16, 2020 at 12:09 AM Vladislav Pyatkov wrote:
>
> > Ivan,
> >
> > I agree with a combined approach: a threshold for small partitions and a
> > count of updates for partitions that outgrew it.
> > This helps to avoid partitions that are not updated frequently.
> >
> > Reading a big WAL piece (more than 100Gb) can happen when a client
> > configured it intentionally.
> > There is no doubt we can read it; otherwise the WAL space would not have
> > been configured that large.
> >
> > I don't see a connection between the iterator optimization and the issue
> > in the atomic protocol.
> > Reordering in WAL, which happened in a checkpoint where the counter was not
> > changing, is an extremely rare case, and the issue will not be solved for
> > the generic case; this should be fixed within the bounds of the protocol.
> >
> > I think we can modify the heuristic so:
> > 1) Exclude partitions by threshold (IGNITE_PDS_WAL_REBALANCE_THRESHOLD -
> > reduce it to 500)
> > 2) Select only those partitions for historical rebalance where the difference
> > between counters is less than the partition size.
> >
> > Also implement the mentioned optimization for the historical iterator,
> > which may reduce the time of reading a large WAL interval.
> >
> > On Wed, Jul 15, 2020 at 3:15 PM Ivan Rakov wrote:
> >
> >> Hi Vladislav,
> >>
> >> Thanks for raising this topic.
> >> The currently present IGNITE_PDS_WAL_REBALANCE_THRESHOLD (default is 500_000)
> >> is controversial. Assuming that the default number of partitions is 1024,
> >> a cache should contain a really huge amount of data in order to make WAL
> >> delta rebalancing possible. In fact, it's currently disabled for most
> >> production cases, which makes rebalancing of persistent caches unreasonably
> >> long.
> >>
> >> I think your approach [1] makes much more sense than the current
> >> heuristic; let's move forward with the proposed solution.
> >>
> >> Though, there are some other corner cases, e.g. this one:
> >> - Configured size of WAL archive is big (>100 GB)
> >> - Cache has small partitions (e.g. 1000 entries)
> >> - Infrequent updates (e.g. ~100 in the whole WAL history of any node)
> >> - There is another cache with very frequent updates which allocates >99% of
> >> WAL
> >> In such a scenario we may need to iterate over >100 GB of WAL in order to
> >> fetch <1% of needed updates. Even though the amount of network traffic is
> >> still optimized, it would be more effective to transfer partitions with
> >> ~1000 entries fully instead of reading >100 GB of WAL.
> >>
> >> I want to highlight that your heuristic definitely makes the situation
> >> better, but due to possible corner cases we should keep the fallback lever
> >> to restrict or limit historical rebalance as before. Probably, it would be
> >> handy to keep the IGNITE_PDS_WAL_REBALANCE_THRESHOLD property with a low
> >> default value (1000, 500 or even 0) and apply your heuristic only for
> >> partitions with bigger size.
> >>
> >> Regarding case [2]: it looks like an improvement that can mitigate some
> >> corner cases (including the one that I have described). I'm ok with it as
> >> long as it takes data update reordering on backup nodes into account. We
> >> don't track skipped updates for atomic caches. As a result, detection of
> >> the absence of updates between two checkpoint markers with the same
> >> partition counter can be a false positive.
> >>
> >> --
> >> Best Regards,
> >> Ivan Rakov
> >>
> >> On Tue, Jul 14, 2020 at 3:03 PM Vladislav Pyatkov wrote:
> >>
> >> > Hi guys,
> >> >
> >> > I want to implement a more honest he
Re: Choosing historical rebalance heuristics
Hi Vladislav, Thanks for raising this topic. Currently present IGNITE_PDS_WAL_REBALANCE_THRESHOLD (default is 500_000) is controversial. Assuming that the default number of partitions is 1024, cache should contain a really huge amount of data in order to make WAL delta rebalancing possible. In fact, it's currently disabled for most production cases, which makes rebalancing of persistent caches unreasonably long. I think, your approach [1] makes much more sense than the current heuristic, let's move forward with the proposed solution. Though, there are some other corner cases, e.g. this one: - Configured size of WAL archive is big (>100 GB) - Cache has small partitions (e.g. 1000 entries) - Infrequent updates (e.g. ~100 in the whole WAL history of any node) - There is another cache with very frequent updates which allocate >99% of WAL In such scenario we may need to iterate over >100 GB of WAL in order to fetch <1% of needed updates. Even though the amount of network traffic is still optimized, it would be more effective to transfer partitions with ~1000 entries fully instead of reading >100 GB of WAL. I want to highlight that your heuristic definitely makes the situation better, but due to possible corner cases we should keep the fallback lever to restrict or limit historical rebalance as before. Probably, it would be handy to keep IGNITE_PDS_WAL_REBALANCE_THRESHOLD property with a low default value (1000, 500 or even 0) and apply your heuristic only for partitions with bigger size. Regarding case [2]: it looks like an improvement that can mitigate some corner cases (including the one that I have described). I'm ok with it as long as it takes data updates reordering on backup nodes into account. We don't track skipped updates for atomic caches. As a result, detection of the absence of updates between two checkpoint markers with the same partition counter can be false positive. 
-- Best Regards, Ivan Rakov On Tue, Jul 14, 2020 at 3:03 PM Vladislav Pyatkov wrote: > Hi guys, > > I want to implement a more honest heuristic for historical rebalance. > Currently, a cluster decides whether to use historical rebalance based only > on partition size. This threshold is better known by the name of the property > IGNITE_PDS_WAL_REBALANCE_THRESHOLD. > It prevents historical rebalance when a partition is too small, but if the > WAL contains more updates than the partition size, historical rebalance can > still be chosen. > There is a ticket that asks for a fairer heuristic [1]. > > My idea for the implementation is to estimate the size of the data that will > be transferred over the network. In other words, if recovering a partition > that contains M rows in total on another node requires rebalancing a part of > the WAL that contains N updates, historical rebalance should be chosen when > N < M (provided the WAL history is present as well). > > This approach is easy to implement, because the coordinator node knows the > partition sizes and the counter intervals. But in this case the cluster can > still end up scanning a very long WAL history for only a few updates. I > assume it is possible to work around this if the historical rebalance > iterator skips checkpoints that contain no updates of the particular cache. > A checkpoint can be skipped if the counters for the cache (maybe even for > specific partitions) did not change between it and the next one. > > Ticket for improving the historical rebalance iterator: [2] > > I want to hear the view of the community on the thoughts above. > Maybe anyone has another opinion? > > [1]: https://issues.apache.org/jira/browse/IGNITE-13253 > [2]: https://issues.apache.org/jira/browse/IGNITE-13254 > > -- > Vladislav Pyatkov >
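The heuristic discussed in this thread (prefer historical rebalance when the number of WAL updates N is smaller than the partition size M, with a low minimum-size threshold kept as a fallback lever) can be sketched in plain Java. All names and the threshold handling below are illustrative assumptions, not Ignite's actual implementation:

```java
// Hypothetical sketch of the discussed rebalance decision, not Ignite's real API.
public class RebalanceHeuristic {
    /**
     * @param walUpdates updates (N) recorded in WAL since the demander's counter.
     * @param partitionSize total rows (M) in the partition on the supplier.
     * @param minPartSizeForHistorical fallback threshold, analog of
     *        IGNITE_PDS_WAL_REBALANCE_THRESHOLD with a low default (e.g. 500).
     * @param walHistoryAvailable whether WAL history covers the demander's counter.
     */
    public static boolean useHistoricalRebalance(long walUpdates,
                                                 long partitionSize,
                                                 long minPartSizeForHistorical,
                                                 boolean walHistoryAvailable) {
        if (!walHistoryAvailable)
            return false; // No history: full rebalance is the only option.

        if (partitionSize < minPartSizeForHistorical)
            return false; // Small partitions are cheaper to transfer fully.

        return walUpdates < partitionSize; // N < M: replaying WAL moves less data.
    }

    public static void main(String[] args) {
        // Large partition, few WAL updates: historical rebalance wins.
        System.out.println(useHistoricalRebalance(100, 1_000_000, 500, true));

        // Tiny partition (the corner case above): transfer it fully instead.
        System.out.println(useHistoricalRebalance(100, 400, 500, true));
    }
}
```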
Re: Apache Ignite 2.9.0 RELEASE [Time, Scope, Manager]
Hi, We are still waiting for a final review of Tracing functionality [1] until the end of tomorrow (July 15). We anticipate that it will be merged to Ignite master no later than July 16. Sorry for being a bit late here. Alex P., can you include [1] to the release scope? [1]: https://issues.apache.org/jira/browse/IGNITE-13060 -- Best Regards, Ivan Rakov On Tue, Jul 14, 2020 at 6:16 AM Alexey Kuznetsov wrote: > Alex, > > Can you cherry-pick to Ignite 2.9 this issue: > https://issues.apache.org/jira/browse/IGNITE-13246 ? > > This issue is about BASELINE events and it is very useful for notification > external tools about changes in baseline. > > Thank you! > > --- > Alexey Kuznetsov >
Re: [DISCUSSION] Tracing: IGNITE-13060
Igniters, The PR is ready to be merged, all comments from my side have been fixed. If anyone has more comments, please let me know today. Best Regards, Ivan Rakov On Tue, Jun 30, 2020 at 10:43 AM Alexander Lapin wrote: > Hello Igniters, > > I'd like to discuss with you and then donate changes related to > IGNITE-13060 > <https://issues.apache.org/jira/browse/IGNITE-13060> > In brief, it's an initial tracing implementation that allows tracing of > Communication, Exchange, Discovery and Transactions. An SPI concept is used > with OpenCensus as one of the implementations. For more details about the > tracing engine, tracing configuration, etc. please see IEP-48 > <https://cwiki.apache.org/confluence/display/IGNITE/IEP-48%3A+Tracing>. > > Best regards, > Alexander >
[jira] [Created] (IGNITE-13211) Improve public exceptions for case when user attempts to access data from a lost partition
Ivan Rakov created IGNITE-13211: --- Summary: Improve public exceptions for case when user attempts to access data from a lost partition Key: IGNITE-13211 URL: https://issues.apache.org/jira/browse/IGNITE-13211 Project: Ignite Issue Type: Improvement Reporter: Ivan Rakov After IGNITE-13003, an attempt to access a lost partition via the public API throws a CacheException with a CacheInvalidStateException inside as the root cause. We can improve user experience a bit: 1. Create a new type of public exception (a subclass of CacheException), which will be thrown in lost-data access scenarios 2. In case a partition is lost in a persistent cache, the error message should be changed from "partition data has been lost" to "partition data temporarily unavailable". -- This message was sent by Atlassian Jira (v8.3.4#803005)
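As a rough illustration of the two proposed improvements (a dedicated public exception subclass and a persistence-aware message), here is a self-contained sketch. CacheException is stubbed locally (in Ignite it is javax.cache.CacheException), and the subclass name and message wording are hypothetical:

```java
// Illustrative sketch of the ticket's proposal; class names and messages are assumptions.
public class LostPartitionExceptionDemo {
    /** Local stub for javax.cache.CacheException. */
    static class CacheException extends RuntimeException {
        CacheException(String msg) { super(msg); }
    }

    /** Hypothetical dedicated subclass thrown on lost-partition access. */
    static class CachePartitionLossException extends CacheException {
        CachePartitionLossException(int partId, boolean persistent) {
            super(message(partId, persistent));
        }
    }

    static String message(int partId, boolean persistent) {
        return "Failed to execute the cache operation: partition " + partId
            + (persistent
                ? " data is temporarily unavailable."  // Data survives on disk.
                : " data has been lost.");             // In-memory cache: data is gone.
    }

    public static void main(String[] args) {
        try {
            throw new CachePartitionLossException(24, true);
        }
        catch (CacheException e) { // Existing handlers of the base type still work.
            System.out.println(e.getMessage());
        }
    }
}
```

The point of subclassing rather than replacing CacheException is that existing catch blocks keep working while new code can handle the lost-partition case specifically.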
Re: Extended logging for rebalance performance analysis
+1 to Alex G. From my experience, the most interesting cases with Ignite rebalancing happen exactly in production. Given that we already have detailed rebalancing logging, adding info about rebalance performance looks like a reasonable improvement. With the new logs we'll be able to detect and investigate situations when rebalance is slow due to uneven supplier distribution or network issues. The option to disable the feature in runtime shouldn't be used often, but it will keep us on the safe side in case something goes wrong. The format described in https://issues.apache.org/jira/browse/IGNITE-12080 looks good to me. On Tue, Jun 23, 2020 at 7:01 PM ткаленко кирилл wrote: > Hello, Alexey! > > Currently there is no way to disable / enable it, but it seems that the > logs will not be overloaded, since Alexei Scherbakov's offer seems reasonable > and compact. Of course, you can add disabling / enabling statistics > collection via jmx for example. > > 23.06.2020, 18:47, "Alexey Goncharuk" : > > Hello Maxim, folks, > > > > ср, 6 мая 2020 г. в 21:01, Maxim Muzafarov : > > > >> We won't do performance analysis on the production environment. Each > >> time we need performance analysis it will be done on a test > >> environment with verbose logging enabled. Thus I suggest moving these > >> changes to a separate `profiling` module and extend the logging much > >> more without any size limitations. The same as these [2] [3] > >> activities do. > > > > I strongly disagree with this statement. I am not sure who is meant here > > by 'we', but I see a strong momentum in increasing observability tooling > > that helps people to understand what exactly happens in the production > > environment [1]. Not everybody can afford two identical environments for > > testing. 
We should make sure users have enough information to understand > > the root cause after the incident happened, and not force them to > reproduce > > it, let alone make them add another module to the classpath and restart > the > > nodes. > > I think having this functionality in the core module with the ability to > > disable/enable it is the right approach. Having the information printed > to > > log is ok, having it in an event that can be sent to a monitoring/tracing > > subsystem is even better. > > > > Kirill, can we enable and disable this feature in runtime to avoid the > very > > same nodes restart? > > > > [1] > https://www.honeycomb.io/blog/yes-i-test-in-production-and-so-do-you/ >
Re: Various shutdown guaranties
Vlad, +1, that's what I mean. We need neither null nor a dedicated USE_STATIC_CONFIGURATION value if the user is able to retrieve the current shutdown policy and apply the one he needs. My only requirement is that ignite.cluster().getShutdownPolicy() should return the statically configured value {@link IgniteConfiguration#shutdownPolicy} in case no override has been specified. So, the static configuration will be applied only on cluster start, like it currently works for SQL schemas. On Tue, Jun 9, 2020 at 7:09 PM V.Pyatkov wrote: > Hi, > > ignite.cluster().setShutdownPolicy(null); // Clear dynamic value and switch > to statically configured. > > I do not understand why we need it. If the user wants to change the > configuration to any other value, he sets it explicitly. > We can add a warning on start when the static option does not match the > dynamic one (the dynamic one is always preferred once initiated). > > shutdownPolicy=IMMEDIATE|GRACEFUL > > Looks better than DEFAULT and WAIT_FOR_BACKUP. > > In general, I consider that job cancellation needs to be added to this > policy enumeration. > But we can do it in the future. > > > > -- > Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/ >
Re: Apache Ignite 2.9.0 RELEASE [Time, Scope, Manager]
Hi, Indeed, the tracing feature is almost ready. Discovery, communication and transactions tracing will be introduced, as well as an option to configure tracing in runtime. Right now we are working on final performance optimizations, but it's very likely that we'll complete this activity before the code freeze date. Let's include tracing in the 2.9 release scope. More info: https://cwiki.apache.org/confluence/display/IGNITE/IEP-48%3A+Tracing https://issues.apache.org/jira/browse/IGNITE-13060 -- Best Regards, Ivan Rakov On Sat, Jun 6, 2020 at 4:30 PM Denis Magda wrote: > Hi folks, > > The timelines proposed by Alex Plekhanov sound reasonable to me. I'd like > only to hear inputs of @Ivan Rakov , who is about to > finish with the tracing support, and @Ivan Bessonov > , who is fixing a serious limitation for K8 > deployments [1]. Most likely, both features will be ready by the code > freeze date (July 10), but the guys should know it better. > > [1] > http://apache-ignite-developers.2346864.n4.nabble.com/DISCUSSION-New-Ignite-settings-for-IGNITE-12438-and-IGNITE-13013-td47586.html > > - > Denis > > > On Wed, Jun 3, 2020 at 4:45 AM Alex Plehanov > wrote: > >> Hello Igniters, >> >> AI 2.8.1 is finally released and as we discussed here [1] it's time to >> start >> the discussion about the 2.9 release. >> >> I want to propose myself to be the release manager of the 2.9 release. >> >> As for the release time, I agree with Maxim that we should deliver >> features >> as frequently as possible. If some feature doesn't fit into the release dates >> we should better include it into the next release and schedule the next >> release earlier rather than postpone the current release. >> >> I propose the following dates for the 2.9 release: >> >> Scope Freeze: June 26, 2020 >> Code Freeze: July 10, 2020 >> Voting Date: July 31, 2020 >> Release Date: August 7, 2020 >> >> WDYT? >> >> [1] : >> >> http://apache-ignite-developers.2346864.n4.nabble.com/Ignite-Releases-Plan-td47360.html#a47575 >> >
Re: Various shutdown guaranties
Alex, Also shutdown policy must be always consistent on the grid or unintentional > data loss is possible if two nodes are stopping simultaneously with > different policies. Totally agree. Let's use shutdownPolicy=DEFAULT|GRACEFUL, as was proposed by me earlier. I'm ok with GRACEFUL instead of WAIT_FOR_BACKUPS. 5. Let's keep a static property for simplifying setting of initial > behavior. > In most cases the policy will never be changed during grid's lifetime. > No need for an explicit call to API on grid start. > A joining node should check a local configuration value to match the grid. > If a dynamic value is already present in a metastore, it should override > static value with a warning. To sum it up: - ShutdownPolicy can be set with static configuration (IgniteConfiguration#setShutdownPolicy), on join we validate that statically configured policies on different server nodes are the same - It's possible to override statically configured value by adding distributed metastorage value, which can be done by calling ignite.cluster().setShutdownPolicy(plc) or control.sh method - Dynamic property is persisted Generally, I don't mind if we have both dynamic and static configuration properties. Necessity to call ignite.cluster().setShutdownPolicy(plc); on every new cluster creation is a usability issue itself. What bothers me here are the possible conflicts between static and dynamic configuration. User may be surprised if he has shutdown policy X in IgniteConfiguration, but the cluster behaves according to policy Y (because several months ago another admin had called IgniteCluster#setShutdownPolicy). We can handle it by adding a separate enum field to the shutdown policy: > public enum ShutdownPolicy { > /* Default value of dynamic shutdown policy property. If it's set, the > shutdown policy is resolved according to value of static {@link > IgniteConfiguration#shutdownPolicy} configuration parameter. 
*/ > USE_STATIC_CONFIGURATION, > > /* Node leaves the cluster even if it's the last owner of some > partitions. Only partitions of caches with backups > 0 are taken into > account. */ > IMMEDIATE, > > /* Shutdown is blocked until node is safe to leave without the data > loss. */ > GRACEFUL > } > This way: 1) User may easily understand whether the static parameter is overridden by the dynamic one. If ignite.cluster().getShutdownPolicy() returns anything except USE_STATIC_CONFIGURATION, behavior is overridden. 2) User may clear a previous override by calling ignite.cluster().setShutdownPolicy(USE_STATIC_CONFIGURATION). After that, behavior will be resolved based on IgniteConfiguration#shutdownPolicy again. If we agree on this mechanism, I propose to use the IMMEDIATE name instead of DEFAULT for the non-safe policy in order not to confuse the user. Meanwhile, static configuration will accept the same enum, but USE_STATIC_CONFIGURATION will be restricted: > public class IgniteConfiguration { > public static final ShutdownPolicy DFLT_STATIC_SHUTDOWN_POLICY = > IMMEDIATE; > private ShutdownPolicy shutdownPolicy = DFLT_STATIC_SHUTDOWN_POLICY; > ... > public void setShutdownPolicy(ShutdownPolicy shutdownPlc) { > if (shutdownPlc == USE_STATIC_CONFIGURATION) > throw new IllegalArgumentException("USE_STATIC_CONFIGURATION can > only be passed as dynamic property value via > ignite.cluster().setShutdownPolicy"); > ... > } > ... > } > What do you think? On Tue, Jun 9, 2020 at 11:46 AM Alexei Scherbakov < alexey.scherbak...@gmail.com> wrote: > Ivan Rakov, > > Your proposal overall looks good to me. My comments: > > 1. I would avoid adding such a method, because it will be impossible to > change it in the future if some more shutdown policies will be introduced > later. > Also shutdown policy must be always consistent on the grid or unintentional > data loss is possible if two nodes are stopping simultaneously with > different policies. 
> > This behavior can be achieved by changing policy globally when stopping a > node: > ignite.cluster().setShutdownPolicy(DEFAULT); > ignore.stop(); > > 2. defaultShutdownPolicy with DEFAULT value is a mess. WAIT_FOR_BACKUPS is > not very clear either. > Let's use shutdownPolicy=DEFAULT|GRACEFUL, as was proposed by me earlier. > > 3. OK > > 4. OK > > 5. Let's keep a static property for simplifying setting of initial > behavior. > In most cases the policy will never be changed during grid's lifetime. > No need for an explicit call to API on grid start. > A joining node should check a local configuration value to match the grid. > If a dynamic value is already present in a metastore, it should override > static value with a warning. > > > > > пн, 8 июн. 202
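The resolution rule proposed above can be captured in a minimal, self-contained sketch: the dynamic (metastore) value overrides the static one unless it is USE_STATIC_CONFIGURATION, and USE_STATIC_CONFIGURATION itself is rejected as a static value. The method names are illustrative, not Ignite's actual API:

```java
// Illustrative sketch of the discussed static/dynamic shutdown policy resolution.
public class ShutdownPolicyResolver {
    public enum ShutdownPolicy { USE_STATIC_CONFIGURATION, IMMEDIATE, GRACEFUL }

    /**
     * @param staticPlc value from IgniteConfiguration#shutdownPolicy.
     * @param dynamicPlc distributed metastorage value, or null if never set.
     */
    public static ShutdownPolicy resolve(ShutdownPolicy staticPlc, ShutdownPolicy dynamicPlc) {
        if (staticPlc == ShutdownPolicy.USE_STATIC_CONFIGURATION)
            throw new IllegalArgumentException(
                "USE_STATIC_CONFIGURATION is only valid as a dynamic property value");

        if (dynamicPlc == null || dynamicPlc == ShutdownPolicy.USE_STATIC_CONFIGURATION)
            return staticPlc; // No override: fall back to the static configuration.

        return dynamicPlc;    // Dynamic override is in effect.
    }

    public static void main(String[] args) {
        System.out.println(resolve(ShutdownPolicy.IMMEDIATE, null));                    // IMMEDIATE
        System.out.println(resolve(ShutdownPolicy.IMMEDIATE, ShutdownPolicy.GRACEFUL)); // GRACEFUL
        // Explicit reset of the override restores the static behavior.
        System.out.println(
            resolve(ShutdownPolicy.GRACEFUL, ShutdownPolicy.USE_STATIC_CONFIGURATION)); // GRACEFUL
    }
}
```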
Re: Various shutdown guaranties
Alex, I'm not sure there is a problem at all, because user can always query the > current policy, and a javadoc can describe such behavior clearly. What will the query method return if the static policy is not overridden? If we decide to avoid adding a dedicated USE_STATIC_CONFIGURATION value, the semantics can be as follows: > // Returns shutdown policy that is currently used by the cluster > // If ignite.cluster().setShutdownPolicy() was never called, returns value > from static configuration {@link IgniteConfiguration#shutdownPolicy}, which > is consistent across all server nodes > // If shutdown policy was overridden by user via > ignite.cluster().setShutdownPolicy(), returns corresponding value ignite.cluster().getShutdownPolicy(); > It seems there will be no need to reset the distributed meta storage value. The user can always check which policy is used right now (regardless of whether it has been overridden) and just set the policy that he needs if he wants to change it. The behavior is simple, the only magic is mapping the value in the distributed meta storage to the value from IgniteConfiguration#shutdownPolicy. Can we agree on this? On Tue, Jun 9, 2020 at 3:48 PM Alexei Scherbakov < alexey.scherbak...@gmail.com> wrote: > Ivan, > > Using an additional enum on public API for resetting dynamic value looks a > little bit dirty for me. > I'm not sure there is a problem at all, because user can always query the > current policy, and a javadoc can describe such behavior clearly. > If you really insist maybe use null to reset policy value: > > ignite.cluster().setShutdownPolicy(null); // Clear dynamic value and switch > to statically configured. > > On top of this, we already have a bunch of other properties, which are set > statically and can be changed dynamically later, for example [1] > I think all such properties should behave the same way as shutdown policy > and we need a ticket for this. 
> In such a case we probably should go with something like > > ignite.cluster().resetDynamicPropertyValue(propName); // Resets a property > to statically configured default value. > > Right now I would prefer that the shutdown policy behaves as other dynamic > properties to make things consistent and fix them all later to be > resettable to the static configuration value. > > [1] > org.apache.ignite.IgniteCluster#setTxTimeoutOnPartitionMapExchange(timeout) > > > > вт, 9 июн. 2020 г. в 15:12, Ivan Rakov : > > > Something went wrong with gmail formatting. Resending my reply. > > > > Alex, > > > > Also shutdown policy must be always consistent on the grid or > unintentional > > > data loss is possible if two nodes are stopping simultaneously with > > > different policies. > > > > Totally agree. > > > > Let's use shutdownPolicy=DEFAULT|GRACEFUL, as was proposed by me earlier. > > > > I'm ok with GRACEFUL instead of WAIT_FOR_BACKUPS. > > > > 5. Let's keep a static property for simplifying setting of initial > > > behavior. > > > In most cases the policy will never be changed during grid's lifetime. > > > No need for an explicit call to API on grid start. > > > A joining node should check a local configuration value to match the > > grid. > > > If a dynamic value is already present in a metastore, it should > override > > > static value with a warning. > > > > To sum it up: > > - ShutdownPolicy can be set with static configuration > > (IgniteConfiguration#setShutdownPolicy), on join we validate that > > statically configured policies on different server nodes are the same > > - It's possible to override statically configured value by adding > > distributed metastorage value, which can be done by > > calling ignite.cluster().setShutdownPolicy(plc) or control.sh method > > - Dynamic property is persisted > > > > Generally, I don't mind if we have both dynamic and static configuration > > properties. 
Necessity to call ignite.cluster().setShutdownPolicy(plc); on > > every new cluster creation is a usability issue itself. > > What bothers me here are the possible conflicts between static and > dynamic > > configuration. User may be surprised if he has shutdown policy X in > > IgniteConfiguration, but the cluster behaves according to policy Y > (because > > several months ago another admin had called > > IgniteCluster#setShutdownPolicy). > > We can handle it by adding a separate enum field to the shutdown policy: > > > > > public enum ShutdownPolicy { > > > /* Default value of dynamic shutdown policy property. If it's set, > the > > > shutdown policy is resolved a
Re: Various shutdown guaranties
Something went wrong with gmail formatting. Resending my reply. Alex, Also shutdown policy must be always consistent on the grid or unintentional > data loss is possible if two nodes are stopping simultaneously with > different policies. Totally agree. Let's use shutdownPolicy=DEFAULT|GRACEFUL, as was proposed by me earlier. I'm ok with GRACEFUL instead of WAIT_FOR_BACKUPS. 5. Let's keep a static property for simplifying setting of initial > behavior. > In most cases the policy will never be changed during grid's lifetime. > No need for an explicit call to API on grid start. > A joining node should check a local configuration value to match the grid. > If a dynamic value is already present in a metastore, it should override > static value with a warning. To sum it up: - ShutdownPolicy can be set with static configuration (IgniteConfiguration#setShutdownPolicy), on join we validate that statically configured policies on different server nodes are the same - It's possible to override statically configured value by adding distributed metastorage value, which can be done by calling ignite.cluster().setShutdownPolicy(plc) or control.sh method - Dynamic property is persisted Generally, I don't mind if we have both dynamic and static configuration properties. Necessity to call ignite.cluster().setShutdownPolicy(plc); on every new cluster creation is a usability issue itself. What bothers me here are the possible conflicts between static and dynamic configuration. User may be surprised if he has shutdown policy X in IgniteConfiguration, but the cluster behaves according to policy Y (because several months ago another admin had called IgniteCluster#setShutdownPolicy). We can handle it by adding a separate enum field to the shutdown policy: > public enum ShutdownPolicy { > /* Default value of dynamic shutdown policy property. If it's set, the > shutdown policy is resolved according to value of static {@link > IgniteConfiguration#shutdownPolicy} configuration parameter. 
*/ > USE_STATIC_CONFIGURATION, > > /* Node leaves the cluster even if it's the last owner of some > partitions. Only partitions of caches with backups > 0 are taken into > account. */ > IMMEDIATE, > > /* Shutdown is blocked until node is safe to leave without the data > loss. */ > GRACEFUL > } > This way: 1) User may easily understand whether the static parameter is overridden by the dynamic one. If ignite.cluster().getShutdownPolicy() returns anything except USE_STATIC_CONFIGURATION, behavior is overridden. 2) User may clear a previous override by calling ignite.cluster().setShutdownPolicy(USE_STATIC_CONFIGURATION). After that, behavior will be resolved based on IgniteConfiguration#shutdownPolicy again. If we agree on this mechanism, I propose to use the IMMEDIATE name instead of DEFAULT for the non-safe policy in order not to confuse the user. Meanwhile, static configuration will accept the same enum, but USE_STATIC_CONFIGURATION will be restricted: > public class IgniteConfiguration { > public static final ShutdownPolicy DFLT_STATIC_SHUTDOWN_POLICY = > IMMEDIATE; > private ShutdownPolicy shutdownPolicy = DFLT_STATIC_SHUTDOWN_POLICY; > ... > public void setShutdownPolicy(ShutdownPolicy shutdownPlc) { > if (shutdownPlc == USE_STATIC_CONFIGURATION) > throw new IllegalArgumentException("USE_STATIC_CONFIGURATION can > only be passed as dynamic property value via > ignite.cluster().setShutdownPolicy"); > ... > } > ... > } > What do you think? On Tue, Jun 9, 2020 at 3:09 PM Ivan Rakov wrote: > Alex, > > Also shutdown policy must be always consistent on the grid or unintentional >> data loss is possible if two nodes are stopping simultaneously with >> different policies. > > Totally agree. > > Let's use shutdownPolicy=DEFAULT|GRACEFUL, as was proposed by me earlier. > > I'm ok with GRACEFUL instead of WAIT_FOR_BACKUPS. > > 5. Let's keep a static property for simplifying setting of initial >> behavior. >> In most cases the policy will never be changed during grid's lifetime. 
>> No need for an explicit call to API on grid start. >> A joining node should check a local configuration value to match the grid. >> If a dynamic value is already present in a metastore, it should override >> static value with a warning. > > To sum it up: > - ShutdownPolicy can be set with static configuration > (IgniteConfiguration#setShutdownPolicy), on join we validate that > statically configured policies on different server nodes are the same > - It's possible to override statically configured value by adding > distributed metastorage value, which can be done by > calling ignite.cluster().setShutdownPolicy(plc) or control.sh method > - D
Re: Various shutdown guaranties
Vlad, thanks for starting this discussion. I'll try to clarify the motivation for this change as I see it. In general, Ignite clusters are vulnerable to data loss. Of course, we have the configurable PartitionLossPolicy, which allows handling data loss safely and mitigating its consequences. But being able to avoid critical situations is always better than being able to recover from them. The most common issue from my perspective is the absence of a way to perform a rolling cluster restart safely. Scenario: 1. Backup count is 1 2. Admin wants to perform a rolling restart in order to deploy a new version of business code that uses Ignite in embedded mode 3. Admin shuts down the first node, replaces the needed binaries and returns the node back to the topology 4. Node joins the cluster successfully 5. Admin shuts down the second node 6. Data loss happens: the second node was the only owner of a certain partition, which was being rebalanced from the second node to the first We can prevent such situations by introducing a "safe shutdown by default" mode, which blocks stopping a node while it remains the only owner of at least one partition. It should be applied to "common" ways of stopping nodes - Ignite.close() and kill . I think an option to enable or disable it in runtime should be a requirement for this behavior. Safe shutdown mode has weird side-effects. For example, the admin won't be able to stop the whole cluster: stopping the last node will be blocked, because the last node is the only present owner of all its partitions. Sure, kill -9 will resolve it, but it's still a usability issue. With the described dynamic property the scenario changes as follows: 1. Admin enables "safe shutdown" mode 2. Admin shuts down the first node, replaces the needed binaries and returns the node back to the topology 3. Admin shuts down the second node (with either ignite.close() or kill ), shutdown is blocked until the first node returns to the topology and completes the rebalancing process 4. 
Admin proceeds with the rolling restart procedure 5. Admin disables "safe shutdown" mode This logic will also simplify the rolling restart scenario in K8S. A pod with an Ignite node won't be terminated while its termination would cause data loss. Aside from waiting for backups, the Ignition interface provides lots of options for various kinds of node stop: - Whether or not to cancel pending compute jobs - Whether or not to perform an instant halt() instead of any graceful stop logic - Whether or not to wait for some timeout before halt() - Whether or not the stopped grid should be restarted All these "stop" methods provide very custom logic. I don't see a need to make them part of the dynamic cluster-wide configuration. They can still be invoked directly via the Java API. Later we can extract some of them into dynamic cluster-wide parameters of the default stop if it becomes necessary. That's why I think we should create an enum for the default shutdown policy, but only with two options so far (we can add more later): DEFAULT and WAIT_FOR_BACKUPS. Regarding the "NORMAL" option that you propose (where the node is not stopped until the rebalance is finished): I don't think that we should add it. It doesn't ensure any strict guarantees: the data can still be lost with it. To sum it up, I propose: 1. Add a new method to the Ignition interface to make it possible to stop with "wait for backups" logic directly via the Java API, like Ignition.stop(boolean cancel, boolean waitForBackups) 2. Introduce "defaultShutdownPolicy" as a dynamic cluster configuration; two values are available so far: DEFAULT and WAIT_FOR_BACKUPS 3. This property is stored in the distributed metastorage (thus persisted) and can be changed via the Java API and ./control.sh 4. Behavior configured with this property will be applied only to common ways of stopping the node - Ignite.close() and kill . 5. 
*Don't* add new options to the static IgniteConfiguration, to avoid conflicts between dynamic and static configuration.

--
Best Regards,
Ivan Rakov

On Mon, Jun 8, 2020 at 6:44 PM V.Pyatkov wrote:

> Hi
>
> We need the ability to call shutdown with various guarantees.
> For example:
> Need to reboot a node, but after that the node should be available for
> historical rebalance (all partitions in MOVING state should have gone to
> OWNING).
>
> Implemented a circled reboot of the cluster, but all data should be available
> at that time (at least one copy of each partition should be available in the
> cluster).
>
> Need to wait not only for data availability, but for all jobs (before, this
> behavior was available through a stop(false) method invocation).
>
> All these reasons require various behavior before shutting down a node.
> I propose to slightly modify the public API and add a method that expresses
> the shutdown behavior directly:
> Ignite.close(Shutdown)
>
> public enum Shutdown {
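The wait-for-backups check at the heart of this proposal can be sketched in a few lines of plain Java. This is an illustrative model only: the enum values come from this thread, but the method and data structures are hypothetical and are not Ignite's actual implementation.

```java
import java.util.Map;
import java.util.Set;

class ShutdownSketch {
    /** Proposed policy enum from this thread: only two options so far, more may be added later. */
    enum ShutdownPolicy { DEFAULT, WAIT_FOR_BACKUPS }

    /**
     * Decides whether a graceful stop (Ignite.close() / kill) must block.
     *
     * @param policy     Effective cluster-wide shutdown policy.
     * @param ownedParts Partitions owned by the stopping node (hypothetical representation).
     * @param ownerCnt   Cluster-wide count of alive owners per partition.
     */
    static boolean mustBlockShutdown(ShutdownPolicy policy, Set<Integer> ownedParts, Map<Integer, Integer> ownerCnt) {
        if (policy != ShutdownPolicy.WAIT_FOR_BACKUPS)
            return false; // DEFAULT: stop immediately, as today.

        // Block while this node remains the only owner of at least one partition.
        return ownedParts.stream().anyMatch(p -> ownerCnt.getOrDefault(p, 0) <= 1);
    }
}
```

Note that this is exactly why stopping the last node would block under WAIT_FOR_BACKUPS: every partition it owns has an owner count of 1.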
Re: Re[2]: Proposal: set default transaction timeout to 5 minutes
Zhenya, Can you please elaborate? Why we need to change default TX timeout via JMX? It looks feasible and perhaps may work as a hotfix for live deployments experiencing issues with long transactions, but it's definitely a separate issue. On Fri, May 22, 2020 at 6:20 PM Zhenya Stanilovsky wrote: > > Ivan, does global timeout change through jmx in scope of this ticket ? If > so, can you add it ? Opposite we need additional ticket, i hope ? We > still have no somehow store for jmx changed params, every one need to > remember that cluster restart will reset this setting to default, in this > case system param need to be appended. > > > > >https://issues.apache.org/jira/browse/IGNITE-13064 is raised with label > >"newbie". > > > >On Tue, May 19, 2020 at 4:10 PM Ivan Rakov < ivan.glu...@gmail.com > > wrote: > > > >> Support this idea in general but why 5 minutes and not less? > >> > >> This value looks to me greater than any value that can possibly affect > >> existing deployments (existing long transactions may suddenly start to > >> rollback), but less than reaction time of users that are only starting > to > >> get along with Ignite and suddenly experience TX deadlock. > >> > >> -- > >> Best Regards, > >> Ivan Rakov > >> > >> On Tue, May 19, 2020 at 10:31 AM Anton Vinogradov < a...@apache.org > > wrote: > >> > >>> +1 > >>> > >>> On Mon, May 18, 2020 at 9:45 PM Sergey Antonov < > antonovserge...@gmail.com > >>> > > >>> wrote: > >>> > >>> > +1 > >>> > > >>> > пн, 18 мая 2020 г. в 21:26, Andrey Mashenkov < > >>> andrey.mashen...@gmail.com >: > >>> > > >>> > > +1 > >>> > > > >>> > > On Mon, May 18, 2020 at 9:19 PM Ivan Rakov < ivan.glu...@gmail.com > > > >>> > wrote: > >>> > > > >>> > > > Hi Igniters, > >>> > > > > >>> > > > I have a very simple proposal. Let's set default TX timeout to 5 > >>> > minutes > >>> > > > (right now it's 0 = no timeout). > >>> > > > Pros: > >>> > > > 1. Deadlock detection procedure is triggered on timeout. 
In case > >>> user > >>> > > will > >>> > > > get into key-level deadlock, he'll be able to discover root cause > >>> from > >>> > > the > >>> > > > logs (even though load will hang for a while) and skip step with > >>> > googling > >>> > > > and debugging. > >>> > > > 2. Almost every system with transactions has timeout enabled by > >>> > default. > >>> > > > > >>> > > > WDYT? > >>> > > > > >>> > > > -- > >>> > > > Best Regards, > >>> > > > Ivan Rakov > >>> > > > > >>> > > > >>> > > > >>> > > -- > >>> > > Best regards, > >>> > > Andrey V. Mashenkov > >>> > > > >>> > > >>> > > >>> > -- > >>> > BR, Sergey Antonov > >>> > > >>> > >> > > > >
Re: Proposal: set default transaction timeout to 5 minutes
https://issues.apache.org/jira/browse/IGNITE-13064 is raised with label "newbie". On Tue, May 19, 2020 at 4:10 PM Ivan Rakov wrote: > Support this idea in general but why 5 minutes and not less? > > This value looks to me greater than any value that can possibly affect > existing deployments (existing long transactions may suddenly start to > rollback), but less than reaction time of users that are only starting to > get along with Ignite and suddenly experience TX deadlock. > > -- > Best Regards, > Ivan Rakov > > On Tue, May 19, 2020 at 10:31 AM Anton Vinogradov wrote: > >> +1 >> >> On Mon, May 18, 2020 at 9:45 PM Sergey Antonov > > >> wrote: >> >> > +1 >> > >> > пн, 18 мая 2020 г. в 21:26, Andrey Mashenkov < >> andrey.mashen...@gmail.com>: >> > >> > > +1 >> > > >> > > On Mon, May 18, 2020 at 9:19 PM Ivan Rakov >> > wrote: >> > > >> > > > Hi Igniters, >> > > > >> > > > I have a very simple proposal. Let's set default TX timeout to 5 >> > minutes >> > > > (right now it's 0 = no timeout). >> > > > Pros: >> > > > 1. Deadlock detection procedure is triggered on timeout. In case >> user >> > > will >> > > > get into key-level deadlock, he'll be able to discover root cause >> from >> > > the >> > > > logs (even though load will hang for a while) and skip step with >> > googling >> > > > and debugging. >> > > > 2. Almost every system with transactions has timeout enabled by >> > default. >> > > > >> > > > WDYT? >> > > > >> > > > -- >> > > > Best Regards, >> > > > Ivan Rakov >> > > > >> > > >> > > >> > > -- >> > > Best regards, >> > > Andrey V. Mashenkov >> > > >> > >> > >> > -- >> > BR, Sergey Antonov >> > >> >
[jira] [Created] (IGNITE-13064) Set default transaction timeout to 5 minutes
Ivan Rakov created IGNITE-13064:
---
Summary: Set default transaction timeout to 5 minutes
Key: IGNITE-13064
URL: https://issues.apache.org/jira/browse/IGNITE-13064
Project: Ignite
Issue Type: Improvement
Reporter: Ivan Rakov

Let's set the default TX timeout to 5 minutes (right now it's 0 = no timeout). Pros:
1. The deadlock detection procedure is triggered on timeout. If a user gets into a key-level deadlock, they'll be able to discover the root cause from the logs (even though the load will hang for a while) and skip the googling-and-debugging step.
2. Almost every system with transactions has a timeout enabled by default.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).
Folks, Just keeping you informed: I and my colleagues are highly interested in TDE in general and keys rotations specifically, but we don't have enough time so far. We'll dive into this feature and participate in reviews next month. -- Best Regards, Ivan Rakov On Sun, May 17, 2020 at 10:51 PM Pavel Pereslegin wrote: > Hello, Alexey. > > > is the encryption key for the data the same on all nodes in the cluster? > Yes, each encrypted cache group has its own encryption key, the key is > the same on all nodes. > > > Clearly, during the re-encryption there will exist pages > > encrypted with both new and old keys at the same time. > Yes, there will be pages encrypted with different keys at the same time. > Currently, we only store one key for one cache group. To rotate a key, > at a certain point in time it is necessary to support several keys (at > least for reading the WAL). > For the "in place" strategy, we'll store the encryption key identifier > on each encrypted page (we currently have some unused space on > encrypted page, so I don't expect any memory overhead here). Thus, we > will have several keys for reading and one key for writing. I assume > that the old key will be automatically deleted when a specific WAL > segment is deleted (and re-encryption is finished). > > > Will a node continue to re-encrypt the data after it restarts? > Yes. > > > If a node goes down during the re-encryption, but the rest of the > > cluster finishes re-encryption, will we consider the procedure complete? > I'm not sure, but it looks like the key rotation is complete when we > set the new key on all nodes so that the updates will be encrypted > with the new key (as required by PCI DSS). > Status of re-encryption can be obtained separately (locally or cluster > wide). > > I forgot to mention that with “in place” re-encryption it will be > impossible to quickly cancel re-encryption, because by canceling we > mean re-encryption with the old key. 
> > > How do you see the whole key rotation procedure will work? > Initial design for re-encryption with "partition copying" is described > here [1]. I'll prepare detailed design for "in place" re-encryption if > we'll go this way. In short, send the new encryption key cluster-wide, > each node adds a new key and starts background re-encryption. > > [1] > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Copywithre-encryptiondesign > . > > вс, 17 мая 2020 г. в 18:35, Alexey Goncharuk : > > > > Pavel, Anton, > > > > How do you see the whole key rotation procedure will work? Clearly, > during > > the re-encryption there will exist pages encrypted with both new and old > > keys at the same time. Will a node continue to re-encrypt the data after > it > > restarts? If a node goes down during the re-encryption, but the rest of > the > > cluster finishes re-encryption, will we consider the procedure complete? > By > > the way, is the encryption key for the data the same on all nodes in the > > cluster? > > > > чт, 14 мая 2020 г. в 11:30, Anton Vinogradov : > > > > > +1 to "In place re-encryption". > > > > > > - It has a simple design. > > > - Clusters under load may require just load to re-encrypt the data. > > > (Friendly to load). > > > - Easy to throttle. > > > - Easy to continue. > > > - Design compatible with the multi-key architecture. > > > - It can be optimized to use own WAL buffer and to re-encrypt pages > without > > > restoring them to on-heap. > > > > > > On Thu, May 14, 2020 at 1:54 AM Pavel Pereslegin > wrote: > > > > > > > Hello Igniters. > > > > > > > > Recently, master key rotation for Apache Ignite Transparent Data > > > > Encryption was implemented [1], but some security standards (PCI DSS > > > > at least) require rotation of all encryption keys [2]. Currently, > > > > encryption occurs when reading/writing pages to disk, cache > encryption > > > > keys are stored in metastore. 
> > > > > > > > I'm going to contribute cache encryption key rotation and want to > > > > consult what is the best way to re-encrypting existing data, I see > two > > > > different strategies. > > > > > > > > 1. In place re-encryption: > > > > Using the old key, sequentially read all the pages from the > datastore, > > > > mark as dirty and log them into the WAL. After checkpoint pages will > > > >
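To make the "in place" strategy discussed above concrete, here is a toy model (not Ignite code: XOR stands in for a real cipher, and all names are made up). Each page carries the id of the key it was encrypted with, so during background re-encryption reads use the per-page key while writes always use the newest key.

```java
import java.util.List;
import java.util.Map;

class ReencryptSketch {
    static class Page {
        int keyId;   // Id of the key this page is currently encrypted with.
        byte[] data; // Encrypted payload.
        Page(int keyId, byte[] data) { this.keyId = keyId; this.data = data; }
    }

    /** Stand-in for a real block cipher: XOR with a one-byte key, symmetric for encrypt/decrypt. */
    static byte[] xor(byte[] src, byte key) {
        byte[] res = new byte[src.length];
        for (int i = 0; i < src.length; i++) res[i] = (byte)(src[i] ^ key);
        return res;
    }

    /** Background pass: re-encrypts every page still on an old key; pages already on newKeyId are skipped. */
    static void reencrypt(List<Page> pages, Map<Integer, Byte> keys, int newKeyId) {
        for (Page p : pages) {
            if (p.keyId == newKeyId) continue;
            byte[] plain = xor(p.data, keys.get(p.keyId)); // Read with the page's own (old) key.
            p.data = xor(plain, keys.get(newKeyId));       // Write with the new key.
            p.keyId = newKeyId; // Once no page (and no WAL segment) references an old key, it can be dropped.
        }
    }
}
```

The per-page key id is what lets reads and background re-encryption proceed concurrently without a stop-the-world pass.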
[jira] [Created] (IGNITE-13052) Calculate result of reserveHistoryForExchange in advance
Ivan Rakov created IGNITE-13052:
---
Summary: Calculate result of reserveHistoryForExchange in advance
Key: IGNITE-13052
URL: https://issues.apache.org/jira/browse/IGNITE-13052
Project: Ignite
Issue Type: Improvement
Reporter: Ivan Rakov

The method reserveHistoryForExchange() is called on every partition map exchange. It's an expensive call: it requires iteration over the whole checkpoint history with a possible retrieval of GroupState from WAL (it's stored on heap with a SoftReference). On some deployments this operation can take several minutes.

The idea of the optimization is to calculate its result only on the first PME (ideally, even before the first PME, on the recovery stage), keep the resulting map {grpId, partId -> earliestCheckpoint} on heap and update it when necessary. At first glance, the map should be updated:
1) On checkpoint. If a new partition appears on the local node, it should be registered in the map with the current checkpoint. If a partition is evicted from the local node, or changes its state to non-OWNING, it should be removed from the map. If a checkpoint is marked as inapplicable for a certain group, the whole group should be removed from the map.
2) On checkpoint history cleanup. For every (grpId, partId), the previous earliest checkpoint should be replaced, with setIfGreater semantics, by the new earliest checkpoint.

The memory overhead of storing the described map on heap is insignificant: its size isn't greater than the size of the map returned from reserveHistoryForExchange().

The described fix should be much simpler than IGNITE-12429.
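The bookkeeping described in the ticket can be sketched as follows. This is an illustrative model with made-up names, not the actual Ignite implementation; it only shows the two update paths (checkpoint and history cleanup) that the ticket lists.

```java
import java.util.HashMap;
import java.util.Map;

class EarliestCheckpointMap {
    /** (grpId, partId) packed into a long -> earliest checkpoint id applicable to that partition. */
    private final Map<Long, Long> earliest = new HashMap<>();

    private static long key(int grpId, int partId) { return ((long)grpId << 32) | partId; }

    /** On checkpoint: a partition that just became OWNING on the local node starts from the current checkpoint. */
    void onPartitionOwned(int grpId, int partId, long cpId) {
        earliest.putIfAbsent(key(grpId, partId), cpId);
    }

    /** On checkpoint: an evicted or non-OWNING partition is dropped from the map. */
    void onPartitionGone(int grpId, int partId) {
        earliest.remove(key(grpId, partId));
    }

    /** On checkpoint history cleanup: raise each entry to the new earliest checkpoint ("setIfGreater"). */
    void onHistoryCleanup(long newEarliestCpId) {
        earliest.replaceAll((k, v) -> Math.max(v, newEarliestCpId));
    }

    Long earliestCheckpoint(int grpId, int partId) { return earliest.get(key(grpId, partId)); }
}
```

With the map maintained incrementally like this, reserveHistoryForExchange() becomes a lookup instead of a scan over the whole checkpoint history.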
Re: Proposal: set default transaction timeout to 5 minutes
> > Support this idea in general but why 5 minutes and not less? This value looks to me greater than any value that can possibly affect existing deployments (existing long transactions may suddenly start to rollback), but less than reaction time of users that are only starting to get along with Ignite and suddenly experience TX deadlock. -- Best Regards, Ivan Rakov On Tue, May 19, 2020 at 10:31 AM Anton Vinogradov wrote: > +1 > > On Mon, May 18, 2020 at 9:45 PM Sergey Antonov > wrote: > > > +1 > > > > пн, 18 мая 2020 г. в 21:26, Andrey Mashenkov >: > > > > > +1 > > > > > > On Mon, May 18, 2020 at 9:19 PM Ivan Rakov > > wrote: > > > > > > > Hi Igniters, > > > > > > > > I have a very simple proposal. Let's set default TX timeout to 5 > > minutes > > > > (right now it's 0 = no timeout). > > > > Pros: > > > > 1. Deadlock detection procedure is triggered on timeout. In case user > > > will > > > > get into key-level deadlock, he'll be able to discover root cause > from > > > the > > > > logs (even though load will hang for a while) and skip step with > > googling > > > > and debugging. > > > > 2. Almost every system with transactions has timeout enabled by > > default. > > > > > > > > WDYT? > > > > > > > > -- > > > > Best Regards, > > > > Ivan Rakov > > > > > > > > > > > > > -- > > > Best regards, > > > Andrey V. Mashenkov > > > > > > > > > -- > > BR, Sergey Antonov > > >
Proposal: set default transaction timeout to 5 minutes
Hi Igniters,

I have a very simple proposal: let's set the default TX timeout to 5 minutes (right now it's 0 = no timeout). Pros:
1. The deadlock detection procedure is triggered on timeout. If a user gets into a key-level deadlock, they'll be able to discover the root cause from the logs (even though the load will hang for a while) and skip the googling-and-debugging step.
2. Almost every system with transactions has a timeout enabled by default.

WDYT?

--
Best Regards,
Ivan Rakov
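For reference, the proposed 5-minute value can already be set explicitly through the public configuration API; the proposal is only about making it the out-of-the-box default. A sketch (TransactionConfiguration.setDefaultTxTimeout is the existing setting whose default this thread proposes to change):

```java
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.configuration.TransactionConfiguration;

public class TxTimeoutConfig {
    public static IgniteConfiguration withFiveMinuteTxTimeout() {
        TransactionConfiguration txCfg = new TransactionConfiguration();
        txCfg.setDefaultTxTimeout(5 * 60 * 1000L); // 5 minutes, in milliseconds; 0 means no timeout.

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setTransactionConfiguration(txCfg);
        return cfg;
    }
}
```

Transactions started without an explicit timeout then roll back (and trigger deadlock detection) after 5 minutes instead of hanging forever.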
Re: [ANNOUNCE] New Committer: Taras Ledkov
Taras, Congratulations and welcome! On Tue, May 12, 2020 at 8:26 PM Denis Magda wrote: > Taras, > > Welcome, that was long overdue on our part! Hope to see you soon among the > PMC group. > > - > Denis > > > On Tue, May 12, 2020 at 9:09 AM Dmitriy Pavlov wrote: > > > Hello Ignite Community, > > > > > > > > The Project Management Committee (PMC) for Apache Ignite has invited > Taras > > Ledkov to become a committer and we are pleased to announce that he has > > accepted. > > > > > > Taras is an Ignite SQL veteran who knows in detail current Ignite - H2 > > integration and binary serialization, actively participates in JDBC and > > thin client protocol development, he is eager to help users on the user > > list within his area of expertise. > > > > > > > > Being a committer enables easier contribution to the project since there > is > > no need to go via the patch submission process. This should enable better > > productivity. > > > > > > > > Taras, thank you for all your efforts, congratulations and welcome on > > board! > > . > > > > > > > > Best Regards, > > > > Dmitriy Pavlov > > > > on behalf of Apache Ignite PMC > > >
Re: [DISCUSS] Apache URL for TC bot
Ivan, Agree. Mail notifications can be temporarily turned off in configuration of the new bot. On Tue, May 12, 2020 at 3:12 PM Ivan Pavlukhin wrote: > Having bot deployed in open/free (and reliable) infrastructure sounds > great! One precaution which seems important to me though is avoidance > of duplicate (or even controversial) notifications from 2 bots at the > same time. > > Best regards, > Ivan Pavlukhin > > вт, 12 мая 2020 г. в 15:06, Ivan Rakov : > > > > Hi, > > > > I've created an INFRA ticket [1] for forwarding requests from " > > mtcga.ignite.apache.org" to the server where TC bot is hosted [1]. > > Definitely, I wouldn't object if anyone will deploy TC bot to the public > > cloud. We can live with two bots for a while, and then start using a > public > > bot after it accumulates enough build history to grant VISAs. If anyone > is > > interested, please check TC bot homepage on github with setup guide [2]. > > <https://github.com/apache/ignite-teamcity-bot> > > > > [1]: https://issues.apache.org/jira/browse/INFRA-20257 > > [2]: https://github.com/apache/ignite-teamcity-bot > > > > -- > > Best Regards, > > Ivan Rakov > > > > On Tue, May 12, 2020 at 12:44 PM Ilya Kasnacheev < > ilya.kasnach...@gmail.com> > > wrote: > > > > > Hello! > > > > > > It would be nice if somebody would try to bring up a parallel > deployment of > > > MTCGA bot on Apache domain. > > > > > > This way people will have a choice of using "old" or "new" bot, and > they we > > > may decide of sticking to one of them. > > > > > > Regards, > > > -- > > > Ilya Kasnacheev > > > > > > > > > пн, 11 мая 2020 г. в 18:37, Maxim Muzafarov : > > > > > > > Ivan, > > > > > > > > > > > > Good idea. > > > > +1 to have the right domain name. > > > > > > > > I can imagine that we can go even further and completely move TC.Bot > > > > to some public cloud storage. For example, Amazon can provide > > > > promotional credits for open source projects [1]. 
> > > > > > > > > > > > [1] > > > > > > > > https://aws.amazon.com/blogs/opensource/aws-promotional-credits-open-source-projects/ > > > > > > > > On Mon, 11 May 2020 at 11:35, Ivan Pavlukhin > > > wrote: > > > > > > > > > > Igniters, > > > > > > > > > > As you might know currently TC bot has a domain name in a GridGain > > > > > domain [1]. What do you think should we assign a name in an Apache > > > > > domain to the bot? > > > > > > > > > > [1] https://mtcga.gridgain.com/ > > > > > > > > > > Best regards, > > > > > Ivan Pavlukhin > > > > > > > >
Re: [DISCUSS] Apache URL for TC bot
Hi, I've created an INFRA ticket [1] for forwarding requests from " mtcga.ignite.apache.org" to the server where TC bot is hosted [1]. Definitely, I wouldn't object if anyone will deploy TC bot to the public cloud. We can live with two bots for a while, and then start using a public bot after it accumulates enough build history to grant VISAs. If anyone is interested, please check TC bot homepage on github with setup guide [2]. <https://github.com/apache/ignite-teamcity-bot> [1]: https://issues.apache.org/jira/browse/INFRA-20257 [2]: https://github.com/apache/ignite-teamcity-bot -- Best Regards, Ivan Rakov On Tue, May 12, 2020 at 12:44 PM Ilya Kasnacheev wrote: > Hello! > > It would be nice if somebody would try to bring up a parallel deployment of > MTCGA bot on Apache domain. > > This way people will have a choice of using "old" or "new" bot, and they we > may decide of sticking to one of them. > > Regards, > -- > Ilya Kasnacheev > > > пн, 11 мая 2020 г. в 18:37, Maxim Muzafarov : > > > Ivan, > > > > > > Good idea. > > +1 to have the right domain name. > > > > I can imagine that we can go even further and completely move TC.Bot > > to some public cloud storage. For example, Amazon can provide > > promotional credits for open source projects [1]. > > > > > > [1] > > > https://aws.amazon.com/blogs/opensource/aws-promotional-credits-open-source-projects/ > > > > On Mon, 11 May 2020 at 11:35, Ivan Pavlukhin > wrote: > > > > > > Igniters, > > > > > > As you might know currently TC bot has a domain name in a GridGain > > > domain [1]. What do you think should we assign a name in an Apache > > > domain to the bot? > > > > > > [1] https://mtcga.gridgain.com/ > > > > > > Best regards, > > > Ivan Pavlukhin > > >
Re: Extended logging for rebalance performance analysis
Hi,

> IGNITE_WRITE_REBALANCE_PARTITION_DISTRIBUTION_THRESHOLD - the threshold
> rebalance duration for a cache group after which the partition distribution
> is output; set in milliseconds, default value is 10 minutes.

Does it mean that if the rebalancing process took less than 10 minutes, only a short version of the message (with supplier statistics) will show up? In general, I have no objections.

On Mon, May 4, 2020 at 10:38 AM ткаленко кирилл wrote:

> Hi, Igniters!
>
> I'd like to share a new small feature in AI [1].
>
> Current rebalance logging does not allow you to quickly answer the following
> questions:
> 1) How long did the rebalance take (divided by supplier)?
> 2) How many records and bytes per supplier were rebalanced?
> 3) How many times did the rebalance restart?
> 4) Which partitions were rebalanced, and from which nodes were they received?
> 5) When did the rebalance for all cache groups end?
>
> What you can see in the logs now:
>
> 1) Starting rebalance with the order of cache groups:
> Rebalancing scheduled [order=[ignite-sys-cache, grp1, grp0],
> top=AffinityTopologyVersion [topVer=2, minorTopVer=0], force=false,
> evt=NODE_JOINED, node=c2146a04-dc23-4bc9-870d-dfbb55c1]
>
> 2) Start of rebalance of a cache group from a specific supplier, specifying
> partition ids and mode - historical or full:
> Starting rebalance routine [ignite-sys-cache,
> topVer=AffinityTopologyVersion [topVer=2, minorTopVer=0],
> supplier=8c525892-703b-4fc4-b28b-b2f13970, fullPartitions=[0-99],
> histPartitions=[]]
>
> 3) Receiving partial or complete partitions of a cache group:
> Completed rebalancing [grp=ignite-sys-cache,
> supplier=8c525892-703b-4fc4-b28b-b2f13970,
> topVer=AffinityTopologyVersion [topVer=5, minorTopVer=0], progress=1/2]
> Completed (final) rebalancing [grp=ignite-sys-cache,
> supplier=c2146a04-dc23-4bc9-870d-dfbb55c1,
> topVer=AffinityTopologyVersion [topVer=5, minorTopVer=0], progress=2/2]
>
> 4) End of rebalance of a cache group.
> Completed rebalance future: RebalanceFuture [grp=CacheGroupContext
> [grp=ignite-sys-cache], topVer=AffinityTopologyVersion [topVer=2,
> minorTopVer=0], rebalanceId=1, routines=1, receivedBytes=1200,
> receivedKeys=0, partitionsLeft=0, startTime=1588519707607, endTime=-1,
> lastCancelledTime=-1]
>
> Rebalance statistics:
>
> To speed up rebalance analysis, statistics will be output for each cache
> group and in total for all cache groups.
> If the rebalance duration for a cache group is greater than the threshold
> value, the partition distribution is output.
> Statistics will allow you to analyze the rebalance duration for each
> supplier, to understand which of them has been transmitting data for the
> longest time.
>
> System properties are used to output statistics:
>
> IGNITE_QUIET - to output statistics, the value must be false;
> IGNITE_WRITE_REBALANCE_PARTITION_DISTRIBUTION_THRESHOLD - the threshold
> rebalance duration for a cache group after which the partition distribution
> is output; set in milliseconds, default value is 10 minutes.
>
> Statistics examples:
>
> Successful full and historical rebalance of a cache group, without
> partition distribution.
> Rebalance information per cache group (successful rebalance): [id=3181548, > name=grp1, startTime=2020-04-13 10:55:16,117, finishTime=2020-04-13 > 10:55:16,127, d=10 ms, restarted=0] Supplier statistics: [nodeId=0, p=5, > d=10 ms] [nodeId=1, p=5, d=10 ms] Aliases: p - partitions, e - entries, b - > bytes, d - duration, h - historical, nodeId mapping > (nodeId=id,consistentId) [0=rebalancing.RebalanceStatisticsTest1] > [1=rebalancing.RebalanceStatisticsTest0] > Rebalance information per cache group (successful rebalance): [id=3181547, > name=grp0, startTime=2020-04-13 15:01:44,000, finishTime=2020-04-13 > 15:01:44,116, d=116 ms, restarted=0] Supplier statistics: [nodeId=0, hp=10, > he=300, hb=30267, d=116 ms] Aliases: p - partitions, e - entries, b - > bytes, d - duration, h - historical, nodeId mapping > (nodeId=id,consistentId) [0=rebalancing.RebalanceStatisticsTest0] > > Successful full and historical rebalance of group cache, with partitions > distribution. > Rebalance information per cache group (successful rebalance): [id=3181548, > name=grp1, startTime=2020-04-13 10:55:16,117, finishTime=2020-04-13 > 10:55:16,127, d=10 ms, restarted=0] Supplier statistics: [nodeId=0, p=5, > d=10 ms] [nodeId=1, p=5, d=10 ms] Aliases: p - partitions, e - entries, b - > bytes, d - duration, h - historical, nodeId mapping > (nodeId=id,consistentId) [0=rebalancing.RebalanceStatisticsTest1] > [1=rebalancing.RebalanceStatisticsTest0] Rebalance duration was greater > than 5 ms, printing detailed information about partitions distribution > (threshold can be changed by setting number of milliseconds into > IGNITE_WRITE_REBALANCE_PARTITION_DISTRIBUTION_THRESHOLD) 0 = > [0,bu,su],[1,bu],[2,pr,su] 1 = [0,bu,su],[1,bu],[2,pr,su] 2 = > [0,bu,su],[1,bu],[2,pr,su] 3 = [0,bu,su],[1,bu],[2,pr,su] 4 = > [0,bu,su],[1,bu],[2,pr,su] 5 = [0,bu,su],[1,bu],[2,pr,su] 6 = > [0,bu,su],[1,bu],[2,pr,su] 7 = [0,bu,su]
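For reference, the two system properties mentioned above would typically be passed to the node's JVM as -D options. A hypothetical example (property names are taken from the message; how the options are injected into your start script may vary), lowering the distribution threshold to 5 minutes:

```shell
# Enable verbose output and print partition distribution for any cache group
# whose rebalance took longer than 300000 ms (5 minutes).
JVM_OPTS="-DIGNITE_QUIET=false -DIGNITE_WRITE_REBALANCE_PARTITION_DISTRIBUTION_THRESHOLD=300000"

echo "$JVM_OPTS"
```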
Re: Apache Ignite 2.8.1 RELEASE [Time, Scope, Manager]
Hi, I suggest to include these fixes into 2.8.1 release: https://issues.apache.org/jira/browse/IGNITE-12101 https://issues.apache.org/jira/browse/IGNITE-12651 On Fri, Apr 17, 2020 at 11:32 AM Ivan Pavlukhin wrote: > Hi folks, > > A side note from an external spectator. Should not we reflect on the > release page [1] who is a release manager? > > [1] https://cwiki.apache.org/confluence/display/IGNITE/Apache+Ignite+2.8.1 > > Best regards, > Ivan Pavlukhin > > пт, 17 апр. 2020 г. в 11:11, Nikolay Izhikov : > > > > Hello, Igniters. > > > > I’ve added all tickets proposed in this thread to 2.8.1 scope [1] > > For now we have > > > > 61 resolved tickets. > > 19 unresolved tickets. > > > > > > > > [1] > https://cwiki.apache.org/confluence/display/IGNITE/Apache+Ignite+2.8.1 > > > > > 17 апр. 2020 г., в 01:38, Alex Plehanov > написал(а): > > > > > > Hello guys, > > > > > > I propose to cherry-pick to 2.8.1 following bug-fixes too, which are > > > already in master: > > > Scan query over an evicted partition can cause node failure > (IGNITE-12734 > > > [1]) > > > Java thin client: There were problems with deserialization of some > types on > > > the client-side, these types can't be used (IGNITE-12624 > > > [2], IGNITE-12468 [3]) > > > Java thin client: Thread doesn't stop properly on client close when > > > partition awareness is enabled, this prevents main() method from > exiting > > > (IGNITE-12743 [4]) > > > > > > Also, there is a performance fix for checkpoint read lock, which I > propose > > > to cherry-pick too (IGNITE-12491 [5]). This fix brings significant > > > performance boost on environments with a large number of CPUs (there > was > > > some drop on such environments introduced in 2.8.0 for all > transactional > > > operations after IGNITE-12593 fixing) > > > > > > WDYT? 
> > > > > > [1]: https://issues.apache.org/jira/browse/IGNITE-12734 > > > [2]: https://issues.apache.org/jira/browse/IGNITE-12624 > > > [3]: https://issues.apache.org/jira/browse/IGNITE-12468 > > > [4]: https://issues.apache.org/jira/browse/IGNITE-12743 > > > [5]: https://issues.apache.org/jira/browse/IGNITE-12491 > > > > > > чт, 16 апр. 2020 г. в 18:48, Maxim Muzafarov : > > > > > >> Nikolay, > > >> > > >> Probably, we should not wait for all blocker issues in minor bug-fix > > >> releases except very special cases. I think we should release all > > >> accumulated bug-fixes `as is` and schedule the next 2.8.2 release. > > >> This will allow as to have shorten minor releases. > > >> > > >> On Thu, 16 Apr 2020 at 18:17, Nikolay Izhikov > wrote: > > >>> > > >>> Hello, Igniters. > > >>> > > >>> I’m started to work on this 2.8.1 release [1] > > >>> > > >>> Resolved issues for release(28) - [2] > > >>> Unresolved issues for release(30) - [3] > > >>> > > >>> My next step: > > >>> > > >>> 1. I want to double-check that all commits for the tickets with the > > >> fixVersion=2.8.1 presented in corresponding release branch. > > >>> And cherry-pick losted changes. > > >>> > > >>> 2. I want to reduce the scope of the release and exclude tickets > that is > > >> not ready for now. > > >>> > > >>> As you may know, 2.8.1 is a bug fix release. > > >>> Therefore, I think we can wait only for a blocker issues. > > >>> > > >>> What do you think? 
> > >>> > > >>> [1] > > >> > https://cwiki.apache.org/confluence/display/IGNITE/Apache+Ignite+2.8.1 > > >>> [2] > > >> > https://issues.apache.org/jira/issues/?jql=(project%20%3D%20%27Ignite%27%20AND%20fixVersion%20is%20not%20empty%20AND%20fixVersion%20in%20(%272.8.1%27))%20AND%20(component%20is%20EMPTY%20OR%20component%20not%20in%20(documentation))%20and%20status%20in%20(%27CLOSED%27%2C%20%27RESOLVED%27)%20ORDER%20BY%20priority%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20 > > >>> [3] > > >> > https://issues.apache.org/jira/issues/?jql=(project%20%3D%20%27Ignite%27%20AND%20fixVersion%20is%20not%20empty%20AND%20fixVersion%20in%20(%272.8.1%27))%20AND%20(component%20is%20EMPTY%20OR%20component%20not%20in%20(documentation))%20%20and%20status%20not%20in%20(%27CLOSED%27%2C%20%27RESOLVED%27)%20ORDER%20BY%20priority%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20 > > >>> > > >>> > > 8 апр. 2020 г., в 20:15, Вячеслав Коптилин < > slava.kopti...@gmail.com> > > >> написал(а): > > > > Folks, > > > > I'd like to add ticket IGNITE-12805 "NullPointerException on node > > >> restart > > when 3rd party persistence and Ignite native persistence are used" > to > > ignite-2.8.1 scope. > > > > [1] https://issues.apache.org/jira/browse/IGNITE-12805 > > > > Thanks, > > S. > > > > вт, 7 апр. 2020 г. в 19:57, Ilya Kasnacheev < > ilya.kasnach...@gmail.com > > >>> : > > > > > Hello! > > > > > > Done! > > > > > > Regards, > > > -- > > > Ilya Kasnacheev > > > > > > > > > вт, 7 апр. 2020 г. в 12:31, Sergey : > > > > > >> Hi, > > >> > > >> I'm proposing to add > > >> https://issues.apache.org/jira/bro
Re: [DISCUSSION] Major changes in Ignite in 2020
Hi everyone!

Major changes that are going to be contributed from our side:
- https://issues.apache.org/jira/browse/IGNITE-11704 - keeping tombstones for removed entries to make rebalance consistent (so far this problem is solved by an on-heap deferred-deletes queue).
- https://issues.apache.org/jira/browse/IGNITE-11147 - don't cancel an ongoing rebalance if the affinity assignment for the rebalancing group wasn't changed during the PME.
- A batch of other updates related to historical rebalance. The goal is to make historical rebalance stable and to ensure that, if WAL history is configured properly, the cluster will be able to recover data consistency via historical rebalance in case of any topology changes (including a cycling restart).
- Overhaul of partition loss handling. It has several flaws so far; the most critical one is that by default (with PartitionLossPolicy.IGNORE) Ignite may silently lose data. Also, IGNORE is totally inapplicable to scenarios where persistence is enabled and the BLT is established. Even the safe policies have bugs: the LOST state is reset when a node rejoins the cluster, so data can actually be lost even with a safe policy. We are going to make a safe policy the default and fix the related bugs.
- Distributed tracing (via OpenCensus). Discovery, communication and transactions will be covered.
On Fri, Apr 10, 2020 at 11:43 AM Anton Kalashnikov wrote: > My top priorities: > * Cache warm-up - loading data from disk to memory before the join to > cluster - > https://cwiki.apache.org/confluence/display/IGNITE/IEP-40+Cache+warm-up > * PDS Defragmentation - possibility to free up space on disc after > removing entries > > > -- > Best regards, > Anton Kalashnikov > > > > 20.03.2020, 10:19, "Pavel Tupitsyn" : > > My top priorities: > >- Thin Client API extension: Compute, Continuous Queries, Services >- .NET Near Cache: soon to come in Thick API, to be investigated for >Thin Clients >- .NET Modernization for Ignite 3.0: drop legacy .NET Framework >support, target .NET Standard 2.0, add nullable annotations to the API > > > On Fri, Mar 20, 2020 at 5:23 AM Saikat Maitra > wrote: > > Hi Denis, > > Thank you for sharing the list of top changes. The list looks good. > > I wanted to share that efforts regarding IEP-36 is already underway and > there are also open PRs under review and working through review feedback. > One of the area that we are focussing is first we will merge changes in > ignite-extensions repo before removing the specific migrated module from > ignite repo. > > There are also contribution from community on bug fixes in > ignite-extensions repo as well which we are verifying and merging in > ignite-extensions repo after running through CI pipeline in teamcity. > > I like the focus area on docs and I really like the Apache Ignite Usecases > page https://ignite.apache.org/provenusecases.html, I would like to > suggest if we can add a page like powered by Apache Ignite and list few Org > who are already using Apache Ignite in prod. 
> > Something similar to this page: https://flink.apache.org/poweredby.html
> >
> > Regards,
> > Saikat
> >
> On Thu, Mar 19, 2020 at 1:44 PM Denis Magda wrote:
> > My top list of changes is as follows:
> >
> > - Feature: a new lightweight Apache Ignite website with advanced search engine optimizations and updated technical content. Why? Much better discoverability of Ignite via search engines like Google, to let many more application developers learn that Ignite exists. This change is to be brought to life soon: http://apache-ignite-developers.2346864.n4.nabble.com/Ignite-Website-New-Look-td46324.html
> >
> > - Feature: new Ignite documentation on a new platform and with a new structure. Why? Ignite documentation has to help new application developers get up and running as quickly as possible; it also has to become a primary source that answers most of the questions. Our current docs have a lot of gaps: https://issues.apache.org/jira/browse/IGNITE-7595
> >
> > - Process change: to be successful with the point above, documentation should be created/updated before we close a JIRA ticket for a code/API/feature contribution. Why? First, application developers learn Ignite and create their Ignite apps referring to the API reference and technical documentation (not to the source code); thus, documentation needs to be treated as an integral part of the whole project. Second, while writing a new documentation paragraph we could discover incompleteness of a fix/feature, or usability issues, before the change is released publicly.
> >
> > - Feature: complete the modularization project by defining the Ignite core that will be released separately from Ignite extensions. The 'why' is written here: https://cwiki.apache.org/confluence/display/IGNITE/IEP-36%3A+Modularization
> >
> > -
> > Denis
> >
> > On Thu, Mar 19, 2020 at 11:21 AM Denis Magda wrote:
Re: Data vanished from cluster after INACTIVE/ACTIVE switch
I don't think that making javadocs more descriptive can be considered harmful code-base enlargement. I'd recommend extending the docs, but the last word is yours ;)

On Tue, Mar 31, 2020 at 2:44 PM Vladimir Steshin wrote:
> Ivan, hi.
>
> I absolutely agree that this particular description is not enough to see the deactivation issue. I also vote for brief code.
>
> There are about 15 places in the inner logic with this description. I propose a balance between code-base size and comment completeness.
>
> Should we enlarge the code even if we already have several full descriptions?
>
> On 30.03.2020 20:02, Ivan Rakov wrote:
> > Vladimir,
> >
> > > @param forceDeactivation If {@code true}, cluster deactivation will be forced.
> >
> > It's true that it's possible to infer the semantics of forced deactivation from other parts of the API. I just wanted to highlight that exactly this description explains something that can already be guessed from the parameter name. I propose to shorten the lookup path and shed some light on the deactivation semantics:
> >
> > > @param forceDeactivation If {@code true}, cluster will be deactivated even if running in-memory caches are present. All data in the corresponding caches will be lost as a result.
> >
> > Does this make sense?
> >
> > On Fri, Mar 27, 2020 at 12:00 PM Vladimir Steshin wrote:
> > > Ivan, hi.
> > >
> > > 1) >>> Is it correct? If we are on the same page, let's proceed this way
> > >
> > > It is correct.
> > >
> > > 2) - In many places in the code I can see the following javadoc
> > >
> > > > @param forceDeactivation If {@code true}, cluster deactivation will be forced.
> > >
> > > In the internal params/flags.
> > > You can also find @see ClusterState#INACTIVE and a full description on several public APIs (like Ignite.active(boolean)):
> > >
> > > /**
> > >  * NOTE:
> > >  * Deactivation clears in-memory caches (without persistence) including
> > >  * the system caches.
> > >  */
> > >
> > > Should be enough, shouldn't it?
> > >
> > > On 27.03.2020 10:51, Ivan Rakov wrote:
> > > > Vladimir, Igniters,
> > > >
> > > > Let's emphasize our final plan.
> > > >
> > > > We are going to add --force flags that will be necessary to pass for a deactivation if there are in-memory caches to:
> > > > 1) REST API (already implemented in [1])
> > > > 2) Command-line utility (already implemented in [1])
> > > > 3) JMX bean (going to be implemented in [2])
> > > > We are *not* going to change IgniteCluster or any other thick Java API, thus we are *not* going to merge [3].
> > > > We plan to *fully roll back* [1] and [2] once cache data survival after an activation-deactivation cycle is implemented.
> > > >
> > > > Is it correct? If we are on the same page, let's proceed this way.
> > > > I propose to:
> > > > - Create a JIRA issue for in-memory-data-safe deactivation (possibly without an IEP and detailed design so far)
> > > > - Describe in the issue description which exact parts of the API should be removed under the issue scope.
> > > >
> > > > Also, a few questions on the already merged [1]:
> > > > - We have removed GridClientClusterState#state(ClusterState) from the Java client API. Is it a legitimate thing to do? Don't we have to support API compatibility for thin clients as well?
> > > > - In many places in the code I can see the following javadoc
> > > >
> > > > > @param forceDeactivation If {@code true}, cluster deactivation will be forced.
> > > >
> > > > As for me, this javadoc doesn't clarify anything. I'd suggest describing in which cases deactivation won't happen unless it's forced, and what impact forced deactivation will have on the system.
> > > >
> > > > [1]: https://issues.apache.org/jira/browse/IGNITE-12701
> > > > [2]: https://issues.apache.org/jira/browse/IGNITE-12779
> > > > [3]: https://issues.apache.org/jira/browse/IGNITE-12614
> > > >
> > > > --
> > > > Ivan
> > > >
> > > > On Tue, Mar
Re: Data vanished from cluster after INACTIVE/ACTIVE switch
Vladimir,

> @param forceDeactivation If {@code true}, cluster deactivation will be forced.

It's true that it's possible to infer the semantics of forced deactivation from other parts of the API. I just wanted to highlight that exactly this description explains something that can already be guessed from the parameter name. I propose to shorten the lookup path and shed some light on the deactivation semantics:

> @param forceDeactivation If {@code true}, cluster will be deactivated even if running in-memory caches are present. All data in the corresponding caches will be lost as a result.

Does this make sense?

On Fri, Mar 27, 2020 at 12:00 PM Vladimir Steshin wrote:
> Ivan, hi.
>
> 1) >>> Is it correct? If we are on the same page, let's proceed this way
>
> It is correct.
>
> 2) - In many places in the code I can see the following javadoc
>
> > @param forceDeactivation If {@code true}, cluster deactivation will be forced.
>
> In the internal params/flags. You can also find @see ClusterState#INACTIVE and a full description on several public APIs (like Ignite.active(boolean)):
>
> /**
>  * NOTE:
>  * Deactivation clears in-memory caches (without persistence) including
>  * the system caches.
>  */
>
> Should be enough, shouldn't it?
>
> On 27.03.2020 10:51, Ivan Rakov wrote:
> > Vladimir, Igniters,
> >
> > Let's emphasize our final plan.
> >
> > We are going to add --force flags that will be necessary to pass for a deactivation if there are in-memory caches to:
> > 1) REST API (already implemented in [1])
> > 2) Command-line utility (already implemented in [1])
> > 3) JMX bean (going to be implemented in [2])
> > We are *not* going to change IgniteCluster or any other thick Java API, thus we are *not* going to merge [3].
> > We plan to *fully roll back* [1] and [2] once cache data survival after an activation-deactivation cycle is implemented.
> >
> > Is it correct? If we are on the same page, let's proceed this way.
> > I propose to:
> > - Create a JIRA issue for in-memory-data-safe deactivation (possibly without an IEP and detailed design so far)
> > - Describe in the issue description which exact parts of the API should be removed under the issue scope.
> >
> > Also, a few questions on the already merged [1]:
> > - We have removed GridClientClusterState#state(ClusterState) from the Java client API. Is it a legitimate thing to do? Don't we have to support API compatibility for thin clients as well?
> > - In many places in the code I can see the following javadoc
> >
> > > @param forceDeactivation If {@code true}, cluster deactivation will be forced.
> >
> > As for me, this javadoc doesn't clarify anything. I'd suggest describing in which cases deactivation won't happen unless it's forced, and what impact forced deactivation will have on the system.
> >
> > [1]: https://issues.apache.org/jira/browse/IGNITE-12701
> > [2]: https://issues.apache.org/jira/browse/IGNITE-12779
> > [3]: https://issues.apache.org/jira/browse/IGNITE-12614
> >
> > --
> > Ivan
> >
> > On Tue, Mar 24, 2020 at 7:18 PM Vladimir Steshin wrote:
> > > Hi, Igniters.
> > >
> > > I'd like to remind you that the cluster can be deactivated by the user with 3 utilities: control.sh, *JMX and the REST*. The solution proposed in [1] is not about control.sh. It suggests the same approach regardless of which utility the user executes. The task touches *only* the *API of the user calls*, not the internal APIs.
> > >
> > > The reasons why the "--yes" flag and confirmation prompt haven't been taken into account for control.sh are:
> > >
> > > - Various commands widely use "--yes" just to start. Even non-dangerous ones require "--yes" to begin. "--force" is dedicated to *harmful actions*.
> > >
> > > - The check for probable data erasure works after the command starts, so "--force" may not be required at all.
> > >
> > > - There are also JMX and REST. They have no "--yes" but should work alike.
> > >
> > > To make deactivation safe I propose to merge the last ticket with the JMX fixes [2]. In future releases, I believe, we should estimate the work and fix memory erasure in general. For now, let's prevent it. WDYT?
> > >
> > > [1] https://issues.apache.org/jira/browse/IGNITE-12614
> > >
> > > [2] https://is
Re: Security Subject of thin client on remote nodes
Denis,

In general, the code changes look good to me. If we decide to keep the security API in its current state for a while, I highly recommend extending its documentation. We don't have descriptive javadocs or articles about the security API so far, so I expect that the next contributors will face difficulties in untangling the security logic. Let's help them a bit. See more details in my JIRA comment: https://issues.apache.org/jira/browse/IGNITE-12759

On Thu, Mar 26, 2020 at 5:54 PM Ivan Rakov wrote:
> Denis,
>
> I'll review your PR. If this issue is a subject to be included in 2.8.1 in emergency mode, I'm OK with the current API changes.
> Please think about driving the creation of an IEP on the security API overhaul prior to 2.9. I believe that you are the most suitable Ignite community member to drive this activity. I'd love to share some ideas as well.
>
> On Tue, Mar 24, 2020 at 2:04 PM Denis Garus wrote:
> > Hi, guys!
> >
> > I agree that we should rework the security API, but it can take a long time.
> > And currently, our users have certain impediments that are blockers for their job.
> > I think we have to fix the bugs that IEP-41 [1] contains as soon as possible to support our users.
> > From my point of view, IEP-41 is the best place to track the bug fixing.
> >
> > 1. https://cwiki.apache.org/confluence/display/IGNITE/IEP-41%3A+Security+Context+of+thin+client+on+remote+nodes
> >
> > On Tue, 24 Mar 2020 at 12:26, Ivan Rakov wrote:
> > > Alexey,
> > >
> > > That can be another version of our plan. If everyone agrees that SecurityContext and SecuritySubject should be merged, such a fix of the thin clients' issue will bring us closer to the final solution.
> > > Denis, what do you think?
> > >
> > > On Tue, Mar 24, 2020 at 10:38 AM Alexei Scherbakov <alexey.scherbak...@gmail.com> wrote:
> > > > Why can't we start gradually changing the security API right now?
> > > > I see no point in delaying it.
> > > > All changes will go into the next 2.9 release anyway.
> > > >
> > > > My proposal:
> > > > 1. Get rid of the security context. Doing this will bring the security API to a more or less consistent state.
> > > > 2. Remove IEP-41, because it's no longer needed after change [1]
> > > > 3. Propose an IEP to make the security API avoid using internals.
> > > >
> > > > On Mon, 23 Mar 2020 at 19:53, Denis Garus wrote:
> > > > > Hello, Alexei, Ivan!
> > > > >
> > > > > >> Seems like security API is indeed a bit over-engineered
> > > > >
> > > > > Nobody doubts we should rework GridSecurityProcessor.
> > > > > But this point is outside the scope of the thin client's problem that we are solving.
> > > > > I think we can create a new IEP that will accumulate all ideas for Ignite's security improvements.
> > > > >
> > > > > >> Presence of the separate #securityContext(UUID) highlights that user indeed should care
> > > > > >> about propagation of thin clients' contexts between the cluster nodes.
> > > > >
> > > > > I agree with Ivan. I've implemented both variants, and I like the one with #securityContext(UUID) more.
> > > > >
> > > > > Could you please take a look at PR [1] for the issue [2]?
> > > > >
> > > > > 1. https://github.com/apache/ignite/pull/7523
> > > > > 2. https://issues.apache.org/jira/browse/IGNITE-12759
> > > > >
> > > > > On Mon, 23 Mar 2020 at 11:45, Ivan Rakov wrote:
> > > > > > Alex, Denis,
> > > > > >
> > > > > > Seems like the security API is indeed a bit over-engineered.
> > > > > >
> > > > > > > Let's get rid of SecurityContext and use SecuritySubject instead.
> > > > > > > SecurityContext is just a POJO wrapper over SecuritySubject's org.apache.ignite.plugin.security.SecuritySubject#permissions.
> > > > > > > Its functionality can be easily moved to SecuritySubject.
> > > > > >
> > > > > > I totally
Re: Data vanished from cluster after INACTIVE/ACTIVE switch
Vladimir, Igniters,

Let's emphasize our final plan.

We are going to add --force flags that will be necessary to pass for a deactivation if there are in-memory caches to:
1) REST API (already implemented in [1])
2) Command-line utility (already implemented in [1])
3) JMX bean (going to be implemented in [2])

We are *not* going to change IgniteCluster or any other thick Java API, thus we are *not* going to merge [3].
We plan to *fully roll back* [1] and [2] once cache data survival after an activation-deactivation cycle is implemented.

Is it correct? If we are on the same page, let's proceed this way.
I propose to:
- Create a JIRA issue for in-memory-data-safe deactivation (possibly without an IEP and detailed design so far)
- Describe in the issue description which exact parts of the API should be removed under the issue scope.

Also, a few questions on the already merged [1]:
- We have removed GridClientClusterState#state(ClusterState) from the Java client API. Is it a legitimate thing to do? Don't we have to support API compatibility for thin clients as well?
- In many places in the code I can see the following javadoc

> @param forceDeactivation If {@code true}, cluster deactivation will be forced.

As for me, this javadoc doesn't clarify anything. I'd suggest describing in which cases deactivation won't happen unless it's forced, and what impact forced deactivation will have on the system.

[1]: https://issues.apache.org/jira/browse/IGNITE-12701
[2]: https://issues.apache.org/jira/browse/IGNITE-12779
[3]: https://issues.apache.org/jira/browse/IGNITE-12614

--
Ivan

On Tue, Mar 24, 2020 at 7:18 PM Vladimir Steshin wrote:
> Hi, Igniters.
>
> I'd like to remind you that the cluster can be deactivated by the user with 3 utilities: control.sh, *JMX and the REST*. The solution proposed in [1] is not about control.sh. It suggests the same approach regardless of which utility the user executes. The task touches *only* the *API of the user calls*, not the internal APIs.
>
> The reasons why the "--yes" flag and confirmation prompt haven't been taken into account for control.sh are:
>
> - Various commands widely use "--yes" just to start. Even non-dangerous ones require "--yes" to begin. "--force" is dedicated to *harmful actions*.
>
> - The check for probable data erasure works after the command starts, so "--force" may not be required at all.
>
> - There are also JMX and REST. They have no "--yes" but should work alike.
>
> To make deactivation safe I propose to merge the last ticket with the JMX fixes [2]. In future releases, I believe, we should estimate the work and fix memory erasure in general. For now, let's prevent it. WDYT?
>
> [1] https://issues.apache.org/jira/browse/IGNITE-12614
>
> [2] https://issues.apache.org/jira/browse/IGNITE-12779
>
> On 24.03.2020 15:55, Вячеслав Коптилин wrote:
> > Hello Nikolay,
> >
> > I am talking about the interactive mode of the control utility, which requires explicit confirmation from the user.
> > Please take a look at DeactivateCommand#prepareConfirmation and its usages.
> > It seems to me this mode has the same aim as the forceDeactivation flag.
> > We can change the message returned by DeactivateCommand#confirmationPrompt as follows:
> > "Warning: the command will deactivate the cluster nnn and clear in-memory caches (without persistence) including system caches."
> >
> > What do you think?
> >
> > Thanks,
> > S.
> >
> > On Tue, 24 Mar 2020 at 13:07, Nikolay Izhikov wrote:
> > > Hello, Slava.
> > >
> > > Are you talking about this commit [1] (sorry for the commit message, it's due to the GitHub issue)?
> > >
> > > The message for this command for now:
> > >
> > > "Deactivation stopped. Deactivation clears in-memory caches (without persistence) including the system caches."
> > >
> > > Is it clear enough?
> > >
> > > [1] https://github.com/apache/ignite/commit/4921fcf1fecbd8a1ab02099e09cc2adb0b3ff88a
> > >
> > > > On 24 Mar 2020, at 13:02, Вячеслав Коптилин wrote:
> > > > Hi Nikolay,
> > > >
> > > > > 1. We should add a --force flag to the control.sh deactivation command.
> > > >
> > > > I just checked, and it seems that the deactivation command (control-utility.sh) already has a confirmation option.
> > > > Perhaps we need to clearly state the consequences of using this command with in-memory caches.
> > > >
> > > > Thanks,
> > > > S.
> > > >
> > > > On Tue, 24 Mar 2020 at 12:51, Nikolay Izhikov wrote:
> > > > > Hello, Alexey.
> > > > >
> > > > > I just repeat our agreement to be on the same page
> > > > >
> > > > > > The confirmation should only be present in the user-facing interfaces.
> > > > >
> > > > > 1. We should add a --force flag to the control.sh deactivation command.
> > > > > 2. We should throw an exception if the cluster has in-memory caches and --force=false.
> > > > > 3. We shouldn't change the Java API for deactivation.
> > > > >
> > > > > Is it correct?
> > > > >
> > > > > > The DROP TABLE command does not have a "yes I am sure" clause in it
> > > > >
> > > > > I think that's because the command itself has the word "DROP" in it
Re: Security Subject of thin client on remote nodes
Denis,

I'll review your PR. If this issue is a subject to be included in 2.8.1 in emergency mode, I'm OK with the current API changes.
Please think about driving the creation of an IEP on the security API overhaul prior to 2.9. I believe that you are the most suitable Ignite community member to drive this activity. I'd love to share some ideas as well.

On Tue, Mar 24, 2020 at 2:04 PM Denis Garus wrote:
> Hi, guys!
>
> I agree that we should rework the security API, but it can take a long time.
> And currently, our users have certain impediments that are blockers for their job.
> I think we have to fix the bugs that IEP-41 [1] contains as soon as possible to support our users.
> From my point of view, IEP-41 is the best place to track the bug fixing.
>
> 1. https://cwiki.apache.org/confluence/display/IGNITE/IEP-41%3A+Security+Context+of+thin+client+on+remote+nodes
>
> On Tue, 24 Mar 2020 at 12:26, Ivan Rakov wrote:
> > Alexey,
> >
> > That can be another version of our plan. If everyone agrees that SecurityContext and SecuritySubject should be merged, such a fix of the thin clients' issue will bring us closer to the final solution.
> > Denis, what do you think?
> >
> > On Tue, Mar 24, 2020 at 10:38 AM Alexei Scherbakov <alexey.scherbak...@gmail.com> wrote:
> > > Why can't we start gradually changing the security API right now?
> > > I see no point in delaying it.
> > > All changes will go into the next 2.9 release anyway.
> > >
> > > My proposal:
> > > 1. Get rid of the security context. Doing this will bring the security API to a more or less consistent state.
> > > 2. Remove IEP-41, because it's no longer needed after change [1]
> > > 3. Propose an IEP to make the security API avoid using internals.
> > >
> > > On Mon, 23 Mar 2020 at 19:53, Denis Garus wrote:
> > > > Hello, Alexei, Ivan!
> > > >
> > > > >> Seems like security API is indeed a bit over-engineered
> > > >
> > > > Nobody doubts we should rework GridSecurityProcessor.
> > > > But this point is outside the scope of the thin client's problem that we are solving.
> > > > I think we can create a new IEP that will accumulate all ideas for Ignite's security improvements.
> > > >
> > > > >> Presence of the separate #securityContext(UUID) highlights that user indeed should care
> > > > >> about propagation of thin clients' contexts between the cluster nodes.
> > > >
> > > > I agree with Ivan. I've implemented both variants, and I like the one with #securityContext(UUID) more.
> > > >
> > > > Could you please take a look at PR [1] for the issue [2]?
> > > >
> > > > 1. https://github.com/apache/ignite/pull/7523
> > > > 2. https://issues.apache.org/jira/browse/IGNITE-12759
> > > >
> > > > On Mon, 23 Mar 2020 at 11:45, Ivan Rakov wrote:
> > > > > Alex, Denis,
> > > > >
> > > > > Seems like the security API is indeed a bit over-engineered.
> > > > >
> > > > > > Let's get rid of SecurityContext and use SecuritySubject instead.
> > > > > > SecurityContext is just a POJO wrapper over SecuritySubject's org.apache.ignite.plugin.security.SecuritySubject#permissions.
> > > > > > Its functionality can be easily moved to SecuritySubject.
> > > > >
> > > > > I totally agree. Both subject and context are implemented by the plugin provider, and I don't see any reason to keep both abstractions, especially if we are going to get rid of transferring the subject in node attributes (the argument that the subject is more lightweight won't work anymore).
> > > > >
> > > > > Also, there's kind of a mess in the node authentication logic. There are at least two components responsible for it: DiscoverySpiNodeAuthenticator (which is forcibly set by GridDiscoveryManager, but in fact public) and GridSecurityProcessor (which performs the actual node auth logic, but is private). I also don't un
Re: Data vanished from cluster after INACTIVE/ACTIVE switch
> > I can't agree with the "temporary" design.
> > We have neither a design nor an IEP nor a contributor who can fix the current behavior.
> > And, if I understand Alexey Goncharyuk correctly, the current behavior was implemented intentionally.

Alex, what do you think? Are we on the same page that the desired behavior for deactivation is to keep the data of all in-memory caches, even though it was intentionally implemented another way in 2.0?

On Tue, Mar 24, 2020 at 12:21 PM Nikolay Izhikov wrote:
> Hello, Ivan.
>
> > I believe we should fix the issue instead of adapting the API to temporary flaws.
>
> Agree. Let's fix it.
>
> > I think that a clear description of the active(false) impact in the documentation is more than enough
>
> I can't agree with this point.
>
> We shouldn't rely on the assumption that every user reads the whole documentation and completely understands the consequences of the deactivation command.
>
> This whole thread shows that even active core developers don't understand it.
>
> So my proposal is to remove the --force flag only after we fix deactivation.
>
> > To sum it up, the question is whether we should reflect temporary system design flaws in the API
>
> I can't agree with the "temporary" design.
> We have neither a design nor an IEP nor a contributor who can fix the current behavior.
> And, if I understand Alexey Goncharyuk correctly, the current behavior was implemented intentionally.
>
> So, my understanding is that the current implementation will be here for a while.
> And after we fix it, I totally support removing the --force flag.
>
> > On 24 Mar 2020, at 12:06, Ivan Rakov wrote:
> >
> > > I think the only question is - Do we need the --force flag in the Java API or not.
> >
> > From my perspective, there's also no agreement that it should be present in the thin clients' API. For instance, I think it shouldn't.
> >
> > > As far as I know, IGNITE_REUSE_MEMORY_ON_DEACTIVATE is for *other* purposes.
> > > Can you provide a simple reproducer where in-memory data is not cleared on deactivation?
> >
> > Preserving in-memory data isn't implemented so far, so I can't provide a reproducer. My point is that we are halfway through it: we can build a solution based on IGNITE_REUSE_MEMORY_ON_DEACTIVATE and additional logic for reusing memory pages.
> >
> > > For me, the ultimate value of Ignite in a real production environment is user data.
> > > If we have some cases when data is lost - we should avoid it as hard as we can.
> > >
> > > So, for now, this flag is required.
> >
> > Totally agree that sudden vanishing of user data is unacceptable. But I don't see how it implies that we have to solve this issue by tangling the public API. If we see that the system behaves incorrectly, I believe we should fix the issue instead of adapting the API to temporary flaws. I think that a clear description of the active(false) impact in the documentation is more than enough: on the one hand, if a user didn't read the documentation for the method he calls, he can't complain about the consequences; on the other hand, if a user decided to deactivate the cluster for whatever reason, a -force flag will barely stop him.
> > We anyway have enough time before 2.9 to implement a proper solution.
> >
> > To sum it up, the question is whether we should reflect temporary system design flaws in the API. I think we surely shouldn't: the API certainly lives longer and is not intended to collect workarounds for all bugs that are already fixed or planned to be fixed.
> > We can collect more opinions on this.
> >
> > On Tue, Mar 24, 2020 at 10:22 AM Nikolay Izhikov wrote:
> > > Alexey.
> > >
> > > Having a way to silently vanish user data is even worse.
> > > So I'm strictly against removing the --force flag.
> > >
> > > > On 24 Mar 2020, at 10:16, Alexei Scherbakov <alexey.scherbak...@gmail.com> wrote:
> > > >
> > > > Nikolay,
> > > >
> > > > I'm on the same page with Ivan.
> > > >
> > > > Having a "force" flag in a public API is as preposterous as having it in System.exit.
> > > > For me it looks like badly designed API.
> > > > If a call to some method is dangerous, it should be clearly specified in the javadoc.
> > > > I'm also against some "temporary" API.
Re: Security Subject of thin client on remote nodes
Alexey,

That can be another version of our plan. If everyone agrees that SecurityContext and SecuritySubject should be merged, such a fix of the thin clients' issue will bring us closer to the final solution.
Denis, what do you think?

On Tue, Mar 24, 2020 at 10:38 AM Alexei Scherbakov <alexey.scherbak...@gmail.com> wrote:
> Why can't we start gradually changing the security API right now?
> I see no point in delaying it.
> All changes will go into the next 2.9 release anyway.
>
> My proposal:
> 1. Get rid of the security context. Doing this will bring the security API to a more or less consistent state.
> 2. Remove IEP-41, because it's no longer needed after change [1]
> 3. Propose an IEP to make the security API avoid using internals.
>
> On Mon, 23 Mar 2020 at 19:53, Denis Garus wrote:
> > Hello, Alexei, Ivan!
> >
> > >> Seems like security API is indeed a bit over-engineered
> >
> > Nobody doubts we should rework GridSecurityProcessor.
> > But this point is outside the scope of the thin client's problem that we are solving.
> > I think we can create a new IEP that will accumulate all ideas for Ignite's security improvements.
> >
> > >> Presence of the separate #securityContext(UUID) highlights that user indeed should care
> > >> about propagation of thin clients' contexts between the cluster nodes.
> >
> > I agree with Ivan. I've implemented both variants, and I like the one with #securityContext(UUID) more.
> >
> > Could you please take a look at PR [1] for the issue [2]?
> >
> > 1. https://github.com/apache/ignite/pull/7523
> > 2. https://issues.apache.org/jira/browse/IGNITE-12759
> >
> > On Mon, 23 Mar 2020 at 11:45, Ivan Rakov wrote:
> > > Alex, Denis,
> > >
> > > Seems like the security API is indeed a bit over-engineered.
> > >
> > > > Let's get rid of SecurityContext and use SecuritySubject instead.
> > > > SecurityContext is just a POJO wrapper over SecuritySubject's org.apache.ignite.plugin.security.SecuritySubject#permissions.
> > > > Its functionality can be easily moved to SecuritySubject.
> > >
> > > I totally agree. Both subject and context are implemented by the plugin provider, and I don't see any reason to keep both abstractions, especially if we are going to get rid of transferring the subject in node attributes (the argument that the subject is more lightweight won't work anymore).
> > >
> > > Also, there's kind of a mess in the node authentication logic. There are at least two components responsible for it: DiscoverySpiNodeAuthenticator (which is forcibly set by GridDiscoveryManager, but in fact public) and GridSecurityProcessor (which performs the actual node auth logic, but is private). I also don't understand why we need both #authenticate(AuthenticationContext) and #authenticateNode(ClusterNode, SecurityCredentials) methods while it's possible to set an explicit SecuritySubjectType.REMOTE_NODE in AuthenticationContext (this is arguable; perhaps there are strong reasons).
> > >
> > > Finally, the areas of responsibility between IgniteSecurity and GridSecurityProcessor are kind of mixed. As far as I understand, the first is responsible for Ignite-internal management of the security logic (keeping the thread-local context, caching security contexts, etc.; we don't expect IgniteSecurity to be replaced by a plugin provider) and the latter is responsible for user-custom authentication / authorization logic. To be honest, it took me plenty of time to figure this out.
> > >
> > > From my point of view, we should make the GridSecurityProcessor interface public, rename it (it takes plenty of time to find the difference from IgniteSecurity), make its API as simple and non-duplicating as possible, and clarify its area of responsibility (e.g. should it be responsible for propagation of the successfully authenticated subject among all nodes or not?) to make it easy to embed custom security logic in Ignite.
> > >
> > > Regarding the thin clients fix: the implementation made by Denis suits better the very implicit contract that it's better to change API contracts of the internal IgniteSecurity than of the internal G
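The merge proposed in this thread (fold SecurityContext's permission check into SecuritySubject) can be sketched in miniature. All names below are illustrative stand-ins, not Ignite's actual security SPI:

```java
import java.util.Set;

/**
 * Illustrative sketch (hypothetical names, not Ignite's API): if the
 * permission check lives on the subject itself, the separate context
 * POJO wrapper becomes unnecessary, as proposed in this thread.
 */
public class MergedSubjectSketch {
    /** Stand-in for a merged SecuritySubject that can authorize directly. */
    interface SecuritySubject {
        Set<String> permissions();

        /** Previously the wrapper's job: check a permission directly on the subject. */
        default boolean operationAllowed(String permission) {
            return permissions().contains(permission);
        }
    }

    public static void main(String[] args) {
        // A single functional interface: the subject carries its permissions
        // and answers authorization questions without a separate context object.
        SecuritySubject subj = () -> Set.of("CACHE_READ", "CACHE_PUT");

        if (!subj.operationAllowed("CACHE_READ") || subj.operationAllowed("CACHE_DESTROY"))
            throw new AssertionError("permission check failed");

        System.out.println("subject authorizes operations without a separate context");
    }
}
```

The design point is the one Ivan makes above: both abstractions are implemented by the same plugin provider, so keeping a wrapper whose only job is delegating to `permissions()` adds a concept without adding capability.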
Re: Data vanished from cluster after INACTIVE/ACTIVE switch
> > I think the only question is - Do we need the --force flag in the Java API or not.

From my perspective, there's also no agreement that it should be present in the thin clients' API. For instance, I think it shouldn't.

> As far as I know, IGNITE_REUSE_MEMORY_ON_DEACTIVATE is for *other* purposes.
> Can you provide a simple reproducer where in-memory data is not cleared on deactivation?

Preserving in-memory data isn't implemented so far, so I can't provide a reproducer. My point is that we are halfway through it: we can build a solution based on IGNITE_REUSE_MEMORY_ON_DEACTIVATE and additional logic for reusing memory pages.

> For me, the ultimate value of Ignite in a real production environment is user data.
> If we have some cases when data is lost - we should avoid it as hard as we can.
>
> So, for now, this flag is required.

Totally agree that sudden vanishing of user data is unacceptable. But I don't see how it implies that we have to solve this issue by tangling the public API. If we see that the system behaves incorrectly, I believe we should fix the issue instead of adapting the API to temporary flaws. I think that a clear description of the active(false) impact in the documentation is more than enough: on the one hand, if a user didn't read the documentation for the method he calls, he can't complain about the consequences; on the other hand, if a user decided to deactivate the cluster for whatever reason, a -force flag will barely stop him.
We anyway have enough time before 2.9 to implement a proper solution.

To sum it up, the question is whether we should reflect temporary system design flaws in the API. I think we surely shouldn't: the API certainly lives longer and is not intended to collect workarounds for all bugs that are already fixed or planned to be fixed.
We can collect more opinions on this.

On Tue, Mar 24, 2020 at 10:22 AM Nikolay Izhikov wrote:
> Alexey.
>
> Having a way to silently vanish user data is even worse.
> So I'm strictly against removing the --force flag.
> > > 24 марта 2020 г., в 10:16, Alexei Scherbakov < > alexey.scherbak...@gmail.com> написал(а): > > > > Nikolay, > > > > I'm on the same page with Ivan. > > > > Having "force" flag in public API as preposterous as having it in > > System.exit. > > For me it looks like badly designed API. > > If a call to some method is dangerous it should be clearly specified in > the > > javadoc. > > I'm also against some "temporary" API. > > > > We should: > > > > 1. Partially remove IGNITE-12701 except javadoc part. Note control.sh > for a > > long time has support for a confirmation on deactivation (interactive > mode). > > 2. IGNITE_REUSE_MEMORY_ON_DEACTIVATE=true already preserves memory > content > > after deactivation. We should start working on restoring page memory > state > > after subsequent reactivation. > > 3. Describe current behavior for in-memory cache on deactivation in > Ignite > > documentation. > > > > > > пн, 23 мар. 2020 г. в 21:22, Nikolay Izhikov : > > > >> Hello, Ivan. > >> > >>> Seems like we don't have a final agreement on whether we should add > force > >> flag to deactivation API. > >> > >> I think the only question is - Do we need —force flag in Java API or > not. > >> > >> > >>> As a final solution, I'd want to see behavior when all in-memory data > is > >> available after deactivation and further activation. > >> > >> Agree. > >> > >>> I believe it’s possible to don't deallocate memory > >>> (like mentioned before, we already can achieve that with > >> IGNITE_REUSE_MEMORY_ON_DEACTIVATE=true) and carefully reuse all loaded > data > >> pages on next activation and caches start. > >> > >> As far as I know, IGNITE_REUSE_MEMORY_ON_DEACTIVATE is for *other* > purpose. > >> Can you provide a simple reproducer when in-memory data not cleared on > >> deactivation? > >> > >>> Considering this, do we really need to introduce force flag as a > >> temporary precaution? > >> > >> My answer is yes we need it. 
> >> Right now, we can’t prevent data loss on deactivation for in-memory > caches. > >> > >> For me, the ultimate value of Ignite into real production environment is > >> user data. > >> If we have some cases when data is lost - we should avoid it as hard as > we > >> can. > >> > >> So, for now, this flag required. > >> > >>> I suggest to rollback [2] from AI master, stop working
Re: Data vanished from cluster after INACTIVE/ACTIVE switch
Folks, Let's revive this discussion before it's too late and all API changes are merged to master [1]. It seems we don't have a final agreement on whether we should add a force flag to the deactivation API. First of all, I think we are all on the same page that in-memory cache data vanishing on deactivation is counter-intuitive and dangerous. As a final solution, I'd want to see behavior where all in-memory data is available after deactivation and further activation. I believe it's possible not to deallocate memory (as mentioned before, we can already achieve that with IGNITE_REUSE_MEMORY_ON_DEACTIVATE=true) and carefully reuse all loaded data pages on the next activation and caches start. Also, this is a wider question, but: do we understand what cluster deactivation is actually intended for? I can only think of two cases: - graceful cluster shutdown: an ability to cut checkpoints and to end transactional load consistently prior to further stop of all nodes - blocking all API (both reads and writes) due to some maintenance Neither of them requires forcefully clearing all in-memory data on deactivation. If everyone agrees, from now on we should treat data clearing as a system design flaw that should be fixed, not as a possible scenario which we should support at the API level. Considering this, do we really need to introduce a force flag as a temporary precaution? I have at least two reasons against it: 1) Once the API is changed and released, we have to support it until the next major release. If we all understand that the data vanishing issue is fixable, I believe we shouldn't engrave flags in the API that will become pointless. 2) A more personal one, but I'm against any force flags in the API. They make an API harder to understand; more than that, the presence of such flags just highlights that the API is poorly designed. I suggest rolling back [2] from AI master, stopping work on [1] and focusing on how to implement keeping in-memory data after deactivation. 
I think we can still require user consent for deactivation via control.sh (it already requires --yes) and JMX. Thoughts? [1]: https://issues.apache.org/jira/browse/IGNITE-12614 [2]: https://issues.apache.org/jira/browse/IGNITE-12701 -- Ivan On Tue, Mar 17, 2020 at 2:26 PM Vladimir Steshin wrote: > Nikolay, I think we should reconsider clearing at least system caches > when deactivating. > > On 17.03.2020 14:18, Nikolay Izhikov wrote: > > Hello, Vladimir. > > > > I don't get it. > > > > What is your proposal? > > What should we do? > > > >> On 17 Mar 2020, at 14:11, Vladimir Steshin > wrote: > >> > >> Nikolay, hi. > >> > > And should be covered with the —force parameter we added. > >> As a fix for user cases - yes. My idea is to emphasize the overall ability to > lose various objects, not only data. Probably might be reconsidered in > future. > >> > >> > >> On 17.03.2020 13:49, Nikolay Izhikov wrote: > >>> Hello, Vladimir. > >>> > >>> If there is at least one persistent data region then the system data > region also becomes persistent. > >>> Your example applies only to pure in-memory clusters. > >>> > >>> And should be covered with the —force parameter we added. > >>> > >>> What do you think? > >>> > On 17 Mar 2020, at 13:45, Vladimir Steshin > wrote: > > Hi, all. > > Fixes for control.sh and the REST have been merged. Could anyone take > a look at the previous email with an issue? Isn't this conduct very weird? > >
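To make the flag under discussion concrete, here is a minimal sketch of enabling the memory-reuse behavior mentioned above via a JVM system property. IGNITE_REUSE_MEMORY_ON_DEACTIVATE is the real property referenced in the thread; the snippet only sets and reads the property, since restoring page memory contents on reactivation is exactly the part the thread says is not implemented yet:

```java
public class ReuseMemoryOnDeactivateSketch {
    public static void main(String[] args) {
        // Asks Ignite to keep memory regions allocated across deactivation.
        // Must be set before the node starts to take effect.
        System.setProperty("IGNITE_REUSE_MEMORY_ON_DEACTIVATE", "true");

        // Sanity check that the property is visible to the JVM.
        System.out.println(System.getProperty("IGNITE_REUSE_MEMORY_ON_DEACTIVATE"));
    }
}
```

Note that interactive confirmation already exists on the tooling side: control.sh prompts before deactivation unless a confirmation flag (e.g. --yes) is passed, which is the consent mechanism suggested in the message above.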
Re: Re[2]: Discuss idle_verify with moving partitions changes.
Partial results are consistent though. I'd add something like "Possible results are not full" instead. On Mon, Mar 23, 2020 at 12:47 PM Zhenya Stanilovsky wrote: > > Guys thank for quick response, Ivan what do you think about Vlad`s > proposal to add additional info like : > "Possible results are not consistent due to rebalance still in progress" ? > Thanks ! > > >Понедельник, 23 марта 2020, 12:30 +03:00 от Ivan Rakov < > ivan.glu...@gmail.com>: > > > >Zhenya, > > > >As for me, the current behavior of idle_verify looks correct. > >There's no sense in checking MOVING partitions (on which we explicitly > >inform user), however checking consistency between the rest of owners > still > >makes sense: they still can diverge and we can be aware of the presence of > >the conflicts sooner. > >In case cluster is not idle (in terms of user activities, not in terms of > >internal cluster processes like rebalancing), utility will fail as > expected. > > > >On Mon, Mar 23, 2020 at 11:23 AM Vladislav Pyatkov < > vpyat...@gridgain.com > > >wrote: > > > >> Hi Zhenya, > >> > >> I see your point. Need to show some message, because cluster is not idle > >> (rebalance is going). > >> When cluster not idle we cannot validate partitions honestly. After > several > >> minutes we can to get absolutely different result, without any client's > >> operation of cache happened. > >> > >> May be enough showing some message more clear for end user. For example: > >> "Result has not valid, rebalance is going." > >> > >> Another thing you meaning - issue in indexes, when rebalance is > following. > >> I think idex_validate should fail in this case, because indexes always > in > >> load during rebalance. 
> >> > >> > >> On Mon, Mar 23, 2020 at 10:20 AM Zhenya Stanilovsky > >> < arzamas...@mail.ru.invalid > wrote: > >> > >> > > >> > Igniters, i found that near idle check commands only shows partitions > in > >> > MOVING states as info in log and not take into account this fact as > >> > erroneous idle cluster state. > >> > control.sh --cache idle_verify, control.sh --cache validate_indexes > >> > --check-crc > >> > > >> > for example command would show something like : > >> > > >> > Arguments: --cache idle_verify --yes > >> > > >> > > >> > > >> > idle_verify task was executed with the following args: caches=[], > >> > excluded=[], cacheFilter=[DEFAULT] > >> > idle_verify check has finished, no conflicts have been found. > >> > Verification was skipped for 21 MOVING partitions: > >> > Skipped partition: PartitionKeyV2 [grpId=1544803905, grpName=default, > >> > partId=7] > >> > Partition instances: [PartitionHashRecordV2 [isPrimary=false, > >> > consistentId=gridCommandHandlerTest2, updateCntr=3, > >> partitionState=MOVING, > >> > state=MOVING]] .. and so on > >> > > >> > I found this erroneous and can lead to further cluster index > corruption, > >> > for example in case when only command OK result checked. > >> > > >> > If no objections would be here, i plan to inform about moving states > as > >> > not OK exit code too. > >> > > >> > > >> > >> > >> > >> -- > >> Vladislav Pyatkov > >> Architect-Consultant "GridGain Rus" Llc. > >> +7-929-537-79-60 > >> > > > >
Re: Discuss idle_verify with moving partitions changes.
Zhenya, As for me, the current behavior of idle_verify looks correct. There's no sense in checking MOVING partitions (on which we explicitly inform user), however checking consistency between the rest of owners still makes sense: they still can diverge and we can be aware of the presence of the conflicts sooner. In case cluster is not idle (in terms of user activities, not in terms of internal cluster processes like rebalancing), utility will fail as expected. On Mon, Mar 23, 2020 at 11:23 AM Vladislav Pyatkov wrote: > Hi Zhenya, > > I see your point. Need to show some message, because cluster is not idle > (rebalance is going). > When cluster not idle we cannot validate partitions honestly. After several > minutes we can to get absolutely different result, without any client's > operation of cache happened. > > May be enough showing some message more clear for end user. For example: > "Result has not valid, rebalance is going." > > Another thing you meaning - issue in indexes, when rebalance is following. > I think idex_validate should fail in this case, because indexes always in > load during rebalance. > > > On Mon, Mar 23, 2020 at 10:20 AM Zhenya Stanilovsky > wrote: > > > > > Igniters, i found that near idle check commands only shows partitions in > > MOVING states as info in log and not take into account this fact as > > erroneous idle cluster state. > > control.sh --cache idle_verify, control.sh --cache validate_indexes > > --check-crc > > > > for example command would show something like : > > > > Arguments: --cache idle_verify --yes > > > > > > > idle_verify task was executed with the following args: caches=[], > > excluded=[], cacheFilter=[DEFAULT] > > idle_verify check has finished, no conflicts have been found. 
> > Verification was skipped for 21 MOVING partitions: > > Skipped partition: PartitionKeyV2 [grpId=1544803905, grpName=default, > > partId=7] > > Partition instances: [PartitionHashRecordV2 [isPrimary=false, > > consistentId=gridCommandHandlerTest2, updateCntr=3, > partitionState=MOVING, > > state=MOVING]] .. and so on > > > > I found this erroneous and can lead to further cluster index corruption, > > for example in case when only command OK result checked. > > > > If no objections would be here, i plan to inform about moving states as > > not OK exit code too. > > > > > > > > -- > Vladislav Pyatkov > Architect-Consultant "GridGain Rus" Llc. > +7-929-537-79-60 >
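The proposal above (report skipped MOVING partitions as a non-OK exit code) matters mostly for scripted checks that only look at the command's result. A sketch of such a guard follows; `control_sh` and `IDLE_VERIFY_RC` are stand-ins stubbing out the real `control.sh` invocation so both branches can be exercised without a running cluster:

```shell
#!/bin/sh
# Stub standing in for `control.sh --cache idle_verify`; the exit code is
# injected via IDLE_VERIFY_RC instead of coming from a real cluster.
control_sh() { return "${IDLE_VERIFY_RC:-0}"; }

# CI guard: anything but exit code 0 (conflicts, or -- per the proposal --
# skipped MOVING partitions) is treated as a failed consistency check.
check_idle_verify() {
  if control_sh --cache idle_verify --yes; then
    echo "idle_verify OK"
  else
    echo "idle_verify FAILED" >&2
    return 1
  fi
}
```

With the current behavior, a run that skipped 21 MOVING partitions would still take the OK branch, which is the gap Zhenya points out.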
Re: Security Subject of thin client on remote nodes
Alex, Denis, It seems the security API is indeed a bit over-engineered. Let's get rid of SecurityContext and use SecuritySubject instead. > SecurityContext is just a POJO wrapper over > SecuritySubject's > org.apache.ignite.plugin.security.SecuritySubject#permissions. > It's functionality can be easily moved to SecuritySubject. I totally agree. Both subject and context are implemented by the plugin provider, and I don't see any reason to keep both abstractions, especially if we are going to get rid of transferring the subject in node attributes (the argument that the subject is more lightweight won't work anymore). Also, there's a bit of a mess in the node authentication logic. There are at least two components responsible for it: DiscoverySpiNodeAuthenticator (which is forcibly set by GridDiscoveryManager, but in fact public) and GridSecurityProcessor (which performs the actual node auth logic, but is private). I also don't understand why we need both #authenticate(AuthenticationContext) and #authenticateNode(ClusterNode, SecurityCredentials) methods while it's possible to set an explicit SecuritySubjectType.REMOTE_NODE in AuthenticationContext (this is arguable; perhaps there are strong reasons). Finally, the areas of responsibility between IgniteSecurity and GridSecurityProcessor are kind of mixed. As far as I understand, the former is responsible for Ignite-internal management of security logic (keeping the thread-local context, caching security contexts, etc.; we don't expect IgniteSecurity to be replaced by a plugin provider) and the latter is responsible for user-custom authentication / authorization logic. To be honest, it took me plenty of time to figure this out. From my point of view, we should make the GridSecurityProcessor interface public, rename it (it takes plenty of time to grasp the difference from IgniteSecurity), make its API as simple and non-duplicating as possible and clarify its area of responsibility (e.g. 
should it be responsible for propagation of a successfully authenticated subject among all nodes or not?) to make it easy to embed custom security logic in Ignite. Regarding the thin clients fix: the implementation made by Denis fits better with the admittedly implicit contract that it's better to change the API of the internal IgniteSecurity than of the internal GridSecurityProcessor (which actually mustn't be internal). > My approach doesn't require any IEPs, just minor change in code and to > > org.apache.ignite.internal.processors.security.IgniteSecurity#authenticate(AuthenticationContext) > contract. Looks like a misuse of the #authenticate method to me. It should perform initial authentication based on credentials (this may include queries to an external authentication subsystem, e.g. LDAP). A user may not want to authenticate a thin client on every node (authenticating on every node would increase the number of requests to the auth subsystem unless the user implicitly implements propagation of thin clients' contexts between nodes and makes #authenticate cluster-wide idempotent: the first call should perform the actual authentication, subsequent calls should retrieve the context of the already authenticated client). The presence of the separate #securityContext(UUID) highlights that the user indeed should care about propagation of thin clients' contexts between the cluster nodes. -- Ivan On Fri, Mar 20, 2020 at 12:22 PM Veena Mithare wrote: > Hi Alexei, Denis, > > One of the main use cases of thin client authentication is to be able to > audit the changes done using the thin client user. > To enable that : > We really need to resolve this concern as well : > https://issues.apache.org/jira/browse/IGNITE-12781 > > ( Incorrect security subject id is associated with a cache_put event > when the originator of the event is a thin client. 
) > > Regards, > Veena > > > -Original Message- > From: Alexei Scherbakov > Sent: 18 March 2020 08:11 > To: dev > Subject: Re: Security Subject of thin client on remote nodes > > Denis Garus, > > Both variants are capable of solving the thin client security context > problem. > > My approach doesn't require any IEPs, just minor change in code and to > > org.apache.ignite.internal.processors.security.IgniteSecurity#authenticate(AuthenticationContext) > contract. > We can add appropriate documentation to emphasize this. > The argument "fragile" is not very convincing for me. > > I think we should collect more opinions before proceeding with IEP. > > Considering a fact we actually *may not care* about compatibility (I've > already explained why), I'm thinking of another approach. > Let's get rid of SecurityContext and use SecuritySubject instead. > SecurityContext is just a POJO wrapper over SecuritySubject's > org.apache.ignite.plugin.security.SecuritySubject#permissions. > It's functionality can be easily moved to SecuritySubject. > > What do you think? > > > > пн, 16 мар. 2020 г. в 15:47, Denis Garus : > > > Hello, Alexei! > > > > I agree with you if we may not care about compatibility at all, then > > we can solve the problem much more straightforward
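The "SecurityContext is just a POJO wrapper over SecuritySubject#permissions" argument above can be illustrated with a small self-contained sketch. The `Subject` interface below is a simplified stand-in for org.apache.ignite.plugin.security.SecuritySubject (the real interface exposes a typed SecurityPermissionSet, not strings), so the names here are assumptions for illustration only:

```java
import java.util.Set;

public class SecuritySubjectSketch {
    // Simplified stand-in for SecuritySubject: only the permissions accessor
    // matters for this argument.
    interface Subject {
        Set<String> permissions();
    }

    // What SecurityContext effectively does today, expressed as a helper over
    // the subject itself -- no separate wrapper object is needed.
    static boolean authorized(Subject subj, String permission) {
        return subj.permissions().contains(permission);
    }

    public static void main(String[] args) {
        Subject thinClient = () -> Set.of("CACHE_READ");
        System.out.println(authorized(thinClient, "CACHE_READ")); // true
        System.out.println(authorized(thinClient, "CACHE_PUT"));  // false
    }
}
```

Since every permission check reduces to a lookup against the subject's own permission set, collapsing the two abstractions (as Alexei proposes) loses no functionality.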
Re: [DISCUSSION] Deprecation of obsolete rebalancing functionality
Hello, +1 from me for rebalance delay deprecation. I can imagine only one actual case for this option: preventing excessive load on the cluster in case of temporary short-term topology changes (e.g. a node is stopped for a while and then returned back). Now it's handled by baseline auto adjustment in a much more correct way: partitions are not reassigned within a maintenance interval (unlike with the rebalance delay). I also don't think that the ability to configure rebalance delay per cache is crucial. > rebalanceOrder is also useless, agreed. +1 Except for one case: we may want to rebalance caches with CacheRebalanceMode.SYNC first. But anyway, this behavior doesn't require a separate property to be enabled. On Wed, Feb 12, 2020 at 4:54 PM Alexei Scherbakov < alexey.scherbak...@gmail.com> wrote: > Maxim, > > rebalanceDelay was introduced before the BLT appeared in the product to solve > scenarios which are now solved by BLT. > > It's pointless for me having it in the product since BLT was introduced. > > I do not think delaying rebalancing per cache group has any meaning. I > cannot imagine any reason for it. > > rebalanceOrder is also useless, agreed. > > > > > On Wed, 12 Feb 2020 at 16:19, Maxim Muzafarov wrote: > > > Alexey, > > > > Why do you think delaying of historical rebalance (on BLT node join) > > for particular cache groups is not a real world use case? Probably > > the same topic may be started on user-list to collect more use cases > > from real users. > > > > In general, I support reducing the number of available rebalance > > configuration parameters, but we should do it really carefully. > > I can also propose - rebalanceOrder param for removing. > > > > On Wed, 12 Feb 2020 at 15:50, Alexei Scherbakov > > wrote: > > > > > > Maxim, > > > > > > In general rebalanceDelay is used to delay/disable rebalance when > > topology > > > is changed. > > > Right now we have BLT to avoid unnecessary rebalancing when topology is > > > changed. 
> > > If a node left from cluster topology no rebalancing happens until the > > node > > > explicitly removed from baseline topology. > > > > > > I would like to know real world scenarios which can not be covered by > BLT > > > configuration. > > > > > > > > > > > > ср, 12 февр. 2020 г. в 15:16, Maxim Muzafarov : > > > > > > > Alexey, > > > > > > > > > All scenarios where rebalanceDelay has meaning are handled by > > baseline > > > > topology now. > > > > > > > > Can you, please, provide more details here e.g. the whole list of > > > > scenarios where rebalanceDelay is used and how these handled by > > > > baseline topology? > > > > > > > > Actually, I doubt that it covers exactly all the cases due to > > > > rebalanceDelay is a "per cache group property" rather than "baseline" > > > > is meaningful for the whole topology. > > > > > > > > On Wed, 12 Feb 2020 at 12:58, Alexei Scherbakov > > > > wrote: > > > > > > > > > > I've meant baseline topology. > > > > > > > > > > ср, 12 февр. 2020 г. в 12:41, Alexei Scherbakov < > > > > > alexey.scherbak...@gmail.com>: > > > > > > > > > > > > > > > > > V.Pyatkov > > > > > > > > > > > > Doesn't rebalance topology solves it ? > > > > > > > > > > > > ср, 12 февр. 2020 г. в 12:31, V.Pyatkov : > > > > > > > > > > > >> Hi, > > > > > >> > > > > > >> I am sure we can to reduce this ability, but do not completely. > > > > > >> We can use rebalance delay for disable it until manually > > triggered. > > > > > >> > > > > > >> CacheConfiguration#setRebalanceDelay(-1) > > > > > >> > > > > > >> It may helpful for cluster where can not allow performance drop > > from > > > > > >> rebalance at any time. 
> > > > > >> > > > > > >> > > > > > >> > > > > > >> -- > > > > > >> Sent from: > http://apache-ignite-developers.2346864.n4.nabble.com/ > > > > > >> > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > Best regards, > > > > > > Alexei Scherbakov > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > Best regards, > > > > > Alexei Scherbakov > > > > > > > > > > > > > -- > > > > > > Best regards, > > > Alexei Scherbakov > > > > > -- > > Best regards, > Alexei Scherbakov >
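For reference, the per-cache option being deprecated looks like this in code. This is a sketch using the public CacheConfiguration API (the cache name is arbitrary); it is a configuration fragment, not a complete program:

```java
// Legacy approach: a negative delay defers automatic rebalancing on topology
// change until it is triggered manually -- the behavior the thread argues is
// now covered by baseline topology and its auto-adjust timeout.
CacheConfiguration<Integer, String> ccfg = new CacheConfiguration<>("exampleCache");
ccfg.setRebalanceDelay(-1);
```

Deprecating this property would steer users toward baseline topology adjustment, which applies cluster-wide rather than per cache group.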
Re: [VOTE] Allow or prohibit a joint use of @deprecated and @IgniteExperimental
-1 Prohibit From my point of view, deprecation of the existing API will confuse users when the API suggested as a replacement is marked with @IgniteExperimental. On Mon, Feb 10, 2020 at 12:20 PM Nikolay Izhikov wrote: > +1 > > > On 10 Feb 2020, at 11:57, Andrey Mashenkov > wrote: > > > > -1 Prohibit. > > > > We must not deprecate old API without having a new stable well-documented > > alternative and a way to migrate to the new one. > > > > > > On Mon, Feb 10, 2020 at 11:02 AM Alexey Goncharuk > > > wrote: > > > >> Dear Apache Ignite community, > >> > >> We would like to conduct a formal vote on the subject of whether to > allow > >> or prohibit a joint existence of @deprecated annotation for an old API > >> and @IgniteExperimental [1] for a new (replacement) API. The result of > this > >> vote will be formalized as an Apache Ignite development rule to be used > in > >> future. > >> > >> The discussion thread where you can address all non-vote messages is > [2]. > >> > >> The votes are: > >> *[+1 Allow]* Allow to deprecate the old APIs even when new APIs are > marked > >> with @IgniteExperimental to explicitly notify users that an old APIs > will > >> be removed in the next major release AND new APIs are available. > >> *[-1 Prohibit]* Never deprecate the old APIs unless the new APIs are > stable > >> and released without @IgniteExperimental. The old APIs javadoc may be > >> updated with a reference to new APIs to encourage users to evaluate new > >> APIs. The deprecation and new API release may happen simultaneously if > the > >> new API is not marked with @IgniteExperimental or the annotation is > removed > >> in the same release. > >> > >> Neither of the choices prohibits deprecation of an API without a > >> replacement if community decides so. 
> >> > >> The vote will hold for 72 hours and will end on February 13th 2020 08:00 > >> UTC: > >> > >> > https://www.timeanddate.com/countdown/to?year=2020&month=2&day=13&hour=8&min=0&sec=0&p0=utc-1 > >> > >> All votes count, there is no binding/non-binding status for this. > >> > >> [1] > >> > >> > https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/lang/IgniteExperimental.java > >> [2] > >> > >> > http://apache-ignite-developers.2346864.n4.nabble.com/DISCUSS-Public-API-deprecation-rules-td45647.html > >> > >> Thanks, > >> --AG > >> > > > > > > -- > > Best regards, > > Andrey V. Mashenkov > >
Re: Forbid mixed cache groups with both atomic and transactional caches
Ivan, Thanks for pointing this out. Less than one day is indeed too early to treat this discussion thread as a "community conclusion". Still, the consensus among the current participants made me feel that a conclusion will be reached. We'll surely get back to the discussion if opposing opinions arise. On Wed, Feb 5, 2020 at 1:11 PM Ivan Pavlukhin wrote: > Folks, > > A bit of offtop. Do we have some recommendations in the community how > long should we wait until treating something as "a Community > conclusion"? It worries me a little bit that I see a discussion for a > first time and there is already a conclusion. And the discussion was > started less than 24 hours ago. I suppose we should allow everyone > interested to share an opinion (here I agree with the proposal) and it > usually requires some time in open-source communities. > > On Wed, 5 Feb 2020 at 10:58, Ivan Rakov wrote: > > > > Folks, > > > > Thanks for your feedback. > > I've created a JIRA issue on this change: > > https://issues.apache.org/jira/browse/IGNITE-12622 > > > > On Tue, Feb 4, 2020 at 10:43 PM Denis Magda wrote: > > > > > +1 from my end. It doesn't sound like a big deal if Ignite users need > to > > > define separate groups for atomic and transactional caches. > > > > > > - > > > Denis > > > > > > > > > On Tue, Feb 4, 2020 at 3:28 AM Ivan Rakov > wrote: > > > > > > > Igniters, > > > > > > > > Apparently it's possible in Ignite to configure a cache group with > both > > > > ATOMIC and TRANSACTIONAL caches. > > > > Proof: IgniteCacheGroupsTest#testContinuousQueriesMultipleGroups* > tests. > > > > In my opinion, it would be better to remove such possibility from the > > > > product. There are several reasons: > > > > > > > > 1) The original idea of grouping caches was optimizing storage > overhead > > > and > > > > PME time by joining data of similar caches into the same partitions. 
> > > ATOMIC > > > > and TRANSACTIONAL caches provide different guarantees and are > designed > > > for > > > > different use cases, thus they can hardly be called "similar". > > > > > > > > 2) Diving deeper: synchronization protocols and possible reasons for > > > > primary-backup divergences are conceptually different for ATOMIC and > > > > TRANSACTIONAL cases. In TRANSACTIONAL case, transactions recovery > > > protocol > > > > allows to recover consistency if any participating node will fail, > but > > > for > > > > ATOMIC caches there's possible scenario with failure of primary node > > > where > > > > neither of backups will contain the most recent state of the data. > > > Example: > > > > one backup have received updates 1, 3, 5 while another have received > 2, 4 > > > > (which is possible due to message reordering), and even tracking > counters > > > > [1] won't restore the consistency. The problem is that we can't > > > distinguish > > > > what kind of conflict we have faced in case update counters have > diverged > > > > in a mixed group. > > > > > > > > 3) Mixed groups are poorly tested. I can't find any tests except a > couple > > > > of smoke tests in IgniteCacheGroupsTest. We can't be sure that > different > > > > synchronization protocols will work correctly for such > configurations, > > > > especially under load and with a variety of dependent configuration > > > > parameters. > > > > > > > > 4) I have never heard of any feedback on mixed groups. I have asked > > > > different people on this and no one recalled any attempts to > configure > > > such > > > > groups. I believe that in fact no one has ever tried to do it. > > > > > > > > Please let me know if you are aware of any cases where mixed groups > are > > > > used or reasons to keep them. Otherwise I'll create a ticket to > prohibit > > > > mixed configurations. 
> > > > > > > > [1]: https://issues.apache.org/jira/browse/IGNITE-11797 > > > > > > > > -- > > > > Best Regards, > > > > Ivan Rakov > > > > > > > > > > > -- > Best regards, > Ivan Pavlukhin >
Re: Forbid mixed cache groups with both atomic and transactional caches
Folks, Thanks for your feedback. I've created a JIRA issue on this change: https://issues.apache.org/jira/browse/IGNITE-12622 On Tue, Feb 4, 2020 at 10:43 PM Denis Magda wrote: > +1 from my end. It doesn't sound like a big deal if Ignite users need to > define separate groups for atomic and transactional caches. > > - > Denis > > > On Tue, Feb 4, 2020 at 3:28 AM Ivan Rakov wrote: > > > Igniters, > > > > Apparently it's possible in Ignite to configure a cache group with both > > ATOMIC and TRANSACTIONAL caches. > > Proof: IgniteCacheGroupsTest#testContinuousQueriesMultipleGroups* tests. > > In my opinion, it would be better to remove such possibility from the > > product. There are several reasons: > > > > 1) The original idea of grouping caches was optimizing storage overhead > and > > PME time by joining data of similar caches into the same partitions. > ATOMIC > > and TRANSACTIONAL caches provide different guarantees and are designed > for > > different use cases, thus they can hardly be called "similar". > > > > 2) Diving deeper: synchronization protocols and possible reasons for > > primary-backup divergences are conceptually different for ATOMIC and > > TRANSACTIONAL cases. In TRANSACTIONAL case, transactions recovery > protocol > > allows to recover consistency if any participating node will fail, but > for > > ATOMIC caches there's possible scenario with failure of primary node > where > > neither of backups will contain the most recent state of the data. > Example: > > one backup have received updates 1, 3, 5 while another have received 2, 4 > > (which is possible due to message reordering), and even tracking counters > > [1] won't restore the consistency. The problem is that we can't > distinguish > > what kind of conflict we have faced in case update counters have diverged > > in a mixed group. > > > > 3) Mixed groups are poorly tested. I can't find any tests except a couple > > of smoke tests in IgniteCacheGroupsTest. 
We can't be sure that different > > synchronization protocols will work correctly for such configurations, > > especially under load and with a variety of dependent configuration > > parameters. > > > > 4) I have never heard of any feedback on mixed groups. I have asked > > different people on this and no one recalled any attempts to configure > such > > groups. I believe that in fact no one has ever tried to do it. > > > > Please let me know if you are aware of any cases where mixed groups are > > used or reasons to keep them. Otherwise I'll create a ticket to prohibit > > mixed configurations. > > > > [1]: https://issues.apache.org/jira/browse/IGNITE-11797 > > > > -- > > Best Regards, > > Ivan Rakov > > >
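The kind of configuration this thread proposes to reject can be written down concretely. A sketch as a Spring XML fragment of an IgniteConfiguration bean (cache and group names are made up for illustration); under the proposal, startup would fail because both atomicity modes share one group:

```xml
<!-- Two caches sharing one cache group with different atomicity modes:
     the combination that would be prohibited. -->
<property name="cacheConfiguration">
    <list>
        <bean class="org.apache.ignite.configuration.CacheConfiguration">
            <property name="name" value="atomicCache"/>
            <property name="groupName" value="mixedGroup"/>
            <property name="atomicityMode" value="ATOMIC"/>
        </bean>
        <bean class="org.apache.ignite.configuration.CacheConfiguration">
            <property name="name" value="txCache"/>
            <property name="groupName" value="mixedGroup"/>
            <property name="atomicityMode" value="TRANSACTIONAL"/>
        </bean>
    </list>
</property>
```

Splitting `mixedGroup` into one group per atomicity mode keeps the storage-sharing benefit of groups while satisfying the proposed restriction.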
[jira] [Created] (IGNITE-12622) Forbid mixed cache groups with both atomic and transactional caches
Ivan Rakov created IGNITE-12622: --- Summary: Forbid mixed cache groups with both atomic and transactional caches Key: IGNITE-12622 URL: https://issues.apache.org/jira/browse/IGNITE-12622 Project: Ignite Issue Type: Improvement Components: cache Reporter: Ivan Rakov Fix For: 2.9 Apparently it's possible in Ignite to configure a cache group with both ATOMIC and TRANSACTIONAL caches. Proof: IgniteCacheGroupsTest#testContinuousQueriesMultipleGroups* tests. As discussed on the dev list (http://apache-ignite-developers.2346864.n4.nabble.com/Forbid-mixed-cache-groups-with-both-atomic-and-transactional-caches-td45586.html), the community has concluded that such configurations should be prohibited. -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: Forbid mixed cache groups with both atomic and transactional caches
Anton, Indeed, that's +1 point for forbidding mixed configurations. On Tue, Feb 4, 2020 at 2:36 PM Anton Vinogradov wrote: > Seems, we already started the separation by atomic operations restriction > inside the transactions [1]. > See no reason to allow mixes in this case. > > [1] https://issues.apache.org/jira/browse/IGNITE-2313
Forbid mixed cache groups with both atomic and transactional caches
Igniters, Apparently it's possible in Ignite to configure a cache group with both ATOMIC and TRANSACTIONAL caches. Proof: the IgniteCacheGroupsTest#testContinuousQueriesMultipleGroups* tests. In my opinion, it would be better to remove this possibility from the product. There are several reasons:

1) The original idea of grouping caches was to optimize storage overhead and PME time by joining data of similar caches into the same partitions. ATOMIC and TRANSACTIONAL caches provide different guarantees and are designed for different use cases, so they can hardly be called "similar".

2) Diving deeper: synchronization protocols and the possible reasons for primary-backup divergence are conceptually different in the ATOMIC and TRANSACTIONAL cases. In the TRANSACTIONAL case, the transaction recovery protocol can restore consistency if any participating node fails, but for ATOMIC caches there is a possible scenario where, after a primary node failure, none of the backups contains the most recent state of the data. Example: one backup has received updates 1, 3, 5 while another has received 2, 4 (which is possible due to message reordering), and even tracking update counters [1] won't restore consistency. The problem is that we can't distinguish which kind of conflict we have faced when update counters have diverged in a mixed group.

3) Mixed groups are poorly tested. I can't find any tests except a couple of smoke tests in IgniteCacheGroupsTest. We can't be sure that the different synchronization protocols will work correctly for such configurations, especially under load and with a variety of dependent configuration parameters.

4) I have never heard any feedback on mixed groups. I have asked different people about this and no one recalled any attempts to configure such groups. I believe that in fact no one has ever tried to do it.

Please let me know if you are aware of any cases where mixed groups are used or of reasons to keep them. Otherwise I'll create a ticket to prohibit mixed configurations. [1]: https://issues.apache.org/jira/browse/IGNITE-11797 -- Best Regards, Ivan Rakov
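If mixed configurations are forbidden, the check could run once at cache start. Below is a hedged, self-contained sketch of such validation (plain Java; `CacheCfg` and `mixedGroups` are illustrative names, not Ignite's actual API):

```java
import java.util.*;

// Illustrative sketch only: detect cache groups that mix ATOMIC and
// TRANSACTIONAL caches. Names (CacheCfg, mixedGroups) are hypothetical,
// not Ignite's real validation code.
public class MixedGroupCheck {
    enum AtomicityMode { ATOMIC, TRANSACTIONAL }

    static final class CacheCfg {
        final String name; final String group; final AtomicityMode mode;
        CacheCfg(String name, String group, AtomicityMode mode) {
            this.name = name; this.group = group; this.mode = mode;
        }
    }

    /** Returns the names of cache groups containing more than one atomicity mode. */
    public static Set<String> mixedGroups(List<CacheCfg> cfgs) {
        Map<String, Set<AtomicityMode>> modesByGrp = new HashMap<>();
        for (CacheCfg c : cfgs)
            if (c.group != null) // caches without a group can't conflict
                modesByGrp.computeIfAbsent(c.group, g -> EnumSet.noneOf(AtomicityMode.class))
                    .add(c.mode);

        Set<String> mixed = new TreeSet<>();
        for (Map.Entry<String, Set<AtomicityMode>> e : modesByGrp.entrySet())
            if (e.getValue().size() > 1)
                mixed.add(e.getKey());
        return mixed;
    }

    public static void main(String[] args) {
        List<CacheCfg> cfgs = List.of(
            new CacheCfg("tx-cache", "grp1", AtomicityMode.TRANSACTIONAL),
            new CacheCfg("atomic-cache", "grp1", AtomicityMode.ATOMIC),
            new CacheCfg("other", "grp2", AtomicityMode.ATOMIC));
        // grp1 mixes modes and would be rejected at startup.
        System.out.println(mixedGroups(cfgs)); // prints [grp1]
    }
}
```

A startup validator could throw on a non-empty result, which is the behavior the thread proposes.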
[jira] [Created] (IGNITE-12607) PartitionsExchangeAwareTest is flaky
Ivan Rakov created IGNITE-12607: --- Summary: PartitionsExchangeAwareTest is flaky Key: IGNITE-12607 URL: https://issues.apache.org/jira/browse/IGNITE-12607 Project: Ignite Issue Type: Bug Reporter: Ivan Rakov Assignee: Ivan Rakov Fix For: 2.9 Proof: https://ci.ignite.apache.org/buildConfiguration/IgniteTests24Java8_Cache6/4972239 It seems that a cache update is sometimes not possible even before topologies are locked. -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: Wrong results on Scan queries on REPLICATED caches during rebalance
Hi Sergey, Just FYI: a similar problem with replicated caches has been fixed in SQL [1]. If you have a reproducer, you may check whether your issue is still reproducible. [1]: https://issues.apache.org/jira/browse/IGNITE-12482 On Thu, Jan 16, 2020 at 1:51 PM Sergey-A Kosarev wrote: > Classification: Public > Hello, Igniters, > > Recently I've come across a problem with REPLICATED caches, so I've > created an issue: > https://issues.apache.org/jira/browse/IGNITE-12549 > > Please look at this. I believe it's a bug. > > Not sure I could fix it quickly, feel free to take it if you like. > > And as a workaround I think PARTITIONED caches with Integer.MAX_VALUE > backups can be used instead of REPLICATED caches. > > Will be glad for any feedback. > > Kind regards, > Sergey Kosarev > > > > > --- > This e-mail may contain confidential and/or privileged information. If you > are not the intended recipient (or have received this e-mail in error) > please notify the sender immediately and delete this e-mail. Any > unauthorized copying, disclosure or distribution of the material in this > e-mail is strictly forbidden. > > Please refer to https://www.db.com/disclosures for additional EU > corporate and regulatory disclosures and to > http://www.db.com/unitedkingdom/content/privacy.htm for information about > privacy. > -- Best Regards, Ivan Rakov
[jira] [Created] (IGNITE-12545) Introduce listener interface for components to react to partition map exchange events
Ivan Rakov created IGNITE-12545: --- Summary: Introduce listener interface for components to react to partition map exchange events Key: IGNITE-12545 URL: https://issues.apache.org/jira/browse/IGNITE-12545 Project: Ignite Issue Type: Bug Reporter: Ivan Rakov It would be handy to have a listener interface for components that should react to PME instead of adding more and more calls to GridDhtPartitionsExchangeFuture. In general, there are four possible moments when a component can be notified: on exchange init (before and after topologies are updated and the exchange latch is acquired) and on exchange done (before and after readyTopVer is incremented and user operations are unlocked). -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: Apache Ignite 2.8 RELEASE [Time, Scope, Manager]
gt;> пт, 10 янв. 2020 г. в 10:24, Zhenya Stanilovsky < > > > > arzamas...@mail.ru.invalid > > > > >> >: > > > > >> > > > > >> > > > > > >> > > > > > >> > Agree with Nikolay, -1 from me, too. > > > > >> > > > > > >> > >Hello, Igniters. > > > > >> > > > > > > >> > >I’m -1 to include the read-only patch to 2.8. > > > > >> > >I think we shouldn’t accept any patches to 2.8 except bug fixes > > for > > > > >> > blockers and major issues. > > > > >> > > > > > > >> > >Guys, we don’t release Apache Ignite for 13 months! > > > > >> > >We should focus on the release and make it ASAP. > > > > >> > > > > > > >> > >We can’t extend the scope anymore. > > > > >> > > > > > > >> > >> 10 янв. 2020 г., в 04:29, Sergey Antonov < > > > > antonovserge...@gmail.com > > > > > >> > написал(а): > > > > >> > >> > > > > >> > >> Hello, Maxim! > > > > >> > >> > > > > >> > >>> This PR [2] doesn't look a very simple +5,517 −2,038, 111 > > files > > > > >> > >> changed. > > > > >> > >> Yes, PR is huge, but I wrote a lot of new tests and reworked > > > > already > > > > >> > >> presented. Changes in product code are minimal - only 30 > > changed > > > > files > > > > >> > in > > > > >> > >> /src/main/ part. And most of them are new control.sh commands > > and > > > > >> > >> configuration. > > > > >> > >> > > > > >> > >>> Do we have customer requests for this feature or maybe users > > who > > > > are > > > > >> > >> waiting for exactly that ENUM values exactly in 2.8 release > > (not > > > > the > > > > >> > 2.8.1 > > > > >> > >> for instance)? > > > > >> > >> Can we introduce in new features in maintanance release > > (2.8.1)? > > > > Cluster > > > > >> > >> read-only mode will be new feature, if we remove > > > > IgniteCluster#readOnly > > > > >> > in > > > > >> > >> 2.8 release. If all ok with that, lets remove > > > > IgniteCluster#readOnly and > > > > >> > >> move ticket [1] to 2.8.1 release. 
> > > > >> > >> > > > > >> > >>> Do we have extended test results report (on just only TC.Bot > > green > > > > >> > visa) > > > > >> > >> on this feature to be sure that we will not add any blocker > > issues > > > > to > > > > >> > the > > > > >> > >> release? > > > > >> > >> I'm preparing patch for 2.8 release and I will get new TC Bot > > visa > > > > vs > > > > >> > >> release branch. > > > > >> > >> > > > > >> > >> [1] https://issues.apache.org/jira/browse/IGNITE-12225 > > > > >> > >> > > > > >> > >> > > > > >> > >> > > > > >> > >> чт, 9 янв. 2020 г. в 19:38, Maxim Muzafarov < > > mmu...@apache.org > > > > >: > > > > >> > >> > > > > >> > >>> Folks, > > > > >> > >>> > > > > >> > >>> > > > > >> > >>> Let me remind you that we are working on the 2.8 release > > branch > > > > >> > >>> stabilization currently (please, keep it in mind). > > > > >> > >>> > > > > >> > >>> > > > > >> > >>> Do we have a really STRONG reason for adding such a change > > [1] to > > > > the > > > > >> > >>> ignite-2.8 branch? This PR [2] doesn't look a very simple > > +5,517 > > > > >> > >>> −2,038, 111 files changed. > > > > >> > >>> Do w
[jira] [Created] (IGNITE-12531) Cluster is unable to change BLT on 2.8 if storage was initially created on 2.7 or less
Ivan Rakov created IGNITE-12531: --- Summary: Cluster is unable to change BLT on 2.8 if storage was initially created on 2.7 or less Key: IGNITE-12531 URL: https://issues.apache.org/jira/browse/IGNITE-12531 Project: Ignite Issue Type: Bug Affects Versions: 2.8 Reporter: Ivan Rakov Fix For: 2.8 Due to a bug introduced in https://issues.apache.org/jira/browse/IGNITE-10348, after storage migration from 2.7 or earlier to 2.8, metastorage updates are not persisted. Steps to reproduce: (on 2.7) - Activate a persistent cluster with 2 nodes - Shut down the cluster (on 2.8) - Start the cluster with 2 nodes based on the persistent storage from 2.7 - Start a 3rd node - Change the baseline - Shut down the cluster - Start the initial two nodes - Start the 3rd node (join is rejected: the first two nodes have the old BLT of two nodes, while the 3rd node has the new BLT of three nodes) -- This message was sent by Atlassian Jira (v8.3.4#803005)
Hint for user that baseline topology should be changed in order to trigger rebalance
Folks, Since 2.4, an Ignite cluster requires baseline topology in persistent mode. That means if a user wants to scale the cluster and add more nodes, data won't be redistributed among the whole node set until IgniteCluster#setBaselineTopology is called manually. Surely this behavior is well-documented, but don't we need to give the user a hint that baseline topology should be managed manually? I think a log message like "Current set of nodes differs from baseline topology, please call XXX in order to trigger rebalance and redistribute your data" would make the situation a bit more transparent. Right now we have only this message > [2020-01-07T19:36:45,997][INFO > ][exchange-worker-#39%blue-54.158.100.161%][GridCachePartitionExchangeManager] > Skipping > rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=2, > minorTopVer=0], force=false, evt=NODE_JOINED, node=57bc10fe-1505-4e8e-9987- > 52c9c903c6ef] which doesn't properly explain what's going on.
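The hint itself is cheap to compute: compare the current server-node set with the baseline and log only when they differ. A minimal self-contained sketch (plain Java; the method name and message wording are assumptions, not the actual patch):

```java
import java.util.Set;
import java.util.TreeSet;

// Sketch of the proposed hint (illustrative, not an actual Ignite change):
// warn only when the current server-node set differs from the baseline.
public class BaselineHint {
    /** Returns a hint message, or null when the topology matches the baseline. */
    public static String hint(Set<String> baselineIds, Set<String> currentIds) {
        if (baselineIds.equals(currentIds))
            return null; // nothing to warn about

        return "Current set of server nodes differs from baseline topology "
            + "[baseline=" + new TreeSet<>(baselineIds)
            + ", current=" + new TreeSet<>(currentIds)
            + "]. Call IgniteCluster#setBaselineTopology to trigger rebalance "
            + "and redistribute data.";
    }

    public static void main(String[] args) {
        Set<String> baseline = Set.of("node1", "node2");
        Set<String> current = Set.of("node1", "node2", "node3"); // a node joined
        System.out.println(hint(baseline, current));
    }
}
```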
Re: Apache Ignite 2.8 RELEASE [Time, Scope, Manager]
Maxim M. and anyone who is interested, I suggest to include this fix to 2.8 release: https://issues.apache.org/jira/browse/IGNITE-12225 Basically, it's a result of the following discussion: http://apache-ignite-developers.2346864.n4.nabble.com/DISCUSSION-Single-point-in-API-for-changing-cluster-state-td43665.html The fix affects public API: IgniteCluster#readOnly methods that work with boolean are replaced with ones that work with enum. If we include it, we won't be obliged to keep deprecated boolean version of API in the code (which is currently present in 2.8 branch) as it wasn't published in any release. On Tue, Dec 31, 2019 at 3:54 PM Ilya Kasnacheev wrote: > Hello! > > I have ran dependency checker plugin and quote the following: > > One or more dependencies were identified with known vulnerabilities in > ignite-urideploy: > One or more dependencies were identified with known vulnerabilities in > ignite-spring: > One or more dependencies were identified with known vulnerabilities in > ignite-spring-data: > One or more dependencies were identified with known vulnerabilities in > ignite-aop: > One or more dependencies were identified with known vulnerabilities in > ignite-visor-console: > > spring-core-4.3.18.RELEASE.jar > (pkg:maven/org.springframework/spring-core@4.3.18.RELEASE, > cpe:2.3:a:pivotal_software:spring_framework:4.3.18.release:*:*:*:*:*:*:*, > cpe:2.3:a:springsource:spring_framework:4.3.18.release:*:*:*:*:*:*:*, > cpe:2.3:a:vmware:springsource_spring_framework:4.3.18:*:*:*:*:*:*:*) : > CVE-2018-15756 > > One or more dependencies were identified with known vulnerabilities in > ignite-spring-data_2.0: > > spring-core-5.0.8.RELEASE.jar > (pkg:maven/org.springframework/spring-core@5.0.8.RELEASE, > cpe:2.3:a:pivotal_software:spring_framework:5.0.8.release:*:*:*:*:*:*:*, > cpe:2.3:a:springsource:spring_framework:5.0.8.release:*:*:*:*:*:*:*, > cpe:2.3:a:vmware:springsource_spring_framework:5.0.8:*:*:*:*:*:*:*) : > CVE-2018-15756 > > One or more 
dependencies were identified with known vulnerabilities in > ignite-rest-http: > > jetty-server-9.4.11.v20180605.jar > (pkg:maven/org.eclipse.jetty/jetty-server@9.4.11.v20180605, > cpe:2.3:a:eclipse:jetty:9.4.11:20180605:*:*:*:*:*:*, > cpe:2.3:a:jetty:jetty:9.4.11.v20180605:*:*:*:*:*:*:*, > cpe:2.3:a:mortbay_jetty:jetty:9.4.11:20180605:*:*:*:*:*:*) : > CVE-2018-12545, CVE-2019-10241, CVE-2019-10247 > jackson-databind-2.9.6.jar > (pkg:maven/com.fasterxml.jackson.core/jackson-databind@2.9.6, > cpe:2.3:a:fasterxml:jackson:2.9.6:*:*:*:*:*:*:*, > cpe:2.3:a:fasterxml:jackson-databind:2.9.6:*:*:*:*:*:*:*) : > CVE-2018-1000873, CVE-2018-14718, CVE-2018-14719, CVE-2018-14720, > CVE-2018-14721, CVE-2018-19360, CVE-2018-19361, CVE-2018-19362, > CVE-2019-12086, CVE-2019-12384, CVE-2019-12814, CVE-2019-14379, > CVE-2019-14439, CVE-2019-14540, CVE-2019-16335, CVE-2019-16942, > CVE-2019-16943, CVE-2019-17267, CVE-2019-17531 > > One or more dependencies were identified with known vulnerabilities in > ignite-kubernetes: > One or more dependencies were identified with known vulnerabilities in > ignite-aws: > > jackson-databind-2.9.6.jar > (pkg:maven/com.fasterxml.jackson.core/jackson-databind@2.9.6, > cpe:2.3:a:fasterxml:jackson:2.9.6:*:*:*:*:*:*:*, > cpe:2.3:a:fasterxml:jackson-databind:2.9.6:*:*:*:*:*:*:*) : > CVE-2018-1000873, CVE-2018-14718, CVE-2018-14719, CVE-2018-14720, > CVE-2018-14721, CVE-2018-19360, CVE-2018-19361, CVE-2018-19362, > CVE-2019-12086, CVE-2019-12384, CVE-2019-12814, CVE-2019-14379, > CVE-2019-14439, CVE-2019-14540, CVE-2019-16335, CVE-2019-16942, > CVE-2019-16943, CVE-2019-17267, CVE-2019-17531 > bcprov-ext-jdk15on-1.54.jar > (pkg:maven/org.bouncycastle/bcprov-ext-jdk15on@1.54) : CVE-2015-6644, > CVE-2016-1000338, CVE-2016-1000339, CVE-2016-1000340, CVE-2016-1000341, > CVE-2016-1000342, CVE-2016-1000343, CVE-2016-1000344, CVE-2016-1000345, > CVE-2016-1000346, CVE-2016-1000352, CVE-2016-2427, CVE-2017-13098, > CVE-2018-1000180, CVE-2018-1000613 > > One or 
more dependencies were identified with known vulnerabilities in > ignite-gce: > > httpclient-4.0.1.jar (pkg:maven/org.apache.httpcomponents/httpclient@4.0.1 > , > cpe:2.3:a:apache:httpclient:4.0.1:*:*:*:*:*:*:*) : CVE-2011-1498, > CVE-2014-3577, CVE-2015-5262 > guava-jdk5-17.0.jar (pkg:maven/com.google.guava/guava-jdk5@17.0, > cpe:2.3:a:google:guava:17.0:*:*:*:*:*:*:*) : CVE-2018-10237 > > One or more dependencies were identified with known vulnerabilities in > ignite-cloud: > > openstack-keystone-2.0.0.jar > (pkg:maven/org.apache.jclouds.api/openstack-keystone@2.0.0, > cpe:2.3:a:openstack:keystone:2.0.0:*:*:*:*:*:*:*, > cpe:2.3:a:openstack:openstack:2.0.0:*:*:*:*:*:*:*) : CVE-2013-2014, > CVE-2013-4222, CVE-2013-6391, CVE-2014-0204, CVE-2014-3476, CVE-2014-3520, > CVE-2014-3621, CVE-2015-3646, CVE-2015-7546, CVE-2018-14432, CVE-2018-20170 > cloudstack-2.0.0.jar (pkg:maven/org.apache.jclouds.api/cloudstack@2.0.0, > cpe:2.3:a:apache:cloudstack:2.0.0:*:*:*:*:*:*:*) : CVE-2013-2136, > CVE-2013-639
[jira] [Created] (IGNITE-12510) In-memory page eviction may fail in case very large entries are stored in the cache
Ivan Rakov created IGNITE-12510: --- Summary: In-memory page eviction may fail in case very large entries are stored in the cache Key: IGNITE-12510 URL: https://issues.apache.org/jira/browse/IGNITE-12510 Project: Ignite Issue Type: Bug Affects Versions: 2.7.6 Reporter: Ivan Rakov In-memory page eviction (both DataPageEvictionMode#RANDOM_LRU and DataPageEvictionMode#RANDOM_2_LRU) has a limited number of attempts to choose a candidate page for data removal:
{code:java}
if (sampleSpinCnt > SAMPLE_SPIN_LIMIT) { // 5000
    LT.warn(log, "Too many attempts to choose data page: " + SAMPLE_SPIN_LIMIT);

    return;
}
{code}
Large data entries are stored in several data pages which are sequentially linked to each other. Only "head" pages are suitable for eviction, because the whole entry is reachable only from its "head" page (the list of pages is singly linked; there are no reverse links from tail to head). The problem is that if we put large enough entries into an evictable cache (e.g. each entry needs more than 5000 pages to be stored), there are too few head pages and the "Too many attempts to choose data page" error is likely to show up. We should perform something like a full scan if we fail to find a head page within SAMPLE_SPIN_LIMIT attempts, instead of just failing the node with an error. -- This message was sent by Atlassian Jira (v8.3.4#803005)
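The suggested fix can be modeled as: try random sampling up to SAMPLE_SPIN_LIMIT times, then fall back to a full scan instead of failing. A self-contained sketch (plain Java model where a boolean array marks "head" pages; not the real PageMemory code):

```java
import java.util.Random;

// Model sketch (not Ignite's PageMemory implementation): true marks an
// evictable "head" page. Random sampling is tried first; a full scan is the
// fallback instead of raising "Too many attempts to choose data page".
public class EvictionPick {
    public static int pickHeadPage(boolean[] isHead, int sampleSpinLimit, Random rnd) {
        for (int i = 0; i < sampleSpinLimit; i++) {
            int idx = rnd.nextInt(isHead.length);
            if (isHead[idx])
                return idx; // sampling got lucky
        }
        // Fallback full scan: guaranteed to find a head page if one exists,
        // at O(pages) cost, which is acceptable as a rare last resort.
        for (int idx = 0; idx < isHead.length; idx++)
            if (isHead[idx])
                return idx;
        return -1; // nothing evictable at all
    }

    public static void main(String[] args) {
        boolean[] pages = new boolean[5001];
        pages[5000] = true; // a single head page among 5001 pages
        // Even with a zero sampling budget the fallback scan still finds it.
        System.out.println(pickHeadPage(pages, 0, new Random())); // prints 5000
    }
}
```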
[jira] [Created] (IGNITE-12509) CACHE_REBALANCE_STOPPED event raises for wrong caches in case of specified RebalanceDelay
Ivan Rakov created IGNITE-12509: --- Summary: CACHE_REBALANCE_STOPPED event raises for wrong caches in case of specified RebalanceDelay Key: IGNITE-12509 URL: https://issues.apache.org/jira/browse/IGNITE-12509 Project: Ignite Issue Type: Bug Reporter: Ivan Rakov Fix For: 2.9 Steps to reproduce: 1. Start an in-memory cluster with 2 server nodes 2. Start 3 caches with different rebalance delays (e.g. 5, 10 and 15 seconds) and upload some data 3. Start a local listener for the EVT_CACHE_REBALANCE_STOPPED event on one of the nodes. 4. Start one more server node. 5. Wait for 5 seconds, until the rebalance delay is reached. 6. The EVT_CACHE_REBALANCE_STOPPED event is received 3 times (once for each cache), but in fact only 1 cache was rebalanced. The same happens for the rest of the caches. As a result, on rebalance finish we get the event [CACHE_COUNT] times for each cache instead of once. Reproducer attached. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-12508) GridCacheProcessor#cacheDescriptor(int) has O(N) complexity
Ivan Rakov created IGNITE-12508: --- Summary: GridCacheProcessor#cacheDescriptor(int) has O(N) complexity Key: IGNITE-12508 URL: https://issues.apache.org/jira/browse/IGNITE-12508 Project: Ignite Issue Type: Bug Reporter: Ivan Rakov Fix For: 2.9 See the method code:
{code}
@Nullable public DynamicCacheDescriptor cacheDescriptor(int cacheId) {
    for (DynamicCacheDescriptor cacheDesc : cacheDescriptors().values()) {
        CacheConfiguration ccfg = cacheDesc.cacheConfiguration();

        assert ccfg != null : cacheDesc;

        if (CU.cacheId(ccfg.getName()) == cacheId)
            return cacheDesc;
    }

    return null;
}
{code}
This method is invoked on several hot paths, which causes a significant performance regression when the number of caches is large, for example, during logical recovery and security checks for indexing. The method should be improved to use a hash map or a similar data structure to achieve better complexity. -- This message was sent by Atlassian Jira (v8.3.4#803005)
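The fix suggested above amounts to maintaining a secondary index keyed by cache ID next to the existing name-keyed descriptor map, kept in sync on registration. A hedged sketch (plain Java; descriptors are modeled as strings and cacheId as a simple hash, unlike the real DynamicCacheDescriptor and CU.cacheId):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the proposed O(1) lookup (illustrative; descriptors are modeled
// as plain strings instead of DynamicCacheDescriptor, and cacheId is a
// stand-in for CU.cacheId).
public class CacheDescriptorIndex {
    private final Map<String, String> descByName = new HashMap<>();
    private final Map<Integer, String> descById = new HashMap<>();

    /** Stand-in for CU.cacheId(name): a deterministic int id derived from the name. */
    static int cacheId(String name) {
        return name.hashCode();
    }

    public void register(String name, String desc) {
        descByName.put(name, desc);
        descById.put(cacheId(name), desc); // keep the id index in sync
    }

    /** O(1) replacement for the O(N) scan over cacheDescriptors().values(). */
    public String descriptorById(int cacheId) {
        return descById.get(cacheId);
    }
}
```

The id map must also be updated on cache destroy, which the real fix would need to handle alongside registration.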
[jira] [Created] (IGNITE-12507) Implement cache size metric in bytes
Ivan Rakov created IGNITE-12507: --- Summary: Implement cache size metric in bytes Key: IGNITE-12507 URL: https://issues.apache.org/jira/browse/IGNITE-12507 Project: Ignite Issue Type: Improvement Components: cache Reporter: Ivan Rakov Fix For: 2.9 There is a need for a cache-size-in-bytes metric for the pure in-memory case. When all data is in RAM, there is no obvious way to find out exactly how much space is consumed by cache data on a running node: the only available indicators are the number of keys per partition on a specific node and the memory usage metrics of the machine. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IGNITE-12451) Introduce deadlock detection for cache entry reentrant locks
Ivan Rakov created IGNITE-12451: --- Summary: Introduce deadlock detection for cache entry reentrant locks Key: IGNITE-12451 URL: https://issues.apache.org/jira/browse/IGNITE-12451 Project: Ignite Issue Type: Improvement Affects Versions: 2.7.6 Reporter: Ivan Rakov Fix For: 2.9 Aside from IGNITE-12365, we still have a possible threat of cache-entry-level deadlock in case of careless usage of JCache mass operations (putAll, removeAll): 1. If two different user threads perform putAll on the same two keys in reverse order (with the same primary node for both), there's a chance that sys-stripe threads will deadlock. 2. Even without a direct contract violation from the user side, a HashMap can be passed as the argument of putAll. Even if user threads have called mass operations with two keys in the same order, HashMap iteration order is not strictly defined, which may cause the same deadlock. Local deadlock detection should mitigate this issue. We can create a wrapper for ReentrantLock with logic that performs cycle detection in the wait-for graph when we have been waiting for lock acquisition for too long. In such a case an exception will be thrown from one of the threads, failing the user operation but letting the system make progress. -- This message was sent by Atlassian Jira (v8.3.4#803005)
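The proposed detection is a standard cycle search in the wait-for graph, triggered when lock acquisition takes too long. A self-contained sketch of the graph part only (plain Java; the ReentrantLock wrapper and the timeout trigger are omitted):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch: cycle detection in a wait-for graph. Nodes are thread ids; an edge
// t1 -> t2 means "t1 waits for a lock owned by t2". A cycle that includes the
// waiting thread means deadlock. Each thread waits for at most one owner, so
// the graph is a functional graph and detection is a simple walk.
public class WaitForGraph {
    private final Map<Long, Long> waitsFor = new HashMap<>();

    public void addEdge(long waiter, long owner) {
        waitsFor.put(waiter, owner);
    }

    public void removeEdge(long waiter) {
        waitsFor.remove(waiter);
    }

    /** True if walking wait-for edges from 'waiter' leads back to 'waiter'. */
    public boolean hasDeadlock(long waiter) {
        Set<Long> seen = new HashSet<>();
        Long cur = waitsFor.get(waiter);
        while (cur != null) {
            if (cur == waiter)
                return true; // walked back to the starting thread: cycle
            if (!seen.add(cur))
                return false; // a cycle exists, but not through 'waiter'
            cur = waitsFor.get(cur);
        }
        return false;
    }

    public static void main(String[] args) {
        WaitForGraph g = new WaitForGraph();
        g.addEdge(1, 2); // thread 1 waits for a lock held by thread 2
        g.addEdge(2, 1); // thread 2 waits for a lock held by thread 1
        System.out.println(g.hasDeadlock(1)); // prints true
    }
}
```

In the real wrapper, the detector would run after a tryLock timeout and throw from one of the cycle's threads to break the deadlock.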
[jira] [Created] (IGNITE-12429) Rework bytes-based WAL archive size management logic to make historical rebalance more predictable
Ivan Rakov created IGNITE-12429: --- Summary: Rework bytes-based WAL archive size management logic to make historical rebalance more predictable Key: IGNITE-12429 URL: https://issues.apache.org/jira/browse/IGNITE-12429 Project: Ignite Issue Type: Improvement Reporter: Ivan Rakov Since 2.7, DataStorageConfiguration allows specifying the size of the WAL archive in bytes (see DataStorageConfiguration#maxWalArchiveSize), which is much more transparent to the user. Unfortunately, the new logic may be unpredictable when it comes to historical rebalance. The WAL archive is truncated when one of the following conditions occurs: 1. The total number of checkpoints in the WAL archive is bigger than DataStorageConfiguration#walHistSize 2. The total size of the WAL archive is bigger than DataStorageConfiguration#maxWalArchiveSize Independently, the in-memory checkpoint history contains only a fixed number of the most recent checkpoints (can be changed with IGNITE_PDS_MAX_CHECKPOINT_MEMORY_HISTORY_SIZE, 100 by default). All these particular qualities make it hard for the user to control the usage of historical rebalance. Imagine the case when the user has a light load (WAL gets rotated very slowly) and the default checkpoint frequency. After 100 * 3 = 300 minutes, all updates in the WAL will be impossible to receive via historical rebalance even if: 1. The user has configured a large DataStorageConfiguration#maxWalArchiveSize 2. The user has configured a large DataStorageConfiguration#walHistSize At the same time, setting a large IGNITE_PDS_MAX_CHECKPOINT_MEMORY_HISTORY_SIZE will help (only combined with the previous two points), but Ignite node heap usage may increase dramatically. I propose to change the WAL history management logic in the following way: 1. *Don't* cut the WAL archive when the number of checkpoints exceeds DataStorageConfiguration#walHistSize. WAL history should be managed only based on DataStorageConfiguration#maxWalArchiveSize. 2. The checkpoint history should contain a fixed number of entries, but should cover the whole stored WAL archive (not only its most recent part with the IGNITE_PDS_MAX_CHECKPOINT_MEMORY_HISTORY_SIZE last checkpoints). This can be achieved by making the checkpoint history sparse: some intermediate checkpoints *may be absent from history*, while the fixed number of retained checkpoints can be positioned either uniformly (trying to keep a fixed number of bytes between two neighbouring checkpoints) or exponentially (trying to keep a fixed ratio between the size of WAL from checkpoint(N-1) to the current write pointer and the size of WAL from checkpoint(N) to the current write pointer). -- This message was sent by Atlassian Jira (v8.3.4#803005)
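The "uniform distribution" variant of point 2 can be illustrated with a small thinning routine over checkpoint WAL offsets: keep a fixed number of checkpoints spanning the whole archive, spaced evenly by WAL bytes. A hedged, self-contained sketch (plain Java; names and the capacity-handling details are illustrative, not the actual implementation, and the exponential variant is omitted):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch (not Ignite's implementation) of the "uniform" sparse checkpoint
// history: keep at most 'capacity' checkpoints covering the whole WAL
// archive, spaced evenly by WAL offset, so that old checkpoints remain
// usable as historical rebalance starting points.
public class SparseCheckpointHistory {
    public static List<Long> thin(List<Long> sortedOffsets, int capacity) {
        int n = sortedOffsets.size();
        if (n <= capacity)
            return new ArrayList<>(sortedOffsets); // nothing to thin out

        if (capacity < 2) // degenerate cap: keep at least the latest checkpoint
            return new ArrayList<>(List.of(sortedOffsets.get(n - 1)));

        long first = sortedOffsets.get(0), last = sortedOffsets.get(n - 1);
        List<Long> kept = new ArrayList<>(capacity);
        int idx = 0;
        for (int slot = 0; slot < capacity; slot++) {
            // Ideal WAL offset for this slot, spread uniformly over the archive.
            long target = first + (last - first) * slot / (capacity - 1);
            while (idx < n - 1 && sortedOffsets.get(idx) < target)
                idx++;
            long cand = sortedOffsets.get(idx);
            if (kept.isEmpty() || kept.get(kept.size() - 1) != cand)
                kept.add(cand); // skip duplicates when checkpoints are dense
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Long> offsets = new ArrayList<>();
        for (long off = 0; off <= 1000; off += 100)
            offsets.add(off); // a checkpoint every 100 WAL bytes
        System.out.println(thin(offsets, 3)); // prints [0, 500, 1000]
    }
}
```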
Re: [VOTE] Apache Ignite PMC Chair
+1 for Dmitry Pavlov Best Regards, Ivan Rakov On 29.10.2019 10:50, Ilya Kasnacheev wrote: +1 for Nikolay Izhikov (binding) Regards,
Re: Metric showing how many nodes may safely leave the cluster
https://issues.apache.org/jira/browse/IGNITE-12278 Best Regards, Ivan Rakov On 07.10.2019 15:08, Ivan Rakov wrote: Denis, Alex, Sure, new metric will be integrated into new metrics framework. Let's not expose its value to control.sh right now. I'll create an issue for aggregated "getMinimumNumberOfPartitionCopies" if everyone agrees. Best Regards, Ivan Rakov On 04.10.2019 20:06, Denis Magda wrote: I'm for the proposal to add new JMX metrics and enhance the existing tooling. But I would encourage us to integrate this into the new metrics framework Nikolay has been working on. Otherwise, we will be deprecating these JMX metrics in a short time frame in favor of the new monitoring APIs. - Denis On Fri, Oct 4, 2019 at 9:33 AM Alexey Goncharuk wrote: I agree that we should have the ability to read any metric using simple Ignite tooling. I am not sure if visor.sh is a good fit - if I remember correctly, it will start a daemon node which will bump the topology version with all related consequences. I believe in the long term it will beneficial to migrate all visor.sh functionality to a more lightweight protocol, such as used in control.sh. As for the metrics, the metric suggested by Ivan totally makes sense to me - it is a simple and, actually, quite critical metric. It will be completely unusable to select a minimum of some metric for all cache groups manually. A monitoring system, on the other hand, might not be available when the metric is needed, or may not support aggregation. --AG пт, 4 окт. 2019 г. в 18:58, Ivan Rakov : Nikolay, Many users start to use Ignite with a small project without production-level monitoring. When proof-of-concept appears to be viable, they tend to expand Ignite usage by growing cluster and adding needed environment (including monitoring systems). Inability to find such basic thing as survival in case of next node crash may affect overall product impression. We all want Ignite to be successful and widespread. 
Can you clarify, what do you mean, exactly? Right now user can access metric mentioned by Alex and choose minimum of all cache groups. I want to highlight that not every user understands Ignite and its internals so much to find out that exactly these sequence of actions will bring him to desired answer. Can you clarify, what do you mean, exactly? We have a ticket[1] to support metrics output via visor.sh. My understanding: we should have an easy way to output metric values for each node in cluster. [1] https://issues.apache.org/jira/browse/IGNITE-12191 I propose to add metric method for aggregated "getMinimumNumberOfPartitionCopies" and expose it to control.sh. My understanding: it's result is critical enough to be accessible in a short path. I've started this topic due to request from user list, and I've heard many similar complaints before. Best Regards, Ivan Rakov On 04.10.2019 17:18, Nikolay Izhikov wrote: Ivan. We shouldn't force users to configure external tools and write extra code for basic things. Actually, I don't agree with you. Having external monitoring system for any production cluster is a *basic* thing. Can you, please, define "basic things"? single method for the whole cluster Can you clarify, what do you mean, exactly? We have a ticket[1] to support metrics output via visor.sh. My understanding: we should have an easy way to output metric values for each node in cluster. [1] https://issues.apache.org/jira/browse/IGNITE-12191 В Пт, 04/10/2019 в 17:09 +0300, Ivan Rakov пишет: Max, What if user simply don't have configured monitoring system? Knowing whether cluster will survive node shutdown is critical for any administrator that performs any manipulations with cluster topology. Essential information should be easily accessed. We shouldn't force users to configure external tools and write extra code for basic things. Alex, Thanks, that's exact metric we need. 
My point is that we should make it more accessible: via control.sh command and single method for the whole cluster. Best Regards, Ivan Rakov On 04.10.2019 16:34, Alex Plehanov wrote: Ivan, there already exist metric CacheGroupMetricsMXBean#getMinimumNumberOfPartitionCopies, which shows the current redundancy level for the cache group. We can lose up to ( getMinimumNumberOfPartitionCopies-1) nodes without data loss in this cache group. пт, 4 окт. 2019 г. в 16:17, Ivan Rakov : Igniters, I've seen numerous requests to find out an easy way to check whether is it safe to turn off cluster node. As we know, in Ignite protection from sudden node shutdown is implemented through keeping several backup copies of each partition. However, this guarantee can be weakened for a while in case cluster has recently experienced node restart and rebalancing process is still in progress. Example scenario is restarting nodes one by one in order to
[jira] [Created] (IGNITE-12278) Add metric showing how many nodes may safely leave the cluster without partition loss
Ivan Rakov created IGNITE-12278: --- Summary: Add metric showing how many nodes may safely leave the cluster without partition loss Key: IGNITE-12278 URL: https://issues.apache.org/jira/browse/IGNITE-12278 Project: Ignite Issue Type: Improvement Reporter: Ivan Rakov Fix For: 2.8 We already have the getMinimumNumberOfPartitionCopies metric that shows the partition redundancy number for a specific cache group. It would be handy if the user had a single aggregated metric over all cache groups showing how many nodes may leave the cluster without partition loss in any cache. -- This message was sent by Atlassian Jira (v8.3.4#803005)
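Given the per-group getMinimumNumberOfPartitionCopies values, the aggregation is a simple minimum: the cluster can lose min(copies) - 1 nodes without losing any partition. A hedged sketch (plain Java; `SafeNodeLoss` and `safeNodesToLeave` are illustrative names, not the proposed API):

```java
import java.util.Collection;
import java.util.List;

// Sketch of the proposed aggregated metric. The input models per-cache-group
// values of getMinimumNumberOfPartitionCopies; the output is how many nodes
// may leave the cluster without losing any partition.
public class SafeNodeLoss {
    public static int safeNodesToLeave(Collection<Integer> groupMinCopies) {
        // Empty input: assume 1 copy (no redundancy), i.e. no node may safely leave.
        int minCopies = groupMinCopies.stream().mapToInt(Integer::intValue).min().orElse(1);
        return Math.max(0, minCopies - 1);
    }

    public static void main(String[] args) {
        // Groups with 3, 2 and 4 partition copies: only 1 node may safely leave.
        System.out.println(safeNodesToLeave(List.of(3, 2, 4))); // prints 1
    }
}
```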
Re: Metric showing how many nodes may safely leave the cluster
Denis, Alex, Sure, new metric will be integrated into new metrics framework. Let's not expose its value to control.sh right now. I'll create an issue for aggregated "getMinimumNumberOfPartitionCopies" if everyone agrees. Best Regards, Ivan Rakov On 04.10.2019 20:06, Denis Magda wrote: I'm for the proposal to add new JMX metrics and enhance the existing tooling. But I would encourage us to integrate this into the new metrics framework Nikolay has been working on. Otherwise, we will be deprecating these JMX metrics in a short time frame in favor of the new monitoring APIs. - Denis On Fri, Oct 4, 2019 at 9:33 AM Alexey Goncharuk wrote: I agree that we should have the ability to read any metric using simple Ignite tooling. I am not sure if visor.sh is a good fit - if I remember correctly, it will start a daemon node which will bump the topology version with all related consequences. I believe in the long term it will beneficial to migrate all visor.sh functionality to a more lightweight protocol, such as used in control.sh. As for the metrics, the metric suggested by Ivan totally makes sense to me - it is a simple and, actually, quite critical metric. It will be completely unusable to select a minimum of some metric for all cache groups manually. A monitoring system, on the other hand, might not be available when the metric is needed, or may not support aggregation. --AG пт, 4 окт. 2019 г. в 18:58, Ivan Rakov : Nikolay, Many users start to use Ignite with a small project without production-level monitoring. When proof-of-concept appears to be viable, they tend to expand Ignite usage by growing cluster and adding needed environment (including monitoring systems). Inability to find such basic thing as survival in case of next node crash may affect overall product impression. We all want Ignite to be successful and widespread. Can you clarify, what do you mean, exactly? Right now user can access metric mentioned by Alex and choose minimum of all cache groups. 
I want to highlight that not every user understands Ignite and its internals well enough to figure out that exactly this sequence of actions will lead to the desired answer.

Can you clarify, what do you mean, exactly? We have a ticket [1] to support metrics output via visor.sh. My understanding: we should have an easy way to output metric values for each node in the cluster. [1] https://issues.apache.org/jira/browse/IGNITE-12191

I propose to add a metric method for an aggregated "getMinimumNumberOfPartitionCopies" and expose it to control.sh. My understanding: its result is critical enough to be accessible via a short path. I've started this topic due to a request from the user list, and I've heard many similar complaints before.

Best Regards, Ivan Rakov

On 04.10.2019 17:18, Nikolay Izhikov wrote: Ivan. We shouldn't force users to configure external tools and write extra code for basic things. Actually, I don't agree with you. Having an external monitoring system for any production cluster is a *basic* thing. Can you, please, define "basic things"? single method for the whole cluster Can you clarify, what do you mean, exactly? We have a ticket [1] to support metrics output via visor.sh. My understanding: we should have an easy way to output metric values for each node in the cluster. [1] https://issues.apache.org/jira/browse/IGNITE-12191

On Fri, 04/10/2019 at 17:09 +0300, Ivan Rakov wrote: Max, What if the user simply doesn't have a monitoring system configured? Knowing whether the cluster will survive a node shutdown is critical for any administrator that performs any manipulations with the cluster topology. Essential information should be easily accessible. We shouldn't force users to configure external tools and write extra code for basic things. Alex, Thanks, that's exactly the metric we need. My point is that we should make it more accessible: via a control.sh command and a single method for the whole cluster.
Best Regards, Ivan Rakov

On 04.10.2019 16:34, Alex Plehanov wrote: Ivan, there already exists a metric, CacheGroupMetricsMXBean#getMinimumNumberOfPartitionCopies, which shows the current redundancy level for the cache group. We can lose up to (getMinimumNumberOfPartitionCopies - 1) nodes without data loss in this cache group.

On Fri, Oct 4, 2019 at 16:17, Ivan Rakov wrote: Igniters, I've seen numerous requests for an easy way to check whether it is safe to turn off a cluster node. As we know, in Ignite protection from sudden node shutdown is implemented through keeping several backup copies of each partition. However, this guarantee can be weakened for a while in case the cluster has recently experienced a node restart and the rebalancing process is still in progress. An example scenario is restarting nodes one by one in order to update a local configuration parameter. The user restarts one node and rebalancing starts: once it is completed, it will be safe to proceed (backup count=1).
Re: Metric showing how many nodes may safely leave the cluster
Nikolay,

Many users start to use Ignite with a small project without production-level monitoring. When the proof-of-concept appears to be viable, they tend to expand Ignite usage by growing the cluster and adding the needed environment (including monitoring systems). Being unable to find out such a basic thing as whether the cluster will survive the next node crash may affect the overall product impression. We all want Ignite to be successful and widespread.

Can you clarify, what do you mean, exactly? Right now the user can access the metric mentioned by Alex and choose the minimum over all cache groups. I want to highlight that not every user understands Ignite and its internals well enough to figure out that exactly this sequence of actions will lead to the desired answer.

Can you clarify, what do you mean, exactly? We have a ticket [1] to support metrics output via visor.sh. My understanding: we should have an easy way to output metric values for each node in the cluster. [1] https://issues.apache.org/jira/browse/IGNITE-12191

I propose to add a metric method for an aggregated "getMinimumNumberOfPartitionCopies" and expose it to control.sh. My understanding: its result is critical enough to be accessible via a short path. I've started this topic due to a request from the user list, and I've heard many similar complaints before.

Best Regards, Ivan Rakov

On 04.10.2019 17:18, Nikolay Izhikov wrote: Ivan. We shouldn't force users to configure external tools and write extra code for basic things. Actually, I don't agree with you. Having an external monitoring system for any production cluster is a *basic* thing. Can you, please, define "basic things"? single method for the whole cluster Can you clarify, what do you mean, exactly? We have a ticket [1] to support metrics output via visor.sh. My understanding: we should have an easy way to output metric values for each node in the cluster.
[1] https://issues.apache.org/jira/browse/IGNITE-12191

On Fri, 04/10/2019 at 17:09 +0300, Ivan Rakov wrote: Max, What if the user simply doesn't have a monitoring system configured? Knowing whether the cluster will survive a node shutdown is critical for any administrator that performs any manipulations with the cluster topology. Essential information should be easily accessible. We shouldn't force users to configure external tools and write extra code for basic things. Alex, Thanks, that's exactly the metric we need. My point is that we should make it more accessible: via a control.sh command and a single method for the whole cluster.

Best Regards, Ivan Rakov

On 04.10.2019 16:34, Alex Plehanov wrote: Ivan, there already exists a metric, CacheGroupMetricsMXBean#getMinimumNumberOfPartitionCopies, which shows the current redundancy level for the cache group. We can lose up to (getMinimumNumberOfPartitionCopies - 1) nodes without data loss in this cache group.

On Fri, Oct 4, 2019 at 16:17, Ivan Rakov wrote: Igniters, I've seen numerous requests for an easy way to check whether it is safe to turn off a cluster node. As we know, in Ignite protection from sudden node shutdown is implemented through keeping several backup copies of each partition. However, this guarantee can be weakened for a while in case the cluster has recently experienced a node restart and the rebalancing process is still in progress. An example scenario is restarting nodes one by one in order to update a local configuration parameter. The user restarts one node and rebalancing starts: once it is completed, it will be safe to proceed (backup count=1). However, there's no transparent way to determine whether rebalancing is over. From my perspective, it would be very helpful to:

1) Add information about rebalancing and the number of free-to-go nodes to the ./control.sh --state command. Examples of output:

Cluster ID: 125a6dce-74b1-4ee7-a453-c58f23f1f8fc
Cluster tag: new_tag
Cluster is active
All partitions are up-to-date.
3 node(s) can safely leave the cluster without partition loss.

Cluster ID: 125a6dce-74b1-4ee7-a453-c58f23f1f8fc
Cluster tag: new_tag
Cluster is active
Rebalancing is in progress.
1 node(s) can safely leave the cluster without partition loss.

2) Provide the same information via ClusterMetrics. For example:

ClusterMetrics#isRebalanceInProgress // boolean
ClusterMetrics#getSafeToLeaveNodesCount // int

Here I need to mention that this information can be calculated from the existing rebalance metrics (see CacheMetrics#*rebalance*). However, I still think that we need a simpler and more understandable flag showing whether the cluster is in danger of data loss. Another point is that the current metrics are bound to a specific cache, which makes this information even harder to analyze. Thoughts? -- Best Regards, Ivan Rakov
Re: Metric showing how many nodes may safely leave the cluster
Max,

What if the user simply doesn't have a monitoring system configured? Knowing whether the cluster will survive a node shutdown is critical for any administrator that performs any manipulations with the cluster topology. Essential information should be easily accessible. We shouldn't force users to configure external tools and write extra code for basic things.

Alex,

Thanks, that's exactly the metric we need. My point is that we should make it more accessible: via a control.sh command and a single method for the whole cluster.

Best Regards, Ivan Rakov

On 04.10.2019 16:34, Alex Plehanov wrote: Ivan, there already exists a metric, CacheGroupMetricsMXBean#getMinimumNumberOfPartitionCopies, which shows the current redundancy level for the cache group. We can lose up to (getMinimumNumberOfPartitionCopies - 1) nodes without data loss in this cache group.

On Fri, Oct 4, 2019 at 16:17, Ivan Rakov wrote: Igniters, I've seen numerous requests for an easy way to check whether it is safe to turn off a cluster node. As we know, in Ignite protection from sudden node shutdown is implemented through keeping several backup copies of each partition. However, this guarantee can be weakened for a while in case the cluster has recently experienced a node restart and the rebalancing process is still in progress. An example scenario is restarting nodes one by one in order to update a local configuration parameter. The user restarts one node and rebalancing starts: once it is completed, it will be safe to proceed (backup count=1). However, there's no transparent way to determine whether rebalancing is over. From my perspective, it would be very helpful to:

1) Add information about rebalancing and the number of free-to-go nodes to the ./control.sh --state command. Examples of output:

Cluster ID: 125a6dce-74b1-4ee7-a453-c58f23f1f8fc
Cluster tag: new_tag
Cluster is active
All partitions are up-to-date.
3 node(s) can safely leave the cluster without partition loss.
Cluster ID: 125a6dce-74b1-4ee7-a453-c58f23f1f8fc
Cluster tag: new_tag
Cluster is active
Rebalancing is in progress.
1 node(s) can safely leave the cluster without partition loss.

2) Provide the same information via ClusterMetrics. For example:

ClusterMetrics#isRebalanceInProgress // boolean
ClusterMetrics#getSafeToLeaveNodesCount // int

Here I need to mention that this information can be calculated from the existing rebalance metrics (see CacheMetrics#*rebalance*). However, I still think that we need a simpler and more understandable flag showing whether the cluster is in danger of data loss. Another point is that the current metrics are bound to a specific cache, which makes this information even harder to analyze. Thoughts? -- Best Regards, Ivan Rakov
Metric showing how many nodes may safely leave the cluster
Igniters,

I've seen numerous requests for an easy way to check whether it is safe to turn off a cluster node. As we know, in Ignite protection from sudden node shutdown is implemented through keeping several backup copies of each partition. However, this guarantee can be weakened for a while in case the cluster has recently experienced a node restart and the rebalancing process is still in progress. An example scenario is restarting nodes one by one in order to update a local configuration parameter. The user restarts one node and rebalancing starts: once it is completed, it will be safe to proceed (backup count=1). However, there's no transparent way to determine whether rebalancing is over. From my perspective, it would be very helpful to:

1) Add information about rebalancing and the number of free-to-go nodes to the ./control.sh --state command. Examples of output:

Cluster ID: 125a6dce-74b1-4ee7-a453-c58f23f1f8fc
Cluster tag: new_tag
Cluster is active
All partitions are up-to-date.
3 node(s) can safely leave the cluster without partition loss.

Cluster ID: 125a6dce-74b1-4ee7-a453-c58f23f1f8fc
Cluster tag: new_tag
Cluster is active
Rebalancing is in progress.
1 node(s) can safely leave the cluster without partition loss.

2) Provide the same information via ClusterMetrics. For example:

ClusterMetrics#isRebalanceInProgress // boolean
ClusterMetrics#getSafeToLeaveNodesCount // int

Here I need to mention that this information can be calculated from the existing rebalance metrics (see CacheMetrics#*rebalance*). However, I still think that we need a simpler and more understandable flag showing whether the cluster is in danger of data loss. Another point is that the current metrics are bound to a specific cache, which makes this information even harder to analyze. Thoughts? -- Best Regards, Ivan Rakov
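For illustration, the aggregation proposed above can be sketched in plain Java on top of the existing per-group JMX attribute. This is a hedged sketch, not Ignite API: the attribute name MinimumNumberOfPartitionCopies comes from the CacheGroupMetricsMXBean mentioned in the thread, while the scan-by-attribute approach and the SafeNodeCheck class are illustrative assumptions.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

import javax.management.MBeanServer;
import javax.management.ObjectName;

public class SafeNodeCheck {
    /** Cluster-wide redundancy: the minimum of the per-cache-group values. */
    public static int minPartitionCopies(Iterable<Integer> perGroupMinCopies) {
        int min = Integer.MAX_VALUE;
        for (int v : perGroupMinCopies)
            min = Math.min(min, v);
        return min == Integer.MAX_VALUE ? 0 : min;
    }

    /** Per the thread: (minimum copies - 1) nodes can leave without data loss. */
    public static int safeToLeaveNodes(Iterable<Integer> perGroupMinCopies) {
        return Math.max(0, minPartitionCopies(perGroupMinCopies) - 1);
    }

    public static void main(String[] args) {
        // Collect MinimumNumberOfPartitionCopies from every MBean that exposes it.
        // Scanning by attribute name avoids hard-coding the MBean naming scheme;
        // on a JVM without a running Ignite node the list is simply empty.
        MBeanServer srv = ManagementFactory.getPlatformMBeanServer();
        List<Integer> perGroup = new ArrayList<>();

        for (ObjectName name : srv.queryNames(null, null)) {
            try {
                Object v = srv.getAttribute(name, "MinimumNumberOfPartitionCopies");
                if (v instanceof Integer)
                    perGroup.add((Integer) v);
            }
            catch (Exception ignored) {
                // Most MBeans don't have this attribute; skip them.
            }
        }

        System.out.println(safeToLeaveNodes(perGroup) + " node(s) can safely leave the cluster");
    }
}
```

Manually doing this for every group is exactly the chore the proposed aggregated metric would remove.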
Re: Apache Ignite 2.7.6 (Time, Scope, and Release manager)
Alexey, I've merged https://issues.apache.org/jira/browse/IGNITE-12163 to master and 2.7.6. Best Regards, Ivan Rakov

On 11.09.2019 18:13, Alexey Goncharuk wrote: Good. Please let me know when this is done, I will re-upload the release artifacts.

On Wed, Sep 11, 2019 at 18:11, Alexandr Shapkin wrote: Alexey, The changes have already been tested, so no TC problems are expected. If this is true, then we need just a few hours to merge them.

From: Alexey Goncharuk Sent: Wednesday, September 11, 2019 6:03 PM To: dev Cc: Dmitriy Govorukhin; Anton Kalashnikov Subject: Re: Re[2]: Apache Ignite 2.7.6 (Time, Scope, and Release manager)

Alexandr, I almost sent the vote email :) When do you expect the fix to be in master and 2.7.6?

On Wed, Sep 11, 2019 at 17:38, Alexandr Shapkin wrote: Folks, A critical bug was detected in .NET [1]. I understand that it's a little bit late, but I propose to include this issue in the release scope. The PR is ready, currently waiting for a TC visa. Thoughts? [1] - https://issues.apache.org/jira/browse/IGNITE-12163

From: Alexey Goncharuk Sent: Monday, September 9, 2019 5:11 PM To: dev Cc: Dmitriy Govorukhin; Anton Kalashnikov Subject: Re: Re[2]: Apache Ignite 2.7.6 (Time, Scope, and Release manager)

Igniters, I just pushed the last ticket to the ignite-2.7.6 branch; looks like we are ready for the next iteration. Given that Dmitriy Pavlov will be unavailable till the end of this week, I will take over the release. A TC re-run has been started.

On Thu, Sep 5, 2019 at 16:14, Dmitriy Govorukhin < dmitriy.govoruk...@gmail.com> wrote: Hi Igniters, I finished work on https://issues.apache.org/jira/browse/IGNITE-12127, the fix is already in master and ignite-2.7.6.

On Wed, Sep 4, 2019 at 2:22 PM Dmitriy Govorukhin < dmitriy.govoruk...@gmail.com> wrote: Hi Alexey, I think that I will finish work on the fix tomorrow. The fix is already complete, but I need to get a visa from the TC bot.
On Mon, Sep 2, 2019 at 8:27 PM Alexey Goncharuk < alexey.goncha...@gmail.com> wrote: Folks, it looks like I was overly optimistic with the estimates for the two mentioned tickets. Dmitriy, Anton, Can you share your vision of when the issues will be fixed? Perhaps it makes sense to release 2.7.6 with the already fixed issues and schedule 2.7.7? Neither of them is a regression, so it's ok to release 2.7.6 as it is now. Thoughts?

On Sat, Aug 31, 2019 at 11:37, Alexey Goncharuk < alexey.goncha...@gmail.com> wrote: Yes, my bad, forgot to include the link. That's the one.

On Fri, Aug 30, 2019 at 15:01, Maxim Muzafarov wrote: Alexey, Is the issue [1] related to this [2] discussion on the user list? If yes, I think it is very important to include these fixes in 2.7.6. [1] https://issues.apache.org/jira/browse/IGNITE-12127 [2] http://apache-ignite-users.70518.x6.nabble.com/Node-failure-with-quot-Failed-to-write-buffer-quot-error-td29100.html

On Fri, 30 Aug 2019 at 14:26, Alexei Scherbakov wrote: Alexey, These look like important fixes, better to include them.

On Fri, Aug 30, 2019 at 12:51, Alexey Goncharuk < alexey.goncha...@gmail.com> wrote: Igniters, Given that the RC1 vote did not succeed and we are still waiting for a few minor fixes, may I suggest including these two tickets in the 2.7.6 scope? https://issues.apache.org/jira/browse/IGNITE-12127 https://issues.apache.org/jira/browse/IGNITE-12128 The first one has already been reported on the dev list [1], the second one may cause a state where an Ignite node cannot start on existing persisted data. Looking at the tickets, the fixes should be reasonably easy, so they should not shift the 2.7.6 release timeline much. Thoughts?

On Wed, Aug 28, 2019 at 15:25, Nikolay Izhikov < nizhi...@apache.org> wrote: Separate repos for different Spark versions is a good idea to me. Anyway, can you help with the Spark version migration, for now?
On Wed, 28/08/2019 at 15:20 +0300, Alexey Zinoviev wrote: Maybe the best solution today is to add a sub-module for each new version of Spark (Spark-2.3, Spark-2.4), or a separate repository with modules for each version, or another way with a separate repository and different branches, like in https://github.com/datastax/spark-cassandra-connector - three ways to support different versions, with different costs of support. In the case of a separate repository I could help, for example.

On Wed, Aug 28, 2019 at 14:57, Nikolay Izhikov < nizhi...@apache.org> wrote: Hello, Alexey. But the compatibility with Spark 2.3 will be broken, isn't it? Yes. Do you have any plans to support the different versions of Spark without losing your unique expertise in the Spark-Ignite integration? What do you mean by "my unique expertise"? :) How do you see support of several Spark versions?

On Wed, 28/08/2019 at 14:29 +0300, Alexey Zinoviev wrote: Dear Nikolay Izhikov, Are you going to update the Ignite-Spark integration for Spark 2.4? But the compatibility with Spark 2.3 will be broken,
Re: [VOTE] Release Apache Ignite 2.7.6-rc1
+1 Downloaded binaries, successfully assembled a cluster. Best Regards, Ivan Rakov

On 23.08.2019 19:07, Dmitriy Pavlov wrote: +1 Checked: build from sources, node startup on Windows, simple topology, version and copyright year output. 2.7.6-rc0 has been used in the Apache Ignite TeamCity Bot since Sunday, Aug 18; 2.7.6-rc1 (ver. 2.7.6#20190821-sha1:6b3acf40) was installed as the DB for the TC Bot just now, and the bot works well.

On Fri, Aug 23, 2019 at 18:58, Alexey Kuznetsov wrote: +1 Compiled from sources on Windows, started ignite.bat.

On Fri, Aug 23, 2019 at 10:52 PM Pavel Tupitsyn wrote: +1, checked .NET node start and examples

On Fri, Aug 23, 2019 at 6:49 PM Alexei Scherbakov < alexey.scherbak...@gmail.com> wrote: +1

On Fri, Aug 23, 2019 at 18:33, Alexey Goncharuk < alexey.goncha...@gmail.com> wrote: +1 Checked the source compilation and release package build, node start, and a few examples. Left a comment on the failed TC task in the discussion thread.

On Fri, Aug 23, 2019 at 18:15, Andrey Gura wrote: +1

On Fri, Aug 23, 2019 at 3:32 PM Anton Vinogradov wrote: -1 (binding) Explained in the discussion thread.

On Fri, Aug 23, 2019 at 11:17 AM Anton Vinogradov wrote: Dmitriy, Did you check the RC using the automated TeamCity task?

On Fri, Aug 23, 2019 at 11:09 AM Zhenya Stanilovsky wrote: Built from sources, ran a yardstick test. +1

--- Forwarded message --- From: "Dmitriy Pavlov" < dpav...@apache.org > To: dev < dev@ignite.apache.org > Cc: Subject: [VOTE] Release Apache Ignite 2.7.6-rc1 Date: Thu, 22 Aug 2019 20:11:58 +0300

Dear Community, I have uploaded the release candidate to https://dist.apache.org/repos/dist/dev/ignite/2.7.6-rc1/ https://dist.apache.org/repos/dist/dev/ignite/packages_2.7.6-rc1/ The following staging can be used by any dependent project for testing: https://repository.apache.org/content/repositories/orgapacheignite-1466/ This is the second maintenance release for 2.7.x with a number of fixes.
Tag name is 2.7.6-rc1: https://gitbox.apache.org/repos/asf?p=ignite.git;a=tag;h=refs/tags/2.7.6-rc1

2.7.6 changes:
* Ignite work directory is now set to the current user's home directory by default; native persistence files will not be stored in the Temp directory anymore
* Fixed a bug that caused a SELECT query with an equality predicate on a part of the primary compound key to return a single row even if the query matched multiple rows
* Fixed an issue that could cause data corruption during checkpointing
* Fixed an issue where a row size was calculated incorrectly for shared cache groups, which caused a tree corruption
* Reduced java heap footprint by optimizing GridDhtPartitionsFullMessage maps in exchange history
* .NET: Native persistence now works with a custom affinity function
* Fixed an issue where an outdated node with a destroyed cache caused the cluster to hang
* Fixed a bug that made it impossible to change the inline_size property of an existing index after it was dropped and recreated with a different value

RELEASE NOTES: https://gitbox.apache.org/repos/asf?p=ignite.git;a=blob;f=RELEASE_NOTES.txt;hb=ignite-2.7.6

Complete list of closed issues: https://issues.apache.org/jira/issues/?jql=project%20%3D%20IGNITE%20AND%20fixVersion%20%3D%202.7.6

DEVNOTES: https://gitbox.apache.org/repos/asf?p=ignite.git;a=blob_plain;f=DEVNOTES.txt;hb=ignite-2.7.6

The vote is formal, see the voting guidelines: https://www.apache.org/foundation/voting.html

+1 - to accept Apache Ignite 2.7.6-rc1
0 - don't care either way
-1 - DO NOT accept Apache Ignite 2.7.6-rc1 (explain why)

See notes on how to verify a release here: https://www.apache.org/info/verification.html and https://cwiki.apache.org/confluence/display/IGNITE/Release+Process#ReleaseProcess-P5.VotingonReleaseandReleaseVerification

This vote will be open for at least 3 days, till Sun Aug 25, 18:00 UTC.
https://www.timeanddate.com/countdown/to?year=2019&month=8&day=25&hour=18&min=0&sec=0&p0=utc-1 Best Regards, Dmitriy Pavlov -- Zhenya Stanilovsky -- Best regards, Alexei Scherbakov -- Alexey Kuznetsov
Re: Replacing default work dir from tmp to current dir
Choosing the lesser of two evils, I'll agree with user.dir. Being able to run without preset env variables is a strong benefit for Ignite as a product. Best Regards, Ivan Rakov

On 12.08.2019 19:02, Denis Magda wrote: +1 for user.dir as the default one. Denis

On Monday, August 12, 2019, Dmitriy Pavlov wrote: +1 for the user home directory. A number of open source products create their dirs there. For me, it is a kind of expected behavior. Ivan mentioned an important point: binary meta & marshaller. We should update the documentation and stop requiring PDS dir setup, but require home setup (for older versions of Ignite, it is relevant anyway).

On Mon, Aug 12, 2019 at 18:49, Pavel Tupitsyn wrote: Hi Ivan, "fail Ignite node in case neither IGNITE_HOME nor IgniteConfiguration#igniteWorkDir is set" - I strongly disagree, this is bad usability. Ignition.start() should work without any extra configuration, as it does right now. Let's come up with reasonable defaults instead; user dir sounds good to me.

On Mon, Aug 12, 2019 at 6:45 PM Stephen Darlington < stephen.darling...@gridgain.com> wrote: Yes, when data is at stake, failing early is absolutely the right thing to do. Regards, Stephen

On 12 Aug 2019, at 16:37, Ivan Rakov wrote: Hi Anton, Actually, the issue is even more unpleasant. The official Ignite documentation says that it's possible to configure the path where your persistence files will be stored: https://apacheignite.readme.io/docs/distributed-persistent-store However, even if you have set all path options (storage, WAL, WAL archive), Ignite will still store crucial metadata in the resolved work directory (java.io.tmpdir by default). An example is binary metadata files, the absence of which can make your data unavailable. I propose to fail an Ignite node in case neither IGNITE_HOME nor IgniteConfiguration#igniteWorkDir is set. It's better to let the user know about missing configuration options during startup than to let the OS corrupt the storage by cleaning temp dirs. Thoughts?
Best Regards, Ivan Rakov

On 12.08.2019 18:10, Anton Kalashnikov wrote: Hello, Igniters. Currently, in the case when the work directory wasn't set by the user, Ignite can resolve it to the tmp directory, which leads to a problem: the tmp directory can be cleared at some unexpected moment by the operating system, and different types of critical data would be lost (e.g. binary_meta, persistence data). This doesn't look like expected behaviour; maybe it is better to use the current working directory ("user.dir") instead of the tmp directory? Or any other idea? You can find a few more details in the ticket: https://issues.apache.org/jira/browse/IGNITE-12057 -- Best regards, Anton Kalashnikov
Re: Replacing default work dir from tmp to current dir
Hi Anton,

Actually, the issue is even more unpleasant. The official Ignite documentation says that it's possible to configure the path where your persistence files will be stored: https://apacheignite.readme.io/docs/distributed-persistent-store However, even if you have set all path options (storage, WAL, WAL archive), Ignite will still store crucial metadata in the resolved work directory (java.io.tmpdir by default). An example is binary metadata files, the absence of which can make your data unavailable. I propose to fail an Ignite node in case neither IGNITE_HOME nor IgniteConfiguration#igniteWorkDir is set. It's better to let the user know about missing configuration options during startup than to let the OS corrupt the storage by cleaning temp dirs. Thoughts?

Best Regards, Ivan Rakov

On 12.08.2019 18:10, Anton Kalashnikov wrote: Hello, Igniters. Currently, in the case when the work directory wasn't set by the user, Ignite can resolve it to the tmp directory, which leads to a problem: the tmp directory can be cleared at some unexpected moment by the operating system, and different types of critical data would be lost (e.g. binary_meta, persistence data). This doesn't look like expected behaviour; maybe it is better to use the current working directory ("user.dir") instead of the tmp directory? Or any other idea? You can find a few more details in the ticket: https://issues.apache.org/jira/browse/IGNITE-12057 -- Best regards, Anton Kalashnikov
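The fallback order discussed in this thread can be sketched in plain Java. This is a hypothetical helper illustrating the proposal (an explicit IgniteConfiguration#igniteWorkDir wins, then IGNITE_HOME, then "user.dir", and java.io.tmpdir is never used); it is not Ignite's actual resolution code, and the class and method names are illustrative.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class WorkDirResolver {
    /**
     * Hypothetical sketch of the proposed fallback order: an explicitly
     * configured work dir wins, then IGNITE_HOME/work, then "user.dir"/ignite/work.
     * java.io.tmpdir never appears, so binary_meta and persistence files cannot
     * be wiped by the OS temp-dir cleaner.
     */
    public static Path resolveWorkDir(String cfgWorkDir, String igniteHome, String userDir) {
        if (cfgWorkDir != null && !cfgWorkDir.isEmpty())
            return Paths.get(cfgWorkDir);            // explicit configuration wins

        if (igniteHome != null && !igniteHome.isEmpty())
            return Paths.get(igniteHome, "work");    // next: IGNITE_HOME

        return Paths.get(userDir, "ignite", "work"); // last resort: current user dir
    }

    public static void main(String[] args) {
        System.out.println(resolveWorkDir(null, System.getenv("IGNITE_HOME"),
            System.getProperty("user.dir")));
    }
}
```

The alternative debated above - failing startup when nothing is set - would replace the last branch with an exception.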
Re: [DISCUSSION][IEP-35] Metrics configuration
Hi guys,

DataStorageConfiguration#getMetricsSubIntervalCount was added by me as a last resort to decrease the number of intervals in HitRateMetrics in case of an unexpected negative performance impact. As far as I know, no one ever used it - the precaution appeared to be premature. We can disregard its presence in DataStorageConfiguration. From my point of view, there's no need to change the interval count at runtime - it affects only metric smoothness and should be chosen by a developer who understands the details of the metric implementation.

Regarding metrics configuration change management: if we are going to add it to the product, it should be user friendly (persistent and changeable across the whole cluster by a single toggle, at least). Needing to change the configuration on every cluster node after every cluster restart would irritate the user more than help. Only a very hardcore cluster admin will be able to deal with the current solution. Distributed Metastorage is a good candidate for storing and handling such configuration options.

Best Regards, Ivan Rakov

On 05.08.2019 18:38, Nikolay Izhikov wrote: Hello, Andrey. Not necessary if we have exponential bounds' values for histograms. What do you mean by "exponential bounds"? Anyway, in the current solution it looks ugly and not usable. Thanks for the feedback, I appreciate your honesty. No. But we should admit that this is a bad decision and not include this change in the code base. What is your proposal? How should metrics configuration work? Yes. But it still will not give enough accuracy. Enough for what?

On Mon, 05/08/2019 at 18:29 +0300, Andrey Gura wrote: - metric configuration is node local (not cluster wide). This issue is easy to solve on the user-side and in Ignite core. It's imaginary simplicity. First, you need some additional automation on the user side in order to configure all nodes of the cluster. Second, new nodes can join the cluster, and the configuration will be different on the new node and on the other nodes of the cluster.
This leads to complication of the whole functionality. Anyway, I don't like such a simplified solution, because at the moment it brings more problems than value. The easiest solution was implemented. Do we want to make it more complex right now :)? No. But we should admit that this is a bad decision and not include this change in the code base. The reason it exists in the PR - we already have this parameter in DataStorageConfiguration#getMetricsSubIntervalCount. I believe this method should be deprecated and removed in a major release. I think the user should be able to configure buckets for histogram and rateTimeInterval for hitrate. Not necessary if we have exponential bounds' values for histograms. Anyway, in the current solution it looks ugly and not usable. Ignite has dozens of use-cases and deployment modes; it seems we can't cover them all with a single predefined buckets/rateTimeInterval set. Yes. But it still will not give enough accuracy.

On Mon, Aug 5, 2019 at 5:25 PM Nikolay Izhikov wrote: Hello, Andrey. - metric configuration is node local (not cluster wide). This issue is easy to solve on the user side and in Ignite core. - metric configuration doesn't survive node restart. We decided to go with the simplest solution, for now. The easiest solution was implemented. Do we want to make it more complex right now :)? - User shouldn't configure hit rate metrics at runtime in most cases. I agree with you - the size of the counters array looks odd as a configuration parameter. The reason it exists in the PR - we already have this parameter in DataStorageConfiguration#getMetricsSubIntervalCount. - Maybe it is enough for the user to have histograms with pre-configured buckets. So I think we should drop this change and the idea of runtime histogram and hit rate configuration. I think the user should be able to configure buckets for histogram and rateTimeInterval for hitrate.
Ignite has dozens of use-cases and deployment modes; it seems we can't cover them all with a single predefined buckets/rateTimeInterval set.

On Mon, 05/08/2019 at 16:59 +0300, Andrey Gura wrote: Igniters, I've taken a look at the PR and I want to follow up on this discussion again. The proposed solution has a couple of significant drawbacks: - metric configuration is node local (not cluster wide). - metric configuration doesn't survive node restart. These drawbacks make configuration complex, annoying, and useless in most cases. Moreover, I think that: - The user shouldn't configure hit rate metrics at runtime in most cases. Especially HitRateMetric.size, because it's just an implementation detail. The purpose of size is plot smoothing, and this parameter could be fixed (e.g. 16 is enough). The HitRate metric is just a LongMetric but with an additional feature. - Maybe it is enough for the user to have histograms with pre-configured buckets. The trick here is properly chosen bounds. It seems that e
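Since the thread revolves around the HitRateMetric size parameter, a minimal sketch of how such a hit-rate metric works may help: a ring of sub-interval counters that together cover rateTimeInterval, where the number of sub-intervals only affects plot smoothness. This is illustrative code under those assumptions, not Ignite's actual HitRateMetric implementation.

```java
import java.util.Arrays;

public class HitRateSketch {
    private final long[] counters;     // hits per sub-interval, used as a ring buffer
    private final long subIntervalMs;  // length of one sub-interval
    private long lastRotateMs;         // time the current sub-interval started
    private int idx;                   // index of the current sub-interval

    public HitRateSketch(long rateTimeIntervalMs, int size, long nowMs) {
        counters = new long[size];
        subIntervalMs = rateTimeIntervalMs / size;
        lastRotateMs = nowMs;
    }

    /** Register one hit at the given timestamp. */
    public void onHit(long nowMs) {
        rotate(nowMs);
        counters[idx]++;
    }

    /** Number of hits within the last rateTimeInterval milliseconds. */
    public long value(long nowMs) {
        rotate(nowMs);
        long sum = 0;
        for (long c : counters)
            sum += c;
        return sum;
    }

    private void rotate(long nowMs) {
        if (nowMs - lastRotateMs >= (long) counters.length * subIntervalMs) {
            Arrays.fill(counters, 0L); // the whole window has expired
            lastRotateMs = nowMs;
            return;
        }
        while (nowMs - lastRotateMs >= subIntervalMs) {
            idx = (idx + 1) % counters.length;
            counters[idx] = 0;         // evict the oldest sub-interval
            lastRotateMs += subIntervalMs;
        }
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        HitRateSketch hits = new HitRateSketch(60_000, 16, now);
        hits.onHit(now);
        System.out.println("hits in the last minute: " + hits.value(now));
    }
}
```

Note how a larger size makes eviction more granular (a smoother plot) without changing the metric's meaning, which is why the thread argues it can be fixed rather than user-configurable.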
Re: Partition map exchange metrics
Nikita and Maxim,

What if we just update the current metric getCurrentPmeDuration behaviour to show durations only for blocking PMEs? Keep it as a long value and rename it to getCacheOperationsBlockedDuration. No other changes will be required. WDYT?

I agree with these two metrics. I also think that the current getCurrentPmeDuration will become redundant.

Anton,

It looks like we're trying to implement "extended debug" instead of "monitoring". It should not be interesting for a real admin what phase of PME is in progress and so on.

PME is a mission-critical cluster process. I agree that there's a fine line between monitoring and debug here. However, it's not good to add monitoring capabilities only for the scenario when everything is alright. If PME really hangs, a *real admin* will be extremely interested in how to return the cluster to a working state. Metrics about stage completion times may really help here: e.g. if one specific node hasn't completed stage X while the rest of the cluster has, it can be a signal that this node should be killed. Of course, it's possible to build a monitoring system that extracts this information from logs, but: - It's more resource intensive, as it requires parsing logs all the time - It's less reliable, as log messages may change

Best Regards, Ivan Rakov

On 24.07.2019 14:57, Maxim Muzafarov wrote: Folks, +1 to Anton's post. What if we just update the current metric getCurrentPmeDuration behaviour to show durations only for blocking PMEs? Keep it as a long value and rename it to getCacheOperationsBlockedDuration. No other changes will be required. WDYT?

On Wed, 24 Jul 2019 at 14:02, Nikita Amelchev wrote: Nikolay, The cacheOperationsBlockedDuration metric will show the current blocking duration, or 0 if there is no blocking right now. The totalCacheOperationsBlockedDuration metric will accumulate all blocking durations that happen after the node starts.

On Wed, Jul 24, 2019 at 13:35, Nikolay Izhikov wrote: Nikita, What is the difference between those two metrics?
Wed, 24 July 2019, 12:45 Nikita Amelchev : Igniters, thanks for the comments. From the discussion it can be seen that we need only two metrics for now: - cacheOperationsBlockedDuration (long) - totalCacheOperationsBlockedDuration (long) I will prepare a PR in the nearest time. Wed, 24 Jul 2019 at 09:11, Zhenya Stanilovsky : +1 with Anton's decisions. Wednesday, 24 July 2019, 8:44 +03:00 from Anton Vinogradov : Folks, It looks like we're trying to implement "extended debug" instead of "monitoring". It should not be interesting for a real admin what phase of PME is in progress and so on. The interesting metrics are - total blocked time (will be used for real SLA counting) - are we blocked right now (shows we have an SLA degradation right now) The duration of the current blocking period can be easily presented using any modern monitoring tool by regular checks. An initial true will mean "period start"; precision will be a result of check frequency. Anyway, I'm ok to have the current metric presented with a long, where the long is a duration; see no reason, but ok :) All the other features you mentioned are useful for code or deployment improvement and can (should) be taken from logs at the analysis phase. On Tue, Jul 23, 2019 at 7:22 PM Ivan Rakov < ivan.glu...@gmail.com > wrote: Folks, let me step in. Nikita, thanks for your suggestions! 1. initialVersion. Topology version that initiates the exchange. 2. initTime. Time PME was started. 3. initEvent. Event that triggered PME. 4. partitionReleaseTime. Time when a node has finished waiting for all updates and transactions on the previous topology. 5. sendSingleMessageTime. Time when a node sent a single message. 6. recieveFullMessageTime. Time when a node received a full message. 7. finishTime. Time PME was ended. When a new PME starts, all these metrics reset. Every metric from Nikita's list looks useful and simple to implement. 
I think that it would be better to change the format of metrics 4, 5, 6 and 7 a bit: we can keep only the difference between the time of the previous event and the time of the corresponding event. Such metrics would be easier to perceive: they answer specific questions like "how much time did partition release take?" or "how much time did awaiting the end of the distributed phase take?". Also, if the results of 4, 5, 6, 7 are exported to a monitoring system, graphs will show how the different stage times change from one PME to another. When PME causes no blocking, it's a good PME and I see no reason to have monitoring related to it Agree with Anton here. These metrics should be measured only for a true distributed exchange. Saving results for client leave/join PMEs will just complicate monitoring. I agree with the total blocking duration metric but I still don't understand why the instant value indicating that operations are blocked should be boolean. Duration time s
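Ivan's suggestion above — reporting metrics 4–7 as per-stage durations (the delta from the previous event) rather than absolute timestamps — can be sketched as follows. This is an illustration only, not Ignite code; the stage names follow Nikita's list.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PmeStageDurations {
    /**
     * Turns ordered PME stage timestamps into per-stage durations:
     * each stage's value is the time elapsed since the previous stage
     * (or since PME init for the first stage).
     */
    public static Map<String, Long> toDurations(long initTime, Map<String, Long> stageTimestamps) {
        Map<String, Long> durations = new LinkedHashMap<>();
        long prev = initTime;
        for (Map.Entry<String, Long> e : stageTimestamps.entrySet()) {
            durations.put(e.getKey(), e.getValue() - prev);
            prev = e.getValue();
        }
        return durations;
    }
}
```

Exported this way, a graph per stage directly shows "how much time did partition release take?" from one PME to another.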
Re: Partition map exchange metrics
Folks, let me step in. Nikita, thanks for your suggestions! 1. initialVersion. Topology version that initiates the exchange. 2. initTime. Time PME was started. 3. initEvent. Event that triggered PME. 4. partitionReleaseTime. Time when a node has finished waiting for all updates and transactions on the previous topology. 5. sendSingleMessageTime. Time when a node sent a single message. 6. recieveFullMessageTime. Time when a node received a full message. 7. finishTime. Time PME was ended. When a new PME starts, all these metrics reset. Every metric from Nikita's list looks useful and simple to implement. I think that it would be better to change the format of metrics 4, 5, 6 and 7 a bit: we can keep only the difference between the time of the previous event and the time of the corresponding event. Such metrics would be easier to perceive: they answer specific questions like "how much time did partition release take?" or "how much time did awaiting the end of the distributed phase take?". Also, if the results of 4, 5, 6, 7 are exported to a monitoring system, graphs will show how the different stage times change from one PME to another. When PME causes no blocking, it's a good PME and I see no reason to have monitoring related to it Agree with Anton here. These metrics should be measured only for a true distributed exchange. Saving results for client leave/join PMEs will just complicate monitoring. I agree with the total blocking duration metric but I still don't understand why the instant value indicating that operations are blocked should be boolean. Duration time since blocking has started looks more appropriate and useful. It gives more information while the semantics are left the same. Totally agree with Pavel here. Both the "accumulated block time" and "current PME block time" metrics are useful. 
Growth of the accumulated metric over a specific period of time (should be easy to check via a monitoring system graph) will show for how much time business operations were blocked in total, and a non-zero current metric will show that we are experiencing issues right now. A boolean metric "are we blocked right now" is not needed as it obviously can be inferred from "current PME block time". Best Regards, Ivan Rakov On 23.07.2019 16:02, Pavel Kovalenko wrote: Nikita, I agree with the total blocking duration metric but I still don't understand why the instant value indicating that operations are blocked should be boolean. Duration time since blocking has started looks more appropriate and useful. It gives more information while the semantics are left the same. Tue, 23 Jul 2019 at 11:42, Nikita Amelchev : Folks, All previous suggestions have some disadvantages. There can be several exchanges between two metric updates, and a fast exchange can overwrite a previous long exchange. We can introduce a metric of total blocking duration that will accumulate at the end of the exchange. So, users will get actual information about how long operations were blocked. The cluster metric will be the maximum of the local nodes' metrics. And we need a boolean metric that will indicate realtime status. It is needed because the duration metric updates at the end of the exchange. So I propose to change the current metric, which has not been released yet, to the totalCacheOperationsBlockingDuration metric and to add the isCacheOperationsBlocked metric. WDYT? Mon, 22 Jul 2019 at 09:27, Anton Vinogradov : Nikolay, Still see no reason to replace boolean with long. On Mon, Jul 22, 2019 at 9:19 AM Nikolay Izhikov wrote: Anton. 1. Value is exported based on SPI settings, not at the moment it changed. 2. Clock synchronisation - if we export start time, we should also export the node's local timestamp. Mon, 22 July 2019, 8:33 Anton Vinogradov : Folks, What's the reason for duration counting? AFAIU, it's a monitoring system feature to count the durations. 
Since the monitoring system checks metrics periodically, it will know the duration from its own log. On Fri, Jul 19, 2019 at 7:32 PM Pavel Kovalenko wrote: Nikita, Yes, I mean duration, not timestamp. For the metric name, I suggest "cacheOperationsBlockingDuration"; I think it represents more clearly what is blocked during PME. We can also combine both the timestamp "cacheOperationsBlockingStartTs" and the duration to have a better correlation of when cache operations were blocked and how much time it has taken. For an instant view (like in a JMX bean) a calculated value as you mentioned can be used. For metrics exported to some backend (IEP-35) a counter can be used. The counter is incremented by the blocking time after blocking has ended. Fri, 19 Jul 2019 at 19:10, Nikita Amelchev : Pavel, The main purpose of this metric is how much time we wait for resuming cache operations Seems I misunderstood you. Do you mean timestamp or duration here? What do you think if we change the boolean value of the metric to a long value that represents time in milliseconds when o
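The metric pair the thread converged on — a current blocking duration (0 when not blocked, so the boolean is implied) plus an accumulated total — can be sketched like this. Names and structure are assumptions based on the proposal, not Ignite's actual implementation.

```java
// Sketch of the two proposed metrics:
// - cacheOperationsBlockedDuration: current blocking duration, 0 when not blocked
// - totalCacheOperationsBlockedDuration: sum of all finished blocking periods
public class PmeBlockingMetrics {
    private long blockStartMillis = -1; // -1 means "not blocked right now"
    private long totalBlockedMillis;

    public synchronized void onBlockingStart(long nowMillis) {
        blockStartMillis = nowMillis;
    }

    public synchronized void onBlockingEnd(long nowMillis) {
        if (blockStartMillis >= 0) {
            totalBlockedMillis += nowMillis - blockStartMillis; // accumulate at exchange end
            blockStartMillis = -1;
        }
    }

    /** Current blocking duration; 0 when cache operations are not blocked. */
    public synchronized long cacheOperationsBlockedDuration(long nowMillis) {
        return blockStartMillis < 0 ? 0 : nowMillis - blockStartMillis;
    }

    /** Total blocked time accumulated since node start (for SLA counting). */
    public synchronized long totalCacheOperationsBlockedDuration() {
        return totalBlockedMillis;
    }
}
```

A non-zero current value replaces the proposed boolean isCacheOperationsBlocked, while the total survives fast exchanges that would otherwise be missed between two metric polls.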
Re: Improvements for new security approach.
Hello Max, Thanks for your analysis! Have you created a JIRA issue for the discovered defects? Best Regards, Ivan Rakov On 17.07.2019 17:08, Maksim Stepachev wrote: Hello, Igniters. The main idea of the new security is propagation of the security context to other nodes so that actions are performed with the initial permissions. The solution looks fine but has imperfections. 1. ZookeeperDiscoveryImpl doesn't implement security into itself. As a result: Caused by: class org.apache.ignite.spi.IgniteSpiException: Security context isn't certain. 2. The visor tasks lose permissions. The method VisorQueryUtils#scheduleQueryStart makes a new thread and loses the context. 3. The GridRestProcessor does tasks outside the "withContext" section. As a result, the context is lost. 4. The GridRestProcessor isn't a client; we can't read the security subject from a node attribute. We should transmit secCtx for fake nodes and secSubjId for real ones. 5. NoOpIgniteSecurityProcessor should include a disabled processor and validate it too if it is not null. It is important for a client node. For example: in IgniteKernal#securityProcessor the method createComponent returns a GridSecurityProcessor. For server nodes it is enabled, but for clients it isn't. The clients aren't able to pass validation for this reason. 6. ATTR_SECURITY_SUBJECT was removed. It broke compatibility. I am going to fix it.
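Defects 2 and 3 above are instances of the same pattern: a task handed to another thread runs outside the "withContext" section, so the security context stored in a thread-local is lost. A common fix is to capture the context before the hand-off and restore it inside the worker. The sketch below uses a hypothetical SecurityContextHolder as a stand-in for Ignite's context API; it is illustrative, not the actual fix.

```java
// Hypothetical stand-in for a thread-local security context, showing
// how to propagate it across a thread boundary (the scheduleQueryStart case).
public class SecurityContextHolder {
    private static final ThreadLocal<String> CTX = new ThreadLocal<>();

    public static void set(String subjectId) { CTX.set(subjectId); }
    public static String get() { return CTX.get(); }

    /** Wraps a task so it runs with the submitting thread's security context. */
    public static Runnable withContext(Runnable task) {
        String captured = CTX.get(); // capture on the submitting thread
        return () -> {
            String old = CTX.get();
            CTX.set(captured);       // restore inside the worker thread
            try {
                task.run();
            } finally {
                CTX.set(old);        // don't leak the context into pooled threads
            }
        };
    }
}
```

Any place that spawns a thread or submits to an executor would wrap its task with withContext(...) instead of passing it raw.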
Re: Tx lock partial happens before
Anton, Step-by-step: 1) primary locked on key mention (get/put) at pessimistic/!read-committed tx 2) backups locked on prepare 3) primary unlocked on finish 4) backups unlocked on finish (after the primary) correct? Yes, this corresponds to my understanding of the transactions protocol. With a minor exception: steps 3 and 4 are inverted in case of one-phase commit. Agree, but it seems there is no need to acquire the lock; we just have to wait until the entry becomes unlocked. - entry locked means that the previous tx's "finish" phase is in progress - entry unlocked means the read value is up-to-date (previous "finish" phase finished) correct? Diving deeper, an entry is locked if its GridCacheMapEntry.localCandidates queue is not empty (the first item in the queue is actually the transaction that owns the lock). we just have to wait until the entry becomes unlocked. This may work. If the consistency checking code has acquired the lock on the primary, the backup can be in two states: - not locked - and new locks won't appear as we are holding the lock on the primary - still locked by the transaction that owned the lock on the primary just before our checking code - in such a case the checking code should just wait for lock release Best Regards, Ivan Rakov On 15.07.2019 9:34, Anton Vinogradov wrote: Ivan R. Thanks for joining! Got the idea, but not sure I see the way to a fix. AFAIK (can be wrong, please correct if necessary), at 2PC, locks are acquired on backups during the "prepare" phase and released at the "finish" phase after the primary has fully committed. Step-by-step: 1) primary locked on key mention (get/put) at pessimistic/!read-committed tx 2) backups locked on prepare 3) primary unlocked on finish 4) backups unlocked on finish (after the primary) correct? So, acquiring locks on backups not at the "prepare" phase may cause unexpected behavior in case of primary fail or other errors. It's definitely possible to update failover to solve this issue, but that seems to be an overcomplicated way. The main question there: is there any simple way? 
a checking read from a backup will just wait for the commit if it's in progress. Agree, but it seems there is no need to acquire the lock; we just have to wait until the entry becomes unlocked. - entry locked means that the previous tx's "finish" phase is in progress - entry unlocked means the read value is up-to-date (previous "finish" phase finished) correct? On Mon, Jul 15, 2019 at 8:37 AM Павлухин Иван wrote: Anton, I did not know about the mechanics of locking entries on backups during the prepare phase. Thank you for pointing that out! Fri, 12 Jul 2019 at 22:45, Ivan Rakov : Hi Anton, Each get method now checks the consistency. Check means: 1) tx lock acquired on primary 2) gained data from each owner (primary and backups) 3) data compared Did you consider acquiring locks on backups as well during your check, just like 2PC prepare does? If there's HB between steps 1 (lock primary) and 2 (update primary + lock backup + update backup), you may be sure that there will be no false-positive results and no deadlocks as well. The protocol won't be complicated: a checking read from a backup will just wait for the commit if it's in progress. Best Regards, Ivan Rakov On 12.07.2019 9:47, Anton Vinogradov wrote: Igniters, Let me explain the problem in detail. Read Repair at a pessimistic tx (locks acquired on primary, full sync, 2PC) is able to see a consistency violation because backups are not updated yet. It seems to be not a good idea to "fix" the code to unlock the primary only when backups are updated; this will definitely cause a performance drop. Currently, there is no explicit sync feature that allows waiting for backups to be updated during the previous tx. The previous tx just sends GridNearTxFinishResponse to the originating node. 
Bad ideas for how to handle this: - retry some number of times (still possible to get a false positive) - lock the tx entry on backups (will definitely break failover logic) - wait for the same entry version on backups during some timeout (will require huge changes to the "get" logic, and a false positive is still possible) Is there any simple fix for this issue? Thanks for the tips in advance. Ivan, thanks for your interest 4. Very fast and lucky txB writes a value 2 for the key on primary and backup. AFAIK, reordering is not possible since backups are "prepared" before the primary releases the lock. So, consistency is guaranteed by failover and by the "prepare" feature of 2PC. It seems the problem is NOT with consistency in AI, but with the consistency detection implementation (RR) and possible "false positive" results. BTW, checked the 1PC case (only one data node in the test) and got no issues. On Fri, Jul 12, 2019 at 9:26 AM Павлухин Иван wrote: Anton, Is such behavior observed for 2PC or for the 1PC optimization? Does it not mean that things can be even worse and an inconsistent write is possible on a backup? E.g. in the scenario: 1. txA writes
Re: Tx lock partial happens before
Hi Anton, Each get method now checks the consistency. Check means: 1) tx lock acquired on primary 2) gained data from each owner (primary and backups) 3) data compared Did you consider acquiring locks on backups as well during your check, just like 2PC prepare does? If there's HB between steps 1 (lock primary) and 2 (update primary + lock backup + update backup), you may be sure that there will be no false-positive results and no deadlocks as well. The protocol won't be complicated: a checking read from a backup will just wait for the commit if it's in progress. Best Regards, Ivan Rakov On 12.07.2019 9:47, Anton Vinogradov wrote: Igniters, Let me explain the problem in detail. Read Repair at a pessimistic tx (locks acquired on primary, full sync, 2PC) is able to see a consistency violation because backups are not updated yet. It seems to be not a good idea to "fix" the code to unlock the primary only when backups are updated; this will definitely cause a performance drop. Currently, there is no explicit sync feature that allows waiting for backups to be updated during the previous tx. The previous tx just sends GridNearTxFinishResponse to the originating node. Bad ideas for how to handle this: - retry some number of times (still possible to get a false positive) - lock the tx entry on backups (will definitely break failover logic) - wait for the same entry version on backups during some timeout (will require huge changes to the "get" logic, and a false positive is still possible) Is there any simple fix for this issue? Thanks for the tips in advance. Ivan, thanks for your interest 4. Very fast and lucky txB writes a value 2 for the key on primary and backup. AFAIK, reordering is not possible since backups are "prepared" before the primary releases the lock. So, consistency is guaranteed by failover and by the "prepare" feature of 2PC. It seems the problem is NOT with consistency in AI, but with the consistency detection implementation (RR) and possible "false positive" results. BTW, checked the 1PC case (only one data node in the test) and got no issues. 
On Fri, Jul 12, 2019 at 9:26 AM Павлухин Иван wrote: Anton, Is such behavior observed for 2PC or for the 1PC optimization? Does it not mean that things can be even worse and an inconsistent write is possible on a backup? E.g. in the scenario: 1. txA writes a value 1 for the key on primary. 2. txA unlocks the key on primary. 3. txA freezes before updating the backup. 4. Very fast and lucky txB writes a value 2 for the key on primary and backup. 5. txA wakes up and writes 1 for the key on the backup. 6. As a result there is 2 on primary and 1 on backup. Naively it seems that locks should be released after all replicas are updated. Wed, 10 Jul 2019 at 16:36, Anton Vinogradov : Folks, Now investigating unexpected repairs [1] in case of ReadRepair usage at testAccountTxNodeRestart. Updated [2] the test to check whether any repairs happen. The test's name now is "testAccountTxNodeRestartWithReadRepair". Each get method now checks the consistency. Check means: 1) tx lock acquired on primary 2) gained data from each owner (primary and backups) 3) data compared Sometimes, a backup may have an obsolete value during such a check. It seems this happens because tx commit on the primary goes the following way (check the code [2] for details): 1) performing localFinish (releases the tx lock) 2) performing dhtFinish (commits on backups) 3) transferring control back to the caller So, it seems, the problem here is that "tx lock released on primary" does not mean that backups are updated, but "commit() method finished at caller's thread" does. This means that, currently, there is no happens-before between 1) thread 1 committed data on primary and the tx lock can be reobtained 2) thread 2 reads from a backup but still a strong HB between "commit() finished" and "backup updated" So, it seems to be possible, for example, to get a notification by a continuous query, then read from a backup and get an obsolete value. Is this "partial happens before" behavior expected? 
[1] https://issues.apache.org/jira/browse/IGNITE-11973 [2] https://github.com/apache/ignite/pull/6679/files [3] org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal#finishTx -- Best regards, Ivan Pavlukhin
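The "wait until the entry becomes unlocked" idea from this thread can be sketched with a toy model. Assuming the protocol described above — if the checking code holds the tx lock on the primary, a backup entry is either unlocked, or still locked by the transaction that previously owned the primary lock — waiting for the backup's entry lock to be released yields an up-to-date read. This is an illustration of the reasoning only, not Ignite code.

```java
import java.util.concurrent.locks.ReentrantLock;

// Toy model of a backup partition entry: the tx lock is held for the
// duration of the "finish" phase, so a reader that waits for the lock
// observes the committed value.
public class BackupEntry {
    private final ReentrantLock txLock = new ReentrantLock();
    private int value;

    /** Previous transaction's "finish" phase applying the update on the backup. */
    public void commit(int newValue) {
        txLock.lock();
        try {
            value = newValue;
        } finally {
            txLock.unlock();
        }
    }

    /** Consistency check read: blocks while a finish phase is in progress. */
    public int consistentRead() {
        txLock.lock(); // "entry locked" == previous tx's finish is in progress
        try {
            return value;
        } finally {
            txLock.unlock();
        }
    }
}
```

In the real system the analogue of txLock is the GridCacheMapEntry.localCandidates queue being non-empty; the check would wait for it to drain rather than acquire a lock.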
Re: Lightweight version of partitions map exchange
Hi Nikita, I've checked out your branch, looked through the changes and ran IgniteBaselineNodeLeaveExchangeTest. Some thoughts: 1. First of all, there's a fundamental issue that backup and primary partitions behave differently: - On the primary, an updating transaction needs to own an exclusive lock (be on top of the GridCacheMapEntry#localCandidates queue) on the key object for the whole prepare-commit cycle. That's how two-phase commit works in Ignite. - The primary node generates update counters via PartitionTxUpdateCounterImpl#reserve, while a backup receives an update and just applies it with the provided counter. So, if we perform PME in a non-distributed way, we'll lose happens-before guarantees between updates of transactions mapped on the previous topology and ones that are mapped to the new topology. This may cause the following issues: - A new primary node may start behaving as primary (spawn DHT transaction instances and acquire exclusive locks) but still may receive updates from the previous primary. I don't know how to handle these updates correctly as they may conflict with new updates and locks. - A new primary node should start generating update counters, but it actually doesn't know the last update counter in the cluster. If it optimistically starts from the last known counter, partition consistency may break in case updates with the actual last update counter arrive (I guess this issue should be reproduced as an LWM > HWM assertion error). 2. According to the current state of your test, testBltServerLeaveUnderLoad is called only with PickKeyOption#NO_DATA_ON_LEAVING_NODE (which means backups that are promoted to primaries without global synchronization are not affected by transactional load). However, it still fails with the LWM > HWM assertion. I guess there are other details in the new partition counters implementation that require global happens-before between updates of transactions that are mapped to different topology versions. 
Alex S, backups that are promoted to primaries without global synchronization are not affected by transactional load test still fails with LWM > HWM assertion Do you have any ideas why this may happen? A new primary node should start generating update counters, but it actually doesn't know the last update counter in the cluster. If it optimistically starts from the last known counter, partition consistency may break in case updates with the actual last update counter arrive (I guess this issue should be reproduced as an LWM > HWM assertion error). What do you think, does this problem look solvable? Alex S and Alex G, A new primary node may start behaving as primary (spawn DHT transaction instances and acquire exclusive locks) but still may receive updates from the previous primary. I don't know how to handle these updates correctly as they may conflict with new updates and locks. What do you think, can we overcome this limitation with our existing implementation of transactions? Best Regards, Ivan Rakov On 01.07.2019 11:13, Nikita Amelchev wrote: Hi, Igniters. I'm working on the implementation of lightweight PME for the baseline node leave case. [1] In my implementation, each node recalculates the new affinity and completes PME locally without distributed communication. This is possible because all partitions are distributed according to the baseline topology. And I found two possible blockers to doing it without blocking updates: 1. Finalizing the partition counters. It seems that we can't correctly collect gaps and process them without completing all txs. See the GridDhtPartitionTopologyImpl#finalizeUpdateCounters method. 2. Applying update counters. We can't correctly set the HWM counter if the primary left the cluster and sent updates to part of the backups. Such updates can be processed later and break the guarantee that LWM<=HWM. Is it impossible to leave a baseline node without waiting for all txs to complete? 1. https://issues.apache.org/jira/browse/IGNITE-9913 Wed, 5 Jun 2019 
at 12:15, Nikita Amelchev : Maksim, I agree with you that we should implement the current issue and not allow lightweight PME if there are MOVING partitions in the cluster. But now I'm investigating an issue with finalizing update counters, because it assumes that finalizing happens on exchange and all cache updates are completed. Here we can wrongly process update counter gaps and break the recently merged IGNITE-10078. And about phase 2, correct me if I misunderstood you. You suggest not moving primary partitions on rebalancing completion (not changing the affinity assignment)? In this case, nodes that recently joined the cluster will not have primary partitions and won't get load after rebalancing. Thu, 30 May 2019 at 19:55, Maxim Muzafarov : Igniters, I've looked through Nikita's changes and I think for the current issue [1] we should not allow the existence of MOVING partitions in the cluster (it must be stable) to run the lightweight PME
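The LWM/HWM invariant discussed in this thread can be sketched in a simplified model. The primary reserves counters (moving the high-water mark) before sending updates; backups apply updates with the provided counters (moving the low-water mark). If a newly promoted primary started from a stale "last known" counter, a late update from the old primary carrying a higher counter would push LWM past HWM. This is an illustration only, not PartitionTxUpdateCounterImpl.

```java
// Simplified partition update counter with the LWM <= HWM invariant.
public class PartitionUpdateCounter {
    private long hwm; // highest reserved counter (primary side)
    private long lwm; // highest counter applied in order (backup side)

    /** Primary reserves the next update counter before sending the update. */
    public synchronized long reserve() {
        return ++hwm;
    }

    /** Backup applies an update carrying a counter generated elsewhere. */
    public synchronized void applyUpdate(long counter) {
        if (counter > hwm) // would make LWM exceed HWM: the assertion from the thread
            throw new AssertionError("LWM/HWM invariant broken: counter " + counter + " > HWM " + hwm);
        if (counter == lwm + 1)
            lwm = counter;
    }

    public synchronized long lwm() { return lwm; }
    public synchronized long hwm() { return hwm; }
}
```

In the model, a node that becomes primary with hwm stuck at a stale value and then receives updates reserved by the old primary hits exactly the assertion described above.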
Re: Lightweight version of partitions map exchange
My bad, I've sent the message accidentally. What I wanted to ask: Alex S, backups that are promoted to primaries without global synchronization are not affected by transactional load test still fails with LWM > HWM assertion Do you have any ideas why this may happen? A new primary node should start generating update counters, but it actually doesn't know the last update counter in the cluster. If it optimistically starts from the last known counter, partition consistency may break in case updates with the actual last update counter arrive (I guess this issue should be reproduced as an LWM > HWM assertion error). What do you think, does this problem look solvable? Alex S and Alex G, A new primary node may start behaving as primary (spawn DHT transaction instances and acquire exclusive locks) but still may receive updates from the previous primary. I don't know how to handle these updates correctly as they may conflict with new updates and locks. What do you think, can we overcome this limitation with the existing transaction implementation? Best Regards, Ivan Rakov On 10.07.2019 2:25, Ivan Rakov wrote: Hi Nikita, I've checked out your branch, looked through the changes and ran IgniteBaselineNodeLeaveExchangeTest. Some thoughts: 1. First of all, there's a fundamental issue that backup and primary partitions behave differently: - On the primary, an updating transaction needs to own an exclusive lock (be on top of the GridCacheMapEntry#localCandidates queue) on the key object for the whole prepare-commit cycle. That's how two-phase commit works in Ignite. - The primary node generates update counters via PartitionTxUpdateCounterImpl#reserve, while a backup receives an update and just applies it with the provided counter. So, if we perform PME in a non-distributed way, we'll lose happens-before guarantees between updates of transactions mapped on the previous topology and updates of transactions that are mapped to the new topology. 
This may cause the following issues: - A new primary node may start behaving as primary (spawn DHT transaction instances and acquire exclusive locks) but still may receive updates from the previous primary. I don't know how to handle these updates correctly as they may conflict with new updates and locks. - A new primary node should start generating update counters, but it actually doesn't know the last update counter in the cluster. If it optimistically starts from the last known counter, partition consistency may break in case updates with the actual last update counter arrive (I guess this issue should be reproduced as an LWM > HWM assertion error). 2. According to the current state of your test, testBltServerLeaveUnderLoad is called only with PickKeyOption#NO_DATA_ON_LEAVING_NODE (which means backups that are promoted to primaries without global synchronization are not affected by transactional load). However, it still fails with the LWM > HWM assertion. I guess there are other details in the new partition counters implementation that require global happens-before between updates of transactions that are mapped to different topology versions. Alex S, backups that are promoted to primaries without global synchronization are not affected by transactional load Best Regards, Ivan Rakov On 01.07.2019 11:13, Nikita Amelchev wrote: Hi, Igniters. I'm working on the implementation of lightweight PME for the baseline node leave case. [1] In my implementation, each node recalculates the new affinity and completes PME locally without distributed communication. This is possible because all partitions are distributed according to the baseline topology. And I found two possible blockers to doing it without blocking updates: 1. Finalizing the partition counters. It seems that we can't correctly collect gaps and process them without completing all txs. See the GridDhtPartitionTopologyImpl#finalizeUpdateCounters method. 2. Applying update counters. 
We can't correctly set the HWM counter if the primary left the cluster and sent updates to part of the backups. Such updates can be processed later and break the guarantee that LWM<=HWM. Is it impossible to leave a baseline node without waiting for all txs to complete? 1. https://issues.apache.org/jira/browse/IGNITE-9913 Wed, 5 Jun 2019 at 12:15, Nikita Amelchev : Maksim, I agree with you that we should implement the current issue and not allow lightweight PME if there are MOVING partitions in the cluster. But now I'm investigating an issue with finalizing update counters, because it assumes that finalizing happens on exchange and all cache updates are completed. Here we can wrongly process update counter gaps and break the recently merged IGNITE-10078. And about phase 2, correct me if I misunderstood you. You suggest not moving primary partitions on rebalancing completion (not changing the affinity assignment)? In this case, nodes that recently joined the cluster will not have primary partitions and won't get load
Re: Lightweight version of partitions map exchange
Hi Nikita, I've checked out your branch, looked through the changes and ran IgniteBaselineNodeLeaveExchangeTest. Some thoughts: 1. First of all, there's a fundamental issue that backup and primary partitions behave differently: - On the primary, an updating transaction needs to own an exclusive lock (be on top of the GridCacheMapEntry#localCandidates queue) on the key object for the whole prepare-commit cycle. That's how two-phase commit works in Ignite. - The primary node generates update counters via PartitionTxUpdateCounterImpl#reserve, while a backup receives an update and just applies it with the provided counter. So, if we perform PME in a non-distributed way, we'll lose happens-before guarantees between updates of transactions mapped on the previous topology and updates of transactions that are mapped to the new topology. This may cause the following issues: - A new primary node may start behaving as primary (spawn DHT transaction instances and acquire exclusive locks) but still may receive updates from the previous primary. I don't know how to handle these updates correctly as they may conflict with new updates and locks. - A new primary node should start generating update counters, but it actually doesn't know the last update counter in the cluster. If it optimistically starts from the last known counter, partition consistency may break in case updates with the actual last update counter arrive (I guess this issue should be reproduced as an LWM > HWM assertion error). 2. According to the current state of your test, testBltServerLeaveUnderLoad is called only with PickKeyOption#NO_DATA_ON_LEAVING_NODE (which means backups that are promoted to primaries without global synchronization are not affected by transactional load). However, it still fails with the LWM > HWM assertion. I guess there are other details in the new partition counters implementation that require global happens-before between updates of transactions that are mapped to different topology versions. 
Alex S, backups that are promoted to primaries without global synchronization are not affected by transactional load Best Regards, Ivan Rakov On 01.07.2019 11:13, Nikita Amelchev wrote: Hi, Igniters. I'm working on the implementation of lightweight PME for the baseline node leave case. [1] In my implementation, each node recalculates the new affinity and completes PME locally without distributed communication. This is possible because all partitions are distributed according to the baseline topology. And I found two possible blockers to doing it without blocking updates: 1. Finalizing the partition counters. It seems that we can't correctly collect gaps and process them without completing all txs. See the GridDhtPartitionTopologyImpl#finalizeUpdateCounters method. 2. Applying update counters. We can't correctly set the HWM counter if the primary left the cluster and sent updates to part of the backups. Such updates can be processed later and break the guarantee that LWM<=HWM. Is it impossible to leave a baseline node without waiting for all txs to complete? 1. https://issues.apache.org/jira/browse/IGNITE-9913 Wed, 5 Jun 2019 at 12:15, Nikita Amelchev : Maksim, I agree with you that we should implement the current issue and not allow lightweight PME if there are MOVING partitions in the cluster. But now I'm investigating an issue with finalizing update counters, because it assumes that finalizing happens on exchange and all cache updates are completed. Here we can wrongly process update counter gaps and break the recently merged IGNITE-10078. And about phase 2, correct me if I misunderstood you. You suggest not moving primary partitions on rebalancing completion (not changing the affinity assignment)? In this case, nodes that recently joined the cluster will not have primary partitions and won't get load after rebalancing. Thu, 30 May 2019 
at 19:55, Maxim Muzafarov : Igniters, I've looked through Nikita's changes and I think for the current issue [1] we should not allow the existence of MOVING partitions in the cluster (it must be stable) to run the lightweight PME on a BLT node leave event, to achieve truly unlocked operations, and here are my thoughts why. In general, as Nikita mentioned above, the existence of MOVING partitions in the cluster means that the rebalance procedure is currently running. It owns cache partitions locally and sends in the background (with an additional timeout) the actual statuses of its local partitions to the coordinator node. So, we will always have a lag between the local node's partition states and all other cluster nodes' partition states. This lag can be very large since the previous #scheduleResendPartitions() is cancelled when a new cache group rebalance finishes. Without fair partition state synchronization (without full PME) and in case of local affinity recalculation on a BLT node leave event, other nodes will mark such partitions LOST in most cases, which in fact are present
Re: "Idle verify" to "Online verify"
Anton, Automatic quorum-based partition drop may work as a partial workaround for IGNITE-10078, but the discussed approach surely doesn't replace the IGNITE-10078 activity. We still don't know what to do when a quorum can't be reached (2 partitions have hash X, 2 have hash Y), and keeping extended update counters is the only way to resolve such a case. On the other hand, validating precalculated partition hashes on PME can be a good addition to the IGNITE-10078 logic: we'll be able to detect situations when extended update counters are equal but, for some reason (a bug or whatever), partition contents are different. Best Regards, Ivan Rakov On 06.05.2019 12:27, Anton Vinogradov wrote: Ivan, just to make sure ... The discussed case will fully solve the issue [1] in case we also add some strategy to reject partitions with missed updates (updateCnt==Ok, Hash!=Ok). For example, we may use the Quorum strategy, where the majority wins. Sounds correct? [1] https://issues.apache.org/jira/browse/IGNITE-10078 On Tue, Apr 30, 2019 at 3:14 PM Anton Vinogradov wrote: Ivan, Thanks for the detailed explanation. I'll try to implement the PoC to check the idea. On Mon, Apr 29, 2019 at 8:22 PM Ivan Rakov wrote: But how to keep this hash? I think we can just adopt the way partition update counters are stored. Update counters are: 1) Kept and updated in heap, see IgniteCacheOffheapManagerImpl.CacheDataStoreImpl#pCntr (accessed during regular cache operations, no page replacement latency issues) 2) Synchronized with page memory (and with disk) on every checkpoint, see GridCacheOffheapManager#saveStoreMetadata 3) Stored in the partition meta page, see PagePartitionMetaIO#setUpdateCounter 4) On node restart, we init the onheap counter with the value from disk (for the moment of the last checkpoint) and update it to the latest value during WAL logical record replay 2) PME is a rare operation on a production cluster, but it seems we have to check consistency regularly.
Since we have to finish all operations before the check, should we have a fake PME for a maintenance check in this case? From my experience, PME happens on prod clusters from time to time (several times per week), which can be enough. In case it's needed to check consistency more often than regular PMEs occur, we can implement a command that will trigger a fake PME for consistency checking. Best Regards, Ivan Rakov On 29.04.2019 18:53, Anton Vinogradov wrote: Ivan, thanks for the analysis! With a pre-calculated partition hash value, we can automatically detect inconsistent partitions on every PME. Great idea; it seems this covers all broken sync cases. It will check alive nodes in case the primary fails immediately, and will check a rejoining node once it has finished rebalance (PME on becoming an owner). A recovered cluster will be checked on the activation PME (or even before that?). Also, a warmed cluster will still be warm after the check. Have I missed any cases that lead to broken sync, except bugs? 1) But how to keep this hash? - It should be automatically persisted on each checkpoint (it should not require recalculation on restore; snapshots should be covered too) (and covered by WAL?). - It should always be available in RAM for every partition (even for cold partitions never updated/read on this node) so it can be used immediately once all operations are done on PME. Can we have special pages to keep such hashes and never allow their eviction? 2) PME is a rare operation on a production cluster, but it seems we have to check consistency regularly. Since we have to finish all operations before the check, should we have a fake PME for a maintenance check in this case? On Mon, Apr 29, 2019 at 4:59 PM Ivan Rakov wrote: Hi Anton, Thanks for sharing your ideas. I think your approach should work in general. I'll just share my concerns about possible issues that may come up. 1) Equality of update counters doesn't imply equality of partition content under load.
For every update, the primary node generates an update counter, then the update is delivered to the backup node and applied with the corresponding update counter. For example, there are two transactions (A and B) that update partition X in the following scenario: - A updates key1 in partition X on the primary node and increments the counter to 10 - B updates key2 in partition X on the primary node and increments the counter to 11 - While A is still updating other keys, B is finally committed - The update of key2 arrives at the backup node and sets the update counter to 11 An observer will see equal update counters (11), but the update of key1 is still missing in the backup partition. This is a fundamental problem which is being solved here: https://issues.apache.org/jira/browse/IGNITE-10078 "Online verify" should operate with new complex update counters which take such "update holes" into account.
Re: "Idle verify" to "Online verify"
Hi Anton, Thanks for sharing your ideas. I think your approach should work in general. I'll just share my concerns about possible issues that may come up. 1) Equality of update counters doesn't imply equality of partition content under load. For every update, the primary node generates an update counter, then the update is delivered to the backup node and applied with the corresponding update counter. For example, there are two transactions (A and B) that update partition X in the following scenario: - A updates key1 in partition X on the primary node and increments the counter to 10 - B updates key2 in partition X on the primary node and increments the counter to 11 - While A is still updating other keys, B is finally committed - The update of key2 arrives at the backup node and sets the update counter to 11 An observer will see equal update counters (11), but the update of key1 is still missing in the backup partition. This is a fundamental problem which is being solved here: https://issues.apache.org/jira/browse/IGNITE-10078 "Online verify" should operate with the new complex update counters which take such "update holes" into account. Otherwise, online verify may provide false-positive inconsistency reports. 2) Acquisition and comparison of update counters is fast, but partition hash calculation is long. We should check that the update counter remains unchanged after every K keys handled. 3) Another hope is that we'll be able to pause/continue the scan; for example, we'll check 1/3 of the partitions today, 1/3 tomorrow, and in three days we'll have checked the whole cluster. Totally makes sense. We may find ourselves in a situation where some "hot" partitions are still unprocessed, and every next attempt to calculate the partition hash fails due to another concurrent update. We should be able to track the progress of validation (% of calculation time wasted due to concurrent operations may be a good metric, 100% being the worst case) and provide an option to stop/pause the activity.
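The scenario in point 1 can be condensed into a few lines of code (a toy model with invented names, not Ignite's actual classes), showing why equal counters don't imply equal content:

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of point 1: equal update counters do not imply equal partition
 *  content. All names are invented for illustration. */
public class CounterRaceSketch {
    static class Partition {
        long updateCntr;                       // highest applied counter
        final Map<String, String> data = new HashMap<>();

        void apply(long cntr, String key, String val) {
            data.put(key, val);
            updateCntr = Math.max(updateCntr, cntr);
        }
    }

    public static void main(String[] args) {
        Partition primary = new Partition();
        Partition backup = new Partition();

        primary.apply(10, "key1", "a");        // tx A commits on the primary
        primary.apply(11, "key2", "b");        // tx B commits on the primary

        backup.apply(11, "key2", "b");         // only B has reached the backup

        // Counters are equal, yet key1 is missing on the backup.
        if (primary.updateCntr != backup.updateCntr)
            throw new AssertionError();
        if (primary.data.equals(backup.data))
            throw new AssertionError("contents must differ");
    }
}
```

A verifier comparing only the plain counters would report these two copies as consistent, which is exactly the hole the extended update counters of IGNITE-10078 are meant to close.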
I think pause should return an "intermediate results report" with information about which partitions have been successfully checked. With such a report, we can resume the activity later: partitions from the report will just be skipped. 4) Since "Idle verify" uses regular page memory, I assume it replaces hot data with persisted data. So, we have to warm up the cluster after each check. Are there any chances to check without cooling the cluster? I don't see an easy way to achieve it with our page memory architecture. We definitely can't just read pages from disk directly: we need to synchronize page access with concurrent update operations and checkpoints. From my point of view, the correct way to solve this issue is to improve our page replacement [1] mechanics by making it truly scan-resistant. P. S. There's another possible way of achieving online verify: instead of on-demand hash calculation, we can always keep an up-to-date hash value for every partition. We'll need to update the hash on every insert/update/remove operation, but there will be no reordering issues, since the function that we use for aggregating hash results (+) is commutative. With a pre-calculated partition hash value, we can automatically detect inconsistent partitions on every PME. What do you think? [1] - https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Durable+Memory+-+under+the+hood#IgniteDurableMemory-underthehood-Pagereplacement(rotationwithdisk) Best Regards, Ivan Rakov On 29.04.2019 12:20, Anton Vinogradov wrote: Igniters and especially Ivan Rakov, "Idle verify" [1] is a really cool tool to make sure that the cluster is consistent. 1) But it requires operations to be paused during the cluster check. On some clusters, this check takes hours (3-4 hours in cases I saw). I've checked the code of "idle verify" and it seems possible to make it "online" with some assumptions.
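The P.S. a few lines up relies on the aggregation function being commutative: per-entry hashes can be added on put and subtracted on remove, in any order, and the running total stays equal to a full recomputation. A minimal sketch (toy model, invented names):

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of an always-up-to-date partition hash: since + is commutative,
 *  entry hashes can be added/subtracted incrementally on every put/remove.
 *  Toy model with invented names, not Ignite's implementation. */
public class IncrementalPartHash {
    final Map<String, Integer> data = new HashMap<>();
    long partHash;                              // maintained incrementally

    static long entryHash(String key, int val) {
        return 31L * key.hashCode() + val;
    }

    void put(String key, int val) {
        Integer old = data.put(key, val);
        if (old != null)
            partHash -= entryHash(key, old);    // retract the replaced entry
        partHash += entryHash(key, val);
    }

    void remove(String key) {
        Integer old = data.remove(key);
        if (old != null)
            partHash -= entryHash(key, old);
    }

    long recompute() {                          // full scan, for comparison
        long h = 0;
        for (Map.Entry<String, Integer> e : data.entrySet())
            h += entryHash(e.getKey(), e.getValue());
        return h;
    }

    public static void main(String[] args) {
        IncrementalPartHash p = new IncrementalPartHash();
        p.put("a", 1); p.put("b", 2); p.put("a", 3); p.remove("b");
        if (p.partHash != p.recompute())
            throw new AssertionError();
    }
}
```

Because addition commutes, primary and backup arrive at the same hash even if they apply the same set of updates in different orders.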
Idea: Currently, "Idle verify" checks that partition hashes, generated this way:

while (it.hasNextX()) {
    CacheDataRow row = it.nextX();
    partHash += row.key().hashCode();
    partHash += Arrays.hashCode(row.value().valueBytes(grpCtx.cacheObjectContext()));
}

are the same. What if we generate the same updateCounter-partitionHash pairs but compare hashes only when the counters are the same? So, for example, we ask the cluster to generate pairs for 64 partitions, then find that 55 have the same counters (were not updated during the check) and check them. The rest (64 - 55 = 9) of the partitions will be re-requested and rechecked along with an additional 55. This way we'll be able to check that the cluster is consistent even while operations are in progress (just retrying the modified ones). Risks and assumptions: Using this strategy we'll check the cluster's consistency ... eventually, and the check will take more time even on an idle cluster. In case operationsPerTimeToGenerateP
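The retry rule above can be sketched as: hash the partition, then trust the result only if the update counter did not move during the scan. This is a hypothetical helper (invented names; the real code would iterate CacheDataRow as in the snippet above):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;
import java.util.function.LongSupplier;

/** Sketch of the proposed online check: the partition hash is trusted only if
 *  the update counter is unchanged after the scan (all names invented). */
public class OnlineVerifySketch {
    static OptionalLong hashIfStable(LongSupplier updateCntr,
                                     Map<Integer, byte[]> partition) {
        long before = updateCntr.getAsLong();

        long partHash = 0;
        for (Map.Entry<Integer, byte[]> row : partition.entrySet()) {
            partHash += row.getKey().hashCode();
            partHash += Arrays.hashCode(row.getValue());
        }

        // A moved counter means a concurrent update: this hash must be retried.
        return updateCntr.getAsLong() == before
            ? OptionalLong.of(partHash)
            : OptionalLong.empty();
    }

    public static void main(String[] args) {
        Map<Integer, byte[]> part = new HashMap<>();
        part.put(1, new byte[] {1, 2, 3});

        long[] cntr = {42};                    // stable during this scan
        if (!hashIfStable(() -> cntr[0], part).isPresent())
            throw new AssertionError();
    }
}
```

Pairs (updateCounter, partitionHash) from all owners can then be compared, re-requesting only the partitions whose counters moved, which matches the 64/55/9 retry scheme described above.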
Re: IgniteConfigVariationsAbstractTest subclasses do not run
Ivan P., Good catch, thanks. I was wrong, the test scenario is correct. The problem was in the atomicityMode() method - it could have returned null (which was okay for config generation, but wasn't expected in the test code). Please take a look at tx_out_test_fixed.patch (attached to IGNITE-11708). To sum it up, both issues should be fixed now. Best Regards, Ivan Rakov On 26.04.2019 14:40, Ivan Pavlukhin wrote: Ivan R., As I can see, IgniteCacheConfigVariationsFullApiTest#testGetOutTx does not expect lock/unlock events due to the line: if (atomicityMode() == ATOMIC) return lockEvtCnt.get() == 0; Could you please elaborate? On Fri, Apr 26, 2019 at 13:32, Ivan Rakov wrote: Ivan, Seems like IgniteCacheReadThroughEvictionSelfTest is broken. The test scenario assumes that even after expiration an entry will be present in the IgniteCache, because it will be loaded from the CacheStore. However, no CacheStore is specified in the node config. I've added a patch that enables the cache store factory, please check the IGNITE-11708 attachments. Regarding the IgniteCacheConfigVariationsFullApiTest#testGetOutTx* tests: from my point of view, the test scenarios make no sense. We perform a get() operation on ATOMIC caches and expect that entries will be locked. I don't understand why we should lock entries on an ATOMIC get, therefore I propose to remove the part of the code where we listen for and check EVT_CACHE_OBJECT_LOCKED/UNLOCKED events. Best Regards, Ivan Rakov On 17.04.2019 22:05, Ivan Rakov wrote: Hi Ivan, I've checked your branch. Seems like these tests fail due to a real issue in functionality. I'll take a look. Best Regards, Ivan Rakov On 17.04.2019 13:54, Ivan Fedotov wrote: Hi, Igniters! While working on IEP-30 [1] I discovered that IgniteConfigVariationsAbstractTest subclasses - about 15_000 tests [2] - do not work. You can check it by just running one of the tests with log output, for example ConfigVariationsTestSuiteBuilderTest#LegacyLifecycleTest#test1 [3]. There is no warning notification in the console.
The same situation occurs with other IgniteConfigVariationsAbstractTest subclasses - the tests run, but they simply represent empty code. So, I created a ticket for this issue [4], and it turned out that the problem is with the ruleChain in IgniteConfigVariationsAbstractTest [5]. The rule that is responsible for running a test statement does not actually start [6] under the ruleChain's runRule. I suggested a solution - move the testsCfg initialization to the IgniteConfigVariationsAbstractTest#beforeTestsStarted method. After such a change, the ruleChain becomes unnecessary. But I faced another problem - multiple failures on TeamCity [7]. From the logs, it seems that the failures are related to what the tests check, not to a JUnit error. I cannot tell from the TeamCity history whether these tests failed or not on the previous JUnit version - the oldest log is dated to the start of March, when JUnit 4 was already in place (for example, this [8] test). Moreover, there are not that many failing tests, but because they run with multiple configurations (InterceptorCacheConfigVariationsFullApiTestSuite_0 ..._95) it comes to about 400 failed tests. The TeamCity results also confirm that the tests do not work in the master branch - the duration is less than 1 ms. Now all tests are green, and that is not surprising - under the @Test annotation, nothing happens. Could someone confirm or disprove my guess that the tests are red because of their functionality, and not because of the JUnit implementation? And if it is true, how should I take that into account in this ticket?
[1] https://cwiki.apache.org/confluence/display/IGNITE/IEP-30%3A+Migration+to+JUnit+5 [2] https://github.com/apache/ignite/blob/master/modules/core/src/test/java/org/apache/ignite/testsuites/InterceptorCacheConfigVariationsFullApiTestSuite.java [3] https://github.com/apache/ignite/blob/master/modules/core/src/test/java/org/apache/ignite/testframework/test/ConfigVariationsTestSuiteBuilderTest.java#L434 [4] https://issues.apache.org/jira/browse/IGNITE-11708 [5] https://github.com/apache/ignite/blob/master/modules/core/src/test/java/org/apache/ignite/testframework/junits/IgniteConfigVariationsAbstractTest.java#L62 [6] https://github.com/apache/ignite/blob/master/modules/core/src/test/java/org/apache/ignite/testframework/junits/GridAbstractTest.java#L181 [7] https://mtcga.gridgain.com/pr.html?serverId=apache&suiteId=IgniteTests24Java8_RunAll&branchForTc=pull/6434/head&action=Latest [8] https://ci.ignite.apache.org/project.html?tab=testDetails&projectId=IgniteTests24Java8&testNameId=-9037806478172035481&page=8
[jira] [Created] (IGNITE-11807) Index validation control.sh command may provide false-positive error results
Ivan Rakov created IGNITE-11807: --- Summary: Index validation control.sh command may provide false-positive error results Key: IGNITE-11807 URL: https://issues.apache.org/jira/browse/IGNITE-11807 Project: Ignite Issue Type: Bug Reporter: Ivan Rakov Fix For: 2.8 There are two possible issues in the validate_indexes command: 1. In case index validation is performed under load, there's a chance that we'll fetch a link from the B+ tree and won't find the key in the partition cache data store, because it was concurrently removed. We may work around this by double-checking partition update counters (before and after the index validation procedure). 2. Since index validation is subscribed to checkpoint start (reason: we perform CRC validation of file page store pages, which is sensitive to concurrent disk page writes), we may bump into the following situation: - The user fairly stops all load - A few moments later, the user triggers validate_indexes - A checkpoint starts due to timeout; pages that were modified before the validate_indexes start are being written to disk - validate_indexes fails We may work around this by forcibly triggering a checkpoint before the start of the index validation activities. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11769) Investigate JVM crash in PDS Direct IO TeamCity suites
Ivan Rakov created IGNITE-11769: --- Summary: Investigate JVM crash in PDS Direct IO TeamCity suites Key: IGNITE-11769 URL: https://issues.apache.org/jira/browse/IGNITE-11769 Project: Ignite Issue Type: Bug Reporter: Ivan Rakov Fix For: 2.8 Both PDS Direct IO suites periodically fail with a JVM crash. The issue can be reproduced on a Linux machine by running IgnitePdsWithTtlTest#testTtlIsAppliedAfterRestart using the ignite-direct-io classpath. The investigation is complicated because the JVM crash report is *not generated* during this crash. After some point, the JVM stays dormant for 2 minutes and then the process gets killed by an OS signal {code:java} Process finished with exit code 134 (interrupted by signal 6: SIGABRT) {code} and the following error messages can be dumped to stderr before the process dies: {code:java} `corrupted double-linked list` `free(): corrupted unsorted chunks` {code} which appear to be libc error messages. Seems like Ignite corrupts virtual memory in a sophisticated way which prevents the normal JVM crash flow. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11762) Test testClientStartCloseServersRestart causes hang of the whole Cache 2 suite in master
Ivan Rakov created IGNITE-11762: --- Summary: Test testClientStartCloseServersRestart causes hang of the whole Cache 2 suite in master Key: IGNITE-11762 URL: https://issues.apache.org/jira/browse/IGNITE-11762 Project: Ignite Issue Type: Bug Reporter: Ivan Rakov Assignee: Pavel Kovalenko Fix For: 2.8 Attempt to restart server node in test hangs: {code:java} [2019-04-16 19:56:45,049][WARN ][restart-1][GridCachePartitionExchangeManager] Failed to wait for initial partition map exchange. Possible reasons are: ^-- Transactions in deadlock. ^-- Long running transactions (ignore if this is the case). ^-- Unreleased explicit locks. {code} The reason is that previous PME (late affinity assignment) still hangs due to pending transaction: {code:java} [2019-04-16 19:56:23,717][WARN ][exchange-worker-#1039%cache.IgniteClientCacheStartFailoverTest3%][diagnostic] Pending transactions: [2019-04-16 19:56:23,718][WARN ][exchange-worker-#1039%cache.IgniteClientCacheStartFailoverTest3%][diagnostic] >>> [txVer=AffinityTopologyVersion [topVer=11, minorTopVer=0], exchWait=true, tx=GridDhtTxLocal [nearNodeId=8559bfe0-3d4a-4090-a457-6df0eba5, nearFutId=1edc7172a61-941f9dde-2b60-4a1f-8213-7d23d738bf33, nearMiniId=1, nearFinFutId=null, nearFinMiniId=0, nearXidVer=GridCacheVersion [topVer=166913752, order=1555433759036, nodeOrder=6], lb=null, super=GridDhtTxLocalAdapter [nearOnOriginatingNode=false, nearNodes=KeySetView [], dhtNodes=KeySetView [9ef33532-0e4a-4561-b57e-042afe10], explicitLock=false, super=IgniteTxLocalAdapter [completedBase=null, sndTransformedVals=false, depEnabled=false, txState=IgniteTxStateImpl [activeCacheIds=[-1062368467], recovery=false, mvccEnabled=true, mvccCachingCacheIds=[], txMap=HashSet []], super=IgniteTxAdapter [xidVer=GridCacheVersion [topVer=166913752, order=1555433759045, nodeOrder=10], writeVer=null, implicit=false, loc=true, threadId=1210, startTime=1555433762847, nodeId=0088e9b8-f859-4d14-8071-6388e473, startVer=GridCacheVersion [topVer=166913752, 
order=1555433759045, nodeOrder=10], endVer=null, isolation=REPEATABLE_READ, concurrency=PESSIMISTIC, timeout=0, sysInvalidate=false, sys=false, plc=2, commitVer=GridCacheVersion [topVer=166913752, order=1555433759045, nodeOrder=10], finalizing=NONE, invalidParts=null, state=MARKED_ROLLBACK, timedOut=false, topVer=AffinityTopologyVersion [topVer=11, minorTopVer=0], mvccSnapshot=MvccSnapshotResponse [futId=292, crdVer=1555433741506, cntr=395, opCntr=1, txs=[394], cleanupVer=390, tracking=0], skipCompletedVers=false, parentTx=null, duration=20866ms, onePhaseCommit=false], size=0 {code} However, load threads don't start any explicit transactions: they either hang on put()/get() or on clientCache.close(). Rolling back IGNITE-10799 resolves the issue (however, test remains flaky with ~10% fail rate due to unhandled TransactionSerializationException). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11747) Document --tx control script commands
Ivan Rakov created IGNITE-11747: --- Summary: Document --tx control script commands Key: IGNITE-11747 URL: https://issues.apache.org/jira/browse/IGNITE-11747 Project: Ignite Issue Type: Task Components: documentation Reporter: Ivan Rakov Along with the consistency check utilities, the ./control.sh script has a --tx command which allows displaying info about active transactions and even killing hanging transactions directly. ./control.sh provides just a brief description of the options: {code:java} List or kill transactions: control.sh --tx [--xid XID] [--min-duration SECONDS] [--min-size SIZE] [--label PATTERN_REGEX] [--servers|--clients] [--nodes consistentId1[,consistentId2,,consistentIdN]] [--limit NUMBER] [--order DURATION|SIZE|START_TIME] [--kill] [--info] [--yes] {code} We should document the possible use cases and options of the command, possibly somewhere close to [https://apacheignite-tools.readme.io/docs/control-script] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11735) Safely handle new closures of IGNITE-11392 in mixed cluster environment
Ivan Rakov created IGNITE-11735: --- Summary: Safely handle new closures of IGNITE-11392 in mixed cluster environment Key: IGNITE-11735 URL: https://issues.apache.org/jira/browse/IGNITE-11735 Project: Ignite Issue Type: Improvement Reporter: Ivan Rakov Assignee: Denis Chudov Fix For: 2.8 Under IGNITE-11392 we have added two new closures (FetchActiveTxOwnerTraceClosure and TxOwnerDumpRequestAllowedSettingClosure). If we assemble a mixed cluster (some nodes contain the patch, some don't), we may run into a situation where these closures are sent to a node that doesn't have the corresponding classes on its classpath. Normally, the closure would be deployed to the "old" node via peer-to-peer class deployment. However, p2p may be disabled in the configuration, which will cause a ClassNotFoundException on the "old" node. We should register the IGNITE-11392 feature in IgniteFeatures (recent example: IGNITE-11598) and filter out nodes that don't support the new feature before sending the compute task.
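A minimal sketch of the proposed guard (class and feature names here are hypothetical stand-ins, not the real Ignite API): before broadcasting a closure, keep only the nodes that advertise support for the feature, so an "old" node with p2p deployment disabled never receives a class it cannot load.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FeatureFilterDemo {
    // Hypothetical feature id for the IGNITE-11392 closures.
    static final String TX_OWNER_DUMP = "TX_OWNER_DUMP";

    static class Node {
        final String consistentId;
        final Set<String> supportedFeatures;

        Node(String consistentId, String... features) {
            this.consistentId = consistentId;
            this.supportedFeatures = new HashSet<>(Arrays.asList(features));
        }
    }

    /** Returns the subset of cluster nodes that support the given feature. */
    static List<Node> nodesSupporting(List<Node> cluster, String feature) {
        List<Node> res = new ArrayList<>();

        for (Node n : cluster) {
            if (n.supportedFeatures.contains(feature))
                res.add(n);
        }

        return res;
    }

    public static void main(String[] args) {
        List<Node> cluster = Arrays.asList(
            new Node("patched-node", TX_OWNER_DUMP),
            new Node("old-node")); // no feature flag: must be filtered out

        // Only the patched node would receive the compute task.
        System.out.println(nodesSupporting(cluster, TX_OWNER_DUMP).size() + " target(s)");
    }
}
```

In real code the feature set would come from a node attribute exchanged on join, which is the same mechanism IgniteFeatures relies on.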
[jira] [Created] (IGNITE-11591) Add info about lock candidates that are ahead in queue to transaction timeout error message
Ivan Rakov created IGNITE-11591: --- Summary: Add info about lock candidates that are ahead in queue to transaction timeout error message Key: IGNITE-11591 URL: https://issues.apache.org/jira/browse/IGNITE-11591 Project: Ignite Issue Type: Improvement Reporter: Ivan Rakov Fix For: 2.8 If transaction is timed out due to lock acquisition failure, corresponding error will show up in server log on DHT node: {code:java} [2019-03-20 21:13:10,831][ERROR][grid-timeout-worker-#23%transactions.TxRollbackOnTimeoutTest0%][GridDhtColocatedCache] Failed to acquire lock for request: GridNearLockRequest [topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], miniId=1, dhtVers=GridCacheVersion[] [null], subjId=651a30e1-45ac-4b35-86d2-028d1f81d8dc, taskNameHash=0, createTtl=-1, accessTtl=-1, flags=6, txLbl=null, filter=null, super=GridDistributedLockRequest [nodeId=651a30e1-45ac-4b35-86d2-028d1f81d8dc, nearXidVer=GridCacheVersion [topVer=164585585, order=1553105588524, nodeOrder=4], threadId=262, futId=5967e4c9961-d32ea2a6-1789-47d7-bdbf-aa66e6d8c35b, timeout=890, isInTx=true, isInvalidate=false, isRead=false, isolation=REPEATABLE_READ, retVals=[false], txSize=2, flags=0, keysCnt=1, super=GridDistributedBaseMessage [ver=GridCacheVersion [topVer=164585585, order=1553105588524, nodeOrder=4], committedVers=null, rolledbackVers=null, cnt=0, super=GridCacheIdMessage [cacheId=3556498 class org.apache.ignite.internal.transactions.IgniteTxTimeoutCheckedException: Failed to acquire lock within provided timeout for transaction [timeout=890, tx=GridDhtTxLocal[xid=f219e4c9961--09cf-6071--0001, xidVersion=GridCacheVersion [topVer=164585585, order=1553105588527, nodeOrder=1], concurrency=PESSIMISTIC, isolation=REPEATABLE_READ, state=MARKED_ROLLBACK, invalidate=false, rollbackOnly=true, nodeId=c7dccddb-dee1-4499-94b1-03896350, timeout=890, duration=891]] at org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter$PostLockClosure1.apply(IgniteTxLocalAdapter.java:1766) at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter$PostLockClosure1.apply(IgniteTxLocalAdapter.java:1714) at org.apache.ignite.internal.util.future.GridEmbeddedFuture$2.applyx(GridEmbeddedFuture.java:86) at org.apache.ignite.internal.util.future.GridEmbeddedFuture$AsyncListener1.apply(GridEmbeddedFuture.java:292) at org.apache.ignite.internal.util.future.GridEmbeddedFuture$AsyncListener1.apply(GridEmbeddedFuture.java:285) at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:399) at org.apache.ignite.internal.util.future.GridFutureAdapter.unblock(GridFutureAdapter.java:347) at org.apache.ignite.internal.util.future.GridFutureAdapter.unblockAll(GridFutureAdapter.java:335) at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:511) at org.apache.ignite.internal.processors.cache.GridCacheCompoundIdentityFuture.onDone(GridCacheCompoundIdentityFuture.java:56) at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:490) at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.onComplete(GridDhtLockFuture.java:793) at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.access$900(GridDhtLockFuture.java:89) at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture$LockTimeoutObject.onTimeout(GridDhtLockFuture.java:1189) at org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor$TimeoutWorker.body(GridTimeoutProcessor.java:234) at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) at java.lang.Thread.run(Thread.j {code} It would be much more useful if this message also contained information about the transaction that actually owns the corresponding lock (or about all transactions that are ahead in the queue, if there are several).
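A rough sketch of the proposed enrichment (simplified stand-in types, not Ignite internals): the timeout error text additionally lists the lock candidates that are ahead of the failed transaction in the queue.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

public class LockTimeoutMsgDemo {
    // Simplified stand-in for a lock candidate ahead in the queue.
    static class TxInfo {
        final String xid;
        final long durationMs;

        TxInfo(String xid, long durationMs) {
            this.xid = xid;
            this.durationMs = durationMs;
        }
    }

    /** Builds the timeout message including the candidates ahead in the lock queue. */
    static String timeoutMessage(long timeoutMs, List<TxInfo> ahead) {
        StringJoiner queue = new StringJoiner("; ", "[", "]");

        for (TxInfo tx : ahead)
            queue.add("xid=" + tx.xid + ", duration=" + tx.durationMs + "ms");

        return "Failed to acquire lock within provided timeout for transaction"
            + " [timeout=" + timeoutMs + ", candidatesAhead=" + queue + ']';
    }

    public static void main(String[] args) {
        // One hypothetical transaction holds the lock ahead of us.
        String msg = timeoutMessage(890, Arrays.asList(new TxInfo("aaa-1", 1200)));
        System.out.println(msg);
    }
}
```
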
[jira] [Created] (IGNITE-11484) Get rid of ForkJoinPool#commonPool usage for system-critical tasks
Ivan Rakov created IGNITE-11484: --- Summary: Get rid of ForkJoinPool#commonPool usage for system-critical tasks Key: IGNITE-11484 URL: https://issues.apache.org/jira/browse/IGNITE-11484 Project: Ignite Issue Type: Bug Reporter: Ivan Rakov Assignee: Ivan Rakov Fix For: 2.8 We use ForkJoinPool#commonPool for sorting checkpoint pages. This may backfire if the common pool is already utilized by other code in the current JVM: a checkpoint may wait for sorting for a long time, which in turn will cause a drop in user load throughput.
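A sketch of the fix direction (pool sizing and names are illustrative, not the actual Ignite change): run the parallel sort inside a dedicated ForkJoinPool instead of the shared commonPool. Fork/join subtasks execute in the pool of the thread that invokes them, so submitting the sort through a private pool keeps checkpoint page sorting isolated from other commonPool users.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.ForkJoinPool;

public class CheckpointSortDemo {
    // Dedicated pool: checkpoint sorting never competes with commonPool work.
    private static final ForkJoinPool SORT_POOL =
        new ForkJoinPool(Math.max(1, Runtime.getRuntime().availableProcessors() / 2));

    /** Sorts page ids inside the dedicated pool, leaving the input untouched. */
    static long[] sortPages(long[] pageIds) {
        try {
            return SORT_POOL.submit(() -> {
                long[] copy = pageIds.clone();
                Arrays.parallelSort(copy); // forked subtasks stay in SORT_POOL
                return copy;
            }).get();
        }
        catch (Exception e) {
            throw new RuntimeException("Checkpoint page sorting failed", e);
        }
    }

    public static void main(String[] args) {
        long[] ids = new Random(42).longs(100_000).toArray();
        long[] sorted = sortPages(ids);

        for (int i = 1; i < sorted.length; i++)
            if (sorted[i - 1] > sorted[i])
                throw new AssertionError("not sorted at " + i);

        System.out.println("sorted " + sorted.length + " page ids");
        SORT_POOL.shutdown();
    }
}
```
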
Re: Please re-commit 3 last changes in the master
Thanks for keeping track of it, I've re-applied the following commits: IGNITE-11199 Add extra logging for client-server connections in TCP discovery - Fixes #6048. Andrey Kalinin* 04.03.2019 2:11 IGNITE-11322 [USABILITY] Extend Node FAILED message by add consistentId if it exist - Fixes #6180. Andrey Kalinin* 04.03.2019 2:03 Best Regards, Ivan Rakov On 04.03.2019 13:56, Dmitriy Pavlov wrote: Thanks to Alexey Plehanov for noticing and Infra Team for fixing the issue: https://issues.apache.org/jira/browse/INFRA-17950 On Mon, Mar 4, 2019 at 13:53, Dmitriy Pavlov wrote: Hi Developers, Because of the sync issue, the following 3 commits were lost. Please re-apply them to master. https://gitbox.apache.org/repos/asf?p=ignite.git;a=commit;h=b26bbb29d5fdd9d4de5187042778ebe3b8c6c42e https://gitbox.apache.org/repos/asf?p=ignite.git;a=commit;h=6c562a997c0beb3a3cd9dd2976e016759a808f0c https://gitbox.apache.org/repos/asf?p=ignite.git;a=commit;h=45c4dc98e0eac33cccd2e24acb3e9882f098cad1 Sorry for the inconvenience. Sincerely, Dmitriy Pavlov
[jira] [Created] (IGNITE-11465) Multiple client leave/join events may wipe affinity assignment history and cause transactions to fail
Ivan Rakov created IGNITE-11465: --- Summary: Multiple client leave/join events may wipe affinity assignment history and cause transactions to fail Key: IGNITE-11465 URL: https://issues.apache.org/jira/browse/IGNITE-11465 Project: Ignite Issue Type: Bug Reporter: Ivan Rakov Assignee: Ivan Rakov Fix For: 2.8 We keep a history of GridAffinityAssignmentCache#MAX_HIST_SIZE affinity assignments; however, a flood of client joins/leaves may wipe it out entirely and cause a failure or hang of a transaction that was started before the flood: {code:java} if (cache == null || cache.topologyVersion().compareTo(topVer) > 0) { throw new IllegalStateException("Getting affinity for topology version earlier than affinity is " + "calculated [locNode=" + ctx.discovery().localNode() + ", grp=" + cacheOrGrpName + ", topVer=" + topVer + ", head=" + head.get().topologyVersion() + ", history=" + affCache.keySet() + ']'); } {code} The history is limited in order to prevent JVM heap overflow. At the same time, only "server event" affinity assignments are heavy: "client event" assignments are just shallow copies of "server event" assignments. I suggest limiting the history by the number of "server event" assignments. Also, considering the provided fix, I don't see any need to keep 500 items in history; I changed the history size to 40.
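The proposed policy can be sketched as follows (simplified stand-in types, not Ignite classes): eviction is triggered only by the count of heavyweight "server event" assignments, so a flood of cheap "client event" entries can never wipe the history out.

```java
import java.util.TreeMap;

public class AffinityHistoryDemo {
    // Proposed cap on heavyweight entries, instead of 500 entries of any kind.
    static final int MAX_SERVER_EVT_HIST = 40;

    static class Entry {
        final boolean serverEvent; // heavy full assignment vs. cheap shallow copy

        Entry(boolean serverEvent) { this.serverEvent = serverEvent; }
    }

    final TreeMap<Long, Entry> hist = new TreeMap<>(); // topVer -> assignment
    int serverEvtCnt;

    void put(long topVer, Entry e) {
        hist.put(topVer, e);

        if (e.serverEvent && ++serverEvtCnt > MAX_SERVER_EVT_HIST) {
            // Evict oldest entries up to and including the oldest server-event
            // one; client entries older than it are useless shallow copies.
            while (!hist.isEmpty()) {
                Entry oldest = hist.pollFirstEntry().getValue();

                if (oldest.serverEvent) {
                    serverEvtCnt--;

                    break;
                }
            }
        }
    }
}
```

With this policy, a client join/leave flood only appends lightweight entries, and a transaction started before the flood can still resolve its affinity topology version.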