Re: How to free up space on disc after removing entries from IgniteCache with enabled PDS?

2019-10-07 Thread Anton Vinogradov
Denis,

I like the idea that defragmentation is just an additional step on a node
(re)start, like we perform PDS recovery now.
We may just use a special key to specify that a node should defragment its
persistence on (re)start.
Defragmentation can be part of a Rolling Upgrade in this case :)
Restarting nodes one-by-one does not seem to be a problem; it will "eat"
only one backup guarantee.
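
For illustration, the "special key" could be as simple as a JVM system
property checked during startup. A minimal sketch, assuming a hypothetical
IGNITE_DEFRAGMENT_ON_START property and defragmentation hook (neither is an
existing Ignite API):

public class DefragOnStartSketch {
    public static void main(String[] args) {
        // Hypothetical marker key: pass -DIGNITE_DEFRAGMENT_ON_START=true.
        if (Boolean.getBoolean("IGNITE_DEFRAGMENT_ON_START")) {
            // Would run before the node joins the topology, in the same
            // phase where PDS recovery runs today, so no concurrent load.
            System.out.println("Defragmenting persistence files before join...");
        }

        // Regular node startup (e.g. Ignition.start(cfg)) would follow here.
    }
}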

On Mon, Oct 7, 2019 at 8:28 PM Denis Magda  wrote:

> Alex, thanks for the summary and proposal. Anton, Ivan and others who took
> part in this discussion, what're your thoughts? I see this
> rolling-upgrades-based approach as a reasonable solution. Even though a
> node shutdown is expected, the procedure doesn't lead to a cluster outage,
> meaning it can be utilized in 24x7 production environments.
>
> -
> Denis
>
>
> On Mon, Oct 7, 2019 at 1:35 AM Alexey Goncharuk <
> alexey.goncha...@gmail.com>
> wrote:
>
> > Created a ticket for the first stage of this improvement. This can be a
> > first change towards the online mode suggested by Sergey and Anton.
> > https://issues.apache.org/jira/browse/IGNITE-12263
> >
> > Fri, Oct 4, 2019 at 19:38, Alexey Goncharuk  >:
> >
> > > Maxim,
> > >
> > > Having a cluster-wide lock for a cache does not improve availability of
> > > the solution. A user cannot defragment a cache if the cache is involved
> > > in a mission-critical operation, so having a lock on such a cache is
> > > equivalent to the whole cluster shutdown.
> > >
> > > We should decide between either a single offline node or a more complex
> > > fully online solution.
> > >
> > Fri, Oct 4, 2019 at 11:55, Maxim Muzafarov :
> > >
> > >> Igniters,
> > >>
> > >> This thread seems to be endless, but what if some kind of cache group
> > >> distributed write lock (exclusive for some of the internal Ignite
> > >> processes) is introduced? I think it will help to solve a batch of
> > >> problems, like:
> > >>
> > >> 1. defragmentation of all cache group partitions on the local node
> > >> without concurrent updates.
> > >> 2. improved data loading with data streamer isolation mode [1]. It
> > >> seems we should not allow concurrent updates to a cache while we are
> > >> on the `fast data load` step.
> > >> 3. recovery from a snapshot without cache stop/start actions
> > >>
> > >>
> > >> [1] https://issues.apache.org/jira/browse/IGNITE-11793
> > >>
> > >> On Thu, 3 Oct 2019 at 22:50, Sergey Kozlov 
> > wrote:
> > >> >
> > >> > Hi
> > >> >
> > >> > I'm not sure that taking a node offline is the best way to do that.
> > >> > Cons:
> > >> >  - different caches may have different fragmentation, but we force a
> > >> > stop of the whole node
> > >> >  - taking a node offline is a maintenance operation that will require
> > >> > adding +1 backup to reduce the risk of data loss
> > >> >  - baseline auto adjustment?
> > >> >  - impact on index rebuild?
> > >> >  - cache configuration changes (or destroy) while the node is offline
> > >> >
> > >> > What about other ways without a node stop? E.g. take a cache group on
> > >> > a node offline? Add a *defrag  *command to control.sh to force a
> > >> > rebalance internally in the node, with the expected impact on
> > >> > performance.
> > >> >
> > >> >
> > >> >
> > >> > On Thu, Oct 3, 2019 at 12:08 PM Anton Vinogradov 
> > wrote:
> > >> >
> > >> > > Alexey,
> > >> > > As for me, it does not matter whether it will be an IEP, an
> > >> > > umbrella, or a single issue.
> > >> > > The most important thing is the Assignee :)
> > >> > >
> > >> > > On Thu, Oct 3, 2019 at 11:59 AM Alexey Goncharuk <
> > >> > > alexey.goncha...@gmail.com>
> > >> > > wrote:
> > >> > >
> > >> > > > Anton, do you think we should file a single ticket for this or
> > >> > > > should we go with an IEP? As of now, the change does not look
> > >> > > > big enough for an IEP for me.
> > >> > > >
> > >> > > > Thu, Oct 3, 2019 at 11:18, Anton Vinogradov :
> > >> > > >
> > >> > > > > Alexey,
> > >> > > > >
> > >> > > > > Sounds good to me.
> > >> > > > >
> > >> > > > > On Thu, Oct 3, 2019 at 10:51 AM Alexey Goncharuk <
> > >> > > > > alexey.goncha...@gmail.com>
> > >> > > > > wrote:
> > >> > > > >
> > >> > > > > > Anton,
> > >> > > > > >
> > >> > > > > > Switching a partition to and from the SHRINKING state will
> > >> > > > > > require intricate synchronizations in order to properly
> > >> > > > > > determine the start position for historical rebalance without
> > >> > > > > > PME.
> > >> > > > > >
> > >> > > > > > I would still go with an offline-node approach, but instead of
> > >> > > > > > cleaning the persistence, we can do effective defragmentation
> > >> > > > > > when the node is offline because we are sure that there is no
> > >> > > > > > concurrent load. After the defragmentation completes, we bring
> > >> > > > > > the node back to the cluster and historical rebalance will
> > >> > > > > > kick in automatically. It will still require manual node
> > >> > > > > > restarts, but since th

Re: Enabling the checkstyle profile on Build Apache Ignite suite (test period)

2019-10-07 Thread Ivan Pavlukhin
Let's give it a try.

Mon, Oct 7, 2019 at 13:21, Nikolay Izhikov :
>
> +1
>
> On Mon, 07/10/2019 at 13:18 +0300, Maxim Muzafarov wrote:
> > Igniters,
> >
> >
> > On October 11 (Friday), 22-00 MSK, I'm planning to enable the
> > `checkstyle` profile on the [Build Apache Ignite] suite for a one-week
> > test period ending the next weekend. Such an option has been discussed
> > many times before (e.g. [1]).
> >
> > Here are the reasons:
> >
> > - any code style violation in a PR leads to source code fixes which in
> > turn require a re-run of other test suites, so it is better to fail
> > fast;
> > - each new Run:All suite (e.g. for a new module) must contain a
> > checkstyle suite to check code style by default, so it is better to
> > include mandatory checks in the Build Apache Ignite procedure;
> > - the `fail fast` paradigm will eliminate all the checkstyle violations,
> > which currently happen from time to time;
> >
> > The ability to create a prototype PR without code style checks still
> > exists. You can disable the `checkstyle` profile for such PRs in your
> > local branches.
> >
> >
> > Any objections?
> >
> >
> > [1] 
> > http://apache-ignite-developers.2346864.n4.nabble.com/Code-inspection-td27709i80.html#a41297



-- 
Best regards,
Ivan Pavlukhin


Re: Adding experimental support for Intel Optane DC Persistent Memory

2019-10-07 Thread Alexey Goncharuk
Igniters,

I would like to resurrect this discussion and will review the change again
shortly. If anyone wants to join the review, you are welcome!

Wed, Aug 22, 2018 at 18:49, Denis Magda :

> Hi Dmitry,
>
> That's a BSD-3-Clause license, judging by the statement
> "SPDX-License-Identifier: BSD-3-Clause":
> https://github.com/pmem/llpl/blob/master/LICENSE
>
> This license can be used with ASF software:
> https://www.apache.org/legal/resolved.html#category-a
>
> --
> Denis
>
> On Wed, Aug 22, 2018 at 9:28 AM Dmitriy Pavlov 
> wrote:
>
> > Hi Denis,
> >
> > Could you please double-check whether we can refer to a library licensed
> > to Intel? Can we develop a code-only version of this support (without
> > shipping it in a release)?
> >
> > https://github.com/apache/ignite/pull/4381 is quite a huge change (128
> > files changed), and the patch review will require resources from
> > community members. I would like to be sure we can include this patch
> > from the legal point of view.
> >
> > Sincerely,
> > Dmitriy Pavlov
> >
> > Fri, Aug 3, 2018 at 19:23, Dmitriy Pavlov :
> >
> >> Hi Mulugeta,
> >>
> >> I apologise, I missed that the license is already there. I guess it is
> >> not a standard open-source license; it seems it is not listed in
> >> https://www.apache.org/legal/resolved.html#category-a
> >>
> >> So there can be a legal concern related to including this lib as a
> >> dependency in an Apache product. It should not block the review; we can
> >> later consult Secretary/Legal to find out how we can correctly include a
> >> reference to the lib.
> >>
> >> Sincerely,
> >> Dmitriy Pavlov
> >>
> >> Thu, Aug 2, 2018 at 0:24, Mammo, Mulugeta :
> >>
> >>> Hi Dmitriy,
> >>>
> >>> Do you mean our LLPL library? It has a license, please look here:
> >>> https://github.com/pmem/llpl
> >>>
> >>> Regarding the changes made to Ignite, you may refer to the pull request
> >>> here: https://github.com/apache/ignite/pull/4381
> >>>
> >>> Thanks,
> >>> Mulugeta
> >>>
> >>> -Original Message-
> >>> From: Dmitriy Pavlov [mailto:dpavlov@gmail.com]
> >>> Sent: Wednesday, August 1, 2018 10:49 AM
> >>> To: dev@ignite.apache.org
> >>> Subject: Re: Adding experimental support for Intel Optane DC Persistent
> >>> Memory
> >>>
> >>> Hi Mulugeta Mammo,
> >>>
> >>> I've just noticed that the repository you refer to is a full fork of
> >>> Ignite. How can I see the differences from the original Ignite?
> >>>
> >>> One more thing: the library you're referencing seems to not contain a
> >>> license; at least GitHub cannot parse it. An Apache product has
> >>> limitations on which libraries may be used (see
> >>> https://www.apache.org/legal/resolved.html#category-a and
> >>> https://www.apache.org/legal/resolved.html#category-b)
> >>>
> >>> Could you please comment on whether there is some legal risk?
> >>>
> >>> Sincerely,
> >>> Dmitriy Pavlov
> >>>
> >>> Wed, Aug 1, 2018 at 20:43, Dmitriy Pavlov :
> >>>
> >>> > Hi,
> >>> >
> >>> > This link works for me
> >>> >
> >>> >
> >>> > https://cwiki.apache.org/confluence/display/IGNITE/IEP-26%3A+Adding+Experimental+Support+for+Intel+Optane+DC+Persistent+Memory
> >>> >
> >>> > Sincerely,
> >>> > Dmitriy Pavlov
> >>> >
> >>> > Thu, Jul 26, 2018 at 15:31, Stanislav Lukyanov <
> >>> stanlukya...@gmail.com>:
> >>> >
> >>> >> Ah, ok, it’s just the ‘.’ at the end of the link. Removed it and
> >>> >> it’s fine.
> >>> >>
> >>> >> From: Stanislav Lukyanov
> >>> >> Sent: July 26, 2018, 15:12
> >>> >> To: dev@ignite.apache.org
> >>> >> Subject: RE: Adding experimental support for Intel Optane DC
> >>> >> Persistent Memory
> >>> >>
> >>> >> Hi,
> >>> >>
> >>> >> The link you’ve shared gives me 404.
> >>> >> Perhaps you need to add a permission for everyone to access the
> >>> >> page?
> >>> >>
> >>> >> Thanks,
> >>> >> Stan
> >>> >>
> >>> >> From: Mammo, Mulugeta
> >>> >> Sent: July 26, 2018, 2:44
> >>> >> To: dev@ignite.apache.org
> >>> >> Subject: Adding experimental support for Intel Optane DC Persistent
> >>> >> Memory
> >>> >>
> >>> >> Hi,
> >>> >>
> >>> >> I have added a new proposal to support Intel Optane DC Persistent
> >>> >> Memory for Ignite here:
> >>> >>
> >>> >> https://cwiki.apache.org/confluence/display/IGNITE/Adding+Experimental+Support+for+Intel+Optane+DC+Persistent+Memory
> >>> >> .
> >>> >>
> >>> >> I'm looking forward to your feedback and collaboration on this.
> >>> >>
> >>> >> Thanks,
> >>> >> Mulugeta
> >>> >>
> >>> >>
> >>> >>
> >>> >>
> >>>
> >>
>


Re: How to free up space on disc after removing entries from IgniteCache with enabled PDS?

2019-10-07 Thread Denis Magda
Alex, thanks for the summary and proposal. Anton, Ivan and others who took
part in this discussion, what're your thoughts? I see this
rolling-upgrades-based approach as a reasonable solution. Even though a
node shutdown is expected, the procedure doesn't lead to a cluster outage,
meaning it can be utilized in 24x7 production environments.

-
Denis


On Mon, Oct 7, 2019 at 1:35 AM Alexey Goncharuk 
wrote:

> Created a ticket for the first stage of this improvement. This can be a
> first change towards the online mode suggested by Sergey and Anton.
> https://issues.apache.org/jira/browse/IGNITE-12263
>
> Fri, Oct 4, 2019 at 19:38, Alexey Goncharuk :
>
> > Maxim,
> >
> > Having a cluster-wide lock for a cache does not improve availability of
> > the solution. A user cannot defragment a cache if the cache is involved
> > in a mission-critical operation, so having a lock on such a cache is
> > equivalent to the whole cluster shutdown.
> >
> > We should decide between either a single offline node or a more complex
> > fully online solution.
> >
> > Fri, Oct 4, 2019 at 11:55, Maxim Muzafarov :
> >
> >> Igniters,
> >>
> >> This thread seems to be endless, but what if some kind of cache group
> >> distributed write lock (exclusive for some of the internal Ignite
> >> processes) is introduced? I think it will help to solve a batch of
> >> problems, like:
> >>
> >> 1. defragmentation of all cache group partitions on the local node
> >> without concurrent updates.
> >> 2. improved data loading with data streamer isolation mode [1]. It
> >> seems we should not allow concurrent updates to a cache while we are
> >> on the `fast data load` step.
> >> 3. recovery from a snapshot without cache stop/start actions
> >>
> >>
> >> [1] https://issues.apache.org/jira/browse/IGNITE-11793
> >>
> >> On Thu, 3 Oct 2019 at 22:50, Sergey Kozlov 
> wrote:
> >> >
> >> > Hi
> >> >
> >> > I'm not sure that taking a node offline is the best way to do that.
> >> > Cons:
> >> >  - different caches may have different fragmentation, but we force a
> >> > stop of the whole node
> >> >  - taking a node offline is a maintenance operation that will require
> >> > adding +1 backup to reduce the risk of data loss
> >> >  - baseline auto adjustment?
> >> >  - impact on index rebuild?
> >> >  - cache configuration changes (or destroy) while the node is offline
> >> >
> >> > What about other ways without a node stop? E.g. take a cache group on
> >> > a node offline? Add a *defrag  *command to control.sh to force a
> >> > rebalance internally in the node, with the expected impact on
> >> > performance.
> >> >
> >> >
> >> >
> >> > On Thu, Oct 3, 2019 at 12:08 PM Anton Vinogradov 
> wrote:
> >> >
> >> > > Alexey,
> >> > > As for me, it does not matter whether it will be an IEP, an
> >> > > umbrella, or a single issue.
> >> > > The most important thing is the Assignee :)
> >> > >
> >> > > On Thu, Oct 3, 2019 at 11:59 AM Alexey Goncharuk <
> >> > > alexey.goncha...@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > Anton, do you think we should file a single ticket for this or
> >> > > > should we go with an IEP? As of now, the change does not look
> >> > > > big enough for an IEP for me.
> >> > > >
> >> > > > Thu, Oct 3, 2019 at 11:18, Anton Vinogradov :
> >> > > >
> >> > > > > Alexey,
> >> > > > >
> >> > > > > Sounds good to me.
> >> > > > >
> >> > > > > On Thu, Oct 3, 2019 at 10:51 AM Alexey Goncharuk <
> >> > > > > alexey.goncha...@gmail.com>
> >> > > > > wrote:
> >> > > > >
> >> > > > > > Anton,
> >> > > > > >
> >> > > > > > Switching a partition to and from the SHRINKING state will
> >> > > > > > require intricate synchronizations in order to properly
> >> > > > > > determine the start position for historical rebalance without
> >> > > > > > PME.
> >> > > > > >
> >> > > > > > I would still go with an offline-node approach, but instead of
> >> > > > > > cleaning the persistence, we can do effective defragmentation
> >> > > > > > when the node is offline because we are sure that there is no
> >> > > > > > concurrent load. After the defragmentation completes, we bring
> >> > > > > > the node back to the cluster and historical rebalance will kick
> >> > > > > > in automatically. It will still require manual node restarts,
> >> > > > > > but since the data is not removed, there are no additional
> >> > > > > > risks. Also, this will be an excellent solution for those who
> >> > > > > > can afford downtime and execute the defragment command on all
> >> > > > > > nodes in the cluster simultaneously - this will be the fastest
> >> > > > > > way possible.
> >> > > > > >
> >> > > > > > --AG
> >> > > > > >
> >> > > > > > Mon, Sep 30, 2019 at 09:29, Anton Vinogradov  >:
> >> > > > > >
> >> > > > > > > Alexei,
> >> > > > > > > >> stopping fragmented node and removing partition data,
> >> > > > > > > >> then starting it again
> >> > > > > > >
> >> > > > > > > That's exactly what we're doing to solve the fragmentat

[jira] [Created] (IGNITE-12267) ClassCastException after change column type (drop, add)

2019-10-07 Thread Kirill Tkalenko (Jira)
Kirill Tkalenko created IGNITE-12267:


 Summary: ClassCastException after change column type (drop, add)
 Key: IGNITE-12267
 URL: https://issues.apache.org/jira/browse/IGNITE-12267
 Project: Ignite
  Issue Type: Improvement
Reporter: Kirill Tkalenko
Assignee: Kirill Tkalenko
 Fix For: 2.8


Changing a SQL column type is not supported, but it is possible to drop a 
column and re-create it with a new type.
Applying such a migration script passes without errors.
The ClassCastException occurs whenever the column is accessed afterwards.
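
A minimal sketch of the scenario via the JDBC thin driver (table and column 
names are illustrative, and the exact point where the exception surfaces may 
vary by version):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ColumnTypeChangeRepro {
    public static void main(String[] args) throws Exception {
        // Connect to a local node through the JDBC thin driver.
        try (Connection conn = DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1");
             Statement stmt = conn.createStatement()) {
            stmt.executeUpdate("CREATE TABLE person (id INT PRIMARY KEY, age INT)");
            stmt.executeUpdate("INSERT INTO person (id, age) VALUES (1, 42)");

            // "Migration": drop the column, then re-create it with a new type.
            // Both statements pass without errors.
            stmt.executeUpdate("ALTER TABLE person DROP COLUMN age");
            stmt.executeUpdate("ALTER TABLE person ADD COLUMN age VARCHAR");

            // Accessing the column fails: the stored value is still an INT
            // while the schema now declares VARCHAR.
            try (ResultSet rs = stmt.executeQuery("SELECT age FROM person")) {
                while (rs.next())
                    System.out.println(rs.getString("age"));
            }
        }
    }
}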





Re: Metric showing how many nodes may safely leave the cluster

2019-10-07 Thread Ivan Rakov

Denis, Alex,

Sure, the new metric will be integrated into the new metrics framework.
Let's not expose its value to control.sh right now. I'll create an issue 
for an aggregated "getMinimumNumberOfPartitionCopies" if everyone agrees.


Best Regards,
Ivan Rakov

On 04.10.2019 20:06, Denis Magda wrote:

I'm for the proposal to add new JMX metrics and enhance the existing
tooling. But I would encourage us to integrate this into the new metrics
framework Nikolay has been working on. Otherwise, we will be deprecating
these JMX metrics in a short time frame in favor of the new monitoring APIs.

-
Denis


On Fri, Oct 4, 2019 at 9:33 AM Alexey Goncharuk 
wrote:


I agree that we should have the ability to read any metric using simple
Ignite tooling. I am not sure if visor.sh is a good fit - if I
remember correctly, it will start a daemon node which will bump the
topology version with all related consequences. I believe in the long term
it will be beneficial to migrate all visor.sh functionality to a more
lightweight protocol, such as the one used in control.sh.

As for the metrics, the metric suggested by Ivan totally makes sense to me
- it is a simple and, actually, quite critical metric. Manually selecting
the minimum of some metric across all cache groups is completely
impractical. A monitoring system, on the other hand, might not be available
when the metric is needed, or may not support aggregation.

--AG

Fri, Oct 4, 2019 at 18:58, Ivan Rakov :


Nikolay,

Many users start to use Ignite with a small project without
production-level monitoring. When the proof-of-concept appears to be
viable, they tend to expand Ignite usage by growing the cluster and adding
the needed environment (including monitoring systems).
The inability to find out such a basic thing as survival in case of the
next node crash may affect the overall product impression. We all want
Ignite to be successful and widespread.


Can you clarify, what do you mean, exactly?

Right now a user can access the metric mentioned by Alex and choose the
minimum over all cache groups. I want to highlight that not every user
understands Ignite and its internals well enough to find out that exactly
this sequence of actions will bring them to the desired answer.
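
For illustration, this is roughly the manual sequence in code - a minimal
sketch that aggregates the metric over all cache groups via JMX (the
"org.apache:*" domain filter is an assumption, and the MBean layout may
differ between Ignite versions):

import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class MinPartitionCopies {
    public static void main(String[] args) throws Exception {
        MBeanServer srv = ManagementFactory.getPlatformMBeanServer();
        int min = Integer.MAX_VALUE;

        // Scan registered MBeans and pick those exposing the cache group metric.
        for (ObjectName name : srv.queryNames(new ObjectName("org.apache:*"), null)) {
            try {
                int copies = (Integer)srv.getAttribute(name, "MinimumNumberOfPartitionCopies");

                min = Math.min(min, copies);
            }
            catch (Exception ignored) {
                // Not a cache group MBean, or the attribute is unreadable - skip.
            }
        }

        // The cluster can lose (min - 1) nodes without data loss.
        System.out.println("Minimum partition copies across groups: " + min);
    }
}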


Can you clarify, what do you mean, exactly?
We have a ticket[1] to support metrics output via visor.sh.

My understanding: we should have an easy way to output metric values for
each node in the cluster.

[1] https://issues.apache.org/jira/browse/IGNITE-12191

I propose to add a metric method for an aggregated
"getMinimumNumberOfPartitionCopies" and expose it to control.sh.
My understanding: its result is critical enough to be accessible via a
short path. I've started this topic due to a request from the user list,
and I've heard many similar complaints before.

Best Regards,
Ivan Rakov

On 04.10.2019 17:18, Nikolay Izhikov wrote:

Ivan.


We shouldn't force users to configure external tools and write extra
code for basic things.

Actually, I don't agree with you.
Having an external monitoring system for any production cluster is a
*basic* thing.

Can you, please, define "basic things"?


single method for the whole cluster

Can you clarify, what do you mean, exactly?
We have a ticket[1] to support metrics output via visor.sh.

My understanding: we should have an easy way to output metric values for
each node in the cluster.

[1] https://issues.apache.org/jira/browse/IGNITE-12191


On Fri, 04/10/2019 at 17:09 +0300, Ivan Rakov wrote:

Max,

What if a user simply doesn't have a configured monitoring system?
Knowing whether the cluster will survive a node shutdown is critical for
any administrator that performs any manipulations with the cluster
topology. Essential information should be easily accessible. We shouldn't
force users to configure external tools and write extra code for basic
things.

Alex,

Thanks, that's exactly the metric we need.
My point is that we should make it more accessible: via a control.sh
command and a single method for the whole cluster.

Best Regards,
Ivan Rakov

On 04.10.2019 16:34, Alex Plehanov wrote:

Ivan, there already exists the metric
CacheGroupMetricsMXBean#getMinimumNumberOfPartitionCopies, which shows the
current redundancy level for the cache group.
We can lose up to (getMinimumNumberOfPartitionCopies - 1) nodes without
data loss in this cache group.

Fri, Oct 4, 2019 at 16:17, Ivan Rakov :


Igniters,

I've seen numerous requests to find out an easy way to check whether it is
safe to turn off a cluster node. As we know, in Ignite protection from
sudden node shutdown is implemented through keeping several backup
copies of each partition. However, this guarantee can be weakened for a
while in case the cluster has recently experienced a node restart and the
rebalancing process is still in progress.
An example scenario is restarting nodes one by one in order to update a
local configuration parameter. The user restarts one node and rebalancing
starts: when it is completed, it will be safe to proceed (backup
count=1). However, there's no transparent way to determine whether
rebalanci

[jira] [Created] (IGNITE-12266) Add limit parameter to Platforms for processing TextQuery

2019-10-07 Thread Yuriy Shuliha (Jira)
Yuriy Shuliha  created IGNITE-12266:
---

 Summary: Add limit parameter to Platforms for processing TextQuery
 Key: IGNITE-12266
 URL: https://issues.apache.org/jira/browse/IGNITE-12266
 Project: Ignite
  Issue Type: Improvement
  Components: platforms
Reporter: Yuriy Shuliha 
Assignee: Yuriy Shuliha 
 Fix For: 2.8








[jira] [Created] (IGNITE-12265) JavaDoc doesn't have documentation for the org.apache.ignite.client package

2019-10-07 Thread Denis Mekhanikov (Jira)
Denis Mekhanikov created IGNITE-12265:
-

 Summary: JavaDoc doesn't have documentation for the 
org.apache.ignite.client package
 Key: IGNITE-12265
 URL: https://issues.apache.org/jira/browse/IGNITE-12265
 Project: Ignite
  Issue Type: Bug
Reporter: Denis Mekhanikov


JavaDoc published on the website doesn't have documentation for the 
{{org.apache.ignite.client}} package. Link to the website: 
[https://ignite.apache.org/releases/2.7.6/javadoc/]

The lack of a {{package-info.java}} file, or an exclusion configured for the 
{{maven-javadoc-plugin}} in the root {{pom.xml}}, may be the reason.
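
If the missing file is the cause, the fix is a one-file addition along these 
lines (the description text is illustrative):

/**
 * Contains the Ignite thin client API.
 * <p>
 * A package-info file makes the package visible to the javadoc tool, so
 * the package shows up in the generated documentation.
 */
package org.apache.ignite.client;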





[jira] [Created] (IGNITE-12264) Private application data should not be exposed in the logs, exceptions, ERROR, WARN etc.

2019-10-07 Thread Pushenko Kirill (Jira)
Pushenko Kirill created IGNITE-12264:


 Summary: Private application data should not be exposed in the logs, 
exceptions, ERROR, WARN etc.
 Key: IGNITE-12264
 URL: https://issues.apache.org/jira/browse/IGNITE-12264
 Project: Ignite
  Issue Type: Improvement
Affects Versions: 2.7.6
Reporter: Pushenko Kirill


Private application data should not be exposed in the logs, exceptions, ERROR, 
WARN etc.

The exceptions contained a value in which there were card numbers.





Re: Enabling the checkstyle profile on Build Apache Ignite suite (test period)

2019-10-07 Thread Nikolay Izhikov
+1

On Mon, 07/10/2019 at 13:18 +0300, Maxim Muzafarov wrote:
> Igniters,
> 
> 
> On October 11 (Friday), 22-00 MSK, I'm planning to enable the `checkstyle`
> profile on the [Build Apache Ignite] suite for a one-week test period
> ending the next weekend. Such an option has been discussed many times
> before (e.g. [1]).
> 
> Here are the reasons:
> 
> - any code style violation in a PR leads to source code fixes which in
> turn require a re-run of other test suites, so it is better to fail
> fast;
> - each new Run:All suite (e.g. for a new module) must contain a
> checkstyle suite to check code style by default, so it is better to
> include mandatory checks in the Build Apache Ignite procedure;
> - the `fail fast` paradigm will eliminate all the checkstyle violations,
> which currently happen from time to time;
> 
> The ability to create a prototype PR without code style checks still
> exists. You can disable the `checkstyle` profile for such PRs in your
> local branches.
> 
> 
> Any objections?
> 
> 
> [1] 
> http://apache-ignite-developers.2346864.n4.nabble.com/Code-inspection-td27709i80.html#a41297




Enabling the checkstyle profile on Build Apache Ignite suite (test period)

2019-10-07 Thread Maxim Muzafarov
Igniters,


On October 11 (Friday), 22-00 MSK, I'm planning to enable the `checkstyle`
profile on the [Build Apache Ignite] suite for a one-week test period
ending the next weekend. Such an option has been discussed many times
before (e.g. [1]).

Here are the reasons:

- any code style violation in a PR leads to source code fixes which in
turn require a re-run of other test suites, so it is better to fail
fast;
- each new Run:All suite (e.g. for a new module) must contain a
checkstyle suite to check code style by default, so it is better to
include mandatory checks in the Build Apache Ignite procedure;
- the `fail fast` paradigm will eliminate all the checkstyle violations,
which currently happen from time to time;

The ability to create a prototype PR without code style checks still
exists. You can disable the `checkstyle` profile for such PRs in your
local branches.
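
For example, assuming the profile id in the root pom.xml is `checkstyle`,
Maven lets you deactivate it for a single run with `-P '!checkstyle'`
(please verify the actual profile id and its bound phase in the pom).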


Any objections?


[1] 
http://apache-ignite-developers.2346864.n4.nabble.com/Code-inspection-td27709i80.html#a41297


Re: How to free up space on disc after removing entries from IgniteCache with enabled PDS?

2019-10-07 Thread Alexey Goncharuk
Created a ticket for the first stage of this improvement. This can be a
first change towards the online mode suggested by Sergey and Anton.
https://issues.apache.org/jira/browse/IGNITE-12263

Fri, Oct 4, 2019 at 19:38, Alexey Goncharuk :

> Maxim,
>
> Having a cluster-wide lock for a cache does not improve availability of
> the solution. A user cannot defragment a cache if the cache is involved in
> a mission-critical operation, so having a lock on such a cache is
> equivalent to the whole cluster shutdown.
>
> We should decide between either a single offline node or a more complex
> fully online solution.
>
> Fri, Oct 4, 2019 at 11:55, Maxim Muzafarov :
>
>> Igniters,
>>
>> This thread seems to be endless, but what if some kind of cache group
>> distributed write lock (exclusive for some of the internal Ignite
>> processes) is introduced? I think it will help to solve a batch of
>> problems, like:
>>
>> 1. defragmentation of all cache group partitions on the local node
>> without concurrent updates.
>> 2. improved data loading with data streamer isolation mode [1]. It
>> seems we should not allow concurrent updates to a cache while we are
>> on the `fast data load` step.
>> 3. recovery from a snapshot without cache stop/start actions
>>
>>
>> [1] https://issues.apache.org/jira/browse/IGNITE-11793
>>
>> On Thu, 3 Oct 2019 at 22:50, Sergey Kozlov  wrote:
>> >
>> > Hi
>> >
>> > I'm not sure that taking a node offline is the best way to do that.
>> > Cons:
>> >  - different caches may have different fragmentation, but we force a
>> > stop of the whole node
>> >  - taking a node offline is a maintenance operation that will require
>> > adding +1 backup to reduce the risk of data loss
>> >  - baseline auto adjustment?
>> >  - impact on index rebuild?
>> >  - cache configuration changes (or destroy) while the node is offline
>> >
>> > What about other ways without a node stop? E.g. take a cache group on
>> > a node offline? Add a *defrag  *command to control.sh to force a
>> > rebalance internally in the node, with the expected impact on
>> > performance.
>> >
>> >
>> >
>> > On Thu, Oct 3, 2019 at 12:08 PM Anton Vinogradov  wrote:
>> >
>> > > Alexey,
>> > > As for me, it does not matter whether it will be an IEP, an
>> > > umbrella, or a single issue.
>> > > The most important thing is the Assignee :)
>> > >
>> > > On Thu, Oct 3, 2019 at 11:59 AM Alexey Goncharuk <
>> > > alexey.goncha...@gmail.com>
>> > > wrote:
>> > >
>> > > > Anton, do you think we should file a single ticket for this or
>> > > > should we go with an IEP? As of now, the change does not look
>> > > > big enough for an IEP for me.
>> > > >
>> > > > Thu, Oct 3, 2019 at 11:18, Anton Vinogradov :
>> > > >
>> > > > > Alexey,
>> > > > >
>> > > > > Sounds good to me.
>> > > > >
>> > > > > On Thu, Oct 3, 2019 at 10:51 AM Alexey Goncharuk <
>> > > > > alexey.goncha...@gmail.com>
>> > > > > wrote:
>> > > > >
>> > > > > > Anton,
>> > > > > >
>> > > > > > Switching a partition to and from the SHRINKING state will
>> > > > > > require intricate synchronizations in order to properly
>> > > > > > determine the start position for historical rebalance without
>> > > > > > PME.
>> > > > > >
>> > > > > > I would still go with an offline-node approach, but instead of
>> > > > > > cleaning the persistence, we can do effective defragmentation
>> > > > > > when the node is offline because we are sure that there is no
>> > > > > > concurrent load. After the defragmentation completes, we bring
>> > > > > > the node back to the cluster and historical rebalance will kick
>> > > > > > in automatically. It will still require manual node restarts,
>> > > > > > but since the data is not removed, there are no additional
>> > > > > > risks. Also, this will be an excellent solution for those who
>> > > > > > can afford downtime and execute the defragment command on all
>> > > > > > nodes in the cluster simultaneously - this will be the fastest
>> > > > > > way possible.
>> > > > > >
>> > > > > > --AG
>> > > > > >
>> > > > > > Mon, Sep 30, 2019 at 09:29, Anton Vinogradov :
>> > > > > >
>> > > > > > > Alexei,
>> > > > > > > >> stopping fragmented node and removing partition data, then
>> > > > > > > >> starting it again
>> > > > > > >
>> > > > > > > That's exactly what we're doing to solve the fragmentation
>> > > > > > > issue.
>> > > > > > > The problem here is that we have to perform N/B
>> > > > > > > restart-rebalance operations (N - cluster size, B - backups
>> > > > > > > count), and it takes a lot of time with risks of losing the
>> > > > > > > data.
>> > > > > > >
>> > > > > > > On Fri, Sep 27, 2019 at 5:49 PM Alexei Scherbakov <
>> > > > > > > alexey.scherbak...@gmail.com> wrote:
>> > > > > > >
>> > > > > > > > Probably this should be allowed via a public API; actually,
>> > > > > > > > this is the same as manual rebalancing.
>> > > > > > > >
>> > > > > > > > Fri, Sep 27, 2019 at 17:40, Alexei Scherbakov <
>> > > > > > > > alexey.scherb

[jira] [Created] (IGNITE-12263) Introduce native persistence compaction operation

2019-10-07 Thread Alexey Goncharuk (Jira)
Alexey Goncharuk created IGNITE-12263:
-

 Summary: Introduce native persistence compaction operation
 Key: IGNITE-12263
 URL: https://issues.apache.org/jira/browse/IGNITE-12263
 Project: Ignite
  Issue Type: Improvement
Reporter: Alexey Goncharuk


Currently, Ignite native persistence does not shrink storage files after 
key-value pairs are removed.
The causes of this behavior are:
 * The absence of a mechanism that allows Ignite to track the highest 
non-empty page position in a partition file
 * The absence of a mechanism which allows Ignite to select the page closest 
to the file beginning for a write
 * The absence of a mechanism which allows Ignite to move a key-value pair 
from page to page during defragmentation

As an initial change, I suggest introducing a new node startup mode which 
will run a defragmentation procedure allowing the node to shrink storage 
files. The procedure will not mutate the logical state of a partition, 
allowing a further historical rebalance to quickly catch the node up. Since 
the procedure will run during the node startup (during the final stages of 
recovery), there will be no concurrent load, thus the entries can be freely 
moved from page to page with no tricky synchronization.

If a procedure is applied during the whole cluster restart, then all nodes will 
be defragmented simultaneously, allowing for a quicker parallel defragmentation 
at a cost of downtime.

The procedure should accept an optional list of cache groups to defragment to 
allow arbitrary cache group selection for defragmentation.

An idea of the actions taken during the run for each partition selected for 
defragmentation (a model of this loop is sketched below the list):
 * Partition pages are preloaded to memory if possible to avoid excessive page 
replacement. During the scan, the HWM of the written data is detected (empty 
pages are skipped)
 * Page references in the free list are sorted in a way that allows picking 
the pages closest to the file start
 * The partition is scanned in reverse order; key-value pairs are moved closer 
to the file start, and the HWM is updated accordingly. This step is 
particularly open for various optimizations because different strategies will 
work well for different fragmentation patterns.
 * After the scan iteration is completed, the file size can be updated 
according to the HWM
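
To make the reverse-scan idea concrete, below is a minimal self-contained 
model of the compaction loop (plain Java, no Ignite APIs; the page and record 
representations are invented for illustration):

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Toy model of the defragmentation pass: a "partition file" is a list of
 * fixed-capacity pages; records from tail pages are moved into free slots
 * of head pages, then the file is truncated at the high-water mark (HWM).
 */
public class DefragModel {
    public static void main(String[] args) {
        int pageCapacity = 4;

        // A fragmented partition: each inner list is a partially filled page.
        List<List<String>> pages = new ArrayList<>();
        pages.add(new ArrayList<>(List.of("a")));
        pages.add(new ArrayList<>(List.of("b", "c")));
        pages.add(new ArrayList<>());
        pages.add(new ArrayList<>(List.of("d", "e", "f")));

        // Free list ordered so pages closest to the file start are used first.
        Deque<Integer> freeList = new ArrayDeque<>();
        for (int i = 0; i < pages.size(); i++)
            if (pages.get(i).size() < pageCapacity)
                freeList.addLast(i);

        // Reverse scan: move records from the last pages into the earliest
        // pages that still have room; records never move toward the file end.
        for (int src = pages.size() - 1; src > 0; src--) {
            List<String> srcPage = pages.get(src);

            while (!srcPage.isEmpty() && !freeList.isEmpty() && freeList.peekFirst() < src) {
                int dst = freeList.peekFirst();

                pages.get(dst).add(srcPage.remove(srcPage.size() - 1));

                if (pages.get(dst).size() == pageCapacity)
                    freeList.pollFirst(); // Destination page is now full.
            }
        }

        // HWM = last non-empty page; the file tail beyond it can be cut off.
        int hwm = pages.size() - 1;
        while (hwm >= 0 && pages.get(hwm).isEmpty())
            hwm--;

        System.out.println("Compacted pages: " + pages.subList(0, hwm + 1));
    }
}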

As a further improvement, this partition defragmentation procedure can be later 
run in online mode, after proper cache update protocol changes are designed.





Re: How to free up space on disc after removing entries from IgniteCache with enabled PDS?

2019-10-07 Thread Ilya Kasnacheev
Hello!

I think that a good robust approach is to start a background thread which
will try to compact pages and remove unneeded ones. It should only be
active when the system is reasonably idle, or if there's a severe
fragmentation problem.

However, I am aware that implementing such a heuristic cleaner is a
challenging task.
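
A minimal sketch of what such an idle-triggered cleaner could look like (the
load probe and the compaction step are placeholders, not Ignite APIs):

import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Background cleaner that compacts pages only while the system is idle. */
public class IdleCompactor {
    private static final double IDLE_LOAD_THRESHOLD = 0.2;

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();

        exec.scheduleWithFixedDelay(() -> {
            // Normalized load average; getSystemLoadAverage() returns -1
            // on platforms where it is unsupported.
            double load = os.getSystemLoadAverage() / os.getAvailableProcessors();

            // Compact a small batch per tick, and only when the host looks
            // reasonably idle, so regular cache load is unaffected.
            if (load >= 0 && load < IDLE_LOAD_THRESHOLD)
                compactNextBatch();
        }, 10, 10, TimeUnit.SECONDS);
    }

    /** Placeholder for the actual page compaction step. */
    private static void compactNextBatch() {
        System.out.println("Compacting a batch of fragmented pages...");
    }
}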

Regards,
-- 
Ilya Kasnacheev


Fri, Oct 4, 2019 at 19:38, Alexey Goncharuk :

> Maxim,
>
> Having a cluster-wide lock for a cache does not improve availability of the
> solution. A user cannot defragment a cache if the cache is involved in a
> mission-critical operation, so having a lock on such a cache is equivalent
> to the whole cluster shutdown.
>
> We should decide between either a single offline node or a more complex
> fully online solution.
>
> Fri, Oct 4, 2019 at 11:55, Maxim Muzafarov :
>
> > Igniters,
> >
> > This thread seems to be endless, but what if some kind of cache group
> > distributed write lock (exclusive for some of the internal Ignite
> > processes) is introduced? I think it will help to solve a batch of
> > problems, like:
> >
> > 1. defragmentation of all cache group partitions on the local node
> > without concurrent updates.
> > 2. improved data loading with data streamer isolation mode [1]. It
> > seems we should not allow concurrent updates to a cache while we are
> > on the `fast data load` step.
> > 3. recovery from a snapshot without cache stop/start actions
> >
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-11793
> >
> > On Thu, 3 Oct 2019 at 22:50, Sergey Kozlov  wrote:
> > >
> > > Hi
> > >
> > > I'm not sure that taking a node offline is the best way to do that.
> > > Cons:
> > >  - different caches may have different fragmentation, but we force a
> > > stop of the whole node
> > >  - taking a node offline is a maintenance operation that will require
> > > adding +1 backup to reduce the risk of data loss
> > >  - baseline auto adjustment?
> > >  - impact on index rebuild?
> > >  - cache configuration changes (or destroy) while the node is offline
> > >
> > > What about other ways without a node stop? E.g. take a cache group on
> > > a node offline? Add a *defrag  *command to control.sh to force a
> > > rebalance internally in the node, with the expected impact on
> > > performance.
> > >
> > >
> > >
> > > On Thu, Oct 3, 2019 at 12:08 PM Anton Vinogradov 
> wrote:
> > >
> > > > Alexey,
> > > > As for me, it does not matter whether it will be an IEP, an
> > > > umbrella, or a single issue.
> > > > The most important thing is the Assignee :)
> > > >
> > > > On Thu, Oct 3, 2019 at 11:59 AM Alexey Goncharuk <
> > > > alexey.goncha...@gmail.com>
> > > > wrote:
> > > >
> > > > > Anton, do you think we should file a single ticket for this or
> > > > > should we go with an IEP? As of now, the change does not look
> > > > > big enough for an IEP for me.
> > > > >
> > > > > Thu, Oct 3, 2019 at 11:18, Anton Vinogradov :
> > > > >
> > > > > > Alexey,
> > > > > >
> > > > > > Sounds good to me.
> > > > > >
> > > > > > On Thu, Oct 3, 2019 at 10:51 AM Alexey Goncharuk <
> > > > > > alexey.goncha...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Anton,
> > > > > > >
> > > > > > > Switching a partition to and from the SHRINKING state will
> > > > > > > require intricate synchronizations in order to properly
> > > > > > > determine the start position for historical rebalance without
> > > > > > > PME.
> > > > > > >
> > > > > > > I would still go with an offline-node approach, but instead of
> > > > > > > cleaning the persistence, we can do effective defragmentation
> > > > > > > when the node is offline because we are sure that there is no
> > > > > > > concurrent load. After the defragmentation completes, we bring
> > > > > > > the node back to the cluster and historical rebalance will kick
> > > > > > > in automatically. It will still require manual node restarts,
> > > > > > > but since the data is not removed, there are no additional
> > > > > > > risks. Also, this will be an excellent solution for those who
> > > > > > > can afford downtime and execute the defragment command on all
> > > > > > > nodes in the cluster simultaneously - this will be the fastest
> > > > > > > way possible.
> > > > > > >
> > > > > > > --AG
> > > > > > >
> > > > > > > Mon, Sep 30, 2019 at 09:29, Anton Vinogradov  >:
> > > > > > >
> > > > > > > > Alexei,
> > > > > > > > >> stopping fragmented node and removing partition data, then
> > > > > > > > >> starting it again
> > > > > > > >
> > > > > > > > That's exactly what we're doing to solve the fragmentation
> > > > > > > > issue.
> > > > > > > > The problem here is that we have to perform N/B
> > > > > > > > restart-rebalance operations (N - cluster size, B - backups
> > > > > > > > count), and it takes a lot of time with risks of losing the
> > > > > > > > data.
> > > > > > > >
> > > > > > > > On Fri, Sep 27, 2019 at 5:49 PM Alexei Scherbakov <
> > > > > > > > alexey.scherbak...@gmail.com> wrote:
>