[MTCGA]: new failures in builds [5991299] need to be handled

2021-05-06 Thread dpavlov . tasks
Hi Igniters,

 I've detected a new issue on TeamCity that needs to be handled. You are more 
than welcome to help.

 *Test with high flaky rate in master 
CacheExchangeMergeTest.testFailExchangeCoordinatorChange_NoMerge_2 
https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8&testNameId=-3746402421590627771&branch=%3Cdefault%3E&tab=testDetails
 No changes in the build

 - Here's a reminder of what contributors agreed to do 
https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute 
 - Should you have any questions, please contact dev@ignite.apache.org 

Best Regards,
Apache Ignite TeamCity Bot 
https://github.com/apache/ignite-teamcity-bot
Notification generated at 01:40:20 07-05-2021 


[MTCGA]: new failures in builds [5996118] need to be handled

2021-05-06 Thread dpavlov . tasks
Hi Igniters,

 I've detected a new issue on TeamCity that needs to be handled. You are more 
than welcome to help.

 *New Critical Failure in master RDD 
https://ci.ignite.apache.org/buildConfiguration/IgniteTests24Java8_Rdd?branch=%3Cdefault%3E
 No changes in the build

 - Here's a reminder of what contributors agreed to do 
https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute 
 - Should you have any questions, please contact dev@ignite.apache.org 

Best Regards,
Apache Ignite TeamCity Bot 
https://github.com/apache/ignite-teamcity-bot
Notification generated at 01:10:20 07-05-2021 


Re: Exceeding the DataStorageConfiguration#getMaxWalArchiveSize due to historical rebalance

2021-05-06 Thread Stanislav Lukyanov
An interesting suggestion I heard today.

The minWalArchiveSize property might actually be minWalArchiveTimespan - i.e. 
a number of seconds instead of a number of bytes!

I think this makes perfect sense from the user point of view.
"I want to have WAL archive for at least N hours but I have a limit of M 
gigabytes to store it".

Do we have checkpoint timestamps stored anywhere? (cp start markers?)
Perhaps we can actually implement this?
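
To sketch the idea, here is a minimal standalone example. Everything here is 
hypothetical: minWalArchiveTimespan is not an existing Ignite property, and it 
assumes checkpoint start markers can give us a timestamp per archived segment.

    import java.util.concurrent.TimeUnit;

    /** Hypothetical sketch of a "keep at least N hours, at most M bytes" retention rule. */
    class WalArchiveRetentionSketch {
        /** "I want to have WAL archive for at least N hours..." */
        static final long MIN_TIMESPAN_MS = TimeUnit.HOURS.toMillis(4);

        /** "...but I have a limit of M gigabytes to store it." */
        static final long MAX_ARCHIVE_BYTES = 10L * 1024 * 1024 * 1024;

        /**
         * @param segmentCheckpointTs Timestamp of the checkpoint covering the segment
         *                            (assumed to come from cp start markers).
         * @param archiveSizeBytes Current total size of the WAL archive.
         */
        static boolean canClean(long segmentCheckpointTs, long archiveSizeBytes) {
            long age = System.currentTimeMillis() - segmentCheckpointTs;

            // The byte limit always wins; otherwise keep segments younger than N hours.
            return archiveSizeBytes > MAX_ARCHIVE_BYTES || age > MIN_TIMESPAN_MS;
        }
    }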

Thanks,
Stan


> On 6 May 2021, at 14:13, Stanislav Lukyanov  wrote:
> 
> +1 to cancel WAL reservation on reaching getMaxWalArchiveSize
> +1 to add a public property to replace 
> IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE
> 
> I don't like the name getWalArchiveSize - I think it's a bit confusing (is it 
> the current size? the minimal size? the target size?)
> I suggest naming the property getMinWalArchiveSize. I think that this is 
> exactly what it is - the minimal size of the archive that we want to have.
> The archive size should at all times be between min and max.
> If the archive size is less than min or more than max, the system 
> functionality can degrade (e.g. historical rebalance may not work as 
> expected).
> I think these rules are intuitively understood from the "min" and "max" names.
> 
> Ilya's suggestion about throttling is great although I'd do this in a 
> different ticket.
> 
> Thanks,
> Stan
> 
>> On 5 May 2021, at 19:25, Maxim Muzafarov  wrote:
>> 
>> Hello, Kirill
>> 
>> +1 for this change; however, there are already too many configuration
>> settings for the user to configure an Ignite cluster. It is better to
>> keep the options that we already have and fix the behaviour of the
>> rebalance process as you suggested.
>> 
>> On Tue, 4 May 2021 at 19:01, ткаленко кирилл  wrote:
>>> 
>>> Hi Ilya!
>>> 
>>> Then we could greatly reduce the user load on the cluster until the rebalance 
>>> is over, which can be critical for the user.
>>> 
>>> 04.05.2021, 18:43, "Ilya Kasnacheev" :
>>>> Hello!
>>>> 
>>>> Maybe we can have a mechanism here similar (or equal) to checkpoint-based
>>>> write throttling?
>>>> 
>>>> So we will be throttling for both the checkpoint page buffer and the WAL limit.
>>>> 
>>>> Regards,
>>>> --
>>>> Ilya Kasnacheev
>>>> 
>>>> Tue, 4 May 2021 at 11:29, ткаленко кирилл :
>>>> 
>>>>> Hello everybody!
>>>>> 
>>>>> At the moment, if there are partitions for which historical rebalance
>>>>> will be used, we reserve segments in the WAL archive (i.e. we do not
>>>>> allow the WAL archive to be cleaned) until the rebalance for all cache
>>>>> groups is over.
>>>>> 
>>>>> If the cluster is under load during the rebalance, the WAL archive size
>>>>> may significantly exceed the limit set in
>>>>> DataStorageConfiguration#getMaxWalArchiveSize until the process is
>>>>> complete. This may lead to user issues, and nodes may crash with the "No
>>>>> space left on device" error.
>>>>> 
>>>>> We have a system property, IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE
>>>>> (0.5 by default), which sets the threshold (as a fraction of
>>>>> getMaxWalArchiveSize) down to which the WAL archive will be cleared, i.e.
>>>>> the size of the WAL archive that will always remain on the node. I
>>>>> propose to replace this system property with
>>>>> DataStorageConfiguration#getWalArchiveSize, expressed in bytes, with a
>>>>> default of (getMaxWalArchiveSize * 0.5) as it is now.
>>>>> 
>>>>> Main proposal:
>>>>> When DataStorageConfiguration#getMaxWalArchiveSize is reached, cancel
>>>>> existing WAL segment reservations and do not grant new ones until the
>>>>> archive shrinks to DataStorageConfiguration#getWalArchiveSize. In this
>>>>> case, if a segment needed for historical rebalance is no longer
>>>>> available, we will automatically switch to full rebalance.
>>>>> 



ASF report May 2021, due 12 May, eod

2021-05-06 Thread Dmitry Pavlov
Hi Igniters,

It's time for our TLP to prepare the next report to the Board.

Drafting the report in a mailing thread is not so convenient, so 
I've started to prepare it here: 
https://cwiki.apache.org/confluence/display/IGNITE/ASF+Board+Report+Drafts

It now contains a very first version; you're welcome to suggest here in the 
thread any of:
- significant community decisions
- released artifacts
- any events/facts worth mentioning.

There are 2 issues that stop us from publishing the report right now:

1) For some reason our user@iao list statistics have not been collected since 
March (since the 10th week of 2021). Most likely Infra and/or com.devs could help here.
2) Our downloads page does not show downloads for extensions: 
https://ignite.apache.org/download.cgi#extensions is not visible for me. Is 
this a local issue on my end?

Sincerely,
Dmitriy Pavlov


[Ignite Summit] Keynote Speakers and how you can influence the future schedule of the event

2021-05-06 Thread Kseniya Romanova
Hi Igniters!

The CFP is closed, and our Program Committee members are working hard on
submission evaluation. You can check the accepted keynotes on the website:
https://ignite-summit.org/schedule


The future of elastic databases - Dmitriy Setrakyan, Apache Ignite PMC, Sr.
Manager at AWS Aurora


Intel® Optane™ Persistent Memory (PMem): A Revolutionary Technology for
Databases - Evgeny Kharchenko, Technical Lead of FasterLab, Intel
Corporation & Valentin Kulichenko, Apache Ignite PMC, Director of Product
Management, GridGain


Real-Time Exposure Management Using Ignite - Rich Spencer, JPMC, Executive
Director


Ignite for Complex Event Processing and Event-Driven Microservices - Manoel
Pereira de Lima Junior, IT Specialist and Solutions Manager at Banco do
Brasil


Ignite as SaaS/Cloud system integration platform - Craig Gresbrink, 24 Hour
Fitness Ignite Product Owner and Solutions Architect




The schedule will be ready next week, and you can make its time more
convenient for you by registering for the Ignite Summit right now:
https://ignite-summit.org/register
We need a compromise between the time zones of the speakers and the time zones
of the attendees. Let us know your location so we have better statistics for
making decisions on timing!

Cheers,
Kseniya


Re: Change IGNITE_PDS_WAL_REBALANCE_THRESHOLD from System property to Configuration

2021-05-06 Thread Eduard Rakhmankulov
A small addition.

I want to add a configuration property,
org.apache.ignite.configuration.DataStorageConfiguration#getPartitionWalRebalanceThreshold,
which will have the same semantics as the system property (the number of
entries in the WAL to trigger rebalance).
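
A rough sketch of what the getter/setter pair could look like; the exact name
and default value are assumptions here, not committed API.

    /** Sketch only: a standalone stand-in for the real DataStorageConfiguration. */
    public class DataStorageConfigurationSketch {
        /** Hypothetical default; the real value would mirror IGNITE_PDS_WAL_REBALANCE_THRESHOLD. */
        private long partitionWalRebalanceThreshold = 500_000L;

        /** @return Number of entries in the WAL above which historical rebalance is used. */
        public long getPartitionWalRebalanceThreshold() {
            return partitionWalRebalanceThreshold;
        }

        /** @return {@code this} for chaining, matching the Ignite configuration style. */
        public DataStorageConfigurationSketch setPartitionWalRebalanceThreshold(long threshold) {
            partitionWalRebalanceThreshold = threshold;
            return this;
        }
    }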

On Thu, 6 May 2021 at 15:50, Eduard Rakhmankulov 
wrote:

> Hello, Igniters!
>
> I suggest changing IGNITE_PDS_WAL_REBALANCE_THRESHOLD from a system
> property to IgniteConfiguration.
> This configuration is effectively cluster-wide (because only the
> coordinator's configuration matters when the heuristic with this property
> applies).
>
> It is easier to validate that we have the same configuration on all nodes
> than a system property (in case another coordinator is elected).
>
> --
> Best regards, Eduard.
>


-- 
Best regards, Rakhmankulov E.R.


Change IGNITE_PDS_WAL_REBALANCE_THRESHOLD from System property to Configuration

2021-05-06 Thread Eduard Rakhmankulov
Hello, Igniters!

I suggest changing IGNITE_PDS_WAL_REBALANCE_THRESHOLD from a system
property to IgniteConfiguration.
This configuration is effectively cluster-wide (because only the
coordinator's configuration matters when the heuristic with this property
applies).

It is easier to validate that we have the same configuration on all nodes
than a system property (in case another coordinator is elected).
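
As an illustration, such a validation could compare the local value against
the rest of the topology via node attributes - a rough sketch, assuming the
value were published under a hypothetical attribute key (Ignite does not
publish this attribute today).

    import org.apache.ignite.Ignite;
    import org.apache.ignite.cluster.ClusterNode;

    /** Rough sketch: verify that every node reports the same threshold value. */
    class ThresholdConsistencyCheck {
        /** Hypothetical attribute key. */
        static final String ATTR = "wal.rebalance.threshold";

        static void validate(Ignite ignite, long localVal) {
            for (ClusterNode node : ignite.cluster().nodes()) {
                Object remote = node.attribute(ATTR);

                // Fail fast on a mismatch instead of silently diverging on failover.
                if (remote != null && !remote.equals(localVal))
                    throw new IllegalStateException("Inconsistent threshold on node: " + node.id());
            }
        }
    }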

-- 
Best regards, Eduard.


Re: Exceeding the DataStorageConfiguration#getMaxWalArchiveSize due to historical rebalance

2021-05-06 Thread Stanislav Lukyanov
+1 to cancel WAL reservation on reaching getMaxWalArchiveSize
+1 to add a public property to replace 
IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE

I don't like the name getWalArchiveSize - I think it's a bit confusing (is it 
the current size? the minimal size? the target size?)
I suggest naming the property getMinWalArchiveSize. I think that this is 
exactly what it is - the minimal size of the archive that we want to have.
The archive size should at all times be between min and max.
If the archive size is less than min or more than max, the system functionality 
can degrade (e.g. historical rebalance may not work as expected).
I think these rules are intuitively understood from the "min" and "max" names.
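
For example, in configuration terms (setMaxWalArchiveSize exists today;
setMinWalArchiveSize is the proposed name and an assumption here):

    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;

    class MinMaxWalArchiveExample {
        static IgniteConfiguration configure() {
            return new IgniteConfiguration().setDataStorageConfiguration(
                new DataStorageConfiguration()
                    // Existing hard cap on the WAL archive.
                    .setMaxWalArchiveSize(20L * 1024 * 1024 * 1024)
                    // Proposed property (assumption): the size the archive is cleaned down to.
                    .setMinWalArchiveSize(10L * 1024 * 1024 * 1024));
        }
    }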

Ilya's suggestion about throttling is great although I'd do this in a different 
ticket.

Thanks,
Stan

> On 5 May 2021, at 19:25, Maxim Muzafarov  wrote:
> 
> Hello, Kirill
> 
> +1 for this change; however, there are already too many configuration
> settings for the user to configure an Ignite cluster. It is better to
> keep the options that we already have and fix the behaviour of the
> rebalance process as you suggested.
> 
> On Tue, 4 May 2021 at 19:01, ткаленко кирилл  wrote:
>> 
>> Hi Ilya!
>> 
>> Then we could greatly reduce the user load on the cluster until the rebalance 
>> is over, which can be critical for the user.
>> 
>> 04.05.2021, 18:43, "Ilya Kasnacheev" :
>>> Hello!
>>> 
>>> Maybe we can have a mechanism here similar (or equal) to checkpoint-based
>>> write throttling?
>>> 
>>> So we will be throttling for both the checkpoint page buffer and the WAL limit.
>>> 
>>> Regards,
>>> --
>>> Ilya Kasnacheev
>>> 
>>> Tue, 4 May 2021 at 11:29, ткаленко кирилл :
>>> 
>>>> Hello everybody!
>>>> 
>>>> At the moment, if there are partitions for which historical rebalance
>>>> will be used, we reserve segments in the WAL archive (i.e. we do not
>>>> allow the WAL archive to be cleaned) until the rebalance for all cache
>>>> groups is over.
>>>> 
>>>> If the cluster is under load during the rebalance, the WAL archive size
>>>> may significantly exceed the limit set in
>>>> DataStorageConfiguration#getMaxWalArchiveSize until the process is
>>>> complete. This may lead to user issues, and nodes may crash with the "No
>>>> space left on device" error.
>>>> 
>>>> We have a system property, IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE
>>>> (0.5 by default), which sets the threshold (as a fraction of
>>>> getMaxWalArchiveSize) down to which the WAL archive will be cleared, i.e.
>>>> the size of the WAL archive that will always remain on the node. I
>>>> propose to replace this system property with
>>>> DataStorageConfiguration#getWalArchiveSize, expressed in bytes, with a
>>>> default of (getMaxWalArchiveSize * 0.5) as it is now.
>>>> 
>>>> Main proposal:
>>>> When DataStorageConfiguration#getMaxWalArchiveSize is reached, cancel
>>>> existing WAL segment reservations and do not grant new ones until the
>>>> archive shrinks to DataStorageConfiguration#getWalArchiveSize. In this
>>>> case, if a segment needed for historical rebalance is no longer
>>>> available, we will automatically switch to full rebalance.
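
To make the proposed arithmetic concrete, a standalone sketch follows; the
names are illustrative, not actual Ignite internals.

    /** Illustrative sketch of the proposed cleanup window. */
    class WalArchiveLimitsSketch {
        final long maxWalArchiveSize;

        /** Proposed replacement for IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE, in bytes. */
        final long walArchiveSize;

        WalArchiveLimitsSketch(long maxWalArchiveSize) {
            this.maxWalArchiveSize = maxWalArchiveSize;

            // Default mirrors the current property value of 0.5.
            this.walArchiveSize = (long)(maxWalArchiveSize * 0.5);
        }

        /** On reaching the max, segment reservations are cancelled and cleanup starts. */
        boolean mustCancelReservations(long currentArchiveSize) {
            return currentArchiveSize >= maxWalArchiveSize;
        }

        /** Cleanup stops once the archive has shrunk back to the target size. */
        boolean mayStopCleaning(long currentArchiveSize) {
            return currentArchiveSize <= walArchiveSize;
        }
    }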



Memory corruption on overflow in Binary*OutputStream

2021-05-06 Thread Sergey
Hi, Igniters.
I came across a JVM crash recently. After investigating, I found that
Binary(Off)HeapOutputStream does not correctly check for overflow of the
capacity counter and can write to unallocated memory.
Please take a look at https://issues.apache.org/jira/browse/IGNITE-14687
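
For reference, this class of bug looks roughly as follows, and an overflow-safe
capacity check can be written with exact arithmetic. This is a simplified
sketch, not the actual Ignite stream code.

    import java.util.Arrays;

    /** Simplified sketch of an overflow-safe capacity check. */
    class SafeOutputStreamSketch {
        private byte[] buf = new byte[32];
        private int pos;

        void ensureCapacity(int more) {
            // A naive check like "pos + more > buf.length" is broken: pos + more
            // can wrap negative on int overflow, the check passes, and the
            // subsequent write lands outside the allocated buffer.
            int required = Math.addExact(pos, more); // throws ArithmeticException on overflow

            if (required > buf.length)
                buf = Arrays.copyOf(buf, Math.max(required, buf.length * 2));
        }
    }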

Best regards,
Sergey Kosarev.