[jira] [Created] (IGNITE-11809) Missing help for control.sh --baseline collect command.

2019-04-25 Thread Sergey Antonov (JIRA)
Sergey Antonov created IGNITE-11809:
---

 Summary: Missing help for control.sh --baseline collect command.
 Key: IGNITE-11809
 URL: https://issues.apache.org/jira/browse/IGNITE-11809
 Project: Ignite
  Issue Type: Bug
Reporter: Sergey Antonov


I found the enum value {{BaselineCommand#COLLECT}}, but I don't see this option in 
the help output. We should add an example with {{BaselineCommand#COLLECT}} to the 
{{control.sh --help}} output.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: TC build queue stall

2019-04-25 Thread Павлухин Иван
I dug a little deeper into the causes of the VCS problem reported
by TC. It looks like Russian characters in commit details (an author
name here) drive TC crazy. I am still not sure that this is the root cause
(quite weird if so).

I found a PR with the problematic commits [1]. Dmitrii Ryabov, Dmitriy
Sorokin and Mikhail Petrov committed to that PR. Guys, could you please
close the PR so we can check whether it is what makes TC mad? Closing
the PR should do no harm because it can easily be reopened afterwards. Or,
for example, squash the commits and force-push to the PR, avoiding Russian
characters.

Need your assistance here.

[1] https://github.com/apache/ignite/pull/6223/commits
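
For the record, a tiny standalone Java check (illustrative only, not TC code)
confirms that the bytes in the DB error below, '\xD0\xB4\xD0\xBC\xD0\xB8...',
are simply the UTF-8 encoding of the Cyrillic author name. So the commit data
itself looks fine; presumably the TC 'user_name' column uses a charset that
cannot store multi-byte UTF-8 and rejects the insert:

import java.nio.charset.StandardCharsets;

public class Utf8NameCheck {
    public static void main(String[] args) {
        // UTF-8 bytes of the author name from the failed insert.
        byte[] bytes = "дмитрий рябов".getBytes(StandardCharsets.UTF_8);

        StringBuilder hex = new StringBuilder();
        for (int i = 0; i < 6; i++) // the first six bytes cover "дми"
            hex.append(String.format("\\x%02X", bytes[i] & 0xFF));

        // Prints \xD0\xB4\xD0\xBC\xD0\xB8 - exactly the prefix from the MySQL error.
        System.out.println(hex);
    }
}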

On Fri, Apr 26, 2019 at 06:10, Павлухин Иван wrote:
>
> Hi Igniters,
>
> Right now I am observing that many builds cannot make progress for
> many hours so far. TC shows that the majority of agents is free [1].
> But new builds seems fail to start.
>
> Does anyone know what is it and how to resolve it? Perhaps simple
> reboot can help here. Also new codestyle check is a recent change and
> might be somehow related.
>
> Also I see problem with git checkout in some logs:
> [13:45:12] Collecting changes in 1 VCS root (7s)
> [13:45:12] [Collecting changes in 1 VCS root] VCS Root details
> [13:45:18] [Collecting changes in 1 VCS root] Compute revision for
> 'GitHub [apache/ignite]'
> [14:36:57] The build is removed from the queue to be prepared for the start
> [14:36:57] Starting the build on the agent aitc-lin12:06
> [14:36:58] Clearing temporary directory: /opt/buildagent/temp/buildTmp
> [14:36:58] Publishing internal artifacts
> [14:36:58] Clean build enabled: removing old files from
> /opt/buildagent/work/69588afcb2ab3382
> [14:36:58] Checkout directory: /opt/buildagent/work/69588afcb2ab3382
> [14:36:58] Updating sources: server side checkout (running for 15h:17m:37s)
> [14:36:58] [Updating sources] Building clean patch for VCS root:
> GitHub [apache/ignite]
> [14:39:20] [Updating sources] Failed to build patch for build #11255
> {build id=3698721, buildTypeId=IgniteTests24Java8_CacheFailover2}, VCS
> root: "GitHub [apache/ignite]" {instance id=296, parent internal
> id=77, parent id=GitHubApacheIgnite, description:
> "https://github.com/apache/ignite.git#refs/heads/master"}, due to
> error: Patch building failed:
> org.apache.catalina.connector.ClientAbortException:
> java.net.SocketTimeoutException
> [14:39:20] [Updating sources] Transferring repository sources: 36.93
> MB so far...
> [02:40:02] [Updating sources] Transferring repository sources: 36.95
> MB so far...
>
> And a reported problem in "Overview" tab:
> Error collecting changes for VCS repository '"GitHub [apache/ignite]"
> {instance id=296, parent internal id=77, parent id=GitHubApacheIgnite,
> description: "https://github.com/apache/ignite.git#refs/heads/master"}'
> jetbrains.buildServer.serverSide.db.MySQL.MySqlIncorrectStringValueException:
> Incorrect string value: '\xD0\xB4\xD0\xBC\xD0\xB8...' for column
> 'user_name' at row 1 while performing SQL query: SQL DML: insert into
> vcs_history (modification_id, user_name, description, change_date,
> vcs_root_id, version, display_version, changes_count, register_date)
> values (?, ?, ?, ?, ?, ?, ?, ?, ?) | PARAMETERS: 882649, "дмитрий
> рябов ", "fix warming up throttle test\n",
> 1551457143000, 296, "06127b70b2060dade9bd652abc71f2b949508245",
> "06127b70b2060dade9bd652abc71f2b949508245", 1, 1556247859429:
> java.sql.SQLException: Incorrect string value:
> '\xD0\xB4\xD0\xBC\xD0\xB8...' for column 'user_name' at row 1
>
> [1] 
> https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_RunAll&tab=buildTypeBranches
>
> --
> Best regards,
> Ivan Pavlukhin



-- 
Best regards,
Ivan Pavlukhin


TC build queue stall

2019-04-25 Thread Павлухин Иван
Hi Igniters,

Right now I am observing that many builds have been unable to make progress
for many hours. TC shows that the majority of agents are free [1],
but new builds seem to fail to start.

Does anyone know what this is and how to resolve it? Perhaps a simple
reboot can help here. Also, the new codestyle check is a recent change and
might somehow be related.

Also, I see a problem with git checkout in some logs:
[13:45:12] Collecting changes in 1 VCS root (7s)
[13:45:12] [Collecting changes in 1 VCS root] VCS Root details
[13:45:18] [Collecting changes in 1 VCS root] Compute revision for
'GitHub [apache/ignite]'
[14:36:57] The build is removed from the queue to be prepared for the start
[14:36:57] Starting the build on the agent aitc-lin12:06
[14:36:58] Clearing temporary directory: /opt/buildagent/temp/buildTmp
[14:36:58] Publishing internal artifacts
[14:36:58] Clean build enabled: removing old files from
/opt/buildagent/work/69588afcb2ab3382
[14:36:58] Checkout directory: /opt/buildagent/work/69588afcb2ab3382
[14:36:58] Updating sources: server side checkout (running for 15h:17m:37s)
[14:36:58] [Updating sources] Building clean patch for VCS root:
GitHub [apache/ignite]
[14:39:20] [Updating sources] Failed to build patch for build #11255
{build id=3698721, buildTypeId=IgniteTests24Java8_CacheFailover2}, VCS
root: "GitHub [apache/ignite]" {instance id=296, parent internal
id=77, parent id=GitHubApacheIgnite, description:
"https://github.com/apache/ignite.git#refs/heads/master"}, due to
error: Patch building failed:
org.apache.catalina.connector.ClientAbortException:
java.net.SocketTimeoutException
[14:39:20] [Updating sources] Transferring repository sources: 36.93
MB so far...
[02:40:02] [Updating sources] Transferring repository sources: 36.95
MB so far...

And a problem reported in the "Overview" tab:
Error collecting changes for VCS repository '"GitHub [apache/ignite]"
{instance id=296, parent internal id=77, parent id=GitHubApacheIgnite,
description: "https://github.com/apache/ignite.git#refs/heads/master"}'
jetbrains.buildServer.serverSide.db.MySQL.MySqlIncorrectStringValueException:
Incorrect string value: '\xD0\xB4\xD0\xBC\xD0\xB8...' for column
'user_name' at row 1 while performing SQL query: SQL DML: insert into
vcs_history (modification_id, user_name, description, change_date,
vcs_root_id, version, display_version, changes_count, register_date)
values (?, ?, ?, ?, ?, ?, ?, ?, ?) | PARAMETERS: 882649, "дмитрий
рябов ", "fix warming up throttle test\n",
1551457143000, 296, "06127b70b2060dade9bd652abc71f2b949508245",
"06127b70b2060dade9bd652abc71f2b949508245", 1, 1556247859429:
java.sql.SQLException: Incorrect string value:
'\xD0\xB4\xD0\xBC\xD0\xB8...' for column 'user_name' at row 1

[1] 
https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_RunAll&tab=buildTypeBranches

-- 
Best regards,
Ivan Pavlukhin


Re: New committer: Ivan Pavlukhin

2019-04-25 Thread Alexey Zinoviev
Make Ignite Great Again

On Thu, Apr 25, 2019 at 10:39, Павлухин Иван vololo...@gmail.com wrote:

> Thank you!
>
> ср, 24 апр. 2019 г. в 11:48, Andrey Mashenkov  >:
> >
> > Congratulations, Ivan!
> >
> > On Wed, Apr 24, 2019 at 10:50 AM Dmitriy Pavlov 
> wrote:
> >
> > > Dear Apache Ignite Developers,
> > >
> > >
> > > The Project Management Committee (PMC) for Apache Ignite has invited
> Ivan
> > > Pavlukhin to become a committer and we are pleased to announce that he
> has
> > > accepted.
> > >
> > >
> > >
> > > Ivan contributes a lot into community development by being very active
> on
> > > the user and dev lists. He already implemented a lot of stuff in Apache
> > > Ignite, mostly around MVCC component, including the contribution of
> > > deadlock detector for MVCC caches which was a research and development
> > > effort.
> > >
> > >
> > >
> > > Being a committer enables easier contribution to the project since
> there is
> > > no need to go via the patch submission process. This should enable
> better
> > > productivity.
> > >
> > >
> > >
> > > Best Regards,
> > >
> > > Dmitriy Pavlov
> > >
> > > on behalf of the Apache Ignite PMC
> > >
> >
> >
> > --
> > Best regards,
> > Andrey V. Mashenkov
>
>
>
> --
> Best regards,
> Ivan Pavlukhin
>


[jira] [Created] (IGNITE-11808) Scale test execution time constant along in IgniteCacheCrossCacheTxFailoverTest

2019-04-25 Thread Ivan Pavlukhin (JIRA)
Ivan Pavlukhin created IGNITE-11808:
---

 Summary: Scale test execution time constant along in 
IgniteCacheCrossCacheTxFailoverTest
 Key: IGNITE-11808
 URL: https://issues.apache.org/jira/browse/IGNITE-11808
 Project: Ignite
  Issue Type: Bug
  Components: cache
Reporter: Ivan Pavlukhin
Assignee: Ivan Pavlukhin


Execution time of {{IgniteCacheCrossCacheTxFailoverTest}} is scaled using 
{{ScaleFactorUtil}}. Currently, an explicit constant is used to define the test 
execution timeout, while a magic constant is used after scaling to define the 
duration of the test. It is better to use the explicit constant throughout.
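
A minimal illustrative sketch of the intended cleanup (the constant value and the
scale() helper are hypothetical placeholders, not the actual test code or the real
{{ScaleFactorUtil}} API): a single explicit constant drives both the scaled test
duration and the test timeout, so no magic number is needed:

public class CrossCacheTxFailoverTimingSketch {
    /** Single explicit constant used for both the scaled duration and the timeout. */
    private static final long TEST_DURATION_MS = 3 * 60 * 1000L;

    /** Hypothetical stand-in for ScaleFactorUtil: scales a base value by TEST_SCALE_FACTOR. */
    private static long scale(long base) {
        double factor = Double.parseDouble(
            System.getenv().getOrDefault("TEST_SCALE_FACTOR", "1.0"));

        return (long)(base * factor);
    }

    /** Timeout derived from the same constant, so duration and timeout cannot diverge. */
    static long testTimeoutMs() {
        return TEST_DURATION_MS + 60_000L; // duration plus a fixed grace period
    }

    public static void main(String[] args) {
        System.out.println("Scaled test duration, ms: " + scale(TEST_DURATION_MS));
        System.out.println("Test timeout, ms: " + testTimeoutMs());
    }
}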



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: New committer: Ivan Pavlukhin

2019-04-25 Thread Павлухин Иван
Thank you!

On Wed, Apr 24, 2019 at 11:48, Andrey Mashenkov wrote:
>
> Congratulations, Ivan!
>
> On Wed, Apr 24, 2019 at 10:50 AM Dmitriy Pavlov  wrote:
>
> > Dear Apache Ignite Developers,
> >
> >
> > The Project Management Committee (PMC) for Apache Ignite has invited Ivan
> > Pavlukhin to become a committer and we are pleased to announce that he has
> > accepted.
> >
> >
> >
> > Ivan contributes a lot into community development by being very active on
> > the user and dev lists. He already implemented a lot of stuff in Apache
> > Ignite, mostly around MVCC component, including the contribution of
> > deadlock detector for MVCC caches which was a research and development
> > effort.
> >
> >
> >
> > Being a committer enables easier contribution to the project since there is
> > no need to go via the patch submission process. This should enable better
> > productivity.
> >
> >
> >
> > Best Regards,
> >
> > Dmitriy Pavlov
> >
> > on behalf of the Apache Ignite PMC
> >
>
>
> --
> Best regards,
> Andrey V. Mashenkov



-- 
Best regards,
Ivan Pavlukhin


Re: AI 3.0: writeSynchronizationMode re-thinking

2019-04-25 Thread Sergey Kozlov
There's another point to improve:
if *syncPartitions=N* becomes configurable at run-time, it will allow managing
the consistency-performance balance at runtime, e.g. switching to full
async for preloading and then going back to full sync for regular operations.


On Thu, Apr 25, 2019 at 6:48 PM Sergey Kozlov  wrote:

> Vyacheskav,
>
> You're right with the referring to MongoDB doc. In general the idea is
> very similar. Many vendors use such approach (1).
>
> [1]
> https://dev.mysql.com/doc/refman/8.0/en/replication-options-master.html#sysvar_rpl_semi_sync_master_wait_for_slave_count
>
>
>
>
>
>
>
>
> On Thu, Apr 25, 2019 at 6:40 PM Vyacheslav Daradur 
> wrote:
>
>> Hi, Sergey,
>>
>> Makes sense to me in case of performance issues, but may lead to losing
>> data.
>>
>> >> *by the new option *syncPartitions=N* (not best name just for
>> referring)
>>
>> Seems similar to "Write Concern"[1] in MongoDB. It is used in the same
>> way as you described.
>>
>> On the other hand, if you have such issues it should be investigated
>> first: why it causes performance drops: network issues etc.
>>
>> [1] https://docs.mongodb.com/manual/reference/write-concern/
>>
>> On Thu, Apr 25, 2019 at 6:24 PM Sergey Kozlov 
>> wrote:
>> >
>> > Ilya
>> >
>> > See comments inline.
>> > On Thu, Apr 25, 2019 at 5:11 PM Ilya Kasnacheev <
>> ilya.kasnach...@gmail.com>
>> > wrote:
>> >
>> > > Hello!
>> > >
>> > > When you have 2 backups and N = 1, how will conflicts be resolved?
>> > >
>> >
>> > > Imagine that you had N = 1, and primary node failed immediately after
>> > > operation. Now you have one backup that was updated synchronously and
>> one
>> > > which did not. Will they stay unsynced, or is there any mechanism of
>> > > re-syncing?
>> > >
>> >
>> > Same way as Ignite processes the failures for PRIMARY_SYNC.
>> >
>> >
>> > >
>> > > Why would one want to "update for 1 primary and 1 backup
>> synchronously,
>> > > update the rest of backup partitions asynchronously"? What's the use
>> case?
>> > >
>> >
>> > The case to have more backups but do not pay the performance penalty for
>> > that :)
>> > For the distributed systems one backup looks like risky. But more
>> backups
>> > directly impacts to performance.
>> > Other point is to split the strict consistent apps like bank apps and
>> the
>> > other apps like fraud detection, analytics, reports and so on.
>> > In that case you can configure partitions distribution by a custom
>> affinity
>> > and have following:
>> >  - first set of nodes for critical (from consistency point) operations
>> >  - second set of nodes have async backup partitions only for other
>> > operations (reports, analytics)
>> >
>> >
>> >
>> > >
>> > > Regards,
>> > > --
>> > > Ilya Kasnacheev
>> > >
>> > >
>> > > чт, 25 апр. 2019 г. в 16:55, Sergey Kozlov :
>> > >
>> > > > Igniters
>> > > >
>> > > > I'm working with the wide range of cache configurations and found
>> (from
>> > > my
>> > > > standpoint) the interesting point for the discussion:
>> > > >
>> > > > Now we have following *writeSynchronizationMode *options:
>> > > >
>> > > >1. *FULL_ASYNC*
>> > > >   -  primary partition updated asynchronously
>> > > >   -  backup partitions updated asynchronously
>> > > >2. *PRIMARY_SYNC*
>> > > >   - primary partition updated synchronously
>> > > >   - backup partitions updated asynchronously
>> > > >3. *FULL_SYNC*
>> > > >   - primary partition updated synchronously
>> > > >   - backup partitions updated synchronously
>> > > >
>> > > > The approach above is covering everything if you've 0 or 1 backup.
>> > > > But for 2 or more backups we can't reach the following case
>> (something
>> > > > between *PRIMARY_SYNC *and *FULL_SYNC*):
>> > > >  - update for 1 primary and 1 backup synchronously
>> > > >  - update the rest of backup partitions asynchronously
>> > > >
>> > > > The idea is to join all current modes into single one and replace
>> > > > *writeSynchronizationMode
>> > > > *by the new option *syncPartitions=N* (not best name just for
>> referring)
>> > > > covers the approach:
>> > > >
>> > > >- N = 0 means *FULL_ASYNC*
>> > > >- N = (backups+1) means *FULL_SYNC*
>> > > >- 0 < N < (backups+1) means either *PRIMARY_SYNC *(N=1) or new
>> mode
>> > > >described above
>> > > >
>> > > > IMO it will allow to make more flexible and consistent
>> configurations
>> > > >
>> > > > --
>> > > > Sergey Kozlov
>> > > > GridGain Systems
>> > > > www.gridgain.com
>> > > >
>> > >
>> >
>> >
>> > --
>> > Sergey Kozlov
>> > GridGain Systems
>> > www.gridgain.com
>>
>>
>>
>> --
>> Best Regards, Vyacheslav D.
>>
>
>
> --
> Sergey Kozlov
> GridGain Systems
> www.gridgain.com
>


-- 
Sergey Kozlov
GridGain Systems
www.gridgain.com


Re: AI 3.0: writeSynchronizationMode re-thinking

2019-04-25 Thread Sergey Kozlov
Vyacheslav,

You're right in referring to the MongoDB doc. In general the idea is very
similar. Many vendors use such an approach [1].

[1]
https://dev.mysql.com/doc/refman/8.0/en/replication-options-master.html#sysvar_rpl_semi_sync_master_wait_for_slave_count








On Thu, Apr 25, 2019 at 6:40 PM Vyacheslav Daradur 
wrote:

> Hi, Sergey,
>
> Makes sense to me in case of performance issues, but may lead to losing
> data.
>
> >> *by the new option *syncPartitions=N* (not best name just for referring)
>
> Seems similar to "Write Concern"[1] in MongoDB. It is used in the same
> way as you described.
>
> On the other hand, if you have such issues it should be investigated
> first: why it causes performance drops: network issues etc.
>
> [1] https://docs.mongodb.com/manual/reference/write-concern/
>
> On Thu, Apr 25, 2019 at 6:24 PM Sergey Kozlov 
> wrote:
> >
> > Ilya
> >
> > See comments inline.
> > On Thu, Apr 25, 2019 at 5:11 PM Ilya Kasnacheev <
> ilya.kasnach...@gmail.com>
> > wrote:
> >
> > > Hello!
> > >
> > > When you have 2 backups and N = 1, how will conflicts be resolved?
> > >
> >
> > > Imagine that you had N = 1, and primary node failed immediately after
> > > operation. Now you have one backup that was updated synchronously and
> one
> > > which did not. Will they stay unsynced, or is there any mechanism of
> > > re-syncing?
> > >
> >
> > Same way as Ignite processes the failures for PRIMARY_SYNC.
> >
> >
> > >
> > > Why would one want to "update for 1 primary and 1 backup synchronously,
> > > update the rest of backup partitions asynchronously"? What's the use
> case?
> > >
> >
> > The case to have more backups but do not pay the performance penalty for
> > that :)
> > For the distributed systems one backup looks like risky. But more backups
> > directly impacts to performance.
> > Other point is to split the strict consistent apps like bank apps and the
> > other apps like fraud detection, analytics, reports and so on.
> > In that case you can configure partitions distribution by a custom
> affinity
> > and have following:
> >  - first set of nodes for critical (from consistency point) operations
> >  - second set of nodes have async backup partitions only for other
> > operations (reports, analytics)
> >
> >
> >
> > >
> > > Regards,
> > > --
> > > Ilya Kasnacheev
> > >
> > >
> > > чт, 25 апр. 2019 г. в 16:55, Sergey Kozlov :
> > >
> > > > Igniters
> > > >
> > > > I'm working with the wide range of cache configurations and found
> (from
> > > my
> > > > standpoint) the interesting point for the discussion:
> > > >
> > > > Now we have following *writeSynchronizationMode *options:
> > > >
> > > >1. *FULL_ASYNC*
> > > >   -  primary partition updated asynchronously
> > > >   -  backup partitions updated asynchronously
> > > >2. *PRIMARY_SYNC*
> > > >   - primary partition updated synchronously
> > > >   - backup partitions updated asynchronously
> > > >3. *FULL_SYNC*
> > > >   - primary partition updated synchronously
> > > >   - backup partitions updated synchronously
> > > >
> > > > The approach above is covering everything if you've 0 or 1 backup.
> > > > But for 2 or more backups we can't reach the following case
> (something
> > > > between *PRIMARY_SYNC *and *FULL_SYNC*):
> > > >  - update for 1 primary and 1 backup synchronously
> > > >  - update the rest of backup partitions asynchronously
> > > >
> > > > The idea is to join all current modes into single one and replace
> > > > *writeSynchronizationMode
> > > > *by the new option *syncPartitions=N* (not best name just for
> referring)
> > > > covers the approach:
> > > >
> > > >- N = 0 means *FULL_ASYNC*
> > > >- N = (backups+1) means *FULL_SYNC*
> > > >- 0 < N < (backups+1) means either *PRIMARY_SYNC *(N=1) or new
> mode
> > > >described above
> > > >
> > > > IMO it will allow to make more flexible and consistent configurations
> > > >
> > > > --
> > > > Sergey Kozlov
> > > > GridGain Systems
> > > > www.gridgain.com
> > > >
> > >
> >
> >
> > --
> > Sergey Kozlov
> > GridGain Systems
> > www.gridgain.com
>
>
>
> --
> Best Regards, Vyacheslav D.
>


-- 
Sergey Kozlov
GridGain Systems
www.gridgain.com


[jira] [Created] (IGNITE-11807) Index validation control.sh command may provide false-positive error results

2019-04-25 Thread Ivan Rakov (JIRA)
Ivan Rakov created IGNITE-11807:
---

 Summary: Index validation control.sh command may provide 
false-positive error results
 Key: IGNITE-11807
 URL: https://issues.apache.org/jira/browse/IGNITE-11807
 Project: Ignite
  Issue Type: Bug
Reporter: Ivan Rakov
 Fix For: 2.8


There are two possible issues in the validate_indexes command:
1. If index validation is performed under load, there's a chance that 
we'll fetch a link from the B+ tree and won't find this key in the partition cache data 
store because it was concurrently removed.
We may work around this by double-checking partition update counters (before and 
after the index validation procedure).
2. Since index validation is subscribed to checkpoint start (reason: we 
perform CRC validation of file page store pages, which is sensitive to 
concurrent disk page writes), we may bump into the following situation:
- The user stops all load
- A few moments later the user triggers validate_indexes
- A checkpoint starts due to timeout, and pages that were modified before the 
validate_indexes start are written to disk
- validate_indexes fails
We may work around this by forcibly triggering a checkpoint before the index 
validation activities start.
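
A minimal sketch of the first workaround, using hypothetical helper suppliers rather 
than the actual control.sh internals: partition update counters are read before and 
after validation, and if they changed the validation is re-run, because a key missing 
from the partition store may simply have been removed concurrently:

import java.util.Map;
import java.util.Objects;
import java.util.function.Supplier;

public class ValidateIndexesSketch {
    /**
     * Runs index validation and re-runs it once if partition update counters
     * moved while it was running (i.e. the cluster was not idle).
     */
    static boolean validateWithCounterCheck(Supplier<Map<Integer, Long>> readUpdateCounters,
                                            Supplier<Boolean> validateIndexes) {
        Map<Integer, Long> before = readUpdateCounters.get();

        boolean ok = validateIndexes.get();

        Map<Integer, Long> after = readUpdateCounters.get();

        if (!Objects.equals(before, after)) {
            // Counters changed: entries were added or removed concurrently, so a
            // "key not found in partition store" result may be a false positive.
            return validateIndexes.get();
        }

        return ok;
    }
}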



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: AI 3.0: writeSynchronizationMode re-thinking

2019-04-25 Thread Vyacheslav Daradur
Hi, Sergey,

Makes sense to me in case of performance issues, but it may lead to losing data.

>> *by the new option *syncPartitions=N* (not best name just for referring)

It seems similar to "Write Concern" [1] in MongoDB. It is used in the same
way as you described.

On the other hand, if you have such issues, they should be investigated
first: why they cause performance drops (network issues, etc.).

[1] https://docs.mongodb.com/manual/reference/write-concern/

On Thu, Apr 25, 2019 at 6:24 PM Sergey Kozlov  wrote:
>
> Ilya
>
> See comments inline.
> On Thu, Apr 25, 2019 at 5:11 PM Ilya Kasnacheev 
> wrote:
>
> > Hello!
> >
> > When you have 2 backups and N = 1, how will conflicts be resolved?
> >
>
> > Imagine that you had N = 1, and primary node failed immediately after
> > operation. Now you have one backup that was updated synchronously and one
> > which did not. Will they stay unsynced, or is there any mechanism of
> > re-syncing?
> >
>
> Same way as Ignite processes the failures for PRIMARY_SYNC.
>
>
> >
> > Why would one want to "update for 1 primary and 1 backup synchronously,
> > update the rest of backup partitions asynchronously"? What's the use case?
> >
>
> The case to have more backups but do not pay the performance penalty for
> that :)
> For the distributed systems one backup looks like risky. But more backups
> directly impacts to performance.
> Other point is to split the strict consistent apps like bank apps and the
> other apps like fraud detection, analytics, reports and so on.
> In that case you can configure partitions distribution by a custom affinity
> and have following:
>  - first set of nodes for critical (from consistency point) operations
>  - second set of nodes have async backup partitions only for other
> operations (reports, analytics)
>
>
>
> >
> > Regards,
> > --
> > Ilya Kasnacheev
> >
> >
> > чт, 25 апр. 2019 г. в 16:55, Sergey Kozlov :
> >
> > > Igniters
> > >
> > > I'm working with the wide range of cache configurations and found (from
> > my
> > > standpoint) the interesting point for the discussion:
> > >
> > > Now we have following *writeSynchronizationMode *options:
> > >
> > >1. *FULL_ASYNC*
> > >   -  primary partition updated asynchronously
> > >   -  backup partitions updated asynchronously
> > >2. *PRIMARY_SYNC*
> > >   - primary partition updated synchronously
> > >   - backup partitions updated asynchronously
> > >3. *FULL_SYNC*
> > >   - primary partition updated synchronously
> > >   - backup partitions updated synchronously
> > >
> > > The approach above is covering everything if you've 0 or 1 backup.
> > > But for 2 or more backups we can't reach the following case (something
> > > between *PRIMARY_SYNC *and *FULL_SYNC*):
> > >  - update for 1 primary and 1 backup synchronously
> > >  - update the rest of backup partitions asynchronously
> > >
> > > The idea is to join all current modes into single one and replace
> > > *writeSynchronizationMode
> > > *by the new option *syncPartitions=N* (not best name just for referring)
> > > covers the approach:
> > >
> > >- N = 0 means *FULL_ASYNC*
> > >- N = (backups+1) means *FULL_SYNC*
> > >- 0 < N < (backups+1) means either *PRIMARY_SYNC *(N=1) or new mode
> > >described above
> > >
> > > IMO it will allow to make more flexible and consistent configurations
> > >
> > > --
> > > Sergey Kozlov
> > > GridGain Systems
> > > www.gridgain.com
> > >
> >
>
>
> --
> Sergey Kozlov
> GridGain Systems
> www.gridgain.com



-- 
Best Regards, Vyacheslav D.


Re: AI 3.0: writeSynchronizationMode re-thinking

2019-04-25 Thread Sergey Kozlov
Ilya

See comments inline.
On Thu, Apr 25, 2019 at 5:11 PM Ilya Kasnacheev 
wrote:

> Hello!
>
> When you have 2 backups and N = 1, how will conflicts be resolved?
>

> Imagine that you had N = 1, and primary node failed immediately after
> operation. Now you have one backup that was updated synchronously and one
> which did not. Will they stay unsynced, or is there any mechanism of
> re-syncing?
>

The same way Ignite processes such failures for PRIMARY_SYNC.


>
> Why would one want to "update for 1 primary and 1 backup synchronously,
> update the rest of backup partitions asynchronously"? What's the use case?
>

The case is having more backups without paying the performance penalty for
them :)
For distributed systems a single backup looks risky, but more backups
directly impact performance.
Another point is to separate strictly consistent apps, like banking apps, from
other apps such as fraud detection, analytics, reports and so on.
In that case you can configure partition distribution via a custom affinity
and have the following:
 - a first set of nodes for critical (from the consistency standpoint) operations
 - a second set of nodes holding async backup partitions only, for other
operations (reports, analytics)



>
> Regards,
> --
> Ilya Kasnacheev
>
>
> чт, 25 апр. 2019 г. в 16:55, Sergey Kozlov :
>
> > Igniters
> >
> > I'm working with the wide range of cache configurations and found (from
> my
> > standpoint) the interesting point for the discussion:
> >
> > Now we have following *writeSynchronizationMode *options:
> >
> >1. *FULL_ASYNC*
> >   -  primary partition updated asynchronously
> >   -  backup partitions updated asynchronously
> >2. *PRIMARY_SYNC*
> >   - primary partition updated synchronously
> >   - backup partitions updated asynchronously
> >3. *FULL_SYNC*
> >   - primary partition updated synchronously
> >   - backup partitions updated synchronously
> >
> > The approach above is covering everything if you've 0 or 1 backup.
> > But for 2 or more backups we can't reach the following case (something
> > between *PRIMARY_SYNC *and *FULL_SYNC*):
> >  - update for 1 primary and 1 backup synchronously
> >  - update the rest of backup partitions asynchronously
> >
> > The idea is to join all current modes into single one and replace
> > *writeSynchronizationMode
> > *by the new option *syncPartitions=N* (not best name just for referring)
> > covers the approach:
> >
> >- N = 0 means *FULL_ASYNC*
> >- N = (backups+1) means *FULL_SYNC*
> >- 0 < N < (backups+1) means either *PRIMARY_SYNC *(N=1) or new mode
> >described above
> >
> > IMO it will allow to make more flexible and consistent configurations
> >
> > --
> > Sergey Kozlov
> > GridGain Systems
> > www.gridgain.com
> >
>


-- 
Sergey Kozlov
GridGain Systems
www.gridgain.com


Re: AI 3.0: writeSynchronizationMode re-thinking

2019-04-25 Thread Anton Vinogradov
Sergey,

+1 to your proposal.
Looks pretty similar to Kafka's approach.

Ilya,

>>  Will they stay unsynced, or is there any mechanism of re-syncing?
Yes, I'm currently working on that [1].
The current implementation allows restoring the latest value.

[1] https://issues.apache.org/jira/browse/IGNITE-10663

On Thu, Apr 25, 2019 at 5:11 PM Ilya Kasnacheev 
wrote:

> Hello!
>
> When you have 2 backups and N = 1, how will conflicts be resolved?
>
> Imagine that you had N = 1, and primary node failed immediately after
> operation. Now you have one backup that was updated synchronously and one
> which did not. Will they stay unsynced, or is there any mechanism of
> re-syncing?
>
> Why would one want to "update for 1 primary and 1 backup synchronously,
> update the rest of backup partitions asynchronously"? What's the use case?
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> чт, 25 апр. 2019 г. в 16:55, Sergey Kozlov :
>
> > Igniters
> >
> > I'm working with the wide range of cache configurations and found (from
> my
> > standpoint) the interesting point for the discussion:
> >
> > Now we have following *writeSynchronizationMode *options:
> >
> >1. *FULL_ASYNC*
> >   -  primary partition updated asynchronously
> >   -  backup partitions updated asynchronously
> >2. *PRIMARY_SYNC*
> >   - primary partition updated synchronously
> >   - backup partitions updated asynchronously
> >3. *FULL_SYNC*
> >   - primary partition updated synchronously
> >   - backup partitions updated synchronously
> >
> > The approach above is covering everything if you've 0 or 1 backup.
> > But for 2 or more backups we can't reach the following case (something
> > between *PRIMARY_SYNC *and *FULL_SYNC*):
> >  - update for 1 primary and 1 backup synchronously
> >  - update the rest of backup partitions asynchronously
> >
> > The idea is to join all current modes into single one and replace
> > *writeSynchronizationMode
> > *by the new option *syncPartitions=N* (not best name just for referring)
> > covers the approach:
> >
> >- N = 0 means *FULL_ASYNC*
> >- N = (backups+1) means *FULL_SYNC*
> >- 0 < N < (backups+1) means either *PRIMARY_SYNC *(N=1) or new mode
> >described above
> >
> > IMO it will allow to make more flexible and consistent configurations
> >
> > --
> > Sergey Kozlov
> > GridGain Systems
> > www.gridgain.com
> >
>


Re: AI 3.0: writeSynchronizationMode re-thinking

2019-04-25 Thread Ilya Kasnacheev
Hello!

When you have 2 backups and N = 1, how will conflicts be resolved?

Imagine that you had N = 1, and the primary node failed immediately after
an operation. Now you have one backup that was updated synchronously and one
that was not. Will they stay unsynced, or is there any mechanism for
re-syncing?

Why would one want to "update for 1 primary and 1 backup synchronously,
update the rest of backup partitions asynchronously"? What's the use case?

Regards,
-- 
Ilya Kasnacheev


On Thu, Apr 25, 2019 at 16:55, Sergey Kozlov wrote:

> Igniters
>
> I'm working with the wide range of cache configurations and found (from my
> standpoint) the interesting point for the discussion:
>
> Now we have following *writeSynchronizationMode *options:
>
>1. *FULL_ASYNC*
>   -  primary partition updated asynchronously
>   -  backup partitions updated asynchronously
>2. *PRIMARY_SYNC*
>   - primary partition updated synchronously
>   - backup partitions updated asynchronously
>3. *FULL_SYNC*
>   - primary partition updated synchronously
>   - backup partitions updated synchronously
>
> The approach above is covering everything if you've 0 or 1 backup.
> But for 2 or more backups we can't reach the following case (something
> between *PRIMARY_SYNC *and *FULL_SYNC*):
>  - update for 1 primary and 1 backup synchronously
>  - update the rest of backup partitions asynchronously
>
> The idea is to join all current modes into single one and replace
> *writeSynchronizationMode
> *by the new option *syncPartitions=N* (not best name just for referring)
> covers the approach:
>
>- N = 0 means *FULL_ASYNC*
>- N = (backups+1) means *FULL_SYNC*
>- 0 < N < (backups+1) means either *PRIMARY_SYNC *(N=1) or new mode
>described above
>
> IMO it will allow to make more flexible and consistent configurations
>
> --
> Sergey Kozlov
> GridGain Systems
> www.gridgain.com
>


AI 3.0: writeSynchronizationMode re-thinking

2019-04-25 Thread Sergey Kozlov
Igniters

I'm working with a wide range of cache configurations and found (from my
standpoint) an interesting point for discussion:

Now we have the following *writeSynchronizationMode* options:

   1. *FULL_ASYNC*
  -  primary partition updated asynchronously
  -  backup partitions updated asynchronously
   2. *PRIMARY_SYNC*
  - primary partition updated synchronously
  - backup partitions updated asynchronously
   3. *FULL_SYNC*
  - primary partition updated synchronously
  - backup partitions updated synchronously

The approach above covers everything if you have 0 or 1 backups.
But for 2 or more backups we can't express the following case (something
between *PRIMARY_SYNC* and *FULL_SYNC*):
 - update 1 primary and 1 backup synchronously
 - update the rest of the backup partitions asynchronously

The idea is to merge all current modes into a single one and replace
*writeSynchronizationMode* with a new option, *syncPartitions=N* (not the best
name, just for reference), that covers the approach:

   - N = 0 means *FULL_ASYNC*
   - N = (backups+1) means *FULL_SYNC*
   - 0 < N < (backups+1) means either *PRIMARY_SYNC *(N=1) or new mode
   described above

IMO it will allow more flexible and consistent configurations.
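
Just to make the mapping concrete, an illustrative Java sketch. The syncPartitions
option and the toLegacyMode() helper are hypothetical (nothing like this exists in
Ignite today); only the CacheWriteSynchronizationMode values are the existing API:

import org.apache.ignite.cache.CacheWriteSynchronizationMode;

public class SyncPartitionsSketch {
    /** Maps a proposed syncPartitions=N value onto the existing modes where possible. */
    static CacheWriteSynchronizationMode toLegacyMode(int syncPartitions, int backups) {
        if (syncPartitions == 0)
            return CacheWriteSynchronizationMode.FULL_ASYNC;   // N = 0

        if (syncPartitions >= backups + 1)
            return CacheWriteSynchronizationMode.FULL_SYNC;    // N = backups + 1

        if (syncPartitions == 1)
            return CacheWriteSynchronizationMode.PRIMARY_SYNC; // N = 1

        // 1 < N < backups + 1: the new partially-synchronous mode proposed above,
        // which has no equivalent among the current modes.
        throw new UnsupportedOperationException("No existing mode for N=" + syncPartitions);
    }
}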

-- 
Sergey Kozlov
GridGain Systems
www.gridgain.com


Re: Consistency check and fix (review request)

2019-04-25 Thread Anton Vinogradov
Folks,

Just an update.
Based on all your tips I decided to refactor the API, logic, and approach
(mostly everything :)),
so refactoring is currently in progress and you may see an inconsistent PR
state.

Thanks to everyone involved for your tips, reviews, etc.
I'll provide a proper presentation once the refactoring is finished.

On Tue, Apr 16, 2019 at 2:20 PM Anton Vinogradov  wrote:

> Nikolay, that was the first approach
>
> >> I think we should allow to the administrator to enable/disable
> Consistency check.
> In that case, we have to introduce cluster-wide change-strategy operation,
> since every client node should be aware of the change.
> Also, we have to specify caches list, and for each - should we check each
> request or only 5-th and so on.
> Procedure and configuration become overcomplicated in this case.
>
> My idea that specific service will be able to use a special proxy
> according to its own strategy
> (eg. when administrator inside the building and boss is sleeping - all
> operations on "cache[a,b,c]ed*" should check the consistency).
> All service clients will have the same guarantees in that case.
>
> So in other words, consistency should be guaranteed by service, not by
> Ignite.
> Service should guarantee consistency not only using new proxy but, for
> example, using correct isolation fo txs.
> That's not a good Idea to specify isolation mode for Ignite, same
> situation with get-with-consistency-check.
>
> On Tue, Apr 16, 2019 at 12:56 PM Nikolay Izhikov 
> wrote:
>
>> Hello, Anton.
>>
>> > Customer should be able to change strategy on the fly according to
>> time> periods or load.
>>
>> I think we should allow to administrator to enable/disable Consistency
>> check.
>> This option shouldn't be related to application code because "Consistency
>> check" is some kind of maintance procedure.
>>
>> What do you think?
>>
>> В Вт, 16/04/2019 в 12:47 +0300, Anton Vinogradov пишет:
>> > Andrey, thanks for tips
>> >
>> > > > You can perform consistency check using idle verify utility.
>> >
>> > Could you please point to utility's page?
>> > According to its name, it requires to stop the cluster to perform the
>> check?
>> > That's impossible at real production when you should have downtime less
>> > that some minutes per year.
>> > So, the only case I see is to use online check during periods of
>> moderate
>> > activity.
>> >
>> > > > Recovery tool is good idea
>> >
>> > This tool is a part of my IEP.
>> > But recovery tool (process)
>> > - will allow you to check entries in memory only (otherwise, you will
>> warm
>> > up the cluster incorrectly), and that's a problem when you have
>> > persisted/in_memory rate > 10:1
>> > - will cause latency drop for some (eg. 90+ percentile) requests, which
>> is
>> > not acceptable for real production, when we have strict SLA.
>> > - will not guarantee that each operation will use consistent data,
>> > sometimes it's extremely essential
>> > so, the process is a cool idea, but, sometime you may need more.
>> >
>> > Ivan, thanks for analysis
>> >
>> > > > why it comes as an on-demand enabled proxy but not as a mode
>> enabled by
>> >
>> > some configuration property
>> > It's a bad idea to have this feature permanently enabled, it slows down
>> the
>> > system by design.
>> > Customer should be able to change strategy on the fly according to time
>> > periods or load.
>> > Also, we're going to use this proxy for odd requests or for every 5-th,
>> > 10-th, 100-th request depends on the load/time/SLA/etc.
>> > The goal is to perform as much as possible gets-with-consistency
>> operations
>> > without stopping the cluster and never find a problem :)
>> >
>> > > > As for me it will be great if we can improve consistency guarantees
>> >
>> > provided by default.
>> > Once you checked backups you decreased throughput and increased latency.
>> > This feature requred only for some financial, nuclear, health systems
>> when
>> > you should be additionally sure about consistency.
>> > It's like a
>> > - read from backups
>> > - data modification outside the transaction
>> > - using FULL_ASYNC instead of FULL_SYNC,
>> > sometimes it's possible, sometimes not.
>> >
>> > > > 1. It sounds suspicious that reads can cause writes (unexpected
>> >
>> > deadlocks might be possible).
>> > Code performs writes
>> > - key per additional transaction in case original tx was OPTIMISTIC ||
>> > READ_COMMITTED,
>> > - all keys per same tx in case original tx was PESSIMISTIC &&
>> > !READ_COMMITTED, since you already obtain the locks,
>> > so, deadlock should be impossible.
>> >
>> > > > 2. I do not believe that it is possible to implement a (bugless?)
>> >
>> > feature which will fix other bugs.
>> > It does not fix the bugs, it looks for inconsistency (no matter how it
>> > happened) and reports using events (previous state and how it was
>> fixed).
>> > This allows continuing processing for all the entries, even
>> inconsistent.
>> > But, each such fix should be rechecked man

Re: Brainstorm: Make TC Run All faster

2019-04-25 Thread Vyacheslav Daradur
Hi Igniters,

At the moment we have several separate test suites:
* ~Build Apache Ignite~: ~10..20 mins
* [Javadocs]: ~10 mins
* [Licenses Headers]: ~1 min
* [Check Code Style]: ~7 mins
Most of each build's time (except Licenses Headers) is taken by
dependency resolution.

Their main goal is to check that the project builds properly.

Also, the [Javadocs] and [Licenses Headers] profiles are used at the
release preparation step (see DEVNOTES.txt), which means they are important.

I'd suggest uniting these builds; this should reduce the test time by
~15 minutes and free up agents.

What do you think?

On Tue, Nov 27, 2018 at 3:56 PM Павлухин Иван  wrote:
>
> Roman,
>
> Do you have some expectations how faster "correlated" tests
> elimination will make Run All? Also do you have a vision how can we
> determine such "correlated" tests, can we do it relatively fast?
>
> But all in all, I am not sure that reducing a group of correlated
> tests to only one test can show good stability.
> пн, 26 нояб. 2018 г. в 17:48, aplatonov :
> >
> > It should be noticed that additional parameter TEST_SCALE_FACTOR was added.
> > This parameter with ScaleFactorUtil methods can be used for test size
> > scaling for different runs (like ordinary and nightly RunALLs). If someone
> > want to distinguish these builds he/she can apply scaling methods from
> > ScaleFactorUtil in own tests. For nightly test TEST_SCALE_FACTOR=1.0, for
> > non-nightly builds TEST_SCALE_FACTOR<1.0. For example in
> > GridAbstractCacheInterceptorRebalanceTest test ScaleFactorUtil was used for
> > scaling count of iterations. I guess that TEST_SCALE_FACTOR support will be
> > added to runs at the same time with RunALL (nightly) runs.
> >
> >
> >
> > --
> > Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
>
>
>
> --
> Best regards,
> Ivan Pavlukhin



-- 
Best Regards, Vyacheslav D.