date:20230814

Yury Gerzhedovich created IGNITE-20202:
--

 Summary: Base metrics for SQL thread pools
 Key: IGNITE-20202
 URL: https://issues.apache.org/jira/browse/IGNITE-20202
 Project: Ignite
  Issue Type: Improvement
  Components: sql
Reporter: Yury Gerzhedovich


Let's introduce queue size for SQL execution and planning thread pools.
Type is a Gauge. Suggested names are
* {code:java}
sql.plan.QueueSize
{code}
* {code:java}
sql.execution.thread..QueueSize
{code}




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20202) Base metrics for SQL thread pools



 [ 
https://issues.apache.org/jira/browse/IGNITE-20202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yury Gerzhedovich updated IGNITE-20202:
---
Epic Link: IGNITE-17353

> Base metrics for SQL thread pools
> -
>
> Key: IGNITE-20202
> URL: https://issues.apache.org/jira/browse/IGNITE-20202
> Project: Ignite
>  Issue Type: Improvement
>  Components: sql
>Reporter: Yury Gerzhedovich
>Priority: Major
>  Labels: ignite-3
>
> Let's introduce queue size for SQL execution and planning thread pools.
> Type is a Gauge. Suggested names are
> * {code:java}
> sql.plan.QueueSize
> {code}
> * {code:java}
> sql.execution.thread..QueueSize
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-19009) Introduce file transfer support in messaging



 [ 
https://issues.apache.org/jira/browse/IGNITE-19009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Gagarkin updated IGNITE-19009:
---
Attachment: file_transfer.drawio.png

> Introduce file transfer support in messaging
> 
>
> Key: IGNITE-19009
> URL: https://issues.apache.org/jira/browse/IGNITE-19009
> Project: Ignite
>  Issue Type: Improvement
>  Components: networking
>Reporter: Mikhail Pochatkin
>Assignee: Ivan Gagarkin
>Priority: Major
>  Labels: iep-103, ignite-3
> Attachments: file_transfer.drawio.png
>
>  Time Spent: 10h 40m
>  Remaining Estimate: 0h
>
> Current implemenatation of Network force to load deployment unit content to 
> heap as byte[]. This is a potential easy way of OOM.
>  
> As solution we need to introduce lazy buffer in Network code where file will 
> readed by chunks. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (IGNITE-20203) File transfer for Ignite 3

Ivan Gagarkin created IGNITE-20203:
--

 Summary: File transfer for Ignite 3
 Key: IGNITE-20203
 URL: https://issues.apache.org/jira/browse/IGNITE-20203
 Project: Ignite
  Issue Type: Epic
Reporter: Ivan Gagarkin






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (IGNITE-20204) Use FileTransferService for transfer deployment units between nodes

Ivan Gagarkin created IGNITE-20204:
--

 Summary: Use FileTransferService for transfer deployment units 
between nodes
 Key: IGNITE-20204
 URL: https://issues.apache.org/jira/browse/IGNITE-20204
 Project: Ignite
  Issue Type: Improvement
  Components: compute
Reporter: Ivan Gagarkin






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Assigned] (IGNITE-19889) Implement observable timestamp on server

2023-08-14 Thread Vladislav Pyatkov (Jira)



 [ 
https://issues.apache.org/jira/browse/IGNITE-19889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov reassigned IGNITE-19889:
--

Assignee: Vladislav Pyatkov

> Implement observable timestamp on server
> 
>
> Key: IGNITE-19889
> URL: https://issues.apache.org/jira/browse/IGNITE-19889
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Labels: ignite-3
>
> *Motivation*
> Client timestamp is used to determine a read timestamp for RO transaction on 
> client-side (IGNITE-19888). For consistency behavior, need to implement a 
> similar timestamp on server.
> *Implementation note*
> The last server observable timestamp should update at least when the 
> transaction commuted.
> Any RO transaction should use the timestamp: for SQL (IGNITE-19898) and 
> through key-value API (IGNITE-19887)
> *Definition of done*
> All serve-side created RO transactions should execute in past with timestamp 
> has been determining by last observation time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20204) Use FileTransferService for transfer deployment units between nodes



 [ 
https://issues.apache.org/jira/browse/IGNITE-20204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Gagarkin updated IGNITE-20204:
---
Description: 
The current implementation of Network force to load deployment unit content to 
the heap as byte[]. This is a potentially easy way of OOM.

 

  was:
Current implementation of Network force to load deployment unit content to the 
heap as byte[]. This is a potentially easy way of OOM.

 


> Use FileTransferService for transfer deployment units between nodes
> ---
>
> Key: IGNITE-20204
> URL: https://issues.apache.org/jira/browse/IGNITE-20204
> Project: Ignite
>  Issue Type: Improvement
>  Components: compute
>Reporter: Ivan Gagarkin
>Priority: Major
>  Labels: ignite-3
>
> The current implementation of Network force to load deployment unit content 
> to the heap as byte[]. This is a potentially easy way of OOM.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20204) Use FileTransferService for transfer deployment units between nodes



 [ 
https://issues.apache.org/jira/browse/IGNITE-20204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Gagarkin updated IGNITE-20204:
---
Description: 
Current implementation of Network force to load deployment unit content to the 
heap as byte[]. This is a potentially easy way of OOM.

 

  was:
Current implemenatation of Network force to load deployment unit content to 
heap as byte[]. This is a potential easy way of OOM.

 


> Use FileTransferService for transfer deployment units between nodes
> ---
>
> Key: IGNITE-20204
> URL: https://issues.apache.org/jira/browse/IGNITE-20204
> Project: Ignite
>  Issue Type: Improvement
>  Components: compute
>Reporter: Ivan Gagarkin
>Priority: Major
>  Labels: ignite-3
>
> Current implementation of Network force to load deployment unit content to 
> the heap as byte[]. This is a potentially easy way of OOM.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20204) Use FileTransferService for transfer deployment units between nodes



 [ 
https://issues.apache.org/jira/browse/IGNITE-20204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Gagarkin updated IGNITE-20204:
---
Description: 
The current implementation of Network forces to load deployment unit content to 
the heap as byte[]. This is a potentially easy way of OOM.

 

  was:
The current implementation of Network force to load deployment unit content to 
the heap as byte[]. This is a potentially easy way of OOM.

 


> Use FileTransferService for transfer deployment units between nodes
> ---
>
> Key: IGNITE-20204
> URL: https://issues.apache.org/jira/browse/IGNITE-20204
> Project: Ignite
>  Issue Type: Improvement
>  Components: compute
>Reporter: Ivan Gagarkin
>Priority: Major
>  Labels: ignite-3
>
> The current implementation of Network forces to load deployment unit content 
> to the heap as byte[]. This is a potentially easy way of OOM.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20204) Use FileTransferService for transfer deployment units between nodes



 [ 
https://issues.apache.org/jira/browse/IGNITE-20204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Gagarkin updated IGNITE-20204:
---
Description: 
Current implemenatation of Network force to load deployment unit content to 
heap as byte[]. This is a potential easy way of OOM.

 

> Use FileTransferService for transfer deployment units between nodes
> ---
>
> Key: IGNITE-20204
> URL: https://issues.apache.org/jira/browse/IGNITE-20204
> Project: Ignite
>  Issue Type: Improvement
>  Components: compute
>Reporter: Ivan Gagarkin
>Priority: Major
>  Labels: ignite-3
>
> Current implemenatation of Network force to load deployment unit content to 
> heap as byte[]. This is a potential easy way of OOM.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20204) Use FileTransferService for transfer deployment units between nodes



 [ 
https://issues.apache.org/jira/browse/IGNITE-20204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Gagarkin updated IGNITE-20204:
---
Description: 
Currently, Ignite 3 loads deployment units content to the heap as byte[] when 
transfer units 

The current implementation of Network forces to load deployment unit content to 
the heap as byte[]. This is a potentially easy way of OOM.

 

  was:
The current implementation of Network forces to load deployment unit content to 
the heap as byte[]. This is a potentially easy way of OOM.

 


> Use FileTransferService for transfer deployment units between nodes
> ---
>
> Key: IGNITE-20204
> URL: https://issues.apache.org/jira/browse/IGNITE-20204
> Project: Ignite
>  Issue Type: Improvement
>  Components: compute
>Reporter: Ivan Gagarkin
>Priority: Major
>  Labels: ignite-3
>
> Currently, Ignite 3 loads deployment units content to the heap as byte[] when 
> transfer units 
> The current implementation of Network forces to load deployment unit content 
> to the heap as byte[]. This is a potentially easy way of OOM.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20204) Use FileTransferService for transfer deployment units between nodes



 [ 
https://issues.apache.org/jira/browse/IGNITE-20204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Gagarkin updated IGNITE-20204:
---
Description: 
Currently, Ignite 3 loads deployment units content to the heap as byte[] when 
transferring units between nodes. This is a potentially easy way of OOM.

We should replace deployment transferring mechanism by 

 

  was:
Currently, Ignite 3 loads deployment units content to the heap as byte[] when 
transfer units 

The current implementation of Network forces to load deployment unit content to 
the heap as byte[]. This is a potentially easy way of OOM.

 


> Use FileTransferService for transfer deployment units between nodes
> ---
>
> Key: IGNITE-20204
> URL: https://issues.apache.org/jira/browse/IGNITE-20204
> Project: Ignite
>  Issue Type: Improvement
>  Components: compute
>Reporter: Ivan Gagarkin
>Priority: Major
>  Labels: ignite-3
>
> Currently, Ignite 3 loads deployment units content to the heap as byte[] when 
> transferring units between nodes. This is a potentially easy way of OOM.
> We should replace deployment transferring mechanism by 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20204) Use FileTransferService for transfer deployment units between nodes



 [ 
https://issues.apache.org/jira/browse/IGNITE-20204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Gagarkin updated IGNITE-20204:
---
Description: 
Currently, Ignite 3 loads deployment units content to the heap as byte[] when 
transferring units between nodes. This is a potentially easy way of OOM.

We should replace deployment transferring mechanism by FileTransferService.

 

  was:
Currently, Ignite 3 loads deployment units content to the heap as byte[] when 
transferring units between nodes. This is a potentially easy way of OOM.

We should replace deployment transferring mechanism by 

 


> Use FileTransferService for transfer deployment units between nodes
> ---
>
> Key: IGNITE-20204
> URL: https://issues.apache.org/jira/browse/IGNITE-20204
> Project: Ignite
>  Issue Type: Improvement
>  Components: compute
>Reporter: Ivan Gagarkin
>Priority: Major
>  Labels: ignite-3
>
> Currently, Ignite 3 loads deployment units content to the heap as byte[] when 
> transferring units between nodes. This is a potentially easy way of OOM.
> We should replace deployment transferring mechanism by FileTransferService.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20204) Use FileTransferService for transfer deployment units between nodes



 [ 
https://issues.apache.org/jira/browse/IGNITE-20204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Gagarkin updated IGNITE-20204:
---
Description: 
Currently, Ignite 3 loads deployment units content to the heap as byte[] when 
transferring units between nodes. This is a potentially easy way of OOM.

We should replace the deployment transferring mechanism with 
FileTransferService.

 

  was:
Currently, Ignite 3 loads deployment units content to the heap as byte[] when 
transferring units between nodes. This is a potentially easy way of OOM.

We should replace deployment transferring mechanism by FileTransferService.

 


> Use FileTransferService for transfer deployment units between nodes
> ---
>
> Key: IGNITE-20204
> URL: https://issues.apache.org/jira/browse/IGNITE-20204
> Project: Ignite
>  Issue Type: Improvement
>  Components: compute
>Reporter: Ivan Gagarkin
>Priority: Major
>  Labels: ignite-3
>
> Currently, Ignite 3 loads deployment units content to the heap as byte[] when 
> transferring units between nodes. This is a potentially easy way of OOM.
> We should replace the deployment transferring mechanism with 
> FileTransferService.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (IGNITE-20205) TxLocalTest#testBalance is flaky

Denis Chudov created IGNITE-20205:
-

 Summary: TxLocalTest#testBalance is flaky
 Key: IGNITE-20205
 URL: https://issues.apache.org/jira/browse/IGNITE-20205
 Project: Ignite
  Issue Type: Bug
Reporter: Denis Chudov


TxLocalTest is actually a mock of transactional logic based on local dummy 
table. Seems there are some problems with finalizing the transactions 
transferring money between accounts which causes lock exceptions on checking 
final sum over all accounts. Most likely there is a problem with mock because 
there is no similar problem with other test classes for this test.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (IGNITE-20206) Improve parallelism in FileTransferService on the file chunks level

Ivan Gagarkin created IGNITE-20206:
--

 Summary: Improve parallelism in FileTransferService on the file 
chunks level
 Key: IGNITE-20206
 URL: https://issues.apache.org/jira/browse/IGNITE-20206
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Gagarkin


The current implementation of 
`{{{}org.apache.ignite.internal.network.file.FileSender`{}}} has parallelism at 
the file level. It works well when there are many files to send. It may be 
worth improving by implementing parallelism at the file chunk level.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20206) Improve parallelism in FileTransferService on the file chunks level



 [ 
https://issues.apache.org/jira/browse/IGNITE-20206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Gagarkin updated IGNITE-20206:
---
Description: The current implementation of 
{{org.apache.ignite.internal.network.file.FileSender}} has parallelism at the 
file level. It works well when there are many files to send. It may be worth 
improving by implementing parallelism at the file chunk level.  (was: The 
current implementation of 
`{{{}org.apache.ignite.internal.network.file.FileSender`{}}} has parallelism at 
the file level. It works well when there are many files to send. It may be 
worth improving by implementing parallelism at the file chunk level.)

> Improve parallelism in FileTransferService on the file chunks level
> ---
>
> Key: IGNITE-20206
> URL: https://issues.apache.org/jira/browse/IGNITE-20206
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Gagarkin
>Priority: Major
>  Labels: ignite-3
>
> The current implementation of 
> {{org.apache.ignite.internal.network.file.FileSender}} has parallelism at the 
> file level. It works well when there are many files to send. It may be worth 
> improving by implementing parallelism at the file chunk level.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20206) Improve parallelism of sending files in FileTransferService



 [ 
https://issues.apache.org/jira/browse/IGNITE-20206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Gagarkin updated IGNITE-20206:
---
Summary: Improve parallelism of sending files in FileTransferService  (was: 
Improve parallelism in FileTransferService on the file chunks level)

> Improve parallelism of sending files in FileTransferService
> ---
>
> Key: IGNITE-20206
> URL: https://issues.apache.org/jira/browse/IGNITE-20206
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Gagarkin
>Priority: Major
>  Labels: ignite-3
>
> The current implementation of 
> {{org.apache.ignite.internal.network.file.FileSender}} has parallelism at the 
> file level. It works well when there are many files to send. It may be worth 
> improving by implementing parallelism at the file chunk level.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20206) Improve parallelism the sending of files in FileTransferService



 [ 
https://issues.apache.org/jira/browse/IGNITE-20206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Gagarkin updated IGNITE-20206:
---
Summary: Improve parallelism the sending of files in FileTransferService  
(was: Improve parallelism of sending files in FileTransferService)

> Improve parallelism the sending of files in FileTransferService
> ---
>
> Key: IGNITE-20206
> URL: https://issues.apache.org/jira/browse/IGNITE-20206
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Gagarkin
>Priority: Major
>  Labels: ignite-3
>
> The current implementation of 
> {{org.apache.ignite.internal.network.file.FileSender}} has parallelism at the 
> file level. It works well when there are many files to send. It may be worth 
> improving by implementing parallelism at the file chunk level.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (IGNITE-20207) Improve the writing of files in FileTransferService

Ivan Gagarkin created IGNITE-20207:
--

 Summary: Improve the writing of files in FileTransferService
 Key: IGNITE-20207
 URL: https://issues.apache.org/jira/browse/IGNITE-20207
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Gagarkin


The current implementation of 
{{org.apache.ignite.internal.network.file.ChunkedFileWriter}} compares the file 
pointer with the offset of the received file chunk. If they are equal, the 
chunk is written to the disk; if not, the chunk is placed in the queue, and it 
will be written when all previous chunks have been written.

It might be more efficient to write chunks instantly.

We should investigate this approach and improve the implementation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (IGNITE-20208) Use file ids instead of file names when transferring file chunks

Ivan Gagarkin created IGNITE-20208:
--

 Summary: Use file ids instead of file names when transferring file 
chunks
 Key: IGNITE-20208
 URL: https://issues.apache.org/jira/browse/IGNITE-20208
 Project: Ignite
  Issue Type: Improvement
Reporter: Ivan Gagarkin


We can decrease the size of 
org.apache.ignite.internal.network.file.messages.FileChunkMessage by replacing 
file names with file ids. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20124) Prevent double storage updates within primary



 [ 
https://issues.apache.org/jira/browse/IGNITE-20124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denis Chudov updated IGNITE-20124:
--
Description: 
h3. Motivation

In order to preserve the guarantee that the primary replica is always 
up-to-date it's required to:
 * In case of common RW transaction - insert writeIntent to the storage within 
primary before replication.
 * In case of one-phase-commit - insert commitedWrite after the replication.

Both have already been done. However, that means that if primary is part of the 
replication group, and it's true in almost all cases, we will double the update:
 * In case of common RW transaction - through the replication.
 * In case of one-phase-commit - either through the replication, or though post 
update, if replication was fast enough.

h3. Definition of Done
 * Prevent double storage updates within primary.

h3. Implementation Notes

The easiest way to prevent double insert is to skip one if local safe time is 
greater or equal to candidates. There are 3 places where we update partition 
storage:
 # Primary pre-replication update. In that case, the second update on 
replication should be excluded.
 # Primary post-replication update in case of 1PC. It's possible to see already 
updated data if replication was already processed locally. It is expected to be 
already covered in https://issues.apache.org/jira/browse/IGNITE-15927 . We 
should check the primary safe time on post-replication update and don't do 
update if the safe time is already adjusted.
 # Insert through replication. In case of !1PC on every primary there will be 
double insert (see 1). In case of 1PC it depends, so we should check the safe 
time on primary to know whether the update should be done (see 2).

In every case, the storage indexes still should be adjusted on replication, as 
it is done now, because the progress of indexes on FSM write operations should 
not be violated - otherwise, a Raft snapshot-based rebalance would be broken.

  was:
h3. Motivation

In order to preserve the guarantee that the primary replica is always 
up-to-date it's required to:
 * In case of common RW transaction - insert writeIntent to the storage within 
primary before replication.
 * In case of one-phase-commit - insert commitedWrite after the replication.

Both have already been done. However, that means that if primary is part of the 
replication group, and it's true in almost all cases, we will double the insert:
 * In case of common RW transaction - through the replication.
 * In case of one-phase-commit - either through the replication, or though post 
insert, if replication was fast enough.

h3. Definition of Done
 * Prevent double storage updates within primary.

h3. Implementation Notes

The easiest way to prevent double insert is to skip one if local safe time is 
greater or equal to candidates. There are 3 places where we update partition 
storage:
 # Primary pre-replication insert. In that case, it's never possible to see 
already adjusted data.
 # Primary post-replication insert in case of 1PC. It's possible to see already 
inserted data if replication was already processed locally. It is expected to 
be already covered in https://issues.apache.org/jira/browse/IGNITE-15927
 # Insert through replication. In case of !1PC on every primary there will be 
double insert. In case of 1PC it depends.


> Prevent double storage updates within primary
> -
>
> Key: IGNITE-20124
> URL: https://issues.apache.org/jira/browse/IGNITE-20124
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Alexander Lapin
>Priority: Major
>  Labels: ignite-3, transactions
>
> h3. Motivation
> In order to preserve the guarantee that the primary replica is always 
> up-to-date it's required to:
>  * In case of common RW transaction - insert writeIntent to the storage 
> within primary before replication.
>  * In case of one-phase-commit - insert commitedWrite after the replication.
> Both have already been done. However, that means that if primary is part of 
> the replication group, and it's true in almost all cases, we will double the 
> update:
>  * In case of common RW transaction - through the replication.
>  * In case of one-phase-commit - either through the replication, or though 
> post update, if replication was fast enough.
> h3. Definition of Done
>  * Prevent double storage updates within primary.
> h3. Implementation Notes
> The easiest way to prevent double insert is to skip one if local safe time is 
> greater or equal to candidates. There are 3 places where we update partition 
> storage:
>  # Primary pre-replication update. In that case, the second update on 
> replication should be excluded.
>  # Primary post-replication update in case of 1PC. It's possible to see 
> already updated data if replication was

[jira] [Updated] (IGNITE-20202) Base metrics for SQL thread pools



 [ 
https://issues.apache.org/jira/browse/IGNITE-20202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yury Gerzhedovich updated IGNITE-20202:
---
Description: 
Let's introduce queue size for SQL execution and planning thread pools.
Type is a Gauge. Suggested names are
* {code:java}
sql.plan.QueueSize
{code}
* {code:java}
sql.execution.stripe..QueueSize
{code}


  was:
Let's introduce queue size for SQL execution and planning thread pools.
Type is a Gauge. Suggested names are
* {code:java}
sql.plan.QueueSize
{code}
* {code:java}
sql.execution.thread..QueueSize
{code}



> Base metrics for SQL thread pools
> -
>
> Key: IGNITE-20202
> URL: https://issues.apache.org/jira/browse/IGNITE-20202
> Project: Ignite
>  Issue Type: Improvement
>  Components: sql
>Reporter: Yury Gerzhedovich
>Priority: Major
>  Labels: ignite-3
>
> Let's introduce queue size for SQL execution and planning thread pools.
> Type is a Gauge. Suggested names are
> * {code:java}
> sql.plan.QueueSize
> {code}
> * {code:java}
> sql.execution.stripe..QueueSize
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20124) Prevent double storage updates within primary



 [ 
https://issues.apache.org/jira/browse/IGNITE-20124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denis Chudov updated IGNITE-20124:
--
Description: 
h3. Motivation

In order to preserve the guarantee that the primary replica is always 
up-to-date it's required to:
 * In case of common RW transaction - insert writeIntent to the storage within 
primary before replication.
 * In case of one-phase-commit - insert commitedWrite after the replication.

Both have already been done. However, that means that if primary is part of the 
replication group, and it's true in almost all cases, we will double the update:
 * In case of common RW transaction - through the replication.
 * In case of one-phase-commit - either through the replication, or though post 
update, if replication was fast enough.

h3. Definition of Done
 * Prevent double storage updates within primary.

h3. Implementation Notes

The easiest way to prevent double insert is to skip one if local safe time is 
greater or equal to candidates. There are 3 places where we update partition 
storage:
 # Primary pre-replication update. In that case, the second update on 
replication should be excluded.
 # Primary post-replication update in case of 1PC. It's possible to see already 
updated data if replication was already processed locally. It is expected to be 
already covered in https://issues.apache.org/jira/browse/IGNITE-15927 . We 
should check the primary safe time on post-replication update and don't do 
update if the safe time is already adjusted.
 # Insert through replication. In case of !1PC on every primary there will be 
double insert (see 1). In case of 1PC it depends, so we should check the safe 
time on primary to know whether the update should be done (see 2).

In every case, the storage indexes still should be adjusted on replication, as 
it is done now, because the progress of indexes on FSM write operations should 
not be violated - otherwise, a Raft snapshot-based rebalance would be broken. 
We may have two non-consistent storage updates on primary which may affect 
different fsyncs, 

  was:
h3. Motivation

In order to preserve the guarantee that the primary replica is always 
up-to-date it's required to:
 * In case of common RW transaction - insert writeIntent to the storage within 
primary before replication.
 * In case of one-phase-commit - insert commitedWrite after the replication.

Both have already been done. However, that means that if primary is part of the 
replication group, and it's true in almost all cases, we will double the update:
 * In case of common RW transaction - through the replication.
 * In case of one-phase-commit - either through the replication, or though post 
update, if replication was fast enough.

h3. Definition of Done
 * Prevent double storage updates within primary.

h3. Implementation Notes

The easiest way to prevent double insert is to skip one if local safe time is 
greater or equal to candidates. There are 3 places where we update partition 
storage:
 # Primary pre-replication update. In that case, the second update on 
replication should be excluded.
 # Primary post-replication update in case of 1PC. It's possible to see already 
updated data if replication was already processed locally. It is expected to be 
already covered in https://issues.apache.org/jira/browse/IGNITE-15927 . We 
should check the primary safe time on post-replication update and don't do 
update if the safe time is already adjusted.
 # Insert through replication. In case of !1PC on every primary there will be 
double insert (see 1). In case of 1PC it depends, so we should check the safe 
time on primary to know whether the update should be done (see 2).

In every case, the storage indexes still should be adjusted on replication, as 
it is done now, because the progress of indexes on FSM write operations should 
not be violated - otherwise, a Raft snapshot-based rebalance would be broken.


> Prevent double storage updates within primary
> -
>
> Key: IGNITE-20124
> URL: https://issues.apache.org/jira/browse/IGNITE-20124
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Alexander Lapin
>Priority: Major
>  Labels: ignite-3, transactions
>
> h3. Motivation
> In order to preserve the guarantee that the primary replica is always 
> up-to-date it's required to:
>  * In case of common RW transaction - insert writeIntent to the storage 
> within primary before replication.
>  * In case of one-phase-commit - insert commitedWrite after the replication.
> Both have already been done. However, that means that if primary is part of 
> the replication group, and it's true in almost all cases, we will double the 
> update:
>  * In case of common RW transaction - through the replication.
>  * In case of one-phase-commit - either through the rep

[jira] [Comment Edited] (IGNITE-20165) Revisit the configuration of thread pools used by JRaft



[ 
https://issues.apache.org/jira/browse/IGNITE-20165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17753233#comment-17753233
 ] 

Mirza Aliev edited comment on IGNITE-20165 at 8/14/23 9:04 AM:
---

By default, all executors are shared among the instance of Loza, meaning that 
all raft groups share executors.

Below I've represented all JRaft executors with short description and the 
number of threads  
||Pool name||Description||Number of Threads||
|JRaft-Common-Executor|A pool for processing short-lived asynchronous tasks. 
Should never be blocked.|Utils.cpus() (core == max)|
|JRaft-Node-Scheduler|A scheduled executor for running delayed or repeating 
tasks.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE, 
DelayedWorkQueue)|
|JRaft-Request-Processor|A default pool for handling RAFT requests. Should 
never be blocked.|Utils.cpus() * 6 (core == max)|
|JRaft-Response-Processor|A default pool for handling RAFT responses. Should 
never be blocked.|80 (core == max/3, workQueue == 1)|
|JRaft-AppendEntries-Processor|A pool of single thread executors. Used only if 
a replication pipelining is enabled. Handles append entries requests and 
responses (used by the replication flow). Threads are started on demand. Each 
replication pair (leader-follower) uses dedicated single thread executor from 
the pool, so all messages between replication peer pairs are processed 
sequentially.|SystemPropertyUtil.getInt(
"jraft.append.entries.threads.send", Math.max(16, 
Ints.findNextPositivePowerOfTwo(cpus() * 2)));|
|NodeImpl-Disruptor|A striped disruptor for batching FSM (finite state machine) 
user tasks.|DEFAULT_STRIPES = Utils.cpus() * 2|
|ReadOnlyService-Disruptor|A striped disruptor for batching read requests 
before doing read index request.|DEFAULT_STRIPES = Utils.cpus() * 2|
|LogManager-Disruptor|A striped disruptor for delivering log entries to a 
storage.|DEFAULT_STRIPES = Utils.cpus() * 2|
|FSMCaller-Disruptor|A striped disruptor for FSM callbacks.|DEFAULT_STRIPES = 
Utils.cpus() * 2|
|SnapshotTimer|A timer for periodic snapshot creation.|Math.min(Utils.cpus() * 
3, 20) (core, max == Integer.MAX_VALUE)|
|ElectionTimer|A timer to handle election timeout on 
followers.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE)|
|VoteTimer|A timer to handle vote timeout when a leader was not confirmed by 
majority.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE)|
|StepDownTimer|A timer to process leader step down 
condition.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE)|


was (Author: maliev):
By default, all executors are shared among the instance of Loza, meaning that 
all raft groups share executors.

Below I've represented all JRaft executors with short description and the 
number of threads  
||Pool name||Description||Number of Threads||
|JRaft-Common-Executor|A pool for processing short-lived asynchronous tasks. 
Should never be blocked.|Utils.cpus() (core == max)|
|JRaft-Node-Scheduler|A scheduled executor for running delayed or repeating 
tasks.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE)|
|JRaft-Request-Processor|A default pool for handling RAFT requests. Should 
never be blocked.|Utils.cpus() * 6 (core == max)|
|JRaft-Response-Processor|A default pool for handling RAFT responses. Should 
never be blocked.|80 (core == max/3, workQueue == 1)|
|JRaft-AppendEntries-Processor|A pool of single thread executors. Used only if 
a replication pipelining is enabled. Handles append entries requests and 
responses (used by the replication flow). Threads are started on demand. Each 
replication pair (leader-follower) uses dedicated single thread executor from 
the pool, so all messages between replication peer pairs are processed 
sequentially.|SystemPropertyUtil.getInt(
"jraft.append.entries.threads.send", Math.max(16, 
Ints.findNextPositivePowerOfTwo(cpus() * 2)));|
|NodeImpl-Disruptor|A striped disruptor for batching FSM (finite state machine) 
user tasks.|DEFAULT_STRIPES = Utils.cpus() * 2|
|ReadOnlyService-Disruptor|A striped disruptor for batching read requests 
before doing read index request.|DEFAULT_STRIPES = Utils.cpus() * 2|
|LogManager-Disruptor|A striped disruptor for delivering log entries to a 
storage.|DEFAULT_STRIPES = Utils.cpus() * 2|
|FSMCaller-Disruptor|A striped disruptor for FSM callbacks.|DEFAULT_STRIPES = 
Utils.cpus() * 2|
|SnapshotTimer|A timer for periodic snapshot creation.|Math.min(Utils.cpus() * 
3, 20) (core, max == Integer.MAX_VALUE)|
|ElectionTimer|A timer to handle election timeout on 
followers.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE)|
|VoteTimer|A timer to handle vote timeout when a leader was not confirmed by 
majority.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE)|
|StepDownTimer|A timer to process leader step down 
condition.|Math.min(Utils.cpus() * 3, 20)

[jira] [Updated] (IGNITE-20124) Prevent double storage updates within primary



 [ 
https://issues.apache.org/jira/browse/IGNITE-20124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denis Chudov updated IGNITE-20124:
--
Description: 
h3. Motivation

In order to preserve the guarantee that the primary replica is always 
up-to-date it's required to:
 * In case of common RW transaction - insert writeIntent to the storage within 
primary before replication.
 * In case of one-phase-commit - insert commitedWrite after the replication.

Both have already been done. However, that means that if primary is part of the 
replication group, and it's true in almost all cases, we will double the update:
 * In case of common RW transaction - through the replication.
 * In case of one-phase-commit - either through the replication, or though post 
update, if replication was fast enough.

h3. Definition of Done
 * Prevent double storage updates within primary.

h3. Implementation Notes

The easiest way to prevent double insert is to skip one if local safe time is 
greater or equal to candidates. There are 3 places where we update partition 
storage:
 # Primary pre-replication update. In that case, the second update on 
replication should be excluded.
 # Primary post-replication update in case of 1PC. It's possible to see already 
updated data if replication was already processed locally. It is expected to be 
already covered in https://issues.apache.org/jira/browse/IGNITE-15927 . We 
should check the primary safe time on post-replication update and don't do 
update if the safe time is already adjusted.
 # Insert through replication. In case of !1PC on every primary there will be 
double insert (see 1). In case of 1PC it depends, so we should check the safe 
time on primary to know whether the update should be done (see 2).

In every case, the storage indexes still should be adjusted on replication, as 
it is done now, because the progress of indexes on FSM write operations should 
not be violated - otherwise, a Raft snapshot-based rebalance would be broken. 
We may have two non-consistent storage updates on primary which may affect 
different fsyncs, so maybe we should benchmark this optimization to find out 
how useful it is. The transactional correctness isn't violated by these 

 

  was:
h3. Motivation

In order to preserve the guarantee that the primary replica is always 
up-to-date it's required to:
 * In case of common RW transaction - insert writeIntent to the storage within 
primary before replication.
 * In case of one-phase-commit - insert commitedWrite after the replication.

Both have already been done. However, that means that if primary is part of the 
replication group, and it's true in almost all cases, we will double the update:
 * In case of common RW transaction - through the replication.
 * In case of one-phase-commit - either through the replication, or though post 
update, if replication was fast enough.

h3. Definition of Done
 * Prevent double storage updates within primary.

h3. Implementation Notes

The easiest way to prevent double insert is to skip one if local safe time is 
greater or equal to candidates. There are 3 places where we update partition 
storage:
 # Primary pre-replication update. In that case, the second update on 
replication should be excluded.
 # Primary post-replication update in case of 1PC. It's possible to see already 
updated data if replication was already processed locally. It is expected to be 
already covered in https://issues.apache.org/jira/browse/IGNITE-15927 . We 
should check the primary safe time on post-replication update and don't do 
update if the safe time is already adjusted.
 # Insert through replication. In case of !1PC on every primary there will be 
double insert (see 1). In case of 1PC it depends, so we should check the safe 
time on primary to know whether the update should be done (see 2).

In every case, the storage indexes still should be adjusted on replication, as 
it is done now, because the progress of indexes on FSM write operations should 
not be violated - otherwise, a Raft snapshot-based rebalance would be broken. 
We may have two non-consistent storage updates on primary which may affect 
different fsyncs, 


> Prevent double storage updates within primary
> -
>
> Key: IGNITE-20124
> URL: https://issues.apache.org/jira/browse/IGNITE-20124
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Alexander Lapin
>Priority: Major
>  Labels: ignite-3, transactions
>
> h3. Motivation
> In order to preserve the guarantee that the primary replica is always 
> up-to-date it's required to:
>  * In case of common RW transaction - insert writeIntent to the storage 
> within primary before replication.
>  * In case of one-phase-commit - insert commitedWrite after the replication.
> Both have already been done. However, that means that if p

[jira] [Comment Edited] (IGNITE-20165) Revisit the configuration of thread pools used by JRaft



[ 
https://issues.apache.org/jira/browse/IGNITE-20165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17753233#comment-17753233
 ] 

Mirza Aliev edited comment on IGNITE-20165 at 8/14/23 9:05 AM:
---

By default, all executors are shared among the instance of Loza, meaning that 
all raft groups share executors.

Below I've represented all JRaft executors with short description and the 
number of threads  
||Pool name||Description||Number of Threads||
|JRaft-Common-Executor|A pool for processing short-lived asynchronous tasks. 
Should never be blocked.|Utils.cpus() (core == max)|
|JRaft-Node-Scheduler|A scheduled executor for running delayed or repeating 
tasks.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE, 
DelayedWorkQueue)|
|JRaft-Request-Processor|A default pool for handling RAFT requests. Should 
never be blocked.|Utils.cpus() * 6 (core == max)|
|JRaft-Response-Processor|A default pool for handling RAFT responses. Should 
never be blocked.|80 (core == max/3, workQueue == 1)|
|JRaft-AppendEntries-Processor|A pool of single thread executors. Used only if 
a replication pipelining is enabled. Handles append entries requests and 
responses (used by the replication flow). Threads are started on demand. Each 
replication pair (leader-follower) uses dedicated single thread executor from 
the pool, so all messages between replication peer pairs are processed 
sequentially.|SystemPropertyUtil.getInt(
"jraft.append.entries.threads.send", Math.max(16, 
Ints.findNextPositivePowerOfTwo(cpus() * 2)));|
|NodeImpl-Disruptor|A striped disruptor for batching FSM (finite state machine) 
user tasks.|DEFAULT_STRIPES = Utils.cpus() * 2|
|ReadOnlyService-Disruptor|A striped disruptor for batching read requests 
before doing read index request.|DEFAULT_STRIPES = Utils.cpus() * 2|
|LogManager-Disruptor|A striped disruptor for delivering log entries to a 
storage.|DEFAULT_STRIPES = Utils.cpus() * 2|
|FSMCaller-Disruptor|A striped disruptor for FSM callbacks.|DEFAULT_STRIPES = 
Utils.cpus() * 2|
|SnapshotTimer|A timer for periodic snapshot creation.|Math.min(Utils.cpus() * 
3, 20) (core, max == Integer.MAX_VALUE, DelayedWorkQueue)|
|ElectionTimer|A timer to handle election timeout on 
followers.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE, 
DelayedWorkQueue)|
|VoteTimer|A timer to handle vote timeout when a leader was not confirmed by 
majority.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE, 
DelayedWorkQueue)|
|StepDownTimer|A timer to process leader step down 
condition.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE, 
DelayedWorkQueue)|


was (Author: maliev):
By default, all executors are shared among the instance of Loza, meaning that 
all raft groups share executors.

Below I've represented all JRaft executors with short description and the 
number of threads  
||Pool name||Description||Number of Threads||
|JRaft-Common-Executor|A pool for processing short-lived asynchronous tasks. 
Should never be blocked.|Utils.cpus() (core == max)|
|JRaft-Node-Scheduler|A scheduled executor for running delayed or repeating 
tasks.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE, 
DelayedWorkQueue)|
|JRaft-Request-Processor|A default pool for handling RAFT requests. Should 
never be blocked.|Utils.cpus() * 6 (core == max)|
|JRaft-Response-Processor|A default pool for handling RAFT responses. Should 
never be blocked.|80 (core == max/3, workQueue == 1)|
|JRaft-AppendEntries-Processor|A pool of single thread executors. Used only if 
a replication pipelining is enabled. Handles append entries requests and 
responses (used by the replication flow). Threads are started on demand. Each 
replication pair (leader-follower) uses dedicated single thread executor from 
the pool, so all messages between replication peer pairs are processed 
sequentially.|SystemPropertyUtil.getInt(
"jraft.append.entries.threads.send", Math.max(16, 
Ints.findNextPositivePowerOfTwo(cpus() * 2)));|
|NodeImpl-Disruptor|A striped disruptor for batching FSM (finite state machine) 
user tasks.|DEFAULT_STRIPES = Utils.cpus() * 2|
|ReadOnlyService-Disruptor|A striped disruptor for batching read requests 
before doing read index request.|DEFAULT_STRIPES = Utils.cpus() * 2|
|LogManager-Disruptor|A striped disruptor for delivering log entries to a 
storage.|DEFAULT_STRIPES = Utils.cpus() * 2|
|FSMCaller-Disruptor|A striped disruptor for FSM callbacks.|DEFAULT_STRIPES = 
Utils.cpus() * 2|
|SnapshotTimer|A timer for periodic snapshot creation.|Math.min(Utils.cpus() * 
3, 20) (core, max == Integer.MAX_VALUE)|
|ElectionTimer|A timer to handle election timeout on 
followers.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE)|
|VoteTimer|A timer to handle vote timeout when a leader was not confirmed by 
majority.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE)|

[jira] [Updated] (IGNITE-20202) Base metrics for SQL thread pools



 [ 
https://issues.apache.org/jira/browse/IGNITE-20202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yury Gerzhedovich updated IGNITE-20202:
---
Description: 
Let's introduce queue size for planning thread pool.
Type is a Gauge. Suggested names are
* {code:java}
sql.plan.QueueSize
{code}

  was:
Let's introduce queue size for SQL execution and planning thread pools.
Type is a Gauge. Suggested names are
* {code:java}
sql.plan.QueueSize
{code}
* {code:java}
sql.execution.stripe..QueueSize
{code}



> Base metrics for SQL thread pools
> -
>
> Key: IGNITE-20202
> URL: https://issues.apache.org/jira/browse/IGNITE-20202
> Project: Ignite
>  Issue Type: Improvement
>  Components: sql
>Reporter: Yury Gerzhedovich
>Priority: Major
>  Labels: ignite-3
>
> Let's introduce queue size for planning thread pool.
> Type is a Gauge. Suggested names are
> * {code:java}
> sql.plan.QueueSize
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20202) Introduce queue size of SQL plan thread pool as metric



 [ 
https://issues.apache.org/jira/browse/IGNITE-20202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yury Gerzhedovich updated IGNITE-20202:
---
Summary: Introduce queue size of SQL plan thread pool as  metric  (was: 
Base metrics for SQL thread pools)

> Introduce queue size of SQL plan thread pool as  metric
> ---
>
> Key: IGNITE-20202
> URL: https://issues.apache.org/jira/browse/IGNITE-20202
> Project: Ignite
>  Issue Type: Improvement
>  Components: sql
>Reporter: Yury Gerzhedovich
>Priority: Major
>  Labels: ignite-3
>
> Let's introduce queue size for planning thread pool.
> Type is a Gauge. Suggested names are
> * {code:java}
> sql.plan.QueueSize
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20202) Introduce queue size of SQL plan thread pool as metric



 [ 
https://issues.apache.org/jira/browse/IGNITE-20202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yury Gerzhedovich updated IGNITE-20202:
---
Description: 
Let's introduce queue size for planning thread pool.
Type is a Gauge. Suggested name are
* {code:java}
sql.plan.QueueSize
{code}

  was:
Let's introduce queue size for planning thread pool.
Type is a Gauge. Suggested names are
* {code:java}
sql.plan.QueueSize
{code}


> Introduce queue size of SQL plan thread pool as  metric
> ---
>
> Key: IGNITE-20202
> URL: https://issues.apache.org/jira/browse/IGNITE-20202
> Project: Ignite
>  Issue Type: Improvement
>  Components: sql
>Reporter: Yury Gerzhedovich
>Priority: Major
>  Labels: ignite-3
>
> Let's introduce queue size for planning thread pool.
> Type is a Gauge. Suggested name are
> * {code:java}
> sql.plan.QueueSize
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20202) Introduce queue size of SQL plan thread pool as metric



 [ 
https://issues.apache.org/jira/browse/IGNITE-20202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yury Gerzhedovich updated IGNITE-20202:
---
Description: 
Let's introduce queue size for planning thread pool.
Type is a Gauge. Suggested name is
* {code:java}
sql.plan.QueueSize
{code}

  was:
Let's introduce queue size for planning thread pool.
Type is a Gauge. Suggested name are
* {code:java}
sql.plan.QueueSize
{code}


> Introduce queue size of SQL plan thread pool as  metric
> ---
>
> Key: IGNITE-20202
> URL: https://issues.apache.org/jira/browse/IGNITE-20202
> Project: Ignite
>  Issue Type: Improvement
>  Components: sql
>Reporter: Yury Gerzhedovich
>Priority: Major
>  Labels: ignite-3
>
> Let's introduce queue size for planning thread pool.
> Type is a Gauge. Suggested name is
> * {code:java}
> sql.plan.QueueSize
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Comment Edited] (IGNITE-20165) Revisit the configuration of thread pools used by JRaft



[ 
https://issues.apache.org/jira/browse/IGNITE-20165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17753233#comment-17753233
 ] 

Mirza Aliev edited comment on IGNITE-20165 at 8/14/23 9:35 AM:
---

By default, all executors are shared among the instance of Loza, meaning that 
all raft groups share executors.

Below I've represented all JRaft executors with short description and the 
number of threads  
||Pool name||Description||Number of Threads||
|JRaft-Common-Executor|A pool for processing short-lived asynchronous tasks. 
Should never be blocked.|Utils.cpus() (core == max)|
|JRaft-Node-Scheduler|A scheduled executor for running delayed or repeating 
tasks.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE, 
DelayedWorkQueue)|
|JRaft-Request-Processor|A default pool for handling RAFT requests. Should 
never be blocked.|Utils.cpus() * 6 (core == max)|
|JRaft-Response-Processor|A default pool for handling RAFT responses. Should 
never be blocked.|80 (core == max/3, workQueue == 1)|
|JRaft-AppendEntries-Processor|A pool of single thread executors. Used only if 
a replication pipelining is enabled (is is enabled by default). Handles append 
entries requests and responses (used by the replication flow). Threads are 
started on demand. Each replication pair (leader-follower) uses dedicated 
single thread executor from the pool, so all messages between replication peer 
pairs are processed sequentially.|SystemPropertyUtil.getInt(
"jraft.append.entries.threads.send", Math.max(16, 
Ints.findNextPositivePowerOfTwo(cpus() * 2)));|
|NodeImpl-Disruptor|A striped disruptor for batching FSM (finite state machine) 
user tasks.|DEFAULT_STRIPES = Utils.cpus() * 2|
|ReadOnlyService-Disruptor|A striped disruptor for batching read requests 
before doing read index request.|DEFAULT_STRIPES = Utils.cpus() * 2|
|LogManager-Disruptor|A striped disruptor for delivering log entries to a 
storage.|DEFAULT_STRIPES = Utils.cpus() * 2|
|FSMCaller-Disruptor|A striped disruptor for FSM callbacks.|DEFAULT_STRIPES = 
Utils.cpus() * 2|
|SnapshotTimer|A timer for periodic snapshot creation.|Math.min(Utils.cpus() * 
3, 20) (core, max == Integer.MAX_VALUE, DelayedWorkQueue)|
|ElectionTimer|A timer to handle election timeout on 
followers.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE, 
DelayedWorkQueue)|
|VoteTimer|A timer to handle vote timeout when a leader was not confirmed by 
majority.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE, 
DelayedWorkQueue)|
|StepDownTimer|A timer to process leader step down 
condition.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE, 
DelayedWorkQueue)|


was (Author: maliev):
By default, all executors are shared among the instance of Loza, meaning that 
all raft groups share executors.

Below I've represented all JRaft executors with short description and the 
number of threads  
||Pool name||Description||Number of Threads||
|JRaft-Common-Executor|A pool for processing short-lived asynchronous tasks. 
Should never be blocked.|Utils.cpus() (core == max)|
|JRaft-Node-Scheduler|A scheduled executor for running delayed or repeating 
tasks.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE, 
DelayedWorkQueue)|
|JRaft-Request-Processor|A default pool for handling RAFT requests. Should 
never be blocked.|Utils.cpus() * 6 (core == max)|
|JRaft-Response-Processor|A default pool for handling RAFT responses. Should 
never be blocked.|80 (core == max/3, workQueue == 1)|
|JRaft-AppendEntries-Processor|A pool of single thread executors. Used only if 
a replication pipelining is enabled. Handles append entries requests and 
responses (used by the replication flow). Threads are started on demand. Each 
replication pair (leader-follower) uses dedicated single thread executor from 
the pool, so all messages between replication peer pairs are processed 
sequentially.|SystemPropertyUtil.getInt(
"jraft.append.entries.threads.send", Math.max(16, 
Ints.findNextPositivePowerOfTwo(cpus() * 2)));|
|NodeImpl-Disruptor|A striped disruptor for batching FSM (finite state machine) 
user tasks.|DEFAULT_STRIPES = Utils.cpus() * 2|
|ReadOnlyService-Disruptor|A striped disruptor for batching read requests 
before doing read index request.|DEFAULT_STRIPES = Utils.cpus() * 2|
|LogManager-Disruptor|A striped disruptor for delivering log entries to a 
storage.|DEFAULT_STRIPES = Utils.cpus() * 2|
|FSMCaller-Disruptor|A striped disruptor for FSM callbacks.|DEFAULT_STRIPES = 
Utils.cpus() * 2|
|SnapshotTimer|A timer for periodic snapshot creation.|Math.min(Utils.cpus() * 
3, 20) (core, max == Integer.MAX_VALUE, DelayedWorkQueue)|
|ElectionTimer|A timer to handle election timeout on 
followers.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE, 
DelayedWorkQueue)|
|VoteTimer|A timer to handle vote timeout when a leader was not confirmed by 
majority.|M

[jira] [Updated] (IGNITE-20201) Node failure when incorrect names are used for hitrate and histogram metrics configuration



 [ 
https://issues.apache.org/jira/browse/IGNITE-20201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Shishkov updated IGNITE-20201:
---
Description: 
There are no metric name validation when we perform hitrate and historgam 
metrics configuration by means of control script. It can lead to impossibility 
to restart persistent cluster.

*How to reproduce:*
 # Start persistent cluster
 # Just repeat commands from instructions [1].
{noformat}
control.sh —metric —configure-histogram histogram-metric-name 1,2,3
control.sh —metric —configure-hitrate hitrate-metric-name 1000
{noformat}

 # Deactivate and restart cluster.
 # Start and activate cluster and nodes will fail with following error:
{noformat}
[19:47:26,981][SEVERE][main][IgniteKernal] Got exception while starting (will 
rollback startup routine).
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.String.substring(String.java:1967)
at 
org.apache.ignite.internal.processors.metric.impl.MetricUtils.fromFullName(MetricUtils.java:72)
at 
org.apache.ignite.internal.processors.metric.GridMetricManager.find(GridMetricManager.java:502)
at 
org.apache.ignite.internal.processors.metric.GridMetricManager.onHistogramConfigChanged(GridMetricManager.java:480)
at 
org.apache.ignite.internal.processors.metric.GridMetricManager.access$300(GridMetricManager.java:73)
at 
org.apache.ignite.internal.processors.metric.GridMetricManager$1.lambda$onReadyForRead$1(GridMetricManager.java:272)
at 
org.apache.ignite.internal.processors.metastorage.persistence.InMemoryCachedDistributedMetaStorageBridge.iterate(InMemoryCachedDistributedMetaStorageBridge.java:87)
at 
org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl.iterate(DistributedMetaStorageImpl.java:542)
at 
org.apache.ignite.internal.processors.metric.GridMetricManager$1.onReadyForRead(GridMetricManager.java:272)
at 
org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl.notifyReadyForRead(DistributedMetaStorageImpl.java:355)
at 
org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl.onMetaStorageReadyForRead(DistributedMetaStorageImpl.java:434)
at 
org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl.access$200(DistributedMetaStorageImpl.java:116)
at 
org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl$2.onReadyForRead(DistributedMetaStorageImpl.java:259)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetastorageReadyForRead(GridCacheDatabaseSharedManager.java:430)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readMetastore(GridCacheDatabaseSharedManager.java:877)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetaStorageSubscribersOnReadyForRead(GridCacheDatabaseSharedManager.java:3094)
at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1120)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1725)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1647)
at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1089)
at 
org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:983)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:889)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:808)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:678)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:647)
at org.apache.ignite.Ignition.start(Ignition.java:325)
at 
org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:365)

{noformat}

Failure occurs when {{GridMetricManager}} tries to parse entries with incorrect 
metric names from metastorage:
{noformat}
metrics.histogram.histogram-metric-name [1, 2, 3]   


metrics.hitrate.hitrate-metric-name 1000
{noformat}

Solution:
# Add extra validation of metric name into {{\-\-metric \-\-configure-*}} 
command.
# Add exception handling into {{GridMetricManager.onHistogramConfigChanged}} 
and {{GridMetricManager.onHitrateConfigChanged}}.

*Workaround:*
Clean metastorage.

Links:
# 
https://ignite.apache.org/docs/latest/tools/control-script#metric-configure-command

  was:
There are no metric name validation when we perform hitrate and historgam 
metrics configuration by means of control script. It can lead to impossibility 
to r

[jira] [Updated] (IGNITE-20201) Node failure when incorrect names are used for hitrate and histogram metrics configuration



 [ 
https://issues.apache.org/jira/browse/IGNITE-20201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Shishkov updated IGNITE-20201:
---
Description: 
There are no metric name validation when we perform hitrate and historgam 
metrics configuration by means of control script. It can lead to impossibility 
to restart persistent cluster.

*How to reproduce:*
 # Start persistent cluster
 # Just repeat commands from instructions [1].
{noformat}
control.sh —metric —configure-histogram histogram-metric-name 1,2,3
control.sh —metric —configure-hitrate hitrate-metric-name 1000
{noformat}

 # Deactivate and restart cluster.
 # Start and activate cluster and nodes will fail with following error:
{noformat}
[19:47:26,981][SEVERE][main][IgniteKernal] Got exception while starting (will 
rollback startup routine).
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.String.substring(String.java:1967)
at 
org.apache.ignite.internal.processors.metric.impl.MetricUtils.fromFullName(MetricUtils.java:72)
at 
org.apache.ignite.internal.processors.metric.GridMetricManager.find(GridMetricManager.java:502)
at 
org.apache.ignite.internal.processors.metric.GridMetricManager.onHistogramConfigChanged(GridMetricManager.java:480)
at 
org.apache.ignite.internal.processors.metric.GridMetricManager.access$300(GridMetricManager.java:73)
at 
org.apache.ignite.internal.processors.metric.GridMetricManager$1.lambda$onReadyForRead$1(GridMetricManager.java:272)
at 
org.apache.ignite.internal.processors.metastorage.persistence.InMemoryCachedDistributedMetaStorageBridge.iterate(InMemoryCachedDistributedMetaStorageBridge.java:87)
at 
org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl.iterate(DistributedMetaStorageImpl.java:542)
at 
org.apache.ignite.internal.processors.metric.GridMetricManager$1.onReadyForRead(GridMetricManager.java:272)
at 
org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl.notifyReadyForRead(DistributedMetaStorageImpl.java:355)
at 
org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl.onMetaStorageReadyForRead(DistributedMetaStorageImpl.java:434)
at 
org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl.access$200(DistributedMetaStorageImpl.java:116)
at 
org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl$2.onReadyForRead(DistributedMetaStorageImpl.java:259)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetastorageReadyForRead(GridCacheDatabaseSharedManager.java:430)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readMetastore(GridCacheDatabaseSharedManager.java:877)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetaStorageSubscribersOnReadyForRead(GridCacheDatabaseSharedManager.java:3094)
at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1120)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1725)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1647)
at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1089)
at 
org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:983)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:889)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:808)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:678)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:647)
at org.apache.ignite.Ignition.start(Ignition.java:325)
at 
org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:365)

{noformat}

Failure occurs when {{GridMetricManager}} tries to parse entries with incorrect 
metric names from metastorage:
{noformat}
metrics.histogram.histogram-metric-name [1, 2, 3]   


metrics.hitrate.hitrate-metric-name 1000
{noformat}

Solution:
# Add extra validation of metric name into {{--metic --configure-*}} command.
# Add exception handling into {{GridMetricManager.onHistogramConfigChanged}} 
and {{GridMetricManager.onHitrateConfigChanged}}.

*Workaround:*
Clean metastorage.

Links:
# 
https://ignite.apache.org/docs/latest/tools/control-script#metric-configure-command

  was:
There are no metric name validation when we perform hitrate and historgam 
metrics configuration by means of control script. It can lead to impossibility 
to restart

[jira] [Updated] (IGNITE-20201) Node failure when incorrect names are used for hitrate and histogram metrics configuration



 [ 
https://issues.apache.org/jira/browse/IGNITE-20201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Shishkov updated IGNITE-20201:
---
Description: 
There are no metric name validation when we perform hitrate and historgam 
metrics configuration by means of control script. It can lead to impossibility 
to restart persistent cluster.

*How to reproduce:*
 # Start persistent cluster
 # Just repeat commands from instructions [1].
{noformat}
control.sh —metric —configure-histogram histogram-metric-name 1,2,3
control.sh —metric —configure-hitrate hitrate-metric-name 1000
{noformat}

 # Deactivate and restart cluster.
 # Start and activate cluster and nodes will fail with following error:
{noformat}
[19:47:26,981][SEVERE][main][IgniteKernal] Got exception while starting (will 
rollback startup routine).
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.String.substring(String.java:1967)
at 
org.apache.ignite.internal.processors.metric.impl.MetricUtils.fromFullName(MetricUtils.java:72)
at 
org.apache.ignite.internal.processors.metric.GridMetricManager.find(GridMetricManager.java:502)
at 
org.apache.ignite.internal.processors.metric.GridMetricManager.onHistogramConfigChanged(GridMetricManager.java:480)
at 
org.apache.ignite.internal.processors.metric.GridMetricManager.access$300(GridMetricManager.java:73)
at 
org.apache.ignite.internal.processors.metric.GridMetricManager$1.lambda$onReadyForRead$1(GridMetricManager.java:272)
at 
org.apache.ignite.internal.processors.metastorage.persistence.InMemoryCachedDistributedMetaStorageBridge.iterate(InMemoryCachedDistributedMetaStorageBridge.java:87)
at 
org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl.iterate(DistributedMetaStorageImpl.java:542)
at 
org.apache.ignite.internal.processors.metric.GridMetricManager$1.onReadyForRead(GridMetricManager.java:272)
at 
org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl.notifyReadyForRead(DistributedMetaStorageImpl.java:355)
at 
org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl.onMetaStorageReadyForRead(DistributedMetaStorageImpl.java:434)
at 
org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl.access$200(DistributedMetaStorageImpl.java:116)
at 
org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl$2.onReadyForRead(DistributedMetaStorageImpl.java:259)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetastorageReadyForRead(GridCacheDatabaseSharedManager.java:430)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readMetastore(GridCacheDatabaseSharedManager.java:877)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetaStorageSubscribersOnReadyForRead(GridCacheDatabaseSharedManager.java:3094)
at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1120)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1725)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1647)
at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1089)
at 
org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:983)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:889)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:808)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:678)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:647)
at org.apache.ignite.Ignition.start(Ignition.java:325)
at 
org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:365)

{noformat}

Failure occurs when {{GridMetricManager}} tries to parse entries with incorrect 
metric names from metastorage:
{noformat}
metrics.histogram.histogram-metric-name [1, 2, 3]   


metrics.hitrate.hitrate-metric-name 1000
{noformat}

Solution:
# Add extra validation of metric name into {{\-\-metric \-\-configure-*}} 
command.
# Add exception handling into {{GridMetricManager.onHistogramConfigChanged}} 
and {{GridMetricManager.onHitRateConfigChanged}}.

*Workaround:*
Clean metastorage.

Links:
# 
https://ignite.apache.org/docs/latest/tools/control-script#metric-configure-command

  was:
There are no metric name validation when we perform hitrate and historgam 
metrics configuration by means of control script. It can lead to impossibility 
to r

[jira] [Updated] (IGNITE-20201) Node failure when incorrect names are used for hitrate and histogram metrics configuration



 [ 
https://issues.apache.org/jira/browse/IGNITE-20201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Shishkov updated IGNITE-20201:
---
Description: 
There are no metric name validation when we perform hitrate and historgam 
metrics configuration by means of control script. It can lead to impossibility 
to restart persistent cluster.

*How to reproduce:*
 # Start persistent cluster
 # Just repeat commands from instructions [1].
{noformat}
control.sh —metric —configure-histogram histogram-metric-name 1,2,3
control.sh —metric —configure-hitrate hitrate-metric-name 1000
{noformat}

 # Deactivate and restart cluster.
 # Start and activate cluster and nodes will fail with following error:
{noformat}
[19:47:26,981][SEVERE][main][IgniteKernal] Got exception while starting (will 
rollback startup routine).
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.String.substring(String.java:1967)
at 
org.apache.ignite.internal.processors.metric.impl.MetricUtils.fromFullName(MetricUtils.java:72)
at 
org.apache.ignite.internal.processors.metric.GridMetricManager.find(GridMetricManager.java:502)
at 
org.apache.ignite.internal.processors.metric.GridMetricManager.onHistogramConfigChanged(GridMetricManager.java:480)
at 
org.apache.ignite.internal.processors.metric.GridMetricManager.access$300(GridMetricManager.java:73)
at 
org.apache.ignite.internal.processors.metric.GridMetricManager$1.lambda$onReadyForRead$1(GridMetricManager.java:272)
at 
org.apache.ignite.internal.processors.metastorage.persistence.InMemoryCachedDistributedMetaStorageBridge.iterate(InMemoryCachedDistributedMetaStorageBridge.java:87)
at 
org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl.iterate(DistributedMetaStorageImpl.java:542)
at 
org.apache.ignite.internal.processors.metric.GridMetricManager$1.onReadyForRead(GridMetricManager.java:272)
at 
org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl.notifyReadyForRead(DistributedMetaStorageImpl.java:355)
at 
org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl.onMetaStorageReadyForRead(DistributedMetaStorageImpl.java:434)
at 
org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl.access$200(DistributedMetaStorageImpl.java:116)
at 
org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl$2.onReadyForRead(DistributedMetaStorageImpl.java:259)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetastorageReadyForRead(GridCacheDatabaseSharedManager.java:430)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readMetastore(GridCacheDatabaseSharedManager.java:877)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetaStorageSubscribersOnReadyForRead(GridCacheDatabaseSharedManager.java:3094)
at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1120)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1725)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1647)
at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1089)
at 
org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:983)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:889)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:808)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:678)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:647)
at org.apache.ignite.Ignition.start(Ignition.java:325)
at 
org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:365)

{noformat}

Failure occurs when {{GridMetricManager}} tries to parse entries with incorrect 
metric names from metastorage:
{noformat}
metrics.histogram.histogram-metric-name [1, 2, 3]   


metrics.hitrate.hitrate-metric-name 1000
{noformat}

*Solution:*
# Add extra validation of metric name into {{\-\-metric \-\-configure-*}} 
command.
# Add exception handling into {{GridMetricManager.onHistogramConfigChanged}} 
and {{GridMetricManager.onHitRateConfigChanged}}.

*Workaround:*
Clean metastorage.

Links:
# 
https://ignite.apache.org/docs/latest/tools/control-script#metric-configure-command

  was:
There are no metric name validation when we perform hitrate and historgam 
metrics configuration by means of control script. It can lead to impossibility 
to

[jira] [Resolved] (IGNITE-18902) Too much threads started on empty cluster



 [ 
https://issues.apache.org/jira/browse/IGNITE-18902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksandr Polovtcev resolved IGNITE-18902.
--
Resolution: Duplicate

> Too much threads started on empty cluster
> -
>
> Key: IGNITE-18902
> URL: https://issues.apache.org/jira/browse/IGNITE-18902
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Konstantin Orlov
>Priority: Major
>  Labels: ignite-3
> Attachments: after_start.txt, after_table_creation.txt
>
>
> Seems we start unreasonable amount of threads. Thread dump right after the 
> start of a single node shows a 170 threads with prefix  ('idt_n_0' 
> in dumps), 157 of them belongs to the JRaft. Creation of a table contributes 
> another 160 threads to the dump (330 in total), and 114 of them belongs to 
> the JRaft (271 in total).
> Let's investigate if we really need all those threads or we can do better. 
> You could find thread dumps attached below.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20201) Node failure when incorrect names are used for hitrate and histogram metrics configuration



 [ 
https://issues.apache.org/jira/browse/IGNITE-20201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Shishkov updated IGNITE-20201:
---
Description: 
There are no metric name validation when we perform hitrate and historgam 
metrics configuration by means of control script. It can lead to impossibility 
to restart persistent cluster.

*How to reproduce:*
 # Start persistent cluster.
 # Enter commands from instructions [1].
{noformat}
control.sh —metric —configure-histogram histogram-metric-name 1,2,3
control.sh —metric —configure-hitrate hitrate-metric-name 1000
{noformat}
 # Deactivate and restart cluster.
 # Start and activate cluster and nodes will fail with following error:
{noformat}
[19:47:26,981][SEVERE][main][IgniteKernal] Got exception while starting (will 
rollback startup routine).
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.String.substring(String.java:1967)
at 
org.apache.ignite.internal.processors.metric.impl.MetricUtils.fromFullName(MetricUtils.java:72)
at 
org.apache.ignite.internal.processors.metric.GridMetricManager.find(GridMetricManager.java:502)
at 
org.apache.ignite.internal.processors.metric.GridMetricManager.onHistogramConfigChanged(GridMetricManager.java:480)
at 
org.apache.ignite.internal.processors.metric.GridMetricManager.access$300(GridMetricManager.java:73)
at 
org.apache.ignite.internal.processors.metric.GridMetricManager$1.lambda$onReadyForRead$1(GridMetricManager.java:272)
at 
org.apache.ignite.internal.processors.metastorage.persistence.InMemoryCachedDistributedMetaStorageBridge.iterate(InMemoryCachedDistributedMetaStorageBridge.java:87)
at 
org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl.iterate(DistributedMetaStorageImpl.java:542)
at 
org.apache.ignite.internal.processors.metric.GridMetricManager$1.onReadyForRead(GridMetricManager.java:272)
at 
org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl.notifyReadyForRead(DistributedMetaStorageImpl.java:355)
at 
org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl.onMetaStorageReadyForRead(DistributedMetaStorageImpl.java:434)
at 
org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl.access$200(DistributedMetaStorageImpl.java:116)
at 
org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl$2.onReadyForRead(DistributedMetaStorageImpl.java:259)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetastorageReadyForRead(GridCacheDatabaseSharedManager.java:430)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readMetastore(GridCacheDatabaseSharedManager.java:877)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetaStorageSubscribersOnReadyForRead(GridCacheDatabaseSharedManager.java:3094)
at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1120)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1725)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1647)
at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1089)
at 
org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:983)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:889)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:808)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:678)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:647)
at org.apache.ignite.Ignition.start(Ignition.java:325)
at 
org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:365)

{noformat}

Failure occurs when {{GridMetricManager}} tries to parse entries with incorrect 
metric names from metastorage:
{noformat}
metrics.histogram.histogram-metric-name [1, 2, 3]   


metrics.hitrate.hitrate-metric-name 1000
{noformat}

*Solution:*
# Add extra validation of metric name into {{\-\-metric \-\-configure-*}} 
command.
# Add exception handling into {{GridMetricManager.onHistogramConfigChanged}} 
and {{GridMetricManager.onHitRateConfigChanged}}.

*Workaround:*
Clean metastorage.

Links:
# 
https://ignite.apache.org/docs/latest/tools/control-script#metric-configure-command

  was:
There are no metric name validation when we perform hitrate and historgam 
metrics configuration by means of control script. It can lead to impossibility 
to resta

[jira] [Updated] (IGNITE-20201) Node failure when incorrect names are used for hitrate and histogram metrics configuration



 [ 
https://issues.apache.org/jira/browse/IGNITE-20201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Shishkov updated IGNITE-20201:
---
Description: 
There are no metric name validation when we perform hitrate and historgam 
metrics configuration by means of control script. It can lead to impossibility 
to restart persistent cluster.

*How to reproduce:*
 # Start persistent cluster.
 # Enter commands from instructions [1].
{noformat}
control.sh —metric —configure-histogram histogram-metric-name 1,2,3
control.sh —metric —configure-hitrate hitrate-metric-name 1000
{noformat}

 # Deactivate and restart cluster.
 # Start and activate cluster and nodes will fail with following error:
{noformat}
[19:47:26,981][SEVERE][main][IgniteKernal] Got exception while starting (will 
rollback startup routine).
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.String.substring(String.java:1967)
at 
org.apache.ignite.internal.processors.metric.impl.MetricUtils.fromFullName(MetricUtils.java:72)
at 
org.apache.ignite.internal.processors.metric.GridMetricManager.find(GridMetricManager.java:502)
at 
org.apache.ignite.internal.processors.metric.GridMetricManager.onHistogramConfigChanged(GridMetricManager.java:480)
at 
org.apache.ignite.internal.processors.metric.GridMetricManager.access$300(GridMetricManager.java:73)
at 
org.apache.ignite.internal.processors.metric.GridMetricManager$1.lambda$onReadyForRead$1(GridMetricManager.java:272)
at 
org.apache.ignite.internal.processors.metastorage.persistence.InMemoryCachedDistributedMetaStorageBridge.iterate(InMemoryCachedDistributedMetaStorageBridge.java:87)
at 
org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl.iterate(DistributedMetaStorageImpl.java:542)
at 
org.apache.ignite.internal.processors.metric.GridMetricManager$1.onReadyForRead(GridMetricManager.java:272)
at 
org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl.notifyReadyForRead(DistributedMetaStorageImpl.java:355)
at 
org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl.onMetaStorageReadyForRead(DistributedMetaStorageImpl.java:434)
at 
org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl.access$200(DistributedMetaStorageImpl.java:116)
at 
org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl$2.onReadyForRead(DistributedMetaStorageImpl.java:259)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetastorageReadyForRead(GridCacheDatabaseSharedManager.java:430)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readMetastore(GridCacheDatabaseSharedManager.java:877)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetaStorageSubscribersOnReadyForRead(GridCacheDatabaseSharedManager.java:3094)
at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1120)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1725)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1647)
at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1089)
at 
org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:983)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:889)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:808)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:678)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:647)
at org.apache.ignite.Ignition.start(Ignition.java:325)
at 
org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:365)

{noformat}

Failure occurs when {{GridMetricManager}} tries to parse entries with incorrect 
metric names from metastorage:
{noformat}
metrics.histogram.histogram-metric-name [1, 2, 3]   


metrics.hitrate.hitrate-metric-name 1000
{noformat}

*Solution:*
# Add extra validation of metric name into {{\-\-metric \-\-configure-*}} 
command.
# Add exception handling into {{GridMetricManager.onHistogramConfigChanged}} 
and {{GridMetricManager.onHitRateConfigChanged}}.

*Workaround:*
Clean metastorage.

Links:
# 
https://ignite.apache.org/docs/latest/tools/control-script#metric-configure-command

  was:
There are no metric name validation when we perform hitrate and historgam 
metrics configuration by means of control script. It can lead to impossibility 
to rest

[jira] [Updated] (IGNITE-20201) Node failure when incorrect names are used for hitrate and histogram metrics configuration



 [ 
https://issues.apache.org/jira/browse/IGNITE-20201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Shishkov updated IGNITE-20201:
---
Labels: ise  (was: )

> Node failure when incorrect names are used for hitrate and histogram metrics 
> configuration
> --
>
> Key: IGNITE-20201
> URL: https://issues.apache.org/jira/browse/IGNITE-20201
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.15
>Reporter: Ilya Shishkov
>Priority: Critical
>  Labels: ise
>
> There are no metric name validation when we perform hitrate and historgam 
> metrics configuration by means of control script. It can lead to 
> impossibility to restart persistent cluster.
> *How to reproduce:*
>  # Start persistent cluster
>  # Just repeat commands from instructions [1].
> {noformat}
> control.sh —metric —configure-histogram histogram-metric-name 1,2,3
> control.sh —metric —configure-hitrate hitrate-metric-name 1000
> {noformat}
>  # Deactivate and restart cluster.
>  # Start and activate cluster and nodes will fail with following error:
> {noformat}
> [19:47:26,981][SEVERE][main][IgniteKernal] Got exception while starting (will 
> rollback startup routine).
> java.lang.StringIndexOutOfBoundsException: String index out of range: -1
>   at java.lang.String.substring(String.java:1967)
>   at 
> org.apache.ignite.internal.processors.metric.impl.MetricUtils.fromFullName(MetricUtils.java:72)
>   at 
> org.apache.ignite.internal.processors.metric.GridMetricManager.find(GridMetricManager.java:502)
>   at 
> org.apache.ignite.internal.processors.metric.GridMetricManager.onHistogramConfigChanged(GridMetricManager.java:480)
>   at 
> org.apache.ignite.internal.processors.metric.GridMetricManager.access$300(GridMetricManager.java:73)
>   at 
> org.apache.ignite.internal.processors.metric.GridMetricManager$1.lambda$onReadyForRead$1(GridMetricManager.java:272)
>   at 
> org.apache.ignite.internal.processors.metastorage.persistence.InMemoryCachedDistributedMetaStorageBridge.iterate(InMemoryCachedDistributedMetaStorageBridge.java:87)
>   at 
> org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl.iterate(DistributedMetaStorageImpl.java:542)
>   at 
> org.apache.ignite.internal.processors.metric.GridMetricManager$1.onReadyForRead(GridMetricManager.java:272)
>   at 
> org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl.notifyReadyForRead(DistributedMetaStorageImpl.java:355)
>   at 
> org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl.onMetaStorageReadyForRead(DistributedMetaStorageImpl.java:434)
>   at 
> org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl.access$200(DistributedMetaStorageImpl.java:116)
>   at 
> org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl$2.onReadyForRead(DistributedMetaStorageImpl.java:259)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetastorageReadyForRead(GridCacheDatabaseSharedManager.java:430)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readMetastore(GridCacheDatabaseSharedManager.java:877)
>   at 
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetaStorageSubscribersOnReadyForRead(GridCacheDatabaseSharedManager.java:3094)
>   at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1120)
>   at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1725)
>   at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1647)
>   at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1089)
>   at 
> org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:983)
>   at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:889)
>   at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:808)
>   at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:678)
>   at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:647)
>   at org.apache.ignite.Ignition.start(Ignition.java:325)
>   at 
> org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:365)
> {noformat}
> Failure occurs when {{GridMetricManager}} tries to parse entries with 
> incorrect metric names from metastorage:
> {noformat}
> metrics.histogram.histogram-metric-name [1, 2, 3] 
>   
> 
> metrics.hitrate.hitra

[jira] [Commented] (IGNITE-20178) Introduce param-free IgniteInternalFuture.listen(() -> {}) in addition to .listen((fut) -> {}) to avoid ignored params

2023-08-14 Thread Ignite TC Bot (Jira)



[ 
https://issues.apache.org/jira/browse/IGNITE-20178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17753976#comment-17753976
 ] 

Ignite TC Bot commented on IGNITE-20178:


{panel:title=Branch: [pull/10885/head] Base: [master] : No blockers 
found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
{panel:title=Branch: [pull/10885/head] Base: [master] : New Tests 
(42)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}
{color:#8b}Snapshots{color} [[tests 
42|https://ci2.ignite.apache.org/viewLog.html?buildId=7296412]]
* {color:#013220}IgniteSnapshotTestSuite: testsuites.IgniteSnapshotTestSuite - 
PASSED{color}
* {color:#013220}IgniteSnapshotTestSuite: testsuites.IgniteSnapshotTestSuite - 
PASSED{color}
* {color:#013220}IgniteSnapshotTestSuite: testsuites.IgniteSnapshotTestSuite - 
PASSED{color}
* {color:#013220}IgniteSnapshotTestSuite: testsuites.IgniteSnapshotTestSuite - 
PASSED{color}
* {color:#013220}IgniteSnapshotTestSuite: testsuites.IgniteSnapshotTestSuite - 
PASSED{color}
* {color:#013220}IgniteSnapshotTestSuite: testsuites.IgniteSnapshotTestSuite - 
PASSED{color}
* {color:#013220}IgniteSnapshotTestSuite: testsuites.IgniteSnapshotTestSuite - 
PASSED{color}
* {color:#013220}IgniteSnapshotTestSuite: testsuites.IgniteSnapshotTestSuite - 
PASSED{color}
* {color:#013220}IgniteSnapshotTestSuite: testsuites.IgniteSnapshotTestSuite - 
PASSED{color}
* {color:#013220}IgniteSnapshotTestSuite: testsuites.IgniteSnapshotTestSuite - 
PASSED{color}
* {color:#013220}IgniteSnapshotTestSuite: testsuites.IgniteSnapshotTestSuite - 
PASSED{color}
... and 31 new tests

{panel}
[TeamCity *--> Run :: All* 
Results|https://ci2.ignite.apache.org/viewLog.html?buildId=7296273&buildTypeId=IgniteTests24Java8_RunAll]

> Introduce param-free IgniteInternalFuture.listen(() -> {}) in addition to 
> .listen((fut) -> {}) to avoid ignored params
> --
>
> Key: IGNITE-20178
> URL: https://issues.apache.org/jira/browse/IGNITE-20178
> Project: Ignite
>  Issue Type: Sub-task
>Reporter: Anton Vinogradov
>Assignee: Anton Vinogradov
>Priority: Major
> Fix For: 2.16
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (IGNITE-16700) ItTxDistributedTestThreeNodesThreeReplicas#testBalance is flaky



[ 
https://issues.apache.org/jira/browse/IGNITE-16700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754021#comment-17754021
 ] 

Denis Chudov commented on IGNITE-16700:
---

This test creates 2 * CPU_COUNT threads and each thread repeats transactions 
transferring money from one account to another, using the number of accounts 
similar to threads' number. In fact, it’s load test from some point of view as 
it discovers performance problems. The reason of test failures are replication 
timeout exception but the reasons of exceptions are different.
 * upsert operations timeouts: the reason of these timeouts is long waiting of 
lock acquisition because of high contention, and lock release after cleanup, so 
that there can be a queue of waiters to acquire lock for each key, and each of 
them wait for tx cleanup.

 * any command timeouts: seems that there are problems with rocksdb log 
storage, and storage flush in RocksDbSharedLogStorage#commitWriteBatch: having 
batch size of several hundred of bytes, the db put operation can last over a 
second. I see many such records in log while logging time for flushing that 
took over 100 ms.

If I turn off fsync for Raft log, and increase number of accounts by 10 times, 
it drastically reduces the fail rate of the test (no failures after 600 runs, 
comparing with 1 per ~25 without fixes). The problem with Raft storage needs 
separate ticket.

> ItTxDistributedTestThreeNodesThreeReplicas#testBalance is flaky
> ---
>
> Key: IGNITE-16700
> URL: https://issues.apache.org/jira/browse/IGNITE-16700
> Project: Ignite
>  Issue Type: Bug
>Reporter: Mirza Aliev
>Assignee: Denis Chudov
>Priority: Major
>  Labels: ignite-3
> Attachments: _Integration_Tests_Module_Table_2055.log, 
> _Integration_Tests_Module_Table_2098.log
>
>
> {{ItTxDistributedTestThreeNodesThreeReplicas#testBalance}} periodically falls 
> with 
> {noformat}
> org.apache.ignite.lang.IgniteException
> org.apache.ignite.lang.IgniteException: java.util.concurrent.TimeoutException 
> ==> expected:  but was: 
> {noformat}
> We've noticed that the test become flaky after IGNITE-16393 has been merged. 
> Probably, the current problem is related to the problem with stopping 
> executors for network's user object serialization threads IGNITE-16699 as far 
> as the logs are full of warnings from IGNITE-16699.
> The plan for this ticket is to wait for IGNITE-16699 to be fixed and check 
> whether this issue is still reproducible. 
> https://ci.ignite.apache.org/buildConfiguration/ignite3_Test_IntegrationTests_ModuleTable/6466138
> UPD: Ticket IGNITE-16699 has been fixed and but the current ticket is still 
> reproducible, so the problem is not related to IGNITE-16699.
> In logs, we can see some suspicious message, need to investigate if this is 
> related to the problem. Actual run 
> https://ci.ignite.apache.org/buildConfiguration/ignite3_Test_RunAllTests/6470268,
>  actual logs are attached
> {noformat}
> 2022-03-18 10:29:33:399 +0300 
> [INFO][%ItTxDistributedTestSingleNode_null_2%JRaft-FSMCaller-Disruptor-_stripe_35-0][ActionRequestProcessor]
>  Error occurred on a user's state machine
> class org.apache.ignite.tx.TransactionException: Failed to enlist a key into 
> a transaction, state=ABORTED
>   at 
> org.apache.ignite.internal.table.distributed.raft.PartitionListener.tryEnlistIntoTransaction(PartitionListener.java:196)
>   at 
> org.apache.ignite.internal.table.distributed.raft.PartitionListener.lambda$onWrite$1(PartitionListener.java:134)
>   at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
>   at 
> org.apache.ignite.internal.table.distributed.raft.PartitionListener.onWrite(PartitionListener.java:131)
>   at 
> org.apache.ignite.internal.raft.server.impl.JraftServerImpl$DelegatingStateMachine.onApply(JraftServerImpl.java:415)
>   at 
> org.apache.ignite.raft.jraft.core.FSMCallerImpl.doApplyTasks(FSMCallerImpl.java:539)
>   at 
> org.apache.ignite.raft.jraft.core.FSMCallerImpl.doCommitted(FSMCallerImpl.java:507)
>   at 
> org.apache.ignite.raft.jraft.core.FSMCallerImpl.runApplyTask(FSMCallerImpl.java:437)
>   at 
> org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:134)
>   at 
> org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:128)
>   at 
> org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:215)
>   at 
> org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:179)
>   at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:137)
>   at java.base/java.lang.Thread.run(Thread.java:834)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.

[jira] [Commented] (IGNITE-16700) ItTxDistributedTestThreeNodesThreeReplicas#testBalance is flaky



[ 
https://issues.apache.org/jira/browse/IGNITE-16700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754022#comment-17754022
 ] 

Denis Chudov commented on IGNITE-16700:
---

I discovered flakiness of TxLocalTest#testBalance after unmuting it (see 
IGNITE-20205 ) - this is a mock of transactional logic based on local dummy 
table.

> ItTxDistributedTestThreeNodesThreeReplicas#testBalance is flaky
> ---
>
> Key: IGNITE-16700
> URL: https://issues.apache.org/jira/browse/IGNITE-16700
> Project: Ignite
>  Issue Type: Bug
>Reporter: Mirza Aliev
>Assignee: Denis Chudov
>Priority: Major
>  Labels: ignite-3
> Attachments: _Integration_Tests_Module_Table_2055.log, 
> _Integration_Tests_Module_Table_2098.log
>
>
> {{ItTxDistributedTestThreeNodesThreeReplicas#testBalance}} periodically falls 
> with 
> {noformat}
> org.apache.ignite.lang.IgniteException
> org.apache.ignite.lang.IgniteException: java.util.concurrent.TimeoutException 
> ==> expected:  but was: 
> {noformat}
> We've noticed that the test become flaky after IGNITE-16393 has been merged. 
> Probably, the current problem is related to the problem with stopping 
> executors for network's user object serialization threads IGNITE-16699 as far 
> as the logs are full of warnings from IGNITE-16699.
> The plan for this ticket is to wait for IGNITE-16699 to be fixed and check 
> whether this issue is still reproducible. 
> https://ci.ignite.apache.org/buildConfiguration/ignite3_Test_IntegrationTests_ModuleTable/6466138
> UPD: Ticket IGNITE-16699 has been fixed and but the current ticket is still 
> reproducible, so the problem is not related to IGNITE-16699.
> In logs, we can see some suspicious message, need to investigate if this is 
> related to the problem. Actual run 
> https://ci.ignite.apache.org/buildConfiguration/ignite3_Test_RunAllTests/6470268,
>  actual logs are attached
> {noformat}
> 2022-03-18 10:29:33:399 +0300 
> [INFO][%ItTxDistributedTestSingleNode_null_2%JRaft-FSMCaller-Disruptor-_stripe_35-0][ActionRequestProcessor]
>  Error occurred on a user's state machine
> class org.apache.ignite.tx.TransactionException: Failed to enlist a key into 
> a transaction, state=ABORTED
>   at 
> org.apache.ignite.internal.table.distributed.raft.PartitionListener.tryEnlistIntoTransaction(PartitionListener.java:196)
>   at 
> org.apache.ignite.internal.table.distributed.raft.PartitionListener.lambda$onWrite$1(PartitionListener.java:134)
>   at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
>   at 
> org.apache.ignite.internal.table.distributed.raft.PartitionListener.onWrite(PartitionListener.java:131)
>   at 
> org.apache.ignite.internal.raft.server.impl.JraftServerImpl$DelegatingStateMachine.onApply(JraftServerImpl.java:415)
>   at 
> org.apache.ignite.raft.jraft.core.FSMCallerImpl.doApplyTasks(FSMCallerImpl.java:539)
>   at 
> org.apache.ignite.raft.jraft.core.FSMCallerImpl.doCommitted(FSMCallerImpl.java:507)
>   at 
> org.apache.ignite.raft.jraft.core.FSMCallerImpl.runApplyTask(FSMCallerImpl.java:437)
>   at 
> org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:134)
>   at 
> org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:128)
>   at 
> org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:215)
>   at 
> org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:179)
>   at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:137)
>   at java.base/java.lang.Thread.run(Thread.java:834)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20199) Do not return updating rebalance assignments futures in DistributionZoneRebalanceEngine#onUpdateReplicas



 [ 
https://issues.apache.org/jira/browse/IGNITE-20199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mirza Aliev updated IGNITE-20199:
-
Epic Link: IGNITE-20166

> Do not return updating rebalance assignments futures in 
> DistributionZoneRebalanceEngine#onUpdateReplicas 
> -
>
> Key: IGNITE-20199
> URL: https://issues.apache.org/jira/browse/IGNITE-20199
> Project: Ignite
>  Issue Type: Bug
>Reporter: Mirza Aliev
>Priority: Major
>  Labels: ignite-3
>
> Seems that the current logic in 
> {{DistributionZoneRebalanceEngine#onUpdateReplicas}} is not correct in terms 
> of futures chaining. Currently we block configuration notification thread 
> until all partitions would updates theirs rebalance assignments keys in 
> metastorage. 
>  
> {code:java}
> private CompletableFuture 
> onUpdateReplicas(ConfigurationNotificationEvent replicasCtx) {
> ...
> ...
> return 
> distributionZoneManager.dataNodes(replicasCtx.storageRevision(), 
> zoneCfg.zoneId())
> .thenCompose(dataNodes -> {
> ...
> for (TableView tableCfg : tableViews) {
>...
> CompletableFuture[] partitionFutures = 
> RebalanceUtil.triggerAllTablePartitionsRebalance(...);
> tableFutures.add(allOf(partitionFutures));
> }
> return 
> allOf(tableFutures.toArray(CompletableFuture[]::new));
> });
> ...
> } {code}
> As a solution, we could just return completed future in the 
> {{DistributionZoneRebalanceEngine#onUpdateReplicas}} after we started 
> asynchronous logic of updating rebalance assignmnets.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (IGNITE-16700) ItTxDistributedTestThreeNodesThreeReplicas#testBalance is flaky



[ 
https://issues.apache.org/jira/browse/IGNITE-16700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754024#comment-17754024
 ] 

Denis Chudov commented on IGNITE-16700:
---

I made 31 builds of the Table module, seems to be okay: 
https://ci.ignite.apache.org/viewType.html?buildTypeId=ApacheIgnite3xGradle_Test_IntegrationTests_ModuleTable&branch_ApacheIgnite3xGradle_Test_IntegrationTests=pull%2F2439&tab=buildTypeHistoryList

> ItTxDistributedTestThreeNodesThreeReplicas#testBalance is flaky
> ---
>
> Key: IGNITE-16700
> URL: https://issues.apache.org/jira/browse/IGNITE-16700
> Project: Ignite
>  Issue Type: Bug
>Reporter: Mirza Aliev
>Assignee: Denis Chudov
>Priority: Major
>  Labels: ignite-3
> Attachments: _Integration_Tests_Module_Table_2055.log, 
> _Integration_Tests_Module_Table_2098.log
>
>
> {{ItTxDistributedTestThreeNodesThreeReplicas#testBalance}} periodically falls 
> with 
> {noformat}
> org.apache.ignite.lang.IgniteException
> org.apache.ignite.lang.IgniteException: java.util.concurrent.TimeoutException 
> ==> expected:  but was: 
> {noformat}
> We've noticed that the test become flaky after IGNITE-16393 has been merged. 
> Probably, the current problem is related to the problem with stopping 
> executors for network's user object serialization threads IGNITE-16699 as far 
> as the logs are full of warnings from IGNITE-16699.
> The plan for this ticket is to wait for IGNITE-16699 to be fixed and check 
> whether this issue is still reproducible. 
> https://ci.ignite.apache.org/buildConfiguration/ignite3_Test_IntegrationTests_ModuleTable/6466138
> UPD: Ticket IGNITE-16699 has been fixed and but the current ticket is still 
> reproducible, so the problem is not related to IGNITE-16699.
> In logs, we can see some suspicious message, need to investigate if this is 
> related to the problem. Actual run 
> https://ci.ignite.apache.org/buildConfiguration/ignite3_Test_RunAllTests/6470268,
>  actual logs are attached
> {noformat}
> 2022-03-18 10:29:33:399 +0300 
> [INFO][%ItTxDistributedTestSingleNode_null_2%JRaft-FSMCaller-Disruptor-_stripe_35-0][ActionRequestProcessor]
>  Error occurred on a user's state machine
> class org.apache.ignite.tx.TransactionException: Failed to enlist a key into 
> a transaction, state=ABORTED
>   at 
> org.apache.ignite.internal.table.distributed.raft.PartitionListener.tryEnlistIntoTransaction(PartitionListener.java:196)
>   at 
> org.apache.ignite.internal.table.distributed.raft.PartitionListener.lambda$onWrite$1(PartitionListener.java:134)
>   at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
>   at 
> org.apache.ignite.internal.table.distributed.raft.PartitionListener.onWrite(PartitionListener.java:131)
>   at 
> org.apache.ignite.internal.raft.server.impl.JraftServerImpl$DelegatingStateMachine.onApply(JraftServerImpl.java:415)
>   at 
> org.apache.ignite.raft.jraft.core.FSMCallerImpl.doApplyTasks(FSMCallerImpl.java:539)
>   at 
> org.apache.ignite.raft.jraft.core.FSMCallerImpl.doCommitted(FSMCallerImpl.java:507)
>   at 
> org.apache.ignite.raft.jraft.core.FSMCallerImpl.runApplyTask(FSMCallerImpl.java:437)
>   at 
> org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:134)
>   at 
> org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:128)
>   at 
> org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:215)
>   at 
> org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:179)
>   at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:137)
>   at java.base/java.lang.Thread.run(Thread.java:834)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Assigned] (IGNITE-19211) ODBC 3.0: Align metainfo provided by driver with SQL engine in 3.0



 [ 
https://issues.apache.org/jira/browse/IGNITE-19211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Sapego reassigned IGNITE-19211:


Assignee: Igor Sapego

> ODBC 3.0: Align metainfo provided by driver with SQL engine in 3.0
> --
>
> Key: IGNITE-19211
> URL: https://issues.apache.org/jira/browse/IGNITE-19211
> Project: Ignite
>  Issue Type: Improvement
>  Components: odbc
>Reporter: Igor Sapego
>Assignee: Igor Sapego
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>
> Scope: 
> - Make sure we return proper metainformation on SQL types. Check 
> ignite/odbc/meta, ignite/odbc/type_traits.h, etc;
> - Port tests that are applicable;
> - Add new tests where needed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Assigned] (IGNITE-19983) C++: Support BOOLEAN datatype



 [ 
https://issues.apache.org/jira/browse/IGNITE-19983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Sapego reassigned IGNITE-19983:


Assignee: Igor Sapego

> C++: Support BOOLEAN datatype
> -
>
> Key: IGNITE-19983
> URL: https://issues.apache.org/jira/browse/IGNITE-19983
> Project: Ignite
>  Issue Type: Improvement
>  Components: platforms, thin client
>Reporter: Igor Sapego
>Assignee: Igor Sapego
>Priority: Major
>  Labels: ignite-3
>
> IGNITE-17298 added support for boolean type to server, so we need to add it 
> to C++ client as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20187) Catch-up rebalance on node restart: assignments keys



 [ 
https://issues.apache.org/jira/browse/IGNITE-20187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Lapin updated IGNITE-20187:
-
Description: 
h3. Motivation

Prior to the implementation of the meta storage compaction and the related node 
restart updates, the node restored its volatile state in terms of assignments 
through ms.watches starting from APPLIED_REVISION + 1. Meaning that after the 
restart, the node was notified about missing state through {*}the events{*}. 
However, it's no longer true: new logic assumes that the node will register 
ms.watch starting from APPLIED_REVISION + X + 1 and will manually read local 
meta storage state for APPLIED_REVISION +X along with related processing. The 
implementation of the above process is the essence of this ticket.
h3. Definition of Done

Within node restart process, TableManager or similar should manually read local 
assignments pending keys (reading assignments stable will be covered in a 
separate ticket) and schedule corresponding rebalance.
h3. Implementation Notes

It's possible that assignemnts.pending keys will be stale at the moment of 
processing, so in order to overcome given issue following 
common-for-current-rebalance steps are proposed:
 # Start all new needed nodes {{partition.assignments.pending / 
partition.assignments.stable}}
 # After successful starts - check if current node is the leader of raft group 
(leader response must be updated by current term), if it is
 # Read distributed {{partition.assignments.pending }}and if the retrieved 
revision is less or equal to the one retrieved within initial local read run{{ 
}}{{{}RaftGroupService#changePeersAsync(leaderTerm, peers){}}}{{{}. 
{}}}{{RaftGroupService#changePeersAsync}}{{ from old terms must be skipped.}}

Seems that 
https://github.com/apache/ignite-3/blob/main/modules/table/tech-notes/rebalance.md
 should be also updated a bit.

> Catch-up rebalance on node restart: assignments keys
> 
>
> Key: IGNITE-20187
> URL: https://issues.apache.org/jira/browse/IGNITE-20187
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Alexander Lapin
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> Prior to the implementation of the meta storage compaction and the related 
> node restart updates, the node restored its volatile state in terms of 
> assignments through ms.watches starting from APPLIED_REVISION + 1. Meaning 
> that after the restart, the node was notified about missing state through 
> {*}the events{*}. However, it's no longer true: new logic assumes that the 
> node will register ms.watch starting from APPLIED_REVISION + X + 1 and will 
> manually read local meta storage state for APPLIED_REVISION +X along with 
> related processing. The implementation of the above process is the essence of 
> this ticket.
> h3. Definition of Done
> Within node restart process, TableManager or similar should manually read 
> local assignments pending keys (reading assignments stable will be covered in 
> a separate ticket) and schedule corresponding rebalance.
> h3. Implementation Notes
> It's possible that assignemnts.pending keys will be stale at the moment of 
> processing, so in order to overcome given issue following 
> common-for-current-rebalance steps are proposed:
>  # Start all new needed nodes {{partition.assignments.pending / 
> partition.assignments.stable}}
>  # After successful starts - check if current node is the leader of raft 
> group (leader response must be updated by current term), if it is
>  # Read distributed {{partition.assignments.pending }}and if the retrieved 
> revision is less or equal to the one retrieved within initial local read 
> run{{ }}{{{}RaftGroupService#changePeersAsync(leaderTerm, peers){}}}{{{}. 
> {}}}{{RaftGroupService#changePeersAsync}}{{ from old terms must be skipped.}}
> Seems that 
> https://github.com/apache/ignite-3/blob/main/modules/table/tech-notes/rebalance.md
>  should be also updated a bit.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20187) Catch-up rebalance on node restart: assignments keys



 [ 
https://issues.apache.org/jira/browse/IGNITE-20187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Lapin updated IGNITE-20187:
-
Description: 
h3. Motivation

Prior to the implementation of the meta storage compaction and the related node 
restart updates, the node restored its volatile state in terms of assignments 
through ms.watches starting from APPLIED_REVISION + 1. Meaning that after the 
restart, the node was notified about missing state through {*}the events{*}. 
However, it's no longer true: new logic assumes that the node will register 
ms.watch starting from APPLIED_REVISION + X + 1 and will manually read local 
meta storage state for APPLIED_REVISION +X along with related processing. The 
implementation of the above process is the essence of this ticket.
h3. Definition of Done

Within node restart process, TableManager or similar should manually read local 
assignments pending keys (reading assignments stable will be covered in a 
separate ticket) and schedule corresponding rebalance.
h3. Implementation Notes

It's possible that assignemnts.pending keys will be stale at the moment of 
processing, so in order to overcome given issue following 
common-for-current-rebalance steps are proposed:
 # Start all new needed nodes {{partition.assignments.pending / 
partition.assignments.stable}}
 # After successful starts - check if current node is the leader of raft group 
(leader response must be updated by current term), if it is
 # Read distributed {{partition.assignments.pending }}and if the retrieved 
revision is less or equal to the one retrieved within initial local read run 
RaftGroupService#changePeersAsync(leaderTerm, peers) 
RaftGroupService#changePeersAsync from old terms must be skipped.

Seems that 
https://github.com/apache/ignite-3/blob/main/modules/table/tech-notes/rebalance.md
 should be also updated a bit.

  was:
h3. Motivation

Prior to the implementation of the meta storage compaction and the related node 
restart updates, the node restored its volatile state in terms of assignments 
through ms.watches starting from APPLIED_REVISION + 1. Meaning that after the 
restart, the node was notified about missing state through {*}the events{*}. 
However, it's no longer true: new logic assumes that the node will register 
ms.watch starting from APPLIED_REVISION + X + 1 and will manually read local 
meta storage state for APPLIED_REVISION +X along with related processing. The 
implementation of the above process is the essence of this ticket.
h3. Definition of Done

Within node restart process, TableManager or similar should manually read local 
assignments pending keys (reading assignments stable will be covered in a 
separate ticket) and schedule corresponding rebalance.
h3. Implementation Notes

It's possible that assignemnts.pending keys will be stale at the moment of 
processing, so in order to overcome given issue following 
common-for-current-rebalance steps are proposed:
 # Start all new needed nodes {{partition.assignments.pending / 
partition.assignments.stable}}
 # After successful starts - check if current node is the leader of raft group 
(leader response must be updated by current term), if it is
 # Read distributed {{partition.assignments.pending }}and if the retrieved 
revision is less or equal to the one retrieved within initial local read run{{ 
}}{{{}RaftGroupService#changePeersAsync(leaderTerm, peers){}}}{{{}. 
{}}}{{RaftGroupService#changePeersAsync}}{{ from old terms must be skipped.}}

Seems that 
https://github.com/apache/ignite-3/blob/main/modules/table/tech-notes/rebalance.md
 should be also updated a bit.


> Catch-up rebalance on node restart: assignments keys
> 
>
> Key: IGNITE-20187
> URL: https://issues.apache.org/jira/browse/IGNITE-20187
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Alexander Lapin
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> Prior to the implementation of the meta storage compaction and the related 
> node restart updates, the node restored its volatile state in terms of 
> assignments through ms.watches starting from APPLIED_REVISION + 1. Meaning 
> that after the restart, the node was notified about missing state through 
> {*}the events{*}. However, it's no longer true: new logic assumes that the 
> node will register ms.watch starting from APPLIED_REVISION + X + 1 and will 
> manually read local meta storage state for APPLIED_REVISION +X along with 
> related processing. The implementation of the above process is the essence of 
> this ticket.
> h3. Definition of Done
> Within node restart process, TableManager or similar should manually read 
> local assignments pending keys (reading assignments stable will be covered in 
> a separate ticket) and schedule corresponding rebalance.
> h3. Implementati

[jira] [Created] (IGNITE-20209) Catch-up rebalance triggers on node restart

Alexander Lapin created IGNITE-20209:


 Summary: Catch-up rebalance triggers on node restart
 Key: IGNITE-20209
 URL: https://issues.apache.org/jira/browse/IGNITE-20209
 Project: Ignite
  Issue Type: Improvement
Reporter: Alexander Lapin






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Assigned] (IGNITE-20015) Sql. Introduce new distribution function

2023-08-14 Thread Andrey Mashenkov (Jira)



 [ 
https://issues.apache.org/jira/browse/IGNITE-20015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Mashenkov reassigned IGNITE-20015:
-

Assignee: (was: Andrey Mashenkov)

> Sql. Introduce new distribution function
> 
>
> Key: IGNITE-20015
> URL: https://issues.apache.org/jira/browse/IGNITE-20015
> Project: Ignite
>  Issue Type: Improvement
>  Components: sql
>Reporter: Konstantin Orlov
>Priority: Major
>  Labels: ignite-3
>
> To realize the full potential of sql engine in queries over node specific 
> views, we need to support new type of distribution function 
> ({{org.apache.ignite.internal.sql.engine.trait.DistributionFunction}}). The 
> semantic of this new function should be pretty strait forward: the column 
> this function refers to is actually an identity of the node containing the 
> data.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20209) Catch-up rebalance triggers on node restart



 [ 
https://issues.apache.org/jira/browse/IGNITE-20209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Lapin updated IGNITE-20209:
-
Description: 
h3. Motivation

Please check https://issues.apache.org/jira/browse/IGNITE-20187 for more 
context, that is about catching-up assignments.pending meta storage keys, 
whether given one is about catching-up its triggers:
 * Replica factor updates.
 * Partitions count updates. Immutable for now.
 * Data nodes updates.
 * Replica storage addition/removal. !By the way, is it possible to remove 
replica storage.

For all aforementioned cases, it's required to update distributed assignments 
pending (planned) keys if it's not yet done. And the only difficulty here is 
precisely in understanding whether this was done or not.
h3. Definition of Done

Updated distributed assignments pending(planned) keys if necessary according to 
the current triggers state.

> Catch-up rebalance triggers on node restart
> ---
>
> Key: IGNITE-20209
> URL: https://issues.apache.org/jira/browse/IGNITE-20209
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Alexander Lapin
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> Please check https://issues.apache.org/jira/browse/IGNITE-20187 for more 
> context, that is about catching-up assignments.pending meta storage keys, 
> whether given one is about catching-up its triggers:
>  * Replica factor updates.
>  * Partitions count updates. Immutable for now.
>  * Data nodes updates.
>  * Replica storage addition/removal. !By the way, is it possible to remove 
> replica storage.
> For all aforementioned cases, it's required to update distributed assignments 
> pending (planned) keys if it's not yet done. And the only difficulty here is 
> precisely in understanding whether this was done or not.
> h3. Definition of Done
> Updated distributed assignments pending(planned) keys if necessary according 
> to the current triggers state.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (IGNITE-20210) Start partitions on corresponding assignments.stable, calculate if missing, cleanup obsolete resources

Alexander Lapin created IGNITE-20210:


 Summary: Start partitions on corresponding assignments.stable, 
calculate if missing, cleanup obsolete resources
 Key: IGNITE-20210
 URL: https://issues.apache.org/jira/browse/IGNITE-20210
 Project: Ignite
  Issue Type: Improvement
Reporter: Alexander Lapin






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (IGNITE-20211) GridTxfuture's (scope#3) code deduplication

2023-08-14 Thread Anton Vinogradov (Jira)

Anton Vinogradov created IGNITE-20211:
-

 Summary: Grid*Tx*future's (scope#3) code deduplication
 Key: IGNITE-20211
 URL: https://issues.apache.org/jira/browse/IGNITE-20211
 Project: Ignite
  Issue Type: Sub-task
Reporter: Anton Vinogradov
Assignee: Anton Vinogradov






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20210) Start partitions on corresponding assignments.stable, calculate if missing, cleanup obsolete resources



 [ 
https://issues.apache.org/jira/browse/IGNITE-20210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Lapin updated IGNITE-20210:
-
Description: 
h3. Motivation

Please check https://issues.apache.org/jira/browse/IGNITE-20187 and 
https://issues.apache.org/jira/browse/IGNITE-20209 for more details. This 
ticket is about assignments stable catch-up. Obviously there are the following 
possibilities:
 # Assignments.stable are present - just start table locally. Basically it is 
IGNITE-20187 but not for assignments pending but stable.
 # Assignemnts stable are missing. Well it's the same as IGNITE-20209 but for 
table creation triggers and not rebalance ones.

Besides that it's nessessary to cleanup obsolete resourves e.g. raft and 
partitions storages.

Currently, all that stuff is implemented incorrectly through:
{code:java}
if (partitionAssignments(vaultManager, tableId, 0) != null) {
assignmentsFuture = completedFuture(tableAssignments(vaultManager, tableId, 
zoneDescriptor.partitions()));
} else {
assignmentsFuture = 
distributionZoneManager.dataNodes(ctx.storageRevision(), 
tableDescriptor.zoneId())
.thenApply(dataNodes -> AffinityUtils.calculateAssignments(
dataNodes,
zoneDescriptor.partitions(),
zoneDescriptor.replicas()
));
} {code}
h3. Definition of Done
 * Assignments.stable update is properly catched-up on top of corresponding 
table creation triggers.
 * Partitions start up is implemented trough assignments.stable instead of 
table cfg triggers along with assignments recalculation, like it's implemented 
now.
 * Obsolete partition storages are removed on node restart.

 

> Start partitions on corresponding assignments.stable, calculate if missing, 
> cleanup obsolete resources
> --
>
> Key: IGNITE-20210
> URL: https://issues.apache.org/jira/browse/IGNITE-20210
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Alexander Lapin
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> Please check https://issues.apache.org/jira/browse/IGNITE-20187 and 
> https://issues.apache.org/jira/browse/IGNITE-20209 for more details. This 
> ticket is about assignments stable catch-up. Obviously there are the 
> following possibilities:
>  # Assignments.stable are present - just start table locally. Basically it is 
> IGNITE-20187 but not for assignments pending but stable.
>  # Assignemnts stable are missing. Well it's the same as IGNITE-20209 but for 
> table creation triggers and not rebalance ones.
> Besides that it's nessessary to cleanup obsolete resourves e.g. raft and 
> partitions storages.
> Currently, all that stuff is implemented incorrectly through:
> {code:java}
> if (partitionAssignments(vaultManager, tableId, 0) != null) {
> assignmentsFuture = completedFuture(tableAssignments(vaultManager, 
> tableId, zoneDescriptor.partitions()));
> } else {
> assignmentsFuture = 
> distributionZoneManager.dataNodes(ctx.storageRevision(), 
> tableDescriptor.zoneId())
> .thenApply(dataNodes -> AffinityUtils.calculateAssignments(
> dataNodes,
> zoneDescriptor.partitions(),
> zoneDescriptor.replicas()
> ));
> } {code}
> h3. Definition of Done
>  * Assignments.stable update is properly catched-up on top of corresponding 
> table creation triggers.
>  * Partitions start up is implemented trough assignments.stable instead of 
> table cfg triggers along with assignments recalculation, like it's 
> implemented now.
>  * Obsolete partition storages are removed on node restart.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20124) Prevent double storage updates within primary



 [ 
https://issues.apache.org/jira/browse/IGNITE-20124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denis Chudov updated IGNITE-20124:
--
Description: 
h3. Motivation

In order to preserve the guarantee that the primary replica is always 
up-to-date it's required to:
 * In case of common RW transaction - insert writeIntent to the storage within 
primary before replication.
 * In case of one-phase-commit - insert commitedWrite after the replication.

Both have already been done. However, that means that if primary is part of the 
replication group, and it's true in almost all cases, we will double the update:
 * In case of common RW transaction - through the replication.
 * In case of one-phase-commit - either through the replication, or though post 
update, if replication was fast enough.

h3. Definition of Done
 * Prevent double storage updates within primary.

h3. Implementation Notes

The easiest way to prevent double insert is to skip one if local safe time is 
greater or equal to candidates. There are 3 places where we update partition 
storage:
 # Primary pre-replication update. In that case, the second update on 
replication should be excluded.
 # Primary post-replication update in case of 1PC. It's possible to see already 
updated data if replication was already processed locally. It is expected to be 
already covered in https://issues.apache.org/jira/browse/IGNITE-15927 . We 
should check the primary safe time on post-replication update and don't do 
update if the safe time is already adjusted.
 # Insert through replication. In case of !1PC on every primary there will be 
double insert (see 1). In case of 1PC it depends, so we should check the safe 
time on primary to know whether the update should be done (see 2).

In every case, the storage indexes still should be adjusted on replication, as 
it is done now, because the progress of indexes on FSM write operations should 
not be violated - otherwise, a Raft snapshot-based rebalance would be broken. 
We may have two non-consistent storage updates on primary which may affect 
different fsyncs, so maybe we should benchmark this optimization to find out 
how useful it is. The transactional correctness isn't violated by these 
non-consistent storage updates, because there is only a possibility that some 
writes or write intents will go ahead of indexes and therefore will be included 
into snapshots - however we still can process such writes and resolve write 
intents.

Also, the safe time needs to be updated on the primary replica now.

  was:
h3. Motivation

In order to preserve the guarantee that the primary replica is always 
up-to-date it's required to:
 * In case of common RW transaction - insert writeIntent to the storage within 
primary before replication.
 * In case of one-phase-commit - insert commitedWrite after the replication.

Both have already been done. However, that means that if primary is part of the 
replication group, and it's true in almost all cases, we will double the update:
 * In case of common RW transaction - through the replication.
 * In case of one-phase-commit - either through the replication, or though post 
update, if replication was fast enough.

h3. Definition of Done
 * Prevent double storage updates within primary.

h3. Implementation Notes

The easiest way to prevent double insert is to skip one if local safe time is 
greater or equal to candidates. There are 3 places where we update partition 
storage:
 # Primary pre-replication update. In that case, the second update on 
replication should be excluded.
 # Primary post-replication update in case of 1PC. It's possible to see already 
updated data if replication was already processed locally. It is expected to be 
already covered in https://issues.apache.org/jira/browse/IGNITE-15927 . We 
should check the primary safe time on post-replication update and don't do 
update if the safe time is already adjusted.
 # Insert through replication. In case of !1PC on every primary there will be 
double insert (see 1). In case of 1PC it depends, so we should check the safe 
time on primary to know whether the update should be done (see 2).

In every case, the storage indexes still should be adjusted on replication, as 
it is done now, because the progress of indexes on FSM write operations should 
not be violated - otherwise, a Raft snapshot-based rebalance would be broken. 
We may have two non-consistent storage updates on primary which may affect 
different fsyncs, so maybe we should benchmark this optimization to find out 
how useful it is. The transactional correctness isn't violated by these 

 


> Prevent double storage updates within primary
> -
>
> Key: IGNITE-20124
> URL: https://issues.apache.org/jira/browse/IGNITE-20124
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Alexander Lapin
>

[jira] [Created] (IGNITE-20212) Investigate maximum throughput of inserts via DataStreamer

2023-08-14 Thread Alexey Scherbakov (Jira)

Alexey Scherbakov created IGNITE-20212:
--

 Summary: Investigate maximum throughput of inserts via DataStreamer
 Key: IGNITE-20212
 URL: https://issues.apache.org/jira/browse/IGNITE-20212
 Project: Ignite
  Issue Type: Task
Reporter: Alexey Scherbakov
 Fix For: 3.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20212) Investigate maximum throughput of inserts via DataStreamer

2023-08-14 Thread Alexey Scherbakov (Jira)



 [ 
https://issues.apache.org/jira/browse/IGNITE-20212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Scherbakov updated IGNITE-20212:
---
Epic Link: IGNITE-19479

> Investigate maximum throughput of inserts via DataStreamer
> --
>
> Key: IGNITE-20212
> URL: https://issues.apache.org/jira/browse/IGNITE-20212
> Project: Ignite
>  Issue Type: Task
>Reporter: Alexey Scherbakov
>Priority: Major
> Fix For: 3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20207) Improve the error handling



 [ 
https://issues.apache.org/jira/browse/IGNITE-20207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Gagarkin updated IGNITE-20207:
---
Summary: Improve the error handling  (was: Improve the writing of files in 
FileTransferService)

> Improve the error handling
> --
>
> Key: IGNITE-20207
> URL: https://issues.apache.org/jira/browse/IGNITE-20207
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Gagarkin
>Priority: Major
>  Labels: ignite-3
>
> The current implementation of 
> {{org.apache.ignite.internal.network.file.ChunkedFileWriter}} compares the 
> file pointer with the offset of the received file chunk. If they are equal, 
> the chunk is written to the disk; if not, the chunk is placed in the queue, 
> and it will be written when all previous chunks have been written.
> It might be more efficient to write chunks instantly.
> We should investigate this approach and improve the implementation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-15927) Implement one phase commit

2023-08-14 Thread Alexey Scherbakov (Jira)



 [ 
https://issues.apache.org/jira/browse/IGNITE-15927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Scherbakov updated IGNITE-15927:
---
Labels: ignite-3 ignite3_performance  (was: ignite-3)

> Implement one phase commit
> --
>
> Key: IGNITE-15927
> URL: https://issues.apache.org/jira/browse/IGNITE-15927
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Alexey Scherbakov
>Assignee: Alexey Scherbakov
>Priority: Major
>  Labels: ignite-3, ignite3_performance
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> If all keys in the implicit transaction belong to a same partition in can be 
> committed in one round-trip.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20207) Improve the error handling



 [ 
https://issues.apache.org/jira/browse/IGNITE-20207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Gagarkin updated IGNITE-20207:
---
Description: The current implementation of 
org.apache.ignite.internal.network.file.FileTransferService doesn't provide 
recovery functionality. Any error during file transfer leads to repeating the 
transfer from scratch.   (was: The current implementation of 
org.apache.ignite.internal.network.file.FileTransferService doesn't provide 
recovery functionality. Any error during file transfer leads to repeat the 
transfer from scratch. )

> Improve the error handling
> --
>
> Key: IGNITE-20207
> URL: https://issues.apache.org/jira/browse/IGNITE-20207
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Gagarkin
>Priority: Major
>  Labels: ignite-3
>
> The current implementation of 
> org.apache.ignite.internal.network.file.FileTransferService doesn't provide 
> recovery functionality. Any error during file transfer leads to repeating the 
> transfer from scratch. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20207) Improve the error handling



 [ 
https://issues.apache.org/jira/browse/IGNITE-20207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Gagarkin updated IGNITE-20207:
---
Description: The current implementation of 
org.apache.ignite.internal.network.file.FileTransferService doesn't provide 
recovery functionality. Any error during file transfer leads to repeat the 
transfer from scratch.   (was: The current implementation of 
{{org.apache.ignite.internal.network.file.ChunkedFileWriter}} compares the file 
pointer with the offset of the received file chunk. If they are equal, the 
chunk is written to the disk; if not, the chunk is placed in the queue, and it 
will be written when all previous chunks have been written.

It might be more efficient to write chunks instantly.

We should investigate this approach and improve the implementation.)

> Improve the error handling
> --
>
> Key: IGNITE-20207
> URL: https://issues.apache.org/jira/browse/IGNITE-20207
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Gagarkin
>Priority: Major
>  Labels: ignite-3
>
> The current implementation of 
> org.apache.ignite.internal.network.file.FileTransferService doesn't provide 
> recovery functionality. Any error during file transfer leads to repeat the 
> transfer from scratch. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20207) Improve the error handling



 [ 
https://issues.apache.org/jira/browse/IGNITE-20207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Gagarkin updated IGNITE-20207:
---
Description: The current implementation of 
org.apache.ignite.internal.network.file.FileTransferService doesn't provide 
recovery functionality. Any error during file transfer leads to repeating the 
transfer from scratch. We need to define cases when Ignite can provide recovery 
and implement this functionality.   (was: The current implementation of 
org.apache.ignite.internal.network.file.FileTransferService doesn't provide 
recovery functionality. Any error during file transfer leads to repeating the 
transfer from scratch. )

> Improve the error handling
> --
>
> Key: IGNITE-20207
> URL: https://issues.apache.org/jira/browse/IGNITE-20207
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Gagarkin
>Priority: Major
>  Labels: ignite-3
>
> The current implementation of 
> org.apache.ignite.internal.network.file.FileTransferService doesn't provide 
> recovery functionality. Any error during file transfer leads to repeating the 
> transfer from scratch. We need to define cases when Ignite can provide 
> recovery and implement this functionality. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20203) File transfer for Ignite 3



 [ 
https://issues.apache.org/jira/browse/IGNITE-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Gagarkin updated IGNITE-20203:
---
Description: As outcome of 

> File transfer for Ignite 3
> --
>
> Key: IGNITE-20203
> URL: https://issues.apache.org/jira/browse/IGNITE-20203
> Project: Ignite
>  Issue Type: Epic
>Reporter: Ivan Gagarkin
>Priority: Major
>  Labels: ignite-3
>
> As outcome of 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20203) File transfer for Ignite 3



 [ 
https://issues.apache.org/jira/browse/IGNITE-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Gagarkin updated IGNITE-20203:
---
Description: As outcome of  IGNITE-19009, we   (was: As outcome of )

> File transfer for Ignite 3
> --
>
> Key: IGNITE-20203
> URL: https://issues.apache.org/jira/browse/IGNITE-20203
> Project: Ignite
>  Issue Type: Epic
>Reporter: Ivan Gagarkin
>Priority: Major
>  Labels: ignite-3
>
> As outcome of  IGNITE-19009, we 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20203) File transfer for Ignite 3



 [ 
https://issues.apache.org/jira/browse/IGNITE-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Gagarkin updated IGNITE-20203:
---
Description: 
In the outcome of IGNITE-19009, we obtained the new module 
{{{}ignite-file-transfer{}}}.

All file transfers in Ignite 3 should utilize the new service, 
FileTransferService. Additionally, there are some aspects of the service that 
need improvement.

  was:As outcome of  IGNITE-19009, we 


> File transfer for Ignite 3
> --
>
> Key: IGNITE-20203
> URL: https://issues.apache.org/jira/browse/IGNITE-20203
> Project: Ignite
>  Issue Type: Epic
>Reporter: Ivan Gagarkin
>Priority: Major
>  Labels: ignite-3
>
> In the outcome of IGNITE-19009, we obtained the new module 
> {{{}ignite-file-transfer{}}}.
> All file transfers in Ignite 3 should utilize the new service, 
> FileTransferService. Additionally, there are some aspects of the service that 
> need improvement.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20208) Reduce the size of



 [ 
https://issues.apache.org/jira/browse/IGNITE-20208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Gagarkin updated IGNITE-20208:
---
Summary: Reduce the size of   (was: Use file ids instead of file names when 
transferring file chunks)

> Reduce the size of 
> ---
>
> Key: IGNITE-20208
> URL: https://issues.apache.org/jira/browse/IGNITE-20208
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Gagarkin
>Priority: Major
>  Labels: ignite-3
>
> We can decrease the size of 
> org.apache.ignite.internal.network.file.messages.FileChunkMessage by 
> replacing file names with file ids. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20208) Reduce the size of FileChunkMessage



 [ 
https://issues.apache.org/jira/browse/IGNITE-20208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Gagarkin updated IGNITE-20208:
---
Summary: Reduce the size of FileChunkMessage  (was: Reduce the size of )

> Reduce the size of FileChunkMessage
> ---
>
> Key: IGNITE-20208
> URL: https://issues.apache.org/jira/browse/IGNITE-20208
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Ivan Gagarkin
>Priority: Major
>  Labels: ignite-3
>
> We can decrease the size of 
> org.apache.ignite.internal.network.file.messages.FileChunkMessage by 
> replacing file names with file ids. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (IGNITE-20213) RO transactions should not block LWM from rising

Aleksandr Polovtcev created IGNITE-20213:


 Summary: RO transactions should not block LWM from rising
 Key: IGNITE-20213
 URL: https://issues.apache.org/jira/browse/IGNITE-20213
 Project: Ignite
  Issue Type: Task
Reporter: Aleksandr Polovtcev
Assignee: Alexander Lapin






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20213) RO transactions should not block LWM from rising



 [ 
https://issues.apache.org/jira/browse/IGNITE-20213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksandr Polovtcev updated IGNITE-20213:
-
Description: 
{{org.apache.ignite.internal.tx.TxManager#updateLowWatermark}} returns a 
{{CompletableFuture}} that is completed when all currently running RO 
transactions complete. Until that future is complete, local Low Watermark does 
not get updated.

It is proposed to change this behavior: instead of blocking the Low Watermark 
update, all unfinished RO transactions (at the time of the LWM update) must 
fail with an appropriate error. 

  was:
{{org.apache.ignite.internal.tx.TxManager#updateLowWatermark}} returns a 
{{CompletableFuture}} that is completed when all currently running RO 
transactions complete. Until that future is complete, local Low Watermark does 
not get updated.

It is proposed to change this behavior: instead of blocking the Low Watermark 
update, all unfinished RO transactions must fail with an appropriate error. 


> RO transactions should not block LWM from rising
> 
>
> Key: IGNITE-20213
> URL: https://issues.apache.org/jira/browse/IGNITE-20213
> Project: Ignite
>  Issue Type: Task
>Reporter: Aleksandr Polovtcev
>Assignee: Alexander Lapin
>Priority: Major
>  Labels: ignite-3
>
> {{org.apache.ignite.internal.tx.TxManager#updateLowWatermark}} returns a 
> {{CompletableFuture}} that is completed when all currently running RO 
> transactions complete. Until that future is complete, local Low Watermark 
> does not get updated.
> It is proposed to change this behavior: instead of blocking the Low Watermark 
> update, all unfinished RO transactions (at the time of the LWM update) must 
> fail with an appropriate error. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20213) RO transactions should not block LWM from rising



 [ 
https://issues.apache.org/jira/browse/IGNITE-20213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksandr Polovtcev updated IGNITE-20213:
-
Description: 
{{org.apache.ignite.internal.tx.TxManager#updateLowWatermark}} returns a 
{{CompletableFuture}} that is completed when all currently running RO 
transactions complete. Until that future is complete, local Low Watermark does 
not get updated.

It is proposed to change this behavior: instead of blocking the Low Watermark 
update, all unfinished RO transactions must fail with an appropriate error. 

> RO transactions should not block LWM from rising
> 
>
> Key: IGNITE-20213
> URL: https://issues.apache.org/jira/browse/IGNITE-20213
> Project: Ignite
>  Issue Type: Task
>Reporter: Aleksandr Polovtcev
>Assignee: Alexander Lapin
>Priority: Major
>  Labels: ignite-3
>
> {{org.apache.ignite.internal.tx.TxManager#updateLowWatermark}} returns a 
> {{CompletableFuture}} that is completed when all currently running RO 
> transactions complete. Until that future is complete, local Low Watermark 
> does not get updated.
> It is proposed to change this behavior: instead of blocking the Low Watermark 
> update, all unfinished RO transactions must fail with an appropriate error. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Assigned] (IGNITE-16088) Reuse Marshaller code in marshaller-common module



 [ 
https://issues.apache.org/jira/browse/IGNITE-16088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksandr Polovtcev reassigned IGNITE-16088:


Assignee: Aleksandr Polovtcev  (was: Pavel Tupitsyn)

> Reuse Marshaller code in marshaller-common module
> -
>
> Key: IGNITE-16088
> URL: https://issues.apache.org/jira/browse/IGNITE-16088
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha4
>Reporter: Pavel Tupitsyn
>Assignee: Aleksandr Polovtcev
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> IGNITE-14971 added *ignite-marshaller-common* module to reuse serialization 
> logic between the server and client parts.
> This module duplicates some logic from *ignite-schema* module.
> * Remove duplicated code from *ignite-schema* and reuse the logic from common 
> module.
> * Extract other common bits where applicable (e.g. *AsmSerializerGenerator*)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20187) Catch-up rebalance on node restart: assignments keys



 [ 
https://issues.apache.org/jira/browse/IGNITE-20187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mirza Aliev updated IGNITE-20187:
-
Description: 
h3. Motivation

Prior to the implementation of the meta storage compaction and the related node 
restart updates, the node restored its volatile state in terms of assignments 
through ms.watches starting from APPLIED_REVISION + 1. Meaning that after the 
restart, the node was notified about missing state through {*}the events{*}. 
However, it's no longer true: new logic assumes that the node will register 
ms.watch starting from APPLIED_REVISION + X + 1 and will manually read local 
meta storage state for APPLIED_REVISION +X along with related processing. The 
implementation of the above process is the essence of this ticket.
h3. Definition of Done

Within node restart process, TableManager or similar should manually read local 
assignments pending keys (reading assignments stable will be covered in a 
separate ticket) and schedule corresponding rebalance.
h3. Implementation Notes

It's possible that assignemnts.pending keys will be stale at the moment of 
processing, so in order to overcome given issue following 
common-for-current-rebalance steps are proposed:
 # Start all new needed nodes {{partition.assignments.pending / 
partition.assignments.stable}}
 # After successful starts - check if current node is the leader of raft group 
(leader response must be updated by current term), if it is
 # Read distributed \{{partition.assignments.pending}} and if the retrieved 
revision is less or equal to the one retrieved within initial local read run 
RaftGroupService#changePeersAsync(leaderTerm, peers) 
RaftGroupService#changePeersAsync from old terms must be skipped.

Seems that 
[https://github.com/apache/ignite-3/blob/main/modules/table/tech-notes/rebalance.md]
 should be also updated a bit.

  was:
h3. Motivation

Prior to the implementation of the meta storage compaction and the related node 
restart updates, the node restored its volatile state in terms of assignments 
through ms.watches starting from APPLIED_REVISION + 1. Meaning that after the 
restart, the node was notified about missing state through {*}the events{*}. 
However, it's no longer true: new logic assumes that the node will register 
ms.watch starting from APPLIED_REVISION + X + 1 and will manually read local 
meta storage state for APPLIED_REVISION +X along with related processing. The 
implementation of the above process is the essence of this ticket.
h3. Definition of Done

Within node restart process, TableManager or similar should manually read local 
assignments pending keys (reading assignments stable will be covered in a 
separate ticket) and schedule corresponding rebalance.
h3. Implementation Notes

It's possible that assignemnts.pending keys will be stale at the moment of 
processing, so in order to overcome given issue following 
common-for-current-rebalance steps are proposed:
 # Start all new needed nodes {{partition.assignments.pending / 
partition.assignments.stable}}
 # After successful starts - check if current node is the leader of raft group 
(leader response must be updated by current term), if it is
 # Read distributed {{partition.assignments.pending }}and if the retrieved 
revision is less or equal to the one retrieved within initial local read run 
RaftGroupService#changePeersAsync(leaderTerm, peers) 
RaftGroupService#changePeersAsync from old terms must be skipped.

Seems that 
https://github.com/apache/ignite-3/blob/main/modules/table/tech-notes/rebalance.md
 should be also updated a bit.


> Catch-up rebalance on node restart: assignments keys
> 
>
> Key: IGNITE-20187
> URL: https://issues.apache.org/jira/browse/IGNITE-20187
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Alexander Lapin
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> Prior to the implementation of the meta storage compaction and the related 
> node restart updates, the node restored its volatile state in terms of 
> assignments through ms.watches starting from APPLIED_REVISION + 1. Meaning 
> that after the restart, the node was notified about missing state through 
> {*}the events{*}. However, it's no longer true: new logic assumes that the 
> node will register ms.watch starting from APPLIED_REVISION + X + 1 and will 
> manually read local meta storage state for APPLIED_REVISION +X along with 
> related processing. The implementation of the above process is the essence of 
> this ticket.
> h3. Definition of Done
> Within node restart process, TableManager or similar should manually read 
> local assignments pending keys (reading assignments stable will be covered in 
> a separate ticket) and schedule corresponding rebalance.
> h3. Implementation Notes
> It's possible that assi

[jira] [Updated] (IGNITE-20209) Catch-up rebalance triggers on node restart



 [ 
https://issues.apache.org/jira/browse/IGNITE-20209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mirza Aliev updated IGNITE-20209:
-
Description: 
h3. Motivation

Please check https://issues.apache.org/jira/browse/IGNITE-20187 for more 
context, that is about catching-up assignments.pending meta storage keys, 
whether given one is about catching-up its triggers:
 * Replica factor updates.
 * Partitions count updates. Immutable for now.
 * Data nodes updates.
 * Replica storage addition/removal. !By the way, is it possible to remove 
replica storage.

For all aforementioned cases, it's required to update distributed assignments 
pending (planned) keys if it's not yet done. And the only difficulty here is 
precisely in understanding whether this was done or not.
h3. Definition of Done

Updated distributed assignments pending(planned) keys if necessary according to 
the current triggers state.

 

Notes:

1) Add to metastorage starting revision 

  was:
h3. Motivation

Please check https://issues.apache.org/jira/browse/IGNITE-20187 for more 
context, that is about catching-up assignments.pending meta storage keys, 
whether given one is about catching-up its triggers:
 * Replica factor updates.
 * Partitions count updates. Immutable for now.
 * Data nodes updates.
 * Replica storage addition/removal. !By the way, is it possible to remove 
replica storage.

For all aforementioned cases, it's required to update distributed assignments 
pending (planned) keys if it's not yet done. And the only difficulty here is 
precisely in understanding whether this was done or not.
h3. Definition of Done

Updated distributed assignments pending(planned) keys if necessary according to 
the current triggers state.


> Catch-up rebalance triggers on node restart
> ---
>
> Key: IGNITE-20209
> URL: https://issues.apache.org/jira/browse/IGNITE-20209
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Alexander Lapin
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> Please check https://issues.apache.org/jira/browse/IGNITE-20187 for more 
> context, that is about catching-up assignments.pending meta storage keys, 
> whether given one is about catching-up its triggers:
>  * Replica factor updates.
>  * Partitions count updates. Immutable for now.
>  * Data nodes updates.
>  * Replica storage addition/removal. !By the way, is it possible to remove 
> replica storage.
> For all aforementioned cases, it's required to update distributed assignments 
> pending (planned) keys if it's not yet done. And the only difficulty here is 
> precisely in understanding whether this was done or not.
> h3. Definition of Done
> Updated distributed assignments pending(planned) keys if necessary according 
> to the current triggers state.
>  
> Notes:
> 1) Add to metastorage starting revision 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (IGNITE-19836) .NET: Thin 3.0: Reject Tuples and POCOs with unmapped fields



[ 
https://issues.apache.org/jira/browse/IGNITE-19836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754134#comment-17754134
 ] 

Igor Sapego commented on IGNITE-19836:
--

Looks good to me.

> .NET: Thin 3.0: Reject Tuples and POCOs with unmapped fields
> 
>
> Key: IGNITE-19836
> URL: https://issues.apache.org/jira/browse/IGNITE-19836
> Project: Ignite
>  Issue Type: Improvement
>  Components: platforms, thin client
>Affects Versions: 3.0.0-beta1
>Reporter: Pavel Tupitsyn
>Assignee: Pavel Tupitsyn
>Priority: Major
>  Labels: .NET, ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Tuples and POCOs with unmapped fields should not be allowed in table APIs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20209) Catch-up rebalance triggers on node restart



 [ 
https://issues.apache.org/jira/browse/IGNITE-20209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mirza Aliev updated IGNITE-20209:
-
Description: 
h3. Motivation

Please check https://issues.apache.org/jira/browse/IGNITE-20187 for more 
context, that is about catching-up assignments.pending meta storage keys, 
whether given one is about catching-up its triggers:
 * Replica factor updates.
 * Partitions count updates. Immutable for now.
 * Data nodes updates.
 * Replica storage addition/removal. !By the way, is it possible to remove 
replica storage.

For all aforementioned cases, it's required to update distributed assignments 
pending (planned) keys if it's not yet done. And the only difficulty here is 
precisely in understanding whether this was done or not.
h3. Definition of Done

Updated distributed assignments pending(planned) keys if necessary according to 
the current triggers state.

 

Notes:

1) Add to metastorage starting revision 
(\{{metaStorageMgr.recoveryFinishedFuture()}} returns long with maximal 
recovered revision)

  was:
h3. Motivation

Please check https://issues.apache.org/jira/browse/IGNITE-20187 for more 
context, that is about catching-up assignments.pending meta storage keys, 
whether given one is about catching-up its triggers:
 * Replica factor updates.
 * Partitions count updates. Immutable for now.
 * Data nodes updates.
 * Replica storage addition/removal. !By the way, is it possible to remove 
replica storage.

For all aforementioned cases, it's required to update distributed assignments 
pending (planned) keys if it's not yet done. And the only difficulty here is 
precisely in understanding whether this was done or not.
h3. Definition of Done

Updated distributed assignments pending(planned) keys if necessary according to 
the current triggers state.

 

Notes:

1) Add to metastorage starting revision 


> Catch-up rebalance triggers on node restart
> ---
>
> Key: IGNITE-20209
> URL: https://issues.apache.org/jira/browse/IGNITE-20209
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Alexander Lapin
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> Please check https://issues.apache.org/jira/browse/IGNITE-20187 for more 
> context, that is about catching-up assignments.pending meta storage keys, 
> whether given one is about catching-up its triggers:
>  * Replica factor updates.
>  * Partitions count updates. Immutable for now.
>  * Data nodes updates.
>  * Replica storage addition/removal. !By the way, is it possible to remove 
> replica storage.
> For all aforementioned cases, it's required to update distributed assignments 
> pending (planned) keys if it's not yet done. And the only difficulty here is 
> precisely in understanding whether this was done or not.
> h3. Definition of Done
> Updated distributed assignments pending(planned) keys if necessary according 
> to the current triggers state.
>  
> Notes:
> 1) Add to metastorage starting revision 
> (\{{metaStorageMgr.recoveryFinishedFuture()}} returns long with maximal 
> recovered revision)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20209) Catch-up rebalance triggers on node restart



 [ 
https://issues.apache.org/jira/browse/IGNITE-20209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mirza Aliev updated IGNITE-20209:
-
Description: 
h3. Motivation

Please check https://issues.apache.org/jira/browse/IGNITE-20187 for more 
context, that is about catching-up assignments.pending meta storage keys, 
whether given one is about catching-up its triggers:
 * Replica factor updates.
 * Partitions count updates. Immutable for now.
 * Data nodes updates.
 * Replica storage addition/removal. !By the way, is it possible to remove 
replica storage.

For all aforementioned cases, it's required to update distributed assignments 
pending (planned) keys if it's not yet done. And the only difficulty here is 
precisely in understanding whether this was done or not.
h3. Definition of Done

Updated distributed assignments pending(planned) keys if necessary according to 
the current triggers state.

 

Notes:

1) Add to metastorage starting revision 
({{{}metaStorageMgr.recoveryFinishedFuture(){}}} returns long with the maximal 
recovered revision)

  was:
h3. Motivation

Please check https://issues.apache.org/jira/browse/IGNITE-20187 for more 
context, that is about catching-up assignments.pending meta storage keys, 
whether given one is about catching-up its triggers:
 * Replica factor updates.
 * Partitions count updates. Immutable for now.
 * Data nodes updates.
 * Replica storage addition/removal. !By the way, is it possible to remove 
replica storage.

For all aforementioned cases, it's required to update distributed assignments 
pending (planned) keys if it's not yet done. And the only difficulty here is 
precisely in understanding whether this was done or not.
h3. Definition of Done

Updated distributed assignments pending(planned) keys if necessary according to 
the current triggers state.

 

Notes:

1) Add to metastorage starting revision 
(\{{metaStorageMgr.recoveryFinishedFuture()}} returns long with maximal 
recovered revision)


> Catch-up rebalance triggers on node restart
> ---
>
> Key: IGNITE-20209
> URL: https://issues.apache.org/jira/browse/IGNITE-20209
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Alexander Lapin
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> Please check https://issues.apache.org/jira/browse/IGNITE-20187 for more 
> context, that is about catching-up assignments.pending meta storage keys, 
> whether given one is about catching-up its triggers:
>  * Replica factor updates.
>  * Partitions count updates. Immutable for now.
>  * Data nodes updates.
>  * Replica storage addition/removal. !By the way, is it possible to remove 
> replica storage.
> For all aforementioned cases, it's required to update distributed assignments 
> pending (planned) keys if it's not yet done. And the only difficulty here is 
> precisely in understanding whether this was done or not.
> h3. Definition of Done
> Updated distributed assignments pending(planned) keys if necessary according 
> to the current triggers state.
>  
> Notes:
> 1) Add to metastorage starting revision 
> ({{{}metaStorageMgr.recoveryFinishedFuture(){}}} returns long with the 
> maximal recovered revision)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20209) Catch-up rebalance triggers on node restart



 [ 
https://issues.apache.org/jira/browse/IGNITE-20209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mirza Aliev updated IGNITE-20209:
-
Description: 
h3. Motivation

Please check https://issues.apache.org/jira/browse/IGNITE-20187 for more 
context, that is about catching-up assignments.pending meta storage keys, 
whether given one is about catching-up its triggers:
 * Replica factor updates.
 * Partitions count updates. Immutable for now.
 * Data nodes updates.
 * Replica storage addition/removal. !By the way, is it possible to remove 
replica storage.

For all aforementioned cases, it's required to update distributed assignments 
pending (planned) keys if it's not yet done. And the only difficulty here is 
precisely in understanding whether this was done or not.
h3. Definition of Done

Updated distributed assignments pending(planned) keys if necessary according to 
the current triggers state.

 

Notes:

1) -Add to metastorage starting revision- 
({{{}metaStorageMgr.recoveryFinishedFuture(){}}} returns long with the maximal 
recovered revision)

  was:
h3. Motivation

Please check https://issues.apache.org/jira/browse/IGNITE-20187 for more 
context, that is about catching-up assignments.pending meta storage keys, 
whether given one is about catching-up its triggers:
 * Replica factor updates.
 * Partitions count updates. Immutable for now.
 * Data nodes updates.
 * Replica storage addition/removal. !By the way, is it possible to remove 
replica storage.

For all aforementioned cases, it's required to update distributed assignments 
pending (planned) keys if it's not yet done. And the only difficulty here is 
precisely in understanding whether this was done or not.
h3. Definition of Done

Updated distributed assignments pending(planned) keys if necessary according to 
the current triggers state.

 

Notes:

1) Add to metastorage starting revision 
({{{}metaStorageMgr.recoveryFinishedFuture(){}}} returns long with the maximal 
recovered revision)


> Catch-up rebalance triggers on node restart
> ---
>
> Key: IGNITE-20209
> URL: https://issues.apache.org/jira/browse/IGNITE-20209
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Alexander Lapin
>Priority: Major
>  Labels: ignite-3
>
> h3. Motivation
> Please check https://issues.apache.org/jira/browse/IGNITE-20187 for more 
> context, that is about catching-up assignments.pending meta storage keys, 
> whether given one is about catching-up its triggers:
>  * Replica factor updates.
>  * Partitions count updates. Immutable for now.
>  * Data nodes updates.
>  * Replica storage addition/removal. !By the way, is it possible to remove 
> replica storage.
> For all aforementioned cases, it's required to update distributed assignments 
> pending (planned) keys if it's not yet done. And the only difficulty here is 
> precisely in understanding whether this was done or not.
> h3. Definition of Done
> Updated distributed assignments pending(planned) keys if necessary according 
> to the current triggers state.
>  
> Notes:
> 1) -Add to metastorage starting revision- 
> ({{{}metaStorageMgr.recoveryFinishedFuture(){}}} returns long with the 
> maximal recovered revision)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (IGNITE-20214) ItSimpleCounterServerTest#testRefreshLeader is flaky

Aleksandr Polovtcev created IGNITE-20214:


 Summary: ItSimpleCounterServerTest#testRefreshLeader is flaky
 Key: IGNITE-20214
 URL: https://issues.apache.org/jira/browse/IGNITE-20214
 Project: Ignite
  Issue Type: Task
Reporter: Aleksandr Polovtcev
Assignee: Aleksandr Polovtcev






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20214) ItSimpleCounterServerTest#testRefreshLeader is flaky



 [ 
https://issues.apache.org/jira/browse/IGNITE-20214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksandr Polovtcev updated IGNITE-20214:
-
Ignite Flags:   (was: Docs Required,Release Notes Required)

> ItSimpleCounterServerTest#testRefreshLeader is flaky
> 
>
> Key: IGNITE-20214
> URL: https://issues.apache.org/jira/browse/IGNITE-20214
> Project: Ignite
>  Issue Type: Task
>Reporter: Aleksandr Polovtcev
>Assignee: Aleksandr Polovtcev
>Priority: Blocker
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20214) ItSimpleCounterServerTest#testRefreshLeader is flaky



 [ 
https://issues.apache.org/jira/browse/IGNITE-20214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksandr Polovtcev updated IGNITE-20214:
-
Labels: ignite-3  (was: )

> ItSimpleCounterServerTest#testRefreshLeader is flaky
> 
>
> Key: IGNITE-20214
> URL: https://issues.apache.org/jira/browse/IGNITE-20214
> Project: Ignite
>  Issue Type: Task
>Reporter: Aleksandr Polovtcev
>Assignee: Aleksandr Polovtcev
>Priority: Blocker
>  Labels: ignite-3
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (IGNITE-19836) .NET: Thin 3.0: Reject Tuples and POCOs with unmapped fields

2023-08-14 Thread Pavel Tupitsyn (Jira)



[ 
https://issues.apache.org/jira/browse/IGNITE-19836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754137#comment-17754137
 ] 

Pavel Tupitsyn commented on IGNITE-19836:
-

Merged to master: 4a646a7cd7eeaa8cc5b4b3000d430be3f4fb2587

> .NET: Thin 3.0: Reject Tuples and POCOs with unmapped fields
> 
>
> Key: IGNITE-19836
> URL: https://issues.apache.org/jira/browse/IGNITE-19836
> Project: Ignite
>  Issue Type: Improvement
>  Components: platforms, thin client
>Affects Versions: 3.0.0-beta1
>Reporter: Pavel Tupitsyn
>Assignee: Pavel Tupitsyn
>Priority: Major
>  Labels: .NET, ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Tuples and POCOs with unmapped fields should not be allowed in table APIs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20214) ItSimpleCounterServerTest#testRefreshLeader is flaky



 [ 
https://issues.apache.org/jira/browse/IGNITE-20214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksandr Polovtcev updated IGNITE-20214:
-
Description: 
{{ItSimpleCounterServerTest#testRefreshLeader}} sometimes fails with the 
following error:


{code:java}
org.opentest4j.AssertionFailedError: 
Expected :true
Actual   :false

at org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:180)
at 
org.apache.ignite.raft.server.ItSimpleCounterServerTest.before(ItSimpleCounterServerTest.java:141)
{code}

Looks like timeouts in {{waitForTopology}} calls are too small.

> ItSimpleCounterServerTest#testRefreshLeader is flaky
> 
>
> Key: IGNITE-20214
> URL: https://issues.apache.org/jira/browse/IGNITE-20214
> Project: Ignite
>  Issue Type: Task
>Reporter: Aleksandr Polovtcev
>Assignee: Aleksandr Polovtcev
>Priority: Blocker
>  Labels: ignite-3
>
> {{ItSimpleCounterServerTest#testRefreshLeader}} sometimes fails with the 
> following error:
> {code:java}
> org.opentest4j.AssertionFailedError: 
> Expected :true
> Actual   :false
>   at org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:180)
>   at 
> org.apache.ignite.raft.server.ItSimpleCounterServerTest.before(ItSimpleCounterServerTest.java:141)
> {code}
> Looks like timeouts in {{waitForTopology}} calls are too small.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Comment Edited] (IGNITE-19836) .NET: Thin 3.0: Reject Tuples and POCOs with unmapped fields

2023-08-14 Thread Pavel Tupitsyn (Jira)



[ 
https://issues.apache.org/jira/browse/IGNITE-19836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754137#comment-17754137
 ] 

Pavel Tupitsyn edited comment on IGNITE-19836 at 8/14/23 3:06 PM:
--

Merged to main: 4a646a7cd7eeaa8cc5b4b3000d430be3f4fb2587


was (Author: ptupitsyn):
Merged to master: 4a646a7cd7eeaa8cc5b4b3000d430be3f4fb2587

> .NET: Thin 3.0: Reject Tuples and POCOs with unmapped fields
> 
>
> Key: IGNITE-19836
> URL: https://issues.apache.org/jira/browse/IGNITE-19836
> Project: Ignite
>  Issue Type: Improvement
>  Components: platforms, thin client
>Affects Versions: 3.0.0-beta1
>Reporter: Pavel Tupitsyn
>Assignee: Pavel Tupitsyn
>Priority: Major
>  Labels: .NET, ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Tuples and POCOs with unmapped fields should not be allowed in table APIs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-16088) Reuse Marshaller code in marshaller-common module



 [ 
https://issues.apache.org/jira/browse/IGNITE-16088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksandr Polovtcev updated IGNITE-16088:
-
Fix Version/s: 3.0.0-beta2

> Reuse Marshaller code in marshaller-common module
> -
>
> Key: IGNITE-16088
> URL: https://issues.apache.org/jira/browse/IGNITE-16088
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha4
>Reporter: Pavel Tupitsyn
>Assignee: Aleksandr Polovtcev
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> IGNITE-14971 added *ignite-marshaller-common* module to reuse serialization 
> logic between the server and client parts.
> This module duplicates some logic from *ignite-schema* module.
> * Remove duplicated code from *ignite-schema* and reuse the logic from common 
> module.
> * Extract other common bits where applicable (e.g. *AsmSerializerGenerator*)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-16088) Reuse Marshaller code in marshaller-common module



 [ 
https://issues.apache.org/jira/browse/IGNITE-16088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksandr Polovtcev updated IGNITE-16088:
-
Fix Version/s: (was: 3.0.0-beta2)

> Reuse Marshaller code in marshaller-common module
> -
>
> Key: IGNITE-16088
> URL: https://issues.apache.org/jira/browse/IGNITE-16088
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha4
>Reporter: Pavel Tupitsyn
>Assignee: Aleksandr Polovtcev
>Priority: Major
>  Labels: ignite-3
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> IGNITE-14971 added *ignite-marshaller-common* module to reuse serialization 
> logic between the server and client parts.
> This module duplicates some logic from *ignite-schema* module.
> * Remove duplicated code from *ignite-schema* and reuse the logic from common 
> module.
> * Extract other common bits where applicable (e.g. *AsmSerializerGenerator*)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20214) ItSimpleCounterServerTest#testRefreshLeader is flaky



 [ 
https://issues.apache.org/jira/browse/IGNITE-20214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksandr Polovtcev updated IGNITE-20214:
-
Fix Version/s: 3.0.0-beta2

> ItSimpleCounterServerTest#testRefreshLeader is flaky
> 
>
> Key: IGNITE-20214
> URL: https://issues.apache.org/jira/browse/IGNITE-20214
> Project: Ignite
>  Issue Type: Task
>Reporter: Aleksandr Polovtcev
>Assignee: Aleksandr Polovtcev
>Priority: Blocker
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{ItSimpleCounterServerTest#testRefreshLeader}} sometimes fails with the 
> following error:
> {code:java}
> org.opentest4j.AssertionFailedError: 
> Expected :true
> Actual   :false
>   at org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:180)
>   at 
> org.apache.ignite.raft.server.ItSimpleCounterServerTest.before(ItSimpleCounterServerTest.java:141)
> {code}
> Looks like timeouts in {{waitForTopology}} calls are too small.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20035) IndexOutOfBoundsException when statement.SetMaxRows is set

2023-08-14 Thread Pavel Pereslegin (Jira)



 [ 
https://issues.apache.org/jira/browse/IGNITE-20035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Pereslegin updated IGNITE-20035:
--
Fix Version/s: 3.0.0-beta2

> IndexOutOfBoundsException when statement.SetMaxRows is set
> --
>
> Key: IGNITE-20035
> URL: https://issues.apache.org/jira/browse/IGNITE-20035
> Project: Ignite
>  Issue Type: Bug
>  Components: sql
>Affects Versions: 3.0
>Reporter: Alexander Belyak
>Assignee: Pavel Pereslegin
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> If setMaxRows > count(*) - query fail with IndexOutOfBound exception.
> Reproducer:
>  
> {noformat}
> try (Connection connection = connect(); Statement statement = 
> connection.createStatement()) {
> JdbcSteps steps = new JdbcSteps(statement);
> steps.executeUpdateQuery("CREATE TABLE Person (id INT PRIMARY KEY, name 
> VARCHAR)", "Creating a table with two columns.");
> steps.executeUpdateQuery("INSERT INTO Person (id, name) VALUES (1, 
> 'John')", "Inserting a single record");
> statement.setMaxRows(25);
> ResultSet res = steps.executeQuery("SELECT * FROM Person", "Selecting all 
> the records from the table");
> while (res.next()) {
> log.info("{}, {}", res.getInt(1), res.getString(2));
> assertEquals(1, res.getInt(1));
> assertEquals("John", res.getString(2));
> }
> }{noformat}
> Returns:
>  
>  
> {noformat}
> Exception while executing query [query=SELECT * FROM Person]. Error 
> message:toIndex = 25
> java.sql.SQLException: Exception while executing query [query=SELECT * FROM 
> Person]. Error message:toIndex = 25
>     at 
> org.apache.ignite.internal.jdbc.proto.IgniteQueryErrorCode.createJdbcSqlException(IgniteQueryErrorCode.java:57)
>     at 
> org.apache.ignite.internal.jdbc.JdbcStatement.execute0(JdbcStatement.java:149)
>     at 
> org.apache.ignite.internal.jdbc.JdbcStatement.executeQuery(JdbcStatement.java:108)
>     at 
> org.gridgain.ai3tests.tests.teststeps.JdbcSteps.executeQuery(JdbcSteps.java:50)
>     at 
> org.gridgain.ai3tests.tests.BasicOperationsTest.testSaveAndGetFromCache(BasicOperationsTest.java:41)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
>     at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>     at 
> org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:727)
>     at 
> org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
>     at 
> org.junit.jupiter.api.extension.InvocationInterceptor.interceptTestMethod(InvocationInterceptor.java:118)
>     at 
> org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103)
>     at 
> org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
>     at 
> org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156)
>     at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147)
>     at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:86)
>     at 
> org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103)
>     at 
> org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
>     at 
> org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92)
>     at 
> org.junit.jupiter.engine.execution

[jira] [Updated] (IGNITE-20035) IndexOutOfBoundsException when statement.SetMaxRows is set

2023-08-14 Thread Pavel Pereslegin (Jira)



 [ 
https://issues.apache.org/jira/browse/IGNITE-20035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Pereslegin updated IGNITE-20035:
--
Ignite Flags: Release Notes Required  (was: Docs Required,Release Notes 
Required)

> IndexOutOfBoundsException when statement.SetMaxRows is set
> --
>
> Key: IGNITE-20035
> URL: https://issues.apache.org/jira/browse/IGNITE-20035
> Project: Ignite
>  Issue Type: Bug
>  Components: sql
>Affects Versions: 3.0
>Reporter: Alexander Belyak
>Assignee: Pavel Pereslegin
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> If setMaxRows > count(*) - query fail with IndexOutOfBound exception.
> Reproducer:
>  
> {noformat}
> try (Connection connection = connect(); Statement statement = 
> connection.createStatement()) {
> JdbcSteps steps = new JdbcSteps(statement);
> steps.executeUpdateQuery("CREATE TABLE Person (id INT PRIMARY KEY, name 
> VARCHAR)", "Creating a table with two columns.");
> steps.executeUpdateQuery("INSERT INTO Person (id, name) VALUES (1, 
> 'John')", "Inserting a single record");
> statement.setMaxRows(25);
> ResultSet res = steps.executeQuery("SELECT * FROM Person", "Selecting all 
> the records from the table");
> while (res.next()) {
> log.info("{}, {}", res.getInt(1), res.getString(2));
> assertEquals(1, res.getInt(1));
> assertEquals("John", res.getString(2));
> }
> }{noformat}
> Returns:
>  
>  
> {noformat}
> Exception while executing query [query=SELECT * FROM Person]. Error 
> message:toIndex = 25
> java.sql.SQLException: Exception while executing query [query=SELECT * FROM 
> Person]. Error message:toIndex = 25
>     at 
> org.apache.ignite.internal.jdbc.proto.IgniteQueryErrorCode.createJdbcSqlException(IgniteQueryErrorCode.java:57)
>     at 
> org.apache.ignite.internal.jdbc.JdbcStatement.execute0(JdbcStatement.java:149)
>     at 
> org.apache.ignite.internal.jdbc.JdbcStatement.executeQuery(JdbcStatement.java:108)
>     at 
> org.gridgain.ai3tests.tests.teststeps.JdbcSteps.executeQuery(JdbcSteps.java:50)
>     at 
> org.gridgain.ai3tests.tests.BasicOperationsTest.testSaveAndGetFromCache(BasicOperationsTest.java:41)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
>     at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>     at 
> org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:727)
>     at 
> org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
>     at 
> org.junit.jupiter.api.extension.InvocationInterceptor.interceptTestMethod(InvocationInterceptor.java:118)
>     at 
> org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103)
>     at 
> org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
>     at 
> org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156)
>     at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147)
>     at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:86)
>     at 
> org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103)
>     at 
> org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
>     at 
> org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker

[jira] [Updated] (IGNITE-20035) IndexOutOfBoundsException when statement.SetMaxRows is set

2023-08-14 Thread Pavel Pereslegin (Jira)



 [ 
https://issues.apache.org/jira/browse/IGNITE-20035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Pereslegin updated IGNITE-20035:
--
Ignite Flags:   (was: Release Notes Required)

> IndexOutOfBoundsException when statement.SetMaxRows is set
> --
>
> Key: IGNITE-20035
> URL: https://issues.apache.org/jira/browse/IGNITE-20035
> Project: Ignite
>  Issue Type: Bug
>  Components: sql
>Affects Versions: 3.0
>Reporter: Alexander Belyak
>Assignee: Pavel Pereslegin
>Priority: Major
>  Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> If setMaxRows > count(*) - query fail with IndexOutOfBound exception.
> Reproducer:
>  
> {noformat}
> try (Connection connection = connect(); Statement statement = 
> connection.createStatement()) {
> JdbcSteps steps = new JdbcSteps(statement);
> steps.executeUpdateQuery("CREATE TABLE Person (id INT PRIMARY KEY, name 
> VARCHAR)", "Creating a table with two columns.");
> steps.executeUpdateQuery("INSERT INTO Person (id, name) VALUES (1, 
> 'John')", "Inserting a single record");
> statement.setMaxRows(25);
> ResultSet res = steps.executeQuery("SELECT * FROM Person", "Selecting all 
> the records from the table");
> while (res.next()) {
> log.info("{}, {}", res.getInt(1), res.getString(2));
> assertEquals(1, res.getInt(1));
> assertEquals("John", res.getString(2));
> }
> }{noformat}
> Returns:
>  
>  
> {noformat}
> Exception while executing query [query=SELECT * FROM Person]. Error 
> message:toIndex = 25
> java.sql.SQLException: Exception while executing query [query=SELECT * FROM 
> Person]. Error message:toIndex = 25
>     at 
> org.apache.ignite.internal.jdbc.proto.IgniteQueryErrorCode.createJdbcSqlException(IgniteQueryErrorCode.java:57)
>     at 
> org.apache.ignite.internal.jdbc.JdbcStatement.execute0(JdbcStatement.java:149)
>     at 
> org.apache.ignite.internal.jdbc.JdbcStatement.executeQuery(JdbcStatement.java:108)
>     at 
> org.gridgain.ai3tests.tests.teststeps.JdbcSteps.executeQuery(JdbcSteps.java:50)
>     at 
> org.gridgain.ai3tests.tests.BasicOperationsTest.testSaveAndGetFromCache(BasicOperationsTest.java:41)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
>     at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>     at 
> org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:727)
>     at 
> org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
>     at 
> org.junit.jupiter.api.extension.InvocationInterceptor.interceptTestMethod(InvocationInterceptor.java:118)
>     at 
> org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103)
>     at 
> org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
>     at 
> org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156)
>     at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147)
>     at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:86)
>     at 
> org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103)
>     at 
> org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
>     at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
>     at 
> org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92)
>     at 
> org.junit.jupit

[jira] [Updated] (IGNITE-20124) Prevent double storage updates within primary

[
https://issues.apache.org/jira/browse/IGNITE-20124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Denis Chudov updated IGNITE-20124:
--
Description:
h3. Motivation

In order to preserve the guarantee that the primary replica is always
up-to-date it's required to:
* In case of common RW transaction - insert writeIntent to the storage within
primary before replication.
* In case of one-phase-commit - insert commitedWrite after the replication.

Both have already been done. However, that means that if primary is part of the
replication group, and it's true in almost all cases, we will double the update:
* In case of common RW transaction - through the replication.
* In case of one-phase-commit - either through the replication, or though post
update, if replication was fast enough.

h3. Definition of Done
* Prevent double storage updates within primary.

h3. Implementation Notes

The easiest way to prevent double insert is to skip one if local safe time is
greater or equal to candidates. There are 3 places where we update partition
storage:
# Primary pre-replication update. In that case, the second update on
replication should be excluded.
# Primary post-replication update in case of 1PC. It's possible to see already
updated data if replication was already processed locally. It is expected to be
already covered in https://issues.apache.org/jira/browse/IGNITE-15927 . We
should check the primary safe time on post-replication update and don't do
update if the safe time is already adjusted.
# Insert through replication. In case of !1PC on every primary there will be
double insert (see 1). In case of 1PC it depends, so we should check the safe
time on primary to know whether the update should be done (see 2).

In every case, the storage indexes still should be adjusted on replication, as
it is done now, because the progress of indexes on FSM write operations should
not be violated - otherwise, a Raft snapshot-based rebalance would be broken.
We may have two non-consistent storage updates on primary which may affect
different fsyncs, so maybe we should benchmark this optimization to find out
how useful it is. The transactional correctness isn't violated by these
non-consistent storage updates, because there is only a possibility that some
writes or write intents will go ahead of indexes and therefore will be included
into snapshots - however we still can process such writes and resolve write
intents.

Also, the safe time needs to be updated on the primary replica now. There can
be following scenarios:
# Two-phase commit: we can advance safe time on primary, make pre-replication
update and then run Raft command. Both safe time adjustment and storage update
happen before replication.
# One-phase commit: safe time should be advanced after completeness of Raft
command future. There is no happens-before between the future callback and the
replication handler, so the safe time should be checked and advanced in both
places. We should use some critical section for the exact transaction,
preventing race between safe time check, safe time adjustment and storage
update.

was:
h3. Motivation

h3. Definition of Done
* Prevent double storage updates within primary.

h3. Implementation Notes

In every case, the storage indexes still should be adjusted on replication, as
it is done now, because the progress of indexes on FSM write operations should
not be viola

[jira] [Updated] (IGNITE-20157) Share context details to ease replication timeout exception analysis



 [ 
https://issues.apache.org/jira/browse/IGNITE-20157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denis Chudov updated IGNITE-20157:
--
Description: 
*Motivation*

On client side, we have only "Replication timeout exception" happening on 
request timeout, and we can't know the exact reason without debugging. Later it 
will be replaced with transaction timeout exception, but this would not solve 
the problem. We should know somehow what happened on the server side. Timeouts 
should be set on operations on primary replica side that are most likely the 
cause of request timeouts. In case of such operation timeout on promary 
replica, a corresponding exception should be printed to the log, so that it can 
be matched with an exception on client by transaction id.

*Definition of done*

Future returned by LockManager#acquire is completed exceptionally if the lock 
was not acquired in some time interval (lock acquisition timeout).

*Implementation notes*

This exception (or its message) should differ from the exception thrown because 
of deadlock prevention policy with timeout.

  was:
*Motivation*

Currently we have lock timeouts only for specific implementations of 
DeadlockPreventionPolicy. In the same time, we have transaction request 
timeouts. It makes no sense for such requests to wait for acquiring locks 
longer than request timeout.

*Definition of done*

Future returned by LockManager#acquire is completed exceptionally if the lock 
was not acquired in some time interval (lock acquisition timeout).

*Implementation notes*

This exception (or its message) should differ from the exception thrown because 
of deadlock prevention policy with timeout.


> Share context details to ease replication timeout exception analysis
> 
>
> Key: IGNITE-20157
> URL: https://issues.apache.org/jira/browse/IGNITE-20157
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Denis Chudov
>Priority: Major
>  Labels: ignite-3
>
> *Motivation*
> On client side, we have only "Replication timeout exception" happening on 
> request timeout, and we can't know the exact reason without debugging. Later 
> it will be replaced with transaction timeout exception, but this would not 
> solve the problem. We should know somehow what happened on the server side. 
> Timeouts should be set on operations on primary replica side that are most 
> likely the cause of request timeouts. In case of such operation timeout on 
> promary replica, a corresponding exception should be printed to the log, so 
> that it can be matched with an exception on client by transaction id.
> *Definition of done*
> Future returned by LockManager#acquire is completed exceptionally if the lock 
> was not acquired in some time interval (lock acquisition timeout).
> *Implementation notes*
> This exception (or its message) should differ from the exception thrown 
> because of deadlock prevention policy with timeout.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20213) RO transactions should not block LWM from rising



 [ 
https://issues.apache.org/jira/browse/IGNITE-20213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksandr Polovtcev updated IGNITE-20213:
-
Description: 
{{org.apache.ignite.internal.tx.TxManager#updateLowWatermark}} returns a 
{{CompletableFuture}} that is completed when all currently running RO 
transactions finish. Until that future is complete, local Low Watermark does 
not get updated.

It is proposed to change this behavior: instead of blocking the Low Watermark 
update, all unfinished RO transactions (at the time of the LWM update) must 
fail with an appropriate error. 

  was:
{{org.apache.ignite.internal.tx.TxManager#updateLowWatermark}} returns a 
{{CompletableFuture}} that is completed when all currently running RO 
transactions complete. Until that future is complete, local Low Watermark does 
not get updated.

It is proposed to change this behavior: instead of blocking the Low Watermark 
update, all unfinished RO transactions (at the time of the LWM update) must 
fail with an appropriate error. 


> RO transactions should not block LWM from rising
> 
>
> Key: IGNITE-20213
> URL: https://issues.apache.org/jira/browse/IGNITE-20213
> Project: Ignite
>  Issue Type: Task
>Reporter: Aleksandr Polovtcev
>Assignee: Alexander Lapin
>Priority: Major
>  Labels: ignite-3
>
> {{org.apache.ignite.internal.tx.TxManager#updateLowWatermark}} returns a 
> {{CompletableFuture}} that is completed when all currently running RO 
> transactions finish. Until that future is complete, local Low Watermark does 
> not get updated.
> It is proposed to change this behavior: instead of blocking the Low Watermark 
> update, all unfinished RO transactions (at the time of the LWM update) must 
> fail with an appropriate error. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Assigned] (IGNITE-20057) C++ client: Track observable timestamp