[jira] [Created] (IGNITE-20202) Base metrics for SQL thread pools
Yury Gerzhedovich created IGNITE-20202: -- Summary: Base metrics for SQL thread pools Key: IGNITE-20202 URL: https://issues.apache.org/jira/browse/IGNITE-20202 Project: Ignite Issue Type: Improvement Components: sql Reporter: Yury Gerzhedovich Let's introduce queue size for SQL execution and planning thread pools. Type is a Gauge. Suggested names are * {code:java} sql.plan.QueueSize {code} * {code:java} sql.execution.thread..QueueSize {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
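For context, a queue-size gauge of this kind is typically a lazy probe over {{ThreadPoolExecutor.getQueue().size()}}. A minimal sketch, assuming a hypothetical {{registerGauge}} hook rather than the actual Ignite 3 metrics API:

{code:java}
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.IntSupplier;

public class SqlPoolMetrics {
    /** Hypothetical gauge registration; the real metrics framework API may differ. */
    static void registerGauge(String name, IntSupplier value) {
        // In a real system this would hand the supplier to a metrics registry.
        System.out.println(name + " registered, current value = " + value.getAsInt());
    }

    public static void main(String[] args) {
        ThreadPoolExecutor planningPool = new ThreadPoolExecutor(
                4, 4, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());

        // The gauge reads the queue size on each scrape; no state is copied.
        registerGauge("sql.plan.QueueSize", () -> planningPool.getQueue().size());
    }
}
{code}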
[jira] [Updated] (IGNITE-20202) Base metrics for SQL thread pools
[ https://issues.apache.org/jira/browse/IGNITE-20202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yury Gerzhedovich updated IGNITE-20202: --- Epic Link: IGNITE-17353 > Base metrics for SQL thread pools > - > > Key: IGNITE-20202 > URL: https://issues.apache.org/jira/browse/IGNITE-20202 > Project: Ignite > Issue Type: Improvement > Components: sql >Reporter: Yury Gerzhedovich >Priority: Major > Labels: ignite-3 > > Let's introduce queue size for SQL execution and planning thread pools. > Type is a Gauge. Suggested names are > * {code:java} > sql.plan.QueueSize > {code} > * {code:java} > sql.execution.thread..QueueSize > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-19009) Introduce file transfer support in messaging
[ https://issues.apache.org/jira/browse/IGNITE-19009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Gagarkin updated IGNITE-19009: --- Attachment: file_transfer.drawio.png > Introduce file transfer support in messaging > > > Key: IGNITE-19009 > URL: https://issues.apache.org/jira/browse/IGNITE-19009 > Project: Ignite > Issue Type: Improvement > Components: networking >Reporter: Mikhail Pochatkin >Assignee: Ivan Gagarkin >Priority: Major > Labels: iep-103, ignite-3 > Attachments: file_transfer.drawio.png > > Time Spent: 10h 40m > Remaining Estimate: 0h > > The current implementation of the network layer forces loading deployment unit content onto the > heap as byte[]. This is an easy way to hit an OOM. > > As a solution, we need to introduce a lazy buffer in the network code so that files are > read in chunks. -- This message was sent by Atlassian Jira (v8.20.10#820010)
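The proposed "lazy buffer" amounts to streaming a file in fixed-size chunks instead of materializing it as one byte[]. A minimal sketch of chunked reading with NIO; the chunk size and the {{sendChunk}} hook are illustrative assumptions, not the actual messaging API:

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChunkedFileReader {
    private static final int CHUNK_SIZE = 64 * 1024; // illustrative chunk size

    /** Reads the file sequentially, handing each chunk to the network layer. */
    static void transfer(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(CHUNK_SIZE);
            long offset = 0;
            int read;
            while ((read = ch.read(buf)) != -1) {
                buf.flip();
                sendChunk(offset, buf); // hypothetical hook into the messaging layer
                offset += read;
                buf.clear();
            }
        }
    }

    static void sendChunk(long offset, ByteBuffer chunk) {
        // Placeholder: a real implementation would wrap the chunk in a network message.
    }
}
{code}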
[jira] [Created] (IGNITE-20203) File transfer for Ignite 3
Ivan Gagarkin created IGNITE-20203: -- Summary: File transfer for Ignite 3 Key: IGNITE-20203 URL: https://issues.apache.org/jira/browse/IGNITE-20203 Project: Ignite Issue Type: Epic Reporter: Ivan Gagarkin -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-20204) Use FileTransferService for transfer deployment units between nodes
Ivan Gagarkin created IGNITE-20204: -- Summary: Use FileTransferService for transfer deployment units between nodes Key: IGNITE-20204 URL: https://issues.apache.org/jira/browse/IGNITE-20204 Project: Ignite Issue Type: Improvement Components: compute Reporter: Ivan Gagarkin -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IGNITE-19889) Implement observable timestamp on server
[ https://issues.apache.org/jira/browse/IGNITE-19889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladislav Pyatkov reassigned IGNITE-19889: -- Assignee: Vladislav Pyatkov > Implement observable timestamp on server > > > Key: IGNITE-19889 > URL: https://issues.apache.org/jira/browse/IGNITE-19889 > Project: Ignite > Issue Type: Improvement >Reporter: Vladislav Pyatkov >Assignee: Vladislav Pyatkov >Priority: Major > Labels: ignite-3 > > *Motivation* > The client timestamp is used to determine a read timestamp for RO transactions on the > client side (IGNITE-19888). For consistent behavior, we need to implement a similar timestamp on the server. > *Implementation note* > The last server observable timestamp should be updated at least when a > transaction is committed. > Any RO transaction should use the timestamp: for SQL (IGNITE-19898) and > through the key-value API (IGNITE-19887). > *Definition of done* > All server-side created RO transactions should execute in the past, at a timestamp > determined by the last observation time. -- This message was sent by Atlassian Jira (v8.20.10#820010)
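One common way to maintain such a server-side observable timestamp is a monotonic tracker that is bumped on every commit and sampled when an RO transaction picks its read timestamp. A simplified sketch using a plain long instead of Ignite's hybrid timestamp type; all names are illustrative:

{code:java}
import java.util.concurrent.atomic.AtomicLong;

public class ObservableTimestampTracker {
    private final AtomicLong lastObserved = new AtomicLong();

    /** Called at least on every transaction commit; keeps the value monotonic. */
    public void advance(long commitTs) {
        lastObserved.accumulateAndGet(commitTs, Math::max);
    }

    /** An RO transaction reads "in the past", at the last observed timestamp. */
    public long readTimestamp() {
        return lastObserved.get();
    }
}
{code}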
[jira] [Updated] (IGNITE-20204) Use FileTransferService for transfer deployment units between nodes
[ https://issues.apache.org/jira/browse/IGNITE-20204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Gagarkin updated IGNITE-20204: --- Description: The current implementation of Network force to load deployment unit content to the heap as byte[]. This is a potentially easy way of OOM. was: Current implementation of Network force to load deployment unit content to the heap as byte[]. This is a potentially easy way of OOM. > Use FileTransferService for transfer deployment units between nodes > --- > > Key: IGNITE-20204 > URL: https://issues.apache.org/jira/browse/IGNITE-20204 > Project: Ignite > Issue Type: Improvement > Components: compute >Reporter: Ivan Gagarkin >Priority: Major > Labels: ignite-3 > > The current implementation of Network force to load deployment unit content > to the heap as byte[]. This is a potentially easy way of OOM. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20204) Use FileTransferService for transfer deployment units between nodes
[ https://issues.apache.org/jira/browse/IGNITE-20204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Gagarkin updated IGNITE-20204: --- Description: Current implementation of Network force to load deployment unit content to the heap as byte[]. This is a potentially easy way of OOM. was: Current implemenatation of Network force to load deployment unit content to heap as byte[]. This is a potential easy way of OOM. > Use FileTransferService for transfer deployment units between nodes > --- > > Key: IGNITE-20204 > URL: https://issues.apache.org/jira/browse/IGNITE-20204 > Project: Ignite > Issue Type: Improvement > Components: compute >Reporter: Ivan Gagarkin >Priority: Major > Labels: ignite-3 > > Current implementation of Network force to load deployment unit content to > the heap as byte[]. This is a potentially easy way of OOM. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20204) Use FileTransferService for transfer deployment units between nodes
[ https://issues.apache.org/jira/browse/IGNITE-20204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Gagarkin updated IGNITE-20204: --- Description: The current implementation of Network forces to load deployment unit content to the heap as byte[]. This is a potentially easy way of OOM. was: The current implementation of Network force to load deployment unit content to the heap as byte[]. This is a potentially easy way of OOM. > Use FileTransferService for transfer deployment units between nodes > --- > > Key: IGNITE-20204 > URL: https://issues.apache.org/jira/browse/IGNITE-20204 > Project: Ignite > Issue Type: Improvement > Components: compute >Reporter: Ivan Gagarkin >Priority: Major > Labels: ignite-3 > > The current implementation of Network forces to load deployment unit content > to the heap as byte[]. This is a potentially easy way of OOM. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20204) Use FileTransferService for transfer deployment units between nodes
[ https://issues.apache.org/jira/browse/IGNITE-20204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Gagarkin updated IGNITE-20204: --- Description: Current implemenatation of Network force to load deployment unit content to heap as byte[]. This is a potential easy way of OOM. > Use FileTransferService for transfer deployment units between nodes > --- > > Key: IGNITE-20204 > URL: https://issues.apache.org/jira/browse/IGNITE-20204 > Project: Ignite > Issue Type: Improvement > Components: compute >Reporter: Ivan Gagarkin >Priority: Major > Labels: ignite-3 > > Current implemenatation of Network force to load deployment unit content to > heap as byte[]. This is a potential easy way of OOM. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20204) Use FileTransferService for transfer deployment units between nodes
[ https://issues.apache.org/jira/browse/IGNITE-20204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Gagarkin updated IGNITE-20204: --- Description: Currently, Ignite 3 loads deployment units content to the heap as byte[] when transfer units The current implementation of Network forces to load deployment unit content to the heap as byte[]. This is a potentially easy way of OOM. was: The current implementation of Network forces to load deployment unit content to the heap as byte[]. This is a potentially easy way of OOM. > Use FileTransferService for transfer deployment units between nodes > --- > > Key: IGNITE-20204 > URL: https://issues.apache.org/jira/browse/IGNITE-20204 > Project: Ignite > Issue Type: Improvement > Components: compute >Reporter: Ivan Gagarkin >Priority: Major > Labels: ignite-3 > > Currently, Ignite 3 loads deployment units content to the heap as byte[] when > transfer units > The current implementation of Network forces to load deployment unit content > to the heap as byte[]. This is a potentially easy way of OOM. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20204) Use FileTransferService for transfer deployment units between nodes
[ https://issues.apache.org/jira/browse/IGNITE-20204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Gagarkin updated IGNITE-20204: --- Description: Currently, Ignite 3 loads deployment units content to the heap as byte[] when transferring units between nodes. This is a potentially easy way of OOM. We should replace deployment transferring mechanism by was: Currently, Ignite 3 loads deployment units content to the heap as byte[] when transfer units The current implementation of Network forces to load deployment unit content to the heap as byte[]. This is a potentially easy way of OOM. > Use FileTransferService for transfer deployment units between nodes > --- > > Key: IGNITE-20204 > URL: https://issues.apache.org/jira/browse/IGNITE-20204 > Project: Ignite > Issue Type: Improvement > Components: compute >Reporter: Ivan Gagarkin >Priority: Major > Labels: ignite-3 > > Currently, Ignite 3 loads deployment units content to the heap as byte[] when > transferring units between nodes. This is a potentially easy way of OOM. > We should replace deployment transferring mechanism by > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20204) Use FileTransferService for transfer deployment units between nodes
[ https://issues.apache.org/jira/browse/IGNITE-20204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Gagarkin updated IGNITE-20204: --- Description: Currently, Ignite 3 loads deployment units content to the heap as byte[] when transferring units between nodes. This is a potentially easy way of OOM. We should replace deployment transferring mechanism by FileTransferService. was: Currently, Ignite 3 loads deployment units content to the heap as byte[] when transferring units between nodes. This is a potentially easy way of OOM. We should replace deployment transferring mechanism by > Use FileTransferService for transfer deployment units between nodes > --- > > Key: IGNITE-20204 > URL: https://issues.apache.org/jira/browse/IGNITE-20204 > Project: Ignite > Issue Type: Improvement > Components: compute >Reporter: Ivan Gagarkin >Priority: Major > Labels: ignite-3 > > Currently, Ignite 3 loads deployment units content to the heap as byte[] when > transferring units between nodes. This is a potentially easy way of OOM. > We should replace deployment transferring mechanism by FileTransferService. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20204) Use FileTransferService for transfer deployment units between nodes
[ https://issues.apache.org/jira/browse/IGNITE-20204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Gagarkin updated IGNITE-20204: --- Description: Currently, Ignite 3 loads deployment unit content to the heap as byte[] when transferring units between nodes. This can easily lead to an OOM. We should replace the deployment transfer mechanism with FileTransferService. was: Currently, Ignite 3 loads deployment units content to the heap as byte[] when transferring units between nodes. This is a potentially easy way of OOM. We should replace deployment transferring mechanism by FileTransferService. > Use FileTransferService for transfer deployment units between nodes > --- > > Key: IGNITE-20204 > URL: https://issues.apache.org/jira/browse/IGNITE-20204 > Project: Ignite > Issue Type: Improvement > Components: compute >Reporter: Ivan Gagarkin >Priority: Major > Labels: ignite-3 > > Currently, Ignite 3 loads deployment unit content to the heap as byte[] when > transferring units between nodes. This can easily lead to an OOM. > We should replace the deployment transfer mechanism with > FileTransferService. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-20205) TxLocalTest#testBalance is flaky
Denis Chudov created IGNITE-20205: - Summary: TxLocalTest#testBalance is flaky Key: IGNITE-20205 URL: https://issues.apache.org/jira/browse/IGNITE-20205 Project: Ignite Issue Type: Bug Reporter: Denis Chudov TxLocalTest actually mocks the transactional logic on top of a local dummy table. There seem to be problems with finalizing the transactions that transfer money between accounts, which causes lock exceptions when checking the final sum over all accounts. Most likely the problem is in the mock, because other test classes running this test do not show a similar issue. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-20206) Improve parallelism in FileTransferService on the file chunks level
Ivan Gagarkin created IGNITE-20206: -- Summary: Improve parallelism in FileTransferService on the file chunks level Key: IGNITE-20206 URL: https://issues.apache.org/jira/browse/IGNITE-20206 Project: Ignite Issue Type: Improvement Reporter: Ivan Gagarkin The current implementation of `{{{}org.apache.ignite.internal.network.file.FileSender`{}}} has parallelism at the file level. It works well when there are many files to send. It may be worth improving by implementing parallelism at the file chunk level. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20206) Improve parallelism in FileTransferService on the file chunks level
[ https://issues.apache.org/jira/browse/IGNITE-20206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Gagarkin updated IGNITE-20206: --- Description: The current implementation of {{org.apache.ignite.internal.network.file.FileSender}} has parallelism at the file level. It works well when there are many files to send. It may be worth improving by implementing parallelism at the file chunk level. (was: The current implementation of `{{{}org.apache.ignite.internal.network.file.FileSender`{}}} has parallelism at the file level. It works well when there are many files to send. It may be worth improving by implementing parallelism at the file chunk level.) > Improve parallelism in FileTransferService on the file chunks level > --- > > Key: IGNITE-20206 > URL: https://issues.apache.org/jira/browse/IGNITE-20206 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Gagarkin >Priority: Major > Labels: ignite-3 > > The current implementation of > {{org.apache.ignite.internal.network.file.FileSender}} has parallelism at the > file level. It works well when there are many files to send. It may be worth > improving by implementing parallelism at the file chunk level. -- This message was sent by Atlassian Jira (v8.20.10#820010)
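Chunk-level parallelism usually means splitting each file into independent ranges and letting a bounded pool send ranges concurrently, so that even a single large file can saturate the pool. A sketch under those assumptions; the real FileSender internals may be organized differently:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ChunkParallelSender {
    private static final long CHUNK_SIZE = 1024 * 1024; // illustrative

    private final ExecutorService pool = Executors.newFixedThreadPool(8);

    /** Sends all chunks of one file concurrently; completes when the last chunk is done. */
    CompletableFuture<Void> send(String fileName, long fileLength) {
        List<CompletableFuture<Void>> futs = new ArrayList<>();
        for (long off = 0; off < fileLength; off += CHUNK_SIZE) {
            long chunkOff = off;
            long len = Math.min(CHUNK_SIZE, fileLength - off);
            futs.add(CompletableFuture.runAsync(() -> sendChunk(fileName, chunkOff, len), pool));
        }
        return CompletableFuture.allOf(futs.toArray(new CompletableFuture[0]));
    }

    void sendChunk(String fileName, long offset, long length) {
        // Placeholder: read [offset, offset + length) and ship it as a chunk message.
    }
}
{code}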
[jira] [Updated] (IGNITE-20206) Improve parallelism of sending files in FileTransferService
[ https://issues.apache.org/jira/browse/IGNITE-20206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Gagarkin updated IGNITE-20206: --- Summary: Improve parallelism of sending files in FileTransferService (was: Improve parallelism in FileTransferService on the file chunks level) > Improve parallelism of sending files in FileTransferService > --- > > Key: IGNITE-20206 > URL: https://issues.apache.org/jira/browse/IGNITE-20206 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Gagarkin >Priority: Major > Labels: ignite-3 > > The current implementation of > {{org.apache.ignite.internal.network.file.FileSender}} has parallelism at the > file level. It works well when there are many files to send. It may be worth > improving by implementing parallelism at the file chunk level. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20206) Improve the parallelism of sending files in FileTransferService
[ https://issues.apache.org/jira/browse/IGNITE-20206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Gagarkin updated IGNITE-20206: --- Summary: Improve the parallelism of sending files in FileTransferService (was: Improve parallelism of sending files in FileTransferService) > Improve the parallelism of sending files in FileTransferService > --- > > Key: IGNITE-20206 > URL: https://issues.apache.org/jira/browse/IGNITE-20206 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Gagarkin >Priority: Major > Labels: ignite-3 > > The current implementation of > {{org.apache.ignite.internal.network.file.FileSender}} has parallelism at the > file level. It works well when there are many files to send. It may be worth > improving by implementing parallelism at the file chunk level. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-20207) Improve the writing of files in FileTransferService
Ivan Gagarkin created IGNITE-20207: -- Summary: Improve the writing of files in FileTransferService Key: IGNITE-20207 URL: https://issues.apache.org/jira/browse/IGNITE-20207 Project: Ignite Issue Type: Improvement Reporter: Ivan Gagarkin The current implementation of {{org.apache.ignite.internal.network.file.ChunkedFileWriter}} compares the file pointer with the offset of the received file chunk. If they are equal, the chunk is written to disk; if not, the chunk is placed in a queue and is written once all previous chunks have been written. It might be more efficient to write chunks immediately. We should investigate this approach and improve the implementation. -- This message was sent by Atlassian Jira (v8.20.10#820010)
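Writing chunks immediately is possible because each chunk already carries its offset: a positional write places it directly, so no reorder queue is needed (at the cost of sparse intermediate files). A sketch of the idea, not the actual ChunkedFileWriter code:

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PositionalChunkWriter implements AutoCloseable {
    private final FileChannel ch;

    PositionalChunkWriter(Path file) throws IOException {
        ch = FileChannel.open(file, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
    }

    /** Writes the chunk at its own offset, regardless of arrival order. */
    void onChunk(long offset, ByteBuffer data) throws IOException {
        while (data.hasRemaining()) {
            offset += ch.write(data, offset); // positional write: no file-pointer bookkeeping
        }
    }

    @Override public void close() throws IOException {
        ch.close();
    }
}
{code}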
[jira] [Created] (IGNITE-20208) Use file ids instead of file names when transferring file chunks
Ivan Gagarkin created IGNITE-20208: -- Summary: Use file ids instead of file names when transferring file chunks Key: IGNITE-20208 URL: https://issues.apache.org/jira/browse/IGNITE-20208 Project: Ignite Issue Type: Improvement Reporter: Ivan Gagarkin We can decrease the size of org.apache.ignite.internal.network.file.messages.FileChunkMessage by replacing file names with file ids. -- This message was sent by Atlassian Jira (v8.20.10#820010)
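Replacing the repeated file name with a compact id shrinks every chunk message to a fixed-size header: the name is announced once in a transfer-start message, and each chunk then refers to the file by id. A sketch of the shape this could take; the field names are illustrative assumptions, not the actual FileChunkMessage definition:

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class FileIdRegistry {
    private final Map<Integer, String> idToName = new ConcurrentHashMap<>();
    private final AtomicInteger nextId = new AtomicInteger();

    /** Sender side: announce the name once, then refer to the file by its int id. */
    int register(String fileName) {
        int id = nextId.getAndIncrement();
        idToName.put(id, fileName);
        return id;
    }

    /** Receiver side: resolve the id announced in the transfer-start message. */
    String resolve(int fileId) {
        return idToName.get(fileId);
    }

    /** Illustrative chunk message: a 4-byte id instead of a variable-length name. */
    record FileChunk(int fileId, long offset, byte[] payload) { }
}
{code}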
[jira] [Updated] (IGNITE-20124) Prevent double storage updates within primary
[ https://issues.apache.org/jira/browse/IGNITE-20124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denis Chudov updated IGNITE-20124: -- Description: h3. Motivation In order to preserve the guarantee that the primary replica is always up-to-date it's required to: * In case of common RW transaction - insert writeIntent to the storage within primary before replication. * In case of one-phase-commit - insert commitedWrite after the replication. Both have already been done. However, that means that if primary is part of the replication group, and it's true in almost all cases, we will double the update: * In case of common RW transaction - through the replication. * In case of one-phase-commit - either through the replication, or though post update, if replication was fast enough. h3. Definition of Done * Prevent double storage updates within primary. h3. Implementation Notes The easiest way to prevent double insert is to skip one if local safe time is greater or equal to candidates. There are 3 places where we update partition storage: # Primary pre-replication update. In that case, the second update on replication should be excluded. # Primary post-replication update in case of 1PC. It's possible to see already updated data if replication was already processed locally. It is expected to be already covered in https://issues.apache.org/jira/browse/IGNITE-15927 . We should check the primary safe time on post-replication update and don't do update if the safe time is already adjusted. # Insert through replication. In case of !1PC on every primary there will be double insert (see 1). In case of 1PC it depends, so we should check the safe time on primary to know whether the update should be done (see 2). In every case, the storage indexes still should be adjusted on replication, as it is done now, because the progress of indexes on FSM write operations should not be violated - otherwise, a Raft snapshot-based rebalance would be broken. was: h3. Motivation In order to preserve the guarantee that the primary replica is always up-to-date it's required to: * In case of common RW transaction - insert writeIntent to the storage within primary before replication. * In case of one-phase-commit - insert commitedWrite after the replication. Both have already been done. However, that means that if primary is part of the replication group, and it's true in almost all cases, we will double the insert: * In case of common RW transaction - through the replication. * In case of one-phase-commit - either through the replication, or though post insert, if replication was fast enough. h3. Definition of Done * Prevent double storage updates within primary. h3. Implementation Notes The easiest way to prevent double insert is to skip one if local safe time is greater or equal to candidates. There are 3 places where we update partition storage: # Primary pre-replication insert. In that case, it's never possible to see already adjusted data. # Primary post-replication insert in case of 1PC. It's possible to see already inserted data if replication was already processed locally. It is expected to be already covered in https://issues.apache.org/jira/browse/IGNITE-15927 # Insert through replication. In case of !1PC on every primary there will be double insert. In case of 1PC it depends. 
> Prevent double storage updates within primary > - > > Key: IGNITE-20124 > URL: https://issues.apache.org/jira/browse/IGNITE-20124 > Project: Ignite > Issue Type: Improvement >Reporter: Alexander Lapin >Priority: Major > Labels: ignite-3, transactions > > h3. Motivation > In order to preserve the guarantee that the primary replica is always > up-to-date it's required to: > * In case of common RW transaction - insert writeIntent to the storage > within primary before replication. > * In case of one-phase-commit - insert commitedWrite after the replication. > Both have already been done. However, that means that if primary is part of > the replication group, and it's true in almost all cases, we will double the > update: > * In case of common RW transaction - through the replication. > * In case of one-phase-commit - either through the replication, or though > post update, if replication was fast enough. > h3. Definition of Done > * Prevent double storage updates within primary. > h3. Implementation Notes > The easiest way to prevent double insert is to skip one if local safe time is > greater or equal to candidates. There are 3 places where we update partition > storage: > # Primary pre-replication update. In that case, the second update on > replication should be excluded. > # Primary post-replication update in case of 1PC. It's possible to see > already updated data if replication was
[jira] [Updated] (IGNITE-20202) Base metrics for SQL thread pools
[ https://issues.apache.org/jira/browse/IGNITE-20202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yury Gerzhedovich updated IGNITE-20202: --- Description: Let's introduce queue size for SQL execution and planning thread pools. Type is a Gauge. Suggested names are * {code:java} sql.plan.QueueSize {code} * {code:java} sql.execution.stripe..QueueSize {code} was: Let's introduce queue size for SQL execution and planning thread pools. Type is a Gauge. Suggested names are * {code:java} sql.plan.QueueSize {code} * {code:java} sql.execution.thread..QueueSize {code} > Base metrics for SQL thread pools > - > > Key: IGNITE-20202 > URL: https://issues.apache.org/jira/browse/IGNITE-20202 > Project: Ignite > Issue Type: Improvement > Components: sql >Reporter: Yury Gerzhedovich >Priority: Major > Labels: ignite-3 > > Let's introduce queue size for SQL execution and planning thread pools. > Type is a Gauge. Suggested names are > * {code:java} > sql.plan.QueueSize > {code} > * {code:java} > sql.execution.stripe..QueueSize > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20124) Prevent double storage updates within primary
[ https://issues.apache.org/jira/browse/IGNITE-20124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denis Chudov updated IGNITE-20124: -- Description: h3. Motivation In order to preserve the guarantee that the primary replica is always up-to-date it's required to: * In case of common RW transaction - insert writeIntent to the storage within primary before replication. * In case of one-phase-commit - insert commitedWrite after the replication. Both have already been done. However, that means that if primary is part of the replication group, and it's true in almost all cases, we will double the update: * In case of common RW transaction - through the replication. * In case of one-phase-commit - either through the replication, or though post update, if replication was fast enough. h3. Definition of Done * Prevent double storage updates within primary. h3. Implementation Notes The easiest way to prevent double insert is to skip one if local safe time is greater or equal to candidates. There are 3 places where we update partition storage: # Primary pre-replication update. In that case, the second update on replication should be excluded. # Primary post-replication update in case of 1PC. It's possible to see already updated data if replication was already processed locally. It is expected to be already covered in https://issues.apache.org/jira/browse/IGNITE-15927 . We should check the primary safe time on post-replication update and don't do update if the safe time is already adjusted. # Insert through replication. In case of !1PC on every primary there will be double insert (see 1). In case of 1PC it depends, so we should check the safe time on primary to know whether the update should be done (see 2). In every case, the storage indexes still should be adjusted on replication, as it is done now, because the progress of indexes on FSM write operations should not be violated - otherwise, a Raft snapshot-based rebalance would be broken. We may have two non-consistent storage updates on primary which may affect different fsyncs, was: h3. Motivation In order to preserve the guarantee that the primary replica is always up-to-date it's required to: * In case of common RW transaction - insert writeIntent to the storage within primary before replication. * In case of one-phase-commit - insert commitedWrite after the replication. Both have already been done. However, that means that if primary is part of the replication group, and it's true in almost all cases, we will double the update: * In case of common RW transaction - through the replication. * In case of one-phase-commit - either through the replication, or though post update, if replication was fast enough. h3. Definition of Done * Prevent double storage updates within primary. h3. Implementation Notes The easiest way to prevent double insert is to skip one if local safe time is greater or equal to candidates. There are 3 places where we update partition storage: # Primary pre-replication update. In that case, the second update on replication should be excluded. # Primary post-replication update in case of 1PC. It's possible to see already updated data if replication was already processed locally. It is expected to be already covered in https://issues.apache.org/jira/browse/IGNITE-15927 . We should check the primary safe time on post-replication update and don't do update if the safe time is already adjusted. # Insert through replication. In case of !1PC on every primary there will be double insert (see 1). 
In case of 1PC it depends, so we should check the safe time on primary to know whether the update should be done (see 2). In every case, the storage indexes still should be adjusted on replication, as it is done now, because the progress of indexes on FSM write operations should not be violated - otherwise, a Raft snapshot-based rebalance would be broken. > Prevent double storage updates within primary > - > > Key: IGNITE-20124 > URL: https://issues.apache.org/jira/browse/IGNITE-20124 > Project: Ignite > Issue Type: Improvement >Reporter: Alexander Lapin >Priority: Major > Labels: ignite-3, transactions > > h3. Motivation > In order to preserve the guarantee that the primary replica is always > up-to-date it's required to: > * In case of common RW transaction - insert writeIntent to the storage > within primary before replication. > * In case of one-phase-commit - insert commitedWrite after the replication. > Both have already been done. However, that means that if primary is part of > the replication group, and it's true in almost all cases, we will double the > update: > * In case of common RW transaction - through the replication. > * In case of one-phase-commit - either through the rep
[jira] [Comment Edited] (IGNITE-20165) Revisit the configuration of thread pools used by JRaft
[ https://issues.apache.org/jira/browse/IGNITE-20165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17753233#comment-17753233 ] Mirza Aliev edited comment on IGNITE-20165 at 8/14/23 9:04 AM: --- By default, all executors are shared among the instance of Loza, meaning that all raft groups share executors. Below I've represented all JRaft executors with short description and the number of threads ||Pool name||Description||Number of Threads|| |JRaft-Common-Executor|A pool for processing short-lived asynchronous tasks. Should never be blocked.|Utils.cpus() (core == max)| |JRaft-Node-Scheduler|A scheduled executor for running delayed or repeating tasks.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE, DelayedWorkQueue)| |JRaft-Request-Processor|A default pool for handling RAFT requests. Should never be blocked.|Utils.cpus() * 6 (core == max)| |JRaft-Response-Processor|A default pool for handling RAFT responses. Should never be blocked.|80 (core == max/3, workQueue == 1)| |JRaft-AppendEntries-Processor|A pool of single thread executors. Used only if a replication pipelining is enabled. Handles append entries requests and responses (used by the replication flow). Threads are started on demand. Each replication pair (leader-follower) uses dedicated single thread executor from the pool, so all messages between replication peer pairs are processed sequentially.|SystemPropertyUtil.getInt( "jraft.append.entries.threads.send", Math.max(16, Ints.findNextPositivePowerOfTwo(cpus() * 2)));| |NodeImpl-Disruptor|A striped disruptor for batching FSM (finite state machine) user tasks.|DEFAULT_STRIPES = Utils.cpus() * 2| |ReadOnlyService-Disruptor|A striped disruptor for batching read requests before doing read index request.|DEFAULT_STRIPES = Utils.cpus() * 2| |LogManager-Disruptor|A striped disruptor for delivering log entries to a storage.|DEFAULT_STRIPES = Utils.cpus() * 2| |FSMCaller-Disruptor|A striped disruptor for FSM callbacks.|DEFAULT_STRIPES = Utils.cpus() * 2| |SnapshotTimer|A timer for periodic snapshot creation.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE)| |ElectionTimer|A timer to handle election timeout on followers.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE)| |VoteTimer|A timer to handle vote timeout when a leader was not confirmed by majority.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE)| |StepDownTimer|A timer to process leader step down condition.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE)| was (Author: maliev): By default, all executors are shared among the instance of Loza, meaning that all raft groups share executors. Below I've represented all JRaft executors with short description and the number of threads ||Pool name||Description||Number of Threads|| |JRaft-Common-Executor|A pool for processing short-lived asynchronous tasks. Should never be blocked.|Utils.cpus() (core == max)| |JRaft-Node-Scheduler|A scheduled executor for running delayed or repeating tasks.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE)| |JRaft-Request-Processor|A default pool for handling RAFT requests. Should never be blocked.|Utils.cpus() * 6 (core == max)| |JRaft-Response-Processor|A default pool for handling RAFT responses. Should never be blocked.|80 (core == max/3, workQueue == 1)| |JRaft-AppendEntries-Processor|A pool of single thread executors. Used only if a replication pipelining is enabled. 
Handles append entries requests and responses (used by the replication flow). Threads are started on demand. Each replication pair (leader-follower) uses dedicated single thread executor from the pool, so all messages between replication peer pairs are processed sequentially.|SystemPropertyUtil.getInt( "jraft.append.entries.threads.send", Math.max(16, Ints.findNextPositivePowerOfTwo(cpus() * 2)));| |NodeImpl-Disruptor|A striped disruptor for batching FSM (finite state machine) user tasks.|DEFAULT_STRIPES = Utils.cpus() * 2| |ReadOnlyService-Disruptor|A striped disruptor for batching read requests before doing read index request.|DEFAULT_STRIPES = Utils.cpus() * 2| |LogManager-Disruptor|A striped disruptor for delivering log entries to a storage.|DEFAULT_STRIPES = Utils.cpus() * 2| |FSMCaller-Disruptor|A striped disruptor for FSM callbacks.|DEFAULT_STRIPES = Utils.cpus() * 2| |SnapshotTimer|A timer for periodic snapshot creation.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE)| |ElectionTimer|A timer to handle election timeout on followers.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE)| |VoteTimer|A timer to handle vote timeout when a leader was not confirmed by majority.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE)| |StepDownTimer|A timer to process leader step down condition.|Math.min(Utils.cpus() * 3, 20)
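To make the sizing formulas in the table concrete, the defaults can be evaluated for a given core count; on a 16-core machine this prints 16, 20, 96, 32 and 32. A small sketch, where the power-of-two helper mirrors what Ints.findNextPositivePowerOfTwo does (exact JRaft behavior may differ):

{code:java}
public class JraftPoolSizes {
    /** Next power of two >= value, as used for the AppendEntries sender pool. */
    static int nextPowerOfTwo(int value) {
        return 1 << (32 - Integer.numberOfLeadingZeros(value - 1));
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();

        System.out.println("JRaft-Common-Executor:       " + cpus);
        System.out.println("JRaft-Node-Scheduler (core): " + Math.min(cpus * 3, 20));
        System.out.println("JRaft-Request-Processor:     " + cpus * 6);
        System.out.println("AppendEntries senders:       " + Math.max(16, nextPowerOfTwo(cpus * 2)));
        System.out.println("Disruptor stripes:           " + cpus * 2);
    }
}
{code}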
[jira] [Updated] (IGNITE-20124) Prevent double storage updates within primary
[ https://issues.apache.org/jira/browse/IGNITE-20124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denis Chudov updated IGNITE-20124: -- Description: h3. Motivation In order to preserve the guarantee that the primary replica is always up-to-date, it's required to: * In case of a common RW transaction - insert the writeIntent into the storage within the primary before replication. * In case of one-phase commit - insert the committedWrite after the replication. Both have already been done. However, that means that if the primary is part of the replication group, which is true in almost all cases, we will double the update: * In case of a common RW transaction - through the replication. * In case of one-phase commit - either through the replication, or through the post update, if replication was fast enough. h3. Definition of Done * Prevent double storage updates within the primary. h3. Implementation Notes The easiest way to prevent a double insert is to skip an update if the local safe time is greater than or equal to the candidate's. There are 3 places where we update partition storage: # Primary pre-replication update. In that case, the second update on replication should be excluded. # Primary post-replication update in case of 1PC. It's possible to see already updated data if replication was already processed locally. It is expected to be already covered in https://issues.apache.org/jira/browse/IGNITE-15927 . We should check the primary safe time on the post-replication update and skip the update if the safe time is already adjusted. # Insert through replication. In the non-1PC case there will be a double insert on every primary (see 1). In case of 1PC it depends, so we should check the safe time on the primary to know whether the update should be done (see 2). In every case, the storage indexes should still be adjusted on replication, as it is done now, because the progress of indexes on FSM write operations must not be violated - otherwise, a Raft snapshot-based rebalance would be broken. We may have two non-consistent storage updates on the primary which may affect different fsyncs, so we should perhaps benchmark this optimization to find out how useful it is. The transactional correctness isn't violated by these was: h3. Motivation In order to preserve the guarantee that the primary replica is always up-to-date it's required to: * In case of common RW transaction - insert writeIntent to the storage within primary before replication. * In case of one-phase-commit - insert commitedWrite after the replication. Both have already been done. However, that means that if primary is part of the replication group, and it's true in almost all cases, we will double the update: * In case of common RW transaction - through the replication. * In case of one-phase-commit - either through the replication, or though post update, if replication was fast enough. h3. Definition of Done * Prevent double storage updates within primary. h3. Implementation Notes The easiest way to prevent double insert is to skip one if local safe time is greater or equal to candidates. There are 3 places where we update partition storage: # Primary pre-replication update. In that case, the second update on replication should be excluded. # Primary post-replication update in case of 1PC. It's possible to see already updated data if replication was already processed locally. It is expected to be already covered in https://issues.apache.org/jira/browse/IGNITE-15927 .
We should check the primary safe time on post-replication update and don't do update if the safe time is already adjusted. # Insert through replication. In case of !1PC on every primary there will be double insert (see 1). In case of 1PC it depends, so we should check the safe time on primary to know whether the update should be done (see 2). In every case, the storage indexes still should be adjusted on replication, as it is done now, because the progress of indexes on FSM write operations should not be violated - otherwise, a Raft snapshot-based rebalance would be broken. We may have two non-consistent storage updates on primary which may affect different fsyncs, > Prevent double storage updates within primary > - > > Key: IGNITE-20124 > URL: https://issues.apache.org/jira/browse/IGNITE-20124 > Project: Ignite > Issue Type: Improvement >Reporter: Alexander Lapin >Priority: Major > Labels: ignite-3, transactions > > h3. Motivation > In order to preserve the guarantee that the primary replica is always > up-to-date, it's required to: > * In case of a common RW transaction - insert the writeIntent into the storage > within the primary before replication. > * In case of one-phase commit - insert the committedWrite after the replication. > Both have already been done. However, that means that if the p
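The skip logic described in the notes can be expressed as a safe-time guard in front of the storage write: the replicated command carries a timestamp, and if the locally adjusted safe time already covers it, the row update was applied on the primary path and only the index state needs advancing. A heavily simplified sketch; the types and method names are illustrative, not Ignite's actual replica code, and a single-threaded apply path is assumed (Raft applies commands sequentially):

{code:java}
public class SafeTimeGuard {
    private volatile long safeTime; // last timestamp applied to the partition storage

    /**
     * Applies a replicated update unless the primary-path update already covered it.
     * Raft log indexes are always advanced so snapshot-based rebalance stays correct.
     */
    void onReplicatedUpdate(long commandTs, Runnable storageUpdate, Runnable advanceIndexes) {
        if (commandTs > safeTime) {
            storageUpdate.run();   // first time this update is seen on this node
            safeTime = commandTs;  // adjust safe time after the write
        }
        // Executed in every case: index progress must not depend on the skip.
        advanceIndexes.run();
    }
}
{code}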
[jira] [Comment Edited] (IGNITE-20165) Revisit the configuration of thread pools used by JRaft
[ https://issues.apache.org/jira/browse/IGNITE-20165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17753233#comment-17753233 ] Mirza Aliev edited comment on IGNITE-20165 at 8/14/23 9:05 AM: --- By default, all executors are shared among the instance of Loza, meaning that all raft groups share executors. Below I've represented all JRaft executors with short description and the number of threads ||Pool name||Description||Number of Threads|| |JRaft-Common-Executor|A pool for processing short-lived asynchronous tasks. Should never be blocked.|Utils.cpus() (core == max)| |JRaft-Node-Scheduler|A scheduled executor for running delayed or repeating tasks.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE, DelayedWorkQueue)| |JRaft-Request-Processor|A default pool for handling RAFT requests. Should never be blocked.|Utils.cpus() * 6 (core == max)| |JRaft-Response-Processor|A default pool for handling RAFT responses. Should never be blocked.|80 (core == max/3, workQueue == 1)| |JRaft-AppendEntries-Processor|A pool of single thread executors. Used only if a replication pipelining is enabled. Handles append entries requests and responses (used by the replication flow). Threads are started on demand. Each replication pair (leader-follower) uses dedicated single thread executor from the pool, so all messages between replication peer pairs are processed sequentially.|SystemPropertyUtil.getInt( "jraft.append.entries.threads.send", Math.max(16, Ints.findNextPositivePowerOfTwo(cpus() * 2)));| |NodeImpl-Disruptor|A striped disruptor for batching FSM (finite state machine) user tasks.|DEFAULT_STRIPES = Utils.cpus() * 2| |ReadOnlyService-Disruptor|A striped disruptor for batching read requests before doing read index request.|DEFAULT_STRIPES = Utils.cpus() * 2| |LogManager-Disruptor|A striped disruptor for delivering log entries to a storage.|DEFAULT_STRIPES = Utils.cpus() * 2| |FSMCaller-Disruptor|A striped disruptor for FSM callbacks.|DEFAULT_STRIPES = Utils.cpus() * 2| |SnapshotTimer|A timer for periodic snapshot creation.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE, DelayedWorkQueue)| |ElectionTimer|A timer to handle election timeout on followers.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE, DelayedWorkQueue)| |VoteTimer|A timer to handle vote timeout when a leader was not confirmed by majority.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE, DelayedWorkQueue)| |StepDownTimer|A timer to process leader step down condition.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE, DelayedWorkQueue)| was (Author: maliev): By default, all executors are shared among the instance of Loza, meaning that all raft groups share executors. Below I've represented all JRaft executors with short description and the number of threads ||Pool name||Description||Number of Threads|| |JRaft-Common-Executor|A pool for processing short-lived asynchronous tasks. Should never be blocked.|Utils.cpus() (core == max)| |JRaft-Node-Scheduler|A scheduled executor for running delayed or repeating tasks.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE, DelayedWorkQueue)| |JRaft-Request-Processor|A default pool for handling RAFT requests. Should never be blocked.|Utils.cpus() * 6 (core == max)| |JRaft-Response-Processor|A default pool for handling RAFT responses. Should never be blocked.|80 (core == max/3, workQueue == 1)| |JRaft-AppendEntries-Processor|A pool of single thread executors. 
Used only if a replication pipelining is enabled. Handles append entries requests and responses (used by the replication flow). Threads are started on demand. Each replication pair (leader-follower) uses dedicated single thread executor from the pool, so all messages between replication peer pairs are processed sequentially.|SystemPropertyUtil.getInt( "jraft.append.entries.threads.send", Math.max(16, Ints.findNextPositivePowerOfTwo(cpus() * 2)));| |NodeImpl-Disruptor|A striped disruptor for batching FSM (finite state machine) user tasks.|DEFAULT_STRIPES = Utils.cpus() * 2| |ReadOnlyService-Disruptor|A striped disruptor for batching read requests before doing read index request.|DEFAULT_STRIPES = Utils.cpus() * 2| |LogManager-Disruptor|A striped disruptor for delivering log entries to a storage.|DEFAULT_STRIPES = Utils.cpus() * 2| |FSMCaller-Disruptor|A striped disruptor for FSM callbacks.|DEFAULT_STRIPES = Utils.cpus() * 2| |SnapshotTimer|A timer for periodic snapshot creation.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE)| |ElectionTimer|A timer to handle election timeout on followers.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE)| |VoteTimer|A timer to handle vote timeout when a leader was not confirmed by majority.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE)|
[jira] [Updated] (IGNITE-20202) Base metrics for SQL thread pools
[ https://issues.apache.org/jira/browse/IGNITE-20202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yury Gerzhedovich updated IGNITE-20202: --- Description: Let's introduce queue size for planning thread pool. Type is a Gauge. Suggested names are * {code:java} sql.plan.QueueSize {code} was: Let's introduce queue size for SQL execution and planning thread pools. Type is a Gauge. Suggested names are * {code:java} sql.plan.QueueSize {code} * {code:java} sql.execution.stripe..QueueSize {code} > Base metrics for SQL thread pools > - > > Key: IGNITE-20202 > URL: https://issues.apache.org/jira/browse/IGNITE-20202 > Project: Ignite > Issue Type: Improvement > Components: sql >Reporter: Yury Gerzhedovich >Priority: Major > Labels: ignite-3 > > Let's introduce queue size for planning thread pool. > Type is a Gauge. Suggested names are > * {code:java} > sql.plan.QueueSize > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20202) Introduce queue size of SQL plan thread pool as metric
[ https://issues.apache.org/jira/browse/IGNITE-20202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yury Gerzhedovich updated IGNITE-20202: --- Summary: Introduce queue size of SQL plan thread pool as metric (was: Base metrics for SQL thread pools) > Introduce queue size of SQL plan thread pool as metric > --- > > Key: IGNITE-20202 > URL: https://issues.apache.org/jira/browse/IGNITE-20202 > Project: Ignite > Issue Type: Improvement > Components: sql >Reporter: Yury Gerzhedovich >Priority: Major > Labels: ignite-3 > > Let's introduce queue size for planning thread pool. > Type is a Gauge. Suggested names are > * {code:java} > sql.plan.QueueSize > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20202) Introduce queue size of SQL plan thread pool as metric
[ https://issues.apache.org/jira/browse/IGNITE-20202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yury Gerzhedovich updated IGNITE-20202: --- Description: Let's introduce queue size for planning thread pool. Type is a Gauge. Suggested name are * {code:java} sql.plan.QueueSize {code} was: Let's introduce queue size for planning thread pool. Type is a Gauge. Suggested names are * {code:java} sql.plan.QueueSize {code} > Introduce queue size of SQL plan thread pool as metric > --- > > Key: IGNITE-20202 > URL: https://issues.apache.org/jira/browse/IGNITE-20202 > Project: Ignite > Issue Type: Improvement > Components: sql >Reporter: Yury Gerzhedovich >Priority: Major > Labels: ignite-3 > > Let's introduce queue size for planning thread pool. > Type is a Gauge. Suggested name are > * {code:java} > sql.plan.QueueSize > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20202) Introduce queue size of SQL plan thread pool as metric
[ https://issues.apache.org/jira/browse/IGNITE-20202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yury Gerzhedovich updated IGNITE-20202: --- Description: Let's introduce a queue size metric for the planning thread pool. The type is a Gauge. The suggested name is * {code:java} sql.plan.QueueSize {code} was: Let's introduce queue size for planning thread pool. Type is a Gauge. Suggested name are * {code:java} sql.plan.QueueSize {code} > Introduce queue size of SQL plan thread pool as metric > --- > > Key: IGNITE-20202 > URL: https://issues.apache.org/jira/browse/IGNITE-20202 > Project: Ignite > Issue Type: Improvement > Components: sql >Reporter: Yury Gerzhedovich >Priority: Major > Labels: ignite-3 > > Let's introduce a queue size metric for the planning thread pool. > The type is a Gauge. The suggested name is > * {code:java} > sql.plan.QueueSize > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (IGNITE-20165) Revisit the configuration of thread pools used by JRaft
[ https://issues.apache.org/jira/browse/IGNITE-20165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17753233#comment-17753233 ] Mirza Aliev edited comment on IGNITE-20165 at 8/14/23 9:35 AM: --- By default, all executors are shared within an instance of Loza, meaning that all raft groups share executors. Below I've listed all JRaft executors with a short description and the number of threads ||Pool name||Description||Number of Threads|| |JRaft-Common-Executor|A pool for processing short-lived asynchronous tasks. Should never be blocked.|Utils.cpus() (core == max)| |JRaft-Node-Scheduler|A scheduled executor for running delayed or repeating tasks.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE, DelayedWorkQueue)| |JRaft-Request-Processor|A default pool for handling RAFT requests. Should never be blocked.|Utils.cpus() * 6 (core == max)| |JRaft-Response-Processor|A default pool for handling RAFT responses. Should never be blocked.|80 (core == max/3, workQueue == 1)| |JRaft-AppendEntries-Processor|A pool of single-thread executors. Used only if replication pipelining is enabled (it is enabled by default). Handles append entries requests and responses (used by the replication flow). Threads are started on demand. Each replication pair (leader-follower) uses a dedicated single-thread executor from the pool, so all messages between replication peer pairs are processed sequentially.|SystemPropertyUtil.getInt( "jraft.append.entries.threads.send", Math.max(16, Ints.findNextPositivePowerOfTwo(cpus() * 2)));| |NodeImpl-Disruptor|A striped disruptor for batching FSM (finite state machine) user tasks.|DEFAULT_STRIPES = Utils.cpus() * 2| |ReadOnlyService-Disruptor|A striped disruptor for batching read requests before doing a read index request.|DEFAULT_STRIPES = Utils.cpus() * 2| |LogManager-Disruptor|A striped disruptor for delivering log entries to a storage.|DEFAULT_STRIPES = Utils.cpus() * 2| |FSMCaller-Disruptor|A striped disruptor for FSM callbacks.|DEFAULT_STRIPES = Utils.cpus() * 2| |SnapshotTimer|A timer for periodic snapshot creation.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE, DelayedWorkQueue)| |ElectionTimer|A timer to handle election timeout on followers.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE, DelayedWorkQueue)| |VoteTimer|A timer to handle vote timeout when a leader was not confirmed by the majority.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE, DelayedWorkQueue)| |StepDownTimer|A timer to process the leader step-down condition.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE, DelayedWorkQueue)| was (Author: maliev): By default, all executors are shared among the instance of Loza, meaning that all raft groups share executors. Below I've represented all JRaft executors with short description and the number of threads ||Pool name||Description||Number of Threads|| |JRaft-Common-Executor|A pool for processing short-lived asynchronous tasks. Should never be blocked.|Utils.cpus() (core == max)| |JRaft-Node-Scheduler|A scheduled executor for running delayed or repeating tasks.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE, DelayedWorkQueue)| |JRaft-Request-Processor|A default pool for handling RAFT requests. Should never be blocked.|Utils.cpus() * 6 (core == max)| |JRaft-Response-Processor|A default pool for handling RAFT responses.
Should never be blocked.|80 (core == max/3, workQueue == 1)| |JRaft-AppendEntries-Processor|A pool of single thread executors. Used only if a replication pipelining is enabled. Handles append entries requests and responses (used by the replication flow). Threads are started on demand. Each replication pair (leader-follower) uses dedicated single thread executor from the pool, so all messages between replication peer pairs are processed sequentially.|SystemPropertyUtil.getInt( "jraft.append.entries.threads.send", Math.max(16, Ints.findNextPositivePowerOfTwo(cpus() * 2)));| |NodeImpl-Disruptor|A striped disruptor for batching FSM (finite state machine) user tasks.|DEFAULT_STRIPES = Utils.cpus() * 2| |ReadOnlyService-Disruptor|A striped disruptor for batching read requests before doing read index request.|DEFAULT_STRIPES = Utils.cpus() * 2| |LogManager-Disruptor|A striped disruptor for delivering log entries to a storage.|DEFAULT_STRIPES = Utils.cpus() * 2| |FSMCaller-Disruptor|A striped disruptor for FSM callbacks.|DEFAULT_STRIPES = Utils.cpus() * 2| |SnapshotTimer|A timer for periodic snapshot creation.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE, DelayedWorkQueue)| |ElectionTimer|A timer to handle election timeout on followers.|Math.min(Utils.cpus() * 3, 20) (core, max == Integer.MAX_VALUE, DelayedWorkQueue)| |VoteTimer|A timer to handle vote timeout when a leader was not confirmed by majority.|M
[jira] [Updated] (IGNITE-20201) Node failure when incorrect names are used for hitrate and histogram metrics configuration
[ https://issues.apache.org/jira/browse/IGNITE-20201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ilya Shishkov updated IGNITE-20201: --- Description: There is no metric name validation when hitrate and histogram metrics are configured by means of the control script. This can make it impossible to restart a persistent cluster.
*How to reproduce:*
# Start persistent cluster.
# Enter commands from instructions [1].
{noformat}
control.sh --metric --configure-histogram histogram-metric-name 1,2,3
control.sh --metric --configure-hitrate hitrate-metric-name 1000
{noformat}
# Deactivate and restart cluster.
# Start and activate the cluster; nodes will fail with the following error:
{noformat}
[19:47:26,981][SEVERE][main][IgniteKernal] Got exception while starting (will rollback startup routine).
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
	at java.lang.String.substring(String.java:1967)
	at org.apache.ignite.internal.processors.metric.impl.MetricUtils.fromFullName(MetricUtils.java:72)
	at org.apache.ignite.internal.processors.metric.GridMetricManager.find(GridMetricManager.java:502)
	at org.apache.ignite.internal.processors.metric.GridMetricManager.onHistogramConfigChanged(GridMetricManager.java:480)
	at org.apache.ignite.internal.processors.metric.GridMetricManager.access$300(GridMetricManager.java:73)
	at org.apache.ignite.internal.processors.metric.GridMetricManager$1.lambda$onReadyForRead$1(GridMetricManager.java:272)
	at org.apache.ignite.internal.processors.metastorage.persistence.InMemoryCachedDistributedMetaStorageBridge.iterate(InMemoryCachedDistributedMetaStorageBridge.java:87)
	at org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl.iterate(DistributedMetaStorageImpl.java:542)
	at org.apache.ignite.internal.processors.metric.GridMetricManager$1.onReadyForRead(GridMetricManager.java:272)
	at org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl.notifyReadyForRead(DistributedMetaStorageImpl.java:355)
	at org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl.onMetaStorageReadyForRead(DistributedMetaStorageImpl.java:434)
	at org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl.access$200(DistributedMetaStorageImpl.java:116)
	at org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl$2.onReadyForRead(DistributedMetaStorageImpl.java:259)
	at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetastorageReadyForRead(GridCacheDatabaseSharedManager.java:430)
	at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readMetastore(GridCacheDatabaseSharedManager.java:877)
	at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetaStorageSubscribersOnReadyForRead(GridCacheDatabaseSharedManager.java:3094)
	at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1120)
	at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1725)
	at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1647)
	at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1089)
	at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:983)
	at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:889)
	at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:808)
	at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:678)
	at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:647)
	at org.apache.ignite.Ignition.start(Ignition.java:325)
	at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:365)
{noformat}
Failure occurs when {{GridMetricManager}} tries to parse entries with incorrect metric names from the metastorage:
{noformat}
metrics.histogram.histogram-metric-name [1, 2, 3]
metrics.hitrate.hitrate-metric-name 1000
{noformat}
*Solution:*
# Add extra validation of the metric name to the {{\-\-metric \-\-configure-*}} commands.
# Add exception handling to {{GridMetricManager.onHistogramConfigChanged}} and {{GridMetricManager.onHitRateConfigChanged}}.
*Workaround:* Clean the metastorage.
Links:
# https://ignite.apache.org/docs/latest/tools/control-script#metric-configure-command
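For reference, the StringIndexOutOfBoundsException above comes from splitting a metric name that contains no registry separator. Below is a minimal sketch of the kind of guard the solution calls for, assuming '.' is the separator (as the metastorage entries suggest); the class and method are hypothetical, not Ignite's actual API:
{code:java}
/** Hypothetical sketch of the proposed metric name validation; not Ignite's actual API. */
final class MetricNameValidator {
    private MetricNameValidator() {
        // No instances.
    }

    /**
     * Splits a full metric name into {registry, name}, rejecting unqualified names
     * up front instead of failing later with StringIndexOutOfBoundsException.
     */
    static String[] split(String fullName) {
        int sep = fullName.lastIndexOf('.');

        if (sep <= 0 || sep == fullName.length() - 1)
            throw new IllegalArgumentException(
                "Metric name must have the form '<registry>.<name>': " + fullName);

        return new String[] {fullName.substring(0, sep), fullName.substring(sep + 1)};
    }
}
{code}
The same check could be applied both in the control-script command handler (rejecting the name before it reaches the metastorage) and defensively in GridMetricManager when reading persisted entries.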
[jira] [Resolved] (IGNITE-18902) Too much threads started on empty cluster
[ https://issues.apache.org/jira/browse/IGNITE-18902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Polovtcev resolved IGNITE-18902. -- Resolution: Duplicate > Too much threads started on empty cluster > - > > Key: IGNITE-18902 > URL: https://issues.apache.org/jira/browse/IGNITE-18902 > Project: Ignite > Issue Type: Improvement >Reporter: Konstantin Orlov >Priority: Major > Labels: ignite-3 > Attachments: after_start.txt, after_table_creation.txt > > > It seems we start an unreasonable number of threads. A thread dump taken right after > the start of a single node shows 170 threads with the prefix ('idt_n_0' in the dumps), > 157 of which belong to JRaft. Creating a table contributes another 160 threads (330 in > total), 114 of them belonging to JRaft (271 in total). > Let's investigate whether we really need all those threads or whether we can do better. > Thread dumps are attached below. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20201) Node failure when incorrect names are used for hitrate and histogram metrics configuration
[ https://issues.apache.org/jira/browse/IGNITE-20201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ilya Shishkov updated IGNITE-20201: --- Labels: ise (was: )
> Node failure when incorrect names are used for hitrate and histogram metrics
> configuration
> --
>
> Key: IGNITE-20201
> URL: https://issues.apache.org/jira/browse/IGNITE-20201
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 2.15
> Reporter: Ilya Shishkov
> Priority: Critical
> Labels: ise
>
> There is no metric name validation when hitrate and histogram metrics are
> configured by means of the control script. This can make it impossible to
> restart a persistent cluster.
> *How to reproduce:*
> # Start persistent cluster.
> # Enter commands from instructions [1].
> {noformat}
> control.sh --metric --configure-histogram histogram-metric-name 1,2,3
> control.sh --metric --configure-hitrate hitrate-metric-name 1000
> {noformat}
> # Deactivate and restart cluster.
> # Start and activate the cluster; nodes will fail with the following error:
> {noformat}
> [19:47:26,981][SEVERE][main][IgniteKernal] Got exception while starting (will
> rollback startup routine).
> java.lang.StringIndexOutOfBoundsException: String index out of range: -1
> at java.lang.String.substring(String.java:1967)
> at org.apache.ignite.internal.processors.metric.impl.MetricUtils.fromFullName(MetricUtils.java:72)
> at org.apache.ignite.internal.processors.metric.GridMetricManager.find(GridMetricManager.java:502)
> at org.apache.ignite.internal.processors.metric.GridMetricManager.onHistogramConfigChanged(GridMetricManager.java:480)
> at org.apache.ignite.internal.processors.metric.GridMetricManager.access$300(GridMetricManager.java:73)
> at org.apache.ignite.internal.processors.metric.GridMetricManager$1.lambda$onReadyForRead$1(GridMetricManager.java:272)
> at org.apache.ignite.internal.processors.metastorage.persistence.InMemoryCachedDistributedMetaStorageBridge.iterate(InMemoryCachedDistributedMetaStorageBridge.java:87)
> at org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl.iterate(DistributedMetaStorageImpl.java:542)
> at org.apache.ignite.internal.processors.metric.GridMetricManager$1.onReadyForRead(GridMetricManager.java:272)
> at org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl.notifyReadyForRead(DistributedMetaStorageImpl.java:355)
> at org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl.onMetaStorageReadyForRead(DistributedMetaStorageImpl.java:434)
> at org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl.access$200(DistributedMetaStorageImpl.java:116)
> at org.apache.ignite.internal.processors.metastorage.persistence.DistributedMetaStorageImpl$2.onReadyForRead(DistributedMetaStorageImpl.java:259)
> at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetastorageReadyForRead(GridCacheDatabaseSharedManager.java:430)
> at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readMetastore(GridCacheDatabaseSharedManager.java:877)
> at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetaStorageSubscribersOnReadyForRead(GridCacheDatabaseSharedManager.java:3094)
> at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1120)
> at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1725)
> at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1647)
> at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1089)
> at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:983)
> at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:889)
> at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:808)
> at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:678)
> at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:647)
> at org.apache.ignite.Ignition.start(Ignition.java:325)
> at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:365)
> {noformat}
> Failure occurs when {{GridMetricManager}} tries to parse entries with
> incorrect metric names from the metastorage:
> {noformat}
> metrics.histogram.histogram-metric-name [1, 2, 3]
> metrics.hitrate.hitrate-metric-name 1000
> {noformat}
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IGNITE-20178) Introduce param-free IgniteInternalFuture.listen(() -> {}) in addition to .listen((fut) -> {}) to avoid ignored params
[ https://issues.apache.org/jira/browse/IGNITE-20178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17753976#comment-17753976 ] Ignite TC Bot commented on IGNITE-20178: {panel:title=Branch: [pull/10885/head] Base: [master] : No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel} {panel:title=Branch: [pull/10885/head] Base: [master] : New Tests (42)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1} {color:#8b}Snapshots{color} [[tests 42|https://ci2.ignite.apache.org/viewLog.html?buildId=7296412]] * {color:#013220}IgniteSnapshotTestSuite: testsuites.IgniteSnapshotTestSuite - PASSED{color} * {color:#013220}IgniteSnapshotTestSuite: testsuites.IgniteSnapshotTestSuite - PASSED{color} * {color:#013220}IgniteSnapshotTestSuite: testsuites.IgniteSnapshotTestSuite - PASSED{color} * {color:#013220}IgniteSnapshotTestSuite: testsuites.IgniteSnapshotTestSuite - PASSED{color} * {color:#013220}IgniteSnapshotTestSuite: testsuites.IgniteSnapshotTestSuite - PASSED{color} * {color:#013220}IgniteSnapshotTestSuite: testsuites.IgniteSnapshotTestSuite - PASSED{color} * {color:#013220}IgniteSnapshotTestSuite: testsuites.IgniteSnapshotTestSuite - PASSED{color} * {color:#013220}IgniteSnapshotTestSuite: testsuites.IgniteSnapshotTestSuite - PASSED{color} * {color:#013220}IgniteSnapshotTestSuite: testsuites.IgniteSnapshotTestSuite - PASSED{color} * {color:#013220}IgniteSnapshotTestSuite: testsuites.IgniteSnapshotTestSuite - PASSED{color} * {color:#013220}IgniteSnapshotTestSuite: testsuites.IgniteSnapshotTestSuite - PASSED{color} ... and 31 new tests {panel} [TeamCity *--> Run :: All* Results|https://ci2.ignite.apache.org/viewLog.html?buildId=7296273&buildTypeId=IgniteTests24Java8_RunAll] > Introduce param-free IgniteInternalFuture.listen(() -> {}) in addition to > .listen((fut) -> {}) to avoid ignored params > -- > > Key: IGNITE-20178 > URL: https://issues.apache.org/jira/browse/IGNITE-20178 > Project: Ignite > Issue Type: Sub-task >Reporter: Anton Vinogradov >Assignee: Anton Vinogradov >Priority: Major > Fix For: 2.16 > > Time Spent: 2.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
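The change the ticket describes is essentially a defaulted overload that delegates to the existing listener method, so callers no longer declare an ignored parameter. A minimal sketch of that shape, using a simplified stand-in interface rather than IgniteInternalFuture itself:
{code:java}
import java.util.function.Consumer;

/** Simplified stand-in for the future interface; only the listen() overloads are sketched. */
interface ListenableFuture<T> {
    /** Existing listener form: the completed future is passed to the callback. */
    void listen(Consumer<? super ListenableFuture<T>> lsnr);

    /** Param-free form the ticket proposes; delegates to the existing overload. */
    default void listen(Runnable lsnr) {
        listen(fut -> lsnr.run());
    }
}
{code}
With this shape, a call site that previously wrote {{fut.listen(ignored -> doCleanup())}} can write {{fut.listen(this::doCleanup)}} instead.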
[jira] [Commented] (IGNITE-16700) ItTxDistributedTestThreeNodesThreeReplicas#testBalance is flaky
[ https://issues.apache.org/jira/browse/IGNITE-16700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754021#comment-17754021 ] Denis Chudov commented on IGNITE-16700: --- This test creates 2 * CPU_COUNT threads, and each thread repeatedly runs transactions that transfer money from one account to another, with the number of accounts similar to the number of threads. In effect it is a load test, since it uncovers performance problems. The test fails with replication timeout exceptions, but the causes of those timeouts differ:
* upsert operation timeouts: caused by long waits for lock acquisition under high contention. Locks are released only after transaction cleanup, so a queue of waiters can build up for each key, each of them waiting for tx cleanup.
* timeouts of arbitrary commands: there seem to be problems with the RocksDB log storage, specifically the storage flush in RocksDbSharedLogStorage#commitWriteBatch: even with a batch size of a few hundred bytes, the db put operation can take over a second. While logging flush durations, I saw many records where a flush took over 100 ms.
If I turn off fsync for the Raft log and increase the number of accounts by 10 times, the fail rate of the test drops drastically (no failures after 600 runs, compared with roughly 1 per 25 without the fixes). The problem with the Raft storage needs a separate ticket.
> ItTxDistributedTestThreeNodesThreeReplicas#testBalance is flaky
> ---
>
> Key: IGNITE-16700
> URL: https://issues.apache.org/jira/browse/IGNITE-16700
> Project: Ignite
> Issue Type: Bug
> Reporter: Mirza Aliev
> Assignee: Denis Chudov
> Priority: Major
> Labels: ignite-3
> Attachments: _Integration_Tests_Module_Table_2055.log,
> _Integration_Tests_Module_Table_2098.log
>
>
> {{ItTxDistributedTestThreeNodesThreeReplicas#testBalance}} periodically fails
> with
> {noformat}
> org.apache.ignite.lang.IgniteException
> org.apache.ignite.lang.IgniteException: java.util.concurrent.TimeoutException
> ==> expected: but was:
> {noformat}
> We've noticed that the test became flaky after IGNITE-16393 was merged.
> Probably, the current problem is related to the problem with stopping
> executors for the network's user object serialization threads, IGNITE-16699,
> as the logs are full of warnings from IGNITE-16699.
> The plan for this ticket is to wait for IGNITE-16699 to be fixed and check
> whether this issue is still reproducible.
> https://ci.ignite.apache.org/buildConfiguration/ignite3_Test_IntegrationTests_ModuleTable/6466138
> UPD: Ticket IGNITE-16699 has been fixed, but the current ticket is still
> reproducible, so the problem is not related to IGNITE-16699.
> In the logs, we can see a suspicious message; need to investigate whether it
> is related to the problem. Actual run
> https://ci.ignite.apache.org/buildConfiguration/ignite3_Test_RunAllTests/6470268,
> actual logs are attached
> {noformat}
> 2022-03-18 10:29:33:399 +0300
> [INFO][%ItTxDistributedTestSingleNode_null_2%JRaft-FSMCaller-Disruptor-_stripe_35-0][ActionRequestProcessor]
> Error occurred on a user's state machine
> class org.apache.ignite.tx.TransactionException: Failed to enlist a key into
> a transaction, state=ABORTED
> at org.apache.ignite.internal.table.distributed.raft.PartitionListener.tryEnlistIntoTransaction(PartitionListener.java:196)
> at org.apache.ignite.internal.table.distributed.raft.PartitionListener.lambda$onWrite$1(PartitionListener.java:134)
> at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
> at org.apache.ignite.internal.table.distributed.raft.PartitionListener.onWrite(PartitionListener.java:131)
> at org.apache.ignite.internal.raft.server.impl.JraftServerImpl$DelegatingStateMachine.onApply(JraftServerImpl.java:415)
> at org.apache.ignite.raft.jraft.core.FSMCallerImpl.doApplyTasks(FSMCallerImpl.java:539)
> at org.apache.ignite.raft.jraft.core.FSMCallerImpl.doCommitted(FSMCallerImpl.java:507)
> at org.apache.ignite.raft.jraft.core.FSMCallerImpl.runApplyTask(FSMCallerImpl.java:437)
> at org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:134)
> at org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:128)
> at org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:215)
> at org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:179)
> at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:137)
> at java.base/java.lang.Thread.run(Thread.java:834)
> {noformat}
-- This message was sent by Atlassian Jira (v8.20.10#820010)
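The contention pattern described in the comment (thread count close to account count, so nearly every key is hot) can be modeled with a small stand-alone sketch; this is an illustrative plain-Java model, not the actual Ignite test:
{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLongArray;

/** Stand-alone model of the test's load pattern: N threads, ~N accounts, random transfers. */
public class BalanceLoadModel {
    public static void main(String[] args) throws InterruptedException {
        int threads = 2 * Runtime.getRuntime().availableProcessors();
        AtomicLongArray accounts = new AtomicLongArray(threads); // few accounts => high contention
        ExecutorService pool = Executors.newFixedThreadPool(threads);

        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                ThreadLocalRandom rnd = ThreadLocalRandom.current();

                for (int i = 0; i < 10_000; i++) {
                    int from = rnd.nextInt(accounts.length());
                    int to = rnd.nextInt(accounts.length());

                    // In the real test each transfer is a distributed transaction;
                    // here two atomic updates stand in for it.
                    accounts.addAndGet(from, -1);
                    accounts.addAndGet(to, 1);
                }
            });
        }

        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
{code}
Increasing the number of accounts relative to the thread count, as the comment suggests, directly lowers the probability that two threads collide on the same key.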
[jira] [Commented] (IGNITE-16700) ItTxDistributedTestThreeNodesThreeReplicas#testBalance is flaky
[ https://issues.apache.org/jira/browse/IGNITE-16700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754022#comment-17754022 ] Denis Chudov commented on IGNITE-16700: --- I discovered flakiness of TxLocalTest#testBalance after unmuting it (see IGNITE-20205); that test is a mock of the transactional logic based on a local dummy table.
> ItTxDistributedTestThreeNodesThreeReplicas#testBalance is flaky
> ---
>
> Key: IGNITE-16700
> URL: https://issues.apache.org/jira/browse/IGNITE-16700
> Project: Ignite
> Issue Type: Bug
> Reporter: Mirza Aliev
> Assignee: Denis Chudov
> Priority: Major
> Labels: ignite-3
> Attachments: _Integration_Tests_Module_Table_2055.log,
> _Integration_Tests_Module_Table_2098.log
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20199) Do not return updating rebalance assignments futures in DistributionZoneRebalanceEngine#onUpdateReplicas
[ https://issues.apache.org/jira/browse/IGNITE-20199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mirza Aliev updated IGNITE-20199: - Epic Link: IGNITE-20166
> Do not return updating rebalance assignments futures in
> DistributionZoneRebalanceEngine#onUpdateReplicas
> ---
>
> Key: IGNITE-20199
> URL: https://issues.apache.org/jira/browse/IGNITE-20199
> Project: Ignite
> Issue Type: Bug
> Reporter: Mirza Aliev
> Priority: Major
> Labels: ignite-3
>
> Seems that the current logic in
> {{DistributionZoneRebalanceEngine#onUpdateReplicas}} is not correct in terms
> of futures chaining: currently we block the configuration notification thread
> until all partitions have updated their rebalance assignments keys in the
> metastorage.
> {code:java}
> private CompletableFuture onUpdateReplicas(ConfigurationNotificationEvent replicasCtx) {
>     ...
>     return distributionZoneManager.dataNodes(replicasCtx.storageRevision(), zoneCfg.zoneId())
>         .thenCompose(dataNodes -> {
>             ...
>             for (TableView tableCfg : tableViews) {
>                 ...
>                 CompletableFuture[] partitionFutures =
>                     RebalanceUtil.triggerAllTablePartitionsRebalance(...);
>                 tableFutures.add(allOf(partitionFutures));
>             }
>             return allOf(tableFutures.toArray(CompletableFuture[]::new));
>         });
>     ...
> } {code}
> As a solution, we could simply return a completed future from
> {{DistributionZoneRebalanceEngine#onUpdateReplicas}} once the asynchronous
> logic of updating rebalance assignments has been started. -- This message was sent by Atlassian Jira (v8.20.10#820010)
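In other words, the fix is to detach the long-running assignment updates from the notification thread. A minimal sketch of the proposed shape, with a hypothetical helper standing in for the real work; this is not the actual DistributionZoneRebalanceEngine code:
{code:java}
import java.util.concurrent.CompletableFuture;

/** Sketch only: helper names are hypothetical, not the actual engine API. */
final class OnUpdateReplicasSketch {
    CompletableFuture<?> onUpdateReplicas() {
        // Kick off the rebalance assignments update asynchronously...
        triggerAllRebalanceAssignmentsUpdates()
            .whenComplete((res, err) -> {
                if (err != null) {
                    // ...handling failures out of band instead of propagating
                    // them to the configuration notification thread.
                    err.printStackTrace(); // stand-in for proper logging
                }
            });

        // ...and return immediately so the notification thread is not blocked.
        return CompletableFuture.completedFuture(null);
    }

    private CompletableFuture<Void> triggerAllRebalanceAssignmentsUpdates() {
        return CompletableFuture.completedFuture(null); // placeholder for the real work
    }
}
{code}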
[jira] [Commented] (IGNITE-16700) ItTxDistributedTestThreeNodesThreeReplicas#testBalance is flaky
[ https://issues.apache.org/jira/browse/IGNITE-16700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754024#comment-17754024 ] Denis Chudov commented on IGNITE-16700: --- I made 31 builds of the Table module; they seem to be okay: https://ci.ignite.apache.org/viewType.html?buildTypeId=ApacheIgnite3xGradle_Test_IntegrationTests_ModuleTable&branch_ApacheIgnite3xGradle_Test_IntegrationTests=pull%2F2439&tab=buildTypeHistoryList
> ItTxDistributedTestThreeNodesThreeReplicas#testBalance is flaky
> ---
>
> Key: IGNITE-16700
> URL: https://issues.apache.org/jira/browse/IGNITE-16700
> Project: Ignite
> Issue Type: Bug
> Reporter: Mirza Aliev
> Assignee: Denis Chudov
> Priority: Major
> Labels: ignite-3
> Attachments: _Integration_Tests_Module_Table_2055.log,
> _Integration_Tests_Module_Table_2098.log
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IGNITE-19211) ODBC 3.0: Align metainfo provided by driver with SQL engine in 3.0
[ https://issues.apache.org/jira/browse/IGNITE-19211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Sapego reassigned IGNITE-19211: Assignee: Igor Sapego > ODBC 3.0: Align metainfo provided by driver with SQL engine in 3.0 > -- > > Key: IGNITE-19211 > URL: https://issues.apache.org/jira/browse/IGNITE-19211 > Project: Ignite > Issue Type: Improvement > Components: odbc >Reporter: Igor Sapego >Assignee: Igor Sapego >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > > Scope: > - Make sure we return proper metainformation on SQL types. Check > ignite/odbc/meta, ignite/odbc/type_traits.h, etc; > - Port tests that are applicable; > - Add new tests where needed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IGNITE-19983) C++: Support BOOLEAN datatype
[ https://issues.apache.org/jira/browse/IGNITE-19983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Sapego reassigned IGNITE-19983: Assignee: Igor Sapego > C++: Support BOOLEAN datatype > - > > Key: IGNITE-19983 > URL: https://issues.apache.org/jira/browse/IGNITE-19983 > Project: Ignite > Issue Type: Improvement > Components: platforms, thin client >Reporter: Igor Sapego >Assignee: Igor Sapego >Priority: Major > Labels: ignite-3 > > IGNITE-17298 added support for the BOOLEAN type on the server side, so we need > to add it to the C++ client as well. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20187) Catch-up rebalance on node restart: assignments keys
[ https://issues.apache.org/jira/browse/IGNITE-20187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Lapin updated IGNITE-20187: --- Description: h3. Motivation Prior to the implementation of the meta storage compaction and the related node restart updates, the node restored its volatile state in terms of assignments through ms.watches starting from APPLIED_REVISION + 1, meaning that after the restart the node was notified about missing state through *the events*. However, this is no longer true: the new logic assumes that the node will register an ms.watch starting from APPLIED_REVISION + X + 1 and will manually read the local meta storage state for APPLIED_REVISION + X, along with the related processing. Implementing the above process is the essence of this ticket. h3. Definition of Done Within the node restart process, TableManager or similar should manually read the local assignments pending keys (reading assignments stable will be covered in a separate ticket) and schedule the corresponding rebalance. h3. Implementation Notes It is possible that the assignments.pending keys will be stale at the moment of processing, so to overcome this, the following steps, common to the current rebalance flow, are proposed (see the sketch after this message):
# Start all new needed nodes from {{partition.assignments.pending / partition.assignments.stable}}.
# After successful starts, check whether the current node is the leader of the raft group (the leader response must be updated by the current term).
# If it is, read the distributed {{partition.assignments.pending}} and, if the retrieved revision is less than or equal to the one retrieved within the initial local read, run {{RaftGroupService#changePeersAsync(leaderTerm, peers)}}. {{RaftGroupService#changePeersAsync}} calls from old terms must be skipped.
Seems that https://github.com/apache/ignite-3/blob/main/modules/table/tech-notes/rebalance.md should also be updated a bit.
> Catch-up rebalance on node restart: assignments keys
> ---
>
> Key: IGNITE-20187
> URL: https://issues.apache.org/jira/browse/IGNITE-20187
> Project: Ignite
> Issue Type: Improvement
> Reporter: Alexander Lapin
> Priority: Major
> Labels: ignite-3
>
> h3. Motivation
> Prior to the implementation of the meta storage compaction and the related
> node restart updates, the node restored its volatile state in terms of
> assignments through ms.watches starting from APPLIED_REVISION + 1, meaning
> that after the restart the node was notified about missing state through
> *the events*. However, this is no longer true: the new logic assumes that the
> node will register an ms.watch starting from APPLIED_REVISION + X + 1 and
> will manually read the local meta storage state for APPLIED_REVISION + X,
> along with the related processing. Implementing the above process is the
> essence of this ticket.
> h3. Definition of Done
> Within the node restart process, TableManager or similar should manually read
> the local assignments pending keys (reading assignments stable will be covered
> in a separate ticket) and schedule the corresponding rebalance.
-- This message was sent by Atlassian Jira (v8.20.10#820010)
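The revision check in step 3 above can be sketched as follows; changePeersAsync follows the ticket's wording, while the surrounding bookkeeping is an assumption made for illustration:
{code:java}
import java.util.List;

/** Sketch of the stale-pending-key guard; everything except changePeersAsync is hypothetical. */
final class CatchUpRebalanceSketch {
    interface RaftGroupService {
        void changePeersAsync(long leaderTerm, List<String> peers);
    }

    static void maybeChangePeers(RaftGroupService raftGroup, long leaderTerm, List<String> peers,
        long localReadRevision, long distributedPendingRevision) {
        // Apply the pending assignments only if they are not newer than what the
        // initial local read observed; calls from old terms are skipped downstream.
        if (distributedPendingRevision <= localReadRevision)
            raftGroup.changePeersAsync(leaderTerm, peers);
    }
}
{code}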
[jira] [Created] (IGNITE-20209) Catch-up rebalance triggers on node restart
Alexander Lapin created IGNITE-20209: Summary: Catch-up rebalance triggers on node restart Key: IGNITE-20209 URL: https://issues.apache.org/jira/browse/IGNITE-20209 Project: Ignite Issue Type: Improvement Reporter: Alexander Lapin -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (IGNITE-20015) Sql. Introduce new distribution function
[ https://issues.apache.org/jira/browse/IGNITE-20015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Mashenkov reassigned IGNITE-20015: - Assignee: (was: Andrey Mashenkov) > Sql. Introduce new distribution function > > > Key: IGNITE-20015 > URL: https://issues.apache.org/jira/browse/IGNITE-20015 > Project: Ignite > Issue Type: Improvement > Components: sql >Reporter: Konstantin Orlov >Priority: Major > Labels: ignite-3 > > To realize the full potential of the SQL engine in queries over node-specific > views, we need to support a new type of distribution function > ({{org.apache.ignite.internal.sql.engine.trait.DistributionFunction}}). The > semantics of this new function should be pretty straightforward: the column > this function refers to is actually the identity of the node containing the > data. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20209) Catch-up rebalance triggers on node restart
[ https://issues.apache.org/jira/browse/IGNITE-20209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Lapin updated IGNITE-20209: - Description:
h3. Motivation Please check https://issues.apache.org/jira/browse/IGNITE-20187 for more context: that ticket is about catching up assignments.pending meta storage keys, whereas this one is about catching up its triggers:
* Replica factor updates.
* Partitions count updates. Immutable for now.
* Data nodes updates.
* Replica storage addition/removal. (By the way, is it possible to remove replica storage at all?)
For all aforementioned cases, it's required to update the distributed assignments pending (planned) keys if it's not yet done. And the only difficulty here is precisely in understanding whether this was done or not.
h3. Definition of Done Distributed assignments pending (planned) keys are updated if necessary according to the current triggers state.
> Catch-up rebalance triggers on node restart > --- > > Key: IGNITE-20209 > URL: https://issues.apache.org/jira/browse/IGNITE-20209 > Project: Ignite > Issue Type: Improvement >Reporter: Alexander Lapin >Priority: Major > Labels: ignite-3 >
> h3. Motivation > Please check https://issues.apache.org/jira/browse/IGNITE-20187 for more > context: that ticket is about catching up assignments.pending meta storage keys, > whereas this one is about catching up its triggers: > * Replica factor updates. > * Partitions count updates. Immutable for now. > * Data nodes updates. > * Replica storage addition/removal. (By the way, is it possible to remove > replica storage at all?) > For all aforementioned cases, it's required to update the distributed assignments > pending (planned) keys if it's not yet done. And the only difficulty here is > precisely in understanding whether this was done or not. > h3. Definition of Done > Distributed assignments pending (planned) keys are updated if necessary according > to the current triggers state. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-20210) Start partitions on corresponding assignments.stable, calculate if missing, cleanup obsolete resources
Alexander Lapin created IGNITE-20210: Summary: Start partitions on corresponding assignments.stable, calculate if missing, cleanup obsolete resources Key: IGNITE-20210 URL: https://issues.apache.org/jira/browse/IGNITE-20210 Project: Ignite Issue Type: Improvement Reporter: Alexander Lapin -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IGNITE-20211) Grid*Tx*future's (scope#3) code deduplication
Anton Vinogradov created IGNITE-20211: - Summary: Grid*Tx*future's (scope#3) code deduplication Key: IGNITE-20211 URL: https://issues.apache.org/jira/browse/IGNITE-20211 Project: Ignite Issue Type: Sub-task Reporter: Anton Vinogradov Assignee: Anton Vinogradov -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20210) Start partitions on corresponding assignments.stable, calculate if missing, cleanup obsolete resources
[ https://issues.apache.org/jira/browse/IGNITE-20210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Lapin updated IGNITE-20210: - Description:
h3. Motivation Please check https://issues.apache.org/jira/browse/IGNITE-20187 and https://issues.apache.org/jira/browse/IGNITE-20209 for more details. This ticket is about the assignments stable catch-up. Obviously, there are the following possibilities:
# Assignments.stable are present - just start the table locally. Basically it is IGNITE-20187, but for assignments stable rather than pending.
# Assignments.stable are missing. Well, it's the same as IGNITE-20209, but for table creation triggers and not rebalance ones.
Besides that, it's necessary to clean up obsolete resources, e.g. raft and partition storages.
Currently, all that stuff is implemented incorrectly through:
{code:java}
if (partitionAssignments(vaultManager, tableId, 0) != null) {
    assignmentsFuture = completedFuture(tableAssignments(vaultManager, tableId, zoneDescriptor.partitions()));
} else {
    assignmentsFuture = distributionZoneManager.dataNodes(ctx.storageRevision(), tableDescriptor.zoneId())
            .thenApply(dataNodes -> AffinityUtils.calculateAssignments(
                    dataNodes,
                    zoneDescriptor.partitions(),
                    zoneDescriptor.replicas()
            ));
} {code}
h3. Definition of Done
* Assignments.stable update is properly caught up on top of the corresponding table creation triggers.
* Partition startup is implemented through assignments.stable, instead of the table cfg triggers plus assignments recalculation used now.
* Obsolete partition storages are removed on node restart.
> Start partitions on corresponding assignments.stable, calculate if missing, > cleanup obsolete resources > -- > > Key: IGNITE-20210 > URL: https://issues.apache.org/jira/browse/IGNITE-20210 > Project: Ignite > Issue Type: Improvement >Reporter: Alexander Lapin >Priority: Major > Labels: ignite-3 >
> h3. Motivation > Please check https://issues.apache.org/jira/browse/IGNITE-20187 and > https://issues.apache.org/jira/browse/IGNITE-20209 for more details. This > ticket is about the assignments stable catch-up. Obviously, there are the > following possibilities: > # Assignments.stable are present - just start the table locally. Basically it is > IGNITE-20187, but for assignments stable rather than pending. > # Assignments.stable are missing. Well, it's the same as IGNITE-20209, but for > table creation triggers and not rebalance ones. > Besides that, it's necessary to clean up obsolete resources, e.g. raft and > partition storages. > Currently, all that stuff is implemented incorrectly through: > {code:java} > if (partitionAssignments(vaultManager, tableId, 0) != null) { > assignmentsFuture = completedFuture(tableAssignments(vaultManager, > tableId, zoneDescriptor.partitions())); > } else { > assignmentsFuture = > distributionZoneManager.dataNodes(ctx.storageRevision(), > tableDescriptor.zoneId()) > .thenApply(dataNodes -> AffinityUtils.calculateAssignments( > dataNodes, > zoneDescriptor.partitions(), > zoneDescriptor.replicas() > )); > } {code} > h3. Definition of Done > * Assignments.stable update is properly caught up on top of the corresponding > table creation triggers. > * Partition startup is implemented through assignments.stable, instead of the > table cfg triggers plus assignments recalculation used now. > * Obsolete partition storages are removed on node restart. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
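A sketch of the corrected flow the Definition of Done implies, reusing the shape of the snippet quoted above: read assignments.stable first and fall back to recalculation only when the key is missing. Only {{AffinityUtils.calculateAssignments}} and {{distributionZoneManager.dataNodes}} come from the quoted code; the meta storage accessor, key builder, and deserialization helper are assumptions, not the real API.
{code:java}
// Hypothetical recovery-time flow: prefer assignments.stable over recalculation.
CompletableFuture<List<Set<Assignment>>> assignmentsOnRecovery(long recoveryRevision) {
    Entry stable = metaStorage.getLocally(stableAssignmentsKey(tableId), recoveryRevision); // assumed accessor

    if (stable != null && !stable.empty()) {
        // Case 1: assignments.stable present - just start partitions from it.
        return completedFuture(deserializeAssignments(stable.value()));
    }

    // Case 2: assignments.stable missing - recalculate, as on table creation.
    return distributionZoneManager.dataNodes(recoveryRevision, zoneDescriptor.id())
            .thenApply(dataNodes -> AffinityUtils.calculateAssignments(
                    dataNodes, zoneDescriptor.partitions(), zoneDescriptor.replicas()));
}
{code}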
[jira] [Updated] (IGNITE-20124) Prevent double storage updates within primary
[ https://issues.apache.org/jira/browse/IGNITE-20124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denis Chudov updated IGNITE-20124: -- Description:
h3. Motivation In order to preserve the guarantee that the primary replica is always up-to-date, it's required to:
* In case of a common RW transaction - insert the writeIntent into the storage within the primary before replication.
* In case of one-phase commit - insert the committed write after the replication.
Both have already been done. However, that means that if the primary is part of the replication group, and it's true in almost all cases, we will double the update:
* In case of a common RW transaction - through the replication.
* In case of one-phase commit - either through the replication, or through the post-update, if the replication was fast enough.
h3. Definition of Done
* Prevent double storage updates within the primary.
h3. Implementation Notes The easiest way to prevent a double insert is to skip one if the local safe time is greater than or equal to the candidate's. There are 3 places where we update the partition storage:
# Primary pre-replication update. In that case, the second update on replication should be excluded.
# Primary post-replication update in case of 1PC. It's possible to see already updated data if the replication was already processed locally. It is expected to be already covered in https://issues.apache.org/jira/browse/IGNITE-15927 . We should check the primary safe time on the post-replication update and skip the update if the safe time is already adjusted.
# Insert through replication. In case of !1PC, there will be a double insert on every primary (see 1). In case of 1PC it depends, so we should check the safe time on the primary to know whether the update should be done (see 2).
In every case, the storage indexes still should be adjusted on replication, as it is done now, because the progress of indexes on FSM write operations should not be violated - otherwise, a Raft snapshot-based rebalance would be broken. We may have two non-consistent storage updates on the primary which may affect different fsyncs, so maybe we should benchmark this optimization to find out how useful it is. The transactional correctness isn't violated by these non-consistent storage updates, because there is only a possibility that some writes or write intents will go ahead of indexes and therefore will be included into snapshots - however, we still can process such writes and resolve write intents. Also, the safe time needs to be updated on the primary replica now.
was:
h3. Motivation In order to preserve the guarantee that the primary replica is always up-to-date, it's required to:
* In case of a common RW transaction - insert the writeIntent into the storage within the primary before replication.
* In case of one-phase commit - insert the committed write after the replication.
Both have already been done. However, that means that if the primary is part of the replication group, and it's true in almost all cases, we will double the update:
* In case of a common RW transaction - through the replication.
* In case of one-phase commit - either through the replication, or through the post-update, if the replication was fast enough.
h3. Definition of Done
* Prevent double storage updates within the primary.
h3. Implementation Notes The easiest way to prevent a double insert is to skip one if the local safe time is greater than or equal to the candidate's. There are 3 places where we update the partition storage:
# Primary pre-replication update. In that case, the second update on replication should be excluded.
# Primary post-replication update in case of 1PC. It's possible to see already updated data if the replication was already processed locally. It is expected to be already covered in https://issues.apache.org/jira/browse/IGNITE-15927 . We should check the primary safe time on the post-replication update and skip the update if the safe time is already adjusted.
# Insert through replication. In case of !1PC, there will be a double insert on every primary (see 1). In case of 1PC it depends, so we should check the safe time on the primary to know whether the update should be done (see 2).
In every case, the storage indexes still should be adjusted on replication, as it is done now, because the progress of indexes on FSM write operations should not be violated - otherwise, a Raft snapshot-based rebalance would be broken. We may have two non-consistent storage updates on the primary which may affect different fsyncs, so maybe we should benchmark this optimization to find out how useful it is. The transactional correctness isn't violated by these
> Prevent double storage updates within primary > - > > Key: IGNITE-20124 > URL: https://issues.apache.org/jira/browse/IGNITE-20124 > Project: Ignite > Issue Type: Improvement >Reporter: Alexander Lapin >
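A minimal illustration of the safe-time guard proposed in the Implementation Notes above. All names are assumptions; the point is only the comparison, and the fact that indexes still advance even when the write itself is skipped.
{code:java}
// Hypothetical guard against double updates on the primary.
void applyReplicatedUpdate(UpdateCommand cmd) {
    // Skip the write if local state already reflects this safe time
    // (e.g. the primary pre-applied the write intent before replication).
    boolean alreadyApplied = storage.lastAppliedSafeTime().compareTo(cmd.safeTime()) >= 0;

    if (!alreadyApplied) {
        storage.addWriteIntent(cmd.rowId(), cmd.row(), cmd.txId());
    }

    // Indexes must advance either way, or raft snapshot-based rebalance breaks.
    storage.lastApplied(cmd.index(), cmd.term());
}
{code}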
[jira] [Created] (IGNITE-20212) Investigate maximum throughput of inserts via DataStreamer
Alexey Scherbakov created IGNITE-20212: -- Summary: Investigate maximum throughput of inserts via DataStreamer Key: IGNITE-20212 URL: https://issues.apache.org/jira/browse/IGNITE-20212 Project: Ignite Issue Type: Task Reporter: Alexey Scherbakov Fix For: 3.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20212) Investigate maximum throughput of inserts via DataStreamer
[ https://issues.apache.org/jira/browse/IGNITE-20212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Scherbakov updated IGNITE-20212: --- Epic Link: IGNITE-19479 > Investigate maximum throughput of inserts via DataStreamer > -- > > Key: IGNITE-20212 > URL: https://issues.apache.org/jira/browse/IGNITE-20212 > Project: Ignite > Issue Type: Task >Reporter: Alexey Scherbakov >Priority: Major > Fix For: 3.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20207) Improve the error handling
[ https://issues.apache.org/jira/browse/IGNITE-20207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Gagarkin updated IGNITE-20207: --- Summary: Improve the error handling (was: Improve the writing of files in FileTransferService) > Improve the error handling > -- > > Key: IGNITE-20207 > URL: https://issues.apache.org/jira/browse/IGNITE-20207 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Gagarkin >Priority: Major > Labels: ignite-3 > > The current implementation of > {{org.apache.ignite.internal.network.file.ChunkedFileWriter}} compares the > file pointer with the offset of the received file chunk. If they are equal, > the chunk is written to the disk; if not, the chunk is placed in the queue, > and it will be written when all previous chunks have been written. > It might be more efficient to write chunks instantly. > We should investigate this approach and improve the implementation. -- This message was sent by Atlassian Jira (v8.20.10#820010)
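The alternative the ticket above suggests, writing chunks instantly, maps naturally onto positional writes: each chunk carries its own offset, so out-of-order arrival needs no queue. A self-contained sketch with plain JDK I/O, not the actual ChunkedFileWriter code:
{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Writes each chunk at its own offset, so out-of-order arrival needs no queueing.
class PositionalChunkWriter implements AutoCloseable {
    private final FileChannel channel;

    PositionalChunkWriter(Path file) throws IOException {
        this.channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
    }

    void write(long offset, byte[] chunk) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(chunk);
        while (buf.hasRemaining()) {
            offset += channel.write(buf, offset); // positional write, no shared file pointer
        }
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
{code}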
[jira] [Updated] (IGNITE-15927) Implement one phase commit
[ https://issues.apache.org/jira/browse/IGNITE-15927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Scherbakov updated IGNITE-15927: --- Labels: ignite-3 ignite3_performance (was: ignite-3) > Implement one phase commit > -- > > Key: IGNITE-15927 > URL: https://issues.apache.org/jira/browse/IGNITE-15927 > Project: Ignite > Issue Type: Improvement >Reporter: Alexey Scherbakov >Assignee: Alexey Scherbakov >Priority: Major > Labels: ignite-3, ignite3_performance > Time Spent: 3h 10m > Remaining Estimate: 0h > > If all keys in the implicit transaction belong to the same partition, it can be > committed in one round-trip. -- This message was sent by Atlassian Jira (v8.20.10#820010)
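The 1PC precondition above reduces to a single-partition check over the transaction's keys. A toy version, with a stand-in hash-based affinity function in place of Ignite's real partition mapping:
{code:java}
import java.util.Arrays;
import java.util.Collection;

final class OnePhaseCommit {
    // Stand-in affinity: Ignite's real partition mapping differs.
    static int partitionOf(byte[] key, int partitions) {
        return Math.floorMod(Arrays.hashCode(key), partitions);
    }

    // True if every key of the implicit transaction lands in one partition,
    // i.e. the commit can be folded into a single replication round-trip.
    static boolean singlePartition(Collection<byte[]> keys, int partitions) {
        Integer part = null;
        for (byte[] key : keys) {
            int p = partitionOf(key, partitions);
            if (part == null) {
                part = p;
            } else if (part != p) {
                return false;
            }
        }
        return part != null;
    }
}
{code}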
[jira] [Updated] (IGNITE-20207) Improve the error handling
[ https://issues.apache.org/jira/browse/IGNITE-20207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Gagarkin updated IGNITE-20207: --- Description: The current implementation of org.apache.ignite.internal.network.file.FileTransferService doesn't provide recovery functionality. Any error during file transfer leads to repeating the transfer from scratch. (was: The current implementation of org.apache.ignite.internal.network.file.FileTransferService doesn't provide recovery functionality. Any error during file transfer leads to repeat the transfer from scratch. ) > Improve the error handling > -- > > Key: IGNITE-20207 > URL: https://issues.apache.org/jira/browse/IGNITE-20207 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Gagarkin >Priority: Major > Labels: ignite-3 > > The current implementation of > org.apache.ignite.internal.network.file.FileTransferService doesn't provide > recovery functionality. Any error during file transfer leads to repeating the > transfer from scratch. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20207) Improve the error handling
[ https://issues.apache.org/jira/browse/IGNITE-20207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Gagarkin updated IGNITE-20207: --- Description: The current implementation of org.apache.ignite.internal.network.file.FileTransferService doesn't provide recovery functionality. Any error during file transfer leads to repeat the transfer from scratch. (was: The current implementation of {{org.apache.ignite.internal.network.file.ChunkedFileWriter}} compares the file pointer with the offset of the received file chunk. If they are equal, the chunk is written to the disk; if not, the chunk is placed in the queue, and it will be written when all previous chunks have been written. It might be more efficient to write chunks instantly. We should investigate this approach and improve the implementation.) > Improve the error handling > -- > > Key: IGNITE-20207 > URL: https://issues.apache.org/jira/browse/IGNITE-20207 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Gagarkin >Priority: Major > Labels: ignite-3 > > The current implementation of > org.apache.ignite.internal.network.file.FileTransferService doesn't provide > recovery functionality. Any error during file transfer leads to repeat the > transfer from scratch. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20207) Improve the error handling
[ https://issues.apache.org/jira/browse/IGNITE-20207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Gagarkin updated IGNITE-20207: --- Description: The current implementation of org.apache.ignite.internal.network.file.FileTransferService doesn't provide recovery functionality. Any error during file transfer leads to repeating the transfer from scratch. We need to define cases when Ignite can provide recovery and implement this functionality. (was: The current implementation of org.apache.ignite.internal.network.file.FileTransferService doesn't provide recovery functionality. Any error during file transfer leads to repeating the transfer from scratch. ) > Improve the error handling > -- > > Key: IGNITE-20207 > URL: https://issues.apache.org/jira/browse/IGNITE-20207 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Gagarkin >Priority: Major > Labels: ignite-3 > > The current implementation of > org.apache.ignite.internal.network.file.FileTransferService doesn't provide > recovery functionality. Any error during file transfer leads to repeating the > transfer from scratch. We need to define cases when Ignite can provide > recovery and implement this functionality. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20203) File transfer for Ignite 3
[ https://issues.apache.org/jira/browse/IGNITE-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Gagarkin updated IGNITE-20203: --- Description: As outcome of > File transfer for Ignite 3 > -- > > Key: IGNITE-20203 > URL: https://issues.apache.org/jira/browse/IGNITE-20203 > Project: Ignite > Issue Type: Epic >Reporter: Ivan Gagarkin >Priority: Major > Labels: ignite-3 > > As outcome of -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20203) File transfer for Ignite 3
[ https://issues.apache.org/jira/browse/IGNITE-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Gagarkin updated IGNITE-20203: --- Description: As outcome of IGNITE-19009, we (was: As outcome of ) > File transfer for Ignite 3 > -- > > Key: IGNITE-20203 > URL: https://issues.apache.org/jira/browse/IGNITE-20203 > Project: Ignite > Issue Type: Epic >Reporter: Ivan Gagarkin >Priority: Major > Labels: ignite-3 > > As outcome of IGNITE-19009, we -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20203) File transfer for Ignite 3
[ https://issues.apache.org/jira/browse/IGNITE-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Gagarkin updated IGNITE-20203: --- Description: As an outcome of IGNITE-19009, we obtained the new module {{ignite-file-transfer}}. All file transfers in Ignite 3 should utilize the new service, FileTransferService. Additionally, there are some aspects of the service that need improvement. was:As outcome of IGNITE-19009, we > File transfer for Ignite 3 > -- > > Key: IGNITE-20203 > URL: https://issues.apache.org/jira/browse/IGNITE-20203 > Project: Ignite > Issue Type: Epic >Reporter: Ivan Gagarkin >Priority: Major > Labels: ignite-3 > > As an outcome of IGNITE-19009, we obtained the new module > {{ignite-file-transfer}}. > All file transfers in Ignite 3 should utilize the new service, > FileTransferService. Additionally, there are some aspects of the service that > need improvement. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20208) Reduce the size of
[ https://issues.apache.org/jira/browse/IGNITE-20208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Gagarkin updated IGNITE-20208: --- Summary: Reduce the size of (was: Use file ids instead of file names when transferring file chunks) > Reduce the size of > --- > > Key: IGNITE-20208 > URL: https://issues.apache.org/jira/browse/IGNITE-20208 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Gagarkin >Priority: Major > Labels: ignite-3 > > We can decrease the size of > org.apache.ignite.internal.network.file.messages.FileChunkMessage by > replacing file names with file ids. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20208) Reduce the size of FileChunkMessage
[ https://issues.apache.org/jira/browse/IGNITE-20208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Gagarkin updated IGNITE-20208: --- Summary: Reduce the size of FileChunkMessage (was: Reduce the size of ) > Reduce the size of FileChunkMessage > --- > > Key: IGNITE-20208 > URL: https://issues.apache.org/jira/browse/IGNITE-20208 > Project: Ignite > Issue Type: Improvement >Reporter: Ivan Gagarkin >Priority: Major > Labels: ignite-3 > > We can decrease the size of > org.apache.ignite.internal.network.file.messages.FileChunkMessage by > replacing file names with file ids. -- This message was sent by Atlassian Jira (v8.20.10#820010)
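The shape of the change proposed above, sketched with hypothetical records (the real FileChunkMessage is a generated network message, not a record): a fixed-width transfer id replaces the file-name string repeated in every chunk.
{code:java}
// Current shape: every chunk repeats the (potentially long) file name.
record NamedFileChunk(String fileName, long offset, byte[] data) { }

// Proposed shape: the name travels once in a transfer-start message keyed by id;
// each subsequent chunk carries only the fixed-width id.
record FileChunk(int fileId, long offset, byte[] data) { }
{code}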
[jira] [Created] (IGNITE-20213) RO transactions should not block LWM from rising
Aleksandr Polovtcev created IGNITE-20213: Summary: RO transactions should not block LWM from rising Key: IGNITE-20213 URL: https://issues.apache.org/jira/browse/IGNITE-20213 Project: Ignite Issue Type: Task Reporter: Aleksandr Polovtcev Assignee: Alexander Lapin -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20213) RO transactions should not block LWM from rising
[ https://issues.apache.org/jira/browse/IGNITE-20213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Polovtcev updated IGNITE-20213: - Description: {{org.apache.ignite.internal.tx.TxManager#updateLowWatermark}} returns a {{CompletableFuture}} that is completed when all currently running RO transactions complete. Until that future is complete, local Low Watermark does not get updated. It is proposed to change this behavior: instead of blocking the Low Watermark update, all unfinished RO transactions (at the time of the LWM update) must fail with an appropriate error. was: {{org.apache.ignite.internal.tx.TxManager#updateLowWatermark}} returns a {{CompletableFuture}} that is completed when all currently running RO transactions complete. Until that future is complete, local Low Watermark does not get updated. It is proposed to change this behavior: instead of blocking the Low Watermark update, all unfinished RO transactions must fail with an appropriate error. > RO transactions should not block LWM from rising > > > Key: IGNITE-20213 > URL: https://issues.apache.org/jira/browse/IGNITE-20213 > Project: Ignite > Issue Type: Task >Reporter: Aleksandr Polovtcev >Assignee: Alexander Lapin >Priority: Major > Labels: ignite-3 > > {{org.apache.ignite.internal.tx.TxManager#updateLowWatermark}} returns a > {{CompletableFuture}} that is completed when all currently running RO > transactions complete. Until that future is complete, local Low Watermark > does not get updated. > It is proposed to change this behavior: instead of blocking the Low Watermark > update, all unfinished RO transactions (at the time of the LWM update) must > fail with an appropriate error. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20213) RO transactions should not block LWM from rising
[ https://issues.apache.org/jira/browse/IGNITE-20213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Polovtcev updated IGNITE-20213: - Description: {{org.apache.ignite.internal.tx.TxManager#updateLowWatermark}} returns a {{CompletableFuture}} that is completed when all currently running RO transactions complete. Until that future is complete, local Low Watermark does not get updated. It is proposed to change this behavior: instead of blocking the Low Watermark update, all unfinished RO transactions must fail with an appropriate error. > RO transactions should not block LWM from rising > > > Key: IGNITE-20213 > URL: https://issues.apache.org/jira/browse/IGNITE-20213 > Project: Ignite > Issue Type: Task >Reporter: Aleksandr Polovtcev >Assignee: Alexander Lapin >Priority: Major > Labels: ignite-3 > > {{org.apache.ignite.internal.tx.TxManager#updateLowWatermark}} returns a > {{CompletableFuture}} that is completed when all currently running RO > transactions complete. Until that future is complete, local Low Watermark > does not get updated. > It is proposed to change this behavior: instead of blocking the Low Watermark > update, all unfinished RO transactions must fail with an appropriate error. -- This message was sent by Atlassian Jira (v8.20.10#820010)
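A sketch of the behavior proposed above, under stated assumptions (every name here is illustrative): rather than handing back a future that waits for running RO transactions, the manager fails them and raises the watermark at once.
{code:java}
// Hypothetical: fail unfinished RO transactions instead of blocking the LWM.
void updateLowWatermark(HybridTimestamp newLowWatermark) {
    for (ReadOnlyTransaction tx : roTransactionsReadingBelow(newLowWatermark)) {
        tx.fail(new TransactionException(
                "Read timestamp is below the updated low watermark: " + newLowWatermark));
    }

    lowWatermark = newLowWatermark; // no CompletableFuture to wait on anymore
}
{code}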
[jira] [Assigned] (IGNITE-16088) Reuse Marshaller code in marshaller-common module
[ https://issues.apache.org/jira/browse/IGNITE-16088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Polovtcev reassigned IGNITE-16088: Assignee: Aleksandr Polovtcev (was: Pavel Tupitsyn) > Reuse Marshaller code in marshaller-common module > - > > Key: IGNITE-16088 > URL: https://issues.apache.org/jira/browse/IGNITE-16088 > Project: Ignite > Issue Type: Improvement >Affects Versions: 3.0.0-alpha4 >Reporter: Pavel Tupitsyn >Assignee: Aleksandr Polovtcev >Priority: Major > Labels: ignite-3 > Time Spent: 10m > Remaining Estimate: 0h > > IGNITE-14971 added *ignite-marshaller-common* module to reuse serialization > logic between the server and client parts. > This module duplicates some logic from *ignite-schema* module. > * Remove duplicated code from *ignite-schema* and reuse the logic from common > module. > * Extract other common bits where applicable (e.g. *AsmSerializerGenerator*) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20187) Catch-up rebalance on node restart: assignments keys
[ https://issues.apache.org/jira/browse/IGNITE-20187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mirza Aliev updated IGNITE-20187: - Description:
h3. Motivation Prior to the implementation of the meta storage compaction and the related node restart updates, the node restored its volatile state in terms of assignments through ms.watches starting from APPLIED_REVISION + 1, meaning that after the restart the node was notified about missing state through {*}the events{*}. However, it's no longer true: the new logic assumes that the node will register ms.watch starting from APPLIED_REVISION + X + 1 and will manually read the local meta storage state for APPLIED_REVISION + X, along with the related processing. The implementation of the above process is the essence of this ticket.
h3. Definition of Done Within the node restart process, TableManager or similar should manually read the local assignments pending keys (reading assignments stable will be covered in a separate ticket) and schedule the corresponding rebalance.
h3. Implementation Notes It's possible that assignments.pending keys will be stale at the moment of processing, so, to overcome this issue, the following steps (common for the current rebalance flow) are proposed: # Start all new needed nodes {{partition.assignments.pending / partition.assignments.stable}} # After successful starts, check whether the current node is the leader of the raft group (the leader response must be refreshed for the current term); if it is # Read the distributed {{partition.assignments.pending}} and, if the retrieved revision is less than or equal to the one retrieved within the initial local read, run RaftGroupService#changePeersAsync(leaderTerm, peers). RaftGroupService#changePeersAsync from old terms must be skipped. Seems that [https://github.com/apache/ignite-3/blob/main/modules/table/tech-notes/rebalance.md] should also be updated a bit.
was:
h3. Motivation Prior to the implementation of the meta storage compaction and the related node restart updates, the node restored its volatile state in terms of assignments through ms.watches starting from APPLIED_REVISION + 1, meaning that after the restart the node was notified about missing state through {*}the events{*}. However, it's no longer true: the new logic assumes that the node will register ms.watch starting from APPLIED_REVISION + X + 1 and will manually read the local meta storage state for APPLIED_REVISION + X, along with the related processing. The implementation of the above process is the essence of this ticket.
h3. Definition of Done Within the node restart process, TableManager or similar should manually read the local assignments pending keys (reading assignments stable will be covered in a separate ticket) and schedule the corresponding rebalance.
h3. Implementation Notes It's possible that assignments.pending keys will be stale at the moment of processing, so, to overcome this issue, the following steps (common for the current rebalance flow) are proposed: # Start all new needed nodes {{partition.assignments.pending / partition.assignments.stable}} # After successful starts, check whether the current node is the leader of the raft group (the leader response must be refreshed for the current term); if it is # Read the distributed {{partition.assignments.pending}} and, if the retrieved revision is less than or equal to the one retrieved within the initial local read, run RaftGroupService#changePeersAsync(leaderTerm, peers). RaftGroupService#changePeersAsync from old terms must be skipped.
Seems that https://github.com/apache/ignite-3/blob/main/modules/table/tech-notes/rebalance.md should also be updated a bit.
> Catch-up rebalance on node restart: assignments keys > > Key: IGNITE-20187 > URL: https://issues.apache.org/jira/browse/IGNITE-20187 > Project: Ignite > Issue Type: Improvement >Reporter: Alexander Lapin >Priority: Major > Labels: ignite-3 >
> h3. Motivation > Prior to the implementation of the meta storage compaction and the related > node restart updates, the node restored its volatile state in terms of > assignments through ms.watches starting from APPLIED_REVISION + 1, meaning > that after the restart the node was notified about missing state through > {*}the events{*}. However, it's no longer true: the new logic assumes that the > node will register ms.watch starting from APPLIED_REVISION + X + 1 and will > manually read the local meta storage state for APPLIED_REVISION + X, along with > the related processing. The implementation of the above process is the essence of > this ticket. > h3. Definition of Done > Within the node restart process, TableManager or similar should manually read > the local assignments pending keys (reading assignments stable will be covered in > a separate ticket) and schedule the corresponding rebalance. > h3. Implementation Notes > It's possible that assignments.pending keys will be stale at the moment of processing
[jira] [Updated] (IGNITE-20209) Catch-up rebalance triggers on node restart
[ https://issues.apache.org/jira/browse/IGNITE-20209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mirza Aliev updated IGNITE-20209: - Description:
h3. Motivation Please check https://issues.apache.org/jira/browse/IGNITE-20187 for more context: that ticket is about catching up assignments.pending meta storage keys, whereas this one is about catching up its triggers:
* Replica factor updates.
* Partitions count updates. Immutable for now.
* Data nodes updates.
* Replica storage addition/removal. (By the way, is it possible to remove replica storage at all?)
For all aforementioned cases, it's required to update the distributed assignments pending (planned) keys if it's not yet done. And the only difficulty here is precisely in understanding whether this was done or not.
h3. Definition of Done Distributed assignments pending (planned) keys are updated if necessary according to the current triggers state.
Notes: 1) Add to metastorage starting revision
was:
h3. Motivation Please check https://issues.apache.org/jira/browse/IGNITE-20187 for more context: that ticket is about catching up assignments.pending meta storage keys, whereas this one is about catching up its triggers:
* Replica factor updates.
* Partitions count updates. Immutable for now.
* Data nodes updates.
* Replica storage addition/removal. (By the way, is it possible to remove replica storage at all?)
For all aforementioned cases, it's required to update the distributed assignments pending (planned) keys if it's not yet done. And the only difficulty here is precisely in understanding whether this was done or not.
h3. Definition of Done Distributed assignments pending (planned) keys are updated if necessary according to the current triggers state.
> Catch-up rebalance triggers on node restart > --- > > Key: IGNITE-20209 > URL: https://issues.apache.org/jira/browse/IGNITE-20209 > Project: Ignite > Issue Type: Improvement >Reporter: Alexander Lapin >Priority: Major > Labels: ignite-3 >
> h3. Motivation > Please check https://issues.apache.org/jira/browse/IGNITE-20187 for more > context: that ticket is about catching up assignments.pending meta storage keys, > whereas this one is about catching up its triggers: > * Replica factor updates. > * Partitions count updates. Immutable for now. > * Data nodes updates. > * Replica storage addition/removal. (By the way, is it possible to remove > replica storage at all?) > For all aforementioned cases, it's required to update the distributed assignments > pending (planned) keys if it's not yet done. And the only difficulty here is > precisely in understanding whether this was done or not. > h3. Definition of Done > Distributed assignments pending (planned) keys are updated if necessary according > to the current triggers state. > > Notes: > 1) Add to metastorage starting revision -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IGNITE-19836) .NET: Thin 3.0: Reject Tuples and POCOs with unmapped fields
[ https://issues.apache.org/jira/browse/IGNITE-19836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754134#comment-17754134 ] Igor Sapego commented on IGNITE-19836: -- Looks good to me. > .NET: Thin 3.0: Reject Tuples and POCOs with unmapped fields > > > Key: IGNITE-19836 > URL: https://issues.apache.org/jira/browse/IGNITE-19836 > Project: Ignite > Issue Type: Improvement > Components: platforms, thin client >Affects Versions: 3.0.0-beta1 >Reporter: Pavel Tupitsyn >Assignee: Pavel Tupitsyn >Priority: Major > Labels: .NET, ignite-3 > Fix For: 3.0.0-beta2 > > Time Spent: 10m > Remaining Estimate: 0h > > Tuples and POCOs with unmapped fields should not be allowed in table APIs. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20209) Catch-up rebalance triggers on node restart
[ https://issues.apache.org/jira/browse/IGNITE-20209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mirza Aliev updated IGNITE-20209: - Description:
h3. Motivation Please check https://issues.apache.org/jira/browse/IGNITE-20187 for more context: that ticket is about catching up assignments.pending meta storage keys, whereas this one is about catching up its triggers:
* Replica factor updates.
* Partitions count updates. Immutable for now.
* Data nodes updates.
* Replica storage addition/removal. (By the way, is it possible to remove replica storage at all?)
For all aforementioned cases, it's required to update the distributed assignments pending (planned) keys if it's not yet done. And the only difficulty here is precisely in understanding whether this was done or not.
h3. Definition of Done Distributed assignments pending (planned) keys are updated if necessary according to the current triggers state.
Notes: 1) Add to metastorage starting revision ({{metaStorageMgr.recoveryFinishedFuture()}} returns long with maximal recovered revision)
was:
h3. Motivation Please check https://issues.apache.org/jira/browse/IGNITE-20187 for more context: that ticket is about catching up assignments.pending meta storage keys, whereas this one is about catching up its triggers:
* Replica factor updates.
* Partitions count updates. Immutable for now.
* Data nodes updates.
* Replica storage addition/removal. (By the way, is it possible to remove replica storage at all?)
For all aforementioned cases, it's required to update the distributed assignments pending (planned) keys if it's not yet done. And the only difficulty here is precisely in understanding whether this was done or not.
h3. Definition of Done Distributed assignments pending (planned) keys are updated if necessary according to the current triggers state.
Notes: 1) Add to metastorage starting revision
> Catch-up rebalance triggers on node restart > --- > > Key: IGNITE-20209 > URL: https://issues.apache.org/jira/browse/IGNITE-20209 > Project: Ignite > Issue Type: Improvement >Reporter: Alexander Lapin >Priority: Major > Labels: ignite-3 >
> h3. Motivation > Please check https://issues.apache.org/jira/browse/IGNITE-20187 for more > context: that ticket is about catching up assignments.pending meta storage keys, > whereas this one is about catching up its triggers: > * Replica factor updates. > * Partitions count updates. Immutable for now. > * Data nodes updates. > * Replica storage addition/removal. (By the way, is it possible to remove > replica storage at all?) > For all aforementioned cases, it's required to update the distributed assignments > pending (planned) keys if it's not yet done. And the only difficulty here is > precisely in understanding whether this was done or not. > h3. Definition of Done > Distributed assignments pending (planned) keys are updated if necessary according > to the current triggers state. > > Notes: > 1) Add to metastorage starting revision > ({{metaStorageMgr.recoveryFinishedFuture()}} returns long with maximal > recovered revision) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20209) Catch-up rebalance triggers on node restart
[ https://issues.apache.org/jira/browse/IGNITE-20209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mirza Aliev updated IGNITE-20209: - Description:
h3. Motivation Please check https://issues.apache.org/jira/browse/IGNITE-20187 for more context: that ticket is about catching up assignments.pending meta storage keys, whereas this one is about catching up its triggers:
* Replica factor updates.
* Partitions count updates. Immutable for now.
* Data nodes updates.
* Replica storage addition/removal. (By the way, is it possible to remove replica storage at all?)
For all aforementioned cases, it's required to update the distributed assignments pending (planned) keys if it's not yet done. And the only difficulty here is precisely in understanding whether this was done or not.
h3. Definition of Done Distributed assignments pending (planned) keys are updated if necessary according to the current triggers state.
Notes: 1) Add to metastorage starting revision ({{metaStorageMgr.recoveryFinishedFuture()}} returns long with the maximal recovered revision)
was:
h3. Motivation Please check https://issues.apache.org/jira/browse/IGNITE-20187 for more context: that ticket is about catching up assignments.pending meta storage keys, whereas this one is about catching up its triggers:
* Replica factor updates.
* Partitions count updates. Immutable for now.
* Data nodes updates.
* Replica storage addition/removal. (By the way, is it possible to remove replica storage at all?)
For all aforementioned cases, it's required to update the distributed assignments pending (planned) keys if it's not yet done. And the only difficulty here is precisely in understanding whether this was done or not.
h3. Definition of Done Distributed assignments pending (planned) keys are updated if necessary according to the current triggers state.
Notes: 1) Add to metastorage starting revision ({{metaStorageMgr.recoveryFinishedFuture()}} returns long with maximal recovered revision)
> Catch-up rebalance triggers on node restart > --- > > Key: IGNITE-20209 > URL: https://issues.apache.org/jira/browse/IGNITE-20209 > Project: Ignite > Issue Type: Improvement >Reporter: Alexander Lapin >Priority: Major > Labels: ignite-3 >
> h3. Motivation > Please check https://issues.apache.org/jira/browse/IGNITE-20187 for more > context: that ticket is about catching up assignments.pending meta storage keys, > whereas this one is about catching up its triggers: > * Replica factor updates. > * Partitions count updates. Immutable for now. > * Data nodes updates. > * Replica storage addition/removal. (By the way, is it possible to remove > replica storage at all?) > For all aforementioned cases, it's required to update the distributed assignments > pending (planned) keys if it's not yet done. And the only difficulty here is > precisely in understanding whether this was done or not. > h3. Definition of Done > Distributed assignments pending (planned) keys are updated if necessary according > to the current triggers state. > > Notes: > 1) Add to metastorage starting revision > ({{metaStorageMgr.recoveryFinishedFuture()}} returns long with the > maximal recovered revision) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20209) Catch-up rebalance triggers on node restart
[ https://issues.apache.org/jira/browse/IGNITE-20209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mirza Aliev updated IGNITE-20209: - Description:
h3. Motivation Please check https://issues.apache.org/jira/browse/IGNITE-20187 for more context: that ticket is about catching up assignments.pending meta storage keys, whereas this one is about catching up its triggers:
* Replica factor updates.
* Partitions count updates. Immutable for now.
* Data nodes updates.
* Replica storage addition/removal. (By the way, is it possible to remove replica storage at all?)
For all aforementioned cases, it's required to update the distributed assignments pending (planned) keys if it's not yet done. And the only difficulty here is precisely in understanding whether this was done or not.
h3. Definition of Done Distributed assignments pending (planned) keys are updated if necessary according to the current triggers state.
Notes: 1) -Add to metastorage starting revision- ({{metaStorageMgr.recoveryFinishedFuture()}} returns long with the maximal recovered revision)
was:
h3. Motivation Please check https://issues.apache.org/jira/browse/IGNITE-20187 for more context: that ticket is about catching up assignments.pending meta storage keys, whereas this one is about catching up its triggers:
* Replica factor updates.
* Partitions count updates. Immutable for now.
* Data nodes updates.
* Replica storage addition/removal. (By the way, is it possible to remove replica storage at all?)
For all aforementioned cases, it's required to update the distributed assignments pending (planned) keys if it's not yet done. And the only difficulty here is precisely in understanding whether this was done or not.
h3. Definition of Done Distributed assignments pending (planned) keys are updated if necessary according to the current triggers state.
Notes: 1) Add to metastorage starting revision ({{metaStorageMgr.recoveryFinishedFuture()}} returns long with the maximal recovered revision)
> Catch-up rebalance triggers on node restart > --- > > Key: IGNITE-20209 > URL: https://issues.apache.org/jira/browse/IGNITE-20209 > Project: Ignite > Issue Type: Improvement >Reporter: Alexander Lapin >Priority: Major > Labels: ignite-3 >
> h3. Motivation > Please check https://issues.apache.org/jira/browse/IGNITE-20187 for more > context: that ticket is about catching up assignments.pending meta storage keys, > whereas this one is about catching up its triggers: > * Replica factor updates. > * Partitions count updates. Immutable for now. > * Data nodes updates. > * Replica storage addition/removal. (By the way, is it possible to remove > replica storage at all?) > For all aforementioned cases, it's required to update the distributed assignments > pending (planned) keys if it's not yet done. And the only difficulty here is > precisely in understanding whether this was done or not. > h3. Definition of Done > Distributed assignments pending (planned) keys are updated if necessary according > to the current triggers state. > > Notes: > 1) -Add to metastorage starting revision- > ({{metaStorageMgr.recoveryFinishedFuture()}} returns long with the > maximal recovered revision) -- This message was sent by Atlassian Jira (v8.20.10#820010)
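A sketch of how the noted recovery revision could bound the trigger catch-up, assuming {{metaStorageMgr.recoveryFinishedFuture()}} completes with the maximal recovered revision as the note says; the comparison helper is hypothetical.
{code:java}
// Hypothetical: replay rebalance triggers only up to the recovered revision.
metaStorageMgr.recoveryFinishedFuture().thenAccept(recoveryRevision -> {
    // For each trigger (replica factor, data nodes, ...), compare its revision
    // with the assignments.pending key as of recoveryRevision and rewrite the
    // pending/planned keys only when the trigger has not been applied yet.
    catchUpRebalanceTriggers(recoveryRevision);
});
{code}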
[jira] [Created] (IGNITE-20214) ItSimpleCounterServerTest#testRefreshLeader is flaky
Aleksandr Polovtcev created IGNITE-20214: Summary: ItSimpleCounterServerTest#testRefreshLeader is flaky Key: IGNITE-20214 URL: https://issues.apache.org/jira/browse/IGNITE-20214 Project: Ignite Issue Type: Task Reporter: Aleksandr Polovtcev Assignee: Aleksandr Polovtcev -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20214) ItSimpleCounterServerTest#testRefreshLeader is flaky
[ https://issues.apache.org/jira/browse/IGNITE-20214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Polovtcev updated IGNITE-20214: - Ignite Flags: (was: Docs Required,Release Notes Required) > ItSimpleCounterServerTest#testRefreshLeader is flaky > > > Key: IGNITE-20214 > URL: https://issues.apache.org/jira/browse/IGNITE-20214 > Project: Ignite > Issue Type: Task >Reporter: Aleksandr Polovtcev >Assignee: Aleksandr Polovtcev >Priority: Blocker > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20214) ItSimpleCounterServerTest#testRefreshLeader is flaky
[ https://issues.apache.org/jira/browse/IGNITE-20214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Polovtcev updated IGNITE-20214: - Labels: ignite-3 (was: ) > ItSimpleCounterServerTest#testRefreshLeader is flaky > > > Key: IGNITE-20214 > URL: https://issues.apache.org/jira/browse/IGNITE-20214 > Project: Ignite > Issue Type: Task >Reporter: Aleksandr Polovtcev >Assignee: Aleksandr Polovtcev >Priority: Blocker > Labels: ignite-3 > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IGNITE-19836) .NET: Thin 3.0: Reject Tuples and POCOs with unmapped fields
[ https://issues.apache.org/jira/browse/IGNITE-19836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754137#comment-17754137 ] Pavel Tupitsyn commented on IGNITE-19836: - Merged to master: 4a646a7cd7eeaa8cc5b4b3000d430be3f4fb2587 > .NET: Thin 3.0: Reject Tuples and POCOs with unmapped fields > > > Key: IGNITE-19836 > URL: https://issues.apache.org/jira/browse/IGNITE-19836 > Project: Ignite > Issue Type: Improvement > Components: platforms, thin client >Affects Versions: 3.0.0-beta1 >Reporter: Pavel Tupitsyn >Assignee: Pavel Tupitsyn >Priority: Major > Labels: .NET, ignite-3 > Fix For: 3.0.0-beta2 > > Time Spent: 20m > Remaining Estimate: 0h > > Tuples and POCOs with unmapped fields should not be allowed in table APIs. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20214) ItSimpleCounterServerTest#testRefreshLeader is flaky
[ https://issues.apache.org/jira/browse/IGNITE-20214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Polovtcev updated IGNITE-20214: - Description: {{ItSimpleCounterServerTest#testRefreshLeader}} sometimes fails with the following error: {code:java} org.opentest4j.AssertionFailedError: Expected :true Actual :false at org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:180) at org.apache.ignite.raft.server.ItSimpleCounterServerTest.before(ItSimpleCounterServerTest.java:141) {code} Looks like timeouts in {{waitForTopology}} calls are too small. > ItSimpleCounterServerTest#testRefreshLeader is flaky > > > Key: IGNITE-20214 > URL: https://issues.apache.org/jira/browse/IGNITE-20214 > Project: Ignite > Issue Type: Task >Reporter: Aleksandr Polovtcev >Assignee: Aleksandr Polovtcev >Priority: Blocker > Labels: ignite-3 > > {{ItSimpleCounterServerTest#testRefreshLeader}} sometimes fails with the > following error: > {code:java} > org.opentest4j.AssertionFailedError: > Expected :true > Actual :false > at org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:180) > at > org.apache.ignite.raft.server.ItSimpleCounterServerTest.before(ItSimpleCounterServerTest.java:141) > {code} > Looks like timeouts in {{waitForTopology}} calls are too small. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (IGNITE-19836) .NET: Thin 3.0: Reject Tuples and POCOs with unmapped fields
[ https://issues.apache.org/jira/browse/IGNITE-19836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754137#comment-17754137 ] Pavel Tupitsyn edited comment on IGNITE-19836 at 8/14/23 3:06 PM: -- Merged to main: 4a646a7cd7eeaa8cc5b4b3000d430be3f4fb2587 was (Author: ptupitsyn): Merged to master: 4a646a7cd7eeaa8cc5b4b3000d430be3f4fb2587 > .NET: Thin 3.0: Reject Tuples and POCOs with unmapped fields > > > Key: IGNITE-19836 > URL: https://issues.apache.org/jira/browse/IGNITE-19836 > Project: Ignite > Issue Type: Improvement > Components: platforms, thin client >Affects Versions: 3.0.0-beta1 >Reporter: Pavel Tupitsyn >Assignee: Pavel Tupitsyn >Priority: Major > Labels: .NET, ignite-3 > Fix For: 3.0.0-beta2 > > Time Spent: 20m > Remaining Estimate: 0h > > Tuples and POCOs with unmapped fields should not be allowed in table APIs. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-16088) Reuse Marshaller code in marshaller-common module
[ https://issues.apache.org/jira/browse/IGNITE-16088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Polovtcev updated IGNITE-16088: - Fix Version/s: 3.0.0-beta2 > Reuse Marshaller code in marshaller-common module > - > > Key: IGNITE-16088 > URL: https://issues.apache.org/jira/browse/IGNITE-16088 > Project: Ignite > Issue Type: Improvement >Affects Versions: 3.0.0-alpha4 >Reporter: Pavel Tupitsyn >Assignee: Aleksandr Polovtcev >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > Time Spent: 10m > Remaining Estimate: 0h > > IGNITE-14971 added *ignite-marshaller-common* module to reuse serialization > logic between the server and client parts. > This module duplicates some logic from *ignite-schema* module. > * Remove duplicated code from *ignite-schema* and reuse the logic from common > module. > * Extract other common bits where applicable (e.g. *AsmSerializerGenerator*) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-16088) Reuse Marshaller code in marshaller-common module
[ https://issues.apache.org/jira/browse/IGNITE-16088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Polovtcev updated IGNITE-16088: - Fix Version/s: (was: 3.0.0-beta2) > Reuse Marshaller code in marshaller-common module > - > > Key: IGNITE-16088 > URL: https://issues.apache.org/jira/browse/IGNITE-16088 > Project: Ignite > Issue Type: Improvement >Affects Versions: 3.0.0-alpha4 >Reporter: Pavel Tupitsyn >Assignee: Aleksandr Polovtcev >Priority: Major > Labels: ignite-3 > Time Spent: 10m > Remaining Estimate: 0h > > IGNITE-14971 added *ignite-marshaller-common* module to reuse serialization > logic between the server and client parts. > This module duplicates some logic from *ignite-schema* module. > * Remove duplicated code from *ignite-schema* and reuse the logic from common > module. > * Extract other common bits where applicable (e.g. *AsmSerializerGenerator*) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20214) ItSimpleCounterServerTest#testRefreshLeader is flaky
[ https://issues.apache.org/jira/browse/IGNITE-20214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Polovtcev updated IGNITE-20214: - Fix Version/s: 3.0.0-beta2 > ItSimpleCounterServerTest#testRefreshLeader is flaky > > > Key: IGNITE-20214 > URL: https://issues.apache.org/jira/browse/IGNITE-20214 > Project: Ignite > Issue Type: Task >Reporter: Aleksandr Polovtcev >Assignee: Aleksandr Polovtcev >Priority: Blocker > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > Time Spent: 10m > Remaining Estimate: 0h > > {{ItSimpleCounterServerTest#testRefreshLeader}} sometimes fails with the > following error: > {code:java} > org.opentest4j.AssertionFailedError: > Expected :true > Actual :false > at org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:180) > at > org.apache.ignite.raft.server.ItSimpleCounterServerTest.before(ItSimpleCounterServerTest.java:141) > {code} > Looks like timeouts in {{waitForTopology}} calls are too small. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20035) IndexOutOfBoundsException when statement.SetMaxRows is set
[ https://issues.apache.org/jira/browse/IGNITE-20035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Pereslegin updated IGNITE-20035: -- Fix Version/s: 3.0.0-beta2 > IndexOutOfBoundsException when statement.SetMaxRows is set > -- > > Key: IGNITE-20035 > URL: https://issues.apache.org/jira/browse/IGNITE-20035 > Project: Ignite > Issue Type: Bug > Components: sql >Affects Versions: 3.0 >Reporter: Alexander Belyak >Assignee: Pavel Pereslegin >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > If setMaxRows > count(*), the query fails with an IndexOutOfBounds exception. > Reproducer: > > {noformat} > try (Connection connection = connect(); Statement statement = > connection.createStatement()) { > JdbcSteps steps = new JdbcSteps(statement); > steps.executeUpdateQuery("CREATE TABLE Person (id INT PRIMARY KEY, name > VARCHAR)", "Creating a table with two columns."); > steps.executeUpdateQuery("INSERT INTO Person (id, name) VALUES (1, > 'John')", "Inserting a single record"); > statement.setMaxRows(25); > ResultSet res = steps.executeQuery("SELECT * FROM Person", "Selecting all > the records from the table"); > while (res.next()) { > log.info("{}, {}", res.getInt(1), res.getString(2)); > assertEquals(1, res.getInt(1)); > assertEquals("John", res.getString(2)); > } > }{noformat} > Returns: > > > {noformat} > Exception while executing query [query=SELECT * FROM Person]. Error > message:toIndex = 25 > java.sql.SQLException: Exception while executing query [query=SELECT * FROM > Person]. Error message:toIndex = 25 > at > org.apache.ignite.internal.jdbc.proto.IgniteQueryErrorCode.createJdbcSqlException(IgniteQueryErrorCode.java:57) > at > org.apache.ignite.internal.jdbc.JdbcStatement.execute0(JdbcStatement.java:149) > at > org.apache.ignite.internal.jdbc.JdbcStatement.executeQuery(JdbcStatement.java:108) > at > org.gridgain.ai3tests.tests.teststeps.JdbcSteps.executeQuery(JdbcSteps.java:50) > at > org.gridgain.ai3tests.tests.BasicOperationsTest.testSaveAndGetFromCache(BasicOperationsTest.java:41) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:727) > at > org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60) > at > org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131) > at > org.junit.jupiter.api.extension.InvocationInterceptor.interceptTestMethod(InvocationInterceptor.java:118) > at > org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103) > at > org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93) > at > org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106) > at > org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156) > at > org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147) > at >
org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:86) > at > org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103) > at > org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93) > at > org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106) > at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64) > at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45) > at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37) > at > org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92) > at > org.junit.jupiter.engine.execution
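For reference, the reproducer above relies on a private JdbcSteps test helper. Below is a self-contained sketch of the same scenario using only the standard JDBC API; the connection URL is an assumption for a locally running node, and the table and data mirror the original.

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SetMaxRowsRepro {
    public static void main(String[] args) throws Exception {
        // Assumed address of a locally running node; adjust to your environment.
        try (Connection conn = DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1:10800");
                Statement stmt = conn.createStatement()) {
            stmt.executeUpdate("CREATE TABLE Person (id INT PRIMARY KEY, name VARCHAR)");
            stmt.executeUpdate("INSERT INTO Person (id, name) VALUES (1, 'John')");

            // maxRows (25) exceeds the actual row count (1). Per the JDBC contract
            // this should merely cap the result set; instead the query fails.
            stmt.setMaxRows(25);

            try (ResultSet rs = stmt.executeQuery("SELECT * FROM Person")) {
                while (rs.next()) {
                    System.out.printf("%d, %s%n", rs.getInt(1), rs.getString(2));
                }
            }
        }
    }
}
{code}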
[jira] [Updated] (IGNITE-20035) IndexOutOfBoundsException when statement.SetMaxRows is set
[ https://issues.apache.org/jira/browse/IGNITE-20035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Pereslegin updated IGNITE-20035: -- Ignite Flags: Release Notes Required (was: Docs Required,Release Notes Required) > IndexOutOfBoundsException when statement.SetMaxRows is set > -- > > Key: IGNITE-20035 > URL: https://issues.apache.org/jira/browse/IGNITE-20035 > Project: Ignite > Issue Type: Bug > Components: sql >Affects Versions: 3.0 >Reporter: Alexander Belyak >Assignee: Pavel Pereslegin >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > Time Spent: 1h 10m > Remaining Estimate: 0h -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20035) IndexOutOfBoundsException when statement.SetMaxRows is set
[ https://issues.apache.org/jira/browse/IGNITE-20035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Pereslegin updated IGNITE-20035: -- Ignite Flags: (was: Release Notes Required) > IndexOutOfBoundsException when statement.SetMaxRows is set > -- > > Key: IGNITE-20035 > URL: https://issues.apache.org/jira/browse/IGNITE-20035 > Project: Ignite > Issue Type: Bug > Components: sql >Affects Versions: 3.0 >Reporter: Alexander Belyak >Assignee: Pavel Pereslegin >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > Time Spent: 1h 10m > Remaining Estimate: 0h -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-20124) Prevent double storage updates within primary
[ https://issues.apache.org/jira/browse/IGNITE-20124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denis Chudov updated IGNITE-20124: -- Description: h3. Motivation In order to preserve the guarantee that the primary replica is always up to date, it is required to: * In case of a common RW transaction - insert the write intent into the storage on the primary before replication. * In case of one-phase commit - insert the committed write after replication. Both have already been done. However, this means that if the primary is part of the replication group, which is true in almost all cases, we will apply the update twice: * In case of a common RW transaction - through replication. * In case of one-phase commit - either through replication or through the post-replication update, if replication was fast enough. h3. Definition of Done * Double storage updates within the primary are prevented. h3. Implementation Notes The easiest way to prevent a double insert is to skip an update if the local safe time is greater than or equal to the candidate's timestamp (see the sketch below). There are 3 places where we update the partition storage: # Primary pre-replication update. In this case, the second update on replication should be excluded. # Primary post-replication update in case of 1PC. It is possible to see already updated data if replication has already been processed locally. This is expected to be covered by https://issues.apache.org/jira/browse/IGNITE-15927 . We should check the primary's safe time on the post-replication update and skip the update if the safe time has already been adjusted. # Insert through replication. In the non-1PC case there will be a double insert on every primary (see 1). In the 1PC case it depends, so we should check the safe time on the primary to know whether the update should be applied (see 2). In every case, the storage indexes should still be adjusted on replication, as is done now, because the progress of indexes on FSM write operations must not be violated - otherwise a Raft snapshot-based rebalance would be broken. We may have two non-consistent storage updates on the primary which may hit different fsyncs, so we should benchmark this optimization to find out how useful it is. Transactional correctness is not violated by these non-consistent storage updates: the only possibility is that some writes or write intents run ahead of the indexes and are therefore included into snapshots; we can still process such writes and resolve the write intents. Also, the safe time now needs to be updated on the primary replica. There are the following scenarios: # Two-phase commit: we can advance the safe time on the primary, make the pre-replication update and then run the Raft command. Both the safe-time adjustment and the storage update happen before replication. # One-phase commit: the safe time should be advanced after completion of the Raft command future. There is no happens-before between the future callback and the replication handler, so the safe time should be checked and advanced in both places. We should use a critical section for the exact transaction, preventing races between the safe-time check, the safe-time adjustment and the storage update. -- This message was sent by Atlassian Jira (v8.20.10#820010)
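The skip-if-covered check from the implementation notes above can be illustrated with a minimal sketch; all names here are hypothetical, not actual Ignite internals. A per-transaction critical section compares the local safe time with the candidate write's timestamp and applies the storage update only once, whichever path (pre-replication, replication, or the 1PC post-update) gets there first.

{code:java}
/** Illustrative sketch only; class and method names are hypothetical. */
class SafeTimeDedup {
    /** Last locally observed safe time, in the same units as write timestamps. */
    private long safeTime = Long.MIN_VALUE;

    /**
     * Runs {@code storageUpdate} only if the candidate timestamp is not yet
     * covered by the local safe time, then advances the safe time.
     * The synchronized method stands in for the per-transaction critical section.
     *
     * @return true if the update was applied, false if it was a duplicate.
     */
    synchronized boolean applyOnce(long candidateTs, Runnable storageUpdate) {
        if (safeTime >= candidateTs) {
            return false; // Already applied via the other path; skip the duplicate.
        }
        storageUpdate.run();
        safeTime = candidateTs;
        return true;
    }
}
{code}

Even when applyOnce returns false on the replication path, the storage indexes would still be advanced, per the note above about Raft snapshot-based rebalance.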
[jira] [Updated] (IGNITE-20157) Share context details to ease replication timeout exception analysis
[ https://issues.apache.org/jira/browse/IGNITE-20157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denis Chudov updated IGNITE-20157: -- Description: *Motivation* On the client side, we have only a "Replication timeout exception" happening on request timeout, and we can't know the exact reason without debugging. Later it will be replaced with a transaction timeout exception, but this would not solve the problem. We need a way to know what happened on the server side. Timeouts should be set on those operations on the primary replica side that are most likely the cause of request timeouts. In case of such an operation timeout on the primary replica, a corresponding exception should be printed to the log, so that it can be matched with the exception on the client by transaction id. *Definition of done* The future returned by LockManager#acquire is completed exceptionally if the lock was not acquired within some time interval (lock acquisition timeout). *Implementation notes* This exception (or its message) should differ from the exception thrown because of a deadlock prevention policy with a timeout. was: *Motivation* Currently we have lock timeouts only for specific implementations of DeadlockPreventionPolicy. At the same time, we have transaction request timeouts. It makes no sense for such requests to wait to acquire locks longer than the request timeout. *Definition of done* The future returned by LockManager#acquire is completed exceptionally if the lock was not acquired within some time interval (lock acquisition timeout). *Implementation notes* This exception (or its message) should differ from the exception thrown because of a deadlock prevention policy with a timeout. > Share context details to ease replication timeout exception analysis > > > Key: IGNITE-20157 > URL: https://issues.apache.org/jira/browse/IGNITE-20157 > Project: Ignite > Issue Type: Improvement >Reporter: Denis Chudov >Priority: Major > Labels: ignite-3 > > *Motivation* > On the client side, we have only a "Replication timeout exception" happening on > request timeout, and we can't know the exact reason without debugging. Later > it will be replaced with a transaction timeout exception, but this would not > solve the problem. We need a way to know what happened on the server side. > Timeouts should be set on those operations on the primary replica side that are most > likely the cause of request timeouts. In case of such an operation timeout on the > primary replica, a corresponding exception should be printed to the log, so > that it can be matched with the exception on the client by transaction id. > *Definition of done* > The future returned by LockManager#acquire is completed exceptionally if the lock > was not acquired within some time interval (lock acquisition timeout). > *Implementation notes* > This exception (or its message) should differ from the exception thrown > because of a deadlock prevention policy with a timeout. -- This message was sent by Atlassian Jira (v8.20.10#820010)
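The definition of done above can be sketched with standard CompletableFuture machinery, assuming only that LockManager#acquire returns a CompletableFuture; the timeout value, wrapper method and logging here are illustrative, not the actual Ignite code.

{code:java}
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class LockAcquisitionTimeoutSketch {
    static final long LOCK_ACQUISITION_TIMEOUT_MS = 10_000; // Assumed value.

    /** Fails a lock-acquisition future if it does not complete within the timeout. */
    static <T> CompletableFuture<T> withAcquisitionTimeout(CompletableFuture<T> acquireFut, UUID txId) {
        return acquireFut
                .orTimeout(LOCK_ACQUISITION_TIMEOUT_MS, TimeUnit.MILLISECONDS)
                .whenComplete((lock, err) -> {
                    if (err instanceof TimeoutException) {
                        // Log with the transaction id so the server-side record can be
                        // matched with the replication timeout seen on the client. The
                        // message must differ from the deadlock-prevention timeout one.
                        System.err.println("Lock acquisition timed out, txId=" + txId);
                    }
                });
    }
}
{code}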
[jira] [Updated] (IGNITE-20213) RO transactions should not block LWM from rising
[ https://issues.apache.org/jira/browse/IGNITE-20213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Polovtcev updated IGNITE-20213: - Description: {{org.apache.ignite.internal.tx.TxManager#updateLowWatermark}} returns a {{CompletableFuture}} that is completed when all currently running RO transactions finish. Until that future is complete, local Low Watermark does not get updated. It is proposed to change this behavior: instead of blocking the Low Watermark update, all unfinished RO transactions (at the time of the LWM update) must fail with an appropriate error. was: {{org.apache.ignite.internal.tx.TxManager#updateLowWatermark}} returns a {{CompletableFuture}} that is completed when all currently running RO transactions complete. Until that future is complete, local Low Watermark does not get updated. It is proposed to change this behavior: instead of blocking the Low Watermark update, all unfinished RO transactions (at the time of the LWM update) must fail with an appropriate error. > RO transactions should not block LWM from rising > > > Key: IGNITE-20213 > URL: https://issues.apache.org/jira/browse/IGNITE-20213 > Project: Ignite > Issue Type: Task >Reporter: Aleksandr Polovtcev >Assignee: Alexander Lapin >Priority: Major > Labels: ignite-3 > > {{org.apache.ignite.internal.tx.TxManager#updateLowWatermark}} returns a > {{CompletableFuture}} that is completed when all currently running RO > transactions finish. Until that future is complete, local Low Watermark does > not get updated. > It is proposed to change this behavior: instead of blocking the Low Watermark > update, all unfinished RO transactions (at the time of the LWM update) must > fail with an appropriate error. -- This message was sent by Atlassian Jira (v8.20.10#820010)
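A sketch of the proposed behavior, with hypothetical names and a generic exception standing in for the "appropriate error": instead of waiting on the finish futures of running RO transactions, the LWM update fails them immediately.

{code:java}
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative sketch only; not the actual TxManager API. */
class LowWatermarkSketch {
    /** Finish future of each currently running RO transaction, keyed by tx id. */
    private final Map<UUID, CompletableFuture<Void>> roTxFutures = new ConcurrentHashMap<>();

    /** Raises the LWM immediately, failing any RO transaction still running. */
    void updateLowWatermark(long newLwm) {
        roTxFutures.forEach((txId, fut) ->
                fut.completeExceptionally(new IllegalStateException(
                        "RO transaction " + txId + " aborted: low watermark raised to " + newLwm)));
        roTxFutures.clear();
        // ...proceed with the local LWM update without waiting on anything.
    }
}
{code}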
[jira] [Assigned] (IGNITE-20057) C++ client: Track observable timestamp
[ https://issues.apache.org/jira/browse/IGNITE-20057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Sapego reassigned IGNITE-20057: Assignee: Igor Sapego (was: Pavel Tupitsyn) > C++ client: Track observable timestamp > -- > > Key: IGNITE-20057 > URL: https://issues.apache.org/jira/browse/IGNITE-20057 > Project: Ignite > Issue Type: Improvement > Components: platforms, thin client >Affects Versions: 3.0.0-beta1 >Reporter: Vladislav Pyatkov >Assignee: Igor Sapego >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > > Implement observable timestamp roundtrip in C++ client. See IGNITE-19888 for > more details. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IGNITE-19499) TableManager should listen CatalogService events instead of configuration
[ https://issues.apache.org/jira/browse/IGNITE-19499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Tkalenko updated IGNITE-19499: - Ignite Flags: (was: Docs Required,Release Notes Required) > TableManager should listen CatalogService events instead of configuration > - > > Key: IGNITE-19499 > URL: https://issues.apache.org/jira/browse/IGNITE-19499 > Project: Ignite > Issue Type: Improvement >Reporter: Andrey Mashenkov >Assignee: Andrey Mashenkov >Priority: Major > Labels: ignite-3 > Fix For: 3.0.0-beta2 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > As of now, TableManager listens to configuration events to create internal > structures. > Let's make TableManager listen to CatalogService events instead. > Note: Some tests may fail due to changed guarantees and incomplete related > tickets. So, let's do this in a separate feature branch. -- This message was sent by Atlassian Jira (v8.20.10#820010)
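The event-driven wiring can be illustrated with a self-contained sketch; the listener interface and event type below are stand-ins, since the real CatalogService API may differ.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Illustrative sketch only; the real CatalogService listener API may differ. */
class CatalogEventsSketch {
    /** Hypothetical catalog event carrying the created table's name. */
    record TableCreatedEvent(String tableName) {}

    /** Minimal stand-in for a CatalogService that publishes events. */
    static class CatalogService {
        private final List<Consumer<TableCreatedEvent>> listeners = new ArrayList<>();

        void listen(Consumer<TableCreatedEvent> listener) {
            listeners.add(listener);
        }

        void fireTableCreated(String name) {
            listeners.forEach(l -> l.accept(new TableCreatedEvent(name)));
        }
    }

    public static void main(String[] args) {
        CatalogService catalog = new CatalogService();

        // A TableManager-style component reacts to catalog events rather than
        // to configuration changes:
        catalog.listen(evt ->
                System.out.println("Creating internal structures for table " + evt.tableName()));

        catalog.fireTableCreated("Person");
    }
}
{code}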
[jira] [Created] (IGNITE-20215) DistributionZoneRebalanceEngine should listen CatalogService events instead of configuration
Kirill Tkalenko created IGNITE-20215: Summary: DistributionZoneRebalanceEngine should listen CatalogService events instead of configuration Key: IGNITE-20215 URL: https://issues.apache.org/jira/browse/IGNITE-20215 Project: Ignite Issue Type: Improvement Reporter: Kirill Tkalenko Assignee: Kirill Tkalenko Fix For: 3.0.0-beta2 In the process of implementing IGNITE-20114, it was found that *org.apache.ignite.internal.distributionzones.rebalance.DistributionZoneRebalanceEngine* can be switched to the catalog separately; I propose to do this. -- This message was sent by Atlassian Jira (v8.20.10#820010)