[ 
https://issues.apache.org/jira/browse/IGNITE-17076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bessonov updated IGNITE-17076:
-----------------------------------
    Description: 
Current MV store bridge API has a fatal flaw, born from a misunderstanding. 
There's a method called "insert" that generates RowId by itself. This is wrong, 
because it can lead to different id for the same row on the replica storage. 
This completely breaks everything.

Every replicated write command, that inserts new value, should produce same row 
ids. There are several ways to achieve this:
 * Use timestamps as identifiers. This is not very convenient, because we would 
have to attach partition id on top of it. It's mandatory to know the partition 
of the row.
 * Use more complicated structure, for example a tuple of (raftCommitIndex, 
partitionId, batchCounter), where

 * 
 ** raftCommitIndex is the index of write command that performs insertion.
 ** partitionId is an integer identifier of the partition. Could be 4 bytes, 
considering that there are plans to support more than 65000 partitions per 
table.
 ** batchCounter is used to differentiate insertions made in a single write 
command. We can limit it with 2 bytes to save a little bit of space, if it's 
necessary.

I prefer the second option, but maybe it could be revised during the 
implementation.

Of course, method "insert" should be removed from bridge API. Tests have to be 
updated. With the lack of RAFT group in storage tests, we can generate row ids 
artificially, it's not a big deal.

EDIT: second option makes it difficult to use row ids in action request 
processor in cases when data is inserted. So, hybrid clock + partition id is a 
better option.

  was:
Current MV store bridge API has a fatal flaw, born from a misunderstanding. 
There's a method called "insert" that generates RowId by itself. This is wrong, 
because it can lead to different id for the same row on the replica storage. 
This completely breaks everything.

Every replicated write command, that inserts new value, should produce same row 
ids. There are several ways to achieve this:
 * Use timestamps as identifiers. This is not very convenient, because we would 
have to attach partition id on top of it. It's mandatory to know the partition 
of the row.
 * Use more complicated structure, for example a tuple of (raftCommitIndex, 
partitionId, batchCounter), where

 ** raftCommitIndex is the index of write command that performs insertion.
 ** partitionId is an integer identifier of the partition. Could be 4 bytes, 
considering that there are plans to support more than 65000 partitions per 
table.
 ** batchCounter is used to differentiate insertions made in a single write 
command. We can limit it with 2 bytes to save a little bit of space, if it's 
necessary.

I prefer the second option, but maybe it could be revised during the 
implementation.

Of course, method "insert" should be removed from bridge API. Tests have to be 
updated. With the lack of RAFT group in storage tests, we can generate row ids 
artificially, it's not a big deal.


> Unify RowId format for different storages
> -----------------------------------------
>
>                 Key: IGNITE-17076
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17076
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Ivan Bessonov
>            Priority: Major
>              Labels: ignite-3
>
> Current MV store bridge API has a fatal flaw, born from a misunderstanding. 
> There's a method called "insert" that generates RowId by itself. This is 
> wrong, because it can lead to different id for the same row on the replica 
> storage. This completely breaks everything.
> Every replicated write command, that inserts new value, should produce same 
> row ids. There are several ways to achieve this:
>  * Use timestamps as identifiers. This is not very convenient, because we 
> would have to attach partition id on top of it. It's mandatory to know the 
> partition of the row.
>  * Use more complicated structure, for example a tuple of (raftCommitIndex, 
> partitionId, batchCounter), where
>  * 
>  ** raftCommitIndex is the index of write command that performs insertion.
>  ** partitionId is an integer identifier of the partition. Could be 4 bytes, 
> considering that there are plans to support more than 65000 partitions per 
> table.
>  ** batchCounter is used to differentiate insertions made in a single write 
> command. We can limit it with 2 bytes to save a little bit of space, if it's 
> necessary.
> I prefer the second option, but maybe it could be revised during the 
> implementation.
> Of course, method "insert" should be removed from bridge API. Tests have to 
> be updated. With the lack of RAFT group in storage tests, we can generate row 
> ids artificially, it's not a big deal.
> EDIT: second option makes it difficult to use row ids in action request 
> processor in cases when data is inserted. So, hybrid clock + partition id is 
> a better option.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to