[jira] [Updated] (IGNITE-21003) Switch to the 'always use current schema' approach

2023-12-03 Thread Roman Puchkovskiy (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Puchkovskiy updated IGNITE-21003:
---
Description: 
This issue is to think about the problem and design it first, not about 
implementing it right away.

Currently, for an RW transaction, we take schema at the beginning of the 
transaction and run the whole transaction on that schema (indices are an 
exception, but this is not visible to the end user). The transaction still 
notices most schema changes (if a change happens before a read/write in the 
transaction, the transaction gets aborted), but it does not notice a table 
created after the transaction had started.

IGNITE-20107 addressed this issue, but it was decided that we need to design 
this in more depth.

An alternative is to always use the latest schema on each operation (still 
having schema validation). This might have some downsides/bring difficulties:
 # Same query might return data with different schemas
 # It's not clear against which schema to validate the current schema at 
execution of each operation (probably, still against the initial one, defined 
as Max(txStartTs, tableCreationTs))
 # A query would have to use the same query timestamp on each node executing 
its fragments, so it would have to propagate this timestamp

The schema synchronization design was created starting with an assumption that 
the schema is taken for the start of a transaction, so the design should be 
revised carefully when switching to the proposed one.
h2. Proposed changes

It seems that this can be achieved with the following:
 # Always execute a KV/SQL operation/query using the current schema (obtained 
using the schema sync procedure for now())
 # An operation/query executed distributively (like SQL queries that produce a 
few fragments executed on different nodes) must pass that query timestamp to 
each of the nodes participating in its execution; they must use this timestamp 
to get the 'current' schema
 # In each read/write operation processed in a PartitionReplicaListener, 
instead of failing the operation if the current schema is different from the 
initial transaction schema, do the (already implemented) forward compatibility 
check (so a few white-listed change types, like adding a nullable column, will 
be allowed). This is optional, but the 'fail any read/write if theĀ  table 
schema is changed in any way' rule was introduced to disallow a user see any 
effects of a DDL in the middle of the transaction (except for its abortion); if 
we allow each query see new schema, this strict rule kinda makes no sense 
anymore. (On commit, we still do forward compatibility check)
 # To do the commit/read/write forward compatibility check, take the initial 
table schema not at transaciton start, but at the moment when the transaction 
had first enlisted the table. For this, we might need a mechanism to pass the 
'tableId->enlistTs' map with each transactional operation/query (back and 
forth), analogously to how it's done to maintain maxObservableTimestamp.
 # Probably we should make column rename forward incompatible as the user will 
now have to switch to the new name immediately.

  was:
This issue is to think about the problem and design it first, not about 
implementing it right away.

Currently, for an RW transaction, we take schema at the beginning of the 
transaction and run the whole transaction on that schema (indices are an 
exception, but this is not visible to the end user). The transaction still 
notices most schema changes (if a change happens before a read/write in the 
transaction, the transaction gets aborted), but it does not notice a table 
created after the transaction had started.

IGNITE-20107 addressed this issue, but it was decided that we need to design 
this in more depth.

An alternative is to always use the latest schema on each operation (still 
having schema validation). This might have some downsides/bring difficulties:
 # Same query might return data with different schemas
 # It's not clear against which schema to validate the current schema at 
execution of each operation (probably, still against the initial one, defined 
as Max(txStartTs, tableCreationTs))
 # A query would have to use the same query timestamp on each node executing 
its fragments, so it would have to propagate this timestamp

The schema synchronization design was created starting with an assumption that 
the schema is taken for the start of a transaction, so the design should be 
revised carefully when switching to the proposed one.


> Switch to the 'always use current schema' approach
> --
>
> Key: IGNITE-21003
> URL: https://issues.apache.org/jira/browse/IGNITE-21003
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Roman Puchkovskiy
>Priority: Major
>  

[jira] [Updated] (IGNITE-21003) Switch to the 'always use current schema' approach

2023-12-01 Thread Roman Puchkovskiy (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-21003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Puchkovskiy updated IGNITE-21003:
---
Description: 
This issue is to think about the problem and design it first, not about 
implementing it right away.

Currently, for an RW transaction, we take schema at the beginning of the 
transaction and run the whole transaction on that schema (indices are an 
exception, but this is not visible to the end user). The transaction still 
notices most schema changes (if a change happens before a read/write in the 
transaction, the transaction gets aborted), but it does not notice a table 
created after the transaction had started.

IGNITE-20107 addressed this issue, but it was decided that we need to design 
this in more depth.

An alternative is to always use the latest schema on each operation (still 
having schema validation). This might have some downsides/bring difficulties:
 # Same query might return data with different schemas
 # It's not clear against which schema to validate the current schema at 
execution of each operation (probably, still against the initial one, defined 
as Max(txStartTs, tableCreationTs))
 # A query would have to use the same query timestamp on each node executing 
its fragments, so it would have to propagate this timestamp

The schema synchronization design was created starting with an assumption that 
the schema is taken for the start of a transaction, so the design should be 
revised carefully when switching to the proposed one.

> Switch to the 'always use current schema' approach
> --
>
> Key: IGNITE-21003
> URL: https://issues.apache.org/jira/browse/IGNITE-21003
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Roman Puchkovskiy
>Priority: Major
>  Labels: ignite-3
>
> This issue is to think about the problem and design it first, not about 
> implementing it right away.
> Currently, for an RW transaction, we take schema at the beginning of the 
> transaction and run the whole transaction on that schema (indices are an 
> exception, but this is not visible to the end user). The transaction still 
> notices most schema changes (if a change happens before a read/write in the 
> transaction, the transaction gets aborted), but it does not notice a table 
> created after the transaction had started.
> IGNITE-20107 addressed this issue, but it was decided that we need to design 
> this in more depth.
> An alternative is to always use the latest schema on each operation (still 
> having schema validation). This might have some downsides/bring difficulties:
>  # Same query might return data with different schemas
>  # It's not clear against which schema to validate the current schema at 
> execution of each operation (probably, still against the initial one, defined 
> as Max(txStartTs, tableCreationTs))
>  # A query would have to use the same query timestamp on each node executing 
> its fragments, so it would have to propagate this timestamp
> The schema synchronization design was created starting with an assumption 
> that the schema is taken for the start of a transaction, so the design should 
> be revised carefully when switching to the proposed one.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)