jacques-n commented on a change in pull request #1849: URL: https://github.com/apache/iceberg/pull/1849#discussion_r536468975
########## File path: api/src/main/java/org/apache/iceberg/catalog/TransactionalCatalog.java ########## @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.iceberg.catalog; + +import org.apache.iceberg.catalog.SupportsCatalogTransactions.IsolationLevel; +import org.apache.iceberg.catalog.SupportsCatalogTransactions.LockingMode; +import org.apache.iceberg.exceptions.CommitFailedException; + +/** + * A {@link Catalog} that applies all mutations within a single transaction. + * + * <p>A TransactionalCatalog can spawn child transactions for multiple operations on different + * tables. All operations will be done within the context of a single Catalog-level transaction + * and they will either all be successful or all fail. + * + * <p>A TransactionalCatalog is initially active upon creation and will remain so until one of + * the following terminal actions occurs: + * <ul> + * <li>{@link rollback} is called. + * <li>{@link commit} is called. + * <li>The transaction expires while using Pessimistic {@link LockingMode}. + * <li>The transaction is terminated externally (for example, when a locking arbitrator + * determines a deadlock between two transactions has occurred). 
+ * <li>The underlying implementation determines that the transaction can no longer complete + * successfully. + * </ul> + * + * <p>When one of the items above occurs, the transaction is no longer valid. Further use + * of the transaction will result in a {@link IllegalStateException} being thrown. + * + * <p>Nested transactions such as creating a new table may fail. Those failures alone do + * not necessarily result in a failure of the catalog-level transaction. + * + */ +public interface TransactionalCatalog extends Catalog, AutoCloseable { + + /** + * An internal identifier associated with this transaction. + * @return An internal identifier. + */ + String transactionId(); + + /** + * Return the current {@code IsolationLevel} for this transaction. + * @return The IsolationLevel for this transaction. + */ + IsolationLevel isolationLevel(); + + /** + * Return the {@link LockingMode} for this transaction. + * @return The LockingMode for this transaction. + */ + LockingMode lockingMode(); + + /** + * Whether the current transaction is still active/open. + * @return True until a terminal action occurs. + */ + boolean active(); + + /** + * Aborts the set of operations here and makes this TransactionalCatalog inoperable. + * + * <p>Once called, no further operations can be done against this catalog. If any + * operations are attempted, {@link IllegalStateException} will be thrown. + */ + void rollback(); + + /** + * Commit the pending changes from all nested transactions against the Catalog. + * + * <p>Once called, no further operations can be done against this catalog. If any + * operations are attempted, {@link IllegalStateException} will be thrown. + * + * @throws CommitFailedException If the updates cannot be committed due to conflicts. + */ + void commit(); + + /** + * A shortcut for {@link commit} that allows users to use this catalog in try-with-resources + * block. + * + * @throws CommitFailedException If the updates cannot be committed due to conflicts. 
+ */ + @Override + default void close() { Review comment: Updated to use rollback closeable pattern. ########## File path: api/src/main/java/org/apache/iceberg/catalog/SupportsCatalogTransactions.java ########## @@ -0,0 +1,162 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.iceberg.catalog; + +import java.util.Set; + +/** + * Catalog methods for working with catalog-level transactions. + * + * <p>Catalog implementations are not required to support catalog-level transactional state. + * If they do, they may support one or more {@code IsolationLevel}s and one or more + * {@code LockingMode}s. + */ +public interface SupportsCatalogTransactions { + + /** + * The level of isolation for a catalog-level transaction. + * + * <p>Isolation covers both what data is read and what data can be written. + * + * <p>At all levels, data is only visible if it is either committed by another transaction or + * committed by a nested transaction within this catalog-level transaction. + * + * <p>Individual nested Table transactions may be "rebased" to expose updated versions of a + * table if the isolation level allows that behavior. 
+ * + * <p>In the definitions of each isolation level, the concept of conflicting writes is + * referenced. Conflicting writes are two mutations to the same object that happen concurrently. + * Depending on the particular implementation, the coarseness of this conflict may vary. The + * most coarse conflict is any two mutations to the same table. However, some implementations + * may consider some of these "absolute" conflicts as allowable by using finer-grained conflict + * resolution. For example, two different operations that both append new files to a table may + * be in "absolute" conflict but could be resolved automatically as a "safe conflict" by using + * a set of automatic implementation-defined conflict resolution rules. + */ + enum IsolationLevel { + + /** + * Reading the same table multiple times may result in different versions read of the same + * table. A commit can be completed as long as any tables changed externally do not conflict + * with any writes within this transaction. + */ + READ_COMMITTED, + + /** + * Reading the same table multiple times will result in the same view of that table. + * Different tables may come from different snapshots. A commit can be completed as + * long as any tables changed externally do not conflict with any writes within this + * transaction. + */ + REPEATED_READ, + + /** + * A commit will only succeed if there have been no meaningful changes to data read during + * the course of this transaction prior to commit. This imposes stricter read guarantees than + * {@code REPEATED_READ} (consistent reads per table) as it requires that the reads are + * consistent for all tables to a single point in time (or single snapshot of the database). + * Additionally, it implies additional requirements around the successful completion of a + * write. 
In order for a write to complete, any entities read during this transaction are also + * blocked from changing (via another transaction) post-read in ways that would influence the + * writes of this operation. This is also sometimes called snapshot isolation. Review comment: Snapshot isolation in general is stickier from my point of view, as I don't believe there is a canonical definition of it. Serializable has a very clear definition in SQL-92. People were previously confused by this definition and by the missing snapshot isolation, which is what caused me to add this sentence. ########## File path: api/src/main/java/org/apache/iceberg/Table.java ########## @@ -41,6 +41,13 @@ default String name() { /** * Refresh the current table metadata. + * + * <p>If this table is associated with a TransactionalCatalog, this refresh will be bounded by + * the visibility that the {@code IsolationLevel} of that transaction exposes. For example, if + * we are in a context of {@code READ_COMMITTED}, this refresh will update to the latest state + * of the table. However, in the case of {@code SERIALIZABLE} where this table hasn't mutated + * within this transaction, calling refresh will have no impact as the isolation level Review comment: correct ########## File path: api/src/main/java/org/apache/iceberg/catalog/SupportsCatalogTransactions.java ########## @@ -0,0 +1,162 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License.
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.iceberg.catalog; + +import java.util.Set; + +/** + * Catalog methods for working with catalog-level transactions. + * + * <p>Catalog implementations are not required to support catalog-level transactional state. + * If they do, they may support one or more {@code IsolationLevel}s and one or more + * {@code LockingMode}s. + */ +public interface SupportsCatalogTransactions { + + /** + * The level of isolation for a catalog-level transaction. + * + * <p>Isolation covers both what data is read and what data can be written. + * + * <p>At all levels, data is only visible if it is either committed by another transaction or + * committed by a nested transaction within this catalog-level transaction. + * + * <p>Individual nested Table transactions may be "rebased" to expose updated versions of a + * table if the isolation level allows that behavior. + * + * <p>In the definitions of each isolation level, the concept of conflicting writes is + * referenced. Conflicting writes are two mutations to the same object that happen concurrently. + * Depending on the particular implementation, the coarseness of this conflict may vary. The + * most coarse conflict is any two mutations to the same table. However, some implementations + * may consider some of these "absolute" conflicts as allowable by using finer-grained conflict + * resolution. 
For example, two different operations that both append new files to a table may + * be in "absolute" conflict but could be resolved automatically as a "safe conflict" by using + * a set of automatic implementation-defined conflict resolution rules. + */ + enum IsolationLevel { + + /** + * Reading the same table multiple times may result in different versions read of the same + * table. A commit can be completed as long as any tables changed externally do not conflict + * with any writes within this transaction. + */ + READ_COMMITTED, + + /** + * Reading the same table multiple times will result in the same view of that table. + * Different tables may come from different snapshots. A commit can be completed as + * long as any tables changed externally do not conflict with any writes within this + * transaction. + */ + REPEATED_READ, + + /** + * A commit will only succeed if there have been no meaningful changes to data read during + * the course of this transaction prior to commit. This imposes stricter read guarantees than Review comment: This is defined in more detail further down in this paragraph. This is a simplified introductory sentence to help guide the users. > any entities read during this transaction are also blocked from changing (via another transaction) post-read in ways that would influence the writes of this operation ########## File path: api/src/main/java/org/apache/iceberg/catalog/SupportsCatalogTransactions.java ########## @@ -0,0 +1,162 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.iceberg.catalog; + +import java.util.Set; + +/** + * Catalog methods for working with catalog-level transactions. + * + * <p>Catalog implementations are not required to support catalog-level transactional state. + * If they do, they may support one or more {@code IsolationLevel}s and one or more + * {@code LockingMode}s. + */ +public interface SupportsCatalogTransactions { + + /** + * The level of isolation for a catalog-level transaction. + * + * <p>Isolation covers both what data is read and what data can be written. + * + * <p>At all levels, data is only visible if it is either committed by another transaction or + * committed by a nested transaction within this catalog-level transaction. + * + * <p>Individual nested Table transactions may be "rebased" to expose updated versions of a + * table if the isolation level allows that behavior. + * + * <p>In the definitions of each isolation level, the concept of conflicting writes is Review comment: It's more complex than that and I think different implementations will do it differently, which is why I state that this will be implementation dependent. The types of legal conflict resolutions change depending on the isolation levels and how much work implementers want to put into things. My expectation is that initially the conflict resolution will be fairly minimal and people will mostly use serializable, which basically disallows conflict resolution.
In time, as we can reliably distinguish open-for-modify from open-for-scan, we will be able to get more sophisticated, but I don't want to set the bar for the API so high at the outset that no one can (or is willing to) implement the interface. ########## File path: api/src/main/java/org/apache/iceberg/catalog/SupportsCatalogTransactions.java ########## @@ -0,0 +1,162 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.iceberg.catalog; + +import java.util.Set; + +/** + * Catalog methods for working with catalog-level transactions. + * + * <p>Catalog implementations are not required to support catalog-level transactional state. + * If they do, they may support one or more {@code IsolationLevel}s and one or more + * {@code LockingMode}s. + */ +public interface SupportsCatalogTransactions { + + /** + * The level of isolation for a catalog-level transaction. + * + * <p>Isolation covers both what data is read and what data can be written. + * + * <p>At all levels, data is only visible if it is either committed by another transaction or + * committed by a nested transaction within this catalog-level transaction.
+ * + * <p>Individual nested Table transactions may be "rebased" to expose updated versions of a + * table if the isolation level allows that behavior. + * + * <p>In the definitions of each isolation level, the concept of conflicting writes is + * referenced. Conflicting writes are two mutations to the same object that happen concurrently. + * Depending on the particular implementation, the coarseness of this conflict may vary. The + * most coarse conflict is any two mutations to the same table. However, some implementations + * may consider some of these "absolute" conflicts as allowable by using finer-grained conflict + * resolution. For example, two different operations that both append new files to a table may + * be in "absolute" conflict but could be resolved automatically as a "safe conflict" by using + * a set of automatic implementation-defined conflict resolution rules. + */ + enum IsolationLevel { + + /** + * Reading the same table multiple times may result in different versions read of the same Review comment: All isolation level concepts are within the same transaction so yes. ########## File path: api/src/main/java/org/apache/iceberg/catalog/SupportsCatalogTransactions.java ########## @@ -0,0 +1,162 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. 
See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.iceberg.catalog; + +import java.util.Set; + +/** + * Catalog methods for working with catalog-level transactions. + * + * <p>Catalog implementations are not required to support catalog-level transactional state. + * If they do, they may support one or more {@code IsolationLevel}s and one or more + * {@code LockingMode}s. + */ +public interface SupportsCatalogTransactions { + + /** + * The level of isolation for a catalog-level transaction. + * + * <p>Isolation covers both what data is read and what data can be written. + * + * <p>At all levels, data is only visible if it is either committed by another transaction or + * committed by a nested transaction within this catalog-level transaction. + * + * <p>Individual nested Table transactions may be "rebased" to expose updated versions of a + * table if the isolation level allows that behavior. + * + * <p>In the definitions of each isolation level, the concept of conflicting writes is + * referenced. Conflicting writes are two mutations to the same object that happen concurrently. + * Depending on the particular implementation, the coarseness of this conflict may vary. The + * most coarse conflict is any two mutations to the same table. However, some implementations + * may consider some of these "absolute" conflicts as allowable by using finer-grained conflict + * resolution. For example, two different operations that both append new files to a table may + * be in "absolute" conflict but could be resolved automatically as a "safe conflict" by using + * a set of automatic implementation-defined conflict resolution rules. + */ + enum IsolationLevel { + + /** + * Reading the same table multiple times may result in different versions read of the same + * table. A commit can be completed as long as any tables changed externally do not conflict + * with any writes within this transaction. 
+ */ + READ_COMMITTED, + + /** + * Reading the same table multiple times will result in the same view of that table. + * Different tables may come from different snapshots. A commit can be completed as + * long as any tables changed externally do not conflict with any writes within this + * transaction. + */ + REPEATED_READ, Review comment: Agreed, will update.
