george-zubrienko opened a new issue, #1202: URL: https://github.com/apache/polaris/issues/1202
### Describe the bug

After this [change](https://github.com/apache/polaris/commit/ca38c97111946737ed7df0abcb070d3670c519e6) was introduced, when multiple concurrent requests to create or drop tables, or even simply to read catalog info, are sent to the web host, it quite often throws this error:

```
Caused by: java.sql.SQLException: Query failed (#20250319_060102_06461_43kzu): Failed to drop table 'staging_custinvoicejour__2025_03_19_06_01_01_3f698ce2_58ab_4f81_8892_66e014a7a927'
    at io.trino.jdbc.ResultUtils.resultsException(ResultUtils.java:33)
    at io.trino.jdbc.AsyncResultIterator.lambda$new$1(AsyncResultIterator.java:93)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)
    Suppressed: zio.Cause$FiberTrace: Exception in thread "zio-fiber-1997042971" java.sql.SQLException: Query failed (#20250319_060102_06461_43kzu): Failed to drop table 'staging_custinvoicejour__2025_03_19_06_01_01_3f698ce2_58ab_4f81_8892_66e014a7a927'
        at com.sneaksanddata.arcane.framework.services.merging.JdbcMergeServiceClient.executeBatchQuery(JdbcMergeServiceClient.scala:233)
        at com.sneaksanddata.arcane.framework.services.merging.JdbcMergeServiceClient.executeBatchQuery(JdbcMergeServiceClient.scala:234)
        at com.sneaksanddata.arcane.framework.services.merging.JdbcMergeServiceClient.executeBatchQuery(JdbcMergeServiceClient.scala:235)
        at com.sneaksanddata.arcane.framework.services.streaming.processors.batch_processors.DisposeBatchProcessor.process(DisposeBatchProcessor.scala:25)
        at com.sneaksanddata.arcane.framework.services.streaming.processors.batch_processors.DisposeBatchProcessor.process(DisposeBatchProcessor.scala:26)
        at com.sneaksanddata.arcane.framework.services.streaming.processors.batch_processors.DisposeBatchProcessor.process(DisposeBatchProcessor.scala:27)
        at com.sneaksanddata.arcane.microsoft_synapse_link.services.app.StreamRunnerServiceCdm.run(StreamRunnerServiceCdm.scala:45)
    Suppressed: io.trino.jdbc.$internal.client.FailureException: Failed to drop table 'staging_custinvoicejour__2025_03_19_06_01_01_3f698ce2_58ab_4f81_8892_66e014a7a927'
        Suppressed: io.trino.jdbc.$internal.client.FailureException: Server error: PersistenceException: Exception [EclipseLink-4002] (Eclipse Persistence Services - 4.0.5.v202412231137-a96b873527f305f932543045c8679bb1de8d3a43): org.eclipse.persistence.exceptions.DatabaseException
        Internal Exception: org.postgresql.util.PSQLException: ERROR: could not serialize access due to read/write dependencies among transactions
          Detail: Reason code: Canceled on identification as a pivot, during conflict out checking.
          Hint: The transaction might succeed if retried.
        Error Code: 0
        Call: SELECT CATALOGID, ID, ENTITYVERSION, GRANTRECORDSVERSION, VERSION FROM ENTITIES_CHANGE_TRACKING WHERE ((CATALOGID = ?) AND (ID = ?))
            bind => [2 parameters bound]
        Query: ReadObjectQuery(referenceClass=ModelEntityChangeTracking sql="SELECT CATALOGID, ID, ENTITYVERSION, GRANTRECORDSVERSION, VERSION FROM ENTITIES_CHANGE_TRACKING WHERE ((CATALOGID = ?) AND (ID = ?))")
Caused by: io.trino.jdbc.$internal.client.FailureException: Failed to drop table 'staging_custinvoicejour__2025_03_19_06_01_01_3f698ce2_58ab_4f81_8892_66e014a7a927'
    at io.trino.plugin.iceberg.catalog.rest.TrinoRestCatalog.dropTable(TrinoRestCatalog.java:467)
    at io.trino.plugin.iceberg.IcebergMetadata.dropTable(IcebergMetadata.java:2391)
    at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorMetadata.dropTable(ClassLoaderSafeConnectorMetadata.java:452)
    at io.trino.tracing.TracingConnectorMetadata.dropTable(TracingConnectorMetadata.java:388)
    at io.trino.metadata.MetadataManager.dropTable(MetadataManager.java:1062)
```

### To Reproduce

1. Deploy Polaris 0.9 with a Postgres 15.10 metastore backend, using the `persistence.xml` shown below:

```
<persistence version="2.0" xmlns="http://java.sun.com/xml/ns/persistence"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://java.sun.com/xml/ns/persistence http://java.sun.com/xml/ns/persistence/persistence_2_0.xsd">
  <persistence-unit name="polaris" transaction-type="RESOURCE_LOCAL">
    <provider>org.eclipse.persistence.jpa.PersistenceProvider</provider>
    <class>org.apache.polaris.jpa.models.ModelEntity</class>
    <class>org.apache.polaris.jpa.models.ModelEntityActive</class>
    <class>org.apache.polaris.jpa.models.ModelEntityChangeTracking</class>
    <class>org.apache.polaris.jpa.models.ModelEntityDropped</class>
    <class>org.apache.polaris.jpa.models.ModelGrantRecord</class>
    <class>org.apache.polaris.jpa.models.ModelPrincipalSecrets</class>
    <class>org.apache.polaris.jpa.models.ModelSequenceId</class>
    <shared-cache-mode>NONE</shared-cache-mode>
    <properties>
      <property name="jakarta.persistence.jdbc.url" value="jdbc:postgresql://..:5432/{realm}"/>
      <property name="jakarta.persistence.jdbc.user" value="..."/>
      <property name="jakarta.persistence.jdbc.password" value="..."/>
      <property name="jakarta.persistence.schema-generation.database.action" value="create"/>
      <property name="eclipselink.logging.level.sql" value="OFF"/>
      <property name="eclipselink.logging.parameters" value="false"/>
      <property name="eclipselink.persistence-context.flush-mode" value="auto"/>
      <property name="eclipselink.connection-pool.default.initial" value="1" />
      <property name="eclipselink.connection-pool.default.min" value="1" />
      <property name="eclipselink.connection-pool.default.max" value="32" />
      <property name="eclipselink.session.customizer" value="org.apache.polaris.extension.persistence.impl.eclipselink.PolarisEclipseLinkSessionCustomizer" />
      <property name="eclipselink.transaction.join-existing" value="true" />
    </properties>
  </persistence-unit>
</persistence>
```

2. Deploy the Polaris server via the Helm chart with **autoscaling enabled** and a minimum of 2 replicas, then bootstrap using the admin tool of the same version, as per the docs.
3. Run 30 parallel statements (create / drop / select from), each against a unique table, for 5 minutes and watch for failures.

### Actual Behavior

Around 90% of the statements succeed. `DROP` statements sometimes fail with the error shown above.

### Expected Behavior

No errors thrown.

### Additional context

_No response_

### System information

Polaris v0.9, commit from Mar 17, Postgres 15.10 (Aurora), container build deployed on EKS 1.29 from a Helm chart built from the same commit.
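
The PostgreSQL hint in the trace says the transaction might succeed if retried, so one possible mitigation is to retry on serialization failures. The following is only a minimal sketch, not Polaris code: `SerializationRetry`, `withRetry`, and `isSerializationFailure` are hypothetical names. It assumes the failing operation is safe to repeat and that a `java.sql.SQLException` carrying SQLSTATE `40001` (PostgreSQL's `serialization_failure`) is reachable in the cause chain. That second assumption may not hold on the Trino client side, where only the wrapped server message is visible.

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of Polaris, EclipseLink, or Trino.
final class SerializationRetry {
    // SQLSTATE that PostgreSQL reports for "could not serialize access ...".
    static final String SERIALIZATION_FAILURE = "40001";

    // Walks the cause chain looking for an SQLException with SQLSTATE 40001.
    static boolean isSerializationFailure(Throwable error) {
        for (Throwable t = error; t != null; t = t.getCause()) {
            if (t instanceof SQLException
                    && SERIALIZATION_FAILURE.equals(((SQLException) t).getSQLState())) {
                return true;
            }
        }
        return false;
    }

    // Runs op, retrying only on serialization failures, up to maxAttempts in total.
    // The delay grows linearly with the attempt number (assumed policy, tune as needed).
    static <T> T withRetry(Callable<T> op, int maxAttempts, long baseDelayMs) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts || !isSerializationFailure(e)) {
                    throw e; // not retryable, or out of attempts
                }
                Thread.sleep(baseDelayMs * attempt);
            }
        }
    }
}
```

A caller would wrap the whole transaction, not a single statement, for example `SerializationRetry.withRetry(() -> runTransaction(), 5, 50L)`, where `runTransaction` is a placeholder for the transactional unit of work. Any other error is rethrown immediately.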
