Re: Spark-SQL - Concurrent Inserts Into Same Table Throws Exception

2023-07-30 Thread Mich Talebzadeh
OK, so as expected, the underlying database is Hive. Hive uses HDFS storage. You said you encountered limitations on concurrent writes. The ordering and limitations are introduced by the Hive metastore, so to speak. Since this is all happening through Spark, by default the implementation of the Hive metastore
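For context, the catalog and table layout being discussed here can be inspected from the same beeline session; the table name below is only a placeholder:

    -- should report 'hive' when the Hive metastore backs Spark's catalog
    -- (exact output can vary by Spark version)
    SET spark.sql.catalogImplementation;

    -- shows the table's provider, serde and HDFS location
    DESCRIBE FORMATTED my_table;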

Re: Spark-SQL - Concurrent Inserts Into Same Table Throws Exception

2023-07-30 Thread Patrick Tucci
Hi Mich and Pol, Thanks for the feedback. The database layer is Hadoop 3.3.5. The cluster restarted, so I lost the stack trace in the application UI. In the snippets I saved, it looks like the exception being thrown was from Hive. Given the feedback you've provided, I suspect the issue is with how

Re: Spark-SQL - Concurrent Inserts Into Same Table Throws Exception

2023-07-30 Thread Pol Santamaria
Hi Patrick, You can have multiple writers simultaneously writing to the same table in HDFS by utilizing an open table format with concurrency control. Several formats, such as Apache Hudi, Apache Iceberg, Delta Lake, and Qbeast Format, offer this capability. All of them provide advanced features
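As an illustration of Pol's suggestion, here is a minimal sketch using Delta Lake through Spark SQL; the table and path names are hypothetical, and it assumes the Delta jars plus spark.sql.extensions and spark.sql.catalog.spark_catalog are already configured on the Thrift Server:

    -- create the table with an open table format that provides
    -- optimistic concurrency control for writers
    CREATE TABLE events (id BIGINT, payload STRING)
    USING delta
    LOCATION 'hdfs:///warehouse/events';

    -- appends from separate beeline sessions are reconciled by the
    -- format's commit protocol rather than by Hive metastore locks
    INSERT INTO events SELECT id, payload FROM staging_batch;

The same pattern applies to Iceberg or Hudi with their respective catalog and extension settings.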

Re: Spark-SQL - Concurrent Inserts Into Same Table Throws Exception

2023-07-29 Thread Mich Talebzadeh
It is not Spark SQL that throws the error. It is the underlying database or layer that throws the error. Spark acts as an ETL tool. What is the underlying DB where the table resides? Is concurrency supported? Please send the error to this list. HTH, Mich Talebzadeh, Solutions

Spark-SQL - Concurrent Inserts Into Same Table Throws Exception

2023-07-29 Thread Patrick Tucci
Hello, I'm building an application on Spark SQL. The cluster is set up in standalone mode with HDFS as storage. The only Spark application running is the Spark Thrift Server using FAIR scheduling mode. Queries are submitted to Thrift Server using beeline. I have multiple queries that insert rows
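For reference, the workload described here looks roughly like the following sketch (table names are placeholders), with each INSERT submitted from its own beeline connection to the Thrift Server while the others are still running:

    -- beeline session 1
    INSERT INTO sales SELECT * FROM staging_a;

    -- beeline session 2, started before session 1 finishes
    INSERT INTO sales SELECT * FROM staging_b;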