Hello, I'm building an application on Spark SQL. The cluster is set up in standalone mode with HDFS as storage. The only Spark application running is the Spark Thrift Server, which uses FAIR scheduling mode. Queries are submitted to the Thrift Server using beeline.
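In case it's relevant, the Thrift Server is started roughly like this (paths, master URL, and the allocation file location are illustrative, not my exact setup):

```shell
# Start the Spark Thrift Server in FAIR scheduling mode.
# Illustrative only: the master URL and fairscheduler.xml path are placeholders.
$SPARK_HOME/sbin/start-thriftserver.sh \
  --master spark://10.0.50.1:7077 \
  --conf spark.scheduler.mode=FAIR \
  --conf spark.scheduler.allocation.file=/path/to/fairscheduler.xml
```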
I have multiple queries that insert rows into the same table (EventClaims). These queries work fine when run sequentially; however, some individual queries don't fully utilize the resources available on the cluster, so I would like to run all of them concurrently to improve utilization. When I attempt this, tasks eventually begin to fail. The stack trace is pretty long, but here's what looks like the most relevant part:

org.apache.spark.sql.errors.QueryExecutionErrors$.taskFailedWhileWritingRowsError(QueryExecutionErrors.scala:788)
org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.SparkException: Job aborted due to stage failure: Task 28 in stage 128.0 failed 4 times, most recent failure: Lost task 28.3 in stage 128.0 (TID 6578) (10.0.50.2 executor 0): org.apache.spark.SparkException: [TASK_WRITE_FAILED] Task failed while writing rows to hdfs://10.0.50.1:8020/user/spark/warehouse/eventclaims.

Is it possible to have multiple concurrent writers to the same table with Spark SQL? Is there any way to make this work?

Thanks for the help,
Patrick
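P.S. For reference, the concurrent submissions look roughly like this (simplified; the JDBC URL and the staging_a/staging_b source tables are placeholders, not my real schema):

```shell
# Illustrative only: each beeline session runs its own INSERT into EventClaims,
# so several writers target the same table at once.
beeline -u jdbc:hive2://10.0.50.1:10000 \
  -e "INSERT INTO EventClaims SELECT * FROM staging_a" &
beeline -u jdbc:hive2://10.0.50.1:10000 \
  -e "INSERT INTO EventClaims SELECT * FROM staging_b" &
wait
```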