Zef Wolffs created LIVY-767:
-------------------------------

             Summary: Spark-Cassandra-Connector's joinWithCassandraTable makes Spark use only two executors
                 Key: LIVY-767
                 URL: https://issues.apache.org/jira/browse/LIVY-767
             Project: Livy
          Issue Type: Bug
          Components: Interpreter
    Affects Versions: 0.7.0
         Environment: Spark and Livy running in containers on Kubernetes. Deployed using Microsoft's Helm chart: https://hub.helm.sh/charts/microsoft/spark.
            Reporter: Zef Wolffs
I have a script that uses joinWithCassandraTable, a method provided by the spark-cassandra-connector. When submitted directly through spark-submit, this script usually claims all available executors (about 25 in my setup). However, when I run the same script through Livy, Spark claims only two executors, which makes the job much slower. The script is the following:

{noformat}
import java.sql.Timestamp
import com.datastax.spark.connector._

// Defines the partition key
case class TimeSeriesContainerID(timeseriescontainerid: Int)

val timeSeriesContainerIDofInterest =
  sc.parallelize(1 to 20).map(TimeSeriesContainerID(_))

val rdd = timeSeriesContainerIDofInterest
  .joinWithCassandraTable[(Timestamp, Int)]("test_ts_partitions", "ts_0")

val aggregatedRDD = rdd
  .select("timestamp", "tsvalue")
  .map(s => s._2)
  .reduceByKey((a, b) => a + b)
  .collect
  .head
{noformat}

I am happy to provide more information as requested.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
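One possible explanation (an assumption, not confirmed by this report) is that the Livy session is created with its own default executor settings rather than inheriting the cluster defaults that spark-submit picks up. A minimal sketch of a Livy {{POST /sessions}} request body that pins the executor count explicitly; the numbers here are placeholders matching the ~25 executors mentioned above:

{noformat}
{
  "kind": "spark",
  "numExecutors": 25,
  "executorCores": 2,
  "conf": {
    "spark.dynamicAllocation.enabled": "false"
  }
}
{noformat}

If a session created with explicit settings like these claims all executors, that would point at session configuration rather than at the connector itself.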