Zef Wolffs created LIVY-767:
-------------------------------

             Summary: Spark-Cassandra-Connector's joinWithCassandraTable makes 
Spark use only two executors
                 Key: LIVY-767
                 URL: https://issues.apache.org/jira/browse/LIVY-767
             Project: Livy
          Issue Type: Bug
          Components: Interpreter
    Affects Versions: 0.7.0
         Environment: Spark and Livy running in containers on Kubernetes. 
Deployed using Microsoft's Helm chart: 
https://hub.helm.sh/charts/microsoft/spark.
            Reporter: Zef Wolffs


I have a script that uses joinWithCassandraTable, a method provided by the 
spark-cassandra-connector. When run directly through spark-submit, this script 
usually claims all available executors (about 25 in my setup) for this method. 
However, when I run it through Livy, Spark only claims two executors, which 
makes the job much slower.
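For reference, the Livy session was created with the defaults. A session can request executors explicitly through the REST API (the value 25 below just mirrors my cluster size and is only an illustration), e.g.:

{noformat}
POST /sessions
{
  "kind": "spark",
  "numExecutors": 25,
  "conf": {
    "spark.executor.instances": "25"
  }
}
{noformat}

Even so, I would expect the behaviour to match spark-submit without having to pin the executor count per session.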

 

The script is the following: 

 
{noformat}
import java.sql.Timestamp
import com.datastax.spark.connector._

// Case class defining the partition key of the Cassandra table
case class TimeSeriesContainerID(timeseriescontainerid: Int)

// Partition keys of interest
val timeSeriesContainerIDofInterest = sc.parallelize(1 to 20).map(TimeSeriesContainerID(_))

// Join the partition keys against the Cassandra table
val rdd = timeSeriesContainerIDofInterest.joinWithCassandraTable[(Timestamp, Int)]("test_ts_partitions", "ts_0")

// Sum the values per timestamp and take the first result
val aggregatedRDD = rdd.select("timestamp", "tsvalue").map(s => s._2).reduceByKey((a, b) => a + b).collect.head
{noformat}
 
 
I am happy to provide more information as requested.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)