Hi All,

I have a use case where my SchemaRDD is cached in memory and I want to launch
executors only on a partition that I already know (prime use-case of

I tried something like the following:

val partitionIdx = 2
val schemaRdd = hiveContext.table("myTable") //myTable is cached in memory
val partitionPrunedRDD = new PartitionPrunedRDD(schemaRdd, _ ==
val partitionSchemaRDD = hiveContext.applySchema(partitionPrunedRDD,
hiveContext.hql("select * from myTablePartition2 where id=10001")

With this approach, a query that I expect to run in about 500 ms takes
3000-4000 ms. I think this happens because calling "applySchema" on the
pruned RDD loses the original queryExecution plan.

If I also call partitionSchemaRDD.cache, I do get the 500 ms performance,
but then the same partition's data is cached twice.

My question: can we create a class along the lines of
PartitionPruningCachedSchemaRDD that prunes the partitions of
InMemoryColumnarTableScan's RDD[CachedBatch] and launches executors on just
the selected partition(s)?
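
To make the question concrete, here is a minimal sketch of the pruning behaviour I am after, shown on a plain cached RDD rather than on a cached table. It assumes Spark's PartitionPruningRDD.create developer API and a local SparkContext; the object name PruneSketch, the helper pruneTo and the sample data are hypothetical and only for illustration. It does not touch InMemoryColumnarTableScan, so it does not answer the question itself.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.{PartitionPruningRDD, RDD}

object PruneSketch {
  // Hypothetical helper: keep only the partition with the given index.
  // The resulting RDD has a single partition, so a job on it schedules
  // a single task against the parent's partition `partitionIdx`.
  def pruneTo[T](rdd: RDD[T], partitionIdx: Int): RDD[T] =
    PartitionPruningRDD.create(rdd, (idx: Int) => idx == partitionIdx)

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for this sketch.
    val conf = new SparkConf().setMaster("local[2]").setAppName("prune-sketch")
    val sc = new SparkContext(conf)
    try {
      // Stand-in for the cached table: 4 partitions, cached in memory.
      val data = sc.parallelize(1 to 100, 4).cache()
      data.count() // run one job so the cache is populated

      val pruned = pruneTo(data, 2)
      println(s"partitions after pruning: ${pruned.partitions.length}")
      println(s"rows in pruned partition: ${pruned.count()}")
    } finally {
      sc.stop()
    }
  }
}
```

The pruned RDD reads from the parent's cached partition instead of caching it again, which is the property I would like to keep for the columnar cached table as well.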

