Hi All,

My question is about the lazy evaluation mode of SchemaRDD, I guess. I know
lazy evaluation is generally a good thing, but I still have this requirement.

For example, here is the first SchemaRDD, named results, for the query
select * from table where num > 1 and num < 4.
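Roughly, I create it like this (just a sketch of my setup; the SQLContext
and the table name "table" are my own, nothing special):

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext._
// "table" has columns num, str1, str2 and is already registered
val results = sql("SELECT * FROM table WHERE num > 1 AND num < 4")

Evaluating results in the shell prints: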

results: org.apache.spark.sql.SchemaRDD =
SchemaRDD[59] at RDD at SchemaRDD.scala:103
== Query Plan ==
== Physical Plan ==
Filter ((num#0 > 1) && (num#0 < 4))
 ExistingRdd [num#0,str1#1,str2#2], MapPartitionsRDD[4] at mapPartitions at
basicOperators.scala:208

Then I create the second SchemaRDD, results1, by selecting from the first
one: select num, str1 from result.
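Again roughly (assuming I registered results under the table name "result"):

results.registerAsTable("result")
val results1 = sql("SELECT num, str1 FROM result")

The plan for results1 is: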

results1: org.apache.spark.sql.SchemaRDD =
SchemaRDD[60] at RDD at SchemaRDD.scala:103
== Query Plan ==
== Physical Plan ==
Project [num#0,str1#1]
 Filter ((num#0 > 1) && (num#0 < 4))
  ExistingRdd [num#0,str1#1,str2#2], MapPartitionsRDD[4] at mapPartitions
at basicOperators.scala:208

What I actually want is for the second RDD's plan to be based on results,
not on the original table.

How can I create a new SchemaRDD whose plan starts from the previous RDD?
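Is caching the intended mechanism here? This is just a guess on my part:

cacheTable("result")  // guessing: cache the intermediate table so later plans read from it
val results1 = sql("SELECT num, str1 FROM result")
// hoping results1's physical plan now starts from the cached data

Or is there some other way to cut the plan off at the previous SchemaRDD?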

Thanks,
Tim
