But from the user's perspective, optimization is not run right? So it is still lazy.
On Fri, Jun 8, 2018 at 12:35 PM Li Jin <ice.xell...@gmail.com> wrote: > Hi All, > > Sorry for the long email title. I am a bit surprised to find that the > current optimizer rule "ConvertToLocalRelation" causes expressions to be > eager-evaluated in planning phase, this can be demonstrated with the > following code: > > scala> val myUDF = udf((x: String) => { println("UDF evaled"); "result" }) > > myUDF: org.apache.spark.sql.expressions.UserDefinedFunction = > UserDefinedFunction(<function1>,StringType,Some(List(StringType))) > > > scala> val df = Seq(("test", 1)).toDF("s", "i").select(myUDF($"s")) > > df: org.apache.spark.sql.DataFrame = [UDF(s): string] > > > scala> println(df.queryExecution.optimizedPlan) > > UDF evaled > > LocalRelation [UDF(s)#9] > > This is somewhat unexpected to me because of Spark's lazy execution model. > > I am wondering if this behavior is by design? > > Thanks! > Li > > >