But from the user's perspective, optimization is not run right? So it is
still lazy.


On Fri, Jun 8, 2018 at 12:35 PM Li Jin <ice.xell...@gmail.com> wrote:

> Hi All,
>
> Sorry for the long email title. I am a bit surprised to find that the
> current optimizer rule "ConvertToLocalRelation" causes expressions to be
> eager-evaluated in planning phase, this can be demonstrated with the
> following code:
>
> scala> val myUDF = udf((x: String) => { println("UDF evaled"); "result" })
>
> myUDF: org.apache.spark.sql.expressions.UserDefinedFunction =
> UserDefinedFunction(<function1>,StringType,Some(List(StringType)))
>
>
> scala> val df = Seq(("test", 1)).toDF("s", "i").select(myUDF($"s"))
>
> df: org.apache.spark.sql.DataFrame = [UDF(s): string]
>
>
> scala> println(df.queryExecution.optimizedPlan)
>
> UDF evaled
>
> LocalRelation [UDF(s)#9]
>
>  This is somewhat unexpected to me because of Spark's lazy execution model.
>
> I am wondering if this behavior is by design?
>
> Thanks!
> Li
>
>
>

Reply via email to