Hi Makatun,

For 2, I guess `cache` will break up the logical plan and force it to be
analyzed.
For 3, I have a similar observation here
https://medium.com/@manuzhang/the-hidden-cost-of-spark-withcolumn-8ffea517c015.
Each `withColumn` call forces the logical plan to be re-analyzed, which is
not free. `RuleExecutor.dumpTimeSpent` prints the time spent in analysis,
and turning on DEBUG logging will give you much more detail.
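A minimal sketch of the pattern in question (assuming a DataFrame `df` is already in scope; the column names and loop count are just for illustration):

```scala
import org.apache.spark.sql.functions.{col, lit}
import org.apache.spark.sql.catalyst.rules.RuleExecutor

// Anti-pattern: each withColumn returns a new DataFrame whose logical
// plan is analyzed again, so N calls pay for N analysis passes.
var slow = df
for (i <- 1 to 100) {
  slow = slow.withColumn(s"c$i", lit(i))
}

// Alternative: add all columns in a single select, which is analyzed once.
val fast = df.select(
  df.columns.map(col) ++ (1 to 100).map(i => lit(i).as(s"c$i")): _*
)

// Dump cumulative time spent in each analyzer/optimizer rule.
println(RuleExecutor.dumpTimeSpent())
```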

Thanks,
Manu Zhang

On Mon, Aug 20, 2018 at 10:25 PM antonkulaga <antonkul...@gmail.com> wrote:

> makatun, did you try to test something more complex, like
> dataframe.describe
> or PCA?
>
>
