Hi Chin Wei,
Thank you for confirming this on 2.0.1. I am happy to hear that it never
happens there.
The performance will be improved when this PR (
https://github.com/apache/spark/pull/15219) is integrated.
Regards,
Kazuaki Ishizaki
From: Chin Wei Low
To: Kazuaki
Hi Kazuaki,
I print a debug log message right before I call collect, and compare its
timestamp against the job start log entry (which is available when debug
logging is turned on).
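For reference, the measurement can be sketched roughly as below. This is only an illustration in plain Python, not the code actually used: timed_action and its label argument are hypothetical names, and the action passed in stands in for something like the collect call on the cached dataframe. The wall-clock start time is what would be compared against the timestamp of the job start entry in the Spark debug log.

```python
import logging
import time

# Timestamped debug output, so the marker can be lined up with other logs.
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(levelname)s %(message)s",
)
log = logging.getLogger("collect_timing")


def timed_action(action, label="collect"):
    """Log a marker right before running `action` (hypothetical helper).

    Returns (result, wall_clock_start, elapsed_seconds).
    """
    wall_clock_start = time.time()   # comparable with log timestamps
    t0 = time.perf_counter()         # monotonic clock for the duration
    log.debug("about to call %s", label)
    result = action()
    elapsed = time.perf_counter() - t0
    log.debug("%s returned after %.3f s", label, elapsed)
    return result, wall_clock_start, elapsed
```

With a real session this would be called as timed_action(df.collect), where df is assumed to be the cached dataframe; the gap in question is then the difference between the logged marker and the job start entry.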
Anyway, I tested that in Spark 2.0.1 and never saw it happen. However, the query
on the cached DataFrame is still slightly slower than the one
Hi Chin Wei,
I am sorry for being late to reply.
Got it. Interesting behavior. How did you measure the time between the 1st and
2nd events?
Best Regards,
Kazuaki Ishizaki
From: Chin Wei Low
To: Kazuaki Ishizaki/Japan/IBM@IBMJP
Cc: user@spark.apache.org
Date: