Re: Spark SQL is slower when DataFrame is cache in Memory

2016-10-27 Thread Kazuaki Ishizaki
Hi Chin Wei, Thank you for confirming this on 2.0.1. I am happy to hear it never happens. The performance will be improved when this PR (https://github.com/apache/spark/pull/15219) is integrated. Regards, Kazuaki Ishizaki From: Chin Wei Low To: Kazuaki

Re: Spark SQL is slower when DataFrame is cache in Memory

2016-10-25 Thread Chin Wei Low
Hi Kazuaki, I printed a debug log line right before calling collect and compared its timestamp against the job start log (which is available when debug logging is turned on). Anyway, I tested that in Spark 2.0.1 and never saw it happen. However, the query on the cached DataFrame is still slightly slower than the one
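
The measurement approach described in this message (log a marker right before the action, then compare its timestamp with the job start log line) can be sketched roughly as follows. This is only an illustration, not the code used in the thread: the helper name `timed_action` and the logger name are hypothetical, and a plain Python callable stands in for the Spark action so the sketch runs without a cluster.

```python
import logging
import time

# Timestamps in the log format make it possible to line this marker up
# against other log lines (for example, a job-start line from Spark).
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(levelname)s %(message)s",
)
log = logging.getLogger("collect-timing")  # hypothetical logger name


def timed_action(label, action):
    """Log a marker right before running `action`, then report elapsed time.

    `action` is any zero-argument callable; with Spark it could be a bound
    method such as `df.collect` (an assumption, not taken from the thread).
    """
    log.debug("about to run %s", label)  # marker to compare with the job-start log
    start = time.perf_counter()
    result = action()
    elapsed = time.perf_counter() - start
    log.debug("%s finished in %.3f s", label, elapsed)
    return result, elapsed


# Stand-in for a Spark action so the sketch is runnable on its own.
rows, seconds = timed_action("dummy collect", lambda: [i * i for i in range(1000)])
```

With a real DataFrame, one would presumably pass `df.collect` as `action` and compare the timestamp of the "about to run" line with the job start line in the driver's debug log.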

Re: Spark SQL is slower when DataFrame is cache in Memory

2016-10-24 Thread Kazuaki Ishizaki
Hi Chin Wei, I am sorry for the late reply. Got it. Interesting behavior. How did you measure the time between the 1st and 2nd events? Best Regards, Kazuaki Ishizaki From: Chin Wei Low To: Kazuaki Ishizaki/Japan/IBM@IBMJP Cc: user@spark.apache.org Date: