Re: Spark SQL: The cached columnar table is not columnar?

2015-01-07 Thread
Thanks Michael.

2015-01-08 6:04 GMT+08:00 Michael Armbrust mich...@databricks.com:
> The cache command caches the entire table, with each column stored in its own byte buffer. When querying the data, only the columns that you are asking for are scanned in memory. I'm not sure what mechanism

When will Spark support push-style shuffle?

2015-01-07 Thread
Hi, I've heard a lot of complaints about Spark's pull-style shuffle. Is there any plan to support push-style shuffle in the near future? Currently, the shuffle phase must be completed before the next stage starts, whereas in Impala, it is said, the shuffled data is streamed to the next