Thanks, Michael.
2015-01-08 6:04 GMT+08:00 Michael Armbrust mich...@databricks.com:
The cache command caches the entire table, with each column stored in its
own byte buffer. When querying the data, only the columns that you are
asking for are scanned in memory. I'm not sure what mechanism
Hi,
I've heard a lot of complaints about Spark's pull-style shuffle. Is
there any plan to support push-style shuffle in the near future?
Currently, the shuffle phase must complete before the next stage
starts. In Impala, by contrast, the shuffled data is reportedly
streamed to the next