to a long stop-the-world GC
> pause. This should not happen on the machine running the driver program if
> all that you are doing is reading data from HDFS, performing a bunch of
> transformations and writing the result back into HDFS.
>
> Perhaps, the program is not actually using Spa
I'm having trouble with a Spark SQL job in which I run a series of SQL
transformations on data loaded from HDFS.
The first two stages load data from the HDFS input without issues, but later
stages that require shuffles cause the driver memory to keep rising until
it is exhausted, and then the driver
>
>
> On 13 June 2016 at 20:45, Khaled Hammouda <khaled.hammo...@kik.com> wrote:
Hi Michael,
Thanks for the suggestion to use the Spark 2.0 preview. I just downloaded the
preview and tried it, but I’m running into exactly the same issue.
Khaled
> On Jun 13, 2016, at 2:58 PM, Michael Armbrust wrote:
>
> You might try with the Spark 2.0 preview. We
Great! That's what I gathered from the thread titled "Serial batching with
Spark Streaming", but thanks for confirming it again.
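A toy model may help picture the serial batching discussed here (the guarantee that batch t+1 is processed only after batch t). This is plain Python, not Spark code: `schedule_serial` and `BatchRun` are hypothetical names. The model assumes each batch becomes ready at the end of its batch interval and that only one batch job runs at a time. It shows how a single slow batch delays every batch queued behind it.

```python
from dataclasses import dataclass


@dataclass
class BatchRun:
    batch: int      # batch index t
    arrival: float  # time the batch interval closes and the batch is ready
    start: float    # time processing actually begins
    end: float      # time processing finishes


def schedule_serial(batch_interval, processing_times):
    """Hypothetical toy model (not Spark's scheduler): batch t is ready at
    t * batch_interval, but it can only start once batch t-1 has finished."""
    runs = []
    previous_end = 0.0
    for t, cost in enumerate(processing_times):
        arrival = t * batch_interval
        start = max(arrival, previous_end)  # wait for the previous batch to finish
        end = start + cost
        runs.append(BatchRun(t, arrival, start, end))
        previous_end = end
    return runs


if __name__ == "__main__":
    # 1 s batch interval; batch 1 is slow, so batches 2 and 3 queue behind it.
    for run in schedule_serial(1.0, [0.5, 2.5, 0.5, 0.5]):
        delay = run.start - run.arrival
        print(f"batch {run.batch}: ready {run.arrival:.1f}s, "
              f"start {run.start:.1f}s, end {run.end:.1f}s, delay {delay:.1f}s")
```

With a 1 s interval and a 2.5 s batch 1, batches 2 and 3 start at 3.5 s and 4.0 s instead of 2.0 s and 3.0 s, so the scheduling delay accumulates until processing catches up.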
On 6 July 2015 at 15:31, Tathagata Das <t...@databricks.com> wrote:
Yes, the RDD of batch t+1 will be processed only after the RDD of batch t has
been processed. Unless there