Questions related to writing data to S3

2017-04-23 Thread Richard Hanson
I have a streaming job that writes data to S3. I know there are saveAs functions that help write data to S3, but they bundle all the elements and only then write them out to S3. So my first question: is there any way to make the saveAs functions write data in batches, or as single elements, instead of as a whole bundle?
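To make the question concrete, here is a minimal sketch of the behaviour being asked for: elements are grouped into fixed-size batches and each batch is handed to a writer as soon as it is full, rather than after the whole bundle has been collected. It is plain Python and does not use the saveAs functions or any S3 client. The names `batched`, `write_in_batches`, `write_to_fake_bucket`, and the `part-NNNNN` key scheme are hypothetical, and the dictionary `fake_bucket` is only an in-memory stand-in for a bucket.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List


def batched(elements: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(elements)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def write_in_batches(
    elements: Iterable[str],
    batch_size: int,
    write_batch: Callable[[int, List[str]], None],
) -> int:
    """Call write_batch once per batch and return the number of batches written."""
    count = 0
    for index, batch in enumerate(batched(elements, batch_size)):
        write_batch(index, batch)
        count += 1
    return count


# Hypothetical in-memory stand-in for a bucket: object key -> object body.
fake_bucket: Dict[str, str] = {}


def write_to_fake_bucket(index: int, batch: List[str]) -> None:
    """Store one batch as one object, one element per line (assumed layout)."""
    fake_bucket[f"part-{index:05d}"] = "\n".join(batch)
```

Calling `write_in_batches(stream, 1, write_to_fake_bucket)` would correspond to the single-element case, while a larger `batch_size` trades fewer objects for more buffering per write.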

Spark-shell's performance

2017-04-17 Thread Richard Hanson
I am playing with some data using the (standalone) spark-shell (Spark version 1.6.0), started by executing `spark-shell`. The flow is simple, a bit like cp: it basically moves 100k local files (the max size is 190k) to S3. Memory is configured as below: export SPARK_DRIVER_MEMORY=8192M export