Pinging back to see if anybody could provide me with some pointers on how
to stream/batch JSON-to-ORC conversion in Spark SQL, or explain why I get an
OOM with such a small memory footprint?
Thanks,
Alec
On Wed, Nov 15, 2017 at 11:03 AM, Alec Swan wrote:
Thanks Steve and Vadim for the feedback.
@Steve, are you suggesting creating a custom receiver and somehow piping it
through Spark Streaming/Spark SQL? Or are you suggesting creating smaller
datasets from the stream and using my original code to process smaller
datasets? It'd be very helpful for
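If the "smaller datasets" route is taken, one pre-processing option (an assumption on my part, not something Steve prescribed) is to split a large line-delimited JSON file into bounded chunk files with plain Java I/O before handing each chunk to Spark, so no single job has to hold the whole input:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class JsonChunker {
    // Split a line-delimited JSON file into chunk files of at most
    // linesPerChunk records each; returns the chunk paths in order.
    public static List<Path> split(Path input, Path outDir, int linesPerChunk)
            throws IOException {
        List<Path> chunks = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(input)) {
            String line = reader.readLine();
            while (line != null) {
                Path chunk = outDir.resolve("chunk-" + chunks.size() + ".json");
                try (PrintWriter out = new PrintWriter(Files.newBufferedWriter(chunk))) {
                    // Copy up to linesPerChunk records into this chunk file.
                    for (int i = 0; i < linesPerChunk && line != null; i++) {
                        out.println(line);
                        line = reader.readLine();
                    }
                }
                chunks.add(chunk);
            }
        }
        return chunks;
    }
}
```

Each resulting chunk can then be converted with the original JSON-to-ORC code, one Spark job at a time.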
There's a lot of off-heap memory involved in decompressing Snappy and
compressing ZLib.
Since you're running with `local[*]`, you process multiple tasks
simultaneously, so they all might consume memory at once.
I don't think that increasing the heap will help, since it looks like you're
hitting system memory limits.
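Following Steve's point, one mitigation (my sketch, not his prescription) is to cap how many tasks run concurrently in local mode, so fewer Snappy/ZLib buffers are held off-heap at the same time; the core count of 2 below is purely illustrative:

```java
import org.apache.spark.sql.SparkSession;

public class LimitedParallelism {
    public static void main(String[] args) {
        // "local[2]" caps Spark at 2 concurrent tasks instead of one
        // per CPU core, bounding simultaneous codec buffer usage.
        SparkSession spark = SparkSession.builder()
                .master("local[2]")
                .appName("json-to-orc")
                .getOrCreate();
    }
}
```

The trade-off is throughput: fewer concurrent tasks means a longer wall-clock conversion for the same input.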
On 14 Nov 2017, at 15:32, Alec Swan wrote:
But I wonder if there is a way to stream/batch the content of JSON file in
order to convert it to ORC piecemeal and avoid reading the whole JSON file in
memory in the first place?
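One batched approach (a sketch under my own assumptions; field names and paths are illustrative, not from the thread) is to supply the schema up front, which skips the schema-inference pass that would otherwise scan the whole JSON input just to discover field types, and to keep the input line-delimited so Spark can split it across tasks:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.StructType;

public class JsonToOrcBatched {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[*]").appName("json-to-orc").getOrCreate();

        // Explicit schema: avoids a full read of the input for inference.
        StructType schema = new StructType()
                .add("id", "long")          // illustrative fields
                .add("payload", "string");

        Dataset<Row> json = spark.read()
                .schema(schema)
                .json("/data/input");       // illustrative path

        json.write()
                .format("orc")
                .option("compression", "zlib")
                .save("/data/output");      // illustrative path
    }
}
```

Note that this only helps for line-delimited JSON; a single multi-line JSON document is not splittable, so a ~1GB file of that shape must still be read whole by one task.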
That is what
Thanks all. I am not submitting a Spark job explicitly. Instead, I am using
the Spark library functionality embedded in my web service, as shown in the
code I included in the previous email. So, effectively, Spark SQL runs in
the web service's JVM. Therefore, the --driver-memory option would not (and
If you are running Spark with local[*] as master, there will be a single
process whose memory will be controlled by --driver-memory command line
option to spark submit. Check
http://spark.apache.org/docs/latest/configuration.html
spark.driver.memory (default: 1g): Amount of memory to use for the driver
https://stackoverflow.com/questions/26562033/how-to-set-apache-spark-executor-memory
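For completeness, a minimal spark-submit invocation along these lines (class and jar names are placeholders):

```shell
# --driver-memory sizes the single JVM that a local[*] "cluster" runs in;
# in local mode there is no separate executor JVM to size.
spark-submit \
  --master "local[*]" \
  --driver-memory 4g \
  --class com.example.JsonToOrc \
  json-to-orc.jar
```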
Regards,
Vaquar khan
On Mon, Nov 13, 2017 at 6:22 PM, Alec Swan wrote:
> Hello,
>
> I am using the Spark library to convert JSON/Snappy files to ORC/ZLIB
> format. Effectively, my Java
Hi Joel,
Here are the relevant snippets of my code and an OOM error thrown
in frameWriter.save(..). Surprisingly, the heap dump is pretty small ~60MB
even though I am running with -Xmx10G and 4G executor and driver memory as
shown below.
SparkConf sparkConf = new SparkConf()
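The snippet is cut off in the archive; a plausible shape for an embedded-Spark setup like the one described is sketched below. This is an illustrative reconstruction only, not the original code, and all values are assumptions:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

public class EmbeddedSparkSetup {
    public static void main(String[] args) {
        // Illustrative reconstruction; the original settings were truncated.
        SparkConf sparkConf = new SparkConf()
                .setMaster("local[*]")
                .setAppName("json-to-orc")
                .set("spark.executor.memory", "4g")
                .set("spark.driver.memory", "4g");

        SparkSession spark = SparkSession.builder()
                .config(sparkConf)
                .getOrCreate();
    }
}
```

Worth noting for the OOM discussion: setting spark.driver.memory programmatically like this has no effect when Spark is embedded, because the hosting JVM's heap is already fixed by the time the SparkConf is built; only -Xmx on the service's JVM actually sizes it.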
Have you tried increasing driver and executor memory (and the GC overhead
limit too, if required)? Your code snippet and stack trace would be helpful.
On Mon, Nov 13, 2017 at 7:23 PM Alec Swan wrote:
> Hello,
>
> I am using the Spark library to convert JSON/Snappy files to ORC/ZLIB
> format.
Hello,
I am using the Spark library to convert JSON/Snappy files to ORC/ZLIB
format. Effectively, my Java service starts up an embedded Spark cluster
(master=local[*]) and uses Spark SQL to convert JSON to ORC. However, I
keep getting OOM errors with large (~1GB) files.
I've tried different ways