Re: Out of memory HDFS Read and Write

2019-12-22 Thread Ruijing Li
@Chris destPaths is just a Seq[String] that holds the paths we wish to copy the output to. Even if the collection holds only one path, the job fails. However, the job runs fine if we don’t copy the output: the pipeline succeeds when it only reads the input, performs the logic as a DataFrame, and writes the output. As for

Re: Out of memory HDFS Read and Write

2019-12-22 Thread Chris Teoh
Does it work for just a single path input and single output? Is the destPath a collection that is sitting on the driver?

On Sun, 22 Dec 2019, 7:59 pm Ruijing Li wrote:
> I was experimenting and found something interesting. I have executor OOM
> even if I don’t write to remote clusters. So it

Re: Out of memory HDFS Read and Write

2019-12-22 Thread Ruijing Li
I was experimenting and found something interesting. I have executor OOM even if I don’t write to remote clusters. So it is purely a DataFrame read and write issue. To recap, I have an ETL data pipeline that does some logic, repartitions to reduce the number of files written,