@Chris destPaths is just a Seq[String] that holds the paths we wish to copy
the output to. The copy fails even if the collection holds only one path.
However, the job runs fine if we don’t copy the output: the pipeline
succeeds at read input -> perform logic as dataframe -> write output. As
for
Does it work for just a single path input and single output?
Is the destPath a collection that is sitting on the driver?
On Sun, 22 Dec 2019, 7:59 pm Ruijing Li, wrote:
I was experimenting and found something interesting. I get executor OOM
errors even if I don’t write to remote clusters, so it is purely a dataframe
read and write issue.
—
To recap, I have an ETL data pipeline that does some logic, repartitions to
reduce the number of files written,