Yes, verified on the cluster with 5 executors.
--
Cheers,
-z
On Fri, 29 May 2020 11:16:12 -0700
Something Something wrote:
> Did you try this on the Cluster? Note: This works just fine under 'Local'
> mode.
>
> On Thu, May 28, 2020 at 9:12 PM ZHANG Wei wrote:
>
> > I can't reproduce the
Hello,
My hard drive has about 80 GB of space left on it, and the RAM is about
12 GB.
I am not sure of the size of the .tsv file, but it will most likely be
around 30 GB.
Thanks,
Wilbert Seoane
On Fri, May 29, 2020 at 5:03 PM Anwar AliKhan wrote:
> What is the size of your .tsv file, sir?
Hi All,
I use the following to read a set of parquet file paths when the files are
scattered across a large number of partitions.
paths = ['p1', 'p2', ..., 'pn']
df = spark.read.parquet(*paths)
The above method feels like it is reading those files sequentially rather
than parallelizing the read operation, is