Re: Using Spark Accumulators with Structured Streaming

2020-06-01 Thread ZHANG Wei
Yes, verified on the cluster with 5 executors.

--
Cheers,
-z

On Fri, 29 May 2020 11:16:12 -0700 Something Something wrote:
> Did you try this on the Cluster? Note: This works just fine under 'Local' mode.
>
> On Thu, May 28, 2020 at 9:12 PM ZHANG Wei wrote:
> > I can't reproduce the
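
The code under discussion isn't shown in the preview, but a minimal sketch of updating an accumulator from a Structured Streaming query (assuming, purely as placeholders, a rate source and a foreachBatch sink) could look like this in PySpark:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("accumulator-demo").getOrCreate()

    # Driver-side accumulator; executor tasks call add(), only the driver reads .value.
    rows_seen = spark.sparkContext.accumulator(0)

    def count_batch(batch_df, batch_id):
        # Runs on the driver once per micro-batch; the foreach() below runs on executors.
        batch_df.foreach(lambda row: rows_seen.add(1))
        print("batch %d: accumulator = %d" % (batch_id, rows_seen.value))

    query = (spark.readStream.format("rate").option("rowsPerSecond", 10).load()
             .writeStream.foreachBatch(count_batch)
             .start())
    # query.awaitTermination()

Whether the accumulator value shows up as expected on a multi-executor cluster is exactly what the thread is checking, so treat this as an experiment to run rather than a guarantee.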

Re: Spark Security

2020-06-01 Thread Wilbert S.
Hello,

My hard drive has about 80 GB of space left on it, and the RAM is about 12 GB. I am not sure of the size of the .tsv file, but it will most likely be around 30 GB.

Thanks,
Wilbert Seoane

On Fri, May 29, 2020 at 5:03 PM Anwar AliKhan wrote:
> What is the size of your .tsv file, sir?
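
The preview doesn't show any code, but since the question is whether a roughly 30 GB .tsv can be handled with about 12 GB of RAM, a small sketch of reading it with Spark (the path and header option are assumptions) may help: Spark scans the file in partitions rather than loading it whole, so it does not need to fit in memory at once.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("tsv-read").getOrCreate()

    # Read the ~30 GB .tsv lazily; Spark processes it partition by partition,
    # so the whole file never has to fit in 12 GB of RAM.
    df = (spark.read
          .option("sep", "\t")
          .option("header", "true")      # assumption: the file has a header row
          .csv("/path/to/data.tsv"))     # hypothetical path

    print(df.count())                    # triggers the distributed read
    # Avoid df.collect() or df.toPandas() on a file this size: that would pull
    # everything back to the driver.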

[PySpark 2.3+] Reading parquet entire path vs a set of file paths

2020-06-01 Thread Rishi Shah
Hi All,

I use the following to read a set of parquet file paths when the files are scattered across many, many partitions:

    paths = ['p1', 'p2', ... 'pn']
    df = spark.read.parquet(*paths)

The above method feels like it is reading those files sequentially & not really parallelizing the read operation, is
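
The rest of the question is cut off above, but for comparison, here is a sketch (the bucket layout and partition column are hypothetical) of the two usual ways to express this kind of read: passing explicit sub-paths, or reading the table root and pruning on the partition column. In both cases the file scan itself is distributed across executors; only the up-front file listing differs.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("parquet-paths").getOrCreate()

    base = "s3://bucket/table"                    # hypothetical table location

    # Explicit sub-paths, as in the snippet above; basePath keeps the partition
    # column ('date' here) in the resulting schema.
    df_paths = (spark.read
                .option("basePath", base)
                .parquet(base + "/date=2020-05-01",
                         base + "/date=2020-05-02"))

    # Read the table root and filter on the partition column; Spark prunes the
    # partitions it does not need and distributes the scan.
    df_pruned = (spark.read.parquet(base)
                 .filter("date BETWEEN '2020-05-01' AND '2020-05-02'"))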