Re: What is Spark context cleaner in structured streaming

2019-05-02 Thread kanchan tewary
Hi Akshay, You may refer to the following: https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-service-contextcleaner.html Thanks, Kanchan Data Engineer, IBM On Thu, May 2, 2019 at 5:21 PM Akshay Bhardwaj < akshay.bhardwaj1...@gmail.com> wrote: > Hi All, > > I am using Spark

Error while using spark-avro module in pyspark 2.4

2019-05-01 Thread kanchan tewary
Hi All, Greetings! I am facing an error while trying to write my dataframe into avro format, using spark-avro package ( https://spark.apache.org/docs/latest/sql-data-sources-avro.html#deploying). I have added the package while running spark-submit as follows. Do I need to add any additional
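A common cause of errors when deploying the built-in spark-avro module is a mismatch between the `--packages` Maven coordinate and the Spark/Scala build in use: the artifact's Scala suffix and version must match your Spark distribution. A minimal sketch of building the coordinate (the helper function and the paths in the comments are illustrative, not from this thread):

```python
def avro_package_coordinate(spark_version, scala_version="2.11"):
    """Build the Maven coordinate for the spark-avro package.

    The Scala suffix must match the Scala version your Spark build
    uses (2.11 for stock Spark 2.4 downloads), and the version should
    match your Spark version, or class-loading errors can follow.
    """
    return f"org.apache.spark:spark-avro_{scala_version}:{spark_version}"


print(avro_package_coordinate("2.4.0"))
# Pass the coordinate to spark-submit, e.g.:
#   spark-submit --packages org.apache.spark:spark-avro_2.11:2.4.0 app.py
# and inside the job write with the short format name (Spark 2.4+):
#   df.write.format("avro").save("/tmp/output.avro")
```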

Handle empty partitions in pyspark

2019-04-24 Thread kanchan tewary
Hi All, I have a situation where the rdd has some empty partitions, which I would like to identify and handle while applying mapPartitions or similar functions. Is there a way to do this in pyspark? The method isEmpty works only on the RDD as a whole and cannot be applied to individual partitions. Much appreciated. Code
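One way to approach this: mapPartitions hands each partition to your function as an iterator, so you can peek at the first element to detect emptiness without materializing the partition. A minimal sketch (the doubling transformation is a placeholder, not from the thread):

```python
from itertools import chain


def process_partition(iterator):
    """Intended for rdd.mapPartitions: skip empty partitions safely.

    Peeking at the first element is the only way to test emptiness
    without consuming or materializing the whole partition.
    """
    sentinel = object()
    first = next(iterator, sentinel)
    if first is sentinel:
        return iter([])  # empty partition: emit nothing
    # Put the peeked element back in front, then apply the
    # (placeholder) per-record transformation.
    return (record * 2 for record in chain([first], iterator))


# Pure-Python check of the partition handler:
print(list(process_partition(iter([]))))        # empty partition
print(list(process_partition(iter([1, 2, 3]))))
```

In a live session this would be applied as `rdd.mapPartitions(process_partition)`; empty partitions simply contribute no records to the result.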

Re: toDebugString - RDD Logical Plan

2019-04-23 Thread kanchan tewary
> About the other question, you may use `getNumPartitions`. > > On Sat, Apr 20, 2019 at 2:40 PM kanchan tewary > wrote: > >> Dear All, >> >> Greetings! >> >> I am new to Apache Spark and working on RDDs using pyspark. I am trying >> to und

toDebugString - RDD Logical Plan

2019-04-20 Thread kanchan tewary
Dear All, Greetings! I am new to Apache Spark and working on RDDs using pyspark. I am trying to understand the logical plan provided by the toDebugString function, but I find two issues: a) the output is not formatted when I print the result, and b) I do not see the number of partitions shown. Can anyone
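Both symptoms have simple explanations: in PySpark under Python 3, `rdd.toDebugString()` returns `bytes`, so printing it shows literal `\n` escapes until you decode it; and the number in parentheses at the start of each plan line is that RDD's partition count (with `rdd.getNumPartitions()` giving it directly). A minimal sketch, using an illustrative byte literal shaped like real output rather than a live RDD:

```python
def format_debug_string(plan_bytes):
    """Decode the bytes returned by PySpark's rdd.toDebugString()
    so the embedded newlines render instead of printing as \\n."""
    return plan_bytes.decode("utf-8")


# Illustrative only: a literal shaped like real toDebugString() output.
sample_plan = (b"(4) PythonRDD[1] at RDD at PythonRDD.scala:53 []\n"
               b" |  ParallelCollectionRDD[0] at parallelize []")

print(format_debug_string(sample_plan))
# In a live session:
#   print(rdd.toDebugString().decode("utf-8"))
#   rdd.getNumPartitions()   # explicit partition count
```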