Re: Future timeout

2020-07-21 Thread Piyush Acharya
spark.conf.set("spark.sql.broadcastTimeout", ##) On Mon, Jul 20, 2020 at 11:51 PM Amit Sharma wrote: > Please help on this. > > > Thanks > Amit > > On Fri, Jul 17, 2020 at 9:10 AM Amit Sharma wrote: > >> Hi, sometimes my spark streaming job throw this exception Futures timed >> out after

Re: schema changes of custom data source in persistent tables DataSourceV1

2020-07-20 Thread Piyush Acharya
Do you want to merge the schema when the incoming data's schema changes? spark.conf.set("spark.sql.parquet.mergeSchema", "true") https://kontext.tech/column/spark/381/schema-merging-evolution-with-parquet-in-spark-and-hive On Mon, Jul 20, 2020 at 3:48 PM fansparker wrote: > Does anybody know if there
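
A short sketch of both ways to enable Parquet schema merging, assuming an active SparkSession named spark; the path is a hypothetical example:

    // Per-read: merge the schemas of all Parquet part-files under the path.
    val df = spark.read
      .option("mergeSchema", "true")
      .parquet("s3a://bucket/events")   // hypothetical path

    // Global, as in the reply above: applies to every Parquet read.
    spark.conf.set("spark.sql.parquet.mergeSchema", "true")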

Re: Spark Structured Streaming keep on consuming usercache

2020-07-20 Thread Piyush Acharya
Can you try calling batchDF.unpersist() once the work is done in the loop? On Mon, Jul 20, 2020 at 3:38 PM Yong Yuan wrote: > It seems the following structured streaming code keeps on consuming > usercache until all disk space is occupied. > > val monitoring_stream = >
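
A sketch of the suggested pattern: cache the micro-batch once, reuse it, then unpersist so cached blocks do not pile up in the executors' local directories (the "usercache"). The input stream, sink, and path below are assumptions, not the poster's actual code:

    import org.apache.spark.sql.DataFrame

    def processBatch(batchDF: DataFrame, batchId: Long): Unit = {
      batchDF.persist()                                         // cache once per micro-batch
      batchDF.write.mode("append").parquet("s3a://bucket/out")  // hypothetical sink
      batchDF.unpersist()                                       // free the cached blocks
    }

    // inputDF: an assumed streaming DataFrame from readStream.
    val monitoring_stream = inputDF.writeStream
      .foreachBatch(processBatch _)
      .start()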

Re: Spark UI

2020-07-19 Thread Piyush Acharya
https://www.youtube.com/watch?v=YgQgJceojJY (Xiao's video) On Mon, Jul 20, 2020 at 8:03 AM Xiao Li wrote: > https://spark.apache.org/docs/3.0.0/web-ui.html is the official doc > for Spark UI. > > Xiao > > On Sun, Jul 19, 2020 at 1:38 PM venkatadevarapu > wrote: > >> Hi, >> >> I'm looking

Re: Schedule/Orchestrate spark structured streaming job

2020-07-19 Thread Piyush Acharya
Some of the options for workflow orchestration: https://medium.com/@xunnan.xu/workflow-processing-engine-overview-2018-airflow-vs-azkaban-vs-conductor-vs-oozie-vs-amazon-step-90affc54d53b Streaming is a kind of infinitely running job, so you just have to trigger it only once unless you're not using it with
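
The point of the reply is that a structured streaming query runs until it is stopped, so an orchestrator (Airflow, Oozie, etc.) only needs to submit the job once and then monitor it. A minimal sketch, with a hypothetical Kafka source and console sink:

    // Launched once by the scheduler; runs indefinitely after that.
    val query = spark.readStream
      .format("kafka")                                  // hypothetical source
      .option("kafka.bootstrap.servers", "host:9092")
      .option("subscribe", "events")
      .load()
      .writeStream
      .format("console")
      .option("checkpointLocation", "/tmp/chk")         // hypothetical path
      .start()

    query.awaitTermination()   // blocks for the lifetime of the job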

Re: Overwrite Mode not Working Correctly in spark 3.0.0

2020-07-19 Thread Piyush Acharya
Can you please send the error message? It would be very helpful to get to the root cause. On Sun, Jul 19, 2020 at 10:57 PM anbutech wrote: > Hi Team, > > I'm facing weird behavior in the pyspark dataframe > (databricks delta, spark 3.0.0 supported) > > I have tried the below two options to write
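
For context, a hedged reconstruction of what an overwrite-mode Delta write on Spark 3.0.0 typically looks like; the poster's actual code and error are not in the snippet, and the data and path here are assumptions:

    // Requires the Delta Lake library on the classpath.
    val df = spark.range(10).toDF("id")   // stand-in for the poster's data
    df.write
      .format("delta")
      .mode("overwrite")                  // replaces existing table contents
      .save("/mnt/delta/events")          // hypothetical path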

Re: OOM while processing read/write to S3 using Spark Structured Streaming

2020-07-19 Thread Piyush Acharya
Please try the maxBytesPerTrigger option; the files are probably big enough to crash the JVM. Please give some info on the executors and the files (size etc.). Regards, ..Piyush On Sun, Jul 19, 2020 at 3:29 PM Rachana Srivastava wrote: > *Issue:* I am trying to process 5000+ gzipped JSON files
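
A sketch of rate-limiting the ingest so one micro-batch cannot pull in enough gzipped JSON to exhaust executor memory. Note that maxBytesPerTrigger is an option of Delta-based sources; the analogous knob on open-source Spark's file source is maxFilesPerTrigger, used below. The schema, path, and the value 100 are assumptions:

    import org.apache.spark.sql.types.StructType

    // Streaming file sources need an explicit schema unless
    // spark.sql.streaming.schemaInference is enabled.
    val jsonSchema = new StructType()
      .add("id", "string")      // assumed fields
      .add("ts", "timestamp")

    val stream = spark.readStream
      .schema(jsonSchema)
      .option("maxFilesPerTrigger", "100")   // illustrative cap per micro-batch
      .json("s3a://bucket/input/")           // hypothetical path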

subscribe

2020-07-18 Thread Piyush Acharya