date:20200721

spark job delay when starting

2020-07-21 Thread Bulldog20630405

When running Spark jobs we see a delay at startup. Running top -H -i -p showed that a single thread labeled "map-output-disp" was running at 99.7% CPU for the majority of the delay period. This delay gets progressively worse as the partition count increases. It seems the delay comes from

Spark Structured Streaming join data results in missing result set

2020-07-21 Thread dong524dong

We are using Spark Structured Streaming to join two data streams. We read from Kafka starting at the earliest offsets (the sender sends data cyclically, one message at a time). The following are our Kafka configuration parameters: def

Re: java.lang.ClassNotFoundException for s3a comitter

2020-07-21 Thread Gourav Sengupta

Hi, I am not sure about this, but is there any requirement to use S3A at all? Regards, Gourav On Tue, Jul 21, 2020 at 12:07 PM Steve Loughran wrote: > > > On Tue, 7 Jul 2020 at 03:42, Stephen Coy > wrote: > >> Hi Steve, >> >> While I understand your point regarding the mixing of Hadoop

Re: java.lang.ClassNotFoundException for s3a comitter

2020-07-21 Thread Steve Loughran

On Tue, 7 Jul 2020 at 03:42, Stephen Coy wrote: > Hi Steve, > > While I understand your point regarding the mixing of Hadoop jars, this > does not address the java.lang.ClassNotFoundException. > > Prebuilt Apache Spark 3.0 builds are only available for Hadoop 2.7 or > Hadoop 3.2. Not Hadoop 3.1.

Re: Future timeout

2020-07-21 Thread Dhaval Patel

Just a suggestion: it looks like it's timing out when you are broadcasting a big object. Generally it's not advisable to do so; if you can get rid of that, the program may behave consistently. On Tue, Jul 21, 2020 at 3:17 AM Piyush Acharya wrote: > spark.conf.set("spark.sql.broadcastTimeout", ##) > >
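The two suggestions in this thread (raise the broadcast timeout, or stop broadcasting the large object) can both be expressed as session settings. The sketch below is only an illustration, not something posted in the thread: the helper name `configure_broadcast` and the default of 600 seconds are assumptions. `spark.sql.broadcastTimeout` is the setting quoted above; `spark.sql.autoBroadcastJoinThreshold` is a standard Spark SQL setting, and a value of -1 disables automatic broadcast joins. Note that it does not affect explicit `broadcast()` hints or manually created broadcast variables.

```python
def configure_broadcast(spark, timeout_seconds=600, auto_broadcast_threshold_bytes=-1):
    """Apply broadcast-related settings to an existing SparkSession.

    `spark` is expected to be a SparkSession (anything exposing
    `conf.set(key, value)` works). The default values are illustrative
    assumptions, not recommendations from the thread.
    """
    # Give broadcast joins longer to complete before "Futures timed out".
    spark.conf.set("spark.sql.broadcastTimeout", str(timeout_seconds))
    # -1 turns off automatic broadcast joins, so large tables are
    # joined with a shuffle-based strategy instead of being broadcast.
    spark.conf.set(
        "spark.sql.autoBroadcastJoinThreshold",
        str(auto_broadcast_threshold_bytes),
    )
    return spark
```

With a live session this would be called as `configure_broadcast(spark, timeout_seconds=900)` before the join runs. Whether a longer timeout or disabling the broadcast is the better fix depends on how large the broadcast side really is.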

Refreshing static data with streaming data at regular Intervals

2020-07-21 Thread Debabrata Ghosh

Hi All, We have a static DataFrame as follows:

id|time_stamp
1|1540527851
2|1540525602
3|1530529187
4|1520529185
5|1510529182
6|1578945709

We also have a live stream of events, a streaming DataFrame which contains id and updated

Re: Need your help!! (URGENT Code works fine when submitted as java main but part of data missing when running as Spark-Submit)

2020-07-21 Thread Pasha Finkelshteyn

Hi Rachana, Could you please provide us with more details: a minimal repro, Spark version, Java version, Scala version. On 20/07/21 08:27AM, Rachana Srivastava wrote: > I am unable to identify the root cause of why my code is missing data when I > run as spark-submit but the code works fine when I

Need your help!! (URGENT Code works fine when submitted as java main but part of data missing when running as Spark-Submit)

2020-07-21 Thread Rachana Srivastava

I am unable to identify the root cause of why my code is missing data when I run it with spark-submit, but the code works fine when I run it as a Java main. Any idea

Re: Using pyspark with Spark 2.4.3 a MultiLayerPerceptron model givens inconsistent outputs if a large amount of data is fed into it and at least one of the model outputs is fed to a Python UDF.

2020-07-21 Thread Ben Smith

I can also recreate with the very latest master branch (3.1.0-SNAPSHOT) if I compile it locally.

Re: Future timeout

2020-07-21 Thread Piyush Acharya

spark.conf.set("spark.sql.broadcastTimeout", ##) On Mon, Jul 20, 2020 at 11:51 PM Amit Sharma wrote: > Please help on this. > > > Thanks > Amit > > On Fri, Jul 17, 2020 at 9:10 AM Amit Sharma wrote: > >> Hi, sometimes my Spark streaming job throws this exception: Futures timed >> out after
