While running Spark jobs, we found that the following command:

top -H -i -p <pid>

showed a single thread labeled "map-output-disp" running at 99.7% for most
of the delay period. The delay gets progressively worse as the partition
count increases. It seems the delay comes from
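Not from the thread, just a debugging aside: one way to see what a hot thread such as "map-output-disp" is actually doing is to take the thread id from the PID column of `top -H` and look it up in a `jstack` dump of the same JVM, where thread ids appear in hex as `nid=0x...`. A minimal sketch of the conversion (the helper name is mine, not from the thread):

```python
def tid_to_jstack_nid(tid: int) -> str:
    """Convert a Linux thread id (the PID column of `top -H`) into the
    hex `nid=0x...` token that `jstack <pid>` prints for each thread."""
    return f"nid=0x{tid:x}"

# Grep the jstack output for this string to find the hot thread's stack:
print(tid_to_jstack_nid(4919))  # prints "nid=0x1337"
```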
We are using Spark Structured Streaming to join two data streams. We use
Kafka to collect the data, reading from the earliest offset (the sender
sends data cyclically, one message at a time).
The following are our kafka configuration parameters:
def
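Not part of the original message, but for context: in Kafka, consuming from the beginning of a topic is normally controlled by a consumer property, sketched below; with Spark's Kafka source the corresponding reader option is `startingOffsets` set to `earliest`.

```
# Kafka consumer property: when no committed offset exists,
# start from the earliest available offset
auto.offset.reset=earliest
```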
Hi,
I am not sure about this, but is there any requirement to use S3A at all?
Regards,
Gourav
On Tue, Jul 21, 2020 at 12:07 PM Steve Loughran
wrote:
On Tue, 7 Jul 2020 at 03:42, Stephen Coy
wrote:
> Hi Steve,
>
> While I understand your point regarding the mixing of Hadoop jars, this
> does not address the java.lang.ClassNotFoundException.
>
> Prebuilt Apache Spark 3.0 builds are only available for Hadoop 2.7 or
> Hadoop 3.2. Not Hadoop 3.1.
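As an aside (not stated in the thread): the usual cause of a ClassNotFoundException for the S3A classes is a hadoop-aws jar that does not match the Hadoop version Spark was built against. A spark-defaults.conf sketch, with an illustrative version number that would need to match your actual Hadoop build:

```
# version must match the Hadoop version of the Spark build (3.2.0 is illustrative)
spark.jars.packages  org.apache.hadoop:hadoop-aws:3.2.0
```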
Just a suggestion: it looks like it is timing out when you are broadcasting a
big object. Generally it is not advisable to do so; if you can get rid of
that, the program may behave more consistently.
On Tue, Jul 21, 2020 at 3:17 AM Piyush Acharya
wrote:
> spark.conf.set("spark.sql.broadcastTimeout", ##)
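Spelled out (values illustrative, not from the thread): the timeout itself can be raised, or automatic broadcast joins can be disabled so the large object is never broadcast in the first place:

```
# spark-defaults.conf sketch
# raise the broadcast timeout (in seconds)
spark.sql.broadcastTimeout            1200
# or disable automatic broadcast joins entirely
spark.sql.autoBroadcastJoinThreshold  -1
```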
Hi All,
We have a static DataFrame as follows.
+---+----------+
| id|time_stamp|
+---+----------+
|  1|1540527851|
|  2|1540525602|
|  3|1530529187|
|  4|1520529185|
|  5|1510529182|
|  6|1578945709|
+---+----------+
We also have a live stream of events, a streaming DataFrame which contains id
and updated
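A plain-Python stand-in (not Spark code, and not from the thread) for the stream-static join being described: the static table maps id to time_stamp, and each streamed event carries an id plus an updated timestamp. All names and values below are illustrative.

```python
# Static table: id -> time_stamp, mirroring the table above (subset)
static_table = {1: 1540527851, 2: 1540525602, 3: 1530529187}

def apply_event(table: dict, event: tuple) -> dict:
    """Merge one streamed (id, updated_ts) event into the table,
    keeping the newest timestamp per id."""
    row_id, updated_ts = event
    if row_id in table:
        table[row_id] = max(table[row_id], updated_ts)
    return table

# Two streamed events: one newer than the static row, one older
for event in [(1, 1600000000), (3, 1500000000)]:
    apply_event(static_table, event)

print(static_table[1])  # prints 1600000000; id 3 keeps its newer static value
```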
Hi Rachana,
Could you please provide us with more details:
Minimal repro
Spark version
Java version
Scala version
On 20/07/21 08:27AM, Rachana Srivastava wrote:
> I am unable to identify the root cause of why my code is missing data when I
> run as spark-submit but the code works fine when I
I am unable to identify the root cause of why my code is missing data when I
run it via spark-submit, but the code works fine when I run it as a Java main.
Any idea
I can also recreate this with the very latest master branch (3.1.0-SNAPSHOT)
if I compile it locally.
spark.conf.set("spark.sql.broadcastTimeout", ##)
On Mon, Jul 20, 2020 at 11:51 PM Amit Sharma wrote:
> Please help on this.
>
>
> Thanks
> Amit
>
> On Fri, Jul 17, 2020 at 9:10 AM Amit Sharma wrote:
>
>> Hi, sometimes my Spark streaming job throws this exception: Futures timed
>> out after