Hi,
This:
*to_list = [list(row) for row in df.collect()]*
Gives:
[[5, 1, 1, 1, 2, 1, 3, 1, 1, 0], [5, 4, 4, 5, 7, 10, 3, 2, 1, 0], [3, 1, 1,
1, 2, 2, 3, 1, 1, 0], [6, 8, 8, 1, 3, 4, 3, 7, 1, 0], [4, 1, 1, 3, 2, 1, 3,
1, 1, 0]]
I want to avoid the collect operation, but still convert the DataFrame rows to a list.
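For reference, a minimal sketch of one alternative, assuming the rows really are needed on the driver: toLocalIterator() pulls one partition at a time instead of materializing the whole result set at once.

    # Streams partitions to the driver one at a time; the final list still
    # lives on the driver, but peak memory during the transfer is lower.
    to_list = [list(row) for row in df.toLocalIterator()]

If the rows do not need to reach the driver at all, keep the conversion distributed instead, e.g. df.rdd.map(list).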
Hi, Pedro
I have also started using AWS EMR, with Spark 2.4.0, and I'm looking into
performance tuning.
Do you configure dynamic allocation?
FYI:
https://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation
I haven't tested it yet. I guess spark-submit needs to specify
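For what it's worth, the same settings can also be supplied from code rather than spark-submit flags; a minimal sketch (the executor bounds are placeholder values):

    from pyspark.sql import SparkSession

    # Dynamic allocation lets Spark grow and shrink the executor pool with
    # load; the external shuffle service is required for it on YARN.
    spark = (SparkSession.builder
             .config("spark.dynamicAllocation.enabled", "true")
             .config("spark.shuffle.service.enabled", "true")
             .config("spark.dynamicAllocation.minExecutors", "2")   # placeholder
             .config("spark.dynamicAllocation.maxExecutors", "20")  # placeholder
             .getOrCreate())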
---------- Forwarded message ---------
From: prudhvi ch
Date: Thu, Jan 31, 2019, 5:54 PM
Subject: Fwd: Spark driver pod scheduling fails on auto scaled node
To:
---------- Forwarded message ---------
From: Prudhvi Chennuru (CONT)
Date: Thu, Jan 31, 2019, 5:01 PM
Subject: Fwd: Spark driver pod scheduling fails on auto scaled node
To:
Hi,
I am using kubernetes *v 1.11.5* and spark *v 2.3.0*,
*calico (daemonset)* as the overlay network plugin, and the kubernetes
*cluster autoscaler* feature to scale the cluster when needed. When the
cluster is autoscaling, calico pods get scheduled on the new nodes, but
they are not ready for 40 to 50
Hi guys,
I run Spark jobs on AWS EMR.
Recently I switched from AWS EMR label 5.16 to 5.20 (which uses Spark 2.4.0).
I've noticed that a lot of steps are taking longer than before.
I think it is related to the automatic configuration of cores per executor.
In version 5.16, some executors took
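If the per-executor core count is the suspect, pinning the executor size explicitly makes the two EMR labels comparable; a minimal sketch (the 4-core/8g values are placeholders, not a recommendation, and they must be set before the SparkContext starts):

    from pyspark.sql import SparkSession

    # Fix executor sizing explicitly so the EMR default no longer applies.
    spark = (SparkSession.builder
             .config("spark.executor.cores", "4")    # placeholder
             .config("spark.executor.memory", "8g")  # placeholder
             .getOrCreate())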
I received some questions about what the exact change was that fixed the
issue, and the PMC decided to post the info in JIRA to make it easier for
the community to track. The relevant details are all on
https://issues.apache.org/jira/browse/SPARK-26802
I noticed that Spark 2.4.0 added support for reading only committed
messages from Kafka, and was excited. Are there currently any plans to
update the Kafka output sink to support exactly-once delivery?
Thanks,
Will
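For reference, a minimal sketch of the current semantics, assuming Structured Streaming and an active SparkSession named spark (broker address, topic names, and checkpoint path are placeholders): the source can be limited to committed messages by passing the Kafka consumer's isolation.level through, while the sink side remains at-least-once, with the checkpoint enabling replay rather than exactly-once output.

    # Source: read only committed Kafka messages (Spark 2.4+); kafka.-prefixed
    # options are passed straight to the underlying consumer.
    df = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")   # placeholder
          .option("subscribe", "in-topic")                    # placeholder
          .option("kafka.isolation.level", "read_committed")
          .load())

    # Sink: at-least-once today, so duplicates are possible on retry and
    # downstream consumers should be idempotent.
    query = (df.select("key", "value")
             .writeStream
             .format("kafka")
             .option("kafka.bootstrap.servers", "broker:9092")  # placeholder
             .option("topic", "out-topic")                      # placeholder
             .option("checkpointLocation", "/tmp/ckpt")         # placeholder
             .start())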
The correct way to unsubscribe is to mail
user-unsubscr...@spark.apache.org
Just mailing the list with "unsubscribe" doesn't actually do anything...
Thanks
Andrew
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
unsubscribe
Hello.
I am running Spark 2.3.0 on YARN. I have a Spark Streaming application
where the driver threw an uncaught out-of-memory exception:
19/01/31 13:00:59 ERROR Utils: Uncaught exception in thread
element-tracking-store-worker
java.lang.OutOfMemoryError: GC overhead limit exceeded
at
Hello People,
I'm conducting PhD research on applications that use data stream
processing, aiming to investigate practices, tools, and experiences with
the development, testing, and validation of data stream software.
We'd be grateful if you would share your expertise by answering a
questionnaire (it takes