Hi,
This:
*to_list = [list(row) for row in df.collect()]*
Gives:
[[5, 1, 1, 1, 2, 1, 3, 1, 1, 0], [5, 4, 4, 5, 7, 10, 3, 2, 1, 0], [3, 1, 1,
1, 2, 2, 3, 1, 1, 0], [6, 8, 8, 1, 3, 4, 3, 7, 1, 0], [4, 1, 1, 3, 2, 1, 3,
1, 1, 0]]
I want to avoid the collect operation, but still convert the DataFrame rows to a list.
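For reference, a minimal sketch of one alternative, assuming the rows really are needed on the driver: toLocalIterator() pulls one partition at a time instead of materializing the whole result set at once.

    # Streams partitions to the driver one at a time; the final list still
    # lives on the driver, but peak memory during the transfer is lower.
    to_list = [list(row) for row in df.toLocalIterator()]

If the rows do not need to reach the driver at all, keep the conversion distributed instead, e.g. df.rdd.map(list).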
Hi, Pedro
I have also started using AWS EMR, with Spark 2.4.0, and I'm looking into
performance tuning.
Do you configure dynamic allocation?
FYI:
https://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation
I haven't tested it yet. I guess spark-submit needs to specify
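For what it's worth, the same settings can also be supplied from code rather than spark-submit flags; a minimal sketch (the executor bounds are placeholder values):

    from pyspark.sql import SparkSession

    # Dynamic allocation lets Spark grow and shrink the executor pool with
    # load; the external shuffle service is required for it on YARN.
    spark = (SparkSession.builder
             .config("spark.dynamicAllocation.enabled", "true")
             .config("spark.shuffle.service.enabled", "true")
             .config("spark.dynamicAllocation.minExecutors", "2")   # placeholder
             .config("spark.dynamicAllocation.maxExecutors", "20")  # placeholder
             .getOrCreate())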
---------- Forwarded message ---------
From: prudhvi ch
Date: Thu, Jan 31, 2019, 5:54 PM
Subject: Fwd: Spark driver pod scheduling fails on auto scaled node
To:
---------- Forwarded message ---------
From: Prudhvi Chennuru (CONT)
Date: Thu, Jan 31, 2019, 5:01 PM
Subject: Fwd: Spark driver pod scheduling fails on auto scaled node
To:
Hi,
I am using kubernetes *v 1.11.5* and spark *v 2.3.0*,
*calico (daemonset)* as the overlay network plugin, and the kubernetes
*cluster autoscaler* feature to scale the cluster when needed. When the
cluster is autoscaling, calico pods get scheduled on the new nodes, but
they are not ready for 40 to 50
Hi guys,
I run Spark jobs on AWS EMR.
Recently I switched from AWS EMR label 5.16 to 5.20 (which uses Spark 2.4.0).
I've noticed that a lot of steps are taking longer than before.
I think it is related to the automatic configuration of cores per executor.
In version 5.16, some executors took
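If the per-executor core count is the suspect, pinning the executor size explicitly makes the two EMR labels comparable; a minimal sketch (the 4-core/8g values are placeholders, not a recommendation, and they must be set before the SparkContext starts):

    from pyspark.sql import SparkSession

    # Fix executor sizing explicitly so the EMR default no longer applies.
    spark = (SparkSession.builder
             .config("spark.executor.cores", "4")    # placeholder
             .config("spark.executor.memory", "8g")  # placeholder
             .getOrCreate())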
I received some questions about what the exact change was that fixed the
issue, and the PMC decided to post the info in JIRA to make it easier for
the community to track. The relevant details are all on
https://issues.apache.org/jira/browse/SPARK-26802
I noticed that Spark 2.4.0 added support for reading only committed
messages from Kafka, and was excited. Are there currently any plans to
update the Kafka output sink to support exactly-once delivery?
Thanks,
Will
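For reference, a minimal sketch of the current semantics, assuming Structured Streaming and an active SparkSession named spark (broker address, topic names, and checkpoint path are placeholders): the source can be limited to committed messages by passing the Kafka consumer's isolation.level through, while the sink side remains at-least-once, with the checkpoint enabling replay rather than exactly-once output.

    # Source: read only committed Kafka messages (Spark 2.4+); kafka.-prefixed
    # options are passed straight to the underlying consumer.
    df = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")   # placeholder
          .option("subscribe", "in-topic")                    # placeholder
          .option("kafka.isolation.level", "read_committed")
          .load())

    # Sink: at-least-once today, so duplicates are possible on retry and
    # downstream consumers should be idempotent.
    query = (df.select("key", "value")
             .writeStream
             .format("kafka")
             .option("kafka.bootstrap.servers", "broker:9092")  # placeholder
             .option("topic", "out-topic")                      # placeholder
             .option("checkpointLocation", "/tmp/ckpt")         # placeholder
             .start())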
The correct way to unsubscribe is to mail
user-unsubscr...@spark.apache.org
Just mailing the list with "unsubscribe" doesn't actually do anything...
Thanks
Andrew
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
unsubscribe
Hello.
I am running Spark 2.3.0 on YARN. I have a Spark Streaming application
where the driver threw an uncaught out-of-memory exception:
19/01/31 13:00:59 ERROR Utils: Uncaught exception in thread
element-tracking-store-worker
java.lang.OutOfMemoryError: GC overhead limit exceeded
at
Hello People,
I'm conducting PhD research on applications that use data stream
processing, aiming to investigate practices, tools, and experiences with
the development, testing, and validation of data stream software.
We'd be grateful if you would share your expertise by answering a
questionnaire (it takes