Re: Running Spark on EMR

2017-01-15 Thread Darren Govoni
So what was the answer?

Spark in docker over EC2

2017-01-10 Thread Darren Govoni
Has anyone got a good guide for getting a Spark master to talk to remote workers inside Docker containers? I followed the tips I found by searching, but it still doesn't work. I'm on Spark 1.6.2. I exposed all the ports and tried to set the local IP inside the container to the host IP, but Spark complains it can't bind the UI ports.

RE: AMQP extension for Apache Spark Streaming (messaging/IoT)

2016-07-03 Thread Darren Govoni
This is fantastic news.

Re: Spark + Kafka processing trouble

2016-05-30 Thread Darren Govoni
from my Verizon Wireless 4G LTE smartphone Original message From: Malcolm Lockyer <malcolm.lock...@hapara.com> Date: 05/30/2016 10:40 PM (GMT-05:00) To: user@spark.apache.org Subject: Re: Spark + Kafka processing trouble On Tue, May 31, 2016 at 1:56 PM, Darren Govon

RE: Spark + Kafka processing trouble

2016-05-30 Thread Darren Govoni
So you are calling a SQL query (to a single database) within a Spark operation distributed across your workers?

Submit python egg?

2016-05-18 Thread Darren Govoni
Hi, I have a Python egg with a __main__.py in it. I am able to execute the egg by itself fine. Is there a way to just submit the egg to Spark and have it run? It seems an external .py script is needed, which would be unfortunate if true. Thanks
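If an external .py script does turn out to be required, it can at least be kept tiny. Below is a minimal sketch of such a launcher, assuming the egg is an ordinary zip archive with a top-level __main__.py. The function name `run_egg` and the file names mentioned afterwards are hypothetical, and this only uses the standard library's `runpy`; it is not a Spark feature.

```python
import runpy
import sys


def run_egg(egg_path, argv=None):
    """Execute the top-level __main__.py packaged inside an egg (a zip archive).

    Returns the globals dictionary of the executed module.
    """
    saved_argv = sys.argv
    # Give the egg's code its own argument list, with the egg path as argv[0].
    sys.argv = [egg_path] + list(argv or [])
    try:
        # runpy.run_path accepts a zip archive that contains a top-level
        # __main__.py; run_name="__main__" makes the usual
        # `if __name__ == "__main__":` guards inside the egg fire.
        return runpy.run_path(egg_path, run_name="__main__")
    finally:
        # Restore the caller's arguments whether or not the egg raised.
        sys.argv = saved_argv
```

A launcher script (say, a hypothetical launcher.py) would end with a call such as `run_egg("app.egg")`, and the egg itself could be shipped alongside it with spark-submit's `--py-files` option, which accepts .egg files. Whether the egg's path resolves the same way on the driver depends on the deploy mode, so treat the path handling as an assumption to verify.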

Re: Does pyspark still lag far behind the Scala API in terms of features

2016-03-02 Thread Darren Govoni
te: 03/02/2016 5:43 PM (GMT-05:00) To: Darren Govoni <dar...@ontrenet.com>, Jules Damji <dmat...@comcast.net>, Joshua Sorrell <jsor...@gmail.com> Cc: user@spark.apache.org Subject: Re: Does pyspark still lag far behind the Scala API in terms of features Plenty of people g

Re: Does pyspark still lag far behind the Scala API in terms of features

2016-03-02 Thread Darren Govoni
Dataframes are essentially structured tables with schemas. So where does the non typed data sit before it becomes structured if not in a traditional RDD? For us almost all the processing comes before there is structure to it.

RE: How could I do this algorithm in Spark?

2016-02-25 Thread Darren Govoni
This might be hard to do. One generalization of this problem is the longest path problem (https://en.m.wikipedia.org/wiki/Longest_path_problem): given a node (e.g. A), find the longest path. All interior relations are transitive and can be inferred. But finding a distributed Spark way of doing it in P time would be
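For reference, the single-machine version is easy when the relation graph happens to be acyclic; the general longest path problem (with cycles) is NP-hard. The following is a minimal sketch in plain Python, not Spark, assuming the relations form a DAG given as an adjacency dictionary. The names `longest_path` and `edges` and the sample graph are made up for illustration.

```python
from functools import lru_cache


def longest_path(graph, start):
    """Return the longest path (as a list of nodes) starting at `start`.

    Assumes `graph` is a DAG given as {node: [successor, ...]}; on a graph
    with cycles the recursion would not terminate.
    """

    @lru_cache(maxsize=None)
    def best(node):
        # Longest path beginning at `node`, stored as a tuple of nodes.
        candidates = [best(nxt) for nxt in graph.get(node, ())]
        # An empty tail means `node` has no successors.
        longest_tail = max(candidates, key=len, default=())
        return (node,) + longest_tail

    return list(best(start))


# Hypothetical example graph: A -> B -> D -> E and A -> C -> D -> E.
edges = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
print(longest_path(edges, "A"))  # ['A', 'B', 'D', 'E']
```

Memoization means each node is solved once, so the work is linear in the number of edges, but the recursion depth grows with the path length, so very deep graphs would need an iterative rewrite. Distributing this across workers is a separate problem that the sketch does not address.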

RE: Unusually large deserialisation time

2016-02-16 Thread Darren Govoni
I meant to write 'last task in stage'.

RE: Unusually large deserialisation time

2016-02-16 Thread Darren Govoni
I think this is part of the bigger issue of serious deadlock conditions occurring in Spark that many of us have posted about. Would the task in question be the past task of a stage by chance?

Re: Launching EC2 instances with Spark compiled for Scala 2.11

2016-01-25 Thread Darren Govoni
Why not deploy it, then build a custom distribution with Scala 2.11 and just overlay it?

Re: 10hrs of Scheduler Delay

2016-01-25 Thread Darren Govoni
: "Sanders, Isaac B" <sande...@rose-hulman.edu> Date: 01/25/2016 8:59 AM (GMT-05:00) To: Ted Yu <yuzhih...@gmail.com> Cc: Darren Govoni <dar...@ontrenet.com>, Renu Yadav <yren...@gmail.com>, Muthu Jayakumar <bablo...@gmail.com>, user@spark.apache.or

Re: 10hrs of Scheduler Delay

2016-01-25 Thread Darren Govoni
4 PM (GMT-05:00) To: Renu Yadav <yren...@gmail.com> Cc: Darren Govoni <dar...@ontrenet.com>, Muthu Jayakumar <bablo...@gmail.com>, Ted Yu <yuzhih...@gmail.com>, user@spark.apache.org Subject: Re: 10hrs of Scheduler Delay I am not getting anywhere with any of the su

Re: 10hrs of Scheduler Delay

2016-01-22 Thread Darren Govoni
2/2016 3:50 PM (GMT-05:00) To: Darren Govoni <dar...@ontrenet.com>, "Sanders, Isaac B" <sande...@rose-hulman.edu>, Ted Yu <yuzhih...@gmail.com> Cc: user@spark.apache.org Subject: Re: 10hrs of Scheduler Delay Does increasing the number of partition helps? You cou

Re: 10hrs of Scheduler Delay

2016-01-22 Thread Darren Govoni
Me too. I had to shrink my dataset to get it to work. For us at least Spark seems to have scaling issues.

Re: 10hrs of Scheduler Delay

2016-01-21 Thread Darren Govoni
I've experienced this same problem. Always the last stage hangs. Indeterminate. No errors in logs. I run Spark 1.5.2. Can't find an explanation. But it's definitely a showstopper.

Re: Docker/Mesos with Spark

2016-01-19 Thread Darren Govoni
I would also be interested in some best practices for making this work. Where will the writeup be posted? On the Mesosphere website?

Re: rdd.foreach return value

2016-01-18 Thread Darren Govoni
What's the rationale behind that? It certainly limits the kind of flow logic we can do in one statement.

Task hang problem

2015-12-29 Thread Darren Govoni
Hi, I've had this nagging problem where a task will hang and the entire job hangs. Using pyspark. Spark 1.5.1. The job output looks like this, and hangs after the last task: .. 15/12/29 17:00:38 INFO BlockManagerInfo: Added broadcast_0_piece0 in

Re: Task hang problem

2015-12-29 Thread Darren Govoni
Here's the executor trace.
Thread 58: Executor task launch worker-3 (RUNNABLE)
java.net.SocketInputStream.socketRead0(Native Method)
java.net.SocketInputStream.read(SocketInputStream.java:152)

Re: DataFrame Vs RDDs ... Which one to use When ?

2015-12-28 Thread Darren Govoni
I'll throw a thought in here. Dataframes are nice if your data is uniform and clean with a consistent schema. However, in many big data problems this is seldom the case.

Re: Scala VS Java VS Python

2015-12-16 Thread Darren Govoni
I use Python too. I'm actually surprised it's not the primary language, since it is by far more used in data science than Java and Scala combined. If I had a second choice of scripting language for general apps, I'd want Groovy over Scala.

Re: Pyspark submitted app just hangs

2015-12-02 Thread Darren Govoni
to me doesn't give me a direction to look without the actual logs from $SPARK_HOME or the stderr from the worker UI. Just IMHO; maybe someone knows what this means, but it seems like it could be caused by a lot of things.

Pyspark submitted app just hangs

2015-12-02 Thread Darren Govoni
Hi all, Wondering if someone can provide some insight into why this pyspark app is just hanging. Here is the output. ... 15/12/03 01:47:05 INFO TaskSetManager: Starting task 21.0 in stage 0.0 (TID 21, 10.65.143.174, PROCESS_LOCAL, 1794787 bytes) 15/12/03 01:47:05 INFO TaskSetManager: Starting task

Python Kafka support?

2015-11-10 Thread Darren Govoni
Hi, I read on this page http://spark.apache.org/docs/latest/streaming-kafka-integration.html about Python support for "receiverless" Kafka integration (Approach 2), but it says it's incomplete as of version 1.4. Has this been updated in version 1.5.1? Darren