Re: Pyspark Error

2014-11-18 Thread Shannon Quinn
My best guess would be a networking issue--it looks like the Python socket library isn't able to connect to whatever hostname you're providing Spark in the configuration. On 11/18/14 9:10 AM, amin mohebbi wrote: Hi there, *I have already downloaded Pre-built spark-1.1.0, I want to run
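Since the stack trace points at Python's socket library, a quick first diagnostic is to confirm that the hostname given to Spark actually resolves on the machine running the driver. A minimal sketch (the hostname to check is whatever your configuration hands to Spark):

```python
import socket

def check_resolves(hostname):
    """Return the IP the hostname resolves to, or None if resolution fails."""
    try:
        return socket.gethostbyname(hostname)
    except socket.gaierror:
        return None

# "localhost" should always resolve; substitute the master hostname
# from your Spark configuration when debugging.
print(check_resolves("localhost"))
```

If this returns None for your configured hostname, the fix is in /etc/hosts or DNS, not in Spark.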

Iterative transformations over RDD crashes in phantom reduce

2014-11-18 Thread Shannon Quinn
Hi all, This is somewhat related to my previous question ( http://apache-spark-user-list.1001560.n3.nabble.com/Iterative-changes-to-RDD-and-broadcast-variables-tt19042.html , for additional context) but for all practical purposes this is its own issue. As in my previous question, I'm making

Re: Iterative transformations over RDD crashes in phantom reduce

2014-11-18 Thread Shannon Quinn
with INDEX == 1 when there clearly isn't another reduce call at line 7? On 11/18/14 1:58 PM, Shannon Quinn wrote: Hi all, This is somewhat related to my previous question ( http://apache-spark-user-list.1001560.n3.nabble.com/Iterative-changes-to-RDD-and-broadcast-variables-tt19042.html

Re: Iterative transformations over RDD crashes in phantom reduce

2014-11-18 Thread Shannon Quinn
) Works as expected now, and I understand why it was failing before: Spark was trying to recompute the RDD, and consequently it was invoked with index == 1. On 11/18/14 2:02 PM, Shannon Quinn wrote: To clarify about what, precisely, is impossible: the crash happens with INDEX == 1 in func2
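The failure mode described here, a transformation replayed later with a stale loop index, has a pure-Python analogue: lazy pipelines look up captured variables when they are consumed, not when they are defined. A Spark-free sketch of the same trap:

```python
# Each generator lazily multiplies by i; none is consumed inside the loop.
pipelines = []
for i in range(3):
    pipelines.append(x * i for x in [1, 2])

# By the time we force evaluation, i already has its final value (2),
# so every pipeline sees the last index instead of its own:
results = [list(p) for p in pipelines]
print(results)  # [[2, 4], [2, 4], [2, 4]]
```

In Spark terms, forcing materialization inside the loop (an action on a cached RDD before INDEX changes) pins each stage to the intended value instead of letting a later recomputation pick up the final one.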

Iterative changes to RDD and broadcast variables

2014-11-16 Thread Shannon Quinn
Hi all, I'm iterating over an RDD (representing a distributed matrix...have to roll my own in Python) and making changes to different submatrices at each iteration. The loop structure looks something like: for i in range(x): VAR = sc.broadcast(i) rdd.map(func1).reduceByKey(func2) M =
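A pure-Python sketch of that loop structure may make the pattern concrete. Here `func1` and `func2` are illustrative stand-ins (not the poster's actual functions), and `reduce_by_key` locally emulates what `RDD.reduceByKey` does across the cluster:

```python
def func1(record, var):
    """Placeholder map function: scale each value by the broadcast variable."""
    key, value = record
    return key, value * var

def func2(a, b):
    """Placeholder reduce function: combine two values for the same key."""
    return a + b

def reduce_by_key(pairs, combine):
    """Pure-Python emulation of RDD.reduceByKey."""
    acc = {}
    for k, v in pairs:
        acc[k] = combine(acc[k], v) if k in acc else v
    return acc

rdd = [("a", 1), ("b", 2), ("a", 3)]
for i in range(2):
    var = i  # in Spark: VAR = sc.broadcast(i), read inside func1 as VAR.value
    m = reduce_by_key((func1(r, var) for r in rdd), func2)
    print(m)
```

On a real cluster, the broadcast from the previous iteration should be released (`VAR.unpersist()`) before rebinding the name, or stale copies accumulate on the workers.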

Re: Dividing tasks among Spark workers

2014-07-18 Thread Shannon Quinn
The default # of partitions is the # of cores, correct? On 7/18/14, 10:53 AM, Yanbo Liang wrote: check how many partitions in your program. If only one, change it to more partitions will make the execution parallel. 2014-07-18 20:57 GMT+08:00 Madhura das.madhur...@gmail.com
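In standalone mode the default parallelism is typically the total number of cores across the cluster, and each partition becomes one task. A local sketch of the even chunking Spark applies to parallelized collections (the partition count here is illustrative):

```python
def split_into_partitions(records, num_partitions):
    """Evenly chunk a list, similar to sc.parallelize(records, num_partitions)."""
    n = len(records)
    return [records[n * i // num_partitions : n * (i + 1) // num_partitions]
            for i in range(num_partitions)]

data = list(range(10))
parts = split_into_partitions(data, 4)
print([len(p) for p in parts])  # [2, 3, 2, 3]
```

If everything lands in one partition, `rdd.repartition(n)` (or passing a second argument to `sc.parallelize`) spreads the work across the workers.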

Re: Python: saving/reloading RDD

2014-07-18 Thread Shannon Quinn
+1, had to learn this the hard way when some of my objects were written as pointers, rather than translated correctly to strings :) On 7/18/14, 11:52 AM, Xiangrui Meng wrote: You can save RDDs to text files using RDD.saveAsTextFile and load it back using sc.textFile. But make sure the record
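The "written as pointers" pitfall comes from `saveAsTextFile` calling `str()` on each record: an object without a useful `__str__`/`__repr__` serializes to its memory address, which cannot be parsed back. A minimal sketch of the problem and one explicit fix (the `Point` class is a made-up example):

```python
import json

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

p = Point(1, 2)
bad = str(p)                             # '<...Point object at 0x...>' -- unrecoverable
good = json.dumps({"x": p.x, "y": p.y})  # explicit, parseable encoding

restored = json.loads(good)
print(bad)
print(restored)  # {'x': 1, 'y': 2}
```

In practice: map each record to an explicit string before `saveAsTextFile`, and map the parse back after `sc.textFile`.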

Job aborted due to stage failure: TID x failed for unknown reasons

2014-07-18 Thread Shannon Quinn
Hi all, I'm dealing with some strange error messages that I *think* come down to a memory issue, but I'm having a hard time pinning it down and could use some guidance from the experts. I have a 2-machine Spark (1.0.1) cluster. Both machines have 8 cores; one has 16GB memory, the other
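For "TID failed for unknown reasons" crashes that smell like memory pressure, a usual first step is to bound executor memory explicitly rather than relying on defaults. A hedged sketch for a Spark 1.0.x standalone cluster (the values are illustrative, not a recommendation):

```
# conf/spark-defaults.conf -- size to the *smaller* machine in a
# heterogeneous cluster, since the setting applies per executor
spark.executor.memory   12g
```

A per-node cap can also go in conf/spark-env.sh via SPARK_WORKER_MEMORY on each worker.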

Re: Spark standalone network configuration problems

2014-06-27 Thread Shannon Quinn
). Thanks Best Regards On Fri, Jun 27, 2014 at 5:47 AM, Shannon Quinn squ...@gatech.edu mailto:squ...@gatech.edu wrote: In the interest of completeness, this is how I invoke spark: [on master] sbin/start-all.sh spark-submit --py-files extra.py main.py iPhone'd

Re: Spark standalone network configuration problems

2014-06-27 Thread Shannon Quinn
No joy, unfortunately. Same issue; see my previous email--still crashes with address already in use. On 6/27/14, 1:54 AM, sujeetv wrote: Try to explicitly set set the spark.driver.host property to the master's IP. Sujeet -- View this message in context:

Re: Spark standalone network configuration problems

2014-06-27 Thread Shannon Quinn
Sorry, master spark URL in the web UI is *spark://192.168.1.101:5060*, exactly as configured. On 6/27/14, 9:07 AM, Shannon Quinn wrote: I put the settings as you specified in spark-env.sh for the master. When I run start-all.sh, the web UI shows both the worker on the master (machine1

Re: numpy + pyspark

2014-06-27 Thread Shannon Quinn
Would deploying virtualenv on each node of the cluster be viable? The dependencies would get tricky but I think this is the sort of situation it's built for. On 6/27/14, 11:06 AM, Avishek Saha wrote: I too felt the same Nick but I don't have root privileges on the cluster, unfortunately.
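A sketch of that approach, assuming no root access: create an isolated environment at an identical path on every worker node and point PySpark at its interpreter. (This uses `python3 -m venv` for a self-contained example; in 2014 the equivalent was the `virtualenv` tool.)

```shell
# Create the environment (repeat at the same path on every worker node);
# --without-pip keeps this working even where ensurepip is unavailable.
python3 -m venv --without-pip /tmp/spark-venv

# With virtualenv/pip available, you would instead run something like:
#   virtualenv /tmp/spark-venv && /tmp/spark-venv/bin/pip install numpy

# Then point PySpark workers at the env's interpreter, e.g. in spark-env.sh:
#   export PYSPARK_PYTHON=/tmp/spark-venv/bin/python

/tmp/spark-venv/bin/python --version
```

The path must match on all nodes, since every worker reads the same PYSPARK_PYTHON value.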

Re: numpy + pyspark

2014-06-27 Thread Shannon Quinn
On Fri, Jun 27, 2014 at 5:08 PM, Shannon Quinn squ...@gatech.edu mailto:squ...@gatech.edu wrote: Would deploying virtualenv on each directory on the cluster be viable? The dependencies would get tricky but I think this is the sort of situation it's built for. On 6/27/14, 11

Re: Spark standalone network configuration problems

2014-06-27 Thread Shannon Quinn
For some reason, commenting out spark.driver.host and spark.driver.port fixed something...and broke something else (or at least revealed another problem). For reference, the only lines I have in my spark-defaults.conf now: spark.app.name myProg spark.master
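The back-and-forth in this thread mostly comes down to which address each daemon binds and advertises. A hedged sketch of the relevant standalone-mode settings, using the addresses from this thread (adapt to your own network):

```
# conf/spark-env.sh on the master -- bind the master to a routable address
SPARK_MASTER_IP=192.168.1.101
SPARK_MASTER_PORT=5060

# conf/spark-defaults.conf on the driver machine
# (set spark.driver.host only if the workers can route to that address;
#  leaving spark.driver.host and spark.driver.port unset lets Spark choose,
#  which avoids "address already in use" collisions on a stale fixed port)
spark.master        spark://192.168.1.101:5060
spark.driver.host   192.168.1.101
```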

Re: Spark standalone network configuration problems

2014-06-27 Thread Shannon Quinn
check if the master machine can route to 130.49.226.148 Sujeet On Fri, Jun 27, 2014 at 12:04 PM, Shannon Quinn squ...@gatech.edu mailto:squ...@gatech.edu wrote: For some reason, commenting out spark.driver.host and spark.driver.port fixed something...and broke something else

Re: Spark standalone network configuration problems

2014-06-26 Thread Shannon Quinn
: Can you paste your spark-env.sh file? Thanks Best Regards On Thu, Jun 26, 2014 at 7:01 PM, Shannon Quinn squ...@gatech.edu mailto:squ...@gatech.edu wrote: Both /etc/hosts have each other's IP addresses in them. Telneting from machine2 to machine1 on port 5060 works just fine. Here's

Re: Spark standalone network configuration problems

2014-06-26 Thread Shannon Quinn
In the interest of completeness, this is how I invoke spark: [on master] sbin/start-all.sh spark-submit --py-files extra.py main.py iPhone'd On Jun 26, 2014, at 17:29, Shannon Quinn squ...@gatech.edu wrote: My *best guess* (please correct me if I'm wrong) is that the master (machine1

Spark standalone network configuration problems

2014-06-25 Thread Shannon Quinn
the worker on the master node indicate that it's running just fine) I appreciate any assistance you can offer! Regards, Shannon Quinn