Re: Help with error initializing SparkR.

2014-04-20 Thread Shivaram Venkataraman
I just updated the GitHub issue -- in case anybody is curious, this was a problem with R resolving the right Java version installed in the VM. Thanks, Shivaram. On Sat, Apr 19, 2014 at 7:12 PM, tongzzz tongzhang...@gmail.com wrote: I can't initialize the sc context after a successful install on

question about the SocketReceiver

2014-04-20 Thread YouPeng Yang
Hi, I am studying the structure of Spark Streaming (my Spark version is 0.9.0). I have a question about the SocketReceiver. In the onStart function: --- protected def onStart() { logInfo("Connecting to " + host + ":" + port) val socket
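For context, a sketch of roughly what that onStart does in 0.9.x, reconstructed from memory rather than copied verbatim from the Spark source: it opens the socket, starts the block generator, and feeds deserialized objects into it. (blockGenerator and bytesToObjects are members of the enclosing SocketReceiver class.)

    protected def onStart() {
      logInfo("Connecting to " + host + ":" + port)
      val socket = new Socket(host, port)
      logInfo("Connected to " + host + ":" + port)
      blockGenerator.start()                    // groups received objects into blocks for storage
      val iterator = bytesToObjects(socket.getInputStream())
      while (iterator.hasNext) {
        blockGenerator += iterator.next         // push each received object into the generator
      }
    }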

Re: Anyone using value classes in RDDs?

2014-04-20 Thread Surendranauth Hiraman
If the purpose is only aliasing, rather than adding additional methods and avoiding runtime allocation, what about type aliases?

type ID = String
type Name = String

On Sat, Apr 19, 2014 at 9:26 PM, kamatsuoka ken...@gmail.com wrote: No, you can wrap other types in value classes as well. You
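A minimal sketch of the type-alias idea (names illustrative): the signatures read nicely, but the compiler still treats both aliases as plain String, so they are interchangeable.

    object Model {
      type ID = String
      type Name = String
    }
    import Model._

    def lookup(id: ID): Option[Name] = None    // hypothetical lookup function

    val n: Name = "alice"
    lookup(n)    // compiles: a Name is accepted where an ID is expected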

Re: Anyone using value classes in RDDs?

2014-04-20 Thread Surendranauth Hiraman
Oh, sorry, I think your point was probably that you wouldn't need runtime allocation. I guess that is the key question. I would be interested to hear if this works for you. -Suren On Sun, Apr 20, 2014 at 9:18 AM, Surendranauth Hiraman suren.hira...@velos.io wrote: If the purpose is only aliasing,

Re: Anyone using value classes in RDDs?

2014-04-20 Thread Luis Ángel Vicente Sánchez
Type aliases aren't safe, as you could use any string as a name or ID. On 20 Apr 2014 14:18, Surendranauth Hiraman suren.hira...@velos.io wrote: If the purpose is only aliasing, rather than adding additional methods and avoiding runtime allocation, what about type aliases? type ID = String type
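A minimal sketch of the value-class alternative (names illustrative): each wrapper is a distinct compile-time type, and extending AnyVal avoids the allocation in most direct uses.

    case class ID(value: String) extends AnyVal
    case class Name(value: String) extends AnyVal

    def lookup(id: ID): Option[Name] = None    // hypothetical lookup function

    lookup(ID("user-42"))        // OK
    // lookup(Name("alice"))     // does not compile: Name is not an ID

The caveat relevant to this thread: value classes still box in generic contexts such as collections or RDD elements, so the allocation question is not fully settled by AnyVal alone.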

Re: Help with error initializing SparkR.

2014-04-20 Thread tongzzz
Problem solved, Shivaram's answer in the github post is the perfect solution for me. See https://github.com/amplab-extras/SparkR-pkg/issues/46# Thanks! -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Help-with-error-initializing-SparkR-tp4495p4504.html

Re: Ooyala Server - plans to merge it into Apache ?

2014-04-20 Thread Andrew Ash
The homepage for Ooyala's job server is here: https://github.com/ooyala/spark-jobserver They decided (I think with input from the Spark team) that it made more sense to keep the jobserver in a separate repository for now. Andrew On Fri, Apr 18, 2014 at 5:42 AM, Azuryy Yu azury...@gmail.com

evaluate spark

2014-04-20 Thread Joe L
I want to evaluate Spark performance by measuring the running time of transformation operations such as map and join. To do so, is it enough to materialize them with merely a count action? As far as I know, transformations are lazy operations and don't do any computation until we call an action on them, but when
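A minimal sketch of the pattern being asked about (assuming a local SparkContext; sizes illustrative): the map returns immediately because it is lazy, so you time around an action such as count that forces the computation.

    val sc = new org.apache.spark.SparkContext("local[4]", "timing")
    val data = sc.parallelize(1 to 10000000)

    val start = System.nanoTime
    data.map(_ * 2).count()    // count forces the lazy map to actually run
    println("map+count: " + (System.nanoTime - start) / 1e6 + " ms")

The measured time necessarily includes the count itself, so it is the cost of the transformation plus a cheap action, not the transformation alone.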

Re: Hung inserts?

2014-04-20 Thread Rahul Chugh
On Sunday, April 20, 2014, Brad Heller brad.hel...@gmail.com wrote: Hey list, I've got some CSV data I'm importing from S3. I can create the external table well enough, and I can also do a CREATE TABLE ... AS SELECT ... from it to pull the data internal to Spark. Here's

Long running time for GraphX pagerank in dataset com-Friendster

2014-04-20 Thread Qi Song
Hello! I was running some PageRank tests of GraphX on my 8-node cluster. I allocated each worker 32G of memory and 8 CPU cores. The LiveJournal dataset took 370s, which in my mind is reasonable. But when I tried the com-Friendster dataset ( http://snap.stanford.edu/data/com-Friendster.html ) with
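For reference, a minimal sketch of a GraphX PageRank run of the kind described (assuming a spark-shell sc; file path and tolerance are illustrative):

    import org.apache.spark.graphx.GraphLoader

    val graph = GraphLoader.edgeListFile(sc, "hdfs:///data/com-friendster.txt")
    val ranks = graph.pageRank(0.0001).vertices    // iterate until convergence at the given tolerance
    ranks.take(5).foreach(println)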

running tests selectively

2014-04-20 Thread Arun Ramakrishnan
I would like to run some of the tests selectively. I am on branch-1.0. I tried the following two commands, but each seems to run everything.

./sbt/sbt testOnly org.apache.spark.rdd.RDDSuite
./sbt/sbt test-only org.apache.spark.rdd.RDDSuite

Also, how do I run the tests of only one of the
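For what it's worth, a common cause of this is sbt treating the suite name as a separate command rather than an argument; quoting the whole thing passes it as one argument to test-only (a sketch -- exact syntax can vary by sbt version and module):

    ./sbt/sbt "test-only org.apache.spark.rdd.RDDSuite"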

Re: running tests selectively

2014-04-20 Thread Patrick Wendell
I put some notes in this doc: https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools On Sun, Apr 20, 2014 at 8:58 PM, Arun Ramakrishnan sinchronized.a...@gmail.com wrote: I would like to run some of the tests selectively. I am in branch-1.0 Tried the following two

Re: Task splitting among workers

2014-04-20 Thread Patrick Wendell
For a HadoopRDD, the Spark scheduler first calculates the number of tasks based on input splits. Usually people use this with HDFS data, so in that case it's based on HDFS blocks. If the HDFS datanodes are co-located with the Spark cluster, then it will try to run the tasks on the datanode that
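A minimal sketch illustrating this (assuming a spark-shell sc; path illustrative): the partition count of the resulting RDD reflects the input splits, and a minimum can be requested.

    // Each HDFS block typically becomes one input split, and each split one task.
    val rdd = sc.textFile("hdfs:///data/big.log")
    println(rdd.partitions.length)    // roughly file size / block size

    val finer = sc.textFile("hdfs:///data/big.log", 64)    // ask for at least 64 splits
    println(finer.partitions.length)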