How to get the help or explanation for the functions in Spark shell?

2014-06-08 Thread Carter
Hi All, I am new to Spark. In the Spark shell, how can I get help or an explanation for the functions that I can use on a variable or RDD? For example, after I input an RDD's name with a dot (.) at the end, if I press the Tab key, a list of functions that I can use for this RDD will be

Re: How to get the help or explanation for the functions in Spark shell?

2014-06-08 Thread Gerard Maas
You can consult the docs at: https://spark.apache.org/docs/latest/api/scala/index.html#package In particular, the RDD docs contain the explanation of each method: https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD Kr, Gerard On Jun 8, 2014 1:00 PM, Carter
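
For illustration, a minimal spark-shell session showing what Carter describes (the RDD name and contents here are hypothetical):

    scala> val nums = sc.parallelize(1 to 100)
    scala> nums.<Tab>   // pressing Tab lists the methods available on this RDD
    scala> nums.count   // full signatures and explanations are in the RDD scaladoc linked above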

Re: How to get the help or explanation for the functions in Spark shell?

2014-06-08 Thread Carter
Thank you very much Gerard.

Re: Best practise for 'Streaming' dumps?

2014-06-08 Thread Jeremy Lee
I read it more carefully, and window() might actually work for some other stuff like logs. (assuming I can have multiple windows with entirely different attributes on a single stream..) Thanks for that! On Sun, Jun 8, 2014 at 11:11 PM, Jeremy Lee unorthodox.engine...@gmail.com wrote: Yes..

Re: Best practise for 'Streaming' dumps?

2014-06-08 Thread Gino Bustelo
Yeah... Have not tried it, but if you set slidingDuration == windowDuration, that should prevent overlaps. Gino B. On Jun 8, 2014, at 8:25 AM, Jeremy Lee unorthodox.engine...@gmail.com wrote: I read it more carefully, and window() might actually work for some other stuff like logs.
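
A minimal sketch of what Gino describes, assuming a StreamingContext ssc whose batch interval divides evenly into the durations (the stream source and output path are hypothetical):

    import org.apache.spark.streaming.Seconds

    val lines = ssc.socketTextStream("localhost", 9999)
    // slideDuration == windowDuration, so each element falls into exactly one window
    val dump = lines.window(Seconds(60), Seconds(60))
    dump.foreachRDD { rdd =>
      rdd.saveAsTextFile("hdfs:///dumps/batch-" + System.currentTimeMillis)
    }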

How to compile a Spark project in Scala IDE for Eclipse?

2014-06-08 Thread Carter
Hi All, I just downloaded the Scala IDE for Eclipse. After I created a Spark project and clicked Run, there was an error on the line import org.apache.spark.SparkContext: object apache is not a member of package org. I guess I need to import the Spark dependency into Scala IDE for

Re: How to compile a Spark project in Scala IDE for Eclipse?

2014-06-08 Thread Krishna Sankar
Project -> Properties -> Java Build Path -> Add External Jars, then add /spark-1.0.0-bin-hadoop2/lib/spark-assembly-1.0.0-hadoop2.2.0.jar. Cheers K/ On Sun, Jun 8, 2014 at 8:06 AM, Carter gyz...@hotmail.com wrote: Hi All, I just downloaded the Scala IDE for Eclipse. After I created a Spark project and

Re: How to compile a Spark project in Scala IDE for Eclipse?

2014-06-08 Thread Wei Tan
This will make the compilation pass, but you may not be able to run it correctly. I used Maven, adding these two jars (I use Hadoop 1); Maven added their dependent jars (a lot) for me. <dependency> <groupId>org.apache.spark</groupId> <artifactId>spark-core_2.10</artifactId>

Are scala.MatchError messages a problem?

2014-06-08 Thread Jeremy Lee
I shut down my first (working) cluster and brought up a fresh one... and it's been a bit of a horror and I need to sleep now. Should I be worried about these errors? Or did I just have the old log4j.config tuned so I didn't see them? 14/06/08 16:32:52 ERROR scheduler.JobScheduler: Error

Re: Are scala.MatchError messages a problem?

2014-06-08 Thread Sean Owen
A match clause needs to cover all the possibilities, and not matching any regex is a distinct possibility. It's not really like 'switch' because it requires this and I think that has benefits, like being able to interpret a match as something with a type. I think it's all in order, but it's more

Re: Are scala.MatchError messages a problem?

2014-06-08 Thread Nick Pentreath
When you use match, the match must be exhaustive. That is, a match error is thrown if the match fails. That's why you usually handle the default case using case _ => ... Here it looks like you're taking the text of all statuses - which means not all of them will be commands... Which means

Re: Are scala.MatchError messages a problem?

2014-06-08 Thread Mark Hamstra
The solution is either to add a default case which does nothing, or, probably better, to add a .filter such that you filter out anything that's not a command before matching. And you probably want to push down that filter into the cluster -- collecting all of the elements of an RDD only to not
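
A sketch of that filter-then-match approach (the command regex, the statuses stream, and its getText accessor are hypothetical stand-ins):

    val Command = """^/(\w+)\s*(.*)""".r

    val commands = statuses
      .map(_.getText)                                   // extract the tweet text
      .filter(t => Command.findFirstIn(t).isDefined)    // drop non-commands on the workers
      .map { case Command(name, args) => (name, args) } // the match is now effectively exhaustive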

Re: Spark Worker Core Allocation

2014-06-08 Thread Subacini B
Hi, I am stuck here, my cluster is not efficiently utilized. Appreciate any input on this. Thanks Subacini On Sat, Jun 7, 2014 at 10:54 PM, Subacini B subac...@gmail.com wrote: Hi All, My cluster has 5 workers each having 4 cores (so 20 cores in total). It is in standalone mode (not using

Spark Streaming union expected behaviour?

2014-06-08 Thread Shrikar archak
Hi All, I was writing a simple Streaming job to get a better understanding of Spark Streaming. I do not understand why the union behaviour in this particular case *WORKS:* val lines = ssc.socketTextStream("localhost", , StorageLevel.MEMORY_AND_DISK_SER) val words =
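
For comparison, a minimal union sketch under the same setup (host and ports are hypothetical; both streams must have the same element type):

    val s1 = ssc.socketTextStream("localhost", 9998)
    val s2 = ssc.socketTextStream("localhost", 9999)
    val merged = s1.union(s2) // element-wise union of the two DStreams, batch by batch
    merged.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).print()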

Re: How to get the help or explanation for the functions in Spark shell?

2014-06-08 Thread Nicholas Chammas
In PySpark you can also do help(my_rdd) and get a nice help page of the methods available. On Sunday, June 8, 2014, Carter gyz...@hotmail.com wrote: Thank you very much Gerard.

Re: Strange problem with saveAsTextFile after upgrade Spark 0.9.0-1.0.0

2014-06-08 Thread Paul Brown
Moving over to the dev list, as this isn't a user-scope issue. I just ran into this issue with the missing saveAsTextFile, and here's a little additional information: - Code ported from 0.9.1 up to 1.0.0; works with local[n] in both cases. - Driver built as an uberjar via Maven. - Deployed to

Re: Strange problem with saveAsTextFile after upgrade Spark 0.9.0-1.0.0

2014-06-08 Thread Patrick Wendell
Paul, Could you give the version of Java that you are building with and the version of Java you are running with? Are they the same? Just off the cuff, I wonder if this is related to: https://issues.apache.org/jira/browse/SPARK-1520 If it is, it could appear that certain functions are not in

Re: Strange problem with saveAsTextFile after upgrade Spark 0.9.0-1.0.0

2014-06-08 Thread Patrick Wendell
Also I should add - thanks for taking time to help narrow this down! On Sun, Jun 8, 2014 at 1:02 PM, Patrick Wendell pwend...@gmail.com wrote: Paul, Could you give the version of Java that you are building with and the version of Java you are running with? Are they the same? Just off the

Re: Strange problem with saveAsTextFile after upgrade Spark 0.9.0-1.0.0

2014-06-08 Thread Sean Owen
I suspect Patrick is right about the cause. The Maven artifact that was released does contain this class (phew) http://search.maven.org/#artifactdetails%7Corg.apache.spark%7Cspark-core_2.10%7C1.0.0%7Cjar As to the hadoop1 / hadoop2 artifact question -- agree that is often done. Here the working

Re: Strange problem with saveAsTextFile after upgrade Spark 0.9.0-1.0.0

2014-06-08 Thread Patrick Wendell
Okay, I think I've isolated this a bit more. Let's discuss over on the JIRA: https://issues.apache.org/jira/browse/SPARK-2075 On Sun, Jun 8, 2014 at 1:16 PM, Paul Brown p...@mult.ifario.us wrote: Hi, Patrick -- Java 7 on the development machines: $ java -version java version 1.7.0_51

Re: Spark Worker Core Allocation

2014-06-08 Thread Subacini B
Thanks Sean, let me try to set spark.deploy.spreadOut to false. On Sun, Jun 8, 2014 at 12:44 PM, Sean Owen so...@cloudera.com wrote: Have a look at: https://spark.apache.org/docs/1.0.0/job-scheduling.html https://spark.apache.org/docs/1.0.0/spark-standalone.html The default is to grab
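
For reference, spark.deploy.spreadOut is a standalone-master property; one way to set it (a sketch, assuming the standalone scripts) is via SPARK_MASTER_OPTS in conf/spark-env.sh on the master:

    # pack executors onto as few workers as possible instead of spreading them out
    export SPARK_MASTER_OPTS="-Dspark.deploy.spreadOut=false"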

Re: Spark Kafka streaming - ClassNotFoundException: org.apache.spark.streaming.kafka.KafkaReceiver

2014-06-08 Thread Tobias Pfeiffer
Gaurav, I am not sure that the * expands to what you expect it to. Normally, bash expands * to a space-separated string, not a colon-separated one. Try specifying all the jars manually, maybe? Tobias On Thu, Jun 5, 2014 at 6:45 PM, Gaurav Dasgupta gaurav.d...@gmail.com wrote: Hi, I have

Re: Are scala.MatchError messages a problem?

2014-06-08 Thread Jeremy Lee
On Sun, Jun 8, 2014 at 10:00 AM, Nick Pentreath nick.pentre...@gmail.com wrote: When you use match, the match must be exhaustive. That is, a match error is thrown if the match fails. Ahh, right. That makes sense. Scala is applying its strong typing rules here instead of no ceremony... but

Re: Are scala.MatchError messages a problem?

2014-06-08 Thread Tobias Pfeiffer
Jeremy, On Mon, Jun 9, 2014 at 10:22 AM, Jeremy Lee unorthodox.engine...@gmail.com wrote: When you use match, the match must be exhaustive. That is, a match error is thrown if the match fails. Ahh, right. That makes sense. Scala is applying its strong typing rules here instead of no

Classpath errors with Breeze

2014-06-08 Thread dlaw
I'm having some trouble getting a basic matrix multiply to work with Breeze. I'm pretty sure it's related to my classpath. My setup is a cluster on AWS with 8 m3.xlarges. To create the cluster I used the provided ec2 scripts and Spark 1.0.0. I've made a gist with the relevant pieces of my app:
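
For context, the kind of basic Breeze multiply being attempted might look like this minimal sketch (the matrices here are hypothetical):

    import breeze.linalg.DenseMatrix

    val a = DenseMatrix((1.0, 2.0), (3.0, 4.0))
    val b = DenseMatrix((5.0, 6.0), (7.0, 8.0))
    val c = a * b // can fail with NoSuchMethodError if an older breeze shadows the one built against
    println(c)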

How to achieve a reasonable performance on Spark Streaming

2014-06-08 Thread onpoq
Dear All, I recently installed Spark 1.0.0 on a 10-slave dedicated cluster. However, the max input rate that the system can sustain with stable latency seems very low. I use a simple word-counting workload over tweets: theDStream.flatMap(extractWordOnePairs).reduceByKey(sumFunc).count.print With
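
For reference, that pipeline in a self-contained form (a sketch; extractWordOnePairs, sumFunc, and theDStream are stand-ins matching the names in the post):

    val extractWordOnePairs = (tweet: String) => tweet.split(" ").map((_, 1))
    val sumFunc = (a: Int, b: Int) => a + b

    theDStream.flatMap(extractWordOnePairs)
      .reduceByKey(sumFunc)
      .count()   // number of distinct words per batch
      .print()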

Re: Classpath errors with Breeze

2014-06-08 Thread Tobias Pfeiffer
Hi, I had a similar problem; I was using `sbt assembly` to build a jar containing all my dependencies, but since my file system has a problem with long file names (due to disk encryption), some class files (which correspond to functions in Scala) were not included in the jar I uploaded.

Re: How to compile a Spark project in Scala IDE for Eclipse?

2014-06-08 Thread Carter
Thanks a lot Krishna, this works for me.

Re: How to compile a Spark project in Scala IDE for Eclipse?

2014-06-08 Thread Carter
Thanks for your reply Wei, will try this.

Re: Classpath errors with Breeze

2014-06-08 Thread dlaw
Thanks for the quick response. No, I actually build my jar via 'sbt package' on EC2 on the master itself.

Re: Classpath errors with Breeze

2014-06-08 Thread Xiangrui Meng
Hi dlaw, You are using breeze-0.8.1, but the spark assembly jar depends on breeze-0.7. If the spark assembly jar comes first on the classpath but the method from DenseMatrix is only available in breeze-0.8.1, you get a NoSuchMethodError. So, a) If you don't need the features in breeze-0.8.1, do not
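
A sketch of option (a) for an sbt build (assuming sbt and Scala 2.10): depend on the same breeze that the Spark 1.0 assembly bundles, so only one version can end up on the classpath:

    // build.sbt -- pin breeze to the 0.7 the Spark 1.0 assembly already ships
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "1.0.0" % "provided",
      "org.scalanlp"     %% "breeze"     % "0.7"
    )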

Re: Classpath errors with Breeze

2014-06-08 Thread Xiangrui Meng
Hi Tobias, Which file system and which encryption are you using? Best, Xiangrui On Sun, Jun 8, 2014 at 10:16 PM, Xiangrui Meng men...@gmail.com wrote: Hi dlaw, You are using breeze-0.8.1, but the spark assembly jar depends on breeze-0.7. If the spark assembly jar comes the first on the

RE: How can I make Spark 1.0 saveAsTextFile to overwrite existing file

2014-06-08 Thread innowireless TaeYun Kim
Without (C), what is the best practice to implement the following scenario? 1. rdd = sc.textFile(FileA) 2. rdd = rdd.map(...) // actually modifying the rdd 3. rdd.saveAsTextFile(FileA) Since the rdd transformation is 'lazy', rdd will not materialize until saveAsTextFile(), so FileA must still
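
One common workaround (a sketch, not an official Spark API; transform is a hypothetical function) is to materialize to a temporary path first and only then replace FileA, e.g. via the Hadoop FileSystem API:

    import org.apache.hadoop.fs.{FileSystem, Path}

    val tmp = "FileA.tmp"
    sc.textFile("FileA").map(transform).saveAsTextFile(tmp) // fully materialized before FileA is touched

    val fs = FileSystem.get(sc.hadoopConfiguration)
    fs.delete(new Path("FileA"), true)          // remove the old output directory
    fs.rename(new Path(tmp), new Path("FileA")) // swap in the new output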