Hi All,
I am new to Spark.
In the Spark shell, how can I get help or an explanation for the
functions that I can use on a variable or RDD? For example, after I type an
RDD's name with a dot (.) at the end and press the Tab key, a list of
functions that I can use on this RDD will be
You can consult the docs at:
https://spark.apache.org/docs/latest/api/scala/index.html#package
In particular, the RDD docs contain an explanation of each method:
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD
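For a quick look from inside the shell itself, something like this works (a
minimal spark-shell sketch; the RDD here is just an example):

  // inside spark-shell, `sc` is already defined
  val nums = sc.parallelize(1 to 10)   // a small example RDD
  // typing `nums.` and pressing Tab lists the available methods (map, filter, reduce, ...)
  val doubled = nums.map(_ * 2)        // each method is described in the RDD Scaladoc above
  doubled.collect()                    // Array(2, 4, 6, ..., 20)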
Kr, Gerard
On Jun 8, 2014 1:00 PM, Carter
Thank you very much Gerard.
I read it more carefully, and window() might actually work for some other
stuff like logs. (assuming I can have multiple windows with entirely
different attributes on a single stream..)
Thanks for that!
On Sun, Jun 8, 2014 at 11:11 PM, Jeremy Lee unorthodox.engine...@gmail.com
wrote:
Yes..
Yeah... I have not tried it, but if you set slidingDuration == windowDuration,
that should prevent overlaps.
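Something along these lines (an untested sketch; the stream and durations are
made up):

  import org.apache.spark.streaming.Seconds

  // assuming `lines` is an existing DStream[String]
  // windowDuration == slideDuration gives back-to-back, non-overlapping windows
  val windowed = lines.window(Seconds(30), Seconds(30))
  windowed.count().print()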
Gino B.
On Jun 8, 2014, at 8:25 AM, Jeremy Lee unorthodox.engine...@gmail.com wrote:
I read it more carefully, and window() might actually work for some other
stuff like logs.
Hi All,
I just downloaded the Scala IDE for Eclipse. After I created a Spark project
and clicked Run, there was an error on this line of code, import
org.apache.spark.SparkContext: "object apache is not a member of package
org". I guess I need to import the Spark dependency into Scala IDE for
Project -> Properties -> Java Build Path -> Add External Jars
Add the /spark-1.0.0-bin-hadoop2/lib/spark-assembly-1.0.0-hadoop2.2.0.jar
Cheers
K/
On Sun, Jun 8, 2014 at 8:06 AM, Carter gyz...@hotmail.com wrote:
Hi All,
I just downloaded the Scala IDE for Eclipse. After I created a Spark
project
and
This will make the compilation pass, but you may not be able to run it
correctly.
I used Maven, adding these two jars (I use Hadoop 1); Maven pulled in their
dependent jars (a lot) for me:
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
I shut down my first (working) cluster and brought up a fresh one... and
It's been a bit of a horror and I need to sleep now. Should I be worried
about these errors? Or did I just have the old log4j.config tuned so I
didn't see them?
I
14/06/08 16:32:52 ERROR scheduler.JobScheduler: Error
A match clause needs to cover all the possibilities, and not matching
any regex is a distinct possibility. It's not really like 'switch'
because it requires this and I think that has benefits, like being
able to interpret a match as something with a type. I think it's all
in order, but it's more
When you use match, the match must be exhaustive. That is, a match error is
thrown if the match fails.
That's why you usually handle the default case using case _ => ...
Here it looks like you're taking the text of all statuses - which means not all
of them will be commands... Which means
The solution is either to add a default case which does nothing, or
probably better to add a .filter such that you filter out anything that's
not a command before matching.
And you probably want to push down that filter into the cluster --
collecting all of the elements of an RDD only to not
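A rough sketch of the filter-then-match idea (assumptions: `statuses` is a
DStream of tweet texts and commands start with "!"; the names are made up):

  // keep only statuses that look like commands before matching;
  // the filter runs on the cluster, not on the driver
  val commandPattern = "!(\\w+).*".r
  val commands = statuses.filter(_.startsWith("!"))

  commands.foreachRDD { rdd =>
    rdd.collect().foreach {
      case commandPattern(cmd) => println("got command: " + cmd)
      case _                   => // not a well-formed command: do nothing
    }
  }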
Hi,
I am stuck here, my cluster is not efficiently utilized. I'd appreciate any
input on this.
Thanks
Subacini
On Sat, Jun 7, 2014 at 10:54 PM, Subacini B subac...@gmail.com wrote:
Hi All,
My cluster has 5 workers, each having 4 cores (so 20 cores in total). It is in
standalone mode (not using
Hi All,
I was writing a simple Streaming job to get a better understanding of Spark
Streaming.
I do not understand the union behaviour. This particular case
*WORKS:*
val lines = ssc.socketTextStream("localhost", ,
StorageLevel.MEMORY_AND_DISK_SER)
val words =
In PySpark you can also do help(my_rdd) and get a nice help page of methods
available.
On Sunday, June 8, 2014, Carter gyz...@hotmail.com wrote:
Thank you very much Gerard.
Moving over to the dev list, as this isn't a user-scope issue.
I just ran into this issue with the missing saveAsTextFile, and here's a
little additional information:
- Code ported from 0.9.1 up to 1.0.0; works with local[n] in both cases.
- Driver built as an uberjar via Maven.
- Deployed to
Paul,
Could you give the version of Java that you are building with and the
version of Java you are running with? Are they the same?
Just off the cuff, I wonder if this is related to:
https://issues.apache.org/jira/browse/SPARK-1520
If it is, it could appear that certain functions are not in
Also I should add - thanks for taking time to help narrow this down!
On Sun, Jun 8, 2014 at 1:02 PM, Patrick Wendell pwend...@gmail.com wrote:
Paul,
Could you give the version of Java that you are building with and the
version of Java you are running with? Are they the same?
Just off the
I suspect Patrick is right about the cause. The Maven artifact that
was released does contain this class (phew)
http://search.maven.org/#artifactdetails%7Corg.apache.spark%7Cspark-core_2.10%7C1.0.0%7Cjar
As to the hadoop1 / hadoop2 artifact question -- agree that is often
done. Here the working
Okay I think I've isolated this a bit more. Let's discuss over on the JIRA:
https://issues.apache.org/jira/browse/SPARK-2075
On Sun, Jun 8, 2014 at 1:16 PM, Paul Brown p...@mult.ifario.us wrote:
Hi, Patrick --
Java 7 on the development machines:
» java -version
java version "1.7.0_51"
Thanks Sean, let me try setting spark.deploy.spreadOut to false.
On Sun, Jun 8, 2014 at 12:44 PM, Sean Owen so...@cloudera.com wrote:
Have a look at:
https://spark.apache.org/docs/1.0.0/job-scheduling.html
https://spark.apache.org/docs/1.0.0/spark-standalone.html
The default is to grab
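For reference, a sketch of the knobs involved (property names are from those
docs; the values here are made up):

  import org.apache.spark.{SparkConf, SparkContext}

  // per-application: cap how many cores this app grabs on the cluster
  val conf = new SparkConf()
    .setAppName("example")
    .set("spark.cores.max", "20")
  val sc = new SparkContext(conf)

  // spark.deploy.spreadOut is a master-side standalone setting, e.g.
  //   SPARK_MASTER_OPTS="-Dspark.deploy.spreadOut=false"
  // false consolidates an app's cores onto fewer workers instead of spreading
  // them across all of them.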
Gaurav,
I am not sure that the * expands to what you expect it to.
Normally, bash expands * to a space-separated list, not a
colon-separated one. Try specifying all the jars manually, maybe?
Tobias
On Thu, Jun 5, 2014 at 6:45 PM, Gaurav Dasgupta gaurav.d...@gmail.com wrote:
Hi,
I have
On Sun, Jun 8, 2014 at 10:00 AM, Nick Pentreath nick.pentre...@gmail.com
wrote:
When you use match, the match must be exhaustive. That is, a match error
is thrown if the match fails.
Ahh, right. That makes sense. Scala is applying its strong typing rules
here instead of no ceremony... but
Jeremy,
On Mon, Jun 9, 2014 at 10:22 AM, Jeremy Lee
unorthodox.engine...@gmail.com wrote:
When you use match, the match must be exhaustive. That is, a match error
is thrown if the match fails.
Ahh, right. That makes sense. Scala is applying its strong typing rules
here instead of no
I'm having some trouble getting a basic matrix multiply to work with Breeze.
I'm pretty sure it's related to my classpath. My setup is a cluster on AWS
with 8 m3.xlarges. To create the cluster I used the provided ec2 scripts and
Spark 1.0.0.
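For context, the operation involved is just a breeze.linalg multiply, along
these lines (a minimal sketch, not my actual app code):

  import breeze.linalg.DenseMatrix

  // two small dense matrices; * is matrix multiplication in breeze.linalg
  val a = DenseMatrix((1.0, 2.0), (3.0, 4.0))
  val b = DenseMatrix((5.0, 6.0), (7.0, 8.0))
  val c = a * b   // 2x2 result
  println(c)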
I've made a gist with the relevant pieces of my app:
Dear All,
I recently installed Spark 1.0.0 on a 10-slave dedicated cluster. However,
the max input rate that the system can sustain with stable latency seems
very low. I use a simple word counting workload over tweets:
theDStream.flatMap(extractWordOnePairs).reduceByKey(sumFunc).count.print
With
Hi,
I had a similar problem; I was using `sbt assembly` to build a jar
containing all my dependencies, but since my file system has a problem
with long file names (due to disk encryption), some class files (which
correspond to functions in Scala) were not included in the jar I
uploaded.
Thanks a lot Krishna, this works for me.
Thanks for your reply Wei, will try this.
Thanks for the quick response. No, I actually build my jar via 'sbt package'
on EC2 on the master itself.
Hi dlaw,
You are using breeze-0.8.1, but the Spark assembly jar depends on
breeze-0.7. If the Spark assembly jar comes first on the classpath
but the method from DenseMatrix is only available in breeze-0.8.1, you
get a NoSuchMethodError. So,
a) If you don't need the features in breeze-0.8.1, do not
Hi Tobias,
Which file system and which encryption are you using?
Best,
Xiangrui
On Sun, Jun 8, 2014 at 10:16 PM, Xiangrui Meng men...@gmail.com wrote:
Hi dlaw,
You are using breeze-0.8.1, but the spark assembly jar depends on
breeze-0.7. If the spark assembly jar comes the first on the
Without (C), what is the best practice to implement the following scenario?
1. rdd = sc.textFile(FileA)
2. rdd = rdd.map(...) // actually modifying the rdd
3. rdd.saveAsTextFile(FileA)
Since RDD transformations are lazy, the RDD will not materialize until
saveAsTextFile(), so FileA must still
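For illustration, the same scenario with the output going to a separate path
(placeholder names; .map(_.toUpperCase) stands in for the real map(...)):

  val rdd = sc.textFile("FileA").map(_.toUpperCase)
  rdd.saveAsTextFile("FileA_out")   // materializes the result under a new path
  // FileA is still read lazily while this job runs; the two paths can then be
  // swapped outside Spark (e.g. via the Hadoop FileSystem API)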