Everything in lib_managed is what gets downloaded by sbt/maven when you
compile. Those jars are necessary for running Spark, Spark Streaming, etc., but
you should not have to add all of them to the classpath individually and manually
when running Spark programs. If you are trying to run your Spark program
Hey guys,
What is the difference between Spark on YARN mode and standalone mode with
respect to resource scheduling?
Wishing you a happy day.
Does this perhaps have to do with the spark.closure.serializer?
On Sat, May 3, 2014 at 7:50 AM, Soren Macbeth so...@yieldbot.com wrote:
Poking around in the bowels of Scala, it seems like this has something to
do with implicit Scala-to-Java collection conversions. Why would it be doing
this and
Is this supposed to be supported? It doesn't work, at least in Mesos fine-grained
mode. First it fails a bunch of times because it can't find my
registrator class, because my assembly jar hasn't been fetched yet, like so:
java.lang.ClassNotFoundException: pickles.kryo.PicklesRegistrator
at
Maybe your memory isn't enough to hold the current RDD and also all the
past ones?
RDDs that are cached or persisted have to be unpersisted explicitly; no
auto-unpersist exists (maybe that will change for the 1.0 version?).
Be careful that calling cache() or persist() doesn't imply the RDD will be
materialized right away; that only happens once an action is run on it.
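For reference, a minimal sketch of the explicit unpersist pattern described above (the RDD contents and the local master URL are just placeholders):

import org.apache.spark.SparkContext
import org.apache.spark.storage.StorageLevel

object UnpersistSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[2]", "unpersist-sketch")
    val rdd = sc.parallelize(1 to 1000000).map(_ * 2)

    rdd.persist(StorageLevel.MEMORY_ONLY)  // only marks the RDD for caching
    rdd.count()                            // the first action materializes and caches it
    rdd.count()                            // served from the cache

    rdd.unpersist()                        // blocks stay in memory until this explicit call
    sc.stop()
  }
}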
Can you tell which version of Spark you are using? Spark 1.0 RC3, or
something intermediate?
And do you call sparkContext.stop at the end of your application? If so,
does this error occur before or after the stop()?
TD
On Sun, May 4, 2014 at 2:40 AM, wxhsdp wxh...@gmail.com wrote:
Hi, all
i
Hi Michael,
The log after I typed 'last' is below:
last
scala.tools.nsc.MissingRequirementError: object scala not found.
at
scala.tools.nsc.symtab.Definitions$definitions$.getModuleOrClass(Definitions.scala:655)
at
According to the code, SPARK_YARN_APP_JAR is retrieved from system
variables, and the key-value pairs you pass to JavaSparkContext are isolated
from system variables, so you should maybe try setting it through
System.setProperty().
Thanks
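A literal sketch of that suggestion, purely as an illustration (the jar path is a placeholder, and whether this particular Spark build reads the value as a JVM system property rather than only as an environment variable still needs to be verified):

import org.apache.spark.api.java.JavaSparkContext

object YarnJarSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: Spark picks this up as a JVM system property. Some versions read
    // SPARK_YARN_APP_JAR only from the environment, in which case it has to be
    // exported in the shell that launches the application instead.
    System.setProperty("SPARK_YARN_APP_JAR", "/path/to/your-app-assembly.jar")

    val sc = new JavaSparkContext("yarn-client", "yarn-jar-sketch")
    // ... run jobs ...
    sc.stop()
  }
}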
On Wed, Apr 23, 2014 at 6:05 PM, 肥肥
Thanks for the help, unpersist is exactly what I want :)
I see that Spark will remove some cached data automatically when memory is full;
it would be much more helpful if the eviction rule were something like LRU.
It seems that persist and cache are somewhat lazy?
Hi TD,
Actually, I'm not quite sure which Spark version I have. I checked out
https://github.com/apache/spark/trunk on Apr 30.
Please tell me where you get the Spark 1.0 RC3 version from.
I do not call sparkContext.stop(); I have now added it to the end of my code.
Here's the log:
14/05/04 18:48:21 INFO
Hi,
I'm trying to use the breeze linalg library for matrix operations in my Spark
code. I have already added a dependency
on breeze in my build.sbt and packaged my code successfully.
When I run in local mode (sbt run local...), everything is OK,
but when I turn to standalone mode, sbt run
unsubscribe
Thanks Mayur, the only thing my code is doing is:
read from S3 and saveAsTextFile on HDFS. Like I said, everything is
written correctly, but at the end of the job there is this warning.
I will try to compile with Hadoop 2.4.
Thanks
2014-05-04 11:17 GMT-03:00 Mayur Rustagi
Yes, persist/cache will cache an RDD only when an action is applied to it.
On Sun, May 4, 2014 at 6:32 AM, Earthson earthson...@gmail.com wrote:
Thanks for the help, unpersist is exactly what I want :)
I see that Spark will remove some cached data automatically when memory is full;
it would be much more
Chris,
To use s3distcp in this case, are you suggesting saving the RDD to
local/ephemeral HDFS and then copying it up to S3 using this tool?
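For reference, the Spark side of that two-step flow would look roughly like this (bucket names and HDFS paths are placeholders; the s3distcp copy would then be run outside Spark):

import org.apache.spark.SparkContext

object HdfsThenS3Sketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[2]", "hdfs-then-s3-sketch")
    val events = sc.textFile("s3n://my-bucket/raw-logs/*")  // hypothetical input location

    val processed = events.filter(_.nonEmpty)
    // Step 1: write to local/ephemeral HDFS on the cluster.
    processed.saveAsTextFile("hdfs:///tmp/processed-logs")
    sc.stop()
    // Step 2 (outside Spark): run s3distcp to copy hdfs:///tmp/processed-logs up to S3.
  }
}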
On Sat, May 3, 2014 at 7:14 PM, Chris Fregly ch...@fregly.com wrote:
not sure if this directly addresses your issue, Peter, but it's worth
mentioning a
Thank you Chris, I am familiar with S3distcp, I'm trying to replicate some of
that functionality and combine it with my log post processing in one step
instead of yet another step.
On Saturday, May 3, 2014 4:15 PM, Chris Fregly ch...@fregly.com wrote:
not sure if this directly addresses your
Hi Patrick,
I should probably explain my use case in a bit more detail. I have hundreds of
thousands to millions of clients uploading events to my pipeline; these are
batched periodically (every 60 seconds atm) into logs which are dumped into S3
(and uploaded into a data warehouse). I need to
If you just add the breeze dependency in your build.sbt project, it will not be
available to all the workers.
There are a couple of options: 1) use sbt assembly to package breeze into your
application jar; 2) manually copy the breeze jar onto all the nodes and have
it on the classpath; 3) Spark 1.0 has
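For what it's worth, a sketch of option 1 assuming the sbt-assembly plugin is used (the plugin version and exact settings syntax vary between releases; this follows the 0.11.x style, with the Spark dependency marked provided so it is not bundled):

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")

// build.sbt
import AssemblyKeys._

assemblySettings

name := "my-spark-app"

scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "0.9.1" % "provided",
  "org.scalanlp" %% "breeze" % "0.7"
)

Running sbt assembly then produces a single application jar with breeze inside it.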
An additional option 4) Use SparkContext.addJar() and have the
application ship your jar to all the nodes.
Yadid
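A rough sketch of option 4 (the master URL and jar path are placeholders):

import org.apache.spark.SparkContext

object AddJarSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("spark://master-host:7077", "breeze-addjar-sketch")
    // Ship the dependency jar from the driver to every executor of this application.
    sc.addJar("/path/to/breeze_2.10-0.7.jar")
    // ... jobs that use breeze on the workers can run after this point ...
    sc.stop()
  }
}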
On 5/4/14, 4:07 PM, DB Tsai wrote:
If you just add the breeze dependency in your build.sbt project, it will
not be available to all the workers.
There are a couple of options: 1) use sbt
Hi all,
A heads up in case others hit this and are confused… This nice addition
https://github.com/apache/spark/pull/612 causes an error if running the
spark-ec2.py deploy script from a version other than master (e.g. 0.8.0).
The error occurs during launch, here:
...
Creating local config
I have been working on a Spark program and completed it, but have spent the past
few hours trying to run it on EC2 without any luck. I am hoping I can
comprehensively describe my problem and what I have done, but I am pretty
stuck.
My code uses the following lines to configure the SparkContext, which
Hi,
It might be a very general question to ask here, but I'm curious to know why
Spark Streaming can achieve better throughput than Storm, as claimed in the
Spark Streaming paper. Does it depend on certain use cases and/or data
sources? What drives better performance in the Spark Streaming case or in
Hey Jeremy,
This is actually a big problem - thanks for reporting it, I'm going to
revert this change until we can make sure it is backwards compatible.
- Patrick
On Sun, May 4, 2014 at 2:00 PM, Jeremy Freeman freeman.jer...@gmail.com wrote:
Hi all,
A heads up in case others hit this and are
Okay I just went ahead and fixed this to make it backwards-compatible
(was a simple fix). I launched a cluster successfully with Spark
0.8.1.
Jeremy - if you could try again and let me know if there are any
issues, that would be great. Thanks again for reporting this.
On Sun, May 4, 2014 at 3:41
Great questions, Weide. In addition, I'd also like to hear more about how
to horizontally scale a Spark Streaming cluster.
I've gone through the samples (standalone mode) and read the documentation,
but it's still not clear to me how to scale this puppy out under high load.
I assume I add more
Cool, glad to help! I just tested with 0.8.1 and 0.9.0 and both worked
perfectly, so seems to all be good.
-- Jeremy
I compiled spark with SPARK_HADOOP_VERSION=2.4.0 sbt/sbt assembly, fixed
the s3 dependencies, but I am still getting the same error...
14/05/05 00:32:33 WARN TaskSetManager: Loss was due to
org.apache.hadoop.ipc.RemoteException
Original message
Subject: unsubscribe
From: Nabeel Memon nm3...@gmail.com
To: user@spark.apache.org
Cc:
unsubscribe
I am using Spark 0.9.1. When I try to start an EC2 cluster with the
spark-ec2 script, an error occurs and the following message is issued:
AttributeError: 'module' object has no attribute 'check_output'. By this
time, the EC2 instances are up and running, but Spark doesn't seem to be
installed on
At the core, they are not that different.
In standalone mode, you have the Spark master and Spark workers, which allocate the driver
and executors for your Spark app,
while in YARN mode the YARN ResourceManager and NodeManagers do this work.
Once the driver and executors have been launched, the rest of
Hey Pedro,
From which version of Spark were you running the spark-ec2.py script? You
might have run into the problem described here
(http://apache-spark-user-list.1001560.n3.nabble.com/spark-ec2-error-td5323.html),
which Patrick just fixed up to ensure backwards compatibility.
With the bug, it
Hi Jeremy,
I am running from the most recent release, 0.9. I just fixed the problem, and
it was indeed about setting the variables correctly during deployment.
Once I had the cluster I wanted up and running, I began to suspect that the master was not
responding. So I killed a worker, then recreated it, and found it
I think I forgot to rsync the slaves with the newly compiled jar; I will
give it a try as soon as possible.
On 04/05/2014 21:35, Andre Kuhnen andrekuh...@gmail.com wrote:
I compiled spark with SPARK_HADOOP_VERSION=2.4.0 sbt/sbt assembly, fixed
the s3 dependencies, but I am still getting the
The total memory of your machine is 2G, right?
Then how much memory is left free? Wouldn't Ubuntu take up quite a big
portion of the 2G?
Just a guess!
On Sat, May 3, 2014 at 8:15 PM, Carter gyz...@hotmail.com wrote:
Hi, thanks for all your help.
I tried your setting in the sbt file, but the
Hi Nan,
Have you found a way to fix the issue? Now I run into the same problem with
version 0.9.1.
Thanks,
Cheney
I just ran into the same problem. I will respond if I find how to fix.
Since it appears breeze is going to be included by default in Spark 1.0,
and I ran into the issue here:
http://apache-spark-user-list.1001560.n3.nabble.com/ClassNotFoundException-td5182.html
and since the issues I had seem to have been recently introduced, I am cloning
Spark and checking out the
Check whether the jar file that includes your example code is under
examples/target/scala-2.10/.
On Sat, May 3, 2014 at 5:58 AM, SK skrishna...@gmail.com wrote:
I am using Spark 0.9.1 in standalone mode. In the
SPARK_HOME/examples/src/main/scala/org/apache/spark/ folder, I created my
directory
Hi Jacob,
Taking both concerns into account, I'm actually thinking about using a separate
subnet to isolate the Spark workers, but I need to look into how to bind the
process onto the correct interface first. This may require some code
change. A separate subnet also isn't limited to a port range
I'd just like to update this thread by pointing to the PR based on our
initial design: https://github.com/apache/spark/pull/640
This solution is a little more general and avoids catching IOException
altogether. Long live exception propagation!
On Mon, Apr 28, 2014 at 1:28 PM, Patrick Wendell
A new broadcast object is generated for every iteration step; this may eat up
memory and make persist fail.
The broadcast objects should not be removed, because the RDD may be recomputed.
But since I am trying to prevent recomputing the RDD, I need the old broadcasts
to release some memory.
I've tried to set
Code Here
https://github.com/Earthson/sparklda/blob/dev/src/main/scala/net/earthson/nlp/lda/lda.scala#L121
Finally, iteration still runs into recomputing...
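If the build in use already has Broadcast.unpersist() (it appeared around Spark 1.0), a rough sketch of releasing the previous iteration's broadcast before creating the next one, with the caveat from above that unpersisting too early can trigger recomputation:

import org.apache.spark.SparkContext

object BroadcastLoopSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[2]", "broadcast-loop-sketch")
    val data = sc.parallelize(1 to 100000).cache()

    var model = Array.fill(1000)(0.0)  // stands in for the large per-iteration state
    var bcModel = sc.broadcast(model)

    for (iter <- 1 to 10) {
      val len = model.length
      val delta = data.map(i => i * bcModel.value(i % len)).reduce(_ + _)

      // Release the old broadcast's blocks before replacing it; if an RDD that still
      // references it has to be recomputed later, this may force extra work.
      bcModel.unpersist()
      model = model.map(_ + delta / data.count())
      bcModel = sc.broadcast(model)
    }
    sc.stop()
  }
}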
Hello ZhangYi,
I found Ooyala's open-sourced spark-jobserver:
https://github.com/ooyala/spark-jobserver
It seems that they are also using Akka, Spray, and Spark, so it may be helpful for
you.
On Mon, May 5, 2014 at 11:37 AM, ZhangYi yizh...@thoughtworks.com wrote:
Hi all,
Currently, our project is
I tried using serialization instead of broadcast, and my program exited with an
error (beyond physical memory limits).
Can the large object not be released by GC because it is needed for
recomputing? So what is the recommended way to solve this problem?
Hi DB, I think it's something related to sbt publishLocal.
If I remove the breeze dependency from my sbt file, breeze cannot be found:
[error] /home/wxhsdp/spark/example/test/src/main/scala/test.scala:5: not
found: object breeze
[error] import breeze.linalg._
[error]^
Here's my sbt file: