Re: spark run issue

2014-05-04 Thread Tathagata Das
All the stuff in lib_managed is what gets downloaded by sbt/Maven when you compile. Those jars are necessary for running Spark, Spark Streaming, etc., but you should not have to add them all to the classpath individually and manually when running Spark programs. If you are trying to run your Spark program

difference between Spark on YARN mode and standalone mode

2014-05-04 Thread Sophia
Hey guys, what is the difference between Spark on YARN mode and standalone mode in terms of resource scheduling? Wishing you a happy day. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/different-in-spark-on-yarn-mode-and-standalone-mode-tp5300.html Sent from the

Re: Crazy Kryo Exception

2014-05-04 Thread Soren Macbeth
Does this perhaps have to do with the spark.closure.serializer? On Sat, May 3, 2014 at 7:50 AM, Soren Macbeth so...@yieldbot.com wrote: Poking around in the bowels of Scala, it seems like this has something to do with implicit Scala-to-Java collection munging. Why would it be doing this and

using kryo for spark.closure.serializer with a registrator doesn't work

2014-05-04 Thread Soren Macbeth
Is this supposed to be supported? It doesn't work, at least in Mesos fine-grained mode. First it fails a bunch of times because it can't find my registrator class, since my assembly jar hasn't been fetched, like so: java.lang.ClassNotFoundException: pickles.kryo.PicklesRegistrator at
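
For context, the configuration under discussion looks roughly like this. A minimal sketch using the 0.9-era property names; the master URL is illustrative, and PicklesRegistrator is the poster's own class:

    // Sketch only: Spark 0.9-era configuration via JVM system properties.
    // spark.closure.serializer swaps the serializer used for task closures;
    // spark.kryo.registrator names a user class that registers types with Kryo.
    System.setProperty("spark.closure.serializer",
      "org.apache.spark.serializer.KryoSerializer")
    System.setProperty("spark.kryo.registrator", "pickles.kryo.PicklesRegistrator")
    val sc = new org.apache.spark.SparkContext("mesos://host:5050", "MyApp")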

Re: cache not working as expected for iteration?

2014-05-04 Thread Andrea Esposito
Maybe your memory isn't enough to hold the current RDD as well as all the past ones? RDDs that are cached or persisted have to be unpersisted explicitly; no auto-unpersist exists (maybe that will change in the 1.0 version?). Be careful: calling cache() or persist() doesn't imply the RDD will be
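
A minimal sketch of the pattern being described, assuming an existing SparkContext sc (the RDDs and loop are illustrative):

    // cache() only marks the RDD; an action actually materializes it.
    // RDDs from past iterations must be unpersisted by hand.
    var rdd = sc.parallelize(1 to 1000000).cache()
    rdd.count()                      // force the first cache
    for (i <- 1 to 10) {
      val next = rdd.map(_ + 1).cache()
      next.count()                   // materialize before dropping the old one
      rdd.unpersist()                // explicit release of the previous iteration
      rdd = next
    }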

Re: sbt run with spark.ContextCleaner ERROR

2014-05-04 Thread Tathagata Das
Can you tell which version of Spark you are using? Spark 1.0 RC3, or something intermediate? And do you call sparkContext.stop at the end of your application? If so, does this error occur before or after the stop()? TD On Sun, May 4, 2014 at 2:40 AM, wxhsdp wxh...@gmail.com wrote: Hi, all i

Re: sbt/sbt run command returns a JVM problem

2014-05-04 Thread Carter
Hi Michael, the log after I typed 'last' is as below: last scala.tools.nsc.MissingRequirementError: object scala not found. at scala.tools.nsc.symtab.Definitions$definitions$.getModuleOrClass(Definitions.scala:655) at

Re: SparkException: env SPARK_YARN_APP_JAR is not set

2014-05-04 Thread phoenix bai
According to the code, SPARK_YARN_APP_JAR is retrieved from system variables, and the key-value pairs you pass through to JavaSparkContext are isolated from system variables. So you should maybe try setting it through System.setProperty(). Thanks On Wed, Apr 23, 2014 at 6:05 PM, 肥肥
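
Sketching the suggestion (the jar path is illustrative; whether Spark reads the property or the environment depends on the exact version):

    // Set SPARK_YARN_APP_JAR as a JVM system property before creating the
    // context, since key-value pairs passed to JavaSparkContext are kept
    // separate from system variables.
    System.setProperty("SPARK_YARN_APP_JAR", "/path/to/your-app.jar")
    val sc = new org.apache.spark.api.java.JavaSparkContext("yarn-client", "MyApp")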

Re: cache not working as expected for iteration?

2014-05-04 Thread Earthson
Thanks for the help, unpersist is exactly what I want :) I see that Spark will remove some cached data automatically when memory is full; it would be much more helpful if the eviction rule were something like LRU. It seems that persist and cache are somewhat lazy? -- View this message in context:

Re: sbt run with spark.ContextCleaner ERROR

2014-05-04 Thread wxhsdp
Hi TD, actually I'm not very clear about my Spark version. I checked it out from https://github.com/apache/spark/trunk on Apr 30; please tell me where you get the version Spark 1.0 RC3 from. I did not call sparkContext.stop; now I have added it to the end of my code. Here's the log: 14/05/04 18:48:21 INFO

NoSuchMethodError: breeze.linalg.DenseMatrix

2014-05-04 Thread wxhsdp
Hi, I'm trying to use the breeze linalg library for matrix operations in my Spark code. I already added a dependency on breeze in my build.sbt and packaged my code successfully. When I run in local mode (sbt run local...) everything is OK, but when I turn to standalone mode, sbt run

unsubscribe

2014-05-04 Thread Nabeel Memon
unsubscribe

Re: Lease Exception hadoop 2.4

2014-05-04 Thread Andre Kuhnen
Thanks Mayur, the only thing my code does is read from S3 and saveAsTextFile on HDFS. Like I said, everything is written correctly, but at the end of the job there is this warning. I will try to compile with Hadoop 2.4. Thanks 2014-05-04 11:17 GMT-03:00 Mayur Rustagi

Re: cache not working as expected for iteration?

2014-05-04 Thread Nicholas Chammas
Yes, persist/cache will cache an RDD only when an action is applied to it. On Sun, May 4, 2014 at 6:32 AM, Earthson earthson...@gmail.com wrote: Thanks for the help, unpersist is exactly what I want :) I see that Spark will remove some cached data automatically when memory is full, it is much more

Re: Reading multiple S3 objects, transforming, writing back one

2014-05-04 Thread Nicholas Chammas
Chris, to use s3distcp in this case, are you suggesting saving the RDD to local/ephemeral HDFS and then copying it up to S3 using this tool? On Sat, May 3, 2014 at 7:14 PM, Chris Fregly ch...@fregly.com wrote: not sure if this directly addresses your issue, peter, but it's worth mentioning a

Re: Reading multiple S3 objects, transforming, writing back one

2014-05-04 Thread Peter
Thank you Chris. I am familiar with s3distcp; I'm trying to replicate some of that functionality and combine it with my log post-processing in one step instead of yet another step. On Saturday, May 3, 2014 4:15 PM, Chris Fregly ch...@fregly.com wrote: not sure if this directly addresses your

Re: Reading multiple S3 objects, transforming, writing back one

2014-05-04 Thread Peter
Hi Patrick, I should probably explain my use case in a bit more detail. I have hundreds of thousands to millions of clients uploading events to my pipeline; these are batched periodically (every 60 seconds atm) into logs which are dumped into S3 (and uploaded into a data warehouse). I need to

Re: NoSuchMethodError: breeze.linalg.DenseMatrix

2014-05-04 Thread DB Tsai
If you add the breeze dependency in your build.sbt project, it will not be available to all the workers. There are a couple of options: 1) use sbt assembly to package breeze into your application jar; 2) manually copy the breeze jar onto all the nodes and put it in the classpath; 3) Spark 1.0 has
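
Option 1 in sbt terms might look like this. A sketch only; the version numbers are illustrative for the Spark 0.9/1.0 era, and it assumes the sbt-assembly plugin is configured in project/plugins.sbt:

    // build.sbt -- bundle breeze into the application (fat) jar via
    // sbt-assembly, so every worker receives it with the jar. Run: sbt assembly
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "0.9.1" % "provided",
      "org.scalanlp"     %% "breeze"     % "0.7"
    )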

Re: NoSuchMethodError: breeze.linalg.DenseMatrix

2014-05-04 Thread Yadid Ayzenberg
An additional option: 4) Use SparkContext.addJar() and have the application ship your jar to all the nodes. Yadid On 5/4/14, 4:07 PM, DB Tsai wrote: If you add the breeze dependency in your build.sbt project, it will not be available to all the workers. There are a couple of options, 1) use sbt
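
Option 4 as a sketch, assuming an existing SparkContext sc (the jar path is illustrative):

    // Ship an extra jar from the driver to every worker node at runtime,
    // so tasks can load classes from it without a cluster-wide install.
    sc.addJar("/path/to/breeze_2.10-0.7.jar")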

spark ec2 error

2014-05-04 Thread Jeremy Freeman
Hi all, A heads up in case others hit this and are confused… This nice addition https://github.com/apache/spark/pull/612 causes an error when running the spark-ec2.py deploy script from a version other than master (e.g. 0.8.0). The error occurs during launch, here: ... Creating local config

Initial job has not accepted any resources

2014-05-04 Thread pedro
I have been working on a Spark program, completed it, but have spent the past few hours trying to run it on EC2 without any luck. I am hoping I can comprehensively describe my problem and what I have done, but I am pretty stuck. My code uses the following lines to configure the SparkContext, which
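
The archive cuts the message off before the configuration lines. For reference, a typical standalone-mode setup of that era looks roughly like this; purely illustrative, not the poster's actual code:

    // If the master URL doesn't match the real EC2 master host/port, or the
    // requested memory/cores exceed what the workers offer, executors never
    // register and the "has not accepted any resources" warning appears.
    val conf = new org.apache.spark.SparkConf()
      .setMaster("spark://ec2-xx-xx-xx-xx.compute-1.amazonaws.com:7077")
      .setAppName("MyJob")
      .set("spark.executor.memory", "2g")  // must fit within the workers' memory
    val sc = new org.apache.spark.SparkContext(conf)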

spark streaming question

2014-05-04 Thread Weide Zhang
Hi, it might be a very general question to ask here, but I'm curious to know why Spark Streaming can achieve better throughput than Storm, as claimed in the Spark Streaming paper. Does it depend on certain use cases and/or data sources? What drives better performance in the Spark Streaming case, or in

Re: spark ec2 error

2014-05-04 Thread Patrick Wendell
Hey Jeremy, This is actually a big problem - thanks for reporting it, I'm going to revert this change until we can make sure it is backwards compatible. - Patrick On Sun, May 4, 2014 at 2:00 PM, Jeremy Freeman freeman.jer...@gmail.com wrote: Hi all, A heads up in case others hit this and are

Re: spark ec2 error

2014-05-04 Thread Patrick Wendell
Okay I just went ahead and fixed this to make it backwards-compatible (was a simple fix). I launched a cluster successfully with Spark 0.8.1. Jeremy - if you could try again and let me know if there are any issues, that would be great. Thanks again for reporting this. On Sun, May 4, 2014 at 3:41

Re: spark streaming question

2014-05-04 Thread Chris Fregly
Great questions, Weide. In addition, I'd also like to hear more about how to horizontally scale a Spark Streaming cluster. I've gone through the samples (standalone mode) and read the documentation, but it's still not clear to me how to scale this puppy out under high load. I assume I add more

Re: spark ec2 error

2014-05-04 Thread Jeremy Freeman
Cool, glad to help! I just tested with 0.8.1 and 0.9.0 and both worked perfectly, so seems to all be good. -- Jeremy -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-ec2-error-tp5323p5329.html Sent from the Apache Spark User List mailing list archive

Re: Lease Exception hadoop 2.4

2014-05-04 Thread Andre Kuhnen
I compiled spark with SPARK_HADOOP_VERSION=2.4.0 sbt/sbt assembly, fixed the s3 dependencies, but I am still getting the same error... 14/05/05 00:32:33 WARN TaskSetManager: Loss was due to org.apache.hadoop.ipc.RemoteException

unsubscribe

2014-05-04 Thread ZHANG Jun
Original message / Subject: unsubscribe / From: Nabeel Memon nm3...@gmail.com / To: user@spark.apache.org / Cc: / unsubscribe

Error starting EC2 cluster

2014-05-04 Thread Aliaksei Litouka
I am using Spark 0.9.1. When I try to start an EC2 cluster with the spark-ec2 script, an error occurs and the following message is issued: AttributeError: 'module' object has no attribute 'check_output'. By this time, the EC2 instances are up and running, but Spark doesn't seem to be installed on

RE: difference between Spark on YARN mode and standalone mode

2014-05-04 Thread Liu, Raymond
At the core, they are not that different. In standalone mode, you have the Spark master and Spark workers, which allocate the driver and executors for your Spark app, while in YARN mode the YARN resource manager and node managers do this work. Once the driver and executors have been launched, the rest of
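
A sketch of the point being made: the application code is identical, and only the master setting selects which system does the resource scheduling (URLs illustrative):

    // Standalone: the Spark master and workers allocate driver/executors.
    val standalone = new org.apache.spark.SparkConf().setMaster("spark://master:7077")
    // YARN (yarn-client here): the YARN RM and node managers do that work.
    val onYarn     = new org.apache.spark.SparkConf().setMaster("yarn-client")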

Re: Initial job has not accepted any resources

2014-05-04 Thread Jeremy Freeman
Hey Pedro, From which version of Spark were you running the spark-ec2.py script? You might have run into the problem described here (http://apache-spark-user-list.1001560.n3.nabble.com/spark-ec2-error-td5323.html), which Patrick just fixed up to ensure backwards compatibility. With the bug, it

Re: Initial job has not accepted any resources

2014-05-04 Thread pedro
Hi Jeremy, I am running from the most recent release, 0.9. I just fixed the problem, and it was indeed about setting the variables correctly in deployment. Once I had the cluster I wanted running, I began to suspect that the master was not responding. So I killed a worker, then recreated it, and found it

Re: Lease Exception hadoop 2.4

2014-05-04 Thread Andre Kuhnen
I think I forgot to rsync the slaves with the newly compiled jar; I will give it a try as soon as possible. On 04/05/2014 21:35, Andre Kuhnen andrekuh...@gmail.com wrote: I compiled spark with SPARK_HADOOP_VERSION=2.4.0 sbt/sbt assembly, fixed the s3 dependencies, but I am still getting the

Re: sbt/sbt run command returns a JVM problem

2014-05-04 Thread phoenix bai
The total memory of your machine is 2 GB, right? Then how much memory is left free? Wouldn't Ubuntu take up quite a big portion of the 2 GB? Just a guess! On Sat, May 3, 2014 at 8:15 PM, Carter gyz...@hotmail.com wrote: Hi, thanks for all your help. I tried your setting in the sbt file, but the

Re: master attempted to re-register the worker and then took all workers as unregistered

2014-05-04 Thread Cheney Sun
Hi Nan, Have you found a way to fix the issue? Now I run into the same problem with version 0.9.1. Thanks, Cheney -- View this message in context:

Re: ClassNotFoundException

2014-05-04 Thread pedro
I just ran into the same problem. I will respond if I find how to fix. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/ClassNotFoundException-tp5182p5342.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Initial job has not accepted any resources

2014-05-04 Thread pedro
Since it appears breeze is going to be included by default in Spark 1.0, and I ran into the issue here: http://apache-spark-user-list.1001560.n3.nabble.com/ClassNotFoundException-td5182.html and it seems the issues I had were recently introduced, I am cloning Spark and checking out the

Re: spark 0.9.1: ClassNotFoundException

2014-05-04 Thread phoenix bai
Check whether the jar file that includes your example code is under examples/target/scala-2.10/. On Sat, May 3, 2014 at 5:58 AM, SK skrishna...@gmail.com wrote: I am using Spark 0.9.1 in standalone mode. In the SPARK_HOME/examples/src/main/scala/org/apache/spark/ folder, I created my directory

RE: spark-shell driver interacting with Workers in YARN mode - firewall blocking communication

2014-05-04 Thread Andrew Lee
Hi Jacob, taking both concerns into account, I'm actually thinking about using a separate subnet to isolate the Spark workers, but I need to look into how to bind the process onto the correct interface first. This may require some code change. A separate subnet doesn't limit itself to a port range

Re: pySpark memory usage

2014-05-04 Thread Aaron Davidson
I'd just like to update this thread by pointing to the PR based on our initial design: https://github.com/apache/spark/pull/640 This solution is a little more general and avoids catching IOException altogether. Long live exception propagation! On Mon, Apr 28, 2014 at 1:28 PM, Patrick Wendell

Cache issue for iteration with broadcast

2014-05-04 Thread Earthson
A new broadcast object is generated for every iteration step; it may eat up the memory and make persist fail. The broadcast objects cannot simply be removed, because the RDD may be recomputed. But since I am trying to prevent recomputing the RDD, I need the old broadcasts to release some memory. I've tried to set
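
For reference, a minimal sketch of the pattern being wrestled with, assuming Spark 1.0's Broadcast.unpersist() and an existing SparkContext sc (the data and update logic are illustrative placeholders):

    // Re-broadcast each step and release the previous value. Unpersisting is
    // only safe if nothing will recompute an RDD that still references it,
    // e.g. after checkpointing has truncated the lineage.
    val data = sc.parallelize(1 to 1000)
    var model = sc.broadcast(0.0)
    for (step <- 1 to 10) {
      val m = model                    // capture the current broadcast in a val
      val updated = data.map(x => x + m.value).reduce(_ + _)
      model = sc.broadcast(updated)
      m.unpersist()                    // release the old broadcast's storage
    }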

Re: Cache issue for iteration with broadcast

2014-05-04 Thread Earthson
Code here: https://github.com/Earthson/sparklda/blob/dev/src/main/scala/net/earthson/nlp/lda/lda.scala#L121 In the end, the iteration still runs into recomputation... -- View this message in context:

Re: Any ideas on an architecture based on Spark + Spray + Akka

2014-05-04 Thread 诺铁
Hello ZhangYi, I found Ooyala's open-sourced spark-jobserver, https://github.com/ooyala/spark-jobserver. It seems they are also using Akka, Spray, and Spark; it may be helpful for you. On Mon, May 5, 2014 at 11:37 AM, ZhangYi yizh...@thoughtworks.com wrote: Hi all, Currently, our project is

Re: Cache issue for iteration with broadcast

2014-05-04 Thread Earthson
I tried using serialization instead of broadcast, and my program exited with an error (beyond physical memory limits). Can the large object not be released by GC because it is needed for recomputing? So what is the recommended way to solve this problem? -- View this message in context:

Re: NoSuchMethodError: breeze.linalg.DenseMatrix

2014-05-04 Thread wxhsdp
Hi DB, I think it's something related to sbt publishLocal. If I remove the breeze dependency in my sbt file, breeze cannot be found: [error] /home/wxhsdp/spark/example/test/src/main/scala/test.scala:5: not found: object breeze [error] import breeze.linalg._ [error] ^ Here's my sbt file: