Task Recalculate or total failure due to fetchError

2014-03-16 Thread guojc
Hi there, In our experiments with Spark, we found that the same Spark application has a large variance in execution time and sometimes even fails completely. In the log, we find this is usually due to tasks being resubmitted after a fetch failure, with a log like the following: 14/03/16 16:40:38 WARN TaskSetManager: Lost TID

[Powered by] Yandex Islands powered by Spark

2014-03-16 Thread Egor Pahomov
Hi, the page https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark says I need to write here if I want my project to be added there. At Yandex (www.yandex.com) we are now using Spark for the project Yandex Islands (

Maximum memory limits

2014-03-16 Thread Debasish Das
Hi, I gave my Spark job 16 GB of memory and it is running on 8 executors. The job needs more memory due to ALS requirements (20M x 1M matrix). On each node I have 96 GB of memory and I am using 16 GB of it. I want to increase the memory but I am not sure what is the right way to do

How to kill a spark app ?

2014-03-16 Thread Debasish Das
Are these the right options: 1. If there is a Spark script, just do a ctrl-c from spark-shell and the job will be killed properly. 2. For a Spark application, ctrl-c will also kill the job properly on the cluster. Somehow the ctrl-c option did not work for us... A similar option works fine for

Re: Maximum memory limits

2014-03-16 Thread Sean Owen
You should simply use a snapshot built from HEAD of github.com/apache/spark if you can. The key change is in MLlib and with any luck you can just replace that bit. See the PR I referenced. Sure, with enough memory you can get it to run even with the memory issue, but it could be hundreds of GB at

Re: How to kill a spark app ?

2014-03-16 Thread Mayur Rustagi
There is no good way to kill jobs in Spark yet. The closest is cancelAllJobs and cancelJobGroup in the Spark context. I have had bugs using both. I am trying to test them out; typically you would start a different thread and call these functions on it when you wish to cancel a job. Regards Mayur Mayur
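
The approach described above (submit the job from one thread, cancel it from another) might look roughly like the following. This is a minimal sketch, not the poster's code: the object name `CancelJobGroupSketch`, the helper `slowIdentity`, the group id, and the timings are all illustrative assumptions, and it assumes a Spark version that provides `SparkConf`, `SparkContext.setJobGroup` and `SparkContext.cancelJobGroup`. Given the bugs reported above, treat cancellation behaviour as something to verify on your own version.

```scala
import java.util.concurrent.atomic.AtomicBoolean
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; none of these names come from the thread.
object CancelJobGroupSketch {
  // Deliberately slow function so the job is still running when it is cancelled.
  def slowIdentity(i: Int): Int = { Thread.sleep(200); i }

  // Submits a slow job from a worker thread under `groupId`, cancels that group
  // from the calling thread, and reports whether the job failed as a result.
  def runAndCancel(sc: SparkContext, groupId: String): Boolean = {
    val failed = new AtomicBoolean(false)
    val slowRdd = sc.parallelize(1 to 200, 4).map(slowIdentity)

    val worker = new Thread(new Runnable {
      def run(): Unit = {
        // The job group is a per-thread property, so it is set in the
        // thread that actually submits the job.
        sc.setJobGroup(groupId, "slow job used to demonstrate cancellation")
        try {
          slowRdd.count()
        } catch {
          // A cancelled job surfaces as an exception in the submitting thread.
          case e: Exception => failed.set(true)
        }
      }
    })

    worker.start()
    Thread.sleep(2000)            // assumed to be long enough for the job to start
    sc.cancelJobGroup(groupId)    // cancel every job tagged with this group
    worker.join()
    failed.get()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("cancel-sketch")
    val sc = new SparkContext(conf)
    try println("job cancelled: " + runAndCancel(sc, "demo-group"))
    finally sc.stop()
  }
}
```

`sc.cancelAllJobs()` could be called in place of `cancelJobGroup` when there is no need to single out one group.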

Re: possible bug in Spark's ALS implementation...

2014-03-16 Thread Matei Zaharia
On Mar 14, 2014, at 5:52 PM, Michael Allman m...@allman.ms wrote: I also found that the product and user RDDs were being rebuilt many times over in my tests, even for tiny data sets. By persisting the RDD returned from updateFeatures() I was able to avoid a raft of duplicate computations. Is
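
As a generic illustration of the persistence point above (this is not the ALS code under discussion, and `updateFeatures()` is not reproduced here), the sketch below persists an RDD that two separate actions reuse, so the `map` step does not have to be recomputed for the second action. The object name `PersistSketch` and the helper `square` are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Hypothetical example object; unrelated to MLlib's ALS internals.
object PersistSketch {
  def square(x: Int): Int = x * x

  // Runs two separate actions (count and reduce) on the same derived RDD.
  def countAndSum(sc: SparkContext, n: Int): (Long, Int) = {
    val squares = sc.parallelize(1 to n).map(square)
    // Without this call, each action below would recompute the map step.
    squares.persist(StorageLevel.MEMORY_ONLY)
    val count = squares.count()        // first action: computes and caches the partitions
    val sum = squares.reduce(_ + _)    // second action: can read the cached partitions
    (count, sum)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("persist-sketch")
    val sc = new SparkContext(conf)
    try println(countAndSum(sc, 10))
    finally sc.stop()
  }
}
```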

Re: How to kill a spark app ?

2014-03-16 Thread Debasish Das
From http://spark.incubator.apache.org/docs/latest/spark-standalone.html#launching-applications-inside-the-cluster: ./bin/spark-class org.apache.spark.deploy.Client kill driverId does not work / has bugs? On Sun, Mar 16, 2014 at 1:17 PM, Mayur Rustagi mayur.rust...@gmail.com wrote: There is

Re: How to kill a spark app ?

2014-03-16 Thread Mayur Rustagi
This is meant to kill the whole driver hosted inside the Master (new feature as of 0.9.0). I assume you are trying to kill a job/task/stage inside Spark rather than the whole application. Regards Mayur Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi

Running Spark on a single machine

2014-03-16 Thread goi cto
Hi, I know it is probably not the purpose of Spark, but the syntax is easy and cool... I need to run some Spark-like code in memory on a single machine. Any pointers on how to optimize it to run on only one machine? -- Eran | CTO

Machine Learning on streaming data

2014-03-16 Thread Nasir Khan
Hi, I am working on a project in which I have to take streaming URLs, filter them, and classify them as benign or suspicious. Machine learning and streaming are two separate things in Apache Spark (AFAIK). My question is: can we apply online machine learning algorithms to streams? I am at Beginner

Re: slf4j and log4j loop

2014-03-16 Thread Patrick Wendell
This is not released yet but we're planning to cut a 0.9.1 release very soon (e.g. most likely this week). In the meantime you'll have to check out branch-0.9 of Spark and publish it locally, then depend on the snapshot version. Or just wait it out... On Fri, Mar 14, 2014 at 2:01 PM, Adrian Mocanu

Re: Running Spark on a single machine

2014-03-16 Thread Nick Pentreath
Please follow the instructions at http://spark.apache.org/docs/latest/index.html and http://spark.apache.org/docs/latest/quick-start.html to get started on a local machine. On Sun, Mar 16, 2014 at 11:39 PM, goi cto goi@gmail.com wrote: Hi, I know it is
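
For readers who want a starting point before reading the linked guides, here is a minimal sketch of running Spark code entirely on one machine by using a local master. The object name `LocalModeSketch`, the helper `totalLength`, and the choice of four worker threads in `local[4]` are illustrative assumptions, not something taken from the thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; runs everything inside a single JVM.
object LocalModeSketch {
  // Sums the lengths of the given words using an RDD.
  def totalLength(sc: SparkContext, words: Seq[String]): Int =
    sc.parallelize(words).map(_.length).reduce(_ + _)

  def main(args: Array[String]): Unit = {
    // "local[4]" asks Spark to run locally with 4 worker threads; no cluster is needed.
    val conf = new SparkConf().setMaster("local[4]").setAppName("local-mode-sketch")
    val sc = new SparkContext(conf)
    try println(totalLength(sc, Seq("spark", "on", "one", "machine")))
    finally sc.stop()
  }
}
```

The same code could later be pointed at a cluster by changing only the master URL, which is one reason the local master is a convenient way to keep the Spark syntax on a single machine.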

Re: How to kill a spark app ?

2014-03-16 Thread Matei Zaharia
If it’s a driver on the cluster, please open a JIRA issue about this — this kill command is indeed intended to work. Matei On Mar 16, 2014, at 2:35 PM, Mayur Rustagi mayur.rust...@gmail.com wrote: Are you embedding your driver inside the cluster? If not then that command will not kill the

Re: [Powered by] Yandex Islands powered by Spark

2014-03-16 Thread Matei Zaharia
Thanks, I’ve added you: https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark. Let me know if you want to change any wording. Matei On Mar 16, 2014, at 6:48 AM, Egor Pahomov pahomov.e...@gmail.com wrote: Hi, page https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark