Re: Spark on Mesos 0.20
On 10/10/2014 06:11 AM, Fairiz Azizi wrote:

Hello, sorry for the late reply. When I tried the LogQuery example this time, things now seem to be fine!

...
14/10/10 04:01:21 INFO scheduler.DAGScheduler: Stage 0 (collect at LogQuery.scala:80) finished in 0.429 s
14/10/10 04:01:21 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool defa
14/10/10 04:01:21 INFO spark.SparkContext: Job finished: collect at LogQuery.scala:80, took 12.802743914 s
(10.10.10.10,FRED,GET http://images.com/2013/Generic.jpg HTTP/1.1) bytes=621 n=2

Not sure if this is the correct response for that example. Our mesos/spark builds have been updated since I last wrote; possibly the JDK version was updated to 1.7.0_67. If you are using an older JDK, maybe try updating that?

I have tested on current JDK 7 and am now running JDK 8; the problem still exists. Can you run LogQuery on data of size, say, 100+ GB, so that you have more map tasks? We start to see the issue on larger jobs.

- Gurvinder

- Fi (Fairiz Azizi)

On Wed, Oct 8, 2014 at 7:54 AM, RJ Nowling rnowl...@gmail.com wrote:

Yep! That's the example I was talking about. Is an error message printed when it hangs? I get:

14/09/30 13:23:14 ERROR BlockManagerMasterActor: Got two different block manager registrations on 20140930-131734-1723727882-5050-1895-1

On Tue, Oct 7, 2014 at 8:36 PM, Fairiz Azizi code...@gmail.com wrote:

Sure, could you point me to the example? The only thing I could find was https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/LogQuery.scala

So do you mean running it like: MASTER=mesos://xxx:5050 ./run-example LogQuery

I tried that, and I can see the job run and the tasks complete on the slave nodes, but the client process seems to hang forever; it's probably a different problem. BTW, only a dozen or so tasks kick off.
I actually haven't done much with Scala and Spark (it's been all Python).

- Fi (Fairiz Azizi)

On Tue, Oct 7, 2014 at 6:29 AM, RJ Nowling rnowl...@gmail.com wrote:

I was able to reproduce it on a small 4-node cluster (1 Mesos master and 3 Mesos slaves) with relatively low-end specs. As I said, I just ran the log query example with the fine-grained Mesos mode. Spark 1.1.0 and Mesos 0.20.1. Fairiz, could you try running the LogQuery example included with Spark and see what you get? Thanks!

On Mon, Oct 6, 2014 at 8:07 PM, Fairiz Azizi code...@gmail.com wrote:

That's what's great about Spark, the community is so active! :) I compiled Mesos 0.20.1 from the source tarball. Using the MapR3 Spark 1.1.0 distribution from the Spark downloads page (spark-1.1.0-bin-mapr3.tgz). I see no problems for the workloads we are trying. However, the cluster is small (fewer than 100 cores across 3 nodes). The workloads read in just a few gigabytes from HDFS, via an IPython notebook Spark shell. Thanks, Fi

On Mon, Oct 6, 2014 at 9:20 AM, Timothy Chen tnac...@gmail.com wrote:

OK, I created SPARK-3817 to track this; will try to repro it as well. Tim

On Mon, Oct 6, 2014 at 6:08 AM, RJ Nowling rnowl...@gmail.com wrote:

I've recently run into this issue as well. I get it from running Spark examples such as log query. Maybe that'll help reproduce the issue.

On Monday, October 6, 2014, Gurvinder Singh gurvinder.si...@uninett.no wrote:

The issue does not occur if the job at hand has a small number of map tasks. I have a job with 978 map tasks, and I see this error as

14/10/06 09:34:40 ERROR BlockManagerMasterActor: Got two different block manager registrations on
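For readers who, like Fi, work mostly in Python rather than Scala, the aggregation the LogQuery example performs can be sketched without a cluster at all: group each access-log record by an (ip, user, request) key and fold byte totals and hit counts together. The two sample records below are invented stand-ins shaped like the output quoted earlier in the thread, not data taken from it.

```python
from collections import defaultdict

# Invented sample records in the shape LogQuery extracts from access
# logs: an (ip, user, request) key mapped to a response size in bytes.
records = [
    (("10.10.10.10", "FRED", "GET http://images.com/2013/Generic.jpg HTTP/1.1"), 621),
    (("10.10.10.10", "FRED", "GET http://images.com/2013/Generic.jpg HTTP/1.1"), 0),
]

# Local stand-in for the reduce-by-key step: sum bytes and count hits
# per key, yielding lines like "... bytes=621 n=2" as in the run above.
stats = defaultdict(lambda: (0, 0))
for key, size in records:
    total, count = stats[key]
    stats[key] = (total + size, count + 1)

for (ip, user, request), (total, count) in stats.items():
    print(f"({ip},{user},{request}) bytes={total} n={count}")
# → (10.10.10.10,FRED,GET http://images.com/2013/Generic.jpg HTTP/1.1) bytes=621 n=2
```

On a real cluster the same shape of computation runs as a map over parsed log lines followed by a reduce by key, with one output record per distinct key.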
Re: Breaking the previous large-scale sort record with Spark
Brilliant stuff ! Congrats all :-) This is indeed really heartening news !

Regards, Mridul

On Fri, Oct 10, 2014 at 8:24 PM, Matei Zaharia matei.zaha...@gmail.com wrote:

Hi folks,

I interrupt your regularly scheduled user / dev list to bring you some pretty cool news for the project, which is that we've been able to use Spark to break MapReduce's 100 TB and 1 PB sort records, sorting data 3x faster on 10x fewer nodes. There's a detailed writeup at http://databricks.com/blog/2014/10/10/spark-breaks-previous-large-scale-sort-record.html. Summary: while Hadoop MapReduce held last year's 100 TB world record by sorting 100 TB in 72 minutes on 2100 nodes, we sorted it in 23 minutes on 206 nodes; and we also scaled up to sort 1 PB in 234 minutes.

I want to thank Reynold Xin for leading this effort over the past few weeks, along with Parviz Deyhim, Xiangrui Meng, Aaron Davidson and Ali Ghodsi. In addition, we'd really like to thank Amazon's EC2 team for providing the machines to make this possible. Finally, this result would of course not be possible without the many, many other contributions, testing and feature requests from throughout the community. For an engine to scale from these multi-hour petabyte batch jobs down to 100-millisecond streaming and interactive queries is quite uncommon, and it's thanks to all of you folks that we are able to make this happen.

Matei

---
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
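The headline ratios in the announcement are easy to sanity-check from the quoted figures. A quick back-of-envelope in Python (the rounding to "3x" and "10x" in the announcement is the only assumption being checked here):

```python
# Figures quoted above: Hadoop MapReduce's 100 TB record vs. the Spark run.
hadoop_nodes, hadoop_minutes = 2100, 72
spark_nodes, spark_minutes = 206, 23

speedup = hadoop_minutes / spark_minutes    # wall-clock speedup, ~3.1x
node_ratio = hadoop_nodes / spark_nodes     # node-count reduction, ~10.2x

# The implied per-node throughput advantage is the product of the two.
per_node = speedup * node_ratio
print(f"{speedup:.1f}x faster on {node_ratio:.1f}x fewer nodes, "
      f"~{per_node:.0f}x more sorted data per node per minute")
```

So "3x faster on 10x fewer nodes" works out to roughly a 32x per-node throughput difference for the 100 TB sort, which is the combined figure the blog post's comparison rests on.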
Re: Breaking the previous large-scale sort record with Spark
Wow.. Cool.. Congratulations.. :)

On Fri, Oct 10, 2014 at 8:51 PM, Ted Malaska ted.mala...@cloudera.com wrote:

This is a big deal, great job.

--
Thanks, Best Regards,
Dinesh J. Weerakkody
www.dineshjweerakkody.com
Re: Breaking the previous large-scale sort record with Spark
Great! Congratulations!

-- Nan Zhu
Re: Breaking the previous large-scale sort record with Spark
Wonderful !!
Re: Breaking the previous large-scale sort record with Spark
Great stuff. Wonderful to see such progress in so short a time. How about some links to code and instructions so that these benchmarks can be reproduced?

Regards,
- Steve

From: Debasish Das debasish.da...@gmail.com
Date: Friday, October 10, 2014 at 8:17
To: Matei Zaharia matei.zaha...@gmail.com
Cc: user u...@spark.apache.org, dev dev@spark.apache.org
Subject: Re: Breaking the previous large-scale sort record with Spark

Awesome news Matei ! Congratulations to the databricks team and all the community members...
Re: Trouble running tests
Running dev/run-tests as-is should work and will test everything. That's what the contributing guide recommends, if I remember correctly. At some point we should make it easier to test individual components locally using the dev script, but calling sbt on the various test suites as Michael pointed out will always work.

Nick

On Friday, October 10, 2014, Yana Kadiyska yana.kadiy...@gmail.com wrote:

Thanks Nicholas and Michael; yes, I wanted to make sure all tests pass before I submitted a pull request. AMPLAB_JENKINS=true ./dev/run-tests fails for me in the MLlib and YARN suites (synced to 14f222f7f76cc93633aae27a94c0e556e289ec56). I was, however, able to run Michael's suggested tests, and my changes affect the SQL project only, so I'll go ahead with the pull request. I'd like to know if people run the full suite locally, though; I can imagine cases where a change is not clearly isolated to a single module. Thanks again.

On Thu, Oct 9, 2014 at 5:26 PM, Michael Armbrust mich...@databricks.com wrote:

Also, in general, for SQL-only changes it is sufficient to run sbt/sbt catalyst/test sql/test hive/test. The hive/test part takes the longest, so I usually leave that out until just before submitting, unless my changes are Hive-specific.

On Thu, Oct 9, 2014 at 11:40 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote:

_RUN_SQL_TESTS needs to be true as well. Those two _... variables get set correctly when tests are run on Jenkins. They're not meant to be manipulated directly by testers. Did you want to run SQL tests only locally? You can try faking being Jenkins by setting AMPLAB_JENKINS=true before calling run-tests. That should be simpler than futzing with the _... variables. Nick

On Thu, Oct 9, 2014 at 10:10 AM, Yana yana.kadiy...@gmail.com wrote:

Hi, apologies if I missed a FAQ somewhere.
I am trying to submit a bug fix for the very first time. Reading the instructions, I forked the git repo (at c9ae79fba25cd49ca70ca398bc75434202d26a97) and am trying to run tests. I run this:

./dev/run-tests _SQL_TESTS_ONLY=true

and after a while get the following error:

[info] ScalaTest
[info] Run completed in 3 minutes, 37 seconds.
[info] Total number of tests run: 224
[info] Suites: completed 19, aborted 0
[info] Tests: succeeded 224, failed 0, canceled 0, ignored 5, pending 0
[info] All tests passed.
[info] Passed: Total 224, Failed 0, Errors 0, Passed 224, Ignored 5
[success] Total time: 301 s, completed Oct 9, 2014 9:31:23 AM
[error] Expected ID character
[error] Not a valid command: hive-thriftserver
[error] Expected project ID
[error] Expected configuration
[error] Expected ':' (if selecting a configuration)
[error] Expected key
[error] Not a valid key: hive-thriftserver
[error] hive-thriftserver/test
[error] ^

(I am running this without my changes.) I have 2 questions:
1. How do I fix this?
2. Is there a best practice on what to fork so you start off with a good state? I'm wondering if I should sync the latest changes or go back to a label.

Thanks in advance

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Trouble-running-tests-tp8717.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
Re: spark-prs and mesos/spark-ec2
I think this would require fairly significant refactoring of the PR board code. I'd love it if the PR board code were more easily configurable to support different JIRA / GitHub repositories, etc., but I don't have the time to work on this myself.

- Josh

On October 9, 2014 at 6:20:12 PM, Nicholas Chammas (nicholas.cham...@gmail.com) wrote:

Does it make sense to point the Spark PR review board to read from mesos/spark-ec2 as well? PRs submitted against that repo may reference Spark JIRAs and need review just like any other Spark PR. Nick