Re: Hybrid GPU CPU computation
This is a bit crazy :) I suppose you would have to run Java code on the GPU! I heard there are some funny projects to do that...

Pascal

On Fri, Apr 11, 2014 at 2:38 PM, Jaonary Rabarisoa jaon...@gmail.com wrote:

Hi all, I'm just wondering if hybrid GPU/CPU computation is something that is feasible with Spark? And what would be the best way to do it. Cheers, Jaonary
Re: Hybrid GPU CPU computation
On Fri, Apr 11, 2014 at 3:34 PM, Dean Wampler deanwamp...@gmail.com wrote:

I've thought about this idea, although I haven't tried it, but I think the right approach is to pick your granularity boundary and use Spark + the JVM for the large-scale parts of the algorithm, then use a GPGPU API for number crunching large chunks at a time. No need to run the JVM and Spark on the GPU, which would make no sense anyway.

I find it would be crazy to be able to run the JVM on a GPU, even if it's a bit nonsense XD. Anyway, you're right, the approach of delegating just some parts of the code to the GPU is interesting, but it also means you have to pre-install this code on all cluster nodes...

Here's another approach: http://www.cakesolutions.net/teamblogs/2013/02/13/akka-and-cuda/

dean

On Fri, Apr 11, 2014 at 7:49 AM, Saurabh Jha saurabh.jha.2...@gmail.com wrote:

There is a Scala implementation for GPGPUs (NVIDIA CUDA, to be precise), but you also need to port Mesos for GPUs. I am not sure about Mesos. Also, the current Scala GPU version is not stable enough to be used commercially. Hope this helps.

Thanks,
Saurabh

*Saurabh Jha*
Intl. Exchange Student
School of Computing Engineering
Nanyang Technological University, Singapore
Web: http://profile.saurabhjha.in
Mob: +65 94663172

On Fri, Apr 11, 2014 at 8:40 PM, Pascal Voitot Dev pascal.voitot@gmail.com wrote:

This is a bit crazy :) I suppose you would have to run Java code on the GPU! I heard there are some funny projects to do that...

Pascal

On Fri, Apr 11, 2014 at 2:38 PM, Jaonary Rabarisoa jaon...@gmail.com wrote:

Hi all, I'm just wondering if hybrid GPU/CPU computation is something that is feasible with Spark? And what would be the best way to do it. Cheers, Jaonary

--
Dean Wampler, Ph.D.
Typesafe
@deanwampler
http://typesafe.com
http://polyglotprogramming.com
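To make the granularity idea a bit more concrete, here is a minimal, untested sketch in Java of what delegating each partition's number crunching could look like. NativeGpuKernel is a hypothetical name for a JNI wrapper around a CUDA routine (assumed to be pre-installed on every worker node, as discussed above); the stub below just does the math on the CPU so the sketch is self-contained.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;

public class GpuDelegationSketch {

    // Hypothetical JNI wrapper around a CUDA kernel. The real thing would be a
    // native library deployed on every worker; this stub squares each element on
    // the CPU so the sketch compiles and runs on its own.
    static class NativeGpuKernel {
        static List<double[]> transformBatch(List<double[]> batch) {
            List<double[]> out = new ArrayList<double[]>(batch.size());
            for (double[] v : batch) {
                double[] r = new double[v.length];
                for (int i = 0; i < v.length; i++) r[i] = v[i] * v[i];
                out.add(r);
            }
            return out;
        }
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("gpu-delegation-sketch").setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<double[]> vectors = sc.parallelize(Arrays.asList(
                new double[] { 1.0, 2.0 }, new double[] { 3.0, 4.0 }), 2);

        // Spark and the JVM handle the large-scale distribution; each partition is
        // collected into one coarse-grained chunk and handed to the (hypothetical)
        // GPU kernel in a single call.
        JavaRDD<double[]> transformed = vectors.mapPartitions(
                new FlatMapFunction<Iterator<double[]>, double[]>() {
                    public Iterable<double[]> call(Iterator<double[]> it) {
                        List<double[]> batch = new ArrayList<double[]>();
                        while (it.hasNext()) batch.add(it.next());
                        return NativeGpuKernel.transformBatch(batch);
                    }
                });

        System.out.println("crunched " + transformed.count() + " vectors");
        sc.stop();
    }
}

The only point is the shape: Spark never runs on the GPU, the GPU only ever sees coarse-grained batches, one per partition, exactly at the granularity boundary Dean describes.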
Re: Announcing Spark SQL
On 27 March 2014 at 09:47, andy petrella andy.petre...@gmail.com wrote:

I hijack the thread, but my 2c is that this feature is also important to enable ad-hoc queries, which are done at runtime. It doesn't remove the interest of such a macro for precompiled jobs of course, but it may not be the first use case envisioned for this Spark SQL.

I'm not sure I see what you call ad-hoc queries... Any sample?

Again, only my 0.2c (ok, I divided by 10 after writing my thoughts ^^)

Andy

On Thu, Mar 27, 2014 at 9:16 AM, Pascal Voitot Dev pascal.voitot@gmail.com wrote:

Hi, quite interesting! Suggestion: why not go even fancier and parse SQL queries at compile time with a macro? ;)

Pascal

On Wed, Mar 26, 2014 at 10:58 PM, Michael Armbrust mich...@databricks.com wrote:

Hey everyone, this already went out to the dev list, but I wanted to put a pointer here as well to a new feature we are pretty excited about for Spark 1.0.

http://databricks.com/blog/2014/03/26/Spark-SQL-manipulating-structured-data-using-Spark.html

Michael
Re: Announcing Spark SQL
On Thu, Mar 27, 2014 at 10:22 AM, andy petrella andy.petre...@gmail.com wrote:

I just mean queries sent at runtime ^^, like for any RDBMS. In our project we have such a requirement: a layer to play with the data (a custom, low-level service layer of a lambda architecture), and something like this is interesting.

Ok, that's what I thought! But for these runtime queries, is a macro useful for you?

On Thu, Mar 27, 2014 at 10:15 AM, Pascal Voitot Dev pascal.voitot@gmail.com wrote:

On 27 March 2014 at 09:47, andy petrella andy.petre...@gmail.com wrote:

I hijack the thread, but my 2c is that this feature is also important to enable ad-hoc queries, which are done at runtime. It doesn't remove the interest of such a macro for precompiled jobs of course, but it may not be the first use case envisioned for this Spark SQL.

I'm not sure I see what you call ad-hoc queries... Any sample?

Again, only my 0.2c (ok, I divided by 10 after writing my thoughts ^^)

Andy

On Thu, Mar 27, 2014 at 9:16 AM, Pascal Voitot Dev pascal.voitot@gmail.com wrote:

Hi, quite interesting! Suggestion: why not go even fancier and parse SQL queries at compile time with a macro? ;)

Pascal

On Wed, Mar 26, 2014 at 10:58 PM, Michael Armbrust mich...@databricks.com wrote:

Hey everyone, this already went out to the dev list, but I wanted to put a pointer here as well to a new feature we are pretty excited about for Spark 1.0.

http://databricks.com/blog/2014/03/26/Spark-SQL-manipulating-structured-data-using-Spark.html

Michael
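For what it's worth, here is a minimal, untested sketch of the kind of runtime ("ad-hoc") query being discussed, assuming the Java API outlined in the Spark 1.0 documentation (JavaSQLContext / JavaSchemaRDD); the Person bean, the table name and the query string are made up for illustration. The point is that the query is an ordinary string assembled at runtime (from a console, a REST call, etc.), which is exactly the case a compile-time macro cannot cover.

import java.io.Serializable;
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.api.java.JavaSQLContext;
import org.apache.spark.sql.api.java.JavaSchemaRDD;

public class AdHocQuerySketch {

    // Plain bean describing the records we want to query.
    public static class Person implements Serializable {
        private String name;
        private int age;
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public int getAge() { return age; }
        public void setAge(int age) { this.age = age; }
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("adhoc-sql-sketch").setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaSQLContext sqlCtx = new JavaSQLContext(sc);

        Person p1 = new Person(); p1.setName("andy"); p1.setAge(29);
        Person p2 = new Person(); p2.setName("pascal"); p2.setAge(35);
        JavaRDD<Person> people = sc.parallelize(Arrays.asList(p1, p2));

        // Register the RDD as a table so it can be queried by name.
        JavaSchemaRDD schemaPeople = sqlCtx.applySchema(people, Person.class);
        schemaPeople.registerAsTable("people");

        // The query is just a string built at runtime: an ad-hoc query.
        String query = "SELECT name FROM people WHERE age >= 30";
        JavaSchemaRDD result = sqlCtx.sql(query);

        System.out.println(result.count() + " matching rows");
        sc.stop();
    }
}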
Re: Announcing Spark SQL
On Thu, Mar 27, 2014 at 11:08 AM, andy petrella andy.petre...@gmail.com wrote:

nope (what I said :-P)

That's also my answer to my own question :D but I didn't understand that from your sentence: "my 2c is that this feature is also important to enable ad-hoc queries which are done at runtime."

On Thu, Mar 27, 2014 at 11:05 AM, Pascal Voitot Dev pascal.voitot@gmail.com wrote:

On Thu, Mar 27, 2014 at 10:22 AM, andy petrella andy.petre...@gmail.com wrote:

I just mean queries sent at runtime ^^, like for any RDBMS. In our project we have such a requirement: a layer to play with the data (a custom, low-level service layer of a lambda architecture), and something like this is interesting.

Ok, that's what I thought! But for these runtime queries, is a macro useful for you?

On Thu, Mar 27, 2014 at 10:15 AM, Pascal Voitot Dev pascal.voitot@gmail.com wrote:

On 27 March 2014 at 09:47, andy petrella andy.petre...@gmail.com wrote:

I hijack the thread, but my 2c is that this feature is also important to enable ad-hoc queries, which are done at runtime. It doesn't remove the interest of such a macro for precompiled jobs of course, but it may not be the first use case envisioned for this Spark SQL.

I'm not sure I see what you call ad-hoc queries... Any sample?

Again, only my 0.2c (ok, I divided by 10 after writing my thoughts ^^)

Andy

On Thu, Mar 27, 2014 at 9:16 AM, Pascal Voitot Dev pascal.voitot@gmail.com wrote:

Hi, quite interesting! Suggestion: why not go even fancier and parse SQL queries at compile time with a macro? ;)

Pascal

On Wed, Mar 26, 2014 at 10:58 PM, Michael Armbrust mich...@databricks.com wrote:

Hey everyone, this already went out to the dev list, but I wanted to put a pointer here as well to a new feature we are pretty excited about for Spark 1.0.

http://databricks.com/blog/2014/03/26/Spark-SQL-manipulating-structured-data-using-Spark.html

Michael
Re: Announcing Spark SQL
When there is something new, it's also cool to let imagination fly far away ;)

On Thu, Mar 27, 2014 at 2:20 PM, andy petrella andy.petre...@gmail.com wrote:

Yes it could, of course. I didn't say that there is no tool to do it, though ;-).

Andy

On Thu, Mar 27, 2014 at 12:49 PM, yana yana.kadiy...@gmail.com wrote:

Does Shark not suit your needs? That's what we use at the moment and it's been good.

Sent from my Samsung Galaxy S®4

Original message
From: andy petrella
Date: 03/27/2014 6:08 AM (GMT-05:00)
To: user@spark.apache.org
Subject: Re: Announcing Spark SQL

nope (what I said :-P)

On Thu, Mar 27, 2014 at 11:05 AM, Pascal Voitot Dev pascal.voitot@gmail.com wrote:

On Thu, Mar 27, 2014 at 10:22 AM, andy petrella andy.petre...@gmail.com wrote:

I just mean queries sent at runtime ^^, like for any RDBMS. In our project we have such a requirement: a layer to play with the data (a custom, low-level service layer of a lambda architecture), and something like this is interesting.

Ok, that's what I thought! But for these runtime queries, is a macro useful for you?

On Thu, Mar 27, 2014 at 10:15 AM, Pascal Voitot Dev pascal.voitot@gmail.com wrote:

On 27 March 2014 at 09:47, andy petrella andy.petre...@gmail.com wrote:

I hijack the thread, but my 2c is that this feature is also important to enable ad-hoc queries, which are done at runtime. It doesn't remove the interest of such a macro for precompiled jobs of course, but it may not be the first use case envisioned for this Spark SQL.

I'm not sure I see what you call ad-hoc queries... Any sample?

Again, only my 0.2c (ok, I divided by 10 after writing my thoughts ^^)

Andy

On Thu, Mar 27, 2014 at 9:16 AM, Pascal Voitot Dev pascal.voitot@gmail.com wrote:

Hi, quite interesting! Suggestion: why not go even fancier and parse SQL queries at compile time with a macro? ;)

Pascal

On Wed, Mar 26, 2014 at 10:58 PM, Michael Armbrust mich...@databricks.com wrote:

Hey everyone, this already went out to the dev list, but I wanted to put a pointer here as well to a new feature we are pretty excited about for Spark 1.0.

http://databricks.com/blog/2014/03/26/Spark-SQL-manipulating-structured-data-using-Spark.html

Michael
Re: Relation between DStream and RDDs
If I may add my contribution to this discussion, if I understand your question well...

DStream means discretized stream. It discretizes the data stream over windows of time (according to the project code I've read, and the paper too). So when you write:

JavaStreamingContext stcObj = new JavaStreamingContext(confObj, new Duration(60 * 60 * 1000)); //1 hour

it means you are discretizing over a 1-hour window. Each batch, so each RDD of the DStream, will collect data for 1 hour before going to the next RDD. So if you want to have more RDDs, you should reduce the batch size/duration...

Pascal

On Thu, Mar 20, 2014 at 7:51 AM, Tathagata Das tathagata.das1...@gmail.com wrote:

That is a good question. If I understand correctly, you need multiple RDDs from a DStream in *every batch*. Can you elaborate on why you need multiple RDDs every batch?

TD

On Wed, Mar 19, 2014 at 10:20 PM, Sanjay Awatramani sanjay_a...@yahoo.com wrote:

Hi,

As I understand, a DStream consists of 1 or more RDDs. And foreachRDD will run a given func on each and every RDD inside a DStream.

I created a simple program which reads log files from a folder every hour:

JavaStreamingContext stcObj = new JavaStreamingContext(confObj, new Duration(60 * 60 * 1000)); //1 hour
JavaDStream<String> obj = stcObj.textFileStream("/Users/path/to/Input");

When the interval is reached, Spark reads all the files and creates one and only one RDD (as I verified from a sysout inside foreachRDD).

The streaming doc in a lot of places gives the indication that many operations (e.g. flatMap) on a DStream are applied individually to each RDD, and the resulting DStream consists of the mapped RDDs, in the same number as the input DStream. ref: https://spark.apache.org/docs/latest/streaming-programming-guide.html#dstreams

If that is the case, how can I generate a scenario where I have multiple RDDs inside a DStream in my example?

Regards,
Sanjay
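To illustrate the one-RDD-per-batch point, here is a minimal, untested sketch based on Sanjay's snippet: the same file stream, but discretized over 5-minute batches instead of 1 hour, so foreachRDD fires twelve times per hour, once per RDD. The input path is the placeholder from the original message and the local[2] master is only there to make the sketch self-contained.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class BatchDurationSketch {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("batch-duration-sketch").setMaster("local[2]");

        // Discretize over 5-minute windows: one new RDD every 5 minutes,
        // i.e. 12 RDDs per hour instead of the single hourly RDD above.
        JavaStreamingContext stcObj = new JavaStreamingContext(conf, new Duration(5 * 60 * 1000));

        JavaDStream<String> lines = stcObj.textFileStream("/Users/path/to/Input");

        // foreachRDD is invoked once per batch, i.e. once per RDD of the DStream.
        lines.foreachRDD(new Function<JavaRDD<String>, Void>() {
            public Void call(JavaRDD<String> rdd) {
                System.out.println("new batch RDD with " + rdd.count() + " lines");
                return null;
            }
        });

        stcObj.start();
        stcObj.awaitTermination();
    }
}

So within one batch there is still exactly one RDD; shrinking the batch duration only gives you more RDDs over time, which is the distinction TD's question is getting at.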
[spark] New article on spark & scalaz-stream (+ a bit of ML)
Hi,

I wrote this new article after studying more deeply how to adapt scalaz-stream to spark DStreams. I re-explain a few spark (& scalaz-stream) concepts in it (in my own words), and I went further using the new scalaz-stream NIO API, which is quite interesting IMHO.

The result is a long blog triptych starting here: http://mandubian.com/2014/03/08/zpark-ml-nio-1/

Regards,
Pascal