Re: Hybrid GPU CPU computation

2014-04-11 Thread Pascal Voitot Dev
This is a bit crazy :)
I suppose you would have to run Java code on the GPU!
I heard there are some funny projects to do that...

Pascal

On Fri, Apr 11, 2014 at 2:38 PM, Jaonary Rabarisoa jaon...@gmail.com wrote:

 Hi all,

 I'm just wondering if hybrid GPU/CPU computation is something that is
 feasible with Spark, and what would be the best way to do it.


 Cheers,

 Jaonary



Re: Hybrid GPU CPU computation

2014-04-11 Thread Pascal Voitot Dev
On Fri, Apr 11, 2014 at 3:34 PM, Dean Wampler deanwamp...@gmail.com wrote:

 I've thought about this idea, although I haven't tried it, but I think the
 right approach is to pick your granularity boundary and use Spark + JVM for
 large-scale parts of the algorithm, then use a GPGPU API for number
 crunching large chunks at a time. No need to run the JVM and Spark on the
 GPU, which would make no sense anyway.


I find it would be crazy to be able to run the JVM on a GPU, even if it's a
bit nonsensical XD
Anyway, you're right: the approach of delegating just some parts of the
code to the GPU is interesting, but it also means you have to pre-install
this code on all cluster nodes...
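
To make Dean's idea concrete, here is a rough, untested Scala sketch of how I
imagine it: Spark does the large-scale distribution, and each partition is
handed to the GPU code in one big chunk. The GpuKernels object is a
hypothetical JNI/JCuda-style binding that you would have to provide and, as
said above, pre-install (with its native libraries) on every node.

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical binding around a CUDA kernel (e.g. via JNI or JCuda);
// its native libraries would have to be installed on every worker node.
object GpuKernels {
  def squaredNorms(vectors: Array[Array[Float]]): Array[Float] =
    vectors.map(v => v.map(x => x * x).sum) // CPU fallback standing in for the real GPU call
}

object HybridJob {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hybrid-gpu-cpu"))
    val data = sc.parallelize(1 to 1000000).map(i => Array.fill(128)(i.toFloat))

    // Spark/JVM handles the large-scale part; each partition is sent to the
    // GPU in one big chunk to amortize host/device transfer costs.
    val total = data
      .mapPartitions { it =>
        val chunk = it.toArray
        GpuKernels.squaredNorms(chunk).iterator
      }
      .reduce(_ + _)

    println(s"sum of squared norms: $total")
    sc.stop()
  }
}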


 Here's another approach:
 http://www.cakesolutions.net/teamblogs/2013/02/13/akka-and-cuda/

 dean


 On Fri, Apr 11, 2014 at 7:49 AM, Saurabh Jha 
 saurabh.jha.2...@gmail.com wrote:

 There is a Scala implementation for GPGPUs (NVIDIA CUDA, to be precise),
 but you would also need to port Mesos for GPUs, and I am not sure about Mesos. Also,
 the current Scala GPU version is not stable enough to be used commercially.

 Hope this helps.

 Thanks
 saurabh.



 *Saurabh Jha*
 Intl. Exchange Student
 School of Computing Engineering
 Nanyang Technological University,
 Singapore
 Web: http://profile.saurabhjha.in
 Mob: +65 94663172



 --
 Dean Wampler, Ph.D.
 Typesafe
 @deanwampler
 http://typesafe.com
 http://polyglotprogramming.com



Re: Announcing Spark SQL

2014-03-27 Thread Pascal Voitot Dev
On 27 Mar 2014 at 09:47, andy petrella andy.petre...@gmail.com wrote:

 I'm hijacking the thread, but my 2c is that this feature is also important for
enabling ad-hoc queries, which are done at runtime. That doesn't remove the
interest of such a macro for precompiled jobs, of course, but it may not be the
first use case envisioned for Spark SQL.


I'm not sure I see what you mean by ad-hoc queries... Any sample?

 Again, only my 0.2c (OK, I divided by 10 after writing my thoughts ^^)

 Andy

 On Thu, Mar 27, 2014 at 9:16 AM, Pascal Voitot Dev 
pascal.voitot@gmail.com wrote:

 Hi,
 Quite interesting!

 Suggestion: why not go even fancier and parse SQL queries at compile-time
with a macro? ;)

 Pascal



 On Wed, Mar 26, 2014 at 10:58 PM, Michael Armbrust 
mich...@databricks.com wrote:

 Hey Everyone,

 This already went out to the dev list, but I wanted to put a pointer
here as well to a new feature we are pretty excited about for Spark 1.0.


http://databricks.com/blog/2014/03/26/Spark-SQL-manipulating-structured-data-using-Spark.html

 Michael





Re: Announcing Spark SQL

2014-03-27 Thread Pascal Voitot Dev
On Thu, Mar 27, 2014 at 10:22 AM, andy petrella andy.petre...@gmail.com wrote:

 I just mean queries sent at runtime ^^, like with any RDBMS.
 In our project we have such a requirement: a layer to play with the
 data (a custom, low-level service layer of a lambda architecture), and something
 like this is interesting.


OK, that's what I thought! But for these runtime queries, would a macro be
useful for you?
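
To illustrate what I mean by the runtime case, here is a minimal, untested
sketch against the Spark 1.0 API from the announcement (the people.txt file
with name,age lines and the table name are just placeholders):

import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

case class Person(name: String, age: Int)

object AdHocQuery {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[*]", "adhoc-sql")
    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD // implicit RDD[Person] => SchemaRDD (Spark 1.0)

    // Register an RDD of case classes as a table.
    val people = sc.textFile("people.txt")
      .map(_.split(","))
      .map(p => Person(p(0), p(1).trim.toInt))
    people.registerAsTable("people")

    // The SQL string is only known at runtime (e.g. typed by a user),
    // which is exactly the case a compile-time macro cannot cover.
    val query = args.headOption.getOrElse("SELECT name FROM people WHERE age >= 18")
    sqlContext.sql(query).collect().foreach(println)

    sc.stop()
  }
}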









Re: Announcing Spark SQL

2014-03-27 Thread Pascal Voitot Dev
On Thu, Mar 27, 2014 at 11:08 AM, andy petrella andy.petre...@gmail.com wrote:

 nope (what I said :-P)


That's also my answer to my own question :D

But then I didn't understand this part of your sentence: "my 2c is that this
feature is also important for enabling ad-hoc queries, which are done at runtime".











Re: Announcing Spark SQL

2014-03-27 Thread Pascal Voitot Dev
When there is something new, it's also cool to let the imagination fly far
away ;)


On Thu, Mar 27, 2014 at 2:20 PM, andy petrella andy.petre...@gmail.com wrote:

 Yes it could, of course. I didn't say that there is no tool to do it,
 though ;-).

 Andy


 On Thu, Mar 27, 2014 at 12:49 PM, yana yana.kadiy...@gmail.com wrote:

 Does Shark not suit your needs? That's what we use at the moment and it's
 been good


 Sent from my Samsung Galaxy S®4


  Original message 
 From: andy petrella
 Date:03/27/2014 6:08 AM (GMT-05:00)
 To: user@spark.apache.org
 Subject: Re: Announcing Spark SQL

 nope (what I said :-P)










Re: Relation between DStream and RDDs

2014-03-20 Thread Pascal Voitot Dev
If I may add my contribution to this discussion, assuming I understand your
question correctly...

A DStream is a discretized stream: it discretizes the data stream over windows
of time (according to the project code and the paper I've read). So when
you write:

JavaStreamingContext stcObj = new JavaStreamingContext(confObj, new
Duration(60 * 60 * 1000)); //1 hour

it means you are discretizing over a 1-hour window. Each batch, i.e. each RDD
of the DStream, will collect data for 1 hour before moving on to the next RDD.
So if you want more RDDs, you should reduce the batch size/duration...
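
For example, an untested Scala sketch of the same kind of program with a
5-minute batch (the folder path and duration are just placeholders): every
5 minutes, the new files become one new RDD of the DStream, so foreachRDD
fires 12 times per hour instead of once.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, StreamingContext}

object SmallerBatches {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("smaller-batches")
    // 5-minute batches instead of 1 hour: each batch interval produces one RDD.
    val ssc = new StreamingContext(conf, Minutes(5))

    val lines = ssc.textFileStream("/Users/path/to/Input")
    lines.foreachRDD { rdd =>
      // Called once per batch, i.e. once per RDD of the DStream.
      println(s"batch RDD with ${rdd.count()} lines")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}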

Pascal


On Thu, Mar 20, 2014 at 7:51 AM, Tathagata Das
tathagata.das1...@gmail.com wrote:

 That is a good question. If I understand correctly, you need multiple RDDs
 from a DStream in *every batch*. Can you elaborate on why you need
 multiple RDDs in every batch?

 TD


 On Wed, Mar 19, 2014 at 10:20 PM, Sanjay Awatramani sanjay_a...@yahoo.com
  wrote:

 Hi,

 As I understand it, a DStream consists of one or more RDDs, and foreachRDD
 will run a given function on each and every RDD inside the DStream.

 I created a simple program which reads log files from a folder every hour:
 JavaStreamingContext stcObj = new JavaStreamingContext(confObj, new
 Duration(60 * 60 * 1000)); //1 hour
 JavaDStream<String> obj = stcObj.textFileStream("/Users/path/to/Input");

 When the interval is reached, Spark reads all the files and creates one
 and only one RDD (as I verified from a sysout inside foreachRDD).

 In many places, the streaming doc indicates that operations
 (e.g. flatMap) on a DStream are applied individually to each RDD, and that
 the resulting DStream consists of the mapped RDDs, in the same number as in
 the input DStream.
 ref:
 https://spark.apache.org/docs/latest/streaming-programming-guide.html#dstreams

 If that is the case, how can I generate a scenario wherein I have
 multiple RDDs inside a DStream in my example?

 Regards,
 Sanjay





[spark] New article on spark & scalaz-stream (& a bit of ML)

2014-03-18 Thread Pascal Voitot Dev
Hi,
I wrote this new article after studying more deeply how to adapt scalaz-stream
to Spark DStreams.
In it, I re-explain a few Spark (& scalaz-stream) concepts in my own words,
and I go further using the new scalaz-stream NIO API, which is quite
interesting IMHO.

The result is a long blog triptych, starting here:
http://mandubian.com/2014/03/08/zpark-ml-nio-1/

Regards
Pascal