Re: mahout on GPU

2012-07-13 Thread mohsen jadidi
Yes, you are right. It's not something general, but I like the idea. It just
proves that in some cases we can achieve higher speed.

By the way, I was just wondering which of the two is more beneficial for
machine learning work such as matrix factorization (e.g. eigendecomposition)
or clustering. I am new to both and want to choose one to focus on.



-- 
Mohsen Jadidi


Re: mahout on GPU

2012-07-13 Thread Ted Dunning
Surprisingly enough, even the large-scale SVD code in Mahout is nearly I/O
bound.  The issue is that with sparse data you really can process the data
nearly as fast as it comes in (with a few exceptional steps).
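
To see why, a rough back-of-the-envelope sketch (plain illustrative Java,
not Mahout code): a sparse matrix-vector multiply in CSR layout does about
two flops per stored nonzero while reading roughly twelve bytes for it, so
the memory or disk channel, not the arithmetic unit, sets the pace.

// Illustrative sketch, not Mahout code: y = A * x for a CSR sparse matrix.
// Per stored nonzero we do ~2 flops (multiply + add) but read ~12 bytes
// (a 4-byte column index and an 8-byte value), so throughput is bounded by
// how fast the nonzeros stream in, not by arithmetic.
public class SparseTimesVector {

    static double[] multiply(int rows, int[] rowPtr, int[] colIdx,
                             double[] values, double[] x) {
        double[] y = new double[rows];
        for (int i = 0; i < rows; i++) {
            double sum = 0.0;
            for (int k = rowPtr[i]; k < rowPtr[i + 1]; k++) {
                sum += values[k] * x[colIdx[k]];  // 2 flops per ~12 bytes read
            }
            y[i] = sum;
        }
        return y;
    }

    public static void main(String[] args) {
        // Tiny 2 x 3 example: [[1, 0, 2], [0, 3, 0]] times [1, 1, 1].
        int[] rowPtr = {0, 2, 3};
        int[] colIdx = {0, 2, 1};
        double[] values = {1.0, 2.0, 3.0};
        double[] x = {1.0, 1.0, 1.0};
        double[] y = multiply(2, rowPtr, colIdx, values, x);
        System.out.println(y[0] + " " + y[1]);  // prints 3.0 3.0
    }
}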




Re: mahout on GPU

2012-07-10 Thread mohsen jadidi
Sorry, but I don't agree with you. We can benefit from GPUs to speed up
Hadoop MapReduce computation. Look at this paper, which I just found:

http://ieeexplore.ieee.org/xpl/login.jsp?tp=arnumber=5289201url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D5289201



-- 
Mohsen Jadidi


Re: mahout on GPU

2012-07-10 Thread mohsen jadidi
To add a note:

This paper demonstrated that a version of Hadoop MapReduce, when “ported” to
a small 4-node GPU cluster, could outperform a regular 62-node Hadoop CPU
cluster, achieving a 508x speed-up per cluster node when performing
Black-Scholes option pricing. It should be noted that the Black-Scholes
algorithm is an analytical algorithm.  The GPU cluster configuration
comprised 5 nodes, each comprising a quad-core CPU with two 9800 GX2 GPUs
(each with 128 core processors), connected to a Gigabit Ethernet router, and
one control node also connected to the same router.
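
To make the “analytical algorithm” point concrete, here is a small
illustrative Java sketch of the closed-form Black-Scholes call price (my own
illustration, not code from the paper): each price is a couple of dozen
arithmetic operations on five scalar inputs, which is why the workload is
almost pure compute once the inputs are already on the card.

// Illustrative only, not from the paper: closed-form Black-Scholes call price.
public class BlackScholesSketch {

    // Abramowitz-Stegun approximation of the standard normal CDF.
    static double normCdf(double x) {
        if (x < 0) {
            return 1.0 - normCdf(-x);
        }
        double k = 1.0 / (1.0 + 0.2316419 * x);
        double poly = k * (0.319381530 + k * (-0.356563782
                + k * (1.781477937 + k * (-1.821255978 + k * 1.330274429))));
        return 1.0 - Math.exp(-0.5 * x * x) / Math.sqrt(2.0 * Math.PI) * poly;
    }

    // European call for spot s, strike k, rate r, volatility sigma, maturity t.
    static double callPrice(double s, double k, double r, double sigma, double t) {
        double d1 = (Math.log(s / k) + (r + 0.5 * sigma * sigma) * t)
                / (sigma * Math.sqrt(t));
        double d2 = d1 - sigma * Math.sqrt(t);
        return s * normCdf(d1) - k * Math.exp(-r * t) * normCdf(d2);
    }

    public static void main(String[] args) {
        System.out.println(callPrice(100, 100, 0.05, 0.2, 1.0));  // about 10.45
    }
}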



-- 
Mohsen Jadidi


Re: mahout on GPU

2012-07-10 Thread Sean Owen
I don't think this result holds in general -- they chose a very
CPU-intensive problem, without much data movement. This won't work for,
say, Mahout jobs.  I don't really see the point in porting Hadoop to a
GPU. If you're on a GPU you don't need most of what Hadoop does! That is,
I imagine this would be faster if you just wrote a straight CUDA app.




Re: mahout on GPU

2012-07-10 Thread Ted Dunning
Note that on page 6 they explicitly say that if they had to actually read
their input, this wouldn't help.  Since they *generate* their input inside
the GPU, they get speedup.  Without that aspect, they wouldn't get any gain.

This is a wildly non-typical case and is a great example of the kind of
program that I mentioned before that has an enormous ratio of compute /
input size.




Re: mahout on GPU

2012-07-09 Thread Manuel Blechschmidt
Hi Mohsen, hello Sean,
there is already a lot of research going on into doing recommendations,
especially matrix factorization, on GPUs:

e.g.
http://www.slideshare.net/NVIDIA/1034-gtc09
20x - 300x faster
or
http://www.multicoreinfo.com/research/papers/2009/ipdps09-lahabar.pdf
60x faster over MATLAB 
1.41x - 17x faster over Intel MKL

So basically it has already been proven that, from a number-crunching
perspective, GPUs are the way to go. Nevertheless, there are a lot of other
factors that have to be incorporated, e.g. graphics memory. The main trend for
doing real-time recommendations is currently to put all the data into memory
(http://notes.matthiasb.com/post/7423754826/hunch-graph-database) and then use
it directly from there. I don't know whether there are already graphics cards
with 1 terabyte of graphics memory.

So currently, in the end, a semi-real-time approach with batch and real-time
processing parts is deployed, e.g. at Yahoo:
http://users.cis.fiu.edu/~lzhen001/activities/KDD_USB_key_2010/docs/p703.pdf

Have a great day
Manuel


-- 
Manuel Blechschmidt
M.Sc. IT Systems Engineering
Dortustr. 57
14467 Potsdam
Mobil: 0173/6322621
Twitter: http://twitter.com/Manuel_B



Re: mahout on GPU

2012-07-09 Thread Sean Owen
(I agree, it's quite a useful approach -- I was answering the question
about whether there was any such thing in Mahout. This all assumes you
can fit the data in memory on the GPU, but that is true for moderately
large data sets.)



Re: mahout on GPU

2012-07-09 Thread Dan Brickley
Just a quick and possibly innumerate thought re WebGL (which is OpenGL
exposed as Web browser content via Javascript).

Perhaps the big heavy number-crunching can be done on server-side
Mahout / Hadoop, but with a role for *delivery* of computed matrices
in the browser? The memory concerns are still relevant, but if you can
get data into GPU shaders (via a texture), there might be modern Web
application scenarios where doing some computations locally on the GPU
is worthwhile. Last time I looked, getting floats back off the graphics
card wasn't easy with standard WebGL, by the way, though there's also
WebCL looming.

Dan


Re: mahout on GPU

2012-07-09 Thread Sean Owen
The factorization is the heavy number crunching. The client of a
recommender needs to do very little computation in comparison, like a
vector-matrix product. While a GPU might make this happen faster, it's
already on the order of microseconds. Compare that with the cost of
downloading the whole factored matrix, which may run into gigabytes.
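
As a rough illustration of what the client side amounts to (hypothetical
shapes and names, not the Mahout API): scoring items for a user is the
item-factor matrix times the user's factor vector, a k-element dot product
per item, while that same item-factor matrix is what would have to be
downloaded first.

// Hypothetical sketch (not Mahout API): score items against one user's factors.
// Each score is a k-element dot product, i.e. microseconds of work, whereas
// the itemFactors matrix itself (say 1M items x k = 50 floats, ~200 MB) is
// what a client would have to download before it could do anything.
public class ClientScoring {

    static float[] scoreAllItems(float[][] itemFactors, float[] userFactors) {
        float[] scores = new float[itemFactors.length];
        for (int i = 0; i < itemFactors.length; i++) {
            float s = 0f;
            for (int f = 0; f < userFactors.length; f++) {
                s += itemFactors[i][f] * userFactors[f];
            }
            scores[i] = s;
        }
        return scores;
    }

    public static void main(String[] args) {
        float[][] itemFactors = {{0.1f, 0.9f}, {0.8f, 0.2f}, {0.5f, 0.5f}};
        float[] userFactors = {1.0f, 0.5f};
        for (float score : scoreAllItems(itemFactors, userFactors)) {
            System.out.println(score);
        }
    }
}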



Re: mahout on GPU

2012-07-09 Thread mohsen jadidi
Thanks for the clarifications and comments.





-- 
Mohsen Jadidi


Re: mahout on GPU

2012-07-09 Thread Ted Dunning
Dot products are an example of something that a GPU can't help with. The problem
is that there are the same number of flops as memory operations, and memory is
slow.

To get acceleration you need lots of flops per memory fetch. Usually you need
at least a matrix-by-matrix multiply with both matrices dense. Scalable
algorithms depend on sparsity in many cases, so you are left with a problem.
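
A back-of-the-envelope way to see the difference (illustrative numbers,
assuming 8-byte doubles and ignoring caches): a length-n dot product does
2n flops over 16n bytes, about 0.125 flops per byte, while a dense n x n
matrix-matrix multiply does 2n^3 flops over roughly 24n^2 bytes, about n/12
flops per byte, which is the kind of ratio a GPU needs to stay busy.

// Illustrative arithmetic-intensity estimate, assuming 8-byte doubles and
// ignoring caches: dot products are memory-bound, dense GEMM is compute-bound.
public class ArithmeticIntensity {
    public static void main(String[] args) {
        int n = 1000;

        // Dot product: 2n flops over 2n doubles read.
        double dotFlops = 2.0 * n;
        double dotBytes = 2.0 * n * 8;
        System.out.printf("dot product:  %.3f flops/byte%n", dotFlops / dotBytes);

        // Dense n x n multiply: 2n^3 flops over ~3n^2 doubles moved (A, B, C).
        double gemmFlops = 2.0 * n * (double) n * n;
        double gemmBytes = 3.0 * n * (double) n * 8;
        System.out.printf("dense matmul: %.1f flops/byte%n", gemmFlops / gemmBytes);
    }
}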

Sent from my iPhone



Re: mahout on GPU

2012-07-09 Thread mohsen jadidi
Yes, it makes sense.
But I am more interested in getting faster computation by combining Mahout
and GPU capabilities. I just wanted to know whether people involved in Mahout
have thought about it, or whether it is at all possible, for example speeding
up the Map and Reduce phases by parallelising computations on the nodes. Of
course, I am not aware of the communication cost.


-- 
Mohsen Jadidi


Re: mahout on GPU

2012-07-09 Thread Sean Owen
Hadoop and CUDA are quite at odds -- Hadoop is all about splitting up
a problem across quite remote machines while CUDA/GPU approaches rely
on putting all computation together not only on one machine but within
one graphics card.

It doesn't make sense to combine them. Either you want to distribute a
lot or you don't.

As has been said above, it is all quite possible to implement if you want to.
Nothing like this exists in Mahout. There is not even native code in
this project.



Re: mahout on GPU

2012-07-08 Thread Sean Owen
More than that, Mahout is mostly Hadoop-based, which is well up the
stack from Java. No, there is nothing CUDA-related in the project. The
closest things are the pure-Java, non-Hadoop-based recommender pieces.
But they are still far from CUDA.

I think CUDA is intriguing since a lot of ML is a bunch of matrix math
and GPUs are very good at vectorized math. I think a first step is to
introduce proper JNI bindings for the big matrix math jobs and see how
much that gains. If it's a lot, then CUDA-izing the JNI pieces is an
interesting next step.
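
A minimal sketch of what that first step might look like (hypothetical class
and library names, not anything that exists in Mahout): a JNI entry point for
a dense multiply with a plain-Java fallback, so the Java baseline can be
timed today and a native BLAS swapped in later to see how much it gains.

// Hedged sketch with hypothetical names: "mahoutblas" is an imaginary native
// library; the plain-Java fallback keeps the sketch runnable without it.
public class MatrixMathBindings {

    private static final boolean NATIVE_AVAILABLE = tryLoadNative();

    private static boolean tryLoadNative() {
        try {
            System.loadLibrary("mahoutblas");  // hypothetical native library
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false;
        }
    }

    // Hypothetical native C = A * B for row-major (m x k) and (k x n) buffers.
    private static native void nativeDgemm(int m, int n, int k,
                                           double[] a, double[] b, double[] c);

    static void dgemm(int m, int n, int k, double[] a, double[] b, double[] c) {
        if (NATIVE_AVAILABLE) {
            nativeDgemm(m, n, k, a, b, c);
            return;
        }
        // Plain triple loop as the pure-Java baseline.
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                double s = 0.0;
                for (int p = 0; p < k; p++) {
                    s += a[i * k + p] * b[p * n + j];
                }
                c[i * n + j] = s;
            }
        }
    }

    public static void main(String[] args) {
        int n = 512;
        double[] a = new double[n * n], b = new double[n * n], c = new double[n * n];
        java.util.Arrays.fill(a, 1.0);
        java.util.Arrays.fill(b, 2.0);
        long t0 = System.nanoTime();
        dgemm(n, n, n, a, b, c);
        System.out.printf("path=%s, %dx%d multiply took %.1f ms%n",
                NATIVE_AVAILABLE ? "native" : "java", n, n,
                (System.nanoTime() - t0) / 1e6);
    }
}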

On Sun, Jul 8, 2012 at 11:41 PM, mohsen jadidi mohsen.jad...@gmail.com wrote:
 Hello,

 This is my first post here and I just started reading about Hadoop, Mahout
 and all. I was wondering if there is any solution for using Mahout with
 parallel computing on a GPU (mainly CUDA)? I know it's a bit of a weird
 question to ask, because CUDA is C-based and Mahout is Java-based, but I ask
 it just out of curiosity! I think it would be a very cool combination to use
 both cluster and local parallelisation!

 cheers,
 --
 Mohsen


Re: mahout on GPU

2012-07-08 Thread Ted Dunning
In general, large scale machine learning is I/O bound already.  There are
some things that would not be, but to really feed a GPU reasonably, data
almost has to be memory resident.

For more information on CUDA from Java, see (among others)
http://www.jcuda.de/
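
For the Java-to-CUDA side, a rough sketch against the JCublas wrapper from
jcuda.de (method names written from memory, so treat the exact signatures as
an assumption and check the current jcuda docs): it copies two dense matrices
to the card, runs SGEMM there, and copies the result back, and those copies
are exactly the I/O that enough flops have to amortize.

import jcuda.Pointer;
import jcuda.Sizeof;
import jcuda.jcublas.JCublas;

// Rough sketch against the jcuda.de JCublas wrapper (signatures assumed, not
// verified): single-precision C = A * B on the GPU.
public class JCublasSgemmSketch {
    public static void main(String[] args) {
        int n = 512;
        float[] hostA = new float[n * n];
        float[] hostB = new float[n * n];
        float[] hostC = new float[n * n];
        java.util.Arrays.fill(hostA, 1f);
        java.util.Arrays.fill(hostB, 2f);

        JCublas.cublasInit();
        Pointer devA = new Pointer(), devB = new Pointer(), devC = new Pointer();
        JCublas.cublasAlloc(n * n, Sizeof.FLOAT, devA);
        JCublas.cublasAlloc(n * n, Sizeof.FLOAT, devB);
        JCublas.cublasAlloc(n * n, Sizeof.FLOAT, devC);

        // Host -> device copies: the part that scales with input size.
        JCublas.cublasSetVector(n * n, Sizeof.FLOAT, Pointer.to(hostA), 1, devA, 1);
        JCublas.cublasSetVector(n * n, Sizeof.FLOAT, Pointer.to(hostB), 1, devB, 1);

        // 2*n^3 flops on the card for ~3*n^2 values moved across the bus.
        JCublas.cublasSgemm('n', 'n', n, n, n, 1f, devA, n, devB, n, 0f, devC, n);

        JCublas.cublasGetVector(n * n, Sizeof.FLOAT, devC, 1, Pointer.to(hostC), 1);
        System.out.println("C[0] = " + hostC[0]);  // 1024.0 for these inputs

        JCublas.cublasFree(devA);
        JCublas.cublasFree(devB);
        JCublas.cublasFree(devC);
        JCublas.cublasShutdown();
    }
}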




Re: mahout on GPU

2012-07-08 Thread Lance Norskog
To put it a little differently: the GPU architecture has been
developed around video games. In a video game architecture, you have a
fairly small amount of data (models and textures) going into the GPU
memory via the bus, and then a lot of data coming out of the GPU
hardware substrate to the video output.

The bus architecture is not designed to shovel a lot of data into the
GPU memory, and I don't know how good they are about pushing data out
to system memory. So you want problems where a relatively small
amount of data has to be chewed over heavily (without building up
numerical error). There are some problems like this: stock market quant
work, for example.





-- 
Lance Norskog
goks...@gmail.com