Re: mahout on GPU
Yes, you are right. It's not something general, but I like the idea. It just proves that in some cases we can achieve higher speed. By the way, I was wondering which of the two is most beneficial for machine learning work -- for things like matrix factorization (e.g. eigendecomposition) or clustering. I am new to both and want to choose one to focus on.

-- Mohsen Jadidi
Re: mahout on GPU
Surprisingly enough, even the large-scale SVD code in Mahout is nearly I/O bound. The issue is that with sparse data, you really can process data nearly as fast as it comes in (with a few exceptional steps).
Re: mahout on GPU
Sorry, but I don't agree with you. We can benefit from GPUs to speed up Hadoop MapReduce computation. Look at this paper I just found: http://ieeexplore.ieee.org/xpl/login.jsp?tp=arnumber=5289201url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D5289201

-- Mohsen Jadidi
Re: mahout on GPU
To add a note: this paper demonstrated that a version of Hadoop MapReduce, when "ported" to a small 4-node GPU cluster, could outperform a regular 62-node Hadoop CPU cluster, achieving a 508x speed-up per cluster node when performing Black-Scholes option pricing. It should be noted that Black-Scholes is an analytical algorithm. The GPU cluster configuration comprised 5 nodes, each with a quad-core CPU and two 9800 GX2 GPUs (each with 128 core processors), connected to a Gigabit Ethernet router, plus one control node connected to the same router.
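Since Black-Scholes keeps coming up: the closed-form price is just a few transcendental evaluations per option, which is why it has such an extreme compute-to-input ratio. Here is a minimal plain-Java sketch of the call-price kernel (the textbook formula, not code from the paper):

// Minimal Black-Scholes call pricing in plain Java -- a sketch of the
// analytic, compute-heavy kernel the paper runs on the GPU. Inputs per
// option: spot s, strike k, rate r, volatility sigma, maturity in years.
public final class BlackScholes {

    // Standard normal CDF via the Abramowitz-Stegun polynomial approximation.
    static double cnd(double x) {
        double t = 1.0 / (1.0 + 0.2316419 * Math.abs(x));
        double poly = t * (0.319381530 + t * (-0.356563782
                + t * (1.781477937 + t * (-1.821255978 + t * 1.330274429))));
        double cnd = 1.0 - Math.exp(-0.5 * x * x) / Math.sqrt(2 * Math.PI) * poly;
        return x >= 0 ? cnd : 1.0 - cnd;
    }

    // Closed-form price of a European call option.
    static double callPrice(double s, double k, double r, double sigma, double tYears) {
        double d1 = (Math.log(s / k) + (r + 0.5 * sigma * sigma) * tYears)
                / (sigma * Math.sqrt(tYears));
        double d2 = d1 - sigma * Math.sqrt(tYears);
        return s * cnd(d1) - k * Math.exp(-r * tYears) * cnd(d2);
    }

    public static void main(String[] args) {
        // 5 doubles in, 1 double out, dozens of flops in between:
        // a very high compute-to-I/O ratio, unlike most Mahout jobs.
        System.out.println(callPrice(100, 100, 0.05, 0.2, 1.0)); // ~10.45
    }
}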
Re: mahout on GPU
I don't think this result holds in general -- they chose a very CPU-intensive problem without much data movement. This won't work for, say, Mahout jobs. I don't really see the point in porting Hadoop to a GPU; if you're in a GPU you don't need most of what Hadoop does! That is, I imagine this would be faster still written as a straight CUDA app.
Re: mahout on GPU
Note that on page 6 they explicitly say that if they had to actually read their input, this wouldn't help. Since they *generate* their input inside the GPU, they get a speedup; without that aspect, they wouldn't get any gain. This is a wildly non-typical case, and a great example of the kind of program I mentioned before that has an enormous ratio of compute to input size.
Re: mahout on GPU
Hi Mohsen, hello Sean,

there is already a lot of research going on into doing recommendations, especially matrix factorization, on GPUs, e.g.:

http://www.slideshare.net/NVIDIA/1034-gtc09 -- 20x-300x faster

http://www.multicoreinfo.com/research/papers/2009/ipdps09-lahabar.pdf -- 60x faster than MATLAB, 1.41x-17x faster than Intel MKL

So basically it has already been shown that, from a number-crunching perspective, GPUs are the way to go. Nevertheless, there are other factors that have to be incorporated, e.g. graphics memory. The main trend for doing real-time recommendations is currently to put all the data into memory (http://notes.matthiasb.com/post/7423754826/hunch-graph-database) and use it directly from there, and I don't know of any graphics cards with a terabyte of graphics memory. So currently a semi-real-time approach with separate batch and real-time processing parts is deployed, e.g. at Yahoo: http://users.cis.fiu.edu/~lzhen001/activities/KDD_USB_key_2010/docs/p703.pdf

Have a great day
Manuel

-- Manuel Blechschmidt
M.Sc. IT Systems Engineering
Dortustr. 57
14467 Potsdam
Mobil: 0173/6322621
Twitter: http://twitter.com/Manuel_B
Re: mahout on GPU
(I agree, it's quite a useful approach -- I was answering the question about whether any such thing exists in Mahout. This all assumes you can fit the data in memory on the GPU, but that is true for moderately large data sets.)
Re: mahout on GPU
Just a quick and possibly innumerate thought re WebGL (which is OpenGL exposed as Web browser content via JavaScript). Perhaps the heavy number-crunching can be done server-side in Mahout / Hadoop, but with a role for *delivery* of computed matrices to the browser? The memory concerns are still relevant, but if you can get data into GPU shaders (via textures), there might be modern Web application scenarios where doing some computation locally on the GPU is worthwhile. Last time I looked, getting floats back off of the graphics card wasn't easy with standard WebGL, btw, though there's a WebCL looming too.

Dan
Re: mahout on GPU
The factorization is the heavy number crunching. The client of a recommender needs to do very little computation in comparison -- a vector-matrix product, say. While a GPU might make that happen faster, it's already on the order of microseconds. Compare that with the cost of downloading the whole factored matrix, which may run into gigabytes.
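To make that concrete, here is a minimal sketch of the client-side step: scoring all items for one user from already-computed factor matrices. The names (userVector, itemFactors) are illustrative, not Mahout API:

import java.util.Arrays;

// Sketch of the cheap client-side step of a factorization-based
// recommender: score every item for one user, given the factors.
public final class ClientScoring {

    // score(item j) = dot(userVector, itemFactors[j]); O(numItems * k) flops.
    static double[] scoreAll(double[] userVector, double[][] itemFactors) {
        double[] scores = new double[itemFactors.length];
        for (int j = 0; j < itemFactors.length; j++) {
            double dot = 0;
            for (int f = 0; f < userVector.length; f++) {
                dot += userVector[f] * itemFactors[j][f];
            }
            scores[j] = dot;
        }
        return scores;
    }

    public static void main(String[] args) {
        double[] user = {0.3, -1.2, 0.8};                    // k = 3 latent factors
        double[][] items = {{0.1, 0.0, 1.0}, {2.0, 0.5, -0.3}};
        System.out.println(Arrays.toString(scoreAll(user, items)));
        // Microseconds of work -- the expensive part was computing the factors.
    }
}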
Re: mahout on GPU
Thanks for the clarifications and comments.

-- Mohsen Jadidi
Re: mahout on GPU
Dot products are an example of something a GPU can't help with. The problem is that there are the same number of flops as memory operations, and memory is slow. To get acceleration you need lots of flops per memory fetch; usually you need at least a matrix-by-matrix multiply with both matrices dense. Scalable algorithms depend on sparsity in many cases, so you are left with a problem.
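A back-of-the-envelope illustration of that ratio (idealized operation counts, not a benchmark):

// Arithmetic intensity (flops per memory operation) for a dot product
// versus a dense matrix multiply -- the ratio Ted describes.
public final class ArithmeticIntensity {
    public static void main(String[] args) {
        long n = 10_000;

        // Dot product: ~2n flops (n multiplies + n adds) against 2n element reads.
        double dotIntensity = (2.0 * n) / (2.0 * n);              // ~1 flop per mem op

        // Dense n x n matrix multiply: ~2n^3 flops against ~3n^2 element accesses.
        double gemmIntensity = (2.0 * n * n * n) / (3.0 * n * n); // ~2n/3 flops per mem op

        System.out.printf("dot: %.1f flops/mem-op, gemm: %.1f flops/mem-op%n",
                dotIntensity, gemmIntensity);
        // A GPU pays off only when this ratio is large enough to hide the
        // cost of moving the data onto the card in the first place.
    }
}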
Re: mahout on GPU
Yes, it makes sense, but I am more interested in getting faster computation by combining Mahout and GPU capabilities. I just wanted to know whether the people involved in Mahout have thought about it, and whether it is possible at all -- for example, speeding up the map and reduce phases by parallelizing the computations on each node. Of course, I am not aware of the communication cost.

-- Mohsen Jadidi
Re: mahout on GPU
Hadoop and CUDA are quite at odds -- Hadoop is all about splitting up a problem across quite remote machines, while CUDA/GPU approaches rely on putting all the computation together, not only on one machine but within one graphics card. It doesn't make sense to combine them: either you want to distribute a lot or you don't. As has been said above, it is all quite possible to implement if you want to. Nothing like this exists in Mahout; there is not even native code in this project.
Re: mahout on GPU
More than that, Mahout is mostly Hadoop-based, which is well up the stack from Java. No, there is nothing CUDA-related in the project. The closest thing is the pure-Java, non-Hadoop-based recommender pieces, but that is still far from CUDA. I think CUDA is intriguing, since a lot of ML is a bunch of matrix math and GPUs are very good at vectorized math. I think a first step is to introduce proper JNI bindings for the big matrix-math jobs and see how much that gains. If it's a lot, then CUDA-izing the JNI pieces is an interesting next step.

On Sun, Jul 8, 2012 at 11:41 PM, mohsen jadidi mohsen.jad...@gmail.com wrote:

Hello, this is my first post here; I just started reading about Hadoop, Mahout and all. I was wondering if there is any solution for using Mahout with parallel computing on GPUs (mainly CUDA)? I know it's a bit of a weird question to ask, because CUDA is C-based and Mahout is Java-based, but I ask it out of curiosity! I think it would be a very cool combination to use both cluster and local parallelization!

cheers,
-- Mohsen
Re: mahout on GPU
In general, large-scale machine learning is I/O bound already. There are some things that would not be, but to really feed a GPU reasonably, the data almost has to be memory-resident. For more information on CUDA from Java, see (among others) http://www.jcuda.de/
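For the curious, a minimal sketch of what calling CUBLAS from Java through those bindings looks like -- a dense SGEMM via the JCublas wrapper. The method names follow the JCuda samples, but treat this as an illustration to check against your JCuda version, not tested code:

import jcuda.Pointer;
import jcuda.Sizeof;
import jcuda.jcublas.JCublas;

// Sketch of dense SGEMM (C = alpha*A*B + beta*C) through the JCublas
// bindings from jcuda.de, using column-major n x n matrices.
public final class JCublasSketch {
    public static void main(String[] args) {
        int n = 512;
        float[] hostA = new float[n * n];
        float[] hostB = new float[n * n];
        float[] hostC = new float[n * n];
        java.util.Arrays.fill(hostA, 1f);
        java.util.Arrays.fill(hostB, 2f);

        JCublas.cublasInit();
        Pointer dA = new Pointer(), dB = new Pointer(), dC = new Pointer();
        JCublas.cublasAlloc(n * n, Sizeof.FLOAT, dA);
        JCublas.cublasAlloc(n * n, Sizeof.FLOAT, dB);
        JCublas.cublasAlloc(n * n, Sizeof.FLOAT, dC);

        // The copies below are exactly the bus traffic this thread worries
        // about: 3 * n^2 floats onto the card, n^2 floats back off.
        JCublas.cublasSetVector(n * n, Sizeof.FLOAT, Pointer.to(hostA), 1, dA, 1);
        JCublas.cublasSetVector(n * n, Sizeof.FLOAT, Pointer.to(hostB), 1, dB, 1);

        JCublas.cublasSgemm('n', 'n', n, n, n, 1f, dA, n, dB, n, 0f, dC, n);

        JCublas.cublasGetVector(n * n, Sizeof.FLOAT, dC, 1, Pointer.to(hostC), 1);
        System.out.println("C[0] = " + hostC[0]); // expect 2 * n = 1024.0

        JCublas.cublasFree(dA);
        JCublas.cublasFree(dB);
        JCublas.cublasFree(dC);
        JCublas.cublasShutdown();
    }
}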
Re: mahout on GPU
To put it a little differently: the GPU architecture has been developed around video games. In a video-game architecture, you have a fairly small amount of data (models and textures) going into the GPU memory via the bus, and then a lot of data coming out of the GPU hardware to the video output. The bus architecture is not designed to shovel a lot of data into GPU memory, and I don't know how good GPUs are at pushing data back out to system memory. So you want problems where a relatively small amount of data has to be chewed over heavily (without building up numerical error). There are some problems like this -- stock-market quant work, for example.

-- Lance Norskog
goks...@gmail.com