Yes! Connecting R and Mahout within the same JVM is an awesome idea. Approaching Mahout as a non-mathematician user is frustrating because results are so difficult to visualize and tune. I've done some hacky things with KNIME and Excel, but the ability to do math-heavy post-processing and visualization directly would be excellent.
On Tue, Feb 14, 2012 at 12:56 PM, Dmitriy Lyubimov <[email protected]> wrote:
> I and my company have allocated some time to create some mixed
> environment of R and other "stuff", and, in particular, Mahout. I am
> thinking of a "contributed" project with R where R is enabled to do
> the following roles:
>
> #1 Mahout's front end driver mixing Mahout computations and R vector/matrices
> #2 data vectorization/preparation routines loaded into backend of
> Mahout's abstract job and adapted to write DRM;
> #3 perhaps some routines allowing subsampling & subsequent
> visualization of Mahout result for prototyping and control purposes.
>
>
> #2 kind of comes close to what R-Hadoop project does with their
> mapreduce package but unfortunately it looks like that project focuses
> on a particular way of serialization of R objects and adaptation for
> DRM serialization doesn't seem plausible at this time. Besides, I am
> thinking that it's not so difficult to run R from inside mapper
> (R-Hadoop uses streaming, but i think it's worth to try R inverse java
> package instead of streaming and bypass the whole text/parse routine
> completely).
>
> Rapid prototyping and visualization of results i think is one of the
> bigger barriers to Mahout adoption. But enabling mixed environment for
> cpu-laden computations in R is a huge leap towards prototyping big
> data pipeline IMO. Or at least it seems from the vantage point of
> problems i am currently with. Rapid prototyping of Mahout pipelines
> may be a huge help, esp. as new methods become available to try and
> validate.
>
> -d
>
> On Sat, Feb 11, 2012 at 11:01 AM, Jeff Eastman
> <[email protected]> wrote:
>> Now that 0.6 is in the box, it seems a good time to start thinking about
>> 0.7, from a high level goal perspective at least. Here are a couple that
>> come to mind:
>>
>> Target code freeze date August 1, 2012
>> Get Jenkins working for us again
>> Complete clustering refactoring and classification convergence
>> ...
--
Lance Norskog
[email protected]
