Yes! Connecting R and Mahout within the same JVM is an awesome idea.

Approaching Mahout as a non-mathematician user is frustrating because
of the difficulty in visualizing and tuning results. I've done some
hacky things with KNIME and Excel, but the ability to do math-heavy
post-processing and visualization directly would be excellent.


On Tue, Feb 14, 2012 at 12:56 PM, Dmitriy Lyubimov <[email protected]> wrote:
> I and my company have allocated some time to create some mixed
> environment of R and other "stuff", and, in particular, Mahout. I am
> thinking of a "contributed" project with R where R is enabled to do
> the following roles:
>
> #1 Mahout's front end driver mixing Mahout computations and R vector/matrices
> #2 data vectorization/preparation routines loaded into backend of
> Mahout's abstract job and adapted to write DRM;
> #3 perhaps some routines allowing subsampling & subsequent
> visualization of Mahout results for prototyping and control purposes.
>
>
> #2 kind of comes close to what R-Hadoop project does with their
> mapreduce package but unfortunately it looks like that project focuses
> on a particular way of serialization of R objects and adaptation for
> DRM serialization doesn't seem plausible at this time. Besides, I am
> thinking that it's not so difficult to run R from inside a mapper
> (R-Hadoop uses streaming, but I think it's worth trying the R inverse java
> package instead of streaming and bypassing the whole text/parse routine
> completely).
>
> The lack of rapid prototyping and visualization of results is, I think,
> one of the bigger barriers to Mahout adoption. But enabling a mixed
> environment for cpu-laden computations in R is a huge leap towards
> prototyping big data pipelines IMO. Or at least it seems so from the vantage
> point of the problems I am currently dealing with. Rapid prototyping of Mahout pipelines
> may be a huge help, esp. as new methods become available to try and
> validate.
>
> -d
>
> On Sat, Feb 11, 2012 at 11:01 AM, Jeff Eastman
> <[email protected]> wrote:
>> Now that 0.6 is in the box, it seems a good time to start thinking about
>> 0.7, from a high level goal perspective at least. Here are a couple that
>> come to mind:
>>
>> Target code freeze date August 1, 2012
>> Get Jenkins working for us again
>> Complete clustering refactoring and classification convergence
>> ...



-- 
Lance Norskog
[email protected]
