Another algorithm where this came into the picture was PFPgrowth. In the
reducer, I needed two passes over the data: the first to count frequencies and
the second to build the graph using that information. There I kept the data in
memory in a compressed form, reused it, and made sure a reduce chunk doesn't
get so much data that it runs out of memory. But chances are it will on
some long-chain data. The algorithm wasn't designed for long-chained
transactions anyway.
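
A rough sketch of that two-pass pattern, in plain Python rather than the
actual Mahout/Hadoop code. Everything named here (Node, two_pass_reduce,
min_support, max_buffered) is made up for illustration, and mapping items to
small integers is only a stand-in for whatever compression is really used:

```python
from collections import Counter


class Node:
    """One prefix-tree node: an item, its count, and children keyed by item."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


def two_pass_reduce(transactions, min_support=2, max_buffered=100_000):
    """Buffer one reduce group in memory, then scan the buffer twice.

    `transactions` is any single-use iterable of item lists, standing in
    for the values iterator that a reducer receives (hypothetical setup).
    """
    ids = {}        # item -> small int, a crude stand-in for compression
    names = []      # small int -> item
    buffered = []   # transactions kept in memory as tuples of ints
    for transaction in transactions:
        if len(buffered) >= max_buffered:
            # Assumed guard: refuse groups that would not fit in memory.
            raise MemoryError("reduce group too large to buffer")
        encoded = []
        for item in set(transaction):
            if item not in ids:
                ids[item] = len(names)
                names.append(item)
            encoded.append(ids[item])
        buffered.append(tuple(encoded))

    # Pass 1: count how many transactions contain each item.
    counts = Counter(i for t in buffered for i in t)

    # Pass 2: insert each transaction into the tree, most frequent items
    # first, dropping items below min_support.
    root = Node()
    for t in buffered:
        kept = [i for i in t if counts[i] >= min_support]
        kept.sort(key=lambda i: (-counts[i], names[i]))
        node = root
        for i in kept:
            node = node.children.setdefault(names[i], Node(names[i]))
            node.count += 1
    return root, {names[i]: c for i, c in counts.items()}


# The input is consumed once, yet the buffer is scanned twice.
root, counts = two_pass_reduce(iter([["a", "b"], ["a", "c"], ["a", "b", "d"]]))
```

The buffer is what makes the second pass possible, and it is also the weak
spot: a group with very long transactions can still exceed memory, which is
why the sketch gives up at max_buffered instead of growing without bound.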

Robin

On Fri, Jan 29, 2010 at 12:41 AM, Robin Anil <[email protected]> wrote:

> Glad that you asked, because I have been asking myself the same question
> while creating a Text->Vector converter, where I need to iterate over the
> same data, converting it to vectors using one chunk of the dictionary at a
> time. If I had the option of running multiple passes, it would have taken me
> just a single mapreduce. Here I have to do one pass over the data for every
> chunk of the dictionary in memory. True, I could run n sequential jobs using
> an HDFS client on different servers, but the network data transfer wasn't
> worth it.
>
> Robin
>
> On Fri, Jan 29, 2010 at 12:30 AM, Markus Weimer <[email protected]> wrote:
>
>> Hi,
>>
>> I have a question about hadoop, which most likely someone in mahout
>> must have solved before:
>>
>> Many online ML algorithms require multiple passes over data for best
>> performance. When putting these algorithms on hadoop, one would want
>> to run the code close to the data (same machine/rack). Mappers offer
>> this data-local execution but do not offer means to run multiple times
>> over the data. Of course, one could run the code outside of the hadoop
>> mapreduce framework as an HDFS client, but that does not offer the
>> data-locality advantage, in addition to not being scheduled through
>> the hadoop schedulers.
>>
>> How is this solved in mahout?
>>
>> Thanks for any pointer,
>>
>> Markus
>>
