On Thu, Jun 28, 2012 at 9:29 AM, Rahul <[email protected]> wrote:
> Yes indeed this is a small PoC to get familiar with Crunch in relation to my
> problem. Basically I have the following algo at play:
> 1. Read data rows
> 2. Create custom keys for each of them, built using various attributes of
> data (this time it is just a simple hash code, but I would like to emit
> multiple key-value pairs)
> 3. Group similar data based on created Keys
> 4. Iterate over individual items in the group and do extensive comparison
> between all of them
>
> I just built an outline in the test case to see what/how can be done, can
> you advise something better ?


Thanks for the outline. In this case, your approach (copying the
contents of the incoming Iterable into a collection) should work fine,
as long as the number of elements per group is relatively small (i.e.
each group fits easily in the memory available to each reducer in your
Hadoop cluster).
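
As a rough sketch of the buffering step (plain Java with no Crunch
types, so it runs on its own; the class and method names are made up
for illustration, and I am assuming the grouped values can only be
iterated once, which is why they are copied first):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Crunch: buffers one group and pairs up its items.
final class GroupComparison {

    // Returns every unordered pair (i < j) of the items in one group.
    static <T> List<List<T>> allPairs(Iterable<T> group) {
        // Copy the group into memory so it can be traversed more than once.
        List<T> items = new ArrayList<>();
        for (T item : group) {
            items.add(item);
        }

        // Visit each unordered pair exactly once.
        List<List<T>> pairs = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            for (int j = i + 1; j < items.size(); j++) {
                pairs.add(Arrays.asList(items.get(i), items.get(j)));
            }
        }
        return pairs;
    }
}
```

A group of n items yields n * (n - 1) / 2 pairs, so in a real job you
would probably run the comparison inside the inner loop and emit the
result rather than collect the pairs; only the buffered items then need
to fit in memory.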

- Gabriel
