I think fragment-replicate join may not work for this case, as it can only do
an inner join. Cross should work, though its single reducer will be a limiting
factor.
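
For reference, the cross-based approach end to end might look like the sketch
below. It only stitches together the statements already posted in this thread
(same relation and field names), written as a script instead of at the grunt
prompt. It is untested, and it assumes the input file 'numbers' holds one
integer per line.

```pig
-- Untested sketch: rescale a column of ints to the range 0.0 .. 1.0.
-- Assumes 'numbers' contains one integer per line.
numbers = load 'numbers' as (n:int);

-- Collapse everything into a single group to compute the global min and max.
all_numbers = group numbers all;
min_max = foreach all_numbers generate MIN(numbers.n) as min, MAX(numbers.n) as max;

-- min_max has exactly one row, so the cross attaches (min, max) to every number.
numbers_min_max = cross numbers, min_max;

-- Rescale each value; the cast forces floating-point division.
rescaled_numbers = foreach numbers_min_max generate ((float)n - min) / (max - min);

dump rescaled_numbers;
```

Since min_max is a single row, the cross produces one output row per input
number. If every input value is the same, max - min is zero, so that case would
need a guard before the division.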

Ashutosh
On Sat, Oct 17, 2009 at 18:44, Thejas Nair <[email protected]> wrote:

>
> You can take the cross product of numbers and min_max, then calculate:
>
> grunt> numbers_min_max = cross numbers, min_max;
> grunt> dump numbers_min_max;
> (6,4,8)
> (4,4,8)
> (8,4,8)
> grunt> rescaled_numbers = foreach numbers_min_max { generate ((float)n-min)/(max-min); }
> grunt> dump rescaled_numbers;
>
> (0.5F)
> (0.0F)
> (1.0F)
>
>
> I think the current cross product implementation uses a single reducer, so
> a more efficient approach might be a fragment-replicate join. (I did not
> try this.)
>
> I.e., you could replace the above cross with:
> grunt> numbers_min_max = join numbers by 'dummy', min_max by 'dummy' using 'replicated';
>
> -Thejas
>
>
> On 10/17/09 3:40 AM, "Mat Kelcey" <[email protected]> wrote:
>
> > hi guys!
> >
> > i have a list of numbers that i want to rescale to 0.0 -> 1.0
> >
> > eg for (6,4,8) i want to convert to (0.5, 0.0, 1.0)
> >
> > i can find the min/max...
> > grunt> numbers = load 'numbers' as (n:int);
> > grunt> dump numbers;
> > (6)
> > (4)
> > (8)
> > grunt> all_numbers = group numbers all;
> > grunt> min_max = foreach all_numbers { generate MIN(numbers.n) as min, MAX(numbers.n) as max; }
> > grunt> dump min_max;
> > (4,8)
> >
> > ...and given the min max i can rescale the list
> > grunt> rescaled_numbers = foreach numbers { generate ((float)n-4F)/(8F-4F); }
> > grunt> dump rescaled_numbers;
> > (0.5F)
> > (0.0F)
> > (1.0F)
> >
> > but how do i inject the values found during min_max into the
> > rescaled_numbers foreach clause?
> >
> > perhaps i'm thinking about this totally the wrong way?
> >
> > help me obi-wan kenobi, you're my only hope
> >
> > mat
>
>
