I think fragment-replicate join may not work for this case, since it only supports inner joins. Cross should work, though its single reducer will be a limiting factor.
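
For reference, here is the cross-based approach from the quoted thread put together as one script. Treat it as a sketch that I have not run in this form; the relation and field names (numbers, min_max, n, min, max) are the ones used in the thread, and it assumes min_max always has exactly one row.

```pig
-- Load the raw values; path and schema as given in the thread.
numbers = load 'numbers' as (n:int);

-- Put every row into a single group so MIN/MAX see the whole data set.
all_numbers = group numbers all;
min_max = foreach all_numbers generate MIN(numbers.n) as min, MAX(numbers.n) as max;

-- min_max holds one row, so the cross product attaches (min, max) to each number.
numbers_min_max = cross numbers, min_max;

-- Rescale each value to the 0.0 -> 1.0 range.
rescaled_numbers = foreach numbers_min_max generate ((float)n - min) / (max - min);
dump rescaled_numbers;
```

Note that if all the input values are equal, max - min is zero and the rescaling is undefined, so that case would need separate handling.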

Ashutosh

On Sat, Oct 17, 2009 at 18:44, Thejas Nair wrote:
> You can do a cross product of numbers, min_max and calculate -
>
> grunt> numbers_min_max = cross numbers, min_max;
> grunt> dump numbers_min_max;
> (6,4,8)
> (4,4,8)
> (8,4,8)
> grunt> rescaled_numbers = foreach numbers_min_max { generate ((float)n-min)/(max-min); }
> grunt> dump rescaled_numbers;
> (0.5F)
> (0.0F)
> (1.0F)
>
> I think the current cross product implementation uses a single reducer. So a
> more efficient thing to do might be a fragment replicate join. (I did not
> try this.)
>
> Ie, you could replace the above cross with -
> grunt> numbers_min_max = join numbers by 'dummy', min_max by 'dummy' using 'replicated';
>
> -Thejas
>
> On 10/17/09 3:40 AM, "Mat Kelcey" wrote:
>
> > hi guys!
> >
> > i have a list of numbers that i was to rescale to 0.0 -> 1.0
> >
> > eg for (6,4,8) i want to convert to (0.5, 0.0, 1.0)
> >
> > i can find the min/max...
> > grunt> numbers = load 'numbers' as (n:int);
> > grunt> dump numbers;
> > (6)
> > (4)
> > (8)
> > grunt> all_numbers = group numbers all;
> > grunt> min_max = foreach all_numbers { generate MIN(numbers.n) as min, MAX(numbers.n) as max; }
> > grunt> dump min_max;
> > (4,8)
> >
> > ...and given the min max i can rescale the list
> > grunt> rescaled_numbers = foreach numbers { generate ((float)n-4F)/(8F-4F); }
> > grunt> dump rescaled_numbers;
> > (0.5F)
> > (0.0F)
> > (1.0F)
> >
> > but how do i inject the values found during min_max into the
> > rescaled_numbers foreach clause?
> >
> > perhaps i'm thinking about this totally the wrong way?
> >
> > help me obi-wan kenobi, you're my only hope
> >
> > mat
