[jira] Created: (PIG-1264) Skewed join sampler misses out the key with the highest frequency

Sriranjan Manjunath (JIRA) Fri, 26 Feb 2010 12:04:49 -0800

Skewed join sampler misses out the key with the highest frequency
-----------------------------------------------------------------


                 Key: PIG-1264
                 URL: https://issues.apache.org/jira/browse/PIG-1264
             Project: Pig
          Issue Type: Bug
            Reporter: Sriranjan Manjunath
            Assignee: Richard Ding
             Fix For: 0.7.0


I am noticing two issues with the sampler used in skewed join:
1. It does not allocate multiple reducers to the key with the highest frequency.
2. It seems to be allocating the same number of reducers to every key (8 in 
this case).

Query:

a = load 'studenttab10k' using PigStorage() as (name, age, gpa);
b = load 'votertab10k' as (name, age, registration, contributions);
e = join a by name right, b by name using "skewed" parallel 8;
store e into 'SkewedJoin_9.out';
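For comparison with the two symptoms described, here is a minimal sketch of the behaviour one would expect from a frequency-based sampler. It is hypothetical and is not Pig's sampler code. It assumes that reducers are handed out per key in proportion to the key's sampled tuple count, using an invented per-reducer capacity (`maxTuplesPerReducer`), and that the total is capped at the join's `parallel` value. The class and method names are illustrative only. Under these assumptions the most frequent key gets the most reducers, and different keys get different counts rather than a flat 8.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch, NOT Pig's implementation: allocate reducers to join
 * keys in proportion to how many sampled tuples each key has.
 */
public class SkewedAllocationSketch {

    /**
     * @param sampleCounts        sampled tuple count per join key
     * @param totalReducers       parallelism of the join (e.g. "parallel 8")
     * @param maxTuplesPerReducer assumed capacity of one reducer, in sampled tuples
     * @return reducers allocated to each key that needs more than one
     */
    public static Map<String, Integer> allocate(Map<String, Long> sampleCounts,
                                                int totalReducers,
                                                long maxTuplesPerReducer) {
        Map<String, Integer> allocation = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : sampleCounts.entrySet()) {
            long count = e.getValue();
            // Ceiling division: how many reducers this key's tuples would fill.
            long needed = (count + maxTuplesPerReducer - 1) / maxTuplesPerReducer;
            // A key that fits in a single reducer is not treated as skewed.
            if (needed > 1) {
                // Never hand out more reducers than the join has.
                allocation.put(e.getKey(), (int) Math.min(needed, totalReducers));
            }
        }
        return allocation;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("hot", 350L);   // highest-frequency key
        counts.put("warm", 120L);
        counts.put("cold", 40L);
        // Prints {hot=4, warm=2}: "cold" fits in one reducer and is omitted.
        System.out.println(allocate(counts, 8, 100));
    }
}
```

With these made-up counts the hottest key receives 4 reducers and the next key 2, so neither the "highest-frequency key left out" nor the "every key gets 8" pattern reported here would appear.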


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
