Skewed join sampler misses out the key with the highest frequency
-----------------------------------------------------------------
Key: PIG-1264
URL: https://issues.apache.org/jira/browse/PIG-1264
Project: Pig
Issue Type: Bug
Reporter: Sriranjan Manjunath
Assignee: Richard Ding
Fix For: 0.7.0
I am noticing two issues with the sampler used in skewed join:
1. It does not allocate multiple reducers to the key with the highest frequency.
2. It seems to allocate the same number of reducers to every key (8 in this
case, which matches the parallel value in the query).
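To make the expected behaviour concrete, here is a minimal Python sketch of frequency-proportional reducer allocation. It is not Pig's sampler code. The function name allocate_reducers and the parameter tuples_per_reducer are hypothetical, chosen only for illustration, and the sample data is made up. The point is that the most frequent key should get the most reducers, and that keys with different frequencies should not all get the same count.

```python
import math
from collections import Counter


def allocate_reducers(sampled_keys, total_reducers, tuples_per_reducer):
    """Map each skewed key to the number of reducers it should get.

    Hypothetical illustration only: a key is treated as skewed when its
    sampled count does not fit within one reducer's capacity.
    """
    counts = Counter(sampled_keys)
    allocation = {}
    for key, count in counts.items():
        # Reducers needed to hold this key's tuples, rounded up.
        needed = math.ceil(count / tuples_per_reducer)
        if needed > 1:
            # Never hand out more reducers than the job has.
            allocation[key] = min(needed, total_reducers)
    return allocation


# Made-up sample: "alice" is the highest-frequency key.
sample = ["alice"] * 50 + ["bob"] * 20 + ["carol"] * 3
print(allocate_reducers(sample, total_reducers=8, tuples_per_reducer=10))
```

With this made-up sample, "alice" gets 5 reducers, "bob" gets 2, and "carol" is not treated as skewed. Only a key too large for the whole job is capped at the full reducer count, which is the opposite of every key getting 8.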
Query:
a = load 'studenttab10k' using PigStorage() as (name, age, gpa);
b = load 'votertab10k' as (name, age, registration, contributions);
e = join a by name right, b by name using "skewed" parallel 8;
store e into 'SkewedJoin_9.out';