Skewed join sampler misses out the key with the highest frequency
-----------------------------------------------------------------

Key: PIG-1264
URL: https://issues.apache.org/jira/browse/PIG-1264
Project: Pig
Issue Type: Bug
Reporter: Sriranjan Manjunath
Assignee: Richard Ding
Fix For: 0.7.0

I am noticing two issues with the sampler used in skewed join:
1. It does not allocate multiple reducers to the key with the highest frequency.
2. It seems to be allocating the same number of reducers to every key (8 in this case).

Query:
a = load 'studenttab10k' using PigStorage() as (name, age, gpa);
b = load 'votertab10k' as (name, age, registration, contributions);
e = join a by name right, b by name using "skewed" parallel 8;
store e into 'SkewedJoin_9.out';
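
To see which join keys are actually the most frequent in the input, a frequency count along the following lines may help when comparing against the reducer allocation. This is only a sketch and not part of the original report: the aliases grouped, freq, ordered and top_keys are hypothetical, the limit of 10 is arbitrary, and it assumes name is the join key as in the query above.

```pig
-- Sketch only: count how often each join key occurs in the first input.
a = load 'studenttab10k' using PigStorage() as (name, age, gpa);

-- Group by the join key and count the rows in each group.
grouped = group a by name;
freq = foreach grouped generate group as name, COUNT(a) as cnt;

-- Most frequent keys first; keep a small number for inspection.
ordered = order freq by cnt desc;
top_keys = limit ordered 10;
dump top_keys;
```

The same count can be run against 'votertab10k' by swapping the load statement, which shows the key distribution on the other side of the join.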