[ https://issues.apache.org/jira/browse/PIG-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851661#action_12851661 ]
Ashutosh Chauhan commented on PIG-1309:
---------------------------------------

To build the index, we sample every split and get one index entry per split. After sampling, all the index entries are sorted and the index is written to disk. When I first wrote MergeJoin, I wasn't able to figure out how to use Hadoop sorting to sort the index, so there is a comment in MRCompiler about that:

{noformat}
// Sorting of index can possibly be achieved by using Hadoop sorting
// between map and reduce instead of Pig doing sort. If that is so,
// it will simplify lot of the code below.
{noformat}

Now I have figured it out :) By default, if a LocalRearrange produces keys of type tuple, Pig supplies a raw binary comparator (PigTupleWritableComparator) to Hadoop to compare tuples, and that comparator ignores the semantics of the tuple. We need to override that behavior so that Pig supplies the correct tuple comparator (PigTupleRawComparator). This information has to be communicated from MRCompiler to JobControlCompiler, and I am doing that through the MapReduceOper object.

As nice side effects of this change:
a) the code in MRCompiler is indeed simplified now;
b) we got rid of the extra index sorting inside the reducer.

> Map-side Cogroup
> ----------------
>
>          Key: PIG-1309
>          URL: https://issues.apache.org/jira/browse/PIG-1309
>      Project: Pig
>   Issue Type: Bug
>   Components: impl
>     Reporter: Ashutosh Chauhan
>     Assignee: Ashutosh Chauhan
>  Attachments: mapsideCogrp.patch, pig-1309_1.patch
>
>
> In the never-ending quest to make Pig go faster, we want to parallelize as many
> relational operations as possible. It's already possible to do Group-by
> (PIG-984) and Joins (PIG-845, PIG-554) purely map-side in Pig. This jira
> is to add a map-side implementation of Cogroup in Pig. Details to follow.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
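
To illustrate the comparator point made in the comment above, here is a minimal, self-contained sketch of why a byte-wise comparison of serialized keys can sort differently from a comparison that understands the fields. It does not use Pig or Hadoop classes: the class name ComparatorSketch, the 4-byte big-endian serialization, and both comparators are assumptions made up for this illustration, not Pig's actual PigTupleWritableComparator or PigTupleRawComparator.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration only; not Pig's real serialization or comparators.
class ComparatorSketch {

    // Assumed wire format: every int field is written as 4 big-endian bytes.
    static byte[] serialize(int[] tuple) {
        ByteBuffer buf = ByteBuffer.allocate(4 * tuple.length);
        for (int field : tuple) {
            buf.putInt(field);
        }
        return buf.array();
    }

    // Compares serialized keys as unsigned bytes, ignoring what the bytes mean.
    static final Comparator<byte[]> RAW_BYTES = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.length, b.length);
    };

    // Decodes the fields while comparing, so signed ints order correctly.
    static final Comparator<byte[]> SEMANTIC = (a, b) -> {
        ByteBuffer x = ByteBuffer.wrap(a);
        ByteBuffer y = ByteBuffer.wrap(b);
        while (x.remaining() >= 4 && y.remaining() >= 4) {
            int c = Integer.compare(x.getInt(), y.getInt());
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.length, b.length);
    };

    // Sorts the serialized tuples and returns the first field of each, in order.
    static int[] firstFields(int[][] tuples, Comparator<byte[]> cmp) {
        byte[][] keys = new byte[tuples.length][];
        for (int i = 0; i < tuples.length; i++) {
            keys[i] = serialize(tuples[i]);
        }
        Arrays.sort(keys, cmp);
        int[] out = new int[keys.length];
        for (int i = 0; i < keys.length; i++) {
            out[i] = ByteBuffer.wrap(keys[i]).getInt();
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-ins for index entries: (first key of split, split number).
        int[][] indexEntries = { {5, 0}, {-3, 1}, {2, 2} };
        System.out.println(Arrays.toString(firstFields(indexEntries, RAW_BYTES)));  // [2, 5, -3]
        System.out.println(Arrays.toString(firstFields(indexEntries, SEMANTIC)));   // [-3, 2, 5]
    }
}
```

In this sketch the byte-wise order places the negative key last because its sign bit makes the leading byte 0xFF. An index sorted that way would not match the order of the data it describes, which is the kind of mismatch a semantics-aware comparator avoids.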