[ https://issues.apache.org/jira/browse/PIG-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779573#action_12779573 ]
Thejas M Nair commented on PIG-1062:
------------------------------------

Instead of adding the num-rows information as a separate, special last tuple, I am making a change to carry it in the last regular tuple, appended to its end as two extra columns (a special marker column and a num-rows column).

{quote}
Instead of keeping track of max. num of columns in the different rows and then appending the special marker string and num of rows at the end, would it be better to just have these as the first two fields of the last tuple emitted and then introduce a split-union combination to ensure that the foreach pipeline gets the regular tuples (excluding the special tuple)?
{quote}

In the implementation in my upcoming patch, the foreach pipeline that evaluates the join expression (in the map of the sampling MR job) gets regular tuples, except in the case of the last tuple. This is safer than the existing implementation in trunk, where all the tuples had a disk-size column appended to them.

The split-union approach proposed above helps the special tuple bypass the foreach, but getting it around the sort in the reduce stage of the sampling MR job would involve a lot more changes (if the special tuple has the marker and num-rows as its first two columns).

> load-store-redesign branch: change SampleLoader and subclasses to work with new LoadFunc interface
> ---------------------------------------------------------------------------------------------------
>
>                 Key: PIG-1062
>                 URL: https://issues.apache.org/jira/browse/PIG-1062
>             Project: Pig
>          Issue Type: Sub-task
>            Reporter: Thejas M Nair
>            Assignee: Thejas M Nair
>         Attachments: PIG-1062.patch, PIG-1062.patch.3
>
>
> This is part of the effort to implement new load store interfaces as laid out in http://wiki.apache.org/pig/LoadStoreRedesignProposal .
> PigStorage and BinStorage are now working.
> SampleLoader and subclasses -RandomSampleLoader, PoissonSampleLoader need to be changed to work with new LoadFunc interface.
> Fixing SampleLoader and RandomSampleLoader will get order-by queries working.
> PoissonSampleLoader is used by skew join.
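
For illustration only, the last-tuple scheme described in the comment can be sketched roughly as below. This is not the code from the patch: tuples are modelled as plain java.util.List<Object> rather than Pig's Tuple class, and the class name, method names and marker string are all hypothetical. It only shows the idea that every sampled tuple stays regular except the last one, which gets a marker column and a num-rows column appended at its end.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not Pig's actual SampleLoader code.
class LastTupleMarkerSketch {
    // Assumed marker value; the real marker string is not given in the comment.
    static final String NUM_ROWS_MARKER = "__NUM_ROWS_MARKER__";

    /**
     * Returns copies of the sampled tuples in which only the last tuple
     * carries two extra trailing fields: the marker and the row count.
     */
    static List<List<Object>> appendRowCount(List<List<Object>> samples, long numRows) {
        List<List<Object>> out = new ArrayList<>();
        for (int i = 0; i < samples.size(); i++) {
            List<Object> tuple = new ArrayList<>(samples.get(i));
            if (i == samples.size() - 1) {
                tuple.add(NUM_ROWS_MARKER);
                tuple.add(numRows);
            }
            out.add(tuple);
        }
        return out;
    }

    /** Returns the row count if the tuple ends with marker + count, else -1. */
    static long readRowCount(List<Object> tuple) {
        int n = tuple.size();
        if (n >= 2
                && NUM_ROWS_MARKER.equals(tuple.get(n - 2))
                && tuple.get(n - 1) instanceof Long) {
            return (Long) tuple.get(n - 1);
        }
        return -1L;
    }

    public static void main(String[] args) {
        // Rows may have different widths, so the extra fields go at the end.
        List<List<Object>> samples = Arrays.asList(
            Arrays.<Object>asList("a", 1),
            Arrays.<Object>asList("b", 2, "extra"),
            Arrays.<Object>asList("c", 3));
        for (List<Object> tuple : appendRowCount(samples, 1000L)) {
            System.out.println(tuple + " -> " + readRowCount(tuple));
        }
    }
}
```

Because the two extra fields sit at the end of the tuple, a consumer that only looks at the leading columns can treat the last tuple like any other, which is presumably why the trailing position is less intrusive for the sort than putting the marker in the first two columns.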