[ 
https://issues.apache.org/jira/browse/PIG-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779573#action_12779573
 ] 

Thejas M Nair commented on PIG-1062:
------------------------------------

Instead of emitting the num-rows information as a separate, special last tuple, 
I am changing the patch to carry it in the last regular tuple, appended to its 
end as two extra columns (a special marker column and a num-rows column).
{quote}
Instead of keeping track of max. num of columns in the different rows and then 
appending the special marker string and num of rows at the end, would it be 
better to just have these as the first two fields of the last tuple emitted and 
then introduce a split-union combination to ensure that the foreach pipeline 
gets the regular tuples (excluding the special tuple)?
{quote}
In the implementation in my upcoming patch, the foreach pipeline that evaluates 
the join expression (in the map of the sampling MR job) gets regular tuples, 
except in the case of the last tuple. This is safer than the existing 
implementation in trunk, where every tuple has a disk-size column appended to 
it. The split-union approach proposed above helps the special tuple bypass the 
foreach, but getting it around the sort in the reduce stage of the sampling MR 
job would involve many more changes (if the special tuple has the marker and 
num-rows as its first two columns). 
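
To make the "append to the end of the last tuple" idea concrete, here is a 
minimal sketch and not the code in the patch. Plain Java lists stand in for Pig 
tuples, and the class name, method names and marker value are all hypothetical. 
It only shows that regular tuples stay untouched while the last one gains a 
marker column and a num-rows column that a later stage can detect.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: plain lists stand in for Pig tuples. None of these
// names come from the Pig code base or from the patch.
class SampleTupleSketch {
    // Assumed marker value; the real marker string is not given in this comment.
    static final String NUM_ROWS_MARKER = "__hypothetical_num_rows_marker__";

    // Returns a copy of the last sampled tuple with the marker column and the
    // num-rows column appended to its end.
    static List<Object> appendRowCount(List<Object> lastTuple, long numRows) {
        List<Object> out = new ArrayList<>(lastTuple);
        out.add(NUM_ROWS_MARKER);
        out.add(numRows);
        return out;
    }

    // Returns the row count if the tuple ends with marker + count, else -1.
    static long readRowCount(List<Object> tuple) {
        int n = tuple.size();
        if (n >= 2 && NUM_ROWS_MARKER.equals(tuple.get(n - 2))
                && tuple.get(n - 1) instanceof Long) {
            return (Long) tuple.get(n - 1);
        }
        return -1L;
    }

    public static void main(String[] args) {
        List<Object> regular = Arrays.<Object>asList("key1", 10);
        List<Object> last = appendRowCount(Arrays.<Object>asList("key2", 20), 1000L);
        System.out.println(readRowCount(regular)); // regular tuple: no marker
        System.out.println(readRowCount(last));    // last tuple: carries the count
    }
}
```

Because the two extra columns sit at the end, the leading columns of the last 
tuple keep their usual positions, which is the property the comment relies on 
for the foreach and the sort.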


> load-store-redesign branch: change SampleLoader and subclasses to work with 
> new LoadFunc interface 
> ---------------------------------------------------------------------------------------------------
>
>                 Key: PIG-1062
>                 URL: https://issues.apache.org/jira/browse/PIG-1062
>             Project: Pig
>          Issue Type: Sub-task
>            Reporter: Thejas M Nair
>            Assignee: Thejas M Nair
>         Attachments: PIG-1062.patch, PIG-1062.patch.3
>
>
> This is part of the effort to implement new load store interfaces as laid out 
> in http://wiki.apache.org/pig/LoadStoreRedesignProposal .
> PigStorage and BinStorage are now working.
> SampleLoader and its subclasses (RandomSampleLoader, PoissonSampleLoader) 
> need to be changed to work with the new LoadFunc interface.  
> Fixing SampleLoader and RandomSampleLoader will get order-by queries working.
> PoissonSampleLoader is used by skew join. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
