[
https://issues.apache.org/jira/browse/PIG-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840883#action_12840883
]
Dmitriy V. Ryaboy commented on PIG-1252:
----------------------------------------
Richard,
Is there any documentation on what the secondary key optimization does, when it
kicks in, benchmarks of how much improvement it provides, and hints on what the
expected tradeoffs would be?
> Diamond splitter does not generate correct results when using Multi-query
> optimization
> --------------------------------------------------------------------------------------
>
> Key: PIG-1252
> URL: https://issues.apache.org/jira/browse/PIG-1252
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.6.0
> Reporter: Viraj Bhat
> Assignee: Richard Ding
> Fix For: 0.7.0
>
> Attachments: PIG-1252.patch
>
>
> I have a script that uses SPLIT but never uses one of the split branches.
> The skeleton of the script is as follows:
> {code}
> loadData = load '/user/viraj/zebradata' using
>     org.apache.hadoop.zebra.pig.TableLoader('col1,col2, col3, col4, col5, col6, col7');
> prjData = FOREACH loadData GENERATE
>     (chararray) col1,
>     (chararray) col2,
>     (chararray) col3,
>     (chararray) ((col4 is not null and col4 != '') ? col4 : ((col5 is not null) ? col5 : '')) as splitcond,
>     (chararray) (col6 == 'c' ? 1 : IS_VALID ('200', '0', '0', 'input.txt')) as validRec;
> SPLIT prjData INTO
>     trueDataTmp IF (validRec == '1' AND splitcond != ''),
>     falseDataTmp IF (validRec == '1' AND splitcond == '');
> grpData = GROUP trueDataTmp BY splitcond;
> finalData = FOREACH grpData {
>     orderedData = ORDER trueDataTmp BY col1,col2;
>     GENERATE FLATTEN ( MYUDF (orderedData, 60, 1800, 'input.txt', 'input.dat','20100222','5', 'debug_on')) as (s,m,l);
> }
> dump finalData;
> {code}
> You can see that "falseDataTmp" is never used.
> When I run this script with the no-multiquery (-M) option, I get the right
> result. This could be the result of the complex BinConds in the POLoad. We
> can work around this error by using FILTER instead of SPLIT.
> Viraj
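
The FILTER workaround mentioned in the description could look roughly like
the following. This is a minimal sketch that reuses the aliases from the
script above (prjData, validRec, splitcond); the issue does not show the
exact rewrite the reporter used, and this has not been run against the
reporter's data.
{code}
-- Assumption: prjData is defined exactly as in the script above.
-- Only the branch that is actually consumed is materialized, so no SPLIT
-- (and no unused falseDataTmp branch) appears in the plan.
trueDataTmp = FILTER prjData BY (validRec == '1' AND splitcond != '');
grpData = GROUP trueDataTmp BY splitcond;
-- The remaining FOREACH ... GENERATE and dump statements stay unchanged.
{code}
Because falseDataTmp is never read in the original script, a single FILTER
carrying the first SPLIT condition should select the same rows as the
trueDataTmp branch.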