[ 
https://issues.apache.org/jira/browse/PIG-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109309#comment-13109309
 ] 

Todd Lipcon commented on PIG-2296:
----------------------------------

The script in question is:
{code}
er = LOAD 'er' AS (en : chararray, es : chararray);
tokenized = FOREACH er GENERATE TOKENIZE(en) AS en, TOKENIZE(es) AS es;
pairs = FOREACH tokenized GENERATE FLATTEN(en) AS en_word, FLATTEN(es) AS es_word;
pairs_long = FILTER pairs BY (SIZE(en_word) > 4) AND (SIZE(es_word) > 4);
{code}
After this, pairs_long still contains pairs where es_word has length <= 4, 
which the filter should have removed. Running with "-t All" to disable the 
optimizer produces correct results. An example line of data is:
{code}
it was a bright cold day in April , and the clocks were striking thirteen . 
Winston Smith , his chin nuzzled into his breast in an effort to escape the 
vile wind , slipped quickly through the glass doors of Victory Mansions , 
though not quickly enough to prevent a swirl of gritty dust from entering along 
with him .    intr - o zi senina si friguroasa de aprilie , pe cind ceasurile 
bateau ora treisprezece , Winston Smith , cu barbia infundata in piept pentru a 
scapa de vintul care - l lua pe sus , se strecura iute prin usile de sticla ale 
Blocului Victoria , desi nu destul de repede pentru a impiedica un virtej de 
praf si nisip sa patrunda o data cu el . 
{code}
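For comparison, the intended semantics of the script can be modelled outside Pig. The following is a minimal Python sketch, not Pig's implementation. It assumes the two input fields are tab-separated, uses plain whitespace splitting as a stand-in for TOKENIZE (Pig's TOKENIZE also splits on some other delimiters), and the function name expected_pairs_long is made up for illustration. No row in its output should have a second element of length <= 4.

```python
from itertools import product


def expected_pairs_long(line, min_len=4):
    """Model the script's intended output for one tab-separated input line."""
    en, es = line.split("\t", 1)
    en_words = en.split()  # assumption: whitespace split stands in for TOKENIZE(en)
    es_words = es.split()  # assumption: whitespace split stands in for TOKENIZE(es)
    # Two FLATTENs in one GENERATE yield the cross product of the two bags.
    pairs = product(en_words, es_words)
    # FILTER pairs BY (SIZE(en_word) > 4) AND (SIZE(es_word) > 4)
    return [(e, s) for e, s in pairs if len(e) > min_len and len(s) > min_len]


# Shortened excerpt of the example line, with an explicit tab between the fields.
rows = expected_pairs_long("it was a bright cold day\tintr - o zi senina si friguroasa")
```

Here "intr" has exactly four characters, so a correct filter drops every pair that contains it; only pairs built from "bright" with "senina" or "friguroasa" remain.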

> case where optimizer causes incorrect filtering
> -----------------------------------------------
>
>                 Key: PIG-2296
>                 URL: https://issues.apache.org/jira/browse/PIG-2296
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.1
>            Reporter: Todd Lipcon
>            Priority: Critical
>
> I have a script which reproducibly generates incorrect filter results on Pig 
> 0.8.1. Haven't tried to reproduce on 0.9 or trunk.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
