[ 
https://issues.apache.org/jira/browse/PIG-3720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi resolved PIG-3720.
-------------------------------

    Resolution: Duplicate

I believe this is fixed in PIG-2769 for version 0.11.2 and later.  Steve, can you 
try applying that patch to your 0.10 installation?

> Nested concats of binary conditionals take 1/2 hour to parse
> ------------------------------------------------------------
>
>                 Key: PIG-3720
>                 URL: https://issues.apache.org/jira/browse/PIG-3720
>             Project: Pig
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.10.0
>            Reporter: Steve Ogden
>            Priority: Minor
>
> This statement takes over 1/2 hour to parse. It seems to be related to the 
> conditionals: with them removed and only the nested concats left, it parses 
> fast:
> fact_tsgsrtd_dim_hash = foreach tsgsrtd generate checksum,
>         UPPER(
>                 CONCAT((no_of_rics == '\\N' ? '0' : no_of_rics),
>                 CONCAT(request_start_dttm,
>                 CONCAT(request_end_dttm,
>                 CONCAT((adjs_list == '\\N' ? 'UNKNOWN' : adjs_list),
>                 CONCAT((event_datatype == '\\N' ? 'UNKNOWN' : event_datatype),
>                 CONCAT((facts_list == '\\N' ? 'UNKNOWN' : facts_list),
>                 CONCAT((frequency == '\\N' ? 'UNKNOWN' : frequency),
>                 CONCAT((points == '\\N' ? '0' : points),
>                 CONCAT((multiplier == '\\N' ? '0' : multiplier),
>                 CONCAT((qos == '\\N' ? 'UNKNOWN' : qos),
>                 CONCAT((pe == '\\N' ? '0' : pe),
>                 (event_type == 'GSREQ' ? 'GS' : (event_type == 'RICREQ' ? 'RTD' : (event_type == 'TSREQ' ? 'TS' : 'UNKNOWN')))
>                 ))))))))))));
> I noticed that if I split it, doing half the conditionals in one relation and 
> then the other half in a second relation built from the results of the first, 
> it parses in less than a minute:
> fact_tsgsrtd_cat1 = foreach tsgsrtd generate checksum, points, multiplier, qos, pe, event_type,
>                 CONCAT(CONCAT((no_of_rics == '\\N' ? '0' : no_of_rics),'.000000000'),
>                 CONCAT(request_start_dttm,
>                 CONCAT(request_end_dttm,
>                 CONCAT((adjs_list == '\\N' ? 'UNKNOWN' : adjs_list),
>                 CONCAT((event_datatype == '\\N' ? 'UNKNOWN' : event_datatype),
>                 CONCAT((facts_list == '\\N' ? 'UNKNOWN' : facts_list),
>                 (frequency == '\\N' ? 'UNKNOWN' : frequency)
>                 )))))) as cat1;
> fact_tsgsrtd_dim_hash = foreach fact_tsgsrtd_cat1 generate checksum,
>         UPPER(
>                 CONCAT(cat1,
>                 CONCAT((points == '\\N' ? '0' : points),
>                 CONCAT((multiplier == '\\N' ? '0' : multiplier),
>                 CONCAT((qos == '\\N' ? 'UNKNOWN' : qos),
>                 CONCAT(CONCAT((pe == '\\N' ? '0' : pe), '.0000'),
>                 (event_type == 'GSREQ' ? 'GS' : (event_type == 'RICREQ' ? 'RTD' : (event_type == 'TSREQ' ? 'TS' : 'UNKNOWN')))
>                 )))))) as ts_dim_hash;
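
For anyone stuck on 0.10 who wants to apply the same split to other long chains, the workaround above can be sketched as a small script that generates the nested expressions. This is only an illustration, not part of Pig or of the PIG-2769 patch: the helper names (null_default, nest_concat, split_chain) are hypothetical, and it assumes the right-nested two-argument CONCAT form and the '\\N' null marker used in the report.

```python
def null_default(field, default):
    """Build the binary conditional from the report: replace the '\\N' marker."""
    # The doubled backslashes produce the literal text '\\N' in the Pig script.
    return f"({field} == '\\\\N' ? '{default}' : {field})"


def nest_concat(parts):
    """Fold a list of Pig expressions into right-nested two-argument CONCATs."""
    if not parts:
        raise ValueError("need at least one expression")
    expr = parts[-1]
    for part in reversed(parts[:-1]):
        expr = f"CONCAT({part}, {expr})"
    return expr


def split_chain(parts, alias="cat1"):
    """Split one long chain into two shorter ones (hypothetical helper).

    The first expression is meant to be generated in an intermediate
    relation under `alias`; the second consumes that alias in a follow-up
    foreach, mirroring the two-relation workaround in the report.
    """
    middle = len(parts) // 2
    first = nest_concat(parts[:middle])
    second = nest_concat([alias] + parts[middle:])
    return first, second


if __name__ == "__main__":
    fields = [
        null_default("points", "0"),
        null_default("multiplier", "0"),
        null_default("qos", "UNKNOWN"),
        null_default("pe", "0"),
    ]
    stage1, stage2 = split_chain(fields)
    print(f"... generate {stage1} as cat1;")
    print(f"... generate UPPER({stage2}) as ts_dim_hash;")
```

Each generated expression nests only half as deep as the original. Going by the timings in the report, that shallower nesting is what keeps parsing under a minute on 0.10.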



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
