[
https://issues.apache.org/jira/browse/PIG-3720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13893371#comment-13893371
]
Steve Ogden commented on PIG-3720:
----------------------------------
Yes, this fixes the problem. Thanks!
Steve Ogden
Lead Data Warehouse Developer
Thomson Reuters
Office: 651-848-4721
Cell: 651-206-4856
> Nested concats of binary conditionals take 1/2 hour to parse
> ------------------------------------------------------------
>
> Key: PIG-3720
> URL: https://issues.apache.org/jira/browse/PIG-3720
> Project: Pig
> Issue Type: Bug
> Components: parser
> Affects Versions: 0.10.0
> Reporter: Steve Ogden
> Priority: Minor
>
> This statement takes over half an hour to parse. The slowdown seems to be
> related to the binary conditionals: with them removed, the nested CONCATs
> alone parse quickly:
> fact_tsgsrtd_dim_hash = foreach tsgsrtd generate checksum,
>   UPPER(
>     CONCAT((no_of_rics == '\\N' ? '0' : no_of_rics),
>     CONCAT(request_start_dttm,
>     CONCAT(request_end_dttm,
>     CONCAT((adjs_list == '\\N' ? 'UNKNOWN' : adjs_list),
>     CONCAT((event_datatype == '\\N' ? 'UNKNOWN' : event_datatype),
>     CONCAT((facts_list == '\\N' ? 'UNKNOWN' : facts_list),
>     CONCAT((frequency == '\\N' ? 'UNKNOWN' : frequency),
>     CONCAT((points == '\\N' ? '0' : points),
>     CONCAT((multiplier == '\\N' ? '0' : multiplier),
>     CONCAT((qos == '\\N' ? 'UNKNOWN' : qos),
>     CONCAT((pe == '\\N' ? '0' : pe),
>       (event_type == 'GSREQ' ? 'GS' : (event_type == 'RICREQ' ? 'RTD' : (event_type == 'TSREQ' ? 'TS' : 'UNKNOWN')))
> ))))))))))));
> I noticed that if I split it, doing half the conditionals in one relation and
> then the other half in a second relation built from the results of the first,
> it parses in less than a minute:
> fact_tsgsrtd_cat1 = foreach tsgsrtd generate checksum, points, multiplier, qos, pe, event_type,
>   CONCAT(CONCAT((no_of_rics == '\\N' ? '0' : no_of_rics), '.000000000'),
>   CONCAT(request_start_dttm,
>   CONCAT(request_end_dttm,
>   CONCAT((adjs_list == '\\N' ? 'UNKNOWN' : adjs_list),
>   CONCAT((event_datatype == '\\N' ? 'UNKNOWN' : event_datatype),
>   CONCAT((facts_list == '\\N' ? 'UNKNOWN' : facts_list),
>     (frequency == '\\N' ? 'UNKNOWN' : frequency)
> )))))) as cat1;
> fact_tsgsrtd_dim_hash = foreach fact_tsgsrtd_cat1 generate checksum,
>   UPPER(
>     CONCAT(cat1,
>     CONCAT((points == '\\N' ? '0' : points),
>     CONCAT((multiplier == '\\N' ? '0' : multiplier),
>     CONCAT((qos == '\\N' ? 'UNKNOWN' : qos),
>     CONCAT(CONCAT((pe == '\\N' ? '0' : pe), '.0000'),
>       (event_type == 'GSREQ' ? 'GS' : (event_type == 'RICREQ' ? 'RTD' : (event_type == 'TSREQ' ? 'TS' : 'UNKNOWN')))
> )))))) as ts_dim_hash;