[ https://issues.apache.org/jira/browse/PIG-3720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Koji Noguchi resolved PIG-3720.
-------------------------------

    Resolution: Duplicate

I believe this is fixed in PIG-2769 for version 11.2 and later.
Steve, can you try patching it to your 0.10?

> Nested concats of binary conditionals take 1/2 hour to parse
> ------------------------------------------------------------
>
>                 Key: PIG-3720
>                 URL: https://issues.apache.org/jira/browse/PIG-3720
>             Project: Pig
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.10.0
>            Reporter: Steve Ogden
>            Priority: Minor
>
> This statement takes over 1/2 hour to parse. Seems to be related to the
> conditionals. Removing them and just running the nested concats, it parses
> fast:
> fact_tsgsrtd_dim_hash = foreach tsgsrtd generate checksum,
>     UPPER(
>         CONCAT((no_of_rics == '\\N' ? '0' : no_of_rics),
>         CONCAT(request_start_dttm,
>         CONCAT(request_end_dttm,
>         CONCAT((adjs_list == '\\N' ? 'UNKNOWN' : adjs_list),
>         CONCAT((event_datatype == '\\N' ? 'UNKNOWN' : event_datatype),
>         CONCAT((facts_list == '\\N' ? 'UNKNOWN' : facts_list),
>         CONCAT((frequency == '\\N' ? 'UNKNOWN' : frequency),
>         CONCAT((points == '\\N' ? '0' : points),
>         CONCAT((multiplier == '\\N' ? '0' : multiplier),
>         CONCAT((qos == '\\N' ? 'UNKNOWN' : qos),
>         CONCAT((pe == '\\N' ? '0' : pe),
>             (event_type == 'GSREQ' ? 'GS' : (event_type == 'RICREQ' ?
>             'RTD' : (event_type == 'TSREQ' ? 'TS' : 'UNKNOWN')))
>         ))))))))))));
> I noticed that if I split it, doing half the conditionals in one relation,
> then taking the results of that to create another relation that does the
> other half of the conditionals, it parses in less than a minute:
> fact_tsgsrtd_cat1 = foreach tsgsrtd generate checksum, points, multiplier,
>     qos, pe, event_type,
>     CONCAT(CONCAT((no_of_rics == '\\N' ? '0' :
>         no_of_rics),'.000000000'),
>     CONCAT(request_start_dttm,
>     CONCAT(request_end_dttm,
>     CONCAT((adjs_list == '\\N' ? 'UNKNOWN' : adjs_list),
>     CONCAT((event_datatype == '\\N' ? 'UNKNOWN' : event_datatype),
>     CONCAT((facts_list == '\\N' ? 'UNKNOWN' : facts_list),
>         (frequency == '\\N' ? 'UNKNOWN' : frequency)
>     )))))) as cat1;
> fact_tsgsrtd_dim_hash = foreach fact_tsgsrtd_cat1 generate checksum,
>     UPPER(
>         CONCAT(cat1,
>         CONCAT((points == '\\N' ? '0' : points),
>         CONCAT((multiplier == '\\N' ? '0' : multiplier),
>         CONCAT((qos == '\\N' ? 'UNKNOWN' : qos),
>         CONCAT(CONCAT((pe == '\\N' ? '0' : pe), '.0000'),
>             (event_type == 'GSREQ' ? 'GS' : (event_type == 'RICREQ' ?
>             'RTD' : (event_type == 'TSREQ' ? 'TS' : 'UNKNOWN')))
>         )))))) as ts_dim_hash;

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
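
To check how parse time changes with nesting depth (for example before and after applying the PIG-2769 patch), it may help to generate statements of the same shape at several depths and time each one. The following is only a sketch in Python and is not part of the original report: the field names `f0`, `f1`, ... and the output alias `out` are hypothetical placeholders, while the relation `tsgsrtd`, the `checksum` column, and the `'\\N'` null marker are taken from the statement quoted above.

```python
def null_safe(field, default):
    """Build a Pig binary conditional that replaces the '\\N' marker with a default."""
    # "\\\\N" in Python source yields two backslashes followed by N,
    # matching the '\\N' literal used in the quoted Pig statement.
    return "(%s == '\\\\N' ? '%s' : %s)" % (field, default, field)


def nested_concat(terms):
    """Right-nest terms as CONCAT(t0, CONCAT(t1, ... CONCAT(t[n-2], t[n-1])))."""
    if len(terms) < 2:
        raise ValueError("need at least two terms to build a CONCAT")
    expr = terms[-1]
    for term in reversed(terms[:-1]):
        expr = "CONCAT(%s, %s)" % (term, expr)
    return expr


def build_statement(depth):
    """Return a foreach statement containing `depth` nested CONCAT calls.

    Field names f0..f<depth> are hypothetical; substitute real columns
    from the relation under test.
    """
    terms = [null_safe("f%d" % i, "UNKNOWN") for i in range(depth + 1)]
    return "out = foreach tsgsrtd generate checksum, UPPER(%s);" % nested_concat(terms)


if __name__ == "__main__":
    # Print statements of increasing depth; paste each into a Pig script
    # and time the parse to see where it starts to slow down.
    for depth in (3, 6, 11):
        print(build_statement(depth))
```

A depth of 11 gives the same number of nested CONCAT calls as the original single-relation statement, so it is a natural upper bound for the comparison.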