[ https://issues.apache.org/jira/browse/HIVE-12664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Johan Gustavsson updated HIVE-12664:
------------------------------------
    Description:
The optimisation check for reduce deduplication only checks the first child node for join -and the check itself also contains a major bug- causing ArrayIndexOutOfBoundsException no matter what.

Sample data table form:
||time||user||host||path||referer||code||agent||size||method||
|int|string|string|string|string|bigint|string|bigint|string|

Sample query
{code:sql}
SELECT
  t1.host,
  COUNT(DISTINCT t1.`date`) AS login_count,
  MAX(t2.code) AS code,
  unix_timestamp() AS time
FROM (
  SELECT
    HOST,
    MIN(time) AS DATE
  FROM
    www_access
  WHERE
    HOST IS NOT NULL
  GROUP BY
    HOST
) t1
JOIN (
  SELECT
    HOST,
    MIN(time) AS code
  FROM
    www_access
  WHERE
    HOST IS NOT NULL
  GROUP BY
    HOST
) t2
ON t1.host = t2.host
GROUP BY
  t1.host
{code}

  was:The optimisation check for reduce deduplication only checks the first child node for join -and the check itself also contains a major bug- causing ArrayOutOfBoundException no matter what.

> Bug in reduce deduplication optimization causing ArrayOutOfBoundException
> -------------------------------------------------------------------------
>
>                 Key: HIVE-12664
>                 URL: https://issues.apache.org/jira/browse/HIVE-12664
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 1.1.1, 1.2.1
>            Reporter: Johan Gustavsson
>            Assignee: Johan Gustavsson
>         Attachments: HIVE-12664-1.patch, HIVE-12664.1.patch, HIVE-12664.patch
>
>
> The optimisation check for reduce deduplication only checks the first child
> node for join -and the check itself also contains a major bug- causing
> ArrayIndexOutOfBoundsException no matter what.
> Sample data table form:
> ||time||user||host||path||referer||code||agent||size||method||
> |int|string|string|string|string|bigint|string|bigint|string|
> Sample query
> {code:sql}
> SELECT
>   t1.host,
>   COUNT(DISTINCT t1.`date`) AS login_count,
>   MAX(t2.code) AS code,
>   unix_timestamp() AS time
> FROM (
>   SELECT
>     HOST,
>     MIN(time) AS DATE
>   FROM
>     www_access
>   WHERE
>     HOST IS NOT NULL
>   GROUP BY
>     HOST
> ) t1
> JOIN (
>   SELECT
>     HOST,
>     MIN(time) AS code
>   FROM
>     www_access
>   WHERE
>     HOST IS NOT NULL
>   GROUP BY
>     HOST
> ) t2
> ON t1.host = t2.host
> GROUP BY
>   t1.host
> {code}
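
For anyone trying to reproduce this, a table matching the sample schema above could be set up roughly as follows. This is only a sketch and not part of the original report: the storage format is left at Hive's default, and the column names are backquoted on the assumption that some of them (for example user) may be reserved words depending on the Hive version. The SET line at the end is an untested, possible workaround, assuming the failure really does come from the reduce deduplication optimizer; it turns that optimization off for the current session through the hive.optimize.reducededuplication property.

{code:sql}
-- Hypothetical DDL derived from the sample schema in the description;
-- storage format and table properties are left at Hive defaults.
CREATE TABLE www_access (
  `time`    INT,
  `user`    STRING,
  `host`    STRING,
  `path`    STRING,
  `referer` STRING,
  `code`    BIGINT,
  `agent`   STRING,
  `size`    BIGINT,
  `method`  STRING
);

-- Possible workaround (an assumption, not verified against this issue):
-- disable reduce deduplication for the session before running the sample query.
SET hive.optimize.reducededuplication=false;
{code}

If the sample query succeeds with the optimization disabled and fails with it enabled, that would be consistent with the bug being in the deduplication check rather than in the query itself.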