[ https://issues.apache.org/jira/browse/HIVE-12664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Johan Gustavsson updated HIVE-12664:
------------------------------------
    Attachment: HIVE-12664.2.patch

> Bug in reduce deduplication optimization causing ArrayOutOfBoundException
> -------------------------------------------------------------------------
>
>                 Key: HIVE-12664
>                 URL: https://issues.apache.org/jira/browse/HIVE-12664
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 1.1.1, 1.2.1
>            Reporter: Johan Gustavsson
>            Assignee: Johan Gustavsson
>         Attachments: HIVE-12664-1.patch, HIVE-12664-2.patch,
> HIVE-12664.1.patch, HIVE-12664.2.patch, HIVE-12664.patch
>
>
> The optimisation check for reduce deduplication only checks the first child
> node for join -and the check itself also contains a major bug- causing
> ArrayOutOfBoundException no matter what.
> Sample data table form:
> ||time||user||host||path||referer||code||agent||size||method||
> |int|string|string|string|string|bigint|string|bigint|string|
> Sample query
> {code:sql}
> SELECT
>   t1.host,
>   COUNT(DISTINCT t1.`date`) AS login_count,
>   MAX(t2.code) AS code,
>   unix_timestamp() AS time
> FROM (
>   SELECT
>     HOST,
>     MIN(time) AS DATE
>   FROM
>     www_access
>   WHERE
>     HOST IS NOT NULL
>   GROUP BY
>     HOST
> ) t1
> JOIN (
>   SELECT
>     HOST,
>     MIN(time) AS code
>   FROM
>     www_access
>   WHERE
>     HOST IS NOT NULL
>   GROUP BY
>     HOST
> ) t2
> ON t1.host = t2.host
> GROUP BY
>   t1.host
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)