[ 
https://issues.apache.org/jira/browse/HIVE-12664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Johan Gustavsson updated HIVE-12664:
------------------------------------
    Description: 
The optimisation check for reduce deduplication only checks the first child 
node of a join -and the check itself also contains a major bug- causing an 
ArrayIndexOutOfBoundsException no matter what.

Sample table schema:
||time||user||host||path||referer||code||agent||size||method||
|int|string|string|string|string|bigint|string|bigint|string|
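
To reproduce the issue, the sample table can be created with DDL along these lines. This is only a sketch: the table name {{www_access}} comes from the sample query and the column types come from the schema above, but the storage format (Hive's default here) is an assumption, since the report does not say how the data was stored.

{code:sql}
-- Sketch of a reproduction table; storage format and location are left at
-- Hive defaults (an assumption, not taken from the original report).
-- All column names are back-quoted because some of them (e.g. user) may
-- clash with reserved keywords depending on the Hive version.
CREATE TABLE www_access (
  `time`    INT,
  `user`    STRING,
  `host`    STRING,
  `path`    STRING,
  `referer` STRING,
  `code`    BIGINT,
  `agent`   STRING,
  `size`    BIGINT,
  `method`  STRING
);
{code}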

Sample query:
{code:sql}
SELECT 
  t1.host,
  COUNT(DISTINCT t1.`date`) AS login_count,
  MAX(t2.code) AS code,
  unix_timestamp() AS time
FROM (
    SELECT 
      HOST,
      MIN(time) AS DATE
    FROM
      www_access
    WHERE
      HOST IS NOT NULL
    GROUP BY
      HOST
  ) t1
JOIN (
    SELECT 
      HOST,
      MIN(time) AS code
    FROM
      www_access
    WHERE
      HOST IS NOT NULL
    GROUP BY
      HOST
  ) t2
  ON t1.host = t2.host
GROUP BY
  t1.host
{code}
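
Until a build with the attached patch is available, a possible (untested) workaround is to switch the optimisation off for the session. This assumes the failing check belongs to the rule governed by {{hive.optimize.reducededuplication}}; that flag is a standard Hive setting, but whether disabling it avoids this particular exception has not been confirmed here.

{code:sql}
-- Print the current value of the flag (Hive echoes key=value).
SET hive.optimize.reducededuplication;

-- Assumed workaround: disable reduce-sink deduplication for this session only,
-- so the optimiser should skip the check that throws the exception.
SET hive.optimize.reducededuplication=false;

-- Then re-run the sample query above unchanged.
{code}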

  was:The optimisation check for reduce deduplication only checks the first 
child node for join -and the check itself also contains a major bug- causing 
ArrayOutOfBoundException no matter what.


> Bug in reduce deduplication optimization causing ArrayOutOfBoundException
> -------------------------------------------------------------------------
>
>                 Key: HIVE-12664
>                 URL: https://issues.apache.org/jira/browse/HIVE-12664
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 1.1.1, 1.2.1
>            Reporter: Johan Gustavsson
>            Assignee: Johan Gustavsson
>         Attachments: HIVE-12664-1.patch, HIVE-12664.1.patch, HIVE-12664.patch
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
