[
https://issues.apache.org/jira/browse/IMPALA-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Armstrong resolved IMPALA-1728.
-----------------------------------
Resolution: Duplicate
IMPALA-1270 implemented this and changed the plan for TPC-DS Q95
> sub-query with duplicate values used IN conditional operator should discard
> the duplicate values before applying the operator
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: IMPALA-1728
> URL: https://issues.apache.org/jira/browse/IMPALA-1728
> Project: IMPALA
> Issue Type: New Feature
> Components: Frontend
> Affects Versions: Impala 2.0, Impala 2.1
> Reporter: Dileep Kumar
> Priority: Minor
> Labels: performance, planner, tpc-ds
> Attachments: q95.sql, q95.sql.DISTINCT
>
>
> When running TPC-DS Q95 we found that it uses the result of a CTE in an IN
> condition later in the query.
> In this case the CTE generates many duplicate values for the column that is
> used in the condition. When we applied DISTINCT to the CTE, the query took
> about 40% less time to complete.
> The timings (in seconds) are:
> Without DISTINCT : 1240
> With DISTINCT : 728
> Both versions of the query are attached.
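
A minimal sketch of why the rewrite is safe, assuming a toy schema: the tables `orders` and `sales`, the CTE name `multi`, and the helper `run` are hypothetical and are not taken from the attached q95.sql files. It uses SQLite through Python's standard library rather than Impala, so it only illustrates that an IN predicate returns the same rows whether or not the subquery is de-duplicated; it says nothing about the timings reported above.

```python
import sqlite3

# Hypothetical toy data, not the TPC-DS schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_number INTEGER, warehouse INTEGER);
CREATE TABLE sales  (order_number INTEGER, amount INTEGER);
INSERT INTO orders VALUES (1, 10), (1, 11), (1, 12), (2, 20), (2, 21), (3, 30);
INSERT INTO sales  VALUES (1, 100), (2, 200), (4, 400);
""")

# The self-join emits one row per pair of distinct warehouses,
# so the same order_number appears many times in the CTE.
CTE = """
WITH multi AS (
    SELECT {distinct} o1.order_number
    FROM orders o1
    JOIN orders o2
      ON o1.order_number = o2.order_number
     AND o1.warehouse <> o2.warehouse
)
"""

def run(distinct, tail):
    """Run `tail` against the CTE, with or without DISTINCT."""
    sql = CTE.format(distinct="DISTINCT" if distinct else "") + tail
    return conn.execute(sql).fetchall()

FILTER = """
SELECT order_number, amount
FROM sales
WHERE order_number IN (SELECT order_number FROM multi)
ORDER BY order_number
"""
COUNT = "SELECT COUNT(*) FROM multi"

raw_rows = run(False, COUNT)[0][0]    # rows fed to IN without DISTINCT
dedup_rows = run(True, COUNT)[0][0]   # rows fed to IN with DISTINCT
plain_result = run(False, FILTER)
distinct_result = run(True, FILTER)

print(raw_rows, dedup_rows)
print(plain_result == distinct_result)
```

IN only tests membership, so duplicates in the subquery cannot change the result. That is why a planner can, in principle, discard them before applying the operator.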