[ https://issues.apache.org/jira/browse/IMPALA-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16498545#comment-16498545 ]
Ruslan Dautkhanov commented on IMPALA-110:
------------------------------------------

{quote}
[~alex.behm] added a comment - 27/Oct/17 21:53
The feature requires fundamental architectural changes from a tree-execution to a dag-execution model.

[~alex.behm] added a comment - 30/Oct/17 10:04
Ruslan Dautkhanov, this is it. As far as I'm aware of, there currently is no other feature that requires the switch.
{quote}

I thought of another use case for DAG execution: a MATERIALIZED subquery (IMPALA-7114). It would also be great to implement a MATERIALIZE hint as in Oracle:
[https://oracle-base.com/articles/misc/with-clause#materialize-hint]

In Oracle, a WITH subquery can be persisted for the duration of the query, so if that subquery is referenced multiple times in the outer query, a heavy join or any other expensive operation has to run only once. If the query is executed as a DAG, all subqueries that depend on the materialized subquery can run in parallel.

Hopefully that would tip Impala toward switching to a DAG execution model.

> Add support for multiple distinct operators in the same query block
> -------------------------------------------------------------------
>
>                 Key: IMPALA-110
>                 URL: https://issues.apache.org/jira/browse/IMPALA-110
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Backend, Frontend
>    Affects Versions: Impala 0.5, Impala 1.4, Impala 2.0, Impala 2.2, Impala 2.3.0
>            Reporter: Greg Rahn
>            Assignee: Thomas Tauber-Marshall
>            Priority: Major
>              Labels: sql-language
>
> Impala only allows a single (DISTINCT columns) expression in each query.
> {color:red}Note:
> If you do not need precise accuracy, you can produce an estimate of the
> distinct values for a column by specifying NDV(column); a query can contain
> multiple instances of NDV(column). To make Impala automatically rewrite
> COUNT(DISTINCT) expressions to NDV(), enable the APPX_COUNT_DISTINCT query
> option.
> {color}
> {code}
> [impala:21000] > select count(distinct i_class_id) from item;
> Query: select count(distinct i_class_id) from item
> Query finished, fetching results ...
> 16
> Returned 1 row(s) in 1.51s
> {code}
> {code}
> [impala:21000] > select count(distinct i_class_id), count(distinct i_brand_id) from item;
> Query: select count(distinct i_class_id), count(distinct i_brand_id) from item
> ERROR: com.cloudera.impala.common.AnalysisException: Analysis exception (in select count(distinct i_class_id), count(distinct i_brand_id) from item)
>       at com.cloudera.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:133)
>       at com.cloudera.impala.service.Frontend.createExecRequest(Frontend.java:221)
>       at com.cloudera.impala.service.JniFrontend.createExecRequest(JniFrontend.java:89)
> Caused by: com.cloudera.impala.common.AnalysisException: all DISTINCT aggregate functions need to have the same set of parameters as COUNT(DISTINCT i_class_id); deviating function: COUNT(DISTINCT i_brand_id)
>       at com.cloudera.impala.analysis.AggregateInfo.createDistinctAggInfo(AggregateInfo.java:196)
>       at com.cloudera.impala.analysis.AggregateInfo.create(AggregateInfo.java:143)
>       at com.cloudera.impala.analysis.SelectStmt.createAggInfo(SelectStmt.java:466)
>       at com.cloudera.impala.analysis.SelectStmt.analyzeAggregation(SelectStmt.java:347)
>       at com.cloudera.impala.analysis.SelectStmt.analyze(SelectStmt.java:155)
>       at com.cloudera.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:130)
>       ... 2 more
> {code}
> Hive supports this:
> {code}
> $ hive -e "select count(distinct i_class_id), count(distinct i_brand_id) from item;"
> Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
> Hive history file=/tmp/grahn/hive_job_log_grahn_201303052234_1625576708.txt
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=<number>
> Starting Job = job_201302081514_0073, Tracking URL = http://impala:50030/jobdetails.jsp?jobid=job_201302081514_0073
> Kill Command = /usr/lib/hadoop/bin/hadoop job -Dmapred.job.tracker=m0525.mtv.cloudera.com:8021 -kill job_201302081514_0073
> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
> 2013-03-05 22:34:43,255 Stage-1 map = 0%, reduce = 0%
> 2013-03-05 22:34:49,323 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.81 sec
> 2013-03-05 22:34:50,337 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.81 sec
> 2013-03-05 22:34:51,351 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.81 sec
> 2013-03-05 22:34:52,360 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.81 sec
> 2013-03-05 22:34:53,370 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.81 sec
> 2013-03-05 22:34:54,379 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.81 sec
> 2013-03-05 22:34:55,389 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 8.58 sec
> 2013-03-05 22:34:56,402 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 8.58 sec
> 2013-03-05 22:34:57,413 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 8.58 sec
> 2013-03-05 22:34:58,424 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 8.58 sec
> MapReduce Total cumulative CPU time: 8 seconds 580 msec
> Ended Job = job_201302081514_0073
> MapReduce Jobs Launched:
> Job 0: Map: 1 Reduce: 1 Cumulative CPU: 8.58 sec HDFS Read: 0 HDFS Write: 0 SUCCESS
> Total MapReduce CPU Time Spent: 8 seconds 580 msec
> OK
> 16 952
> Time taken: 25.666 seconds
> {code}
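
For reference, the restriction described in the issue can usually be worked around without engine changes. The statements below are a minimal sketch against the item table from the report, not something taken from the thread: the aliases c, b, class_cnt and brand_cnt are made up for illustration, and the approach assumes each COUNT(DISTINCT ...) can be computed in its own subquery.

```sql
-- Exact counts: each subquery holds a single DISTINCT expression, so the
-- "same set of parameters" check in the analyzer is not triggered.
-- Both subqueries return exactly one row, so the CROSS JOIN yields one row.
SELECT c.class_cnt, b.brand_cnt
FROM (SELECT COUNT(DISTINCT i_class_id) AS class_cnt FROM item) c
CROSS JOIN (SELECT COUNT(DISTINCT i_brand_id) AS brand_cnt FROM item) b;

-- Approximate counts: NDV() may appear several times in one query block.
SELECT NDV(i_class_id), NDV(i_brand_id) FROM item;

-- Alternatively, have Impala rewrite COUNT(DISTINCT) to NDV() for the session.
SET APPX_COUNT_DISTINCT=true;
SELECT COUNT(DISTINCT i_class_id), COUNT(DISTINCT i_brand_id) FROM item;
```

The first form scans item once per subquery, trading extra I/O for exact results; the NDV() forms return estimates, as the note in the issue description points out.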