[jira] [Commented] (HIVE-15544) Support scalar subqueries

2017-01-21 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1584#comment-1584
 ] 

Lefty Leverenz commented on HIVE-15544:
---

This should be documented in the wiki:

* [Subqueries in the FROM Clause | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries#LanguageManualSubQueries-SubqueriesintheFROMClause]
* [Subqueries in the WHERE Clause | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries#LanguageManualSubQueries-SubqueriesintheWHEREClause]

Adding a TODOC2.2 label.

> Support scalar subqueries
> -
>
> Key: HIVE-15544
> URL: https://issues.apache.org/jira/browse/HIVE-15544
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
> Fix For: 2.2.0
>
> Attachments: HIVE-15544.1.patch, HIVE-15544.2.patch, 
> HIVE-15544.3.patch, HIVE-15544.4.patch, HIVE-15544.5.patch
>
>
> Currently HIVE only support IN/EXISTS/NOT IN/NOT EXISTS subqueries. HIVE 
> doesn't allow sub-queries such as:
> {code}
> explain select  a.ca_state state, count(*) cnt
>  from customer_address a
>  ,customer c
>  ,store_sales s
>  ,date_dim d
>  ,item i
>  where   a.ca_address_sk = c.c_current_addr_sk
>   and c.c_customer_sk = s.ss_customer_sk
>   and s.ss_sold_date_sk = d.d_date_sk
>   and s.ss_item_sk = i.i_item_sk
>   and d.d_month_seq = 
>(select distinct (d_month_seq)
> from date_dim
>where d_year = 2000
>   and d_moy = 2 )
>   and i.i_current_price > 1.2 * 
>  (select avg(j.i_current_price) 
>from item j 
>where j.i_category = i.i_category)
>  group by a.ca_state
>  having count(*) >= 10
>  order by cnt 
>  limit 100;
> {code}
> We initially plan to support such scalar subqueries in filter i.e. WHERE and 
> HAVING



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15544) Support scalar subqueries

2017-01-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832751#comment-15832751
 ] 

Hive QA commented on HIVE-15544:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12848422/HIVE-15544.5.patch

{color:green}SUCCESS:{color} +1 due to 19 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 19 failed/errored test(s), 10974 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[show_functions] 
(batchId=67)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[subquery_exists] 
(batchId=38)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver[cascade_dbdrop]
 (batchId=226)
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver[generatehfiles_require_family_path]
 (batchId=226)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
 (batchId=136)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cbo_subq_not_in]
 (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[escape1] 
(batchId=139)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[escape2] 
(batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_char_simple]
 (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_varchar_simple]
 (batchId=152)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query16] 
(batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query69] 
(batchId=223)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[cbo_subq_not_in] 
(batchId=116)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_exists] 
(batchId=112)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_in] 
(batchId=122)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3081/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3081/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3081/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 19 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12848422 - PreCommit-HIVE-Build

> Support scalar subqueries
> -
>
> Key: HIVE-15544
> URL: https://issues.apache.org/jira/browse/HIVE-15544
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
> Attachments: HIVE-15544.1.patch, HIVE-15544.2.patch, 
> HIVE-15544.3.patch, HIVE-15544.4.patch, HIVE-15544.5.patch
>
>
> Currently HIVE only support IN/EXISTS/NOT IN/NOT EXISTS subqueries. HIVE 
> doesn't allow sub-queries such as:
> {code}
> explain select  a.ca_state state, count(*) cnt
>  from customer_address a
>  ,customer c
>  ,store_sales s
>  ,date_dim d
>  ,item i
>  where   a.ca_address_sk = c.c_current_addr_sk
>   and c.c_customer_sk = s.ss_customer_sk
>   and s.ss_sold_date_sk = d.d_date_sk
>   and s.ss_item_sk = i.i_item_sk
>   and d.d_month_seq = 
>(select distinct (d_month_seq)
> from date_dim
>where d_year = 2000
>   and d_moy = 2 )
>   and i.i_current_price > 1.2 * 
>  (select avg(j.i_current_price) 
>from item j 
>where j.i_category = i.i_category)
>  group by a.ca_state
>  having count(*) >= 10
>  order by cnt 
>  limit 100;
> {code}
> We initially plan to support such scalar subqueries in filter i.e. WHERE and 
> HAVING



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15544) Support scalar subqueries

2017-01-19 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830870#comment-15830870
 ] 

Ashutosh Chauhan commented on HIVE-15544:
-

+1

> Support scalar subqueries
> -
>
> Key: HIVE-15544
> URL: https://issues.apache.org/jira/browse/HIVE-15544
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
> Attachments: HIVE-15544.1.patch, HIVE-15544.2.patch, 
> HIVE-15544.3.patch, HIVE-15544.4.patch, HIVE-15544.5.patch
>
>
> Currently HIVE only support IN/EXISTS/NOT IN/NOT EXISTS subqueries. HIVE 
> doesn't allow sub-queries such as:
> {code}
> explain select  a.ca_state state, count(*) cnt
>  from customer_address a
>  ,customer c
>  ,store_sales s
>  ,date_dim d
>  ,item i
>  where   a.ca_address_sk = c.c_current_addr_sk
>   and c.c_customer_sk = s.ss_customer_sk
>   and s.ss_sold_date_sk = d.d_date_sk
>   and s.ss_item_sk = i.i_item_sk
>   and d.d_month_seq = 
>(select distinct (d_month_seq)
> from date_dim
>where d_year = 2000
>   and d_moy = 2 )
>   and i.i_current_price > 1.2 * 
>  (select avg(j.i_current_price) 
>from item j 
>where j.i_category = i.i_category)
>  group by a.ca_state
>  having count(*) >= 10
>  order by cnt 
>  limit 100;
> {code}
> We initially plan to support such scalar subqueries in filter i.e. WHERE and 
> HAVING



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15544) Support scalar subqueries

2017-01-16 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15825491#comment-15825491
 ] 

Xuefu Zhang commented on HIVE-15544:


Re: Can you explain what do you mean by semantic problem at runtime? 

Since Hive doesn't know if a subquery will produce a single value, so as you 
proposed, Hive fails the query at runtime if it doesn't. This basically pushes 
a semantic problem (by relaxing semantic check) to a runtime problem. It seems 
so harsh (and probably unacceptable) that a query that has run for hours ends 
up with failure.

I didn't read your patch, but let me know if I misunderstood your proposal.

> Support scalar subqueries
> -
>
> Key: HIVE-15544
> URL: https://issues.apache.org/jira/browse/HIVE-15544
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
> Attachments: HIVE-15544.1.patch, HIVE-15544.2.patch, 
> HIVE-15544.3.patch
>
>
> Currently HIVE only support IN/EXISTS/NOT IN/NOT EXISTS subqueries. HIVE 
> doesn't allow sub-queries such as:
> {code}
> explain select  a.ca_state state, count(*) cnt
>  from customer_address a
>  ,customer c
>  ,store_sales s
>  ,date_dim d
>  ,item i
>  where   a.ca_address_sk = c.c_current_addr_sk
>   and c.c_customer_sk = s.ss_customer_sk
>   and s.ss_sold_date_sk = d.d_date_sk
>   and s.ss_item_sk = i.i_item_sk
>   and d.d_month_seq = 
>(select distinct (d_month_seq)
> from date_dim
>where d_year = 2000
>   and d_moy = 2 )
>   and i.i_current_price > 1.2 * 
>  (select avg(j.i_current_price) 
>from item j 
>where j.i_category = i.i_category)
>  group by a.ca_state
>  having count(*) >= 10
>  order by cnt 
>  limit 100;
> {code}
> We initially plan to support such scalar subqueries in filter i.e. WHERE and 
> HAVING



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15544) Support scalar subqueries

2017-01-15 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15823382#comment-15823382
 ] 

Vineet Garg commented on HIVE-15544:


RB is at https://reviews.apache.org/r/3/. Current patch has few outstanding 
issues as described in RB.

> Support scalar subqueries
> -
>
> Key: HIVE-15544
> URL: https://issues.apache.org/jira/browse/HIVE-15544
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
> Attachments: HIVE-15544.1.patch, HIVE-15544.2.patch, 
> HIVE-15544.3.patch
>
>
> Currently HIVE only support IN/EXISTS/NOT IN/NOT EXISTS subqueries. HIVE 
> doesn't allow sub-queries such as:
> {code}
> explain select  a.ca_state state, count(*) cnt
>  from customer_address a
>  ,customer c
>  ,store_sales s
>  ,date_dim d
>  ,item i
>  where   a.ca_address_sk = c.c_current_addr_sk
>   and c.c_customer_sk = s.ss_customer_sk
>   and s.ss_sold_date_sk = d.d_date_sk
>   and s.ss_item_sk = i.i_item_sk
>   and d.d_month_seq = 
>(select distinct (d_month_seq)
> from date_dim
>where d_year = 2000
>   and d_moy = 2 )
>   and i.i_current_price > 1.2 * 
>  (select avg(j.i_current_price) 
>from item j 
>where j.i_category = i.i_category)
>  group by a.ca_state
>  having count(*) >= 10
>  order by cnt 
>  limit 100;
> {code}
> We initially plan to support such scalar subqueries in filter i.e. WHERE and 
> HAVING



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15544) Support scalar subqueries

2017-01-14 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822877#comment-15822877
 ] 

Ashutosh Chauhan commented on HIVE-15544:
-

[~vgarg] Can you create RB for this?

> Support scalar subqueries
> -
>
> Key: HIVE-15544
> URL: https://issues.apache.org/jira/browse/HIVE-15544
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
> Attachments: HIVE-15544.1.patch, HIVE-15544.2.patch
>
>
> Currently HIVE only support IN/EXISTS/NOT IN/NOT EXISTS subqueries. HIVE 
> doesn't allow sub-queries such as:
> {code}
> explain select  a.ca_state state, count(*) cnt
>  from customer_address a
>  ,customer c
>  ,store_sales s
>  ,date_dim d
>  ,item i
>  where   a.ca_address_sk = c.c_current_addr_sk
>   and c.c_customer_sk = s.ss_customer_sk
>   and s.ss_sold_date_sk = d.d_date_sk
>   and s.ss_item_sk = i.i_item_sk
>   and d.d_month_seq = 
>(select distinct (d_month_seq)
> from date_dim
>where d_year = 2000
>   and d_moy = 2 )
>   and i.i_current_price > 1.2 * 
>  (select avg(j.i_current_price) 
>from item j 
>where j.i_category = i.i_category)
>  group by a.ca_state
>  having count(*) >= 10
>  order by cnt 
>  limit 100;
> {code}
> We initially plan to support such scalar subqueries in filter i.e. WHERE and 
> HAVING



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15544) Support scalar subqueries

2017-01-11 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15820390#comment-15820390
 ] 

Hive QA commented on HIVE-15544:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12847104/HIVE-15544.2.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2902/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2902/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2902/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-01-12 07:34:45.410
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-2902/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-01-12 07:34:45.413
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 6973edc HIVE-15365 : Add new methods to MessageFactory API 
(corresponding to the ones added in JSONMessageFactory) (Sushanth Sowmyan, 
reviewed by Daniel Dai)
+ git clean -f -d
Removing ql/src/test/queries/clientpositive/sqlmerge.q
Removing ql/src/test/results/clientpositive/llap/sqlmerge.q.out
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 6973edc HIVE-15365 : Add new methods to MessageFactory API 
(corresponding to the ones added in JSONMessageFactory) (Sushanth Sowmyan, 
reviewed by Daniel Dai)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-01-12 07:34:46.779
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
Going to apply patch with: patch -p1
patching file ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
patching file 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelDecorrelator.java
patching file 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveSubQueryRemoveRule.java
patching file 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/RexNodeConverter.java
patching file ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java
patching file ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g
patching file 
ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java
patching file 
ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeSubQueryDesc.java
patching file 
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSQCountCheck.java
patching file ql/src/test/queries/clientnegative/scalar_sub1.q
patching file ql/src/test/queries/clientpositive/perf/query1.q
patching file ql/src/test/queries/clientpositive/perf/query23.q
patching file ql/src/test/queries/clientpositive/perf/query30.q
patching file ql/src/test/queries/clientpositive/perf/query41.q
patching file ql/src/test/queries/clientpositive/perf/query6.q
patching file ql/src/test/queries/clientpositive/perf/query81.q
patching file ql/src/test/results/clientnegative/scalar_sub1.q.out
patching file ql/src/test/results/clientpositive/perf/query1.q.out
patching file ql/src/test/results/clientpositive/perf/query23.q.out
patching file ql/src/test/results/clientpositive/perf/query30.q.out
patching file ql/src/test/results/clientpositive/perf/query41.q.out
patching file ql/src/test/results/clientpositive/perf/query6.q.out
patching file ql/src/test/results/clientpositive/perf/query81.q.out
+ [[ maven == \m\a\v\e\n ]]
+ rm -rf /data/hiveptest/working/maven/org/apache/hive
+ mvn -B clean install -DskipTests -T 4 -q 
-Dmaven.repo.local=/data/hiveptest/working/maven
ANTLR Parser Generator  Version 3.5.2
Output file 

[jira] [Commented] (HIVE-15544) Support scalar subqueries

2017-01-11 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819673#comment-15819673
 ] 

Vineet Garg commented on HIVE-15544:


Hi [~xuefuz] Can you explain what do you mean by semantic problem at runtime? 
Do you have anything else in mind besides making sure subquery outputs scalar 
value?  What do you mean by PL and how do you thing hiveconf variables may fit 
better for this?

> Support scalar subqueries
> -
>
> Key: HIVE-15544
> URL: https://issues.apache.org/jira/browse/HIVE-15544
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
> Attachments: HIVE-15544.1.patch, HIVE-15544.2.patch
>
>
> Currently HIVE only support IN/EXISTS/NOT IN/NOT EXISTS subqueries. HIVE 
> doesn't allow sub-queries such as:
> {code}
> explain select  a.ca_state state, count(*) cnt
>  from customer_address a
>  ,customer c
>  ,store_sales s
>  ,date_dim d
>  ,item i
>  where   a.ca_address_sk = c.c_current_addr_sk
>   and c.c_customer_sk = s.ss_customer_sk
>   and s.ss_sold_date_sk = d.d_date_sk
>   and s.ss_item_sk = i.i_item_sk
>   and d.d_month_seq = 
>(select distinct (d_month_seq)
> from date_dim
>where d_year = 2000
>   and d_moy = 2 )
>   and i.i_current_price > 1.2 * 
>  (select avg(j.i_current_price) 
>from item j 
>where j.i_category = i.i_category)
>  group by a.ca_state
>  having count(*) >= 10
>  order by cnt 
>  limit 100;
> {code}
> We initially plan to support such scalar subqueries in filter i.e. WHERE and 
> HAVING



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15544) Support scalar subqueries

2017-01-11 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819577#comment-15819577
 ] 

Xuefu Zhang commented on HIVE-15544:


Thanks for sharing your findings.

However, I don't think it's good enough to catch the semantic problem at 
runtime. It's waste and confusion that a user's query fails after hours'r 
running.

To me, PL and hiveconf variables may fit better for this.

> Support scalar subqueries
> -
>
> Key: HIVE-15544
> URL: https://issues.apache.org/jira/browse/HIVE-15544
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
> Attachments: HIVE-15544.1.patch, HIVE-15544.2.patch
>
>
> Currently HIVE only support IN/EXISTS/NOT IN/NOT EXISTS subqueries. HIVE 
> doesn't allow sub-queries such as:
> {code}
> explain select  a.ca_state state, count(*) cnt
>  from customer_address a
>  ,customer c
>  ,store_sales s
>  ,date_dim d
>  ,item i
>  where   a.ca_address_sk = c.c_current_addr_sk
>   and c.c_customer_sk = s.ss_customer_sk
>   and s.ss_sold_date_sk = d.d_date_sk
>   and s.ss_item_sk = i.i_item_sk
>   and d.d_month_seq = 
>(select distinct (d_month_seq)
> from date_dim
>where d_year = 2000
>   and d_moy = 2 )
>   and i.i_current_price > 1.2 * 
>  (select avg(j.i_current_price) 
>from item j 
>where j.i_category = i.i_category)
>  group by a.ca_state
>  having count(*) >= 10
>  order by cnt 
>  limit 100;
> {code}
> We initially plan to support such scalar subqueries in filter i.e. WHERE and 
> HAVING



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15544) Support scalar subqueries

2017-01-05 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15802524#comment-15802524
 ] 

Vineet Garg commented on HIVE-15544:


Doing a quick search in Postgres code shows that they handle this case during 
execution instead of planning. Postgres avoid re-writing scalar subqueries into 
join and execute them separately as subplan and so are able to handle this at 
runtime

Also according to MariaDB's this documentation page 
(https://mariadb.com/kb/en/sql-99/scalar-subqueries/) {noformat}It is an error 
if, at runtime, the DBMS discovers that a scalar subquery returns more than one 
row{noformat}

Current patch leverages Calcite's functionality to plan scalar subqueries, 
which re-write them into LEFT JOIN. Calcite tries to introduce SINGLE_VALUE 
function to make sure subquery expression produces single value but HIVE avoids 
that since it doesn't support SINGLE_VALUE.

I am not yet sure how HIVE is going to handle this yet. Most likely this will 
have to be handled at runtime but that'll probably involve introducing new 
execution operator to execute subplans.

> Support scalar subqueries
> -
>
> Key: HIVE-15544
> URL: https://issues.apache.org/jira/browse/HIVE-15544
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
> Attachments: HIVE-15544.1.patch
>
>
> Currently HIVE only support IN/EXISTS/NOT IN/NOT EXISTS subqueries. HIVE 
> doesn't allow sub-queries such as:
> {code}
> explain select  a.ca_state state, count(*) cnt
>  from customer_address a
>  ,customer c
>  ,store_sales s
>  ,date_dim d
>  ,item i
>  where   a.ca_address_sk = c.c_current_addr_sk
>   and c.c_customer_sk = s.ss_customer_sk
>   and s.ss_sold_date_sk = d.d_date_sk
>   and s.ss_item_sk = i.i_item_sk
>   and d.d_month_seq = 
>(select distinct (d_month_seq)
> from date_dim
>where d_year = 2000
>   and d_moy = 2 )
>   and i.i_current_price > 1.2 * 
>  (select avg(j.i_current_price) 
>from item j 
>where j.i_category = i.i_category)
>  group by a.ca_state
>  having count(*) >= 10
>  order by cnt 
>  limit 100;
> {code}
> We initially plan to support such scalar subqueries in filter i.e. WHERE and 
> HAVING



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15544) Support scalar subqueries

2017-01-05 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15802285#comment-15802285
 ] 

Xuefu Zhang commented on HIVE-15544:


It's interesting to know how we know if a subquery returns a scalar value of a 
set of rows. Semantically, a query returns a set of rows, so using a subquery 
in place for a scalar is semantically confusing. Thus, I doubt the legitimacy 
of the syntax, but am open to see how SQL standard says and if any DB system 
has things like this.

> Support scalar subqueries
> -
>
> Key: HIVE-15544
> URL: https://issues.apache.org/jira/browse/HIVE-15544
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
> Attachments: HIVE-15544.1.patch
>
>
> Currently HIVE only support IN/EXISTS/NOT IN/NOT EXISTS subqueries. HIVE 
> doesn't allow sub-queries such as:
> {code}
> explain select  a.ca_state state, count(*) cnt
>  from customer_address a
>  ,customer c
>  ,store_sales s
>  ,date_dim d
>  ,item i
>  where   a.ca_address_sk = c.c_current_addr_sk
>   and c.c_customer_sk = s.ss_customer_sk
>   and s.ss_sold_date_sk = d.d_date_sk
>   and s.ss_item_sk = i.i_item_sk
>   and d.d_month_seq = 
>(select distinct (d_month_seq)
> from date_dim
>where d_year = 2000
>   and d_moy = 2 )
>   and i.i_current_price > 1.2 * 
>  (select avg(j.i_current_price) 
>from item j 
>where j.i_category = i.i_category)
>  group by a.ca_state
>  having count(*) >= 10
>  order by cnt 
>  limit 100;
> {code}
> We initially plan to support such scalar subqueries in filter i.e. WHERE and 
> HAVING



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15544) Support scalar subqueries

2017-01-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15800541#comment-15800541
 ] 

Hive QA commented on HIVE-15544:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12845708/HIVE-15544.1.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2793/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2793/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2793/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-01-05 06:52:35.750
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-2793/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-01-05 06:52:35.752
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at ad335c1 HIVE-15518: Refactoring rows and range related classes 
to put the window type on Window (Aihua Xu, reviewed by Yongzhi Chen)
+ git clean -f -d
Removing ql/src/java/org/apache/hadoop/hive/ql/exec/DynamicValueRegistry.java
Removing 
ql/src/java/org/apache/hadoop/hive/ql/exec/ExprNodeDynamicValueEvaluator.java
Removing 
ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DynamicValueRegistryTez.java
Removing 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/DynamicValueVectorExpression.java
Removing ql/src/java/org/apache/hadoop/hive/ql/parse/RuntimeValuesInfo.java
Removing ql/src/java/org/apache/hadoop/hive/ql/plan/DynamicValue.java
Removing 
ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDynamicValueDesc.java
Removing ql/src/test/queries/clientpositive/dynamic_semijoin_reduction.q
Removing 
ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction.q.out
Removing 
storage-api/src/java/org/apache/hadoop/hive/ql/io/sarg/LiteralDelegate.java
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at ad335c1 HIVE-15518: Refactoring rows and range related classes 
to put the window type on Window (Aihua Xu, reviewed by Yongzhi Chen)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-01-05 06:52:36.731
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: a/itests/src/test/resources/testconfiguration.properties: No such file 
or directory
error: 
a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveSubQRemoveRelBuilder.java:
 No such file or directory
error: 
a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveFilter.java:
 No such file or directory
error: 
a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveSubQueryRemoveRule.java:
 No such file or directory
error: 
a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/RexNodeConverter.java:
 No such file or directory
error: a/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java: No 
such file or directory
error: a/ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g: No 
such file or directory
error: a/ql/src/java/org/apache/hadoop/hive/ql/parse/QBSubQuery.java: No such 
file or directory
error: a/ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java: 
No such file or directory
error: a/ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeSubQueryDesc.java: 
No such file or directory
error: a/ql/src/test/queries/clientnegative/subquery_restrictions.q: No such 
file or directory
error: a/ql/src/test/queries/clientpositive/perf/query45.q: No such file or 
directory
error: