[jira] [Commented] (HIVE-25095) Beeline/hive -e command can't deal with query with trailing quote

2021-11-17 Thread Robbie Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17445564#comment-17445564
 ] 

Robbie Zhang commented on HIVE-25095:
-

[~kgyrtkirk], thanks for the offer :D

> Beeline/hive -e command can't deal with query with trailing quote
> -
>
> Key: HIVE-25095
> URL: https://issues.apache.org/jira/browse/HIVE-25095
> Project: Hive
>  Issue Type: Bug
>Reporter: Robbie Zhang
>Assignee: Robbie Zhang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The command 
> {code:java}
> hive -e 'select "hive"'{code}
> and
> {code:java}
> beeline -e 'select "hive"'{code}
> fail with the following error:
> {code:java}
> Error: Error while compiling statement: FAILED: ParseException line 1:12 
> character '' not supported here (state=42000,code=4){code}
> The reason is that org.apache.commons.cli.Util.stripLeadingAndTrailingQuotes 
> in commons-cli-1.2.jar strips the trailing quote so the query string is 
> changed to
> {code:java}
> select "hive{code}
> This bug is fixed in commons-cli-1.3.1 and commons-cli-1.4.jar. The 
> workaround is to overwrite commons-cli-1.2.jar with commons-cli-1.3.1 or 
> commons-cli-1.4.jar.
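
The quote stripping described above can be illustrated with a minimal sketch. This is not the commons-cli source; the class and method names are hypothetical and only mimic the behavior the issue describes: stripping a leading and a trailing quote independently damages a query that merely ends with a quote, while stripping only a matching pair leaves it intact.

```java
class QuoteStripDemo {
    // Strips a leading and a trailing double quote independently of each other,
    // which is the behavior the issue attributes to commons-cli 1.2 (assumption:
    // this is a simplified imitation, not the library's code).
    static String stripIndependently(String s) {
        if (s.startsWith("\"")) {
            s = s.substring(1);
        }
        if (s.endsWith("\"")) {
            s = s.substring(0, s.length() - 1);
        }
        return s;
    }

    // Strips quotes only when the whole value is wrapped in a matching pair.
    static String stripOnlyWhenWrapped(String s) {
        if (s.length() > 1 && s.startsWith("\"") && s.endsWith("\"")) {
            return s.substring(1, s.length() - 1);
        }
        return s;
    }

    public static void main(String[] args) {
        String query = "select \"hive\"";
        System.out.println(stripIndependently(query));   // select "hive
        System.out.println(stripOnlyWhenWrapped(query)); // select "hive"
    }
}
```

With the first variant the query loses its closing quote, which matches the ParseException reported above.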



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-25498) Query with more than 31 count distinct functions returns wrong result

2021-09-08 Thread Robbie Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Zhang updated HIVE-25498:

Summary: Query with more than 31 count distinct functions returns wrong 
result  (was: Query with more than 32 count distinct functions returns wrong 
result)

> Query with more than 31 count distinct functions returns wrong result
> -
>
> Key: HIVE-25498
> URL: https://issues.apache.org/jira/browse/HIVE-25498
> Project: Hive
>  Issue Type: Bug
>Reporter: Robbie Zhang
>Assignee: Robbie Zhang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> If there are more than 32 "COUNT(DISTINCT COL)" functions in a query, all 
> these COUNT functions in this query return 0 instead of the proper values.
> Here are the queries to reproduce this issue:
> {code:java}
> set hive.cbo.enable=true;
> create table test_count (c0 string, c1 string, c2 string, c3 string, c4 
> string, c5 string, c6 string, c7 string, c8 string, c9 string, c10 string, 
> c11 string, c12 string, c13 string, c14 string, c15 string, c16 string, c17 
> string, c18 string, c19 string, c20 string, c21 string, c22 string, c23 
> string, c24 string, c25 string, c26 string, c27 string, c28 string, c29 
> string, c30 string, c31 string, c32 string);
> INSERT INTO test_count values ('c0', 'c1', 'c2', 'c3', 'c4', 'c5', 'c6', 
> 'c7', 'c8', 'c9', 'c10', 'c11', 'c12', 'c13', 'c14', 'c15', 'c16', 'c17', 
> 'c18', 'c19', 'c20', 'c21', 'c22', 'c23', 'c24', 'c25', 'c26', 'c27', 'c28', 
> 'c29', 'c30', 'c31', 'c32'); 
> select count (distinct c0), count(distinct c1), count(distinct c2), 
> count(distinct c3), count(distinct c4), count(distinct c5), count(distinct 
> c6), count(distinct c7), count(distinct c8), count(distinct c9), 
> count(distinct c10), count(distinct c11), count(distinct c12), count(distinct 
> c13), count(distinct c14), count(distinct c15), count(distinct c16), 
> count(distinct c17), count(distinct c18), count(distinct c19), count(distinct 
> c20), count(distinct c21), count(distinct c22), count(distinct c23), 
> count(distinct c24), count(distinct c25), count(distinct c26), count(distinct 
> c27), count(distinct c28), count(distinct c29), count(distinct c30), 
> count(distinct c31), count(distinct c32) from test_count;
> {code}
>  This bug is caused by HiveExpandDistinctAggregatesRule.getGroupingIdValue() 
> which uses int type. When there are more than 32 groupings the values 
> overflow.
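
As a rough illustration of the overflow, the sketch below assumes one bit per grouping packed into a single value. It is not the code of HiveExpandDistinctAggregatesRule.getGroupingIdValue(); the class and method names are hypothetical. It only shows that a 32-bit int cannot represent a distinct bit for position 32, while a long can.

```java
class GroupingIdOverflowDemo {
    // One bit per grouping, kept in a 32-bit int. For int operands Java uses
    // only the low 5 bits of the shift distance, so position 32 wraps to 0.
    static int bitAsInt(int position) {
        return 1 << position;
    }

    // The same bit kept in a 64-bit long, which has room for position 32.
    static long bitAsLong(int position) {
        return 1L << position;
    }

    public static void main(String[] args) {
        System.out.println(bitAsInt(30));  // 1073741824
        System.out.println(bitAsInt(31));  // -2147483648 (sign bit)
        System.out.println(bitAsInt(32));  // 1 (collides with position 0)
        System.out.println(bitAsLong(32)); // 4294967296
    }
}
```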





[jira] [Updated] (HIVE-25498) Query with more than 31 count distinct functions returns wrong result

2021-09-08 Thread Robbie Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Zhang updated HIVE-25498:

Description: 
If there are more than 32 "COUNT(DISTINCT COL)" functions in a query, some or 
even all these COUNT functions in this query return 0 instead of the proper 
values.

Here are the queries to reproduce this issue:
{code:java}
set hive.cbo.enable=true;
create table test_count (c0 string, c1 string, c2 string, c3 string, c4 string, 
c5 string, c6 string, c7 string, c8 string, c9 string, c10 string, c11 string, 
c12 string, c13 string, c14 string, c15 string, c16 string, c17 string, c18 
string, c19 string, c20 string, c21 string, c22 string, c23 string, c24 string, 
c25 string, c26 string, c27 string, c28 string, c29 string, c30 string, c31 
string, c32 string);
INSERT INTO test_count values ('c0', 'c1', 'c2', 'c3', 'c4', 'c5', 'c6', 'c7', 
'c8', 'c9', 'c10', 'c11', 'c12', 'c13', 'c14', 'c15', 'c16', 'c17', 'c18', 
'c19', 'c20', 'c21', 'c22', 'c23', 'c24', 'c25', 'c26', 'c27', 'c28', 'c29', 
'c30', 'c31', 'c32'); 
select count (distinct c0), count(distinct c1), count(distinct c2), 
count(distinct c3), count(distinct c4), count(distinct c5), count(distinct c6), 
count(distinct c7), count(distinct c8), count(distinct c9), count(distinct 
c10), count(distinct c11), count(distinct c12), count(distinct c13), 
count(distinct c14), count(distinct c15), count(distinct c16), count(distinct 
c17), count(distinct c18), count(distinct c19), count(distinct c20), 
count(distinct c21), count(distinct c22), count(distinct c23), count(distinct 
c24), count(distinct c25), count(distinct c26), count(distinct c27), 
count(distinct c28), count(distinct c29), count(distinct c30), count(distinct 
c31), count(distinct c32) from test_count;
{code}
 This bug is caused by HiveExpandDistinctAggregatesRule.getGroupingIdValue() 
which uses int type. When there are more than 32 groupings the values overflow.

  was:
If there are more than 32 "COUNT(DISTINCT COL)" functions in a query, all these 
COUNT functions in this query return 0 instead of the proper values.

Here are the queries to reproduce this issue:
{code:java}
set hive.cbo.enable=true;
create table test_count (c0 string, c1 string, c2 string, c3 string, c4 string, 
c5 string, c6 string, c7 string, c8 string, c9 string, c10 string, c11 string, 
c12 string, c13 string, c14 string, c15 string, c16 string, c17 string, c18 
string, c19 string, c20 string, c21 string, c22 string, c23 string, c24 string, 
c25 string, c26 string, c27 string, c28 string, c29 string, c30 string, c31 
string, c32 string);
INSERT INTO test_count values ('c0', 'c1', 'c2', 'c3', 'c4', 'c5', 'c6', 'c7', 
'c8', 'c9', 'c10', 'c11', 'c12', 'c13', 'c14', 'c15', 'c16', 'c17', 'c18', 
'c19', 'c20', 'c21', 'c22', 'c23', 'c24', 'c25', 'c26', 'c27', 'c28', 'c29', 
'c30', 'c31', 'c32'); 
select count (distinct c0), count(distinct c1), count(distinct c2), 
count(distinct c3), count(distinct c4), count(distinct c5), count(distinct c6), 
count(distinct c7), count(distinct c8), count(distinct c9), count(distinct 
c10), count(distinct c11), count(distinct c12), count(distinct c13), 
count(distinct c14), count(distinct c15), count(distinct c16), count(distinct 
c17), count(distinct c18), count(distinct c19), count(distinct c20), 
count(distinct c21), count(distinct c22), count(distinct c23), count(distinct 
c24), count(distinct c25), count(distinct c26), count(distinct c27), 
count(distinct c28), count(distinct c29), count(distinct c30), count(distinct 
c31), count(distinct c32) from test_count;
{code}
 This bug is caused by HiveExpandDistinctAggregatesRule.getGroupingIdValue() 
which uses int type. When there are more than 32 groupings the values overflow.



[jira] [Assigned] (HIVE-25498) Query with more than 32 count distinct functions returns wrong result

2021-09-03 Thread Robbie Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Zhang reassigned HIVE-25498:
---

Assignee: Robbie Zhang

> Query with more than 32 count distinct functions returns wrong result
> -
>
> Key: HIVE-25498
> URL: https://issues.apache.org/jira/browse/HIVE-25498
> Project: Hive
>  Issue Type: Bug
>Reporter: Robbie Zhang
>Assignee: Robbie Zhang
>Priority: Major
>
> If there are more than 32 "COUNT(DISTINCT COL)" functions in a query, all 
> these COUNT functions in this query return 0 instead of the proper values.
> Here are the queries to reproduce this issue:
> {code:java}
> set hive.cbo.enable=true;
> create table test_count (c0 string, c1 string, c2 string, c3 string, c4 
> string, c5 string, c6 string, c7 string, c8 string, c9 string, c10 string, 
> c11 string, c12 string, c13 string, c14 string, c15 string, c16 string, c17 
> string, c18 string, c19 string, c20 string, c21 string, c22 string, c23 
> string, c24 string, c25 string, c26 string, c27 string, c28 string, c29 
> string, c30 string, c31 string, c32 string);
> INSERT INTO test_count values ('c0', 'c1', 'c2', 'c3', 'c4', 'c5', 'c6', 
> 'c7', 'c8', 'c9', 'c10', 'c11', 'c12', 'c13', 'c14', 'c15', 'c16', 'c17', 
> 'c18', 'c19', 'c20', 'c21', 'c22', 'c23', 'c24', 'c25', 'c26', 'c27', 'c28', 
> 'c29', 'c30', 'c31', 'c32'); 
> select count (distinct c0), count(distinct c1), count(distinct c2), 
> count(distinct c3), count(distinct c4), count(distinct c5), count(distinct 
> c6), count(distinct c7), count(distinct c8), count(distinct c9), 
> count(distinct c10), count(distinct c11), count(distinct c12), count(distinct 
> c13), count(distinct c14), count(distinct c15), count(distinct c16), 
> count(distinct c17), count(distinct c18), count(distinct c19), count(distinct 
> c20), count(distinct c21), count(distinct c22), count(distinct c23), 
> count(distinct c24), count(distinct c25), count(distinct c26), count(distinct 
> c27), count(distinct c28), count(distinct c29), count(distinct c30), 
> count(distinct c31), count(distinct c32) from test_count;
> {code}
>  This bug is caused by HiveExpandDistinctAggregatesRule.getGroupingIdValue() 
> which uses int type. When there are more than 32 groupings the values 
> overflow.





[jira] [Assigned] (HIVE-25331) Create database query doesn't create MANAGEDLOCATION directory

2021-07-17 Thread Robbie Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Zhang reassigned HIVE-25331:
---

Assignee: Robbie Zhang

> Create database query doesn't create MANAGEDLOCATION directory
> --
>
> Key: HIVE-25331
> URL: https://issues.apache.org/jira/browse/HIVE-25331
> Project: Hive
>  Issue Type: Bug
>Reporter: Robbie Zhang
>Assignee: Robbie Zhang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If we don't specify MANAGEDLOCATION in a "create database" query, 
> MANAGEDLOCATION is NULL, so HMS doesn't create the directory. In this 
> case, a CTAS query immediately after the CREATE DATABASE query might fail in 
> the MOVE task with "destination's parent does not exist". The following 
> script reproduces this issue:
> {code:java}
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> create database testdb location '/tmp/testdb.db';
> create table testdb.test as select 1;
> {code}
> If the staging directory is under the MANAGEDLOCATION directory, the CTAS 
> query is fine as the MANAGEDLOCATION directory is created while creating the 
> staging directory. Since we set LOCATION to a default directory when LOCATION 
> is not assigned in the CREATE DATABASE query, I believe it is worth setting 
> MANAGEDLOCATION to a default directory, too.
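
The failure mode can be sketched with a local-filesystem analogy. This is not HDFS or Hive code; the directory names and the class are hypothetical. The sketch only shows that moving a file into a directory that was never created fails, and that creating the directory first lets the same move succeed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class MoveIntoMissingParentDemo {
    // Tries to move src to dst; returns false if the move fails with an I/O error.
    static boolean tryMove(Path src, Path dst) {
        try {
            Files.move(src, dst);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Runs the scenario and returns {moveWithoutParent, moveWithParent}.
    static boolean[] run() {
        try {
            Path root = Files.createTempDirectory("warehouse");
            Path staged = Files.createFile(root.resolve("staged-output"));
            // Hypothetical stand-in for a managed location that was never created.
            Path managedDir = root.resolve("managed").resolve("testdb.db");
            Path target = managedDir.resolve("test");

            boolean withoutParent = tryMove(staged, target);
            Files.createDirectories(managedDir); // what a default location would provide
            boolean withParent = tryMove(staged, target);
            return new boolean[] {withoutParent, withParent};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = run();
        System.out.println("move without parent directory: " + result[0]);
        System.out.println("move with parent directory:    " + result[1]);
    }
}
```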





[jira] [Assigned] (HIVE-25329) CTAS creates a managed table as non-ACID table

2021-07-17 Thread Robbie Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Zhang reassigned HIVE-25329:
---

Assignee: Robbie Zhang

> CTAS creates a managed table as non-ACID table
> --
>
> Key: HIVE-25329
> URL: https://issues.apache.org/jira/browse/HIVE-25329
> Project: Hive
>  Issue Type: Bug
>Reporter: Robbie Zhang
>Assignee: Robbie Zhang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> According to HIVE-22158, MANAGED tables should be ACID tables only. When we 
> set hive.create.as.external.legacy to true, a query like 'create managed 
> table as select 1' creates a non-ACID table.





[jira] [Resolved] (HIVE-23489) dynamic partition failed use insert overwrite

2021-06-05 Thread Robbie Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Zhang resolved HIVE-23489.
-
Resolution: Resolved

> dynamic partition failed use insert overwrite
> -
>
> Key: HIVE-23489
> URL: https://issues.apache.org/jira/browse/HIVE-23489
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.1.1
>Reporter: shining
>Assignee: Robbie Zhang
>Priority: Major
>
> SQL: insert overwrite table dzdz_fpxx_dzfp partition(nian) select * from 
> test.dzdz_fpxx_dzfp
> {noformat}
> create table dzdz_fpxx_dzfp  (
>   FPDM string,
>   FPHM string,
>   KPRQ timestamp,
>   FPLY string) 
>   partitioned by (nian string) 
>   stored as parquet;
>   
>   
>   create table test.dzdz_fpxx_dzfp (
>   FPDM string,
>   FPHM string,
>   KPRQ timestamp,
>   FPLY string,
>   nian string) 
>   stored as textfile
>   location "/origin/data/dzfp_origin/"
> {noformat}
> Executing the insert SQL:
> SQL: insert overwrite table dzdz_fpxx_dzfp partition(nian) select * from 
> test.dzdz_fpxx_dzfp
> {noformat}
> INFO  : Compiling 
> command(queryId=hive_20200519100412_12f2b39c-45f4-4f4c-9261-32ca86fa28db): 
> insert overwrite table dzdz_fpxx_dzfp partition(nian) select * from 
> test.dzdz_fpxx_dzfp
> INFO  : Semantic Analysis Completed
> INFO  : Returning Hive schema: 
> Schema(fieldSchemas:[FieldSchema(name:dzdz_fpxx_dzfp.fpdm, type:string, 
> comment:null), FieldSchema(name:dzdz_fpxx_dzfp.fphm, type:string, 
> comment:null), FieldSchema(name:dzdz_fpxx_dzfp.kprq, type:timestamp, 
> comment:null), FieldSchema(name:dzdz_fpxx_dzfp.tspz_dm, type:string, 
> comment:null), FieldSchema(name:dzdz_fpxx_dzfp.fpzt_bz, type:string, 
> comment:null), FieldSchema(name:dzdz_fpxx_dzfp.fply, type:string, 
> comment:null), FieldSchema(name:dzdz_fpxx_dzfp.nian, type:string, 
> comment:null)], properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20200519100412_12f2b39c-45f4-4f4c-9261-32ca86fa28db); 
> Time taken: 1.719 seconds
> INFO  : Executing 
> command(queryId=hive_20200519100412_12f2b39c-45f4-4f4c-9261-32ca86fa28db): 
> insert overwrite table dzdz_fpxx_dzfp partition(nian) select * from 
> test.dzdz_fpxx_dzfp
> WARN  :
> INFO  : Query ID = hive_20200519100412_12f2b39c-45f4-4f4c-9261-32ca86fa28db
> INFO  : Total jobs = 3
> INFO  : Launching Job 1 out of 3
> INFO  : Starting task [Stage-1:MAPRED] in serial mode
> INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
> INFO  : number of splits:1
> INFO  : Submitting tokens for job: job_1589451904439_0049
> INFO  : Executing with tokens: []
> INFO  : The url to track the job: 
> http://qcj37.hde.com:8088/proxy/application_1589451904439_0049/
> INFO  : Starting Job = job_1589451904439_0049, Tracking URL = 
> http://qcj37.hde.com:8088/proxy/application_1589451904439_0049/
> INFO  : Kill Command = /usr/hdp/current/hadoop-client/bin/hadoop job  -kill 
> job_1589451904439_0049
> INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of 
> reducers: 0
> INFO  : 2020-05-19 10:06:56,823 Stage-1 map = 0%,  reduce = 0%
> INFO  : 2020-05-19 10:07:06,317 Stage-1 map = 100%,  reduce = 0%, Cumulative 
> CPU 4.3 sec
> INFO  : MapReduce Total cumulative CPU time: 4 seconds 300 msec
> INFO  : Ended Job = job_1589451904439_0049
> INFO  : Starting task [Stage-7:CONDITIONAL] in serial mode
> INFO  : Stage-4 is selected by condition resolver.
> INFO  : Stage-3 is filtered out by condition resolver.
> INFO  : Stage-5 is filtered out by condition resolver.
> INFO  : Starting task [Stage-4:MOVE] in serial mode
> INFO  : Moving data to directory 
> hdfs://mycluster/warehouse/tablespace/managed/hive/dzdz_fpxx_dzfp/.hive-staging_hive_2020-05-19_10-04-12_468_7695595367555279265-3/-ext-1
>  from 
> hdfs://mycluster/warehouse/tablespace/managed/hive/dzdz_fpxx_dzfp/.hive-staging_hive_2020-05-19_10-04-12_468_7695595367555279265-3/-ext-10002
> INFO  : Starting task [Stage-0:MOVE] in serial mode
> INFO  : Loading data to table default.dzdz_fpxx_dzfp partition (nian=null) 
> from 
> hdfs://mycluster/warehouse/tablespace/managed/hive/dzdz_fpxx_dzfp/.hive-staging_hive_2020-05-19_10-04-12_468_7695595367555279265-3/-ext-1
> INFO  :
> ERROR : FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.MoveTask. Exception when loading 2 in table 
> dzdz_fpxx_dzfp with 
> loadPath=hdfs://mycluster/warehouse/tablespace/managed/hive/dzdz_fpxx_dzfp/.hive-staging_hive_2020-05-19_10-04-12_468_7695595367555279265-3/-ext-1
> INFO  : MapReduce Jobs Launched:
> INFO  : Stage-Stage-1: Map: 1   Cumulative CPU: 4.3 sec   HDFS Read: 8015 
> HDFS Write: 2511 SUCCESS
> 

[jira] [Commented] (HIVE-23489) dynamic partition failed use insert overwrite

2021-06-05 Thread Robbie Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17357764#comment-17357764
 ] 

Robbie Zhang commented on HIVE-23489:
-

I can reproduce this issue on the same version by setting 
hive.metastore.dml.events to true. An INSERT OVERWRITE into a new partition 
should not capture the new files as part of the insert event. Otherwise, HMS 
reports this error because the new partition doesn't exist yet. This bug is 
fixed by HIVE-15642.

 


[jira] [Assigned] (HIVE-23489) dynamic partition failed use insert overwrite

2021-06-04 Thread Robbie Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Zhang reassigned HIVE-23489:
---

Assignee: Robbie Zhang


[jira] [Commented] (HIVE-25158) Beeline/hive command can't get operation logs when hive.session.id is set

2021-05-24 Thread Robbie Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17350708#comment-17350708
 ] 

Robbie Zhang commented on HIVE-25158:
-

HiveSessionImpl.setOperationLogSessionDir() uses the ID of sessionHandle to 
work out the session log directory:
https://github.com/apache/hive/blob/c10aa5370caf9b72a91c53b18dc5f8cb5c9fa6d6/service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java#L330

The sessionHandle is created before HS2 applies the session ID provided by the 
client. If the client doesn't provide a session ID, the session ID is the same 
as the ID of the sessionHandle. But if the client provides a session ID, the ID 
of the sessionHandle and the session ID differ. In this case, the operation 
logs are stored in // but HS2 looks for 
operation logs in /. When queries 
finish, HS2 can't clean up the operation logs either.
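
The mismatch can be sketched as follows. This is not HiveSessionImpl code; the class, the method names, and the log root path are hypothetical. It only shows that a directory keyed by the sessionHandle ID and a lookup keyed by the session ID agree when the client supplies no session ID and diverge when it does.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class OperationLogDirDemo {
    // Hypothetical root directory for operation logs.
    static final Path LOG_ROOT = Paths.get("/tmp/operation_logs");

    // Directory the logs are written to: keyed by the sessionHandle ID.
    static Path writeDir(String handleId) {
        return LOG_ROOT.resolve(handleId);
    }

    // Directory the logs are looked up in: keyed by the effective session ID,
    // which falls back to the handle ID when the client supplies none.
    static Path readDir(String handleId, String clientSessionId) {
        String sessionId = (clientSessionId == null) ? handleId : clientSessionId;
        return LOG_ROOT.resolve(sessionId);
    }

    // True when the lookup directory is the one the logs were written to.
    static boolean logsFound(String handleId, String clientSessionId) {
        return writeDir(handleId).equals(readDir(handleId, clientSessionId));
    }

    public static void main(String[] args) {
        System.out.println(logsFound("handle-1234", null));   // true
        System.out.println(logsFound("handle-1234", "abcd")); // false
    }
}
```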


> Beeline/hive command can't get operation logs when hive.session.id is set
> -
>
> Key: HIVE-25158
> URL: https://issues.apache.org/jira/browse/HIVE-25158
> Project: Hive
>  Issue Type: Bug
>Reporter: Robbie Zhang
>Assignee: Robbie Zhang
>Priority: Major
>
> Usually, we can see the operation logs when we run a query from beeline/hive. 
> For example, the query ID, the time taken in compiling/executing, the 
> application information, etc. But if we use "--hiveconf hive.session.id=" 
> to set the session ID, we can't see the operation logs any more. Here are 
> examples:
>  * Without hive.session.id
> {code:java}
> $ hive -e "select 1"
> SLF4J: Class path contains multiple SLF4J bindings.
> ...
> Connected to: Apache Hive (version 3.1.3000.7.1.6.0-297)
> Driver: Hive JDBC (version 3.1.3000.7.1.6.0-297)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> INFO  : Compiling 
> command(queryId=hive_20210524105207_9d0774b2-8108-4800-a5e4-3b950ae03198): 
> select 1
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, 
> type:int, comment:null)], properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20210524105207_9d0774b2-8108-4800-a5e4-3b950ae03198); 
> Time taken: 0.122 seconds
> INFO  : Executing 
> command(queryId=hive_20210524105207_9d0774b2-8108-4800-a5e4-3b950ae03198): 
> select 1
> INFO  : Completed executing 
> command(queryId=hive_20210524105207_9d0774b2-8108-4800-a5e4-3b950ae03198); 
> Time taken: 0.016 seconds
> INFO  : OK
> +--+
> | _c0  |
> +--+
> | 1    |
> +--+
> 1 row selected (0.318 seconds)
> Beeline version 3.1.3000.7.1.6.0-297 by Apache Hive
> {code}
>  * With hive.session.id
> {code:java}
> $ hive --hiveconf hive.session.id=abcd -e "select 1"
> SLF4J: Class path contains multiple SLF4J bindings.
> ...
> Connected to: Apache Hive (version 3.1.3000.7.1.6.0-297)
> Driver: Hive JDBC (version 3.1.3000.7.1.6.0-297)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> +--+
> | _c0  |
> +--+
> | 1|
> +--+
> 1 row selected (5.862 seconds)
> Beeline version 3.1.3000.7.1.6.0-297 by Apache Hive
> {code}





[jira] [Assigned] (HIVE-25158) Beeline/hive command can't get operation logs when hive.session.id is set

2021-05-24 Thread Robbie Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Zhang reassigned HIVE-25158:
---

Assignee: Robbie Zhang

> Beeline/hive command can't get operation logs when hive.session.id is set
> -
>
> Key: HIVE-25158
> URL: https://issues.apache.org/jira/browse/HIVE-25158
> Project: Hive
>  Issue Type: Bug
>Reporter: Robbie Zhang
>Assignee: Robbie Zhang
>Priority: Major
>
> Usually, we can see the operation logs when we run a query from beeline/hive, 
> for example the query ID, the time taken for compiling/executing, the 
> application information, etc. But if we use "--hiveconf hive.session.id=" 
> to set the session ID, we can't see the operation logs any more. Here are 
> examples:
>  * Without hive.session.id
> {code:java}
> $ hive -e "select 1"
> SLF4J: Class path contains multiple SLF4J bindings.
> ...
> Connected to: Apache Hive (version 3.1.3000.7.1.6.0-297)
> Driver: Hive JDBC (version 3.1.3000.7.1.6.0-297)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> INFO  : Compiling 
> command(queryId=hive_20210524105207_9d0774b2-8108-4800-a5e4-3b950ae03198): 
> select 1
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, 
> type:int, comment:null)], properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20210524105207_9d0774b2-8108-4800-a5e4-3b950ae03198); 
> Time taken: 0.122 seconds
> INFO  : Executing 
> command(queryId=hive_20210524105207_9d0774b2-8108-4800-a5e4-3b950ae03198): 
> select 1
> INFO  : Completed executing 
> command(queryId=hive_20210524105207_9d0774b2-8108-4800-a5e4-3b950ae03198); 
> Time taken: 0.016 seconds
> INFO  : OK
> +--+
> | _c0  |
> +--+
> | 1    |
> +--+
> 1 row selected (0.318 seconds)
> Beeline version 3.1.3000.7.1.6.0-297 by Apache Hive
> {code}
>  * With hive.session.id
> {code:java}
> $ hive --hiveconf hive.session.id=abcd -e "select 1"
> SLF4J: Class path contains multiple SLF4J bindings.
> ...
> Connected to: Apache Hive (version 3.1.3000.7.1.6.0-297)
> Driver: Hive JDBC (version 3.1.3000.7.1.6.0-297)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> +--+
> | _c0  |
> +--+
> | 1    |
> +--+
> 1 row selected (5.862 seconds)
> Beeline version 3.1.3000.7.1.6.0-297 by Apache Hive
> {code}





[jira] [Assigned] (HIVE-25095) Beeline/hive -e command can't deal with query with trailing quote

2021-05-06 Thread Robbie Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Zhang reassigned HIVE-25095:
---

Assignee: Robbie Zhang

> Beeline/hive -e command can't deal with query with trailing quote
> -
>
> Key: HIVE-25095
> URL: https://issues.apache.org/jira/browse/HIVE-25095
> Project: Hive
>  Issue Type: Bug
>Reporter: Robbie Zhang
>Assignee: Robbie Zhang
>Priority: Major
>
> The command 
> {code:java}
> hive -e 'select "hive"'{code}
> and
> {code:java}
> beeline -e 'select "hive"'{code}
> fail with the following error:
> {code:java}
> Error: Error while compiling statement: FAILED: ParseException line 1:12 
> character '' not supported here (state=42000,code=4){code}
> The reason is that org.apache.commons.cli.Util.stripLeadingAndTrailingQuotes 
> in commons-cli-1.2.jar strips the trailing quote so the query string is 
> changed to
> {code:java}
> select "hive{code}
> This bug is fixed in commons-cli-1.3.1 and commons-cli-1.4.jar. The 
> workaround is to overwrite commons-cli-1.2.jar with commons-cli-1.3.1 or 
> commons-cli-1.4.jar.
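
As an illustration of the behaviour described above, the sketch below contrasts stripping each end of the value independently with stripping only when the whole value is wrapped in quotes. It is not the commons-cli source: the class and method names (QuoteStripDemo, stripIndependently, stripOnlyIfWrapped) are made up for this example, and the second method is only one possible stricter rule, not necessarily what the later commons-cli versions do.

```java
public class QuoteStripDemo {
    // Approximates the behaviour described in the report: the leading and the
    // trailing quote are each removed on their own, even if only one is present.
    static String stripIndependently(String value) {
        if (value.startsWith("\"")) {
            value = value.substring(1);
        }
        if (value.endsWith("\"")) {
            value = value.substring(0, value.length() - 1);
        }
        return value;
    }

    // Hypothetical stricter rule: strip only when the value both starts and
    // ends with a quote, so a lone trailing quote is left alone.
    static String stripOnlyIfWrapped(String value) {
        if (value.length() > 1 && value.startsWith("\"") && value.endsWith("\"")) {
            return value.substring(1, value.length() - 1);
        }
        return value;
    }

    public static void main(String[] args) {
        String query = "select \"hive\"";
        System.out.println(stripIndependently(query)); // select "hive
        System.out.println(stripOnlyIfWrapped(query)); // select "hive"
    }
}
```

With the first rule the query loses its closing quote, which matches the broken statement shown in the description.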





[jira] [Commented] (HIVE-24839) SubStrStatEstimator.estimate throws NullPointerException

2021-03-02 Thread Robbie Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17293681#comment-17293681
 ] 

Robbie Zhang commented on HIVE-24839:
-

This bug can be worked around by setting hive.stats.estimators.enable to false.

> SubStrStatEstimator.estimate throws NullPointerException
> 
>
> Key: HIVE-24839
> URL: https://issues.apache.org/jira/browse/HIVE-24839
> Project: Hive
>  Issue Type: Bug
>Reporter: Robbie Zhang
>Assignee: Robbie Zhang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This issue can be reproduced by running the following queries:
> {code:java}
> create table t0 (s string);
> create table t1 (s string, i int);
> insert into t0 select "abc";
> insert into t1 select "abc", 4;
> select substr(t0.s, t1.i-1) from t0 join t1 on t0.s=t1.s;
> {code}
> The select query fails with error:
> {code:java}
> Error: Error while compiling statement: FAILED: NullPointerException null 
> (state=42000,code=4)
> {code}





[jira] [Commented] (HIVE-24839) SubStrStatEstimator.estimate throws NullPointerException

2021-03-02 Thread Robbie Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17293674#comment-17293674
 ] 

Robbie Zhang commented on HIVE-24839:
-

The following stack trace appears in the HS2 log file:
{code:java}
java.lang.NullPointerException
        at org.apache.hadoop.hive.ql.udf.UDFSubstr$SubStrStatEstimator.getRangeWidth(UDFSubstr.java:177)
        at org.apache.hadoop.hive.ql.udf.UDFSubstr$SubStrStatEstimator.estimate(UDFSubstr.java:156)
        at org.apache.hadoop.hive.ql.stats.StatsUtils.getColStatisticsFromExpression(StatsUtils.java:1576)
        at org.apache.hadoop.hive.ql.stats.StatsUtils.getColStatisticsFromExprMap(StatsUtils.java:1435)
        at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$SelectStatsRule.process(StatsRulesProcFactory.java:197)
        at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
        at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
        at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
        at org.apache.hadoop.hive.ql.lib.LevelOrderWalker.walk(LevelOrderWalker.java:143)
        at org.apache.hadoop.hive.ql.lib.LevelOrderWalker.startWalking(LevelOrderWalker.java:122)
        at org.apache.hadoop.hive.ql.optimizer.stats.annotation.AnnotateWithStatistics.transform(AnnotateWithStatistics.java:78)
        at org.apache.hadoop.hive.ql.parse.TezCompiler.runStatsAnnotation(TezCompiler.java:447)
        at org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:185)
        at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:158)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12823)
        at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:422)
        at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:288)
        at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:221)
        at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:188)
        at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:598)
        at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:544)
        at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:538)
        at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:127)
        at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:199)
        at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:260)
        at org.apache.hive.service.cli.operation.Operation.run(Operation.java:274)
        at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:565)
        at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:551)
        at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:315)
        at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:567)
        at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1557)
        at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1542)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
        at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
{code}
The expression "substr(t0.s, t1.i-1)" has a nested function. The second 
parameter of substr is actually GenericUDFOPMinus. The ColStatistics on it 
doesn't have a valid range, but getRangeWidth doesn't check whether the range 
itself is null before dereferencing it:
{code:java}
    private Optional getRangeWidth(Range range) {
      if (range.minValue != null && range.maxValue != null) {
        return Optional.of(range.maxValue.doubleValue() - range.minValue.doubleValue());
      }
      return Optional.empty();
    }
{code}
Only 4 UDF classes implement StatEstimatorProvider and only UDFSubstr has this 
bug.
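
A null-safe variant of the check could look like the sketch below. This is only an illustration, not the committed patch: the Range class here is a hypothetical stand-in with the two fields used in the snippet above, and the Optional<Double> return type is an assumption.

```java
import java.util.Optional;

public class RangeWidthDemo {
    // Hypothetical stand-in for the range object carried by the column statistics.
    static final class Range {
        final Number minValue;
        final Number maxValue;

        Range(Number minValue, Number maxValue) {
            this.minValue = minValue;
            this.maxValue = maxValue;
        }
    }

    // Returns the width only when the range and both of its bounds are present.
    static Optional<Double> getRangeWidth(Range range) {
        if (range != null && range.minValue != null && range.maxValue != null) {
            return Optional.of(range.maxValue.doubleValue() - range.minValue.doubleValue());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(getRangeWidth(null));            // Optional.empty
        System.out.println(getRangeWidth(new Range(1, 4))); // Optional[3.0]
    }
}
```

With the extra null check, an expression without range statistics produces an empty result instead of a NullPointerException.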

> SubStrStatEstimator.estimate throws NullPointerException
> 
>
> Key: HIVE-24839
> URL: https://issues.apache.org/jira/browse/HIVE-24839
>  

[jira] [Assigned] (HIVE-24839) SubStrStatEstimator.estimate throws NullPointerException

2021-03-02 Thread Robbie Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Zhang reassigned HIVE-24839:
---


> SubStrStatEstimator.estimate throws NullPointerException
> 
>
> Key: HIVE-24839
> URL: https://issues.apache.org/jira/browse/HIVE-24839
> Project: Hive
>  Issue Type: Bug
>Reporter: Robbie Zhang
>Assignee: Robbie Zhang
>Priority: Major
>
> This issue can be reproduced by running the following queries:
> {code:java}
> create table t0 (s string);
> create table t1 (s string, i int);
> insert into t0 select "abc";
> insert into t1 select "abc", 4;
> select substr(t0.s, t1.i-1) from t0 join t1 on t0.s=t1.s;
> {code}
> The select query fails with error:
> {code:java}
> Error: Error while compiling statement: FAILED: NullPointerException null 
> (state=42000,code=4)
> {code}





[jira] [Assigned] (HIVE-15820) comment at the head of beeline -e

2021-01-05 Thread Robbie Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-15820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Zhang reassigned HIVE-15820:
---

Assignee: Robbie Zhang  (was: muxin)

> comment at the head of beeline -e
> -
>
> Key: HIVE-15820
> URL: https://issues.apache.org/jira/browse/HIVE-15820
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 1.2.1, 2.1.1
>Reporter: muxin
>Assignee: Robbie Zhang
>Priority: Major
>  Labels: patch, pull-request-available
> Attachments: HIVE-15820.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> $ beeline -u jdbc:hive2://localhost:1 -n test -e "
> > --asdfasdfasdfasdf
> > select * from test_table;
> > "
> The expected result of the above command is all rows of test_table (the same 
> as when run in beeline interactive mode), but it does not output anything.
> The cause is that the -e option reads the commands as one string, and the 
> method dispatch(String line) first calls isComment(String line), which uses
>  'lineTrimmed.startsWith("#") || lineTrimmed.startsWith("--")' 
> to decide whether the input is a comment, so the whole string is treated as 
> a comment.
> There are two possible ways to fix this problem:
> 1. In method initArgs(String[] args), split the command by '\n' into a 
> command list before dispatch when cl.getOptionValues('e') != null.
> 2. In method dispatch(String line), remove comments using this:
> {code:java}
> static String removeComments(String line) {
>   if (line == null || line.isEmpty()) {
>     return line;
>   }
>   StringBuilder builder = new StringBuilder();
>   int escape = -1;
>   for (int index = 0; index < line.length(); index++) {
>     if (index < line.length() - 1 && line.charAt(index) == line.charAt(index + 1)) {
>       if (escape == -1 && line.charAt(index) == '-') {
>         // Find \n as the end of the comment.
>         index = line.indexOf('\n', index + 1);
>         // There is no SQL after this comment, so just break out.
>         if (-1 == index) {
>           break;
>         }
>       }
>     }
>     char letter = line.charAt(index);
>     if (letter == escape) {
>       escape = -1; // Turn escape off.
>     } else if (escape == -1 && (letter == '\'' || letter == '"')) {
>       escape = letter; // Turn escape on.
>     }
>     builder.append(letter);
>   }
>   return builder.toString();
> }
> {code}
> The second way can be a general solution to remove all comments that start 
> with '--' in a SQL statement.





[jira] [Comment Edited] (HIVE-15820) comment at the head of beeline -e

2020-12-24 Thread Robbie Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-15820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254363#comment-17254363
 ] 

Robbie Zhang edited comment on HIVE-15820 at 12/25/20, 1:38 AM:


HIVE-16935 introduces HiveStringUtils.removeComments() into Commands.java but 
this issue still exists. There are two problems:
 # Beeline.isComment() checks a single line, but the option '-e' passes 
multiple lines as a single line to Beeline.dispatch(). If the first line starts 
with '--' or '#', the remaining lines are treated as comments, so they are 
never passed to Commands.execute().
 # HiveStringUtils.removeComments(String, int[]) is meant for a single line. It 
checks whether the line starts with '--' or '#'. If it does, an empty string is 
returned immediately. For multiple lines, we should use 
HiveStringUtils.removeComments(String) instead.
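
To make the first problem concrete, here is a small standalone sketch. It does not call Beeline or HiveStringUtils: isComment below only mirrors the startsWith check quoted in the issue description, and dropCommentLines is a hypothetical helper that judges each physical line separately. It ignores trailing inline comments and '--' inside quoted strings.

```java
import java.util.ArrayList;
import java.util.List;

public class MultiLineCommentDemo {
    // Mirrors the quoted check: only the start of the whole string is inspected.
    static boolean isComment(String line) {
        String trimmed = line.trim();
        return trimmed.startsWith("#") || trimmed.startsWith("--");
    }

    // Hypothetical alternative: drop blank and comment lines one by one and
    // keep everything else.
    static String dropCommentLines(String script) {
        List<String> kept = new ArrayList<>();
        for (String line : script.split("\n")) {
            if (!line.trim().isEmpty() && !isComment(line)) {
                kept.add(line);
            }
        }
        return String.join("\n", kept);
    }

    public static void main(String[] args) {
        // The value that '-e' receives in the example from the description.
        String script = "\n--asdfasdfasdfasdf\nselect * from test_table;\n";
        System.out.println(isComment(script));        // true
        System.out.println(dropCommentLines(script)); // select * from test_table;
    }
}
```

The whole '-e' value is classified as a comment because its first non-blank line is one, while the per-line pass keeps the select statement.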

I'll provide a patch later.


was (Author: robbie):
HIVE-16935 introduces HiveStringUtils.removeComments() into Commands.java but 
this issue still exists. There are two problems:
 # Beeline.isComment() doesn't work properly on multiple lines. If the first 
line starts with '--' or '#', the rest lines are considered as comments so they 
won't be passed to Commands.execute() at all.
 # HiveStringUtils.removeComments(String, int[]) is used for a single line. It 
checks if this line starts with '--' or '#'. If it is, an empty string is 
returned immediately. In fact, for multiple lines, we should use 
HiveStringUtils.removeComments(String).

I'll provide a patch later.

> comment at the head of beeline -e
> -
>
> Key: HIVE-15820
> URL: https://issues.apache.org/jira/browse/HIVE-15820
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 1.2.1, 2.1.1
>Reporter: muxin
>Assignee: muxin
>Priority: Major
>  Labels: patch
> Attachments: HIVE-15820.patch
>
>
> $ beeline -u jdbc:hive2://localhost:1 -n test -e "
> > --asdfasdfasdfasdf
> > select * from test_table;
> > "
> The expected result of the above command is all rows of test_table (the same 
> as when run in beeline interactive mode), but it does not output anything.
> The cause is that the -e option reads the commands as one string, and the 
> method dispatch(String line) first calls isComment(String line), which uses
>  'lineTrimmed.startsWith("#") || lineTrimmed.startsWith("--")' 
> to decide whether the input is a comment, so the whole string is treated as 
> a comment.
> There are two possible ways to fix this problem:
> 1. In method initArgs(String[] args), split the command by '\n' into a 
> command list before dispatch when cl.getOptionValues('e') != null.
> 2. In method dispatch(String line), remove comments using this:
> {code:java}
> static String removeComments(String line) {
>   if (line == null || line.isEmpty()) {
>     return line;
>   }
>   StringBuilder builder = new StringBuilder();
>   int escape = -1;
>   for (int index = 0; index < line.length(); index++) {
>     if (index < line.length() - 1 && line.charAt(index) == line.charAt(index + 1)) {
>       if (escape == -1 && line.charAt(index) == '-') {
>         // Find \n as the end of the comment.
>         index = line.indexOf('\n', index + 1);
>         // There is no SQL after this comment, so just break out.
>         if (-1 == index) {
>           break;
>         }
>       }
>     }
>     char letter = line.charAt(index);
>     if (letter == escape) {
>       escape = -1; // Turn escape off.
>     } else if (escape == -1 && (letter == '\'' || letter == '"')) {
>       escape = letter; // Turn escape on.
>     }
>     builder.append(letter);
>   }
>   return builder.toString();
> }
> {code}
> The second way can be a general solution to remove all comments that start 
> with '--' in a SQL statement.





[jira] [Commented] (HIVE-15820) comment at the head of beeline -e

2020-12-23 Thread Robbie Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-15820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254363#comment-17254363
 ] 

Robbie Zhang commented on HIVE-15820:
-

HIVE-16935 introduces HiveStringUtils.removeComments() into Commands.java but 
this issue still exists. There are two problems:
 # Beeline.isComment() doesn't work properly on multiple lines. If the first 
line starts with '--' or '#', the remaining lines are treated as comments, so 
they are never passed to Commands.execute().
 # HiveStringUtils.removeComments(String, int[]) is meant for a single line. It 
checks whether the line starts with '--' or '#'. If it does, an empty string is 
returned immediately. For multiple lines, we should use 
HiveStringUtils.removeComments(String) instead.

I'll provide a patch later.

> comment at the head of beeline -e
> -
>
> Key: HIVE-15820
> URL: https://issues.apache.org/jira/browse/HIVE-15820
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 1.2.1, 2.1.1
>Reporter: muxin
>Assignee: muxin
>Priority: Major
>  Labels: patch
> Attachments: HIVE-15820.patch
>
>
> $ beeline -u jdbc:hive2://localhost:1 -n test -e "
> > --asdfasdfasdfasdf
> > select * from test_table;
> > "
> The expected result of the above command is all rows of test_table (the same 
> as when run in beeline interactive mode), but it does not output anything.
> The cause is that the -e option reads the commands as one string, and the 
> method dispatch(String line) first calls isComment(String line), which uses
>  'lineTrimmed.startsWith("#") || lineTrimmed.startsWith("--")' 
> to decide whether the input is a comment, so the whole string is treated as 
> a comment.
> There are two possible ways to fix this problem:
> 1. In method initArgs(String[] args), split the command by '\n' into a 
> command list before dispatch when cl.getOptionValues('e') != null.
> 2. In method dispatch(String line), remove comments using this:
> {code:java}
> static String removeComments(String line) {
>   if (line == null || line.isEmpty()) {
>     return line;
>   }
>   StringBuilder builder = new StringBuilder();
>   int escape = -1;
>   for (int index = 0; index < line.length(); index++) {
>     if (index < line.length() - 1 && line.charAt(index) == line.charAt(index + 1)) {
>       if (escape == -1 && line.charAt(index) == '-') {
>         // Find \n as the end of the comment.
>         index = line.indexOf('\n', index + 1);
>         // There is no SQL after this comment, so just break out.
>         if (-1 == index) {
>           break;
>         }
>       }
>     }
>     char letter = line.charAt(index);
>     if (letter == escape) {
>       escape = -1; // Turn escape off.
>     } else if (escape == -1 && (letter == '\'' || letter == '"')) {
>       escape = letter; // Turn escape on.
>     }
>     builder.append(letter);
>   }
>   return builder.toString();
> }
> {code}
> The second way can be a general solution to remove all comments that start 
> with '--' in a SQL statement.





[jira] [Assigned] (HIVE-24161) Support Oracle CLOB type in beeline

2020-09-14 Thread Robbie Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Zhang reassigned HIVE-24161:
---

Assignee: Robbie Zhang

> Support Oracle CLOB type in beeline
> ---
>
> Key: HIVE-24161
> URL: https://issues.apache.org/jira/browse/HIVE-24161
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline
>Reporter: Robbie Zhang
>Assignee: Robbie Zhang
>Priority: Major
>
> We can use beeline as a JDBC client to access an RDBMS such as Oracle. 
> Sometimes the Oracle JDBC driver returns a CLOB object instead of a String 
> object if the string is too long. Beeline used to work well with the CLOB 
> type, but this was broken by HIVE-14786:
> [https://github.com/apache/hive/blob/2a760dd607e206d7f1061c01075767ecfff40d0c/beeline/src/java/org/apache/hive/beeline/Rows.java#L169]
> In the above line, when the Oracle JDBC driver returns a CLOB object, 
> calling toString() on it yields a string like "oracle.sql.CLOB@2f7c7260". In 
> this case, we should use ResultSet.getString() rather than 
> ResultSet.getObject().toString().


