[jira] [Work logged] (HIVE-23633) Metastore some JDO query objects do not close properly

2021-08-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23633?focusedWorklogId=644761&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-644761
 ]

ASF GitHub Bot logged work on HIVE-23633:
-

Author: ASF GitHub Bot
Created on: 01/Sep/21 04:47
Start Date: 01/Sep/21 04:47
Worklog Time Spent: 10m 
  Work Description: pvary commented on pull request #2344:
URL: https://github.com/apache/hive/pull/2344#issuecomment-909880973


   Thanks for catching this! 
   
   We should close all the queries, even if there is an exception.
   Since Java 8, closing queries in `try-with-resources` would be best where 
possible; falling back to `finally` when that is not possible is the second-best option.
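   
   For reference, a minimal sketch of the two patterns (not code from the PR), 
   assuming a JDO `Query` that implements `AutoCloseable`, as the PR's 
   `try (Query q = query)` usage implies; the query string and class name below 
   are made-up placeholders:
   
{code:java}
import java.util.Collection;
import javax.jdo.PersistenceManager;
import javax.jdo.Query;

class QueryCloseSketch {
  // Preferred: try-with-resources closes the query even if execute() throws.
  static int countWithTryWithResources(PersistenceManager pm, String tableName) {
    try (Query query = pm.newQuery("SELECT FROM org.example.MTable WHERE tableName == t1")) {
      query.declareParameters("java.lang.String t1");
      Collection<?> results = (Collection<?>) query.execute(tableName);
      return results.size(); // consume the results before the query is closed
    }
  }

  // Second best, when try-with-resources is not possible: close in finally.
  static int countWithFinally(PersistenceManager pm, String tableName) {
    Query query = pm.newQuery("SELECT FROM org.example.MTable WHERE tableName == t1");
    try {
      query.declareParameters("java.lang.String t1");
      Collection<?> results = (Collection<?>) query.execute(tableName);
      return results.size();
    } finally {
      query.closeAll(); // still released when an exception is thrown above
    }
  }
}
{code}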
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 644761)
Time Spent: 8h 40m  (was: 8.5h)

> Metastore some JDO query objects do not close properly
> --
>
> Key: HIVE-23633
> URL: https://issues.apache.org/jira/browse/HIVE-23633
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23633.01.patch
>
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
> After [HIVE-10895|https://issues.apache.org/jira/browse/HIVE-10895] was patched, 
> the metastore still shows a memory leak on DB resources: many 
> StatementImpls are left unclosed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23633) Metastore some JDO query objects do not close properly

2021-08-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23633?focusedWorklogId=644760&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-644760
 ]

ASF GitHub Bot logged work on HIVE-23633:
-

Author: ASF GitHub Bot
Created on: 01/Sep/21 04:42
Start Date: 01/Sep/21 04:42
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2344:
URL: https://github.com/apache/hive/pull/2344#discussion_r699838852



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
##
@@ -1454,12 +1455,14 @@ public ColumnStatistics getTableStats(final String 
catName, final String dbName,
   }
 };
 List list = Batchable.runBatched(batchSize, colNames, b);
+final ColumnStatistics result;
 if (list.isEmpty()) {
-  return null;
+  result = null;
+} else {
+  ColumnStatisticsDesc csd = new ColumnStatisticsDesc(true, dbName, 
tableName);
+  csd.setCatName(catName);
+  result = makeColumnStats(list, csd, 0, engine);
 }

Review comment:
   What happens if there is an exception? Should we close the query there 
too? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 644760)
Time Spent: 8.5h  (was: 8h 20m)

> Metastore some JDO query objects do not close properly
> --
>
> Key: HIVE-23633
> URL: https://issues.apache.org/jira/browse/HIVE-23633
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23633.01.patch
>
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> After [HIVE-10895|https://issues.apache.org/jira/browse/HIVE-10895] was patched, 
> the metastore still shows a memory leak on DB resources: many 
> StatementImpls are left unclosed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23633) Metastore some JDO query objects do not close properly

2021-08-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23633?focusedWorklogId=644758&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-644758
 ]

ASF GitHub Bot logged work on HIVE-23633:
-

Author: ASF GitHub Bot
Created on: 01/Sep/21 04:40
Start Date: 01/Sep/21 04:40
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2344:
URL: https://github.com/apache/hive/pull/2344#discussion_r699838158



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
##
@@ -8142,9 +8146,11 @@ private void dropPartitionAllColumnGrantsNoTxn(
   query.declareParameters("java.lang.String t1");
   mSecurityDCList = (List) query.execute(dcName);
 }
+try (Query q = query) {

Review comment:
   We might have an error during execution. Should we close the query there 
too? 
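   
   For illustration (a sketch, not the actual PR change): if the query is opened 
   inside the try-with-resources, it is closed even when declareParameters(), 
   execute() or retrieveAll() throws. The model class and filter below are 
   hypothetical placeholders; only the identifiers visible in this hunk are reused:
   
{code:java}
// Hypothetical fragment in the context of this method: MExample and "field == t1"
// are placeholders for the real model class and filter.
try (Query query = pm.newQuery(MExample.class, "field == t1")) {
  query.declareParameters("java.lang.String t1");
  List<?> mSecurityDCList = (List<?>) query.execute(dcName);
  pm.retrieveAll(mSecurityDCList);
  // ... process the retrieved objects ...
}
{code}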




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 644758)
Time Spent: 8h 20m  (was: 8h 10m)

> Metastore some JDO query objects do not close properly
> --
>
> Key: HIVE-23633
> URL: https://issues.apache.org/jira/browse/HIVE-23633
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23633.01.patch
>
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> After [HIVE-10895|https://issues.apache.org/jira/browse/HIVE-10895] was patched, 
> the metastore still shows a memory leak on DB resources: many 
> StatementImpls are left unclosed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23633) Metastore some JDO query objects do not close properly

2021-08-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23633?focusedWorklogId=644757&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-644757
 ]

ASF GitHub Bot logged work on HIVE-23633:
-

Author: ASF GitHub Bot
Created on: 01/Sep/21 04:36
Start Date: 01/Sep/21 04:36
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2344:
URL: https://github.com/apache/hive/pull/2344#discussion_r699837099



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
##
@@ -8142,9 +8146,11 @@ private void dropPartitionAllColumnGrantsNoTxn(
   query.declareParameters("java.lang.String t1");
   mSecurityDCList = (List) query.execute(dcName);
 }
+try (Query q = query) {
 pm.retrieveAll(mSecurityDCList);

Review comment:
   NIT: formatting




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 644757)
Time Spent: 8h 10m  (was: 8h)

> Metastore some JDO query objects do not close properly
> --
>
> Key: HIVE-23633
> URL: https://issues.apache.org/jira/browse/HIVE-23633
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23633.01.patch
>
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> After [HIVE-10895|https://issues.apache.org/jira/browse/HIVE-10895] was patched, 
> the metastore still shows a memory leak on DB resources: many 
> StatementImpls are left unclosed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23633) Metastore some JDO query objects do not close properly

2021-08-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23633?focusedWorklogId=644706&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-644706
 ]

ASF GitHub Bot logged work on HIVE-23633:
-

Author: ASF GitHub Bot
Created on: 01/Sep/21 01:36
Start Date: 01/Sep/21 01:36
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on pull request #2344:
URL: https://github.com/apache/hive/pull/2344#issuecomment-909790783


   Hi @pvary, @nrg4878, could you please take a look if you have a few seconds?
   Thanks!
   Zhihua Deng


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 644706)
Time Spent: 8h  (was: 7h 50m)

> Metastore some JDO query objects do not close properly
> --
>
> Key: HIVE-23633
> URL: https://issues.apache.org/jira/browse/HIVE-23633
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23633.01.patch
>
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> After [HIVE-10895|https://issues.apache.org/jira/browse/HIVE-10895] was patched, 
> the metastore still shows a memory leak on DB resources: many 
> StatementImpls are left unclosed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25482) Add option to enable connectionLeak detection for Hikari datasource

2021-08-31 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan resolved HIVE-25482.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Merged the PR. 

Thanks for the contribution [~aleksandr_pashkovskii]. 

> Add option to enable connectionLeak detection for Hikari datasource
> ---
>
> Key: HIVE-25482
> URL: https://issues.apache.org/jira/browse/HIVE-25482
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Aleksandr Pashkovskii
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> There are corner cases where we observed connection leaks to DB.
>  
> It will be good to add an option to provide a connection-leak timeout parameter 
> in HikariCPDataSourceProvider.
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/datasource/HikariCPDataSourceProvider.java#L69]
> e.g. the following makes Hikari warn about a connection leak when a 
> connection is not returned to the pool for 1 hour:
> {noformat}
> config.setLeakDetectionThreshold(3600*1000); {noformat}
>  
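For illustration (a sketch, not taken from the PR): the same threshold wired through HikariConfig; the JDBC URL below is a placeholder.

{code:java}
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import java.util.concurrent.TimeUnit;

class LeakDetectionSketch {
  static HikariDataSource build() {
    HikariConfig config = new HikariConfig();
    config.setJdbcUrl("jdbc:postgresql://localhost:5432/metastore"); // placeholder URL
    // Hikari logs a warning when a connection stays out of the pool longer than this.
    config.setLeakDetectionThreshold(TimeUnit.HOURS.toMillis(1));
    return new HikariDataSource(config);
  }
}
{code}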



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25482) Add option to enable connectionLeak detection for Hikari datasource

2021-08-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25482?focusedWorklogId=644684&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-644684
 ]

ASF GitHub Bot logged work on HIVE-25482:
-

Author: ASF GitHub Bot
Created on: 31/Aug/21 23:54
Start Date: 31/Aug/21 23:54
Worklog Time Spent: 10m 
  Work Description: rbalamohan merged pull request #2610:
URL: https://github.com/apache/hive/pull/2610


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 644684)
Time Spent: 2.5h  (was: 2h 20m)

> Add option to enable connectionLeak detection for Hikari datasource
> ---
>
> Key: HIVE-25482
> URL: https://issues.apache.org/jira/browse/HIVE-25482
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Aleksandr Pashkovskii
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> There are corner cases where we observed connection leaks to DB.
>  
> It will be good to add an option to provide a connection-leak timeout parameter 
> in HikariCPDataSourceProvider.
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/datasource/HikariCPDataSourceProvider.java#L69]
> e.g. the following makes Hikari warn about a connection leak when a 
> connection is not returned to the pool for 1 hour:
> {noformat}
> config.setLeakDetectionThreshold(3600*1000); {noformat}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23896) hiveserver2 not listening on any port, am i miss some configurations?

2021-08-31 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17407703#comment-17407703
 ] 

Chao Sun commented on HIVE-23896:
-

[~brahmareddy] I don't think there is a PR for this.

> hiveserver2 not listening on any port, am i miss some configurations?
> -
>
> Key: HIVE-23896
> URL: https://issues.apache.org/jira/browse/HIVE-23896
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.1.2
> Environment: hive: 3.1.2
> hadoop: 3.2.1, standalone, url: hdfs://namenode.hadoop.svc.cluster.local:9000
> {quote}$ $HADOOP_HOME/bin/hadoop fs -mkdir /tmp
>  $ $HADOOP_HOME/bin/hadoop fs -mkdir /user/hive/warehouse
> {quote}
> Hadoop commands work in the hiveserver node (POD).
>  
>Reporter: alanwake
>Priority: Blocker
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
>  
>  
> I tried to deploy Hive 3.1.2 on k8s; it worked on version 2.3.2.
> The metastore node and postgres node are OK, but for the hiveserver it looks 
> like I am missing some important configuration properties.
> {code:java}
>  {code}
>  
>  
>  
> {code:java}
> [root@master hive]# ./get.sh 
> NAME READY   STATUSRESTARTS   AGE   IP
>  NODE   NOMINATED NODE   READINESS GATES
> hive-7bd48747d4-5zjmh1/1 Running   0  56s   10.244.3.110  
>  node03.51.local  
> metastore-66b58f9f76-6wsxj   1/1 Running   0  56s   10.244.3.109  
>  node03.51.local  
> postgres-57794b99b7-pqxwm1/1 Running   0  56s   10.244.2.241  
>  node02.51.local  NAMETYPECLUSTER-IP  
>  EXTERNAL-IP   PORT(S)   AGE   SELECTOR
> hiveNodePort10.108.40.17 
> 10002:30626/TCP,1:31845/TCP   56s   app=hive
> metastore   ClusterIP   10.106.159.220   9083/TCP   
>56s   app=metastore
> postgresClusterIP   10.108.85.47 5432/TCP   
>56s   app=postgres
> {code}
>  
>  
> {code:java}
> [root@master hive]# kubectl logs hive-7bd48747d4-5zjmh -n=hive
> Configuring core
>  - Setting hadoop.proxyuser.hue.hosts=*
>  - Setting fs.defaultFS=hdfs://namenode.hadoop.svc.cluster.local:9000
>  - Setting hadoop.http.staticuser.user=root
>  - Setting hadoop.proxyuser.hue.groups=*
> Configuring hdfs
>  - Setting dfs.namenode.datanode.registration.ip-hostname-check=false
>  - Setting dfs.webhdfs.enabled=true
>  - Setting dfs.permissions.enabled=false
> Configuring yarn
>  - Setting yarn.timeline-service.enabled=true
>  - Setting yarn.resourcemanager.system-metrics-publisher.enabled=true
>  - Setting 
> yarn.resourcemanager.store.class=org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore
>  - Setting 
> yarn.log.server.url=http://historyserver.hadoop.svc.cluster.local:8188/applicationhistory/logs/
>  - Setting yarn.resourcemanager.fs.state-store.uri=/rmstate
>  - Setting yarn.timeline-service.generic-application-history.enabled=true
>  - Setting yarn.log-aggregation-enable=true
>  - Setting 
> yarn.resourcemanager.hostname=resourcemanager.hadoop.svc.cluster.local
>  - Setting 
> yarn.resourcemanager.resource.tracker.address=resourcemanager.hadoop.svc.cluster.local:8031
>  - Setting 
> yarn.timeline-service.hostname=historyserver.hadoop.svc.cluster.local
>  - Setting 
> yarn.resourcemanager.scheduler.address=resourcemanager.hadoop.svc.cluster.local:8030
>  - Setting 
> yarn.resourcemanager.address=resourcemanager.hadoop.svc.cluster.local:8032
>  - Setting yarn.nodemanager.remote-app-log-dir=/app-logs
>  - Setting yarn.resourcemanager.recovery.enabled=true
> Configuring httpfs
> Configuring kms
> Configuring mapred
> Configuring hive
>  - Setting datanucleus.autoCreateSchema=false
>  - Setting javax.jdo.option.ConnectionPassword=hive
>  - Setting hive.metastore.uris=thrift://metastore:9083
>  - Setting 
> javax.jdo.option.ConnectionURL=jdbc:postgresql://metastore/metastore
>  - Setting javax.jdo.option.ConnectionUserName=hive
>  - Setting javax.jdo.option.ConnectionDriverName=org.postgresql.Driver
> Configuring for multihomed network
> [1/100] check for metastore:9083...
> [1/100] metastore:9083 is not available yet
> [1/100] try in 5s once again ...
> [2/100] check for metastore:9083...
> [2/100] metastore:9083 is not available yet
> [2/100] try in 5s once again ...
> [3/100] check for metastore:9083...
> [3/100] metastore:9083 is not available yet
> [3/100] try in 5s once again ...
> [4/100] check for metastore:9083...
> [4/100] metastore:9083 is not available yet
> [4/100] try in 5s once again ...
> [5/100] metastore:9083 is available.
> mkdir: `/tmp': File exists
> 2020-07-22 07:15:33: 

[jira] [Assigned] (HIVE-25493) TBLPROPERTIES upper- vs. lower-case confusion

2021-08-31 Thread Matt McCline (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline reassigned HIVE-25493:
---


> TBLPROPERTIES upper- vs. lower-case confusion
> -
>
> Key: HIVE-25493
> URL: https://issues.apache.org/jira/browse/HIVE-25493
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Major
>
> A user was confused by the ALTER TABLE SET PROPERTIES difference between 
> 'EXTERNAL'='FALSE' (ignored; it adds the two properties EXTERNAL and FALSE) and 
> 'external'='false' (which fails with a transaction error).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2021-08-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=644403&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-644403
 ]

ASF GitHub Bot logged work on HIVE-23688:
-

Author: ASF GitHub Bot
Created on: 31/Aug/21 15:20
Start Date: 31/Aug/21 15:20
Worklog Time Spent: 10m 
  Work Description: maheshk114 commented on pull request #2479:
URL: https://github.com/apache/hive/pull/2479#issuecomment-908894870


   LGTM +1


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 644403)
Time Spent: 4h 50m  (was: 4h 40m)

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: László Bodor
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23688.patch
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true; 
> CREATE TABLE parquet_map_type (id int, stringMap map<string,string>) 
> stored as parquet; 
> insert overwrite table parquet_map_type SELECT 1, MAP('k1', null, 'k2', 
> 'bar'); 
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> 

[jira] [Work logged] (HIVE-24762) StringValueBoundaryScanner ignores boundary which leads to incorrect results

2021-08-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24762?focusedWorklogId=644412&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-644412
 ]

ASF GitHub Bot logged work on HIVE-24762:
-

Author: ASF GitHub Bot
Created on: 31/Aug/21 15:21
Start Date: 31/Aug/21 15:21
Worklog Time Spent: 10m 
  Work Description: abstractdog opened a new pull request #1965:
URL: https://github.com/apache/hive/pull/1965


   ### What changes were proposed in this pull request?
   StringValueBoundaryScanner.isDistanceGreater to take amt into account.
   
   
   ### Why are the changes needed?
   Described in jira.
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   
   ### How was this patch tested?
   Added string based range window to ptf.q.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 644412)
Time Spent: 1.5h  (was: 1h 20m)

>  StringValueBoundaryScanner ignores boundary which leads to incorrect results
> -
>
> Key: HIVE-24762
> URL: https://issues.apache.org/jira/browse/HIVE-24762
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/ValueBoundaryScanner.java#L901
> {code}
>   public boolean isDistanceGreater(Object v1, Object v2, int amt) {
> ...
> return s1 != null && s2 != null && s1.compareTo(s2) > 0;
> {code}
> Like other boundary scanners, StringValueBoundaryScanner should take amt into 
> account, otherwise it'll result in the same range regardless of the given 
> window size. This typically affects queries where the range is defined on a 
> string column:
> {code}
> select p_mfgr, p_name, p_retailprice,
> count(*) over(partition by p_mfgr order by p_name range between 1 preceding 
> and current row) as cs1,
> count(*) over(partition by p_mfgr order by p_name range between 3 preceding 
> and current row) as cs2
> from vector_ptf_part_simple_orc;
> {code} 
> With "> 0", cs1 and cs2 will be calculated on the same window, so cs1 == cs2, 
> but they should actually be different. This is the correct result (see "almond 
> antique olive coral navajo"):
> {code}
> +-----------------+---------------------------------------------+------+------+
> | p_mfgr          | p_name                                      | cs1  | cs2  |
> +-----------------+---------------------------------------------+------+------+
> | Manufacturer#1  | almond antique burnished rose metallic      | 2    | 2    |
> | Manufacturer#1  | almond antique burnished rose metallic      | 2    | 2    |
> | Manufacturer#1  | almond antique chartreuse lavender yellow   | 6    | 6    |
> | Manufacturer#1  | almond antique chartreuse lavender yellow   | 6    | 6    |
> | Manufacturer#1  | almond antique chartreuse lavender yellow   | 6    | 6    |
> | Manufacturer#1  | almond antique chartreuse lavender yellow   | 6    | 6    |
> | Manufacturer#1  | almond antique salmon chartreuse burlywood  | 1    | 1    |
> | Manufacturer#1  | almond aquamarine burnished black steel     | 1    | 8    |
> | Manufacturer#1  | almond aquamarine pink moccasin thistle     | 4    | 4    |
> | Manufacturer#1  | almond aquamarine pink moccasin thistle     | 4    | 4    |
> | Manufacturer#1  | almond aquamarine pink moccasin thistle     | 4    | 4    |
> | Manufacturer#1  | almond aquamarine pink moccasin thistle     | 4    | 4    |
> | Manufacturer#2  | almond antique violet chocolate turquoise   | 1    | 1    |
> | Manufacturer#2  | almond antique violet turquoise frosted     | 3    | 3    |
> | Manufacturer#2  | almond antique violet turquoise frosted     | 3    | 3    |
> | Manufacturer#2  | almond antique violet turquoise frosted     | 3    | 3    |
> | Manufacturer#2  | almond aquamarine midnight light salmon     | 1    | 5    |
> | Manufacturer#2  | almond aquamarine rose maroon antique       | 2    | 2    |
> | Manufacturer#2  | almond aquamarine rose maroon antique       | 2    | 2    |
> | Manufacturer#2  | almond aquamarine sandy cyan gainsboro      | 3    | 3    |
> | Manufacturer#3  | almond antique chartreuse khaki white       | 1    | 1    |
> | Manufacturer#3  | almond antique forest lavender goldenrod    | 4    | 5    |
> | 

[jira] [Work logged] (HIVE-25480) Fix Time Travel with CBO

2021-08-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25480?focusedWorklogId=644400&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-644400
 ]

ASF GitHub Bot logged work on HIVE-25480:
-

Author: ASF GitHub Bot
Created on: 31/Aug/21 15:19
Start Date: 31/Aug/21 15:19
Worklog Time Spent: 10m 
  Work Description: pvary merged pull request #2602:
URL: https://github.com/apache/hive/pull/2602


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 644400)
Time Spent: 0.5h  (was: 20m)

> Fix Time Travel with CBO
> 
>
> Key: HIVE-25480
> URL: https://issues.apache.org/jira/browse/HIVE-25480
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When CBO is enabled, the Time Travel features do not work.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24471) Add support for combiner in hash mode group aggregation

2021-08-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24471?focusedWorklogId=644280&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-644280
 ]

ASF GitHub Bot logged work on HIVE-24471:
-

Author: ASF GitHub Bot
Created on: 31/Aug/21 15:06
Start Date: 31/Aug/21 15:06
Worklog Time Spent: 10m 
  Work Description: maheshk114 opened a new pull request #2611:
URL: https://github.com/apache/hive/pull/2611


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 644280)
Time Spent: 2.5h  (was: 2h 20m)

> Add support for combiner in hash mode group aggregation 
> 
>
> Key: HIVE-24471
> URL: https://issues.apache.org/jira/browse/HIVE-24471
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> In map-side group aggregation, a partial grouped aggregation is calculated to 
> reduce the data written to disk by the map task. In the case of hash 
> aggregation, where the input data is not sorted, a hash table is used (with 
> sorting also being performed before flushing). If the hash table size grows 
> beyond a configurable limit, the data is flushed to disk and a new hash table 
> is generated. If the reduction achieved by the hash table is less than the 
> minimum hash aggregation reduction calculated at compile time, the map-side 
> aggregation is converted to streaming mode. So if the first few batches of 
> records do not result in a significant reduction, the mode is switched to 
> streaming mode. This may impact performance if the subsequent batches of 
> records have fewer distinct values. 
> To improve performance in both hash and streaming mode, a combiner can be 
> added to the map task after the keys are sorted. This will make sure that the 
> aggregation is done where possible and reduce the data written to disk.
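For illustration only, a rough sketch of the switch-to-streaming heuristic described above (hypothetical names; not Hive's actual GroupByOperator code):

{code:java}
// Sketch: if the hash table is not reducing the input enough (too many distinct
// keys per input row), give up on hash aggregation and fall back to streaming mode.
class HashAggrModeSketch {
  static boolean shouldSwitchToStreaming(long hashTableEntries, long inputRowsSeen,
                                         float minReductionFromCompileTime) {
    if (inputRowsSeen == 0) {
      return false;
    }
    // A ratio close to 1.0 means almost every input row produced a new key,
    // i.e. the partial (hash) aggregation is not reducing the data.
    float observedRatio = (float) hashTableEntries / inputRowsSeen;
    return observedRatio > minReductionFromCompileTime;
  }
}
{code}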



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25248) Fix TestLlapTaskSchedulerService#testForcedLocalityMultiplePreemptionsSameHost1

2021-08-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25248?focusedWorklogId=644282&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-644282
 ]

ASF GitHub Bot logged work on HIVE-25248:
-

Author: ASF GitHub Bot
Created on: 31/Aug/21 15:06
Start Date: 31/Aug/21 15:06
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #2420:
URL: https://github.com/apache/hive/pull/2420


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 644282)
Time Spent: 50m  (was: 40m)

> Fix 
> TestLlapTaskSchedulerService#testForcedLocalityMultiplePreemptionsSameHost1
> ---
>
> Key: HIVE-25248
> URL: https://issues.apache.org/jira/browse/HIVE-25248
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This test has recently been failing randomly
> http://ci.hive.apache.org/job/hive-flaky-check/233/testReport/org.apache.hadoop.hive.llap.tezplugins/TestLlapTaskSchedulerService/testForcedLocalityMultiplePreemptionsSameHost1/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25303) CTAS hive.create.as.external.legacy tries to place data files in managed WH path

2021-08-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25303?focusedWorklogId=644265&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-644265
 ]

ASF GitHub Bot logged work on HIVE-25303:
-

Author: ASF GitHub Bot
Created on: 31/Aug/21 15:04
Start Date: 31/Aug/21 15:04
Worklog Time Spent: 10m 
  Work Description: pvary commented on pull request #2442:
URL: https://github.com/apache/hive/pull/2442#issuecomment-909027286


   CC: @marton-bod 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 644265)
Time Spent: 2h 20m  (was: 2h 10m)

> CTAS hive.create.as.external.legacy tries to place data files in managed WH 
> path
> 
>
> Key: HIVE-25303
> URL: https://issues.apache.org/jira/browse/HIVE-25303
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Standalone Metastore
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Under legacy table creation mode (hive.create.as.external.legacy=true), when 
> a database has been created in a specific LOCATION, in a session where that 
> database is used, tables created with the following command:
> {code:java}
> CREATE TABLE  AS SELECT {code}
> should inherit the HDFS path from the database's location. Instead, Hive 
> tries to write the table data into 
> /warehouse/tablespace/managed/hive//
> +Design+: 
> In the CTAS query, the data is first written to the target directory (which 
> happens in HS2) and then the table is created (which happens in HMS). So two 
> decisions are being made here: i) the target directory location, and ii) how 
> the table should be created (table type, sd, etc.).
> When HS2 needs the target location to be set, it makes a create-table dry-run 
> call to HMS (where the table translation happens); decisions i) and ii) are 
> made within HMS, which returns the table object. HS2 then uses the location 
> set by HMS for placing the data.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24590) Operation Logging still leaks the log4j Appenders

2021-08-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24590?focusedWorklogId=644250&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-644250
 ]

ASF GitHub Bot logged work on HIVE-24590:
-

Author: ASF GitHub Bot
Created on: 31/Aug/21 15:03
Start Date: 31/Aug/21 15:03
Worklog Time Spent: 10m 
  Work Description: zabetak commented on pull request #2432:
URL: https://github.com/apache/hive/pull/2432#issuecomment-909110409


   Hey @prasanthj , can you have a final look on this for getting this in? It 
seems that more and more people are bumping into the problem.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 644250)
Time Spent: 3h  (was: 2h 50m)

> Operation Logging still leaks the log4j Appenders
> -
>
> Key: HIVE-24590
> URL: https://issues.apache.org/jira/browse/HIVE-24590
> Project: Hive
>  Issue Type: Bug
>  Components: Logging
>Reporter: Eugene Chung
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screen Shot 2021-01-06 at 18.42.05.png, Screen Shot 
> 2021-01-06 at 18.42.24.png, Screen Shot 2021-01-06 at 18.42.55.png, Screen 
> Shot 2021-01-06 at 21.38.32.png, Screen Shot 2021-01-06 at 21.47.28.png, 
> Screen Shot 2021-01-08 at 21.01.40.png, add_debug_log_and_trace.patch
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> I'm using Hive 3.1.2 with options below.
>  * hive.server2.logging.operation.enabled=true
>  * hive.server2.logging.operation.level=VERBOSE
>  * hive.async.log.enabled=false
> I already know about the ticket https://issues.apache.org/jira/browse/HIVE-17128, 
> but HS2 still leaks log4j RandomAccessFileManager.
> !Screen Shot 2021-01-06 at 18.42.05.png|width=756,height=197!
> I checked the operation log file which is not closed/deleted properly.
> !Screen Shot 2021-01-06 at 18.42.24.png|width=603,height=272!
> Then there's the log,
> {code:java}
> client.TezClient: Shutting down Tez Session, sessionName= {code}
> !Screen Shot 2021-01-06 at 18.42.55.png|width=1372,height=26!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2021-08-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=644230&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-644230
 ]

ASF GitHub Bot logged work on HIVE-23688:
-

Author: ASF GitHub Bot
Created on: 31/Aug/21 15:01
Start Date: 31/Aug/21 15:01
Worklog Time Spent: 10m 
  Work Description: abstractdog merged pull request #2479:
URL: https://github.com/apache/hive/pull/2479


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 644230)
Time Spent: 4h 40m  (was: 4.5h)

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: László Bodor
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23688.patch
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true; 
> CREATE TABLE parquet_map_type (id int, stringMap map<string,string>) 
> stored as parquet; 
> insert overwrite table parquet_map_type SELECT 1, MAP('k1', null, 'k2', 
> 'bar'); 
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> 

[jira] [Work logged] (HIVE-25233) Removing deprecated unix_timestamp UDF

2021-08-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25233?focusedWorklogId=644208&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-644208
 ]

ASF GitHub Bot logged work on HIVE-25233:
-

Author: ASF GitHub Bot
Created on: 31/Aug/21 14:59
Start Date: 31/Aug/21 14:59
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #2380:
URL: https://github.com/apache/hive/pull/2380


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 644208)
Time Spent: 1h 40m  (was: 1.5h)

> Removing deprecated unix_timestamp UDF
> --
>
> Key: HIVE-25233
> URL: https://issues.apache.org/jira/browse/HIVE-25233
> Project: Hive
>  Issue Type: Task
>  Components: UDF
>Affects Versions: All Versions
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Description
> The unix_timestamp() UDF was deprecated as part of 
> https://issues.apache.org/jira/browse/HIVE-10728. The internal 
> GenericUDFUnixTimeStamp extends GenericUDFToUnixTimeStamp and calls 
> to_utc_timestamp() for unix_timestamp(string date) & unix_timestamp(string 
> date, string pattern).
> unix_timestamp()   => CURRENT_TIMESTAMP
> unix_timestamp(string date) => to_unix_timestamp()
> unix_timestamp(string date, string pattern) => to_unix_timestamp()
> We should clean up unix_timestamp() and point it to to_unix_timestamp().
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25482) Add option to enable connectionLeak detection for Hikari datasource

2021-08-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25482?focusedWorklogId=644202&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-644202
 ]

ASF GitHub Bot logged work on HIVE-25482:
-

Author: ASF GitHub Bot
Created on: 31/Aug/21 14:58
Start Date: 31/Aug/21 14:58
Worklog Time Spent: 10m 
  Work Description: avpash43 commented on pull request #2610:
URL: https://github.com/apache/hive/pull/2610#issuecomment-908906935






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 644202)
Time Spent: 2h 20m  (was: 2h 10m)

> Add option to enable connectionLeak detection for Hikari datasource
> ---
>
> Key: HIVE-25482
> URL: https://issues.apache.org/jira/browse/HIVE-25482
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Aleksandr Pashkovskii
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> There are corner cases where we observed connection leaks to DB.
>  
> It will be good to add an option to provide a connection-leak timeout parameter 
> in HikariCPDataSourceProvider.
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/datasource/HikariCPDataSourceProvider.java#L69]
> e.g. the following makes Hikari warn about a connection leak when a 
> connection is not returned to the pool for 1 hour:
> {noformat}
> config.setLeakDetectionThreshold(3600*1000); {noformat}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25482) Add option to enable connectionLeak detection for Hikari datasource

2021-08-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25482?focusedWorklogId=644186&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-644186
 ]

ASF GitHub Bot logged work on HIVE-25482:
-

Author: ASF GitHub Bot
Created on: 31/Aug/21 14:57
Start Date: 31/Aug/21 14:57
Worklog Time Spent: 10m 
  Work Description: avpash43 commented on a change in pull request #2610:
URL: https://github.com/apache/hive/pull/2610#discussion_r698521908



##
File path: 
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java
##
@@ -602,6 +602,13 @@ public static ConfVars getMetaConf(String name) {
 CONNECTION_USER_NAME("javax.jdo.option.ConnectionUserName",
 "javax.jdo.option.ConnectionUserName", "APP",
 "Username to use against metastore database"),
+
CONNECTION_LEAK_DETECTION_THRESHOLD("javax.jdo.option.ConnectionLeakDetectionThreshold",

Review comment:
   "Can you refer to MetaStoreConf?" - I didn't quite get you. What do you 
mean? Just create a constant, or what?

##
File path: 
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java
##
@@ -602,6 +602,13 @@ public static ConfVars getMetaConf(String name) {
 CONNECTION_USER_NAME("javax.jdo.option.ConnectionUserName",
 "javax.jdo.option.ConnectionUserName", "APP",
 "Username to use against metastore database"),
+
CONNECTION_LEAK_DETECTION_THRESHOLD("javax.jdo.option.ConnectionLeakDetectionThreshold",

Review comment:
   fixed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 644186)
Time Spent: 2h 10m  (was: 2h)

> Add option to enable connectionLeak detection for Hikari datasource
> ---
>
> Key: HIVE-25482
> URL: https://issues.apache.org/jira/browse/HIVE-25482
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Aleksandr Pashkovskii
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> There are corner cases where we observed connection leaks to DB.
>  
> It will be good to add an option to provide a connection-leak timeout parameter 
> in HikariCPDataSourceProvider.
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/datasource/HikariCPDataSourceProvider.java#L69]
> e.g. the following makes Hikari warn about a connection leak when a 
> connection is not returned to the pool for 1 hour:
> {noformat}
> config.setLeakDetectionThreshold(3600*1000); {noformat}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25241) Simplify Metrics System

2021-08-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25241?focusedWorklogId=644180&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-644180
 ]

ASF GitHub Bot logged work on HIVE-25241:
-

Author: ASF GitHub Bot
Created on: 31/Aug/21 14:56
Start Date: 31/Aug/21 14:56
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #2388:
URL: https://github.com/apache/hive/pull/2388


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 644180)
Time Spent: 50m  (was: 40m)

> Simplify Metrics System
> ---
>
> Key: HIVE-25241
> URL: https://issues.apache.org/jira/browse/HIVE-25241
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Took a look at the {{Metrics}} stuff in Hive and found a lot of boilerplate 
> code on the client side to interact with Metrics. It's too much stuff and 
> it's done differently in different places.
> * Never allow Metrics System to be "null" - supply a no-op version by default
> * Metrics system should never throw an error to the client, just 
> log-and-ignore. Metrics shouldn't break a query or other operation
> * General cleanup



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25482) Add option to enable connectionLeak detection for Hikari datasource

2021-08-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25482?focusedWorklogId=644149&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-644149
 ]

ASF GitHub Bot logged work on HIVE-25482:
-

Author: ASF GitHub Bot
Created on: 31/Aug/21 14:53
Start Date: 31/Aug/21 14:53
Worklog Time Spent: 10m 
  Work Description: rbalamohan commented on pull request #2610:
URL: https://github.com/apache/hive/pull/2610#issuecomment-908892050






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 644149)
Time Spent: 2h  (was: 1h 50m)

> Add option to enable connectionLeak detection for Hikari datasource
> ---
>
> Key: HIVE-25482
> URL: https://issues.apache.org/jira/browse/HIVE-25482
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Aleksandr Pashkovskii
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> There are corner cases where we observed connection leaks to DB.
>  
> It will be good to add an option to provide a connection-leak timeout parameter 
> in HikariCPDataSourceProvider.
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/datasource/HikariCPDataSourceProvider.java#L69]
> e.g. the following makes Hikari warn about a connection leak when a 
> connection is not returned to the pool for 1 hour:
> {noformat}
> config.setLeakDetectionThreshold(3600*1000); {noformat}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-25482) Add option to enable connectionLeak detection for Hikari datasource

2021-08-31 Thread Aleksandr Pashkovskii (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-25482 started by Aleksandr Pashkovskii.

> Add option to enable connectionLeak detection for Hikari datasource
> ---
>
> Key: HIVE-25482
> URL: https://issues.apache.org/jira/browse/HIVE-25482
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Aleksandr Pashkovskii
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> There are corner cases where we observed connection leaks to DB.
>  
> It will be good to add an option to provide a connection-leak timeout parameter 
> in HikariCPDataSourceProvider.
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/datasource/HikariCPDataSourceProvider.java#L69]
> e.g. the following makes Hikari warn about a connection leak when a 
> connection is not returned to the pool for 1 hour:
> {noformat}
> config.setLeakDetectionThreshold(3600*1000); {noformat}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25474) Concurrency add jars cause hiveserver2 sys cpu to high

2021-08-31 Thread guangbao zhao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17407296#comment-17407296
 ] 

guangbao zhao commented on HIVE-25474:
--

[~mgergely] Can you help review? Thanks.

> Concurrency add jars cause hiveserver2 sys cpu to high
> --
>
> Key: HIVE-25474
> URL: https://issues.apache.org/jira/browse/HIVE-25474
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive, HiveServer2
>Affects Versions: 3.1.2
>Reporter: guangbao zhao
>Assignee: guangbao zhao
>Priority: Major
> Attachments: HIVE-25474.jpg, HIVE-25474.patch, PermissionTest.java
>
>
> On Linux, adding multiple jars concurrently through HiveCli or JDBC drives up 
> system CPU and can even affect the service. We found that when `add jar` is 
> executed, the FileUtil chmod method is used to grant permissions to the 
> downloaded jar file, and this method performs very poorly. We tested the 
> setPosixFilePermissions method of the Files class instead: it is seventy to 
> eighty times faster than FileUtil when the same file is chmod-ed 1000 times 
> in a loop, and the gap widens as the number of iterations grows. The Files 
> API requires JDK 7+ and only works on POSIX file systems, so it is not 
> Windows-friendly. Therefore, using Files.setPosixFilePermissions to grant 
> permissions on operating systems that conform to the POSIX specification 
> (tested on Mac and Linux) improves performance. A micro-benchmark sketch is 
> included below.
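> A rough, self-contained sketch of the java.nio call used for the comparison 
> (not the attached patch itself; the temp-file name and the "rwxr-xr-x" 
> permission string are placeholders):
> {code:java}
> import java.nio.file.Files;
> import java.nio.file.Path;
> import java.nio.file.attribute.PosixFilePermission;
> import java.nio.file.attribute.PosixFilePermissions;
> import java.util.Set;
> 
> public class ChmodBenchmark {
>   public static void main(String[] args) throws Exception {
>     // Stand-in for a jar downloaded into the session's resource directory.
>     Path jar = Files.createTempFile("added-resource", ".jar");
>     Set<PosixFilePermission> perms = PosixFilePermissions.fromString("rwxr-xr-x");
>     long start = System.nanoTime();
>     for (int i = 0; i < 1000; i++) {
>       // POSIX-only; avoids the overhead of the FileUtil chmod code path.
>       Files.setPosixFilePermissions(jar, perms);
>     }
>     System.out.printf("1000 setPosixFilePermissions calls: %d ms%n",
>         (System.nanoTime() - start) / 1_000_000);
>     Files.deleteIfExists(jar);
>   }
> }
> {code}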



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24762) StringValueBoundaryScanner ignores boundary which leads to incorrect results

2021-08-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24762?focusedWorklogId=643994=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-643994
 ]

ASF GitHub Bot logged work on HIVE-24762:
-

Author: ASF GitHub Bot
Created on: 31/Aug/21 12:24
Start Date: 31/Aug/21 12:24
Worklog Time Spent: 10m 
  Work Description: abstractdog opened a new pull request #1965:
URL: https://github.com/apache/hive/pull/1965


   ### What changes were proposed in this pull request?
   Change StringValueBoundaryScanner.isDistanceGreater to take amt into account.
   
   
   ### Why are the changes needed?
   Described in jira.
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   
   ### How was this patch tested?
   Added string based range window to ptf.q.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 643994)
Time Spent: 1h 20m  (was: 1h 10m)

>  StringValueBoundaryScanner ignores boundary which leads to incorrect results
> -
>
> Key: HIVE-24762
> URL: https://issues.apache.org/jira/browse/HIVE-24762
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/ValueBoundaryScanner.java#L901
> {code}
>   public boolean isDistanceGreater(Object v1, Object v2, int amt) {
> ...
> return s1 != null && s2 != null && s1.compareTo(s2) > 0;
> {code}
> Like other boundary scanners, StringValueBoundaryScanner should take amt into 
> account, otherwise it'll result in the same range regardless of the given 
> window size. This typically affects queries where the range is defined on a 
> string column:
> {code}
> select p_mfgr, p_name, p_retailprice,
> count(*) over(partition by p_mfgr order by p_name range between 1 preceding 
> and current row) as cs1,
> count(*) over(partition by p_mfgr order by p_name range between 3 preceding 
> and current row) as cs2
> from vector_ptf_part_simple_orc;
> {code} 
> With "> 0", cs1 and cs2 are calculated on the same window, so cs1 == cs2; 
> they should actually differ. This is the correct result (see "almond 
> antique olive coral navajo"):
> {code}
> +-----------------+--------------------------------------------+------+------+
> | p_mfgr          | p_name                                     | cs1  | cs2  |
> +-----------------+--------------------------------------------+------+------+
> | Manufacturer#1  | almond antique burnished rose metallic     | 2    | 2    |
> | Manufacturer#1  | almond antique burnished rose metallic     | 2    | 2    |
> | Manufacturer#1  | almond antique chartreuse lavender yellow  | 6    | 6    |
> | Manufacturer#1  | almond antique chartreuse lavender yellow  | 6    | 6    |
> | Manufacturer#1  | almond antique chartreuse lavender yellow  | 6    | 6    |
> | Manufacturer#1  | almond antique chartreuse lavender yellow  | 6    | 6    |
> | Manufacturer#1  | almond antique salmon chartreuse burlywood | 1    | 1    |
> | Manufacturer#1  | almond aquamarine burnished black steel    | 1    | 8    |
> | Manufacturer#1  | almond aquamarine pink moccasin thistle    | 4    | 4    |
> | Manufacturer#1  | almond aquamarine pink moccasin thistle    | 4    | 4    |
> | Manufacturer#1  | almond aquamarine pink moccasin thistle    | 4    | 4    |
> | Manufacturer#1  | almond aquamarine pink moccasin thistle    | 4    | 4    |
> | Manufacturer#2  | almond antique violet chocolate turquoise  | 1    | 1    |
> | Manufacturer#2  | almond antique violet turquoise frosted    | 3    | 3    |
> | Manufacturer#2  | almond antique violet turquoise frosted    | 3    | 3    |
> | Manufacturer#2  | almond antique violet turquoise frosted    | 3    | 3    |
> | Manufacturer#2  | almond aquamarine midnight light salmon    | 1    | 5    |
> | Manufacturer#2  | almond aquamarine rose maroon antique      | 2    | 2    |
> | Manufacturer#2  | almond aquamarine rose maroon antique      | 2    | 2    |
> | Manufacturer#2  | almond aquamarine sandy cyan gainsboro     | 3    | 3    |
> | Manufacturer#3  | almond antique chartreuse khaki white      | 1    | 1    |
> | Manufacturer#3  | almond antique forest lavender goldenrod   | 4    | 5    |
> | 

[jira] [Commented] (HIVE-25482) Add option to enable connectionLeak detection for Hikari datasource

2021-08-31 Thread Aleksandr Pashkovskii (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17407291#comment-17407291
 ] 

Aleksandr Pashkovskii commented on HIVE-25482:
--

PR: [https://github.com/apache/hive/pull/2610]

> Add option to enable connectionLeak detection for Hikari datasource
> ---
>
> Key: HIVE-25482
> URL: https://issues.apache.org/jira/browse/HIVE-25482
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Aleksandr Pashkovskii
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> There are corner cases where we have observed connection leaks to the DB.
>  
> It would be good to add an option to provide a connection-leak timeout 
> parameter in HikariCPDataSourceProvider.
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/datasource/HikariCPDataSourceProvider.java#L69]
> e.g. the following should make Hikari warn about a connection leak when a 
> connection is not returned to the pool for 1 hour:
> {noformat}
> config.setLeakDetectionThreshold(3600*1000); {noformat}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25482) Add option to enable connectionLeak detection for Hikari datasource

2021-08-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25482?focusedWorklogId=643988=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-643988
 ]

ASF GitHub Bot logged work on HIVE-25482:
-

Author: ASF GitHub Bot
Created on: 31/Aug/21 12:09
Start Date: 31/Aug/21 12:09
Worklog Time Spent: 10m 
  Work Description: avpash43 commented on pull request #2610:
URL: https://github.com/apache/hive/pull/2610#issuecomment-909177024


   @rbalamohan , all tests have finished successfully. Can we merge?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 643988)
Time Spent: 1h 50m  (was: 1h 40m)

> Add option to enable connectionLeak detection for Hikari datasource
> ---
>
> Key: HIVE-25482
> URL: https://issues.apache.org/jira/browse/HIVE-25482
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Aleksandr Pashkovskii
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> There are corner cases where we have observed connection leaks to the DB.
>  
> It would be good to add an option to provide a connection-leak timeout 
> parameter in HikariCPDataSourceProvider.
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/datasource/HikariCPDataSourceProvider.java#L69]
> e.g. the following should make Hikari warn about a connection leak when a 
> connection is not returned to the pool for 1 hour:
> {noformat}
> config.setLeakDetectionThreshold(3600*1000); {noformat}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25491) Fix: TestReplicationScenariosIncrementalLoadAcidTables.testAcidTableIncrementalReplication

2021-08-31 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17407264#comment-17407264
 ] 

Peter Vary commented on HIVE-25491:
---

The error is this:
{code}
org.apache.hadoop.hive.ql.lockmgr.LockException: Error communicating with the 
metastore
at 
org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.replAllocateTableWriteIdsBatch(DbTxnManager.java:1039)
at 
org.apache.hadoop.hive.ql.exec.ReplTxnTask.execute(ReplTxnTask.java:103)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:83)
Caused by: org.apache.thrift.TApplicationException: Internal error processing 
allocate_table_write_ids
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_allocate_table_write_ids(ThriftHiveMetastore.java:6228)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.allocate_table_write_ids(ThriftHiveMetastore.java:6215)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.allocateTableWriteIdsBatchIntr(HiveMetaStoreClient.java:4005)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.replAllocateTableWriteIdsBatch(HiveMetaStoreClient.java:4001)
at sun.reflect.GeneratedMethodAccessor247.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:218)
at com.sun.proxy.$Proxy63.replAllocateTableWriteIdsBatch(Unknown Source)
at 
org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.replAllocateTableWriteIdsBatch(DbTxnManager.java:1037)
... 4 more
{code}

> Fix: 
> TestReplicationScenariosIncrementalLoadAcidTables.testAcidTableIncrementalReplication
> --
>
> Key: HIVE-25491
> URL: https://issues.apache.org/jira/browse/HIVE-25491
> Project: Hive
>  Issue Type: Test
>Reporter: Peter Vary
>Priority: Major
>
> The test is flaky.
> Found here: 
> http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-2602/4/tests
> Confirmed here: http://ci.hive.apache.org/job/hive-flaky-check/400/
> CC: [~aasha]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24590) Operation Logging still leaks the log4j Appenders

2021-08-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24590?focusedWorklogId=643954=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-643954
 ]

ASF GitHub Bot logged work on HIVE-24590:
-

Author: ASF GitHub Bot
Created on: 31/Aug/21 10:31
Start Date: 31/Aug/21 10:31
Worklog Time Spent: 10m 
  Work Description: zabetak commented on pull request #2432:
URL: https://github.com/apache/hive/pull/2432#issuecomment-909110409


   Hey @prasanthj , can you have a final look at this so we can get it in? It 
seems that more and more people are bumping into the problem.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 643954)
Time Spent: 2h 50m  (was: 2h 40m)

> Operation Logging still leaks the log4j Appenders
> -
>
> Key: HIVE-24590
> URL: https://issues.apache.org/jira/browse/HIVE-24590
> Project: Hive
>  Issue Type: Bug
>  Components: Logging
>Reporter: Eugene Chung
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screen Shot 2021-01-06 at 18.42.05.png, Screen Shot 
> 2021-01-06 at 18.42.24.png, Screen Shot 2021-01-06 at 18.42.55.png, Screen 
> Shot 2021-01-06 at 21.38.32.png, Screen Shot 2021-01-06 at 21.47.28.png, 
> Screen Shot 2021-01-08 at 21.01.40.png, add_debug_log_and_trace.patch
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> I'm using Hive 3.1.2 with options below.
>  * hive.server2.logging.operation.enabled=true
>  * hive.server2.logging.operation.level=VERBOSE
>  * hive.async.log.enabled=false
> I am already aware of the ticket https://issues.apache.org/jira/browse/HIVE-17128, 
> but HS2 still leaks log4j RandomAccessFileManager instances.
> !Screen Shot 2021-01-06 at 18.42.05.png|width=756,height=197!
> I checked the operation log file which is not closed/deleted properly.
> !Screen Shot 2021-01-06 at 18.42.24.png|width=603,height=272!
> Then there's the log,
> {code:java}
> client.TezClient: Shutting down Tez Session, sessionName= {code}
> !Screen Shot 2021-01-06 at 18.42.55.png|width=1372,height=26!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25480) Fix Time Travel with CBO

2021-08-31 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-25480.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.
Thanks for the review [~szita]!

> Fix Time Travel with CBO
> 
>
> Key: HIVE-25480
> URL: https://issues.apache.org/jira/browse/HIVE-25480
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When CBO is enabled, the Time Travel features do not work.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25480) Fix Time Travel with CBO

2021-08-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25480?focusedWorklogId=643924=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-643924
 ]

ASF GitHub Bot logged work on HIVE-25480:
-

Author: ASF GitHub Bot
Created on: 31/Aug/21 08:42
Start Date: 31/Aug/21 08:42
Worklog Time Spent: 10m 
  Work Description: pvary merged pull request #2602:
URL: https://github.com/apache/hive/pull/2602


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 643924)
Time Spent: 20m  (was: 10m)

> Fix Time Travel with CBO
> 
>
> Key: HIVE-25480
> URL: https://issues.apache.org/jira/browse/HIVE-25480
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When CBO is enabled, the Time Travel features do not work.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25303) CTAS hive.create.as.external.legacy tries to place data files in managed WH path

2021-08-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25303?focusedWorklogId=643922=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-643922
 ]

ASF GitHub Bot logged work on HIVE-25303:
-

Author: ASF GitHub Bot
Created on: 31/Aug/21 08:40
Start Date: 31/Aug/21 08:40
Worklog Time Spent: 10m 
  Work Description: pvary commented on pull request #2442:
URL: https://github.com/apache/hive/pull/2442#issuecomment-909027286


   CC: @marton-bod 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 643922)
Time Spent: 2h 10m  (was: 2h)

> CTAS hive.create.as.external.legacy tries to place data files in managed WH 
> path
> 
>
> Key: HIVE-25303
> URL: https://issues.apache.org/jira/browse/HIVE-25303
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Standalone Metastore
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Under legacy table creation mode (hive.create.as.external.legacy=true), when 
> a database has been created at a specific LOCATION and a session is using 
> that database, a table created with the following command:
> {code:java}
> CREATE TABLE <table_name> AS SELECT ... {code}
> should inherit the HDFS path from the database's location. Instead, Hive 
> tries to write the table data into 
> /warehouse/tablespace/managed/hive//
> +Design+: 
> In a CTAS query, the data is first written to the target directory (this 
> happens in HS2) and then the table is created (this happens in HMS). So two 
> decisions are being made: i) the target directory location, and ii) how the 
> table should be created (table type, storage descriptor, etc.).
> When HS2 needs the target location to be set, it makes a create-table dry-run 
> call to HMS (where table translation happens); decisions i) and ii) are made 
> within HMS, which returns the table object. HS2 then uses the location set by 
> HMS for placing the data.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2021-08-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor resolved HIVE-23688.
-
Resolution: Fixed

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: László Bodor
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23688.patch
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true;
> CREATE TABLE parquet_map_type (id int, stringMap map<string,string>)
> stored as parquet;
> insert overwrite table parquet_map_type SELECT 1, MAP('k1', null, 'k2', 'bar');
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:403)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
>   ... 16 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error 

[jira] [Updated] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2021-08-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-23688:

Fix Version/s: (was: 3.0.0)

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: László Bodor
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23688.patch
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true;
> CREATE TABLE parquet_map_type (id int, stringMap map<string,string>)
> stored as parquet;
> insert overwrite table parquet_map_type SELECT 1, MAP('k1', null, 'k2', 'bar');
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:403)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
>   ... 16 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive 

[jira] [Work logged] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2021-08-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23688?focusedWorklogId=643895=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-643895
 ]

ASF GitHub Bot logged work on HIVE-23688:
-

Author: ASF GitHub Bot
Created on: 31/Aug/21 07:12
Start Date: 31/Aug/21 07:12
Worklog Time Spent: 10m 
  Work Description: abstractdog merged pull request #2479:
URL: https://github.com/apache/hive/pull/2479


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 643895)
Time Spent: 4.5h  (was: 4h 20m)

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: László Bodor
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.0.0, 4.0.0
>
> Attachments: HIVE-23688.patch
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true;
> CREATE TABLE parquet_map_type (id int, stringMap map<string,string>)
> stored as parquet;
> insert overwrite table parquet_map_type SELECT 1, MAP('k1', null, 'k2', 'bar');
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> 

[jira] [Commented] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2021-08-31 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-23688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17407135#comment-17407135
 ] 

László Bodor commented on HIVE-23688:
-

merged to master, thanks [~maheshk114]

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: László Bodor
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.0.0, 4.0.0
>
> Attachments: HIVE-23688.patch
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true;
> CREATE TABLE parquet_map_type (id int, stringMap map<string,string>)
> stored as parquet;
> insert overwrite table parquet_map_type SELECT 1, MAP('k1', null, 'k2', 'bar');
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:403)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
>   ... 16 more
> Caused by: 

[jira] [Comment Edited] (HIVE-23688) Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value

2021-08-31 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-23688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17407135#comment-17407135
 ] 

László Bodor edited comment on HIVE-23688 at 8/31/21, 7:12 AM:
---

merged to master, thanks [~maheshk114] for the review!


was (Author: abstractdog):
merged to master, thanks [~maheshk114]

> Vectorization: IndexArrayOutOfBoundsException For map type column which 
> includes null value
> ---
>
> Key: HIVE-23688
> URL: https://issues.apache.org/jira/browse/HIVE-23688
> Project: Hive
>  Issue Type: Bug
>  Components: Parquet, storage-api, Vectorization
>Affects Versions: All Versions
>Reporter: 范宜臻
>Assignee: László Bodor
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.0.0, 4.0.0
>
> Attachments: HIVE-23688.patch
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> {color:#de350b}start{color} and {color:#de350b}length{color} are empty arrays 
> in MapColumnVector.values(BytesColumnVector) when values in map contain 
> {color:#de350b}null{color}
> reproduce in master branch:
> {code:java}
> set hive.vectorized.execution.enabled=true;
> CREATE TABLE parquet_map_type (id int, stringMap map<string,string>)
> stored as parquet;
> insert overwrite table parquet_map_type SELECT 1, MAP('k1', null, 'k2', 'bar');
> select id, stringMap['k1'] from parquet_map_type group by 1,2;
> {code}
> query explain:
> {code:java}
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 2 vectorized
>   File Output Operator [FS_12]
> Group By Operator [GBY_11] (rows=5 width=2)
>   Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
> <-Map 1 [SIMPLE_EDGE] vectorized
>   SHUFFLE [RS_10]
> PartitionCols:_col0, _col1
> Group By Operator [GBY_9] (rows=10 width=2)
>   Output:["_col0","_col1"],keys:_col0, _col1
>   Select Operator [SEL_8] (rows=10 width=2)
> Output:["_col0","_col1"]
> TableScan [TS_0] (rows=10 width=2)
>   
> temp@parquet_map_type_fyz,parquet_map_type_fyz,Tbl:COMPLETE,Col:NONE,Output:["id","stringmap"]
> {code}
> runtime error:
> {code:java}
> Vertex failed, vertexName=Map 1, vertexId=vertex_1592040015150_0001_3_00, 
> diagnostics=[Task failed, taskId=task_1592040015150_0001_3_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1592040015150_0001_3_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row 
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:403)
>   at 
>