[jira] [Updated] (HIVE-11353) Map env does not reflect in the Local Map Join

2024-04-15 Thread Ryu Kobayashi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-11353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryu Kobayashi updated HIVE-11353:
-
Fix Version/s: All Versions
   Resolution: Won't Fix
   Status: Resolved  (was: Patch Available)

> Map env does not reflect in the Local Map Join
> --
>
> Key: HIVE-11353
> URL: https://issues.apache.org/jira/browse/HIVE-11353
> Project: Hive
>  Issue Type: Bug
>Reporter: Ryu Kobayashi
>Assignee: Ryu Kobayashi
>Priority: Major
> Fix For: All Versions
>
> Attachments: HIVE-11353.1.patch
>
>
> mapreduce.map.env is not reflected when the Local Map Join is run. The 
> following is a sample query:
> {code}
> hive> set mapreduce.map.env=AAA=111,BBB=222,CCC=333;
> hive> select
> >   reflect("java.lang.System", "getenv", "CCC") as CCC,
> >   a.AAA,
> >   b.BBB
> > from (
> >   SELECT
> > reflect("java.lang.System", "getenv", "AAA") as AAA
> >   from
> > foo
> > ) a
> > join (
> >   select
> > reflect("java.lang.System", "getenv", "BBB") as BBB
> >   from
> > foo
> > ) b
> > limit 1;
> Warning: Map Join MAPJOIN[10][bigTable=?] in task 'Stage-3:MAPRED' is a cross 
> product
> Query ID = root_20150716013643_a8ca1539-68ae-4f13-b9fa-7a8b88f01f13
> Total jobs = 1
> 15/07/16 01:36:46 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> Execution log at: 
> /tmp/root/root_20150716013643_a8ca1539-68ae-4f13-b9fa-7a8b88f01f13.log
> 2015-07-16 01:36:47 Starting to launch local task to process map join;
>   maximum memory = 477102080
> 2015-07-16 01:36:48 Dump the side-table for tag: 0 with group count: 1 
> into file: 
> file:/tmp/root/9b900f85-d5e4-4632-90bc-19f4bac516ff/hive_2015-07-16_01-36-43_217_8812243019719259041-1/-local-10003/HashTable-Stage-3/MapJoin-mapfile00--.hashtable
> 2015-07-16 01:36:48 Uploaded 1 File to: 
> file:/tmp/root/9b900f85-d5e4-4632-90bc-19f4bac516ff/hive_2015-07-16_01-36-43_217_8812243019719259041-1/-local-10003/HashTable-Stage-3/MapJoin-mapfile00--.hashtable
>  (282 bytes)
> 2015-07-16 01:36:48 End of local task; Time Taken: 0.934 sec.
> Execution completed successfully
> MapredLocal task succeeded
> Launching Job 1 out of 1
> Number of reduce tasks is set to 0 since there's no reduce operator
> Starting Job = job_1436962851556_0015, Tracking URL = 
> http://hadoop27:8088/proxy/application_1436962851556_0015/
> Kill Command = /usr/local/hadoop/bin/hadoop job  -kill job_1436962851556_0015
> Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
> 2015-07-16 01:36:56,488 Stage-3 map = 0%,  reduce = 0%
> 2015-07-16 01:37:01,656 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 1.28 
> sec
> MapReduce Total cumulative CPU time: 1 seconds 280 msec
> Ended Job = job_1436962851556_0015
> MapReduce Jobs Launched:
> Stage-Stage-3: Map: 1   Cumulative CPU: 1.28 sec   HDFS Read: 5428 HDFS 
> Write: 13 SUCCESS
> Total MapReduce CPU Time Spent: 1 seconds 280 msec
> OK
> 333 null 222
> Time taken: 19.562 seconds, Fetched: 1 row(s)
> {code}
> The attached patch will include those taken from Hadoop's code.
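
As a rough illustration of the setting used in the sample query, the sketch below parses a value such as AAA=111,BBB=222,CCC=333 into key/value pairs and merges them into the environment that would be handed to a child process. The helper names parse_env_spec and child_env are hypothetical, and the plain comma/equals splitting is an assumption; this is not the code from the attached patch or from Hadoop.

```python
def parse_env_spec(spec):
    """Parse 'K1=V1,K2=V2' into a dict (hypothetical helper, simplified rules)."""
    env = {}
    for pair in spec.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty entries such as a trailing comma
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"malformed entry: {pair!r}")
        env[key] = value
    return env


def child_env(base_env, spec):
    """Return a copy of base_env with the parsed entries applied on top."""
    merged = dict(base_env)
    merged.update(parse_env_spec(spec))
    return merged


if __name__ == "__main__":
    print(child_env({"PATH": "/bin"}, "AAA=111,BBB=222,CCC=333"))
```

In the output row above, the column computed on the side handled by the local task is the one that comes back as null, which is consistent with the local task's process not receiving these variables.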



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-11353) Map env does not reflect in the Local Map Join

2024-04-15 Thread Ryu Kobayashi (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-11353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837493#comment-17837493
 ] 

Ryu Kobayashi commented on HIVE-11353:
--

MapReduce has been deprecated and this ticket will be closed.

> Map env does not reflect in the Local Map Join
> --
>
> Key: HIVE-11353
> URL: https://issues.apache.org/jira/browse/HIVE-11353
> Project: Hive
>  Issue Type: Bug
>Reporter: Ryu Kobayashi
>Assignee: Ryu Kobayashi
>Priority: Major
> Attachments: HIVE-11353.1.patch





[jira] [Updated] (HIVE-28199) Docker quickstart does not work for Hive 3.1.3 on Mac M2

2024-04-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28199:
--
Labels: pull-request-available  (was: )

> Docker quickstart does not work for Hive 3.1.3 on Mac M2
> 
>
> Key: HIVE-28199
> URL: https://issues.apache.org/jira/browse/HIVE-28199
> Project: Hive
>  Issue Type: Bug
>Reporter: Ryan Goldenberg
>Assignee: Ryan Goldenberg
>Priority: Minor
>  Labels: pull-request-available
>
> Quickstart: 
> [https://hive.apache.org/developement/quickstart/#--hiveserver2-metastore]
> On Mac M2, {{docker-compose up}} for {{HIVE_VERSION=3.1.3}} gives the 
> following errors
>  * {{/home/hive/.beeline}} directory issue
> {quote}metastore | *** schemaTool failed ***
> metastore | [
> metastore | WARN] Failed to create directory:
> metastore | /home/hive/.beeline
> metastore | No such file or directory
> {quote}
>  * Underscore in network name, from {{/tmp/hive/hive.log}} on 
> {{hiveserver2}}:
> {quote}2024-04-02T16:26:24,867 ERROR [main] utils.MetaStoreUtils: Got 
> exception: java.net.URISyntaxException Illegal character in hostname at index 
> 25: thrift://metastore.docker_default:9083
> java.net.URISyntaxException: Illegal character in hostname at index 25: 
> thrift://metastore.docker_default:9083
> {quote}
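
To make the second error easier to read, here is a small sketch that locates the first character in a URI's host part that falls outside letters, digits, hyphens, and dots. The function name first_illegal_hostname_index is hypothetical and the check is deliberately simplified; it is not the parser that java.net.URI uses, only an illustration of why the underscore in the Docker network name is reported at index 25.

```python
import re

# Simplified assumption: hostname characters are letters, digits, '-' and '.'.
_ALLOWED = re.compile(r"[A-Za-z0-9.-]")


def first_illegal_hostname_index(uri):
    """Return the index in uri of the first disallowed host character, or None."""
    scheme_sep = uri.find("://")
    if scheme_sep < 0:
        raise ValueError("expected a URI of the form scheme://host[:port]")
    start = scheme_sep + 3
    end = len(uri)
    # The host ends at the first port, path, query, or fragment delimiter.
    for stop in ":/?#":
        pos = uri.find(stop, start)
        if pos != -1:
            end = min(end, pos)
    for i in range(start, end):
        if not _ALLOWED.fullmatch(uri[i]):
            return i
    return None


if __name__ == "__main__":
    print(first_illegal_hostname_index("thrift://metastore.docker_default:9083"))
```

A network name without an underscore (for example a hyphenated one) passes this simplified check.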





[jira] [Updated] (HIVE-28199) Docker quickstart does not work for Hive 3.1.3 on Mac M2

2024-04-15 Thread Ryan Goldenberg (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Goldenberg updated HIVE-28199:
---
Description: 
Quickstart: 
[https://hive.apache.org/developement/quickstart/#--hiveserver2-metastore]

On Mac M2, {{docker-compose up}} for {{HIVE_VERSION=3.1.3}} gives the following 
errors
 * {{/home/hive/.beeline}} directory issue

{quote}metastore | *** schemaTool failed ***
metastore | [
metastore | WARN] Failed to create directory:
metastore | /home/hive/.beeline
metastore | No such file or directory
{quote}
 * Underscore in network name, from {{/tmp/hive/hive.log}} on 
{{hiveserver2}}:

{quote}2024-04-02T16:26:24,867 ERROR [main] utils.MetaStoreUtils: Got 
exception: java.net.URISyntaxException Illegal character in hostname at index 
25: thrift://metastore.docker_default:9083
java.net.URISyntaxException: Illegal character in hostname at index 25: 
thrift://metastore.docker_default:9083
{quote}

  was:
Quickstart: 
[https://hive.apache.org/developement/quickstart/#--hiveserver2-metastore]

On Mac M2, {{docker-compose up}} for {{HIVE_VERSION=3.1.3}} gives the following 
errors
 * {{/home/hive/.beeline}} directory issue

metastore| *** schemaTool failed ***
metastore| [
metastore| WARN] Failed to create directory:
metastore| /home/hive/.beeline
metastore| No such file or directory * Underscore in network name, from 
{{/tmp/hive/hive.log}} on {{{}hiveserver2{}}}:

2024-04-02T16:26:24,867 ERROR [main] utils.MetaStoreUtils: Got exception: 
java.net.URISyntaxException Illegal character in hostname at index 25: 
thrift://metastore.docker_default:9083
java.net.URISyntaxException: Illegal character in hostname at index 25: 
thrift://metastore.docker_default:9083{{}}


> Docker quickstart does not work for Hive 3.1.3 on Mac M2
> 
>
> Key: HIVE-28199
> URL: https://issues.apache.org/jira/browse/HIVE-28199
> Project: Hive
>  Issue Type: Bug
>Reporter: Ryan Goldenberg
>Priority: Minor





[jira] [Assigned] (HIVE-28199) Docker quickstart does not work for Hive 3.1.3 on Mac M2

2024-04-15 Thread Ryan Goldenberg (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Goldenberg reassigned HIVE-28199:
--

Assignee: Ryan Goldenberg

> Docker quickstart does not work for Hive 3.1.3 on Mac M2
> 
>
> Key: HIVE-28199
> URL: https://issues.apache.org/jira/browse/HIVE-28199
> Project: Hive
>  Issue Type: Bug
>Reporter: Ryan Goldenberg
>Assignee: Ryan Goldenberg
>Priority: Minor





[jira] [Created] (HIVE-28199) Docker quickstart does not work for Hive 3.1.3 on Mac M2

2024-04-15 Thread Ryan Goldenberg (Jira)
Ryan Goldenberg created HIVE-28199:
--

 Summary: Docker quickstart does not work for Hive 3.1.3 on Mac M2
 Key: HIVE-28199
 URL: https://issues.apache.org/jira/browse/HIVE-28199
 Project: Hive
  Issue Type: Bug
Reporter: Ryan Goldenberg


Quickstart: 
[https://hive.apache.org/developement/quickstart/#--hiveserver2-metastore]

On Mac M2, {{docker-compose up}} for {{HIVE_VERSION=3.1.3}} gives the following 
errors
 * {{/home/hive/.beeline}} directory issue

metastore| *** schemaTool failed ***
metastore| [
metastore| WARN] Failed to create directory:
metastore| /home/hive/.beeline
metastore| No such file or directory
 * Underscore in network name, from {{/tmp/hive/hive.log}} on {{hiveserver2}}:

2024-04-02T16:26:24,867 ERROR [main] utils.MetaStoreUtils: Got exception: 
java.net.URISyntaxException Illegal character in hostname at index 25: 
thrift://metastore.docker_default:9083
java.net.URISyntaxException: Illegal character in hostname at index 25: 
thrift://metastore.docker_default:9083





[jira] [Commented] (HIVE-28019) Fix query type information in proto files for load and explain queries

2024-04-15 Thread Ramesh Kumar Thangarajan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837401#comment-17837401
 ] 

Ramesh Kumar Thangarajan commented on HIVE-28019:
-

Hi [~zabetak], first of all, thank you very much for the review. :)

I agree that HiveOperation was introduced for authorization and that maybe we 
should not change it to represent the query type. But I still believe we should 
make the change for PREHOOK: type:, POSTHOOK: type:, and the 
HiveProtoLoggingHook.

I feel that the change to HiveOperation.Explain for explain queries is needed 
mostly because we use the HiveOperation when printing in the preexecute and 
postexecute actions.

[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/hooks/PreExecutePrinter.java#L69]

At present we report the type information for queries in preexec and postexec 
as below:

PREHOOK: type: QUERY

POSTHOOK: type: QUERY

I think this is the query type information that is reported along with other 
information on the query. If that is the case, I feel we should not report any 
other type for explain queries. If this change is considered a loss of 
information, doesn't that mean users are relying on the type in the wrong way?

Although we could skip this and fix only the HiveProtoLoggingHook to report the 
right query type, I feel we would then report two different pieces of 
information for the same query in different places. Keeping them synchronized 
will also help us with complete testing of all query types.

Please let me know if you think my points make sense. I will rework the change 
so that it does not touch the commandType and instead creates a field to 
represent explain queries, and uses that field to report the correct query type 
in HiveProtoLoggingHook and in PREHOOK: type: and POSTHOOK: type:.
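
The last paragraph could be sketched roughly as follows. This is only an illustration of the idea (leave the authorization-oriented operation untouched and carry a separate explain marker used for reporting); the names QueryInfo, is_explain, reported_type, and hook_lines are hypothetical and do not correspond to Hive classes or fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryInfo:
    command_type: str         # operation used for authorization; not modified
    is_explain: bool = False  # hypothetical extra field marking explain queries


def reported_type(info):
    """Type string to report; falls back to the command type when not an explain."""
    return "EXPLAIN" if info.is_explain else info.command_type


def hook_lines(info):
    """Lines in the style of the pre/post execution printers."""
    query_type = reported_type(info)
    return [f"PREHOOK: type: {query_type}", f"POSTHOOK: type: {query_type}"]


if __name__ == "__main__":
    for line in hook_lines(QueryInfo("QUERY", is_explain=True)):
        print(line)
```

With this shape, both the hook output and any proto logging would read the same derived value, so the two places cannot drift apart.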

> Fix query type information in proto files for load and explain queries
> --
>
> Key: HIVE-28019
> URL: https://issues.apache.org/jira/browse/HIVE-28019
> Project: Hive
>  Issue Type: Task
>  Components: HiveServer2
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Labels: pull-request-available
>
> Certain query types like LOAD, export, import and explain queries did not 
> produce the right Hive operation type





[jira] [Commented] (HIVE-27734) Add Iceberg's storage-partitioned join capabilities to Hive's [sorted-]bucket-map-join

2024-04-15 Thread Shohei Okumiya (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837264#comment-17837264
 ] 

Shohei Okumiya commented on HIVE-27734:
---

[~dkuzmenko] 

I surveyed some optimizations implemented in Hive, potentially useful features 
of Iceberg, and how to integrate those existing optimizations with Iceberg. I 
drafted a document and PR as an example.
 * [Design 
document|https://docs.google.com/document/d/1srEK3atO2T3Apa-FsF6bW__ECY-nFrev_1RZ8EN4UF8/edit?usp=sharing]
 * [A sample implementation|https://github.com/apache/hive/pull/5194]

I presume we can take the following actions. I'd be glad to hear other ideas if 
you have any.
 # We may create an umbrella ticket as this topic seems too big to complete in 
a single ticket.
 # We may share the documents with the Hive dev ML so that Hive and Iceberg 
experts can be involved.
 # Anything else?

> Add Iceberg's storage-partitioned join capabilities to Hive's 
> [sorted-]bucket-map-join
> --
>
> Key: HIVE-27734
> URL: https://issues.apache.org/jira/browse/HIVE-27734
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Affects Versions: 4.0.0-alpha-2
>Reporter: Janos Kovacs
>Assignee: Shohei Okumiya
>Priority: Major
>
> Iceberg's 'data bucketing' is implemented through its rich (function based) 
> partitioning feature which helps to optimize join operations - called storage 
> partitioned joins. 
> doc: 
> [https://docs.google.com/document/d/1foTkDSM91VxKgkEcBMsuAvEjNybjja-uHk-r3vtXWFE/edit#heading=h.82w8qxfl2uwl]
> spark impl.: https://issues.apache.org/jira/browse/SPARK-37375
> Hive does not yet leverage this feature in its bucket-map-join 
> optimization, neither alone nor together with Iceberg's SortOrder for 
> sorted-bucket-map-join.
> Customers migrating from Hive table format to Iceberg format with storage 
> optimized schema will experience performance degradation on large tables 
> where Iceberg's gain from avoiding file listing is significantly smaller 
> than the join performance gained from bucket-join or even 
> sorted-bucket-join.
>  
> {noformat}
> SET hive.query.results.cache.enabled=false;
> SET hive.fetch.task.conversion = none;
> SET hive.optimize.bucketmapjoin=true;
> SET hive.convert.join.bucket.mapjoin.tez=true;
> SET hive.auto.convert.join.noconditionaltask.size=1000;
> --if you are working with external table, you need this for bmj:
> SET hive.disable.unsafe.external.table.operations=false;
> -- HIVE BUCKET-MAP-JOIN
> DROP TABLE IF EXISTS default.hivebmjt1 PURGE;
> DROP TABLE IF EXISTS default.hivebmjt2 PURGE;
> CREATE TABLE default.hivebmjt1 (id int, txt string) CLUSTERED BY (id) INTO 8 
> BUCKETS;
> CREATE TABLE default.hivebmjt2 (id int, txt string);
> INSERT INTO default.hivebmjt1 VALUES 
> (1,'1'),(2,'2'),(3,'3'),(4,'4'),(5,'5'),(6,'6'),(7,'7'),(8,'8');
> INSERT INTO default.hivebmjt2 VALUES (1,'1'),(2,'2'),(3,'3'),(4,'4');
> EXPLAIN
> SELECT * FROM default.hivebmjt1 f INNER  JOIN default.hivebmjt2 d ON f.id 
> = d.id;
> EXPLAIN
> SELECT * FROM default.hivebmjt1 f LEFT OUTER JOIN default.hivebmjt2 d ON f.id 
> = d.id;
> -- Both are optimized into BMJ
> -- ICEBERG BUCKET-MAP-JOIN via Iceberg's storage-partitioned join
> DROP TABLE IF EXISTS default.icespbmjt1 PURGE;
> DROP TABLE IF EXISTS default.icespbmjt2 PURGE;
> CREATE TABLE default.icespbmjt1 (txt string) PARTITIONED BY (id int) STORED 
> BY ICEBERG ;
> CREATE TABLE default.icespbmjt2 (txt string) PARTITIONED BY (id int) STORED 
> BY ICEBERG ;
> INSERT INTO default.icespbmjt1 VALUES ('1',1),('2',2),('3',3),('4',4);
> INSERT INTO default.icespbmjt2 VALUES ('1',1),('2',2),('3',3),('4',4);
> EXPLAIN
> SELECT * FROM default.icespbmjt1 f INNER  JOIN default.icespbmjt2 d ON 
> f.id = d.id;
> EXPLAIN
> SELECT * FROM default.icespbmjt1 f LEFT OUTER JOIN default.icespbmjt2 d ON 
> f.id = d.id;
> -- Only Map-Join optimised
> {noformat}
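
For readers unfamiliar with the optimization being requested, the following is a minimal sketch of the idea behind a bucketed join: when both inputs are split by the same function of the join key, each bucket can be joined independently against only the matching bucket of the other side. The function names are hypothetical, and the modulo bucketing is an assumption made for brevity; Hive's and Iceberg's actual bucket functions differ.

```python
from collections import defaultdict


def bucketize(rows, num_buckets):
    """Group (id, txt) rows by id modulo num_buckets (assumed bucket function)."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[0] % num_buckets].append(row)
    return buckets


def bucket_map_join(big, small, num_buckets):
    """Inner join on id, building a hash table per bucket instead of one global table."""
    big_buckets = bucketize(big, num_buckets)
    small_buckets = bucketize(small, num_buckets)
    joined = []
    for bucket_id in sorted(big_buckets):
        # Only the matching small-side bucket is loaded for this big-side bucket.
        lookup = defaultdict(list)
        for key, txt in small_buckets.get(bucket_id, []):
            lookup[key].append(txt)
        for key, txt in big_buckets[bucket_id]:
            for other in lookup.get(key, []):
                joined.append((key, txt, other))
    return joined


if __name__ == "__main__":
    t1 = [(i, str(i)) for i in range(1, 9)]
    t2 = [(i, str(i)) for i in range(1, 5)]
    print(bucket_map_join(t1, t2, 8))
```

The per-bucket hash tables are what keep memory bounded compared with a plain map join, which has to hold the whole small side at once.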





[jira] [Commented] (HIVE-28082) HiveAggregateReduceFunctionsRule could generate an inconsistent result

2024-04-15 Thread Krisztian Kasa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837244#comment-17837244
 ] 

Krisztian Kasa commented on HIVE-28082:
---

Some more details about the issue:
{code}
explain cbo
select avg('text');
select avg('text');
{code}
{{avg('text')}} is converted to {{sum('text')/count('text')}}
{code}
HiveProject(_o__c0=[/($0, $1)])
  HiveAggregate(group=[{}], agg#0=[sum($0)], agg#1=[count()])
HiveProject($f0=[_UTF-16LE'text':VARCHAR(2147483647) CHARACTER SET 
"UTF-16LE"])
  HiveTableScan(table=[[_dummy_database, _dummy_table]], 
table:alias=[_dummy_table])
{code}
and {{sum('text')}} throws an exception at execution time, which is logged as a 
warning:
{code}
 2024-04-15T04:47:57,568  WARN [TezTR-671313_1_1_0_0_0] generic.GenericUDAFSum: 
GenericUDAFSumDouble java.lang.NumberFormatException: For input string: "text"
at 
sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.lang.Double.parseDouble(Double.java:538)
at 
org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getDouble(PrimitiveObjectInspectorUtils.java:867)
at 
org.apache.hadoop.hive.ql.udf.generic.GenericUDAFSum$GenericUDAFSumDouble.iterate(GenericUDAFSum.java:444)
at 
org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:215)
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:620)
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:792)
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:701)
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:766)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:94)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:173)
at 
org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:155)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:555)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at 
org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

2024-04-15T04:47:57,568  WARN [TezTR-671313_1_1_0_0_0] generic.GenericUDAFSum: 
GenericUDAFSumDouble ignoring similar exceptions.
{code}
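
The following toy sketch shows how rewriting an average as sum divided by count can diverge from a direct average once some inputs cannot be parsed as numbers. It rests on two assumptions that are not confirmed here: that unparseable values are skipped by the numeric aggregations (as the "ignoring similar exceptions" warnings suggest) and that the count in the rewritten plan still counts those rows. The function names are hypothetical and this does not reproduce Hive's actual evaluators.

```python
def to_double(value):
    """Return value as a float, or None if it cannot be parsed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def avg_direct(values):
    """Average over the parseable values only (assumed behavior)."""
    nums = [d for d in map(to_double, values) if d is not None]
    return sum(nums) / len(nums) if nums else None


def avg_rewritten(values):
    """sum(x) / count(x), where count still includes unparseable rows (assumed)."""
    nums = [d for d in map(to_double, values) if d is not None]
    total = sum(nums) if nums else None
    count = sum(1 for v in values if v is not None)
    if total is None or count == 0:
        return None
    return total / count


if __name__ == "__main__":
    data = ["1", "text", "3"]
    print(avg_direct(data), avg_rewritten(data))
```

Under these assumptions the two forms agree on purely numeric input and disagree as soon as a non-numeric string appears.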

The behavior is similar when CBO is turned off:
{code}
set hive.cbo.enable=false;
select avg('text');
{code}
{code}
2024-04-15T04:55:29,444  WARN [TezTR-126305_1_1_0_0_0] 
generic.GenericUDAFAverage: Ignoring similar exceptions
java.lang.NumberFormatException: For input string: "text"
at 
sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043) 
~[?:1.8.0_301]
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110) 
~[?:1.8.0_301]
at java.lang.Double.parseDouble(Double.java:538) ~[?:1.8.0_301]
at 
org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getDouble(PrimitiveObjectIn

[jira] [Updated] (HIVE-28198) Trino table is recognized as EXTERNAL_TABLE regardless of external_location parameter

2024-04-15 Thread Mladjan Gadzic (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mladjan Gadzic updated HIVE-28198:
--
Description: 
{code:java}
trino > create table hive.default.test_table(id int);{code}
{code:java}
trino> delete from hive.default.test_table;
Query 20240402_103228_00042_hm8m3, FAILED, 1 node Splits: 1 total, 0 done 
(0.00%) 0.08 [0 rows, 0B] [0 rows/s, 0B/s]
Query 20240402_103228_00042_hm8m3 failed: Cannot delete from non-managed Hive  
table{code}
This behavior was tested and works as expected in Hive 3. The table type is 
stored in the HMS DB, in the {{TBL_TYPE}} field of the {{TBLS}} table. The value 
is MANAGED_TABLE for Hive 3 and EXTERNAL_TABLE for Hive 4.

 

  was:
{code:java}
trino > create table hive.default.test_table(id int);{code}
{code:java}
trino> delete from hive.default.test_table;
Query 20240402_103228_00042_hm8m3, FAILED, 1 node Splits: 1 total, 0 done 
(0.00%) 0.08 [0 rows, 0B] [0 rows/s, 0B/s]
Query 20240402_103228_00042_hm8m3 failed: Cannot delete from non-managed Hive  
table{code}
This behavior is tested and works as expected in Hive 3. Table type is stored 
in HMS DB in {{TBLS}} table {{TBL_TYPE}} field. For Hive 3 value is 
MANAGED_TABLE and EXTERNAL_TABLEfor Hive 4.

 


> Trino table is recognized as EXTERNAL_TABLE regardless of external_location 
> parameter
> -
>
> Key: HIVE-28198
> URL: https://issues.apache.org/jira/browse/HIVE-28198
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Mladjan Gadzic
>Priority: Major
>
> {code:java}
> trino > create table hive.default.test_table(id int);{code}
> {code:java}
> trino> delete from hive.default.test_table;
> Query 20240402_103228_00042_hm8m3, FAILED, 1 node Splits: 1 total, 0 done 
> (0.00%) 0.08 [0 rows, 0B] [0 rows/s, 0B/s]
> Query 20240402_103228_00042_hm8m3 failed: Cannot delete from non-managed Hive 
>  table{code}
> This behavior was tested and works as expected in Hive 3. The table type is 
> stored in the HMS DB, in the {{TBL_TYPE}} field of the {{TBLS}} table. The 
> value is MANAGED_TABLE for Hive 3 and EXTERNAL_TABLE for Hive 4.
>  
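
As a way to picture the check described above, the sketch below builds a toy in-memory stand-in for the metastore's TBLS table and reads the TBL_TYPE of a table. The real HMS schema has many more columns and lives in the metastore's backing database; the three-column table, the helper names, and the rule in can_delete are simplifying assumptions that only loosely mirror the "non-managed" error message.

```python
import sqlite3


def table_type(conn, table_name):
    """Return TBL_TYPE for the named table, or None if it is not present."""
    row = conn.execute(
        "SELECT TBL_TYPE FROM TBLS WHERE TBL_NAME = ?", (table_name,)
    ).fetchone()
    return row[0] if row else None


def can_delete(tbl_type):
    """Assumed rule: row-level delete is only allowed on managed tables."""
    return tbl_type == "MANAGED_TABLE"


# Toy stand-in for the metastore database (not the real schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, TBL_NAME TEXT, TBL_TYPE TEXT)"
)
conn.execute(
    "INSERT INTO TBLS (TBL_NAME, TBL_TYPE) VALUES (?, ?)",
    ("test_table", "EXTERNAL_TABLE"),
)

if __name__ == "__main__":
    kind = table_type(conn, "test_table")
    print(kind, can_delete(kind))
```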





[jira] [Created] (HIVE-28198) Trino table is recognized as EXTERNAL_TABLE regardless of external_location parameter

2024-04-15 Thread Mladjan Gadzic (Jira)
Mladjan Gadzic created HIVE-28198:
-

 Summary: Trino table is recognized as EXTERNAL_TABLE regardless of 
external_location parameter
 Key: HIVE-28198
 URL: https://issues.apache.org/jira/browse/HIVE-28198
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 4.0.0
Reporter: Mladjan Gadzic


{code:java}
trino > create table hive.default.test_table(id int);{code}
{code:java}
trino> delete from hive.default.test_table;
Query 20240402_103228_00042_hm8m3, FAILED, 1 node Splits: 1 total, 0 done 
(0.00%) 0.08 [0 rows, 0B] [0 rows/s, 0B/s]
Query 20240402_103228_00042_hm8m3 failed: Cannot delete from non-managed Hive  
table{code}
This behavior was tested and works as expected in Hive 3. The table type is 
stored in the HMS DB, in the {{TBL_TYPE}} field of the {{TBLS}} table. The value 
is MANAGED_TABLE for Hive 3 and EXTERNAL_TABLE for Hive 4.

 





[jira] [Resolved] (HIVE-28153) Flaky test TestConflictingDataFiles.testMultiFiltersUpdate

2024-04-15 Thread Simhadri Govindappa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri Govindappa resolved HIVE-28153.

Fix Version/s: 4.1.0
   Resolution: Fixed

> Flaky test TestConflictingDataFiles.testMultiFiltersUpdate
> --
>
> Key: HIVE-28153
> URL: https://issues.apache.org/jira/browse/HIVE-28153
> Project: Hive
>  Issue Type: Test
>  Components: Test
>Reporter: Butao Zhang
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> This test has been failing a lot lately, for example: 
> [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-5063/13/tests/]
>  
> The flaky-test check also shows that this test is unstable:
> [http://ci.hive.apache.org/job/hive-flaky-check/831/testReport/]
> {code:java}
> 10:29:21  [INFO]  T E S T S
> 10:29:21  [INFO] ---
> 10:29:21  [INFO] Running org.apache.iceberg.mr.hive.TestConflictingDataFiles
> 10:36:13  [ERROR] Tests run: 60, Failures: 1, Errors: 0, Skipped: 24, Time 
> elapsed: 399.12 s <<< FAILURE! - in 
> org.apache.iceberg.mr.hive.TestConflictingDataFiles
> 10:36:13  [ERROR] 
> org.apache.iceberg.mr.hive.TestConflictingDataFiles.testMultiFiltersUpdate[fileFormat=PARQUET,
>  engine=tez, catalog=HIVE_CATALOG, isVectorized=false, formatVersion=1]  Time 
> elapsed: 11.781 s  <<< FAILURE!
> 10:36:13  java.lang.AssertionError: expected:<12> but was:<13>
> 10:36:13  at org.junit.Assert.fail(Assert.java:89)
> 10:36:13  at org.junit.Assert.failNotEquals(Assert.java:835)
> 10:36:13  at org.junit.Assert.assertEquals(Assert.java:647)
> 10:36:13  at org.junit.Assert.assertEquals(Assert.java:633)
> 10:36:13  at 
> org.apache.iceberg.mr.hive.TestConflictingDataFiles.testMultiFiltersUpdate(TestConflictingDataFiles.java:135)
> 10:36:13  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 10:36:13  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 10:36:13  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 10:36:13  at java.lang.reflect.Method.invoke(Method.java:498)
> 10:36:13  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> 10:36:13  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 10:36:13  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> 10:36:13  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 10:36:13  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> 10:36:13  at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> 10:36:13  at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
> 10:36:13  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
> 10:36:13  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
> 10:36:13  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 10:36:13  at java.lang.Thread.run(Thread.java:748) {code}





[jira] [Commented] (HIVE-28153) Flaky test TestConflictingDataFiles.testMultiFiltersUpdate

2024-04-15 Thread Simhadri Govindappa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837165#comment-17837165
 ] 

Simhadri Govindappa commented on HIVE-28153:


The change is merged to master.
Thanks, [~dkuzmenko] and [~zhangbutao], for the review!


Additionally, I have raised HIVE-28192 to investigate the bug mentioned above. 
It seems that the IOContext is shared between threads in the non-vectorized code 
flow, which causes duplicate records.
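
The kind of problem suspected above can be illustrated with a minimal sketch. It is not Hive code: ReaderContext, SHARED and PER_THREAD are hypothetical names, and nothing here is meant to describe how HIVE-28192 will actually be investigated or fixed. It only shows the difference between one mutable context object seen by every thread and one context per thread.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class IoContextSketch {
    // Hypothetical stand-in for a reader context; this is NOT Hive's IOContext.
    static final class ReaderContext {
        long currentRow;
    }

    // A single instance visible to every thread: position updates from
    // different threads land in the same object and can interfere.
    static final ReaderContext SHARED = new ReaderContext();

    // One instance per thread: each thread only ever sees its own position.
    static final ThreadLocal<ReaderContext> PER_THREAD =
        ThreadLocal.withInitial(ReaderContext::new);

    // Starts `threads` workers and returns how many distinct context
    // objects they observed (ReaderContext uses identity equality).
    static int distinctContexts(int threads, boolean perThread) throws InterruptedException {
        Set<ReaderContext> seen = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> seen.add(perThread ? PER_THREAD.get() : SHARED));
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return seen.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("shared contexts seen: " + distinctContexts(4, false));
        System.out.println("per-thread contexts seen: " + distinctContexts(4, true));
    }
}
```

With the shared instance all four workers observe the same object, whereas the ThreadLocal variant hands each worker its own context, so one thread's read position cannot leak into another's.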

> Flaky test TestConflictingDataFiles.testMultiFiltersUpdate
> --
>
> Key: HIVE-28153
> URL: https://issues.apache.org/jira/browse/HIVE-28153
> Project: Hive
> Issue Type: Test
> Components: Test
> Reporter: Butao Zhang
> Assignee: Simhadri Govindappa
> Priority: Major
> Labels: pull-request-available
>
> This test has been failing frequently of late, for example: 
> [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-5063/13/tests/]
>  
> The flaky check also shows that this test is unstable:
> [http://ci.hive.apache.org/job/hive-flaky-check/831/testReport/]
> {code:java}
> 10:29:21  [INFO]  T E S T S
> 10:29:21  [INFO] ---
> 10:29:21  [INFO] Running org.apache.iceberg.mr.hive.TestConflictingDataFiles
> 10:36:13  [ERROR] Tests run: 60, Failures: 1, Errors: 0, Skipped: 24, Time elapsed: 399.12 s <<< FAILURE! - in org.apache.iceberg.mr.hive.TestConflictingDataFiles
> 10:36:13  [ERROR] org.apache.iceberg.mr.hive.TestConflictingDataFiles.testMultiFiltersUpdate[fileFormat=PARQUET, engine=tez, catalog=HIVE_CATALOG, isVectorized=false, formatVersion=1]  Time elapsed: 11.781 s  <<< FAILURE!
> 10:36:13  java.lang.AssertionError: expected:<12> but was:<13>
> 10:36:13  at org.junit.Assert.fail(Assert.java:89)
> 10:36:13  at org.junit.Assert.failNotEquals(Assert.java:835)
> 10:36:13  at org.junit.Assert.assertEquals(Assert.java:647)
> 10:36:13  at org.junit.Assert.assertEquals(Assert.java:633)
> 10:36:13  at org.apache.iceberg.mr.hive.TestConflictingDataFiles.testMultiFiltersUpdate(TestConflictingDataFiles.java:135)
> 10:36:13  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 10:36:13  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 10:36:13  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 10:36:13  at java.lang.reflect.Method.invoke(Method.java:498)
> 10:36:13  at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> 10:36:13  at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 10:36:13  at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> 10:36:13  at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 10:36:13  at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> 10:36:13  at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> 10:36:13  at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
> 10:36:13  at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
> 10:36:13  at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
> 10:36:13  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 10:36:13  at java.lang.Thread.run(Thread.java:748) {code}





[jira] [Commented] (HIVE-28177) Announce Hive 1.x EOL and remove from downloads space

2024-04-15 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837116#comment-17837116
 ] 

Stamatis Zampetakis commented on HIVE-28177:


Hey [~ayushsaxena], I've seen that you sent the announcement to user@ and dev@; 
thanks for doing that! Are you also planning to tackle the remaining items? 

> Announce Hive 1.x EOL and remove from downloads space
> -
>
> Key: HIVE-28177
> URL: https://issues.apache.org/jira/browse/HIVE-28177
> Project: Hive
> Issue Type: Task
> Components: Documentation
> Reporter: Stamatis Zampetakis
> Priority: Major
>
> The Hive 1.x release line is officially unsupported. The respective 
> discussion and vote can be found below:
>  * https://lists.apache.org/thread/sxcrcf4v9j630tl9domp0bn4m33bdq0s
>  * [https://lists.apache.org/thread/cyfg2ftrsh9bn0wgycm7ltqsx9yb6fts]
> The following tasks are pending:
>  * Update the Hive website to reflect that Hive 1.x is EOL
>  * Send an official announcement email to the following lists: user@hive, 
> dev@hive, announce@apache
>  * Remove hive-1.2.2 from [https://downloads.apache.org/hive/] (it will be 
> automatically archived)





[jira] [Updated] (HIVE-26220) Shade & relocate dependencies in hive-exec to avoid conflicting with downstream projects

2024-04-15 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-26220:
--
Labels: hive-4.0.1-must  (was: )

> Shade & relocate dependencies in hive-exec to avoid conflicting with 
> downstream projects
> 
>
> Key: HIVE-26220
> URL: https://issues.apache.org/jira/browse/HIVE-26220
> Project: Hive
> Issue Type: Improvement
> Affects Versions: 4.0.0, 4.0.0-alpha-1
> Reporter: Chao Sun
> Priority: Blocker
> Labels: hive-4.0.1-must
>
> Currently, projects like Spark, Trino/Presto, Iceberg, etc. depend on 
> {{hive-exec:core}}, which was removed in HIVE-25531. These projects use 
> {{hive-exec:core}} because it gives them the flexibility to exclude, shade, 
> and relocate dependencies in {{hive-exec}} that conflict with the ones they 
> bring in themselves. However, with {{hive-exec}} this is no longer 
> possible, since it is a fat jar that shades those dependencies but does not 
> relocate many of them.
> For downstream projects to consume {{hive-exec}}, we need 
> to make sure all the dependencies in {{hive-exec}} are properly shaded and 
> relocated, so they won't conflict with those of the downstream projects.





[jira] [Commented] (HIVE-28133) Log the original exception in HiveIOExceptionHandlerUtil#handleRecordReaderException

2024-04-15 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837112#comment-17837112
 ] 

Denys Kuzmenko commented on HIVE-28133:
---

Merged to master.
Thanks, [~abstractdog], for the review!

> Log the original exception in 
> HiveIOExceptionHandlerUtil#handleRecordReaderException
> 
>
> Key: HIVE-28133
> URL: https://issues.apache.org/jira/browse/HIVE-28133
> Project: Hive
> Issue Type: Improvement
> Reporter: Denys Kuzmenko
> Priority: Major
> Labels: pull-request-available
>






[jira] [Resolved] (HIVE-28133) Log the original exception in HiveIOExceptionHandlerUtil#handleRecordReaderException

2024-04-15 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko resolved HIVE-28133.
---
Fix Version/s: 4.1.0
   Resolution: Fixed

> Log the original exception in 
> HiveIOExceptionHandlerUtil#handleRecordReaderException
> 
>
> Key: HIVE-28133
> URL: https://issues.apache.org/jira/browse/HIVE-28133
> Project: Hive
> Issue Type: Improvement
> Reporter: Denys Kuzmenko
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.1.0
>
>






[jira] [Updated] (HIVE-28197) Add deserializer to convert JSON plans to RelNodes

2024-04-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28197:
--
Labels: pull-request-available  (was: )

> Add deserializer to convert JSON plans to RelNodes
> --
>
> Key: HIVE-28197
> URL: https://issues.apache.org/jira/browse/HIVE-28197
> Project: Hive
> Issue Type: New Feature
> Components: Hive
> Affects Versions: 4.0.0
> Reporter: Soumyakanti Das
> Assignee: Soumyakanti Das
> Priority: Major
> Labels: pull-request-available
>
> We have a serializer that converts RelNodes to JSON. With this patch, we will 
> be able to deserialize JSON plans.


