[jira] [Created] (HIVE-25240) Query Text based MaterializedView rewrite if subqueries

2021-06-10 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-25240:
-

 Summary: Query Text based MaterializedView rewrite if subqueries
 Key: HIVE-25240
 URL: https://issues.apache.org/jira/browse/HIVE-25240
 Project: Hive
  Issue Type: Improvement
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa


{code}
create materialized view mat1 as
select col0 from t1 where col0 > 1;

explain cbo
select col0 from
  (select col0 from t1 where col0 > 1) sub
where col0 = 10;
{code}
{code}
HiveProject(col0=[CAST(10):INTEGER])
  HiveFilter(condition=[=($0, 10)])
    HiveTableScan(table=[[default, mat1]], table:alias=[default.mat1])
{code}
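
In the example above, the text of the subquery is identical to the text that defines mat1, so the plan scans mat1 instead of t1. A toy Python sketch of that idea follows. It is not Hive's implementation: `normalize` and `rewrite_subqueries` are hypothetical names, and a whitespace-insensitive string comparison is an assumption standing in for whatever comparison Hive actually performs.

```python
import re


def normalize(sql: str) -> str:
    """Collapse runs of whitespace and drop a trailing semicolon."""
    return re.sub(r"\s+", " ", sql).strip().rstrip(";").strip()


def rewrite_subqueries(query: str, views: dict) -> str:
    """Replace each parenthesized subquery whose text equals a view definition.

    `views` maps a materialized view name to its defining query text
    (a hypothetical stand-in for the metastore's registry).
    """
    by_text = {normalize(definition): name for name, definition in views.items()}
    out = []
    i = 0
    while i < len(query):
        if query[i] == "(":
            # Find the parenthesis that closes the one at position i.
            depth = 0
            j = i
            while j < len(query):
                if query[j] == "(":
                    depth += 1
                elif query[j] == ")":
                    depth -= 1
                    if depth == 0:
                        break
                j += 1
            if j < len(query):
                name = by_text.get(normalize(query[i + 1:j]))
                if name is not None:
                    # The subquery text matches a view: scan the view instead.
                    out.append(name)
                    i = j + 1
                    continue
        # No match here: copy the character and keep scanning, so that
        # nested subqueries still get a chance to match.
        out.append(query[i])
        i += 1
    return "".join(out)


views = {"mat1": "select col0 from t1 where col0 > 1;"}
query = "select col0 from\n  (select col0 from t1 where col0 > 1) sub\nwhere col0 = 10"
rewritten = rewrite_subqueries(query, views)
```

Here `rewritten` reads from `mat1 sub` in place of the parenthesized subquery, while the outer filter on col0 is left untouched.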

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25239) Create the compression table but the compressed properties are no

2021-06-10 Thread GuangMing Lu (Jira)
GuangMing Lu created HIVE-25239:
---

 Summary: Create the compression table but the compressed 
properties are no
 Key: HIVE-25239
 URL: https://issues.apache.org/jira/browse/HIVE-25239
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.1.0
Reporter: GuangMing Lu
 Attachments: image-2021-06-11-10-49-25-710.png

After creating an ORC table with Snappy compression, 'desc formatted' on that 
table reports 'Compressed' as No; it should be displayed as YES.
{quote}create database lgm;

create table lgm.test_tbl(
 f1 int,
 f2 string
) stored as orc
TBLPROPERTIES("orc.compress"="snappy");

desc formatted lgm.test_tbl;

!image-2021-06-11-10-49-25-710.png!
{quote}





[jira] [Created] (HIVE-25238) Make excluded SSL cipher suites configurable for Hive Web UI and HS2

2021-06-10 Thread Yongzhi Chen (Jira)
Yongzhi Chen created HIVE-25238:
---

 Summary: Make excluded SSL cipher suites configurable for Hive Web 
UI and HS2
 Key: HIVE-25238
 URL: https://issues.apache.org/jira/browse/HIVE-25238
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2, Web UI
Reporter: Yongzhi Chen


When starting a Jetty HTTP server, one can explicitly exclude certain (insecure)
SSL cipher suites. This can be especially important when Hive
needs to comply with security regulations. We need to add properties so that 
the Hive Web UI and HiveServer2 support this.
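
A minimal Python sketch of what such an exclusion setting could do, under stated assumptions: the value is a comma-separated list, and each entry is treated as a regular expression matched against the whole cipher suite name. The function names and the sample value are hypothetical and do not correspond to any existing Hive property.

```python
import re


def parse_exclusions(value: str) -> list:
    """Split a comma-separated setting into trimmed, non-empty patterns."""
    return [part.strip() for part in value.split(",") if part.strip()]


def allowed_cipher_suites(supported: list, exclusions: list) -> list:
    """Keep only the suites that match none of the exclusion patterns."""
    compiled = [re.compile(pattern) for pattern in exclusions]
    return [
        suite
        for suite in supported
        if not any(regex.fullmatch(suite) for regex in compiled)
    ]


# Hypothetical configuration value and list of suites offered by the server.
exclusions = parse_exclusions(" .*RC4.* , .*DES.* ")
supported = [
    "TLS_RSA_WITH_RC4_128_SHA",
    "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
    "SSL_RSA_WITH_DES_CBC_SHA",
]
enabled = allowed_cipher_suites(supported, exclusions)
```

With these inputs only the AES-GCM suite remains in `enabled`.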





[ANNOUNCE] Apache Hive 2.3.9 Released

2021-06-10 Thread Chao Sun
The Apache Hive team is proud to announce the release of Apache Hive
version 2.3.9.

The Apache Hive (TM) data warehouse software facilitates querying and
managing large datasets residing in distributed storage. Built on top of
Apache Hadoop (TM), it provides, among others:

* Tools to enable easy data extract/transform/load (ETL)
* A mechanism to impose structure on a variety of data formats
* Access to files stored either directly in Apache HDFS (TM) or in other
data storage systems such as Apache HBase (TM)
* Query execution via Apache Hadoop MapReduce, Apache Tez and Apache Spark
frameworks.

For Hive release details and downloads, please visit:
https://hive.apache.org/downloads.html
Hive 2.3.9 Release Notes are available here:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12350009&styleName=Html&projectId=12310843

We would like to thank the many contributors who made this release possible.

Regards,
The Apache Hive Team


[jira] [Created] (HIVE-25237) Thrift CLI Service Protocol: Enhance HTTP variant

2021-06-10 Thread Matt McCline (Jira)
Matt McCline created HIVE-25237:
---

 Summary: Thrift CLI Service Protocol: Enhance HTTP variant
 Key: HIVE-25237
 URL: https://issues.apache.org/jira/browse/HIVE-25237
 Project: Hive
  Issue Type: Improvement
Reporter: Matt McCline
Assignee: Matt McCline


I have been thinking about the (Thrift) CLI Service protocol between the client 
and server.

Cloudera's Prashanth Jayachandran (private e-mail) told me that its original 
BINARY (TCP/IP) transport is designed +_differently_+ than the newer HTTP 
transport. HTTP is used when we go through a Gateway. The design for HTTP is 
stateless and different in nature than the direct BINARY TCP/IP connection. 
This means that today a Hive Server 2 response to an HTTP query request can be 
lost, and that is part of the design. It is the WARNING we have seen when the 
Gateway drops its HTTP connection to Hive Server 2. We had been thinking this 
was a bug, but it is by design.

I think the HTTP design needs a rethink.

When I worked for Tandem computers a long time ago messages were 
fault-tolerant. They used a message sequence #. When you send a message to a 
Tandem server it is a process pair. The message gets routed to the current 
process called the primary. The primary computes the message work and tells the 
backup process to remember the results before replying in case there is a 
failure. You can see where this goes -- if there is a failure before the client 
gets the result it retries and the backup process can resiliently give back the 
result the primary sent it. This isn't unique to Tandem: even without a 
process pair, this is a general resilient protocol.

The HTTP design says message loss is possible in both directions (request and 
response). I think we should adopt a better scheme, but not necessarily a 
process pair.

The first principle of the rethink is that the +_client_+ needs to generate a new 
operation num (an integer) that replaces the server-side generated random GUID. 
And the client generates a new msg num within its new operation. So beeline 
might say ExecuteStatement operationNum = 57 NEW, operationMsgNum = 1. If the 
client gets an OS connection kind of error, it retries with those (57, 1) 
numbers. Hive Server 2 will remember the last response. When Hive Server 2 gets 
a message, there are 3 cases:

1) The sessionId GUID is not valid -- for now we reject the request because it 
is likely Hive Server 2 killed the session perhaps because it was restarted.

2) The operationNum or operationMsgNum is new. (Assert the msg num increases 
monotonically.) Perform the request and save the response. And respond.

3) The (operationNum, operationMsgNum) matches the last request. Resiliently 
respond with the saved result.

I think this message handling is in alignment with the HTTP philosophy that the 
protocol is stateless and any message in between can be lost. And it will shield 
the client from suffering a whole category of message failures that 
unnecessarily kill queries.

This also means we need not worry about which requests are idempotent; 
instead, all requests are resilient.
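
The three cases listed above can be sketched in Python as follows. This is only an illustration of the proposal, not existing Hive Server 2 code: `ResilientServer`, `Session`, `InvalidSession` and `handle` are hypothetical names, responses are plain strings, and the sketch additionally assumes that operation numbers grow over time so that (operationNum, operationMsgNum) pairs can be compared as tuples.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


class InvalidSession(Exception):
    """Case 1: the session id is not known to the server."""


@dataclass
class Session:
    # Key and saved response of the last request handled in this session.
    last_key: Optional[Tuple[int, int]] = None
    last_response: Optional[str] = None


@dataclass
class ResilientServer:
    execute: Callable[[str], str]  # performs the actual request work
    sessions: Dict[str, Session] = field(default_factory=dict)

    def open_session(self, session_id: str) -> None:
        self.sessions[session_id] = Session()

    def handle(self, session_id: str, operation_num: int,
               operation_msg_num: int, request: str) -> str:
        session = self.sessions.get(session_id)
        if session is None:
            # Case 1: reject, the session was probably killed or the
            # server was restarted.
            raise InvalidSession(session_id)
        key = (operation_num, operation_msg_num)
        if key == session.last_key:
            # Case 3: a retry of the last request, so answer from the
            # saved response without executing anything again.
            return session.last_response
        if session.last_key is not None and key < session.last_key:
            # Assumption of this sketch: numbers only ever increase.
            raise ValueError("message numbers must increase monotonically")
        # Case 2: a new request. Perform it, save the response, respond.
        response = self.execute(request)
        session.last_key = key
        session.last_response = response
        return response


# Usage: the client retries (57, 1) after a connection error.
executed = []


def run(request: str) -> str:
    executed.append(request)
    return f"result-{len(executed)}"


server = ResilientServer(execute=run)
server.open_session("s1")
first = server.handle("s1", 57, 1, "select 1")
retry = server.handle("s1", 57, 1, "select 1")
```

The retry receives the saved response and the statement is executed only once, which is why the client no longer has to know whether a request is idempotent.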

-

Link to earlier HTTP change: [HIVE-24786: JDBC HttpClient should retry for 
idempotent and unsent http methods by prasanthj · Pull Request #1983 · 
apache/hive (github.com)|https://github.com/apache/hive/pull/1983/files]





[jira] [Created] (HIVE-25236) Hive lineage is not generated for columns on CREATE MATERIALIZED VIEW

2021-06-10 Thread Radhika Kundam (Jira)
Radhika Kundam created HIVE-25236:
-

 Summary: Hive lineage is not generated for columns on CREATE 
MATERIALIZED VIEW
 Key: HIVE-25236
 URL: https://issues.apache.org/jira/browse/HIVE-25236
 Project: Hive
  Issue Type: Bug
Reporter: Radhika Kundam


When creating a materialized view, HookContext is supposed to send lineage info, 
but it is missing.

CREATE MATERIALIZED VIEW tbl1_view as select * from tbl1;

The HookContext passed from hive.ql.Driver to the Atlas Hive hook through the 
hookRunner.runPostExecHooks call doesn't have lineage info.





[jira] [Created] (HIVE-25235) Remove ThreadPoolExecutorWithOomHook

2021-06-10 Thread David Mollitor (Jira)
David Mollitor created HIVE-25235:
-

 Summary: Remove ThreadPoolExecutorWithOomHook
 Key: HIVE-25235
 URL: https://issues.apache.org/jira/browse/HIVE-25235
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: David Mollitor
Assignee: David Mollitor


While I was looking at [HIVE-24846] to better perform OOM logging, I 
realized that this is not a good way to handle OOM.

https://stackoverflow.com/questions/1692230/is-it-possible-to-catch-out-of-memory-exception-in-java

bq. there's likely no easy way for you to recover from it if you do catch it

If we want to handle OOM, it's best to do it from outside. It's best to do it 
with the JVM facilities:

{{-XX:+ExitOnOutOfMemoryError}}
{{-XX:OnOutOfMemoryError}}

It seems odd that the OOM handler attempts to load a handler and then do more 
work when clearly the server is hosed at this point and just requesting to do 
more work will further add to memory pressure.





[jira] [Created] (HIVE-25234) Implement ALTER TABLE ... SET PARTITION SPEC to change partitioning on Iceberg tables

2021-06-10 Thread László Pintér (Jira)
László Pintér created HIVE-25234:


 Summary: Implement ALTER TABLE ... SET PARTITION SPEC to change 
partitioning on Iceberg tables
 Key: HIVE-25234
 URL: https://issues.apache.org/jira/browse/HIVE-25234
 Project: Hive
  Issue Type: Improvement
Reporter: László Pintér
Assignee: László Pintér


Provide a way to change the schema and the Iceberg partitioning specification 
using Hive syntax.
{code:sql}
ALTER TABLE tbl SET PARTITION SPEC(...)
{code}





[jira] [Created] (HIVE-25233) Removing deprecated unix_timestamp() UDF

2021-06-10 Thread Ashish Sharma (Jira)
Ashish Sharma created HIVE-25233:


 Summary: Removing deprecated unix_timestamp() UDF
 Key: HIVE-25233
 URL: https://issues.apache.org/jira/browse/HIVE-25233
 Project: Hive
  Issue Type: Task
  Components: UDF
Affects Versions: All Versions
Reporter: Ashish Sharma
Assignee: Ashish Sharma


Description

The unix_timestamp() UDF was deprecated as part of 
https://issues.apache.org/jira/browse/HIVE-10728. Internally, 
GenericUDFUnixTimeStamp extends GenericUDFToUnixTimeStamp and calls 
to_utc_timestamp() for unix_timestamp(string date) and unix_timestamp(string 
date, string pattern).


unix_timestamp()   => CURRENT_TIMESTAMP
unix_timestamp(string date) => to_utc_timestamp()
unix_timestamp(string date, string pattern) => to_utc_timestamp()


We should clean up unix_timestamp() and point to to_utc_timestamp().
   






[jira] [Created] (HIVE-25232) Update Hive syntax to use plural form of time based partition transforms

2021-06-10 Thread László Pintér (Jira)
László Pintér created HIVE-25232:


 Summary: Update Hive syntax to use plural form of time based 
partition transforms 
 Key: HIVE-25232
 URL: https://issues.apache.org/jira/browse/HIVE-25232
 Project: Hive
  Issue Type: Task
Reporter: László Pintér
Assignee: László Pintér


We should follow the [SparkSQL 
syntax|https://iceberg.apache.org/spark-ddl/#partitioned-by] when defining 
partition transforms for Iceberg tables. 





Any best practices for hive upgrade from 1.2.1 to 3.1.2

2021-06-10 Thread Battula, Brahma Reddy
Hi All,

We are planning to upgrade Hive from 1.2.1 to 3.1.2. Are there any best 
practices we should follow?

Is there any chance to upgrade without downtime (if we disable new features 
like managed tables)?


Thanks.





[jira] [Created] (HIVE-25231) Add an ability to migrate CSV generated to hive table in replstats

2021-06-10 Thread Ayush Saxena (Jira)
Ayush Saxena created HIVE-25231:
---

 Summary: Add an ability to migrate CSV generated to hive table in 
replstats
 Key: HIVE-25231
 URL: https://issues.apache.org/jira/browse/HIVE-25231
 Project: Hive
  Issue Type: Improvement
Reporter: Ayush Saxena
Assignee: Ayush Saxena


Add an option to replstats.sh to load the CSV generated using the replication 
policy into a hive table/view.


