[jira] [Created] (HIVE-25240) Query Text based MaterializedView rewrite of subqueries
Krisztian Kasa created HIVE-25240:

Summary: Query Text based MaterializedView rewrite of subqueries
Key: HIVE-25240
URL: https://issues.apache.org/jira/browse/HIVE-25240
Project: Hive
Issue Type: Improvement
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa

{code}
create materialized view mat1 as
select col0 from t1 where col0 > 1;

explain cbo
select col0 from (select col0 from t1 where col0 > 1) sub where col0 = 10;
{code}

{code}
HiveProject(col0=[CAST(10):INTEGER])
  HiveFilter(condition=[=($0, 10)])
    HiveTableScan(table=[[default, mat1]], table:alias=[default.mat1])
{code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-25239) Creating a compressed table, but the 'Compressed' property shows No
GuangMing Lu created HIVE-25239:

Summary: Creating a compressed table, but the 'Compressed' property shows No
Key: HIVE-25239
URL: https://issues.apache.org/jira/browse/HIVE-25239
Project: Hive
Issue Type: Bug
Components: Hive
Affects Versions: 3.1.0
Reporter: GuangMing Lu
Attachments: image-2021-06-11-10-49-25-710.png

Create an ORC table with Snappy compression, then run 'desc formatted <table>': the 'Compressed' field shows No, but it should display YES.

{code:sql}
create database lgm;
create table lgm.test_tbl(
  f1 int,
  f2 string
) stored as orc TBLPROPERTIES("orc.compress"="snappy");
desc formatted lgm.test_tbl;
{code}

!image-2021-06-11-10-49-25-710.png!
[jira] [Created] (HIVE-25238) Make excluded SSL cipher suites configurable for Hive Web UI and HS2
Yongzhi Chen created HIVE-25238:

Summary: Make excluded SSL cipher suites configurable for Hive Web UI and HS2
Key: HIVE-25238
URL: https://issues.apache.org/jira/browse/HIVE-25238
Project: Hive
Issue Type: Improvement
Components: HiveServer2, Web UI
Reporter: Yongzhi Chen

When starting a Jetty HTTP server, one can explicitly exclude certain (insecure) SSL cipher suites. This can be especially important when Hive needs to be compliant with security regulations. We need to add configuration properties so that both the Hive Web UI and HiveServer2 support this.
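For illustration only: Hive's actual change would go through Jetty's Java configuration, but the concept of excluding weak cipher suites from a TLS context can be sketched with Python's standard ssl module (the cipher string below is an example, not Hive's list):

```python
import ssl

# Build a server-side TLS context and exclude weak suites,
# analogous in spirit to an "excluded cipher suites" setting.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.set_ciphers("DEFAULT:!RC4:!3DES:!aNULL:!eNULL")

# Verify the weak suites are gone from the enabled set.
enabled = [c["name"] for c in ctx.get_ciphers()]
assert not any("RC4" in name for name in enabled)
```

The point of making this configurable is exactly that the exclusion string can be tightened per deployment without a code change.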
[ANNOUNCE] Apache Hive 2.3.9 Released
The Apache Hive team is proud to announce the release of Apache Hive version 2.3.9.

The Apache Hive (TM) data warehouse software facilitates querying and managing large datasets residing in distributed storage. Built on top of Apache Hadoop (TM), it provides, among other things:

* Tools to enable easy data extract/transform/load (ETL)
* A mechanism to impose structure on a variety of data formats
* Access to files stored either directly in Apache HDFS (TM) or in other data storage systems such as Apache HBase (TM)
* Query execution via Apache Hadoop MapReduce, Apache Tez and Apache Spark frameworks

For Hive release details and downloads, please visit:
https://hive.apache.org/downloads.html

Hive 2.3.9 Release Notes are available here:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12350009&styleName=Html&projectId=12310843

We would like to thank the many contributors who made this release possible.

Regards,
The Apache Hive Team
[jira] [Created] (HIVE-25237) Thrift CLI Service Protocol: Enhance HTTP variant
Matt McCline created HIVE-25237:

Summary: Thrift CLI Service Protocol: Enhance HTTP variant
Key: HIVE-25237
URL: https://issues.apache.org/jira/browse/HIVE-25237
Project: Hive
Issue Type: Improvement
Reporter: Matt McCline
Assignee: Matt McCline

I have been thinking about the (Thrift) CLI Service protocol between the client and server. Cloudera's Prashanth Jayachandran (private e-mail) told me that its original BINARY (TCP/IP) transport is designed +_differently_+ than the newer HTTP transport. HTTP is used when we go through a Gateway. The HTTP design is stateless and different in nature from the direct BINARY TCP/IP connection. This means that today a Hive Server 2 response to an HTTP query request can be lost, and that is part of the design: it is the WARNING we have seen when the Gateway drops its HTTP connection to Hive Server 2. We had been thinking this was a bug, but it is by design.

I think the HTTP design needs a rethink. When I worked for Tandem Computers a long time ago, messages were fault-tolerant. They used a message sequence number. When you send a message to a Tandem server, it goes to a process pair: the message gets routed to the current process, called the primary. The primary computes the message work and tells the backup process to remember the results before replying, in case there is a failure. You can see where this goes: if there is a failure before the client gets the result, it retries, and the backup process can resiliently give back the result the primary sent it. This isn't unique to Tandem; even without a process pair, this is a general resilient protocol.

The HTTP design says message loss is possible in both directions (request and response). I think we should adopt a better scheme, though not necessarily a process pair. The first principle of the rethink is that the +_client_+ needs to generate a new operation number (an integer) that replaces the server-side generated random GUID. The client also generates a new message number within its new operation.

So beeline might say ExecuteStatement operationNum = 57 NEW, operationMsgNum = 1. If the client gets an OS connection kind of error, it retries with those (57, 1) numbers. Hive Server 2 will remember the last response. When Hive Server 2 gets a message, there are 3 cases:

1) The sessionId GUID is not valid: for now we reject the request, because it is likely Hive Server 2 killed the session, perhaps because it was restarted.
2) The operationNum or operationMsgNum is new (assert that the msg num increases monotonically): perform the request, save the response, and respond.
3) The (operationNum, operationMsgNum) matches the last request: resiliently respond with the saved result.

I think this message handling is in alignment with the HTTP philosophy that the protocol is stateless and any messages in between can be lost. And it will shield the client from suffering a whole category of message failures that unnecessarily kill queries. This also allows us not to worry about which requests are idempotent; instead, all requests are resilient.

Link to earlier HTTP change: [HIVE-24786: JDBC HttpClient should retry for idempotent and unsent http methods by prasanthj · Pull Request #1983 · apache/hive (github.com)|https://github.com/apache/hive/pull/1983/files]
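The three cases above can be sketched as server-side bookkeeping; this is a minimal illustration with hypothetical names (the real change would live in Hive Server 2's Thrift HTTP handler, not in Python):

```python
class ResilientOperationTracker:
    """Sketch of HS2-side handling of (operationNum, operationMsgNum) retries."""

    def __init__(self, valid_sessions):
        self.valid_sessions = set(valid_sessions)
        self.last = {}  # operationNum -> (last msgNum, saved response)

    def handle(self, session_guid, op_num, msg_num, do_work):
        # Case 1: unknown session GUID -- reject; HS2 likely killed the
        # session, perhaps because it was restarted.
        if session_guid not in self.valid_sessions:
            raise PermissionError("invalid session")
        prev = self.last.get(op_num)
        # Case 3: retry of the last message -- resend the saved response
        # instead of re-running the work.
        if prev is not None and prev[0] == msg_num:
            return prev[1]
        # Case 2: a new message; msg numbers must increase monotonically.
        if prev is not None and msg_num <= prev[0]:
            raise ValueError("non-monotonic message number")
        response = do_work()                     # perform the request
        self.last[op_num] = (msg_num, response)  # save before responding
        return response
```

With this, a beeline retry of ExecuteStatement with the same (57, 1) pair gets the saved response back rather than executing the statement a second time, which is what makes every request resilient regardless of idempotency.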
[jira] [Created] (HIVE-25236) Hive lineage is not generated for columns on CREATE MATERIALIZED VIEW
Radhika Kundam created HIVE-25236:

Summary: Hive lineage is not generated for columns on CREATE MATERIALIZED VIEW
Key: HIVE-25236
URL: https://issues.apache.org/jira/browse/HIVE-25236
Project: Hive
Issue Type: Bug
Reporter: Radhika Kundam

While creating a materialized view, HookContext is supposed to carry lineage info, but it is missing.

CREATE MATERIALIZED VIEW tbl1_view as select * from tbl1;

The HookContext passed from hive.ql.Driver to Atlas's Hive hook through the hookRunner.runPostExecHooks call doesn't have lineage info.
[jira] [Created] (HIVE-25235) Remove ThreadPoolExecutorWithOomHook
David Mollitor created HIVE-25235:

Summary: Remove ThreadPoolExecutorWithOomHook
Key: HIVE-25235
URL: https://issues.apache.org/jira/browse/HIVE-25235
Project: Hive
Issue Type: Improvement
Components: HiveServer2
Reporter: David Mollitor
Assignee: David Mollitor

While I was looking at [HIVE-24846] to improve OOM logging, I realized that this is not a good way to handle OOM:

https://stackoverflow.com/questions/1692230/is-it-possible-to-catch-out-of-memory-exception-in-java

bq. there's likely no easy way for you to recover from it if you do catch it

If we want to handle OOM, it's best to do it from outside the process, using the JVM's own facilities:

{{-XX:+ExitOnOutOfMemoryError}}
{{-XX:OnOutOfMemoryError}}

It seems odd that the OOM handler attempts to load a handler and then do more work when the server is clearly hosed at that point; requesting more work will only add to the memory pressure.
[jira] [Created] (HIVE-25234) Implement ALTER TABLE ... SET PARTITION SPEC to change partitioning on Iceberg tables
László Pintér created HIVE-25234:

Summary: Implement ALTER TABLE ... SET PARTITION SPEC to change partitioning on Iceberg tables
Key: HIVE-25234
URL: https://issues.apache.org/jira/browse/HIVE-25234
Project: Hive
Issue Type: Improvement
Reporter: László Pintér
Assignee: László Pintér

Provide a way to change the schema and the Iceberg partitioning specification using Hive syntax.

{code:sql}
ALTER TABLE tbl SET PARTITION SPEC(...)
{code}
[jira] [Created] (HIVE-25233) Removing deprecated unix_timestamp() UDF
Ashish Sharma created HIVE-25233:

Summary: Removing deprecated unix_timestamp() UDF
Key: HIVE-25233
URL: https://issues.apache.org/jira/browse/HIVE-25233
Project: Hive
Issue Type: Task
Components: UDF
Affects Versions: All Versions
Reporter: Ashish Sharma
Assignee: Ashish Sharma

Description:
The unix_timestamp() UDF was deprecated as part of https://issues.apache.org/jira/browse/HIVE-10728. Internally, GenericUDFUnixTimeStamp extends GenericUDFToUnixTimeStamp and calls to_utc_timestamp() for unix_timestamp(string date) and unix_timestamp(string date, string pattern):

unix_timestamp() => CURRENT_TIMESTAMP
unix_timestamp(string date) => to_utc_timestamp()
unix_timestamp(string date, string pattern) => to_utc_timestamp()

We should clean up unix_timestamp() and point users to to_utc_timestamp().
[jira] [Created] (HIVE-25232) Update Hive syntax to use plural form of time based partition transforms
László Pintér created HIVE-25232:

Summary: Update Hive syntax to use plural form of time based partition transforms
Key: HIVE-25232
URL: https://issues.apache.org/jira/browse/HIVE-25232
Project: Hive
Issue Type: Task
Reporter: László Pintér
Assignee: László Pintér

We should follow the [SparkSQL syntax|https://iceberg.apache.org/spark-ddl/#partitioned-by] when defining partition transforms for Iceberg tables.
Any best practices for hive upgrade from 1.2.1 to 3.1.2
Hi All,

We are planning to upgrade Hive from 1.2.1 to 3.1.2. Are there any best practices we should follow? Is there any chance to upgrade without downtime (if we disable new features like managed tables)?

Thanks.
[jira] [Created] (HIVE-25231) Add an ability to load the generated CSV into a Hive table in replstats
Ayush Saxena created HIVE-25231:

Summary: Add an ability to load the generated CSV into a Hive table in replstats
Key: HIVE-25231
URL: https://issues.apache.org/jira/browse/HIVE-25231
Project: Hive
Issue Type: Improvement
Reporter: Ayush Saxena
Assignee: Ayush Saxena

Add an option to replstats.sh to load the CSV generated by the replication policy into a Hive table/view.