[jira] [Created] (HIVE-27014) Iceberg: getSplits/planTasks should filter out relevant folders instead of scanning entire table

2023-02-02 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HIVE-27014:
---

 Summary: Iceberg: getSplits/planTasks should filter out relevant 
folders instead of scanning entire table
 Key: HIVE-27014
 URL: https://issues.apache.org/jira/browse/HIVE-27014
 Project: Hive
  Issue Type: Improvement
  Components: Iceberg integration
Reporter: Rajesh Balamohan


With dynamic partition pruning, only relevant folders in fact tables are 
scanned.

In Tez, DynamicPartitionPruner sets the relevant filters. In Iceberg, however, these 
filters are applied only after "Table:planTasks()" is invoked, which forces the 
entire table metadata to be scanned and the unwanted partitions to be thrown 
away afterwards.

This makes split computation expensive (e.g. for store_sales, it has to look at 
all 1800+ partitions and then discard the unwanted ones).

For short-running queries, split computation alone takes 3-5+ seconds. 
Creating this ticket as a placeholder to make use of the relevant filters from 
DPP.
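The pruning-order problem can be sketched with a toy model (all names below are illustrative, not Hive's or Iceberg's actual APIs):

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative model only. With DPP, the pruner already knows which partition
// folders survive, so split planning should only visit those instead of all
// 1800+ partitions of a table like store_sales.
public class DppPruningSketch {

    // Current behaviour (simplified): plan over every partition, then discard.
    static List<String> planThenFilter(List<String> allPartitions, Set<String> dppSurvivors) {
        List<String> planned = allPartitions;            // scans ALL partition metadata
        return planned.stream()
                .filter(dppSurvivors::contains)          // unwanted partitions thrown away late
                .collect(Collectors.toList());
    }

    // Proposed behaviour: apply the DPP filter first, so planning only sees
    // the relevant folders.
    static List<String> filterThenPlan(List<String> allPartitions, Set<String> dppSurvivors) {
        return allPartitions.stream()
                .filter(dppSurvivors::contains)          // prune before planning
                .collect(Collectors.toList());           // planning cost tracks survivors
    }
}
```

Both methods return the same splits; the difference is that in the second, the planning cost is proportional to the surviving partitions rather than to the whole table.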



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27015) Add support to accept client connections with X-CSRF-Token as part of header in http transport mode

2023-02-02 Thread Taraka Rama Rao Lethavadla (Jira)
Taraka Rama Rao Lethavadla created HIVE-27015:
-

 Summary: Add support to accept client connections with 
X-CSRF-Token as part of header in http transport mode
 Key: HIVE-27015
 URL: https://issues.apache.org/jira/browse/HIVE-27015
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 4.0.0
Reporter: Taraka Rama Rao Lethavadla
Assignee: Taraka Rama Rao Lethavadla


Today, in HTTP transport mode, clients need to send the *X-XSRF-HEADER* header 
introduced as part of HIVE-13853.

This Jira is about adding support for accepting connections from clients (e.g. 
Hue) that send *X-CSRF-Token* instead.
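A minimal sketch of the intended behaviour, assuming a plain header map (hypothetical helper, not HiveServer2's actual filter code; real servlet header lookup is also case-insensitive):

```java
import java.util.Map;

// Illustrative check: accept a request as protected if it carries either the
// header Hive already accepts (X-XSRF-HEADER, from HIVE-13853) or the
// X-CSRF-Token header that clients such as Hue send.
public class CsrfHeaderCheck {
    static boolean hasCsrfHeader(Map<String, String> headers) {
        return headers.containsKey("X-XSRF-HEADER")
                || headers.containsKey("X-CSRF-Token");
    }
}
```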

 





[jira] [Created] (HIVE-27016) Invoke optional output committer in TezProcessor

2023-02-02 Thread Yi Zhang (Jira)
Yi Zhang created HIVE-27016:
---

 Summary: Invoke optional output committer in TezProcessor
 Key: HIVE-27016
 URL: https://issues.apache.org/jira/browse/HIVE-27016
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor, StorageHandler
Affects Versions: 3.1.3
Reporter: Yi Zhang


This is a backport of HIVE-24629 and HIVE-24867, so that StorageHandlers with 
their own OutputCommitter can run in Tez.





[jira] [Created] (HIVE-27017) Option to use createTable DDLTask in CTAS for StorageHandler

2023-02-02 Thread Yi Zhang (Jira)
Yi Zhang created HIVE-27017:
---

 Summary: Option to use createTable DDLTask in CTAS for 
StorageHandler
 Key: HIVE-27017
 URL: https://issues.apache.org/jira/browse/HIVE-27017
 Project: Hive
  Issue Type: Improvement
  Components: StorageHandler
Affects Versions: 3.1.3
Reporter: Yi Zhang


This is to add a directInsert option for StorageHandler and to advance the 
DDLTask for CTAS when it is in StorageHandler directInsert mode. This is a 
partial backport of HIVE-26771.





[jira] [Created] (HIVE-27018) Aborted transaction cleanup outside compaction process

2023-02-02 Thread Sourabh Badhya (Jira)
Sourabh Badhya created HIVE-27018:
-

 Summary:  Aborted transaction cleanup outside compaction process
 Key: HIVE-27018
 URL: https://issues.apache.org/jira/browse/HIVE-27018
 Project: Hive
  Issue Type: Improvement
Reporter: Sourabh Badhya
Assignee: Sourabh Badhya


Aborted transaction processing is tightly integrated into the compaction 
pipeline and consists of 3 main stages: Initiator, Compactor (Worker), and 
Cleaner. This could be simplified by doing all the work on the Cleaner side.


*Potential benefits -* 
There are major advantages to implementing this on the Cleaner side - 
 1) Currently, an aborted txn in the TXNS table blocks the cleaning of the 
TXN_TO_WRITE_ID table, since nothing above MIN(aborted txnid) gets cleaned in 
the current implementation. With this on the Cleaner side, the Cleaner 
regularly checks and cleans the aborted records in the TXN_COMPONENTS table, 
which in turn lets the AcidTxnCleanerService clean the aborted txns in the 
TXNS table.
 2) The Initiator and Worker do nothing for tables which contain only aborted 
directories; it is the Cleaner which removes those directories. Hence all 
Initiator and Worker operations for such tables are wasteful, and they would 
be avoided.
 3) Dynamic partition (DP) writes which are aborted are currently skipped by 
the Worker, so once again the Cleaner is the one deleting the aborted 
directories. All Initiator and Worker operations for such entries are wasteful 
and would likewise be avoided.

*Proposed solution -* 
*Implement logic to handle aborted transactions exclusively in the Cleaner.*
Implement logic to fetch the TXN_COMPONENTS entries associated with 
transactions in the aborted state and send the required information to the 
Cleaner. The Cleaner must clean up the aborted deltas/delete deltas by using 
the aborted directories in the AcidState of the table/partition.
It is also better to separate the entities which provide compaction and 
abort-cleanup information, to enhance code modularity. This can be done as 
follows -


The Cleaner can be divided into separate entities - 
*1) Handler* - This entity fetches data from the relevant metastore DB tables 
and converts it into a request entity called CleaningRequest. It would also do 
SQL operations after cleanup (postprocess). Every type of cleaning request is 
provided by a separate handler.
*2) Filesystem remover* - This entity fetches the cleaning requests from the 
various handlers and deletes the files/directories named in each request.

*This division allows for dynamic extensibility of cleanup from multiple 
handlers. Every handler is responsible for providing cleaning requests from a 
specific source.*
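The division could be sketched roughly like this (the names Handler, CleaningRequest, and the remover follow the description above; every signature is an assumption, not Hive's actual code):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the proposed split between handlers and the remover.
class CleaningRequest {
    final String path;                          // directory the remover should delete
    CleaningRequest(String path) { this.path = path; }
}

interface Handler {
    List<CleaningRequest> fetchRequests();      // read from metastore DB tables
    void postprocess(CleaningRequest r);        // SQL cleanup after deletion
}

// The filesystem remover is agnostic of where a request came from: it drains
// every registered handler and deletes whatever each request names. New
// cleanup sources only need a new Handler implementation.
class FsRemover {
    final List<String> deleted = new ArrayList<>();  // stands in for real FS deletes
    void clean(List<Handler> handlers) {
        for (Handler h : handlers) {
            for (CleaningRequest r : h.fetchRequests()) {
                deleted.add(r.path);            // delete the directory
                h.postprocess(r);               // then update metastore state
            }
        }
    }
}
```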

This solution is also resilient: in the event of an abrupt metastore 
shutdown, the Cleaner can still see the relevant entries in the metastore DB 
and retry the cleaning task for each entry.





[jira] [Created] (HIVE-27019) Split Cleaner into separate manageable modular entities

2023-02-02 Thread Sourabh Badhya (Jira)
Sourabh Badhya created HIVE-27019:
-

 Summary: Split Cleaner into separate manageable modular entities
 Key: HIVE-27019
 URL: https://issues.apache.org/jira/browse/HIVE-27019
 Project: Hive
  Issue Type: Sub-task
Reporter: Sourabh Badhya
Assignee: Sourabh Badhya


As described by the parent task - 
Cleaner can be divided into separate entities like -
*1) Handler* - This entity fetches data from the relevant metastore DB tables 
and converts it into a request entity called CleaningRequest. It would also do 
SQL operations after cleanup (postprocess). Every type of cleaning request is 
provided by a separate handler.
*2) Filesystem remover* - This entity fetches the cleaning requests from the 
various handlers and deletes the files/directories named in each request.





[jira] [Created] (HIVE-27020) Implement a separate handler to handle abort transaction cleanup

2023-02-02 Thread Sourabh Badhya (Jira)
Sourabh Badhya created HIVE-27020:
-

 Summary: Implement a separate handler to handle abort transaction 
cleanup
 Key: HIVE-27020
 URL: https://issues.apache.org/jira/browse/HIVE-27020
 Project: Hive
  Issue Type: Sub-task
Reporter: Sourabh Badhya
Assignee: Sourabh Badhya


As described in the parent task, once the Cleaner is separated into different 
entities, implement a separate handler which can handle requests for aborted 
transaction cleanup. This would move aborted transaction cleanup exclusively 
to the Cleaner.
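A rough sketch of what such a handler's selection step might look like (the row shape and the 'a' state marker are assumptions for illustration, not Hive's actual metastore schema):

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical selection step for an aborted-transaction handler: keep only
// TXN_COMPONENTS rows whose owning transaction is in the aborted state, and
// hand those to the cleaner as cleaning requests.
public class AbortedTxnSketch {
    record TxnComponent(long txnId, String table, char txnState) {}

    static List<TxnComponent> abortedOnly(List<TxnComponent> rows) {
        return rows.stream()
                .filter(r -> r.txnState() == 'a')   // aborted txns only
                .collect(Collectors.toList());
    }
}
```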


