[ https://issues.apache.org/jira/browse/HIVE-24367?focusedWorklogId=707444&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-707444 ]

ASF GitHub Bot logged work on HIVE-24367:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 12/Jan/22 12:20
            Start Date: 12/Jan/22 12:20
    Worklog Time Spent: 10m 
    Work Description: HarshitGupta11 commented on a change in pull request #2771:
URL: https://github.com/apache/hive/pull/2771#discussion_r783023540



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##########
@@ -1230,12 +1232,16 @@ else if(statementId != parsedDelta.statementId) {
    * that for any dir, either all files are acid or all are not.
    */
   public static ParsedDelta parsedDelta(Path deltaDir, FileSystem fs) throws IOException {
-    return parsedDelta(deltaDir, fs, null);
+    return parsedDelta(deltaDir, fs, null, false, -1);
   }
 
-  private static ParsedDelta parsedDelta(Path deltaDir, FileSystem fs, HdfsDirSnapshot dirSnapshot)
+  private static ParsedDelta parsedDelta(Path deltaDir, FileSystem fs, HdfsDirSnapshot dirSnapshot,
+      boolean canTrim, long highWaterMark)
       throws IOException {
     ParsedDeltaLight deltaLight = ParsedDeltaLight.parse(deltaDir);
+    if(canTrim && !(deltaLight.minWriteId >= highWaterMark)){

Review comment:
       Yeah, it will not read the deltas that are not part of the current transaction or were written before the current transaction.
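A minimal Java sketch of the trimming condition in the diff, assuming delta directories follow the usual delta_<minWriteId>_<maxWriteId>[_<statementId>] naming. The class DeltaTrimSketch and its methods are hypothetical and are not Hive's ParsedDeltaLight; the sketch only models which directories the condition canTrim && !(minWriteId >= highWaterMark) would skip, not what parsedDelta actually returns for them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of trimming delta directories by write ID.
 * Not Hive's ParsedDeltaLight: it only mimics the delta_<min>_<max>[_<stmt>] naming.
 */
class DeltaTrimSketch {
    // Matches delta_<minWriteId>_<maxWriteId> with an optional _<statementId> suffix.
    private static final Pattern DELTA =
        Pattern.compile("delta_(\\d+)_(\\d+)(?:_(\\d+))?");

    /** Returns the minWriteId encoded in a delta directory name (assumed naming). */
    static long minWriteId(String dirName) {
        Matcher m = DELTA.matcher(dirName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a delta dir: " + dirName);
        }
        return Long.parseLong(m.group(1));
    }

    /**
     * Keeps a delta unless the diff's skip condition holds:
     * canTrim && !(minWriteId >= highWaterMark).
     */
    static List<String> trim(List<String> deltaDirs, boolean canTrim, long highWaterMark) {
        List<String> kept = new ArrayList<>();
        for (String dir : deltaDirs) {
            if (canTrim && !(minWriteId(dir) >= highWaterMark)) {
                continue; // minWriteId below the high-water mark: not read
            }
            kept.add(dir);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> dirs = List.of(
            "delta_0000001_0000001_0000",
            "delta_0000002_0000002_0000",
            "delta_0000003_0000003_0000");
        System.out.println(trim(dirs, true, 3));         // prints [delta_0000003_0000003_0000]
        System.out.println(trim(dirs, false, 3).size()); // prints 3
    }
}
```

With canTrim set to true and a high-water mark of 3, only delta_0000003_0000003_0000 is kept. With canTrim set to false every delta is kept, which corresponds to the untrimmed path the public parsedDelta(deltaDir, fs) overload takes by passing false and -1.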






Issue Time Tracking
-------------------

    Worklog Id:     (was: 707444)
    Time Spent: 1h 50m  (was: 1h 40m)

> Explore whether HiveAlterHandler::alterTable can be optimised for non-partitioned tables
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-24367
>                 URL: https://issues.apache.org/jira/browse/HIVE-24367
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>            Reporter: Rajesh Balamohan
>            Assignee: Harshit Gupta
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Writing many deltas to a non-partitioned table causes runtime issues when a large number of delta folders are present.
>
> The following code in HiveAlterHandler is invoked for every insert operation. It calls {{updateTableStatsSlow}} on every insert, causing runtime delays.
>
> {noformat}
> if (MetaStoreUtils.requireCalStats(null, null, newt, environmentContext) &&
>     !isPartitionedTable) {
>   Database db = msdb.getDatabase(catName, newDbName);
>   assert(isReplicated == HiveMetaStore.HMSHandler.isDbReplicationTarget(db));
>   // Update table stats. For partitioned table, we update stats in alterPartition()
>   MetaStoreUtils.updateTableStatsSlow(db, newt, wh, false, true, environmentContext);
> }
> {noformat}
> It would be good to explore whether only the newly added delta can be listed when computing stats. This would avoid a huge listing call during stats collection.
> E.g. queries to reproduce:
> {noformat}
> CREATE TABLE IF NOT EXISTS test (name String, value int);
> INSERT INTO test VALUES('K1',1);
> INSERT INTO test VALUES('K2',2);
> ..
> ..
> ..
> INSERT INTO test VALUES('K20000',2);
> {noformat}
>  
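A rough Java model of the cost described in the issue, under loudly stated assumptions: each INSERT writes exactly one new delta directory, a full stats recompute visits every delta written so far, and a row count stands in for whatever stats are actually computed. IncrementalStatsSketch and Delta are hypothetical names, not HiveAlterHandler or MetaStoreUtils code. The sketch compares re-listing every delta on each insert with listing only the newly added delta and folding its contribution into a running total.

```java
/**
 * Hypothetical model: each INSERT adds one delta directory with some rows.
 * Compares a full recompute (visit every delta) with an incremental update
 * (visit only the new delta). Not Hive code.
 */
class IncrementalStatsSketch {
    /** Hypothetical per-delta stats; only a row count is modeled. */
    static final class Delta {
        final String name;
        final long numRows;

        Delta(String name, long numRows) {
            this.name = name;
            this.numRows = numRows;
        }
    }

    private final java.util.List<Delta> deltas = new java.util.ArrayList<>();
    private long runningRows = 0;   // stats maintained incrementally
    long fullListings = 0;          // delta entries visited by full recomputes
    long incrementalListings = 0;   // delta entries visited by incremental updates

    /** Simulates one INSERT that writes a single new delta. */
    void insert(Delta d) {
        deltas.add(d);

        // Full recompute: visit every delta written so far.
        long total = 0;
        for (Delta existing : deltas) {
            total += existing.numRows;
            fullListings++;
        }

        // Incremental update: visit only the new delta.
        runningRows += d.numRows;
        incrementalListings++;

        // Both strategies must agree as long as deltas are append-only.
        if (total != runningRows) {
            throw new IllegalStateException("stats diverged");
        }
    }

    long numRows() {
        return runningRows;
    }

    public static void main(String[] args) {
        IncrementalStatsSketch t = new IncrementalStatsSketch();
        int n = 20000; // same number of INSERTs as the repro
        for (int i = 1; i <= n; i++) {
            t.insert(new Delta(String.format("delta_%07d_%07d_0000", i, i), 1));
        }
        System.out.println("rows=" + t.numRows());                           // prints rows=20000
        System.out.println("full listings=" + t.fullListings);               // prints full listings=200010000
        System.out.println("incremental listings=" + t.incrementalListings); // prints incremental listings=20000
    }
}
```

For the 20,000 inserts in the repro, the full-listing strategy visits 20000 × 20001 / 2 = 200,010,000 delta entries in total, while the incremental strategy visits 20,000. In this model the running total is only valid while deltas are append-only; anything that rewrites or removes existing deltas (compaction, for example) would presumably call for a full recompute.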


