[jira] [Work logged] (HIVE-23520) REPL: repl dump could add support for immutable dataset

ASF GitHub Bot (Jira) Fri, 05 Jun 2020 01:48:12 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-23520?focusedWorklogId=441744&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441744
 ]


ASF GitHub Bot logged work on HIVE-23520:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 05/Jun/20 08:47
            Start Date: 05/Jun/20 08:47
    Worklog Time Spent: 10m 
      Work Description: aasha commented on a change in pull request #1060:
URL: https://github.com/apache/hive/pull/1060#discussion_r435778877



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadPartitions.java
##########
@@ -154,7 +157,45 @@ private ReplicationState initialReplicationState() throws SemanticException {
     );
   }
 
+  private boolean isImmutableDataCopy() {
+    //at the time of repl dump, data got referenced externally and not part of the dump.
+    return HiveConf.getBoolVar(context.hiveConf, REPL_DUMP_SKIP_IMMUTABLE_DATA_COPY);
+  }
+
+  /**
+   * Get all partitions and consolidate them into single partition request.
+   * Also, copy relevant stats and other information from original request.
+   *
+   * @throws SemanticException
+   */
+  private void addConsolidatedPartitionDesc() throws Exception {
+    List<AlterTableAddPartitionDesc.PartitionDesc> partitions = new LinkedList<>();
+    for (AlterTableAddPartitionDesc alterTableAddPartitionDesc : event.partitionDescriptions(tableDesc)) {
+
+      AlterTableAddPartitionDesc.PartitionDesc src = alterTableAddPartitionDesc.getPartitions().get(0);
+
+      partitions.add(new AlterTableAddPartitionDesc.PartitionDesc(
+          src.getPartSpec(), src.getLocation(), src.getPartParams(), src.getInputFormat(),
+          src.getOutputFormat(), src.getNumBuckets(), src.getCols(), src.getSerializationLib(),
+          src.getSerdeParams(), src.getBucketCols(), src.getSortCols(), src.getColStats(),
+          src.getWriteId()));
+    }
+    AlterTableAddPartitionDesc consolidatedPartitionDesc = new AlterTableAddPartitionDesc(tableDesc.getDatabaseName(),
+        tableDesc.getTableName(), true, partitions);
+
+    addPartition(false, consolidatedPartitionDesc, null);
+    if (partitions.size() > 0) {
+      LOG.info("Added {} partitions", partitions.size());
+    }
+  }
+
   private TaskTracker forNewTable() throws Exception {
+    if (isImmutableDataCopy()) {

Review comment:
   Can we do this for both isImmutableDataCopy and the metadata-only repl dump task?
   This optimization would help every metadata-only load, right?
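
   A minimal sketch of what the question above could amount to, assuming the
   consolidated-partition path is gated on either of two boolean settings. This
   is not the PR's code: the class, the Map-based configuration stand-in, and
   both key names are hypothetical placeholders, and the real change would read
   the flags through HiveConf.

```java
import java.util.Map;

public class ConsolidatedLoadCheck {
    // Hypothetical placeholder keys, not real Hive property names.
    static final String SKIP_IMMUTABLE_DATA_COPY = "example.repl.skip.immutable.data.copy";
    static final String METADATA_ONLY = "example.repl.metadata.only";

    // Reads a boolean flag from the stand-in configuration; absent keys count as false.
    static boolean flag(Map<String, String> conf, String key) {
        return Boolean.parseBoolean(conf.getOrDefault(key, "false"));
    }

    // True when no data files travel with the dump, so all partitions can be
    // added through one consolidated request instead of one task per partition.
    static boolean useConsolidatedPartitionDesc(Map<String, String> conf) {
        return flag(conf, SKIP_IMMUTABLE_DATA_COPY) || flag(conf, METADATA_ONLY);
    }

    public static void main(String[] args) {
        Map<String, String> metadataOnly = Map.of(METADATA_ONLY, "true");
        Map<String, String> regular = Map.of();
        System.out.println(useConsolidatedPartitionDesc(metadataOnly));
        System.out.println(useConsolidatedPartitionDesc(regular));
    }
}
```

   With a single predicate like this, forNewTable() would need only one branch
   for both cases, assuming the metadata-only load really has no per-partition
   data copy to schedule.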




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 441744)
    Time Spent: 0.5h  (was: 20m)

> REPL: repl dump could add support for immutable dataset
> -------------------------------------------------------
>
>                 Key: HIVE-23520
>                 URL: https://issues.apache.org/jira/browse/HIVE-23520
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Minor
>              Labels: pull-request-available
>         Attachments: HIVE-23520.1.patch
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently, "REPL DUMP" copies the entire dataset, along with partition 
> information, stats, etc., into its dump folder. However, there are cases (e.g., 
> large reference datasets) where we need a way to retain only the metadata, along 
> with partition information and stats.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
