[ https://issues.apache.org/jira/browse/HIVE-23520?focusedWorklogId=441768&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441768 ]

ASF GitHub Bot logged work on HIVE-23520:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 05/Jun/20 10:01
            Start Date: 05/Jun/20 10:01
    Worklog Time Spent: 10m 
      Work Description: rbalamohan commented on a change in pull request #1060:
URL: https://github.com/apache/hive/pull/1060#discussion_r435820119



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadPartitions.java
##########
@@ -154,7 +157,45 @@ private ReplicationState initialReplicationState() throws SemanticException {
     );
   }
 
+  private boolean isImmutableDataCopy() {
+    //at the time of repl dump, data got referenced externally and not part of the dump.
+    return HiveConf.getBoolVar(context.hiveConf, REPL_DUMP_SKIP_IMMUTABLE_DATA_COPY);
+  }
+
+  /**
+   * Get all partitions and consolidate them into single partition request.
+   * Also, copy relevant stats and other information from original request.
+   *
+   * @throws SemanticException
+   */
+  private void addConsolidatedPartitionDesc() throws Exception {
+    List<AlterTableAddPartitionDesc.PartitionDesc> partitions = new LinkedList<>();
+    for (AlterTableAddPartitionDesc alterTableAddPartitionDesc : event.partitionDescriptions(tableDesc)) {
+
+      AlterTableAddPartitionDesc.PartitionDesc src = alterTableAddPartitionDesc.getPartitions().get(0);
+
+      partitions.add(new AlterTableAddPartitionDesc.PartitionDesc(
+          src.getPartSpec(), src.getLocation(), src.getPartParams(), src.getInputFormat(),
+          src.getOutputFormat(), src.getNumBuckets(), src.getCols(), src.getSerializationLib(),
+          src.getSerdeParams(), src.getBucketCols(), src.getSortCols(), src.getColStats(),
+          src.getWriteId()));
+    }
+    AlterTableAddPartitionDesc consolidatedPartitionDesc = new AlterTableAddPartitionDesc(tableDesc.getDatabaseName(),
+        tableDesc.getTableName(), true, partitions);
+
+    addPartition(false, consolidatedPartitionDesc, null);
+    if (partitions.size() > 0) {
+      LOG.info("Added {} partitions", partitions.size());
+    }
+  }
+
   private TaskTracker forNewTable() throws Exception {
+    if (isImmutableDataCopy()) {

Review comment:
       Synced up offline. Added the REPL_DUMP_METADATA_ONLY flag as well, so that this optimisation is also available for metadata-only operations.
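The load-side optimisation discussed above (gate on a flag, then fold every per-partition add request into one request) can be sketched in plain Java. This is a minimal sketch under stated assumptions, not the actual Hive code: PartitionSpec, BatchAddPartitions, shouldConsolidate and consolidate are hypothetical stand-ins for AlterTableAddPartitionDesc and its PartitionDesc entries, and the two booleans stand in for the REPL_DUMP_SKIP_IMMUTABLE_DATA_COPY and REPL_DUMP_METADATA_ONLY settings. It assumes the third constructor argument (true in the diff) is an if-not-exists flag.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for AlterTableAddPartitionDesc.PartitionDesc; only two fields are modelled.
final class PartitionSpec {
    final Map<String, String> partSpec;
    final String location;

    PartitionSpec(Map<String, String> partSpec, String location) {
        this.partSpec = partSpec;
        this.location = location;
    }
}

// Hypothetical stand-in for AlterTableAddPartitionDesc: one add-partitions request for a table.
final class BatchAddPartitions {
    final String dbName;
    final String tableName;
    final boolean ifNotExists; // assumed meaning of the "true" argument in the diff
    final List<PartitionSpec> partitions;

    BatchAddPartitions(String dbName, String tableName, boolean ifNotExists, List<PartitionSpec> partitions) {
        this.dbName = dbName;
        this.tableName = tableName;
        this.ifNotExists = ifNotExists;
        this.partitions = Collections.unmodifiableList(new ArrayList<>(partitions));
    }
}

final class ConsolidationSketch {
    // Assumption: consolidation applies when the dump holds no data to copy, either because
    // immutable data is referenced externally or because only metadata was dumped.
    static boolean shouldConsolidate(boolean skipImmutableDataCopy, boolean metadataOnly) {
        return skipImmutableDataCopy || metadataOnly;
    }

    // Collect the first partition of every per-partition request into one request.
    // The diff copies each PartitionDesc field by field; this sketch reuses the object instead.
    static BatchAddPartitions consolidate(String db, String table, List<BatchAddPartitions> perPartition) {
        List<PartitionSpec> merged = new ArrayList<>();
        for (BatchAddPartitions request : perPartition) {
            merged.add(request.partitions.get(0));
        }
        return new BatchAddPartitions(db, table, true, merged);
    }

    public static void main(String[] args) {
        List<BatchAddPartitions> requests = new ArrayList<>();
        for (String day : new String[] {"2020-06-01", "2020-06-02", "2020-06-03"}) {
            PartitionSpec p = new PartitionSpec(Map.of("dt", day), "/warehouse/ref_data/dt=" + day);
            requests.add(new BatchAddPartitions("default", "ref_data", true, List.of(p)));
        }
        if (shouldConsolidate(true, false)) {
            BatchAddPartitions batch = consolidate("default", "ref_data", requests);
            // Prints: Added 3 partitions in one request
            System.out.println("Added " + batch.partitions.size() + " partitions in one request");
        }
    }
}
```

The design choice is plain batching: N single-partition requests become one request, and partition order is preserved because the merged list is filled in iteration order.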






Issue Time Tracking
-------------------

    Worklog Id:     (was: 441768)
    Time Spent: 50m  (was: 40m)

> REPL: repl dump could add support for immutable dataset
> -------------------------------------------------------
>
>                 Key: HIVE-23520
>                 URL: https://issues.apache.org/jira/browse/HIVE-23520
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Minor
>              Labels: pull-request-available
>         Attachments: HIVE-23520.1.patch
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently, "REPL DUMP" copies the entire dataset, along with partition
> information, stats, etc., into its dump folder. However, there are cases (e.g.
> large reference datasets) where we need a way to retain only the metadata,
> along with partition information and stats.
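The dump-side behaviour requested in this issue can be illustrated with a small Java sketch: metadata, partition information and stats are always recorded, while data files are copied only when a skip flag is off. This is a hypothetical illustration, not Hive's REPL DUMP implementation; DumpPlanSketch, DumpPlan, planDump and the skipDataCopy parameter are invented names, and the entry strings are placeholders rather than Hive's real dump layout. In the pull request the corresponding switch is the REPL_DUMP_SKIP_IMMUTABLE_DATA_COPY setting.

```java
import java.util.ArrayList;
import java.util.List;

final class DumpPlanSketch {
    // Hypothetical summary of what a dump would contain for one table.
    static final class DumpPlan {
        final List<String> entries = new ArrayList<>();         // metadata, partitions, stats
        final List<String> dataFilesToCopy = new ArrayList<>(); // data copied into the dump folder
    }

    // Metadata, partition information and stats are always recorded; data files are
    // copied only when skipDataCopy is false.
    static DumpPlan planDump(String table, List<String> partitions, List<String> dataFiles,
                             boolean skipDataCopy) {
        DumpPlan plan = new DumpPlan();
        plan.entries.add("table-metadata:" + table);
        for (String partition : partitions) {
            plan.entries.add("partition:" + partition);
            plan.entries.add("stats:" + partition);
        }
        if (!skipDataCopy) {
            plan.dataFilesToCopy.addAll(dataFiles);
        }
        return plan;
    }

    public static void main(String[] args) {
        List<String> parts = List.of("dt=2020-06-01", "dt=2020-06-02");
        List<String> files = List.of("dt=2020-06-01/000000_0", "dt=2020-06-02/000000_0");
        DumpPlan full = planDump("ref_data", parts, files, false);
        DumpPlan skipped = planDump("ref_data", parts, files, true);
        // Prints: 5 metadata entries, 2 files to copy
        System.out.println(full.entries.size() + " metadata entries, " + full.dataFilesToCopy.size() + " files to copy");
        // Prints: 5 metadata entries, 0 files to copy
        System.out.println(skipped.entries.size() + " metadata entries, " + skipped.dataFilesToCopy.size() + " files to copy");
    }
}
```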



--
This message was sent by Atlassian Jira
(v8.3.4#803005)