[jira] [Work logged] (HIVE-23520) REPL: repl dump could add support for immutable dataset

ASF GitHub Bot (Jira) Fri, 05 Jun 2020 02:00:27 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-23520?focusedWorklogId=441750&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441750
 ]


ASF GitHub Bot logged work on HIVE-23520:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 05/Jun/20 08:59
            Start Date: 05/Jun/20 08:59
    Worklog Time Spent: 10m 
      Work Description: rbalamohan commented on a change in pull request #1060:
URL: https://github.com/apache/hive/pull/1060#discussion_r435785601



##########
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadPartitions.java
##########
@@ -154,7 +157,45 @@ private ReplicationState initialReplicationState() throws SemanticException {
     );
   }
 
+  private boolean isImmutableDataCopy() {
+    //at the time of repl dump, data got referenced externally and not part of the dump.
+    return HiveConf.getBoolVar(context.hiveConf, REPL_DUMP_SKIP_IMMUTABLE_DATA_COPY);
+  }
+
+  /**
+   * Get all partitions and consolidate them into single partition request.
+   * Also, copy relevant stats and other information from original request.
+   *
+   * @throws SemanticException
+   */
+  private void addConsolidatedPartitionDesc() throws Exception {
+    List<AlterTableAddPartitionDesc.PartitionDesc> partitions = new LinkedList<>();
+    for (AlterTableAddPartitionDesc alterTableAddPartitionDesc : event.partitionDescriptions(tableDesc)) {
+
+      AlterTableAddPartitionDesc.PartitionDesc src = alterTableAddPartitionDesc.getPartitions().get(0);
+
+      partitions.add(new AlterTableAddPartitionDesc.PartitionDesc(
+          src.getPartSpec(), src.getLocation(), src.getPartParams(), src.getInputFormat(),
+          src.getOutputFormat(), src.getNumBuckets(), src.getCols(), src.getSerializationLib(),
+          src.getSerdeParams(), src.getBucketCols(), src.getSortCols(), src.getColStats(),
+          src.getWriteId()));
+    }
+    AlterTableAddPartitionDesc consolidatedPartitionDesc = new AlterTableAddPartitionDesc(tableDesc.getDatabaseName(),
+        tableDesc.getTableName(), true, partitions);
+
+    addPartition(false, consolidatedPartitionDesc, null);
+    if (partitions.size() > 0) {
+      LOG.info("Added {} partitions", partitions.size());
+    }
+  }
+
   private TaskTracker forNewTable() throws Exception {
+    if (isImmutableDataCopy()) {

Review comment:
       Since we rely on "REPL_DUMP_SKIP_IMMUTABLE_DATA_COPY" in both dump and load, 
I haven't added that config here. Earlier, we did not honor metadata-only dump 
anyway; see https://issues.apache.org/jira/browse/HIVE-23499
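
The consolidation step in the patch above (addConsolidatedPartitionDesc) can be illustrated with a minimal, self-contained Java sketch. It is not the Hive implementation: PartitionSketch and AddPartitionRequestSketch are hypothetical stand-ins for AlterTableAddPartitionDesc.PartitionDesc and AlterTableAddPartitionDesc, only two of the copied fields are modelled, and the empty-request guard is an assumption that the patch itself does not make.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ConsolidatePartitionsSketch {

    /** Hypothetical stand-in for AlterTableAddPartitionDesc.PartitionDesc (two fields only). */
    static final class PartitionSketch {
        final Map<String, String> partSpec;
        final String location;

        PartitionSketch(Map<String, String> partSpec, String location) {
            this.partSpec = partSpec;
            this.location = location;
        }
    }

    /** Hypothetical stand-in for AlterTableAddPartitionDesc. */
    static final class AddPartitionRequestSketch {
        final String dbName;
        final String tableName;
        final List<PartitionSketch> partitions;

        AddPartitionRequestSketch(String dbName, String tableName, List<PartitionSketch> partitions) {
            this.dbName = dbName;
            this.tableName = tableName;
            this.partitions = partitions;
        }
    }

    /**
     * Folds many single-partition requests into one request, copying the
     * first partition of each source request, as the patch does with get(0).
     */
    static AddPartitionRequestSketch consolidate(
            String dbName, String tableName, List<AddPartitionRequestSketch> requests) {
        List<PartitionSketch> all = new ArrayList<>();
        for (AddPartitionRequestSketch request : requests) {
            if (request.partitions.isEmpty()) {
                continue; // assumption: skip empty requests; the patch has no such guard
            }
            PartitionSketch src = request.partitions.get(0);
            // Copy the spec so the consolidated request does not alias the source map.
            all.add(new PartitionSketch(new LinkedHashMap<>(src.partSpec), src.location));
        }
        return new AddPartitionRequestSketch(dbName, tableName, all);
    }

    public static void main(String[] args) {
        List<AddPartitionRequestSketch> requests = Arrays.asList(
            new AddPartitionRequestSketch("db", "t", Collections.singletonList(
                new PartitionSketch(Collections.singletonMap("ds", "1"), "/warehouse/t/ds=1"))),
            new AddPartitionRequestSketch("db", "t", Collections.singletonList(
                new PartitionSketch(Collections.singletonMap("ds", "2"), "/warehouse/t/ds=2"))));

        AddPartitionRequestSketch consolidated = consolidate("db", "t", requests);
        System.out.println("Consolidated partitions: " + consolidated.partitions.size());
    }
}
```

Issuing one consolidated request instead of one request per partition is what lets the load side add all partitions in a single call, which is the apparent intent of the single addPartition invocation in the patch.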




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 441750)
    Time Spent: 40m  (was: 0.5h)

> REPL: repl dump could add support for immutable dataset
> -------------------------------------------------------
>
>                 Key: HIVE-23520
>                 URL: https://issues.apache.org/jira/browse/HIVE-23520
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Minor
>              Labels: pull-request-available
>         Attachments: HIVE-23520.1.patch
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently, "REPL DUMP" ends up copying the entire dataset, along with partition 
> information, stats, etc., into its dump folder. However, there are cases (e.g., 
> large reference datasets) where we need a way to retain only the metadata, along 
> with partition information and stats.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
