[ https://issues.apache.org/jira/browse/HIVE-23520?focusedWorklogId=441750&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441750 ]
ASF GitHub Bot logged work on HIVE-23520:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 05/Jun/20 08:59
Start Date: 05/Jun/20 08:59
Worklog Time Spent: 10m
Work Description: rbalamohan commented on a change in pull request #1060:
URL: https://github.com/apache/hive/pull/1060#discussion_r435785601
##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadPartitions.java
##########
@@ -154,7 +157,45 @@ private ReplicationState initialReplicationState() throws SemanticException {
     );
   }
+  private boolean isImmutableDataCopy() {
+    //at the time of repl dump, data got referenced externally and not part of the dump.
+    return HiveConf.getBoolVar(context.hiveConf, REPL_DUMP_SKIP_IMMUTABLE_DATA_COPY);
+  }
+
+  /**
+   * Get all partitions and consolidate them into single partition request.
+   * Also, copy relevant stats and other information from original request.
+   *
+   * @throws SemanticException
+   */
+  private void addConsolidatedPartitionDesc() throws Exception {
+    List<AlterTableAddPartitionDesc.PartitionDesc> partitions = new LinkedList<>();
+    for (AlterTableAddPartitionDesc alterTableAddPartitionDesc : event.partitionDescriptions(tableDesc)) {
+
+      AlterTableAddPartitionDesc.PartitionDesc src = alterTableAddPartitionDesc.getPartitions().get(0);
+
+      partitions.add(new AlterTableAddPartitionDesc.PartitionDesc(
+          src.getPartSpec(), src.getLocation(), src.getPartParams(), src.getInputFormat(),
+          src.getOutputFormat(), src.getNumBuckets(), src.getCols(), src.getSerializationLib(),
+          src.getSerdeParams(), src.getBucketCols(), src.getSortCols(), src.getColStats(),
+          src.getWriteId()));
+    }
+    AlterTableAddPartitionDesc consolidatedPartitionDesc = new AlterTableAddPartitionDesc(tableDesc.getDatabaseName(),
+        tableDesc.getTableName(), true, partitions);
+
+    addPartition(false, consolidatedPartitionDesc, null);
+    if (partitions.size() > 0) {
+      LOG.info("Added {} partitions", partitions.size());
+    }
+  }
+
   private TaskTracker forNewTable() throws Exception {
+    if (isImmutableDataCopy()) {
Review comment:
Since we rely on "REPL_DUMP_SKIP_IMMUTABLE_DATA_COPY" in both dump and load,
I haven't added that config here. We didn't honor metadata-only dump earlier
anyway; see https://issues.apache.org/jira/browse/HIVE-23499
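
For readers skimming the hunk above, here is a minimal, self-contained sketch
of the consolidation pattern it uses: take the first partition from each
single-partition request and batch them into one request. The class and field
names below (ConsolidatePartitions, PartitionDesc, AddPartitionRequest) are
hypothetical stand-ins for illustration only, not Hive's actual classes, and
the sketch assumes, as the hunk does, that every incoming request carries at
least one partition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ConsolidatePartitions {

    // Hypothetical stand-in for one partition's metadata.
    static final class PartitionDesc {
        final String partSpec;
        final String location;

        PartitionDesc(String partSpec, String location) {
            this.partSpec = partSpec;
            this.location = location;
        }
    }

    // Hypothetical stand-in for an add-partition request on one table.
    static final class AddPartitionRequest {
        final String table;
        final List<PartitionDesc> partitions;

        AddPartitionRequest(String table, List<PartitionDesc> partitions) {
            this.table = table;
            this.partitions = partitions;
        }
    }

    // Collect the first partition of every request into a single batched request.
    // Assumption: each incoming request holds at least one partition.
    static AddPartitionRequest consolidate(String table, List<AddPartitionRequest> requests) {
        List<PartitionDesc> all = new ArrayList<>();
        for (AddPartitionRequest request : requests) {
            PartitionDesc src = request.partitions.get(0);
            // Copy the fields so the batched request does not share state with the source.
            all.add(new PartitionDesc(src.partSpec, src.location));
        }
        return new AddPartitionRequest(table, all);
    }

    public static void main(String[] args) {
        AddPartitionRequest first = new AddPartitionRequest("sales",
            Arrays.asList(new PartitionDesc("ds=2020-06-01", "/warehouse/sales/ds=2020-06-01")));
        AddPartitionRequest second = new AddPartitionRequest("sales",
            Arrays.asList(new PartitionDesc("ds=2020-06-02", "/warehouse/sales/ds=2020-06-02")));

        AddPartitionRequest merged = consolidate("sales", Arrays.asList(first, second));
        System.out.println("Consolidated " + merged.partitions.size() + " partitions into one request");
    }
}
```

Batching this way trades many small metadata calls for one larger call; whether
that is appropriate for a given table depends on partition count and on how the
receiving side handles large requests.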
Issue Time Tracking
-------------------
Worklog Id: (was: 441750)
Time Spent: 40m (was: 0.5h)
> REPL: repl dump could add support for immutable dataset
> -------------------------------------------------------
>
> Key: HIVE-23520
> URL: https://issues.apache.org/jira/browse/HIVE-23520
> Project: Hive
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Priority: Minor
> Labels: pull-request-available
> Attachments: HIVE-23520.1.patch
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Currently, "REPL DUMP" copies the entire dataset, along with partition
> information, stats, etc., into its dump folder. However, there are cases
> (e.g., large reference datasets) where we need a way to retain only the
> metadata, along with partition information and stats.