[
https://issues.apache.org/jira/browse/HIVE-23520?focusedWorklogId=441744&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-441744
]
ASF GitHub Bot logged work on HIVE-23520:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 05/Jun/20 08:47
Start Date: 05/Jun/20 08:47
Worklog Time Spent: 10m
Work Description: aasha commented on a change in pull request #1060:
URL: https://github.com/apache/hive/pull/1060#discussion_r435778877
##########
File path:
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadPartitions.java
##########
@@ -154,7 +157,45 @@ private ReplicationState initialReplicationState() throws SemanticException {
     );
   }
+  private boolean isImmutableDataCopy() {
+    //at the time of repl dump, data got referenced externally and not part of the dump.
+    return HiveConf.getBoolVar(context.hiveConf, REPL_DUMP_SKIP_IMMUTABLE_DATA_COPY);
+  }
+
+  /**
+   * Get all partitions and consolidate them into single partition request.
+   * Also, copy relevant stats and other information from original request.
+   *
+   * @throws SemanticException
+   */
+  private void addConsolidatedPartitionDesc() throws Exception {
+    List<AlterTableAddPartitionDesc.PartitionDesc> partitions = new LinkedList<>();
+    for (AlterTableAddPartitionDesc alterTableAddPartitionDesc : event.partitionDescriptions(tableDesc)) {
+
+      AlterTableAddPartitionDesc.PartitionDesc src = alterTableAddPartitionDesc.getPartitions().get(0);
+
+      partitions.add(new AlterTableAddPartitionDesc.PartitionDesc(
+          src.getPartSpec(), src.getLocation(), src.getPartParams(), src.getInputFormat(),
+          src.getOutputFormat(), src.getNumBuckets(), src.getCols(), src.getSerializationLib(),
+          src.getSerdeParams(), src.getBucketCols(), src.getSortCols(), src.getColStats(),
+          src.getWriteId()));
+    }
+    AlterTableAddPartitionDesc consolidatedPartitionDesc = new AlterTableAddPartitionDesc(tableDesc.getDatabaseName(),
+        tableDesc.getTableName(), true, partitions);
+
+    addPartition(false, consolidatedPartitionDesc, null);
+    if (partitions.size() > 0) {
+      LOG.info("Added {} partitions", partitions.size());
+    }
+  }
+
   private TaskTracker forNewTable() throws Exception {
+    if (isImmutableDataCopy()) {
Review comment:
Can we do this for both isImmutableDataCopy and the repl dump metadata-only
task?
This optimization would help every metadata-only load, right?
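
The suggestion above amounts to widening the guard in forNewTable() so that the consolidated partition request is used for any load that carries no data files. A minimal sketch of such a guard follows, with loud caveats: it is not the change made in the pull request, the class ConsolidatedLoadGuard and its helper methods are hypothetical, a plain Map stands in for HiveConf, and the two property keys are assumed names for the skip-immutable-data-copy and metadata-only dump settings.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the check in LoadPartitions; not Hive code.
final class ConsolidatedLoadGuard {
  // Assumed property keys, used here only for illustration.
  static final String SKIP_IMMUTABLE_DATA_COPY = "hive.repl.dump.skip.immutable.data.copy";
  static final String DUMP_METADATA_ONLY = "hive.repl.dump.metadata.only";

  private ConsolidatedLoadGuard() {
  }

  // Reads a boolean flag from the map; a missing key counts as false.
  static boolean getBool(Map<String, String> conf, String key) {
    return Boolean.parseBoolean(conf.getOrDefault(key, "false"));
  }

  // True when the load only needs metadata, so all partitions could be
  // added through one consolidated request instead of one task each.
  static boolean useConsolidatedPartitionLoad(Map<String, String> conf) {
    return getBool(conf, SKIP_IMMUTABLE_DATA_COPY) || getBool(conf, DUMP_METADATA_ONLY);
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(DUMP_METADATA_ONLY, "true");
    System.out.println(useConsolidatedPartitionLoad(conf));
  }
}
```

With a guard of this shape, forNewTable() would branch on one predicate covering both settings rather than on isImmutableDataCopy() alone.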
Issue Time Tracking
-------------------
Worklog Id: (was: 441744)
Time Spent: 0.5h (was: 20m)
> REPL: repl dump could add support for immutable dataset
> -------------------------------------------------------
>
> Key: HIVE-23520
> URL: https://issues.apache.org/jira/browse/HIVE-23520
> Project: Hive
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Priority: Minor
> Labels: pull-request-available
> Attachments: HIVE-23520.1.patch
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Currently, "REPL DUMP" ends up copying the entire dataset, along with partition
> information, stats, etc., into its dump folder. However, there are cases (e.g.,
> large reference datasets) where we need a way to retain only the metadata, along
> with partition information and stats.