[GitHub] [hudi] vinothchandar commented on a change in pull request #1858: [WIP] [1014] Part 1: Adding Upgrade or downgrade infra

2020-07-29 Thread GitBox


vinothchandar commented on a change in pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#discussion_r461265990



##
File path: 
hudi-client/src/main/java/org/apache/hudi/table/action/rollback/CopyOnWriteRollbackActionExecutor.java
##
@@ -100,8 +108,13 @@ public CopyOnWriteRollbackActionExecutor(JavaSparkContext 
jsc,
   }
 
   @Override
-  protected List 
executeRollbackUsingFileListing(HoodieInstant instantToRollback) {
+  protected List 
executeRollbackUsingFileListing(HoodieInstant instantToRollback, boolean 
doDelete) {
 List rollbackRequests = 
generateRollbackRequestsByListing();
-return new ListingBasedRollbackHelper(table.getMetaClient(), 
config).performRollback(jsc, instantToRollback, rollbackRequests);
+ListingBasedRollbackHelper listingBasedRollbackHelper = new 
ListingBasedRollbackHelper(table.getMetaClient(), config);
+if(doDelete) {
+  return listingBasedRollbackHelper.performRollback(jsc, 
instantToRollback, rollbackRequests);

Review comment:
   as discussed, we can just call the `collectRollbackStats` directly, 
assuming listing based rollback strategy.

##
File path: 
hudi-client/src/main/java/org/apache/hudi/table/action/rollback/ListingBasedRollbackHelper.java
##
@@ -68,34 +69,38 @@ public ListingBasedRollbackHelper(HoodieTableMetaClient 
metaClient, HoodieWriteC
* Performs all rollback actions that we have collected in parallel.
*/
   public List performRollback(JavaSparkContext jsc, 
HoodieInstant instantToRollback, List 
rollbackRequests) {
-SerializablePathFilter filter = (path) -> {

Review comment:
   ack

##
File path: 
hudi-client/src/main/java/org/apache/hudi/table/action/rollback/ListingBasedRollbackHelper.java
##
@@ -130,39 +137,55 @@ public ListingBasedRollbackHelper(HoodieTableMetaClient 
metaClient, HoodieWriteC
   1L
   );
   return new Tuple2<>(rollbackRequest.getPartitionPath(),
-  
HoodieRollbackStat.newBuilder().withPartitionPath(rollbackRequest.getPartitionPath())
-  
.withRollbackBlockAppendResults(filesToNumBlocksRollback).build());
+  
HoodieRollbackStat.newBuilder().withPartitionPath(rollbackRequest.getPartitionPath())
+  
.withRollbackBlockAppendResults(filesToNumBlocksRollback).build());
 }
 default:
   throw new IllegalStateException("Unknown Rollback action " + 
rollbackRequest);
   }
-}).reduceByKey(RollbackUtils::mergeRollbackStat).map(Tuple2::_2).collect();
+});
   }
 
 
-
   /**
* Common method used for cleaning out base files under a partition path 
during rollback of a set of commits.
*/
-  private Map deleteCleanedFiles(HoodieTableMetaClient 
metaClient, HoodieWriteConfig config,
-  String partitionPath, 
PathFilter filter) throws IOException {
+  private Map deleteBaseAndLogFiles(HoodieTableMetaClient 
metaClient, HoodieWriteConfig config,

Review comment:
   its hard. but would nt MERGE work for both in terms of actually 
performing a correct rollback? 

##
File path: 
hudi-client/src/main/java/org/apache/hudi/table/action/rollback/ListingBasedRollbackHelper.java
##
@@ -130,39 +137,55 @@ public ListingBasedRollbackHelper(HoodieTableMetaClient 
metaClient, HoodieWriteC
   1L
   );
   return new Tuple2<>(rollbackRequest.getPartitionPath(),
-  
HoodieRollbackStat.newBuilder().withPartitionPath(rollbackRequest.getPartitionPath())
-  
.withRollbackBlockAppendResults(filesToNumBlocksRollback).build());
+  
HoodieRollbackStat.newBuilder().withPartitionPath(rollbackRequest.getPartitionPath())
+  
.withRollbackBlockAppendResults(filesToNumBlocksRollback).build());
 }
 default:
   throw new IllegalStateException("Unknown Rollback action " + 
rollbackRequest);
   }
-}).reduceByKey(RollbackUtils::mergeRollbackStat).map(Tuple2::_2).collect();
+});
   }
 
 
-
   /**
* Common method used for cleaning out base files under a partition path 
during rollback of a set of commits.
*/
-  private Map deleteCleanedFiles(HoodieTableMetaClient 
metaClient, HoodieWriteConfig config,
-  String partitionPath, 
PathFilter filter) throws IOException {
+  private Map deleteBaseAndLogFiles(HoodieTableMetaClient 
metaClient, HoodieWriteConfig config,
+  String commit, String partitionPath, boolean doDelete) throws 
IOException {
 LOG.info("Cleaning path " + partitionPath);
+String basefileExtension = 
metaClient.getTableConfig().getBaseFileFormat().getFileExtension();
+SerializablePathFilter filter = (path) -> {
+  if (path.toString().endsWith(basefileExtension)) {
+String fileCommitTime = FSUtils.getCommitTime(path.getName());
+return commit.equ

[GitHub] [hudi] vinothchandar commented on a change in pull request #1858: [WIP] [1014] Part 1: Adding Upgrade or downgrade infra

2020-07-24 Thread GitBox


vinothchandar commented on a change in pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#discussion_r460343368



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
##
@@ -151,6 +154,27 @@ public HoodieTableType getTableType() {
 : Option.empty();
   }
 
+  /**
+   * @return the table version from .hoodie properties file.
+   */
+  public HoodieTableVersion getHoodieTableVersionFromPropertyFile() {
+if (props.contains(HOODIE_TABLE_VERSION_PROP_NAME)) {
+  String propValue = props.getProperty(HOODIE_TABLE_VERSION_PROP_NAME);
+  if (propValue.equals(HoodieTableVersion.ZERO_SIX_ZERO.version)) {
+return HoodieTableVersion.ZERO_SIX_ZERO;
+  }
+}
+return DEFAULT_TABLE_VERSION;
+  }
+
+  /**
+   * @return the current hoodie table version.
+   */
+  public HoodieTableVersion getCurrentHoodieTableVersion() {
+// TODO: fetch current version dynamically

Review comment:
   `HoodieTableVersion` or someplace we need to ahve a `CURR_VERSION` 
variable that gets bumped to 0.6.1 . 
   
   More I think about this. I think its better to name the versions 0,1,2... 
and so on, instead of release numbers. we may not bump this up every release . 
only when upgrade/downgrade is necessary. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vinothchandar commented on a change in pull request #1858: [WIP] [1014] Part 1: Adding Upgrade or downgrade infra

2020-07-22 Thread GitBox


vinothchandar commented on a change in pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#discussion_r458729215



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
##
@@ -151,6 +154,27 @@ public HoodieTableType getTableType() {
 : Option.empty();
   }
 
+  /**
+   * @return the table version from .hoodie properties file.
+   */
+  public HoodieTableVersion getHoodieTableVersionFromPropertyFile() {
+if (props.contains(HOODIE_TABLE_VERSION_PROP_NAME)) {
+  String propValue = props.getProperty(HOODIE_TABLE_VERSION_PROP_NAME);
+  if (propValue.equals(HoodieTableVersion.ZERO_SIX_ZERO.version)) {
+return HoodieTableVersion.ZERO_SIX_ZERO;
+  }
+}
+return DEFAULT_TABLE_VERSION;
+  }
+
+  /**
+   * @return the current hoodie table version.
+   */
+  public HoodieTableVersion getCurrentHoodieTableVersion() {
+// TODO: fetch current version dynamically

Review comment:
   By reading `hoodie.properties` we need to treat everything below 0.6.0 
as V_PRE_0.6.0 . Cannot deduce the actual jars per se. 
   
   We can also make these version numbers 0,1,2 instead of PRE_0.6.0, 0.6.0, 
0.7.0 and so on. ?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vinothchandar commented on a change in pull request #1858: [WIP] [1014] Part 1: Adding Upgrade or downgrade infra

2020-07-21 Thread GitBox


vinothchandar commented on a change in pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#discussion_r458412092



##
File path: 
hudi-client/src/main/java/org/apache/hudi/table/UpgradeDowngradeHelper.java
##
@@ -0,0 +1,175 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.table;
+
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.HoodieTableVersion;
+import org.apache.hudi.exception.HoodieException;
+
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileUtil;
+import org.apache.hadoop.fs.Path;
+
+import java.io.IOException;
+import java.util.Properties;
+
+/**
+ * Helper class to assist in upgrading/downgrading Hoodie when there is a 
version change.
+ */
+public class UpgradeDowngradeHelper {
+
+  public static final String HOODIE_ORIG_PROPERTY_FILE = 
"hoodie.properties.orig";
+
+  /**
+   * Perform Upgrade or Downgrade steps if required and updated table version 
if need be.
+   * 
+   * Starting from version 0.6.0, this upgrade/downgrade step will be added in 
all write paths.
+   * Essentially, if a dataset was created using any pre 0.6.0(for eg 0.5.3), 
and Hoodie verion was upgraded to 0.6.0, there are some upgrade steps need
+   * to be executed before doing any writes.
+   * Similarly, if a dataset was created using 0.6.0 and then hoodie was 
downgraded, some downgrade steps need to be executed before proceeding w/ any 
writes.
+   * On a high level, these are the steps performed
+   * Step1 : Understand current hoodie version and table version from 
hoodie.properties file
+   * Step2 : Fix any residues from previous upgrade/downgrade
+   * Step3 : Check for version upgrade/downgrade.
+   * Step4 : If upgrade/downgrade is required, perform the steps required for 
the same.
+   * Step5 : Copy hoodie.properties -> hoodie.properties.orig
+   * Step6 : Update hoodie.properties file with current table version
+   * Step7 : Delete hoodie.properties.orig
+   * 
+   * @param metaClient instance of {@link HoodieTableMetaClient} to use
+   * @throws IOException
+   */
+  public static void doUpgradeOrDowngrade(HoodieTableMetaClient metaClient) 
throws IOException {
+// Fetch version from property file and current version
+HoodieTableVersion versionFromPropertyFile = 
metaClient.getTableConfig().getHoodieTableVersionFromPropertyFile();
+HoodieTableVersion currentVersion = 
metaClient.getTableConfig().getCurrentHoodieTableVersion();
+
+Path metaPath = new Path(metaClient.getMetaPath());
+Path originalHoodiePropertyPath = 
getOrigHoodiePropertyFilePath(metaPath.toString());
+
+boolean updateTableVersionInPropertyFile = false;
+
+if (metaClient.getFs().exists(originalHoodiePropertyPath)) {
+  // if hoodie.properties.orig exists, rename to hoodie.properties and 
skip upgrade/downgrade step
+  metaClient.getFs().rename(originalHoodiePropertyPath, 
getHoodiePropertyFilePath(metaPath.toString()));
+  updateTableVersionInPropertyFile = true;
+} else {
+  // upgrade or downgrade if there is a version mismatch
+  if (versionFromPropertyFile != currentVersion) {
+updateTableVersionInPropertyFile = true;
+if (versionFromPropertyFile == HoodieTableVersion.PRE_ZERO_SIZE_ZERO 
&& currentVersion == HoodieTableVersion.ZERO_SIX_ZERO) {
+  upgradeFromOlderToZeroSixZero();
+} else if (versionFromPropertyFile == HoodieTableVersion.ZERO_SIX_ZERO 
&& currentVersion == HoodieTableVersion.PRE_ZERO_SIZE_ZERO) {
+  downgradeFromZeroSixZeroToPreZeroSixZero();
+} else {
+  throw new HoodieException("Illegal state wrt table versions. Version 
from proerpty file " + versionFromPropertyFile + " and current version " + 
currentVersion);
+}
+  }
+}
+
+/**
+ * If table version needs to be updated in hoodie.properties file.
+ * Step1: Copy hoodie.properties to hoodie.properties.orig
+ * Step2: add table.version to hoodie.properties
+ * Step3: delete hoodie.properties.orig
+ */
+if (updateTableVer

[GitHub] [hudi] vinothchandar commented on a change in pull request #1858: [WIP] [1014] Part 1: Adding Upgrade or downgrade infra

2020-07-21 Thread GitBox


vinothchandar commented on a change in pull request #1858:
URL: https://github.com/apache/hudi/pull/1858#discussion_r458411863



##
File path: 
hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##
@@ -190,6 +192,7 @@ public HoodieMetrics getMetrics() {
*/
   protected HoodieTable getTableAndInitCtx(WriteOperationType operationType) {
 HoodieTableMetaClient metaClient = createMetaClient(true);
+mayBeUpradeOrDowngrade(metaClient);

Review comment:
   looks ok. Mostly sure





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org