vinothchandar commented on a change in pull request #2136:
URL: https://github.com/apache/hudi/pull/2136#discussion_r520944072



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java
##########
@@ -85,4 +91,61 @@ public static HoodieRecord getTaggedRecord(HoodieRecord inputRecord, Option<Hood
     }
     return record;
   }
+
+  /**
+   * Check compatible between new writeIndexType and indexType already in hoodie.properties.
+   * @param writeIndexType new indexType
+   * @param persistIndexType indexType already in hoodie.properties
+   */
+  public static void checkIndexTypeCompatible(IndexType writeIndexType, IndexType persistIndexType) {

Review comment:
       reads like poetry!  thanks! 

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndex.java
##########
@@ -97,13 +98,20 @@ public abstract O updateLocation(O writeStatuses, HoodieEngineContext context,
   @PublicAPIMethod(maturity = ApiMaturityLevel.STABLE)
   public abstract boolean isImplicitWithStorage();
 
+  /**
+   * Get the index Type.
+   */
+  public IndexType indexType() {

Review comment:
       should we leave this abstract, so that any new index is forced to implement it? The downside is that, when people upgrade, they may need to implement the method (which may be ok).
   I don't think many outside users have custom indexes today. cc @leesf do you know of any?
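   Roughly what I have in mind (just a sketch; the maturity level is a placeholder):

```java
  /**
   * Get the index type backing this index implementation.
   */
  @PublicAPIMethod(maturity = ApiMaturityLevel.EVOLVING)
  public abstract IndexType indexType();
```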
   

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java
##########
@@ -85,4 +91,61 @@ public static HoodieRecord getTaggedRecord(HoodieRecord inputRecord, Option<Hood
     }
     return record;
   }
+
+  /**
+   * Check compatible between new writeIndexType and indexType already in hoodie.properties.
+   * @param writeIndexType new indexType
+   * @param persistIndexType indexType already in hoodie.properties
+   */
+  public static void checkIndexTypeCompatible(IndexType writeIndexType, IndexType persistIndexType) {
+    boolean isTypeCompatible = false;
+    switch (persistIndexType) {
+      case GLOBAL_BLOOM:
+        if (writeIndexType.equals(IndexType.GLOBAL_BLOOM)
+            || writeIndexType.equals(IndexType.BLOOM)
+            || writeIndexType.equals(IndexType.SIMPLE)
+            || writeIndexType.equals(IndexType.GLOBAL_SIMPLE)) {
+          isTypeCompatible = true;
+        }
+        break;
+      case GLOBAL_SIMPLE:
+        if (writeIndexType.equals(IndexType.GLOBAL_SIMPLE)
+            || writeIndexType.equals(IndexType.GLOBAL_BLOOM)) {
+          isTypeCompatible = true;
+        }
+        break;
+      case SIMPLE:
+        if (writeIndexType.equals(IndexType.SIMPLE)) {
+          isTypeCompatible = true;
+        }
+        break;
+      case BLOOM:
+        if (writeIndexType.equals(IndexType.BLOOM)) {
+          isTypeCompatible = true;

Review comment:
       just one line?
   
   `isTypeCompatible = writeIndexType.equals(IndexType.BLOOM)`
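   For the whole case, that would read (the other single-type cases can follow the same pattern):

```java
      case BLOOM:
        isTypeCompatible = writeIndexType.equals(IndexType.BLOOM);
        break;
```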

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java
##########
@@ -85,4 +91,61 @@ public static HoodieRecord getTaggedRecord(HoodieRecord inputRecord, Option<Hood
     }
     return record;
   }
+
+  /**
+   * Check compatible between new writeIndexType and indexType already in hoodie.properties.
+   * @param writeIndexType new indexType
+   * @param persistIndexType indexType already in hoodie.properties
+   */
+  public static void checkIndexTypeCompatible(IndexType writeIndexType, IndexType persistIndexType) {
+    boolean isTypeCompatible = false;
+    switch (persistIndexType) {
+      case GLOBAL_BLOOM:
+        if (writeIndexType.equals(IndexType.GLOBAL_BLOOM)
+            || writeIndexType.equals(IndexType.BLOOM)
+            || writeIndexType.equals(IndexType.SIMPLE)
+            || writeIndexType.equals(IndexType.GLOBAL_SIMPLE)) {
+          isTypeCompatible = true;
+        }
+        break;
+      case GLOBAL_SIMPLE:
+        if (writeIndexType.equals(IndexType.GLOBAL_SIMPLE)
+            || writeIndexType.equals(IndexType.GLOBAL_BLOOM)) {
+          isTypeCompatible = true;
+        }
+        break;
+      case SIMPLE:
+        if (writeIndexType.equals(IndexType.SIMPLE)) {
+          isTypeCompatible = true;
+        }
+        break;
+      case BLOOM:
+        if (writeIndexType.equals(IndexType.BLOOM)) {
+          isTypeCompatible = true;
+        }
+        break;
+      case INMEMORY:
+        if (writeIndexType.equals(IndexType.INMEMORY)) {
+          isTypeCompatible = true;
+        }
+        LOG.error("PersistIndexType INMEMORY can not be used in production");
+        break;
+      case HBASE:
+        if (writeIndexType.equals(IndexType.HBASE)) {
+          isTypeCompatible = true;
+        }
+        break;
+      case CUSTOM:
+        if (writeIndexType.equals(IndexType.CUSTOM)) {
+          isTypeCompatible = true;
+        }
+        break;
+      default:
+        throw new HoodieIndexException("Index type" + persistIndexType + " unspecified in  properties file");
+    }
+    if (!isTypeCompatible) {
+      throw new HoodieException("The new write indextype " + writeIndexType

Review comment:
       throw a `HoodieIndexException` instead?
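   i.e. something along these lines (message wording is just illustrative):

```java
    if (!isTypeCompatible) {
      throw new HoodieIndexException("Write index type " + writeIndexType
          + " is not compatible with the table's existing index type " + persistIndexType);
    }
```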

##########
File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/HoodieClientTestHarness.java
##########
@@ -100,11 +101,19 @@ public void setTestMethodName(TestInfo testInfo) {
    * Initializes resource group for the subclasses of {@link HoodieClientTestBase}.
    */
   public void initResources() throws IOException {
+    Properties properties = new Properties();
+    initResources(properties);

Review comment:
       single line? 
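   i.e.:

```java
  public void initResources() throws IOException {
    initResources(new Properties());
  }
```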

##########
File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/SparkRDDWriteClient.java
##########
@@ -88,7 +91,18 @@ public static SparkConf registerClasses(SparkConf conf) {
 
   @Override
   protected HoodieIndex<T, JavaRDD<HoodieRecord<T>>, JavaRDD<HoodieKey>, JavaRDD<WriteStatus>> createIndex(HoodieWriteConfig writeConfig) {
-    return SparkHoodieIndex.createIndex(config);
+    String persistIndexType = null;
+    try {
+      persistIndexType = this.createMetaClient(true).getTableConfig().getProperties().getProperty(HoodieIndexConfig.INDEX_TYPE_PROP);

Review comment:
       should we be creating a new metaClient here? Can we see if we can pass in the metaClient from the caller?
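   A rough sketch of the shape I mean (the overload and plumbing are hypothetical, just to illustrate the idea):

```java
  // hypothetical overload: the caller hands in the metaClient it already has, so
  // createIndex() does not need to build a fresh one just to read hoodie.properties
  protected HoodieIndex<T, JavaRDD<HoodieRecord<T>>, JavaRDD<HoodieKey>, JavaRDD<WriteStatus>> createIndex(
      HoodieWriteConfig writeConfig, HoodieTableMetaClient metaClient) {
    String persistIndexType = metaClient.getTableConfig().getProperties()
        .getProperty(HoodieIndexConfig.INDEX_TYPE_PROP);
    if (persistIndexType != null) {
      HoodieIndexUtils.checkIndexTypeCompatible(writeConfig.getIndexType(), IndexType.valueOf(persistIndexType));
    }
    return SparkHoodieIndex.createIndex(writeConfig);
  }
```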

##########
File path: hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##########
@@ -108,11 +109,13 @@ private[hudi] object HoodieSparkSqlWriter {
       handleSaveModes(mode, basePath, tableConfig, tblName, operation, fs)
       // Create the table if not present
       if (!tableExists) {
-        val archiveLogFolder = parameters.getOrElse(
-          HoodieTableConfig.HOODIE_ARCHIVELOG_FOLDER_PROP_NAME, "archived")
+        val archiveLogFolder = parameters.get(HoodieTableConfig.HOODIE_ARCHIVELOG_FOLDER_PROP_NAME).get
+        val hoodieWriteConfig = DataSourceUtils.createHoodieConfig(Schema.create(Schema.Type.NULL).toString, path.get, tblName, mapAsJavaMap(parameters))
+        val index = SparkHoodieIndex.createIndex(hoodieWriteConfig)

Review comment:
       you can just convert the config value to the enum, right?
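   e.g. something like `IndexType.valueOf(parameters(HoodieIndexConfig.INDEX_TYPE_PROP).toUpperCase)` (exact defaulting/case handling aside), without ever touching an index object.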

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java
##########
@@ -85,4 +91,61 @@ public static HoodieRecord getTaggedRecord(HoodieRecord inputRecord, Option<Hood
     }
     return record;
   }
+
+  /**
+   * Check compatible between new writeIndexType and indexType already in hoodie.properties.
+   * @param writeIndexType new indexType
+   * @param persistIndexType indexType already in hoodie.properties
+   */
+  public static void checkIndexTypeCompatible(IndexType writeIndexType, IndexType persistIndexType) {
+    boolean isTypeCompatible = false;
+    switch (persistIndexType) {
+      case GLOBAL_BLOOM:
+        if (writeIndexType.equals(IndexType.GLOBAL_BLOOM)
+            || writeIndexType.equals(IndexType.BLOOM)
+            || writeIndexType.equals(IndexType.SIMPLE)
+            || writeIndexType.equals(IndexType.GLOBAL_SIMPLE)) {
+          isTypeCompatible = true;
+        }
+        break;
+      case GLOBAL_SIMPLE:
+        if (writeIndexType.equals(IndexType.GLOBAL_SIMPLE)
+            || writeIndexType.equals(IndexType.GLOBAL_BLOOM)) {
+          isTypeCompatible = true;
+        }
+        break;
+      case SIMPLE:
+        if (writeIndexType.equals(IndexType.SIMPLE)) {
+          isTypeCompatible = true;
+        }
+        break;
+      case BLOOM:
+        if (writeIndexType.equals(IndexType.BLOOM)) {
+          isTypeCompatible = true;
+        }
+        break;
+      case INMEMORY:
+        if (writeIndexType.equals(IndexType.INMEMORY)) {
+          isTypeCompatible = true;
+        }
+        LOG.error("PersistIndexType INMEMORY can not be used in production");
+        break;
+      case HBASE:
+        if (writeIndexType.equals(IndexType.HBASE)) {

Review comment:
       if a user is using HBASE, then they could actually switch to BLOOM, SIMPLE, GLOBAL_BLOOM, etc.

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java
##########
@@ -85,4 +91,61 @@ public static HoodieRecord getTaggedRecord(HoodieRecord inputRecord, Option<Hood
     }
     return record;
   }
+
+  /**
+   * Check compatible between new writeIndexType and indexType already in hoodie.properties.
+   * @param writeIndexType new indexType
+   * @param persistIndexType indexType already in hoodie.properties
+   */
+  public static void checkIndexTypeCompatible(IndexType writeIndexType, IndexType persistIndexType) {
+    boolean isTypeCompatible = false;
+    switch (persistIndexType) {
+      case GLOBAL_BLOOM:
+        if (writeIndexType.equals(IndexType.GLOBAL_BLOOM)
+            || writeIndexType.equals(IndexType.BLOOM)
+            || writeIndexType.equals(IndexType.SIMPLE)
+            || writeIndexType.equals(IndexType.GLOBAL_SIMPLE)) {
+          isTypeCompatible = true;
+        }
+        break;
+      case GLOBAL_SIMPLE:
+        if (writeIndexType.equals(IndexType.GLOBAL_SIMPLE)
+            || writeIndexType.equals(IndexType.GLOBAL_BLOOM)) {
+          isTypeCompatible = true;
+        }
+        break;
+      case SIMPLE:
+        if (writeIndexType.equals(IndexType.SIMPLE)) {

Review comment:
       SIMPLE is compatible with BLOOM and vice versa

##########
File path: hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##########
@@ -108,11 +109,13 @@ private[hudi] object HoodieSparkSqlWriter {
       handleSaveModes(mode, basePath, tableConfig, tblName, operation, fs)
       // Create the table if not present
       if (!tableExists) {
-        val archiveLogFolder = parameters.getOrElse(
-          HoodieTableConfig.HOODIE_ARCHIVELOG_FOLDER_PROP_NAME, "archived")
+        val archiveLogFolder = parameters.get(HoodieTableConfig.HOODIE_ARCHIVELOG_FOLDER_PROP_NAME).get
+        val hoodieWriteConfig = DataSourceUtils.createHoodieConfig(Schema.create(Schema.Type.NULL).toString, path.get, tblName, mapAsJavaMap(parameters))
+        val index = SparkHoodieIndex.createIndex(hoodieWriteConfig)

Review comment:
       this is problematic. We are initializing an `index` just for this. We should refrain from accessing anything beyond the WriteClient at this level.

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java
##########
@@ -85,4 +91,61 @@ public static HoodieRecord getTaggedRecord(HoodieRecord inputRecord, Option<Hood
     }
     return record;
   }
+
+  /**
+   * Check compatible between new writeIndexType and indexType already in hoodie.properties.
+   * @param writeIndexType new indexType
+   * @param persistIndexType indexType already in hoodie.properties
+   */
+  public static void checkIndexTypeCompatible(IndexType writeIndexType, IndexType persistIndexType) {
+    boolean isTypeCompatible = false;
+    switch (persistIndexType) {
+      case GLOBAL_BLOOM:
+        if (writeIndexType.equals(IndexType.GLOBAL_BLOOM)
+            || writeIndexType.equals(IndexType.BLOOM)
+            || writeIndexType.equals(IndexType.SIMPLE)
+            || writeIndexType.equals(IndexType.GLOBAL_SIMPLE)) {
+          isTypeCompatible = true;
+        }
+        break;
+      case GLOBAL_SIMPLE:
+        if (writeIndexType.equals(IndexType.GLOBAL_SIMPLE)

Review comment:
       this should technically be compatible with BLOOM and SIMPLE as well, right?
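   Pulling the suggestions in this review together, one option is a small lookup table instead of the nested `if`s; the pairs below are only my reading of the comments here and would need to be confirmed:

```java
  // sketch: which write-side index types are acceptable for a given persisted index type
  private static final Map<IndexType, EnumSet<IndexType>> COMPATIBLE_WRITE_TYPES = new EnumMap<>(IndexType.class);
  static {
    COMPATIBLE_WRITE_TYPES.put(IndexType.GLOBAL_BLOOM,
        EnumSet.of(IndexType.GLOBAL_BLOOM, IndexType.GLOBAL_SIMPLE, IndexType.BLOOM, IndexType.SIMPLE));
    COMPATIBLE_WRITE_TYPES.put(IndexType.GLOBAL_SIMPLE,
        EnumSet.of(IndexType.GLOBAL_SIMPLE, IndexType.GLOBAL_BLOOM, IndexType.BLOOM, IndexType.SIMPLE));
    COMPATIBLE_WRITE_TYPES.put(IndexType.BLOOM, EnumSet.of(IndexType.BLOOM, IndexType.SIMPLE));
    COMPATIBLE_WRITE_TYPES.put(IndexType.SIMPLE, EnumSet.of(IndexType.SIMPLE, IndexType.BLOOM));
    COMPATIBLE_WRITE_TYPES.put(IndexType.HBASE,
        EnumSet.of(IndexType.HBASE, IndexType.GLOBAL_BLOOM, IndexType.GLOBAL_SIMPLE, IndexType.BLOOM, IndexType.SIMPLE));
    COMPATIBLE_WRITE_TYPES.put(IndexType.INMEMORY, EnumSet.of(IndexType.INMEMORY));
    COMPATIBLE_WRITE_TYPES.put(IndexType.CUSTOM, EnumSet.of(IndexType.CUSTOM));
  }

  // then the check collapses to a single lookup
  boolean isTypeCompatible = COMPATIBLE_WRITE_TYPES
      .getOrDefault(persistIndexType, EnumSet.noneOf(IndexType.class))
      .contains(writeIndexType);
```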

##########
File path: hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##########
@@ -255,11 +257,13 @@ private[hudi] object HoodieSparkSqlWriter {
     }
 
     if (!tableExists) {
-      val archiveLogFolder = parameters.getOrElse(
-        HoodieTableConfig.HOODIE_ARCHIVELOG_FOLDER_PROP_NAME, "archived")
+      val archiveLogFolder = parameters.get(HoodieTableConfig.HOODIE_ARCHIVELOG_FOLDER_PROP_NAME).get

Review comment:
       can we back out the changes that are not strictly needed for this PR? (same comment on the block above)

##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/BootstrapExecutor.java
##########
@@ -170,10 +170,11 @@ private void initializeTable() throws IOException {
      throw new HoodieException("target base path already exists at " + cfg.targetBasePath
           + ". Cannot bootstrap data on top of an existing table");
     }
-
+    SparkHoodieIndex index = SparkHoodieIndex.createIndex(bootstrapConfig);

Review comment:
       same comment: can we avoid creating the index object here?
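   (the write config already carries the type, so if I read it right, something like `bootstrapConfig.getIndexType()` should be enough here, without building a `SparkHoodieIndex`)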

##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
##########
@@ -235,8 +239,11 @@ public void refreshTimeline() throws IOException {
       }
     } else {
       this.commitTimelineOpt = Option.empty();
+      SparkHoodieIndex index = SparkHoodieIndex.createIndex(this.getHoodieClientConfig((Schema) null));

Review comment:
       same comment.  

##########
File path: hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##########
@@ -108,11 +109,13 @@ private[hudi] object HoodieSparkSqlWriter {
       handleSaveModes(mode, basePath, tableConfig, tblName, operation, fs)
       // Create the table if not present
       if (!tableExists) {
-        val archiveLogFolder = parameters.getOrElse(
-          HoodieTableConfig.HOODIE_ARCHIVELOG_FOLDER_PROP_NAME, "archived")
+        val archiveLogFolder = parameters.get(HoodieTableConfig.HOODIE_ARCHIVELOG_FOLDER_PROP_NAME).get
+        val hoodieWriteConfig = DataSourceUtils.createHoodieConfig(Schema.create(Schema.Type.NULL).toString, path.get, tblName, mapAsJavaMap(parameters))
+        val index = SparkHoodieIndex.createIndex(hoodieWriteConfig)

Review comment:
       If your goal was just to validate by converting the config string into the enum, can we validate at the WriteConfig builder level instead? That would leave this code very clean.
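   A sketch of what that could look like inside `HoodieWriteConfig.Builder#build()` (default handling elided; `props` stands in for the builder's properties), so the datasource path never needs an index object:

```java
    // validate the configured index type once, when the write config is built;
    // valueOf throws IllegalArgumentException for unknown values, failing the build early
    String indexTypeStr = props.getProperty(HoodieIndexConfig.INDEX_TYPE_PROP);
    if (indexTypeStr != null) {
      IndexType.valueOf(indexTypeStr.toUpperCase());
    }
```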




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

