[GitHub] [hudi] xicm commented on a diff in pull request #8911: [Hudi-8882] Compatible with versions below hive 2.3 to read hudi rt table

2023-06-14 Thread via GitHub


xicm commented on code in PR #8911:
URL: https://github.com/apache/hudi/pull/8911#discussion_r1229102825


##
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/shims/HiveCompatibleShim.java:
##
@@ -0,0 +1,35 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.hadoop.utils.shims;
+
+import java.io.IOException;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.ql.io.HiveInputFormat;
+import org.apache.hadoop.mapred.JobConf;
+
+/**
+ * Shim class to resolve Hive version compatibility.
+ */
+public interface HiveCompatibleShim {

Review Comment:
   The differences are between 2.3 and <2.3, HiveShim is between hive2 and 
hive3, so we may not merge it into.
   And I think  the name `HudiHiveInputformatShim` is better.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] xicm commented on a diff in pull request #8911: [Hudi-8882] Compatible with versions below hive 2.3 to read hudi rt table

2023-06-14 Thread via GitHub


xicm commented on code in PR #8911:
URL: https://github.com/apache/hudi/pull/8911#discussion_r1229102825


##
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/shims/HiveCompatibleShim.java:
##
@@ -0,0 +1,35 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.hadoop.utils.shims;
+
+import java.io.IOException;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.ql.io.HiveInputFormat;
+import org.apache.hadoop.mapred.JobConf;
+
+/**
+ * Shim class to resolve Hive version compatibility.
+ */
+public interface HiveCompatibleShim {

Review Comment:
   The differences are between 2.3 and <2.3, HiveShim is between hive2 and 
hive3, so we may not merge it into.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] xicm commented on a diff in pull request #8911: [Hudi-8882] Compatible with versions below hive 2.3 to read hudi rt table

2023-06-14 Thread via GitHub


xicm commented on code in PR #8911:
URL: https://github.com/apache/hudi/pull/8911#discussion_r1229100105


##
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/shims/HudiHiveInputformatShimAbove23.java:
##
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.hadoop.utils.shims;
+
+import java.io.IOException;
+import java.lang.reflect.Method;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.ql.io.HiveInputFormat;
+import org.apache.hadoop.mapred.JobConf;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * When the hive version is greater than or equal to hive version 2.3
+ */
+public class HudiHiveInputformatShimAbove23 implements HiveCompatibleShim {
+
+  public static final Logger LOG = 
LoggerFactory.getLogger(HudiHiveInputformatShimAbove23.class);
+
+  private static HudiHiveInputformatShimAbove23 INSTANCE = new 
HudiHiveInputformatShimAbove23();
+  public static HiveCompatibleShim getInstance() {
+return INSTANCE;
+  }
+
+  private static Method PUSH_PROJECT_METHOD;
+
+  static {
+try {
+  PUSH_PROJECT_METHOD = 
HiveInputFormat.class.getDeclaredMethod("pushProjectionsAndFilters", 
JobConf.class,
+  Class.class, Path.class);

Review Comment:
   Do we need to support -Dhive.version=2.2.x? If not, we don't need reflection 
here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] xicm commented on a diff in pull request #8911: [Hudi-8882] Compatible with versions below hive 2.3 to read hudi rt table

2023-06-13 Thread via GitHub


xicm commented on code in PR #8911:
URL: https://github.com/apache/hudi/pull/8911#discussion_r1227578284


##
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/shims/HiveCompatibleShimBelow23.java:
##
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.hadoop.utils.shims;
+
+import java.io.IOException;
+import java.lang.reflect.Method;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.JobConf;
+
+/**
+ * When the hive version is less than hive version 2.3
+ */
+public class HiveCompatibleShimBelow23 implements HiveCompatibleShim {
+
+  private static HiveCompatibleShimBelow23 INSTANCE = new 
HiveCompatibleShimBelow23();
+
+  public static HiveCompatibleShim getInstance() {
+return INSTANCE;
+  }
+
+  @Override
+  public void invokePushProjectAndFilters(JobConf job, Class 
inputFormatClass, Path splitPath)
+  throws IOException {
+try {
+  Class hiveInputFormatClass = this.getClass().getSuperclass();

Review Comment:
   This may not work.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] xicm commented on a diff in pull request #8911: [Hudi-8882] Compatible with versions below hive 2.3 to read hudi rt table

2023-06-13 Thread via GitHub


xicm commented on code in PR #8911:
URL: https://github.com/apache/hudi/pull/8911#discussion_r1227577203


##
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/shims/HiveCompatibleShimAbove23.java:
##
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.hadoop.utils.shims;
+
+import java.io.IOException;
+import java.lang.reflect.Method;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.JobConf;
+
+/**
+ * When the hive version is greater than or equal to hive version 2.3
+ */
+public class HiveCompatibleShimAbove23 implements HiveCompatibleShim {
+
+  private static HiveCompatibleShimAbove23 INSTANCE = new 
HiveCompatibleShimAbove23();
+
+  public static HiveCompatibleShim getInstance() {
+return INSTANCE;
+  }
+
+  @Override
+  public void invokePushProjectAndFilters(JobConf job, Class 
inputFormatClass, Path splitPath)
+  throws IOException {
+try {
+  Class hiveInputFormatClass = this.getClass().getSuperclass();
+  Method pushProjectionsAndFilters = 
hiveInputFormatClass.getDeclaredMethod("pushProjectionsAndFilters", 
JobConf.class,

Review Comment:
   Oh, sorry, I know your concern, can this be compiled with 
-Dhive.version=2.2.0



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] xicm commented on a diff in pull request #8911: [Hudi-8882] Compatible with versions below hive 2.3 to read hudi rt table

2023-06-12 Thread via GitHub


xicm commented on code in PR #8911:
URL: https://github.com/apache/hudi/pull/8911#discussion_r1227550989


##
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/shims/HiveCompatibleShimBelow23.java:
##
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.hadoop.utils.shims;
+
+import java.io.IOException;
+import java.lang.reflect.Method;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.JobConf;
+
+/**
+ * When the hive version is less than hive version 2.3
+ */
+public class HiveCompatibleShimBelow23 implements HiveCompatibleShim {
+
+  private static HiveCompatibleShimBelow23 INSTANCE = new 
HiveCompatibleShimBelow23();
+
+  public static HiveCompatibleShim getInstance() {
+return INSTANCE;
+  }
+
+  @Override
+  public void invokePushProjectAndFilters(JobConf job, Class 
inputFormatClass, Path splitPath)
+  throws IOException {
+try {
+  Class hiveInputFormatClass = this.getClass().getSuperclass();
+  Method pushProjectionsAndFilters = 
hiveInputFormatClass.getDeclaredMethod("pushProjectionsAndFilters", 
JobConf.class,

Review Comment:
   inputFormatClass. getDeclaredMethod()



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] xicm commented on a diff in pull request #8911: [Hudi-8882] Compatible with versions below hive 2.3 to read hudi rt table

2023-06-12 Thread via GitHub


xicm commented on code in PR #8911:
URL: https://github.com/apache/hudi/pull/8911#discussion_r1227550752


##
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/shims/Hive3Shim.java:
##
@@ -122,4 +126,16 @@ public long getMills(Object timestamp) {
   throw new HoodieException("can not create writable v2 class!", e);
 }
   }
+
+  @Override
+  public PartitionDesc getPartitionDesc(Map 
pathToPartitionInfo, Path dir,
+  Map, Map> cacheMap) throws 
IOException {
+try {
+  Class hiveUtilsClass = 
Class.forName("org.apache.hadoop.hive.ql.io.HiveFileFormatUtils");
+  Method method = hiveUtilsClass.getMethod("getFromPathRecursively", 
Map.class, Path.class, Map.class);

Review Comment:
   Sorry for the delay, do reflection in static block. 



##
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/shims/HiveCompatibleShimAbove23.java:
##
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.hadoop.utils.shims;
+
+import java.io.IOException;
+import java.lang.reflect.Method;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.JobConf;
+
+/**
+ * When the hive version is greater than or equal to hive version 2.3
+ */
+public class HiveCompatibleShimAbove23 implements HiveCompatibleShim {
+
+  private static HiveCompatibleShimAbove23 INSTANCE = new 
HiveCompatibleShimAbove23();
+
+  public static HiveCompatibleShim getInstance() {
+return INSTANCE;
+  }
+
+  @Override
+  public void invokePushProjectAndFilters(JobConf job, Class 
inputFormatClass, Path splitPath)
+  throws IOException {
+try {
+  Class hiveInputFormatClass = this.getClass().getSuperclass();

Review Comment:
   is this right?



##
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/shims/HiveCompatibleShimAbove23.java:
##
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.hadoop.utils.shims;
+
+import java.io.IOException;
+import java.lang.reflect.Method;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.JobConf;
+
+/**
+ * When the hive version is greater than or equal to hive version 2.3
+ */
+public class HiveCompatibleShimAbove23 implements HiveCompatibleShim {
+
+  private static HiveCompatibleShimAbove23 INSTANCE = new 
HiveCompatibleShimAbove23();
+
+  public static HiveCompatibleShim getInstance() {
+return INSTANCE;
+  }
+
+  @Override
+  public void invokePushProjectAndFilters(JobConf job, Class 
inputFormatClass, Path splitPath)
+  throws IOException {
+try {
+  Class hiveInputFormatClass = this.getClass().getSuperclass();
+  Method pushProjectionsAndFilters = 
hiveInputFormatClass.getDeclaredMethod("pushProjectionsAndFilters", 
JobConf.class,

Review Comment:
   The default version of hive is 2.3, can we extends `HiveInputFormat` and 
call `pushProjectionsAndFilters ` directly.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] xicm commented on a diff in pull request #8911: [Hudi-8882] Compatible with versions below hive 2.3 to read hudi rt table

2023-06-12 Thread via GitHub


xicm commented on code in PR #8911:
URL: https://github.com/apache/hudi/pull/8911#discussion_r1227456677


##
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/shims/Hive2Shim.java:
##
@@ -53,4 +59,17 @@ public int getDays(Object dateWritable) {
   public long getMills(Object timestamp) {
 return ((Timestamp) timestamp).getTime();
   }
+
+  @Override
+  public PartitionDesc getPartitionDesc(Map 
pathToPartitionInfo, Path dir,
+  Map, Map> cacheMap) throws 
IOException {
+
+try {
+  Class hiveUtilsClass = 
Class.forName("org.apache.hudi.hadoop.utils.HudiHiveFileFormatUtils");

Review Comment:
   Do we need reflection here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] xicm commented on a diff in pull request #8911: [Hudi-8882] Compatible with versions below hive 2.3 to read hudi rt table

2023-06-12 Thread via GitHub


xicm commented on code in PR #8911:
URL: https://github.com/apache/hudi/pull/8911#discussion_r1227427696


##
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/hive/HoodieCombineHiveInputFormat.java:
##
@@ -562,8 +568,35 @@ public RecordReader getRecordReader(InputSplit split, 
JobConf job, Reporter repo
 }
   }
 
+  private void invokePushProjectAndFilters(JobConf job, Class 
inputFormatClass, Path splitPath)

Review Comment:
   yes, <2.3 and 2.3+
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] xicm commented on a diff in pull request #8911: [Hudi-8882] Compatible with versions below hive 2.3 to read hudi rt table

2023-06-12 Thread via GitHub


xicm commented on code in PR #8911:
URL: https://github.com/apache/hudi/pull/8911#discussion_r1226365144


##
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/hive/HoodieCombineHiveInputFormat.java:
##
@@ -562,8 +568,35 @@ public RecordReader getRecordReader(InputSplit split, 
JobConf job, Reporter repo
 }
   }
 
+  private void invokePushProjectAndFilters(JobConf job, Class 
inputFormatClass, Path splitPath)

Review Comment:
   We can add a new shim class. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] xicm commented on a diff in pull request #8911: [Hudi-8882] Compatible with versions below hive 2.3 to read hudi rt table

2023-06-12 Thread via GitHub


xicm commented on code in PR #8911:
URL: https://github.com/apache/hudi/pull/8911#discussion_r1226162058


##
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/hive/HoodieCombineHiveInputFormat.java:
##
@@ -562,8 +568,35 @@ public RecordReader getRecordReader(InputSplit split, 
JobConf job, Reporter repo
 }
   }
 
+  private void invokePushProjectAndFilters(JobConf job, Class 
inputFormatClass, Path splitPath)

Review Comment:
   ditto



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] xicm commented on a diff in pull request #8911: [Hudi-8882] Compatible with versions below hive 2.3 to read hudi rt table

2023-06-12 Thread via GitHub


xicm commented on code in PR #8911:
URL: https://github.com/apache/hudi/pull/8911#discussion_r1226161654


##
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/hive/HoodieCombineHiveInputFormat.java:
##
@@ -401,13 +407,14 @@ private static PartitionDesc 
getPartitionFromPath(Map pathT
   throws IOException {
 Method method;
 try {
-  Class hiveUtilsClass = 
Class.forName("org.apache.hadoop.hive.ql.io.HiveFileFormatUtils");
+  Class hiveUtilsClass = 
Class.forName("org.apache.hudi.hadoop.utils.HudiHiveFileFormatUtils");

Review Comment:
   Looks good to me, how about adding a shim class to handle differences 
between versions? like we do in `org.apache.hudi.hadoop.utils.shims`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] xicm commented on a diff in pull request #8911: [Hudi-8882] Compatible with versions below hive 2.3 to read hudi rt table

2023-06-10 Thread via GitHub


xicm commented on code in PR #8911:
URL: https://github.com/apache/hudi/pull/8911#discussion_r1224098760


##
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/hive/HoodieCombineHiveInputFormat.java:
##
@@ -549,21 +567,48 @@ public RecordReader getRecordReader(InputSplit split, 
JobConf job, Reporter repo
   throw new IOException("cannot find class " + inputFormatClassName);
 }
 
-pushProjectionsAndFilters(job, inputFormatClass, hsplit.getPath(0));
-
+invokePushProjectAndFilters(job, inputFormatClass, hsplit.getPath(0));
 if 
(inputFormatClass.getName().equals(getParquetRealtimeInputFormatClassName())) {
   HoodieCombineFileInputFormatShim shims = createInputFormatShim();
   IOContextMap.get(job).setInputPath(((CombineHiveInputSplit) 
split).getPath(0));
   return shims.getRecordReader(job, ((CombineHiveInputSplit) 
split).getInputSplitShim(),
   reporter, CombineHiveRecordReader.class);
 } else {
-  return 
ShimLoader.getHadoopShims().getCombineFileInputFormat().getRecordReader(job, 
(CombineFileSplit) split,
-  reporter, CombineHiveRecordReader.class);
+  return ShimLoader.getHadoopShims().getCombineFileInputFormat()
+  .getRecordReader(job, (CombineFileSplit) split,
+  reporter, CombineHiveRecordReader.class);
+}
+  }
+
+  private void invokePushProjectAndFilters(JobConf job, Class 
inputFormatClass, Path splitPath)
+  throws IOException {

Review Comment:
   @danny0405 Is it better to add a shim class to do 
`pushProjectionsAndFilters`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] xicm commented on a diff in pull request #8911: [Hudi-8882] Compatible with versions below hive 2.3 to read hudi rt table

2023-06-09 Thread via GitHub


xicm commented on code in PR #8911:
URL: https://github.com/apache/hudi/pull/8911#discussion_r1224098760


##
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/hive/HoodieCombineHiveInputFormat.java:
##
@@ -549,21 +567,48 @@ public RecordReader getRecordReader(InputSplit split, 
JobConf job, Reporter repo
   throw new IOException("cannot find class " + inputFormatClassName);
 }
 
-pushProjectionsAndFilters(job, inputFormatClass, hsplit.getPath(0));
-
+invokePushProjectAndFilters(job, inputFormatClass, hsplit.getPath(0));
 if 
(inputFormatClass.getName().equals(getParquetRealtimeInputFormatClassName())) {
   HoodieCombineFileInputFormatShim shims = createInputFormatShim();
   IOContextMap.get(job).setInputPath(((CombineHiveInputSplit) 
split).getPath(0));
   return shims.getRecordReader(job, ((CombineHiveInputSplit) 
split).getInputSplitShim(),
   reporter, CombineHiveRecordReader.class);
 } else {
-  return 
ShimLoader.getHadoopShims().getCombineFileInputFormat().getRecordReader(job, 
(CombineFileSplit) split,
-  reporter, CombineHiveRecordReader.class);
+  return ShimLoader.getHadoopShims().getCombineFileInputFormat()
+  .getRecordReader(job, (CombineFileSplit) split,
+  reporter, CombineHiveRecordReader.class);
+}
+  }
+
+  private void invokePushProjectAndFilters(JobConf job, Class 
inputFormatClass, Path splitPath)
+  throws IOException {

Review Comment:
   @danny0405 Is it will be better if we add a shim class to do 
`pushProjectionsAndFilters`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] xicm commented on a diff in pull request #8911: [Hudi-8882] Compatible with versions below hive 2.3 to read hudi rt table

2023-06-09 Thread via GitHub


xicm commented on code in PR #8911:
URL: https://github.com/apache/hudi/pull/8911#discussion_r1224091960


##
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/hive/HoodieCombineHiveInputFormat.java:
##
@@ -390,24 +401,28 @@ private void processPaths(JobConf job, 
CombineFileInputFormatShim combine, List<
   /**
* HiveFileFormatUtils.getPartitionDescFromPathRecursively is no longer 
available since Hive 3.
* This method is to make it compatible with both Hive 2 and Hive 3.
+   *
* @param pathToPartitionInfo
* @param dir
* @param cacheMap
* @return
* @throws IOException
*/
-  private static PartitionDesc getPartitionFromPath(Map 
pathToPartitionInfo, Path dir,
+  private static PartitionDesc getPartitionFromPath(Map 
pathToPartitionInfo,
+  Path dir,
   Map, Map> cacheMap)
   throws IOException {
 Method method;
 try {
-  Class hiveUtilsClass = 
Class.forName("org.apache.hadoop.hive.ql.io.HiveFileFormatUtils");
+  Class hiveUtilsClass = 
Class.forName("org.apache.hudi.hadoop.utils.HudiHiveFileFormatUtils");
   try {
 // HiveFileFormatUtils.getPartitionDescFromPathRecursively method only 
available in Hive 2.x
-method = 
hiveUtilsClass.getMethod("getPartitionDescFromPathRecursively", Map.class, 
Path.class, Map.class);
+method = hiveUtilsClass
+.getMethod("getPartitionDescFromPathRecursively", Map.class, 
Path.class, Map.class);
   } catch (NoSuchMethodException e) {
 // HiveFileFormatUtils.getFromPathRecursively method only available in 
Hive 3.x
-method = hiveUtilsClass.getMethod("getFromPathRecursively", Map.class, 
Path.class, Map.class);
+method = hiveUtilsClass
+.getMethod("getFromPathRecursively", Map.class, Path.class, 
Map.class);

Review Comment:
   Need to test if this works with hive3.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] xicm commented on a diff in pull request #8911: [Hudi-8882] Compatible with versions below hive 2.3 to read hudi rt table

2023-06-09 Thread via GitHub


xicm commented on code in PR #8911:
URL: https://github.com/apache/hudi/pull/8911#discussion_r1224076980


##
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HiveCompatibleUtils.java:
##
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.hadoop.utils;
+
+import java.util.LinkedHashMap;
+import java.util.Map;
+import java.util.Map.Entry;
+import java.util.Set;
+import java.util.function.Function;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.ql.plan.PartitionDesc;
+
+public class HiveCompatibleUtils {
+
+  public static  Map, Map> 
convertPartitionDesc(

Review Comment:
   I guess the difference between 2.3 and 2.2 is partition type is `String` or 
`Path`? Could you add some comment to the method?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] xicm commented on a diff in pull request #8911: [Hudi-8882] Compatible with versions below hive 2.3 to read hudi rt table

2023-06-09 Thread via GitHub


xicm commented on code in PR #8911:
URL: https://github.com/apache/hudi/pull/8911#discussion_r1224074166


##
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HiveCompatibleUtils.java:
##
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.hadoop.utils;
+
+import java.util.LinkedHashMap;
+import java.util.Map;
+import java.util.Map.Entry;
+import java.util.Set;
+import java.util.function.Function;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.ql.plan.PartitionDesc;
+
+public class HiveCompatibleUtils {
+
+  public static  Map, Map> 
convertPartitionDesc(
+  Map, Map> oldMap) {
+Map, Map> resMap = new 
LinkedHashMap<>();
+Set, Map>> entries = 
oldMap.entrySet();
+for (Entry, Map> entry : entries) {
+  Map newKeyMap = convertMapKeyToPath(entry.getKey());
+  Map newValueMap = 
convertMapKeyToPath(entry.getValue());
+  resMap.put(newKeyMap, newValueMap);
+}
+return resMap;
+  }
+
+  public static  Map convertMapKeyToPath(Map 
pathToAliases) {

Review Comment:
   redundant `T`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] xicm commented on a diff in pull request #8911: [Hudi-8882] Compatible with versions below hive 2.3 to read hudi rt table

2023-06-09 Thread via GitHub


xicm commented on code in PR #8911:
URL: https://github.com/apache/hudi/pull/8911#discussion_r1224058977


##
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/hive/HoodieCombineHiveInputFormat.java:
##
@@ -390,24 +401,28 @@ private void processPaths(JobConf job, 
CombineFileInputFormatShim combine, List<
   /**
* HiveFileFormatUtils.getPartitionDescFromPathRecursively is no longer 
available since Hive 3.
* This method is to make it compatible with both Hive 2 and Hive 3.
+   *
* @param pathToPartitionInfo
* @param dir
* @param cacheMap
* @return
* @throws IOException
*/
-  private static PartitionDesc getPartitionFromPath(Map 
pathToPartitionInfo, Path dir,
+  private static PartitionDesc getPartitionFromPath(Map 
pathToPartitionInfo,
+  Path dir,
   Map, Map> cacheMap)
   throws IOException {
 Method method;
 try {
-  Class hiveUtilsClass = 
Class.forName("org.apache.hadoop.hive.ql.io.HiveFileFormatUtils");
+  Class hiveUtilsClass = 
Class.forName("org.apache.hudi.hadoop.utils.HudiHiveFileFormatUtils");
   try {
 // HiveFileFormatUtils.getPartitionDescFromPathRecursively method only 
available in Hive 2.x
-method = 
hiveUtilsClass.getMethod("getPartitionDescFromPathRecursively", Map.class, 
Path.class, Map.class);
+method = hiveUtilsClass
+.getMethod("getPartitionDescFromPathRecursively", Map.class, 
Path.class, Map.class);
   } catch (NoSuchMethodException e) {
 // HiveFileFormatUtils.getFromPathRecursively method only available in 
Hive 3.x
-method = hiveUtilsClass.getMethod("getFromPathRecursively", Map.class, 
Path.class, Map.class);
+method = hiveUtilsClass
+.getMethod("getFromPathRecursively", Map.class, Path.class, 
Map.class);

Review Comment:
   `hiveUtilsClass` is `HudiHiveFileFormatUtils` now, is the reflection working?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] xicm commented on a diff in pull request #8911: [Hudi-8882] Compatible with versions below hive 2.3 to read hudi rt table

2023-06-09 Thread via GitHub


xicm commented on code in PR #8911:
URL: https://github.com/apache/hudi/pull/8911#discussion_r1224058647


##
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/hive/HoodieCombineHiveInputFormat.java:
##
@@ -64,39 +76,29 @@
 import org.apache.hadoop.mapred.lib.CombineFileInputFormat;
 import org.apache.hadoop.mapred.lib.CombineFileSplit;
 import org.apache.hadoop.mapreduce.JobContext;
+import org.apache.hive.common.util.HiveVersionInfo;
+import org.apache.hudi.common.util.ReflectionUtils;
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.hadoop.HoodieParquetInputFormat;
+import org.apache.hudi.hadoop.HoodieParquetInputFormatBase;
+import org.apache.hudi.hadoop.realtime.HoodieCombineRealtimeRecordReader;
+import org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat;
+import org.apache.hudi.hadoop.utils.HoodieInputFormatUtils;
+import org.apache.hudi.hadoop.utils.HudiHiveFileFormatUtils;
+import org.apache.hudi.hadoop.utils.HudiStringInternUtils;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
-import java.io.DataInput;
-import java.io.DataOutput;
-import java.io.IOException;
-import java.lang.reflect.Method;
-import java.util.ArrayList;
-import java.util.Arrays;
-import java.util.Collections;
-import java.util.HashMap;
-import java.util.HashSet;
-import java.util.LinkedList;
-import java.util.List;
-import java.util.Map;
-import java.util.Objects;
-import java.util.Set;
-import java.util.concurrent.Callable;
-import java.util.concurrent.ExecutionException;
-import java.util.concurrent.ExecutorService;
-import java.util.concurrent.Executors;
-import java.util.concurrent.Future;

Review Comment:
   Just add the package you added.



##
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/hive/HoodieCombineHiveInputFormat.java:
##
@@ -121,13 +123,15 @@ protected HoodieCombineFileInputFormatShim 
createInputFormatShim() {
 return new 
HoodieCombineHiveInputFormat.HoodieCombineFileInputFormatShim<>();
   }
 
+
   /**
* Create Hive splits based on CombineFileSplit.
*/
-  private InputSplit[] getCombineSplits(JobConf job, int numSplits, Map pathToPartitionInfo)
+  private InputSplit[] getCombineSplits(JobConf job, int numSplits,
+  Map pathToPartitionInfo)

Review Comment:
   Is this to fix checkStyle?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org