[GitHub] [hudi] nsivabalan commented on a diff in pull request #7488: [HUDI-5403] Turn off metadata-table-based file listing in BaseHoodieTableFileIndex

2022-12-16 Thread GitBox


nsivabalan commented on code in PR #7488:
URL: https://github.com/apache/hudi/pull/7488#discussion_r1051326658


##
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java:
##
@@ -111,6 +111,18 @@ public class HoodieTableMetadataUtil {
   public static final String PARTITION_NAME_COLUMN_STATS = "column_stats";
   public static final String PARTITION_NAME_BLOOM_FILTERS = "bloom_filters";
 
+  /**
+   * Returns whether the files partition of metadata table is ready for read.
+   *
+   * @param metaClient {@link HoodieTableMetaClient} instance.
+   * @return true if the files partition of metadata table is ready for read,
+   * based on the table config; false otherwise.
+   */
+  public static boolean isFilesPartitionAvailable(HoodieTableMetaClient 
metaClient) {

Review Comment:
   something to be wary about. our new readers may not work w/ tables from 
older version as we are relying on tableConfig. no changes required per se. 
just something to keep in mind



##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieFileIndex.scala:
##
@@ -251,8 +252,8 @@ case class HoodieFileIndex(spark: SparkSession,
 
   override def sizeInBytes: Long = getTotalCachedFilesSize
 
-  private def isDataSkippingEnabled: Boolean = 
HoodieSparkConfUtils.getBooleanConfigValue(
-options, spark.sessionState.conf, 
DataSourceReadOptions.ENABLE_DATA_SKIPPING.key(), false)
+  private def isDataSkippingEnabled: Boolean = getConfigValue(options, 
spark.sessionState.conf,
+DataSourceReadOptions.ENABLE_DATA_SKIPPING.key(), "false").toBoolean

Review Comment:
   shouldn't we do DataSourceReadOptions.ENABLE_DATA_SKIPPING.defaultValue() 
instead of hard coding "false"



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a diff in pull request #7488: [HUDI-5403] Turn off metadata-table-based file listing in BaseHoodieTableFileIndex

2022-12-16 Thread GitBox


nsivabalan commented on code in PR #7488:
URL: https://github.com/apache/hudi/pull/7488#discussion_r1050971882


##
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieCopyOnWriteTableInputFormat.java:
##
@@ -246,18 +258,38 @@ private List 
listStatusForSnapshotMode(JobConf job,
   queryCommitInstant,
   shouldIncludePendingCommits);
 
-  Map> partitionedFileSlices = 
fileIndex.listFileSlices();
-
-  Option virtualKeyInfoOpt = 
getHoodieVirtualKeyInfo(tableMetaClient);
-
-  targetFiles.addAll(
-  partitionedFileSlices.values()
-  .stream()
-  .flatMap(Collection::stream)
-  .filter(fileSlice -> checkIfValidFileSlice(fileSlice))
-  .map(fileSlice -> createFileStatusUnchecked(fileSlice, 
fileIndex, virtualKeyInfoOpt))
-  .collect(Collectors.toList())
-  );
+Map> partitionedFileSlices = 
fileIndex.listFileSlices();
+
+Option virtualKeyInfoOpt = 
getHoodieVirtualKeyInfo(tableMetaClient);
+
+targetFiles.addAll(partitionedFileSlices.values()
+.stream()
+.flatMap(Collection::stream)
+.filter(fileSlice -> checkIfValidFileSlice(fileSlice))
+.map(fileSlice -> createFileStatusUnchecked(fileSlice, fileIndex, 
virtualKeyInfoOpt))
+.collect(Collectors.toList())
+);
+  } else {
+LOG.info(">>> Inside non-metadata path");

Review Comment:
   please fix the logging



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org