[ https://issues.apache.org/jira/browse/HUDI-4586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ethan Guo updated HUDI-4586:
----------------------------
Description:
For a partitioned table, a significant number of S3 requests time out, causing upserts to fail when using the Bloom Index with the metadata table.

{code:java}
Load meta index key ranges for file slices: hudi
collect at HoodieSparkEngineContext.java:137
org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:45)
org.apache.hudi.client.common.HoodieSparkEngineContext.flatMap(HoodieSparkEngineContext.java:137)
org.apache.hudi.index.bloom.HoodieBloomIndex.loadColumnRangesFromMetaIndex(HoodieBloomIndex.java:213)
org.apache.hudi.index.bloom.HoodieBloomIndex.getBloomIndexFileInfoForPartitions(HoodieBloomIndex.java:145)
org.apache.hudi.index.bloom.HoodieBloomIndex.lookupIndex(HoodieBloomIndex.java:123)
org.apache.hudi.index.bloom.HoodieBloomIndex.tagLocation(HoodieBloomIndex.java:89)
org.apache.hudi.table.action.commit.HoodieWriteHelper.tag(HoodieWriteHelper.java:49)
org.apache.hudi.table.action.commit.HoodieWriteHelper.tag(HoodieWriteHelper.java:32)
org.apache.hudi.table.action.commit.BaseWriteHelper.write(BaseWriteHelper.java:53)
org.apache.hudi.table.action.commit.SparkUpsertCommitActionExecutor.execute(SparkUpsertCommitActionExecutor.java:45)
org.apache.hudi.table.HoodieSparkCopyOnWriteTable.upsert(HoodieSparkCopyOnWriteTable.java:113)
org.apache.hudi.table.HoodieSparkCopyOnWriteTable.upsert(HoodieSparkCopyOnWriteTable.java:97)
org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:155)
org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:206)
org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:329)
org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:183)
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
{code}

was: For a partitioned table, a significant number of S3 requests time out, causing upserts to fail when using the Bloom Index with the metadata table.

> Address S3 timeouts in Bloom Index with metadata table
> ------------------------------------------------------
>
>                 Key: HUDI-4586
>                 URL: https://issues.apache.org/jira/browse/HUDI-4586
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: Ethan Guo
>            Assignee: Ethan Guo
>            Priority: Blocker
>             Fix For: 0.13.0
>
>         Attachments: Screen Shot 2022-08-15 at 17.39.01.png
>
>
> For a partitioned table, a significant number of S3 requests time out, causing upserts to fail when using the Bloom Index with the metadata table.
> {code:java}
> Load meta index key ranges for file slices: hudi
> collect at HoodieSparkEngineContext.java:137
> org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:45)
> org.apache.hudi.client.common.HoodieSparkEngineContext.flatMap(HoodieSparkEngineContext.java:137)
> org.apache.hudi.index.bloom.HoodieBloomIndex.loadColumnRangesFromMetaIndex(HoodieBloomIndex.java:213)
> org.apache.hudi.index.bloom.HoodieBloomIndex.getBloomIndexFileInfoForPartitions(HoodieBloomIndex.java:145)
> org.apache.hudi.index.bloom.HoodieBloomIndex.lookupIndex(HoodieBloomIndex.java:123)
> org.apache.hudi.index.bloom.HoodieBloomIndex.tagLocation(HoodieBloomIndex.java:89)
> org.apache.hudi.table.action.commit.HoodieWriteHelper.tag(HoodieWriteHelper.java:49)
> org.apache.hudi.table.action.commit.HoodieWriteHelper.tag(HoodieWriteHelper.java:32)
> org.apache.hudi.table.action.commit.BaseWriteHelper.write(BaseWriteHelper.java:53)
> org.apache.hudi.table.action.commit.SparkUpsertCommitActionExecutor.execute(SparkUpsertCommitActionExecutor.java:45)
>
> org.apache.hudi.table.HoodieSparkCopyOnWriteTable.upsert(HoodieSparkCopyOnWriteTable.java:113)
> org.apache.hudi.table.HoodieSparkCopyOnWriteTable.upsert(HoodieSparkCopyOnWriteTable.java:97)
> org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:155)
> org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:206)
> org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:329)
> org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:183)
> org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
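(Editor's note for context.) The code path in the stack trace — HoodieBloomIndex.loadColumnRangesFromMetaIndex reading key ranges from the metadata table during tagLocation — is exercised when the Bloom Index is configured to consult the metadata table. A minimal sketch of the write options involved is below; the table name, partition field, and path are hypothetical placeholders, not from this ticket:

{code:java}
// Hypothetical upsert into a partitioned Hudi table with the Bloom Index
// backed by the metadata table (the configuration under which the
// reported S3 timeouts occur). df is an existing Dataset<Row>.
df.write().format("hudi")
  .option("hoodie.table.name", "my_table")                       // placeholder
  .option("hoodie.datasource.write.operation", "upsert")
  .option("hoodie.datasource.write.partitionpath.field", "dt")   // placeholder
  .option("hoodie.index.type", "BLOOM")
  .option("hoodie.metadata.enable", "true")
  .option("hoodie.bloom.index.use.metadata", "true")
  .mode(SaveMode.Append)
  .save("s3a://bucket/path/my_table");                           // placeholder
{code}

With hoodie.bloom.index.use.metadata set to false, the index instead reads footers from the base files directly, bypassing the metadata-table lookup shown in the trace.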