[GitHub] [hudi] vinothchandar commented on a change in pull request #2903: [HUDI-1850] Fixing read of a empty table but with failed write
vinothchandar commented on a change in pull request #2903:
URL: https://github.com/apache/hudi/pull/2903#discussion_r655073962

## File path: hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/HoodieSparkSqlWriterSuite.scala

## @@ -229,7 +231,42 @@ class HoodieSparkSqlWriterSuite extends FunSuite with Matchers {
     }
   }

-  test("test bulk insert dataset with datasource impl multiple rounds") {
+  test("test read of a table with one failed write") {
+    initSparkContext("test_read_table_with_one_failed_write")
+    val path = java.nio.file.Files.createTempDirectory("hoodie_test_path")
+    try {
+      val hoodieFooTableName = "hoodie_foo_tbl"
+      val fooTableModifier = Map("path" -> path.toAbsolutePath.toString,
+        HoodieWriteConfig.TABLE_NAME -> hoodieFooTableName,
+        DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY -> "_row_key",
+        DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY -> "partition")
+
+      val fooTableParams = HoodieWriterUtils.parametersWithWriteDefaults(fooTableModifier)
+      val props = new Properties()
+      fooTableParams.foreach(entry => props.setProperty(entry._1, entry._2))
+      val metaClient = HoodieTableMetaClient.initTableAndGetMetaClient(spark.sparkContext.hadoopConfiguration, path.toAbsolutePath.toString, props)
+
+      val partitionAndFileId = new util.HashMap[String, String]()
+      partitionAndFileId.put(HoodieTestDataGenerator.DEFAULT_FIRST_PARTITION_PATH, "file-1")
+
+      HoodieTestTable.of(metaClient).withPartitionMetaFiles(HoodieTestDataGenerator.DEFAULT_FIRST_PARTITION_PATH)
+        .addInflightCommit("001")
+        .withBaseFilesInPartitions(partitionAndFileId)
+
+      val snapshotDF1 = spark.read.format("org.apache.hudi")
+        .load(path.toAbsolutePath.toString + "/*/*/*/*")
+      snapshotDF1.count()
+      assertFalse(true)
+    } catch {
+      case e: InvalidTableException =>
+        assertTrue(e.getMessage.contains("Invalid Hoodie Table"))
+    } finally {
+      spark.stop()

Review comment:
@nsivabalan why are we stopping the Spark session here? Is it not shared outside of a single test?
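The quoted test forces a failure with `assertFalse(true)` after the read and then catches the expected exception. A more direct pattern is an `assertThrows`-style helper. Below is a minimal, self-contained Java sketch of that pattern; `InvalidTableException` and `assertThrows` here are local stand-ins, not Hudi's real exception class or JUnit's real assertion:

```java
public class ExpectExceptionSketch {
    // Local stand-in for org.apache.hudi.exception.InvalidTableException (assumption).
    static class InvalidTableException extends RuntimeException {
        InvalidTableException(String msg) { super(msg); }
    }

    // Minimal analogue of JUnit 5's assertThrows: run the body, return the thrown
    // exception if it matches the expected type, fail otherwise.
    static <T extends Throwable> T assertThrows(Class<T> expected, Runnable body) {
        try {
            body.run();
        } catch (Throwable t) {
            if (expected.isInstance(t)) {
                return expected.cast(t);
            }
            throw new AssertionError("Unexpected exception type: " + t, t);
        }
        throw new AssertionError("Expected " + expected.getName() + " was not thrown");
    }

    public static void main(String[] args) {
        // The read of the table with only a failed (inflight) write is simulated
        // here by throwing directly; the assertion on the message mirrors the test.
        InvalidTableException e = assertThrows(InvalidTableException.class, () -> {
            throw new InvalidTableException("Invalid Hoodie Table: no completed commits");
        });
        if (!e.getMessage().contains("Invalid Hoodie Table")) {
            throw new AssertionError("message mismatch");
        }
        System.out.println("ok");
    }
}
```

With this shape, a forgotten exception fails the test explicitly instead of relying on the `assertFalse(true)` sentinel ever being reached.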
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
vinothchandar commented on a change in pull request #2903:
URL: https://github.com/apache/hudi/pull/2903#discussion_r638949244

## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala

## @@ -105,7 +105,9 @@ class DefaultSource extends RelationProvider
     val tableType = metaClient.getTableType
     val queryType = parameters(QUERY_TYPE_OPT_KEY)
     log.info(s"Is bootstrapped table => $isBootstrappedTable, tableType is: $tableType")
-
+    val schemaUtil = new TableSchemaResolver(metaClient)

Review comment:
Why call this `schemaUtil` as opposed to `schemaResolver`?

## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala

## @@ -105,7 +105,9 @@ class DefaultSource extends RelationProvider
     val tableType = metaClient.getTableType
     val queryType = parameters(QUERY_TYPE_OPT_KEY)
     log.info(s"Is bootstrapped table => $isBootstrappedTable, tableType is: $tableType")
-
+    val schemaUtil = new TableSchemaResolver(metaClient)
+    schemaUtil.getTableAvroSchema(false) // this will throw InvalidTableException if there is no

Review comment:
Verifying table existence via schema resolution seems pretty unintuitive to me. Can we do it from the `metaClient`?
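The suggestion above is to decide "this table has no readable data yet" from the meta client's timeline state rather than as a side effect of resolving the Avro schema. A hedged sketch of that idea follows; `Instant` and the validation helper are simplified stand-ins for Hudi's `HoodieInstant`/`HoodieTimeline`, not the real API:

```java
import java.util.List;

public class TableExistenceSketch {
    // Local stand-in for org.apache.hudi.exception.InvalidTableException (assumption).
    static class InvalidTableException extends RuntimeException {
        InvalidTableException(String msg) { super(msg); }
    }

    // Simplified stand-in for a timeline instant: a timestamp plus whether the
    // corresponding commit completed (an inflight/failed write never completes).
    static class Instant {
        final String timestamp;
        final boolean completed;
        Instant(String timestamp, boolean completed) {
            this.timestamp = timestamp;
            this.completed = completed;
        }
    }

    // Fail fast when the timeline holds no completed instants, i.e. the table
    // directory exists but every write so far failed or is still inflight.
    static void validateHasCompletedCommits(List<Instant> timeline, String basePath) {
        boolean anyCompleted = timeline.stream().anyMatch(i -> i.completed);
        if (!anyCompleted) {
            throw new InvalidTableException(
                "Invalid Hoodie Table: no completed commits under " + basePath);
        }
    }
}
```

The design point is that the timeline check states the intent (no completed commits) directly, instead of relying on schema resolution to fail for the same underlying reason.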
vinothchandar commented on a change in pull request #2903:
URL: https://github.com/apache/hudi/pull/2903#discussion_r629724089

## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala

## @@ -105,7 +105,9 @@ class DefaultSource extends RelationProvider
     val tableType = metaClient.getTableType
     val queryType = parameters(QUERY_TYPE_OPT_KEY)
     log.info(s"Is bootstrapped table => $isBootstrappedTable, tableType is: $tableType")
-
+    val schemaUtil = new TableSchemaResolver(metaClient)

Review comment:
Why though? We can create the table using the test table API and read from the data source, right?

## File path: hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java

## @@ -130,7 +131,11 @@ private MessageType getTableParquetSchemaFromDataFile() throws Exception {
             + " for file " + filePathWithFormat.getLeft());
       }
     } else {
-      return readSchemaFromLastCompaction(lastCompactionCommit);
+      if (lastCompactionCommit.isPresent()) {

Review comment:
How about using `lastCompactionCommit.map().orElseThrow()` instead of the if-else?
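The `map().orElseThrow()` suggestion collapses the if/else over the optional compaction commit into one fluent expression. A self-contained Java sketch of both forms; `readSchemaFromLastCompaction`, the `String`-typed instant, and `InvalidTableException` are placeholders for the real Hudi types, not the actual `TableSchemaResolver` code:

```java
import java.util.Optional;

public class OptionalOrElseThrowSketch {
    // Local stand-in for org.apache.hudi.exception.InvalidTableException (assumption).
    static class InvalidTableException extends RuntimeException {
        InvalidTableException(String msg) { super(msg); }
    }

    // Placeholder for the real schema read, so the sketch compiles on its own.
    static String readSchemaFromLastCompaction(String compactionInstant) {
        return "schema@" + compactionInstant;
    }

    // If-else form, analogous to the one in the quoted diff:
    static String schemaIfElse(Optional<String> lastCompactionCommit) {
        if (lastCompactionCommit.isPresent()) {
            return readSchemaFromLastCompaction(lastCompactionCommit.get());
        } else {
            throw new InvalidTableException("No completed compaction instant found");
        }
    }

    // Suggested fluent form: map over the Optional, throw when it is empty.
    static String schemaFluent(Optional<String> lastCompactionCommit) {
        return lastCompactionCommit
            .map(OptionalOrElseThrowSketch::readSchemaFromLastCompaction)
            .orElseThrow(() -> new InvalidTableException("No completed compaction instant found"));
    }
}
```

Both methods behave identically; the fluent form avoids the explicit `isPresent()`/`get()` pair, which is the usual `Optional` idiom.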