[GitHub] [hudi] vinothchandar commented on a change in pull request #2903: [HUDI-1850] Fixing read of a empty table but with failed write

2021-06-20 Thread GitBox


vinothchandar commented on a change in pull request #2903:
URL: https://github.com/apache/hudi/pull/2903#discussion_r655073962



##
File path: 
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/HoodieSparkSqlWriterSuite.scala
##
@@ -229,7 +231,42 @@ class HoodieSparkSqlWriterSuite extends FunSuite with Matchers {
 }
   }
 
-  test("test bulk insert dataset with datasource impl multiple rounds") {
+  test("test read of a table with one failed write") {
+initSparkContext("test_read_table_with_one_failed_write")
+val path = java.nio.file.Files.createTempDirectory("hoodie_test_path")
+try {
+  val hoodieFooTableName = "hoodie_foo_tbl"
+  val fooTableModifier = Map("path" -> path.toAbsolutePath.toString,
+HoodieWriteConfig.TABLE_NAME -> hoodieFooTableName,
+DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY -> "_row_key",
+DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY -> "partition")
+
+  val fooTableParams = HoodieWriterUtils.parametersWithWriteDefaults(fooTableModifier)
+  val props = new Properties()
+  fooTableParams.foreach(entry => props.setProperty(entry._1, entry._2))
+  val metaClient = HoodieTableMetaClient.initTableAndGetMetaClient(spark.sparkContext.hadoopConfiguration, path.toAbsolutePath.toString, props)
+
+  val partitionAndFileId = new util.HashMap[String, String]()
+  partitionAndFileId.put(HoodieTestDataGenerator.DEFAULT_FIRST_PARTITION_PATH, "file-1")
+
+  HoodieTestTable.of(metaClient).withPartitionMetaFiles(HoodieTestDataGenerator.DEFAULT_FIRST_PARTITION_PATH)
+.addInflightCommit("001")
+.withBaseFilesInPartitions(partitionAndFileId)
+
+  val snapshotDF1 = spark.read.format("org.apache.hudi")
+.load(path.toAbsolutePath.toString + "/*/*/*/*")
+  snapshotDF1.count()
+  assertFalse(true)
+}  catch {
+  case e: InvalidTableException =>
+assertTrue(e.getMessage.contains("Invalid Hoodie Table"))
+} finally {
+  spark.stop()

Review comment:
   @nsivabalan why are we stopping the spark session here? Is it not shared outside of a single test?
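The reviewer's concern can be illustrated without Spark: when a session is shared across a whole suite, a single test calling `stop()` invalidates it for every test that runs afterwards, so teardown belongs to the suite, not the test. A minimal sketch of that failure mode, using a hypothetical `SharedSession` class as a stand-in for a suite-shared `SparkSession` (none of these names are Spark or Hudi APIs):

```java
// Illustrative sketch: a suite-wide shared resource should be torn down once
// by the suite, not by an individual test. Stopping it inside one test forces
// every later test to rebuild it (or fail against a dead instance).
final class SharedSession {
    private static SharedSession instance;
    private boolean stopped = false;

    // Returns the live shared instance, rebuilding it only if it was stopped.
    static SharedSession getOrCreate() {
        if (instance == null || instance.stopped) {
            instance = new SharedSession();
        }
        return instance;
    }

    boolean isActive() { return !stopped; }

    void stop() { stopped = true; }
}

public class SharedFixtureDemo {
    public static void main(String[] args) {
        SharedSession s1 = SharedSession.getOrCreate();
        SharedSession s2 = SharedSession.getOrCreate();
        // Two "tests" observe the same live instance.
        System.out.println(s1 == s2);   // true
        s1.stop();                      // one test stops the shared session...
        SharedSession s3 = SharedSession.getOrCreate();
        // ...so the next test no longer sees the original instance.
        System.out.println(s3 == s1);   // false
    }
}
```

In ScalaTest, the usual shape is to create the session once and stop it in a suite-level after-all hook rather than inside an individual test body.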




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vinothchandar commented on a change in pull request #2903: [HUDI-1850] Fixing read of a empty table but with failed write

2021-05-25 Thread GitBox


vinothchandar commented on a change in pull request #2903:
URL: https://github.com/apache/hudi/pull/2903#discussion_r638949244



##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala
##
@@ -105,7 +105,9 @@ class DefaultSource extends RelationProvider
 val tableType = metaClient.getTableType
 val queryType = parameters(QUERY_TYPE_OPT_KEY)
log.info(s"Is bootstrapped table => $isBootstrappedTable, tableType is: $tableType")
-
+val schemaUtil = new TableSchemaResolver(metaClient)

Review comment:
   Why call this `schemaUtil` as opposed to `schemaResolver`?

##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala
##
@@ -105,7 +105,9 @@ class DefaultSource extends RelationProvider
 val tableType = metaClient.getTableType
 val queryType = parameters(QUERY_TYPE_OPT_KEY)
log.info(s"Is bootstrapped table => $isBootstrappedTable, tableType is: $tableType")
-
+val schemaUtil = new TableSchemaResolver(metaClient)
+schemaUtil.getTableAvroSchema(false) // this will throw InValidTableException if there is no

Review comment:
   Verifying table existence via schema resolution seems pretty unintuitive to me. Can we do it from the `metaClient`?
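The point behind this comment can be sketched without Hudi: table existence is a metadata question, so checking the table's metadata location directly reads more clearly than triggering schema resolution and interpreting the exception it throws. A minimal filesystem sketch under the assumption that a table's metadata lives in a well-known subdirectory (Hudi keeps its metadata under `.hoodie`); the `tableExists` helper is illustrative, not Hudi's actual API:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch: decide table existence from metadata presence,
// rather than by attempting schema resolution and catching its failure.
public class TableExistenceCheck {
    // Hypothetical helper: a table "exists" if its metadata directory is present.
    static boolean tableExists(Path basePath) {
        return Files.isDirectory(basePath.resolve(".hoodie"));
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("tbl");
        System.out.println(tableExists(base)); // false: no metadata directory yet
        Files.createDirectory(base.resolve(".hoodie"));
        System.out.println(tableExists(base)); // true
    }
}
```

The advantage of the direct check is that the reader sees one boolean question with one answer, instead of a schema lookup whose exception type doubles as an existence signal.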








[GitHub] [hudi] vinothchandar commented on a change in pull request #2903: [HUDI-1850] Fixing read of a empty table but with failed write

2021-05-10 Thread GitBox


vinothchandar commented on a change in pull request #2903:
URL: https://github.com/apache/hudi/pull/2903#discussion_r629724089



##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala
##
@@ -105,7 +105,9 @@ class DefaultSource extends RelationProvider
 val tableType = metaClient.getTableType
 val queryType = parameters(QUERY_TYPE_OPT_KEY)
log.info(s"Is bootstrapped table => $isBootstrappedTable, tableType is: $tableType")
-
+val schemaUtil = new TableSchemaResolver(metaClient)

Review comment:
   Why though? We can create the table using the test table API and read from the data source, right?

##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java
##
@@ -130,7 +131,11 @@ private MessageType getTableParquetSchemaFromDataFile() throws Exception {
 + " for file " + filePathWithFormat.getLeft());
 }
   } else {
-return readSchemaFromLastCompaction(lastCompactionCommit);
+if (lastCompactionCommit.isPresent()) {

Review comment:
   How about using `lastCompactionCommit.map().orElseThrow()` instead of the if-else?
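The suggested pattern collapses the `isPresent()` branch into a single chained expression. A standalone sketch of the shape, using `java.util.Optional` as a stand-in for Hudi's own `Option` type; `readSchema` and the commit string are illustrative stand-ins, not Hudi signatures:

```java
import java.util.Optional;

public class OptionalChainDemo {
    // Stand-in for a schema-reading call such as readSchemaFromLastCompaction(...).
    static String readSchema(String commit) {
        return "schema-of-" + commit;
    }

    // The if-else style being replaced:
    static String schemaIfElse(Optional<String> lastCompactionCommit) {
        if (lastCompactionCommit.isPresent()) {
            return readSchema(lastCompactionCommit.get());
        } else {
            throw new IllegalStateException("Cannot find last compaction commit");
        }
    }

    // The chained style suggested in the review: map over the present value,
    // and throw from one place when it is absent.
    static String schemaChained(Optional<String> lastCompactionCommit) {
        return lastCompactionCommit
            .map(OptionalChainDemo::readSchema)
            .orElseThrow(() -> new IllegalStateException("Cannot find last compaction commit"));
    }

    public static void main(String[] args) {
        System.out.println(schemaChained(Optional.of("001"))); // schema-of-001
    }
}
```

Both methods behave identically; the chained form avoids the `get()` call, which is the spot where if-else versions tend to drift out of sync with their presence check.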



