Hi Team,
We are trying to run incremental updates to our MoR Hudi table on S3, and it
seems that after 20-30 commits the table inevitably gets corrupted. We do an
initial data import and enable incremental upserts, then verify that the
tables are readable by running:
hive> select * from table_name_ro limit 1;
but after letting the incremental upserts run for several hours, the select
query above starts throwing exceptions like:
Failed with exception java.io.IOException:java.lang.IllegalStateException: Hudi
File Id (HoodieFileGroupId{partitionPath='983',
fileId='8e9fde92-7515-4f89-a667-ce5c1087e60c-0'}) has more than 1 pending
compactions.
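We inspect the pending compaction plans with hudi-cli, using commands roughly
like the following (the S3 path here is illustrative, and <instant_time>
stands for the actual compaction instant shown in the plan listing):

connect --path s3://our-bucket/path/to/table
compactions show all
compaction show --instant <instant_time>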
Checking the compactions mentioned in the exception message this way does
indeed verify that the file id is present in both pending compaction plans.
The upsert settings that we use are:
val hudiOptions = Map[String, String](
  HoodieWriteConfig.TABLE_NAME -> inputTableName,
  "hoodie.consistency.check.enabled" -> "true",
  "hoodie.compact.inline.max.delta.commits" -> "30",
  "hoodie.compact.inline" -> "true",
  "hoodie.clean.automatic" -> "true",
  "hoodie.cleaner.commits.retained" -> "1000",
  "hoodie.keep.min.commits" -> "1001",
  "hoodie.keep.max.commits" -> "1050",
  DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY -> "MERGE_ON_READ",
  DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY -> primaryKeys,
  DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY -> classOf[ComplexKeyGenerator].getName,
  DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY -> "partition_val_str",
  DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY -> sortKeys,
  DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY -> "true",
  DataSourceWriteOptions.HIVE_TABLE_OPT_KEY -> inputTableName,
  DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY -> "partition_val_str",
  DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY -> classOf[MultiPartKeysValueExtractor].getName,
  DataSourceWriteOptions.HIVE_URL_OPT_KEY -> s"jdbc:hive2://$hiveServer2URI:10000"
)
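For reference, each incremental batch is written roughly as sketched below;
df and basePath are placeholders for our actual DataFrame of new records and
the table's S3 base path:

import org.apache.hudi.DataSourceWriteOptions
import org.apache.spark.sql.SaveMode

// Upsert one incremental batch into the MoR table using the options above.
df.write
  .format("org.apache.hudi")
  .options(hudiOptions)
  .option(DataSourceWriteOptions.OPERATION_OPT_KEY,
    DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)
  .mode(SaveMode.Append)
  .save(basePath)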
Any suggestions on what could cause this issue, or on how we might debug it,
would help a lot.
Thank you,
Anton Zuyeu