lucasmo opened a new issue, #466: URL: https://github.com/apache/incubator-xtable/issues/466
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/incubator-xtable/issues?q=is%3Aissue) and found no similar issues.

### Please describe the bug

I'm trying to use XTable to convert a Hudi source to a Delta target, and I am receiving the exception below. The table is active and frequently updated, and it is being actively queried as a Hudi table. Is there any other debug information I can provide to make this more useful?

Environment:

- git head: 4a96627a
- OS: Linux/Ubuntu
- Java 11
- Modified log4j2.xml to set level=trace for org.apache.hudi and org.apache.xtable (sketched below)
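For reference, the logging change is roughly the sketch below (illustrative only; `Console` is a placeholder appender ref and must match whatever appender the bundled configuration actually declares):

```xml
<!-- Illustrative log4j2.xml logger overrides used to capture the output
     below; only the two trace-level loggers were added. -->
<Loggers>
  <Logger name="org.apache.hudi" level="trace" additivity="false">
    <AppenderRef ref="Console"/>
  </Logger>
  <Logger name="org.apache.xtable" level="trace" additivity="false">
    <AppenderRef ref="Console"/>
  </Logger>
  <Root level="info">
    <AppenderRef ref="Console"/>
  </Root>
</Loggers>
```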
## Run with stacktrace:

```
$ java -jar ./xtable-utilities/target/xtable-utilities-0.1.0-SNAPSHOT-bundled.jar --datasetConfig config.yaml
WARNING: Runtime environment or build system does not support multi-release JARs. This will impact location-based features.
2024-06-05 23:22:05 INFO org.apache.xtable.utilities.RunSync:148 - Running sync for basePath s3://hidden-s3-bucket/hidden-prefix/ for following table formats [DELTA]
2024-06-05 23:22:05 INFO org.apache.hudi.common.table.HoodieTableMetaClient:133 - Loading HoodieTableMetaClient from s3://hidden-s3-bucket/hidden-prefix
2024-06-05 23:22:05 WARN org.apache.hadoop.util.NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2024-06-05 23:22:05 WARN org.apache.hadoop.metrics2.impl.MetricsConfig:136 - Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
2024-06-05 23:22:06 WARN org.apache.hadoop.fs.s3a.SDKV2Upgrade:39 - Directly referencing AWS SDK V1 credential provider com.amazonaws.auth.DefaultAWSCredentialsProviderChain. AWS SDK V1 credential providers will be removed once S3A is upgraded to SDK V2
2024-06-05 23:22:07 INFO org.apache.hudi.common.table.HoodieTableConfig:276 - Loading table properties from s3://hidden-s3-bucket/hidden-prefix/.hoodie/hoodie.properties
2024-06-05 23:22:07 INFO org.apache.hudi.common.table.HoodieTableMetaClient:152 - Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from s3://hidden-s3-bucket/hidden-prefix
2024-06-05 23:22:07 INFO org.apache.hudi.common.table.HoodieTableMetaClient:155 - Loading Active commit timeline for s3://hidden-s3-bucket/hidden-prefix
2024-06-05 23:22:07 INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline:171 - Loaded instants upto : Option{val=[20240605231910580__clean__COMPLETED__20240605231918000]}
2024-06-05 23:22:07 INFO org.apache.hudi.common.table.HoodieTableMetaClient:133 - Loading HoodieTableMetaClient from s3://hidden-s3-bucket/hidden-prefix
2024-06-05 23:22:07 INFO org.apache.hudi.common.table.HoodieTableConfig:276 - Loading table properties from s3://hidden-s3-bucket/hidden-prefix/.hoodie/hoodie.properties
2024-06-05 23:22:07 INFO org.apache.hudi.common.table.HoodieTableMetaClient:152 - Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from s3://hidden-s3-bucket/hidden-prefix
2024-06-05 23:22:07 INFO org.apache.hudi.common.table.HoodieTableMetaClient:133 - Loading HoodieTableMetaClient from s3://hidden-s3-bucket/hidden-prefix/.hoodie/metadata
2024-06-05 23:22:07 INFO org.apache.hudi.common.table.HoodieTableConfig:276 - Loading table properties from s3://hidden-s3-bucket/hidden-prefix/.hoodie/metadata/.hoodie/hoodie.properties
2024-06-05 23:22:07 INFO org.apache.hudi.common.table.HoodieTableMetaClient:152 - Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from s3://hidden-s3-bucket/hidden-prefix/.hoodie/metadata
2024-06-05 23:22:08 INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline:171 - Loaded instants upto : Option{val=[20240605231910580__deltacommit__COMPLETED__20240605231917000]}
2024-06-05 23:22:08 INFO org.apache.hudi.common.table.view.AbstractTableFileSystemView:259 - Took 7 ms to read 0 instants, 0 replaced file groups
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.hbase.util.UnsafeAvailChecker (file:/incubator-xtable/xtable-utilities/target/xtable-utilities-0.1.0-SNAPSHOT-bundled.jar) to method java.nio.Bits.unaligned()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.hbase.util.UnsafeAvailChecker
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
2024-06-05 23:22:08 INFO org.apache.hudi.common.util.ClusteringUtils:147 - Found 0 files in pending clustering operations
2024-06-05 23:22:08 INFO org.apache.hudi.common.table.view.FileSystemViewManager:243 - Creating View Manager with storage type :MEMORY
2024-06-05 23:22:08 INFO org.apache.hudi.common.table.view.FileSystemViewManager:255 - Creating in-memory based Table View
2024-06-05 23:22:11 INFO org.apache.spark.sql.delta.storage.DelegatingLogStore:60 - LogStore `LogStoreAdapter(io.delta.storage.S3SingleDriverLogStore)` is used for scheme `s3`
2024-06-05 23:22:11 INFO org.apache.spark.sql.delta.DeltaLog:60 - Creating initial snapshot without metadata, because the directory is empty
2024-06-05 23:22:13 INFO org.apache.spark.sql.delta.InitialSnapshot:60 - [tableId=8eda3e8f-9dae-4d19-ac72-f625b8ccb0c5] Created snapshot InitialSnapshot(path=s3://hidden-s3-bucket/hidden-prefix/_delta_log, version=-1, metadata=Metadata(167f7b26-f82d-4765-97b9-b6e47d9147ec,null,null,Format(parquet,Map()),null,List(),Map(),Some(1717629733296)), logSegment=LogSegment(s3://hidden-s3-bucket/hidden-prefix/_delta_log,-1,List(),None,-1), checksumOpt=None)
2024-06-05 23:22:13 INFO org.apache.xtable.conversion.ConversionController:240 - No previous InternalTable sync for target. Falling back to snapshot sync.
2024-06-05 23:22:13 INFO org.apache.hudi.common.table.TableSchemaResolver:317 - Reading schema from s3://hidden-s3-bucket/hidden-prefix/op_date=2024-06-05/3b5d27af-ef39-4862-bbd9-d4a010f6056e-0_0-71-375_20240605231837826.parquet
2024-06-05 23:22:14 INFO org.apache.hudi.metadata.HoodieTableMetadataUtil:927 - Loading latest merged file slices for metadata table partition files
2024-06-05 23:22:14 INFO org.apache.hudi.common.table.view.AbstractTableFileSystemView:259 - Took 1 ms to read 0 instants, 0 replaced file groups
2024-06-05 23:22:14 INFO org.apache.hudi.common.util.ClusteringUtils:147 - Found 0 files in pending clustering operations
2024-06-05 23:22:14 INFO org.apache.hudi.common.table.view.AbstractTableFileSystemView:429 - Building file system view for partition (files)
2024-06-05 23:22:14 DEBUG org.apache.hudi.common.table.view.AbstractTableFileSystemView:435 - #files found in partition (files) =30, Time taken =40
2024-06-05 23:22:14 DEBUG org.apache.hudi.common.table.view.HoodieTableFileSystemView:386 - Adding file-groups for partition :files, #FileGroups=1
2024-06-05 23:22:14 DEBUG org.apache.hudi.common.table.view.AbstractTableFileSystemView:165 - addFilesToView: NumFiles=30, NumFileGroups=1, FileGroupsCreationTime=15, StoreTimeTaken=1
2024-06-05 23:22:14 DEBUG org.apache.hudi.common.table.view.AbstractTableFileSystemView:449 - Time to load partition (files) =57
2024-06-05 23:22:14 INFO org.apache.hudi.metadata.HoodieBackedTableMetadata:451 - Opened metadata base file from s3://hidden-s3-bucket/hidden-prefix/.hoodie/metadata/files/files-0000-0_0-67-1304_20240605210834482001.hfile at instant 20240605210834482001 in 9 ms
2024-06-05 23:22:14 INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline:171 - Loaded instants upto : Option{val=[20240605231910580__clean__COMPLETED__20240605231918000]}
2024-06-05 23:22:14 ERROR org.apache.xtable.utilities.RunSync:171 - Error running sync for s3://hidden-s3-bucket/hidden-prefix/
org.apache.hudi.exception.HoodieMetadataException: Failed to retrieve list of partition from metadata
    at org.apache.hudi.metadata.BaseTableMetadata.getAllPartitionPaths(BaseTableMetadata.java:127) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.xtable.hudi.HudiDataFileExtractor.getFilesCurrentState(HudiDataFileExtractor.java:116) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.xtable.hudi.HudiConversionSource.getCurrentSnapshot(HudiConversionSource.java:97) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.xtable.spi.extractor.ExtractFromSource.extractSnapshot(ExtractFromSource.java:38) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.xtable.conversion.ConversionController.syncSnapshot(ConversionController.java:183) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.xtable.conversion.ConversionController.sync(ConversionController.java:121) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.xtable.utilities.RunSync.main(RunSync.java:169) [xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
Caused by: java.lang.IllegalStateException: Recursive update
    at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1739) ~[?:?]
    at org.apache.avro.util.MapUtil.computeIfAbsent(MapUtil.java:42) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.avro.specific.SpecificData.getClass(SpecificData.java:257) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.avro.specific.SpecificData.newRecord(SpecificData.java:508) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:237) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:180) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.avro.generic.GenericDatumReader.readMap(GenericDatumReader.java:355) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:186) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.avro.specific.SpecificDatumReader.readField(SpecificDatumReader.java:136) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:248) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:180) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:161) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:154) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.avro.file.DataFileStream.next(DataFileStream.java:263) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.avro.file.DataFileStream.next(DataFileStream.java:248) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeAvroMetadata(TimelineMetadataUtils.java:209) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeHoodieRollbackMetadata(TimelineMetadataUtils.java:177) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.hudi.metadata.HoodieTableMetadataUtil.getRollbackedCommits(HoodieTableMetadataUtil.java:1355) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.hudi.metadata.HoodieTableMetadataUtil.lambda$getValidInstantTimestamps$37(HoodieTableMetadataUtil.java:1284) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183) ~[?:?]
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177) ~[?:?]
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655) ~[?:?]
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) ~[?:?]
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) ~[?:?]
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150) ~[?:?]
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173) ~[?:?]
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:?]
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497) ~[?:?]
    at org.apache.hudi.metadata.HoodieTableMetadataUtil.getValidInstantTimestamps(HoodieTableMetadataUtil.java:1283) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.getLogRecordScanner(HoodieBackedTableMetadata.java:473) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.openReaders(HoodieBackedTableMetadata.java:429) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.lambda$getOrCreateReaders$10(HoodieBackedTableMetadata.java:412) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1705) ~[?:?]
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.getOrCreateReaders(HoodieBackedTableMetadata.java:412) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.lookupKeysFromFileSlice(HoodieBackedTableMetadata.java:291) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordsByKeys(HoodieBackedTableMetadata.java:255) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordByKey(HoodieBackedTableMetadata.java:145) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.hudi.metadata.BaseTableMetadata.fetchAllPartitionPaths(BaseTableMetadata.java:316) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    at org.apache.hudi.metadata.BaseTableMetadata.getAllPartitionPaths(BaseTableMetadata.java:125) ~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
    ... 6 more
```
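If I'm reading the trace right, the root cause is the JDK's fail-fast for re-entrant `ConcurrentHashMap.computeIfAbsent`: the trace shows two nested `computeIfAbsent` calls (Hudi's reader cache at `ConcurrentHashMap.java:1705`, and Avro's `SpecificData` class cache via `MapUtil.computeIfAbsent` at `:1739`, where the exception is thrown), and the JDK throws `IllegalStateException: Recursive update` when a mapping function ends up re-entering the same map. A minimal JDK-only sketch of that failure mode (illustrative; not XTable or Hudi code):

```java
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: reproduces the JDK's "Recursive update" fail-fast,
// i.e. a computeIfAbsent whose mapping function re-enters the same map.
public class RecursiveUpdateDemo {
    private static final ConcurrentHashMap<String, String> CACHE = new ConcurrentHashMap<>();

    public static void main(String[] args) {
        // The inner call lands in the bin the outer call has reserved,
        // so on JDK 9+ (including the Java 11 used here) this throws:
        //   java.lang.IllegalStateException: Recursive update
        //     at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(...)
        CACHE.computeIfAbsent("schema",
                outer -> CACHE.computeIfAbsent("schema", inner -> "loaded"));
    }
}
```

I'm happy to try patched jars or add more trace logging if that helps narrow down which cache is recursing.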
## config.yaml:

```
sourceFormat: HUDI
targetFormats:
  - DELTA
datasets:
  - tableBasePath: s3://hidden-s3-bucket/hidden-prefix
    tableName: hidden_table
    partitionSpec: op_date:VALUE
```

## hoodie.properties from the table:

```
hoodie.table.timeline.timezone=LOCAL
hoodie.table.keygenerator.class=org.apache.hudi.keygen.SimpleKeyGenerator
hoodie.table.precombine.field=ts_millis
hoodie.table.version=6
hoodie.database.name=
hoodie.datasource.write.hive_style_partitioning=true
hoodie.table.metadata.partitions.inflight=
hoodie.table.checksum=2622850774
hoodie.partition.metafile.use.base.format=false
hoodie.table.cdc.enabled=false
hoodie.archivelog.folder=archived
hoodie.table.name=hidden_table
hoodie.populate.meta.fields=true
hoodie.table.type=COPY_ON_WRITE
hoodie.datasource.write.partitionpath.urlencode=false
hoodie.table.base.file.format=PARQUET
hoodie.datasource.write.drop.partition.columns=false
hoodie.table.metadata.partitions=files
hoodie.timeline.layout.version=1
hoodie.table.recordkey.fields=record_id
hoodie.table.partition.fields=op_date
```
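One more data point I can gather on request: the failure occurs inside `deserializeHoodieRollbackMetadata`, and the table takes constant writes, so rollback instants written around the time of the sync seem relevant. A hypothetical diagnostic (not part of the run above) to list them on both timelines:

```
# List rollback instants on the data table timeline and on the metadata
# table's timeline, to spot any partial or zero-byte rollback files.
aws s3 ls s3://hidden-s3-bucket/hidden-prefix/.hoodie/ | grep rollback
aws s3 ls s3://hidden-s3-bucket/hidden-prefix/.hoodie/metadata/.hoodie/ | grep rollback
```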
I submitted this to the dev@ mailing list and received no response, so I am filing it as an issue.

### Are you willing to submit PR?

- [ ] I am willing to submit a PR!
- [X] I am willing to submit a PR but need help getting started!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
