Thanks a lot Susu. I will look into the PR.
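For context, the change in that PR (https://github.com/apache/hudi/pull/2677) makes archival skip deserializing inflight instant files and only deserialize completed ones. Below is a minimal Python sketch of that idea; all names are hypothetical and Hudi's real implementation is Java (HoodieTimelineArchiveLog), so this is only an illustration:

```python
# Illustrative sketch (NOT Hudi's actual code) of the archival change in
# https://github.com/apache/hudi/pull/2677: skip deserializing inflight
# instant files and only deserialize completed ones. All function names
# here are hypothetical.

def is_inflight(instant_file: str) -> bool:
    """Inflight instants carry an .inflight suffix,
    e.g. 20200715192915.rollback.inflight."""
    return instant_file.endswith(".inflight")

def deserialize_avro_metadata(instant_file: str):
    """Stand-in for real Avro deserialization of a completed instant."""
    return {"instant": instant_file}

def archive_instants(instant_files):
    """Return (file, metadata) pairs; inflight files are archived without
    attempting Avro deserialization of their content."""
    result = []
    for f in instant_files:
        if is_inflight(f):
            # The old code path tried to open these as Avro data files and
            # failed with "Not an Avro data file"; the fix skips the content.
            result.append((f, None))
        else:
            result.append((f, deserialize_avro_metadata(f)))
    return result
```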
Thanks,
Aakash

On Wed, Jun 23, 2021 at 9:48 PM Susu Dong <susudo...@gmail.com> wrote:

> Hi Aakash,
>
> Deleting the old commit files should not have much of an impact, since
> you are unlikely to use them again once they have been archived
> successfully; you have already deleted some of the archived files
> yourself anyway. 😅
>
> However, I went back and dug through the codebase again. A fix was
> recently merged into master and is expected to come out in 0.9.0; it
> should be a better fix for this problem than manual intervention. If
> you are interested, you can take a look at the fix here:
> https://github.com/apache/hudi/pull/2677
> We will be *skipping* the deserialization of inflight commit files and
> *only* deserializing completed commit files. As you can see, your
> problem is caused by archiving 20200715192915.rollback.inflight, which
> is an inflight commit file. We aren't particularly interested in the
> content of those inflight files, so we decided to modify the archival
> logic this way.
>
> Failure to archive the commit files should not impede your usage of
> Hudi; it will continue to function properly. However, if you do care
> about a clean running status for your pipeline, feel free to build a
> 0.9.0-SNAPSHOT version yourself and use it. Hope it helps. :)
>
> Best,
> Susu
>
>
> On Thu, Jun 24, 2021 at 12:32 AM aakash aakash <email2aak...@gmail.com>
> wrote:
>
> > Hi Susu,
> >
> > Thanks for the response. Can you please explain what the impact of
> > deleting these commit files would be?
> >
> > Thanks!
> >
> > On Wed, Jun 23, 2021 at 8:09 AM Susu Dong <susudo...@gmail.com> wrote:
> >
> > > Hi Aakash,
> > >
> > > I believe there were schema-level changes to those commit files
> > > between Hudi 0.5.0 and 0.6.0. So if you jump from 0.5.0 straight to
> > > 0.8.0, you will likely hit exactly this error, i.e. "Failed to
> > > archive commits."
> > > You shouldn't need to delete archived files; instead, you should
> > > try deleting some, if not all, of the active commit files under
> > > your *.hoodie* folder. The reason is that 0.8.0 uses a new Avro
> > > schema to parse your old commit files, which is why you got the
> > > failure. Can you try the above approach and let us know? Thank
> > > you. :)
> > >
> > > Best,
> > > Susu
> > >
> > > On Wed, Jun 23, 2021 at 12:21 PM aakash aakash
> > > <email2aak...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I am trying to use Hudi 0.8 with Spark 3.0 in my prod environment;
> > > > earlier we were running Hudi 0.5 with Spark 2.4.4.
> > > >
> > > > While updating a very old index, I am getting this error:
> > > >
> > > > *From the logs it seems it errors out while reading this file:
> > > > hudi/.hoodie/archived/.commits_.archive.119_1-0-1 in S3*
> > > >
> > > > 21/06/22 19:18:06 ERROR HoodieTimelineArchiveLog: Failed to archive
> > > > commits, .commit file: 20200715192915.rollback.inflight
> > > > java.io.IOException: Not an Avro data file
> > > >   at org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:50)
> > > >   at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeAvroMetadata(TimelineMetadataUtils.java:175)
> > > >   at org.apache.hudi.client.utils.MetadataConversionUtils.createMetaWrapper(MetadataConversionUtils.java:84)
> > > >   at org.apache.hudi.table.HoodieTimelineArchiveLog.convertToAvroRecord(HoodieTimelineArchiveLog.java:370)
> > > >   at org.apache.hudi.table.HoodieTimelineArchiveLog.archive(HoodieTimelineArchiveLog.java:311)
> > > >   at org.apache.hudi.table.HoodieTimelineArchiveLog.archiveIfRequired(HoodieTimelineArchiveLog.java:128)
> > > >   at org.apache.hudi.client.AbstractHoodieWriteClient.postCommit(AbstractHoodieWriteClient.java:430)
> > > >   at org.apache.hudi.client.AbstractHoodieWriteClient.commitStats(AbstractHoodieWriteClient.java:186)
> > > >   at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:121)
> > > >   at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:479)
> > > >
> > > > Is this a backward compatibility issue? I have deleted a few
> > > > archive files but the problem persists, so it does not look like a
> > > > file corruption issue.
> > > >
> > > > Regards,
> > > > Aakash
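On the "Not an Avro data file" error itself: per the Avro specification, object container files must begin with the 4-byte magic `Obj` followed by the version byte 1. An inflight file such as 20200715192915.rollback.inflight is typically empty (or not an Avro container at all), so the header check in DataFileReader fails. A minimal sketch of that check, illustrative only and not the Avro library's code:

```python
# Avro object container files start with the 4-byte magic b"Obj\x01"
# (per the Avro spec). An empty or non-Avro inflight file fails this
# check, which surfaces as "Not an Avro data file". Illustration only.

AVRO_MAGIC = b"Obj\x01"

def looks_like_avro_data_file(raw: bytes) -> bool:
    """Return True if the bytes begin with the Avro container magic."""
    return raw[:4] == AVRO_MAGIC

# An empty inflight marker file fails the check, while a real Avro
# container file (which starts with the magic bytes) passes it.
```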