Thanks a lot Susu.

I will look into the PR.

Thanks,
Aakash

On Wed, Jun 23, 2021 at 9:48 PM Susu Dong <susudo...@gmail.com> wrote:

> Hi Aakash,
>
> Deleting the old commit files should not have much of an impact, since you
> are unlikely to use them again once they have been archived successfully;
> you have also already deleted some of the archived files yourself. 😅
>
> However, I went back and dug into the codebase again. A fix has been merged
> into master recently and is expected to come out in 0.9.0, which should be
> a better fix for this problem than manual intervention. Specifically, you
> can take a look at the fix here if you are interested:
> https://github.com/apache/hudi/pull/2677
> We will be *skipping* the deserialization of inflight commit files and
> *only* deserializing complete commit files. As you can see, your problem is
> caused by archiving 20200715192915.rollback.inflight, which is an inflight
> commit file. We aren't particularly interested in the content of those
> inflight files; thus, we decided to modify the archival logic this way.
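>
> In case it helps to picture the change, here is a minimal sketch of the
> idea in plain Java (my own simplified types, *not* the actual classes
> touched by the PR; the real logic lives in HoodieTimelineArchiveLog):
>
>     class ArchivalSketch {
>         static class Instant {
>             String fileName;   // e.g. "20200715192915.rollback.inflight"
>             boolean completed; // false for *.inflight files
>         }
>
>         static void archive(java.util.List<Instant> instants) {
>             for (Instant instant : instants) {
>                 if (!instant.completed) {
>                     // Skip inflight files: their content is not needed
>                     // in the archived timeline, and deserializing them
>                     // is what raised "Not an Avro data file".
>                     continue;
>                 }
>                 deserializeAndArchive(instant); // complete files only
>             }
>         }
>
>         static void deserializeAndArchive(Instant instant) { /* ... */ }
>     }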
>
> Failure to archive the commit files should not impede your usage of Hudi;
> it will continue to function properly. However, if you do care about your
> pipeline running cleanly, feel free to build a 0.9.0-SNAPSHOT version
> yourself and blend it in. Hope it helps. :)
>
> Best,
> Susu
>
>
> On Thu, Jun 24, 2021 at 12:32 AM aakash aakash <email2aak...@gmail.com>
> wrote:
>
> > Hi Susu,
> >
> > thanks for the response. Can you please explain what the impact of
> > deleting these commit files would be?
> >
> > Thanks!
> >
> > On Wed, Jun 23, 2021 at 8:09 AM Susu Dong <susudo...@gmail.com> wrote:
> >
> > > Hi Aakash,
> > >
> > > I believe there were schema-level changes from Hudi 0.5.0 to 0.6.0
> > > regarding those commit files, so if you are jumping from 0.5.0 to 0.8.0
> > > right away, you will likely experience such an error, i.e. "Failed to
> > > archive commits". You shouldn't need to delete archived files; instead,
> > > you should try deleting some, if not all, of the active commit files
> > > under your *.hoodie* folder. The reason is that 0.8.0 uses a new Avro
> > > schema to parse your old commit files, which is why you got the failure.
> > > Can you try the above approach and let us know? Thank you. :)
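> > >
> > > If it helps to narrow things down first, here is a small, hypothetical
> > > helper (not part of Hudi, just a sketch) that uses the same Avro entry
> > > point as the failing archival to list which files under a local copy of
> > > your *.hoodie* folder are not valid Avro data files:
> > >
> > >     import org.apache.avro.file.DataFileReader;
> > >     import org.apache.avro.file.FileReader;
> > >     import org.apache.avro.generic.GenericDatumReader;
> > >     import org.apache.avro.generic.GenericRecord;
> > >     import java.io.File;
> > >     import java.io.IOException;
> > >
> > >     public class FindBadCommitFiles {
> > >         public static void main(String[] args) throws Exception {
> > >             // args[0]: path to a local copy of the .hoodie folder
> > >             File[] files = new File(args[0]).listFiles(File::isFile);
> > >             if (files == null) return;
> > >             for (File f : files) {
> > >                 try (FileReader<GenericRecord> r = DataFileReader
> > >                         .openReader(f, new GenericDatumReader<>())) {
> > >                     // Opens fine: a valid Avro data file.
> > >                 } catch (IOException e) {
> > >                     // Same error the archival hits.
> > >                     System.out.println(f.getName() + ": "
> > >                             + e.getMessage());
> > >                 }
> > >             }
> > >         }
> > >     }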
> > >
> > > Best,
> > > Susu
> > >
> > > On Wed, Jun 23, 2021 at 12:21 PM aakash aakash <email2aak...@gmail.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I am trying to use Hudi 0.8 with Spark 3.0 in my prod environment;
> > > > earlier we were running Hudi 0.5 with Spark 2.4.4.
> > > >
> > > > While updating a very old index, I am getting this error:
> > > >
> > > > *From the logs, it seems to error out while reading this file:
> > > > hudi/.hoodie/archived/.commits_.archive.119_1-0-1 in S3*
> > > >
> > > > 21/06/22 19:18:06 ERROR HoodieTimelineArchiveLog: Failed to archive
> > > > commits, .commit file: 20200715192915.rollback.inflight
> > > > java.io.IOException: Not an Avro data file
> > > >   at org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:50)
> > > >   at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeAvroMetadata(TimelineMetadataUtils.java:175)
> > > >   at org.apache.hudi.client.utils.MetadataConversionUtils.createMetaWrapper(MetadataConversionUtils.java:84)
> > > >   at org.apache.hudi.table.HoodieTimelineArchiveLog.convertToAvroRecord(HoodieTimelineArchiveLog.java:370)
> > > >   at org.apache.hudi.table.HoodieTimelineArchiveLog.archive(HoodieTimelineArchiveLog.java:311)
> > > >   at org.apache.hudi.table.HoodieTimelineArchiveLog.archiveIfRequired(HoodieTimelineArchiveLog.java:128)
> > > >   at org.apache.hudi.client.AbstractHoodieWriteClient.postCommit(AbstractHoodieWriteClient.java:430)
> > > >   at org.apache.hudi.client.AbstractHoodieWriteClient.commitStats(AbstractHoodieWriteClient.java:186)
> > > >   at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:121)
> > > >   at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:479)
> > > >
> > > >
> > > > Is this a backward compatibility issue? I have deleted a few archived
> > > > files, but the problem persists, so it does not look like a file
> > > > corruption issue.
> > > >
> > > > Regards,
> > > > Aakash
> > > >
> > >
> >
>
