Re: issue while reading archived commit written by 0.5 version with 0.8 version

2021-06-24 Thread aakash aakash
Thanks a lot Susu.

I will look into the PR.

Thanks,
Aakash

On Wed, Jun 23, 2021 at 9:48 PM Susu Dong  wrote:

> Hi Aakash,
>
> Deleting the old commit files should not have much of an impact, since you
> are unlikely to use them again once they have been archived successfully;
> you have also already deleted some of the archived files yourself.
>
> However, I went back and dug into the codebase again. A fix has been merged
> into master recently and is supposed to come out in 0.9.0, which should be
> a better fix to this problem than manual intervention.
> Specifically, you can take a look at this fix here
> https://github.com/apache/hudi/pull/2677, if you are interested.
> We will be *skipping* the deserialization of inflight commit files and
> *only* deserialize complete commit files. As you can see, your problem is
> caused by archiving 20200715192915.rollback.inflight, which is an inflight
> commit file. We aren't particularly interested in the content of those
> inflight files; thus, we have decided to modify the archival logic this
> way.
>
> Failure to archive the commit files should not impede your usage of Hudi,
> which can continue to function properly. However, if you do care about a
> clean running status for your pipeline, feel free to build a 0.9.0
> SNAPSHOT version yourself and use it. Hope it helps. :)
>
> Best,
> Susu
>
>
> On Thu, Jun 24, 2021 at 12:32 AM aakash aakash wrote:
>
> > Hi Susu,
> >
> > Thanks for the response. Can you please explain what the impact of
> > deleting these commit files is?
> >
> > Thanks!
> >
> > On Wed, Jun 23, 2021 at 8:09 AM Susu Dong  wrote:
> >
> > > Hi Aakash,
> > >
> > > I believe there were schema-level changes from Hudi 0.5.0 to 0.6.0
> > > regarding those commit files. So if you are jumping from 0.5.0 to 0.8.0
> > > right away, you will likely experience such an error, i.e. "Failed to
> > > archive commits". You shouldn't need to delete archived files; instead,
> > > you should try deleting some, if not all, active commit files under your
> > > *.hoodie* folder. The reason is that 0.8.0 uses a new Avro schema to
> > > parse your old commit files, hence the failure. Can you try the above
> > > approach and let us know? Thank you. :)
> > >
> > > Best,
> > > Susu
> > >
> > > On Wed, Jun 23, 2021 at 12:21 PM aakash aakash wrote:
> > >
> > > > Hi,
> > > >
> > > > I am trying to use Hudi 0.8 with Spark 3.0 in my prod environment;
> > > > earlier we were running Hudi 0.5 with Spark 2.4.4.
> > > >
> > > > While updating a very old index, I am getting this error:
> > > >
> > > > *From the logs, it seems to error out while reading this file:
> > > > hudi/.hoodie/archived/.commits_.archive.119_1-0-1 in S3*
> > > >
> > > > 21/06/22 19:18:06 ERROR HoodieTimelineArchiveLog: Failed to archive commits, .commit file: 20200715192915.rollback.inflight
> > > > java.io.IOException: Not an Avro data file
> > > >   at org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:50)
> > > >   at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeAvroMetadata(TimelineMetadataUtils.java:175)
> > > >   at org.apache.hudi.client.utils.MetadataConversionUtils.createMetaWrapper(MetadataConversionUtils.java:84)
> > > >   at org.apache.hudi.table.HoodieTimelineArchiveLog.convertToAvroRecord(HoodieTimelineArchiveLog.java:370)
> > > >   at org.apache.hudi.table.HoodieTimelineArchiveLog.archive(HoodieTimelineArchiveLog.java:311)
> > > >   at org.apache.hudi.table.HoodieTimelineArchiveLog.archiveIfRequired(HoodieTimelineArchiveLog.java:128)
> > > >   at org.apache.hudi.client.AbstractHoodieWriteClient.postCommit(AbstractHoodieWriteClient.java:430)
> > > >   at org.apache.hudi.client.AbstractHoodieWriteClient.commitStats(AbstractHoodieWriteClient.java:186)
> > > >   at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:121)
> > > >   at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:479)
> > > >
> > > >
> > > > Is this a backward compatibility issue? I have deleted a few archive
> > > > files, but the problem persists, so it does not look like a file
> > > > corruption issue.
> > > >
> > > > Regards,
> > > > Aakash
> > > >
> > >
> >
>
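
The approach Susu describes above (skip inflight commit files, deserialize only completed ones) can be illustrated with a small filter over timeline file names. This is only a sketch, not the code from the linked PR: the function names are hypothetical, and it assumes nothing more than that an inflight instant is recognizable by the `.inflight` suffix, as in the `20200715192915.rollback.inflight` file from the error log.

```python
def is_inflight(file_name: str) -> bool:
    """Return True if the timeline file name marks an inflight (incomplete) instant."""
    # Assumption: inflight instants end in ".inflight", as in the logged file name.
    return file_name.endswith(".inflight")


def split_timeline(file_names):
    """Partition timeline file names into (completed, inflight), preserving order."""
    completed, inflight = [], []
    for name in file_names:
        (inflight if is_inflight(name) else completed).append(name)
    return completed, inflight


if __name__ == "__main__":
    names = [
        "20200715192915.rollback.inflight",  # the file named in the error log
        "20200715192915.rollback",           # hypothetical completed counterpart
        "20200716101500.commit",             # hypothetical completed commit
    ]
    completed, inflight = split_timeline(names)
    print("would deserialize:", completed)
    print("would skip:", inflight)
```

Only the names in the first list would be handed to the Avro deserializer; the inflight ones are set aside without being read.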

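The `java.io.IOException: Not an Avro data file` in the trace above is what Avro's `DataFileReader` typically reports when the input does not start with the Avro container-file magic bytes (`Obj` followed by the byte 1), which also covers empty files. A header check like the following sketch may help identify which files would trip the reader. The helper names are hypothetical, it assumes the files have been copied somewhere readable as local paths, and it only inspects the first four bytes; it does not validate the schema.

```python
AVRO_MAGIC = b"Obj\x01"  # first four bytes of an Avro object container file


def looks_like_avro(data: bytes) -> bool:
    """Return True if the given bytes begin with the Avro container magic."""
    return data[:4] == AVRO_MAGIC


def file_looks_like_avro(path: str) -> bool:
    """Read only the first four bytes of the file at `path` and check the magic."""
    with open(path, "rb") as f:
        return looks_like_avro(f.read(4))


if __name__ == "__main__":
    import sys

    # Usage: python check_avro.py <file> [<file> ...]
    for path in sys.argv[1:]:
        verdict = "avro" if file_looks_like_avro(path) else "NOT avro"
        print(f"{path}: {verdict}")
```

An empty inflight file fails this check, which is consistent with the archiver failing on `20200715192915.rollback.inflight` rather than on a corrupted archive.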

Re: [NOTICE] Git web site publishing to be done via .asf.yaml only as of July 1st

2021-06-24 Thread Navinder Brar
Hi Vinoth,

I have created a JIRA for this: https://issues.apache.org/jira/browse/HUDI-2070


I can assign this to myself and start working on it.

Regards,
Navinder

On Friday, 25 June, 2021, 12:23:26 am IST, Vinoth Chandar wrote:
 
Hi Navinder,

Our site is pushed from the asf-site branch, which has a README covering
how to build the site locally etc.; that’s a good starting point. I don’t
believe there is an open JIRA yet for this. On the example itself, I am not
sure myself since this is new, so we need to do some sleuthing and figure it
out.

Please suggest next steps.

Thanks
Vinoth

On Thu, Jun 24, 2021 at 8:56 AM Navinder Brar
 wrote:

> Hi Vinoth,
>
> I can take this up. Is there an existing JIRA for this? Sorry, I am new
> to the Hudi community. If not, I can clone an existing one. Please share a
> sample.
>
> Thanks,
> Navinder
>
>    On Thursday, 24 June, 2021, 04:10:41 am IST, Vinoth Chandar <
> vin...@apache.org> wrote:
>
>  Hi all,
>
> Looks like this will apply to our site? Any volunteers to help fix this?
>
> Thanks
> Vinoth
>
> -- Forwarded message -
> From: Daniel Gruno 
> Date: Mon, May 31, 2021 at 6:41 AM
> Subject: [NOTICE] Git web site publishing to be done via .asf.yaml only as
> of July 1st
> To: Users 
>
>
> TL;DR: if your project web site is kept in subversion, disregard this
> email please. If your project web site is using git, and you have not
> deployed it via .asf.yaml, you MUST switch before July 1st or risk your
> web site going stale.
>
>
>
> Dear Apache projects,
> In order to simplify our web site publishing services and improve
> self-serve for projects and stability of deployments, we will be turning
> off the old 'gitwcsub' method of publishing git web sites. As of this
> moment, this involves 120 web sites. All web sites should switch to our
> self-serve method of publishing via the .asf.yaml meta-file. We aim to
> turn off gitwcsub around July 1st.
>
>
> ## How to publish via .asf.yaml:
> Publishing via .asf.yaml is described at:
> https://s.apache.org/asfyamlpublishing
> You can also see an example .asf.yaml with publishing and staging
> profiles for our own infra web site at:
> https://github.com/apache/infrastructure-website/blob/asf-site/.asf.yaml
>
> In short, one puts a file called .asf.yaml into the branch that needs to
> be published as the project's web site, with the following two-line
> content, in this case assuming the published branch is 'asf-site':
>
> publish:
>  whoami: asf-site
>
>
> It is important to note that the .asf.yaml file MUST be present at the
> root of the file system in the branch you wish to publish. The 'whoami'
> parameter acts as a guard, ensuring that only the intended branch is used
> for publishing.
>
>
> ## Is my project affected by this?
> The quickest way to check if you need to switch to a .asf.yaml approach
> is to check out the site source page at
> https://infra-reports.apache.org/site-source/ - if your site is listed
> in yellow, you will need to switch. This page will also tell you which
> branch you are currently publishing as your web site. This is (should
> be) the branch that you must add a .asf.yaml meta file to.
>
> The web site source list updates every hour. If your project site
> appears in green, you are already using .asf.yaml for publishing and do
> not need to make any changes.
>
>
> ## What happens if we miss the deadline?
> If you miss the deadline, don't fret. Your site will of course still
> remain online as is, but new updates will not appear till you
> create/edit the .asf.yaml and set up publishing.
>
>
> ## Who do we contact if we have questions?
> Please contact us at us...@infra.apache.org if you have any additional
> questions.
>
>
> With regards,
> Daniel on behalf of ASF Infra.
>
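
As a rough illustration of the 'whoami' guard described in the notice above, the following sketch reads the two-line `.asf.yaml` form and checks it against a branch name. It is not how ASF Infra evaluates the file: the function names are hypothetical, the line-based reader handles only the simple form quoted in the notice (it is not a general YAML parser), and the comparison rule is an assumption based on the notice's wording.

```python
def parse_whoami(asf_yaml_text: str):
    """Return the value of publish.whoami, or None if it is absent.

    Minimal line-based reader for the two-line form shown in the notice.
    """
    in_publish = False
    for raw in asf_yaml_text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop comments and trailing spaces
        if not line.strip():
            continue
        if line[0] not in " \t":
            # A top-level key: remember whether we are inside the publish section.
            in_publish = line.strip() == "publish:"
            continue
        if in_publish:
            key, sep, value = line.strip().partition(":")
            if sep and key.strip() == "whoami":
                return value.strip() or None
    return None


def guard_allows(asf_yaml_text: str, branch: str) -> bool:
    """Assumed guard rule: publish only when whoami names the given branch."""
    return parse_whoami(asf_yaml_text) == branch


if __name__ == "__main__":
    text = "publish:\n  whoami: asf-site\n"
    print(guard_allows(text, "asf-site"))
    print(guard_allows(text, "master"))
```

A check like this could be run locally before pushing, to confirm that the file at the root of the branch names that same branch.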
  

Re: [NOTICE] Git web site publishing to be done via .asf.yaml only as of July 1st

2021-06-24 Thread Vinoth Chandar
Hi Navinder,

Our site is pushed from the asf-site branch, which has a README covering
how to build the site locally etc.; that’s a good starting point. I don’t
believe there is an open JIRA yet for this. On the example itself, I am not
sure myself since this is new, so we need to do some sleuthing and figure it
out.

Please suggest next steps.

Thanks
Vinoth



Re: [NOTICE] Git web site publishing to be done via .asf.yaml only as of July 1st

2021-06-24 Thread Navinder Brar
Hi Vinoth,

I can take this up. Is there an existing JIRA for this? Sorry, I am new to the
Hudi community. If not, I can clone an existing one. Please share a sample.

Thanks,
Navinder 
