I agree that moving forward with the upgrade is the right thing to do,
the example Mike pointed out is invalid according to the 1.8.1 spec
[1] even though it was mistakenly allowed by Avro itself. Since we
don't infer default values I think we can correctly require that
hand-written schemas conform to the spec, especially with respect to
default values, and our inferred schemas will adhere to the spec.

I also agree we should mention this in the Migration Guidance doc as
it is a requirement that is now being enforced when it wasn't before.

There might be an issue with Hive 1 but I don't think we're on the
hook for that, if Hive has issues with nullable types and/or default
values of null, it should obey the spec the same as we will.

Regards,
Matt

[1] https://avro.apache.org/docs/1.8.1/spec.html#Unions

On Thu, Apr 7, 2022 at 9:20 AM David Handermann
<exceptionfact...@apache.org> wrote:
>
> Mike,
>
> Thanks for raising this issue for discussion, and thanks Isha and Edward
> for the comments. It is helpful to get an understanding of the potential
> impact.
>
> In general, it seems like moving forward with the upgrade is the best idea,
> given the issues with older versions. For integration with other products
> outside of NiFi, using the latest version of Avro should provide better
> compatibility.
>
> That being said, it is important to consider the impact on existing
> deployments. At minimum, this change is something that should be described
> in the NiFi Migration Guidance associated with the released version. As far
> as supporting different versions, it is possible to run older versions of
> the various components, although this is more complicated given the scope
> of the upgrade. This seems like a case where providing more detailed
> migration steps would be helpful, to include issues related to cached
> schemas or inferred schemas used as the basis for customization.
>
> Regards,
> David Handermann
>
> On Thu, Apr 7, 2022 at 6:08 AM Edward Armes <edward.ar...@gmail.com> wrote:
>
> > I've had a quick look in JIRA and it looks like this might have happened as
> > a side effect of AVRO-1544.
> >
> > I think it is worth upgrading especially given that it looks like few of
> > the changes refer to updating against newer bundled dependencies some of
> > which seem to  have CVE's against them.
> >
> > Depending on peoples feelings would it be wroth creating 2 versions one
> > using Avro 1.8 and one using Avro 1.11.0 and then removing the 1.8 version
> > in a later release?
> >
> > On an additional note in cases where people are writing schemas manually I
> > suspect they are probably going to be validating against the later versions
> > of Avro using the projects tooling and that may create issues further down
> > the line.
> >
> > Edward
> >
> > On Thu, Apr 7, 2022 at 11:52 AM Isha Lamboo <
> > isha.lam...@virtualsciences.nl>
> > wrote:
> >
> > > Hi Mike,
> > >
> > > The "Infer schema" functionality in NiFi currently generates schemas with
> > > the order that will be invalid under Avro 1.9+. I noticed because I've
> > been
> > > using that to copy-paste schemas that were "almost right" so I could
> > > manually fix them.
> > >
> > > I guess that inferred schemas should be fine if the inferring logic is
> > > also upgraded by the dependency, but for any cached schemas and my own
> > > manually saved schemas this will be somewhat painful.
> > > My use case for manually saving inferred schemas is mostly database
> > > migration where some inferred "choice" fields are not supported for the
> > > target database.
> > >
> > > Hope this helps,
> > >
> > > Isha
> > >
> > > -----Oorspronkelijk bericht-----
> > > Van: Mike Thomsen <mikerthom...@gmail.com>
> > > Verzonden: donderdag 7 april 2022 12:11
> > > Aan: dev@nifi.apache.org
> > > Onderwerp: Need some feedback on how upgrading Avro might cause problems
> > >
> > > Thread is here for full details:
> > >
> > >
> > https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fnifi%2Fpull%2F5900%23pullrequestreview-922490039&amp;data=04%7C01%7Cisha.lamboo%40virtualsciences.nl%7C8c89ae3e621c4a255c3308da187eea09%7C21429da9e4ad45f99a6fcd126a64274b%7C0%7C0%7C637849231546931433%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=ztcxOWXBkpZFqEh%2Bu%2BG0du5BLUPyZ3WaMxqpeqn%2FexI%3D&amp;reserved=0
> > >
> > > It looks like Avro 1.8's schema parser may have been more lenient (or
> > > buggy) in enforcing the specification with respect to the ordering of a
> > > union for a nullable type. 1.9.X and higher are definitely more
> > opinionated
> > > and throw exceptions on schemas that used to work on 1.8.X. For example,
> > > this used to be valid:
> > >
> > > {
> > >   "name": "message",
> > >   "type": [ "null", "string" ],
> > >   "default": "Hello, world"
> > > }
> > >
> > > Now Avro **correctly** throws an exception per the specification.
> > > Under 1.8 it did not, and as you can see here, I had to change numerous
> > > test schemas in order to make them work under 1.9 to 1.11.0:
> > >
> > >
> > >
> > https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fnifi%2Fpull%2F5900%2Ffiles%23r835954170&amp;data=04%7C01%7Cisha.lamboo%40virtualsciences.nl%7C8c89ae3e621c4a255c3308da187eea09%7C21429da9e4ad45f99a6fcd126a64274b%7C0%7C0%7C637849231546931433%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=iBGW6sYZUdxvAADYIB5L2t94RBZH3A5%2BPJMhxuGv8q8%3D&amp;reserved=0
> > >
> > > As I said to Matt, I think we're in a "damned if you do, damned if you
> > > don't" position here.
> > >
> > > Thoughts?
> > >
> >

Reply via email to