Re: [DISCUSS] New schema repository idea (with proof of concept)

Mike Thomsen Thu, 07 Mar 2024 14:49:05 -0800

Sure. The JAR is not actually scanned. You have to use dynamic properties
to map a schema name to a particular fully qualified class. I would assume
that most people using it are just taking a JAR that was generated or
written by hand to be a client library with only (or mainly) POJOs
representing the model. You can see a simple example here:


https://github.com/MikeThomsen/nifi-pojo-schema-repository-bundle/tree/main/test-pojos/src/main/java/org/apache/nifi/pojo/complex

Nothing fancy per se. It's just POJOs with some standard annotations from
the Avro lib.
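
To make the name-to-class mapping described above concrete, here is a
minimal sketch using only the JDK. It is not code from the repository
linked above: PojoSchemaLookup, register and resolve are hypothetical
names, and a plain map stands in for the dynamic properties. Schema
generation itself is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not taken from the linked repository.
final class PojoSchemaLookup {
    // Stand-in for the dynamic properties: schema name -> fully qualified class name.
    private final Map<String, String> classNamesBySchema = new LinkedHashMap<>();
    private final ClassLoader loader;

    PojoSchemaLookup(ClassLoader loader) {
        this.loader = loader;
    }

    // Records one mapping, as a single dynamic property would.
    void register(String schemaName, String className) {
        classNamesBySchema.put(schemaName, className);
    }

    // Loads only the class that was explicitly mapped; nothing is scanned.
    Class<?> resolve(String schemaName) {
        String className = classNamesBySchema.get(schemaName);
        if (className == null) {
            throw new IllegalArgumentException("No class mapped for schema name: " + schemaName);
        }
        try {
            // 'false' skips static initialization of the loaded class.
            return Class.forName(className, false, loader);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Mapped class not found: " + className, e);
        }
    }
}
```

In this sketch the loader would presumably be one built over the
configured JAR, and the resolved class would then be handed to whatever
generates the Avro schema.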

I would imagine this repo would be a big help for teams that can't or won't
commit to a contract-first design but have to work with NiFi and other big
data systems.

On Thu, Mar 7, 2024 at 5:39 PM David Handermann <[email protected]>
wrote:

> Mike,
>
> Thanks for the reply. I agree that file and property-based registries
> are useful, so the main question seems to be a compiled-code-derived
> registry as you have described.
>
> It seems that the general use case could still be supported through
> file-backed registry, but without requiring the dynamic class loading
> associated with a custom JAR.
>
> Loading code from a JAR also presents greater security risks than
> loading schema files, so if this were to be supported, it would
> require additional permission restrictions.
>
> To help think through this a bit more, can you describe the use case a
> bit more? How would someone prepare a JAR for referencing in this
> proposed registry?
>
> Regards,
> David Handermann
>
> On Thu, Mar 7, 2024 at 4:30 PM Mike Thomsen <[email protected]>
> wrote:
> >
> > You raise some good points, but I think there's still ample room for
> > file-based schema registries within NiFi. With regard to the edge
> > cases with schema generation, I think an argument can also be made for
> > "not letting the perfect be the enemy of the good."
> >
> > On Wed, Mar 6, 2024 at 9:34 AM David Handermann <[email protected]>
> > wrote:
> >
> > > Mike,
> > >
> > > Thanks for raising this question, and providing the example repository.
> > >
> > > Although it sounds like a POJO-based repository could be useful in
> > > some cases, it does not seem like something that should be included
> > > for community support.
> > >
> > > Part of the value of a Schema Registry is a shared location for data
> > > description. Although supporting property or file-based Schema
> > > Registries is useful in NiFi itself, the general pattern is
> > > externalized storage and maintenance of schema definitions.
> > >
> > > From another angle, this could be similar to code-first versus
> > > contract-first API development. Each approach has its positives and
> > > negatives. When it comes to a Schema Registry, however, it seems like
> > > the contract needs to be defined outside of code.
> > >
> > > Introspecting JAR files also raises questions about what to include or
> > > exclude, and how to handle edge cases for certain class definitions.
> > > This seems like the more significant problem. For this reason, it
> > > seems better to rely on external operations to produce Avro schema
> > > definitions, rather than supporting that in NiFi itself.
> > >
> > > Those are my initial thoughts; perhaps others can provide additional
> > > perspective.
> > >
> > > Regards,
> > > David Handermann
> > >
> > > On Sat, Mar 2, 2024 at 9:18 AM Mike Thomsen <[email protected]>
> > > wrote:
> > > >
> > > > I've had this project on the back burner for a while and wanted to
> > > > share it with the team. It's a schema repository implementation that
> > > > is designed to take a JAR file with POJOs and use Jackson's schema
> > > > generation API to generate Avro schemas from those on startup. It also
> > > > uses (via Jackson) Avro annotations to help specify particular
> > > > implementation details where necessary. The code can be found here.
> > > > Haven't worked on it lately, but it should easily run on 1.25:
> > > >
> > > > https://github.com/MikeThomsen/nifi-pojo-schema-repository-bundle
> > > >
> > > > I am planning to get the repo ready for a PR unless someone raises
> > > > reasons why including it might be a poor fit. I think for a lot of
> > > > teams this might be a killer feature because it would allow them to
> > > > use Avro with existing enterprise POJOs and stuff like that without
> > > > having to write them by hand.
> > > >
> > > > Thoughts?
> > >
>
