Thanks for the color, Mike. I agree with David on this one. The solution you
propose sounds quite complicated to use, and largely outside the scope of
the vast majority of NiFi users. While there are certainly Java developers
using it, most of the user base are in data engineering roles and are not
likely to have POJOs. So while I can appreciate the desire to contribute
something like this, I feel the burden of maintaining it would outweigh
the benefit.

I think what would make more sense for this situation would be to have some
sort of utility outside of NiFi that could take a POJO and generate Avro
schemas. Those schemas could then be imported directly into NiFi via the
existing Avro Schema Registry, or via another schema registry that NiFi
integrates with, if that makes sense for the user.
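
As a rough illustration of that kind of external utility, here is a minimal
sketch, assuming only simple POJOs whose fields are String, int/long/double,
or boolean (boxed or primitive). It uses plain java.lang.reflect rather than
Jackson or Avro, and the class name PojoToAvroSchema is hypothetical. Nested
records, collections, nullable unions, and Avro annotations are not handled.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical standalone utility: turns a simple POJO into Avro schema JSON
// using plain reflection. Not part of NiFi, Avro, or Jackson.
public class PojoToAvroSchema {

    // Maps a few Java field types to Avro primitive type names.
    // Boxed types are treated as non-null here; anything else is rejected.
    private static final Map<Class<?>, String> AVRO_TYPES = Map.of(
            String.class, "string",
            int.class, "int", Integer.class, "int",
            long.class, "long", Long.class, "long",
            double.class, "double", Double.class, "double",
            boolean.class, "boolean", Boolean.class, "boolean");

    public static String toAvroSchema(Class<?> type) {
        Field[] declared = type.getDeclaredFields();
        // getDeclaredFields() guarantees no particular order, so sort by name
        // to keep the generated schema stable between runs.
        Arrays.sort(declared, Comparator.comparing(Field::getName));

        List<String> fields = new ArrayList<>();
        for (Field field : declared) {
            // Static and compiler-generated fields are not part of the record data.
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue;
            }
            String avroType = AVRO_TYPES.get(field.getType());
            if (avroType == null) {
                throw new IllegalArgumentException("Unsupported type for field '"
                        + field.getName() + "': " + field.getType().getName());
            }
            fields.add("{\"name\":\"" + field.getName() + "\",\"type\":\"" + avroType + "\"}");
        }

        StringBuilder schema = new StringBuilder("{\"type\":\"record\",\"name\":\"")
                .append(type.getSimpleName()).append('"');
        // Only emit a namespace when the class actually lives in a package.
        String namespace = type.getPackageName();
        if (!namespace.isEmpty()) {
            schema.append(",\"namespace\":\"").append(namespace).append('"');
        }
        return schema.append(",\"fields\":[")
                .append(String.join(",", fields))
                .append("]}")
                .toString();
    }

    // Example POJO standing in for a class from an existing client library.
    static class Customer {
        private String name;
        private int age;
        private boolean active;
    }

    public static void main(String[] args) {
        // The printed JSON is the kind of text that could be registered
        // under a schema name in a schema registry.
        System.out.println(toAvroSchema(Customer.class));
    }
}
```

For the Customer example this should print a record schema with the fields
active, age, and name, in that order. A real tool would more likely delegate
to Avro's ReflectData or Jackson's Avro module, which already cover nested
types and annotations; the point is only that generation can happen once,
outside NiFi, and the resulting schema text is what gets registered.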

Thanks
-Mark


> On Mar 7, 2024, at 5:48 PM, Mike Thomsen <[email protected]> wrote:
> 
> Sure. The JAR is not actually scanned. You have to use dynamic properties
> to map a schema name to a particular fully qualified class. I would assume
> that most people using it are just taking a JAR that was generated or
> written by hand to be a client library with only (or mainly) POJOs
> representing the model. You can see a simple example here:
> 
> https://github.com/MikeThomsen/nifi-pojo-schema-repository-bundle/tree/main/test-pojos/src/main/java/org/apache/nifi/pojo/complex
> 
> Nothing fancy per se. It's just POJOs with some standard annotations from
> the Avro lib.
> 
> I would imagine this repo would be a big help for teams that can't/won't
> commit to a contract-first design but have to work with NiFi and other big
> data systems.
> 
> On Thu, Mar 7, 2024 at 5:39 PM David Handermann <[email protected]>
> wrote:
> 
>> Mike,
>> 
>> Thanks for the reply. I agree that file and property-based registries
>> are useful, so the main question seems to be a compiled-code-derived
>> registry as you have described.
>> 
>> It seems that the general use case could still be supported through a
>> file-backed registry, but without requiring the dynamic class loading
>> associated with a custom JAR.
>> 
>> Loading code from a JAR also presents greater security risks than
>> loading schema files, so if this were to be supported, it would
>> require additional permission restrictions.
>> 
>> To help think through this, can you describe the use case in more
>> detail? How would someone prepare a JAR for referencing in this
>> proposed registry?
>> 
>> Regards,
>> David Handermann
>> 
>> On Thu, Mar 7, 2024 at 4:30 PM Mike Thomsen <[email protected]>
>> wrote:
>>> 
>>> You raise some good points, but I think there's still ample room for
>>> file-based schema registries within NiFi. With regard to the edge cases
>>> with schema generation, I think an argument can also be made for "not
>>> letting the perfect be the enemy of the good."
>>> 
>>> On Wed, Mar 6, 2024 at 9:34 AM David Handermann <[email protected]>
>>> wrote:
>>> 
>>>> Mike,
>>>> 
>>>> Thanks for raising this question, and providing the example repository.
>>>> 
>>>> Although it sounds like a POJO-based repository could be useful in
>>>> some cases, it does not seem like something that should be included
>>>> for community support.
>>>> 
>>>> Part of the value of a Schema Registry is a shared location for data
>>>> description. Although supporting property or file-based Schema
>>>> Registries is useful in NiFi itself, the general pattern is
>>>> externalized storage and maintenance of schema definitions.
>>>> 
>>>> From another angle, this could be similar to code-first versus
>>>> contract-first API development. Each approach has its positives and
>>>> negatives. When it comes to a Schema Registry, however, it seems like
>>>> the contract needs to be defined outside of code.
>>>> 
>>>> Introspecting JAR files also raises questions about what to include or
>>>> exclude, and how to handle edge cases for certain class definitions.
>>>> This seems like the more significant problem. For this reason, it
>>>> seems better to rely on external operations to produce Avro schema
>>>> definitions, rather than supporting that in NiFi itself.
>>>> 
>>>> Those are my initial thoughts; perhaps others can provide additional
>>>> perspective.
>>>> 
>>>> Regards,
>>>> David Handermann
>>>> 
>>>> On Sat, Mar 2, 2024 at 9:18 AM Mike Thomsen <[email protected]>
>>>> wrote:
>>>>> 
>>>>> I've had this project on the back burner for a while and wanted to share
>>>>> it with the team. It's a schema repository implementation that is designed
>>>>> to take a JAR file with POJOs and use Jackson's schema generation API to
>>>>> generate Avro schemas from those on startup. It also uses (via Jackson)
>>>>> Avro annotations to help specify particular implementation details where
>>>>> necessary. The code can be found here. Haven't worked on it lately, but it
>>>>> should easily run on 1.25:
>>>>>
>>>>> https://github.com/MikeThomsen/nifi-pojo-schema-repository-bundle
>>>>>
>>>>> I am planning to get the repo ready for a PR unless someone raises reasons
>>>>> why including it might be a poor fit. I think for a lot of teams this might
>>>>> be a killer feature because it would allow them to use Avro with existing
>>>>> enterprise POJOs and stuff like that without having to write schemas by
>>>>> hand.
>>>>>
>>>>> Thoughts?
>>>> 
>> 
