Re: [discuss] NiFi support for Hadoop ecosystem components

2023-03-24 Thread Jeremy Dyer
I think option 2 is the best way to handle this.

Technology naturally changes over time, and some components of NiFi might
no longer make sense to keep in the main line for the masses. However, I
really like still having them available for people to add very simply if
they so desire. I see other platforms do this by adding a “contrib” repo.
What if we had something like a “nifi-contrib” or “nifi-emeritus” repo on
GitHub (an Apache GitHub repo), where the community can still be involved
as desired, while keeping things readily available to those who might not
even be heavily involved in the community?

I even see this as a sustainable pattern for any components that need to
be “moved out”.

I don’t even think the components in the “contrib” repo would require a
release vote; someone, or a vendor, could update them via PRs after the
official release.

Jeremy Dyer



Re: [discuss] NiFi support for Hadoop ecosystem components

2023-03-24 Thread Joe Witt
James

Some are definitely less fun than others, with Hive being the most notable.

I should rephrase my vendor comment on point one: as far as I know, the
Hadoop components in question are all vendor-supported.  Whether NiFi is
or not is a different point.

Option 2 is the most realistic, I suspect, but I still want to see what
people think.

Basically, anything that depends on the ‘hadoop-client’ Maven artifact is
where the games begin.

Thanks



Re: [discuss] NiFi support for Hadoop ecosystem components

2023-03-24 Thread James Srinivasan
I'm a Hadoop and NiFi user without vendor support, so unsurprisingly I'm
not keen on #1, but then relying on community support and development is
always going to be a risk for us. If it came to it, we'd probably stop
using NiFi rather than pay a vendor, which would be a real shame.

Are certain Hadoop processors more maintenance-heavy than others? It's a
rather wide ecosystem.



Re: [discuss] NiFi support for Hadoop ecosystem components

2023-03-24 Thread Chakravarty, G
I am wondering if the standard NiFi JDBC/ODBC processors, with some basic
testing against the common Hive drivers (Simba, etc.), could help
alleviate the issue without having separate HiveQL processors.

GC



Re: [discuss] NiFi support for Hadoop ecosystem components

2023-03-24 Thread Bryan Bende
I lean towards option 2, with the caveat that maybe we don't have to
retain every Hadoop-related component when creating this separate set
of components. Mainly I'm thinking that Hive has been the most
problematic to maintain, so maybe that is dropped altogether. I think
it would be unfortunate to not have publicly available HDFS
processors.



Re: [discuss] NiFi support for Hadoop ecosystem components

2023-03-24 Thread Matt Burgess
As one of the small number of people who fight the battle, I like the
idea of Option 1 (full disclosure: I work for a vendor). From a
community standpoint (I'm on the PMC), I'm not strongly opposed to
Option 2, although I wouldn't want to be the one managing and releasing
the artifacts :) Having said that, unless it remained maintained and
released, I feel like it would just be a component graveyard (or maybe
more like the Apache Attic), in which case it seems unnecessary, and
that's why I'm behind Option 1. Interested to hear others' thoughts, of
course.

Thanks,
Matt



[discuss] NiFi support for Hadoop ecosystem components

2023-03-24 Thread Joe Witt
Team,

For the entire time NiFi has been in Apache, we've built in support for
various Hadoop ecosystem components such as HDFS, Hive, HBase, and others,
and more recently for the formats/serialization modes needed for
Parquet, ORC, Iceberg, etc.

All of these things, however, present endless challenges with
compatibility across different versions (Hive being the most difficult
by far) and vendors (Hadoop vendors, cloud vendors, etc.).  Also
super notable is the incredible number of dependencies, dependency
conflicts, inclusions/exclusions, old log libs, vulnerability updates,
etc.  And last but certainly not least, they are a big reason why our
build has grown so much.

We have a couple of options:
1. Deprecate these components in NiFi 1.x and drop them entirely in
NiFi 2.x.  Leave this as a problem for vendors to deal with.  NiFi
users interacting with such components are nearly exclusively doing so
with vendors anyway.

2. Remove the components from the NiFi main code line and create a
separate repo for 'nifi-hadoop-extensions'.  We manage those
independently and release them periodically.  They would be available
for people to grab the NARs if they want to use them.  We include none
of them in the convenience binary by default going forward.

3. Change nothing.  Continue to battle with the above listed items.
This is admittedly a bit of a non-option.  We can't keep spending the
same time/energy on these that we have been.  It is a very small number
of people who fight this battle.

Look forward to hearing thoughts on these options or others we might consider.

Thanks