Given the discussion, my sense is that a clean option 1 is not really realistic. We're more likely looking at option 2, with a far less frequent release cycle for those components and thus lower overhead. My biggest gripe is the constant workarounds we have in all these poms to wrangle dependencies and includes/excludes, simply because those projects do not maintain them with nearly the speed/focus that we do.
Thanks

On Wed, Mar 29, 2023 at 1:07 PM Steven Matison <[email protected]> wrote:
>
> I waited to respond because I wanted to see the conversation before
> jumping in.
>
> As a user, fan, and consultant I am super excited to even know the
> context around NiFi 2.0 and the entire modernization effort that is
> already ongoing to the end of 1.x. I am also thankful to be working on
> some of it myself too. This is an amazing time for NiFi.
>
> With respect to anyone using "old NiFi" this topic is super concerning,
> but to me this is not more concerning than why you are still using older
> versions and/or why you need deep backward compatibility with new
> versions that are several years from where you are. Are you not also
> modernizing?
>
> Those things aside, my choice is #1. I speak for those using everything
> from the current version all the way back to some of the oldest versions
> you could imagine are still online. These people will continue to operate
> NiFi the way they need it in those versions. If and when they decide to
> take a new version, they will, and they will also deal with the inherent
> challenges of modernizing. Many of them work completely outside of the
> community and will do that themselves or rely on a vendor who can support
> that path for them.
>
> On Wed, Mar 29, 2023 at 11:03 AM Chakravarty, G <[email protected]> wrote:
>
> > One of our primary reasons for using NiFi is that it connects nicely
> > with on-prem HDFS/Hive/Kudu data stores. Also, it appears that
> > although the on-prem Hadoop/Hive tech stack is somewhat less popular now,
> > the same HDFS/Hive technology is appearing in the cloud under different
> > names: Google Dataproc, AWS EMR, Azure HDInsight, Iceberg, etc. Some type
> > of generic components, where the Hadoop processors' connectivity to NiFi
> > is maintained while individual vendors maintain their own connectivity to
> > their products, would be a good option if possible.
> >
> > GC
> >
> > ________________________________
> > From: Isha Lamboo <[email protected]>
> > Sent: Monday, March 27, 2023 9:04 AM
> > To: [email protected] <[email protected]>
> > Subject: RE: [discuss] NiFi support for Hadoop ecosystem components
> >
> > From the perspective of a NiFi administrator:
> >
> > Removing the xxxHDFS processors anytime soon (2.0) would be a huge issue
> > for us. It shouldn't be: the last Hadoop cluster in our environments was
> > shut down earlier this year. Hive was already gone more than a year ago.
> > But we still have 1000+ HDFS processors in use to manage the Azure
> > Datalake. Azure-specific processors have been available for a while, but
> > there was no business case for migrating solutions that were working fine.
> >
> > Getting the required development time/budget to migrate all those flows to
> > the Azure processors doesn't look very realistic. This would have to be a
> > gradual "replace when you need to change and test the flow anyway" affair.
> > Until that finishes, we'd be stuck on the 1.x branch since we're not using
> > vendor support.
> >
> > Option #2 would be vastly preferable to #1 for this simple and dumb reason.
> >
> > Disregarding our technical debt issues, I agree that it makes sense for
> > NiFi instances with a lot of Hadoop integration to depend on vendors for
> > their specific flavor of Hadoop, while core NiFi moves forward without all
> > of that complexity.
> >
> > Regards,
> >
> > Isha
> >
> > -----Original message-----
> > From: Nandor Soma Abonyi <[email protected]>
> > Sent: Monday, March 27, 2023 12:31
> > To: [email protected]
> > Subject: Re: [discuss] NiFi support for Hadoop ecosystem components
> >
> > Thank you for raising this topic, Joe!
> >
> > While I understand the desire to remove Hadoop components, I have mixed
> > feelings about removing one of the core parts of the Big Data world from
> > the project.
> > I'm unsure how many users we would give a hard time by removing those
> > components. It seems to be too significant a shift in our philosophy.
> > We can already see in the above example that somebody would not use NiFi
> > if we removed them.
> >
> > Furthermore, although Hadoop has been buried multiple times, new
> > technologies seem to still depend on it. For example, Iceberg, in which
> > case I'm worried about the consequences of removing the support for an
> > increasingly popular technology.
> >
> > So I wonder whether it is possible to find a forward-looking solution that
> > could serve all projects. I've always found configuring Hadoop and friends
> > too tricky and I thought it was primarily for historical reasons. The
> > issues you describe could easily result from such a thing. I assume that
> > over time, more and more things have been added on top of the existing
> > implementation without significant refactoring.
> >
> > My (probably utopian) idea would be to contact the Hadoop and Hive
> > teams and share the issues we are dealing with. Probably we are not alone
> > in these problems, but I don't know whether they are aware of them. Even if
> > they are, I think approaching them is worth the chance. Who knows where we
> > will end up if somebody representing the NiFi project does that?
> >
> > Regards,
> > Nandor Soma Abonyi
> >
> > > On Mar 24, 2023, at 10:40 PM, Jeremy Dyer <[email protected]> wrote:
> > >
> > > I think option 2 is the best way to handle this.
> > >
> > > Technology naturally changes over time and some components of NiFi
> > > might not make the most sense to keep around in the main line for the
> > > masses anymore. However, I really like still having them there for
> > > people to very simply add if they so desire. I see other platforms do
> > > this by adding a “contrib” repo.
> > > What if we had something like a “nifi-contrib” or “nifi-emeritus”
> > > repo in GitHub, an Apache GitHub repo, where the community can still be
> > > involved as desired but also keep things readily available to those
> > > who might not even be heavily involved in the community?
> > >
> > > I even see this as a sustainable pattern for any components that need
> > > to be “moved out”.
> > >
> > > I wouldn’t even think those components in the “contrib” repo would
> > > require voting on for releases but someone, or a vendor, could update
> > > them via PRs after the official release.
> > >
> > > Jeremy Dyer
> > >
> > > ________________________________
> > > From: Chakravarty, G <[email protected]>
> > > Sent: Friday, March 24, 2023 4:36:43 PM
> > > To: [email protected] <[email protected]>
> > > Subject: Re: [discuss] NiFi support for Hadoop ecosystem components
> > >
> > > I am wondering if the standard NiFi JDBC/ODBC processors, with some
> > > basic testing against common Hive drivers such as Simba, could help
> > > alleviate the issue without having separate HiveQL processors.
> > >
> > > GC
> > > ________________________________
> > > From: Bryan Bende <[email protected]>
> > > Sent: Friday, March 24, 2023 4:05 PM
> > > To: [email protected] <[email protected]>
> > > Subject: Re: [discuss] NiFi support for Hadoop ecosystem components
> > >
> > > I lean towards option 2 with the caveat that maybe we don't have to
> > > retain every Hadoop related component when creating this separate set
> > > of components.
> > > Mainly I'm thinking that Hive has been the most
> > > problematic to maintain so maybe that is dropped altogether. I think
> > > it would be unfortunate to not have publicly available HDFS
> > > processors.
> > >
> > > On Fri, Mar 24, 2023 at 3:23 PM Matt Burgess <[email protected]> wrote:
> > >>
> > >> As one of the small number of people that fight the battle, I like
> > >> the idea of Option 1 (full disclosure: I work for a vendor). From a
> > >> community standpoint (I'm on the PMC) I'm not strongly opposed to
> > >> Option 2 although I wouldn't want to be the one managing and
> > >> releasing the artifacts :) Having said that, unless it remained
> > >> maintained and released, I feel like it would just be a component
> > >> graveyard (or maybe more like the Apache Attic), in which case it
> > >> seems unnecessary and that's why I'm behind Option 1. Interested to
> > >> hear others' thoughts of course.
> > >>
> > >> Thanks,
> > >> Matt
> > >>
> > >> On Fri, Mar 24, 2023 at 2:07 PM Joe Witt <[email protected]> wrote:
> > >>>
> > >>> Team,
> > >>>
> > >>> For the full time NiFi has been in Apache we've built with support
> > >>> for various Hadoop ecosystem components like HDFS, Hive, HBase,
> > >>> others, and more recently the formats/serialization modes necessary
> > >>> for Parquet, ORC, Iceberg, etc.
> > >>>
> > >>> All of these things however present endless challenges with
> > >>> compatibility across different versions (Hive being the most
> > >>> difficult by far) and vendors (Hadoop vendors, cloud vendors, etc.).
> > >>> And also super notably the incredible number of dependencies,
> > >>> dependency conflicts, inclusions/exclusions, old log libs,
> > >>> vulnerability updates, etc. And last but certainly not least they are
> > >>> a big reason why our build has grown so much.
> > >>>
> > >>> We have a couple options:
> > >>> 1. Deprecate these components in NiFi 1.x and drop them entirely in
> > >>> NiFi 2.x.
> > >>> Leave this as a problem for vendors to deal with. NiFi
> > >>> users interacting with such components are nearly exclusively doing
> > >>> so with vendors anyway.
> > >>>
> > >>> 2. Remove the components from NiFi main code line and create a
> > >>> separate repo for 'nifi-hadoop-extensions'. We manage those
> > >>> independently and release them periodically. They would be
> > >>> available for people to grab the nars if they want to use them. We
> > >>> include none of them in the convenience binary going forward by
> > >>> default.
> > >>>
> > >>> 3. Change nothing. Continue to battle with the above listed items.
> > >>> This is admittedly a bit of a non-option. We can't keep spending
> > >>> the same time/energy on these that we have been. It is a very small
> > >>> number of people that fight this battle.
> > >>>
> > >>> Look forward to hearing thoughts on these options or others we might
> > >>> consider.
> > >>>
> > >>> Thanks
