I'd like to echo many of the comments / discussion points here,
including the extension registry (#3), NAR packs, and mixins. A couple
of additional comments and caveats:

NAR package management:

- Grouping NAR packs based on functionality (Hadoop, RDBMS, etc.) is a
good start, but it still seems like we'd want to end up with an a la
carte capability. An incremental approach might be to have a simple
graphical tool (in the toolkit?) pointing at your NiFi install and
some common repository, where you can add and delete NAR packs, but
also delete individual NARs from your NiFi install. The use case here
is when you download the Hadoop NAR pack for HBase and related
components, but don't want things like the Hive NAR (which I think is
the largest at ~93MB).

- Some NiFi installs will be located on systems that cannot contact an
outside (or any external) repository. When we consider NAR
repositories, we should consider providing a repo-to-go or something
of that sort. At the very least I would think the Extension Registry
itself would support such a thing: the ability to stand up an
Extension Registry anywhere, not just attached to Bintray or Apache
repo HTTP pages, etc.

- Murphy's Law says as soon as we pick NAR pack boundaries, there will
be components that don't fit well into one or another, or they fit
into more than one. For instance, a user might expect the Spark/Livy
NAR to be in the Hadoop NAR pack but there is no requirement for Spark
or Livy to run on Hadoop. Perhaps with a "Big Data" NAR pack (versus
Hadoop) it would encompass the Hadoop and Spark stuff, but then where
does Cassandra fit in? It certainly handles Big Data, but if there
were a "NoSQL" NAR pack, which should it belong to (or can it be in
both)?

- Because NARs are unpacked before use in NiFi, there are two related
footprints: the footprint of the NARs in the lib/ folder, and the
footprint of the unpacked NARs. The "duplicate JARs" discussion also
segues into a third area, the runtime footprint (including classloader
hierarchies, etc.).

Optimized JARs/classloading:

- Promoting JARs to the lib/ folder because they are common to many
processors is not the right solution IMO. With parent-first
classloaders (which is what NarClassLoaders are), a NAR that needed a
different version of a library would find the parent's version first,
which would likely cause issues. We could make the NarClassLoader
self-first (which we might want to do under other circumstances
anyway), but then care would need to be taken to ensure that
shared/API dependencies are indeed "provided".

- I do like the idea of "promotion" though, not just for JAR
deduplication but also for better classloading. Here's an idea for how
we might achieve this. When unpacking NARs, we would do something
similar to a Maven install, where we build up a repository of
artifacts. If two artifacts are the same (we'd likely want to verify
checksums too, not just Maven coordinates), they'd install to the same
place. At the end of NAR unpacking, the repo would contain unique
(de-duplicated) JARs, and each NAR would have a bill-of-materials
(BOM) from which to build its classloader.  A possible runtime
improvement on top of that is to build a classloader hierarchy, where
JARs shared by multiple NARs could be in their own classloader, which
would be the parent of the NARs' classloaders. This way, instead of
the same classes loaded into each NAR's classloader, they would only
be loaded once into a shared parent. This "de-dupes" the memory
footprint of the JARs as well. Hopefully the construction of the
classloader graph would not be too computationally intensive, but we
could have a best-effort algorithm rather than an optimal one if that
were an issue.
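
To make the "Maven install"-style idea above a bit more concrete, here
is a rough sketch in Java. It is not how NiFi unpacks NARs today, and
every name in it (NarDedupSketch, install, sharedKeys) is made up for
illustration. It assumes each bundled JAR is identified by its Maven
coordinates plus a SHA-256 checksum of its bytes, keeps one copy per
unique key, records a BOM per NAR, and reports which JARs appear in
more than one BOM (the candidates for a shared parent classloader).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch only: an in-memory stand-in for a de-duplicated JAR repository.
final class NarDedupSketch {

    // Unique JARs, keyed by "coordinates@checksum".
    final Map<String, byte[]> repo = new LinkedHashMap<>();
    // Bill of materials: NAR name -> keys of the JARs it needs.
    final Map<String, Set<String>> boms = new LinkedHashMap<>();

    // Hex-encoded SHA-256 of the JAR content.
    static String checksum(byte[] jarBytes) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(jarBytes)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // "Install" one bundled JAR from a NAR. Two JARs share a slot only if
    // both the coordinates and the checksum match.
    void install(String nar, String coordinates, byte[] jarBytes) {
        String key = coordinates + "@" + checksum(jarBytes);
        repo.putIfAbsent(key, jarBytes);
        boms.computeIfAbsent(nar, n -> new LinkedHashSet<>()).add(key);
    }

    // Keys that appear in more than one BOM.
    Set<String> sharedKeys() {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Set<String> bom : boms.values()) {
            for (String key : bom) {
                counts.merge(key, 1, Integer::sum);
            }
        }
        Set<String> shared = new LinkedHashSet<>();
        counts.forEach((key, count) -> {
            if (count > 1) {
                shared.add(key);
            }
        });
        return shared;
    }

    public static void main(String[] args) {
        NarDedupSketch store = new NarDedupSketch();
        // Placeholder bytes and coordinates; a real unpacker would read the JAR files.
        byte[] jackson = "jackson".getBytes(StandardCharsets.UTF_8);
        byte[] bcprov = "bcprov".getBytes(StandardCharsets.UTF_8);
        store.install("nifi-a-nar", "example:jackson-databind:1.0", jackson);
        store.install("nifi-b-nar", "example:jackson-databind:1.0", jackson);
        store.install("nifi-b-nar", "example:bcprov:1.0", bcprov);
        System.out.println("unique JARs: " + store.repo.size());
        System.out.println("shared JARs: " + store.sharedKeys());
    }
}
```

Grouping the shared keys into actual parent classloaders is the harder
part and is left out here; the sketch only shows that the BOMs carry
enough information to start from.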

Thoughts? Thanks,
Matt



On Tue, Jan 16, 2018 at 12:52 PM, Kevin Doran <kdo...@apache.org> wrote:
> Nice discussion on this thread.
>
> I'm also in favor of the long-term solution being publishing extension NARs 
> to an extension registry (#3) and removing them from the NiFi convenience 
> binary.
>
> A few thoughts that build upon what others have said:
>
> 1. Many decisions, such as the structure of the project/repo(s) and mechanics 
> of the release, don't have to be made right away, though it is probably good 
> to start considering the impacts of various approaches as people have. There 
> is a lot that has to be done to make progress towards the long-term goal 
> regardless of those decisions, some of which follows below.
>
> 2. We can start adding support for extensions to the Registry project 
> (obviously).
>
> 3. As James W and others have pointed out, start classifying which components 
> belong in the "core" convenience binary and which ones will be published 
> separately. For the ones published separately, we can further classify 
> them down into categories / "packs" to reduce the burden on end-users.
>
> 4. Anticipating that the release cycles of NiFi-core and extensions will 
> eventually be separated, we should design a way for versioned extensions to 
> declare which versions of (Mi)NiFi they are compatible with, i.e. as a 
> semantic version range. There are lots of good examples to pull from; pretty 
> much any modern package management framework has some concept of support for 
> >=, ==, =~ syntax that honors semantic versioning. If done well, this should 
> reduce the burden of managing separate release cycles as minor and patch 
> releases of NiFi will be backwards compatible w.r.t the public APIs used by 
> extensions, so in most cases extensions declaring a simple 'NiFi >= 1.x' 
> should suffice.
>
> 4b. Likewise, when defining a versioned flow in NiFi, the user should be able 
> to fix the version of each processor to a specific extension version.
>
> 5. Great work Tony in surfacing some data on NAR size and jar duplication 
> across NARs. Following up on Bryan's email that explores possible solutions 
> to this, I think the best approach would be the concept of lib NARs and a 
> more flexible NAR dependency declaration/evaluation mechanism, e.g., the 
> "mix-in style" Bryan described vs. the current single-class inheritance 
> style. I'm not sure what work this would require for making the runtime 
> classpaths work correctly. For just developing/ publishing/installing NARs in 
> this style leveraging an extension registry, we are getting pretty close to 
> describing a full-fledged package manager, both on the server side (NiFi 
> Registry) and client side (publishing tooling and NiFi for importing flows 
> that reference processors that declare dependencies). Given that NAR packs 
> could solve the immediate problem of reducing the size of individual 
> binaries, I think we should make jar de-duplication a goal for after a 
> functional extension registry, while keeping it in mind for the design of the 
> extension registry.
>
> On 1/16/18, 11:08, "Bryan Bende" <bbe...@gmail.com> wrote:
>
>     I still like the "NAR packs" idea even for the single repo approach. I
>     think if we only provide a "light" binary and then say that everything
>     else has to be built on your own, it creates a big barrier to entry
>     for a lot of users. With the NAR packs approach we could provide one
>     binary that is the actual application, and then multiple zips/tars
>     that each contain a set of NARs. So someone gets the first binary and
>     then adds whichever NAR packs to it. This solves the immediate problem
>     of having any single binary exceed a certain size.
>
>     As a side effect of whatever we do, I was also hoping we could make
>     the build process easier for folks working on the framework. If all we
>     do is change our current assembly, I think you'd still incur the time
>     of building all the NARs since they are listed in the modules section
>     nifi-nar-bundles pom, even though most of them wouldn't be included in
>     the new "light" assembly. We'd have to consider restructuring the git
>     repo a little bit if this was something we wanted to do. Possibly the
>     top-level could be divided into "nifi-core" and "nifi-nar-bundles",
>     where nifi-core produced the light assembly so folks working on the
>     framework can build this quickly, but if you want to build everything
>     then you build from the root pom which also builds all the NAR packs.
>     Just something to think about if we are going to make changes.
>
>     Regarding the duplication of many JARs (thanks for putting the data
>     together Tony!)...
>
>     We could try to collapse common dependencies so that we don't end up
>     with so many duplicate copies of the same JAR, but I don't know
>     exactly how we'd set this up...
>
>     We could promote a JAR to the lib directory which makes it visible to
>     every single NAR and thus no longer needs to be bundled into each NAR.
>     That works great for the NARs that already use the dependency, but now
>     means that a bunch of other NARs have this extra thing on the
>     classpath, and also means we are forcing the version of that library
>     upon every NAR which somewhat defeats the purpose of NARs.
>
>     We could create "lib" NARs, similar to the original intent of
>     nifi-hadoop-libraries-nar. For example, we could create
>     nifi-jackson-libraries-nar, and then any NAR that needs jackson would
>     have this as their parent. This gets tricky when there is more than
>     one library in play, for example let's say we also had
>     nifi-bcprov-libraries-nar, and then some other NAR needs jackson and
>     bcprov, there can be only one parent NAR so you can only pick one of
>     them. You could chain things together, but then how do you decide the
>     order of the chain... nifi-xyz-nar -> nifi-jackson-nar ->
>     nifi-bcprov-nar  VS. nifi-xyz-nar -> nifi-bcprov-nar ->
>     nifi-jackson-nar.
>
>     Right now having a NAR dependency is like single class inheritance,
>     and it seems like we would also need a mix-in style NAR dependency to
>     be able to add multiple lib NARs without getting into this chaining
>     issue.
>
>
>     On Tue, Jan 16, 2018 at 5:14 AM, Mike Thomsen <mikerthom...@gmail.com> 
> wrote:
>     > Also maybe #4: Message Queue support (JMS, Kafka, etc.)
>     >
>     > On Tue, Jan 16, 2018 at 5:13 AM, Mike Thomsen <mikerthom...@gmail.com>
>     > wrote:
>     >
>     >> One possibility: 3 "packs." Such as:
>     >>
>     >> 1. Big Data.
>     >> 2. Search
>     >> 3. Non-BD NoSQL.
>     >>
>     >> Each pack would be an assembly of NARs that correspond to the category.
>     >>
>     >> The core would have JDBC support and all of the data mutator 
> processors.
>     >>
>     >> On Mon, Jan 15, 2018 at 11:54 PM, James Wing <jvw...@gmail.com> wrote:
>     >>
>     >>> I think a reduced build is a good way forward until the extension 
> registry
>     >>> is ready.  If we can publish the remaining processors in one or more
>     >>> additional artifacts, that would be ideal.  The admin burden of more 
> git
>     >>> repositories or separate releases does not appeal to me, especially 
> since
>     >>> we do not believe it to be our long-term path.
>     >>>
>     >>> It's not going to be easy to decide on a "core" build with "extras" 
> sold
>     >>> separately. But we will have to confront the division for the registry
>     >>> solution in any case, we might as well get started on it.
>     >>>
>     >>> On Sun, Jan 14, 2018 at 1:37 PM, Mike Thomsen <mikerthom...@gmail.com>
>     >>> wrote:
>     >>>
>     >>> > Since the limit was bumped to 1.6GB, it might be prudent to not do 
> too
>     >>> much
>     >>> > NiFi 1.X and instead focus on a comprehensive solution that 
> coincides
>     >>> with
>     >>> > 2.0. I think that would be a time when a lot of users might expect 
> and
>     >>> be
>     >>> > tolerant of breaking changes on issues like this.
>     >>> >
>     >>> > Also, is there a clear process for deprecating processors? If not, 
> there
>     >>> > should be because it would be really helpful for doing cleanup.
>     >>> >
>     >>> > On Sat, Jan 13, 2018 at 7:53 PM, Brett Ryan <brett.r...@gmail.com>
>     >>> wrote:
>     >>> >
>     >>> > > Why are core modules not listing everything as provided?
>     >>> > >
>     >>> > > IDE’s solve this problem with the use of dependency libraries. As 
> an
>     >>> > > example NetBeans nbm’s have a single purpose, you must export the
>     >>> > packages
>     >>> > > to be exposed.
>     >>> > >
>     >>> > > We do the same with confluence modules using felix.
>     >>> > >
>     >>> > > Why is NiFi doing things different just so the person who wants to
>     >>> > install
>     >>> > > many custom nars can be lazy?
>     >>> > >
>     >>> > > > On 14 Jan 2018, at 08:59, Tony Kurc <trk...@gmail.com> wrote:
>     >>> > > >
>     >>> > > > I added some more stats to the wiki page, trying to determine 
> what
>     >>> > > > dependencies are included in jars. It seems like there is
>     >>> opportunity.
>     >>> > > >
>     >>> > > > Highlights, 50 copies of what appears to be some version of
>     >>> > bcprov-jdk15
>     >>> > > > for a total of 162M. 51 copies of jackson-databind.
>     >>> > > >
>     >>> > > > total size       copies  jar
>     >>> > > >     30.97MB     65     META-INF/bundled-dependencies/
>     >>> > > commons-lang3-XXX.jar
>     >>> > > >     32.53MB     50     META-INF/bundled-dependencies/
>     >>> > > bcpkix-jdk15on-XXX.jar
>     >>> > > >     33.55MB     16     
> META-INF/bundled-dependencies/guava-XXX.jar
>     >>> > > >     39.62MB      1     META-INF/bundled-dependencies/
>     >>> > > jython-shaded-XXX.jar
>     >>> > > >     63.06MB     51
>     >>> > > > META-INF/bundled-dependencies/jackson-databind-XXX.jar
>     >>> > > >    162.07MB     50     META-INF/bundled-dependencies/
>     >>> > > bcprov-jdk15on-XXX.jar
>     >>> > > >
>     >>> > > >
>     >>> > > >> On Sat, Jan 13, 2018 at 2:09 PM, Joey Frazee <
>     >>> joey.fra...@icloud.com>
>     >>> > > wrote:
>     >>> > > >>
>     >>> > > >> I tend to have feelings similar to Michael about a multi-repo
>     >>> > approach.
>     >>> > > >> I’ve rarely seen it help and more often seen it hurt — it’s
>     >>> confusing
>     >>> > > >> (especially to newcomers), stuff gets neglected because it’s
>     >>> easier to
>     >>> > > >> ignore, you need another master project or some such to do an
>     >>> entire
>     >>> > > build.
>     >>> > > >>
>     >>> > > >> Maybe git submodules could help mitigate this, but creating
>     >>> > independent
>     >>> > > >> assemblies or using different build profiles to enable 
> building and
>     >>> > > >> packaging the binaries in different ways would satisfy 
> everything
>     >>> > except
>     >>> > > >> disentangling the releases.
>     >>> > > >>
>     >>> > > >> -joey
>     >>> > > >>
>     >>> > > >>> On Jan 13, 2018, 12:40 PM -0600, Brandon DeVries 
> <b...@jhu.edu>,
>     >>> > wrote:
>     >>> > > >>> I agree... Long term extension registry, short term one repo 
> with
>     >>> > > >> different
>     >>> > > >>> assemblies (e.g. standard, slim, analytic, etc...).
>     >>> > > >>>
>     >>> > > >>> Brandon
>     >>> > > >>>
>     >>> > > >>> On Sat, Jan 13, 2018 at 1:35 PM Pierre Villard <
>     >>> > > >> pierre.villard...@gmail.com
>     >>> > > >>> wrote:
>     >>> > > >>>
>     >>> > > >>>> Option #3 also has my preference. But it's probably a good 
> idea
>     >>> to
>     >>> > > only
>     >>> > > >>>> keep one git repo and play with the assembly and Maven 
> profiles
>     >>> for
>     >>> > > the
>     >>> > > >>>> releases, no? It'd be certainly easier for release management
>     >>> > process.
>     >>> > > >> But
>     >>> > > >>>> this decision could also depend on how the option #3 is 
> going to
>     >>> be
>     >>> > > >>>> implemented I guess.
>     >>> > > >>>>
>     >>> > > >>>> 2018-01-13 6:36 GMT-07:00 Joe Witt <joe.w...@gmail.com>:
>     >>> > > >>>>
>     >>> > > >>>>> thanks tony!
>     >>> > > >>>>>
>     >>> > > >>>>>> On Jan 12, 2018 10:48 PM, "Tony Kurc" <trk...@gmail.com>
>     >>> wrote:
>     >>> > > >>>>>>
>     >>> > > >>>>>> I put some of the data I was working with on the wiki -
>     >>> > > >>>>>>
>     >>> > > >>>>>> https://cwiki.apache.org/confluence/display/NIFI/NiFi+
>     >>> > > >> 1.5.0+nar+files
>     >>> > > >>>>>>
>     >>> > > >>>>>> On Fri, Jan 12, 2018 at 10:28 PM, Jeremy Dyer <
>     >>> jdy...@gmail.com
>     >>> > > >>>> wrote:
>     >>> > > >>>>>>
>     >>> > > >>>>>>> So my favorite option is Bryan’s option number “three” of
>     >>> using
>     >>> > > >> the
>     >>> > > >>>>>>> extension registry. Now my thought is do we really need 
> to add
>     >>> > > >>>>> complexity
>     >>> > > >>>>>>> and do anything in the mean time or just focus on that?
>     >>> Meaning
>     >>> > > >> we
>     >>> > > >>>> have
>     >>> > > >>>>>>> roughly 500mb of available capacity today so why don’t we
>     >>> spend
>     >>> > > >> those
>     >>> > > >>>>> man
>     >>> > > >>>>>>> hours we would spend on getting the second repo up on the
>     >>> > > >> extension
>     >>> > > >>>>>>> registry instead?
>     >>> > > >>>>>>>
>     >>> > > >>>>>>> @Bryan do you have thoughts about the deployment of those 
> bars
>     >>> > > >> in the
>     >>> > > >>>>>>> extension registry? Since we won’t be able to build the
>     >>> release
>     >>> > > >>>> binary
>     >>> > > >>>>>>> anymore would we still need to create separate repos for 
> the
>     >>> > > >> nars or
>     >>> > > >>>>>> no?? I
>     >>> > > >>>>>>> have used the registry a little but I’m not 100% sure on 
> your
>     >>> > > >> vision
>     >>> > > >>>>> for
>     >>> > > >>>>>>> the nars
>     >>> > > >>>>>>>
>     >>> > > >>>>>>> - Jeremy Dyer
>     >>> > > >>>>>>>
>     >>> > > >>>>>>> Sent from my iPhone
>     >>> > > >>>>>>>
>     >>> > > >>>>>>>> On Jan 12, 2018, at 10:18 PM, Tony Kurc 
> <tk...@apache.org>
>     >>> > > >> wrote:
>     >>> > > >>>>>>>>
>     >>> > > >>>>>>>> I was looking at nar sizes, and thought some data may be
>     >>> > > >> helpful. I
>     >>> > > >>>>>> used
>     >>> > > >>>>>>> my recent RC1 verification as a basis for getting file 
> sizes,
>     >>> and
>     >>> > > >>>> just
>     >>> > > >>>>>> got
>     >>> > > >>>>>>> the file size for each file in the assembly named 
> "*.nar". I
>     >>> > > >> don't
>     >>> > > >>>> know
>     >>> > > >>>>>>> whether the images I pasted in will go through, but I made
>     >>> some
>     >>> > > >>>>> graphs.b
>     >>> > > >>>>>>> The first is a histogram of nar file size in buckets of 
> 10MB.
>     >>> The
>     >>> > > >>>>> second
>     >>> > > >>>>>>> basically is similar to a cumulative distribution, the x 
> axis
>     >>> is
>     >>> > > >> the
>     >>> > > >>>>>> "rank"
>     >>> > > >>>>>>> of the nar (smallest to largest), and the y-axis is how 
> what
>     >>> > > >> fraction
>     >>> > > >>>>> of
>     >>> > > >>>>>>> the all the sizes of the nars together are that rank or
>     >>> lower. In
>     >>> > > >>>> other
>     >>> > > >>>>>>> words, on the graph, the dot at 60 and ~27 means that the
>     >>> > > >> smallest 60
>     >>> > > >>>>>> nars
>     >>> > > >>>>>>> contribute only ~27% of the total. Of note, the standard 
> and
>     >>> > > >>>> framework
>     >>> > > >>>>>> nars
>     >>> > > >>>>>>> are at 83 and 84.
>     >>> > > >>>>>>>>
>     >>> > > >>>>>>>>
>     >>> > > >>>>>>>>
>     >>> > > >>>>>>>>
>     >>> > > >>>>>>>>
>     >>> > > >>>>>>>>> On Fri, Jan 12, 2018 at 5:04 PM, Michael Moser <
>     >>> > > >>>> moser...@gmail.com
>     >>> > > >>>>>>> wrote:
>     >>> > > >>>>>>>>> And of course, as I hit <send> I thought of one more 
> thing.
>     >>> > > >>>>>>>>>
>     >>> > > >>>>>>>>> We could keep all of the code in 1 git repo (1 project) 
> but
>     >>> > > >> the
>     >>> > > >>>>>>>>> nifi-assembly part of the build could be broken up to 
> build
>     >>> > > >> core
>     >>> > > >>>>> NiFi
>     >>> > > >>>>>>>>> separately from the tar/zip functional grouping of other
>     >>> > > >> NARs.
>     >>> > > >>>>>>>>>
>     >>> > > >>>>>>>>> On Fri, Jan 12, 2018 at 5:01 PM, Michael Moser <
>     >>> > > >>>> moser...@gmail.com
>     >>> > > >>>>>>> wrote:
>     >>> > > >>>>>>>>>
>     >>> > > >>>>>>>>>> Long term I would also like to see #3 be the solution. 
> I
>     >>> > > >> think
>     >>> > > >>>>> what
>     >>> > > >>>>>>>>>> Joseph N described could be part of the capabilities 
> of #3.
>     >>> > > >>>>>>>>>>
>     >>> > > >>>>>>>>>> I would like to add a note of caution with respect to
>     >>> > > >>>> reorganizing
>     >>> > > >>>>>> and
>     >>> > > >>>>>>>>>> releasing extension bundles separately:
>     >>> > > >>>>>>>>>>
>     >>> > > >>>>>>>>>> - the burden on release manager expands because many 
> more
>     >>> > > >>>>>> projects
>     >>> > > >>>>>>>>>> have to be released; probably not all on each release 
> cycle
>     >>> > > >>>> but
>     >>> > > >>>>>> it
>     >>> > > >>>>>>> could
>     >>> > > >>>>>>>>>> still be many
>     >>> > > >>>>>>>>>> - the chance of accidentally forgetting to release a
>     >>> > > >> project
>     >>> > > >>>>> in a
>     >>> > > >>>>>>>>>> release cycle becomes non-zero
>     >>> > > >>>>>>>>>> - sharing code between projects gets a bit harder 
> because
>     >>> > > >> you
>     >>> > > >>>>>> have
>     >>> > > >>>>>>> to
>     >>> > > >>>>>>>>>> manage releasing projects in a specific order
>     >>> > > >>>>>>>>>> - it becomes harder to find all of the projects that 
> need
>     >>> > > >> to
>     >>> > > >>>>>> change
>     >>> > > >>>>>>>>>> when shared code is added
>     >>> > > >>>>>>>>>> - the simple act of finding code becomes harder ... in
>     >>> > > >> which
>     >>> > > >>>>>>> project
>     >>> > > >>>>>>>>>> is that class in? (IDEs like IntelliJ can search in 1
>     >>> > > >>>> project,
>     >>> > > >>>>>> but
>     >>> > > >>>>>>> if they
>     >>> > > >>>>>>>>>> search across multiple projects, then I haven't learned
>     >>> > > >> how)
>     >>> > > >>>>>>>>>>
>     >>> > > >>>>>>>>>> I used to maintain several nars in separate projects, 
> and
>     >>> > > >>>> recently
>     >>> > > >>>>>>>>>> reorganized them into 1 project (following NiFi's
>     >>> > > >> multi-module
>     >>> > > >>>>> maven
>     >>> > > >>>>>>> build)
>     >>> > > >>>>>>>>>> and life has become much easier!
>     >>> > > >>>>>>>>>>
>     >>> > > >>>>>>>>>> -- Mike
>     >>> > > >>>>>>>>>>
>     >>> > > >>>>>>>>>>
>     >>> > > >>>>>>>>>>
>     >>> > > >>>>>>>>>> On Fri, Jan 12, 2018 at 4:33 PM, Chris Herrera <
>     >>> > > >>>>>>> chris.herrer...@gmail.com
>     >>> > > >>>>>>>>>> wrote:
>     >>> > > >>>>>>>>>>
>     >>> > > >>>>>>>>>>> I very much like the solution proposed by Bryan below.
>     >>> > > >> This
>     >>> > > >>>> would
>     >>> > > >>>>>>> allow
>     >>> > > >>>>>>>>>>> for a cleaner docker image as well, while still 
> proving
>     >>> > > >> the
>     >>> > > >>>>>>> functionality
>     >>> > > >>>>>>>>>>> as needed. For sure, the extension registry will be
>     >>> > > >> great, but
>     >>> > > >>>> in
>     >>> > > >>>>>>> the mean
>     >>> > > >>>>>>>>>>> time this is an adequate mid step.
>     >>> > > >>>>>>>>>>>
>     >>> > > >>>>>>>>>>> Regards,
>     >>> > > >>>>>>>>>>> Chris
>     >>> > > >>>>>>>>>>>
>     >>> > > >>>>>>>>>>> On Jan 12, 2018, 2:52 PM -0600, Bryan Bende <
>     >>> > > >> bbe...@gmail.com
>     >>> > > >>>>> ,
>     >>> > > >>>>>>> wrote:
>     >>> > > >>>>>>>>>>>> Long term I'd like to see the extension registry take
>     >>> > > >> form
>     >>> > > >>>> and
>     >>> > > >>>>>> have
>     >>> > > >>>>>>>>>>>> that be the solution (#3).
>     >>> > > >>>>>>>>>>>>
>     >>> > > >>>>>>>>>>>> In the more near term, we could separate all of the
>     >>> > > >> NARs,
>     >>> > > >>>>> except
>     >>> > > >>>>>>> for
>     >>> > > >>>>>>>>>>>> framework and maybe standard processors & services,
>     >>> > > >> into a
>     >>> > > >>>>>> separate
>     >>> > > >>>>>>>>>>>> git repo.
>     >>> > > >>>>>>>>>>>>
>     >>> > > >>>>>>>>>>>> In that new git repo we could organize things like 
> Joe
>     >>> > > >> N just
>     >>> > > >>>>>>>>>>>> described according to some kind of functional
>     >>> > > >> grouping. Each
>     >>> > > >>>>> of
>     >>> > > >>>>>>> these
>     >>> > > >>>>>>>>>>>> functional bundles could produce its own tar/zip 
> which
>     >>> > > >> we can
>     >>> > > >>>>>> make
>     >>> > > >>>>>>>>>>>> available for download.
>     >>> > > >>>>>>>>>>>>
>     >>> > > >>>>>>>>>>>> That would separate the release cycles between core
>     >>> > > >> NiFi and
>     >>> > > >>>>> the
>     >>> > > >>>>>>> other
>     >>> > > >>>>>>>>>>>> NARs, and also avoid having any single binary 
> artifact
>     >>> > > >> that
>     >>> > > >>>>> gets
>     >>> > > >>>>>>> too
>     >>> > > >>>>>>>>>>>> large.
>     >>> > > >>>>>>>>>>>>
>     >>> > > >>>>>>>>>>>>
>     >>> > > >>>>>>>>>>>>
>     >>> > > >>>>>>>>>>>> On Fri, Jan 12, 2018 at 3:43 PM, Joseph Niemiec <
>     >>> > > >>>>>>> josephx...@gmail.com
>     >>> > > >>>>>>>>>>> wrote:
>     >>> > > >>>>>>>>>>>>> just a random thought.
>     >>> > > >>>>>>>>>>>>>
>     >>> > > >>>>>>>>>>>>> Drop In Lib packs... All the Hadoop ones in one
>     >>> > > >> package for
>     >>> > > >>>>>>> example
>     >>> > > >>>>>>>>>>> that
>     >>> > > >>>>>>>>>>>>> can be added to a slim Nifi install. Another may be
>     >>> > > >> for
>     >>> > > >>>>> Cloud,
>     >>> > > >>>>>> or
>     >>> > > >>>>>>>>>>> Database
>     >>> > > >>>>>>>>>>>>> Interactions, Integration (JMS, FTP, etc) of course
>     >>> > > >>>> defining
>     >>> > > >>>>>>> these
>     >>> > > >>>>>>>>>>> groups
>     >>> > > >>>>>>>>>>>>> would be the tricky part... Or perhaps some type of
>     >>> > > >>>> installer
>     >>> > > >>>>>>> which
>     >>> > > >>>>>>>>>>> allows
>     >>> > > >>>>>>>>>>>>> you to elect which packages to download to add to
>     >>> > > >> the slim
>     >>> > > >>>>>>> install?
>     >>> > > >>>>>>>>>>>>>
>     >>> > > >>>>>>>>>>>>>
>     >>> > > >>>>>>>>>>>>> On Fri, Jan 12, 2018 at 3:10 PM, Joe Witt <
>     >>> > > >>>>> joe.w...@gmail.com
>     >>> > > >>>>>>> wrote:
>     >>> > > >>>>>>>>>>>>>
>     >>> > > >>>>>>>>>>>>>> Team,
>     >>> > > >>>>>>>>>>>>>>
>     >>> > > >>>>>>>>>>>>>> The NiFi convenience binary (tar.gz/zip) size has
>     >>> > > >> grown
>     >>> > > >>>> to
>     >>> > > >>>>>>> 1.1GB now
>     >>> > > >>>>>>>>>>>>>> in the latest release. Apache infra expanded it to
>     >>> > > >> 1.6GB
>     >>> > > >>>>>>> allowance
>     >>> > > >>>>>>>>>>>>>> for us but has stated this is the last time.
>     >>> > > >>>>>>>>>>>>>> https://issues.apache.org/jira/browse/INFRA-15816
>     >>> > > >>>>>>>>>>>>>>
>     >>> > > >>>>>>>>>>>>>> We need consider:
>     >>> > > >>>>>>>>>>>>>> 1) removing old nars/less commonly used nars/or
>     >>> > > >>>>> particularly
>     >>> > > >>>>>>> massive
>     >>> > > >>>>>>>>>>>>>> nars from the assembly we distribute by default.
>     >>> > > >> Folks
>     >>> > > >>>> can
>     >>> > > >>>>>>> still use
>     >>> > > >>>>>>>>>>>>>> these things if they want just not from our
>     >>> > > >> convenience
>     >>> > > >>>>>> binary
>     >>> > > >>>>>>>>>>>>>> 2) collapsing nars with highly repeating deps
>     >>> > > >>>>>>>>>>>>>> 3) Getting the extension registry baked into the
>     >>> > > >> Flow
>     >>> > > >>>>>> Registry
>     >>> > > >>>>>>> then
>     >>> > > >>>>>>>>>>>>>> moving to separate releases for extension bundles.
>     >>> > > >> The
>     >>> > > >>>> main
>     >>> > > >>>>>>> release
>     >>> > > >>>>>>>>>>>>>> then would be just the NiFi framework.
>     >>> > > >>>>>>>>>>>>>>
>     >>> > > >>>>>>>>>>>>>> Any other ideas ?
>     >>> > > >>>>>>>>>>>>>>
>     >>> > > >>>>>>>>>>>>>> I'll plan to start identifying candiates for
>     >>> > > >> removal
>     >>> > > >>>> soon.
>     >>> > > >>>>>>>>>>>>>>
>     >>> > > >>>>>>>>>>>>>> Thanks
>     >>> > > >>>>>>>>>>>>>> Joe
>     >>> > > >>>>>>>>>>>>>>
>     >>> > > >>>>>>>>>>>>>
>     >>> > > >>>>>>>>>>>>>
>     >>> > > >>>>>>>>>>>>>
>     >>> > > >>>>>>>>>>>>> --
>     >>> > > >>>>>>>>>>>>> Joseph
>     >>> > > >>>>>>>>>>>
>     >>> > > >>>>>>>>>>
>     >>> > > >>>>>>>>>>
>     >>> > > >>>>>>>>
>     >>> > > >>>>>>>
>     >>> > > >>>>>>
>     >>> > > >>>>>
>     >>> > > >>>>
>     >>> > > >>
>     >>> > >
>     >>> >
>     >>>
>     >>
>     >>
>
>
>
