Well, as always, if someone wants to code something that gives users an easier way to deploy Flume, I’m all for it. I know Tristan has been working on creating a Docker image, but I suspect it is going to need something like what you suggest, since the default Docker image should contain only the minimum needed for a working Flume instance. For example, I only use the file channel, Avro sources and sinks, and the Kafka sink. Everything else packaged by default would just be stuff I would want to remove.

Ralph

> On Mar 28, 2022, at 9:44 PM, Bessenyei Balázs Donát <bes...@apache.org> wrote:
>
>> I assume you mean the released source
>
> I was thinking of a git reference (just like you can do with pip or
> npm) so that people can more easily mix and match, but I don't have
> strong opinions about this.
>
>
> Donat
>
> On Mon, Mar 28, 2022 at 7:31 PM Ralph Goers <ralph.go...@dslextreme.com> wrote:
>>
>> Every release a project does requires a vote and has to meet the ASF’s
>> requirements for a release. That said, Apache Maven has seemingly dozens
>> of plugins that are all independently managed and released. If you look
>> at the Maven dev list you will see release votes happening for various
>> things several times a month. But their process used to include something
>> that each release manager updated to track the releases so they could be
>> included in the board report. Now I believe that is all handled by the
>> Apache Reporter service.
>>
>> I don’t believe our process would be quite that loose. For one, I really
>> don’t consider the way Flume allows new components to be a true plugin
>> architecture. I would still anticipate we would group releases of things
>> but nothing says it has to be like that.
>>
>> Downloading the source? I assume you mean the released source. That would
>> be available either by downloading the release zip/tar from the ASF
>> distribution site or by checking out the release tag from git. But I
>> don’t understand why you would do that.
>> I use a customized version of Flume but I build it from the Flume zip.
>> This is a bit painful as I then have to delete stuff I don’t want or want
>> to override. It would actually be easier for me if I were to reference
>> the various Flume artifacts I need as dependencies and use the dependency
>> plugin to add them to the Flume application I am building.
>>
>> Ralph
>>
>>> On Mar 28, 2022, at 9:57 AM, Bessenyei Balázs Donát <bes...@apache.org> wrote:
>>>
>>> What does the "module releases" thing look like from an ASF release
>>> (process - voting, etc.) perspective?
>>>
>>> Alternatively, do we want a mechanism to be able to add modules
>>> directly from source? (Homebrew-style)
>>>
>>>
>>> Donat
>>>
>>> On Mon, Mar 28, 2022 at 6:43 PM Ralph Goers <ralph.go...@dslextreme.com> wrote:
>>>>
>>>> Thanks for the reply!
>>>>
>>>> In general I agree with what you are proposing. I’d probably suggest
>>>> once a quarter instead of every 2 months. I also wouldn’t necessarily
>>>> have a release of every component every quarter. If there have been no
>>>> changes there isn’t much of a point. And requiring that everything be
>>>> released together doesn’t really help. I would suggest that Flume would
>>>> have a flume-parent module that includes a parent pom.xml that all
>>>> projects would inherit from. It would include a dependency management
>>>> section that declares the version of dependencies that are used across
>>>> projects. In addition we would want a flume-bom that contains a pom.xml
>>>> that includes a dependency management section declaring all the
>>>> versions of all components for a specific Flume quarterly release.
>>>>
>>>> As for the versions, I am not sure why you wouldn’t just go with
>>>> 2.0.0-alpha, 2.0.0-beta or 2.0.0-beta1, 2.0.0-beta2 if you aren’t
>>>> comfortable labeling them as GA. Once things are stable you would then
>>>> release 2.0.0.
>>>>
>>>> Ralph
>>>>
>>>>> On Mar 28, 2022, at 7:24 AM, Sean Busbey <sbus...@apple.com.INVALID> wrote:
>>>>>
>>>>> That’s a really interesting possibility.
>>>>>
>>>>> For the 1.10 release I think we should still upgrade the Hive 1
>>>>> version to the latest 1.y available, but I agree we’d be well served
>>>>> to get a handle on the increasing set of possible dependencies.
>>>>> A 2.0 release would be a great time to change around how deployment
>>>>> works so that folks don’t expect everything to show up in a single
>>>>> omnibus tarball from a single build as they do now.
>>>>>
>>>>> There are a lot of things to take care of in making that transition
>>>>> less painful, so I’d suggest we get an overall approach described but
>>>>> try to address it incrementally so we’re not facing a very long delay
>>>>> for further project releases.
>>>>>
>>>>> How about something like this?
>>>>>
>>>>> - Release 1.10.0 soon, only backward compatible releases
>>>>> - Release 1.y.0 - every other month, backward compatible dependency
>>>>>   updates and bug fixes
>>>>> - Release 2.0 alpha - break up project into multiple repos, establish
>>>>>   release cadence(s) w/o binary artifacts
>>>>> - Release 2.1 beta - have an “easy path” convenience binary
>>>>> - Release 2.2 expected to be production ready
>>>>>
>>>>> For at least those parts of the process that don’t require project svn
>>>>> access I can help with keeping regular 1.y maintenance releases going.
>>>>> We could decide ahead of time on when to stop them; e.g. 6 months
>>>>> after the first “production ready” Flume 2.y release.
>>>>>
>>>>> For the 2.y releases, I think we’re going to have some growing pains
>>>>> in managing how we get from multiple repositories to PMC blessed
>>>>> releases and from there to artifacts someone could use to run Flume if
>>>>> they’re used to our current deployment model. Setting expectations via
>>>>> alpha/beta labels and stated packaging goals means we should be able
>>>>> to work out friction points while still walking before we try to run
>>>>> with a long term sustainable path for the project. We could try to put
>>>>> some goal dates on those milestones once we have spent some time
>>>>> discussing details and trying to move things forward.
>>>>>
>>>>>> On Mar 27, 2022, at 4:19 AM, Ralph Goers <ralph.go...@dslextreme.com> wrote:
>>>>>>
>>>>>> Sean, (and everyone else)
>>>>>>
>>>>>> You mentioned that you want to create separate maven modules to
>>>>>> upgrade hive & hbase. The Flume build is already very large. In
>>>>>> addition, upgrading to Hive 3 looks like it will require Hadoop 3
>>>>>> while Hive 2 runs with Hadoop 2. This means both dependencies would
>>>>>> need to be in the parent pom. I find this problematic for the
>>>>>> following reasons:
>>>>>> - Flume contains a ton of dependencies and even more transitive
>>>>>>   dependencies that are not declared. This makes creating new
>>>>>>   releases really hard given how many dependencies have to be checked
>>>>>>   and upgraded.
>>>>>> - As more modules are added the build is just going to get slower.
>>>>>> - Some modules have dependencies on things that are no longer
>>>>>>   supported. Again, that makes creating a full Flume release hard.
>>>>>>
>>>>>> I would suggest that unless security fixes require it we hold off on
>>>>>> creating upgrades in 1.10.0 for HBase and Hive beyond what you have
>>>>>> already done. Instead, we should create new repositories for the
>>>>>> parts of Flume we want to separate and maintain independently. The
>>>>>> HBase and Hive upgrades would end up going there.
>>>>>>
>>>>>> I believe this will speed up development since builds will no longer
>>>>>> take so long. It also means that PRs will go against the target repo,
>>>>>> which should simplify things. Jira would remain the same as it is
>>>>>> today. The component would be used to identify the target repo.
>>>>>>
>>>>>> I would suggest that what should remain in the main Flume build would
>>>>>> be primarily configuration, core, node, sdk, and some of
>>>>>> configfilters. I would expect we would have separate repos for hbase,
>>>>>> hdfs, hive, Kafka, embedded-agent, tools, and legacy to start.
>>>>>>
>>>>>> Thoughts?
>>>>>>
>>>>>> Ralph
>>>>>
>>>>>
>>>>
>>