Re: Breaking up Flume

Ralph Goers Mon, 28 Mar 2022 10:31:08 -0700

Every release a project does requires a vote and has to meet the ASF's 
requirements for a release. That said, Apache Maven has seemingly dozens of 
plugins that are all independently managed and released. If you look at the 
Maven dev list you will see release votes happening for various things several 
times a month. But their process used to include something that each release 
manager updated to track the releases so they could be included in the board 
report. Now I believe that is all handled by the Apache Reporter service.


I don’t believe our process would be quite that loose. For one, I really don’t 
consider the way Flume allows new components to be a true plugin architecture. 
I would still anticipate we would group releases of things but nothing says it 
has to be like that.

Downloading the source? I assume you mean the released source. That would be 
available either by downloading the release zip/tar from the ASF distribution 
site or by checking out the release tag from git. But I don’t understand why 
you would do that. 
I use a customized version of Flume but I build it from the flume zip. This is 
a bit painful as I have to then delete stuff I don’t want or want to override. 
It would actually be easier for me if I were to reference the various Flume 
artifacts I need as dependencies and use the dependency plugin to add them to 
the Flume application I am building.
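
As a rough sketch of that approach: the pom.xml of such an application could 
declare the Flume artifacts it needs and let the maven-dependency-plugin copy 
them into a lib directory. The com.example coordinates, the execution id, the 
choice of modules, and the version numbers below are assumptions for 
illustration only, not an agreed layout.

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <!-- Hypothetical coordinates for the customized Flume application -->
  <groupId>com.example</groupId>
  <artifactId>my-flume-app</artifactId>
  <version>1.0.0-SNAPSHOT</version>
  <packaging>pom</packaging>

  <properties>
    <!-- Example version only; use whichever Flume release you build against -->
    <flume.version>1.9.0</flume.version>
  </properties>

  <dependencies>
    <!-- Pull in only the Flume pieces this application actually uses -->
    <dependency>
      <groupId>org.apache.flume</groupId>
      <artifactId>flume-ng-node</artifactId>
      <version>${flume.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.flume.flume-ng-sinks</groupId>
      <artifactId>flume-hdfs-sink</artifactId>
      <version>${flume.version}</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-dependency-plugin</artifactId>
        <version>3.2.0</version>
        <executions>
          <execution>
            <!-- Hypothetical execution id -->
            <id>copy-flume-libs</id>
            <phase>package</phase>
            <goals>
              <goal>copy-dependencies</goal>
            </goals>
            <configuration>
              <!-- Copy the runtime jars (including transitive ones) into target/lib -->
              <outputDirectory>${project.build.directory}/lib</outputDirectory>
              <includeScope>runtime</includeScope>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>
```

With something like this, running "mvn package" would populate target/lib 
with just the selected Flume modules and their dependencies, so nothing has to 
be deleted from an unpacked distribution afterwards.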

Ralph 

> On Mar 28, 2022, at 9:57 AM, Bessenyei Balázs Donát <bes...@apache.org> wrote:
> 
> What does the "module releases" thing look like from an ASF release
> (process - voting, etc.) perspective?
> 
> Alternatively, do we want a mechanism to be able to add modules
> directly from source? (Homebrew-style)
> 
> 
> Donat
> 
> On Mon, Mar 28, 2022 at 6:43 PM Ralph Goers <ralph.go...@dslextreme.com> 
> wrote:
>> 
>> Thanks for the reply!
>> 
>> In general I agree with what you are proposing. I’d probably suggest once a 
>> quarter instead of every 2 months. I also wouldn’t necessarily have a 
>> release of every component every quarter. If there have been no changes 
>> there isn’t much of a point. And requiring that everything be released 
>> together doesn’t really help. I would suggest that Flume would have a 
>> flume-parent module that includes a parent pom.xml that all projects would 
>> inherit from. It would include a dependency management section that declares 
>> the version of dependencies that are used across projects. In addition we 
>> would want a flume-bom that contains a pom.xml that includes a dependency 
>> management section declaring all the versions of all components for a 
>> specific Flume quarterly release.
>> 
>> As for the versions, I am not sure why you wouldn’t just go with 
>> 2.0.0-alpha, 2.0.0-beta or 2.0.0-beta1, 2.0.0-beta2 if you aren’t 
>> comfortable labeling them as GA. Once things are stable you would then 
>> release 2.0.0.
>> 
>> Ralph
>> 
>>> On Mar 28, 2022, at 7:24 AM, Sean Busbey <sbus...@apple.com.INVALID> wrote:
>>> 
>>> That’s a really interesting possibility.
>>> 
>>> For the 1.10 release I think we should still upgrade the Hive 1 version to 
>>> the latest 1.y available, but I agree we’d be well served to get a handle 
>>> on the increasing set of possible dependencies. A 2.0 release would be a 
>>> great time to change around how deployment works so that folks don’t expect 
>>> everything to show up in a single omnibus tarball from a single build as 
>>> they do now.
>>> 
>>> There are a lot of things to take care of to make that transition less 
>>> painful, so I’d suggest we get an overall approach described but try to 
>>> address it incrementally so we’re not facing a very long delay for further 
>>> project releases.
>>> 
>>> How about something like this?
>>> 
>>> - Release 1.10.0 soon, only backward compatible releases
>>> - Release 1.y.0 - every other month, backward compatible dependency updates 
>>> and bug fixes
>>> - Release 2.0 alpha - break up project into multiple repos, establish 
>>> release cadence(s) w/o binary artifacts
>>> - Release 2.1 beta - have an “easy path” convenience binary
>>> - Release 2.2 expected to be production ready
>>> 
>>> For at least those parts of the process that don’t require project svn 
>>> access I can help with keeping regular 1.y maintenance releases going. We 
>>> could decide ahead of time on when to stop them; e.g. 6 months after the 
>>> first “production ready” flume 2.y release.
>>> 
>>> For the 2.y releases, I think we’re going to have some growing pains in 
>>> managing how we get from multiple repositories to PMC blessed releases and 
>>> from there to artifacts someone could use to run flume if they’re used to 
>>> our current deployment model. Setting expectations via alpha/beta labels 
>>> and stated packaging goals means we should be able to work out friction 
>>> points while still walking before we try to run with a long term 
>>> sustainable path for the project. We could try to put some goal dates on 
>>> those milestones once we have spent some time discussing details and trying 
>>> to move things forward.
>>> 
>>>> On Mar 27, 2022, at 4:19 AM, Ralph Goers <ralph.go...@dslextreme.com> 
>>>> wrote:
>>>> 
>>>> Sean, (and everyone else)
>>>> 
>>>> You mentioned that you want to create separate maven modules to upgrade 
>>>> hive & hbase.  The Flume build is already very large. In addition, 
>>>> upgrading to Hive 3 looks like it will require Hadoop 3 while Hive 2 runs 
>>>> with Hadoop 2. This means both dependencies would need to be in the parent 
>>>> pom. I find this problematic for the following reasons:
>>>> - Flume contains a ton of dependencies and even more transitive dependencies 
>>>>   that are not declared. This makes creating new releases really hard given 
>>>>   how many dependencies have to be checked and upgraded.
>>>> - As more modules are added the build is just going to get slower.
>>>> - Some modules have dependencies on things that are no longer supported. 
>>>>   Again, that makes creating a full Flume release hard.
>>>> 
>>>> I would suggest that unless security fixes require it we hold off on 
>>>> creating upgrades in 1.10.0 for HBase and Hive beyond what you have 
>>>> already done. Instead, we should create new repositories for the parts of 
>>>> Flume we want to separate and maintain independently. The HBase and Hive 
>>>> upgrades would end up going there.
>>>> 
>>>> I believe this will speed up development since builds will no longer take 
>>>> so long. It also means that PRs will go against the target repo which 
>>>> should simplify things. Jira would remain the same as it is today. The 
>>>> component would be used to identify the target repo.
>>>> 
>>>> I would suggest that what should remain in the main Flume build would be 
>>>> primarily configuration, core, node, sdk, and some of configfilters.  I 
>>>> would expect we would have separate repos for hbase, hdfs, hive, Kafka, 
>>>> embedded-agent, tools, and legacy to start.
>>>> 
>>>> Thoughts?
>>>> 
>>>> Ralph
>>> 
>>> 
>>
