Re: Breaking up Flume

Ralph Goers Tue, 29 Mar 2022 00:39:02 -0700

Well, as always, if someone wants to code something to support an easier way 
for users to deploy flume I’m all for it. I know Tristan has been working on 
creating a Docker image, but I suspect it is going to need something like what 
you suggest, as the default Docker image should only contain the minimum amount 
of stuff to have a working flume instance. For example, I only use the file 
channel, Avro sources and sinks, and Kafka Sink. Everything else packaged by 
default would just be stuff I would want to remove.


Ralph

> On Mar 28, 2022, at 9:44 PM, Bessenyei Balázs Donát <bes...@apache.org> wrote:
> 
>> I assume you mean the released source
> 
> I was thinking of a git reference (just like you can do with pip or
> npm) so that people can more easily mix and match, but I don't have
> strong opinions about this.
> 
> 
> Donat
> 
> On Mon, Mar 28, 2022 at 7:31 PM Ralph Goers <ralph.go...@dslextreme.com> 
> wrote:
>> 
>> Every release a project does requires a vote and has to meet the ASF’s 
>> requirements for a release. That said, Apache Maven has seemingly dozens of 
>> plugins that are all independently managed and released. If you look at the 
>> Maven dev list you will see release votes happening for various things 
>> several times a month. But their process used to include something that each 
>> release manager updated to track the releases so they could be included in 
>> the board report. Now I believe that is all handled by the Apache Reporter 
>> service.
>> 
>> I don’t believe our process would be quite that loose. For one, I really 
>> don’t consider the way Flume allows new components to be a true plugin 
>> architecture. I would still anticipate we would group releases of things but 
>> nothing says it has to be like that.
>> 
>> Downloading the source? I assume you mean the released source. That would be 
>> available either by downloading the release zip/tar from the ASF 
>> distribution site or by checking out the release tag from git. But I don’t 
>> understand why you would do that.
>> I use a customized version of Flume but I build it from the flume zip. This 
>> is a bit painful as I have to then delete stuff I don’t want or want to 
>> override. It would actually be easier for me if I were to reference the 
>> various Flume artifacts I need as dependencies and use the dependency plugin 
>> to add them to the Flume application I am building.
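>> 
>> A minimal sketch of what that could look like, assuming the artifact 
>> coordinates of the existing 1.x line. The com.example project, the 1.9.0 
>> versions, and the output directory are illustrative choices, not anything 
>> that has been agreed on:
>> 
>> ```xml
>> <project xmlns="http://maven.apache.org/POM/4.0.0">
>>   <modelVersion>4.0.0</modelVersion>
>>   <!-- Hypothetical downstream project that assembles a trimmed-down Flume -->
>>   <groupId>com.example</groupId>
>>   <artifactId>my-flume-agent</artifactId>
>>   <version>1.0.0</version>
>>   <packaging>pom</packaging>
>> 
>>   <dependencies>
>>     <!-- Only the pieces this deployment uses; versions are examples -->
>>     <dependency>
>>       <groupId>org.apache.flume</groupId>
>>       <artifactId>flume-ng-node</artifactId>
>>       <version>1.9.0</version>
>>     </dependency>
>>     <dependency>
>>       <groupId>org.apache.flume.flume-ng-channels</groupId>
>>       <artifactId>flume-file-channel</artifactId>
>>       <version>1.9.0</version>
>>     </dependency>
>>     <dependency>
>>       <groupId>org.apache.flume.flume-ng-sinks</groupId>
>>       <artifactId>flume-ng-kafka-sink</artifactId>
>>       <version>1.9.0</version>
>>     </dependency>
>>   </dependencies>
>> 
>>   <build>
>>     <plugins>
>>       <plugin>
>>         <groupId>org.apache.maven.plugins</groupId>
>>         <artifactId>maven-dependency-plugin</artifactId>
>>         <executions>
>>           <execution>
>>             <id>copy-flume-libs</id>
>>             <phase>package</phase>
>>             <goals>
>>               <goal>copy-dependencies</goal>
>>             </goals>
>>             <configuration>
>>               <!-- Copies the declared jars and their transitive
>>                    runtime dependencies into one lib directory -->
>>               <outputDirectory>${project.build.directory}/flume/lib</outputDirectory>
>>               <includeScope>runtime</includeScope>
>>             </configuration>
>>           </execution>
>>         </executions>
>>       </plugin>
>>     </plugins>
>>   </build>
>> </project>
>> ```
>> 
>> With something like this there is nothing to delete afterwards: whatever 
>> is not declared as a dependency never lands in the lib directory.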
>> 
>> Ralph
>> 
>>> On Mar 28, 2022, at 9:57 AM, Bessenyei Balázs Donát <bes...@apache.org> 
>>> wrote:
>>> 
>>> What does the "module releases" thing look like from an ASF release
>>> (process - voting, etc.) perspective?
>>> 
>>> Alternatively, do we want a mechanism to be able to add modules
>>> directly from source? (Homebrew-style)
>>> 
>>> 
>>> Donat
>>> 
>>> On Mon, Mar 28, 2022 at 6:43 PM Ralph Goers <ralph.go...@dslextreme.com> 
>>> wrote:
>>>> 
>>>> Thanks for the reply!
>>>> 
>>>> In general I agree with what you are proposing. I’d probably suggest once 
>>>> a quarter instead of every 2 months. I also wouldn’t necessarily have a 
>>>> release of every component every quarter. If there have been no changes 
>>>> there isn’t much of a point. And requiring that everything be released 
>>>> together doesn’t really help. I would suggest that Flume would have a 
>>>> flume-parent module that includes a parent pom.xml that all projects would 
>>>> inherit from. It would include a dependency management section that 
>>>> declares the version of dependencies that are used across projects. In 
>>>> addition we would want a flume-bom that contains a pom.xml that includes a 
>>>> dependency management section declaring all the versions of all components 
>>>> for a specific Flume quarterly release.
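>>>> 
>>>> As a rough sketch of the flume-bom idea, not an agreed layout: the 
>>>> component coordinates below follow the existing 1.x naming and every 
>>>> version is a placeholder, so treat all of them as assumptions.
>>>> 
>>>> ```xml
>>>> <project xmlns="http://maven.apache.org/POM/4.0.0">
>>>>   <modelVersion>4.0.0</modelVersion>
>>>>   <groupId>org.apache.flume</groupId>
>>>>   <artifactId>flume-bom</artifactId>
>>>>   <!-- Placeholder version for one quarterly release -->
>>>>   <version>2.0.0-alpha</version>
>>>>   <packaging>pom</packaging>
>>>> 
>>>>   <dependencyManagement>
>>>>     <dependencies>
>>>>       <!-- One entry per component in this release. A component with
>>>>            no changes would simply keep its previous version here. -->
>>>>       <dependency>
>>>>         <groupId>org.apache.flume</groupId>
>>>>         <artifactId>flume-ng-core</artifactId>
>>>>         <version>2.0.0-alpha</version>
>>>>       </dependency>
>>>>       <dependency>
>>>>         <groupId>org.apache.flume.flume-ng-channels</groupId>
>>>>         <artifactId>flume-file-channel</artifactId>
>>>>         <version>2.0.0-alpha</version>
>>>>       </dependency>
>>>>       <dependency>
>>>>         <groupId>org.apache.flume.flume-ng-sinks</groupId>
>>>>         <artifactId>flume-ng-kafka-sink</artifactId>
>>>>         <version>2.0.0-alpha</version>
>>>>       </dependency>
>>>>     </dependencies>
>>>>   </dependencyManagement>
>>>> </project>
>>>> ```
>>>> 
>>>> A downstream build could then import flume-bom in its own dependency 
>>>> management section (type pom, scope import) and declare the Flume 
>>>> dependencies it needs without repeating any versions.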
>>>> 
>>>> As for the versions, I am not sure why you wouldn’t just go with 
>>>> 2.0.0-alpha, 2.0.0-beta or 2.0.0-beta1, 2.0.0-beta2 if you aren’t 
>>>> comfortable labeling them as GA. Once things are stable you would then 
>>>> release 2.0.0.
>>>> 
>>>> Ralph
>>>> 
>>>>> On Mar 28, 2022, at 7:24 AM, Sean Busbey <sbus...@apple.com.INVALID> 
>>>>> wrote:
>>>>> 
>>>>> That’s a really interesting possibility.
>>>>> 
>>>>> For the 1.10 release I think we should still upgrade the Hive 1 version 
>>>>> to the latest 1.y available, but I agree we’d be well served to get a 
>>>>> handle on the increasing set of possible dependencies. A 2.0 release 
>>>>> would be a great time to change around how deployment works so that folks 
>>>>> don’t expect everything to show up in a single omnibus tarball from a 
>>>>> single build as they do now.
>>>>> 
>>>>> There are a lot of things to take care of to make that transition less 
>>>>> painful, so I’d suggest we get an overall approach described but try to 
>>>>> address it incrementally so we’re not facing a very long delay for 
>>>>> further project releases.
>>>>> 
>>>>> How about something like this?
>>>>> 
>>>>> - Release 1.10.0 soon, only backward compatible releases
>>>>> - Release 1.y.0 - every other month, backward compatible dependency 
>>>>> updates and bug fixes
>>>>> - Release 2.0 alpha - break up project into multiple repos, establish 
>>>>> release cadence(s) w/o binary artifacts
>>>>> - Release 2.1 beta - have an “easy path” convenience binary
>>>>> - Release 2.2 expected to be production ready
>>>>> 
>>>>> For at least those parts of the process that don’t require project svn 
>>>>> access I can help with keeping regular 1.y maintenance releases going. We 
>>>>> could decide ahead of time on when to stop them; e.g. 6 months after the 
>>>>> first “production ready” flume 2.y release.
>>>>> 
>>>>> For the 2.y releases, I think we’re going to have some growing pains in 
>>>>> managing how we get from multiple repositories to PMC blessed releases 
>>>>> and from there to artifacts someone could use to run flume if they’re 
>>>>> used to our current deployment model. Setting expectations via alpha/beta 
>>>>> labels and stated packaging goals means we should be able to work out 
>>>>> friction points while still walking before we try to run with a long term 
>>>>> sustainable path for the project. We could try to put some goal dates on 
>>>>> those milestones once we have spent some time discussing details and 
>>>>> trying to move things forward.
>>>>> 
>>>>>> On Mar 27, 2022, at 4:19 AM, Ralph Goers <ralph.go...@dslextreme.com> 
>>>>>> wrote:
>>>>>> 
>>>>>> Sean, (and everyone else)
>>>>>> 
>>>>>> You mentioned that you want to create separate maven modules to upgrade 
>>>>>> hive & hbase.  The Flume build is already very large. In addition, 
>>>>>> upgrading to Hive 3 looks like it will require Hadoop 3 while Hive 2 
>>>>>> runs with Hadoop 2. This means both dependencies would need to be in the 
>>>>>> parent pom. I find this problematic for the following reasons:
>>>>>> - Flume contains a ton of dependencies and even more transitive 
>>>>>> dependencies that are not declared. This makes creating new releases 
>>>>>> really hard given how many dependencies have to be checked and upgraded.
>>>>>> - As more modules are added the build is just going to get slower.
>>>>>> - Some modules have dependencies on things that are no longer supported. 
>>>>>> Again, that makes creating a full Flume release hard.
>>>>>> 
>>>>>> I would suggest that unless security fixes require it we hold off on 
>>>>>> creating upgrades in 1.10.0 for HBase and Hive beyond what you have 
>>>>>> already done. Instead, we should create new repositories for the parts 
>>>>>> of Flume we want to separate and maintain independently. The HBase and 
>>>>>> Hive upgrades would end up going there.
>>>>>> 
>>>>>> I believe this will speed up development since builds will no longer 
>>>>>> take so long. It also means that PRs will go against the target repo 
>>>>>> which should simplify things. Jira would remain the same as it is today. 
>>>>>> The component would be used to identify the target repo.
>>>>>> 
>>>>>> I would suggest that what should remain in the main Flume build would be 
>>>>>> primarily configuration, core, node, sdk, and some of configfilters. I 
>>>>>> would expect we would have separate repos for hbase, hdfs, hive, Kafka, 
>>>>>> embedded-agent, tools, and legacy to start.
>>>>>> 
>>>>>> Thoughts?
>>>>>> 
>>>>>> Ralph
>>>>> 
>>>>> 
>>>> 
>>
