[GitHub] [flume] tmgstevens commented on pull request #351: FLUME-3415 - Provide Docker image for Flume

GitBox Wed, 02 Nov 2022 04:02:11 -0700


tmgstevens commented on PR #351:
URL: https://github.com/apache/flume/pull/351#issuecomment-1300061047

So a couple of points in here:

> It packages all the artifacts whether they are required or not.

True - packaging is something that I think we need to think about going
forward. For people who want a lightweight deployment, should we offer
different profiles? The flip side is that if you're combining two components
from different modules (e.g. syslog and Kafka, HTTP and HDFS, etc.), do you
ever actually get the benefit of the modularity, or does it just ramp up your
complexity (complexity = adoption blocker in my mind)?

> Unless I am mistaken, it is getting the distribution tar and using the
> configuration located within that. I fail to see how useful that will be as I
> would expect most users would have a custom configuration.

It does bundle the default conf directory, but it is anticipated that a user
would re-map that or pass config in via environment variables (which could then
include secrets). Both approaches are designed to work in Docker and
Kubernetes. There's an example of doing that here:
https://github.com/apache/flume/blob/d2bd7812dbacd86459726c0fd3dc774272ce0222/flume-ng-tests/src/test/java/org/apache/flume/test/util/DockerInstall.java#L137-L153
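
As a rough illustration of the two options (re-mapping the conf directory, or
passing values in through the environment), here is a minimal Java sketch that
only assembles a `docker run` command line. It is not the code from
DockerInstall.java. The image name `example/flume:latest`, the container path
`/opt/flume/conf`, and the variable `KAFKA_PASSWORD` are hypothetical
placeholders, not values taken from this PR.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FlumeDockerRunSketch {

    // Builds the argument list for `docker run`, bind-mounting a host conf
    // directory read-only and forwarding environment variables with -e.
    // All paths and names are supplied by the caller; nothing here is
    // specific to the image built in this PR.
    static List<String> buildCommand(String hostConfDir, String containerConfDir,
                                     Map<String, String> env, String image) {
        List<String> cmd = new ArrayList<>();
        cmd.add("docker");
        cmd.add("run");
        cmd.add("--rm");
        cmd.add("-v");
        cmd.add(hostConfDir + ":" + containerConfDir + ":ro");
        for (Map.Entry<String, String> e : env.entrySet()) {
            cmd.add("-e");
            cmd.add(e.getKey() + "=" + e.getValue());
        }
        cmd.add(image);
        return cmd;
    }

    public static void main(String[] args) {
        Map<String, String> env = new LinkedHashMap<>();
        env.put("KAFKA_PASSWORD", "changeme"); // hypothetical secret name and value

        // Hypothetical host path, container path, and image name.
        List<String> cmd = buildCommand("/etc/flume/conf", "/opt/flume/conf",
                                        env, "example/flume:latest");
        System.out.println(String.join(" ", cmd));
        // To actually launch it: new ProcessBuilder(cmd).inheritIO().start();
    }
}
```

In Kubernetes the same split would typically be a ConfigMap mounted over the
conf directory plus Secret-backed environment variables, but that is my
assumption about how someone would wire it up rather than something this PR
ships.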

> I think starting from "everything goes in the bucket" to match our
> historic deployment shape is fine. I agree that eventually we need to get to a
> more modular approach that provides easy examples for folks building just the
> parts they need.

> There's some tooling around for easier container image building based on
> spring boot applications, e.g. Jib. Maybe we should take an approach that
> leverages that?

Personally I'd rather not rewrite the Docker deployment right now, given
that what's there works pretty well. We could look to move away from the
Spotify plugin to something else, but I don't want to re-architect the whole
packaging of Flume at the moment.

> To be clear, I use the FileChannel, which pretty much requires fast disk
> (i.e. SSDs or equivalent) to perform well. It also requires a dedicated "disk"
> so that data isn't lost on restart. This doesn't really work well with
> Docker/Kubernetes so we don't use it for Flume.

I actually think this would be fine - you can have persistent disks in
Kubernetes assigned to pods, and the same applies to Docker, where you can
mount a volume. Would I necessarily recommend that you rewrite your deployment
model to use containers? No. But, for example, in @busbey's world, where he
might be moving from a previously managed deployment to something that needs
additional orchestration, using Docker or Kubernetes and deploying agents
across many nodes, this could make things much easier to manage and maintain.
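
To make the FileChannel point concrete, here is a small Java sketch, under
stated assumptions, of how the channel's directories could be pointed at a
mounted volume. `type`, `checkpointDir` and `dataDirs` are the standard
FileChannel property names; the agent name `agent`, channel name `c1`, volume
name `flume-data`, and mount point `/var/lib/flume` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FileChannelVolumeSketch {

    // Returns the channel-specific properties for a FileChannel whose
    // checkpoint and data directories live under the given mount point.
    // The agent still has to list the channel in "<agent>.channels" and
    // wire it to sources and sinks; that part is omitted here.
    static Map<String, String> fileChannelProps(String agent, String channel,
                                                String mountPoint) {
        String prefix = agent + ".channels." + channel + ".";
        Map<String, String> props = new LinkedHashMap<>();
        props.put(prefix + "type", "file");
        props.put(prefix + "checkpointDir", mountPoint + "/checkpoint");
        props.put(prefix + "dataDirs", mountPoint + "/data");
        return props;
    }

    // Value for `docker run -v`, mapping a named volume to the mount point.
    static String volumeArg(String volumeName, String mountPoint) {
        return volumeName + ":" + mountPoint;
    }

    public static void main(String[] args) {
        String mountPoint = "/var/lib/flume"; // hypothetical path in the container
        fileChannelProps("agent", "c1", mountPoint)
            .forEach((k, v) -> System.out.println(k + " = " + v));
        System.out.println("docker run -v " + volumeArg("flume-data", mountPoint) + " ...");
    }
}
```

Whether the volume is fast enough for the FileChannel still depends on the
storage behind it (for example, an SSD-backed volume or storage class), so this
only addresses the "data survives a restart" half of the concern.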

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

