[
https://issues.apache.org/jira/browse/FLUME-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971830#comment-14971830
]
Mark Grover commented on FLUME-2819:
------------------------------------
I am a committer on Apache Bigtop where we do packaging of the Hadoop ecosystem
components into a single coherent distribution.
I came across this JIRA and have gone through the related JIRA FLUME-2792.
Based on my experience working on Apache Bigtop, I thought I'd share my views
on this topic.
I don't think marking things as provided is the right thing to do. The right
thing to do, in my opinion, is to have Flume rely on a new enough version of
Apache Kafka that provides those features. If there is no such version today,
there is no option but to wait until Kafka releases such a version. If you
use 'provided', it means that by default, Flume-Kafka integration
wouldn't work out of the box. The norm usually is to use the default scope
(which is compile) and have the jars bundled in the classpath (and the
tarball). Flume is already doing that with its Hadoop and Hive
dependencies (see
[here|https://github.com/apache/flume/blob/trunk/flume-ng-sinks/flume-dataset-sink/pom.xml#L110]
for example).
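To make the difference concrete, here is a rough sketch of the two alternatives
for a Kafka dependency in a Flume module's pom.xml. The {{kafka_2.10}} artifact
and the {{kafka.version}} property are assumptions for illustration, not copied
from Flume's actual pom. Only one of the two declarations would appear in a
real pom.
{code:xml}
<!-- Alternative 1: 'provided' scope. The jar is on the compile classpath,
     but is typically left out of the packaged distribution, so the user
     has to supply it at runtime. -->
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka_2.10</artifactId>
  <version>${kafka.version}</version> <!-- placeholder property -->
  <scope>provided</scope>
</dependency>

<!-- Alternative 2: default scope. With no <scope> element Maven uses
     'compile', so the jar is available at runtime and can be bundled. -->
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka_2.10</artifactId>
  <version>${kafka.version}</version> <!-- placeholder property -->
</dependency>
{code}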
Now, you may see {{<optional>true</optional>}} there, but that has nothing to do
with the scope. The scope of that dependency is still compile; it is simply
marked optional. Folks usually use optional dependencies in Maven when the dependency
being included is 'too bulky'. That may be because, say the server and client
classes are all bundled in the same jar. And, so if someone (let's call this C)
is depending on your project (let's call this B), you don't want to clutter up
their classpath because of some transitive server dependencies from the
dependency your project is adding (let's call that A). The best way to
deal with that is a Maven optional dependency. When it comes to building your
project (B) alone, the optional tag has no impact; it's as if it doesn't exist.
However, when someone else depends on your project (i.e. project C depending on
B), the optional tag means that C doesn't transitively pull in the optional and
bulky dependency A. You can read up on optional dependencies
[here|http://maven.apache.org/guides/introduction/introduction-to-optional-and-excludes-dependencies.html].
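As a minimal sketch of the A/B/C situation described above (the {{com.example}}
coordinates and version numbers are made up for illustration):
{code:xml}
<!-- In B's pom.xml: A stays in the default (compile) scope, so B builds
     and runs against it as usual. The optional flag only affects
     projects that depend on B. -->
<dependency>
  <groupId>com.example</groupId>       <!-- hypothetical coordinates -->
  <artifactId>project-a</artifactId>
  <version>1.0</version>
  <optional>true</optional>
</dependency>

<!-- In C's pom.xml: depending on B does not bring in A transitively. -->
<dependency>
  <groupId>com.example</groupId>
  <artifactId>project-b</artifactId>
  <version>1.0</version>
</dependency>

<!-- If C does need the functionality backed by A, C declares A itself. -->
<dependency>
  <groupId>com.example</groupId>
  <artifactId>project-a</artifactId>
  <version>1.0</version>
</dependency>
{code}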
So, my recommendation in this case would be: if you are using a bulky Kafka
dependency (i.e. not a Kafka client jar), use the default (i.e. compile) scope
with the optional tag. If you are using a kafka-client dependency, simply use
the default (i.e. compile) scope.
> Kafka libs are being bundled into Flume distro
> ----------------------------------------------
>
> Key: FLUME-2819
> URL: https://issues.apache.org/jira/browse/FLUME-2819
> Project: Flume
> Issue Type: Bug
> Reporter: Roshan Naik
>
> Kafka dependency libs need to be marked as 'provided' in the pom.xml