[ https://issues.apache.org/jira/browse/FLUME-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971830#comment-14971830 ]

Mark Grover commented on FLUME-2819:
------------------------------------

I am a committer on Apache Bigtop where we do packaging of the Hadoop ecosystem 
components into a single coherent distribution.

I came across this JIRA and have gone through the related JIRA FLUME-2792. 
Based on my experience working on Apache Bigtop, I thought I'd share my views 
on this topic.

I don't think marking things as provided is the right thing to do. The right 
thing to do, in my opinion, is to have Flume depend on a version of Apache 
Kafka new enough to provide those features. If no such version exists today, 
there is no option but to wait until Kafka releases one. If you use 
'provided', the Kafka jars are not packaged with Flume, so by default the 
Flume-Kafka integration wouldn't work out of the box; users would have to 
supply the Kafka jars themselves. The norm is to use the default scope 
(compile) and have the jars bundled in the classpath (and the tarball). Flume 
already does this with its Hadoop and Hive dependencies (see 
[here|https://github.com/apache/flume/blob/trunk/flume-ng-sinks/flume-dataset-sink/pom.xml#L110]
 for an example).
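To make the difference concrete, here is a minimal sketch of the two alternatives for a Kafka dependency in a Flume module's pom.xml (pick one, not both). The {{kafka.version}} property is a hypothetical placeholder, not taken from Flume's build. With {{provided}} scope Maven compiles against Kafka, but packaging that includes only runtime-scope dependencies (the usual setup for a distribution tarball) leaves the jars out; with no {{<scope>}} element the default compile scope applies and the jars ship with Flume.

{code:xml}
<!-- Alternative 1 (argued against above): available at compile time,
     but not packaged, so users must supply the Kafka jars themselves. -->
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-clients</artifactId>
  <version>${kafka.version}</version> <!-- hypothetical property -->
  <scope>provided</scope>
</dependency>

<!-- Alternative 2 (recommended): no <scope> element means the default
     compile scope, so the jars end up on the classpath and in the tarball. -->
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-clients</artifactId>
  <version>${kafka.version}</version> <!-- hypothetical property -->
</dependency>
{code}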

Now, you may notice {{<optional>true</optional>}} there; that has nothing to do 
with the scope. The scope of that dependency is still compile, but it is marked 
optional. Folks usually use optional dependencies in Maven when the dependency 
being included is 'too bulky', for example because its server and client 
classes are all bundled in the same jar. Say your project (let's call it B) 
adds such a dependency (let's call it A), and someone else's project (let's 
call it C) depends on B. You don't want to clutter up C's classpath with the 
transitive server dependencies that A brings in, and the best way to deal with 
that is a Maven optional dependency. When building your project (B) alone, the 
optional tag has no impact; it's as if it doesn't exist. However, when someone 
else depends on your project (i.e. project C depending on B), the optional tag 
means that C doesn't transitively pull in the bulky dependency A. You can read 
up on optional dependencies 
[here|http://maven.apache.org/guides/introduction/introduction-to-optional-and-excludes-dependencies.html].
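A minimal sketch of the A/B/C scenario described above, using hypothetical coordinates ({{com.example:project-a}} and {{com.example:project-b}} are placeholders, not real artifacts). B marks A as optional; C depends on B and, if it actually needs A's functionality, has to declare A itself.

{code:xml}
<!-- In project B's pom.xml: A keeps the default compile scope, so B
     compiles and runs its tests exactly as it would without <optional>. -->
<dependency>
  <groupId>com.example</groupId>
  <artifactId>project-a</artifactId>
  <version>1.0</version>
  <optional>true</optional>
</dependency>

<!-- In project C's pom.xml: depending on B does NOT bring in A
     transitively. C declares A explicitly only if it really needs it. -->
<dependency>
  <groupId>com.example</groupId>
  <artifactId>project-b</artifactId>
  <version>1.0</version>
</dependency>
<dependency>
  <groupId>com.example</groupId>
  <artifactId>project-a</artifactId>
  <version>1.0</version>
</dependency>
{code}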

So, my recommendation in this case would be: if you are using the bulky Kafka 
dependency (i.e. not the kafka-clients jar), use the default (i.e. compile) 
scope with the optional tag. If you are using the kafka-clients dependency, 
simply use the default (i.e. compile) scope.
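Applied to Kafka, the first recommendation might look like the sketch below. The Scala-suffixed artifact {{kafka_2.10}} is only an example of the full (broker plus client) Kafka artifact, and {{kafka.version}} is a hypothetical property; neither is taken from Flume's actual pom.

{code:xml}
<!-- Full Kafka artifact: default compile scope, so it is still bundled
     with Flume, but optional so that projects depending on this Flume
     module do not inherit Kafka and its transitive dependencies. -->
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka_2.10</artifactId>
  <version>${kafka.version}</version> <!-- hypothetical property -->
  <optional>true</optional>
</dependency>
{code}

For the kafka-clients case, the declaration is the same with the {{<optional>}} element dropped.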

> Kafka libs are being bundled into Flume distro
> ----------------------------------------------
>
>                 Key: FLUME-2819
>                 URL: https://issues.apache.org/jira/browse/FLUME-2819
>             Project: Flume
>          Issue Type: Bug
>            Reporter: Roshan Naik
>
> Kafka dependency libs need to be marked as 'provided' in the pom.xml 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
