I think Java users will generally know to look for an assembly JAR,
but that is much less obvious outside the Java ecosystem. An assembly
is the artifact itself plus all of its transitive dependencies.

No, there is nothing wrong with Kafka or its dependency chain. You
just have to supply everything it needs at runtime.

The only pieces of Spark you commonly need to bring with you are the
third-party streaming dependencies, and I agree that the docs should
tell Python users to attach the assembly JAR. Java/Scala users would,
I think, be better served by building this into their app, since they
would already be making an assembly JAR.
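
For anyone unsure which kind of JAR they have, here is a minimal
sketch of the same check Ted ran with `jar tvf | grep`, done from
Python. It relies only on the fact that a JAR is a zip archive. The
function name `jar_contains` is my own and not part of Spark or
Kafka.

```python
import sys
import zipfile


def jar_contains(jar_path, class_name):
    """Return True if the JAR at jar_path bundles the given class.

    class_name is a fully qualified name such as
    "com.yammer.metrics.core.Gauge". A JAR is a zip archive, so the
    class is stored under a path like com/yammer/metrics/core/Gauge.class.
    """
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_jar.py <jar> <class>
    if len(sys.argv) == 3:
        found = jar_contains(sys.argv[1], sys.argv[2])
        print("bundled" if found else "not bundled")
```

Run against the assembly JAR with com.yammer.metrics.core.Gauge, it
should report the class as bundled, consistent with the `jar tvf`
output quoted below. If a JAR does not bundle it, that dependency has
to come from somewhere else at runtime.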

On Tue, May 12, 2015 at 5:39 PM, Lee McFadden <splee...@gmail.com> wrote:
> Thanks again for all the help folks.
>
> I can confirm that simply switching to `--packages
> org.apache.spark:spark-streaming-kafka-assembly_2.10:1.3.1` makes everything
> work as intended.
>
> I'm not sure what the difference is between the two packages honestly, or
> why one should be used over the other, but the documentation is currently
> not intuitive in this matter.  If you follow the instructions, initially it
> will seem broken.  Is there any reason why the docs for Python users (or, in
> fact, all users - Java/Scala users will run into this too except they are
> armed with the ability to build their own jar with the dependencies
> included) should not be changed to using the assembly package by default?
>
> Additionally, after a few google searches yesterday combined with your help
> I'm wondering if the core issue is upstream in Kafka's dependency chain?
>
> On Tue, May 12, 2015 at 8:53 AM Ted Yu <yuzhih...@gmail.com> wrote:
>>
>> bq. it is already in the assembly
>>
>> Yes. Verified:
>>
>> $ jar tvf ~/Downloads/spark-streaming-kafka-assembly_2.10-1.3.1.jar | grep yammer | grep Gauge
>>   1329 Sat Apr 11 04:25:50 PDT 2015 com/yammer/metrics/core/Gauge.class
>>
>>
>> On Tue, May 12, 2015 at 8:05 AM, Sean Owen <so...@cloudera.com> wrote:
>>>
>>> It doesn't depend directly on yammer metrics; Kafka does. It wouldn't
>>> be correct to declare that it does; it is already in the assembly
>>> anyway.
>>>
>>> On Tue, May 12, 2015 at 3:50 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>> > Currently external/kafka/pom.xml doesn't list yammer metrics as a
>>> > dependency.
>>> >
>>> > $ ls -l ~/.m2/repository/com/yammer/metrics/metrics-core/2.2.0/metrics-core-2.2.0.jar
>>> > -rw-r--r--  1 tyu  staff  82123 Dec 17  2013 /Users/tyu/.m2/repository/com/yammer/metrics/metrics-core/2.2.0/metrics-core-2.2.0.jar
>>> >
>>> > Including the metrics-core jar would not increase the size of the final
>>> > release artifact much.
>>> >
>>> > My two cents.
>>
>>
>

