It would be good if you could tell me what I should add to the documentation to make it easier to understand. I can update the docs for the 1.4.0 release.

On Tue, May 12, 2015 at 9:52 AM, Lee McFadden <splee...@gmail.com> wrote:

> Thanks for explaining Sean and Cody, this makes sense now. I'd like to
> help improve this documentation so other python users don't run into the
> same thing, so I'll look into that today.
>
> On Tue, May 12, 2015 at 9:44 AM Cody Koeninger <c...@koeninger.org> wrote:
>
>> One of the packages just contains the streaming-kafka code. The other
>> contains that code, plus everything it depends on. That's what "assembly"
>> typically means in JVM land.
>>
>> Java/Scala users are accustomed to using their own build tool to include
>> necessary dependencies. JVM dependency management is (thankfully)
>> different from Python dependency management.
>>
>> As far as I can tell, there is no core issue, upstream or otherwise.
>>
>> On Tue, May 12, 2015 at 11:39 AM, Lee McFadden <splee...@gmail.com>
>> wrote:
>>
>>> Thanks again for all the help folks.
>>>
>>> I can confirm that simply switching to `--packages
>>> org.apache.spark:spark-streaming-kafka-assembly_2.10:1.3.1` makes
>>> everything work as intended.
>>>
>>> I'm not sure what the difference is between the two packages honestly,
>>> or why one should be used over the other, but the documentation is
>>> currently not intuitive in this matter. If you follow the instructions,
>>> initially it will seem broken. Is there any reason why the docs for Python
>>> users (or, in fact, all users - Java/Scala users will run into this too
>>> except they are armed with the ability to build their own jar with the
>>> dependencies included) should not be changed to using the assembly package
>>> by default?
>>>
>>> Additionally, after a few google searches yesterday combined with your
>>> help I'm wondering if the core issue is upstream in Kafka's dependency
>>> chain?
>>>
>>> On Tue, May 12, 2015 at 8:53 AM Ted Yu <yuzhih...@gmail.com> wrote:
>>>
>>>> bq. it is already in the assembly
>>>>
>>>> Yes. Verified:
>>>>
>>>> $ jar tvf ~/Downloads/spark-streaming-kafka-assembly_2.10-1.3.1.jar | grep yammer | grep Gauge
>>>> 1329 Sat Apr 11 04:25:50 PDT 2015 com/yammer/metrics/core/Gauge.class
>>>>
>>>> On Tue, May 12, 2015 at 8:05 AM, Sean Owen <so...@cloudera.com> wrote:
>>>>
>>>>> It doesn't depend directly on yammer metrics; Kafka does. It wouldn't
>>>>> be correct to declare that it does; it is already in the assembly
>>>>> anyway.
>>>>>
>>>>> On Tue, May 12, 2015 at 3:50 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>> > Currently external/kafka/pom.xml doesn't cite yammer metrics as
>>>>> > dependency.
>>>>> >
>>>>> > $ ls -l ~/.m2/repository/com/yammer/metrics/metrics-core/2.2.0/metrics-core-2.2.0.jar
>>>>> > -rw-r--r-- 1 tyu staff 82123 Dec 17 2013 /Users/tyu/.m2/repository/com/yammer/metrics/metrics-core/2.2.0/metrics-core-2.2.0.jar
>>>>> >
>>>>> > Including the metrics-core jar would not increase the size of the
>>>>> > final release artifact much.
>>>>> >
>>>>> > My two cents.
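
For Python users who want to repeat the `jar tvf ... | grep` check quoted above without the JDK's `jar` tool, here is a minimal sketch. It relies only on the fact that a jar is a zip archive. The helper name `jar_contains` is hypothetical (it is not part of Spark), and the jar path in the usage comment is the one from the thread, assumed to have been downloaded locally.

```python
import zipfile


def jar_contains(jar_path, suffix):
    """Return the entries in the jar whose names end with `suffix`.

    Hypothetical helper for illustration; a jar is a zip archive, so the
    standard-library zipfile module can list its contents.
    """
    with zipfile.ZipFile(jar_path) as jar:
        return [name for name in jar.namelist() if name.endswith(suffix)]


# Example usage (path assumed; adjust to wherever the jar was downloaded):
# jar_contains("spark-streaming-kafka-assembly_2.10-1.3.1.jar",
#              "com/yammer/metrics/core/Gauge.class")
```

An empty list means the class is not bundled in that jar, which is what one would expect for the non-assembly package, since it carries only the streaming-kafka code and not its dependencies.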