Yes, it's a reasonable argument that putting N more external integration
modules on the default spark-submit classpath might bring in more
third-party dependencies that clash or something. I think the convenience
factor isn't a big deal; users can also just declare a dependency on said
module in their own app, once. It does seem like we could at least *ship*
the binary bits in "external-jars/" or something; they're not even compiled
in the binary distro. Not shipping them also means users have to make sure
the version of spark-kafka they integrate works with their cluster, which
means not just making sure their app matches the user-facing API of
spark-kafka, but ensuring that the spark-kafka module's interface to Spark
works -- whatever internal details there may be there.
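
For reference, the workaround in question is just an ordinary dependency
declaration in the app's build, or the equivalent --packages flag at
submit time. A minimal sketch, assuming Spark 2.3.1 built for Scala 2.11
(adjust the version and Scala suffix to match your cluster):

```scala
// build.sbt -- pull the Kafka data source in as a regular app dependency.
// Coordinates assume Spark 2.3.1 / Scala 2.11; the %% operator appends
// the Scala binary version suffix (_2.11) automatically.
libraryDependencies += "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.3.1"
```

Or equivalently at launch time, which resolves the jar and its transitive
dependencies on the fly:

  spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.1 ...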

On Sat, Aug 4, 2018 at 9:15 PM Matei Zaharia <matei.zaha...@gmail.com>
wrote:

> I think that traditionally, the reason *not* to include these has been if
> they brought additional dependencies that users don’t really need, but that
> might clash with what the users have in their own app. Maybe this used to
> be the case for Kafka. We could analyze it and include it by default, or
> perhaps make it easier to add it in spark-submit and spark-shell. I feel
> that in an IDE, it won’t be a huge problem because you just add it once,
> but it is annoying for spark-submit.
>
> Matei
>
> > On Aug 4, 2018, at 2:19 PM, Sean Owen <sro...@gmail.com> wrote:
> >
> > Hm OK I am crazy then. I think I never noticed it because I had always
> used a distro that did actually supply this on the classpath.
> > Well ... I think it would be reasonable to include these things (at
> least, Kafka integration) by default in the binary distro. I'll update the
> JIRA to reflect that this is at best a Wish.
> >
> > On Sat, Aug 4, 2018 at 4:17 PM Jacek Laskowski <ja...@japila.pl> wrote:
> > Hi Sean,
> >
> > For years, I'd say, you've had to specify --packages to get the
> > Kafka-related jars on the classpath. I simply got used to this
> > annoyance (as did others). Could it be that it's treated as an external
> > package (although an integral part of Spark)?!
> >
> > I'm very glad you've brought it up since I think Kafka data source is so
> important that it should be included in spark-shell and spark-submit by
> default. THANKS!
> >
> > Pozdrawiam,
> > Jacek Laskowski
> > ----
> > https://about.me/JacekLaskowski
> > Mastering Spark SQL https://bit.ly/mastering-spark-sql
> > Spark Structured Streaming https://bit.ly/spark-structured-streaming
> > Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
> > Follow me at https://twitter.com/jaceklaskowski
> >
> > On Sat, Aug 4, 2018 at 9:56 PM, Sean Owen <sro...@gmail.com> wrote:
> > Let's take this to https://issues.apache.org/jira/browse/SPARK-25026 --
> I provisionally marked this a Blocker, as if it's correct, then the release
> is missing an important piece and we'll want to remedy that ASAP. I still
> have this feeling I am missing something. The classes really aren't there
> in the release but ... *nobody* noticed all this time? I guess maybe
> Spark-Kafka users may be using a vendor distro that does package these bits.
> >
> >
> > On Sat, Aug 4, 2018 at 10:48 AM Sean Owen <sro...@gmail.com> wrote:
> > I was debugging why a Kafka-based streaming app doesn't seem to find
> Kafka-related integration classes when run standalone from our latest 2.3.1
> release, and noticed that there doesn't seem to be any Kafka-related jars
> from Spark in the distro. In jars/, I see:
> >
> > spark-catalyst_2.11-2.3.1.jar
> > spark-core_2.11-2.3.1.jar
> > spark-graphx_2.11-2.3.1.jar
> > spark-hive-thriftserver_2.11-2.3.1.jar
> > spark-hive_2.11-2.3.1.jar
> > spark-kubernetes_2.11-2.3.1.jar
> > spark-kvstore_2.11-2.3.1.jar
> > spark-launcher_2.11-2.3.1.jar
> > spark-mesos_2.11-2.3.1.jar
> > spark-mllib-local_2.11-2.3.1.jar
> > spark-mllib_2.11-2.3.1.jar
> > spark-network-common_2.11-2.3.1.jar
> > spark-network-shuffle_2.11-2.3.1.jar
> > spark-repl_2.11-2.3.1.jar
> > spark-sketch_2.11-2.3.1.jar
> > spark-sql_2.11-2.3.1.jar
> > spark-streaming_2.11-2.3.1.jar
> > spark-tags_2.11-2.3.1.jar
> > spark-unsafe_2.11-2.3.1.jar
> > spark-yarn_2.11-2.3.1.jar
> >
> > I checked make-distribution.sh, and it copies a bunch of JARs into the
> distro, but does not seem to touch the kafka modules.
> >
> > Am I crazy or missing something obvious -- those should be in the
> release, right?
> >
>
>
