Yes, it's a reasonable argument that putting N more external integration
modules on the default spark-submit classpath might bring in more
third-party dependencies that clash. I think the convenience
factor isn't a big deal; users can also just declare a dependency on said
module in
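As a sketch, declaring that dependency explicitly in an application build (assuming sbt and the Spark 2.3.1 / Scala 2.11 artifacts this thread is about) might look like:

```scala
// build.sbt -- hypothetical app build that pulls in the Kafka
// integration module itself, rather than relying on the distro's
// default classpath to provide it.
libraryDependencies ++= Seq(
  // Core Spark SQL is on the spark-submit classpath already,
  // so it can be marked Provided.
  "org.apache.spark" %% "spark-sql"            % "2.3.1" % Provided,
  // The Kafka integration module is NOT in the distro's jars/,
  // so it must be shipped with (or fetched for) the app.
  "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.3.1"
)
```

The module and version names match the 2.3.1 release discussed here; any other build tool (Maven, Gradle) would express the same coordinates.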
I think that, traditionally, the reason *not* to include these has been that they
brought additional dependencies that users don’t really need, but that might
clash with what the users have in their own app. Maybe this used to be the case
for Kafka. We could analyze it and include it by default,
Hm OK I am crazy then. I think I never noticed it because I had always used
a distro that did actually supply this on the classpath.
Well ... I think it would be reasonable to include these things (at least,
Kafka integration) by default in the binary distro. I'll update the JIRA to
reflect that.
Hi Sean,
For years now, I'd say, you've had to specify --packages to get the
Kafka-related jars on the classpath. I simply got used to this annoyance
(as did others). Could it be that it's an external package (although an
integral part of Spark)?!
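The --packages workaround mentioned above looks roughly like this (the main class and application jar are hypothetical placeholders; the Kafka module coordinates match the 2.3.1 release under discussion):

```shell
# Fetch the Kafka integration module (and its transitive dependencies)
# from Maven Central at submit time, since the jars are not shipped
# in the binary distro's jars/ directory.
spark-submit \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.1 \
  --class com.example.MyStreamingApp \
  my-streaming-app.jar
```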
I'm very glad you've brought it up since I
Let's take this to https://issues.apache.org/jira/browse/SPARK-25026 -- I
provisionally marked this a Blocker, since if it's correct, the release
is missing an important piece and we'll want to remedy that ASAP. I still
have this feeling I am missing something. The classes really aren't there
in
I was debugging why a Kafka-based streaming app doesn't seem to find
Kafka-related integration classes when run standalone from our latest 2.3.1
release, and noticed that there don't seem to be any Kafka-related jars
from Spark in the distro. In jars/, I see:
spark-catalyst_2.11-2.3.1.jar
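A quick way to reproduce the check described above (assuming you're in the root of the unpacked 2.3.1 binary distro):

```shell
# List any Kafka-related jars shipped in the binary distro;
# per the report above, this comes back empty on 2.3.1.
ls jars/ | grep -i kafka
```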