I think that traditionally, the reason *not* to include these has been that they 
bring additional dependencies that users don't really need, but that might 
clash with what the users already have in their own app. Maybe this used to be 
the case for Kafka. We could analyze it and include it by default, or perhaps 
make it easier to add in spark-submit and spark-shell. I feel that in an IDE it 
won't be a huge problem, because you just add it once, but it is annoying for 
spark-submit.
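For reference, the current workaround being discussed is pulling the integration in at launch time with --packages. A minimal sketch (the coordinates are the Scala 2.11 / Spark 2.3.1 artifact on Maven Central; the application jar name is a placeholder):

```shell
# Fetch the Kafka integration and its dependencies at launch time,
# since the jar is not shipped in the binary distro's jars/ directory.
spark-submit \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.1 \
  my-streaming-app.jar   # placeholder application jar
```

The same --packages flag works with spark-shell, which is the per-session annoyance the thread is about.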

Matei

> On Aug 4, 2018, at 2:19 PM, Sean Owen <sro...@gmail.com> wrote:
> 
> Hm, OK, I am crazy then. I think I never noticed it because I had always used a 
> distro that did actually supply this on the classpath.
> Well ... I think it would be reasonable to include these things (at least, 
> Kafka integration) by default in the binary distro. I'll update the JIRA to 
> reflect that this is at best a Wish.
> 
> On Sat, Aug 4, 2018 at 4:17 PM Jacek Laskowski <ja...@japila.pl> wrote:
> Hi Sean,
> 
> For years, I'd say, you've had to specify --packages to get the Kafka-related 
> jars on the classpath. I simply got used to this annoyance (as did others). 
> Could it be that it's an external package (although an integral part of 
> Spark)?!
> 
> I'm very glad you've brought it up since I think Kafka data source is so 
> important that it should be included in spark-shell and spark-submit by 
> default. THANKS!
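For context, the data source in question backs reads like the following minimal sketch (broker address and topic name are placeholders). Without spark-sql-kafka-0-10 on the classpath, the load() fails with "Failed to find data source: kafka":

```scala
import org.apache.spark.sql.SparkSession

object KafkaSourceSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("kafka-sketch").getOrCreate()

    // Structured Streaming read from Kafka; requires the
    // spark-sql-kafka-0-10 jar, which the binary distro does not ship.
    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // placeholder broker
      .option("subscribe", "events")                       // placeholder topic
      .load()

    // Kafka records arrive as binary key/value columns.
    df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
      .writeStream
      .format("console")
      .start()
      .awaitTermination()
  }
}
```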
> 
> Pozdrawiam,
> Jacek Laskowski
> ----
> https://about.me/JacekLaskowski
> Mastering Spark SQL https://bit.ly/mastering-spark-sql
> Spark Structured Streaming https://bit.ly/spark-structured-streaming
> Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
> Follow me at https://twitter.com/jaceklaskowski
> 
> On Sat, Aug 4, 2018 at 9:56 PM, Sean Owen <sro...@gmail.com> wrote:
> Let's take this to https://issues.apache.org/jira/browse/SPARK-25026 -- I 
> provisionally marked this a Blocker because, if it's correct, the release is 
> missing an important piece and we'll want to remedy that ASAP. I still have 
> this feeling I am missing something. The classes really aren't there in the 
> release, but ... *nobody* noticed all this time? I guess maybe Spark-Kafka 
> users may be using a vendor distro that does package these bits.
> 
> 
> On Sat, Aug 4, 2018 at 10:48 AM Sean Owen <sro...@gmail.com> wrote:
> I was debugging why a Kafka-based streaming app doesn't seem to find 
> Kafka-related integration classes when run standalone from our latest 2.3.1 
> release, and noticed that there doesn't seem to be any Kafka-related jars 
> from Spark in the distro. In jars/, I see:
> 
> spark-catalyst_2.11-2.3.1.jar
> spark-core_2.11-2.3.1.jar
> spark-graphx_2.11-2.3.1.jar
> spark-hive-thriftserver_2.11-2.3.1.jar
> spark-hive_2.11-2.3.1.jar
> spark-kubernetes_2.11-2.3.1.jar
> spark-kvstore_2.11-2.3.1.jar
> spark-launcher_2.11-2.3.1.jar
> spark-mesos_2.11-2.3.1.jar
> spark-mllib-local_2.11-2.3.1.jar
> spark-mllib_2.11-2.3.1.jar
> spark-network-common_2.11-2.3.1.jar
> spark-network-shuffle_2.11-2.3.1.jar
> spark-repl_2.11-2.3.1.jar
> spark-sketch_2.11-2.3.1.jar
> spark-sql_2.11-2.3.1.jar
> spark-streaming_2.11-2.3.1.jar
> spark-tags_2.11-2.3.1.jar
> spark-unsafe_2.11-2.3.1.jar
> spark-yarn_2.11-2.3.1.jar
> 
> I checked make-distribution.sh, and it copies a bunch of JARs into the 
> distro, but it does not seem to touch the Kafka modules.
> 
> Am I crazy or missing something obvious -- those should be in the release, 
> right?
> 

