Re: Am I crazy, or does the binary distro not have Kafka integration?

2018-08-05 Thread Sean Owen
Yes, it's a reasonable argument that putting N more external integration modules on the default spark-submit classpath might bring in more third-party dependencies that clash with the user's own. I think the convenience factor isn't a big deal; users can also just declare a dependency on said module in

Re: Am I crazy, or does the binary distro not have Kafka integration?

2018-08-04 Thread Matei Zaharia
I think that traditionally, the reason *not* to include these has been if they brought additional dependencies that users don’t really need, but that might clash with what the users have in their own app. Maybe this used to be the case for Kafka. We could analyze it and include it by default,

Re: Am I crazy, or does the binary distro not have Kafka integration?

2018-08-04 Thread Sean Owen
Hm OK I am crazy then. I think I never noticed it because I had always used a distro that did actually supply this on the classpath. Well ... I think it would be reasonable to include these things (at least, Kafka integration) by default in the binary distro. I'll update the JIRA to reflect that

Re: Am I crazy, or does the binary distro not have Kafka integration?

2018-08-04 Thread Jacek Laskowski
Hi Sean, For years now, I'd say, you have had to specify --packages to get the Kafka-related jars on the classpath. I simply got used to this annoyance (as did others). Could it be because it's an external package (although an integral part of Spark)?! I'm very glad you've brought it up since I
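
For readers unfamiliar with the --packages workaround mentioned above, here is a minimal sketch of how the Maven coordinate and the spark-submit invocation could be put together. It assumes the Structured Streaming Kafka module (spark-sql-kafka-0-10) built for Scala 2.11 and Spark 2.3.1, matching the release discussed in this thread; the application jar name is hypothetical, and the helper functions are illustrative, not part of Spark.

```python
# Sketch only: builds the argument list for spark-submit with --packages.
# The module name, Scala version and Spark version are assumptions taken
# from the 2.3.1 / Scala 2.11 context of this thread.

def kafka_package(module="spark-sql-kafka-0-10",
                  scala_version="2.11",
                  spark_version="2.3.1"):
    """Return a Maven coordinate of the form groupId:artifactId:version."""
    return "org.apache.spark:{}_{}:{}".format(module, scala_version, spark_version)


def spark_submit_args(app, packages):
    """Return the spark-submit command as a list of arguments.

    --packages takes a comma-separated list of Maven coordinates.
    """
    return ["spark-submit", "--packages", ",".join(packages), app]


if __name__ == "__main__":
    # "my-streaming-app.jar" is a hypothetical application jar.
    args = spark_submit_args("my-streaming-app.jar", [kafka_package()])
    print(" ".join(args))
```

For the older DStream-based API, the same helper could be called with module="spark-streaming-kafka-0-10" instead.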

Re: Am I crazy, or does the binary distro not have Kafka integration?

2018-08-04 Thread Sean Owen
Let's take this to https://issues.apache.org/jira/browse/SPARK-25026 -- I provisionally marked this a Blocker, as if it's correct, then the release is missing an important piece and we'll want to remedy that ASAP. I still have this feeling I am missing something. The classes really aren't there in

Am I crazy, or does the binary distro not have Kafka integration?

2018-08-04 Thread Sean Owen
I was debugging why a Kafka-based streaming app doesn't seem to find Kafka-related integration classes when run standalone from our latest 2.3.1 release, and noticed that there don't seem to be any Kafka-related jars from Spark in the distro. In jars/, I see: spark-catalyst_2.11-2.3.1.jar
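
The check described above (looking in jars/ for anything Kafka-related) can be reproduced with a short script. This is a rough sketch that assumes the distribution is unpacked under a directory given by SPARK_HOME and that Kafka integration jars would have "kafka" in their file names; both function names are hypothetical helpers, not Spark tooling.

```python
import os


def filter_kafka_jars(names):
    """Return the sorted .jar file names that mention 'kafka' (case-insensitive)."""
    return sorted(
        n for n in names
        if n.lower().endswith(".jar") and "kafka" in n.lower()
    )


def kafka_jars_in(jars_dir):
    """List Kafka-related jars in a directory such as $SPARK_HOME/jars."""
    return filter_kafka_jars(os.listdir(jars_dir))


if __name__ == "__main__":
    # Assumes SPARK_HOME points at an unpacked binary distribution.
    spark_home = os.environ.get("SPARK_HOME", ".")
    jars_dir = os.path.join(spark_home, "jars")
    if os.path.isdir(jars_dir):
        found = kafka_jars_in(jars_dir)
        print(found if found else "no Kafka-related jars found in " + jars_dir)
    else:
        print("no jars/ directory under " + spark_home)
```

An empty result would be consistent with the observation in this message; it does not by itself say whether the omission is intentional.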