Re: Am I crazy, or does the binary distro not have Kafka integration?

2018-08-04 Thread Matei Zaharia
I think that traditionally, the reason *not* to include these has been if they 
brought additional dependencies that users don’t really need, but that might 
clash with what the users have in their own app. Maybe this used to be the case 
for Kafka. We could analyze it and include it by default, or perhaps make it 
easier to add it in spark-submit and spark-shell. I feel that in an IDE, it 
won’t be a huge problem because you just add it once, but it is annoying for 
spark-submit.
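The workaround Matei describes looks like this — a sketch assuming the Maven coordinates for the Structured Streaming Kafka source for a Spark 2.3.1 / Scala 2.11 build (adjust the version and Scala suffix to match yours):

```shell
# Pull the Kafka integration from Maven Central at launch time instead of
# relying on it being present in the distro's jars/ directory.
# Coordinates shown assume Spark 2.3.1 built against Scala 2.11.
spark-submit \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.1 \
  my-streaming-app.jar

# The same flag works for interactive sessions:
spark-shell --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.1
```

This is the per-invocation annoyance being discussed: the flag must be repeated for every spark-submit or spark-shell run, whereas in an IDE the dependency is declared once in the build file.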

Matei

> On Aug 4, 2018, at 2:19 PM, Sean Owen  wrote:
> 
> Hm OK I am crazy then. I think I never noticed it because I had always used a 
> distro that did actually supply this on the classpath.
> Well ... I think it would be reasonable to include these things (at least, 
> Kafka integration) by default in the binary distro. I'll update the JIRA to 
> reflect that this is at best a Wish.
> 
> On Sat, Aug 4, 2018 at 4:17 PM Jacek Laskowski  wrote:
> Hi Sean,
> 
> For years now, I'd say, you've had to specify --packages to get the 
> Kafka-related jars on the classpath. I simply got used to this annoyance (as 
> did others). Could it be that it's an external package (although an integral 
> part of Spark)?!
> 
> I'm very glad you've brought it up since I think Kafka data source is so 
> important that it should be included in spark-shell and spark-submit by 
> default. THANKS!
> 
> Pozdrawiam,
> Jacek Laskowski
> 
> https://about.me/JacekLaskowski
> Mastering Spark SQL https://bit.ly/mastering-spark-sql
> Spark Structured Streaming https://bit.ly/spark-structured-streaming
> Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
> Follow me at https://twitter.com/jaceklaskowski
> 
> On Sat, Aug 4, 2018 at 9:56 PM, Sean Owen  wrote:
> Let's take this to https://issues.apache.org/jira/browse/SPARK-25026 -- I 
> provisionally marked this a Blocker, since, if it's correct, the release is 
> missing an important piece and we'll want to remedy that ASAP. I still have 
> this feeling I am missing something. The classes really aren't there in the 
> release but ... *nobody* noticed all this time? I guess maybe Spark-Kafka 
> users may be using a vendor distro that does package these bits.
> 
> 
> On Sat, Aug 4, 2018 at 10:48 AM Sean Owen  wrote:
> I was debugging why a Kafka-based streaming app doesn't seem to find 
> Kafka-related integration classes when run standalone from our latest 2.3.1 
> release, and noticed that there doesn't seem to be any Kafka-related jars 
> from Spark in the distro. In jars/, I see:
> 
> spark-catalyst_2.11-2.3.1.jar
> spark-core_2.11-2.3.1.jar
> spark-graphx_2.11-2.3.1.jar
> spark-hive-thriftserver_2.11-2.3.1.jar
> spark-hive_2.11-2.3.1.jar
> spark-kubernetes_2.11-2.3.1.jar
> spark-kvstore_2.11-2.3.1.jar
> spark-launcher_2.11-2.3.1.jar
> spark-mesos_2.11-2.3.1.jar
> spark-mllib-local_2.11-2.3.1.jar
> spark-mllib_2.11-2.3.1.jar
> spark-network-common_2.11-2.3.1.jar
> spark-network-shuffle_2.11-2.3.1.jar
> spark-repl_2.11-2.3.1.jar
> spark-sketch_2.11-2.3.1.jar
> spark-sql_2.11-2.3.1.jar
> spark-streaming_2.11-2.3.1.jar
> spark-tags_2.11-2.3.1.jar
> spark-unsafe_2.11-2.3.1.jar
> spark-yarn_2.11-2.3.1.jar
> 
> I checked make-distribution.sh, and it copies a bunch of JARs into the 
> distro, but does not seem to touch the kafka modules.
> 
> Am I crazy or missing something obvious -- those should be in the release, 
> right?
> 
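Sean's check of the distro can be reproduced with a small script along these lines (the helper is purely illustrative, not part of Spark; point it at the jars/ directory of an unpacked binary release):

```python
from pathlib import Path

def kafka_jars(jars_dir):
    """Return the names of any jars in jars_dir that mention Kafka."""
    return sorted(p.name for p in Path(jars_dir).glob("*.jar")
                  if "kafka" in p.name.lower())

# Against the jar listing quoted above, this returns an empty list:
# none of the spark-*_2.11-2.3.1.jar files bundled in the 2.3.1
# binary distro are Kafka-related.
```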




