Re: kafka structured streaming source refuses to read

2017-01-30 Thread Michael Armbrust
Thanks for following up! I've linked the relevant tickets to SPARK-18057 and I targeted it for Spark 2.2. On Sat, Jan 28, 2017 at 10:15 AM, Koert Kuipers wrote: > there was also already an existing spark ticket for

Re: kafka structured streaming source refuses to read

2017-01-28 Thread Koert Kuipers
there was also already an existing spark ticket for this: SPARK-18779 On Sat, Jan 28, 2017 at 1:13 PM, Koert Kuipers wrote: > it seems the bug is: > https://issues.apache.org/jira/browse/KAFKA-4547 > > i would advise

Re: kafka structured streaming source refuses to read

2017-01-28 Thread Koert Kuipers
it seems the bug is: https://issues.apache.org/jira/browse/KAFKA-4547 i would advise everyone not to use kafka-clients 0.10.0.2, 0.10.1.0 or 0.10.1.1 On Fri, Jan 27, 2017 at 3:56 PM, Koert Kuipers wrote: > in case anyone else runs into this: > > the issue is that i was using

Re: kafka structured streaming source refuses to read

2017-01-27 Thread Michael Armbrust
Yeah, Kafka server/client compatibility can be pretty confusing and does not give good errors in the case of mismatches. This should be addressed in the next release of Kafka (they are adding an API to query the server's capabilities). On Fri, Jan 27, 2017 at 12:56 PM, Koert Kuipers

Re: kafka structured streaming source refuses to read

2017-01-27 Thread Koert Kuipers
in case anyone else runs into this: the issue is that i was using kafka-clients 0.10.1.1. it works when i use kafka-clients 0.10.0.1 with spark structured streaming. my kafka server is 0.10.1.1. On Fri, Jan 27, 2017 at 1:24 PM, Koert Kuipers wrote: > i checked my topic. it
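
[Editor's sketch] One way to apply the workaround described in this message is to pin kafka-clients in the build. The fragment below is a minimal sbt sketch, not the poster's actual build file: the Spark version and the use of `dependencyOverrides` are assumptions, and only the kafka-clients version 0.10.0.1 comes from the thread.

```scala
// build.sbt (sketch). Only the kafka-clients version is taken from the thread.
// Assumption: the thread does not state which Spark version was in use.
val sparkVersion = "2.1.0"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"            % sparkVersion,
  "org.apache.spark" %% "spark-sql-kafka-0-10" % sparkVersion
)

// Force the client version reported to work, even if another
// dependency transitively requests a newer kafka-clients.
dependencyOverrides += "org.apache.kafka" % "kafka-clients" % "0.10.0.1"
```

Running `sbt evicted` afterwards should show which kafka-clients version was actually selected.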

Re: kafka structured streaming source refuses to read

2017-01-27 Thread Shixiong(Ryan) Zhu
Thanks for reporting this. Which Spark version are you using? Could you provide the full log, please? On Fri, Jan 27, 2017 at 10:24 AM, Koert Kuipers wrote: > i checked my topic. it has 5 partitions but all the data is written to a > single partition: wikipedia-2 > i turned

Re: kafka structured streaming source refuses to read

2017-01-27 Thread Koert Kuipers
i checked my topic. it has 5 partitions but all the data is written to a single partition: wikipedia-2 i turned on debug logging and i see this: 2017-01-27 13:02:50 DEBUG kafka010.KafkaSource: Partitions assigned to consumer: [wikipedia-0, wikipedia-4, wikipedia-3, wikipedia-2, wikipedia-1].

Re: kafka structured streaming source refuses to read

2017-01-27 Thread Koert Kuipers
code:

    val query = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "somenode:9092")
      .option("subscribe", "wikipedia")
      .load
      .select(col("value") cast StringType)
      .writeStream
      .format("console")

Re: kafka structured streaming source refuses to read

2017-01-26 Thread Koert Kuipers
my little program prints out query.lastProgress every 10 seconds, and this is what it shows: { "id" : "4cc09da1-c002-4fa7-86dd-ba03018e53a0", "runId" : "bf8cdde2-44d0-4bff-ad90-7cbac0432099", "name" : "wiki", "timestamp" : "2017-01-26T22:54:45.732Z", "numInputRows" : 0,
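
[Editor's sketch] For readers who want to reproduce the setup, a program that prints `query.lastProgress` every 10 seconds might look roughly like the following. This is not the poster's code: the object name, the `local[*]` master, the app name, and the polling loop are assumptions. The broker address, the topic, and the query name "wiki" are taken from the messages in this thread.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.StringType

// Hypothetical object name; the original program is not shown in full.
object WikiConsole {
  def main(args: Array[String]): Unit = {
    // Assumption: local mode with all cores, as the thread only says "spark local mode".
    val spark = SparkSession.builder
      .master("local[*]")
      .appName("wiki-console")
      .getOrCreate()

    val query = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "somenode:9092")
      .option("subscribe", "wikipedia")
      .load()
      .select(col("value").cast(StringType))
      .writeStream
      .format("console")
      .queryName("wiki") // matches the "name" field in the progress output
      .start()

    // Poll the most recent progress report; it is null until the first trigger completes.
    while (query.isActive) {
      println(query.lastProgress)
      Thread.sleep(10000)
    }
  }
}
```

A `numInputRows` of 0 in every report, as in the output quoted above, means no batch has read any records from the topic.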

kafka structured streaming source refuses to read

2017-01-26 Thread Koert Kuipers
hey, i am just getting started with kafka + spark structured streaming. so this is probably a pretty dumb mistake. i wrote a little program in spark to read messages from a kafka topic and display them in the console, using the kafka source and console sink. i run it in spark local mode. i