Re: what is the difference between json format vs kafka format?

2017-05-15 Thread Michael Armbrust
For that simple count, you don't even have to parse the JSON data; you can just do a count. The following code assumes you are running Spark 2.2.

Re: what is the difference between json format vs kafka format?

2017-05-13 Thread kant kodali
Hi, Here is a little bit of background. I've been using the stateless streaming APIs, such as JavaDStream, for a while, and they worked well. It has come to a point where we need to do real-time stateful streaming based on event time and other things, but for now I am just trying to get

Re: what is the difference between json format vs kafka format?

2017-05-13 Thread Tathagata Das
You can't do ".count()" directly on streaming DataFrames. This is because "count" is an action (remember RDD actions) that executes and returns a result immediately, which can be done only when the data is bounded (e.g. batch/interactive queries). For streaming queries, you have to let it run in the
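As a rough illustration of the point about actions versus streaming queries, here is a minimal Scala sketch of a running count over a Kafka stream. It is not the code from this message (the snippet is cut off in the archive). The broker address "localhost:9092", the topic name "events", and the object name are hypothetical, and the sketch assumes the spark-sql-kafka-0-10 package is on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object StreamingCountSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("streaming-count-sketch").getOrCreate()

    // Unbounded input: each Kafka record becomes a row (key, value, topic, ...).
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // hypothetical broker
      .option("subscribe", "events")                       // hypothetical topic
      .load()

    // groupBy().count() is a transformation that yields a streaming DataFrame,
    // unlike the count() action, which is not allowed on a stream.
    val counts = raw.groupBy().count()

    // The query only runs once it is started; "complete" mode re-emits the
    // full aggregate after every trigger.
    val query = counts.writeStream
      .outputMode("complete")
      .format("console")
      .start()

    query.awaitTermination()
  }
}
```

Note that no JSON parsing is needed here, since the count is taken over the raw Kafka records.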

Re: what is the difference between json format vs kafka format?

2017-05-13 Thread kant kodali
Hi! Thanks for the response. It looks like from_json requires a schema ahead of time. Is there any function I can use to infer the schema from the JSON messages I am receiving through Kafka? I tried the code below; however, I get the following exception. org.apache.spark.sql.AnalysisException:
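One possible workaround, which is not taken from this thread, is to infer the schema once from a bounded sample and then reuse it in the streaming query. The Scala sketch below assumes Spark 2.2 (batch reads from Kafka and `spark.read.json` on a `Dataset[String]`), the spark-sql-kafka-0-10 package, and a hypothetical broker and topic.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.StructType

object InferSchemaSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("infer-schema-sketch").getOrCreate()
    import spark.implicits._

    // Batch (bounded) read of whatever is currently in the topic.
    val sample = spark.read
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // hypothetical broker
      .option("subscribe", "events")                       // hypothetical topic
      .load()
      .selectExpr("CAST(value AS STRING)")
      .as[String]

    // Schema inference needs bounded data, so it runs on the batch sample.
    val inferred: StructType = spark.read.json(sample).schema

    // The printed schema can then be passed to from_json in a streaming query.
    println(inferred.treeString)

    spark.stop()
  }
}
```

The inferred schema reflects only the sampled messages, so fields that appear later in the stream would be missed.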

Re: what is the difference between json format vs kafka format?

2017-05-13 Thread Tathagata Das
I understand the confusion. The "json" format is for JSON-encoded files written to a directory. For Kafka, use the "kafka" format. Then, to decode the binary data as JSON, you can use the function "from_json" (Spark 2.1 and above). Here is our blog post on this.
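The following Scala sketch shows one way the advice above might look in practice: read with the "kafka" format, cast the binary value column to a string, and parse it with from_json. The schema fields ("id", "eventTime"), the broker address, and the topic name are hypothetical placeholders, and the spark-sql-kafka-0-10 package is assumed to be on the classpath.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{StringType, StructType, TimestampType}

object KafkaJsonSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kafka-json-sketch").getOrCreate()

    // Hypothetical schema; from_json needs it up front.
    val schema = new StructType()
      .add("id", StringType)
      .add("eventTime", TimestampType)

    // The Kafka source delivers key and value as binary columns.
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // hypothetical broker
      .option("subscribe", "events")                       // hypothetical topic
      .load()

    // Cast the binary value to a string, parse it as JSON, then flatten the struct.
    val parsed = raw
      .select(from_json(col("value").cast("string"), schema).as("data"))
      .select("data.*")

    val query = parsed.writeStream
      .outputMode("append")
      .format("console")
      .start()

    query.awaitTermination()
  }
}
```

By contrast, `format("json")` would point at a directory of JSON files rather than a Kafka topic.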

what is the difference between json format vs kafka format?

2017-05-13 Thread kant kodali
Hi all, What is the difference between sparkSession.readStream.format("kafka") and sparkSession.readStream.format("json")? I am sending JSON-encoded messages to Kafka and I am not sure which of the two I should use. Thanks!