Re: Infer JSON schema in structured streaming Kafka.

2017-12-11 Thread Burak Yavuz
In Spark 2.2, you can read from Kafka in batch mode, and then use the JSON reader to infer the schema: val df = spark.read.format("kafka")... .select($"value".cast("string")).as[String] val json = spark.read.json(df) val schema = json.schema The above will likely be slow (since you're reading almost all data
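A minimal sketch of the batch approach described above, assuming Spark 2.2+ with the `spark-sql-kafka-0-10` package on the classpath. The object and method names, broker address, and topic are hypothetical placeholders. `inferSchema` is split out so it can be exercised on an in-memory `Dataset[String]` standing in for the Kafka `value` column, and `parseStream` shows one way the inferred schema could then be applied to the streaming read with `from_json`.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.StructType

object KafkaJsonSchemaSketch {

  // Hypothetical helper: read one topic as a bounded batch (earliest to latest
  // offsets) and keep only the message value, cast to a JSON string.
  def readKafkaValues(spark: SparkSession, bootstrapServers: String, topic: String): Dataset[String] = {
    import spark.implicits._
    spark.read
      .format("kafka")
      .option("kafka.bootstrap.servers", bootstrapServers)
      .option("subscribe", topic)
      .option("startingOffsets", "earliest")
      .option("endingOffsets", "latest")
      .load()
      .select($"value".cast("string"))
      .as[String]
  }

  // Let the batch JSON reader infer a schema from the sampled JSON strings.
  def inferSchema(spark: SparkSession, jsonValues: Dataset[String]): StructType =
    spark.read.json(jsonValues).schema

  // Parse the streaming read of the same topic with the inferred schema.
  def parseStream(spark: SparkSession, bootstrapServers: String, topic: String, schema: StructType): DataFrame =
    spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", bootstrapServers)
      .option("subscribe", topic)
      .load()
      .select(from_json(col("value").cast("string"), schema).as("data"))
      .select("data.*")
}
```

Usage might look like `val schema = KafkaJsonSchemaSketch.inferSchema(spark, KafkaJsonSchemaSketch.readKafkaValues(spark, "localhost:9092", "some-topic"))`. Reading the whole topic is what makes this slow; calling `.limit(n)` on the Dataset before `inferSchema` is one way to cap the sample, at the cost of possibly missing fields that only appear in later records.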

Re: Infer JSON schema in structured streaming Kafka.

2017-12-11 Thread satyajit vegesna
Hi Burak, Thank you for the input; I will definitely try the options. The reason we don't have a unified schema is that we are consuming data from different topics that contain data from different tables in a DB, and each table has different columns. Regards, Satyajit.

Re: Infer JSON schema in structured streaming Kafka.

2017-12-11 Thread Jacek Laskowski
Hi, What about a custom streaming Sink that would stop the query after addBatch has been called? Regards, Jacek Laskowski https://about.me/JacekLaskowski Spark Structured Streaming https://bit.ly/spark-structured-streaming Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark

Re: Infer JSON schema in structured streaming Kafka.

2017-12-11 Thread satyajit vegesna
Hi Jacek, For now, I am using Thread.sleep() on the driver to make sure my streaming query receives some data, and then I stop it before control reaches the query against the memory table. Let me know if there is a better way of handling this. Regards, Satyajit. On Sun, Dec 10, 2017 at 10:43 PM, satyajit
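A sketch of the memory-sink workflow described in this thread (stream into an in-memory table, wait on the driver, stop the query, then query the table), assuming Spark 2.2+ and an input stream with exactly one string column holding JSON. The object name, helper name, query name, and wait duration are hypothetical. Here `StreamingQuery.awaitTermination(timeoutMs)` stands in for a bare `Thread.sleep()`: it also waits at most the timeout, but rethrows if the query has failed. `StreamingQuery.processAllAvailable()` also exists, though its documentation describes it as intended for testing.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.StructType

object MemorySinkSampleSketch {

  // Hypothetical helper: write a stream of JSON strings (one string column) to
  // an in-memory table, wait up to waitMs, stop the query, then infer a schema
  // from whatever rows the memory table collected.
  def sampleAndInfer(spark: SparkSession, jsonStream: DataFrame, queryName: String, waitMs: Long): StructType = {
    import spark.implicits._
    val query = jsonStream.writeStream
      .format("memory")     // rows are collected on the driver
      .queryName(queryName) // also the name of the in-memory table
      .outputMode("append")
      .start()
    try {
      // Bounded wait; throws if the query terminated with an exception.
      query.awaitTermination(waitMs)
    } finally {
      query.stop()
    }
    // The memory table stays queryable after the query is stopped.
    val sampled = spark.sql(s"SELECT * FROM $queryName").as[String]
    spark.read.json(sampled).schema
  }
}
```

If no rows arrive within `waitMs`, the inferred schema is empty, so a caller may want to check `schema.isEmpty` and retry with a longer wait.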

Re: Infer JSON schema in structured streaming Kafka.

2017-12-10 Thread satyajit vegesna
Hi Jacek, Thank you for responding. I have tried the memory sink, and below is what I did: val fetchValue = debeziumRecords.selectExpr("value").withColumn("tableName", functions.get_json_object($"value".cast(StringType), "$.schema.name")).withColumn("operation",
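Because each topic carries records from several database tables with different columns, one option is to extract the table name with `get_json_object` (using the `$.schema.name` path from the snippet above) and infer one schema per table from a batch sample. A minimal sketch, assuming the sample is a DataFrame with a `value` column holding the JSON messages; the object and helper names are hypothetical, and the inferred schema covers the whole message, envelope included.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, get_json_object}
import org.apache.spark.sql.types.StructType

object PerTableSchemaSketch {

  // Hypothetical helper: tag each sampled JSON record with the table name found
  // at "$.schema.name" and infer one schema per table with the batch JSON reader.
  def inferPerTable(spark: SparkSession, sample: DataFrame): Map[String, StructType] = {
    import spark.implicits._
    val withTable = sample
      .select(col("value").cast("string").as("value"))
      .withColumn("tableName", get_json_object(col("value"), "$.schema.name"))
      .cache() // reused once per table below

    val tableNames = withTable
      .select("tableName")
      .where(col("tableName").isNotNull)
      .distinct()
      .as[String]
      .collect()

    val schemas = tableNames.map { name =>
      val values = withTable.where(col("tableName") === name).select("value").as[String]
      name -> spark.read.json(values).schema
    }.toMap

    withTable.unpersist()
    schemas
  }
}
```

Each table triggers a separate inference job, which is acceptable for a small sample but would be costly over a large batch read.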

Re: Infer JSON schema in structured streaming Kafka.

2017-12-10 Thread Jacek Laskowski
Hi, What about the memory sink? That could work. Regards, Jacek Laskowski

Infer JSON schema in structured streaming Kafka.

2017-12-10 Thread satyajit vegesna
Hi all, I would like to infer a JSON schema from a sample of the data that I receive from Kafka Streams (a specific topic). I have to infer the schema because I am going to receive arbitrary JSON strings with a different schema for each topic, so I chose to go ahead with the steps below: a. readStream from