> .withWatermark("timestamp", "1 hour")
> .groupBy(window("timestamp", "30 seconds"))
> .agg(...)
>
> Read more here - https://databricks.com/blog/2017/05/08/event-time-aggregation-watermarking-apache-sparks-structured-streaming.html
>
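The windowed aggregation in the quoted snippet can be modeled outside Spark. Below is a minimal Python sketch of event-time tumbling windows with a watermark; the function and parameter names are illustrative, not Spark APIs, and the late-data rule is a simplification of what `withWatermark` actually does:

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds, watermark_seconds):
    """Count (timestamp, value) events per tumbling window, dropping
    events that arrive later than the watermark allows."""
    counts = defaultdict(int)
    max_ts = float("-inf")  # highest event time seen so far
    for ts, value in events:
        max_ts = max(max_ts, ts)
        # drop events older than (max event time - watermark), the
        # simplified analogue of withWatermark("timestamp", "1 hour")
        if ts < max_ts - watermark_seconds:
            continue
        # tumbling window, analogous to window("timestamp", "30 seconds")
        window_start = (ts // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(counts)

events = [(5, "a"), (12, "b"), (35, "c"), (2, "late")]
print(tumbling_window_counts(events, window_seconds=30, watermark_seconds=10))
# → {0: 2, 30: 1}  (the ts=2 event is dropped: it is older than 35 - 10)
```

The point of the watermark is exactly what the dropped event shows: once event time has advanced far enough, old windows can be finalized and late stragglers discarded, which is what lets a streaming sink emit windows in append mode.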
deviceBasicAggSummaryData = spark.table("deviceBasicAggSummary");
deviceBasicAggSummaryData.toDF().write().parquet("/data/summary/devices/" + new Date().getTime() + "/");
}
So what's the best practice for 'low latency query on distributed data'?
s/")
.option("path", "/data/summary/tags/")
.start();
But parquet doesn't support the 'complete' outputMode.
So is parquet supported only for batch queries, not for streaming queries?
Note that the console outputMode works fine!
Any help will be much appreciated.
Thanks
Kaniska
.option(subscribeType, topics)
.load()
.withColumn("message", from_json(col("value").cast("string"), tweetSchema)) // cast the binary value to a string and parse it as JSON
.select("message.*") // unnest the json
.as(Encoders.bean(Tweet.class))
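The cast/parse/unnest steps in the snippet above can be mimicked in plain Python for clarity. The field names in the sample record are invented for illustration; the real schema is whatever `tweetSchema` / the `Tweet` bean defines:

```python
import json

def parse_message(raw_bytes):
    # cast the binary Kafka value to a string,
    # as col("value").cast("string") does
    text = raw_bytes.decode("utf-8")
    # parse it as JSON, as from_json does against a schema
    message = json.loads(text)
    # "unnesting" via select("message.*") promotes the parsed
    # fields to top-level columns; here the dict already is top-level
    return message

record = parse_message(b'{"id": 1, "text": "hello"}')
print(record)  # → {'id': 1, 'text': 'hello'}
```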
Thanks
Kaniska
results in a similar compilation error.
Thanks
Kaniska
On Mon, Mar 27, 2017 at 12:09 PM, Michael Armbrust <mich...@databricks.com>
wrote:
> Sorry, I don't think that I understand the question. Value is just a
> binary blob that we get from Kafka and pass to you. If it's stored in JSON
> .select("message.*") // unnest the JSON
> .as(Encoders.bean(Tweet.class)) // only required if you want to use
> lambda functions on the data using this class
>
> Here is some more info on working with JSON and other semi-structured
> formats
> <https://databricks.com/blog/2017/02/23/workin
Hi,
Currently, I am encountering the following exception while working with the
below-mentioned code snippet:
> Please suggest the correct approach for reading the stream into a SQL
> schema.
> If I add 'tweetSchema' while reading the stream, it errors out with the
> message: "we can not change static schema".