Sorry, but my original solution is incorrect.
1. Glue Crawlers are not supposed to set the spark.sql.sources.schema.*
properties; Spark SQL should. In Spark 2.4 the default for
spark.sql.hive.caseSensitiveInferenceMode is INFER_AND_SAVE, which means that
Spark infers the case-sensitive schema from the underlying data files and
writes it back to the table properties.
This bug happens because the Glue table's SERDEPROPERTIES is missing two
important properties:
spark.sql.sources.schema.numParts
spark.sql.sources.schema.part.0
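A simplified illustration of why those two properties matter. It assumes, as in Spark's Hive catalog integration, that the case-preserving schema is stored as JSON split across spark.sql.sources.schema.part.N entries counted by spark.sql.sources.schema.numParts, and that without them only the metastore's lowercase column names are available. The property values, column names and the `column_names` helper are hypothetical; this is not Spark's actual code.

```python
import json

# Hypothetical table parameters. The JSON mimics the layout Spark uses for a
# StructType; the column names are only examples.
TABLE_PROPERTIES = {
    "spark.sql.sources.schema.numParts": "1",
    "spark.sql.sources.schema.part.0": json.dumps({
        "type": "struct",
        "fields": [
            {"name": "id", "type": "long", "nullable": True, "metadata": {}},
            {"name": "lastModified", "type": "long", "nullable": True, "metadata": {}},
        ],
    }),
}

# Column names as the Hive-style metastore reports them (lowercased).
METASTORE_COLUMNS = ["id", "lastmodified"]


def column_names(props, metastore_columns):
    """Return case-preserving names if the Spark schema properties exist,
    otherwise fall back to the lowercase metastore names."""
    num_parts = props.get("spark.sql.sources.schema.numParts")
    if num_parts is None:
        return list(metastore_columns)
    # The schema JSON may be split into several parts; concatenate them in order.
    parts = []
    for i in range(int(num_parts)):
        key = f"spark.sql.sources.schema.part.{i}"
        if key not in props:
            raise ValueError(f"missing table property {key}")
        parts.append(props[key])
    schema = json.loads("".join(parts))
    return [field["name"] for field in schema["fields"]]


print(column_names(TABLE_PROPERTIES, METASTORE_COLUMNS))
print(column_names({}, METASTORE_COLUMNS))
```

The second call models the reported symptom: with the properties absent, the only names left are the lowercase ones.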
To solve the problem, I had to add those two properties via the Glue console
(I couldn't do it with ALTER TABLE …).
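If the two values have to be typed in by hand, a rough sketch of how they could be produced from a column list. Assumptions: the JSON layout mirrors Spark's StructType JSON, every column is declared nullable, and the 4000-character split mirrors the default of spark.sql.sources.schemaStringLengthThreshold; `schema_properties` is a hypothetical helper. Calling `df.schema.json()` on a DataFrame read directly from the Parquet files would give Spark's own version of the schema string.

```python
import json

# Assumed default of spark.sql.sources.schemaStringLengthThreshold;
# longer schema strings are split across several parts.
THRESHOLD = 4000


def schema_properties(fields, threshold=THRESHOLD):
    """Build spark.sql.sources.schema.* properties for a list of
    (name, spark_type) pairs, keeping the column names' case."""
    schema_json = json.dumps({
        "type": "struct",
        "fields": [
            {"name": name, "type": dtype, "nullable": True, "metadata": {}}
            for name, dtype in fields
        ],
    }, separators=(",", ":"))
    # Split the JSON string into fixed-size chunks.
    chunks = [schema_json[i:i + threshold]
              for i in range(0, len(schema_json), threshold)]
    props = {"spark.sql.sources.schema.numParts": str(len(chunks))}
    for i, chunk in enumerate(chunks):
        props[f"spark.sql.sources.schema.part.{i}"] = chunk
    return props


props = schema_properties([("id", "long"), ("lastModified", "long")])
for key, value in props.items():
    print(key, "=", value)
```

For a two-column schema like this one, the output is a single part, so numParts is 1.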
I guess
Spark version: 2.4.2 on Amazon EMR 5.24.0
I have a Glue Catalog table backed by an S3 Parquet directory. The Parquet
files have case-sensitive column names (like /lastModified/). No matter what
I do, I get lowercase column names (/lastmodified/) when reading
the Glue Catalog table with
Thanks, guys; it really helps.
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
We have a use case with a stream of events, where every event has an
ID and its current state with a timestamp:
…
111,ready,1532949947
111,offline,1532949955
111,ongoing,1532949955
111,offline,1532949973
333,offline,1532949981
333,ongoing,1532949987
…
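A small sketch of the event format above as a typed record, plus one way to derive the current state per ID. Assumptions: the third field is a Unix timestamp in seconds, "current" means the event with the latest timestamp, and ties (such as the two 111 events at 1532949955) keep the first event seen. `Event`, `parse` and `current_states` are hypothetical names.

```python
from typing import NamedTuple

# Sample lines from the stream above (the "…" elisions are left out).
RAW_EVENTS = """\
111,ready,1532949947
111,offline,1532949955
111,ongoing,1532949955
111,offline,1532949973
333,offline,1532949981
333,ongoing,1532949987"""


class Event(NamedTuple):
    id: int
    state: str
    timestamp: int  # assumed to be Unix epoch seconds


def parse(line: str) -> Event:
    id_, state, ts = line.split(",")
    return Event(int(id_), state, int(ts))


def current_states(events):
    """Latest state per ID by timestamp; on ties, the first event seen wins."""
    latest = {}
    for ev in events:
        prev = latest.get(ev.id)
        if prev is None or ev.timestamp > prev.timestamp:
            latest[ev.id] = ev
    return {id_: ev.state for id_, ev in latest.items()}


events = [parse(line) for line in RAW_EVENTS.splitlines()]
print(current_states(events))
```

With the sample data this reports 111 as offline and 333 as ongoing.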
We want to ask questions about the
In Structured Streaming, there's the notion of event-time windowing:
However, this is not quite the same as DStream windowing operations: in
Structured Streaming, windowing groups the data into fixed time windows, and
every event in a time window is associated with its group:
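A pure-Python sketch of that fixed (tumbling) window assignment, using the sample events and a hypothetical 10-second window. In Structured Streaming itself this would be a `groupBy` on `pyspark.sql.functions.window` over the event-time column; here it is modelled without Spark, assuming windows aligned to the epoch with no start offset, so each event lands in exactly one window whose start is its timestamp rounded down to a multiple of the window length.

```python
from collections import defaultdict

# (id, state, epoch seconds) taken from the sample stream above.
EVENTS = [
    (111, "ready", 1532949947),
    (111, "offline", 1532949955),
    (111, "ongoing", 1532949955),
    (111, "offline", 1532949973),
    (333, "offline", 1532949981),
    (333, "ongoing", 1532949987),
]

WINDOW_SECONDS = 10  # hypothetical window length


def window_start(ts, size=WINDOW_SECONDS):
    # Fixed windows aligned to the epoch: each covers [start, start + size).
    return ts - ts % size


def group_by_window(events, size=WINDOW_SECONDS):
    """Map each (start, end) window to the (id, state) events inside it."""
    groups = defaultdict(list)
    for id_, state, ts in events:
        start = window_start(ts, size)
        groups[(start, start + size)].append((id_, state))
    return dict(groups)


for window, members in sorted(group_by_window(EVENTS).items()):
    print(window, members)
```

The six events fall into four windows; the two 111 events at 1532949955 share the window [1532949950, 1532949960), and the window [1532949960, 1532949970) is absent because no event falls into it.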
And in DStreams it