Re: Cannot read case-sensitive Glue table backed by Parquet

2020-01-17 Thread oripwk
Sorry, but my original solution is incorrect.

1. Glue Crawlers are not supposed to set the spark.sql.sources.schema.* properties, but Spark SQL should. The default in Spark 2.4 for spark.sql.hive.caseSensitiveInferenceMode is INFER_AND_SAVE, which means that Spark infers the schema from the

Re: Cannot read case-sensitive Glue table backed by Parquet

2020-01-17 Thread oripwk
This bug happens because the Glue table's SERDEPROPERTIES is missing two important properties:

spark.sql.sources.schema.numParts
spark.sql.sources.schema.part.0

To solve the problem, I had to add those two properties via the Glue console (couldn't do it with ALTER TABLE …). I guess
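For illustration, here is a rough sketch of what the values of those two properties might look like. It is plain Python, not a Spark or Glue API: the helper name `schema_properties` is hypothetical, the column type `timestamp` for `lastModified` is an assumption (the thread does not state it), and the 4000-character chunk size assumes Spark's default for `spark.sql.sources.schemaStringLengthThreshold`.

```python
import json

# Assumption: Spark splits the schema JSON into chunks of at most
# spark.sql.sources.schemaStringLengthThreshold characters (default 4000).
CHUNK_SIZE = 4000


def schema_properties(columns, chunk_size=CHUNK_SIZE):
    """Build spark.sql.sources.schema.* properties from (name, type) pairs.

    Hypothetical helper for illustration; the JSON layout mirrors how Spark
    serializes a StructType, with column names kept in their original case.
    """
    schema = {
        "type": "struct",
        "fields": [
            {"name": name, "type": dtype, "nullable": True, "metadata": {}}
            for name, dtype in columns
        ],
    }
    schema_json = json.dumps(schema, separators=(",", ":"))
    # Split the JSON string into fixed-size parts, one property per part.
    parts = [
        schema_json[i:i + chunk_size]
        for i in range(0, len(schema_json), chunk_size)
    ]
    props = {"spark.sql.sources.schema.numParts": str(len(parts))}
    for index, part in enumerate(parts):
        props[f"spark.sql.sources.schema.part.{index}"] = part
    return props


# The column type here is a guess made only for the example.
props = schema_properties([("lastModified", "timestamp")])
for key, value in props.items():
    print(key, "=", value)
```

For a small schema like this one, everything fits in a single part, so only `part.0` is produced.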

Cannot read case-sensitive Glue table backed by Parquet

2020-01-16 Thread oripwk
Spark version: 2.4.2 on Amazon EMR 5.24.0. I have a Glue Catalog table backed by an S3 Parquet directory. The Parquet files have case-sensitive column names (like /lastModified/). No matter what I do, I get lowercase column names (/lastmodified/) when reading the Glue Catalog table with

Re: How to reduceByKeyAndWindow in Structured Streaming?

2018-07-30 Thread oripwk
Thanks guys, it really helps.

Using Spark Streaming for analyzing changing data

2018-07-30 Thread oripwk
We have a use case where there's a stream of events, and every event has an ID and its current state with a timestamp:

…
111,ready,1532949947
111,offline,1532949955
111,ongoing,1532949955
111,offline,1532949973
333,offline,1532949981
333,ongoing,1532949987
…

We want to ask questions about the

How to reduceByKeyAndWindow in Structured Streaming?

2018-06-28 Thread oripwk
In Structured Streaming, there's the notion of event-time windowing. However, this is not quite the same as DStream's windowing operations: in Structured Streaming, windowing groups the data by fixed time windows, and every event in a time window is associated with its group. And in DStreams it
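As a rough illustration of the fixed-window grouping described for Structured Streaming, the sketch below assigns each event to exactly one non-overlapping window based on its event time. It is plain Python rather than Spark code; the function name `assign_to_windows`, the 10-second window, the sample events, and the assumption that windows are aligned to the Unix epoch are all made up for the example.

```python
from collections import defaultdict


def assign_to_windows(events, window_seconds):
    """Group (event_id, state, timestamp) tuples into fixed time windows.

    Each event lands in the single window [start, start + window_seconds)
    that contains its timestamp; windows are assumed to be epoch-aligned.
    """
    windows = defaultdict(list)
    for event_id, state, timestamp in events:
        start = timestamp - (timestamp % window_seconds)
        windows[(start, start + window_seconds)].append((event_id, state))
    return dict(windows)


# Illustrative sample events: (id, state, epoch seconds).
events = [
    ("111", "ready", 1532949947),
    ("111", "offline", 1532949955),
    ("111", "ongoing", 1532949955),
    ("333", "offline", 1532949981),
]

grouped = assign_to_windows(events, window_seconds=10)
for window, members in sorted(grouped.items()):
    print(window, members)
```

Every event belongs to one group determined only by its own timestamp, which is the behaviour the message contrasts with DStream windowing.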