Spark 2.4.3 - Structured Streaming - high on Storage Memory

2019-06-15 Thread puneetloya
Hi, I just upgraded Spark from 2.2.3 to 2.4.3 and ran a load test with a week's worth of messages in Kafka. I'm seeing odd behavior: why is the storage memory so high? I have run similar workloads with Spark 2.2.3,

unsubscribe

2019-06-15 Thread Humberto Marchezi
-- Humberto C Marchezi -

Creating Spark buckets that Presto / Athena / Hive can leverage

2019-06-15 Thread Daniel Mateus Pires
Hi there! I am trying to optimize joins on data created by Spark, so I'd like to bucket the data to avoid shuffling. I write to immutable partitions every day by writing data to a local HDFS and then copying it to S3. Is there a combination of bucketBy options and DDL that I can use