subject:"\[Spark Structured Streaming\] Changing partitions of \(flat\)MapGroupsWithState"

Re: [Spark Structured Streaming] Changing partitions of (flat)MapGroupsWithState

2017-11-08 Thread Michael Armbrust

The relevant config is spark.sql.shuffle.partitions. Note that once you start a query, this number is fixed. The config will only affect queries starting from an empty checkpoint. On Wed, Nov 8, 2017 at 7:34 AM, Teemu Heikkilä wrote: > I have spark structured streaming job

[Spark Structured Streaming] Changing partitions of (flat)MapGroupsWithState

2017-11-08 Thread Teemu Heikkilä

I have spark structured streaming job and I'm crunching through few terabytes of data. I'm using file stream reader and it works flawlessly, I can adjust the partitioning of that with spark.default.parallelism However I'm doing sessionization for the data after loading it and I'm currently