I can understand that involving columns with a variable distribution in join operations may change the execution plan, but most of the time this is not going to happen. In streaming, the most used operations are map, filter, grouping and stateful operations, and in all these cases I can't see how dynamic query planning could help.
It could be useful to have a parameter that forces a streaming query to compute its query plan only once.

Paolo

________________________________
From: Alessandro Solimando <alessandro.solima...@gmail.com>
Sent: Thursday, March 14, 2019 6:59:50 PM
To: Paolo Platter
Cc: user@spark.apache.org
Subject: Re: Structured Streaming & Query Planning

Hello Paolo,

generally speaking, query planning is mostly based on statistics and distributions of the data values in the columns involved, which might change significantly over time in a streaming context. So for me it makes a lot of sense that it runs at every schedule, even though I understand your concern.

As for the second question, I don't know how to cache the computed query plan, or whether that is even possible.

If possible, would you mind sharing your findings afterwards? (Query planning on streaming is a very interesting and not yet sufficiently explored topic, IMO.)

Best regards,
Alessandro

On Thu, 14 Mar 2019 at 16:51, Paolo Platter <paolo.plat...@agilelab.it> wrote:

Hi All,

I would like to understand why, in a streaming query (which should not be able to change its behaviour across iterations), there is a queryPlanning duration overhead (in my case 33% of the trigger interval) at every schedule.

I don't understand why this is needed, or whether it is possible to disable or cache it.

Thanks

Paolo Platter
CTO
E-mail: paolo.plat...@agilelab.it
Web Site: www.agilelab.it
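
For anyone who wants to quantify the planning overhead discussed in this thread, here is a minimal sketch in plain Python. It assumes you already have a progress report as a JSON string (for example from `query.lastProgress.json` in Scala) and that its `durationMs` map contains `queryPlanning` and `triggerExecution` entries. The helper name `planning_share` and the sample values are hypothetical, and using `triggerExecution` as the denominator is an assumption: the original message measured against the trigger interval, which may differ.

```python
import json


def planning_share(progress_json: str) -> float:
    """Return queryPlanning time as a fraction of triggerExecution time.

    Returns 0.0 when either duration is missing or the total is zero.
    """
    durations = json.loads(progress_json).get("durationMs", {})
    planning = durations.get("queryPlanning", 0)
    total = durations.get("triggerExecution", 0)
    return planning / total if total else 0.0


# Hypothetical progress report, trimmed to the fields used here.
sample = json.dumps(
    {
        "batchId": 42,
        "durationMs": {
            "queryPlanning": 330,
            "addBatch": 600,
            "triggerExecution": 1000,
        },
    }
)

share = planning_share(sample)
print(f"queryPlanning share of triggerExecution: {share:.0%}")
```

Logging this value for every batch would show whether the planning cost stays constant across triggers or varies with the incoming data.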