Re: Structured Streaming & Query Planning

2019-03-18 Thread Arun Mahadevan
e the query plan just once. >> Paolo >> Get Outlook for Android <https://aka.ms/ghei36> >> *From:* Alessandro Solimando >> *Sent:* Thursday, March 14, 2019 6:59:50 PM >> *To:* Paolo Platter >> *Cc:* user@

Re: Structured Streaming & Query Planning

2019-03-18 Thread Jungtaek Lim
lo > Get Outlook for Android <https://aka.ms/ghei36> > *From:* Alessandro Solimando > *Sent:* Thursday, March 14, 2019 6:59:50 PM > *To:* Paolo Platter > *Cc:* user@spark.apache.org > *Subject:* Re: Structured St

Re: Structured Streaming & Query Planning

2019-03-18 Thread Paolo Platter
rch 14, 2019 6:59:50 PM To: Paolo Platter Cc: user@spark.apache.org Subject: Re: Structured Streaming & Query Planning Hello Paolo, generally speaking, query planning is mostly based on statistics and distributions of data values for the involved columns, which might significantly change

Re: Structured Streaming & Query Planning

2019-03-14 Thread Alessandro Solimando
Hello Paolo, generally speaking, query planning is mostly based on statistics and distributions of data values for the involved columns, which might significantly change over time in a streaming context, so for me it makes a lot of sense that it is run at every schedule, even though I understand

Structured Streaming & Query Planning

2019-03-14 Thread Paolo Platter
Hi All, I would like to understand why, in a streaming query (which should not be able to change its behaviour across iterations), there is a queryPlanning duration cost (in my case it is 33% of the trigger interval) at every schedule. I don’t understand why this is needed and if it is possible to
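
For readers who want to check the same ratio on their own job, here is a minimal sketch in Python. It assumes the progress record is available as a plain dict, as `query.lastProgress` returns it in PySpark, with a `durationMs` map containing a `queryPlanning` entry. The helper name `planning_share` and the sample numbers are hypothetical and chosen only to mirror the 33% figure mentioned in the original message.

```python
from typing import Mapping, Optional


def planning_share(progress: Mapping, trigger_interval_ms: float) -> Optional[float]:
    """Return queryPlanning time as a fraction of the trigger interval.

    Returns None when the metric is missing or the interval is not positive.
    """
    durations = progress.get("durationMs") or {}
    planning_ms = durations.get("queryPlanning")
    if planning_ms is None or trigger_interval_ms <= 0:
        return None
    return planning_ms / trigger_interval_ms


# Hypothetical progress record; the values are illustrative, not measured.
sample_progress = {
    "batchId": 7,
    "durationMs": {"queryPlanning": 330, "addBatch": 450, "triggerExecution": 900},
}

share = planning_share(sample_progress, trigger_interval_ms=1000)
print(f"queryPlanning share of trigger interval: {share:.0%}")
```

On a running query, one would pass `query.lastProgress` in place of `sample_progress`, after checking that it is not `None` (no batch completed yet).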