Hello Zhenya,

First of all, local statistics are not useless, because we can use them in the local H2 query planning phase, at least for now.

"Client ..." - any node that can build query execution plans. I believe that later we will be able to do query execution planning on client nodes. But it's a good question, and I have renamed "client node" to "query planning node".

"If there are no statistics in all of them - client will choose random" - I suppose we can choose any server node that can hold the data covered by the requested statistics. But if the query planning node needs statistics for two or more tables, those tables can be located in separate groups of server nodes, so such requests should be sent separately. So the answer is both yes (we should send statistics requests and collection requests only to nodes storing the table) and no (a collection request can be sent to any of those nodes).

"After getting statistics client will cache it and server node it to renew statistics from same node" - I mean that after getting the collected statistics, the client node can remember which server node sent them, so that it can get future updates from the same node. The client will renew its cache with a TTL approach, while the server can decide when statistics should be collected again, for example by counting the number of changed rows in the underlying tables.

"Whats the storage mechanism for client node statistics?" - no storage. Even server nodes won't store global statistics persistently, but they will store local partition-level statistics to speed up collection after a restart (aggregation instead of a full collection).

"Can we use thin client without discs in such cases?" - certainly, no persistent store is needed on any client node.

I made minor changes to the IEP according to your comments. Follow-up questions are welcome.
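To make the caching idea above more concrete, here is a minimal sketch in Java. It is only an illustration under my own assumptions, not code from the IEP or from Ignite: the class names (`StatisticsCacheSketch`, `PlanningNodeCache`, `ChangeTracker`), the single `rowCount` field standing in for real statistics, and the fixed changed-rows threshold are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: TTL cache on the planning node, change counter on the server. */
public class StatisticsCacheSketch {

    /** Server side: decides when statistics should be collected again. */
    public static final class ChangeTracker {
        private final long threshold; // assumed fixed threshold of changed rows
        private long changedRows;

        public ChangeTracker(long threshold) { this.threshold = threshold; }

        public void onRowsChanged(long n) { changedRows += n; }

        public boolean needsRecollection() { return changedRows >= threshold; }

        public void onCollected() { changedRows = 0; }
    }

    /** Cached statistics plus the server node that sent them. */
    public static final class Entry {
        public final String sourceNodeId;
        public final long rowCount; // stand-in for the real statistics payload
        public final long expiresAtMillis;

        Entry(String sourceNodeId, long rowCount, long expiresAtMillis) {
            this.sourceNodeId = sourceNodeId;
            this.rowCount = rowCount;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    /** Planning-node side: in-memory only, nothing is persisted. */
    public static final class PlanningNodeCache {
        private final Map<String, Entry> byTable = new HashMap<>();
        private final long ttlMillis;

        public PlanningNodeCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

        public void put(String table, String sourceNodeId, long rowCount, long nowMillis) {
            byTable.put(table, new Entry(sourceNodeId, rowCount, nowMillis + ttlMillis));
        }

        /** Returns the entry if it is present and not expired, otherwise null. */
        public Entry getFresh(String table, long nowMillis) {
            Entry e = byTable.get(table);
            return (e != null && nowMillis < e.expiresAtMillis) ? e : null;
        }

        /** Node to ask first when renewing, or null if nothing was cached yet. */
        public String preferredSource(String table) {
            Entry e = byTable.get(table);
            return e == null ? null : e.sourceNodeId;
        }
    }

    public static void main(String[] args) {
        PlanningNodeCache cache = new PlanningNodeCache(1_000);
        cache.put("PERSON", "node-1", 42L, 0L);
        System.out.println(cache.getFresh("PERSON", 500L) != null);   // still fresh
        System.out.println(cache.getFresh("PERSON", 1_000L) == null); // expired, renew
        System.out.println(cache.preferredSource("PERSON"));          // renew from same node
    }
}
```

The point of keeping `sourceNodeId` after expiry is that the planning node can go back to the same server node for the renewal, while the server independently decides (via its change counter) whether it has to re-collect before answering.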
Fri, Oct 16, 2020 at 14:31, Zhenya Stanilovsky <arzamas...@mail.ru.invalid>:

>
> Andrey, thanks for firing this!
> Sasha, it's unclear to me: «These part consists of two processes:
> statistics collection process itself and acquiring statistics by the
> client.»:
> * I agree that in both cases local statistics are useless.
> Maybe we need more informative use cases for such statistics usage? Can
> someone append statistics for additional columns (possibly not present
> in an index)?
> * Client - can you unfold this term? Does this mean an Ignite client
> node? Is the best SQL plan chosen on the node that started the request?
> If so, what about a client with limited CPU here?
> * «If there are no statistics in all of them - client will choose random»
> - not random but affinity concerted, isn't it?
> * «After getting statistics client will cache it and server node it to
> renew statistics from same node.» I don't understand this
> approach, can you clarify it please?
> * What's the storage mechanism for client node statistics?
> * Can we use a thin client without disks in such cases?
> thanks!
>
> >:
> >
> >Follow up
> >
> >Igniters,
> >
> >is there any comment to this IEP?
> >
> >JFYI, IEP is renamed and placed here [1]
> >
> >[1] https://cwiki.apache.org/confluence/display/IGNITE/IEP-58%3A+Statistics+for+SQL+query+optimization
> >
> >On Thu, Sep 24, 2020 at 2:30 PM Sasha Belyak <rtsfo...@gmail.com> wrote:
> >>
> >> Igniters,
> >> I've prepared an IEP [1], please review and let me know what you think.
> >>
> >> In particular, I'd like to discuss the new subsystem to collect statistics
> >> to optimize SQL query execution.
> >> [1] https://cwiki.apache.org/confluence/display/IGNITE/IEP-58+Statistics