The first step is to realize that you are looking at an optimization problem. 
Congratulations, you’ve done that. 

The next step is to evaluate the possible optimization techniques. Volcano is 
suitable in cases where there is dataflow - where the solution is a tree or 
DAG. It doesn’t seem that you have that problem. So Volcano is not a good 
candidate, I agree. 

It is important that you separate the cost model and the allowable 
transformation rules from the optimization algorithm. Then you can try 
different algorithms.
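A minimal sketch of that separation (all names hypothetical, not an existing API): the cost model and the rules are plugged into the search, so the algorithm itself, here a deliberately naive greedy hill climber, can be swapped out later without touching the driver-specific parts.

```java
import java.util.*;

public class OptimizerSketch {

    // Driver-specific: how expensive is a candidate plan?
    public interface CostModel<P> { double cost(P plan); }

    // Driver-specific: all plans reachable from this plan in one rewrite step.
    public interface Rule<P> { List<P> apply(P plan); }

    // Interchangeable search algorithm; here, naive greedy hill climbing.
    public static <P> P optimize(P initial, CostModel<P> cost, List<Rule<P>> rules) {
        P best = initial;
        boolean improved = true;
        while (improved) {
            improved = false;
            for (Rule<P> rule : rules) {
                for (P candidate : rule.apply(best)) {
                    if (cost.cost(candidate) < cost.cost(best)) {
                        best = candidate;
                        improved = true;
                    }
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy model: a plan is a list of request payload sizes; each request
        // carries a fixed overhead of 10, so merging requests usually pays off.
        CostModel<List<Integer>> cost =
            plan -> plan.stream().mapToDouble(s -> 10 + s).sum();

        // One rule: merge the first two requests into a single request.
        Rule<List<Integer>> mergeFirstTwo = plan -> {
            if (plan.size() < 2) return List.of();
            List<Integer> merged = new ArrayList<>(plan.subList(2, plan.size()));
            merged.add(0, plan.get(0) + plan.get(1));
            return List.of(merged);
        };

        List<Integer> result =
            optimize(List.of(1, 1, 1, 1), cost, List.of(mergeFirstTwo));
        System.out.println(result);   // everything merged into one request: [4]
    }
}
```

With the rules and cost model factored out like this, replacing `optimize` with exhaustive search or a stochastic method is a local change.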

If I were you, I would also consult a good book on the design of optimizing 
compilers, e.g. the new Dragon book. It covers techniques for dataflow 
optimization, dead code elimination, register assignment, and loop unrolling. 
If those problems match your problems, then maybe a compilation framework like 
LLVM or Graal would give you what you need. 

> On Aug 23, 2019, at 7:11 AM, Julian Feinauer <j.feina...@pragmaticminds.de> 
> wrote:
> 
> Hi, 
> 
> a short update on this matter.
> We had a meetup yesterday with the PLC4X community and I prepared a very 
> first "demo" implementation of the Optimizer / Framework.
> I tried to organize it after the Calcite / Volcano approach.
> 
> We had several discussions and experimented a bit, and a few things came to 
> my mind that make me unsure whether the Volcano framework really fits best 
> here, or whether some other approach would:
> 
> * Usually we have a pretty big set of "operators" (in our case field requests, 
> in comparison to Calcite's RelNodes):
> In regular cases there could be 10, but quite often also up to 100 (which is 
> rather rare for 'sane' queries, I assume)
> * We have very few rules:
> In fact, we may have two or three rules (protocol specific), but usually of 
> the form 'merge two field requests into one' or 'split one field request into 
> two'
> * We have no tree structure, but everything is 'flat'
> 
> With the above setup it's pretty obvious that we cannot profit from Volcano's 
> dynamic programming approach. Furthermore, with the simple approach of 
> applying all suitable rules to all possible candidates, the state space 
> explodes with O(n!) (where n could be large, see above).
> 
> So I think our best bet at the moment would be to explore the whole search 
> space (but with aggressive pruning), or to use some other sort of algorithm 
> like simple gradient descent (I think our convergence behavior should be 
> quite nice if the cost function is "smooth" enough) or stochastic 
> optimization like cross entropy or simulated annealing.
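To make the stochastic route concrete, here is a minimal simulated annealing sketch over a made-up cost model (all names and numbers are illustrative, not PLC4X code): a plan groups request addresses into batches, each batch costs a fixed overhead plus the address span it reads, and a random move either merges two batches or splits one.

```java
import java.util.*;

public class AnnealingSketch {

    // Toy cost: fixed per-batch overhead plus the address span each batch reads.
    static double cost(List<List<Integer>> plan) {
        double c = 0;
        for (List<Integer> batch : plan) {
            c += 10 + (Collections.max(batch) - Collections.min(batch));
        }
        return c;
    }

    // One random move: merge two batches, or split one batch in two.
    static List<List<Integer>> neighbor(List<List<Integer>> plan, Random rnd) {
        List<List<Integer>> next = new ArrayList<>();
        for (List<Integer> b : plan) next.add(new ArrayList<>(b));
        if (next.size() > 1 && rnd.nextBoolean()) {            // merge
            List<Integer> a = next.remove(rnd.nextInt(next.size()));
            next.get(rnd.nextInt(next.size())).addAll(a);
        } else {                                               // split
            List<Integer> b = next.get(rnd.nextInt(next.size()));
            if (b.size() > 1) {
                List<Integer> tail = new ArrayList<>(b.subList(1, b.size()));
                b.subList(1, b.size()).clear();
                next.add(tail);
            }
        }
        return next;
    }

    static List<List<Integer>> anneal(List<List<Integer>> plan, long seed) {
        Random rnd = new Random(seed);
        List<List<Integer>> best = plan;
        double temp = 10.0;
        for (int i = 0; i < 5000; i++, temp *= 0.999) {
            List<List<Integer>> cand = neighbor(plan, rnd);
            double delta = cost(cand) - cost(plan);
            // Always accept improvements; accept regressions with decaying prob.
            if (delta < 0 || rnd.nextDouble() < Math.exp(-delta / temp)) {
                plan = cand;
            }
            if (cost(plan) < cost(best)) best = plan;
        }
        return best;
    }

    public static void main(String[] args) {
        // Eight single-address reads, initially one batch each (cost 80.0).
        List<List<Integer>> initial = new ArrayList<>();
        for (int a = 0; a < 8; a++) initial.add(new ArrayList<>(List.of(a)));
        System.out.println(cost(anneal(initial, 42)));  // far below 80.0
    }
}
```

The point of annealing here is exactly the convergence argument above: if the cost function is reasonably smooth under merge/split moves, the walk settles into a good batching without enumerating the O(n!) space.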
> 
> I just wanted to give some feedback back to the list, as many people joined 
> the discussion and had interesting ideas. And I still think that there is an 
> overlap in what's done, but the 'common core' is smaller than I initially 
> assumed (e.g., some query optimizers AFAIR use approaches like simulated 
> annealing).
> 
> Best
> Julian
> 
> On 20.08.19, 15:38, "Julian Feinauer" <j.feina...@pragmaticminds.de> wrote:
> 
>    Hi Stamatis,
> 
>    thanks for your response.
>    I think my brain just needs a bit more time to get really deep into those 
> advanced planning topics (it's sometimes slow to adapt...).
>    But I will look through it.
>    We have a meetup this week and will discuss the matter and how to set up 
> everything to enable some optimization at first (introducing cost estimates 
> and such), and then I will again have a deeper look and perhaps prepare a 
> test case or a runnable test.
>    Then it's probably easiest to reason about.
> 
>    Julian
> 
>    On 20.08.19, 15:19, "Stamatis Zampetakis" <zabe...@gmail.com> wrote:
> 
>        Hi Julian F,
> 
>        I admit that I didn't really get your example, but talking about 
> 'batch request optimization' and 'collapsing "overlapping" but not equal 
> requests' I get the impression that the problem is optimizing sets of 
> queries which may have common sub-expressions; this problem is usually 
> referred to as multi-query optimization and is indeed related to the Spool 
> operator mentioned by Julian H.
> 
>        If that's the case, then the most relevant work that I can think of 
> is [1], which solves the problem by slightly modifying the search strategy 
> of the Volcano planner.
> 
>        Best,
>        Stamatis
> 
>        [1] Roy, Prasan, et al. "Efficient and extensible algorithms for
>        multi-query optimization." ACM SIGMOD Record 29.2 (2000).
>        (https://www.cse.iitb.ac.in/~sudarsha/Pubs-dir/mqo-sigmod00.pdf)
> 
> 
>        On Tue, Aug 20, 2019 at 12:49 PM Julian Feinauer <
>        j.feina...@pragmaticminds.de> wrote:
> 
>> Hi Julian,
>> 
>> thanks for the reply.
>> I have to think about that, I think.
>> 
>> But as I understand the Spool operator, it is there to factor out multiple
>> calculations of the same expression.
>> In our situation we aim more at collapsing "overlapping" but not equal
>> requests.
>> 
>> Consider 8 bits which physically form a byte.
>> If I read 8 BOOLEANs, I get 8 different requests, each of which masks one
>> bit and returns it (padded) as a byte. So 8 requests and 8 bytes of data
>> transfer (plus masking on the PLC).
>> If I optimized it to read the byte in one request and do the masking
>> afterwards, I would have one request and only 1 byte transferred (plus no
>> masking on the PLC, which keeps pressure low there).
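The collapsing described above could be sketched roughly like this (hypothetical types, not the PLC4X API): the eight single-bit requests on the same byte address group into one physical byte read, and the per-bit masking moves to the client side.

```java
import java.util.*;

public class CollapseSketch {

    // Hypothetical: a single-bit read, identified by byte address and bit index.
    public record BitRequest(int byteAddress, int bitIndex) {}

    // Group bit requests by byte address: one physical read per distinct byte.
    static Map<Integer, List<BitRequest>> collapse(List<BitRequest> reqs) {
        Map<Integer, List<BitRequest>> byByte = new TreeMap<>();
        for (BitRequest r : reqs) {
            byByte.computeIfAbsent(r.byteAddress(), k -> new ArrayList<>()).add(r);
        }
        return byByte;
    }

    // Client-side masking, applied after the single byte has been transferred.
    static boolean extract(byte transferred, BitRequest r) {
        return ((transferred >> r.bitIndex()) & 1) == 1;
    }

    public static void main(String[] args) {
        // Eight BOOLEAN reads, all on the byte at (made-up) address 0x40.
        List<BitRequest> reqs = new ArrayList<>();
        for (int bit = 0; bit < 8; bit++) reqs.add(new BitRequest(0x40, bit));

        Map<Integer, List<BitRequest>> plan = collapse(reqs);
        System.out.println(plan.size());                 // 1 physical request, not 8

        byte wire = (byte) 0b1010_0101;                  // the one transferred byte
        System.out.println(extract(wire, reqs.get(0)));  // bit 0 of 0xA5 -> true
    }
}
```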
>> 
>> This could be modelled by introducing respective "RelNodes" and planner
>> rules, I think, but I do not fully understand how Spool fits in here?
>> 
>> Julian
>> 
>> On 19.08.19, 20:42, "Julian Hyde" <jh...@apache.org> wrote:
>> 
>>    One tricky aspect is to optimize a *batch* of requests.
>> 
>>    The trick is to tie together the batch so that it is costed as one
>> request. We don’t have an operator specifically for that, but you could for
>> instance use UNION ALL. E.g. given Q1 and Q2, you could generate a plan for
>> 
>>      select count(*) from Q1 union all select count(*) from Q2
>> 
>>    If the plan for the batch is a DAG (i.e. sharing work between the
>> components of the batch by creating something akin to “temporary tables”)
>> then you are in the territory for which we created the Spool operator (see
>> discussion in https://issues.apache.org/jira/browse/CALCITE-481).
>> 
>>    Julian
>> 
>> 
>>> On Aug 19, 2019, at 6:34 AM, Julian Feinauer <
>> j.feina...@pragmaticminds.de> wrote:
>>> 
>>> Hi Danny,
>>> 
>>> thanks for the quick reply.
>>> Cost calculation we can of course provide (but it could be a bit
>> different as we have not only CPU and Memory but also Network or something).
>>> 
>>> And also something like the RelNodes could be provided. In our case
>> this would be "Requests" which are at first "Logical" and are then
>> transformed to "Physical" Requests. For example the API allows you to
>> request many fields per single request but some PLCs only allow one field
>> per request. So this would be one task of this layer.
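That logical-to-physical splitting could look roughly like this minimal sketch (hypothetical names, not the PLC4X API): the field list of a logical request is chunked to respect the driver's per-request limit, so a PLC that only allows one field per request simply gets a chunk size of one.

```java
import java.util.*;

public class SplitSketch {

    // Split a logical request's field list into physical requests that
    // respect the driver's per-request field limit.
    static List<List<String>> toPhysical(List<String> fields, int maxPerRequest) {
        List<List<String>> physical = new ArrayList<>();
        for (int i = 0; i < fields.size(); i += maxPerRequest) {
            physical.add(fields.subList(i, Math.min(i + maxPerRequest, fields.size())));
        }
        return physical;
    }

    public static void main(String[] args) {
        // Field names are made up for illustration.
        List<String> fields = List.of("motor.speed", "motor.temp", "valve.open");
        // A PLC allowing only one field per request needs three requests.
        System.out.println(toPhysical(fields, 1).size());   // 3
        // A PLC allowing many fields per request needs just one.
        System.out.println(toPhysical(fields, 16).size());  // 1
    }
}
```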
>>> 
>>> Julian
>>> 
>>> On 19.08.19, 14:44, "Danny Chan" <yuzhao....@gmail.com> wrote:
>>> 
>>>   Cool idea ! Julian Feinauer ~
>>> 
>>>   I think the Volcano model can be used as the base of the cost
>> algorithm, as long as you define all the metadata that you care about.
>> Another thing is that you should have a struct like RelNode and a method
>> like #computeSelfCost.
>>> 
>>>   Best,
>>>   Danny Chan
>>>   On Aug 19, 2019, 5:20 PM +0800, Julian Feinauer <
>> j.feina...@pragmaticminds.de> wrote:
>>>> Hi folks,
>>>> 
>>>> I’m here again with another PLC4X related question (
>> https://plc4x.apache.org).
>>>> As we have more and more use cases, we encounter situations where we
>> send LOTS of requests to PLCs, which one could sometimes optimize.
>>>> This has multiple upstream reasons (like multiple different services
>> sending requests, or wanting two logically different addresses which could
>> be physically equal).
>>>> 
>>>> So, we are considering adding some kind of optimizer which takes a batch
>> of requests and tries to arrange them in an “optimal” way with regard to
>> some cost function.
>>>> The cost functions would of course be given by each driver, but the
>> optimizer could / should be rather general (possibly with pluggable rules).
>>>> 
>>>> As Calcite's planner already includes all of that, I ask myself whether
>> it would be possible (and make sense) to use it in PLC4X.
>>>> Generally speaking, this raises the question of whether the Volcano
>> approach can be suitable for such problems.
>>>> The other alternative would be to start with some kind of
>> heuristic-based planning or with other optimization algorithms (genetic
>> algorithms, cross entropy, …).
>>>> 
>>>> Any thoughts or feedback are welcome!
>>>> 
>>>> Julian
>>> 
>>> 
>> 
>> 
>> 
>> 
> 
> 
> 
> 
