Hi Julian,

thank you very much for your insights.
Your analysis is very detailed and I agree with all your suggestions.
I have started to realize for myself the "big" differences between Calcite's use case and the PLC4X case.
I will definitely have a look into the book you recommended (I like optimization quite a lot).

So, to sum up this thread: thanks to all of you for your help and your suggestions.
I will now bring a summary of this discussion and all next steps over to the PLC4X list, in case anyone is still interested in following this topic : )

JulianF

On 23.08.19, 18:12, "Julian Hyde" <jhyde.apa...@gmail.com> wrote:

    The first step is to realize that you are looking at an optimization problem. Congratulations, you’ve done that.
    
    The next step is to evaluate the possible optimization techniques. Volcano is suitable in cases where there is dataflow - where the solution is a tree or DAG. It doesn’t seem that you have that problem. So Volcano is not a good candidate, I agree.
    
    It is important that you separate the cost model and allowable transformation rules from the optimization algorithms. Then you can try different algorithms.
    
    If I were you, I would also consult a good book on the design of optimizing compilers - e.g. the new Dragon book. They have techniques for dataflow optimization, dead code elimination, register assignment, and loop unwinding. If those problems match your problems, then maybe a compilation framework like LLVM or Graal would give you what you need.
    
    > On Aug 23, 2019, at 7:11 AM, Julian Feinauer <j.feina...@pragmaticminds.de> wrote:
    > 
    > Hi, 
    > 
    > a short update on this matter.
    > We had a meetup yesterday with the PLC4X community, and I prepared a very first "demo" implementation of the optimizer / framework.
    > I tried to organize it after the Calcite / Volcano approach.
    > 
    > We had several discussions and experimented a bit, and a few things came to my mind that make me unsure whether the Volcano framework really fits best here, or whether some other approach would.
    > 
    > * Usually we have a pretty big set of "Operators" (in our case field requests, in comparison to Calcite's RelNodes):
    > In regular cases there could be 10, but also quite often up to 100 (which is rather rare for 'sane' queries, I assume)
    > * We have very few rules:
    > In fact, we may have two or three rules (protocol specific), but usually of the form 'merge two field requests into one' or 'split one field request into two'
    > * We have no tree structure; everything is 'flat'
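To make the rule shapes concrete, here is a minimal sketch of the two rule forms over a flat set of requests. The `FieldRequest` type and its byte-addressed layout are assumptions for illustration, not the PLC4X API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FieldRequest:
    # Hypothetical flat request: a start address and a length in bytes.
    start: int
    length: int

    @property
    def end(self):
        return self.start + self.length

def merge_rule(a, b):
    """'Merge two field requests into one' if their address ranges
    overlap or are adjacent; return None if the rule does not apply."""
    if a.start > b.start:
        a, b = b, a
    if b.start <= a.end:  # overlapping or adjacent
        return FieldRequest(a.start, max(a.end, b.end) - a.start)
    return None

def split_rule(r, at):
    """'Split one field request into two' at a relative byte offset;
    return None if the offset is not strictly inside the request."""
    if not (0 < at < r.length):
        return None
    return FieldRequest(r.start, at), FieldRequest(r.start + at, r.length - at)
```

Because both rules apply pairwise to any candidates in a flat set, the number of rule applications grows combinatorially, which matches the state-space concern below.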
    > 
    > With the above setup it's pretty obvious that we cannot profit from Volcano's dynamic programming approach. Furthermore, with the simple approach of applying all suitable rules to all possible candidates, the state space explodes with O(n!) (where n could be large, see above).
    > 
    > So I think our best bet at the moment would be to either explore the whole search space (but with aggressive pruning) or use some other sort of algorithm like simple gradient descent (I think our convergence behavior should be quite nice if the cost function is "smooth" enough) or stochastic optimization like cross entropy or simulated annealing.
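For reference, the simulated-annealing alternative mentioned above can be sketched generically: a `neighbor` function would apply one random rule (e.g. a merge or split) to a candidate plan, and `cost` would be the driver's cost model. This is a plain textbook loop, not anything from PLC4X or Calcite; all parameter values are illustrative:

```python
import math
import random

def simulated_annealing(initial_plan, neighbor, cost, steps=1000, t0=1.0, alpha=0.995):
    """Generic simulated annealing: walk the plan space via `neighbor`,
    always accepting improvements and occasionally accepting regressions
    with a probability that shrinks as the temperature cools."""
    current, current_cost = initial_plan, cost(initial_plan)
    best, best_cost = current, current_cost
    t = t0
    for _ in range(steps):
        candidate = neighbor(current)
        c = cost(candidate)
        if c < current_cost or random.random() < math.exp((current_cost - c) / t):
            current, current_cost = candidate, c
        if current_cost < best_cost:
            best, best_cost = current, current_cost
        t *= alpha  # geometric cooling schedule
    return best, best_cost

# Toy usage: minimize |x - 7| starting from 0, one random +/-1 step per move.
best, best_cost = simulated_annealing(
    0, lambda x: x + random.choice([-1, 1]), lambda x: abs(x - 7))
```

Unlike exhaustive exploration, this never enumerates the O(n!) space; it only needs the two ingredients the thread already identifies: rules (as neighborhood moves) and a cost function.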
    > 
    > I just wanted to give some feedback back to the list, as many people joined the discussion and had interesting ideas. And I still think that there is an overlap in what's done, but the 'common core' is smaller than I initially assumed (e.g., some query optimizers AFAIR use approaches like simulated annealing).
    > 
    > Best
    > Julian
    > 
    > On 20.08.19, 15:38, "Julian Feinauer" <j.feina...@pragmaticminds.de> wrote:
    > 
    >    Hi Stamatis,
    > 
    >    thanks for your response.
    >    I think my brain just needs a bit more time to get really deep into those advanced planning topics (it's sometimes slow at adapting...).
    >    But I will look through it.
    >    We have a meetup this week and will discuss the matter and how to set up everything to enable some optimization at first (introducing cost estimates and such), and then I will again have a deeper look and perhaps prepare a test case or a runnable test.
    >    Then it's probably easiest to reason about.
    > 
    >    Julian
    > 
    >    On 20.08.19, 15:19, "Stamatis Zampetakis" <zabe...@gmail.com> wrote:
    > 
    >        Hi Julian F,
    > 
    >        I admit that I didn't really get your example, but talking about 'batch request optimization' and 'collapsing "overlapping" but not equal requests', I get the impression that the problem is optimizing sets of queries which may have common sub-expressions;
    >        this problem is usually referred to as multi-query optimization and is indeed relevant to the Spool operator mentioned by Julian H.
    > 
    >        If that's the case, then the most relevant work that I can think of is [1], which solves the problem by slightly modifying the search strategy of the Volcano planner.
    > 
    >        Best,
    >        Stamatis
    > 
    >        [1] Roy, Prasan, et al. "Efficient and extensible algorithms for multi query optimization." ACM SIGMOD Record, Vol. 29, No. 2, ACM, 2000. (https://www.cse.iitb.ac.in/~sudarsha/Pubs-dir/mqo-sigmod00.pdf)
    > 
    > 
    >        On Tue, Aug 20, 2019 at 12:49 PM Julian Feinauer <j.feina...@pragmaticminds.de> wrote:
    > 
    >> Hi Julian,
    >> 
    >> thanks for the reply.
    >> I have to think about that, I think.
    >> 
    >> But as I understand the Spool operator, it is there to factor out multiple calculations of the same expression.
    >> In our situation we aim more at collapsing "overlapping" but not equal requests.
    >> 
    >> Consider 8 bits which physically form a byte.
    >> If I read 8 BOOLEANs, I have 8 different requests, each of which masks one bit and returns it (padded) as a byte. So: 8 requests and 8 bytes of data transferred (plus masking on the PLC).
    >> If I optimized this to read the byte in one request and do the masking afterwards, I would have one request and only 1 byte transferred (plus no masking on the PLC, which keeps pressure low there).
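The client-side masking step described above is a one-liner; this sketch assumes LSB-first bit numbering, which is an illustrative choice rather than anything mandated by a particular PLC protocol:

```python
def unpack_bools(byte_value):
    """Decode 8 BOOLEAN fields from a single byte read, doing the
    bit masking on the client instead of on the PLC (LSB-first)."""
    return [bool((byte_value >> bit) & 1) for bit in range(8)]

# One request transferring one byte replaces eight single-bit requests:
bits = unpack_bools(0b10110001)
# bits[0], bits[4], bits[5], bits[7] are True; the rest are False.
```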
    >> 
    >> This could be modelled by introducing respective "RelNodes" and planner rules, I think, but I do not fully understand how Spool fits in here?
    >> 
    >> Julian
    >> 
    >> On 19.08.19, 20:42, "Julian Hyde" <jh...@apache.org> wrote:
    >> 
    >>    One tricky aspect is to optimize a *batch* of requests.
    >> 
    >>    The trick is to tie together the batch so that it is costed as one request. We don’t have an operator specifically for that, but you could for instance use UNION ALL. E.g. given Q1 and Q2, you could generate a plan for
    >> 
    >>      select count(*) from Q1 union all select count(*) from Q2
    >> 
    >>    If the plan for the batch is a DAG (i.e. sharing work between the components of the batch by creating something akin to “temporary tables”), then you are in the territory for which we created the Spool operator (see discussion in https://issues.apache.org/jira/browse/CALCITE-481).
    >> 
    >>    Julian
    >> 
    >> 
    >>> On Aug 19, 2019, at 6:34 AM, Julian Feinauer <j.feina...@pragmaticminds.de> wrote:
    >>> 
    >>> Hi Danny,
    >>> 
    >>> thanks for the quick reply.
    >>> Cost calculation we can of course provide (but it could be a bit different, as we have not only CPU and memory but also network and the like).
    >>> 
    >>> And also something like the RelNodes could be provided. In our case this would be "Requests", which are at first "Logical" and are then transformed into "Physical" requests. For example, the API allows you to request many fields per single request, but some PLCs only allow one field per request. So this would be one task of this layer.
    >>> 
    >>> Julian
    >>> 
    >>> On 19.08.19, 14:44, "Danny Chan" <yuzhao....@gmail.com> wrote:
    >>> 
    >>>   Cool idea ! Julian Feinauer ~
    >>> 
    >>>   I think the Volcano model can be used as the base of the cost algorithm, as long as you define all the metadata that you care about. Another thing is that you should have a struct like RelNode and a method like #computeSelfCost.
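The RelNode-plus-#computeSelfCost shape suggested here is easy to sketch. The class and field names below are hypothetical, and the weights are purely illustrative; the only real point is that a PLC4X cost could weigh network round trips and payload bytes rather than Calcite's default rows/CPU/IO components:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cost:
    # A PLC4X-style cost might count round trips and payload bytes
    # instead of rows/cpu/io; the weights below are made up.
    requests: int
    bytes_transferred: int

    def value(self, request_weight=10.0, byte_weight=1.0):
        return request_weight * self.requests + byte_weight * self.bytes_transferred

class ReadRequestNode:
    """Analogue of a RelNode carrying a #computeSelfCost-style method."""
    def __init__(self, length_in_bytes):
        self.length_in_bytes = length_in_bytes

    def compute_self_cost(self):
        return Cost(requests=1, bytes_transferred=self.length_in_bytes)
```

With such a method in place, a planner (Volcano-style or not) can compare candidate plans by summing the self-costs of their nodes.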
    >>> 
    >>>   Best,
    >>>   Danny Chan
    >>>   On 19.08.2019, 5:20 PM +0800, Julian Feinauer <j.feina...@pragmaticminds.de> wrote:
    >>>> Hi folks,
    >>>> 
    >>>> I’m here again with another PLC4X related question (https://plc4x.apache.org).
    >>>> As we have more and more use cases, we encounter situations where we send LOTS of requests to PLCs which one could sometimes optimize.
    >>>> This has multiple upstream reasons (like multiple different services sending, or wanting two logically different addresses which could be physically equal).
    >>>> 
    >>>> So, we are considering adding some kind of optimizer which takes a batch of requests and tries to arrange them in an “optimal” way with regard to some cost function.
    >>>> The cost functions would of course be given by each driver, but the optimizer could / should be rather general (possibly with pluggable rules).
    >>>> 
    >>>> As Calcite's planner already includes all of that, I ask myself whether it could be possible (and make sense) to use it in PLC4X.
    >>>> Generally speaking, this raises the question of whether the Volcano approach can be suitable for such problems.
    >>>> The other alternative would be to start with some kind of heuristic-based planning or with other optimization algorithms (genetic algorithms, cross entropy, …).
    >>>> 
    >>>> Any thoughts or feedback are welcome!
    >>>> 
    >>>> Julian
    >>> 
    >>> 
    >> 
    >> 
    >> 
    >> 
    > 
    > 
    > 
    > 
    
