Re: PLC4X Request Optimization

2019-08-26 Thread Julian Feinauer
Hi Julian,

thank you very much for your insights.
Your analysis is very detailed and I agree with all your suggestions.
For myself, I started to realize the "big" differences between Calcite's use case 
and the PLC4X case.
I will definitely have a look at the book you recommended (I like optimization 
quite a lot).

So, to sum up this thread: thanks to all of you for your help and your suggestions.
I will now bring a summary of this discussion and all next steps over to the 
PLC4X list, in case someone is still interested in following this topic :)

JulianF


Re: PLC4X Request Optimization

2019-08-23 Thread Julian Hyde
The first step is to realize that you are looking at an optimization problem. 
Congratulations, you’ve done that. 

The next step is to evaluate the possible optimization techniques. Volcano is 
suitable in cases where there is dataflow - where the solution is a tree or 
DAG. It doesn’t seem that you have that problem. So Volcano is not a good 
candidate, I agree. 

It is important that you separate cost model and allowable transformation rules 
from the optimization algorithms. Then you can try different algorithms.
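
As an illustration, such a separation could look roughly like the following
sketch (these interfaces are hypothetical, not Calcite or PLC4X API):

    import java.util.List;

    // Cost model, transformation rules and search algorithm are kept separate,
    // so different algorithms can be tried against the same model and rules.
    interface CostModel<P> {
      double cost(P plan);                 // lower is better
    }

    interface TransformationRule<P> {
      List<P> apply(P plan);               // plans reachable in one rule application
    }

    interface OptimizationAlgorithm<P> {
      P optimize(P initial, List<TransformationRule<P>> rules, CostModel<P> model);
    }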

If I were you, I would also consult a good book on the design of optimizing 
compilers - e.g. the New Dragon book. They have techniques for dataflow 
optimization, dead code elimination, register assignment, loop unwinding. If 
those problems match your problems then maybe a compilation framework like LLVM 
or Graal would give you what you need. 


Re: PLC4X Request Optimization

2019-08-23 Thread Julian Feinauer
Hi, 

a short update on this matter.
We had a meetup yesterday with the PLC4X community and I prepared a very first 
"demo" implementation of the Optimizer / Framework.
I tried to organize it after the Calcite / Volcano approach.

We had several discussions and experimented a bit, and a few things came to my 
mind that make me unsure whether the Volcano framework really fits best here or 
whether some other approach would.

* Usually we have a pretty big set of "operators" (in our case field requests, 
compared to Calcite's RelNodes):
In regular cases there could be 10, but also quite often up to 100 (which is 
rather rare for 'sane' queries, I assume)
* We have very few rules:
In fact, we may have two or three rules (protocol specific), usually of the form 
'merge two field requests into one' or 'split one field request into two' (a 
sketch of such a rule is given below)
* We have no tree structure, but everything is 'flat'
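
For illustration, a rough sketch of a 'merge two field requests into one' rule
(FieldRequest and RequestRule are made-up types here, not PLC4X classes):

    import java.util.Optional;

    final class FieldRequest {
      final int startAddress;
      final int length;                            // length in bytes
      FieldRequest(int startAddress, int length) {
        this.startAddress = startAddress;
        this.length = length;
      }
      int end() { return startAddress + length; }
    }

    interface RequestRule {
      /** Returns the merged request if the rule applies, otherwise empty. */
      Optional<FieldRequest> tryMerge(FieldRequest a, FieldRequest b);
    }

    final class MergeAdjacentRequestsRule implements RequestRule {
      @Override
      public Optional<FieldRequest> tryMerge(FieldRequest a, FieldRequest b) {
        FieldRequest lo = a.startAddress <= b.startAddress ? a : b;
        FieldRequest hi = (lo == a) ? b : a;
        if (hi.startAddress <= lo.end()) {         // adjacent or overlapping ranges
          int end = Math.max(lo.end(), hi.end());
          return Optional.of(new FieldRequest(lo.startAddress, end - lo.startAddress));
        }
        return Optional.empty();                   // gap between the two requests
      }
    }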

With the above setup it's pretty obvious that we cannot profit from Volcano's 
dynamic programming approach. Furthermore, with the simple approach of applying 
all suitable rules to all possible candidates, the state space explodes with 
O(n!) (where n could be large, see above).

So I think our best bet at the moment would be to explore the whole state space 
(but with aggressive pruning), or to use some other sort of algorithm like 
simple gradient descent (I think our convergence behavior should be quite nice 
if the cost function is "smooth" enough) or stochastic optimization like cross 
entropy or simulated annealing.
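
To make the last option concrete, here is a generic simulated-annealing sketch
(an illustration only, not existing PLC4X code); the plan type, cost function
and neighbour move would come from the protocol-specific side:

    import java.util.Random;
    import java.util.function.Function;
    import java.util.function.ToDoubleFunction;

    final class SimulatedAnnealing<P> {
      private final Random random = new Random();

      P optimize(P initial, ToDoubleFunction<P> cost, Function<P, P> neighbour,
                 double startTemp, double coolingRate, int steps) {
        P current = initial;
        P best = initial;
        double temp = startTemp;
        for (int i = 0; i < steps; i++) {
          P candidate = neighbour.apply(current);
          double delta = cost.applyAsDouble(candidate) - cost.applyAsDouble(current);
          // Always accept improvements; accept worse plans with a probability
          // that shrinks as the temperature cools down.
          if (delta < 0 || random.nextDouble() < Math.exp(-delta / temp)) {
            current = candidate;
          }
          if (cost.applyAsDouble(current) < cost.applyAsDouble(best)) {
            best = current;
          }
          temp *= coolingRate;
        }
        return best;
      }
    }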

I just wanted to give some feedback back to the list, as many people joined the 
discussion and had interesting ideas. And I still think that there is an overlap 
in what's done, but the 'common core' is smaller than I initially assumed 
(e.g., some query optimizers AFAIR use approaches like simulated annealing).

Best
Julian


Re: PLC4X Request Optimization

2019-08-20 Thread Julian Feinauer
Hi Stamatis,

thanks for your response.
I think my brain just needs a bit more time to get really deep into those 
advanced planning topics (it's sometimes slow to adapt...).
But I will look through it.
We have a meetup this week and will discuss the matter and how to set up 
everything to enable some optimization at first (introducing cost estimates and 
such), and then I will take a deeper look again and perhaps prepare a test case 
or a runnable test.
Then it's probably easiest to reason about.

Julian


Re: PLC4X Request Optimization

2019-08-20 Thread Stamatis Zampetakis
Hi Julian F,

I admit that I didn't really get your example, but talking about 'batch request 
optimization' and 'collapsing "overlapping" but not equal requests' I get the 
impression that the problem is optimizing sets of queries which may have common 
sub-expressions; that problem is usually referred to as multi-query optimization 
and is indeed relevant to the Spool operator mentioned by Julian H.

If that's the case then the most relevant work that I can think of is [1], 
which solves the problem by slightly modifying the search strategy of the 
Volcano planner.

Best,
Stamatis

[1] Roy, Prasan, et al. "Efficient and extensible algorithms for multi-query 
optimization." ACM SIGMOD Record 29.2 (2000). 
(https://www.cse.iitb.ac.in/~sudarsha/Pubs-dir/mqo-sigmod00.pdf)


Re: PLC4X Request Optimization

2019-08-20 Thread Julian Feinauer
Hi Julian,

thanks for the reply.
I have to think about that, I think.

But as I understand the Spool operator, it is there to factor out multiple 
calculations of the same expression.
In our situation we aim more at collapsing "overlapping" but not equal requests.

Consider 8 bits which physically form a byte.
If I read 8 BOOLEANs, I have 8 different requests which each mask one bit and 
return it (padded) as a byte. So that is 8 requests and 8 bytes of data transfer 
(plus masking on the PLC).
If I instead optimized it to read the byte in one request and do the masking 
afterwards, I would have one request and only 1 byte transferred (plus no 
masking on the PLC, which keeps the pressure low there).
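
A small sketch of that optimization (an illustration, not the PLC4X API;
readByteFromPlc stands in for the real protocol round trip):

    final class BitCollapseExample {
      /** Stand-in for the real transport call that reads one byte from the PLC. */
      static byte readByteFromPlc(int byteAddress) {
        return (byte) 0b0101_0011;
      }

      /** One request and one byte on the wire; the bit masking happens client-side. */
      static boolean[] readEightBooleans(int byteAddress) {
        byte raw = readByteFromPlc(byteAddress);
        boolean[] bits = new boolean[8];
        for (int bit = 0; bit < 8; bit++) {
          bits[bit] = ((raw >> bit) & 1) == 1;
        }
        return bits;
      }
    }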

This could be modelled by introducing respective "RelNodes" and planner rules, 
I think, but I do not fully understand how Spool fits in here?

Julian


Re: PLC4X Request Optimization

2019-08-19 Thread Julian Hyde
One tricky aspect is to optimize a *batch* of requests.

The trick is to tie together the batch so that it is costed as one request. We 
don’t have an operator specifically for that, but you could for instance use 
UNION ALL. E.g. given Q1 and Q2, you could generate a plan for

  select count(*) from Q1 union all select count(*) from Q2

If the plan for the batch is a DAG (i.e. sharing work between the components of 
the batch by creating something akin to “temporary tables”) then you are in the 
territory for which we created the Spool operator (see discussion in 
https://issues.apache.org/jira/browse/CALCITE-481).
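
For illustration, a minimal RelBuilder sketch of that wrapping; it assumes a
RelBuilder whose schema already contains relations registered under the given
names:

    import org.apache.calcite.rel.RelNode;
    import org.apache.calcite.tools.RelBuilder;

    final class BatchAsUnion {
      static RelNode batch(RelBuilder b, String q1, String q2) {
        return b.scan(q1)
            .aggregate(b.groupKey(), b.countStar("C"))   // select count(*) from q1
            .scan(q2)
            .aggregate(b.groupKey(), b.countStar("C"))   // select count(*) from q2
            .union(true)                                 // UNION ALL of the two inputs
            .build();
      }
    }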

Julian


Re: PLC4X Request Optimization

2019-08-19 Thread Julian Feinauer
Hi Danny,

thanks for the quick reply.
Cost calculation we can of course provide (but it could be a bit different, as 
we have not only CPU and memory but also network and the like).

And something like the RelNodes could also be provided. In our case these would 
be "Requests", which are at first "Logical" and are then transformed into 
"Physical" requests. For example, the API allows you to request many fields in 
a single request, but some PLCs only allow one field per request. So this would 
be one task of this layer.
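
A hypothetical sketch of that logical-to-physical step (the request classes
below are illustrative only, not the actual PLC4X API):

    import java.util.ArrayList;
    import java.util.List;

    final class LogicalReadRequest {
      final List<String> fieldAddresses;
      LogicalReadRequest(List<String> fieldAddresses) { this.fieldAddresses = fieldAddresses; }
    }

    final class PhysicalReadRequest {
      final List<String> fieldAddresses;
      PhysicalReadRequest(List<String> fieldAddresses) { this.fieldAddresses = fieldAddresses; }
    }

    final class LogicalToPhysical {
      /** For a PLC that accepts only one field per request, split the logical request. */
      static List<PhysicalReadRequest> split(LogicalReadRequest logical) {
        List<PhysicalReadRequest> physical = new ArrayList<>();
        for (String address : logical.fieldAddresses) {
          physical.add(new PhysicalReadRequest(List.of(address)));
        }
        return physical;
      }
    }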

Julian


Re: PLC4X Request Optimization

2019-08-19 Thread Danny Chan
Cool idea, Julian Feinauer ~

I think the Volcano model can be used as the base of the cost algorithm, as long 
as you define all the metadata that you care about. Another thing is that you 
should have a struct like RelNode and a method like #computeSelfCost.
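
A rough sketch of what that could look like (illustrative names only, not real
Calcite or PLC4X classes):

    // A RelNode-like node for requests with a #computeSelfCost analogue.
    final class RequestCost {
      final double cpu;
      final double memory;
      final double network;   // PLC requests also have a network component
      RequestCost(double cpu, double memory, double network) {
        this.cpu = cpu;
        this.memory = memory;
        this.network = network;
      }
    }

    interface RequestNode {
      /** Analogous to RelNode#computeSelfCost: the cost of this request on its own. */
      RequestCost computeSelfCost();
    }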

Best,
Danny Chan
On 2019-08-19 at 5:20 PM +0800, Julian Feinauer wrote:
> Hi folks,
>
> I’m here again with another PLC4X related question (https://plc4x.apache.org).
> As we have more and more use cases, we encounter situations where we send LOTS 
> of requests to PLCs which one could sometimes optimize.
> This has multiple upstream reasons (like multiple different services sending, 
> or you wanting two logically different addresses which could be physically 
> equal).
>
> So, we are considering adding some kind of optimizer which takes a batch of 
> requests and tries to arrange them in an “optimal” way with regard to some 
> cost function.
> The cost functions would of course be given by each Driver, but the optimizer 
> could / should be rather general (possibly with pluggable rules).
>
> As Calcite's planner already includes all of that, I ask myself whether it 
> could be possible (and make sense) to use it in PLC4X.
> Generally speaking, this raises the question of whether the Volcano approach 
> can be suitable for such problems.
> The other alternative would be to start with some kind of heuristic-based 
> planning or with other optimization algorithms (genetic algorithms, cross 
> entropy, …).
>
> Any thoughts or feedback are welcome!
>
> Julian