Re: New SQL execution engine

Seliverstov Igor Fri, 27 Sep 2019 08:08:51 -0700

Nikolay,

At last we have better questions.


There is no decision yet; this is where we should make one.

Doing nothing isn’t a decision, it’s just doing nothing.

Spark Catalyst is a good example, but under the hood it is built on exactly the 
same idea, adapted to Spark. Calcite is the same idea in a general-purpose 
form. That’s why it’s a better starting point.

Implementing an engine from scratch is really cool, but it looks like 
reinventing the wheel; I don’t think it makes sense. At least, I am against 
this option.

I added requirements to the IEP (as you asked). As you can see, it’s in DRAFT 
state and will be supplemented with details.

We have some thoughts on how to make the replacement smooth, but first we 
should decide what to replace and what to replace it with.

Right now the Calcite-based engine lives in a separate module. We checked that 
it can build an execution graph for both local and distributed cases, and it 
is easily extensible.
We talked to the Calcite community to identify possible future issues, and 
everything points to it being the best option.
It’s possible to develop it as an experimental extension at first (not a 
replacement) until we make sure that it works as expected. This way there are 
no risks for anybody who uses Ignite in a production environment.

Regards,
Igor


> On 27 Sep 2019, at 17:25, Nikolay Izhikov <[email protected]> wrote:
> 
> Igor.
> 
>> The main issue - there is no *selection*.
> 
> 1. I don't remember a community decision about this.
> 
> 2. We should avoid making such a long-term decision so quickly.
> We made this kind of decision with H2 and have come to the point where we 
> have to review it.
> 
>> 1) Implementing white papers from scratch
>> 2) Adapting Calcite to our needs.
> 
> The third option doesn't fix the issues we have with H2.
> The fourth option I know of is using spark-catalyst.
> 
> What is wrong with writing an engine from scratch?
> 
> I ask you to start with the engine requirements.
> Can we please discuss them?
> 
>> If you have an alternative - you're welcome, I'll gratefully listen to you.
> 
> We have an alternative for now - the H2-based engine.
> 
>> The main question isn't "WHAT" but "HOW" - that's the discussion topic from 
>> my point of view.
> 
> Once we make a decision about the engine, we can discuss the roadmap for the 
> replacement.
> One more time - replacing the SQL engine with something more customizable 
> makes sense to me.
> But this kind of decision needs careful discussion.
> 
> On Fri, 27/09/2019 at 17:08 +0300, Seliverstov Igor wrote:
>> Nikolay,
>> 
>> The main issue - there is no *selection*.
>> 
>> There is a field of knowledge - relational algebra - which describes how to 
>> transform relational expressions while preserving their semantics, and 
>> there are a couple of implementations (Calcite is the only one written in 
>> Java).
>> 
>> There are only two alternatives:
>> 
>> 1) Implementing white papers from scratch
>> 2) Adapting Calcite to our needs.
>> 
>> The second way was chosen by several other projects: there is experience 
>> and there is a list of known issues (like using indexes), so almost 
>> everything is already done for us.
>> 
>> Implementing a planner is a big deal; I think everybody here understands 
>> that. That's why our proposal to reuse others' experience is the obvious one.
>> 
>> If you have an alternative - you're welcome, I'll gratefully listen to you.
>> 
>> The main question isn't "WHAT" but "HOW" - that's the discussion topic from 
>> my point of view.
>> 
>> Regards,
>> Igor
>> 
>>> On 27 Sep 2019, at 16:37, Nikolay Izhikov <[email protected]> wrote:
>>> 
>>> Roman.
>>> 
>>>> Nikolay, Maxim, I understand that our arguments may not be as obvious 
>>>> to you as they are to the SQL team. So, please arrange your questions in 
>>>> a more constructive way.
>>> 
>>> What is the SQL team?
>>> I only know the Ignite community :)
>>> 
>>> Please share your knowledge in the IEP.
>>> I want to join the process of engine *selection*.
>>> It should start with the requirements for such an engine.
>>> Can you write them in the IEP, please?
>>> 
>>> My point is very simple:
>>> 
>>> 1. We made the wrong decision with H2
>>> 2. We should make a well-thought decision about the new engine.
>>> 
>>>> How many tickets would satisfy you?
>>> 
>>> You write about "issueS" with H2.
>>> All I see is one open ticket.
>>> The IEP doesn't provide enough information.
>>> So it's not about the number of tickets, it's about
>>> 
>>>> These two points (single map-reduce execution and inflexible optimizer) 
>>>> are the main problems with the current engine.
>>> 
>>> We may come to the point where Calcite (or any other engine) brings us a 
>>> third and further "main problems".
>>> This is what happened with H2.
>>> 
>>> Let's start from what we want to get with the engine and move forward from 
>>> this base.
>>> What do you think?
>>> 
>>> 
>>> 
>>> On Fri, 27/09/2019 at 16:15 +0300, Roman Kondakov wrote:
>>>> Maxim, Nikolay,
>>>> 
>>>> I've listed two issues which show the ideological flaws of the current 
>>>> engine.
>>>> 
>>>> 1. IGNITE-11448 - Open. This ticket describes the impossibility of 
>>>> executing queries which cannot fit into the hardcoded one-pass 
>>>> map-reduce paradigm.
>>>> 
>>>> 2. IGNITE-6085 - Closed (won't fix). This ticket describes the second 
>>>> major problem with the current engine: the H2 query optimizer is very 
>>>> primitive and cannot perform many useful optimizations.
>>>> 
>>>> These two points (single map-reduce execution and inflexible optimizer) 
>>>> are the main problems with the current engine. It means that our engine 
>>>> is currently suitable for executing only a very limited subset of 
>>>> typical SQL queries. For example, it cannot even run most of the TPC-H 
>>>> benchmark queries because they don't fit into the simple map-reduce 
>>>> paradigm.
>>>> 
>>>>> All I see is links to two tickets:
>>>> 
>>>> How many tickets would satisfy you? I named two. And it looks like it is 
>>>> not enough from your point of view. Ok, so how many is enough? The set 
>>>> of problems caused by the tickets listed above is infinite, so I cannot 
>>>> create a ticket for each of them.
>>>>> Tech details also should be added.
>>>> 
>>>> Tech details are in the tickets.
>>>> 
>>>>> We can't discuss such a huge change as an execution engine replacement 
>>>>> with descriptions like:
>>>>> "No data co-location control, i.e. arbitrary data can be returned 
>>>>> silently" or
>>>>> "Low control on how query executes internally, as a result we have 
>>>>> limited possibility to implement improvements/fixes."
>>>> 
>>>> Why not? Don't you understand these problems? Or do you not think they 
>>>> are problems?
>>>> 
>>>>> Let's make these descriptions more specific.
>>>> 
>>>> What do you mean by "more specific"? What are the criteria for a 
>>>> specific description?
>>>> 
>>>> 
>>>> 
>>>> Nikolay, Maxim, I understand that our arguments may not be as obvious 
>>>> to you as they are to the SQL team. So, please arrange your questions in 
>>>> a more constructive way.
>>>> 
>>>> Thank you!
