[jira] [Created] (IGNITE-12239) Transaction keys system view
Nikolay Izhikov created IGNITE-12239:

Summary: Transaction keys system view
Key: IGNITE-12239
URL: https://issues.apache.org/jira/browse/IGNITE-12239
Project: Ignite
Issue Type: Sub-task
Reporter: Nikolay Izhikov

We should export transaction keys as a system view.
Re: New SQL execution engine
Hello, Denis.

Thanks for the clarifications.
Sounds good to me.

All I'm trying to say in this thread:

Guys, please, let's take a step back and write down the requirements (what we want to get from the SQL engine) and which features and use cases are primary for us.

I'm sure you have already done this during your research.
Please share it with the community.

I'm pretty sure we will come back to this document again and again during the migration.
So a well-written design is worth it.

On Fri, 27/09/2019 at 09:10 -0700, Denis Magda wrote:
> Ignite mates, let me try to move the discussion in a constructive direction. It looks like we set the wrong context from the very beginning.
>
> Before proposing this idea to the community, some of us were discussing and researching the topic in different groups (one needs to think it through first before even suggesting changes of this magnitude). The day has come to share this idea with the whole community and outline the next actions. But (!) nobody is 100% sure that it's the right decision. Thus, this will be an *experiment*: some of our community members will develop a *prototype*, and only based on the prototype's outcomes shall we make a final decision. Igor, Roman, Ivan, Andrey, I hope that nothing has changed and we're on the same page here.
>
> Many technical and architectural reasons that justify this project have been shared, but let me throw in my perspective. There is nothing wrong with H2; it was the right choice for its time. Thanks to H2 and the Ignite SQL APIs, our project is used across hundreds of deployments that accelerate relational databases or use Ignite as a system of record. However, these days many more companies are migrating to *distributed* databases that speak SQL. For instance, if a couple of years ago 1 out of 10 use cases needed support for multi-join queries, queries with subselects, or efficient memory usage, then today it is 5 out of 10 use cases; in the foreseeable future, it will be 10 out of 10. So, the evolution is in progress: the relational world is going distributed, and it has become exhausting for both Ignite SQL maintainers and the experts who help tune it for production usage to keep pace with that evolution, mostly due to the H2 dependency. Thus, Ignite SQL has to evolve and be ready to face the future reality.
>
> Luckily, we don't need to rush, and we don't have the right to rush, because hundreds of existing users have already trusted their production environments to Ignite SQL and we need to roll out changes with such a big impact carefully. So, I'm excited that Roman, Igor, Ivan, and Andrey stepped in and agreed to be the first contributors who will be *experimenting* with the new SQL engine. Let's support them; let's connect them with the Apache Calcite community and see how this story evolves. Folks, please keep the community aware of the progress and let us know when help is needed; some of us will be ready to support with development once you create a solid foundation for the prototype.
>
> -
> Denis
>
> On Fri, Sep 27, 2019 at 1:45 AM Igor Seliverstov wrote:
>
> > Hi Igniters!
> >
> > As you might know, we currently have many open issues related to the current H2-based engine and its execution flow.
> >
> > Some of them are critical (such as the impossibility of executing particular queries), some are major (such as the impossibility of executing particular queries without first preparing your data to be collocated), and many are minor.
> >
> > Most of the issues cannot be solved without a redesign of the whole engine.
> >
> > So, here is the proposal:
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084
> >
> > I'd appreciate it if you shared your thoughts on it.
> >
> > Regards,
> > Igor
Re: New SQL execution engine
> I think, we should discuss the idea in general.

Everybody likes the idea so far :)
The issues are in the details, as usual.

On Fri, 27/09/2019 at 19:03 +0300, Seliverstov Igor wrote:
> Nikolay,
>
> > Which project hosts the Calcite-based engine?
>
> Currently the prototype lives in my personal Ignite fork. I need an appropriate ticket before pushing it to the ASF git repository.
> First, I think, we should discuss the idea in general.
>
> > Personally, I'm against supporting two independent implementations of the SQL engine for several releases.
>
> I don't like the idea of having two engines either. But even developing the engine on top of the Calcite library is still a big deal.
> I'm not sure it will be ready; no, I'm sure it WON'T be ready by the Ignite 3 release.
> That is why I mentioned the option of having two engines at the same time.
>
> > Let's start with the IEP clarification and replace the SQL engine with the best one, for Ignite's good.
>
> Of course, but in any case it's good to get familiar with a couple of examples it already describes and to clarify any additional questions the community may ask.
>
> Regards,
> Igor
>
> > On 27 Sep 2019, at 18:22, Nikolay Izhikov wrote:
> >
> > Igor.
> >
> > > There is no decision yet; this is where we should decide.
> >
> > Great.
> >
> > > Right now the Calcite-based engine lives in a separate module
> >
> > Which project hosts the Calcite-based engine?
> >
> > > It's possible to develop it as an experimental extension at first (not a replacement)
> >
> > For me, Ignite 3 is the place where the new engine has to go.
> > Personally, I'm against supporting two independent implementations of the SQL engine for several releases.
> >
> > Ignite has too many partially implemented features to include one more :)
> >
> > Let's start with the IEP clarification and replace the SQL engine with the best one, for Ignite's good.
> >
> > On Fri, 27/09/2019 at 18:08 +0300, Seliverstov Igor wrote:
> > > Nikolay,
> > >
> > > At last we have better questions.
> > >
> > > There is no decision yet; this is where we should decide.
> > >
> > > Doing nothing isn't a decision, it's just doing nothing.
> > >
> > > Spark Catalyst is a good example, but under the hood it is built on exactly the same idea, only adapted to Spark. Calcite is the same, but general-purpose. That's why it's a better starting point.
> > >
> > > Implementing an engine from scratch is really cool, but it looks like reinventing the wheel; I don't think it makes sense. At least, I am against this option.
> > >
> > > I added requirements to the IEP (as you asked); as you can see, it's in DRAFT state and will be filled in with details.
> > >
> > > We have some thoughts on how to make the replacement smooth, but first we should decide what to replace and what to replace it with.
> > >
> > > Right now the Calcite-based engine lives in a separate module. We checked that it can build an execution graph for both local and distributed cases, and it has good extensibility.
> > > We talked to the Calcite community to identify possible future issues, and everything points to it being the best option.
> > > It's possible to develop it as an experimental extension at first (not a replacement) until we make sure that it works as expected. This way there are no risks for anybody who uses Ignite in a production environment.
> > >
> > > Regards,
> > > Igor
> > >
> > > > On 27 Sep 2019, at 17:25, Nikolay Izhikov wrote:
> > > >
> > > > Igor.
> > > >
> > > > > The main issue - there is no *selection*.
> > > >
> > > > 1. I don't remember a community decision about this.
> > > >
> > > > 2. We should avoid making such a long-term decision so quickly.
> > > > We made this kind of decision with H2 and have come to the point where we have to review it.
> > > >
> > > > > 1) Implementing white papers from scratch
> > > > > 2) Adapting Calcite to our needs.
> > > >
> > > > The third option: don't fix the issues we have with H2.
> > > > The fourth option I know of is using spark-catalyst.
> > > >
> > > > What is wrong with writing an engine from scratch?
> > > >
> > > > I ask you to start with the engine requirements.
> > > > Can we please discuss them?
> > > >
> > > > > If you have an alternative - you're welcome, I'll gratefully listen to you.
> > > >
> > > > We have an alternative for now: the H2-based engine.
> > > >
> > > > > The main question isn't "WHAT" but "HOW" - that's the discussion topic from my point of view.
> > > >
> > > > Once we make a decision about the engine, we can discuss a roadmap for the replacement.
> > > > One more time: replacing the SQL engine with a more customizable one makes sense to me.
> > > > But this kind of decision needs careful discussion.
> > > >
> > > > On Fri, 27/09/2019 at 17:08 +0300, Seliverstov Igor wrote:
> > > > > Nikolay,
> > > > >
> > > > > The main issue - there is no *selection*.
> > > > >
> > > > > There is a field of knowledge -
Re: New SQL execution engine
Ignite mates, let me try to move the discussion in a constructive direction. It looks like we set the wrong context from the very beginning.

Before proposing this idea to the community, some of us were discussing and researching the topic in different groups (one needs to think it through first before even suggesting changes of this magnitude). The day has come to share this idea with the whole community and outline the next actions. But (!) nobody is 100% sure that it's the right decision. Thus, this will be an *experiment*: some of our community members will develop a *prototype*, and only based on the prototype's outcomes shall we make a final decision. Igor, Roman, Ivan, Andrey, I hope that nothing has changed and we're on the same page here.

Many technical and architectural reasons that justify this project have been shared, but let me throw in my perspective. There is nothing wrong with H2; it was the right choice for its time. Thanks to H2 and the Ignite SQL APIs, our project is used across hundreds of deployments that accelerate relational databases or use Ignite as a system of record. However, these days many more companies are migrating to *distributed* databases that speak SQL. For instance, if a couple of years ago 1 out of 10 use cases needed support for multi-join queries, queries with subselects, or efficient memory usage, then today it is 5 out of 10 use cases; in the foreseeable future, it will be 10 out of 10. So, the evolution is in progress: the relational world is going distributed, and it has become exhausting for both Ignite SQL maintainers and the experts who help tune it for production usage to keep pace with that evolution, mostly due to the H2 dependency. Thus, Ignite SQL has to evolve and be ready to face the future reality.

Luckily, we don't need to rush, and we don't have the right to rush, because hundreds of existing users have already trusted their production environments to Ignite SQL and we need to roll out changes with such a big impact carefully. So, I'm excited that Roman, Igor, Ivan, and Andrey stepped in and agreed to be the first contributors who will be *experimenting* with the new SQL engine. Let's support them; let's connect them with the Apache Calcite community and see how this story evolves. Folks, please keep the community aware of the progress and let us know when help is needed; some of us will be ready to support with development once you create a solid foundation for the prototype.

-
Denis

On Fri, Sep 27, 2019 at 1:45 AM Igor Seliverstov wrote:

> Hi Igniters!
>
> As you might know, we currently have many open issues related to the current H2-based engine and its execution flow.
>
> Some of them are critical (such as the impossibility of executing particular queries), some are major (such as the impossibility of executing particular queries without first preparing your data to be collocated), and many are minor.
>
> Most of the issues cannot be solved without a redesign of the whole engine.
>
> So, here is the proposal:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084
>
> I'd appreciate it if you shared your thoughts on it.
>
> Regards,
> Igor
Re: New SQL execution engine
Nikolay,

> Which project hosts the Calcite-based engine?

Currently the prototype lives in my personal Ignite fork. I need an appropriate ticket before pushing it to the ASF git repository.
First, I think, we should discuss the idea in general.

> Personally, I'm against supporting two independent implementations of the SQL engine for several releases.

I don't like the idea of having two engines either. But even developing the engine on top of the Calcite library is still a big deal.
I'm not sure it will be ready; no, I'm sure it WON'T be ready by the Ignite 3 release.
That is why I mentioned the option of having two engines at the same time.

> Let's start with the IEP clarification and replace the SQL engine with the best one, for Ignite's good.

Of course, but in any case it's good to get familiar with a couple of examples it already describes and to clarify any additional questions the community may ask.

Regards,
Igor

> On 27 Sep 2019, at 18:22, Nikolay Izhikov wrote:
>
> Igor.
>
>> There is no decision yet; this is where we should decide.
>
> Great.
>
>> Right now the Calcite-based engine lives in a separate module
>
> Which project hosts the Calcite-based engine?
>
>> It's possible to develop it as an experimental extension at first (not a replacement)
>
> For me, Ignite 3 is the place where the new engine has to go.
> Personally, I'm against supporting two independent implementations of the SQL engine for several releases.
>
> Ignite has too many partially implemented features to include one more :)
>
> Let's start with the IEP clarification and replace the SQL engine with the best one, for Ignite's good.
>
> On Fri, 27/09/2019 at 18:08 +0300, Seliverstov Igor wrote:
>> Nikolay,
>>
>> At last we have better questions.
>>
>> There is no decision yet; this is where we should decide.
>>
>> Doing nothing isn't a decision, it's just doing nothing.
>>
>> Spark Catalyst is a good example, but under the hood it is built on exactly the same idea, only adapted to Spark. Calcite is the same, but general-purpose. That's why it's a better starting point.
>>
>> Implementing an engine from scratch is really cool, but it looks like reinventing the wheel; I don't think it makes sense. At least, I am against this option.
>>
>> I added requirements to the IEP (as you asked); as you can see, it's in DRAFT state and will be filled in with details.
>>
>> We have some thoughts on how to make the replacement smooth, but first we should decide what to replace and what to replace it with.
>>
>> Right now the Calcite-based engine lives in a separate module. We checked that it can build an execution graph for both local and distributed cases, and it has good extensibility.
>> We talked to the Calcite community to identify possible future issues, and everything points to it being the best option.
>> It's possible to develop it as an experimental extension at first (not a replacement) until we make sure that it works as expected. This way there are no risks for anybody who uses Ignite in a production environment.
>>
>> Regards,
>> Igor
>>
>>> On 27 Sep 2019, at 17:25, Nikolay Izhikov wrote:
>>>
>>> Igor.
>>>
>>>> The main issue - there is no *selection*.
>>>
>>> 1. I don't remember a community decision about this.
>>>
>>> 2. We should avoid making such a long-term decision so quickly.
>>> We made this kind of decision with H2 and have come to the point where we have to review it.
>>>
>>>> 1) Implementing white papers from scratch
>>>> 2) Adapting Calcite to our needs.
>>>
>>> The third option: don't fix the issues we have with H2.
>>> The fourth option I know of is using spark-catalyst.
>>>
>>> What is wrong with writing an engine from scratch?
>>>
>>> I ask you to start with the engine requirements.
>>> Can we please discuss them?
>>>
>>>> If you have an alternative - you're welcome, I'll gratefully listen to you.
>>>
>>> We have an alternative for now: the H2-based engine.
>>>
>>>> The main question isn't "WHAT" but "HOW" - that's the discussion topic from my point of view.
>>>
>>> Once we make a decision about the engine, we can discuss a roadmap for the replacement.
>>> One more time: replacing the SQL engine with a more customizable one makes sense to me.
>>> But this kind of decision needs careful discussion.
>>>
>>> On Fri, 27/09/2019 at 17:08 +0300, Seliverstov Igor wrote:
>>>> Nikolay,
>>>>
>>>> The main issue - there is no *selection*.
>>>>
>>>> There is a field of knowledge - relational algebra, which describes how to transform relational expressions while preserving their semantics, and a couple of implementations (Calcite is the only one written in Java).
>>>>
>>>> There are only two alternatives:
>>>>
>>>> 1) Implementing white papers from scratch
>>>> 2) Adapting Calcite to our needs.
>>>>
>>>> The second way was chosen by several other projects; there is experience and there is a list of known issues (like using indexes), so almost everything is already done for us.
>>>>
>>>> Implementing a planner is a big deal, I think anybody understands it
Re: Text queries/indexes (GridLuceneIndex, @QueryTextField)
Yuriy,

Thank you for providing the details! Quite interesting.

Yes, we already support distributed limit and merging of sorted sub-results for SQL queries. E.g. ReduceIndexSorted and MergeStreamIterator are used for merging sorted streams.

Could you please also clarify the score/relevance part? Is it provided by the Lucene engine for each query result? I am thinking about how to do a sorted merge properly in this case.

On Wed, 25 Sep 2019 at 18:56, Yuriy Shuliga wrote:
>
> Ivan,
>
> Thank you for the interesting question!
>
> Text searches (or full-text searches) are mostly human-oriented, and the user's interest is focused on the topmost part of the response.
> The user can then read it, evaluate it, and use the returned records for further purposes.
>
> In our case in particular, we use Ignite for operations with financial data, and there is a lot of text there, like asset names, financial instruments, companies, etc.
> In order to work with this quickly and reliably, users are used to text search, type-ahead completion, and suggestions.
>
> For these purposes we index particular string data in separate caches.
>
> Sorting capabilities and response size limits are very important there, as our API has to provide the most relevant information within a limited size.
>
> Now let me comment from the Ignite/Lucene perspective.
> Actually, Ignite queries Lucene, and Lucene returns *TopDocs.scoreDocs* already sorted by *score* (relevance), so the most relevant documents are on top.
> Currently, however, distributed query responses from different nodes are merged into the final query cursor queue in an arbitrary order.
> So in fact the score order is already ruined here. Also, Ignite requests all possible documents from Lucene, which is redundant and not good for performance.
>
> I'm implementing a *limit* parameter as part of *TextQuery* and have to note that we still need to add sorting to text query processing in order to get usable results.
>
> The *limit* parameter by itself should alleviate some of the issues above, but sorting by document score, at least, should definitely be implemented along with limit.
>
> This is a pretty short commentary; if you still have any questions, please do not hesitate to ask.
>
> BR,
> Yuriy Shuliha
>
> On Thu, 19 Sep 2019 at 11:38, Павлухин Иван wrote:
>
> > Yuriy,
> >
> > I greatly appreciate your interest.
> >
> > Could you please elaborate a little on sorting? What tasks does it help to solve, and how? It would be great if you could provide an example.
> >
> > On Wed, 18 Sep 2019 at 09:39, Alexei Scherbakov <alexey.scherbak...@gmail.com> wrote:
> > >
> > > Denis,
> > >
> > > I like the idea of throwing an exception when text queries are enabled on persistent caches.
> > >
> > > Also, I'm fine with the proposed limit for unsorted searches.
> > >
> > > Yury, please proceed with ticket creation.
> > >
> > > On Tue, 17 Sep 2019 at 22:06, Denis Magda wrote:
> > >
> > > > Igniters,
> > > >
> > > > I see nothing wrong with Yury's proposal regarding the evolution of the full-text search API, as long as Yury is ready to push it forward.
> > > >
> > > > As for the in-memory-only mode, it makes total sense for in-memory data grid deployments where Ignite caches the data of an underlying DB like Postgres.
> > > > As part of the changes, I would simply throw an exception (by default) if one attempts to use text indices with native persistence enabled.
> > > > If the person is ready to live with that limitation, then an explicit configuration change is needed to get around the exception.
> > > >
> > > > Thoughts?
> > > >
> > > > -
> > > > Denis
> > > >
> > > > On Tue, Sep 17, 2019 at 7:44 AM Yuriy Shuliga wrote:
> > > >
> > > > > Hello to all again,
> > > > >
> > > > > Thank you for the important comments and notes given below!
> > > > >
> > > > > Let me answer and continue the discussion.
> > > > >
> > > > > (I) Overall needs in Lucene indexing
> > > > >
> > > > > Alexei has referred to https://issues.apache.org/jira/browse/IGNITE-5371, where the absence of index persistence was declared an obstacle to further development.
> > > > >
> > > > > a) This ticket is already closed as not valid.
> > > > > b) There are definite needs (in our project as well) for purely in-memory indexing of selected data.
> > > > > We intend to use search capabilities to fetch a limited number of records for use in type-ahead search / suggestions.
> > > > > Not all of the data will be indexed, and there is no need for the Lucene index to be persistent. I hope this is a common pattern of text-search usage.
> > > > >
> > > > > (II) Necessary fixes in current implementation.
> > > > >
> > > > > a) Implementation of a correct *limit* (*offset* does not seem to be required in text-search tasks for now)
> > > > > I have investigated the data flow for distributed text queries. it was simple
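
The score-ordered merge with a limit discussed in this thread can be sketched roughly as follows. This is a minimal illustration, not Ignite's or Lucene's code: ScoredMergeSketch, Hit, Cursor and mergeTopK are hypothetical names, each node's response is assumed to arrive already sorted by descending score, and scores from different nodes are assumed to be directly comparable (which may not hold if each node computes relevance from its own local index statistics).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch; these types are not part of the Ignite or Lucene APIs. */
final class ScoredMergeSketch {
    /** One text-search hit: a cache key plus the relevance score reported by a node. */
    static final class Hit {
        final String key;
        final float score;

        Hit(String key, float score) {
            this.key = key;
            this.score = score;
        }
    }

    /** Read position inside one node's response (assumed sorted by descending score). */
    private static final class Cursor {
        final List<Hit> hits;
        int pos;

        Cursor(List<Hit> hits) {
            this.hits = hits;
        }

        Hit head() {
            return hits.get(pos);
        }
    }

    /**
     * Merges per-node responses into one list ordered by descending score and
     * stops as soon as {@code limit} hits have been collected.
     */
    static List<Hit> mergeTopK(List<List<Hit>> perNode, int limit) {
        // Max-heap ordered by the score of each cursor's current head.
        PriorityQueue<Cursor> heap = new PriorityQueue<>(
            Comparator.comparingDouble((Cursor c) -> c.head().score).reversed());

        for (List<Hit> hits : perNode) {
            if (!hits.isEmpty())
                heap.add(new Cursor(hits));
        }

        List<Hit> merged = new ArrayList<>();

        while (merged.size() < limit && !heap.isEmpty()) {
            Cursor best = heap.poll();

            merged.add(best.head());
            best.pos++;

            // Put the cursor back only if its node still has hits left.
            if (best.pos < best.hits.size())
                heap.add(best);
        }

        return merged;
    }
}
```

With such a merge, each node would only need to return its own top `limit` hits, since no lower-ranked hit from a node can appear in the global top `limit`.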
Re: New SQL execution engine
Igor.

> There is no decision yet; this is where we should decide.

Great.

> Right now the Calcite-based engine lives in a separate module

Which project hosts the Calcite-based engine?

> It's possible to develop it as an experimental extension at first (not a replacement)

For me, Ignite 3 is the place where the new engine has to go.
Personally, I'm against supporting two independent implementations of the SQL engine for several releases.

Ignite has too many partially implemented features to include one more :)

Let's start with the IEP clarification and replace the SQL engine with the best one, for Ignite's good.

On Fri, 27/09/2019 at 18:08 +0300, Seliverstov Igor wrote:
> Nikolay,
>
> At last we have better questions.
>
> There is no decision yet; this is where we should decide.
>
> Doing nothing isn't a decision, it's just doing nothing.
>
> Spark Catalyst is a good example, but under the hood it is built on exactly the same idea, only adapted to Spark. Calcite is the same, but general-purpose. That's why it's a better starting point.
>
> Implementing an engine from scratch is really cool, but it looks like reinventing the wheel; I don't think it makes sense. At least, I am against this option.
>
> I added requirements to the IEP (as you asked); as you can see, it's in DRAFT state and will be filled in with details.
>
> We have some thoughts on how to make the replacement smooth, but first we should decide what to replace and what to replace it with.
>
> Right now the Calcite-based engine lives in a separate module. We checked that it can build an execution graph for both local and distributed cases, and it has good extensibility.
> We talked to the Calcite community to identify possible future issues, and everything points to it being the best option.
> It's possible to develop it as an experimental extension at first (not a replacement) until we make sure that it works as expected. This way there are no risks for anybody who uses Ignite in a production environment.
>
> Regards,
> Igor
>
> > On 27 Sep 2019, at 17:25, Nikolay Izhikov wrote:
> >
> > Igor.
> >
> > > The main issue - there is no *selection*.
> >
> > 1. I don't remember a community decision about this.
> >
> > 2. We should avoid making such a long-term decision so quickly.
> > We made this kind of decision with H2 and have come to the point where we have to review it.
> >
> > > 1) Implementing white papers from scratch
> > > 2) Adapting Calcite to our needs.
> >
> > The third option: don't fix the issues we have with H2.
> > The fourth option I know of is using spark-catalyst.
> >
> > What is wrong with writing an engine from scratch?
> >
> > I ask you to start with the engine requirements.
> > Can we please discuss them?
> >
> > > If you have an alternative - you're welcome, I'll gratefully listen to you.
> >
> > We have an alternative for now: the H2-based engine.
> >
> > > The main question isn't "WHAT" but "HOW" - that's the discussion topic from my point of view.
> >
> > Once we make a decision about the engine, we can discuss a roadmap for the replacement.
> > One more time: replacing the SQL engine with a more customizable one makes sense to me.
> > But this kind of decision needs careful discussion.
> >
> > On Fri, 27/09/2019 at 17:08 +0300, Seliverstov Igor wrote:
> > > Nikolay,
> > >
> > > The main issue - there is no *selection*.
> > >
> > > There is a field of knowledge - relational algebra, which describes how to transform relational expressions while preserving their semantics, and a couple of implementations (Calcite is the only one written in Java).
> > >
> > > There are only two alternatives:
> > >
> > > 1) Implementing white papers from scratch
> > > 2) Adapting Calcite to our needs.
> > >
> > > The second way was chosen by several other projects; there is experience and there is a list of known issues (like using indexes), so almost everything is already done for us.
> > >
> > > Implementing a planner is a big deal; I think everybody here understands that. That's why our proposal to reuse others' experience is the obvious one.
> > >
> > > If you have an alternative - you're welcome, I'll gratefully listen to you.
> > >
> > > The main question isn't "WHAT" but "HOW" - that's the discussion topic from my point of view.
> > >
> > > Regards,
> > > Igor
> > >
> > > > On 27 Sep 2019, at 16:37, Nikolay Izhikov wrote:
> > > >
> > > > Roman.
> > > >
> > > > > Nikolay, Maxim, I understand that our arguments may not be as obvious to you as they are to the SQL team. So, please phrase your questions in a more constructive way.
> > > >
> > > > What is the SQL team?
> > > > I only know the Ignite community :)
> > > >
> > > > Please share your knowledge in the IEP.
> > > > I want to join the process of engine *selection*.
> > > > It should start with the requirements for such an engine.
> > > > Can you write them in the IEP, please?
> > > >
> > > > My point is very simple:
> > > >
> > > > 1. We made the wrong decision w
Re: New SQL execution engine
Hello, Andrey.

Thanks, it's clearer now.

> I agree, we should first make the IEP clear to everyone in the community who wants to be involved in its implementation.

Great!
Looking forward to the IEP clarification.

On Fri, 27/09/2019 at 18:07 +0300, Andrey Mashenkov wrote:
> Nikolay, Igor.
>
> Implementing from scratch is an option, of course.
> If we decide to go this way, then we definitely won't want to spend long nights inventing "yet another SQL parser" with all the stuff related to query rewrite rules (e.g. IN -> JOIN) or type casting \ validation \ conversion.
>
> We thought about replacing H2 step by step.
> 1. We tried to make a POC that replaced the parser with one generated from an SQL grammar with ASM, but this approach looked slow, AFAIR. Gridgainers, does anybody have something on this?
>
> 2. Then we need a planner with all the rules.
> Of course, we will need to write rules optimized for "distributed" execution in any case, but I doubt anybody wants to write the common rules that Calcite already has.
> We can copy-paste, but what for?
>
> 3. Then we have to implement an execution pipeline.
> Possibly we can adapt new query plans for H2 execution, but then we will still have the same pain of resolving H2 internal issues (e.g. OOM).
> The H2 approach is outdated; it doesn't fit Ignite's needs as a distributed system.
>
> With Calcite we can concentrate on points 2 and (mostly) 3 and reuse its architectural abstractions; otherwise we would have to reinvent those abstractions through long discussions on the dev list.
>
> I agree, we should first make the IEP clear to everyone in the community who wants to be involved in its implementation.
> Both approaches ("from scratch" and "with Calcite") are risky, so
>
> Can we try to make a "beta" implementation of an additional engine and allow users to fall back to the old engine until the new one is judged mature enough?
>
> On Fri, Sep 27, 2019 at 5:08 PM Seliverstov Igor wrote:
>
> > Nikolay,
> >
> > The main issue - there is no *selection*.
> >
> > There is a field of knowledge - relational algebra, which describes how to transform relational expressions while preserving their semantics, and a couple of implementations (Calcite is the only one written in Java).
> >
> > There are only two alternatives:
> >
> > 1) Implementing white papers from scratch
> > 2) Adapting Calcite to our needs.
> >
> > The second way was chosen by several other projects; there is experience and there is a list of known issues (like using indexes), so almost everything is already done for us.
> >
> > Implementing a planner is a big deal; I think everybody here understands that. That's why our proposal to reuse others' experience is the obvious one.
> >
> > If you have an alternative - you're welcome, I'll gratefully listen to you.
> >
> > The main question isn't "WHAT" but "HOW" - that's the discussion topic from my point of view.
> >
> > Regards,
> > Igor
> >
> > > On 27 Sep 2019, at 16:37, Nikolay Izhikov wrote:
> > >
> > > Roman.
> > >
> > > > Nikolay, Maxim, I understand that our arguments may not be as obvious to you as they are to the SQL team. So, please phrase your questions in a more constructive way.
> > >
> > > What is the SQL team?
> > > I only know the Ignite community :)
> > >
> > > Please share your knowledge in the IEP.
> > > I want to join the process of engine *selection*.
> > > It should start with the requirements for such an engine.
> > > Can you write them in the IEP, please?
> > >
> > > My point is very simple:
> > >
> > > 1. We made the wrong decision with H2
> > > 2. We should make a well-thought-out decision about the new engine.
> > >
> > > > How many tickets would satisfy you?
> > >
> > > You write about "issueS" with H2.
> > > All I see is one open ticket.
> > > The IEP doesn't provide enough information.
> > > So it's not about the number of tickets, it's about
> > >
> > > > These two points (single map-reduce execution and inflexible optimizer) are the main problems with the current engine.
> > >
> > > We may come to the point where Calcite (or any other engine) brings us a third and further "main problems".
> > > This is how it happened with H2.
> > >
> > > Let's start from what we want to get from the engine and move forward from that base.
> > > What do you think?
> > >
> > > On Fri, 27/09/2019 at 16:15 +0300, Roman Kondakov wrote:
> > > > Maxim, Nikolay,
> > > >
> > > > I've listed two issues that show the ideological flaws of the current engine.
> > > >
> > > > 1. IGNITE-11448 - Open. This ticket describes the impossibility of executing queries that do not fit into the hardcoded one-pass map-reduce paradigm.
> > > >
> > > > 2. IGNITE-6085 - Closed (won't fix). This ticket describes the second major problem with the current engine: the H2 query optimizer is very primitive and cannot perform many useful optimizations.
> > > >
> > > > These two
Re: New SQL execution engine
Folks, especially Ignite PMCs,

Are there any plans for how Ignite SQL will evolve? This is a very interesting thread on how Ignite SQL as a product will be developed in the near future, e.g. which new standards it will support.

According to the documentation, Ignite complies with SQL ANSI-99 [2], but in fact (correct me if I'm wrong) it doesn't support recursive queries [1] (the issue mentioned by Andrey), right? Will the new engine be able to solve this?

[1] https://issues.apache.org/jira/browse/IGNITE-5475
[2] http://ignite.apache.org/use-cases/database/sql-database.html

On Fri, 27 Sep 2019 at 17:22, Nikolay Izhikov wrote: > > Igor. > > > The main issue - there is no *selection*. > > 1. I don't remember community decision about this. > > 2. We should avoid to make such long-term decision so quickly. > We done this kind of decision with H2 and come to the point when we should > review it. > > > 1) Implementing white papers from scratch > > 2) Adopting Calcite to our needs. > > The third option don't fix issues we have with H2. > The fourth option I know is using spark-catalyst. > > What is wrong with writing engine from scratch? > > I ask you to start with engine requirements. > Can we, please, discuss it? > > > If you have an alternative - you're welcome, I'll gratefully listen to you. > > We have alternative for now - H2 based engine. > > > The main question isn't "WHAT" but "HOW" - that's the discussion topic from > > my point of view. > > When we make a decision about engine we can discuss roadmap for replacement. > One more time - replacement of SQL engine to some more customizable make > sense for me. > But, this kind of decisions need carefull discussion. > > On Fri, 27/09/2019 at 17:08 +0300, Seliverstov Igor wrote: > > Nikolay, > > > > The main issue - there is no *selection*.
> > > > There is a field of knowledge - relational algebra, which describes how to > > transform relational expressions saving their semantics, and a couple of > > implementations (Calcite is only one written in Java). > > > > There are only two alternatives: > > > > 1) Implementing white papers from scratch > > 2) Adopting Calcite to our needs. > > > > The second way was chosen by several other projects, there is experience, > > there is a list of known issues (like using indexes) so, almost everything > > is already done for us. > > > > Implementing a planner is a big deal, I think anybody understands it there. > > That's why our proposal to reuse others experience is obvious. > > > > If you have an alternative - you're welcome, I'll gratefully listen to you. > > > > The main question isn't "WHAT" but "HOW" - that's the discussion topic from > > my point of view. > > > > Regards, > > Igor > > > > > 27 сент. 2019 г., в 16:37, Nikolay Izhikov > > > написал(а): > > > > > > Roman. > > > > > > > Nikolay, Maxim, I understand that our arguments may not be as obvious > > > > for you as it obvious for SQL team. So, please arrange your questions in > > > > a more constructive way. > > > > > > What is SQL team? > > > I only know Ignite community :) > > > > > > Please, share you knowledge in IEP. > > > I want to join to the process of engine *selection*. > > > It should start with the requirements to such engine. > > > Can you write it in IEP, please? > > > > > > My point is very simple: > > > > > > 1. We made the wrong decision with H2 > > > 2. We should make a well-thought decision about the new engine. > > > > > > > How many tickets would satisfy you? > > > > > > You write about "issueS" with the H2. > > > All I see is one open ticket. > > > IEP doesn't provide enough information. 
> > > So it's not about the number of tickets, it's about > > > > > > > These two points (single map-reduce execution and inflexible optimizer) > > > > are the main problems with the current engine. > > > > > > We may come to the point when Calcite(or any other engine) brings us > > > third and other "main problems". > > > This is how it happens with H2. > > > > > > Let's start from what we want to get with the engine and move forward > > > from this base. > > > What do you think? > > > > > > > > > > > > В Пт, 27/09/2019 в 16:15 +0300, Roman Kondakov пишет: > > > > Maxim, Nikolay, > > > > > > > > I've listed two issues which show the ideological flaws of the current > > > > engine. > > > > > > > > 1. IGNITE-11448 - Open. This ticket describes the impossibility of > > > > executing queries which can not be fit in the hardcoded one pass > > > > map-reduce paradigm. > > > > > > > > 2. IGNITE-6085 - Closed (won't fix) - This ticket describes the second > > > > major problem with the current engine: H2 query optimizer is very > > > > primitive and can not perform many useful optimizations. > > > > > > > > These two points (single map-reduce execution and inflexible optimizer) > > > > are the main problems with the current engine. It means that our engine > > > > is currently suitable for execution only a very limited subset of the > > > > typical SQL queri
Re: New SQL execution engine
Nikolay,

At last we have better questions.

No decision has been made yet; this thread is where we should make it. Doing nothing isn’t a decision, it’s just doing nothing.

Spark Catalyst is a good example, but under the hood it is built on exactly the same idea, only adapted to Spark. Calcite is the same idea in a general form. That’s why it’s the better starting point.

Implementing an engine from scratch is really cool, but it looks like reinventing the wheel, and I don’t think it makes sense. At least, I am against this option.

I added requirements to the IEP (as you asked). As you can see, it’s in DRAFT state and will be extended with details.

We have some thoughts on how to make the replacement smooth, but first we should decide what to replace and what to replace it with.

Right now the Calcite-based engine lives in a separate module. We checked that it can build an execution graph for both local and distributed cases, and it is easy to extend.

We talked to the Calcite community to identify possible future issues, and everything points to it being the best option.

It’s possible to develop it as an experimental extension at first (not a replacement) until we are sure that it works as expected. This way there are no risks for anybody who runs Ignite in a production environment.

Regards,
Igor

> On 27 Sep 2019, at 17:25, Nikolay Izhikov wrote: > > Igor. > >> The main issue - there is no *selection*. > > 1. I don't remember community decision about this. > > 2. We should avoid to make such long-term decision so quickly. > We done this kind of decision with H2 and come to the point when we should > review it. > >> 1) Implementing white papers from scratch >> 2) Adopting Calcite to our needs. > > The third option don't fix issues we have with H2. > The fourth option I know is using spark-catalyst. > > What is wrong with writing engine from scratch? > > I ask you to start with engine requirements. > Can we, please, discuss it? > >> If you have an alternative - you're welcome, I'll gratefully listen to you. > > We have alternative for now - H2 based engine.
> >> The main question isn't "WHAT" but "HOW" - that's the discussion topic from >> my point of view. > > When we make a decision about engine we can discuss roadmap for replacement. > One more time - replacement of SQL engine to some more customizable make > sense for me. > But, this kind of decisions need carefull discussion. > > В Пт, 27/09/2019 в 17:08 +0300, Seliverstov Igor пишет: >> Nikolay, >> >> The main issue - there is no *selection*. >> >> There is a field of knowledge - relational algebra, which describes how to >> transform relational expressions saving their semantics, and a couple of >> implementations (Calcite is only one written in Java). >> >> There are only two alternatives: >> >> 1) Implementing white papers from scratch >> 2) Adopting Calcite to our needs. >> >> The second way was chosen by several other projects, there is experience, >> there is a list of known issues (like using indexes) so, almost everything >> is already done for us. >> >> Implementing a planner is a big deal, I think anybody understands it there. >> That's why our proposal to reuse others experience is obvious. >> >> If you have an alternative - you're welcome, I'll gratefully listen to you. >> >> The main question isn't "WHAT" but "HOW" - that's the discussion topic from >> my point of view. >> >> Regards, >> Igor >> >>> 27 сент. 2019 г., в 16:37, Nikolay Izhikov написал(а): >>> >>> Roman. >>> Nikolay, Maxim, I understand that our arguments may not be as obvious for you as it obvious for SQL team. So, please arrange your questions in a more constructive way. >>> >>> What is SQL team? >>> I only know Ignite community :) >>> >>> Please, share you knowledge in IEP. >>> I want to join to the process of engine *selection*. >>> It should start with the requirements to such engine. >>> Can you write it in IEP, please? >>> >>> My point is very simple: >>> >>> 1. We made the wrong decision with H2 >>> 2. We should make a well-thought decision about the new engine. 
>>> How many tickets would satisfy you? >>> >>> You write about "issueS" with the H2. >>> All I see is one open ticket. >>> IEP doesn't provide enough information. >>> So it's not about the number of tickets, it's about >>> These two points (single map-reduce execution and inflexible optimizer) are the main problems with the current engine. >>> >>> We may come to the point when Calcite(or any other engine) brings us third >>> and other "main problems". >>> This is how it happens with H2. >>> >>> Let's start from what we want to get with the engine and move forward from >>> this base. >>> What do you think? >>> >>> >>> >>> В Пт, 27/09/2019 в 16:15 +0300, Roman Kondakov пишет: Maxim, Nikolay, I've listed two issues which show the ideological flaws of the current engine. 1. IGNITE-11448 - Open. This ticket describes the impossibility of executing queries which can
Re: New SQL execution engine
Nikolay, Igor.

Implementing from scratch is an option, of course. But if we decide to go this way, we definitely won't want to spend long nights inventing "yet another SQL parser" with everything that comes with it: query rewrite rules (e.g. IN -> JOIN), type casting, validation and conversion.

We thought about replacing H2 step by step.

1. We tried to build a POC that replaces the parser with one generated from an SQL grammar with ASM, but this approach looked slow, AFAIR. GridGainers, does anybody have something on this?
2. Then we need a planner with all the rules. Of course, we will have to write rules optimized for distributed execution anyway, but I doubt anybody wants to write the common rules that Calcite already has. We could copy-paste them, but what for?
3. Then we have to implement the execution pipeline. Possibly we could adapt the new query plans for H2 execution, but then we would still have the same pain with H2 internal issues (e.g. OOM). The H2 approach is outdated; it doesn't fit the needs of Ignite as a distributed system.

With Calcite we can concentrate on points 2 and (mostly) 3 and reuse its architectural abstractions; otherwise we would have to reinvent those abstractions through long discussions on the dev list.

I agree that we should first make the IEP clear to everyone in the community who wants to be involved in its implementation.

Both approaches ("from scratch" and "with Calcite") are risky, so can we try to build the new engine as an additional "beta" implementation and let users fall back to the old engine until the new one is considered mature enough?

On Fri, Sep 27, 2019 at 5:08 PM Seliverstov Igor wrote: > Nikolay, > > The main issue - there is no *selection*. > > There is a field of knowledge - relational algebra, which describes how to > transform relational expressions saving their semantics, and a couple of > implementations (Calcite is only one written in Java). > > There are only two alternatives: > > 1) Implementing white papers from scratch > 2) Adopting Calcite to our needs.
> > The second way was chosen by several other projects, there is experience, > there is a list of known issues (like using indexes) so, almost everything > is already done for us. > > Implementing a planner is a big deal, I think anybody understands it > there. That's why our proposal to reuse others experience is obvious. > > If you have an alternative - you're welcome, I'll gratefully listen to you. > > The main question isn't "WHAT" but "HOW" - that's the discussion topic > from my point of view. > > Regards, > Igor > > > 27 сент. 2019 г., в 16:37, Nikolay Izhikov > написал(а): > > > > Roman. > > > >> Nikolay, Maxim, I understand that our arguments may not be as obvious > >> for you as it obvious for SQL team. So, please arrange your questions > in > >> a more constructive way. > > > > What is SQL team? > > I only know Ignite community :) > > > > Please, share you knowledge in IEP. > > I want to join to the process of engine *selection*. > > It should start with the requirements to such engine. > > Can you write it in IEP, please? > > > > My point is very simple: > > > > 1. We made the wrong decision with H2 > > 2. We should make a well-thought decision about the new engine. > > > >> How many tickets would satisfy you? > > > > You write about "issueS" with the H2. > > All I see is one open ticket. > > IEP doesn't provide enough information. > > So it's not about the number of tickets, it's about > > > >> These two points (single map-reduce execution and inflexible optimizer) > >> are the main problems with the current engine. > > > > We may come to the point when Calcite(or any other engine) brings us > third and other "main problems". > > This is how it happens with H2. > > > > Let's start from what we want to get with the engine and move forward > from this base. > > What do you think? 
> > > > > > > > В Пт, 27/09/2019 в 16:15 +0300, Roman Kondakov пишет: > >> Maxim, Nikolay, > >> > >> I've listed two issues which show the ideological flaws of the current > >> engine. > >> > >> 1. IGNITE-11448 - Open. This ticket describes the impossibility of > >> executing queries which can not be fit in the hardcoded one pass > >> map-reduce paradigm. > >> > >> 2. IGNITE-6085 - Closed (won't fix) - This ticket describes the second > >> major problem with the current engine: H2 query optimizer is very > >> primitive and can not perform many useful optimizations. > >> > >> These two points (single map-reduce execution and inflexible optimizer) > >> are the main problems with the current engine. It means that our engine > >> is currently suitable for execution only a very limited subset of the > >> typical SQL queries. For example it can not even run most of the TPC-H > >> benchmark queries because they don't fit to the simple map-reduce > paradigm. > >> > >>> All I see is links to two tickets: > >> > >> How many tickets would satisfy you? I named two. And it looks like it > is > >> not enough from your point of view. Ok, so how many is enough? The set > >> of problem
Re: How to free up space on disc after removing entries from IgniteCache with enabled PDS?
It should probably be possible to do this through the public API; in effect it is the same as manual rebalancing.

On Fri, 27 Sep 2019 at 17:40, Alexei Scherbakov < alexey.scherbak...@gmail.com> wrote: > The poor man's solution for the problem would be stopping fragmented node > and removing partition data, then starting it again allowing full state > transfer already without deletes. > Rinse and repeat for all owners. > > Anton Vinogradov, would this work for you as workaround ? > > On Thu, 19 Sep 2019 at 13:03, Anton Vinogradov wrote: > >> Alexey, >> >> Let's combine your and Ivan's proposals. >> >> >> vacuum command, which acquires exclusive table lock, so no concurrent >> activities on the table are possible. >> and >> >> Could the problem be solved by stopping a node which needs to be >> defragmented, clearing persistence files and restarting the node? >> >> After rebalancing the node will receive all data back without >> fragmentation. >> >> How about to have special partition state SHRINKING? >> This state should mean that partition unavailable for reads and updates >> but >> should keep it's update-counters and should not be marked as lost, renting >> or evicted. >> At this state we able to iterate over the partition and apply it's entries >> to another file in a compact way. >> Indices should be updated during the copy-on-shrink procedure or at the >> shrink completion. >> Once shrank file is ready we should replace the original partition file >> with it and mark it as MOVING which will start the historical rebalance. >> Shrinking should be performed during the low activity periods, but even in >> case we found that activity was high and historical rebalance is not >> suitable we may just remove the file and use regular rebalance to restore >> the partition (this will also lead to shrink). >> >> BTW, seems, we able to implement partition shrink in a cheap way. >> We may just use rebalancing code to apply fat partition's entries to the >> new file.
>> So, 3 stages here: local rebalance, indices update and global historical >> rebalance. >> >> On Thu, Sep 19, 2019 at 11:43 AM Alexey Goncharuk < >> alexey.goncha...@gmail.com> wrote: >> >> > Anton, >> > >> > >> > > >> The solution which Anton suggested does not look easy because it >> will >> > > most likely significantly hurt performance >> > > Mostly agree here, but what drop do we expect? What price do we ready >> to >> > > pay? >> > > Not sure, but seems some vendors ready to pay, for example, 5% drop >> for >> > > this. >> > >> > 5% may be a big drop for some use-cases, so I think we should look at >> how >> > to improve performance, not how to make it worse. >> > >> > >> > > >> > > >> it is hard to maintain a data structure to choose "page from >> free-list >> > > with enough space closest to the beginning of the file". >> > > We can just split each free-list bucket to the couple and use first >> for >> > > pages in the first half of the file and the second for the last. >> > > Only two buckets required here since, during the file shrink, first >> > > bucket's window will be shrank too. >> > > Seems, this give us the same price on put, just use the first bucket >> in >> > > case it's not empty. >> > > Remove price (with merge) will be increased, of course. >> > > >> > > The compromise solution is to have priority put (to the first path of >> the >> > > file), with keeping removal as is, and schedulable per-page migration >> for >> > > the rest of the data during the low activity period. >> > > >> > Free lists are large and slow by themselves, it is expensive to >> checkpoint >> > and read them on start, so as a long-term solution I would look into >> > removing them. Moreover, not sure if adding yet another background >> process >> > will improve the codebase reliability and simplicity. 
>> > >> > If we want to go the hard path, I would look at free page tracking >> bitmap - >> > a special bitmask page, where each page in an adjacent block is marked >> as 0 >> > if it has free space more than a certain configurable threshold (say, >> 80%) >> > - free, and 1 if less (full). Some vendors have successfully implemented >> > this approach, which looks much more promising, but harder to implement. >> > >> > --AG >> > >> > > > -- > > Best regards, > Alexei Scherbakov > -- Best regards, Alexei Scherbakov
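To make the free page tracking bitmap quoted above more concrete, here is a minimal sketch in Python. It is not Ignite code: the class name `FreePageBitmap`, the 4096-byte page size and the method names are assumptions made for illustration only. It follows the convention from the quoted message, where a bit is 0 ("free") when the page has more free space than the configurable threshold and 1 ("full") otherwise.

```python
from dataclasses import dataclass, field


@dataclass
class FreePageBitmap:
    """Hypothetical bitmask page tracking an adjacent block of data pages.

    Bit 0 = "free": free space is above the threshold.
    Bit 1 = "full": free space is at or below the threshold.
    """
    page_count: int
    page_size: int = 4096          # assumed page size, for illustration
    free_threshold: float = 0.8    # fraction of the page that must be free
    bits: bytearray = field(init=False)

    def __post_init__(self):
        # All pages start empty, so every bit starts as 0 (free).
        self.bits = bytearray((self.page_count + 7) // 8)

    def update(self, page_idx: int, free_bytes: int) -> None:
        """Re-evaluate one page after a put or remove changed its free space."""
        byte, bit = divmod(page_idx, 8)
        if free_bytes > self.free_threshold * self.page_size:
            self.bits[byte] &= ~(1 << bit) & 0xFF   # mark as free
        else:
            self.bits[byte] |= 1 << bit             # mark as full

    def is_free(self, page_idx: int) -> bool:
        byte, bit = divmod(page_idx, 8)
        return not (self.bits[byte] >> bit) & 1

    def first_free(self):
        """Return the free page closest to the start of the block, or None."""
        for idx in range(self.page_count):
            if self.is_free(idx):
                return idx
        return None
```

Because `first_free` scans from the lowest page index, new data naturally lands near the beginning of the file, which is the property the free-list discussion above was trying to obtain. A real implementation would scan whole words instead of single bits.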
Re: How to free up space on disc after removing entries from IgniteCache with enabled PDS?
The poor man's solution to the problem would be to stop the fragmented node, remove its partition data, and start it again, so that a full state transfer brings the data back without the deleted entries. Rinse and repeat for all owners.

Anton Vinogradov, would this work for you as a workaround?

On Thu, 19 Sep 2019 at 13:03, Anton Vinogradov wrote: > Alexey, > > Let's combine your and Ivan's proposals. > > >> vacuum command, which acquires exclusive table lock, so no concurrent > activities on the table are possible. > and > >> Could the problem be solved by stopping a node which needs to be > defragmented, clearing persistence files and restarting the node? > >> After rebalancing the node will receive all data back without > fragmentation. > > How about to have special partition state SHRINKING? > This state should mean that partition unavailable for reads and updates but > should keep it's update-counters and should not be marked as lost, renting > or evicted. > At this state we able to iterate over the partition and apply it's entries > to another file in a compact way. > Indices should be updated during the copy-on-shrink procedure or at the > shrink completion. > Once shrank file is ready we should replace the original partition file > with it and mark it as MOVING which will start the historical rebalance. > Shrinking should be performed during the low activity periods, but even in > case we found that activity was high and historical rebalance is not > suitable we may just remove the file and use regular rebalance to restore > the partition (this will also lead to shrink). > > BTW, seems, we able to implement partition shrink in a cheap way. > We may just use rebalancing code to apply fat partition's entries to the > new file. > So, 3 stages here: local rebalance, indices update and global historical > rebalance.
> > On Thu, Sep 19, 2019 at 11:43 AM Alexey Goncharuk < > alexey.goncha...@gmail.com> wrote: > > > Anton, > > > > > > > >> The solution which Anton suggested does not look easy because it > will > > > most likely significantly hurt performance > > > Mostly agree here, but what drop do we expect? What price do we ready > to > > > pay? > > > Not sure, but seems some vendors ready to pay, for example, 5% drop for > > > this. > > > > 5% may be a big drop for some use-cases, so I think we should look at how > > to improve performance, not how to make it worse. > > > > > > > > > > >> it is hard to maintain a data structure to choose "page from > free-list > > > with enough space closest to the beginning of the file". > > > We can just split each free-list bucket to the couple and use first for > > > pages in the first half of the file and the second for the last. > > > Only two buckets required here since, during the file shrink, first > > > bucket's window will be shrank too. > > > Seems, this give us the same price on put, just use the first bucket in > > > case it's not empty. > > > Remove price (with merge) will be increased, of course. > > > > > > The compromise solution is to have priority put (to the first path of > the > > > file), with keeping removal as is, and schedulable per-page migration > for > > > the rest of the data during the low activity period. > > > > > Free lists are large and slow by themselves, it is expensive to > checkpoint > > and read them on start, so as a long-term solution I would look into > > removing them. Moreover, not sure if adding yet another background > process > > will improve the codebase reliability and simplicity. > > > > If we want to go the hard path, I would look at free page tracking > bitmap - > > a special bitmask page, where each page in an adjacent block is marked > as 0 > > if it has free space more than a certain configurable threshold (say, > 80%) > > - free, and 1 if less (full). 
Some vendors have successfully implemented > > this approach, which looks much more promising, but harder to implement. > > > > --AG > > > -- Best regards, Alexei Scherbakov
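The first two stages of the SHRINKING proposal quoted above (copy the partition's entries into a new file in a compact way, then update the indices) can be sketched roughly as follows. This is an illustration in Python, not Ignite code: the function name `shrink_partition`, the list-of-lists page model and the use of `None` for a slot left by a removed entry are all assumptions. The third stage, historical rebalance, is out of scope here.

```python
def shrink_partition(pages, page_capacity):
    """Copy live entries of a fragmented partition into densely packed pages.

    `pages` is a list of pages; each page is a list of hashable entries,
    with None marking a hole left behind by a removed entry.
    Returns the compacted pages and a rebuilt index: entry -> (page, slot).
    """
    compacted = [[]]
    index = {}
    for page in pages:
        for entry in page:
            if entry is None:
                continue  # skip holes, this is where the space is reclaimed
            if len(compacted[-1]) == page_capacity:
                compacted.append([])  # current page is full, open a new one
            compacted[-1].append(entry)
            # The index is updated during the copy, as in "copy-on-shrink".
            index[entry] = (len(compacted) - 1, len(compacted[-1]) - 1)
    if not compacted[-1]:
        compacted.pop()  # no live entries ended up in the last page
    return compacted, index
```

For example, three half-empty pages holding four live entries collapse into two pages. Once the compacted copy is complete, it would replace the original partition file, which is the point at which the proposal marks the partition as MOVING.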
Re: New SQL execution engine
Igor.

> The main issue - there is no *selection*.

1. I don't remember a community decision about this.
2. We should avoid making such a long-term decision so quickly. We made this kind of decision with H2 and have now come to the point where we have to review it.

> 1) Implementing white papers from scratch
> 2) Adopting Calcite to our needs.

The third option is not to fix the issues we have with H2.
The fourth option I know of is using spark-catalyst.

What is wrong with writing an engine from scratch?

I ask you to start with the engine requirements. Can we, please, discuss them?

> If you have an alternative - you're welcome, I'll gratefully listen to you.

We have an alternative for now: the H2-based engine.

> The main question isn't "WHAT" but "HOW" - that's the discussion topic from my point of view.

Once we have made a decision about the engine, we can discuss the roadmap for the replacement. One more time: replacing the SQL engine with a more customizable one makes sense to me. But this kind of decision needs careful discussion.

On Fri, 27/09/2019 at 17:08 +0300, Seliverstov Igor wrote: > Nikolay, > > The main issue - there is no *selection*. > > There is a field of knowledge - relational algebra, which describes how to > transform relational expressions saving their semantics, and a couple of > implementations (Calcite is only one written in Java). > > There are only two alternatives: > > 1) Implementing white papers from scratch > 2) Adopting Calcite to our needs. > > The second way was chosen by several other projects, there is experience, > there is a list of known issues (like using indexes) so, almost everything is > already done for us. > > Implementing a planner is a big deal, I think anybody understands it there. > That's why our proposal to reuse others experience is obvious. > > If you have an alternative - you're welcome, I'll gratefully listen to you. > > The main question isn't "WHAT" but "HOW" - that's the discussion topic from > my point of view. > > Regards, > Igor > > > 27 сент.
2019 г., в 16:37, Nikolay Izhikov написал(а): > > > > Roman. > > > > > Nikolay, Maxim, I understand that our arguments may not be as obvious > > > for you as it obvious for SQL team. So, please arrange your questions in > > > a more constructive way. > > > > What is SQL team? > > I only know Ignite community :) > > > > Please, share you knowledge in IEP. > > I want to join to the process of engine *selection*. > > It should start with the requirements to such engine. > > Can you write it in IEP, please? > > > > My point is very simple: > > > > 1. We made the wrong decision with H2 > > 2. We should make a well-thought decision about the new engine. > > > > > How many tickets would satisfy you? > > > > You write about "issueS" with the H2. > > All I see is one open ticket. > > IEP doesn't provide enough information. > > So it's not about the number of tickets, it's about > > > > > These two points (single map-reduce execution and inflexible optimizer) > > > are the main problems with the current engine. > > > > We may come to the point when Calcite(or any other engine) brings us third > > and other "main problems". > > This is how it happens with H2. > > > > Let's start from what we want to get with the engine and move forward from > > this base. > > What do you think? > > > > > > > > В Пт, 27/09/2019 в 16:15 +0300, Roman Kondakov пишет: > > > Maxim, Nikolay, > > > > > > I've listed two issues which show the ideological flaws of the current > > > engine. > > > > > > 1. IGNITE-11448 - Open. This ticket describes the impossibility of > > > executing queries which can not be fit in the hardcoded one pass > > > map-reduce paradigm. > > > > > > 2. IGNITE-6085 - Closed (won't fix) - This ticket describes the second > > > major problem with the current engine: H2 query optimizer is very > > > primitive and can not perform many useful optimizations. 
> > > > > > These two points (single map-reduce execution and inflexible optimizer) > > > are the main problems with the current engine. It means that our engine > > > is currently suitable for execution only a very limited subset of the > > > typical SQL queries. For example it can not even run most of the TPC-H > > > benchmark queries because they don't fit to the simple map-reduce > > > paradigm. > > > > > > > All I see is links to two tickets: > > > > > > How many tickets would satisfy you? I named two. And it looks like it is > > > not enough from your point of view. Ok, so how many is enough? The set > > > of problems caused by listed above tickets is infinite, therefore I can > > > not create a ticket for each of them. > > > > Tech details also should be added. > > > > > > Tech details are in the tickets. > > > > > > > We can't discuss such a huge change as an execution engine replacement > > > > with descrition like: > > > > "No data co-location control, i.e. arbitrary data can be returned > > > > silently" or > > > > "Low control on how query execute
Re: New SQL execution engine
Thanks, Andrey! Will take a look shortly.

On Fri, 27/09/2019 at 17:19 +0300, Andrey Mashenkov wrote: > Issues can't be resolved without changes in H2. > Hope, this will be enough. > > https://issues.apache.org/jira/browse/IGNITE-10598 > https://issues.apache.org/jira/browse/IGNITE-11473 > https://issues.apache.org/jira/browse/IGNITE-11444 > https://issues.apache.org/jira/browse/IGNITE-5289 > https://issues.apache.org/jira/browse/IGNITE-10855 > https://issues.apache.org/jira/browse/IGNITE-11341 > https://issues.apache.org/jira/browse/IGNITE-7526 > https://issues.apache.org/jira/browse/IGNITE-9480 > https://issues.apache.org/jira/browse/IGNITE-9616 > https://issues.apache.org/jira/browse/IGNITE-11891 > https://issues.apache.org/jira/browse/IGNITE-6202 > https://issues.apache.org/jira/browse/IGNITE-11448 > https://issues.apache.org/jira/browse/IGNITE-3911 > > > On Fri, Sep 27, 2019 at 4:34 PM Nikolay Izhikov wrote: > > > Roman. > > > > > Nikolay, Maxim, I understand that our arguments may not be as obvious > > > for you as it obvious for SQL team. So, please arrange your questions in > > > a more constructive way. > > > > What is SQL team? > > I only know Ignite community :) > > > > Please, share you knowledge in IEP. > > I want to join to the process of engine *selection*. > > It should start with the requirements to such engine. > > Can you write it in IEP, please? > > > > My point is very simple: > > > > 1. We made the wrong decision with H2 > > 2. We should make a well-thought decision about the new engine. > > > > > How many tickets would satisfy you? > > > > You write about "issueS" with the H2. > > All I see is one open ticket. > > IEP doesn't provide enough information. > > So it's not about the number of tickets, it's about > > > > > These two points (single map-reduce execution and inflexible optimizer) > > > are the main problems with the current engine.
> > > > We may come to the point when Calcite(or any other engine) brings us third > > and other "main problems". > > This is how it happens with H2. > > > > Let's start from what we want to get with the engine and move forward from > > this base. > > What do you think? > > > > > > > > В Пт, 27/09/2019 в 16:15 +0300, Roman Kondakov пишет: > > > Maxim, Nikolay, > > > > > > I've listed two issues which show the ideological flaws of the current > > > engine. > > > > > > 1. IGNITE-11448 - Open. This ticket describes the impossibility of > > > executing queries which can not be fit in the hardcoded one pass > > > map-reduce paradigm. > > > > > > 2. IGNITE-6085 - Closed (won't fix) - This ticket describes the second > > > major problem with the current engine: H2 query optimizer is very > > > primitive and can not perform many useful optimizations. > > > > > > These two points (single map-reduce execution and inflexible optimizer) > > > are the main problems with the current engine. It means that our engine > > > is currently suitable for execution only a very limited subset of the > > > typical SQL queries. For example it can not even run most of the TPC-H > > > benchmark queries because they don't fit to the simple map-reduce > > > > paradigm. > > > > > > > All I see is links to two tickets: > > > > > > How many tickets would satisfy you? I named two. And it looks like it is > > > not enough from your point of view. Ok, so how many is enough? The set > > > of problems caused by listed above tickets is infinite, therefore I can > > > not create a ticket for each of them. > > > > Tech details also should be added. > > > > > > Tech details are in the tickets. > > > > > > > We can't discuss such a huge change as an execution engine replacement > > > > with descrition like: > > > > "No data co-location control, i.e. 
arbitrary data can be returned > > > > silently" or > > > > "Low control on how query executes internally, as a result we have > > > > limited possibility to implement improvements/fixes." > > > > > > Why not? Don't you understand these problems? Or you don't think this is > > > a problem? > > > > > > > Let's make these descriptions more specific. > > > > > > What do you mean by "more specific"? What is the criteria of the > > > specific description? > > > > > > > > > > > > Nikolay, Maxim, I understand that our arguments may not be as obvious > > > for you as it obvious for SQL team. So, please arrange your questions in > > > a more constructive way. > > > > > > Thank you! > > signature.asc Description: This is a digitally signed message part
Re: New SQL execution engine
These issues can't be resolved without changes in H2.
I hope this will be enough.

https://issues.apache.org/jira/browse/IGNITE-10598
https://issues.apache.org/jira/browse/IGNITE-11473
https://issues.apache.org/jira/browse/IGNITE-11444
https://issues.apache.org/jira/browse/IGNITE-5289
https://issues.apache.org/jira/browse/IGNITE-10855
https://issues.apache.org/jira/browse/IGNITE-11341
https://issues.apache.org/jira/browse/IGNITE-7526
https://issues.apache.org/jira/browse/IGNITE-9480
https://issues.apache.org/jira/browse/IGNITE-9616
https://issues.apache.org/jira/browse/IGNITE-11891
https://issues.apache.org/jira/browse/IGNITE-6202
https://issues.apache.org/jira/browse/IGNITE-11448
https://issues.apache.org/jira/browse/IGNITE-3911

--
Best regards,
Andrey V. Mashenkov
Re: New SQL execution engine
Nikolay,

The main issue is that there is no *selection*. There is a field of knowledge, relational algebra, which describes how to transform relational expressions while preserving their semantics, and a couple of implementations (Calcite is the only one written in Java).

There are only two alternatives:

1) Implementing white papers from scratch
2) Adapting Calcite to our needs.

The second way was chosen by several other projects. There is experience and there is a list of known issues (like using indexes), so almost everything is already done for us.

Implementing a planner is a big deal; I think everybody here understands that. That's why our proposal to reuse others' experience is the obvious one.

If you have an alternative, you're welcome, I'll gratefully listen to you.

The main question isn't "WHAT" but "HOW". That's the discussion topic from my point of view.

Regards,
Igor
[jira] [Created] (IGNITE-12238) RobinHoodBackwardShiftHashMap works incorrectly on big endian architectures
Andrey N. Gura created IGNITE-12238:

Summary: RobinHoodBackwardShiftHashMap works incorrectly on big endian architectures
Key: IGNITE-12238
URL: https://issues.apache.org/jira/browse/IGNITE-12238
Project: Ignite
Issue Type: Bug
Reporter: Andrey N. Gura
Assignee: Andrey N. Gura
Fix For: 2.8

{{RobinHoodBackwardShiftHashMap}} has a bug that can be reproduced only on big-endian architectures.

In order to reproduce the problem, run the following tests:

* {{RobinHoodBackwardShiftHashMapTest.testCollisionOnRemove}}
* {{testRandomOpsPutRemove}}

The problem is that the {{setIdealBucket()}} method writes a {{long}} value to the offheap memory, while {{getIdealBucket()}} reads an {{int}} value. On little-endian architectures it works because the 4 meaningful bytes are written to memory first and the leading zero bytes are overwritten by the next operation. On big-endian architectures, 4 zero bytes are always written there instead.
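The write-long/read-int mismatch described in the ticket can be reproduced on any machine with a small sketch. This is not the Ignite code: a heap `ByteBuffer` with an explicit byte order stands in for the offheap memory as an assumption, and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical demo class; it only mimics the access pattern from the ticket. */
final class IdealBucketEndianDemo {
    /**
     * Writes {@code value} as an 8-byte long at offset 0, then reads a 4-byte int
     * from the same offset, using the given byte order for both operations.
     */
    static int writeLongReadInt(ByteOrder order, long value) {
        ByteBuffer buf = ByteBuffer.allocate(8).order(order);
        buf.putLong(0, value); // stand-in for setIdealBucket() writing a long
        return buf.getInt(0);  // stand-in for getIdealBucket() reading an int
    }

    public static void main(String[] args) {
        long idealBucket = 42L;
        // Little endian: the low-order bytes come first, so the int read sees 42.
        System.out.println(writeLongReadInt(ByteOrder.LITTLE_ENDIAN, idealBucket)); // 42
        // Big endian: the high-order (zero) bytes come first, so the int read sees 0.
        System.out.println(writeLongReadInt(ByteOrder.BIG_ENDIAN, idealBucket));    // 0
    }
}
```

Writing and reading with the same width removes the dependency on byte order; whether the actual fix takes that route is not stated in the ticket.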
Re: New SQL execution engine
Roman.

> Nikolay, Maxim, I understand that our arguments may not be as obvious
> for you as it obvious for SQL team. So, please arrange your questions in
> a more constructive way.

What is the SQL team?
I only know the Ignite community :)

Please share your knowledge in the IEP.
I want to join the process of engine *selection*.
It should start with the requirements for such an engine.
Can you write them in the IEP, please?

My point is very simple:

1. We made the wrong decision with H2.
2. We should make a well-thought-out decision about the new engine.

> How many tickets would satisfy you?

You write about "issueS" with H2.
All I see is one open ticket.
The IEP doesn't provide enough information.
So it's not about the number of tickets, it's about

> These two points (single map-reduce execution and inflexible optimizer)
> are the main problems with the current engine.

We may come to the point when Calcite (or any other engine) brings us a third and other "main problems".
This is how it happened with H2.

Let's start from what we want to get from the engine and move forward from this base.
What do you think?
Re: New SQL execution engine
Maxim, Nikolay,

I've listed two issues which show the ideological flaws of the current engine.

1. IGNITE-11448 - Open. This ticket describes the impossibility of executing queries which cannot fit into the hardcoded one-pass map-reduce paradigm.

2. IGNITE-6085 - Closed (won't fix). This ticket describes the second major problem with the current engine: the H2 query optimizer is very primitive and cannot perform many useful optimizations.

These two points (single map-reduce execution and inflexible optimizer) are the main problems with the current engine. It means that our engine is currently suitable for executing only a very limited subset of typical SQL queries. For example, it cannot even run most of the TPC-H benchmark queries because they don't fit the simple map-reduce paradigm.

> All I see is links to two tickets:

How many tickets would satisfy you? I named two. And it looks like that is not enough from your point of view. OK, so how many is enough? The set of problems caused by the tickets listed above is infinite, therefore I cannot create a ticket for each of them.

> Tech details also should be added.

Tech details are in the tickets.

> We can't discuss such a huge change as an execution engine replacement with
> description like:
> "No data co-location control, i.e. arbitrary data can be returned silently" or
> "Low control on how query executes internally, as a result we have limited
> possibility to implement improvements/fixes."

Why not? Don't you understand these problems? Or don't you think this is a problem?

> Let's make these descriptions more specific.

What do you mean by "more specific"? What are the criteria for a specific description?

Nikolay, Maxim, I understand that our arguments may not be as obvious to you as they are to the SQL team. So please arrange your questions in a more constructive way.

Thank you!

--
Kind Regards
Roman Kondakov
Re: New SQL execution engine
Hello, Alexey.

Thanks for the details.

> Now, as for alternatives for Apache Calcite

I want to discuss our *requirements* for the new engine first.
Can we do it?

The main reason to do it: we should avoid a wrong technical decision.
We made one with H2 and we shouldn't do it again.

> As for the IEP content - I agree, we should have a more detailed
> description of steps and technical information there, but I believe this
> will be improved further.

Thanks! Looking forward to the IEP details.
Re: New SQL execution engine
Nikolay, Maxim,

Asking for a list of issues with the current H2 is pointless because it has a fundamental architectural flaw, not just a bunch of bugs:

Currently, query execution is limited to a two-phase map-reduce task (with an optional remote cursor when the 'distributed joins' flag is enabled), and only a limited subset of queries can be executed. You can easily see that if you try to draw how three non-collocated caches should be joined on an arbitrary condition.

H2 cannot solve this problem because H2 is a local database and is not designed to execute distributed queries, let alone the fact that it is not designed to be embedded into other projects as an execution engine. Because of this, an H2 upgrade is a huge pain which leads to issues up to broken compilation. This is exactly the reason why the ticket with index use for the IN() expression [1] was fixed only in 2.7; one can see the amount of changes needed for a simple version upgrade.

Now, as for alternatives to Apache Calcite: I personally spent quite a large amount of time looking for alternatives but did not find any even remotely matching the abilities and flexibility of Calcite. As folks noted before, Calcite is specifically designed to have flexible optimization rules and to support distributed query execution, which is already proven by real-life projects. If you have any other framework in mind that should be considered, please let the community know; I believe it will be a more productive discussion than the current one.

As for the IEP content, I agree, we should have a more detailed description of steps and technical information there, but I believe this will be improved further.

--AG

[1] https://issues.apache.org/jira/browse/IGNITE-4150
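The limitation discussed in this message (a join executed as one local "map" step per node followed by a "reduce" merge) can be illustrated with a toy model. It is a sketch under stated assumptions, not Ignite code: the classes, the two-node layout, and the row placement are all hypothetical, and the only point is that rows stored on different nodes are silently missed by a purely local join.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical toy model of a per-node (map) join followed by a merge (reduce). */
final class CollocationDemo {
    /** One row: a join key and the node that stores the row. */
    static final class Row {
        final int key;
        final int node;

        Row(int key, int node) {
            this.key = key;
            this.node = node;
        }
    }

    /** Map phase: each node joins only its own rows. Reduce phase: sum the per-node counts. */
    static int localJoinCount(List<Row> left, List<Row> right, int nodes) {
        int total = 0;
        for (int n = 0; n < nodes; n++)
            for (Row l : left)
                for (Row r : right)
                    if (l.node == n && r.node == n && l.key == r.key)
                        total++;
        return total;
    }

    /** Reference result: join over all rows, ignoring where they are stored. */
    static int globalJoinCount(List<Row> left, List<Row> right) {
        int total = 0;
        for (Row l : left)
            for (Row r : right)
                if (l.key == r.key)
                    total++;
        return total;
    }

    public static void main(String[] args) {
        List<Row> a = Arrays.asList(new Row(1, 0), new Row(2, 1));
        List<Row> bCollocated = Arrays.asList(new Row(1, 0), new Row(2, 1));
        List<Row> bScattered = Arrays.asList(new Row(1, 1), new Row(2, 0));

        System.out.println(localJoinCount(a, bCollocated, 2)); // 2: matches the global join
        System.out.println(localJoinCount(a, bScattered, 2));  // 0: matching rows live on different nodes
        System.out.println(globalJoinCount(a, bScattered));    // 2: the correct answer
    }
}
```

In this model the scattered case returns an incomplete result without any error, which is one way to read the "arbitrary data can be returned silently" item quoted in the thread.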
[jira] [Created] (IGNITE-12237) Forbid thin client connections dynamically
Denis Mekhanikov created IGNITE-12237:

Summary: Forbid thin client connections dynamically
Key: IGNITE-12237
URL: https://issues.apache.org/jira/browse/IGNITE-12237
Project: Ignite
Issue Type: Improvement
Components: thin client
Reporter: Denis Mekhanikov

Sometimes it's useful to forbid thin client connections to nodes for some period of time. During this time the cluster may be performing some activation needed for the application to work correctly. It would be good to have an API call that opens and closes thin client connections.

This feature was requested in the following StackOverflow question: https://stackoverflow.com/questions/58106297/how-to-block-java-thin-client-request-till-preloading-of-data-in-ignite-cluster
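One possible shape for such a switch is sketched below. Everything here is an assumption for illustration: `ThinClientGate` and its methods are hypothetical names, not part of the Ignite API, and the ticket does not say how the feature would be implemented.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical gate that a connection listener could consult; not an Ignite class. */
final class ThinClientGate {
    /** True while new thin client connections are allowed. */
    private final AtomicBoolean accepting = new AtomicBoolean(true);

    /** Starts rejecting new thin client connections, e.g. while data is preloading. */
    void forbidConnections() {
        accepting.set(false);
    }

    /** Allows new thin client connections again. */
    void allowConnections() {
        accepting.set(true);
    }

    /** Would be called on every incoming handshake; false means the connection is refused. */
    boolean tryAccept() {
        return accepting.get();
    }

    public static void main(String[] args) {
        ThinClientGate gate = new ThinClientGate();
        gate.forbidConnections();
        System.out.println(gate.tryAccept()); // false
        gate.allowConnections();
        System.out.println(gate.tryAccept()); // true
    }
}
```

A single atomic flag keeps the check cheap on the connection path; a real implementation would also have to decide what happens to connections that are already open when the gate closes, which the ticket leaves unspecified.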
Re: New SQL execution engine
Folks,

I agree with Nikolay, the idea of replacing the H2 engine with the most suitable one is reasonable. But since such a change is major, we should have strong arguments for it, even for members who are working outside the SQL team.

I think it is really necessary to have:

1. The list of issues related to the current engine (H2) which, from different points of view and for different developers, must seem unsolvable. For example, `... the H2 execution plan is hard-wired with H2 internals and can't be easily transformed` doesn't seem to have a strong technical argument behind it.
After this step, we should have a clear understanding that the engine change is required.

2. Why only Apache Calcite? It seems to me we should have a table comparing different engines, with the pros and cons of each. A brief search shows me that we may have a few options here.
After this step, we should have a clear understanding of why we choose this dependency over another.

3. We should also have a migration decomposition and step-by-step actions to do. I haven't found such a decomposition on the IEP-37 page. Do we have one? What will the implementation phases be? What components will be changed? What would the new API be, and would there be one? What problems are we expecting, e.g. performance degradation in the prototype implementation? The `Risks and Assumptions` topic doesn't seem to be well described.
After this step, we should have a clear and obvious implementation plan for the new feature.

Let's have a strong technical discussion.

On Fri, 27 Sep 2019 at 15:17, Nikolay Izhikov wrote:
>
> Hello, Roman.
>
> All I see is links to two tickets:
>
> IGNITE-11448 - Open
> IGNITE-6085 - Closed
>
> Other issues are described poorly and have no ticket links.
> We can't discuss such a huge change as an execution engine replacement with
> description like:
>
> "No data co-location control, i.e. arbitrary data can be returned silently" or
> "Low control on how query executes internally, as a result we have limited
> possibility to implement improvements/fixes."
>
> I think we need some reproducer that shows the issue.
> Tech details also should be added.
>
> Let's make these descriptions more specific.
> Let's discuss how we want to fix them with the new engine.
>
>
> On Fri, 27/09/2019 at 15:10 +0300, Roman Kondakov wrote:
> > Hello Nikolay,
> >
> > please see IEP-37 [1]. Issues are there.
> >
> >
> > [1]
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084
Re: Improvements for new security approach.
I finished the fixes: https://issues.apache.org/jira/browse/IGNITE-11992

>> Subject's size is unlimited, that can lead to a dramatic increase in traffic between nodes.

I added a network optimization for this case: I add the subject when ctx.discovery().node(secSubjId) == null.

>> Also, we need to get rid of GridTaskThreadContextKey#TC_SUBJ_ID as duplication of IgniteSecurity responsibility.

[2] Yes, we should get rid of it, but in the next task, because I haven't been able to merge this one since 18 Jul 19 :)
[1] I agree with you.

Fri, 27 Sep 2019 at 11:42, Denis Garus:

> Hello, Maksim!
>
> Thank you for your effort and interest in the security of Ignite.
>
> I would like you to pay attention to the discussion [1] and issue [2].
> It looks like not only task should execute in the current security context
> but all operations too, that is essential to determine a security id for
> events.
> Also, we need to get rid of GridTaskThreadContextKey#TC_SUBJ_ID as
> duplication of IgniteSecurity responsibility.
> I think your task is the right place to do that.
> What is your opinion?
>
> >>It's the reason why subject id isn't enough and we should transmit
> subject inside message for this case.
> There is a problem with this approach.
> Subject's size is unlimited, that can lead to a dramatic increase in
> traffic between nodes.
>
> 1. http://apache-ignite-developers.2346864.n4.nabble.com/JavaDoc-for-Event-s-subjectId-methods-td43663.html
> 2. https://issues.apache.org/jira/browse/IGNITE-9914
>
> Fri, 27 Sep 2019 at 08:38, Anton Vinogradov:
>
>> Maksim
>>
>> >> I want to fix 2-3-4 points under one ticket.
>> Please let me know once it's become ready to be reviewed.
>>
>> On Thu, Sep 26, 2019 at 5:18 PM Maksim Stepachev <maksim.stepac...@gmail.com> wrote:
>>
>> > Hi.
>> >
>> > Anton Vinogradov,
>> >
>> > I want to fix 2-3-4 points under one ticket.
>> > >> > The first was fixed in the ticket: >> > https://issues.apache.org/jira/browse/IGNITE-11094 >> > Also, I aggry with you that 5-6 isn't required to ignite. >> > >> > Denis Garus, >> > I made reproducer for point 3. Looks at the test from my pull-request: >> > JettyRestPropagationSecurityContextTest >> > >> > https://github.com/apache/ignite/pull/6918 >> > >> > For point 2 you should apply GridRestProcessor from pr and set debug >> into >> > VisorQueryUtils#scheduleQueryStart between >> > ignite.context().closure().runLocalSafe and call: >> > ignite.context().security().securityContext() >> > >> > >> > For point 3, do action above and call: >> > >> ignite.context().discovery().node(ignite.context().security().securityContext().subject().id()) >> > >> > It returns null because this subject was created from the rest. It's the >> > reason why subject id isn't enough and we should transmit subject inside >> > message for this case. >> > >> > чт, 18 июл. 2019 г. в 12:45, Anton Vinogradov : >> > >> >> Maksim, >> >> >> >> Could you please split IGNITE-11992 to subtasks with proper >> descriptions? >> >> This will allow us to relocate discussion to the issues to solve each >> >> problem properly. >> >> >> >> On Thu, Jul 18, 2019 at 11:57 AM Denis Garus >> wrote: >> >> >> >> > Hello, Maksim! >> >> > Thanks for your analysis! >> >> > >> >> > I have a few questions about your proposals. >> >> > >> >> > GridRestProcessor. >> >> > AFAIK, when GridRestProcessor handle client request >> >> > (GridRestProcessor#handleRequest) >> >> > it process authentication (GridRestProcessor#authenticate) >> >> > and then authorization of request (GridRestProcessor#authorize) >> inside >> >> > client context. >> >> > Can you give additional info about issues with GridRestProcessor >> from 3 >> >> and >> >> > 4? Maybe you have a reproducer for the problem? >> >> > >> >> > NoOpIgniteSecurityProcessor. >> >> > I think the case that you describe in 5 is not a bug. 
>> >> > All nodes (client and server) must have security enabled or disabled. >> >> > I can't imagine the case when it is not. >> >> > >> >> > ATTR_SECURITY_SUBJECT. >> >> > I don't think that compatibility is needed here. If you will use node >> >> with >> >> > propagation security context to remote node and older nodes >> >> > you can get subtle errors. >> >> > >> >> > чт, 18 июл. 2019 г. в 11:12, Maksim Stepachev < >> >> maksim.stepac...@gmail.com >> >> > >: >> >> > >> >> > > Hi, Ivan. >> >> > > >> >> > > Yes, I have. >> >> > > https://issues.apache.org/jira/browse/IGNITE-11992 >> >> > > >> >> > > I'm waiting for a visa. >> >> > > >> >> > > >> >> > > чт, 18 июл. 2019 г. в 11:09, Ivan Rakov : >> >> > > >> >> > >> Hello Max, >> >> > >> >> >> > >> Thanks for your analysis! >> >> > >> >> >> > >> Have you created a JIRA issue for discovered defects? >> >> > >> >> >> > >> Best Regards, >> >> > >> Ivan Rakov >> >> > >> >> >> > >> On 17.07.2019 17:08, Maksim Stepachev wrote: >> >> > >> > Hello, Igniters. >> >> > >> > >> >> > >> > The main idea of the new security is propagation security >> >> context >> >> > >> to >> >> > >> > oth
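The optimization mentioned at the top of this message (send the subject itself only when `ctx.discovery().node(secSubjId) == null`) can be pictured roughly as follows. This is a minimal sketch and not the code from IGNITE-11992: the `Subject` and `Message` classes, the `nodeJoined` and `prepare` methods, and the set that stands in for discovery are all hypothetical names chosen for illustration.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

/** Hypothetical model, not Ignite code: attach the subject only when its id cannot be resolved. */
final class SubjectPropagationSketch {
    /** Stand-in for a security subject whose size is not bounded. */
    static final class Subject {
        final UUID id;
        final byte[] details;

        Subject(UUID id, byte[] details) {
            this.id = id;
            this.details = details;
        }
    }

    /** Stand-in for an inter-node message: the id is always sent, the subject only if needed. */
    static final class Message {
        final UUID subjectId;
        final Subject subject; // null when the id alone is enough

        Message(UUID subjectId, Subject subject) {
            this.subjectId = subjectId;
            this.subject = subject;
        }
    }

    /** Stand-in for discovery: ids of the nodes known to the cluster. */
    private final Set<UUID> discoveryNodeIds = new HashSet<>();

    void nodeJoined(UUID nodeId) {
        discoveryNodeIds.add(nodeId);
    }

    /** Mirrors the check "node(secSubjId) == null": unknown ids get the full subject attached. */
    Message prepare(Subject subj) {
        boolean resolvable = discoveryNodeIds.contains(subj.id);
        return new Message(subj.id, resolvable ? null : subj);
    }

    public static void main(String[] args) {
        SubjectPropagationSketch sketch = new SubjectPropagationSketch();
        Subject nodeSubj = new Subject(UUID.randomUUID(), new byte[16]);
        Subject restSubj = new Subject(UUID.randomUUID(), new byte[4096]);
        sketch.nodeJoined(nodeSubj.id);

        System.out.println("node subject attached: " + (sketch.prepare(nodeSubj).subject != null));
        System.out.println("REST subject attached: " + (sketch.prepare(restSubj).subject != null));
    }
}
```

Under this assumption, the extra traffic is paid only for subjects created outside discovery (for example, from REST), while subjects that map to a known node still travel as a bare id.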
Re: New SQL execution engine
Hello, Roman.

All I see are links to two tickets:

IGNITE-11448 - Open
IGNITE-6085 - Closed

The other issues are described poorly and have no ticket links.
We can't discuss a change as huge as replacing the execution engine based on descriptions like:

"No data co-location control, i.e. arbitrary data can be returned silently"

or

"Low control on how query executes internally, as a result we have limited possibility to implement improvements/fixes."

I think we need a reproducer that demonstrates each issue.
Technical details should also be added.

Let's make these descriptions more specific.
Let's discuss how we want to fix them with the new engine.

On Fri, 27/09/2019 at 15:10 +0300, Roman Kondakov wrote:
> Hello Nikolay,
>
> please see IEP--37 [1]. Issues are there.
>
> [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084
Re: New SQL execution engine
Hello Nikolay, please see IEP--37 [1]. Issues are there. [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084 -- Kind Regards Roman Kondakov On 27.09.2019 14:20, Nikolay Izhikov wrote: Hello, Roman. Also Apache Calcite is commonly used in popular Apache projects I don't think it's a good point. H2 is also commonly used. But, it doesn't conform to Ignite requirements. Can you, please, write down issues and engine requirements to the IEP? So we can discuss each point separately. В Пт, 27/09/2019 в 13:56 +0300, Roman Kondakov пишет: Hello Nikolay. You've asked very good questions. I'll try to answer. 1. What the exact issues with the H2 integration? Can you send a tickets links? Can we label all H2 integration issues in JIRA? I propose to use "h2" label. Current SQL engine is confined in the single-pass map-reduce algorithm. This make impossible to execute complex queries which can not be expressed with a single map-reduce pass like subqueries with aggregates [1]. Another problem is that H2 optimizer is very primitive and not able to perform many useful optimizations [2]. Also Apache Calcite is commonly used in popular Apache projects like Hive, Drill, Flink and others [3]. So it's mature and well battle tested framework, while H2 is a toy database which is hardly ever used in the real production systems. 2. What are the requirements for the new SQL engine? We should write it down and discuss. The main requirement is to fix the problems listed above. The new SQL engine should be able to *effectively* execute SQL queries of the *arbitrary complexity*. For example the new engine will be able to perform distributed joins in a multiple ways [4], when current engine can do it only in two ways: collocated and distributed (the latter is usually not very efficient and needed to set manually). 3. What options do we have? Are there any alternatives to Calcite on the market? We did the wrong choice that looked obvious one time. 
So we should carefully avoid it at this time. I know the only one open source implementation of the efficient query optimization strategy - and this is Apache Calcite. The alternative way is to write our own query optimizer from scratch which is not a trivial task at all. 4. What is improvements of Ignite we want to make with the new engine? Ignite will be able to execute complex queries using optimal strategy. I think this is a quite good improvement. [1] https://issues.apache.org/jira/browse/IGNITE-11448 [2] https://issues.apache.org/jira/browse/IGNITE-6085 [3] https://calcite.apache.org/docs/powered_by.html [4] https://www.memsql.com/blog/scaling-distributed-joins/
[jira] [Created] (IGNITE-12236) RepositoryFactorySupport#getQueryLookupStrategy no longer overriden in IgniteRepositoryFactory
Riquet Thibaut created IGNITE-12236: --- Summary: RepositoryFactorySupport#getQueryLookupStrategy no longer overriden in IgniteRepositoryFactory Key: IGNITE-12236 URL: https://issues.apache.org/jira/browse/IGNITE-12236 Project: Ignite Issue Type: Bug Components: spring Affects Versions: 2.7.6 Reporter: Riquet Thibaut Hello, org.apache.ignite.springdata20.repository.support.IgniteRepositoryFactory#getQueryLookupStrategy does not override org.springframework.data.repository.core.support.RepositoryFactorySupport#getQueryLookupStrategy since this commit [https://github.com/spring-projects/spring-data-commons/commit/a6215fbe0f5c9a254cddacb12763737f2c286ad5] this results in a thrown exception in org.springframework.data.repository.core.support.RepositoryFactorySupport.QueryExecutorMethodInterceptor#QueryExecutorMethodInterceptor -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [IGNITE-9836] Invalid check of ea java versions
Done: https://github.com/apache/ignite/pull/6920

While we’re talking about the startup scripts… https://issues.apache.org/jira/browse/IGNITE-11856

Regards,
Stephen

> On 26 Sep 2019, at 17:02, Ilya Kasnacheev wrote:
>
> Hello!
>
> Please do!
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> Tue, 17 Sep 2019 at 11:13, Stephen Darlington <stephen.darling...@gridgain.com>:
>
>> I can’t take any credit for the patch but if the original author has lost
>> interest I’m happy to help push it through.
>>
>> Regards,
>> Stephen
>>
>>> On 16 Sep 2019, at 19:31, Denis Magda wrote:
>>>
>>> Stephen,
>>>
>>> Thanks for sending the patch! Seems that Igniters are already actively
>>> reviewing it in JIRA.
>>>
>>> -
>>> Denis
>>>
>>>
>>> On Mon, Sep 16, 2019 at 7:03 AM Stephen Darlington <stephen.darling...@gridgain.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Would someone mind taking a quick look at this ticket? Basically, a
>>>> clean download of Ignite won’t start if the version of Java you’re
>>>> using has a number like “java version "1.8.0_202-ea"”. (This is the
>>>> default if you get your JDK using Homebrew on a Mac.)
>>>>
>>>> https://issues.apache.org/jira/browse/IGNITE-9836
>>>>
>>>> This has been bugging me for ages and now that I look at it I find
>>>> that there’s already a tiny, working patch available.
>>>>
>>>> Regards,
>>>> Stephen
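The failure described in the original report comes down to reading a version string that carries a pre-release suffix. Below is a minimal Java sketch of a tolerant check. It is neither the patch attached to IGNITE-9836 nor the logic of Ignite's startup scripts; the class and method names are hypothetical, and the only point is that the `-ea` suffix can be ignored when extracting the major version.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Ignite: extracts the major Java version from a version string. */
final class JavaVersionCheck {
    // Leading numeric components only; whatever follows (e.g. ".0_202-ea", "-ea") is ignored.
    private static final Pattern VERSION = Pattern.compile("^(\\d+)(?:\\.(\\d+))?");

    static int majorVersion(String version) {
        Matcher m = VERSION.matcher(version.trim());

        if (!m.find())
            throw new IllegalArgumentException("Unrecognized Java version: " + version);

        int first = Integer.parseInt(m.group(1));

        // Legacy scheme: "1.8.0_202" means Java 8. Modern scheme: "11.0.2" means Java 11.
        if (first == 1 && m.group(2) != null)
            return Integer.parseInt(m.group(2));

        return first;
    }

    public static void main(String[] args) {
        System.out.println(majorVersion("1.8.0_202-ea"));
        System.out.println(majorVersion("11.0.2"));
        System.out.println(majorVersion("13-ea"));
    }
}
```

Matching only the leading digits, rather than the whole string, is what keeps early-access builds from being rejected.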
Re: New SQL execution engine
Hello, Roman. > Also Apache Calcite is commonly used in popular Apache projects I don't think it's a good point. H2 is also commonly used. But, it doesn't conform to Ignite requirements. Can you, please, write down issues and engine requirements to the IEP? So we can discuss each point separately. В Пт, 27/09/2019 в 13:56 +0300, Roman Kondakov пишет: > Hello Nikolay. > > You've asked very good questions. I'll try to answer. > > > 1. What the exact issues with the H2 integration? > > Can you send a tickets links? > > Can we label all H2 integration issues in JIRA? I propose to use "h2" label. > > Current SQL engine is confined in the single-pass map-reduce algorithm. > This make impossible to execute complex queries which can not be > expressed with a single map-reduce pass like subqueries with aggregates > [1]. Another problem is that H2 optimizer is very primitive and not > able to perform many useful optimizations [2]. > > Also Apache Calcite is commonly used in popular Apache projects like > Hive, Drill, Flink and others [3]. So it's mature and well battle tested > framework, while H2 is a toy database which is hardly ever used in the > real production systems. > > > 2. What are the requirements for the new SQL engine? > > We should write it down and discuss. > > The main requirement is to fix the problems listed above. The new SQL > engine should be able to *effectively* execute SQL queries of the > *arbitrary complexity*. For example the new engine will be able to > perform distributed joins in a multiple ways [4], when current engine > can do it only in two ways: collocated and distributed (the latter is > usually not very efficient and needed to set manually). > > > 3. What options do we have? > > Are there any alternatives to Calcite on the market? > > We did the wrong choice that looked obvious one time. > > So we should carefully avoid it at this time. 
> > I know the only one open source implementation of the efficient query > optimization strategy - and this is Apache Calcite. The alternative way > is to write our own query optimizer from scratch which is not a trivial > task at all. > > > > 4. What is improvements of Ignite we want to make with the new engine? > > Ignite will be able to execute complex queries using optimal strategy. I > think this is a quite good improvement. > > > [1] https://issues.apache.org/jira/browse/IGNITE-11448 > [2] https://issues.apache.org/jira/browse/IGNITE-6085 > [3] https://calcite.apache.org/docs/powered_by.html > [4] https://www.memsql.com/blog/scaling-distributed-joins/ signature.asc Description: This is a digitally signed message part
Re: New SQL execution engine
Hello, Andrey. > Ignite SQL layer has some issues that can't be fix with changes in Ignite > only, and we are blocked with H2. What are these issues? Can you make it specific and send a tickets for this issues? > 3. Replace H2 with smth else. Actually, I support this decision in general. But, to make a right choise for H2 replacement we should carefully discuss such huge replacement. So far, I can't see any written down(in IEP) requirements for SQL engine. Let's do it and discuss them. В Пт, 27/09/2019 в 13:39 +0300, Andrey Mashenkov пишет: > Hi Nikolay, > > Let me add my 5- cent here. > > Ignite SQL layer has some issues that can't be fix with changes in Ignite > only, and we are blocked with H2. > To resolve these issues we can: > 1. Donate some changes to H2 and wait for it's next release. But there are > more cons than pros and I think we can't rely on H2 project anymore. > - There is no guarantee our changes will be approved by H2 community. > - We definitely won't to depend on H2 product lifecycle. > - New H2 features (like parallel multi-statement query processing in latest > release) force Ignite for significant changes\refactoring in Ignite SQL > layer with no visible benefits. > Every next release it becomes harder to upgrade H2 dependency. > - Latest H2 versions causes questions about their stability. > > Hot issues are > - Large intermediate results inside H2 internals can cause OOM for some > kind of queries. Ignite can't handle this anyhow for now without reworking > H2 code. > - HashJoins > - Ignite can't start multi-step queries, but 2-step (map-reduce) only. > - It is not possible to apply optimizations on query plan as no logical > plan actually doen't exists. H2 execution plan is hard-wired with H2 > internals and can't be easily transformed. > Implementing a new good planner over H2 looks like a huge task. > > 2. Fork H2. 
> We already done this in GridGain (you can found H2 module in GridGain > community edition) as fastest way to unblock work on SQL improvements. > But this way doesn't look like a good one for Ignite, regarding our > experience. > - H2 code can't be included into Ignite at all. > H2 license are MIT and EPL. From one side they can't be changed to Apache > Licence. From other side Apache Foundation don't want to host any code > licensed with other than Apache License. > GridGain is ok with this, but Apache Foundation won't. > > - We can made separate H2 fork project with it's own lifecycle with full > control over it and publish it in Maven Central. > This doen't seem like a big deal. But will causes additional difficulties > in development, test and release processes of Ignite. > This way seems bring much pain for every contributor. > > 3. Replace H2 with smth else. > E.g. with Apache Calcite. > - Calcite is a framework and it is designed very flexible and extendable. > - Every it's part can be replaced with our own implementation. > - Apache License is out of the box =) > > So, summary: > 1-st way of pain we have now and it slows down Ignite SQL layer developing. > 2-nd looks few better, but seems bring Ignite to nowhere in prospect. > 3-rd is a risky, but promissory way. > > > On Fri, Sep 27, 2019 at 12:16 PM Nikolay Izhikov > wrote: > > > Hello, Igor. > > > > Thanks for starting this discussion. > > > > I think we should take a step back in it and answer the following > > questions: > > > > 1. What the exact issues with the H2 integration? > > Can you send a tickets links? > > Can we label all H2 integration issues in JIRA? I propose to use "h2" > > label. > > > > 2. What are the requirements for the new SQL engine? > > We should write it down and discuss. > > > > 3. What options do we have? > > Are there any alternatives to Calcite on the market? > > We did the wrong choice that looked obvious one time. > > So we should carefully avoid it at this time. 
> > > > 4. What is improvements of Ignite we want to make with the new engine? > > > > > > В Пт, 27/09/2019 в 08:44 +, Igor Seliverstov пишет: > > > Hi Igniters! > > > > > > As you might know currently we have many open issues relating to current > > > > H2 based engine and its execution flow. > > > > > > Some of them are critical (like impossibility to execute particular > > > > queries), some of them are majors (like impossibility to execute particular > > queries without pre-preparation your data to have a collocation) and many > > minors. > > > > > > Most of the issues cannot be solved without whole engine redesign. > > > > > > So, here the proposal: > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084 > > > > > > I'll appreciate if you share your thoughts on top of that. > > > > > > Regards, > > > Igor > > signature.asc Description: This is a digitally signed message part
Re: New SQL execution engine
Hello Nikolay.

You've asked very good questions. I'll try to answer.

> 1. What the exact issues with the H2 integration?
> Can you send a tickets links?
> Can we label all H2 integration issues in JIRA? I propose to use "h2" label.

The current SQL engine is confined to a single-pass map-reduce algorithm. This makes it impossible to execute complex queries that cannot be expressed as a single map-reduce pass, such as subqueries with aggregates [1]. Another problem is that the H2 optimizer is very primitive and cannot perform many useful optimizations [2].

Also, Apache Calcite is commonly used in popular Apache projects like Hive, Drill, Flink and others [3]. So it's a mature and well battle-tested framework, while H2 is a toy database which is hardly ever used in real production systems.

> 2. What are the requirements for the new SQL engine?
> We should write it down and discuss.

The main requirement is to fix the problems listed above. The new SQL engine should be able to *effectively* execute SQL queries of *arbitrary complexity*. For example, the new engine will be able to perform distributed joins in multiple ways [4], whereas the current engine supports only two: collocated and distributed (the latter is usually not very efficient and has to be enabled manually).

> 3. What options do we have?
> Are there any alternatives to Calcite on the market?
> We did the wrong choice that looked obvious one time.
> So we should carefully avoid it at this time.

I know of only one open source implementation of an efficient query optimization strategy, and that is Apache Calcite. The alternative is to write our own query optimizer from scratch, which is not a trivial task at all.

> 4. What is improvements of Ignite we want to make with the new engine?

Ignite will be able to execute complex queries using an optimal strategy. I think this is quite a good improvement.

[1] https://issues.apache.org/jira/browse/IGNITE-11448
[2] https://issues.apache.org/jira/browse/IGNITE-6085
[3] https://calcite.apache.org/docs/powered_by.html
[4] https://www.memsql.com/blog/scaling-distributed-joins/

--
Kind Regards
Roman Kondakov

On 27.09.2019 12:20, Nikolay Izhikov wrote:
> Hello, Igor.
>
> Thanks for starting this discussion.
>
> I think we should take a step back in it and answer the following questions:
>
> 1. What the exact issues with the H2 integration?
> Can you send a tickets links?
> Can we label all H2 integration issues in JIRA? I propose to use "h2" label.
>
> 2. What are the requirements for the new SQL engine?
> We should write it down and discuss.
>
> 3. What options do we have?
> Are there any alternatives to Calcite on the market?
> We did the wrong choice that looked obvious one time.
> So we should carefully avoid it at this time.
>
> 4. What is improvements of Ignite we want to make with the new engine?
>
> On Fri, 27/09/2019 at 08:44 +, Igor Seliverstov wrote:
> > Hi Igniters!
> >
> > As you might know currently we have many open issues relating to current H2 based engine and its execution flow.
> >
> > Some of them are critical (like impossibility to execute particular queries), some of them are majors (like impossibility to execute particular queries without pre-preparation your data to have a collocation) and many minors.
> >
> > Most of the issues cannot be solved without whole engine redesign.
> >
> > So, here the proposal:
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084
> >
> > I'll appreciate if you share your thoughts on top of that.
> >
> > Regards,
> > Igor
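To make the "single-pass map-reduce" constraint from this message concrete, here is a toy Java model of how an aggregate such as AVG fits into exactly one map phase and one reduce phase. It is only an illustration under simplifying assumptions, not Ignite's implementation: `MapReduceAvgSketch`, `Partial`, `map` and `reduce` are hypothetical names, and plain lists stand in for per-node partitions.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of a two-step (map-reduce) aggregate; not Ignite code. */
final class MapReduceAvgSketch {
    /** Partial aggregate that one node sends to the reducer. */
    static final class Partial {
        final long sum;
        final long count;

        Partial(long sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    /** Map step: runs once on each node over its local partition only. */
    static Partial map(List<Integer> partition) {
        long sum = 0;

        for (int v : partition)
            sum += v;

        return new Partial(sum, partition.size());
    }

    /** Reduce step: merges the partial results on a single node. */
    static double reduce(List<Partial> partials) {
        long sum = 0;
        long count = 0;

        for (Partial p : partials) {
            sum += p.sum;
            count += p.count;
        }

        if (count == 0)
            throw new IllegalArgumentException("No rows to aggregate");

        return (double) sum / count;
    }

    public static void main(String[] args) {
        Partial node1 = map(Arrays.asList(1, 2, 3));
        Partial node2 = map(Arrays.asList(4, 5));

        System.out.println(reduce(Arrays.asList(node1, node2)));
    }
}
```

A query that needs the output of one such aggregate as the input of another distributed step does not fit into a single map call followed by a single reduce call, which is the kind of limitation described in [1].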
Re: New SQL execution engine
Hi Nikolay,

Let me add my 5 cents here.

The Ignite SQL layer has some issues that can't be fixed with changes in Ignite only, and we are blocked by H2.
To resolve these issues we can:

1. Donate some changes to H2 and wait for its next release. But there are more cons than pros, and I think we can't rely on the H2 project anymore.
- There is no guarantee our changes will be approved by the H2 community.
- We definitely don't want to depend on the H2 product lifecycle.
- New H2 features (like parallel multi-statement query processing in the latest release) force significant changes/refactoring in the Ignite SQL layer with no visible benefits.
With every release it becomes harder to upgrade the H2 dependency.
- The latest H2 versions raise questions about their stability.

Hot issues are:
- Large intermediate results inside H2 internals can cause OOM for some kinds of queries. Ignite can't handle this for now without reworking H2 code.
- HashJoins
- Ignite can't run multi-step queries, only 2-step (map-reduce) ones.
- It is not possible to apply optimizations to a query plan because no logical plan actually exists. The H2 execution plan is hard-wired into H2 internals and can't be easily transformed.
Implementing a good new planner on top of H2 looks like a huge task.

2. Fork H2.
We have already done this in GridGain (you can find the H2 module in GridGain Community Edition) as the fastest way to unblock work on SQL improvements.
But, given our experience, this way doesn't look like a good one for Ignite.
- H2 code can't be included into Ignite at all.
H2 is licensed under MPL 2.0 and EPL 1.0. On the one hand, these can't be changed to the Apache License. On the other hand, the Apache Foundation doesn't want to host any code under a license other than the Apache License.
GridGain is OK with this, but the Apache Foundation isn't.

- We can make a separate H2 fork project with its own lifecycle, with full control over it, and publish it to Maven Central.
This doesn't seem like a big deal, but it will cause additional difficulties in the development, test and release processes of Ignite.
This way seems to bring much pain for every contributor.

3. Replace H2 with something else.
E.g. with Apache Calcite.
- Calcite is a framework, and it is designed to be very flexible and extensible.
- Every part of it can be replaced with our own implementation.
- Apache License is out of the box =)

So, in summary:
The first way is the pain we have now, and it slows down development of the Ignite SQL layer.
The second looks slightly better, but seems to lead Ignite nowhere in the long run.
The third is a risky but promising way.

On Fri, Sep 27, 2019 at 12:16 PM Nikolay Izhikov wrote:

> Hello, Igor.
>
> Thanks for starting this discussion.
>
> I think we should take a step back in it and answer the following
> questions:
>
> 1. What the exact issues with the H2 integration?
> Can you send a tickets links?
> Can we label all H2 integration issues in JIRA? I propose to use "h2"
> label.
>
> 2. What are the requirements for the new SQL engine?
> We should write it down and discuss.
>
> 3. What options do we have?
> Are there any alternatives to Calcite on the market?
> We did the wrong choice that looked obvious one time.
> So we should carefully avoid it at this time.
>
> 4. What is improvements of Ignite we want to make with the new engine?
>
>
> On Fri, 27/09/2019 at 08:44 +, Igor Seliverstov wrote:
> > Hi Igniters!
> >
> > As you might know currently we have many open issues relating to current
> > H2 based engine and its execution flow.
> >
> > Some of them are critical (like impossibility to execute particular
> > queries), some of them are majors (like impossibility to execute particular
> > queries without pre-preparation your data to have a collocation) and many
> > minors.
> >
> > Most of the issues cannot be solved without whole engine redesign.
> >
> > So, here the proposal:
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084
> >
> > I'll appreciate if you share your thoughts on top of that.
> >
> > Regards,
> > Igor
>

--
Best regards,
Andrey V. Mashenkov
Re: New SQL execution engine
Hi Igor!

In my opinion, using Apache Calcite for distributed SQL query optimization and planning is a much more promising approach than using H2. H2 is not suitable for distributed query execution, and its query optimization abilities are very limited. Apache Calcite, by contrast, is an open source implementation of the Cascades/Volcano query optimization framework [1,2] (other implementations: MS SQL Server, Greenplum). The main advantage of this framework is its extensibility: we can change the optimizer behavior simply by adding or removing optimization rules. Calcite has a cost-based optimizer as well as a heuristic one, which can be useful in some situations.

The main challenges I see here:

1. Implementing distributed query planning for Apache Calcite (it was primarily developed for single-node query optimization). We can reuse the solution of the Apache Drill [3] guys here.

2. We need to implement a new distributed query execution engine. Apache Calcite is a query planning framework, not an execution one, although it has some ability to execute queries in the single-node case.

3. Secondary indexes are not supported by Calcite, so we need to overcome this problem somehow. AFAIK the Apache Phoenix [4] guys implemented secondary index support as sorted materialized views.

4. Apache Calcite is a cost-based optimizer, so we need to create our own cost model and gather statistics to be able to choose the most efficient query execution plans.

5. What about deprecating our current query API, which has a number of drawbacks, such as using the shortcut `List` as a query result or multiple redundant flags in `SqlFieldsQuery` (collocated, lazy, etc.) that are useless for the new query execution engine?

[1] https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/Cascades-graefe.pdf
[2] https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/Volcano-graefe.pdf
[3] https://drill.apache.org/
[4] https://phoenix.apache.org/

--
Kind Regards
Roman Kondakov

On 27.09.2019 11:44, Igor Seliverstov wrote:
> Hi Igniters!
>
> As you might know currently we have many open issues relating to current H2 based engine and its execution flow.
>
> Some of them are critical (like impossibility to execute particular queries), some of them are majors (like impossibility to execute particular queries without pre-preparation your data to have a collocation) and many minors.
>
> Most of the issues cannot be solved without whole engine redesign.
>
> So, here the proposal:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084
>
> I'll appreciate if you share your thoughts on top of that.
>
> Regards,
> Igor
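The extensibility argument in this message (optimizer behavior changes when rules are added or removed) can be illustrated with a toy rule-driven rewriter. The sketch below deliberately does not use the Calcite API: `RuleBasedPlannerSketch`, `Node`, `addRule` and `pushFilterBelowJoin` are hypothetical names, the rewriting is purely heuristic (no cost model), and the example rule assumes the filter references only the left join input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

/** Toy rule-driven plan rewriter; illustrates pluggable rules, not the Calcite API. */
final class RuleBasedPlannerSketch {
    /** Toy plan node: an operator name plus its inputs. */
    static final class Node {
        final String op;
        final List<Node> inputs;

        Node(String op, Node... inputs) {
            this.op = op;
            this.inputs = Arrays.asList(inputs);
        }

        @Override public String toString() {
            if (inputs.isEmpty())
                return op;

            StringBuilder sb = new StringBuilder(op).append('(');

            for (int i = 0; i < inputs.size(); i++) {
                if (i > 0)
                    sb.append(", ");

                sb.append(inputs.get(i));
            }

            return sb.append(')').toString();
        }
    }

    private final List<UnaryOperator<Node>> rules = new ArrayList<>();

    /** Registering a rule is the only thing needed to change the planner's behavior. */
    void addRule(UnaryOperator<Node> rule) {
        rules.add(rule);
    }

    /** Rewrites the inputs bottom-up, then applies each registered rule once, in order. */
    Node optimize(Node node) {
        Node[] rewritten = new Node[node.inputs.size()];

        for (int i = 0; i < rewritten.length; i++)
            rewritten[i] = optimize(node.inputs.get(i));

        Node cur = new Node(node.op, rewritten);

        for (UnaryOperator<Node> rule : rules)
            cur = rule.apply(cur);

        return cur;
    }

    /** Example rule: Filter(Join(a, b)) becomes Join(Filter(a), b). */
    static Node pushFilterBelowJoin(Node n) {
        if (n.op.equals("Filter") && n.inputs.size() == 1) {
            Node join = n.inputs.get(0);

            if (join.op.equals("Join") && join.inputs.size() == 2)
                return new Node("Join", new Node("Filter", join.inputs.get(0)), join.inputs.get(1));
        }

        return n;
    }

    public static void main(String[] args) {
        Node plan = new Node("Filter", new Node("Join", new Node("A"), new Node("B")));

        RuleBasedPlannerSketch planner = new RuleBasedPlannerSketch();
        System.out.println(planner.optimize(plan)); // no rules registered: plan is unchanged

        planner.addRule(RuleBasedPlannerSketch::pushFilterBelowJoin);
        System.out.println(planner.optimize(plan)); // filter pushed below the join
    }
}
```

A cost-based planner of the kind discussed in point 4 would go further and compare alternative plans using statistics instead of applying every matching rule unconditionally.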
Re: New SQL execution engine
Hello, Igor.

Thanks for starting this discussion.

I think we should take a step back and answer the following questions:

1. What exactly are the issues with the H2 integration?
Can you send ticket links?
Can we label all H2 integration issues in JIRA? I propose using the "h2" label.

2. What are the requirements for the new SQL engine?
We should write them down and discuss them.

3. What options do we have?
Are there any alternatives to Calcite on the market?
We once made a choice that looked obvious and turned out to be wrong.
So we should be careful not to repeat that this time.

4. What improvements to Ignite do we want to make with the new engine?

On Fri, 27/09/2019 at 08:44 +, Igor Seliverstov wrote:
> Hi Igniters!
>
> As you might know currently we have many open issues relating to current H2
> based engine and its execution flow.
>
> Some of them are critical (like impossibility to execute particular queries),
> some of them are majors (like impossibility to execute particular queries
> without pre-preparation your data to have a collocation) and many minors.
>
> Most of the issues cannot be solved without whole engine redesign.
>
> So, here the proposal:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084
>
> I'll appreciate if you share your thoughts on top of that.
>
> Regards,
> Igor
New SQL execution engine
Hi Igniters!

As you might know, we currently have many open issues related to the current H2-based engine and its execution flow.

Some of them are critical (some queries cannot be executed at all), some are major (some queries cannot be executed unless the data is prepared in advance so that it is collocated), and many are minor.

Most of these issues cannot be solved without redesigning the whole engine.

So, here is the proposal:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084

I'd appreciate it if you shared your thoughts on it.

Regards,
Igor
Re: Improvements for new security approach.
Hello, Maksim! Thank you for your effort and interest in the security of Ignite. I would like you to pay attention to the discussion [1] and issue [2]. It looks like not only task should execute in the current security context but all operations too, that is essential to determine a security id for events. Also, we need to get rid of GridTaskThreadContextKey#TC_SUBJ_ID as duplication of IgnitSecurity responsibility. I think your task is the right place to do that. What is your opinion? >>It's the reason why subject id isn't enough and we should transmit subject inside message for this case. There is a problem with this approach. Subject's size is unlimited, that can lead to a dramatic increase in traffic between nodes. 1. http://apache-ignite-developers.2346864.n4.nabble.com/JavaDoc-for-Event-s-subjectId-methods-td43663.html 2. https://issues.apache.org/jira/browse/IGNITE-9914 пт, 27 сент. 2019 г. в 08:38, Anton Vinogradov : > Maksim > > >> I want to fix 2-3-4 points under one ticket. > Please let me know once it's become ready to be reviewed. > > On Thu, Sep 26, 2019 at 5:18 PM Maksim Stepachev < > maksim.stepac...@gmail.com> > wrote: > > > Hi. > > > > Anton Vinogradov, > > > > I want to fix 2-3-4 points under one ticket. > > > > The first was fixed in the ticket: > > https://issues.apache.org/jira/browse/IGNITE-11094 > > Also, I aggry with you that 5-6 isn't required to ignite. > > > > Denis Garus, > > I made reproducer for point 3. 
Looks at the test from my pull-request: > > JettyRestPropagationSecurityContextTest > > > > https://github.com/apache/ignite/pull/6918 > > > > For point 2 you should apply GridRestProcessor from pr and set debug into > > VisorQueryUtils#scheduleQueryStart between > > ignite.context().closure().runLocalSafe and call: > > ignite.context().security().securityContext() > > > > > > For point 3, do action above and call: > > > ignite.context().discovery().node(ignite.context().security().securityContext().subject().id()) > > > > It returns null because this subject was created from the rest. It's the > > reason why subject id isn't enough and we should transmit subject inside > > message for this case. > > > > чт, 18 июл. 2019 г. в 12:45, Anton Vinogradov : > > > >> Maksim, > >> > >> Could you please split IGNITE-11992 to subtasks with proper > descriptions? > >> This will allow us to relocate discussion to the issues to solve each > >> problem properly. > >> > >> On Thu, Jul 18, 2019 at 11:57 AM Denis Garus > wrote: > >> > >> > Hello, Maksim! > >> > Thanks for your analysis! > >> > > >> > I have a few questions about your proposals. > >> > > >> > GridRestProcessor. > >> > AFAIK, when GridRestProcessor handle client request > >> > (GridRestProcessor#handleRequest) > >> > it process authentication (GridRestProcessor#authenticate) > >> > and then authorization of request (GridRestProcessor#authorize) inside > >> > client context. > >> > Can you give additional info about issues with GridRestProcessor from > 3 > >> and > >> > 4? Maybe you have a reproducer for the problem? > >> > > >> > NoOpIgniteSecurityProcessor. > >> > I think the case that you describe in 5 is not a bug. > >> > All nodes (client and server) must have security enabled or disabled. > >> > I can't imagine the case when it is not. > >> > > >> > ATTR_SECURITY_SUBJECT. > >> > I don't think that compatibility is needed here. 
If you will use node > >> with > >> > propagation security context to remote node and older nodes > >> > you can get subtle errors. > >> > > >> > чт, 18 июл. 2019 г. в 11:12, Maksim Stepachev < > >> maksim.stepac...@gmail.com > >> > >: > >> > > >> > > Hi, Ivan. > >> > > > >> > > Yes, I have. > >> > > https://issues.apache.org/jira/browse/IGNITE-11992 > >> > > > >> > > I'm waiting for a visa. > >> > > > >> > > > >> > > чт, 18 июл. 2019 г. в 11:09, Ivan Rakov : > >> > > > >> > >> Hello Max, > >> > >> > >> > >> Thanks for your analysis! > >> > >> > >> > >> Have you created a JIRA issue for discovered defects? > >> > >> > >> > >> Best Regards, > >> > >> Ivan Rakov > >> > >> > >> > >> On 17.07.2019 17:08, Maksim Stepachev wrote: > >> > >> > Hello, Igniters. > >> > >> > > >> > >> > The main idea of the new security is propagation security > >> context > >> > >> to > >> > >> > other nodes and does action with initial permission. The solution > >> > looks > >> > >> > fine but has imperfections. > >> > >> > > >> > >> > 1. ZookeaperDiscoveryImpl doesn't implement security into itself. > >> > >> >As a result: Caused by: class > >> > >> org.apache.ignite.spi.IgniteSpiException: > >> > >> > Security context isn't certain. > >> > >> > 2. The visor tasks lost permission. > >> > >> > The method VisorQueryUtils#scheduleQueryStart makes a new thread > >> and > >> > >> loses > >> > >> > context. > >> > >> > 3. The GridRestProcessor does tasks outside "withContext" > >> section. As > >> > >> > result context loses. > >> > >> > 4. The GridRestProcessor isn't client, we can't read security > >> subject >