date:20190927

[jira] [Created] (IGNITE-12239) Transaction keys system view

2019-09-27 Thread Nikolay Izhikov (Jira)

Nikolay Izhikov created IGNITE-12239:


 Summary: Transaction keys system view
 Key: IGNITE-12239
 URL: https://issues.apache.org/jira/browse/IGNITE-12239
 Project: Ignite
  Issue Type: Sub-task
Reporter: Nikolay Izhikov


We should export transaction keys as a system view.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Re: New SQL execution engine

2019-09-27 Thread Nikolay Izhikov

Hello, Denis.

Thanks for the clarifications.

Sounds good to me.
All I'm trying to say in this thread is:
Guys, please, let's take a step back and write down the requirements (what we want
to get from the SQL engine).
Which features and use cases are primary for us?

I'm sure you have already done this during your research.

Please, share it with the community.

I'm pretty sure we will come back to this document again and again during the migration.
So a well-written design is worth it.

On Fri, 27/09/2019 at 09:10 -0700, Denis Magda wrote:
> Ignite mates, let me try to move the discussion in a constructive
> direction. It looks like we set the wrong context from the very beginning.
> 
> Before proposing this idea to the community, some of us were
> discussing/researching the topic in different groups (one needs to think
> it through first before even suggesting changes of this magnitude). The
> day has come to share this idea with the whole community and outline the
> next actions. But (!) nobody is 100% sure that it's the right decision.
> Thus, this will be an *experiment*: some of our community members will be
> developing a *prototype*, and we shall make a final decision only based on
> the prototype outcomes. Igor, Roman, Ivan, Andrey, I hope nothing has
> changed and we're on the same page here.
> 
> Many technical and architectural reasons that justify this project have
> been shared, but let me throw in my perspective. There is nothing wrong
> with H2; it was the right choice at the time. Thanks to H2 and the Ignite
> SQL APIs, our project is used across hundreds of deployments that
> accelerate relational databases or use Ignite as a system of record.
> However, these days many more companies are migrating to *distributed*
> databases that speak SQL. For instance, if a couple of years ago 1 out of
> 10 use cases needed support for multi-join queries, queries with
> subselects, or efficient memory usage, today 5 out of 10 use cases are of
> this kind; in the foreseeable future, it will be 10 out of 10. So, the
> evolution is in progress: the relational world is going distributed, and
> keeping pace with it has become exhausting for both the Ignite SQL
> maintainers and the experts who help tune it for production, mostly due
> to the H2 dependency. Thus, Ignite SQL has to evolve and be ready to face
> the future reality.
> 
> Luckily, we don't need to rush, and we don't have the right to rush,
> because hundreds of existing users have already trusted their production
> environments to Ignite SQL, and we need to roll out changes with such a
> big impact carefully. So, I'm excited that Roman, Igor, Ivan, and Andrey
> stepped in and agreed to be the first contributors who will be
> *experimenting* with the new SQL engine. Let's support them; let's connect
> them with the Apache Calcite community and see how this story evolves.
> Folks, please keep the community aware of the progress and let us know
> when help is needed; some of us will be ready to support with development
> once you create a solid foundation for the prototype.
> 
> -
> Denis
> 
> 
> On Fri, Sep 27, 2019 at 1:45 AM Igor Seliverstov wrote:
> 
> > Hi Igniters!
> > 
> > As you might know, we currently have many open issues related to the
> > current H2-based engine and its execution flow.
> > 
> > Some of them are critical (like the impossibility of executing particular
> > queries), some are major (like the impossibility of executing particular
> > queries without first preparing your data for collocation), and many are
> > minor.
> > 
> > Most of the issues cannot be solved without redesigning the whole engine.
> > 
> > So, here is the proposal:
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084
> > 
> > I'd appreciate it if you shared your thoughts on it.
> > 
> > Regards,
> > Igor
> > 



Re: New SQL execution engine

2019-09-27 Thread Nikolay Izhikov

> I think we should discuss the idea in general.

Everybody likes the idea so far :)
The issues are in the details, as usual.


On Fri, 27/09/2019 at 19:03 +0300, Seliverstov Igor wrote:
> Nikolay,
> 
> > Which project hosts the Calcite-based engine?
> 
> 
> Currently, the prototype lives in my personal Ignite fork. I need an
> appropriate ticket before pushing it to the ASF git repository.
> First, I think we should discuss the idea in general.
> 
> > Personally, I'm against supporting two independent implementations of
> > the SQL engine for several releases.
> 
> 
> I don't like the idea of having two engines either. But even developing
> the engine on top of the Calcite library is still a big deal.
> I'm not sure it will be ready; no, I'm sure it WON'T be ready by the
> Ignite 3 release. That's why I mentioned the option of having two engines
> at the same time.
> 
> > Let's start with the IEP clarification and replace the SQL engine with
> > the one that is best for Ignite.
> 
> Of course, but in any case it's good to get familiar with the couple of
> examples it already describes and to clarify any additional questions the
> community may have.
> 
> Regards,
> Igor
> 
> On 27 Sep 2019, at 18:22, Nikolay Izhikov wrote:
> > 
> > Igor.
> > 
> > > There is no decision yet; that is what we should decide here.
> > 
> > Great.
> > 
> > > Right now, the Calcite-based engine is placed in a separate module
> > 
> > Which project hosts the Calcite-based engine?
> > 
> > > It's possible to develop it as an experimental extension at first (not a
> > > replacement)
> > 
> > For me, Ignite 3 is the place where the new engine should go.
> > Personally, I'm against supporting two independent implementations of
> > the SQL engine for several releases.
> > 
> > Ignite has too many partially implemented features to add one more :)
> > 
> > Let's start with the IEP clarification and replace the SQL engine with
> > the one that is best for Ignite.
> > 
> > 
> > On Fri, 27/09/2019 at 18:08 +0300, Seliverstov Igor wrote:
> > > Nikolay,
> > > 
> > > At last, we have better questions.
> > > 
> > > There is no decision yet; that is what we should decide here.
> > > 
> > > Doing nothing isn't a decision; it's just doing nothing.
> > > 
> > > Spark Catalyst is a good example, but under the hood it is built on
> > > exactly the same idea, just adapted to Spark. Calcite is the same, but
> > > general-purpose. That's why it's a better starting point.
> > > 
> > > Implementing an engine from scratch is really cool, but it looks like
> > > reinventing the wheel; I don't think it makes sense. At least, I'm
> > > against this option.
> > > 
> > > I added requirements to the IEP (as you asked); as you can see, it's in
> > > the DRAFT state and will be supplemented with details.
> > > 
> > > We have some thoughts on how to make the replacement smooth, but first
> > > we should decide what to replace and with what.
> > > 
> > > Right now, the Calcite-based engine is placed in a separate module. We
> > > checked that it can build an execution graph for both local and
> > > distributed cases, and it is easily extensible.
> > > We talked to the Calcite community to identify possible future issues,
> > > and everything points to it being the best option.
> > > It's possible to develop it as an experimental extension at first (not
> > > a replacement) until we make sure that it works as expected. This way,
> > > there are no risks for anybody who uses Ignite in production.
> > > 
> > > Regards,
> > > Igor
> > > 
> > > 
> > > > On 27 Sep 2019, at 17:25, Nikolay Izhikov wrote:
> > > > 
> > > > Igor.
> > > > 
> > > > > The main issue - there is no *selection*.
> > > > 
> > > > 1. I don't remember a community decision about this.
> > > > 
> > > > 2. We should avoid making such a long-term decision so quickly.
> > > > We made this kind of decision with H2 and have come to the point where
> > > > we should review it.
> > > > 
> > > > > 1) Implementing white papers from scratch
> > > > > 2) Adapting Calcite to our needs.
> > > > 
> > > > The third option doesn't fix the issues we have with H2.
> > > > The fourth option I know of is using spark-catalyst.
> > > > 
> > > > What is wrong with writing an engine from scratch?
> > > > 
> > > > I'm asking you to start with the engine requirements.
> > > > Can we please discuss them?
> > > > 
> > > > > If you have an alternative - you're welcome, I'll gratefully listen
> > > > > to you.
> > > > 
> > > > We have an alternative for now: the H2-based engine.
> > > > 
> > > > > The main question isn't "WHAT" but "HOW" - that's the discussion
> > > > > topic from my point of view.
> > > > 
> > > > Once we make a decision about the engine, we can discuss the roadmap
> > > > for the replacement.
> > > > Once again: replacing the SQL engine with a more customizable one makes
> > > > sense to me.
> > > > But this kind of decision needs careful discussion.
> > > > 
> > > > On Fri, 27/09/2019 at 17:08 +0300, Seliverstov Igor wrote:
> > > > > Nikolay,
> > > > > 
> > > > > The main issue - there is no *selection*.
> > > > > 
> > > > > There is a field of knowledge -

Re: New SQL execution engine

2019-09-27 Thread Denis Magda

Ignite mates, let me try to move the discussion in a constructive
direction. It looks like we set the wrong context from the very beginning.

Before proposing this idea to the community, some of us were
discussing/researching the topic in different groups (one needs to think
it through first before even suggesting changes of this magnitude). The
day has come to share this idea with the whole community and outline the
next actions. But (!) nobody is 100% sure that it's the right decision.
Thus, this will be an *experiment*: some of our community members will be
developing a *prototype*, and we shall make a final decision only based on
the prototype outcomes. Igor, Roman, Ivan, Andrey, I hope nothing has
changed and we're on the same page here.

Many technical and architectural reasons that justify this project have
been shared, but let me throw in my perspective. There is nothing wrong
with H2; it was the right choice at the time. Thanks to H2 and the Ignite
SQL APIs, our project is used across hundreds of deployments that
accelerate relational databases or use Ignite as a system of record.
However, these days many more companies are migrating to *distributed*
databases that speak SQL. For instance, if a couple of years ago 1 out of
10 use cases needed support for multi-join queries, queries with
subselects, or efficient memory usage, today 5 out of 10 use cases are of
this kind; in the foreseeable future, it will be 10 out of 10. So, the
evolution is in progress: the relational world is going distributed, and
keeping pace with it has become exhausting for both the Ignite SQL
maintainers and the experts who help tune it for production, mostly due
to the H2 dependency. Thus, Ignite SQL has to evolve and be ready to face
the future reality.

Luckily, we don't need to rush, and we don't have the right to rush,
because hundreds of existing users have already trusted their production
environments to Ignite SQL, and we need to roll out changes with such a
big impact carefully. So, I'm excited that Roman, Igor, Ivan, and Andrey
stepped in and agreed to be the first contributors who will be
*experimenting* with the new SQL engine. Let's support them; let's connect
them with the Apache Calcite community and see how this story evolves.
Folks, please keep the community aware of the progress and let us know
when help is needed; some of us will be ready to support with development
once you create a solid foundation for the prototype.

-
Denis

On Fri, Sep 27, 2019 at 1:45 AM Igor Seliverstov wrote:

> Hi Igniters!
>
> As you might know, we currently have many open issues related to the
> current H2-based engine and its execution flow.
>
> Some of them are critical (like the impossibility of executing particular
> queries), some are major (like the impossibility of executing particular
> queries without first preparing your data for collocation), and many are
> minor.
>
> Most of the issues cannot be solved without redesigning the whole engine.
>
> So, here is the proposal:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084
>
> I'd appreciate it if you shared your thoughts on it.
>
> Regards,
> Igor
>

Re: New SQL execution engine

2019-09-27 Thread Seliverstov Igor

Nikolay,

> Which project hosts the Calcite-based engine?


Currently, the prototype lives in my personal Ignite fork. I need an
appropriate ticket before pushing it to the ASF git repository.
First, I think we should discuss the idea in general.

> Personally, I'm against supporting two independent implementations of the
> SQL engine for several releases.


I don't like the idea of having two engines either. But even developing the
engine on top of the Calcite library is still a big deal.
I'm not sure it will be ready; no, I'm sure it WON'T be ready by the Ignite 3
release. That's why I mentioned the option of having two engines at the same
time.

> Let's start with the IEP clarification and replace the SQL engine with the
> one that is best for Ignite.

Of course, but in any case it's good to get familiar with the couple of
examples it already describes and to clarify any additional questions the
community may have.

Regards,
Igor

> On 27 Sep 2019, at 18:22, Nikolay Izhikov wrote:
> 
> Igor.
> 
>> There is no decision yet; that is what we should decide here.
> 
> Great.
> 
>> Right now, the Calcite-based engine is placed in a separate module
> 
> Which project hosts the Calcite-based engine?
> 
>> It's possible to develop it as an experimental extension at first (not a
>> replacement)
> 
> For me, Ignite 3 is the place where the new engine should go.
> Personally, I'm against supporting two independent implementations of the
> SQL engine for several releases.
> 
> Ignite has too many partially implemented features to add one more :)
> 
> Let's start with the IEP clarification and replace the SQL engine with the
> one that is best for Ignite.
> 
> 
> On Fri, 27/09/2019 at 18:08 +0300, Seliverstov Igor wrote:
>> Nikolay,
>> 
>> At last, we have better questions.
>> 
>> There is no decision yet; that is what we should decide here.
>> 
>> Doing nothing isn't a decision; it's just doing nothing.
>> 
>> Spark Catalyst is a good example, but under the hood it is built on exactly
>> the same idea, just adapted to Spark. Calcite is the same, but
>> general-purpose. That's why it's a better starting point.
>> 
>> Implementing an engine from scratch is really cool, but it looks like
>> reinventing the wheel; I don't think it makes sense. At least, I'm against
>> this option.
>> 
>> I added requirements to the IEP (as you asked); as you can see, it's in the
>> DRAFT state and will be supplemented with details.
>> 
>> We have some thoughts on how to make the replacement smooth, but first we
>> should decide what to replace and with what.
>> 
>> Right now, the Calcite-based engine is placed in a separate module. We
>> checked that it can build an execution graph for both local and distributed
>> cases, and it is easily extensible.
>> We talked to the Calcite community to identify possible future issues, and
>> everything points to it being the best option.
>> It's possible to develop it as an experimental extension at first (not a
>> replacement) until we make sure that it works as expected. This way, there
>> are no risks for anybody who uses Ignite in production.
>> 
>> Regards,
>> Igor
>> 
>> 
>>> On 27 Sep 2019, at 17:25, Nikolay Izhikov wrote:
>>> 
>>> Igor.
>>> 
>>>> The main issue - there is no *selection*.
>>> 
>>> 1. I don't remember a community decision about this.
>>> 
>>> 2. We should avoid making such a long-term decision so quickly.
>>> We made this kind of decision with H2 and have come to the point where we
>>> should review it.
>>> 
>>>> 1) Implementing white papers from scratch
>>>> 2) Adapting Calcite to our needs.
>>> 
>>> The third option doesn't fix the issues we have with H2.
>>> The fourth option I know of is using spark-catalyst.
>>> 
>>> What is wrong with writing an engine from scratch?
>>> 
>>> I'm asking you to start with the engine requirements.
>>> Can we please discuss them?
>>> 
>>>> If you have an alternative - you're welcome, I'll gratefully listen to you.
>>> 
>>> We have an alternative for now: the H2-based engine.
>>> 
>>>> The main question isn't "WHAT" but "HOW" - that's the discussion topic
>>>> from my point of view.
>>> 
>>> Once we make a decision about the engine, we can discuss the roadmap for
>>> the replacement.
>>> Once again: replacing the SQL engine with a more customizable one makes
>>> sense to me.
>>> But this kind of decision needs careful discussion.
>>> 
>>> On Fri, 27/09/2019 at 17:08 +0300, Seliverstov Igor wrote:
>>>> Nikolay,
>>>> 
>>>> The main issue - there is no *selection*.
>>>> 
>>>> There is a field of knowledge - relational algebra, which describes how to
>>>> transform relational expressions while preserving their semantics, and a
>>>> couple of implementations (Calcite is the only one written in Java).
>>>> 
>>>> There are only two alternatives:
>>>> 
>>>> 1) Implementing white papers from scratch
>>>> 2) Adapting Calcite to our needs.
>>>> 
>>>> The second way was chosen by several other projects: there is experience,
>>>> there is a list of known issues (like using indexes), so almost everything
>>>> is already done for us.
>>>> 
>>>> Implementing a planner is a big deal; I think everybody understands it
>>>

Re: Text queries/indexes (GridLuceneIndex, @QueryTextField)

2019-09-27 Thread Павлухин Иван

Yuriy,

Thank you for providing details! Quite interesting.

Yes, we already have support for distributed limit and for merging sorted
sub-results for SQL queries. E.g. ReduceIndexSorted and
MergeStreamIterator are used for merging sorted streams.

Could you please also clarify the score/relevance part? Is it provided by
the Lucene engine for each query result? I am thinking about how to do a
sorted merge properly in this case.
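The score-ordered merge Ivan asks about can be sketched as a classic k-way merge. The Java below is a minimal illustration under stated assumptions, not Ignite's ReduceIndexSorted or MergeStreamIterator: it assumes each node returns its text-query hits already sorted by descending Lucene score, and the names ScoreMerge, ScoredDoc and mergeTopN are hypothetical. Pushing the limit down to the nodes is safe because any hit in the global top N must also be in its own node's top N, so each node only needs to send N rows. One caveat: Lucene computes scores from each node's local index statistics, so scores coming from different nodes may not be strictly comparable.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Hypothetical sketch (not Ignite's ReduceIndexSorted/MergeStreamIterator):
 * merges per-node text-query results, each already sorted by descending
 * score, into a single global top-N list.
 */
public class ScoreMerge {

    /** Hypothetical result row: a cache key and the score reported by its node. */
    public static final class ScoredDoc {
        public final String key;
        public final float score;

        public ScoredDoc(String key, float score) {
            this.key = key;
            this.score = score;
        }
    }

    /** Current head of one node's result stream, plus the rest of that stream. */
    private static final class Head {
        final ScoredDoc doc;
        final Iterator<ScoredDoc> rest;

        Head(ScoredDoc doc, Iterator<ScoredDoc> rest) {
            this.doc = doc;
            this.rest = rest;
        }
    }

    /** K-way merge of descending-score streams that stops after {@code limit} rows. */
    public static List<ScoredDoc> mergeTopN(List<List<ScoredDoc>> perNode, int limit) {
        // Max-heap on score: polling always yields the best remaining head among all nodes.
        PriorityQueue<Head> heap = new PriorityQueue<>(
            Comparator.comparingDouble((Head h) -> h.doc.score).reversed());

        for (List<ScoredDoc> nodeRows : perNode) {
            Iterator<ScoredDoc> it = nodeRows.iterator();
            if (it.hasNext())
                heap.add(new Head(it.next(), it));
        }

        List<ScoredDoc> out = new ArrayList<>();
        while (!heap.isEmpty() && out.size() < limit) {
            Head best = heap.poll();
            out.add(best.doc);
            // Refill the heap from the stream the polled row came from.
            if (best.rest.hasNext())
                heap.add(new Head(best.rest.next(), best.rest));
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<ScoredDoc>> perNode = List.of(
            List.of(new ScoredDoc("a", 0.9f), new ScoredDoc("b", 0.4f)),
            List.of(new ScoredDoc("c", 0.7f), new ScoredDoc("d", 0.6f)));

        for (ScoredDoc d : mergeTopN(perNode, 3))
            System.out.println(d.key + " " + d.score);
        // prints: a 0.9, c 0.7, d 0.6 (one per line)
    }
}
```

With k nodes, the heap costs O(log k) per emitted row, and the loop stops as soon as the limit is reached instead of draining every node's stream.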

Wed, 25 Sep 2019 at 18:56, Yuriy Shuliga:
>
> Ivan,
>
> Thank you for the interesting question!
>
> Text searches (or full-text searches) are mostly human-oriented, and what
> the user cares about is the topmost part of the response.
> The user can then read it, evaluate it, and use the returned records for
> further purposes.
>
> In our case in particular, we use Ignite for operations on financial data,
> and there is a lot of text there, like asset names, financial instruments,
> companies, etc.
> To work with this quickly and reliably, users are used to working with text
> search, type-ahead completions, and suggestions.
>
> For these purposes, we index particular string data in separate caches.
>
> Sorting capabilities and response size limits are very important there,
> as our API has to provide the most relevant information within a limited
> size.
>
> Now let me comment from the Ignite/Lucene perspective.
> Actually, Ignite queries Lucene, and Lucene returns *TopDocs.scoreDocs*
> already sorted by *score* (relevance), so the most relevant documents are
> on top.
> However, responses to distributed queries from different nodes are
> currently merged into the final query cursor queue in an arbitrary order.
> So, in fact, the score order is already lost at this point. Also, Ignite
> requests all possible documents from Lucene, which is redundant and bad
> for performance.
>
> I'm implementing a *limit* parameter as part of *TextQuery*, and I have to
> note that we still need to add sorting to text query processing in order
> to get usable results.
>
> The *limit* parameter by itself should address some of the issues above,
> but sorting by document score, at least, should definitely be implemented
> along with the limit.
>
> This is a pretty short commentary; if you still have any questions, please
> don't hesitate to ask :)
>
> BR,
> Yuriy Shuliha
>
> Thu, 19 Sep 2019 at 11:38, Павлухин Иван wrote:
> >
> > Yuriy,
> >
> > Greatly appreciate your interest.
> >
> > Could you please elaborate a little on sorting? What tasks does it help
> > to solve, and how? It would be great if you could provide an example.
> >
> > Wed, 18 Sep 2019 at 09:39, Alexei Scherbakov <alexey.scherbak...@gmail.com>:
> > >
> > > Denis,
> > >
> > > I like the idea of throwing an exception for enabled text queries on
> > > persistent caches.
> > >
> > > Also, I'm fine with the proposed limit for unsorted searches.
> > >
> > > Yury, please proceed with ticket creation.
> > >
> > > Tue, 17 Sep 2019, 22:06 Denis Magda:
> > >
> > > > Igniters,
> > > >
> > > > I see nothing wrong with Yury's proposal with regard to full-text
> > > > search API evolution as long as Yury is ready to push it forward.
> > > >
> > > > As for the in-memory-only mode, it makes total sense for in-memory
> > > > data grid deployments where Ignite caches data of an underlying DB
> > > > like Postgres.
> > > > As part of the changes, I would simply throw an exception (by
> > > > default) if someone attempts to use text indices with native
> > > > persistence enabled.
> > > > If the person is ready to live with that limitation, then an
> > > > explicit configuration change is needed to get around the exception.
> > > >
> > > > Thoughts?
> > > >
> > > >
> > > > -
> > > > Denis
> > > >
> > > >
> > > > On Tue, Sep 17, 2019 at 7:44 AM Yuriy Shuliga wrote:
> > > >
> > > > > Hello to all again,
> > > > >
> > > > > Thank you for the important comments and notes below!
> > > > >
> > > > > Let me answer and continue the discussion.
> > > > >
> > > > > (I) Overall need for Lucene indexing
> > > > >
> > > > > Alexei referenced
> > > > > https://issues.apache.org/jira/browse/IGNITE-5371, where the
> > > > > absence of index persistence was named as an obstacle to further
> > > > > development.
> > > > >
> > > > > a) This ticket is already closed as not valid.
> > > > > b) There is a definite need (in our project as well) for
> > > > > in-memory-only indexing of selected data.
> > > > > We intend to use search capabilities for fetching a limited number of
> > > > > records that will be used in type-ahead search / suggestions.
> > > > > Not all of the data will be indexed, and there is no need for the
> > > > > Lucene index to be persistent. I hope this is a widespread
> > > > > text-search usage pattern.
> > > > >
> > > > > (II) Necessary fixes in the current implementation
> > > > >
> > > > > a) Implementation of a correct *limit* (*offset* does not seem to be
> > > > > required in text-search tasks for now)
> > > > > I have investigated the data flow for distributed text queries. It was
> > > > > simple

Re: New SQL execution engine

2019-09-27 Thread Nikolay Izhikov

Igor.

> There is no decision yet; that is what we should decide here.

Great.

> Right now, the Calcite-based engine is placed in a separate module

Which project hosts the Calcite-based engine?

> It's possible to develop it as an experimental extension at first (not a 
> replacement)

For me, Ignite 3 is the place where the new engine should go.
Personally, I'm against supporting two independent implementations of the SQL
engine for several releases.

Ignite has too many partially implemented features to add one more :)

Let's start with the IEP clarification and replace the SQL engine with the one
that is best for Ignite.


On Fri, 27/09/2019 at 18:08 +0300, Seliverstov Igor wrote:
> Nikolay,
> 
> At last, we have better questions.
> 
> There is no decision yet; that is what we should decide here.
> 
> Doing nothing isn't a decision; it's just doing nothing.
> 
> Spark Catalyst is a good example, but under the hood it is built on exactly
> the same idea, just adapted to Spark. Calcite is the same, but
> general-purpose. That's why it's a better starting point.
> 
> Implementing an engine from scratch is really cool, but it looks like
> reinventing the wheel; I don't think it makes sense. At least, I'm against
> this option.
> 
> I added requirements to the IEP (as you asked); as you can see, it's in the
> DRAFT state and will be supplemented with details.
> 
> We have some thoughts on how to make the replacement smooth, but first we
> should decide what to replace and with what.
> 
> Right now, the Calcite-based engine is placed in a separate module. We
> checked that it can build an execution graph for both local and distributed
> cases, and it is easily extensible.
> We talked to the Calcite community to identify possible future issues, and
> everything points to it being the best option.
> It's possible to develop it as an experimental extension at first (not a
> replacement) until we make sure that it works as expected. This way, there
> are no risks for anybody who uses Ignite in production.
> 
> Regards,
> Igor
> 
> 
> > On 27 Sep 2019, at 17:25, Nikolay Izhikov wrote:
> > 
> > Igor.
> > 
> > > The main issue - there is no *selection*.
> > 
> > 1. I don't remember a community decision about this.
> > 
> > 2. We should avoid making such a long-term decision so quickly.
> > We made this kind of decision with H2 and have come to the point where we
> > should review it.
> > 
> > > 1) Implementing white papers from scratch
> > > 2) Adapting Calcite to our needs.
> > 
> > The third option doesn't fix the issues we have with H2.
> > The fourth option I know of is using spark-catalyst.
> > 
> > What is wrong with writing an engine from scratch?
> > 
> > I'm asking you to start with the engine requirements.
> > Can we please discuss them?
> > 
> > > If you have an alternative - you're welcome, I'll gratefully listen to
> > > you.
> > 
> > We have an alternative for now: the H2-based engine.
> > 
> > > The main question isn't "WHAT" but "HOW" - that's the discussion topic
> > > from my point of view.
> > 
> > Once we make a decision about the engine, we can discuss the roadmap for
> > the replacement.
> > Once again: replacing the SQL engine with a more customizable one makes
> > sense to me.
> > But this kind of decision needs careful discussion.
> > 
> > On Fri, 27/09/2019 at 17:08 +0300, Seliverstov Igor wrote:
> > > Nikolay,
> > > 
> > > The main issue - there is no *selection*.
> > > 
> > > There is a field of knowledge - relational algebra, which describes how
> > > to transform relational expressions while preserving their semantics, and
> > > a couple of implementations (Calcite is the only one written in Java).
> > > 
> > > There are only two alternatives:
> > > 
> > > 1) Implementing white papers from scratch
> > > 2) Adapting Calcite to our needs.
> > > 
> > > The second way was chosen by several other projects: there is
> > > experience, there is a list of known issues (like using indexes), so
> > > almost everything is already done for us.
> > > 
> > > Implementing a planner is a big deal; I think everybody understands it
> > > here. That's why our proposal to reuse others' experience is obvious.
> > > 
> > > If you have an alternative - you're welcome, I'll gratefully listen to
> > > you.
> > > 
> > > The main question isn't "WHAT" but "HOW" - that's the discussion topic
> > > from my point of view.
> > > 
> > > Regards,
> > > Igor
> > > 
> > > > On 27 Sep 2019, at 16:37, Nikolay Izhikov wrote:
> > > > 
> > > > Roman.
> > > > 
> > > > > Nikolay, Maxim, I understand that our arguments may not be as
> > > > > obvious to you as they are to the SQL team. So, please frame your
> > > > > questions in a more constructive way.
> > > > 
> > > > What is the SQL team?
> > > > I only know the Ignite community :)
> > > > 
> > > > Please share your knowledge in the IEP.
> > > > I want to join the process of engine *selection*.
> > > > It should start with the requirements for such an engine.
> > > > Can you write them in the IEP, please?
> > > > 
> > > > My point is very simple:
> > > > 
> > > > 1. We made the wrong decision w

Re: New SQL execution engine

2019-09-27 Thread Nikolay Izhikov

Hello, Andrey.

Thanks, it's clearer now.

> I agree, we should make the IEP clear, first of all, to everyone in the
> community who wants to be involved in its implementation.

Great!
Looking forward to the IEP clarification.


On Fri, 27/09/2019 at 18:07 +0300, Andrey Mashenkov wrote:
> Nikolay, Igor.
> 
> Implementing from scratch is an option, of course.
> If we decide to go this way, then we definitely won't want to spend long
> nights inventing "yet another SQL parser" with all the stuff related to
> query rewrite rules (e.g. IN -> JOIN) or type casting / validation /
> conversion.
> 
> We considered replacing H2 step by step.
> 1. We tried to make a POC replacing the parser with one generated from the
> SQL grammar with ASM, but this approach looked slow, AFAIR. Gridgainers,
> does anybody have something on this?
> 
> 2. Then we need a planner with all the rules.
> Of course, we will need to write rules optimized for "distributed"
> execution anyway, but I doubt anybody wants to write the common rules that
> Calcite already has.
> We could copy-paste them, but what for?
> 
> 3. Then we have to implement the execution pipeline.
> Possibly, we could adapt new query plans for H2 execution, but then we
> would still have the same pain resolving H2 internal issues (e.g. OOM).
> The H2 approach is outdated; it doesn't fit Ignite's needs as a
> distributed system.
> 
> With Calcite, we can concentrate on points 2 and (mostly) 3 and reuse its
> architectural abstractions; otherwise, we would have to reinvent those
> abstractions through long discussions on the dev list.
> 
> I agree, we should make the IEP clear, first of all, to everyone in the
> community who wants to be involved in its implementation.
> Both approaches ("from scratch" and "with Calcite") are risky, so can we
> try to make an additional "beta" engine implementation and allow users to
> fall back to the old engine until the new one is deemed mature enough?
> 
> 
> 
> 
> On Fri, Sep 27, 2019 at 5:08 PM Seliverstov Igor wrote:
> 
> > Nikolay,
> > 
> > The main issue - there is no *selection*.
> > 
> > There is a field of knowledge - relational algebra, which describes how to
> > transform relational expressions while preserving their semantics, and a
> > couple of implementations (Calcite is the only one written in Java).
> > 
> > There are only two alternatives:
> > 
> > 1) Implementing white papers from scratch
> > 2) Adapting Calcite to our needs.
> > 
> > The second way was chosen by several other projects: there is experience,
> > there is a list of known issues (like using indexes), so almost everything
> > is already done for us.
> > 
> > Implementing a planner is a big deal; I think everybody understands it
> > here. That's why our proposal to reuse others' experience is obvious.
> > 
> > If you have an alternative - you're welcome, I'll gratefully listen to you.
> > 
> > The main question isn't "WHAT" but "HOW" - that's the discussion topic
> > from my point of view.
> > 
> > Regards,
> > Igor
> > 
> > > On 27 Sep 2019, at 16:37, Nikolay Izhikov wrote:
> > > 
> > > Roman.
> > > 
> > > > Nikolay, Maxim, I understand that our arguments may not be as obvious
> > > > to you as they are to the SQL team. So, please frame your questions
> > > > in a more constructive way.
> > > 
> > > What is the SQL team?
> > > I only know the Ignite community :)
> > > 
> > > Please share your knowledge in the IEP.
> > > I want to join the process of engine *selection*.
> > > It should start with the requirements for such an engine.
> > > Can you write them in the IEP, please?
> > > 
> > > My point is very simple:
> > > 
> > > 1. We made the wrong decision with H2.
> > > 2. We should make a well-thought-out decision about the new engine.
> > > 
> > > > How many tickets would satisfy you?
> > > 
> > > You write about "issueS" with H2.
> > > All I see is one open ticket.
> > > The IEP doesn't provide enough information.
> > > So it's not about the number of tickets, it's about
> > > 
> > > > These two points (single map-reduce execution and inflexible optimizer)
> > > > are the main problems with the current engine.
> > > 
> > > We may come to the point where Calcite (or any other engine) brings us a
> > > third and further "main problems".
> > > This is how it happened with H2.
> > > 
> > > Let's start from what we want to get from the engine and move forward
> > > from there.
> > > What do you think?
> > > 
> > > 
> > > 
> > > В Пт, 27/09/2019 в 16:15 +0300, Roman Kondakov пишет:
> > > > Maxim, Nikolay,
> > > > 
> > > > I've listed two issues which show the ideological flaws of the current
> > > > engine.
> > > > 
> > > > 1. IGNITE-11448 - Open. This ticket describes the impossibility of
> > > > executing queries which can not be fit in the hardcoded one pass
> > > > map-reduce paradigm.
> > > > 
> > > > 2. IGNITE-6085 - Closed (won't fix) - This ticket describes the second
> > > > major problem with the current engine: H2 query optimizer is very
> > > > primitive and can not perform many useful optimizations.
> > > > 
> > > > These two

Re: New SQL execution engine

2019-09-27 Thread Maxim Muzafarov

Folks,
especially Ignite PMCs,

Are there any plans for how Ignite SQL will evolve? It is very interesting
how Ignite SQL as a product will be developed in the near future, e.g.
supporting new standards, etc.

According to the documentation, Ignite complies with ANSI-99 SQL [2], but in
fact (correct me if I'm wrong) it doesn't support recursive queries
[1] (the issue mentioned by Andrey), right? Will this be solvable with the
new engine?

[1] https://issues.apache.org/jira/browse/IGNITE-5475
[2] http://ignite.apache.org/use-cases/database/sql-database.html
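Recursive common table expressions (`WITH RECURSIVE`) were introduced in SQL:1999, which is why they matter for the compliance question above. An engine evaluates them by iterating to a fixpoint, which is hard to express as a single, fixed map-reduce pass. Below is a minimal Java sketch of semi-naive evaluation for a transitive-closure query; the `edges`/`reach` tables and the class name are illustrative assumptions, not Ignite code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Semi-naive evaluation of the hypothetical query
 *   WITH RECURSIVE reach(src, dst) AS (
 *     SELECT src, dst FROM edges
 *     UNION
 *     SELECT r.src, e.dst FROM reach r JOIN edges e ON r.dst = e.src)
 *   SELECT * FROM reach;
 */
final class RecursiveCte {
    /** One row of the hypothetical "edges" table (and of the "reach" result). */
    record Edge(int src, int dst) {}

    static Set<Edge> reach(List<Edge> edges) {
        // Index edges by src to evaluate the join condition r.dst = e.src.
        Map<Integer, List<Integer>> bySrc = new HashMap<>();
        for (Edge e : edges)
            bySrc.computeIfAbsent(e.src(), k -> new ArrayList<>()).add(e.dst());

        // Anchor member: SELECT src, dst FROM edges.
        Set<Edge> result = new LinkedHashSet<>(edges);
        Set<Edge> delta = new LinkedHashSet<>(edges);

        // Recursive member, joined only with the rows produced in the previous round.
        while (!delta.isEmpty()) {
            Set<Edge> next = new LinkedHashSet<>();
            for (Edge r : delta) {
                for (int dst : bySrc.getOrDefault(r.dst(), List.of())) {
                    Edge row = new Edge(r.src(), dst);
                    if (result.add(row)) // UNION (not UNION ALL): only genuinely new rows continue
                        next.add(row);
                }
            }
            delta = next; // fixpoint: stop when a round produces no new rows
        }
        return result;
    }

    public static void main(String[] args) {
        List<Edge> edges = List.of(new Edge(1, 2), new Edge(2, 3), new Edge(3, 4));
        System.out.println(reach(edges).size()); // 6: (1,2) (2,3) (3,4) (1,3) (2,4) (1,4)
    }
}
```

Each round joins only the rows discovered in the previous round, and the `UNION` deduplication guarantees termination on a finite graph, even one with cycles.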

On Fri, 27 Sep 2019 at 17:22, Nikolay Izhikov wrote:
>
> Igor.
>
> > The main issue - there is no *selection*.
>
> 1. I don't remember a community decision about this.
>
> 2. We should avoid making such a long-term decision so quickly.
> We made this kind of decision with H2 and came to the point where we have to
> review it.
>
> > 1) Implementing white papers from scratch
> > 2) Adapting Calcite to our needs.
>
> The third option doesn't fix the issues we have with H2.
> The fourth option I know of is using spark-catalyst.
>
> What is wrong with writing an engine from scratch?
>
> I ask you to start with the engine requirements.
> Can we, please, discuss them?
>
> > If you have an alternative - you're welcome, I'll gratefully listen to you.
>
> We have an alternative for now: the H2-based engine.
>
> > The main question isn't "WHAT" but "HOW" - that's the discussion topic from 
> > my point of view.
>
> Once we decide on the engine, we can discuss a roadmap for the replacement.
> One more time: replacing the SQL engine with a more customizable one makes
> sense to me.
> But this kind of decision needs careful discussion.

Re: New SQL execution engine

2019-09-27 Thread Seliverstov Igor

Nikolay,

At last we have better questions.

There is no decision yet; this is exactly what we should decide here.

Doing nothing isn’t a decision, it’s just doing nothing.

Spark Catalyst is a good example, but under the hood it is built on exactly
the same idea, adapted to Spark. Calcite is the same, but general-purpose.
That’s why it’s a better starting point.
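The "same idea" shared by Catalyst and Calcite is rule-based rewriting of relational-algebra trees: a rule matches a pattern and replaces it with a semantically equivalent tree, and in its simplest heuristic form the optimizer keeps applying rules until the plan stops changing. The Java sketch below shows one classic rule, pushing a filter below a join when its predicate only references one input. The node types and method names are hypothetical simplifications, not Calcite's or Catalyst's actual API.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy relational-algebra tree with one heuristic rewrite rule (illustrative only). */
final class RuleRewrite {
    interface Rel {}

    /** Table scan producing the given columns. */
    record Scan(String table, Set<String> columns) implements Rel {}

    /** Filter with an opaque predicate string; usedColumns lists the columns it references. */
    record Filter(Rel input, String predicate, Set<String> usedColumns) implements Rel {}

    /** Inner join with an opaque join condition. */
    record Join(Rel left, Rel right, String condition) implements Rel {}

    static Set<String> outputColumns(Rel rel) {
        if (rel instanceof Scan s)
            return s.columns();
        if (rel instanceof Filter f)
            return outputColumns(f.input());
        Join j = (Join) rel;
        Set<String> all = new HashSet<>(outputColumns(j.left()));
        all.addAll(outputColumns(j.right()));
        return all;
    }

    /** Rule: Filter(Join(L, R)) => Join(Filter(L), R) if the predicate only uses L's columns (same for R). */
    static Rel pushFilterIntoJoin(Rel rel) {
        if (rel instanceof Filter f && f.input() instanceof Join j) {
            if (outputColumns(j.left()).containsAll(f.usedColumns()))
                return new Join(new Filter(j.left(), f.predicate(), f.usedColumns()), j.right(), j.condition());
            if (outputColumns(j.right()).containsAll(f.usedColumns()))
                return new Join(j.left(), new Filter(j.right(), f.predicate(), f.usedColumns()), j.condition());
        }
        return rel; // pattern does not match: node unchanged
    }

    /** One top-down pass: try the rule on this node, then recurse into its children. */
    private static Rel rewrite(Rel rel) {
        rel = pushFilterIntoJoin(rel);
        if (rel instanceof Filter f)
            return new Filter(rewrite(f.input()), f.predicate(), f.usedColumns());
        if (rel instanceof Join j)
            return new Join(rewrite(j.left()), rewrite(j.right()), j.condition());
        return rel;
    }

    /** Repeats passes until a fixpoint: the plan no longer changes. */
    static Rel optimize(Rel rel) {
        Rel before;
        do {
            before = rel;
            rel = rewrite(rel);
        } while (!rel.equals(before));
        return rel;
    }

    public static void main(String[] args) {
        Rel plan = new Filter(
            new Join(new Scan("orders", Set.of("o_id", "o_cust")),
                     new Scan("customers", Set.of("c_id", "c_vip")),
                     "o_cust = c_id"),
            "c_vip = TRUE", Set.of("c_vip"));
        // The filter ends up directly above the customers scan, below the join.
        System.out.println(optimize(plan));
    }
}
```

A real optimizer holds many such rules, and a cost-based planner explores alternatives instead of applying rules blindly; the sketch only shows the match-and-replace mechanics.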

Implementing an engine from scratch is really cool, but it looks like
reinventing the wheel; I don’t think it makes sense. At least, I am against
this option.

I added the requirements to the IEP (as you asked); as you can see, it is in
DRAFT state and will be complemented with details.

We have some thoughts on how to make a smooth replacement, but first we
should decide what to replace and with what.

For now, the Calcite-based engine is placed in a separate module. We checked
that it can build an execution graph for both local and distributed cases,
and it has good extensibility.
We talked to the Calcite community to identify possible future issues, and
everything points to the fact that it’s the best option.
It’s possible to develop it as an experimental extension at first (not a
replacement) until we make sure that it works as expected. This way there
are no risks for anybody who uses Ignite in a production environment.

Regards,
Igor


> On 27 Sep 2019, at 17:25, Nikolay Izhikov wrote:
> 
> Igor.
> 
>> The main issue - there is no *selection*.
> 
> 1. I don't remember a community decision about this.
> 
> 2. We should avoid making such a long-term decision so quickly.
> We made this kind of decision with H2 and came to the point where we have to
> review it.
> 
>> 1) Implementing white papers from scratch
>> 2) Adapting Calcite to our needs.
> 
> The third option doesn't fix the issues we have with H2.
> The fourth option I know of is using spark-catalyst.
> 
> What is wrong with writing an engine from scratch?
> 
> I ask you to start with the engine requirements.
> Can we, please, discuss them?
> 
>> If you have an alternative - you're welcome, I'll gratefully listen to you.
> 
> We have an alternative for now: the H2-based engine.
> 
>> The main question isn't "WHAT" but "HOW" - that's the discussion topic from 
>> my point of view.
> 
> Once we decide on the engine, we can discuss a roadmap for the replacement.
> One more time: replacing the SQL engine with a more customizable one makes
> sense to me.
> But this kind of decision needs careful discussion.

Re: New SQL execution engine

2019-09-27 Thread Andrey Mashenkov

Nikolay, Igor.

Implementing from scratch is an option, of course.
If we decide to go this way, then we definitely won't want to spend long
nights inventing "yet another SQL parser" with all the stuff related to query
rewrite rules (e.g. IN -> JOIN) or type casting / validation / conversion.
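As an illustration of what a rewrite such as IN -> JOIN has to preserve: `SELECT * FROM orders WHERE cust_id IN (SELECT id FROM customers WHERE vip)` is usually turned into a semi-join, which emits each outer row at most once however many inner rows match. A plain inner join is only equivalent if the subquery returns unique keys. The Java sketch below uses hypothetical tables and rows (and ignores NULL handling) to compare the forms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class InToSemiJoin {
    record Order(int id, int custId) {}
    record Customer(int id, boolean vip) {}

    /** Naive IN: for every outer row, scan the subquery result (nested-loop membership test). */
    static List<Order> naiveIn(List<Order> orders, List<Customer> customers) {
        List<Order> out = new ArrayList<>();
        for (Order o : orders) {
            boolean found = false;
            for (Customer c : customers)
                if (c.vip() && c.id() == o.custId()) { found = true; break; }
            if (found)
                out.add(o);
        }
        return out;
    }

    /** Rewritten form: hash semi-join. Build a key set from the inner side once, then probe it. */
    static List<Order> hashSemiJoin(List<Order> orders, List<Customer> customers) {
        Set<Integer> vipIds = customers.stream()
            .filter(Customer::vip)
            .map(Customer::id)
            .collect(Collectors.toSet());
        return orders.stream().filter(o -> vipIds.contains(o.custId())).collect(Collectors.toList());
    }

    /** Plain inner join, shown only for contrast: duplicate inner keys duplicate outer rows. */
    static List<Order> innerJoin(List<Order> orders, List<Customer> customers) {
        List<Order> out = new ArrayList<>();
        for (Order o : orders)
            for (Customer c : customers)
                if (c.vip() && c.id() == o.custId())
                    out.add(o);
        return out;
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(new Order(1, 10), new Order(2, 20), new Order(3, 10));
        // Customer 10 appears twice, e.g. because the subquery is not over a unique key.
        List<Customer> customers = List.of(new Customer(10, true), new Customer(10, true), new Customer(20, false));
        System.out.println(naiveIn(orders, customers).equals(hashSemiJoin(orders, customers))); // true
        System.out.println(innerJoin(orders, customers).size()); // 4, not 2
    }
}
```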

We thought about replacing H2 step by step.
1. We tried to make a POC replacing the parser with one generated from the
SQL grammar with ASM, but this approach looked slow, AFAIR. Gridgainers, does
anybody have something on this?

2. Then we need a planner with all the rules.
Of course, we will need to write rules optimized for distributed execution
anyway, but I doubt anybody wants to write the common rules that Calcite
already has.
We could copy-paste them, but what for?

3. Then we have to implement the execution pipeline.
Possibly, we could adapt new query plans for H2 execution, but then we would
still have the same pain with resolving H2 internal issues (e.g. OOM).
The H2 approach is outdated; it doesn't fit Ignite's needs as a distributed
system.
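To make point 2 more concrete: a typical "distributed" rule decides, for each join, whether its inputs are already co-located or a data exchange has to be inserted. A planner with such rules can place exchanges wherever the plan needs them, instead of relying on a single, hardcoded map-reduce step. The Java sketch below is a hypothetical simplification (`Distribution`, `Strategy` and the join-key check are assumptions, not Ignite or Calcite classes).

```java
final class JoinDistribution {
    enum Kind { HASH, REPLICATED }

    /** How an input is spread over the cluster; key is the hash-partitioning column (null if REPLICATED). */
    record Distribution(Kind kind, String key) {}

    enum Strategy { COLOCATED, REHASH_LEFT, REHASH_RIGHT, REHASH_BOTH }

    /** Chooses where exchanges are needed for an inner equi-join on leftKey = rightKey. */
    static Strategy choose(Distribution left, Distribution right, String leftKey, String rightKey) {
        // Inner join: if one side has a full copy on every node, each node joins it with its local
        // part of the other side (if both sides are replicated, a single node is enough).
        if (left.kind() == Kind.REPLICATED || right.kind() == Kind.REPLICATED)
            return Strategy.COLOCATED;

        // Assumes both sides use the same hash function and partition count, so equal keys land
        // on the same node when each side is partitioned by its join column.
        boolean leftOk = leftKey.equals(left.key());
        boolean rightOk = rightKey.equals(right.key());

        if (leftOk && rightOk)
            return Strategy.COLOCATED;
        if (leftOk)
            return Strategy.REHASH_RIGHT; // exchange: send right rows to the owners of their key
        if (rightOk)
            return Strategy.REHASH_LEFT;
        return Strategy.REHASH_BOTH;      // two exchanges before the join
    }

    public static void main(String[] args) {
        Distribution orders = new Distribution(Kind.HASH, "order_id");
        Distribution customers = new Distribution(Kind.HASH, "id");
        // orders JOIN customers ON orders.cust_id = customers.id
        System.out.println(choose(orders, customers, "cust_id", "id")); // REHASH_LEFT
    }
}
```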

With Calcite, we can concentrate on points 2 and (mostly) 3 and reuse its
architectural abstractions; otherwise we would have to reinvent those
abstractions through long discussions on the dev list.

I agree that, first of all, we should make the IEP clear to everyone in the
community who wants to be involved in its implementation.
Both approaches ("from scratch" and "with Calcite") are risky, so can we try
to make an additional "beta" engine implementation and allow users to fall
back to the old engine until the new one is considered mature enough?
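One possible shape for such a "beta" engine, in line with Igor's suggestion to start as an experimental extension rather than a replacement, is an opt-in router with a fallback to the existing engine. The Java sketch below assumes a hypothetical `QueryEngine` interface and uses `UnsupportedOperationException` as the signal for queries the new engine cannot handle yet; none of these names exist in Ignite.

```java
import java.util.List;

final class EngineFallback {
    /** Hypothetical engine abstraction (not an Ignite API): runs SQL and returns rows rendered as strings. */
    interface QueryEngine {
        List<String> query(String sql);
    }

    /** Sends queries to the experimental engine only when enabled; otherwise (or if unsupported) to the legacy one. */
    static final class RoutingEngine implements QueryEngine {
        private final QueryEngine experimental;
        private final QueryEngine legacy;
        private final boolean experimentalEnabled;

        RoutingEngine(QueryEngine experimental, QueryEngine legacy, boolean experimentalEnabled) {
            this.experimental = experimental;
            this.legacy = legacy;
            this.experimentalEnabled = experimentalEnabled;
        }

        @Override public List<String> query(String sql) {
            if (!experimentalEnabled)
                return legacy.query(sql); // default: existing users see no change

            try {
                return experimental.query(sql);
            }
            catch (UnsupportedOperationException e) {
                // Safe only if the new engine rejects the query before executing anything
                // (e.g. during parsing or planning); otherwise DML could be applied twice.
                return legacy.query(sql);
            }
        }
    }

    public static void main(String[] args) {
        QueryEngine legacy = sql -> List.of("legacy");
        QueryEngine experimental = sql -> {
            if (sql.startsWith("MERGE"))
                throw new UnsupportedOperationException("not supported yet");
            return List.of("experimental");
        };

        QueryEngine engine = new RoutingEngine(experimental, legacy, true);
        System.out.println(engine.query("SELECT 1"));                // [experimental]
        System.out.println(engine.query("MERGE INTO t VALUES (1)")); // [legacy]
    }
}
```

Keeping the switch off by default means existing users keep the current behaviour until the new engine is considered mature.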

On Fri, Sep 27, 2019 at 5:08 PM Seliverstov Igor wrote:

> Nikolay,
>
> The main issue - there is no *selection*.
>
> There is a field of knowledge - relational algebra, which describes how to
> transform relational expressions while preserving their semantics, and a couple of
> implementations (Calcite is the only one written in Java).
>
> There are only two alternatives:
>
> 1) Implementing white papers from scratch
> 2) Adapting Calcite to our needs.
>
> The second way was chosen by several other projects: there is experience,
> there is a list of known issues (like using indexes), so almost everything
> is already done for us.
>
> Implementing a planner is a big deal; I think everybody here understands
> that. That's why our proposal to reuse others' experience is obvious.
>
> If you have an alternative - you're welcome, I'll gratefully listen to you.
>
> The main question isn't "WHAT" but "HOW" - that's the discussion topic
> from my point of view.
>
> Regards,
> Igor

Re: How to free up space on disc after removing entries from IgniteCache with enabled PDS?

2019-09-27 Thread Alexei Scherbakov

Probably this should be possible via the public API; actually, this is the
same as manual rebalancing.

On Fri, 27 Sep 2019 at 17:40, Alexei Scherbakov <
alexey.scherbak...@gmail.com> wrote:

> The poor man's solution for the problem would be to stop the fragmented
> node, remove its partition data, and then start it again, allowing a full
> state transfer that no longer contains the deleted entries.
> Rinse and repeat for all owners.
>
> Anton Vinogradov, would this work for you as a workaround?
>
> --
>
> Best regards,
> Alexei Scherbakov
>


-- 

Best regards,
Alexei Scherbakov

Re: How to free up space on disc after removing entries from IgniteCache with enabled PDS?

2019-09-27 Thread Alexei Scherbakov

The poor man's solution for the problem would be to stop the fragmented
node, remove its partition data, and then start it again, allowing a full
state transfer that no longer contains the deleted entries.
Rinse and repeat for all owners.

Anton Vinogradov, would this work for you as a workaround?
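Anton's SHRINKING proposal quoted below, together with the full-rebalance workaround above, can be read as a small partition state machine. The Java sketch models only the transitions described in this thread; the class, the `Rebalance` enum and the method names are hypothetical, the copy itself is not modelled, and this is not Ignite's partition code.

```java
/** Toy model of the proposed SHRINKING partition state; not Ignite's partition code. */
final class PartitionShrink {
    /** Only the states needed here; SHRINKING is the proposed addition. */
    enum State { OWNING, SHRINKING, MOVING }

    /** Which rebalance is expected to bring the partition back to OWNING. */
    enum Rebalance { NONE, HISTORICAL, FULL }

    static final class Partition {
        State state = State.OWNING;
        long updateCounter;                      // kept while SHRINKING so historical rebalance can resume from it
        Rebalance pendingRebalance = Rebalance.NONE;

        /** In this model, reads and updates are only served by an OWNING partition. */
        boolean readable() {
            return state == State.OWNING;
        }

        /** Start copying live entries into a compact file (during a low-activity period). */
        void startShrink() {
            require(State.OWNING);
            state = State.SHRINKING;
        }

        /**
         * The compact file replaced the original one. If the update history is still available,
         * a historical rebalance catches up the missed updates; otherwise the file is dropped and
         * a full rebalance restores the partition, which also leaves it compact.
         */
        void finishShrink(boolean historyAvailable) {
            require(State.SHRINKING);
            state = State.MOVING;
            pendingRebalance = historyAvailable ? Rebalance.HISTORICAL : Rebalance.FULL;
        }

        /** The workaround above: drop local partition data and let a full rebalance restore it. */
        void clearAndRebalance() {
            require(State.OWNING);
            state = State.MOVING;
            pendingRebalance = Rebalance.FULL;
        }

        void rebalanceCompleted(long ownersUpdateCounter) {
            require(State.MOVING);
            updateCounter = ownersUpdateCounter;
            pendingRebalance = Rebalance.NONE;
            state = State.OWNING;
        }

        private void require(State expected) {
            if (state != expected)
                throw new IllegalStateException("Expected " + expected + ", but was " + state);
        }
    }

    public static void main(String[] args) {
        Partition p = new Partition();
        p.updateCounter = 100;
        p.startShrink();
        System.out.println(p.readable());                    // false
        p.finishShrink(true);
        System.out.println(p.pendingRebalance);              // HISTORICAL
        p.rebalanceCompleted(120);
        System.out.println(p.state + " " + p.updateCounter); // OWNING 120
    }
}
```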

On Thu, 19 Sep 2019 at 13:03, Anton Vinogradov wrote:

> Alexey,
>
> Let's combine your and Ivan's proposals.
>
> >> vacuum command, which acquires exclusive table lock, so no concurrent
> >> activities on the table are possible.
> and
> >> Could the problem be solved by stopping a node which needs to be
> >> defragmented, clearing persistence files and restarting the node?
> >> After rebalancing the node will receive all data back without
> >> fragmentation.
>
> How about having a special partition state, SHRINKING?
> This state should mean that the partition is unavailable for reads and
> updates, but it should keep its update counters and should not be marked as
> lost, renting or evicted.
> In this state we are able to iterate over the partition and apply its
> entries to another file in a compact way.
> Indices should be updated during the copy-on-shrink procedure or at shrink
> completion.
> Once the shrunk file is ready, we should replace the original partition file
> with it and mark the partition as MOVING, which will start the historical
> rebalance.
> Shrinking should be performed during low-activity periods, but even if we
> find that activity was high and historical rebalance is not suitable, we may
> just remove the file and use regular rebalance to restore the partition
> (this will also result in a shrink).
>
> BTW, it seems we are able to implement partition shrinking cheaply.
> We may just use the rebalancing code to apply the fat partition's entries to
> the new file.
> So, there are 3 stages here: local rebalance, index update and global
> historical rebalance.
>
> On Thu, Sep 19, 2019 at 11:43 AM Alexey Goncharuk <
> alexey.goncha...@gmail.com> wrote:
>
> > Anton,
> >
> >
> > > >> The solution which Anton suggested does not look easy because it will
> > > >> most likely significantly hurt performance
> > > Mostly agree here, but what drop do we expect? What price are we ready
> > > to pay?
> > > Not sure, but it seems some vendors are ready to pay, for example, a 5%
> > > drop for this.
> >
> > 5% may be a big drop for some use-cases, so I think we should look at how
> > to improve performance, not how to make it worse.
> >
> >
> > >
> > > >> it is hard to maintain a data structure to choose "page from free-list
> > > >> with enough space closest to the beginning of the file".
> > > We can just split each free-list bucket in two and use the first one for
> > > pages in the first half of the file and the second one for the rest.
> > > Only two buckets are required here since, during the file shrink, the
> > > first bucket's window will be shrunk too.
> > > It seems this gives us the same price on put: just use the first bucket
> > > in case it's not empty.
> > > The remove price (with merge) will be increased, of course.
> > >
> > > The compromise solution is to have priority put (to the first part of
> > > the file), keeping removal as is, and schedulable per-page migration for
> > > the rest of the data during low-activity periods.
> > >
> > Free lists are large and slow by themselves, it is expensive to checkpoint
> > and read them on start, so as a long-term solution I would look into
> > removing them. Moreover, I'm not sure that adding yet another background
> > process will improve the codebase reliability and simplicity.
> >
> > If we want to go the hard path, I would look at a free page tracking
> > bitmap: a special bitmask page, where each page in an adjacent block is
> > marked as 0 (free) if it has more free space than a certain configurable
> > threshold (say, 80%), and 1 (full) if less. Some vendors have successfully
> > implemented this approach, which looks much more promising, but harder to
> > implement.
> >
> > --AG
> >
>
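The free page tracking bitmap from Alexey's reply above can be sketched as follows: one bit per page of an adjacent block, 0 when the page has more free space than a configurable threshold and 1 otherwise, so finding a free page closest to the beginning of the file becomes a search for the lowest clear bit. This is an in-memory Java sketch built on `java.util.BitSet`; the class name, the page size and the 80% threshold (the example value from the thread) are assumptions, and persisting the bitmap page is not modelled.

```java
import java.util.BitSet;

/** One tracking "page": bit i describes data page i of an adjacent block. 0 = free, 1 = full. */
final class FreePageBitmap {
    private final BitSet full;
    private final int pagesInBlock;
    private final int pageSize;
    private final double freeThreshold; // e.g. 0.8: a page is "free" if more than 80% of it is unused

    FreePageBitmap(int pagesInBlock, int pageSize, double freeThreshold) {
        this.full = new BitSet(pagesInBlock); // all bits start at 0, i.e. every page is free
        this.pagesInBlock = pagesInBlock;
        this.pageSize = pageSize;
        this.freeThreshold = freeThreshold;
    }

    /** Updates the page's bit after a put or remove changed its free space. */
    void onFreeSpaceChanged(int pageIdx, int freeBytes) {
        full.set(pageIdx, freeBytes <= freeThreshold * pageSize);
    }

    /** Lowest-index free page, i.e. the free page closest to the start of the block, or -1 if none. */
    int firstFreePage() {
        int idx = full.nextClearBit(0);
        return idx < pagesInBlock ? idx : -1;
    }

    public static void main(String[] args) {
        FreePageBitmap bitmap = new FreePageBitmap(8, 4096, 0.8);
        for (int i = 0; i < 8; i++)
            bitmap.onFreeSpaceChanged(i, 100);      // all pages nearly full
        bitmap.onFreeSpaceChanged(5, 4000);         // a remove freed most of page 5
        bitmap.onFreeSpaceChanged(2, 3900);         // and of page 2
        System.out.println(bitmap.firstFreePage()); // 2
    }
}
```

Always writing into the lowest free page tends to pack live data toward the start of the file, which could later allow the tail to be truncated; a real implementation would need one such bitmap per block and a way to persist it.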


-- 

Best regards,
Alexei Scherbakov

Re: New SQL execution engine

2019-09-27 Thread Nikolay Izhikov

Igor.

> The main issue - there is no *selection*.

1. I don't remember a community decision about this.

2. We should avoid making such a long-term decision so quickly.
We made this kind of decision with H2 and came to the point where we have to
review it.

> 1) Implementing white papers from scratch
> 2) Adapting Calcite to our needs.

The third option doesn't fix the issues we have with H2.
The fourth option I know of is using spark-catalyst.

What is wrong with writing an engine from scratch?

I ask you to start with the engine requirements.
Can we, please, discuss them?

> If you have an alternative - you're welcome, I'll gratefully listen to you.

We have an alternative for now: the H2-based engine.

> The main question isn't "WHAT" but "HOW" - that's the discussion topic from 
> my point of view.

Once we decide on the engine, we can discuss a roadmap for the replacement.
One more time: replacing the SQL engine with a more customizable one makes
sense to me.
But this kind of decision needs careful discussion.

On Fri, 27/09/2019 at 17:08 +0300, Seliverstov Igor wrote:
> Nikolay,
> 
> The main issue - there is no *selection*.
> 
> There is a field of knowledge - relational algebra, which describes how to 
> transform relational expressions while preserving their semantics, and a couple of 
> implementations (Calcite is the only one written in Java).
> 
> There are only two alternatives:
> 
> 1) Implementing white papers from scratch
> 2) Adapting Calcite to our needs.
> 
> The second way was chosen by several other projects: there is experience, 
> there is a list of known issues (like using indexes), so almost everything is 
> already done for us.
> 
> Implementing a planner is a big deal; I think everybody here understands that. 
> That's why our proposal to reuse others' experience is obvious.
> 
> If you have an alternative - you're welcome, I'll gratefully listen to you.
> 
> The main question isn't "WHAT" but "HOW" - that's the discussion topic from 
> my point of view.
> 
> Regards,
> Igor

Re: New SQL execution engine

2019-09-27 Thread Nikolay Izhikov

Thanks, Andrey! 

Will take a look shortly.

On Fri, 27/09/2019 at 17:19 +0300, Andrey Mashenkov wrote:
> These issues can't be resolved without changes in H2.
> Hope this will be enough.
> 
> https://issues.apache.org/jira/browse/IGNITE-10598
> https://issues.apache.org/jira/browse/IGNITE-11473
> https://issues.apache.org/jira/browse/IGNITE-11444
> https://issues.apache.org/jira/browse/IGNITE-5289
> https://issues.apache.org/jira/browse/IGNITE-10855
> https://issues.apache.org/jira/browse/IGNITE-11341
> https://issues.apache.org/jira/browse/IGNITE-7526
> https://issues.apache.org/jira/browse/IGNITE-9480
> https://issues.apache.org/jira/browse/IGNITE-9616
> https://issues.apache.org/jira/browse/IGNITE-11891
> https://issues.apache.org/jira/browse/IGNITE-6202
> https://issues.apache.org/jira/browse/IGNITE-11448
> https://issues.apache.org/jira/browse/IGNITE-3911
> 
> 
> On Fri, Sep 27, 2019 at 4:34 PM Nikolay Izhikov wrote:
> 
> > Roman.
> > 
> > > Nikolay, Maxim, I understand that our arguments may not be as obvious
> > > to you as they are to the SQL team. So, please arrange your questions in
> > > a more constructive way.
> > 
> > What is the SQL team?
> > I only know the Ignite community :)
> > 
> > Please share your knowledge in the IEP.
> > I want to join the process of engine *selection*.
> > It should start with the requirements for such an engine.
> > Can you write them in the IEP, please?
> > 
> > My point is very simple:
> > 
> > 1. We made the wrong decision with H2.
> > 2. We should make a well-thought decision about the new engine.
> > 
> > > How many tickets would satisfy you?
> > 
> > You write about "issueS" with H2.
> > All I see is one open ticket.
> > The IEP doesn't provide enough information.
> > So it's not about the number of tickets, it's about
> > 
> > > These two points (single map-reduce execution and inflexible optimizer)
> > > are the main problems with the current engine.
> > 
> > We may come to the point where Calcite (or any other engine) brings us a
> > third and further "main problems".
> > This is how it happened with H2.
> > 
> > Let's start from what we want to get with the engine and move forward from
> > this base.
> > What do you think?
> > 
> > 
> > 
> > В Пт, 27/09/2019 в 16:15 +0300, Roman Kondakov пишет:
> > > Maxim, Nikolay,
> > > 
> > > I've listed two issues which show the ideological flaws of the current
> > > engine.
> > > 
> > > 1. IGNITE-11448 - Open. This ticket describes the impossibility of
> > > executing queries which cannot fit into the hard-coded one-pass
> > > map-reduce paradigm.
> > > 
> > > 2. IGNITE-6085 - Closed (won't fix) - This ticket describes the second
> > > major problem with the current engine: the H2 query optimizer is very
> > > primitive and cannot perform many useful optimizations.
> > > 
> > > These two points (single map-reduce execution and inflexible optimizer)
> > > are the main problems with the current engine. It means that our engine
> > > is currently suitable for executing only a very limited subset of
> > > typical SQL queries. For example, it cannot even run most of the TPC-H
> > > benchmark queries because they don't fit the simple map-reduce paradigm.
> > > 
> > > > All I see is links to two tickets:
> > > 
> > > How many tickets would satisfy you? I named two, and it looks like that is
> > > not enough from your point of view. OK, so how many is enough? The set
> > > of problems caused by the tickets listed above is infinite, therefore I
> > > cannot create a ticket for each of them.
> > > > Tech details also should be added.
> > > 
> > > Tech details are in the tickets.
> > > 
> > > > We can't discuss such a huge change as an execution engine replacement
> > > > with descriptions like:
> > > > "No data co-location control, i.e. arbitrary data can be returned
> > > > silently" or
> > > > "Low control on how query executes internally, as a result we have
> > > > limited possibility to implement improvements/fixes."
> > > 
> > > Why not? Don't you understand these problems? Or don't you think this is
> > > a problem?
> > > 
> > > > Let's make these descriptions more specific.
> > > 
> > > What do you mean by "more specific"? What are the criteria for a
> > > specific description?
> > > 
> > > Nikolay, Maxim, I understand that our arguments may not be as obvious
> > > to you as they are to the SQL team. So, please arrange your questions in
> > > a more constructive way.
> > > 
> > > Thank you!
> 
> 



Re: New SQL execution engine

2019-09-27 Thread Andrey Mashenkov

The following issues can't be resolved without changes in H2.
Hopefully, this will be enough.

https://issues.apache.org/jira/browse/IGNITE-10598
https://issues.apache.org/jira/browse/IGNITE-11473
https://issues.apache.org/jira/browse/IGNITE-11444
https://issues.apache.org/jira/browse/IGNITE-5289
https://issues.apache.org/jira/browse/IGNITE-10855
https://issues.apache.org/jira/browse/IGNITE-11341
https://issues.apache.org/jira/browse/IGNITE-7526
https://issues.apache.org/jira/browse/IGNITE-9480
https://issues.apache.org/jira/browse/IGNITE-9616
https://issues.apache.org/jira/browse/IGNITE-11891
https://issues.apache.org/jira/browse/IGNITE-6202
https://issues.apache.org/jira/browse/IGNITE-11448
https://issues.apache.org/jira/browse/IGNITE-3911


On Fri, Sep 27, 2019 at 4:34 PM Nikolay Izhikov  wrote:

> Roman.
>
> > Nikolay, Maxim, I understand that our arguments may not be as obvious
> > to you as they are to the SQL team. So, please arrange your questions in
> > a more constructive way.
>
> What is the SQL team?
> I only know the Ignite community :)
>
> Please share your knowledge in the IEP.
> I want to join the process of engine *selection*.
> It should start with the requirements for such an engine.
> Can you write them in the IEP, please?
>
> My point is very simple:
>
> 1. We made the wrong decision with H2.
> 2. We should make a well-thought-out decision about the new engine.
>
> > How many tickets would satisfy you?
>
> You write about "issueS" with H2.
> All I see is one open ticket.
> The IEP doesn't provide enough information.
> So it's not about the number of tickets, it's about
>
> > These two points (single map-reduce execution and inflexible optimizer)
> > are the main problems with the current engine.
>
> We may come to the point when Calcite (or any other engine) brings us a third
> and other "main problems".
> This is what happened with H2.
>
> Let's start from what we want to get with the engine and move forward from
> this base.
> What do you think?
>
>
>
> On Fri, 27/09/2019 at 16:15 +0300, Roman Kondakov wrote:
> > Maxim, Nikolay,
> >
> > I've listed two issues which show the ideological flaws of the current
> > engine.
> >
> > 1. IGNITE-11448 - Open. This ticket describes the impossibility of
> > executing queries which cannot fit into the hard-coded one-pass
> > map-reduce paradigm.
> >
> > 2. IGNITE-6085 - Closed (won't fix) - This ticket describes the second
> > major problem with the current engine: the H2 query optimizer is very
> > primitive and cannot perform many useful optimizations.
> >
> > These two points (single map-reduce execution and inflexible optimizer)
> > are the main problems with the current engine. It means that our engine
> > is currently suitable for executing only a very limited subset of
> > typical SQL queries. For example, it cannot even run most of the TPC-H
> > benchmark queries because they don't fit the simple map-reduce paradigm.
> >
> > > All I see is links to two tickets:
> >
> > How many tickets would satisfy you? I named two, and it looks like that is
> > not enough from your point of view. OK, so how many is enough? The set
> > of problems caused by the tickets listed above is infinite, therefore I
> > cannot create a ticket for each of them.
> > > Tech details also should be added.
> >
> > Tech details are in the tickets.
> >
> > > We can't discuss such a huge change as an execution engine replacement
> > > with descriptions like:
> > > "No data co-location control, i.e. arbitrary data can be returned
> > > silently" or
> > > "Low control on how query executes internally, as a result we have
> > > limited possibility to implement improvements/fixes."
> >
> > Why not? Don't you understand these problems? Or don't you think this is
> > a problem?
> >
> > > Let's make these descriptions more specific.
> >
> > What do you mean by "more specific"? What are the criteria for a
> > specific description?
> >
> > Nikolay, Maxim, I understand that our arguments may not be as obvious
> > to you as they are to the SQL team. So, please arrange your questions in
> > a more constructive way.
> >
> > Thank you!
>


-- 
Best regards,
Andrey V. Mashenkov

Re: New SQL execution engine

2019-09-27 Thread Seliverstov Igor

Nikolay,

The main issue is that there is no *selection*.

There is a field of knowledge, relational algebra, which describes how to
transform relational expressions while preserving their semantics, and there
are a couple of implementations (Calcite is the only one written in Java).
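As an illustration of what "preserving semantics" means here, below are two standard textbook equivalences of relational algebra (not taken from the IEP): a selection whose predicate p references only attributes of R can be pushed below a join, and natural joins can be regrouped. A planner applies rules like these to rewrite a plan into a cheaper but equivalent one.

```latex
\sigma_{p}(R \bowtie_{c} S) \equiv \sigma_{p}(R) \bowtie_{c} S
  \quad \text{if } \operatorname{attrs}(p) \subseteq \operatorname{attrs}(R)

(R \bowtie S) \bowtie T \equiv R \bowtie (S \bowtie T)
```

In a distributed setting the first rule matters a lot: filtering R before the join means fewer rows have to be shipped between nodes.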

There are only two alternatives:

1) Implementing the white papers from scratch
2) Adapting Calcite to our needs.

The second way was chosen by several other projects: there is accumulated
experience and a list of known issues (like using indexes), so almost
everything is already done for us.

Implementing a planner is a big deal; I think everybody here understands that.
That's why our proposal to reuse others' experience is the obvious one.

If you have an alternative, you're welcome to propose it; I'll gratefully listen.

The main question isn't "WHAT" but "HOW"; from my point of view, that's the
topic for discussion.

Regards,
Igor

On 27 Sep 2019, at 16:37, Nikolay Izhikov wrote:
> 
> Roman.
> 
>> Nikolay, Maxim, I understand that our arguments may not be as obvious
>> to you as they are to the SQL team. So, please arrange your questions in
>> a more constructive way.
> 
> What is the SQL team?
> I only know the Ignite community :)
> 
> Please share your knowledge in the IEP.
> I want to join the process of engine *selection*.
> It should start with the requirements for such an engine.
> Can you write them in the IEP, please?
> 
> My point is very simple:
> 
> 1. We made the wrong decision with H2.
> 2. We should make a well-thought-out decision about the new engine.
> 
>> How many tickets would satisfy you?
> 
> You write about "issueS" with H2.
> All I see is one open ticket.
> The IEP doesn't provide enough information.
> So it's not about the number of tickets, it's about
> 
>> These two points (single map-reduce execution and inflexible optimizer)
>> are the main problems with the current engine.
> 
> We may come to the point when Calcite (or any other engine) brings us a third
> and other "main problems".
> This is what happened with H2.
> 
> Let's start from what we want to get with the engine and move forward from
> this base.
> What do you think?
> 
> 
> 
> On Fri, 27/09/2019 at 16:15 +0300, Roman Kondakov wrote:
>> Maxim, Nikolay,
>> 
>> I've listed two issues which show the ideological flaws of the current
>> engine.
>> 
>> 1. IGNITE-11448 - Open. This ticket describes the impossibility of
>> executing queries which cannot fit into the hard-coded one-pass
>> map-reduce paradigm.
>> 
>> 2. IGNITE-6085 - Closed (won't fix) - This ticket describes the second
>> major problem with the current engine: the H2 query optimizer is very
>> primitive and cannot perform many useful optimizations.
>> 
>> These two points (single map-reduce execution and inflexible optimizer)
>> are the main problems with the current engine. It means that our engine
>> is currently suitable for executing only a very limited subset of
>> typical SQL queries. For example, it cannot even run most of the TPC-H
>> benchmark queries because they don't fit the simple map-reduce paradigm.
>> 
>>> All I see is links to two tickets:
>> 
>> How many tickets would satisfy you? I named two, and it looks like that is
>> not enough from your point of view. OK, so how many is enough? The set
>> of problems caused by the tickets listed above is infinite, therefore I
>> cannot create a ticket for each of them.
>>> Tech details also should be added.
>> 
>> Tech details are in the tickets.
>> 
>>> We can't discuss such a huge change as an execution engine replacement with
>>> descriptions like:
>>> "No data co-location control, i.e. arbitrary data can be returned silently"
>>> or
>>> "Low control on how query executes internally, as a result we have limited
>>> possibility to implement improvements/fixes."
>> 
>> Why not? Don't you understand these problems? Or don't you think this is
>> a problem?
>> 
>>> Let's make these descriptions more specific.
>> 
>> What do you mean by "more specific"? What are the criteria for a
>> specific description?
>> 
>> Nikolay, Maxim, I understand that our arguments may not be as obvious
>> to you as they are to the SQL team. So, please arrange your questions in
>> a more constructive way.
>> 
>> Thank you!

[jira] [Created] (IGNITE-12238) RobinHoodBackwardShiftHashMap works incorrectly on big endian architectures

2019-09-27 Thread Andrey N. Gura (Jira)

Andrey N. Gura created IGNITE-12238:
---

 Summary: RobinHoodBackwardShiftHashMap works incorrectly on big endian architectures
 Key: IGNITE-12238
 URL: https://issues.apache.org/jira/browse/IGNITE-12238
 Project: Ignite
  Issue Type: Bug
Reporter: Andrey N. Gura
Assignee: Andrey N. Gura
 Fix For: 2.8


{{RobinHoodBackwardShiftHashMap}} has a bug that can be reproduced only on big
endian architectures. In order to reproduce the problem, run the following
tests:

* {{RobinHoodBackwardShiftHashMapTest.testCollisionOnRemove}}
* {{testRandomOpsPutRemove}}

The problem is that the {{setIdealBucket()}} method writes a {{long}} value to
the offheap memory, while {{getIdealBucket()}} reads an {{int}} value. On little
endian architectures this works because the meaningful 4 bytes are written
first to the memory and the high-order zero bytes are overwritten by the next
operation. On big endian architectures, the first 4 bytes written (the ones
{{getIdealBucket()}} reads) are always zero.
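A minimal sketch of the mismatch described above. It uses a heap {{ByteBuffer}} with an explicit byte order as a stand-in for the offheap writes (the real map works on direct memory), and the class and method names are hypothetical:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class IdealBucketEndianDemo {
    // Mimics the bug: store the bucket as an 8-byte long, then read a 4-byte int
    // from the same offset. Which 4 bytes the int read sees depends on byte order.
    static int writeLongReadInt(ByteOrder order, long bucket) {
        ByteBuffer mem = ByteBuffer.allocate(Long.BYTES).order(order);
        mem.putLong(0, bucket);
        return mem.getInt(0);
    }

    // One possible fix (an assumption, not the committed patch): write and read with the same width.
    static int writeIntReadInt(ByteOrder order, int bucket) {
        ByteBuffer mem = ByteBuffer.allocate(Long.BYTES).order(order);
        mem.putInt(0, bucket);
        return mem.getInt(0);
    }

    public static void main(String[] args) {
        System.out.println(writeLongReadInt(ByteOrder.LITTLE_ENDIAN, 42L)); // 42: low-order bytes come first
        System.out.println(writeLongReadInt(ByteOrder.BIG_ENDIAN, 42L));    // 0: high-order zero bytes come first
        System.out.println(writeIntReadInt(ByteOrder.BIG_ENDIAN, 42));       // 42
    }
}
```

Using the same width on both sides makes the result independent of the platform byte order.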




Re: New SQL execution engine

2019-09-27 Thread Nikolay Izhikov

Roman.

> Nikolay, Maxim, I understand that our arguments may not be as obvious
> to you as they are to the SQL team. So, please arrange your questions in
> a more constructive way.

What is the SQL team?
I only know the Ignite community :)

Please share your knowledge in the IEP.
I want to join the process of engine *selection*.
It should start with the requirements for such an engine.
Can you write them in the IEP, please?

My point is very simple:

1. We made the wrong decision with H2.
2. We should make a well-thought-out decision about the new engine.

> How many tickets would satisfy you?

You write about "issueS" with H2.
All I see is one open ticket.
The IEP doesn't provide enough information.
So it's not about the number of tickets, it's about

> These two points (single map-reduce execution and inflexible optimizer)
> are the main problems with the current engine.

We may come to the point when Calcite (or any other engine) brings us a third and
other "main problems".
This is what happened with H2.

Let's start from what we want to get with the engine and move forward from this
base.
What do you think?



On Fri, 27/09/2019 at 16:15 +0300, Roman Kondakov wrote:
> Maxim, Nikolay,
> 
> I've listed two issues which show the ideological flaws of the current
> engine.
> 
> 1. IGNITE-11448 - Open. This ticket describes the impossibility of
> executing queries which cannot fit into the hard-coded one-pass
> map-reduce paradigm.
> 
> 2. IGNITE-6085 - Closed (won't fix) - This ticket describes the second
> major problem with the current engine: the H2 query optimizer is very
> primitive and cannot perform many useful optimizations.
> 
> These two points (single map-reduce execution and inflexible optimizer)
> are the main problems with the current engine. It means that our engine
> is currently suitable for executing only a very limited subset of
> typical SQL queries. For example, it cannot even run most of the TPC-H
> benchmark queries because they don't fit the simple map-reduce paradigm.
> 
> > All I see is links to two tickets:
> 
> How many tickets would satisfy you? I named two, and it looks like that is
> not enough from your point of view. OK, so how many is enough? The set
> of problems caused by the tickets listed above is infinite, therefore I
> cannot create a ticket for each of them.
> > Tech details also should be added.
> 
> Tech details are in the tickets.
> 
> > We can't discuss such a huge change as an execution engine replacement with
> > descriptions like:
> > "No data co-location control, i.e. arbitrary data can be returned silently"
> > or
> > "Low control on how query executes internally, as a result we have limited
> > possibility to implement improvements/fixes."
> 
> Why not? Don't you understand these problems? Or don't you think this is
> a problem?
> 
> > Let's make these descriptions more specific.
> 
> What do you mean by "more specific"? What are the criteria for a
> specific description?
> 
> Nikolay, Maxim, I understand that our arguments may not be as obvious
> to you as they are to the SQL team. So, please arrange your questions in
> a more constructive way.
> 
> Thank you!



Re: New SQL execution engine

2019-09-27 Thread Roman Kondakov


Maxim, Nikolay,

I've listed two issues which show the ideological flaws of the current
engine.

1. IGNITE-11448 - Open. This ticket describes the impossibility of
executing queries which cannot fit into the hard-coded one-pass
map-reduce paradigm.

2. IGNITE-6085 - Closed (won't fix) - This ticket describes the second
major problem with the current engine: the H2 query optimizer is very
primitive and cannot perform many useful optimizations.

These two points (single map-reduce execution and inflexible optimizer)
are the main problems with the current engine. It means that our engine
is currently suitable for executing only a very limited subset of
typical SQL queries. For example, it cannot even run most of the TPC-H
benchmark queries because they don't fit the simple map-reduce paradigm.

> All I see is links to two tickets:

How many tickets would satisfy you? I named two, and it looks like that is
not enough from your point of view. OK, so how many is enough? The set
of problems caused by the tickets listed above is infinite, therefore I
cannot create a ticket for each of them.

> Tech details also should be added.

Tech details are in the tickets.

> We can't discuss such a huge change as an execution engine replacement with
> descriptions like:
> "No data co-location control, i.e. arbitrary data can be returned silently" or
> "Low control on how query executes internally, as a result we have limited
> possibility to implement improvements/fixes."

Why not? Don't you understand these problems? Or don't you think this is
a problem?

> Let's make these descriptions more specific.

What do you mean by "more specific"? What are the criteria for a
specific description?

Nikolay, Maxim, I understand that our arguments may not be as obvious
to you as they are to the SQL team. So, please arrange your questions in
a more constructive way.

Thank you!
--
Kind Regards
Roman Kondakov

On 27.09.2019 15:32, Maxim Muzafarov wrote:

> Folks,
> 
> I agree with Nikolay: the idea of replacing the H2 engine with the
> most suitable one is reasonable. But since such a change is major, we
> should have strong arguments for it, even for members who are
> working outside the SQL team.
> 
> I think it is really necessary to have:
> 
> 1. The list of issues related to the current engine (H2) which, from
> different points of view and for different developers, must seem
> unsolvable. For example, `... the H2 execution plan is hard-wired with
> H2 internals and can't be easily transformed` doesn't seem to have a
> strong technical justification.
> After this step, we should have a clear understanding that the engine
> change is required.
> 
> 2. Why only Apache Calcite? It seems to me we should have a table
> comparing different engines, with the pros and cons of each
> one. A brief search shows me that we may have a few options here.
> After this step, we should have a clear understanding of why we choose
> this dependency over another.
> 
> 3. We should also have a migration decomposition and step-by-step
> actions to do. I haven't found such a decomposition on the IEP-37 page. Do
> we have one? What will the implementation phases be? What components
> will be changed? What would a new API look like, and would there be one? What
> problems are we expecting, e.g. performance degradation in the prototype
> implementation? The `Risks and Assumptions` topic doesn't seem to be
> well described.
> After this step, we should have a clear and obvious implementation plan
> for the new feature.
> 
> Let's have a strong technical discussion.
> 
> On Fri, 27 Sep 2019 at 15:17, Nikolay Izhikov wrote:
> 
> > Hello, Roman.
> > 
> > All I see is links to two tickets:
> > 
> > IGNITE-11448 - Open
> > IGNITE-6085 - Closed
> > 
> > Other issues are described poorly and have no ticket links.
> > We can't discuss such a huge change as an execution engine replacement with
> > descriptions like:
> > 
> > "No data co-location control, i.e. arbitrary data can be returned silently" or
> > "Low control on how query executes internally, as a result we have limited
> > possibility to implement improvements/fixes."
> > 
> > I think we need a reproducer that shows the issue.
> > Tech details also should be added.
> > 
> > Let's make these descriptions more specific.
> > Let's discuss how we want to fix them with the new engine.
> > 
> > 
> > On Fri, 27/09/2019 at 15:10 +0300, Roman Kondakov wrote:
> > 
> > > Hello Nikolay,
> > > 
> > > Please see IEP-37 [1]. The issues are there.
> > > 
> > > [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084

Re: New SQL execution engine

2019-09-27 Thread Nikolay Izhikov

Hello, Alexey.

Thanks for the details.

> Now, as for alternatives to Apache Calcite

I want to discuss our *requirements* for the new engine first.
Can we do it?
The main reason to do it: we should avoid a wrong technical decision.
We made one with H2, and we shouldn't do it again.

> As for the IEP content - I agree, we should have a more detailed
> description of steps and technical information there, but I believe this
> will be improved further.

Thanks!
Looking forward to the IEP details.

On Fri, 27/09/2019 at 16:04 +0300, Alexey Goncharuk wrote:
> Nikolay, Maxim,
> 
> Asking to provide a list of issues with the current H2 is pointless because
> it has a fundamental architectural flaw, not just a bunch of bugs:
> 
> Currently, the query execution is limited to a two-phase map-reduce task
> (with an optional remote cursor when 'distributed joins' flag is enabled)
> and only a limited subset of queries can be executed. You can easily see
> that if you try to draw how three non-collocated caches should be joined on
> an arbitrary condition.
> 
> H2 cannot solve this problem because H2 is a local database and is not
> designed to execute distributed queries, let alone the fact that it is not
> designed to be embedded into other projects as an execution engine. Because
> of this, an H2 upgrade is a huge pain which can lead to issues as severe as
> broken compilation. This is exactly the reason why the ticket with index use
> for the IN() expression [1] has only been fixed in 2.7; one can see the amount
> of changes needed for a simple version upgrade.
> 
> Now, as for alternatives to Apache Calcite - I personally spent quite a
> large amount of time looking for alternatives but did not find any even
> remotely matching the abilities and flexibility of Calcite. As folks noted
> before, Calcite is specifically designed to have flexible optimization rules
> and support distributed query execution, which is already proven by
> real-life projects. If you have any other framework in mind that should be
> considered, please let the community know; I believe it will be a more
> productive discussion than the current one.
> 
> As for the IEP content - I agree, we should have a more detailed
> description of steps and technical information there, but I believe this
> will be improved further.
> 
> --AG
> 
> [1] https://issues.apache.org/jira/browse/IGNITE-4150
> 
> 
> 
> On Fri, 27 Sep 2019 at 15:33, Maxim Muzafarov wrote:
> 
> > Folks,
> > 
> > I agree with Nikolay: the idea of replacing the H2 engine with the
> > most suitable one is reasonable. But since such a change is major, we
> > should have strong arguments for it, even for members who are
> > working outside the SQL team.
> > 
> > I think it is really necessary to have:
> > 
> > 1. The list of issues related to the current engine (H2) which, from
> > different points of view and for different developers, must seem
> > unsolvable. For example, `... the H2 execution plan is hard-wired with
> > H2 internals and can't be easily transformed` doesn't seem to have a
> > strong technical justification.
> > After this step, we should have a clear understanding that the engine
> > change is required.
> > 
> > 2. Why only Apache Calcite? It seems to me we should have a table
> > comparing different engines, with the pros and cons of each
> > one. A brief search shows me that we may have a few options here.
> > After this step, we should have a clear understanding of why we choose
> > this dependency over another.
> > 
> > 3. We should also have a migration decomposition and step-by-step
> > actions to do. I haven't found such a decomposition on the IEP-37 page. Do
> > we have one? What will the implementation phases be? What components
> > will be changed? What would a new API look like, and would there be one? What
> > problems are we expecting, e.g. performance degradation in the prototype
> > implementation? The `Risks and Assumptions` topic doesn't seem to be
> > well described.
> > After this step, we should have a clear and obvious implementation plan
> > for the new feature.
> > 
> > Let's have a strong technical discussion.
> > 
> > On Fri, 27 Sep 2019 at 15:17, Nikolay Izhikov  wrote:
> > > 
> > > Hello, Roman.
> > > 
> > > All I see is links to two tickets:
> > > 
> > > IGNITE-11448 - Open
> > > IGNITE-6085 - Closed
> > > 
> > > Other issues are described poorly and have no ticket links.
> > > We can't discuss such a huge change as an execution engine replacement
> > > with descriptions like:
> > > 
> > > "No data co-location control, i.e. arbitrary data can be returned
> > > silently" or
> > > "Low control on how query executes internally, as a result we have
> > > limited possibility to implement improvements/fixes."
> > > 
> > > I think we need a reproducer that shows the issue.
> > > Tech details also should be added.
> > > 
> > > Let's make these descriptions more specific.
> > > Let's discuss how we want to fix them with the new engine.
> > > 
> > > 
> > > On Fri, 27/09/2019 at 15:

Re: New SQL execution engine

2019-09-27 Thread Alexey Goncharuk

Nikolay, Maxim,

Asking to provide a list of issues with the current H2 is pointless because
it has a fundamental architectural flaw, not just a bunch of bugs:

Currently, the query execution is limited to a two-phase map-reduce task
(with an optional remote cursor when 'distributed joins' flag is enabled)
and only a limited subset of queries can be executed. You can easily see
that if you try to draw how three non-collocated caches should be joined on
an arbitrary condition.
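To make the two-phase model concrete, here is a simplified sketch in plain Java (not Ignite code; the class and method names are hypothetical) of how an aggregate such as AVG can be split into a map step that runs on each node over its local rows and a single reduce step on the query initiator:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class TwoPhaseAvg {
    // Partial aggregate produced by the map phase on one node.
    static final class Partial {
        final long sum;
        final long count;

        Partial(long sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    // Map phase: each node aggregates only the rows it stores locally.
    static Partial mapLocal(List<Integer> localRows) {
        long sum = 0;
        for (int v : localRows)
            sum += v;
        return new Partial(sum, localRows.size());
    }

    // Reduce phase: averages cannot be merged directly, so AVG travels as SUM and COUNT.
    static double reduce(List<Partial> partials) {
        long sum = 0;
        long count = 0;
        for (Partial p : partials) {
            sum += p.sum;
            count += p.count;
        }
        return count == 0 ? Double.NaN : (double) sum / count;
    }

    public static void main(String[] args) {
        // Three "nodes" holding different slices of the data.
        List<List<Integer>> nodes = Arrays.asList(
            Arrays.asList(1, 2, 3), Collections.singletonList(10), Collections.<Integer>emptyList());
        List<Partial> partials = new ArrayList<>();
        for (List<Integer> rows : nodes)
            partials.add(mapLocal(rows));
        System.out.println(reduce(partials)); // 4.0
    }
}
```

This shape works when every node can compute its part independently. Joining three non-collocated caches requires matching rows to be exchanged between nodes before the join can run, i.e. more than one map step followed by one reduce step, which is the limitation described above.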

H2 cannot solve this problem because H2 is a local database and is not
designed to execute distributed queries, let alone the fact that it is not
designed to be embedded into other projects as an execution engine. Because
of this, an H2 upgrade is a huge pain which can lead to issues as severe as
broken compilation. This is exactly the reason why the ticket with index use
for the IN() expression [1] has only been fixed in 2.7; one can see the amount
of changes needed for a simple version upgrade.

Now, as for alternatives to Apache Calcite - I personally spent quite a
large amount of time looking for alternatives but did not find any even
remotely matching the abilities and flexibility of Calcite. As folks noted
before, Calcite is specifically designed to have flexible optimization rules
and support distributed query execution, which is already proven by
real-life projects. If you have any other framework in mind that should be
considered, please let the community know; I believe it will be a more
productive discussion than the current one.

As for the IEP content - I agree, we should have a more detailed
description of steps and technical information there, but I believe this
will be improved further.

--AG

[1] https://issues.apache.org/jira/browse/IGNITE-4150

On Fri, 27 Sep 2019 at 15:33, Maxim Muzafarov wrote:

> Folks,
>
> I agree with Nikolay: the idea of replacing the H2 engine with the
> most suitable one is reasonable. But since such a change is major, we
> should have strong arguments for it, even for members who are
> working outside the SQL team.
>
> I think it is really necessary to have:
>
> 1. The list of issues related to the current engine (H2) which, from
> different points of view and for different developers, must seem
> unsolvable. For example, `... the H2 execution plan is hard-wired with
> H2 internals and can't be easily transformed` doesn't seem to have a
> strong technical justification.
> After this step, we should have a clear understanding that the engine
> change is required.
>
> 2. Why only Apache Calcite? It seems to me we should have a table
> comparing different engines, with the pros and cons of each
> one. A brief search shows me that we may have a few options here.
> After this step, we should have a clear understanding of why we choose
> this dependency over another.
>
> 3. We should also have a migration decomposition and step-by-step
> actions to do. I haven't found such a decomposition on the IEP-37 page. Do
> we have one? What will the implementation phases be? What components
> will be changed? What would a new API look like, and would there be one? What
> problems are we expecting, e.g. performance degradation in the prototype
> implementation? The `Risks and Assumptions` topic doesn't seem to be
> well described.
> After this step, we should have a clear and obvious implementation plan
> for the new feature.
>
> Let's have a strong technical discussion.
>
> On Fri, 27 Sep 2019 at 15:17, Nikolay Izhikov  wrote:
> >
> > Hello, Roman.
> >
> > All I see is links to two tickets:
> >
> > IGNITE-11448 - Open
> > IGNITE-6085 - Closed
> >
> > Other issues are described poorly and have no ticket links.
> > We can't discuss such a huge change as an execution engine replacement
> > with descriptions like:
> >
> > "No data co-location control, i.e. arbitrary data can be returned
> > silently" or
> > "Low control on how query executes internally, as a result we have
> > limited possibility to implement improvements/fixes."
> >
> > I think we need a reproducer that shows the issue.
> > Tech details also should be added.
> >
> > Let's make these descriptions more specific.
> > Let's discuss how we want to fix them with the new engine.
> >
> >
> > On Fri, 27/09/2019 at 15:10 +0300, Roman Kondakov wrote:
> > > Hello Nikolay,
> > >
> > > Please see IEP-37 [1]. The issues are there.
> > >
> > >
> > > [1]
> > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084
> > >
> > >
>

[jira] [Created] (IGNITE-12237) Forbid thin client connections dynamically

2019-09-27 Thread Denis Mekhanikov (Jira)

Denis Mekhanikov created IGNITE-12237:
-

 Summary: Forbid thin client connections dynamically
 Key: IGNITE-12237
 URL: https://issues.apache.org/jira/browse/IGNITE-12237
 Project: Ignite
  Issue Type: Improvement
  Components: thin client
Reporter: Denis Mekhanikov


Sometimes it's useful to forbid thin client connections to nodes for some
period of time. During this time, the cluster may be performing some work
needed for the application to function correctly (for example, preloading
data).

It would be good to have an API call for opening and closing thin client
connections.

This feature was requested in the following StackOverflow question: 
https://stackoverflow.com/questions/58106297/how-to-block-java-thin-client-request-till-preloading-of-data-in-ignite-cluster
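A minimal sketch of what such a switch could look like on the server side. Nothing here is an existing Ignite API: ThinClientGate, close(), open() and tryAccept() are hypothetical names, and the sketch only covers rejecting new handshakes, not requests on connections that are already established.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical gate a node could consult when a thin client handshake arrives.
public class ThinClientGate {
    private final AtomicBoolean open = new AtomicBoolean(true);
    private final AtomicLong rejected = new AtomicLong();

    // Hypothetical API call: stop accepting new thin client connections.
    public void close() {
        open.set(false);
    }

    // Hypothetical API call: accept thin client connections again.
    public void open() {
        open.set(true);
    }

    // Returns true if the handshake may proceed; otherwise counts the rejection.
    public boolean tryAccept() {
        if (open.get())
            return true;
        rejected.incrementAndGet();
        return false;
    }

    public long rejectedCount() {
        return rejected.get();
    }

    public static void main(String[] args) {
        ThinClientGate gate = new ThinClientGate();
        gate.close();                             // e.g. while data is being preloaded
        System.out.println(gate.tryAccept());     // false
        gate.open();
        System.out.println(gate.tryAccept());     // true
        System.out.println(gate.rejectedCount()); // 1
    }
}
```

A real implementation would probably also need to decide what rejected clients are told (for example, a retryable handshake error) and whether the switch is node-local or cluster-wide.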




Re: New SQL execution engine

2019-09-27 Thread Maxim Muzafarov

Folks,

I agree with Nikolay: the idea of replacing the H2 engine with the
most suitable one is reasonable. But since such a change is major, we
should have strong arguments for it, even for members who are
working outside the SQL team.

I think it is really necessary to have:

1. The list of issues related to the current engine (H2) which, from
different points of view and for different developers, must seem
unsolvable. For example, `... the H2 execution plan is hard-wired with
H2 internals and can't be easily transformed` doesn't seem to have a
strong technical justification.
After this step, we should have a clear understanding that the engine
change is required.

2. Why only Apache Calcite? It seems to me we should have a table
comparing different engines, with the pros and cons of each
one. A brief search shows me that we may have a few options here.
After this step, we should have a clear understanding of why we choose
this dependency over another.

3. We should also have a migration decomposition and step-by-step
actions to do. I haven't found such a decomposition on the IEP-37 page. Do
we have one? What will the implementation phases be? What components
will be changed? What would a new API look like, and would there be one? What
problems are we expecting, e.g. performance degradation in the prototype
implementation? The `Risks and Assumptions` topic doesn't seem to be
well described.
After this step, we should have a clear and obvious implementation plan
for the new feature.

Let's have a strong technical discussion.

On Fri, 27 Sep 2019 at 15:17, Nikolay Izhikov  wrote:
>
> Hello, Roman.
>
> All I see is links to two tickets:
>
> IGNITE-11448 - Open
> IGNITE-6085 - Closed
>
> Other issues are described poorly and have no ticket links.
> We can't discuss such a huge change as an execution engine replacement with
> descriptions like:
>
> "No data co-location control, i.e. arbitrary data can be returned silently" or
> "Low control on how query executes internally, as a result we have limited
> possibility to implement improvements/fixes."
>
> I think we need a reproducer that shows the issue.
> Tech details also should be added.
>
> Let's make these descriptions more specific.
> Let's discuss how we want to fix them with the new engine.
>
>
> On Fri, 27/09/2019 at 15:10 +0300, Roman Kondakov wrote:
> > Hello Nikolay,
> >
> > Please see IEP-37 [1]. The issues are there.
> >
> >
> > [1]
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084
> >
> >

Re: Improvements for new security approach.

2019-09-27 Thread Maksim Stepachev

I finished with fixes: https://issues.apache.org/jira/browse/IGNITE-11992

>> The subject's size is unlimited, which can lead to a dramatic increase in
>> traffic between nodes.
I added a network optimization for this case: the subject is added to the
message only when ctx.discovery().node(secSubjId) == null.
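A minimal sketch of the optimization described above, assuming a message carries the subject ID plus an optional full subject. `SubjectInfo`, `SecurityAwareMessage`, `SubjectPropagation` and the `topology` map are hypothetical stand-ins, not Ignite classes; the map lookup only plays the role of `ctx.discovery().node(secSubjId)`.

```java
import java.util.Map;
import java.util.UUID;

// Hypothetical stand-in for a security subject (not Ignite's SecuritySubject interface).
final class SubjectInfo {
    final UUID id;
    final String login;

    SubjectInfo(UUID id, String login) {
        this.id = id;
        this.login = login;
    }
}

// Hypothetical message envelope: the full subject is optional.
final class SecurityAwareMessage {
    final UUID subjId;
    final SubjectInfo subj; // null when the receiver can resolve subjId on its own

    SecurityAwareMessage(UUID subjId, SubjectInfo subj) {
        this.subjId = subjId;
        this.subj = subj;
    }
}

final class SubjectPropagation {
    /**
     * Sends only the subject ID when it identifies a node known to discovery;
     * otherwise (e.g. a subject created for a REST client) attaches the full subject.
     *
     * @param topology Hypothetical node lookup standing in for ctx.discovery().node(id).
     */
    static SecurityAwareMessage wrap(SubjectInfo subj, Map<UUID, ?> topology) {
        boolean knownNode = topology.get(subj.id) != null;

        return new SecurityAwareMessage(subj.id, knownNode ? null : subj);
    }
}
```

The trade-off is a local lookup on the sender in exchange for smaller messages in the common case where the subject is a cluster node.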

>> Also, we need to get rid of GridTaskThreadContextKey#TC_SUBJ_ID as
>> a duplicate of IgniteSecurity's responsibility.
[2] Yes, we should get rid of this, but in the next task, because I haven't
been able to merge this one since 18 Jul 19 :)

[1] I agree with you.


On Fri, 27 Sep 2019 at 11:42, Denis Garus wrote:

> Hello, Maksim!
>
> Thank you for your effort and interest in the security of Ignite.
>
> I would like you to pay attention to the discussion [1] and issue [2].
> It looks like not only tasks but all operations should execute in the current
> security context; that is essential to determine a security ID for events.
> Also, we need to get rid of GridTaskThreadContextKey#TC_SUBJ_ID as a
> duplicate of IgniteSecurity's responsibility.
> I think your task is the right place to do that.
> What is your opinion?
>
> >> It's the reason why subject id isn't enough and we should transmit
> >> subject inside message for this case.
> There is a problem with this approach.
> The subject's size is unlimited, which can lead to a dramatic increase in
> traffic between nodes.
>
> 1.
> http://apache-ignite-developers.2346864.n4.nabble.com/JavaDoc-for-Event-s-subjectId-methods-td43663.html
> 2. https://issues.apache.org/jira/browse/IGNITE-9914
>
> On Fri, 27 Sep 2019 at 08:38, Anton Vinogradov wrote:
>
>> Maksim
>>
>> >> I want to fix 2-3-4 points under one ticket.
>> Please let me know once it's ready for review.
>>
>> On Thu, Sep 26, 2019 at 5:18 PM Maksim Stepachev <
>> maksim.stepac...@gmail.com>
>> wrote:
>>
>> > Hi.
>> >
>> > Anton Vinogradov,
>> >
>> > I want to fix 2-3-4 points under one ticket.
>> >
>> > The first was fixed in the ticket:
>> > https://issues.apache.org/jira/browse/IGNITE-11094
>> > Also, I aggry with you that 5-6 isn't required to ignite.
>> >
>> > Denis Garus,
>> > I made a reproducer for point 3. Look at the test from my pull request:
>> > JettyRestPropagationSecurityContextTest
>> >
>> > https://github.com/apache/ignite/pull/6918
>> >
>> > For point 2, apply GridRestProcessor from the PR, set a breakpoint in
>> > VisorQueryUtils#scheduleQueryStart inside the
>> > ignite.context().closure().runLocalSafe closure and call:
>> > ignite.context().security().securityContext()
>> >
>> >
>> > For point 3, do the action above and call:
>> >
>> > ignite.context().discovery().node(ignite.context().security().securityContext().subject().id())
>> >
>> > It returns null because this subject was created by REST. That's the
>> > reason why the subject ID isn't enough and we should transmit the subject
>> > inside the message in this case.
>> >
>> > On Thu, 18 Jul 2019 at 12:45, Anton Vinogradov wrote:
>> >
>> >> Maksim,
>> >>
>> >> Could you please split IGNITE-11992 into subtasks with proper
>> >> descriptions?
>> >> This will allow us to move the discussion into the issues and solve each
>> >> problem properly.
>> >>
>> >> On Thu, Jul 18, 2019 at 11:57 AM Denis Garus
>> >> wrote:
>> >>
>> >> > Hello, Maksim!
>> >> > Thanks for your analysis!
>> >> >
>> >> > I have a few questions about your proposals.
>> >> >
>> >> > GridRestProcessor.
>> >> > AFAIK, when GridRestProcessor handles a client request
>> >> > (GridRestProcessor#handleRequest), it performs authentication
>> >> > (GridRestProcessor#authenticate) and then authorization of the request
>> >> > (GridRestProcessor#authorize) inside the client context.
>> >> > Can you give additional info about the issues with GridRestProcessor
>> >> > from 3 and 4? Maybe you have a reproducer for the problem?
>> >> >
>> >> > NoOpIgniteSecurityProcessor.
>> >> > I think the case that you describe in 5 is not a bug.
>> >> > Either all nodes (client and server) have security enabled, or all have it disabled.
>> >> > I can't imagine a case where that doesn't hold.
>> >> >
>> >> > ATTR_SECURITY_SUBJECT.
>> >> > I don't think that compatibility is needed here. If you mix nodes that
>> >> > propagate the security context to remote nodes with older nodes,
>> >> > you can get subtle errors.
>> >> >
>> >> > On Thu, 18 Jul 2019 at 11:12, Maksim Stepachev <
>> >> > maksim.stepac...@gmail.com> wrote:
>> >> >
>> >> > > Hi, Ivan.
>> >> > >
>> >> > > Yes, I have.
>> >> > > https://issues.apache.org/jira/browse/IGNITE-11992
>> >> > >
>> >> > > I'm waiting for a visa.
>> >> > >
>> >> > >
>> >> > > On Thu, 18 Jul 2019 at 11:09, Ivan Rakov wrote:
>> >> > >
>> >> > >> Hello Max,
>> >> > >>
>> >> > >> Thanks for your analysis!
>> >> > >>
>> >> > >> Have you created a JIRA issue for the discovered defects?
>> >> > >>
>> >> > >> Best Regards,
>> >> > >> Ivan Rakov
>> >> > >>
>> >> > >> On 17.07.2019 17:08, Maksim Stepachev wrote:
>> >> > >> > Hello, Igniters.
>> >> > >> >
>> >> > >> >  The main idea of the new security is propagation of the security
>> >> > >> > context to
>> >> > >> > oth

Re: New SQL execution engine

2019-09-27 Thread Nikolay Izhikov

Hello, Roman.

All I see are links to two tickets:

IGNITE-11448 - Open
IGNITE-6085 - Closed

Other issues are described poorly and have no ticket links.
We can't discuss such a huge change as replacing the execution engine with
descriptions like:

"No data co-location control, i.e. arbitrary data can be returned silently" or
"Low control on how query executes internally, as a result we have limited 
possibility to implement improvements/fixes."

I think we need a reproducer that shows each issue.
Technical details should also be added.

Let's make these descriptions more specific. 
Let's discuss how we want to fix them with the new engine.


On Fri, 27/09/2019 at 15:10 +0300, Roman Kondakov wrote:
> Hello Nikolay,
> 
> please see IEP-37 [1]. The issues are there.
> 
> 
> [1] 
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084
> 
> 



Re: New SQL execution engine

2019-09-27 Thread Roman Kondakov


Hello Nikolay,

please see IEP-37 [1]. The issues are there.


[1] 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084



--
Kind Regards
Roman Kondakov


[jira] [Created] (IGNITE-12236) RepositoryFactorySupport#getQueryLookupStrategy no longer overridden in IgniteRepositoryFactory

2019-09-27 Thread Riquet Thibaut (Jira)

Riquet Thibaut created IGNITE-12236:
---

 Summary: RepositoryFactorySupport#getQueryLookupStrategy no longer 
overridden in IgniteRepositoryFactory
 Key: IGNITE-12236
 URL: https://issues.apache.org/jira/browse/IGNITE-12236
 Project: Ignite
  Issue Type: Bug
  Components: spring
Affects Versions: 2.7.6
Reporter: Riquet Thibaut


Hello,

org.apache.ignite.springdata20.repository.support.IgniteRepositoryFactory#getQueryLookupStrategy
does not override
org.springframework.data.repository.core.support.RepositoryFactorySupport#getQueryLookupStrategy
since this commit:

[https://github.com/spring-projects/spring-data-commons/commit/a6215fbe0f5c9a254cddacb12763737f2c286ad5]

This results in an exception being thrown in
org.springframework.data.repository.core.support.RepositoryFactorySupport.QueryExecutorMethodInterceptor#QueryExecutorMethodInterceptor
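
The ticket does not spell out how the signature changed. Assuming the upstream commit changed the method's parameter types, the Ignite method silently turns from an override into an overload that the framework never calls. The self-contained sketch below reproduces that effect with hypothetical classes (`FrameworkFactory`, `CustomFactory`, `OldContextProvider`, `NewContextProvider`); they are not the Spring Data or Ignite types.

```java
// Hypothetical parameter types: the framework switched from the old one to the new one.
interface OldContextProvider {}

interface NewContextProvider {}

class FrameworkFactory {
    // The framework now declares (and calls) the method with the NEW parameter type.
    protected String lookupStrategy(String key, NewContextProvider provider) {
        return "default";
    }

    final String resolve(String key) {
        return lookupStrategy(key, null);
    }
}

class CustomFactory extends FrameworkFactory {
    // Still written against the OLD parameter type, so this is an overload, not an override:
    // resolve() dispatches to the base-class method and never reaches this one.
    // Annotating it with @Override would turn this silent bug into a compile-time error.
    protected String lookupStrategy(String key, OldContextProvider provider) {
        return "custom";
    }
}
```

Marking integration methods with `@Override` lets the compiler catch this kind of upstream signature change.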

Re: [IGNITE-9836] Invalid check of ea java versions

2019-09-27 Thread Stephen Darlington

Done: https://github.com/apache/ignite/pull/6920

While we’re talking about the startup scripts… 
https://issues.apache.org/jira/browse/IGNITE-11856

Regards,
Stephen

> On 26 Sep 2019, at 17:02, Ilya Kasnacheev  wrote:
> 
> Hello!
> 
> Please do!
> 
> Regards,
> -- 
> Ilya Kasnacheev
> 
> 
> On Tue, 17 Sep 2019 at 11:13, Stephen Darlington <
> stephen.darling...@gridgain.com> wrote:
> 
>> I can’t take any credit for the patch but if the original author has lost
>> interest I’m happy to help push it through.
>> 
>> Regards,
>> Stephen
>> 
>>> On 16 Sep 2019, at 19:31, Denis Magda  wrote:
>>> 
>>> Stephen,
>>> 
>>> Thanks for sending the patch! It seems that Igniters are already actively
>>> reviewing it in JIRA.
>>> 
>>> -
>>> Denis
>>> 
>>> 
>>> On Mon, Sep 16, 2019 at 7:03 AM Stephen Darlington <
>>> stephen.darling...@gridgain.com> wrote:
>>> 
>>>> Hi,
>>>>
>>>> Would someone mind taking a quick look at this ticket? Basically, a clean
>>>> download of Ignite won’t start if the version of Java you’re using has a
>>>> number like “java version "1.8.0_202-ea"”. (This is the default if you get
>>>> your JDK using Homebrew on a Mac.)
>>>>
>>>> https://issues.apache.org/jira/browse/IGNITE-9836 <
>>>> https://issues.apache.org/jira/browse/IGNITE-9836>
>>>>
>>>> This has been bugging me for ages and now that I look at it I find that
>>>> there’s already a tiny, working patch available.
>>>>
>>>> Regards,
>>>> Stephen
>> 
>> 
>>
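
The patch in the pull request above is not reproduced here, and the real check lives in the shell startup scripts. As an illustration of tolerant version parsing only, here is a minimal Java sketch (the `JavaVersion.major` helper is hypothetical) that extracts the major version from strings such as `1.8.0_202-ea` while ignoring suffixes like `-ea`:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class JavaVersion {
    // Leading numeric components only; suffixes such as "_202", "-ea" or "+36" are ignored.
    private static final Pattern LEADING = Pattern.compile("^(\\d+)(?:\\.(\\d+))?");

    /** Returns the major Java version for strings like "1.8.0_202-ea", "11.0.4" or "13-ea". */
    static int major(String version) {
        Matcher m = LEADING.matcher(version.trim());
        if (!m.find())
            throw new IllegalArgumentException("Unrecognized Java version: " + version);

        int first = Integer.parseInt(m.group(1));

        // Legacy "1.x" scheme (Java 8 and older): the major version is the second component.
        if (first == 1 && m.group(2) != null)
            return Integer.parseInt(m.group(2));

        return first;
    }
}
```

Comparing only the extracted major number avoids treating the `-ea` suffix as part of a numeric field.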

Re: New SQL execution engine

2019-09-27 Thread Nikolay Izhikov

Hello, Roman.

> Also Apache Calcite is commonly used in popular Apache projects

I don't think it's a good point.
H2 is also commonly used.
But it doesn't meet Ignite's requirements.

Could you please write down the issues and engine requirements in the IEP?
That way we can discuss each point separately.





Re: New SQL execution engine

2019-09-27 Thread Nikolay Izhikov

Hello, Andrey.

> Ignite SQL layer has some issues that can't be fixed with changes in Ignite
> only, and we are blocked with H2.

What are these issues?
Could you make them specific and send tickets for these issues?

> 3. Replace H2 with smth else.

Actually, I support this decision in general.
But to make the right choice of an H2 replacement, we should carefully discuss
such a huge change.

So far, I can't see any requirements for the SQL engine written down (in the IEP).
Let's do it and discuss them.




Re: New SQL execution engine

2019-09-27 Thread Roman Kondakov


Hello Nikolay.

You've asked very good questions. I'll try to answer.


> 1. What are the exact issues with the H2 integration?
> Can you send ticket links?
> Can we label all H2 integration issues in JIRA? I propose to use the "h2" label.

The current SQL engine is confined to a single-pass map-reduce algorithm.
This makes it impossible to execute complex queries that cannot be
expressed with a single map-reduce pass, like subqueries with aggregates
[1]. Another problem is that the H2 optimizer is very primitive and is not
able to perform many useful optimizations [2].
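
To illustrate what a single map-reduce pass can express, here is a minimal sketch (hypothetical `TwoStepAvg` class, plain Java, no Ignite APIs) of how `AVG` is typically split: each node returns a partial `(sum, count)` and the initiator combines them. A query such as `SELECT * FROM t WHERE x > (SELECT AVG(x) FROM t)` (a hypothetical example) would need this reduced value before the outer filter can run on the data nodes, i.e. a second distributed pass, which a fixed two-step plan cannot express.

```java
import java.util.List;

final class TwoStepAvg {
    /** Partial result produced by the map step on one node. */
    static final class Partial {
        final long sum;
        final long count;

        Partial(long sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    /** Map step: runs locally on each node over its own rows. */
    static Partial map(long[] localRows) {
        long sum = 0;
        for (long v : localRows)
            sum += v;
        return new Partial(sum, localRows.length);
    }

    /** Reduce step: runs once on the query initiator. */
    static double reduce(List<Partial> partials) {
        long sum = 0;
        long count = 0;
        for (Partial p : partials) {
            sum += p.sum;
            count += p.count;
        }
        return count == 0 ? Double.NaN : (double) sum / count;
    }
}
```

Averaging the per-node averages instead would give 3.25 for the rows {1, 2, 3} and {4, 5}, rather than the correct 3.0, which is why the partials carry sum and count.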


Also, Apache Calcite is commonly used in popular Apache projects like
Hive, Drill, Flink and others [3]. So it's a mature, battle-tested
framework, while H2 is a toy database that is hardly ever used in
real production systems.



> 2. What are the requirements for the new SQL engine?
> We should write them down and discuss them.

The main requirement is to fix the problems listed above. The new SQL
engine should be able to *effectively* execute SQL queries of
*arbitrary complexity*. For example, the new engine will be able to
perform distributed joins in multiple ways [4], whereas the current engine
can do it only in two ways: collocated and distributed (the latter is
usually not very efficient and has to be enabled manually).
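
A minimal sketch of how a planner might pick among join strategies of the kind described in [4] (collocated, broadcast, repartition). The `JoinPlanner` class, the `broadcastLimit` threshold and the table description are hypothetical; a real planner would rely on the cost model and statistics rather than a single row-count cut-off.

```java
final class JoinPlanner {
    enum Strategy { COLLOCATED, BROADCAST_LEFT, BROADCAST_RIGHT, REPARTITION }

    /** Hypothetical description of a partitioned table. */
    static final class TableInfo {
        final String affinityColumn; // column that determines which partition a row lives in
        final long rowCount;         // estimated number of rows (statistics)

        TableInfo(String affinityColumn, long rowCount) {
            this.affinityColumn = affinityColumn;
            this.rowCount = rowCount;
        }
    }

    /**
     * Chooses a strategy for the equi-join left.leftKey = right.rightKey.
     * Assumes both tables use the same affinity function and partition count.
     */
    static Strategy choose(TableInfo left, String leftKey, TableInfo right, String rightKey, long broadcastLimit) {
        // Both sides are partitioned by the join key: matching rows already live on the same node.
        if (left.affinityColumn.equals(leftKey) && right.affinityColumn.equals(rightKey))
            return Strategy.COLLOCATED;

        // One side is small: copy it to every node that holds the other side.
        if (right.rowCount <= broadcastLimit && right.rowCount <= left.rowCount)
            return Strategy.BROADCAST_RIGHT;
        if (left.rowCount <= broadcastLimit)
            return Strategy.BROADCAST_LEFT;

        // Otherwise, reshuffle both sides by the join key so matching rows meet on one node.
        return Strategy.REPARTITION;
    }

    /** Target node for a row when repartitioning by the join key. */
    static int targetNode(Object joinKey, int nodeCount) {
        return Math.floorMod(joinKey.hashCode(), nodeCount);
    }
}
```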



> 3. What options do we have?
> Are there any alternatives to Calcite on the market?
> Once, we made a wrong choice that looked like the obvious one.
> So we should carefully avoid repeating that this time.

I know of only one open source implementation of an efficient query
optimization strategy, and that is Apache Calcite. The alternative is
to write our own query optimizer from scratch, which is not a trivial
task at all.




> 4. What improvements to Ignite do we want to make with the new engine?

Ignite will be able to execute complex queries using the optimal strategy. I
think this is quite a good improvement.



[1] https://issues.apache.org/jira/browse/IGNITE-11448
[2] https://issues.apache.org/jira/browse/IGNITE-6085
[3] https://calcite.apache.org/docs/powered_by.html
[4] https://www.memsql.com/blog/scaling-distributed-joins/
--
Kind Regards
Roman Kondakov


Re: New SQL execution engine

2019-09-27 Thread Andrey Mashenkov

Hi Nikolay,

Let me add my 5 cents here.

The Ignite SQL layer has some issues that can't be fixed with changes in Ignite
alone, and we are blocked by H2.
To resolve these issues we can:
1. Donate some changes to H2 and wait for its next release. But there are
more cons than pros, and I think we can't rely on the H2 project anymore.
- There is no guarantee our changes will be approved by the H2 community.
- We definitely don't want to depend on the H2 product lifecycle.
- New H2 features (like parallel multi-statement query processing in the latest
release) force significant changes/refactoring in the Ignite SQL
layer with no visible benefits.
With every release it becomes harder to upgrade the H2 dependency.
- The latest H2 versions raise questions about their stability.

Hot issues are:
- Large intermediate results inside H2 internals can cause OOM for some
kinds of queries. Ignite can't handle this in any way for now without reworking
H2 code.
- HashJoins
- Ignite can't run multi-step queries, only 2-step (map-reduce) ones.
- It is not possible to apply optimizations to the query plan, as no logical
plan actually exists. The H2 execution plan is hard-wired with H2
internals and can't be easily transformed.
Implementing a good new planner over H2 looks like a huge task.

2. Fork H2.
We have already done this in GridGain (you can find the H2 module in GridGain
Community Edition) as the fastest way to unblock work on SQL improvements.
But, based on our experience, this way doesn't look like a good one for
Ignite.
- H2 code can't be included in Ignite at all.
H2 is dual-licensed under MPL 2.0 and EPL 1.0. On the one hand, its license
can't be changed to the Apache License. On the other hand, the Apache Software
Foundation doesn't want to host code licensed under anything other than the
Apache License. GridGain is OK with this, but the Apache Software Foundation isn't.

- We could make a separate H2 fork project with its own lifecycle, under our
full control, and publish it to Maven Central.
This doesn't seem like a big deal, but it would cause additional difficulties
in Ignite's development, testing and release processes.
This way seems to bring a lot of pain for every contributor.

3. Replace H2 with something else,
e.g. with Apache Calcite.
- Calcite is a framework, and it is designed to be very flexible and extensible.
- Every part of it can be replaced with our own implementation.
- The Apache License comes out of the box =)

So, in summary:
The 1st way is the pain we have now, and it slows down Ignite SQL layer development.
The 2nd looks a bit better, but in the long run it seems to lead Ignite nowhere.
The 3rd is risky, but promising.
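
The terse "HashJoins" item presumably refers to the lack of a hash join operator (an assumption; the message does not elaborate). For reference, a minimal in-memory hash join builds a hash table on one input and probes it with the other. `HashJoin` is a hypothetical class, rows are plain `Object[]`, and SQL semantics are approximated by never matching NULL keys. Note that the build side must fit in memory, which ties back to the OOM concern about large intermediate results.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class HashJoin {
    /**
     * Inner equi-join of two row lists.
     *
     * @param build    Input used to build the hash table (ideally the smaller one).
     * @param buildKey Index of the join column in build rows.
     * @param probe    Input streamed against the hash table.
     * @param probeKey Index of the join column in probe rows.
     * @return Joined rows: build columns followed by probe columns.
     */
    static List<Object[]> join(List<Object[]> build, int buildKey, List<Object[]> probe, int probeKey) {
        // Build phase: group build rows by join key. NULL keys never match in SQL, so skip them.
        Map<Object, List<Object[]>> table = new HashMap<>();
        for (Object[] row : build) {
            if (row[buildKey] != null)
                table.computeIfAbsent(row[buildKey], k -> new ArrayList<>()).add(row);
        }

        // Probe phase: one hash lookup per probe row instead of a nested-loop scan.
        List<Object[]> out = new ArrayList<>();
        for (Object[] row : probe) {
            List<Object[]> matches = row[probeKey] == null ? null : table.get(row[probeKey]);
            if (matches == null)
                continue;
            for (Object[] match : matches) {
                Object[] joined = new Object[match.length + row.length];
                System.arraycopy(match, 0, joined, 0, match.length);
                System.arraycopy(row, 0, joined, match.length, row.length);
                out.add(joined);
            }
        }
        return out;
    }
}
```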




-- 
Best regards,
Andrey V. Mashenkov

Re: New SQL execution engine

2019-09-27 Thread Roman Kondakov


Hi Igor!

In my opinion, using Apache Calcite for distributed SQL query
optimization and planning is a much more promising approach than using H2.
H2 is not suitable for distributed query execution, and it also has very
limited abilities for query optimization. Apache Calcite, on the other hand,
is an open source implementation of the Cascades/Volcano query optimization
framework [1,2] (other implementations: MS SQL Server, Greenplum). The
main advantage of this framework is its extensibility: we can change
the optimizer behavior simply by adding or removing optimization rules.
Calcite has a cost-based optimizer as well as a heuristic one, which
can be useful in some situations.
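
A toy sketch of the "add or remove rules" idea: a heuristic planner that repeatedly applies a list of rewrite rules to a relational plan until none fires. The plan classes (`Rel`, `Scan`, `Filter`, `Join`) and the single `PushFilterBelowJoin` rule are hypothetical and far simpler than Calcite's own planner machinery; the point is only that the optimizer's behavior is defined by its rule list.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

abstract class Rel {
    /** Tables scanned anywhere below this node. */
    static Set<String> tables(Rel rel) {
        Set<String> res = new HashSet<>();
        if (rel instanceof Scan)
            res.add(((Scan) rel).table);
        else if (rel instanceof Filter)
            res.addAll(tables(((Filter) rel).input));
        else if (rel instanceof Join) {
            res.addAll(tables(((Join) rel).left));
            res.addAll(tables(((Join) rel).right));
        }
        return res;
    }
}

final class Scan extends Rel {
    final String table;
    Scan(String table) { this.table = table; }
    @Override public String toString() { return "Scan(" + table + ")"; }
}

final class Filter extends Rel {
    final Rel input;
    final String table;     // the only table the predicate refers to (simplification)
    final String predicate;
    Filter(Rel input, String table, String predicate) { this.input = input; this.table = table; this.predicate = predicate; }
    @Override public String toString() { return "Filter[" + predicate + "](" + input + ")"; }
}

final class Join extends Rel {
    final Rel left;
    final Rel right;
    Join(Rel left, Rel right) { this.left = left; this.right = right; }
    @Override public String toString() { return "Join(" + left + ", " + right + ")"; }
}

/** A rewrite rule: returns a new plan, or null when it does not apply. */
interface Rule {
    Rel apply(Rel rel);
}

/** Pushes a single-table filter below a join, so fewer rows reach the join. */
final class PushFilterBelowJoin implements Rule {
    @Override public Rel apply(Rel rel) {
        if (!(rel instanceof Filter) || !(((Filter) rel).input instanceof Join))
            return null;
        Filter f = (Filter) rel;
        Join j = (Join) f.input;
        if (Rel.tables(j.left).contains(f.table))
            return new Join(new Filter(j.left, f.table, f.predicate), j.right);
        if (Rel.tables(j.right).contains(f.table))
            return new Join(j.left, new Filter(j.right, f.table, f.predicate));
        return null;
    }
}

final class HeuristicPlanner {
    private final List<Rule> rules;

    HeuristicPlanner(List<Rule> rules) { this.rules = rules; }

    /** Applies rules until a fixpoint is reached (no rule fires anywhere in the tree). */
    Rel optimize(Rel plan) {
        while (true) {
            Rel next = rewriteOnce(plan);
            if (next == plan)
                return plan;
            plan = next;
        }
    }

    /** Performs at most one rewrite, trying children before the node itself. */
    private Rel rewriteOnce(Rel rel) {
        if (rel instanceof Filter) {
            Filter f = (Filter) rel;
            Rel in = rewriteOnce(f.input);
            if (in != f.input)
                return new Filter(in, f.table, f.predicate);
        }
        else if (rel instanceof Join) {
            Join j = (Join) rel;
            Rel l = rewriteOnce(j.left);
            if (l != j.left)
                return new Join(l, j.right);
            Rel r = rewriteOnce(j.right);
            if (r != j.right)
                return new Join(j.left, r);
        }
        for (Rule rule : rules) {
            Rel res = rule.apply(rel);
            if (res != null)
                return res;
        }
        return rel;
    }
}
```

Removing `PushFilterBelowJoin` from the list leaves the plan untouched, without any change to the planner code.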


The main challenges I see here:

1. Implementing distributed query planning for Apache Calcite (it
was primarily developed for single-node query optimization). We can
reuse the Apache Drill [3] team's solution here.


2. We need to implement a new distributed query execution engine. Apache
Calcite is a query planning framework, not an execution one,
although it has some abilities for executing queries in the single-node
case.


3. Secondary indexes are not supported by Calcite, so we need to
overcome this problem somehow. AFAIK, the Apache Phoenix [4] team implemented
support for secondary indexes as sorted materialized views.


4. Apache Calcite is a cost-based optimizer, so we need to create our
own cost model and gather statistics to be able to choose the most
effective query execution plans.


5. What about deprecating our current query API, which has a number of
drawbacks, like using `List` shortcuts as a query result or multiple
redundant flags in `SqlFieldsQuery` (collocated, lazy, etc.) that are
useless for the new query execution engine?
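
Items 3 and 4 are related: once a sorted secondary index exists, the planner needs statistics and a cost model to decide when to use it. A minimal sketch, with a hypothetical `IndexChoice` class, hypothetical cost constants (`SEQ_ROW_COST`, `INDEX_ROW_COST`) and the textbook uniform-distribution estimate of rowCount / NDV matching rows for an equality predicate; it is not Calcite's cost model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class IndexChoice {
    /** Sorted secondary index: indexed value -> primary keys (a sorted "view" of the table). */
    static final class SecondaryIndex {
        private final TreeMap<Integer, List<Long>> byValue = new TreeMap<>();

        void put(int value, long primaryKey) {
            byValue.computeIfAbsent(value, v -> new ArrayList<>()).add(primaryKey);
        }

        /** Primary keys whose indexed value lies in [from, to], in index order. */
        List<Long> range(int from, int to) {
            List<Long> res = new ArrayList<>();
            for (List<Long> pks : byValue.subMap(from, true, to, true).values())
                res.addAll(pks);
            return res;
        }
    }

    /** Statistics for one column: table size and number of distinct values (NDV). */
    static final class ColumnStats {
        final long rowCount;
        final long distinct;

        ColumnStats(long rowCount, long distinct) {
            this.rowCount = rowCount;
            this.distinct = distinct;
        }
    }

    // Hypothetical cost units: an index-driven row fetch is assumed to cost 4x a sequential read.
    static final double SEQ_ROW_COST = 1.0;
    static final double INDEX_ROW_COST = 4.0;

    /** Picks an access path for "col = ?" by comparing estimated costs. */
    static String accessPath(ColumnStats stats, boolean hasIndex) {
        double fullScan = stats.rowCount * SEQ_ROW_COST;
        if (!hasIndex)
            return "FULL_SCAN";
        // Uniform-distribution assumption: an equality predicate matches rowCount / NDV rows.
        double matched = (double) stats.rowCount / Math.max(1, stats.distinct);
        // Tree descent (~log2 of the table size) plus one fetch per matching row.
        double indexScan = Math.log(Math.max(2, stats.rowCount)) / Math.log(2) + matched * INDEX_ROW_COST;
        return indexScan < fullScan ? "INDEX_SCAN" : "FULL_SCAN";
    }
}
```

With a highly selective column the index wins; with only two distinct values the estimated fetches outweigh a sequential scan, so the full scan is chosen even though an index exists.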


[1] https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/Cascades-graefe.pdf
[2] https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/Volcano-graefe.pdf
[3] https://drill.apache.org/
[4] https://phoenix.apache.org/
--
Kind Regards
Roman Kondakov


Re: New SQL execution engine

2019-09-27 Thread Nikolay Izhikov

Hello, Igor.

Thanks for starting this discussion.

I think we should take a step back and answer the following questions:

1. What are the exact issues with the H2 integration?
Can you send ticket links?
Can we label all H2 integration issues in JIRA? I propose to use the "h2" label.

2. What are the requirements for the new SQL engine?
We should write them down and discuss them.

3. What options do we have?
Are there any alternatives to Calcite on the market?
Once, we made a wrong choice that looked like the obvious one.
So we should carefully avoid repeating that this time.

4. What improvements to Ignite do we want to make with the new engine?





New SQL execution engine

2019-09-27 Thread Igor Seliverstov

Hi Igniters!

As you might know, we currently have many open issues related to the current
H2-based engine and its execution flow.

Some of them are critical (like the impossibility of executing particular queries),
some are major (like the impossibility of executing particular queries
without first arranging your data to be collocated), and many are minor.

Most of the issues cannot be solved without redesigning the whole engine.

So, here is the proposal:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084

I'd appreciate it if you shared your thoughts on it.

Regards,
Igor

Re: Improvements for new security approach.

2019-09-27 Thread Denis Garus

Hello, Maksim!

Thank you for your effort and interest in the security of Ignite.

I would like you to pay attention to the discussion [1] and issue [2].
It looks like not only tasks but all operations should execute in the current
security context; that is essential for determining the security ID for events.
Also, we need to get rid of GridTaskThreadContextKey#TC_SUBJ_ID, as it
duplicates the IgniteSecurity responsibility.
I think your task is the right place to do that.
What is your opinion?
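A minimal Java sketch of why work handed to another thread loses the security context (the VisorQueryUtils#scheduleQueryStart problem raised in this thread) and how capturing the context at submission time keeps every operation in the caller's context. It assumes the context is simply a thread-local subject ID; SecurityContextSketch, Scope, withContext and wrap are hypothetical names, not Ignite's IgniteSecurity API.

```java
import java.util.UUID;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of thread-local security context propagation (not Ignite's API). */
public class SecurityContextSketch {
    /** Subject ID bound to the current thread; null means "no context". */
    private static final ThreadLocal<UUID> CURRENT = new ThreadLocal<>();

    /** Scope that restores the previously bound context when closed. */
    static final class Scope implements AutoCloseable {
        private final UUID prev;

        Scope(UUID prev) { this.prev = prev; }

        @Override public void close() {
            if (prev == null) CURRENT.remove(); else CURRENT.set(prev);
        }
    }

    /** Binds subjId to the current thread until the returned scope is closed. */
    static Scope withContext(UUID subjId) {
        UUID prev = CURRENT.get();
        CURRENT.set(subjId);
        return new Scope(prev);
    }

    static UUID current() { return CURRENT.get(); }

    /** Captures the caller's context now and re-binds it in whatever thread runs the task. */
    static Runnable wrap(Runnable task) {
        UUID captured = CURRENT.get();
        return () -> {
            try (Scope ignored = withContext(captured)) {
                task.run();
            }
        };
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        UUID subj = UUID.randomUUID();
        UUID[] seen = new UUID[2];

        try (Scope ignored = withContext(subj)) {
            // Plain submit: the worker thread has its own ThreadLocal value, so the context is lost.
            pool.submit(() -> { seen[0] = current(); }).get();
            // Wrapped submit: the context captured on the caller thread is restored in the worker.
            pool.submit(wrap(() -> { seen[1] = current(); })).get();
        }

        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);

        System.out.println("plain submit sees context: " + (seen[0] != null));   // false
        System.out.println("wrapped submit sees context: " + subj.equals(seen[1])); // true
    }
}
```

Capturing at submission time, rather than relying on InheritableThreadLocal, matters for pooled threads: inherited values are copied only when a thread is created, and pool threads are created once and reused.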

>> That's why the subject ID isn't enough, and we should transmit the subject
>> inside the message in this case.
There is a problem with this approach.
The subject's size is unbounded, which can lead to a dramatic increase in
traffic between nodes.
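A minimal Java sketch of the trade-off being discussed: sending only the subject ID keeps messages small (a UUID is 16 bytes), but the receiver can resolve it only if it already knows the subject, which is not the case for a subject created via REST. It assumes the receiver resolves IDs from a local registry filled from discovery data; SubjectResolutionSketch, Subject, registerFromDiscovery and resolve are hypothetical names, not Ignite classes.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch (not Ignite's API): resolving a subject from an ID carried in a message. */
public class SubjectResolutionSketch {
    /** Stand-in for a security subject; a real one may carry an unbounded permission set. */
    static final class Subject {
        final UUID id;
        final String login;

        Subject(UUID id, String login) { this.id = id; this.login = login; }
    }

    /** Subjects this node has learned about, e.g. from node attributes received via discovery. */
    private final Map<UUID, Subject> known = new ConcurrentHashMap<>();

    void registerFromDiscovery(Subject s) { known.put(s.id, s); }

    /** Only the ID travels with the message; the receiver must look the subject up locally. */
    Optional<Subject> resolve(UUID subjId) { return Optional.ofNullable(known.get(subjId)); }

    public static void main(String[] args) {
        SubjectResolutionSketch receiver = new SubjectResolutionSketch();

        // A node subject is known cluster-wide, so the ID alone is enough.
        Subject nodeSubj = new Subject(UUID.randomUUID(), "server-node");
        receiver.registerFromDiscovery(nodeSubj);

        // A subject created locally for a REST client never went through discovery.
        Subject restSubj = new Subject(UUID.randomUUID(), "rest-client");

        System.out.println("node subject resolved: " + receiver.resolve(nodeSubj.id).isPresent());
        System.out.println("REST subject resolved: " + receiver.resolve(restSubj.id).isPresent());
    }
}
```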

1. http://apache-ignite-developers.2346864.n4.nabble.com/JavaDoc-for-Event-s-subjectId-methods-td43663.html
2. https://issues.apache.org/jira/browse/IGNITE-9914

On Fri, 27 Sep 2019 at 08:38, Anton Vinogradov wrote:

> Maksim
>
> >> I want to fix points 2, 3, and 4 under one ticket.
> Please let me know once it's ready for review.
>
> On Thu, Sep 26, 2019 at 5:18 PM Maksim Stepachev <maksim.stepac...@gmail.com> wrote:
>
> > Hi.
> >
> > Anton Vinogradov,
> >
> > I want to fix points 2, 3, and 4 under one ticket.
> >
> > The first one was fixed in the ticket:
> > https://issues.apache.org/jira/browse/IGNITE-11094
> > Also, I agree with you that 5 and 6 aren't required for Ignite.
> >
> > Denis Garus,
> > I made a reproducer for point 3. See the test from my pull request:
> > JettyRestPropagationSecurityContextTest
> >
> > https://github.com/apache/ignite/pull/6918
> >
> > For point 2, apply GridRestProcessor from the PR, set a breakpoint in
> > VisorQueryUtils#scheduleQueryStart inside the
> > ignite.context().closure().runLocalSafe call, and evaluate:
> > ignite.context().security().securityContext()
> >
> >
> > For point 3, do the above and evaluate:
> >
> > ignite.context().discovery().node(ignite.context().security().securityContext().subject().id())
> >
> > It returns null because this subject was created via REST. That's why the
> > subject ID isn't enough, and we should transmit the subject inside the
> > message in this case.
> >
> > On Thu, 18 Jul 2019 at 12:45, Anton Vinogradov wrote:
> >
> >> Maksim,
> >>
> >> Could you please split IGNITE-11992 into subtasks with proper descriptions?
> >> This will allow us to move the discussion to the issues and solve each
> >> problem properly.
> >>
> >> On Thu, Jul 18, 2019 at 11:57 AM Denis Garus wrote:
> >>
> >> > Hello, Maksim!
> >> > Thanks for your analysis!
> >> >
> >> > I have a few questions about your proposals.
> >> >
> >> > GridRestProcessor.
> >> > AFAIK, when GridRestProcessor handles a client request
> >> > (GridRestProcessor#handleRequest), it performs authentication
> >> > (GridRestProcessor#authenticate) and then authorization of the request
> >> > (GridRestProcessor#authorize) inside the client context.
> >> > Can you give additional info about the issues with GridRestProcessor
> >> > from 3 and 4? Maybe you have a reproducer for the problem?
> >> >
> >> > NoOpIgniteSecurityProcessor.
> >> > I think the case that you describe in 5 is not a bug.
> >> > All nodes (client and server) must have security either enabled or disabled.
> >> > I can't imagine a case where that doesn't hold.
> >> >
> >> > ATTR_SECURITY_SUBJECT.
> >> > I don't think compatibility is needed here. If you mix nodes that
> >> > propagate the security context to remote nodes with older nodes,
> >> > you can get subtle errors.
> >> >
> >> > On Thu, 18 Jul 2019 at 11:12, Maksim Stepachev <maksim.stepac...@gmail.com> wrote:
> >> >
> >> > > Hi, Ivan.
> >> > >
> >> > > Yes, I have.
> >> > > https://issues.apache.org/jira/browse/IGNITE-11992
> >> > >
> >> > > I'm waiting for a visa.
> >> > >
> >> > >
> >> > > On Thu, 18 Jul 2019 at 11:09, Ivan Rakov wrote:
> >> > >
> >> > >> Hello Max,
> >> > >>
> >> > >> Thanks for your analysis!
> >> > >>
> >> > >> Have you created a JIRA issue for the discovered defects?
> >> > >>
> >> > >> Best Regards,
> >> > >> Ivan Rakov
> >> > >>
> >> > >> On 17.07.2019 17:08, Maksim Stepachev wrote:
> >> > >> > Hello, Igniters.
> >> > >> >
> >> > >> > The main idea of the new security is to propagate the security context
> >> > >> > to other nodes and perform actions with the initial permissions. The
> >> > >> > solution looks fine but has imperfections.
> >> > >> >
> >> > >> > 1. ZookeeperDiscoveryImpl doesn't implement security itself.
> >> > >> >    As a result: Caused by: class org.apache.ignite.spi.IgniteSpiException:
> >> > >> >    Security context isn't certain.
> >> > >> > 2. The Visor tasks lose permissions.
> >> > >> >    The method VisorQueryUtils#scheduleQueryStart creates a new thread and
> >> > >> >    loses the context.
> >> > >> > 3. The GridRestProcessor executes tasks outside the "withContext" section.
> >> > >> >    As a result, the context is lost.
> >> > >> > 4. The GridRestProcessor isn't a client; we can't read the security subject
>
