Re: Sync vs async APIs in Ignite 3

2021-09-08 Thread Courtney Robinson
Hi Val,

I'd strongly support an async-first API based on CompletionStage
<https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/CompletionStage.html>
or its subtypes, such as CompletableFuture.
In Ignite 2 we've written a wrapper library around IgniteFuture to provide
CompletionStage instead, because many of the newer libraries we use support it.
If Ignite 3 went this way, it'd remove a lot of the boilerplate and wrappers
we wrote to get what you're suggesting here.
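A minimal sketch of the kind of adapter described above, assuming a callback-style future with a blocking get() and a completion listener. LegacyFuture and FutureAdapters are hypothetical names for illustration; this is not Ignite's actual IgniteFuture API:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.function.Consumer;

// Hypothetical callback-style future with the same general shape as Ignite 2's
// IgniteFuture (blocking get() plus a completion listener). NOT the real
// IgniteFuture interface; names are assumptions for illustration only.
interface LegacyFuture<V> {
    /** Blocks until the result is available; throws if the operation failed. */
    V get() throws Exception;

    /** Registers a callback that is invoked once the future has completed. */
    void listen(Consumer<LegacyFuture<V>> callback);
}

final class FutureAdapters {
    private FutureAdapters() {
    }

    /** Exposes a LegacyFuture as a standard CompletionStage. */
    static <V> CompletionStage<V> toStage(LegacyFuture<V> legacy) {
        CompletableFuture<V> stage = new CompletableFuture<>();
        legacy.listen(done -> {
            try {
                // The listener only fires after completion, so get() returns immediately.
                stage.complete(done.get());
            } catch (Exception e) {
                stage.completeExceptionally(e);
            }
        });
        return stage;
    }
}
```

If the public API returned CompletionStage directly, this whole adapter layer would simply disappear.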

Regards,
Courtney Robinson
Founder and CEO, Hypi
Tel: ++44 208 123 2413 (GMT+0)
https://hypi.io


On Wed, Sep 8, 2021 at 12:44 AM Valentin Kulichenko <
valentin.kuliche...@gmail.com> wrote:

> Igniters,
>
> I would like to gather some opinions on whether we want to focus on sync vs
> async APIs in Ignite 3.
>
> Here are some initial considerations that I have:
> 1. Ignite 2.x is essentially "sync first". Async APIs exist, but they use
> non-standard IgniteFuture and provide counterintuitive guarantees. In my
> experience, they significantly lack usability, and because of that are
> rarely used.
> 2. In general, however, async execution is becoming more and more prominent.
> This is something we can't ignore if we want to create a modern framework.
> 3. Still, async support in Java is very limited (especially if compared to
> other languages, like C# for example).
>
> My current position is the following (happy to discuss):
> 1. We should pay more attention to async APIs. As a general rule, the async
> API should be primary, with the sync version built on top.
> 2. In languages with proper async support (async-await, etc.), we can skip
> sync API altogether. As an example of this, you can look at the first
> version of the .NET client [1]. It exposes only async methods, and it
> doesn't look like sync counterparts are really needed.
> 3. In Java (as well as other languages where applicable), we will add sync
> APIs that simply delegate to async APIs. This will help users to avoid
> CompletableFuture if they don't want to use it.
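A rough Java sketch of point 3, assuming a hypothetical KeyValueView interface (not an actual Ignite 3 API): the async method is the primary contract, and the sync method simply delegates to it and unwraps the CompletionException that join() adds, so sync callers see the original exception.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical table view: async is primary, sync is a thin default wrapper.
interface KeyValueView<K, V> {
    CompletableFuture<V> getAsync(K key);

    default V get(K key) {
        try {
            return getAsync(key).join();
        } catch (CompletionException e) {
            // join() wraps failures; rethrow the original unchecked cause for sync callers.
            if (e.getCause() instanceof RuntimeException) {
                throw (RuntimeException) e.getCause();
            }
            throw e;
        }
    }
}

// Minimal in-memory implementation, used only for illustration.
final class InMemoryView<K, V> implements KeyValueView<K, V> {
    private final Map<K, V> data = new ConcurrentHashMap<>();

    InMemoryView<K, V> with(K key, V value) {
        data.put(key, value);
        return this;
    }

    @Override
    public CompletableFuture<V> getAsync(K key) {
        CompletableFuture<V> future = new CompletableFuture<>();
        if (key == null) {
            future.completeExceptionally(new IllegalArgumentException("key must not be null"));
        } else {
            future.complete(data.get(key));
        }
        return future;
    }
}
```

Implementations only write the async path once; every sync method comes for free from the default wrapper.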
>
> [1] https://github.com/apache/ignite-3/pull/306
>
> Please share your thoughts.
>
> -Val
>


Re: [DISCUSS] IEP-71 Public API for secondary index search

2021-08-26 Thread Courtney Robinson
I prefer option 1 from Taras' response: always specify the index name.
I've seen customers create both idx(A,B) and idx(B,A), where the semantics
differ between the two.
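To illustrate why the matching rule matters, here is a small sketch under stated assumptions: IndexDef and IndexSelector are hypothetical, and "strict order" is interpreted as the criteria fields forming a leading prefix of the index fields. With set-based matching, both idx(A,B) and idx(B,A) match criteria on {a, b}; with strict order or an explicit name, only one does.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical index metadata; field names are lower-case for simplicity.
final class IndexDef {
    final String name;
    final List<String> fields;

    IndexDef(String name, List<String> fields) {
        this.name = name;
        this.fields = List.copyOf(fields);
    }
}

final class IndexSelector {
    private IndexSelector() {
    }

    // Option 1: the caller names the index, so there is no ambiguity.
    static Optional<IndexDef> byName(List<IndexDef> indexes, String name) {
        return indexes.stream().filter(i -> i.name.equals(name)).findFirst();
    }

    // Strict order: criteria fields must be a leading prefix of the index fields.
    static List<IndexDef> byOrderedFields(List<IndexDef> indexes, List<String> criteriaFields) {
        List<IndexDef> matches = new ArrayList<>();
        for (IndexDef idx : indexes) {
            if (criteriaFields.size() <= idx.fields.size()
                    && idx.fields.subList(0, criteriaFields.size()).equals(criteriaFields)) {
                matches.add(idx);
            }
        }
        return matches;
    }

    // Set semantics: field order is ignored, so several indexes may match.
    static List<IndexDef> byFieldSet(List<IndexDef> indexes, Set<String> criteriaFields) {
        List<IndexDef> matches = new ArrayList<>();
        for (IndexDef idx : indexes) {
            int n = Math.min(criteriaFields.size(), idx.fields.size());
            if (new HashSet<>(idx.fields.subList(0, n)).equals(criteriaFields)) {
                matches.add(idx);
            }
        }
        return matches;
    }
}
```

With set matching the engine would then have to pick one of several candidates silently, which is exactly the case where the semantics can differ.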

Regards,
Courtney Robinson


On Thu, Aug 26, 2021 at 4:28 PM Taras Ledkov  wrote:

> Hi,
>
> My proposal:
> 1. Don't search for the index by criteria; always specify the index name
> (preferred).
>
> OR
>
> 2. Search for the index by criteria without checking the order of the
> criteria. Use a Set of criteria instead of an ordered collection.
> In the unusual case where both indexes (a, b) and (b, a) exist, use
> either index when the index name isn't specified.
>
> On 26.08.2021 16:49, Maksim Timonin wrote:
> > There are some thoughts about strict field order:
> > 1. Index (A, B) is not equivalent to index (B, A). Some queries may have
> > different performance on such indexes, and users have to specify the
> > right index. What if both indexes exist?
> > 2. We should avoid cases where a query uses only field B of index
> > (A, B). We have to force the user to specify a range for (A) too, or
> > explicitly set it to (null, null). Otherwise it looks like a mistake.
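One possible way to enforce point 2, sketched with hypothetical Range and CriteriaValidator types: a criterion on a later index field is rejected unless every preceding field also has a criterion, where an explicit open range (null, null) counts as one.

```java
import java.util.List;
import java.util.Map;

// Hypothetical range criterion; a null bound means "unbounded" on that side.
final class Range {
    static final Range ANY = new Range(null, null); // the explicit "(null, null)" case

    final Integer lower;
    final Integer upper;

    Range(Integer lower, Integer upper) {
        this.lower = lower;
        this.upper = upper;
    }
}

final class CriteriaValidator {
    private CriteriaValidator() {
    }

    // For index (a, b): a condition on b alone is treated as a mistake unless a
    // is also given, even if only as Range.ANY.
    static void validate(List<String> indexFields, Map<String, Range> criteria) {
        boolean gapSeen = false;
        for (String field : indexFields) {
            if (criteria.containsKey(field)) {
                if (gapSeen) {
                    throw new IllegalArgumentException(
                        "Criterion on '" + field + "' requires criteria on all preceding index fields");
                }
            } else {
                gapSeen = true;
            }
        }
        for (String field : criteria.keySet()) {
            if (!indexFields.contains(field)) {
                throw new IllegalArgumentException("Field '" + field + "' is not part of the index");
            }
        }
    }
}
```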
> >
> >
> >
> >
> > On Thu, Aug 26, 2021 at 4:39 PM Ivan Daschinsky 
> wrote:
> >
> >> 1. I suppose the next step is to implement the API for manually creating
> >> an index. I think a user creates an index to speed up their criteria-based
> >> queries, so they will use the same criteria to define the index. So no
> >> problem at all.
> >> 2. We should print a warning or throw an exception if no index matches
> >> the specified criteria.
> >>
> >> BTW, MongoDB doesn't make the user write the index name in a query. It
> >> just works.
> >>
> >> Thu, 26 Aug 2021, 15:52 Taras Ledkov:
> >>
> >>> Hi,
> >>>
> >>>> It is a usability nightmare to make the user write the index name in
> >>>> all cases.
> >>> I don't see any difference between specifying the index name and
> >>> specifying the index fields in the right order.
> >>> Do you?
> >>>
> >>> Suppose there is the index:
> >>> idx_A_B ON TBL (A, B)
> >>>
> >>> Is it OK that a query like the one below doesn't match the index 'idx_A_B'?
> >>> new IndexQuery<>(..)
> >>>   .setCriteria(lt("b", 1), lt("a", 2));
> >>>
> >>> On 26.08.2021 15:23, Ivan Daschinsky wrote:
> >>>> I am against making the user write the index name. Matching an index
> >>>> to field names is a quite simple and straightforward algorithm, so it
> >>>> is strange to compare it to an SQL engine optimizer.
> >>>>
> >>>> It is a usability nightmare to make the user write the index name in
> >>>> all cases.
> >>>> Thu, 26 Aug 2021, 14:42 Maksim Timonin:
> >>>>
> >>>>> Hi, Igniters!
> >>>>>
> >>>>> There is a discussion about how to specify an index to query with an
> >>>>> IndexQuery [1]. Currently my PR provides 2 ways to specify an index:
> >>>>> 1. With a table and index name;
> >>>>> 2. With a table and a list of index fields (without an index name). In
> >>>>> this case IndexQueryProcessor tries to find an index that matches the
> >>>>> table and index fields in strict order (the order of fields in the
> >>>>> criteria has to match the order of fields in the index).
> >>>>>
> >>>>> The question is whether the second approach is valid.
> >>>>>
> >>>>> Pros:
> >>>>> 1. Currently the index name is an optional field for QueryIndex and
> >>>>> QuerySqlField, so users can create an index with a table and a list of
> >>>>> fields. We should therefore provide a way to define an index for
> >>>>> querying the same way as we do for creating it.
> >>>>> 2. It's required to know the index name to query it (in case the index
> >>>>> was created without an explicit name). Users can find it and then use
> >>>>> it as a constant in code, but I see some troubles there:
> >>>>> 2.1. Get index name by querying the system view INDEXES. Note that

Re: Ignite 3 async continuation executor

2021-08-19 Thread Courtney Robinson
Pavel, I would really welcome this. When we first started with Ignite, we
were constantly getting Ignite threads blocked because we'd perform other
work on them.

I'm not sure about the configuration part, however, because I'd argue this
isn't a static thing.
Is Ignite 3 still using its own types, or is it switching to
CompletableFuture
<https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/CompletableFuture.html>?
The key APIs in CompletableFuture (acceptEitherAsync, applyToEitherAsync,
handleAsync, thenAcceptAsync, thenComposeAsync, whenCompleteAsync) all
already have overloads that accept an Executor argument, so returning
CompletableFuture solves the problem; it'd just need documentation.

If Ignite 3 still uses its own types, then I'd suggest that what's needed is
an argument to accept a custom Executor.
We now have dedicated pools configured with a custom UncaughtExceptionHandler
and ThreadFactory
<https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ThreadFactory.html>,
with various metrics and customisations around them. If the only option is
the default ForkJoinPool#commonPool, we'd lose this when we eventually move
to 3.
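For reference, a minimal sketch of what this looks like with plain CompletableFuture, assuming a pool built with a custom ThreadFactory and UncaughtExceptionHandler. AppPools and its methods are hypothetical names; only the JDK calls are real:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

final class AppPools {
    private AppPools() {
    }

    // Fixed-size pool whose threads get a recognisable name prefix and a custom
    // UncaughtExceptionHandler (a stand-in for real logging/metrics hooks).
    // Note: exceptions thrown inside CompletableFuture stages are captured by the
    // future itself, so they never reach this handler.
    static ExecutorService namedPool(String prefix, int size) {
        AtomicInteger counter = new AtomicInteger();
        ThreadFactory factory = runnable -> {
            Thread thread = new Thread(runnable, prefix + "-" + counter.incrementAndGet());
            thread.setDaemon(true);
            thread.setUncaughtExceptionHandler((t, error) ->
                System.err.println("Uncaught exception in " + t.getName() + ": " + error));
            return thread;
        };
        return Executors.newFixedThreadPool(size, factory);
    }

    // The Async overload with an Executor argument runs the continuation on that
    // executor instead of on whichever thread completed the source future.
    static CompletableFuture<String> continueOn(CompletableFuture<Integer> source, Executor executor) {
        return source.thenApplyAsync(value -> Thread.currentThread().getName() + ":" + value, executor);
    }
}
```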

Regards,
Courtney Robinson


On Thu, Aug 19, 2021 at 5:08 PM Alexander Polovtcev 
wrote:

> Pavel, thanks for the response. Do I understand correctly that it is not
> expected that a user may want to specify their own custom executor?
>
> On Thu, Aug 19, 2021 at 6:55 PM Pavel Tupitsyn 
> wrote:
>
> > Hi Alexander,
> >
> > To be honest, I'm not sure yet - just getting to know this new
> > configuration mechanism and format.
> >
> > Since we can't use a property of type Executor, we'll have to provide
> > predefined values.
> > It can either be "bool executeAsyncContinuationsDirectly": false (default)
> > => commonPool, true => Runnable::run,
> > or "String asyncContinuationExecutor", which allows two values: "direct"
> > and "commonPool".
> >
> > I'm leaning towards the latter:
> >
> > {
> >   "node": {
> >     "metastorageNodes": [ "node-0" ],
> >     "asyncContinuationExecutor": "commonPool"
> >   },
> >   "network": { ... }
> > }
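A sketch of how the proposed string setting could be resolved to an Executor, assuming only the two values discussed here; ContinuationExecutors is a hypothetical name, not Ignite code:

```java
import java.util.concurrent.Executor;
import java.util.concurrent.ForkJoinPool;

// Hypothetical resolver for the proposed "asyncContinuationExecutor" setting;
// only the two values named in this thread are accepted.
final class ContinuationExecutors {
    private ContinuationExecutors() {
    }

    static Executor resolve(String configValue) {
        switch (configValue) {
            case "commonPool":
                // Safe default: user continuations leave Ignite's system threads.
                return ForkJoinPool.commonPool();
            case "direct":
                // Faster but unsafe: continuations run on the completing (system) thread.
                return Runnable::run;
            default:
                throw new IllegalArgumentException("Unsupported asyncContinuationExecutor: " + configValue);
        }
    }
}
```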
> >
> >
> >
> > On Thu, Aug 19, 2021 at 6:29 PM Alexander Polovtcev <
> > alexpolovt...@gmail.com>
> > wrote:
> >
> > > Hi, Pavel!
> > >
> > > Can you please provide an example (e.g. a HOCON snippet) of what this
> > > configuration is going to look like in Ignite 3? Or how is this property
> > > going to be set?
> > >
> > >
> > > On Thu, Aug 19, 2021 at 6:00 PM Pavel Tupitsyn 
> > > wrote:
> > >
> > > > Igniters,
> > > >
> > > > I propose to add a configurable async continuation executor for public
> > > > APIs to Ignite 3, like we have in Ignite 2.x [1].
> > > >
> > > > In short, async APIs currently return a future to the user code.
> > > > Continuations like "myCode" in "table.getAsync().thenApply(myCode)" will
> > > > be executed by the same thread that completes the future, which will be
> > > > a Netty thread or some other Ignite thread.
> > > >
> > > > This is dangerous because user code can be blocking or long-running,
> > > > and system threads become unavailable.
> > > >
> > > > Proposal:
> > > > 1. Add an asyncContinuationExecutor configuration property, defaulting
> > > > to ForkJoinPool#commonPool, both for server and thin client.
> > > > 2. Use this executor to complete all public API futures.
> > > >
> > > > This means safe default behavior, and a possibility to enable unsafe but
> > > > faster behavior with the Runnable::run executor.
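A minimal sketch of item 2 of the proposal, assuming public API methods wrap each internal future before returning it. PublicFutures.wrap is a hypothetical helper, not Ignite code. whenCompleteAsync is used because it runs its callback on the given executor for both normal and exceptional completion:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

final class PublicFutures {
    private PublicFutures() {
    }

    // Re-completes the internal future on the continuation executor, so callbacks
    // that user code attaches to the returned future run there rather than on the
    // (Netty or other system) thread that completed the internal future.
    static <T> CompletableFuture<T> wrap(CompletableFuture<T> internal, Executor continuationExecutor) {
        CompletableFuture<T> result = new CompletableFuture<>();
        internal.whenCompleteAsync((value, error) -> {
            if (error != null) {
                result.completeExceptionally(error);
            } else {
                result.complete(value);
            }
        }, continuationExecutor);
        return result;
    }
}
```

One caveat: if user code attaches a non-async continuation after the returned future has already completed, standard CompletableFuture semantics run it on the attaching thread, which is normally the user's own thread anyway.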
> > > >
> > > > Thoughts?
> > > >
> > > >
> > > > [1]
> > > > https://cwiki.apache.org/confluence/display/IGNITE/IEP-70%3A+Async+Continuation+Executor
> > > >
> > >
> > >
> > > --
> > > With regards,
> > > Aleksandr Polovtcev
> > >
> >
>
>
> --
> With regards,
> Aleksandr Polovtcev
>


Re: Re[2]: Google Guava in Ignite 3

2021-08-10 Thread Courtney Robinson
I think that since Calcite already brings it in, your arguments make sense.
Would it be pinned to the same version as Calcite? If not, we risk a
NoSuchMethodError at runtime.
+1

On Mon, Aug 9, 2021 at 9:56 AM Alexander Polovtcev 
wrote:

> Zhenya, Courtney, Andrey,
>
> What do you think about my arguments? Was I able to convince you? I would
> like to reach some consensus here. At the moment, my original points still
> stand. I'm also OK with shading Guava if needed, though I don't think it is
> necessary at this point.
>
> On Fri, Aug 6, 2021 at 12:45 PM Alexander Polovtcev <
> alexpolovt...@gmail.com>
> wrote:
>
> > Zhenya,
> >
> > > But nothing prevents running Ignite server nodes from other code with
> > > its own Guava version, so it seems we're on a fast path to jar hell here?
> >
> > I'm not sure if I fully understand your question, but it looks like we
> > are in this situation already, because we have some dependencies that use
> > Guava. That's why I propose to add Guava explicitly, to at least have a
> > deterministic runtime configuration (see this link
> > <https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#Dependency_Management>
> > for an explanation).
> >
> > On Fri, Aug 6, 2021 at 12:25 PM Zhenya Stanilovsky
> >  wrote:
> >
> >>
> >> Alexander, first of all, it looks like Ivan Daschinsky's approach (thin
> >> client use only plus the shadow plugin) covers all the problems Andrey
> >> Mashenkov listed.
> >> But nothing prevents running Ignite server nodes from other code with
> >> its own Guava version, so it seems we're on a fast path to jar hell here?
> >>
> >>
> >> >Zhenya,
> >> >
> >> >My intentions are the following:
> >> >
> >> >1. Remove some copy-pasted code (like the "bytecode" module or some
> >> >utility methods). Please see my original message for the links to the
> >> >code.
> >> >2. Explicitly pin the Guava version to avoid conflicts at runtime.
> >> >
> >> >About allowing the use of Guava in the codebase, my thoughts are the
> >> >following:
> >> >
> >> >1. We *already* use some code from Guava either directly (like in the
> >> >"calcite" module) or by copy-pasting it into a utility class.
> >> >2. I understand that some Guava methods are obsolete as of Java 11, but
> >> >some of them still don't have any standard library counterparts, in which
> >> >case I think using Guava is justified (which is supported by point 1).
> >> >
> >> >Can you please explain why you would disapprove of my proposal?
> >> >
> >> >On Thu, Aug 5, 2021 at 7:56 PM Zhenya Stanilovsky
> >> >< arzamas...@mail.ru.invalid > wrote:
> >> >
> >> >>
> >> >> alexpolovtcev, please clarify what you mean by «possibility of using
> >> >> Guava in Ignite 3»: as a necessary dependency of Calcite, or «using it
> >> >> in our code»? If it's in our code, I'm -1 here.
> >> >> Thanks.
> >> >>
> >> >>
> >> >> >Hello, dear Igniters!
> >> >> >
> >> >> >I would like to discuss the possibility of using Guava
> >> >> ><https://github.com/google/guava> in Ignite 3. I know about the
> >> >> >restrictive policy of using it in Ignite 2, but I have the following
> >> >> >reasons:
> >> >> >
> >> >> >1. We are de-facto using it already as an implicit dependency, since
> >> >> >the Calcite module depends on it, and Calcite is going to stay for a
> >> >> >while =)
> >> >> >2. AFAIK, the "bytecode" module is copied into the codebase only to
> >> >> >strip Guava away from it. We can remove this module, which will
> >> >> >improve the maintainability of the project.
> >> >> >3. We have some copy-paste of Guava code in the project. For example,
> >> >> >see this
> >> >> ><https://github.com/apache/ignite-3/blob/main/modules/core/src/main/java/org/apache/ignite/internal/util/IgniteUtils.java#L136>
> >> >> >and this
> >> >> ><https://github.com/apache/ignite-3/blob/main/modules/core/src/main/java/org/apache/ignite/internal/util/IgniteUtils.java#L428>.
> >> >> >4. Regarding security concerns, this report
> >> >> ><https://www.cvedetails.com/product/52274/Google-Guava.html?vendor_id=1224>
> >> >> >shows no major vulnerability issues for the last three years.
> >> >> >
> >> >> >Taking these points into account, I propose to allow using Guava both
> >> >> >in production and test code and to add it as an explicit dependency.
> >> >> >
> >> >> >What do you think?
> >> >> >
> >> >> >--
> >> >> >With regards,
> >> >> >Aleksandr Polovtcev
> >> >>
> >> >>
> >> >>
> >> >>
> >> >
> >> >
> >> >--
> >> >With regards,
> >> >Aleksandr Polovtcev
> >>
> >>
> >>
> >>
> >
> >
> >
> > --
> > With regards,
> > Aleksandr Polovtcev
> >
>
>
> --
> With regards,
> Aleksandr Polovtcev
>


Re: Google Guava in Ignite 3

2021-08-05 Thread Courtney Robinson
Also, what impact will this have on peer class loading? I think shading
resolves that too.


On Thu, Aug 5, 2021 at 7:05 PM Courtney Robinson 
wrote:

> Can I suggest shading Guava?
> Guava and Netty are two notorious libraries for version conflicts because
> of their popularity and usefulness.
> Other projects (Elasticsearch, for example) solved it by shading them:
> https://github.com/elastic/elasticsearch/issues/2091#issuecomment-7156766
>
> We use Ignite entirely as a thick client and already have Guava version
> conflicts from other projects (Calcite being one, because we use it
> directly already), so Ignite bringing its own will only make this worse
> when we get to V3.
>
> Even Calcite itself already has Guava conflicts because of the Cassandra
> adapter. I'd +1 this, but only if it will be shaded.
>
> Regards,
> Courtney Robinson
>
>
> On Thu, Aug 5, 2021 at 5:56 PM Zhenya Stanilovsky
>  wrote:
>
>>
>> alexpolovtcev, please clarify what you mean by «possibility of using
>> Guava in Ignite 3»: as a necessary dependency of Calcite, or «using it in
>> our code»? If it's in our code, I'm -1 here.
>> Thanks.
>>
>>
>> >Hello, dear Igniters!
>> >
>> >I would like to discuss the possibility of using Guava
>> ><https://github.com/google/guava> in Ignite 3. I know about the
>> >restrictive policy of using it in Ignite 2, but I have the following
>> >reasons:
>> >
>> >1. We are de-facto using it already as an implicit dependency, since the
>> >Calcite module depends on it, and Calcite is going to stay for a while =)
>> >2. AFAIK, the "bytecode" module is copied into the codebase only to strip
>> >Guava away from it. We can remove this module, which will improve the
>> >maintainability of the project.
>> >3. We have some copy-paste of Guava code in the project. For example, see
>> >this <https://github.com/apache/ignite-3/blob/main/modules/core/src/main/java/org/apache/ignite/internal/util/IgniteUtils.java#L136>
>> >and this <https://github.com/apache/ignite-3/blob/main/modules/core/src/main/java/org/apache/ignite/internal/util/IgniteUtils.java#L428>.
>> >4. Regarding security concerns, this report
>> ><https://www.cvedetails.com/product/52274/Google-Guava.html?vendor_id=1224>
>> >shows no major vulnerability issues for the last three years.
>> >
>> >Taking these points into account, I propose to allow using Guava both in
>> >production and test code and to add it as an explicit dependency.
>> >
>> >What do you think?
>> >
>> >--
>> >With regards,
>> >Aleksandr Polovtcev
>>
>>
>>
>>
>
>


Re: Google Guava in Ignite 3

2021-08-05 Thread Courtney Robinson
Can I suggest shading Guava?
Guava and Netty are two notorious libraries for version conflicts because
of their popularity and usefulness.
Other projects (Elasticsearch, for example) solved it by shading them:
https://github.com/elastic/elasticsearch/issues/2091#issuecomment-7156766

We use Ignite entirely as a thick client and already have Guava version
conflicts from other projects (Calcite being one, because we use it directly
already), so Ignite bringing its own will only make this worse when we get
to V3.

Even Calcite itself already has Guava conflicts because of the Cassandra
adapter. I'd +1 this, but only if it will be shaded.

Regards,
Courtney Robinson


On Thu, Aug 5, 2021 at 5:56 PM Zhenya Stanilovsky
 wrote:

>
> alexpolovtcev, please clarify what you mean by «possibility of using
> Guava in Ignite 3»: as a necessary dependency of Calcite, or «using it in
> our code»? If it's in our code, I'm -1 here.
> Thanks.
>
>
> >Hello, dear Igniters!
> >
> >I would like to discuss the possibility of using Guava
> ><https://github.com/google/guava> in Ignite 3. I know about the
> >restrictive policy of using it in Ignite 2, but I have the following
> >reasons:
> >
> >1. We are de-facto using it already as an implicit dependency, since the
> >Calcite module depends on it, and Calcite is going to stay for a while =)
> >2. AFAIK, the "bytecode" module is copied into the codebase only to strip
> >Guava away from it. We can remove this module, which will improve the
> >maintainability of the project.
> >3. We have some copy-paste of Guava code in the project. For example, see
> >this <https://github.com/apache/ignite-3/blob/main/modules/core/src/main/java/org/apache/ignite/internal/util/IgniteUtils.java#L136>
> >and this <https://github.com/apache/ignite-3/blob/main/modules/core/src/main/java/org/apache/ignite/internal/util/IgniteUtils.java#L428>.
> >4. Regarding security concerns, this report
> ><https://www.cvedetails.com/product/52274/Google-Guava.html?vendor_id=1224>
> >shows no major vulnerability issues for the last three years.
> >
> >Taking these points into account, I propose to allow using Guava both in
> >production and test code and to add it as an explicit dependency.
> >
> >What do you think?
> >
> >--
> >With regards,
> >Aleksandr Polovtcev
>
>
>
>


Re: Apache Ignite 3 Alpha 2 webinar follow up questions

2021-07-31 Thread Courtney Robinson
Hi Ivan,
Atri's description of a cached query plan is what I had in mind with my
description.

I don't know enough about how the statistics are maintained to comment
constructively, Atri, but my first question about the problem you raise with
statistics would be:

How and where are the statistics maintained? And if a query plan is cached
based on some statistics, isn't it possible to invalidate the cached plan
periodically or when the statistics change?
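One possible shape of that invalidation, sketched under assumptions: PlanCache is hypothetical, the planner is any expensive function from SQL text to a plan, and a statistics version counter is bumped whenever statistics are refreshed. Cached plans are re-planned lazily on first use after the version changes:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

final class PlanCache<P> {
    // A cached plan together with the statistics version it was built against.
    private static final class Entry<T> {
        final T plan;
        final long statsVersion;

        Entry(T plan, long statsVersion) {
            this.plan = plan;
            this.statsVersion = statsVersion;
        }
    }

    private final Map<String, Entry<P>> entries = new ConcurrentHashMap<>();
    private final Function<String, P> planner;   // expensive optimisation step (assumed)
    private final LongSupplier statsVersion;     // bumped when statistics change (assumed)

    PlanCache(Function<String, P> planner, LongSupplier statsVersion) {
        this.planner = planner;
        this.statsVersion = statsVersion;
    }

    // Reuses the cached plan while statistics are unchanged; otherwise re-plans
    // and replaces the stale entry.
    P planFor(String sql) {
        long current = statsVersion.getAsLong();
        return entries.compute(sql, (key, cached) ->
            cached != null && cached.statsVersion == current
                ? cached
                : new Entry<P>(planner.apply(key), current)).plan;
    }
}
```

A real engine would more likely track versions per table or per column, so that a statistics refresh on one table does not invalidate every cached plan.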

Regards,
Courtney Robinson


On Sat, Jul 31, 2021 at 8:54 AM Atri Sharma  wrote:

> Query caching works on three levels - caching results, caching blocks and
> caching query plans.
>
> Prepared queries work by caching a plan for a query and reusing that plan
> by changing the parameters for the incoming query. So the query remains the
> same, but input values keep changing.
>
> The problem with prepared queries is that query execution can go bad very
> fast if the underlying data distribution changes and the cached plan is no
> longer optimal for the given statistics.
>
> On Sat, 31 Jul 2021, 12:54 Ivan Pavlukhin,  wrote:
>
> > Hi Courtney,
> >
> > Please clarify what do you mean by prepared queries and query caching?
> > Do you mean caching query results? If so, in my mind material views
> > are the best approach here (Ignite 2 does not support them). Do you
> > have other good approaches in your mind? E.g. implemented in other
> > databases.
> >
> > 2021-07-26 21:27 GMT+03:00, Valentin Kulichenko <
> > valentin.kuliche...@gmail.com>:
> > > Hi Courtney,
> > >
> > > Generally speaking, query caching certainly makes sense. As far as I
> > > know, Ignite 2.x actually does that, but there is most likely room for
> > > improvement as well. We will look into this.
> > >
> > > As for the SQL API, the answer is yes. The requirement for a dummy
> > > cache is an artifact of the current architecture. This is 100% wrong
> > > and will be changed in 3.0.
> > >
> > > -Val
> > >
> > > On Sun, Jul 25, 2021 at 2:51 PM Courtney Robinson
> > > 
> > > wrote:
> > >
> > >> Something else came to mind: are there plans to support prepared
> > >> queries?
> > >>
> > >> I recall someone saying before that Ignite does cache queries
> > >> internally, but it's not at all clear whether or how it does that. I
> > >> assume a simple hash of the query isn't enough.
> > >>
> > >> We generate SQL queries based on user runtime settings, and they can
> > >> get to hundreds of lines long. I imagine this means most of our queries
> > >> are not being cached, but there are patterns, so we could generate and
> > >> manage prepared queries ourselves.
> > >>
> > >> Also, will there be a dedicated API for running SQL queries, rather
> > >> than having to pass a SqlFieldsQuery to a cache that has nothing to do
> > >> with the cache being queried? When I first started with Ignite years
> > >> ago, this was beyond confusing for me. I'm trying to run select x from
> > >> B, but I pass this to a cache called DUMMY or whatever arbitrary name...
> > >>
> > >> On Fri, Jul 23, 2021 at 4:05 PM Courtney Robinson <
> > >> courtney.robin...@hypi.io>
> > >> wrote:
> > >>
> > >> > Andrey,
> > >> > Thanks for the response - see my comments inline.
> > >> >
> > >> >
> > >> >> I've gone through the questions and don't have the whole picture of
> > >> >> your use case.
> > >> >> Would you please clarify how exactly you use Ignite? What are the
> > >> >> integration points?
> > >> >>
> > >> >
> > >> > I'll try to clarify: we have a low/no-code platform. A user designs a
> > >> > model for their application, and we map this model to Ignite tables
> > >> > and other data sources. The model I'll describe is what we're building
> > >> > now and is expected to be in alpha some time in Q4 21. Our current
> > >> > production architecture is different and isn't as generic; it is
> > >> > heavily tied to Ignite, and we've redesigned to get some fle

Re: Text Queries Support

2021-07-26 Thread Courtney Robinson
+1, we're all saying the same thing here.

In my earlier example, select x from T0 where term(args to solr term query)
AND ..., term(xxx) was meant to indicate a Lucene term query, so there'd be
a list of Lucene functions exposed in a similar way.
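To make the idea of a term(...) function concrete, here is a toy inverted index sketching what such a function could evaluate to: an exact lookup of an indexed term, similar in spirit to a Lucene term query. TermIndex is hypothetical, and its tokenizer (lower-casing and splitting on non-alphanumerics) is a simplifying assumption, not Lucene's analysis:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class TermIndex {
    // field -> term -> ids of the rows whose field contains that term
    private final Map<String, Map<String, Set<Long>>> postings = new HashMap<>();

    void add(long id, String field, String text) {
        Map<String, Set<Long>> terms = postings.computeIfAbsent(field, f -> new HashMap<>());
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.computeIfAbsent(token, t -> new TreeSet<>()).add(id);
            }
        }
    }

    // Exact-term match: the query value is NOT tokenized or lower-cased,
    // mirroring how a term query looks up an already-indexed term.
    Set<Long> term(String field, String value) {
        return Collections.unmodifiableSet(
            postings.getOrDefault(field, Map.of()).getOrDefault(value, Set.of()));
    }
}
```

The row IDs a function like this returns could then be combined with the rest of the SQL predicate by the query engine.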


On Mon, Jul 26, 2021 at 5:45 PM Atri Sharma  wrote:

> +1
>
> Let's expose custom functions in Ignite SQL that allow us to use the full
> capabilities that Lucene offers.
>
> On Mon, 26 Jul 2021, 21:51 Andrey Mashenkov, 
> wrote:
>
> > Val,
> >
> > > I believe this is something we can look into in the scope of Ignite 3.
> > > Andrey, does Calcite have any support for this? What's your view on
> > > this?
> >
> > As Atri already mentioned, the SQL-92 standard declares the "LIKE"
> > operator for pattern matching.
> > Calcite supports the LIKE operator.
> >
> > I've found it is a RexNode (expression), and I doubt it supports indices.
> > Maybe LIKE can use a sorted index for prefix matching or equality
> > conditions, but that is very far from what we are talking about.
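The prefix case mentioned above can be sketched with a sorted map standing in for a sorted index: every key matching LIKE 'ab%' falls in the half-open range ['ab', 'ac'). PrefixScan is hypothetical, and the sketch assumes plain UTF-16 code-unit ordering (no collation) and a prefix without % or _ wildcards:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

final class PrefixScan {
    private PrefixScan() {
    }

    // Equivalent of "key LIKE 'prefix%'" over a sorted index: all matching keys
    // lie in the half-open key range [prefix, successor(prefix)).
    static <V> List<V> likePrefix(NavigableMap<String, V> sortedIndex, String prefix) {
        String upper = successor(prefix);
        NavigableMap<String, V> range = (upper == null)
            ? sortedIndex.tailMap(prefix, true)
            : sortedIndex.subMap(prefix, true, upper, false);
        return new ArrayList<>(range.values());
    }

    // Exclusive upper bound for all strings starting with the prefix: drop trailing
    // Character.MAX_VALUE chars, then increment the last remaining char. Returns
    // null when no finite bound exists (empty prefix or only MAX_VALUE chars).
    static String successor(String prefix) {
        int end = prefix.length();
        while (end > 0 && prefix.charAt(end - 1) == Character.MAX_VALUE) {
            end--;
        }
        if (end == 0) {
            return null;
        }
        return prefix.substring(0, end - 1) + (char) (prefix.charAt(end - 1) + 1);
    }
}
```

As the message says, this only covers a narrow slice of text search; anything beyond prefix or equality needs a real full-text index.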
> >
> > Full-text search is much broader than just pattern matching.
> > Lucene provides many more capabilities and has a rich syntax, unlike the
> > "LIKE" operator.
> > The LIKE operator is a standard operator with a defined contract, so I'm
> > not sure it is worth integrating Lucene just for it.
> > I think we should have native support for full-text search queries and/or
> > a custom SQL function.
> >
> > E.g. the Postgres syntax for FTS queries [1] is completely different from
> > the "LIKE" operator.
> >
> > [1]
> > https://www.postgresql.org/docs/9.5/textsearch-intro.html#TEXTSEARCH-MATCHING
> >
> > On Sat, Jul 24, 2021 at 4:49 PM Courtney Robinson <
> > courtney.robin...@hypi.io>
> > wrote:
> >
> > > Hey Atri,
> > > Yes, I wasn't suggesting that Solr should be used. That's just what
> > > we're doing now out of necessity.
> > > It was more the fact that Calcite's SqlOperator can be used to provide
> > > the interface to Lucene.
> > > For all the reasons you mentioned and more, using Lucene is the right
> > > choice.
> > >
> > > Calcite doesn't have support for Solr, but it has an ES adapter, which is
> > > what we modified to support Solr.
> > >
> > > Regards,
> > > Courtney Robinson
> > >
> > >
> > > On Sat, Jul 24, 2021 at 1:59 PM Atri Sharma  wrote:
> > >
> > > > What that entails is that the end user has to keep a Solr cluster
> > > > running, which comes with its own challenges (now you have to manage
> > > > two systems instead of one).
> > > >
> > > > I believe Calcite has native support for Solr?
> > > >
> > > > OTOH, having native Lucene indices allows us to control per-partition
> > > > indices with no distributed overhead, since Lucene is a per-node
> > > > instance with no global coordination.
> > > >
> > > > On Sat, 24 Jul 2021, 16:57 Courtney Robinson, <
> > courtney.robin...@hypi.io
> > > >
> > > > wrote:
> > > >
> > > > > I'll add in here.
> > > > > I agree with you, Valentin: the decoupled state of text queries makes
> > > > > it useless for most of our use cases.
> > > > >
> > > > > As it relates to Calcite and Ignite 3, one approach (the one we're
> > > > > taking, because we use Calcite independently of Ignite) is to provide
> > > > > a bunch of SQL functions that we implement as SqlOperator
> > > > > <https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/SqlOperator.html>.
> > > > > I forget how we've done aggregation functions, but we have those too,
> > > > > and they map to Solr aggregations (which ultimately end up in Lucene).
> > > > >
> > > > > This allows Solr filters to take part in the rest of the query. It's
> > > > > probably more complex than this for Ignite, but that's one possible
> > > > > route, but we generate queries like select x

Re: Apache Ignite 3 Alpha 2 webinar follow up questions

2021-07-25 Thread Courtney Robinson
Something else came to mind: are there plans to support prepared queries?

I recall someone saying before that Ignite does cache queries internally,
but it's not at all clear whether or how it does that. I assume a simple
hash of the query isn't enough.

We generate SQL queries based on user runtime settings, and they can get to
hundreds of lines long. I imagine this means most of our queries are not
being cached, but there are patterns, so we could generate and manage
prepared queries ourselves.
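A minimal sketch of generating such queries so that the SQL text stays stable, assuming user settings only pick columns and operators. QueryTemplate is a hypothetical helper: values always become ? placeholders, so differently parameterised queries of the same shape share one SQL string that can be prepared or cached once:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

final class QueryTemplate {
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");
    private static final List<String> OPERATORS = List.of("=", "<>", "<", "<=", ">", ">=");

    private final String table;
    private final StringBuilder where = new StringBuilder();
    private final List<Object> params = new ArrayList<>();

    QueryTemplate(String table) {
        this.table = checkIdentifier(table);
    }

    // Adds "column op ?" to the WHERE clause; the value goes to the parameter
    // list, never into the SQL text.
    QueryTemplate where(String column, String op, Object value) {
        if (!OPERATORS.contains(op)) {
            throw new IllegalArgumentException("Unsupported operator: " + op);
        }
        where.append(where.length() == 0 ? " WHERE " : " AND ")
            .append(checkIdentifier(column)).append(' ').append(op).append(" ?");
        params.add(value);
        return this;
    }

    String sql() {
        return "SELECT * FROM " + table + where;
    }

    List<Object> params() {
        return Collections.unmodifiableList(params);
    }

    // Identifiers cannot be bound as parameters, so they are validated instead.
    private static String checkIdentifier(String name) {
        if (!IDENTIFIER.matcher(name).matches()) {
            throw new IllegalArgumentException("Invalid identifier: " + name);
        }
        return name;
    }
}
```

The resulting sql() and params() could then be handed to a standard JDBC PreparedStatement (prepareStatement(sql), then setObject(i + 1, param) for each parameter).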

Also, will there be a dedicated API for running SQL queries, rather than
having to pass a SqlFieldsQuery to a cache that has nothing to do with the
cache being queried? When I first started with Ignite years ago, this was
beyond confusing for me. I'm trying to run select x from B, but I pass this
to a cache called DUMMY or whatever arbitrary name...

On Fri, Jul 23, 2021 at 4:05 PM Courtney Robinson 
wrote:

> Andrey,
> Thanks for the response - see my comments inline.
>
>
>> I've gone through the questions and don't have the whole picture of your
>> use case.
>> Would you please clarify how exactly you use Ignite? What are the
>> integration points?
>>
>
> I'll try to clarify: we have a low/no-code platform. A user designs a
> model for their application, and we map this model to Ignite tables and
> other data sources. The model I'll describe is what we're building now and
> is expected to be in alpha some time in Q4 21. Our current production
> architecture is different and isn't as generic; it is heavily tied to
> Ignite, and we've redesigned to get some flexibility where Ignite doesn't
> provide what we want, such as window functions and other SQL-99 limits.
>
> In the next gen version we're working on you can create a model for a
> Tweet(content, to) and we will create an Ignite table with content and to
> columns using the type the user selects. This is the simplest case.
> We are adding generic support for sources and sinks and using Calcite as a
> data virtualisation layer. Ignite is one of the available sources/sinks.
>
> When a user creates a model for Tweet, we also allow them to specify how
> they want to index the data. We have a copy of the calcite Elasticsearch
> adapter modified for Solr.
>
> When a source is queried (Ignite or any other that we support), we
> generate SQL that Calcite executes. Calcite pushes the generated queries
> down to Solr, Solr produces a list of IDs (in the case of Ignite), and we
> do a multi-get from Ignite to produce the actual results.
>
> Obviously there's a lot more to this but that should give you a general
> idea.
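A sketch of the two-phase flow described above, with hypothetical names: the search engine returns ranked IDs, then one bulk read fetches the rows. Because bulk reads such as Ignite 2's IgniteCache#getAll return a Map, the ranking is re-applied afterwards, and IDs whose rows no longer exist are skipped:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

final class TwoPhaseQuery {
    private TwoPhaseQuery() {
    }

    // rankedIds: IDs in the order the search engine ranked them.
    // multiGet: hypothetical bulk read returning only the rows that exist.
    static <K, V> List<V> fetchInRankOrder(List<K> rankedIds, Function<Collection<K>, Map<K, V>> multiGet) {
        Set<K> uniqueIds = new LinkedHashSet<>(rankedIds); // de-duplicate, keep rank order
        Map<K, V> rows = multiGet.apply(uniqueIds);         // one round trip for all rows
        List<V> result = new ArrayList<>(uniqueIds.size());
        for (K id : uniqueIds) {
            V row = rows.get(id);
            if (row != null) {                             // row deleted since it was indexed
                result.add(row);
            }
        }
        return result;
    }
}
```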
>
>> and maybe share some experience with using Ignite SPIs?
>>
> Our evolution with Ignite started from the key value + compute APIs. We
> used the SPIs then but have since moved to using only the Ignite SQL API
> (we gave up transactions for this).
>
> We originally used the indexing SPI to keep our own lucene index of data
> in a cache. We did not use the Ignite FTS as it is very limited compared to
> what we allow customers to do. If I remember correctly, we were using an
> affinity compute job to send queries to the right Ignite node and
> then doing a multi-get to pull the data from caches.
> I think we used one or two other SPIs, and we found them very useful for
> extending and customising Ignite without having to fork/change upstream
> classes. We only stopped using them because we eventually concluded that
> using the SQL-only API was better for numerous reasons.
>
>
>> We'll keep this information in mind while developing Ignite,
>> because it may help us make a better product.
>>
>> By the way, I'll try to answer the questions.
>>
>> >   1. Schema change - does that include the ability to change the types of
>> >   fields/columns?
>> Yes, we plan to support transparent on-the-fly conversion to a wider type
>> (e.g. 'int' to 'long').
>> This is a major point of our Live-schema concept.
>> In fact, there is no need to convert data on all the nodes synchronously,
>> as older SQL databases do (those that support this at all);
>> we are going to support multiple schema versions and convert data
>> on demand, on a per-row basis, to the latest version,
>> then write the row back.
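The per-row, on-demand conversion described above might look roughly like this: each stored row carries the schema version it was written with, reads upgrade older rows to the latest version, and the upgraded row is written back. This is a hypothetical illustration assuming a single column widened from `int` (version 1) to `long` (version 2); `VersionedRow` and `LazySchemaStore` are invented names, not Ignite 3's storage format.

```java
import java.util.HashMap;
import java.util.Map;

// A stored row remembers the schema version it was written with.
class VersionedRow {
    final int schemaVersion;
    final Object value; // Integer in schema v1, Long in schema v2

    VersionedRow(int schemaVersion, Object value) {
        this.schemaVersion = schemaVersion;
        this.value = value;
    }
}

class LazySchemaStore {
    static final int LATEST = 2;
    private final Map<String, VersionedRow> rows = new HashMap<>();

    void putRaw(String key, VersionedRow row) {
        rows.put(key, row);
    }

    VersionedRow raw(String key) {
        return rows.get(key);
    }

    // Read path: upgrade an old row to the latest version and write it back,
    // so each row is converted at most once and no cluster-wide rewrite is needed.
    long readLong(String key) {
        VersionedRow row = rows.get(key);
        if (row.schemaVersion < LATEST) {
            row = upgrade(row);
            rows.put(key, row); // write-back
        }
        return (Long) row.value;
    }

    // v1 -> v2: widen int to long; widening never loses information.
    private static VersionedRow upgrade(VersionedRow row) {
        int v = (Integer) row.value;
        return new VersionedRow(LATEST, (long) v);
    }
}
```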
>>
>
> I understand. The automatic conversion to a wider type makes sense.
>
>>
>> More complex conversions like 'String' -> 'int' are out of scope for now
>> because they require the execution of user code on the critical path.
>>
>
> I would argue, though, that executing user code on the critical path
> shouldn't be a blocker for custom conversions. I feel if a user is making
> an advanced enough integration to provide c

Re: Text Queries Support

2021-07-24 Thread Courtney Robinson
Hey Atri,
Agreed, I wasn't suggesting that Solr should be used. That's just what we're
doing now out of necessity.
My point was more that Calcite's SqlOperator can be used to provide the
interface to Lucene.
For all the reasons you mentioned and more, using Lucene is the right choice.

Calcite doesn't have a Solr adapter, but it has an Elasticsearch (ES) adapter,
which is what we modified to support Solr.

Regards,
Courtney Robinson
Founder and CEO, Hypi
Tel: ++44 208 123 2413 (GMT+0) <https://hypi.io>

<https://hypi.io>
https://hypi.io


On Sat, Jul 24, 2021 at 1:59 PM Atri Sharma  wrote:

> What that entails is that the end user has to keep a Solr cluster running,
> which comes with its own challenges (now you have to manage two systems
> instead of one).
>
> I believe Calcite has native support for Solr?
>
> OTOH, having native Lucene indices allows us to control per-partition
> indices with no distributed overhead, since Lucene is a per-node instance
> with no global coordination.

Re: Text Queries Support

2021-07-24 Thread Courtney Robinson
I'll add my thoughts here.
I agree with you, Valentin: the decoupled state of text queries makes them
useless for most of our use cases.

As it relates to Calcite and Ignite 3, one approach (the one we're taking,
because we use Calcite independently of Ignite) is to provide a set of SQL
functions that we implement as SqlOperator
<https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/SqlOperator.html>.
I forget how we've done aggregation functions, but we have those too and
they map to Solr aggregations (which ultimately end up in Lucene).

This allows Solr filters to take part in the rest of the query. It's
probably more complex than this for Ignite, but it's one possible route.
We generate queries like
select x from T0 where term(args to Solr term query) AND ...
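A dependency-free sketch of how such a query might be split: conjuncts that use the search function go to the search engine, and the rest stay with the SQL engine. The `Conjunct` and `SplitFilter` classes and the rule "push down anything named `term`" are assumptions for illustration; in Calcite itself, `term` would be registered as a `SqlOperator` (typically a `SqlFunction`) and the split would be done by planner rules, which this does not model.

```java
import java.util.ArrayList;
import java.util.List;

// A single conjunct of a WHERE clause, e.g. term(content, 'ignite') or likes > 10.
class Conjunct {
    final String function; // e.g. "term" or ">"
    final String sql;      // original SQL text of the conjunct

    Conjunct(String function, String sql) {
        this.function = function;
        this.sql = sql;
    }
}

// Result of splitting: what goes to the search engine vs. what stays in SQL.
class SplitFilter {
    final List<Conjunct> pushedToSearch = new ArrayList<>();
    final List<Conjunct> residual = new ArrayList<>();

    // Hypothetical rule: only conjuncts using the "term" function are pushed down.
    static SplitFilter split(List<Conjunct> andedConjuncts) {
        SplitFilter out = new SplitFilter();
        for (Conjunct c : andedConjuncts) {
            if (c.function.equals("term")) {
                out.pushedToSearch.add(c);
            } else {
                out.residual.add(c);
            }
        }
        return out;
    }
}
```

The split is only safe for predicates joined with AND; a `term(...)` under an OR cannot be evaluated separately from its sibling predicates this way.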

Regards,
Courtney Robinson


On Fri, Jul 23, 2021 at 7:14 PM Valentin Kulichenko <
valentin.kuliche...@gmail.com> wrote:

> Atri,
>
> Sure, go ahead. Let's put the ideas on paper and have a discussion.
>
> -Val
>
> On Fri, Jul 23, 2021 at 10:59 AM Atri Sharma  wrote:
>
> > Thanks Andrey.
> >
> > I have collected answers or proposals to many of these questions and
> > would like to start a wiki page covering what we can do for Ignite 3.
> >
> > Does that sound good?
> >
> > On Fri, Jul 23, 2021 at 4:26 PM Andrey Mashenkov
> >  wrote:
> > >
> > > Atri,
> > >
> > > First of all, I'd recommend going through the Ignite ticket to gather
> > > information about the current implementation's issues and what users want.
> > > Then look at the code to get a complete understanding of how things work
> > > now, which may help with future decisions.
> > >
> > > As we use an outdated Lucene version, some things may be irrelevant for
> > > the latest Lucene version.
> > > So, you will need expertise in the internals of the modern Lucene version
> > > to understand what capabilities, guarantees, and limitations Lucene has
> > > and could bring to Ignite.
> > > That expertise can be gained from the Lucene project code or the Lucene
> > > project dev list.
> > >
> > >
> > > For now, the potential capabilities are not clear to me.
> > > At first glance, I see the following topics that must be covered first:
> > >
> > > General questions
> > > * How can a Lucene index be split among the nodes?
> > > * If we have a single index for all partitions on a particular node,
> > > how will index records be aware of partitioning?
> > > This is important to filter out backup records from the results and
> > > avoid duplicates.
> > > * How can results from several nodes be merged on the Reduce stage?
> > > * Does Lucene support something like a JOIN operation, or other operations
> > > that may require data from another partition or index?
> > > If so, this resembles a multistep query with merging of results at
> > > intermediate stages, and requires detailed investigation and design.
> > > It is OK if Ignite has some limitations here, but we would like to
> > > know about them at an early stage.
> > > * How can Lucene files be mapped to page memory efficiently? Is it even
> > > possible?
> > > Otherwise, how do we deal with potential OOM on large queries, and with
> > > memory capacity planning?
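Three of these questions (splitting the index, ignoring backup copies, merging on the Reduce stage) can be made concrete with a small model. The sketch assumes one index per partition whose scored hits for the current query are already available, and that each node searches only the partitions it owns as primary; `Hit` and `PartitionedSearch` are hypothetical names, and nothing here reflects how a real Lucene integration would score or store documents.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;

// One search hit: document key plus a relevance score.
class Hit {
    final String key;
    final double score;

    Hit(String key, double score) {
        this.key = key;
        this.score = score;
    }
}

class PartitionedSearch {
    // Map phase on one node. hitsByPartition holds the hits each local
    // per-partition index returned for the current query. Only primary
    // partitions are consulted, so rows held here as backups are not returned twice.
    static List<Hit> localSearch(Map<Integer, List<Hit>> hitsByPartition,
                                 Set<Integer> primaryPartitions) {
        List<Hit> hits = new ArrayList<>();
        for (Map.Entry<Integer, List<Hit>> e : hitsByPartition.entrySet()) {
            if (primaryPartitions.contains(e.getKey())) {
                hits.addAll(e.getValue());
            }
        }
        return hits;
    }

    // Reduce phase: concatenate per-node results and keep the global top k by score.
    static List<Hit> reduce(List<List<Hit>> perNode, int k) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perNode) {
            all.addAll(hits);
        }
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
    }
}
```

Restricting the map phase to primary partitions is what avoids duplicates: if both nodes searched every partition they hold, each hit would come back once per copy.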
> > >
> > > Persistence
> > > * What consistency guarantees could we have/expect, and how?
> > > It seems we may not be able to write physical records for the Lucene
> > > index to our WAL. What can we do about this?
> > >
> > > Transactions
> > > * Will we support transactions?
> > > * Should Lucene be aware of transactions and track MVCC (or similar)
> > > versions for the records?
> > > * What will the consistency guarantees be?
> > >
> > > UX
> > > * How do we add full-text search query syntax to Calcite?
> > > * AFAIK, the Lucene index has many tuning properties. How will the user
> > > configure the index?
> > > * How and where should the settings be stored? Which are cluster-wide and
> > > which are local to a particular node?
> > > * Will all the settings be immutable? Can they be changed on the fly, or
> > > after a node/grid restart?
> > > * Are there any limitations on query syntax?
> > >
> > > SQL
> > > * Will we support full-text search in SQL?
> > > * How do we integrate the Lucene index into Calcite? What is the

Re: Apache Ignite 3 Alpha 2 webinar follow up questions

2021-07-23 Thread Courtney Robinson
des) will definitely kill the performance
> in that case.
> So, the preliminary loadCache() call looks like a good compromise.
>
I think the problem is largely that the CacheStore interface is not
sufficient for this. If it had a richer interface that allowed the cache
store to answer index queries, basically hooking into whatever Ignite is
doing for its B+tree, then this would be viable. A CacheStore that only
implements the KV API doesn't take part in SQL queries.

>
> 3. Splitting a query into two parts, one to run on Ignite and one to run on
> CacheStore, looks possible with Calcite,
> but I think it is impractical because, in general, neither the CacheStore
> nor the database structure is aware of the data partitioning.
>
Hmm, maybe I missed the point, but as the implementor of the CacheStore you
should know the structure and partition info, or have some way of
retrieving it. Again, I think the current CacheStore interface is the
problem: if it were extended to provide this information, then it would be
up to the implementation to do this, while Ignite knows that any
implementation of these interfaces meets the necessary contract.
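One reading of this suggestion is a store contract that, besides key-value loads, can answer index lookups and tell the engine how keys are partitioned. The interface below is purely hypothetical (Ignite's `CacheStore` has nothing like `lookup` or `partitionOf`), and `IndexAwareStore`/`InMemoryStore` are invented names; the in-memory implementation exists only to show the contract, and a real store would use its own indexes instead of a scan.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical extension of a key-value store contract; not an Ignite API.
interface IndexAwareStore<K, V> {
    V load(K key);                               // plain KV access, as today
    List<K> lookup(String column, Object value); // answer an equality index query
    int partitionOf(K key, int partitions);      // expose partitioning to the engine
}

class InMemoryStore implements IndexAwareStore<String, Map<String, Object>> {
    private final Map<String, Map<String, Object>> rows = new HashMap<>();

    void put(String key, Map<String, Object> row) {
        rows.put(key, row);
    }

    @Override
    public Map<String, Object> load(String key) {
        return rows.get(key);
    }

    @Override
    public List<String> lookup(String column, Object value) {
        // A full scan keeps the sketch short; a real store would use an index.
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Map<String, Object>> e : rows.entrySet()) {
            if (value.equals(e.getValue().get(column))) {
                keys.add(e.getKey());
            }
        }
        return keys;
    }

    @Override
    public int partitionOf(String key, int partitions) {
        return Math.floorMod(key.hashCode(), partitions);
    }
}
```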


>
> 4. Transactions can't be supported in the case of direct CacheStore access,
> because even if the underlying database supports 2-phase commit, which is a
> rare case, the recovery protocol looks hard.
> It just looks like this feature isn't worth it.
>
I completely agree with this. It would be incredibly hard to get this done
reliably.

>
>
> >   6. This question wasn't mine but I was going to ask it as well: What
> >   will happen to the Indexing API since H2 is being removed?
> As I wrote above, the Indexing SPI will be dropped, but IndexQuery will be
> added.
>
> >  1. As I mentioned above, we index into Solr. In earlier versions of
> >  our product we used the indexing SPI to index into Lucene on the Ignite
> >  nodes, but this presented so many challenges that we ultimately
> >  abandoned it and replaced it with the current Solr solution.
> AFAIK, some people developed and sell a plugin for Ignite-2 with persistent
> Lucene and Geo indices.
> I don't know the capabilities and limitations of their solution,
> because the code is closed.
> You can easily google it.
>
> I've seen a few enthusiastic people who want to improve TEXT queries,
> but unfortunately, things haven't moved far enough. For now, they are in
> the middle of fixing the merging of TEXT query results.
> So far so good.
>
> I think it is a good opportunity for whoever takes the lead on the
> full-text search feature, and adds native full-text index support to
> Ignite-3, to master the skill of developing a distributed system.
>
I've seen the other thread about this, from Atri I believe.

>
>
> >   7. What impact does RAFT now have on conflict resolution?
> RAFT is a state machine replication protocol. It guarantees that all the
> nodes will see the updates in the same order.
> So it seems no conflicts are possible. Recovery from split-brain is
> impossible in the general case.
>
> However, I think we have a conflict resolver analog in Ignite-3, as it is
> very useful in some cases,
> e.g. datacenter replication, incremental data load from a 3rd-party
> source, or recovery from a 3rd-party source.
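Andrey's point that RAFT leaves no room for write conflicts follows from every replica applying the same committed log in the same order. The toy below illustrates only that property; it is not RAFT (no leader election, terms or quorum), and `Command`/`Replica` are hypothetical names assuming deterministic put commands.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One committed log entry: a deterministic "put key=value" command.
class Command {
    final String key;
    final String value;

    Command(String key, String value) {
        this.key = key;
        this.value = value;
    }
}

// Toy replicated state machine: applies committed entries strictly in log order.
class Replica {
    final Map<String, String> state = new HashMap<>();
    private int applied = 0; // index of the next log entry to apply

    void applyCommitted(List<Command> log, int commitIndex) {
        while (applied < commitIndex) {
            Command c = log.get(applied++);
            state.put(c.key, c.value);
        }
    }
}
```

The conflict-resolver cases listed above (datacenter replication, loading from a 3rd-party source) are writes that do not go through a single shared log, which is why an explicit resolver is still useful there.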
>
>
> > 8. CacheGroups.
> AFAIK, CacheGroup will be eliminated; actually, we'll keep this mechanism,
> but it will be configured differently,
> which makes configuring Ignite a bit simpler.
> Sorry, for now I have no answer to your performance concerns; this part of
> Ignite-3 slipped off my radar.
>
No worries. I'll wait and see if anyone else suggests something. It's
getting a lot worse: a node took 1 hour to start yesterday after a
deployment, and it's in prod with very little visibility into what it is
doing. It just stopped, with no logging or anything, and then resumed.

2021-07-22 13:40:15.997  INFO [ArcOS,,,] 9 --- [orker-#40%hypi%]
o.a.i.i.p.cache.GridCacheProcessor  [285] :  Finished recovery for
cache [cache=hypi_01F8ZC3DGT66RNYCDZH3XNVY2E_Hue, grp=hypi,
startVer=AffinityTopologyVersion [topVer=79, minorTopVer=0]]

One hour later it printed the next cache recovery message, and it finished
starting 30 seconds later after going through the other tables.



>
> Let's wait and see if someone can clarify what we could expect in Ignite-3.
> Guys, can someone chime in and shed more light on questions 3, 4, 7 and 8?
>
>
> On Thu, Jul 22, 2021 at 4:15 AM Courtney Robinson <
> courtney.robin...@hypi.io>
> wrote:
>
> > Hey everyone,
> > I attended the Alpha 2 update yesterday and was quite pleased to see the
> > progress on things so far. So first, congratulations to everyone on the
> > work being put in and thank you to Val and Kseniya for running
> yesterday's
> > event.
> >
> > I 

Apache Ignite 3 Alpha 2 webinar follow up questions

2021-07-21 Thread Courtney Robinson
ctResolver and manager are used
  by GridCacheMapEntry, which just says: if using the old value, do this,
  otherwise use newVal. Ideally this will be exposed in the new API so that
  one can override this behaviour. The last-writer-wins approach isn't always
  ideal, and the semantics of a domain can mean that what is considered
  "correct" in a conflict in one domain is not so in a different domain.
   8. This is last on the list but is actually the most important for us
   right now, as it is an impending and growing risk. We allow customers to
   create their own tables on demand. We're already using the same cache group
   etc. so that data structures are re-used, but now that we're getting to
   thousands of tables/caches, our startup times are sometimes unpredictably
   long. At present it seems to depend on the state of the cache/table before
   the restart, but we're into the order of 5-7 minutes and steadily increasing
   as the number of tables grows. Are there any provisions in Ignite 3 for
   ensuring startup time isn't proportional to the number of tables/caches
   available?
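Item 7 asks for conflict resolution to be overridable rather than fixed to last-writer-wins. A hypothetical shape for such a hook is sketched here; `Versioned`, `ConflictResolver`, `LastWriterWins` and `MaxCounterWins` are invented names, not existing Ignite APIs, and the max-counter rule is just one example of domain-specific semantics.

```java
// A value together with the metadata a resolver might need.
class Versioned<V> {
    final V value;
    final long timestamp; // e.g. update time at the originating cluster

    Versioned(V value, long timestamp) {
        this.value = value;
        this.timestamp = timestamp;
    }
}

// Hypothetical user-overridable hook; not an existing Ignite API.
interface ConflictResolver<V> {
    Versioned<V> resolve(Versioned<V> oldVal, Versioned<V> newVal);
}

// Default behaviour: the newer write wins (ties go to the incoming value).
class LastWriterWins<V> implements ConflictResolver<V> {
    @Override
    public Versioned<V> resolve(Versioned<V> oldVal, Versioned<V> newVal) {
        return newVal.timestamp >= oldVal.timestamp ? newVal : oldVal;
    }
}

// Domain-specific example: for a counter that only ever grows (e.g. total likes),
// keep the larger value regardless of which write arrived last.
class MaxCounterWins implements ConflictResolver<Long> {
    @Override
    public Versioned<Long> resolve(Versioned<Long> oldVal, Versioned<Long> newVal) {
        return newVal.value >= oldVal.value ? newVal : oldVal;
    }
}
```

With an older write of 100 and a newer write of 90, last-writer-wins keeps 90 while the counter rule keeps 100; which answer is "correct" depends on the domain, which is the point made in item 7.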


Those are the key things I can think of at the moment. Val and others, I'd
love to open a conversation around these.

Regards,
Courtney Robinson