Re: IGNITE-2294 implementation details

Alexander Paschenko Wed, 27 Jul 2016 03:00:08 -0700

Sergi,

But current signature of query() method returns not just some
iterator, but rather iterator of R which is type param of Query -
i.e., we won't be able to return an int inside a QueryCursor<R>. At
least without API change (signature of query() method will have to be
changed to drop genericness, or in some other weird way). Is this what
we really want? Or am I missing something in your point?


- Alex

2016-07-27 12:51 GMT+03:00 Sergi Vladykin <[email protected]>:
> Exactly. This will allow our Jdbc driver to work transparently.
>
> Sergi
>
> 2016-07-27 12:40 GMT+03:00 Alexander Paschenko <
> [email protected]>:
>
>> Sergi,
>>
>> You wrote:
>> > I'd prefer to return the same information, so it will not be empty
>>
>> Do you mean return iterator with single element that denotes number of
>> rows?
>>
>> Dmitriy,
>>
>> You wrote:
>> > What is the ticket number for this. Is the new API documented there?
>>
>> Overall issue number is 2294. There's no particular issue on API
>> changes, but creating one seems to be a good idea, I will do it.
>>
>> - Alex
>>
>> 2016-07-27 9:20 GMT+03:00 Dmitriy Setrakyan <[email protected]>:
>> > What is the ticket number for this. Is the new API documented there?
>> >
>> > On Tue, Jul 26, 2016 at 11:36 AM, Sergi Vladykin <
>> [email protected]>
>> > wrote:
>> >
>> >> I don't see anything ugly in empty iterator, sorry if I insulted your
>> taste
>> >> of beauty.
>> >>
>> >> If you will take a look at Jdbc, you will see that
>> Statement.executeUpdate
>> >> method returns number of updated rows, I'd prefer to return the same
>> >> information, so it will not be empty (beauty is restored!).
>> >>
>> >> Sergi
>> >>
>> >>
>> >>
>> >> 2016-07-26 18:24 GMT+03:00 Alexander Paschenko <
>> >> [email protected]>:
>> >>
>> >> > I see your point. But what about my concerns from initial post?
>> >> > Particularly about signatures of existing methods? I personally don't
>> >> > like an option of query() method always returning an empty iterator
>> >> > for any non-select query, it seems ugly design wise.
>> >> >
>> >> > - Alex
>> >> >
>> >> > 2016-07-26 18:15 GMT+03:00 Sergi Vladykin <[email protected]>:
>> >> > > BTW, the simplest way to solve this issue is to allow running SQL
>> >> > commands
>> >> > > inside of SqlFieldsQuery.
>> >> > >
>> >> > > We may add some additional convenience API for updates if we want,
>> but
>> >> > JDBC
>> >> > > client will always call it like this:
>> >> > >
>> >> > > cache.query(new SqlFieldsQuery("INSERT INTO MY_TABLE
>> >> > > VALUES(?,?)").setArgs(1,2));
>> >> > >
>> >> > > This will resolve any ambiguity.
>> >> > >
>> >> > > Sergi
>> >> > >
>> >> > > 2016-07-26 17:56 GMT+03:00 Sergi Vladykin <[email protected]
>> >:
>> >> > >
>> >> > >> I don't like any pre-parsing, especially with some libraries other
>> >> than
>> >> > >> H2. H2 itself has enough quirks to multiply it on quirks of another
>> >> > library.
>> >> > >>
>> >> > >> This is exactly what I was talking about - we need some single
>> entry
>> >> > point
>> >> > >> on API for all the SQL commands and queries. Thats why I suggested
>> >> > >> SqlUpdate to extend Query. To me its is the cleanest approach. May
>> be
>> >> we
>> >> > >> need to change in some backward compatible way this Query
>> hierarchy to
>> >> > get
>> >> > >> rid of extra methods but the idea is still the same.
>> >> > >>
>> >> > >> Sergi
>> >> > >>
>> >> > >> 2016-07-26 14:34 GMT+03:00 Alexander Paschenko <
>> >> > >> [email protected]>:
>> >> > >>
>> >> > >>> Guys,
>> >> > >>>
>> >> > >>> I would like to advance the discussion further. There's one quite
>> >> > >>> important question that arose based on current state of work on
>> this
>> >> > >>> issue. If we use some kind of interactive console, like Visor,
>> then
>> >> > >>> how should it know whether SQL query it is requested to execute
>> >> > >>> returns a result set or not? In JDBC world, solution is quite
>> simple
>> >> -
>> >> > >>> there's base interface called Statement that all commands
>> implement,
>> >> > >>> and it has magic isResultSet method that tells whether statement
>> is a
>> >> > >>> query or an update command. The API proposed now has separate
>> Query
>> >> > >>> and Update operations which I believe to be a right thing by the
>> >> > >>> reasons I outlined in the beginning of this thread. However, their
>> >> > >>> lack of common ancestor prevents possible console clients from
>> >> running
>> >> > >>> text SQL commands in a fully transparent manner - like
>> >> > >>> IgniteCache.execute(String sql). Therefore I see two possible
>> ways of
>> >> > >>> solving this:
>> >> > >>>
>> >> > >>> - we change API so that it includes new class or interface
>> parenting
>> >> > >>> both Query and Update, and clients use it to communicate with
>> cache
>> >> > >>> - we let (or make :) ) the client determine command type
>> >> independently
>> >> > >>> and behave accordingly - for it to work it will have some kind of
>> >> > >>> command parsing by itself just to determine its type. Visor
>> console
>> >> > >>> may use simple library like JSqlParser
>> >> > >>> (https://github.com/JSQLParser/JSqlParser; dual LGPL 2.1/ASF 2.0
>> >> > >>> licensed) to determine request type in terms of JDBC, and behave
>> >> > >>> accordingly.
>> >> > >>>
>> >> > >>> Personally, I think that the second approach is better - and
>> here's
>> >> > why.
>> >> > >>>
>> >> > >>> First, it does not seem wise to change API simply to make console
>> (or
>> >> > >>> any other) clients simpler. Programmatic APIs should be concise
>> and
>> >> > >>> short for programmatic use, console clients should be easy to use
>> >> from
>> >> > >>> console - and that's it: after all, console client exists to free
>> a
>> >> > >>> user from burden of doing things programmatically, so its aim is
>> to
>> >> > >>> adapt API to console or whatever UI.
>> >> > >>> Second, possible complications in client implied by such approach
>> >> > >>> certainly won't be dramatic - I don't think that additional single
>> >> > >>> query parsing operation in client code will make it much harder to
>> >> > >>> develop.
>> >> > >>> Third, as I see it now, adding a new "synthetic" entity and new
>> >> method
>> >> > >>> would take more effort to adapting the client to new API.
>> >> > >>>
>> >> > >>> Dmitry, Sergi, I would like to hear what you think about it all.
>> >> > Thanks.
>> >> > >>>
>> >> > >>> - Alex
>> >> > >>>
>> >> > >>> 2016-07-21 21:17 GMT+03:00 Dmitriy Setrakyan <
>> [email protected]
>> >> >:
>> >> > >>> > OK, then using your analogy, the current behavior in Ignite is
>> >> MERGE
>> >> > for
>> >> > >>> > the most part.
>> >> > >>> >
>> >> > >>> > My preference is that Ignite SQL should work no different from
>> >> > >>> traditional
>> >> > >>> > databases, which means:
>> >> > >>> >
>> >> > >>> > - INSERT is translated into *putIfAbsent()* call in Ignite
>> >> > >>> > - UPDATE is translated into *replace()* call in Ignite
>> >> > >>> > - MERGE is translated into *put()* call in Ignite
>> >> > >>> > - For SQL BATCH calls we should delegate to Ignite batch
>> >> operations,
>> >> > >>> e.g.
>> >> > >>> > *putAll()*
>> >> > >>> >
>> >> > >>> > The above should hold true for atomic and transactional
>> put/putAll
>> >> > >>> calls,
>> >> > >>> > as well as for the data streamer.
>> >> > >>> >
>> >> > >>> > Does this make sense?
>> >> > >>> >
>> >> > >>> > D.
>> >> > >>> >
>> >> > >>> > On Thu, Jul 21, 2016 at 4:06 AM, Sergi Vladykin <
>> >> > >>> [email protected]>
>> >> > >>> > wrote:
>> >> > >>> >
>> >> > >>> >> No, this does not make sense.
>> >> > >>> >>
>> >> > >>> >> There is no upsert mode in databases. There are operations:
>> >> INSERT,
>> >> > >>> UPDATE,
>> >> > >>> >> DELETE, MERGE.
>> >> > >>> >>
>> >> > >>> >> I want to have clear understanding of how they have to behave
>> in
>> >> SQL
>> >> > >>> >> databases and how they will actually behave in Ignite in
>> different
>> >> > >>> >> scenarios. Also I want to have clear understanding of
>> performance
>> >> > >>> >> implications of each decision here.
>> >> > >>> >>
>> >> > >>> >> Anything wrong with that?
>> >> > >>> >>
>> >> > >>> >> Sergi
>> >> > >>> >>
>> >> > >>> >> On Thu, Jul 21, 2016 at 1:04 PM, Dmitriy Setrakyan <
>> >> > >>> [email protected]>
>> >> > >>> >> wrote:
>> >> > >>> >>
>> >> > >>> >> > Serj, are you asking what will happen as of today? Then the
>> >> answer
>> >> > >>> to all
>> >> > >>> >> > your questions is that duplicate keys are not an issue, and
>> >> Ignite
>> >> > >>> always
>> >> > >>> >> > operates in **upsert** mode (which is essentially a *“put(…)”
>> >> > >>> *method).
>> >> > >>> >> >
>> >> > >>> >> > However, the *“insert”* that is suggested by Alex would
>> delegate
>> >> > to
>> >> > >>> >> > *“putIfAbsent(…)”*, which in database world makes more sense.
>> >> > >>> However, in
>> >> > >>> >> > this case, the *“update”* syntax should delegate to
>> >> > *“replace(…)”*,
>> >> > >>> as
>> >> > >>> >> > update should fail in case if a key is absent.
>> >> > >>> >> >
>> >> > >>> >> > Considering the above, a notion of “*upsert”* or “*merge”
>> >> > *operation
>> >> > >>> is
>> >> > >>> >> > very much needed, as it will give a user an option to perform
>> >> > >>> >> > “insert-or-update” in 1 call.
>> >> > >>> >> >
>> >> > >>> >> > Does this make sense?
>> >> > >>> >> >
>> >> > >>> >> > D.
>> >> > >>> >> >
>> >> > >>> >> > On Wed, Jul 20, 2016 at 9:39 PM, Sergi Vladykin <
>> >> > >>> >> [email protected]>
>> >> > >>> >> > wrote:
>> >> > >>> >> >
>> >> > >>> >> > > I'd prefer to do MERGE operation last because in H2 it is
>> not
>> >> > >>> standard
>> >> > >>> >> > ANSI
>> >> > >>> >> > > SQL MERGE. Or may be not implement it at all, or may be
>> >> > contribute
>> >> > >>> ANSI
>> >> > >>> >> > > correct version to H2, then implement it on Ignite. Need to
>> >> > >>> investigate
>> >> > >>> >> > the
>> >> > >>> >> > > semantics deeper before making any decisions here.
>> >> > >>> >> > >
>> >> > >>> >> > > Lets start with simple scenarios for INSERT and go through
>> all
>> >> > the
>> >> > >>> >> > possible
>> >> > >>> >> > > cases and answer the questions:
>> >> > >>> >> > > - What will happen on key conflict in TX cache?
>> >> > >>> >> > > - What will happen on key conflict in Atomic cache?
>> >> > >>> >> > > - What will happen with the previous two if we use
>> DataLoader?
>> >> > >>> >> > > - How to make these operations efficient (it will be simple
>> >> > enough
>> >> > >>> to
>> >> > >>> >> > > implement them with separate put/putIfAbsent operations but
>> >> > >>> probably we
>> >> > >>> >> > > will need some batching like putAllIfAbsent for
>> efficiency)?
>> >> > >>> >> > >
>> >> > >>> >> > > As for API, we still will need to have a single entry point
>> >> for
>> >> > >>> all SQL
>> >> > >>> >> > > queries/commands to allow any console work with it
>> >> > transparently.
>> >> > >>> It
>> >> > >>> >> > would
>> >> > >>> >> > > be great if we will be able to come up with something
>> >> consistent
>> >> > >>> with
>> >> > >>> >> > this
>> >> > >>> >> > > idea on public API.
>> >> > >>> >> > >
>> >> > >>> >> > > Sergi
>> >> > >>> >> > >
>> >> > >>> >> > >
>> >> > >>> >> > >
>> >> > >>> >> > >
>> >> > >>> >> > >
>> >> > >>> >> > >
>> >> > >>> >> > >
>> >> > >>> >> > >
>> >> > >>> >> > > On Wed, Jul 20, 2016 at 2:23 PM, Dmitriy Setrakyan <
>> >> > >>> >> > > [email protected]>
>> >> > >>> >> > > wrote:
>> >> > >>> >> > >
>> >> > >>> >> > > > Like the idea of merge and insert. I need more time to
>> think
>> >> > >>> about
>> >> > >>> >> the
>> >> > >>> >> > > API
>> >> > >>> >> > > > changes.
>> >> > >>> >> > > >
>> >> > >>> >> > > > Sergi, what do you think?
>> >> > >>> >> > > >
>> >> > >>> >> > > > Dmitriy
>> >> > >>> >> > > >
>> >> > >>> >> > > >
>> >> > >>> >> > > >
>> >> > >>> >> > > > On Jul 20, 2016, at 12:36 PM, Alexander Paschenko <
>> >> > >>> >> > > > [email protected]> wrote:
>> >> > >>> >> > > >
>> >> > >>> >> > > > >> Thus, I suggest that we implement MERGE as a separate
>> >> > >>> operation
>> >> > >>> >> > backed
>> >> > >>> >> > > > by putIfAbsent operation, while INSERT will be
>> implemented
>> >> via
>> >> > >>> put.
>> >> > >>> >> > > > >
>> >> > >>> >> > > > > Sorry, of course I meant that MERGE has to be
>> put-based,
>> >> > while
>> >> > >>> >> INSERT
>> >> > >>> >> > > > > has to be putIfAbsent-based.
>> >> > >>> >> > > > >
>> >> > >>> >> > > > > 2016-07-20 12:30 GMT+03:00 Alexander Paschenko
>> >> > >>> >> > > > > <[email protected]>:
>> >> > >>> >> > > > >> Hell Igniters,
>> >> > >>> >> > > > >>
>> >> > >>> >> > > > >> In this thread I would like to share and discuss some
>> >> > >>> thoughts on
>> >> > >>> >> > DML
>> >> > >>> >> > > > >> operations' implementation, so let's start and keep it
>> >> > here.
>> >> > >>> >> > Everyone
>> >> > >>> >> > > > >> is of course welcome to share their suggestions.
>> >> > >>> >> > > > >>
>> >> > >>> >> > > > >> For starters, I was thinking about semantics of
>> INSERT.
>> >> In
>> >> > >>> >> > traditional
>> >> > >>> >> > > > >> RDBMSs, INSERT works only for records whose primary
>> keys
>> >> > don't
>> >> > >>> >> > > > >> conflict with those of records that are already
>> >> persistent
>> >> > -
>> >> > >>> you
>> >> > >>> >> > can't
>> >> > >>> >> > > > >> try to insert the same key more than once because
>> you'll
>> >> > get
>> >> > >>> an
>> >> > >>> >> > error.
>> >> > >>> >> > > > >> However, semantics of cache put is obviously
>> different -
>> >> it
>> >> > >>> does
>> >> > >>> >> not
>> >> > >>> >> > > > >> have anything about duplicate keys, it just quietly
>> >> updates
>> >> > >>> values
>> >> > >>> >> > in
>> >> > >>> >> > > > >> case of keys' duplication. Still, cache has
>> putIfAbsent
>> >> > >>> operation
>> >> > >>> >> > that
>> >> > >>> >> > > > >> is closer to traditional notion of INSERT, and H2's
>> SQL
>> >> > >>> dialect
>> >> > >>> >> has
>> >> > >>> >> > > > >> MERGE operation which corresponds to semantics of
>> cache
>> >> > put.
>> >> > >>> >> Thus, I
>> >> > >>> >> > > > >> suggest that we implement MERGE as a separate
>> operation
>> >> > >>> backed by
>> >> > >>> >> > > > >> putIfAbsent operation, while INSERT will be
>> implemented
>> >> via
>> >> > >>> put.
>> >> > >>> >> > > > >>
>> >> > >>> >> > > > >> And one more, probably more important thing: I suggest
>> >> > that we
>> >> > >>> >> > create
>> >> > >>> >> > > > >> separate class Update and corresponding operation
>> >> update()
>> >> > in
>> >> > >>> >> > > > >> IgniteCache. The reasons are as follows:
>> >> > >>> >> > > > >>
>> >> > >>> >> > > > >> - Query bears some flags that are clearly redundant
>> for
>> >> > Update
>> >> > >>> >> (page
>> >> > >>> >> > > > >> size, locality)
>> >> > >>> >> > > > >> - query() method in IgniteCache (one that accepts
>> Query)
>> >> > and
>> >> > >>> >> query()
>> >> > >>> >> > > > >> methods in GridQueryIndexing return iterators. So, if
>> we
>> >> > >>> strive to
>> >> > >>> >> > > > >> leave interfaces unchanged, we still will introduce
>> some
>> >> > >>> design
>> >> > >>> >> > > > >> ugliness like query methods returning empty iterators
>> for
>> >> > >>> certain
>> >> > >>> >> > > > >> queries, and/or query flags that indicate whether
>> it's an
>> >> > >>> update
>> >> > >>> >> > query
>> >> > >>> >> > > > >> or not, etc.
>> >> > >>> >> > > > >> - If some Queries are update queries, then continuous
>> >> > queries
>> >> > >>> >> can't
>> >> > >>> >> > be
>> >> > >>> >> > > > >> based on them - more design-wise ugly checks and stuff
>> >> like
>> >> > >>> that.
>> >> > >>> >> > > > >> - I'm pretty sure there's more I don't know about.
>> >> > >>> >> > > > >>
>> >> > >>> >> > > > >> Comments and suggestions are welcome. Sergi Vladykin,
>> >> > Dmitry
>> >> > >>> >> > > > >> Setrakyan, your opinions are of particular interest,
>> >> please
>> >> > >>> >> advise.
>> >> > >>> >> > > > >>
>> >> > >>> >> > > > >> Regards,
>> >> > >>> >> > > > >> Alex
>> >> > >>> >> > > >
>> >> > >>> >> > >
>> >> > >>> >> >
>> >> > >>> >>
>> >> > >>>
>> >> > >>
>> >> > >>
>> >> >
>> >>
>>

Re: IGNITE-2294 implementation details

Reply via email to