Re: DDL implementation details

Denis Magda Thu, 26 Jan 2017 18:51:25 -0800

Agree that schema-per-cache approach is natural for Ignite but will be a 
nightmare for those who migrate from a RDBMS. Inventing  custom DDL statements 
is also a dead end. This won’t make users life and transitions from a RDBMS 
easier.


Let’s to look at this differently. Most of us suggest the users never store 
multiple types (queries entities) per cache consider this as an anti-pattern. I 
do remember that we even brought up an idea to apply query-entity-per-cache 
rule.

Conjuring that Ignite follows query-entity-per-cache rule, DDL statements might 
have the following effect:

CREATE TABLE blablabla {…} - creates both cache and query entity with name 
blablabla.

In SQL queries we don’t need to specify a cache name as a schema name at all 
even during joins thanks to cache-per-query-entity rule. We can figure this out 
automatically.

Executing `CREATE SCHEME myScheme` and using it later in DDL statements (USING 
SCHEME myScheme) we will add the scheme name as an attribute to respective 
query entities or caches. When the scheme name is used in a SQL query the 
engine will look for a valid query entity/cache that has the attribute set to 
this name.

I do realize that this will break the compatibility but we’re approaching 2.0. 
Good chance to make not only DDL but SQL Grid more user friendly.

—
Denis

> On Jan 17, 2017, at 7:29 PM, Dmitriy Setrakyan <[email protected]> wrote:
> 
> After looking at all the restrictions of "cache==schema" approach, I still
> think that it would be the most natural match for Ignite. All other
> suggestions are either too limiting, too hard, or not applicable for
> Ignite.
> 
> My vote would be to start with "cache==schema" and get some user feedback.
> We can always add "tablespaces" in future, if our users demand it.
> 
> D.
> 
> On Tue, Jan 17, 2017 at 1:29 AM, Sergi Vladykin <[email protected]>
> wrote:
> 
>> Sergey,
>> 
>> See inline:
>> 
>> 2017-01-17 0:50 GMT+03:00 Sergey Kozlov <[email protected]>:
>> 
>>> For the approach schema==cache it is not clear how it works for some
>> cases
>>> (especialy if SQL and regular cache operations are mixed):
>>> 
>>> 1. I want to store two tables in same cache (same schema) and both tables
>>> have identical structure like id Integer, name String.
>>> How they will be processed for regular cache operations
>>> put/putall/get/getall etc where we get cache instance by its name (on SQL
>>> level we use the table name) ?
>>> 
>> 
>> This already works. We can have multiple QueryEntities configured for a
>> single cache. The only restriction here is that cache key must be unique
>> across all the tables inside the same schema.
>> 
>> 
>>> 
>>> 2. What's about ALTER TABLE statement? Will the changes for one table
>> lock
>>> all tables in the cache?
>>> 
>> 
>> It will not be needed, any table structure modification will be performed
>> in the table scope.
>> 
>> 
>>> 3. In the future if we will introduce table-based features like SQL roles
>>> (as next step of SQL implementation) "many tables one cache" rule looks
>>> like more difficult for its implementation.
>>> 
>> 
>> I don't think so, must be the same thing.
>> 
>> 
>>> 
>>> p.s. may be we need to think about new entity called "table" (based on
>> type
>>> descriptor) inside cache to better support of SQL DDL
>>> 
>> 
>> We already have it: QueryEntity. It has exactly that semantics.
>> 
>> Sergi
>> 
>> 
>>> 
>>> 
>>> On Mon, Jan 16, 2017 at 11:06 PM, Sergi Vladykin <
>> [email protected]
>>>> 
>>> wrote:
>>> 
>>>> Dima,
>>>> 
>>>> I agree that cache==table is definitely a wrong choice, but as far as I
>>> see
>>>> Vova suggests having cache==tablespace instead of cache==schema. I tend
>>> to
>>>> agree with this decoupling of physical and logical grouping, but the
>>>> concern is that it will require much more work to do.
>>>> 
>>>> Sergi
>>>> 
>>>> 2017-01-16 21:35 GMT+03:00 Dmitriy Setrakyan <[email protected]>:
>>>> 
>>>>> Vova,
>>>>> 
>>>>> Currently I see only 2 ways we can proceed here:
>>>>> 
>>>>>   1. cache == table
>>>>>   2. cache == schema
>>>>> 
>>>>> I agree that "cache==table" may be more flexible, but I don't think
>> it
>>>> will
>>>>> work in Ignite.
>>>>> We may end up with 1,000s of caches, which will carry significant
>>>> overhead
>>>>> on memory and cluster overall. I think that we have no choice but to
>>> take
>>>>> "cache==schema" approach.
>>>>> 
>>>>> D.
>>>>> 
>>>>> On Mon, Jan 16, 2017 at 1:00 AM, Vladimir Ozerov <
>> [email protected]
>>>> 
>>>>> wrote:
>>>>> 
>>>>>> Sergi, Dima,
>>>>>> 
>>>>>> In the scope of Ignite 1.x it is perfectly fine to have "schema =
>>>> cache".
>>>>>> Nobody suffers from it because nobody use Ignite as database. But
>> in
>>>>>> future, thanks to page memory, we are going to target real database
>>> use
>>>>>> cases. Users will have multiple tables in Ignite. Plus views,
>>> triggers,
>>>>>> constraints, etc.. All these features are very useful and easy to
>>>>> implement
>>>>>> provided that we already have table and index implementations. And
>> in
>>>>>> databases all related objects are *logically *grouped in a
>> "schema".
>>>> This
>>>>>> is convenient for users: less boilerplate in SQL, better
>>> manageability
>>>>>> (remember that database users will definitely need some console
>>> and/or
>>>> UI
>>>>>> tools to manage Ignite as a database).
>>>>>> 
>>>>>> What you offer is to group database objects *physically *rather
>> than
>>>>>> logically. It will lead to:
>>>>>> - Boilerplate in queries
>>>>>> - Inconvenient database management. All the things database users
>> are
>>>>> used
>>>>>> to - import/export tools, UIs, "USING" keyword, etc, will look
>> weird
>>> in
>>>>>> Ignite as there will be no way to group arbitrary objects
>> logically.
>>>>>> 
>>>>>> With this approach almost every user will have to use two schemes
>>>> instead
>>>>>> of one - one for operational data (PARTITIONED) and one for
>> reference
>>>>> data
>>>>>> (REPLICATED). No conventional database works this way.
>>>>>> 
>>>>>> Vladimir.
>>>>>> 
>>>>>> On Fri, Jan 13, 2017 at 9:18 PM, Dmitriy Setrakyan <
>>>>> [email protected]>
>>>>>> wrote:
>>>>>> 
>>>>>>> Vova,
>>>>>>> 
>>>>>>> I will join Sergi here. It seems like "schema = cache" will take
>>> care
>>>>> of
>>>>>>> all different configuration properties required for different
>>> groups
>>>> of
>>>>>>> caches. In addition, it cleanly maps into current Ignite
>>>> architecture.
>>>>> We
>>>>>>> will need to have a very strong reason to move away from it.
>>>>>>> 
>>>>>>> D.
>>>>>>> 
>>>>>>> On Fri, Jan 13, 2017 at 2:39 AM, Vladimir Ozerov <
>>>> [email protected]
>>>>>> 
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Correct, it worked, because Ignite has never had real database
>>> use
>>>>> case
>>>>>>> in
>>>>>>>> mind. Unfortunately, if our global plans go as expected, it
>> will
>>>> not
>>>>>> work
>>>>>>>> for Ignite 2.x+.
>>>>>>>> 
>>>>>>>> On Fri, Jan 13, 2017 at 11:53 AM, Sergi Vladykin <
>>>>>>> [email protected]
>>>>>>>>> 
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Lets move on with SQL schema == Ignite cache. It worked
>> always
>>>> like
>>>>>>>> this, I
>>>>>>>>> see no reasons to change this.
>>>>>>>>> 
>>>>>>>>> Sergi
>>>>>>>>> 
>>>>>>>>> 2017-01-13 11:20 GMT+03:00 Vladimir Ozerov <
>>> [email protected]
>>>>> :
>>>>>>>>> 
>>>>>>>>>> "Tablespace" (Oracle, PostgreSQL) is what maps better than
>>>>> "schema"
>>>>>>> to
>>>>>>>>> our
>>>>>>>>>> cache. But not ideally still.
>>>>>>>>>> 
>>>>>>>>>> On Fri, Jan 13, 2017 at 11:10 AM, Vladimir Ozerov <
>>>>>>>> [email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Alex,
>>>>>>>>>>> 
>>>>>>>>>>> Currently Ignite is not used as database. It is used as
>>>> search
>>>>>>>> engine -
>>>>>>>>>>> several types, several tables, several joins. This is why
>>>>> having
>>>>>>>>> "SCHEMA
>>>>>>>>>> ==
>>>>>>>>>>> cache" was never a problem. Users have never build
>> complex
>>>> SQL
>>>>>>>>>> applications
>>>>>>>>>>> on top of Ignite. But we are going towards database. And
>> my
>>>>>>> question
>>>>>>>>>> stands
>>>>>>>>>>> still - suppose it is Y2019, how is user going to migrate
>>> his
>>>>>>>> database
>>>>>>>>>>> containing 20-30-50-100 tables in a single schema in
>> Oracle
>>>> to
>>>>>>>> Ignite?
>>>>>>>>>>> 
>>>>>>>>>>> Single cache for all tables? Doens't work - not flexible.
>>>> Users
>>>>>>> will
>>>>>>>>>>> definitely require different cache modes, different
>>>> co-location
>>>>>>>> rules,
>>>>>>>>>>> different number of backups, etc..
>>>>>>>>>>> Schema per table? Doesn't work either - unmanageable and
>>> not
>>>>>>>> convenient
>>>>>>>>>>> for users even for relatively small databases.
>>>>>>>>>>> 
>>>>>>>>>>> From user perspective schema is logical grouping of
>>> database
>>>>>>> objects,
>>>>>>>>>>> nothing more.
>>>>>>>>>>> 
>>>>>>>>>>> For Ignite schema could be a logical group of resources
>>>> (nodes,
>>>>>>>> memory
>>>>>>>>>>> pools, caches, etc.). And multiple tables over multiple
>>>> caches
>>>>>>> should
>>>>>>>>>>> reside in it. To the contrast, table definition governs
>> how
>>>>> data
>>>>>> is
>>>>>>>>>> stored.
>>>>>>>>>>> This is similar to, for example, MySQL approach, where
>> you
>>>>> define
>>>>>>> how
>>>>>>>>> you
>>>>>>>>>>> store data on per-table level, and on schema level you
>>> define
>>>>>> only
>>>>>>>>> minor
>>>>>>>>>>> things like collation.
>>>>>>>>>>> 
>>>>>>>>>>> Vladimir.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Fri, Jan 13, 2017 at 10:33 AM, Alexander Paschenko <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Vova,
>>>>>>>>>>>> 
>>>>>>>>>>>> 2017-01-13 4:56 GMT+08:00 Vladimir Ozerov <
>>>>> [email protected]
>>>>>>> :
>>>>>>>>>>>>> I am not quite sure I understand the idea of "SCHEMA
>> ==
>>>>>> cache".
>>>>>>>>>> Consider
>>>>>>>>>>>>> some small database with, say, ~30 tables. And user
>>> wants
>>>> to
>>>>>>>> migrate
>>>>>>>>>> to
>>>>>>>>>>>>> Ignite. How is he supposed to do so? 30 schemas
>> leading
>>> to
>>>>>>> rewrite
>>>>>>>>> of
>>>>>>>>>>>> all
>>>>>>>>>>>>> his SQL scripts? Or 30 key-value pairs in a single
>> cache
>>>>>> leading
>>>>>>>> to
>>>>>>>>>>>> lack of
>>>>>>>>>>>>> flexibility and performance problems?
>>>>>>>>>>>> 
>>>>>>>>>>>> But currently schema *is* semantically equal to cache
>>> while
>>>>>> table
>>>>>>> is
>>>>>>>>>>>> equal to type descriptor (i.e. type of stored entities),
>>>>> nothing
>>>>>>> new
>>>>>>>>>>>> here.
>>>>>>>>>>>> 
>>>>>>>>>>>> Say, in single cache we may have entities of types
>> Person
>>>> and
>>>>>>>>>>>> Organization, those map to two tables with same names,
>> and
>>>> can
>>>>>> be
>>>>>>>>>>>> accessed within the same cache (i.e. schema).
>>>>>>>>>>>> 
>>>>>>>>>>>> If we want to limit the user with having single type
>>>>> descriptor
>>>>>>> per
>>>>>>>>>>>> cache (i.e. cache has only one type of stored entities -
>>>> BTW,
>>>>>>> where
>>>>>>>> we
>>>>>>>>>>>> are with this 2.0-wise?), then this notion could change.
>>> But
>>>>>>>> currently
>>>>>>>>>>>> what has been suggested already fits quite good with
>> what
>>> we
>>>>> do
>>>>>>> have
>>>>>>>>>>>> at the moment regarding semantic of SQL objects.
>>>>>>>>>>>> 
>>>>>>>>>>>> - Alex
>>>>>>>>>>>> 
>>>>>>>>>>>>> Another example is how to deal with referene tables?
>>> Lots
>>>>>>> database
>>>>>>>>> has
>>>>>>>>>>>>> small reference tables which is best to fit REPLICATED
>>>>> cache,
>>>>>>>> while
>>>>>>>>>>>> others
>>>>>>>>>>>>> are usually bound to PARTITIONED mode. "SCHEMA ==
>> cache"
>>>>> will
>>>>>>>> force
>>>>>>>>>>>> users
>>>>>>>>>>>>> to split them into separate schemes leading to poor
>> user
>>>>>>>> experience.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I understand that we may have some implementation
>>> details
>>>>>> around
>>>>>>>> it
>>>>>>>>> at
>>>>>>>>>>>> the
>>>>>>>>>>>>> moment. But from user perspective "SCHEMA == cache"
>>>> doesn't
>>>>>> make
>>>>>>>>>> sense.
>>>>>>>>>>>> As
>>>>>>>>>>>>> we are going towards AI 2.0 we'd better to rethink
>> this
>>>>>>> approach.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Thu, Jan 12, 2017 at 11:46 PM, Denis Magda <
>>>>>>> [email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Jan 12, 2017, at 12:35 PM, Dmitriy Setrakyan <
>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Thu, Jan 12, 2017 at 9:47 AM, Sergi Vladykin <
>>>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> The xml config was only for example. We can put in
>>>> this
>>>>>>>>>>>> configuration
>>>>>>>>>>>>>>>> string cache config parameters directly like this:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> CREATE SCHEMA "MyCacheName" WITH
>>>>>>>>>>>>>>>> "cacheMode=REPLICATED;atomicityMode=ATOMIC"
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> This approach makes sense, if it can be easily
>>>> supported
>>>>>> with
>>>>>>>> H2.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> What’s for affinity keys? Can we make an exception
>> for
>>>> them
>>>>>> by
>>>>>>>>>>>> defining in
>>>>>>>>>>>>>> this part of the statement
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> CREATE TABLE employee (
>>>>>>>>>>>>>>   id BIGINT PRIMARY KEY,
>>>>>>>>>>>>>>   dept_id BIGINT AFFINITY KEY,
>>>>>>>>>>>>>>   name VARCHAR(128),
>>>>>>>>>>>>>> );
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> or that l
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> CREATE TABLE employee (
>>>>>>>>>>>>>>   id BIGINT PRIMARY KEY,
>>>>>>>>>>>>>>   dept_id BIGINT,
>>>>>>>>>>>>>>   name VARCHAR(128),
>>>>>>>>>>>>>>   CONSTRAINT affKey AFFINITY KEY(dept_id)
>>>>>>>>>>>>>> );
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> ?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> —
>>>>>>>>>>>>>> Denis
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Sergey Kozlov
>>> GridGain Systems
>>> www.gridgain.com
>>> 
>>

Re: DDL implementation details

Reply via email to