Re: DML data streaming
If we adopt the table-per-cache policy, then the table name should be equal
to the cache name, especially when the table is created via SQL. For complex
types, the type name should also be equal to the table name. If the value
type is primitive, then you can still use the table name in SQL and use the
table name as the cache name in code.

In my view, the design works. Do you agree?

D.

On Thu, Feb 16, 2017 at 11:58 PM, Vladimir Ozerov wrote:

> Dima,
>
> Value type name doesn't necessarily map to table name. For instance, what
> if I have two tables like this? They both have "java.lang.Long" as type
> name.
>
> CREATE TABLE t1 (
>     pk_id BIGINT PRIMARY KEY,
>     val BIGINT
> )
>
> CREATE TABLE t2 (
>     pk_id BIGINT PRIMARY KEY,
>     val BIGINT
> )
Re: DML data streaming
Dima,

Value type name doesn't necessarily map to table name. For instance, what
if I have two tables like this? They both have "java.lang.Long" as type
name.

CREATE TABLE t1 (
    pk_id BIGINT PRIMARY KEY,
    val BIGINT
)

CREATE TABLE t2 (
    pk_id BIGINT PRIMARY KEY,
    val BIGINT
)

On Fri, Feb 17, 2017 at 12:40 AM, Dmitriy Setrakyan wrote:

> Vladimir, I am not sure I understand your point. The value type name
> should be the table name, no?
Re: DML data streaming
Vladimir, I am not sure I understand your point. The value type name should
be the table name, no?

On Thu, Feb 16, 2017 at 12:13 AM, Vladimir Ozerov wrote:

> Dima,
>
> At this point we require the following additional data which is outside
> of standard SQL:
> - Key type
> - Value type
> - Set of key columns
>
> I do not know yet how we will define these values. At the very least we
> can calculate them automatically in some cases. For "keyFieldName" and
> "valFieldName" things are easier, as we can always derive them from the
> table definition.
>
> Example 1 - primitives:
>
> CREATE TABLE (
>     pk_id BIGINT PRIMARY KEY,
>     val BIGINT
> )
>
> keyFieldName = "pk_id", valFieldName = "val"
>
> Example 2 - composites:
>
> CREATE TABLE (
>     pk_id BIGINT PRIMARY KEY,
>     val1 BIGINT,
>     val2 VARCHAR
> )
>
> keyFieldName = "pk_id", valFieldName = null (because the value is complex
> and is composed of two attributes).
>
> Vladimir.
Re: DML data streaming
Dima,

At this point we require the following additional data which is outside of
standard SQL:
- Key type
- Value type
- Set of key columns

I do not know yet how we will define these values. At the very least we can
calculate them automatically in some cases. For "keyFieldName" and
"valFieldName" things are easier, as we can always derive them from the
table definition.

Example 1 - primitives:

CREATE TABLE (
    pk_id BIGINT PRIMARY KEY,
    val BIGINT
)

keyFieldName = "pk_id", valFieldName = "val"

Example 2 - composites:

CREATE TABLE (
    pk_id BIGINT PRIMARY KEY,
    val1 BIGINT,
    val2 VARCHAR
)

keyFieldName = "pk_id", valFieldName = null (because the value is complex
and is composed of two attributes).

Vladimir.

On Wed, Feb 15, 2017 at 11:42 PM, Dmitriy Setrakyan wrote:

> Vladimir, how will it work from the DDL perspective, say, whenever a user
> wants to create a table in Ignite?
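The derivation rule in the two examples above can be sketched in plain Java. This is only an illustration, not Ignite code: ColumnDef and FieldNameDerivation are hypothetical names, the input is assumed to be an already-parsed column list, and returning null for a composite key is an assumption, since the examples only cover a single key column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Derives keyFieldName and valFieldName the way the two examples do (hypothetical helper). */
final class FieldNameDerivation {
    /** Returns the column name if exactly one column has the given role, otherwise null. */
    private static String singleOrNull(List<ColumnDef> cols, boolean primaryKey) {
        List<String> names = new ArrayList<>();
        for (ColumnDef c : cols) {
            if (c.primaryKey == primaryKey)
                names.add(c.name);
        }
        return names.size() == 1 ? names.get(0) : null;
    }

    /** Single PRIMARY KEY column gives its name; a composite key gives null (assumption). */
    static String keyFieldName(List<ColumnDef> cols) {
        return singleOrNull(cols, true);
    }

    /** Single non-key column gives its name; several value columns give null. */
    static String valFieldName(List<ColumnDef> cols) {
        return singleOrNull(cols, false);
    }

    public static void main(String[] args) {
        List<ColumnDef> primitives = Arrays.asList(
            new ColumnDef("pk_id", true), new ColumnDef("val", false));
        List<ColumnDef> composites = Arrays.asList(
            new ColumnDef("pk_id", true), new ColumnDef("val1", false),
            new ColumnDef("val2", false));

        System.out.println(keyFieldName(primitives) + ", " + valFieldName(primitives));
        System.out.println(keyFieldName(composites) + ", " + valFieldName(composites));
    }
}

/** One column of an already-parsed CREATE TABLE statement (hypothetical type). */
final class ColumnDef {
    final String name;
    final boolean primaryKey;

    ColumnDef(String name, boolean primaryKey) {
        this.name = name;
        this.primaryKey = primaryKey;
    }
}
```

Key type, value type and the set of key columns are not covered by this sketch; the message leaves open how those would be defined.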
Re: DML data streaming
On Wed, Feb 15, 2017 at 2:41 PM, Alexander Paschenko
<alexander.a.pasche...@gmail.com> wrote:

> Folks,
>
> Regarding INSERT semantics in JDBC DML streaming mode - I've left only
> INSERT support, as we'd agreed before.
>
> However, the current architecture of the streaming-related internals does
> not give any clear way to intercept key duplicates and inform the user -
> say, I can't just throw an exception from the stream receiver (which is,
> to my knowledge, the only place where we could filter erroneous keys), as
> it will lead to a whole-batch remap, and that's clearly not what we want
> here.
>
> Printing a warning to the log from the receiver is of little to no use,
> as it will happen on data nodes, so the end user won't see anything.

However, you still must do it. You should try throttling the identical log
messages, so we don't flood the log.

> What I've introduced for now is an optional config param that turns on
> allowOverwrite on the streamer used in the DML operation.

Agree, sounds like a good use of the flag. Are you setting it via a
JDBC/ODBC connection flag?

> Does anyone have any thoughts about what could/should be done regarding
> informing the user about key duplicates in streaming mode? Or should we
> probably just let it be as it is now?

In my view, we should introduce some generic error trap callback, e.g.
onSqlError(...), for all unhandled SQL errors. The user should provide it in
the configuration, before startup. What do you think?

> Regards,
> Alex
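One way to read the "throttle identical log messages" suggestion above is a small per-message rate limiter in front of the logger. The sketch below is plain Java and makes several assumptions: ThrottledWarnings and shouldLog are hypothetical names, the interval is arbitrary, and the caller passes the clock value in so the behaviour is easy to check. It is not how Ignite's own logging works.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: lets an identical message through at most once per interval. */
final class ThrottledWarnings {
    private final long intervalMillis;

    /** Last time each distinct message was let through; grows with the number of distinct messages. */
    private final Map<String, Long> lastLogged = new HashMap<>();

    ThrottledWarnings(long intervalMillis) {
        this.intervalMillis = intervalMillis;
    }

    /** Returns true if the message should be written to the log at time nowMillis. */
    synchronized boolean shouldLog(String msg, long nowMillis) {
        Long last = lastLogged.get(msg);
        if (last != null && nowMillis - last < intervalMillis)
            return false; // Same message was logged too recently, suppress it.
        lastLogged.put(msg, nowMillis);
        return true;
    }

    public static void main(String[] args) {
        ThrottledWarnings throttle = new ThrottledWarnings(1000);
        String msg = "Duplicate key skipped by streamer";
        // Simulated clock: only some of these attempts reach the log.
        for (long t = 0; t <= 2000; t += 500) {
            if (throttle.shouldLog(msg, t))
                System.out.println(t + " ms: " + msg);
        }
    }
}
```

A receiver on a data node would call shouldLog with the current time before writing the warning, so repeated duplicates produce one line per interval instead of one line per key.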
Re: DML data streaming
Folks,

Regarding INSERT semantics in JDBC DML streaming mode - I've left only
INSERT support, as we'd agreed before.

However, the current architecture of the streaming-related internals does
not give any clear way to intercept key duplicates and inform the user -
say, I can't just throw an exception from the stream receiver (which is, to
my knowledge, the only place where we could filter erroneous keys), as it
will lead to a whole-batch remap, and that's clearly not what we want here.

Printing a warning to the log from the receiver is of little to no use, as
it will happen on data nodes, so the end user won't see anything.

What I've introduced for now is an optional config param that turns on
allowOverwrite on the streamer used in the DML operation.

Does anyone have any thoughts about what could/should be done regarding
informing the user about key duplicates in streaming mode? Or should we
probably just let it be as it is now?

Regards,
Alex

2017-02-15 23:42 GMT+03:00 Dmitriy Setrakyan:

> Vladimir, how will it work from the DDL perspective, say, whenever a user
> wants to create a table in Ignite?
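To make the duplicate-key question above concrete, here is a toy model of the two behaviours being discussed. It is plain Java over a HashMap, not the Ignite data streamer: ToyStreamer, addData and skippedDuplicates are hypothetical names, and the skipped-duplicate counter is an added assumption that only illustrates the information a user would need to be told about.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy stand-in for a streamer; models only the overwrite flag, nothing else. */
final class ToyStreamer<K, V> {
    private final Map<K, V> cache;
    private final boolean allowOverwrite;
    private int skippedDuplicates;

    ToyStreamer(Map<K, V> cache, boolean allowOverwrite) {
        this.cache = cache;
        this.allowOverwrite = allowOverwrite;
    }

    /** Adds one entry. Values are assumed to be non-null. */
    void addData(K key, V val) {
        if (allowOverwrite) {
            cache.put(key, val); // Existing value is silently replaced.
            return;
        }
        // Keep the existing value and remember that a duplicate was dropped.
        if (cache.putIfAbsent(key, val) != null)
            skippedDuplicates++;
    }

    int skippedDuplicates() {
        return skippedDuplicates;
    }

    public static void main(String[] args) {
        Map<Integer, String> cache = new HashMap<>();
        cache.put(1, "old");

        ToyStreamer<Integer, String> strict = new ToyStreamer<>(cache, false);
        strict.addData(1, "new");
        strict.addData(2, "fresh");
        System.out.println(cache + ", skipped=" + strict.skippedDuplicates());

        ToyStreamer<Integer, String> overwriting = new ToyStreamer<>(cache, true);
        overwriting.addData(1, "new");
        System.out.println(cache);
    }
}
```

In the model, INSERT-only semantics corresponds to the flag being off, where a duplicate key is dropped without the caller noticing unless something like the counter is surfaced.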
Re: DML data streaming
On Wed, Feb 15, 2017 at 4:28 AM, Vladimir Ozerov wrote:

> Ok, let's put aside current fields configuration, I'll create separate
> thread for it. As far as _KEY and _VAL, proposed change is exactly about
> mappings:
>
> class QueryEntity {
>     ...
>     String keyFieldName;
>     String valFieldName;
>     ...
> }
>
> The key thing is that we will not require users to be aware of our system
> columns. Normally user should not bother about existence of hidden _KEY
> and _VAL columns. Instead, we just allow them to optionally reference the
> whole key and/or val through predefined name.

Vladimir, how will it work from the DDL perspective, say, whenever a user
wants to create a table in Ignite?
Re: DML data streaming
Vladimir,

Looks good to me.

Pavel,

No worries, it will work exactly like you described: hidden _key and _val
fields will always be accessible.

Sergi

2017-02-15 15:56 GMT+03:00 Pavel Tupitsyn:

> I have no particular opinion on how we should handle _key/_val,
> but we certainly need a way to select entire key and value objects via
> SqlFieldsQuery, and this should work without any additional
> configuration.
>
> We can rename these, turn them into system functions, whatever.
>
> Ignite.NET LINQ provider heavily relies on this possibility - users often
> want to select the entire entry value.
Re: DML data streaming
I have no particular opinion on how we should handle _key/_val, but we
certainly need a way to select entire key and value objects via
SqlFieldsQuery, and this should work without any additional configuration.

We can rename these, turn them into system functions, whatever.

Ignite.NET LINQ provider heavily relies on this possibility - users often
want to select the entire entry value.

On Wed, Feb 15, 2017 at 3:28 PM, Vladimir Ozerov wrote:

> Ok, let's put aside current fields configuration, I'll create separate
> thread for it. As far as _KEY and _VAL, proposed change is exactly about
> mappings:
>
> class QueryEntity {
>     ...
>     String keyFieldName;
>     String valFieldName;
>     ...
> }
>
> The key thing is that we will not require users to be aware of our system
> columns. Normally user should not bother about existence of hidden _KEY
> and _VAL columns. Instead, we just allow them to optionally reference the
> whole key and/or val through predefined name.
Re: DML data streaming
Ok, let's put aside the current fields configuration, I'll create a separate
thread for it. As far as _KEY and _VAL are concerned, the proposed change is
exactly about mappings:

class QueryEntity {
    ...
    String keyFieldName;
    String valFieldName;
    ...
}

The key thing is that we will not require users to be aware of our system
columns. Normally a user should not have to care about the existence of the
hidden _KEY and _VAL columns. Instead, we just allow them to optionally
reference the whole key and/or val through a predefined name.

On Wed, Feb 15, 2017 at 3:07 PM, Sergi Vladykin wrote:

> I don't see any improvement here. Usability will only suffer with this
> change.
>
> I'd suggest just adding a mapping for system columns like _key, _val,
> _ver.
>
> Sergi
Re: DML data streaming
I don't see any improvement here. Usability will only suffer with this
change.

I'd suggest just adding a mapping for system columns like _key, _val, _ver.

Sergi

2017-02-15 13:18 GMT+03:00 Vladimir Ozerov:

> I think the whole QueryEntity class requires rework to allow for this
> change. I would start with creating a QueryField class which will
> encapsulate all field properties which are currently set through
> different setters:
>
> class QueryField {
>     String name;
>     String type;
>     String alias;
>     boolean keyField;
> }
>
> class QueryEntity {
>     String tableName;
>     String keyType;
>     String valType;
>     Collection<QueryField> fields;
>     Collection<QueryIndex> indexes;
> }
>
> Then we can add optional key and value field names to top-level config.
> If set, key and/or value will have names and will be included into
> SELECT * query in the same way as we do this for _KEY and _VAL at the
> moment:
>
> class QueryEntity {
>     String tableName;
>     String keyType;
>     String valType;
>     String keyFieldName; // new
>     String valFieldName; // new
>     Collection<QueryField> fields;
>     Collection<QueryIndex> indexes;
> }
>
> Any other ideas?
Re: DML data streaming
I think the whole QueryEntity class requires rework to allow for this
change. I would start with creating a QueryField class which will
encapsulate all field properties which are currently set through different
setters:

class QueryField {
    String name;
    String type;
    String alias;
    boolean keyField;
}

class QueryEntity {
    String tableName;
    String keyType;
    String valType;
    Collection<QueryField> fields;
    Collection<QueryIndex> indexes;
}

Then we can add optional key and value field names to top-level config. If
set, key and/or value will have names and will be included into SELECT *
query in the same way as we do this for _KEY and _VAL at the moment:

class QueryEntity {
    String tableName;
    String keyType;
    String valType;
    String keyFieldName; // new
    String valFieldName; // new
    Collection<QueryField> fields;
    Collection<QueryIndex> indexes;
}

Any other ideas?

On Tue, Feb 14, 2017 at 9:19 PM, Dmitriy Setrakyan wrote:

> Vova,
>
> Agree about the primitive types. However, it is not clear to me how the
> mapping from a primitive type to a column name will be supported. Do you
> have a design in mind?
>
> D.
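A compilable reading of the QueryField / QueryEntity proposal from this message might look like the sketch below. It is an illustration only: the class names carry a "Sketch" suffix to make clear they are not the Ignite classes, selectStarColumns is a hypothetical method, and the column order (named key and value first, then the declared fields, without repeats) is an assumption the thread does not specify.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Sketch of the proposed entity config with optional key/value column names. */
final class QueryEntitySketch {
    String tableName;
    String keyType;
    String valType;
    String keyFieldName; // Optional: name under which the whole key is exposed.
    String valFieldName; // Optional: name under which the whole value is exposed.
    final List<QueryFieldSketch> fields = new ArrayList<>();

    /** Columns a SELECT * would list, under the ordering assumption stated above. */
    List<String> selectStarColumns() {
        Set<String> cols = new LinkedHashSet<>(); // Keeps order, drops repeats.
        if (keyFieldName != null)
            cols.add(keyFieldName);
        if (valFieldName != null)
            cols.add(valFieldName);
        for (QueryFieldSketch f : fields)
            cols.add(f.alias != null ? f.alias : f.name);
        return new ArrayList<>(cols);
    }

    public static void main(String[] args) {
        QueryEntitySketch e = new QueryEntitySketch();
        e.tableName = "t";
        e.keyType = "java.lang.Long";
        e.keyFieldName = "pk_id";
        e.fields.add(new QueryFieldSketch("val1", "java.lang.Long", null, false));
        e.fields.add(new QueryFieldSketch("val2", "java.lang.String", null, false));
        System.out.println(e.selectStarColumns());
    }
}

/** Sketch of the proposed per-field descriptor. */
final class QueryFieldSketch {
    final String name;
    final String type;
    final String alias;
    final boolean keyField;

    QueryFieldSketch(String name, String type, String alias, boolean keyField) {
        this.name = name;
        this.type = type;
        this.alias = alias;
        this.keyField = keyField;
    }
}
```

With keyFieldName and valFieldName left unset, the sketch lists only the declared fields, which matches the idea that users need not know about the hidden system columns.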
Re: DML data streaming
Vova,

Agree about the primitive types. However, it is not clear to me how the mapping from a primitive type to a column name will be supported. Do you have a design in mind?

D.

On Tue, Feb 14, 2017 at 6:16 AM, Vladimir Ozerov wrote:
> Dima,
>
> This will not work for primitive keys and values as currently the only way
> to address them is to use "_KEY" and "_VAL" aliases respectively. For this
> reason I would rather postpone UPDATE/DELETE implementation until "_KEY"
> and "_VAL" are hidden from public API and some kind of mapping is
> introduced. AFAIK this should be handled as a part of IGNITE-3487 [1].
>
> [1] https://issues.apache.org/jira/browse/IGNITE-3487
Re: DML data streaming
Dima,

This will not work for primitive keys and values, as currently the only way to address them is to use the "_KEY" and "_VAL" aliases respectively. For this reason I would rather postpone the UPDATE/DELETE implementation until "_KEY" and "_VAL" are hidden from the public API and some kind of mapping is introduced. AFAIK this should be handled as part of IGNITE-3487 [1].

[1] https://issues.apache.org/jira/browse/IGNITE-3487

On Sat, Feb 11, 2017 at 3:36 AM, Dmitriy Setrakyan wrote:
> On Fri, Feb 10, 2017 at 3:36 AM, Vladimir Ozerov wrote:
>
> > I propose to ship streaming with INSERT support only for now. This is
> > enough for multitude cases and will add value to Ignite 1.9 immediately.
> > We can think about correct streaming UPDATE/DELETE architecture
> > separately. It is much more difficult thing, we cannot support it in a
> > clean way right now due to multiple "_key" and "_val" usages over the
> > code base.
>
> Vova, I disagree. If all parts of the key are present, then we can always
> construct a key in all cases. For these operations we can always support
> streaming. For all other operations, we can delegate to standard MR, but
> still perform most operations on the same node, as I suggested in another
> email.
Re: DML data streaming
On Fri, Feb 10, 2017 at 3:36 AM, Vladimir Ozerov wrote:
> I propose to ship streaming with INSERT support only for now. This is
> enough for multitude cases and will add value to Ignite 1.9 immediately. We
> can think about correct streaming UPDATE/DELETE architecture separately. It
> is much more difficult thing, we cannot support it in a clean way right now
> due to multiple "_key" and "_val" usages over the code base.

Vova, I disagree. If all parts of the key are present, then we can always construct a key in all cases. For these operations we can always support streaming. For all other operations, we can delegate to standard MR, but still perform most operations on the same node, as I suggested in another email.
Re: DML data streaming
On Fri, Feb 10, 2017 at 12:55 AM, Alexander Paschenko <alexander.a.pasche...@gmail.com> wrote:
> And to avoid further confusion: UPDATE and DELETE are simply
> impossible in streaming mode when the key is not completely defined as
> long as data streamer operates with key-value pairs and not just
> tuples of named values. That's why we can't do DELETE from Person
> WHERE id1 = 5 from prev example with streamer - the Key { id1 = 5, id2
> = 0 } that would be constructed from such query is just one key and is
> handled by streamer as such while semantically that query is not about
> ONE key but about ALL keys where id1 = 5.

I completely agree. However, we should still optimize the MR here, since the keys selected from one table (or cache) will probably end up on the same node as the same keys inserted, updated, or deleted in another cache, so these operations will likely still be local to the node.
Re: DML data streaming
On Fri, Feb 10, 2017 at 12:49 AM, Alexander Paschenko <alexander.a.pasche...@gmail.com> wrote:
> Dima,
>
> > There are several ways to handle it. I would check how other databases
> > handle it, maybe we can borrow something. To the least, we should log
> > such errors in the log for now.
>
> Logging errors would mean introducing some kind of stream receiver to
> do that and thus that would be really the same performance penalty for
> the successful operations. I think we should go with that optional
> flag for semantics after all.

I am OK with introducing some error trap and plugging it into the configuration (maybe some interface with an onError(...) callback). However, we should never swallow errors; we should always print all errors to the log. Let's not worry about performance in case of errors.

> > You don't have to use _key. Primary key is usually a field in the class,
> > so you can use a normal column name. In any case, we should remove any
> > usage of _key before 2.0 is released.
> >
> > Again, if user does not have to specify _key on INSERT, then it is very
> > unclear to me, why user would need to specify _key for UPDATE or DELETE.
> > Something smells here. Can you please provide an example?
>
> UPDATE and DELETE _in streaming mode_ are carried _only_ for "fast"
> optimized cases - i.e. those where _key (and possibly _val) are
> explicitly specified by the user thus allowing us to map UPDATE and
> DELETE directly to cache's replace and remove operations without
> messing with entry processors and doing map-reduce SELECT by given
> criteria.
>
> Say, we have Person { firstName, secondName } with key class Key { id1, id2 }
>
> If I say DELETE from Person WHERE _key = ? and specify arg via JDBC,
> there's no need to do any SELECT - we can just call IgniteCache.remove
> on that key.
>
> But if I say DELETE from Person WHERE id1 = 5 then there's no way to
> avoid MR - we have to find all keys that interest us first by doing
> SELECT as long as we know only partly about what keys the user wants
> to be affected.
>
> It works in the same way for UPDATE. And I hope that it's clear how
> it's different from INSERT - there's no MR by definition (we don't
> allow INSERT FROM SELECT in streaming mode).

Do we allow INSERT from SELECT in non-streaming mode?

> AGAIN: this all is said only about streaming mode; non streaming mode
> does those optimizations too, but it also allows complex conditions,
> while streaming mode does not allow them to keep things fast and avoid
> MR.
>
> That's the reason why I suggest that we drop UPDATE and DELETE from
> DML streaming as they mean messing with those soon-hidden columns.
>
> Still we could optimize stuff like DELETE from Person WHERE id1 = 5
> AND id2 = 6 - query involves ALL fields of key AND compares only for
> equality AND has no complex expressions - we can construct key
> unambiguously and still call remove directly.

Exactly my point. If all key fields are present, we can construct the key ourselves and still delegate to cache.put(..) or cache.remove(..). For all cases where not all of the key fields are present, we should do regular MR. I am assuming that this applies to UPDATE and DELETE operations. My vote is to implement this functionality.

> But to me it does not sound like a really great reason to leave UPDATE
> and DELETE in DML - the users will have to write some specific queries
> to use that while all other stuff will just be declined in that mode.
> And, as I said before, UPDATE and DELETE don't probably perfectly fit
> with primary data streamer use cases - after all, modifying existing
> stuff is not what data streamer is about.

I am not sure what this means. We have to work in the same way as regular RDBMS systems. I would not try to reinvent the wheel here. All UPDATE, DELETE, and INSERT operations should be part of DML.

> And regarding hiding columns: it's unclear how things will look like
> for caches like when we remove _key and _val as long as
> tables for such cases currently have nothing but those two columns.

Again, think about standard RDBMS systems. None of them have _key or _val, and therefore neither should we.
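The rule discussed in this exchange (construct the key directly only when the WHERE clause pins every key field by equality, otherwise fall back to the regular SELECT-based path) can be sketched in plain Java. This is only a model under stated assumptions: KeyFromWhereSketch and tryBuildKey are hypothetical names, the key is represented as a map of field names to values, and the WHERE clause is assumed to be already parsed into equality conditions.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

class KeyFromWhereSketch {
    /**
     * Returns the complete key when the equality conditions cover exactly the
     * key fields; otherwise returns empty, meaning the statement cannot be
     * mapped to a single remove/replace and needs the SELECT-based path.
     */
    static Optional<Map<String, Object>> tryBuildKey(List<String> keyFields,
                                                     Map<String, Object> equalityConds) {
        if (!equalityConds.keySet().equals(new HashSet<>(keyFields)))
            return Optional.empty();

        // Keep the declared key field order in the constructed key.
        Map<String, Object> key = new LinkedHashMap<>();
        for (String f : keyFields)
            key.put(f, equalityConds.get(f));
        return Optional.of(key);
    }

    public static void main(String[] args) {
        List<String> keyFields = Arrays.asList("id1", "id2");

        // DELETE FROM Person WHERE id1 = 5 AND id2 = 6
        Map<String, Object> full = new HashMap<>();
        full.put("id1", 5);
        full.put("id2", 6);

        // DELETE FROM Person WHERE id1 = 5
        Map<String, Object> partial = new HashMap<>();
        partial.put("id1", 5);

        System.out.println(tryBuildKey(keyFields, full).isPresent());    // true
        System.out.println(tryBuildKey(keyFields, partial).isPresent()); // false
    }
}
```

The partial case returns empty rather than a key with id2 defaulted to 0, which is exactly the mistake the thread warns about: a half-specified key names one entry, while the query means all entries with id1 = 5.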
Re: DML data streaming
In general, the data streamer approach should be mostly used for data loading scenarios. The data is usually loaded with INSERTs, which means that the scenario is already supported and we're free to merge the changes to 1.9.

If you UPDATE or DELETE data in streaming mode, then you are required to set dataStreamer.allowOverwrite = true, making sure that the updates coming from the streamer side are consistent with transactions that might be executed in parallel. In this mode the streamer switches to a slower mode, pushing the data with the cache.writeAll() and cache.removeAll() methods.

Overall, considering real-life use cases, it's more than enough to support streaming mode for INSERTs only and describe it properly in the documentation.

- Denis

> On Feb 10, 2017, at 3:36 AM, Vladimir Ozerov wrote:
>
> I propose to ship streaming with INSERT support only for now. This is
> enough for multitude cases and will add value to Ignite 1.9 immediately. We
> can think about correct streaming UPDATE/DELETE architecture separately. It
> is much more difficult thing, we cannot support it in a clean way right now
> due to multiple "_key" and "_val" usages over the code base.
Re: DML data streaming
I propose to ship streaming with INSERT support only for now. This is enough for a multitude of cases and will add value to Ignite 1.9 immediately. We can think about a correct streaming UPDATE/DELETE architecture separately. It is a much more difficult thing; we cannot support it in a clean way right now due to multiple "_key" and "_val" usages over the code base.

On Fri, Feb 10, 2017 at 11:55 AM, Alexander Paschenko <alexander.a.pasche...@gmail.com> wrote:
> And to avoid further confusion: UPDATE and DELETE are simply
> impossible in streaming mode when the key is not completely defined as
> long as data streamer operates with key-value pairs and not just
> tuples of named values. That's why we can't do DELETE from Person
> WHERE id1 = 5 from prev example with streamer - the Key { id1 = 5, id2
> = 0 } that would be constructed from such query is just one key and is
> handled by streamer as such while semantically that query is not about
> ONE key but about ALL keys where id1 = 5.
>
> - Alex
Re: DML data streaming
And to avoid further confusion: UPDATE and DELETE are simply impossible in streaming mode when the key is not completely defined, as long as the data streamer operates with key-value pairs and not just tuples of named values. That's why we can't do DELETE from Person WHERE id1 = 5 from the previous example with the streamer: the Key { id1 = 5, id2 = 0 } that would be constructed from such a query is just one key and is handled by the streamer as such, while semantically that query is not about ONE key but about ALL keys where id1 = 5.

- Alex
Re: DML data streaming
Dima,

> There are several ways to handle it. I would check how other databases
> handle it, maybe we can borrow something. To the least, we should log such
> errors in the log for now.

Logging errors would mean introducing some kind of stream receiver to do that, and thus it would incur the same performance penalty for the successful operations. I think we should go with that optional flag for semantics after all.

> You don't have to use _key. Primary key is usually a field in the class, so
> you can use a normal column name. In any case, we should remove any usage
> of _key before 2.0 is released.
>
> Again, if user does not have to specify _key on INSERT, then it is very
> unclear to me, why user would need to specify _key for UPDATE or DELETE.
> Something smells here. Can you please provide an example?

UPDATE and DELETE _in streaming mode_ are carried out _only_ for "fast" optimized cases, i.e. those where _key (and possibly _val) are explicitly specified by the user, thus allowing us to map UPDATE and DELETE directly to the cache's replace and remove operations without messing with entry processors and doing a map-reduce SELECT by the given criteria.

Say, we have Person { firstName, secondName } with key class Key { id1, id2 }

If I say DELETE from Person WHERE _key = ? and specify the arg via JDBC, there's no need to do any SELECT - we can just call IgniteCache.remove on that key.

But if I say DELETE from Person WHERE id1 = 5 then there's no way to avoid MR - we have to find all keys that interest us first by doing a SELECT, as long as we know only part of the keys the user wants to be affected.

It works in the same way for UPDATE. And I hope that it's clear how it's different from INSERT - there's no MR by definition (we don't allow INSERT FROM SELECT in streaming mode).

AGAIN: all of this is said only about streaming mode; non-streaming mode does those optimizations too, but it also allows complex conditions, while streaming mode does not allow them, to keep things fast and avoid MR.

That's the reason why I suggest that we drop UPDATE and DELETE from DML streaming, as they mean messing with those soon-hidden columns.

Still we could optimize stuff like DELETE from Person WHERE id1 = 5 AND id2 = 6 - the query involves ALL fields of the key AND compares only for equality AND has no complex expressions - we can construct the key unambiguously and still call remove directly.

But to me it does not sound like a really great reason to leave UPDATE and DELETE in DML - the users will have to write some specific queries to use that, while all other stuff will just be declined in that mode. And, as I said before, UPDATE and DELETE probably don't fit perfectly with the primary data streamer use cases - after all, modifying existing stuff is not what the data streamer is about.

And regarding hiding columns: it's unclear how things will look for caches like when we remove _key and _val, as long as tables for such cases currently have nothing but those two columns.

- Alex
Re: DML data streaming
On Thu, Feb 9, 2017 at 1:53 AM, Alexander Paschenko <alexander.a.pasche...@gmail.com> wrote:
> Sergey,
>
> Streaming does not make sense for INSERT FROM SELECT as this pattern does
> not match primary use case for streaming (bulk data load to Ignite).
>
> Dima,
>
> No, I suggest that data streamer mode supports full semantic sense of
> INSERT (throw an ex if there's a duplicate of PK) optionally and depending
> on a flag (that is to be introduced). Currently new records are quietly
> ignored on key duplication - it's really just a question of notifying the
> user about duplicate keys in streaming mode.

There are several ways to handle it. I would check how other databases handle it; maybe we can borrow something. At the least, we should log such errors for now.

> Update by primary key is implemented now, but obviously it involves user
> messing with _key column that we're planning to hide from them in near
> future.

You don't have to use _key. The primary key is usually a field in the class, so you can use a normal column name. In any case, we should remove any usage of _key before 2.0 is released.

> Streaming is turned on via the flag, just as we've agreed in one of prev
> threads. This thread is not about how we turn streaming on but rather about
> semantic correctness of INSERT and MERGE in this mode and about whether we
> need UPDATE and DELETE in it as they do not essentially load new data into
> cache and (_in streaming mode_) make user mess with service columns of _key
> and _val.

Again, if the user does not have to specify _key on INSERT, then it is very unclear to me why the user would need to specify _key for UPDATE or DELETE. Something smells here. Can you please provide an example?

> - Alex
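The two positions in this exchange (an optional flag that makes a duplicate-key INSERT fail, and a rule that duplicates are never silently swallowed) can be combined in a toy model. The sketch below uses a plain HashMap in place of a cache and is not based on the IgniteDataStreamer API; StreamingInsertSketch, the strict flag and the onDuplicate callback are all hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

class StreamingInsertSketch {
    /** Stand-in for the cache; a real streamer would write to the grid. */
    final Map<Long, String> cache = new HashMap<>();

    private final boolean strict;                       // hypothetical flag
    private final BiConsumer<Long, String> onDuplicate; // hypothetical error trap

    StreamingInsertSketch(boolean strict, BiConsumer<Long, String> onDuplicate) {
        this.strict = strict;
        this.onDuplicate = onDuplicate;
    }

    /** Returns true if the row was stored; an existing row is never overwritten. */
    boolean insert(long key, String val) {
        if (cache.putIfAbsent(key, val) == null)
            return true;

        // The duplicate is always reported, so it is never lost silently.
        onDuplicate.accept(key, val);

        if (strict)
            throw new IllegalStateException("Duplicate key: " + key);
        return false;
    }

    public static void main(String[] args) {
        List<Long> rejected = new ArrayList<>();
        StreamingInsertSketch lenient =
            new StreamingInsertSketch(false, (k, v) -> rejected.add(k));

        lenient.insert(1L, "a");
        lenient.insert(1L, "b"); // duplicate: skipped, but reported

        System.out.println(lenient.cache.get(1L) + " " + rejected);
    }
}
```

The model leaves out the cost argument made in the thread: detecting the duplicate in a real streamer is said to require a receiver on the update path, which is why the strict behaviour is proposed as opt-in.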
Re: DML data streaming
Sergey,

Streaming does not make sense for INSERT FROM SELECT, as this pattern does not match the primary use case for streaming (bulk data load into Ignite).

Dima,

No, I suggest that data streamer mode optionally support the full semantics of INSERT (throw an exception if there's a duplicate PK), depending on a flag that is yet to be introduced. Currently, new records are quietly ignored on key duplication; it's really just a question of notifying the user about duplicate keys in streaming mode.

Update by primary key is implemented now, but obviously it involves the user messing with the _key column, which we're planning to hide from them in the near future.

Streaming is turned on via the flag, just as we agreed in one of the previous threads. This thread is not about how we turn streaming on, but rather about the semantic correctness of INSERT and MERGE in this mode, and about whether we need UPDATE and DELETE in it, as they do not essentially load new data into the cache and (_in streaming mode_) make the user mess with the service columns _key and _val.

- Alex

On Feb 8, 2017, 11:33 PM, "Dmitriy Setrakyan" <dsetrak...@apache.org> wrote:

> Alexander,
>
> Are you suggesting that currently, to execute a simple INSERT for 1 row, we invoke a data streamer on the Ignite API? How about an update by a primary key? Why not execute a simple cache put in either case?
>
> I think we had a separate thread where we agreed that the streamer should only be turned on if a certain flag on a JDBC connection is set, no?
>
> D.
>
> On Wed, Feb 8, 2017 at 7:00 AM, Alexander Paschenko <alexander.a.pasche...@gmail.com> wrote:
>
> > Hello Igniters,
> >
> > I'd like to raise a few questions regarding data streaming via DML statements.
> >
> > Currently, all types of DML statements are supported (INSERT, UPDATE, DELETE, MERGE).
> >
> > UPDATE and DELETE are supported in streaming mode only when their WHERE condition is bound to the _key and/or _val columns, and UPDATE works only for the _val column directly.
> >
> > Seeing some activity in the direction of hiding _key and _val from the user as far as possible, these features seem pointless and should not be released. What do you think?
> >
> > Also, INSERT in streaming mode currently does not throw errors on duplicate keys and silently ignores such new records (since that is faster than it would be if we introduced a receiver that throws exceptions). This can be fixed with an additional flag that could _optionally_ make INSERT slower but more semantically accurate.
> >
> > And MERGE in streaming mode is currently not totally accurate semantically either: on key presence, it just replaces the whole value with the new one, potentially losing the values of some columns/fields. This is analogous to https://issues.apache.org/jira/browse/IGNITE-4489, but can hardly be fixed, as it would probably hurt performance and would be unreasonably complex to implement.
> >
> > I suggest that we drop everything except INSERT and introduce an optional flag for its fully correct semantic behavior, as described above.
> >
> > - Alex
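The optional flag proposed in this message can be illustrated with a toy model. This is only a sketch under stated assumptions: it uses a plain HashMap rather than any Ignite API, and the class name `StreamingInsertSketch` and the flag name `strictInsert` are hypothetical. It shows the two behaviors being compared: silently skipping a duplicate key versus throwing on it.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model only; not Ignite code. All names here are hypothetical. */
class StreamingInsertSketch {
    private final Map<Long, String> cache = new HashMap<>();
    private final boolean strictInsert; // hypothetical flag from the discussion
    private int skipped;                // duplicates silently ignored so far

    StreamingInsertSketch(boolean strictInsert) {
        this.strictInsert = strictInsert;
    }

    /** Inserts a row; an existing key is either skipped or rejected. */
    void insert(long key, String val) {
        // putIfAbsent returns the existing value when the key is already present.
        String prev = cache.putIfAbsent(key, val);
        if (prev != null) {
            if (strictInsert)
                throw new IllegalStateException("Duplicate key: " + key);
            skipped++; // lenient mode: keep the old value, count the skip
        }
    }

    String get(long key) { return cache.get(key); }

    int skipped() { return skipped; }

    public static void main(String[] args) {
        StreamingInsertSketch lenient = new StreamingInsertSketch(false);
        lenient.insert(1L, "first");
        lenient.insert(1L, "second"); // ignored, no error
        System.out.println(lenient.get(1L) + ", skipped=" + lenient.skipped());
    }
}
```

Counting skipped rows is one cheap way to notify the user without paying for an exception per duplicate; whether that fits the real streamer is an open question in the thread.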
Re: DML data streaming
Alexander,

Are you suggesting that currently, to execute a simple INSERT for 1 row, we invoke a data streamer on the Ignite API? How about an update by a primary key? Why not execute a simple cache put in either case?

I think we had a separate thread where we agreed that the streamer should only be turned on if a certain flag on a JDBC connection is set, no?

D.

On Wed, Feb 8, 2017 at 7:00 AM, Alexander Paschenko <alexander.a.pasche...@gmail.com> wrote:

> Hello Igniters,
>
> I'd like to raise a few questions regarding data streaming via DML statements.
>
> Currently, all types of DML statements are supported (INSERT, UPDATE, DELETE, MERGE).
>
> UPDATE and DELETE are supported in streaming mode only when their WHERE condition is bound to the _key and/or _val columns, and UPDATE works only for the _val column directly.
>
> Seeing some activity in the direction of hiding _key and _val from the user as far as possible, these features seem pointless and should not be released. What do you think?
>
> Also, INSERT in streaming mode currently does not throw errors on duplicate keys and silently ignores such new records (since that is faster than it would be if we introduced a receiver that throws exceptions). This can be fixed with an additional flag that could _optionally_ make INSERT slower but more semantically accurate.
>
> And MERGE in streaming mode is currently not totally accurate semantically either: on key presence, it just replaces the whole value with the new one, potentially losing the values of some columns/fields. This is analogous to https://issues.apache.org/jira/browse/IGNITE-4489, but can hardly be fixed, as it would probably hurt performance and would be unreasonably complex to implement.
>
> I suggest that we drop everything except INSERT and introduce an optional flag for its fully correct semantic behavior, as described above.
>
> - Alex
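The distinction raised here, a direct cache put versus a streamer enabled by a connection flag, can be sketched as follows. This is a toy model with assumed names (`ConnectionSketch`, `streaming`, `flush`), not the JDBC driver or the Ignite data streamer: without the flag each INSERT is applied immediately and a duplicate key is an error, while with the flag rows are buffered and only applied on flush, where duplicates are skipped.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only; not the Ignite JDBC driver. All names are hypothetical. */
class ConnectionSketch {
    private final Map<Long, String> cache = new HashMap<>();
    private final List<Map.Entry<Long, String>> buffer = new ArrayList<>();
    private final boolean streaming; // stands in for the connection-level flag

    ConnectionSketch(boolean streaming) {
        this.streaming = streaming;
    }

    void insert(long key, String val) {
        if (streaming) {
            // Streaming mode: only buffer the row; nothing is visible yet.
            buffer.add(new AbstractMap.SimpleEntry<>(key, val));
        } else if (cache.putIfAbsent(key, val) != null) {
            // Direct mode: behaves like a plain put guarded by a PK check.
            throw new IllegalStateException("Duplicate key: " + key);
        }
    }

    /** Applies buffered rows; rows with an existing key are skipped. */
    void flush() {
        for (Map.Entry<Long, String> e : buffer)
            cache.putIfAbsent(e.getKey(), e.getValue());
        buffer.clear();
    }

    String get(long key) { return cache.get(key); }

    public static void main(String[] args) {
        ConnectionSketch conn = new ConnectionSketch(true);
        conn.insert(1L, "a");
        System.out.println("before flush: " + conn.get(1L));
        conn.flush();
        System.out.println("after flush: " + conn.get(1L));
    }
}
```

The point of the sketch is only that the two paths differ in visibility and in error reporting, which is why a single-row INSERT is a poor fit for the buffered path.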
Re: DML data streaming
Hi Alexander.

What about supporting the statement *INSERT INTO ... SELECT FROM* for streams? Does it make sense?

On Wed, Feb 8, 2017 at 6:44 PM, Alexander Paschenko <alexander.a.pasche...@gmail.com> wrote:

> Also, currently it's possible to run SELECTs on "streamed" connections, and this is probably odd and should not be released either. What do you think?
>
> - Alex
>
> 2017-02-08 18:00 GMT+03:00 Alexander Paschenko:
>
> > Hello Igniters,
> >
> > I'd like to raise a few questions regarding data streaming via DML statements.
> >
> > Currently, all types of DML statements are supported (INSERT, UPDATE, DELETE, MERGE).
> >
> > UPDATE and DELETE are supported in streaming mode only when their WHERE condition is bound to the _key and/or _val columns, and UPDATE works only for the _val column directly.
> >
> > Seeing some activity in the direction of hiding _key and _val from the user as far as possible, these features seem pointless and should not be released. What do you think?
> >
> > Also, INSERT in streaming mode currently does not throw errors on duplicate keys and silently ignores such new records (since that is faster than it would be if we introduced a receiver that throws exceptions). This can be fixed with an additional flag that could _optionally_ make INSERT slower but more semantically accurate.
> >
> > And MERGE in streaming mode is currently not totally accurate semantically either: on key presence, it just replaces the whole value with the new one, potentially losing the values of some columns/fields. This is analogous to https://issues.apache.org/jira/browse/IGNITE-4489, but can hardly be fixed, as it would probably hurt performance and would be unreasonably complex to implement.
> >
> > I suggest that we drop everything except INSERT and introduce an optional flag for its fully correct semantic behavior, as described above.
> >
> > - Alex

--
Sergey Kozlov
GridGain Systems
www.gridgain.com
Re: DML data streaming
Also, currently it's possible to run SELECTs on "streamed" connections, and this is probably odd and should not be released either. What do you think?

- Alex

2017-02-08 18:00 GMT+03:00 Alexander Paschenko:

> Hello Igniters,
>
> I'd like to raise a few questions regarding data streaming via DML statements.
>
> Currently, all types of DML statements are supported (INSERT, UPDATE, DELETE, MERGE).
>
> UPDATE and DELETE are supported in streaming mode only when their WHERE condition is bound to the _key and/or _val columns, and UPDATE works only for the _val column directly.
>
> Seeing some activity in the direction of hiding _key and _val from the user as far as possible, these features seem pointless and should not be released. What do you think?
>
> Also, INSERT in streaming mode currently does not throw errors on duplicate keys and silently ignores such new records (since that is faster than it would be if we introduced a receiver that throws exceptions). This can be fixed with an additional flag that could _optionally_ make INSERT slower but more semantically accurate.
>
> And MERGE in streaming mode is currently not totally accurate semantically either: on key presence, it just replaces the whole value with the new one, potentially losing the values of some columns/fields. This is analogous to https://issues.apache.org/jira/browse/IGNITE-4489, but can hardly be fixed, as it would probably hurt performance and would be unreasonably complex to implement.
>
> I suggest that we drop everything except INSERT and introduce an optional flag for its fully correct semantic behavior, as described above.
>
> - Alex
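The MERGE concern in the quoted message (replacing the whole value loses fields the statement did not mention) can be shown with a small sketch. It models a row value as a map of column name to value, which is an assumption made for illustration; `MergeSketch`, `replaceWhole`, and `mergeFields` are hypothetical names and not Ignite code.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model only; a row value is modeled as column name -> value. */
class MergeSketch {
    /** Whole-value replace: columns absent from the incoming row are lost. */
    static Map<String, Object> replaceWhole(Map<String, Object> existing,
                                            Map<String, Object> incoming) {
        return new HashMap<>(incoming);
    }

    /** Field-wise merge: untouched columns of the existing row survive. */
    static Map<String, Object> mergeFields(Map<String, Object> existing,
                                           Map<String, Object> incoming) {
        Map<String, Object> res = new HashMap<>(existing);
        res.putAll(incoming); // incoming columns overwrite, others are kept
        return res;
    }

    public static void main(String[] args) {
        Map<String, Object> existing = new HashMap<>();
        existing.put("name", "Ann");
        existing.put("age", 30);

        Map<String, Object> incoming = new HashMap<>();
        incoming.put("name", "Bob"); // the statement sets only one column

        System.out.println("replace: " + replaceWhole(existing, incoming));
        System.out.println("merge:   " + mergeFields(existing, incoming));
    }
}
```

The field-wise variant needs the existing value to be read before writing, which is the extra cost the message alludes to when it says a fix would hurt performance.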