Re: DML data streaming

2017-02-17 Thread Dmitriy Setrakyan
If we adopt a table-per-cache policy, then the table name should be equal to
the cache name, especially when the table is created via SQL.

For complex types, the value type name should also be equal to the table
name. If the value type is primitive, then you can still use the table name
in SQL and use the same table name as the cache name in code.

In my view, the design works. Do you agree?

D.


Re: DML data streaming

2017-02-16 Thread Vladimir Ozerov
Dima,

The value type name does not necessarily map to the table name. For instance,
what if I have two tables like this? They both have "java.lang.Long" as the
value type name.

CREATE TABLE t1 (
    pk_id BIGINT PRIMARY KEY,
    val BIGINT
)

CREATE TABLE t2 (
    pk_id BIGINT PRIMARY KEY,
    val BIGINT
)
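
To make the collision concrete, here is a minimal sketch in plain Java. It
does not use any Ignite classes; TableRegistry and its methods are
hypothetical names introduced only for illustration. It shows that an index
keyed by value type name cannot tell t1 from t2, while an index keyed by
table name can.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical registry, used only to illustrate the naming collision. */
class TableRegistry {
    /** Value type name -> table name. */
    private final Map<String, String> byValueType = new HashMap<>();

    /** Table name -> value type name. */
    private final Map<String, String> byTableName = new HashMap<>();

    /**
     * Registers a table under both indexes.
     *
     * @return Name of the table previously registered under the same value
     *      type name, or null if there was none.
     */
    String register(String tableName, String valueTypeName) {
        byTableName.put(tableName, valueTypeName);

        // put() returns the previous mapping, i.e. the table we just shadowed.
        return byValueType.put(valueTypeName, tableName);
    }

    /** Number of distinct tables visible through the table-name index. */
    int tablesByName() {
        return byTableName.size();
    }

    /** Number of distinct tables visible through the value-type index. */
    int tablesByValueType() {
        return byValueType.size();
    }
}
```

Registering t1 and then t2 with "java.lang.Long" makes the second call
return "t1": the value-type index ends up with one entry, while the
table-name index keeps both.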


Re: DML data streaming

2017-02-16 Thread Dmitriy Setrakyan
Vladimir, I am not sure I understand your point. The value type name should
be the table name, no?


Re: DML data streaming

2017-02-16 Thread Vladimir Ozerov
Dima,

At this point we require the following additional data which is outside of
standard SQL:
- Key type
- Value type
- Set of key columns

I do not know yet how we will define these values. At the very least, we can
calculate them automatically in some cases. For "keyFieldName" and
"valFieldName" things are easier, as we can always derive them from the table
definition.

Example 1 - primitives:

CREATE TABLE (
*pk_id* BIGINT PRIMARY KEY,
*val*   BIGINT
)

keyFieldName = "*pk_id*", valFieldName = "*val*"

Example 2 - composites:

CREATE TABLE (
*pk_id* BIGINT PRIMARY KEY,
val1  BIGINT,
val2  VARCHAR
)

keyFieldName = "*pk_id*", valFieldName = null (because value is complex and
is composed of two attributes).
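
The rule behind the two examples can be sketched in plain Java as follows.
ColumnDef and FieldNameDeriver are hypothetical names, not Ignite classes,
and the rule itself (a single column on one side yields a field name, several
columns yield null) is an assumption read off the examples above, not a
settled design.

```java
import java.util.List;

/** Hypothetical column descriptor; not part of Ignite. */
class ColumnDef {
    final String name;
    final String sqlType;
    final boolean primaryKey;

    ColumnDef(String name, String sqlType, boolean primaryKey) {
        this.name = name;
        this.sqlType = sqlType;
        this.primaryKey = primaryKey;
    }
}

/** Hypothetical helper that derives keyFieldName and valFieldName. */
class FieldNameDeriver {
    /** Returns the only key column name, or null if the key is composite. */
    static String keyFieldName(List<ColumnDef> cols) {
        return singleName(cols, true);
    }

    /** Returns the only value column name, or null if the value is composite. */
    static String valFieldName(List<ColumnDef> cols) {
        return singleName(cols, false);
    }

    /** Finds the single column on the key (or value) side, if there is exactly one. */
    private static String singleName(List<ColumnDef> cols, boolean keySide) {
        String found = null;

        for (ColumnDef c : cols) {
            if (c.primaryKey != keySide)
                continue;

            if (found != null)
                return null; // Second column on this side: the type is complex.

            found = c.name;
        }

        return found;
    }
}
```

For Example 1 this gives "pk_id" and "val"; for Example 2 it gives "pk_id"
and null.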

Vladimir.


Re: DML data streaming

2017-02-15 Thread Dmitriy Setrakyan
On Wed, Feb 15, 2017 at 2:41 PM, Alexander Paschenko <
alexander.a.pasche...@gmail.com> wrote:

> Folks,
>
> Regarding INSERT semantics in JDBC DML streaming mode - I've left only
> INSERT support, as we'd agreed before.
>
> However, the current architecture of the streaming-related internals does
> not give any clear way to intercept key duplicates and inform the user.
> Say, I can't just throw an exception from the stream receiver (which is,
> to my knowledge, the only place where we could filter erroneous keys), as
> it would lead to a remap of the whole batch, and that's clearly not what
> we want here.
>
> Printing a warning to the log from the receiver is of little to no use, as
> it will happen on data nodes, so the end user won't see anything.
>

However, you should still log it. Try throttling identical log messages, so
that we don't flood the log.
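
A minimal sketch of such throttling in plain Java, assuming one warning per
distinct message text per time interval is acceptable. ThrottledWarnings is a
hypothetical helper, not an existing Ignite class; the caller passes the
current time explicitly so the logic is easy to check.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical helper: lets an identical message through at most once per interval. */
class ThrottledWarnings {
    private final long intervalMillis;

    /** Message text -> time it was last allowed through. */
    private final ConcurrentMap<String, Long> lastLogged = new ConcurrentHashMap<>();

    ThrottledWarnings(long intervalMillis) {
        this.intervalMillis = intervalMillis;
    }

    /** Returns true if the caller should write msg to the log now. */
    boolean shouldLog(String msg, long nowMillis) {
        while (true) {
            Long prev = lastLogged.putIfAbsent(msg, nowMillis);

            if (prev == null)
                return true; // First occurrence of this message.

            if (nowMillis - prev < intervalMillis)
                return false; // Same message was logged recently.

            // Interval elapsed: claim the slot atomically, retry if another thread won.
            if (lastLogged.replace(msg, prev, nowMillis))
                return true;
        }
    }
}
```

The receiver would call shouldLog("Duplicate key skipped in streaming
INSERT", System.currentTimeMillis()) and log only on true. The message text
should not embed the key itself, otherwise every message is unique and
nothing is throttled.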


>
> What I've introduced for now is an optional config param that turns on
> allowOverwrite on the streamer used in the DML operation.
>

Agree, sounds like a good use of the flag. Are you setting it via a JDBC/ODBC
connection flag?


> Does anyone have any thoughts about what could/should be done regarding
> informing the user about key duplicates in streaming mode? Or should we
> just leave it as it is now?
>

In my view, we should introduce some generic error trap callback, e.g.
onSqlError(...), for all unhandled SQL errors. The user should provide it in
the configuration before startup. What do you think?
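
A rough sketch of what such a callback could look like, in plain Java. Only
the name onSqlError comes from the suggestion above; the interface name
SqlErrorListener, the SqlErrorTrap class and the method parameters are
hypothetical and do not refer to any existing Ignite API.

```java
/** Hypothetical callback for SQL errors that nobody else handles. */
interface SqlErrorListener {
    void onSqlError(String sql, Throwable err);
}

/** Hypothetical stand-in for the component that would invoke the callback. */
class SqlErrorTrap {
    /** Listener supplied in the configuration before startup; may be null. */
    private final SqlErrorListener lsnr;

    SqlErrorTrap(SqlErrorListener lsnr) {
        this.lsnr = lsnr;
    }

    /** Runs the operation and routes an otherwise unhandled failure to the listener. */
    void execute(String sql, Runnable op) {
        try {
            op.run();
        }
        catch (RuntimeException e) {
            if (lsnr == null)
                throw e; // No trap configured: keep the existing behaviour.

            lsnr.onSqlError(sql, e);
        }
    }
}
```

With this shape the user supplies the listener once, and a duplicate key
found during streaming could be reported through it instead of being
thrown from the receiver.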


>
> Regards,
> Alex


Re: DML data streaming

2017-02-15 Thread Alexander Paschenko
Folks,

Regarding INSERT semantics in JDBC DML streaming mode - I've left only
INSERT support, as we'd agreed before.

However, the current architecture of the streaming-related internals does
not give any clear way to intercept key duplicates and inform the user.
Say, I can't just throw an exception from the stream receiver (which is,
to my knowledge, the only place where we could filter erroneous keys), as
it would lead to a remap of the whole batch, and that's clearly not what
we want here.

Printing a warning to the log from the receiver is of little to no use, as
it will happen on data nodes, so the end user won't see anything.

What I've introduced for now is an optional config param that turns on
allowOverwrite on the streamer used in the DML operation.
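
For reference, the public streamer API exposes this switch as
IgniteDataStreamer.allowOverwrite(boolean). The sketch below does not use
Ignite; FakeStreamer is a hypothetical stand-in that simulates, with a plain
map, the behaviour assumed here: with the flag off (the default) an entry for
an existing key is skipped silently, and with the flag on it replaces the old
value.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a streamer; models only the allowOverwrite switch. */
class FakeStreamer {
    private final Map<Long, String> cache = new HashMap<>();

    /** Off by default, mirroring the assumed default of the real streamer. */
    private boolean allowOverwrite;

    /** Number of entries dropped because the key already existed. */
    private int skipped;

    void allowOverwrite(boolean allowOverwrite) {
        this.allowOverwrite = allowOverwrite;
    }

    void addData(Long key, String val) {
        if (allowOverwrite)
            cache.put(key, val); // Duplicate key: new value wins.
        else if (cache.putIfAbsent(key, val) != null)
            skipped++; // Duplicate key: entry is dropped, nobody is told.
    }

    String get(Long key) {
        return cache.get(key);
    }

    int skipped() {
        return skipped;
    }
}
```

The skipped counter is exactly the information that currently has no way
back to the user in streaming mode.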

Does anyone have any thoughts about what could/should be done regarding
informing the user about key duplicates in streaming mode? Or should we
just leave it as it is now?

Regards,
Alex


Re: DML data streaming

2017-02-15 Thread Dmitriy Setrakyan
On Wed, Feb 15, 2017 at 4:28 AM, Vladimir Ozerov 
wrote:

> OK, let's put aside the current fields configuration; I'll create a separate
> thread for it. As for _KEY and _VAL, the proposed change is exactly about
> mappings:
>
> class QueryEntity {
> ...
> String keyFieldName;
> String valFieldName;
> ...
> }
>
> The key thing is that we will not require users to be aware of our system
> columns. Normally a user should not have to care about the existence of the
> hidden _KEY and _VAL columns. Instead, we just allow them to optionally
> reference the whole key and/or value through a predefined name.
>
>
Vladimir, how will it work from the DDL perspective? Say, when a user wants
to create a table in Ignite?


Re: DML data streaming

2017-02-15 Thread Sergi Vladykin
Vladimir,

Looks good to me.


Pavel,

No worries, it will work exactly like you described: the hidden _key and _val
fields will always be accessible.

Sergi


Re: DML data streaming

2017-02-15 Thread Pavel Tupitsyn
I have no particular opinion on how we should handle _key/_val, but we
certainly need a way to select entire key and value objects via
SqlFieldsQuery, and this should work without any additional configuration.

We can rename these, turn them into system functions, whatever.

The Ignite.NET LINQ provider relies heavily on this capability - users often
want to select the entire entry value.


Re: DML data streaming

2017-02-15 Thread Vladimir Ozerov
OK, let's put aside the current fields configuration; I'll create a separate
thread for it. As for _KEY and _VAL, the proposed change is exactly about
mappings:

class QueryEntity {
...
String keyFieldName;
String valFieldName;
...
}

The key thing is that we will not require users to be aware of our system
columns. Normally a user should not have to care about the existence of the
hidden _KEY and _VAL columns. Instead, we just allow them to optionally
reference the whole key and/or value through a predefined name.
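
A minimal sketch of how such a mapping could be resolved, in plain Java. Only
keyFieldName and valFieldName come from the proposal; the KeyValAliasResolver
class, its resolve method and the case-insensitive matching are assumptions
made for illustration.

```java
/** Hypothetical resolver from user-visible names to the hidden system columns. */
class KeyValAliasResolver {
    /** User-visible name of the whole key, or null if none is configured. */
    private final String keyFieldName;

    /** User-visible name of the whole value, or null if none is configured. */
    private final String valFieldName;

    KeyValAliasResolver(String keyFieldName, String valFieldName) {
        this.keyFieldName = keyFieldName;
        this.valFieldName = valFieldName;
    }

    /** Maps a column name used in a query to the column the engine would read. */
    String resolve(String col) {
        if (keyFieldName != null && keyFieldName.equalsIgnoreCase(col))
            return "_KEY";

        if (valFieldName != null && valFieldName.equalsIgnoreCase(col))
            return "_VAL";

        return col; // Ordinary field: no mapping needed.
    }
}
```

With keyFieldName = "pk_id" and valFieldName = "val", a query can mention
pk_id and val and never spell out _KEY or _VAL.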


Re: DML data streaming

2017-02-15 Thread Sergi Vladykin
I don't see any improvement here. Usability will only suffer with this
change.

I'd suggest just adding a mapping for system columns like _key, _val, _ver.

Sergi


Re: DML data streaming

2017-02-15 Thread Vladimir Ozerov
I think the whole QueryEntity class requires rework to allow for this
change. I would start by creating a QueryField class which will encapsulate
all field properties which are currently set through different setters:

class QueryField {
    String name;
    String type;
    String alias;
    boolean keyField;
}

class QueryEntity {
    String tableName;
    String keyType;
    String valType;
    Collection<QueryField> fields;
    Collection<QueryIndex> indexes;
}

Then we can add optional key and value field names to the top-level config.
If set, the key and/or value will have names and will be included in SELECT *
queries in the same way as we do for _KEY and _VAL at the moment:

class QueryEntity {
    String tableName;
    String keyType;
    String valType;
    String keyFieldName; // new, optional
    String valFieldName; // new, optional
    Collection<QueryField> fields;
    Collection<QueryIndex> indexes;
}
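
To illustrate the effect on SELECT *, here is a small sketch in plain Java.
The class names carry a Sketch suffix to make clear they are hypothetical and
are not the real QueryEntity; the column ordering (key name, value name, then
the remaining fields) is an assumption, not something decided in this thread.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical, trimmed-down version of the proposed QueryField. */
class QueryFieldSketch {
    final String name;
    final String type;

    QueryFieldSketch(String name, String type) {
        this.name = name;
        this.type = type;
    }
}

/** Hypothetical, trimmed-down version of the proposed QueryEntity. */
class QueryEntitySketch {
    /** Optional name for the whole key; null means the key stays hidden. */
    String keyFieldName;

    /** Optional name for the whole value; null means the value stays hidden. */
    String valFieldName;

    final Collection<QueryFieldSketch> fields = new ArrayList<>();

    /** Column names a SELECT * would expose under the proposal. */
    List<String> selectStarColumns() {
        List<String> cols = new ArrayList<>();

        if (keyFieldName != null)
            cols.add(keyFieldName);

        if (valFieldName != null)
            cols.add(valFieldName);

        for (QueryFieldSketch f : fields) {
            if (!cols.contains(f.name))
                cols.add(f.name); // Avoid listing a named key/value column twice.
        }

        return cols;
    }
}
```

If neither name is set, the result contains only the regular fields, which
matches the intent that users need not know about the system columns.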

Any other ideas?

On Tue, Feb 14, 2017 at 9:19 PM, Dmitriy Setrakyan 
wrote:

> Vova,
>
> Agree about the primitive types. However, it is not clear to me how the
> mapping from a primitive type to a column name will be supported. Do you
> have a design in mind?
>
> D.
>
> On Tue, Feb 14, 2017 at 6:16 AM, Vladimir Ozerov 
> wrote:
>
> > Dima,
> >
> > This will not work for primitive keys and values as currently the only way
> > to address them is to use "_KEY" and "_VAL" aliases respectively. For this
> > reason I would rather postpone UPDATE/DELETE implementation until "_KEY"
> > and "_VAL" are hidden from public API and some kind of mapping is
> > introduced. AFAIK this should be handled as a part of IGNITE-3487 [1].
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-3487
> >
> > On Sat, Feb 11, 2017 at 3:36 AM, Dmitriy Setrakyan <dsetrak...@apache.org>
> > wrote:
> >
> > > On Fri, Feb 10, 2017 at 3:36 AM, Vladimir Ozerov
> > > wrote:
> > >
> > > > I propose to ship streaming with INSERT support only for now. This is
> > > > enough for a multitude of cases and will add value to Ignite 1.9
> > > > immediately. We can think about a correct streaming UPDATE/DELETE
> > > > architecture separately. It is a much more difficult thing; we cannot
> > > > support it in a clean way right now due to multiple "_key" and "_val"
> > > > usages over the code base.
> > > >
> > >
> > > Vova, I disagree. If all parts of the key are present, then we can always
> > > construct a key in all cases. For these operations we can always support
> > > streaming. For all other operations, we can delegate to standard MR, but
> > > still perform most operations on the same node, as I suggested in another
> > > email.
> > >
> >
>


Re: DML data streaming

2017-02-14 Thread Dmitriy Setrakyan
Vova,

Agree about the primitive types. However, it is not clear to me how the
mapping from a primitive type to a column name will be supported. Do you
have a design in mind?

D.

On Tue, Feb 14, 2017 at 6:16 AM, Vladimir Ozerov 
wrote:

> Dima,
>
> This will not work for primitive keys and values as currently the only way
> to address them is to use "_KEY" and "_VAL" aliases respectively. For this
> reason I would rather postpone UPDATE/DELETE implementation until "_KEY"
> and "_VAL" are hidden from public API and some kind of mapping is
> introduced. AFAIK this should be handled as a part of IGNITE-3487 [1].
>
> [1] https://issues.apache.org/jira/browse/IGNITE-3487
>
> On Sat, Feb 11, 2017 at 3:36 AM, Dmitriy Setrakyan 
> wrote:
>
> > On Fri, Feb 10, 2017 at 3:36 AM, Vladimir Ozerov 
> > wrote:
> >
> > > I propose to ship streaming with INSERT support only for now. This is
> > > enough for multitude cases and will add value to Ignite 1.9
> immediately.
> > We
> > > can think about correct streaming UPDATE/DELETE architecture separately
> > .It
> > > is much more difficult thing, we cannot support it in a clean way right
> > now
> > > due to multiple "_key" and "_val" usages over the code base.
> > >
> >
> > Vova, I disagree. If all parts of the key are present, then we can always
> > construct a key in all cases. For these operations we can always support
> > streaming. For all other operations, we can delegate to standard MR, but
> > still perform most operations on the same node, as I suggested in another
> > email.
> >
>


Re: DML data streaming

2017-02-14 Thread Vladimir Ozerov
Dima,

This will not work for primitive keys and values as currently the only way
to address them is to use "_KEY" and "_VAL" aliases respectively. For this
reason I would rather postpone UPDATE/DELETE implementation until "_KEY"
and "_VAL" are hidden from public API and some kind of mapping is
introduced. AFAIK this should be handled as a part of IGNITE-3487 [1].

[1] https://issues.apache.org/jira/browse/IGNITE-3487

On Sat, Feb 11, 2017 at 3:36 AM, Dmitriy Setrakyan 
wrote:

> On Fri, Feb 10, 2017 at 3:36 AM, Vladimir Ozerov 
> wrote:
>
> > I propose to ship streaming with INSERT support only for now. This is
> > enough for multitude cases and will add value to Ignite 1.9 immediately.
> We
> > can think about correct streaming UPDATE/DELETE architecture separately
> .It
> > is much more difficult thing, we cannot support it in a clean way right
> now
> > due to multiple "_key" and "_val" usages over the code base.
> >
>
> Vova, I disagree. If all parts of the key are present, then we can always
> construct a key in all cases. For these operations we can always support
> streaming. For all other operations, we can delegate to standard MR, but
> still perform most operations on the same node, as I suggested in another
> email.
>


Re: DML data streaming

2017-02-10 Thread Dmitriy Setrakyan
On Fri, Feb 10, 2017 at 3:36 AM, Vladimir Ozerov 
wrote:

> I propose to ship streaming with INSERT support only for now. This is
> enough for multitude cases and will add value to Ignite 1.9 immediately. We
> can think about correct streaming UPDATE/DELETE architecture separately .It
> is much more difficult thing, we cannot support it in a clean way right now
> due to multiple "_key" and "_val" usages over the code base.
>

Vova, I disagree. If all parts of the key are present, then we can always
construct a key in all cases. For these operations we can always support
streaming. For all other operations, we can delegate to standard MR, but
still perform most operations on the same node, as I suggested in another
email.


Re: DML data streaming

2017-02-10 Thread Dmitriy Setrakyan
On Fri, Feb 10, 2017 at 12:55 AM, Alexander Paschenko <
alexander.a.pasche...@gmail.com> wrote:

> And to avoid further confusion: UPDATE and DELETE are simply
> impossible in streaming mode when the key is not completely defined as
> long as data streamer operates with key-value pairs and not just
> tuples of named values. That's why we can't do DELETE from Person
> WHERE id1 = 5 from prev example with streamer - the Key { id1 = 5, id2
> = 0 } that would be constructed from such query is just one key and is
> handled by streamer as such while semantically that query is not about
> ONE key but about ALL keys where id1 = 5.
>

I completely agree. However, we should still optimize the MR here, since
the keys selected from one table (or cache) will probably end up on the
same node as the same keys inserted, updated, or deleted in another cache,
so these operations will likely still be local to the node.


Re: DML data streaming

2017-02-10 Thread Dmitriy Setrakyan
On Fri, Feb 10, 2017 at 12:49 AM, Alexander Paschenko <
alexander.a.pasche...@gmail.com> wrote:

> Dima,
> >
> > There are several ways to handle it. I would check how other databases
> > handle it, maybe we can borrow something. To the least, we should log
> such
> > errors in the log for now.
> >
>
> Logging errors would mean introducing some kind of stream receiver to
> do that and thus that would be really the same performance penalty for
> the successful operations. I think we should go with that optional
> flag for semantics after all.
>

I am OK with introducing some error trap and plugging it into configuration
(maybe some interface with an onError(...) callback). However, we should never
swallow errors; we should always print all errors to the log. Let's not
worry about the performance in case of errors.
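
A minimal sketch of such an error trap, assuming a callback interface with a
single onError method. The names StreamErrorHandler and LoggingErrorTrap, and
the (key, error) signature, are hypothetical; the thread only says "some
interface with onError(...) callback". The wrapper logs every error before
handing it to an optional user-supplied handler, so nothing is swallowed.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical callback interface; the signature is an assumption.
interface StreamErrorHandler {
    void onError(Object key, Throwable error);
}

// Logs every error first, then delegates to the user handler if one is set.
final class LoggingErrorTrap implements StreamErrorHandler {
    private static final Logger LOG =
        Logger.getLogger(LoggingErrorTrap.class.getName());

    // Optional user-supplied handler; may be null.
    private final StreamErrorHandler delegate;

    LoggingErrorTrap(StreamErrorHandler delegate) {
        this.delegate = delegate;
    }

    @Override
    public void onError(Object key, Throwable error) {
        // Logging is unconditional, so an absent handler never hides an error.
        LOG.log(Level.WARNING, "Streaming operation failed for key " + key, error);

        if (delegate != null)
            delegate.onError(key, error);
    }
}
```

The cost of the trap is paid only on the error path, which matches the point
above about not worrying about performance when errors occur.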

>
> > You don't have to use _key. Primary key is usually a field in the class,
> so
> > you can use a normal column name. In any case, we should remove any usage
> > of _key before 2.0 is released.
> >
> > Again, if user does not have to specify _key on INSERT, then it is very
> > unclear to me, why user would need to specify _key for UPDATE or DELETE.
> > Something smells here. Can you please provide an example?
> >
>
> UPDATE and DELETE _in streaming mode_ are carried _only_ for "fast"
> optimized cases - i.e. those where _key (and possibly _val) are
> explicitly specified by the user thus allowing us to map UPDATE and
> DELETE directly to cache's replace and remove operations without
> messing with entry processors and doing map-reduce SELECT by given
> criteria.
>
> Say, we have Person { firstName, secondName } with key class Key { id1,
> id2 }
>
> If I say DELETE from Person WHERE _key = ? and specify arg via JDBC,
> there's no need to do any SELECT - we can just call IgniteCache.remove
> on that key.
>
> But if I say DELETE from Person WHERE id1 = 5 then there's no way to
> avoid MR - we have to find all keys that interest us first by doing
> SELECT as long as we know only partly about what keys the user wants
> to be affected.
>
> It works in the same way for UPDATE. And I hope that it's clear how
> it's different from INSERT - there's no MR by definition (we don't
> allow INSERT FROM SELECT in streaming mode).
>

Do we allow INSERT from SELECT in non-streaming mode?


>
> AGAIN: this all is said only about streaming mode; non streaming mode
> does those optimizations too, but it also allows complex conditions,
> while streaming mode does not allow them to keep things fast and avoid
> MR.
>
> That's the reason why I suggest that we drop UPDATE and DELETE from
> DML streaming as they mean messing with those soon-hidden columns.
>
> Still we could optimize stuff like DELETE from Person WHERE id1 = 5
> AND id2 = 6 - query involves ALL fields of key AND compares only for
> equality AND has no complex expressions - we can construct key
> unambiguously and still call remove directly.
>

Exactly my point. If all key fields are present, we can construct the key
ourselves and still delegate to cache.put(..) or cache.remove(..). For all
cases where not all the key fields are present we should do regular MR. I
am assuming that this applies to UPDATE and DELETE operations. My vote is to
implement this functionality.


>
> But to me it does not sound like a really great reason to leave UPDATE
> and DELETE in DML - the users will have to write some specific queries
> to use that while all other stuff will just be declined in that mode.
> And, as I said before, UPDATE and DELETE don't probably perfectly fit
> with primary data streamer use cases - after all, modifying existing
> stuff is not what data streamer is about.
>

I am not sure what this means. We have to work in the same way as regular
RDBMS systems. I would not try to reinvent the wheel here. All UPDATE,
DELETE, and INSERT operations should be part of DML.


>
> And regarding hiding columns: it's unclear how things will look like
> for caches like  when we remove _key and _val as long as
> tables for such cases currently have nothing but those two columns.
>

Again, think about standard RDBMS systems. None of them have _key or _val,
and therefore neither should we.


Re: DML data streaming

2017-02-10 Thread Denis Magda
In general, the data streamer approach should be mostly used for data loading 
scenarios. The data is usually loaded with INSERTS which means that the 
scenario is already supported and we’re free to merge the changes to 1.9.

If you UPDATE or DELETE data in the streaming mode then you are required to set 
dataStreamer.allowOverwrite = true, making sure that the updates coming from 
the streamer side are consistent with transactions that might be executed in 
parallel. In this mode the streamer switches to a slower mode pushing the data 
with cache.writeAll() and cache.removeAll() methods. 

Overall, considering real-life use cases, it’s more than enough to support the
streaming mode for INSERTs only and describe it properly in the documentation.

—
Denis

> On Feb 10, 2017, at 3:36 AM, Vladimir Ozerov  wrote:
> 
> I propose to ship streaming with INSERT support only for now. This is
> enough for multitude cases and will add value to Ignite 1.9 immediately. We
> can think about correct streaming UPDATE/DELETE architecture separately .It
> is much more difficult thing, we cannot support it in a clean way right now
> due to multiple "_key" and "_val" usages over the code base.
> 
> On Fri, Feb 10, 2017 at 11:55 AM, Alexander Paschenko <
> alexander.a.pasche...@gmail.com> wrote:
> 
>> And to avoid further confusion: UPDATE and DELETE are simply
>> impossible in streaming mode when the key is not completely defined as
>> long as data streamer operates with key-value pairs and not just
>> tuples of named values. That's why we can't do DELETE from Person
>> WHERE id1 = 5 from prev example with streamer - the Key { id1 = 5, id2
>> = 0 } that would be constructed from such query is just one key and is
>> handled by streamer as such while semantically that query is not about
>> ONE key but about ALL keys where id1 = 5.
>> 
>> - Alex
>> 
>> 2017-02-10 11:49 GMT+03:00 Alexander Paschenko
>> :
>>> Dima,
 
>>>> There are several ways to handle it. I would check how other databases
>>>> handle it, maybe we can borrow something. To the least, we should log such
>>>> errors in the log for now.
 
>>> 
>>> Logging errors would mean introducing some kind of stream receiver to
>>> do that and thus that would be really the same performance penalty for
>>> the successful operations. I think we should go with that optional
>>> flag for semantics after all.
>>> 
>>>> You don't have to use _key. Primary key is usually a field in the class, so
>>>> you can use a normal column name. In any case, we should remove any usage
>>>> of _key before 2.0 is released.
>>>>
>>>> Again, if user does not have to specify _key on INSERT, then it is very
>>>> unclear to me, why user would need to specify _key for UPDATE or DELETE.
>>>> Something smells here. Can you please provide an example?
 
>>> 
>>> UPDATE and DELETE _in streaming mode_ are carried _only_ for "fast"
>>> optimized cases - i.e. those where _key (and possibly _val) are
>>> explicitly specified by the user thus allowing us to map UPDATE and
>>> DELETE directly to cache's replace and remove operations without
>>> messing with entry processors and doing map-reduce SELECT by given
>>> criteria.
>>> 
>>> Say, we have Person { firstName, secondName } with key class Key { id1,
>> id2 }
>>> 
>>> If I say DELETE from Person WHERE _key = ? and specify arg via JDBC,
>>> there's no need to do any SELECT - we can just call IgniteCache.remove
>>> on that key.
>>> 
>>> But if I say DELETE from Person WHERE id1 = 5 then there's no way to
>>> avoid MR - we have to find all keys that interest us first by doing
>>> SELECT as long as we know only partly about what keys the user wants
>>> to be affected.
>>> 
>>> It works in the same way for UPDATE. And I hope that it's clear how
>>> it's different from INSERT - there's no MR by definition (we don't
>>> allow INSERT FROM SELECT in streaming mode).
>>> 
>>> AGAIN: this all is said only about streaming mode; non streaming mode
>>> does those optimizations too, but it also allows complex conditions,
>>> while streaming mode does not allow them to keep things fast and avoid
>>> MR.
>>> 
>>> That's the reason why I suggest that we drop UPDATE and DELETE from
>>> DML streaming as they mean messing with those soon-hidden columns.
>>> 
>>> Still we could optimize stuff like DELETE from Person WHERE id1 = 5
>>> AND id2 = 6 - query involves ALL fields of key AND compares only for
>>> equality AND has no complex expressions - we can construct key
>>> unambiguously and still call remove directly.
>>> 
>>> But to me it does not sound like a really great reason to leave UPDATE
>>> and DELETE in DML - the users will have to write some specific queries
>>> to use that while all other stuff will just be declined in that mode.
>>> And, as I said before, UPDATE and DELETE don't probably perfectly fit
>>> with primary data streamer use cases - after all, modifying existing
>>> stuff is not what 

Re: DML data streaming

2017-02-10 Thread Vladimir Ozerov
I propose to ship streaming with INSERT support only for now. This is
enough for a multitude of cases and will add value to Ignite 1.9 immediately. We
can think about a correct streaming UPDATE/DELETE architecture separately. It
is a much more difficult thing; we cannot support it in a clean way right now
due to multiple "_key" and "_val" usages over the code base.

On Fri, Feb 10, 2017 at 11:55 AM, Alexander Paschenko <
alexander.a.pasche...@gmail.com> wrote:

> And to avoid further confusion: UPDATE and DELETE are simply
> impossible in streaming mode when the key is not completely defined as
> long as data streamer operates with key-value pairs and not just
> tuples of named values. That's why we can't do DELETE from Person
> WHERE id1 = 5 from prev example with streamer - the Key { id1 = 5, id2
> = 0 } that would be constructed from such query is just one key and is
> handled by streamer as such while semantically that query is not about
> ONE key but about ALL keys where id1 = 5.
>
> - Alex
>
> 2017-02-10 11:49 GMT+03:00 Alexander Paschenko
> :
> > Dima,
> >>
> >> There are several ways to handle it. I would check how other databases
> >> handle it, maybe we can borrow something. To the least, we should log
> such
> >> errors in the log for now.
> >>
> >
> > Logging errors would mean introducing some kind of stream receiver to
> > do that and thus that would be really the same performance penalty for
> > the successful operations. I think we should go with that optional
> > flag for semantics after all.
> >
> >> You don't have to use _key. Primary key is usually a field in the
> class, so
> >> you can use a normal column name. In any case, we should remove any
> usage
> >> of _key before 2.0 is released.
> >>
> >> Again, if user does not have to specify _key on INSERT, then it is very
> >> unclear to me, why user would need to specify _key for UPDATE or DELETE.
> >> Something smells here. Can you please provide an example?
> >>
> >
> > UPDATE and DELETE _in streaming mode_ are carried _only_ for "fast"
> > optimized cases - i.e. those where _key (and possibly _val) are
> > explicitly specified by the user thus allowing us to map UPDATE and
> > DELETE directly to cache's replace and remove operations without
> > messing with entry processors and doing map-reduce SELECT by given
> > criteria.
> >
> > Say, we have Person { firstName, secondName } with key class Key { id1,
> id2 }
> >
> > If I say DELETE from Person WHERE _key = ? and specify arg via JDBC,
> > there's no need to do any SELECT - we can just call IgniteCache.remove
> > on that key.
> >
> > But if I say DELETE from Person WHERE id1 = 5 then there's no way to
> > avoid MR - we have to find all keys that interest us first by doing
> > SELECT as long as we know only partly about what keys the user wants
> > to be affected.
> >
> > It works in the same way for UPDATE. And I hope that it's clear how
> > it's different from INSERT - there's no MR by definition (we don't
> > allow INSERT FROM SELECT in streaming mode).
> >
> > AGAIN: this all is said only about streaming mode; non streaming mode
> > does those optimizations too, but it also allows complex conditions,
> > while streaming mode does not allow them to keep things fast and avoid
> > MR.
> >
> > That's the reason why I suggest that we drop UPDATE and DELETE from
> > DML streaming as they mean messing with those soon-hidden columns.
> >
> > Still we could optimize stuff like DELETE from Person WHERE id1 = 5
> > AND id2 = 6 - query involves ALL fields of key AND compares only for
> > equality AND has no complex expressions - we can construct key
> > unambiguously and still call remove directly.
> >
> > But to me it does not sound like a really great reason to leave UPDATE
> > and DELETE in DML - the users will have to write some specific queries
> > to use that while all other stuff will just be declined in that mode.
> > And, as I said before, UPDATE and DELETE don't probably perfectly fit
> > with primary data streamer use cases - after all, modifying existing
> > stuff is not what data streamer is about.
> >
> > And regarding hiding columns: it's unclear how things will look like
> > for caches like  when we remove _key and _val as long as
> > tables for such cases currently have nothing but those two columns.
> >
> > - Alex
> >
> >>> On Feb 8, 2017 at 11:33 PM, "Dmitriy Setrakyan" <
> >>> dsetrak...@apache.org> wrote:
> >>>
> >>> > Alexander,
> >>> >
> >>> > Are you suggesting that currently to execute a simple INSERT for 1
> row we
> >>> > invoke a data streamer on Ignite API? How about an update by a
> primary
> >>> key?
> >>> > Why not execute a simple cache put in either case?
> >>> >
> >>> > I think we had a separate thread where we agreed that the streamer
> should
> >>> > only be turned on if a certain flag on a JDBC connection is set, no?
> >>> >
> >>> > D.
> >>> >
> >>> > On Wed, Feb 8, 2017 at 7:00 AM, Alexander Paschenko <
> >>> 

Re: DML data streaming

2017-02-10 Thread Alexander Paschenko
And to avoid further confusion: UPDATE and DELETE are simply
impossible in streaming mode when the key is not completely defined,
since the data streamer operates with key-value pairs and not just
tuples of named values. That's why we can't do DELETE from Person
WHERE id1 = 5 from the previous example with the streamer - the Key { id1 = 5,
id2 = 0 } that would be constructed from such a query is just one key and is
handled by the streamer as such, while semantically that query is not about
ONE key but about ALL keys where id1 = 5.
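
The difference can be illustrated with a plain map standing in for the cache.
This is only a sketch: PersonKey and PartialKeyDemo are hypothetical names,
and a HashMap is used instead of any Ignite API. Removing the single key built
from a partial condition touches at most one entry, while the SQL meaning of
the condition covers every matching key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical composite key matching the Key { id1, id2 } example.
final class PersonKey {
    final int id1;
    final int id2;

    PersonKey(int id1, int id2) {
        this.id1 = id1;
        this.id2 = id2;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o)
            return true;
        if (!(o instanceof PersonKey))
            return false;
        PersonKey other = (PersonKey) o;
        return id1 == other.id1 && id2 == other.id2;
    }

    @Override
    public int hashCode() {
        return Objects.hash(id1, id2);
    }
}

final class PartialKeyDemo {
    // Key-value style removal: affects exactly the one given key.
    static int removeByKey(Map<PersonKey, String> cache, PersonKey key) {
        return cache.remove(key) != null ? 1 : 0;
    }

    // SQL meaning of "DELETE ... WHERE id1 = ?": every key that matches.
    static int deleteWhereId1(Map<PersonKey, String> cache, int id1) {
        int before = cache.size();
        cache.keySet().removeIf(k -> k.id1 == id1);
        return before - cache.size();
    }

    // Builds a small sample cache with two keys where id1 = 5.
    static Map<PersonKey, String> sample() {
        Map<PersonKey, String> cache = new HashMap<>();
        cache.put(new PersonKey(5, 1), "Ann");
        cache.put(new PersonKey(5, 2), "Bob");
        cache.put(new PersonKey(6, 1), "Eve");
        return cache;
    }
}
```

On the sample data, removing PersonKey(5, 0) removes nothing, whereas the
predicate form removes both entries with id1 = 5.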

- Alex

2017-02-10 11:49 GMT+03:00 Alexander Paschenko
:
> Dima,
>>
>> There are several ways to handle it. I would check how other databases
>> handle it, maybe we can borrow something. To the least, we should log such
>> errors in the log for now.
>>
>
> Logging errors would mean introducing some kind of stream receiver to
> do that and thus that would be really the same performance penalty for
> the successful operations. I think we should go with that optional
> flag for semantics after all.
>
>> You don't have to use _key. Primary key is usually a field in the class, so
>> you can use a normal column name. In any case, we should remove any usage
>> of _key before 2.0 is released.
>>
>> Again, if user does not have to specify _key on INSERT, then it is very
>> unclear to me, why user would need to specify _key for UPDATE or DELETE.
>> Something smells here. Can you please provide an example?
>>
>
> UPDATE and DELETE _in streaming mode_ are carried _only_ for "fast"
> optimized cases - i.e. those where _key (and possibly _val) are
> explicitly specified by the user thus allowing us to map UPDATE and
> DELETE directly to cache's replace and remove operations without
> messing with entry processors and doing map-reduce SELECT by given
> criteria.
>
> Say, we have Person { firstName, secondName } with key class Key { id1, id2 }
>
> If I say DELETE from Person WHERE _key = ? and specify arg via JDBC,
> there's no need to do any SELECT - we can just call IgniteCache.remove
> on that key.
>
> But if I say DELETE from Person WHERE id1 = 5 then there's no way to
> avoid MR - we have to find all keys that interest us first by doing
> SELECT as long as we know only partly about what keys the user wants
> to be affected.
>
> It works in the same way for UPDATE. And I hope that it's clear how
> it's different from INSERT - there's no MR by definition (we don't
> allow INSERT FROM SELECT in streaming mode).
>
> AGAIN: this all is said only about streaming mode; non streaming mode
> does those optimizations too, but it also allows complex conditions,
> while streaming mode does not allow them to keep things fast and avoid
> MR.
>
> That's the reason why I suggest that we drop UPDATE and DELETE from
> DML streaming as they mean messing with those soon-hidden columns.
>
> Still we could optimize stuff like DELETE from Person WHERE id1 = 5
> AND id2 = 6 - query involves ALL fields of key AND compares only for
> equality AND has no complex expressions - we can construct key
> unambiguously and still call remove directly.
>
> But to me it does not sound like a really great reason to leave UPDATE
> and DELETE in DML - the users will have to write some specific queries
> to use that while all other stuff will just be declined in that mode.
> And, as I said before, UPDATE and DELETE don't probably perfectly fit
> with primary data streamer use cases - after all, modifying existing
> stuff is not what data streamer is about.
>
> And regarding hiding columns: it's unclear how things will look like
> for caches like  when we remove _key and _val as long as
> tables for such cases currently have nothing but those two columns.
>
> - Alex
>
>>> On Feb 8, 2017 at 11:33 PM, "Dmitriy Setrakyan" <
>>> dsetrak...@apache.org> wrote:
>>>
>>> > Alexander,
>>> >
>>> > Are you suggesting that currently to execute a simple INSERT for 1 row we
>>> > invoke a data streamer on Ignite API? How about an update by a primary
>>> key?
>>> > Why not execute a simple cache put in either case?
>>> >
>>> > I think we had a separate thread where we agreed that the streamer should
>>> > only be turned on if a certain flag on a JDBC connection is set, no?
>>> >
>>> > D.
>>> >
>>> > On Wed, Feb 8, 2017 at 7:00 AM, Alexander Paschenko <
>>> > alexander.a.pasche...@gmail.com> wrote:
>>> >
>>> > > Hello Igniters,
>>> > >
>>> > > I'd like to raise few questions regarding data streaming via DML
>>> > > statements.
>>> > >
>>> > > Currently, all types of DML statements are supported (INSERT, UPDATE,
>>> > > DELETE, MERGE).
>>> > >
>>> > > UPDATE and DELETE are supported in streaming mode only when their
>>> > > WHERE condition is bounded with _key and/or _val columns, and UPDATE
>>> > > works only for _val column directly.
>>> > >
>>> > > Seeing some activity in direction of hiding _key and _val from the
>>> > > user as far as possible, these features seem pointless and should not
>>> > > be 

Re: DML data streaming

2017-02-10 Thread Alexander Paschenko
Dima,
>
> There are several ways to handle it. I would check how other databases
> handle it, maybe we can borrow something. To the least, we should log such
> errors in the log for now.
>

Logging errors would mean introducing some kind of stream receiver to
do that, and thus would carry the same performance penalty for
successful operations. I think we should go with that optional
flag for semantics after all.

> You don't have to use _key. Primary key is usually a field in the class, so
> you can use a normal column name. In any case, we should remove any usage
> of _key before 2.0 is released.
>
> Again, if user does not have to specify _key on INSERT, then it is very
> unclear to me, why user would need to specify _key for UPDATE or DELETE.
> Something smells here. Can you please provide an example?
>

UPDATE and DELETE _in streaming mode_ are carried out _only_ for "fast"
optimized cases - i.e. those where _key (and possibly _val) are
explicitly specified by the user, thus allowing us to map UPDATE and
DELETE directly to the cache's replace and remove operations without
messing with entry processors and doing a map-reduce SELECT by the given
criteria.

Say, we have Person { firstName, secondName } with key class Key { id1, id2 }

If I say DELETE from Person WHERE _key = ? and specify arg via JDBC,
there's no need to do any SELECT - we can just call IgniteCache.remove
on that key.

But if I say DELETE from Person WHERE id1 = 5 then there's no way to
avoid MR - we have to find all keys that interest us first by doing
a SELECT, since we only partly know which keys the user wants
to be affected.

It works in the same way for UPDATE. And I hope that it's clear how
it's different from INSERT - there's no MR by definition (we don't
allow INSERT FROM SELECT in streaming mode).

AGAIN: this all is said only about streaming mode; non streaming mode
does those optimizations too, but it also allows complex conditions,
while streaming mode does not allow them to keep things fast and avoid
MR.

That's the reason why I suggest that we drop UPDATE and DELETE from
DML streaming as they mean messing with those soon-hidden columns.

Still we could optimize stuff like DELETE from Person WHERE id1 = 5
AND id2 = 6 - query involves ALL fields of key AND compares only for
equality AND has no complex expressions - we can construct key
unambiguously and still call remove directly.
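
A minimal sketch of that check, assuming the WHERE clause has already been
parsed into a map of AND-ed "column = value" conditions. DeletePlanner, Plan
and the parameter names are hypothetical, and any clause containing other
operators or expressions is assumed to be routed to map-reduce before this
method is called.

```java
import java.util.Map;
import java.util.Set;

final class DeletePlanner {
    // DIRECT_REMOVE: the key can be built and remove called directly.
    // MAP_REDUCE: the affected keys must first be found with a SELECT.
    enum Plan { DIRECT_REMOVE, MAP_REDUCE }

    // keyFields:  names of all fields of the key class.
    // equalities: column -> value for each AND-ed "column = value" condition.
    static Plan plan(Set<String> keyFields, Map<String, Object> equalities) {
        // The fast path applies only when the conditions name every key
        // field and nothing else; a missing key field or an extra non-key
        // condition means the key alone does not decide the outcome.
        return equalities.keySet().equals(keyFields)
            ? Plan.DIRECT_REMOVE
            : Plan.MAP_REDUCE;
    }
}
```

For a key with fields id1 and id2, the conditions id1 = 5 AND id2 = 6 give
DIRECT_REMOVE, while id1 = 5 alone gives MAP_REDUCE.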

But to me it does not sound like a really great reason to leave UPDATE
and DELETE in DML - the users will have to write some specific queries
to use that while all other stuff will just be declined in that mode.
And, as I said before, UPDATE and DELETE probably don't fit perfectly
with the primary data streamer use cases - after all, modifying existing
stuff is not what the data streamer is about.

And regarding hiding columns: it's unclear how things will look like
for caches like  when we remove _key and _val as long as
tables for such cases currently have nothing but those two columns.

- Alex

>> On Feb 8, 2017 at 11:33 PM, "Dmitriy Setrakyan" <
>> dsetrak...@apache.org> wrote:
>>
>> > Alexander,
>> >
>> > Are you suggesting that currently to execute a simple INSERT for 1 row we
>> > invoke a data streamer on Ignite API? How about an update by a primary
>> key?
>> > Why not execute a simple cache put in either case?
>> >
>> > I think we had a separate thread where we agreed that the streamer should
>> > only be turned on if a certain flag on a JDBC connection is set, no?
>> >
>> > D.
>> >
>> > On Wed, Feb 8, 2017 at 7:00 AM, Alexander Paschenko <
>> > alexander.a.pasche...@gmail.com> wrote:
>> >
>> > > Hello Igniters,
>> > >
>> > > I'd like to raise few questions regarding data streaming via DML
>> > > statements.
>> > >
>> > > Currently, all types of DML statements are supported (INSERT, UPDATE,
>> > > DELETE, MERGE).
>> > >
>> > > UPDATE and DELETE are supported in streaming mode only when their
>> > > WHERE condition is bounded with _key and/or _val columns, and UPDATE
>> > > works only for _val column directly.
>> > >
>> > > Seeing some activity in direction of hiding _key and _val from the
>> > > user as far as possible, these features seem pointless and should not
>> > > be released, what do you think?
>> > >
>> > > Also INSERT in streaming mode currently does not throw errors on
>> > > duplicate keys and silently ignores such new records (as long as it's
>> > > faster than it would work if we'd introduced receiver that would throw
>> > > exceptions) - this can be fixed with additional flag that could
>> > > _optionally_ make INSERT slower but more accurate in semantic.
>> > >
>> > > And MERGE in streaming mode currently not totally accurate in
>> > > semantic, too - on key presence, it will just replace whole value with
>> > > new one thus potentially making values of some concrete columns/fields
>> > > lost - this is analogous to
>> > > https://issues.apache.org/jira/browse/IGNITE-4489, but hardly can be
>> > > fixed 

Re: DML data streaming

2017-02-09 Thread Dmitriy Setrakyan
On Thu, Feb 9, 2017 at 1:53 AM, Alexander Paschenko <
alexander.a.pasche...@gmail.com> wrote:

> Sergey,
>
> Streaming does not make sense for INSERT FROM SELECT as this pattern does
> not match primary use case for streaming (bulk data load to Ignite).
>
> Dima,
>
> No, I suggest that data streamer mode supports full semantic sense of
> INSERT (throw an ex if there's a duplicate of PK) optionally and depending
> on a flag (that is to be introduced). Currently new records are quietly
> ignored on key duplication — it's really just a question of notifying the
> user about duplicate keys in streaming mode.
>

There are several ways to handle it. I would check how other databases
handle it, maybe we can borrow something. At the least, we should log such
errors for now.


> Update by primary key is implemented now, but obviously it involves user
> messing with _key column that we're planning to hide from them in near
> future.
>

You don't have to use _key. Primary key is usually a field in the class, so
you can use a normal column name. In any case, we should remove any usage
of _key before 2.0 is released.


>
> Streaming is turned on via the flag, just as we've agreed in one of prev
> threads. This thread is not about how we turn streaming on but rather about
> semantic correctness of INSERT and MERGE in this mode and about whether we
> need UPDATE and DELETE in it as they do not essentially load new data into
> cache and (_in streaming mode_) make user mess with service columns of _key
> and _val.
>

Again, if user does not have to specify _key on INSERT, then it is very
unclear to me, why user would need to specify _key for UPDATE or DELETE.
Something smells here. Can you please provide an example?


>
> — Alex
> On Feb 8, 2017 at 11:33 PM, "Dmitriy Setrakyan" <
> dsetrak...@apache.org> wrote:
>
> > Alexander,
> >
> > Are you suggesting that currently to execute a simple INSERT for 1 row we
> > invoke a data streamer on Ignite API? How about an update by a primary
> key?
> > Why not execute a simple cache put in either case?
> >
> > I think we had a separate thread where we agreed that the streamer should
> > only be turned on if a certain flag on a JDBC connection is set, no?
> >
> > D.
> >
> > On Wed, Feb 8, 2017 at 7:00 AM, Alexander Paschenko <
> > alexander.a.pasche...@gmail.com> wrote:
> >
> > > Hello Igniters,
> > >
> > > I'd like to raise a few questions regarding data streaming via DML
> > > statements.
> > >
> > > Currently, all types of DML statements are supported (INSERT, UPDATE,
> > > DELETE, MERGE).
> > >
> > > UPDATE and DELETE are supported in streaming mode only when their
> > > WHERE condition is restricted to the _key and/or _val columns, and
> > > UPDATE works only on the _val column directly.
> > >
> > > Given the recent activity toward hiding _key and _val from the user
> > > as far as possible, these features seem pointless and should not be
> > > released. What do you think?
> > >
> > > Also, INSERT in streaming mode currently does not throw errors on
> > > duplicate keys and silently ignores such new records (because this is
> > > faster than it would be with a receiver that throws exceptions).
> > > This can be fixed with an additional flag that could _optionally_
> > > make INSERT slower but semantically more accurate.
> > >
> > > MERGE in streaming mode is currently not fully accurate semantically
> > > either: when the key is present, it just replaces the whole value
> > > with the new one, potentially losing the values of some
> > > columns/fields. This is analogous to
> > > https://issues.apache.org/jira/browse/IGNITE-4489, but it can hardly
> > > be fixed, as a fix would probably hurt performance and would be
> > > unreasonably complex to implement.
> > >
> > > I suggest that we drop everything except INSERT and introduce an
> > > optional flag for its fully correct semantic behavior, as described
> > > above.
> > >
> > > - Alex
> > >
> >
>


Re: DML data streaming

2017-02-09 Thread Alexander Paschenko
Sergey,

Streaming does not make sense for INSERT FROM SELECT, as this pattern does
not match the primary use case for streaming (bulk data load to Ignite).

Dima,

No, I suggest that data streamer mode optionally support the full
semantics of INSERT (throw an exception if there's a duplicate PK),
depending on a flag that is yet to be introduced. Currently new records
are quietly ignored on key duplication, so it's really just a question of
notifying the user about duplicate keys in streaming mode.
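
The two INSERT behaviors discussed here can be illustrated with a minimal
sketch. It is not Ignite code: a plain java.util.Map stands in for the cache,
and the class name, the method name and the "strict" parameter are all
hypothetical stand-ins for the flag that is yet to be introduced.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a plain Map stands in for the cache; this is not Ignite code.
final class StreamingInsertSketch {
    /**
     * Stores the row only if the key is absent.
     * Lenient mode (strict == false): a duplicate is skipped and false is returned.
     * Strict mode (strict == true): a duplicate raises IllegalStateException.
     * Values are assumed to be non-null.
     */
    static <K, V> boolean insert(Map<K, V> cache, K key, V val, boolean strict) {
        V prev = cache.putIfAbsent(key, val);

        if (prev == null)
            return true; // Key was absent, so the row was stored.

        if (strict)
            throw new IllegalStateException("Duplicate key on INSERT: " + key);

        return false; // Duplicate skipped without an error; existing value kept.
    }

    public static void main(String[] args) {
        Map<Long, String> cache = new HashMap<>();

        insert(cache, 1L, "first", false);
        insert(cache, 1L, "second", false); // Skipped without an error.
        System.out.println(cache.get(1L));

        try {
            insert(cache, 1L, "third", true); // Same duplicate, strict mode.
        }
        catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In both modes the existing value survives; the only difference is whether the
caller is told about the duplicate, which is the notification question raised
above.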

Update by primary key is implemented now, but obviously it requires the
user to deal with the _key column, which we're planning to hide from them
in the near future.

Streaming is turned on via the flag, just as we agreed in one of the
previous threads. This thread is not about how we turn streaming on, but
rather about the semantic correctness of INSERT and MERGE in this mode and
about whether we need UPDATE and DELETE in it, as they do not really load
new data into the cache and (_in streaming mode_) force the user to deal
with the service columns _key and _val.

- Alex
On Feb 8, 2017 at 11:33 PM, "Dmitriy Setrakyan" <
dsetrak...@apache.org> wrote:

> Alexander,
>
> Are you suggesting that currently to execute a simple INSERT for 1 row we
> invoke a data streamer on Ignite API? How about an update by a primary key?
> Why not execute a simple cache put in either case?
>
> I think we had a separate thread where we agreed that the streamer should
> only be turned on if a certain flag on a JDBC connection is set, no?
>
> D.
>
> On Wed, Feb 8, 2017 at 7:00 AM, Alexander Paschenko <
> alexander.a.pasche...@gmail.com> wrote:
>
> > Hello Igniters,
> >
> > I'd like to raise a few questions regarding data streaming via DML
> > statements.
> >
> > Currently, all types of DML statements are supported (INSERT, UPDATE,
> > DELETE, MERGE).
> >
> > UPDATE and DELETE are supported in streaming mode only when their
> > WHERE condition is restricted to the _key and/or _val columns, and
> > UPDATE works only on the _val column directly.
> >
> > Given the recent activity toward hiding _key and _val from the user
> > as far as possible, these features seem pointless and should not be
> > released. What do you think?
> >
> > Also, INSERT in streaming mode currently does not throw errors on
> > duplicate keys and silently ignores such new records (because this is
> > faster than it would be with a receiver that throws exceptions).
> > This can be fixed with an additional flag that could _optionally_
> > make INSERT slower but semantically more accurate.
> >
> > MERGE in streaming mode is currently not fully accurate semantically
> > either: when the key is present, it just replaces the whole value
> > with the new one, potentially losing the values of some
> > columns/fields. This is analogous to
> > https://issues.apache.org/jira/browse/IGNITE-4489, but it can hardly
> > be fixed, as a fix would probably hurt performance and would be
> > unreasonably complex to implement.
> >
> > I suggest that we drop everything except INSERT and introduce an
> > optional flag for its fully correct semantic behavior, as described
> > above.
> >
> > - Alex
> >
>


Re: DML data streaming

2017-02-08 Thread Dmitriy Setrakyan
Alexander,

Are you suggesting that currently to execute a simple INSERT for 1 row we
invoke a data streamer on Ignite API? How about an update by a primary key?
Why not execute a simple cache put in either case?

I think we had a separate thread where we agreed that the streamer should
only be turned on if a certain flag on a JDBC connection is set, no?
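
The distinction drawn here, a plain cache put by default and a streamer only
when the connection flag is set, can be sketched as follows. This is a
minimal illustration, not the Ignite implementation: the class, its
"streaming" and "batchSize" parameters and the in-memory maps are all
hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of flag-based dispatch; plain maps stand in for the cache
// and for the streamer's buffer. This is not Ignite code.
final class DmlDispatchSketch {
    private final Map<Long, String> cache = new HashMap<>();
    private final Map<Long, String> buffer = new LinkedHashMap<>();
    private final boolean streaming;
    private final int batchSize;

    DmlDispatchSketch(boolean streaming, int batchSize) {
        this.streaming = streaming;
        this.batchSize = batchSize;
    }

    /** Single-row INSERT: direct put unless the streaming flag is set. */
    void insert(long key, String val) {
        if (!streaming) {
            cache.putIfAbsent(key, val); // Visible immediately.
            return;
        }

        buffer.putIfAbsent(key, val); // Buffered; not yet visible in the cache.

        if (buffer.size() >= batchSize)
            flush();
    }

    /** Moves buffered rows into the cache, keeping existing values on duplicates. */
    void flush() {
        for (Map.Entry<Long, String> e : buffer.entrySet())
            cache.putIfAbsent(e.getKey(), e.getValue());

        buffer.clear();
    }

    /** Number of rows currently visible in the cache. */
    int visibleRows() {
        return cache.size();
    }
}
```

With the flag off, every INSERT is visible right away; with it on, rows only
become visible once a batch fills up or flush() is called, which is why the
mode suits bulk loading rather than single-row statements.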

D.

On Wed, Feb 8, 2017 at 7:00 AM, Alexander Paschenko <
alexander.a.pasche...@gmail.com> wrote:

> Hello Igniters,
>
> I'd like to raise a few questions regarding data streaming via DML
> statements.
>
> Currently, all types of DML statements are supported (INSERT, UPDATE,
> DELETE, MERGE).
>
> UPDATE and DELETE are supported in streaming mode only when their
> WHERE condition is restricted to the _key and/or _val columns, and
> UPDATE works only on the _val column directly.
>
> Given the recent activity toward hiding _key and _val from the user
> as far as possible, these features seem pointless and should not be
> released. What do you think?
>
> Also, INSERT in streaming mode currently does not throw errors on
> duplicate keys and silently ignores such new records (because this is
> faster than it would be with a receiver that throws exceptions).
> This can be fixed with an additional flag that could _optionally_
> make INSERT slower but semantically more accurate.
>
> MERGE in streaming mode is currently not fully accurate semantically
> either: when the key is present, it just replaces the whole value
> with the new one, potentially losing the values of some
> columns/fields. This is analogous to
> https://issues.apache.org/jira/browse/IGNITE-4489, but it can hardly
> be fixed, as a fix would probably hurt performance and would be
> unreasonably complex to implement.
>
> I suggest that we drop everything except INSERT and introduce an
> optional flag for its fully correct semantic behavior, as described
> above.
>
> - Alex
>


Re: DML data streaming

2017-02-08 Thread Sergey Kozlov
Hi Alexander.

What about supporting the *INSERT INTO ... SELECT FROM* statement for
streams? Does it make sense?

On Wed, Feb 8, 2017 at 6:44 PM, Alexander Paschenko <
alexander.a.pasche...@gmail.com> wrote:

> Also, currently it's possible to run SELECTs on "streamed"
> connections. This is probably odd and should not be released either.
> What do you think?
>
> - Alex
>
> 2017-02-08 18:00 GMT+03:00 Alexander Paschenko:
> > Hello Igniters,
> >
> > I'd like to raise a few questions regarding data streaming via DML
> > statements.
> >
> > Currently, all types of DML statements are supported (INSERT, UPDATE,
> > DELETE, MERGE).
> >
> > UPDATE and DELETE are supported in streaming mode only when their
> > WHERE condition is restricted to the _key and/or _val columns, and
> > UPDATE works only on the _val column directly.
> >
> > Given the recent activity toward hiding _key and _val from the user
> > as far as possible, these features seem pointless and should not be
> > released. What do you think?
> >
> > Also, INSERT in streaming mode currently does not throw errors on
> > duplicate keys and silently ignores such new records (because this is
> > faster than it would be with a receiver that throws exceptions).
> > This can be fixed with an additional flag that could _optionally_
> > make INSERT slower but semantically more accurate.
> >
> > MERGE in streaming mode is currently not fully accurate semantically
> > either: when the key is present, it just replaces the whole value
> > with the new one, potentially losing the values of some
> > columns/fields. This is analogous to
> > https://issues.apache.org/jira/browse/IGNITE-4489, but it can hardly
> > be fixed, as a fix would probably hurt performance and would be
> > unreasonably complex to implement.
> >
> > I suggest that we drop everything except INSERT and introduce an
> > optional flag for its fully correct semantic behavior, as described
> > above.
> >
> > - Alex
>



-- 
Sergey Kozlov
GridGain Systems
www.gridgain.com


Re: DML data streaming

2017-02-08 Thread Alexander Paschenko
Also, currently it's possible to run SELECTs on "streamed"
connections. This is probably odd and should not be released either.
What do you think?

- Alex

2017-02-08 18:00 GMT+03:00 Alexander Paschenko:
> Hello Igniters,
>
> I'd like to raise a few questions regarding data streaming via DML
> statements.
>
> Currently, all types of DML statements are supported (INSERT, UPDATE,
> DELETE, MERGE).
>
> UPDATE and DELETE are supported in streaming mode only when their
> WHERE condition is restricted to the _key and/or _val columns, and
> UPDATE works only on the _val column directly.
>
> Given the recent activity toward hiding _key and _val from the user
> as far as possible, these features seem pointless and should not be
> released. What do you think?
>
> Also, INSERT in streaming mode currently does not throw errors on
> duplicate keys and silently ignores such new records (because this is
> faster than it would be with a receiver that throws exceptions).
> This can be fixed with an additional flag that could _optionally_
> make INSERT slower but semantically more accurate.
>
> MERGE in streaming mode is currently not fully accurate semantically
> either: when the key is present, it just replaces the whole value
> with the new one, potentially losing the values of some
> columns/fields. This is analogous to
> https://issues.apache.org/jira/browse/IGNITE-4489, but it can hardly
> be fixed, as a fix would probably hurt performance and would be
> unreasonably complex to implement.
>
> I suggest that we drop everything except INSERT and introduce an
> optional flag for its fully correct semantic behavior, as described
> above.
>
> - Alex
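
The MERGE concern in the quoted message (the whole value is replaced, so
columns missing from the statement are lost) can be illustrated with a small
sketch. It is only an illustration under assumptions: rows are modeled as
column-name-to-value maps, and the class and both method names are
hypothetical rather than part of Ignite.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: rows are column maps and a plain Map stands in for the
// cache. This is not Ignite code.
final class MergeSemanticsSketch {
    /**
     * Whole-value replacement, as described for streaming mode: the new row
     * overwrites the old one, so columns absent from the new row are lost.
     */
    static void mergeReplace(Map<Long, Map<String, Object>> cache, long key,
        Map<String, Object> row) {
        cache.put(key, new HashMap<>(row));
    }

    /**
     * Column-wise merge: only the supplied columns are overwritten; other
     * columns of an existing row keep their values.
     */
    static void mergeColumns(Map<Long, Map<String, Object>> cache, long key,
        Map<String, Object> row) {
        cache.computeIfAbsent(key, k -> new HashMap<>()).putAll(row);
    }

    public static void main(String[] args) {
        Map<String, Object> full = new HashMap<>();
        full.put("name", "Ann");
        full.put("age", 30);

        Map<String, Object> partial = new HashMap<>();
        partial.put("age", 31);

        Map<Long, Map<String, Object>> replaced = new HashMap<>();
        mergeReplace(replaced, 1L, full);
        mergeReplace(replaced, 1L, partial);
        System.out.println(replaced.get(1L).containsKey("name")); // The name column is gone.

        Map<Long, Map<String, Object>> merged = new HashMap<>();
        mergeColumns(merged, 1L, full);
        mergeColumns(merged, 1L, partial);
        System.out.println(merged.get(1L).get("name")); // The name column is preserved.
    }
}
```

The column-wise variant has to read the existing row before writing, which
hints at the performance cost mentioned in the quoted message.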