subject:"Re\: Using User Defined Functions in UPDATE queries"

Re: Using User Defined Functions in UPDATE queries

2016-03-11 Thread Kim Liu

Thank you for the clarification.
—Kim

From: Sylvain Lebresne <sylv...@datastax.com<mailto:sylv...@datastax.com>>
Reply-To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" 
<user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
Date: Friday, March 11, 2016 at 10:05
To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" 
<user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
Subject: Re: Using User Defined Functions in UPDATE queries

On Fri, Mar 11, 2016 at 5:09 PM, Kim Liu 
<k...@edgewaternetworks.com<mailto:k...@edgewaternetworks.com>> wrote:
Just for sake of clarification, then, what is the use-case for having UDFs in 
an UPDATE?

Honestly, it's merely there for convenience when you use things like cqlsh for 
instance.

Re: Using User Defined Functions in UPDATE queries

2016-03-11 Thread Sylvain Lebresne

On Fri, Mar 11, 2016 at 5:09 PM, Kim Liu <k...@edgewaternetworks.com> wrote:

> Just for sake of clarification, then, what is the use-case for having UDFs
> in an UPDATE?
>

Honestly, it's merely there for convenience when you use things like cqlsh
for instance.


>
> If they cannot read data from the data store, then all of the parameters
> to the UDF must be supplied by the client, correct?
>

Correct (for UPDATE at least).


>
> If the client has all the parameters, the client could perform the
> equivalent of the UDF on the client side, first, then send the results to
> the server, instead of pushing the computation work onto the server.
>

Absolutely.


>  So I am curious as to what one is supposed to use a UDF in an UPDATE for.
>

Again, mainly a convenience.

The main end goal for UDF is for use in SELECT. It's already potentially
slightly useful to save server->client bandwidth as you could imagine to do:
  SELECT compute_md5(image) FROM images WHERE ...;
(assuming of course that trading cpu server side for bandwidth is a good
idea)

Though their most useful use will probably be in WHERE clause, for things
like:
  SELECT * FROM foo WHERE sqrt(x) = 3;
and we also plan to have functional indexes to go with that.
However, I'll not right away that those last use case are not yet
supported, but they will be eventually. Adding UDF was more of the first
incremental step, but their most interesting use case is arguably not yet
supported. But as far as UPDATE is concerned, we'll probably never support
them as I said since that would require a read-before-write (except
possibly for LWT which do that read-before-write (at great cost) anyway,
we'll see).


>
>
>
> Long-winded explanation of the use-case I was poking at using UPDATE UDFs
> for below for the morbidly curious.
>
>
>
>
> That morbidly curious, huh?
>
> The scenario is, roughly, that the application receives a set of data
> which is broken up over, say, four messages (A,B,C,D).  However, the
> messages can arrive in any order, possibly with duplicates, and the data
> set is not complete until the all four messages are received.  There are
> multiple message receivers in order to scale to the volume of messages
> coming in, so each of the four messages per data set could arrive at any
> receiver (in any chronological pattern), and each receiving station would
> then insert the partial data into Cassandra.
>
> I looked at the Cassandra SET implementation, thinking that I could just
> add ‘A’, ‘B’, ‘C’, ‘D’ (or 1,2,3,4) to a set with a secondary index.  Then
> periodically search for where the set had all elements to spot which rows
> had a complete data set ready for processing.  However, there does not
> appear to be an equality check for SETs.  (Adding elements to a set is
> another place where UPDATE appears to allow for the “x = x 
> ” pattern which added to my confusion about using a UDF in the
> UPDATE.)
>
> So instead of using sets, the idea was to have a UDF perform a bit-wise OR
> operation.  Roughly:
>   CREATE FUNCTION bitwise_or( a int, b int ) CALLED ON NULL INPUT RETURNS
> int LANGUAGE java AS 'return Integer.valueOf((a == null ? 0 : a)|(b == null
> ? 0 : b));';
>
> Then as each message segment came in, I had intended, roughly:
>   UPDATE MessageData SET messageComplete = bitwise_or(messageComplete,2),
> data2=… ;
>   UPDATE MessageData SET messageComplete = bitwise_or(messageComplete,1),
> data1=… ;
>   UPDATE MessageData SET messageComplete = bitwise_or(messageComplete,8),
> data4=… ;
>   UPDATE MessageData SET messageComplete = bitwise_or(messageComplete,4),
> data3=… ;
>
> Then, with a secondary index on ‘messageComplete’, periodically scrape out
> all rows where messageComplete was equal to 15.  (At most, sixteen unique
> values in the secondary index.)  (And use a TTL to expire messages that did
> not eventually complete, etc.  Boilerplate infrastructure, etc.)
>
> This was based upon my incorrect assumption about UPDATE UDFs, since this
> looked like an optimal way to avoid having all the clients perform
> read-updates patterns and worrying about the clients stepping on each
> others data, as well as handling cases where duplicate messages were
> received by different receivers.  So it’s starting to look like I might
> need to use something else to perform the correlation between messages.
>
> —Kim
>
> From: Sylvain Lebresne <sylv...@datastax.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Friday, March 11, 2016 at 00:35
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Re: Using User Defined Functions in UPDATE queries
>
> UDF are usable in UPDATE statement as actually trying them show

Re: Using User Defined Functions in UPDATE queries

2016-03-11 Thread Kim Liu

Just for sake of clarification, then, what is the use-case for having UDFs in 
an UPDATE?

If they cannot read data from the data store, then all of the parameters to the 
UDF must be supplied by the client, correct?

If the client has all the parameters, the client could perform the equivalent 
of the UDF on the client side, first, then send the results to the server, 
instead of pushing the computation work onto the server.  So I am curious as to 
what one is supposed to use a UDF in an UPDATE for.



Long-winded explanation of the use-case I was poking at using UPDATE UDFs for 
below for the morbidly curious.




That morbidly curious, huh?

The scenario is, roughly, that the application receives a set of data which is 
broken up over, say, four messages (A,B,C,D).  However, the messages can arrive 
in any order, possibly with duplicates, and the data set is not complete until 
the all four messages are received.  There are multiple message receivers in 
order to scale to the volume of messages coming in, so each of the four 
messages per data set could arrive at any receiver (in any chronological 
pattern), and each receiving station would then insert the partial data into 
Cassandra.

I looked at the Cassandra SET implementation, thinking that I could just add 
‘A’, ‘B’, ‘C’, ‘D’ (or 1,2,3,4) to a set with a secondary index.  Then 
periodically search for where the set had all elements to spot which rows had a 
complete data set ready for processing.  However, there does not appear to be 
an equality check for SETs.  (Adding elements to a set is another place where 
UPDATE appears to allow for the “x = x  ” pattern which added 
to my confusion about using a UDF in the UPDATE.)

So instead of using sets, the idea was to have a UDF perform a bit-wise OR 
operation.  Roughly:
  CREATE FUNCTION bitwise_or( a int, b int ) CALLED ON NULL INPUT RETURNS int 
LANGUAGE java AS 'return Integer.valueOf((a == null ? 0 : a)|(b == null ? 0 : 
b));';

Then as each message segment came in, I had intended, roughly:
  UPDATE MessageData SET messageComplete = bitwise_or(messageComplete,2), 
data2=… ;
  UPDATE MessageData SET messageComplete = bitwise_or(messageComplete,1), 
data1=… ;
  UPDATE MessageData SET messageComplete = bitwise_or(messageComplete,8), 
data4=… ;
  UPDATE MessageData SET messageComplete = bitwise_or(messageComplete,4), 
data3=… ;

Then, with a secondary index on ‘messageComplete’, periodically scrape out all 
rows where messageComplete was equal to 15.  (At most, sixteen unique values in 
the secondary index.)  (And use a TTL to expire messages that did not 
eventually complete, etc.  Boilerplate infrastructure, etc.)

This was based upon my incorrect assumption about UPDATE UDFs, since this 
looked like an optimal way to avoid having all the clients perform read-updates 
patterns and worrying about the clients stepping on each others data, as well 
as handling cases where duplicate messages were received by different 
receivers.  So it’s starting to look like I might need to use something else to 
perform the correlation between messages.

—Kim

From: Sylvain Lebresne <sylv...@datastax.com<mailto:sylv...@datastax.com>>
Reply-To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" 
<user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
Date: Friday, March 11, 2016 at 00:35
To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" 
<user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
Subject: Re: Using User Defined Functions in UPDATE queries

UDF are usable in UPDATE statement as actually trying them shows, it's just the 
documented grammar that needs fixing.

But as far as doing something like:
  UPDATE test_table SET data=max_int(data,5) WHERE idx='abc’;
this is indeed *not* supported and likely never will. One big pillar of C* 
design is that normal writes like this don't do a read-before-write, both for 
performance and because of consistency constraints, so we can't have update 
depend on the previous value in any way.
I'll note that maybe that make UDF useless for you and if so, I'm sorry, but 
you just can't use UDF in C* for that and you'd have to do a manual 
read-before-write client side to achieve this.

For the sake of avoiding confusion, I will not that we do allow:
  UPDATE test_table SET c = c + 1 WHERE idx='abc';
if c is a counter, but that's a very special case. Counters have a completely 
separate path and implementation and do have a read-before-write (and are 
slower than normal update as a result).

Re: Using User Defined Functions in UPDATE queries

2016-03-11 Thread Sylvain Lebresne

UDF are usable in UPDATE statement as actually trying them shows, it's just
the documented grammar that needs fixing.

But as far as doing something like:
  UPDATE test_table SET data=max_int(data,5) WHERE idx='abc’;
this is indeed *not* supported and likely never will. One big pillar of C*
design is that normal writes like this don't do a read-before-write, both
for performance and because of consistency constraints, so we can't have
update depend on the previous value in any way.
I'll note that maybe that make UDF useless for you and if so, I'm sorry,
but you just can't use UDF in C* for that and you'd have to do a manual
read-before-write client side to achieve this.

For the sake of avoiding confusion, I will not that we do allow:
  UPDATE test_table SET c = c + 1 WHERE idx='abc';
if c is a counter, but that's a very special case. Counters have a
completely separate path and implementation and do have a read-before-write
(and are slower than normal update as a result).

On Thu, Mar 10, 2016 at 11:11 PM, Kim Liu <k...@edgewaternetworks.com>
wrote:

> It does sounds like the use of UDF in UPDATE is in an ambiguous state at
> the moment, then.  The document grammar says they can’t be used, but the
> document examples say they can, and the server will execute them, but it
> can’t execute them in a useful way (i.e. no row supplied data.)
>
> So essentially not useable at the moment, regardless of intent.
>
> Thanks,
> —Kim
>
> From: DuyHai Doan <doanduy...@gmail.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Thursday, March 10, 2016 at 14:03
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Re: Using User Defined Functions in UPDATE queries
>
> Surely an error because the grammar definition for UPDATE does not mention
> any function call:
> [SNIPPED]
>
> Unless the grammar in the doc is itself not up-to-date
>
>>
>>>

Re: Using User Defined Functions in UPDATE queries

2016-03-10 Thread Kim Liu

It does sounds like the use of UDF in UPDATE is in an ambiguous state at the 
moment, then.  The document grammar says they can’t be used, but the document 
examples say they can, and the server will execute them, but it can’t execute 
them in a useful way (i.e. no row supplied data.)

So essentially not useable at the moment, regardless of intent.

Thanks,
—Kim

From: DuyHai Doan <doanduy...@gmail.com<mailto:doanduy...@gmail.com>>
Reply-To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" 
<user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
Date: Thursday, March 10, 2016 at 14:03
To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" 
<user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
Subject: Re: Using User Defined Functions in UPDATE queries


Surely an error because the grammar definition for UPDATE does not mention any 
function call:

[SNIPPED]

Unless the grammar in the doc is itself not up-to-date

Re: Using User Defined Functions in UPDATE queries

2016-03-10 Thread DuyHai Doan

Surely an error because the grammar definition for UPDATE does not mention
any function call:

 ::= UPDATE  ( USING  ( AND  )* )?
SET  ( ','  )*
WHERE  ( IF  ( AND condition )* )?

 ::=  '='  |
 '='  ('+' | '-') ( |  |
) |
 '='  '+'  |
 '['  ']' '=' 

 ::=  '='  |  '['  ']' '='


 ::=  ( AND  )*

 ::=  '='  |  IN '(' (  ( ','
 )* )? ')' |  IN '?'

 ::= TIMESTAMP  | TTL 

Unless the grammar in the doc is itself not up-to-date
Le 10 mars 2016 22:31, "Kim Liu" <k...@edgewaternetworks.com> a écrit :

> Um, I’m not entirely sure how I misread it, since this was copy-pasted
> from the document:
>   UPDATE atable SET col = some_function(?) …;
>
> So the document examples certainly seem to support the use of UDF in
> UPDATE.  I suppose the document may be more erroneous in its writing than
> I in its misreading.
>
> Additionally, this statement works in cqlsh (presuming max_int() is a UDF):
>   UPDATE test_table SET data=max_int(3,4) WHERE idx='abc’;
>
> So, if the grammar is not supposed to allow this, then there is a bug
> somewhere because in 3.3 it certainly seems to be parsed and executed
> without complaint.
>
> —Kim
>
>
> From: DuyHai Doan <doanduy...@gmail.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Thursday, March 10, 2016 at 13:21
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Re: Using User Defined Functions in UPDATE queries
>
> You have misread the CQL doc given in the link. According to CQL update
> grammar it's not possible to use UDF. I see UDF only allowed in select
> clause...
>
>>
>>

Re: Using User Defined Functions in UPDATE queries

2016-03-10 Thread Kim Liu

Um, I’m not entirely sure how I misread it, since this was copy-pasted from the 
document:
  UPDATE atable SET col = some_function(?) …;

So the document examples certainly seem to support the use of UDF in UPDATE.  I 
suppose the document may be more erroneous in its writing than I in its 
misreading.

Additionally, this statement works in cqlsh (presuming max_int() is a UDF):
  UPDATE test_table SET data=max_int(3,4) WHERE idx='abc’;

So, if the grammar is not supposed to allow this, then there is a bug somewhere 
because in 3.3 it certainly seems to be parsed and executed without complaint.

—Kim


From: DuyHai Doan <doanduy...@gmail.com<mailto:doanduy...@gmail.com>>
Reply-To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" 
<user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
Date: Thursday, March 10, 2016 at 13:21
To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" 
<user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
Subject: Re: Using User Defined Functions in UPDATE queries


You have misread the CQL doc given in the link. According to CQL update grammar 
it's not possible to use UDF. I see UDF only allowed in select clause...

Re: Using User Defined Functions in UPDATE queries

2016-03-10 Thread DuyHai Doan

You have misread the CQL doc given in the link. According to CQL update
grammar it's not possible to use UDF. I see UDF only allowed in select
clause...
Le 10 mars 2016 22:07, "Kim Liu"  a écrit :

> Hello -
> I am experimenting with User Defined Functions in Cassandra (3.3) and I am
> a bit puzzled by a problem I am having when testing them with cqlsh.  I
> have tried to find the answers online, but have not had any luck so far.
>
> According to http://cassandra.apache.org/doc/cql3/CQL.html it looks like
> a UDF (User Defined Function) should be useable in an update:
>   UPDATE atable SET col = some_function(?) ...;
>
> Current test environment:
>   cassandra@cqlsh:test_space> show version;
>   [cqlsh 5.0.1 | Cassandra 3.3 | CQL spec 3.4.0 | Native protocol v4]
>
>
> However, using a value from the row in question as an argument to the UDF
> does not seem to work.  Here is how I am testing:
>   CREATE KEYSPACE test_space WITH replication = {'class':
> 'SimpleStrategy', 'replication_factor': '1'};
>   USE test_space;
>   CREATE TABLE test_table (idx text, data int, PRIMARY KEY( idx ));
>   INSERT INTO test_table(idx,data) VALUES( 'abc', 1 );
>   CREATE FUNCTION max_int( a int, b int ) CALLED ON NULL INPUT RETURNS int
> LANGUAGE java AS 'return (a == null || b == null) ? 0 : ( a > b ? a : b);';
>   SELECT * FROM test_table;
>   UPDATE test_table SET data=max_int(3,4) WHERE idx='abc’;   This works
>   SELECT * FROM test_table;
>   UPDATE test_table SET data=max_int(data,5) WHERE idx='abc’;  This
> does not work
>
> Results:
>   SyntaxException:  message="line 1:39 no viable alternative at input ',' (UPDATE test_table
> SET data=max_int([data],...)">
>
>
>
> So I am wondering if I am missing something here - or if this is just a
> problem with testing with cqlsh.  If a UDF is not allowed to be passed a
> value from the row that is being updated, then what exactly is the use case
> for having a UDF in an UPDATE?  If all arguments to the UDF have to be
> supplied by the client, then the client might as well perform the
> function.  Using a UDF in an UPDATE would seem to make the most sense only
> if the row data could be accessed, since it could be used to bypass the
> need for a SELECT by a client to read the values and perform the operation
> on them before doing the UPDATE.
>
>
> Thank you,
> Kim Liu
>
> 
> Kim Liu
> Sr. Software Engineer
> k...@edgewaternetworks.com
> ___
> “Nothing in the world is more dangerous than sincere ignorance and
> conscientious stupidity.” -- Martin Luther King Jr.
>
>
>
>
>

Re: Using User Defined Functions in UPDATE queries

Re: Using User Defined Functions in UPDATE queries

Re: Using User Defined Functions in UPDATE queries

Re: Using User Defined Functions in UPDATE queries

Re: Using User Defined Functions in UPDATE queries

Re: Using User Defined Functions in UPDATE queries

Re: Using User Defined Functions in UPDATE queries

Re: Using User Defined Functions in UPDATE queries

8 matches

Site Navigation

Mail list logo

Footer information