[jira] [Commented] (CASSANDRA-15064) Wrong ordering for timeuuid fields

2019-04-01 Thread Andreas Andersen (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806602#comment-16806602
 ] 

Andreas Andersen commented on CASSANDRA-15064:
--

As for our use case we can simply use the gocql.TimeUUIDWith function 
([https://github.com/gocql/gocql/blob/master/uuid.go#L132]) and supply our own 
clock sequence. We will never have more than 2^14 images created on a single 
second.

When it comes to the cassandra part, simply changing the underlying field from 
a timeuuid to a uuid will solve all of our problems. It sorts in the way we 
expect it to.

Having the behavior for timeuuid fields mentioned in the cassandra 
documentation would have been sufficient for solving our problems - we would 
simply have gone with the uuid datatype from the start. So simply adding more 
information to the docs seems like a good solution to this issue :)

> Wrong ordering for timeuuid fields
> --
>
> Key: CASSANDRA-15064
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15064
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Andreas Andersen
>Assignee: Jon Meredith
>Priority: Normal
> Attachments: example.cql
>
>
> Hi!
> We're seeing some strange behavior for the ordering of timeuuid fields. They 
> seem to be sorted in the wrong order when the clock_seq_low field in a 
> timeuuid goes from 7f to 80. Consider the following example:
> {noformat}
> cqlsh:test> show version; 
> [cqlsh 5.0.1 | Cassandra 3.11.4 | CQL spec 3.4.4 | Native protocol v4] 
> cqlsh:test> CREATE TABLE t ( 
>     ... partition   int, 
>     ... t   timeuuid, 
>     ... i   int, 
>     ...  
>     ... PRIMARY KEY(partition, t) 
>     ... ) 
>     ... WITH CLUSTERING ORDER BY(t ASC); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b57e-f0def1d0755e, 1); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b57f-f0def1d0755e, 2); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b580-f0def1d0755e, 3); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b581-f0def1d0755e, 4); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b582-f0def1d0755e, 5); 
> cqlsh:test> SELECT * FROM t WHERE partition = 1 ORDER BY t ASC; 
>  
>  partition | t    | i 
> ---+--+--- 
>  1 | 84e2c963-4ef9-11e9-b580-f0def1d0755e | 3 
>  1 | 84e2c963-4ef9-11e9-b581-f0def1d0755e | 4 
>  1 | 84e2c963-4ef9-11e9-b582-f0def1d0755e | 5 
>  1 | 84e2c963-4ef9-11e9-b57e-f0def1d0755e | 1 
>  1 | 84e2c963-4ef9-11e9-b57f-f0def1d0755e | 2 
>  
> (5 rows) 
> cqlsh:test>
> {noformat}
> The expected behavior is that the rows are returned in the same order as they 
> were inserted (we inserted them with their clustering key in an ascending 
> order). Instead, the order "wraps" in the middle.
> This issue only arises when the 9th octet (clock_seq_low) in the uuid goes 
> from 7f to 80. A guess would be that the comparison is implemented as a 
> signed integer instead of an unsigned integer, as 0x7f = 127 and 0x80 = -128. 
> According to the RFC, the field should be treated as an unsigned integer: 
> [https://tools.ietf.org/html/rfc4122#section-4.1.2]
> Changing the field from a timeuuid to a uuid gives the expected correct 
> behavior:
> {noformat}
> cqlsh:test> CREATE TABLE t ( 
>     ... partition   int, 
>     ... t   uuid, 
>     ... i   int, 
>     ...  
>     ... PRIMARY KEY(partition, t) 
>     ... ) 
>     ... WITH CLUSTERING ORDER BY(t ASC); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b57e-f0def1d0755e, 1); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b57f-f0def1d0755e, 2); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b580-f0def1d0755e, 3); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b581-f0def1d0755e, 4); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b582-f0def1d0755e, 5); 
> cqlsh:test> SELECT * FROM t WHERE partition = 1 ORDER BY t ASC; 
>  
>  partition | t    | i 
> ---+--+--- 
>  1 | 84e2c963-4ef9-11e9-b57e-f0def1d0755e | 1 
>  1 | 84e2c963-4ef9-11e9-b57f-f0def1d0755e | 2 
>  1 | 84e2c963-4ef9-11e9-b580-f0def1d0755e | 3 
>  1 | 84e2c963-4ef9-11e9-b581-f0def1d0755e | 4 
>  1 | 84e2c963-4ef9-11e9-b582-f0def1d0755e | 5 
>  
> (5 rows) 
> cqlsh:test>{noformat}
>  
>  



--

[jira] [Commented] (CASSANDRA-15064) Wrong ordering for timeuuid fields

2019-03-29 Thread Andreas Andersen (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804990#comment-16804990
 ] 

Andreas Andersen commented on CASSANDRA-15064:
--

The problem is a real world problem for us, and the timeuuid's in the example 
are not synthetic. We use the gocql client library for golang, which generates 
timeuuid's in this way. It's the function at 
https://github.com/gocql/gocql/blob/master/uuid.go#L121 we use to generate our 
timeuuid's from timestamps, and we rely on the sequence number to be 
incremented on each call.

Our use case is sorting images on a timeline. Our table more or less look like:

{noformat}
CREATE TABLE timeline (
user_id  uuid,
image_timestamp  timeuuid,
image_id uuid,

-- Lots of other metadata fields here...

PRIMARY KEY(user_id, image_timestamp)
)
WITH CLUSTERING ORDER BY(t ASC);
{noformat}

The images are sorted by the time that they were taken. However, images taken 
quickly in series by the same camera might all have equal timestamps. In this 
case, we want to order them by some other property to show them in the correct 
chronological order. As of today we use the filename for this. In order to keep 
the table structure simple, we just sort them in the correct order before 
inserting them, relying on each newly generated timeuuid being greater than the 
last one. Our client side code more or less look like:

{noformat}
type image struct {
user_id uuid
image_iduuid
taken_attimestamp
filenamestring
}

images := []image

// Sort images in the order we want to store them in cassandra
sort(images, [taken_at, filename])

// Now give every image a unique timeuuid and insert them into cassandra. We
// rely on the timeuuid generator to generate id's in ascending order in case of
// multiple images with the same timestamp.
for i := range images {
image_timestamp := uuidFromTime(i.taken_at)
queryCassandra(
"INSERT INTO timeline(user_id, image_timestamp, image_id) VALUES(?, ?, 
?)",
i.user_id, image_timestamp, i.image_id
)
}
{noformat}

Our use case is kind of special, since we at the time of insertion have all 
images that will ever be created for a single user, which allows us to insert 
them in this way. Creating a table in the following way would give us the same 
result:

{noformat}
CREATE TABLE timeline (
user_id  uuid,
image_timestamp  timestamp,
sort_number  int,
image_id uuid,

-- Lots of other metadata fields here...

PRIMARY KEY(user_id, image_timestamp, sort_number)
)
WITH CLUSTERING ORDER BY(t ASC);
{noformat}

In practicality this is not really a big issue for us. We can just change the 
column type from timeuuid to a uuid, and everything will behave the way that we 
expect. While reading the code for gocql.UUIDFromTime() I also realize the our 
implementation is broken, as the sequence number will wrap at times, since it's 
only 14 bits long :)

> Wrong ordering for timeuuid fields
> --
>
> Key: CASSANDRA-15064
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15064
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Andreas Andersen
>Assignee: Jon Meredith
>Priority: Normal
> Attachments: example.cql
>
>
> Hi!
> We're seeing some strange behavior for the ordering of timeuuid fields. They 
> seem to be sorted in the wrong order when the clock_seq_low field in a 
> timeuuid goes from 7f to 80. Consider the following example:
> {noformat}
> cqlsh:test> show version; 
> [cqlsh 5.0.1 | Cassandra 3.11.4 | CQL spec 3.4.4 | Native protocol v4] 
> cqlsh:test> CREATE TABLE t ( 
>     ... partition   int, 
>     ... t   timeuuid, 
>     ... i   int, 
>     ...  
>     ... PRIMARY KEY(partition, t) 
>     ... ) 
>     ... WITH CLUSTERING ORDER BY(t ASC); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b57e-f0def1d0755e, 1); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b57f-f0def1d0755e, 2); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b580-f0def1d0755e, 3); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b581-f0def1d0755e, 4); 
> cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
> 84e2c963-4ef9-11e9-b582-f0def1d0755e, 5); 
> cqlsh:test> SELECT * FROM t WHERE partition = 1 ORDER BY t ASC; 
>  
>  partition | t    | i 
> ---+--+--- 
>  1 | 84e2c963-4ef9-11e9-b580-f0def1d0755e | 3 
>  1 | 84e2c963-4ef9-11e9-b581-f0def1d0755e | 4 
>  1 | 

[jira] [Created] (CASSANDRA-15064) Wrong ordering for timeuuid fields

2019-03-25 Thread Andreas Andersen (JIRA)
Andreas Andersen created CASSANDRA-15064:


 Summary: Wrong ordering for timeuuid fields
 Key: CASSANDRA-15064
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15064
 Project: Cassandra
  Issue Type: Bug
Reporter: Andreas Andersen
 Attachments: example.cql

Hi!

We're seeing some strange behavior for the ordering of timeuuid fields. They 
seem to be sorted in the wrong order when the clock_seq_low field in a timeuuid 
goes from 7f to 80. Consider the following example:
{noformat}
cqlsh:test> show version; 
[cqlsh 5.0.1 | Cassandra 3.11.4 | CQL spec 3.4.4 | Native protocol v4] 
cqlsh:test> CREATE TABLE t ( 
    ... partition   int, 
    ... t   timeuuid, 
    ... i   int, 
    ...  
    ... PRIMARY KEY(partition, t) 
    ... ) 
    ... WITH CLUSTERING ORDER BY(t ASC); 
cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
84e2c963-4ef9-11e9-b57e-f0def1d0755e, 1); 
cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
84e2c963-4ef9-11e9-b57f-f0def1d0755e, 2); 
cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
84e2c963-4ef9-11e9-b580-f0def1d0755e, 3); 
cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
84e2c963-4ef9-11e9-b581-f0def1d0755e, 4); 
cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
84e2c963-4ef9-11e9-b582-f0def1d0755e, 5); 
cqlsh:test> SELECT * FROM t WHERE partition = 1 ORDER BY t ASC; 
 
 partition | t    | i 
---+--+--- 
 1 | 84e2c963-4ef9-11e9-b580-f0def1d0755e | 3 
 1 | 84e2c963-4ef9-11e9-b581-f0def1d0755e | 4 
 1 | 84e2c963-4ef9-11e9-b582-f0def1d0755e | 5 
 1 | 84e2c963-4ef9-11e9-b57e-f0def1d0755e | 1 
 1 | 84e2c963-4ef9-11e9-b57f-f0def1d0755e | 2 
 
(5 rows) 
cqlsh:test>
{noformat}
The expected behavior is that the rows are returned in the same order as they 
were inserted (we inserted them with their clustering key in an ascending 
order). Instead, the order "wraps" in the middle.

This issue only arises when the 9th octet (clock_seq_low) in the uuid goes from 
7f to 80. A guess would be that the comparison is implemented as a signed 
integer instead of an unsigned integer, as 0x7f = 127 and 0x80 = -128. 
According to the RFC, the field should be treated as an unsigned integer: 
[https://tools.ietf.org/html/rfc4122#section-4.1.2]

Changing the field from a timeuuid to a uuid gives the expected correct 
behavior:
{noformat}
cqlsh:test> CREATE TABLE t ( 
    ... partition   int, 
    ... t   uuid, 
    ... i   int, 
    ...  
    ... PRIMARY KEY(partition, t) 
    ... ) 
    ... WITH CLUSTERING ORDER BY(t ASC); 
cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
84e2c963-4ef9-11e9-b57e-f0def1d0755e, 1); 
cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
84e2c963-4ef9-11e9-b57f-f0def1d0755e, 2); 
cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
84e2c963-4ef9-11e9-b580-f0def1d0755e, 3); 
cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
84e2c963-4ef9-11e9-b581-f0def1d0755e, 4); 
cqlsh:test> INSERT INTO t(partition, t, i) VALUES(1, 
84e2c963-4ef9-11e9-b582-f0def1d0755e, 5); 
cqlsh:test> SELECT * FROM t WHERE partition = 1 ORDER BY t ASC; 
 
 partition | t    | i 
---+--+--- 
 1 | 84e2c963-4ef9-11e9-b57e-f0def1d0755e | 1 
 1 | 84e2c963-4ef9-11e9-b57f-f0def1d0755e | 2 
 1 | 84e2c963-4ef9-11e9-b580-f0def1d0755e | 3 
 1 | 84e2c963-4ef9-11e9-b581-f0def1d0755e | 4 
 1 | 84e2c963-4ef9-11e9-b582-f0def1d0755e | 5 
 
(5 rows) 
cqlsh:test>{noformat}
 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org