[jira] [Comment Edited] (CASSANDRA-19270) Incorrect error type on oversized compound partition key

2024-07-29 Thread Nadav Har'El (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17869336#comment-17869336
 ] 

Nadav Har'El edited comment on CASSANDRA-19270 at 7/29/24 12:21 PM:


I checked now on cassandra-5.0-rc1, and the first bug that I reported in this
issue wasn't fixed: trying to insert a compound partition key where one of the
components is 65 KB gives, instead of InvalidRequest, a NoHostAvailable with the
string "'H' format requires 0 <= number <= 65535".

The second problem I reported in the follow-up comment (with
IllegalArgumentException) indeed seems to be fixed in Cassandra 5.


was (Author: nyh):
I checked now on cassandra-5.0-rc1, and the first bug that I reported in this
issue wasn't fixed: trying to insert a compound partition key where one of the
components is 65 KB gives, instead of InvalidRequest, a NoHostAvailable with the
string "'H' format requires 0 <= number <= 65535".

The second problem I reported in the follow-up comment (with
IllegalArgumentException) I can no longer reproduce.

> Incorrect error type on oversized compound partition key
> 
>
> Key: CASSANDRA-19270
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19270
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Nadav Har'El
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
>
> Cassandra limits key lengths (partition and clustering) to 64 KB. If a user 
> attempts to INSERT data with a partition key or clustering key exceeding that 
> size, the result is a clear InvalidRequest error with a message like "Key 
> length of 66560 is longer than maximum of 65535".
> There is one exception: If you have a *compound* partition key (i.e., two or 
> more partition key components) and attempt to write one of them larger than 
> 64 KB, then instead of an orderly InvalidRequest like you get when there is 
> just one component, you get a NoHostAvailable with the message: 
> "error("'H' format requires 0 <= number <= 65535")". This is not 
> only uglier, it can also confuse the Cassandra driver into retrying this request, 
> because it doesn't realize that the request itself is broken and there is no 
> point in repeating it.
> Interestingly, if there are multiple clustering key columns, this problem 
> doesn't happen: we still get a nice InvalidRequest if any one of these is 
> more than 64 KB.






[jira] [Commented] (CASSANDRA-19270) Incorrect error type on oversized compound partition key

2024-07-29 Thread Nadav Har'El (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17869336#comment-17869336
 ] 

Nadav Har'El commented on CASSANDRA-19270:
--

I checked now on cassandra-5.0-rc1, and the first bug that I reported in this
issue wasn't fixed: trying to insert a compound partition key where one of the
components is 65 KB gives, instead of InvalidRequest, a NoHostAvailable with the
string "'H' format requires 0 <= number <= 65535".

The second problem I reported in the follow-up comment (with
IllegalArgumentException) I can no longer reproduce.

> Incorrect error type on oversized compound partition key
> 
>
> Key: CASSANDRA-19270
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19270
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Nadav Har'El
>Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
>
> Cassandra limits key lengths (partition and clustering) to 64 KB. If a user 
> attempts to INSERT data with a partition key or clustering key exceeding that 
> size, the result is a clear InvalidRequest error with a message like "Key 
> length of 66560 is longer than maximum of 65535".
> There is one exception: If you have a *compound* partition key (i.e., two or 
> more partition key components) and attempt to write one of them larger than 
> 64 KB, then instead of an orderly InvalidRequest like you get when there is 
> just one component, you get a NoHostAvailable with the message: 
> "error("'H' format requires 0 <= number <= 65535")". This is not 
> only uglier, it can also confuse the Cassandra driver into retrying this request, 
> because it doesn't realize that the request itself is broken and there is no 
> point in repeating it.
> Interestingly, if there are multiple clustering key columns, this problem 
> doesn't happen: we still get a nice InvalidRequest if any one of these is 
> more than 64 KB.






[jira] [Updated] (CASSANDRA-19795) In SAI, intersecting two indexes doesn't require ALLOW FILTERING

2024-07-24 Thread Nadav Har'El (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nadav Har'El updated CASSANDRA-19795:
-
Description: 
As explained many years ago in 
https://issues.apache.org/jira/browse/CASSANDRA-5470, when a query involves 
intersecting two secondary indexes, e.g., "WHERE x=1 AND y=2" where "x" and "y" 
are two indexed columns, ALLOW FILTERING is required.

I verified that this is still the case today, in Cassandra 5.0-rc1.

But if you use SAI instead of the classic secondary index, suddenly ALLOW 
FILTERING is not required.

I think this is a regression. Even if SAI has a more efficient way of 
intersecting the posting list from two indexes (does it?), in the worst case 
this doesn't help: For example, consider a table with a million rows, half have 
x=1 and the other half have y=2 and just one row has both. Now, a query for 
"WHERE x=1 AND y=2" needs to process half a million rows just to produce one 
result. This is ALLOW FILTERING par excellence.

  was:
As explained many years ago in 
https://issues.apache.org/jira/browse/CASSANDRA-5470, when a query involves 
intersecting two secondary indexes, e.g., "WHERE x=1 AND y=2" where "x" and "y" 
are two indexed columns, ALLOW FILTERING is required.

I verified that this is still the case today, in Cassandra 5.0-rc1. If you use 
SAI instead of the classic secondary index, suddenly ALLOW FILTERING is not 
required.

I think this is a regression. Even if SAI has a more efficient way of 
intersecting the posting list from two indexes (does it?), in the worst case 
this doesn't help: For example, consider a table with a million rows, half have 
x=1 and the other half have y=2 and just one row has both. Now, a query for 
"WHERE x=1 AND y=2" needs to process half a million rows just to produce one 
result. This is ALLOW FILTERING par excellence.


> In SAI, intersecting two indexes doesn't require ALLOW FILTERING
> 
>
> Key: CASSANDRA-19795
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19795
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/2i Index
>Reporter: Nadav Har'El
>Priority: Normal
>
> As explained many years ago in 
> https://issues.apache.org/jira/browse/CASSANDRA-5470, when a query involves 
> intersecting two secondary indexes, e.g., "WHERE x=1 AND y=2" where "x" and 
> "y" are two indexed column, ALLOW FILTERING is required.
> I verified that this is still the case today, in Cassandra 5.0-rc1.
> But if you use SAI instead of the classic secondary index, suddenly ALLOW 
> FILTERING is not required.
> I think this is a regression. Even if SAI has a more efficient way of 
> intersecting the posting list from two indexes (does it?), in the worst case 
> this doesn't help: For example, consider a table with a million rows, half 
> have x=1 and the other half have y=2 and just one row has both. Now, a query 
> for "WHERE x=1 AND y=2" needs to process half a million rows just to produce 
> one result. This is ALLOW FILTERING par excellence.






[jira] [Updated] (CASSANDRA-19795) In SAI, intersecting two indexes doesn't require ALLOW FILTERING

2024-07-24 Thread Nadav Har'El (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nadav Har'El updated CASSANDRA-19795:
-
Description: 
As explained many years ago in 
https://issues.apache.org/jira/browse/CASSANDRA-5470, when a query involves 
intersecting two secondary indexes, e.g., "WHERE x=1 AND y=2" where "x" and "y" 
are two indexed columns, ALLOW FILTERING is required.

I verified that this is still the case today, in Cassandra 5.0-rc1. If you use 
SAI instead of the classic secondary index, suddenly ALLOW FILTERING is not 
required.

I think this is a regression. Even if SAI has a more efficient way of 
intersecting the posting list from two indexes (does it?), in the worst case 
this doesn't help: For example, consider a table with a million rows, half have 
x=1 and the other half have y=2 and just one row has both. Now, a query for 
"WHERE x=1 AND y=2" needs to process half a million rows just to produce one 
result. This is ALLOW FILTERING par excellence.

  was:
As explained many years ago in 
https://issues.apache.org/jira/browse/CASSANDRA-5470, when a query involves 
intersecting two secondary indexes, e.g., "WHERE x=1 AND y=2" where "x" and "y" 
are two indexed columns, ALLOW FILTERING is required.

I verified that this is still the case today, in Cassandra 5.0-rc1, but ALLOW 
FILTERING is suddenly not required for this query if you use SAI instead of the 
classic secondary index.

I think this is a regression. Even if SAI has a more efficient way of 
intersecting the posting list from two indexes (does it?), in the worst case 
this doesn't help: For example, consider a table with a million rows, half have 
x=1 and the other half have y=2 and just one row has both. Now, a query for 
"WHERE x=1 AND y=2" needs to process half a million rows just to produce one 
result. This is ALLOW FILTERING par excellence.


> In SAI, intersecting two indexes doesn't require ALLOW FILTERING
> 
>
> Key: CASSANDRA-19795
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19795
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/2i Index
>Reporter: Nadav Har'El
>Priority: Normal
>
> As explained many years ago in 
> https://issues.apache.org/jira/browse/CASSANDRA-5470, when a query involves 
> intersecting two secondary indexes, e.g., "WHERE x=1 AND y=2" where "x" and 
> "y" are two indexed column, ALLOW FILTERING is required.
> I verified that this is still the case today, in Cassandra 5.0-rc1. If you 
> use SAI instead of the classic secondary index, suddenly ALLOW FILTERING is 
> not required.
> I think this is a regression. Even if SAI has a more efficient way of 
> intersecting the posting list from two indexes (does it?), in the worst case 
> this doesn't help: For example, consider a table with a million rows, half 
> have x=1 and the other half have y=2 and just one row has both. Now, a query 
> for "WHERE x=1 AND y=2" needs to process half a million rows just to produce 
> one result. This is ALLOW FILTERING par excellence.






[jira] [Created] (CASSANDRA-19795) In SAI, intersecting two indexes doesn't require ALLOW FILTERING

2024-07-24 Thread Nadav Har'El (Jira)
Nadav Har'El created CASSANDRA-19795:


 Summary: In SAI, intersecting two indexes doesn't require ALLOW 
FILTERING
 Key: CASSANDRA-19795
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19795
 Project: Cassandra
  Issue Type: Bug
  Components: Feature/2i Index
Reporter: Nadav Har'El


As explained many years ago in 
https://issues.apache.org/jira/browse/CASSANDRA-5470, when a query involves 
intersecting two secondary indexes, e.g., "WHERE x=1 AND y=2" where "x" and "y" 
are two indexed columns, ALLOW FILTERING is required.

I verified that this is still the case today, in Cassandra 5.0-rc1, but ALLOW 
FILTERING is suddenly not required for this query if you use SAI instead of the 
classic secondary index.

I think this is a regression. Even if SAI has a more efficient way of 
intersecting the posting list from two indexes (does it?), in the worst case 
this doesn't help: For example, consider a table with a million rows, half have 
x=1 and the other half have y=2 and just one row has both. Now, a query for 
"WHERE x=1 AND y=2" needs to process half a million rows just to produce one 
result. This is ALLOW FILTERING par excellence.
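To make the comparison concrete, here is a hedged reproducer sketch using the Python driver; the cluster address, keyspace, table and index names are all made up for illustration, and the behaviour in the comments is the one described in this report:

from cassandra.cluster import Cluster
from cassandra import InvalidRequest

session = Cluster(['127.0.0.1']).connect('ks')   # hypothetical keyspace "ks"
session.execute("CREATE TABLE IF NOT EXISTS tab (p int PRIMARY KEY, x int, y int)")

# Classic secondary indexes: the intersection is rejected, asking for ALLOW FILTERING.
session.execute("CREATE INDEX IF NOT EXISTS tab_x ON tab (x)")
session.execute("CREATE INDEX IF NOT EXISTS tab_y ON tab (y)")
try:
    session.execute("SELECT * FROM tab WHERE x = 1 AND y = 2")
except InvalidRequest as e:
    print("classic 2i:", e)   # "... use ALLOW FILTERING"

# SAI indexes on the same columns: the same query runs without ALLOW FILTERING.
session.execute("DROP INDEX IF EXISTS tab_x")
session.execute("DROP INDEX IF EXISTS tab_y")
session.execute("CREATE CUSTOM INDEX tab_x_sai ON tab (x) USING 'StorageAttachedIndex'")
session.execute("CREATE CUSTOM INDEX tab_y_sai ON tab (y) USING 'StorageAttachedIndex'")
print(list(session.execute("SELECT * FROM tab WHERE x = 1 AND y = 2")))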






[jira] [Commented] (CASSANDRA-19270) Incorrect error type on oversized compound partition key

2024-01-17 Thread Nadav Har'El (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17807665#comment-17807665
 ] 

Nadav Har'El commented on CASSANDRA-19270:
--

Yes, I ran my test on the latest (or so I thought) GA version, Cassandra 4.1.3. 
I didn't think of testing on Cassandra 5 before reporting it, sorry.

Here are reproducers (in Scylla's Python-based test framework, but I'm sure you
can figure out what they do):

 

import pytest
from cassandra import InvalidRequest
# new_test_table, unique_key_string and random_string are helpers provided by
# the test framework the reproducer was written in.

@pytest.fixture(scope="module")
def table2(cql, test_keyspace):
    with new_test_table(cql, test_keyspace,
                        "p1 text, p2 text, c1 text, c2 text, PRIMARY KEY ((p1, p2), c1, c2)") as table:
        yield table

def test_insert_65k_pk_compound(cql, table2):
    stmt = cql.prepare(f'INSERT INTO {table2} (p1, p2, c1, c2) VALUES (?,?,?,?)')
    big = 'x'*(65*1024)
    with pytest.raises(InvalidRequest, match='Key length'):
        cql.execute(stmt, [big, 'dog', 'cat', 'mouse'])
    with pytest.raises(InvalidRequest, match='Key length'):
        cql.execute(stmt, ['dog', big, 'cat', 'mouse'])

def test_insert_65535_compound_pk(cql, table2):
    stmt = cql.prepare(f'INSERT INTO {table2} (p1, p2, c1, c2) VALUES (?,?,?,?)')
    length = 65535
    p1 = "hello"  # not particularly long
    c1 = unique_key_string()  # not particularly long
    c2 = unique_key_string()  # not particularly long
    p2 = random_string(length=(length-len(p1)-100))
    cql.execute(stmt, [p1, p2, c1, c2])
    stmt = cql.prepare(f'SELECT * FROM {table2} WHERE p1=? AND p2=?')
    assert list(cql.execute(stmt, [p1, p2])) == [(p1, p2, c1, c2)]

 

The first test, instead of the clean InvalidRequest it expects, gets a
NoHostAvailable with the strange message "'H' format requires 0 <= number <= 65535".

The second test fails in an even stranger way, with
java.lang.IllegalArgumentException. The compound partition key is roughly 100
bytes shorter than the maximum, and still it doesn't work.

> Incorrect error type on oversized compound partition key
> 
>
> Key: CASSANDRA-19270
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19270
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Nadav Har'El
>Priority: Normal
>
> Cassandra limits key lengths (partition and clustering) to 64 KB. If a user 
> attempts to INSERT data with a partition key or clustering key exceeding that 
> size, the result is a clear InvalidRequest error with a message like "Key 
> length of 66560 is longer than maximum of 65535".
> There is one exception: If you have a *compound* partition key (i.e., two or 
> more partition key components) and attempt to write one of them larger than 
> 64 KB, then instead of an orderly InvalidRequest like you get when there is 
> just one component, you get a NoHostAvailable with the message: 
> "error("'H' format requires 0 <= number <= 65535")". This is not 
> only uglier, it can also confuse the Cassandra driver into retrying this request, 
> because it doesn't realize that the request itself is broken and there is no 
> point in repeating it.
> Interestingly, if there are multiple clustering key columns, this problem 
> doesn't happen: we still get a nice InvalidRequest if any one of these is 
> more than 64 KB.






[jira] [Commented] (CASSANDRA-19270) Incorrect error type on oversized compound partition key

2024-01-14 Thread Nadav Har'El (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17806656#comment-17806656
 ] 

Nadav Har'El commented on CASSANDRA-19270:
--

Experimenting some more, I encountered even more bizarre errors when trying to
use *compound* partition keys which aren't even oversized. As an example I used
a compound partition key (p1, p2) - both strings - and tried to insert a
2-character string for p1 and a 65433-character string for p2, so the total is
65435 bytes, 100 bytes less than the 65535 maximum - and this should work. But
it didn't, and I got the strange error:

E   cassandra.cluster.NoHostAvailable: ('Unable to complete the operation against any hosts', {: })

In this case, there was also something in the log:

09:09:37.671 [Native-Transport-Requests-1] ERROR org.apache.cassandra.transport.messages.ErrorMessage - Unexpected exception during request
java.lang.IllegalArgumentException: newLimit < 0: (-96 < 0)
        at java.base/java.nio.Buffer.createLimitException(Buffer.java:372)
        at java.base/java.nio.Buffer.limit(Buffer.java:346)
        at java.base/java.nio.ByteBuffer.limit(ByteBuffer.java:1107)
        at org.apache.cassandra.db.marshal.ByteBufferAccessor.slice(ByteBufferAccessor.java:109)
        at org.apache.cassandra.db.marshal.ByteBufferAccessor.slice(ByteBufferAccessor.java:41)
        at org.apache.cassandra.db.marshal.AbstractCompositeType.validate(AbstractCompositeType.java:297)
        at org.apache.cassandra.db.marshal.AbstractCompositeType.validate(AbstractCompositeType.java:275)
        at org.apache.cassandra.cql3.Validation.validateKey(Validation.java:60)
        at org.apache.cassandra.cql3.statements.ModificationStatement.addUpdates(ModificationStatement.java:785)
        at org.apache.cassandra.cql3.statements.ModificationStatement.getMutations(ModificationStatement.java:732)
        at org.apache.cassandra.cql3.statements.ModificationStatement.executeWithoutCondition(ModificationStatement.java:509)
        at org.apache.cassandra.cql3.statements.ModificationStatement.execute(ModificationStatement.java:491)
        at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:258)
        at org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:826)
        at org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:804)
        at org.apache.cassandra.transport.messages.ExecuteMessage.execute(ExecuteMessage.java:167)
        at org.apache.cassandra.transport.Message$Request.execute(Message.java:255)

> Incorrect error type on oversized compound partition key
> 
>
> Key: CASSANDRA-19270
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19270
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Nadav Har'El
>Priority: Normal
>
> Cassandra limits key lengths (partition and clustering) to 64 KB. If a user 
> attempts to INSERT data with a partition key or clustering key exceeding that 
> size, the result is a clear InvalidRequest error with a message like "Key 
> length of 66560 is longer than maximum of 65535".
> There is one exception: If you have a *compound* partition key (i.e., two or 
> more partition key components) and attempt to write one of them larger than 
> 64 KB, then instead of an orderly InvalidRequest like you get when there is 
> just one component, you get a NoHostAvailable with the message: 
> "error("'H' format requires 0 <= number <= 65535")". This is not 
> only uglier, it can also confuse the Cassandra driver into retrying this request, 
> because it doesn't realize that the request itself is broken and there is no 
> point in repeating it.
> Interestingly, if there are multiple clustering key columns, this problem 
> doesn't happen: we still get a nice InvalidRequest if any one of these is 
> more than 64 KB.






[jira] [Created] (CASSANDRA-19270) Incorrect error type on oversized compound partition key

2024-01-14 Thread Nadav Har'El (Jira)
Nadav Har'El created CASSANDRA-19270:


 Summary: Incorrect error type on oversized compound partition key
 Key: CASSANDRA-19270
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19270
 Project: Cassandra
  Issue Type: Bug
Reporter: Nadav Har'El


Cassandra limits key lengths (partition and clustering) to 64 KB. If a user
attempts to INSERT data with a partition key or clustering key exceeding that
size, the result is a clear InvalidRequest error with a message like "Key
length of 66560 is longer than maximum of 65535".

There is one exception: If you have a *compound* partition key (i.e., two or
more partition key components) and attempt to write one of them larger than
64 KB, then instead of an orderly InvalidRequest like you get when there is
just one component, you get a NoHostAvailable with the message: "error("'H'
format requires 0 <= number <= 65535")". This is not only uglier, it can also
confuse the Cassandra driver into retrying this request, because it doesn't
realize that the request itself is broken and there is no point in repeating it.

Interestingly, if there are multiple clustering key columns, this problem
doesn't happen: we still get a nice InvalidRequest if any one of these is more
than 64 KB.
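A minimal sketch of the two cases using the Python driver; the cluster address, keyspace and table names are made up for illustration:

from cassandra.cluster import Cluster
from cassandra import InvalidRequest

session = Cluster(['127.0.0.1']).connect('ks')   # hypothetical keyspace "ks"
session.execute("CREATE TABLE IF NOT EXISTS single_pk (p text PRIMARY KEY, v int)")
session.execute("CREATE TABLE IF NOT EXISTS compound_pk (p1 text, p2 text, v int, PRIMARY KEY ((p1, p2)))")

big = 'x' * (65 * 1024)   # one key component above the 64 KB limit

# Single-component partition key: a clean InvalidRequest
# ("Key length of 66560 is longer than maximum of 65535").
try:
    session.execute(session.prepare("INSERT INTO single_pk (p, v) VALUES (?, 1)"), [big])
except InvalidRequest as e:
    print("single:", e)

# Compound partition key: instead of InvalidRequest, the driver surfaces
# NoHostAvailable with "'H' format requires 0 <= number <= 65535".
session.execute(session.prepare("INSERT INTO compound_pk (p1, p2, v) VALUES (?, ?, 1)"), [big, 'dog'])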






[jira] [Created] (CASSANDRA-19019) DESC TYPE forgets to quote UDT's field names

2023-11-12 Thread Nadav Har'El (Jira)
Nadav Har'El created CASSANDRA-19019:


 Summary: DESC TYPE forgets to quote UDT's field names
 Key: CASSANDRA-19019
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19019
 Project: Cassandra
  Issue Type: Bug
Reporter: Nadav Har'El


If I create a type with

*CREATE TYPE "Quoted_KS"."udt_@@@" (a int, "field_!!!" text)*

and then run DESC TYPE "Quoted_KS"."udt_@@@", I get:

*CREATE TYPE "Quoted_KS"."udt_@@@" (a int, field_!!! text)*

Note the missing quotes around the non-alphanumeric field name, which does need
quoting. If I try to run this command, it won't work.

Tested on Cassandra 4.1.
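A small sketch with the Python driver showing why the dropped quotes matter; the connection setup and the new type name "udt_2" are made up for illustration:

from cassandra.cluster import Cluster
from cassandra.protocol import SyntaxException

session = Cluster(['127.0.0.1']).connect()

# The field name exactly as DESC TYPE prints it back, with the quotes dropped:
try:
    session.execute('CREATE TYPE "Quoted_KS"."udt_2" (a int, field_!!! text)')
except SyntaxException as e:
    print(e)   # rejected - the field name has to be written "field_!!!"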






[jira] [Created] (CASSANDRA-19006) DROPing a non-existent function with parameter types results in bizarre error

2023-11-07 Thread Nadav Har'El (Jira)
Nadav Har'El created CASSANDRA-19006:


 Summary: DROPing a non-existent function with parameter types 
results in bizarre error
 Key: CASSANDRA-19006
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19006
 Project: Cassandra
  Issue Type: Bug
Reporter: Nadav Har'El


When attempting a command like

{{DROP FUNCTION ks.fun(int)}}

where the keyspace "ks" exists but "fun" doesn't - note also the attempt to
choose which of several (non-existent) overloads to remove - one gets a bizarre
error from Cassandra: instead of InvalidRequest (or maybe
ConfigurationException), we get a SyntaxError with the strange message
"NoSuchElementException No value present". Neither the SyntaxError type nor
this specific message makes much sense. This is not a syntax error, and the
same request would have worked if this specific function existed.
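A minimal sketch of the scenario with the Python driver; the cluster address and the keyspace "ks" are made up, and ks.fun does not exist:

from cassandra.cluster import Cluster
from cassandra.protocol import SyntaxException

session = Cluster(['127.0.0.1']).connect()
session.execute("CREATE KEYSPACE IF NOT EXISTS ks WITH replication = "
                "{'class': 'SimpleStrategy', 'replication_factor': 1}")

# "ks" exists, ks.fun(int) does not; expected InvalidRequest, observed a
# SyntaxException with the message "NoSuchElementException No value present".
try:
    session.execute("DROP FUNCTION ks.fun(int)")
except SyntaxException as e:
    print(e)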






[jira] [Created] (CASSANDRA-19005) DROPing an overloaded UDF produces the wrong error message if drop permissions are lacking

2023-11-07 Thread Nadav Har'El (Jira)
Nadav Har'El created CASSANDRA-19005:


 Summary: DROPing an overloaded UDF produces the wrong error 
message if drop permissions are lacking
 Key: CASSANDRA-19005
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19005
 Project: Cassandra
  Issue Type: Bug
Reporter: Nadav Har'El


When a user creates two user-defined functions with the same name but different
parameters, and later wants to remove one of them with DROP FUNCTION, the user
must disambiguate which one to delete. For example, "DROP FUNCTION ks.fun(int,
int)". If the user tries just "DROP FUNCTION ks.fun", Cassandra will return an
InvalidRequest, complaining about "multiple functions" with the same name. So
far so good.

Now, if the user has (via GRANT) permission to drop only one of these
functions and no permission to drop the second, trying "DROP FUNCTION
ks.fun" should still return the good old InvalidRequest, because the request is
still just as ambiguous as it was when permissions weren't involved. But
Cassandra instead notices that one of the variants, e.g., ks.fun(int, int),
cannot be dropped by this user, and returns an Unauthorized error (instead of
InvalidRequest) saying that the user has no drop permission on "ks.fun(int,
int)". This is true - but irrelevant - the user didn't ask to drop that
specific overload of the function. Moreover, it's misleading because it can
lead the user to GRANT these supposedly-missing permissions, but after granting
them, the DROP FUNCTION command still won't work, because it will still be ambiguous.
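A hedged sketch of this setup with the Python driver; the role name, passwords, function bodies and the keyspace "ks" are made up, and authentication, authorization and user-defined functions are assumed to be enabled:

from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

# As a superuser: two overloads of ks.fun, and a role allowed to drop only one of them.
admin = Cluster(['127.0.0.1'],
                auth_provider=PlainTextAuthProvider('cassandra', 'cassandra')).connect()
admin.execute("CREATE FUNCTION ks.fun (a int) RETURNS NULL ON NULL INPUT "
              "RETURNS int LANGUAGE java AS 'return a;'")
admin.execute("CREATE FUNCTION ks.fun (a int, b int) RETURNS NULL ON NULL INPUT "
              "RETURNS int LANGUAGE java AS 'return a + b;'")
admin.execute("CREATE ROLE alice WITH PASSWORD = 'alice' AND LOGIN = true")
admin.execute("GRANT DROP ON FUNCTION ks.fun(int) TO alice")

# As alice: the ambiguous DROP should still be InvalidRequest ("multiple functions"),
# but comes back as Unauthorized, complaining about ks.fun(int, int).
alice = Cluster(['127.0.0.1'],
                auth_provider=PlainTextAuthProvider('alice', 'alice')).connect()
alice.execute("DROP FUNCTION ks.fun")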

This is a minor error-path bug, but I noticed it while trying to look
exhaustively at how permissions and functions interact in Cassandra.






[jira] [Comment Edited] (CASSANDRA-18647) CASTing a float to decimal adds wrong digits

2023-07-05 Thread Nadav Har'El (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740047#comment-17740047
 ] 

Nadav Har'El edited comment on CASSANDRA-18647 at 7/5/23 7:35 AM:
--

By the way, there is a unit test - testNumericCastsInSelectionClause in 
test/unit/org/apache/cassandra/cql3/functions/CastFctsTest.java - that should 
have caught this bug. The problem is that it compares the result of the cast 
not to any specific value but to BigDecimal.valueOf(5.2F), and this 
BigDecimal.valueOf(float) is exactly the same function that the Cassandra 
implementation uses for this purpose, so the implementation and the test have 
the same bug and the test doesn't verify anything.

I found the cause of this bug. It turns out that BigDecimal does *not* have a 
float overload, only a double. The Java documentation says that:
{quote}valueOf(double val) Translates a double into a BigDecimal, using the 
double's canonical string representation provided by the 
Double.toString(double) method.
{quote}
So the solution of how to turn a float into a Decimal is easy - just use 
*Float.toString(float)* and then construct a BigDecimal using that string - do 
*not* use BigDecimal.valueOf(double) on a float.

So it seems the fix would be a two-line patch to getDecimalConversionFunction() 
in src/java/org/apache/cassandra/cql3/functions/CastFcts.java to do that. And 
also fix the test, of course.
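To illustrate the idea (this is not the actual Java patch), here is the same principle in Python, modelling the table's 32-bit float with numpy.float32:

from decimal import Decimal
import numpy as np

f = np.float32(5.2)            # the 32-bit float value stored in the table

# What the implementation effectively does today: widen to a 64-bit double first
# (the analogue of BigDecimal.valueOf(double)) - the widening invents digits.
print(Decimal(str(float(f))))  # 5.199999809265137

# What the comment proposes: use the float's own shortest decimal representation
# (the analogue of Float.toString(float)) and build the decimal from that string.
print(Decimal(str(f)))         # 5.2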


was (Author: nyh):
By the way, there is a unit test - testNumericCastsInSelectionClause in 
test/unit/org/apache/cassandra/cql3/functions/CastFctsTest.java - that should 
have caught this bug. The problem is that it compares the result of the cast 
not to any specific value but to BigDecimal.valueOf(5.2F), and this 
BigDecimal.valueOf(float) is apparently the same function that the Cassandra 
implementation uses for this purpose, so if the implementation has a bug the 
test doesn't verify anything.

I know the cause of this bug. It turns out that BigDecimal does *not* have a 
float overload, only a double. The Java documentation says that:
{quote}valueOf(double val) Translates a double into a BigDecimal, using the 
double's canonical string representation provided by the 
Double.toString(double) method.
{quote}
So the solution of how to turn a float into a Decimal is easy - just use 
*Float.toString(float)* and then construct a BigDecimal using that string - do 
*not* use BigDecimal.valueOf(double) on a float.

So it seems the fix would be a two-line patch to getDecimalConversionFunction() 
in src/java/org/apache/cassandra/cql3/functions/CastFcts.java to do that. And 
also fix the test, of course.

> CASTing a float to decimal adds wrong digits
> 
>
> Key: CASSANDRA-18647
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18647
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Nadav Har'El
>Priority: Normal
>
> If I create a table with a *float* (32-bit) column, and cast it to the 
> *decimal* type, the casting wrongly passes through the double (64-bit) type 
> and picks up extra, wrong, digits. For example, if we have a column e of type 
> "float", and run
> INSERT INTO tbl (p, e) VALUES (1, 5.2)
> SELECT CAST(e AS decimal) FROM tbl WHERE p=1
> The result is the "decimal" value 5.199999809265137, with all those extra 
> wrong digits. It would have been better to get back the decimal value 5.2, 
> with only two significant digits.
> It appears that this happens because Cassandra's implementation first 
> converts the 32-bit float into a 64-bit double, and only then converts that - 
> with all the silly extra digits it picked up in the first conversion - into a 
> "decimal" value.
> Contrast this with CAST(e AS text) which works correctly - it returns the 
> string "5.2" - only the actual digits of the 32-bit floating point value are 
> converted to the string, without inventing additional digits in the process.






[jira] [Comment Edited] (CASSANDRA-18647) CASTing a float to decimal adds wrong digits

2023-07-05 Thread Nadav Har'El (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740047#comment-17740047
 ] 

Nadav Har'El edited comment on CASSANDRA-18647 at 7/5/23 7:22 AM:
--

By the way, there is a unit test - testNumericCastsInSelectionClause in 
test/unit/org/apache/cassandra/cql3/functions/CastFctsTest.java - that should 
have caught this bug. The problem is that it compares the result of the cast 
not to any specific value but to BigDecimal.valueOf(5.2F), and this 
BigDecimal.valueOf(float) is apparently the same function that the Cassandra 
implementation uses for this purpose, so if the implementation has a bug the 
test doesn't verify anything.

I think I know the cause of this bug. It turns out that BigDecimal does *not* 
have a float overload, only a double. The Java documentation says that:

valueOf(double val) Translates a double into a BigDecimal, using the double's 
canonical string representation provided by the Double.toString(double) method.

So the solution of how to turn a float into a Decimal is easy - just use 
*Float.toString(float)* and then construct a BigDecimal using that string - do 
*not* use BigDecimal.valueOf(double) on a float.

So it seems the fix would be a two-line patch to getDecimalConversionFunction() 
in src/java/org/apache/cassandra/cql3/functions/CastFcts.java to do that. And 
also fix the test, of course.


was (Author: nyh):
By the way, there is a unit test - testNumericCastsInSelectionClause in 
test/unit/org/apache/cassandra/cql3/functions/CastFctsTest.java - that should 
have caught this bug. The problem is that it compares the result of the cast 
not to any specific value but to BigDecimal.valueOf(5.2F), and this 
BigDecimal.valueOf(float) is apparently the same function that the Cassandra 
implementation uses for this purpose, so if the implementation has a bug the 
test doesn't verify anything.

 

I think I know the cause of this bug. It turns out that BigDecimal does *not* 
have a float overload, only a double. The Java documentation says that:

valueOf(double val) Translates a double into a BigDecimal, using the double's 
canonical string representation provided by the Double.toString(double) method.

So the solution of how to turn a float into a Decimal is easy - just use 
*Float.toString(float)* and then construct a BigDecimal using that string - do 
*not* use BigDecimal.valueOf(double) on a float. 

So it seems the fix would be a two-line patch to getDecimalConversionFunction() 
in src/java/org/apache/cassandra/cql3/functions/CastFcts.java to do that. And 
also fix the test, of course.

> CASTing a float to decimal adds wrong digits
> 
>
> Key: CASSANDRA-18647
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18647
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Nadav Har'El
>Priority: Normal
>
> If I create a table with a *float* (32-bit) column, and cast it to the 
> *decimal* type, the casting wrongly passes through the double (64-bit) type 
> and picks up extra, wrong, digits. For example, if we have a column e of type 
> "float", and run
> INSERT INTO tbl (p, e) VALUES (1, 5.2)
> SELECT CAST(e AS decimal) FROM tbl WHERE p=1
> The result is the "decimal" value 5.199999809265137, with all those extra 
> wrong digits. It would have been better to get back the decimal value 5.2, 
> with only two significant digits.
> It appears that this happens because Cassandra's implementation first 
> converts the 32-bit float into a 64-bit double, and only then converts that - 
> with all the silly extra digits it picked up in the first conversion - into a 
> "decimal" value.
> Contrast this with CAST(e AS text) which works correctly - it returns the 
> string "5.2" - only the actual digits of the 32-bit floating point value are 
> converted to the string, without inventing additional digits in the process.






[jira] [Comment Edited] (CASSANDRA-18647) CASTing a float to decimal adds wrong digits

2023-07-05 Thread Nadav Har'El (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740047#comment-17740047
 ] 

Nadav Har'El edited comment on CASSANDRA-18647 at 7/5/23 7:22 AM:
--

By the way, there is a unit test - testNumericCastsInSelectionClause in 
test/unit/org/apache/cassandra/cql3/functions/CastFctsTest.java - that should 
have caught this bug. The problem is that it compares the result of the cast 
not to any specific value but to BigDecimal.valueOf(5.2F), and this 
BigDecimal.valueOf(float) is apparently the same function that the Cassandra 
implementation uses for this purpose, so if the implementation has a bug the 
test doesn't verify anything.

I know the cause of this bug. It turns out that BigDecimal does *not* have a 
float overload, only a double. The Java documentation says that:
{quote}valueOf(double val) Translates a double into a BigDecimal, using the 
double's canonical string representation provided by the 
Double.toString(double) method.
{quote}
So the solution of how to turn a float into a Decimal is easy - just use 
*Float.toString(float)* and then construct a BigDecimal using that string - do 
*not* use BigDecimal.valueOf(double) on a float.

So it seems the fix would be a two-line patch to getDecimalConversionFunction() 
in src/java/org/apache/cassandra/cql3/functions/CastFcts.java to do that. And 
also fix the test, of course.


was (Author: nyh):
By the way, there is a unit test - testNumericCastsInSelectionClause in 
test/unit/org/apache/cassandra/cql3/functions/CastFctsTest.java - that should 
have caught this bug. The problem is that it compares the result of the cast 
not to any specific value but to BigDecimal.valueOf(5.2F), and this 
BigDecimal.valueOf(float) is apparently the same function that the Cassandra 
implementation uses for this purpose, so if the implementation has a bug the 
test doesn't verify anything.

I think I know the cause of this bug. It turns out that BigDecimal does *not* 
have a float overload, only a double. The Java documentation says that:

valueOf(double val) Translates a double into a BigDecimal, using the double's 
canonical string representation provided by the Double.toString(double) method.

So the solution of how to turn a float into a Decimal is easy - just use 
*Float.toString(float)* and then construct a BigDecimal using that string - do 
*not* use BigDecimal.valueOf(double) on a float.

So it seems the fix would be a two-line patch to getDecimalConversionFunction() 
in src/java/org/apache/cassandra/cql3/functions/CastFcts.java to do that. And 
also fix the test, of course.

> CASTing a float to decimal adds wrong digits
> 
>
> Key: CASSANDRA-18647
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18647
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Nadav Har'El
>Priority: Normal
>
> If I create a table with a *float* (32-bit) column, and cast it to the 
> *decimal* type, the casting wrongly passes through the double (64-bit) type 
> and picks up extra, wrong, digits. For example, if we have a column e of type 
> "float", and run
> INSERT INTO tbl (p, e) VALUES (1, 5.2)
> SELECT CAST(e AS decimal) FROM tbl WHERE p=1
> The result is the "decimal" value 5.199999809265137, with all those extra 
> wrong digits. It would have been better to get back the decimal value 5.2, 
> with only two significant digits.
> It appears that this happens because Cassandra's implementation first 
> converts the 32-bit float into a 64-bit double, and only then converts that - 
> with all the silly extra digits it picked up in the first conversion - into a 
> "decimal" value.
> Contrast this with CAST(e AS text) which works correctly - it returns the 
> string "5.2" - only the actual digits of the 32-bit floating point value are 
> converted to the string, without inventing additional digits in the process.






[jira] [Comment Edited] (CASSANDRA-18647) CASTing a float to decimal adds wrong digits

2023-07-05 Thread Nadav Har'El (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740047#comment-17740047
 ] 

Nadav Har'El edited comment on CASSANDRA-18647 at 7/5/23 7:21 AM:
--

By the way, there is a unit test - testNumericCastsInSelectionClause in 
test/unit/org/apache/cassandra/cql3/functions/CastFctsTest.java - that should 
have caught this bug. The problem is that it compares the result of the cast 
not to any specific value but to BigDecimal.valueOf(5.2F), and this 
BigDecimal.valueOf(float) is apparently the same function that the Cassandra 
implementation uses for this purpose, so if the implementation has a bug the 
test doesn't verify anything.

 

I think I know the cause of this bug. It turns out that BigDecimal does *not* 
have a float overload, only a double. The Java documentation says that:

valueOf(double val) Translates a double into a BigDecimal, using the double's 
canonical string representation provided by the Double.toString(double) method.

So the solution of how to turn a float into a Decimal is easy - just use 
*Float.toString(float)* and then construct a BigDecimal using that string - do 
*not* use BigDecimal.valueOf(double) on a float. And also fix the test, of 
course.

So it seems the fix would be a two-line patch to getDecimalConversionFunction() 
in src/java/org/apache/cassandra/cql3/functions/CastFcts.java to do that.


was (Author: nyh):
By the way, there is a unit test - testNumericCastsInSelectionClause in 
test/unit/org/apache/cassandra/cql3/functions/CastFctsTest.java - that should 
have caught this bug. The problem is that it compares the result of the cast 
not to any specific value but to BigDecimal.valueOf(5.2F), and this 
BigDecimal.valueOf(float) is apparently the same function that the Cassandra 
implementation uses for this purpose, so if the implementation has a bug the 
test doesn't verify anything.

 

I think I know the cause of this bug. It turns out that BigDecimal does *not* 
have a float overload, only a double. The Java documentation says that:

valueOf(double val) Translates a double into a BigDecimal, using the double's 
canonical string representation provided by the Double.toString(double) method.

So the solution of how to turn a float into a Decimal is easy - just use 
*Float.toString(float)* and then construct a BigDecimal using that string - do 
*not* use BigDecimal.valueOf(double) on a float. And also fix the test, of 
course.

So it seems the fix would be a two-line patch to getDecimalConversionFunction() 
in src/java/org/apache/cassandra/cql3/functions/CastFcts.java to do that.

> CASTing a float to decimal adds wrong digits
> 
>
> Key: CASSANDRA-18647
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18647
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Nadav Har'El
>Priority: Normal
>
> If I create a table with a *float* (32-bit) column, and cast it to the 
> *decimal* type, the casting wrongly passes through the double (64-bit) type 
> and picks up extra, wrong, digits. For example, if we have a column e of type 
> "float", and run
> INSERT INTO tbl (p, e) VALUES (1, 5.2)
> SELECT CAST(e AS decimal) FROM tbl WHERE p=1
> The result is the "decimal" value 5.199999809265137, with all those extra 
> wrong digits. It would have been better to get back the decimal value 5.2, 
> with only two significant digits.
> It appears that this happens because Cassandra's implementation first 
> converts the 32-bit float into a 64-bit double, and only then converts that - 
> with all the silly extra digits it picked up in the first conversion - into a 
> "decimal" value.
> Contrast this with CAST(e AS text) which works correctly - it returns the 
> string "5.2" - only the actual digits of the 32-bit floating point value are 
> converted to the string, without inventing additional digits in the process.






[jira] [Comment Edited] (CASSANDRA-18647) CASTing a float to decimal adds wrong digits

2023-07-05 Thread Nadav Har'El (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740047#comment-17740047
 ] 

Nadav Har'El edited comment on CASSANDRA-18647 at 7/5/23 7:21 AM:
--

By the way, there is a unit test - testNumericCastsInSelectionClause in 
test/unit/org/apache/cassandra/cql3/functions/CastFctsTest.java - that should 
have caught this bug. The problem is that it compares the result of the cast 
not to any specific value but to BigDecimal.valueOf(5.2F), and this 
BigDecimal.valueOf(float) is apparently the same function that the Cassandra 
implementation uses for this purpose, so if the implementation has a bug the 
test doesn't verify anything.

 

I think I know the cause of this bug. It turns out that BigDecimal does *not* 
have a float overload, only a double. The Java documentation says that:

valueOf(double val) Translates a double into a BigDecimal, using the double's 
canonical string representation provided by the Double.toString(double) method.

So the solution of how to turn a float into a Decimal is easy - just use 
*Float.toString(float)* and then construct a BigDecimal using that string - do 
*not* use BigDecimal.valueOf(double) on a float. 

So it seems the fix would be a two-line patch to getDecimalConversionFunction() 
in src/java/org/apache/cassandra/cql3/functions/CastFcts.java to do that. And 
also fix the test, of course.


was (Author: nyh):
By the way, there is a unit test - testNumericCastsInSelectionClause in 
test/unit/org/apache/cassandra/cql3/functions/CastFctsTest.java - that should 
have caught this bug. The problem is that it compares the result of the cast 
not to any specific value but to BigDecimal.valueOf(5.2F), and this 
BigDecimal.valueOf(float) is apparently the same function that the Cassandra 
implementation uses for this purpose, so if the implementation has a bug the 
test doesn't verify anything.

 

I think I know the cause of this bug. It turns out that BigDecimal does *not* 
have a float overload, only a double. The Java documentation says that:

valueOf(double val) Translates a double into a BigDecimal, using the double's 
canonical string representation provided by the Double.toString(double) method.

So the solution of how to turn a float into a Decimal is easy - just use 
*Float.toString(float)* and then construct a BigDecimal using that string - do 
*not* use BigDecimal.valueOf(double) on a float. And also fix the test, of 
course.

So it seems the fix would be a two-line patch to getDecimalConversionFunction() 
in src/java/org/apache/cassandra/cql3/functions/CastFcts.java to do that.

> CASTing a float to decimal adds wrong digits
> 
>
> Key: CASSANDRA-18647
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18647
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Nadav Har'El
>Priority: Normal
>
> If I create a table with a *float* (32-bit) column, and cast it to the 
> *decimal* type, the casting wrongly passes through the double (64-bit) type 
> and picks up extra, wrong, digits. For example, if we have a column e of type 
> "float", and run
> INSERT INTO tbl (p, e) VALUES (1, 5.2)
> SELECT CAST(e AS decimal) FROM tbl WHERE p=1
> The result is the "decimal" value 5.199999809265137, with all those extra 
> wrong digits. It would have been better to get back the decimal value 5.2, 
> with only two significant digits.
> It appears that this happens because Cassandra's implementation first 
> converts the 32-bit float into a 64-bit double, and only then converts that - 
> with all the silly extra digits it picked up in the first conversion - into a 
> "decimal" value.
> Contrast this with CAST(e AS text) which works correctly - it returns the 
> string "5.2" - only the actual digits of the 32-bit floating point value are 
> converted to the string, without inventing additional digits in the process.






[jira] [Comment Edited] (CASSANDRA-18647) CASTing a float to decimal adds wrong digits

2023-07-05 Thread Nadav Har'El (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740047#comment-17740047
 ] 

Nadav Har'El edited comment on CASSANDRA-18647 at 7/5/23 7:21 AM:
--

By the way, there is a unit test - testNumericCastsInSelectionClause in 
test/unit/org/apache/cassandra/cql3/functions/CastFctsTest.java - that should 
have caught this bug. The problem is that it compares the result of the cast 
not to any specific value but to BigDecimal.valueOf(5.2F), and this 
BigDecimal.valueOf(float) is apparently the same function that the Cassandra 
implementation uses for this purpose, so if the implementation has a bug the 
test doesn't verify anything.

 

I think I know the cause of this bug. It turns out that BigDecimal does *not* 
have a float overload, only a double. The Java documentation says that:

valueOf(double val) Translates a double into a BigDecimal, using the double's 
canonical string representation provided by the Double.toString(double) method.

So the solution of how to turn a float into a Decimal is easy - just use 
*Float.toString(float)* and then construct a BigDecimal using that string - do 
*not* use BigDecimal.valueOf(double) on a float. And also fix the test, of 
course.

So it seems the fix would be a two-line patch to getDecimalConversionFunction() 
in src/java/org/apache/cassandra/cql3/functions/CastFcts.java to do that.


was (Author: nyh):
By the way, there is a unit test - testNumericCastsInSelectionClause in 
test/unit/org/apache/cassandra/cql3/functions/CastFctsTest.java - that should 
have caught this bug. The problem is that it compares the result of the cast 
not to any specific value but to BigDecimal.valueOf(5.2F), and this 
BigDecimal.valueOf(float) is apparently the same function that the Cassandra 
implementation uses for this purpose, so if the implementation has a bug the 
test doesn't verify anything.

 

I think I know the cause of this bug. It turns out that BigDecimal does *not* 
have a float overload, only a double. The Java documentation says that:

valueOf(double val) Translates a double into a BigDecimal, using the double's 
canonical string representation provided by the Double.toString(double) method.

So the solution of how to turn a float into a Decimal is easy - just use 
*Float.toString(float)* and then construct a BigDecimal using that string - do 
*not* use BigDecimal.valueOf(double) on a float.

So it seems the fix would be a two-line patch to getDecimalConversionFunction() 
in src/java/org/apache/cassandra/cql3/functions/CastFcts.java to do that.

> CASTing a float to decimal adds wrong digits
> 
>
> Key: CASSANDRA-18647
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18647
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Nadav Har'El
>Priority: Normal
>
> If I create a table with a *float* (32-bit) column, and cast it to the 
> *decimal* type, the casting wrongly passes through the double (64-bit) type 
> and picks up extra, wrong, digits. For example, if we have a column e of type 
> "float", and run
> INSERT INTO tbl (p, e) VALUES (1, 5.2)
> SELECT CAST(e AS decimal) FROM tbl WHERE p=1
> The result is the "decimal" value 5.199999809265137, with all those extra 
> wrong digits. It would have been better to get back the decimal value 5.2, 
> with only two significant digits.
> It appears that this happens because Cassandra's implementation first 
> converts the 32-bit float into a 64-bit double, and only then converts that - 
> with all the silly extra digits it picked up in the first conversion - into a 
> "decimal" value.
> Contrast this with CAST(e AS text) which works correctly - it returns the 
> string "5.2" - only the actual digits of the 32-bit floating point value are 
> converted to the string, without inventing additional digits in the process.






[jira] [Comment Edited] (CASSANDRA-18647) CASTing a float to decimal adds wrong digits

2023-07-05 Thread Nadav Har'El (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740047#comment-17740047
 ] 

Nadav Har'El edited comment on CASSANDRA-18647 at 7/5/23 7:20 AM:
--

By the way, there is a unit test - testNumericCastsInSelectionClause in 
test/unit/org/apache/cassandra/cql3/functions/CastFctsTest.java - that should 
have caught this bug. The problem is that it compares the result of the cast 
not to any specific value but to BigDecimal.valueOf(5.2F), and this 
BigDecimal.valueOf(float) is apparently the same function that the Cassandra 
implementation uses for this purpose, so if the implementation has a bug the 
test doesn't verify anything.

 

I think I know the cause of this bug. It turns out that BigDecimal does *not* 
have a float overload, only a double. The Java documentation says that:

valueOf(double val) Translates a double into a BigDecimal, using the double's 
canonical string representation provided by the Double.toString(double) method.

So the solution of how to turn a float into a Decimal is easy - just use 
*Float.toString(float)* and then construct a BigDecimal using that string - do 
*not* use BigDecimal.valueOf(double) on a float.

So it seems the fix would be a two-line patch to getDecimalConversionFunction() 
in src/java/org/apache/cassandra/cql3/functions/CastFcts.java to do that.


was (Author: nyh):
By the way, there is a unit test - testNumericCastsInSelectionClause in 
test/unit/org/apache/cassandra/cql3/functions/CastFctsTest.java - that should 
have caught this bug. The problem is that it compares the result of the cast 
not to any specific value but to BigDecimal.valueOf(5.2F), and this 
BigDecimal.valueOf(float) is apparently the same function that the Cassandra 
implementation uses for this purpose, so if the implementation has a bug the 
test doesn't verify anything.

 

I think I know the cause of this bug. It turns out that BigDecimal does *not* 
have a float overload, only a double. The Java documentation says that:

valueOf(double val) Translates a double into a BigDecimal, using the double's 
canonical string representation provided by the Double.toString(double) method.

So the solution of how to turn a float into a Decimal is easy - just use 
*Float.toString(float)* and then construct a BigDecimal using that string - not 
using the float.

So it seems the fix would be a two-line patch to getDecimalConversionFunction() 
in src/java/org/apache/cassandra/cql3/functions/CastFcts.java to do that.

> CASTing a float to decimal adds wrong digits
> 
>
> Key: CASSANDRA-18647
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18647
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Nadav Har'El
>Priority: Normal
>
> If I create a table with a *float* (32-bit) column, and cast it to the 
> *decimal* type, the casting wrongly passes through the double (64-bit) type 
> and picks up extra, wrong, digits. For example, if we have a column e of type 
> "float", and run
> INSERT INTO tbl (p, e) VALUES (1, 5.2)
> SELECT CAST(e AS decimal) FROM tbl WHERE p=1
> The result is the "decimal" value 5.199999809265137, with all those extra 
> wrong digits. It would have been better to get back the decimal value 5.2, 
> with only two significant digits.
> It appears that this happens because Cassandra's implementation first 
> converts the 32-bit float into a 64-bit double, and only then converts that - 
> with all the silly extra digits it picked up in the first conversion - into a 
> "decimal" value.
> Contrast this with CAST(e AS text) which works correctly - it returns the 
> string "5.2" - only the actual digits of the 32-bit floating point value are 
> converted to the string, without inventing additional digits in the process.






[jira] [Commented] (CASSANDRA-18647) CASTing a float to decimal adds wrong digits

2023-07-05 Thread Nadav Har'El (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740047#comment-17740047
 ] 

Nadav Har'El commented on CASSANDRA-18647:
--

By the way, there is a unit test - testNumericCastsInSelectionClause in 
test/unit/org/apache/cassandra/cql3/functions/CastFctsTest.java - that should 
have caught this bug. The problem is that it compares the result of the cast 
not to a specific literal value but to BigDecimal.valueOf(5.2F), and this 
BigDecimal.valueOf() conversion is apparently the same one that the Cassandra 
implementation uses, so if the implementation has a bug, the expected value has 
the same bug and the test doesn't verify anything.
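
To illustrate, here is a sketch of the difference (hypothetical code, not the 
actual CastFctsTest; castFloatToDecimal() is a made-up stand-in for the 
server-side CAST):

{code:java}
import java.math.BigDecimal;

public final class CastAssertionSketch
{
    // Stand-in for the buggy server-side conversion, for illustration only.
    static BigDecimal castFloatToDecimal(float f)
    {
        return BigDecimal.valueOf(f); // widens to double first
    }

    public static void main(String[] args)
    {
        float e = 5.2f;
        // Tautological expectation: computed with the same conversion the
        // implementation uses, so it "passes" even though the result is wrong.
        System.out.println(castFloatToDecimal(e).equals(BigDecimal.valueOf(e)));  // true
        // Independent expectation: would expose the bug.
        System.out.println(castFloatToDecimal(e).equals(new BigDecimal("5.2")));  // false
    }
}
{code}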

 

I think I know the cause of this bug. It turns out that BigDecimal.valueOf() does 
*not* have a float overload, only a double one. The Java documentation says:

valueOf(double val) Translates a double into a BigDecimal, using the double's 
canonical string representation provided by the Double.toString(double) method.

So the solution for turning a float into a decimal is easy: use 
*Float.toString(float)* and construct a BigDecimal from that string, rather than 
from the float (which would first be widened to a double).

So it seems the fix would be a two-line patch to getDecimalConversionFunction() 
in src/java/org/apache/cassandra/cql3/functions/CastFcts.java to do that.

> CASTing a float to decimal adds wrong digits
> 
>
> Key: CASSANDRA-18647
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18647
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Nadav Har'El
>Priority: Normal
>
> If I create a table with a *float* (32-bit) column, and cast it to the 
> *decimal* type, the casting wrongly passes through the double (64-bit) type 
> and picks up extra, wrong, digits. For example, if we have a column e of type 
> "float", and run
> INSERT INTO tbl (p, e) VALUES (1, 5.2)
> SELECT CAST(e AS decimal) FROM tbl WHERE p=1
> The result is the "decimal" value 5.199999809265137, with all those extra 
> wrong digits. It would have been better to get back the decimal value 5.2, 
> with only two significant digits.
> It appears that this happens because Cassandra's implementation first 
> converts the 32-bit float into a 64-bit double, and only then converts that - 
> with all the silly extra digits it picked up in the first conversion - into a 
> "decimal" value.
> Contrast this with CAST(e AS text) which works correctly - it returns the 
> string "5.2" - only the actual digits of the 32-bit floating point value are 
> converted to the string, without inventing additional digits in the process.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-18647) CASTing a float to decimal adds wrong digits

2023-07-04 Thread Nadav Har'El (Jira)
Nadav Har'El created CASSANDRA-18647:


 Summary: CASTing a float to decimal adds wrong digits
 Key: CASSANDRA-18647
 URL: https://issues.apache.org/jira/browse/CASSANDRA-18647
 Project: Cassandra
  Issue Type: Bug
Reporter: Nadav Har'El


If I create a table with a *float* (32-bit) column, and cast it to the 
*decimal* type, the casting wrongly passes through the double (64-bit) type and 
picks up extra, wrong, digits. For example, if we have a column e of type 
"float", and run

INSERT INTO tbl (p, e) VALUES (1, 5.2)

SELECT CAST(e AS decimal) FROM tbl WHERE p=1

The result is the "decimal" value 5.199999809265137, with all those extra wrong 
digits. It would have been better to get back the decimal value 5.2, with only 
two significant digits.

It appears that this happens because Cassandra's implementation first converts 
the 32-bit float into a 64-bit double, and only then converts that - with all 
the silly extra digits it picked up in the first conversion - into a "decimal" 
value.

Contrast this with CAST(e AS text) which works correctly - it returns the 
string "5.2" - only the actual digits of the 32-bit floating point value are 
converted to the string, without inventing additional digits in the process.
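
A small demonstration of the widening described above (an added illustration, 
not Cassandra code):

{code:java}
import java.math.BigDecimal;

public final class FloatWideningDemo
{
    public static void main(String[] args)
    {
        float e = 5.2f;
        System.out.println(Float.toString(e));              // 5.2 - the float's own shortest representation
        System.out.println(Double.toString(e));             // 5.199999809265137 - after widening to double
        System.out.println(BigDecimal.valueOf((double) e)); // 5.199999809265137 - what the cast ends up returning
    }
}
{code}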



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-18470) Average of "decimal" values rounds the average if all inputs are integers

2023-04-20 Thread Nadav Har'El (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714628#comment-17714628
 ] 

Nadav Har'El commented on CASSANDRA-18470:
--

Benedict, I confirmed your worry in the last paragraph: the fact that the 
implementation keeps only "avg", and not the sum and count separately, 
indeed makes this bug even worse:

Today, the AVG of _decimal_ values *1*, *2*, *2*, *3* comes out as 1, 
while the correct result is 2.

So the current algorithm can be wrong even if we know a-priori that the result 
is an integer.
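
Here is a model of that behaviour (an assumed sketch, not the actual Cassandra 
aggregate code: it updates the average in place as avg += (value - avg) / count, 
keeping the dividend's scale; HALF_EVEN rounding is assumed only for 
illustration):

{code:java}
import java.math.BigDecimal;
import java.math.RoundingMode;

public final class DecimalAvgModel
{
    public static void main(String[] args)
    {
        BigDecimal[] values = { new BigDecimal("1"), new BigDecimal("2"),
                                new BigDecimal("2"), new BigDecimal("3") };
        BigDecimal avg = BigDecimal.ZERO;
        int count = 0;
        for (BigDecimal v : values)
        {
            count++;
            // Each correction is computed at the dividend's scale (0 here), so
            // (2-1)/2, (2-1)/3 and (3-1)/4 all round to 0 and avg never moves.
            avg = avg.add(v.subtract(avg).divide(BigDecimal.valueOf(count), RoundingMode.HALF_EVEN));
        }
        System.out.println(avg); // prints 1, although the exact average of 1, 2, 2, 3 is 2
    }
}
{code}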

> Average of "decimal" values rounds the average if all inputs are integers
> -
>
> Key: CASSANDRA-18470
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18470
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Nadav Har'El
>Priority: Normal
>
> When running the AVG aggregator on "decimal" values, each value is an 
> arbitrary-precision number which may be an integer or fractional, but it is 
> expected that the average would be, in general, fractional. But it turns out 
> that if all the values are integer *without* a ".0", the aggregator sums them 
> up as integers and the final division returns an integer too instead of the 
> fractional response expected from a "decimal" value.
> For example:
>  # AVG of {{decimal}} values 1.0 and 2.0 returns 1.5, as expected.
>  # AVG of 1.0 and 2 or 1 and 2.0 also return 1.5.
>  # But AVG of 1 and 2 returns... 1. This is wrong. The user asked for the 
> average to be a "decimal", not a "varint", so there is no reason why it 
> should be rounded up to be an integer.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-18470) Average of "decimal" values rounds the average if all inputs are integers

2023-04-20 Thread Nadav Har'El (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714578#comment-17714578
 ] 

Nadav Har'El commented on CASSANDRA-18470:
--

I did a few more experiments and have a better understanding of the bug. The 
problem is not just integers vs. ".0", but the precision of the inputs:

If I have the values 1.1 and 1.2 and calculate the AVG, it comes out as 1.1 
instead of 1.15.

It appears that, right now, the result of the division has exactly as many 
digits after the decimal point as its inputs have. It's not clear that this is 
what users would expect.

Solving this problem is not trivial - it's not clear which precision we should 
use for the division. For example, averaging 0.0, 0.0 and 1.0 should result in 
0.333... But how many threes? I don't know. Right now, averaging 0, 0 and 1 
results in 0, averaging 0.0, 0.0 and 1.0 results in 0.3, averaging 0.00, 0.00 
and 1.00 results in 0.33, and so on.
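
A sketch that models this behaviour (an assumption about the mechanism, not the 
actual Cassandra aggregate code: the running average is updated in place and 
each quotient keeps the dividend's scale; HALF_EVEN rounding is assumed):

{code:java}
import java.math.BigDecimal;
import java.math.RoundingMode;

public final class AvgScaleDemo
{
    static BigDecimal avg(BigDecimal... values)
    {
        BigDecimal avg = BigDecimal.ZERO;
        int count = 0;
        for (BigDecimal v : values)
        {
            count++;
            // The quotient's scale is that of the dividend, i.e. of the inputs.
            avg = avg.add(v.subtract(avg).divide(BigDecimal.valueOf(count), RoundingMode.HALF_EVEN));
        }
        return avg;
    }

    public static void main(String[] args)
    {
        System.out.println(avg(new BigDecimal("1.1"), new BigDecimal("1.2")));                           // 1.1, not 1.15
        System.out.println(avg(new BigDecimal("0"), new BigDecimal("0"), new BigDecimal("1")));          // 0
        System.out.println(avg(new BigDecimal("0.0"), new BigDecimal("0.0"), new BigDecimal("1.0")));    // 0.3
        System.out.println(avg(new BigDecimal("0.00"), new BigDecimal("0.00"), new BigDecimal("1.00"))); // 0.33
    }
}
{code}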

> Average of "decimal" values rounds the average if all inputs are integers
> -
>
> Key: CASSANDRA-18470
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18470
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Nadav Har'El
>Priority: Normal
>
> When running the AVG aggregator on "decimal" values, each value is an 
> arbitrary-precision number which may be an integer or fractional, but it is 
> expected that the average would be, in general, fractional. But it turns out 
> that if all the values are integer *without* a ".0", the aggregator sums them 
> up as integers and the final division returns an integer too instead of the 
> fractional response expected from a "decimal" value.
> For example:
>  # AVG of {{decimal}} values 1.0 and 2.0 returns 1.5, as expected.
>  # AVG of 1.0 and 2 or 1 and 2.0 also return 1.5.
>  # But AVG of 1 and 2 returns... 1. This is wrong. The user asked for the 
> average to be a "decimal", not a "varint", so there is no reason why it 
> should be rounded up to be an integer.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-18470) Average of "decimal" values rounds the average if all inputs are integers

2023-04-20 Thread Nadav Har'El (Jira)
Nadav Har'El created CASSANDRA-18470:


 Summary: Average of "decimal" values rounds the average if all 
inputs are integers
 Key: CASSANDRA-18470
 URL: https://issues.apache.org/jira/browse/CASSANDRA-18470
 Project: Cassandra
  Issue Type: Bug
Reporter: Nadav Har'El


When running the AVG aggregator on "decimal" values, each value is an 
arbitrary-precision number which may be an integer or fractional, but it is 
expected that the average would be, in general, fractional. But it turns out 
that if all the values are integer *without* a ".0", the aggregator sums them 
up as integers and the final division returns an integer too instead of the 
fractional response expected from a "decimal" value.

For example:
 # AVG of {{decimal}} values 1.0 and 2.0 returns 1.5, as expected.
 # AVG of 1.0 and 2 or 1 and 2.0 also return 1.5.
 # But AVG of 1 and 2 returns... 1. This is wrong. The user asked for the 
average to be a "decimal", not a "varint", so there is no reason why it should 
be rounded up to be an integer.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-16635) dml.rst should not list a "!=" operator

2021-04-27 Thread Nadav Har'El (Jira)
Nadav Har'El created CASSANDRA-16635:


 Summary: dml.rst should not list a "!=" operator
 Key: CASSANDRA-16635
 URL: https://issues.apache.org/jira/browse/CASSANDRA-16635
 Project: Cassandra
  Issue Type: Improvement
Reporter: Nadav Har'El


In {{doc/source/cql/dml.rst}} (which ends up in 
[https://cassandra.apache.org/doc/latest/cql/dml.html]), one of the operators 
listed is "!=". However, this operator has never been supported in WHERE 
clauses, and I don't see any plans to make it supported.

The confusion compounds when you notice that the text does refer in a few 
places to "non-equal" or "inequality" operators - but those refer to operators 
like "<=" which are allowed in certain places and not others - not to "!=" 
which isn't allowed anywhere. So "!=" should not be listed at all.

The Datastax version of this document, 
[https://docs.datastax.com/en/cql-oss/3.3/cql/cql_reference/cqlSelect.html], 
also doesn't list "!=".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9928) Add Support for multiple non-primary key columns in Materialized View primary keys

2019-09-04 Thread Nadav Har'El (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16922466#comment-16922466
 ] 

Nadav Har'El commented on CASSANDRA-9928:
-

This issue has recently turned 4 years old, and I'm curious how sure we are 
about the *reasons* described above for why we forbid MV with two new key 
columns - whether these reasons are correct, and whether we are sure these are 
the only reasons.

As [~fsander] asked above, while a base-view inconsistency is indeed more 
likely in the two-new-key-columns case, don't we have the same problem in the 
regular one-new-key-column case - scenarios where an unfortunate order of 
node failures causes data to appear in a view replica that doesn't appear in 
the base replica, and thus will never be deleted? I thought this was one of the 
main reasons why MVs were recently downgraded to "experimental" status.

But I also wonder if we didn't miss a second problem, that of row liveness, 
similar to what we have in the case of unselected columns (see 
[CASSANDRA-13826|https://jira.apache.org/jira/browse/CASSANDRA-13826]): when 
we add and remove different base columns which are view keys, but the view 
row has just a *single* timestamp, we can end up unable to add a view row 
that we previously deleted. For example, here is a scenario I thought might be 
problematic (I didn't actually test this; one would need to disable the check 
in the code forbidding multiple new MV key columns to run a test case):

Assume that x,y are regular column in base, but key columns in the view. For 
brevity, we leave out other base key columns and other regular columns. 
Consider the following sequence of events on one row of the base table:
 # Add x=1 at timestamp 1. Since y is still null, no view row is created yet.
 # Add y=1 at timestamp 10. This creates a view row with key x=1, y=1. The row 
only contains a CQL row marker, and a single timestamp is chosen for it: 10.
 # Delete x at timestamp 2. This deletes x’s older (ts=1) value, and so the 
view row should be deleted. Again, a timestamp needs to be chosen for this 
deletion - it will be 10 again, and the deletion will override the creation 
with the same timestamp from the previous step, and so far everything is fine.
 # Add x=2 at timestamp 3. This overrides the deletion of x (which was in 
timestamp 2) so again, both x and y have values and a view row should be 
created with key x=2, y=1. However, this creation will again have timestamp 10 
(y’s timestamp) and not be able to shadow the deletion from step 3 (in step 3, 
deletion won over data, so here it will win again). So the view row we wanted 
to add will not be added!

> Add Support for multiple non-primary key columns in Materialized View primary 
> keys
> --
>
> Key: CASSANDRA-9928
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9928
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Feature/Materialized Views
>Reporter: T Jake Luciani
>Priority: Normal
>  Labels: materializedviews
> Fix For: 4.x
>
>
> Currently we don't allow > 1 non primary key from the base table in a MV 
> primary key.  We should remove this restriction assuming we continue 
> filtering out nulls.  With allowing nulls in the MV columns there are a lot 
> of multiplicative implications we need to think through.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-14478) Improve the documentation of UPDATE vs INSERT

2018-05-30 Thread Nadav Har'El (JIRA)
Nadav Har'El created CASSANDRA-14478:


 Summary: Improve the documentation of UPDATE vs INSERT
 Key: CASSANDRA-14478
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14478
 Project: Cassandra
  Issue Type: Improvement
  Components: Documentation and Website
Reporter: Nadav Har'El


New Cassandra users often wonder about the difference between the INSERT and 
UPDATE CQL commands when applied to ordinary data (not counters or 
transactions). Usually, they are told that there is really no difference 
between the two - both of them can insert a new row or update an existing one.

The Cassandra CQL documentation, 
[http://cassandra.apache.org/doc/latest/cql/dml.html#update], 
is fairly silent on the question - on the one hand it doesn't explicitly say 
they are the same, but on the other hand it describes them both as doing the 
same things, and doesn't explicitly mention any difference.

 

But there is an important difference, which was raised in the past in 
CASSANDRA-11805: INSERT adds a row marker, while UPDATE does not. What does 
this mean? Basically, an UPDATE requests that individual cells of the row be 
added, but not that the row itself be added; so if one later deletes those 
individual cells with DELETE, the entire row goes away. However, an INSERT 
not only adds the cells, it also requests that the row be added (this is 
implemented via a "row marker"). So if all the row's individual cells are later 
deleted, an empty row remains behind (i.e., the primary key of the row, which 
now has no content, is still remembered in the table).

I'm not sure what is the best way to explain this, but what I wrote in the 
paragraph above is a start.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-14262) View update sent multiple times during range movement

2018-02-26 Thread Nadav Har'El (JIRA)
Nadav Har'El created CASSANDRA-14262:


 Summary: View update sent multiple times during range movement
 Key: CASSANDRA-14262
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14262
 Project: Cassandra
  Issue Type: Improvement
  Components: Materialized Views
Reporter: Nadav Har'El


This issue is about updating a base table with materialized views while 
token-ranges are being moved, i.e., while a node is being added or removed from 
the cluster (this is a long process because the data needs to be streamed to 
its new owning node).

During this process, each view mutation we want to write to a view table may 
have an additional "pending node" (or several of them) - another node (or 
nodes) which will hold this view mutation - and we need to send the view 
mutations to these new nodes too. This code existed until CASSANDRA-13069, when 
it was accidentally removed, and was restored in CASSANDRA-14251.

However, the current code, in mutateMV(), has each of the RF (e.g., 3) base 
replicas send the view mutation to the same pending node. This is of course 
redundant, and reduces write throughput while the streaming is performed.

I suggested (based on an idea by [~shlomi_livne]) that it may be enough for 
only the single node which will be paired (when the range movement completes) 
with the pending node to send it the update. [~pauloricardomg] replied (see 
[https://lists.apache.org/thread.html/12c78582a3f709ca33a45e5fa6121148b1b1ad9c9b290d1a21e4409b@%3Cdev.cassandra.apache.org%3E]
 ) that it appears that such an optimization would work in the common case of 
single movements but will not work in rarer more complex cases (I did not fully 
understand the details, check out the above link for the details).

I believe there's another problem with the current code, which is one of 
correctness: if any view replica ends up with two different view rows for the 
same partition key, such a mistake cannot currently be fixed (see 
CASSANDRA-10346). But if we have different base replicas with two different 
values (an inconsistency an ordinary base repair could fix, if we ran it) and 
both of them send their update to the same pending view replica, this view 
replica will now have two rows, one of them wrong (and this cannot currently 
be repaired).

 

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-10728) Hash used in repair does not include partition key

2015-11-22 Thread Nadav Har'El (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15020906#comment-15020906
 ] 

Nadav Har'El commented on CASSANDRA-10728:
--

Identical values, yes, but not identical keys...

> Hash used in repair does not include partition key
> --
>
> Key: CASSANDRA-10728
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10728
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Nadav Har'El
>Priority: Minor
>
> When the repair code builds the Merkle Tree, it appears to be using 
> AbstractCompactedRow.update() to calculate a partition's hash. This method's 
> documentation states that it calculates a "digest with the data bytes of the 
> row (not including row key or row size).". The code itself seems to agree 
> with this comment.
> However, I believe that not including the row (actually, partition) key in 
> the hash function is a mistake: This means that if two nodes have the same 
> data but different key, repair would not notice this discrepancy. Moreover, 
> if two different keys have their data switched - or have the same data - 
> again this would not be noticed by repair. Actually running across this 
> problem in a real repair is not very likely, but I can imagine seeing it 
> easily in an hypothetical use case where all partitions have exactly the same 
> data and just the partition key matters.
> I am sorry if I'm mistaken and the partition key is actually taken into 
> account in the Merkle tree, but I tried to find evidence that it does and 
> failed. Glancing over the code, it almost seems that it does use the key: 
> Validator.add() calculates rowHash() which includes the digest (without the 
> partition key) *and* the key's token. But then, the code calls 
> MerkleTree.TreeRange.addHash() on that tuple, and that function conspicuously 
> ignores the token, and only uses the digest.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-10728) Hash used in repair does not include partition key

2015-11-18 Thread Nadav Har'El (JIRA)
Nadav Har'El created CASSANDRA-10728:


 Summary: Hash used in repair does not include partition key
 Key: CASSANDRA-10728
 URL: https://issues.apache.org/jira/browse/CASSANDRA-10728
 Project: Cassandra
  Issue Type: Bug
Reporter: Nadav Har'El
Priority: Minor


When the repair code builds the Merkle Tree, it appears to be using 
AbstractCompactedRow.update() to calculate a partition's hash. This method's 
documentation states that it calculates a "digest with the data bytes of the 
row (not including row key or row size).". The code itself seems to agree with 
this comment.

However, I believe that not including the row (actually, partition) key in the 
hash function is a mistake: this means that if two nodes have the same data but 
different keys, repair would not notice this discrepancy. Moreover, if two 
different keys have their data switched - or have the same data - again this 
would not be noticed by repair. Actually running across this problem in a real 
repair is not very likely, but I can imagine seeing it easily in a 
hypothetical use case where all partitions have exactly the same data and just 
the partition key matters.

I am sorry if I'm mistaken and the partition key is actually taken into account 
in the Merkle tree, but I tried to find evidence that it does and failed. 
Glancing over the code, it almost seems that it does use the key: 
Validator.add() calculates rowHash() which includes the digest (without the 
partition key) *and* the key's token. But then, the code calls 
MerkleTree.TreeRange.addHash() on that tuple, and that function conspicuously 
ignores the token, and only uses the digest.
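
A toy illustration of the concern (not Cassandra's actual Validator or 
MerkleTree code): if only the partition's data bytes are digested and the key 
is left out, two partitions with different keys but identical content produce 
identical hashes, so a tree comparison cannot tell them apart.

{code:java}
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;

public final class DigestWithoutKeyDemo
{
    // Digest only the data bytes; the partition key is deliberately not mixed
    // in, mirroring the behaviour described above.
    static byte[] digestDataOnly(String partitionKey, String data) throws Exception
    {
        MessageDigest md = MessageDigest.getInstance("MD5");
        md.update(data.getBytes(StandardCharsets.UTF_8));
        return md.digest();
    }

    public static void main(String[] args) throws Exception
    {
        byte[] a = digestDataOnly("a", "value=1");
        byte[] b = digestDataOnly("b", "value=1");
        System.out.println(Arrays.equals(a, b)); // true - the key difference is invisible
    }
}
{code}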



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10728) Hash used in repair does not include partition key

2015-11-18 Thread Nadav Har'El (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1508#comment-1508
 ] 

Nadav Har'El commented on CASSANDRA-10728:
--

What if in one replica we have partition 'a' with value 1 and some timestamp, 
and in a second replica we have partition 'b' with value 1 and the same 
timestamp - and 'a' and 'b' happen to be close enough in their tokens to be in 
the same Merkle tree partition range?

I realize this is a very unlikely case (especially considering the need for the 
timestamps to be identical, which I hadn't considered before). But it seems 
possible... For example, consider a contrived use case which uses 
Cassandra to store a large set of keys - the value of each key is always set to 
"1" (or whatever). Now, at exactly the same time (at millisecond resolution, 
which is Cassandra's default), two servers want to write two different keys "a" 
and "b" - and because of a network partition in the cluster, "a" ends up on one 
machine and "b" on a second machine - and both writes carry the same timestamp 
(at millisecond resolution, that's not completely improbable).

> Hash used in repair does not include partition key
> --
>
> Key: CASSANDRA-10728
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10728
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Nadav Har'El
>Priority: Minor
>
> When the repair code builds the Merkle Tree, it appears to be using 
> AbstractCompactedRow.update() to calculate a partition's hash. This method's 
> documentation states that it calculates a "digest with the data bytes of the 
> row (not including row key or row size).". The code itself seems to agree 
> with this comment.
> However, I believe that not including the row (actually, partition) key in 
> the hash function is a mistake: This means that if two nodes have the same 
> data but different key, repair would not notice this discrepancy. Moreover, 
> if two different keys have their data switched - or have the same data - 
> again this would not be noticed by repair. Actually running across this 
> problem in a real repair is not very likely, but I can imagine seeing it 
> easily in an hypothetical use case where all partitions have exactly the same 
> data and just the partition key matters.
> I am sorry if I'm mistaken and the partition key is actually taken into 
> account in the Merkle tree, but I tried to find evidence that it does and 
> failed. Glancing over the code, it almost seems that it does use the key: 
> Validator.add() calculates rowHash() which includes the digest (without the 
> partition key) *and* the key's token. But then, the code calls 
> MerkleTree.TreeRange.addHash() on that tuple, and that function conspicuously 
> ignores the token, and only uses the digest.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)