[ 
https://issues.apache.org/jira/browse/CASSANDRA-19461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17925395#comment-17925395
 ] 

Andres de la Peña commented on CASSANDRA-19461:
-----------------------------------------------

[~maedhroz] [~dcapwell] It seems this breaks the indexing of empty values in
types that allow them and are not indexed as literals. For example, with
{{int}}:
{code:java}
@Test
public void testEmptyNonLiteral()
{
    createTable("CREATE TABLE %s (k int PRIMARY KEY, v int)");
    execute("INSERT INTO %s (k, v) VALUES (0, ?)", EMPTY_BYTE_BUFFER);
    flush();
    createIndex(String.format(CREATE_INDEX_TEMPLATE, 'v')); // fails!!!
}
{code}
The index creation will fail with:
{code:java}
WARN  [SecondaryIndexManagement:1] 2025-02-09 15:11:23,714 SecondaryIndexManager.java:843 - Index build of table_testemptynonliteral_00_v_idx failed. Please run full index rebuild to fix it.
java.util.concurrent.ExecutionException: java.lang.NullPointerException
        at org.apache.cassandra.utils.concurrent.AbstractFuture.getWhenDone(AbstractFuture.java:239)
        at org.apache.cassandra.utils.concurrent.AbstractFuture.get(AbstractFuture.java:246)
        at org.apache.cassandra.index.sai.StorageAttachedIndex.lambda$getInitializationTask$4(StorageAttachedIndex.java:337)
        at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61)
        at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.NullPointerException: null
        at org.apache.cassandra.db.tries.InMemoryTrie.putRecursive(InMemoryTrie.java:904)
        at org.apache.cassandra.db.tries.InMemoryTrie.putRecursive(InMemoryTrie.java:897)
        at org.apache.cassandra.db.tries.InMemoryTrie.putSingleton(InMemoryTrie.java:878)
        at org.apache.cassandra.index.sai.disk.v1.segment.SegmentTrieBuffer.add(SegmentTrieBuffer.java:69)
        at org.apache.cassandra.index.sai.disk.v1.segment.SegmentBuilder$TrieSegmentBuilder.addInternal(SegmentBuilder.java:90)
        at org.apache.cassandra.index.sai.disk.v1.segment.SegmentBuilder.add(SegmentBuilder.java:195)
        at org.apache.cassandra.index.sai.disk.v1.SSTableIndexWriter.addTerm(SSTableIndexWriter.java:208)
        at org.apache.cassandra.index.sai.disk.v1.SSTableIndexWriter.addRow(SSTableIndexWriter.java:99)
        at org.apache.cassandra.index.sai.disk.StorageAttachedIndexWriter.addRow(StorageAttachedIndexWriter.java:257)
        at org.apache.cassandra.index.sai.disk.StorageAttachedIndexWriter.nextUnfilteredCluster(StorageAttachedIndexWriter.java:131)
        at org.apache.cassandra.index.sai.StorageAttachedIndexBuilder.indexSSTable(StorageAttachedIndexBuilder.java:188)
        at org.apache.cassandra.index.sai.StorageAttachedIndexBuilder.build(StorageAttachedIndexBuilder.java:118)
        at org.apache.cassandra.db.compaction.CompactionManager$13.run(CompactionManager.java:1905)
        at org.apache.cassandra.concurrent.FutureTask$3.call(FutureTask.java:141)
        ... 6 common frames omitted
{code}
I think this happens for all types except {{text}}, {{ascii}},
{{boolean}} and frozen multicells.

All the data types indexed with the block-balanced tree (numeric types among
others) use {{AbstractType#asComparableBytes}} to produce the
{{ByteSource}}. There are multiple implementations of that method, but most
of them return null for an empty column value, which leads to the NPE shown
above.

I guess that if we want to index those empty values,
{{AbstractType#asComparableBytes}} should return something non-null that can be
put in the tree. This would be ideal and would mimic the behaviour of the
legacy table-based index. I don't know how we should represent empty values for
each affected data type, though.

However, it's not clear to me how valuable it is, besides consistency with
legacy table-based 2i, to index empty values for types where
{{AbstractType#isEmptyValueMeaningless}} is {{true}}, which are those not
going to the literal index. Perhaps we should ignore them until we have a means
to represent empty values in the tree?
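
As a rough illustration of that last option, here is a minimal, self-contained sketch of a guard that drops empty values before they reach the index writer. It is not taken from the Cassandra code base: the class {{EmptyValueGuard}}, the method {{shouldIndex}} and the boolean parameter are hypothetical names, and the parameter merely stands in for the result of {{AbstractType#isEmptyValueMeaningless}}.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of Cassandra. It models the idea of
// skipping empty values for types whose empty value is meaningless.
final class EmptyValueGuard
{
    /**
     * Decides whether a term should be passed on to the index writer.
     *
     * @param term the serialized column value, possibly empty or null
     * @param emptyValueMeaningless assumed to mirror what
     *        AbstractType#isEmptyValueMeaningless reports for the column type
     */
    static boolean shouldIndex(ByteBuffer term, boolean emptyValueMeaningless)
    {
        // A missing value is never indexed.
        if (term == null)
            return false;

        // An empty buffer has no remaining bytes to read.
        boolean empty = !term.hasRemaining();

        // Non-empty values are always indexed; empty ones only when the
        // type treats the empty value as meaningful (e.g. text).
        return !empty || !emptyValueMeaningless;
    }

    public static void main(String[] args)
    {
        ByteBuffer empty = ByteBuffer.allocate(0);
        ByteBuffer fortyTwo = ByteBuffer.allocate(4).putInt(0, 42);

        System.out.println(shouldIndex(empty, true));     // int-like type, empty value: skipped
        System.out.println(shouldIndex(empty, false));    // text-like type, empty value: indexed
        System.out.println(shouldIndex(fortyTwo, true));  // int-like type, real value: indexed
    }
}
```

A check like this would only avoid the NPE. Queries for empty values on the affected types would still return nothing, which is the trade-off raised above.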

> SAI does not index empty bytes even for types that allow empty bytes as a 
> valid input
> -------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-19461
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19461
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: Feature/SAI
>            Reporter: Caleb Rackliffe
>            Assignee: Caleb Rackliffe
>            Priority: Normal
>             Fix For: 5.0-rc1, 5.0, 5.1
>
>         Attachments: ci_summary.html, result_details.tar.gz
>
>          Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> This is easy to reproduce with a test that looks something like this:
> {noformat}
> @Test
> public void testEmptyString()
> {
>     createTable("CREATE TABLE %s (k TEXT PRIMARY KEY, v text)");
>     createIndex(String.format(CREATE_INDEX_TEMPLATE, 'v'));
>     execute("INSERT INTO %s (k, v) VALUES ('0', '')");
>     execute("INSERT INTO %s (k) VALUES ('1')");
>     
>     // flush(); <---- there is not always a memtable index involved, a fix will have to pay attention to this
>     List<Row> rows = executeNet("SELECT * FROM %s WHERE v = ''").all();
>     assertEquals(1, rows.size()); // FAILS! No matches...
> }
> {noformat}



