[
https://issues.apache.org/jira/browse/CASSANDRA-19461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17925395#comment-17925395
]
Andres de la Peña commented on CASSANDRA-19461:
-----------------------------------------------
[~maedhroz] [~dcapwell] It seems this breaks the indexing of empty values in
types that allow them and are not indexed as literals. For example, with
{{int}}:
{code:java}
@Test
public void testEmptyNonLiteral()
{
    createTable("CREATE TABLE %s (k int PRIMARY KEY, v int)");
    execute("INSERT INTO %s (k, v) VALUES (0, ?)", EMPTY_BYTE_BUFFER);
    flush();
    createIndex(String.format(CREATE_INDEX_TEMPLATE, 'v')); // fails!!!
}
{code}
The index creation will fail with:
{code:java}
WARN [SecondaryIndexManagement:1] 2025-02-09 15:11:23,714 SecondaryIndexManager.java:843 - Index build of table_testemptynonliteral_00_v_idx failed. Please run full index rebuild to fix it.
java.util.concurrent.ExecutionException: java.lang.NullPointerException
    at org.apache.cassandra.utils.concurrent.AbstractFuture.getWhenDone(AbstractFuture.java:239)
    at org.apache.cassandra.utils.concurrent.AbstractFuture.get(AbstractFuture.java:246)
    at org.apache.cassandra.index.sai.StorageAttachedIndex.lambda$getInitializationTask$4(StorageAttachedIndex.java:337)
    at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61)
    at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.NullPointerException: null
    at org.apache.cassandra.db.tries.InMemoryTrie.putRecursive(InMemoryTrie.java:904)
    at org.apache.cassandra.db.tries.InMemoryTrie.putRecursive(InMemoryTrie.java:897)
    at org.apache.cassandra.db.tries.InMemoryTrie.putSingleton(InMemoryTrie.java:878)
    at org.apache.cassandra.index.sai.disk.v1.segment.SegmentTrieBuffer.add(SegmentTrieBuffer.java:69)
    at org.apache.cassandra.index.sai.disk.v1.segment.SegmentBuilder$TrieSegmentBuilder.addInternal(SegmentBuilder.java:90)
    at org.apache.cassandra.index.sai.disk.v1.segment.SegmentBuilder.add(SegmentBuilder.java:195)
    at org.apache.cassandra.index.sai.disk.v1.SSTableIndexWriter.addTerm(SSTableIndexWriter.java:208)
    at org.apache.cassandra.index.sai.disk.v1.SSTableIndexWriter.addRow(SSTableIndexWriter.java:99)
    at org.apache.cassandra.index.sai.disk.StorageAttachedIndexWriter.addRow(StorageAttachedIndexWriter.java:257)
    at org.apache.cassandra.index.sai.disk.StorageAttachedIndexWriter.nextUnfilteredCluster(StorageAttachedIndexWriter.java:131)
    at org.apache.cassandra.index.sai.StorageAttachedIndexBuilder.indexSSTable(StorageAttachedIndexBuilder.java:188)
    at org.apache.cassandra.index.sai.StorageAttachedIndexBuilder.build(StorageAttachedIndexBuilder.java:118)
    at org.apache.cassandra.db.compaction.CompactionManager$13.run(CompactionManager.java:1905)
    at org.apache.cassandra.concurrent.FutureTask$3.call(FutureTask.java:141)
    ... 6 common frames omitted
{code}
I think this happens for all types except for {{text}}, {{ascii}},
{{boolean}} and frozen multicells.
All the data types indexed with the block-balanced tree (numeric types amongst
others) use {{AbstractType#asComparableBytes}} to produce the
{{ByteSource}}. There are multiple implementations of that method, but most
of them return null for an empty column value, which leads to the NPE shown
above.
I guess if we want to index those empty values,
{{AbstractType#asComparableBytes}} should return something non-null that can be
put in the tree. This would be ideal and would mimic legacy table-based index
behaviour. I don't know how we should represent empty values for each affected
data type, though.
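As a minimal sketch of what a non-null representation could look like for
{{int}}, assuming a leading marker byte that makes the empty value sort before
every real value: the class below is standalone and hypothetical. It does not
use {{ByteSource}} or any Cassandra API, and the marker values are invented for
illustration only.

```java
import java.util.Arrays;

// Hypothetical, standalone sketch: not Cassandra's ByteSource or AbstractType API.
final class EmptyAwareIntEncoding
{
    private static final byte EMPTY_MARKER = 0x00;   // assumed marker for an empty value
    private static final byte PRESENT_MARKER = 0x01; // assumed marker for a real int

    // Encodes an empty value as a single marker byte instead of returning null.
    static byte[] encodeEmpty()
    {
        return new byte[]{ EMPTY_MARKER };
    }

    // Encodes an int as marker + 4 big-endian bytes with the sign bit flipped,
    // so that unsigned byte-wise comparison matches signed int order.
    static byte[] encode(int value)
    {
        int flipped = value ^ 0x80000000;
        return new byte[]{ PRESENT_MARKER,
                           (byte) (flipped >>> 24),
                           (byte) (flipped >>> 16),
                           (byte) (flipped >>> 8),
                           (byte) flipped };
    }

    // Lexicographic unsigned comparison, as a byte-ordered tree would apply it.
    static int compare(byte[] a, byte[] b)
    {
        return Arrays.compareUnsigned(a, b);
    }
}
```

With this layout the empty value compares lower than {{Integer.MIN_VALUE}},
and real values keep their numeric order. Whether a marker byte is acceptable
for the on-disk format of each affected type is left open here.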
However, it's not clear to me how valuable it is, besides consistency with
legacy table-based 2i, to index empty values for types where
{{AbstractType#isEmptyValueMeaningless}} is {{true}}, which are those not
going to the literal index. Perhaps we should ignore them until we have a means
to represent empty values in the tree?
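The "ignore them" option could be sketched roughly as below. This is a
standalone model, not the actual SAI write path: {{EmptyValueGuard}},
{{shouldIndex}} and {{indexableTerms}} are hypothetical names, and the boolean
parameter merely stands in for {{AbstractType#isEmptyValueMeaningless}}.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, standalone model of a pre-indexing guard; none of these names are Cassandra APIs.
final class EmptyValueGuard
{
    // Returns true when the term should be handed to the tree-based index.
    // emptyValueMeaningless stands in for AbstractType#isEmptyValueMeaningless.
    static boolean shouldIndex(ByteBuffer term, boolean emptyValueMeaningless)
    {
        if (term == null)
            return false; // no value was written for the column
        boolean isEmpty = !term.hasRemaining();
        // Skip only empty values of types for which empty carries no meaning.
        return !(isEmpty && emptyValueMeaningless);
    }

    // Filters a batch of terms, keeping only those that would be indexed.
    static List<ByteBuffer> indexableTerms(List<ByteBuffer> terms, boolean emptyValueMeaningless)
    {
        List<ByteBuffer> kept = new ArrayList<>();
        for (ByteBuffer term : terms)
            if (shouldIndex(term, emptyValueMeaningless))
                kept.add(term);
        return kept;
    }
}
```

Under this sketch an empty {{int}} value never reaches the tree, so the NPE is
avoided, while an empty value of a type where emptiness is meaningful is still
passed through.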
> SAI does not index empty bytes even for types that allow empty bytes as a
> valid input
> -------------------------------------------------------------------------------------
>
> Key: CASSANDRA-19461
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19461
> Project: Apache Cassandra
> Issue Type: Bug
> Components: Feature/SAI
> Reporter: Caleb Rackliffe
> Assignee: Caleb Rackliffe
> Priority: Normal
> Fix For: 5.0-rc1, 5.0, 5.1
>
> Attachments: ci_summary.html, result_details.tar.gz
>
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> This is easy to reproduce with a test that looks something like this:
> {noformat}
> @Test
> public void testEmptyString()
> {
>     createTable("CREATE TABLE %s (k TEXT PRIMARY KEY, v text)");
>     createIndex(String.format(CREATE_INDEX_TEMPLATE, 'v'));
>     execute("INSERT INTO %s (k, v) VALUES ('0', '')");
>     execute("INSERT INTO %s (k) VALUES ('1')");
>
>     // flush(); <---- there is not always a memtable index involved, a fix
>     // will have to pay attention to this
>     List<Row> rows = executeNet("SELECT * FROM %s WHERE v = ''").all();
>     assertEquals(1, rows.size()); // <-- FAILS! No matches...
> }
> {noformat}