[ https://issues.apache.org/jira/browse/CASSANDRA-19661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17851119#comment-17851119 ]
Benjamin Lerer commented on CASSANDRA-19661: -------------------------------------------- [~sergrua] Can you reproduce the problem with the latest C* version? > Cannot restart Cassandra 5 after creating a vector table and index > ------------------------------------------------------------------ > > Key: CASSANDRA-19661 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19661 > Project: Cassandra > Issue Type: Bug > Components: Feature/SAI > Reporter: Sergio Rua > Priority: Normal > Fix For: 5.0-rc, 5.x > > > I'm using llama-index and llama3 to train a model. I'm using a very simple > code that reads some *.txt files from local and uploads them to Cassandra and > then creates the index: > > {code:java} > # Create the index from documents > index = VectorStoreIndex.from_documents( > documents, > service_context=vector_store.service_context, > storage_context=storage_context, > show_progress=True, > ) {code} > This works well and I'm able to use a Chat app to get responses from the > Cassandra data. however, right after, I cannot restart Cassandra. It'll break > with the following error: > > {code:java} > INFO [PerDiskMemtableFlushWriter_0:7] 2024-05-23 08:23:20,102 > Flushing.java:179 - Completed flushing > /data/cassandra/data/gpt/docs_20240523-10c8eaa018d811ef8dadf75182f3e2b4/da-6-bti-Data.db > (124.236MiB) for commitlog position > CommitLogPosition(segmentId=1716452305636, position=15336) > [...] > WARN [MemtableFlushWriter:1] 2024-05-23 08:28:29,575 > MemtableIndexWriter.java:92 - [gpt.docs.idx_vector_docs] Aborting index > memtable flush for > /data/cassandra/data/gpt/docs-aea77a80184b11ef8dadf75182f3e2b4/da-3-bti...{code} > {code:java} > java.lang.IllegalStateException: null > at > com.google.common.base.Preconditions.checkState(Preconditions.java:496) > at > org.apache.cassandra.index.sai.disk.v1.vector.VectorPostings.computeRowIds(VectorPostings.java:76) > at > org.apache.cassandra.index.sai.disk.v1.vector.OnHeapGraph.writeData(OnHeapGraph.java:313) > at > org.apache.cassandra.index.sai.memory.VectorMemoryIndex.writeDirect(VectorMemoryIndex.java:272) > at > org.apache.cassandra.index.sai.memory.MemtableIndex.writeDirect(MemtableIndex.java:110) > at > org.apache.cassandra.index.sai.disk.v1.MemtableIndexWriter.flushVectorIndex(MemtableIndexWriter.java:192) > at > org.apache.cassandra.index.sai.disk.v1.MemtableIndexWriter.complete(MemtableIndexWriter.java:117) > at > org.apache.cassandra.index.sai.disk.StorageAttachedIndexWriter.complete(StorageAttachedIndexWriter.java:185) > at java.base/java.util.ArrayList.forEach(ArrayList.java:1541) > at > java.base/java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1085) > at > org.apache.cassandra.io.sstable.format.SSTableWriter.commit(SSTableWriter.java:289) > at > org.apache.cassandra.db.compaction.unified.ShardedMultiWriter.commit(ShardedMultiWriter.java:219) > at > org.apache.cassandra.db.ColumnFamilyStore$Flush.flushMemtable(ColumnFamilyStore.java:1323) > at > org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1222) > at > org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:133) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Thread.java:829) {code} > The table created by the script is as follows: > > {noformat} > CREATE TABLE gpt.docs ( > partition_id text, > row_id text, > attributes_blob text, > body_blob text, > vector vector<float, 1024>, > metadata_s map<text, text>, > PRIMARY KEY (partition_id, row_id) > ) WITH CLUSTERING ORDER BY (row_id ASC) > AND additional_write_policy = '99p' > AND allow_auto_snapshot = true > AND bloom_filter_fp_chance = 0.01 > AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'} > AND cdc = false > AND comment = '' > AND compaction = {'class': > 'org.apache.cassandra.db.compaction.UnifiedCompactionStrategy', > 'scaling_parameters': 'T4', 'target_sstable_size': '1GiB'} > AND compression = {'chunk_length_in_kb': '16', 'class': > 'org.apache.cassandra.io.compress.LZ4Compressor'} > AND memtable = 'default' > AND crc_check_chance = 1.0 > AND default_time_to_live = 0 > AND extensions = {} > AND gc_grace_seconds = 864000 > AND incremental_backups = true > AND max_index_interval = 2048 > AND memtable_flush_period_in_ms = 0 > AND min_index_interval = 128 > AND read_repair = 'BLOCKING' > AND speculative_retry = '99p'; > CREATE CUSTOM INDEX eidx_metadata_s_docs ON gpt.docs (entries(metadata_s)) > USING 'org.apache.cassandra.index.sai.StorageAttachedIndex'; > CREATE CUSTOM INDEX idx_vector_docs ON gpt.docs (vector) USING > 'org.apache.cassandra.index.sai.StorageAttachedIndex';{noformat} > Thank you > -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org