[jira] [Comment Edited] (CASSANDRA-12877) SASI index throwing AssertionError on index creation

2016-11-04 Thread Voytek Jarnot (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637691#comment-15637691
 ] 

Voytek Jarnot edited comment on CASSANDRA-12877 at 11/4/16 9:12 PM:


Attached full log output.  Fresh build of cassandra-3.X; fresh install, fresh 
keyspace (SimpleStrategy, RF 1).

1) built/installed 3.10-SNAPSHOT from git branch cassandra-3.X
2) created keyspace (SimpleStrategy, RF 1)
3) created table: (simplified version below, many more valX columns present)
{quote}
CREATE TABLE mytable (
id1 text,
id2 text,
id3 date,
id4 timestamp,
id5 text,
val1 text,
val2 text,
val3 text,
task_id text,
document_nbr text,
val5 text,
PRIMARY KEY ((id1, id2), id3, id4, id5)
) WITH CLUSTERING ORDER BY (id3 DESC, id4 DESC, id5 ASC)
{quote}

4) created materialized view:
{quote}
CREATE MATERIALIZED VIEW mytable_by_task_id AS
SELECT *
FROM mytable
WHERE id1 IS NOT NULL AND id2 IS NOT NULL AND id3 IS NOT NULL AND id4 IS 
NOT NULL AND id5 IS NOT NULL AND task_id IS NOT NULL
PRIMARY KEY (task_id, id3, id4, id1, id2, id5)
WITH CLUSTERING ORDER BY (id3 DESC, id4 DESC, id1 ASC, id2 ASC, id5 ASC)
{quote}
5) inserted 27 million "rows" (i.e., unique values for id5)
6) create index attempt
{quote}
create custom index idx_ar_document_nbr on mytable(document_nbr) using 
'org.apache.cassandra.index.sasi.SASIIndex';
{quote}
7) no error in cqlsh, logged errors attached.
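
For anyone trying to reproduce step 5 at a smaller scale, here is a minimal sketch of a generator for the INSERT statements. The column names come from the simplified table above; the function name, the generated values, and the choice to leave document_nbr null on every other row are assumptions for illustration only, not the data that triggered the error.

```python
from datetime import date, datetime, timedelta, timezone


def insert_statements(table="mytable", partitions=3, rows_per_partition=5):
    """Yield CQL INSERT strings, one per row, each with a unique id5.

    All values are made up; only the column names follow the report.
    """
    base_day = date(2016, 11, 1)
    base_ts = datetime(2016, 11, 1, tzinfo=timezone.utc)
    n = 0
    for p in range(partitions):
        for r in range(rows_per_partition):
            id3 = (base_day + timedelta(days=r % 3)).isoformat()
            id4 = (base_ts + timedelta(seconds=n)).strftime("%Y-%m-%d %H:%M:%S+0000")
            cols = ["id1", "id2", "id3", "id4", "id5", "task_id"]
            vals = [f"'a{p}'", f"'b{p}'", f"'{id3}'", f"'{id4}'",
                    f"'row{n}'", f"'task{n % 7}'"]
            # Assumption: leave document_nbr null on odd rows, since the issue
            # description wonders whether nulls in the indexed column matter.
            if n % 2 == 0:
                cols.append("document_nbr")
                vals.append(f"'doc{n}'")
            yield (f"INSERT INTO {table} ({', '.join(cols)}) "
                   f"VALUES ({', '.join(vals)});")
            n += 1


if __name__ == "__main__":
    for stmt in insert_statements():
        print(stmt)
```

The output can be redirected to a file and loaded with cqlsh (for example with its -k and -f options). For tens of millions of rows a driver-based loader would likely be more practical than a statement file.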

Beginning to suspect CASSANDRA-11990 ... but don't have enough 
internals-knowledge to do much more than guess.



> SASI index throwing AssertionError on index creation
> 
>
> Key: CASSANDRA-12877
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12877
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
> Environment: 3.9 and 3.10 tested on both linux and osx
>Reporter: Voytek Jarnot
> Attachments: idx-stacktrace-03-nov-2016.txt, 
> idx-stacktrace-04-nov-2016.txt
>
>
> Possibly a 3.10 regression?
> I built and installed a 3.10 snapshot (built 03-Nov-2016) to get around 
> CASSANDRA-11670, CASSANDRA-12689, and CASSANDRA-12223 which are holding me 
> back when using 3.9. Edit to add: 3 node cluster, replication factor of 2.
> Would like to state up front that I can't duplicate this with a lightweight 
> throwaway test, which is frustrating, but it keeps hitting me on our dev 
> cluster.  It may require a certain amount of data present (or perhaps a high 
> number of nulls in the indexed column) - never had any luck duplicating with 
> the table shown below.
> Table roughly resembles the following, with many more 'valx' columns:
> CREATE TABLE idx_test_table (
> id1 text,
> id2 text,
> id3 text,
> id4 text,
> val1 text,
> val2 text,
> PRIMARY KEY ((id1, id2), id3, id4)
> ) WITH CLUSTERING ORDER BY (id3 DESC, id4 ASC);
> CREATE CUSTOM INDEX idx_test_index ON idx_test_table (val2) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex';
> The error below occurs in 3.10, but not in 3.9; it occurs whether I insert a 
> bunch of dev data and then create the index, or whether I create the index 
> and then insert a bunch of test data.
> {quote}
> INFO  [MemtableFlushWriter:5] 2016-11-03 21:00:19,416 
> PerSSTableIndexWriter.java:284 - Scheduling index flush to 
> /u01/cassandra-data/data/essatc1/audit_record-520c1dc0a1e411e691db0bd4b103bd15/mc-266-big-SI_idx_ar_document_nbr.db

[jira] [Comment Edited] (CASSANDRA-12877) SASI index throwing AssertionError on index creation

2016-11-03 Thread Voytek Jarnot (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15635127#comment-15635127
 ] 

Voytek Jarnot edited comment on CASSANDRA-12877 at 11/4/16 4:04 AM:


Attached (slightly sanitized) result of a failed attempt to create a SASI index 
as described but on my localhost 1-machine cluster.  Full series of stacktraces 
as well as the "Update table ..." output, giving the details of my setup.

Perhaps worth mentioning: the table has ~27 million values for the final 
primary key column.
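
When going through a long log such as the attached one, the relevant assertion lines can be pulled out with a short script. This is only a sketch: the message format is copied from the stack trace quoted in the issue description, the function name is hypothetical, and it assumes each message sits on a single line in the log file (unlike the wrapped copy in this email).

```python
import re

# Message format copied from the AssertionError quoted in this issue.
OVERFLOW_RE = re.compile(
    r"cannot have more than (\d+) overflow collisions per leaf, but had: (\d+)"
)


def overflow_assertions(log_text):
    """Return a (limit, actual) pair for each matching assertion message."""
    return [(int(limit), int(actual))
            for limit, actual in OVERFLOW_RE.findall(log_text)]
```

Calling overflow_assertions(open(path).read()) on a log file gives one pair per occurrence, which makes it easy to see how far past the limit each failing leaf went.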



> SASI index throwing AssertionError on index creation
> 
>
> Key: CASSANDRA-12877
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12877
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
> Environment: 3.9 and 3.10 tested on both linux and osx
>Reporter: Voytek Jarnot
> Attachments: idx-stacktrace-03-nov-2016.txt
>
>
> Possibly a 3.10 regression?
> I built and installed a 3.10 snapshot (built 03-Nov-2016) to get around 
> CASSANDRA-11670, CASSANDRA-12689, and CASSANDRA-12223 which are holding me 
> back when using 3.9. Edit to add: 3 node cluster, replication factor of 2.
> Would like to state up front that I can't duplicate this with a lightweight 
> throwaway test, which is frustrating, but it keeps hitting me on our dev 
> cluster.  It may require a certain amount of data present (or perhaps a high 
> number of nulls in the indexed column) - never had any luck duplicating with 
> the table shown below.
> Table roughly resembles the following, with many more 'valx' columns:
> CREATE TABLE idx_test_table (
> id1 text,
> id2 text,
> id3 text,
> id4 text,
> val1 text,
> val2 text,
> PRIMARY KEY ((id1, id2), id3, id4)
> ) WITH CLUSTERING ORDER BY (id3 DESC, id4 ASC);
> CREATE CUSTOM INDEX idx_test_index ON idx_test_table (val2) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex';
> The error below occurs in 3.10, but not in 3.9; it occurs whether I insert a 
> bunch of dev data and then create the index, or whether I create the index 
> and then insert a bunch of test data.
> {quote}
> INFO  [MemtableFlushWriter:5] 2016-11-03 21:00:19,416 
> PerSSTableIndexWriter.java:284 - Scheduling index flush to 
> /u01/cassandra-data/data/essatc1/audit_record-520c1dc0a1e411e691db0bd4b103bd15/mc-266-big-SI_idx_ar_document_nbr.db
> INFO  [SASI-Memtable:1] 2016-11-03 21:00:19,450 
> PerSSTableIndexWriter.java:335 - Index flush to 
> /u01/cassandra-data/data/essatc1/audit_record-520c1dc0a1e411e691db0bd4b103bd15/mc-266-big-SI_idx_ar_document_nbr.db
>  took 33 ms.
> ERROR [SASI-Memtable:1] 2016-11-03 21:00:19,454 CassandraDaemon.java:229 - 
> Exception in thread Thread[SASI-Memtable:1,5,main]
> java.lang.AssertionError: cannot have more than 8 overflow collisions per 
> leaf, but had: 25
>   at 
> org.apache.cassandra.index.sasi.disk.AbstractTokenTreeBuilder$Leaf.createOverflowEntry(AbstractTokenTreeBuilder.java:357)
>  ~[apache-cassandra-3.10-SNAPSHOT.jar:3.10-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.disk.AbstractTokenTreeBuilder$Leaf.createEntry(AbstractTokenTreeBuilder.java:346)
>  ~[apache-cassandra-3.10-SNAPSHOT.jar:3.10-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.disk.DynamicTokenTreeBuilder$DynamicLeaf.serializeData(DynamicTokenTreeBuilder.java:180)
>  ~[apache-cassandra-3.10-SNAPSHOT.jar:3.10-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.disk.AbstractTokenTreeBuilder$Leaf.serialize(AbstractTokenTreeBuilder.java:306)
>  ~[apache-cassandra-3.10-SNAPSHOT.jar:3.10-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.disk.AbstractTokenTreeBuilder.write(AbstractTokenTreeBuilder.java:90)
>  ~[apache-cassandra-3.10-SNAPSHOT.jar:3.10-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.disk.OnDiskIndexBuilder$MutableDataBlock.flushAndClear(OnDiskIndexBuilder.java:629)
>  ~[apache-cassandra-3.10-SNAPSHOT.jar:3.10-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.disk.OnDiskIndexBuilder$MutableLevel.flush(OnDiskIndexBuilder.java:446)
>  ~[apache-cassandra-3.10-SNAPSHOT.jar:3.10-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.disk.OnDiskIndexBuilder$MutableLevel.add(OnDiskIndexBuilder.java:433)
>  ~[apache-cassandra-3.10-SNAPSHOT.jar:3.10-SNAPSHOT]
>   at 
> org.apache.cassandra.index.sasi.disk.OnDiskIndexBuilder.addTerm(OnDiskIndexBuilder.java:207)
>  ~[apache-cassandra-3.10-SNAPSHOT.jar:3.10-SNAPSHOT]
>   at 
>