[ https://issues.apache.org/jira/browse/CASSANDRA-11130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15136503#comment-15136503 ]
Pavel Yaskevich commented on CASSANDRA-11130:
---------------------------------------------

bq. I don't know whether you store the source column value in SASI index or not.

It doesn't, and that's exactly what I meant when I wrote "inefficient", since it means that index results are going to be inaccurate and require more filtering, but I guess that's the price to pay for exact match queries when tokenization is used... I will look into it ASAP.

> [SASI Pre-QA] = semantics not respected when using StandardAnalyzer
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-11130
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11130
>             Project: Cassandra
>          Issue Type: Bug
>          Components: CQL
>         Environment: Tested from build [CASSANDRA-11067|https://issues.apache.org/jira/browse/CASSANDRA-11067]
>            Reporter: DOAN DuyHai
>            Assignee: Pavel Yaskevich
>             Fix For: 3.4
>
>
> Tested from build [CASSANDRA-11067|https://issues.apache.org/jira/browse/CASSANDRA-11067]
> {code:sql}
> CREATE KEYSPACE music WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;
> CREATE TABLE music.albums (
>     id int PRIMARY KEY,
>     artist text,
>     title1 text,
>     title2 text
> );
> CREATE CUSTOM INDEX ON music.albums (title1) USING 'org.apache.cassandra.index.sasi.SASIIndex'
> WITH OPTIONS = {'tokenization_skip_stop_words': 'true',
>                 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
>                 'case_sensitive': 'false', 'mode': 'PREFIX', 'tokenization_enable_stemming': 'true'};
> CREATE CUSTOM INDEX ON music.albums (title2) USING 'org.apache.cassandra.index.sasi.SASIIndex'
> WITH OPTIONS = {'tokenization_skip_stop_words': 'true',
>                 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
>                 'case_sensitive': 'false', 'mode': 'CONTAINS', 'tokenization_enable_stemming': 'true'};
> INSERT INTO music.albums(id, artist, title1, title2) VALUES(1, 'Superpitcher', 'Yesterday', 'Yesterday');
> INSERT INTO music.albums(id, artist, title1, title2) VALUES(2, 'Hilary Duff', 'So Yesterday', 'So Yesterday');
> INSERT INTO music.albums(id, artist, title1, title2) VALUES(3, 'The Mr. T Experience', 'Yesterday Rules', 'Yesterday Rules');
> SELECT artist,title1 FROM music.albums WHERE title1='Yesterday';
>  artist                 | title1
> ------------------------+----------------
>  Superpitcher           | Yesterday
>  Hilary Duff            | So Yesterday
>  The Mr. T Experience   | Yesterday Rules
>
> (3 rows)
> SELECT artist,title1 FROM music.albums WHERE title2='Yesterday';
>  artist                 | title1
> ------------------------+----------------
>  Superpitcher           | Yesterday
>  Hilary Duff            | So Yesterday
>  The Mr. T Experience   | Yesterday Rules
>
> (3 rows)
> {code}
> The semantics of *=* are not respected. SASI should return only 1 row with an exact match. Using *LIKE* would return all 3 rows. This affects both *PREFIX* and *CONTAINS* modes. Using *NonTokenizerAnalyzer* returns 1 row with an exact match.
> So indeed, the semantics of *=* depend on the chosen analyzer, which is inconsistent. We should force *=* to be an exact match no matter which analyzer is chosen.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
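
For readers following the thread, the trade-off described in the comment above (a tokenized index returns too many candidates for *=*, so exact match needs an extra filtering pass against the source column value) can be sketched roughly as follows. This is only an illustration, not SASI code: the stop-word list, the tokenizer, and the helper names (tokenize, build_index, index_lookup, exact_match) are all assumptions made for the example, and stemming is left out.

```python
import re

# Assumption: a tiny illustrative stop-word list, not the one StandardAnalyzer uses.
STOP_WORDS = {"so", "the", "a"}


def tokenize(text):
    """Lowercase, split on non-alphanumerics, and drop stop words (no stemming)."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def build_index(rows, column):
    """Build a token -> set of row ids map; the source column value is NOT stored."""
    index = {}
    for row_id, row in rows.items():
        for token in tokenize(row[column]):
            index.setdefault(token, set()).add(row_id)
    return index


def index_lookup(index, value):
    """Return ids of rows containing every token of `value` (what the index alone can tell)."""
    tokens = tokenize(value)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


def exact_match(rows, index, column, value):
    """Use the index for candidates, then filter on the actual column value."""
    candidates = index_lookup(index, value)
    return sorted(i for i in candidates if rows[i][column] == value)


# Same three albums as in the issue description.
ROWS = {
    1: {"artist": "Superpitcher", "title1": "Yesterday"},
    2: {"artist": "Hilary Duff", "title1": "So Yesterday"},
    3: {"artist": "The Mr. T Experience", "title1": "Yesterday Rules"},
}
INDEX = build_index(ROWS, "title1")

print(index_lookup(INDEX, "Yesterday"))                 # index alone: all three rows
print(exact_match(ROWS, INDEX, "title1", "Yesterday"))  # after filtering: row 1 only
```

In this sketch the index lookup alone reproduces the 3-row result from the report, and the post-filter is what narrows it to the single exact match; the cost is that every candidate row has to be read to compare the stored value.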