[jira] [Updated] (CASSANDRA-11130) [SASI Pre-QA] = semantics not respected when using StandardAnalyzer

2016-02-08 Thread Pavel Yaskevich (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-11130:

Issue Type: Bug  (was: Sub-task)
Parent: (was: CASSANDRA-11136)

> [SASI Pre-QA] = semantics not respected when using StandardAnalyzer
> ---
>
>                 Key: CASSANDRA-11130
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11130
>             Project: Cassandra
>          Issue Type: Bug
>          Components: CQL
>         Environment: Tested from build [CASSANDRA-11067|https://issues.apache.org/jira/browse/CASSANDRA-11067]
>            Reporter: DOAN DuyHai
>            Assignee: Pavel Yaskevich
>             Fix For: 3.4
>
>
> Tested from build 
> [CASSANDRA-11067|https://issues.apache.org/jira/browse/CASSANDRA-11067]
> {code:sql}
> CREATE KEYSPACE music WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': '1'}  AND durable_writes = true;
> CREATE TABLE music.albums (
> id int PRIMARY KEY,
> artist text,
> title1 text,
> title2 text
> );
> CREATE CUSTOM INDEX ON music.albums (title1) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = 
> {'tokenization_skip_stop_words': 'true', 'analyzer_class': 
> 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer', 
> 'case_sensitive': 'false', 'mode': 'PREFIX', 'tokenization_enable_stemming': 
> 'true'};
> CREATE CUSTOM INDEX ON music.albums (title2) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = 
> {'tokenization_skip_stop_words': 'true', 'analyzer_class': 
> 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer', 
> 'case_sensitive': 'false', 'mode': 'CONTAINS', 
> 'tokenization_enable_stemming': 'true'};
> INSERT INTO music.albums(id, artist, title1, title2) 
> VALUES(1, 'Superpitcher', 'Yesterday', 'Yesterday');
> INSERT INTO music.albums(id, artist, title1, title2) 
> VALUES(2, 'Hilary Duff', 'So Yesterday', 'So Yesterday');
> INSERT INTO music.albums(id, artist, title1, title2) 
> VALUES(3, 'The Mr. T Experience', 'Yesterday Rules', 'Yesterday Rules');
> SELECT artist,title1 FROM music.albums WHERE title1='Yesterday';
>  artist               | title1
> ----------------------+-----------------
>          Superpitcher |       Yesterday
>           Hilary Duff |    So Yesterday
>  The Mr. T Experience | Yesterday Rules
>
> (3 rows)
> SELECT artist,title1 FROM music.albums WHERE title2='Yesterday';
>  artist               | title1
> ----------------------+-----------------
>          Superpitcher |       Yesterday
>           Hilary Duff |    So Yesterday
>  The Mr. T Experience | Yesterday Rules
>
> (3 rows)
> {code}
> The semantics of *=* are not respected: SASI should return only the 1 row
> that is an exact match. Using *LIKE* would return all 3 rows. This affects
> both *PREFIX* and *CONTAINS* modes. Using *NonTokenizingAnalyzer* returns 1
> row with an exact match.
> So the semantics of *=* depend on the chosen analyzer, which is
> inconsistent. We should force *=* to be an exact match no matter which
> analyzer is chosen.
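To make the expected contrast concrete, here is a short CQL sketch against the same music.albums table. It assumes the LIKE syntax available in the CASSANDRA-11067 build: a trailing-wildcard pattern for the PREFIX-mode index on title1 and a %...% pattern for the CONTAINS-mode index on title2. The row counts in the comments restate the behaviour the report asks for; they are not output from a verified run.

{code:sql}
-- Pattern/token matching: per the report, LIKE is expected to return all 3 rows,
-- since every title contains the analyzed token for 'Yesterday'.
SELECT artist, title1 FROM music.albums WHERE title1 LIKE 'Yesterday%';
SELECT artist, title1 FROM music.albums WHERE title2 LIKE '%Yesterday%';

-- Equality: the report asks for = to compare the whole value, so only
-- id 1 ('Superpitcher', 'Yesterday') should be returned by either query.
SELECT artist, title1 FROM music.albums WHERE title1 = 'Yesterday';
SELECT artist, title1 FROM music.albums WHERE title2 = 'Yesterday';
{code}

Keeping LIKE for token/pattern matching and = for whole-value matching would make the result of = independent of the analyzer configured on the index, which is the consistency the report requests.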



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11130) [SASI Pre-QA] = semantics not respected when using StandardAnalyzer

2016-02-08 Thread Pavel Yaskevich (JIRA)


Pavel Yaskevich updated CASSANDRA-11130:

Issue Type: Sub-task  (was: Bug)
Parent: CASSANDRA-11136

> [SASI Pre-QA] = semantics not respected when using StandardAnalyzer
> ---
>
>                 Key: CASSANDRA-11130
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11130
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: CQL
>         Environment: Tested from build [CASSANDRA-11067|https://issues.apache.org/jira/browse/CASSANDRA-11067]
>            Reporter: DOAN DuyHai
>            Assignee: Pavel Yaskevich
>             Fix For: 3.4
>
>


[jira] [Updated] (CASSANDRA-11130) [SASI Pre-QA] = semantics not respected when using StandardAnalyzer

2016-02-07 Thread DOAN DuyHai (JIRA)


DOAN DuyHai updated CASSANDRA-11130:

Labels: Tested 
[CASSANDRA-11067|https://issues.apache.org/jira/browse/CASSANDRA-11067] build 
from  (was: )

> [SASI Pre-QA] = semantics not respected when using StandardAnalyzer
> ---
>
>                 Key: CASSANDRA-11130
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11130
>             Project: Cassandra
>          Issue Type: Bug
>          Components: CQL
>            Reporter: DOAN DuyHai
>              Labels: Tested, [CASSANDRA-11067|https://issues.apache.org/jira/browse/CASSANDRA-11067], build, from
>


[jira] [Updated] (CASSANDRA-11130) [SASI Pre-QA] = semantics not respected when using StandardAnalyzer

2016-02-07 Thread DOAN DuyHai (JIRA)


DOAN DuyHai updated CASSANDRA-11130:

Description: 
Tested from build 
[CASSANDRA-11067|https://issues.apache.org/jira/browse/CASSANDRA-11067]

{code:sql}
CREATE KEYSPACE music WITH replication = {'class': 'SimpleStrategy', 
'replication_factor': '1'}  AND durable_writes = true;

CREATE TABLE music.albums (
id int PRIMARY KEY,
artist text,
title1 text,
title2 text
);

CREATE CUSTOM INDEX ON music.albums (title1) USING 
'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = 
{'tokenization_skip_stop_words': 'true', 'analyzer_class': 
'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer', 'case_sensitive': 
'false', 'mode': 'PREFIX', 'tokenization_enable_stemming': 'true'};

CREATE CUSTOM INDEX ON music.albums (title2) USING 
'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = 
{'tokenization_skip_stop_words': 'true', 'analyzer_class': 
'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer', 'case_sensitive': 
'false', 'mode': 'CONTAINS', 'tokenization_enable_stemming': 'true'};

INSERT INTO music.albums(id, artist, title1, title2) 
VALUES(1, 'Superpitcher', 'Yesterday', 'Yesterday');

INSERT INTO music.albums(id, artist, title1, title2) 
VALUES(2, 'Hilary Duff', 'So Yesterday', 'So Yesterday');

INSERT INTO music.albums(id, artist, title1, title2) 
VALUES(3, 'The Mr. T Experience', 'Yesterday Rules', 'Yesterday Rules');

SELECT artist,title1 FROM music.albums WHERE title1='Yesterday';

 artist               | title1
----------------------+-----------------
         Superpitcher |       Yesterday
          Hilary Duff |    So Yesterday
 The Mr. T Experience | Yesterday Rules

(3 rows)

SELECT artist,title1 FROM music.albums WHERE title2='Yesterday';

 artist               | title1
----------------------+-----------------
         Superpitcher |       Yesterday
          Hilary Duff |    So Yesterday
 The Mr. T Experience | Yesterday Rules

(3 rows)
{code}

The semantics of *=* are not respected: SASI should return only the 1 row that
is an exact match. Using *LIKE* would return all 3 rows. This affects both
*PREFIX* and *CONTAINS* modes. Using *NonTokenizingAnalyzer* returns 1 row with
an exact match.

So the semantics of *=* depend on the chosen analyzer, which is inconsistent.
We should force *=* to be an exact match no matter which analyzer is chosen.

  was:
Tested from build 
[CASSANDRA-11067|https://issues.apache.org/jira/browse/CASSANDRA-11067]

{code:sql}
CREATE KEYSPACE music WITH replication = {'class': 'SimpleStrategy', 
'replication_factor': '1'}  AND durable_writes = true;

CREATE TABLE music.albums (
id int PRIMARY KEY,
artist text,
title1 text,
title2 text
);

CREATE CUSTOM INDEX ON music.albums (title1) USING 
'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = 
{'tokenization_skip_stop_words': 'true', 'analyzer_class': 
'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer', 'case_sensitive': 
'false', 'mode': 'PREFIX', 'tokenization_enable_stemming': 'true'};

CREATE CUSTOM INDEX ON music.albums (title2) USING 
'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = 
{'tokenization_skip_stop_words': 'true', 'analyzer_class': 
'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer', 'case_sensitive': 
'false', 'mode': 'CONTAINS', 'tokenization_enable_stemming': 'true'};

INSERT INTO music.albums(id, artist, title1, title2) VALUES(1, 'Superpitcher', 
'Yesterday', 'Yesterday');
INSERT INTO music.albums(id, artist, title1, title2) VALUES(1, 'Hilary Duff', 
'So Yesterday', 'So Yesterday');
INSERT INTO music.albums(id, artist, title1, title2)
VALUES(1, 'The Mr. T Experience', 'Yesterday Rules', 'Yesterday Rules');

SELECT artist,title1 FROM music.albums WHERE title1='Yesterday';

 artist               | title1
----------------------+-----------------
         Superpitcher |       Yesterday
          Hilary Duff |    So Yesterday
 The Mr. T Experience | Yesterday Rules

(3 rows)

SELECT artist,title1 FROM music.albums WHERE title2='Yesterday';

 artist               | title1
----------------------+-----------------
         Superpitcher |       Yesterday
          Hilary Duff |    So Yesterday
 The Mr. T Experience | Yesterday Rules

(3 rows)
{code}

The semantics of *=* are not respected: SASI should return only the 1 row that
is an exact match. Using *LIKE* would return all 3 rows. This affects both
*PREFIX* and *CONTAINS* modes. Using *NonTokenizingAnalyzer* returns 1 row with
an exact match.

So the semantics of *=* depend on the chosen analyzer, which is inconsistent.
We should force *=* to be an exact match no matter which analyzer is chosen.
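For comparison, the NonTokenizingAnalyzer behaviour mentioned in the report can be reproduced along these lines. This is only a sketch: the title3 column is hypothetical (added so the existing StandardAnalyzer indexes stay untouched), the analyzer class name assumes the same org.apache.cassandra.index.sasi.analyzer package used for StandardAnalyzer above, and the single-row result is taken from the report rather than from a verified run.

{code:sql}
-- Hypothetical extra column, indexed with the non-tokenizing analyzer so each
-- value is indexed as one whole term instead of being split into words.
ALTER TABLE music.albums ADD title3 text;

CREATE CUSTOM INDEX ON music.albums (title3) USING
'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS =
{'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
'case_sensitive': 'false', 'mode': 'CONTAINS'};

-- Copy the same titles into the new column (ids 1-3, as in the corrected description).
UPDATE music.albums SET title3 = 'Yesterday' WHERE id = 1;
UPDATE music.albums SET title3 = 'So Yesterday' WHERE id = 2;
UPDATE music.albums SET title3 = 'Yesterday Rules' WHERE id = 3;

-- Per the report, = on a NonTokenizingAnalyzer index returns only id 1.
SELECT artist, title3 FROM music.albums WHERE title3 = 'Yesterday';
{code}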



[jira] [Updated] (CASSANDRA-11130) [SASI Pre-QA] = semantics not respected when using StandardAnalyzer

2016-02-07 Thread DOAN DuyHai (JIRA)


DOAN DuyHai updated CASSANDRA-11130:

Environment: Tested from build 
[CASSANDRA-11067|https://issues.apache.org/jira/browse/CASSANDRA-11067]

> [SASI Pre-QA] = semantics not respected when using StandardAnalyzer
> ---
>
>                 Key: CASSANDRA-11130
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11130
>             Project: Cassandra
>          Issue Type: Bug
>          Components: CQL
>         Environment: Tested from build [CASSANDRA-11067|https://issues.apache.org/jira/browse/CASSANDRA-11067]
>            Reporter: DOAN DuyHai
>


[jira] [Updated] (CASSANDRA-11130) [SASI Pre-QA] = semantics not respected when using StandardAnalyzer

2016-02-07 Thread DOAN DuyHai (JIRA)


DOAN DuyHai updated CASSANDRA-11130:

Labels:   (was: Tested 
[CASSANDRA-11067|https://issues.apache.org/jira/browse/CASSANDRA-11067] build 
from)

> [SASI Pre-QA] = semantics not respected when using StandardAnalyzer
> ---
>
>                 Key: CASSANDRA-11130
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11130
>             Project: Cassandra
>          Issue Type: Bug
>          Components: CQL
>            Reporter: DOAN DuyHai
>


[jira] [Updated] (CASSANDRA-11130) [SASI Pre-QA] = semantics not respected when using StandardAnalyzer

2016-02-07 Thread Pavel Yaskevich (JIRA)


Pavel Yaskevich updated CASSANDRA-11130:

 Reviewer: Sam Tunnicliffe
Fix Version/s: 3.4

> [SASI Pre-QA] = semantics not respected when using StandardAnalyzer
> ---
>
>                 Key: CASSANDRA-11130
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11130
>             Project: Cassandra
>          Issue Type: Bug
>          Components: CQL
>         Environment: Tested from build [CASSANDRA-11067|https://issues.apache.org/jira/browse/CASSANDRA-11067]
>            Reporter: DOAN DuyHai
>            Assignee: Pavel Yaskevich
>             Fix For: 3.4
>
>