[jira] [Commented] (CASSANDRA-12573) SASI index. Incorrect results for '%foo%bar%'-like search pattern.

2016-09-20 Thread Mikhail Krupitskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506944#comment-15506944
 ] 

Mikhail Krupitskiy commented on CASSANDRA-12573:


I've created an improvement request - CASSANDRA-12675.
Closing 12573 as a duplicate.

> SASI index. Incorrect results for '%foo%bar%'-like search pattern. 
> ---
>
> Key: CASSANDRA-12573
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12573
> Project: Cassandra
> Issue Type: Bug
> Reporter: Mikhail Krupitskiy
> Priority: Critical
> Labels: sasi
>
> We use Cassandra 3.7 and have faced a strange behaviour of SELECT requests 
> with "LIKE '%foo%bar%'" constraints on a column with SASI index.
> Below are a few experiments that show this behaviour.
> Experiment 1:
> {noformat}
> drop keyspace if exists kmv;
> create keyspace if not exists kmv WITH REPLICATION = { 'class' : 
> 'SimpleStrategy', 'replication_factor':'1'} ;
> use kmv;
> CREATE TABLE if not exists kmv (id int primary key, c1 text, c2 text);
> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
>  'mode': 'CONTAINS'
> };
> insert into kmv (id, c1, c2) values (1, 'f21', 'qwe') ;
> insert into kmv (id, c1, c2) values (2, 'f22', 'qweasd') ;
> insert into kmv (id, c1, c2) values (3, 'f23', 'qwea1') ;
> insert into kmv (id, c1, c2) values (4, 'f24', '1qwe') ;
> insert into kmv (id, c1, c2) values (5, 'f25', 'asdqwe') ;
> select c2 from kmv.kmv where c2 like '%w%a%';
> {noformat}
> Expected result: qweasd, qwea1.
> Actual result: no rows.
> Experiment 2 (NOTE: definition of index is changed):
> {noformat}
> drop keyspace if exists kmv;
> create keyspace if not exists kmv WITH REPLICATION = { 'class' : 
> 'SimpleStrategy', 'replication_factor':'1'} ;
> use kmv;
> CREATE TABLE if not exists kmv (id int primary key, c1 text, c2 text);
> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
>  'mode': 'CONTAINS',
>  'analyzer_class': 
> 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
>  'analyzed': 'true'
> };
> insert into kmv (id, c1, c2) values (1, 'f21', 'qwe') ;
> insert into kmv (id, c1, c2) values (2, 'f22', 'qweasd') ;
> insert into kmv (id, c1, c2) values (3, 'f23', 'qwea1') ;
> insert into kmv (id, c1, c2) values (4, 'f24', '1qwe') ;
> insert into kmv (id, c1, c2) values (5, 'f25', 'asdqwe') ;
> select c2 from kmv.kmv where c2 like '%w%a%';
> {noformat}
> Expected result: qweasd, qwea1.
> Actual result: asdqwe, qweasd, qwea1.
> Experiment 3 (NOTE: primary key is compound now and inserted data was 
> changed):
> {noformat}
> drop keyspace if exists kmv;
> create keyspace if not exists kmv WITH REPLICATION = { 'class' : 
> 'SimpleStrategy', 'replication_factor':'1'} ;
> use kmv;
> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY KEY(id, 
> c1));
> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
>  'mode': 'CONTAINS',
>  'analyzer_class': 
> 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
>  'analyzed': 'true'
> };
> insert into kmv (id, c1, c2) values (1, 'f21', 'qwe') ;
> insert into kmv (id, c1, c2) values (1, 'f22', 'qweasd') ;
> insert into kmv (id, c1, c2) values (1, 'f23', 'qwea1') ;
> insert into kmv (id, c1, c2) values (1, 'f24', '1qwe') ;
> insert into kmv (id, c1, c2) values (1, 'f25', 'asdqwe') ;
> select c2 from kmv.kmv where c2 like '%w%a%';
> {noformat}
> Expected result: qweasd, qwea1.
> Actual result: qwe, qweasd, qwea1, 1qwe, asdqwe.
> Experiment 4 (NOTE: search criteria is changed):
> {noformat}
> drop keyspace if exists kmv;
> create keyspace if not exists kmv WITH REPLICATION = { 'class' : 
> 'SimpleStrategy', 'replication_factor':'1'} ;
> use kmv;
> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY KEY(id, 
> c1));
> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
>  'mode': 'CONTAINS',
>  'analyzer_class': 
> 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
>  'analyzed': 'true'
> };
> insert into kmv (id, c1, c2) values (1, 'f21', 'qwe') ;
> insert into kmv (id, c1, c2) values (1, 'f22', 'qweasd') ;
> insert into kmv (id, c1, c2) values (1, 'f23', 'qwea1') ;
> insert into kmv (id, c1, c2) values (1, 'f24', '1qwe') ;
> insert into kmv (id, c1, c2) values (1, 'f25', 'asdqwe') ;
> select c2 from kmv.kmv where c2 like '%w22%a%';
> {noformat}
> Expected result: no rows.
> Actual result: qweasd, qwea1, asdqwe.
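
The "Expected result" lines in the experiments above follow the usual SQL reading of LIKE, where the fragments must appear in the order given. A minimal Python sketch of that reading, assuming '%' is the only wildcard and ignoring everything SASI-specific (the helper names `like_to_regex` and `like_filter` are made up for illustration):

```python
import re

# The c2 values inserted in every experiment of the issue description.
C2_VALUES = ["qwe", "qweasd", "qwea1", "1qwe", "asdqwe"]


def like_to_regex(pattern):
    """Translate a LIKE pattern into an anchored regex; only '%' is a wildcard."""
    literal_parts = [re.escape(part) for part in pattern.split("%")]
    return re.compile("^" + ".*".join(literal_parts) + "$", re.DOTALL)


def like_filter(values, pattern):
    """Return the values matching the pattern, fragments in the given order."""
    regex = like_to_regex(pattern)
    return [value for value in values if regex.match(value)]


print(like_filter(C2_VALUES, "%w%a%"))    # ['qweasd', 'qwea1']
print(like_filter(C2_VALUES, "%w22%a%"))  # []
```

Both outputs agree with the expected results stated for experiments 1 to 3 and for experiment 4 respectively.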



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12573) SASI index. Incorrect results for '%foo%bar%'-like search pattern.

2016-09-20 Thread DOAN DuyHai (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506923#comment-15506923
 ] 

DOAN DuyHai commented on CASSANDRA-12573:
-

[~mkrupits]

I've reproduced the issue and found the root cause. For the sake of clarity, I 
have created another JIRA [CASSANDRA-12674] explaining the issue in detail. 
Can you please close this one?



[jira] [Commented] (CASSANDRA-12573) SASI index. Incorrect results for '%foo%bar%'-like search pattern.

2016-09-18 Thread DOAN DuyHai (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15501472#comment-15501472
 ] 

DOAN DuyHai commented on CASSANDRA-12573:
-

OK, my bad. The root of the operation tree for the QueryPlanner is an 
{{AND}}:

https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/QueryPlan.java#L54-L60

The {{'%RevisionDiff%ItemImpl%'}} pattern is split into 2 distinct predicates, 
{{CONTAINS RevisionDiff}} and {{CONTAINS ItemImpl}}, and the **AND** logic does 
apply.
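
A minimal sketch of that decomposition, written in Python rather than taken from the SASI Java code. It assumes the pattern is simply split on '%' and that every fragment becomes an independent CONTAINS check joined by AND; the function names are hypothetical and analyzer tokenization is ignored.

```python
# The c2 values inserted in the experiments of the issue description.
C2_VALUES = ["qwe", "qweasd", "qwea1", "1qwe", "asdqwe"]


def split_like(pattern):
    """Split a LIKE pattern on '%' into its non-empty literal fragments."""
    return [part for part in pattern.split("%") if part]


def and_of_contains(values, pattern):
    """Keep values that contain every fragment, regardless of fragment order."""
    fragments = split_like(pattern)
    return [v for v in values if all(fragment in v for fragment in fragments)]


print(split_like("%RevisionDiff%ItemImpl%"))  # ['RevisionDiff', 'ItemImpl']
print(and_of_contains(C2_VALUES, "%w%a%"))    # ['qweasd', 'qwea1', 'asdqwe']
```

Under these assumptions `asdqwe` is returned for `'%w%a%'` even though its `a` comes before its `w`, because an AND of CONTAINS checks carries no ordering. That is the same set of rows reported as the actual result of experiment 2.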

 The comment in the source code is pretty misleading.

Back to the original experiments: exp. 1 is consistent, and the exp. 2 and 4 
results are also consistent.

Only the experiment 3 results are wrong:

```sql
insert into kmv (id, c1, c2) values (1, 'f21', 'qwe') ;
insert into kmv (id, c1, c2) values (1, 'f22', 'qweasd') ;
insert into kmv (id, c1, c2) values (1, 'f23', 'qwea1') ;
insert into kmv (id, c1, c2) values (1, 'f24', '1qwe') ;
insert into kmv (id, c1, c2) values (1, 'f25', 'asdqwe') ;

select c2 from kmv.kmv where c2 like '%w%a%';

```

Expected result: qweasd, qwea1.
Actual result: qwe, qweasd, qwea1, 1qwe, asdqwe.

Let me reproduce it.


[jira] [Commented] (CASSANDRA-12573) SASI index. Incorrect results for '%foo%bar%'-like search pattern.

2016-09-18 Thread DOAN DuyHai (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15501454#comment-15501454
 ] 

DOAN DuyHai commented on CASSANDRA-12573:
-

Let me reproduce your results with a unit test.



[jira] [Commented] (CASSANDRA-12573) SASI index. Incorrect results for '%foo%bar%'-like search pattern.

2016-09-18 Thread Maxim Podkolzine (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15501443#comment-15501443
 ] 

Maxim Podkolzine commented on CASSANDRA-12573:
--

There are 7 rows that contain "RevisionDiff" and 2 rows that contain 
"ItemImpl". There are 9 rows that contain "RevisionDiff" OR "ItemImpl".
Here they are (only the name):
- RevisionDiffType.java: it contains "RevisionDiff", hence it contains 
"RevisionDiff" OR "ItemImpl"
- RevisionDiffItem.java: it contains "RevisionDiff", hence it contains 
"RevisionDiff" OR "ItemImpl"
- RevisionDiffItemDTO.java: it contains "RevisionDiff", hence it contains 
"RevisionDiff" OR "ItemImpl"
- GetRevisionDiff.java: it contains "RevisionDiff", hence it contains 
"RevisionDiff" OR "ItemImpl"
- RevisionDiffItemDTO.java (twice): it contains "RevisionDiff", hence it 
contains "RevisionDiff" OR "ItemImpl"
- RevisionDiffItemImpl.java: it contains "RevisionDiff", hence it contains 
"RevisionDiff" OR "ItemImpl"
- FastTreeItemImpl.java: it contains "ItemImpl", hence it contains 
"RevisionDiff" OR "ItemImpl"
- RevisionDiffItemImpl.java: it contains "ItemImpl", hence it contains 
"RevisionDiff" OR "ItemImpl"

Of these 9 rows there is one row that contains both "RevisionDiff" AND 
"ItemImpl": "RevisionDiffItemImpl.java".
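
The OR/AND distinction above can be checked with plain set operations. The sketch below is only an illustration in Python; the ids are copied from the cqlsh output posted in this thread, and the variable names are made up.

```python
# Row ids returned by: name like '%RevisionDiff%'  (7 rows)
revision_diff_ids = {
    "93dce11a-cfdd-4c16-b3b3-7537c7af03ec",
    "6586058f-bd57-4fc7-ae12-e6d8ddcd2ceb",
    "d16dff53-002b-4fe6-9a10-bb32425360e0",
    "bb20981e-714f-4eac-802f-6191dba5a301",
    "1c53574b-2eea-46f8-bcbc-5e295ef9c70a",
    "7366f852-d63c-4d07-86b3-18a3bf47e79b",
    "7f18accb-9832-4303-8227-43aa89534cde",
}

# Row ids returned by: name like '%ItemImpl%'  (2 rows)
item_impl_ids = {
    "603c1d12-4871-4244-896a-54ddb76dbd3b",
    "7f18accb-9832-4303-8227-43aa89534cde",
}

either = revision_diff_ids | item_impl_ids  # OR semantics: set union
both = revision_diff_ids & item_impl_ids    # AND semantics: set intersection

print(len(either))  # 8
print(both)         # {'7f18accb-9832-4303-8227-43aa89534cde'}
```

Counted by id rather than by adding the two result-set sizes, the OR set has 8 distinct rows, because the RevisionDiffItemImpl.java row appears in both result sets. The AND set is the single row that the combined query returned.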


[jira] [Commented] (CASSANDRA-12573) SASI index. Incorrect results for '%foo%bar%'-like search pattern.

2016-09-18 Thread DOAN DuyHai (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15501413#comment-15501413
 ] 

DOAN DuyHai commented on CASSANDRA-12573:
-

bq. That's good news. When do you plan to merge it?
See this JIRA:  [CASSANDRA-10765] (second comment)

bq. As a customer I have a slightly different view on this. My expectations are 
based on prior experience and common sense.

What are you talking about? Customer of what? Apache Cassandra is 
open-source.

bq. My current impression is that this feature is half-baked and not well 
tested. But it's just my opinion.

Well, those are the risks of open source software: you don't have any strong 
guarantees, SLA or whatsoever. But you can contribute to improve SASI. Any pull 
request is welcome, of course. The community will be more than happy to have 
contributors.

bq. After that I run the queries with '%' inside. As you can see multi-patterns 
are handled by AND:

Absolutely not. Your examples just show how the index mode {{CONTAINS}} works. 

The first query, {{name like '%RevisionDiff%';}}, means: give me all names 
containing the {{RevisionDiff}} substring.

The 2nd query, {{name like '%ItemImpl%';}}, means: give me all names containing 
the {{ItemImpl}} substring.

The 3rd query, {{name like '%RevisionDiff%ItemImpl%';}}, means: give me all names 
containing the {{RevisionDiff}} substring OR the {{ItemImpl}} substring.

Nowhere do I see the *AND* semantics in your example.






[jira] [Commented] (CASSANDRA-12573) SASI index. Incorrect results for '%foo%bar%'-like search pattern.

2016-09-18 Thread Maxim Podkolzine (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15501293#comment-15501293
 ] 

Maxim Podkolzine commented on CASSANDRA-12573:
--

> SASI initially support multiple predicates, something like : WHERE ((col1=xxx 
> OR col2=yyy) AND (col3 LIKE '%zzz')) but it is not merged yet into the 3.x 
> trunk
That's good news. When do you plan to merge it?

> Wrong, a bug is something that does not work as expected e.g that does not 
> work as documented.
As a customer I have a slightly different view on this. My expectations are 
based on prior experience and common sense.
I understand when certain features that are usual in other products are not 
implemented by design. This is obviously not the case.
My current impression is that this feature is half-baked and not well tested. 
But it's just my opinion.

I think I have a stronger argument that this is a bug. I have created a DB and 
filled it with some data from my disk:
```
CREATE KEYSPACE Excelsior WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };
use excelsior;
create table demo (id text primary key, name text, content text);
CREATE CUSTOM INDEX name_index ON demo (name) USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
 'mode': 'CONTAINS',
 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
 'analyzed': 'true'
};
```

After that I run the queries with '%' inside. As you can see multi-patterns are 
handled by AND:
```
cqlsh:excelsior> select id, name from demo where name like '%RevisionDiff%';

 id                                   | name
--------------------------------------+---------------------------
 93dce11a-cfdd-4c16-b3b3-7537c7af03ec | RevisionDiffType.java
 6586058f-bd57-4fc7-ae12-e6d8ddcd2ceb | RevisionDiffItem.java
 d16dff53-002b-4fe6-9a10-bb32425360e0 | RevisionDiffItemDTO.java
 bb20981e-714f-4eac-802f-6191dba5a301 | GetRevisionDiff.java
 1c53574b-2eea-46f8-bcbc-5e295ef9c70a | RevisionDiffItemDTO.java
 7366f852-d63c-4d07-86b3-18a3bf47e79b | RevisionDiffItemDTO.java
 7f18accb-9832-4303-8227-43aa89534cde | RevisionDiffItemImpl.java

(7 rows)
cqlsh:excelsior> select id, name from demo where name like '%ItemImpl%';

 id                                   | name
--------------------------------------+---------------------------
 603c1d12-4871-4244-896a-54ddb76dbd3b | FastTreeItemImpl.java
 7f18accb-9832-4303-8227-43aa89534cde | RevisionDiffItemImpl.java

(2 rows)
cqlsh:excelsior> select id, name from demo where name like '%RevisionDiff%ItemImpl%';

 id                                   | name
--------------------------------------+---------------------------
 7f18accb-9832-4303-8227-43aa89534cde | RevisionDiffItemImpl.java

(1 rows)
```


[jira] [Commented] (CASSANDRA-12573) SASI index. Incorrect results for '%foo%bar%'-like search pattern.

2016-09-16 Thread DOAN DuyHai (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15496927#comment-15496927
 ] 

DOAN DuyHai commented on CASSANDRA-12573:
-

"Since CQL doesn't allow multiple LIKE constraints in one query, it does look 
like a bug."

--> Wrong. A bug is something that does not work as expected, i.e. that does 
not work as documented.

For SASI, nowhere do we say that % in the middle of the search term works like 
a wildcard... so it's not a bug.

What you are asking for is an enhancement.

"Can you suggest a workaround to search for several patterns?"

--> Use StandardAnalyzer to split the sentence into tokens and then query with 
LIKE 'pattern1 pattern2'. However, you'll get OR semantics, not AND.

SASI initially supported multiple predicates, something like: WHERE ((col1=xxx 
OR col2=yyy) AND (col3 LIKE '%zzz')), but that is not merged into the 3.x trunk 
yet.



[jira] [Commented] (CASSANDRA-12573) SASI index. Incorrect results for '%foo%bar%'-like search pattern.

2016-09-16 Thread Maxim Podkolzine (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15496711#comment-15496711
 ] 

Maxim Podkolzine commented on CASSANDRA-12573:
--

[~doanduyhai] Since CQL doesn't allow multiple LIKE constraints in one query, 
it does look like a bug.
Can you suggest a workaround to search for several patterns?

> SASI index. Incorrect results for '%foo%bar%'-like search pattern. 
> ---
>
> Key: CASSANDRA-12573
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12573
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Mikhail Krupitskiy
>Priority: Critical
>  Labels: sasi
>
> We use Cassandra 3.7 and have faced a strange behaviour of SELECT requests 
> with "LIKE '%foo%bar%'" constraints on a column with SASI index.
> Below are few experiments that show this behaviour.
> Experiment 1:
> {noformat}
> drop keyspace if exists kmv;
> create keyspace if not exists kmv WITH REPLICATION = { 'class' : 
> 'SimpleStrategy', 'replication_factor':'1'} ;
> use kmv;
> CREATE TABLE if not exists kmv (id int primary key, c1 text, c2 text);
> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
>  'mode': 'CONTAINS'
> };
> insert into kmv (id, c1, c2) values (1, 'f21', 'qwe') ;
> insert into kmv (id, c1, c2) values (2, 'f22', 'qweasd') ;
> insert into kmv (id, c1, c2) values (3, 'f23', 'qwea1') ;
> insert into kmv (id, c1, c2) values (4, 'f24', '1qwe') ;
> insert into kmv (id, c1, c2) values (5, 'f25', 'asdqwe') ;
> select c2 from kmv.kmv where c2 like '%w%a%';
> {noformat}
> Expected result: qweasd, qwea1.
> Actual result: no rows.
> Experiment 2 (NOTE: definition of index is changed):
> {noformat}
> drop keyspace if exists kmv;
> create keyspace if not exists kmv WITH REPLICATION = { 'class' : 
> 'SimpleStrategy', 'replication_factor':'1'} ;
> use kmv;
> CREATE TABLE if not exists kmv (id int primary key, c1 text, c2 text);
> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
>  'mode': 'CONTAINS',
>  'analyzer_class': 
> 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
>  'analyzed': 'true'
> };
> insert into kmv (id, c1, c2) values (1, 'f21', 'qwe') ;
> insert into kmv (id, c1, c2) values (2, 'f22', 'qweasd') ;
> insert into kmv (id, c1, c2) values (3, 'f23', 'qwea1') ;
> insert into kmv (id, c1, c2) values (4, 'f24', '1qwe') ;
> insert into kmv (id, c1, c2) values (5, 'f25', 'asdqwe') ;
> select c2 from kmv.kmv where c2 like '%w%a%';
> {noformat}
> Expected result: qweasd, qwea1.
> Actual result: asdqwe, qweasd, qwea1.
> Experiment 3 (NOTE: primary key is compound now and inserted data was 
> changed):
> {noformat}
> drop keyspace if exists kmv;
> create keyspace if not exists kmv WITH REPLICATION = { 'class' : 
> 'SimpleStrategy', 'replication_factor':'1'} ;
> use kmv;
> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY KEY(id, 
> c1));
> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
>  'mode': 'CONTAINS',
>  'analyzer_class': 
> 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
>  'analyzed': 'true'
> };
> insert into kmv (id, c1, c2) values (1, 'f21', 'qwe') ;
> insert into kmv (id, c1, c2) values (1, 'f22', 'qweasd') ;
> insert into kmv (id, c1, c2) values (1, 'f23', 'qwea1') ;
> insert into kmv (id, c1, c2) values (1, 'f24', '1qwe') ;
> insert into kmv (id, c1, c2) values (1, 'f25', 'asdqwe') ;
> select c2 from kmv.kmv where c2 like '%w%a%';
> {noformat}
> Expected result: qweasd, qwea1.
> Actual result: qwe, qweasd, qwea1, 1qwe, asdqwe.
> Experiment 4 (NOTE: search criteria is changed):
> {noformat}
> drop keyspace if exists kmv;
> create keyspace if not exists kmv WITH REPLICATION = { 'class' : 
> 'SimpleStrategy', 'replication_factor':'1'} ;
> use kmv;
> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY KEY(id, 
> c1));
> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
>  'mode': 'CONTAINS',
>  'analyzer_class': 
> 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
>  'analyzed': 'true'
> };
> insert into kmv (id, c1, c2) values (1, 'f21', 'qwe') ;
> insert into kmv (id, c1, c2) values (1, 'f22', 'qweasd') ;
> insert into kmv (id, c1, c2) values (1, 'f23', 'qwea1') ;
> insert into kmv (id, c1, c2) values (1, 'f24', '1qwe') ;
> insert into kmv (id, c1, c2) values (1, 'f25', 'asdqwe') ;
> select c2 from kmv.kmv where c2 like '%w22%a%';
> {noformat}
> Expected result: no rows.
> Actual result: qweasd, qwea1, asdqwe.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12573) SASI index. Incorrect results for '%foo%bar%'-like search pattern.

2016-09-16 Thread DOAN DuyHai (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15496060#comment-15496060
 ] 

DOAN DuyHai commented on CASSANDRA-12573:
-

"To let users to query strings by more flexible patterns. E.g. to find 
'123foo456bar789' by the '%foo%bar%' pattern." 

--> Feasible, but I'm not sure how complex it would be, or whether it would be 
as optimized as the current implementation.

Anyway, please create a new JIRA for this feature request and close this one, 
since it's not a bug.


[jira] [Commented] (CASSANDRA-12573) SASI index. Incorrect results for '%foo%bar%'-like search pattern.

2016-09-16 Thread Mikhail Krupitskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15496031#comment-15496031
 ] 

Mikhail Krupitskiy commented on CASSANDRA-12573:


I agree that it could be applicable to NonTokenizingAnalyzer only.
--> why should we change it ?
To let users query strings by more flexible patterns, e.g. to find 
'123foo456bar789' by the '%foo%bar%' pattern.
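
To make the requested behaviour concrete, here is a minimal sketch of SQL-style matching in which % acts as a wildcard at any position. It is written in Python purely for illustration: the helper names `like_to_regex` and `sql_like` are hypothetical, the `_` wildcard is ignored, and none of this reflects how SASI is or would be implemented.

```python
import re


def like_to_regex(pattern):
    """Translate a LIKE pattern into a regex where every % matches any run of characters."""
    literal_parts = pattern.split("%")
    # Escape the literal pieces so that only % carries wildcard meaning.
    return re.compile(".*".join(re.escape(part) for part in literal_parts), re.DOTALL)


def sql_like(pattern, value):
    """Return True when the whole value matches the pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


# The five c2 values used in the experiments of the issue description.
values = ["qwe", "qweasd", "qwea1", "1qwe", "asdqwe"]

print(sql_like("%foo%bar%", "123foo456bar789"))
print([v for v in values if sql_like("%w%a%", v)])
```

Under these semantics '%w%a%' selects qweasd and qwea1 from the sample values, which is the "Expected result" stated in the issue description.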



[jira] [Commented] (CASSANDRA-12573) SASI index. Incorrect results for '%foo%bar%'-like search pattern.

2016-09-16 Thread DOAN DuyHai (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15496023#comment-15496023
 ] 

DOAN DuyHai commented on CASSANDRA-12573:
-

"Am I right that '%' interprets as a wildcard only when it is a first or last 
character and it is an expected behaviour?"
  --> yes it is

"If so then are there any plans to change it (interpret it as a wildcard in the 
middle of a string too)?" 

  --> why should we change it ? And if we change it, it will only make sense 
for NonTokenizingAnalyzer, because the StandardAnalyzer will always consider 
{{%}} as a line separator
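
The rule confirmed in the first answer above (only a leading or trailing % is a wildcard) can be sketched as follows. This is an illustrative Python model, not Cassandra code: the mode labels and the function names `classify_like` and `literal_match` are made up for the example, and the inner % is treated as an ordinary character, as it would be without a tokenizing analyzer.

```python
def classify_like(pattern):
    """Split a LIKE pattern into (mode, term), looking only at its first and last character."""
    leading = pattern.startswith("%")
    term = pattern[1:] if leading else pattern
    trailing = term.endswith("%")
    if trailing:
        term = term[:-1]
    if leading and trailing:
        mode = "CONTAINS"
    elif leading:
        mode = "SUFFIX"
    elif trailing:
        mode = "PREFIX"
    else:
        mode = "EXACT"
    return mode, term


def literal_match(mode, term, value):
    """Match the term literally; any % left inside the term is just a character."""
    if mode == "CONTAINS":
        return term in value
    if mode == "PREFIX":
        return value.startswith(term)
    if mode == "SUFFIX":
        return value.endswith(term)
    return value == term


# The five c2 values used in the experiments of the issue description.
values = ["qwe", "qweasd", "qwea1", "1qwe", "asdqwe"]

mode, term = classify_like("%w%a%")
print(mode, term)
print([v for v in values if literal_match(mode, term, v)])
```

With this reading '%w%a%' becomes a CONTAINS search for the literal string w%a, which none of the sample values contain; that is in line with the "no rows" result reported for Experiment 1.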


[jira] [Commented] (CASSANDRA-12573) SASI index. Incorrect results for '%foo%bar%'-like search pattern.

2016-09-16 Thread Mikhail Krupitskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15496013#comment-15496013
 ] 

Mikhail Krupitskiy commented on CASSANDRA-12573:


OK, thanks for the clarification... I need some time to digest it.
Am I right that '%' is interpreted as a wildcard only when it is the first or 
last character, and that this is the expected behaviour?
If so, are there any plans to change it (to interpret it as a wildcard in the 
middle of a string too)? 


[jira] [Commented] (CASSANDRA-12573) SASI index. Incorrect results for '%foo%bar%'-like search pattern.

2016-09-15 Thread DOAN DuyHai (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15494341#comment-15494341
 ] 

DOAN DuyHai commented on CASSANDRA-12573:
-

OK, I got to the bottom of the issue with %w%a%.

The CQL parser first interprets this as a LIKE CONTAINS query with the 
searched term = w%a

And then things get complicated:

1) if you're using NonTokenizingAnalyzer or NoOpAnalyzer, everything is fine: 
the % in 'w%a' is interpreted as a simple literal and not as a wildcard character

2) if you're using StandardAnalyzer, it's an entirely different story. During 
the parsing of the search predicates by the query planer, the term 'w%a' is 
passed to the analyzer (StandardAnalyzer here):  
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/Operation.java#L303-L323

The StandardAnalyzer tokenizes the search term, so 'w%a' becomes 2 distinct
tokens, 'w' OR 'a'. Why does it ignore the % ? Because according to the Unicode
line breaking rules, % is a separator, read here:
http://www.unicode.org/Public/UNIDATA/LineBreak.txt

You cannot see this anywhere in the source code; in fact you'll need to look
into the JFlex grammar file
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/analyzer/StandardTokenizerImpl.jflex
 to see a reference to Unicode word breaking rules ...

So indeed when using StandardAnalyzer, any % character will be interpreted as a
separator, so our LIKE '%w%a%' is in fact transformed into a LIKE '%w%' OR LIKE
'%a%', i.e. all words containing 'w' OR 'a', irrespective of their relative
position to each other ...
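
The transformation described above can be modelled in a few lines of Python.
This is only a sketch under stated assumptions: tokenize() is a hypothetical
stand-in that splits on any non-alphanumeric character (the real
StandardAnalyzer is driven by a JFlex grammar), and matches_any() uses plain
substring tests instead of a SASI index lookup.

```python
import re

def tokenize(term):
    """Hypothetical stand-in for StandardAnalyzer: split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9A-Za-z]+", term) if t]

def matches_any(value, tokens):
    """OR semantics: the value matches if it contains at least one token."""
    return any(tok in value for tok in tokens)

rows = ["qwe", "qweasd", "qwea1", "1qwe", "asdqwe"]

# 'w%a' is what is left of '%w%a%' once the outer % are taken as wildcards.
tokens = tokenize("w%a")
hits = [r for r in rows if matches_any(r, tokens)]
print(tokens, hits)

# Same model applied to the search term of experiment 4.
hits_w22 = [r for r in rows if matches_any(r, tokenize("w22%a"))]
print(hits_w22)
```

In this simplified model the term 'w22%a' selects exactly the rows containing
'a', which is the same set of rows reported as the actual result of
experiment 4.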

Why is it an OR predicate and not an AND predicate ? The answer is a comment in 
the source code here: 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/Operation.java#L290-L295

Experiment 1 returns 0 rows because it uses NonTokenizingAnalyzer, CORRECT
Experiment 2 returns 3 rows (asdqwe, qweasd, qwea1) because it uses
StandardAnalyzer and all the words contain 'w' OR 'a', CORRECT

Same remark for experiments 3 & 4.

So it is not really a bug; it happens because you're using the StandardAnalyzer
with tokenization ...



[jira] [Commented] (CASSANDRA-12573) SASI index. Incorrect results for '%foo%bar%'-like search pattern.

2016-09-15 Thread DOAN DuyHai (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15493798#comment-15493798
 ] 

DOAN DuyHai commented on CASSANDRA-12573:
-

Right, the escaping issue does not matter here. What we want to understand is 
how SASI interprets the {{%}} in the middle of the term.

Please note that you're using C* 3.7. I have contributed a bug fix (that was
scheduled for 3.9 and is in trunk) for stop-word skipping being applied after
stemming whereas it should be applied before. I'm not sure if it is relevant to
the current data set here, but it rings a bell when you get weird behaviors
only when using StandardAnalyzer.
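
To illustrate why the order of the two steps matters, here is a toy sketch in
Python. Everything in it is an assumption made for illustration: the stop-word
list, the one-rule toy_stem() function and the sample words are invented and
are not taken from the SASI code.

```python
STOP_WORDS = {"this", "was"}  # illustrative list, not SASI's

def toy_stem(word):
    """Toy stemmer (assumption): drop a single trailing 's'."""
    return word[:-1] if len(word) > 1 and word.endswith("s") else word

def stop_then_stem(words):
    """Filter stop words first, then stem what is left."""
    return [toy_stem(w) for w in words if w not in STOP_WORDS]

def stem_then_stop(words):
    """Stem first, then filter: stemmed stop words no longer match the list."""
    return [s for s in map(toy_stem, words) if s not in STOP_WORDS]

words = ["this", "was", "tests"]
print(stop_then_stem(words))
print(stem_then_stop(words))
```

With the filter applied after stemming, 'this' and 'was' have already been
turned into 'thi' and 'wa', so they slip past the stop-word list and end up as
indexed terms.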


[jira] [Commented] (CASSANDRA-12573) SASI index. Incorrect results for '%foo%bar%'-like search pattern.

2016-09-15 Thread Mikhail Krupitskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15493766#comment-15493766
 ] 

Mikhail Krupitskiy commented on CASSANDRA-12573:


Let's try to clarify things.
As I understand it, there are two different issues:
1) Incorrect processing of escaped '%'.
2) Incorrect processing of %foo%bar% patterns without any escaping.

This issue (12573) is not about escaping, and all requests from the experiments
deliberately avoid any escaping.
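
For reference, the "expected result" lines of the experiments correspond to
SQL-style matching, where every % in the pattern is a wildcard wherever it
appears. A minimal Python sketch of that semantics follows; the helper names
are made up for illustration, the _ wildcard and escaping are deliberately
left out, and this is not how SASI evaluates LIKE.

```python
import re

def like_to_regex(pattern):
    """Translate a LIKE pattern: each % matches any run of characters."""
    parts = [re.escape(p) for p in pattern.split("%")]
    return re.compile(".*".join(parts), re.DOTALL)

def sql_like(value, pattern):
    """True if the whole value matches the pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None

rows = ["qwe", "qweasd", "qwea1", "1qwe", "asdqwe"]
expected_wa = [r for r in rows if sql_like(r, "%w%a%")]
expected_w22a = [r for r in rows if sql_like(r, "%w22%a%")]
print(expected_wa)
print(expected_w22a)
```

Under this reading '%w%a%' selects only the values with an 'a' somewhere after
a 'w', and '%w22%a%' selects nothing from the sample data.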


[jira] [Commented] (CASSANDRA-12573) SASI index. Incorrect results for '%foo%bar%'-like search pattern.

2016-09-15 Thread DOAN DuyHai (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15493749#comment-15493749
 ] 

DOAN DuyHai commented on CASSANDRA-12573:
-

I'm going to try reproducing the issue. But anyway, right now there is indeed
*no escaping* of {{%}}, whether it is the first character, the last character
or in the middle of the term.

I'm attempting to add escaping for the first & last character.


[jira] [Commented] (CASSANDRA-12573) SASI index. Incorrect results for '%foo%bar%'-like search pattern.

2016-09-15 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15493727#comment-15493727
 ] 

Alex Petrov commented on CASSANDRA-12573:
-

What is most likely meant is why the {{%w%a%}} search term matches {{qwe,
qweasd, qwea1, 1qwe, asdqwe}}, although only when an analyzer is used.

From what I've seen in the code, {{%}} is only meaningful as the very first and
very last character of the search term. In the middle it bears the same
semantic meaning as any other character.
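
A rough Python model of that rule, for illustration only: classify_like() is a
hypothetical helper and the mode labels are just names chosen for this sketch,
not identifiers from the Cassandra code base.

```python
def classify_like(pattern):
    """Return (mode, term); only a leading or trailing % acts as a wildcard."""
    starts = pattern.startswith("%")
    ends = len(pattern) > 1 and pattern.endswith("%")
    begin = 1 if starts else 0
    end = len(pattern) - 1 if ends else len(pattern)
    term = pattern[begin:end]  # any % left inside the term is a plain character
    if starts and ends:
        mode = "CONTAINS"
    elif starts:
        mode = "SUFFIX"
    elif ends:
        mode = "PREFIX"
    else:
        mode = "EXACT"
    return mode, term

print(classify_like("%w%a%"))
print(classify_like("foo%"))
```

For '%w%a%' this yields a CONTAINS search for the literal term 'w%a', which no
value in the sample data contains when the term is not tokenized.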


[jira] [Commented] (CASSANDRA-12573) SASI index. Incorrect results for '%foo%bar%'-like search pattern.

2016-09-15 Thread DOAN DuyHai (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15493699#comment-15493699
 ] 

DOAN DuyHai commented on CASSANDRA-12573:
-

Experiments 2, 3, 4 also contain a {{%}} in the middle of the searched term ...




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12573) SASI index. Incorrect results for '%foo%bar%'-like search pattern.

2016-09-15 Thread Mikhail Krupitskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15493690#comment-15493690
 ] 

Mikhail Krupitskiy commented on CASSANDRA-12573:


Yes, but experiments 2, 3 and 4 still return non-empty results.

> SASI index. Incorrect results for '%foo%bar%'-like search pattern. 
> ---
>
> Key: CASSANDRA-12573
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12573
> Project: Cassandra
> Issue Type: Bug
> Reporter: Mikhail Krupitskiy
> Assignee: Alex Petrov
> Priority: Critical
> Labels: sasi
>


[jira] [Commented] (CASSANDRA-12573) SASI index. Incorrect results for '%foo%bar%'-like search pattern.

2016-09-15 Thread DOAN DuyHai (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15493680#comment-15493680
 ] 

DOAN DuyHai commented on CASSANDRA-12573:
-

In your data set, there is no row containing the substring '%w%a%'



[jira] [Commented] (CASSANDRA-12573) SASI index. Incorrect results for '%foo%bar%'-like search pattern.

2016-09-15 Thread Mikhail Krupitskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15493518#comment-15493518
 ] 

Mikhail Krupitskiy commented on CASSANDRA-12573:


As far as I can see, this doesn't explain the results of the experiments in the description.
E.g. a request for '%w%a%' returns several results that contain no '%' at all.



[jira] [Commented] (CASSANDRA-12573) SASI index. Incorrect results for '%foo%bar%'-like search pattern.

2016-09-15 Thread DOAN DuyHai (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15493422#comment-15493422
 ] 

DOAN DuyHai commented on CASSANDRA-12573:
-

Currently SASI only understands {{%}} at the beginning (suffix) or 
ending (prefix) position of a pattern. A {{%}} in the middle of an expression, 
as in {{%w%a%}}, will *not* be interpreted by SASI as a wildcard.

{{%w%a%}} translates into "Give me all results containing {{w%a}}".
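
The interpretation described in this comment can be sketched in a few lines of Python. This is only an illustrative model, not SASI's code: the function names and the ROWS list are hypothetical, and it does not model the StandardAnalyzer used in experiments 2-4. It contrasts the behaviour the reporter expected (every {{%}} is a wildcard) with the behaviour described here (only a leading and a trailing {{%}} are wildcards), using the data set from experiment 1.

```python
import re

# c2 values inserted in experiment 1 of the issue description.
ROWS = ["qwe", "qweasd", "qwea1", "1qwe", "asdqwe"]


def sql_like(pattern: str, value: str) -> bool:
    """SQL-style LIKE: every '%' matches any run of characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def like_as_described(pattern: str, value: str) -> bool:
    """Model of the comment: only one leading and one trailing '%'
    act as wildcards; any '%' in between is a literal character."""
    starts_open = pattern.startswith("%")
    ends_open = len(pattern) > 1 and pattern.endswith("%")
    begin = 1 if starts_open else 0
    end = len(pattern) - 1 if ends_open else len(pattern)
    literal = pattern[begin:end]
    if starts_open and ends_open:
        return literal in value            # CONTAINS
    if starts_open:
        return value.endswith(literal)     # SUFFIX
    if ends_open:
        return value.startswith(literal)   # PREFIX
    return value == literal                # exact match


expected = [v for v in ROWS if sql_like("%w%a%", v)]
described = [v for v in ROWS if like_as_described("%w%a%", v)]
print(expected)   # what the reporter expected
print(described)  # what the literal interpretation yields
```

Under this model, '%w%a%' only matches values that contain the literal substring 'w%a', which is consistent with the empty result of experiment 1 but does not account for the results of experiments 2-4.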


[jira] [Commented] (CASSANDRA-12573) SASI index. Incorrect results for '%foo%bar%'-like search pattern.

2016-09-08 Thread Maxim Podkolzine (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15473404#comment-15473404
 ] 

Maxim Podkolzine commented on CASSANDRA-12573:
--

Can anyone please take a look at this?

> SASI index. Incorrect results for '%foo%bar%'-like search pattern. 
> ---
>
> Key: CASSANDRA-12573
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12573
> Project: Cassandra
> Issue Type: Bug
> Reporter: Mikhail Krupitskiy
> Assignee: Arunkumar M
> Priority: Critical
>