[jira] [Comment Edited] (CASSANDRA-10661) Integrate SASI to Cassandra

Jordan West (JIRA) Sat, 23 Jan 2016 11:44:19 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15113918#comment-15113918
 ]


Jordan West edited comment on CASSANDRA-10661 at 1/23/16 7:42 PM:
------------------------------------------------------------------

bq. Is there also a way to query a SASI-indexed column by exact value? I mean, 
it seems as if by enabling prefix or contains, that it will always query by 
prefix or contains. For example, if I want to query for full first name, like 
where their full first name really is "J" and not get "John" and "James" as 
well, while at other times I am indeed looking for names starting with a prefix 
of "Jo" for "John", "Joseph", etc.

The example is correct, but this is a limitation of CQL, not of SASI. We 
decided not to extend the grammar further, since we have already had to defer 
some of our grammar changes to later phases (removing OR, grouping, and != 
support for now). Ideally, `=` would mean exact match and CQL would support a 
`LIKE` operator similar to SQL, and depending on whether the index was created 
with `PREFIX` or `CONTAINS` we would allow or disallow forms such as `%Jo%` or 
`_j%`. 
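
To make the distinction in the question concrete, here is a minimal Python 
sketch of the three match semantics (exact, prefix, contains) applied to the 
names from the example. It is only an illustration: the `matches` and `search` 
helpers and the mode strings are hypothetical and are not part of SASI or CQL.

```python
# Hypothetical illustration of match semantics; not SASI or CQL code.
def matches(value: str, term: str, mode: str) -> bool:
    """Return True if `value` satisfies `term` under the given match mode."""
    if mode == "exact":
        return value == term
    if mode == "prefix":
        return value.startswith(term)
    if mode == "contains":
        return term in value
    raise ValueError(f"unknown mode: {mode}")


names = ["J", "John", "James", "Joseph", "Mary"]


def search(term: str, mode: str) -> list:
    """Return the names that match `term`, preserving input order."""
    return [name for name in names if matches(name, term, mode)]


exact_j = search("J", "exact")         # only the name that really is "J"
prefix_j = search("J", "prefix")       # every name starting with "J"
prefix_jo = search("Jo", "prefix")     # names starting with "Jo"
contains_os = search("os", "contains") # names with "os" anywhere
```

With prefix semantics, a search for "J" also returns "John", "James" and 
"Joseph", which is the behaviour the question describes.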

bq. Will SPARSE mode in fact give me an exact match? (Sounds like it.) In which 
case, would I be better off with a SPARSE index for first_name_full, or would a 
traditional Cassandra non-custom index work fine (or even better.)

It does, but so do all queries on numerical data, which, thinking about it, 
may make the `PREFIX` option confusing for numeric types. `SPARSE` is intended 
to improve query performance on numerical data where there is a large number 
of terms (e.g. timestamps) but a small number of keys per term (e.g. some 
time series data). `SPARSE` should not be used on every numerical column, and 
it is not an ideal setting for most non-numerical data either. For example, in 
a large data set of first names the number of distinct names will be small 
compared to the number of keys, and given the distribution of first names, 
using `SPARSE` will increase the size of the index and will at best have no 
effect on query performance, and may hurt it.
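
The terms-versus-keys distinction can be checked on sample data before picking 
a mode. The sketch below is a rough illustration in Python, not SASI code: 
`term_stats` is a hypothetical helper, and the two data sets are made up to 
mimic a timestamp column and a first-name column.

```python
from collections import defaultdict


def term_stats(rows):
    """Given (key, term) pairs, return (distinct_terms, avg_keys_per_term)."""
    keys_by_term = defaultdict(set)
    for key, term in rows:
        keys_by_term[term].add(key)
    n_terms = len(keys_by_term)
    total_keys = sum(len(keys) for keys in keys_by_term.values())
    return n_terms, (total_keys / n_terms if n_terms else 0.0)


# Made-up time series data: every row carries its own timestamp.
events = [(f"event-{i}", 1453577000 + i) for i in range(1000)]

# Made-up first-name data: a handful of names shared by many rows.
first_names = ["John", "James", "Joseph", "Mary"]
people = [(f"user-{i}", first_names[i % 4]) for i in range(1000)]

event_stats = term_stats(events)    # many terms, few keys per term
people_stats = term_stats(people)   # few terms, many keys per term
```

The timestamp column has many terms with one key each, which is the shape 
`SPARSE` is aimed at; the first-name column has few terms with many keys each, 
which is the shape where `SPARSE` is not a good fit.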





 


was (Author: jrwest):
bq. Is there also a way to query a SASI-indexed column by exact value? I mean, 
it seems as if by enabling prefix or contains, that it will always query by 
prefix or contains. For example, if I want to query for full first name, like 
where their full first name really is "J" and not get "John" and "James" as 
well, while at other times I am indeed looking for names starting with a prefix 
of "Jo" for "John", "Joseph", etc.

The example is correct, but this is a limitation of CQL, not of SASI. We 
decided not to extend the grammar further, since we have already had to defer 
some of our grammar changes to later phases (removing OR, grouping, and != 
support for now). Ideally, CQL would support a `LIKE` operator similar to SQL, 
and depending on whether the index was created with `PREFIX` or `CONTAINS` we 
would allow or disallow forms such as `%Jo%` or `_j%`. 

bq. Will SPARSE mode in fact give me an exact match? (Sounds like it.) In which 
case, would I be better off with a SPARSE index for first_name_full, or would a 
traditional Cassandra non-custom index work fine (or even better.)

It does, but so do all queries on numerical data, which, thinking about it, 
may make the `PREFIX` option confusing for numeric types. `SPARSE` is intended 
to improve query performance on numerical data where there is a large number 
of terms (e.g. timestamps) but a small number of keys per term (e.g. some 
time series data). `SPARSE` should not be used on every numerical column, and 
it is not an ideal setting for most non-numerical data either. For example, in 
a large data set of first names the number of distinct names will be small 
compared to the number of keys, and given the distribution of first names, 
using `SPARSE` will increase the size of the index and will at best have no 
effect on query performance, and may hurt it.





 

> Integrate SASI to Cassandra
> ---------------------------
>
>                 Key: CASSANDRA-10661
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10661
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local Write-Read Paths
>            Reporter: Pavel Yaskevich
>            Assignee: Pavel Yaskevich
>              Labels: sasi
>             Fix For: 3.x
>
>
> We have recently released a new secondary index engine 
> (https://github.com/xedin/sasi) built using the SecondaryIndex API. There 
> are still a couple of things to work out regarding 3.x, since it currently 
> targets the 2.0 release. I want to make this an umbrella issue for all of 
> the things related to the integration of SASI, which are also tracked in 
> [sasi_issues|https://github.com/xedin/sasi/issues], into the mainline 
> Cassandra 3.x release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
