[ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15113918#comment-15113918 ]
Jordan West edited comment on CASSANDRA-10661 at 1/23/16 7:42 PM:
------------------------------------------------------------------

bq. Is there also a way to query a SASI-indexed column by exact value? I mean, it seems as if by enabling prefix or contains, that it will always query by prefix or contains. For example, if I want to query for full first name, like where their full first name really is "J" and not get "John" and "James" as well, while at other times I am indeed looking for names starting with a prefix of "Jo" for "John", "Joseph", etc.

The example is correct, but this is not a limitation of SASI; it's a limitation of CQL. We decided not to extend the grammar any further, since we have already had to push some of our grammar changes back to later phases (removing OR, grouping, and != support for now). Ideally, `=` would mean exact match and CQL would support a `LIKE` operator similar to SQL's, and depending on whether the index was created with `PREFIX` or `CONTAINS` we would allow or disallow forms such as `%Jo%` or `_j%`.

bq. Will SPARSE mode in fact give me an exact match? (Sounds like it.) In which case, would I be better off with a SPARSE index for first_name_full, or would a traditional Cassandra non-custom index work fine (or even better.)

It does, but so do all queries on numerical data, which, thinking about it, may make the `PREFIX` option confusing for numeric types. `SPARSE` is intended to improve query performance on numerical data where there are a large number of terms (e.g. timestamps) but a small number of keys per term (e.g. some timeseries data). `SPARSE` should not be used on every numerical column, and it is not an ideal setting for most non-numerical data either. For example, in a large data set of first names, the number of distinct names will be small compared to the number of keys. Given that distribution, using `SPARSE` will increase the size of the index; at best it will have no effect on query performance, and it may hurt it.
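To make the terms-versus-keys distinction concrete, here is a minimal Python sketch that counts distinct terms and average keys per term for two made-up columns. It is not SASI code: the helper names (`keys_per_term`, `summarize`) and the sample data are assumptions chosen only to illustrate the two distributions described above.

```python
from collections import defaultdict
from statistics import mean


def keys_per_term(rows):
    """Group row keys by indexed term (hypothetical helper, not a SASI API)."""
    index = defaultdict(set)
    for key, term in rows:
        index[term].add(key)
    return index


def summarize(rows):
    """Return the number of distinct terms and the average keys per term."""
    index = keys_per_term(rows)
    return {
        "terms": len(index),
        "avg_keys_per_term": mean(len(keys) for keys in index.values()),
    }


# Made-up time series: every row has its own timestamp, so there are
# many terms and very few keys per term.
timeseries = [(f"event-{i}", 1453577000 + i) for i in range(1000)]

# Made-up first names: a handful of terms shared by many row keys.
names = ["John", "James", "Joseph", "J"]
first_names = [(f"user-{i}", names[i % len(names)]) for i in range(1000)]

ts = summarize(timeseries)
fn = summarize(first_names)
print(ts)
print(fn)
```

The timestamp column has as many terms as rows, which is the shape `SPARSE` is meant for; the first-name column has a few terms with hundreds of keys each, which is the shape where the comment above expects `SPARSE` to add index size without helping queries.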
> Integrate SASI to Cassandra
> ---------------------------
>
> Key: CASSANDRA-10661
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10661
> Project: Cassandra
> Issue Type: Improvement
> Components: Local Write-Read Paths
> Reporter: Pavel Yaskevich
> Assignee: Pavel Yaskevich
> Labels: sasi
> Fix For: 3.x
>
>
> We have recently released a new secondary index engine
> (https://github.com/xedin/sasi) built using the SecondaryIndex API. There are
> still a couple of things to work out regarding 3.x, since it currently
> targets the 2.0 release. I want to make this an umbrella issue for all of the
> things related to the integration of SASI, which are also tracked in
> [sasi_issues|https://github.com/xedin/sasi/issues], into the mainline Cassandra
> 3.x release.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)