[ 
https://issues.apache.org/jira/browse/CASSANDRA-14247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380009#comment-16380009
 ] 

Michael Kjellman commented on CASSANDRA-14247:
----------------------------------------------

One other quick thought: we should have a way to escape the delimiter. If 
the change is reworked to split iteratively, it seems easy to check whether 
the previous character was the escape character and, if so, not split there.
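
A minimal sketch of that check, not taken from the patch. The class name 
EscapedSplitter, the escape parameter, and the choice to keep empty tokens 
are all assumptions made for illustration. It walks the input one character 
at a time and only splits on a delimiter that is not preceded by an 
unconsumed escape character:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Cassandra or the attached patch.
final class EscapedSplitter {

    // Splits input on delimiter, unless the delimiter directly follows escape.
    // The escape character itself is dropped from the output; an escaped
    // escape character yields one literal escape character.
    static List<String> split(String input, char delimiter, char escape) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean escaped = false; // true if the previous char was an unconsumed escape

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (escaped) {
                // Previous char was the escape: take this char literally.
                current.append(c);
                escaped = false;
            } else if (c == escape) {
                escaped = true;
            } else if (c == delimiter) {
                // Unescaped delimiter: close the current token.
                tokens.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (escaped) {
            // Trailing escape with nothing after it: keep it literally.
            current.append(escape);
        }
        tokens.add(current.toString());
        return tokens;
    }
}
```

With ',' as the delimiter and a backslash as the escape, the input 
a,b\,c,d splits into the three tokens "a", "b,c" and "d". Whether empty 
tokens should be kept or dropped is a separate decision for the analyzer.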

> SASI tokenizer for simple delimiter based entries
> -------------------------------------------------
>
>                 Key: CASSANDRA-14247
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14247
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: sasi
>            Reporter: mck
>            Assignee: mck
>            Priority: Major
>             Fix For: 4.0, 3.11.x
>
>
> Currently SASI offers only two tokenizer options:
>  - NonTokenizingAnalyzer
>  - StandardAnalyzer
> The latter is built upon Snowball, powerful for human languages but overkill 
> for simple tokenization.
> A simple tokenizer is proposed here. The need arose as a workaround for 
> CASSANDRA-11182, and to avoid the disk usage explosion that comes with 
> resorting to {{CONTAINS}}. See https://github.com/openzipkin/zipkin/issues/1861
> Example use of this would be:
> {code}
> CREATE CUSTOM INDEX span_annotation_query_idx
>     ON zipkin2.span (annotation_query)
>     USING 'org.apache.cassandra.index.sasi.SASIIndex'
>     WITH OPTIONS = {
>         'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.DelimiterAnalyzer',
>         'delimiter': '░',
>         'case_sensitive': 'true',
>         'mode': 'prefix',
>         'analyzed': 'true'};
> {code}
> Original credit for this work goes to https://github.com/zuochangan


