[ https://issues.apache.org/jira/browse/LUCENE-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Rowe updated LUCENE-2749:
-------------------------------
    Fix Version/s:     (was: 6.0)
                       (was: 4.9)

> Co-occurrence filter
> --------------------
>
>                 Key: LUCENE-2749
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2749
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/analysis
>    Affects Versions: 3.1, 4.0-ALPHA
>            Reporter: Steve Rowe
>            Priority: Minor
>
> The co-occurrence filter to be developed here will output sets of tokens that
> co-occur within a given window onto a token stream.
> These token sets can be ordered either lexically (to allow order-independent
> matching/counting) or positionally (e.g. sliding windows of positionally
> ordered co-occurring terms that include all terms in the window are called
> n-grams or shingles).
> The parameters to this filter will be:
> * window size: this can be a fixed sequence length, sentence/paragraph
> context (these will require sentence/paragraph segmentation, which is not in
> Lucene yet), or over the entire token stream (full field width)
> * minimum number of co-occurring terms: >= 2
> * maximum number of co-occurring terms: <= window size
> * token set ordering (lexical or positional)
> One use case for co-occurring token sets is as candidates for collocations.
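
To make the parameters listed in the issue description concrete, here is a minimal sketch in plain Java of how co-occurring token sets could be enumerated over a fixed-length sliding window. It is not the filter proposed in this issue and it does not use any Lucene API: the class name `CooccurrenceSketch`, the method `cooccurrences`, and all parameter names are hypothetical. It assumes only the "fixed sequence length" window variant, anchors every set on the first token of its window so that each set is emitted once per position, and joins the tokens of a set with a single space.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustrative sketch only; names and output format are assumptions, not Lucene API. */
public class CooccurrenceSketch {

    /**
     * Returns the co-occurring token sets found in fixed-length sliding windows.
     *
     * @param tokens       the token stream, already analyzed
     * @param windowSize   number of consecutive tokens in a window
     * @param minTerms     minimum set size, must be >= 2
     * @param maxTerms     maximum set size, must be <= windowSize
     * @param lexicalOrder true to sort each set lexically, false to keep positional order
     */
    public static List<String> cooccurrences(List<String> tokens, int windowSize,
                                             int minTerms, int maxTerms,
                                             boolean lexicalOrder) {
        if (minTerms < 2 || maxTerms > windowSize || minTerms > maxTerms) {
            throw new IllegalArgumentException(
                "require 2 <= minTerms <= maxTerms <= windowSize");
        }
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            // The window covers positions [start, end); it is shorter near the end.
            int end = Math.min(tokens.size(), start + windowSize);
            List<String> current = new ArrayList<>();
            current.add(tokens.get(start)); // every set contains the window's first token
            extend(tokens, start + 1, end, current, minTerms, maxTerms, lexicalOrder, out);
        }
        return out;
    }

    /** Recursively adds later tokens from the window to the current set. */
    private static void extend(List<String> tokens, int from, int end,
                               List<String> current, int minTerms, int maxTerms,
                               boolean lexicalOrder, List<String> out) {
        if (current.size() >= minTerms) {
            List<String> set = new ArrayList<>(current);
            if (lexicalOrder) {
                Collections.sort(set); // order-independent form for matching/counting
            }
            out.add(String.join(" ", set));
        }
        if (current.size() == maxTerms) {
            return; // set is as large as allowed
        }
        for (int i = from; i < end; i++) {
            current.add(tokens.get(i));
            extend(tokens, i + 1, end, current, minTerms, maxTerms, lexicalOrder, out);
            current.remove(current.size() - 1); // backtrack
        }
    }
}
```

Under these assumptions, the tokens `b a c` with window size 2, minimum 2 and maximum 2 give `b a` and `a c` in positional order, which are the bigram shingles, and `a b` and `a c` in lexical order. Sentence, paragraph, or full-field windows would need a different way of delimiting `start` and `end` and are not covered here.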