[ https://issues.apache.org/jira/browse/LUCENE-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007837#comment-13007837 ]
Elmar Pitschke commented on LUCENE-2749:
----------------------------------------

The first use case that comes to mind is filtering for possible names. One request I always get is the automatic generation of tag clouds that take the search results into account. I think this would be one way to get names without the need to maintain a word list.

Another use, of course, would be to get some kind of semantic combination of words, which could lead to a more "natural" search experience. If a user searches for two words and they occur quite near each other in a text, that may be more useful than many occurrences of the two words with no connection between them.

Which use cases do you have in mind?

> Co-occurrence filter
> --------------------
>
>                 Key: LUCENE-2749
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2749
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Analysis
>    Affects Versions: 3.1, 4.0
>            Reporter: Steven Rowe
>            Priority: Minor
>             Fix For: 4.0
>
>
> The co-occurrence filter to be developed here will output sets of tokens that
> co-occur within a given window onto a token stream.
> These token sets can be ordered either lexically (to allow order-independent
> matching/counting) or positionally (e.g. sliding windows of positionally
> ordered co-occurring terms that include all terms in the window are called
> n-grams or shingles).
> The parameters to this filter will be:
> * window size: this can be a fixed sequence length, sentence/paragraph
> context (these will require sentence/paragraph segmentation, which is not in
> Lucene yet), or over the entire token stream (full field width)
> * minimum number of co-occurring terms: >= 2
> * maximum number of co-occurring terms: <= window size
> * token set ordering (lexical or positional)
> One use case for co-occurring token sets is as candidates for collocations.

--
This message is automatically generated by JIRA.
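
The parameters listed in the quoted issue (window size, minimum and maximum number of co-occurring terms, token set ordering) can be illustrated with a minimal sketch in plain Java. It uses no Lucene classes and is not taken from any patch on LUCENE-2749: the class name CoOccurrenceSketch, the method coOccurrences and its signature are hypothetical. Only the fixed-sequence-length window is covered. One further assumption, which the issue does not specify: each token set is anchored on the first token of its window, so that a sliding window does not emit the same token positions twice.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration only; not a Lucene TokenFilter. */
public class CoOccurrenceSketch {

    /**
     * Returns every set of minTerms..maxTerms tokens whose positions fit
     * inside one window of windowSize consecutive tokens.
     */
    public static List<List<String>> coOccurrences(List<String> tokens, int windowSize,
                                                   int minTerms, int maxTerms,
                                                   boolean lexicalOrder) {
        if (minTerms < 2 || maxTerms < minTerms || maxTerms > windowSize) {
            throw new IllegalArgumentException(
                "need 2 <= minTerms <= maxTerms <= windowSize");
        }
        List<List<String>> result = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            // The window shrinks at the end of the token list.
            int end = Math.min(start + windowSize, tokens.size());
            List<String> current = new ArrayList<>();
            // Assumption: every set contains the first token of its window.
            current.add(tokens.get(start));
            extend(tokens, start + 1, end, current, minTerms, maxTerms,
                   lexicalOrder, result);
        }
        return result;
    }

    /** Grows the current set with later tokens from the same window. */
    private static void extend(List<String> tokens, int from, int end,
                               List<String> current, int minTerms, int maxTerms,
                               boolean lexicalOrder, List<List<String>> result) {
        if (current.size() >= minTerms) {
            List<String> set = new ArrayList<>(current);
            if (lexicalOrder) {
                // Lexical ordering makes matching independent of token order.
                Collections.sort(set);
            }
            result.add(set);
        }
        if (current.size() == maxTerms) {
            return;
        }
        for (int i = from; i < end; i++) {
            current.add(tokens.get(i));
            extend(tokens, i + 1, end, current, minTerms, maxTerms,
                   lexicalOrder, result);
            current.remove(current.size() - 1);
        }
    }
}
```

For the tokens "the quick brown fox" with a window size of 3 and both term limits set to 2, positional ordering yields [the, quick], [the, brown], [quick, brown], [quick, fox], [brown, fox]. Lexical ordering yields the same pairs with each pair sorted, e.g. [quick, the]. Raising the maximum to 3 adds the sets that fill a whole window, [the, quick, brown] and [quick, brown, fox], which correspond to the shingles mentioned in the issue.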