Re: Adding preserveOriginal Capability to EdgeNGramFilterFactory
Sorry for the late reply. Here is what I was talking about:

big bang => b, bi, big (skipping the words other than the first one)
big bang => b, ba, ban, bang (skipping the first word)
big bang => a, an, ang (skipping the first word + skipping the first letter in subsequent words)

The reason for this is to apply custom boosting to matches based on where the search term matches (start, middle, end, etc.).

Maybe we should use a regex before EdgeNGramFilterFactory? But I was thinking of having EdgeNGramFilterFactory take a parameter to skip "n" characters from the start or end before generating the grams. Your thoughts?

Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa

On Wed, Nov 13, 2013 at 6:38 AM, Furkan KAMACI wrote:

> If we talk about your situation. What is a word for you? Strings that are
> delimited by whitespaces, underscores ... etc?
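The skip-then-gram idea in the three "big bang" examples can be sketched in plain Java. This is a hypothetical illustration, not Solr code: `skipWords` and `skipChars` are made-up parameter names (EdgeNGramFilterFactory has no such options), and a "word" is assumed to be a whitespace-delimited token. The regex route mentioned in the message would do the same per-token trimming in the analysis chain instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed behaviour. "skipWords" and "skipChars"
// are illustrative parameters only; they do not exist in EdgeNGramFilterFactory.
final class SkipThenGramDemo {

    // Drops the first skipChars characters of the token, then emits front-edge
    // grams whose lengths fall in [minGram, maxGram].
    static List<String> gramsAfterSkip(String token, int skipChars, int minGram, int maxGram) {
        String rest = token.substring(Math.min(Math.max(0, skipChars), token.length()));
        List<String> grams = new ArrayList<>();
        for (int len = minGram; len <= Math.min(maxGram, rest.length()); len++) {
            grams.add(rest.substring(0, len));
        }
        return grams;
    }

    // Skips the first skipWords whitespace-separated words, then applies
    // gramsAfterSkip to each remaining word in order.
    static List<String> gramsForText(String text, int skipWords, int skipChars, int minGram, int maxGram) {
        String[] words = text.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = Math.max(0, skipWords); i < words.length; i++) {
            out.addAll(gramsAfterSkip(words[i], skipChars, minGram, maxGram));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(gramsAfterSkip("big", 0, 1, 10));       // [b, bi, big]
        System.out.println(gramsForText("big bang", 1, 0, 1, 10)); // [b, ba, ban, bang]
        System.out.println(gramsForText("big bang", 1, 1, 1, 10)); // [a, an, ang]
    }
}
```

Keeping the word skip and the character skip as separate steps mirrors the three examples: the first case is simply the first word on its own, the second skips one word, and the third also skips one character per remaining word.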
Re: Adding preserveOriginal Capability to EdgeNGramFilterFactory
EdgeNGramFilterFactory creates n-grams from the beginning edge of an input token by default. You can change its side, and you can define the minimum and maximum gram size. Here is what EdgeNGramFilterFactory *can* do with a minimum gram size of 2 and a maximum gram size of 4:

apache => ap, apa, apac

Regarding your situation: what is a word for you? Strings that are delimited by whitespace, underscores, etc.?

2013/11/12 Kranti Parisa

> Can EdgeNGramFilterFactory handle the cases where we need to skip/consider
> the "n" words from the start or end?
>
> For example:
>
> Title: big bang theory
>
> field1: populate full ngrams
> field2: populate ngrams for "bang theory" = skipping the first word "big"
> field3: populate ngrams for "big" = considering only the first word "big"
> field4: populate ngrams for "theory" = considering only the last word
> "theory"
>
> and at query time, I would like to apply field level boosting to rank the
> results.
>
> Thanks,
> Kranti K. Parisa
> http://www.linkedin.com/in/krantiparisa
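The front-edge behaviour described above can be illustrated with a small plain-Java sketch. It is not Lucene's EdgeNGramTokenFilter (it ignores offsets, positions, and surrogate pairs); it only shows which prefixes a given gram-size range produces. With a minimum of 2 and a maximum of 4, "apache" yields three grams; a five-character gram such as "apach" would require a maximum gram size of at least 5.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: mimics front-edge n-gram output for a single token.
// Not Lucene's implementation; offsets, positions and surrogate pairs are ignored.
final class EdgeNGramDemo {

    // Returns the prefixes of token whose lengths lie in [minGram, maxGram].
    static List<String> edgeNGrams(String token, int minGram, int maxGram) {
        if (minGram < 1 || maxGram < minGram) {
            throw new IllegalArgumentException("need 1 <= minGram <= maxGram");
        }
        List<String> grams = new ArrayList<>();
        int upper = Math.min(maxGram, token.length());
        for (int len = minGram; len <= upper; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        // minGramSize = 2, maxGramSize = 4, as in the example above
        System.out.println(edgeNGrams("apache", 2, 4)); // [ap, apa, apac]
    }
}
```

Tokens shorter than the minimum gram size produce no grams at all, which is the gap a preserveOriginal option is meant to address.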
Re: Adding preserveOriginal Capability to EdgeNGramFilterFactory
Can EdgeNGramFilterFactory handle the cases where we need to skip/consider the "n" words from the start or end?

For example:

Title: big bang theory

field1: populate full ngrams
field2: populate ngrams for "bang theory" (skipping the first word "big")
field3: populate ngrams for "big" (considering only the first word "big")
field4: populate ngrams for "theory" (considering only the last word "theory")

At query time, I would like to apply field-level boosting to rank the results.

Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa

On Sun, Nov 10, 2013 at 5:51 PM, Furkan KAMACI wrote:

> Hi;
>
> There were two issues about adding preserveOriginal capability to
> EdgeNGramFilterFactory and I've made a patch about it. You can check and
> test it from here: https://issues.apache.org/jira/browse/SOLR-5152 This
> is the related issue that can be marked as duplicated:
> https://issues.apache.org/jira/browse/SOLR-5332
>
> Thanks;
> Furkan KAMACI
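The field1 to field4 scheme can be sketched outside Solr to show what each field would contain. This is a hypothetical illustration: it assumes a "word" is a whitespace-delimited token, reads "full ngrams" as front-edge grams of every word, and uses the field names only as map keys, not as real schema fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the field1..field4 idea. Field names and the
// whitespace definition of "word" are assumptions; this is not Solr config.
final class FieldGramsDemo {

    // Front-edge grams of one word; assumes minGram >= 1.
    static List<String> edgeNGrams(String word, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int len = minGram; len <= Math.min(maxGram, word.length()); len++) {
            grams.add(word.substring(0, len));
        }
        return grams;
    }

    // Concatenates the edge grams of every word in the list, in order.
    static List<String> gramsFor(List<String> words, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        for (String w : words) {
            out.addAll(edgeNGrams(w, minGram, maxGram));
        }
        return out;
    }

    static Map<String, List<String>> buildFields(String title, int minGram, int maxGram) {
        List<String> words = Arrays.asList(title.trim().split("\\s+"));
        int n = words.size();
        Map<String, List<String>> fields = new LinkedHashMap<>();
        fields.put("field1", gramsFor(words, minGram, maxGram));                // all words
        fields.put("field2", gramsFor(words.subList(1, n), minGram, maxGram));  // skip first word
        fields.put("field3", gramsFor(words.subList(0, 1), minGram, maxGram));  // first word only
        fields.put("field4", gramsFor(words.subList(n - 1, n), minGram, maxGram)); // last word only
        return fields;
    }

    public static void main(String[] args) {
        buildFields("big bang theory", 1, 3)
            .forEach((name, grams) -> System.out.println(name + " => " + grams));
    }
}
```

In Solr itself, each of these would more likely be a separate field populated via copyField with its own analysis chain, with the per-field boosts applied at query time (for example through the edismax `qf` parameter).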
Adding preserveOriginal Capability to EdgeNGramFilterFactory
Hi;

There were two issues about adding a preserveOriginal capability to EdgeNGramFilterFactory, and I've made a patch for it. You can check and test it here: https://issues.apache.org/jira/browse/SOLR-5152

This is the related issue, which can be marked as a duplicate: https://issues.apache.org/jira/browse/SOLR-5332

Thanks;
Furkan KAMACI
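A minimal sketch of what a preserveOriginal flag could mean for edge n-grams, assuming it means "also emit the whole token when no gram would reproduce it", that is, when the token is shorter than the minimum gram size or longer than the maximum. This is one plausible reading for illustration; the patch attached to SOLR-5152 may behave differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the SOLR-5152 patch: preserveOriginal is assumed to
// add the full token when its length falls outside [minGram, maxGram].
final class PreserveOriginalDemo {

    static List<String> edgeNGrams(String token, int minGram, int maxGram, boolean preserveOriginal) {
        List<String> grams = new ArrayList<>();
        for (int len = minGram; len <= Math.min(maxGram, token.length()); len++) {
            grams.add(token.substring(0, len));
        }
        // If the length is inside the range, the last gram already equals the
        // token, so only out-of-range tokens need to be added explicitly.
        boolean outsideRange = token.length() < minGram || token.length() > maxGram;
        if (preserveOriginal && outsideRange) {
            grams.add(token);
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(edgeNGrams("apache", 2, 4, false)); // [ap, apa, apac]
        System.out.println(edgeNGrams("apache", 2, 4, true));  // [ap, apa, apac, apache]
        System.out.println(edgeNGrams("a", 2, 4, true));       // [a]
    }
}
```

Without the flag, an exact query for "apache" or a one-letter token such as "a" would find no matching term in this field, which is the usual motivation for preserving the original.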