Re: Adding preserveOriginal Capability to EdgeNGramFilterFactory

2013-12-31 Thread Kranti Parisa
Sorry for the late reply.

Here is what I was talking about:

big bang => b bi big (skipping the words other than the first one)
big bang => b ba ban bang (skipping the first word)
big bang => a an ang (skipping the first word + skipping first letter in
subsequent words)

The goal is to apply custom boosting based on where in the title the
search term matches (start, middle, end, etc.).

Maybe we could run a regex-based filter before EdgeNGramFilterFactory? I
was thinking, though, of having EdgeNGramFilterFactory take a parameter to
skip "n" characters from the start or end of the token before generating
the grams.
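
To make the three cases above concrete, here is a rough Python sketch. It
is not Solr code: the names edge_ngrams, grams_for_words, first_word,
last_word and skip_chars are hypothetical, words are assumed to be
whitespace-separated, and only skipping from the start of a token is
covered.

```python
def edge_ngrams(token, min_gram=1, max_gram=None, skip_chars=0):
    """Front edge n-grams of `token` after dropping `skip_chars` leading characters."""
    if skip_chars < 0 or min_gram < 1:
        raise ValueError("skip_chars must be >= 0 and min_gram must be >= 1")
    trimmed = token[skip_chars:]
    # A gram can never be longer than what is left of the token.
    limit = len(trimmed) if max_gram is None else min(max_gram, len(trimmed))
    return [trimmed[:size] for size in range(min_gram, limit + 1)]


def grams_for_words(text, first_word=0, last_word=None, skip_chars=0):
    """Edge n-grams for the words text.split()[first_word:last_word]."""
    grams = []
    for word in text.split()[first_word:last_word]:
        grams.extend(edge_ngrams(word, skip_chars=skip_chars))
    return grams


# Only the first word.
print(grams_for_words("big bang", last_word=1))                 # ['b', 'bi', 'big']
# Skip the first word.
print(grams_for_words("big bang", first_word=1))                # ['b', 'ba', 'ban', 'bang']
# Skip the first word and the first letter of the remaining words.
print(grams_for_words("big bang", first_word=1, skip_chars=1))  # ['a', 'an', 'ang']
```

In Solr terms, skip_chars would be the new parameter on the filter, while
the word selection would probably have to happen in an earlier stage of
the analysis chain, since the filter only ever sees one token at a time.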

Your thoughts?




Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa





Re: Adding preserveOriginal Capability to EdgeNGramFilterFactory

2013-11-13 Thread Furkan KAMACI
EdgeNGramFilterFactory creates n-grams from the beginning edge of an input
token by default. You can change the side, and you can define the minimum
and maximum gram size. Here is what EdgeNGramFilterFactory *can* do with a
minimum gram size of 2 and a maximum gram size of 4:

apache => ap, apa, apac
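
As a rough illustration of that behavior, here is a small Python sketch.
It is not the Lucene implementation; the function name edge_ngrams and its
parameter names are made up for this example. With a maximum gram size of
4, the longest gram it produces is four characters long.

```python
def edge_ngrams(token, min_gram=2, max_gram=4, side="front"):
    """Return the edge n-grams of `token`, shortest first."""
    if side not in ("front", "back"):
        raise ValueError("side must be 'front' or 'back'")
    if min_gram < 1:
        raise ValueError("min_gram must be >= 1")
    grams = []
    # A gram can never be longer than the token itself.
    for size in range(min_gram, min(max_gram, len(token)) + 1):
        grams.append(token[:size] if side == "front" else token[-size:])
    return grams


print(edge_ngrams("apache"))               # ['ap', 'apa', 'apac']
print(edge_ngrams("apache", side="back"))  # ['he', 'che', 'ache']
```

Note that the function works on a single token: it has no notion of words,
which is why the question of what counts as a word matters here.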

As for your situation: what counts as a word for you? Strings delimited by
whitespace, underscores, etc.?







Re: Adding preserveOriginal Capability to EdgeNGramFilterFactory

2013-11-12 Thread Kranti Parisa
Can EdgeNGramFilterFactory handle cases where we need to skip, or consider
only, the first or last "n" words?

For example:

Title: big bang theory

field1: populate full ngrams
field2: populate ngrams for "bang theory" = skipping the first word "big"
field3: populate ngrams for "big" = considering only the first word "big"
field4: populate ngrams for "theory" = considering only the last word
"theory"

At query time, I would like to apply field-level boosting to rank the
results.
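
To show what I mean, here is a toy Python sketch of the four fields and
the boosting. Everything in it is an assumption for illustration: the
boost values are made up, "full ngrams" is taken to mean edge n-grams of
every word, and the additive score is only a stand-in for Solr's real
scoring.

```python
def edge_ngrams(token, min_gram=1):
    """All front edge n-grams of `token`, from `min_gram` up to its full length."""
    return [token[:size] for size in range(min_gram, len(token) + 1)]


def index_title(title):
    """Build the gram set for each of the four fields from one title."""
    words = title.lower().split()
    selections = {
        "field1": words,       # every word
        "field2": words[1:],   # skip the first word
        "field3": words[:1],   # first word only
        "field4": words[-1:],  # last word only
    }
    return {
        name: {gram for word in selected for gram in edge_ngrams(word)}
        for name, selected in selections.items()
    }


# Hypothetical boosts: a match on the first word counts the most.
BOOSTS = {"field1": 1.0, "field2": 2.0, "field3": 4.0, "field4": 3.0}


def score(doc_fields, term, boosts=BOOSTS):
    """Sum the boosts of every field whose grams contain `term`."""
    term = term.lower()
    return sum(boost for name, boost in boosts.items() if term in doc_fields[name])


doc = index_title("big bang theory")
print(score(doc, "bi"))   # 5.0  (field1 + field3)
print(score(doc, "the"))  # 6.0  (field1 + field2 + field4)
```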



Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa





Adding preserveOriginal Capability to EdgeNGramFilterFactory

2013-11-10 Thread Furkan KAMACI
Hi;

There were two issues about adding a preserveOriginal capability to
EdgeNGramFilterFactory, and I've made a patch for it. You can check and
test it here:
https://issues.apache.org/jira/browse/SOLR-5152

The related issue, which can be marked as a duplicate, is:
https://issues.apache.org/jira/browse/SOLR-5332
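
For anyone unfamiliar with the idea, here is a minimal Python sketch of
what a preserveOriginal option could do. The semantics are an assumption
for illustration (keep the original token when it is shorter than the
minimum or longer than the maximum gram size); the patch itself may behave
differently, and the function and parameter names are hypothetical.

```python
def edge_ngrams(token, min_gram, max_gram, preserve_original=False):
    """Front edge n-grams of `token`, optionally keeping the token itself."""
    if min_gram < 1 or max_gram < min_gram:
        raise ValueError("need 1 <= min_gram <= max_gram")
    grams = [token[:size] for size in range(min_gram, min(max_gram, len(token)) + 1)]
    # Assumed semantics: emit the original token as well when its length falls
    # outside [min_gram, max_gram], so it is not already among the grams.
    if preserve_original and token not in grams:
        grams.append(token)
    return grams


print(edge_ngrams("apache", 2, 4))                          # ['ap', 'apa', 'apac']
print(edge_ngrams("apache", 2, 4, preserve_original=True))  # ['ap', 'apa', 'apac', 'apache']
print(edge_ngrams("a", 2, 4))                               # []
print(edge_ngrams("a", 2, 4, preserve_original=True))       # ['a']
```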

Thanks;
Furkan KAMACI