Re: Indexing Query

2015-02-18 Thread Ian Lea
You mean you'd like a BooleanQuery.setMaximumNumberShouldMatch()
method?  Unfortunately that doesn't exist and I can't think of a
simple way of doing it.


--
Ian.


On Wed, Feb 18, 2015 at 5:26 AM, Deepak Gopalakrishnan  wrote:
> Thanks Ian. Also, if I have a unigram in the query, and I want to make sure
> I match only index entries that do not have more than 2 tokens, is there a
> way to do that too?
>
> Thanks
>
> On Wed, Feb 18, 2015 at 2:23 AM, Ian Lea  wrote:
>
>> Break the query into words then add them as TermQuery instances as
>> optional clauses to a BooleanQuery with a call to
>> setMinimumNumberShouldMatch(2) somewhere along the line.  You may want
>> to do some parsing or analysis on the query terms to avoid problems of
>> case matching and the like.
>>
>>
>> --
>> Ian.
>>
>>
>> On Tue, Feb 17, 2015 at 4:57 PM, Deepak Gopalakrishnan 
>> wrote:
>> > Hello,
>> >
>> > I have a rather simple query. I have a list where I have terms like and
>> > then my query is more natural language. I want to be able to retrieve
>> >  matches that have at least 2 words in common between the query and the
>> > index
>> >
>> > Can you guys suggest a Query Type and a field that I should be using?
>> >
>> > --
>> > Regards,
>> > *Deepak Gopalakrishnan*
>>
>> -
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>>
>
>
> --
> Regards,
> *Deepak Gopalakrishnan*
> *Mobile*:+918891509774
> *Skype* : deepakgk87
> http://myexps.blogspot.com
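
Ian's earlier suggestion, quoted above, is to add each query word as an optional TermQuery clause of a BooleanQuery and call setMinimumNumberShouldMatch(2). The sketch below is not Lucene code. It is a plain-Java model of the matching rule that setup is meant to produce, assuming a simple lowercase-and-split step in place of a real analyzer. The class and method names (MinShouldMatchSketch, tokenize, sharedWords, search) are hypothetical, and the model counts distinct shared words only.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Plain-Java model of "at least N optional clauses must match"; not Lucene code. */
class MinShouldMatchSketch {

    /** Hypothetical stand-in for analysis: lowercase, then split on non-alphanumerics. */
    static Set<String> tokenize(String text) {
        Set<String> tokens = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Number of distinct query words that also occur in the entry. */
    static int sharedWords(String query, String entry) {
        Set<String> shared = tokenize(query);
        shared.retainAll(tokenize(entry));
        return shared.size();
    }

    /** Entries that share at least minShouldMatch distinct words with the query. */
    static List<String> search(String query, List<String> entries, int minShouldMatch) {
        List<String> hits = new ArrayList<>();
        for (String entry : entries) {
            if (sharedWords(query, entry) >= minShouldMatch) {
                hits.add(entry);
            }
        }
        return hits;
    }
}
```

Calling search(query, entries, 2) keeps only the entries with two or more words in common with the query, which is the behaviour asked for in the original question. The lowercase step is there for the case-matching concern Ian mentions.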




Re: Indexing Query

2015-02-18 Thread Deepak Gopalakrishnan
Oops, alright, I'll probably look around for a workaround.



Lucene fuzzy and wildcard search, and scoring in AutomatonQuery

2015-02-18 Thread Yossi Vainshtein
Hi all,

I'm using Apache Lucene and am trying to combine a fuzzy query with a prefix
(or wildcard) query to implement a kind of suggestion mechanism.

For example, if the query is "levy", a document containing "Levinshtein" should
also be returned.

As there seems to be no built-in query of this sort in Lucene, I searched for
solutions and found that this has been asked about before. I used the approach
suggested by Robert Muir here:
http://stackoverflow.com/questions/28565090/scoring-results-of-automatonquery

It creates the query as a concatenation of two automata (Levenshtein and
wildcard).

That works great, but now there is no scoring: all results get a score of
*1.0*. I really want "Levy" to be ranked higher than "Levinshtein" in the
previous example.

By the way, I tried using Lucene auto-suggestion in the form of
FuzzySuggester, but it's not feasible with large inputs: it holds all
suggestions in RAM and bloats the memory usage.

Is there another way of doing this? Or should I implement my own *Scorer*
or *Similarity*?


Thanks

Yossi
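
To make the matching rule in the message above concrete: a Levenshtein automaton followed by "any suffix" accepts a term when some prefix of the term is within the allowed number of edits of the query. The sketch below is not Lucene code and is not the approach from the linked answer. It is a plain-Java model of that acceptance rule, and it reuses the same prefix distance as a ranking key, which is only one possible way to get "Levy" ahead of "Levinshtein". The names FuzzyPrefixSketch, prefixDistance and suggest are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Plain-Java model of "fuzzy prefix" matching and a distance-based ranking; not Lucene code. */
class FuzzyPrefixSketch {

    /** Smallest Levenshtein distance between the query and any prefix of the term. */
    static int prefixDistance(String query, String term) {
        String q = query.toLowerCase(Locale.ROOT);
        String t = term.toLowerCase(Locale.ROOT);
        // prev[j] holds the distance between the first i query chars and the first j term chars.
        int[] prev = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= q.length(); i++) {
            int[] cur = new int[t.length() + 1];
            cur[0] = i;
            for (int j = 1; j <= t.length(); j++) {
                int subst = prev[j - 1] + (q.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1);
                cur[j] = Math.min(subst, Math.min(prev[j], cur[j - 1]) + 1);
            }
            prev = cur;
        }
        // The last row covers the whole query; take the best term prefix.
        int best = prev[0];
        for (int d : prev) {
            best = Math.min(best, d);
        }
        return best;
    }

    /** Terms with a prefix within maxEdits of the query, closest first, shorter terms first on ties. */
    static List<String> suggest(String query, List<String> terms, int maxEdits) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (prefixDistance(query, term) <= maxEdits) {
                hits.add(term);
            }
        }
        hits.sort(Comparator
                .comparingInt((String term) -> prefixDistance(query, term))
                .thenComparingInt(String::length));
        return hits;
    }
}
```

With the query "levy" and one allowed edit, both "Levy" (prefix distance 0) and "Levinshtein" (prefix distance 1) are accepted, and the exact match sorts first. Whether such a key is best applied as a custom score inside Lucene or as a re-sort of the returned hits is left open here.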


Re: Indexing Query

2015-02-18 Thread Jack Krupansky
You could store the length of the field (in terms) in a second field and
then add a MUST term to the BooleanQuery which is a RangeQuery with an
upper bound that is the maximum length that can match.

-- Jack Krupansky
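
A minimal sketch of the idea in the message above, written as a plain-Java model instead of Lucene code: each entry carries a token count computed once at index time (the "second field"), and a search only accepts entries whose count is within the upper bound (the required range clause). The names TokenCountSketch, Entry and search are hypothetical, and whitespace splitting stands in for whatever analyzer the real index uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Plain-Java model of a stored token-count field with an upper-bound filter; not Lucene code. */
class TokenCountSketch {

    /** Hypothetical index entry: the text plus its token count, computed once at index time. */
    static final class Entry {
        final String text;
        final int tokenCount; // plays the role of the second, numeric field

        Entry(String text) {
            this.text = text;
            String trimmed = text.trim();
            this.tokenCount = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        }
    }

    /** Entries that contain the term and have at most maxTokens tokens. */
    static List<String> search(String term, List<Entry> index, int maxTokens) {
        String wanted = term.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Entry e : index) {
            if (e.tokenCount > maxTokens) {
                continue; // fails the required upper-bound clause
            }
            for (String token : e.text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
                if (token.equals(wanted)) {
                    hits.add(e.text);
                    break;
                }
            }
        }
        return hits;
    }
}
```

For the unigram case in the thread, search("lucene", index, 2) returns only entries of one or two tokens that contain the word, so longer entries are excluded even though they match the term.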



Re: High frequency terms in results document....

2015-02-18 Thread Tomoko Uchida
Hi,

I'm afraid there is no easy or straightforward way to meet your requirement.
I would try creating a temporary, tiny in-memory index from the search results
on the fly, and getting the top N terms from it with HighFreqTerms.
http://lucene.apache.org/core/4_10_3/misc/org/apache/lucene/misc/HighFreqTerms.html
(The logic is almost the same as Luke's top N terms feature.)

I have not tried it and am not sure whether this approach is practical in
terms of performance; it's just an idea...

Hope this helps,
Tomoko
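
The idea above can be modelled without Lucene to show what the temporary index would be used for: collect the text of the documents returned by the search, count terms over that small set only, and keep the top N. The sketch below is plain Java, not the HighFreqTerms API. The names TopTermsSketch and topTerms are hypothetical, and ranking by document frequency (how many result documents contain a term) is an assumption about the statistic wanted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Plain-Java model of "top N terms within a result set"; not Lucene code. */
class TopTermsSketch {

    /** Top n terms ranked by the number of result documents that contain them. */
    static List<String> topTerms(List<String> resultDocs, int n) {
        Map<String, Integer> docFreq = new HashMap<>();
        for (String doc : resultDocs) {
            Set<String> seen = new HashSet<>(); // count each term once per document
            for (String token : doc.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (!token.isEmpty() && seen.add(token)) {
                    docFreq.merge(token, 1, Integer::sum);
                }
            }
        }
        List<String> terms = new ArrayList<>(docFreq.keySet());
        // Highest document frequency first; ties broken alphabetically for a stable order.
        terms.sort((a, b) -> {
            int byFreq = Integer.compare(docFreq.get(b), docFreq.get(a));
            return byFreq != 0 ? byFreq : a.compareTo(b);
        });
        return new ArrayList<>(terms.subList(0, Math.min(n, terms.size())));
    }
}
```

The cost grows with the amount of text in the result set, which is the performance concern raised in the message, so in practice the number of documents fed into it would probably need a cap.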

2015-02-16 1:58 GMT+09:00 Shouvik Bardhan :

> Apologies if I have missed it in discussions prior but I looked all over. I
> looked at the Luke code and it does find high frequency terms on the entire
> index. I am trying to get the top N high frequency terms in the documents
> returned from a search result. I came across something called
> FilterIndexReader but I don't think it is part of 4.X codebase. Any pointer
> is appreciated.
>