Re: Proximity query

2015-02-12 Thread Maisnam Ns
Hi,

I googled it but could not find the jars for these classes. Can someone help
me find where to get them?

import org.apache.lucene.corpus.stats.IDFCalc;
import org.apache.lucene.corpus.stats.TFIDFPriorityQueue;
import org.apache.lucene.corpus.stats.TermIDF;

Thanks

On Thu, Feb 12, 2015 at 11:01 PM, Maisnam Ns  wrote:

> Hi Allison and Sujit,
>
> Thanks so much for your links. I am so happy; I am looking at them now, and
> they cover my use case almost exactly.
>
> Allison, sure will get back to you if I have some more questions.
>
> Regards
> NS
>
>
>
>
>
> On Thu, Feb 12, 2015 at 10:49 PM, Sujit Pal  wrote:
>
>> I did something like this sometime back. The objective was to find patterns
>> surrounding some keywords of interest so I could find keywords similar to
>> the ones I was looking for, sort of like a poor man's word2vec. It uses
>> SpanQuery as Jigar said, and you can find the code here (I believe it was
>> written against Lucene 3.x so you may have to upgrade it if you are using
>> Lucene 4.x):
>>
>>
>> http://sujitpal.blogspot.com/2011/08/implementing-concordance-with-lucene.html
>>
>> -sujit
>>
>>
>> On Thu, Feb 12, 2015 at 8:57 AM, Maisnam Ns  wrote:
>>
>> > Hi Shah,
>> >
>> > Thanks for your reply. Will try to google SpanQuery meanwhile if you have
>> > some links can you please share
>> >
>> > Thanks
>> >
>> > On Thu, Feb 12, 2015 at 10:17 PM, Jigar Shah wrote:
>> >
>> > > This concept is called Proximity Search in general.
>> > >
>> > > In Lucene they are achieved using SpanQuery.
>> > >
>> > > On Thu, Feb 12, 2015 at 10:10 PM, Maisnam Ns wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > > Can someone help me if this use case is possible or not with lucene
>> > > >
>> > > > Use case: I have a string say 'Japan' appearing in 10 documents and I
>> > > > want to get back, say some results which contain two words before
>> > > > 'Japan' and two words after 'Japan' may be something like this
>> > > > ' Economy of Japan is growing' etc.
>> > > >
>> > > >  If it is not possible where should I look for such queries
>> > > >
>> > > > Thanks
>> > > >
>> > >
>> >
>>
>
>


Re: Proximity query

2015-02-12 Thread Maisnam Ns
Hi Allison and Sujit,

Thanks so much for your links. I am so happy; I am looking at them now, and
they cover my use case almost exactly.

Allison, I will be sure to get back to you if I have more questions.

Regards
NS





On Thu, Feb 12, 2015 at 10:49 PM, Sujit Pal  wrote:

> I did something like this sometime back. The objective was to find patterns
> surrounding some keywords of interest so I could find keywords similar to
> the ones I was looking for, sort of like a poor man's word2vec. It uses
> SpanQuery as Jigar said, and you can find the code here (I believe it was
> written against Lucene 3.x so you may have to upgrade it if you are using
> Lucene 4.x):
>
>
> http://sujitpal.blogspot.com/2011/08/implementing-concordance-with-lucene.html
>
> -sujit
>
>
> On Thu, Feb 12, 2015 at 8:57 AM, Maisnam Ns  wrote:
>
> > Hi Shah,
> >
> > Thanks for your reply. Will try to google SpanQuery meanwhile if you have
> > some links can you please share
> >
> > Thanks
> >
> > On Thu, Feb 12, 2015 at 10:17 PM, Jigar Shah wrote:
> >
> > > This concept is called Proximity Search in general.
> > >
> > > In Lucene they are achieved using SpanQuery.
> > >
> > > > On Thu, Feb 12, 2015 at 10:10 PM, Maisnam Ns wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Can someone help me if this use case is possible or not with lucene
> > > > >
> > > > > Use case: I have a string say 'Japan' appearing in 10 documents and I
> > > > > want to get back, say some results which contain two words before
> > > > > 'Japan' and two words after 'Japan' may be something like this
> > > > > ' Economy of Japan is growing' etc.
> > > > >
> > > > >  If it is not possible where should I look for such queries
> > > > >
> > > > > Thanks
> > > > >
> > >
> >
>


Re: Proximity query

2015-02-12 Thread Sujit Pal
I did something like this some time back. The objective was to find patterns
surrounding some keywords of interest so I could find keywords similar to
the ones I was looking for, sort of like a poor man's word2vec. It uses
SpanQuery as Jigar said, and you can find the code here (I believe it was
written against Lucene 3.x so you may have to upgrade it if you are using
Lucene 4.x):

http://sujitpal.blogspot.com/2011/08/implementing-concordance-with-lucene.html

-sujit


On Thu, Feb 12, 2015 at 8:57 AM, Maisnam Ns  wrote:

> Hi Shah,
>
> Thanks for your reply. Will try to google SpanQuery meanwhile if you have
> some links can you please share
>
> Thanks
>
> On Thu, Feb 12, 2015 at 10:17 PM, Jigar Shah wrote:
>
> > This concept is called Proximity Search in general.
> >
> > In Lucene they are achieved using SpanQuery.
> >
> > On Thu, Feb 12, 2015 at 10:10 PM, Maisnam Ns wrote:
> >
> > > Hi,
> > >
> > > Can someone help me if this use case is possible or not with lucene
> > >
> > > Use case: I have a string say 'Japan' appearing in 10 documents and I
> > > want to get back, say some results which contain two words before
> > > 'Japan' and two words after 'Japan' may be something like this
> > > ' Economy of Japan is growing' etc.
> > >
> > >  If it is not possible where should I look for such queries
> > >
> > > Thanks
> > >
> >
>


RE: Proximity query

2015-02-12 Thread Allison, Timothy B.
Might also look at concordance code on LUCENE-5317 and here:

https://github.com/tballison/lucene-addons/tree/master/lucene-5317

Let me know if you have any questions.

-----Original Message-----
From: Maisnam Ns [mailto:maisnam...@gmail.com] 
Sent: Thursday, February 12, 2015 11:57 AM
To: java-user@lucene.apache.org
Subject: Re: Proximity query

Hi Shah,

Thanks for your reply. I will try to google SpanQuery; meanwhile, if you have
some links, could you please share them?

Thanks

On Thu, Feb 12, 2015 at 10:17 PM, Jigar Shah  wrote:

> This concept is called Proximity Search in general.
>
> In Lucene they are achieved using SpanQuery.
>
> On Thu, Feb 12, 2015 at 10:10 PM, Maisnam Ns  wrote:
>
> > Hi,
> >
> > Can someone help me if this use case is possible or not with lucene
> >
> > Use case: I have a string say 'Japan' appearing in 10 documents and I
> > want to get back, say some results which contain two words before
> > 'Japan' and two words after 'Japan' may be something like this
> > ' Economy of Japan is growing' etc.
> >
> >  If it is not possible where should I look for such queries
> >
> > Thanks
> >
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Proximity query

2015-02-12 Thread Maisnam Ns
Hi Shah,

Thanks for your reply. I will try to google SpanQuery; meanwhile, if you have
some links, could you please share them?

Thanks

On Thu, Feb 12, 2015 at 10:17 PM, Jigar Shah  wrote:

> This concept is called Proximity Search in general.
>
> In Lucene they are achieved using SpanQuery.
>
> On Thu, Feb 12, 2015 at 10:10 PM, Maisnam Ns  wrote:
>
> > Hi,
> >
> > Can someone help me if this use case is possible or not with lucene
> >
> > Use case: I have a string say 'Japan' appearing in 10 documents and I
> > want to get back, say some results which contain two words before
> > 'Japan' and two words after 'Japan' may be something like this
> > ' Economy of Japan is growing' etc.
> >
> >  If it is not possible where should I look for such queries
> >
> > Thanks
> >
>


Re: Proximity query

2015-02-12 Thread Jigar Shah
This concept is generally called proximity search.

In Lucene it is achieved using SpanQuery.
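
As a rough illustration of the use case itself (two words before and two
words after 'Japan'), here is a minimal plain-Java sketch. It does not use
Lucene at all: it assumes simple whitespace tokenization, and the class and
method names (KeywordInContext, windows) are made up for this example. With
Lucene you would instead rely on indexed term positions through the span
queries mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java sketch (no Lucene): collect N words on each side of a keyword. */
class KeywordInContext {

    /**
     * Returns one snippet per occurrence of the keyword. Each snippet holds up
     * to `window` tokens before and after the keyword. Tokens are split on
     * whitespace only, and the keyword is matched case-insensitively.
     */
    static List<String> windows(String text, String keyword, int window) {
        List<String> snippets = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return snippets;
        }
        String[] tokens = trimmed.split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            if (!tokens[i].equalsIgnoreCase(keyword)) {
                continue;
            }
            int from = Math.max(0, i - window);
            int to = Math.min(tokens.length, i + window + 1); // exclusive end
            snippets.add(String.join(" ", Arrays.copyOfRange(tokens, from, to)));
        }
        return snippets;
    }

    public static void main(String[] args) {
        String doc = "The economy of Japan is growing faster than expected";
        // Prints: economy of Japan is growing
        for (String snippet : windows(doc, "Japan", 2)) {
            System.out.println(snippet);
        }
    }
}
```

Punctuation stays attached to tokens here ("Japan," would not match), which
is one reason a real analyzer is preferable to splitting on whitespace.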

On Thu, Feb 12, 2015 at 10:10 PM, Maisnam Ns  wrote:

> Hi,
>
> Can someone help me if this use case is possible or not with lucene
>
> Use case: I have a string say 'Japan' appearing in 10 documents and I want
> to get back , say some results which contain two words before 'Japan' and
> two words after 'Japan' may be something like this ' Economy of Japan is
> growing' etc.
>
>  If it is not possible where should I look for such queries
>
> Thanks
>


Re: Proximity Query Parser

2006-09-01 Thread Paul Elschot
On Friday 01 September 2006 19:46, Mark Miller wrote:
> Eric also gave me the idea of using a SpanNear with maximum slop as a
> boolean to connect spans. Using this and SpanOr seems to make my time spent
> on the distribution of proximity clauses a little foolish :) Is that true?

There is practice and there is theory. You chose practice this time.
(In theory there is no difference between the two, but in practice...)

> Is there any disadvantage to the max slop SpanNear, SpanOr solution? Any
> advantage to distributing the 'and's?

Span queries (and phrase queries) access the proximity information,
and that slows them down when compared to pure boolean queries,
which can get away with using only the term frequencies in the
documents. The difference in access time is roughly as big as these
term frequencies.
When querying an index with larger documents, the difference can be
quite noticeable. However, using proximity information normally
gives more accurate results. With operators in the query language,
the choice is up to the user.

Similarly, phrase queries are faster than span queries, but phrase queries
cannot be nested. Ideally, a query language would hide this, but
this requires an implementation in which phrase queries treat slop
in the same way as span queries.
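
To make the cost difference concrete, here is a toy positional index in
plain Java. It is only a sketch and not how Lucene stores postings: the
class ToyPositionalIndex and its methods are invented for this example, and
"distance" here simply means the absolute difference between two token
positions, which is not Lucene's definition of slop. The point is that the
boolean match only needs to know which documents contain each term, while
the proximity match also has to walk the position lists, whose lengths are
the term frequencies.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy positional index (not Lucene's data structures) contrasting AND with NEAR. */
class ToyPositionalIndex {
    // term -> (docId -> ascending token positions of the term in that document)
    private final Map<String, Map<Integer, List<Integer>>> postings = new HashMap<>();

    /** Indexes a document; terms are lowercased, so query terms must be lowercase too. */
    void add(int docId, String text) {
        String[] tokens = text.toLowerCase().trim().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            postings.computeIfAbsent(tokens[pos], t -> new TreeMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(pos);
        }
    }

    /** Boolean AND: only checks which documents contain each term. */
    List<Integer> and(String a, String b) {
        Map<Integer, List<Integer>> docsA = postings.getOrDefault(a, Collections.emptyMap());
        Map<Integer, List<Integer>> docsB = postings.getOrDefault(b, Collections.emptyMap());
        List<Integer> hits = new ArrayList<>();
        for (Integer doc : docsA.keySet()) {
            if (docsB.containsKey(doc)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    /** Unordered proximity: additionally reads the positions of both terms. */
    List<Integer> near(String a, String b, int maxDistance) {
        List<Integer> hits = new ArrayList<>();
        for (Integer doc : and(a, b)) {
            if (closeEnough(postings.get(a).get(doc), postings.get(b).get(doc), maxDistance)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    /** Two-pointer walk over two sorted position lists; work grows with their lengths. */
    private static boolean closeEnough(List<Integer> pa, List<Integer> pb, int maxDistance) {
        int i = 0;
        int j = 0;
        while (i < pa.size() && j < pb.size()) {
            int diff = pa.get(i) - pb.get(j);
            if (Math.abs(diff) <= maxDistance) {
                return true;
            }
            if (diff < 0) {
                i++;
            } else {
                j++;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ToyPositionalIndex index = new ToyPositionalIndex();
        index.add(1, "old map of the big city");
        index.add(2, "big news arrived long after the old story");
        System.out.println(index.and("old", "big"));     // prints [1, 2]
        System.out.println(index.near("old", "big", 4)); // prints [1]
    }
}
```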
 
Regards,
Paul Elschot




Re: Proximity Query Parser

2006-09-01 Thread Mark Miller

Eric also gave me the idea of using a SpanNear with maximum slop as a
boolean to connect spans. Using this and SpanOr seems to make my time spent
on the distribution of proximity clauses a little foolish :) Is that true?
Is there any disadvantage to the max slop SpanNear, SpanOr solution? Any
advantage to distributing the 'and's?

- Mark


On 9/1/06, Mark Miller <[EMAIL PROTECTED]> wrote:

> Thanks for the tip, Paul. It is embarrassing, but I only realized how SpanOr
> queries worked a day or two ago, based on a tip from Eric. The way I had
> assumed they would create the spans was just wrong, and I had never
> researched further. Now I see that it would be a nice optimization for what
> I have, but I have not yet looked into how easy it will be to integrate it
> into my distribution algorithm. I do use it for multiphrase queries, however,
> based on Eric's tip. It will hopefully be pretty simple to apply it to my
> distribution, but I have not had time to check it out. I plowed this thing
> out pretty quickly and am hoping I can go back and clean up a lot of things.
> I need a short break, though, to pump out some other things. As I learn more
> about Lucene and JavaCC I will incorporate new methods into the parser.
>
> - Mark


 On 9/1/06, Paul Elschot <[EMAIL PROTECTED]> wrote:
>
> On Friday 01 September 2006 12:54, Mark Miller wrote:
>
> > Hi Paul,
> >
> > I also have to treat things differently depending on if I am in a
> > proximity clause or boolean clause. A wildcard in a boolean is mapped
> to
> > a wildcard query. A wildcard in a proximity is mapped to a regex span
> > that has been modified to only deal with * and ?. When I run into a
> > proximity, I collect a small tree of each clause and distribute them
> > against each other...(old | map) ~3 big gets distributed to old ~3 big
> |
> > map ~3 big. This distribution method appears to handle all
>
> There is no need to repeat "big". SpanQueries can be nested,
> so when mapping like this:
> SpanNear(SpanOr( old, map), big)
> the query structure will only grow for truncations and fuzzy stuff.
>
> > boolean/proximity nesting/mixing cases for me, including: great ! "big
> > old phrase search" ~5 (holy ~4 (big black bear)). The distribution
> > maintains order of operations, but also obviously can create some
> pretty
> > large queries.
>
> Regards,
> Paul Elschot
>
>
>



Re: Proximity Query Parser

2006-09-01 Thread Mark Miller

Thanks for the tip, Paul. It is embarrassing, but I only realized how SpanOr
queries worked a day or two ago, based on a tip from Eric. The way I had
assumed they would create the spans was just wrong, and I had never
researched further. Now I see that it would be a nice optimization for what
I have, but I have not yet looked into how easy it will be to integrate it
into my distribution algorithm. I do use it for multiphrase queries, however,
based on Eric's tip. It will hopefully be pretty simple to apply it to my
distribution, but I have not had time to check it out. I plowed this thing
out pretty quickly and am hoping I can go back and clean up a lot of things.
I need a short break, though, to pump out some other things. As I learn more
about Lucene and JavaCC I will incorporate new methods into the parser.


- Mark


On 9/1/06, Paul Elschot <[EMAIL PROTECTED]> wrote:

> On Friday 01 September 2006 12:54, Mark Miller wrote:
>
> > Hi Paul,
> >
> > I also have to treat things differently depending on if I am in a
> > proximity clause or boolean clause. A wildcard in a boolean is mapped to
> > a wildcard query. A wildcard in a proximity is mapped to a regex span
> > that has been modified to only deal with * and ?. When I run into a
> > proximity, I collect a small tree of each clause and distribute them
> > against each other...(old | map) ~3 big gets distributed to old ~3 big |
> > map ~3 big. This distribution method appears to handle all
>
> There is no need to repeat "big". SpanQueries can be nested,
> so when mapping like this:
> SpanNear(SpanOr(old, map), big)
> the query structure will only grow for truncations and fuzzy stuff.
>
> > boolean/proximity nesting/mixing cases for me, including: great ! "big
> > old phrase search" ~5 (holy ~4 (big black bear)). The distribution
> > maintains order of operations, but also obviously can create some pretty
> > large queries.
>
> Regards,
> Paul Elschot





Re: Proximity Query Parser

2006-09-01 Thread Paul Elschot
On Friday 01 September 2006 12:54, Mark Miller wrote:

> Hi Paul,
> 
> I also have to treat things differently depending on if I am in a 
> proximity clause or boolean clause. A wildcard in a boolean is mapped to 
> a wildcard query. A wildcard in a proximity is mapped to a regex span 
> that has been modified to only deal with * and ?. When I run into a 
> proximity, I collect a small tree of each clause and distribute them 
> against each other...(old | map) ~3 big gets distributed to old ~3 big | 
> map ~3 big. This distribution method appears to handle all 

There is no need to repeat "big". SpanQueries can be nested,
so when mapping like this:
SpanNear(SpanOr(old, map), big)
the query structure will only grow for truncations and fuzzy stuff.
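
The size difference between the two mappings can be sketched at the string
level in plain Java. This is only an illustration: it builds display strings
rather than Lucene query objects, the names ProximityMapping, distribute and
nested are invented for this example, and the "slop=" label is just a way of
showing the ~3 distance from the query text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** String-level sketch of the two mappings; no Lucene classes are involved. */
class ProximityMapping {

    /** Distributed form: one proximity clause per (left, right) pair. */
    static List<String> distribute(List<String> left, int slop, List<String> right) {
        List<String> clauses = new ArrayList<>();
        for (String l : left) {
            for (String r : right) {
                clauses.add(l + " ~" + slop + " " + r);
            }
        }
        return clauses;
    }

    /** Nested form: a single near node whose operands are "or" nodes. */
    static String nested(List<String> left, int slop, List<String> right) {
        return "SpanNear(" + or(left) + ", " + or(right) + ", slop=" + slop + ")";
    }

    // A single term needs no SpanOr wrapper.
    private static String or(List<String> terms) {
        return terms.size() == 1 ? terms.get(0) : "SpanOr(" + String.join(", ", terms) + ")";
    }

    public static void main(String[] args) {
        List<String> left = Arrays.asList("old", "map");
        List<String> right = Arrays.asList("big");
        // Prints: old ~3 big | map ~3 big
        System.out.println(String.join(" | ", distribute(left, 3, right)));
        // Prints: SpanNear(SpanOr(old, map), big, slop=3)
        System.out.println(nested(left, 3, right));
    }
}
```

With m alternatives on one side and n on the other, the distributed form
holds m * n proximity clauses, while the nested form mentions each term once.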

> boolean/proximity nesting/mixing cases for me, including: great ! "big 
> old phrase search" ~5 (holy ~4 (big black bear)). The distribution 
> maintains order of operations, but also obviously can create some pretty 
> large queries.

Regards,
Paul Elschot




Re: Proximity Query Parser

2006-09-01 Thread Mark Miller

Paul Elschot wrote:

> Mark,
>
> On Thursday 31 August 2006 23:18, Mark Miller wrote:
> > I am not a huge fan of the queryparser's syntax so I have started an
> > open source project to create a viable alternative. I could really use
> > some help testing it out. The more I can get it tested the better
> > chance it has of serving the community. The parser is called Qsol. I am
> > right up against its initial release. So far it:
> >
> > offers a simple clean syntax.
> > allows arbitrary combinations/nesting of proximity and boolean queries.
>
> Could you say in a few words how the combination of proximity and boolean
> is implemented in Qsol?
>
> I found this the most difficult thing to implement in surround. In surround,
> every subquery that can be a proximity subquery has two (groups of) methods:
> one for use as boolean and one for use as proximity.
> I'd like to have a mechanism that allows mixing proximity and boolean queries
> built into Lucene.
>
> Did you also implement parsed phrases with Lucene's PhraseQuery?
> Surround does not have that.
>
> Regards,
> Paul Elschot



  

Hi Paul,

I'm afraid my programming is probably quite a ways behind yours, so I doubt
anything I have done will be of any help to you.


I also have to treat things differently depending on if I am in a 
proximity clause or boolean clause. A wildcard in a boolean is mapped to 
a wildcard query. A wildcard in a proximity is mapped to a regex span 
that has been modified to only deal with * and ?. When I run into a 
proximity, I collect a small tree of each clause and distribute them 
against each other...(old | map) ~3 big gets distributed to old ~3 big | 
map ~3 big. This distribution method appears to handle all 
boolean/proximity nesting/mixing cases for me, including: great ! "big 
old phrase search" ~5 (holy ~4 (big black bear)). The distribution 
maintains order of operations, but also obviously can create some pretty 
large queries.


I did not use the phrase search because I do not like how the slop works
(not in order, etc.), so both in and out of proximity I use a SpanNear
instead. For a multiphrase search I use a SpanOr on words in the same
position.


- Mark




Re: Proximity Query Parser

2006-09-01 Thread Paul Elschot
Mark,

On Thursday 31 August 2006 23:18, Mark Miller wrote:
> I am not a huge fan of the queryparser's syntax so I have started an 
> open source project to create a viable alternative. I could really use 
> some help testing it out. The more I can get it tested the better 
> chance it has of serving the community. The parser is called Qsol. I am 
> right up against its initial release. So far it:
> 
> offers a simple clean syntax.
> allows arbitrary combinations/nesting of proximity and boolean queries.

Could you say in a few words how the combination of proximity and boolean
is implemented in Qsol?

I found this the most difficult thing to implement in surround. In surround, 
every subquery that can be a proximity subquery has two (groups of) methods: 
one for use as boolean and one for use as proximity.
I'd like to have a mechanism that allows mixing proximity and boolean queries 
built into Lucene.

Did you also implement parsed phrases with Lucene's PhraseQuery?
Surround does not have that.

Regards,
Paul Elschot
