Re: Proximity query
Hi,

I googled it but could not find the jars for these classes. Can someone help me find where to get them?

import org.apache.lucene.corpus.stats.IDFCalc;
import org.apache.lucene.corpus.stats.TFIDFPriorityQueue;
import org.apache.lucene.corpus.stats.TermIDF;

Thanks
Re: Proximity query
Hi Allison and Sujit,

Thanks so much for your links. I am happy to say they cover almost exactly my use case.

Allison, I will certainly get back to you if I have more questions.

Regards
NS
Re: Proximity query
I did something like this some time back. The objective was to find patterns surrounding some keywords of interest so I could find keywords similar to the ones I was looking for, sort of like a poor man's word2vec. It uses SpanQuery as Jigar said, and you can find the code here (I believe it was written against Lucene 3.x, so you may have to upgrade it if you are using Lucene 4.x):

http://sujitpal.blogspot.com/2011/08/implementing-concordance-with-lucene.html

-sujit
RE: Proximity query
You might also look at the concordance code on LUCENE-5317 and here:

https://github.com/tballison/lucene-addons/tree/master/lucene-5317

Let me know if you have any questions.

To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
Re: Proximity query
Hi Shah,

Thanks for your reply. I will try to google SpanQuery; meanwhile, if you have some links, could you please share them?

Thanks
Re: Proximity query
This concept is generally called proximity search.

In Lucene it is achieved using SpanQuery.

On Thu, Feb 12, 2015 at 10:10 PM, Maisnam Ns wrote:

> Hi,
>
> Can someone help me if this use case is possible or not with Lucene?
>
> Use case: I have a string, say 'Japan', appearing in 10 documents, and I
> want to get back results which contain the two words before 'Japan' and
> the two words after 'Japan', maybe something like 'Economy of Japan is
> growing'.
>
> If it is not possible, where should I look for such queries?
>
> Thanks
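For the use case in the question (two words on either side of 'Japan'), the idea of a concordance window can be sketched without Lucene at all. The following is a minimal plain-Java illustration, assuming whitespace tokenization and case-insensitive matching. The names `ConcordanceSketch` and `windows` are made up for this example and are not part of Lucene or of the code linked in this thread. Against a real index, a span query would locate the positions instead of a scan over the text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not a Lucene class: keyword-in-context windows over plain text. */
final class ConcordanceSketch {

    /**
     * Returns one window per occurrence of target, holding up to `before` tokens
     * to its left and up to `after` tokens to its right.
     * Assumption: tokens are separated by whitespace and compared ignoring case,
     * so a token with attached punctuation such as "Japan." would not match.
     */
    static List<String> windows(String text, String target, int before, int after) {
        List<String> result = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return result;
        }
        String[] tokens = trimmed.split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            if (!tokens[i].equalsIgnoreCase(target)) {
                continue;
            }
            int from = Math.max(0, i - before);
            int to = Math.min(tokens.length, i + after + 1); // exclusive end
            result.add(String.join(" ", Arrays.copyOfRange(tokens, from, to)));
        }
        return result;
    }

    public static void main(String[] args) {
        String doc = "The Economy of Japan is growing while exports from Japan rise";
        for (String window : windows(doc, "Japan", 2, 2)) {
            System.out.println(window);
        }
    }
}
```

Windows are simply cut short at the start or end of the text, which seems the least surprising choice for a sketch like this.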
Re: Proximity Query Parser
On Friday 01 September 2006 19:46, Mark Miller wrote:
> Eric also gave me the idea of using a SpanNear with maximum slop as a
> boolean to connect spans. Using this and SpanOr seems to make my time spent
> on the distribution of proximity clauses a little foolish :) Is that true?

There is practice and there is theory. You chose practice this time.
(In theory there is no difference between the two, but in practice...)

> Is there any disadvantage to the max slop SpanNear, SpanOr solution? Any
> advantage to distributing the 'and's?

Span queries (and phrase queries) access the proximity information, and that slows them down compared to pure boolean queries, which can get away with using only the term frequencies in the documents. The difference in access time is roughly proportional to these term frequencies. When querying an index with larger documents, the difference can be quite noticeable. However, using proximity information normally gives more accurate results. With operators in the query language, the choice is up to the user.

Similarly, phrase queries are faster than span queries, but phrase queries cannot be nested. Ideally, a query language would hide this, but this requires an implementation in which phrase queries treat slop in the same way as span queries.

Regards,
Paul Elschot
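The trade-off described above can be illustrated with a toy positional index. The sketch below is plain Java, not Lucene code: `ToyIndex`, `and` and `near` are hypothetical names, and the distance measure (absolute difference of token positions, in either order) is a simplification rather than Lucene's exact definition of slop. It tries to show that a proximity check with unlimited distance accepts the same documents as a boolean AND, but has to walk per-occurrence position lists to do so, whereas AND only needs to know which documents contain each term.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Toy positional index: term -> (docId -> positions). Hypothetical, not Lucene. */
final class ToyIndex {
    private final Map<String, Map<Integer, List<Integer>>> postings = new HashMap<>();

    /** Number of position pairs compared by near(), to make the extra work visible. */
    int positionPairsCompared = 0;

    void add(int docId, String text) {
        String[] tokens = text.toLowerCase().trim().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            postings.computeIfAbsent(tokens[pos], t -> new HashMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(pos);
        }
    }

    /** Boolean AND: looks only at which documents contain each term. */
    TreeSet<Integer> and(String a, String b) {
        TreeSet<Integer> docs = new TreeSet<>(docsOf(a).keySet());
        docs.retainAll(docsOf(b).keySet());
        return docs;
    }

    /** Unordered proximity: some occurrence of a and of b at most maxDistance positions apart. */
    TreeSet<Integer> near(String a, String b, int maxDistance) {
        TreeSet<Integer> docs = new TreeSet<>();
        for (int doc : and(a, b)) {
            // Naive pairwise comparison; the cost grows with the term frequencies.
            for (int pa : docsOf(a).get(doc)) {
                for (int pb : docsOf(b).get(doc)) {
                    positionPairsCompared++;
                    if (Math.abs((long) pa - pb) <= maxDistance) {
                        docs.add(doc);
                    }
                }
            }
        }
        return docs;
    }

    private Map<Integer, List<Integer>> docsOf(String term) {
        Map<Integer, List<Integer>> docs = postings.get(term.toLowerCase());
        return docs != null ? docs : Collections.<Integer, List<Integer>>emptyMap();
    }

    /** Small fixed corpus used by main() and handy for experiments. */
    static ToyIndex example() {
        ToyIndex index = new ToyIndex();
        index.add(1, "big old map");
        index.add(2, "old house with a very big garden");
        index.add(3, "old map");
        return index;
    }

    public static void main(String[] args) {
        ToyIndex index = example();
        System.out.println(index.and("old", "big"));                     // documents containing both terms
        System.out.println(index.near("old", "big", 1));                 // only adjacent occurrences
        System.out.println(index.near("old", "big", Integer.MAX_VALUE)); // behaves like and()
        System.out.println(index.positionPairsCompared);                 // work done on positions
    }
}
```

The counter is there only to make the point about access cost; a real implementation would merge sorted position lists instead of comparing every pair.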
Re: Proximity Query Parser
Eric also gave me the idea of using a SpanNear with maximum slop as a boolean to connect spans. Using this and SpanOr seems to make my time spent on the distribution of proximity clauses a little foolish :) Is that true? Is there any disadvantage to the max slop SpanNear, SpanOr solution? Any advantage to distributing the 'and's?

- Mark
Re: Proximity Query Parser
Thanks for the tip, Paul. It is embarrassing, but I only realized how OrSpan queries worked a day or two ago, based on a tip from Eric. The way I had assumed they would create the spans was just wrong, and I had never researched further. Now I see that it would be a nice optimization for what I have, but I have not yet looked into how easy it will be to integrate into my distribution algorithm. I do use it for multiphrase queries, however, based on Eric's tip. It will hopefully be pretty simple to apply to my distribution, but I have not had time to check it out.

I plowed this thing out pretty quickly and am hoping I can go back and clean up a lot of things. I need a short break, though, to pump out some other things. As I learn more about Lucene and JavaCC I will incorporate new methods into the parser.

- Mark
Re: Proximity Query Parser
On Friday 01 September 2006 12:54, Mark Miller wrote:
> Hi Paul,
>
> I also have to treat things differently depending on if I am in a
> proximity clause or boolean clause. A wildcard in a boolean is mapped to
> a wildcard query. A wildcard in a proximity is mapped to a regex span
> that has been modified to only deal with * and ?. When I run into a
> proximity, I collect a small tree of each clause and distribute them
> against each other...(old | map) ~3 big gets distributed to old ~3 big |
> map ~3 big. This distribution method appears to handle all

There is no need to repeat "big". SpanQueries can be nested, so when mapping like this:

SpanNear(SpanOr(old, map), big)

the query structure will only grow for truncations and fuzzy stuff.

> boolean/proximity nesting/mixing cases for me, including: great ! "big
> old phrase search" ~5 (holy ~4 (big black bear)). The distribution
> maintains order of operations, but also obviously can create some pretty
> large queries.

Regards,
Paul Elschot
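To make the size difference between the two mappings concrete, here is a small plain-Java sketch. The node classes (`Term`, `Or`, `Near`) and the `distribute` method are hypothetical stand-ins that only mirror the shape of the query tree. They do not use Lucene's span query classes, and `distribute` is a guess at the general idea of the rewrite, not Qsol's actual algorithm. It only handles an Or on the left-hand side.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical query-tree nodes that only mirror the shape of nested span queries. */
final class ProximityTreeSketch {

    interface Node {
        /** Number of nodes in this subtree. */
        int size();
    }

    static final class Term implements Node {
        final String text;
        Term(String text) { this.text = text; }
        public int size() { return 1; }
        @Override public String toString() { return text; }
    }

    static final class Or implements Node {
        final List<Node> clauses;
        Or(List<Node> clauses) { this.clauses = clauses; }
        public int size() {
            int n = 1;
            for (Node clause : clauses) {
                n += clause.size();
            }
            return n;
        }
        @Override public String toString() { return "Or" + clauses; }
    }

    static final class Near implements Node {
        final Node left;
        final Node right;
        final int slop;
        Near(Node left, Node right, int slop) {
            this.left = left;
            this.right = right;
            this.slop = slop;
        }
        public int size() { return 1 + left.size() + right.size(); }
        @Override public String toString() {
            return "Near(" + left + ", " + right + ", " + slop + ")";
        }
    }

    /** Rewrites Near(Or(a, b, ...), right) into Or(Near(a, right), Near(b, right), ...). */
    static Node distribute(Near near) {
        if (!(near.left instanceof Or)) {
            return near; // nothing to distribute in this simplified version
        }
        List<Node> alternatives = new ArrayList<>();
        for (Node alternative : ((Or) near.left).clauses) {
            alternatives.add(new Near(alternative, near.right, near.slop));
        }
        return new Or(alternatives);
    }

    /** The nested form of (old | map) ~3 big. */
    static Near example() {
        Node oldOrMap = new Or(Arrays.<Node>asList(new Term("old"), new Term("map")));
        return new Near(oldOrMap, new Term("big"), 3);
    }

    public static void main(String[] args) {
        Near nested = example();
        Node distributed = distribute(nested);
        System.out.println(nested + " has " + nested.size() + " nodes");
        System.out.println(distributed + " has " + distributed.size() + " nodes");
    }
}
```

In the distributed form the right-hand clause is copied once per alternative, so the tree grows with the number of alternatives, while the nested form keeps a single copy.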
Re: Proximity Query Parser
Hi Paul,

I'm afraid my programming is probably quite a way behind yours, so I doubt anything I have done will be of any help to you.

I also have to treat things differently depending on whether I am in a proximity clause or a boolean clause. A wildcard in a boolean clause is mapped to a wildcard query. A wildcard in a proximity clause is mapped to a regex span that has been modified to only deal with * and ?. When I run into a proximity, I collect a small tree of each clause and distribute them against each other: (old | map) ~3 big gets distributed to old ~3 big | map ~3 big. This distribution method appears to handle all boolean/proximity nesting/mixing cases for me, including: great ! "big old phrase search" ~5 (holy ~4 (big black bear)). The distribution maintains order of operations, but obviously it can also create some pretty large queries.

I did not use the phrase search because I do not like how the slop works (not in order, etc.), so both inside and outside a proximity clause I use a nearspan instead. For a multiphrase search I use an OrSpan on words in the same position.

- Mark
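The wildcard handling mentioned above, a regex restricted to * and ?, can be sketched in plain Java. This is an illustration only and not Qsol's actual code: the name `WildcardSketch.toRegex` is made up, and it assumes that `*` stands for any run of characters, `?` for exactly one character, and that everything else is matched literally.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: turns a wildcard term into a regex that understands only * and ?. */
final class WildcardSketch {

    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*' || c == '?') {
                flushLiteral(literal, regex);
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        flushLiteral(literal, regex);
        return Pattern.compile(regex.toString());
    }

    /** Quotes the pending literal run so that characters like '.' or '(' lose their regex meaning. */
    private static void flushLiteral(StringBuilder literal, StringBuilder regex) {
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
            literal.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(toRegex("ma?").matcher("map").matches());
        System.out.println(toRegex("b*r").matcher("bear").matches());
        System.out.println(toRegex("a.c").matcher("abc").matches());
    }
}
```

Quoting the literal runs is what keeps the pattern limited to the two wildcard characters; without it, a term containing `.` or `+` would silently behave like a regex.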
Re: Proximity Query Parser
Mark,

On Thursday 31 August 2006 23:18, Mark Miller wrote:
> I am not a huge fan of the queryparser's syntax so I have started an
> open source project to create a viable alternative. I could really use
> some help testing it out. The more I can get it tested the better
> chance it has of serving the community. The parser is called Qsol. I am
> right up against its initial release. So far it:
>
> offers a simple clean syntax.
> allows arbitrary combinations/nesting of proximity and boolean queries.

Could you say in a few words how the combination of proximity and boolean is implemented in Qsol? I found this the most difficult thing to implement in surround. In surround, every subquery that can be a proximity subquery has two (groups of) methods: one for use as boolean and one for use as proximity.

I'd like to have a mechanism that allows mixing proximity and boolean queries built into Lucene.

Did you also implement parsed phrases with Lucene's PhraseQuery? Surround does not have that.

Regards,
Paul Elschot