Re: Query: A ? B
Actually, a slop of 1 does guarantee order: the match is either exact or one term off. It takes a slop of 2 or greater to match terms in reverse order. But a slop of 1 does not mean exactly one term off, which is what Jochen wants. *shrug*

Erik

On Mar 4, 2004, at 6:22 PM, Otis Gospodnetic wrote:
> Ah, sorry, I had misread your email, thinking you were asking for a way
> to match a single character. The only thing that comes to my tired mind
> now is a phrase query with a slop of 1, but that doesn't guarantee
> order, I believe.
>
> Otis

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
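For readers following the slop discussion, here is a minimal sketch in plain Java (no Lucene dependency) of one way to reason about slop for a two-term phrase "A B": subtract each term's offset in the query from its position in the document, then take the spread of the results. The class name SlopModel and its methods are made up for illustration, and the model is an assumption about how sloppy matching behaves, not Lucene's actual scorer code. Under that assumption it agrees with Erik's description above.

```java
// Simplified model of sloppy phrase matching for the two-term phrase "A B".
// Assumption: the distance of a candidate match is the spread between the
// offset-adjusted positions of the two terms. This is a sketch, not Lucene code.
final class SlopModel {
    // posA and posB are the token positions of A and B in the document.
    // In the query, A sits at offset 0 and B at offset 1.
    static int distance(int posA, int posB) {
        int adjustedA = posA - 0;
        int adjustedB = posB - 1;
        return Math.abs(adjustedA - adjustedB);
    }

    // A candidate matches when its distance does not exceed the slop.
    static boolean matches(int posA, int posB, int slop) {
        return distance(posA, posB) <= slop;
    }

    public static void main(String[] args) {
        System.out.println(distance(0, 1)); // "A B": adjacent, in order -> 0
        System.out.println(distance(0, 2)); // "A x B": one token between -> 1
        System.out.println(distance(1, 0)); // "B A": reversed -> 2
    }
}
```

With a slop of 1, both "A B" and "A x B" match in this model, which is why slop alone cannot express "exactly one token in between".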
RE: Query: A ? B
Ah, sorry, I had misread your email, thinking you were asking for a way to match a single character. The only thing that comes to my tired mind now is a phrase query with a slop of 1, but that doesn't guarantee order, I believe.

Otis

--- Jochen Frey <[EMAIL PROTECTED]> wrote:
> Otis:
>
> Maybe I don't understand this right, but I *think* I am looking for
> something different.
>
> I am trying to write a query like "my * house", which should match
> "my own house", "my red house", and "my small house", but should not
> match "my house" ... you get the idea.
>
> If I am not mistaken, a wildcard query only works if the wildcard is
> within a word (or token), so it would let me do things like "g*"
> matching "green", "great", etc. I don't know how to make that work for
> multi-word scenarios.
>
> Here is what I tried with WildcardQuery in the unit test (TestBasics):
>
>     Query query = new WildcardQuery(new Term("field", "six hundred * five"));
>
> Thanks!
> Jochen
RE: Query: A ? B
I think I know my way around the Span feature reasonably well ... and I don't think it can be used for what I want to do. But I would love to be proven wrong on this one. :)

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 04, 2004 1:52 PM
To: Lucene Users List
Subject: Re: Query: A ? B

Right, Otis was confused by what you were asking. Google supports what you are asking for, I believe, although I don't recall whether a '*' there means one or more terms or just one.

As far as I know, there is no easy way to require an exact distance like you want. You could always clone the PhraseQuery code into a custom Query that uses an == instead of a < for the slop. You would also need to tweak it to disallow reversed terms, since slop handles terms that are out of order too.

Maybe the new span feature can do this?

Erik
Re: Query: A ? B
Right, Otis was confused by what you were asking. Google supports what you are asking for, I believe, although I don't recall whether a '*' there means one or more terms or just one.

As far as I know, there is no easy way to require an exact distance like you want. You could always clone the PhraseQuery code into a custom Query that uses an == instead of a < for the slop. You would also need to tweak it to disallow reversed terms, since slop handles terms that are out of order too.

Maybe the new span feature can do this?

Erik

On Mar 4, 2004, at 4:29 PM, Jochen Frey wrote:
> Otis:
>
> Maybe I don't understand this right, but I *think* I am looking for
> something different.
>
> I am trying to write a query like "my * house", which should match
> "my own house", "my red house", and "my small house", but should not
> match "my house" ... you get the idea.
>
> If I am not mistaken, a wildcard query only works if the wildcard is
> within a word (or token), so it would let me do things like "g*"
> matching "green", "great", etc. I don't know how to make that work for
> multi-word scenarios.
>
> Here is what I tried with WildcardQuery in the unit test (TestBasics):
>
>     Query query = new WildcardQuery(new Term("field", "six hundred * five"));
>
> Thanks!
> Jochen
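Erik's suggestion (compare with == rather than <, and disallow reversed terms) can be illustrated outside Lucene as a plain scan over token positions. The sketch below only shows the matching rule, assuming simple whitespace tokenization; ExactGapMatcher is a hypothetical name and this is not a Lucene Query implementation.

```java
// Sketch of an "exact distance, in order" match over a whitespace-tokenized text.
// Assumption: tokens are separated by whitespace and compared case-sensitively.
final class ExactGapMatcher {
    // Returns true if some occurrence of `first` is followed by `second`
    // with exactly `gap` tokens in between, in that order.
    static boolean matches(String text, String first, String second, int gap) {
        if (gap < 0) {
            throw new IllegalArgumentException("gap must be >= 0");
        }
        String[] tokens = text.trim().split("\\s+");
        for (int i = 0; i + gap + 1 < tokens.length; i++) {
            // Equality on the distance (not "at most"), and first must come before second.
            if (tokens[i].equals(first) && tokens[i + gap + 1].equals(second)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matches("my own house", "my", "house", 1)); // true
        System.out.println(matches("my house", "my", "house", 1));     // false
        System.out.println(matches("house own my", "my", "house", 1)); // false
    }
}
```

A real custom Query would apply the same equality test to term positions read from the index instead of rescanning the text.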
RE: Query: A ? B
Otis:

Maybe I don't understand this right, but I *think* I am looking for something different.

I am trying to write a query like "my * house", which should match "my own house", "my red house", and "my small house", but should not match "my house" ... you get the idea.

If I am not mistaken, a wildcard query only works if the wildcard is within a word (or token), so it would let me do things like "g*" matching "green", "great", etc. I don't know how to make that work for multi-word scenarios.

Here is what I tried with WildcardQuery in the unit test (TestBasics):

    Query query = new WildcardQuery(new Term("field", "six hundred * five"));

Thanks!
Jochen

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 04, 2004 12:00 PM
To: Lucene Users List
Subject: Re: Query: A ? B

Use WildcardQuery: A?B

Otis
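To illustrate why the WildcardQuery attempt above cannot work across words, here is a small sketch of wildcard matching against individual indexed terms. It assumes the analyzer splits text on whitespace, so no indexed term contains a space; TermWildcard is a hypothetical helper written for this note, where '*' stands for any run of characters and '?' for exactly one character.

```java
import java.util.regex.Pattern;

// Sketch: a wildcard pattern is tested against each indexed term on its own.
// Assumption: terms were produced by whitespace tokenization (no spaces inside a term).
final class TermWildcard {
    // Translate a wildcard pattern into a regular expression.
    static Pattern compile(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    // True if at least one single term matches the whole pattern.
    static boolean anyTermMatches(String wildcard, String[] indexedTerms) {
        Pattern pattern = compile(wildcard);
        for (String term : indexedTerms) {
            if (pattern.matcher(term).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] terms = {"six", "hundred", "twenty", "five", "green", "great"};
        System.out.println(anyTermMatches("g*", terms));                 // true
        System.out.println(anyTermMatches("six hundred * five", terms)); // false
    }
}
```

Because the pattern "six hundred * five" contains spaces and each term is a single word, no term can satisfy it, which matches Jochen's observation that wildcards only help within a token.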
Re: Query: A ? B
Use WildcardQuery: A?B

Otis

--- Jochen Frey <[EMAIL PROTECTED]> wrote:
> Hi Everyone.
>
> I am trying to figure out how to create a query that matches
>
>     A ? B
>
> where ? is exactly one token. Can anyone tell me how to do that?
>
> Obviously it's easy to match 'A * B' where '*' is 0 or 1 tokens (just
> use a PhraseQuery and set the slop to 1). But what if I require exactly
> one word/token between 'A' and 'B'?
>
> BTW, I know a very clumsy way of doing this, but I really don't like
> it: for each indexed token, insert an extra token (for example 'X') at
> the same token position. Then the query would be "A X B", and everybody
> (except the indexing performance and the size on disk) would be happy.
>
> There's got to be an easier way. Right?
>
> Thanks in advance!
> Jochen
Query: A ? B
Hi Everyone.

I am trying to figure out how to create a query that matches

    A ? B

where ? is exactly one token. Can anyone tell me how to do that?

Obviously it's easy to match 'A * B' where '*' is 0 or 1 tokens (just use a PhraseQuery and set the slop to 1). But what if I require exactly one word/token between 'A' and 'B'?

BTW, I know a very clumsy way of doing this, but I really don't like it: for each indexed token, insert an extra token (for example 'X') at the same token position. Then the query would be "A X B", and everybody (except the indexing performance and the size on disk) would be happy.

There's got to be an easier way. Right?

Thanks in advance!
Jochen
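The clumsy workaround described above can be sketched in plain Java as a toy positional index in which every position holds the real token plus a placeholder. PlaceholderIndex, its methods, and the choice of "X" as the placeholder are assumptions made for illustration (a real index would need a placeholder that can never occur as a token); this is not Lucene code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy positional index: each position stores the real token and a placeholder,
// so an exact phrase "A X B" matches A, any one token, then B.
final class PlaceholderIndex {
    // Hypothetical placeholder; assumed never to appear as a real token.
    static final String PLACEHOLDER = "X";

    // positions.get(i) holds every token indexed at position i.
    static List<Set<String>> index(String text) {
        List<Set<String>> positions = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            Set<String> atPosition = new HashSet<>();
            atPosition.add(token);
            atPosition.add(PLACEHOLDER); // doubles the number of postings
            positions.add(atPosition);
        }
        return positions;
    }

    // Exact phrase match (slop 0): phrase[k] must be present at position start + k.
    static boolean matchesPhrase(List<Set<String>> positions, String... phrase) {
        for (int start = 0; start + phrase.length <= positions.size(); start++) {
            boolean allPresent = true;
            for (int k = 0; k < phrase.length; k++) {
                if (!positions.get(start + k).contains(phrase[k])) {
                    allPresent = false;
                    break;
                }
            }
            if (allPresent) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matchesPhrase(index("my own house"), "my", "X", "house")); // true
        System.out.println(matchesPhrase(index("my house"), "my", "X", "house"));     // false
    }
}
```

The extra entry at every position is exactly the cost mentioned above: indexing work and index size both grow, which is why the approach feels clumsy.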