Re: Optional Terms in a single query
Todd VanderVeen wrote:
> I would be careful using wildcards as proposed. They can be inefficient
> (particularly in a list of disjunctions) but even more importantly you
> are excluding more than the 3 names. Your results won't be consistent
> with your intent.

In the new version of Luke (the tool) you can view how your wildcard query is rewritten into boolean queries. This should help to catch those cases where wildcard queries match unwanted terms.

--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
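To make the warning about over-matching concrete, here is a minimal plain-Java sketch (no Lucene involved) of why a pattern such as *tim* excludes more than the one intended name. The class WildcardOvermatch, its helper methods, and the sample names are all hypothetical; the code only mimics the * and ? semantics discussed in this thread and is not Lucene's WildcardQuery implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class WildcardOvermatch {

    // Translate a simple wildcard pattern (* = any run of characters,
    // ? = exactly one character) into a regular expression.
    static Pattern globToRegex(String glob) {
        StringBuilder sb = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    // Return every term that the wildcard pattern matches in full.
    static List<String> matchingTerms(String glob, List<String> terms) {
        Pattern p = globToRegex(glob);
        List<String> hits = new ArrayList<>();
        for (String t : terms) {
            if (p.matcher(t).matches()) {
                hits.add(t);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical contents of the "name" field in an index.
        List<String> names =
            Arrays.asList("tim", "timothy", "optimus", "bill", "harry", "mario");
        // The exact term hits one name; the wildcard also hits the longer ones.
        System.out.println(matchingTerms("tim", names));
        System.out.println(matchingTerms("*tim*", names));
    }
}
```

In this toy data a prohibited clause on *tim* would also throw out documents named timothy and optimus, which is the inconsistency with the stated intent.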
Re: Optional Terms in a single query
Luke Shannon wrote:
> Hi Tod;
>
> Thanks for your help. I was able to do what you said but in a much
> uglier way using a Boolean Query and adding Wildcard Queries.
>
> The end result looks like this:
>
> The query: +(type:138) +((-name:*tim* -name:*bill* -name:*harry* +olfaithfull:stillhere))
>
> But this one works as expected.
>
> Thanks!
>
> Luke

I would be careful using wildcards as proposed. They can be inefficient (particularly in a list of disjunctions) but even more importantly you are excluding more than the 3 names. Your results won't be consistent with your intent.

Cheers,
Todd VanderVeen
Re: Optional Terms in a single query
Hi Tod;

Thanks for your help. I was able to do what you said but in a much uglier way using a Boolean Query and adding Wildcard Queries.

The end result looks like this:

The query: +(type:138) +((-name:*tim* -name:*bill* -name:*harry* +olfaithfull:stillhere))

But this one works as expected.

Thanks!

Luke
Re: Optional Terms in a single query
Luke Shannon wrote:
> The API I'm working with combines a series of queries into one larger one
> using a boolean query.
>
> Queries on the same field get OR'd into one big query. All remaining queries
> are AND'd with this big one.
>
> Working within this system I have:
>
> arg = (mario luigi bobby joe) //i do have control of how this list is created
>
> I pass this to the QueryParser:
>
> Query query1 = QueryParser.parse(arg, "name", new StandardAnalyzer());
> Query query2 = QueryParser.parse("stillhere", "olfaithfull", new StandardAnalyzer());
> BooleanQuery typeNegativeSearch = new BooleanQuery();
> typeNegativeSearch.add(query1, false, true);
> typeNegativeSearch.add(query2, true, false);
>
> This is half the query.
>
> It gets AND'd with the other half, to create what you see below:
>
> +(type:181) +((-(name:tim name:harry name:bill) +olfaithfull:stillhere))
>
> What I am having trouble with is getting the QueryParser to create
> this: -name:(tim bill harry)
>
> I feel like this is something simple, but for some reason I can't figure it
> out.
>
> Thanks,
>
> Luke

Is the API something you control?

Let's call the other half of your query query3. To avoid the extra nesting you need to do the composition in a single boolean query.

    Query query1 = QueryParser.parse(arg, "name", new StandardAnalyzer());
    Query query2 = QueryParser.parse("stillhere", "olfaithfull", new StandardAnalyzer());
    Query query3 =

    BooleanQuery finalQuery = new BooleanQuery();
    finalQuery.add(query1, false, true);
    finalQuery.add(query2, true, false);
    finalQuery.add(query3, true, false);

Cheers,
Todd VanderVeen
Re: Optional Terms in a single query
The API I'm working with combines a series of queries into one larger one using a boolean query.

Queries on the same field get OR'd into one big query. All remaining queries are AND'd with this big one.

Working within this system I have:

    arg = (mario luigi bobby joe) //i do have control of how this list is created

I pass this to the QueryParser:

    Query query1 = QueryParser.parse(arg, "name", new StandardAnalyzer());
    Query query2 = QueryParser.parse("stillhere", "olfaithfull", new StandardAnalyzer());
    BooleanQuery typeNegativeSearch = new BooleanQuery();
    typeNegativeSearch.add(query1, false, true);
    typeNegativeSearch.add(query2, true, false);

This is half the query.

It gets AND'd with the other half, to create what you see below:

    +(type:181) +((-(name:tim name:harry name:bill) +olfaithfull:stillhere))

What I am having trouble with is getting the QueryParser to create this:

    -name:(tim bill harry)

I feel like this is something simple, but for some reason I can't figure it out.

Thanks,

Luke
Re: Optional Terms in a single query
Sorry about the typos.

What I would like is a document with a type field = 181, olfaithfull=stillHere and a name field not containing tim, bill or harry.

Thanks,

Luke
Re: Optional Terms in a single query
Luke Shannon wrote:
> Hi;
>
> I'm trying to create a query that looks for a field containing type:181 and
> name doesn't contain tim, bill or harry.
>
> +(type: 181) +((-name: tim -name:bill -name:harry +oldfaith:stillHere))
> +(type: 181) +((-name: tim OR bill OR harry +oldfaith:stillHere))
> +(type: 181) +((-name:*(tim bill harry)* +olfaithfull:stillhere))
> +(type:1 81) +((-name:*(tim OR bill OR harry)* +olfaithfull:stillhere))
>
> I would really like to do this all in one Query. Is this even possible?
>
> Thanks,
>
> Luke

Are all the queries listed attempts at the same thing?

I'm guessing you want this:

    +type:181 -name:(tim bill harry) +oldfaith:stillHere
Re: Optional Terms in a single query
On Monday 21 February 2005 23:23, Luke Shannon wrote:
> Hi;
>
> I'm trying to create a query that looks for a field containing type:181 and
> name doesn't contain tim, bill or harry.

type: 181 -(name: tim name:bill name:harry)

> +(type: 181) +((-name: tim -name:bill -name:harry +oldfaith:stillHere))

stillHere is normally lowercased before searching. Is that ok?

> +(type: 181) +((-name: tim OR bill OR harry +oldfaith:stillHere))
> +(type: 181) +((-name:*(tim bill harry)* +olfaithfull:stillhere))

typo? olfaithfull

> +(type:1 81) +((-name:*(tim OR bill OR harry)* +olfaithfull:stillhere))

typo? (type:1 81)

> I would really like to do this all in one Query. Is this even possible?

How would you want to combine the results?

Regards,
Paul Elschot
Optional Terms in a single query
Hi;

I'm trying to create a query that looks for a field containing type:181 and name doesn't contain tim, bill or harry.

+(type: 181) +((-name: tim -name:bill -name:harry +oldfaith:stillHere))
+(type: 181) +((-name: tim OR bill OR harry +oldfaith:stillHere))
+(type: 181) +((-name:*(tim bill harry)* +olfaithfull:stillhere))
+(type:1 81) +((-name:*(tim OR bill OR harry)* +olfaithfull:stillhere))

I would really like to do this all in one Query. Is this even possible?

Thanks,

Luke
Re: Query Tuning
On Monday 21 February 2005 20:43, Todd VanderVeen wrote:
> Runde, Kevin wrote:
>
> >Hi All,
> >
> >How does Lucene handle multi term queries? Does it use short circuiting?
> >So if a user entered:
> >(a OR b) AND c
> >But my program knew testing for "c" is cheaper than testing for "(a OR
> >b)" and I rewrote the query as:
> >c AND (a OR b)
> >Would the query run faster?
> >
> >Sorry if this has already been answered, but for some reason the Archive
> >search is not working for me today.
> >
> >Thanks,
> >Kevin
>
> Not sure about what is in CVS, but look at BooleanQuery.scorer(). If all

It's in svn nowadays.

> of the clauses of the BooleanQuery are required and none of the clauses
> are BooleanQueries a ConjunctionScorer is returned that offers the
> optimizations you seek. In the example you gave, there is a clause that
> is boolean ( a or b) that will have to be evaluated independently with a
> boolean scorer. This will be performed regardless of the ordering.
> (BooleanScorer doesn't preserve document order when it returns results
> and hence it can't utilize the optimal algorithm provided by
> ConjunctionScorer). Others have been down this path as evidenced by the
> sigh in the javadoc.

In the svn version a ConjunctionScorer is used for all top level AND queries.

> If calculating (a or b) is expensive and the docFreq of a is much less
> than the union of a and b, you might consider rewriting it to (a and c)
> or (b and c) using DeMorgan's law. Expansion like this isn't always
> beneficial and can't be applied blindly. As far as I can tell there is

In the svn version the subquery (a or b) is only evaluated for documents matching c. In the current version the expansion to (a and c) or (b and c) might help: the tradeoff is between evaluating c twice and having less work for the OR operator.

> no query planning/optimization aside from the merging of related clauses
> and attempts to rewrite to simpler queries.

One optimization in the current version is the use of ConjunctionScorer for some cases. One such case, which happens a lot in practice, is a query that has a few required terms.

Another optimization in the current version is that some scoring is done ahead for each clause into an unordered buffer. This helps for top level OR queries, but loses for OR queries that are subqueries of AND.

The svn version does not score ahead. It relies on the buffering done by TermScorer. Perhaps the buffering for a TermScorer should be made dependent on its expected use: more buffering for top level OR, less buffering when used under AND.

Regards,
Paul Elschot
Re: Query Tuning
Runde, Kevin wrote:
> Hi All,
>
> How does Lucene handle multi term queries? Does it use short circuiting?
> So if a user entered:
> (a OR b) AND c
> But my program knew testing for "c" is cheaper than testing for "(a OR
> b)" and I rewrote the query as:
> c AND (a OR b)
> Would the query run faster?
>
> Sorry if this has already been answered, but for some reason the Archive
> search is not working for me today.
>
> Thanks,
> Kevin

Not sure about what is in CVS, but look at BooleanQuery.scorer(). If all of the clauses of the BooleanQuery are required and none of the clauses are BooleanQueries a ConjunctionScorer is returned that offers the optimizations you seek. In the example you gave, there is a clause that is boolean ( a or b) that will have to be evaluated independently with a boolean scorer. This will be performed regardless of the ordering. (BooleanScorer doesn't preserve document order when it returns results and hence it can't utilize the optimal algorithm provided by ConjunctionScorer). Others have been down this path as evidenced by the sigh in the javadoc.

If calculating (a or b) is expensive and the docFreq of a is much less than the union of a and b, you might consider rewriting it to (a and c) or (b and c) using DeMorgan's law. Expansion like this isn't always beneficial and can't be applied blindly. As far as I can tell there is no query planning/optimization aside from the merging of related clauses and attempts to rewrite to simpler queries.

Cheers,
Todd VanderVeen
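The rewrite from (a or b) and c to (a and c) or (b and c) can be checked on plain sets of document ids. The following is only a sketch under stated assumptions: the class DistributiveRewrite, its methods, and the sample ids are hypothetical, and nothing here touches Lucene's scorers. Strictly speaking the identity involved is the distributive law of AND over OR rather than De Morgan's law.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

class DistributiveRewrite {

    // Documents present in both sets.
    static Set<Integer> and(Set<Integer> x, Set<Integer> y) {
        Set<Integer> r = new TreeSet<>(x);
        r.retainAll(y);
        return r;
    }

    // Documents present in either set.
    static Set<Integer> or(Set<Integer> x, Set<Integer> y) {
        Set<Integer> r = new TreeSet<>(x);
        r.addAll(y);
        return r;
    }

    // (a OR b) AND c: the full union is built before c is applied.
    static Set<Integer> original(Set<Integer> a, Set<Integer> b, Set<Integer> c) {
        return and(or(a, b), c);
    }

    // (a AND c) OR (b AND c): c is consulted twice, but the OR sees smaller inputs.
    static Set<Integer> expanded(Set<Integer> a, Set<Integer> b, Set<Integer> c) {
        return or(and(a, c), and(b, c));
    }

    // Convenience constructor for a sorted set of document ids.
    static Set<Integer> docs(Integer... ids) {
        return new TreeSet<>(Arrays.asList(ids));
    }

    public static void main(String[] args) {
        // Hypothetical posting sets for the terms a, b and c.
        Set<Integer> a = docs(1, 4, 7);
        Set<Integer> b = docs(2, 4, 9, 12);
        Set<Integer> c = docs(4, 9, 20);
        System.out.println(original(a, b, c));
        System.out.println(expanded(a, b, c));
    }
}
```

Both forms return the same documents; which one is cheaper depends on the relative sizes of the sets, which is the tradeoff between evaluating c twice and giving the OR less work.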
Re: Query Tuning
On Monday 21 February 2005 19:59, Runde, Kevin wrote:
> Hi All,
>
> How does Lucene handle multi term queries? Does it use short circuiting?
> So if a user entered:
> (a OR b) AND c
> But my program knew testing for "c" is cheaper than testing for "(a OR
> b)" and I rewrote the query as:
> c AND (a OR b)
> Would the query run faster?

Exchanging the operands of AND would not make a noticeable difference in speed. Queries are evaluated by iterating the inverted term index entries for all query terms in parallel, with buffering.

Regards,
Paul Elschot
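As a rough illustration of "iterating the index entries in parallel", the sketch below merges sorted lists of document ids with one cursor per list. It is a toy model only: the class PostingsIntersection, its methods, and the sample postings are hypothetical, and Lucene's actual scorers and buffering are more involved. It does show why swapping the operands of AND changes neither the result nor the amount of stepping.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PostingsIntersection {

    // AND: walk two ascending doc-id lists together, keeping ids found in both.
    static List<Integer> intersect(int[] x, int[] y) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < x.length && j < y.length) {
            if (x[i] == y[j]) {
                out.add(x[i]);
                i++;
                j++;
            } else if (x[i] < y[j]) {
                i++; // x is behind, advance it
            } else {
                j++; // y is behind, advance it
            }
        }
        return out;
    }

    // OR: merge two ascending doc-id lists into one ascending list without duplicates.
    static int[] union(int[] x, int[] y) {
        int[] tmp = new int[x.length + y.length];
        int i = 0, j = 0, n = 0;
        while (i < x.length || j < y.length) {
            int next;
            if (j >= y.length || (i < x.length && x[i] < y[j])) {
                next = x[i++];
            } else if (i >= x.length || y[j] < x[i]) {
                next = y[j++];
            } else { // same id in both lists: emit it once
                next = x[i++];
                j++;
            }
            tmp[n++] = next;
        }
        return Arrays.copyOf(tmp, n);
    }

    public static void main(String[] args) {
        // Hypothetical postings for the terms a, b and c.
        int[] a = {1, 4, 7};
        int[] b = {2, 4, 9, 12};
        int[] c = {4, 9, 20};
        // (a OR b) AND c, then c AND (a OR b): the same merge either way.
        System.out.println(intersect(union(a, b), c));
        System.out.println(intersect(c, union(a, b)));
    }
}
```

Because both cursors advance in step, the loop is symmetric in its two inputs, so there is no short-circuiting for operand order to exploit.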
Query Tuning
Hi All,

How does Lucene handle multi term queries? Does it use short circuiting?

So if a user entered:

    (a OR b) AND c

But my program knew testing for "c" is cheaper than testing for "(a OR b)" and I rewrote the query as:

    c AND (a OR b)

Would the query run faster?

Sorry if this has already been answered, but for some reason the Archive search is not working for me today.

Thanks,
Kevin
RE: Using the highlighter from the sandbox with a prefix query.
Thank you, this helped a lot...

Michael Celona
RE: Using the highlighter from the sandbox with a prefix query.
> One thing to mention that I am using a
> MultiSearcher to rewrite the queries. I tried...

Ah. I remember this got a little ugly. The highlighter has a JUnit test that demonstrates highlighting fuzzy queries when using a MultiSearcher. Take a look at that. I can't remember the ins and outs of the issues but I know the code there still runs clean with the latest versions.

Cheers
Mark.
Re: Using the highlighter from the sandbox with a prefix query.
On Feb 21, 2005, at 10:53 AM, Michael Celona wrote:
> That's the only stack I get. One thing to mention is that I am using a
> MultiSearcher to rewrite the queries. I tried...
>
> query = searcher_last.rewrite( query );
> query = searcher_cur.rewrite( query );
>
> using IndexSearcher and I don't get an error... However, I am not able to
> highlight wildcard queries.

I use Highlighter for lucenebook.com and have two indexes that I search with MultiSearcher. Here's how I highlight:

    IndexReader reader = readers[indexIndex];
    QueryScorer scorer = new QueryScorer(query.rewrite(reader));
    SimpleHTMLFormatter formatter = new SimpleHTMLFormatter("", "");
    Highlighter highlighter = new Highlighter(formatter, scorer);

I get the appropriate IndexReader for the document being highlighted. You can get the index _index_ this way:

    int indexIndex = searcher.subSearcher(hits.id(position));

Hope this helps.

    Erik
RE: Using the highlighter from the sandbox with a prefix query.
That's the only stack I get. One thing to mention is that I am using a MultiSearcher to rewrite the queries. I tried...

    query = searcher_last.rewrite( query );
    query = searcher_cur.rewrite( query );

using IndexSearcher and I don't get an error... However, I am not able to highlight wildcard queries.

Michael
Re: Using the highlighter from the sandbox with a prefix query.
On Feb 21, 2005, at 10:20 AM, Michael Celona wrote:
> I am using
> query = searcher.rewrite( query );
>
> and it is throwing java.lang.UnsupportedOperationException .
>
> Am I able to use the searcher rewrite method like this?

What's the full stack trace?

    Erik
RE: Using the highlighter from the sandbox with a prefix query.
I am using

    query = searcher.rewrite( query );

and it is throwing java.lang.UnsupportedOperationException.

Am I able to use the searcher rewrite method like this?

Thanks,
Michael
Re: Query Question
Thanks Erik. Option 2 sounds like the path of least resistance.

Luke
Re: select where from query type in lucene
Miles Barr writes:
> On Fri, 2005-02-18 at 03:58 +0100, Miro Max wrote:
> > how can i search for content where type=document or
> > (type=document OR type=view).
> > actually i can do it with: "(type:document OR
> > type:entry) AND queryText" as QueryString.
> > but does exist any other better way to realize this?
> [...]
>
> Another alternative is to put each type in its own index and use a
> MultiSearcher to pull in the types you want.

If the change rate of the index and the number of commonly used type combinations aren't too large, cached filters might be another alternative. Of course the filter would have to be recreated whenever the index changes.

The advantage is that you save searching for the types for each query where the filter is reused, while you can keep all documents within one index.

Morus
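The cached-filter idea can be sketched without Lucene as a map from a type combination to a bit set of matching document numbers. Everything below is hypothetical and for illustration only: the class TypeFilterCache, the array standing in for the index, and the indexChanged() hook are assumptions, not the Lucene Filter API.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class TypeFilterCache {

    // Toy stand-in for the index: docTypes[docId] is the "type" field of that document.
    private final String[] docTypes;
    // One cached bit set per combination of types that has been requested.
    private final Map<Set<String>, BitSet> cache = new HashMap<>();
    // Counts how often a filter had to be computed from scratch.
    int builds = 0;

    TypeFilterCache(String[] docTypes) {
        this.docTypes = docTypes;
    }

    // Return the bit set of documents whose type is in the given set,
    // computing it only the first time this combination is seen.
    BitSet filterFor(Set<String> types) {
        return cache.computeIfAbsent(new TreeSet<>(types), key -> {
            builds++;
            BitSet bits = new BitSet(docTypes.length);
            for (int doc = 0; doc < docTypes.length; doc++) {
                if (key.contains(docTypes[doc])) {
                    bits.set(doc);
                }
            }
            return bits;
        });
    }

    // Must be called whenever the index changes: all cached filters become stale.
    void indexChanged() {
        cache.clear();
    }

    public static void main(String[] args) {
        TypeFilterCache filters = new TypeFilterCache(
            new String[] {"document", "document", "view", "view", "dbentry", "dbentry"});
        Set<String> wanted = new TreeSet<>();
        wanted.add("document");
        wanted.add("view");
        System.out.println(filters.filterFor(wanted)); // computed
        System.out.println(filters.filterFor(wanted)); // served from the cache
    }
}
```

The saving is that the type clauses are evaluated once per combination instead of once per search; the cost is the invalidation step, which is why this only pays off when the index changes rarely.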
Re: select where from query type in lucene
On Fri, 2005-02-18 at 03:58 +0100, Miro Max wrote:
> how can i search for content where type=document or
> (type=document OR type=view).
> actually i can do it with: "(type:document OR
> type:entry) AND queryText" as QueryString.
> but does exist any other better way to realize this?

What's wrong with that method? I don't think you can do it any simpler.

Are you concerned about writing a string then having to use the query parser? You could also build it up manually:

    QueryParser parser = ...
    Query text = parser.parse(queryText);

    // Declared as BooleanQuery rather than Query so that add() is available.
    BooleanQuery type = new BooleanQuery();
    type.add(new TermQuery(new Term("type", "document")), false, false);
    type.add(new TermQuery(new Term("type", "view")), false, false);

    BooleanQuery everything = new BooleanQuery();
    everything.add(text, true, false);
    everything.add(type, true, false);

That way you could avoid things in queryText overriding the type check.

Another alternative is to put each type in its own index and use a MultiSearcher to pull in the types you want.

--
Miles Barr <[EMAIL PROTECTED]>
Runtime Collective Ltd.
select where from query type in lucene
Hi,

i've a problem with my classes using lucene.
my index looks like:

type     | content
---------+--------
document | x
document | x
view     | x
view     | x
dbentry  | x
dbentry  | x

my question now:

how can i search for content where type=document or (type=document OR type=view)?
actually i can do it with: "(type:document OR type:entry) AND queryText" as QueryString.
but is there any better way to realize this?

thx

miro
Re: Query Question
On Feb 17, 2005, at 5:51 PM, Luke Shannon wrote:
> My manager is now totally stuck about being able to query data with *
> in it.

He's gonna have to wait a bit longer, you've got a slightly tricky situation on your hands

> WildcardQuery(new Term("name", "*home\**"));

The \* is the problem. WildcardQuery doesn't deal with escaping like you're trying. Your query is essentially this now:

    home\*

Where backslash has no special meaning at all... you're literally looking for all terms that start with home followed by a backslash. Two asterisks at the end really collapse into a single one logically.

> Any theories as to why the it would not match:
>
> Document (relevant fields):
> Keyword
> Keyword
>
> Is the \ escaping both * characters?

So, again, no escaping is being done here. You're a bit stuck in this situation because * (and ?) are special to WildcardQuery, and it does no escaping. Two options I think of:

- Build your own clone of WildcardQuery that does escaping - or perhaps change the wildcard characters to something you do not index and use those instead.

- Replace asterisks in the terms indexed with some other non-wildcard character, then replace it on your queries as appropriate.

    Erik
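The second option above (substituting literal asterisks before indexing and in queries) might look roughly like the following plain-Java sketch. All names are hypothetical: AsteriskEscaping, PLACEHOLDER, and both methods are illustrations, the choice of '\u0001' assumes that character never occurs in the real data and survives analysis, and treating \* in user input as a literal asterisk is a convention assumed here, not something WildcardQuery provides.

```java
class AsteriskEscaping {

    // Hypothetical stand-in for a literal asterisk; assumed never to occur in real data.
    static final char PLACEHOLDER = '\u0001';

    // Applied to a term before it is indexed: every literal * becomes the placeholder.
    static String encodeForIndex(String raw) {
        return raw.replace('*', PLACEHOLDER);
    }

    // Applied to the user's pattern before building the wildcard query:
    // "\*" is taken to mean a literal asterisk, a bare "*" remains a wildcard.
    static String encodeForQuery(String userPattern) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < userPattern.length(); i++) {
            char c = userPattern.charAt(i);
            boolean escapedStar = c == '\\'
                && i + 1 < userPattern.length()
                && userPattern.charAt(i + 1) == '*';
            if (escapedStar) {
                sb.append(PLACEHOLDER);
                i++; // skip the asterisk that was escaped
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A name ending in a literal asterisk, as in the thread.
        String indexed = encodeForIndex("home*");
        // The pattern from the thread: wildcard, "home", literal asterisk, wildcard.
        String pattern = encodeForQuery("*home\\**");
        // Show the placeholder as <STAR> so the result is readable on a console.
        System.out.println(indexed.replace(String.valueOf(PLACEHOLDER), "<STAR>"));
        System.out.println(pattern.replace(String.valueOf(PLACEHOLDER), "<STAR>"));
    }
}
```

The encoded pattern would then be handed to the wildcard query, and the placeholder would need to be turned back into an asterisk wherever stored values are displayed.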
Re: Query Question
Hello;

My manager is now totally stuck about being able to query data with * in it.

Here are two queries.

    TermQuery(new Term("type", "203"));
    WildcardQuery(new Term("name", "*home\**"));

They are joined in a boolean query. That query gives this result when you call the toString():

    +(type:203) +(name:*home\**)

This looks right to me.

Any theories as to why it would not match:

Document (relevant fields):
Keyword
Keyword

Is the \ escaping both * characters?

Thanks,

Luke
Re: Query Question
That is a query toString(). I created the Query using a Wildcard Query object.

Luke
Re: Query Question
On Feb 17, 2005, at 2:44 PM, Luke Shannon wrote:

> Hello;
>
> Why won't this query find the document below?
>
> Query:
> +(type:203) +(name:*home\**)

Is that what the query toString is? Or is that what you handed to QueryParser?

Depending on your analyzer, 203 may go away. QueryParser doesn't support leading asterisks, so "*home" would fail to parse.

> Document (relevant fields):
> Keyword
> Keyword
>
> I was hoping by escaping the * it would be treated as a string. What am I
> doing wrong?
Query Question
Hello;

Why won't this query find the document below?

Query:

    +(type:203) +(name:*home\**)

Document (relevant fields): Keyword Keyword

I was hoping by escaping the * it would be treated as a string. What am I doing wrong?

Thanks, Luke
Re: Using the highlighter from the sandbox with a prefix query.
Thanks very much Mark and Daniel. That solved the problem!!

On Thu, 2005-02-17 at 08:55 +, mark harwood wrote:

> See the highlighter's package.html for a description
> of how query.rewrite should be used to solve this.
>
> Cheers,
> Mark
Re: Using the highlighter from the sandbox with a prefix query.
On Thursday 17 February 2005 08:37, lucuser4851 wrote:

> We have been using the highlighter from the lucene sandbox, which works
> very nicely most of the time. However when we try and use it with a
> prefix query (which is what you get having parsed a wild-card query), it
> doesn't return any highlighted sections. Has anyone else experienced
> this problem, or found a way around it?

You need to call rewrite() on the query before you pass it to the highlighter.

Regards
Daniel
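As a rough illustration of why the rewrite step matters: a prefix query names no concrete terms until it is expanded against the index, so anything that inspects the query's terms (as a highlighter does) has nothing to look for. The sketch below simulates that expansion over a sorted set of index terms using only the JDK. PrefixExpansionSketch and expand are hypothetical names, and this is not how Lucene or the sandbox highlighter is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

public class PrefixExpansionSketch {

    // Hypothetical stand-in for "rewriting" a prefix query: list every
    // index term that starts with the prefix. Terms are sorted, so all
    // matches are contiguous starting at the first term >= prefix.
    public static List<String> expand(SortedSet<String> indexTerms, String prefix) {
        List<String> expanded = new ArrayList<>();
        for (String term : indexTerms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // past the block of matching terms
            }
            expanded.add(term);
        }
        return expanded;
    }

    public static void main(String[] args) {
        SortedSet<String> terms =
                new TreeSet<>(Arrays.asList("lucene", "luke", "lucid", "search"));
        // The unexpanded query only knows "luc"; the expanded form
        // carries the concrete terms that can be located in the text.
        System.out.println(expand(terms, "luc")); // [lucene, lucid]
    }
}
```

In Lucene itself the equivalent step is the rewrite() call Daniel mentions; the highlighter's package.html describes how to apply it.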
Re: Using the highlighter from the sandbox with a prefix query.
See the highlighter's package.html for a description of how query.rewrite should be used to solve this.

Cheers,
Mark

--- lucuser4851 <[EMAIL PROTECTED]> wrote:

> Dear All,
> We have been using the highlighter from the lucene
> sandbox, which works very nicely most of the time.
> However when we try and use it with a prefix query
> (which is what you get having parsed a wild-card
> query), it doesn't return any highlighted sections.
> Has anyone else experienced this problem, or found
> a way around it?
>
> Thanks a lot for your suggestions!!
Using the highlighter from the sandbox with a prefix query.
Dear All,

We have been using the highlighter from the lucene sandbox, which works very nicely most of the time. However when we try and use it with a prefix query (which is what you get having parsed a wild-card query), it doesn't return any highlighted sections. Has anyone else experienced this problem, or found a way around it?

Thanks a lot for your suggestions!!
Re: What does [] do to a query and what's up with lucene.apache.org?
Otis and Erik,

Thanks for the info. That's a great reference.

Jim.

Erik Hatcher wrote:

> Jim,
>
> The Lucene website is transitioning to the new top-level space. I have
> checked out the current site to the new lucene.apache.org area and set
> up redirects from the old Jakarta URLs. The source code, though, is not
> an official part of the website. Thanks to our conversion to Subversion,
> though, the source is browsable starting here:
>
> http://svn.apache.org/repos/asf/lucene/java/trunk
>
> The HTML of the website will need link adjustments to get everything
> back in shape.
>
> The brackets are documented here:
> http://lucene.apache.org/queryparsersyntax.html
>
> Erik
Re: What does [] do to a query and what's up with lucene.apache.org?
Jim,

The Lucene website is transitioning to the new top-level space. I have checked out the current site to the new lucene.apache.org area and set up redirects from the old Jakarta URLs. The source code, though, is not an official part of the website. Thanks to our conversion to Subversion, though, the source is browsable starting here:

http://svn.apache.org/repos/asf/lucene/java/trunk

The HTML of the website will need link adjustments to get everything back in shape.

The brackets are documented here: http://lucene.apache.org/queryparsersyntax.html

Erik

On Feb 14, 2005, at 10:31 AM, Jim Lynch wrote:

> First I'm getting a
>
> The requested URL could not be retrieved
>
> While trying to retrieve the URL:
> http://lucene.apache.org/src/test/org/apache/lucene/queryParser/TestQueryParser.java
>
> The following error was encountered:
>
> Unable to determine IP address from host name for lucene.apache.org
>
> Guess the system is down.
>
> I'm getting this error:
>
> org.apache.lucene.queryParser.ParseException: Encountered "is" at line 1, column 15.
> Was expecting:
> "]" ...
>
> when I tried to parse the following string "[this is a test]".
>
> I can't find any documentation that tells me what the brackets do to a
> query. I had a user that was used to another search engine that used []
> to do proximity or near searches and tried it on this one. Actually I'd
> like to see the documentation for what the parser does. All that is
> mentioned in the javadoc is + - and (). Obviously there are more
> special characters.
>
> Thanks,
> Jim.
Re: What does [] do to a query and what's up with lucene.apache.org?
Hi,

lucene.apache.org seems to work now. Here is the query syntax:

http://lucene.apache.org/queryparsersyntax.html

[] is used as [BEGIN-RANGE-STRING TO END-RANGE-STRING]

Otis

--- Jim Lynch <[EMAIL PROTECTED]> wrote:

> First I'm getting a
>
> The requested URL could not be retrieved
>
> While trying to retrieve the URL:
> http://lucene.apache.org/src/test/org/apache/lucene/queryParser/TestQueryParser.java
>
> The following error was encountered:
>
> Unable to determine IP address from host name for lucene.apache.org
>
> Guess the system is down.
>
> I'm getting this error:
>
> org.apache.lucene.queryParser.ParseException: Encountered "is" at line 1, column 15.
> Was expecting:
> "]" ...
>
> when I tried to parse the following string "[this is a test]".
>
> I can't find any documentation that tells me what the brackets do to a
> query. I had a user that was used to another search engine that used []
> to do proximity or near searches and tried it on this one. Actually I'd
> like to see the documentation for what the parser does. All that is
> mentioned in the javadoc is + - and (). Obviously there are more
> special characters.
>
> Thanks,
> Jim.
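To make the range syntax above concrete: the query parser syntax page describes square brackets as an inclusive range whose bounds are compared lexicographically, as strings. The sketch below shows only that comparison rule using the JDK. RangeSketch and inRange are hypothetical names and do not reproduce Lucene's range query code.

```java
public class RangeSketch {

    // Hypothetical helper: true when term falls inside [lower TO upper],
    // both bounds inclusive, using plain string (lexicographic) ordering.
    public static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        // Fixed-width dates sort the same way as strings and as dates.
        System.out.println(inRange("20050214", "20050201", "20050228")); // true
        // Unpadded numbers do not: "9" sorts after "20" as a string.
        System.out.println(inRange("9", "10", "20"));                    // false
    }
}
```

This is also why a phrase such as "[this is a test]" fails to parse: the brackets announce a range, not a proximity search.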
What does [] do to a query and what's up with lucene.apache.org?
First I'm getting a

    The requested URL could not be retrieved

    While trying to retrieve the URL:
    http://lucene.apache.org/src/test/org/apache/lucene/queryParser/TestQueryParser.java

    The following error was encountered:

    Unable to determine IP address from host name for lucene.apache.org

Guess the system is down.

I'm getting this error:

    org.apache.lucene.queryParser.ParseException: Encountered "is" at line 1, column 15.
    Was expecting:
    "]" ...

when I tried to parse the following string "[this is a test]".

I can't find any documentation that tells me what the brackets do to a query. I had a user that was used to another search engine that used [] to do proximity or near searches and tried it on this one. Actually I'd like to see the documentation for what the parser does. All that is mentioned in the javadoc is + - and (). Obviously there are more special characters.

Thanks,
Jim.
RE: Query Analyzer
That worked. Thanks a lot.

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Monday, February 07, 2005 11:39 AM
To: Lucene Users List
Subject: Re: Query Analyzer
Re: Query Analyzer
On Feb 7, 2005, at 11:29 AM, Ravi wrote:

> How do I set the analyzer when I build the query in my code instead of
> using a query parser ?

You don't. All terms you use for any Query subclasses you instantiate must match exactly the terms in the index. If you need an analyzer to do this then you're responsible for doing it yourself, just as QueryParser does underneath. I do this myself in my current application like this:

    private Query createPhraseQuery(String fieldName, String string,
                                    boolean lowercase) {
      RossettiAnalyzer analyzer = new RossettiAnalyzer(lowercase);
      TokenStream stream = analyzer.tokenStream(fieldName,
                                                new StringReader(string));

      PhraseQuery pq = new PhraseQuery();
      Token token;
      try {
        while ((token = stream.next()) != null) {
          pq.add(new Term(fieldName, token.termText()));
        }
      } catch (IOException ignored) {
        // ignore - shouldn't get an IOException on a StringReader
      }

      if (pq.getTerms().length == 1) {
        // optimize single term phrase to TermQuery
        return new TermQuery(pq.getTerms()[0]);
      }

      return pq;
    }

Hope that helps.

Erik
Query Analyzer
How do I set the analyzer when I build the query in my code instead of using a query parser?

Thanks in advance,
Ravi.
Re: Parsing The Query: Every document that doesn't have a field containing x (but still has the field)
Hello;

I think Chris's approach might be helpful, but I can't seem to get it to work. Since I am running out of time and still need to figure out "starts with" and "ends with" queries, I have implemented a hacky solution for getting all documents that have a kcfileupload field that does not contain jpg:

    query1 = QueryParser.parse("jpg", "kcfileupload", new StandardAnalyzer());
    // each document contains this
    query2 = QueryParser.parse("stillhere", "olfaithfull", new StandardAnalyzer());
    BooleanQuery typeNegativeSearch = new BooleanQuery();
    typeNegativeSearch.add(query1, false, true);
    typeNegativeSearch.add(query2, true, false);

What gets returned are all the documents without kcfileupload = jpg. This includes documents that don't even have a kcfileupload field. When I go through the results before displaying them, I check that there is a "kcfileupload" field.

This is not a good solution, and I hope to replace it soon. If anyone has ideas please let me know.

Luke

- Original Message -
From: "Chris Hostetter" <[EMAIL PROTECTED]>
To: "Lucene Users List"
Sent: Friday, February 04, 2005 3:03 PM
Subject: Re: Parsing The Query: Every document that doesn't have a field containing x
Re: Parsing The Query: Every document that doesn't have a field containing x
Hi Chris;

So the result would contain all documents that don't have field f containing x?

What I need to figure out is how to return all documents that have a field f, but where f does not contain x.

Thanks for your post.

Luke

- Original Message -
From: "Chris Hostetter" <[EMAIL PROTECTED]>
To: "Lucene Users List"
Sent: Friday, February 04, 2005 3:03 PM
Subject: Re: Parsing The Query: Every document that doesn't have a field containing x
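One way to think about the "has field f, but f does not contain x" case is as set arithmetic on document-id bit sets: take the documents that have the field and remove those that contain x. The sketch below shows only that arithmetic with java.util.BitSet. HasFieldButNotSketch and its method are hypothetical, and how the two input bit sets are obtained from Lucene (for example from two filters) is an assumption left out of the sketch. The six document ids mirror the six test documents in this thread: two jpg uploads, three other uploads, and one document without the field.

```java
import java.util.BitSet;

public class HasFieldButNotSketch {

    // Hypothetical helper: documents that have the field, minus the
    // documents whose field contains x. Inputs are left unmodified.
    public static BitSet hasFieldButNot(BitSet hasField, BitSet containsX) {
        BitSet result = (BitSet) hasField.clone();
        result.andNot(containsX); // clear every bit that is set in containsX
        return result;
    }

    public static void main(String[] args) {
        BitSet hasField = new BitSet();
        hasField.set(0, 5);   // docs 0..4 have kcfileupload; doc 5 does not

        BitSet containsJpg = new BitSet();
        containsJpg.set(0, 2); // docs 0 and 1 are jpg uploads

        System.out.println(hasFieldButNot(hasField, containsJpg)); // {2, 3, 4}
    }
}
```

Using andNot on the "has field" set avoids complementing anything, so documents that lack the field never enter the result.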
Re: Parsing The Query: Every document that doesn't have a field containing x
Thanks to everyone who has been posting possible solutions. I am making great progress and learning a lot.

This works, but the results include files that don't even contain a "kcfileupload" field (not good):

    query1 = QueryParser.parse("jpg", "kcfileupload", new StandardAnalyzer());
    query2 = QueryParser.parse("stillhere", "olfaithfull", new StandardAnalyzer());
    BooleanQuery typeNegativeSearch = new BooleanQuery();
    typeNegativeSearch.add(query1, false, true);
    typeNegativeSearch.add(query2, true, false);

Someone mentioned a filter, so I have been playing with the test below. The problem I have is this line:

    Query query2 = QueryParser.parse("*", "kcfileupload", new StandardAnalyzer());

It results in the following error:

    org.apache.lucene.queryParser.ParseException: Lexical error at line 1, column 2. Encountered: after : ""

I was hoping it would create a wildcard search on kcfileupload. I feel like I am getting close to a good solution. Any tips would help.

Thanks, Luke

    import junit.framework.TestCase;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.QueryFilter;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.store.RAMDirectory;

    public class IsNotTypeTest extends TestCase {
        private RAMDirectory directory;

        protected void setUp() throws Exception {
            directory = new RAMDirectory();
            IndexWriter writer = new IndexWriter(directory, new StandardAnalyzer(), true);

            // jpg should show up in first query
            Document document = new Document();
            document.add(Field.Text("kcfileupload", "picture.jpg"));
            document.add(Field.Text("name", "pic one"));
            writer.addDocument(document);

            // jpg should show up in first query
            document = new Document();
            document.add(Field.Text("kcfileupload", "picture2.jpg"));
            document.add(Field.Text("name", "pic two"));
            writer.addDocument(document);

            // pdf should show up in second query
            document = new Document();
            document.add(Field.Text("kcfileupload", "file.pdf"));
            document.add(Field.Text("name", "pdf one"));
            writer.addDocument(document);

            // ppt should show up in second query
            document = new Document();
            document.add(Field.Text("kcfileupload", "file.ppt"));
            document.add(Field.Text("name", "power point one"));
            writer.addDocument(document);

            // ppt should show up in second query
            document = new Document();
            document.add(Field.Text("kcfileupload", "file2.ppt"));
            document.add(Field.Text("name", "power point two"));
            writer.addDocument(document);

            // other should not show in this test
            document = new Document();
            document.add(Field.Text("name", "link"));
            document.add(Field.Text("address", "www.cbc.ca"));
            writer.addDocument(document);

            writer.close();
        }

        public void testIsNotType() throws Exception {
            IndexSearcher searcher = new IndexSearcher(directory);
            Query query1 = QueryParser.parse("jpg", "kcfileupload", new StandardAnalyzer());
            Query query2 = QueryParser.parse("*", "kcfileupload", new StandardAnalyzer());
            QueryFilter jpgFilter = new QueryFilter(
                new TermQuery(new Term("kcfileupload", "jpg")));

            Hits hits = searcher.search(query1);
            assertEquals(2, hits.length());
            int totalHits = hits.length();
            int count = 0;
            while (count < totalHits) {
                Document current = (Document) hits.doc(count);
                System.out.println("The upload is " + count + " is "
                    + current.getField("kcfileupload"));
                count++;
            }

            hits = searcher.search(query2, jpgFilter);
            assertEquals(3, hits.length());
            totalHits = hits.length();
            count = 0;
            while (count < totalHits) {
                Document current = (Document) hits.doc(count);
                System.out.println("The upload is " + count + " is "
                    + current.getField("kcfileupload"));
                count++;
            }
        }
    }
Re: Parsing The Query: Every document that doesn't have a field containing x
Another approach...

You can make a Filter that is the inverse of the output from another filter, which means you can make a QueryFilter on the search, then wrap it in your inverse Filter.

You can't execute a query on a filter without having a Query object, but you can just apply the Filter directly to an IndexReader yourself, and get back a BitSet containing the docIds of every document that does not contain your term.

Something like this should work...

    class NotFilter extends Filter {
        private Filter wrapped;
        public NotFilter(Filter w) {
            wrapped = w;
        }
        public BitSet bits(IndexReader r) throws IOException {
            // copy first: the wrapped filter may cache and reuse its BitSet
            BitSet b = (BitSet) wrapped.bits(r).clone();
            // flip only real doc ids; BitSet.size() can exceed maxDoc()
            b.flip(0, r.maxDoc());
            return b;
        }
    }
    ...
    BitSet results = new NotFilter(
        new QueryFilter(new TermQuery(new Term("f", "x")))).bits(reader);

: Date: Thu, 3 Feb 2005 19:51:36 +0100
: From: Kelvin Tan <[EMAIL PROTECTED]>
: Reply-To: Lucene Users List
: To: Lucene Users List
: Subject: Re: Parsing The Query: Every document that doesn't have a field
:     containing x
:
: Alternatively, add a dummy field-value to all documents, like doc.add(Field.Keyword("foo", "bar"))
:
: Waste of space, but allows you to perform negated queries.
:
: On Thu, 03 Feb 2005 19:19:15 +0100, Maik Schreiber wrote:
: >> Negating a term must be combined with at least one nonnegated
: >> term to return documents; in other words, it isn't possible to
: >> use a query like NOT term to find all documents that don't
: >> contain a term.
: >>
: >> So does that mean the above example wouldn't work?
: >>
: > Exactly. You cannot search for "-kcfileupload:jpg", you need at
: > least one clause that actually _includes_ documents.
: >
: > Do you by chance have a field with known contents? If so, you could
: > misuse that one and include it in your query (perhaps by doing
: > range or wildcard/prefix search). If not, try IndexReader.terms()
: > for building a Query yourself, then use that one for search.

-Hoss
Re: Parsing The Query: Every document that doesn't have a field containing x
Very nice. Thanks!

Luke

- Original Message -
From: <[EMAIL PROTECTED]>
To: "Lucene Users List"
Sent: Friday, February 04, 2005 2:12 AM
Subject: Re: Parsing The Query: Every document that doesn't have a field containing x
Re: Numbers in the Query String
I agree with their viewpoint!

On Thu, 3 Feb 2005 14:29:13 -0800 (PST), Otis Gospodnetic <[EMAIL PROTECTED]> wrote:

> Using different analyzers for indexing and searching is not
> recommended.
> Your numbers are not even in the index because you are using
> StandardAnalyzer. Use Luke to look at your index.
>
> Otis
>
> --- Hetan Shah <[EMAIL PROTECTED]> wrote:
>
> > Hello,
> >
> > How can one search for a document based on a query which has
> > numbers in the query string?
> >
> > e.g. query = Java 2 Platform J2EE
> >
> > What do I need to do so that the numbers do not get neglected?
> >
> > I am using StandardAnalyzer to index the pages and using StopAnalyzer
> > to search the documents. Would the use of two different analyzers
> > cause any trouble for the results?
> >
> > Thanks.
> > -H
Re: Parsing The Query: Every document that doesn't have a field containing x
I think you can use a filter to get the right result. See the example below:

    package lia.advsearching;

    import junit.framework.TestCase;
    import org.apache.lucene.analysis.WhitespaceAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.QueryFilter;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.store.RAMDirectory;

    public class SecurityFilterTest extends TestCase {
        private RAMDirectory directory;

        protected void setUp() throws Exception {
            directory = new RAMDirectory();
            IndexWriter writer = new IndexWriter(directory, new WhitespaceAnalyzer(), true);

            // Elwood
            Document document = new Document();
            document.add(Field.Keyword("owner", "elwood"));
            document.add(Field.Text("keywords", "elwoods sensitive info"));
            writer.addDocument(document);

            // Jake
            document = new Document();
            document.add(Field.Keyword("owner", "jake"));
            document.add(Field.Text("keywords", "jakes sensitive info"));
            writer.addDocument(document);

            writer.close();
        }

        public void testSecurityFilter() throws Exception {
            TermQuery query = new TermQuery(new Term("keywords", "info"));
            IndexSearcher searcher = new IndexSearcher(directory);
            Hits hits = searcher.search(query);
            assertEquals("Both documents match", 2, hits.length());

            QueryFilter jakeFilter = new QueryFilter(
                new TermQuery(new Term("owner", "jake")));
            hits = searcher.search(query, jakeFilter);
            assertEquals(1, hits.length());
            assertEquals("elwood is safe", "jakes sensitive info",
                hits.doc(0).get("keywords"));
        }
    }

On Thu, 3 Feb 2005 13:04:50 -0500, Luke Shannon <[EMAIL PROTECTED]> wrote:

> Hello;
>
> I have a query that finds documents that contain fields with a specific
> value.
>
> query1 = QueryParser.parse("jpg", "kcfileupload", new StandardAnalyzer());
>
> This works well.
>
> I would like a query that finds all documents with kcfileupload fields
> that don't contain jpg.
>
> The example I found in the book that seems to relate shows me how to find
> documents without a specific term:
>
> QueryParser parser = new QueryParser("contents", analyzer);
> parser.setOperator(QueryParser.DEFAULT_OPERATOR_AND);
>
> But then it says:
>
> Negating a term must be combined with at least one nonnegated term to return
> documents; in other words, it isn't possible to use a query like NOT term to
> find all documents that don't contain a term.
>
> So does that mean the above example wouldn't work?
>
> The API says:
>
> a plus (+) or a minus (-) sign, indicating that the clause is required or
> prohibited respectively;
>
> I have been playing around with using the minus character without much luck.
>
> Can someone point me in the right direction to figure this out?
>
> Thanks,
>
> Luke
Re: Parsing The Query: Every document that doesn't have a field containing x
Bingo! Nice catch. That was it. I made everything lower case when I set the field. Works great now.

Thanks!

Luke

----- Original Message -----
From: "Kauler, Leto S" <[EMAIL PROTECTED]>
To: "Lucene Users List"
Sent: Thursday, February 03, 2005 6:48 PM
Subject: RE: Parsing The Query: Every document that doesn't have a field containing x

Because you are build from QueryParser rather than a TermQuery, all search terms in the query are being lowercased by StandardAnalyzer. So your query of "olFaithFull:stillhere" requires that there is an exact index term of "stillhere" in that field. It depends on how you built the index (index and stored fields are different), but I would check on that. Also maybe try out TermQuery and see if that does anything for you.
RE: Parsing The Query: Every document that doesn't have a field containing x
Because you are building the query with QueryParser rather than a TermQuery, all search terms in the query are lowercased by StandardAnalyzer. So your query "olFaithFull:stillhere" requires an exact index term "stillhere" in that field. It depends on how you built the index (indexed and stored fields are different), but I would check on that. Also maybe try out TermQuery and see if that does anything for you.

> -----Original Message-----
> From: Luke Shannon [mailto:[EMAIL PROTECTED]]
> Sent: Friday, 4 February 2005 10:47 AM
> To: Lucene Users List
> Subject: Re: Parsing The Query: Every document that doesn't
> have a field containing x
>
> "stillHere"
>
> Capital H.
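The case mismatch described above can be reproduced with a toy index. This is a minimal sketch and not Lucene's API: the class `CaseMismatchDemo` and its methods are hypothetical, and the only assumption carried over from the thread is that the parsed query is lowercased while the field value was indexed verbatim.

```java
import java.util.*;

// Toy model, not Lucene's API: an index maps exact term strings to document ids.
class CaseMismatchDemo {
    // term -> ids of the documents that contain that exact term
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Store a value verbatim, the way an untokenized field value would be indexed.
    void indexVerbatim(int docId, String value) {
        postings.computeIfAbsent(value, k -> new TreeSet<>()).add(docId);
    }

    // Exact term lookup: no analysis is applied to the query term.
    Set<Integer> termQuery(String term) {
        return postings.getOrDefault(term, Collections.emptySet());
    }

    // Stand-in for a parser whose analyzer lowercases query text before the lookup.
    Set<Integer> parsedQuery(String text) {
        return termQuery(text.toLowerCase(Locale.ROOT));
    }
}
```

With "stillHere" indexed verbatim, `parsedQuery("stillHere")` looks up "stillhere" and finds nothing, while `termQuery("stillHere")` finds the document. Indexing the value in lower case, as Luke ended up doing, makes the parsed query match.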
Re: Parsing The Query: Every document that doesn't have a field containing x
"stillHere" Capital H. - Original Message - From: "Kauler, Leto S" <[EMAIL PROTECTED]> To: "Lucene Users List" Sent: Thursday, February 03, 2005 6:40 PM Subject: RE: Parsing The Query: Every document that doesn't have a field containing x First thing that jumps out is case-sensitivity. Does your olFaithFull field contain "stillHere" or "stillhere"? --Leto > -Original Message- > From: Luke Shannon [mailto:[EMAIL PROTECTED] > This works: > > query1 = QueryParser.parse("jpg", "kcfileupload", new > StandardAnalyzer()); query2 = QueryParser.parse("stillHere", > "olFaithFull", new StandardAnalyzer()); BooleanQuery > typeNegativeSearch = new BooleanQuery(); > typeNegativeSearch.add(query1, false, false); > typeNegativeSearch.add(query2, false, false); > > It returns 9 results. And in string form is: kcfileupload:jpg > olFaithFull:stillhere > > But this: > > query1 = QueryParser.parse("jpg", "kcfileupload", new > StandardAnalyzer()); > query2 = QueryParser.parse("stillHere", > "olFaithFull", new StandardAnalyzer()); > BooleanQuery typeNegativeSearch = new BooleanQuery(); > typeNegativeSearch.add(query1, true, false); > typeNegativeSearch.add(query2, true, false); > > Reutrns 0 results and is in string form : +kcfileupload:jpg > +olFaithFull:stillhere > > If I do the query kcfileupload:jpg in Luke I get 9 docs, each > doc containing a olFaithFull:stillHere. Why would > +kcfileupload:jpg +olFaithFull:stillhere return no results? > > Thanks, > > Luke CONFIDENTIALITY NOTICE AND DISCLAIMER Information in this transmission is intended only for the person(s) to whom it is addressed and may contain privileged and/or confidential information. If you are not the intended recipient, any disclosure, copying or dissemination of the information is unauthorised and you should delete/destroy all copies and notify the sender. No liability is accepted for any unauthorised use of the information contained in this transmission. This disclaimer has been automatically added. 
RE: Parsing The Query: Every document that doesn't have a field containing x
First thing that jumps out is case-sensitivity. Does your olFaithFull field contain "stillHere" or "stillhere"?

--Leto

> -----Original Message-----
> From: Luke Shannon [mailto:[EMAIL PROTECTED]]
>
> [...]
>
> If I do the query kcfileupload:jpg in Luke I get 9 docs, each
> doc containing a olFaithFull:stillHere. Why would
> +kcfileupload:jpg +olFaithFull:stillhere return no results?
>
> Thanks,
>
> Luke
Re: Parsing The Query: Every document that doesn't have a field containing x
This works:

    query1 = QueryParser.parse("jpg", "kcfileupload", new StandardAnalyzer());
    query2 = QueryParser.parse("stillHere", "olFaithFull", new StandardAnalyzer());
    BooleanQuery typeNegativeSearch = new BooleanQuery();
    // add(query, required, prohibited): both clauses optional
    typeNegativeSearch.add(query1, false, false);
    typeNegativeSearch.add(query2, false, false);

It returns 9 results, and in string form is:

    kcfileupload:jpg olFaithFull:stillhere

But this:

    query1 = QueryParser.parse("jpg", "kcfileupload", new StandardAnalyzer());
    query2 = QueryParser.parse("stillHere", "olFaithFull", new StandardAnalyzer());
    BooleanQuery typeNegativeSearch = new BooleanQuery();
    // add(query, required, prohibited): both clauses required
    typeNegativeSearch.add(query1, true, false);
    typeNegativeSearch.add(query2, true, false);

returns 0 results, and in string form is:

    +kcfileupload:jpg +olFaithFull:stillhere

If I do the query kcfileupload:jpg in Luke I get 9 docs, each doc containing an olFaithFull:stillHere. Why would +kcfileupload:jpg +olFaithFull:stillhere return no results?

Thanks,

Luke

----- Original Message -----
From: "Maik Schreiber" <[EMAIL PROTECTED]>
To: "Lucene Users List"
Sent: Thursday, February 03, 2005 4:55 PM
Subject: Re: Parsing The Query: Every document that doesn't have a field containing x

> > Yes. There should be 119 with stillHere,
>
> You have double-checked that, haven't you? :)
>
> > and if I run a query in Luke on
> > kcfileupload = ppt, it returns one result. I am thinking I should at least
> > get this result back with: -kcfileupload:jpg +olFaithFull:stillhere?
>
> You really should.
Re: Numbers in the Query String
Using different analyzers for indexing and searching is not recommended. Your numbers are not even in the index because you are using StandardAnalyzer. Use Luke to look at your index.

Otis

--- Hetan Shah <[EMAIL PROTECTED]> wrote:
> Hello,
>
> How can one search for a document based on the query which has numbers
> in the query string.
>
> e.g. query = Java 2 Platform J2EE
>
> What do I need to do so that the numbers do not get neglected.
>
> I am using StandardAnalyzer to index the pages and using StopAnalyzer to
> search the documents. Would the use of two different analyzers cause any
> trouble for the results?
>
> Thanks.
> -H
Re: Numbers in the Query String
Hetan Shah wrote:
> Hello,
>
> How can one search for a document based on the query which has numbers
> in the query string.
>
> e.g. query = Java 2 Platform J2EE
>
> What do I need to do so that the numbers do not get neglected.
>
> I am using StandardAnalyzer to index the pages and using StopAnalyzer to
> search the documents. Would the use of two different analyzers cause any
> trouble for the results?

Yes. StopAnalyzer eats all numbers for breakfast. ;-) You need to use another analyzer, one that doesn't discard numbers.

--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com
Contact: info at sigram dot com
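To see why a search-time analyzer that discards digits cannot match numeric terms, here is a minimal sketch with two toy tokenizers. `DigitTokenDemo` is a hypothetical class written for illustration only; it mimics the behaviour described in the reply and does not use or reproduce any Lucene class.

```java
import java.util.*;

// Toy tokenizers, not Lucene classes: they only mimic the behaviour discussed above.
class DigitTokenDemo {
    // Split on anything that is not a letter, then lowercase: digits never survive.
    static List<String> lettersOnly(String text) {
        return split(text, false);
    }

    // Split on anything that is not a letter or digit, so "2" and "j2ee" are kept.
    static List<String> lettersAndDigits(String text) {
        return split(text, true);
    }

    private static List<String> split(String text, boolean keepDigits) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            boolean keep = Character.isLetter(c) || (keepDigits && Character.isDigit(c));
            if (keep) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A separator ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) tokens.add(current.toString());
        return tokens;
    }
}
```

For the query "Java 2 Platform J2EE", `lettersOnly` yields [java, platform, j, ee] while `lettersAndDigits` yields [java, 2, platform, j2ee]. If the two sides of the search disagree like this, the query terms "j" and "ee" have no chance of matching an indexed "j2ee".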
Numbers in the Query String
Hello,

How can one search for a document based on a query which has numbers in the query string?

e.g. query = Java 2 Platform J2EE

What do I need to do so that the numbers do not get neglected?

I am using StandardAnalyzer to index the pages and StopAnalyzer to search the documents. Would the use of two different analyzers cause any trouble for the results?

Thanks.
-H
Re: Parsing The Query: Every document that doesn't have a field containing x
I did. I have run both queries in Luke:

    kcfileupload:ppt returns 1
    olFaithfull:stillhere returns 119

Luke

----- Original Message -----
From: "Maik Schreiber" <[EMAIL PROTECTED]>
To: "Lucene Users List"
Sent: Thursday, February 03, 2005 4:55 PM
Subject: Re: Parsing The Query: Every document that doesn't have a field containing x

> > Yes. There should be 119 with stillHere,
>
> You have double-checked that, haven't you? :)
>
> > and if I run a query in Luke on
> > kcfileupload = ppt, it returns one result. I am thinking I should at least
> > get this result back with: -kcfileupload:jpg +olFaithFull:stillhere?
>
> You really should.
Re: Parsing The Query: Every document that doesn't have a field containing x
> Yes. There should be 119 with stillHere,

You have double-checked that, haven't you? :)

> and if I run a query in Luke on kcfileupload = ppt, it returns one result.
> I am thinking I should at least get this result back with:
> -kcfileupload:jpg +olFaithFull:stillhere?

You really should.

--
Maik Schreiber * http://www.blizzy.de
GPG public key: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x1F11D713
Key fingerprint: CF19 AFCE 6E3D 5443 9599 18B5 5640 1F11 D713
Re: Parsing The Query: Every document that doesn't have a field containing x
Yes. There should be 119 with stillHere, and if I run a query in Luke on kcfileupload = ppt, it returns one result. I am thinking I should at least get this result back with: -kcfileupload:jpg +olFaithFull:stillhere?

Luke

----- Original Message -----
From: "Maik Schreiber" <[EMAIL PROTECTED]>
To: "Lucene Users List"
Sent: Thursday, February 03, 2005 4:27 PM
Subject: Re: Parsing The Query: Every document that doesn't have a field containing x

> > -kcfileupload:jpg +olFaithFull:stillhere
> >
> > This looks right to me. Why the 0 results?
>
> Looks good to me, too. You sure all your documents have
> olFaithFull:stillhere and there is at least a document with kcfileupload not
> being "jpg"?
Re: Parsing The Query: Every document that doesn't have a field containing x
> -kcfileupload:jpg +olFaithFull:stillhere
>
> This looks right to me. Why the 0 results?

Looks good to me, too. You sure all your documents have olFaithFull:stillhere and there is at least a document with kcfileupload not being "jpg"?

--
Maik Schreiber * http://www.blizzy.de
GPG public key: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x1F11D713
Key fingerprint: CF19 AFCE 6E3D 5443 9599 18B5 5640 1F11 D713
Re: Parsing The Query: Every document that doesn't have a field containing x
Hello,

Still working on the same query. Here is the code I am currently working with. I am thinking this should bring up all the documents that have olFaithFull=stillHere and kcfileupload!=jpg (so anything else):

    query1 = QueryParser.parse("jpg", "kcfileupload", new StandardAnalyzer());
    query2 = QueryParser.parse("stillHere", "olFaithFull", new StandardAnalyzer());
    BooleanQuery typeNegativeSearch = new BooleanQuery();
    // add(query, required, prohibited)
    typeNegativeSearch.add(query1, false, true);   // prohibited
    typeNegativeSearch.add(query2, true, false);   // required

The toString() on the query is:

    -kcfileupload:jpg +olFaithFull:stillhere

This looks right to me. Why the 0 results?

Thanks,

Luke

----- Original Message -----
From: "Maik Schreiber" <[EMAIL PROTECTED]>
To: "Lucene Users List"
Sent: Thursday, February 03, 2005 1:19 PM
Subject: Re: Parsing The Query: Every document that doesn't have a field containing x

> > Negating a term must be combined with at least one nonnegated term to return
> > documents; in other words, it isn't possible to use a query like NOT term to
> > find all documents that don't contain a term.
> >
> > So does that mean the above example wouldn't work?
>
> Exactly. You cannot search for "-kcfileupload:jpg", you need at least one
> clause that actually _includes_ documents.
>
> Do you by chance have a field with known contents? If so, you could misuse
> that one and include it in your query (perhaps by doing range or
> wildcard/prefix search). If not, try IndexReader.terms() for building a
> Query yourself, then use that one for search.
Re: Parsing The Query: Every document that doesn't have a field containing x
Ok. I have added the following to every document:

    doc.add(Field.UnIndexed("olFaithfull", "stillHere"));

The plan is a query that says: olFaithfull = stillHere and kcfileupload != jpg.

I have been experimenting with the MultiFieldQueryParser, but it is not working out for me. Syntactically, how is this done? Does someone have an example of a query similar to the one I am trying?

Thanks,

Luke

----- Original Message -----
From: "Maik Schreiber" <[EMAIL PROTECTED]>
To: "Lucene Users List"
Sent: Thursday, February 03, 2005 1:19 PM
Subject: Re: Parsing The Query: Every document that doesn't have a field containing x

> Exactly. You cannot search for "-kcfileupload:jpg", you need at least one
> clause that actually _includes_ documents.
>
> Do you by chance have a field with known contents? If so, you could misuse
> that one and include it in your query (perhaps by doing range or
> wildcard/prefix search). If not, try IndexReader.terms() for building a
> Query yourself, then use that one for search.
Re: Parsing The Query: Every document that doesn't have a field containing x
Alternatively, add a dummy field-value to all documents, like

    doc.add(Field.Keyword("foo", "bar"))

Waste of space, but allows you to perform negated queries.

On Thu, 03 Feb 2005 19:19:15 +0100, Maik Schreiber wrote:
>> Negating a term must be combined with at least one nonnegated
>> term to return documents; in other words, it isn't possible to
>> use a query like NOT term to find all documents that don't
>> contain a term.
>>
>> So does that mean the above example wouldn't work?
>>
> Exactly. You cannot search for "-kcfileupload:jpg", you need at
> least one clause that actually _includes_ documents.
>
> Do you by chance have a field with known contents? If so, you could
> misuse that one and include it in your query (perhaps by doing
> range or wildcard/prefix search). If not, try IndexReader.terms()
> for building a Query yourself, then use that one for search.
Re: Parsing The Query: Every document that doesn't have a field containing x
> Negating a term must be combined with at least one nonnegated term to return
> documents; in other words, it isn't possible to use a query like NOT term to
> find all documents that don't contain a term.
>
> So does that mean the above example wouldn't work?

Exactly. You cannot search for "-kcfileupload:jpg", you need at least one clause that actually _includes_ documents.

Do you by chance have a field with known contents? If so, you could misuse that one and include it in your query (perhaps by doing range or wildcard/prefix search). If not, try IndexReader.terms() for building a Query yourself, then use that one for search.

--
Maik Schreiber * http://www.blizzy.de
GPG public key: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x1F11D713
Key fingerprint: CF19 AFCE 6E3D 5443 9599 18B5 5640 1F11 D713
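The rule that a prohibited clause needs at least one including clause can be pictured with plain sets. The following is a toy model under stated assumptions, not Lucene's BooleanQuery: `NegationDemo` is a hypothetical class, optional clauses are left out, and each clause is represented simply by the set of document ids it matches.

```java
import java.util.*;

// Toy model of required (+) and prohibited (-) clauses; not Lucene's BooleanQuery.
// Optional clauses are deliberately left out to keep the sketch small.
class NegationDemo {
    // Start from the intersection of the required clauses, then remove prohibited matches.
    // With no required clause there is nothing to start from, so the result is empty.
    static Set<Integer> search(List<Set<Integer>> required, List<Set<Integer>> prohibited) {
        if (required.isEmpty()) return new TreeSet<>();
        Set<Integer> result = new TreeSet<>(required.get(0));
        for (Set<Integer> r : required) result.retainAll(r);
        for (Set<Integer> p : prohibited) result.removeAll(p);
        return result;
    }
}
```

If documents 1 and 2 match kcfileupload:jpg and documents 1, 2 and 3 all carry a known marker term, then prohibiting jpg on its own returns nothing, while requiring the marker and prohibiting jpg returns document 3. That is the reason for adding a field with known contents to every document.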
Parsing The Query: Every document that doesn't have a field containing x
Hello;

I have a query that finds documents that contain fields with a specific value.

    query1 = QueryParser.parse("jpg", "kcfileupload", new StandardAnalyzer());

This works well.

I would like a query that finds all documents with kcfileupload fields that don't contain jpg.

The example I found in the book that seems to relate shows me how to find documents without a specific term:

    QueryParser parser = new QueryParser("contents", analyzer);
    parser.setOperator(QueryParser.DEFAULT_OPERATOR_AND);

But then it says:

    Negating a term must be combined with at least one nonnegated term to return documents; in other words, it isn't possible to use a query like NOT term to find all documents that don't contain a term.

So does that mean the above example wouldn't work?

The API says:

    a plus (+) or a minus (-) sign, indicating that the clause is required or prohibited respectively;

I have been playing around with using the minus character without much luck.

Can someone point me in the right direction to figure this out?

Thanks,

Luke
Re: Query Format
How are you indexing your document? If you're using QueryParser with the default operator set to OR (which is the default), then you've already provided the expression you need :)

Erik

On Feb 1, 2005, at 6:29 PM, Hetan Shah wrote:
> Hello All,
>
> What should my query look like if I want to search all or any of the
> following key words.
>
> Sun Linux Red Hat Advance Server
>
> replies are much appreciated.
>
> -H
Query Format
Hello All,

What should my query look like if I want to search for all or any of the following keywords?

Sun Linux Red Hat Advance Server

Replies are much appreciated.

-H
Re: query term frequency
This from the highlighter package will give you the IDF:

    WeightedTerm[] QueryTermExtractor.getIdfWeightedTerms(Query query, IndexReader reader, String fieldName)
Re: query term frequency
I implemented a Query version of the TermVector: org.apache.lucene.search.QueryTermVector. It works off of an array of Strings or a String and an Analyzer. Is this what you are looking for?

>>> [EMAIL PROTECTED] 1/28/2005 6:33:18 AM >>>

On Jan 27, 2005, at 10:24 PM, Jonathan Lasko wrote:
> No, the number of occurrences of a term in a Query.

Nothing built-in gives you this. You'd have to dissect the Query clause-by-clause and cast each clause to the proper type to pull the terms from them. The Highlighter code does this. If there is a better way, I'd like to know.

Erik

> Jonathan
>
> Quoting David Spencer <[EMAIL PROTECTED]>:
>
>> Jonathan Lasko wrote:
>>
>>> What do I call to get the term frequencies for terms in the Query? I
>>> can't seem to find it in the Javadoc...
>>
>> Do you mean the # of docs that have a term?
>>
>> http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#docFreq(org.apache.lucene.index.Term)
>>
>>> Thanks.
>>>
>>> Jonathan
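For the simplest case, counting how often each term occurs in the raw query text, a plain map is enough. This is a rough sketch under stated assumptions and not the QueryTermVector implementation: `QueryTermCounts` is a hypothetical helper that splits on whitespace and lowercases, and it ignores query syntax such as operators, field prefixes and phrases, which is exactly what dissecting a parsed Query clause by clause would handle.

```java
import java.util.*;

// Hypothetical helper, not part of Lucene: counts terms in raw query text.
class QueryTermCounts {
    // Returns term -> number of occurrences, in order of first appearance.
    static Map<String, Integer> count(String queryText) {
        Map<String, Integer> freq = new LinkedHashMap<>();
        for (String term : queryText.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (term.isEmpty()) continue; // empty input produces one empty token
            freq.merge(term, 1, Integer::sum);
        }
        return freq;
    }
}
```

Calling `QueryTermCounts.count("bass guitar Bass drums")` gives bass=2, guitar=1, drums=1.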
Re: lucene query (sql kind)
Ross - I'm really perplexed by your message. You create HTML from a database so that you can index it with Lucene, yet wish you could simply index the data in your database tied to a primary key directly, right? Well, you're in luck - you already can do this!

What are you using for indexing? It sounds like you borrowed the Lucene demo and have just run with that directly.

Erik

On Jan 28, 2005, at 11:02 AM, Ross Rankin wrote:
> I agree. My site is all dynamic pages created from the database. Right
> now, I have to have a process create dummy pages, index them with Lucene,
> then translate the Lucene results into meaningful links. It actually works
> better than it sounds, however it could be easier. If I could just give
> Lucene a query result (i.e. a list of rows) and then have Lucene send me
> back say the primary key of the rows that match and the other Lucene
> goodness: ranking, number of hits, etc. Could be pretty powerful and
> simplify the deployment for database driven applications.
>
> [Note: this opinion and $3.00 will get you a coffee at Starbucks]
>
> Ross
RE: lucene query (sql kind)
I agree. My site is all dynamic pages created from the database. Right now, I have to have a process create dummy pages, index them with Lucene, then translate the Lucene results into meaningful links. It actually works better than it sounds; however, it could be easier. If I could just give Lucene a query result (i.e. a list of rows) and have Lucene send me back, say, the primary keys of the rows that match plus the other Lucene goodness (ranking, number of hits, etc.), that could be pretty powerful and simplify deployment for database-driven applications.

[Note: this opinion and $3.00 will get you a coffee at Starbucks]

Ross

-----Original Message-----
From: PA [mailto:[EMAIL PROTECTED]]
Sent: Friday, January 28, 2005 6:44 AM
To: Lucene Users List
Subject: Re: lucene query (sql kind)

On Jan 28, 2005, at 12:40, sunil goyal wrote:

> I want to run dynamic queries against the lucene index. Is there any
> native syntax available for Lucene so that I can query, by first
> generating the query in say an XML or SQL like format (cache this
> query) and then use this query over lucene index.

Talking of which, did anyone contemplate the possibility of a JDBC adaptor of sorts for Lucene?

Cheers

--
PA, Onnay Equitursay
http://alt.textdrive.com/
Re: lucene query (sql kind)
I like your idea and think you are quite right. I see quite a few people using Lucene to the extreme, such that relational database functionality is replaced by Lucene. However, storing everything in Lucene and using it as a relational type of database is kind of re-inventing the wheel: for example, sorting on a date field, or any other range query.

I think the better way is to look at ways to integrate Lucene tightly into a Java relational database, such as HSQL, McKoi or Derby. In particular, that integration would make possible queries like "contains(...)", which is included in the MySQL full-text search syntax and offered by other major relational db vendors.

I would like to contribute any help I can for that to happen.

Thanks,

Jian

On Fri, 28 Jan 2005 13:01:40 + (GMT), mark harwood <[EMAIL PROTECTED]> wrote:
> I've added some user-defined lucene functions to
> HSQLDB and I've been able to run queries like the
> following one:
>
> select top 10 lucene_highlight(adText) from ads where
> pricePounds <200 and lucene_query('bass guitar
> drums',id)>0 order by lucene_score(id) DESC
>
> [...]
>
> I'm not saying there aren't answers to the above using
> Lucene. However,I do wonder if these can be addressed
> more effectively in a project which seeks tighter
> integration with an RDBMS and leveraging its
> capabilities.
>
> Any one else been down this route?
Re: lucene query (sql kind)
I've added some user-defined Lucene functions to HSQLDB and I've been able to run queries like the following one:

    select top 10 lucene_highlight(adText) from ads
    where pricePounds < 200 and lucene_query('bass guitar drums', id) > 0
    order by lucene_score(id) DESC

I've had similar success with Derby (Cloudscape). This approach has some appeal, and I've been able to use the same class as a UDF in both databases, but it does have issues: it looks like this UDF-based integration won't scale. The above query took 80 milliseconds using 10,000 records. Another index/database with 50,000 records was taking a matter of seconds. I think a scalable integration is likely to require modification of the core RDBMS code.

I think it is worth considering developing such a tight RDBMS integration if you consider the issues commonly associated with using Lucene:

1) Sorting on float/date fields and associated memory consumption
2) Representing numbers/dates in Lucene (e.g. having to pad with sufficient leading zeros and add to the index's list of terms)
3) Retrieving only certain stored fields from a document (all storage can be done in the db)
4) Issues to do with updating volatile data, e.g. price data used in sorts
5) Manually coding joins with RDBMS content as custom filters
6) Too-many-terms exceptions produced by range queries
7) Grouping results, e.g. by website
8) Boosting docs based on stored content, e.g. date

I'm not saying there aren't answers to the above using Lucene. However, I do wonder if these can be addressed more effectively in a project which seeks tighter integration with an RDBMS and leverages its capabilities.

Anyone else been down this route?
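To make issue 2 concrete: Lucene compares terms as text, so numbers only sort and range-match correctly if they are padded to a fixed width. What follows is a minimal plain-Java sketch, not Lucene code; the class name PaddedNumbers, the helper pad(), and the width of 5 are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class PaddedNumbers {

    // Hypothetical helper: left-pad a non-negative number with zeros to a fixed width.
    static String pad(long value, int width) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values need a different encoding");
        }
        String digits = Long.toString(value);
        if (digits.length() > width) {
            throw new IllegalArgumentException(value + " does not fit in " + width + " digits");
        }
        StringBuilder sb = new StringBuilder(width);
        for (int i = digits.length(); i < width; i++) {
            sb.append('0');
        }
        return sb.append(digits).toString();
    }

    public static void main(String[] args) {
        // Unpadded terms sort as text, not as numbers.
        List<String> raw = new ArrayList<>(Arrays.asList("9", "200", "10"));
        Collections.sort(raw);
        System.out.println(raw);

        // Padded terms sort in numeric order.
        List<String> padded = new ArrayList<>();
        for (long price : new long[] {9, 200, 10}) {
            padded.add(pad(price, 5));
        }
        Collections.sort(padded);
        System.out.println(padded);
    }
}
```

The same padding has to be applied to the bounds of any range query, and the width must be chosen up front to cover the largest value the field will ever hold.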
Re: lucene query (sql kind)
Hello,

Thanks, it works fine.

> The field parameter simply defines the default field for all queries
> without an explicit field specification (:).
> Using 'field AND field' as default field does not make sense but does
> not hurt as long as the default field is not used.
> I'm not sure why you choose that.

I just thought that QueryParser needed to be told beforehand what to expect, so I passed "field AND field". But I was wrong.

> Further name:\"john\" and name:john should be the same.

I quoted it just in case the value is not "john" but "hello john" or some other phrase.

Regards
Sunil

On Fri, 28 Jan 2005 13:26:26 +0100, Morus Walter <[EMAIL PROTECTED]> wrote:
> sunil goyal writes:
> >
> > I was just trying that...
> >
> > QueryParser qp = new QueryParser("field AND field", new StandardAnalyzer());
> > Query query = qp.parse("name:\"john\" AND age:[10 TO 16]");
> >
> > It works fine with this. Do I need to specify that QueryParser should
> > expect things in order
> > "field AND field". Or can I do without it?
> >
> The field parameter simply defines the default field for all queries
> without an explicit field specification (:).
> Using 'field AND field' as default field does not make sense but does
> not hurt as long as the default field is not used.
> I'm not sure why you choose that.
>
> Further name:\"john\" and name:john should be the same.
>
> HTH
> Morus
Re: lucene query (sql kind)
sunil goyal writes: > > I was just trying that... > > QueryParser qp = new QueryParser("field AND field", new StandardAnalyzer()); > Query query = qp.parse("name:\"john\" AND age:[10 TO 16]"); > > It works fine with this. Do I need to specify that QueryParser should > expect things in order > "field AND field". Or can I do without it? > The field parameter simply defines the default field for all queries without an explicit field specification (:). Using 'field AND field' as default field does not make sense but does not hurt as long as the default field is not used. I'm not sure why you choose that. Further name:\"john\" and name:john should be the same. HTH Morus - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: lucene query (sql kind)
I've merged some different fields in one query, with the name of one of these fields as the second parameter in the static method, and it worked fine. Also, you can do a little query parser, and build the queries with BooleanQuery. David sunil goyal wrote: Hello, I was just trying that... QueryParser qp = new QueryParser("field AND field", new StandardAnalyzer()); Query query = qp.parse("name:\"john\" AND age:[10 TO 16]"); It works fine with this. Do I need to specify that QueryParser should expect things in order "field AND field". Or can I do without it? The static method of QueryParser.parse(String , String, Analyzer) - expects the first string to be the query and second to be the field. Thanks Regards Sunil - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: lucene query (sql kind)
Hello, I was just trying that... QueryParser qp = new QueryParser("field AND field", new StandardAnalyzer()); Query query = qp.parse("name:\"john\" AND age:[10 TO 16]"); It works fine with this. Do I need to specify that QueryParser should expect things in order "field AND field". Or can I do without it? The static method of QueryParser.parse(String , String, Analyzer) - expects the first string to be the query and second to be the field. Thanks Regards Sunil On Fri, 28 Jan 2005 12:54:27 +0100, David Escuer <[EMAIL PROTECTED]> wrote: > > Hello, >To build queries, you can generate a query like "(text:house OR > text:car) AND (keywords:building)", and then >parse it with the QueryParser.parse method to get the Lucene query. > Is not 100% sql-like syntax, but it's more clear >than the lucene syntax. > > Hope it helps > > David > > sunil goyal wrote: > > >Hello all, > > > >I want to run dynamic queries against the lucene index. Is there any > >native syntax available for Lucene so that I can query, by first > >generating the query in say an XML or SQL like format (cache this > >query) and then use this query over lucene index. > > > > > >e.g. So a lucene query syntax in which I can define a query > >(name="john" AND age <10) and then I can just use this query to > >execute over Lucene index. > > > > > >Regards > >Sunil > > > >- > >To unsubscribe, e-mail: [EMAIL PROTECTED] > >For additional commands, e-mail: [EMAIL PROTECTED] > > > > > > - > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: lucene query (sql kind)
Hello, To build queries, you can generate a query like "(text:house OR text:car) AND (keywords:building)", and then parse it with the QueryParser.parse method to get the Lucene query. Is not 100% sql-like syntax, but it's more clear than the lucene syntax. Hope it helps David sunil goyal wrote: Hello all, I want to run dynamic queries against the lucene index. Is there any native syntax available for Lucene so that I can query, by first generating the query in say an XML or SQL like format (cache this query) and then use this query over lucene index. e.g. So a lucene query syntax in which I can define a query (name="john" AND age <10) and then I can just use this query to execute over Lucene index. Regards Sunil - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
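For what it's worth, generating such a query string from structured input can be sketched in a few lines of plain Java. This is only an illustration under assumptions: the class QueryStringBuilder and its build() method are hypothetical names, terms for one field are OR'd and the field groups are AND'd, and no escaping of QueryParser special characters is attempted. The resulting string would then be handed to QueryParser as described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class QueryStringBuilder {

    // Hypothetical helper: OR the terms of each field, AND the per-field groups.
    static String build(Map<String, List<String>> termsByField) {
        List<String> groups = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : termsByField.entrySet()) {
            List<String> clauses = new ArrayList<>();
            for (String term : entry.getValue()) {
                // Multi-word values are quoted so they are treated as one phrase.
                String value = term.indexOf(' ') >= 0 ? "\"" + term + "\"" : term;
                clauses.add(entry.getKey() + ":" + value);
            }
            if (!clauses.isEmpty()) {
                groups.add("(" + String.join(" OR ", clauses) + ")");
            }
        }
        return String.join(" AND ", groups);
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, List<String>> terms = new LinkedHashMap<>();
        terms.put("text", Arrays.asList("house", "car"));
        terms.put("keywords", Arrays.asList("building"));
        System.out.println(build(terms));
    }
}
```

Because the output is an ordinary string, it can be cached and re-parsed later, which is what the original question asked for.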
Re: lucene query (sql kind)
On Jan 28, 2005, at 12:40, sunil goyal wrote: I want to run dynamic queries against the lucene index. Is there any native syntax available for Lucene so that I can query, by first generating the query in say an XML or SQL like format (cache this query) and then use this query over lucene index. Talking of which, did anyone contemplated the possibility of a JDBC adaptor of sort for Lucene? Cheers -- PA, Onnay Equitursay http://alt.textdrive.com/ - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
lucene query (sql kind)
Hello all, I want to run dynamic queries against the lucene index. Is there any native syntax available for Lucene so that I can query, by first generating the query in say an XML or SQL like format (cache this query) and then use this query over lucene index. e.g. So a lucene query syntax in which I can define a query (name="john" AND age <10) and then I can just use this query to execute over Lucene index. Regards Sunil - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: query term frequency
On Jan 27, 2005, at 10:24 PM, Jonathan Lasko wrote: No, the number of occurrences of a term in a Query. Nothing built-in gives you this. You'd have to dissect the Query clause-by-clause and cast each clause to the proper type to pull the terms from them. The Highlighter code does this. If there is a better way, I'd like to know. Erik Jonathan Quoting David Spencer <[EMAIL PROTECTED]>: Jonathan Lasko wrote: What do I call to get the term frequencies for terms in the Query? I can't seem to find it in the Javadoc... Do you mean the # of docs that have a term? http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/ IndexReader.html#docFreq(org.apache.lucene.index.Term) Thanks. Jonathan - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: query term frequency
No, the number of occurrences of a term in a Query. Jonathan Quoting David Spencer <[EMAIL PROTECTED]>: > Jonathan Lasko wrote: > > > What do I call to get the term frequencies for terms in the Query? I > > can't seem to find it in the Javadoc... > > Do you mean the # of docs that have a term? > > http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#docFreq(org.apache.lucene.index.Term) > > Thanks. > > > > Jonathan > > > > - > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > For additional commands, e-mail: [EMAIL PROTECTED] > > > > > - > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > > > - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: query term frequency
Jonathan Lasko wrote: What do I call to get the term frequencies for terms in the Query? I can't seem to find it in the Javadoc... Do you mean the # of docs that have a term? http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#docFreq(org.apache.lucene.index.Term) Thanks. Jonathan - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
query term frequency
What do I call to get the term frequencies for terms in the Query? I can't seem to find it in the Javadoc... Thanks. Jonathan - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: WordNet code updated, now with query expansion -- Re: SYNONYM + GOOGLE
Hi,

David Spencer wrote:
>> Do you plan to add expansion on other Wordnet relationships ? Hypernyms and hyponyms would be a good start point for thesaurus-like search, wouldn't it ?
> Good point, I hadn't considered this - but how would it work - just consider these 2 relationships "synonyms" (thus easier to use) or make it separate (too academic?)

Well... the ideal case would be (easy) customization :-), from an external text (XML?) file. Depending on the kind of relationship, the boost factor could be adjusted when the query is expanded. The same goes for the relationships' depths. For example, a "father" hypernym could have a boost factor of 0.8, a "grandfather" a boost factor of 0.4, and a "great-grandfather" a boost factor of 0.2. Well, I wonder whether a logarithmic scale makes better sense than a linear scale, but this should/would be customizable...

>> However, I'm afraid that this kind of feature would require refactoring, probably based on WordNet-dedicated libraries. JWNL (http://jwordnet.sourceforge.net/) may be a good candidate for this.
> Good point, should leverage existing code.

One thing you can also easily get from this library is WordNet's "exceptions", often irregular plurals (mouse/mice, addendum/addenda...). This is a very basic yet efficient kind of stemming, and such forms should be expanded with the same boost factor as the original term.

Well, there are many other relationships in WordNet. Take a look at:
http://jws-champo.ac-toulouse.fr:8080/treebolic-wordnet/
The legends are here:
http://treebolic.sourceforge.net/en/browserwn.htm

Cheers,
--
Pierrick Brihaye, informaticien
Service régional de l'Inventaire
DRAC Bretagne
mailto:[EMAIL PROTECTED]
+33 (0)2 99 29 67 78
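The 0.8 / 0.4 / 0.2 example amounts to halving the boost at each step up the hypernym chain. A minimal sketch of that rule, with the first-level boost and the decay factor left as parameters so they could come from a configuration file; the class RelationBoost and the method boostForDepth() are hypothetical names, not part of the sandbox WordNet code or JWNL.

```java
final class RelationBoost {

    // Hypothetical rule: depth 1 gets firstLevelBoost, each further level is
    // multiplied by decay (0.8 and 0.5 reproduce the 0.8 / 0.4 / 0.2 example).
    static double boostForDepth(double firstLevelBoost, double decay, int depth) {
        if (depth < 1) {
            throw new IllegalArgumentException("depth must be at least 1");
        }
        return firstLevelBoost * Math.pow(decay, depth - 1);
    }

    public static void main(String[] args) {
        for (int depth = 1; depth <= 3; depth++) {
            System.out.println("depth " + depth + " -> " + boostForDepth(0.8, 0.5, depth));
        }
    }
}
```

A different decay curve would only change the body of boostForDepth(); the expansion code that attaches the boost to each related term would stay the same.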
Re: WordNet code updated, now with query expansion -- Re: SYNONYM + GOOGLE
Pierrick Brihaye wrote: Hi, David Spencer a écrit : One example of expansion with the synonym boost set to 0.9 is the query "big dog" expands to: Interesting. Do you plan to add expansion on other Wordnet relationships ? Hypernyms and hyponyms would be a good start point for thesaurus-like search, wouldn't it ? Good point, I hadn't considered this - but how would it work -just consider these 2 relationships "synonyms" (thus easier to use) or make it separate (too academic?) However, I'm afraid that this kind of feature would require refactoring, probably based on WordNet-dedicated libraries. JWNL (http://jwordnet.sourceforge.net/) may be a good candidate for this. Good point, should leverage existing code. Thank you for your work. thx, Dave Cheers, - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: WordNet code updated, now with query expansion -- Re: SYNONYM + GOOGLE
Hi, David Spencer a écrit : One example of expansion with the synonym boost set to 0.9 is the query "big dog" expands to: Interesting. Do you plan to add expansion on other Wordnet relationships ? Hypernyms and hyponyms would be a good start point for thesaurus-like search, wouldn't it ? However, I'm afraid that this kind of feature would require refactoring, probably based on WordNet-dedicated libraries. JWNL (http://jwordnet.sourceforge.net/) may be a good candidate for this. Thank you for your work. Cheers, -- Pierrick Brihaye, informaticien Service régional de l'Inventaire DRAC Bretagne mailto:[EMAIL PROTECTED] +33 (0)2 99 29 67 78 - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
MoreLikeThis and other similarity query generators checked in + online demo
Based on mail from Doug I wrote a "more like this" query generator, named, well, MoreLikeThis. Bruce Ritchie and Mark Harwood made changes to it (esp. term vector support) and bug fixes. Thanks to everyone.

I've checked in the code to the sandbox under contributions/similarity. The package it ends up at is org.apache.lucene.search.similar -- hope that makes sense.

I also created a class, SimilarityQueries, to hold other methods of similarity query generation. The 2 methods in there are "dumber" variations that use the entire source of the target doc to form a large query.

Javadoc is here:
http://www.searchmorph.com/pub/jakarta-lucene-sandbox/contributions/similarity/build/docs/api/org/apache/lucene/search/similar/package-summary.html

Online demo here - the pages below compare the 3 variations on detecting similar docs. The timing info (3 numbers w/ "(ms)") may be suspect. Also note that if you scroll to the bottom you can see the queries that were generated.

Here's a page showing docs similar to the entry for Iraq:
http://www.searchmorph.com/kat/wikipedia-compare.jsp?s=Iraq

And here's one for docs similar to the one on Garry Kasparov (he knows how to play chess :) ):
http://www.searchmorph.com/kat/wikipedia-compare.jsp?s=Garry_Kasparov

To get to it you start here:
http://www.searchmorph.com/kat/wikipedia.jsp

And search for something - on the search results page follow a "cmp" link:
http://www.searchmorph.com/kat/wikipedia.jsp?s=iraq

Make sense? Useful? Has anyone done any other variations (e.g. cosine measure)?

- Dave
RE: Boolean Query
Um.. Nevermind.. I figured it out.. I was using the StandardAnalyzer when I built my index and thus didn't have N or St in the index itself. R -Original Message- From: Ryan Aslett Sent: Friday, January 14, 2005 11:18 AM To: Lucene Users List Subject: Boolean Query Okay, Im not grokking something here. Im trying to run a query that returns only the results that have *all* of the terms in my query string. When I run this query, which I construct myself and do a Analyzer analyzer = new WhitespaceAnalyzer(); QueryParser qp = new QueryParser("address", analyzer); Query addyQ = qp.parse(queryString); I get the following in addyQ.toString: Query: +address:122 +address:N +address:30th +address:St 0 total matching documents in 19 ms So the querystring is unchanged from what I created, and I get zilch. However, when I do this: Analyzer analyzer = new WhitespaceAnalyzer(); BooleanQuery baQ = new BooleanQuery(); Query parsedAddyQ = qp.parse("122 N 30th St"); baQ.add(parsedAddyQ, true, false); I get this: Query: +(address:122 address:N address:30th address:St) 62 total matching documents in 8 ms 0. 122 N 30th St 1. PO Box 122 2. PO Box 122 3. 122 N 9th St 4. 122 S Clay Ave 5. 122 E 3rd St ... So.. I want that first result, I know its in there, its matching as the highest match, but not when I require all 4 tokens? What gives? What am I doing wrong? Also, if it matters Im running the query on a parallelMultiSearcher with 280 indexes of 1 million records each. Ryan - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
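The mismatch can be reproduced without Lucene. The sketch below is a rough stand-in, not the real analyzers: indexTerms() only imitates the lowercasing that StandardAnalyzer applies at index time, queryTerms() imitates WhitespaceAnalyzer leaving case alone, and all names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class AnalyzerMismatch {

    // Stand-in for index-time analysis: split on whitespace and lowercase.
    static Set<String> indexTerms(String text) {
        Set<String> terms = new HashSet<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Stand-in for query-time analysis: split on whitespace, keep case.
    static List<String> queryTerms(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Every query term is required, like +a +b +c.
    static boolean matchesAll(Set<String> indexed, List<String> query) {
        return indexed.containsAll(query);
    }

    // At least one query term matches, like +(a b c).
    static boolean matchesAny(Set<String> indexed, List<String> query) {
        for (String q : query) {
            if (indexed.contains(q)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> indexed = indexTerms("122 N 30th St");
        List<String> query = queryTerms("122 N 30th St");
        System.out.println("all required: " + matchesAll(indexed, query));
        System.out.println("any optional: " + matchesAny(indexed, query));
    }
}
```

With "N" and "St" never matching the lowercased index terms, the all-required form finds nothing while the optional form still matches on "122" and "30th", which is the behaviour described in the original question. Using the same analyzer on both sides removes the discrepancy.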
Boolean Query
Okay, Im not grokking something here. Im trying to run a query that returns only the results that have *all* of the terms in my query string. When I run this query, which I construct myself and do a Analyzer analyzer = new WhitespaceAnalyzer(); QueryParser qp = new QueryParser("address", analyzer); Query addyQ = qp.parse(queryString); I get the following in addyQ.toString: Query: +address:122 +address:N +address:30th +address:St 0 total matching documents in 19 ms So the querystring is unchanged from what I created, and I get zilch. However, when I do this: Analyzer analyzer = new WhitespaceAnalyzer(); BooleanQuery baQ = new BooleanQuery(); Query parsedAddyQ = qp.parse("122 N 30th St"); baQ.add(parsedAddyQ, true, false); I get this: Query: +(address:122 address:N address:30th address:St) 62 total matching documents in 8 ms 0. 122 N 30th St 1. PO Box 122 2. PO Box 122 3. 122 N 9th St 4. 122 S Clay Ave 5. 122 E 3rd St ... So.. I want that first result, I know its in there, its matching as the highest match, but not when I require all 4 tokens? What gives? What am I doing wrong? Also, if it matters Im running the query on a parallelMultiSearcher with 280 indexes of 1 million records each. Ryan - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: WordNet code updated, now with query expansion -- Re: SYNONYM + GOOGLE
Daniel Naber <[EMAIL PROTECTED]> writes: > On Wednesday 12 January 2005 01:47, David Spencer wrote: > >> Amusingly then, documents with the terms "liberal wienerwurst" match >> "big dog"! :) > > There's something like frequency information in WordNet, it could probably > be used to ignore the uncommon meanings. If you just go search CiteSeer for "WordNet", you will find the output of every failed MS thesis experiment to improve retrieval performance by naive application of WordNet synsets. But I like the query expansion code. Ian - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: WordNet code updated, now with query expansion -- Re: SYNONYM + GOOGLE
On Wednesday 12 January 2005 01:47, David Spencer wrote: > Amusingly then, documents with the terms "liberal wienerwurst" match > "big dog"! :) There's something like frequency information in WordNet, it could probably be used to ignore the uncommon meanings. Regards Daniel -- http://www.danielnaber.de - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
WordNet code updated, now with query expansion -- Re: SYNONYM + GOOGLE
Erik Hatcher wrote: On Jan 10, 2005, at 6:54 PM, David Spencer wrote: Hi...I wrote the WordNet sandbox code - but I'm not sure if I undertand this thread. Are we saying that it does not work w/ the new WordNet data, or that code in Eric's book is better/more up to date etc? I have not tried the sandbox with any versions past WordNet 1.6. Karthik shows a Java API to it, which I have not used - only your code that parses the prolog files. So the book code explains exactly what is in the sandbox and describes WordNet 1.6 integration. Though WordNet has evolved. If needed I can update the sandbox code.. It'd be awesome to have current WordNet support - I haven't looked at what is involved in making it so. I verified that the code works w/ the latest WordNet (2.0), and it does so, no problem. The relevant data from WordNet has not changed so there's no need to upgrade WordNet for this package at least. I added "query expansion" which takes in a simple query string and for every term adds their synonyms. There's an optional boost parameter to be used to "penalize" synonyms if you want to use the heuristic that the user probably knows the right word. One example of expansion with the synonym boost set to 0.9 is the query "big dog" expands to: big adult^0.9 bad^0.9 bighearted^0.9 boastful^0.9 boastfully^0.9 bounteous^0.9 bountiful^0.9 braggy^0.9 crowing^0.9 freehanded^0.9 giving^0.9 grown^0.9 grownup^0.9 handsome^0.9 large^0.9 liberal^0.9 magnanimous^0.9 momentous^0.9 openhanded^0.9 prominent^0.9 swelled^0.9 vainglorious^0.9 vauntingly^0.9 dog andiron^0.9 blackguard^0.9 bounder^0.9 cad^0.9 chase^0.9 click^0.9 detent^0.9 dogtooth^0.9 firedog^0.9 frank^0.9 frankfurter^0.9 frump^0.9 heel^0.9 hotdog^0.9 hound^0.9 pawl^0.9 tag^0.9 tail^0.9 track^0.9 trail^0.9 weenie^0.9 wiener^0.9 wienerwurst^0.9 Amusingly then, documents with the terms "liberal wienerwurst" match "big dog"! 
:) Javadoc is here: http://www.searchmorph.com/pub/jakarta-lucene-sandbox/contributions/WordNet/build/docs/api/org/apache/lucene/wordnet/package-summary.html The new query expansion is here: http://www.searchmorph.com/pub/jakarta-lucene-sandbox/contributions/WordNet/build/docs/api/org/apache/lucene/wordnet/SynExpand.html Want to try it out? This page *expands* a query and prints out the result (but doesn't execute it yet). http://www.searchmorph.com/kat/synonym.jsp?syn=big CVS tree here: http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/contributions/WordNet/ If you just want to use a prebuild index it's here (1MB): http://searchmorph.com/pub/syn_index.zip The prebuilt jar file is here: http://www.searchmorph.com/pub/lucene-wordnet-dev.jar Redundant weblog entry here: http://www.searchmorph.com/weblog/index.php?id=34 Hope y'all like it and someone finds it useful, Dave PS Oh - it may need the 1.5 dev branch of Lucene to work - I'm not positive but it I tried to remove deprecated warnings and doing so may have tied it to the latest code... Erik - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
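For readers who just want the gist of the expansion step, here is a plain-Java sketch of the idea: keep each original term and append its synonyms with the penalty boost. It is not the SynExpand implementation; the class SynonymExpansion, the method expand(), and the tiny in-memory synonym map are assumptions standing in for the WordNet-derived index.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class SynonymExpansion {

    // Hypothetical helper: emit each term followed by its synonyms as syn^boost.
    static String expand(String query, Map<String, List<String>> synonyms, float boost) {
        StringBuilder out = new StringBuilder();
        for (String term : query.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(term);
            List<String> related =
                    synonyms.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.<String>emptyList());
            for (String syn : related) {
                out.append(' ').append(syn).append('^').append(boost);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Toy synonym table; the real data would come from the WordNet index.
        Map<String, List<String>> synonyms = new HashMap<>();
        synonyms.put("big", Arrays.asList("large", "adult"));
        synonyms.put("dog", Arrays.asList("hound"));
        System.out.println(expand("big dog", synonyms, 0.9f));
    }
}
```

A boost below 1.0 keeps documents containing the user's own words ranked ahead of documents that only match a synonym.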
Re: Query based stemming
Jim Lynch wrote:
> From what I've read, if you want to have a choice, the easiest way is to index the documents twice. Once with stemming on and once with it off placing the results in two different indexes. Then at query time, select which index you want to use based on whether you want stemming on or off.

IMHO keeping the data in the same index is easiest. PerFieldAnalyzerWrapper is part of the magic; approximate usage follows from my code below. The second bit of magic is to call doc.add(...) multiple times, "redundantly". Don't use the code below exactly, however: things like MySnowballStopAnalyzer should become SnowballAnalyzer in your code.

    Analyzer fa;

    Analyzer getAnalyzer() {
        Analyzer snowball = new MySnowballStopAnalyzer();
        Analyzer def = new AlphaNumStopAnalyzer(); // prob StandardAnalyzer for most people
        PerFieldAnalyzerWrapper fa = new PerFieldAnalyzerWrapper(def);
        fa.addAnalyzer("scontents", snowball); // "s" in "scontents" is for stemming
        fa.addAnalyzer("stitle", snowball);
        return fa;
    }

... later:

    Document doc = new Document();
    doc.add(Field.Text("title", title));
    doc.add(Field.Text("stitle", new StringReader(title))); // don't need recall
    String body = ...;
    doc.add(Field.Text("contents", new StringReader(body), true)); // term vector
    doc.add(Field.Text("scontents", new StringReader(body)));
    writer.addDocument(doc);

> Jim.
>
> Peter Kim wrote:
>> Hi, I'm new to Lucene, so I apologize if this issue has been discussed before (I'm sure it has), but I had a hard time finding an answer using google. (Maybe this would be a good candidate for the FAQ!) :) Is it possible to enable stem queries on a per-query basis? It doesn't seem to be possible since the stem tokenizing is done during the indexing process. Are people basically stuck with having all their queries stemmed or none at all? Thanks!
>> Peter
Re: Query based stemming
: >Is it possible to enable stem queries on a per-query basis? It doesn't
: >seem to be possible since the stem tokenizing is done during the
: >indexing process. Are people basically stuck with having all their
: >queries stemmed or none at all?
: From what I've read, if you want to have a choice, the easiest way is
: to index the documents twice. Once with stemming on and once with it off
: placing the results in two different indexes. Then at query time,
: select which index you want to use based on whether you want stemming on
: or off.

As I understand it, the intended place to implement stemming is in an Analyzer filter (not to be confused with a search Filter). Since you can specify an Analyzer when you call addDocument, you don't actually have to have two separate indexes; you could just have all the docs in one index and use a search Filter to indicate which docs to look at.

Alternately: the Analyzer's tokenStream method is given the fieldName being analyzed, so you could write an Analyzer with a set of rules telling it to only apply your stemming filter to certain fields. Then, instead of having twice as many documents, you can just index your text in two separate fields (which should be a little easier than separate docs, because you are only duplicating the fields where stemming is relevant). At search time you don't have to filter anything; just search the field that's applicable to your current desire (stemmed or unstemmed).

Lastly: although it's tricky to get correct, there's no law saying you have to use the same Analyzer when you query as when you index. You could index your documents using an Analyzer that does no stemming, and then at search time (if you want stemming) use an Analyzer that does "reverse stemming" to expand your query terms out to all the possible variants. (NOTE: I've never actually tried this, but I think the theory is sound.)

-Hoss
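To illustrate the "reverse stemming" idea, here is an untested-against-Lucene, plain-Java sketch: group the unstemmed vocabulary by stem once, then expand a query term into every surface form that shares its stem. Everything here is an assumption for illustration, in particular toyStem(), which is a crude suffix stripper and not Porter or Snowball, and the vocabulary list, which in practice would be read from the index's term list.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

final class ReverseStemming {

    // Toy stemmer for illustration only: strips one of a few English suffixes.
    static String toyStem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        for (String suffix : new String[] {"ing", "ed", "s"}) {
            if (w.length() > suffix.length() + 2 && w.endsWith(suffix)) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    // Group the (unstemmed) vocabulary by stem.
    static Map<String, SortedSet<String>> variantsByStem(Collection<String> vocabulary) {
        Map<String, SortedSet<String>> variants = new HashMap<>();
        for (String word : vocabulary) {
            variants.computeIfAbsent(toyStem(word), k -> new TreeSet<>())
                    .add(word.toLowerCase(Locale.ROOT));
        }
        return variants;
    }

    // Expand one query term into an OR over all known variants of its stem.
    static String expand(String field, String term, Map<String, SortedSet<String>> variants) {
        SortedSet<String> forms = variants.get(toyStem(term));
        if (forms == null) {
            return field + ":" + term.toLowerCase(Locale.ROOT);
        }
        StringBuilder sb = new StringBuilder("(");
        for (String form : forms) {
            if (sb.length() > 1) {
                sb.append(" OR ");
            }
            sb.append(field).append(':').append(form);
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        Map<String, SortedSet<String>> variants =
                variantsByStem(Arrays.asList("walk", "walks", "walked", "walking", "talk"));
        System.out.println(expand("contents", "walking", variants));
    }
}
```

The expansion can get large for common stems, so the clause limit on boolean queries is something to watch if this is tried for real.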
Re: Query based stemming
From what I've read, if you want to have a choice, the easiest way is to index the documents twice. Once with stemming on and once with it off placing the results in two different indexes. Then at query time, select which index you want to use based on whether you want stemming on or off. Jim. Peter Kim wrote: Hi, I'm new to Lucene, so I apologize if this issue has been discussed before (I'm sure it has), but I had a hard time finding an answer using google. (Maybe this would be a good candidate for the FAQ!) :) Is it possible to enable stem queries on a per-query basis? It doesn't seem to be possible since the stem tokenizing is done during the indexing process. Are people basically stuck with having all their queries stemmed or none at all? Thanks! Peter - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Query based stemming
Hi, I'm new to Lucene, so I apologize if this issue has been discussed before (I'm sure it has), but I had a hard time finding an answer using google. (Maybe this would be a good candidate for the FAQ!) :) Is it possible to enable stem queries on a per-query basis? It doesn't seem to be possible since the stem tokenizing is done during the indexing process. Are people basically stuck with having all their queries stemmed or none at all? Thanks! Peter - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Span Query Performance
Sorry for the duplicate on lucene-dev, it should have gone to lucene-user directly: A bit more: On Thursday 06 January 2005 10:22, Paul Elschot wrote: > On Thursday 06 January 2005 02:17, Andrew Cunningham wrote: > > Hi all, > > > > I'm currently doing a query similar to the following: > > > > for w in wordset: > > query = w near (word1 V word2 V word3 ... V word1422); > > perform query > > > > and I am doing this through SpanQuery.getSpans(), iterating through the > > spans and counting > > the matches, which can result in 4782282 matches (essentially I am only > > after the match count). > > The query works but the performance can be somewhat slow; so I am wondering: > > ... > > c) Is there a faster method to what I am doing I should consider? > > Preindexing all word combinations that you're interested in. > In case you know all the words in advance, you could also index a helper word at the same position as each of those words. This requires a custom analyzer that inserts the helper word in the token stream with a zero position increment. The query then simplifies to: query = w near helperword which would probably speed things up significantly. Regards, Paul Elschot - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
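The helper-word trick can be pictured with a small plain-Java model of a token stream. This is a sketch under assumptions, not a Lucene analyzer: the Token class here is a simplified stand-in for Lucene's token with its position increment, and the names HelperWordInjection, inject(), positionsOf() and the helper term "_animal" are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class HelperWordInjection {

    // Simplified stand-in for an analyzer token: a term plus its position increment.
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    // After every target word, emit the helper word with increment 0 so both
    // tokens occupy the same position.
    static List<Token> inject(List<String> words, Set<String> targets, String helper) {
        List<Token> out = new ArrayList<>();
        for (String word : words) {
            out.add(new Token(word, 1));
            if (targets.contains(word)) {
                out.add(new Token(helper, 0));
            }
        }
        return out;
    }

    // Absolute positions at which a term occurs, derived from the increments.
    static List<Integer> positionsOf(List<Token> tokens, String term) {
        List<Integer> positions = new ArrayList<>();
        int position = -1;
        for (Token token : tokens) {
            position += token.positionIncrement;
            if (token.term.equals(term)) {
                positions.add(position);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        Set<String> targets = new HashSet<>(Arrays.asList("cat", "dog"));
        List<Token> tokens =
                inject(Arrays.asList("the", "cat", "sat", "near", "dog"), targets, "_animal");
        System.out.println("helper positions: " + positionsOf(tokens, "_animal"));
        System.out.println("cat positions:    " + positionsOf(tokens, "cat"));
    }
}
```

Since the helper shares a position with each of the 1422 words, a single "w near helperword" span query sees the same proximity relationships as the large disjunction.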
Re: Span Query Performance
On Thursday 06 January 2005 02:17, Andrew Cunningham wrote:
> Hi all,
>
> I'm currently doing a query similar to the following:
>
> for w in wordset:
>     query = w near (word1 V word2 V word3 ... V word1422);
>     perform query
>
> and I am doing this through SpanQuery.getSpans(), iterating through the
> spans and counting the matches, which can result in 4782282 matches
> (essentially I am only after the match count).
> The query works but the performance can be somewhat slow; so I am wondering:
>
> a) Would the query potentially run faster if I used
> Searcher.search(query) with a custom similarity,
> or do both methods essentially use the same mechanics

It would be somewhat slower, because it loops over the getSpans() and computes document scores and constructs a Hits from the scores.

> b) Does using a RAMDirectory improve query performance any significant
> amount.

That depends on your operating system, the size of the index, the amount of RAM you can use, the file buffering efficiency, other loads on the computer ...

> c) Is there a faster method to what I am doing I should consider?

Preindexing all word combinations that you're interested in.

Regards,
Paul Elschot
Span Query Performance
Hi all,

I'm currently doing a query similar to the following:

    for w in wordset:
        query = w near (word1 V word2 V word3 ... V word1422);
        perform query

and I am doing this through SpanQuery.getSpans(), iterating through the spans and counting the matches, which can result in 4782282 matches (essentially I am only after the match count). The query works but the performance can be somewhat slow; so I am wondering:

a) Would the query potentially run faster if I used Searcher.search(query) with a custom similarity, or do both methods essentially use the same mechanics?

b) Does using a RAMDirectory improve query performance by any significant amount?

c) Is there a faster method to what I am doing that I should consider?

Thanks,
Andrew
Re: Correct query
On Dec 27, 2004, at 6:28 AM, Alex Kiselevski wrote:
> Thanks Erik,
> I use StandardAnalyzer to index RPG/4. I use StandardAnalyzer and
> IndexSearcher with TermQuery without QueryParser. So, I thought that as
> a result of query Text:RPG I still have to get some hit, but it didn't
> happen.

    StandardAnalyzer: [rpg/4]

As you can see, StandardAnalyzer tokenized RPG/4 as "rpg/4". A TermQuery must be *exactly* that to match.

You could, alternatively, bypass QueryParser and use the analyzer directly, making a TermQuery (or PhraseQuery) out of the results. I do this in quite a few queries in the system I'm building for my primary work to allow queries against some library archives.

This is one of the most critical, but seemingly misunderstood, aspects of using Lucene effectively: how to manage the analysis process and match it to the searching side.

Erik
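The point about matching analysis on the indexing and searching sides can be shown with a small self-contained sketch. It does not use Lucene: `AnalysisMatchSketch` and its two analyzer functions are hypothetical stand-ins that merely reproduce, for the input "RPG/4", the token shapes reported in this thread ("rpg/4" as one token versus "rpg" alone). The exact lookup plays the role of a TermQuery, which applies no analysis of its own.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

class AnalysisMatchSketch {
    // Stand-in for an analyzer that keeps "rpg/4" whole: lowercase, split on whitespace.
    static List<String> keepSlash(String text) {
        return nonEmpty(text.toLowerCase().split("\\s+"));
    }

    // Stand-in for a letters-only analyzer: lowercase, split on anything that is not a-z.
    static List<String> lettersOnly(String text) {
        return nonEmpty(text.toLowerCase().split("[^a-z]+"));
    }

    private static List<String> nonEmpty(String[] parts) {
        List<String> out = new ArrayList<>();
        for (String p : parts) {
            if (!p.isEmpty()) out.add(p);
        }
        return out;
    }

    // "Indexing": the set of terms the chosen analyzer produces for the text.
    static Set<String> index(String text, Function<String, List<String>> analyzer) {
        return new HashSet<>(analyzer.apply(text));
    }

    // Exact term lookup with no analysis, in the spirit of a TermQuery.
    static boolean termMatches(Set<String> indexedTerms, String term) {
        return indexedTerms.contains(term);
    }

    // Run the user's text through the same analyzer, then look up each resulting term.
    static boolean analyzedMatch(Set<String> indexedTerms, String queryText,
                                 Function<String, List<String>> analyzer) {
        List<String> terms = analyzer.apply(queryText);
        return !terms.isEmpty() && indexedTerms.containsAll(terms);
    }

    public static void main(String[] args) {
        Set<String> wholeToken = index("RPG/4", AnalysisMatchSketch::keepSlash);
        System.out.println(termMatches(wholeToken, "RPG"));   // raw term differs from "rpg/4"
        System.out.println(termMatches(wholeToken, "rpg/4")); // exact indexed term

        Set<String> letters = index("RPG/4", AnalysisMatchSketch::lettersOnly);
        System.out.println(analyzedMatch(letters, "RPG", AnalysisMatchSketch::lettersOnly));
    }
}
```

In this toy model, a search for "RPG" only finds the document when the index was built with an analyzer that produces "rpg" and the query text goes through that same analyzer.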
RE: Correct query
Thanks Erik,

I use StandardAnalyzer to index RPG/4. I use StandardAnalyzer and IndexSearcher with TermQuery, without QueryParser. So I thought that the query Text:RPG would still get some hits, but it didn't happen.

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Monday, December 27, 2004 11:51 AM
To: Lucene Users List
Subject: Re: Correct query

On Dec 27, 2004, at 3:21 AM, Alex Kiselevski wrote:
> Hello,
> I indexed some document that included the word RPG/4.
> So, when I made a search, I built a query
>
> Text:RPG but it didn't find a thing; only Text:RPG/4 gave me the
> correct result. Tell me please what I have to do to build a dynamic
> (not hardcoded like in this example) query to get the right results

What Analyzer did you use? Are you using QueryParser and using the same analyzer with it?

Please read the AnalysisParalysis page on the wiki. Also, running the AnalyzerDemo from Lucene in Action's source code yields this, which should help illuminate the situation:

    $ ant -emacs AnalyzerDemo
    Buildfile: build.xml

    AnalyzerDemo:
    Demonstrates analysis of sample text. Refer to the "Analysis" chapter
    for much more on this extremely crucial topic.

    Press return to continue...
    String to analyze: [This string will be analyzed.] RPG/4
    Running lia.analysis.AnalyzerDemo...

    Analyzing "RPG/4"
      WhitespaceAnalyzer: [RPG/4]
      SimpleAnalyzer: [rpg]
      StopAnalyzer: [rpg]
      StandardAnalyzer: [rpg/4]