[jira] Resolved: (LUCENE-1951) wildcardquery rewrite improvements

2009-10-08 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-1951.


   Resolution: Fixed
Fix Version/s: 3.0

Thanks Robert!

> wildcardquery rewrite improvements
> --
>
> Key: LUCENE-1951
> URL: https://issues.apache.org/jira/browse/LUCENE-1951
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Query/Scoring
>Reporter: Robert Muir
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 3.0
>
> Attachments: LUCENE-1951.patch, LUCENE-1951.patch, 
> LUCENE-1951_bwcompatbranch.patch
>
>
> wildcardquery has logic to rewrite to termquery if there is no wildcard 
> character, but
> * it needs to pass along the boost if it does this
> * if the user asked for a 'constant score' rewriteMethod, it should rewrite 
> to a constant score query for consistency.
> additionally, if the query is really a prefixquery, it would be nice to 
> rewrite to prefix query.
> both will enumerate the same number of terms, but prefixquery has a simpler 
> comparison function.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1951) wildcardquery rewrite improvements

2009-10-08 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763599#action_12763599
 ] 

Robert Muir commented on LUCENE-1951:
-

bq. That is a rather roundabout way to arrive at the TermQuery, but I think the 
test is fine? 

Ok, that was my only concern, the test. I like the SingleTermEnum otherwise, I 
think it will reduce maintenance.




[jira] Commented: (LUCENE-1951) wildcardquery rewrite improvements

2009-10-08 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763597#action_12763597
 ] 

Michael McCandless commented on LUCENE-1951:


That is a rather roundabout way to arrive at the TermQuery, but I think the 
test is fine?




[jira] Commented: (LUCENE-1951) wildcardquery rewrite improvements

2009-10-08 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763580#action_12763580
 ] 

Robert Muir commented on LUCENE-1951:
-

Michael, cool. The bw_compat patch is still valid with these changes.

I will mention one concern, just for the record (you can tell me if it is an 
issue).

These tests check that, for example, a WildcardQuery with SCORING_REWRITE 
rewrites to a TermQuery, which is correct, but now it's a bit weird how this 
happens.

SingleTermEnum -> MultiTermQuery -> BooleanQuery with one term -> TermQuery.
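
A toy model of that chain may make the path easier to follow. It uses stand-in classes, not Lucene's own: the names RewriteChainSketch, Q, scoringBooleanRewrite and collapse are hypothetical, and the single-clause collapse with multiplied boosts is an assumption about the general shape of the behaviour, not a copy of the BooleanQuery source.

```java
import java.util.ArrayList;
import java.util.List;

final class RewriteChainSketch {
    /** Toy query node: a kind tag, an optional term, a boost, and child clauses. */
    static final class Q {
        final String kind;
        final String term;
        final float boost;
        final List<Q> clauses;

        Q(String kind, String term, float boost, List<Q> clauses) {
            this.kind = kind;
            this.term = term;
            this.boost = boost;
            this.clauses = clauses;
        }
    }

    /** Collects the enumerated terms into one boolean query that carries the boost. */
    static Q scoringBooleanRewrite(List<String> enumeratedTerms, float boost) {
        List<Q> clauses = new ArrayList<Q>();
        for (String t : enumeratedTerms) {
            clauses.add(new Q("TermQuery", t, 1.0f, new ArrayList<Q>()));
        }
        return new Q("BooleanQuery", null, boost, clauses);
    }

    /** A boolean query with exactly one clause collapses to that clause (assumed behaviour). */
    static Q collapse(Q q) {
        if (q.kind.equals("BooleanQuery") && q.clauses.size() == 1) {
            Q only = q.clauses.get(0);
            return new Q(only.kind, only.term, only.boost * q.boost, only.clauses);
        }
        return q;
    }
}
```

With a single enumerated term the result is a plain term node that still carries the boost, which is why the test ends up exercising more than WildcardQuery itself.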

I couldn't think of a better way to test the correct behavior, but it is testing 
a bit more than just what happens in WildcardQuery...





[jira] Commented: (LUCENE-1951) wildcardquery rewrite improvements

2009-10-08 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763579#action_12763579
 ] 

Michael McCandless commented on LUCENE-1951:


Patch looks good Robert!  Thanks.  I'll commit soon.




[jira] Updated: (LUCENE-1951) wildcardquery rewrite improvements

2009-10-08 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-1951:


Attachment: LUCENE-1951.patch

updated patch, using SingleTermEnum instead of TermQuery rewrite when there are 
no wildcards to preserve all the MultiTermQuery semantics.





[jira] Commented: (LUCENE-1951) wildcardquery rewrite improvements

2009-10-08 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763521#action_12763521
 ] 

Michael McCandless commented on LUCENE-1951:


bq. think there would be objection to making this proposed SingleTermEnum 
public?

I think that's fine.




[jira] Commented: (LUCENE-1951) wildcardquery rewrite improvements

2009-10-08 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763507#action_12763507
 ] 

Robert Muir commented on LUCENE-1951:
-

think there would be objection to making this proposed SingleTermEnum public?

I would like to use it in LUCENE-1606 (contrib) to have consistency there as 
well.




[jira] Commented: (LUCENE-1951) wildcardquery rewrite improvements

2009-10-08 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763468#action_12763468
 ] 

Robert Muir commented on LUCENE-1951:
-

Michael, I thought about this problem too, but didn't know what to do.

I rather like the SingleTermEnum idea. I'll do it.





[jira] Commented: (LUCENE-1951) wildcardquery rewrite improvements

2009-10-08 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763429#action_12763429
 ] 

Michael McCandless commented on LUCENE-1951:


Patch looks good, thanks Robert!  And those are good perf numbers;
rewriting to PrefixQuery seems a clear win.

The only thing that makes me nervous here is we've baked-in MTQ's
rewrite logic into WildcardQuery.rewrite.  Ie, MTQ in general accepts
any rewrite method, and so conceivably one could create their own
rewrite method and then see that it's unused in the special case where
WildcardQuery is a single term.

And while it's true today that if the rewrite method != scoring
boolean query, it must be a constant scoring one, that could
conceivably some day change.

Maybe a different approach would be to make a degenerate
"SingleTermEnum" (subclasses FilteredTermEnum) that produces only a
single term?  Then in getEnum we could return that, instead, so the
rewrite method remains intact?
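
A minimal sketch of that idea, using plain Java collections as a stand-in for the term dictionary. Everything here is hypothetical (SingleTermEnumSketch, RewriteMethod, getEnum and the rest are illustration names, not the FilteredTermEnum API); the point is only that the special case lives in the enumeration, so whatever rewrite method the caller supplies still runs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.SortedSet;

final class SingleTermEnumSketch {
    /** Hypothetical rewrite strategy: turns the enumerated terms into a description. */
    interface RewriteMethod {
        String rewrite(Iterator<String> terms, float boost);
    }

    /** Degenerate enumeration: yields the term once if it is indexed, otherwise nothing. */
    static Iterator<String> singleTermEnum(SortedSet<String> indexedTerms, String term) {
        List<String> hit = indexedTerms.contains(term)
                ? Collections.singletonList(term)
                : Collections.<String>emptyList();
        return hit.iterator();
    }

    /** Prefix enumeration over the sorted dictionary, for patterns such as "foo*". */
    static Iterator<String> prefixEnum(SortedSet<String> indexedTerms, String prefix) {
        List<String> hits = new ArrayList<String>();
        for (String t : indexedTerms.tailSet(prefix)) {
            if (!t.startsWith(prefix)) {
                break; // sorted order: no later term can match
            }
            hits.add(t);
        }
        return hits.iterator();
    }

    /** Only the enumeration is special-cased; this toy handles a trailing '*' or no wildcard. */
    static Iterator<String> getEnum(SortedSet<String> indexedTerms, String pattern) {
        if (pattern.endsWith("*")) {
            return prefixEnum(indexedTerms, pattern.substring(0, pattern.length() - 1));
        }
        return singleTermEnum(indexedTerms, pattern);
    }

    /** The caller's rewrite method always runs, whatever shape the pattern has. */
    static String rewrite(SortedSet<String> indexedTerms, String pattern, float boost,
                          RewriteMethod method) {
        return method.rewrite(getEnum(indexedTerms, pattern), boost);
    }
}
```

A custom RewriteMethod passed to rewrite(...) sees one term for "foo" and several for "fo*", with no separate code path that could bypass it.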




[jira] Assigned: (LUCENE-1951) wildcardquery rewrite improvements

2009-10-08 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reassigned LUCENE-1951:
--

Assignee: Michael McCandless




[jira] Updated: (LUCENE-1951) wildcardquery rewrite improvements

2009-10-07 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-1951:


Attachment: LUCENE-1951_bwcompatbranch.patch

patch to the back compat branch to fix the buggy wildcard rewrite test.




[jira] Updated: (LUCENE-1951) wildcardquery rewrite improvements

2009-10-07 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-1951:


Attachment: LUCENE-1951.patch

patch for these issues.
note: the existing TestWildCardQuery.testTermWithoutWildcard had a bad test, 
and needs to be fixed for bw-compat branch.

it did the following:
{code}
Query wq = new WildcardQuery(new Term("field", "nowildcard"));
wq = searcher.rewrite(wq);
assertTrue(wq instanceof TermQuery);
{code}

this is not correct: it should only be a TermQuery when rewriteMethod is 
SCORING_BOOLEAN_QUERY_REWRITE, and that is not the default (constant score is).

the easiest way to fix the old test is to call 
setRewriteMethod(SCORING_BOOLEAN_QUERY_REWRITE); it's the only case where it 
should rewrite to TermQuery.





[jira] Commented: (LUCENE-1951) wildcardquery rewrite improvements

2009-10-07 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763118#action_12763118
 ] 

Robert Muir commented on LUCENE-1951:
-

here are some stats for rewriting wildcards that should be prefix queries.
i query on a field with roughly 10M numeric terms (a unique database id), 
average length 10 characters or so.
i copied the index into a RAMDirectory to try to rule out i/o a bit (it's only 
a 1GB index and i use a 4GB heap)

I look for all the ones starting with "1" (about 1.5 million of these). I did 3 
runs, 100 queries each.
here are average times for each.

||Run||wildcardquery("1*")||prefixquery("1")||
|1|1181ms|973ms|
|2|1179ms|966ms|
|3|1079ms|963ms|

So, it's not a big optimization, but it seems consistent, and it may matter 
more if the average term length is longer: in that case wildcard's comparison 
function might have to do even more work.
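
To put the table in relative terms, here is a small calculation over the three reported runs. The class and method names (PrefixSpeedup, percentSaved, mean) are made up for this sketch; the millisecond figures are the ones listed above.

```java
final class PrefixSpeedup {
    /** Percentage of time saved by the prefix query relative to the wildcard query. */
    static double percentSaved(double wildcardMs, double prefixMs) {
        return 100.0 * (wildcardMs - prefixMs) / wildcardMs;
    }

    /** Arithmetic mean of the given values. */
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        double[] wildcardMs = {1181, 1179, 1079};
        double[] prefixMs = {973, 966, 963};
        for (int run = 0; run < wildcardMs.length; run++) {
            System.out.printf("run %d: %.1f%% saved%n",
                    run + 1, percentSaved(wildcardMs[run], prefixMs[run]));
        }
        System.out.printf("overall: %.1f%% saved%n",
                percentSaved(mean(wildcardMs), mean(prefixMs)));
    }
}
```

That works out to roughly 11% to 18% per run, and about 16% on the averaged times.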

I'll work on a patch to fix the boost/constant score and include a prefixquery 
rewrite for this case.





Re: wildcardquery rewrite()

2009-10-07 Thread Robert Muir
Mark, I am set up to do these tests with a large term dict. I will see
if there is any improvement.

In my opinion, even if the improvement is very small, if it's trivial
to rewrite to a faster/simpler query, we should.
Future improvements to Lucene might make the simpler query
even faster, etc.

On Wed, Oct 7, 2009 at 8:08 AM, Mark Miller  wrote:
> bq. I don't think the prefix enumeration is really that much faster than
> the wildcard one,
>
> We should do some tests. If it is much faster, this would be a nice
> optimization. I think it could be worth it when matching a
> lot of terms - never tested though.
>
> Robert Muir wrote:
>> separately, perhaps we should consider doing the prefixquery rewrite
>> here for wildcardquery.
>>
>> for example, SolrQueryParser will emit these 'wildcardqueries that
>> should be prefixqueries' if you are using the new reverse stuff for
>> leading wildcards: WildcardQuery(*foobar) ->
>> WildcardQuery(U+0001raboof*)
>>
>> I don't think the prefix enumeration is really that much faster than
>> the wildcard one, but still thought I would mention it.
>>
>> On Tue, Oct 6, 2009 at 10:22 PM, Robert Muir  wrote:
>>
>>> someone asked this question on the user list:
>>> http://www.lucidimagination.com/search/document/6f38de391b242102/prefixquery_vs_wildcardquery
>>>
>>> it made me look at the wildcard rewrite(), where i see this:
>>>    if (!termContainsWildcard)
>>>      return new TermQuery(getTerm());
>>>
>>> is it a problem the boost is not preserved in this special case?
>>>
>>> is it also a problem that if the user sets the default MultiTermQuery
>>> rewriteMethod to say, CONSTANT_SCORE_FILTER_REWRITE,
>>> that this rewritten TermQuery isn't wrapped with a constant score?
>>>
>>> Sorry if it seems a bit nitpicky, really the issue is that I want to
>>> do the right thing for a more complex query I am working on, but don't
>>> want to overkill either.
>>> --
>>> Robert Muir
>>> rcm...@gmail.com
>>>
>>>
>>
>>
>>
>>
>
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>
> -
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>
>



-- 
Robert Muir
rcm...@gmail.com




Re: wildcardquery rewrite()

2009-10-07 Thread Mark Miller
bq. I don't think the prefix enumeration is really that much faster than
the wildcard one,

We should do some tests. If it is much faster, this would be a nice
optimization. I think it could be worth it when matching a
lot of terms - never tested though.

Robert Muir wrote:
> separately, perhaps we should consider doing the prefixquery rewrite
> here for wildcardquery.
>
> for example, SolrQueryParser will emit these 'wildcardqueries that
> should be prefixqueries' if you are using the new reverse stuff for
> leading wildcards: WildcardQuery(*foobar) ->
> WildcardQuery(U+0001raboof*)
>
> I don't think the prefix enumeration is really that much faster than
> the wildcard one, but still thought I would mention it.
>
> On Tue, Oct 6, 2009 at 10:22 PM, Robert Muir  wrote:
>   
>> someone asked this question on the user list:
>> http://www.lucidimagination.com/search/document/6f38de391b242102/prefixquery_vs_wildcardquery
>>
>> it made me look at the wildcard rewrite(), where i see this:
>>    if (!termContainsWildcard)
>>      return new TermQuery(getTerm());
>>
>> is it a problem the boost is not preserved in this special case?
>>
>> is it also a problem that if the user sets the default MultiTermQuery
>> rewriteMethod to say, CONSTANT_SCORE_FILTER_REWRITE,
>> that this rewritten TermQuery isn't wrapped with a constant score?
>>
>> Sorry if it seems a bit nitpicky, really the issue is that I want to
>> do the right thing for a more complex query I am working on, but don't
>> want to overkill either.
>> --
>> Robert Muir
>> rcm...@gmail.com
>>
>> 
>
>
>
>   


-- 
- Mark

http://www.lucidimagination.com







Re: wildcardquery rewrite()

2009-10-07 Thread Robert Muir
sure, I will submit a patch under LUCENE-1951.

will look around at other rewrites too, just to be sure there aren't others.

Thanks,
Robert

On Wed, Oct 7, 2009 at 5:15 AM, Michael McCandless
 wrote:
> I agree, this looks like a bug (boost & constant-score-ness is lost)
> -- wanna open an issue & patch it?
>
> Mike
>
> On Tue, Oct 6, 2009 at 10:22 PM, Robert Muir  wrote:
>> someone asked this question on the user list:
>> http://www.lucidimagination.com/search/document/6f38de391b242102/prefixquery_vs_wildcardquery
>>
>> it made me look at the wildcard rewrite(), where i see this:
>>    if (!termContainsWildcard)
>>      return new TermQuery(getTerm());
>>
>> is it a problem the boost is not preserved in this special case?
>>
>> is it also a problem that if the user sets the default MultiTermQuery
>> rewriteMethod to say, CONSTANT_SCORE_FILTER_REWRITE,
>> that this rewritten TermQuery isn't wrapped with a constant score?
>>
>> Sorry if it seems a bit nitpicky, really the issue is that I want to
>> do the right thing for a more complex query I am working on, but don't
>> want to overkill either.
>> --
>> Robert Muir
>> rcm...@gmail.com
>>
>> -
>> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>>
>>
>
> -
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>
>



-- 
Robert Muir
rcm...@gmail.com




[jira] Created: (LUCENE-1951) wildcardquery rewrite improvements

2009-10-07 Thread Robert Muir (JIRA)
wildcardquery rewrite improvements
--

 Key: LUCENE-1951
 URL: https://issues.apache.org/jira/browse/LUCENE-1951
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Query/Scoring
Reporter: Robert Muir
Priority: Minor


wildcardquery has logic to rewrite to termquery if there is no wildcard 
character, but
* it needs to pass along the boost if it does this
* if the user asked for a 'constant score' rewriteMethod, it should rewrite to 
a constant score query for consistency.

additionally, if the query is really a prefixquery, it would be nice to rewrite 
to prefix query.
both will enumerate the same number of terms, but prefixquery has a simpler 
comparison function.
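
The three cases in the description can be sketched as a small classifier. This is only an illustration under stated assumptions: WildcardShape, classify and describeRewrite are hypothetical names, the returned strings are descriptions rather than real query objects, and '*' and '?' are taken to be the only wildcard characters.

```java
final class WildcardShape {
    enum Shape { TERM, PREFIX, WILDCARD }

    /** TERM: no wildcard; PREFIX: a single trailing '*' and no '?'; otherwise WILDCARD. */
    static Shape classify(String text) {
        boolean hasStar = text.indexOf('*') != -1;
        boolean hasQuestion = text.indexOf('?') != -1;
        if (!hasStar && !hasQuestion) {
            return Shape.TERM;
        }
        if (!hasQuestion && text.indexOf('*') == text.length() - 1) {
            return Shape.PREFIX;
        }
        return Shape.WILDCARD;
    }

    /** Describes the rewrite asked for above: boost kept, constant score honoured. */
    static String describeRewrite(String text, float boost, boolean constantScore) {
        String inner;
        switch (classify(text)) {
            case TERM:
                inner = "TermQuery(" + text + ")";
                if (constantScore) {
                    inner = "ConstantScore(" + inner + ")";
                }
                break;
            case PREFIX:
                // Drop the trailing '*' to obtain the prefix itself.
                inner = "PrefixQuery(" + text.substring(0, text.length() - 1) + ")";
                break;
            default:
                inner = "WildcardQuery(" + text + ")";
                break;
        }
        return inner + "^" + boost;
    }
}
```

For example, "nowildcard" classifies as TERM, "1*" as PREFIX, and "1*5" or "a?c" as WILDCARD.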




Re: wildcardquery rewrite()

2009-10-07 Thread Robert Muir
i think it does this already. so like i said, it would be a minor optimization.

On Wed, Oct 7, 2009 at 6:16 AM, Simon Willnauer
 wrote:
> This should be handled in WildcardTermEnum instead of overriding
> MultiTermQuery#rewrite(). The WildcardTermEnum could simply return
> false in termCompare if a term is not equal to the "prefix". This
> would yield consistent behaviour even if a custom RewriteMethod is
> used. Right?!
>
> simon
>
> On Wed, Oct 7, 2009 at 11:17 AM, Michael McCandless
>  wrote:
>> +1
>>
>> I think it ought to be faster?  PrefixTermEnum just calls .startsWith
>> on each term text, but WildcardTermEnum has big hairy logic in
>> wildcardEquals.
>>
>> Mike
>>
>> On Tue, Oct 6, 2009 at 11:43 PM, Robert Muir  wrote:
>>> separately, perhaps we should consider doing the prefixquery rewrite
>>> here for wildcardquery.
>>>
>>> for example, SolrQueryParser will emit these 'wildcardqueries that
>>> should be prefixqueries' if you are using the new reverse stuff for
>>> leading wildcards: WildcardQuery(*foobar) ->
>>> WildcardQuery(U+0001raboof*)
>>>
>>> I don't think the prefix enumeration is really that much faster than
>>> the wildcard one, but still thought I would mention it.
>>>
>>> On Tue, Oct 6, 2009 at 10:22 PM, Robert Muir  wrote:
>>>> someone asked this question on the user list:
>>>> http://www.lucidimagination.com/search/document/6f38de391b242102/prefixquery_vs_wildcardquery
>>>>
>>>> it made me look at the wildcard rewrite(), where i see this:
>>>>    if (!termContainsWildcard)
>>>>      return new TermQuery(getTerm());
>>>>
>>>> is it a problem the boost is not preserved in this special case?
>>>>
>>>> is it also a problem that if the user sets the default MultiTermQuery
>>>> rewriteMethod to say, CONSTANT_SCORE_FILTER_REWRITE,
>>>> that this rewritten TermQuery isn't wrapped with a constant score?
>>>>
>>>> Sorry if it seems a bit nitpicky, really the issue is that I want to
>>>> do the right thing for a more complex query I am working on, but don't
>>>> want to overkill either.
>>>> --
>>>> Robert Muir
>>>> rcm...@gmail.com
>>>>
>>>
>>>
>>>
>>> --
>>> Robert Muir
>>> rcm...@gmail.com



-- 
Robert Muir
rcm...@gmail.com




Re: wildcardquery rewrite()

2009-10-07 Thread Simon Willnauer
This should be handled in WildcardTermEnum instead of overriding
MultiTermQuery#rewrite(). The WildcardTermEnum could simply return
false in termCompare if a term is not equal to the "prefix". This
would yield consistent behaviour even if a custom RewriteMethod is
used. Right?!

simon
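
A rough sketch of the enumeration idea above, assuming the term dictionary
can be modelled as a sorted set of strings. This is not Lucene's
WildcardTermEnum; the class PrefixScanSketch and its methods literalPrefix
and scan are hypothetical names made up for illustration. The point is only
that the check lives in the enumeration, so any rewrite method that consumes
it would see the same terms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.function.Predicate;

// Hypothetical stand-in for a filtered term enumeration; not the Lucene API.
final class PrefixScanSketch {

    // Returns the literal text before the first '*' or '?' in the pattern.
    static String literalPrefix(String pattern) {
        int i = 0;
        while (i < pattern.length()
                && pattern.charAt(i) != '*'
                && pattern.charAt(i) != '?') {
            i++;
        }
        return pattern.substring(0, i);
    }

    // Walks the sorted terms starting at the prefix and stops at the first
    // term that no longer starts with it. Assumes natural String ordering,
    // so all terms sharing the prefix are contiguous.
    static List<String> scan(SortedSet<String> terms, String prefix,
                             Predicate<String> acceptsRest) {
        List<String> hits = new ArrayList<>();
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // plays the role of "termCompare returned false, end of enum"
            }
            // Only the part after the prefix needs the more expensive check.
            if (acceptsRest.test(term.substring(prefix.length()))) {
                hits.add(term);
            }
        }
        return hits;
    }
}
```

For a pure prefix pattern the acceptsRest predicate can simply return true,
which leaves startsWith as the only per-term test.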

On Wed, Oct 7, 2009 at 11:17 AM, Michael McCandless wrote:
> +1
>
> I think it ought to be faster?  PrefixTermEnum just calls .startsWith
> on each term text, but WildcardTermEnum has big hairy logic in
> wildcardEquals.
>
> Mike
>
> On Tue, Oct 6, 2009 at 11:43 PM, Robert Muir  wrote:
>> separately, perhaps we should consider doing the prefixquery rewrite
>> here for wildcardquery.
>>
>> for example, SolrQueryParser will emit these 'wildcardqueries that
>> should be prefixqueries' if you are using the new reverse stuff for
>> leading wildcards: WildcardQuery(*foobar) ->
>> WildcardQuery(U+0001raboof*)
>>
>> I don't think the prefix enumeration is really that much faster than
>> the wildcard one, but still thought I would mention it.
>>
>> On Tue, Oct 6, 2009 at 10:22 PM, Robert Muir  wrote:
>>> someone asked this question on the user list:
>>> http://www.lucidimagination.com/search/document/6f38de391b242102/prefixquery_vs_wildcardquery
>>>
>>> it made me look at the wildcard rewrite(), where i see this:
>>>    if (!termContainsWildcard)
>>>      return new TermQuery(getTerm());
>>>
>>> is it a problem the boost is not preserved in this special case?
>>>
>>> is it also a problem that if the user sets the default MultiTermQuery
>>> rewriteMethod to say, CONSTANT_SCORE_FILTER_REWRITE,
>>> that this rewritten TermQuery isn't wrapped with a constant score?
>>>
>>> Sorry if it seems a bit nitpicky, really the issue is that I want to
>>> do the right thing for a more complex query I am working on, but don't
>>> want to overkill either.
>>> --
>>> Robert Muir
>>> rcm...@gmail.com
>>>
>>
>>
>>
>> --
>> Robert Muir
>> rcm...@gmail.com




Re: wildcardquery rewrite()

2009-10-07 Thread Michael McCandless
+1

I think it ought to be faster?  PrefixTermEnum just calls .startsWith
on each term text, but WildcardTermEnum has big hairy logic in
wildcardEquals.

Mike
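
To make the comparison concrete, here is a minimal sketch of the two kinds
of per-term check. It is not the actual PrefixTermEnum or wildcardEquals
code; MatchCost, prefixMatches and wildcardMatches are hypothetical names,
and the matcher assumes '*' (zero or more characters) and '?' (exactly one
character) are the only metacharacters, with no escaping.

```java
// Hypothetical comparison functions; not the Lucene implementations.
final class MatchCost {

    // The prefix case: a single library call per term.
    static boolean prefixMatches(String prefix, String text) {
        return text.startsWith(prefix);
    }

    // The general case: a backtracking matcher that has to remember the
    // last '*' so it can retry with a longer match for it.
    static boolean wildcardMatches(String pattern, String text) {
        int p = 0;          // position in pattern
        int t = 0;          // position in text
        int starP = -1;     // position of the most recent '*' in pattern
        int starT = -1;     // text position where that '*' started matching
        while (t < text.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                starP = p++;
                starT = t;
            } else if (p < pattern.length()
                    && (pattern.charAt(p) == '?'
                        || pattern.charAt(p) == text.charAt(t))) {
                p++;
                t++;
            } else if (starP >= 0) {
                // Mismatch: let the last '*' absorb one more character.
                p = starP + 1;
                t = ++starT;
            } else {
                return false;
            }
        }
        // Any remaining pattern characters must all be '*'.
        while (p < pattern.length() && pattern.charAt(p) == '*') {
            p++;
        }
        return p == pattern.length();
    }
}
```

For a pattern such as foo* both functions accept the same terms, so the
difference is only in how much work each term costs.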

On Tue, Oct 6, 2009 at 11:43 PM, Robert Muir  wrote:
> separately, perhaps we should consider doing the prefixquery rewrite
> here for wildcardquery.
>
> for example, SolrQueryParser will emit these 'wildcardqueries that
> should be prefixqueries' if you are using the new reverse stuff for
> leading wildcards: WildcardQuery(*foobar) ->
> WildcardQuery(U+0001raboof*)
>
> I don't think the prefix enumeration is really that much faster than
> the wildcard one, but still thought I would mention it.
>
> On Tue, Oct 6, 2009 at 10:22 PM, Robert Muir  wrote:
>> someone asked this question on the user list:
>> http://www.lucidimagination.com/search/document/6f38de391b242102/prefixquery_vs_wildcardquery
>>
>> it made me look at the wildcard rewrite(), where i see this:
>>    if (!termContainsWildcard)
>>      return new TermQuery(getTerm());
>>
>> is it a problem the boost is not preserved in this special case?
>>
>> is it also a problem that if the user sets the default MultiTermQuery
>> rewriteMethod to say, CONSTANT_SCORE_FILTER_REWRITE,
>> that this rewritten TermQuery isn't wrapped with a constant score?
>>
>> Sorry if it seems a bit nitpicky, really the issue is that I want to
>> do the right thing for a more complex query I am working on, but don't
>> want to overkill either.
>> --
>> Robert Muir
>> rcm...@gmail.com
>>
>
>
>
> --
> Robert Muir
> rcm...@gmail.com




Re: wildcardquery rewrite()

2009-10-07 Thread Michael McCandless
I agree, this looks like a bug (boost & constant-score-ness is lost)
-- wanna open an issue & patch it?

Mike

On Tue, Oct 6, 2009 at 10:22 PM, Robert Muir  wrote:
> someone asked this question on the user list:
> http://www.lucidimagination.com/search/document/6f38de391b242102/prefixquery_vs_wildcardquery
>
> it made me look at the wildcard rewrite(), where i see this:
>    if (!termContainsWildcard)
>      return new TermQuery(getTerm());
>
> is it a problem the boost is not preserved in this special case?
>
> is it also a problem that if the user sets the default MultiTermQuery
> rewriteMethod to say, CONSTANT_SCORE_FILTER_REWRITE,
> that this rewritten TermQuery isn't wrapped with a constant score?
>
> Sorry if it seems a bit nitpicky, really the issue is that I want to
> do the right thing for a more complex query I am working on, but don't
> want to overkill either.
> --
> Robert Muir
> rcm...@gmail.com




Re: wildcardquery rewrite()

2009-10-06 Thread Robert Muir
separately, perhaps we should consider doing the prefixquery rewrite
here for wildcardquery.

for example, SolrQueryParser will emit these 'wildcardqueries that
should be prefixqueries' if you are using the new reverse stuff for
leading wildcards: WildcardQuery(*foobar) ->
WildcardQuery(U+0001raboof*)

I don't think the prefix enumeration is really that much faster than
the wildcard one, but still thought I would mention it.
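
A minimal sketch of how such a check might look, assuming '*' and '?' are
the only wildcard characters and there is no escaping. WildcardShape and
classify are hypothetical names for illustration, not existing Lucene or
Solr code.

```java
// Hypothetical helper that decides which simpler query a pattern could use.
final class WildcardShape {

    enum Kind { TERM, PREFIX, WILDCARD }

    static Kind classify(String pattern) {
        int firstStar = pattern.indexOf('*');
        int firstQuestion = pattern.indexOf('?');
        if (firstStar < 0 && firstQuestion < 0) {
            return Kind.TERM;      // no wildcard at all
        }
        if (firstQuestion < 0 && firstStar == pattern.length() - 1) {
            return Kind.PREFIX;    // the only wildcard is one trailing '*'
        }
        return Kind.WILDCARD;      // anything else needs the full matcher
    }
}
```

With this, the reversed form "\u0001raboof*" would be classified as PREFIX,
while the original "*foobar" stays WILDCARD. A pattern ending in "**" is
left as WILDCARD, which is conservative but safe.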

On Tue, Oct 6, 2009 at 10:22 PM, Robert Muir  wrote:
> someone asked this question on the user list:
> http://www.lucidimagination.com/search/document/6f38de391b242102/prefixquery_vs_wildcardquery
>
> it made me look at the wildcard rewrite(), where i see this:
>    if (!termContainsWildcard)
>      return new TermQuery(getTerm());
>
> is it a problem the boost is not preserved in this special case?
>
> is it also a problem that if the user sets the default MultiTermQuery
> rewriteMethod to say, CONSTANT_SCORE_FILTER_REWRITE,
> that this rewritten TermQuery isn't wrapped with a constant score?
>
> Sorry if it seems a bit nitpicky, really the issue is that I want to
> do the right thing for a more complex query I am working on, but don't
> want to overkill either.
> --
> Robert Muir
> rcm...@gmail.com
>



-- 
Robert Muir
rcm...@gmail.com




wildcardquery rewrite()

2009-10-06 Thread Robert Muir
someone asked this question on the user list:
http://www.lucidimagination.com/search/document/6f38de391b242102/prefixquery_vs_wildcardquery

it made me look at the wildcard rewrite(), where i see this:
if (!termContainsWildcard)
  return new TermQuery(getTerm());

is it a problem the boost is not preserved in this special case?

is it also a problem that if the user sets the default MultiTermQuery
rewriteMethod to say, CONSTANT_SCORE_FILTER_REWRITE,
that this rewritten TermQuery isn't wrapped with a constant score?

Sorry if it seems a bit nitpicky, really the issue is that I want to
do the right thing for a more complex query I am working on, but don't
want to overkill either.
-- 
Robert Muir
rcm...@gmail.com
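
To illustrate the two questions above, here is a rough sketch of a
no-wildcard rewrite that keeps the boost and honours a constant-score
setting. It deliberately uses tiny stand-in classes rather than the real
Lucene ones: Query, TermQuery, ConstantScoreQuery, RewriteMode and
rewriteNoWildcard below are all hypothetical, and the real classes have
different constructors and behaviour.

```java
// Hypothetical model of the rewrite; none of these are the Lucene classes.
final class RewriteSketch {

    static class Query {
        private float boost = 1.0f;
        float getBoost() { return boost; }
        void setBoost(float boost) { this.boost = boost; }
    }

    static final class TermQuery extends Query {
        final String term;
        TermQuery(String term) { this.term = term; }
    }

    static final class ConstantScoreQuery extends Query {
        final Query wrapped;
        ConstantScoreQuery(Query wrapped) { this.wrapped = wrapped; }
    }

    enum RewriteMode { SCORING, CONSTANT_SCORE }

    // Rewrites a wildcard-free pattern to a single-term query.
    static Query rewriteNoWildcard(String term, float boost, RewriteMode mode) {
        Query result = new TermQuery(term);
        if (mode == RewriteMode.CONSTANT_SCORE) {
            // Wrap first, so the constant-score setting is respected.
            result = new ConstantScoreQuery(result);
        }
        // Copy the boost onto the outermost query that is returned.
        result.setBoost(boost);
        return result;
    }
}
```

The snippet quoted in the message corresponds to returning the bare
TermQuery, which drops both the boost and the constant-score wrapping.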
