Re: 2.9/3.0 plan Java 1.5

2008-12-17 Thread Paul Cowan

Just for the record, to pick up this point of Grant's:

Grant Ingersoll wrote:
IIRC, we also agreed that we didn't feel any compelling reason to make a 
sweeping change to generics, but would likely just add them as we see 
'em, unless of course someone wants to do a wholesale patch.  


I'll go on record as saying that if doing a 'wholesale patch' is the 
easiest way, I'm more than happy to do so. As an experiment I tried 
using a combination of Eclipse's 'Infer Generic Type Arguments' refactoring 
(which is brilliant, but not perfect) and manual changes (where Eclipse 
doesn't quite manage to nail it) and managed to get ~2000 'use of raw types' 
warnings throughout the Lucene trunk codebase down to ~1000 in the space 
of an hour or so.
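
For readers who have not met the warning in question, here is a minimal sketch of what clearing one 'use of raw types' warning looks like. The class and method names are made up for illustration and are not taken from the Lucene codebase.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example, not Lucene code.
class RawTypesDemo {

    // Before: a raw List compiles on Java 1.5+ but draws a raw-type warning
    // (suppressed here), and every read needs a cast that can fail at runtime.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static String firstFieldRaw() {
        List fields = new ArrayList();
        fields.add("title");
        return (String) fields.get(0);
    }

    // After: the element type is declared, so the cast disappears and the
    // compiler rejects something like fields.add(42).
    static String firstFieldGeneric() {
        List<String> fields = new ArrayList<String>();
        fields.add("title");
        return fields.get(0);
    }

    public static void main(String[] args) {
        System.out.println(firstFieldRaw());
        System.out.println(firstFieldGeneric());
    }
}
```

Both methods behave the same at runtime; the difference is only in what the compiler can check, which is why much of such a conversion can be automated.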


There's a little bit of manual tidy-up involved but it's something I've 
done plenty of before (both internally and on external APIs, which 
obviously require more care) -- but if you want someone to do the 
gruntwork, well, just let me know when the 3.0-dev branch exists and is 
ready for commits and I'll set aside a day and give it a crack.


Cheers,

Paul

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for multiple subreaders

2008-12-17 Thread Mark Miller (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657398#action_12657398 ]

Mark Miller commented on LUCENE-1483:
-

I'm on board with whatever you think is best. 

I'll keep playing with ords.

I spent some time last night putting in most of the rest of the cleanup/finish-up 
that was left outside of the comparators. There's a handful of non-SortTest 
test classes that still fail though, so I still have to fix those. I'll do 
that, give ords a little play time, and then I think the patch will be fairly 
close. Then we can take it in and bench on a fairly close to done version.

 Change IndexSearcher to use MultiSearcher semantics for multiple subreaders
 ---

 Key: LUCENE-1483
 URL: https://issues.apache.org/jira/browse/LUCENE-1483
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9
Reporter: Mark Miller
Priority: Minor
 Attachments: LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch


 FieldCache and Filters are forced down to a single segment reader, allowing 
 for individual segment reloading on reopen.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.





[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for multiple subreaders

2008-12-17 Thread Mark Miller (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657445#action_12657445 ]

Mark Miller commented on LUCENE-1483:
-

Hey Mike, how about this one? BooleanScorer can collect hits out of order if you 
force it (against the contract). I think it's an issue with basedoc-type stuff.




Re: solr NumberUtils to lucene?

2008-12-17 Thread patrick o'leary




It would be great to get it consistent. I cherry-picked when someone
pointed it out to me.

Erik Hatcher wrote:
My thoughts... bring over any simple functions like these
that are generally useful. At a quick glance, the functions in
Solr's NumberUtils are generally useful and fit well in Lucene's
NumberTools. What's the harm?
  
  
 Erik
  
  
On Dec 16, 2008, at 9:14 PM, Ryan McKinley wrote:
  
  
  I posted this same question for the same
reasons a while back...

http://markmail.org/message/mji7jnpa5xjfflmw


I'm looking at local lucene and trying to figure out how it could go
into lucene. As is, locallucene depends on solr since it needs
NumberUtils.


Any change of heart for moving it into lucene?
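
As background on what such number utilities are for: Lucene's NumberTools and Solr's NumberUtils both deal with encoding numbers as strings whose lexicographic order matches numeric order. The sketch below illustrates that idea only. The class name, method names and the hex format are assumptions made for this example, and it is not the encoding either class actually uses.

```java
// Hypothetical helper: shows order-preserving encoding of ints as strings.
class SortableIntDemo {

    // Flip the sign bit so signed order becomes unsigned order, then print
    // as fixed-width hex so string comparison matches numeric comparison.
    static String toSortableString(int value) {
        return String.format("%08x", value ^ 0x80000000);
    }

    // Inverse of toSortableString: parse the hex digits, then flip the
    // sign bit back.
    static int fromSortableString(String encoded) {
        return (int) Long.parseLong(encoded, 16) ^ 0x80000000;
    }

    public static void main(String[] args) {
        System.out.println(toSortableString(-5));
        System.out.println(toSortableString(3));
        System.out.println(fromSortableString(toSortableString(-5)));
    }
}
```

A helper like this depends on nothing but the JDK, which is the argument made above for keeping such functions somewhere local lucene can reach without pulling in Solr.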




-- 
Patrick O'Leary

AOL Local Search Technologies
Phone: + 1 703 265 8763

You see, wire telegraph is a kind of a very, very long cat. You pull his tail in New York and his head is meowing in Los Angeles.
 Do you understand this? 
And radio operates exactly the same way: you send signals here, they receive them there. The only difference is that there is no cat.
  - Albert Einstein






[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for multiple subreaders

2008-12-17 Thread Mark Miller (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657458#action_12657458 ]

Mark Miller commented on LUCENE-1483:
-

I didn't think it should be a problem either, since we just push everything to 
one reader. But it seems to be: the only tests not passing involve 
allowDocsOutOfOrder=true. Do the search with it true, do the same search with 
it false, and you get 3 and 4 docs. 2 or 3 tests involving that fail. I don't have 
time to dig in till tonight though - thought you might shortcut me to the 
answer :)




[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for multiple subreaders

2008-12-17 Thread Doug Cutting (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657466#action_12657466 ]

Doug Cutting commented on LUCENE-1483:
--

bq. I would actually be fine with keeping HitCollector, adding a default 
setNextReader method, that either throws UOE or (if we are strongly against 
exceptions) returns false indicating it cannot handle sequential readers.

Could we instead add a new HitCollector subclass, that adds the setNextReader, 
then use 'instanceof' to decide whether to wrap or not?

bq. I really don't fully understand BooleanScorer!

The original version of BooleanScorer uses a ~16k array to score windows of 
docs.  So it scores docs 0-16k first, then docs 16-32k, etc. For each window it 
iterates through all query terms and accumulates a score in table[doc%16k].  It 
also stores in the table a bitmask representing which terms contributed to the 
score.  Non-zero scores are chained in a linked list.  At the end of scoring 
each window it then iterates through the linked list and, if the bitmask 
matches the boolean constraints, collects a hit.  For boolean queries with lots 
of frequent terms this can be much faster, since it does not need to update a 
priority queue for each posting, instead performing constant-time operations 
per posting.  The only downside is that it results in hits being delivered 
out-of-order within the window, which means it cannot be nested within other 
scorers.  But it works well as a top-level scorer.

The new BooleanScorer2 implementation instead works by merging priority queues 
of postings, albeit with some clever tricks.  For example, a pure conjunction 
(all terms required) does not require a priority queue.  Instead it sorts the 
posting streams at the start, then repeatedly skips the first to the last.  If 
the first ever equals the last, then there's a hit.  When some terms are 
required and some 
terms are optional, the conjunction can be evaluated first, then the optional 
terms can all skip to the match and be added to the score.  Thus the 
conjunction can reduce the number of priority queue updates for the optional 
terms.  Does that help any?
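
The windowed scheme described for the original BooleanScorer can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual Lucene code: the class and method names are invented, the window is 8 docs rather than ~16k, every posting contributes a score of 1.0, and each term i is identified by bit (1 << i).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of window-at-a-time boolean scoring.
class BucketScorerDemo {
    static final int WINDOW = 8; // tiny for readability; the description says ~16k

    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // postings[t] is the sorted doc ID list of term t; all IDs are < maxDoc.
    static List<Hit> search(int[][] postings, int requiredMask, int prohibitedMask, int maxDoc) {
        List<Hit> hits = new ArrayList<Hit>();
        float[] score = new float[WINDOW]; // accumulated score per slot
        int[] bits = new int[WINDOW];      // which terms touched the slot
        int[] next = new int[WINDOW];      // linked list of touched slots
        int[] pos = new int[postings.length];

        for (int windowStart = 0; windowStart < maxDoc; windowStart += WINDOW) {
            int windowEnd = windowStart + WINDOW;
            int head = -1;
            // Pass 1: each term adds its postings inside the window to the table.
            for (int t = 0; t < postings.length; t++) {
                while (pos[t] < postings[t].length && postings[t][pos[t]] < windowEnd) {
                    int slot = postings[t][pos[t]++] % WINDOW;
                    if (bits[slot] == 0) { // first touch: chain the slot
                        next[slot] = head;
                        head = slot;
                    }
                    bits[slot] |= 1 << t;
                    score[slot] += 1.0f;
                }
            }
            // Pass 2: walk the chain, collect slots whose bitmask satisfies the
            // constraints, and clear each slot for the next window.
            for (int slot = head; slot != -1; slot = next[slot]) {
                boolean matches = (bits[slot] & requiredMask) == requiredMask
                        && (bits[slot] & prohibitedMask) == 0;
                if (matches) {
                    hits.add(new Hit(windowStart + slot, score[slot]));
                }
                bits[slot] = 0;
                score[slot] = 0f;
            }
        }
        return hits;
    }
}
```

With postings {1, 5, 9}, {5, 9, 12} and {3}, term 0 required and nothing prohibited, this sketch reports docs 5, 1, 9 in that order. The first two arrive out of order because the chain is walked most-recently-linked first, which is the out-of-order delivery within a window mentioned above.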





[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for multiple subreaders

2008-12-17 Thread Mark Miller (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657468#action_12657468 ]

Mark Miller commented on LUCENE-1483:
-

bq. Could we instead add a new HitCollector subclass, that adds the 
setNextReader, then use 'instanceof' to decide whether to wrap or not?

Woah! Don't make me switch all that again! I've got wrist injuries here :) The 
reason I lost the instanceof is that we would have to deprecate the 
HitCollector implementations because they need to extend HitCollector. Mike 
seemed against deprecating those if we could get away with it, so I've since 
dropped that. I've already gone back and forth - what's it going to be? I'll 
admit I don't like using the exception trap I am using now, but I don't much 
like the return true/false method either...




[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for multiple subreaders

2008-12-17 Thread Mark Miller (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657406#action_12657406 ]

Mark Miller commented on LUCENE-1483:
-

Hmmm...we had a reason for deprecating HitCollector though. At first it was to 
do the capability check (instance of HitCollector would be wrapped), but that 
didn't pan out. I think we also liked it because people got deprecation 
warnings though - so that they would know to implement that method for 3.0 when 
we would take out the wrapper.




[jira] Issue Comment Edited: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for multiple subreaders

2008-12-17 Thread Mark Miller (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657468#action_12657468 ]

markrmil...@gmail.com edited comment on LUCENE-1483 at 12/17/08 10:34 AM:


bq. Could we instead add a new HitCollector subclass, that adds the 
setNextReader, then use 'instanceof' to decide whether to wrap or not?

Woah! Don't make me switch all that again! I've got wrist injuries here :) The 
reason I lost the instanceof is that we would have to deprecate the 
HitCollector implementations because they need to extend HitCollector. Mike 
seemed against deprecating those if we could get away with it, so I've since 
dropped that. I've already gone back and forth - what's it going to be? I'll 
admit I don't like using the exception trap I am using now, but I don't much 
like the return true/false method either...


*Edit*

Ah, I see, you have a new tweak on it this time. Extend HitCollector rather than 
HitCollector extending the new type...

Nice, I think this is the way to go.




[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for multiple subreaders

2008-12-17 Thread Doug Cutting (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657476#action_12657476 ]

Doug Cutting commented on LUCENE-1483:
--

 Woah! Don't make me switch all that again!

Sorry, I'm just tossing out ideas.  Don't take me too seriously...

 The reason I lost the instanceof is that we would have to deprecate the 
 HitCollector implementations because they need to extend HitCollector.

Would we?  I was suggesting that, if we're going to have two APIs, one expert 
and one non-expert, then we could make the expert API a subclass and not 
deprecate or otherwise alter HitCollector.  I do not like using exceptions for 
normal control flow.  Instanceof is better, but not ideal.  A default 
implementation of an expert method that returns 'false', as Mike suggested, 
isn't bad and might be best.  It requires neither deprecation, exceptions nor 
instanceof.  Would we have a subclass that overrides this that's used as a base 
class for optimized implementations?
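
A minimal sketch of the "default method that returns false" option, to make the trade-off concrete. The classes here are stand-ins written for this example, not the Lucene classes or the API in the patch, and the setNextReader(int docBase) signature is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-ins for illustration only; these are not the Lucene classes.
abstract class FlagCollector {
    // Expert hook with a default. Returning false means "keep sending me
    // index-wide doc IDs"; returning true means "send segment-relative IDs".
    boolean setNextReader(int docBase) {
        return false;
    }

    abstract void collect(int doc, float score);
}

// Simple collector: never overrides the hook, so it sees index-wide IDs.
class PlainListCollector extends FlagCollector {
    final List<Integer> docs = new ArrayList<Integer>();

    void collect(int doc, float score) {
        docs.add(doc);
    }
}

// Expert collector: opts in and rebases the IDs itself.
class RebasingListCollector extends FlagCollector {
    final List<Integer> docs = new ArrayList<Integer>();
    private int docBase;

    boolean setNextReader(int docBase) {
        this.docBase = docBase;
        return true;
    }

    void collect(int doc, float score) {
        docs.add(docBase + doc);
    }
}

class FlagSearcher {
    // matchesPerSegment[s] holds segment-relative IDs of the hits in segment s;
    // segmentSizes[s] is that segment's document count.
    static void search(int[][] matchesPerSegment, int[] segmentSizes, FlagCollector collector) {
        int docBase = 0;
        for (int s = 0; s < matchesPerSegment.length; s++) {
            boolean segmentAware = collector.setNextReader(docBase);
            for (int doc : matchesPerSegment[s]) {
                collector.collect(segmentAware ? doc : docBase + doc, 1.0f);
            }
            docBase += segmentSizes[s];
        }
    }
}
```

This needs no deprecation, exception or instanceof, but the meaning of the doc argument to collect() now depends on what setNextReader returned.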





[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for multiple subreaders

2008-12-17 Thread Mark Miller (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657414#action_12657414 ]

Mark Miller commented on LUCENE-1483:
-

Okay, I hate the idea of leaving in the wrapper, but it is true that's too 
difficult of a method for HitCollector (to be required anyway).  setReader is a 
jump in understanding above setDocBase, which was bad enough.




[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for multiple subreaders

2008-12-17 Thread Michael McCandless (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657483#action_12657483 ]

Michael McCandless commented on LUCENE-1483:


{quote}
 Does that help any?
{quote}
Yes, thanks!  So much so that I'm going to go add that blurb to the javadocs...




[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for multiple subreaders

2008-12-17 Thread Michael McCandless (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657481#action_12657481 ]

Michael McCandless commented on LUCENE-1483:


{quote}
 Would we have a subclass that overrides this that's used as a base class for 
 optimized implementations?
{quote}

If we do this, I don't think we need a new base class for expert collectors; 
they can simply subclass HitCollector and override the setNextReader method?

Though one downside of this approach is the simple HitCollector API is 
polluted with this advanced method, and HitCollector's collect method gets 
different args depending on what that method returns.  It's a somewhat 
confusing API.

I guess I'd actually prefer subclassing HitCollector (SequentialHitCollector?  
AdvancedHitCollector?  SegmentedHitCollector?), adding setNextReader only to 
that subclass, and using instanceof to wrap HitCollector subclasses.
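
A rough sketch of the subclass-plus-instanceof approach, for comparison. All class names here are placeholders invented for this example (the thread has not settled on a name), and the setNextReader(int docBase) signature is an assumption rather than the patch's API.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the existing simple API: collect() always sees index-wide IDs.
abstract class BasicCollector {
    abstract void collect(int doc, float score);
}

// Stand-in for the proposed expert subclass: collect() sees segment-relative
// IDs and is told when the searcher moves on to the next segment.
abstract class SegmentAwareCollector extends BasicCollector {
    abstract void setNextReader(int docBase);
}

// Adapter that lets a plain collector be driven through the expert API.
class RebasingWrapper extends SegmentAwareCollector {
    private final BasicCollector delegate;
    private int docBase;

    RebasingWrapper(BasicCollector delegate) {
        this.delegate = delegate;
    }

    void setNextReader(int docBase) {
        this.docBase = docBase;
    }

    void collect(int doc, float score) {
        delegate.collect(docBase + doc, score);
    }
}

class WrappingSearcher {
    // matchesPerSegment[s] holds segment-relative IDs of the hits in segment s;
    // segmentSizes[s] is that segment's document count.
    static void search(int[][] matchesPerSegment, int[] segmentSizes, BasicCollector collector) {
        // The one instanceof check: expert collectors are used directly,
        // everything else is wrapped so it keeps receiving index-wide IDs.
        SegmentAwareCollector expert = (collector instanceof SegmentAwareCollector)
                ? (SegmentAwareCollector) collector
                : new RebasingWrapper(collector);
        int docBase = 0;
        for (int s = 0; s < matchesPerSegment.length; s++) {
            expert.setNextReader(docBase);
            for (int doc : matchesPerSegment[s]) {
                expert.collect(doc, 1.0f);
            }
            docBase += segmentSizes[s];
        }
    }
}

// Example of an unchanged, pre-existing style collector.
class ListCollector extends BasicCollector {
    final List<Integer> docs = new ArrayList<Integer>();

    void collect(int doc, float score) {
        docs.add(doc);
    }
}
```

The simple API stays untouched and undeprecated; the cost is one instanceof per search plus an extra method call per hit for wrapped collectors.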




[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for multiple subreaders

2008-12-17 Thread Mark Miller (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657482#action_12657482 ]

Mark Miller commented on LUCENE-1483:
-

 Woah! Don't make me switch all that again!

Sorry, I'm just tossing out ideas. Don't take me too seriously...

Same here. If you guys have 100 ideas, I'd do it 100 times. No worries. Just 
wrist frustration :) I misunderstood you anyways.

bq. It requires neither deprecation, exceptions nor instanceof. 

Okay, fair points. I guess my main dislike was having to call it, see what it 
returns, and then maybe call it again. That turned me off as much as 
instanceof. I'm still liking the suggestion you just made myself...

Mike?




[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for multiple subreaders

2008-12-17 Thread Mark Miller (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657489#action_12657489 ]

Mark Miller commented on LUCENE-1483:
-

bq. I guess I'd actually prefer subclassing HitCollector 
(SequentialHitCollector? AdvancedHitCollector? SegmentedHitCollector?), adding 
setNextReader only to that subclass, and using instanceof to wrap HitCollector 
subclasses.

That's actually what I prefer as well (and what I tried). I used 
MultiReaderHitCollector. Still thinking about the name...




[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for multiple subreaders

2008-12-17 Thread Michael McCandless (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657493#action_12657493 ]

Michael McCandless commented on LUCENE-1483:


I like MultiReaderHitCollector!




[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for multiple subreaders

2008-12-17 Thread Mark Miller (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657402#action_12657402 ]

Mark Miller commented on LUCENE-1483:
-

bq. I have still one question: Why do we need the new DocCollector? Is this 
really needed? Would it be not OK to just add the offset before calling 
collect()?

If it's not needed, let's get rid of it. We don't want to deprecate HitCollector 
if we don't have to. The main reason I can see that we are doing it at the 
moment is that the TopFieldValueDocCollector needs that hook so that it can set 
the next IndexReader for each Comparator. The Comparator needs it to create the 
fieldcaches and map ords from one reader to the next. Also, it lets us do the 
docBase stuff, which is nice because you add the docBase less often if done in 
the collector.
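
To illustrate the last two points, here is a small sketch of a collector that uses a per-segment hook. It is a toy under stated assumptions: the class is invented for this example, a plain int[] stands in for the per-segment field cache, and it tracks only the single lowest value rather than a full top-N queue.

```java
// Hypothetical sketch; names and signatures are assumptions, not the patch's API.
class TopValueCollectorDemo {
    private int[] segmentValues; // stands in for a per-segment field cache array
    private int docBase;
    int bestDoc = -1;            // index-wide doc ID of the best hit so far
    int bestValue = Integer.MAX_VALUE;

    // Called once per segment, before any collect() calls for that segment.
    void setNextReader(int[] segmentValues, int docBase) {
        this.segmentValues = segmentValues;
        this.docBase = docBase;
    }

    // doc is relative to the current segment, so it indexes the segment's
    // value array directly.
    void collect(int doc) {
        int value = segmentValues[doc];
        if (value < bestValue) {
            // Only competitive hits pay for the docBase addition.
            bestValue = value;
            bestDoc = docBase + doc;
        }
    }
}
```

If the searcher added the offset before calling collect(), every hit would pay for the addition and the collector would have no natural point at which to switch to the next segment's values.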




[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for multiple subreaders

2008-12-17 Thread Michael McCandless (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657331#action_12657331 ]

Michael McCandless commented on LUCENE-1483:


{quote}
 I just don't think that ords without fallback is going to get very good. I'm 
 wondering if we should even try too hard if ord with val fallback does so 
 well.
{quote}

Maybe we can try a bit more (I'll run perf tests on your next iteration here?) 
and then start wrapping things up?  Progress not perfection!  We can further 
improve this later.




[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for multiple subreaders

2008-12-17 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657403#action_12657403
 ] 

Michael McCandless commented on LUCENE-1483:



{quote}
 Why do we need the new DocCollector? Is this really needed? Would it not be 
 OK to just add the offset before calling collect()?
{quote}

I'd like to allow for 'expert' cases, where the collector is told when
we advance to the next sequential reader and can do something at that
point (like our sort-by-field collector does).

But then still allow for 'normal' cases, where the collector is
unchanged with what we have today (ie it receives the real docID).

The core collectors would use the expert API to eke out all
performance; external collectors can use either, but the 'normal' one
would be simplest (and match back compat).

So then how to implement this approach... I would actually be fine
with keeping HitCollector, adding a default setNextReader method,
that either throws UOE or (if we are strongly against exceptions)
returns false indicating it cannot handle sequential readers.

Then when we run searches we simply check if the collector is an
expert one (does not throw UOE or return false from setNextReader)
and if it isn't we wrap it with DocBaseCollector (which adds the doc
base for every collect() call).
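
To make the wrapping idea concrete, here is a minimal sketch. It assumes
hypothetical stand-in classes (SketchHitCollector, SketchDocBaseCollector,
CollectorSketch) and a simplified setNextReader(int docBase) signature; it
is not the code in the patch, and the real method would presumably also
receive the IndexReader.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for HitCollector; not the real Lucene class. */
abstract class SketchHitCollector {
    /** Called once per matching document. */
    abstract void collect(int doc, float score);

    /** "Expert" hook. The default signals that per-reader collection is unsupported. */
    void setNextReader(int docBase) {
        throw new UnsupportedOperationException("not an expert collector");
    }
}

/** Wrapper for "normal" collectors: adds the doc base on every collect() call. */
class SketchDocBaseCollector extends SketchHitCollector {
    private final SketchHitCollector delegate;
    private int docBase;

    SketchDocBaseCollector(SketchHitCollector delegate) {
        this.delegate = delegate;
    }

    @Override
    void setNextReader(int docBase) {
        this.docBase = docBase;
    }

    @Override
    void collect(int doc, float score) {
        delegate.collect(docBase + doc, score);
    }
}

class CollectorSketch {
    /** Probe the collector once; wrap it if it rejects the expert hook. */
    static SketchHitCollector wrapIfNeeded(SketchHitCollector c) {
        try {
            c.setNextReader(0);
            return c;
        } catch (UnsupportedOperationException e) {
            return new SketchDocBaseCollector(c);
        }
    }

    /** Simulates a search over two segments holding 3 and 2 documents. */
    static List<Integer> demo() {
        final List<Integer> seen = new ArrayList<>();
        SketchHitCollector normal = new SketchHitCollector() {
            @Override
            void collect(int doc, float score) {
                seen.add(doc); // expects top-level doc ids, as today
            }
        };
        SketchHitCollector c = wrapIfNeeded(normal);
        int[] segmentSizes = {3, 2};
        int docBase = 0;
        for (int size : segmentSizes) {
            c.setNextReader(docBase);
            for (int doc = 0; doc < size; doc++) {
                c.collect(doc, 1.0f); // segment-relative id
            }
            docBase += size;
        }
        return seen;
    }
}
```

In this sketch the "normal" collector never sees segment-relative ids, while
a collector that overrides setNextReader would be passed through unwrapped
and could do its own per-segment work.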





[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for multiple subreaders

2008-12-17 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657400#action_12657400
 ] 

Uwe Schindler commented on LUCENE-1483:
---

I still have one question: Why do we need the new DocCollector? Is this really 
needed? Would it not be OK to just add the offset before calling collect()?




[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-17 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657409#action_12657409
 ] 

Michael McCandless commented on LUCENE-831:
---

{quote}
  this will turn more into an API overhaul than an IndexReader reopen time 
 saver.
{quote}
...and given the progress on LUCENE-1483 (copying values into the sort queues), 
I think this new FieldCache API should probably be primarily an iteration API.

 Complete overhaul of FieldCache API/Implementation
 --

 Key: LUCENE-831
 URL: https://issues.apache.org/jira/browse/LUCENE-831
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Hoss Man
 Fix For: 3.0

 Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, 
 fieldcache-overhaul.diff, fieldcache-overhaul.diff, 
 LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, 
 LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, 
 LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch


 Motivation:
 1) Completely overhaul the API/implementation of FieldCache type things...
    a) eliminate global static map keyed on IndexReader (thus
       eliminating synch block between completely independent IndexReaders)
    b) allow more customization of cache management (ie: use 
       expiration/replacement strategies, disk backed caches, etc)
    c) allow people to define custom cache data logic (ie: custom
       parsers, complex datatypes, etc... anything tied to a reader)
    d) allow people to inspect what's in a cache (list of CacheKeys) for
       an IndexReader so a new IndexReader can be likewise warmed. 
    e) Lend support for smarter cache management if/when
       IndexReader.reopen is added (merging of cached data from subReaders).
 2) Provide backwards compatibility to support existing FieldCache API with
    the new implementation, so there is no redundant caching as client code
    migrates to new API.




[jira] Resolved: (LUCENE-1484) Remove SegmentReader.document synchronization

2008-12-17 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-1484.


   Resolution: Fixed
Fix Version/s: 2.9

Committed revision 727338.  Thanks Jason!

 Remove SegmentReader.document synchronization
 -

 Key: LUCENE-1484
 URL: https://issues.apache.org/jira/browse/LUCENE-1484
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 2.4
Reporter: Jason Rutherglen
Assignee: Michael McCandless
 Fix For: 2.9

 Attachments: LUCENE-1484.patch, LUCENE-1484.patch

   Original Estimate: 96h
  Remaining Estimate: 96h

 This is probably the last synchronization issue in Lucene.  It is the 
 document method in SegmentReader.  It is avoidable by using a threadlocal for 
 FieldsReader.  
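
 As a rough illustration of the thread-local idea, the sketch below uses
 hypothetical stand-ins (SketchFieldsReader, SketchSegmentReader); it is
 not the committed patch. It only shows the pattern of giving each thread
 its own copy of a stateful reader so that document() needs no lock.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a stateful, non-thread-safe stored-fields reader. */
class SketchFieldsReader {
    static final AtomicInteger copies = new AtomicInteger();
    private long filePointer; // mutable seek state, unsafe to share across threads

    String doc(int n) {
        filePointer = n * 16L;        // pretend to seek
        return "doc@" + filePointer;  // pretend to read
    }

    /** Creates an independent reader with its own seek state. */
    SketchFieldsReader copy() {
        copies.incrementAndGet();
        return new SketchFieldsReader();
    }
}

/** Hypothetical stand-in for SegmentReader. */
class SketchSegmentReader {
    private final SketchFieldsReader original = new SketchFieldsReader();

    // Each thread lazily gets its own copy the first time it calls document().
    private final ThreadLocal<SketchFieldsReader> perThread =
        ThreadLocal.withInitial(original::copy);

    /** Not synchronized: every thread works on its private reader. */
    String document(int n) {
        return perThread.get().doc(n);
    }
}
```

 The trade-off in this pattern is one extra reader instance per thread in
 exchange for removing the lock from the read path.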




[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for multiple subreaders

2008-12-17 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657455#action_12657455
 ] 

Michael McCandless commented on LUCENE-1483:


{quote}
 BooleanScorer can collect hits out of order if you force it (against the 
 contract).
{quote}
Hmmm... right.  You mean if you pass in allowDocsOutOfOrder=true (defaults to 
false).

I think this should not be a problem?  (Though, I really don't fully understand 
BooleanScorer!).  Since we are running scoring per-segment, each segment might 
collect its docIDs out of order, but all such docs are still within the current 
segment.  Then when we advance to the new segment, the collector can do 
something if it needs to, and then collection proceeds again on the next 
segment's docs, possibly out of order.  Ie, the out-of-orderness never jumps 
across a segment and then back again?

But this is a challenge for LUCENE-831, if we go with a primarily 
iterator-driven API.




[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for multiple subreaders

2008-12-17 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657330#action_12657330
 ] 

Michael McCandless commented on LUCENE-1483:


{quote}
 the binary search gives back -insertionpoint - 1, the insertion point for 
 banana is 1, so -1 -1 = -2. So I reverse that and subtract 2 to get 0 right? 
 It lands on apple.
{quote}
Hmm -- I didn't realize binarySearch is returning the insertion point on a 
miss.  So your logic (negate then subtract 2) makes perfect sense now.

Just to be sure... maybe you should temporarily add asserts, when a negative 
index is returned, that values[-index-2].compareTo(newValue) < 0 and 
values[-index-1].compareTo(newValue) > 0 (making sure those array accesses are 
in bounds)?

{quote}
 (I don't remember offhand why subord has to start at 1 not 0, but I remember 
 it didn't work otherwise)
{quote}

This is very important -- that 1 is equivalent to the original 0.5 proposal, 
ie, think of subord as the 2nd digit in a 2-digit number.  That 2nd digit being 
non zero is how we know that even though banana's ord landed on apple's, banana 
is in fact *not* equal to apple (because the subord for banana is > 0) and is 
instead *between* apple and orange.
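
Here is a small sketch of the ord/subord mapping discussed above, using
java.util.Arrays.binarySearch on the apple/orange example. The class and
method names (OrdMappingSketch, mapOrd) are made up for illustration, and
the sketch only handles a single value; it is not the comparator code in
the patch.

```java
import java.util.Arrays;

class OrdMappingSketch {
    /**
     * Maps a value carried over from a previous segment onto the sorted
     * values of the new segment. Returns {ord, subord}: subord 0 means the
     * value exists at values[ord]; subord 1 means it sorts strictly after
     * values[ord] and before the next entry. ord is -1 when the value sorts
     * before every entry.
     */
    static int[] mapOrd(String[] values, String value) {
        int index = Arrays.binarySearch(values, value);
        if (index >= 0) {
            return new int[] {index, 0}; // exact hit
        }
        // On a miss, binarySearch returns -(insertionPoint) - 1.
        int insertionPoint = -index - 1;
        int ord = -index - 2; // entry just below the insertion point
        // Temporary sanity checks, guarded so the array accesses stay in bounds.
        assert ord < 0 || values[ord].compareTo(value) < 0;
        assert insertionPoint >= values.length
            || values[insertionPoint].compareTo(value) > 0;
        return new int[] {ord, 1};
    }
}
```

With values {"apple", "orange"}, banana misses with index -2, so its ord is
0 (apple's) with subord 1, while orange hits exactly and gets ord 1 with
subord 0.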




[jira] Commented: (LUCENE-1494) Additional features for searching for value across multiple fields (many-to-one style)

2008-12-17 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657353#action_12657353
 ] 

Andrzej Bialecki  commented on LUCENE-1494:
---

Luke should work with trunk, possibly with only minor patches. Just grab the 
luke-0.9.jar and add jars from Lucene trunk on the classpath.

 Additional features for searching for value across multiple fields 
 (many-to-one style)
 --

 Key: LUCENE-1494
 URL: https://issues.apache.org/jira/browse/LUCENE-1494
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Search
Affects Versions: 2.4
Reporter: Paul Cowan
Priority: Minor
 Attachments: LUCENE-1494-multifield.patch, 
 LUCENE-1494-positionincrement.patch


 This issue is to cover the changes required to do a search across multiple 
 fields with the same name in a fashion similar to a many-to-one database. 
 Below is my post on java-dev on the topic, which details the changes we need:
 ---
 We have an interesting situation where we are effectively indexing two 
 'entities' in our system, which share a one-to-many relationship (imagine 
 'User' and 'Delivery Address' for demonstration purposes). At the moment, we 
 index one Lucene Document per 'many' end, duplicating the 'one' end data, 
 like so:
    userid: 1
    userfirstname: fred
    addresscountry: au
    addressphone: 1234

    userid: 1
    userfirstname: fred
    addresscountry: nz
    addressphone: 5678

    userid: 2
    userfirstname: mary
    addresscountry: au
    addressphone: 5678

 (note: 2 Documents indexed for user 1). This is somewhat annoying for us, 
 because when we search in Lucene the results we want back (conceptually) are 
 at the 'user' level, so we have to collapse the results by distinct user id, 
 etc. etc (let alone that it blows out the size of our index enormously). So 
 why do we do it? It would make more sense to use multiple fields:
    userid: 1
    userfirstname: fred
    addresscountry: au
    addressphone: 1234
    addresscountry: nz
    addressphone: 5678

    userid: 2
    userfirstname: mary
    addresscountry: au
    addressphone: 5678

 But imagine the search +addresscountry:au +addressphone:5678. We'd like 
 this to match ONLY Mary, but of course it matches Fred also because he 
 matches both those terms (just for different addresses).
 There are two aspects to the approach we've (more or less) got working but 
 I'd like to run them past the group and see if they're worth trying to get 
 them into Lucene proper (if so, I'll create a JIRA issue for them)
 1) Use a modified SpanNearQuery. If we assume that country + phone will 
 always be one token, we can rely on the fact that the positions of 'au' and 
 '5678' in Fred's document will be different.
    SpanQuery q1 = new SpanTermQuery(new Term("addresscountry", "au"));
    SpanQuery q2 = new SpanTermQuery(new Term("addressphone", "5678"));
    SpanQuery snq = new SpanNearQuery(new SpanQuery[]{q1, q2}, 0, false);
 the slop of 0 means that we'll only return those where the two terms are in 
 the same position in their respective fields. This works brilliantly, BUT 
 requires a change to SpanNearQuery's constructor (which checks that all the 
 clauses are against the same field). Are people amenable to perhaps adding 
 another constructor to SNQ which doesn't do the check, or subclassing it to 
 do the same (give it a protected non-checking constructor for the subclass to 
 call)?
 2) It gets slightly more complicated in the case of variable-length terms. 
 For example, imagine if we had an 'address' field ('123 Smith St') which will 
 result in (1 to n) tokens; slop 0 in a SpanNearQuery won't work here, of 
 course. One thing we've toyed with is the idea of using 
 getPositionIncrementGap -- if we knew that 'address' would be, at most, 20 
 tokens, we might use a position increment gap of 100, and make the slop 
 factor 50; this works fine for the simple case (yay!), but with a great many 
 addresses-per-user starts to get more complicated, as the gap counts from the 
 last term (so the position sequence for a single value field might be 0, 100, 
 200, but for the address field it might be 0, 1, 2, 3, 103, 104, 105, 106, 
 206, 207... so it's going to get out of sync). The simplest option here seems 
 to be changing (or supplementing)
public int getPositionIncrementGap(String fieldname)
 to
public int getPositionIncrementGap(String fieldname, int currentPos)
 so that we can override that to round up to the nearest 100 (or whatever) 
 based on currentPos. The default implementation could just delegate to 
 getPositionIncrementGap().
 ---
 Patches (x2) to follow shortly
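
 The two ideas above can be sketched in plain Java without any Lucene
 classes. Everything here is hypothetical: ManyToOneSketch, its method
 names and the BLOCK constant are made up for illustration, and the
 position arithmetic follows the model used in this description (the next
 value starts at the last token's position plus the gap), which is an
 assumption and may differ from Lucene's real accounting.

```java
import java.util.List;

class ManyToOneSketch {
    /** Every value of a field should start on a multiple of this. */
    static final int BLOCK = 100;

    /**
     * Idea 1: with one token per value, "slop 0 across fields" means that
     * the two wanted terms sit at the same position in their fields.
     */
    static boolean matchesAtSamePosition(List<String> fieldA, String wantA,
                                         List<String> fieldB, String wantB) {
        int n = Math.min(fieldA.size(), fieldB.size());
        for (int p = 0; p < n; p++) {
            if (fieldA.get(p).equals(wantA) && fieldB.get(p).equals(wantB)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Idea 2: hypothetical two-argument gap. currentPos is the position of
     * the last token indexed so far; the gap moves the next value to the
     * next multiple of BLOCK.
     */
    static int getPositionIncrementGap(String fieldName, int currentPos) {
        return BLOCK - (currentPos % BLOCK);
    }

    /** Start position of each value, given the token count of each value (>= 1). */
    static int[] startPositions(String fieldName, int[] tokensPerValue) {
        int[] starts = new int[tokensPerValue.length];
        int pos = 0; // position of the last token seen so far
        for (int i = 0; i < tokensPerValue.length; i++) {
            if (i > 0) {
                pos += getPositionIncrementGap(fieldName, pos);
            }
            starts[i] = pos;
            pos += tokensPerValue[i] - 1;
        }
        return starts;
    }
}
```

 In this model Fred (countries au, nz; phones 1234, 5678) does not match
 au + 5678 at any single position, while Mary does. With the rounding gap,
 values of 4, 4 and 2 tokens start at 0, 100 and 200, the same as
 single-token values, so the fields stay in sync as long as no value has
 more than BLOCK tokens.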


[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-17 Thread Jeremy Volkman (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657401#action_12657401
 ] 

Jeremy Volkman commented on LUCENE-831:
---

A couple things:

# Looking at the getCachedData method for MultiReader and MultiSegmentReader, 
it doesn't appear that the CacheData objects from merge operations are cached.  
Is there any reason for this?
# I've written a merge method for StringIndexCacheKey. The process isn't all 
that complicated (apart from all of the off-by-ones), but it's expensive.

{code:java}
  public boolean isMergable() {
    return true;
  }

  private static class OrderNode {
    int index;
    OrderNode next;
  }

  public CacheData mergeData(int[] starts, CacheData[] data)
      throws UnsupportedOperationException {
    int[] mergedOrder = new int[starts[starts.length - 1]];
    // Lookup map is 1-based
    String[] mergedLookup = new String[starts[starts.length - 1] + 1];

    // Unwrap cache payloads and flip order arrays
    StringIndex[] unwrapped = new StringIndex[data.length];

    /* Flip the order arrays (reverse indices and values)
     * Since the ord map has a many-to-one relationship with the lookup table,
     * the flipped structure must be one-to-many which results in an array of
     * linked lists.
     */
    OrderNode[][] flippedOrders = new OrderNode[data.length][];
    for (int i = 0; i < data.length; i++) {
      StringIndex si = (StringIndex) data[i].getCachePayload();
      unwrapped[i] = si;
      flippedOrders[i] = new OrderNode[si.lookup.length];
      for (int j = 0; j < si.order.length; j++) {
        OrderNode a = new OrderNode();
        a.index = j;
        a.next = flippedOrders[i][si.order[j]];
        flippedOrders[i][si.order[j]] = a;
      }
    }

    // Lookup map is 1-based
    int[] lookupIndices = new int[unwrapped.length];
    Arrays.fill(lookupIndices, 1);

    int lookupIndex = 0;
    String currentVal;
    int currentSeg;
    while (true) {
      currentVal = null;
      currentSeg = -1;
      int remaining = 0;
      // Find the next ordered value from all the segments
      for (int i = 0; i < unwrapped.length; i++) {
        if (lookupIndices[i] < unwrapped[i].lookup.length) {
          remaining++;
          String that = unwrapped[i].lookup[lookupIndices[i]];
          if (currentVal == null || currentVal.compareTo(that) > 0) {
            currentVal = that;
            currentSeg = i;
          }
        }
      }
      if (remaining == 1) {
        break;
      } else if (remaining == 0) {
        /* The only way this could happen is if there are 0 segments or if
         * all segments have 0 terms. In either case, we can return
         * early.
         */
        return new CacheData(new StringIndex(
            new int[starts[starts.length - 1]], new String[1]));
      }
      if (!currentVal.equals(mergedLookup[lookupIndex])) {
        lookupIndex++;
        mergedLookup[lookupIndex] = currentVal;
      }
      OrderNode a = flippedOrders[currentSeg][lookupIndices[currentSeg]];
      while (a != null) {
        mergedOrder[a.index + starts[currentSeg]] = lookupIndex;
        a = a.next;
      }
      lookupIndices[currentSeg]++;
    }
{code}
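
For comparison, here is a deliberately simpler, self-contained way to get
the same kind of merged result. It is not the method above: it swaps the
linked-list flip for a sorted set plus a per-segment remap table, and it
uses a hypothetical Index class as a stand-in for StringIndex (order[doc]
points into lookup, and lookup[0] is null for documents without a value).
It assumes starts has one more entry than there are segments, with the
last entry holding the total document count.

```java
import java.util.Arrays;
import java.util.TreeSet;

class StringIndexMergeSketch {
    /** Minimal stand-in for StringIndex; not the real Lucene class. */
    static final class Index {
        final int[] order;
        final String[] lookup;

        Index(int[] order, String[] lookup) {
            this.order = order;
            this.lookup = lookup;
        }
    }

    static Index merge(int[] starts, Index[] segments) {
        // Collect every distinct term across segments in sorted order.
        TreeSet<String> terms = new TreeSet<>();
        for (Index seg : segments) {
            for (int i = 1; i < seg.lookup.length; i++) {
                terms.add(seg.lookup[i]);
            }
        }

        // Merged lookup is 1-based; slot 0 stays null.
        String[] mergedLookup = new String[terms.size() + 1];
        int next = 1;
        for (String term : terms) {
            mergedLookup[next++] = term;
        }

        int[] mergedOrder = new int[starts[starts.length - 1]];
        for (int s = 0; s < segments.length; s++) {
            Index seg = segments[s];
            // Translate each segment ord to its merged ord once; remap[0] stays 0.
            int[] remap = new int[seg.lookup.length];
            for (int i = 1; i < seg.lookup.length; i++) {
                remap[i] = Arrays.binarySearch(
                    mergedLookup, 1, mergedLookup.length, seg.lookup[i]);
            }
            for (int doc = 0; doc < seg.order.length; doc++) {
                mergedOrder[starts[s] + doc] = remap[seg.order[doc]];
            }
        }
        return new Index(mergedOrder, mergedLookup);
    }
}
```

This version costs a sorted set of all terms plus one binary search per
segment term, so it is still not cheap, but the off-by-one handling is
confined to the 1-based lookup arrays.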




[jira] Issue Comment Edited: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for multiple subreaders

2008-12-17 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657445#action_12657445
 ] 

markrmil...@gmail.com edited comment on LUCENE-1483 at 12/17/08 8:59 AM:
---

Hey Mike, how about this one? BooleanScorer can collect hits out of order if you 
force it (against the contract). I think it's an issue with basedoc type stuff.

Actually I'll clarify that - I think it's an issue with the multiple reader mojo 
- didn't mean to put it solely on adding bases in particular yet.

  was (Author: markrmil...@gmail.com):
Hey Mike, how about this one? BooleanScorer can collect hits out of order if 
you force it (against the contract). I think it's an issue with basedoc type 
stuff.
  



RE: [Fwd: Re: 2.9, 3.0 and deprecation]

2008-12-17 Thread Uwe Schindler
Hello Patrick,

You are almost right about what the trie algorithm does.

The idea behind the trie algorithm is to match as many documents as possible
per term, so that the number of TermDocs seeks stays low. This is done by
using the most precise terms (which match only a few documents) for the
borders of the range and the least precise terms (which match more
documents) for the center of the range. Because of this, the maximum number
of TermDocs seeks has a hard upper bound that depends only on the trie
parameters, not on the index size or on how large the range is [see
javadocs and LUCENE-1470 for numbers]. As a result, all ranges execute in
about the same time.
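
To illustrate the border/center idea, here is a simplified decimal sketch.
It is not the TrieRangeQuery code: the class name, the split method and the
base parameter are made up for this example, it assumes small non-negative
values, and it ignores overflow. Each block it returns stands for one term
at some precision level.

```java
import java.util.ArrayList;
import java.util.List;

class TrieRangeSketch {
    /**
     * Splits [lower, upper] into aligned blocks. Blocks near the borders are
     * narrow (precise terms); blocks in the center are wide (coarse terms).
     */
    static List<long[]> split(long lower, long upper, int base) {
        List<long[]> blocks = new ArrayList<>();
        while (lower <= upper) {
            long width = 1;
            // Grow the block while it stays aligned and inside the range.
            while (lower % (width * base) == 0
                    && lower + width * base - 1 <= upper) {
                width *= base;
            }
            blocks.add(new long[] {lower, lower + width - 1});
            lower += width;
        }
        return blocks;
    }
}
```

For example, split(3, 1234, 10) covers 1232 values with 35 blocks: single
values at both ends (3..9 and 1230..1234), blocks of ten next to them, and
blocks of a hundred in the middle. Each precision level contributes at most
2 * (base - 1) blocks, which is why the count depends on the number of
levels rather than on the width of the range.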

 

Uwe

-
UWE SCHINDLER
Webserver/Middleware Development
PANGAEA - Publishing Network for Geoscientific and Environmental Data
MARUM - University of Bremen
Room 2500, Leobener Str., D-28359 Bremen
Tel.: +49 421 218 65595
Fax:  +49 421 218 65505
http://www.pangaea.de/
E-mail: uschind...@pangaea.de


From: patrick o'leary [mailto:polear...@aol.com] 
Sent: Tuesday, December 16, 2008 4:51 PM
To: java-dev@lucene.apache.org
Subject: Re: [Fwd: Re: 2.9, 3.0 and deprecation]

 

Yes, typo..   long day yesterday

Uwe Schindler wrote: 

I've only read through the jdoc of tier so far, but I'm guessing it's
doing a dictionary search and splitting the index reader's position
based on the result being less than or greater than upper / lower values.
Which may be faster than a TermDocs seek, and certainly
worth while investigating.


 
Do you mean JDOC of Trie here?
 
Uwe
 
 
 
 
  

 

-- 

Patrick O'Leary
 
AOL Local Search Technologies
Phone: + 1 703 265 8763
 
You see, wire telegraph is a kind of a very, very long cat. You pull his
tail in New York and his head is meowing in Los Angeles.
 Do you understand this? 
And radio operates exactly the same way: you send signals here, they receive
them there. The only difference is that there is no cat.
  - Albert Einstein


[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for multiple subreaders

2008-12-17 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657410#action_12657410
 ] 

Michael McCandless commented on LUCENE-1483:


{quote}
 so that they would know to implement that method for 3.0 when we would take 
 out the wrapper.
{quote}
Right but the new insight (for me at least) is it's OK for external collectors 
to not code to the expert API.

Ie previously we wanted to force migration to the expert API, but now I think 
it's OK to allow normal API and expert API to exist together.




[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for multiple subreaders

2008-12-17 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657579#action_12657579
 ] 

Mark Miller commented on LUCENE-1483:
-

{quote}Hmmm... right. You mean if you pass in allowDocsOutOfOrder=true 
(defaults to false).

I think this should not be a problem? (Though, I really don't fully understand 
BooleanScorer!). Since we are running scoring per-segment, each segment might 
collect its docIDs out of order, but all such docs are still within the current 
segment. Then when we advance to the new segment, the collector can do 
something if it needs to, and then collection proceeds again on the next 
segment's docs, possibly out of order. Ie, the out-of-orderness never jumps 
across a segment and then back again?{quote}

I was off base with my guess - it's actually only using one reader for that test 
(3 or 4 docs). It's got to be that the HitCollector the out-of-order scorer uses 
needs to be tweaked. Last tests to fix.





[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for multiple subreaders

2008-12-17 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657586#action_12657586
 ] 

Mark Miller commented on LUCENE-1483:
-

Hmmm...working more with ints as ords rather than double...it gives us ints but 
it complicates things a bit. Before, the only ords that had to be sorted and 
suborded were ones that didn't map on the new Reader exactly. With an int ord, 
*everything* you add is going to collide, and you need the ords in the queue 
added to the double lists and you need to fall down to the subord much more 
often...

interesting...

I guess I'll go with it for now though...




[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for multiple subreaders

2008-12-17 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657594#action_12657594
 ] 

Michael McCandless commented on LUCENE-1483:


Hang on -- if the value carries over to the new segment (and you set subord to 
0) then you don't need to add those ords to the double lists?

 Change IndexSearcher to use MultiSearcher semantics for multiple subreaders
 ---

 Key: LUCENE-1483
 URL: https://issues.apache.org/jira/browse/LUCENE-1483
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9
Reporter: Mark Miller
Priority: Minor
 Attachments: LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch


 FieldCache and Filters are forced down to a single segment reader, allowing 
 for individual segment reloading on reopen.





[jira] Commented: (LUCENE-504) FuzzyQuery produces a java.lang.NegativeArraySizeException in PriorityQueue.initialize if I use Integer.MAX_VALUE as BooleanQuery.MaxClauseCount

2008-12-17 Thread George Papas (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12657602#action_12657602
 ] 

George Papas commented on LUCENE-504:
-

Hi, 

This is still an issue in 2.4.0.  I know this is low priority, but has there 
been any more thinking about how to address this?

Thanks
George.

 FuzzyQuery produces a java.lang.NegativeArraySizeException in 
 PriorityQueue.initialize if I use Integer.MAX_VALUE as 
 BooleanQuery.MaxClauseCount
 --

 Key: LUCENE-504
 URL: https://issues.apache.org/jira/browse/LUCENE-504
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 1.9
Reporter: Joerg Henss
Priority: Minor
 Attachments: BooleanQuery.java.diff, fuzzyquery.patch, 
 PriorityQueue.java.diff, TestFuzzyQueryError.java


 PriorityQueue throws a java.lang.NegativeArraySizeException when 
 initialized with Integer.MAX_VALUE, because the Integer overflows. I think this 
 could be a general problem with PriorityQueue. The error occurred when I set 
 BooleanQuery.MaxClauseCount to Integer.MAX_VALUE and used a FuzzyQuery for 
 searching.
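
The overflow described above can be reproduced in isolation. The sketch below assumes the queue sizes its backing array as maxSize + 1 (slot 0 unused); that detail is an assumption about the implementation and is not confirmed in this thread. The class and method names are hypothetical, and the guard shown is only one possible way to fail early, not a proposed patch.

```java
final class HeapSizeOverflowSketch {

    /** Mimics a heap that reserves slot 0 and so allocates maxSize + 1 entries (assumed layout). */
    static Object[] allocateNaive(int maxSize) {
        int heapSize = maxSize + 1;   // wraps to Integer.MIN_VALUE when maxSize == Integer.MAX_VALUE
        return new Object[heapSize];  // a negative length throws NegativeArraySizeException
    }

    /** One possible guard: reject sizes whose maxSize + 1 cannot be represented as an int. */
    static Object[] allocateChecked(int maxSize) {
        if (maxSize < 0 || maxSize == Integer.MAX_VALUE) {
            throw new IllegalArgumentException("maxSize out of range: " + maxSize);
        }
        return new Object[maxSize + 1];
    }

    public static void main(String[] args) {
        try {
            allocateNaive(Integer.MAX_VALUE);
        } catch (NegativeArraySizeException e) {
            System.out.println("naive allocation overflowed");
        }
    }
}
```

Note that a guard like this only turns the confusing exception into a clearer one; a caller passing Integer.MAX_VALUE as "no limit" would still need a smaller bound.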




[jira] Updated: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for multiple subreaders

2008-12-17 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-1483:


Attachment: LUCENE-1483.patch

This patch is entering the finishing stages I think. This one is pretty much 
functionally complete and all tests should pass.

There is still a bunch of polish to be done though.

The following sort types still exist: SortField.STRING_VAL, STRING_ORD, and 
STRING_ORD_VAL; STRING is currently set to straight ord.

I think the ord case is still pretty slow. I'm sure there are still a few 
optimizations left, but it would be nice to see where it's at.

There is still an issue with custom FieldComparators - they are currently 
passed the top level reader in the hook - this still needs to be addressed 
somehow. We also need a test for one.

- Mark

 Change IndexSearcher to use MultiSearcher semantics for multiple subreaders
 ---

 Key: LUCENE-1483
 URL: https://issues.apache.org/jira/browse/LUCENE-1483
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9
Reporter: Mark Miller
Priority: Minor
 Attachments: LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch


 FieldCache and Filters are forced down to a single segment reader, allowing 
 for individual segment reloading on reopen.




[jira] Issue Comment Edited: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for multiple subreaders

2008-12-17 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12657608#action_12657608
 ] 

markrmil...@gmail.com edited comment on LUCENE-1483 at 12/17/08 3:29 PM:
---

This patch is entering the finishing stages I think. This one is pretty much 
functionally complete and all tests should pass.

There is still a bunch of polish to be done though.

The following sort types still exist: SortField.STRING_VAL, STRING_ORD, and 
STRING_ORD_VAL; STRING is currently set to straight ord.

I think the ord case is still pretty slow. I'm sure there are still a few 
optimizations left, but it would be nice to see where it's at.

There is still an issue with custom FieldComparators - they are currently 
passed the top level reader in the hook - this still needs to be addressed 
somehow. We also need a test for one.

- Mark

(ignore the couple of setDocBases you see in contrib - I've got 'em)

 Change IndexSearcher to use MultiSearcher semantics for multiple subreaders
 ---

 Key: LUCENE-1483
 URL: https://issues.apache.org/jira/browse/LUCENE-1483
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9
Reporter: Mark Miller
Priority: Minor
 Attachments: LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch


 FieldCache and Filters are forced down to a single segment reader, allowing 
 for individual segment reloading on reopen.




[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for multiple subreaders

2008-12-17 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12657620#action_12657620
 ] 

Mark Miller commented on LUCENE-1483:
-

bq. Hang on - if the value carries over to the new segment (and you set subord 
to 0) then you don't need to add those ords to the double lists? 

What was actually happening: I noticed it wasn't quite working right after 
switching ords to ints from double, and I realized the problem was that there 
was always going to be a collision for the sort list, whereas before, there was 
only a sortable collision when more than one mapped-from ord collided. So I 
thought that out wrong and figured you needed to sort the current ord as well, 
but in fact, of course you don't: I just needed to assume there is always a 
collision that adds to the sort list, not wait for 2 mapped-from ords to 
collide.
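
As a rough illustration of the ord/subord idea discussed above (not the code in the patch), the sketch below re-maps queued string values against the sorted terms of a new segment. All names here (OrdRemapSketch, Slot, remap) are hypothetical. It assumes that a value present in the new segment takes that term's ord with subord 0, and that a value falling between two terms takes the ord of the lower term plus a subord greater than 0, so it always "collides" with that ord and needs the subord to break the tie.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class OrdRemapSketch {

    /** Hypothetical queue slot: a value plus its position relative to the current segment. */
    static final class Slot {
        final String value;
        int ord;     // index of the greatest segment term <= value, or -1 if there is none
        int subord;  // 0 for an exact match; 1..n ranks values that fall between two terms
        Slot(String value) { this.value = value; }
    }

    /** Re-maps every queued value against the sorted, distinct terms of a new segment. */
    static void remap(List<Slot> queue, String[] sortedTerms) {
        Map<Integer, List<Slot>> collisions = new HashMap<>();
        for (Slot s : queue) {
            int idx = Arrays.binarySearch(sortedTerms, s.value);
            if (idx >= 0) {
                s.ord = idx;       // value carries over exactly: no subord needed
                s.subord = 0;
            } else {
                s.ord = -idx - 2;  // insertion point minus one
                collisions.computeIfAbsent(s.ord, k -> new ArrayList<>()).add(s);
            }
        }
        // Values that share an ord are ranked among themselves to get their subords.
        for (List<Slot> group : collisions.values()) {
            group.sort((Slot a, Slot b) -> a.value.compareTo(b.value));
            int rank = 0;
            String prev = null;
            for (Slot s : group) {
                if (!s.value.equals(prev)) {
                    rank++;
                }
                s.subord = rank;
                prev = s.value;
            }
        }
    }

    /** Compares by ord first and falls back to subord only on a tie. */
    static int compare(Slot a, Slot b) {
        int c = Integer.compare(a.ord, b.ord);
        return c != 0 ? c : Integer.compare(a.subord, b.subord);
    }
}
```

With int ords, a document in the new segment whose term has ord k compares as (k, 0), so any queued value mapped to (k, 1) or higher sorts after it and before ord k + 1 without comparing strings.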

 Change IndexSearcher to use MultiSearcher semantics for multiple subreaders
 ---

 Key: LUCENE-1483
 URL: https://issues.apache.org/jira/browse/LUCENE-1483
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9
Reporter: Mark Miller
Priority: Minor
 Attachments: LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
 LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch


 FieldCache and Filters are forced down to a single segment reader, allowing 
 for individual segment reloading on reopen.




[jira] Commented: (LUCENE-1465) NearSpansOrdered.getPayload does not return the payload from the minimum match span

2008-12-17 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12657662#action_12657662
 ] 

Mark Miller commented on LUCENE-1465:
-

This is an odd one, Jonathan. It's actually for the unordered case (the others 
were for the ordered). I am not exactly clear on what's going on yet.

When I look at the payloads coming back, it would seem we are getting 0,7,7 when 
we should get 6,7,7. When I look at the offsets for the spans that I get the 
payloads from, though, they appear correct. It's returning the payloads from the 
right offsets, it seems, but somehow one of those payloads is from the term at 
position 0? Very odd. So when I debug in, it does indeed look like the first 
match happens at index 6...but the term offsets are start: 2147483647, 
end: -2147483648 (that is, Integer.MAX_VALUE and Integer.MIN_VALUE). What the 
heck? This is going to take some more time...

 NearSpansOrdered.getPayload does not return the payload from the minimum 
 match span
 ---

 Key: LUCENE-1465
 URL: https://issues.apache.org/jira/browse/LUCENE-1465
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.4
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 2.4.1, 2.9

 Attachments: LUCENE-1465.patch, LUCENE-1465.patch, LUCENE-1465.patch, 
 LUCENE-1465.patch, Test.java







[jira] Updated: (LUCENE-1327) TermSpans skipTo() doesn't always move forwards

2008-12-17 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-1327:


Fix Version/s: (was: 2.3.3)
   2.9

 TermSpans skipTo() doesn't always move forwards
 ---

 Key: LUCENE-1327
 URL: https://issues.apache.org/jira/browse/LUCENE-1327
 Project: Lucene - Java
  Issue Type: Bug
  Components: Query/Scoring, Search
Affects Versions: 1.9, 2.0.0, 2.1, 2.2, 2.3, 2.3.1, 2.3.2, 2.4, 2.9, 3.0
Reporter: Moti Nisenson
 Fix For: 2.9


 In TermSpans (or the anonymous Spans class returned by SpanTermQuery, 
 depending on the version), the skipTo() method is improperly implemented if 
 the target doc is less than or equal to the current doc:
   public boolean skipTo(int target) throws IOException {
     // are we already at the correct position?
     if (doc >= target) {
       return true;
     }
     ...
 This violates the correct behavior (as described in the Spans interface 
 documentation), that skipTo() should always move forwards, in other words the 
 correct implementation would be:
   if (doc >= target) {
     return next();
   }
 This bug causes particular problems if one wants to use the payloads feature 
 - this is because if one loads a payload, then performs a skipTo() to the 
 same document, then tries to load the next payload, the spans hasn't 
 changed position and it attempts to load the same payload again (which is an 
 error).
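
The difference between the two behaviors can be seen on a toy enumeration. The sketch below is not Lucene code: ToySpans is a hypothetical stand-in that walks a sorted array of doc ids, and skipToBuggy/skipToFixed are illustrative names for the reported and the proposed behavior respectively.

```java
final class SkipToSketch {

    /** Toy stand-in for a Spans enumeration over a sorted array of doc ids. */
    static final class ToySpans {
        private final int[] docs;
        private int pos = -1;  // -1 means "before the first doc"

        ToySpans(int... docs) { this.docs = docs; }

        int doc() { return docs[pos]; }

        boolean next() { return ++pos < docs.length; }

        private boolean positioned() { return pos >= 0 && pos < docs.length; }

        /** Reported behavior: stays put when already at or past the target. */
        boolean skipToBuggy(int target) {
            if (positioned() && docs[pos] >= target) {
                return true;
            }
            return advance(target);
        }

        /** Proposed behavior: always moves forwards by at least one position. */
        boolean skipToFixed(int target) {
            if (positioned() && docs[pos] >= target) {
                return next();
            }
            return advance(target);
        }

        /** Steps forwards until a doc at or beyond the target is found. */
        private boolean advance(int target) {
            while (next()) {
                if (docs[pos] >= target) {
                    return true;
                }
            }
            return false;
        }
    }
}
```

With the buggy variant, a caller that reads a payload at doc 1 and then calls skipTo(1) is still positioned on doc 1 and would read the same payload again; with the fixed variant the enumeration has moved on.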




[jira] Updated: (LUCENE-1405) Support for new Resources model in ant 1.7 in Lucene ant task.

2008-12-17 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-1405:


Fix Version/s: (was: 2.3.3)
   2.9

 Support for new Resources model in ant 1.7 in Lucene ant task.
 --

 Key: LUCENE-1405
 URL: https://issues.apache.org/jira/browse/LUCENE-1405
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/*
Affects Versions: 2.3.2
Reporter: Przemyslaw Sztoch
 Fix For: 2.9

 Attachments: lucene-ant1.7-newresources.patch


 The Ant task for Lucene should support the modern Resources model (not only 
 the FileSet child element).
 A patch with the required changes is attached.
 Supported by both the old (Ant 1.6) and new (Ant 1.7) resources models:
 <index> <!-- Lucene Ant Task -->
   <fileset ... />
 </index>
 Supported only by the new (Ant 1.7) resources model:
 <index> <!-- Lucene Ant Task -->
   <filelist ... />
 </index>
 <index> <!-- Lucene Ant Task -->
   <userdefinied-filesource ... />
 </index>




[jira] Updated: (LUCENE-1361) QueryParser should have a setDateFormat(DateFormat) method

2008-12-17 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-1361:


Fix Version/s: (was: 2.3.3)
   2.9

 QueryParser should have a setDateFormat(DateFormat) method
 --

 Key: LUCENE-1361
 URL: https://issues.apache.org/jira/browse/LUCENE-1361
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.3.2
Reporter: ocean
Priority: Minor
 Fix For: 2.9


 Currently the only way to change the date format used by QueryParser.java is 
 to override the getRangeQuery method. This seems a bit excessive to me. Since 
 QueryParser isn't threadsafe (like DateFormat) I would suggest that a 
 DateFormat field be introduced (protected DateFormat dateFormat) and a setter 
 be introduced (public void setDateFormat(DateFormat format)) so that it's 
 easier to customize the date format in queries. If there are good reasons 
 against this (can't imagine, but who knows), why not introduce a protected 
 'DateFormat createDateFormat()' method so that, again, it's easier for 
 clients to override this logic?
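
A minimal sketch of what the two suggestions could look like together follows. RangeDateParser is a hypothetical stand-in, not Lucene's QueryParser, and the default format chosen here (short US style) is only an assumption for illustration. As with DateFormat itself, an instance of this class is not meant to be shared between threads.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

final class DateFormatSetterSketch {

    /** Hypothetical parser fragment showing the proposed field, setter and factory hook. */
    static class RangeDateParser {
        protected DateFormat dateFormat;  // lazily created so subclasses can override the factory

        /** Factory hook: subclasses may override this to change the default format. */
        protected DateFormat createDateFormat() {
            return DateFormat.getDateInstance(DateFormat.SHORT, Locale.US);
        }

        /** Setter proposed in the issue: replaces the format used for range bounds. */
        public void setDateFormat(DateFormat format) {
            this.dateFormat = format;
        }

        /** Parses one bound of a date range with the configured format. */
        public Date parseBound(String text) {
            if (dateFormat == null) {
                dateFormat = createDateFormat();
            }
            try {
                return dateFormat.parse(text);
            } catch (ParseException e) {
                throw new IllegalArgumentException("not a date: " + text, e);
            }
        }
    }

    public static void main(String[] args) {
        RangeDateParser parser = new RangeDateParser();
        parser.setDateFormat(new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT));
        System.out.println(parser.parseBound("2008-12-17"));
    }
}
```

The setter covers the common case of swapping the format at runtime, while the protected factory covers subclasses that want a different default without overriding the range-query logic.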
