[jira] Commented: (LUCENE-2829) improve termquery pk lookup performance

2011-01-04 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977353#action_12977353
 ] 

Michael McCandless commented on LUCENE-2829:


I think we should commit this, and if/when LUCENE-2694 and/or LUCENE-2831 are 
committed to 3.x, we can revisit it.

 improve termquery pk lookup performance
 -

 Key: LUCENE-2829
 URL: https://issues.apache.org/jira/browse/LUCENE-2829
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Robert Muir
 Attachments: LUCENE-2829.patch


 For things that are like primary keys and don't exist in some segments (worst 
 case is a primary/unique key that only exists in 1), we do wasted seeks.
 While LUCENE-2694 tries to solve some of this issue with TermState, I'm 
 concerned we could never backport that to 3.1, for example.
 This is a simpler solution here just to solve this one problem in 
 TermQuery... we could just revert it in trunk when we resolve LUCENE-2694,
 but I don't think we should leave things as they are in 3.x.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2829) improve termquery pk lookup performance

2010-12-22 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12974212#action_12974212
 ] 

Yonik Seeley commented on LUCENE-2829:
--

Why not keep the TermState cache and use it for all queries except MTQ, while 
using a different mechanism for MTQ to avoid trashing the cache?

The cache has a number of advantages that may never be duplicated in a 
different type of API, including
- actually cache frequently used terms across different requests
- cache terms reused in the same request.  term proximity boosting is an 
example:   +united +states united states^10




[jira] Commented: (LUCENE-2829) improve termquery pk lookup performance

2010-12-22 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12974223#action_12974223
 ] 

Robert Muir commented on LUCENE-2829:
-

bq. edit: and as robert previously pointed out, if we cached misses as well, 
then we could avoid needless seeks on segments that don't contain the term.

True, this is a good idea, just a little trickier:
* In trunk, we have TermsEnum.seek(BytesRef text, boolean useCache), defaulting 
  to true.
* FilteredTermsEnum passes false here, so the multitermqueries don't populate 
  the cache with garbage while enumerating (e.g. foo*), only explicitly at the 
  end with cacheTerm() (per-segment) for the ones that were actually accepted. 
  They sum up their docFreq themselves to prevent the first wasted seek in 
  TermQuery.
* So this solution would make MTQ worse, as it would cause them to trash the 
  caches in the second wasted seek (the docsenum) where they do not today, with 
  negative entries for the segments where the term doesn't exist. Today they do 
  this wasted seek, but they don't trash the cache here. The only solution to 
  prevent that is the PerReaderTermState (or something equally complicated).
* We would have to look at other places where negative entries would hurt, for 
  example rebuilding spellcheck indexes uses this 'termExists()' method 
  implemented with docFreq. So we would likely have to change spellcheck's code 
  to use a TermsEnum and seek(term, false)... using a termsenum in parallel 
  with the spellcheck dictionary would obviously be more efficient for the 
  index-based spellcheck case (forget about caching) versus docFreq()'ing every 
  term... *but* we cannot assume the spellcheck Dictionary is actually in term 
  order (imagine the File-based dictionary case), so we can't implement this 
  today.

On 3.x I think it's slightly less complicated as there is already a hack in the 
cache to prevent sequential termsenums from trashing it (e.g. foo*), and pretty 
much all the MTQs just enumerate sequentially anyway... (except NRQ, which 
doesn't enum many terms anyway, likely not a problem).

But we would have to at least fix the spellcheck case there too I think.

Not saying I don't like your idea... just saying there's more work to do it.





[jira] Commented: (LUCENE-2829) improve termquery pk lookup performance

2010-12-22 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12974229#action_12974229
 ] 

Robert Muir commented on LUCENE-2829:
-

On further thought Yonik, your idea is really completely unrelated.

We shouldn't be seeking to terms/relying upon the terms dictionary cache 
internally when we don't need to...

Whether or not it's populated with negative entries for the more general case 
is unrelated; even if we go that route we shouldn't be lazy and rely upon that.





[jira] Commented: (LUCENE-2829) improve termquery pk lookup performance

2010-12-22 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12974262#action_12974262
 ] 

Michael McCandless commented on LUCENE-2829:


bq. The cache has a number of advantages that may never be duplicated in a 
different type of API

+1 -- I agree we should keep the TermState cache.  It has benefits outside of 
re-use within a single query.

But allowing term-lookup-intensive clients like MTQ to do their own caching 
(i.e. pulling the TermState from the enum) is also important.  I think we need 
both.

On caching misses... that makes me nervous.  If there are apps out there that 
do a lot of checking for terms that don't exist, that can destroy the cache.
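
The concern about cached misses can be illustrated with a minimal, self-contained sketch. It is not Lucene code: `LruTermCache`, its capacity of 4, and the term strings are hypothetical stand-ins for a small LRU term dictionary cache. It counts how many "hot" terms are still cached after a burst of lookups for terms that do not exist, with and without negative entries.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Tiny LRU term cache; a hypothetical stand-in for a term dictionary cache. */
class LruTermCache {
    // Value: true = term exists, false = cached miss (negative entry).
    private final Map<String, Boolean> map;
    private final boolean cacheMisses;

    LruTermCache(final int capacity, boolean cacheMisses) {
        this.cacheMisses = cacheMisses;
        // accessOrder=true makes LinkedHashMap evict the least recently used entry.
        this.map = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Records the outcome of a lookup; misses are stored only if enabled. */
    void record(String term, boolean exists) {
        if (exists || cacheMisses) {
            map.put(term, exists);
        }
    }

    boolean isCached(String term) {
        return map.containsKey(term);
    }

    /** Fills the cache with 4 hot terms, then records 4 misses; returns hot terms left. */
    static int hotTermsSurviving(boolean cacheMisses) {
        LruTermCache cache = new LruTermCache(4, cacheMisses);
        String[] hot = {"united", "states", "lucene", "search"};
        for (String t : hot) {
            cache.record(t, true);
        }
        for (int i = 0; i < 4; i++) {
            cache.record("missing" + i, false);
        }
        int surviving = 0;
        for (String t : hot) {
            if (cache.isCached(t)) {
                surviving++;
            }
        }
        return surviving;
    }
}
```

In this toy setup, `hotTermsSurviving(true)` leaves none of the hot terms cached, while `hotTermsSurviving(false)` leaves all four, which is the failure mode described above for apps that check many nonexistent terms.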

The cache is a great safety net but I think our core queries should be good 
consumers, when possible, and hold their own TermState.




[jira] Commented: (LUCENE-2829) improve termquery pk lookup performance

2010-12-22 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12974274#action_12974274
 ] 

Earwin Burrfoot commented on LUCENE-2829:
-

Term lookup misses can be alleviated by a simple Bloom Filter.
No caching misses required, helps both PK and near-PK queries.
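
As a rough illustration of the idea (a sketch under stated assumptions, not an existing Lucene API), a per-segment Bloom filter could be consulted before seeking. `TermBloomFilter`, its FNV-1a based hashing, and the sizing parameters below are all hypothetical choices made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Minimal Bloom filter over term strings; a hypothetical sketch, not a Lucene API. */
class TermBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    TermBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    /** FNV-1a over the UTF-8 bytes of the term, with a seed mixed into the basis. */
    private static int hash(String term, int seed) {
        int h = 0x811C9DC5 ^ seed;
        for (byte b : term.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return h;
    }

    /** Double hashing: the i-th probe is h1 + i*h2, mapped into [0, numBits). */
    private int index(String term, int i) {
        int h1 = hash(term, 0);
        int h2 = hash(term, 0x9E3779B9);
        return Math.floorMod(h1 + i * h2, numBits);
    }

    /** Called once per term when the segment is written. */
    void add(String term) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(term, i));
        }
    }

    /** false means "definitely absent"; true means "possibly present". */
    boolean mightContain(String term) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(term, i))) {
                return false;
            }
        }
        return true;
    }
}
```

Only a `false` answer lets a caller skip the seek for that segment; a `true` answer can be a false positive, so the real term dictionary lookup is still needed in that case.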




[jira] Commented: (LUCENE-2829) improve termquery pk lookup performance

2010-12-22 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12974310#action_12974310
 ] 

Robert Muir commented on LUCENE-2829:
-

Bloom filters and negative caches are nice, but please open separate issues!
I am starting to feel like it's mandatory to refactor the entirety of Lucene to 
make a single incremental improvement.

So, I'd like to proceed with this issue as-is, to make TermWeight explicitly do 
fewer seeks.





[jira] Commented: (LUCENE-2829) improve termquery pk lookup performance

2010-12-22 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12974350#action_12974350
 ] 

Earwin Burrfoot commented on LUCENE-2829:
-

Nobody halts your progress, we're merely discussing.

I, on the other hand, have a feeling that Lucene is overflowing with single 
incremental improvements aka hacks, as they are easier and faster to 
implement than trying to get a bigger picture, and, yes, rebuilding everything 
:)
For example, better term dict code will make this issue (somewhat hackish, 
admit it?) irrelevant, whether we implement bloom filters or just guarantee to 
keep the whole term dict in memory with a reasonable lookup routine (e.g. as an 
FST).

Having said that, I reiterate, I'm not here to stop you or turn this issue into 
something else.




[jira] Commented: (LUCENE-2829) improve termquery pk lookup performance

2010-12-22 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12974354#action_12974354
 ] 

Robert Muir commented on LUCENE-2829:
-

bq. For example, better term dict code will make this issue (somewhat hackish, 
admit it?) irrelevant. 

Right, it is hackish, but what is a worse hack is wasted seeks in our next 3.1 
release because we can't
keep scope under control and fix small problems without rewriting everything, 
which means less 
gets backported to our stable branch.

Anyway, I'm just gonna mark this won't fix so I don't have to deal with it 
anymore.




[jira] Commented: (LUCENE-2829) improve termquery pk lookup performance

2010-12-21 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12973869#action_12973869
 ] 

Michael McCandless commented on LUCENE-2829:


I made a random PK lookup tester (committed to luceneutil), to look up by docid 
(unique key) from the luceneutil index.

Pre-patch it's 53 usec per lookup and with this patch it's 31 usec -- ~42% 
faster!
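
For reference, the two common ways to read these numbers can be worked out with a small helper. The class and method names here are hypothetical; only the 53 and 31 usec figures come from the measurement above.

```java
/** Hypothetical helper for interpreting before/after latency measurements. */
class Speedup {
    /** Fraction of per-lookup time saved: (before - after) / before. */
    static double latencyReduction(double beforeUsec, double afterUsec) {
        return (beforeUsec - afterUsec) / beforeUsec;
    }

    /** How many times more lookups fit in the same time: before / after. */
    static double throughputRatio(double beforeUsec, double afterUsec) {
        return beforeUsec / afterUsec;
    }
}
```

With 53 and 31 usec, `latencyReduction` gives 22/53, roughly 0.415, which matches the ~42% figure as a reduction in time per lookup; `throughputRatio` gives 53/31, roughly 1.71 times as many lookups per unit time.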




[jira] Commented: (LUCENE-2829) improve termquery pk lookup performance

2010-12-21 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12973877#action_12973877
 ] 

Robert Muir commented on LUCENE-2829:
-

right, we just have to not do stupid things like hash hashcodes to make it 
faster for when the data is hot...
but as a start this is safe, hopefully we could do something non-invasive (and 
backportable) to make it faster.






[jira] Commented: (LUCENE-2829) improve termquery pk lookup performance

2010-12-21 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12973882#action_12973882
 ] 

Hoss Man commented on LUCENE-2829:
--

The patch is over my head, but providing a super optimized solution to the 
primary key type lookup problem definitely seems worthwhile -- it has me 
wondering if a PrimaryKeyQuery that works like TermQuery but quits collecting 
as soon as it finds one matching document would be a good idea?




[jira] Commented: (LUCENE-2829) improve termquery pk lookup performance

2010-12-21 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12973889#action_12973889
 ] 

Robert Muir commented on LUCENE-2829:
-

Hoss Man, well I think if you surely know it's a PK field you can definitely do 
something better, starting with a custom collector that does something like 
what you mentioned, with no PQ at all etc.

But in this case, though I categorized it as PK, the general problem is this:
* In lots of cases we do redundant seeks, like to get the docFreq, then to get 
  the DocsEnum.
* In most cases the term dictionary cache helps here because the 2nd time (e.g. 
  getting DocsEnum) is cached.

Here's the problem with PK or PK-ish (low freq terms like what 
wildcards/fuzzies/range queries hit too):
* Our cache doesn't cache negative hits, the fact that a term *doesn't* exist 
  in some segment.
* For example in the PK case, if there are 15 segments we always get at most 1 
  cache hit and at least 14 misses when getting the DocsEnum, so we do at least 
  14 wasted seeks always.
* For other low frequency terms that don't exist in all segments (very precise 
  dates or what have you) the same idea applies, just to a lesser extent: the 
  PK is the worst.
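
The 15-segment arithmetic above can be reproduced with a small, self-contained simulation. This is not Lucene code: `Segment`, `SeekCounter`, and the hit-only cache are hypothetical stand-ins, and the `remembering` variant is just one assumed way to avoid the second-pass seeks (remember in the first pass which segments contain the term), not necessarily what the attached patch does.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for one index segment; not a Lucene class. */
class Segment {
    private final Set<String> terms = new HashSet<>();
    private final Set<String> positiveCache = new HashSet<>(); // caches hits only
    int seeks = 0; // number of term dictionary seeks performed

    Segment(String... indexedTerms) {
        for (String t : indexedTerms) {
            terms.add(t);
        }
    }

    /** Returns true if the term exists; counts a seek unless a cached hit answers it. */
    boolean lookup(String term) {
        if (positiveCache.contains(term)) {
            return true; // cache hit, no seek
        }
        seeks++;
        boolean found = terms.contains(term);
        if (found) {
            positiveCache.add(term); // misses are never cached
        }
        return found;
    }
}

class SeekCounter {
    /** Builds n segments where only segment 0 contains the key. */
    static List<Segment> buildIndex(int n, String key) {
        List<Segment> segs = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            segs.add(i == 0 ? new Segment(key) : new Segment());
        }
        return segs;
    }

    static int totalSeeks(List<Segment> segs) {
        int total = 0;
        for (Segment s : segs) {
            total += s.seeks;
        }
        return total;
    }

    /** Both the docFreq pass and the DocsEnum pass consult every segment. */
    static int naive(int n, String key) {
        List<Segment> segs = buildIndex(n, key);
        for (Segment s : segs) {
            s.lookup(key); // docFreq pass
        }
        for (Segment s : segs) {
            s.lookup(key); // DocsEnum pass
        }
        return totalSeeks(segs);
    }

    /** The DocsEnum pass only visits segments where the docFreq pass found the term. */
    static int remembering(int n, String key) {
        List<Segment> segs = buildIndex(n, key);
        List<Segment> hits = new ArrayList<>();
        for (Segment s : segs) {
            if (s.lookup(key)) {
                hits.add(s);
            }
        }
        for (Segment s : hits) {
            s.lookup(key);
        }
        return totalSeeks(segs);
    }
}
```

With 15 segments, `naive` performs 15 seeks in the first pass plus 14 in the second (29 total), while `remembering` performs only the first 15, so the difference is exactly the 14 wasted seeks described above.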





[jira] Commented: (LUCENE-2829) improve termquery pk lookup performance

2010-12-21 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12973892#action_12973892
 ] 

Robert Muir commented on LUCENE-2829:
-

bq. I think a cleaner interface may be for the Weight.scorer method to receive 
the ord of the sub reader in the parent?

Yes, ideally with the actual df in there. This would save the third seek in the 
bulkpostings branch.

But at the same time, I'm worried/don't want this issue to evolve into 
TermState (LUCENE-2694). I wasn't thinking that this was any kind of 
end-solution, just an approach we could take that would work against 3.1.





[jira] Commented: (LUCENE-2829) improve termquery pk lookup performance

2010-12-21 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12973908#action_12973908
 ] 

Robert Muir commented on LUCENE-2829:
-

bq. But, then, passing a struct (parent/sub/ord) is a fairly small change, and, 
if it matches the change we will make on LUCENE-2694, then that's great.

Ok, that might be a good approach, to fix it this way in LUCENE-2694 (or 
actually, preferably add the parent/sub/ord in its own issue!), 
but in 3.1 we could use the struct to avoid wasted seeks on PK terms... 

Seems like backporting the entire TermState thing could be a little 
tricky/risky for 3.1, with not much to gain there except PK lookups anyway, 
since the multitermqueries there tend to be slow (dominated by term comparison) 
and don't even work per-segment anyway.

