[jira] Commented: (LUCENE-2026) Refactoring of IndexWriter

2009-12-11 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789618#action_12789618
 ] 

Jake Mannix commented on LUCENE-2026:
-

bq. Now we want more speed, and are ready to sacrifice something if needed.
bq. You decide to sacrifice new record (in)visibility. No choice, but to hack 
into IW to allow readers see its hot, fresh innards.

Chiming in here that of course you don't *need* (i.e. there is a choice) to hack 
into the IW to do this.  Zoie is a completely user-land solution which modifies 
no IW/IR internals and yet achieves millisecond index-to-query-visibility 
turnaround while keeping indexing and query performance speedy.  It simply keeps 
the RAMDir outside the IW, encapsulated in an object (an IndexingSystem) which 
builds IndexReaders off of both the RAMDir and the FSDir, and hides the 
implementation details (in fact the IW itself) from the user.  

The API for this kind of thing doesn't *have* to be tightly coupled, and I 
would agree with you that it shouldn't be.
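For concreteness, the encapsulation described above can be sketched roughly as 
follows.  This is a hedged, purely hypothetical stand-in (not Zoie's real API): 
the List-backed "directories" stand in for the RAMDir/FSDir and the readers 
built over them, to show how the writer and both indexes stay hidden behind one 
search entry point.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a Zoie-style "IndexingSystem": a small in-memory
// buffer plus a larger on-disk index behind a single search entry point.
class IndexingSystem {
  private final List<String> ramBuffer = new ArrayList<>();  // stands in for the RAMDir
  private final List<String> diskIndex = new ArrayList<>();  // stands in for the FSDir

  // Adds are visible to searches immediately, without touching the disk index.
  public void addDocument(String doc) { ramBuffer.add(doc); }

  // Periodically the RAM buffer is flushed into the disk index.
  public void flush() { diskIndex.addAll(ramBuffer); ramBuffer.clear(); }

  // Searches consult readers over both "directories"; callers never see
  // the writer or the two underlying indexes.
  public List<String> search(String term) {
    List<String> hits = new ArrayList<>();
    for (String d : diskIndex) if (d.contains(term)) hits.add(d);
    for (String d : ramBuffer) if (d.contains(term)) hits.add(d);
    return hits;
  }
}
```

The point of the shape, not the stand-in types: index-to-query visibility is 
immediate because searches always see the RAM side, while the IW-equivalent 
stays entirely encapsulated.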

> Refactoring of IndexWriter
> --
>
> Key: LUCENE-2026
> URL: https://issues.apache.org/jira/browse/LUCENE-2026
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael Busch
>Assignee: Michael Busch
>Priority: Minor
> Fix For: 3.1
>
>
> I've been thinking for a while about refactoring the IndexWriter into
> two main components.
> One could be called a SegmentWriter, and as the
> name says, its job would be to write one particular index segment. The
> default one, just as today, would provide methods to add documents and
> flush when its buffer is full.
> Other SegmentWriter implementations would do things like e.g. appending or
> copying external segments [what addIndexes*() currently does].
> The second component's job would be to manage writing the segments
> file and merging/deleting segments. It would know about
> DeletionPolicy, MergePolicy and MergeScheduler. Ideally it would
> provide hooks that allow users to manage external data structures and
> keep them in sync with Lucene's data during segment merges.
> API wise there are things we have to figure out, such as where the
> updateDocument() method would fit in, because its deletion part
> affects all segments, whereas the new document is only being added to
> the new segment.
> Of course these should be lower level APIs for things like parallel
> indexing and related use cases. That's why we should still provide
> easy to use APIs like today for people who don't need to care about
> per-segment ops during indexing. So the current IndexWriter could
> probably keep most of its APIs and delegate to the new classes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2143) Understand why NRT performance is affected by flush frequency

2009-12-10 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788871#action_12788871
 ] 

Jake Mannix commented on LUCENE-2143:
-

So why, also, are you picking only one indexing rate?  Shouldn't you also vary 
it from low all the way up to saturating the system at the maximum indexing 
rate, to properly stress the system?

> Understand why NRT performance is affected by flush frequency
> -
>
> Key: LUCENE-2143
> URL: https://issues.apache.org/jira/browse/LUCENE-2143
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.1
>
>
> In LUCENE-2061 (perf tests for NRT), I test NRT performance by first
> getting a baseline QPS with only searching, using enough threads to
> saturate.
> Then, I pick an indexing rate (I used 100 docs/sec), and index docs at
> that rate, and I also reopen a NRT reader at different frequencies
> (10/sec, 1/sec, every 5 seconds, etc.), and then again test QPS
> (saturated).
> I think this is a good approach for testing NRT -- apps can see, as a
> function of "freshness" and at a fixed indexing rate, what the cost is
> to QPS.  You'd expect as index rate goes up, and freshness goes up,
> QPS will go down.
> But I found something very strange: the low frequency reopen rates
> often caused a highish hit to QPS.  When I forced IW to flush every
> 100 docs (= once per second), the performance was generally much
> better.
> I actually would've expected the reverse -- flushing in batch ought to
> use fewer resources.
> One theory is something odd about my test env (based on OpenSolaris),
> so I'd like to retest on a more mainstream OS.
> I'm opening this issue to get to the bottom of it...




[jira] Commented: (LUCENE-1377) Add HTMLStripReader and WordDelimiterFilter from SOLR

2009-12-08 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787647#action_12787647
 ] 

Jake Mannix commented on LUCENE-1377:
-

bq. A solr developer adds something to solr w/o putting it in lucene - only 
solr users benefit.
bq. A lucene developer adds something to lucene w/o adding support for that in 
solr - only lucene users benefit.

I agree with the first, but not the second: Solr users benefit on the next 
release cycle of Solr, which includes the Lucene changes.  Your statement would 
only hold if Solr did not depend on Lucene.

> Add HTMLStripReader and WordDelimiterFilter from SOLR
> -
>
> Key: LUCENE-1377
> URL: https://issues.apache.org/jira/browse/LUCENE-1377
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Analysis
>Affects Versions: 2.3.2
>Reporter: Jason Rutherglen
>Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> SOLR has two classes HTMLStripReader and WordDelimiterFilter which are very 
> useful for a wide variety of use cases.  It would be good to place them into 
> core Lucene.




[jira] Commented: (LUCENE-1377) Add HTMLStripReader and WordDelimiterFilter from SOLR

2009-12-08 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787605#action_12787605
 ] 

Jake Mannix commented on LUCENE-1377:
-

bq. But we've been unable to make that happen in the past, so betting on it 
going forward seems like a mistake.

Sure, but keeping code which could be used by any Lucene project sequestered in 
Solr because you want to be able to modify it on a different time-schedule 
seems like bad open-source policy.

I guess what I'd change my statement to is that yes, it would be a downside to 
the Solr community (anytime you depend on another project, you also depend on 
its release schedule).  But I'd still hold that this fact alone should not keep 
code *out* of Lucene-core.  Anything which isn't Solr-specific should belong 
deeper in, to allow use by more people, otherwise only Solr users get the 
benefit.





[jira] Commented: (LUCENE-1377) Add HTMLStripReader and WordDelimiterFilter from SOLR

2009-12-08 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787595#action_12787595
 ] 

Jake Mannix commented on LUCENE-1377:
-

bq. There are some downsides for Solr: if these are in Lucene, we (solr) lose 
the ability to easily add new features or fix bugs... mainly because Solr 
doesn't use Lucene trunk anymore.

Who is this "we" of which you speak?  If someone using Solr needs something 
changed in Lucene, a patch submitted to the Lucene JIRA should be no harder to 
get committed than a patch submitted to the Solr JIRA, no?  The release cycle 
may be slower, but yes, as Earwin says, maybe this just gives one more reason 
for more rapid minor-version iteration in Lucene.





[jira] Commented: (LUCENE-1526) For near real-time search, use paged copy-on-write BitVector impl

2009-11-12 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777068#action_12777068
 ] 

Jake Mannix commented on LUCENE-1526:
-

bq. OK. It's clear Zoie's design is optimized for insanely fast reopen.

That, and maxing out QPS and indexing rate while keeping query-latency 
degradation to a minimum.  From trying to turn off the extra deleted check: the 
latency overhead on a 5M-doc index is the difference between queries taking 
12-13ms with the extra check turned on and 10ms without it, and you only really 
start to notice at the extreme edges (queries hitting all 5 million docs by way 
of an actual query, not MatchAllDocs), where your performance goes from maybe 
100ms to 140-150ms.  

bq. EG what I'd love to see is, as a function of reopen rate, the "curve" of 
QPS vs docs per sec. Ie, if you reopen 1X per second, that consumes some of 
your machine's resources. What's left can be spent indexing or searching or 
both, so, it's a curve/line. So we should set up fixed rate indexing, and then 
redline the QPS to see what's possible, and do this for multiple indexing 
rates, and for multiple reopen rates.

Yes, that curve would be a very useful benchmark.  Now that I think of it, it 
wouldn't be too hard to sneak some reader caching into the ZoieSystem with 
a tunable parameter for how long you hang onto a reader, so that we could see 
how much that helps.  One of the nice things we could do in Zoie with this kind 
of index-latency backoff: because we have an in-memory two-way mapping of 
zoie-specific UID to docId, if we actually have time (in the background, since 
we're caching these readers now) we can zip through and update the real delete 
BitVectors on the segments, and lose the extra check at query time, only using 
that check when the index-latency setting is below some threshold (determined by 
how long it takes the system to do this resolution - mapping docId to UID is an 
array lookup; the reverse is a little slower).

bq. Right, Zoie is making determined tradeoffs. I would expect that most apps 
are fine with controlled reopen frequency, ie, they would choose to not lose 
indexing and searching performance if it means they can "only" reopen, eg, 2X 
per second.

In theory Zoie is making tradeoffs - in practice, at least against what is on 
trunk, Zoie is just going way faster in both indexing and querying in the 
redline perf test.  I agree that in principle, once LUCENE-1313 and other 
improvements have landed and the bugs have been worked out of NRT, query 
performance should be faster, and if zoie's default BalancedMergePolicy (nee 
ZoieMergePolicy) is in use for NRT, the indexing performance should be faster 
too - it's just not quite there yet at this point.

bq. I agree - having such well defined API semantics ("once updateDoc returns, 
searches can see it") is wonderful. But I think they can be cleanly built on 
top of Lucene NRT as it is today, with a pre-determined (reopen) latency.

Of course!  These API semantics are already built on top of plain-old Lucene - 
even without NRT - so I can't imagine how NRT would *remove* this ability! :)

bq. I think the "large merge just finished" case is the most costly for such 
apps (which the "merged segment warmer" on IW should take care of)? (Because 
otherwise the segments are tiny, assuming everything is cutover to "per 
segment").

Definitely.  One thing Zoie benefited from, from an API standpoint, which might 
be nice in Lucene now that Java 1.5 is in place: the IndexReaderWarmer could 
replace the raw SegmentReader with a warmed, user-specified subclass of 
SegmentReader:
{code}
public abstract class IndexReaderWarmer<T extends IndexReader> {
  public abstract T warm(IndexReader rawReader);
}
{code}

This could replace the reader in the readerPool with the possibly 
user-overridden subclass of SegmentReader (now that SegmentReader is as public 
as IndexReader itself), which has now been warmed.  For users who like to 
decorate their readers to keep additional state, instead of using them as keys 
into separately kept WeakHashMaps, this could be extremely useful (I know the 
people I talked to at Apple's iTunes store do this, as do bobo and zoie, to 
name a few examples off the top of my head).
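To illustrate how the generic parameter pays off, here is a hedged sketch with 
stub types (the real IndexReader/SegmentReader are Lucene classes, and the 
warmer/subclass names here are invented for illustration): the type parameter 
lets the pool hand back a user-defined reader subclass carrying extra state.

```java
// Stub stand-ins for the Lucene classes, so the sketch is self-contained.
class IndexReader {}
class SegmentReader extends IndexReader {}

// The generified warmer from the comment above.
abstract class IndexReaderWarmer<T extends IndexReader> {
  public abstract T warm(IndexReader rawReader);
}

// Hypothetical user subclass that caches per-reader state on the reader
// itself, instead of keying a separate WeakHashMap on the raw reader.
class StatefulSegmentReader extends SegmentReader {
  final long warmedAtMillis;
  StatefulSegmentReader(long t) { warmedAtMillis = t; }
}

class StatefulWarmer extends IndexReaderWarmer<StatefulSegmentReader> {
  public StatefulSegmentReader warm(IndexReader rawReader) {
    // ...pre-load caches, facet data, etc., then wrap the raw reader...
    return new StatefulSegmentReader(System.currentTimeMillis());
  }
}
```

Because `warm` returns `T` rather than a bare IndexReader, callers get the 
decorated type back without casting.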

bq.  I think Lucene could handle this well, if we made an IndexReader impl that 
directly searches DocumentWriter's RAM buffer. But that's somewhat challenging

Jason mentioned this approach in his talk at ApacheCon, but I'm not at all 
convinced it's necessary - if a single box can handle indexing a couple hundred 
smallish documents a second (into a RAMDirectory), and could be sped up by 
using multiple IndexWriters (writing into multiple RAMDirectories in parallel, 
if you were willing to give up some CPU cores to indexing), and you can search 
against them without having to do any deduplication / bloom-filter check 
against th

[jira] Commented: (LUCENE-1526) For near real-time search, use paged copy-on-write BitVector impl

2009-11-11 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776568#action_12776568
 ] 

Jake Mannix commented on LUCENE-1526:
-

bq. These sound serious - if you can provide any details, that'd help.  I'll do 
some stress testing too. Thanks for testing and reporting 

Out of these, the specific issue of incorrectness of applied deletes is the 
easiest to see - we saw it by indexing up to a million docs, then continuing to 
add docs, but only after first deleting by UID, where the UID, instead of 
increasing, loops around mod 1 million.  Calling numDocs (not maxDoc) on the 
reader with Zoie always returns 1M after looping around, but with NRT it starts 
slowly growing above 1M.

The CPU and memory cost is undoubtedly due to the constant reopening of these 
readers, and yes, could be alleviated by not doing this - we're just comparing 
to the Zoie case, where we *do* reopen (the RAMDir) on every request (and copy 
the delSet) if there have been modifications since the last update.

bq. Lucene NRT makes a clone of the BitVector for every reader that has new 
deletions. Once this is done, searching is "normal" - it's as if the reader 
were a disk reader. There's no extra checking of deleted docs (unlike Zoie), no 
OR'ing of 2 BitVectors, etc.

Ok, so if this is copy-on-write, it's done every time there is a new delete for 
that segment?  If the disk index is optimized that means it would happen on 
every update, a clone of the full numDocs sized BitVector?  I'm still a little 
unsure of how this happens. 

* somebody calls getReader() - they've got all the SegmentReaders for the disk 
segments, and each of them have BitVectors for deletions.
* IW.update() gets called - the BitVector for the segment which now has a 
deletion is cloned, and set on a new pooled SegmentReader as its deletedSet
* maybe IW.update() gets called a bunch more - do these modify the pooled but 
as-yet-unused SegmentReader?  New readers in the pool?  What?
* another call to getReader() comes in, and gets an IndexReader wrapping the 
pooled SegmentReaders.
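The copy-on-write step in that sequence can be sketched as follows.  This is a 
hedged, simplified stand-in (hypothetical classes, not Lucene's actual 
BitVector or reader pool): the pooled reader shares the old bit set until a new 
delete arrives, and the first delete after a getReader() clones the whole array.

```java
import java.util.Arrays;

// Minimal bit vector: enough to show the clone-per-new-delete cost.
class BitVector {
  final byte[] bits;
  BitVector(int numBits) { bits = new byte[(numBits + 7) >> 3]; }
  private BitVector(byte[] b) { bits = b; }
  BitVector copy() { return new BitVector(Arrays.copyOf(bits, bits.length)); }
  void set(int doc) { bits[doc >> 3] |= (1 << (doc & 7)); }
  boolean get(int doc) { return (bits[doc >> 3] & (1 << (doc & 7))) != 0; }
}

class PooledSegmentReader {
  private BitVector deletedDocs;
  private boolean shared;  // true while a handed-out reader also holds the vector

  PooledSegmentReader(BitVector del) { deletedDocs = del; shared = true; }

  // IW.update(): clone only on the first delete after the vector was handed out;
  // subsequent deletes mutate the private copy in place.
  void delete(int doc) {
    if (shared) { deletedDocs = deletedDocs.copy(); shared = false; }
    deletedDocs.set(doc);
  }

  // getReader(): the returned snapshot is shared until the next delete,
  // preserving the point-in-time view of earlier snapshots.
  BitVector snapshot() { shared = true; return deletedDocs; }
}
```

Under heavy updates against a large (e.g. optimized) segment, that `copy()` of 
the full-size array is exactly the per-delete cost being discussed.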

bq. Yes, this makes Lucene's reopen more costly. But, then there's no double 
checking for deletions. That's the tradeoff, and this is why the 64 msec is 
added to Zoie's search time. Zoie's searches are slower.

So we re-ran some of our tests last night, commenting out our deleted check to 
measure its cost in the most extreme case possible: a dead-easy query (in that 
it's only one term), but one which, yes, hits the entire index (a 
MatchAllDocs query is actually special-cased in our code, and is perfectly 
fast, so not a good worst case to check).  As the index grows above a 
million documents, Zoie could shave somewhere from 22-28% off its time by 
not doing the extra check.  

We haven't re-run the test to see what happens as the index grows to 5M or 10M 
yet, but I can probably run that later today.

bq. The fact that Zoie on the pure indexing case (ie no deletions) was 10X 
faster than Lucene is very weird - that means something else is up,
besides how deletions are carried in RAM. It's entirely possible it's the fact 
that Lucene doesn't flush the tiny segments to a RAMDir (which LUCENE-1313 
addresses).

Yeah, if you call getReader() a bunch of times per second, each one does a 
flush(true,true,true), right?  Without having LUCENE-1313, this kills the 
indexing performance if querying is going on.  If no getReader() is being 
called at all, Zoie is about 10% slower than pure Lucene IndexWriter.add() 
(that's the cost of doing it in two steps - index into *two* RAMDirs [so they 
are hot-swappable] and then writing segments to disk with 
addIndexesNoOptimize() periodically).
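The two-RAMDir hot-swap scheme mentioned above can be sketched like this.  It 
is a hedged, hypothetical simplification (List-backed buffers standing in for 
the two RAMDirs and the disk index): new docs go into the active buffer, and at 
flush time the buffers swap so indexing continues into the empty one while the 
full one is written to disk.

```java
import java.util.ArrayList;
import java.util.List;

class DoubleBufferedIndexer {
  private List<String> active = new ArrayList<>();   // stands in for RAMDir A
  private List<String> flushing = new ArrayList<>(); // stands in for RAMDir B
  final List<String> disk = new ArrayList<>();       // stands in for the FSDir

  void add(String doc) { active.add(doc); }

  // Swap, then drain the retired buffer to disk (the addIndexes step);
  // searches can keep reading the retired buffer until the disk segment
  // becomes visible, so nothing ever disappears from view.
  void flushToDisk() {
    List<String> tmp = flushing; flushing = active; active = tmp;
    disk.addAll(flushing);
    flushing.clear();
  }

  int visibleDocs() { return active.size() + flushing.size() + disk.size(); }
}
```

The roughly 10% indexing overhead quoted above is the price of this two-step 
write (RAM first, then a periodic batch to disk) versus adding straight to one 
writer.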

I don't think there's any difference in the MergePolicy - I think they're both 
using the BalancedMergePolicy (since that's the one which is optimized for the 
realtime case).

bq. Actually I thought of a simple way to run the "search only" (not reopen) 
test - I'll just augment TopScoreDocCollector to optionally check the 
IntSetAccelerator, and measure the cost in practice, for different numbers of 
docs added to the IntSet.

Due to the bloom filter living on top of the hash set, at least at the scales 
we're dealing with, we didn't see any change in cost due to the number of 
deletions (Zoie by default keeps no more than 10k modifications in memory 
before flushing to disk, so that's the largest the delSet will get, and 
we don't see more-than-constant scaling yet at that size).
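The bloom-filter-over-hash-set arrangement can be sketched as follows.  This is 
a hedged toy version (not Zoie's actual IntSetAccelerator; the hash functions 
and sizes are invented): since most docs are not deleted, the common path is a 
couple of bit tests that answer "definitely not present", and the hash lookup 
is skipped entirely.

```java
import java.util.HashSet;
import java.util.Set;

class BloomGuardedDeleteSet {
  private final long[] bloom = new long[16];        // 1024-bit filter
  private final Set<Long> deleted = new HashSet<>();

  // Two toy hash functions; real filters choose these more carefully.
  private int bit1(long uid) { return (int) (uid % 1024); }
  private int bit2(long uid) { return (int) ((uid * 31 + 17) % 1024); }
  private void setBit(int b) { bloom[b >> 6] |= 1L << (b & 63); }
  private boolean getBit(int b) { return (bloom[b >> 6] & (1L << (b & 63))) != 0; }

  void delete(long uid) {
    deleted.add(uid);
    setBit(bit1(uid));
    setBit(bit2(uid));
  }

  boolean isDeleted(long uid) {
    // Bloom miss => certainly not deleted; only bloom hits fall through
    // to the (possibly false-positive-resolving) set lookup.
    if (!getBit(bit1(uid)) || !getBit(bit2(uid))) return false;
    return deleted.contains(uid);
  }
}
```

With the delSet capped around 10k entries, the filter stays sparse enough that 
the per-query cost is effectively constant, which matches the observation above.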

bq. But your test is missing a dimension: frequency of reopen. If you reopen 
once per second, how do Zoie/Lucene compare? Twice per second? Once every 5 
seconds? Etc.

Yep, this is true.  It's a little more invasive to put this into Zoie, because 
the reopen time is so fast that there's no pooling, so it would need to be 
kinda hacked in, or tacked on to the outside.  Not rocket science, but not

[jira] Commented: (LUCENE-1526) For near real-time search, use paged copy-on-write BitVector impl

2009-11-10 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776329#action_12776329
 ] 

Jake Mannix commented on LUCENE-1526:
-

bq. Whoa, pretty insane volume. 

Aiming for maxing out indexing speed and query throughput at the same time is 
what we're testing here, and this is a reasonable extreme limit to aim for when 
stress-testing real-time search.

bq. A handful by pooling the BitVector fixed size bytes arrays (see 
LUCENE-1574).

Pooling, you say?  But what if updates come in too fast to reuse your pool?  If 
you're indexing at the speeds I'm describing, won't you run out of BitVectors 
in the pool?

bq. I really need a solution that will absolutely not affect query performance 
from what is today

"You" really need this?  Why is the core case for real-time search a scenario 
where taking a hit of a huge reduction in throughput worth a possible gain in 
query latency?  If the cost was 20% query latency drop in exchange for 7x 
throughput cost when doing heavy indexing, is that worth it?  What about 10% 
latency cost vs 2x throughput loss?  These questions aren't easily answered by 
saying real-time search with Lucene needs  to _absolutely not affect query 
performance from what it is today_.  These kinds of absolute statements should 
be backed up by comparisons with real performance and load testing.

There are many axes of performance to optimize for: 
* query latency
* query throughput
* indexing throughput
* index freshness (how fast before documents are visible)

Saying that one of these is absolutely of more importance than the others 
without real metrics showing which ones are affected in which ways by different 
implementation choices is doing a disservice to the community, and is not by 
any means "conservative".

> For near real-time search, use paged copy-on-write BitVector impl
> -
>
> Key: LUCENE-1526
> URL: https://issues.apache.org/jira/browse/LUCENE-1526
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 2.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Attachments: LUCENE-1526.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> SegmentReader currently uses a BitVector to represent deleted docs.
> When performing rapid clone (see LUCENE-1314) and delete operations,
> performing a copy on write of the BitVector can become costly because
> the entire underlying byte array must be created and copied. A way to
> make this clone delete process faster is to implement tombstones, a
> term coined by Marvin Humphrey. Tombstones represent new deletions
> plus the incremental deletions from previously reopened readers in
> the current reader. 
> The proposed implementation of tombstones is to accumulate deletions
> into an int array represented as a DocIdSet. With LUCENE-1476,
> SegmentTermDocs iterates over deleted docs using a DocIdSet rather
> than accessing the BitVector by calling get. This allows a BitVector
> and a set of tombstones to by ANDed together as the current reader's
> delete docs. 
> A tombstone merge policy needs to be defined to determine when to
> merge tombstone DocIdSets into a new deleted docs BitVector as too
> many tombstones would eventually be detrimental to performance. A
> probable implementation will merge tombstones based on the number of
> tombstones and the total number of documents in the tombstones. The
> merge policy may be set in the clone/reopen methods or on the
> IndexReader. 




[jira] Commented: (LUCENE-1526) For near real-time search, use paged copy-on-write BitVector impl

2009-11-10 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776294#action_12776294
 ] 

Jake Mannix commented on LUCENE-1526:
-

bq. Zoie must do the IntSet check plus the BitVector check (done by
Lucene), right?

Yes, so how does Lucene NRT deal with new deletes?  The disk-backed IndexReader 
still does its internal check for deletions, right?  I haven't played with the 
latest patches on LUCENE-1313, so I'm not sure what has changed, but if you're 
leaving the disk index alone (to preserve point-in-time status of the index 
without writing to disk all the time), you've got your in-memory BitVector of 
newly uncommitted deletes, and then the SegmentReaders from the disk have their 
own internal deletedDocs BitVector.  Are these two OR'ed with each other 
somewhere?  What is done in NRT to minimize the time of checking both of these 
without modifying the read-only SegmentReader?  In the current 2.9.0 code, the 
segment is reloaded completely on getReader() if there are new add/deletes, 
right?

bq. Ie comparing IntSet lookup vs BitVector lookup isn't the comparison
you want to do. You should compare the IntSet lookup (Zoie's added
cost) to 0.

If you've got that technique for resolving new deletes against the disk-based 
ones while maintaining point-in-time nature and can completely amortize the 
reopen cost so that it doesn't affect performance, then yeah, that would be the 
comparison.  I'm not sure I understand how the NRT implementation is doing this 
currently - I tried to step through the debugger while running the 
TestIndexWriterReader test, but I'm still not quite sure what is going on 
during the reopen.

bq.  So, for a query that hits 5M docs, Zoie will take 64 msec longer than
Lucene, due to the extra check. What I'd like to know is what
pctg. slowdown that works out to be, eg for a simple TermQuery that
hits those 5M results - that's Zoie's worst case search slowdown.

Yes, this is a good check to do.  It's still really a micro-benchmark, since it 
would be run in isolation, with no other production tasks going on (like rapid 
indexing and the consequent flushes to disk and reader reopening), but it would 
be useful to see.

What would be even better, however, would be to have a running system whereby 
there is continual updating of the index, and many concurrent requests are 
coming in which hit all 5M documents, and measure the mean latency for zoie in 
this case, in both comparison to NRT, and in comparison to lucene when you 
*don't* reopen the index (ie. you do things the pre-lucene2.9 way, where the 
CPU is still being consumed by indexing, but the reader is out of date until 
the next time it's scheduled by the application to reopen).  This would measure 
the effective latency and throughput costs of zoie and NRT vs non-NRT lucene.  
I'm not really sure it's terribly helpful to see "what is zoie's latency when 
you're not indexing at all" - why on earth would you use either NRT or zoie if 
you're not doing lots of indexing? 


[jira] Commented: (LUCENE-1526) For near real-time search, use paged copy-on-write BitVector impl

2009-11-10 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776277#action_12776277
 ] 

Jake Mannix commented on LUCENE-1526:
-

But Jason, if you're indexing 300 documents a second (possibly all of which are 
delete+re-add), and querying at a thousand queries a second, how many of these 
BitVectors are you going to end up making?





[jira] Commented: (LUCENE-1526) For near real-time search, use paged copy-on-write BitVector impl

2009-11-09 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12775151#action_12775151
 ] 

Jake Mannix commented on LUCENE-1526:
-

bq. But how many msec does this clone add in practice?  Note that it's only 
done if there is a new deletion against that
segment.  I do agree it's silly wasteful, but searching should then be faster
than using AccelerateIntSet or MultiBitSet. It's a tradeoff of the
turnaround time for search perf.

I actually don't know for sure if this is the majority of the time, as I 
haven't run either the AcceleratedIntSet or 2.9 NRT through a profiler, 
but if you're indexing at high speed (which is what is done in our load/perf 
tests), you're going to be cloning these things hundreds of times per second 
(look at the indexing throughput we're forcing the system through), and 
even if each clone is fast, that's costly.

bq. I'd love to see how the worst-case queries (matching millions of hits)
perform with each of these three options.

It's pretty easy to change the index and query files in our test to do that - 
that's a good idea.  Feel free to check out our load-testing framework 
too - it will let you monkey with various parameters, monitor the whole thing 
via JMX, and so forth, both for the full zoie-based stuff and for where the zoie 
API is wrapped purely around Lucene 2.9 NRT.   The instructions for setting it 
up are on the zoie wiki.

bq. When a doc needs to be updated, you index it immediately into the
RAMDir, and reopen the RAMDir's IndexReader. You add its UID to the
AcceleratedIntSet, and all searches "and NOT"'d against that set. You
don't tell Lucene to delete the old doc, yet.

Yep, basically.  The IntSetAccelerator (of UIDs) is set on the (long-lived) 
IndexReader for the disk index - this is why it's done as a ThreadLocal - 
everybody is sharing that IndexReader, but different threads have different 
point-in-time views of how much of it has been deleted.
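
The ThreadLocal trick described here can be sketched roughly like this. The
class and method names (SharedDiskReader, setModSet, isVirtuallyDeleted) are
invented for illustration and are not Zoie's actual API:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Sketch: one shared, long-lived disk reader, but each search thread
 *  carries its own point-in-time set of "virtual" deletes in a ThreadLocal,
 *  so threads never see each other's in-flight deletions. */
public class SharedDiskReader {
    // per-thread view of UIDs deleted since the last disk flush
    private final ThreadLocal<Set<Long>> modSet =
        ThreadLocal.withInitial(Collections::emptySet);

    public void setModSet(Set<Long> deletedUids) {
        modSet.set(deletedUids);
    }

    /** Searches consult this thread's snapshot; other threads are unaffected. */
    public boolean isVirtuallyDeleted(long uid) {
        return modSet.get().contains(uid);
    }

    public static void main(String[] args) throws Exception {
        SharedDiskReader reader = new SharedDiskReader();
        reader.setModSet(new HashSet<>(Arrays.asList(42L)));
        Thread other = new Thread(() ->
            // a different thread sees no deletes until it sets its own view
            System.out.println(reader.isVirtuallyDeleted(42L))); // false
        other.start();
        other.join();
        System.out.println(reader.isVirtuallyDeleted(42L)); // true
    }
}
```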

bq. These are great results! If I'm reading them right, it looks like
generally you get faster query throughput, and roughly equal indexing
throughput, on upgrading from 2.4 to 2.9?

That's about right.  Of course, the comparison between zoie with either 2.4 or 
2.9 against lucene 2.9 NRT is an important one to look at: zoie is pushing 
about 7-9x better throughput for both queries and indexing than NRT.

I'm sure the performance numbers would change if we relaxed the real-timeness; 
yes, that's one of the many dimensions to consider here (along with the 
percentage of indexing events which are deletes, how many of those are from 
really old segments vs. newer ones, how big the queries are, etc...).

bq. One optimization you could make with Zoie is, if a real-time deletion
(from the AcceleratedIntSet) is in fact hit, it could mark the
corresponding docID, to make subsequent searches a bit faster (and
save the bg CPU when flushing the deletes to Lucene).

That sounds interesting - how would that work?  We don't really touch the disk 
indexReader, other than to set this modSet on it in the ThreadLocal, so where 
would this mark live?



[jira] Commented: (LUCENE-1526) For near real-time search, use paged copy-on-write BitVector impl

2009-11-09 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12775082#action_12775082
 ] 

Jake Mannix commented on LUCENE-1526:
-

I'll try to get those numbers for you; they should be in our logs, and if not, 
it's easy enough to add them.  My guess is that if zoie is doing 7x the QPS, 
the latency is significantly *less* than the NRT latency, not more, but I could 
be wrong.

Note that without concurrent indexing, the query speed will be the same, as all 
docs will be flushed to disk and the isDeleted check reduces to exactly the raw 
lucene case.





[jira] Commented: (LUCENE-1526) For near real-time search, use paged copy-on-write BitVector impl

2009-11-07 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774693#action_12774693
 ] 

Jake Mannix commented on LUCENE-1526:
-

bq. But, I agree it's wasteful of space when deletes are so
sparse... though it is fast.

It's fast for random access, but it's really slow if you need to make a lot of 
these (either during heavy indexing if copy-on-write, or during heavy query 
load if copy-on-reopen).

bq. So are you using this, only, as your deleted docs? Ie you don't store
the deletions with Lucene? I'm getting confused if this is only for
the NRT case, or, in general.

These are only to augment the deleted docs *of the disk reader* - the disk 
reader isn't reopened at all except infrequently - once a batch (a big enough 
RAMDirectory is filled, or enough time goes by, depending on configuration) is 
ready to be flushed to disk, diskReader.addIndexes is called and when the 
diskReader is reopened, the deletes live in the normal diskReader's delete set. 
  Before that point, while there is a batch in RAM that hasn't been 
flushed, the IntSetAccelerator is applied to the not-yet-reopened diskReader.  It's 
a copy-on-read ThreadLocal.  

So I'm not sure if that described it correctly: only the deletes which should 
have been applied to the diskReader are treated separately - those are 
basically batched: for T amount of time or D amount of docs (configurable) 
whichever comes first, they are applied to the diskReader, which knows about 
Lucene's regular deletions and now these new ones as well.   Once the memory is 
flushed to disk, the in-memory delSet is emptied, and applied to the diskReader 
using regular apis before reopening.

bq. OK, I think I'm catching up here... so you only open a new reader at
the batch boundary right? Ie, a batch update (all its adds & deletes)
is atomic from the readers standpoint?

Yes - disk reader, you mean, right?  This is only reopened at batch boundary.

bq. OK so a batch is quickly reopened, using bloom filter + int set for
fast "contains" check for the deletions that occurred during that
batch (and, custom TermDocs that does the "and not deleted"). This
gets you your fast turnaround and decent search performance.

The reopening isn't that quick, but it's in the background, or are you talking 
about the RAMDirectory?  Yeah, that is reopened per query (if necessary - if 
there are no changes, of course no reopen), but it is kept very small (10k docs 
or less, for example).  It's actually pretty fantastic performance - check out 
the zoie perf pages: 
http://code.google.com/p/zoie/wiki/Performance_Comparisons_for_ZoieLucene24ZoieLucene29LuceneNRT
 







[jira] Commented: (LUCENE-1526) For near real-time search, use paged copy-on-write BitVector impl

2009-11-07 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774628#action_12774628
 ] 

Jake Mannix commented on LUCENE-1526:
-

bq. Another approach we might take here is to track new deletions using a
sorted tree. New deletions insert in O(log(N)) time, and then we can
efficiently expose a DocIdSetIterator from that. 

We did this in Zoie for a while, and it turned out to be a bottleneck - not as 
much of a bottleneck as continually cloning a bitvector (that was even worse), 
but still not good.  We currently use a bloomfilter on top of an openintset, 
which performs pretty fantastically: constant-time adds and even-faster 
constant-time contains() checks, with small size (necessary for the new Reader 
per query scenario since this requires lots of deep-cloning of this structure).

Just a note from our experience over in zoie-land.  It also helped to not 
produce a docIdset iterator using these bits, but instead override TermDocs to 
be returned on the disk reader, and keep track of it directly there.
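
The bloom-filter-over-int-set idea can be sketched as below. This is a toy
stand-in, not Zoie's code: a HashSet plays the role of the open int set, the
filter sizing and hash constants are arbitrary choices for the example, and
the point is only the shape of the fast-negative contains() path:

```java
import java.util.HashSet;
import java.util.Set;

/** Sketch: a small Bloom filter fronting an exact int set, so contains()
 *  usually answers "no" from a couple of bit tests without probing the set.
 *  Adds and membership checks are both constant time. */
public class BloomedIntSet {
    private final long[] bits = new long[1024];       // 65536-bit filter
    private final Set<Integer> set = new HashSet<>(); // exact membership

    // two cheap hash functions mapping an int into [0, 65535]
    private int h1(int v) { return (v * 0x9E3779B1) >>> 16; }
    private int h2(int v) { return (v * 0x85EBCA6B) >>> 16; }

    public void add(int v) {
        bits[h1(v) >>> 6] |= 1L << (h1(v) & 63);
        bits[h2(v) >>> 6] |= 1L << (h2(v) & 63);
        set.add(v);
    }

    public boolean contains(int v) {
        // fast negative: if either bit is clear, v was never added
        if ((bits[h1(v) >>> 6] & (1L << (h1(v) & 63))) == 0) return false;
        if ((bits[h2(v) >>> 6] & (1L << (h2(v) & 63))) == 0) return false;
        return set.contains(v); // confirm, since Bloom bits can false-positive
    }
}
```

Because the backing set is consulted on a filter hit, the answer is always
exact; the filter just makes the common "not deleted" case very cheap.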





[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

2009-11-03 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773114#action_12773114
 ] 

Jake Mannix commented on LUCENE-1997:
-

bq. Since each approach has distinct advantages, why not offer both ("simple" 
and "expert") comparator extensions APIs?

+1 from me on this one, as long as the simpler one is around.  I'll bet we'll 
find by 3.2 or so that we regret keeping the "expert" one, but I'll take 
any compromise which gets the simpler API in there.

bq. Don't forget that this is multiplied by however many queries are currently 
in flight.

Sure, so if you're running with 100 queries per second on a single shard 
(pretty fast!), with 100 segments, and you want to do sorting by value on the 
top 1000 values (how far down the long tail of extreme cases are we at now?  Do 
librarians hit their search servers with 100 QPS and have indices poorly built 
with hundreds of segments and can't take downtime to *ever* optimize?), we're 
now talking about 40MB.  

*Forty megabytes*.  On a beefy machine which is supposed to be handling 100QPS 
across an index big enough to need 100 segments.  How much heap would such a 
machine already be allocating?  4GB?  6?  More? 

We're talking about less than 1% of the heap is being used by the multiPQ 
approach in comparison to singlePQ.
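
For concreteness, the 40MB figure above works out as a straightforward
product; the 4-bytes-per-queue-entry size here is an assumption made for
illustration, not a number from the benchmark:

```java
/** Back-of-the-envelope for the multi-PQ worst case discussed above.
 *  bytesPerEntry = 4 is an assumed per-slot cost, for illustration only. */
public class MultiPQMemory {
    public static void main(String[] args) {
        int inFlightQueries = 100; // ~100 QPS with ~1s of latency in flight
        int segments = 100;
        int topN = 1000;
        int bytesPerEntry = 4;     // assumed: one int slot per entry
        long bytes = (long) inFlightQueries * segments * topN * bytesPerEntry;
        System.out.println(bytes + " bytes"); // 40000000 bytes, i.e. ~40MB
    }
}
```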

> Explore performance of multi-PQ vs single-PQ sorting API
> 
>
> Key: LUCENE-1997
> URL: https://issues.apache.org/jira/browse/LUCENE-1997
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.9
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Attachments: LUCENE-1997.patch, LUCENE-1997.patch, LUCENE-1997.patch, 
> LUCENE-1997.patch, LUCENE-1997.patch, LUCENE-1997.patch, LUCENE-1997.patch, 
> LUCENE-1997.patch, LUCENE-1997.patch
>
>
> Spinoff from recent "lucene 2.9 sorting algorithm" thread on java-dev,
> where a simpler (non-segment-based) comparator API is proposed that
> gathers results into multiple PQs (one per segment) and then merges
> them in the end.
> I started from John's multi-PQ code and worked it into
> contrib/benchmark so that we could run perf tests.  Then I generified
> the Python script I use for running search benchmarks (in
> contrib/benchmark/sortBench.py).
> The script first creates indexes with 1M docs (based on
> SortableSingleDocSource, and based on wikipedia, if available).  Then
> it runs various combinations:
>   * Index with 20 balanced segments vs index with the "normal" log
> segment size
>   * Queries with different numbers of hits (only for wikipedia index)
>   * Different top N
>   * Different sorts (by title, for wikipedia, and by random string,
> random int, and country for the random index)
> For each test, 7 search rounds are run and the best QPS is kept.  The
> script runs singlePQ then multiPQ, and records the resulting best QPS
> for each and produces table (in Jira format) as output.




[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

2009-11-02 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772741#action_12772741
 ] 

Jake Mannix commented on LUCENE-1997:
-

Is the current concern to do with the memory?  I'm more concerned with the 
weird "java ghosts" that are flying around, sometimes swaying results by 
20-40%...  The memory could only be an issue on a setup with hundreds of 
segments while sorting the top 1000 values (do we really try to optimize for 
this performance case?).  In the normal case (no more than tens of segments, 
and the top 10 or 100 hits), we're talking about what, 100-1000 PQ entries?





[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

2009-10-28 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771041#action_12771041
 ] 

Jake Mannix commented on LUCENE-1997:
-

Can you tell me more about this?  What kind of comparator can't pre-create a 
fixed ordinal list for all the possible values?  I'm sure I've seen this too, 
but I can't bring one to mind right now.





[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

2009-10-28 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771017#action_12771017
 ] 

Jake Mannix commented on LUCENE-1997:
-

But part of the point is that you don't have to get the values - you can have a 
fast in-memory structure which just encodes their sort order, right?  This is 
the whole point of using the ordinal - you pre-sort all of the possible values 
to get the ordinals, and now an arbitrarily complex comparator reduces to an 
int compare at sort time(*).  In the custom comparators we use, for example, 
this allows even sorting by multivalued fields in a custom way, via the simple 
compare(int doca, int docb) method.

(*) This reminds me: Mike, you switched the compare for ord values from being 
"return ordA - ordB" to being "return ordA > ordB ? 1 : (ordA == ordB ? 0 : 
-1)", on the basis of int overflow at some point, right?  This is only true if 
we're really sorting by integers, which could overflow - if they're ordinals, 
then these are both non-negative numbers, and their difference will always be 
greater than -MAX_INT, so the branching can be avoided in this innermost 
comparison in this case.
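
The overflow point in the footnote is easy to demonstrate. A sketch (the
method names are invented for the example):

```java
/** Subtraction is a valid comparator only when both values are non-negative,
 *  as ordinals are; for arbitrary ints it can overflow and flip the sign. */
public class OrdCompare {
    // branch-free: safe for ordinals, since ordA - ordB can't overflow
    // when both are >= 0 (the difference is always > -MAX_INT)
    static int compareOrds(int ordA, int ordB) { return ordA - ordB; }

    // branching form: required for arbitrary ints, overflow-proof
    static int compareInts(int a, int b) {
        return a > b ? 1 : (a == b ? 0 : -1);
    }

    public static void main(String[] args) {
        System.out.println(compareOrds(3, 5) < 0);                  // true
        // subtraction misbehaves on arbitrary ints: MIN_VALUE - 1 wraps
        // around to MAX_VALUE, wrongly reporting "greater"
        System.out.println(compareOrds(Integer.MIN_VALUE, 1) > 0);  // true
        System.out.println(compareInts(Integer.MIN_VALUE, 1));      // -1
    }
}
```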





[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

2009-10-28 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771005#action_12771005
 ] 

Jake Mannix commented on LUCENE-1997:
-

That should speed things up, but it's well sub-leading in complexity.   This is 
an additive term of O(numSegments * numDesiredResults) total operations when 
done "slowly" (as opposed to the best merge, which is O(numDesiredResults * 
log(numSegments))), in comparison to the primary sub-leading piece for multiPQ, 
which is O(numSegments * numDesiredResults * log(numDesiredResults) * 
log(numHitsPerSegment)), so it's taking a piece of the CPU time which is 
already smaller by a factor of 20-100 than the total PQ insert time, and 
reducing it by a further factor of maybe 5-10.  

If it's easy to code up, sure, why not.  But I'd argue it's no longer a 
necessary "inner loop" optimization.
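
The O(numDesiredResults * log(numSegments)) merge referred to above is the
standard heap-based k-way merge. A minimal sketch, assuming each segment has
already produced its own sorted top-N list (the class and method names are
illustrative, not from the patch):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

/** Sketch: merge per-segment sorted top-N lists into the global top N using
 *  a heap of one cursor per segment; each pop/push costs log(numSegments). */
public class PQMerge {
    public static List<Integer> mergeTopN(List<List<Integer>> perSegment, int n) {
        // heap entry: {value, segmentIndex, offsetInSegment}
        PriorityQueue<int[]> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int s = 0; s < perSegment.size(); s++)
            if (!perSegment.get(s).isEmpty())
                heap.add(new int[] { perSegment.get(s).get(0), s, 0 });

        List<Integer> out = new ArrayList<>();
        while (out.size() < n && !heap.isEmpty()) {
            int[] top = heap.poll();
            out.add(top[0]);
            List<Integer> seg = perSegment.get(top[1]);
            if (top[2] + 1 < seg.size()) // advance this segment's cursor
                heap.add(new int[] { seg.get(top[2] + 1), top[1], top[2] + 1 });
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> segs = Arrays.asList(
            Arrays.asList(1, 4, 9), Arrays.asList(2, 3, 10));
        System.out.println(mergeTopN(segs, 4)); // [1, 2, 3, 4]
    }
}
```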





[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

2009-10-28 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770945#action_12770945
 ] 

Jake Mannix commented on LUCENE-1997:
-

bq. search segment 2 into queue B (with possible short circuiting by the 
smallest value in queueA)

Well, we're not doing the short-circuit trick on multiPQ right now, are we?  It 
would certainly speed things up, but it requires that the API have the convert() 
method available, which was the big savings on the API side of multiPQ.  If it 
were available, I think multiPQ (either with N or 2 queues) would perform 
*strictly* better than singlePQ, but I didn't suggest it because it seems to 
negate the cleanliness of the API.

One thing John mentioned offhand is that perhaps the convert() method could be 
optional?  If you don't implement it, you don't get to short-circuit using 
knowledge of previous segments, but if you do, you get maximum performance in 
the cases where multiPQ performs worse (mid-range hitCount, high 
numResultsToReturn, and in the numeric sorting case).

I think maybe combining this idea with 2 queues could be the best of all 
worlds, with best overall speed, only twice the memory of singlePQ, and the 
simplest API with the addition of one new *optional* method?


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

2009-10-27 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770709#action_12770709
 ] 

Jake Mannix commented on LUCENE-1997:
-

Not terribly useful, in that it's a Mac OS X laptop - 

2.4 GHz Intel Core 2 Duo, 4 GB 667 MHz DDR2 SDRAM

Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_19-b02-304)
Java HotSpot(TM) Client VM (build 1.5.0_19-137, mixed mode, sharing)

but it's run with the latest patch.




[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

2009-10-27 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770708#action_12770708
 ] 

Jake Mannix commented on LUCENE-1997:
-

||Source||Seg size||Query||Tot hits||Sort||Top N||QPS old||QPS new||Pct change||
|random|balanced||500|rand int|10|22.02|22.99|{color:green}4.4%{color}|
|random|balanced||500|rand int|25|21.45|22.91|{color:green}6.8%{color}|
|random|balanced||500|rand int|50|21.40|22.58|{color:green}5.5%{color}|
|random|balanced||500|rand int|100|21.16|21.93|{color:green}3.6%{color}|
|random|balanced||500|rand int|500|21.27|17.45|{color:red}-18.0%{color}|
|random|balanced||500|rand int|1000|20.62|14.33|{color:red}-30.5%{color}|
|random|balanced||500|rand string|10|18.33|21.91|{color:green}19.5%{color}|
|random|balanced||500|rand string|25|18.37|22.20|{color:green}20.8%{color}|
|random|balanced||500|rand string|50|18.51|22.13|{color:green}19.6%{color}|
|random|balanced||500|rand string|100|20.30|21.91|{color:green}7.9%{color}|
|random|balanced||500|rand string|500|18.34|18.69|{color:green}1.9%{color}|
|random|balanced||500|rand string|1000|17.83|15.04|{color:red}-15.6%{color}|
|random|balanced||500|country|10|18.49|22.16|{color:green}19.8%{color}|
|random|balanced||500|country|25|18.62|22.28|{color:green}19.7%{color}|
|random|balanced||500|country|50|18.30|21.86|{color:green}19.5%{color}|
|random|balanced||500|country|100|20.46|22.08|{color:green}7.9%{color}|
|random|balanced||500|country|500|18.02|18.38|{color:green}2.0%{color}|
|random|balanced||500|country|1000|18.28|14.62|{color:red}-20.0%{color}|
|random|log||500|rand int|10|22.04|23.18|{color:green}5.2%{color}|
|random|log||500|rand int|25|21.48|23.02|{color:green}7.2%{color}|
|random|log||500|rand int|50|21.37|22.96|{color:green}7.4%{color}|
|random|log||500|rand int|100|21.47|22.43|{color:green}4.5%{color}|
|random|log||500|rand int|500|21.09|19.60|{color:red}-7.1%{color}|
|random|log||500|rand int|1000|20.37|17.58|{color:red}-13.7%{color}|
|random|log||500|rand string|10|18.54|22.49|{color:green}21.3%{color}|
|random|log||500|rand string|25|18.50|22.32|{color:green}20.6%{color}|
|random|log||500|rand string|50|18.53|22.24|{color:green}20.0%{color}|
|random|log||500|rand string|100|20.46|22.43|{color:green}9.6%{color}|
|random|log||500|rand string|500|18.59|20.83|{color:green}12.0%{color}|
|random|log||500|rand string|1000|18.27|18.01|{color:red}-1.4%{color}|
|random|log||500|country|10|18.55|22.50|{color:green}21.3%{color}|
|random|log||500|country|25|18.48|22.35|{color:green}20.9%{color}|
|random|log||500|country|50|18.48|22.24|{color:green}20.3%{color}|
|random|log||500|country|100|20.73|22.34|{color:green}7.8%{color}|
|random|log||500|country|500|18.77|20.12|{color:green}7.2%{color}|
|random|log||500|country|1000|17.92|18.44|{color:green}2.9%{color}|



[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

2009-10-27 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770604#action_12770604
 ] 

Jake Mannix commented on LUCENE-1997:
-

What happened with these ones, Mike?




[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

2009-10-27 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770587#action_12770587
 ] 

Jake Mannix commented on LUCENE-1997:
-

Excellent, good to see that my big-O analysis is holding up on the 5M-doc set: 
as the sub-leading terms drop off and become negligible, any advantage of 
singlePQ over multiPQ goes away entirely.

There still seem to be some statistical fluctuations here and there, though 
(why would 1000 hits ever have *better* performance for multiPQ vs. singlePQ 
than 500 hits?), but I guess that's entropy for you...
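For reference, one plausible reading of the big-O argument alluded to here (my own notation, not taken from the issue): with $H$ total hits, $S$ segments, and top-$N$ requested,

```latex
\text{singlePQ:}\quad O(H \log N)
\qquad
\text{multiPQ:}\quad O(H \log N + S\,N \log(S N))
```

Both approaches pay the same $O(H \log N)$ heap-maintenance cost per hit; multiPQ adds a one-time merge of $S$ queues of size $N$. That merge term is sub-leading once $H \gg S N$, which is consistent with the advantage of singlePQ shrinking on the larger 5M-doc index while large $N$ (500-1000) keeps the merge cost visible.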




[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

2009-10-25 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769860#action_12769860
 ] 

Jake Mannix commented on LUCENE-1997:
-

Mark, you say that with the previous numbers you'd say "-1", but if you look at 
the most common use case (top 10), the simpler API is faster in almost all 
cases, and in some cases it's 10-20% faster.  Top 500 and top 1000 are not just 
"not as common"; they're probably at the 1% level, or less.

As far as shifting back, API-wise, that really shouldn't be a factor: 2.9 
*just* came out.  And what, we stick with a slightly *slower* API (for the most 
common use case across all Lucene users) which happens to be *more complex* 
and, more importantly, just very nonstandard?  Comparable is familiar to 
everyone, even if you have to have two forms, one for primitives and one for 
Objects.  An API which *doesn't* have the whole slew of compare(), 
compareBottom(), copy(), setBottom(), value() and setNextReader() has a 
tremendous advantage over one which does.

Implementing a custom sort is "advanced", but it will be *easier* if it's not 
complex, and then it doesn't *need* to be "advanced" (shouldn't we be striving 
for fewer APIs listed as "advanced", and more features which can *do* complex 
things but are still listed as things "normal users" can do?).

I think it sets a *great* precedent with users to say, "Oops! We found that 
this new (as of this version) API was unnecessarily clumsy; we're shifting back 
to a simpler one, just like the one you used to have."  Sticking with a worse 
API because it performs better only in extreme scenarios, because "we already 
moved on to this new API, shouldn't go back now, don't want to admit we ever 
made a mistake!", is what is "ugly".

The main thing to remember is that the entire reasoning for making this API 
different from the old one was *only* that a simpler API seemed to perform much 
worse than this one, and that does not appear to be the case.  If the original 
reasoning turns out to have been incorrect, then the answer is simple: go with 
the simpler API *now*, before users *do* get used to the new one.

If it turns out I'm wrong, and lots of users often sort by field values for the 
top 1000 entries, or the most recent runs turn out to be flukes and not typical 
performance, only then would I change my opinion.
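To illustrate the contrast being argued (with hypothetical names, not Lucene's actual interfaces): a Comparable-style custom sort can be a single value-per-doc method, with the framework doing all the comparing, copying, and bookkeeping that the multi-method API pushes onto the implementor:

```java
import java.util.*;

public class SimpleSortSketch {
    /** The "simpler API": the user supplies one sort value per doc, instead of
     *  implementing compare()/compareBottom()/copy()/setBottom()/value()/
     *  setNextReader() themselves. Hypothetical interface for illustration. */
    interface SortValueSource {
        Comparable<?> sortValue(int doc);
    }

    /** Framework side: return the top-n doc ids, sorted descending by value. */
    static int[] topDocsByValue(int numDocs, int n, SortValueSource src) {
        Integer[] docs = new Integer[numDocs];
        for (int i = 0; i < numDocs; i++) docs[i] = i;
        Arrays.sort(docs, (a, b) -> {
            // Unchecked cast is safe here: all values come from the same source.
            Comparable<Object> va = (Comparable<Object>) src.sortValue(a);
            return -va.compareTo(src.sortValue(b));  // descending order
        });
        int[] top = new int[Math.min(n, numDocs)];
        for (int i = 0; i < top.length; i++) top[i] = docs[i];
        return top;
    }

    public static void main(String[] args) {
        int[] scores = {3, 9, 1, 7};  // per-doc sort values
        int[] top2 = topDocsByValue(scores.length, 2, doc -> scores[doc]);
        System.out.println(Arrays.toString(top2));  // [1, 3]: docs 1 and 3 score highest
    }
}
```

The trade-off in the thread is exactly this: a one-method API can't exploit cross-segment state the way the multi-method one can, which is what the optional convert() idea tries to recover.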




[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

2009-10-23 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769328#action_12769328
 ] 

Jake Mannix commented on LUCENE-1997:
-

Mike, thanks for all the hard work on this - it's clearly far more work than 
anyone has yet spent on just doing the upgrade to the newer API, and that's 
appreciated.

Am I wrong in thinking that these results are pretty ambiguous?  How often do 
people take the top 500 or top 1000 sorted hits?  If you don't focus on that 
case (that of looking for pages *50 through 100* of normal 10-per-page search 
results), there's a bunch of green and a bunch of red, and the two techniques 
are within +/- 10-20% of each other.

Is that what everyone else sees in Mike's newest numbers here, or am I 
misreading them?




[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

2009-10-22 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769058#action_12769058
 ] 

Jake Mannix commented on LUCENE-1997:
-

I would say that of course Linux and Solaris results should be weighted more 
highly than results on Macs, because while I love my Mac, I've yet to see a 
production cluster running on MacBook Pros... :)




[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

2009-10-22 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769056#action_12769056
 ] 

Jake Mannix commented on LUCENE-1997:
-

Java 6 is standard in production servers since when?  What justified Lucene 
staying on Java 1.4 for so long if that's the case?  In my own experience, my 
last job only moved to Java 1.5 a year ago, and at my current company we're 
still on 1.5.  I've seen that be pretty common, and I'm in the Valley, where 
things update pretty quickly.




[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

2009-10-22 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769042#action_12769042
 ] 

Jake Mannix commented on LUCENE-1997:
-

Hah!  Thanks for posting that, Mark!   Much easier to read. :)

Hey John, can you comment with your hardware specs on this, so it can be 
recorded for posterity? ;)
